Bioinformatics Analysis - Science topic

Explore the latest questions and answers in Bioinformatics Analysis, and find Bioinformatics Analysis experts.
Questions related to Bioinformatics Analysis
  • asked a question related to Bioinformatics Analysis
Question
4 answers
I am trying to analyze the MT-ATP6 region from the 1000 Genomes Phase 3 mitochondrial chromosome VCF but I see that the VCF does not include every single position. Why is that? I assume it is due to not having any/enough variants to account for the position based on the low-coverage sequencing technology used. However, are there any other reasons for this? How can I account for these missing positional variants? I am pretty new to VCFs so any insight would be greatly appreciated, thanks!
Relevant answer
Answer
Keep in mind that a VCF only lists positions where at least one sample differs from the reference; positions that match the reference in every sample are not "called" as variants and therefore do not appear in the file. Some of these positions are probably highly conserved by evolution, so alterations are rarely observed there. If you want a file that covers every nucleotide in that genomic region, both the variants and the "reference positions", take a look at the other type of VCF file, the GVCF.
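To see which positions of a region are simply absent from a sparse VCF, the idea above can be sketched in a few lines of Python. This is a minimal illustration, not a replacement for a GVCF: the toy records and the 8527-8536 window are made up for the example, and a position missing from the VCF may be a reference match or a position that could not be called at all.

```python
def missing_positions(vcf_lines, start, end):
    """Return positions in [start, end] that have no record in the VCF."""
    seen = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        pos, ref = int(fields[1]), fields[3]
        # A record with a multi-base REF (e.g. a deletion) covers several positions.
        seen.update(range(pos, pos + len(ref)))
    return [p for p in range(start, end + 1) if p not in seen]


# Toy records (hypothetical, not real 1000 Genomes data).
example_vcf = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "MT\t8530\t.\tA\tG\t100\tPASS\t.",
    "MT\t8533\t.\tGC\tG\t100\tPASS\t.",
]
absent = missing_positions(example_vcf, 8527, 8536)
print(absent)
```

Positions returned by this helper are the ones you would have to fill in from the reference sequence if you need a value at every site.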
  • asked a question related to Bioinformatics Analysis
Question
2 answers
I have a few sample vcfs which are not in a very good quality. They are 23andme files from OpenSNP in the following format:
rsID, chromosome no, position, genotype
I have tried remapping them using Galaxy. However, I guess the error is due to the format. The vcfs contain only SNPs.
ANY IDEAS PLEASE? How can i make it work?
These vcfs are mapped to NCBI36/hg18 and need to be remapped to hg38.
I have a specific list of SNPs (according to the hg38) in a csv format which I need to filter from each of these vcfs after remapping.
Please suggest any alternate workflows if there are any to help me make this work.
Relevant answer
Answer
The file you are trying to lift over is not a .vcf.
You need to reorder the columns to match the Variant Call Format (VCF), whose leading columns are #CHROM, POS, ID (the rsID), REF, ALT, and so on.
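As a rough sketch of the column reordering, the Python function below turns one line in the rsID / chromosome / position / genotype layout into a VCF data line. It assumes you can supply the reference base for each position yourself (for example from the hg18 FASTA, which is not shown here); the rsID and coordinate in the example are hypothetical, and strand issues are ignored.

```python
def to_vcf_record(raw_line, ref_base):
    """Convert 'rsID chrom pos genotype' into a VCF data line.

    ref_base is the reference allele at that position and must be
    looked up separately; the input format does not contain it.
    """
    rsid, chrom, pos, genotype = raw_line.split()
    # Skip no-calls ("--") and indel codes ("I"/"D"), which need special handling.
    if any(allele not in "ACGT" for allele in genotype):
        return None
    # Alleles that differ from the reference become ALT alleles.
    alts = []
    for allele in genotype:
        if allele != ref_base and allele not in alts:
            alts.append(allele)
    alleles = [ref_base] + alts
    gt = "/".join(str(i) for i in sorted(alleles.index(a) for a in genotype))
    alt = ",".join(alts) if alts else "."
    # Columns: CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMPLE
    return "\t".join([chrom, pos, rsid, ref_base, alt, ".", ".", ".", "GT", gt])


record = to_vcf_record("rs0000001\t1\t12345\tAG", "A")
print(record)
```

A complete file would also need the `##fileformat` line and the `#CHROM` header row before liftover tools accept it.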
  • asked a question related to Bioinformatics Analysis
Question
2 answers
I encountered an unusual observation while constructing a nomogram using the rms package with the Cox proportional hazards model. Specifically, when Karnofsky Performance Status (KPS) is used as the sole predictor, the nomogram points for KPS decrease from high to low. However, when KPS is combined with other variables in a multivariable model, the points for KPS increase from low to high. Additionally, I've noticed that the total points vary from low to high for all variables, while the 1-year survival probability shifts from high to low.
Could anyone help clarify why this directional shift in points occurs? Are there known factors, such as interactions, scaling differences, or confounding effects, that might explain this pattern?
Relevant answer
Answer
Thank you
  • asked a question related to Bioinformatics Analysis
Question
2 answers
Good day! The question is really complex since CRISPRs do not have any exact sequence, so the question is the probability that a random sequence contains 2 repeat units, each 23-55 bp long, containing a short palindromic sequence and differing by at most 20% mismatches, interspersed with a spacer sequence that is 0.6-2.5 times the repeat size and that does not match the left and right flanks of the whole sequence.
Relevant answer
Answer
Estimating the probability of forming a short CRISPR with a single spacer in a random sequence involves several steps. This calculation depends on the specific sequence characteristics and the CRISPR system's requirements. Here’s a structured approach to estimate this probability:
**1. Define the Parameters**
**1.1 CRISPR System Characteristics:**
Spacers: Typically, spacers in CRISPR systems are around 20 nucleotides long.
Protospacer Adjacent Motif (PAM): CRISPR systems require a PAM sequence adjacent to the target site. For example, the Streptococcus pyogenes Cas9 requires the PAM sequence "NGG."
**1.2 Random Sequence Properties:**
Length: Determine the length of the random sequence where you are searching for the spacer.
Nucleotide Composition: For a truly random sequence, assume equal probabilities for each nucleotide (A, T, C, G).
**2. Calculate the Probability of a Specific Spacer Sequence**
**2.1 Probability of Matching a Specific Spacer:**
Calculate for PAM: If the PAM sequence is required, first calculate the probability of finding this PAM sequence in the random sequence.
Probability of Spacer Sequence: For a spacer of length L nucleotides, the probability of finding a specific sequence of length L in a random sequence is:
$$P(\text{spacer}) = \left(\frac{1}{4}\right)^{L}$$
where $\frac{1}{4}$ is the probability of each nucleotide occurring at a specific position, and $L$ is the length of the spacer.
**2.2 Consider PAM Sequence:**
Probability of PAM: For a PAM sequence of length k nucleotides, assuming equal probability for each nucleotide, the probability of finding the PAM is:
$$P(\text{PAM}) = \left(\frac{1}{4}\right)^{k}$$
**3. Calculate the Probability of Spacer and PAM Co-occurrence**
**3.1 Independent Events:**
Assuming Independence: If the presence of the spacer and PAM are independent, the combined probability of finding both in the random sequence is:
$$P(\text{spacer and PAM}) = P(\text{spacer}) \times P(\text{PAM})$$
**3.2 Search Space:**
Length of Random Sequence: If you are searching within a sequence of length N, the number of potential positions for the spacer and PAM is N - (L + k - 1).
**4. Estimate the Expected Number of Hits**
**4.1 Expected Hits:**
Calculate Expected Number: Multiply the probability of finding the spacer and PAM by the number of potential positions:
$$\text{Expected Number of Hits} = P(\text{spacer and PAM}) \times \left(N - (L + k - 1)\right)$$
**4.2 Adjust for Overlaps:**
Overlap: Adjust calculations if the spacer and PAM are not independent or if there are constraints on their positioning relative to each other.
Example Calculation
Assuming:
Spacer length (L) = 20 nucleotides
PAM length (k) = 3 nucleotides
Random sequence length (N) = 1000 nucleotides
Probability of Spacer:
$$P(\text{spacer}) = \left(\frac{1}{4}\right)^{20} = 9.09 \times 10^{-13}$$
Probability of PAM:
$$P(\text{PAM}) = \left(\frac{1}{4}\right)^{3} = 0.0156$$
Combined Probability:
$$P(\text{spacer and PAM}) = 9.09 \times 10^{-13} \times 0.0156 = 1.42 \times 10^{-14}$$
Expected Hits:
$$\text{Expected Number of Hits} = 1.42 \times 10^{-14} \times \left(1000 - (20 + 3 - 1)\right) = 1.42 \times 10^{-14} \times 978 \approx 1.39 \times 10^{-11}$$
Conclusion
In this example, the probability of finding a specific 20-nucleotide spacer and a 3-nucleotide PAM sequence in a random 1000-nucleotide sequence is extremely low, reflecting the challenge of finding specific CRISPR target sites. Adjust parameters accordingly based on your specific requirements and sequence characteristics.
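The arithmetic of this example can be checked with a short Python sketch. It assumes, as the answer does, a uniform random sequence and a fully specified PAM of length k; for a PAM such as NGG, where N matches any base, only two positions are constrained and the PAM probability would be (1/4)^2 instead. The function name is illustrative only.

```python
def expected_hits(spacer_len, pam_len, seq_len):
    """Expected number of exact spacer+PAM matches in a uniform random sequence."""
    p_spacer = 0.25 ** spacer_len            # (1/4)^L
    p_pam = 0.25 ** pam_len                  # (1/4)^k
    p_both = p_spacer * p_pam                # independence assumed
    # Number of start positions: 1000 - (20 + 3 - 1) = 978 for the example values.
    positions = seq_len - (spacer_len + pam_len - 1)
    return p_spacer, p_pam, p_both, positions, p_both * positions


p_spacer, p_pam, p_both, positions, hits = expected_hits(20, 3, 1000)
print(f"P(spacer) = {p_spacer:.3g}")
print(f"P(PAM) = {p_pam:.3g}")
print(f"P(spacer and PAM) = {p_both:.3g}")
print(f"positions = {positions}, expected hits = {hits:.3g}")
```

Changing the three arguments lets you explore how quickly the expected count grows with sequence length or shrinks with motif length.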
  • asked a question related to Bioinformatics Analysis
Question
12 answers
Hi everyone,
It's been over 3 months that I have been trying to develop a script for variant calling and RNA-seq analysis for my project. I have attended quite a few workshops but it feels like a scam. I have nobody who can guide me and I really want to learn the analysis. Can anybody tell me if there are currently any short-term courses for the same?
Relevant answer
Answer
This is unfortunately the pain-point of bioinformatics journey which I've been through. I can recommend this book - https://www.biostarhandbook.com/ & have a look at the companion course https://www.biostarhandbook.com/edu/course/6/.
For variant calling (germline/hereditary) look into the GATK workflow https://gatk.broadinstitute.org/hc/en-us/articles/360036194592-Getting-started-with-GATK4
For the Salmon and Deseq-2 combination this tutorial will help but you need to know some Linux & Rstudio https://www.hadriengourle.com/tutorials/rna/#import-read-counts-using-tximport
Most freeware stuff is optimized or available on Linux. Ubuntu is not a bad start. If you need a virtual machine, have a look into Docker or https://www.virtualbox.org/
You should look into developing basic skills in Linux, RStudio and Python to run the tools. Nowadays, you don't really need Java or C-level languages to run the tools. Developing algorithms is a different story though. Keep your determination, I knew nothing in 2017 when I started as well :)
  • asked a question related to Bioinformatics Analysis
Question
2 answers
I can provide bioinformatics services; please contact me.
Relevant answer
Answer
Dear Guigui Man,
With years of research and development experience in the field of next-generation sequencing (NGS), Creative Biolabs has accumulated extensive experience in bioinformatics analysis to support whole genome sequencing (WGS), whole exome sequencing (WES), targeted sequencing, whole transcriptome sequencing (WTS) and immune repertoire sequencing. We can offer high-quality custom bioinformatics analysis services to meet every unique requirement of our clients.
Regards,
Shafagat
  • asked a question related to Bioinformatics Analysis
Question
6 answers
Hello, I have some raw files with the extension .d (acquired on a Bruker instrument). Which platforms would you recommend for the bioinformatic analysis (ideally free to download)? I have experience using MaxQuant but it does not recognize the .d files. Any recommendations? Thank you in advance
Relevant answer
Answer
How can it recognize the .d folder? I tried, but it looks impossible.
  • asked a question related to Bioinformatics Analysis
Question
2 answers
Hello,
In the literature, there are some MS/MS results that include hypothetical proteins, which can be shorter than 40 amino acids. I can also find these when I search for an organism in the protein section of NCBI. My question is, would it be absurd if I synthetically synthesize these peptides called hypothetical proteins and test them as drug candidates in certain disease models? Or are studies like the one I mentioned feasible and being conducted? If so, what procedure should I follow? For example, when I find a hypothetical protein, should I first perform a blast and then synthesize and use it if it meets certain conditions?
Is there any chance you could share some references with me that have been done in this manner?
I hope I have been able to convey what I want to ask.
Thank you for your answers.
Relevant answer
Answer
I cannot really answer your questions, but wondered, if this will be of some help. I have seen a review article about how lactobacilli degrade milk protein, and the resulting short peptides have medicinal properties. For instance: "Many studies focused on ACE inhibitor peptides, probably due to the ease of use of in vitro anti-ACE assays. The well-known Val-Pro-Pro and Ile-Pro-Pro peptides are produced during milk fermentation by some Lb. helveticus strains. ... An additional ACE inhibitory peptide sequence (Ala-Ile-Pro-Pro-Lys-Lys-Asn-Gln-Asp) was also identified in milk fermented by Lb. helveticus."
Raveschot, C., Cudennec, B., Coutte, F., Flahaut, C., Fremont, M., Drider, D., & Dhulster, P. (2018). Production of bioactive peptides by Lactobacillus species: from gene to application. Frontiers in Microbiology, 9, 409606.
  • asked a question related to Bioinformatics Analysis
Question
2 answers
Hey everyone,
my question is maybe strange at first glance, but simple: is the rapid 16S kit's only real advantage the significantly larger 16S data amount generation? Shouldn't I be perfectly able to collect necessary strain-level diversity 16S data on the data analysis level from a total nanopore metagenome, without the PCR bias, given enough sample input? If the above thinking is correct, would you consider triple-digit ng input (below 1ug) sufficient, at least for key players of a mixed microbial community?
Just trying to understand if I really need the 16S barcoding kit since I have the native one (which I will use for total metagenome anyway)
Cheers
A
Relevant answer
Answer
Abhijeet Singh both kits offer the same multiplexing capacity, if I understand the question you're asking - both 16S kit and the native kit that we have are "24 barcoding", native / 16S.
I am rather curious about the necessity of 16S in terms of sequencing success. I can see low-complexity microbial samples getting sequenced just as successfully with a native kit as with 16S, but without the PCR amplification bias, which in fact affects relative quantification negatively, rather than being a prerequisite for it as you seem to state (because amplification efficiency drops steeply once the GC content of the amplicon exceeds 60%). PCR amplification probably makes a positive difference when trying to detect low-abundance species, but I am not interested in those in this project.
  • asked a question related to Bioinformatics Analysis
Question
3 answers
I am trying to predict the stability of a protein with different SNPs. I tried using DUET, PredictSNP and DynaMut. The problem with DUET is that it cannot handle double mutations, although it gives fast results. PredictSNP and DynaMut, however, take a long time to generate results in my case.
Please suggest other tools for stability prediction that are both accurate and convenient.
Relevant answer
Answer
You're welcome
fred
  • asked a question related to Bioinformatics Analysis
Question
1 answer
I am new to Desmond simulations and I want to know how can I find the estimated time left for a simulation to be completed? my 2nd query is how to perform B-Factor analysis after performing simulation on Desmond? Any help in this regard will be highly appreciated.
Thanks
Relevant answer
Answer
I hope someone finds this useful; I have been struggling with the same problem myself. My solution is to look at the chemical time and ns/day values, which are updated continuously.
To check them, go to MONITOR -> double-click the running MD job -> check the last entry (status: running).
For a default 100 ns simulation, a chemical time of 100000 (in ps) corresponds to 100 ns, so the chemical time shown in the last entry gives a rough idea of the percentage of the job completed.
The ns/day value gives a rough idea of simulation speed: for example, at 11 ns/day a 100 ns simulation takes about 9 days and will finish on the 10th day of the run.
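The estimate can be written down as a small Python helper. This is only a sketch of the arithmetic: the function names are made up, it assumes the chemical time is reported in picoseconds, and it assumes the ns/day rate stays constant for the rest of the run.

```python
def percent_done(total_ns, chemical_time_ps):
    """Share of the simulation already completed, in percent."""
    return 100.0 * (chemical_time_ps / 1000.0) / total_ns


def days_remaining(total_ns, chemical_time_ps, ns_per_day):
    """Remaining wall-clock time in days at the current speed."""
    done_ns = chemical_time_ps / 1000.0   # ps -> ns
    return (total_ns - done_ns) / ns_per_day


# A 100 ns job that has reached 45000 ps and runs at 11 ns/day.
progress = percent_done(100, 45000)
remaining = days_remaining(100, 45000, 11)
print(f"{progress:.0f}% done, about {remaining:.1f} days left")
```

Because the speed reported by the monitor fluctuates, it is worth recomputing the estimate from time to time.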
  • asked a question related to Bioinformatics Analysis
Question
4 answers
Genome editing (also called gene editing) is a group of technologies that give scientists the ability to change an organism's DNA. These technologies allow genetic material to be added, removed, or altered at particular locations in the genome. Several approaches to genome editing have been developed. A recent one is known as CRISPR-Cas9, which is short for clustered regularly interspaced short palindromic repeats and CRISPR-associated protein 9. The CRISPR-Cas9 system has generated a lot of excitement in the scientific community because it is faster, cheaper, more accurate, and more efficient than other existing genome editing methods.
Relevant answer
Answer
Certainly! Here are some well-regarded reference books that delve into CRISPR gene editing technology, offering insights from the basics to advanced applications:
1. "CRISPR-Cas: A Laboratory Manual"
Edited by Jennifer Doudna, Prashant Mali, and Samuel Sternberg
  • This manual, edited by pioneers in the field, is a comprehensive resource that covers various aspects of CRISPR-Cas systems, including detailed protocols for their use in genetic engineering, tips for experimental design, and troubleshooting advice. It's an excellent practical guide for researchers and students alike.
2. "A Crack in Creation: Gene Editing and the Unthinkable Power to Control Evolution"
By Jennifer Doudna and Samuel Sternberg
  • Written by one of the co-discoverers of CRISPR-Cas9, this book offers a compelling look at the development of CRISPR gene editing technology, its potential applications, and the ethical considerations it raises. It provides a unique perspective from researchers directly involved in the development of CRISPR.
3. "The CRISPR Generation: The Story of the World’s First Gene-Edited Babies"
By Kiran Musunuru
  • This book explores the story and science behind the creation of the world’s first gene-edited babies. It delves into the CRISPR technology used, the controversy it sparked, and the ethical debates surrounding human genome editing.
4. "Genome Editing: The Next Step in Gene Therapy"
Edited by Toni Cathomen, Matthew Hirsch, and Matthew Porteus
  • A comprehensive text that covers the broader field of genome editing, including CRISPR-Cas systems, and their potential in gene therapy. It's an insightful read for those interested in the therapeutic applications of genome editing.
5. "CRISPR: Methods and Protocols"
Edited by Magnus Lundgren, Emmanuelle Charpentier, Peter C. Fineran
  • This book provides detailed methods and protocols for researchers working with CRISPR-Cas systems, offering practical advice on the design and implementation of genome editing experiments. It is suited for laboratory researchers and includes step-by-step procedures.
These books range from technical manuals to more narrative accounts of CRISPR technology's development and implications. Depending on your interest—be it the technicalities of gene editing, the story of CRISPR's discovery, or the ethical and societal impacts—you'll find valuable information in these references.
  • asked a question related to Bioinformatics Analysis
Question
1 answer
I am using QUAST in Kbase to assess the quality of my genome assemblies of bacterial isolates.
The report from QUAST provided parameters such as N50 and Mismatches. I have found their meaning in https://quast.sourceforge.net/docs/manual.html#sec3.1. And I have learned that an ideal genome is contiguous, complete, and correct.
Most studies suggest the lower the mismatches or other values are, the better the quality will be.
However, are there any absolute values/thresholds that could be used to test whether this assembly is good quality?
(Some studies showed that the threshold depends on the size of the genome and the goal of the study. Then is there any way to calculate this threshold?)
Thank you very much!
Relevant answer
Answer
No absolute thresholds. Genome assembly is still very manual work.
However, QUAST scores are very simplistic. You should invest time and explore other metrics if you care about good genome assembly.
You can start with the recommendations at https://www.biostars.org/p/47224/#335008 (definitely at least BUSCO, REAPR, KAT). A sort of alternative to KAT would be Merqury, which is also very popular https://github.com/marbl/merqury. I also quite like ideel https://github.com/mw55309/ideel as part of assembly QC. You can do a simple read mapping back to the assembly and look at the number of mapped reads, the percentage of properly paired reads, and so on. A good assembly is a lot of work and many back-and-forth iterations.
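For orientation, N50 (one of the QUAST metrics mentioned in the question) is simple enough to compute by hand, which also shows why it says nothing about correctness or completeness. The sketch below uses the usual definition: the contig length at which the running sum of lengths, sorted from longest to shortest, first reaches half of the total assembly size. The contig lengths are invented for the example.

```python
def n50(contig_lengths):
    """Length L such that contigs of length >= L cover at least half the assembly."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    return 0  # empty assembly


# Hypothetical contig lengths (e.g. in kb).
value = n50([80, 70, 50, 40, 30, 20])
print(value)
```

Since the value depends only on contig lengths, a misassembly that wrongly joins two contigs raises N50, which is one reason to combine it with the other metrics listed above.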
  • asked a question related to Bioinformatics Analysis
Question
4 answers
Hello, I've recently started exploring molecular docking applications, and I'm still in the early stages. I'd like to ask which proteins should be considered when examining the antimicrobial effects of certain molecules.
Is there a list of these proteins (that I should use as docking targets), or are there general rules for proteins that should definitely be examined?
Also, can I perform docking not with a molecule but directly with an organism? If so, what should I look for to predict antimicrobial effects?
Could you please guide me on this?
Thank you.
Relevant answer
Answer
It's important to consider specific proteins that play crucial roles in the survival and reproduction of microorganisms.
Enzymes involved in cell wall synthesis: Proteins like penicillin-binding proteins (PBPs) are crucial for bacterial cell wall formation.
DNA gyrase and topoisomerases: Involved in DNA replication and repair, these are essential targets for antimicrobial compounds.
Ribosomal proteins: Targeting bacterial ribosomes can disrupt protein synthesis.
Utilize databases like the Protein Data Bank (PDB) to find crystal structures of your selected proteins. Docking is performed against a protein structure rather than a whole organism, so molecular docking predictions should be validated through in vitro and in vivo experiments. For in vitro evaluation you can use the microorganisms directly.
  • asked a question related to Bioinformatics Analysis
Question
3 answers
Introduction:
I am conducting research on the bacterial composition of fecal samples from both healthy and diseased individuals using 16S sequencing. I am seeking expert guidance on the appropriate bioinformatic analysis methods for my dataset.
Objective:
My goal is to analyze the bacterial communities in fecal samples from a diseased cohort and a control group of healthy individuals, using 16S rRNA gene sequencing.
Sequencing Method:
I have employed a Nanopore sequencer to acquire full-length 16S sequences.
Alignment Method:
For the alignment process, I have used the kraken2 tool.
Database:
The standard database provided by kraken2 has been utilized for the alignment.
Output Files:
I have generated 12 sets of output files, ranging from kraken2-report01 to kraken2-report12 and kraken-output01.txt to kraken-output12.txt.
Downstream Analysis:
I am contemplating two approaches for downstream analysis:
  1. Converting the output data into biom format using kraken-biom and then analyzing it on the QIIME2 platform.
  2. Converting the output data into either OTU or ASV format for analysis using MicrobiomeAnalyst.
Questions:
  1. Is there a specific method for converting the kraken2 output into biom format? If so, could you provide the steps for this conversion?
  2. If the conversion-based approach is not advisable, what are the recommended methods for diversity analysis and identification of variable species post-kraken2 analysis?
Relevant answer
Answer
Follow these steps:
1. Install the necessary tools: Ensure that you have the required dependencies installed, including Kraken2, Krona, kraken-biom, and biom-format. You can refer to the respective documentation for installation instructions.
2. Generate a Krona report (optional, for visual inspection): Run the following command to generate a Krona report from your Kraken2 output:
```
ktImportTaxonomy -q 2 -t 3 kraken-output01.txt kraken-output02.txt ... kraken-output12.txt -o krona_report.html
```
This command will generate a Krona HTML report (`krona_report.html`) representing the taxonomic composition of your samples. In Kraken2's standard per-read output the read ID is in column 2 and the taxonomy ID in column 3, hence `-q 2 -t 3`.
3. Convert the Kraken2 reports to biom format: A Krona HTML report cannot be converted to biom, so run kraken-biom on the Kraken2 report files instead:
```
kraken-biom kraken2-report01 kraken2-report02 ... kraken2-report12 --fmt hdf5 -o biom_file.biom
```
This command will generate a single biom file (`biom_file.biom`) containing all samples.
4. Import biom file into QIIME2: You can then import the biom file into QIIME2 for downstream analysis. Use the appropriate QIIME2 command to import the biom file based on your analysis requirements.
It's important to note that biom is the native table format of QIIME1, whereas QIIME2 works with its own artifact (QZA) and visualization (QZV) files. If you're using QIIME2 for downstream analysis, the data must first be imported into a QIIME2 artifact (QZA); QZV files are visualization outputs, not an import target. You can do this using the QIIME2 command-line interface or Python API. Here's a general outline of the steps:
1. Install QIIME2: Follow the installation instructions provided by QIIME2 (https://docs.qiime2.org/2021.8/install/) to set up QIIME2 on your system.
2. Import the biom table into QIIME2: The raw Kraken2 per-read output cannot be imported directly with `qiime tools import`; import the biom table (`biom_file.biom`) as a feature table instead. Here's an example command:
```
qiime tools import \
--input-path biom_file.biom \
--type 'FeatureTable[Frequency]' \
--input-format BIOMV210Format \
--output-path feature-table.qza
```
Because the biom file already combines all samples, a single import is sufficient.
3. Perform diversity analysis: Once you have the QZA files, you can perform diversity analysis using various QIIME2 plugins. For example, you can use the `qiime diversity` plugin to calculate alpha and beta diversity metrics. Refer to the QIIME2 documentation and tutorials for more information on available plugins and analysis options.
Regarding the identification of variable species, Kraken2 provides taxonomic assignments for each sequence read. You can analyze the Kraken2 output directly to identify the taxonomic groups that show differential abundance between your diseased cohort and control group. This can be done using statistical analysis tools such as DESeq2, edgeR, or LEfSe. These tools can help you identify taxonomic groups that are significantly differentially abundant between the two groups.
Alternatively, if you decide to convert your data to OTU or ASV format for downstream analysis, you can use tools like QIIME2, mothur, or DADA2 to perform OTU or ASV clustering. These tools provide options for denoising, quality filtering, chimera removal, and clustering to generate OTU or ASV tables that can be further analyzed for diversity analysis, differential abundance testing, and other downstream analyses.
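If you only need a quick look at diversity before setting up QIIME2, the Kraken2 report files can also be read directly. The Python sketch below assumes the standard six-column, tab-separated Kraken2 report layout (percentage, clade reads, directly assigned reads, rank code, taxonomy ID, indented name) and computes a Shannon index from the species-level rows. The report lines, names, and taxonomy IDs are invented for the example, and the function names are illustrative.

```python
import math


def species_counts(report_lines):
    """Collect clade-level read counts for rows with rank code 'S' (species)."""
    counts = {}
    for line in report_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # skip malformed or empty lines
        clade_reads, rank, name = int(fields[1]), fields[3], fields[5].strip()
        if rank == "S" and clade_reads > 0:
            counts[name] = clade_reads
    return counts


def shannon(counts):
    """Shannon diversity index (natural log) of a name -> count mapping."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log(n / total) for n in counts.values() if n > 0)


# Toy report (hypothetical taxa and placeholder taxonomy IDs).
toy_report = [
    "100.00\t100\t0\tR\t1\troot",
    "100.00\t100\t0\tG\t0\t  Genus X",
    " 50.00\t50\t50\tS\t0\t    Species A",
    " 50.00\t50\t50\tS\t0\t    Species B",
]
counts = species_counts(toy_report)
h = shannon(counts)
print(counts, round(h, 4))
```

For real comparisons between the diseased and control groups, read depth differs between samples, so rarefaction or another normalization step is still needed before comparing indices.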
Hope it helps
  • asked a question related to Bioinformatics Analysis
Question
6 answers
Could someone explain to me why the p-value in the right column of the forest plot is different than the p-value in the test for effect in the subgroup?
I thought that these two p.values should be the same.
Relevant answer
Answer
Regarding your table: the p-value in the right column of the forest plot is the p-value for the overall test of the treatment effect across all subgroups. It is calculated by combining the results of the individual studies in the meta-analysis. In this case, the p-value is 0.56, which is not statistically significant.
The p-value for the test for effect in the subgroup is the p-value for the test of the null hypothesis that the treatment effect in the subgroup is equal to zero. It is calculated using only the data from the studies in the subgroup. In this case, the p-value for the test for effect in the subgroup is 0.094035, which is statistically significant.
The two p-values are different because of the heterogeneity between the studies in the meta-analysis. The heterogeneity statistic (0.5) is very high, which indicates that there is a lot of variability in the treatment effects across studies. This variability could be due to a number of factors, such as different study designs, different populations of patients, and different treatment regimens.
When there is heterogeneity in the treatment effects across studies, it is more difficult to detect a significant overall treatment effect. This is because the variability in the treatment effects across studies can mask the true effect of the treatment.
In this case, the p-value for the overall test of the treatment effect is not statistically significant, but the p-value for the test for effect in the subgroup is statistically significant. This suggests that the treatment may be effective in the subgroup, but it is not possible to draw a definitive conclusion without further research.
It is important to note that a statistically significant p-value for the test for effect in a subgroup does not necessarily mean that the treatment is clinically effective in that subgroup. It is possible that the difference in the treatment effect is small or that it is not clinically meaningful.
To determine whether the treatment is clinically effective in a subgroup, it is important to consider the magnitude of the difference in the treatment effect and the clinical implications of that difference.
  • asked a question related to Bioinformatics Analysis
Question
3 answers
I read some papers mentioning that they used the HMP reference genome for protein homology search and I've also read about the HUMAnN database elsewhere. I'm wondering what's the difference.
Relevant answer
Answer
The HMP (Human Microbiome Project) database is a resource that focuses on characterizing microbial genomes and metagenomes in human body sites. It provides reference genomes for different microorganisms, aiding in the identification and study of microbial species. For example, HMP offers reference genomes for specific gut bacteria like Bacteroides fragilis, aiding in taxonomic classification and species-level analysis.
On the other hand, the HUMAnN (HMP Unified Metabolic Analysis Network) database and software tool are geared toward functional profiling of microbial communities. It leverages the HMP reference genomes to identify and quantify metabolic pathways and gene families in metagenomic data. For instance, HUMAnN can determine the presence and abundance of pathways like glycolysis or the nitrogen cycle, shedding light on the metabolic activities within a microbial community.
  • asked a question related to Bioinformatics Analysis
Question
4 answers
Hello everyone; I am new to R programming. I want to calculate the Firmicutes to Bacteroidetes ratio from my OTU table. I couldn't find the command and don't know how to do it. Please guide me on this.
I put an example of my OTU table.
Relevant answer
Answer
Thank you for this...
  • asked a question related to Bioinformatics Analysis
Question
3 answers
I know many websites have simple tools like transcription and translation available, but are there any analysis tools that researchers need that either do not exist or are not publicly available? It could be anything from algorithms to visuals. Thanks!
Relevant answer
Answer
Abhijeet Singh Thank you for your response and mentioning my earlier post! My belief is that researchers would know tools that are missing based on the fact that they would run into such problem often during their research. If there is some manual analysis task that researchers can automate, I believe that PeptiCloud can be the perfect platform to develop and make those tools publicly available. (For instance, PeptiCloud has a unique feature that allows users to further alter codon sequence of each amino acid after codon optimization with respect to a specific bacterial strain). With that being said, if you could check out PeptiCloud for yourself and see if anything could be added or improved, that would be greatly appreciated!
  • asked a question related to Bioinformatics Analysis
Question
3 answers
Hi - I'm currently working with two RNA-Seq studies; one has RNA extracted from whole blood, the other PBMCs. Eventually we want to combine these data and perform some cell-specific deconvolution to look at DEGs.
Are there any recommended methods for batch correcting these data from different sources?
Mari
Relevant answer
Answer
It is better to consider batch as a factor in the design formula. The tximport pipeline proposed by Michael Love himself offers the most useful solution. Please have a look.
  • asked a question related to Bioinformatics Analysis
Question
3 answers
From the link https://gtexportal.org/home/datasets, under V7, I'm trying to do R/Python analyses on the Gene TPM and Transcript TPM files. But in these files (and to open them I had to use Universal Viewer since the files are too large to view with an app like NotePad), I'm seeing a bunch of ID's for samples (i.e. GTEX-1117F-0226-SM-5GZZ7), followed by transcript ID's like ENSG00000223972.4, and then a bunch of numbers like 0.02865 (and they take up like 99% of the large files). Can someone help me decipher what the numbers mean, please? And are the numbers supposed to be assigned to a specific sample ID? (The amount of letters far exceed the amount of samples, btw). I tried opening these files as tables in R but I do not think R is categorizing the contents of the file correctly.
Relevant answer
Answer
GTEX-1117F-0226-SM-5GZZ7 is a sample ID, and ENSG00000223972.4 is an Ensembl gene ID (the ".4" suffix is the version number), not a HUGO gene symbol; the corresponding gene symbol is given in the Description column of the file. The numbers you are referring to are gene expression values, and each value belongs to one gene (row) in one sample (column). TPM (Transcripts Per Million) is a normalization that scales these expression values for transcript length and sequencing depth so that the expression of genes can be compared between samples.
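To make the layout concrete, here is a minimal sketch, assuming the file follows the GCT text layout (a version line, a dimensions line, then a tab-separated header starting with Name and Description followed by one column per sample). The sample names SAMPLE_A and SAMPLE_B and all values are made up for illustration; they are not real GTEx data.

```python
import csv
import io

# Tiny made-up table in GCT layout (illustrative values only).
gct_text = (
    "#1.2\n"
    "2\t2\n"
    "Name\tDescription\tSAMPLE_A\tSAMPLE_B\n"
    "ENSG00000223972.4\tDDX11L1\t0.02865\t0.0\n"
    "ENSG00000227232.4\tWASH7P\t12.5\t8.75\n"
)

def read_gct(handle):
    """Parse a GCT table into (sample_ids, {gene_id: {sample_id: tpm}})."""
    next(handle)  # version line, e.g. "#1.2"
    next(handle)  # dimensions line: number of rows, number of samples
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[2:]  # first two columns are Name and Description
    table = {}
    for row in reader:
        if not row:
            continue
        table[row[0]] = dict(zip(samples, map(float, row[2:])))
    return samples, table

samples, table = read_gct(io.StringIO(gct_text))
# One TPM value per (gene, sample) pair.
print(table["ENSG00000223972.4"]["SAMPLE_A"])
```

For a real file, the same function can be given an open file handle instead of the in-memory string, which avoids loading the file into a viewer at all.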
  • asked a question related to Bioinformatics Analysis
Question
4 answers
I was using fragbuilder module in python to generate peptides of sizes 4, 6, and 10. However, the issue with fragbuilder module is that some of the bond angles are deviating from the standard values. For instance, C_alpha--C--N bond angle standard value is 121 degrees but fragbuilder assigns 111 degrees. This angle deviation causes a deviation in the distance between the nearest neighbor C_alpha---C_alpha and its value is 3.721 angstrom and the typical standard value is 3.8 A. Also another bond angle is a deviation from the standard value by 6 degrees which is the C_alpha---C---N whose value is 111.4 degrees and typical standard values are 117 degrees. My doubt is how much deviation is allowed for MD simulations of peptides (or proteins) while fixing the bond lengths and bonds angles ?
Relevant answer
Answer
Gary James Hunter Thanks for your reply.
  • asked a question related to Bioinformatics Analysis
Question
3 answers
I want to purchase a MacBook mainly for bioinformatics analysis purposes, i.e., transcriptomics, small RNA, methylation, lncRNA and others. Would anyone please suggest the best affordable one?
Relevant answer
Answer
I think a small server is a better choice for bioinformatics data analysis, as it is cheaper and more convenient. Many analyses can take a long time, and MacBooks do not have good heat dissipation.
  • asked a question related to Bioinformatics Analysis
Question
4 answers
I have two sequences: the predicted mRNA sequence (exons only, without introns) and the gDNA sequence (with introns). I align the sequences to confirm the position of the exons in the DNA sequence. After that, I pick primers from the exon regions and check their specificity with Primer-BLAST. However, I have also designed primers from the predicted mRNA alone, without considering the exon regions in the DNA sequence. Which approach is more appropriate for amplifying full-length genes from a DNA template?
Relevant answer
Answer
No, introns are not considered in designing primers for full-length gene regions in the DNA template. Primers are designed to amplify only the exonic regions of the gene.
  • asked a question related to Bioinformatics Analysis
Question
2 answers
When we are at the step of aligning virulence factors against the human proteome to exclude proteins with > 35% homology, which output do we have to use for the next step of predicting transmembrane helices and molecular weight for the chosen proteins?
Relevant answer
Answer
Hi Mr David, thank you so much for your answer, I really appreciate your help. My question is: do we use the non-homologous protein sequence as aligned against Homo sapiens, or do we use the original submitted sequence (e.g. the virulence factors) for further steps? And if we use the latter (the non-homologous protein sequence from my species of interest), where can we obtain the output FASTA sequence on the BLASTp platform?
  • asked a question related to Bioinformatics Analysis
Question
1 answer
Hi everyone,
please bear with me, because I am a complete beginner with regard to any form of bioinformatics and I am trying to understand the best approach to my experiment.
I am currently trying to isolate cells and sequence them for further bioinformatic analysis, more precisely RNA-Sequencing.
We have, however, had issues with purity and while some samples we looked at reached a purity of >90% after isolation (we usually validate it by use of flow cytometry), some samples of different animal genotypes did not.
This leads me to my first question:
How important is cell purity for Bulk RNA-Seq?
Which purity should be reached for an adequate, reliable analysis?
If anyone has any recommendations for papers to look into regarding that subject, I would be most grateful, because I have no idea where to start and what to consider.
Further along in the story we surmised that maybe Single Cell RNA Sequencing might be the better option in cases of lower purity.
But again, the same question arose: how relevant is cell purity for the following analysis and is there a cut-off value not to be crossed?
Finally:
How advantageous would using both methods be?
Sure, Bulk gives a better general overview and Single Cell is more precise, but do they complement each other or is it essentially redundant information gained by doing both experiments?
And are there any disadvantages to using only SC, or do both methods complement each other when low purity levels are in question?
Thank you a lot in advance!!
Relevant answer
Answer
Welcome to RNA-seq! It's a crazy and wild world. You will find that responses will depend a great deal on what you're aiming to achieve. So take my responses with this in mind...it depends!
Get as close to 100% as possible, otherwise you'll have to perform a set of validation experiments to ensure that any interesting findings are due to changes in your cell type of interest and not in a "contaminating" cell type. Single cell RNA-seq may be suitable here since you'll be able to get some cell type resolution and identify the populations in which the change is occurring; of course you'll still need to validate. The strength of scRNA-seq is that you don't need to purify/enrich your population since these get resolved as part of the procedure/analysis. However, a drawback with scRNA-seq is that you will lose a lot of low-abundance transcripts; "dropout" is also a major issue. So, if you're comfortable losing some info on potentially valuable transcripts, then scRNA-seq may be the way to go. They do potentially complement each other, especially because with bulk you may get data about lowly expressed transcripts. But a big caveat: it all depends! You may consider identifying and collaborating with someone with expertise in RNA-seq (sample prep and data analysis) at your local institution.
Which papers? It depends. Start with papers that are answering a similar question to yours, then dig into what would be best for your study. You can consider reaching out to representatives of companies like 10X Genomics and Miltenyi Biotec...that's also a good starting point. Good luck!
  • asked a question related to Bioinformatics Analysis
Question
9 answers
I have data from our experimental model, where we analyze the immune response following BCG vaccination, and then the responses and clinical outcome following Mtb infection of our vaccinated models. Because we cannot experimentally follow the very same animals from the post-vaccination response through to the post-vaccination plus post-infection studies, these data come from different batches. Is it possible to do a correlation here between the post-vaccination responses of 5 replicates in one batch (for different vaccine candidates) and 4-5 replicates from the vaccination & infection experiments in another batch? I ask this because we are not following up the same replicates for post-vaccination and post-infection measurements (as it is not experimentally feasible). If correlation is not the best method, are there other ways to analyze the patterns, such as the strength of association between the T cell response in BCG-vaccinated models and the increased survival of BCG-vaccinated models (both measurements being from different batches)? We have several groups like that, with a variety of parameters measured per group in different sets of experiments.
Thanks for your responses and help.
Relevant answer
Answer
To make it a bit simpler:
say you have treatments A and B, and your experiment is done in two batches 1 and 2.
If treatment A is analyzed in batch 1 and B in batch 2, then treatment and batch are perfectly confounded and you have no chance to disentangle batch effects from treatment effects.
If samples with treatment A are measured in batch 1 and 2, and also samples with treatment B are measured in both batches, then one can model the batch effect and reveal the treatment effect.
If you have treatments A+C in batch 1 and treatments B+C in batch 2, you might estimate the batch effect from treatment C and apply it to correct A and B as well (dangerous, if the batch effect also depends on the treatment, but better than nothing).
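The three designs above can be checked mechanically before any modelling. The following is a minimal sketch under simple assumptions: each sample is described only by a (treatment, batch) pair, and the function names are hypothetical, chosen for illustration.

```python
from collections import defaultdict

def is_perfectly_confounded(samples):
    """samples: iterable of (treatment, batch) pairs.
    True if every batch contains exactly one treatment, so that a batch
    effect cannot be told apart from a treatment effect."""
    treatments_per_batch = defaultdict(set)
    for treatment, batch in samples:
        treatments_per_batch[batch].add(treatment)
    return all(len(t) == 1 for t in treatments_per_batch.values())

def bridging_treatments(samples):
    """Treatments measured in more than one batch; these are the ones
    from which a batch effect could be estimated."""
    batches_per_treatment = defaultdict(set)
    for treatment, batch in samples:
        batches_per_treatment[treatment].add(batch)
    return sorted(t for t, b in batches_per_treatment.items() if len(b) > 1)

confounded = [("A", 1), ("A", 1), ("B", 2), ("B", 2)]
crossed = [("A", 1), ("A", 2), ("B", 1), ("B", 2)]
bridged = [("A", 1), ("C", 1), ("B", 2), ("C", 2)]

print(is_perfectly_confounded(confounded), bridging_treatments(confounded))
print(is_perfectly_confounded(crossed), bridging_treatments(crossed))
print(is_perfectly_confounded(bridged), bridging_treatments(bridged))
```

In the third design only treatment C bridges the batches, which is why the correction for A and B rests on the assumption that the batch effect does not depend on the treatment.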
  • asked a question related to Bioinformatics Analysis
Question
1 answer
I created this R package to allow easy VCF files visual analysis, investigate mutation rates per chromosome, gene, and much more: https://github.com/cccnrc/plot-VCF
The package is divided into 3 main sections, based on analysis target:
  1. variant Manhattan-style plots: visualize all/specific variants in your VCF file. You can plot subgroups based on position, sample, gene and/or exon
  2. chromosome summary plots: visualize plot of variants distribution across (selectable) chromosomes in your VCF file
  3. gene summary plots: visualize plot of variants distribution across (selectable) genes in your VCF file
Take a look at how many different things you can achieve in just one line of code!
It is extremely easy to install and use, well documented on the GitHub page: https://github.com/cccnrc/plot-VCF
I'd love to have your opinion, bugs you might find etc.
Relevant answer
Answer
I use the TASSEL software for genome analysis. You need the PLINK-format map and ped files to operate it. You can try and explore this software.
  • asked a question related to Bioinformatics Analysis
Question
2 answers
Dear all,
I am analyzing 16S rRNA amplicon sequencing data. I have tested the effectiveness of two classifiers on the mock community, and the BLAST classifier shows the best result. However, I found out that BLAST uses local sequence alignment, so I do not know if it is appropriate to use this classifier to assign a "mystery" sequence to a bacterial taxon. Is it possible that this approach will produce false positive results? Is it better to use the VSEARCH classifier, which showed worse results but uses global sequence alignment?
And a bonus question. Should I use rarefied representative sequences to perform a taxonomy classification or not? I use rarefied data for alpha diversity testing (and for beta diversity testing I do not).
Thank you all for answers!
Martin
Relevant answer
Answer
  1. It is not rRNA amplicons but rRNA gene amplicons
  2. You are having amplicons which are probably 300-400 bp long, why do you think global alignment is better in this case?
  3. For rarefaction, read the following and decide for yourself.
  • asked a question related to Bioinformatics Analysis
Question
1 answer
Hello all,
I am having an issue with 16S PICRUST data. There is always a warning message post PICRUST run that more than half of the sequences have been removed from further analysis. The reason might be that the ASV fasta files contains mix DNA sequences i.e. both positive and negative strands. PICRUST can only deal with positive sequences hence the output is based on approximately 50% of the sequences of FASTA file. I am really looking for some suggestion (computer programming) on identifying negative sequences from FASTA files based on NCBI BLASTn portal and reverse complementing it. Because this work would be difficult to be performed manually considering 6000 sequences of FASTA files. I have limited knowledge in coding. Any help would be greatly appreciated.
I am running this PICRUST pipeline as mentioned here https://github.com/picrust/picrust2/wiki/Full-pipeline-script. The ASV file has been generated by using raw FASTq files on QIIME2.
Relevant answer
Answer
Hi,
To reverse complement the fasta file, use the seqtk tool.
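With seqtk, `seqtk seq -r` reverse-complements every record in a file. If only some ASVs are on the minus strand, a short script can flip just those. The sketch below assumes you already have a list of the IDs that need flipping (for example from the strand reported in your BLASTn hits); the name `minus_ids` and the example records are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTacgtNn", "TGCAtgcaNn")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

def flip_selected(fasta_text, minus_ids):
    """Reverse-complement only the records whose ID is in minus_ids."""
    out = []
    for header, seq in parse_fasta(fasta_text):
        parts = header.split()
        seq_id = parts[0] if parts else ""
        if seq_id in minus_ids:
            seq = reverse_complement(seq)
        out.append(f">{header}\n{seq}")
    return "\n".join(out) + "\n"

example = ">asv1\nAAGG\n>asv2\nAAGG\n"
print(flip_selected(example, {"asv2"}))
```

Sequences are written back on a single line each, which FASTA readers generally accept.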
  • asked a question related to Bioinformatics Analysis
Question
2 answers
Hi,
Could anyone suggest free software or websites used to generate unique DNA barcode sequences [5-10nt] to label XXX genes for library screening?
Thank you in advance
Relevant answer
Answer
Hi,
You can generate barcodes for your sequences at this URL: http://biorad-ads.com/DNABarcodeWeb/. It may be simple and helpful for you.
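If you prefer to generate the barcodes yourself, here is a minimal sketch of one common idea: pick sequences greedily so that every pair differs at a minimum number of positions (Hamming distance). The function name, the default length of 8 nt, and the minimum distance of 3 are assumptions for illustration; the sketch does not filter on GC content or homopolymer runs, which you may also want.

```python
import itertools
import random

def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

def generate_barcodes(n_barcodes, length=8, min_dist=3, seed=0):
    """Greedily pick DNA barcodes that pairwise differ at >= min_dist positions."""
    rng = random.Random(seed)  # fixed seed keeps the result reproducible
    candidates = ["".join(p) for p in itertools.product("ACGT", repeat=length)]
    rng.shuffle(candidates)
    chosen = []
    for cand in candidates:
        if all(hamming(cand, c) >= min_dist for c in chosen):
            chosen.append(cand)
            if len(chosen) == n_barcodes:
                return chosen
    raise ValueError("not enough barcodes satisfy the distance constraint")

barcodes = generate_barcodes(20, length=6, min_dist=3)
print(barcodes[:5])
```

A larger minimum distance tolerates more sequencing errors per barcode but reduces how many barcodes fit into a given length.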
  • asked a question related to Bioinformatics Analysis
Question
2 answers
I designed two sets of primers to target a gene of interest in gene expression studies (cDNA). The first primer pair (T1) gave an expected amplicon of 200 bp in in silico PCR, and targeting my gene of interest by PCR was successful (a band on the gel and successful sequencing). For the second primer set (T2), in silico PCR gave an expected amplicon size of 1200 bp, which is of interest to me (I'm carrying out bioinformatics analysis of the protein sequence), but targeting the gene in the cDNA by PCR wasn't successful. I have tried troubleshooting by varying different parameters, but with no success. Could the long amplicon length be a hindrance in PCR, and how can I overcome this problem?
Relevant answer
Answer
Dear Joseph Japhet, amplifying a 1200 bp fragment is not a difficult task. However, your explanation lacks some information, so it is difficult to troubleshoot your experiment. You need to check what kind of Taq polymerase you used for the PCR. (I have used TaKaRa Xtaq to clone many genes whose length varied from 800 bp to 2500 bp.) You also need to give the PCR cycle information. Taq polymerase requires some time to read through 1200 bp and replicate the gene of interest. Therefore the extension time of the PCR cycle should be longer than 90 seconds (500 bp per 30 seconds as a rule). You may have run the same PCR program for both amplicons. Please check the PCR parameters...
  • asked a question related to Bioinformatics Analysis
Question
6 answers
hello
Please introduce me the companies that provide biotechnology services such as designing different types of primers, NGS, RNASeq, etc.
Relevant answer
Answer
The following companies have offices in Tehran:
Sanofi - Roche - Novo Nordisk - Novartis - Bayer - Johnson & Johnson - Merck KGaA - Zoetis - TCI - BioHorizons Implant Systems. In the US, many are located in Philadelphia, Pennsylvania; Boston, Massachusetts; and San Francisco, California. But Iran has made working with US companies nearly impossible. For ease of working outside Iran, I'd look to China. In general, use caution when sharing new ideas with companies you have not worked with in the past.
  • asked a question related to Bioinformatics Analysis
Question
6 answers
-RNA seq and bioinformatics were carried out by professionals.
- Gene in question shows ~700 fold differential regulation by qPCR in multiple independent cohort of experiments - not in RNA seq.
Please advise....
Relevant answer
Answer
The large fold-change indicates that the gene is likely not expressed at a high level. Under control conditions the expression can be almost zero, and a slightly larger expression under treatment conditions will result in a very large fold-change.
Low-expressed genes give only low or no counts in RNA seq. It might be that genes with no or very few counts are filtered out from the analysis, because the counts are not reliable. If the gene is not at all detected under control conditions (0 counts in all control samples), it is not possible to calculate any (finite) fold-change at all.
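A small numeric sketch of this point, using made-up counts: with zero counts in all controls the fold-change is not finite, and adding a pseudocount (a common workaround, assumed here rather than taken from any particular pipeline) makes it finite but strongly dependent on the pseudocount chosen.

```python
import math

def log2_fold_change(treated, control, pseudocount=0.0):
    """log2 ratio of mean expression; the pseudocount is added to both means."""
    mean_t = sum(treated) / len(treated) + pseudocount
    mean_c = sum(control) / len(control) + pseudocount
    if mean_c == 0:
        # Division by zero: no finite fold-change can be computed.
        return math.inf if mean_t > 0 else math.nan
    if mean_t == 0:
        return -math.inf
    return math.log2(mean_t / mean_c)

control_counts = [0, 0, 0]   # gene not detected in any control sample
treated_counts = [3, 5, 4]   # only a handful of reads after treatment

print(log2_fold_change(treated_counts, control_counts))        # not finite
print(log2_fold_change(treated_counts, control_counts, 1.0))   # finite, depends on pseudocount
print(log2_fold_change(treated_counts, control_counts, 0.01))  # much larger with a smaller pseudocount
```

This is one reason a very large qPCR fold-change for a lowly expressed gene may have no counterpart in a filtered RNA-seq result table.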
  • asked a question related to Bioinformatics Analysis
Question
5 answers
Like reactome or string
Relevant answer
Answer
Enrichr, WebGestalt, and WebGIVI are very good.
  • asked a question related to Bioinformatics Analysis
Question
4 answers
thanks.
Relevant answer
Answer
Sure Azka Saleem
Steps you can consider following:
1. Download the SGD data as a FASTA file; if it is split across many files, you can merge them all into a single FASTA file.
2. Make a list in MS Excel of the IDs whose sequences you need to extract.
3. Install TBtools on your desktop and search for the FASTA extract option, where you need to provide the database FASTA file and the IDs and click OK.
4. The output file will contain the sequence data for your required IDs.
If you still find it difficult to perform, you can mail me at nkrdas2@gmail.com
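The same extraction can also be scripted. This is a minimal sketch, assuming the ID is the first whitespace-delimited word of each FASTA header; the function name and the example records are made up for illustration.

```python
def extract_by_id(fasta_text, wanted_ids):
    """Return the FASTA records whose ID (first word of the header) is wanted."""
    wanted = set(wanted_ids)
    out, keep = [], False
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            parts = line[1:].split()
            # Decide once per record whether its lines are kept.
            keep = bool(parts) and parts[0] in wanted
        if keep and line.strip():
            out.append(line)
    return "\n".join(out) + ("\n" if out else "")

database = ">YAL001C example description\nATGC\nGGTT\n>YAL002W\nCCCC\n"
print(extract_by_id(database, ["YAL002W"]))
```

For a real database, read the merged FASTA file into `fasta_text` and the ID list from a plain text file with one ID per line.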
  • asked a question related to Bioinformatics Analysis
Question
4 answers
I have some lists of gene IDs from multi species, I want to have their compiled FASTA format files for each species. it looks tedious to copy each accession and collect FASTA seqs.
Batch Entrez is giving me error, may be because the identifier is related to other database.
Relevant answer
Answer
In my case, for the first gene list, which has TAIR identifiers, I got the FASTA sequence file from TAIR < download < bulk data retrieval < sequences.
  • asked a question related to Bioinformatics Analysis
Question
5 answers
I'm in the initial stages of planning a miRNA seq experiment using human cultured cells and decided on TRIzol extraction, Truseq small RNA prep kit, using an illumina HiSeq2500. The illumina webinar suggests 10-20 Million reads for discovery, the QandA support page suggests 2-5M, and I wrote the tech support to ask, who suggested I do up to 100M reads for rare transcripts. Exiqon guide to miRNA discovery manual says there is not really any benefit on going over 5M reads. I was hoping to save money by pooling more samples in a lane, so I was hoping someone with experience might be able to suggest a suitable number of reads.
Relevant answer
Answer
I am working on blood samples from cardiomyopathy patients and want to do miRNA sequencing. Can someone please suggest how many million reads I need to sequence, 20 million or 30 million, and also suggest a platform?
  • asked a question related to Bioinformatics Analysis
Question
5 answers
Hi, I was hoping someone could recommend papers that discuss the impact of using averaged data in random forest analyses or in making regression models with large data sets for ecology.
For example, if I had 4,000 samples each from 40 sites and did a random forest analysis (looking at predictors of SOC, for example) using environmental metadata, how would that compare with doing a random forest of the averaged sample values from the 40 sites (so 40 rows of averaged data vs. 4,000 raw data points)?
I ask this because a lot of the 4,000 samples have missing sample-specific environmental data in the first place, but there are other samples within the same site that do have that data available.
I'm just a little confused on 1.) the appropriateness of interpolating average values based on missingness (best practices/warnings), 2.) the drawbacks of using smaller, averaged sample sizes to deal with missingness vs. using incomplete data sets vs. using significantly smaller sample sizes from only "complete" data, and 3.) the geospatial rules for linking environmental data with samples? (if 50% of plots in a site have soil texture data, and 50% of plots don't, yet they're all within the same site/area, what would be the best route for analysis?) (it could depend on variable, but I have ~50 soil chemical/physical variables?)
Thank you for any advice or paper or tutorial recommendations.
Relevant answer
Answer
Thank you!
  • asked a question related to Bioinformatics Analysis
Question
4 answers
I am computing van der Waals interactions in Python for a 10-residue peptide in various conformations. The total number of conformations (i.e. PDB files) is 300,000. Is it possible, using some Python module, to compute only the 1-4 atom distances, given that the bonded and 1-3 atom distances are irrelevant for van der Waals interactions?
Relevant answer
Answer
  • asked a question related to Bioinformatics Analysis
Question
2 answers
Hi, is there a way I can extract the structures of alternatively spliced protein isoforms from the PDB? Also, can we map the structures to UniProt sequences so that we know which structure belongs to which isoform sequence?
Relevant answer
Answer
Unfortunately, most of the databases contain 3D structures only for canonical isoforms. But you could try Google Colab, a Python-based online notebook running AlphaFold2, which can predict the structure of any custom sequence or noncanonical isoform. Check this out: https://youtu.be/le7NatFo8vI
  • asked a question related to Bioinformatics Analysis
Question
1 answer
After finishing the simulation of the cyclic peptide, I tried to find the most populated structure using the density peak clustering algorithm. In the literature, the representative structure is chosen as the structure with the maximal $\rho_{\mathrm{sum}}$, the sum of the local densities of all residues in one structure: $\rho_{\mathrm{sum}} = \sum_{i=1}^{n_{\mathrm{res}}} \rho_i$. How can I extract the structure that has the highest summed density over all residues?
ref: Clustering by Fast Search and Find of Density Peaks. Science 2014, 344, 1492–1496
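The selection rule described above can be sketched in a few lines, assuming the per-residue local densities have already been computed for every structure by the clustering step. The dictionary layout, the frame names, and the numbers are hypothetical.

```python
def most_populated_structure(densities):
    """densities: {structure_id: [rho_1, ..., rho_nres]}.
    Returns the ID with the largest summed density and all the sums."""
    rho_sum = {sid: sum(rhos) for sid, rhos in densities.items()}
    best = max(rho_sum, key=rho_sum.get)
    return best, rho_sum

# Made-up local densities for three structures with three residues each.
example_densities = {
    "frame_1": [1, 2, 3],
    "frame_2": [4, 1, 3],
    "frame_3": [2, 2, 2],
}
best_id, sums = most_populated_structure(example_densities)
print(best_id, sums[best_id])
```

The returned ID can then be used to pull the corresponding frame out of the trajectory with whatever tool produced the structures.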
Relevant answer
Answer
Dear Sam Mohel ,
Cluster analysis is an exploratory analysis that tries to identify structures within the data. Cluster analysis is also called segmentation analysis or taxonomy analysis. More specifically, it tries to identify homogeneous groups of cases if the grouping is not previously known. Because it is exploratory, it does not make any distinction between dependent and independent variables. The different cluster analysis methods that SPSS offers can handle binary, nominal, ordinal, and scale (interval or ratio) data.
Regards,
Shafagat
  • asked a question related to Bioinformatics Analysis
Question
2 answers
Hello! I'm new to bioinformatics and cancer databases. I was exploring cbioportal and analyzing coexpression of different genes through scatter plots. I noticed that the axis are labeled as " RSEM (Batch normalized from Illumina HiSeq_RNASeqV2)" (I attached an example so you can see). I know that RSEM is a transcript quantification software but what does "Batch normalized" mean? does it give upper quartile normalization? FPKM? or what?.
thanks in advance!
Relevant answer
Answer
It's upper-quartile normalisation. See https://www.biostars.org/p/106127/.
Here is a paper comparing normalisation methods I personally find informative.
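As a rough illustration of what upper-quartile normalisation does to one sample: each value is divided by the sample's 75th percentile and rescaled to a fixed target. In this sketch the quartile is taken over non-zero values and the target is 1000; both choices are assumptions for illustration, so check the portal documentation for the exact procedure used.

```python
import statistics

def upper_quartile_normalize(values, target=1000.0):
    """Scale one sample so that its 75th percentile (non-zero values) equals target."""
    nonzero = [v for v in values if v > 0]
    # quantiles(..., n=4) returns the three quartile cut points; index 2 is Q3.
    q75 = statistics.quantiles(nonzero, n=4, method="inclusive")[2]
    return [v / q75 * target for v in values]

sample = [0, 10, 20, 30, 40, 50]  # made-up estimated counts for six genes
normalized = upper_quartile_normalize(sample)
print(normalized)
```

Because every sample is rescaled to the same upper quartile, values become comparable across samples with different sequencing depth, but they are not length-normalised like FPKM or TPM.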
  • asked a question related to Bioinformatics Analysis
Question
6 answers
Suggestions of online databases/tools I can use to verify candidate genes
Relevant answer
Answer
Blaise Manga Enuh, I want to verify a list of genes that I found to be related to a disease I am researching.
  • asked a question related to Bioinformatics Analysis
Question
16 answers
Apple's M1 Mac has been on the market since 2020, but its compatibility with bioinformatics analysis tools is scarcely discussed. For example, is it possible to index a human genome on an M1 MacBook Air (16 GB)? If yes, how much time does it take? Is it possible to map reads to the reference genome? If yes, how much time does it take? Any heads-up about the Conda experience?
Please share your thoughts and experiences... it can be of a great help...
Relevant answer
Answer
Yes, it is definitely possible. Bowtie has a very low memory requirement relative to other aligners. 16GB of RAM should be plenty. I've done it on a much older Mac with only 4GB and it completed successfully (though I had to run it for ~48 hours straight, so I highly recommend avoiding this particular setup). How much time it takes depends a lot on your parameters, background CPU usage, etc.... you should perform some benchmarking to figure that out.
As a more general answer to your question, 16GB of RAM will place you solidly in the "maybe" camp when answering the question "Does it have enough power to do <insert computational tool here>". It is unlikely to be enough to use STAR, more than enough to use Bowtie, way too little to use memory heavy software like Cell Ranger. In general, check the manual of any tool you'd like to use and see what the recommended and required parameters are, and that should tell you whether your set up will work or not.
  • asked a question related to Bioinformatics Analysis
Question
6 answers
Hi,
GO and KEGG functional analysis for a gene set was performed using the DAVID database (https://david.ncifcrf.gov/). However, the adjusted p-values (Bonferroni and Benjamini) of the enriched GO terms and KEGG pathways were more than 0.5. Meanwhile, a PPI network was constructed using the STRING database (https://string-db.org), with a confidence score of 0.4 set as the cutoff criterion and no more than ten as the maximum number of interactions in the first shell. This step added a few more genes to the gene list, and genes with no interactions were removed. When the updated gene list was used for GO and KEGG functional analysis, the enriched GO terms and KEGG pathways were now significant (p-value < 0.05). Is the attempted workflow valid?
Relevant answer
Answer
Thank You, Dr Giovanni Colonna, for taking your time in answering the question. I concur with your explanation.
  • asked a question related to Bioinformatics Analysis
Question
5 answers
Dear all, I am trying to use CD-hit to remove the duplicates from the file that is the output from trinity (RNA seq assembly).
I used the following parameters:
cd-hit-est -i in.fasta -o out_cdhit90.fasta -c 0.90 -n 9 -d 0 -M 0 -T 0
But the output file still contains lots of small or fragmented sequences in addition to the best one. How can I remove those small or fragmented duplicates by changing the parameters?
thanks
ZQ
Relevant answer
Answer
Hello, do you know of any tool other than CD-HIT to filter CDS unigenes?
  • asked a question related to Bioinformatics Analysis
Question
2 answers
Hi, I want to predict post-translational modifications, specifically phosphorylation. I found lots of websites like Phosida and PhosphoSitePlus. I am just curious whether there is any Python code for phosphorylation prediction. If you have some, could you share the GitHub link?
Relevant answer
Answer
Shaban Ahmad thank you
  • asked a question related to Bioinformatics Analysis
Question
1 answer
Dear All,
I did a qPCR analysis of one microRNA, and it was upregulated in tumour tissues compared to normal ones. Then I applied a bioinformatic analysis to detect the target genes, and the genes that came up as the most important targets of the microRNA were oncogenes (based on other studies).
I didn't do any further study on the target genes, and I need to keep to the bioinformatic analysis only. How can I discuss the results? Is there any way I can discuss these results knowing that it will be only an in silico study?
Many thanks
Relevant answer
Answer
You have analyzed the expression of a miRNA in normal and tumor tissue and found that it was upregulated. Then you used a target prediction tool and identified the miRNA's targets. According to the literature, the targets are oncogenes. With this much analysis, you want to write up the results.
1. Based on the other contents of your manuscript, you yourself can decide whether this finding will be enough or not.
2. How you did the qPCR analysis will be important. What is your normal (i.e. control) sample? How did you isolate the miRNA? Did you use miRNA-specific cDNA synthesis for the qPCR? What was your endogenous control? If these parts are fine, then let's look at the other parts.
3. Put very simply, if the miRNA is upregulated and the target gene is an oncogene, then the miRNA can target the oncogene and we can say the miRNA has an anti-cancer role. BUT, even if you analyzed this with a bioinformatic tool (I'm sure that it will be a target prediction tool like TargetScan or miRDB, etc.), you have to show that the target is actually being targeted via basic experiments such as PCR-based gene expression of the target in normal and tumor tissue, a western blot, or, slightly more advanced, a 3'UTR reporter assay. Otherwise, the targets available in databases/bioinformatic tools are just predictions. Without experimental validation, they are difficult to present.
4. Regarding your question heading, "Can upregulated microRNAs upregulate the target genes?": the answer is yes, it is possible. In our paper, we found that several genes were upregulated upon overexpression of a miRNA. This possibly happens because the miRNA targets certain genes, which are inhibited, and the suppression of those genes triggers the upregulation of other genes. Genes function in networks. You can check our paper.
A few more things could be discussed. I will wait for your reply.
Regards
Saurabh
  • asked a question related to Bioinformatics Analysis
Question
6 answers
Hi there,
I want to analyze differential gene expression in mice before and after treatment.
I have 6 mice and "paired-end" sequences, so how I could merge all my "before treatment" data to compare them with all "after treatment" data via DESeq2 ?!
Should I map/count/DESeq2 them separately?
Is there any way to combine (normalize) all replicates at first and then perform analysis like what we do generally in statistics?!
Thank you in advance.
  • asked a question related to Bioinformatics Analysis
Question
5 answers
I have two vcf files corresponding to the results of healthy tissue and tumor tissue. I want to compare these vcf files and remove their similarities. More specific I want to remove the information of the healthy tissue from the tumor one. Have you any suggestions on which tool I should use or any way that I can do my analysis?
Thanks in advance.
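One simple way to picture the comparison is as a set difference on the variant key (chromosome, position, reference allele, alternate allele). The sketch below assumes plain-text VCF content with tab-separated columns and made-up records; dedicated tools such as bcftools isec or a paired somatic variant caller are normally the more robust choice, since they also handle normalisation of indels and genotype information.

```python
def variant_keys(vcf_text):
    """Collect (chrom, pos, ref, alt) keys from the data lines of a VCF."""
    keys = set()
    for line in vcf_text.splitlines():
        if not line or line.startswith("#"):
            continue
        chrom, pos, _id, ref, alt = line.split("\t")[:5]
        for allele in alt.split(","):
            keys.add((chrom, int(pos), ref, allele))
    return keys

def tumor_only_records(tumor_vcf, normal_vcf):
    """Keep header lines and tumor records with an allele absent from the normal."""
    normal = variant_keys(normal_vcf)
    kept = []
    for line in tumor_vcf.splitlines():
        if not line:
            continue
        if line.startswith("#"):
            kept.append(line)
            continue
        chrom, pos, _id, ref, alt = line.split("\t")[:5]
        if any((chrom, int(pos), ref, a) not in normal for a in alt.split(",")):
            kept.append(line)
    return "\n".join(kept) + "\n"

header = "##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\n"
normal_vcf = header + "chr1\t100\t.\tA\tG\n"
tumor_vcf = header + "chr1\t100\t.\tA\tG\nchr1\t200\t.\tC\tT\n"
print(tumor_only_records(tumor_vcf, normal_vcf))
```

Note that a variant missing from the healthy-tissue file may simply have had too little coverage there, so the remaining records are candidates rather than confirmed somatic variants.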
  • asked a question related to Bioinformatics Analysis
Question
13 answers
EDIT: Please see below for the edited version of this question first (02.04.22)
Hi,
I am searching for a reliable normalization method. I have two ChIP-seq datasets to be compared with a t-test, but the RPKM values are biased, so I need to fix this before the t-test. For instance, when a value is high, it doesn't mean it is high in reality; another factor may be the reason this value appears high, and in reality I should see a value closer to the mean. Likewise, if a value is low and the factor is strong, we can say that the factor is the reason we see the low value, and we should have seen a value much closer to the mean. In brief, what I want is to eliminate the effect of this factor.
For this purpose, I have another dataset showing how strong this factor is for each value in the ChIP-seq datasets (again as RPKM values). Should I simply divide my RPKM values by the corresponding factor RPKM to get unbiased data? Or is it better to divide my RPKM values by the ratio RPKM / mean(RPKMs) of the factor?
Do you have any other suggestions? How should I eliminate the factor?
Relevant answer
Answer
Actually, the log transformation in the figure I attached was done according to the formula: log((#1+1)/(#2+1)). Just later, I thought that I added "1" to my values to be able to carry out log transformation (not to eliminate zero values). So I considered that maybe, it would be more correct to add "1" to adjusted values just before the transformation.
Thanks again :) Jochen Wilhelm
  • asked a question related to Bioinformatics Analysis
Question
3 answers
What exactly is the role of HSP-90 in extracellular environment of the cell? I am wondering whether hsp90 is involved in the translocation of the client protein from outside to inside of the cell. If somebody is having some references please share with me. I am very curious about this molecule.
Relevant answer
Answer
My article about that just got accepted, so you can see it soon. It includes tests before, immediately after, and 2 hours after exercise.
  • asked a question related to Bioinformatics Analysis
Question
4 answers
I have two different ChIP-seq datasets for different proteins, and I have aligned them to some fragments of the DNA. Some of these fragments get a zero read count in one or both datasets. To be able to say that these fragments have much more of protein X than of protein Y, I use Student's t-test.
I wonder if it would be better to remove the zero values from both datasets, which show RPKM values for each fragment. Moreover, the zeros pose a problem when I want to use a log transformation during data visualization.
What would you suggest?
Relevant answer
Answer
Thank you so much for both your answer and your suggestion, David Eugene Booth
  • asked a question related to Bioinformatics Analysis
Question
11 answers
I do not have much experience in bioinformatics, and I need to find the common genes in several gene expression datasets; in other words, I need to find the genes that match in all (or some) of my datasets. I am looking for a tool that gives me Venn diagrams of the shared genes. Any suggestion (free software, please) will be much appreciated.
Relevant answer
Answer
For a list of Venn diagram tools, their features, and references, you may check out the link below.
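Before drawing a Venn diagram, the shared genes themselves can be found with plain set logic. The sketch below is a minimal illustration; the dataset names and gene symbols are made up, and it assumes every dataset uses the same gene identifiers.

```python
from collections import Counter


def shared_genes(datasets, min_datasets=None):
    """Return genes present in at least `min_datasets` datasets (default: all)."""
    if min_datasets is None:
        min_datasets = len(datasets)
    # Count each gene once per dataset, even if it is listed several times.
    counts = Counter(gene for genes in datasets.values() for gene in set(genes))
    return sorted(gene for gene, n in counts.items() if n >= min_datasets)


# Hypothetical gene lists from three expression studies.
datasets = {
    "study_a": ["TP53", "BRCA1", "MYC"],
    "study_b": ["MYC", "TP53", "EGFR"],
    "study_c": ["TP53", "EGFR", "KRAS"],
}
in_all = shared_genes(datasets)
in_two_or_more = shared_genes(datasets, min_datasets=2)
```

Here `in_all` corresponds to the central region of a three-set Venn diagram, and `in_two_or_more` covers the "all (or some)" case from the question.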
  • asked a question related to Bioinformatics Analysis
Question
2 answers
"The development and validation of a medium density SNP genotyping assay in Shrimp" is a research proposal I'm currently working on. Given the restricted budget allotted (9,600 USD) to the project, I'd like to know ahead of time how much it might probably cost me.
Relevant answer
Answer
sorry
outside my area of expertise
  • asked a question related to Bioinformatics Analysis
Question
4 answers
Hello. I am trying to run a haplotype analysis in PopArt. It was going well until I realized I cannot load previous work in PopArt. I can only export the graphical output as .svg, .png, or .pdf, but not as a "network" file that I could reload or edit in the future. I noticed that the work can be saved as a .nex file, and the new file actually has additional lines (the portion starting with "Begin NETWORK"). I think this is supposed to be read by PopArt, but it fails to do so: I get parsing errors when I try to open the new file. I am not sure if there is a way around this, as I am new to the software. Any help would be appreciated. Stay safe, anon!
Relevant answer
Answer
Great question, thanks for asking.
  • asked a question related to Bioinformatics Analysis
Question
4 answers
Hello,
I am new to this field. I am doing metagenome analysis with shotgun reads. All reads are single-end. The DNA was obtained from human airways. I just want to find the taxon abundances in the samples. Then I will estimate the diversity and identify the core microbes.
My mapping results are terrible. How can I handle bad mappings? Or should I change the tools I used for the analysis? Which tools are more accurate or sensitive for microbiome analysis? I would appreciate any suggestions!
I followed this pipeline:
  1. Assembly was done using Megahit
  2. Short contigs (<200 bps) were removed using prinseq
  3. Read mapping against contigs was performed using BWA
  4. Similarity searches against GenBank, KEGG, and eggNOG were done using Diamond
  5. Binning was done using MaxBin2
You can find my mapping results in the attachment.
Relevant answer
Answer
Dymphan Gonsalves Thank you very much, your answer is very helpful.
  • asked a question related to Bioinformatics Analysis
Question
13 answers
I am trying to install the MetaDE package in R. However, I get the warning "package 'MetaDE' is not available for this version of R. A version of this package for your version of R might be available elsewhere." How can I overcome this issue while using R version 4.1.0?
Relevant answer
Answer
Start with R 3.6.3 and then use devtools to install MetaDE.
  • asked a question related to Bioinformatics Analysis
Question
7 answers
I used WebMGA to cluster my NGS data (COG). I have a problem analyzing the data provided in output.zip because the file format is unknown. Do I need specific software to open each of those files?
Relevant answer
Hi!
I wonder if someone knows how reliable WebMGA is. I would like to know your opinion
  • asked a question related to Bioinformatics Analysis
Question
2 answers
Hi everyone,
Can anybody help in analyzing a density profile graph generated by a simulation run on GROMACS? I have attached the file for your reference.
Need an elaborate explanation as I am new to this. Kindly also suggest me any research articles related to this topic.
Thank you so much in advance!!
Good day
Regards
Renu
Relevant answer
Answer
Thank you so much. I will check them out.
  • asked a question related to Bioinformatics Analysis
Question
6 answers
The analysis includes these steps: 1) raw data format conversion for five companies; 2) updating the positions of all SNPs to the GRCh37 (hg19) build; 3) quality control within companies; 4) pre-phasing (SHAPEIT2) and imputation (IMPUTE2) for all SNPs of each company; 5) GWAS using two logistic models for 27 phenotypes; 6) statistical and downstream bioinformatic analysis; 7) estimation of genetic parameters (rg and hg); 8) PRS analysis. My dataset contains only a little over 1000 people. With no background knowledge, how long would this take a bioinformatics master's student?
Relevant answer
Answer
More than 1000? Please give the exact number of samples and the size of the data.
  • asked a question related to Bioinformatics Analysis
Question
3 answers
I am writing to ask for ideas on bioinformatics analysis of a humanized antibody. I have been wondering what kind of bioinformatics analysis I can do on it. Although I can use docking and molecular dynamics to evaluate this antibody, I am looking for other ways to analyze it in structural bioinformatics. Please suggest how I can conduct a bioinformatics analysis of this antibody. Any relevant article to refer to would be greatly appreciated.
Relevant answer
Answer
If you want to perform structural analysis, I suggest you take a look at Quantum mechanical methods
  • asked a question related to Bioinformatics Analysis
Question
3 answers
I have used various sources to write the code, but the dataset is not being read properly in RStudio.
Relevant answer
Answer
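If you have the GEO accession, you can retrieve the dataset with the getGEO function in R. GEOquery is a Bioconductor package, so install it through BiocManager:
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("GEOquery")
library(GEOquery)
data <- getGEO("accessionhere", GSEMatrix = TRUE, getGPL = FALSE)
Let me know if you need any further help.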
  • asked a question related to Bioinformatics Analysis
Question
31 answers
Applications of bioinformatics in medicine is a key factor in technological advancement in the field of modern medical technologies.
In which areas of medical technology are the technological achievements of bioinformatics used?
What are the applications of bioinformatics in medicine?
Please reply
I invite you to the discussion
Thank you very much
Best wishes
Relevant answer
Answer
Please have a look at our (Eminent Biosciences (EMBS)) collaborations, and let me know if you are interested in associating with us.
Our recent publications in collaboration with industry and academia in India and worldwide:
Our Lab EMBS's Publication In collaboration with Universidad Tecnológica Metropolitana, Santiago, Chile. Publication Link: https://pubmed.ncbi.nlm.nih.gov/33397265/
Our Lab EMBS's Publication In collaboration with Moscow State University , Russia. Publication Link: https://pubmed.ncbi.nlm.nih.gov/32967475/
Our Lab EMBS's Publication In collaboration with Icahn Institute of Genomics and Multiscale Biology,, Mount Sinai Health System, Manhattan, NY, USA. Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/29199918
Our Lab EMBS's Publication In collaboration with University of Missouri, St. Louis, MO, USA. Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/30457050
Our Lab EMBS's Publication In collaboration with Virginia Commonwealth University, Richmond, Virginia, USA. Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/27852211
Our Lab EMBS's Publication In collaboration with ICMR- NIN(National Institute of Nutrition), Hyderabad Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/23030611
Our Lab EMBS's Publication In collaboration with University of Minnesota Duluth, Duluth MN 55811 USA. Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/27852211
Our Lab EMBS's Publication In collaboration with University of Yaounde I, PO Box 812, Yaoundé, Cameroon. Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/30950335
Our Lab EMBS's Publication In collaboration with Federal University of Paraíba, João Pessoa, PB, Brazil. Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/30693065
Our Lab EMBS's Publication In collaboration with University of Yaoundé I, Yaoundé, Cameroon. Publication Link: https://pubmed.ncbi.nlm.nih.gov/31210847/
Our Lab EMBS's Publication In collaboration with University of the Basque Country UPV/EHU, 48080, Leioa, Spain. Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/27852204
Our Lab EMBS's Publication In collaboration with King Saud University, Riyadh, Saudi Arabia. Publication Link: http://www.eurekaselect.com/135585
Our Lab EMBS's Publication In collaboration with NIPER , Hyderabad, India. Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/29053759
Our Lab EMBS's Publication In collaboration with Alagappa University, Tamil Nadu, India. Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/30950335
Our Lab EMBS's Publication In collaboration with Jawaharlal Nehru Technological University, Hyderabad , India. Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/28472910
Our Lab EMBS's Publication In collaboration with C.S.I.R – CRISAT, Karaikudi, Tamil Nadu, India. Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/30237676
Our Lab EMBS's Publication In collaboration with Karpagam academy of higher education, Eachinary, Coimbatore , Tamil Nadu, India. Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/30237672
Our Lab EMBS's Publication In collaboration with Ballets Olaeta Kalea, 4, 48014 Bilbao, Bizkaia, Spain. Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/29199918
Our Lab EMBS's Publication In collaboration with Hospital for Genetic Diseases, Osmania University, Hyderabad - 500 016, Telangana, India. Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/28472910
Our Lab EMBS's Publication In collaboration with School of Ocean Science and Technology, Kerala University of Fisheries and Ocean Studies, Panangad-682 506, Cochin, India. Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/27964704
Our Lab EMBS's Publication In collaboration with CODEWEL Nireekshana-ACET, Hyderabad, Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/26770024
Our Lab EMBS's Publication In collaboration with Bharathiyar University, Coimbatore-641046, Tamilnadu, India. Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/27919211
Our Lab EMBS's Publication In collaboration with LPU University, Phagwara, Punjab, India. Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/31030499
Our Lab EMBS's Publication In collaboration with Department of Bioinformatics, Kerala University, Kerala. Publication Link: http://www.eurekaselect.com/135585
Our Lab EMBS's Publication In collaboration with Gandhi Medical College and Osmania Medical College, Hyderabad 500 038, India. Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/27450915
Our Lab EMBS's Publication In collaboration with National College (Affiliated to Bharathidasan University), Tiruchirapalli, 620 001 Tamil Nadu, India. Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/27266485
Our Lab EMBS's Publication In collaboration with University of Calicut - 673635, Kerala, India. Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/23030611
Our Lab EMBS's Publication In collaboration with King George's Medical University, (Erstwhile C.S.M. Medical University), Lucknow-226 003, India. Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/25579575
Our Lab EMBS's Publication In collaboration with School of Chemical & Biotechnology, SASTRA University, Thanjavur, India Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/25579569
Our Lab EMBS's Publication In collaboration with Safi center for scientific research, Malappuram, Kerala, India. Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/30237672
Our Lab EMBS's Publication In collaboration with Dept of Genetics, Osmania University, Hyderabad Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/25248957
Our Lab EMBS's Publication In collaboration with Institute of Genetics and Hospital for Genetic Diseases, Osmania University, Hyderabad Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/26229292
Sincerely,
Dr. Anuraj Nayarisseri
Principal Scientist & Director,
Eminent Biosciences.
Mob :+91 97522 95342
  • asked a question related to Bioinformatics Analysis
Question
6 answers
Hello,
I am looking to obtain global RNA-Seq data for either E. coli or P. putida. I assume RNA-seq data is publicly available for many microbes, but I am unsure where I can access this information. Does anyone have insight as to what website or database I can find this data?
Many thanks,
Shawn
  • asked a question related to Bioinformatics Analysis
Question
6 answers
Hi, I am a beginner in the field of cancer genomics. I am reading gene expression profiling papers in which researchers classify cancer samples into two groups based on the expression of a group of genes, for example a "High group" and a "Low group", and do survival analysis. They then associate these groups with other molecular and clinical parameters, for example serum B2M levels, serum creatinine levels, 17p del, or trisomy 3. Some researchers classify the cancer samples into 10 groups. If I propose a cancer classification scheme and present a survival model based on 2 groups or 10 groups, how should I assess the predictive power of my proposed model, and how do I compare its predictive power with that of other survival models? Thank you in advance.
Relevant answer
Answer
The survAUC R package provides a number of ways to compare models. Link: https://stats.stackexchange.com/questions/181634/how-to-compare-predictive-power-of-survival-models
  • asked a question related to Bioinformatics Analysis
Question
4 answers
Hi all,
I would like to know if you have any information on this issue, more precisely on companies that could provide services for
1. genome sequencing and assembly ;
2. whole methylation sequencing  for 20 samples including bioinformatics analysis
Thanks
Relevant answer
Answer
You can always reach out to Novogene for your sequencing projects. We have worked with them over the past few years, and you get excellent service at great prices.
For the data-analysis, feel free to reach out to BISC Global (www.biscglobal.com), a bioinformatics, statistics and machine learning services company with teams in Europe and the US. We have a lot of expertise in epigenomics and genomics data analysis.
  • asked a question related to Bioinformatics Analysis
Question
5 answers
What is the script for quantile normalization of a microarray dataset (GSE70970) using limma? Do I need to create a model matrix before normalizing? I am very new to R.
Relevant answer
Answer
Yes, first you need to create a numeric matrix and store it in A. A model (design) matrix is not needed for this step.
Then normalize it:
library(limma)
normalizeQuantiles(A, ties=TRUE)
With ties=TRUE, tied values within each column of A are handled carefully: they are normalized to the mean of the corresponding pooled quantiles.
Have fun and Happy Research!
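To make clear what quantile normalization does to the matrix, here is a small pure-Python sketch of the basic idea. It is not limma's implementation: the function name is hypothetical, each inner list stands for one sample (one column of A), and ties are broken by position rather than averaged as ties=TRUE would do.

```python
def quantile_normalize(columns):
    """Give every sample (column) the same distribution of values.

    `columns` is a list of equal-length lists, one list per sample.
    """
    n = len(columns[0])
    if any(len(col) != n for col in columns):
        raise ValueError("all samples must have the same number of probes")

    sorted_cols = [sorted(col) for col in columns]
    # Mean of the k-th smallest value across all samples.
    rank_means = [
        sum(col[k] for col in sorted_cols) / len(columns) for k in range(n)
    ]

    normalized = []
    for col in columns:
        # Probe indices ordered from the smallest to the largest value.
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = rank_means[rank]
        normalized.append(out)
    return normalized


# Two hypothetical samples with three probes each.
result = quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]])
```

After normalization both samples contain exactly the same set of values (1.5, 3.5, 5.5); only the order, which reflects each probe's rank within its sample, differs.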
  • asked a question related to Bioinformatics Analysis
Question
7 answers
I am wondering whether these low levels of total RNA in the samples are enough for RNA-seq. Has anyone already done this, or does anyone have suggestions for getting reliable data for bioinformatic analysis?
Relevant answer
Answer
  1. You are talking about total RNA and not depleted RNA, which in itself tells me that the sample is not a good choice for RNA-Seq.
  2. Even with that amount of mRNA, sequencing is always doubtful.
  3. We had better not talk about the data analysis, because if rubbish goes in, rubbish comes out.
I would not go for RNA-Seq with these samples unless I had a huge amount of funding that I just wanted to spend no matter what...
  • asked a question related to Bioinformatics Analysis
Question
12 answers
I've tried to dock an enzyme (523 residues) with its amino acid substrate, but no docking server can recognize a single amino acid as a ligand. What can I do for docking those molecules?
Relevant answer
Answer
As I remember, you can use HADDOCK from its online server, or if you have BIOVIA you can use ZDOCK and RDOCK.
  • asked a question related to Bioinformatics Analysis
Question
1 answer
I have whole-genome sequences of a fosmid DNA. I will do the bioinformatics analysis, and my main aim is to identify the sequences of my insert.
Could you recommend a cloud-based/desktop-based (preferably Windows OS) tool for whole-genome sequences analysis of fosmid DNA?
  • asked a question related to Bioinformatics Analysis
Question
3 answers
I have some files in bed and bedgraph format to analyze with IGV. My team and I tried to load them in IGV following the tutorials on the IGV site, but it did not work. The bedgraph files are large (5157), so we converted them to the binary .tdf format using the IGVTools "Count" command, but that did not work either. With some files we only see a single flat line on the IGV screen, without any information. With FilexT we can see that the bed and bedgraph files are not damaged.
We think the problem is in the step where we select the "Load from File" option in IGV. What can we do?
We use the IGV_2.10.3
Relevant answer
Answer
Have a look at the link; it may be useful.
Regards,
Shafagat
  • asked a question related to Bioinformatics Analysis
Question
1 answer
DNA barcoding is used to obtain taxonomic information about unidentified organisms. Apart from that what other types of Bioinformatics analysis might be performed with the DNA barcode data? What are the Bioinformatics Resources for DNA barcoding data analysis?
Relevant answer
Answer
If you mean bioinformatics resources /databases
NCBI
EMBL-EBI
DDBJ
BOLD (for COI barcodes)
  • asked a question related to Bioinformatics Analysis
Question
6 answers
I have been asked to check the gene expression patterns of the cells in an RNA-seq dataset after producing a principal component analysis plot in MATLAB. I have a CSV file with the principal component values, but I am not sure how to perform differential expression analysis using the PC values. Is there a MATLAB function available? Kindly help me. Thanks in advance.
Relevant answer
Answer
I am preparing an E-Readiness Index for farmers, extension personnel, and agricultural scientists separately, to measure how ready an individual is to use ICT tools in agriculture. I have selected sub-groups as well as indicators, but I am stuck on how to obtain the e-readiness score. From my reading of the literature, I understand that PCA or factor analysis establishes the relevance and accuracy of the indicators, but I am confused about when to apply PCA. On which data: the pre-testing data or the final collected data? Or without any collected data? Please guide.
  • asked a question related to Bioinformatics Analysis
Question
3 answers
I work with spruce, which means that we do not have high numbers of clonal replicates. In an RNA-seq experiment we had one clone with six individuals and five clones with two individuals each. For the first clone there are three control and three treated individuals. For the other clones there is only one replicate of each. I am trying to find a way to analyze these data. Is it possible to use the clone that has replicates as the reference and compare the other clones to it? Is there a test that can be used to see whether the transcript counts of a clone with no replicates fall within the 95% CI of the clone with replicates? I know there are some publications about single-subject transcriptomics in medicine, where methods are being developed for personalized medicine when only one individual is sequenced.
Relevant answer
Answer
Thanks for your suggestion.
Best,
Melissa
  • asked a question related to Bioinformatics Analysis
Question
3 answers
I am using DAVID (https://david.ncifcrf.gov/home.jsp) to cluster some genes I found upregulated in my RNAseq data. I am just using the official gene symbol without any quantitative data. However, the KEGG pathway results are giving me p-values which are extremely high. It does not make any sense to me. How the p-value can be calculated without any number? Can the p-value be significant?
Relevant answer
Answer
Functional enrichments are based on statistical tests made with CATEGORICAL VARIABLES (which differ from the numerical ones that we use, for instance, in a t-test). This means that the test will find out whether the genes of a particular pathway or biological process are over-represented, given the number of genes/proteins that you put in as input, the list of genes/proteins belonging to that pathway/process, and a specific background (which could be the entire transcriptome, the proteome, or the total list of genes/proteins in your experiment). Thus, you should not be surprised to get a p-value.
If you want to rank your genes/proteins, giving priority to genes that have a higher or lower fold change for your enrichment analysis, you could use GSEA from Broad Institute https://www.gsea-msigdb.org/gsea/index.jsp
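As a rough illustration of how a p-value can come from counts alone, the sketch below computes a plain one-sided hypergeometric (Fisher-type) enrichment p-value. DAVID itself reports a modified Fisher exact score (EASE), so this is the textbook test rather than DAVID's exact calculation, and all numbers are invented.

```python
from math import comb


def enrichment_p_value(hits, list_size, pathway_size, background_size):
    """P(X >= hits) when `list_size` genes are drawn at random from a
    background of `background_size` genes, `pathway_size` of which
    belong to the pathway (hypergeometric upper tail)."""
    max_hits = min(list_size, pathway_size)
    total = comb(background_size, list_size)
    tail = sum(
        comb(pathway_size, k)
        * comb(background_size - pathway_size, list_size - k)
        for k in range(hits, max_hits + 1)
    )
    return tail / total


# Toy example: 10 background genes, 4 in the pathway,
# 3 genes in the input list, 2 of them in the pathway.
p = enrichment_p_value(hits=2, list_size=3, pathway_size=4, background_size=10)
```

Only four counts go in, which is why a list of gene symbols without expression values is enough to obtain a p-value.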
  • asked a question related to Bioinformatics Analysis
Question
3 answers
I am trying to run CODEML in pamlX, but I cannot start the run. After loading all three files (.ctl, .phy, and .tree), pamlX stands still and the RUN option does not work. Please advise how I can start the run and complete the analysis.
My interest is to identify lineages with accelerated evolution and to test diverse branch models in CODEML, considering one to several ω ratios. If this analysis is possible in any other program, please suggest that too.
I have provided the test files in the attachment.
Relevant answer
Answer
Hello Syed, I was having the same problem, and I was able to resolve it. The greyed-out "Run" button in Codeml, which can be seen in the attached Figure 1, is due to the codeml.exe file missing from your program's directory folder. The problem is accompanied by the error message seen in Figure 2: "Failed to find executable file 'codeml.exe' in the folder: [Your directory for PAML-X]". This directory is set up automatically when you first run the program.
To fix the problem, you need to download both links from this website:
This is the PAML download page.
Then extract the .zip folder, and extract the .tar folders within each. When the folders are both fully extracted, put the files from the PAML 4.9 folder, into your PAML-X folder, as can be seen in Figure 4. Put the PAML 4.9 files into whatever folder was given in the error message in Figure 2.
Then launch PAML-X again from the same location, your directory folder, and click CodeML. This time no error should be given, and the "Run" button will no longer be greyed out, as can be seen in Figure 5.
  • asked a question related to Bioinformatics Analysis
Question
3 answers
I have a dataset (shown in the attached picture) with RNA-seq data for various samples, in which the same gene appears twice.
Suppose I want to measure this gene (which is haplotypic in nature) for sample 1: how should I treat its RNA-seq values? Do I take the average, do I take the median, or should I treat the two versions of the gene as separate genes? I guess biologists could explain this better.
Relevant answer
Answer
From what I see, the RNA-seq data presented are already pre-processed, since raw RNA-seq data would be integer counts. It is important to know how they were pre-processed. There are much more advanced ways of analyzing such data than taking a median or a mean. I would suggest looking into differential gene expression analysis for haplotypes.
  • asked a question related to Bioinformatics Analysis
Question
2 answers
The project's budget is $12,000, which does not cover buying any equipment apart from, for example, a genotyping analysis kit. I did a project analyzing genetic diversity and selection signatures in four endangered cattle breeds using the Illumina BovineHD kit, but it was not satisfying. Any suggestions? This is very important and crucial for my career.
Thanks in advance,
Relevant answer
Answer
You can also choose to study the diversity, evolutionary phylogenetics, or domestication of a species. Large structural variations in genetic disease are also an option.
  • asked a question related to Bioinformatics Analysis
Question
6 answers
Hi,
I want to know if I can detect a mutation in a DNA sequence (Sanger sequencing) using BioPython.
Is there a program I can write to detect the position and the type of a mutation in the generated sequence compared with a wild-type sequence?
Best regards.
Relevant answer
Answer
Every organism is constantly evolving and undergoing mutations to better adapt to its environment. Similarly, the novel coronavirus will also undergo mutations to better adapt to humans. It is very important to understand the pattern in which it mutates, because this pattern has a significant impact on vaccine and drug development efforts. For example, if the Spike protein is mutating, then a vaccine or drug developed to target the Spike protein may not be effective in all cases. This would mean that we would have to restart our search for a new drug or vaccine against COVID-19.
Regards,
Shafagat
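For the original question of locating mutations relative to a wild-type sequence, a minimal sketch in plain Python is given below. It assumes the two sequences are already aligned and of equal length, so it only reports substitutions; insertions and deletions would need a real alignment first (for example with Biopython's Bio.Align.PairwiseAligner). The function name and the example sequences are hypothetical.

```python
PURINES = {"A", "G"}


def find_substitutions(wild_type, sample):
    """List (position, ref_base, alt_base, kind) for every mismatch.

    Positions are 1-based. `kind` is "transition" (purine<->purine or
    pyrimidine<->pyrimidine) or "transversion" (purine<->pyrimidine).
    """
    if len(wild_type) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    mutations = []
    pairs = zip(wild_type.upper(), sample.upper())
    for pos, (ref, alt) in enumerate(pairs, start=1):
        if ref != alt:
            same_class = (ref in PURINES) == (alt in PURINES)
            kind = "transition" if same_class else "transversion"
            mutations.append((pos, ref, alt, kind))
    return mutations


# Hypothetical wild-type and Sanger-derived sequences.
mutations = find_substitutions("ATGCGT", "ATACGA")
```

Sanger reads usually need quality trimming before such a comparison, since low-quality ends produce false mismatches.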
  • asked a question related to Bioinformatics Analysis
Question
6 answers
Hello,
I really need to know what is the best way to do bioinformatic analysis of posttranslational modifications of human proteins?
Also, which tools and software do you recommend for this purpose? What about networKIN tool (http://networkin.info/)?
I would highly appreciate if you could help me in this regard.
Many thanks.
Best wishes,
Farah
Relevant answer
Answer
Thank you so much for your great advice. May I also ask how I can have access to
PTMscape tool? I am interested to use PTMscape as well. Thank you very much.
Best wishes.
Farah
  • asked a question related to Bioinformatics Analysis
Question
3 answers
I am trying to do bioinformatics analysis for LGG and GBM cohort from TCGA. I encountered a difficulty because I am not sure if I counted correctly whether the LGG cohort is heterogeneous or homogeneous in terms of IDH1 status (WT or Mutant). Can somebody help me do that or did it already? :) Thank You in advance for Your help :)
Relevant answer
Answer
In what context are you referring to IDH homogeneity?
If you are asking whether the mutation is homozygous or heterozygous, then it is mostly a heterozygous missense mutation. Homozygous IDH mutations in lower-grade gliomas are rare.
  • asked a question related to Bioinformatics Analysis
Question
24 answers
Hello friends, today I am raising a question: what are real palindromic DNA sequences? Of course, you will say restriction enzyme sites, but in a video available at http://bit.ly/palindromicDNA I argue that, in the true sense, mirror repeats are palindromic in nature as defined by standard English dictionaries. There are many unique properties of mirror-repeat DNA, which I will share later. Hopefully the biological scientific community will accept mirror repeats as true English palindromes. So please check out http://bit.ly/palindromicDNA
Relevant answer
Answer
If you define palindromes as being the same whether you read them forwards or backwards, mirror repeats are not palindromes, because DNA-recognising proteins recognise the DNA double strand. The sequence of the reverse strand is implied by base complementarity:
5'-GGATCC-3' implies the sequence 5'-GGATCC-3' on the reverse strand, and therefore is palindromic when looking at the double strand, while
5'-GTGGACCAGGTG-3' would imply 5'-CACCTGGTCCAC-3' on the reverse strand, and therefore is not.
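The two definitions in this thread can be checked with a few lines of Python. This is a small sketch using the two sequences from the answer; the function names are my own.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Sequence of the opposite strand, read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_reverse_palindrome(seq):
    """True if both strands read the same 5' to 3' (restriction-site style)."""
    return seq.upper() == reverse_complement(seq)


def is_mirror_repeat(seq):
    """True if one strand reads the same forwards and backwards."""
    s = seq.upper()
    return s == s[::-1]


bamhi_site = "GGATCC"
mirror = "GTGGACCAGGTG"
```

GGATCC passes the double-strand test but not the mirror test, while GTGGACCAGGTG does the opposite, which is exactly the distinction being debated.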
  • asked a question related to Bioinformatics Analysis
Question
7 answers
I will use the equation T(A,B) = (A·B) / (|A|^2 + |B|^2 − A·B) to calculate drug-likeness. Here, A is the vector of molecular descriptors for the compound and B is the vector of average molecular properties of all compounds in the DrugBank database.
I have used PaDEL to calculate all the descriptors. Which descriptor I have to use?
Relevant answer
Answer
There is no rule of thumb to follow. It depends on the target set. If you know your target, then build a database of known drugs/molecules that are active (and inactive, for modeling purposes) against this target. Once you have it, use simple statistics (and/or machine learning) to find the most important descriptors and build a classification model. Then use the same model to classify your unknown compounds.
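Whichever descriptors are chosen, the Tanimoto coefficient from the question is straightforward to compute for two descriptor vectors. The sketch below assumes A and B are plain numeric vectors of the same length; the descriptor values are invented, and in practice descriptors on very different scales would usually be standardized first.

```python
def tanimoto(a, b):
    """T(A,B) = (A.B) / (|A|^2 + |B|^2 - A.B) for two numeric vectors."""
    if len(a) != len(b):
        raise ValueError("descriptor vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a)
    norm_b = sum(y * y for y in b)
    denominator = norm_a + norm_b - dot
    if denominator == 0:
        raise ValueError("both vectors are all zeros")
    return dot / denominator


# Hypothetical descriptor vectors: a compound vs. a database average.
compound = [1.0, 1.0]
database_mean = [1.0, 0.0]
score = tanimoto(compound, database_mean)
```

Identical vectors give 1.0 and orthogonal vectors give 0.0, so higher scores mean the compound is closer to the database average.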
  • asked a question related to Bioinformatics Analysis
Question
9 answers
Rather than using sequence alignment data, I wanted to have phylogenetic tree from distance matrix and bootstrap as part of statistical analysis. Anyone to tell me how to execute this analysis?
Relevant answer
Answer
Dear all,
I have been working with MEGA and distance matrices. I am attaching a mini-tutorial for computing UPGMA dendrograms.
I have tried with MEGA-X and it perfectly works.
#############################
1-Calculate your distance matrix with your favourite software (e.g. in ".xls" format)
2-Prepare the matrix as explained by MEGA developers (Figure 1) in a ".txt" file. I show my own datafile (Figure 2). Warning: Keep the structure, including blanks.
3-Save this file as ".meg" file. It will be ready to use in MEGA.
4-Open ".meg" file with MEGA, select Pairwise distance > Lower left matrix > OK
5-Check the data in the file read by MEGA. The matrix should be identical to your previous one (".xls" or similar; Step 1)
6-Select data and click "Phylogeny". You could compute UPGMA tree
7-MEGA offers a good variety of options to customize your tree
Good luck!
  • asked a question related to Bioinformatics Analysis
Question
4 answers
Hello all,
I have a question regarding gene prediction for long metagenomic reads (MinION nanopore).
I was trying to understand the process of gene prediction. In my attempt, I classified my metagenomic sequences using a reference database by following methods :
1. I ran Prodigal to predict ORFs using the -p meta option and then ran the DIAMOND aligner with e = 0.002
Result: 689 queries aligned
2. I directly used diamond for alignment of metagenomic reads to the reference database using the same e score value (I did not give identity parameter)
Result: 7292 queries aligned
3. I converted my DNA fasta file to protein sequence using GOTRANSEQ, and did the same analysis with same parameter.
Result: 169 queries aligned.
Why is there such a huge difference between methods 2 and 3? I am confused!
Which approach is better for predicting protein gene sequence for long reads ?
Is e-value a sufficient parameter for diamond blastp analysis? Do I need to give any identity % in case of first approach?
In addition, I would also like to confirm, whether I can directly use the translated file from PRODIGAL analysis (-a output) for DIAMOND?
Please help
Relevant answer
Answer
  1. In my opinion, it would be wiser to assemble the reads before doing any kind of analysis
  2. It is no wonder that different methods give different results. The tools you are using are fundamentally different, hence the results. You can go through the manual of each to see what it is designed for. Prodigal is designed for gene prediction, so it searches for (full-length) protein-coding genes. Diamond, on the other hand, is a faster version of BLAST that tries to locally align two sequences irrespective of their functional characteristics or length.
After this, I don't think specific details are important to be further discussed.
  • asked a question related to Bioinformatics Analysis
Question
1 answer
I am looking for a command that will merge the 3 chains in the original PDB into a single chain and then renumber all of the residues. I have tried using the alter command, but when I export the PDB I get only one chain (of the initial trimer) and not the merged chain.
Relevant answer
Answer
You can first renumber the chains using the alter command (https://pymolwiki.org/index.php?title=Alter&redirect=no) in such a way that each residue has a unique residue number,
e.g.
alter (chain B), resi=str(int(resi)+100)
alter (chain C), resi=str(int(resi)+200)
to give chains B and C an offset of 100 and 200, respectively (choose offsets larger than the number of residues in each chain so that the numbers do not collide).
Then use the alter command again to change the chain label,
e.g.
alter (all), chain='A'
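Outside PyMOL, the same relabel-and-renumber idea can be sketched in plain Python on the fixed-width PDB text. This is a hedged illustration, not part of the original answer: the function name `merge_chains` is made up, residues are renumbered consecutively from 1 rather than offset by 100 and 200, insertion codes are cleared, and TER records are dropped so the result reads as one chain.

```python
def merge_chains(pdb_lines, new_chain="A", start=1):
    """Relabel every ATOM/HETATM record to one chain and renumber residues.

    Residues are numbered consecutively in file order; a new number is
    assigned whenever the (chain, residue number, insertion code) changes.
    """
    merged = []
    last_key = None
    number = start - 1
    for raw in pdb_lines:
        line = raw.rstrip("\n")
        record = line[:6].strip()
        if record == "TER":
            continue  # drop chain terminators between the merged chains
        if record not in ("ATOM", "HETATM"):
            merged.append(line)  # headers, CRYST1, END, ... pass through
            continue
        line = line.ljust(27)
        # PDB columns: 22 = chain ID, 23-26 = residue number, 27 = insertion code
        key = (line[21], line[22:26], line[26])
        if key != last_key:
            number += 1
            last_key = key
        merged.append(f"{line[:21]}{new_chain}{number:>4d} {line[27:]}")
    return merged


def atom_line(serial, name, resname, chain, resseq):
    """Build a minimal ATOM record at the origin (for demonstration only)."""
    return (f"ATOM  {serial:>5d} {name:<4s} {resname:>3s} {chain}{resseq:>4d}"
            f"    {0.0:8.3f}{0.0:8.3f}{0.0:8.3f}")
```

The residue number field is four columns wide, so this sketch only suits structures with at most 9999 residues after merging.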
  • asked a question related to Bioinformatics Analysis
Question
3 answers
I am trying to measure telomere length in an ant species using the TRF method, which is a Southern blot technique. Currently, I am struggling with analyzing the images and would appreciate any tips and suggestions on how to statistically analyze my data.
  1. Which software do you recommend to analyze the length?
  2. Any tips on how to image my membrane to get accurate results and is there anything to avoid while imaging?
I have attached an image for reference!
Thank you in advance and much appreciated
Relevant answer
Answer
TeloTool software. You can get more information from this paper: "TeloTool: a new tool for telomere length measurement from terminal restriction fragment analysis with improved probe intensity correction" (https://doi.org/10.1093/nar/gkt1315).
Good luck!
  • asked a question related to Bioinformatics Analysis
Question
1 answer
The PSSM (position-specific scoring matrix) is one of the key features used for conformational B-cell epitope prediction, but I am confused about how to use it as a feature.
Relevant answer
Answer
You can use Pfeature.
Go to its evolutionary information section.
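To make the idea of "PSSM as a feature" concrete, here is a minimal sketch of one common convention: each residue is described by the PSSM rows in a sliding window centred on it, with scores squashed to the range 0 to 1 by a logistic function. The window size, the scaling, the zero padding at the termini, and the function name are all assumptions for illustration; they are not taken from the thread or from any particular tool.

```python
import math


def pssm_window_features(pssm, window=7):
    """Turn an L x 20 PSSM into one feature vector per residue.

    pssm   : list of L rows, each a list of 20 log-odds scores
    window : odd number of residues in the sliding window
    Returns a list of L vectors, each of length window * 20.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd so it can be centred on a residue")
    half = window // 2
    n_cols = len(pssm[0])
    # Logistic scaling maps each raw score to (0, 1).
    scaled = [[1.0 / (1.0 + math.exp(-score)) for score in row] for row in pssm]
    padding = [0.0] * n_cols  # used where the window runs past either terminus
    features = []
    for i in range(len(scaled)):
        vector = []
        for j in range(i - half, i + half + 1):
            vector.extend(scaled[j] if 0 <= j < len(scaled) else padding)
        features.append(vector)
    return features
```

Each resulting vector can be paired with that residue's epitope or non-epitope label and passed to a classifier of your choice.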
  • asked a question related to Bioinformatics Analysis
Question
1 answer
I am researching immune checkpoint genes in oral cancer. I would like to know how to use bioinformatics analysis to predict synergistic drug combinations of an inhibitor of a given immune checkpoint gene and a common chemotherapeutic drug. If possible, this combination will guide my future experimental design.
Could you please point me to papers, general methods, and databases that I could use to achieve this goal? Thank you very much.
Relevant answer
Answer
Good day. In my research I used products that inhibited Streptococcus mutans and compared them against a positive control drug, amoxicillin with clavulanic acid, which is already known to have an inhibitory effect, and a negative control, distilled water. These were the control treatments. You should include both control treatments and the treatments whose effect you want to test, so that you can better control for the inhibitory effect.
  • asked a question related to Bioinformatics Analysis
Question
3 answers
I am looking for a recent diagnostic method for chikungunya virus based on computational biology techniques.
Relevant answer
Answer
Molecular docking
  • asked a question related to Bioinformatics Analysis
Question
2 answers
I've read several papers that used a panel of genes to narrow the range of candidates in WES analysis. Some studies select hundreds of so-called "phenotype-related" candidate genes, while others use more. My question is: how do I design a panel that will benefit the downstream analysis?
Besides obtaining knowledge from reviews, which databases could I use to find all the genes that may affect the targeted phenotype? (OMIM? MouseMine?)
Relevant answer
Answer
There are many commercial products designed for this purpose, so-called "panels", which are based on targeted sequence capture.
  • asked a question related to Bioinformatics Analysis
Question
9 answers
I was trying to find a plasmid origin of replication. Ori-Finder did not find it, and I also tried a BLAST search against their database and alignment to other similar bacteria, but with no success.
Can someone recommend a fairly simple program for someone who is not a bioinformatician?
I was also looking into GC skew analysis; if someone can recommend a program for that as well, I would appreciate it.
Thanks.
Relevant answer
Answer
I found that the host bacterium carries a native plasmid encoding RepA. How can I find the origin of replication associated with RepA?
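For the GC skew analysis mentioned in the question, the calculation itself is simple enough to sketch. GC skew is (G - C) / (G + C) per window, and in many bacterial chromosomes the cumulative skew reaches its minimum near the replication origin. Whether a given plasmid shows a clear signal is not guaranteed, so treat the result as a hint only. The window size and function names below are assumptions for illustration.

```python
def cumulative_gc_skew(seq, window=1000):
    """Return (window end positions, cumulative GC skew) for a DNA string."""
    seq = seq.upper()
    positions = []
    cumulative = []
    total = 0.0
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        # Skew is defined as 0 for windows without any G or C.
        skew = (g - c) / (g + c) if (g + c) else 0.0
        total += skew
        positions.append(min(start + window, len(seq)))
        cumulative.append(total)
    return positions, cumulative


def skew_minimum(seq, window=1000):
    """Position (end of window) where the cumulative GC skew is lowest."""
    positions, cumulative = cumulative_gc_skew(seq, window)
    return positions[cumulative.index(min(cumulative))]
```

Because a plasmid is circular, the start of the FASTA record is arbitrary; the reported position is relative to that start.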
  • asked a question related to Bioinformatics Analysis
Question
2 answers
One of the steps in preprocessing metatranscriptomic data is to remove host reads (host contamination) by comparing the reads to a host database. But what if there is no host reference, or the closest reference is a draft genome from the same family but a different genus?
Relevant answer
Answer
In this case I would use MEGAHIT; it is a relatively fast de novo assembler.
Also, removing the host transcriptome is not a necessary step; you can skip it and go straight to assembly.
MEGAHIT was very useful for me when working with viruses.
In one case I used the genome of a different species to remove the host transcriptome and it was successful, but I don't know whether that works with a different genus.
If there is no host genome, I would follow these steps:
1- assembly with MEGAHIT
2- identifying target organisms with blastn and blastx
3- mapping the RNA-seq data back to the identified target organisms
4- reference-guided assembly with the Trinity assembler, which needs the BAM file from step 3
5- identifying target organisms with blastn and blastx
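The steps above can be sketched as a list of command lines. This is a hedged illustration rather than the answerer's actual commands: the file names, the e-value cutoff, the intron limit, the memory setting, and the choice of bowtie2 and samtools for the mapping step are all assumptions. The function only builds the commands; it does not run them.

```python
def build_pipeline(reads_1, reads_2, contigs_dir="megahit_out",
                   blast_db="nt", target_index="targets", threads=8):
    """Return the commands for steps 1-4 as argument lists (nothing is executed).

    blast_db and target_index are placeholders: a BLAST nucleotide database
    and a bowtie2 index built from the genomes identified in step 2.
    """
    contigs = f"{contigs_dir}/final.contigs.fa"  # MEGAHIT's contig output
    return [
        # 1- de novo assembly of all reads (host and target together)
        ["megahit", "-1", reads_1, "-2", reads_2,
         "-o", contigs_dir, "-t", str(threads)],
        # 2- identify contigs by similarity search (blastx is analogous)
        ["blastn", "-query", contigs, "-db", blast_db,
         "-outfmt", "6", "-evalue", "1e-5",
         "-out", "contigs_vs_db.tsv", "-num_threads", str(threads)],
        # 3- map reads back to the identified target genomes
        ["bowtie2", "-x", target_index, "-1", reads_1, "-2", reads_2,
         "-S", "mapped.sam", "-p", str(threads)],
        # Trinity's genome-guided mode expects a coordinate-sorted BAM
        ["samtools", "sort", "-o", "mapped.sorted.bam", "mapped.sam"],
        # 4- genome-guided assembly with Trinity
        ["Trinity", "--genome_guided_bam", "mapped.sorted.bam",
         "--genome_guided_max_intron", "1000",
         "--max_memory", "16G", "--CPU", str(threads)],
    ]
```

Each list can be passed to `subprocess.run(cmd, check=True)` once the paths are real. Step 5 repeats the search from step 2 on the Trinity output. For eukaryotic targets a splice-aware aligner would be a better fit than bowtie2 in step 3.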