
Bioinformatics - Science topic

Questions related to Bioinformatics
  • asked a question related to Bioinformatics
Question
3 answers
Could anyone recommend some good online Proteomics courses and/or books for both beginners and advanced students?
  • asked a question related to Bioinformatics
Question
4 answers
For molecular docking with AutoDock Vina, how can I energy-minimize hundreds of ligands from the ZINC15 drug library and also convert them to pdbqt?
Thank you very much!
Relevant answer
Answer
Salau Samuel Shina, thank you very much!
  • asked a question related to Bioinformatics
Question
3 answers
Hi, I once heard someone at a conference say, "The isolates within group A are similar to group B with R > 0.2."
Is there a way to calculate the R value for two different groups of isolates based on their nucleotide/amino acid sequences or their sequence homology, so that I could reach a conclusion like the example I've provided? Thank you so much!
Relevant answer
Answer
Dear Felix Hojaya,
Yes, you can do it (calculating R, the transition/transversion ratio) with the MEGA X software. Just include all the sequences you need to analyze, go to the Models section, and click on Estimate Transition/Transversion Bias. The result is calculated from all the included sequences.
Wish you good luck
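To see what the ratio measures, here is a minimal Python sketch that simply counts transitions (A<->G, C<->T) and transversions between two sequences that are already aligned, skipping gaps and ambiguous bases. It is a raw count ratio, not the model-based estimate MEGA X reports, so the two numbers will generally differ; the function names are hypothetical.

```python
PURINES = set("AG")
PYRIMIDINES = set("CT")


def count_ts_tv(seq1, seq2):
    """Count transitions and transversions between two aligned DNA sequences.

    Gaps ('-') and any non-ACGT characters are skipped. Raw counts only,
    with no correction for multiple substitutions.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a == b or a not in "ACGT" or b not in "ACGT":
            continue
        # Purine<->purine or pyrimidine<->pyrimidine changes are transitions.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    return transitions, transversions


def raw_ts_tv_ratio(seq1, seq2):
    """Transitions divided by transversions (infinite if there are no transversions)."""
    ts, tv = count_ts_tv(seq1, seq2)
    return ts / tv if tv else float("inf")


print(count_ts_tv("ACGTAC", "GCATTC"))  # (2, 1)
```

For a publishable value, the MEGA X estimate described above remains the better choice.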
  • asked a question related to Bioinformatics
Question
3 answers
Hello everyone,
Any recommendations on whom to contact (or what to use) for bioinformatics services to analyze microRNA data?
Relevant answer
Answer
I use LatchBio for RNA-seq!
  • asked a question related to Bioinformatics
Question
1 answer
Hello everyone,
I assembled the chloroplast genomes of several Boraginaceae species. In a few species I got an orientation problem: the rbcL gene is drawn inside the circle in the clockwise direction, while the atpB and atpE genes are drawn outside the circle in the anti-clockwise direction (please see the picture). This differs from most published chloroplast genome assemblies, where rbcL is usually outside the circle and anti-clockwise, while atpB and atpE are inside the circle and clockwise.
I use Chlorobox to draw the gene map after NOVOPlasty finishes the assembly. This issue happened with only two of the seven samples, and these two samples are from the same family. I changed the seed and reference many times and still got the same result.
Any idea what I need to do to fix this?
Thanks!
Relevant answer
Answer
I think you may have used blunt-end restriction endonucleases; this makes inserts ligatable in both directions (as your experimental results suggest). If that's the case, this information may be useful:
Both the Costa and Weiner or Delphi genetics Staby methods could fix this orientation problem.
  • asked a question related to Bioinformatics
Question
4 answers
Hello dear researchers,
Can you recommend bioinformatics journals suitable for the field of computer science? Thank you very much for your guidance.
Relevant answer
Answer
Computational and Structural Biotechnology Journal
Computers and Electronics in Agriculture
  • asked a question related to Bioinformatics
Question
2 answers
I am a young bioinformatics student and would like some clues for my project pipeline. Hints and expert answers are welcome. Thanks!
Relevant answer
Answer
I'd start with inferring the amino acid sequence and then BLAST it against proteins of known function. That way you can leverage what is known about the homologous, known proteins to build your starting hypothesis about your unknown protein. You can then design more targeted experiments to test your hypothesis.
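The first step, inferring the amino acid sequence, can be sketched in a few lines of Python before submitting the protein to a BLAST protein search. This minimal version assumes the standard genetic code (NCBI table 1), translates a single forward reading frame, and marks unknown codons as X; the function name is hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna, frame=0):
    """Translate one forward reading frame; '*' marks stop codons, 'X' unknown codons."""
    dna = dna.upper().replace("U", "T")  # accept RNA input as well
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


print(translate("ATGGCCTAA"))  # MA*
```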
  • asked a question related to Bioinformatics
Question
3 answers
I have a good background in R and statistical analysis (including machine learning), and I'm a fresh biotechnology graduate. I would like to try replicating some RNA-seq analyses from papers that use R (with their provided data). Any SHORT, beginner-friendly papers to recommend?
Relevant answer
Answer
This might be a good tutorial for you to analyze RNA-seq data using R
  • asked a question related to Bioinformatics
Question
10 answers
Hi. I am working with arthropods (decapods, stomatopods, maxillopods...) and using the standard COI genetic marker for identification. I am trying to assign my sequences to reference sequences from GenBank (via MIDORI). But what percent-identity threshold should I choose? I can't find any resource that mentions it. Your kind response would be highly appreciated. Thanks, Regards, Lisa Loze
Relevant answer
Answer
Apply the species delimitation methods (ABGD, ASAP, mPTP, bPTP, ....)
  • asked a question related to Bioinformatics
Question
9 answers
Hi, I am new to bioinformatics. My project uses next-generation sequencing to look at differential gene expression in treated and untreated cells. I found that many papers use different methodologies. Some papers use Linux, but I am not good at it. Can anyone suggest a standard pipeline that uses RStudio on a Windows PC? Your valuable suggestions will be greatly appreciated.
Relevant answer
Answer
Definitely check out the LatchBio RNA-seq analysis pipeline. It's super easy to use and cloud-based which I've found convenient.
  • asked a question related to Bioinformatics
Question
5 answers
Hi to everyone,
we have a list of genetic alterations that we found by sequencing and comparing two mice treated with tamoxifen. Comparing the two genomes, we found 1600 genetic alterations, ranging from single substitutions to 50-base insertions. We have the exact location of each mutation and the gene involved.
Is there a way to find out whether a mutation hit an exonic region? I'm not an expert in bioinformatics, but I'm fairly confident that this question should not be too difficult with the right tool. As I said, I'm not a bioinformatician, but I can do very basic programming and analysis.
Thank you in advance for your help
Relevant answer
Answer
Thank you all for your helpful suggestions. I'm going to format my file to run Annovar and maybe also snpEff. Thank you again!
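Annovar and snpEff do this (and much more) properly. To illustrate the underlying idea, here is a minimal Python sketch that checks whether variant positions fall inside exons. It assumes you have already extracted exon coordinates as (chromosome, start, end) tuples, 1-based and inclusive as in a GTF file; the coordinates in the example are made up and the function names are hypothetical. Overlapping exons from different transcripts are merged so that each lookup is a single binary search.

```python
from bisect import bisect_right


def merge_intervals(intervals):
    """Merge overlapping closed intervals [start, end]."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def build_exon_index(exons):
    """exons: iterable of (chrom, start, end). Returns chrom -> (starts, ends)."""
    by_chrom = {}
    for chrom, start, end in exons:
        by_chrom.setdefault(chrom, []).append((start, end))
    index = {}
    for chrom, intervals in by_chrom.items():
        merged = merge_intervals(intervals)
        index[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return index


def is_exonic(index, chrom, pos):
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    i = bisect_right(starts, pos) - 1  # last merged exon starting at or before pos
    return i >= 0 and pos <= ends[i]


def exonic_variants(index, variants):
    """variants: tuples whose first two fields are (chrom, pos)."""
    return [v for v in variants if is_exonic(index, v[0], v[1])]


exon_index = build_exon_index([("chr1", 100, 200), ("chr1", 150, 300), ("chr2", 10, 20)])
print(exonic_variants(exon_index, [("chr1", 120, "A>T"), ("chr2", 50, "ins")]))
```

For an insertion, this only tests its anchor position; a dedicated annotator also reports the affected transcript and the predicted consequence.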
  • asked a question related to Bioinformatics
Question
2 answers
Hi, is there a way to extract the structures of alternatively spliced protein isoforms from the PDB? Also, can we map the structures to UniProt sequences so that we know which structure belongs to which isoform sequence?
Relevant answer
Answer
Unfortunately, most databases contain 3D structures only for canonical isoforms. However, you could try Google Colab, a Python-based online notebook running AlphaFold2, which can predict the structure of any custom sequence or non-canonical isoform. Check this out: https://youtu.be/le7NatFo8vI
  • asked a question related to Bioinformatics
Question
3 answers
I want to download transcriptomes of different types of immune cells: T cells, B cells, effector T cells, plasma cells, monocytes, macrophages, etc. They can be from microarray, RNA-seq, or scRNA-seq data from human tissues.
  • asked a question related to Bioinformatics
Question
1 answer
Hi everyone,
I'm running some analyses of differential expression (DE) in edgeR and then I want to test for enrichment using only the up-regulated genes using topGO. Curiously, when I apply more restrictive criteria during the analyses of DE, I tend to get MORE GOs significantly enriched.
For example, in a DE analysis using p<0.05, I got 337 over-expressed genes. Using these 337 in the enrichment analysis, I got 21 GOs significantly enriched.
If I run the same DE analysis but with p<0.01, where I get 122 over-expressed genes, enrichment analysis results in 31 enriched GOs.
Is this normal? Am I missing something? Looks counterintuitive to me.
Relevant answer
Answer
It could happen.
Let's suppose you have a pool of 20 genes. Then you do a DE analysis and get 5/20 over-expressed genes. During the enrichment analysis, all GO terms to which these genes are annotated will be checked. For a specific term, the probability of getting by chance, for example, two over-expressed genes annotated to that term is small, so that term has a high probability of being called enriched.
But now let's suppose you have 10/20 over-expressed genes. Here the chance of getting two over-expressed genes annotated to that GO term is not that small, so that same term may not be enriched.
It does not depend only on the number of over-expressed genes you have; the size of the GO terms, for example, is also important.
I know it's not the best explanation, but I hope it helps.
Greets,
Borja
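The toy example can be checked numerically. In its one-sided form, an over-representation test such as the Fisher's exact test offered by topGO reduces to the upper tail of the hypergeometric distribution, which this sketch computes with the Python standard library. The 20-gene pool and the overlap of 2 come from the answer above; the GO term size of 4 genes is an assumption for illustration, and the function name is hypothetical.

```python
from math import comb


def enrichment_p(overlap, universe, term_size, selected):
    """One-sided over-representation p-value, P(X >= overlap), where
    X ~ Hypergeometric(universe, term_size, selected)."""
    upper = min(term_size, selected)
    hits = sum(comb(term_size, i) * comb(universe - term_size, selected - i)
               for i in range(overlap, upper + 1))
    return hits / comb(universe, selected)


# 20 genes in total, a GO term with 4 annotated genes (assumed),
# and 2 of the selected genes annotated to that term.
p_5_selected = enrichment_p(overlap=2, universe=20, term_size=4, selected=5)
p_10_selected = enrichment_p(overlap=2, universe=20, term_size=4, selected=10)
print(round(p_5_selected, 3), round(p_10_selected, 3))  # 0.249 0.709
```

The same overlap is much less surprising when 10 genes are selected than when 5 are, which is why a shorter, stricter gene list can yield more significant terms.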
  • asked a question related to Bioinformatics
Question
5 answers
My area of research is genomic signal processing. I need to suggest the names of two experts in this area from outside India to review my work for a journal.
Can anyone kindly suggest experts in genomic signal processing, signal processing, or bioinformatics?
Relevant answer
Answer
I have been working in genomic signal processing (big-data analysis) for the last 11 years and completed my Ph.D. in this domain.
My Ph.D. thesis title is “Characterization of Periodicities in DNA Sequences Using Signal Processing”, and I have made the following contributions in this domain:
(i) Journal Publications: 09
(ii) Conference Publications:04
(iii) Book Chapter:01
(iv) Ph.D. Guidance: 01 (Detection and Localization of the Hidden Patterns in the DNA Sequences Using Signal Processing-2022)
I also have good knowledge of machine learning and deep learning algorithms.
Any researcher associated with this field can contact me for help.
  • asked a question related to Bioinformatics
Question
4 answers
Is AlphaFold accurate enough when the protein shares less than sequence similarity with the closest, structurally solved homologue?
Besides, how accurate is Alphafold in de novo structure prediction of protein families without solved structures?
Relevant answer
Answer
Several studies have found that AlphaFold predicts better structures than other approaches. The EBI AlphaFold database (alphafold.ebi.ac.uk/faq) displays the per-residue confidence score (pLDDT), which provides information about the model's accuracy. The majority of plant proteins in the EBI AlphaFold DB (alphafold.ebi.ac.uk/search/text/Oryza%20sativa%20subsp.%20japonica) have a pLDDT score above 50. Even proteins with low sequence identity to solved structures appear to be accurately predicted by AlphaFold. AlphaFold does have some limitations for longer proteins and disordered proteins.
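AlphaFold model files, including those downloaded from the EBI AlphaFold DB, store the per-residue pLDDT in the B-factor column of the PDB file. Here is a minimal Python sketch that reads it from the Cα atoms and summarizes it. The cut-off of 70 is only an illustrative choice (it is the lower edge of AlphaFold's "confident" band), and the function names are hypothetical.

```python
def plddt_per_residue(pdb_text):
    """pLDDT values from the B-factor field (columns 61-66) of CA atom records."""
    values = []
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            values.append(float(line[60:66]))
    return values


def summarize_plddt(values, threshold=70.0):
    """Return (mean pLDDT, fraction of residues at or above the threshold)."""
    mean = sum(values) / len(values)
    fraction_confident = sum(v >= threshold for v in values) / len(values)
    return mean, fraction_confident
```

Typical usage would be reading a downloaded model with `open(...).read()` and passing the text to `plddt_per_residue`, then comparing the summaries across proteins with and without close homologues.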
  • asked a question related to Bioinformatics
Question
5 answers
Does anyone have experience with the table2asn tool of NCBI? It is used for submitting annotations of genomes using GFF files.
I am having a problem with it.
1) I got the Windows version of the tool running, but it gives me an error. The command I used is the following: table2asn -M n -J -c w -t template.sbt -a r10k -l paired-ends -i FinalContigs.fsa -f 2700988623.gff -o output_file.sqn -Z output_file.dr -locus-tag-prefix xxxxx
template.sbt is a file generated by NCBI
FinalContigs.fsa is a file containing all the contigs of the draft genome. The first contig is named Contig001, the second contig Contig002, and so on.
2700988623.gff is the file containing the annotation. Column 1 gives the contig on which each CDS and RNA feature is located (Contig001, Contig002, etc.).
After running the command (as told by NCBI), I get the following error:
Cannot resolve lcl|Contig002: unknown
Line: 0
So if anyone could assist me with getting the tool running in Windows, that would be very much appreciated as it is the final step of submitting my genome.
Thanks in advance!
Relevant answer
Answer
Muhammad Arslan, this is about four years late, but maybe someone will find it in the future and it can serve as a record. I tried the tool on all three operating systems and the same problem persisted. However, the issue is not that chmod +x does not work; it is that, after it is run, the terminal has a hard time finding the file and running it. In all three cases, this is overcome by adding the path to the file in front of its name when you run it. So, for example, instead of running
table2asn -M n -J -c w -euk -t...
You would add in the path and run
/Path/to/thefile/table2asn -M n -J -c w -euk -t....
The easiest way to get the exact path is to drag and drop the file into the terminal window; the terminal will fill it in for you.
Furthermore, to be more explicit, the exact steps to get this tool to run are:
1) Download it and unzip it (on Linux, or Ubuntu on a PC or equivalent, run "unzip NameOfFile"; you may have to install unzip first via "sudo apt-get install unzip").
2) Run chmod +x on the unzipped file. I confirmed that at this stage you can rename the file to whatever you want; for example, instead of "linux64.table2asn" you can just name it table2asn (I think the full name will be table2asn.table2asn, but you can call it with just table2asn).
3) Run it with the path or, if you know what you are doing, add it to your PATH so that it runs with just "table2asn" instead of needing the full path. Confusingly, even if your terminal is in the same directory as the table2asn file, it will still not find it. On Linux and macOS this is expected shell behaviour: the current directory is not on the PATH, so a program there has to be called as ./table2asn. I am not 100% sure whether the "adding to PATH" route would fix the issue, as I haven't tried it yet.
Anyway, for anyone searching the web for how to get this to work, those are the steps: a more comprehensive README, if you will. As for all the options, I haven't actually run this successfully yet (apparently I need to specify locus tags), so I cannot comment on them yet.
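Separately from the path problem, the error in the question (Cannot resolve lcl|Contig002) names a sequence ID that table2asn could not match. One thing worth ruling out, and this is an assumption rather than a confirmed diagnosis, is a mismatch between the FASTA header IDs and the seqids in column 1 of the GFF. Here is a small Python check; the function names are hypothetical.

```python
def fasta_ids(fasta_text):
    """Sequence IDs: the first whitespace-separated token after each '>'."""
    return {line[1:].split()[0] for line in fasta_text.splitlines()
            if line.startswith(">") and line[1:].strip()}


def gff_seqids(gff_text):
    """Column 1 of every GFF feature line; comments and directives are skipped."""
    ids = set()
    for line in gff_text.splitlines():
        if line.startswith("##FASTA"):
            break  # embedded sequences follow; there are no more feature lines
        if not line.strip() or line.startswith("#"):
            continue
        ids.add(line.split("\t")[0])
    return ids


def missing_seqids(fasta_text, gff_text):
    """GFF seqids that have no matching FASTA record."""
    return sorted(gff_seqids(gff_text) - fasta_ids(fasta_text))
```

Running `missing_seqids(open("FinalContigs.fsa").read(), open("2700988623.gff").read())` should return an empty list if every annotated contig has a matching sequence; any name it prints differs in spelling, case, or suffix between the two files.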
  • asked a question related to Bioinformatics
Question
7 answers
In plants, phytohormone-induced responses generally involve the binding of transcription factors to specific motifs present in the promoter of a downstream gene. After a few preliminary results, I checked whether a well-known DNA-binding element is present in my gene's promoter. Although I found a few variants of the original motif sequence in the promoter region, the exact motif I was looking for is present about +100 base pairs downstream of the start codon. As far as I know, DNA-binding elements (motifs) are generally found upstream of the protein-coding region (usually in the promoter). But does anyone know of cases where transcription factor binding sites (DNA motifs) are also present inside the gene itself?
Relevant answer
Answer
There are many, many examples of TF binding sites in the gene body. This became obvious after the first genome-wide ChIP experiments (ChIP-chip and ChIP-seq).
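To list every occurrence of a motif on both strands relative to the start codon, here is a minimal Python sketch. It handles only exact A/C/G/T matches (no IUPAC degeneracy or position weight matrices), and the example sequence and motif are made up; the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def find_motif(seq, motif, atg_index=0):
    """Return sorted (offset, strand) pairs for every exact motif hit.

    offset is the 0-based distance from atg_index (the A of the ATG), so
    negative values are upstream. Overlapping hits are reported, and a
    palindromic motif is reported on both strands at the same offset.
    """
    seq, motif = seq.upper(), motif.upper()
    hits = []
    for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
        start = seq.find(pattern)
        while start != -1:
            hits.append((start - atg_index, strand))
            start = seq.find(pattern, start + 1)
    return sorted(hits)


print(find_motif("TGACGAACGTCA", "TGACG", atg_index=5))  # [(-5, '+'), (2, '-')]
```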
  • asked a question related to Bioinformatics
Question
4 answers
Dear all, I have a data set from an RNA-seq experiment comparing treated vs control groups in several breeds of animals. In each breed, I want to see the extent of the separation between the treatment and control groups based on the 18,000 genes being DE between the two groups. In other words, in which breed was the effect of treatment the greatest? Usually one visualizes this with an MDS plot, but I cannot quantify it visually. In my MDS analyses (either the one run from inside edgeR or the one run on the TMM-normalized CPM counts), I could see separation between the treated and control groups in two breeds.
My questions are: 1) How can I numerically estimate the divergence of the two groups in each breed, so that I can compare the treatment effect among breeds? I have tried calculating the Euclidean distance between control and treated individuals within each breed, but it gives a distance for each pair of individuals, so what is the best way to express the overall dissimilarity between control and treatment given the distances between each pair of individuals? I also thought about the correlation coefficient, but it gave strange results.
I have another question: is there any relationship between the common dispersion among genes in one breed (estimated during the DE analysis in edgeR), the number of significant DE genes identified in that breed, and the degree of separation between the two groups in the same breed? In other words, if I found that in breed A the two groups are more dissimilar than in breed B, should I expect higher dispersion or a higher number of significant DE genes in breed A than in B?
Any comment would be helpful
Relevant answer
Answer
Euclidean distance presumes the linear independence of the features used to calculate the distance. This is not the case here, so you can either project the data set into principal component space and compute the distances using the component scores as features, or use the Mahalanobis distance, which explicitly takes into account the correlation between features: https://en.wikipedia.org/wiki/Mahalanobis_distance.
That said, you can compute all the distinct pairwise distances between each pair made of one treated and one control subject. The mean of these distances (computed over all the pairs made of a T and a C subject) estimates the difference between the groups, while their standard deviation gives you an idea of the variability of these distances. Applying the same procedure to homogeneous pairs (Ci,Cj and Ti,Tj) lets you check whether heterogeneous (C,T) pairs are farther apart than homogeneous (CC and TT) pairs.
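The pairwise-distance summary described above can be sketched in Python with the standard library only. It assumes each animal has already been reduced to a vector of coordinates, for example its scores on the first few principal components, as the answer suggests; computing the PCA itself is not shown, and the coordinates below are made up. The function name is hypothetical.

```python
from itertools import combinations, product
from math import dist
from statistics import mean, stdev


def group_separation(control, treated):
    """Summarize between-group (C-T) and within-group (C-C, T-T) distances.

    Each sample is a sequence of coordinates (e.g. PC scores).
    Needs at least two samples per group.
    """
    between = [dist(c, t) for c, t in product(control, treated)]
    within = [dist(a, b)
              for group in (control, treated)
              for a, b in combinations(group, 2)]
    return {
        "between_mean": mean(between),
        "between_sd": stdev(between),
        "within_mean": mean(within),
        "within_sd": stdev(within),
        "ratio": mean(between) / mean(within),  # > 1 means groups are apart
    }


summary = group_separation(control=[(0, 0), (0, 1)], treated=[(3, 0), (3, 1)])
print(round(summary["ratio"], 2))
```

Computing the ratio separately for each breed gives one number per breed that can be compared, if only as a crude index of treatment effect.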
  • asked a question related to Bioinformatics
Question
2 answers
I want to use it in a project, peptibody project, that we use an Fc-region attached to peptides.
Relevant answer
Answer
You could try UmabDB (The Antibody Therapies Database). A 7 day free trial is available.
  • asked a question related to Bioinformatics
Question
1 answer
Hey guys,
I have a list of full mouse TCR sequences from my VDJ scRNA-seq experiment, and I am wondering whether there is a tool that could help me predict what these TCRs might recognize.
Relevant answer
Answer
Not really. There's been a ton of effort going into developing tools to do this but most are in early stage development and not ready for routine usage. The main issue is a lack of training data that can be used to refine these algorithms.
There are new in vitro high-throughput functional TCR antigen profiling techniques that have come online in previous years, including the one we developed: https://www.researchgate.net/publication/336310777_Rapid_selection_and_identification_of_functional_CD8_T_cell_epitopes_from_large_peptide-coding_libraries
These approaches are powerful because they let you directly test large libraries of candidate sequences in an unbiased way. If you have any interest in trying out our method, send me a DM and we can talk further.
  • asked a question related to Bioinformatics
Question
9 answers
Hi dear friends,
I am a beginner in bioinformatics and am looking for tutorial sources. I have found a huge number of references; some of them do not match my goal or are aimed at a higher level, so I have gotten confused.
Can you refer me to the best free sources that I can go through step by step?
Thanks in advance.
Relevant answer
Answer
Hello Mohamed Khashan.
I know it's very difficult to start in bioinformatics, but if you have a degree in biological sciences, you should learn a programming language (Python, R, C++...).
Throughout your learning, in my opinion, the introductions of articles are very important. I spent some time reading them.
In this article you can find out about omics data in general and its tools (https://www.sciencedirect.com/science/article/pii/S0961953419300868). Regarding pipelines, I advise you to look for notebooks in bioinformatics, like these:
Python for genomics in Coursera (https://www.coursera.org/learn/python-genomics)
ISCB (International Society for Computational Biology) always promotes online courses and conferences for the community in general.
Cambridge offers courses (https://bioinfotraining.bio.cam.ac.uk/)
Book:
Go ahead! Good Luck.
  • asked a question related to Bioinformatics
Question
1 answer
I have performed a 100 ns MD simulation in Desmond, and now I want to calculate the solvent-accessible surface area (SASA) over the trajectory. Can anyone please guide me on how to do it? Thanking you all in anticipation.
Relevant answer
Answer
You should generate the data file from the raw data; the software will then calculate all the required parameters automatically.
  • asked a question related to Bioinformatics
Question
13 answers
Dear scientists,
I got a set of around 4000 protein IDs from a proteomic experiment, and I would like to analyse globally whether particular groups of proteins in my experiment are significantly more hydrophobic and/or aggregation-prone than other groups. I am looking for an R library or a web tool that will give me a quantitative hydrophobicity value per protein for my sets. One thing I could do is simply calculate the length-adjusted number of hydrophobic amino acids (C, L, V, I, M, F, W), but this seems a little naive, and I am not sure about the biological relevance of such a simple calculation, which does not take the structural aspects of the sequences into account. I would be glad for ideas on any smarter approaches. Please help.
Relevant answer
Answer
Basically, what you want to do is to determine the amino acid composition for each sequence (e.g. in this R package https://cran.r-project.org/web/packages/protr/vignettes/protr.html using extractAAC()) and multiply the number of times a given amino acid occurs with its hydrophobicity index in your chosen scale (see https://web.expasy.org/protscale/ for different scales) and sum up the values. However, in my experience, hydrophobicity is a poor predictor of the aggregation propensity of folded proteins, as aggregation is frequently linked to imperfect folding rather than to the association of properly folded molecules.
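The composition-times-scale calculation described above, divided by sequence length, gives the GRAVY score when the Kyte-Doolittle scale is used. Here is a minimal Python version; in R, the protr composition plus a scale table from ProtScale would do the same. Characters not in the scale (e.g. X, U) are ignored, which is an assumption about how to treat them, and the function name is hypothetical.

```python
from collections import Counter

# Kyte & Doolittle (1982) hydropathy scale
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence, scale=KYTE_DOOLITTLE):
    """Average hydropathy: sum over residues of (count x scale value) / length."""
    counts = Counter(aa for aa in sequence.upper() if aa in scale)
    n = sum(counts.values())
    if n == 0:
        raise ValueError("no residues from the scale found in the sequence")
    return sum(scale[aa] * k for aa, k in counts.items()) / n
```

Swapping in another dictionary from ProtScale changes the scale without touching the function.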
  • asked a question related to Bioinformatics
Question
3 answers
Current search engines for MS/MS protein identification, such as Mascot, MS Amanda, and Sequest, rely on a search library of computationally generated candidate peptides, obtained by in silico cleavage of the proteins in a given database with a protease (e.g., trypsin). Different PTMs can be added to these computationally generated peptides so that the search can be extended to address specific scientific questions, but this leads to significantly higher computational costs.
I recently came across a case where a highly enriched short protein could not be identified by a standard search, because it generated only a single peptide carrying two fixed modifications. The modifications were not among the most common ones, and finding the right combination to use was time-consuming and computationally expensive.
I would like to open a discussion on the fact that pre-made peptidome libraries are a much better alternative to de-novo generated libraries of proteomes. Let’s get into the details!
As an example, I will use the ACE2 receptor, now infamously known as the entry gate of SARS-CoV-2, the virus that causes COVID-19, into human cells.
The human ACE2 receptor undergoes a series of post-translational events, such as proteolytic cleavage by ADAM17 (resulting in a soluble proteoform), glycosylation, and phosphorylation of tyrosine-781 and serine-783.
In current search engines, the tryptic peptides would be generated from the first methionine to the next positively charged residue, and so on until the very last residue of the protein. If one wanted to detect this protein in a sample and assess the presence of the mentioned PTMs, one would need to allow at least two phosphorylation sites per peptide and consider both S and Y phosphorylation. The search engine would then generate all possible combinations of singly and doubly S/Y-phosphorylated tryptic peptides to search for, which leads to exponentially increasing computational costs.
Since the protein is also cleaved by another protease in vivo, the 2 peptides before and after this site will not be accounted for as they do not end/begin after a positive residue. Since this is not a small protein, other peptides will probably still be detected, and the protein will eventually be identified.
I imagine a tool that would generate the tryptic peptides as before, but accounting only for the known PTM sites. In the case of the two almost adjacent ACE2 phosphorylation sites, this would lead to only three additional peptides (pY, pS, and pYpS). If the research question is to identify novel phosphorylation sites, then allowing only one phosphosite per peptide while searching for S/T/Y phosphorylation might already suffice, since the known sites will already have been accounted for. This can be applied to any combination of PTMs, massively reducing computational requirements. Of course, it is also counterproductive to look for PTMs in sterically inaccessible regions (e.g., the hydrophobic core of the fold).
Databases of known annotated PTM sites of entire proteomes of many organisms are readily available. The tool could have a modular design, allowing the user to create a customized peptidome with any or all of the following characteristics: trypsin or another enzyme used; known endogenous cleavage sites; known PTM sites; and natural variants.
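The core of such a peptidome generator can be sketched in a few lines of Python: digest with a simple trypsin rule (cleave after K or R unless followed by P, no missed cleavages), then enumerate modified forms only at known sites. The sequence and site positions below are invented, with S and Y at positions 3 and 4 standing in for two neighbouring known phosphosites; the function names are hypothetical, and a real tool would also need masses, missed cleavages, endogenous cleavage sites, and variants.

```python
from itertools import combinations


def tryptic_digest(protein):
    """Cleave after K or R unless the next residue is P; no missed cleavages.

    Returns (start_index, peptide) pairs with 0-based start positions.
    """
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        next_is_pro = i + 1 < len(protein) and protein[i + 1] == "P"
        if aa in "KR" and not next_is_pro:
            peptides.append((start, protein[start:i + 1]))
            start = i + 1
    if start < len(protein):
        peptides.append((start, protein[start:]))
    return peptides


def known_site_peptidome(protein, known_sites):
    """Each peptide with every subset of the known modified sites it contains.

    known_sites are 0-based protein positions; returned sites are 0-based
    within the peptide. The empty subset is the unmodified peptide.
    """
    entries = []
    for start, peptide in tryptic_digest(protein):
        local = [s - start for s in sorted(known_sites)
                 if start <= s < start + len(peptide)]
        for r in range(len(local) + 1):
            for subset in combinations(local, r):
                entries.append((peptide, subset))
    return entries


for peptide, sites in known_site_peptidome("AKPSYRGST", known_sites=[3, 4]):
    print(peptide, sites)
```

For the peptide carrying both sites this yields the unmodified form plus three modified forms, rather than every S/T/Y combination.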
I see a long list of advantages using this method and I would like to list the most important ones:
1. Identification of additional hits that could otherwise be missed for several reasons (e.g., tryptic peptides carry fixed modifications that are not searched for because of computational resource limitations or, worse, a small protein normally yields only a single peptide that has two fixed modifications, one of which might be exotic)
2. Reduced computational time when trying to identify novel PTM sites
3. Lower false discovery rate since the peptidome used will be a much more closely related dataset to the actual sample composition than just a simple tryptic proteome and as a result newly identified spectra of interest can be more confidently assigned as the risk of artefacts is lower.
4. Single nucleotide polymorphisms can be analyzed analogously to PTM sites and would not result in exponentially larger search database.
5. More unique peptides could be assigned: if two proteins share a tryptic peptide, but one is known to be phosphorylated in this peptide and the other is not, the phosphorylated peptide could be attributed to only one of the two. In the case of glycosylation this makes even more sense, since some types of glycosylation appear only in a limited number of proteins, depending on their cellular localization.
As the human proteoform project is taking off, maybe this would be the way for MS-based proteomics to quickly catch up and help this project while advancing itself.
What are your thoughts on this? Are there any ongoing projects that aim to do just that?
Relevant answer
Answer
To me, this debate seems somewhat reminiscent of the peptide-centric vs the spectrum-centric approaches.
Limiting the search space is generally a good idea for reducing FDR. Of course, if your peptide is not in your “limited” library you have no chance of identifying it. I see this as the biggest issue with this type of approach.
X!Tandem (and now other search tools) takes the approach that you do a broad initial search with few PTMs specified, then you broaden the PTMs once you have a smaller list of proteins to search. A neat approach in my view.
I’d be very careful with this source of information; “Databases of known annotated PTM sites of entire proteomes of many organisms are readily available.” I know of someone with a lot of de novo MS/MS experience who has undertaken an extensive manual review of phosphopeptides in the databases. The estimate (unpublished) is that around 30% are wrong. As tools progress and the amount of data increases, we look less at the raw MS/MS data. This is for very practical reasons, no one can manually verify 10,000 phosphopeptides, but we still need care when using this type of data.
Here are some papers that may be of relevance for you;
Lu, Yang Young, Jeff Bilmes, Ricard A Rodriguez-Mias, Judit Villén, and William Stafford Noble. “DIAmeter: Matching Peptides to Data-Independent Acquisition Mass Spectrometry Data.” Bioinformatics 37, no. Supplement_1 (July 1, 2021): i434–42. https://doi.org/10.1093/bioinformatics/btab284.
Searle, Brian C., Lindsay K. Pino, Jarrett D. Egertson, Ying S. Ting, Robert T. Lawrence, Brendan X. MacLean, Judit Villén, and Michael J. MacCoss. “Chromatogram Libraries Improve Peptide Detection and Quantification by Data Independent Acquisition Mass Spectrometry.” Nature Communications 9, no. 1 (December 3, 2018): 5128. https://doi.org/10.1038/s41467-018-07454-w.
Ludwig, Christina, Ludovic Gillet, George Rosenberger, Sabine Amon, Ben C. Collins, and Ruedi Aebersold. “Data‐independent Acquisition‐based SWATH‐MS for Quantitative Proteomics: A Tutorial.” Molecular Systems Biology 14, no. 8 (August 1, 2018): e8126. https://doi.org/10.15252/msb.20178126.
  • asked a question related to Bioinformatics
Question
1 answer
After finishing the simulation of the cyclic peptide, I tried to find the most populated structure using the density-peak clustering algorithm. In the literature, the representative structure was chosen as the structure with maximal ρsum, the summation of the local densities of all residues in one structure, $\rho_{\text{sum}} = \sum_{i=1}^{n_{\text{res}}} \rho_i$. So how can I extract the structure with the highest summed density over all residues?
ref: Clustering by Fast Search and Find of Density Peaks. Science 2014, 344, 1492–1496
Relevant answer
Answer
Dear Sam Mohel ,
Cluster analysis is an exploratory analysis that tries to identify structures within the data. Cluster analysis is also called segmentation analysis or taxonomy analysis. More specifically, it tries to identify homogeneous groups of cases if the grouping is not previously known. Because it is exploratory, it does not make any distinction between dependent and independent variables. The different cluster analysis methods that SPSS offers can handle binary, nominal, ordinal, and scale (interval or ratio) data.
Regards,
Shafagat
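For the question itself, here is a minimal Python sketch of one way to pick the frame with maximal ρsum. It assumes per-residue features have already been extracted from the trajectory for every frame (for example, backbone dihedral pairs). Local density uses the cut-off kernel from Rodriguez and Laio (2014): ρ counts the other frames closer than a cut-off d_c. Angle periodicity is ignored here for simplicity, which a real analysis of dihedrals would need to handle; the function names and data layout are hypothetical.

```python
from math import dist


def local_densities(points, d_c):
    """Cut-off kernel: rho_i = number of other points closer than d_c."""
    return [sum(1 for j, q in enumerate(points) if j != i and dist(p, q) < d_c)
            for i, p in enumerate(points)]


def representative_frame(features, d_c):
    """features[r][f] is the feature vector of residue r in frame f.

    Returns (best_frame, rho_sum), where rho_sum[f] is the sum of the
    per-residue local densities of frame f; ties go to the earliest frame.
    """
    n_frames = len(features[0])
    rho_sum = [0] * n_frames
    for residue_points in features:
        for f, rho in enumerate(local_densities(residue_points, d_c)):
            rho_sum[f] += rho
    best = max(range(n_frames), key=rho_sum.__getitem__)
    return best, rho_sum
```

The returned frame index can then be used to write out that snapshot with whatever trajectory tool produced the features.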
  • asked a question related to Bioinformatics
Question
11 answers
Hi. I want to convert ~4000 ligands in .sdf files into .pdb or .pdbqt format for molecular docking. I used the command "obabel *.sdf -opdb -m", but the structures of the ligands changed drastically. Attached is one of the ligands after conversion.
Is there any way of converting all ligands simultaneously without compromising their structure?
Relevant answer
Answer
Dear Tengku Kamilah
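Try to download your SDF structures in 3D format, not 2D.
Then run the following commands.
First, split the multi-molecule file into separate files (ligand1.sdf, ligand2.sdf, ...):
$ obabel -isdf 4000.sdf -osdf -O ligand.sdf -m
After splitting, move 4000.sdf out of the folder so that *.sdf matches only the individual ligand files.
Second, energy-minimize with obminimize:
$ obminimize -ff MMFF94 -n 1000 *.sdf
Third, convert to pdbqt (with -m, each input file gives its own output file):
$ obabel -isdf *.sdf -opdbqt -m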
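If you prefer to drive the whole batch from a script, here is a minimal Python sketch that builds one Open Babel command per split ligand file and, unless dry_run is set, runs it. It assumes Open Babel 2.4 or newer is on the PATH, where the -h, --gen3d and --minimize (with --ff and --steps) options are available; please check these against your installed version before processing thousands of files. Drop --gen3d if your SDF files already contain good 3D coordinates. The function and folder names are hypothetical.

```python
import subprocess
from pathlib import Path


def prepare_ligands(sdf_files, out_dir, forcefield="MMFF94", steps=1000, dry_run=False):
    """Build (and optionally run) one obabel command per single-ligand SDF file."""
    out_dir = Path(out_dir)
    commands = []
    for sdf in map(Path, sdf_files):
        pdbqt = out_dir / f"{sdf.stem}.pdbqt"
        cmd = [
            "obabel", str(sdf), "-O", str(pdbqt),
            "-h",        # add hydrogens
            "--gen3d",   # build 3D coordinates (inputs may be 2D)
            "--minimize", "--ff", forcefield, "--steps", str(steps),
        ]
        commands.append(cmd)
        if not dry_run:
            out_dir.mkdir(parents=True, exist_ok=True)
            subprocess.run(cmd, check=True)  # stop at the first failing ligand
    return commands
```

Calling `prepare_ligands(sorted(Path("ligands").glob("*.sdf")), "pdbqt", dry_run=True)` first and printing the returned commands is a cheap way to check them before the real run.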
  • asked a question related to Bioinformatics
Question
7 answers
I would like to draw an haplotype network in Geneious directly, without using any other software. Is there a way to do that? Any plug-in that could help?
Thank you!
Relevant answer
Answer
As far as I know, you can find uncorrected p-distances in Geneious once you have your alignment.
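The uncorrected p-distance is the proportion of compared sites at which two aligned sequences differ. Here is a minimal Python sketch of that definition, assuming gap positions are simply skipped for each pair (one common convention; Geneious's own gap handling may differ). It does not draw a haplotype network; it only shows what the reported values mean. The function name is hypothetical.

```python
def p_distance(seq1, seq2, gap="-"):
    """Proportion of differing sites among sites where neither sequence has a gap."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        differences += a != b
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


print(p_distance("ACGT", "ACGA"))  # 0.25
```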
  • asked a question related to Bioinformatics
Question
6 answers
Hi,
GO and KEGG functional analysis of a gene set was performed using the DAVID database (https://david.ncifcrf.gov/). However, the adjusted p-values (Bonferroni and Benjamini) of the enriched GO terms and KEGG pathways were greater than 0.5. Meanwhile, a PPI network was constructed using the STRING database (https://string-db.org), with a confidence score of 0.4 as the cutoff criterion and no more than ten interactors in the first shell. This step added a few more genes to the gene list, and genes with no interactions were removed. When the updated gene list was used for GO and KEGG functional analysis, the enriched GO terms and KEGG pathways became significant (p-value < 0.05). Is this workflow valid?
Relevant answer
Answer
Thank you, Dr Giovanni Colonna, for taking the time to answer the question. I agree with your explanation.
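For context on the adjusted p-values mentioned in the question, here is a sketch of the standard Bonferroni and Benjamini-Hochberg adjustments. DAVID's exact implementation may differ in details. Both corrections depend on how many terms are tested, so the same raw p-value can receive a different adjusted value when the input gene list changes.

```python
def bonferroni(pvalues):
    """Bonferroni: multiply each p-value by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.005]
print(bonferroni(raw))
print(benjamini_hochberg(raw))
```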
  • asked a question related to Bioinformatics
Question
4 answers
Over the last few months, I have come across several posts on social media where scientists, researchers and even universities are flaunting their rankings in the AD Scientific Index https://www.adscientificindex.com/.
When I visited the website, I was surprised to discover that they charge a fee (~24-30 USD) to add an individual researcher's information.
So I started wondering whether it is another "predatory" ranking scam.
What's your opinion in this regard?
Relevant answer
  • asked a question related to Bioinformatics
Question
7 answers
Hello,
I have very little knowledge of bioinformatics, and part of my research project is based on the analysis of proteomics and metabolomics data. However, I am struggling to find resources (webinars, courses, websites, ...) to help me get started with understanding and analyzing my data. I would appreciate any suggestions.
Thank you!
Relevant answer
Answer
Did you perform the experiment yourself? What kind of digestion did you use? Do you have the raw files? How familiar are you with LC-MS/MS? Do you have access to any proprietary software, e.g. ProteinScape or Proteome Discoverer? Would you like to analyze the data yourself? For proteomics data, MaxQuant is a wonderful and user-friendly resource, and a number of videos are available. Besides, the manual available on its website is quite self-explanatory. For metabolomics data, MetaboAnalyst is the analogous software. In any case, familiarize yourself with the jargon.
  • asked a question related to Bioinformatics
Question
4 answers
Has anyone been able to find the exact sequences of those Ur genes? Have they already been sequenced or not?
If anyone found them, where?
I am new to this field, so I have many difficulties finding sequences in bioinformatics databases. I would really appreciate any help.
Relevant answer
Answer
Take the Ur gene sequences of species closely related to common bean from the NCBI Nucleotide database, and BLAST them against the genome.
Look for protein-coding regions in the sequences obtained from the BLAST results.
Then check again by BLASTing them against the NCBI nr database; you should get Ur gene hits.
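The "look for protein-coding regions" step can be approximated with a simple open reading frame (ORF) scan of the BLAST hit region. A minimal sketch, assuming a plain nucleotide string and the standard genetic code. It does not model introns, so for plant genomic sequence a dedicated gene-prediction tool would be more reliable.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def find_orfs(seq, min_codons=100):
    """Return (strand, start, end) for ATG...stop ORFs on both strands.

    Coordinates are 0-based and end-exclusive, relative to the strand scanned
    (for '-', positions refer to the reverse complement).
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first ATG opens a candidate ORF
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, start, i + 3))
                    start = None
    return orfs


print(find_orfs("CCATGAAATTTTAGCC", min_codons=3))
```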
  • asked a question related to Bioinformatics
Question
1 answer
Developing bioinformatic pipelines
Relevant answer
Answer
You don't necessarily need that. Check this...
Zhang et al. cfDNApipe: a comprehensive quality control and analysis pipeline for cell-free DNA high-throughput sequencing data. Bioinformatics 37(22), 4251–4252 (2021). https://doi.org/10.1093/bioinformatics/btab413
Larson et al. A comprehensive characterization of the cell-free transcriptome reveals tissue- and subtype-specific biomarkers for cancer detection. Nat Commun 12, 2357 (2021). https://doi.org/10.1038/s41467-021-22444-1
  • asked a question related to Bioinformatics
Question
10 answers
Can I dock multiple ligands in Autodock (either sequentially or simultaneously)?
Relevant answer
Answer
Hi Nitin Kulhar, thanks for the message.
I solved my problem at the time and published an article on it:
Thanks Again.
  • asked a question related to Bioinformatics
Question
2 answers
Hi, I want to predict post-translational modifications, specifically phosphorylation. I found lots of websites like PHOSIDA and PhosphoSitePlus. I am curious whether there is any Python code for phosphorylation prediction. If you have any, could you share the GitHub link?
Relevant answer
Answer
Thank you, Shaban Ahmad.
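As a very basic starting point in Python (not a trained predictor like those behind the web tools mentioned), one can scan a protein sequence for a kinase consensus motif. This sketch assumes the classic PKA consensus R-R-x-S/T; real predictors combine many kinase-specific models and structural context.

```python
import re

# PKA consensus R-R-x-S/T; the lookahead lets overlapping motifs all be found.
PKA_MOTIF = re.compile(r"(?=RR.([ST]))")


def scan_pka_sites(protein):
    """Return 1-based positions of S/T residues that match R-R-x-S/T."""
    return [m.start(1) + 1 for m in PKA_MOTIF.finditer(protein.upper())]


print(scan_pka_sites("MRRASRRGT"))
```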
  • asked a question related to Bioinformatics
Question
4 answers
There are rust-resistance genes designated with the Ur- symbol, but I can't find their sequences in databases such as NCBI and Phytozome. If those genes have not been sequenced yet, I want to know whether there is any approach, other than those mentioned above, to find their sequences.
Relevant answer
Answer
Alternatively, you can take the same gene from a different organism and BLAST that sequence against the whole-genome sequence of your query species; that will extract your query gene sequence.
  • asked a question related to Bioinformatics
Question
8 answers
What are the limitations and disadvantages of real-time PCR (qPCR)?
What is a more specific and sensitive technique that can be used in the laboratory instead, particularly in cancer diagnosis?
Relevant answer
Answer
Limitations of End-Point PCR
Agarose gel results are obtained from the end point of the reaction. End-point detection is very time-consuming, and results may not be obtained for days. Results are based on size discrimination, which may not be very precise. The end point is also variable from sample to sample: while gels may not be able to resolve these variations in yield, real-time PCR is sensitive enough to detect them. Agarose gel resolution is poor, about 10-fold, whereas real-time PCR can detect as little as a two-fold change.
Some of the problems with end-point detection:
• Poor precision
• Low sensitivity
• Short dynamic range (< 2 logs)
• Low resolution
• Non-automated
• Size-based discrimination only
• Results are not expressed as numbers
• Ethidium bromide staining is not very quantitative
• Post-PCR processing
As in conventional PCR, there are three main steps in real-time PCR:
Denaturation
Annealing
Extension
Denaturation occurs at 94°C, where the double-stranded DNA melts into two single strands.
These single strands are the sites where the primers anneal in the next step of the amplification.
Annealing occurs at 55°C to 66°C, where the sequence-specific primers bind to the single-stranded DNA. Along with them, the fluorescent dye or probe also binds to the DNA.
Extension occurs at 72°C, the temperature at which Taq DNA polymerase is most active. In this step, Taq adds dNTPs to the growing DNA strand.
Note: if the amplicons are short, the extension step can be combined with the annealing step (for real-time PCR only).
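The two-fold sensitivity mentioned above follows from exponential amplification: with efficiency E (1.0 meaning perfect doubling per cycle), a difference of ΔCt cycles corresponds to a (1 + E)^ΔCt difference in starting template. A minimal sketch, including the widely used 2^-ΔΔCt relative quantification, which assumes near-100% efficiency for both the target and reference assays:

```python
def fold_change_from_ct(ct_a, ct_b, efficiency=1.0):
    """Ratio of starting template in sample A to sample B from their Ct values.

    Each cycle multiplies the template by (1 + efficiency), so a sample that
    crosses the threshold dCt cycles earlier started with (1 + E) ** dCt more.
    """
    return (1.0 + efficiency) ** (ct_b - ct_a)


def ddct_fold_change(ct_target_test, ct_ref_test, ct_target_ctrl, ct_ref_ctrl):
    """2^-ddCt relative expression (test vs control), assuming ~100% efficiency."""
    ddct = (ct_target_test - ct_ref_test) - (ct_target_ctrl - ct_ref_ctrl)
    return 2.0 ** (-ddct)


print(fold_change_from_ct(24.0, 25.0))  # 2.0: one cycle earlier = twice the template
print(ddct_fold_change(22.0, 18.0, 25.0, 18.0))  # 8.0
```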
  • asked a question related to Bioinformatics
Question
3 answers
I have docked complexes generated by flexible docking with AutoDock Vina. To some extent, flexible docking introduces poor rotamers in the residues the user specifies as flexible. Is it possible to optimize the docked complexes prior to molecular dynamics simulation? If yes, kindly mention the tools to do so.
Thanks
Relevant answer
Answer
It depends on which software you'll use to perform the MD simulations. Most GROMACS tutorials show how to perform energy minimization (which can relax these poor geometries). Usually, the steps for an MD run are: minimization, equilibration, and production (the MD itself).
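To illustrate what the minimization step does, here is a toy steepest-descent minimizer for a single Lennard-Jones pair distance, with an adaptive step loosely modeled on the scheme GROMACS describes for its steepest-descent integrator (grow the step after an accepted move, shrink it after a rejected one). It is an illustration under simplifying assumptions (one degree of freedom, reduced units), not a substitute for running minimization in GROMACS.

```python
def lj_energy(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones pair energy 4*eps*((sigma/r)**12 - (sigma/r)**6)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_force(r, epsilon=1.0, sigma=1.0):
    """Force along r, F = -dE/dr (positive means the pair pushes apart)."""
    sr6 = (sigma / r) ** 6
    return 24.0 * epsilon * (2.0 * sr6 * sr6 - sr6) / r


def steepest_descent(r, step=0.01, max_steps=10000, f_tol=1e-6):
    """Move downhill along the force until |F| < f_tol (or the step vanishes)."""
    energy = lj_energy(r)
    for _ in range(max_steps):
        force = lj_force(r)
        if abs(force) < f_tol or step < 1e-14:
            break
        trial = r + step * (1.0 if force > 0 else -1.0)  # fixed-length move along F
        trial_energy = lj_energy(trial)
        if trial_energy < energy:
            r, energy = trial, trial_energy
            step *= 1.2  # accepted: be a bit bolder
        else:
            step *= 0.2  # rejected: overshot, take a smaller step
    return r, energy


print(steepest_descent(1.5))
```

Starting from either side, the distance should settle near r = 2^(1/6)σ ≈ 1.1225σ with energy close to -ε.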
  • asked a question related to Bioinformatics
Question
3 answers
I have to design a bioinformatics study to perform a meta-analysis of miRNA data. How should I approach it, which databases should I use, and do I need any computational approaches?
Any help is appreciated.
Relevant answer
Answer
I agree with Mohammad Sholeh.
There are different databases for different diseases, so you can use multiple gene expression datasets for the particular disease you are working on, report the expression pattern and clinical associations, and call it a meta-analysis; or you can do a literature-based meta-analysis. It depends.
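If you combine per-dataset results for the same miRNA (for example, log fold changes and their standard errors from several expression datasets), a common approach is inverse-variance weighted pooling. A minimal fixed-effect sketch, assuming the effects are on the same scale across datasets; with substantial heterogeneity between datasets, a random-effects model would be more appropriate.

```python
import math


def fixed_effect_meta(estimates, std_errors):
    """Inverse-variance (fixed-effect) pooling of per-dataset effect sizes.

    Returns (pooled_estimate, pooled_se, z, two_sided_p).
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, estimates)) / total
    pooled_se = math.sqrt(1.0 / total)
    z = pooled / pooled_se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return pooled, pooled_se, z, p


# e.g. log2 fold changes of one miRNA from two datasets (illustrative numbers)
print(fixed_effect_meta([1.0, 2.0], [1.0, 1.0]))
```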
  • asked a question related to Bioinformatics
Question
2 answers
Good morning,
Please I have an issue with my run and I need some help.
We ran 96 metagenomic samples on a MinION Mk1B (short-read 16S, 400 bp amplicons). The run lasted 72 hours, and the output was 9.7 gigabases (the FAST5 files are 241 GB). After this, base calling was started.
After 24 hours, base calling was only 17% complete, at which point we aborted it to run it on our server.
Four days later, less than 20% is done. Why is it taking so long? We are using Guppy v5.
What is the expected output size for such runs?
Does analysis usually take this long?
Is there a time limit by which we should stop the runs?
Thank you in advance.
Relevant answer
Answer
In my experience, yes, base calling with the Guppy basecaller takes many days. However, it depends on which Guppy version you use: the CPU version is much slower than the GPU version. The number of threads you use also affects the speed.
I don't think there is a time limit. You just need to wait until it reaches 100% to get all your data converted to FASTQ, or you may need to adjust the base-calling parameters or change the program version and re-run Guppy to make it faster.
  • asked a question related to Bioinformatics
Question
1 answer
Hi all,
I'm working with an RNA-seq data set consisting of a large number of samples, sequenced at around 50-80M reads. There's a bit of uncertainty as to what the precise experimental workflow was for generating these data, but my best understanding at the moment is that the TruSeq RNA sample preparation kit was used (https://www.illumina.com/documents/products/datasheets/datasheet_truseq_sample_prep_kits.pdf).
This kit starts with total RNA, uses oligo-dT beads to bind polyA+ mRNA, then fragments the mRNA and carries out cDNA synthesis with random hexamer primers.
The data I've seen thus far show a very strong bias towards the 3' end of transcripts, in some cases so extreme that only the exons at the very 3' end are covered, with the rest of the regions having close to no reads at all. This bias is particularly pronounced in genes with long transcripts.
I'm aware that using oligo-dT priming is known to introduce a 3' bias into RNA-seq data as the reverse transcriptase will not always be processive enough to reverse transcribe in one go, but I'm at a loss to explain why the approach above might generate 3' bias if random hexamers were used.
Could anyone suggest any ideas as to what the possible causes of 3' bias in RNA-seq data might be? Are there any causes other than oligo-dT priming?
Would also really appreciate a link to a paper if one exists. Thank you!
Relevant answer
Answer
This could be due to the mRNA being somewhat fragmented even before the polyA+ capture. RNA is fragile stuff.
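One way to quantify the 3' bias described in the question is to bin per-base coverage along each transcript and compare the 3'-most bins with the 5'-most bins (similar in spirit to RSeQC's gene body coverage plot). A minimal sketch, assuming you already have a per-base coverage list in transcript orientation (i.e., reversed for minus-strand genes):

```python
def gene_body_bins(coverage, n_bins=10):
    """Mean coverage in n_bins equal-width bins, ordered 5' to 3'."""
    length = len(coverage)
    if length < n_bins:
        raise ValueError("transcript shorter than the number of bins")
    bins = []
    for b in range(n_bins):
        start = b * length // n_bins
        end = (b + 1) * length // n_bins
        bins.append(sum(coverage[start:end]) / (end - start))
    return bins


def three_prime_bias(coverage, n_bins=10, edge_bins=2):
    """Mean coverage of the 3'-most bins divided by that of the 5'-most bins."""
    bins = gene_body_bins(coverage, n_bins)
    five = sum(bins[:edge_bins]) / edge_bins
    three = sum(bins[-edge_bins:]) / edge_bins
    return three / five if five > 0 else float("inf")


print(three_prime_bias(list(range(100))))  # coverage rising toward the 3' end
```

A ratio near 1 indicates even coverage; comparing it across samples and between short and long transcripts may help judge whether degradation before poly(A) capture is a plausible explanation.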
  • asked a question related to Bioinformatics
Question
1 answer
TL;DR: I’m trying to convert gene IDs of an obscure MRSA strain from Ensembl Bacteria to KEGG.
Hello,
I’m trying to do a pathway enrichment analysis of MRSA strain 107 using GSEA. I have gene expression data that are associated with the gene IDs from Ensembl Bacteria. I plan to use KEGG as my pathway database.
GSEA requires a .gmt file of the gene IDs/enrichment data (of which the gene IDs are from Ensembl), then requires a pathway file (from KEGG). If I try to do the analysis with both of these files, the gene IDs don’t match up, so GSEA can’t do it.
My question is whether there’s a way to convert these gene IDs specifically with these strains of MRSA from Ensembl Bacteria to a site like KEGG. Here are the resources I’ve already tried:
DAVID
db2db
Syngoportal
g:Convert
Metascape
BioMart from Ensembl
AnnotationDbi
All these are tools that work, but they don’t include my strain. How should I convert these Ensembl Bacteria gene IDs? Is there another option I don’t know about?
PS. I don’t need to use KEGG; if a different pathway database works, that would also be acceptable.
Relevant answer
Answer
If you're having an issue finding an exact ID match, you can try this method.
Collect all protein sequences of the strain and run BlastKOALA or GhostKOALA (tools provided by KEGG). They will assign KEGG Orthology (KO) IDs, which can then be used for pathway analysis.
Thank you
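The KO assignments can then be turned into pathway gene sets for GSEA. A minimal sketch, assuming the BlastKOALA/GhostKOALA result has been saved as a two-column, tab-separated text file (gene ID, KO ID, with the KO column empty for unassigned genes) and that you have a KO-to-pathway mapping, for example from the KEGG REST endpoint https://rest.kegg.jp/link/pathway/ko (strip the "ko:" and "path:" prefixes first). Because the gene IDs are the ones you submitted, they should match your expression data.

```python
def parse_ko_assignments(lines):
    """Parse 'gene<TAB>KO' lines; genes without a KO are skipped."""
    gene_to_ko = {}
    for line in lines:
        parts = line.rstrip("\r\n").split("\t")
        if len(parts) >= 2 and parts[1].strip():
            gene_to_ko[parts[0].strip()] = parts[1].strip()
    return gene_to_ko


def build_gmt(gene_to_ko, ko_to_pathways, min_size=1):
    """Group genes into pathway gene sets and return GMT-formatted lines."""
    sets = {}
    for gene, ko in gene_to_ko.items():
        for pathway in ko_to_pathways.get(ko, ()):
            sets.setdefault(pathway, []).append(gene)
    lines = []
    for pathway in sorted(sets):
        genes = sorted(sets[pathway])
        if len(genes) >= min_size:
            # GMT row: set name, description, then member genes, tab-separated.
            lines.append("\t".join([pathway, "NA", *genes]))
    return lines


# Hypothetical gene IDs and mapping, for illustration only.
g2k = parse_ko_assignments(["SA_0001\tK00001\n", "SA_0002\t\n", "SA_0003\tK00002\n"])
print(build_gmt(g2k, {"K00001": ["ko00010"], "K00002": ["ko00010", "ko00020"]}))
```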
  • asked a question related to Bioinformatics
Question
4 answers
I genotyped 92 patients and then calculated the allele frequencies. I get chi2 = 29, which is high, so I have to reject the null hypothesis and conclude that my population is out of Hardy-Weinberg equilibrium. Is this normal? If not, what can I do?
Relevant answer
Answer
Dear Suzanne,
If you are confident about your statistics, one or more assumptions behind the Hardy-Weinberg equilibrium must be false. It is most likely that your sampling of patients is not random and does not reflect the real population structure that they belong to.
There are two interesting possibilities.
First, there might be something unique about your group of patients that is linked to the alleles you have been investigating: because the patients all have the same type of medical condition, the alleles you measured (if they somehow underlie the condition) may be in disequilibrium. You could test this (perhaps) by looking at a second set of alleles that have nothing to do with the medical condition; we would expect these to be in equilibrium.
Second, all of your patients may be from a population of people in which the alleles you are investigating are in disequilibrium (and the alleles are not directly linked to the medical condition they may all have). This might be the case if all of the patients were in a regional hospital serving an area which had little migration and mixing, but less likely if they were in a metropolitan hospital that serves a diverse city. You could test this by looking at other alleles not linked to the medical condition, and in this case we would expect them to be in disequilibrium too.
Regards, Andrew
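For reference, the chi-square test itself can be checked with a short script. A minimal sketch for one biallelic locus with genotype counts AA, AB and BB (1 degree of freedom); with small genotype counts, an exact test is often preferred.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Pearson chi-square test of Hardy-Weinberg equilibrium, one biallelic locus.

    Returns (chi2, p_value) with 1 degree of freedom
    (3 genotype classes - 1 - 1 estimated allele frequency).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    if p in (0.0, 1.0):
        raise ValueError("monomorphic locus: HWE test is undefined")
    q = 1.0 - p
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Chi-square survival function with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


print(hwe_chi_square(25, 50, 25))  # (0.0, 1.0): exactly at equilibrium
```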
  • asked a question related to Bioinformatics
Question
4 answers
Hello, I am trying to apply vcftools --diff in order to extract the different variants between two VCF files.
vcftools --vcf marked_I_tumor-pe.vcf --diff marked_I_normal-pe.vcf --diff-site --out t_v_n
I am getting this as result :
VCFtools - 0.1.16
(C) Adam Auton and Anthony Marcketta 2009
Parameters as interpreted:
--vcf marked_I_tumor-pe.vcf
--out out.diff.sites
--diff marked_I_normal-pe.vcf
--diff-site
Comparing sites in VCF files...
Found 75584 sites common to both files.
Found 419593 sites only in main file.
Found 84102 sites only in second file.
Found 2908 non-matching overlapping sites.
After filtering, kept 498085 out of a possible 498085 Sites
Run Time = 6.00 seconds
0
I want to extract these 419593 sites which only belong to the main file (the first file) do you know if there is a way to do that? Can these sites that I want to extract be in a new vcf file? If you could help me, I would be more than thankful!
Thanks
Relevant answer
Answer
Run the vcf-isec command along with the -c option. For example, to get the variants present only within your tumour samples, try inputting "vcf-isec -c marked_I_tumor-pe.vcf marked_I_normal-pe.vcf > tumor_only_variants.vcf", and for normal variants only, just swap the order of the input vcf files. Hope the above helps.
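If vcf-isec is not available, a minimal Python sketch can do the same set difference. It assumes uncompressed VCF text and treats two records as the same site only when CHROM, POS, REF and ALT all match (no normalization of multi-allelic records or differently represented indels), so the "non-matching overlapping sites" reported by vcftools would be kept as tumor-only here.

```python
def read_sites(vcf_lines):
    """Map (CHROM, POS, REF, ALT) -> original line for every data line."""
    sites = {}
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        chrom, pos, _id, ref, alt = line.split("\t")[:5]
        sites[(chrom, int(pos), ref, alt)] = line
    return sites


def sites_only_in_first(first_lines, second_lines):
    """Header of the first VCF plus its records whose site is absent from the second."""
    second = read_sites(second_lines)
    header = [line for line in first_lines if line.startswith("#")]
    kept = [line for key, line in read_sites(first_lines).items() if key not in second]
    return header + kept


# Usage (file names from the question):
# with open("marked_I_tumor-pe.vcf") as t, open("marked_I_normal-pe.vcf") as n:
#     out = sites_only_in_first(t.readlines(), n.readlines())
# with open("tumor_only_variants.vcf", "w") as f:
#     f.writelines(out)
```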
  • asked a question related to Bioinformatics
Question
5 answers
I have two VCF files corresponding to the results of healthy tissue and tumor tissue. I want to compare these VCF files and remove the variants they share. More specifically, I want to remove the healthy tissue's variants from the tumor file. Do you have any suggestions on which tool to use or how to do this analysis?
Thanks in advance.
  • asked a question related to Bioinformatics
Question
4 answers
How can I simulate the antimicrobial activity of an antimicrobial peptide against a microorganism (mainly Mycobacterium tuberculosis) using bioinformatics?
Is anyone interested in a partnership?
Relevant answer
Answer
So, then, what is the question? Just write down the (proper) ODE system and solve it. Or, at least, look for steady states and other limiting regimes, and investigate their stability, as a first (but important) step.
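To illustrate that advice, here is a hypothetical toy model (not a validated model of M. tuberculosis or of any particular peptide): logistic bacterial growth minus a peptide-dependent killing term, with first-order peptide decay, integrated with the classical fourth-order Runge-Kutta scheme. All parameter values are placeholders.

```python
def simulate(b0, p0, r=0.5, k_cap=1e9, kill=0.1, decay=0.05, t_end=100.0, dt=0.01):
    """RK4 integration of a toy bacteria (B) / peptide (P) model.

    dB/dt = r*B*(1 - B/K) - kill*P*B   (logistic growth minus peptide killing)
    dP/dt = -decay*P                   (first-order peptide loss)
    Returns (B, P) at t_end.
    """
    def deriv(b, p):
        return (r * b * (1.0 - b / k_cap) - kill * p * b, -decay * p)

    b, p = b0, p0
    for _ in range(int(round(t_end / dt))):
        k1 = deriv(b, p)
        k2 = deriv(b + 0.5 * dt * k1[0], p + 0.5 * dt * k1[1])
        k3 = deriv(b + 0.5 * dt * k2[0], p + 0.5 * dt * k2[1])
        k4 = deriv(b + dt * k3[0], p + dt * k3[1])
        b += dt / 6.0 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        p += dt / 6.0 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
    return b, p


print(simulate(1e6, 10.0))
```

For a constant peptide level P (decay = 0), setting dB/dt = 0 gives the steady states B = 0 and B = K(1 - kill·P/r). The second exists only while kill·P < r, so above that threshold the bacteria-free state B = 0 is the stable one.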
  • asked a question related to Bioinformatics
Question
9 answers
Hi,
I have highly skewed data with many zero RPKM values. I cannot show my data in boxplots because of the skewness, so I tried using the log of the values for the boxplot. In that case, I had to remove the zero values.
What would you suggest for these troublesome zeros? By the way, it is not okay to remove them, because they are almost 15% of the total data, so they tell me something.
Does replacing the zeros with a small number like 0.00001 work? Does it change the Student's t-test result? Even if it works, is it appropriate to show skewed data on a log scale?
Relevant answer
Answer
If you have multiple samples, I would suggest normalizing in R. For example, the TMM method (PMID: 20196867) in edgeR should give normalized log counts-per-million values for all genes, with a minor shift for zero-count genes, which is generally acceptable. The last step would then be converting the output to RPKM for your use.
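On the original question of zeros: a pseudocount keeps zero counts finite on the log scale. A simplified sketch of log2 counts per million with a constant pseudocount (edgeR's cpm(log = TRUE) uses a library-size-scaled prior count, so its values will differ). The choice of pseudocount moves the zeros, so any test run on these values will depend on it.

```python
import math


def log2_cpm(counts, prior_count=1.0):
    """log2 counts-per-million with a constant pseudocount so zeros stay finite."""
    lib_size = sum(counts)
    return [math.log2((c + prior_count) / (lib_size + 2 * prior_count) * 1e6) for c in counts]


print(log2_cpm([0, 10, 990]))
```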
  • asked a question related to Bioinformatics
Question
3 answers
I uploaded a dataset of 15 FASTQ files to the Galaxy bioinformatics portal. I deleted them some time ago to free up some space, and now I want to retrieve them. I can see these files as deleted, but I am not able to restore or download them. Is there any way to get these files back in my account?
Relevant answer
Answer
Sure! Note the "x shown, y deleted" stats displayed just beneath the history title. Clicking on the "y deleted" part will unhide the deleted datasets in your history panel. The ones you haven't purged yet should have an "Undelete it" link, which does just what it says.
  • asked a question related to Bioinformatics
Question
8 answers
I am interested in obtaining protein models from sequences containing amino acids with modified side chains (e.g. S-methyl thiocysteine instead of cysteine).
Relevant answer
Answer
To perform minimisation in Vienna-PTM, you just have to change the "Minimize" field in the Web interface from "disabled" to "enabled", or you download the parameter files to use with a local installation of the GROMACS software package ( http://www.gromacs.org )
The parameterization used in Vienna-PTM is described in these papers:
Parameterization for post-translational modifications
  1. Petrov D*, Margreitter C*, Grandits M, Oostenbrink C & Zagrovic B (2013) "Development and verification of force-field parameters for molecular dynamics simulations of protein post-translational modifications" PLOS Computational Biology, 9(7).
  2. Margreitter C, Reif M & Oostenbrink C (2017) "Update on phosphate and charged post-translationally modified amino acid parameters in the GROMOS force field" Journal of Computational Chemistry, 38(10), 714–720.
There is also SwissSidechain (https://www.swisssidechain.ch), which provides alternative datasets for non-natural amino acids, plus plug-ins for PyMOL (https://www.swisssidechain.ch/visualization/pymol.php) and UCSF Chimera (https://www.swisssidechain.ch/visualization/chimera.php). You can use these programs to replace the canonical amino acids with the modified ones. The method used to generate the necessary files is described in this paper: https://www.swisssidechain.ch/data/Gfeller_2012_1.pdf . Again, you would download the parameter files to use with GROMACS for minimization and molecular dynamics.
  • asked a question related to Bioinformatics
Question
9 answers
Hey guys, I was just wondering whether there are any public clusters for evaluation of NGS data like ? Thanks in advance for your comments!
Relevant answer
Answer
I have had the best experience with the Galaxy collection of bioinformatics tools, https://usegalaxy.eu/ (it has cutting-edge workflows for nearly all types of sequencing: DNA, RNA, long reads/short reads, metagenomics, epigenomics, ...).
Hope this helps
Martin
  • asked a question related to Bioinformatics
Question
8 answers
#MachineLearning Hello researchers, I was wondering what the most remarkable applications of machine learning algorithms in the biological sciences are. My naive thoughts are: DeepMind by Google: protein structure prediction; #Epitope prediction by DTU - Technical University of Denmark in immunology; predicting tumor entities of the central nervous system by DKFZ German Cancer Research Center. Please add to this list; looking forward to an interactive discussion. #structuralbiology #bioinformatics #research #immunology #cancerresearch
Relevant answer
Answer
https://www.nature.com/articles/s41592-021-01380-4 Deep Learning based approaches for protein structure prediction have sent shock waves through the structural biology community. We anticipate far-reaching and long-lasting impact.
  • asked a question related to Bioinformatics
Question
2 answers
Dear friends, colleagues and experts,
I would be really grateful if anyone could help me to find the origin of this chemical compound. Despite many searches I have done on the Internet, I could not find its source, while based on bioinformatic studies this compound shows a great antiviral activity.
3‐[methoxy(phenylsulfanyl)methyl]tetradecan‐1‐ol
Relevant answer
Answer
Please go through the following link for more information. This ligand does not have a common name, and it is not a problem to keep the name as it is in your article.
In any case, you can export the file to SMILES format and search for it. If you can't find any relevant information, you can report it as a novel ligand.
  • asked a question related to Bioinformatics
Question
3 answers
I am trying to make a median joining network in Network and trying to binarize columns. How can I binarize columns which have four nucleotides in the single column? Also is there any specific logic as to how I should proceed with binarization of the data?
Relevant answer
Answer
Dear Arjun Pal ,
Binarized Neural Networks are an intriguing neural network variant that can save memory, time, and energy.
Regards,
Shafagat
  • asked a question related to Bioinformatics
Question
5 answers
Hi!
The following error showed up while I was converting some small molecules from SDF to PDBQT with Open Babel in PyRx:
<<bound method VSModel.PrepareLigandMol of <PyRx.vsModel.VSModel instance at 0x091D8EE0>>
Any tips?
Relevant answer
Answer
Convert your .sdf file directly with Open Babel to .mol2 and then to .pdbqt. Hopefully that works; Open Babel sometimes gives this error when you ask for an unsupported conversion.
Second, if this also doesn't work, convert your .sdf to .pdb and then convert it to .pdbqt using AutoDock.
Good luck Julia Gomes
  • asked a question related to Bioinformatics
Question
4 answers
What are the single WGS articles?
Relevant answer
Answer
Interesting query
  • asked a question related to Bioinformatics
Question
8 answers
Hello everyone
I have numerous molecules with complex configurations like this one. The RDKit Structure Normalizer node in KNIME can't handle these compounds and flags them as "failed". Open Babel also can't do the job.
How can I prepare them (3D generation and energy minimization)?
Thanks
Relevant answer
Answer
You can convert this structure to SMILES, display it in 3D in Chimera, and minimize its energy. You can then save the structure in PDB or MOL2 format.
  • asked a question related to Bioinformatics
Question
3 answers
I am analyzing my small RNA-seq data on Galaxy, and I need to remove all rRNA reads from my data. I downloaded an rRNA reference genome for mice and tried mapping with Bowtie2, but it kept failing. Apparently my rRNA reference file had multiple duplicate names. Where can I get an rRNA reference genome?
Relevant answer
Answer
Hi Sanat Bhadsavle, you can download the entire SILVA LSU/SSU rRNA database (LSU = large subunit rRNA, SSU = small subunit rRNA). For practical reasons, just merge the LSU and SSU FASTA files. You can (additionally) isolate all entries that belong to mice; the SILVA entries contain the whole taxonomy in the name.
Next, map your RNA-seq data against your SILVA database; I would recommend HISAT2 over Bowtie2. The resulting SAM or BAM (mapping) file can be filtered to extract all reads that do not map to your reference:
samtools view -f 4 mapping.bam > unmapped.sam
will store all unmapped reads in the unmapped.sam file.
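On the duplicate-name problem that made Bowtie2 fail: if you isolate the mouse entries yourself, a small script can filter the merged SILVA FASTA and rename duplicate IDs. A minimal sketch, assuming the organism name appears in the header line (as in taxonomy-style SILVA headers) and that sequences may be stored as RNA (U instead of T), which it converts to DNA letters for DNA aligners.

```python
def filter_and_dedupe_fasta(lines, keep_substring=None, rna_to_dna=True):
    """Yield FASTA lines, keeping only records whose header contains
    keep_substring (if given) and renaming duplicate IDs (first word of the
    header) to ID_dup1, ID_dup2, ... so index builders see unique names."""
    seen = {}
    keep = False
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith(">"):
            header = line[1:]
            keep = keep_substring is None or keep_substring in header
            if not keep:
                continue
            parts = header.split(maxsplit=1)
            seq_id = parts[0] if parts else "seq"
            count = seen.get(seq_id, 0)
            seen[seq_id] = count + 1
            if count:
                seq_id = f"{seq_id}_dup{count}"
            description = f" {parts[1]}" if len(parts) > 1 else ""
            yield f">{seq_id}{description}"
        elif keep:
            yield line.upper().replace("U", "T") if rna_to_dna else line


# Usage (file names are placeholders):
# with open("SILVA_merged.fasta") as src, open("mouse_rRNA.fasta", "w") as dst:
#     for out_line in filter_and_dedupe_fasta(src, "Mus musculus"):
#         dst.write(out_line + "\n")
```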
Cheers
Roman
See here for more information:
  • asked a question related to Bioinformatics
Question
6 answers
Hi all,
Are there any good books or tutorials on the concepts underpinning manual inspection and editing of alignments produced by alignment algorithms? This applies to both pairwise alignments and MSAs; I suspect the difference between the two may be stark, but I'm not sure.
Warm regards,
Michael
  • asked a question related to Bioinformatics
Question
4 answers
Hello!
I would like to BLAST a 120bp nucleotide sequence to find non-identical, similar sequences. Is there a way to tell BLAST to ignore exact matches?
thank you!
Relevant answer
Answer
Thanks everyone for your responses! I ended up just excluding the organism phyla that were returning lots of exact matches
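Another option is to keep the search as is and drop exact matches afterwards. A minimal sketch, assuming tabular output with the default -outfmt 6 columns and the 120 bp query length; it is worth raising -max_target_seqs so that enough non-identical hits remain after filtering.

```python
def drop_exact_hits(tabular_lines, query_length):
    """Filter BLAST -outfmt 6 lines, removing full-length 100% identity hits.

    Default columns: qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore.
    """
    kept = []
    for line in tabular_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        pident = float(cols[2])
        aln_len = int(cols[3])
        if pident >= 100.0 and aln_len >= query_length:
            continue  # exact, full-length match: skip it
        kept.append(line)
    return kept
```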
  • asked a question related to Bioinformatics
Question
5 answers
Hi,
I am interested in which bioinformatics skills companies and academia request most, i.e., which would make it easiest to find a job. Is it DNA sequence analysis, or something else? Can someone suggest skills that most companies request and that give a good chance of getting a job? Is the situation the same in industry and academia, or are different things needed in academia?
What you suggest?
  • asked a question related to Bioinformatics
Question
4 answers
I am working with differentially expressed miRNAs where I only have the name of the mir itself without further details, for example mir-204.
Now, for target prediction using databases, there are mir-204-3p and mir-204-5p, depending on whether the mature miRNA comes from the 3' or 5' arm of the precursor hairpin. Regarding the functionality of both strands, it seems that both could be biologically functional, yet I read somewhere (https://www.biostars.org/p/150526/) that the 5p is the original arrangement and is therefore more likely to be the active one. Is it reasonable to conclude that I should always take the 5p strand, or should I take both into consideration?
Would appreciate your insight on this matter!
Relevant answer
Answer
If you got your miRNA of interest from a previous study and want to know whether it corresponds to 5p or 3p in the current miRBase release, you can do the following: first, look up the old miRNA ID in older miRBase releases and retrieve its sequence; then search miRBase with the retrieved sequence and see which mature miRNA it corresponds to.
In your case, mir-204 (from e.g. miRBase v.17) corresponds to mir-204-5p (miRBase v.22).
  • asked a question related to Bioinformatics
Question
3 answers
I need to superimpose and compare 4 pdb structures. The align and super commands in pymol only overlays 2 structures at a time. What will be the pymol command for superimposing more than 2 structures at a time?
Relevant answer
Answer
In PyMOL you can use the A (Action) menu > align > all to this on your reference object to align all open structures to that structure in a single step.
For multiple structure alignment you can also use VMD MultiSeq (https://www.ks.uiuc.edu/Training/Tutorials/vmd/tutorial-html/node7.html).
  • asked a question related to Bioinformatics
Question
3 answers
Hi,
I'm looking at MaxQuant Evidence output table for my analysis and have following questions:
1. I'm interested in counting the number of peptides (including those that are not unique). After filtering out peptides associated with CON and REV proteins, I counted the number of peptides left. This number is much higher than the number of unique peptides reported in the Summary table, so I'm wondering whether I'm doing the right thing.
2. Also, there are a lot of peptides in the Evidence table without reported MS/MS m/z values. What does this mean?
3. Some other peptides contained NaN values in the "Calibrated - Uncalibrated m/z", "Mass error", "Uncalibrated Mass error" and "Max Intensity m/z" columns. The "Type" for all these peptides was "MSMS". Is a missing precursor the reason?
4. Similarly, some peptides whose "Type" was "MULTI-MATCH" or "MULTI-MATCH-MSMS" had NaN values from the "PIF" column to the "Delta score" column. Why?
5. Last but not least, there were many missing values in the "Intensity" column. Does that mean those peptides were of too low abundance?
Regards,
Jenna
Relevant answer
Answer
As far as I know the "Type" feature only describes the spectrum type. E.g. 'MULTI-MATCH' = MS1 labeling cluster identified by matching between runs. [from: http://www.coxdocs.org/doku.php?id=maxquant:table:evidencetable]
So I assume your 13 RAW files were obtained from multiple measurements of the same sample, or at least it seems that way if you only obtained a single evidence table. I think MQ returns the MULTI-MATCH type if you use "match between runs" for your search. Either way, I would say these peptides are valid, as they were matched between multiple runs.
Are you sure these peptides don't have a precursor ion? What do "Charge", "Retention length" or "Retention time" say for peptides of type "MULTI-MATCH"? If there are values there, you definitely detected a precursor ion.
Again, I would highly recommend addressing such a detailed question to the MaxQuant developers.
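On question 1: as far as I understand the Evidence table, each row is a single feature (roughly, a peptide in one run with one charge and modification state), so counting rows will overcount peptides. A minimal sketch that counts distinct sequences instead, assuming the column names "Sequence", "Reverse" and "Potential contaminant" (older MaxQuant versions may name the last one "Contaminant"):

```python
import csv
import io


def count_peptides(evidence_tsv_text):
    """Count distinct peptide sequences in a MaxQuant evidence table,
    skipping rows flagged '+' as reverse (decoy) or contaminant."""
    reader = csv.DictReader(io.StringIO(evidence_tsv_text), delimiter="\t")
    sequences = set()
    for row in reader:
        flagged = any(row.get(col) == "+" for col in ("Reverse", "Potential contaminant", "Contaminant"))
        if not flagged:
            sequences.add(row["Sequence"])
    return len(sequences)


# Usage (path is a placeholder):
# with open("evidence.txt") as f:
#     print(count_peptides(f.read()))
```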
  • asked a question related to Bioinformatics
Question
5 answers
Dear all,
Can you give me recommendations for the specifications of a bioinformatics computer, such as the processor, RAM, storage, and software for NGS analysis/molecular dynamics?
Thanks in advance
Relevant answer
Answer
It basically depends on the software you are using. Before buying any part, check its benchmark on a reliable website, since not all expensive parts are good. General specifications can be summarized as:
1. Processor: Intel i5 (minimum); i7 or i9 recommended, or an AMD equivalent. Make sure you check the processor's benchmark, since not every i7 or i9 processor is good. You can check CPUs at https://www.cpubenchmark.net/
2. RAM: DDR4 mandatory, 8 GB or more.
3. Storage: You may buy an HDD (cheap) or an SSD (expensive). It is better to buy a 1 TB or larger HDD plus at least a 256 GB SSD, and to install your operating system on the SSD; this will make your system more responsive.
4. Graphics card (GPU): If you need a graphics card, check its compatibility at https://developer.nvidia.com/cuda-gpus (for Nvidia GPUs). Its usefulness depends on the software you are using. If you are using GROMACS for MD simulation, the GPU should have a compute capability of 2.1 or above. Before buying a GPU, check the compute capability, number of CUDA cores, power usage, architecture, etc. A compute capability above 5 will be enough for most software (but prefer higher if you can). More CUDA cores are better. Also check the power consumption and the number of fans, since a GPU can draw a lot of power and may overheat. Check benchmarks here (https://gpu.userbenchmark.com/) or on any other GPU benchmark site. I strongly recommend using a GPU, since it accelerates your calculations.
5. SMPS (power supply): Get an SMPS that can support a sufficient load (depending on your CPU, fans, GPU, etc.).
6. Cooling system: Long runs generate a lot of heat. A single CPU fan may not be enough, so add more fans.
7. Software for MD: for biomolecules or proteins, GROMACS will be good. For other purposes, LAMMPS will do. There are also some paid software packages.
  • asked a question related to Bioinformatics
Question
7 answers
crRNA is responsible for recognizing and binding the sequence next to the protospacer-adjacent motif (PAM), NGG, on the target DNA, whereas tracrRNA is essential for maintaining Cas9 nuclease activity. Most miRNAs do not contain the PAM sequence (5′-NGG-3′) [4]. So how can they be targeted with the CRISPR-Cas system?
Relevant answer
Answer
Thank you, dear Dr Sun Xin, for your suggestion.
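One point that may resolve the question: Cas9 cuts DNA, so the target is the miRNA's genomic locus (including the precursor hairpin and its flanking sequence), not the mature miRNA itself, and an NGG PAM only needs to lie nearby. A minimal sketch that lists candidate SpCas9 protospacers (20 nt directly 5' of an NGG PAM) on both strands of a locus sequence; it does no off-target or efficiency scoring.

```python
def reverse_complement(seq):
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def find_spcas9_targets(seq, protospacer_len=20):
    """List candidate SpCas9 protospacers (20 nt immediately 5' of NGG).

    Returns (strand, start, protospacer, pam) tuples, where start is a
    0-based coordinate on the strand that was scanned.
    """
    seq = seq.upper()
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for i in range(protospacer_len, len(s) - 2):
            pam = s[i:i + 3]
            if pam[1:] == "GG":  # N-G-G
                hits.append((strand, i - protospacer_len, s[i - protospacer_len:i], pam))
    return hits


print(find_spcas9_targets("A" * 20 + "TGG"))
```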
  • asked a question related to Bioinformatics
Question
3 answers
Any suggestions for open-source tools that can run on a standard personal computer would be really appreciated. Thanks in advance.
Relevant answer
Answer
  • asked a question related to Bioinformatics
Question
4 answers
What would be the most suitable tool for candidate gene investigation of a multifactorial disease using Whole-exome sequencing (WES) data?
Although Exomiser is mostly used for rare diseases with Mendelian inheritance patterns, could it be used for multifactorial diseases, for example by changing the frequency filter?
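The effect of relaxing a population-frequency filter can be illustrated outside any particular tool. This is a minimal sketch, not Exomiser's configuration or API: the variant records and the field names `id` and `af` (population allele frequency, 0 to 1) are hypothetical, and keeping variants with no recorded frequency is an assumed convention.

```python
from typing import Dict, List, Optional


def filter_by_population_af(variants: List[Dict], max_af: float) -> List[Dict]:
    """Keep variants whose population allele frequency is <= max_af.

    'af' is a hypothetical field name. Variants with af=None (absent from
    population databases) are kept, which is an assumption of this sketch.
    """
    kept = []
    for v in variants:
        af: Optional[float] = v.get("af")
        if af is None or af <= max_af:
            kept.append(v)
    return kept
```

With a rare-disease style threshold such as 1% only rare variants survive, while a much looser threshold also retains common variants, which is where many risk alleles for multifactorial diseases lie; the price is a far larger candidate list to prioritize.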
Relevant answer
Answer
Great question, thanks for asking.
  • asked a question related to Bioinformatics
Question
3 answers
Hello. I am trying to run a haplotype analysis in PopArt. It was going well until I realized I cannot load previous work in PopArt. I can only export the graphical output as .svg, .png, or .pdf, not as a "network" file that I could reload or edit in the future. I noticed that the work can be saved as a .nex file, and the new file actually has additional lines (the portion starting with "Begin NETWORK"). I think this is supposed to be read by PopArt, but it fails to do so: I get parsing errors when I try to open the new file. I am not sure if there is a way around this, as I am new to the software. Any help would be appreciated. Stay safe, anon!
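If the parsing errors come from the added NETWORK block (an assumption; the thread does not confirm the cause), one workaround is to strip that block so the file contains only the original data blocks again, reopen it, and let PopArt recompute the network. Any manual layout edits would be lost. This is a minimal sketch assuming the block follows the usual Nexus form `BEGIN NETWORK; ... END;`, matched case-insensitively.

```python
import re


def strip_nexus_block(text: str, block_name: str = "NETWORK") -> str:
    """Remove every 'BEGIN <block_name>; ... END;' block from Nexus text.

    Matching is case-insensitive and also accepts ENDBLOCK; as the terminator.
    The non-greedy '.*?' stops at the first END; after the block header.
    """
    pattern = re.compile(
        r"\bBEGIN\s+" + re.escape(block_name) + r"\s*;.*?\bEND(?:BLOCK)?\s*;[ \t]*\n?",
        re.IGNORECASE | re.DOTALL,
    )
    return pattern.sub("", text)
```

Read the saved .nex file as text, pass it through `strip_nexus_block`, and write the result to a new file name so the original is kept.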
Relevant answer
Answer
Great question, thanks for asking.
  • asked a question related to Bioinformatics
Question
2 answers
I have all the images in a single folder and a CSV file containing the image IDs and labels, encoded as 0 or 1. I've completed the code without meta-learning, but I can't figure out how to implement a few-shot learning model in my existing code. Here is my code link.
Please take a look at my code. Any help would be appreciated. Thank you.
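One common few-shot approach is a prototypical network: training and evaluation run in episodes, each with a small labelled support set and a query set, and each query is assigned to the class whose mean embedding ("prototype") is nearest. The sketch below shows only the episode sampling from an `image_id -> label` mapping (as read from the CSV) and the prototype classification step; it assumes embedding vectors come from your existing model (for example its penultimate layer), and the function names are illustrative, not from any library.

```python
import random
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple


def sample_episode(labels: Dict[str, int], k_shot: int, n_query: int,
                   seed: Optional[int] = None):
    """Sample k_shot support and n_query query images per class, without overlap."""
    rng = random.Random(seed)
    by_class: Dict[int, List[str]] = defaultdict(list)
    for image_id, cls in labels.items():
        by_class[cls].append(image_id)
    support, query = [], []
    for cls in sorted(by_class):
        chosen = rng.sample(by_class[cls], k_shot + n_query)
        support.extend((image_id, cls) for image_id in chosen[:k_shot])
        query.extend((image_id, cls) for image_id in chosen[k_shot:])
    return support, query


def class_prototypes(support_embeddings: Sequence[Tuple[Sequence[float], int]]) -> Dict[int, List[float]]:
    """Mean embedding per class from (vector, label) pairs."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = defaultdict(int)
    for vec, cls in support_embeddings:
        if cls not in sums:
            sums[cls] = [0.0] * len(vec)
        sums[cls] = [s + v for s, v in zip(sums[cls], vec)]
        counts[cls] += 1
    return {cls: [s / counts[cls] for s in sums[cls]] for cls in sums}


def nearest_prototype(vec: Sequence[float], prototypes: Dict[int, List[float]]) -> int:
    """Label of the prototype with the smallest squared Euclidean distance."""
    return min(prototypes,
               key=lambda c: sum((a - b) ** 2 for a, b in zip(vec, prototypes[c])))
```

In training, the loss would be computed from the negative distances of query embeddings to the prototypes (for example with softmax cross-entropy), so the embedding network learns to cluster images by class.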
Relevant answer
Answer
Great question, thanks for asking.
  • asked a question related to Bioinformatics
Question
4 answers
Hello,
I am new to this field. I am doing metagenome analysis with shotgun reads. All reads are single-end. The DNA was obtained from human airways. I just want to find the taxon abundances in the samples; then I will estimate diversity and identify the core microbes.
My mapping results are terrible. How can I handle bad mappings? Or should I change the tools I used for the analysis? Which tools are more accurate or sensitive for microbiome analysis? I need any suggestions, please!
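Once taxon abundances are available, alpha diversity per sample can be computed directly from the counts. This is a minimal sketch of two standard indices, Shannon (H = -Σ p ln p) and Gini-Simpson (1 - Σ p²), assuming a mapping from taxon name to read count for a single sample; it does no rarefaction or other normalization.

```python
import math
from typing import Mapping


def shannon_index(counts: Mapping[str, float]) -> float:
    """Shannon diversity H = -sum(p * ln p) over taxa with nonzero counts."""
    total = sum(counts.values())
    h = 0.0
    for c in counts.values():
        if c > 0:
            p = c / total
            h -= p * math.log(p)
    return h


def gini_simpson_index(counts: Mapping[str, float]) -> float:
    """Gini-Simpson diversity 1 - sum(p^2)."""
    total = sum(counts.values())
    return 1.0 - sum((c / total) ** 2 for c in counts.values())
```

Because sequencing depth differs between samples, it is usually worth comparing these indices only after accounting for depth in some way.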
I followed this pipeline:
  1. Assembly was done using Megahit
  2. Short contigs (<200 bp) were removed using prinseq
  3. Read mapping against contigs was performed using BWA
  4. Similarity searches for GenBank, KEGG, , eggNOG were done using Diamond
  5. Binning was done using MaxBin2
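Step 2 (dropping contigs shorter than 200 bp) is simple enough to reproduce or double-check without extra tools. This is a minimal sketch assuming a plain FASTA file (headers starting with `>`, sequences possibly wrapped over several lines); it is not the prinseq implementation.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs; wrapped sequence lines are joined."""
    header: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_contigs(lines: Iterable[str], min_len: int = 200) -> List[Tuple[str, str]]:
    """Keep contigs with length >= min_len."""
    return [(h, s) for h, s in read_fasta(lines) if len(s) >= min_len]
```

An open file handle can be passed directly as `lines`, since iterating over it yields one line at a time.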
You can find my mapping results in the attachment.
Relevant answer
Answer
Dymphan Gonsalves Thank you very much, your answer is very helpful.
  • asked a question related to Bioinformatics
Question
5 answers
The STR markers I am working on are linked to the Cystic Fibrosis Transmembrane Conductance Regulator (CFTR) gene, and I wonder where I should check whether these STRs have been reported previously.
Cheers,
Relevant answer
Answer
Katie A Burnette I realized that the NCBI Probe database is not available anymore. Through the UCSC Genome Browser, I managed to find di- and tri-nucleotide microsatellites.
Do you have any idea how I can find tetranucleotide STR markers that have previously been reported for my gene of interest, CFTR?
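Databases answer whether an STR has been reported; as a complementary check, a downloaded sequence of the region (for example from the UCSC Genome Browser) can be scanned for perfect tetranucleotide tandem repeats. This is a minimal sketch: the minimum of 5 repeat units is an assumed cutoff, imperfect or interrupted repeats are not detected, and the output says nothing about whether a repeat was previously published.

```python
import re
from typing import List, Tuple


def find_tetranucleotide_strs(seq: str, min_repeats: int = 5) -> List[Tuple[int, str, int]]:
    """Return (0-based start, repeat unit, number of units) for perfect 4-nt repeats."""
    if min_repeats < 2:
        raise ValueError("min_repeats must be at least 2")
    seq = seq.upper()
    # A 4-nt unit followed by at least (min_repeats - 1) exact copies of itself
    pattern = re.compile(r"([ACGT]{4})\1{%d,}" % (min_repeats - 1))
    hits = []
    for m in pattern.finditer(seq):
        unit = m.group(1)
        # Skip units that are really mono- or dinucleotide repeats (AAAA, ACAC);
        # note the scan then resumes after that whole match
        if unit == unit[:2] * 2:
            continue
        hits.append((m.start(), unit, len(m.group(0)) // 4))
    return hits
```

Positions are relative to the sequence you pass in, so add the start coordinate of the downloaded region to map hits back to the genome.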
  • asked a question related to Bioinformatics
Question
4 answers
Dear all,
I'm looking for a commercial partner who can determine the log P (partition coefficient) of a protein using bioinformatics tools. The protein has a mass of 2.4 kDa and consists of 22 amino acids with three S-S bridges.
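A log P prediction needs a dedicated model, but a quick sequence-based hydrophobicity estimate is easy to compute. The sketch below calculates the GRAVY score (grand average of hydropathy, the mean Kyte-Doolittle value over all residues). This is a different and much cruder measure than log P, and it ignores the disulfide bridges; it only gives a first indication of how hydrophobic the peptide is.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    unknown = set(seq) - set(KYTE_DOOLITTLE)
    if unknown:
        raise ValueError(f"unsupported residues: {sorted(unknown)}")
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)
```

Positive values indicate an overall hydrophobic sequence and negative values a hydrophilic one.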
Thank you in advance.
Relevant answer
Answer
It depends a lot on what you want to know (and why). But perhaps this paper is of some use to you: Plante, J., & Werner, S. (2018). JPlogP: an improved logP predictor trained using predicted data. Journal of Cheminformatics, 10, 61. https://doi.org/10.1186/s13321-018-0316-5 (see also the papers that cite it in Google Scholar).
Since you are not looking at 'just' a chemical compound but at a peptide, this paper may also be of use: Thompson, S. J., Hattotuwagama, C. K., Holliday, J. D., & Flower, D. R. (2006). On the hydrophobicity of peptides: Comparing empirical predictions of peptide log P values. Bioinformation, 1(7), 237. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1891704/
Again, I'm not entirely sure what you want to know, but if it is primarily about hydrophobicity, you might have a look at, for example:
Best regards.
  • asked a question related to Bioinformatics
Question
9 answers
I am planning a study where I want to select a few potential miRNAs that are responsible for inhibiting protein translation from a particular gene.
I am new to this area, so it would be helpful to get some guidance on how to select miRNAs for a specific gene, and on whether miRNA-mRNA interactions can be predicted with a bioinformatics tool before moving forward.
I have selected a few miRNAs for my gene from TargetScan, miRDB, and miRTarBase, but I would like some other opinions.
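Most target prediction starts from the seed: nucleotides 2-8 of the mature miRNA pair with a complementary site, usually in the 3′ UTR of the mRNA. This is a minimal sketch of finding canonical 7mer-m8 seed matches (exact Watson-Crick pairing to positions 2-8) in a UTR sequence; it ignores other site types, G:U wobble, conservation, and site context, all of which tools such as TargetScan take into account.

```python
from typing import List

_RNA_COMP = {"A": "U", "U": "A", "G": "C", "C": "G"}


def revcomp_rna(seq: str) -> str:
    """Reverse complement of an uppercase RNA sequence."""
    return "".join(_RNA_COMP[b] for b in reversed(seq))


def seed_site(mirna: str) -> str:
    """7mer-m8 target site (5'->3' on the mRNA) for a mature miRNA."""
    mirna = mirna.upper().replace("T", "U")
    return revcomp_rna(mirna[1:8])  # miRNA positions 2-8 (1-based)


def find_seed_matches(mirna: str, utr: str) -> List[int]:
    """0-based start positions of 7mer-m8 sites in the UTR (DNA or RNA input)."""
    site = seed_site(mirna)
    utr = utr.upper().replace("T", "U")
    positions = []
    start = utr.find(site)
    while start != -1:
        positions.append(start)
        start = utr.find(site, start + 1)
    return positions
```

For example, for let-7a-5p (UGAGGUAGUAGGUUGUAUAGUU) the seed GAGGUAG gives the 7mer-m8 site CUACCUC.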
Relevant answer
Answer
Sameh H. Mohamed Yes, different target prediction tools use different algorithms, so their predicted targets differ. To avoid confusion, focus on one tool; you could prefer TargetScan, whose results are well optimized and established.
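If you do compare several sources, a simple way to handle the differences is to rank miRNAs by how many sources list them. Note that miRTarBase collects experimentally validated interactions rather than predictions, so agreement with it carries different weight. This is a minimal sketch; the input lists and miRNA names in the test are hypothetical, and the cutoff of two sources is an assumption.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def consensus_mirnas(predictions: Dict[str, Iterable[str]],
                     min_tools: int = 2) -> List[Tuple[str, int]]:
    """Return (miRNA, number of sources) for miRNAs listed by >= min_tools sources.

    Sorted by descending support, then by name.
    """
    counts: Counter = Counter()
    for _tool, mirnas in predictions.items():
        counts.update(set(mirnas))  # count each miRNA at most once per source
    hits = [(m, n) for m, n in counts.items() if n >= min_tools]
    return sorted(hits, key=lambda x: (-x[1], x[0]))
```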
  • asked a question related to Bioinformatics