Computational Biology - Science topic
A field of biology concerned with the development of techniques for the collection and manipulation of biological data, and the use of such data to make biological discoveries or predictions. This field encompasses all computational methods and theories applicable to MOLECULAR BIOLOGY and areas of computer-based techniques for solving biological problems including manipulation of models and datasets.
Questions related to Computational Biology
I have several lists of gene IDs from multiple species, and I want a compiled FASTA file for each species. Copying each accession and collecting the FASTA sequences by hand looks tedious.
Batch Entrez gives me an error, maybe because the identifiers belong to another database.
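One way to avoid copying accessions by hand is NCBI's E-utilities EFetch endpoint, which accepts a comma-separated list of IDs and can return FASTA text. The minimal sketch below only builds the request URLs per species. It assumes your IDs are nucleotide or protein accessions and that `db` matches them (an assumption you must check). NCBI Gene IDs would typically need to be mapped to sequence records first (for example with ELink), since the Gene database does not serve sequences as FASTA. The species names and accessions in the usage block are hypothetical.

```python
from urllib.parse import urlencode

# NCBI E-utilities EFetch endpoint
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_fasta_urls(ids_by_species, db="protein", batch_size=200):
    """Return {species: [url, ...]}, one EFetch URL per batch of IDs."""
    urls = {}
    for species, ids in ids_by_species.items():
        urls[species] = []
        for start in range(0, len(ids), batch_size):
            params = {
                "db": db,  # assumption: the IDs are accessions from this database
                "id": ",".join(ids[start:start + batch_size]),
                "rettype": "fasta",
                "retmode": "text",
            }
            urls[species].append(EFETCH + "?" + urlencode(params))
    return urls


if __name__ == "__main__":
    # Hypothetical accessions, for illustration only
    example = {"species_A": ["ACC_0001", "ACC_0002"], "species_B": ["ACC_0003"]}
    for species, batch_urls in efetch_fasta_urls(example).items():
        print(species, batch_urls)
```

Each URL could then be downloaded (e.g., with `urllib.request.urlopen`) and the responses concatenated into one FASTA file per species; NCBI asks clients to keep request rates low (about three per second without an API key).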
Is there an updated list of all the approved biological databases with a brief description of each DB?
I am a young bioinformatics student and would like some clues for my project pipeline. Hints and expert answers are welcome. Thanks!
I'm currently trying to figure out the specific interactions between two proteins.
Unfortunately, no experimental structures of these protein complexes are available (e.g., X-ray crystallography or cryo-EM data).
Thus, our research group worked with a computational biology research group to create an AlphaFold-predicted complex structure.
Because there were steric clashes and some inaccuracies in the initially predicted structure, they performed an additional 'refinement' process.
Is MD simulation after this refinement mandatory? That is, is the 'refined' structure accurate enough to predict specific amino acid interactions between the two protein subunits using PDBePISA, etc.?
Or, conversely, is even the MD-simulated structure still not accurate enough for further protein interface analysis?
Can someone help me with this?
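Whatever model you settle on (refined or MD-equilibrated), a first-pass interface listing can be cross-checked against PDBePISA with a simple distance criterion. The sketch below assumes a PDB-format file with fixed-column ATOM/HETATM records and reports residue pairs from two chains that have any two atoms within a cutoff (4.0 Å here, a common but arbitrary choice). Over an MD trajectory, contacts that persist in most frames are arguably more trustworthy than those seen in a single static model.

```python
import math
from itertools import product


def parse_atoms(pdb_text):
    """Yield (chain, resi, resname, (x, y, z)) from fixed-column ATOM/HETATM lines."""
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            chain = line[21]
            resi = int(line[22:26])
            resname = line[17:20].strip()
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            yield chain, resi, resname, xyz


def interface_contacts(pdb_text, chain_a, chain_b, cutoff=4.0):
    """Sorted residue pairs (A residue, B residue) with any atoms within cutoff (Å)."""
    atoms = list(parse_atoms(pdb_text))
    side_a = [a for a in atoms if a[0] == chain_a]
    side_b = [b for b in atoms if b[0] == chain_b]
    pairs = set()
    for a, b in product(side_a, side_b):  # brute force: fine for a single complex
        if math.dist(a[3], b[3]) <= cutoff:
            pairs.add(((a[2], a[1]), (b[2], b[1])))
    return sorted(pairs)
```

For example, `interface_contacts(open("complex.pdb").read(), "A", "B")` (file name and chain IDs hypothetical) returns pairs such as `(("LYS", 1), ("ASP", 5))`.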
I have run a complex of DNA, protein, and RNA in NAMD for 50 ns. However, by the end of the simulation, the base pairing of the DNA and of the DNA-RNA hybrid was completely disrupted. What should I do to get stable DNA and RNA in the simulation? I used the CHARMM36 force field, and the simulation was run at 310 K.
I am trying to find differentially abundant microbes between two conditions. I have relative abundance data but not the absolute read counts.
Is there any method that accepts relative abundance data as input?
Or is there a way to transform this data before use?
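Relative abundances are compositional (each sample sums to a constant), so a common approach from compositional data analysis is a log-ratio transform such as the centered log-ratio (CLR), after which per-taxon tests are easier to defend. A minimal sketch, assuming one sample's abundances as a list; the pseudocount used to handle zeros is an arbitrary choice that does affect the results.

```python
import math


def clr(abundances, pseudocount=1e-6):
    """Centered log-ratio: log(x_i) minus the mean of log(x) within one sample."""
    # assumption: pseudocount only guards against log(0); tune or replace as needed
    logs = [math.log(a + pseudocount) for a in abundances]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


if __name__ == "__main__":
    sample = [0.50, 0.30, 0.15, 0.05, 0.0]  # hypothetical relative abundances
    print([round(v, 3) for v in clr(sample)])
```

Because CLR uses only ratios within a sample, proportions and percentages give the same result (when no pseudocount is needed), which is why it suits data without absolute counts. Each taxon's CLR values can then be compared between conditions, with multiple-testing correction.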
Hi everyone. I am currently working on viral hemorrhagic fevers and need to dock lead molecules with the RNA-dependent RNA polymerase (RdRp) enzyme. My question is: is the structure of RdRp the same for different viruses like Ebola, Dengue, West Nile, etc.? Or is there a specific RdRp for each virus? I have searched the PDB but could not find the Ebola RdRp. Is there any other database where I can find it?
My final-year project is on drug efflux pumps and persistence in methicillin-resistant Staphylococcus aureus, and we are going to focus on persister cells to study the pathway of antimicrobial resistance. My question is: how can I link bioinformatics and some coding to this project without requiring WGS, since that is not an option in our lab? I need a small yet beneficial technique or tool that I can learn and implement by myself. PS: I love programming in general, but I'm still new to bioinformatics, so I need help linking my passion for coding with my field, biotechnology.
I am a master's student in statistics. I have worked in econometrics and have done projects in machine learning. However, I wish to change fields. Can I find a supervisor willing to mentor me in bioinformatics, taking my previous and current research areas into consideration? Or do I need another master's degree in bioinformatics or a related field before I can proceed to a PhD?
I have been asked to check the gene expression patterns of the cells in an RNA-seq dataset after making a principal component analysis (PCA) plot in MATLAB. I have a CSV file with the principal component values, but I am not sure how to perform differential expression analysis using the PC values. Is there a MATLAB function available? Kindly help me. Thanks in advance.
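PC scores summarize samples, not genes, so they cannot by themselves give differential expression; the test is normally run per gene on the normalized (log-scale) expression matrix, with PCA loadings used only to see which genes drive a component. As a rough illustration, assuming two sample groups that you define, here is a per-gene Welch t statistic. Count-based tools such as DESeq2 or edgeR are preferred for RNA-seq, and in MATLAB the Bioinformatics Toolbox function `mattest` performs a similar two-sample t-test, if that toolbox is available to you.

```python
import math
import statistics


def welch_t(group_a, group_b):
    """Welch's t statistic for two independent samples with unequal variances."""
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    se = math.sqrt(statistics.variance(group_a) / len(group_a)
                   + statistics.variance(group_b) / len(group_b))
    return (mean_a - mean_b) / se if se > 0 else 0.0


def rank_genes(expression, samples_a, samples_b):
    """expression: {gene: {sample: log-expression}}; returns genes sorted by |t|."""
    scores = {}
    for gene, values in expression.items():
        a = [values[s] for s in samples_a]
        b = [values[s] for s in samples_b]
        scores[gene] = welch_t(a, b)
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)
```

Each group needs at least two samples; p-values would come from the t distribution (e.g., `scipy.stats.ttest_ind(..., equal_var=False)`) and should be FDR-corrected across genes.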
We have been searching for the anatomical substrate of acupuncture points in the skin. While conducting experiments, we found a new anatomical structure in rat skin. In this structure, the mRNA of a gene of unknown function is very highly expressed. We know the exact nucleic acid sequence of this gene. What experiments need to be done to find out the function of this gene? Can it be done through a computational biology study?
Any comments would be greatly appreciated.
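Functional assignment usually starts computationally from the known sequence: find the likely protein-coding ORF, then search its translation against protein databases (e.g., BLASTP, InterPro) for homologs or domains, before designing wet-lab tests such as knockdown or localization. A minimal sketch of the first step, assuming a plain DNA string of A/C/G/T and only the standard ATG start and TAA/TAG/TGA stops; real gene models may need spliced transcripts, alternative starts, or a dedicated ORF finder.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def orfs_in_frame(seq, frame):
    """Yield ATG...stop ORFs (stop codon included) in one reading frame."""
    start = None
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if start is None and codon == "ATG":
            start = i
        elif start is not None and codon in STOPS:
            yield seq[start:i + 3]
            start = None


def longest_orf(seq):
    """Longest complete ORF over all six reading frames ('' if none)."""
    seq = seq.upper()
    strands = (seq, seq.translate(COMPLEMENT)[::-1])  # forward and reverse complement
    candidates = [orf for s in strands for f in range(3) for orf in orfs_in_frame(s, f)]
    return max(candidates, key=len, default="")
```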
Hello, research community,
I am looking for some open problems in bioinformatics, specifically in the areas of (but not limited to) proteomics and genomics. Since I am new to this area, any useful suggestions, discussion of open problems, and relevant resources are welcome.
I have data (shown in the attached picture) in which the RNA-seq values for various samples appear twice for the same gene.
Now suppose I want to measure this gene (which is haplotypic in nature) in sample 1: how should I treat its RNA-seq values for sample 1? Do I take the average or the median, or should I treat the two versions of the gene as separate genes? I guess biologists could explain this better.
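With only two rows per gene, the mean and the median are identical, so that choice does not matter; the real decision is whether the two rows are parts of one gene's expression (for example, reads split between two haplotypes, where summing gives gene-level expression) or should stay separate for allele-specific questions. A minimal sketch of collapsing duplicate rows, assuming `rows` is a list of (gene, per-sample values) pairs; which aggregation fits depends on what the values are (raw counts vs. TPM) and is a choice you must make.

```python
import statistics
from collections import defaultdict

AGGREGATORS = {"sum": sum, "mean": statistics.mean, "median": statistics.median}


def collapse_duplicates(rows, how="sum"):
    """Merge rows sharing a gene ID, applying `how` sample by sample."""
    grouped = defaultdict(list)
    for gene, values in rows:
        grouped[gene].append(values)
    combine = AGGREGATORS[how]
    # zip(*vals) walks the duplicate rows column by column, i.e. per sample
    return {gene: [combine(col) for col in zip(*vals)] for gene, vals in grouped.items()}
```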
I have RNA-seq/EST data and a genome sequence. I would like to identify intron splice sites with the canonical 5' GT...AG 3' bias.
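Once a spliced aligner (for example HISAT2 or STAR) has mapped the RNA-seq/EST reads to the genome, each gap in a read alignment defines an intron, and the canonical check is simply the first two and last two intron bases. A minimal sketch, assuming 0-based, half-open intron coordinates on the forward genome strand; on the reverse strand a GT...AG intron appears as CT...AC in forward-strand sequence.

```python
def classify_intron(genome, start, end):
    """Return (donor, acceptor, strand) for genome[start:end]; strand None if non-canonical."""
    donor = genome[start:start + 2].upper()
    acceptor = genome[end - 2:end].upper()
    if (donor, acceptor) == ("GT", "AG"):
        strand = "+"
    elif (donor, acceptor) == ("CT", "AC"):  # GT...AG read on the reverse strand
        strand = "-"
    else:
        strand = None
    return donor, acceptor, strand


def canonical_fraction(genome, introns):
    """Fraction of (start, end) introns matching the GT-AG rule on either strand."""
    calls = [classify_intron(genome, s, e)[2] for s, e in introns]
    return sum(c is not None for c in calls) / len(calls) if calls else 0.0
```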
This would help a Newbie who wants to go into bioinformatics and/or computational biology and wants to grow exponentially in the field.
Note: Information on this will be posted in Bioinformatics.co.ke
Add (Y) if you want to be named or (N) if not. Without either, names won't be mentioned.
Do you know of an ISI-indexed open access journal in one of the following topics with a quick review process and a short time to first decision, as well as a short time from revision to acceptance?
I prefer journals with impact factor ranging from 1.5 to 3.3.
genomics data analysis
omics data analysis
Hello friends, today I am raising a question: what are real palindromic DNA sequences? Of course you will say restriction enzyme sites, but through a video available at http://bit.ly/palindromicDNA, I am raising the issue that, in the true sense, mirror repeats are palindromic as defined by standard English dictionaries. There are many unique properties of mirror-repeat DNA, which I will share later. Hopefully the biological scientific community will accept mirror repeats as true English palindromes. So please check out http://bit.ly/palindromicDNA
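The two notions being contrasted can be stated precisely: a restriction-site "palindrome" such as GAATTC reads the same 5'→3' on both strands (it equals its own reverse complement), whereas a mirror repeat reads the same forwards and backwards along a single strand, which matches the dictionary sense. A small sketch, assuming uppercase A/C/G/T strings:

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def is_reverse_complement_palindrome(seq):
    """True if seq equals its reverse complement (restriction-site sense)."""
    return seq == seq.translate(COMPLEMENT)[::-1]


def is_mirror_repeat(seq):
    """True if seq reads the same backwards on one strand (dictionary sense)."""
    return seq == seq[::-1]


if __name__ == "__main__":
    for site in ("GAATTC", "ACGTTGCA"):
        print(site, is_reverse_complement_palindrome(site), is_mirror_repeat(site))
```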
I would like to use the IUPAC names or 2D structures of chemical compounds mentioned in publications that are not freely available. However, these compound names are mentioned in the abstract or supporting information, which is free.
Are there any copyright issues with the journals concerned if I use the IUPAC names or the 2D structures of the compounds in my in silico research?
The structures of these compounds are freely available in chemical databases that are in the public domain.
So is it legal to use the compound structures in my computational work, which is non-commercial in nature?
I am looking for a comprehensive compiled list of variant databases and their dates of development. I am also looking for their respective links to get more details on each database.
I would be grateful if someone could help me with this.
Thank you in advance.
PSSM (position-specific scoring matrix) is one of the key features used for B-cell conformational epitope prediction, but I am confused about how to use it as a feature.
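A common way to turn a PSSM into per-residue features is a sliding window: for residue i, concatenate the PSSM rows of residues i-w to i+w (zero-padded past the chain ends), often after squashing the scores with a logistic function. A minimal sketch, assuming the PSSM is already parsed into an L × 20 list of lists (e.g., from the ASCII PSSM that PSI-BLAST writes with `-out_ascii_pssm`); the window size of 7 is an arbitrary choice. For conformational epitopes, spatial neighbours from the 3D structure are sometimes used instead of sequence neighbours.

```python
import math


def scale(score):
    """Logistic squashing of a raw PSSM score into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-score))


def pssm_window_features(pssm, window=7):
    """One flat feature vector per residue: scaled PSSM rows in a centred window."""
    half = window // 2
    width = len(pssm[0])  # 20 for the standard amino acids
    features = []
    for i in range(len(pssm)):
        vector = []
        for k in range(i - half, i + half + 1):
            if 0 <= k < len(pssm):
                vector.extend(scale(s) for s in pssm[k])
            else:
                vector.extend([0.0] * width)  # padding beyond the chain ends
        features.append(vector)
    return features
```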
Hi everyone, I am using the PyTMs plugin in PyMOL to phosphorylate a particular threonine in my protein. I am having difficulty selecting only one or two residues; it seems I can only phosphorylate all serines/threonines. Does anyone know how to fix this?
Question 1: Am I doing this right? That is, does setting up conda on the server work for virtual screening (AutoDock)?
Question 2: How can I modify the script (submit4.py) to fit my server's requirements?
Please read below for a detailed explanation of the question.
I am new to Virtual Screening.
To learn it, I started with the tutorial “Using AutoDock 4 for Virtual Screening” (PDF attached) (http://autodock.scripps.edu/faqs-help/tutorial/using-autodock4-for-virtual-screening).
I was able to replicate the results (up to exercise 11) on my local machine.
Now I am trying to replicate the section “Using the TSRI cluster: garibaldi” on my college server (page 32 of the attached PDF).
I do not have sudo rights on my college server.
So this is what I did:
1) Installed conda on the server and created a virtual environment there.
2) Installed AutoDock, AutoDock Vina, AutoDockTools, and MGLTools in the conda environment.
3) Downloaded the file “submit4.py” and placed it in the bin directory of my conda environment (I changed the default path in the script; the script of submit4.py is attached).
4) When I launch my jobs, I get this error:
“sh: 7: qsub:Permission denied”.
I traced this problem back to line 32 of the submit4.py script.
The line is:
“ qsub -l cput=23:00:00 -l nodes=1:ppn=1 -l walltime=23:30:00 -l mem=512mb %s.j >> %s ”
**So my questions are:**
Question 1: Am I doing this right? That is, does setting up conda like this work for virtual screening?
Question 2: How can I modify the script (submit4.py) to fit my server's requirements?
The script for submit4.py:
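```python
# Usage: submit4.py stem ndlgs [ndlg_start]
# Writes one job script per AutoDock run and submits it with qsub (PBS/Torque).
import os
import subprocess
import sys
import time

# autodock4 binary inside the conda environment
path = "/home/tushar19221/anaconda3/envs/tushar_env/bin/autodock4"

stem = sys.argv[1]            # basename of the docking parameter file (stem.dpf)
ndlgs = int(sys.argv[2])      # number of docking jobs to launch
ndlg_start = 1
if len(sys.argv) == 4:
    ndlg_start = int(sys.argv[3])

cwd = os.getcwd()
created = time.time()
jobIDsName = "%s.%.2f.jobIDs" % (stem, created)
open(jobIDsName, "a").close()  # creates the file, like `touch`

for i in range(ndlg_start, ndlg_start + ndlgs):
    jobname = "%s.%03d" % (stem, i)
    # Job script: cd to the working directory, then run autodock4
    with open(jobname + ".j", "w") as job:
        job.write("cd %s\n" % cwd)
        job.write("ulimit -s unlimited\n")
        job.write("echo SHELL is $SHELL\necho PATH is $PATH\n")
        job.write("%s -p %s.dpf -l %s.dlg\n" % (path, stem, jobname))
    os.chmod(jobname + ".j", 0o755)  # chmod +x
    # Submit and append the job ID to the jobIDs file
    with open(jobIDsName, "a") as ids:
        subprocess.run(["qsub", "-l", "cput=23:00:00", "-l", "nodes=1:ppn=1",
                        "-l", "walltime=23:30:00", "-l", "mem=512mb",
                        jobname + ".j"], stdout=ids, check=True)

print("Job %s was launched on %d processors with these job IDs:" % (stem, ndlgs))
with open(jobIDsName) as ids:
    print(ids.read())
```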
Thank you for reading.
Your help is highly appreciated.
Hope everyone is having a good day.
I want to learn computational biology. I have a PhD in pharmacology. I have often heard about computational biology/bioinformatics, but I have never had guidance on how to learn it or how to start in this interesting field of research.
It would be very helpful if you can guide me through this.
Have a nice day.
Hello, I'm a PhD student. In my master's thesis, I investigated the cytotoxic, apoptotic, and cell cycle effects of an anticancer drug (Danusertib) on pancreatic cancer cells (CFPAC-1 and Mia-PaCa-2) using xCELLigence and flow cytometry in a cell culture lab.
However, I want to do my PhD thesis with virtual experiments using databases (OMIM, COSMIC, GAD, TCGA) and computing power (perhaps on Amazon Web Services, Google Cloud, or Azure), because of limited funding and because I like working with computers. I don't know where to start researching these things. Can I do meaningful research with these databases? Can anyone give me a tip or advice?
Hi everyone, I am calculating the surface area per lipid over a 200 ns membrane trajectory using the MEMBPLUGIN of VMD, and the calculation is very slow. Is there any other way to calculate the area per lipid for a 200 ns membrane trajectory besides MEMBPLUGIN? If so, please let me know. Alternatively, it would be very helpful to know how to use more processors for this calculation. Please give your kind suggestions.
Thanking you in advance
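If the bilayer is flat, single-component, and has the same number of lipids in each leaflet, the average area per lipid can be obtained from the box dimensions alone, APL = Lx·Ly / N_leaflet, which is far cheaper than a per-lipid tessellation. A minimal sketch, assuming you have already extracted the lateral box lengths for each frame (for GROMACS, e.g., the Box-X/Box-Y terms from `gmx energy`); the numbers below are hypothetical. This gives a leaflet average, not per-lipid areas, so it does not replace the VMD plugin when lipid-resolved areas are needed.

```python
def area_per_lipid(box_x, box_y, lipids_per_leaflet):
    """Per-frame area per lipid (length units squared) from lateral box sizes."""
    return [x * y / lipids_per_leaflet for x, y in zip(box_x, box_y)]


def mean(values):
    return sum(values) / len(values)


if __name__ == "__main__":
    box_x = [6.40, 6.42, 6.38]  # nm, hypothetical frames
    box_y = [6.40, 6.41, 6.39]
    apl = area_per_lipid(box_x, box_y, lipids_per_leaflet=64)
    print("mean APL (nm^2):", round(mean(apl), 4))
```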
If I have only two options, Gasteiger or AM1-BCC, based on which parameters or rules can I select the most appropriate charge scheme for my ligand during the minimization step?
Does this have anything to do with the size of my ligand or of the protein into which I want to dock it?
I have seen some people choose Gasteiger over AM1-BCC, and I am confused why they would prefer an inferior charge algorithm when the semi-empirical option is available.
I have a protein in which I found a mutation that may be disease-associated. Now I want to show the non-bonded interactions of this residue with the surrounding residues, to predict whether any significant decrease in interactions has occurred due to this substitution. How can I find this with Discovery Studio, PyMOL, or other visualization software?
In the FASTA output of Prokka listing gene names, some genes do not have any name ("gene: NA"). My question is whether these genes are hypothetical or simply unnamed.
If the former is the case, how does Prokka determine them?
I'm working in bioinformatics, and my current project is to analyze public data from TCGA and GEO to find novel genes related to the diagnosis or prognosis of cancer. A special feature of TCGA data is the accompanying clinical data, which we can use for further analysis.
I just want to know whether there is any other platform or source of public data like TCGA or GEO, because sometimes I need to validate the data. I can't find another source for validation, and I have no clinical data available for this problem in my hospital.
Thank you very much for reading and sharing experience!!!
Excluding the obvious ones like ChemOffice, SciFinder, or other bulky packages, I am looking for small programs that make life easier, for instance Mendeley (for managing documents) or Quartzy (for managing chemicals and protocols).
I am also desperately looking for similar applications for Android.
I would also like to hear some feedback about Electronic Lab Notebooks (ELNs). How are they working out for you? Would you choose to work in those or do you prefer paper?
I'm a molecular biologist, and I have a few projects coming up in transcriptome and small RNA analysis. Can I get by without knowing any programming, using user-friendly software such as Geneious Prime or another program you can suggest, or is programming an absolute must?
I want to simulate a niosome bilayer with Schrödinger software (molecular dynamics), but first I have to design a proper bilayer. Does anyone know of software or a simulator for designing a bilayer with a given composition?
I took Illumina reads, aligned them to a reference genome using BWA, and obtained the corresponding SAM/BAM files. I have also called SNPs (in VCF format) and tried to use this file to predict synonymous and nonsynonymous sites with snpEff, but this only gives me an N/S ratio, whereas what I really want is the dN/dS ratio. Is there any way to do this from the BWA alignments? I am new to NGS genome assembly, so any tips are much appreciated.
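snpEff classifies variants, but dN/dS also needs the number of synonymous and nonsynonymous sites, because most random codon changes are nonsynonymous. Given those four counts (site counts from the reference coding sequences, e.g., by the Nei-Gojobori method, and observed differences from the VCF), the Jukes-Cantor-corrected ratio is a short calculation. A minimal sketch, assuming the counts are already computed; dedicated tools such as PAML's codeml are more appropriate for formal tests, and within-population SNPs strictly give pN/pS rather than a between-species dN/dS.

```python
import math


def jukes_cantor(p):
    """Jukes-Cantor corrected distance; undefined when p >= 0.75."""
    if p >= 0.75:
        raise ValueError("proportion too high for Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4.0 * p / 3.0)


def dn_ds(n_sites, s_sites, n_diffs, s_diffs):
    """dN/dS from site counts and observed differences (Nei-Gojobori style)."""
    pn = n_diffs / n_sites   # proportion of nonsynonymous sites that differ
    ps = s_diffs / s_sites   # proportion of synonymous sites that differ
    dn, ds = jukes_cantor(pn), jukes_cantor(ps)
    if ds == 0:
        raise ZeroDivisionError("no synonymous differences: dN/dS undefined")
    return dn / ds
```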
A few of us wanted to create a Discord server for biophysics. We intend it to be a common place for discussions and numerical experiments, and possibly to document the results as blogs or other media.
I believe there are many biophysics, computational biophysics, and molecular dynamics enthusiasts here. Here is the server link: https://discord.gg/qRQRq2k
Come and join us. Let us learn together.
I have 14 miRNAs that are related to a particular disease. I want to draw a network like a gene network in GeneMANIA. I can easily draw a network by entering gene names in GeneMANIA, but which software can take miRNA names as input in the same way? Which software is better? I was trying to use Cytoscape, but it requires pre-existing network data (if I am not wrong), and I am not sure whether I can get such network data for miRNAs. Some of the miRNAs are quite new, and some older versions of the software can't recognize them.
Please help me figure out how I can get a network like GeneMANIA's. I can only input different miRNA names and a particular disease. Thanks.
I mean the number of contacts per protein residue with different parts of the lipids. It can more or less be done in GROMACS, but you need to create many index groups for each trajectory, so it is quite a long analysis. I was wondering if there is a Tcl script to do this in VMD.
There is a published substitution matrix for intrinsically disordered proteins that I would like to use for a BLAST search, but I cannot find a program that supports uploading a custom matrix. I prefer to use R for computational biology, but I will use whatever is needed to support the matrix. Any recommendations or tips? Thanks!
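If a BLAST program cannot load the matrix, one workaround for a modest number of sequences is exact local alignment (Smith-Waterman) with the custom matrix; it is slower than BLAST and reports raw scores without BLAST's E-values. In R, Biostrings' `pairwiseAlignment()` accepts a `substitutionMatrix` argument for the same purpose. A minimal Python sketch, assuming the matrix is in the common NCBI text layout (a header row of residue letters, then one labelled row per residue) and a simple linear gap penalty (-4 here, arbitrary; the published matrix may come with its own gap costs).

```python
def parse_matrix(text):
    """Parse an NCBI-style matrix: header of letters, then 'X s1 s2 ...' rows."""
    lines = [l.split() for l in text.splitlines() if l.strip() and not l.startswith("#")]
    letters = lines[0]
    matrix = {}
    for row in lines[1:]:
        for letter, score in zip(letters, row[1:]):
            matrix[(row[0], letter)] = float(score)
    return matrix


def smith_waterman(a, b, matrix, gap=-4):
    """Best local alignment score of a vs b with a linear gap penalty."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            curr[j] = max(0,
                          prev[j - 1] + matrix[(a[i - 1], b[j - 1])],
                          prev[j] + gap,        # a[i-1] aligned to a gap
                          curr[j - 1] + gap)    # b[j-1] aligned to a gap
            best = max(best, curr[j])
        prev = curr
    return best
```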
I've just come across these two algorithms, ACO and PSO, and I was wondering whether there are any available implementations of ACO and PSO as ranking-based feature selection approaches.
Any comments would be appreciated.
Hi, I have a computational biology background and am currently studying how cells are organized in a tissue. Someone told me that my cells are oriented in a specific manner, which resembles a liquid crystal, where the mesogens (molecules) align their long axes with the director. In general, liquid crystals have order parameter values in the range 0.3 < S < 0.8.
In my case, the order parameter is negative (-0.3), which means the cells' short axes are aligned with the director. What should I conclude from this about the morphology of the cells? Any help will be appreciated. Thanks.
$S = \frac{1}{2}\langle 3\cos^2\theta - 1 \rangle$, where $\theta$ is the angle between the director and the molecule's long axis.
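The value can be put in context with the same formula: S = 1 when all long axes are parallel to the director, S = 0 for random orientation in 3D, and S = -0.5 when every long axis is perpendicular to the director, so S = -0.3 suggests a substantial tendency for the long axes to lie perpendicular to the director. If the angles are measured in a 2D tissue section, the 2D nematic order parameter ⟨cos 2θ⟩ (ranging from -1 to 1) is often more appropriate than the 3D formula. The sketch computes both, assuming angles in radians.

```python
import math


def order_parameter_3d(angles):
    """S = 1/2 <3 cos^2(theta) - 1> for angles (radians) to the director."""
    return sum(0.5 * (3 * math.cos(t) ** 2 - 1) for t in angles) / len(angles)


def order_parameter_2d(angles):
    """2D nematic order <cos 2 theta>: 1 parallel, 0 random, -1 perpendicular."""
    return sum(math.cos(2 * t) for t in angles) / len(angles)
```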
We are interested in developing a method for predicting siRNAs, so we need a large set of siRNAs to develop models. I would highly appreciate it if you could suggest the best database or databases on siRNA. This will help us create a large dataset that covers all experimentally characterized siRNAs. Please also suggest the best (latest) prediction method for siRNA. Do you think there is a possibility of developing a better prediction method, or is this field already saturated?
I have gene expression data from different conditions in different studies. Instead of using the actual TPM values for the Pearson correlation coefficient (PCC) calculation, I decided to use fold change values from the different studies to eliminate study-specific biases. My question is whether using these raw fold change values to identify co-expressed genes is correct, or whether I should perform quantile normalization on the fold change values before using them for the PCC calculation. (Note: the distribution of fold change values differs considerably between studies.)
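Quantile normalization forces every study's fold-change distribution to be identical, which removes the scale differences described above but also any genuine differences in spread; it is usually applied to log2 fold changes rather than raw ratios, because raw ratios are asymmetric around 1. A minimal sketch, assuming each study is a list of log2 fold changes over the same genes in the same order (ties are broken arbitrarily here). A rank-based Spearman correlation is a simpler alternative that is insensitive to distribution differences.

```python
import math
import statistics


def quantile_normalize(columns):
    """columns: list of equal-length lists (one per study, same gene order)."""
    n = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    # mean of the k-th smallest value across studies, for each rank k
    rank_means = [statistics.mean(col[k] for col in sorted_cols) for k in range(n)]
    normalized = []
    for col in columns:
        order = sorted(range(n), key=lambda i: col[i])
        new = [0.0] * n
        for rank, idx in enumerate(order):
            new[idx] = rank_means[rank]
        normalized.append(new)
    return normalized


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
```

For co-expression, the PCC between two genes is then computed across studies/conditions, i.e., on each gene's values taken from the normalized columns.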
I've been using the refine.bio website to download normalized transcriptome data; each downloaded dataset consists of a compressed directory with an expression matrix in .tsv format, its metadata in .tsv format, and an aggregated metadata file in .json format.
I'm trying to associate the expression matrix with its metadata in R, but I don't know how, and I can't find the way in the site's documentation. I only know that I need to read these files with these commands:
```r
library(rjson)  # fromJSON(file = ...) is the rjson signature
expression_df <- read.delim('SRP068114/SRP068114.tsv', header = TRUE,
                            row.names = 1, stringsAsFactors = FALSE)
metadata_list <- fromJSON(file = 'aggregated_metadata.json')
```
but I have no idea how to merge them to generate a fully informative matrix.
Can someone help me, please?
Thank you so much.
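The general idea is to reshape the expression matrix so that samples are rows, then join it to the sample metadata on the sample accession. A sketch of that join in Python, assuming the expression TSV has genes in rows and sample accessions as column headers, and that the metadata TSV has one row per sample with the accession in a column named `refinebio_accession_code` (check the header of your own file; this column name is an assumption). In R the same idea is `t()` on the expression data frame followed by `merge()` on the accession column.

```python
import csv
import io


def merge_expression_metadata(expr_tsv, meta_tsv, id_column="refinebio_accession_code"):
    """Return one dict per sample: its metadata fields plus one entry per gene."""
    expr_rows = list(csv.reader(io.StringIO(expr_tsv), delimiter="\t"))
    samples = expr_rows[0][1:]  # first header cell labels the gene column
    # assumption: id_column holds the same accessions used as expression headers
    metadata = {row[id_column]: row
                for row in csv.DictReader(io.StringIO(meta_tsv), delimiter="\t")}
    merged = []
    for col, sample in enumerate(samples, start=1):
        record = dict(metadata.get(sample, {}))  # empty if the sample lacks metadata
        record["sample"] = sample
        for row in expr_rows[1:]:
            record[row[0]] = float(row[col])
        merged.append(record)
    return merged
```

The two arguments are the file contents as strings, e.g. `open(path).read()` for each TSV.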
I am working on computationally understanding the active and inactive conformations of some proteins. Transitions from the active to the inactive conformation have been reported in the literature using enhanced-sampling MD methods such as metadynamics and REMD, in which energy is added along particular coordinates called collective variables. This makes the sampling somewhat biased.
If I run unbiased atomistic molecular dynamics simulations of several microseconds, will my protein explore the conformational space by crossing the energy barriers? Or will the system remain stuck forever in the first local minimum it reaches?
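Unbiased MD crosses a barrier only if the simulation is long compared with the barrier-crossing time, which grows roughly exponentially with the free-energy barrier (a transition-state-theory-style estimate). Because the prefactor is system-specific and unknown here, the sketch computes only relative slowdowns: at 310 K, each additional ~1.4 kcal/mol of barrier makes crossing about ten times slower. The barrier values in the usage block are hypothetical.

```python
import math

KB_KCAL = 0.0019872  # Boltzmann constant in kcal/(mol K)


def slowdown_factor(barrier_kcal, temperature_k=310.0):
    """exp(dG / kT): how many times slower crossing is than with no barrier."""
    return math.exp(barrier_kcal / (KB_KCAL * temperature_k))


if __name__ == "__main__":
    for barrier in (2.0, 5.0, 10.0, 15.0):  # kcal/mol, hypothetical
        print(f"{barrier:5.1f} kcal/mol -> {slowdown_factor(barrier):.2e} x slower")
```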
Dear research professors and scholars,
I have developed a novel 3D protein structure (a mutant DNA gyrase of antibiotic-resistant E. coli) by homology modeling. Because this type of protein has not been deposited in the Protein Data Bank (www.pdb.com), I built the mutant DNA gyrase model by homology modeling. I would like to deposit this protein in an online protein bank for future research on antibiotic resistance. Please suggest an online site for uploading 3D protein models other than www.pdb.com.
I'm looking for a book on microarray data analysis. I'm a mathematician, and I'm interested in finding a book that gives a framework for microarray data analysis from beginning to end (background correction, normalization, dimensionality reduction, clustering, etc.). I found this: http://www.springer.com/gp/book/9781402072604
Are there more appropriate ones?
How can I study a certain type of mutation together with another type of protein (one that is not a mutation and is totally different from it) using bioinformatics tools? What is your opinion and suggestion about this?
I would like to perform positive selection analysis across mammals. I am mostly interested in positive selection in humans. I have started with around 100 mammal species and am looking for a way to find the ideal number of species to use for detecting selection.
I want a balance in the evolutionary distances between the species I include: neither too close nor too divergent. Previous discussions give a minimum number of species to include in a positive selection analysis; however, little information is given about the maximum number.
What is the appropriate number of species to use in a positive selection analysis, and what would be the maximum? Also, what range of evolutionary distances should be used to detect positive selection while avoiding false positives?
I want to generate a graph showing the relative evolutionary constraint at each position of a given amino acid (protein) sequence.
I came across the attached figure by E.V. Koonin in his book "The Logic of Chance", which he called a "genomescape". Is there any method to measure evolutionary constraint per residue in an amino acid sequence and generate such a graph?
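A common proxy for per-residue constraint is the conservation of each column in a multiple sequence alignment of orthologs, for example Shannon entropy (low entropy = highly constrained); tools such as ConSurf or rate4site estimate per-site evolutionary rates more rigorously using a phylogeny. A minimal entropy sketch, assuming an already computed alignment given as equal-length strings with '-' for gaps; the profile could then be plotted against position (e.g., with matplotlib). Koonin's figure may be based on a different measure.

```python
import math
from collections import Counter


def column_entropy(column, gap="-"):
    """Shannon entropy (bits) of residues in one alignment column, gaps ignored."""
    residues = [r for r in column if r != gap]
    if not residues:
        return float("nan")
    counts = Counter(residues)
    total = len(residues)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def conservation_profile(alignment):
    """Entropy per alignment position; lower values suggest stronger constraint."""
    return [column_entropy(col) for col in zip(*alignment)]
```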
Hi. I have a protein structure, a GPCR with approximately 400 residues. The EC loop contains around 70 residues, which makes it very flexible; it is not the focus of my study, so I do not really need it.
How can I clip the EC loop?
1. How do I clip it?
2. Are there any considerations before clipping?
Please help me with your suggestions.
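In a PDB file, clipping a loop amounts to deleting the coordinate records of that residue range; in PyMOL the equivalent is a `remove` command with a selection such as `chain A and resi 150-220` (residue numbers hypothetical). Before clipping, keep in mind that the deletion leaves two new chain ends, which are often capped (e.g., with ACE/NME groups) or bridged with a short linker so that simulations do not contain artificial charged termini, and that truncating an extracellular region may change how ligands access the binding pocket. A minimal sketch, assuming fixed-column PDB records and ignoring insertion codes:

```python
def clip_residues(pdb_lines, chain, first, last):
    """Drop ATOM/HETATM/ANISOU records of `chain` with first <= resi <= last."""
    kept = []
    for line in pdb_lines:
        is_coord = line.startswith(("ATOM", "HETATM", "ANISOU"))
        if is_coord and line[21] == chain and first <= int(line[22:26]) <= last:
            continue  # residue lies inside the loop being removed
        kept.append(line)
    return kept
```

TER and CONECT records that refer to deleted atoms may need manual cleanup afterwards.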
I just finished setting up my eGPU for speeding up GROMACS MD simulations. I have seen quite a few posts here and there about this, so I thought it would be a good idea to share my experience.
1. Zotac GTX1050TI OC edition- 200$
2. EXP GDC Beast 8.4D mPCIe- 31$
3. A local premium grade 500W power supply- 12$
I had an old, reliable Lenovo Z510 and sacrificed its WLAN card: I put the adapter there and changed the BIOS graphics mode to UMA only. It works fine, and I got almost an 80% boost.
I have modeled a protein and performed MD simulation and docking studies. What other computational studies could be performed in order to target a high-impact journal?
Hi, I am starting my BS senior year in a few months. My major is molecular and cell biology, and I also have a decent background in computational biology tools for analyzing high-throughput sequencing data.
I am interested in pursuing graduate studies in coral reef genomics, biotechnology for coral reef restoration, etc. The problem is that I am somewhat confused and do not know where to start. Can anyone give me advice that could help (recommending a quality lab that works in the field, the contact of a professor in the field who may need to recruit a master's or Ph.D. student, or an online course or textbook that would help me gain the required knowledge)? Please share anything you think may help. Thanks in advance.
Thank you for all your support.
Thank you chandra mohan , Christian Janiesch and Ramin Sedaghat.
I am looking for more published projects that students can benefit from by referring to these documents.
Please share the documents directly at firstname.lastname@example.org or reply to me here.
I am currently trying to model my proteins (antibody fragments) and am looking for a comparison between the two major tools for protein model prediction. I find people recommending one or the other, but I have not found any comparison of their features, reliability, or differences.
It would be very helpful if someone could mention their thoughts on these two.
As the wall time ran out, the production run stopped. Now I have prd.cpt and prd_prev.cpt. What is the difference between them, and which one should be used to restart the simulation? Also, in the tutorial, is topol.tpr a topology file or a binary file produced by grompp?
Several .tpr files were produced during the simulation, such as "#prd.tpr.67#", "#prd.tpr.64#", and "prd.tpr". What is the difference between them? Which one should be used to restart the simulation?
I'm trying to use homology modeling to find the structure of the N-terminus of a protein, which is thought to contribute to its hetero-oligomeric structure. However, I can't find an X-ray structure to use as a template. I use BLAST to search for PDB structures, but none of them seem to have solved the structure of this region, even though it is in their sequence. Is there a way of filtering out structures that do not have this region solved? If not, what would be a good alternative way to solve the structure computationally?
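PDB entries often contain the full sequence in SEQRES while disordered parts are absent from the coordinates (such residues are listed in REMARK 465 in PDB-format files). After downloading candidate templates, you can filter them by checking which residue numbers actually have ATOM records. A minimal sketch, assuming fixed-column PDB records and that your region is given in the entry's own (author) numbering, which may differ from your query's numbering:

```python
def resolved_residues(pdb_text, chain):
    """Residue numbers of `chain` that have at least one ATOM record."""
    return {int(line[22:26]) for line in pdb_text.splitlines()
            if line.startswith("ATOM") and line[21] == chain}


def region_coverage(pdb_text, chain, first, last):
    """Fraction of residues first..last (inclusive) that have coordinates."""
    present = resolved_residues(pdb_text, chain)
    wanted = range(first, last + 1)
    return sum(r in present for r in wanted) / len(wanted)
```

Templates whose coverage of the N-terminal region is zero (or below a threshold you choose) can then be discarded.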
I am having trouble understanding how Weka calculates ROC curves. I am generating machine learning models with four methods, including J48, random forest, and naive Bayes, and when I save the ROC data for each of them, the number of instances varies: for some there are only three instances, for others around a hundred. Shouldn't I get roughly the same number of instances in every case, regardless of the model? I understand that an ROC curve is a plot of the true positive rate vs. the false positive rate, but how does the number of instances come into the picture in Weka?
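The number of rows in an ROC/threshold output generally follows the number of distinct predicted scores, not the number of test instances: a J48 tree assigns the same class probability to every instance that falls in a leaf, so it yields only a few distinct thresholds, while naive Bayes produces nearly unique probabilities per instance. A minimal sketch of threshold-based ROC construction (an illustration, not Weka's own code) that shows this effect:

```python
def roc_points(scores, labels):
    """(FPR, TPR) points, one per distinct score threshold plus the origin."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and not y)
        points.append((fp / negatives, tp / positives))
    return points


if __name__ == "__main__":
    labels = [1, 1, 0, 1, 0, 0]
    tree_like = [0.8, 0.8, 0.8, 0.3, 0.3, 0.3]   # few distinct leaf probabilities
    bayes_like = [0.9, 0.7, 0.6, 0.4, 0.2, 0.1]  # nearly unique per instance
    print(len(roc_points(tree_like, labels)), len(roc_points(bayes_like, labels)))
```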
Dear all, I am working on a knockout strain of E. coli for mixed acid production and have so far gained some understanding of flux balance analysis (FBA). For the next part of the project, I have to experimentally verify the results obtained from FBA. I haven't found much literature in which people have experimentally verified FBA predictions. For this, I have devised the following experimental design:
- Batch reactor (1.5 L working volume); data to be taken only during the exponential phase.
- M9 minimal medium with xylose as the carbon source (no other carbon source and no yeast extract, even though it enhances growth).
- Controlled pH of 6.8.
- Oxygen supply and agitation speed: optimized values for mixed acid production. However, the dissolved oxygen probe reads zero after a few hours of the experiment, which means all the oxygen supplied is consumed immediately.
- Calculating the oxygen uptake rate: not sure, need help.
Please look at the experimental procedure and provide your suggestions, answers, and comments.
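To compare measurements with an FBA model, they are usually expressed as specific rates (mmol per gram dry cell weight per hour). Two pieces can be sketched, assuming exponential-phase data: the specific growth rate μ as the slope of ln(biomass) vs. time, and a specific uptake rate q = μ·(Δsubstrate/Δbiomass). Because the DO reading is zero, the dynamic (gassing-out) method for oxygen uptake is not usable; an off-gas balance with an inert-gas (N2) correction is a common alternative, which requires measuring O2 and CO2 in the exhaust gas. The gas flow is assumed to be at standard conditions (22.414 L/mol), and all inputs are your own measurements.

```python
import math


def specific_growth_rate(times_h, biomass):
    """Least-squares slope of ln(biomass) vs time (1/h); exponential phase only."""
    logs = [math.log(x) for x in biomass]
    mt, ml = sum(times_h) / len(times_h), sum(logs) / len(logs)
    num = sum((t - mt) * (l - ml) for t, l in zip(times_h, logs))
    den = sum((t - mt) ** 2 for t in times_h)
    return num / den


def specific_uptake_rate(mu, substrate_used_mmol_l, biomass_formed_g_l):
    """q = mu / yield = mu * dS/dX, in mmol/(gDW h)."""
    return mu * substrate_used_mmol_l / biomass_formed_g_l


def oxygen_uptake_rate(gas_l_h, liquid_l, y_o2_in, y_co2_in, y_o2_out, y_co2_out,
                       molar_volume=22.414):
    """Volumetric OUR in mmol/(L h) from an inert-gas (N2) balance."""
    n_in = gas_l_h / molar_volume  # mol/h of inlet gas (assumption: standard conditions)
    n_out = n_in * (1 - y_o2_in - y_co2_in) / (1 - y_o2_out - y_co2_out)
    return (n_in * y_o2_in - n_out * y_o2_out) / liquid_l * 1000.0
```

Dividing the volumetric OUR by the biomass concentration (gDW/L) gives the specific oxygen uptake rate that can be compared with, or imposed on, the FBA model.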
I would like to ask whether anybody knows a user-friendly way to BLAST a query sequence (protein or nucleotide) against a custom (user-defined) database of sequences, ideally on Windows :-)
Thank you for your response.
I am running a coarse-grained simulation in GROMACS to understand the folding of a certain protein. I initially tried a triclinic box with -d 1.0 and used -c to center my protein in the solvent box. But the protein moved out of the solvent box during the simulation.
I tried a second time with a cubic box and -d 2.0. It didn't help.
Any suggestions in this regard?
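With periodic boundary conditions, a protein that appears to leave the solvent box has usually just crossed a box face; it is still surrounded by periodic images of the solvent, so this is typically a visualization issue rather than a simulation failure, and a larger -d does not change it. The usual fix is post-processing, e.g., `gmx trjconv -pbc mol -center`. The underlying idea, making a molecule whole with the minimum-image convention, can be sketched as follows, assuming a rectangular box and atoms listed in bonded order:

```python
def make_whole(coords, box):
    """Unwrap consecutive atoms so no bonded step exceeds half a box length.

    coords: list of (x, y, z) in bonded order; box: (Lx, Ly, Lz) of a rectangular box.
    """
    whole = [tuple(coords[0])]
    for atom in coords[1:]:
        prev = whole[-1]
        unwrapped = []
        for c, p, length in zip(atom, prev, box):
            d = c - p
            d -= length * round(d / length)  # minimum-image displacement
            unwrapped.append(p + d)
        whole.append(tuple(unwrapped))
    return whole
```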
I am wondering whether it is possible to align a set of protein sequences (for example, 100 protein sequences) each against each in a user-friendly way, i.e., sequence 1 with sequence 2, sequence 1 with sequence 3, ..., sequence 1 with sequence 100, and so on, finally sequence 100 with sequence 99. Does such a tool exist? Ideally with some graphical output (a heatmap of similarities, etc.).
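For about 100 sequences (4,950 pairs), an all-against-all set of pairwise global alignments is feasible even in pure Python. A minimal sketch, assuming simple match/mismatch/gap scores (+1/-1/-1, arbitrary; a substitution matrix such as BLOSUM62 would be more realistic for proteins) and defining identity as identical columns divided by alignment length, which is one of several conventions. The resulting matrix could be drawn as a heatmap, e.g., with matplotlib's `imshow`.

```python
def global_identity(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch alignment of a and b; returns percent identity (0-100)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + gap, score[i][j - 1] + gap)
    # Traceback from the bottom-right cell, counting identical columns
    i, j, identical, length = n, m, 0, 0
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            identical += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
        length += 1
    return 100.0 * identical / length if length else 0.0


def identity_matrix(seqs):
    """Symmetric all-vs-all percent-identity matrix (upper triangle mirrored)."""
    k = len(seqs)
    mat = [[100.0] * k for _ in range(k)]
    for x in range(k):
        for y in range(x + 1, k):
            mat[x][y] = mat[y][x] = global_identity(seqs[x], seqs[y])
    return mat
```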