Ensemble - Science topic

Explore the latest questions and answers in Ensemble, and find Ensemble experts.
Questions related to Ensemble
  • asked a question related to Ensemble
Question
1 answer
Hello everyone!
I'm calculating protein dimer structure in CNS-solve v1.21 using distance restraints obtained from solution NMR experiments.
During the calculation, most (but not all) structures in the ensemble show two specific residues, one tyrosine and one phenylalanine, broken as shown in the picture. The problem persists even after I remove all restraints associated with these residues.
I reviewed the topology file, but did not find anything suspicious about these residues.
I would greatly appreciate it if you could give me any hints on how to solve this problem.
Relevant answer
Answer
I moved the project to cns_solve version 1.3, and the problem no longer occurred.
  • asked a question related to Ensemble
Question
1 answer
I've set up an emcee EnsembleSampler() with 50 walkers and 500 iterations. However, looking at the resulting traces, I don't think my walkers are sampling the parameter space properly. The walkers don't converge around the true value of the parameter.
The prior on all parameters is uniform, and the initial positions p0 are drawn from a uniform random distribution. I believe something is wrong because the walkers appear to wander across the whole parameter space for most of their iterations without converging on the parameter value.
Graphs below:
Relevant answer
Answer
Yes, walker positions can converge when using an ensemble sampler, such as the affine-invariant MCMC ensemble sampler (e.g., emcee). In this method, a group of "walkers" (or chains) explore the parameter space simultaneously. As the walkers move, they collectively sample the posterior distribution. Over time, if the sampling is efficient and the model parameters are well-behaved, the walker positions converge towards the true posterior distribution. Convergence is typically assessed through diagnostics like the Gelman-Rubin statistic or by visually inspecting trace plots.
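As a rough illustration of the Gelman-Rubin diagnostic mentioned above, here is a minimal sketch in plain Python. The traces are synthetic stand-ins, not emcee output, and the function name is hypothetical. Note that walkers in an ensemble sampler are not independent chains, so this should be treated only as a rough check.

```python
import math
import random
from statistics import mean, variance


def gelman_rubin(chains):
    """Return the Gelman-Rubin statistic (R-hat) for one parameter.

    `chains` is a list of equal-length lists, one per walker/chain.
    """
    n = len(chains[0])
    chain_means = [mean(c) for c in chains]
    # W: average within-chain variance
    w = mean(variance(c) for c in chains)
    # B: between-chain variance, scaled by the chain length
    b = n * variance(chain_means)
    var_hat = (n - 1) / n * w + b / n
    return math.sqrt(var_hat / w)


random.seed(0)
# Hypothetical traces: four chains sampling the same distribution...
mixed = [[random.gauss(0.0, 1.0) for _ in range(500)] for _ in range(4)]
# ...and four chains stuck around different values.
stuck = [[random.gauss(mu, 1.0) for _ in range(500)] for mu in (-3, -1, 1, 3)]

r_mixed = gelman_rubin(mixed)
r_stuck = gelman_rubin(stuck)
print(f"R-hat (mixed): {r_mixed:.3f}")
print(f"R-hat (stuck): {r_stuck:.3f}")
```

Values close to 1 suggest the chains agree with each other; values well above 1 suggest they have not yet settled on a common distribution.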
  • asked a question related to Ensemble
Question
1 answer
Hi,
I need to quantify the expression of CD44v6 in my RNA-seq data. For this, I need the Ensembl ID. Does anyone know what the Ensembl ID for CD44v6 is?
Thank you.
Relevant answer
Answer
There are multiple splice variants of the CD44 gene; CD44v6 denotes one of these variants. The Ensembl identification for the CD44 gene is typically utilized as a reference point when analyzing gene expression data and RNA sequencing data.
The human CD44 gene's Ensembl gene ID is ENSG00000026508. To determine the precise isoform, you might need to investigate at the transcript level, though, as CD44v6 is a particular splice variant. Ensembl offers comprehensive data on various gene transcripts and splice variants.
Use the Ensembl database to search for the CD44 gene in order to locate the specific transcript for CD44v6. Examining the many transcript variations here will allow you to determine which one corresponds to CD44v6.
The precise variant can be found by visiting the Ensembl website and utilizing the gene search function to examine the transcripts under the CD44 gene (ENSG00000026508). This is the Ensembl link for the CD44 gene: Ensembl CD44.
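The transcript list for the gene can also be retrieved programmatically. The sketch below assumes the public Ensembl REST lookup endpoint and assumes that its JSON response carries a "Transcript" list with "id", "display_name" and "biotype" fields; the helper names are hypothetical. Which transcript actually contains the v6 exon still has to be checked by hand.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

ENSEMBL_REST = "https://rest.ensembl.org"


def build_lookup_url(gene_id):
    """URL for the Ensembl REST lookup endpoint, with transcripts expanded."""
    return f"{ENSEMBL_REST}/lookup/id/{quote(gene_id)}?expand=1"


def fetch_gene(gene_id):
    """Download the gene record as a dict (requires network access)."""
    req = Request(build_lookup_url(gene_id),
                  headers={"Content-Type": "application/json"})
    with urlopen(req, timeout=30) as resp:
        return json.load(resp)


def list_transcripts(record):
    """Return (transcript ID, display name, biotype) tuples from a gene record."""
    return [
        (t.get("id"), t.get("display_name"), t.get("biotype"))
        for t in record.get("Transcript", [])
    ]


# Usage (not run here because it needs network access):
# for row in list_transcripts(fetch_gene("ENSG00000026508")):
#     print(row)
```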
  • asked a question related to Ensemble
Question
5 answers
Hi everybody,
I'm looking at a SNP (rs497692). When I enter the rs number in the databases, the alleles are returned as T>A / T>C, but frequencies are given only for T and C, not for A. Why?
Can anyone help me?
Relevant answer
Answer
The situation you’re describing can arise due to several reasons related to how single nucleotide polymorphisms (SNPs) are documented and reported in databases such as NCBI and Ensembl. Here are some possible explanations for why you might not find the allele frequency for the "A" allele in the SNP rs497692:
  1. Low Frequency or Rare Allele: If the "A" allele is extremely rare or has a very low frequency in the population studied, it might not be reported or highlighted in the databases. Databases often focus on more common alleles with higher frequencies.
  2. Population-Specific Data: The allele frequencies reported might be specific to certain populations or datasets. If the "A" allele was not observed or was observed at a very low frequency in the populations studied, its frequency might not be included.
  3. Genotyping Errors or Ambiguities: Sometimes, genotyping errors or ambiguities in older datasets can result in incomplete or inaccurate allele frequency information. It’s possible that earlier studies did not detect the "A" allele reliably.
  4. Historical Data and Updates: The databases might be using historical data that hasn't been updated to reflect more recent findings. New studies might have detected the "A" allele, but the database you are checking might not have incorporated those updates yet.
  5. Minor Allele Reporting: Databases typically report the major (most frequent) and minor (second most frequent) alleles. If "A" is a minor allele that is less frequent than both "T" and "C," it might not be prominently reported.
  6. Database-Specific Annotation: Different databases have different methods and thresholds for reporting allele frequencies. The way NCBI and Ensembl report SNP data can differ, and some alleles might be included in one but not the other based on their criteria.
Steps to Resolve This:
  1. Check Multiple Databases: Look up the SNP in multiple databases such as dbSNP (NCBI), Ensembl, and the 1000 Genomes Project. Each might have slightly different data.
  2. Population-Specific Studies: Check if there are population-specific studies or databases that might have more detailed allele frequency information for different ethnic groups or regions.
  3. Recent Literature: Look for recent research papers or publications that might have studied this SNP in more detail. New findings might not yet be reflected in the major databases.
  • asked a question related to Ensemble
Question
4 answers
I am currently working on a financial project and require an expert in machine learning. Specifically, I need expertise in tree boosting, neural networks, random forests, linear and nonlinear frailty models, as well as ensemble methods. In addition to discussing the work, I will need the source code files of the models and analyses we develop. I hope this clarifies my requirements.
Relevant answer
Answer
I can collaborate.
  • asked a question related to Ensemble
Question
4 answers
I want to compare the ensemble mean and the member mean with ERA5 data as time series.
Relevant answer
Thank you Kishore Ragi, will try that
  • asked a question related to Ensemble
Question
2 answers
An image of a handwritten character is a distribution of pixels in a vector, where each pixel takes a value between 0 and 255. This distribution can be used to understand how the data are spread and to draw conclusions from it. The distribution of values in a vector can generally be represented as a diagram, a histogram or other types of plots. These plots make it possible to visualize the frequency of each value in the vector and to identify the values that occur most or least often. Several statistical measures can be used to describe the distribution of values in a vector, notably the mean, the median, the standard deviation and the variance. These measures characterize the distribution more precisely and allow the distributions of different vectors to be compared. The choice of method can be influenced by several factors, such as the sample size, the nature of the data or the presence of outliers. This last point is the problem addressed in this article. Statistical methods are generally used to compare the variance of several vectors of the same type. Exploring these many methods, old and recent, led me to test Principal Component Analysis (PCA) combined with the Mahalanobis distance. PCA is a technique that reduces the dimensionality of a data set by transforming the original variables into a smaller number of uncorrelated variables called principal components. The Mahalanobis distance is a measure of the distance between a point and a multidimensional distribution that takes into account the covariances between the variables.
In other words, the Mahalanobis distance acts as a distance threshold between a point and a group of other points, used to identify the most distant or unusual points (outliers) or to determine whether a data point belongs to a particular distribution (classification). In theory, applying PCA to a set of 28x28 images (vectors of 784 pixel values) reduces the data to a smaller number of variables; the Mahalanobis distance is then applied to the principal components to identify images that lie far from the mean. If an image has a very large distance from the center of the distribution, this may indicate that the image is an outlier. The larger the distance, the more the image differs from the rest of the data set. The distance threshold must be readjusted according to the results in order to find an appropriate value. Can these statistical methods give good results in practice on images of handwritten characters?
Relevant answer
Answer
scikit-learn has several other uncommon methods. All of these methods have been tested without obtaining satisfactory results. In conclusion, traditional or uncommon statistical methods are not suitable for the data used in this problem.
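For readers unfamiliar with the Mahalanobis distance discussed in the question, here is a minimal two-dimensional sketch in plain Python. The points are hypothetical scores on two principal components, and the function name is made up for illustration; a real pipeline would run PCA on the 784-pixel vectors first, with a numerical library.

```python
import math
from statistics import mean


def mahalanobis_2d(point, data):
    """Mahalanobis distance of `point` from the 2-D sample `data`."""
    xs, ys = zip(*data)
    mx, my = mean(xs), mean(ys)
    n = len(data)
    # Sample covariance matrix [[sxx, sxy], [sxy, syy]]
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in data) / (n - 1)
    det = sxx * syy - sxy ** 2
    dx, dy = point[0] - mx, point[1] - my
    # d^2 = v^T S^-1 v, using the closed-form inverse of a 2x2 matrix
    d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(d2)


# Hypothetical scores on two principal components (more spread along x).
scores = [(-2.0, 0.0), (2.0, 0.0), (0.0, -1.0), (0.0, 1.0)]
d_along_x = mahalanobis_2d((2.0, 0.0), scores)
d_along_y = mahalanobis_2d((0.0, 2.0), scores)
print(d_along_x, d_along_y)
```

Both test points are at the same Euclidean distance from the mean, but the one lying along the low-variance direction gets the larger Mahalanobis distance, which is the property that makes it usable as an outlier score.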
  • asked a question related to Ensemble
Question
5 answers
Background
MD simulation is often used to investigate conformational changes caused by mutation. Consider a case where one has two ensembles of trajectories. We wish to see if there is a statistically significant difference between the ensembles by comparing the mean Calpha distance between specific residues. We require observations to be independent from each other.
Question
Within a single trajectory, if the observations are taken far enough apart, are they considered independent? For instance, if we track the distance between Ser55 and Thr101, can we treat the distances at time points 10 ns, 20 ns, 30 ns, etc. as independent observations?
Relevant answer
Answer
Hi Mohsen Sadeghi thank you for clarifying this, and I think this is more appropriate. And, yes the correlation time varies based on the property you are studying.
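One way to put a number on the correlation time mentioned above is to estimate the autocorrelation function of the observable and find the lag at which it has decayed. The sketch below is an assumption-laden illustration: the series is a synthetic AR(1) process standing in for a distance trace, and the 0.1 threshold is an arbitrary choice.

```python
import random
from statistics import mean


def autocorrelation(series, lag):
    """Normalized autocorrelation of `series` at the given lag."""
    mu = mean(series)
    n = len(series)
    var = sum((x - mu) ** 2 for x in series)
    cov = sum((series[i] - mu) * (series[i + lag] - mu) for i in range(n - lag))
    return cov / var


def decorrelation_lag(series, threshold=0.1, max_lag=None):
    """First lag at which the autocorrelation falls below `threshold`."""
    max_lag = max_lag or len(series) // 2
    for lag in range(1, max_lag):
        if autocorrelation(series, lag) < threshold:
            return lag
    return None  # never decorrelated within max_lag


# Hypothetical stand-in for a distance time series: an AR(1) process with memory.
random.seed(1)
phi = 0.9
dist = [0.0]
for _ in range(4999):
    dist.append(phi * dist[-1] + random.gauss(0.0, 1.0))

lag = decorrelation_lag(dist)
print("frames to skip between approximately independent samples:", lag)
```

Frames separated by at least this lag can be treated as approximately independent for this particular observable; a different observable from the same trajectory may need a different spacing.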
  • asked a question related to Ensemble
Question
1 answer
Hello,
I want to use AMBER to run a simulation of a membrane protein, small molecule and POPC bilayer membrane complex system, and the complex parameter files .inpcrd and .prmtop have been generated by tleap.
But I do not know how to minimize and equilibrate the system. I searched some posts on the Internet and summarized the following minimization and equilibration steps:
(1) Energy minimization of water and ions;
(2) Energy minimization of protein, small molecule and POPC;
(3) Heating the system to 310 K;
(4) Equilibrating water and ions in the NVT ensemble;
(5) Equilibrating POPC in the NPT ensemble;
(6) Equilibrating protein and small molecule in the NPT ensemble;
(7) Conventional molecular dynamics simulation in the NPT ensemble.
Is this minimization and equilibration protocol suitable for a system of membrane protein, small molecule and POPC? If not, which step needs to be improved?
In addition, if a residue of the small molecule in the .prmtop file is named SW2, how can I use the 'restraintmask' option in the .in file to restrain the small molecule?
Thank you!
Best wishes!
Relevant answer
no suggestions
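On the restraint part of the question: in AMBER, positional restraints are switched on with ntr=1, and the restrained atoms are selected with restraintmask, where a mask of the form ':SW2' selects residues by name. The sketch below writes such an input from Python; every numerical value (restraint weight, step count, thermostat and barostat settings) is an illustrative assumption, not a validated protocol for this system, and a reference structure must also be supplied with -ref on the command line.

```python
def restrained_md_input(mask=":SW2", weight=10.0, nsteps=50000, temp=310.0):
    """Build a sander/pmemd &cntrl section with positional restraints on `mask`."""
    lines = [
        "Restrained NPT equilibration (illustrative values)",
        " &cntrl",
        "  imin=0, irest=1, ntx=5,",
        f"  nstlim={nsteps}, dt=0.002,",
        "  ntc=2, ntf=2, cut=10.0,",
        f"  ntt=3, gamma_ln=1.0, temp0={temp},",
        "  ntb=2, ntp=2,",
        # ntr=1 turns on Cartesian restraints; the mask picks residue SW2
        f"  ntr=1, restraint_wt={weight}, restraintmask='{mask}',",
        " /",
    ]
    return "\n".join(lines) + "\n"


mdin = restrained_md_input()
print(mdin)
```

Other parts of the system can be restrained by passing a different mask, for example a combination of residue names or a residue range.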
  • asked a question related to Ensemble
Question
1 answer
Does anyone know a method for generating a disordered protein ensemble with AlphaFold-Multimer? The protein I am interested in is disordered and is bound to a protein that is mostly folded. I know of some methods that have been described in the literature (for instance, varying the random seed), but if someone has implementation details on such an approach I would be extremely grateful.
Relevant answer
Answer
Hi,
I haven't heard of any implementation for the exact application you have in mind, but besides asking AF-Multimer to generate a bunch of models, and repeating this with different random seeds, the following 2 papers may also inspire you to [1] do some in silico mutagenesis (e.g. Ala-screening) on the conserved residues of the IDP; and [2] re-run AF-Multimer with subsampled MSAs:
Alternatively, you could do MD or Monte Carlo simulations. These are believed to produce conformers more representative of the Boltzmann ensemble.
Best of luck,
Tamas
  • asked a question related to Ensemble
Question
1 answer
To induce Notch signalling in Huh 7 cells, I plan on cloning the NICD and then using it to activate the Notch pathway. I need the region of the DNA that specifically codes for the Notch 1 intracellular domain.
Relevant answer
Answer
Hi Umang
We got the Notch1 intracellular domain (NICD) as a gift from another university.
However, you can find the sequence at the link below.
I hope this will help you. You can buy it from Addgene ($85).
Best wishes,
Subbroto
  • asked a question related to Ensemble
Question
1 answer
..
Relevant answer
Answer
Ensemble methods in machine learning refer to the technique of combining multiple models to improve the overall predictive performance. The idea behind ensemble methods is that combining the predictions of multiple models can lead to better accuracy and reduce the risk of overfitting.
Ensemble methods can be particularly useful in deep learning, where models are typically large and complex, and prone to overfitting. Here are some of the most popular ensemble methods used in deep learning:
  1. Bagging: Bagging, short for Bootstrap Aggregating, involves training multiple models independently on different subsets of the training data and then combining their predictions. Bagging can help reduce variance and improve accuracy, especially when the models used are high variance and prone to overfitting.
  2. Boosting: Boosting is another ensemble method that involves sequentially training multiple models, where each subsequent model tries to correct the errors of the previous model. Boosting can help reduce bias and improve the accuracy of models, especially when the models used are high bias.
  3. Stacking: Stacking involves combining the predictions of multiple models through another model, known as the meta-model. The meta-model is trained on the predictions of the base models, and its goal is to learn how to optimally combine their outputs. Stacking can be useful in situations where the base models have different strengths and weaknesses.
Ensemble methods can also be used in deep learning to improve model robustness, increase diversity, and improve generalization performance. By combining multiple models, ensemble methods can help capture different aspects of the data and reduce the risk of models overfitting to specific patterns in the data. However, ensemble methods can be computationally expensive and require careful selection and tuning of the individual models used.
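To make the bagging idea above concrete, here is a minimal sketch in plain Python. It uses a deliberately simple base learner (a one-dimensional threshold "stump") rather than a deep network, and the data and function names are hypothetical.

```python
import random
from collections import Counter


def fit_stump(points):
    """Fit a 1-D threshold classifier: predict 1 if x > threshold, else 0."""
    best = None
    for thr, _ in points:
        errors = sum((x > thr) != bool(y) for x, y in points)
        if best is None or errors < best[1]:
            best = (thr, errors)
    return best[0]


def bagging_fit(points, n_models=25, rng=random):
    """Train each stump on a bootstrap resample (sampling with replacement)."""
    return [fit_stump(rng.choices(points, k=len(points))) for _ in range(n_models)]


def bagging_predict(thresholds, x):
    """Combine the stumps by majority vote."""
    votes = Counter(int(x > thr) for thr in thresholds)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: class 1 tends to have larger x, with some overlap.
rng = random.Random(0)
data = [(rng.gauss(0.0, 1.0), 0) for _ in range(100)]
data += [(rng.gauss(3.0, 1.0), 1) for _ in range(100)]

ensemble = bagging_fit(data, rng=rng)
print(bagging_predict(ensemble, -1.0), bagging_predict(ensemble, 4.0))
```

Each stump sees a slightly different resample, so their thresholds differ; the vote averages out that variation, which is the variance-reduction effect described in point 1.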
  • asked a question related to Ensemble
Question
2 answers
I need to know how one can calculate the Wasserstein distance between the probability measures of the Gaussian Unitary Ensemble (GUE) and the Gaussian Orthogonal Ensemble (GOE). These are two famous ensembles in Random Matrix Theory (RMT). Is there any analytical way or programming code you can help me with?
Relevant answer
Answer
Yes, the Wasserstein distance between two probability measures can be calculated using the Kantorovich-Rubinstein duality. For reference, check the paper "On the convergence of the empirical measure of eigenvalues of random matrices" by L. Pastur and M. Shcherbina, published in the Journal of Mathematical Physics in 2000. In this paper, the authors calculate the Wasserstein distance between the empirical measure of eigenvalues of GUE and GOE, which can be related to the Wasserstein distance between the underlying probability measures. The result is given in terms of the Tracy-Widom distribution, which is the limiting distribution of the largest eigenvalue of GUE and GOE.
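On the programming side, a numerical estimate is possible for one-dimensional distributions such as empirical eigenvalue distributions. The sketch below assumes that equal-size eigenvalue samples have already been obtained elsewhere (for example by diagonalising sampled matrices with a linear-algebra library); the input numbers are placeholders, not real spectra, and this compares spectral distributions rather than the matrix measures themselves.

```python
def wasserstein_1d(xs, ys):
    """W1 distance between two empirical 1-D distributions of equal size.

    For equal-size samples the optimal transport plan matches sorted values,
    so W1 is the mean absolute difference between order statistics.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


# Hypothetical placeholders standing in for sampled GOE and GUE eigenvalues.
goe_like = [-1.5, -0.5, 0.5, 1.5]
gue_like = [-1.0, 0.0, 1.0, 2.0]
print(wasserstein_1d(goe_like, gue_like))
```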
I hope this helps.
  • asked a question related to Ensemble
Question
2 answers
Does anyone know of examples or a tutorial on how to use the biomaRt R package to automate gene name recognition with the Ensembl database?
Best wishes,
Relevant answer
Answer
In recent years a wealth of biological data has become available in public data repositories. Easy access to these valuable data resources and firm integration with data analysis is needed for comprehensive bioinformatics data analysis. biomaRt provides an interface to a growing collection of databases implementing the BioMart software suite.
Regards,
Shafagat
  • asked a question related to Ensemble
Question
5 answers
Hi everyone,
I have a question regarding high negative average pressure values in MD. The prepared system was minimized (NVT ensemble) for 10 ns using the DESMOND software. During Simulation Quality Analysis, I found high negative average pressures after minimization. When I then performed MD (NPT ensemble) for 200 ns, the average pressure was around 4 bar. Is it okay to get a high negative average pressure? What does the change from highly negative to positive values tell us about the system? I would appreciate an answer. Thank you in advance. With regards,
Relevant answer
Answer
The pressure inside solids can be a very large negative value, meaning you are expanding your system. When you relax the system with NPT, you do not get exactly the desired pressure but a value fluctuating around it as the box expands and shrinks. If you have a perfect potential (a smooth one with a good tail), you can continue with NVT or even NVE and the pressure remains constant with much smaller fluctuations. But if your potential needs special conditions for P to remain constant (such as a large cutoff, high k-space accuracy and so on), I suggest continuing with NPT because it is less expensive than those conditions. If this is the case, it is worth trying a large damping parameter (some thousands of timesteps), which means the barostat acts less frequently. Larger damping is less expensive, but it may or may not reduce pressure fluctuations.
  • asked a question related to Ensemble
Question
3 answers
I'm planning to do some molecular dynamics simulations with Gaussian (trying to simulate the alignment of an ensemble of water molecules in the presence of a strong field). Can anyone please explain whether that is possible with Gaussian?
Relevant answer
Answer
Gaussian is not a valid choice for reporting this kind of research.
Use GROMACS, NAMD, AMBER, etc. instead.
Regards
  • asked a question related to Ensemble
Question
4 answers
How can I assign weights to the class outputs so that I can ensemble them?
Relevant answer
Answer
Dear university staff!
I inform you that my lecture on electronic medicine on the topic: "The use of automated system-cognitive analysis for the classification of human organ tumors" can be downloaded from the site: https://www.patreon.com/user?u =87599532
Lecture with sound in English. You can download it and listen to it at your convenience.
Sincerely,
Vladimir Ryabtsev, Doctor of Technical Science, Professor Information Technologies.
  • asked a question related to Ensemble
Question
4 answers
all of the models are already fitted.
Relevant answer
Answer
  • asked a question related to Ensemble
Question
5 answers
Overfitting is a modeling error that causes a model to predict future observations poorly or to fit additional data badly. It occurs when a function is fit too closely to a limited set of data points, and it usually involves more parameters than the data can justify. Large data sets commonly contain some anomalies, so a model that fits them too closely produces inaccurate analyses.
Overfitting can be prevented with a few methods, namely:
  • Cross-validation: The initial training data is split into several folds; each fold in turn is held out to validate a model trained on the remaining folds.
  • Remove features: Remove irrelevant features manually and use feature selection heuristics to identify the important ones.
  • Regularisation: This covers various ways of making your model simpler so that there is less room for it to fit noise. Adding penalty terms and pruning a decision tree are ways of doing that.
  • Ensembling: These are machine learning techniques for combining multiple separate predictions. The most popular methods of ensembling are bagging and boosting.
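As a small illustration of the cross-validation item in the list above, the following sketch generates k-fold train/validation index splits in plain Python. The function name and the choice of five folds are assumptions for the example; libraries such as scikit-learn provide equivalent utilities.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size
    folds = [indices[i::k] for i in range(k)]
    for i, val in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, val


splits = list(k_fold_indices(10, k=5))
for train, val in splits:
    print("train:", sorted(train), "validate:", sorted(val))
```

Every sample is used for validation exactly once and never appears in the training and validation sets of the same fold, which is what makes the averaged validation score an honest estimate of generalization.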
Relevant answer
Answer
Overfitting makes the model relevant to its data set only, and irrelevant to any other data sets. Some of the methods used to prevent overfitting include ensembling, data augmentation, data simplification, and cross-validation.
Regards,
Shafagat
  • asked a question related to Ensemble
Question
2 answers
I have a large nanoparticle (diameter of 16 nm) and I want to study its entry through a lipid bilayer. I am running in the NPT ensemble using GROMACS. For pressure coupling there are isotropic and semiisotropic options. The mdp documentation says that with semiisotropic coupling the z direction is decoupled from the x and y directions, which is useful when simulating membranes. I tried both options.
With semiisotropic coupling, the box dimensions changed hugely in all directions: the nanoparticle penetrated the upper leaflet, so the lipids moved apart in both the x and y directions, increasing these dimensions, which resulted in a decrease in the z direction.
With isotropic coupling, the box dimensions did not change much.
Attached are images of both trials.
Relevant answer
Answer
I think the semiisotropic would be the best choice in the case of the membrane, as mentioned in the documentation. If the box size didn't change with the entrance of the nanoparticle, the membrane pressure will be increased.
  • asked a question related to Ensemble
Question
1 answer
Hi,
I would like to generate background error using ensemble perturbation method. I am using WRF's genbe module. I am fairly new to modeling and I have no idea how to generate the ensemble outputs. Could someone please tell me the steps I have to follow to generate the ensemble outputs?
Thank you.
Relevant answer
Answer
Hi
I wonder if you managed it and can share the solution?
  • asked a question related to Ensemble
Question
1 answer
I am running a cooling simulation of a stainless steel alloy using an EAM potential. The initial configuration at 5000 K was well equilibrated, with the density reaching a plateau at around 7.9 g/cc in the NPT ensemble, followed by an additional NVT simulation to obtain a well-equilibrated structure. However, during cooling, I notice that the density of the system continues to decrease, indicating box expansion while the temperature is decreasing! I am using the NPT ensemble for the cooling run, with the pressure kept at iso 0 0 (gauge value). I do not understand why the box keeps enlarging while kinetic energy is being withdrawn from the system (as the temperature is being reduced).
Relevant answer
Answer
There are several possible reasons, but I cannot be certain from the given information. Here are some of them:
1. NPT imposes a pressure condition. Check what the pressure was before cooling and how it changes during cooling. The box may be expanding to reach the target pressure of 0.
2. Check the potential energy for possible phase transitions. 5000 K is extremely high; I doubt that an EAM potential describes such a high temperature accurately.
  • asked a question related to Ensemble
Question
3 answers
I have raster images from three models and I want to create their ensemble output. The correlation among the rasters is near 70%. Is this a scientifically sound approach? Can we combine 3 model outputs like that?
If yes, what statistical method should I apply?
Relevant answer
Answer
Zaibun Nisa, in Python, follow these steps:
1. Begin with a subset of the training dataset.
2. Train a basic model on this subset.
3. Make predictions on the entire dataset using this model.
4. Determine the errors from the predicted and actual values.
5. Assign the same weight to all data items initially.
6. Give more weight to data items that were predicted incorrectly.
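The steps above describe a boosting-style reweighting scheme. As a hedged illustration of steps 5 and 6, here is one round of the AdaBoost weight update in plain Python; the labels and predictions are made up, and the function name is hypothetical. For combining three raster outputs directly, a per-pixel (weighted) mean would be a different and simpler technique than the one sketched here.

```python
import math


def reweight(weights, y_true, y_pred):
    """One AdaBoost-style round: up-weight misclassified samples.

    Returns (new_weights, alpha), where alpha is the vote given to this model.
    """
    err = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p)
    if not 0.0 < err < 0.5:
        raise ValueError("weighted error must lie strictly between 0 and 0.5")
    alpha = 0.5 * math.log((1.0 - err) / err)
    scaled = [
        w * math.exp(alpha if t != p else -alpha)
        for w, t, p in zip(weights, y_true, y_pred)
    ]
    total = sum(scaled)
    return [w / total for w in scaled], alpha


# Hypothetical labels and predictions from a first weak model.
y_true = [1, 1, 0, 0, 1]
y_pred = [1, 0, 0, 0, 1]                     # the second sample is wrong
weights = [1 / len(y_true)] * len(y_true)    # step 5: equal weights

new_weights, alpha = reweight(weights, y_true, y_pred)  # step 6
print([round(w, 3) for w in new_weights], round(alpha, 3))
```

After the update the misclassified sample carries half of the total weight, so the next model in the sequence is pushed to get it right.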
  • asked a question related to Ensemble
Question
4 answers
We know that aqueous electrolyte solutions have a lower heat capacity compared to pure water. For example, the heat capacity of a saturated CaCl2 solution at 20℃ (74.5 g/100 g H2O) has a specific heat capacity (Cp) of ~2.4 kJ/kg∙K, much lower than that of water (4.18 kJ/kg∙K).
My question is, how can we explain this phenomenon on a molecular or even quantum perspective?
I understand that, at such a high concentration, there are very few "free" water molecules. The majority of them are "trapped" in the hydration shells of the Ca2+ and Cl- ions. These water molecules form dative covalent bonds with the ions, and are thus unable to have free translational or rotational movement (i.e. their degrees of freedom are decreased). The water-ion ensemble must now move together.
But how does that explain the lower heat capacity?
Relevant answer
Answer
It is mainly due to the hydrogen bonds in pure liquid water.
The lower heat capacity of aqueous electrolyte solutions is due to the lower percentage of hydrogen bonding (because of the lower[1] percentage of remaining 'pure & liquid' water).
1. The percentage is lower because some of the water, which forms fewer hydrogen bonds, is effectively frozen/trapped near the (hydrated) ions of the electrolyte.
  • asked a question related to Ensemble
Question
5 answers
How can I combine three deep learning classifiers in Python?
Relevant answer
Answer
Can anyone tell me how to ensemble deep learning models that have different input array shapes?
The DNN uses a 2D input array, while the CNN and RNN use 3D input arrays.
All of the models are already fitted.
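One common way around differing input shapes is to feed each fitted model its own input and combine only the outputs. The sketch below shows weighted soft voting over class probabilities in plain Python; the probability vectors and weights are invented for illustration, and in practice they would come from each model's own predict call.

```python
def weighted_soft_vote(prob_lists, weights):
    """Combine per-model class probabilities for one sample.

    `prob_lists` holds one probability vector per model (same class order);
    `weights` holds one non-negative weight per model.
    """
    total = sum(weights)
    n_classes = len(prob_lists[0])
    combined = [
        sum(w * probs[c] for w, probs in zip(weights, prob_lists)) / total
        for c in range(n_classes)
    ]
    label = max(range(n_classes), key=combined.__getitem__)
    return label, combined


# Hypothetical softmax outputs of a DNN, a CNN and an RNN for one sample.
# Each model is fed its own input shape; only the outputs are combined.
dnn, cnn, rnn = [0.6, 0.4], [0.3, 0.7], [0.2, 0.8]
label, probs = weighted_soft_vote([dnn, cnn, rnn], weights=[1.0, 2.0, 1.0])
print(label, [round(p, 3) for p in probs])
```

The weights can be set from each model's validation accuracy, or all set to 1.0 for a plain average.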
  • asked a question related to Ensemble
Question
4 answers
Hello everyone,
My question is how I can find the standard deviation of a model before and after optimization. I am working on a supervised learning model and applying an ensemble technique to it. Can anyone help me, please?
Thank you
Relevant answer
Answer
thank you all for the answers
  • asked a question related to Ensemble
Question
1 answer
Hi,
I wonder if anyone knows how to create ensembles using pdb-tools (https://wenmr.science.uu.nl/pdbtools/reference)? The protein-protein docking has been completed with InterEvDock2, where I set different constraints and did several docking runs. Because InterEvDock2 only performs rigid docking, I would like to normalize the scores of the different runs using pdb-tools. For each docking run, I need to create an ensemble with a few of the best docking poses. However, I don't know how to implement this in pdb-tools (which specific pipeline needs to be loaded).
Any feedback is welcome. Thanks.
Relevant answer
  • asked a question related to Ensemble
Question
2 answers
Ensemble docking allows a single ligand or a ligand library to be docked against multiple conformations of a single receptor.
Now, imagine we have a group of proteins that are functionally conserved and share similar ligands. Moreover, they are highly similar in structure (amino acid identity is more than 90%), and an almost perfect superimposition of the 3D structures can be made with different tools.
Docking analysis was performed for each protein individually and, as expected, the binding pockets and residues are similar.
Now here is the question: can we perform ensemble docking in this situation?
Relevant answer
Answer
Usually "ensemble" means "configurational ensemble," i.e., a collection of structures of a single system, with the structures differing only in atomic coordinates. But technically, yes, one can make an ensemble of different systems and dock into them -- if the structures (and pharmacophore features) align well . . .
  • asked a question related to Ensemble
Question
1 answer
Hello all, I am a beginner in the CRISPR world. I need to extract a gene locus sequence from zebrafish. Do you have a step-by-step guide to Ensembl or any other software for doing this?
Relevant answer
Answer
CRISPR is a technology that can be used to edit genes and, as such, will likely change the world. The essence of CRISPR is simple: it's a way of finding a specific bit of DNA inside a cell. After that, the next step in CRISPR gene editing is usually to alter that piece of DNA. Imagine a future where parents can create bespoke babies, selecting the height and eye color of their unborn children. In fact, imagine that all traits of life forms can be customized to one’s preferences: domestic pet size, plant longevity, and more.
The zebrafish has emerged as a leading model organism for the study of vertebrate biology, because of the remarkable cellular resolution with which the embryo can be studied, the ease of assaying its development and physiology in the laboratory, and its amenability to genetic analyses. The zebrafish is a powerful experimental system for uncovering gene function in vertebrate organisms. Nevertheless, studies in the zebrafish have been limited by the approaches available for eliminating gene function. Here we present simple and efficient methods for inducing, detecting, and recovering mutations at virtually any locus in the zebrafish. Briefly, double-strand DNA breaks are induced at a locus of interest by synthetic nucleases, called TALENs. Subsequent host repair of the DNA lesions leads to the generation of insertion and deletion mutations at the targeted locus.
  • asked a question related to Ensemble
Question
5 answers
Hi,
Please guide me:
How do I make a multi-model ensemble of regional climate models?
I am using the South Asia domain of CORDEX, and my variables are precipitation, tmax and tmin.
There are 153 different combinations for these three variables across the historical, RCP4.5 and RCP8.5 scenarios.
How should I shortlist models, and how should I proceed after that?
Relevant answer
Answer
I agree with Toni Klemm.
  • asked a question related to Ensemble
Question
4 answers
How can we find different independent configurations sampled from the NVT ensemble in GROMACS?
Relevant answer
Answer
Dear Sutanu,
I am new to GROMACS, so could you please tell me how I can extract different configurations using gmx trjconv?
I have tried the command "gmx_mpi trjconv -f nvt.gro -s nvt.gro -o conf.xtc -t0 150 -timestep 40" but it's showing "no output, last frame read at t=0"
  • asked a question related to Ensemble
Question
5 answers
Hi all,
I am attempting to use long short-term memory (LSTM), a deep learning method, to generate a precipitation ensemble from 20 CMIP6 model simulations for SSP scenarios. Could anybody provide some notes or guidance on using LSTM for ensembles? Thank you!
Relevant answer
Answer
It seems that LSTM-like models might not be good at learning the long-term trends in time series, so the predicted time series have low accuracy in long-term trends even if they reproduce the variations well (e.g., high correlation coefficients).
  • asked a question related to Ensemble
Question
3 answers
Has anyone recently done GO term enrichment analysis for differentially expressed genes of a non-model organism or plant? I tried to use the agriGOv2 analysis toolkit to get the GO terms but couldn't access it. Is it down permanently? ShinyGO is another option, but the genes need to be in a specific format such as PANTHER, Ensembl, etc. Could you suggest a better option for doing the analysis in R or using other software?
Relevant answer
Answer
Thank you
Reza Shokri Gharelo
and Jesús María Vielba for your suggestions. I settled on the goseq R package for the GO term analysis, for which I aligned my reads with a new version of the reference genome from Ensembl Plants.
  • asked a question related to Ensemble
Question
4 answers
hello guys!
Can anyone tell me how I can build an ensemble of neural networks? I use the patternnet type. If someone knows, please help me. I am writing my code in Matlab. Can anyone please help with my code?
hope to get a reply from you guys.
thank you
Relevant answer
Answer
DEAR Chandrima debnath, please consider these links
  • asked a question related to Ensemble
Question
10 answers
Can anyone suggest any ensembling methods for the output of pre-trained models? Suppose, there is a dataset containing cats and dogs. Three pre-trained models are applied i.e., VGG16, VGG19, and ResNet50. How will you apply ensembling techniques? Bagging, boosting, voting etc.
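For the voting option named in the question, a minimal sketch may help. It assumes each pre-trained model has already produced per-class probabilities (for example softmax outputs) for the same samples. The function names and the numbers are hypothetical, and the code uses plain Python lists so it does not depend on any particular deep learning framework.

```python
from collections import Counter


def soft_vote(prob_sets, weights=None):
    """Average per-class probabilities across models, then take the argmax.

    prob_sets: list over models; each entry is a list over samples of
    per-class probabilities.
    """
    weights = weights or [1.0] * len(prob_sets)
    total = sum(weights)
    labels = []
    for sample_probs in zip(*prob_sets):      # one tuple of model outputs per sample
        n_classes = len(sample_probs[0])
        avg = [sum(w * p[c] for w, p in zip(weights, sample_probs)) / total
               for c in range(n_classes)]
        labels.append(max(range(n_classes), key=avg.__getitem__))
    return labels


def hard_vote(prob_sets):
    """Majority vote over each model's own predicted label."""
    labels = []
    for sample_probs in zip(*prob_sets):
        votes = [max(range(len(p)), key=p.__getitem__) for p in sample_probs]
        labels.append(Counter(votes).most_common(1)[0][0])
    return labels


# Hypothetical outputs for two images; class 0 = cat, class 1 = dog.
p_vgg16 = [[0.90, 0.10], [0.40, 0.60]]
p_vgg19 = [[0.60, 0.40], [0.45, 0.55]]
p_resnet50 = [[0.30, 0.70], [0.90, 0.10]]

print(soft_vote([p_vgg16, p_vgg19, p_resnet50]))  # [0, 0]
print(hard_vote([p_vgg16, p_vgg19, p_resnet50]))  # [0, 1]
```

The second image shows how the two schemes can disagree: one very confident model outweighs two lukewarm ones under soft voting but not under majority voting. Passing `weights` (for instance validation accuracies) gives weighted averaging.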
Relevant answer
  • asked a question related to Ensemble
Question
6 answers
Hi all,
Besides CESM LENS2, which modeling groups provide large ensemble experiments in the CMIP6 era?
Thanks
Relevant answer
Answer
Dear Oliver
Why don't you use the ESGF database as your main source?
If you do not know how to use it for your purpose, I can help you.
Last year I started my research by choosing some SSP scenarios from this website.
Here is my first article:
Best regards
Amir
  • asked a question related to Ensemble
Question
1 answer
Tl;dr: I’m trying to convert gene IDs of an obscure MRSA strain from Ensembl Bacteria to KEGG.
Hello,
I’m trying to do a pathway enrichment analysis of MRSA strain 107 using GSEA. I have gene expression data that are associated with the gene IDs from Ensembl Bacteria. I plan to use KEGG as my pathway database.
GSEA requires a .gmt file of the gene IDs/enrichment data (of which the gene IDs are from Ensembl), then requires a pathway file (from KEGG). If I try to do the analysis with both of these files, the gene IDs don’t match up, so GSEA can’t do it.
My question is whether there’s a way to convert these gene IDs specifically with these strains of MRSA from Ensembl Bacteria to a site like KEGG. Here are the resources I’ve already tried:
DAVID
Dbtodb
Syngoportal
G:convert
MetaScape
BioMart from Ensembl
Annotationdbi
All these are tools that work, but they don’t include my strain. How should I convert these Ensembl Bacteria gene IDs? Is there another option I don’t know about?
PS. I don’t need to use KEGG; if a different pathway database works, that would also be acceptable.
Relevant answer
Answer
If you're having an issue finding an exact ID match, you can try this method.
You collect all protein sequences of the strain and use BlastKOALA/GhostKOALA (tool available in the KEGG) to perform Blast. It will provide you with the KEGG's KO IDs. These IDs can also be used for pathway analysis.
Thank you
  • asked a question related to Ensemble
Question
3 answers
I was wondering if training a neural network in the deep ensemble setting can lead to a network with a posterior vs. a point estimate architecture?
Recently there have been discussions over the interpretability of Deep ensembles as Bayesian models. This led me to this thought that whether or not we can learn a posterior at the end of training in such a scenario?
Relevant answer
Answer
This paper answers that: Lakshminarayanan, B., Pritzel, A., and Blundell, C. Simple and scalable predictive uncertainty estimation using deep ensembles. In Advances in Neural Information Processing Systems, volume 30. Curran Associates, Inc., 2017, https://proceedings.neurips.cc/paper/2017/file/9ef2ed4b7fd2c810847ffa5fa85bce38-Paper.pdf
  • asked a question related to Ensemble
Question
3 answers
I would like to know why the system is equilibrated at 10 K with NVT and followed with 300 K with the NPT method while performing MD simulations. Please, provide available references.
Relevant answer
Answer
As you are now aware, there is quite a bit of "craft" in the practice of molecular dynamics simulations. If simulations could be run for seconds or even milliseconds, there would be no need for much of this craftmanship. As we are constrained to nanoseconds, or at most microseconds, one makes a concerted effort to gently push away from ground truth--the crystal structure---and into the deeper waters of everything floating freely in an NPT simulation. This is solely to avoid running long segments of your simulations in unphysical or unproductive conformations.
This strategy uses initial minimizations to clear bad contacts between solvating waters introduced into the model and hold ions in place because the force fields do not really represent ion coordination. (Van der Waals and electrostatic forces are spherically symmetric. Octahedral coordination is a geometrical accident in the force field, not an intrinsic property of ionic bonding.) Initial dynamics runs are typically NVT with backbone atoms constrained or fixed to allow side chains to begin softly swaying in the breeze. Subsequently, constraints are released and further NPT simulations are run with 1 fs time steps before production runs with 2 fs time steps commence for the duration.
Some simulations also utilize high and low temperatures as another means of control. Low temperatures effectively constrain the atoms in the crystal structure without having to provide constraint information. High temperatures achieve something approaching longer time steps without resorting to higher-order integrations to maintain fidelity. I rarely found fiddling with the temperature to be of much use and, instead, fiddled with constraints. It is a matter of personal taste to a large degree.
You can examine results reported by other groups in similar molecular systems to see what their practices are. You can also look at tutorials for the codes that you are using. Most developers have something like a "best practices" page that explains some of the curious choices for the various parameters involved in the simulations.
  • asked a question related to Ensemble
Question
1 answer
Why do I get high negative pressure values at the NVT ensemble step with 10 ns at 10 K temperature, and the pressure increases to single-digit positive pressure values at the NPT ensemble step with 200 ns at 300 K temperature?
Please, provide the answer with a reference. I appreciate any help you can provide.
  • asked a question related to Ensemble
Question
8 answers
Hi. I am working on an ensemble learning algorithm and would like to know how I can implement it using an ANN. Please help me with this. Can anyone also show me what a simple ensemble looks like in Matlab?
  • asked a question related to Ensemble
Question
5 answers
Could you explain to me more about the cubist model?
is it ensemble or individual?
Thanks
Relevant answer
  • asked a question related to Ensemble
Question
6 answers
Also on NCBI, these Ensembl IDs match to one gene only
Relevant answer
Answer
Haseeb Javed
, it doesn't lead to a paper, could you please add the correct link?
  • asked a question related to Ensemble
Question
5 answers
I want to develop an ensemble approach where the final layer of a CNN model(Flatten layer in this case) will be followed by a K-Means Clustering algorithm where I want to cluster inputs into a number of categories same as required number of categories in a task. I want help regarding how to apply K-Means Clustering with a CNN.
Relevant answer
Answer
If you do a classification task you could just use both classification algorithm k-means and CNN to classify then you'll be more confident about your classification (even better if you use more than just two methods)
  • asked a question related to Ensemble
Question
1 answer
Hi everyone,
I am currently developing a Gibbs Ensemble Monte Carlo algorithm. I am trying to implement a Widom Insertion Method to calculate the chemical potential of the liquid-phase and gas-phase boxes; however, I haven't been able to successfully do it. My inter-particle potential is that of a hard sphere (i.e. equal to infinity when particles overlap). I suspect the issue with my implementation has to do with how I've been treating the instances where the inserted particles overlap with any of the particles already present in the box I'm trying to determine the chemical potential of. I've been guiding myself by the work of Frenkel & Smit; more specifically, the article attached. Can anyone with experience in this topic help me figure this out?
Thank you beforehand for any assistance anyone may provide!
Relevant answer
Answer
Widom's insertion method works for hard spheres, too. The Boltzmann factor exp(-beta Delta U) is 1 for a successful insertion and 0 for a failure. The problem is, however, that a rather large number of insertion attempts is required to obtain a meaningful ensemble average for the liquid phase.
For hard spheres it might be better to determine the chemical potential by thermodynamic integration or, if an insertion method must be used, by a multi-step insertion (i.e., insertion of a point particle followed by a gradual increase of its size).
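To make the point about the Boltzmann factor concrete, here is a minimal sketch of Widom insertion for hard spheres in a cubic periodic box. The function name, arguments and the single-configuration interface are assumptions for illustration. In a real run the average is accumulated over many sampled configurations, and in the Gibbs ensemble, where V and N fluctuate, the factor V/(N+1) belongs inside the average (if I remember the Frenkel & Smit treatment correctly). The key detail is that an overlapping insertion contributes 0 to the sum but still counts as a trial in the denominator.

```python
import math
import random


def widom_hard_sphere(positions, box, sigma, n_trials=20000, rng=None):
    """Estimate beta * mu_excess for hard spheres by test-particle insertion.

    positions: list of (x, y, z) sphere centres in a cubic periodic box of side `box`.
    sigma: sphere diameter.
    Returns (beta_mu_excess, fraction_of_non_overlapping_insertions).
    """
    rng = rng or random.Random()
    accepted = 0
    for _ in range(n_trials):
        trial = [rng.uniform(0.0, box) for _ in range(3)]
        overlap = False
        for p in positions:
            d2 = 0.0
            for a, b in zip(trial, p):
                d = a - b
                d -= box * round(d / box)      # minimum-image convention
                d2 += d * d
            if d2 < sigma * sigma:
                overlap = True
                break
        # exp(-beta * dU) is 1 without overlap and 0 with overlap;
        # overlapping trials are NOT discarded, they add 0 to the sum.
        if not overlap:
            accepted += 1
    frac = accepted / n_trials
    beta_mu_ex = -math.log(frac) if frac > 0 else math.inf
    return beta_mu_ex, frac
```

A common mistake is to retry or skip overlapping insertions, which biases the average towards 1 and the excess chemical potential towards 0. In a dense liquid `frac` becomes very small, which is the sampling problem mentioned above.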
  • asked a question related to Ensemble
Question
4 answers
Hi!
I've downloaded the "Supplementary_files_format_and_content" of one deposited dataset from GEO (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE129718). The Excel file includes RPKM values, genes and annotation for each sample. I am not interested in re-analyzing everything from scratch; I would just like to see the trend (either UP or DOWN regulation) of specific genes I am interested in. My plan was to convert the gene_id column into a gene_symbol column so it would be easier to identify my genes of interest, but I have noticed that a lot of genes have multiple transcript_id values for the same gene (and Ensembl ID). How am I supposed to deal with these multiple "transcript_id" values? Which one am I supposed to look at?
For example, I can have the same gene_id (ENSMUSG00000028943) and the same locus (4:152120313-152152454) but different lengths (1375, 1593 etc.) and different transcript_id values (ENSMUST00000105657, ENSMUST00000105656). Many genes have a lot of transcript_id values (not only 2 as shown briefly above)!
Thank you in advance!
Camilla
Relevant answer
Answer
Hi! thank you very much! I will follow your suggestion (indeed it just to see whether some gene have a specific trend in agreement with my RNAseq data and hypothesis)!
  • asked a question related to Ensemble
Question
1 answer
Hi. I'm dealing with spatial transcriptomic data and have found a gene of interest. Now we need to know which transcript isoform of the RNA was expressed in our sample. However, NCBI shows this gene has 3 isoforms while Ensembl only shows one. Thus we want to run spaceranger with the NCBI reference, but 10X only provides the Ensembl mouse reference. So I downloaded the gff and fna files from NCBI, converted the gff into gtf, then generated the reference directory as taught in the spaceranger tutorial. But spaceranger cannot work with this reference directory. It just crashes in the middle of the process. Did I do something wrong when generating the reference? Or does anyone have the NCBI mouse reference for spaceranger?
Relevant answer
Answer
Hi Kleran
I think you followed the support (https://support.10xgenomics.com/spatial-gene-expression/software/pipelines/latest/advanced/references) and I see 2 points that could be at the origin of your problem:
- the first one is the genome you selected, is it mm10 genome?
- the second one is that data you downloaded must be compatible with STAR aligner, a point you need to dig in...
all the best
fred
  • asked a question related to Ensemble
Question
8 answers
Hi, everyone,
I just calculated a pure water-box(32 molecules pre-equilibrated by LAMMPS) to learn how to simulate a NVT ensemble by VASP, but unfortunately, I cannot get a converged energy profile(shown as the figure). It keeps increasing! Could anyone provide some suggestions?
Besides, I noticed that it is the potential energy of the Nosé thermostat keeps increasing, while the F or E0 converges well.
Here is my INCAR file:
SYSTEM = Test
LSCALAPACK = .FALSE.
#Start parameters
NPAR = 6
PREC = Normal
LREAL = Auto
ISTART = 0
ICHARG = 2
#Electronic relaxation
ENCUT = 600
ALGO = Fast
NELM = 300
EDIFF = 1E-5
NELMIN = 5
#MD parameters
ISYM = 0
IBRION = 0
POTIM = 0.5
NSW = 30000
TEBEG = 300
IWAVPR = 11
#NVT canonical model
ISIF = 2
MDALGO = 2
SMASS = 0
#DOS related
ISMEAR = 0
SIGMA = 0.05
#Switches
LWAVE = .FALSE.
LCHARG = .FALSE.
IVDW = 11
Thanks a lot.
Relevant answer
Answer
Norman Geist , thanks. I did not know the problem of DFT before. I will have a try :)
  • asked a question related to Ensemble
Question
7 answers
I am building a three-class classification model for early detection of cracks in ball bearings. The data set is limited: 120 rows and 14 features. The classifiers and their parameters are listed below. Can you please suggest which model will be best, considering model complexity as well as accuracy?
Relevant answer
Answer
It is better to use 10 fold cross-validation mode for calculating results and comparing with other trees.
Most probably if this is vibration data, random forest tree always exhibits superior performance.
  • asked a question related to Ensemble
Question
3 answers
I am working on the future stream flow of my study area by using a single GCM. Before this, I did accuracy assessment of all the available GCMs on the basis of available data and selected the top most model for further use. Is this a good approach? I do not want to use the ensemble data of 4-5 models.
Relevant answer
  • asked a question related to Ensemble
Question
5 answers
We need to prepare a weighted average multi-model ensemble of projected future daily precipitation by assigning weights to individual CMIP6 models based on past performance. For this purpose, We want to use Bayesian Model Averaging. Since the distribution of precipitation is highly skewed with large number of zeros in it, a mixed (discrete-gamma) distribution is preferred as the conditional PDF as per Sloughter et al., (2007).
Considering 'y' as the reference (observed ) data and 'fk' as the modelled data of kth model,
The conditional PDF consists of two parts. The first part estimates P(y=0|fk) using a logistic regression model. The second part consists of the term P(y>0|fk)*g(y|fk).
Since the computation of P(y>0|fk) is not mentioned in the referred manuscript, if I can compute P(y=0|fk), can I compute P(y>0|fk) as 1-P(y=0|fk) in this case?
If not, can someone help in computing P(y>0|fk)?
You can find the the referred paper here https://doi.org/10.1175/MWR3441.1
Thanks
Relevant answer
Answer
Yes. You can proceed with that formula. By the axioms of probability, P(y≠0|fk)=1-P(y=0|fk), and because precipitation data contain only non-negative values, P(y≠0|fk) is the same as P(y>0|fk).
You can find a worked example in the book titled “Statistical Methods in Hydrology and Hydroclimatology(DOI: 10.1007/978-981-10-8779-0)”
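The two-part structure can be sketched as follows for a single ensemble member. This is only an illustration of the complement rule discussed above: the logistic coefficients `a0`, `a1` and the gamma `shape` and `scale` are hypothetical placeholders, and the actual predictors and fitting procedure should be taken from the referred paper.

```python
import math


def p_zero(fk, a0, a1):
    """P(y = 0 | fk) from a logistic regression on the member forecast fk."""
    return 1.0 / (1.0 + math.exp(-(a0 + a1 * fk)))


def gamma_pdf(y, shape, scale):
    """Gamma density g(y | fk), defined for y > 0."""
    return math.exp((shape - 1.0) * math.log(y) - y / scale
                    - math.lgamma(shape) - shape * math.log(scale))


def member_density(y, fk, a0, a1, shape, scale):
    """Mixed discrete-continuous conditional distribution of y given fk.

    Returns the point mass P(y=0|fk) when y == 0 and the density
    P(y>0|fk) * g(y|fk) when y > 0.
    """
    p0 = p_zero(fk, a0, a1)
    if y == 0:
        return p0
    return (1.0 - p0) * gamma_pdf(y, shape, scale)   # P(y>0|fk) = 1 - P(y=0|fk)
```

Because the gamma part integrates to 1 over y > 0, the point mass at zero plus the integral of the continuous part equals 1, which is exactly why the complement is the right choice for P(y>0|fk). The BMA predictive distribution is then the weighted sum of these member distributions.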
  • asked a question related to Ensemble
Question
6 answers
Dear researchers
Objective:
We've applied machine learning methods such as artificial neural networks, random forest, and support vector machines to predict stroke patients' recovery.
Materials and methods:
We have stroke patients' clinical data from EMRs(electronic medical records) and their kinematic data obtained by the exoskeleton robot's sensor system(from gait training).
The clinical data are ordinal and categorical, and the kinematic data are time-series data.
Clinical data and kinematic data have been integrated into tabular data by applying moving windows to time-series data (obtained mean, std, median, max, and min).
Limitations:
In our experience, it was not easy to use all the data for training at once because the types and characteristics of clinical data and kinematic data were different.
Thus, we are applying the ensembling method to various neural network models.
(We've tried conventional bagging or stacking algorithms to the outputs of the neural networks.)
Question:
At this point, we would like to know some reasonable, preferred, recommended methods for ensembling the neural network models with different data learned separately. (i.e., how to combine a neural network model trained by clinical data and another model trained by kinematic data)
Relevant answer
Answer
  • asked a question related to Ensemble
Question
9 answers
Hello everyone
I am working on a sparsely gauged mountainous watershed.
I want to use RCM for precipitation and temperatures.
Please help me how to select RCMs and please also share source from they can be obtained.
Relevant answer
Answer
This is an answer to the question you asked Dr. Sherien
Summera Fahmi Khan you can interpolate four points surrounding the gauge point to extract the RCM data
then you can use KGE for comparison between both (RCM data and Observation data)
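The two steps in this answer can be sketched as below, under two assumptions: that "interpolate four points" means bilinear interpolation from the four grid cells surrounding the gauge, and that KGE refers to the Kling-Gupta efficiency in its original 2009 form. The function names and argument order are hypothetical.

```python
import math


def bilinear(x, y, x0, x1, y0, y1, q00, q10, q01, q11):
    """Bilinear interpolation to (x, y) from the four surrounding grid points.

    q00 is the value at (x0, y0), q10 at (x1, y0),
    q01 at (x0, y1) and q11 at (x1, y1).
    """
    tx = (x - x0) / (x1 - x0)
    ty = (y - y0) / (y1 - y0)
    return (q00 * (1 - tx) * (1 - ty) + q10 * tx * (1 - ty)
            + q01 * (1 - tx) * ty + q11 * tx * ty)


def kge(sim, obs):
    """Kling-Gupta efficiency; 1 means a perfect match with observations."""
    n = len(obs)
    mean_s, mean_o = sum(sim) / n, sum(obs) / n
    std_s = math.sqrt(sum((s - mean_s) ** 2 for s in sim) / n)
    std_o = math.sqrt(sum((o - mean_o) ** 2 for o in obs) / n)
    r = sum((s - mean_s) * (o - mean_o)
            for s, o in zip(sim, obs)) / (n * std_s * std_o)
    alpha = std_s / std_o     # variability ratio
    beta = mean_s / mean_o    # bias ratio
    return 1 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)
```

In practice `bilinear` would be applied to every time step of the RCM field to build a series at the gauge location, and `kge` would then compare that series with the gauge record over the common period. In steep terrain a simple horizontal interpolation ignores elevation differences, so the result is worth checking against the station altitude.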
  • asked a question related to Ensemble
Question
3 answers
I have a VASP MD simulation of a 2x2x1 supercell of Al2O3 totaling 120 atoms. The supercell was initially relaxed and then run for 1500 time steps (0.1 fs time step, 1e-7 EDIFF) in the NVE ensemble (MDALGO=1, ANDERSEN_PROB=0). Velocities were initialized to 500 K (TEBEG=500). As a sanity check, I ran the same MD simulation with TEBEG=0 and the energy does remain constant. I'm struggling to understand why there is an initial jump in the energy. My intuition is that the energy should be more or less constant as in classical MD. Is there a reason for this?
Relevant answer
Answer
Hi Giacomo, yes, I also ran NVE with MDALGO=0 and SMASS=-3. The results were essentially identical. I have a continued simulation running now. It remains consistent to 3000 steps so far. I'll try a longer time step and lower temperature as you suggest.
  • asked a question related to Ensemble
Question
7 answers
Is there any simple code to perform the training of ensemble classifier of SVM and ANN on a set of data (available in Matlab like wine, fisheriris, etc... )
Thanks
Relevant answer
Answer
Jane Sun I cannot find the web-link as given
when googled, I get : https://www.solvergen.com/blog
and at the blog tab, its given : 502 Bad Gateway
nginx/1.14.0 (Ubuntu)
  • asked a question related to Ensemble
Question
3 answers
I have identified 12 transcript variants of my gene of interest from ensembl and I want to find the expression of these transcripts in body tissues using GTEx. I think to do this however, I need to have the rs number for the transcript variants. I was wondering if anyone can suggest the best way to go about finding this information out as I am struggling?
Relevant answer
Answer
I would also say that you are confusing concepts, in GTEx you will find eQTLs or sQTLs in which the presence of an SNP (rs) is correlated with the expression of a gene (eQTL) or with the expression of the isoforms of a gene (sQTLs). Therefore, if you have the Ensembl ID of the transcripts of a gene and you want to compare it with the GTEx database, you will possibly get the SNP that is correlated with it. I don't know if I made myself clear.
  • asked a question related to Ensemble
Question
3 answers
The main result of decoherence theory is that the non-diagonal elements of a quantum object's density matrix become zero due to uncontrolled interactions with the environment. For me, that only means that there will be no more interference effects between the superposed states. But the diagonal elements of the density matrix still remain, so a mixture of classical alternatives is still left. How does that solve the measurement problem?
Moreover, doesn't the mathematical derivation of the decoherence effect involve an ensemble average over all possible environmental disturbances ? How does this help when we are interested in the behavior of a specific system in a specific environment ?
Relevant answer
Answer
Thanks to 'Juan Weisz' and 'L. I. Plimak' for your quick answers!
I just want to add that the reason for my question was an article ( https://arxiv.org/pdf/1612.00676.pdf ), in which physicists were surveyed about their attitudes concerning the foundations of quantum mechanics. I was shocked to see (in Fig.6) that 29% considered the measurement problem as solved by decoherence, and 17% considered it even as a pseudoproblem. In my opinion, the measurement problem is absolutely important, but still unsolved.
  • asked a question related to Ensemble
Question
8 answers
I'm working on the impact of climate change on water resources. how to choose the best ensemble from RCM projected rainfall? what method I should use to compare different RCM and choose the best out of that?
Relevant answer
Answer
Dear Razi,
that is a good question which puzzled me a lot during the last few years.
There are various approaches on that and it always depends on your expectations ...
1. As a decision maker you may want to use various future projections, covering e.g. a wet future, a dry future, or a hot or cold future. There are some papers from Alex Ruane on this ... I have also applied it in my latest publication "To bias correct or not to bias correct ..."
2. You may also want to determine the "best" performing RCM based on a validation of the historical "baseline" simulations. Then you compare the historical simulations with the statistics of observations or re-analysis data.
The question here is whether this gives you enough credibility that the future projections are also the most reliable ones. The underlying scenarios to drive the RCMs are extremely uncertain, which means that even if you trust a certain RCM more than another one, this does not necessarily lead to more reliable simulations for the future period.
I would therefore prefer the first procedure of considering RCMs which cover different states (cold/wet, hot/wet, cold dry, cold wet, normal) to generate an ensemble, but if you think you can identify the best performing one (or if you are only interested in simulations for the past), there is a subsetting algorithm from the group of Samaniego (main author is Stefan Thober).
  • asked a question related to Ensemble
Question
2 answers
In addition to the experimental data, various thermodynamic models are used to evaluate defect concentration in materials. How to understand those thermodynamic models such as Wagner Schottky and Bragg- Williams?
Regards
Subha Sanket Panda
Relevant answer
Answer
Raj Kumar Kapooria sir, Thank you very much for the answer. But sir my query is regarding those thermodynamic models and how to interpret them in any system practically?
Thanks and Regards
Subha Sanket Panda
  • asked a question related to Ensemble
Question
1 answer
I have generated denoised images using several models and would like to ensemble at the prediction level to achieve superior denoising results. What would be the best way to combine (averaging, max voting, weighted averaging, etc.) these denoised images to achieve superior denoising performance?
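For the combination rules listed in the question, a minimal pixel-wise sketch could look like the following. It assumes all denoised outputs are grayscale images of identical size stored as nested lists; the function name and the `method` switch are hypothetical. Max voting is left out because it applies to discrete labels, not to continuous pixel intensities.

```python
from statistics import median


def combine_images(images, weights=None, method="mean"):
    """Pixel-wise fusion of equally sized images given as [row][col] lists.

    method="mean" gives a (weighted) average, method="median" a median.
    """
    rows, cols = len(images[0]), len(images[0][0])
    if weights is None:
        weights = [1.0] * len(images)
    total = sum(weights)
    fused = []
    for r in range(rows):
        row = []
        for c in range(cols):
            pixels = [img[r][c] for img in images]
            if method == "median":
                row.append(median(pixels))
            else:
                row.append(sum(w * p for w, p in zip(weights, pixels)) / total)
        fused.append(row)
    return fused
```

Plain averaging reduces residual noise that is uncorrelated between models but also blurs details on which the models disagree; the median is more robust when one model produces outliers. Weights could be set from each model's PSNR or SSIM on a held-out validation set, which is an assumption about what data is available.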
Relevant answer
Answer
Sorry, but I don't practise those skills because I'm just a student in the second year of my bachelor in Political Science at Université Saint-Louis in Brussels.
  • asked a question related to Ensemble
Question
4 answers
Hello everybody
I ran 50 ns MD simulation in NPT ensemble, using Desmond, on tyrosinase a metallo-enzyme containing Cu 2+. I am doing this to evaluate the stability of the complex obtained from molecular docking of the protein with an active ligand. Before running the production stage, I used the default relaxation protocol provided by Desmond. The system was parametrized by employing the OPLS3e force field.
Cu 2+ chelation by the catalytic histidines remains stable during the entire simulation. However, the ligand, which does not chelate the ions, tends to leave the active site from the very first frame of the simulation. So, I would like to figure out what the reason could be. Furthermore, I would like some suggestions for specific relaxation protocols, different from the default one, to deal with this problem.
Relevant answer
Answer
I would run several equilibration MD simulations, starting with different initial velocities, at very low temperature (5 K), with tight temperature control and a very short time step (0.01 fs), gradually bringing the system to the state at which the data will be collected for analysis.
  • asked a question related to Ensemble
Question
3 answers
hello,
in order to find methylated regions of murine promoter we are currently integrating data from Ensembl, DBTSS and EPD. Any suggestion for further databases?
Relevant answer
Answer
K-SPMM: a database of murine spermatogenic promoters modules & motifs.
  • asked a question related to Ensemble
Question
1 answer
I am a beginner at molecular dynamics. I am trying to gather snapshots of a given material at different temperatures. For each temperature, I plan to raise the system temperature to "T" K and equilibrate at that temperature. For the first part (heating) I am using an NVE ensemble. But somehow the temperature is not rising beyond 0 K (the start temperature). Is this because there is no thermostat in NVE? What would be an alternative route? Using something like the Berendsen thermostat (SMASS=-1)?
Input file :
PREC = Normal ! standard precision
ENMAX = 400 ! cutoff should be set manually
ISMEAR = 0 ; SIGMA = 0.1
ISYM = 0
IBRION = 0 ! molecular dynamics
IALGO=48
ISIF = 0
NSW = 380 ! 1000 steps
POTIM = 0.5 ! timestep 0.5 fs
MDALGO = 0
SMASS = -3
TEBEG = 0; TEEND = 190 ! temperature
Relevant answer
Answer
Zero temperature means no motion.
  • asked a question related to Ensemble
Question
4 answers
Hi there, is there any software tool or database to identify the entire 5' and 3' UTR regions of a bacterial gene? I am aware that eukaryotic genes are clearly annotated in Ensembl and GenBank with these details. But unfortunately I was not able to find this information for bacterial genes. Your help on this would be very much appreciated. Many thanks in advance.
Relevant answer
Answer
Hi Mohamed,
I had the same question, and I addressed it like this:
I looked for the target gene/transcript in NCBI, found the fasta sequence. This is in most cases the CDS of the gene. Then I doublechecked whether this gene HAS an additional 5'/3' UTR in the transcript. ENSEMBL has also a bacterial platform, which is great. For my target gene, the same sequence was found without additional UTRs. So I assume it just hasn't any (Which might be true for many genes, as it's polycistronic RNA).
But, if you have found a better solution, I'm very happy if you could share it.
  • asked a question related to Ensemble
Question
6 answers
Hi
Dear researchers
In ensemble-based architectural design
Which algorithms are more useful for classification?
What is the difference between parallel and ensemble architecture?
Thanks
I am waiting for your answer
Relevant answer
Answer
RUboost is available in matlab
  • asked a question related to Ensemble
Question
3 answers
I am new to docking and MD, and am currently trying to get the hang of it. I tried one of the web servers for docking, but unfortunately it is not working. It says "Your PDB contains multiple forms of the same residue VAL 134. This is not supported in the current form. If you would like to supply multiple conformations, please create an ensemble". When I checked manually, I found many atoms that had different versions (see the attached file). Does anyone have a suggestion to fix the problem? Thanks!
Relevant answer
Answer
These are "alternative locations", meaning that in high resolution structure, you may observe different conformations in the electron density. Depending on the programs you are using, you either need to set some parameter to tell the program to ignore all but the most highly occupied conformation, or you need to pre-process the PDB files to remove the secondary locations. Many structure visualisation programs contain options to select, and by extension, selectively delete alternative locations - check the manual for the program you are using
In VMD, you can use the PDB plugin to do so. https://www.ks.uiuc.edu/Research/vmd/plugins/molfile/pdbplugin.html
In Rosetta, a simple python script can be used to clean up a pdb file:
tools/protein_tools/scripts/clean_pdb.py   - Prepare PDBs for Rosetta by cleaning and renumbering residues.
In PyMOL, you can use the removal.py script: https://pymolwiki.org/index.php/Removealt or simply specify
remove not (alt ' '+'A')
alter all, alt=' '
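If none of those programs is at hand, the same pre-processing can be sketched in a few lines of plain Python. This is an illustration, not a replacement for the tools above: the function name is hypothetical, and it relies only on the fixed-column PDB layout, where the alternate-location indicator sits in column 17 of ATOM/HETATM records. It keeps atoms with a blank indicator or with "A", which is not necessarily the most highly occupied conformation.

```python
def strip_altlocs(pdb_lines, keep=("", "A")):
    """Keep one conformation per atom and blank the altLoc column.

    pdb_lines: iterable of PDB record lines.
    keep: altLoc indicators to retain ("" means no alternate location).
    """
    cleaned = []
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")) and len(line) > 16:
            alt = line[16].strip()            # column 17, zero-based index 16
            if alt not in keep:
                continue                      # drop secondary conformations
            line = line[:16] + " " + line[17:]
        cleaned.append(line)
    return cleaned


# Usage sketch with hypothetical file names:
# with open("input.pdb") as fh:
#     kept = strip_altlocs(fh.readlines())
# with open("cleaned.pdb", "w") as fh:
#     fh.writelines(kept)
```

After filtering, the occupancies of the retained atoms are still below 1.0; most docking servers ignore that column, but some programs may warn about it.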
  • asked a question related to Ensemble
Question
44 answers
While teaching quantum mechanics to beginners, do you feel that the traditional historical development of the subject, followed by the wave mechanics approach, should be replaced by an axiomatic introduction to the subject, followed by a discussion of the quantum mechanics of spin ensembles?
Which would be the better mode of exposition for students at this level?
Relevant answer
Answer
Dear Debopam Ghosh, interesting question.
I would suggest two approaches, following my own humble experience as a student and as a teacher.
If one wants to teach Quantum Mechanics for Quantum Computing and related fields, a more linear algebra (math) approach can be used.
If one wants to teach Quantum Physics to engineering, physics and chemistry students, a modern physics approach is needed, since students need to understand all experiments before going into the math.
It is just an opinion.
  • asked a question related to Ensemble
Question
3 answers
On what basis can the number of models be determined? Does using two or three models in a given application count as a multi-model approach?
Relevant answer
Answer
interested
  • asked a question related to Ensemble
Question
2 answers
Do you have excellent knowledge of both SAS and Matlab programming, and would you be interested in collaborating on a manuscript that deals with Methodologies for Ensemble Forecasting, with application to fisheries population dynamics? You are preferably a MSc/PhD student with strong quantitative background.
Relevant answer
Answer
This is interesting topic, mostly concerned with classification problem for Fishes species recognition. Can be performed via MATLAB or Python. I used MATLAB by considering the data at hand as manifolds valued data via parametric modeling framework.
  • asked a question related to Ensemble
Question
3 answers
Hello,
I have a list of Ensembl protein Ids ("ENSP...", got them from PAXdb) and I wish to find their matching dna sequences.
It seems trivial but I didn't find a way to do it...
I could find the appropriate gene Id for each protein and then get the cds nucleotide sequence but it seems inaccurate (because of alternative splicing).
Any thoughts?
Thank you!
Relevant answer
Answer
Use Ensembl Biomart.
  • asked a question related to Ensemble
Question
4 answers
Hello All,
I am working on MD simulation study using DESMOND for a protein-ligand complex (size of my protein is around 500 a.a). Can anyone please tell me on what basis I need to set the different parameters for the same, like -
1. Simulation time
2. Recording interval for energy and Trajectory
3. Ensemble Class (NPT, NVT, etc)
Thank you all
Regards
Relevant answer
Answer
There are many papers you may consult where Desmond is used for P-L simulations.
Starting with scratch you may initially do it for
Simulation time: 100ns
Recording interval = 20ps and NPT ensemble.
This might take around 4-6 GB of your storage. If you are not satisfied with ligand stability, you may extend the simulation further.
  • asked a question related to Ensemble
Question
2 answers
Does anybody know? The site http://pedb.vib.be/ does not seem to work. Maybe it is now hosted at a different address?
Relevant answer
Answer
  • asked a question related to Ensemble
Question
1 answer
I am trying to calculate some hydrodynamic properties from MD results. Part of the process is to calculate the transverse current correlation function, which is formulated as $C(q,t)=\langle J^{*}(0)\,J(t)\rangle$. The issue is that this formula is regarded as a canonical ensemble average in the literature, which should be calculated based on the parameter $\beta$. However, my intuition is that this should be a form of autocorrelation or cross-correlation over a rolling window. This is confusing to me, and I would like to ask if anyone can provide me with a pseudo code example for this calculation.
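A minimal sketch of how this is usually evaluated from a trajectory follows, under stated assumptions: the wavevector q points along z, the transverse current is built from the x-components of the velocities as J(q,t) = sum_i v_x,i(t) exp(i q z_i(t)), and the result is normalised by the number of atoms. The function names and the data layout are hypothetical. For an equilibrium trajectory generated at fixed temperature, the frames are already sampled with the canonical weights, so the ensemble average is replaced by an average over time origins and no explicit $\beta$ factor appears.

```python
import cmath


def transverse_current(positions, velocities, q):
    """J(q, t) for every frame, with q along z and the current along x.

    positions, velocities: [frame][atom] -> (x, y, z).
    """
    series = []
    for r_frame, v_frame in zip(positions, velocities):
        j = sum(v[0] * cmath.exp(1j * q * r[2])
                for r, v in zip(r_frame, v_frame))
        series.append(j)
    return series


def current_correlation(j_series, max_lag, n_atoms):
    """C(q, t) = <J*(q, 0) J(q, t)> / N, averaged over all time origins."""
    n_frames = len(j_series)
    corr = []
    for lag in range(max_lag + 1):
        pairs = n_frames - lag                 # number of available time origins
        total = sum(j_series[t0].conjugate() * j_series[t0 + lag]
                    for t0 in range(pairs))
        corr.append((total / pairs).real / n_atoms)
    return corr
```

So the intuition in the question is right: it is an autocorrelation over sliding time origins. In a periodic box q should be a multiple of 2π/L, and averaging the result over the equivalent transverse directions (x and y for q along z, plus the other axes for q) improves the statistics.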
Relevant answer