
# Comparative Modeling - Science topic

Explore the latest questions and answers in Comparative Modeling, and find Comparative Modeling experts.
Questions related to Comparative Modeling
• asked a question related to Comparative Modeling
Question
My team and I are developing a statistical test to compare models and observations in a model-independent way. We have found some 2D and 3D examples of models, coming from either simulations or theory, together with their respective observables, but 1D scenarios are somehow harder to find. If you know of a well-known example, or have one from your own research, we would be happy to include it in our paper.
Ideally the example would have a set of 1D observables (e.g. mass, period, spin) and a sample simulated from the model under examination (or a specific parametric form for that model, so that it is straightforward to sample from it).
Feel free to contact me directly if you want to know more about the project or about how we would use your example!
If I am understanding this correctly the only real answer is a model with no observables.
• asked a question related to Comparative Modeling
Question
Greetings everyone and thank you in advance for your response to my questions.
I am currently learning my way through using AMOS for my multiple regression analysis, so I am humbly asking for some guidance, insights and opinions from fellow experts in here.
I want to test the roles of social media addiction, social comparison (two types: upward and downward comparison), and experience in using skin-lightening products on skin tone satisfaction. Based on these constructs and the literature, I have built two models: the proposed model and the alternative model. The models are shown in the attached image.
My questions are:
1) How can I compare models if I used multigroup analysis to test for the moderating effect of social media addiction in the proposed model, i.e., via a chi-square difference test? I converted social media addiction into two groups: addicted and not addicted to social media. If I were to compare AIC and BIC values with the alternative model for model selection, which AIC and BIC values should I use: those from n = all participants, or those from each of the "addicted" and "not addicted" groups? Are there any step-by-step guidelines I can follow to properly compare a multigroup analysis model?
2) Based on my models, am I doing it right if I compute the AIC and BIC values for each hypothesis in each respective model (i.e., H1 is considered the R1 model, H2 the R2 model, etc.), or should I just compute the AIC and BIC values for the two models as they are (testing all hypotheses simultaneously, hence computing the AIC and BIC of each model as a whole)?
I apologize if the questions are lengthy. I tried to explain as much as I could because I cannot find any literature to help me with this uncertainty of mine. Your responses are very much appreciated!
This is an interesting question, but in my opinion, the pure statistics of fit indices cannot tell you which model is "true". That requires strong theory, and moderation and mediation have very different implications. A cross-sectional model is not suitable to distinguish between them, but a longitudinal one may capture the mediating role.
Just as an illustration of why you may be unable to decide which model is better, here is a small simulation in R:
set.seed(666)
X <- rnorm(100)
M <- 0.5*X + rnorm(100, 0, 0.8)
I <- X*M
Y <- 0*X + 0.5*M + 0.5*I + rnorm(100, 0, 0.8)
# MEDIATION STEPS
summary(mod1 <- lm(Y ~ X))
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 0.2350 0.1271 1.849 0.0675 .
# X 0.2545 0.1238 2.055 0.0426 *
summary(mod2 <- lm(M ~ X ))
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) -0.06358 0.08997 -0.707 0.481
# X 0.65975 0.08766 7.526 2.56e-11 ***
summary(mod3 <- lm(Y ~ X + M))
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 0.2636 0.1214 2.171 0.03239 *
# X -0.0421 0.1482 -0.284 0.77703
# M 0.4495 0.1360 3.305 0.00133 **
# MODERATION
summary(mod4 <- lm(Y ~ X * M))
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) -0.11526 0.08753 -1.317 0.191
# X 0.08135 0.09908 0.821 0.414
# M 0.54141 0.09069 5.970 3.99e-08 ***
# X:M 0.56770 0.05100 11.132 < 2e-16 ***
With the classical Baron and Kenny approach (I didn't want to do the extra step via indirect effects), we have full mediation: X predicts Y and also M, and in the multiple regression the influence of X is reduced to zero while M still has a significant weight.
In the moderation model, the interaction is significant. So both models appear "correct" and reproduce features of the original simulated model. Therefore, in this simulation both models represent the true underlying model, and neither is wrong.
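To connect this with the AIC/BIC part of the question: the mechanical comparison of the two (non-nested) candidate models is straightforward in R, but, as argued above, a lower AIC or BIC only indicates better predictive fit, not causal correctness. A minimal sketch continuing the simulation above:

```r
# Reproduce the simulated data, then compare the mediation-style and
# moderation-style regressions by AIC and BIC.
set.seed(666)
X <- rnorm(100)
M <- 0.5*X + rnorm(100, 0, 0.8)
Y <- 0.5*M + 0.5*X*M + rnorm(100, 0, 0.8)

mod3 <- lm(Y ~ X + M)   # "mediation" regression
mod4 <- lm(Y ~ X * M)   # moderation regression

AIC(mod3, mod4)
BIC(mod3, mod4)
```

Here the moderation model will win on both criteria because the data truly contain an interaction, which is exactly the point: the criteria rank fit, not theoretical correctness.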
• asked a question related to Comparative Modeling
Question
Hey everybody!
I'm fitting a Bayesian negative binomial model in Stata 17. Because of some collinearity and convergence issues, I needed to put my variables into different blocks in the modeling process. However, it is a bit confusing to choose the optimal number of blocks (and their exact sets of variables) for a model. Do you have any advice?
Apart from that, what criteria do you suggest (DIC, acceptance rate, efficiency, variable significance, etc.) for comparing models developed using various numbers of blocks?
I don't quite understand the "blocks" part, but if you are trying to split variables, the pathologies you have observed (poor convergence) suggest you should read more about "identifiability" in the context of modelling.
I may be too far removed from academia since leaving for industry, but quite frankly all the criteria you mentioned are garbage. What is important is what the posterior distribution looks like (which you already have as a precursor to those criteria) and how it behaves. So take the samples and do that fun part of science where you "play": look at what the predictions do, see how they change when you fiddle with things, and try to gain insight.
• asked a question related to Comparative Modeling
Question
I am researching trading strategies, particularly involving time series analysis, but also machine learning, on data sets that range as far back as 1950 and would like to compare the accuracy of various models, up to ~25 at a time.
My for loops in R for an ARIMA + GARCH model of S&P 500 values from 1950 to today take about 10 hours to evaluate all models with p and q each ranging from 0 to 5.
My specs are an Intel i5-8600K, 16 GB of DDR4 RAM, and an AMD R9 Fury, but I am not using any parallel processing.
Thank you.
It depends on the type of analysis. I do it on i7 with 16 GB RAM and it works fine.
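Since the fits for different (p, q) orders are independent, this kind of grid search parallelizes trivially. A sketch using base R's parallel package; the simulated series, grid size, and core count are illustrative placeholders for the real setup:

```r
library(parallel)

# Illustrative stand-in for the real return series.
set.seed(1)
y <- as.numeric(arima.sim(model = list(ar = 0.5), n = 500))

grid <- expand.grid(p = 0:2, q = 0:2)

# Fit each candidate order on its own worker; failed fits get AIC = Inf.
# (On Windows, replace mclapply with parLapply over a PSOCK cluster.)
aics <- mclapply(seq_len(nrow(grid)), function(i) {
  fit <- tryCatch(arima(y, order = c(grid$p[i], 0, grid$q[i])),
                  error = function(e) NULL)
  if (is.null(fit)) Inf else AIC(fit)
}, mc.cores = 2)

grid[which.min(unlist(aics)), ]  # order preferred by AIC
```

With 36 (p, q) combinations and 6 cores, the same approach should cut the 10-hour run roughly in proportion to the number of cores, since each fit is CPU-bound.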
• asked a question related to Comparative Modeling
Question
My hypothesis is that the functional diversity of fish (response variable) increases up to an optimum level along a gradient of habitat structural complexity (predictor variable), but decreases after that (which I have noticed on graphical inspection). So I am really interested in this hump-shaped relationship.
To test this relationship, I will run a beta regression. In R, I have found two ways to include a second-order term in the model: I(x^2) and poly(x,2). The first one does not include the lower-order term, but the 'poly' function does.
According to Cohen, Cohen, West, & Aiken (2003), in order that the higher order terms have meaning, all lower order terms must be included, since higher order terms are reflective of the specific level of curvature they represent only if all lower order terms are partialed out.
First, I would like to know whether this is a consensus and whether I really cannot use only the second-order term as a predictor.
Second, if I use a likelihood ratio test to compare models (e.g., only first order term vs. first and second order term) and the result is not significant, how can I choose the best model?
Hi, Barbara
First of all, I appreciate the way you analyzed your data beforehand, and I endorse all of Jean's suggestions. The ways of including second-order terms that you mention are also the best I know of so far.
For clarification, I have some minor perspectives.
First, to understand Cohen et al.'s justification for including the lower-order terms when dealing with quadratic (or other polynomial) equations, it is easiest to go back to their basic form. Let's look at the quadratic one:
f(x) = a + bx + cx²
The lower-order term, in this case, is bx, whose coefficient b controls the horizontal position of the curve along the X axis. If you remove it from the model, you declare that b equals zero and force the peak to occur at x = 0. In practice, your hump-shaped curve would peak when habitat complexity is zero, which may be unrealistic. Thus it is fundamental for model fitting that b be allowed to be non-zero; indeed, for a concave curve (negative c) peaking at a positive level of complexity, b must be positive, since the vertex sits at x = -b/(2c), shifting the peak rightwards.
Second, the likelihood-ratio approach does give useful guidance: if adding the second-order term does not significantly improve the fit, parsimony favors the simpler model. If you are unsure, confirm with the AIC.
Best wishes,
Matheus
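The comparison Barbara describes can be sketched in a few lines of R. For brevity this uses lm() on simulated data; with a beta-distributed response the same pattern applies with betareg::betareg() and a likelihood-ratio test:

```r
# Simulated hump-shaped relationship (all coefficients are illustrative).
set.seed(1)
x <- runif(200, 0, 10)
y <- 2 + 1.5*x - 0.15*x^2 + rnorm(200, 0, 0.5)

m1 <- lm(y ~ x)            # first-order only
m2 <- lm(y ~ x + I(x^2))   # same fit as poly(x, 2), different parameterization

anova(m1, m2)              # test for the second-order term
AIC(m1, m2)

# Location of the peak from the fitted coefficients, at x = -b/(2c):
-coef(m2)["x"] / (2 * coef(m2)["I(x^2)"])
```

If the anova comparison is not significant, keeping m1 is the parsimonious choice; here the quadratic term is real, so m2 wins and the recovered peak is near the true value of 5.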
• asked a question related to Comparative Modeling
Question
Hi all! I know that Pagel's λ, which can be estimated using gls() in R, is not appropriate for binary data, but I wonder if I can still use gls() to compare models with different correlation structures (Brownian, OU, etc.). None of the packages I have found that are appropriate for binary traits produce AIC values I can use to compare the models. Cheers!
You probably want to have a look at the geiger package and do fitDiscrete() with transform=lambda for Pagel's λ. AICs are reported.
Brownian Motion and Ornstein-Uhlenbeck models are things that apply to continuous data, though.
• asked a question related to Comparative Modeling
Question
We have a survey with N = 7000 responses to a set of Likert-style questions, demographics, and similar. We separated it into training and test samples, and are now doing exploratory analyses on the training data.
I was expecting to generate a list of hypotheses as in other social psychological work, e.g., "A will correlate positively with B". We could do that and verify them in the holdout sample. However, I don't know how to evaluate the fit of the prediction in this kind of test; r > 0 is not very informative.
We could run models such as regressions with certain factors, and then compare model fit with and without sets of predictors; that could be evaluated by comparing fit indices. But most of the predictions I imagined weren't models like these.
So the question is: can you recommend a way to specify hypotheses about correlations? Should we be using an equivalence test to test against a certain effect size? Is there any way to do this in aggregate for a big table of correlations?
As well as correlations, could you do t-tests and ANOVAs, and with the number of responses you have, regression studies?
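On the minimum-effect idea raised in the question: one simple option is to test each correlation against a smallest effect size of interest using the Fisher z transform. A sketch in R; the r0 = 0.2 benchmark and the simulated data are illustrative assumptions, not recommendations:

```r
# One-sided test of H1: rho > r0, via the Fisher z transform.
r_min_test <- function(x, y, r0 = 0.2) {
  n  <- length(x)
  r  <- cor(x, y)
  z  <- atanh(r) - atanh(r0)   # difference on the Fisher z scale
  se <- 1 / sqrt(n - 3)        # approximate standard error of z
  p  <- pnorm(z / se, lower.tail = FALSE)
  c(r = r, p = p)
}

set.seed(2)
x <- rnorm(500)
y <- 0.5 * x + rnorm(500)
r_min_test(x, y)   # true correlation is about 0.45, well above r0 = 0.2
```

Applied across a table of correlations, the resulting p-values can be corrected for multiplicity (e.g. with p.adjust), which gives an aggregate answer for the whole set of predictions.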
• asked a question related to Comparative Modeling
Question
Hello,
I am pretty new with this CFD topic.
I am about to design an underwater towed body (close to the surface; part of it is surface-piercing) and I want information about the resistance of the hull. For this I looked for comparable models. The best I found was a torpedo with a wetted surface very close to my body's. I set up a simulation and now I have results, but I am not really sure everything I set up was right. So I set up a simulation for the torpedo as well (same settings), but the average drag is twice what is reported in the paper. To specify my questions:
1. There are plenty of peaks in the drag curve; shouldn't the curve become "flat" if everything goes right?
2. I am not sure about the boundary settings, which I based on the boat tutorial.
Hi Jan,
To start with, no, your simulation doesn't seem to have converged - those peaks look rather suspicious. If there is a steady solution for your simulation, the drag should indeed converge to a constant value. Those peaks look like some numerical artefact, but still, taking a closer look at your results (velocity field, pressure, volume fraction, etc.) as a function of time might be helpful.
What settings to choose depends on your problem: Froude number, Reynolds number, whether to expect cavitation, streaking, etc. Sorry, I couldn't make much of the screenshots you included. Are you using a steady or an unsteady solver? RANS, LES, DNS, laminar? Domain size relative to towed body size? Symmetry? Rigid body motion/moving mesh?
You mentioned a paper that reported drag values for this torpedo. I think the best way to go would be to try and reproduce these results first, perhaps contact the authors to find out which settings they used.
Hope this helps,
Sita
• asked a question related to Comparative Modeling
Question
I am testing mediation where life satisfaction is a mediator between tourism impact and quality of life, and I want to compare this model across two regions. Do I have to check the mediation in each region separately, or is there another way? Kindly suggest one.
Check how we did in our paper:
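One way to go beyond fitting the mediation separately per region is to test whether the a and b paths differ between regions. A rough sketch in base R with simulated data; the variable names and effect sizes are illustrative, and a multigroup SEM (e.g. lavaan with group = "region") is the more standard route:

```r
set.seed(3)
n <- 400
region <- factor(rep(c("A", "B"), each = n/2))
impact <- rnorm(n)                                          # tourism impact
ls  <- 0.5*impact + 0.3*impact*(region == "B") + rnorm(n)   # life satisfaction
qol <- 0.4*ls + rnorm(n)                                    # quality of life

# Path a (impact -> life satisfaction), allowed to differ by region:
fit_a <- lm(ls ~ impact * region)
# Path b (life satisfaction -> QoL), allowed to differ by region:
fit_b <- lm(qol ~ ls * region)

summary(fit_a)$coefficients
summary(fit_b)$coefficients
```

A significant interaction with region on either path indicates that the mediation mechanism differs between the two regions; non-significant interactions support pooling the regions.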
• asked a question related to Comparative Modeling
Question
I am estimating the relationships between Economic Growth (GDP), Public Debt and Private Debt through a PVAR model in which my panel data consists of 20 countries across 22 years.
First of all, how can I know what is the optimal lag length I should be using for such an analysis?
Then, using the IPS test for stationarity, two of my variables turn out to have unit roots in the panels, so I would take first differences of both to use in the full PVAR model. Do you think this is correct?
Given that, does it mean I should be worried about cointegration? If yes, what do you advise me to do?
Finally, when analyzing my estimations after different changes, I cannot compare the models since there is no R-squared. Do you think there is a specific test I can run to compare models?
Thanks a lot!
Dear Ahmed, for the optimal lag length, you should first run a PVAR and then look at the lag structure to see the optimal lag length chosen by the information criteria: AIC, SIC, HQ (Hannan-Quinn), BIC. I would suggest setting a maximum lag length within which any of these criteria can choose the optimal lag automatically.
For the panel unit root test, several procedures are commonly run together: the LLC, IPS, Fisher-ADF, and Fisher-PP tests. If the results are mixed across procedures, you can go with the outcome of the majority.
If your variables have unit roots (are non-stationary) and require differencing, then you need to run a panel cointegration test to find out whether long-run relations exist. Common procedures are the Johansen-Fisher combined test and the Kao residual-based test.
For model comparison between fixed and random effects, you may have to run the Hausman model selection test.
I hope this helps...
• asked a question related to Comparative Modeling
Question
My research area is entrepreneurship. I have a new research proposal on how entrepreneurs can finance their projects through crowdfunding. A literature search revealed that the U.S.A. is already adopting this novel finance model. Therefore, I need to collaborate with researchers from the U.S.A., Europe, or international organizations for research grants/aid, especially for data collection, since I am comparing the model in developed and developing countries.
Interesting research area. However, crowdfunding is not yet popular in developing countries.
• asked a question related to Comparative Modeling
Question
Dear partners, I am trying to model a 20-story RC high-rise building with shear walls in SAP2000, and I need to model it with the wide-column method. I have read a lot about this, but when I compare the wide-column model with the same model using shell elements for the walls, the results are very different.
For this reason, I need help with the steps, or with the definitions of the properties and sections of the elements (wide column and rigid beam), or your comments on this topic.
Attached is an image of the plan.
• asked a question related to Comparative Modeling
Question
When I run the glm function in R and try to select the best prediction equation, the following message appears and I can't go further. Could anyone please help me with how to escape from this problem?
Warning messages:
1: In dpois(y, mu, log = TRUE) : non-integer x = 14.610000
2: In dpois(y, mu, log = TRUE) : non-integer x = 14.390000
3: In dpois(y, mu, log = TRUE) : non-integer x = 14.420000
4: In dpois(y, mu, log = TRUE) : non-integer x = 14.790000.
Also, when comparing models using AIC, the result turns into infinity and I can't understand why.
The Poisson model is about counts. Count values are the natural numbers including zero (0, 1, 2, 3, ...). Your data seem to contain non-integers, that is, floating-point numbers with non-zero decimal digits. The warnings show these values (14.61 is not an integer; it cannot be a count: how would one count 14.61 things?).
If your data should be counts, then they should not contain non-integers. If your data do not represent counts, then the Poisson model is wrong. You should consider a Gamma model or a log-normal model instead.
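A minimal sketch of the problem and of the Gamma alternative; the data values here are illustrative. For a Poisson fit, non-integer responses also make the reported AIC infinite, which explains the Inf you saw:

```r
y <- c(14.61, 14.39, 14.42, 14.79, 15.10, 13.95)  # positive, non-integer
x <- 1:6

# Poisson fit: warns "non-integer x" and its AIC is infinite.
m_pois <- glm(y ~ x, family = poisson)
AIC(m_pois)

# Gamma with log link: appropriate for positive continuous responses.
m_gamma <- glm(y ~ x, family = Gamma(link = "log"))
AIC(m_gamma)
```

Because the Poisson log-likelihood is undefined (minus infinity) at non-integer observations, comparing the two fits by AIC is meaningless; the model family has to match the data type first.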
• asked a question related to Comparative Modeling
Question
Hi,
To all stats experts: is there a way to test whether a model fits better in one population than in another? I'm thinking maybe by comparing the model fit indices?
Hi Haram,
in my humble opinion, rather than comparing the model-data fits, it would be better to test each model and respond to misfit by inspecting its possible causes.
The reason is that if both models don't fit, you gain nothing by choosing the better-fitting one (as both indicate problems). Choosing the better-fitting model rests on the unreasonable assumption that it has the higher degree of causal correctness (is less causally biased), but
a) this assumption is flawed, because models that are tremendously wrong can have good fit indices, and
b) even if it is "more correct", it may still be substantially wrong.
The fit philosophy behind SEM resembles the prediction orientation of regression models, where the goal is to find a nice prediction of the data. If that is your goal as well, go for it; that's fine.
If, however, your orientation is that of a critical realist and you attempt to specify a model that corresponds to a natural causal system "out there", then watch the details (the chi-square test), as they can tell you interesting things from which you can learn.
My 2c
Holger
• asked a question related to Comparative Modeling
Question
I have a cooperation proposal for researchers interested in long-term research on C2C, B2C, and B2B co-creation processes based on transaction costs. To begin, I am able to share my questionnaire and database on transaction costs and the co-creation of value in social media business models. I used SEM analysis to test the thesis of a dependence between transaction cost factors (behavioral uncertainty, opportunism, asset specificity, etc.) and comparative transaction costs in the choice of different types and forms of value co-creation. In this model I compare factors driving costs and benefits for users of social media apps. We could publish the results together.
I am interested
• asked a question related to Comparative Modeling
Question
I have three dependent variables and 10 predictors, and I am analyzing the data with multivariate regression. However, I need to compare the model, and the contribution of each predictor, across groups. Any ideas on how to proceed?
The interaction approach already suggested by everybody here is the way to go for comparing models. As far as the importance of variables goes, I would suggest trying the lasso (or, more generally, the elastic net). Programs are available in R; I have added some references. Best wishes, David Booth
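The interaction approach can be sketched for a single response as follows (simulated data; the group variable and effect sizes are illustrative, and with three dependent variables you would repeat this per response or use a multivariate fit):

```r
set.seed(4)
n <- 300
group <- factor(rep(c("g1", "g2"), each = n/2))
x1 <- rnorm(n)
x2 <- rnorm(n)
y <- x1 + 1.0*x1*(group == "g2") + 0.3*x2 + rnorm(n)  # x1 effect differs by group

m_pooled <- lm(y ~ x1 + x2)            # same coefficients in both groups
m_inter  <- lm(y ~ (x1 + x2) * group)  # coefficients free to differ by group

anova(m_pooled, m_inter)  # F-test: do any coefficients differ between groups?
```

A significant F-test indicates the predictor effects are not the same across groups; the individual interaction terms then show which predictors drive the difference.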
• asked a question related to Comparative Modeling
Question
Hello, I have longitudinal data (30 measurements) from 30 subjects. These subjects are divided into three groups (a, b, c).
My question is on how should I build the LME, this is one possible approach:
I could start with the null model (M1 = response ~ time)
and then include an additive fixed effect of group, resulting in (M2 = response ~ time + groups), and compare both. Then include an interaction term (M3 = response ~ time * groups)
and again compare.
Then, adding the random effects for the intercept would result in (M4 = response ~time*groups, random = 1|Subject), and finally the full model, with random effects for both intercept and slope (M5 = response ~ time*groups, random = Time|Subject).
On the other hand, I could start including the random effects from zero (M1). Is there a correct approach to this problem?
On the comparison part:
I am comparing models that differ in their fixed effects through Wald t-tests (anova(mn)). With this result I check the individual significance of a fixed effect instead of comparing two or more models directly.
Whereas when the fixed effects are the same but the changes occur in the random effects, I am using anova (m1, m2, ...mn) to compare the best model.
Is this the correct approach also?
The *best* approach is to have good arguments for a particular model. Model comparison based only on a set of observed data is necessarily suboptimal.
Is the linear relationship between the response and time reasonable?
Is the chosen error model (e.g. normal) reasonable?
I don't understand why you want to compare the fixed-effects-only models when you know that you have longitudinal data. These models ignore that the errors are partially correlated!
For the random effect (Time|Subject), be aware that here intercept and slope are correlated. If you want to allow the model to fit a random intercept that is uncorrelated with the random slope, then use (Time-1|Subject) + (1|Subject).
Fit the model you deem reasonable and then interpret the fitted coefficients. If you think this model is too complicated, then you should not build it in the first place. Instead, make a simpler model and check its performance (with respect to the aim of your model, which in this case typically is prediction or forecasting). I don't see the logic behind building different models and comparing them on a given set of data.
If you really have a couple of different models, then you should explicitly state the (relative) prior credibility of these models. You can then use the data to adjust these credibilities. That would be a "Bayesian model comparison", if you like. This way you at least have clear statements about your models, conditioned on the data, whereas your model comparisons only give you information about the data, conditioned on the models (whose credibility is nowhere stated).
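As a concrete sketch of the correlated vs. uncorrelated random-slope distinction with nlme (simulated data; all parameter values are illustrative, and in lme4 notation the diagonal version corresponds to the (Time-1|Subject) + (1|Subject) form mentioned above):

```r
library(nlme)

# Simulate 30 subjects x 10 time points with subject-specific
# intercepts and slopes.
set.seed(5)
d  <- expand.grid(Subject = factor(1:30), Time = 1:10)
b0 <- rnorm(30, 0, 1)     # random intercepts
b1 <- rnorm(30, 0, 0.2)   # random slopes
d$y <- b0[d$Subject] + (0.5 + b1[d$Subject]) * d$Time + rnorm(nrow(d), 0, 0.5)

# Correlated random intercept and slope:
m_full <- lme(y ~ Time, random = ~ Time | Subject, data = d, method = "ML")
# Uncorrelated random intercept and slope (diagonal covariance):
m_diag <- lme(y ~ Time, random = list(Subject = pdDiag(~ Time)),
              data = d, method = "ML")

anova(m_diag, m_full)  # LRT on the single intercept-slope correlation parameter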
• asked a question related to Comparative Modeling
Question
My coauthors and I are finishing up a project where we look at ways in which model data is compared with observational data. We found specific conceptual disconnects for using r2 and various Goodness-of-Fit methods for this. As a matter of academic diligence, I would like to inquire what other researchers use for comparing model output with observational data. r2 is probably the most popular because it is so easy to implement. If r2 were not available to you, what would you use? A preprint of our article is available at
Tabulate theoretical values (from the model) against the corresponding observed values (from experiment), then calculate the RMSD (root-mean-square deviation). The more value pairs you include, the better.
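For completeness, the computation is a one-liner; a sketch with made-up value pairs:

```r
# RMSD between model predictions and observations.
rmsd <- function(predicted, observed) sqrt(mean((predicted - observed)^2))

predicted <- c(1.0, 2.0, 3.0)   # model values (illustrative)
observed  <- c(1.1, 1.9, 3.2)   # experimental values (illustrative)
rmsd(predicted, observed)
```

Unlike r2, the RMSD is in the units of the observable itself, which makes it directly interpretable as a typical model-observation discrepancy.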
• asked a question related to Comparative Modeling
Question
Hello, I have created a SWAT model for a basin where flow data do not exist. There are few water quality measurements: fewer than 10 each for 2013 and 2016 only, with the 2013 data from July to October and the 2016 data from September to December. Am I even able to calibrate and validate this model? And how can I compare SWAT outputs with my water quality nutrient data? Much guidance needed, thanks.
Hi
As you know, calibration is a difficult issue, especially in ungauged or data-sparse regions. New studies in this field show that you can use satellite-based datasets for the calibration of hydrological models like SWAT.
You can predict streamflow by calibrating simulated soil moisture against remote-sensing soil moisture products such as SMOS and ASCAT.
In other words, by calibrating soil moisture you can simulate streamflow accurately. I have attached some new papers in this field for you.
I hope that works
the best
Azizian
• asked a question related to Comparative Modeling
Question
I am currently using a dynamic in vitro BBB model for my research, and I would like to compare models using primary rat brain endothelial cells (which I am isolating myself) and an immortalised rat brain endothelial cell line. I would like to use RBE4 in particular as they are the most well-characterised, and I do not want to use a human cell line as I would like to be able to directly compare the results with my primary rat cells. However, I cannot find a source of these cells within the UK, any help in sourcing some would be appreciated.
The authors of the 2nd publication listed in the Cellosaurus entry for RBE4 are from London, so a good starting point would be to contact them, as there are no cell line collections that distribute RBE4.
Best regards
Amos
• asked a question related to Comparative Modeling
Question
The likelihood ratio test comparing the reduced model with the full model, which differs by one fixed factor, results in a chi-square distribution with zero degrees of freedom.
/* reduced model */
proc mixed method = ml;
class block gen;
model rtwt = /ddfm = kr;
random block gen;
run;
/* full model*/
proc mixed method = ml;
class block gen;
model rtwt = prop_hav/ddfm = kr;
random block gen;
run;
There are 3 degrees of freedom in the reduced model: block variance, genotype variance, and residual variance. The full model, which includes prop_hav as a covariate, has the same degrees of freedom, so the difference in their -2 log-likelihoods has zero degrees of freedom under the chi-square distribution. Please, could anyone guide me on how to compare these models to ascertain whether the full model is significantly different from the reduced model?
LRT only works when one model is a simplification of the other. You can't take terms out and add others in with one go; do it in two stages.
the models to be compared must be nested
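Note that the two models here are nested (the reduced model is the full model without prop_hav) and, with method = ml, they differ by one fixed-effect parameter, so the LRT has 1 degree of freedom; the zero came from counting only the covariance parameters. A stripped-down sketch of the same comparison in R with nlme (simulated data, one random term for brevity):

```r
library(nlme)

set.seed(6)
d <- data.frame(block = factor(rep(1:5, each = 20)))
d$prop_hav <- runif(100)
d$rtwt <- 2 + 1.5 * d$prop_hav + rnorm(5, 0, 0.5)[d$block] + rnorm(100, 0, 0.3)

# Both models fitted by ML so the likelihoods are comparable.
m_red  <- lme(rtwt ~ 1,        random = ~ 1 | block, data = d, method = "ML")
m_full <- lme(rtwt ~ prop_hav, random = ~ 1 | block, data = d, method = "ML")

anova(m_red, m_full)  # chi-square LRT on 1 df for the prop_hav covariate
```

The anova output shows the two models differing by one parameter and reports the corresponding 1-df p-value, which is the test the question is after.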
• asked a question related to Comparative Modeling
Question
I want to compare FEM model results and experimental results for a tunnel lining with theory or any standard norm!
Thank you
There is the analytical solution of Wang and Penzien; it consists of simple equations, and you can even find worked calculations in the paper. I do not know why you are using SAP, as you cannot model soil as a continuum material in SAP. I am not sure there is any analytical solution for the structural forces of a circular tunnel with soil springs. I suggest you use geotechnical FEM (PLAXIS, ABAQUS, LS-DYNA, OpenSees) or FDM (FLAC) software to compare structural forces with the analytical solution.
• asked a question related to Comparative Modeling
Question
When first being exposed to methods of model selection in ARIMA class models, I was told that visual inspection of the ACF and PACF (when data is stationary) is satisfactory.
However, when compared with model selection using AIC or BIC, we sometimes end up with a different model.
Attached is a plot of the time series data I am using, along with its ACF and PACF. Visual inspection would lead us to conclude that the appropriate model is an AR(1).
However based on the model from AIC (given by the R command auto.arima) the appropriate model is an ARMA(2,2).
In terms of model selection, which method is preferred: visual inspection of the ACF and PACF, or the use of an information criterion?
Here are some observations:
1- We inspect the ACF and PACF of the time series itself to see which model they suggest.
2- We check the ACF and PACF of the residuals, after fitting a model to the time series, to see whether the residuals are white noise.
3- We use an information criterion like AIC or BIC to choose among correctly fitted models.
4- From my own experience, depending on how complex your time series is, auto.arima() might not be the best modeling tool. Rather, you should use arima() and set its parameters based on the investigation you have already done on the ACF and PACF of your time series.
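The two approaches can be reconciled by letting the ACF/PACF suggest a small candidate set and then letting AIC pick within it. A sketch with stats::arima() on a simulated series whose true order, AR(1), is known:

```r
set.seed(7)
y <- arima.sim(model = list(ar = 0.6), n = 300)  # known AR(1) truth

orders <- expand.grid(p = 0:2, q = 0:2)  # candidate set from ACF/PACF inspection
aics <- apply(orders, 1, function(o) {
  fit <- tryCatch(arima(y, order = c(o["p"], 0, o["q"])),
                  error = function(e) NULL)
  if (is.null(fit)) Inf else AIC(fit)
})
orders[which.min(aics), ]  # order preferred by AIC

# Visual route: for a true AR(1), the PACF should cut off after lag 1.
pacf(y, plot = FALSE)$acf[1]
```

When the AIC winner and the ACF/PACF reading disagree, checking the residual ACF of both fits (observation 2 above) usually settles which one is adequate with fewer parameters.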
• asked a question related to Comparative Modeling
Question
I need to compare models for predicting DBH and height from stump diameter at 30 cm. I am looking for allometric models for tropical and subtropical tree species. My objective is to predict stem/bole biomass from stump diameter at 30 cm. I have species-wise allometric equations for volume and biomass for a particular region, but these equations are based on DBH and height, not on stump diameter.
Is there any alternative way to reach my objective?
I need suggestions...
Since your ultimate goal is to develop an allometric equation that predicts stem biomass (above ground) from stump diameter, you should first calculate volume from DBH and height, then calculate stem biomass from volume and species-specific wood density. If you have enough stump diameter and stem biomass data, you can easily develop an allometric equation of the form:
Stem biomass ~ f(stump diameter)
Depending on your data, you can try fitting different linear/non-linear models and choose the best-fitting model based on R2/AIC values.
Hope this helps.
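A common functional form for such equations is a power law, biomass = a * d^b, fitted with nls(). A sketch on simulated data; the coefficients, sample size, and diameter range are illustrative placeholders for real stump measurements:

```r
set.seed(8)
d30 <- runif(80, 5, 40)                             # stump diameter at 30 cm
biomass <- 0.08 * d30^2.3 * exp(rnorm(80, 0, 0.1))  # multiplicative error

fit <- nls(biomass ~ a * d30^b, start = list(a = 0.1, b = 2))
coef(fit)   # estimates of a and b
AIC(fit)    # for comparison against alternative functional forms
```

Fitting log(biomass) ~ log(d30) with lm() is a common alternative parameterization of the same power law; comparing both by AIC on the same scale is one way to choose between candidate forms.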
• asked a question related to Comparative Modeling
Question
I am wanting to take a 10-12 residue loop from an E.coli enzyme polypeptide which confers a particular allosteric mechanism(substrate specificity) and apply it to a non-homologous mouse catalytic enzyme in order to generate a novel enzyme whose hybrid loop I can then refine and adjust to see if we can make it produce a similar allosteric selectivity depending on the loop orientation in response to different substrates binding at the active site.
Now, my question is this; what freely available computational resources exist that can do this?
I'm vaguely familiar with a number of programs and servers, but Rosetta match/design/dock is non-user-friendly and overly complex, with no clear indication that it can accomplish what I want specifically. I'm aware it offers loop building, matching, and modelling, but I cannot find evidence in the literature of anyone doing this with loops, and there's no clear answer as to whether it can place a loop at a specific location/orientation. The existing literature seems only ever to use comparative modelling, which is no use here as there's no homology; we are trying to create a novel function. (There is also no clear GUI, so I cannot observe or experiment with placement.)
PyMOL's build function, Chimera's loop builder, and PeptideBuilder all seem to suggest their builders and minimizers can do what I need, but there's never a clear "add this loop to this protein in this specific location" function; it's always too indirect.
I'm fairly new to computational biochemistry; can someone please point me in the right direction? How can I simply and clearly cut a loop from one protein, stick it onto another computationally, and then assess whether it modulates enzyme specificity?
Is there a service or program that does this? Or is my best bet using the likes of I-TASSER, zdock or Rosetta?
Thanks in advance, I'm probably missing something alarmingly obvious, but I've been going round in circles and it's giving me such a headache!
Prime also provides useful tools for loop modeling
• asked a question related to Comparative Modeling
Question
Hello, everyone:
Recently, I was trying to build 3D structure of one protein which has no crystal structure, I was wondering some questions.
First, what is the lowest threshold of sequence coverage and identity between the target sequence and the template for comparative modeling?
Second, does ab initio prediction generate reliable results for a protein with no significant sequence identity to any known template?
Third, using the popular software I-TASSER, the resulting parameters are as follows: C-score is -0.78, TM-score is 0.61±0.14 and expected RMSD is 9.4±4.6 Å. The manual says that a result with a C-score greater than -1.5 is reliable. However, I find that the sequence coverage and identity between my target sequence and the best template are 12% and 31%, respectively. Can anyone offer an evaluation of this prediction? Can I use it in a published article?
Finally, can anyone suggest a pipeline or classical articles for the prediction of 3D structure?
Thanks very much for any reply!
Actually, the coverage is quite unacceptable; we usually don't accept less than 35% coverage, as lower coverage introduces an enormous degree of uncertainty into the model, which in turn makes it useless for further structural analysis. The RMSD is also discouraging. Usually a homology search against the PDB will provide some good templates to start with, which you could direct your modelling server to try. Trials should then yield an acceptable model, whose energy, local geometry and stability can be assessed with quality-estimation methods. The case may be quite difficult for transmembrane proteins, so it depends.
• asked a question related to Comparative Modeling
Question
Where can I access topside sounder data (e.g. Alouette I/II, ISIS I/II, etc.)? I would like some actual electron density profiles to compare with models. The higher the orbital altitude, the better.
Best regards,
Alex
You can also FTP the data in CDF format. If you prefer that, let me know and I'll pass on a link or just send you the files.
• asked a question related to Comparative Modeling
Question
I would like help updating myself on the latest developments in DEA. Are there any comparable models that have been developed beyond DEA? SFA and DEA are productivity assessment tools. Kindly suggest any developments in the area of productivity measurement.
Dear Avinash,
I suggest some links and attached files on the subject.
- Data Envelopment Analysis as Nonparametric Least-Squares ... - JStor
- An introduction to efficiency and productivity - The Ohio State University
Best regards
• asked a question related to Comparative Modeling
Question
I want to make a comparison between my proposed model and one existing model based on 3 different performance error metrics: MSE, RMSE and MAPE. I also have 4 different medical data sets to be trained for classification purposes. My question is: should I use the same number of iterations with the 3 different performance error metrics for each data set? And what is the best way to compare the models in terms of performance error?
When plotting the graph, what is the best way to plot the curve? Performance error vs. number of iterations, or something else?
Thanks
What's the bottom line? How to compare models
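To keep the comparison fair, the three error metrics from the question can be computed with one routine applied identically to every model/data-set pair, with the iteration budget held the same for all models so the curves are comparable. A minimal sketch (the array names are placeholders):

```python
import numpy as np

def error_metrics(y_true, y_pred):
    """Return MSE, RMSE and MAPE for one model on one data set."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mse = np.mean((y_true - y_pred) ** 2)
    rmse = np.sqrt(mse)
    # MAPE (in percent) is undefined when y_true contains zeros
    mape = np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0
    return {"MSE": mse, "RMSE": rmse, "MAPE": mape}
```

As for the plot, performance error vs. number of iterations (one curve per model, one panel per data set and metric) is a common choice, since it shows both the final error and the convergence speed.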
• asked a question related to Comparative Modeling
Question
Respected All,
I have modeled a structure. When I assess the quality of the modeled structure after energy minimization with a Ramachandran plot, some residues other than those in the binding pocket lie in the outlier region. What are the possible reasons for these residues lying in the outlier region? How can we determine the reason behind this error in the model?
Best Regard,
Unfortunately, Nasir, outliers in structures are just bad conformations of amino acids. However, if you are working with unstructured or disordered proteins, then you might have a large number of outliers, since that is an intrinsic property of such proteins - the "disordered state".
Regarding whether amino acids in the outlier region have specific physiological roles - I am not sure, since there are no such reports to my knowledge.
• asked a question related to Comparative Modeling
Question
I am interested in comparing model fit among PROC MIXED repeated-measures models for 8 different outcomes. Fit statistics (AIC, AICC, etc.) were used to select the best-fitting model for each of the 8 outcomes; however, I would like to be able to compare the best-fit models for the different outcomes to each other. For example, if I wanted to compare how much variation in weight is accounted for by the best-fit model for weight with how much variation in height is accounted for by the best-fit model for height, what metric would I use? I have found some reports and publications that use the covariance parameter estimates to compute the intraclass correlation coefficient (ICC) to compare models employing different hierarchical structures for the same outcome, and I wondered if the ICC could be used in my situation as well.
OK I understand better now. Actually I misread and thought that there were 8 independent variables and that confused me. But I see now that there are 8 possible outcomes and we are selecting the best-fit model.
ICC, AIC and AICC etc will not help in this case. ICC is usable if you have a cluster or group. You need a statistic like an R-squared to measure variance explained by explanatory variables. I have attached references about this. I understand that this is an open research subject now.
IMO, selecting the best model can also be done by analyzing the residuals obtained from the different models. You could check whether they are normal, homoscedastic and small; the model which gives you the best residuals might be considered best.
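The residual checks suggested above can be scripted so that every candidate model is screened the same way. A minimal sketch using SciPy; these are crude screens rather than formal model-selection tests:

```python
import numpy as np
from scipy import stats

def residual_report(residuals, fitted):
    """Crude residual diagnostics for comparing candidate models."""
    residuals = np.asarray(residuals, dtype=float)
    fitted = np.asarray(fitted, dtype=float)
    # Shapiro-Wilk: small p-value suggests the residuals are not normal
    _, normality_p = stats.shapiro(residuals)
    # Spearman correlation between |residuals| and fitted values:
    # a strong correlation suggests heteroscedasticity
    rho, _ = stats.spearmanr(np.abs(residuals), fitted)
    return {"rmse": float(np.sqrt(np.mean(residuals ** 2))),
            "normality_p": float(normality_p),
            "heteroscedasticity_rho": float(rho)}
```

Running this for each fitted model gives a small table of residual size, normality and heteroscedasticity evidence that can be compared across the candidates.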
• asked a question related to Comparative Modeling
Question
Why has the measured reflectance peak amplitude decreased compared with the modeled spectra?
If you are talking about specular reflectance, a common reason is surface roughness. To give a more extensive answer, we would need more information about what kind of sample you measure, in what spectral range, by what method and instrument, etc.
• asked a question related to Comparative Modeling
Question
I modeled the structures of two protein domains by comparative modelling using MODELLER. I want to link or combine them into one dimeric protein based on an available template. I need your experience here, since I am stuck. I am awaiting your proposed procedure and MODELLER script.
Yours sincerely
Dear Naresh,
I sent  the paper to your private account.
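Since the thread asks for a concrete MODELLER script: below is a minimal sketch of the standard multi-template approach, in which the full-length target sequence is aligned against both domain templates in a single PIR file and MODELLER builds the combined model. All file names and codes here are placeholders, MODELLER requires its own (free academic) license key, and note that if no template covers the inter-domain interface, the relative orientation of the two domains will be essentially arbitrary and needs refinement or docking afterwards.

```python
# Minimal MODELLER sketch for modelling a two-domain target against two
# templates (one per domain). File names and codes are placeholders.
from modeller import environ
from modeller.automodel import automodel

env = environ()
env.io.atom_files_directory = ['.']

# 'target.ali' is a PIR alignment in which the target sequence spans both
# domains, and each template ('dom1A', 'dom2A') is aligned to its own
# domain with gaps over the other, roughly:
#
#   >P1;dom1A
#   structureX:dom1A:FIRST:A:LAST:A::::
#   ABC...---...*
#   >P1;dom2A
#   structureX:dom2A:FIRST:A:LAST:A::::
#   ---...DEF...*
#   >P1;target
#   sequence:target::::::::
#   ABC...DEF...*

a = automodel(env,
              alnfile='target.ali',       # placeholder alignment file
              knowns=('dom1A', 'dom2A'),  # placeholder template codes
              sequence='target')          # target code in the alignment
a.starting_model = 1
a.ending_model = 5                        # build five candidate models
a.make()
```

The key point is the single alignment file: each template aligns to its own domain and carries gaps over the other, while the target spans both, so MODELLER treats the pair as one multi-template modelling run.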
• asked a question related to Comparative Modeling
Question
I used the "comparative modeling" algorithm of the online tool ROBETTA to model a protein structure (http://robetta.bakerlab.org/). The tool always generates five models, to which the ROBETTA FAQs state "In the case of homology modeling predictions, the models are ordered by the likelihood of the alignment clusters. Each alignment cluster represents a particular topology."
Thus, if I understand it correctly, model 1 is more likely to be the "true structure" than model 2, model 2 more likely than model 3, and so on. I am wondering whether this is quantifiable; in other words, I want to know HOW MUCH more likely model 1 is than model 2. Is it possible to get some numbers for this?
I have a second problem. I generated a mutant of the protein that I am trying to model and the mutation (one single point mutant) is a complete loss-of-function mutant in vivo. When I run the comparative modeling algorithm on ROBETTA for the wildtype protein I get a structure that looks very plausible (for a lot of reasons) and it is the model ranking second ("model 2"). If I use the "mutant" with the "loss-of-function" mutation" under the same conditions in the same algorithm, I also get this structure (or something VERY SIMILAR). However, this structure is now not model 2 but instead model 4. It seems to be "ranking lower" (i.e. be "less likely"?). Can I interpret these data in a way that the loss-of-function mutation makes it harder to acquire this specific fold (e.g. can I at least say the data hint in this direction)? And if yes, would it be possible to quantify this (i.e. saying HOW MUCH does the mutation interfere with folding)?
Hi Ralf,
If the size of your protein is less than 400 amino acids you can give a try on CABS Flex server.
3 amino acids is not much of a deviation so I concur with the structural biologist.
Regarding the folding cascade of the mutant protein it is difficult to comment offhand but I believe if the mutation is in a structurally important region for the protein, then it might assume a completely different folding pathway in comparison to its wild type.
• asked a question related to Comparative Modeling
Question
Hello,
I am modeling a tsunami in FEM software. To reduce the calculation time I set it up as a 2D model (assuming plane-strain conditions over a 1 mm width).
To compare the model results with experimental results, do I have to consider any factors (e.g. multiplying the software results by the experimental model width, etc.)?
Hello:
In the first place, whether or not you can reduce the FEM model to 2D must be justified by at least a first level understanding of the actual problem. You must be convinced that strain, for example, is negligible in the z-direction. For example, take strip rolling or forging.
If you are convinced that the problem you are solving is reducible to 2D model without much loss of accuracy, then you can compare your FEM results with whatever experimental results available for the system you are addressing.
Sometimes the experimental system may also have to be specifically devised as a reduced and simpler model of reality, so that you can far more easily conduct the experiments and compare your simulation results with it. Remember, though, that the validation in this case may not exactly hold for the real scenario, but certain conclusions can always be deduced, and the FEM model may possibly be used to predict the behaviour of the system under an unknown loading condition.
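On the specific question of factors: plane-strain FEM output is effectively "per unit out-of-plane width", so integrated quantities (e.g. total force on a wall) must be rescaled before comparison with the 3D experiment, while field quantities (pressures, free-surface elevations) compare directly. A minimal sketch of the bookkeeping, where the flume width is a placeholder and the 1 mm is the model width assumed in the question:

```python
# Plane-strain FEM results are reported over the model's assumed width.
# To compare an integrated quantity with a 3D experiment, convert to
# force per unit width, then multiply by the experimental flume width.
MODEL_WIDTH_M = 0.001   # the 1 mm width assumed in the 2D model
FLUME_WIDTH_M = 0.5     # placeholder experimental flume width

def scale_to_experiment(force_model_N, model_width_m=MODEL_WIDTH_M,
                        flume_width_m=FLUME_WIDTH_M):
    """Convert a force computed over the model width to the full flume width."""
    force_per_unit_width = force_model_N / model_width_m   # N/m
    return force_per_unit_width * flume_width_m            # N over full width
```

For example, a 0.2 N force on the 1 mm slice corresponds to 100 N over a 0.5 m flume; this only holds if the flow really is uniform across the flume width, which is exactly the plane-strain assumption being tested.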
• asked a question related to Comparative Modeling
Question
Hi,
I'm using a Lagrangian dispersion model to study the contribution of urban traffic to NOX concentrations. We need to compare the model results with measurements, but of course the model does not include chemistry and cannot simulate photochemical NOX. On the other hand, we have both a regional-scale model that can treat the full chemistry and access to data from several monitoring stations.
I'm currently studying the method described in Saravanan A., et al, "A Method for Estimating Urban Background Concentrations in Support of Hybrid Air Pollution Modeling for Environmental Health Studies", Int. J. Environ. Res. Public Health 2014, but I can't seem to find a practical example of the method's application.
Can someone suggest an alternative method with a reference in which we can see it actually applied to some cases?
thanks
Felicita
Yes Francesco,
I received it. Thank you very much!
Felicita
• asked a question related to Comparative Modeling
Question
According to ASME standards, validation of a physical/mathematical model has to be done by comparing model predictions with the "real world", i.e. with experimental results.
Often, however, no such results are available, at least not with the necessary accuracy and error analysis when it comes to turbulent flow.
Shouldn't we, as an alternative, take high quality DNS results instead and still call it "validation of a model" ?
The problem with validating models against DNS is that some or many of the DNS data sets have not gone through a proper scrutiny check themselves; see e.g. our assessment of available DNS data sets for one of the (not THE) most canonical flow cases, the zero-pressure-gradient turbulent boundary layer.
However, once such a scrutiny check has been performed (the details of which need to be discussed, I presume), both DNS and experiments should be considered "experiments", i.e. a numerical and a physical experiment, each representing a "real flow case". As shown here,
it is not a validation against experiments, but a cross-validation of both sides.
• asked a question related to Comparative Modeling
Question
It is known that Wald tests and likelihood ratio tests typically yield very similar results, especially as the sample size increases.
When conducting a trend test for a continuous variable in CLR, should I report the p-value for trend from the likelihood ratio test comparing the model with the continuous variable to that without it, or report the p-value for the continuous odds ratio (Wald test)?
Using an LR test to test for a linear trend, you should NOT compare the model with the continuous variable to the model without the continuous variable.
Instead, you should compare the model with the categorized variable to the model with the same variable as a continuous variable.
Say you have a variable with 5 levels (e.g. age groups 1 through 5) and you would like to test for a linear trend across these 5 levels on some outcome (e.g. yearly income over $100,000). First you estimate the model with these 5 levels as individual levels using indicator variables (leaving one level out as the reference level); if no other explanatory variables are included, this model has 4 df (degrees of freedom). You save the model likelihood. Then you re-estimate the same model using the variable as a continuous variable. This model has only 1 df (if no other explanatory variables are included), that is, 3 fewer df than the full model. You save the likelihood for this model and compare it with the likelihood for the full model; that is an LR test with 3 df. If this test is insignificant, you can infer that the simple (continuous) model is not statistically worse than the full model, and as it is simpler, it is "better", so you can conclude that your data are well described by the linear model. However, the estimate for the linear trend also needs to be significant.
That is, to show a linear trend you need to demonstrate two things: 1) an insignificant LR-test as described above, and 2) a significant estimate for the continuous variable.
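The procedure above can be sketched end-to-end with only NumPy/SciPy. For illustration this uses plain (unconditional) logistic regression on simulated data; the same full-vs-continuous comparison applies inside CLR software, where the fitted log-likelihoods are reported:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import chi2

def logit_llf(X, y):
    """Maximised Bernoulli log-likelihood of a logistic regression y ~ X."""
    def nll(beta):
        eta = X @ beta
        # negative log-likelihood, written stably: log(1+e^eta) = logaddexp(0, eta)
        return -np.sum(y * eta - np.logaddexp(0.0, eta))
    res = minimize(nll, np.zeros(X.shape[1]), method="BFGS")
    return -res.fun

rng = np.random.default_rng(42)
group = rng.integers(1, 6, size=500)               # 5 ordered levels
p_true = 1.0 / (1.0 + np.exp(-(-2.0 + 0.5 * group)))  # a true linear trend
y = rng.binomial(1, p_true)

ones = np.ones_like(group, dtype=float)
# Full model: intercept + 4 indicator variables (level 1 = reference), 4 df
X_full = np.column_stack([ones] + [(group == k).astype(float) for k in (2, 3, 4, 5)])
# Reduced model: intercept + the level as one continuous covariate, 1 df
X_lin = np.column_stack([ones, group.astype(float)])

lr_stat = 2.0 * (logit_llf(X_full, y) - logit_llf(X_lin, y))
p_value = chi2.sf(lr_stat, df=3)   # 4 - 1 = 3 df
# an insignificant p_value means the linear model is not statistically worse
```

With data generated under a true linear trend, the LR test is usually insignificant, which together with a significant slope estimate is exactly the two-part demonstration described above.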
• asked a question related to Comparative Modeling
Question
I'm using "out-of-sample maximum likelihood" to assess model fit for several candidate prediction models generated in SAS 9.3 using PROC GLIMMIX or PROC GENMOD (binary distribution). "Out-of-sample maximum likelihood" is calculated as: ∑(measured * log(predicted) + (1-measured)*log(1-predicted)). Has anyone used this method to compare similar models, or can you recommend other statistics for comparing models and assessing model fit? I'm seeking alternative methods or references for applying "out-of-sample maximum likelihood" to assess model fit.
I am not experienced, but I found this:
to be an interesting presentation.
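For reference, the statistic in the question is the held-out Bernoulli log-likelihood (the negative of the cross-entropy loss), and it is easy to compute outside SAS as a cross-check. A minimal sketch, with clipping added because a predicted probability of exactly 0 or 1 makes the log blow up:

```python
import numpy as np

def oos_log_likelihood(measured, predicted, eps=1e-12):
    """Out-of-sample Bernoulli log-likelihood on held-out data:
    sum(y*log(p) + (1-y)*log(1-p)); higher (less negative) is better."""
    y = np.asarray(measured, dtype=float)
    p = np.clip(np.asarray(predicted, dtype=float), eps, 1.0 - eps)  # avoid log(0)
    return float(np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))
```

Comparing candidates on the same held-out fold, the model with the larger value fits better; to compare across folds or data sets of different sizes, divide by the number of held-out observations.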
• asked a question related to Comparative Modeling
Question
The AUROC does not have to equal the model accuracy. Sometimes the AUC is higher than the accuracy and vice versa. What does that tell us? Is there any link to ranking quality or the confidence of the model's decisions?
The AUROC has a direct relation to the confidence of the model's decisions. It helps you decide where to draw the threshold between normal and abnormal cases. It depends on the relation between sensitivity and specificity, where sensitivity is the ratio of correctly classified positive cases to the number of positive cases, while specificity is the ratio of correctly classified negative cases to the number of negative cases.
If the ROC curve lies below the (specificity + sensitivity) line, the model can identify positive cases at a high accuracy rate but detects negative cases at a lower rate, while if the ROC curve lies above the line, the negative cases will be classified correctly more often.
The perfect point on the curve is (0, 1), where sensitivity = 1 and (1 - specificity) = 0.
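The link to ranking quality can be made concrete: the AUROC equals the probability that a randomly chosen positive case is scored higher than a randomly chosen negative one (the Mann-Whitney U formulation), independent of any threshold, while accuracy depends on one fixed threshold. A minimal sketch:

```python
import numpy as np
from scipy.stats import rankdata

def auroc(y_true, scores):
    """AUROC via the rank-sum (Mann-Whitney) formulation.
    Equals P(score of random positive > score of random negative),
    counting ties as 1/2."""
    y = np.asarray(y_true)
    ranks = rankdata(scores)             # average ranks handle ties
    n_pos = int(np.sum(y == 1))
    n_neg = len(y) - n_pos
    rank_sum_pos = float(np.sum(ranks[y == 1]))
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)
```

For example, `auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` gives 0.75: three of the four positive/negative pairs are ranked correctly, regardless of where the decision threshold is drawn.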
• asked a question related to Comparative Modeling
Question
If I am trying to estimate two competing models using maximum likelihood estimation and one of them fails to converge, can I assume that the model that converged is a better fit?
No, it depends on the optimization method you used. Possibly an initial point close to a singular point makes the method diverge. What kind of method did you use? Or did you use a built-in method from some package?
• asked a question related to Comparative Modeling
Question
Oftentimes, applied researchers use nonlinear transformations on variables that are not normally distributed prior to modeling. I would like to show that other techniques result in superior models, but comparative model-fit statistics such as AIC and BIC cannot be used across a nonlinear transformation (Burnham & Anderson, 2002). Are there other statistical methods for comparing models?
The burden of proof is that the method is demonstrably better. Apply the method and show that the data are described and interpreted in a meaningful manner and better than another way.
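One concrete route (my suggestion, not from the thread or from Burnham & Anderson directly): AIC values do become comparable across a monotone transformation of the outcome if the transformed model's log-likelihood is mapped back to the original scale via the log-Jacobian of the transformation; for z = log(y) this means subtracting ∑ log(y), the same correction that appears in the Box-Cox likelihood. A sketch with ordinary least squares on simulated data:

```python
import numpy as np

def ols_llf(y, X):
    """Gaussian log-likelihood of OLS y ~ X at the MLE."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    n = len(y)
    return -0.5 * n * (np.log(2 * np.pi) + np.log(rss / n) + 1.0)

rng = np.random.default_rng(1)
x = rng.uniform(1, 5, size=200)
y = np.exp(0.5 + 0.8 * x + rng.normal(scale=0.4, size=200))  # log-normal data
X = np.column_stack([np.ones_like(x), x])

k = 3  # intercept, slope, error variance
aic_raw = 2 * k - 2 * ols_llf(y, X)
# Model on log(y): correct its log-likelihood back to the y scale with the
# Jacobian term sum(log|dz/dy|) = -sum(log y); the AICs are then comparable.
aic_log = 2 * k - 2 * (ols_llf(np.log(y), X) - np.sum(np.log(y)))
```

On this data, which is generated log-linear, the Jacobian-corrected AIC of the log-scale model comes out lower, correctly identifying the transformation; without the correction the two AICs live on different scales and the comparison is meaningless.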
• asked a question related to Comparative Modeling
Question
I took a protein domain sequence from the UniProt database. My current aim is to build a structural model of this domain using a domain homologue with 25% sequence identity as the template. Will that be a good choice for the purpose?
In my opinion a good model is a model that is useful [for designing experiments, informing hypotheses, rationalise sequence conservation etc], so it really depends on what you want to use your model for.
In practical terms I would use profile alignments and fold recognition to avoid or reduce template/target alignment errors. Then use template based modelling e.g. in modeller.
If more than one structure of this domain has been solved, look at the structural variation. That should give you some idea of where to expect variation [here your model will in all likelihood be least reliable] and which structural features are more conserved [you are probably closer to reality here].
@Milad I quite like I-TASSER based on its impressive CASP performances, but I prefer MODELLER when there is an obvious template for modelling [and, importantly, I use my own alignment in MODELLER]. At 25% sequence identity I wouldn't worry about loop prediction [or bother much with loop refinement], as in all likelihood there will be inaccuracies in the 'core' already.