MCMC - Science topic

Questions related to MCMC
  • asked a question related to MCMC
Question
2 answers
What does the close similarity in estimation results up to the third decimal place between the MLE and MCMC methods indicate, given that the difference only appears in the fourth decimal place?
Relevant answer
Answer
It indicates that, if your MCMC was formulated with a Bayesian argument in mind, your priors were not very informative, so the data dominate. The posterior is then essentially proportional to the likelihood, and its mode coincides with the MLE. The question becomes whether you actually set up the model with a Bayesian argument in mind.
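The point about weak priors can be checked on a toy problem. The sketch below is only an illustration under stated assumptions: the data are simulated, the noise standard deviation is taken as known, the prior on the mean is flat, and the function names are hypothetical. With a flat prior the posterior is proportional to the likelihood, so the posterior mean from a random-walk Metropolis sampler should land very close to the MLE.

```python
import math
import random

def log_likelihood(mu, data, sigma=1.0):
    """Gaussian log-likelihood of the mean, up to an additive constant."""
    return -0.5 * sum((x - mu) ** 2 for x in data) / sigma ** 2

def metropolis_flat_prior(data, n_steps=20000, step=0.3, seed=0):
    """Random-walk Metropolis for mu under a flat prior (assumed for illustration)."""
    rng = random.Random(seed)
    mu = 0.0
    current = log_likelihood(mu, data)
    draws = []
    for _ in range(n_steps):
        proposal = mu + rng.gauss(0.0, step)
        candidate = log_likelihood(proposal, data)
        # Accept with probability min(1, posterior ratio); 1 - random() lies in (0, 1].
        if math.log(1.0 - rng.random()) < candidate - current:
            mu, current = proposal, candidate
        draws.append(mu)
    return draws

# Simulated data (assumption): 50 points around a true mean of 2.0.
data_rng = random.Random(1)
data = [data_rng.gauss(2.0, 1.0) for _ in range(50)]

mle = sum(data) / len(data)                 # MLE of a Gaussian mean is the sample mean
draws = metropolis_flat_prior(data)[2000:]  # discard burn-in
posterior_mean = sum(draws) / len(draws)
print(f"MLE = {mle:.4f}, posterior mean = {posterior_mean:.4f}")
```

Any remaining gap between the two numbers is Monte Carlo error plus whatever information the prior carries; with an informative prior the two would be expected to drift apart.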
  • asked a question related to MCMC
Question
1 answer
I've set up an emcee EnsembleSampler() with 50 walkers, 500 iterations. However, looking at the resulting traces, I don't think my walkers are fully exploring the parameter space. The walkers don't converge around what the true value of the parameter is.
The prior I'm giving to all parameters is uniform, and the p0 distribution is a random uniform distribution. The reason I believe something is wrong is that the walkers appear to explore all of the parameter space for the majority of their iterations without converging to the parameter value.
Graphs below:
Relevant answer
Answer
Yes, walker positions can converge when using an ensemble sampler, such as the affine-invariant MCMC ensemble sampler (e.g., emcee). In this method, a group of "walkers" (or chains) explore the parameter space simultaneously. As the walkers move, they collectively sample the posterior distribution. Over time, if the sampling is efficient and the model parameters are well-behaved, the walker positions converge towards the true posterior distribution. Convergence is typically assessed through diagnostics like the Gelman-Rubin statistic or by visually inspecting trace plots.
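The Gelman-Rubin statistic mentioned above can be sketched in a few lines. This is a minimal, library-free version of the classic (non-split) potential scale reduction factor; the function name and the simulated chains are assumptions for illustration, not part of emcee. Note also that walkers in an ensemble sampler are not independent chains, so this diagnostic is only a rough guide there.

```python
import math
import random
import statistics

def gelman_rubin(chains):
    """Classic R-hat for a list of equal-length chains of one scalar parameter."""
    n = len(chains[0])
    chain_means = [statistics.fmean(c) for c in chains]
    # W: average within-chain variance.
    within = statistics.fmean([statistics.variance(c) for c in chains])
    # B / n: variance of the chain means.
    between_over_n = statistics.variance(chain_means)
    var_hat = (n - 1) / n * within + between_over_n
    return math.sqrt(var_hat / within)

rng = random.Random(0)
# Four chains drawn from the same distribution: R-hat should be close to 1.
mixed = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(4)]
# Four chains stuck around different values: R-hat should be well above 1.
stuck = [[rng.gauss(center, 1.0) for _ in range(1000)] for center in (0.0, 5.0, 0.0, 5.0)]

rhat_mixed = gelman_rubin(mixed)
rhat_stuck = gelman_rubin(stuck)
print(f"R-hat (mixed) = {rhat_mixed:.3f}, R-hat (stuck) = {rhat_stuck:.3f}")
```

A common rule of thumb treats values near 1 (for example below about 1.1) as consistent with convergence, but a low value does not prove it.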
  • asked a question related to MCMC
Question
7 answers
Dear All
I am trying to apply a phylogenetic correction to an MCMC model, but I have problems creating the inverse matrix. I can visualise the tree plot very well, but when I use the script:
inv.phylo<-inverseA(phylo_ultra,nodes="TIPS",scale=TRUE)
R tells me that there is an error:
Error in pedigree[, 2] : incorrect number of dimensions
In addition: Warning message:
In if (attr(pedigree, "class") == "phylo") { :
Do you have any experience with this? I couldn't find a solution so far
Thanks in advance
David
Relevant answer
Answer
I've had the same issue sometimes. Usually it's because the names in the two data files aren't exactly the same, so matching them fails. Check that every Latin name or ID column used for the match is exactly the same (even a single space character can be an issue).
  • asked a question related to MCMC
Question
1 answer
Is there any study that addresses grid independence for simulating a droplet using the MCMP LB (Shan-Chen) method? I want to simulate a droplet in different system sizes: 50x50, 100x100, and 200x200. The relaxation time and interaction strength are selected so that all cases correspond to the same physical unit system; however, the simulation results differ between system sizes.
Relevant answer
Answer
Hi Sajjad, I am also facing the same issues, I am comparing the growth and collapse of a bubble based on the Shan-Chen model with the Rayleigh-Plesset equation, but with a different mesh size, the results differ. Have you managed to figure it out?
  • asked a question related to MCMC
Question
26 answers
A parameter is defined as a value of the population, whereas a statistic is a value of the data; i.e., the mean can be a statistic or a parameter. However, usage can be quite ambiguous when the context is obscured or missing. For example, the linear model E(y|x) = B0+B1*X1 has parameters B0 and B1. If B1 is estimated via ML, B1 is a statistic (likelihood), but via MCMC a parameter (posterior). However, the coefficients B0 and B1 are described as parameters even before estimation, yet they are not population parameters.
Thus, in referring to the parameters of the linear model, the ambiguity remains in the use of the word parameters (if one does not specify the method of estimation or is unaware of it). Given that I would like to express in a single sentence what I did, "I estimated the model statistic B1" highlights that it is the likelihood of the coefficient B1, whereas "I estimated the model parameter B1" highlights that it is the posterior of coefficient B1.
The question: is it valid to refer to the coefficients B0 and B1 after estimation as a statistic or a parameter, given the likelihood or posterior? I have never seen the term statistic(s) used for the coefficients in a linear model, only parameter(s) (or sample parameter). I am trying to avoid ambiguity in my writing and clear my head.
Thank you in advance
P.S. I adjusted the question
Relevant answer
Answer
Stefano Nembrini : it was not Fisher who invented the word statistics. Credit must go to Sir John Sinclair (https://en.wikipedia.org/wiki/Sir_John_Sinclair,_1st_Baronet) 1754-1835, who introduced the term into English. He adapted it from a German word. To quote the man himself
"Many people were at first surprised at my using the words "statistical" and "statistics", as it was supposed that some term in our own language might have expressed the same meaning
the idea I annex to the term is an inquiry into the state of a country, for the purpose of ascertaining the quantum of happiness enjoyed by its inhabitants, and the means of its future improvement; but as I thought that a new word might attract more public attention, I resolved on adopting it, and I hope it is now completely naturalised and incorporated with our language."
And I would urge you to be careful with your own words. It does not "feel like" the questioner is making a fuss. This is something that you think, an opinion. Thoughts are not feelings.
Nor do words have the meanings that you assign to them. If they did, communication, science, ethics, even food labelling would be impossible.
Finally, if the question irritates you, I suggest you leave it alone. Researchgate is a collaborative, supportive platform. If you want something more adversarial, there's plenty of choice.
  • asked a question related to MCMC
Question
7 answers
Hey Members, I'm running quantile regression with panel data in Stata. I find that there are two options:
1- Robust quantile regression for panel data: standard, using qregpd
2- Robust quantile regression for panel data with MCMC: using adaptive Markov chain Monte Carlo
Can anyone please explain the use of MCMC to me? How can I analyse the output of robust quantile regression for panel data with MCMC? Thanks
Relevant answer
Answer
Thank you Dr. Mohamed-Mourad
  • asked a question related to MCMC
Question
2 answers
I am constructing a phylogenetic tree using aligned sequences of 4 mt-genes [2 coding, 2 noncoding]. The total nucleotide length is about 2500 bp. I am stuck at one point, so I would like to understand how to determine the exact numbers for the following:
*Number of generations
*Sample frequency
*Diagnostic frequency
*Number of parallel runs
*Number of heat chains
If someone could help me with this I would be very grateful.
Thank you..
Relevant answer
Answer
Daniel Carrera Lopez
thank you for your valuable response. Hope this would help me.
  • asked a question related to MCMC
Question
2 answers
I need to create a genetic simulation and test my method on it for my dissertation research. While searching, I saw that it's possible with "MCMC", and I'm using R.
Has anyone created a population like that who can help me solve this?
Relevant answer
Answer
Maybe the attached screenshot will be helpful to you. Best wishes David Booth
  • asked a question related to MCMC
Question
3 answers
Hi
I am wondering if there is a standard effective sample size (ESS) for declaring the convergence of a chain. The convergence of an MCMC chain is of course assessed using various convergence diagnostics: Gelman-Rubin, Geweke, Heidelberger-Welch... However, if I evaluate the ESS of parameters estimated by Gibbs sampling, how do I decide whether the values are adequate?
Example: I have 10,000 samples after burn-in and thinning. The ESS values for the estimates are:
Var1= 780
Var2= 323
Var3= 2,963
Var4= 5,456
Var5= 174
Var6= 8,351
Are the ESSs of Var1, Var3 and Var4 acceptable? Should I run the MCMC again?
Thanks in advance for your answers.
Relevant answer
Answer
The more the chains, the higher the convergence, and hence stationarity of the variables
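For readers wondering how an ESS figure like those in the question is obtained, here is a rough sketch. It estimates ESS as n / (1 + 2 * sum of autocorrelations), truncating the sum at the first non-positive autocorrelation; that truncation rule, the function names, and the simulated AR(1) chain are all assumptions for illustration, and dedicated packages use more careful estimators.

```python
import random

def autocorrelation(x, lag, mean, var):
    """Sample autocorrelation of x at the given lag."""
    n = len(x)
    cov = sum((x[i] - mean) * (x[i + lag] - mean) for i in range(n - lag)) / n
    return cov / var

def effective_sample_size(x):
    """ESS = n / (1 + 2 * sum(rho_k)), summing until rho_k first drops to <= 0."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    rho_sum = 0.0
    for lag in range(1, n // 2):
        rho = autocorrelation(x, lag, mean, var)
        if rho <= 0.0:
            break
        rho_sum += rho
    return n / (1.0 + 2.0 * rho_sum)

rng = random.Random(0)
n = 5000
# Independent draws: ESS should be close to n.
iid = [rng.gauss(0.0, 1.0) for _ in range(n)]
# Strongly autocorrelated AR(1) chain (phi = 0.9): ESS should be far below n.
ar1 = [0.0]
for _ in range(n - 1):
    ar1.append(0.9 * ar1[-1] + rng.gauss(0.0, 1.0))

ess_iid = effective_sample_size(iid)
ess_ar1 = effective_sample_size(ar1)
print(f"ESS iid = {ess_iid:.0f}, ESS AR(1) = {ess_ar1:.0f}")
```

There is no universal ESS threshold; how many effective draws are enough depends on which summary is needed, since tail quantiles generally need more than a posterior mean does.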
  • asked a question related to MCMC
Question
6 answers
Hello everyone
I have precipitation data at 5 different meteorological stations. All of these stations have missing data. I have tried MCMC method using XLSTAT.
It gives negative values for some of the observations (please see attached file).
Can anyone guide me what to do with these values?
Should these be replaced with zero or considered as missing again?
Or any other way?
Thanks in anticipation
  • asked a question related to MCMC
Question
2 answers
I have noisy data points, where the peak signal-to-noise ratio (PSNR) may sometimes be less than unity (hence, more noise than signal may be present). I am fitting a model with fitting parameters to this noisy data, using MCMC (Markov Chain Monte Carlo) methods. I want to know if using a noise filter on the noisy data points (such as a Wiener filter in real space or a bandpass filter in Fourier space), before doing the MCMC fitting, would cause the 90% HPDI contour (highest posterior density interval) of the joint posterior probability distribution of the fitting parameters to be tighter or wider (precision), and closer or farther away from the true parameter values (accuracy)?
Relevant answer
Answer
As Ray Kidd mentioned, filtering data is futile. First, the noise is part of the data. In some cases noise can be informative. Filtering can not increase the information content of the data. The information content is an unalterable state of nature. Second, if the filter happens to be inappropriate, the parameter estimates can be meaningless. Third, sometimes the data information content is too low to compute meaningful parameter estimates. Using a filter tells you nothing about parameter estimate uncertainties.
One approach is to include a model for the noise in the parameter estimate model. If you know a lot about the noise, include all that information in the model for the data instead of using a data filter. Never include ad-hoc, indefensible assumptions about the noise (or the parameters).
I noticed Bayesian terms in your keywords. Under no circumstance should you filter the data with Bayesian methods. Bayesian methods include one or more models for the noise in the model for the data. This means there will be well-designed, prior probability, distributions for the noise. Objective, prior probability distributions consistent with maximum entropy principles are the best one can do.
The width of the posterior probability distributions for the parameters will tell you if the data information content is simply too low to compute meaningful parameter estimates. This can be explored by adding noise to simulated data (or empirical data). At some point the signal-to-noise ratio will no longer support meaningful parameter estimates.
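To make "include a model for the noise" concrete, here is a minimal sketch in which the noise standard deviation is a free parameter of the posterior instead of something removed by a filter. Everything specific here is an assumption for illustration: a straight-line signal, independent Gaussian noise, flat priors on the line coefficients, and a 1/sigma prior on the noise scale.

```python
import math
import random

def log_posterior(a, b, sigma, xs, ys):
    """Unnormalised log-posterior for y = a + b*x + Gaussian noise with unknown sigma."""
    if sigma <= 0.0:
        return -math.inf
    n = len(xs)
    rss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    log_lik = -n * math.log(sigma) - 0.5 * rss / sigma ** 2
    log_prior = -math.log(sigma)  # assumed 1/sigma prior; flat priors on a and b
    return log_lik + log_prior

# Simulated noisy data (assumption): true a = 1, b = 2, noise sigma = 3.
rng = random.Random(0)
xs = [i / 100.0 for i in range(200)]
ys = [1.0 + 2.0 * x + rng.gauss(0.0, 3.0) for x in xs]

lp_true = log_posterior(1.0, 2.0, 3.0, xs, ys)
lp_too_small = log_posterior(1.0, 2.0, 0.5, xs, ys)
lp_too_large = log_posterior(1.0, 2.0, 30.0, xs, ys)
print(lp_true, lp_too_small, lp_too_large)
```

A function like this can be handed to any MCMC sampler, and the resulting posterior width for a and b then reflects the noise level inferred from the data.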
  • asked a question related to MCMC
Question
8 answers
To see complete details, please find the attached file. Thanks.
Relevant answer
Answer
Since y|x,z and z|x,y are both easy to simulate from, you probably only need to sample from x|y,z using any appropriate MCMC sampler that you are familiar with. I recommend looking at slice sampling (Neal 2003).
  • asked a question related to MCMC
Question
5 answers
My model contains five parameters. I want to perform Bayesian estimation, but the Bayes estimates cannot be obtained in closed form. So, I used Metropolis-Hastings to generate MCMC samples from the conditional posterior density of each parameter. Trace and autocorrelation plots were used to evaluate the generated samples. For four parameters the trace plots look random and the autocorrelation plots decay, whereas for the fifth parameter (which I will refer to as alpha1) the trace plot is not random and the autocorrelation does not decay with lag. I read that I should use thinning to reduce the autocorrelation.
Q1: Should I thin all five parameters or only alpha1?
Q2: How can I do thinning in R (I wrote the code myself, without using an R function)?
I would appreciate it if someone could help me. Many thanks in advance.
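Thinning itself is just keeping every k-th stored iteration. The sketch below shows the idea in Python with hypothetical names; in R the same indexing can be written as x[seq(1, length(x), by = k)]. It keeps whole rows, so all five parameters are thinned together and each retained draw still comes from a single iteration of the joint chain.

```python
def thin_chain(draws, burn_in=0, step=1):
    """Drop the first burn_in draws, then keep every step-th draw."""
    return draws[burn_in::step]

# Each row is one MCMC iteration: (iteration index, a stand-in parameter value).
rows = [(i, 10 * i) for i in range(12)]
thinned = thin_chain(rows, burn_in=2, step=5)
print(thinned)
```

Thinning reduces autocorrelation between the stored draws but discards information; if storage is not a problem, running the chain longer or improving the proposal for the slow parameter is often a better use of effort.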
  • asked a question related to MCMC
Question
7 answers
In Bayesian linear regression, what are the following indicators used for? Spectral density at 0; MCMC sd. error; Relative Numer. Eff; Inefficiency factor; tau; sigma_e.
Relevant answer
  • asked a question related to MCMC
Question
148 answers
In Bayesian linear regression, what are the following indicators used for? Spectral density at 0; MCMC sd. error; Relative Numer. Eff; Inefficiency factor; tau; sigma_e.
  • LINEARMODEL.docx
Relevant answer
Answer
Thanks Dr
  • asked a question related to MCMC
Question
2 answers
Hello fellow researchers,
I am doing research that involves estimating the parameters of the Cox-Ingersoll-Ross (CIR) SDE using a Bayesian approach. I propose using the Euler scheme in my approach. Could someone please direct me to any implementation in R, Python or Matlab?
Thank you !!
Relevant answer
Answer
For those who were following this question, after a long search I couldn't find any package that implements the CIR model under a Bayesian framework. So I wrote up a Python script to do that. Interested readers can find the code in my GitHub repository https://github.com/Kwabena16108/CIR-Bayesian-Estimation.
Hope this helps.
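For readers who want to see what the Euler scheme contributes to such an estimation, here is a short sketch. It is not the code from the repository above; the function names and parameter values are assumptions. Under the Euler discretisation, each step of the CIR process is treated as Gaussian with mean r + kappa*(theta - r)*dt and variance sigma^2 * r * dt, which gives an approximate log-likelihood that a Bayesian sampler can combine with priors.

```python
import math
import random

def simulate_cir(kappa, theta, sigma, r0, dt, n_steps, seed=0):
    """Euler simulation of dr = kappa*(theta - r)*dt + sigma*sqrt(r)*dW (full truncation)."""
    rng = random.Random(seed)
    path = [r0]
    for _ in range(n_steps):
        r = path[-1]
        r_pos = max(r, 0.0)  # keep the square root well defined
        shock = sigma * math.sqrt(r_pos * dt) * rng.gauss(0.0, 1.0)
        path.append(r + kappa * (theta - r_pos) * dt + shock)
    return path

def euler_log_likelihood(path, kappa, theta, sigma, dt):
    """Approximate log-likelihood from Gaussian Euler transition densities."""
    if kappa <= 0.0 or theta <= 0.0 or sigma <= 0.0:
        return -math.inf
    total = 0.0
    for r, r_next in zip(path, path[1:]):
        mean = r + kappa * (theta - r) * dt
        var = sigma ** 2 * max(r, 1e-12) * dt
        total += -0.5 * math.log(2.0 * math.pi * var) - 0.5 * (r_next - mean) ** 2 / var
    return total

dt = 1.0 / 252.0
path = simulate_cir(kappa=2.0, theta=0.05, sigma=0.1, r0=0.05, dt=dt, n_steps=2000)
ll_true = euler_log_likelihood(path, 2.0, 0.05, 0.1, dt)
ll_wrong = euler_log_likelihood(path, 2.0, 0.05, 0.5, dt)
print(f"log-lik at true sigma = {ll_true:.1f}, at sigma = 0.5: {ll_wrong:.1f}")
```

Adding log-priors for kappa, theta and sigma to this function yields an unnormalised log-posterior suitable for Metropolis-Hastings; the Euler approximation becomes less accurate as dt grows.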
  • asked a question related to MCMC
Question
3 answers
Hello:
I need to update the probability distribution of a random variable that depends on 3 others which I can modify, but the function is a black box; I only know that the future values of those 3 variables depend on the previous value of the one I want to update. I'm using MCMC to update it, but I don't understand very well how to update a distribution using MCMC. Could you please mention a paper, book or website where I could find a good example to start understanding it?
Thanks in advance.
Pablo.
  • asked a question related to MCMC
Question
4 answers
I have no mathematics background and am learning how to reconstruct species phylogenies.
I understand that the principle of MCMC is to use Bayesian statistics, which is to deduce the posterior probability from the prior probability and the likelihood of the parameters.
The question is:
What is the intuition behind RJMCMC? How does it differ from conventional MCMC? And when should it be used?
I think MCMC is well explained on the internet, but what I find about RJMCMC is only the mathematical equations.
Relevant answer
Answer
From a fit-for-purpose perspective, where MCMC samples the posterior of the parameter space for a single model, RJMCMC is intended to sample the joint posterior of the parameter *and model* space given a prior of several models. For some model spaces, such as variable selection, the number of possible models could be very large (tens or hundreds of thousands). Practically, this allows a single Markov chain to theoretically sample all possible models (or variable combinations) and yield posterior model probabilities for each.
Mathematically, RJMCMC is achieved via revisiting the detailed balance sufficient condition upon which Metropolis Hastings MCMC is derived. By considering multiple parameter subspaces and moves between these subspaces, a transition kernel that satisfies detailed balance under these conditions is chosen to be the acceptance ratio for RJMCMC. A few requirements are necessary to enforce for this to be valid, and these are dimension matching of auxiliary variables that are drawn from proposal distributions for each transdimensional move, and the bijective (invertible) transformation that maps these variables and the current state to the new subspace (or model).
  • asked a question related to MCMC
Question
8 answers
MCMC sampling is often used to produce samples from Bayesian posterior distributions. However, the MCMC method in general is associated with computational difficulty and a lack of transparency. Specialized computer programs are needed to implement MCMC sampling, and the convergence of MCMC calculations needs to be assessed.
A numerical method known as “probability domain simulation (PDS)” (Huang and Fergen 1995) might be an effective alternative to MCMC sampling. A two-dimensional PDS can be easily implemented with Excel spreadsheets (Huang 2020). It outputs the joint posterior distribution of the two unknown parameters in the form of an m×n matrix, from which the marginal posterior distribution of each parameter can be readily obtained. PDS guarantees that the calculation is convergent. Further study comparing PDS with MCMC is warranted to evaluate the potential of PDS as a general numerical procedure for Bayesian methods.
Huang H 2020 A new Bayesian method for measurement uncertainty analysis and the unification of frequentist and Bayesian inference, preprint,
Huang H and Fergen R E 1995 Probability-domain simulation - A new probabilistic method for water quality modeling. WEF Specialty Conference "Toxic Substances in Water Environments: Assessment and Control" (Cincinnati, Ohio, May 14-17, 1995),
Relevant answer
Answer
The Excel spreadsheet isn't transparent; it's exactly the opposite, since it provides the result without showing how it's obtained. There's no problem of principle in programming Monte Carlo sampling in Excel; it's just that the code won't be efficient, since an Excel spreadsheet isn't designed for such calculations when they really get interesting.
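For comparison, a two-parameter posterior stored as an m×n matrix can be written out explicitly in a few lines. The sketch below is a generic grid approximation and should not be read as the PDS method itself, whose details are not given in this thread. The Gaussian model, the flat prior over the grid, the grid ranges, and the simulated data are all assumptions for illustration.

```python
import math
import random

def grid_posterior(data, mu_grid, sigma_grid):
    """Normalised joint posterior on a grid (rows: mu, columns: sigma), flat prior assumed."""
    n = len(data)
    def log_lik(mu, sigma):
        rss = sum((x - mu) ** 2 for x in data)
        return -n * math.log(sigma) - 0.5 * rss / sigma ** 2
    log_post = [[log_lik(mu, s) for s in sigma_grid] for mu in mu_grid]
    peak = max(max(row) for row in log_post)  # subtract the peak for numerical stability
    weights = [[math.exp(v - peak) for v in row] for row in log_post]
    total = sum(sum(row) for row in weights)
    return [[w / total for w in row] for row in weights]

rng = random.Random(0)
data = [rng.gauss(5.0, 2.0) for _ in range(40)]
mu_grid = [i * 0.05 for i in range(201)]           # 0.0 to 10.0
sigma_grid = [0.5 + j * 0.05 for j in range(91)]   # 0.5 to 5.0

joint = grid_posterior(data, mu_grid, sigma_grid)
mu_marginal = [sum(row) for row in joint]            # sum over sigma
sigma_marginal = [sum(col) for col in zip(*joint)]   # sum over mu
mu_post_mean = sum(m * p for m, p in zip(mu_grid, mu_marginal))
sample_mean = sum(data) / len(data)
print(f"posterior mean of mu = {mu_post_mean:.3f}, sample mean = {sample_mean:.3f}")
```

Grid methods of this kind are easy to inspect, but the number of cells grows exponentially with the number of parameters, which is the usual reason for switching to MCMC.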
  • asked a question related to MCMC
Question
3 answers
Trying to perform multiple imputation using NPBayesImputeCat, at the stage where I specify burn-ins, MCMC iterations, and thinning, under the following
DPM_Model_1 <- CreateModel(Pew_MCAR_copy, NULL, 250, 0, 0.25, 0.25, 1234)
DPM_Model_1$Run(5, 10, 1)
I always receive the error message
Error in DPM_Model_1$Run(5, 10, 1) : could not find valid method.
My syntax follows Hu, Akanda, Wang, 12 Jul 2020, preprint, "Multiple Imputation and Synthetic Data Generation with the R package NPBayesImputeCat," The R Journal, page 6.
Not sure why/what I am doing wrong. My data set has 3403 observations and 51 unordered categorical variables. Just one of these variables has missing values; 613 observations contain the variable with the value missing. Trying to impute those missings.
Any feedback immensely appreciated -- Thanks, Mark
Relevant answer
Answer
Israa, so many thanks for your response. I came up with a workaround, or, I simply shut down and tried again. I needed to solve a problem quickly, and the quick solution wasn't too elegant. But please give me a little time and I will get back. I did receive another response from one of the authors of the package and I will look that up -- Mark
  • asked a question related to MCMC
Question
3 answers
After using MCMC to determine 2 parameter estimates, the output is as follows:
Parameter A:
Median: 1.45
2.5%: -0.98
97.5%: 3.19
Parameter B:
Median: -3.29
2.5%: -10.64
97.5%: 19.13
Does anybody know how to interpret these credible intervals? Can I say that these parameters are not significantly different to 0 given the negative CI?
Thanks.
Relevant answer
Answer
CIs allow you to say that you are 95% certain that the true parameter falls within that Confidence Interval. I'll refer you to Geoff Cumming's 'Understanding the New Statistics' for further information.
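For reference, the 2.5% and 97.5% values in the question are quantiles of the posterior draws, and checking whether zero lies between them is a one-line test. The sketch below uses hypothetical function names and a linear-interpolation quantile rule, which is one of several conventions.

```python
def quantile(samples, q):
    """Quantile by linear interpolation between order statistics."""
    s = sorted(samples)
    pos = q * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    frac = pos - lo
    return s[lo] * (1.0 - frac) + s[hi] * frac

def credible_interval(samples, level=0.95):
    """Equal-tailed credible interval from posterior draws."""
    tail = (1.0 - level) / 2.0
    return quantile(samples, tail), quantile(samples, 1.0 - tail)

def excludes_zero(interval):
    """True when the whole interval lies on one side of zero."""
    lower, upper = interval
    return lower > 0.0 or upper < 0.0

# Stand-in draws (assumption): the integers 0..100 in place of real MCMC output.
demo_interval = credible_interval(list(range(101)))
print(demo_interval)

# Intervals reported in the question for parameters A and B.
print(excludes_zero((-0.98, 3.19)), excludes_zero((-10.64, 19.13)))
```

Both reported intervals contain zero, so under this criterion the data do not rule out a zero effect for either parameter. "Significant" is frequentist vocabulary, so in a Bayesian write-up it is more natural to say that the 95% credible interval includes zero.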
  • asked a question related to MCMC
Question
2 answers
Dynamic Structural Equation Modeling (DSEM) is a great tool to analyze intensive longitudinal data. Currently, I am working on a dataset of 64 participants and 30-50 timepoints. However, the data were collected over the course of 81-150 days. In other words, the time intervals between consecutive measurements are very uneven, ranging from 1 to 20. I know that an AR(1) DSEM model in Mplus, using MCMC imputation and a Bayesian estimator, can produce a converged model. However, with this amount of missing data (70%-80%), are the results trustworthy? Thanks!
Relevant answer
Answer
The best way to answer this question is probably by conducting a Monte Carlo simulation with the parameters/missing patterns that apply specifically to your application. You can save the parameter estimates from your application (that you're not sure you can trust) using the OUTPUT: SVALUES command in Mplus. This makes it very easy to set up a realistic population model for the simulation in Mplus using the MONTECARLO and MODEL POPULATION statements. You can also simulate missing data patterns in Mplus in the same simulation.
  • asked a question related to MCMC
Question
3 answers
Hello everyone,
I need to assess whether a MCMC chain (implemented in BayPass 2.2) has reached convergence and, as far as I know, in order to check it, I must look at the posterior distribution of a chosen parameter.
Indeed, BayPass prints such a log file. The trouble is that the parameter is a matrix of post-burn-in, thinned MCMC samples. So far, I've always checked MCMC convergence of a parameter either by eye or by using the R package boa, but dealing with a matrix parameter I am in a bit of trouble.
Furthermore, I feel I'm missing some pieces in the theoretical understanding of MCMC chains, which may be hampering both my hypothesizing and my attempt to check convergence.
If anyone has some hints or can point out a misunderstanding of mine, I would be very grateful.
Thanks in advance
Relevant answer
Answer
You can use BOA (R program for assessing the convergence of MCMC chains)
  • asked a question related to MCMC
Question
8 answers
Hello fellow researchers,
I am doing research in extreme value theory where I have to estimate the parameters of a generalized Pareto distribution using a Bayesian approach. I would really appreciate it if anyone could point me to any code in R, Matlab or Python that estimates the GPD.
Relevant answer
Answer
# load packages
library(extRemes)
library(xts)

# get data from eHYD
# (read_ehyd() is a helper function that is not defined in this answer)
ehyd_url <- "http://ehyd.gv.at/eHYD/MessstellenExtraData/nlv?id=105700&file=2"
precipitation_xts <- read_ehyd(ehyd_url)

# mean residual life plot:
mrlplot(precipitation_xts, main="Mean Residual Life Plot")
# The mean residual life plot depicts the Thresholds (u) vs Mean Excess flow.
# The idea is to find the lowest threshold where the plot is nearly linear;
# taking into account the 95% confidence bounds.

# fitting the GPD model over a range of thresholds
threshrange.plot(precipitation_xts, r = c(30, 45), nint = 16)
# ismev implementation is faster:
# ismev::gpd.fitrange(precipitation_xts, umin=30, umax=45, nint = 16)

# set threshold
th <- 40

# maximum likelihood estimation
pot_mle <- fevd(as.vector(precipitation_xts), method = "MLE", type="GP", threshold=th)
# diagnostic plots
plot(pot_mle)
rl_mle <- return.level(pot_mle, conf = 0.05, return.period= c(2,5,10,20,50,100))

# L-moments estimation
pot_lmom <- fevd(as.vector(precipitation_xts), method = "Lmoments", type="GP", threshold=th)
# diagnostic plots
plot(pot_lmom)
rl_lmom <- return.level(pot_lmom, conf = 0.05, return.period= c(2,5,10,20,50,100))

# return level plots
par(mfcol=c(1,2))
# return level plot w/ MLE
plot(pot_mle, type="rl",
     main="Return Level Plot for Oberwang w/ MLE",
     ylim=c(0,200), pch=16)
loc <- as.numeric(return.level(pot_mle, conf = 0.05,return.period=100))
segments(100, 0, 100, loc, col= 'midnightblue',lty=6)
segments(0.01,loc,100, loc, col='midnightblue', lty=6)

# return level plot w/ LMOM
plot(pot_lmom, type="rl",
     main="Return Level Plot for Oberwang w/ L-Moments",
     ylim=c(0,200))
loc <- as.numeric(return.level(pot_lmom, conf = 0.05,return.period=100))
segments(100, 0, 100, loc, col= 'midnightblue',lty=6)
segments(0.01,loc,100, loc, col='midnightblue', lty=6)

# comparison of return levels
results <- t(data.frame(mle=as.numeric(rl_mle),
                        lmom=as.numeric(rl_lmom)))
colnames(results) <- c(2,5,10,20,50,100)
round(results,1)
  • asked a question related to MCMC
Question
1 answer
Hey there,
I am looking for literature or tutorial videos on how multilevel Markov chain Monte Carlo models are run in R or Mplus. Can someone help me?
Thank you in advance!
Best regards
Robin
Relevant answer
Answer
I don't know if this is still useful but there is lots of material for brms in R. For example:
The Bayes task view lists dozens of packages that use MCMC methods:
However, brms is probably the easiest entry point if you are new to MCMC and Bayes. There are a variety of different MCMC approaches, and brms uses Hamiltonian Monte Carlo (implemented in Stan). So if you want to know more about this form of MCMC you'd just need to look up Stan (though it isn't trivial if you have no physics background). Other MCMC approaches like Gibbs sampling (usually in JAGS or BUGS) or Metropolis-Hastings are easier to understand and in theory could be programmed directly in R (which some people do). However, usually one wants to run compiled code, and using a specialised MCMC tool like Stan or JAGS makes sense. Packages like brms allow you to work in R and set up the model in Stan or JAGS for you.
  • asked a question related to MCMC
Question
2 answers
Dear all, I have a comparative phylogenetic model on MCMCglmm, with a binary response variable, 4 binary explanatory variables as fixed effects and the phylogeny as a random effect. I would greatly appreciate it if someone could help me interpret the output of the model, particularly the density plots of (Intercept), location_selection and material_transport, which I have attached. As you can see these three variables have a triangular posterior distribution, and although I have found a few papers of triangular distribution, they talk about it as a prior distribution. I haven't found anything about a triangular posterior distribution. Following is also the output of the model, in case that helps:
Iterations = 120000001:1319999501
Thinning interval = 500
Sample size = 2400000

DIC: 71.74413

G-structure: ~phylo
        post.mean   l-95% CI   u-95% CI   eff.samp
phylo   3.841       1.215e-11  9.991      1567107

R-structure: ~units
        post.mean   l-95% CI   u-95% CI   eff.samp
units   1           1          1          0

Location effects: care01 ~ location_selection + substrate_modification + material_transport + laying_location_is_food
                          post.mean   l-95% CI   u-95% CI   eff.samp   pMCMC
(Intercept)               -250.852    -568.741   -5.698     1648       < 4e-07   ***
location_selection        246.618     1.747      564.414    1648       0.000358  ***
substrate_modification    6.168       3.514      9.248      1651692    < 4e-07   ***
material_transport        475.747     165.100    706.887    1079       1e-05     ***
laying_location_is_food   0.187       -2.019     2.408      2393680    0.858883
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Many thanks in advance!
Relevant answer
Answer
Very naively, the posterior distribution has just the same interpretation as any probability distribution. If it has a triangular density, as for location_selection, then this just means that you should expect lower values of this parameter to be more likely than larger values. Negative values and values > 700 are considered impossible. Whether this makes sense or is reasonable depends on what this parameter means (in your real-life model and how it is represented in your statistical model), and what prior you put on that parameter. It may also depend on whether the algorithm found a "good" solution (maybe there is an MCMC expert around who could help you there).
  • asked a question related to MCMC
Question
3 answers
I use MCMCglmm to run a fixed effects model with multiple fixed effects. Is it possible to check the effect of interactions between the fixed effects, or are interactions only available for the random effects? Many thanks in advance!
Relevant answer
Answer
Safa Aouinti thank you very much!
  • asked a question related to MCMC
Question
1 answer
Hi,
I have a standard SEIR model and would like to run a simple Bayesian MCMC (Metropolis-Hastings) inference on COVID data. How do you do this in R?
Many thanks!
Relevant answer
Answer
Package 'EpiILM' - CRAN
  • asked a question related to MCMC
Question
4 answers
I am trying to estimate the most likely number of K using the MCMC (Markov Chain Monte-Carlo Inference Of Clusters From Genotype Data) function in the Geneland R package by Gilles Guillot et al. I am a little bit confused when it comes to the varnpop and freq.model arguments.
In the package reference manual https://cran.r-project.org/web/packages/Geneland/Geneland.pdf  one may read:
varnpop = TRUE *should not* be used in conjunction with freq.model = "Correlated"
On the other hand, another manual, http://www2.imm.dtu.dk/~gigu/Geneland/Geneland-Doc.pdf, recommends an example of MCMC usage that looks like this:
MCMC(coordinates=coord, geno.dip.codom=geno, varnpop=TRUE, npopmax=10, spatial=TRUE, freq.model="Correlated", nit=100000, thinning=100, path.mcmc="./")
I am not sure how to reconcile these two contradictory pieces of information, any suggestions?
Relevant answer
Answer
Dear all, I am experiencing the same problem with the correlated allele frequencies model in Geneland: either the MCMC does not converge, or it converges with too many clusters that are not credible. However, the uncorrelated frequencies model seems to perform well...
Has there been any progress on this question since 2018? Cheers.
  • asked a question related to MCMC
Question
5 answers
I encountered the errors below when I ran the spatial AFT survival model with a CAR prior in the spBayesSurv package.
Error1:
"Starting initial MCMC based on parametric model:
Error in solve.default(scaleS * Shat0) :   Lapack routine dgesv: system is exactly singular: U[1,1] = 0"
Error 2:
"chol(): decomposition failed "
Best regards
Eisa
Relevant answer
Answer
Hi Huaichao,
Good day! I wish this message finds you in good spirits. I still encounter this error.
Best,
Eisa
  • asked a question related to MCMC
Question
7 answers
Respectfully, I have analyzed two different molecular datasets and also the concatenated dataset in order to find the best K (optimum number of genetic groups) using STRUCTURE HARVESTER (a web-based program).
I uploaded the results zip file generated by STRUCTURE into STRUCTURE HARVESTER, and the output was interesting! K was not the same for each marker, nor for the concatenated data.
I am really confused!
How should I explain this? Which of them can I trust? Is something wrong with the analyses in STRUCTURE? I ran them as follows: the analysis used the admixture model with correlated allele frequencies, with a burn-in period of 10000 and 50000 MCMC replications. We also set K to range from 1 to 10, with five independent iterations.
In my eyes, one of the K values is more sensible because it separates the species groups, but the others were strange! How can I persuade myself to accept only two groups for very different species?
I would appreciate any help.
Regards
Relevant answer
Answer
Dear Atena,
first of all, the reason might simply be the markers used. If you are running STRUCTURE, this is basically a POPULATION GENETICS approach. Of course, you can play around with different combinations of admixture and no admixture (with the second option closer to "species-level" studies), but correlated and uncorrelated allele frequencies also compensate for the first option in one direction or the other.
IRAP is not really a marker system that necessarily follows "Mendelian segregation", and it is difficult to find any reasonable selection of priors. What about the breeding system of your target species?
However, I would expect that the K=9 structure of the data is simply a further splitting of the major K=2 cluster? If this is NOT the case, then your combined data do not follow the principles of population genetics. This means you should use other methods to analyse your data, e.g. a simple PCO of unbiased/unconstrained measures of your raw data.
I hope this helps,
Marcus
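For readers wondering how the "best K" comes out of the replicate runs: the sketch below shows the Evanno ΔK statistic, which is the kind of summary STRUCTURE HARVESTER reports. It assumes the common reading ΔK = |mean L(K+1) - 2 mean L(K) + mean L(K-1)| / sd(L(K)); the function name and the input layout are hypothetical, and any Ln P(D) values fed to it here would be illustrative only. Note that ΔK cannot be computed for the smallest or largest K tested, so K=1 can never be selected by this criterion.

```python
from statistics import mean, stdev


def evanno_delta_k(lnp):
    """Evanno-style delta K from replicate STRUCTURE runs.

    lnp maps K -> list of Ln P(D) values, one per replicate run (hypothetical layout).
    Returns {K: delta_K} for every K that has both K-1 and K+1 available.
    """
    result = {}
    for k in sorted(lnp):
        if k - 1 not in lnp or k + 1 not in lnp:
            continue  # the second difference needs both neighbours
        second_diff = abs(mean(lnp[k + 1]) - 2 * mean(lnp[k]) + mean(lnp[k - 1]))
        spread = stdev(lnp[k])  # needs at least two replicates per K
        result[k] = second_diff / spread if spread > 0 else float("inf")
    return result
```

Because the statistic divides by the spread across replicates, a K whose replicates agree closely can get a very large ΔK, which is one reason K=2 is reported so often.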
  • asked a question related to MCMC
Question
2 answers
I am attempting to run my SNP dataset in STRUCTURE, at 10K burnin and 10K MCMC reps. I have 7864 loci and 84 individuals. I ran K=1 to K=20, with 5 iterations each. Now I have been waiting for days for STRUCTURE to finish on my laptop! I saw on another post that there is a faster version of STRUCTURE in the works - do you have any recommendations on how to speed up the process?
Relevant answer
Answer
Thanks
  • asked a question related to MCMC
Question
2 answers
Both the formal (MCMC) and informal (Generalized Likelihood Uncertainty Estimation, GLUE) Bayesian methods are widely used to quantify uncertainty. As far as I know, the GLUE method is highly subjective and there are many possible choices of likelihood function. This is confusing. So, what are the advantages of GLUE, and does it deserve its popularity? Is it just because it doesn't need an explicit error function? What are the pros and cons of the two methods? What should one pay attention to when constructing a new informal likelihood function (like LOA)?
Relevant answer
Answer
Mr. Peng, I think your problem can be solved if you read Vrugt's and Beven's papers on formal MCMC and GLUE.
  • asked a question related to MCMC
Question
7 answers
Currently, I am studying MCMC and its variants, e.g., Hamiltonian MC; however, I am not sure what the best approach is for practically diagnosing the convergence and quality of MCMC samplers. At the moment, I diagnose convergence based on the central limit theorem (CLT). I found that the CLT is not the best way to diagnose convergence because, for the Gaussian case, I can use any optimization method, and these outperform MCMC samplers.
I kindly seek your advice on this matter.
Great thanks!
Relevant answer
Answer
I agree with all the previous answers, but I think I could add a couple more comments.
Generally speaking, I think the first thing to look at is the histograms of your Quantities of Interest (QoI) for all MCMC chains. Since the stationary distribution of a converged chain does not depend on its initial state, the histograms from all chains for the same QoI should overlap if your chains have converged, giving you a similar mean, median, etc.
Beyond that, the R-hat criterion (which compares the variance between chains with the variance within each chain) is a very good one and I have used it quite a lot. It is also very easy to implement, so that is definitely a plus.
Additionally, I have seen people use autocorrelation at different lags to check how independent the states of the Markov chain are. Plotting the autocorrelation as a function of lag gives an estimate of how many iterations of the Markov chain are needed for effectively independent samples.
Last but not least, if you are working with Hamiltonian Monte Carlo, you could also look at the potential energy as a function of iterations. For example, in gradient-based minimization methods (and here I refer to local optimization techniques), we typically look at the objective function as a function of iterations to determine convergence. In HMC, the potential energy is the negative log of the target density (the posterior, which plays the role of your objective function), and therefore it makes sense to look at how it decreases during the early iterations and then fluctuates around a stable level as you run the algorithm for more iterations.
Anyways, these are simply my personal suggestions/thoughts. I hope they will help!
All the best,
Maria
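The R-hat and autocorrelation checks described in this answer can be sketched in a few lines. This is a minimal version of the classic Gelman-Rubin potential scale reduction factor for one scalar quantity, assuming equal-length chains; the function names are illustrative, and production tools typically use refined variants (split chains, rank normalisation) that are not shown here.

```python
from statistics import mean, variance


def gelman_rubin(chains):
    """Classic potential scale reduction factor (R-hat) for one scalar quantity.

    chains: list of equal-length lists of draws, one list per independent chain.
    """
    n = len(chains[0])
    within = mean([variance(c) for c in chains])        # W: mean within-chain variance
    between = n * variance([mean(c) for c in chains])   # B: scaled variance of chain means
    pooled = (n - 1) / n * within + between / n          # pooled estimate of the variance
    return (pooled / within) ** 0.5


def autocorrelation(draws, lag):
    """Sample autocorrelation of one chain at a given lag (lag 0 gives 1)."""
    mu = mean(draws)
    denom = sum((v - mu) ** 2 for v in draws)
    numer = sum((draws[i] - mu) * (draws[i + lag] - mu) for i in range(len(draws) - lag))
    return numer / denom
```

Values of R-hat close to 1 suggest the chains agree; a value well above 1 means the chains are still exploring different regions. The threshold used in practice (1.1, 1.05, 1.01) is a matter of convention rather than something this sketch decides.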
  • asked a question related to MCMC
Question
12 answers
Greeting Researchers!!!
Doing an article on Bayesian estimation but stuck on R programming for the MCMC algorithm.
If anyone has an idea about an R package, a related paper, or a methodology, please let me know.
Distribution : Burr X type II (Generalized Rayleigh Distribution)
Prior Distribution : Gamma or Uniform
Looking Forward
Zeb
Relevant answer
Answer
I hope this book will be helpful to do MCMC
Best regards, @Alam Zeb Khan
  • asked a question related to MCMC
Question
3 answers
Hi all,
I have some questions specifically about a BEAST analysis I am trying to run (I am a beginner).
My dataset contains 2 mtDNA genes (COI and 16S) and 2 nuclear DNA genes (H3, 18S). I specified the following: for 16S, the substitution model HKY+I+G and a clock rate; for 18S, the substitution model K80+I; for COI, the substitution model TrNef+I+G and a clock rate; for H3, the same model as the previous one. I have run the MCMC with a chain length of 100 million and the ESS value for gamma.shape 3 is still below 100.
Do you have any advice on how to increase this value?
Relevant answer
Answer
The first issue is why are you trying to estimate that parameter? Who cares? More constructively, I suggest you examine the trace, sometimes increasing burnin will sort things. Sometimes the chains have not converged.
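To make the ESS figure less of a black box, here is a rough sketch of how an effective sample size can be estimated from one logged parameter trace. It assumes the textbook estimator ESS = N / (1 + 2 Σ ρ_k), with the sum truncated at the first non-positive autocorrelation; this is not claimed to be the exact estimator used by any particular trace-analysis program, and both function names are hypothetical.

```python
import random
from statistics import mean


def effective_sample_size(draws):
    """Rough ESS: N / (1 + 2 * sum of positive-lag autocorrelations).

    The sum stops at the first lag whose autocorrelation is not positive,
    a simple truncation rule assumed here for illustration.
    """
    n = len(draws)
    mu = mean(draws)
    centred = [v - mu for v in draws]
    var0 = sum(c * c for c in centred) / n
    if var0 == 0:
        return float(n)  # constant trace: no autocorrelation can be estimated
    tau = 1.0
    for lag in range(1, n):
        rho = sum(centred[i] * centred[i + lag] for i in range(n - lag)) / (n * var0)
        if rho <= 0:
            break
        tau += 2 * rho
    return n / tau


def ar1_trace(n, phi, seed):
    """Toy autocorrelated trace (AR(1) process) standing in for a sampled parameter."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0, 1))
    return x
```

A strongly autocorrelated trace yields an ESS far below the number of logged samples, which is why simply running longer helps slowly, while better mixing (for example, operator tuning) helps more directly.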
  • asked a question related to MCMC
Question
4 answers
  • I am using MCMC with a formal likelihood function to perform uncertainty analysis on a set of structural time-history curves. Because the number of observation points reaches hundreds of thousands, the parameter uncertainty distribution almost converges to a single point.
  • In this case there is almost no difference between the results of the deterministic analysis and those of MCMC, so does MCMC still make sense here? What explains this result? How much data is suitable for MCMC?
Relevant answer
Answer
It depends how complicated your model is! If you are fitting a standard fixed effects multiple regression it is not surprising that the data will dominate to give a tight posterior. But are you over-simplifying reality to get a simple model? Just to take one 'exception' - do you not have heteroscedasticity? To take another, have you no structure to take account of- were all the measurements done by the same person, are you assuming no measurement error?
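A small sketch of why the posterior collapses with that much data, under deliberately simple assumptions that are not taken from the question: a single unknown mean, independent Gaussian errors with known standard deviation `sigma`, and a Gaussian prior with standard deviation `prior_sd`. In that conjugate case the posterior standard deviation has a closed form and shrinks roughly as sigma / sqrt(n).

```python
def posterior_sd(n, sigma=1.0, prior_sd=10.0):
    """Posterior sd of a Gaussian mean after n independent observations.

    Conjugate normal-normal model (an assumption of this sketch):
    posterior precision = prior precision + n / sigma**2.
    """
    precision = 1.0 / prior_sd ** 2 + n / sigma ** 2
    return precision ** -0.5


if __name__ == "__main__":
    for n in (10, 1_000, 100_000):
        print(n, posterior_sd(n))
```

If the real residuals are autocorrelated or heteroscedastic, treating hundreds of thousands of points as independent overstates the information in the data, so the very tight posterior may partly reflect the error model rather than genuine certainty.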
MCMC / Bayesian analysis is designed for realistically 'complex' analysis, see the MCMC chapter here:
  • asked a question related to MCMC
Question
10 answers
I am working on extremes in R and I have estimated parameters for the GEV and GPD using MLE and L-moments. But I can't estimate the parameters for the Gamma-Pareto and Gamma-generalized Pareto distributions using MLE, L-moments and adaptive MCMC in RStudio. Could you help me with the code?
Relevant answer
Answer
You could send an email to the corresponding author of the following manuscript
Alzaatreh, A., Famoye, F., & Lee, C. (2012). Gamma-Pareto distribution and its applications. Journal of Modern Applied Statistical Methods, 11(1), 7.
In this case, you can ask them to share the codes of the manuscript that has the cited model already implemented in R.
  • asked a question related to MCMC
Question
3 answers
Dear Colleagues.
Does anyone have experience running the Migrate-N software? The software introduction says it estimates the effective population size (Ne), but the only output I got was the effective MCMC sample size. What is the difference between them? Can I calculate Ne based on SNP data, and is it reliable to do so?
Kind regards,
Vu,
P.S. Output of a run is attached!
Relevant answer
Answer
You’re most welcome.
  • asked a question related to MCMC
Question
4 answers
Dear all,
I am working on my doctoral thesis and trying to fit a generalized linear mixed effects model using the ‘MCMCglmm’ package in R. This is actually the first time I have worked with it. I have repeatedly read Jarrod's tutorial materials and papers and they are very helpful for understanding the MCMCglmm method. However, there are still some problems with the prior specification that I have failed to figure out. I have been working on them for a couple of weeks but I cannot solve them.
In my research, the dependent variable is the number of people participating in a household energy conservation program (count outcome). It has been repeatedly measured for each day over approximately three years for each of 360 communities (the data are thus quite big, with n = 371,520). In addition, these communities are located in different districts (there are a total of 90 districts). Thus, the longitudinal daily count data are nested within communities, which are nested within districts. My research aims to investigate which time-variant and time-invariant factors influence the (daily) number of participants in such a program. The basic model is an (over-dispersed) Poisson model and the code is as follows.
# load the data
load("dat.big.rdata")
# the requisite package
library(MCMCglmm)
# give the priors
prior.poi <- list(
  R = list(V = diag(1), nu = 0.002, n = 0, fix = 1),
  G = list(
    G1 = list(V = diag(3) * 0.02, nu = 4),
    G2 = list(V = diag(3) * 0.02, nu = 4)
  )
)
# fit the model
model.poi <- MCMCglmm(
  y ~ 1 + t + x + x:t + t2 + t3 + t4 + c1 + c2 + c3 + d1 + d2 + d3,
  random = ~ us(1 + t + x):no_c + us(1 + t + x):no_d,
  rcov = ~ idh(1):units,
  family = "poisson",
  data = dat.big,
  prior = prior.poi,
  burnin = 15000, nitt = 65000, thin = 50,
  pr = TRUE, pl = TRUE
)
In the fixed effects part, ‘y’ is the count outcome; ‘t’ measures time in elapsed days since the start of the program; ‘x’ is another behavior intervention implemented for some communities. ‘t2 ~ t4’ are other time-variant factors (i.e. dummies measuring weekend and public holiday, and log term of average daily temperature); ‘c1 ~ c3’, and ‘d1 ~ d3’ measure the community and district-level characteristics respectively, which are time-invariant variables (e.g. total population, area size). In the random effects part, ‘no_c’ and ‘no_d’ are the record number of each community and district.
Since there are many excess zeros in my data, I further ran a hurdle (over-dispersed) Poisson model, as follows.
# give the priors
prior.hp <- list(
  R = list(V = diag(2), nu = 0.002, n = 0, fix = 1),
  G = list(
    G1 = list(V = diag(6) * 0.02, nu = 7),
    G2 = list(V = diag(6) * 0.02, nu = 7)
  )
)
# fit the model
model.hp <- MCMCglmm(
  y ~ -1 + trait + trait:t + trait:x + trait:x:t + trait:t2 + trait:t3 + trait:t4 +
    trait:c1 + trait:c2 + trait:c3 + trait:d1 + trait:d2 + trait:d3,
  random = ~ us(trait + trait:t + trait:x):no_c + us(trait + trait:t + trait:x):no_d,
  rcov = ~ idh(trait):units,
  family = "hupoisson",
  data = dat.big,
  prior = prior.hp,
  burnin = 15000, nitt = 65000, thin = 50,
  pr = TRUE, pl = TRUE
)
Both the OD and hurdle Poisson models could work well only when ‘fix = 1’ was added to the R-structure of the prior specification. When it was removed from the priors, both models returned the error message “Mixed model equations singular: use a (stronger) prior” and stopped running. This error did not disappear regardless of whether parameter expansion was used in the G-structure (that is, alpha.mu=rep(0, 3), alpha.V=diag(3)*25^2 for the OD Poisson model, and alpha.mu=rep(0, 6), alpha.V=diag(6)*25^2 for the hurdle model), or whether other elements of the R-structure were removed or adjusted.
In the hurdle Poisson model, since the covariance matrix for the zero-alteration process cannot be estimated, ‘fix = 2’ should be used in the R-structure rather than ‘fix = 1’. However, the model could not run well unless the residual variance for the zero-truncated Poisson process was also fixed at 1, as described above.
My question is: is it appropriate to fix the residual variance for both the zero-alteration and Poisson processes at 1 in the R-structure? Is this too ‘informative’ for my model estimation? Are there any other priors I could use to make the model run well?
Thanks for any idea about these questions.
Relevant answer
Answer
Thank you all for these suggestions. I am using this prior and the MCMCglmm method in my analysis for the time being, and I will try brms later. Thanks again!
  • asked a question related to MCMC
Question
6 answers
I recently moved from distance-based techniques to model-based techniques and I am trying to analyse a dataset I collected during my PhD using the Bayesian method described in Hui 2016 (boral R package). I collected 50 macroinvertebrate samples in a river stretch (approximately 10x10 m, so a very small area) according to a two-axis grid (x-axis parallel to the shoreline, y-axis transversal to the river stretch). For each point I have several environmental variables, relative coordinates inside the grid and the community matrix (site x species) with abundance data. With these data I would like to create a correlated response model (i.e. including both environmental covariates and latent variables) using the boral R package (this will allow me to quantify the effect of environmental variables as well as latent variables for each taxon). According to the boral manual there are two different ways to implement site correlation in the model: via a random row effect, or by assuming a non-independence correlation structure for the latent variables across sites (in this case the distance matrix for sites has to be added to the model). As specified on page 6, the latter should be used if one believes a priori that the spatial correlation cannot be sufficiently well accounted for by the row effect. However, moving away from an independence correlation structure for the latent variables massively increases computation time for MCMC sampling. So, my questions are: which is the best solution for accounting for spatial correlation? How can the random row effect be interpreted? Can it be seen as a proxy for spatial correlation?
Any suggestion would be really appreciated
Thank you
Gemma
Relevant answer
Answer
Thank you very much for your return. Thank you for this exchange, for links and documents too.
cordially
  • asked a question related to MCMC
Question
3 answers
What software should I use for MCMC analysis? And is there any good tutorial to learn and do it by self without any statistics background ?
Relevant answer
Answer
Yes agree with kauahi Perez.
  • asked a question related to MCMC
Question
6 answers
Hello everyone,
What are the good sources for learning Markov chain Monte Carlo (MCMC) simulation?
Relevant answer
Answer
There are many sources, depending on the level at which you'd like to learn the material. Two good references, one very applied with many coding examples and the other more theoretical, are:
  • asked a question related to MCMC
Question
4 answers
I am using MCMC with the Metropolis-Hastings algorithm to generate solutions of a nonlinear regression problem.
**Likelihood**
My likelihood is a Gaussian distribution of the residuals, centered at 0, with sigma given by the measurement uncertainties.
**Priors**
The three discrete parameters can take any integer value between 1 and 20. They are the numbers of components used by three distinct PLS models to make predictions.
The three continuous parameters represent the average number of carbon atoms attached to three different functional groups. We know from past experience that they lie in certain ranges, so we have bounded uniform priors for these three parameters.
**Proposal distribution**
I am wondering how to set my proposal distribution. A multivariate normal distribution is not possible as I have discrete variables (3 continuous variables and 3 discrete).
I have thought about using uniform distributions for each variable as the proposal distribution. This won't be efficient, but if I run enough iterations it should give meaningful results. Has anyone encountered a similar problem, and do you have any advice?
Thanks for your time!
Relevant answer
Answer
helpful
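One possible alternative to independent uniform proposals, sketched under stated assumptions: update one parameter per iteration, using a symmetric ±1 step for the integer parameters and a Gaussian random walk for the continuous ones. Both proposals are symmetric, so the acceptance ratio reduces to the likelihood ratio inside the uniform prior bounds, and proposals outside the bounds are simply rejected. The likelihood below is a made-up stand-in (the real one would evaluate the three PLS models), and all names, targets and step sizes are hypothetical.

```python
import math
import random

# Hypothetical stand-in for the real likelihood, used only to exercise the sampler.
TARGET_K = (5, 10, 15)
TARGET_THETA = (2.0, 3.0, 4.0)


def toy_log_likelihood(k, theta):
    """Toy Gaussian-shaped log-likelihood over 3 integer and 3 continuous parameters."""
    part_k = -0.5 * sum((a - b) ** 2 for a, b in zip(k, TARGET_K))
    part_t = -0.5 * sum(((a - b) / 0.3) ** 2 for a, b in zip(theta, TARGET_THETA))
    return part_k + part_t


def metropolis_mixed(log_like, k0, theta0, k_bounds, theta_bounds,
                     n_iter=6000, step_sd=0.2, seed=1):
    """Random-scan Metropolis: one coordinate is proposed per iteration."""
    rng = random.Random(seed)
    k, theta = list(k0), list(theta0)
    current = log_like(k, theta)
    samples = []
    for _ in range(n_iter):
        k_new, theta_new = list(k), list(theta)
        j = rng.randrange(len(k) + len(theta))
        if j < len(k):
            k_new[j] += rng.choice((-1, 1))          # symmetric integer step
            low, high = k_bounds[j]
            inside = low <= k_new[j] <= high
        else:
            i = j - len(k)
            theta_new[i] += rng.gauss(0.0, step_sd)  # symmetric Gaussian step
            low, high = theta_bounds[i]
            inside = low <= theta_new[i] <= high
        if inside:  # outside the uniform prior support the proposal is rejected
            proposed = log_like(k_new, theta_new)
            if math.log(1.0 - rng.random()) < proposed - current:
                k, theta, current = k_new, theta_new, proposed
        samples.append((tuple(k), tuple(theta)))
    return samples
```

Updating a single coordinate at a time avoids a parity trap: if every integer parameter moved by ±1 in each accepted step, the difference between any two of them could never change parity, and part of the space would be unreachable.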
  • asked a question related to MCMC
Question
3 answers
Hi Monica and Anna,
Just saw your PeCALE work and wondered about the project's scope and contributors. Always happy to consider new connections, options to share, contribute etc.
Best for all our new year's challenges.
Angela Foley
Special Engagement Programs Coordinator, MCMC
HDR student, Western Sydney University
Relevant answer
Answer
Ta Monica - lovely to hear from you, we may not be a good fit. FYI: Last year I taught nearly two thousand students (in school and out, from pre-school to tertiary, but mainly in primary settings) covering urban nature usually within a decolonised framework.
Always good to think things over, Cheers Angela
  • asked a question related to MCMC
Question
3 answers
Hi all
I am currently trying to run a 2-level model using runmlwin with a random intercepts and a random slope. My goal is to run this model in the MCMC mode.
All the variance estimates under the Random-effects Parameters are zero, which does not allow the model to be run in MCMC mode. The error I am getting is "MCMC Error 0315: Prior variance matrix is not positive definite".
How to rectify this problem?
Regards,
Shujaa Waqar
Relevant answer
Answer
This is a well-known problem: you have to intervene to change the covariances to zero and the variances to a small positive value of, say, 0.001. You do this from inside MLwiN by right-clicking on the offending estimate and giving it a more appropriate starting value.
Technically the variance-covariance matrix cannot be inverted, either because a variance is negative or zero and/or because the covariance implies a correlation outside the range of -1 to +1. This is due to the IGLS algorithm not finding a good estimate, and that is why you have switched to MCMC; the fixed part estimates are usually OK. The underlying cause is usually an overly complex random part given how many units (that is, power) you have in that part of the model. MLwiN also assumes that you have sorted the data (carrying the rest of the data along) on the units that form the structure of your model - check in Hierarchy.
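As a rough illustration of what "not positive definite" means for the starting values, the sketch below tests a covariance matrix by attempting a Cholesky factorisation, which succeeds only for symmetric positive definite matrices. It is a generic check written for this thread, not MLwiN's or runmlwin's internal test, and the function name is hypothetical.

```python
import math


def is_positive_definite(matrix):
    """Return True if a symmetric matrix admits a Cholesky factorisation."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = matrix[i][j] - sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                if s <= 0:
                    return False  # zero or negative pivot: not positive definite
                lower[i][j] = math.sqrt(s)
            else:
                lower[i][j] = s / lower[j][j]
    return True
```

For a random intercept and slope, an all-zero matrix fails this check, while variances of 0.001 with zero covariance pass, which matches the fix suggested in the answer.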
See mcmc chapter in this
  • asked a question related to MCMC
Question
1 answer
Hello,
I seem to be having issues with convergence in my Bayesian analysis. I'm using a large single-gene dataset of 418 individuals. My PSRF values say N/A in my output but my split frequency is 0.007. Also, my consensus tree gives me posterior probabilities of 0.5 or 1 with no distinguishable clades (see attached). Below is my Bayes block:
begin mrbayes;
charset F_1 = 1 - 655\3;
charset F_2 = 2 - 656\3;
charset F_3 = 3 - 657\3;
partition currentPartition = 3: F_1, F_2, F_3;
set partition = currentPartition;
lset applyto=(1) nst=6 rates=gamma;
lset applyto=(2) nst=2 rates=invgamma;
lset applyto=(3) nst=6 rates=gamma;
unlink statefreq=(all) revmat=(all) shape=(all) pinvar=(all);
prset applyto=(all) ratepr=variable;
mcmc ngen= 24000000 append=yes samplefreq=1000 nchains=8;
sump burnin = 10000;
sumt burnin = 10000;
end;
Any advice? Thanks!
Relevant answer
Answer
You have a fairly large dataset, so I would try running more generations. On the other hand, I would run a model test for the 1st and 2nd partitions, as they often get the same model when tested (I don't have much experience, but that is what I have seen).
I am not sure you need to unlink the 3 partitions and then set the rate priors as variable. Have you tried removing:
unlink statefreq=(all) revmat=(all) shape=(all) pinvar=(all);
prset applyto=(all) ratepr=variable;
Finally, I don't see where in the code you set the 2 independent runs (nruns = number of independent analyses with the same dataset and script), so I guess you ran 8 chains but in a single run. So how come they don't converge? All the chains are dependent.
Try adding nruns=2 [usual for Bayesian analyses]; each run has 4 chains by default, so there is no need to set that up.
mcmc ngen=24000000 append=yes samplefreq=1000 nruns=2;
  • asked a question related to MCMC
Question
11 answers
Dear All
I have fitted a multilevel model for a binary outcome (self-rated health) using Stata.
About 72,000 individuals at level 1 are nested within 50 countries at level 2.
The number of countries is rather small, so to obtain more robust estimates I performed a Bayesian melogit, and the Deviance Information Criterion (DIC) was used to compare the fit of different models. Is this correct? Or will the results be robust enough using plain melogit (instead of MCMC estimation)?
Below is the code I have:
bayes, mcmcsize(2500) : melogit Health centered_Age i.Sex i.Marital i.Income i.Percived-Inequality lnGDP GINI || Country: , or
Any comments or suggestions would be greatly appreciated.
Relevant answer
Answer
I would use MCMC to be sure that you get good estimates in this situation, see
  • asked a question related to MCMC
Question
3 answers
Hi,
I was wondering if there is an R-package (and functions therein) that implements Bayesian Phylogenetic Mixed Models (BPMM) or is the general R-package "MCMCglmm" for Bayesian Mixed Models currently the best option?
Relevant answer
Answer
Bit late, but in case anyone else wants this, here's a paper that implements BPMMs with MCMCglmm.
  • asked a question related to MCMC
Question
9 answers
I'm looking for an example of a Dynamic Bayesian Network containing both continuous and discrete variables, implemented in GeNIe & SMILE.
Relevant answer
Answer
Thank you Professor Rafael,
No not now. It was 2 years ago.
Now every thing is well. But if you have relevant materials you can provide me with it.
  • asked a question related to MCMC
Question
3 answers
I want to know whether it is scientifically sound to apply the MCMC method to an MCAR dataset.
I have created a dataset with 20% missing data under the "missing completely at random (MCAR)" assumption. I want to use MCMC to impute the missing data in my dataset.
I want to make sure that MCMC can be used for MCAR data.
Markov Chain Monte Carlo (MCMC)
missing completely at random (MCAR)
Relevant answer
  • asked a question related to MCMC
Question
3 answers
Hi,
I have downloaded the new BEAST (v2.5.1). I have worked previously with the older version (v1...something). I understood that you need to install different modules to study different questions and that each module provides a different set of tabs in BEAUti to set up the analysis. When I check the tutorials (for the new version, I guess), the embedded screenshots all show the following set of tabs: Partitions, Taxa, Tips, Traits, Sites, Clocks, Trees, States, Priors, Operators, MCMC. The only tabs I get now in the standard module are: Partitions, Tip Dates, Site Model, Clock Model, Priors, MCMC. With modules such as "StarBeast" there are in addition taxon sets and the multi-species coalescent; other modules have some different tabs, but I cannot find the "standard look" used in the tutorials.
My question is how to retrieve these tabs, are they now part of a special module?
Ivana
Relevant answer
Answer
Great! Best greetings,
Salvatore
  • asked a question related to MCMC
Question
4 answers
I use WinBUGS, software which employs that method. If you know of a related publication for beginners, please point it out.
Relevant answer
Answer
You might be interested in the MCMC chapter in this book - I have tried to set out in simple steps what is going on in the process of simpler and more complex models and given sample code to demonstrate the ideas
  • asked a question related to MCMC
Question
4 answers
As in the title: when running a MrBayes analysis on a large (190 samples) SNP dataset, all the 122 trees in the .trprobs file have the same probability (p=0.008). I am looking for the 95% credible set of trees, but I suspect that they should not have the same posterior probability. ESS values are low (<50), but I only ran this analysis as a test before a longer one which ought to get ESS above 200.
Is this an error? Any recommendations on getting around this? Would just increasing ESS solve this issue?
Edit: after a re-run which produced higher ESS values (>200), I still have the same issue.
Relevant answer
Answer
                                                 95% HPD Interval
                                              ----------------------
Parameter  Mean        Variance    Lower       Upper       Median      min ESS*    avg ESS     PSRF+
--------------------------------------------------------------------------------------------------
TL         1.061980    0.000095    1.042879    1.080876    1.062098    451.19      601.10      1.001
r(A<->C)   0.103976    0.000009    0.098559    0.110387    0.103935    281.42      309.01      1.016
r(A<->G)   0.360533    0.000027    0.350404    0.370181    0.360801    150.72      161.41      1.003
r(A<->T)   0.071054    0.000006    0.066360    0.075745    0.071203    289.18      321.59      1.001
r(C<->G)   0.020114    0.000002    0.017775    0.022591    0.020091    317.71      326.55      1.000
r(C<->T)   0.346303    0.000028    0.336815    0.357066    0.346073    144.77      178.70      1.000
r(G<->T)   0.098020    0.000008    0.092461    0.103101    0.097997    223.08      252.54      1.000
pi(A)      0.247380    0.000009    0.241770    0.253749    0.247389    263.70      273.91      1.005
pi(C)      0.247447    0.000010    0.242003    0.253900    0.247466    223.47      293.12      1.003
pi(G)      0.254260    0.000009    0.248221    0.259791    0.254200    246.65      248.82      1.005
pi(T)      0.250914    0.000010    0.244960    0.257107    0.250921    260.87      265.54      1.004
alpha      71.419755   35.677437   59.166980   82.053450   71.338790   728.65      739.83      1.001
--------------------------------------------------------------------------------------------------
Run     Arithmetic mean   Harmonic mean
--------------------------------------
1       -96849.51         -96921.21
2       -96844.91         -96921.99
--------------------------------------
TOTAL   -96845.59         -96921.67
--------------------------------------
Overlay plot shows good plateauing.
Thanks!
  • asked a question related to MCMC
Question
2 answers
Hello everyone! Is there a formal way (such as the Brooks-Gelman-Rubin (BGR) statistic or the Geweke diagnostic) to determine convergence of Markov chain Monte Carlo (MCMC) if one estimates an econometric model using the new bayes command in Stata 15? Stata 15 seems to rely only on graphical methods to assess convergence of the MCMC chains, but I am also interested in formal tests. However, I have not yet found anything on this in Stata 15. Any help will be appreciated. Thank you!!
Relevant answer
Answer
Hi Niladri, thanks for your suggestion. I think I felt lazy to do the coding... Based on your suggestion, I will try to do it. In case I do not succeed in Stata, I will just look to R or WinBUGS! Thank you.
  • asked a question related to MCMC
Question
3 answers
In terms of true observation in the form of numbers
Relevant answer
Answer
MCMC can be used for that. There are some interesting books that discuss MCMC applications for ecology-related problems. For example, Kery's Introduction to WinBUGS for Ecologists: Bayesian Approach to Regression, ANOVA, Mixed Models and Related Analyses could be a good fit for you.
  • asked a question related to MCMC
Question
3 answers
Using BEAST2 fossilized birth death model, I'm receiving low/red ESS values specifically for my posterior and prior. I'm already at 100M generations, which seems really high, but aside from increasing the generation number more, how do I increase the ESS values?
Relevant answer
Answer
I have maximized most of the scales at the operators tab, and my ESS significantly increased, will that have an effect on my analysis?
  • asked a question related to MCMC
Question
3 answers
Dear all,
I drew the HPD region using the "HPDregionplot" function in R, which is implemented in the "emdbook" package. I am not posting the real code that generates (X, Y) because it is very long (it uses an MCMC method). I give an artificial example instead, but the two real graphs, for N=200 and N=750, are presented in the attached file.
The code
set.seed(1)
x1 <- rnorm(200, 0, 1)
y1 <- rnorm(200, 0, 1)
x2 <- rnorm(750, 0, 1)
y2 <- rnorm(750, 0, 1)
library(emdbook)
x1.y1=cbind(x1, y1)
plot(x1.y1)
HPDregionplot(x1.y1, prob = c(0.95), add=TRUE)
x2.y2=cbind(x2, y2)
plot(x2.y2)
HPDregionplot(x2.y2, prob = c(0.95), add=TRUE)
I have two questions:
First question: how can I compare the two attached graphs?
Second question: is there code (preferably in R) to count the points that fall within the region above?
Any help, please?
Many thanks in advance.
Manal Salem
The link of "emdbook" package:
Relevant answer
Answer
Statistical Rethinking (by Richard McElreath) is highly recommended.
  • asked a question related to MCMC
Question
3 answers
Hi everyone, I am a new user of the BEAST2 software. I work on the population genetics, phylogeny and phylogeography of a complex species in the Caryophyllaceae family. I identified haplotypes in cpDNA and nDNA and have phylogenetic trees based on the haplotypes. Now I want to estimate the historical demography and divergence times, but I can't calibrate the analysis: there are no fossil data and no substitution rate. The only information available is that the age of the main clade of the genus was previously estimated at 11 million years. In similar papers I read: "Posterior estimates of the mutation rate and time of divergence were obtained by Markov Chain Monte Carlo (MCMC) analysis." Any guidance appreciated. Masi
Relevant answer
Answer
Following
  • asked a question related to MCMC
Question
6 answers
My best substitution model was estimated to be GTR+G+I using jModelTest.
In the BEAUti settings, all the given values (including gamma shape, proportion of invariant sites, rates AC, AG, etc.) were entered and the frequencies were set to empirical. (All parameters in the site model were set to be estimated, except proportion invariant and rate CT, which remained at the default, unticked.)
I have run MCMC chain lengths from 1 million to 30 million and the ESS values for the posterior, prior and rate CG are still below 200.
May I know how I can improve the ESS values?
Relevant answer
Answer
Hi Artur,
Here I have attached my Nexus file for your reference,
and one of the XML files and the log file I got after running BEAST,
and also a screenshot for a quick look.
Thanks.
Hi Chen, I'm now trying a run with 40 million generations. Thanks.
  • asked a question related to MCMC
Question
5 answers
Hello everyone,
I have a question about Bayesian hierarchical models and simultaneous equations. I want to build two models, one for individual tree growth and another for individual tree mortality. Both models can be built with Bayesian methods through SAS PROC MCMC or R2WinBUGS.
Obviously, there is some relationship between growth and mortality, so I want to know: can I use simultaneous equations or SUR to estimate the two Bayesian models together?
Could you give me some advice, whether a paper or a code example? Thanks a lot.
Relevant answer
Answer
CHECK THE FILE GIVEN BELOW
  • asked a question related to MCMC
Question
6 answers
I am interested in running a model predicting a continuous variable, using several fixed effects as well as random subject and item effects. I would like to estimate the model using MCMC sampling (or some equivalent method). I'm not interested in fitting priors. What is the easiest way to do this in R?
Relevant answer
Answer
Try the brms (my recommendation) and rstanarm packages, which allow you to use a syntax similar to the one used with the lme4 package, but from a Bayesian perspective (MCMC using Stan, i.e. Hamiltonian Monte Carlo). Alternatively, use the INLA (integrated nested Laplace approximations) package as an alternative to MCMC techniques in Bayesian inference. In any case, be careful with the default priors: they can be a pain.
Regards.
  • asked a question related to MCMC
Question
5 answers
We are using the generalized stepping-stone sampling method to calculate MLE (marginal likelihood estimation) values. After getting these MLE values from BEAST, how can we calculate the Bayes factor?
Relevant answer
Answer
You can calculate the value of logBF from the formula: logBF = logPr(D|M1) – logPr(D|M2). A value in the range 3-5 indicates strong support for M1 being the better fit to the data....
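The formula above amounts to a single subtraction of the two log marginal likelihoods. The sketch below also attaches the verbal scale of Kass and Raftery (1995), which is stated on 2 ln BF, so a natural-log Bayes factor of 3 to 5 falls in their "strong" band. The function names are illustrative, natural logarithms are assumed, and any marginal likelihood values passed in are the user's own.

```python
def log_bayes_factor(log_ml_m1, log_ml_m2):
    """ln BF in favour of M1: difference of natural-log marginal likelihoods."""
    return log_ml_m1 - log_ml_m2


def kass_raftery_label(log_bf):
    """Verbal evidence category on the 2*ln(BF) scale of Kass and Raftery (1995)."""
    two_ln_bf = 2.0 * log_bf
    if two_ln_bf < 0:
        return "favours M2"
    if two_ln_bf < 2:
        return "not worth more than a bare mention"
    if two_ln_bf < 6:
        return "positive"
    if two_ln_bf < 10:
        return "strong"
    return "very strong"
```

Because marginal likelihood estimates from stepping-stone sampling carry their own Monte Carlo error, it is worth repeating the estimation and checking that the difference between models is larger than the run-to-run variation.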
  • asked a question related to MCMC
Question
3 answers
I am modelling electrical conductivity using some experimental data. The model fits the data. Now I need to do parameter optimization for the (independent) variables used in the model. The model is originally a logistic function.
Relevant answer
Answer
@ José Arzola-Ruiz and Yusdel Díaz....thanks very much... I think I put the question in a slightly wrong way, so I have edited it. I need to use MCMC to determine the likelihood for the parameter optimization (of the parameters used in my model).
  • asked a question related to MCMC
Question
4 answers
I generated an XML file with BEAST 2.4.5 and successfully completed a pilot run on my own notebook, whereas it failed on another computer. That computer has a properly configured JRE and JDK, and it also runs BEAST's example files.
The details of the error are as follows:
Start likelihood: -Infinity after 10 initialisation attempts
Fatal exception: Could not find a proper state to initialise. Perhaps try another seed.
P(posterior) = -Infinity (was -Infinity)
              P(prior) = -108.44711033039145 (was -108.44711033039145)
              P(CalibratedYuleModel.t:ets) = 0.3436272137456058 (was 0.3436272137456058)
              P(CalibratedYuleBirthRatePrior.t:ets) = -6.907755278982137 (was -6.907755278982137)
              P(ClockPrior.c:ets) = 0.0 (was 0.0)
              P(GammaShapePrior.s:ets) = -1.0 (was -1.0)
              P(GammaShapePrior.s:g3pdh) = -1.0 (was -1.0)
              P(GammaShapePrior.s:its) = -1.0 (was -1.0)
              P(KappaPrior.s:ets) = -1.8653600339742873 (was -1.8653600339742873)
              P(KappaPrior.s:g3pdh) = -1.8653600339742873 (was -1.8653600339742873)
              P(KappaPrior.s:its) = -1.8653600339742873 (was -1.8653600339742873)
              P(outgroup.prior) = -93.28690216323206 (was -93.28690216323206)
              P(likelihood) = -Infinity (was -Infinity)
              P(treeLikelihood.ets) = -Infinity (was -Infinity)
              P(treeLikelihood.g3pdh) = NaN (was NaN)  **
              P(treeLikelihood.its) = NaN (was NaN)  **
java.lang.RuntimeException: Could not find a proper state to initialise. Perhaps try another seed.
       at beast.core.MCMC.run(Unknown Source)
       at beast.app.BeastMCMC.run(Unknown Source)
       at beast.app.beastapp.BeastMain.<init>(Unknown Source)
       at beast.app.beastapp.BeastMain.main(Unknown Source)
       at beast.app.beastapp.BeastLauncher.main(Unknown Source)
Fatal exception: Could not find a proper state to initialise. Perhaps try another seed.
BEAST has terminated with an error. Please select QUIT from the menu.
Relevant answer
Answer
I've also had issues with using BEAST across different Java versions (e.g. proprietary vs open, Java 7 vs Java 8). Make sure these are the same across both computers.
  • asked a question related to MCMC
Question
2 answers
Hello everyone,
I am a new user of BEAST and I am having trouble finding the right way to set my priors in order to create a Bayesian Skyline Plot. Basically, I have a dataset with 150 sequences and I want to calibrate my analysis assuming that a split of haplotype lineages occurred during an event between 3.4 and 1.8 million years ago. Which parameters do I have to modify in order to do that? (see Fig.1).
Thanks in advance!!!
PS. The substitution model that I am using is TN93 + Inv (base frequencies set to Empirical); the clock model is a lognormal relaxed clock with the clock rate fixed to 2.0E-6.
Relevant answer
Answer
OK, got it, thanks! And if I just want to set the parameters while leaving the clock rate at its default, which parameters do I have to change? Yes, I have already checked the available tutorials, but I will check again and hopefully find the right answer! Thanks for your help! :-)
  • asked a question related to MCMC
Question
4 answers
Hi
I have a hierarchical Bayesian model with 32 unknown parameters (alpha_1, alpha_2,..., alpha_30, mu, sigma^2).
The conditional posterior distributions of mu and sigma^2 have closed forms, but the conditional posterior distributions of (alpha_1, alpha_2,..., alpha_30) do not, so I need to use a Metropolis-Hastings sampler within a Gibbs sampler.
My question: what is an algorithm, and R code, for a Metropolis-Hastings sampler for the 30 parameters?
Relevant answer
Answer
Thank you for your answer
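For readers with the same question, here is a rough Metropolis-within-Gibbs sketch in Python (not R, and not the model from the question, which was never posted). Everything about the model is an assumption chosen for illustration: counts y_i ~ Poisson(exp(alpha_i)), alpha_i ~ Normal(mu, sigma^2), a flat prior on mu and an Inverse-Gamma(a0, b0) prior on sigma^2. With those choices mu and sigma^2 have closed-form conditionals (Gibbs steps), while each alpha_i gets a random-walk Metropolis-Hastings step.

```python
import math
import random

def log_cond_alpha(a, y_i, mu, sigma2):
    # log p(alpha_i | y_i, mu, sigma2) up to a constant:
    # Poisson(y_i | exp(alpha_i)) times Normal(alpha_i | mu, sigma2)
    return y_i * a - math.exp(a) - (a - mu) ** 2 / (2.0 * sigma2)

def mh_within_gibbs(y, n_iter=3000, step=0.5, a0=2.0, b0=1.0, seed=1):
    rng = random.Random(seed)
    n = len(y)
    alpha = [math.log(y_i + 0.5) for y_i in y]  # crude starting values
    mu, sigma2 = sum(alpha) / n, 1.0
    draws, accepted = [], 0
    for _ in range(n_iter):
        # Random-walk Metropolis-Hastings update for each alpha_i in turn
        for i in range(n):
            prop = alpha[i] + rng.gauss(0.0, step)
            log_ratio = (log_cond_alpha(prop, y[i], mu, sigma2)
                         - log_cond_alpha(alpha[i], y[i], mu, sigma2))
            if math.log(1.0 - rng.random()) < log_ratio:
                alpha[i] = prop
                accepted += 1
        # Gibbs update for mu (flat prior): Normal(mean(alpha), sigma2 / n)
        mu = rng.gauss(sum(alpha) / n, math.sqrt(sigma2 / n))
        # Gibbs update for sigma2: Inverse-Gamma(a0 + n/2, b0 + SS/2)
        shape = a0 + n / 2.0
        rate = b0 + 0.5 * sum((a - mu) ** 2 for a in alpha)
        sigma2 = 1.0 / rng.gammavariate(shape, 1.0 / rate)
        draws.append((mu, sigma2, list(alpha)))
    return draws, accepted / (n_iter * n)

# Hypothetical data: 30 counts, one per alpha_i
y = [3, 5, 2, 8, 4, 6, 1, 7, 5, 3, 4, 9, 2, 6, 5,
     4, 3, 7, 8, 2, 5, 6, 4, 3, 1, 10, 5, 4, 6, 3]
draws, acc_rate = mh_within_gibbs(y)
kept = draws[len(draws) // 2:]            # discard the first half as burn-in
post_mu = sum(d[0] for d in kept) / len(kept)
print(f"MH acceptance rate: {acc_rate:.2f}, posterior mean of mu: {post_mu:.2f}")
```

Updating the alpha_i one at a time keeps each proposal one-dimensional, which is usually easier to tune than a single 30-dimensional proposal; the step size of 0.5 is a guess and would need adjusting for a real model.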
  • asked a question related to MCMC
Question
4 answers
Hello,
Using the parameter settings below, I am trying to obtain results for further analysis in Structure Harvester. However, even though I check the option "compute the probability of the data (for estimating K)", I cannot find the related result file in any of the folders for the analysis. My input file is correct, the analysis runs without problems and I obtain the result files in the correct folders; the only problem is the missing file to use in Structure Harvester, which should be named, for example, "K1ReRun_run_1_f". What am I missing? I would appreciate any suggestions.
Length of Burnin Period: 10000
Number of MCMC Reps after Burnin: 50000
Ancestry Model Info: No Admixture Model
* Use Sampling Location Information
* Use Population IDs as Sampling Location Information
Frequency Model Info: Allele Frequencies are Correlated among Pops
* Assume Different Values of Fst for Different Subpopulations
* Prior Mean of Fst for Pops: 0.01
* Prior SD of Fst for Pops: 0.05
* Use Constant Lambda (Allele Frequencies Parameter)
* Value of Lambda: 1.0
Advanced Options
Estimate the Probability of the Data Under the Model
Frequency of Metropolis update for Q: 10
and I tried the same with different iteration numbers: 1, 5, 10
Relevant answer
Answer
Thank you for your reply. I really do not understand what the problem was with the first simulations that I tried with different values, but the new ones are working. Maybe there was a temporary problem, I really don't know. For now, I get results from Harvester.
Edit: It might sound unrelated, but it might be because a virus was interfering with the software. GeneClass2 was also not working, and after I deleted a suspicious file they both started to work.
Thanks a lot again!
  • asked a question related to MCMC
Question
1 answer
I wonder about the meaning of "bootstrap likelihood" in RAxML.
I'm running ML+Rapid bootstrap for analysis settings, about 1428 taxa, DNA sequence, on the standard-RAxML-master using our server(NOT RAxML gui1.5b1 in WINDOW).
But I can't understand this bootstrap likelihood value.
My running situation is below
Bootstrap[0]: Time 1157.559750 seconds, bootstrap likelihood -168810.464641, best rearrangement setting 13
I want to know what -168810.464641 is.
In addition, what would an optimal value be? (For example, in Bayesian inference using MrBayes, the convergence diagnostic should be <0.01.)
Thank you.
Relevant answer
Answer
As you probably know, when you do a ML tree search, you will get a best ML tree based on your DNA alignment, and this tree will have a certain likelihood value (that you will also get at the end of your analysis).
However, as you probably know too, for every bootstrap pseudoreplicate (usually 100 or 1000 times), the program will generate a new alignment by choosing randomly N columns (some columns will be chosen several times and others will be absent) from your original alignment of N columns.
Therefore, every time the program will do a new tree search from this new alignment (for each bootstrap pseudoreplicate), the likelihood of the subsequent tree (and potentially the tree topology itself) will be different, because the alignment on which you do the search is different from the original alignment (original alignment from which the likelihood of the best tree is calculated), and this is the value of this specific tree generated by the bootstrap search that is given to you.
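The column resampling described above can be sketched in a few lines of Python. This is only an illustration of the idea on a toy alignment with made-up taxon names, not RAxML's actual implementation.

```python
import random

def bootstrap_alignment(alignment, rng):
    """Build one pseudoreplicate by drawing N columns with replacement."""
    n_cols = len(next(iter(alignment.values())))
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {name: "".join(seq[c] for c in cols)
            for name, seq in alignment.items()}

# Toy alignment: 3 taxa, 10 columns
alignment = {
    "taxonA": "ACGTACGTAC",
    "taxonB": "ACGTTCGTAC",
    "taxonC": "ACGAACGTTC",
}
rng = random.Random(42)
replicate = bootstrap_alignment(alignment, rng)
for name, seq in replicate.items():
    print(name, seq)
```

Because each pseudoreplicate is a different alignment, the tree search on it ends at a different likelihood, which is why every `Bootstrap[i]` line reports its own value.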
  • asked a question related to MCMC
Question
4 answers
I am new to PhyloBayes and I am trying to make a tree with posterior probabilities at the nodes. However, I can't figure out which command calculates these numbers. The trees constructed with the pb and readpb commands only seem to include branch lengths, and when I open them in FigTree I'm not able to visualize posterior probabilities....
Relevant answer
Answer
Ok, got it. Thanks a lot Dan.
  • asked a question related to MCMC
Question
4 answers
In the final execution with mpirun the output tree was not generated and at the end there is this statement: mpirun realized that the process ranks 0 with PID 1534 in the node .... left in signal 6 (Abort trap: 6).
However, I have all  nex.run.t and nex.run.p but I do not know how to use the command line to get a tree.
Can someone help me?
Relevant answer
Answer
 FYI, another option is the bayestree package in R
  • asked a question related to MCMC
Question
3 answers
microsatellites; run length: 10.000; MCMC reps.: 10.000; Nr. of iterations: 20; K set from 1 to 25
Relevant answer
Answer
Hi Edit,
You are most welcome!
  • asked a question related to MCMC
Question
5 answers
I'm working on source partitioning for two lake food webs. While working on quantifying the sources using SIAR, I have come across some difficulties.
Here are my problems:
First, zoobenthos are the consumers in my systems, and I'm trying to calculate the source proportions in as much detail as possible; therefore, the "consumers" are sorted into groups based on genus, and most of the time there is only one sample for each group. When I tried to calculate them in SIAR, it gave the following warning: "
=============== READ THIS ===============
There may be some problems with this data.
Some of the standard deviations seem especially large.
Please check to see whether the target data lie outside
the convex hull implied by the sources.
SIAR rates the problem with this data set as:
Severe - possibly severely affecting results
========================================
" I don't know which part of my data leads to the problem, the consumers or the sources. What should I do about it?
Second, the convergence diagnostics show results as follows: "
Worst parameters are ...
detritus-LG5 SD1G2 detritus-LG2 detritus-DG5 detritus-DG2 SD1G1
0.0007164114 0.0033956284 0.0034963043 0.0092774801 0.0251035281 0.0334041570
SD2G3 SD1G4 detritus-HG4 detritus-DG3
0.0372546252 0.0495544969 0.0624383672 0.1138505731
If lots of the p-values are very small, try a longer run of the MCMC. "
I don't quite get the meaning of those numbers; what do the p-values measure?
Third, when estimating the source proportions, should I use the mode or the mean of the results?
Thank you very much! 
Relevant answer
Answer
I don't know which part of my data leads to the problem, the consumers or the sources. What should I do about it?
I suggest you apply the method proposed by Smith et al. (2013) (http://onlinelibrary.wiley.com/doi/10.1111/2041-210X.12048/abstract) to check whether your model assumptions (TEFs and the set of sources) are correct. Maybe you missed a source for certain consumers. You can also check the TEF for 15N. The widely used 3.4 is generally wrong for benthic invertebrates (especially detritivores); see Vanderklift and Ponsard (2003). If you use the wrong factor, your consumers will probably fall outside the polygon.
Additionally, since you often have n=1, why don't you try grouping species into guilds or functional feeding groups? This could help you get clearer results.
 
I don't quite get the meaning of those numbers; what do the p-values measure?
Andrew answered it.
when estimating the source portion, should I use mode or mean in the results?
I am not sure if you mean the output of model or your input data. If it is the output of the SIAR, I think that you should report the mean and the confidence interval. Your input is generally the mean. 
  • asked a question related to MCMC
Question
5 answers
After reading a lot of recent literature on SWAT model calibration, I concluded that initial auto-calibration using SWAT-CUP followed by manual calibration is the best approach. I therefore tried to work out which procedure in SWAT-CUP is best suited to calibrating stream flow. However, I am unsure which one to choose, although in general SUFI2 seems to be the most highly regarded. Accordingly, I would appreciate it if you could recommend one of the procedures and provide a reference to support your claim.
Thank you so much
Elham 
  • asked a question related to MCMC
Question
3 answers
At present, JPDA and fuzzy clustering are the two main data association techniques for target tracking, other than MHT and MCMC. Is there any scope for new data association techniques that are applicable in real time?
Relevant answer
Answer
You could have a look at the following, for an interesting perspective:
  • asked a question related to MCMC
Question
2 answers
I have both Reaction Time (RT) and Accuracy Rate (ACC, ranging from 0 to 1) data for my 2*2 design experiment. I want to know whether there is an interaction between the two factors. A two-way ANOVA could easily be used for my purpose, but I would like to calculate the Bayes factor (BF) to further confirm the result. I use the same R code with the MCMCregress function to do this. The RT variable seems to give BF results compatible with the ANOVA. However, the BFs for ACC are pretty small (less than 1) for all my data, even when the interaction effects are very significant based on the ANOVA. The code is as follows:
model1.ACC <- MCMCregress(ACC~cond+precond+cond*precond,
data=matrix.bayes,
b0=0,
B0=0.001,c0=0.001, d0=0.001, delta0=0, Delta0=0.001,
marginal.likelihood="Chib95", mcmc=50000)
model2.ACC <- MCMCregress(ACC~cond+precond,
data=matrix.bayes,
b0=0,
B0=0.001,c0=0.001, d0=0.001, delta0=0, Delta0=0.001,
marginal.likelihood="Chib95", mcmc=50000)
BF.ACC <- BayesFactor(model1.ACC, model2.ACC)
mod.probs.ACC <- PostProbMod(BF.ACC)
print (BF.ACC)
print(mod.probs.ACC)
Is there anything wrong in my code? Any guidance will be appreciated.
Relevant answer
Answer
Thank you so much, Chiedu.
  • asked a question related to MCMC
Question
2 answers
To see complete details, please find the attached file. Thanks.
Relevant answer
Answer
Another option is to try the acceptance-rejection method. It works as long as you have the PDF. However, it is quite inefficient, especially for large m.
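A minimal acceptance-rejection sketch in Python. The target density from the attached file is not available here, so a Beta(2, 2) density on [0, 1] stands in for it, with a Uniform(0, 1) proposal and an envelope constant of 1.5 (the maximum of that density); all three are assumptions for illustration.

```python
import random

def target_pdf(x):
    # Beta(2, 2) density on [0, 1]; a stand-in for "the PDF you have"
    return 6.0 * x * (1.0 - x)

def rejection_sample(n, rng, bound=1.5):
    """Draw n samples with a Uniform(0, 1) proposal; bound >= max of target_pdf."""
    samples, tries = [], 0
    while len(samples) < n:
        x = rng.random()            # propose from Uniform(0, 1)
        tries += 1
        if rng.random() * bound <= target_pdf(x):
            samples.append(x)       # accept with probability pdf(x) / bound
    return samples, n / tries

rng = random.Random(0)
samples, acceptance_rate = rejection_sample(20000, rng)
sample_mean = sum(samples) / len(samples)
print(f"acceptance rate ~ {acceptance_rate:.2f}, sample mean ~ {sample_mean:.2f}")
```

The acceptance rate is the reciprocal of the envelope constant (about 2/3 here), which is where the inefficiency comes from: the looser the envelope has to be, the more proposals are thrown away.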
  • asked a question related to MCMC
Question
3 answers
MCMC imputation
Relevant answer
Answer
Okay, then ideally, these variance components should be exactly what I have explained above:
"between variance" is the variance among treatments, that is, how much of the variation in the response variable(s) is explained by your explanatory variable(s); and the "within variance" is the variance left unexplained (how much variation there is within treatments after removing the differences among them). The total variance should simply be the sum of the within and between variances.
Now I don't know SAS (I mostly use R) and I don't know exactly what method they use in this procedure. If your data are not balanced, the order of the explanatory variables in the model can change the variance components, or not, depending on the type of sum of squares used. So the interpretation of the variance components is conditional on the previous components ("how much variation does Y explain after accounting for X? how much variation does Z explain after accounting for X and Y?...)
If it is not clear for you, I recommend you read more about ANOVA (similar enough to ANCOVA but better documented), sum of squares and F-tests; and the SAS documentation.
Cheers,
Timohée
  • asked a question related to MCMC
Question
3 answers
The posterior probabilities are mostly high (>0.90), the ESSs are great and the MCMC samples converge very well, but I am getting overlapping HPD intervals for the divergence time estimates. I used a relaxed molecular clock with a lognormal distribution. I am analyzing a mitochondrial gene with two calibration nodes, one mid-interior and the other very recent.
Relevant answer
Answer
Dear Binod Regmi,
I found this post from the BEAST team very helpful when setting the parameters for divergence time estimates: http://beast2.org/2015/06/23/help-beast-acts-weird-or-how-to-set-up-rates/
  • asked a question related to MCMC
Question
2 answers
Hello,
I’m using the DISCRETE package of BayesTraitsV2 to analyze two binary traits. I know how to do the LRT between the likelihood values of two models, but I don’t know how to do the next steps. I’m confused about the results of the dependent model. Do q12, q23… represent the transition rates between different states? Why are these values so different between the ML and MCMC results? And how do I get the likelihood values for these parameters in order to do the LRT?
Thank you so much!
Relevant answer
Answer
Dear Nirmala S.V.S.G,
Thank you very much! These pdf files will be very helpful.
  • asked a question related to MCMC
Question
2 answers
It's not given in the summary as much as I can tell. I'd like to report distributions for ancestral theta and tau values, but in the mcmc files, only ancestral theta is given. I can calculate the posterior distribution of theta but unless I'm missing something I'd have to sum multiple tau values before I can find the ancestral value. An estimate of tau is given every 10% of the mcmc chain in the screen output, but this isn't sufficient to calculate the posterior probability. Any suggestions for getting these values would be helpful. Thanks!
Relevant answer
Answer
hi... First of all what is BP & P??? and can you please little elaborate your problem please if possible express in mathematical expressions...
  • asked a question related to MCMC
Question
4 answers
I wonder if there is any MCMC sampling method which uses the definition of the target CDF instead of the target PDF; however, I may use a proposal PDF.
I would like to use Metropolis-Hastings but it is not possible because the calculation of the acceptance ratio is defined in terms of the target PDF.
I say this because it is impossible for me to obtain the PDF associated with a certain CDF without doing some kind of numerical approximation, which may bias my simulation; also, my CDF might not be continuous, and therefore I cannot differentiate it to obtain the PDF.
Regards!
Relevant answer
Answer
What I mean, is that for example I would like to sample from a copula by means of MCMC. Copulas are not always differentiable.
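For the univariate case, one option that needs only the CDF is inverse-transform sampling with a generalized inverse found by bisection. Note this is not MCMC and it does not by itself cover a multivariate copula. The sketch below is in Python with a made-up CDF that has a jump at zero (a 30% point mass at 0 plus 70% Exponential(1)), to show that no PDF or differentiability is required.

```python
import math
import random

def cdf(x):
    # Hypothetical target: 30% point mass at 0, 70% Exponential(1) above 0
    if x < 0.0:
        return 0.0
    return 0.3 + 0.7 * (1.0 - math.exp(-x))

def inverse_cdf(u, lo=-1.0, hi=50.0, n_steps=60):
    """Generalized inverse: smallest x with cdf(x) >= u, located by bisection.

    Assumes cdf(lo) < u <= cdf(hi), so [lo, hi] must bracket the support.
    """
    for _ in range(n_steps):
        mid = 0.5 * (lo + hi)
        if cdf(mid) >= u:
            hi = mid
        else:
            lo = mid
    return hi

rng = random.Random(5)
# 1 - random() lies in (0, 1], which keeps u strictly positive
samples = [inverse_cdf(1.0 - rng.random()) for _ in range(20000)]
frac_at_zero = sum(abs(x) < 1e-9 for x in samples) / len(samples)
sample_mean = sum(samples) / len(samples)
print(f"fraction at the jump ~ {frac_at_zero:.2f}, mean ~ {sample_mean:.2f}")
```

The only numerical approximation is the bisection tolerance, which can be made as small as floating point allows; the price is one batch of CDF evaluations per draw.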
  • asked a question related to MCMC
Question
1 answer
Could anyone share a protocol for divergence time calculation based on nucleotide sequences (plant chloroplast genomes)?
I tried with BEAST but failed to generate the XML input file with BEAUti.
My protocol is as follows:
BEAUTi (V. 1.8)
File-> Import input.nexus data  
@taxa -> ingroup (only known samples: ex: rice & sorghum divergence time: 13mya ) and outgroup
@sites -> HKY (substitution model), Gamma (site heterogeneity model)
@clocks -> strict model
@trees -> yule process
@Prior-> All normal, tmrca (untitled)-> normal (Mean:13 SD; 2) how this value calculated?
                  Ucld.mean -> gamma -> shape: 1000,  scale: 0.001,                         
@MCMC -> length of chain-1,000,000 generation, Echo state – 10,000, log parameter – 1000
 5,000,000 generation, Echo state – 50,000, log parameter – 1000
The Error while generating xml file for BEAST
“ BEAST has terminated with an error. Please select QUIT from the menu.
Parsing error - poorly formed BEAST file, uTb_mfft_nexus 28+1_strict_pri17+2.xml:
Error parsing '<uniformPrior>' element with id, 'null':
Uniform prior uniformPrior cannot take a bound at infinity, because it returns 1/(high-low) = 1/inf
Please suggest how to overcome this issue.
Any other protocol you like to suggest?
Many thanks
sam
Relevant answer
Answer
Dear Sampath Perumal,
Pls check the attachment. 
  • asked a question related to MCMC
Question
2 answers
Dear all,
I want to use the RASP software to estimate divergence times among my samples, which come from different species of Bufoidea.
I don't know which calibration I should use for my samples, as my sequences are from the D-loop and 12S of mtDNA and, unfortunately, there are no published data for time calibration using these two genes. Because of this problem, I had to use RASP, as it uses the substitution rate, but although I tried many times to follow its tutorial, I could not get it to work.
Please, I need help.
Hani
Relevant answer
Answer
Dear Hani,
I suggest performing this analysis with BEAST-v1.8.3.
Best wishes.
  • asked a question related to MCMC
Question
3 answers
I'm pretty new to running MCMC-GLMMs, and I have begun with the R package MCMCglmm.
I have a unique dependent-unique factor model with random structure and zero-altered poisson distribution:
R- structure ~us(trait):Block
G-structure ~idh(trait):units
family  "zapoisson"
fitting simple prior
I tried to get HPD intervals but I got an error message. Apart from the posterior mean, confidence intervals, effective sample size and MCMC p-values, would it be possible to get any of the mentioned values?
Thanks in advance,
Martin
Relevant answer
Answer
That was much appreciated, thank you. The procedure followed in the paper looks very promising, although complex.
Regards.
  • asked a question related to MCMC
Question
5 answers
Hi,
I am running a multinomial multilevel regression. My dependent variable is the probability of a university belonging to cluster A, B or C. I have 459 universities nested in 33 countries. The predictor variables are university characteristics. Two of the predictors cause problems in the estimation: "IGLS/RIGLS numeric warning SSP matrix for fixed part has gone negative definite". If I suppress the warning and allow the estimation, then when I turn to MCMC estimation the message is "MCMC Error 0315: Prior variance matrix is not positive definite". The 2 variables that cause problems are binary (0 or 1). I am wondering if the problem is related to the fact that for many countries I only have 0s and no 1s?
Thanks a lot, Marco
(btw, other discussions are available on the topic, and suggest there might be problems related to the data, but they do not discuss what these problems can be, nor how to solve them - see for instance: https://www.cmm.bristol.ac.uk/forum/viewtopic.php?t=23, https://www.cmm.bristol.ac.uk/forum/viewtopic.php?t=270)
Relevant answer
Answer
The problem with the IGLS estimator is a lack of information to estimate the model you have specified; when you then carry its estimates over as starting values for MCMC, it cannot proceed because the starting values are inadmissible.
Two suggestions
1) choose a different reference base - I usually choose to drop the outcome that has the most frequent occurrence - this often works.
2) reduce the complexity of the model in the random part. In IGLS, initially remove all the covariances at the higher level: you can click on each one to remove them one at a time, or click on the Omega in the equations window and choose "set diagonal matrix"; if it's level 3, the command is SMAT 3 0. You will see the covariances are no longer there but the variances on the diagonal are. Start in IGLS and hopefully it will converge. Now put the covariances back in while in IGLS; they will have values of zero (click on the Omega and request "set full matrix", i.e. SMAT 3 1). Do not press Start in IGLS but switch to MCMC, then press Start; this should produce a positive definite matrix. You will need a long run and a long burn-in to get away from these initial values. Sometimes in IGLS, even with no covariances, you get extreme results for the variance estimates; by that I mean an estimate of zero or greater than, say, 10. If you right-click on these problematic terms, you can set your own initial estimate, say something like 0.01, in IGLS mode and then switch to MCMC using these chosen estimates as your starting position. Again, ensure by looking at the trajectories of each parameter that the estimates have got away from the values you have imposed.
Apologies that this is rather involved but you are trying to fit a complex model with rather limited data and the algorithms sometimes need help - the art of modeling!
 See page 282  of the following for a bit more explanation and example
It is good to see that you are using CMM's resources - this site is a guide to all our material
and this is the section on the multinomial
And yes if all this fails your only option is to combine categories so that the algorithms have something to work on (the covariances at the higher level represent the tendency for a country to favour  A to also favor B - and if you do not have A and B in a country - there is not a lot to go on!)
  • asked a question related to MCMC
Question
1 answer
Hi,
I'm trying to implement the factor structure model using Gauss Quadrature or MCMC method as done by many in their research work.
The model is to find the effect of latent factors on response variable where in equation term it can be written as
Y_i = βiXi + βcLatentVariable1 + βnLatentVariable2 + ϵi
where Xi is a vector of controls
Now, one simple way that I know to implement the model is to use manifest variable to find factor scores and use scores as independent variable in above equation. The other way is to first estimate the distribution of factors and use that to estimate above model. But , I'm not sure, how to computationally do it using Gauss Quadrature or MCMC method. 
Can somebody please explain step by step how to computationally implement this model in simpler way in any of the languages from MATLAB, Stata or R.
Thank You!
Relevant answer
Answer
no i dont !!
  • asked a question related to MCMC
Question
4 answers
I am trying to compare the existing MCMC method with other existing methods on censored data under high, medium and low censoring; any suitable link or material would help.
Relevant answer
Answer
Hi. Thank you. I would still go to R help first. If you would be interested in running SAS my colleague at Wake Forest, Dr. Nonna Sorokina, might be able to help. Best wishes, David
  • asked a question related to MCMC
Question
5 answers
In my Matlab code, as soon as the number of random variables reaches 3, the acceptance rate of MCMC using the Metropolis-Hastings algorithm drops to less than 1%.
Relevant answer
Answer
You need to choose a different proposal density. If that does not help, you may want to use some kind of adaptive MCMC algorithm.
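As a rough illustration of the adaptive idea, the Python sketch below (not Matlab) tunes one proposal step size per parameter during burn-in so that each component's acceptance rate drifts towards a target of 0.44, then freezes the step sizes for the kept samples. The three-parameter target with very unequal scales and the 0.44 target rate are assumptions for illustration; a mismatch between proposal scale and parameter scale is a common reason for a near-zero acceptance rate.

```python
import math
import random

def log_target(x):
    # Hypothetical 3-parameter target: independent normals with unequal scales
    scales = (1.0, 0.1, 5.0)
    return -0.5 * sum((xi / s) ** 2 for xi, s in zip(x, scales))

def adaptive_rw_metropolis(n_burn=5000, n_keep=5000, target_rate=0.44, seed=0):
    rng = random.Random(seed)
    x = [0.0, 0.0, 0.0]
    log_step = [0.0, 0.0, 0.0]        # log proposal SD, one per parameter
    current = log_target(x)
    samples, accepted = [], 0
    for t in range(n_burn + n_keep):
        for i in range(len(x)):
            proposal = list(x)
            proposal[i] += rng.gauss(0.0, math.exp(log_step[i]))
            candidate = log_target(proposal)
            accept = math.log(1.0 - rng.random()) < candidate - current
            if accept:
                x, current = proposal, candidate
            if t < n_burn:
                # Robbins-Monro update: grow the step after an acceptance,
                # shrink it after a rejection, with a decaying gain
                log_step[i] += (float(accept) - target_rate) / math.sqrt(t + 1)
            else:
                accepted += accept
        if t >= n_burn:
            samples.append(list(x))
    rate = accepted / (n_keep * len(x))
    return samples, rate, [math.exp(s) for s in log_step]

samples, rate, steps = adaptive_rw_metropolis()
print(f"post-burn-in acceptance rate: {rate:.2f}")
print("tuned step sizes:", [round(s, 2) for s in steps])
```

Adapting only during burn-in keeps the kept part of the chain an ordinary Metropolis-Hastings sampler, so the usual convergence arguments still apply to it.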
  • asked a question related to MCMC
Question
2 answers
Suppose we have some survival data and want to fit a distribution to them using a Bayesian semiparametric approach. A Dirichlet process prior is useful in this method. After computing the conditional posterior distributions for all unknown parameters, I want to simulate data from the Dirichlet process using the Gibbs sampler. How can I do this?
Relevant answer
Answer
The Gibbs sampler is used to estimate the posterior distributions of all unknown parameters. It seems that you want to sample from the data model. Given the posterior mixture model, with a known number of components and their parameters, the data model is a finite mixture model, so you can simply sample the desired new data from it; all of the draws will follow the same density function.
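Sampling new data from a finite mixture is a two-step draw: pick a component according to the weights, then draw from that component. A minimal Python sketch, assuming for illustration a two-component Weibull mixture for the survival times; the weights, scales and shapes are made up and would in practice come from one posterior draw (or the posterior means) of the fitted model.

```python
import math
import random

def sample_mixture(weights, scales, shapes, n, rng):
    """Draw n survival times from a finite mixture of Weibull components."""
    draws = []
    for _ in range(n):
        # Step 1: choose a component with probability equal to its weight
        k = rng.choices(range(len(weights)), weights=weights)[0]
        # Step 2: draw from the chosen component (scale first, then shape)
        draws.append(rng.weibullvariate(scales[k], shapes[k]))
    return draws

# Hypothetical mixture parameters
weights = [0.6, 0.4]
scales = [2.0, 10.0]
shapes = [1.5, 3.0]

rng = random.Random(2)
new_times = sample_mixture(weights, scales, shapes, 20000, rng)
sample_mean = sum(new_times) / len(new_times)
# Theoretical mixture mean: sum_k w_k * scale_k * Gamma(1 + 1 / shape_k)
expected_mean = sum(w * a * math.gamma(1.0 + 1.0 / b)
                    for w, a, b in zip(weights, scales, shapes))
print(f"sample mean {sample_mean:.2f} vs theoretical {expected_mean:.2f}")
```

Repeating this for many posterior draws of the mixture parameters, rather than a single set, would give posterior predictive samples instead of samples from one plug-in model.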
  • asked a question related to MCMC
Question
3 answers
In a Bayesian evolutionary analysis, I have run my experiment for 100 million generations, but the ESS is still below one hundred. What can I do?
Relevant answer
Answer
Increasing the thinning interval could help to reduce autocorrelation, although some authors discourage thinning.
Parameter expansion is probably what you're looking for. Some packages, such as MCMCglmm (in R), already include easy ways to implement parameter expansion in the variance structure.
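To see why thinning is debated, it helps to estimate the ESS before and after thinning. The Python sketch below uses a simple estimator, ESS = N / (1 + 2 * sum of autocorrelations), truncated at the first non-positive autocorrelation; that truncation rule and the simulated AR(1) chain standing in for a sampled parameter are assumptions for illustration, and tools such as Tracer use their own estimators.

```python
import random

def effective_sample_size(chain, max_lag=500):
    """ESS = N / (1 + 2 * sum of positive-lag autocorrelations)."""
    n = len(chain)
    mean = sum(chain) / n
    centered = [v - mean for v in chain]
    var = sum(c * c for c in centered) / n
    tau = 1.0                         # integrated autocorrelation time
    for lag in range(1, min(max_lag, n - 1) + 1):
        rho = sum(centered[t] * centered[t + lag]
                  for t in range(n - lag)) / (n * var)
        if rho <= 0.0:                # truncate once the estimate is just noise
            break
        tau += 2.0 * rho
    return n / tau

# Simulated, strongly autocorrelated chain: AR(1) with coefficient 0.9
rng = random.Random(7)
chain = [0.0]
for _ in range(9999):
    chain.append(0.9 * chain[-1] + rng.gauss(0.0, 1.0))

ess_full = effective_sample_size(chain)
thinned = chain[::10]                 # keep every 10th sample
ess_thinned = effective_sample_size(thinned)
print(f"full chain:    N = {len(chain)}, ESS ~ {ess_full:.0f}")
print(f"thinned chain: N = {len(thinned)}, ESS ~ {ess_thinned:.0f}")
```

Thinning makes the stored samples less correlated, but it cannot add information: the thinned chain's ESS is bounded by its own length, so a low ESS usually calls for a longer run or better mixing rather than a larger thinning interval.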
  • asked a question related to MCMC
Question
3 answers
Markov Chain Monte Carlo is a method based on Markov chains that allows us to obtain samples (in a Monte Carlo setting) from non-standard distributions from which we cannot draw samples directly. My question is about alternatives to Markov chains and the state of the art in Monte Carlo sampling. Put another way: is there anything other than a Markov chain that can be used for this kind of Monte Carlo sampling? I know that MCMC has theoretical roots (in terms of conditions such as aperiodicity, homogeneity and detailed balance), but I wonder whether there are "similar" probabilistic models or methods that can do the same sampling job as a Markov chain.
Relevant answer
Answer
You could look into nested sampling.
Instead of sampling from the posterior directly, it samples from a sequence of uniform distributions that contract around the peak. You can think of it as a cross between simulated annealing and MCMC.
Pros: 
  • Excellent at exploring complicated and multi-modal posteriors
  • Samples with very little a-priori information (no proposal matrices etc)
  • Simultaneously computes traditionally extremely challenging integrals over the posterior, such as the evidence (normalisation constant) or relative entropy
Cons:
  • Nowhere near as established as traditional MCMC.
  • If your posterior is very simple, it will likely never be faster than MCMC
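A toy nested sampling run in Python, to make the "contracting sequence" idea concrete. The setup is an assumption chosen so that the answer is known: a Uniform(-5, 5) prior and a standard normal likelihood in one dimension, for which the evidence is close to 0.1. Real implementations need a general way to draw from the prior subject to L > L*; here that step uses a shortcut that only works because this likelihood decreases with |x|.

```python
import math
import random

def log_likelihood(x):
    # Standard normal likelihood in a single parameter
    return -0.5 * x * x - 0.5 * math.log(2.0 * math.pi)

def log_add_exp(a, b):
    # Numerically stable log(exp(a) + exp(b))
    if a == -math.inf:
        return b
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

def nested_sampling(n_live=200, n_iter=2000, half_width=5.0, seed=3):
    rng = random.Random(seed)
    live = [rng.uniform(-half_width, half_width) for _ in range(n_live)]
    log_z = -math.inf
    log_x_prev = 0.0                  # log of the remaining prior volume
    for i in range(1, n_iter + 1):
        worst = min(range(n_live), key=lambda k: log_likelihood(live[k]))
        log_l_star = log_likelihood(live[worst])
        log_x = -i / n_live           # expected shrinkage: X_i = exp(-i / n_live)
        # log of the shell weight X_{i-1} - X_i
        log_w = log_x_prev + math.log(1.0 - math.exp(log_x - log_x_prev))
        log_z = log_add_exp(log_z, log_l_star + log_w)
        # Toy shortcut: L > L* is the interval |x| < |x_worst| for this likelihood,
        # so the constrained prior is uniform on that interval
        r = abs(live[worst])
        live[worst] = rng.uniform(-r, r)
        log_x_prev = log_x
    # Add the contribution of the remaining live points
    for x in live:
        log_z = log_add_exp(log_z, log_likelihood(x) + log_x_prev - math.log(n_live))
    return log_z

log_z = nested_sampling()
print(f"estimated log evidence: {log_z:.2f} (analytic value ~ {math.log(0.1):.2f})")
```

The evidence falls out as a by-product of the run, which is the main attraction listed under the pros above; posterior samples can be recovered afterwards by weighting the discarded points.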
  • asked a question related to MCMC
Question
3 answers
I was trying to do Bayesian analysis on some of my sequence data using BEAST 1.7.5 to see how closely related they are and their migration patterns. 
The substitution model used was GTR+I+G (strict molecular clock). I did 10 million iterations primarily to have a better ESS thus a rich posterior probability. Well it worked fine and for each run, I had ESS <700.
But once their locations (discrete trait) were added to the analysis, the ESS dropped to <10. Even after combining 4 independent runs, the ESS remained low (<75). The trees each run generated were significantly different and the location patterns don't seem right. The branch colours were really confusing.
Can anyone help me to get this analysis right with the discrete trait (location)?
I guess if everything goes right, the posterior probability values I got w/o locations should be similar to with locations, right?
My expertise with Bayesian algorithms and BEAST/BEAUti is extremely low.
Thanks
Relevant answer
Answer
Hi Santiago,
Pretty much everything has a low ESS, not just posterior. It's just one discrete trait (locations) and total 65 sequences. 
What could possibly be the reason to fail the analysis? Lack of sequence and location data throughout the timeline to support the migration patterns? 
Thanks
  • asked a question related to MCMC
Question
4 answers
Hello, everyone! Recently I was reading the P4 manual about testing compositional homogeneity of the data. The author assumes the tree- and model-based composition test works best. Now I have three questions: 1) Which fixed tree should be used for this test? Is it the best tree generated under the homogeneous model using Bayesian MCMC? 2) Is the homogeneous model the best-fit model suggested by software such as ProtTest or ModelGenerator? 3) How do I carry out these steps using P4? Could anyone kindly provide scripts? Any answer will be appreciated. Thanks!
Relevant answer
Answer
I am the author of P4. When speaking of "compositional homogeneity" or "compositional heterogeneity" we need to be clear about whether we are talking about homogeneity over the tree (or over time, or over taxa), or over the data. The Elhaik and Graur software IsoPlotter suggested by Davide F Castaldi addresses the latter, that is heterogeneity over the data. The Liu et al study and the Luo study both used the NDCH model as implemented in p4, and addressed compositional heterogeneity over the tree. Is that what the original poster had in mind?
To clarify my opinion for the original poster, I do not assume that the tree and model based composition fit test that I described there works the best. It has advantages over simpler tests, and is really a model fit test, and so is more general than a compositional homogeneity test. Both the Liu et al study and the Luo study used an MCMC, using posterior predictive simulations, which I would recommend over the tree and model based composition fit test. P4 comes with a lot of documentation and examples to show you how to do that, but you can use other software.
  • asked a question related to MCMC
Question
4 answers
As a lay user of Python, I am looking for some (MCMC) scripts to fit a model to data. I want to optimize parameters (cross-correlation: each parameter's contribution and effect in the model equation) with a maximum likelihood approach using MCMC. I need to optimize the parameters of a logistic regression model (cross-correlation). Any suggestions and help will be appreciated.
Relevant answer
Answer
There is a widely used standard for the simulation of dynamical systems called Modelica. There are both commercial and open source implementations of Modelica. I have used the free JModelica implementation for a few years. It runs on Windows (and on Linux and Mac with some work). JModelica uses Python, and all administration of the simulation, including presentation of the results, is done in Python. In the JModelica manual you will find good and detailed examples of how to set up the fitting of a model to experimental data. See more at http://www.jmodelica.org
  • asked a question related to MCMC
Question
4 answers
I have a nexus tree file from a BEAST mcmc run, and would like to assess tree convergence in AWTY. However, when I upload the file (which is <500MB) and select an analysis, I always get the following error. I assume there is a problem with the file format, but am not sure what it is. Thanks!
"AWTY has encountered the following error:
errorString; ?>"
Relevant answer
Answer
Hi Holly - posting my response here in case others have the same issue. BEAST annotates output trees with lots of information that AWTY cannot parse. You will have to generate a new set of trees without these extra annotations. The easiest way to do this is to read your tree file in to R using the ape package, with a function like read.tree(). This will strip out all of the BEAST annotations as far as I know. You can then write a new tree file with topologies only that should be parsable by AWTY. 
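If R is not to hand, the same clean-up can be approximated in Python by deleting the bracketed `[&...]` comments that BEAST attaches to nodes and branches. This is only a sketch: the regular expression assumes annotations never contain a nested `]`, the example tree string is made up, and unlike a topology-only export it leaves branch lengths in place.

```python
import re

# BEAST-style annotations look like [&key=value,...] attached to nodes or branches
ANNOTATION = re.compile(r"\[&[^\]]*\]")

def strip_annotations(newick):
    """Remove [&...] comment blocks from a Newick tree string."""
    return ANNOTATION.sub("", newick)

# Made-up annotated tree for illustration
annotated = ("((A[&rate=0.1]:1.0,B[&rate=0.2]:2.0)"
             "[&posterior=0.98,height_95%_HPD={0.5,1.5}]:0.5,C:3.0);")
plain = strip_annotations(annotated)
print(plain)
```

Applied line by line to the tree block of a NEXUS file, this would also remove rooting comments such as `[&R]`, so it is worth checking the first few output trees by eye before uploading.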
  • asked a question related to MCMC
Question
8 answers
How can I calculate conditional and marginal r-squared (cf. Nakagawa et al 2013, http://onlinelibrary.wiley.com/doi/10.1111/j.2041-210x.2012.00261.x/abstract) for a Bayesian multilevel, multi-response model fit with the R function MCMCglmm?
Thanks for any assistance!
Relevant answer
Answer
I don't know how to do this in R, because I'm more of a WinBUGS person for this sort of thing. There you'd just monitor the precision (or variance) and whatever ratio you're interested in, and it would just be another parameter that was reported, with a credible interval. So if you don't get any better suggestions, WinBUGS links with R very nicely. Somebody else might ask whether you really need the R-squared, and whether something else might be better.
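The "monitor the ratio as another parameter" idea can be sketched in Python for the simplest case, a single-response Gaussian model, where marginal R-squared is var_fixed / total and conditional R-squared is (var_fixed + var_random) / total. The posterior draws below are randomly generated placeholders; with a real fit they would be the per-iteration variance of the fixed-effect predictions and the sampled variance components, and a multi-response model would need this per trait.

```python
import random

def r2_for_draw(var_fixed, var_random, var_resid):
    """Marginal and conditional R-squared for one posterior draw."""
    total = var_fixed + var_random + var_resid
    return var_fixed / total, (var_fixed + var_random) / total

def summarize(values, level=0.95):
    """Posterior mean and an equal-tailed credible interval."""
    s = sorted(values)
    lo = s[int((1.0 - level) / 2.0 * (len(s) - 1))]
    hi = s[int((1.0 + level) / 2.0 * (len(s) - 1))]
    return sum(s) / len(s), lo, hi

# Placeholder posterior draws of (var_fixed, var_random, var_resid)
rng = random.Random(11)
draws = [(rng.gammavariate(20.0, 0.1),
          rng.gammavariate(10.0, 0.1),
          rng.gammavariate(30.0, 0.1)) for _ in range(2000)]

r2 = [r2_for_draw(*d) for d in draws]
marginal = summarize([m for m, _ in r2])
conditional = summarize([c for _, c in r2])
print("marginal R2:    mean %.2f, 95%% CI (%.2f, %.2f)" % marginal)
print("conditional R2: mean %.2f, 95%% CI (%.2f, %.2f)" % conditional)
```

Computing the ratio draw by draw, rather than from posterior means of the components, is what gives the credible interval on R-squared itself.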
  • asked a question related to MCMC
Question
2 answers
MCMCDA is the most effective technique for sampling from a target distribution.
Relevant answer
Answer