Science topic

Bayesian Inference - Science topic

Explore the latest questions and answers in Bayesian Inference, and find Bayesian Inference experts.
Questions related to Bayesian Inference
  • asked a question related to Bayesian Inference
Question
4 answers
jModelTest suggests setting up these models, but MrBayes does not accept them; nevertheless, in the paper on the Colletotrichum gloeosporioides species complex (Studies in Mycology 13), they used some of those. Thank you for your attention.
Relevant answer
Answer
Mostly they are replaced by the GTR models. It is better to use IQ-TREE, where the models are set automatically according to the dataset.
  • asked a question related to Bayesian Inference
Question
2 answers
I have noisy data points, where the peak signal-to-noise ratio (PSNR) may sometimes be less than unity (hence, more noise than signal may be present). I am fitting a model with fitting parameters to this noisy data, using MCMC (Markov Chain Monte Carlo) methods. I want to know if using a noise filter on the noisy data points (such as a Wiener filter in real space or a bandpass filter in Fourier space), before doing the MCMC fitting, would cause the 90% HPDI contour (highest posterior density interval) of the joint posterior probability distribution of the fitting parameters to be tighter or wider (precision), and closer or farther away from the true parameter values (accuracy)?
Relevant answer
Answer
As Ray Kidd mentioned, filtering the data is futile. First, the noise is part of the data, and in some cases noise can be informative. Filtering cannot increase the information content of the data; the information content is an unalterable state of nature. Second, if the filter happens to be inappropriate, the parameter estimates can be meaningless. Third, sometimes the data information content is too low to compute meaningful parameter estimates, and using a filter tells you nothing about parameter estimate uncertainties.
One approach is to include a model for the noise in the parameter estimation model. If you know a lot about the noise, include all that information in the model for the data instead of using a data filter. Never include ad-hoc, indefensible assumptions about the noise (or the parameters).
I noticed Bayesian terms in your keywords. Under no circumstances should you filter the data before applying Bayesian methods. Bayesian methods include one or more models for the noise in the model for the data. This means there will be well-designed prior probability distributions for the noise. Objective prior probability distributions consistent with maximum entropy principles are the best one can do.
The width of the posterior probability distributions for the parameters will tell you if the data information content is simply too low to compute meaningful parameter estimates. This can be explored by adding noise to simulated data (or empirical data). At some point the signal-to-noise ratio will no longer support meaningful parameter estimates.
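To make the "model the noise" suggestion concrete, here is a minimal sketch in R of a log-posterior in which the noise level is a parameter estimated jointly with the model parameters, rather than filtered out beforehand. The model form, priors and variable names are hypothetical, not taken from the question; the function can be handed to any Metropolis-type sampler.
# A minimal sketch, assuming a hypothetical model y = a*exp(-b*x) + noise.
# The noise standard deviation sigma is part of the model, not filtered away.
log_posterior <- function(theta, x, y) {
  a <- theta[1]; b <- theta[2]; sigma <- theta[3]
  if (sigma <= 0) return(-Inf)                                  # keep sigma positive
  mu <- a * exp(-b * x)                                         # deterministic prediction
  loglik  <- sum(dnorm(y, mean = mu, sd = sigma, log = TRUE))   # Gaussian noise model
  logprior <- dnorm(a, 0, 10, log = TRUE) +                     # weakly informative priors
              dnorm(b, 0, 10, log = TRUE) +
              dexp(sigma, rate = 0.1, log = TRUE)
  loglik + logprior
}
Comparing the width of the resulting HPDI with and without pre-filtering on simulated data (where the true parameters are known) is then a direct way to answer the precision/accuracy question empirically.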
  • asked a question related to Bayesian Inference
Question
2 answers
Hello fellow researchers,
I am doing research that involves estimating the parameters of the Cox-Ingersoll-Ross (CIR) SDE using a Bayesian approach. I propose using the Euler scheme in my approach. Could someone please direct me to any implementation code out there in R, Python or Matlab?
Thank you !!
Relevant answer
Answer
For those who were following this question, after a long search I couldn't find any package that implements the CIR model under a Bayesian framework. So I wrote up a Python script to do that. Interested readers can find the code in my GitHub repository https://github.com/Kwabena16108/CIR-Bayesian-Estimation.
Hope this helps.
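For readers who prefer to roll their own before looking at a full script, a minimal sketch of the Euler-scheme log-likelihood for the CIR model dr = kappa*(theta - r)*dt + sigma*sqrt(r)*dW is below (R; the function and variable names are made up). It is not taken from the repository above; combined with priors on kappa, theta and sigma, it can be plugged into any Metropolis-Hastings sampler.
# Euler discretisation: r[t+dt] | r[t] ~ Normal(r[t] + kappa*(theta - r[t])*dt, sigma^2 * r[t] * dt)
cir_loglik <- function(par, r, dt) {
  kappa <- par[1]; theta <- par[2]; sigma <- par[3]
  if (kappa <= 0 || theta <= 0 || sigma <= 0) return(-Inf)      # parameters must be positive
  n <- length(r)
  m <- r[-n] + kappa * (theta - r[-n]) * dt                     # conditional mean
  s <- sigma * sqrt(r[-n] * dt)                                 # conditional standard deviation
  sum(dnorm(r[-1], mean = m, sd = s, log = TRUE))
}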
  • asked a question related to Bayesian Inference
Question
3 answers
I have been working on developing a unified marketing measurement model which combines the output of Market Mix Models and Multi touch attribution models.
I came across articles where Bayesian priors are a suggested method, but I have yet to come across any research paper that actually discusses the details of the implementation/feasibility of this Bayesian priors approach.
If someone is/has worked on this topic and could share some notes/references or guide me in the right direction, I would be highly obliged.
Relevant answer
Answer
When you combine MMM and MTA methodologies, you arrive at a unified measurement approach, one in which multiple types of data sets, techniques, and approaches illustrate not only each channel's impact on sales but also how the interaction between channels and non-media variables can influence buying decisions. In turn, these insights allow you to know where to invest your ad spend for the greatest incremental impact, as well as which creative, copy, and messaging will resonate best on each channel. This allows you to be more strategic and to improve tactical decisions.
  • asked a question related to Bayesian Inference
Question
4 answers
In Bayesian inference, we have to choose a prior distribution for the parameter in order to find the Bayes estimate, and this choice depends upon our beliefs and experience.
I would like to know what steps or rules we should follow when choosing a prior distribution for a parameter. Please help me with this so that I can proceed.
Relevant answer
Answer
Some other short articles that will be extremely helpful to you are
  1. 3 Basics of Bayesian Statistics - CMU Statistics (http://www.stat.cmu.edu/~brian/463-663/week09/Chapter%2003.pdf)
  2. Chapter 12 Bayesian Inference - CMU Statistics (http://www.stat.cmu.edu/~larry/=sml/Bayes.pdf)
  3. Bayesian analysis - MIT OpenCourseWare (https://ocw.mit.edu/courses/sloan-school-of-management/15-097-prediction-machine-learning-and-statistics-spring-2012/lecture-notes/MIT15_097S12_lec15.pdf)
  • asked a question related to Bayesian Inference
Question
5 answers
I'm trying to establish Bayes factor for the difference between two correlation coefficients (Pearson r). (That is, what evidence is there in favor for the null hypothesis that two correlation coefficients do not differ?)
I have searched extensively online but haven't found an answer. I appreciate any tips, preferably links to online calculators or free software tools that can calculate this.
Thank you!
Relevant answer
Answer
Is it possible to do a Bayesian reanalysis from OR data, converted to r correlation values, in order to estimate the Bayes factor?
  • asked a question related to Bayesian Inference
Question
3 answers
Dear colleagues!
Recently, when reconstructing phylogenetic trees based on the COI gene for species identification purposes, I noticed a strict inconsistency between the topologies produced by the neighbor-joining (NJ) and Bayesian inference (BI) methods. As you can see in the figure, NJ produces a clear species bifurcation whereas BI forms a stem cluster.
This is only typical for species with low interspecific divergence (below 2%).
What are the inherent features of these algorithms to make such inconsistency?
Any ideas on BI behaviour?
I do not ask what is true or which algorithm is best. I'm curious what features of BI (for instance) could explain this.
Thank you!
Relevant answer
Answer
Bayesian inference usually produces a consensus tree likely to contain multifurcations, especially at the basal branches and for data containing low numbers of variable sites. On the contrary, NJ tries to infer a binary tree and inserts short, almost unsupported internal branches. Slight changes in the model of molecular evolution in this case are likely to change the whole topology. But it is highly probable to find the NJ tree among the top topologies in the MrBayes (or its likes :) output. For the same reason BI is more resistant to long-branch attraction (seemingly not your case, still...)
  • asked a question related to Bayesian Inference
Question
8 answers
Hello fellow researchers,
I am doing research in extreme value theory where I have to estimate the parameters of a generalized Pareto distribution using a Bayesian approach. I would really appreciate it if anyone can point me to any code in R, Matlab or Python that estimates the GPD.
Relevant answer
Answer
# load packages
library(extRemes)
library(xts)
# get data from eHYD
ehyd_url <- "http://ehyd.gv.at/eHYD/MessstellenExtraData/nlv?id=105700&file=2"
precipitation_xts <- read_ehyd(ehyd_url)
# mean residual life plot:
mrlplot(precipitation_xts, main="Mean Residual Life Plot")
# The mean residual life plot depicts the Thresholds (u) vs Mean Excess flow.
# The idea is to find the lowest threshold where the plot is nearly linear;
# taking into account the 95% confidence bounds.
# fitting the GPD model over a range of thresholds
threshrange.plot(precipitation_xts, r = c(30, 45), nint = 16)
# ismev implementation is faster:
# ismev::gpd.fitrange(precipitation_xts, umin=30, umax=45, nint = 16)
# set threshold
th <- 40
# maximum likelihood estimation
pot_mle <- fevd(as.vector(precipitation_xts), method = "MLE", type="GP", threshold=th)
# diagnostic plots
plot(pot_mle)
rl_mle <- return.level(pot_mle, conf = 0.05, return.period= c(2,5,10,20,50,100))
# L-moments estimation
pot_lmom <- fevd(as.vector(precipitation_xts), method = "Lmoments", type="GP", threshold=th)
# diagnostic plots
plot(pot_lmom)
rl_lmom <- return.level(pot_lmom, conf = 0.05, return.period= c(2,5,10,20,50,100))
# return level plots
par(mfcol=c(1,2))
# return level plot w/ MLE
plot(pot_mle, type="rl",
     main="Return Level Plot for Oberwang w/ MLE",
     ylim=c(0,200), pch=16)
loc <- as.numeric(return.level(pot_mle, conf = 0.05, return.period=100))
segments(100, 0, 100, loc, col='midnightblue', lty=6)
segments(0.01, loc, 100, loc, col='midnightblue', lty=6)
# return level plot w/ LMOM
plot(pot_lmom, type="rl",
     main="Return Level Plot for Oberwang w/ L-Moments",
     ylim=c(0,200))
loc <- as.numeric(return.level(pot_lmom, conf = 0.05, return.period=100))
segments(100, 0, 100, loc, col='midnightblue', lty=6)
segments(0.01, loc, 100, loc, col='midnightblue', lty=6)
# comparison of return levels
results <- t(data.frame(mle=as.numeric(rl_mle),
                        lmom=as.numeric(rl_lmom)))
colnames(results) <- c(2,5,10,20,50,100)
round(results,1)
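Since the question asked specifically for a Bayesian fit: if I remember the extRemes interface correctly, fevd() also accepts method = "Bayesian", which fits the same GPD by MCMC. A hedged sketch reusing the objects defined above (check the package documentation for the prior and proposal settings before relying on it):
pot_bayes <- fevd(as.vector(precipitation_xts), method = "Bayesian", type = "GP", threshold = th)
plot(pot_bayes)
return.level(pot_bayes, return.period = c(2, 5, 10, 20, 50, 100))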
  • asked a question related to Bayesian Inference
Question
13 answers
In general terms, Bayesian estimation provides better results than MLE. Is there any situation where maximum likelihood estimation (MLE) gives better results than Bayesian estimation methods?
Relevant answer
Answer
I think that your answer may vary depending on what you consider better results. In your case, I will assume that you are referring to better results in terms of smaller bias and mean squared error. As stated above, if you have poor knowledge and assume a prior that is very far from the true value, the MLE may return better results. In terms of bias, if you work hard you can remove the bias of the MLE using formal rules and you will get better results in terms of bias and MSE. But if you would like to look at it as point estimation, the MLE can be seen as the MAP estimate when you assume a uniform prior.
On the other hand, the question is much more profound in terms of treating your parameter as a random variable and including uncertainty in your inference. This kind of approach may assist you during the construction of the model, especially if you have a complex structure, for instance, hierarchical models (with many levels) are handled much easier under the Bayesian approach.
  • asked a question related to Bayesian Inference
Question
2 answers
I'm building a model that estimates the effect of a firm level variable (x) on its financial performance (y). It is a cross-classified three-level model with yearly measures of a firm in level-1 and firms in level-2. Firms are then cross-classified by countries and industries in level 3. To account for the contextual factors that moderate the x->y relationship, the model includes terms that interact (x) with moderators at firm level and industry/country level.
I'm able to estimate the model using frequentist HLM via LME4 on R and bayesian HLM via RStan.
Now, I'm interested in ranking the firms based on the magnitude of the effect that variable (x) has on financial performance (y). While some literature I came across ranked the individual effects using median values of the random effects, posterior means, or empirical Bayes estimates, the models in these papers did not include any interaction terms.
I would appreciate any thoughts on ways in which such ranking can be done, while including the effect of various firm level and cross-level interaction terms.
Thank you!
Relevant answer
Answer
Please have a look at this article on multilevel analysis
  • asked a question related to Bayesian Inference
Question
4 answers
Null Hypothesis Significance Testing (NHST), in which the P-value serves as the index of "statistical significance," is the most widely used [misinterpreted and abused] statistical method in psychology (Sterling et al., 1995; Cumming et al., 2007).
Researchers overstating conclusions beyond what the data would support, and carefully "fixing" results until they are statistically significant, is quite common! However, a good reviewer (experienced in the same field/subject) would notice that! Why? Because of the increasing bait-like activities of predatory journals and a widespread lack of methodological sophistication, with researchers using poorly designed experiments with small sample sizes and inappropriate statistical models (Gelman and Carlin, 2014).
In the Neyman-Pearson framework, optimally setting α and β (which may not be possible in some instances) assures long-term decision-making efficiency in light of the costs and benefits of committing Type I and Type II errors... this is the frequentist approach. Would a Bayesian approach make a difference where trial or study repetition is limited/not possible? Perhaps an a priori calculation and report of the probability of replication could complement NHST!
Relevant answer
Answer
First, p-values have nothing to do with the design of studies but lots to do with interpretation. There are, as you point out, already other possible ways to look at things, many involving Bayesian ideas, and many in the social sciences. As a statistics professor I have not seen an increased demand for such studies; this makes me think that the people abusing p-values would probably abuse Bayesian ideas too. We must, IMO, get away from the idea that there is some magic thing that makes a study publishable, which is what drives p-hacking in my view. I suppose people like that are not interested in good science but rather in the body count of papers. In any case, education is part of the answer. BTW, greater power is achieved by finding better methods, which would perhaps make power obsolete. In any case, as you might have expected, I will call for increased statistical research and education. I don't, however, expect the people who are abusing p-values to have much interest. Thus better education in both methods and ethics may be the best hope for future researchers. Best wishes, D. Booth
  • asked a question related to Bayesian Inference
Question
5 answers
I wonder if it is possible to infer the origin of invasive lineages based on genetic data for many native and invasive populations. I first planned to use DIYABC v2.1.0, but I am not used to Genepop file format and information on input formatting of DNA sequence is scarce. Does it work with >300bp loci? Is triploidy a problem? Is there an easy way to convert FASTA data into genepop data? (Knowing that widgetcon does not handle DNA data for genepop format) Is DIYABC still used (last release in 2015) or are there better options for now? Thanks for any answer.
Relevant answer
Answer
Great, I hope everything goes well with snp-sites. If you can obtain the .vcf file, take a look at both of these resources; they may help you to analyze your data in terms of structure: https://grunwaldlab.github.io/Population_Genetics_in_R/TOC.html and https://popgen.nescent.org/index.html From my experience, after having a .vcf, and with the help of PGDSpider, you can go for many different analyses. Good luck, and sure, tell me how everything is going.
  • asked a question related to Bayesian Inference
Question
6 answers
I have a dataset of 150 samples of 60 binary variables. The pcalg binCItest results are not impressive and I cannot find a straightforward way to do this using bnlearn or catnet. I would appreciate it if you could let me know your suggestions, especially if accompanied by a code snippet. Thanks
Relevant answer
Answer
The bnlearn package in R should do the job. Maybe you need to change how your binary data are represented, for instance by coding the variables as yes/no factors instead of 0/1 numerics, or something like that. I have never experienced any issues using the bnlearn package for binary data.
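A minimal, hedged sketch of what that can look like with bnlearn (the 150 x 60 data frame here is simulated noise as a stand-in, and all variable names are made up):
library(bnlearn)
set.seed(1)
# binary variables must be factors (e.g. "no"/"yes"), not 0/1 numerics
dat <- as.data.frame(matrix(sample(c("no", "yes"), 150 * 60, replace = TRUE), nrow = 150))
dat[] <- lapply(dat, factor)
names(dat) <- paste0("V", 1:60)
dag <- hc(dat, score = "bde")              # score-based structure learning (Bayesian Dirichlet score)
fit <- bn.fit(dag, dat, method = "bayes")  # parameter estimation with Dirichlet priors
With only 150 samples and 60 variables, any algorithm will recover few edges reliably, so it is worth comparing several scores/algorithms and bootstrapping the structure (boot.strength) rather than trusting a single learned network.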
  • asked a question related to Bayesian Inference
Question
4 answers
I am completing a Bayesian Linear Regression in JASP in which I am trying to see whether two key variables (IVs) predict mean accuracy on a task (DV).
When I complete the analysis, for Variable 1 there is a BFinclusion value of 20.802, and for Variable 2 there is a BFinclusion value of 1.271. Given that BFinclusion values quantify the change from prior inclusion odds to posterior inclusion odds and can be interpreted as the evidence in the data for including a predictor in the model, can I directly compare the BFinclusion values for each variable?
For instance, can I say that Variable 1 is approximately 16 times more likely to be included in a model to predict accuracy than Variable 2? (Because 20.802 divided by 1.271 is 16.367 and therefore the inclusion odds for Variable one are approximately 16 times higher).
Thank you in advance for any responses, I really appreciate your time!
Relevant answer
Answer
If you have performed the analyses separately, it may be an indirect inference that can be reported, although on this evidence the second predictor does not have any appreciable effect. If it were meaningful, and you had performed the regression including both predictors, you could point to a possible mediating effect of the first predictor.
  • asked a question related to Bayesian Inference
Question
1 answer
Hi,
I have a standard SEIR model and would like to run a simple Bayesian MCMC (Metropolis-Hastings) inference on COVID data. How do you do this on R?
Many thanks!
Relevant answer
Answer
Have a look at the package 'EpiILM' on CRAN.
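For a hand-rolled alternative, a minimal Metropolis-Hastings sketch is below (R). Everything here is hypothetical: the SEIR is solved with deSolve, daily case counts are assumed Poisson around the model's incidence, and only the transmission rate beta is sampled, with sigma, gamma and the population size N fixed for simplicity.
library(deSolve)
seir <- function(t, y, p) {
  with(as.list(c(y, p)), {
    dS <- -beta * S * I / N
    dE <-  beta * S * I / N - sigma * E
    dI <-  sigma * E - gamma * I
    dR <-  gamma * I
    list(c(dS, dE, dI, dR))
  })
}
# log-posterior for beta; cases = observed daily counts, init = named state vector c(S=, E=, I=, R=)
log_post <- function(beta, cases, init, times, N, sigma, gamma) {
  if (beta <= 0) return(-Inf)
  out <- as.data.frame(ode(init, times, seir, c(beta = beta, sigma = sigma, gamma = gamma, N = N)))
  inc <- pmax(sigma * out$E, 1e-8)                                  # model incidence (new cases/day)
  sum(dpois(cases, lambda = inc, log = TRUE)) + dgamma(beta, 2, 2, log = TRUE)  # vague prior on beta
}
# random-walk Metropolis-Hastings for beta
mh <- function(n_iter, beta0, ...) {
  beta <- numeric(n_iter); beta[1] <- beta0
  lp <- log_post(beta0, ...)
  for (i in 2:n_iter) {
    prop    <- beta[i - 1] + rnorm(1, 0, 0.02)
    lp_prop <- log_post(prop, ...)
    if (log(runif(1)) < lp_prop - lp) { beta[i] <- prop; lp <- lp_prop } else beta[i] <- beta[i - 1]
  }
  beta
}
Extending this to sample sigma and gamma as well, or switching to a ready-made sampler (e.g. the pomp or rstan packages), follows the same pattern.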
  • asked a question related to Bayesian Inference
Question
3 answers
When using the Bayesian inference approach to identify the posterior distribution of the parameters, should we randomly sample each of the parameters from its corresponding marginal posterior distribution to get the variance of the prediction, or should we just sample points from the converged trace (i.e., combinations of well-fitted parameter values)?
There could be cases where different combinations of parameters give a good fit during the fitting process due to correlation. I am wondering whether taking random samples from the trace will be able to capture the parameter uncertainties.
Thank you.
Relevant answer
Answer
If you have a sample from the marginal or the conditional distribution with a closed-form expression, you can compute the posterior estimates, such as the posterior mean and variance, directly. If you are using an approach such as the Metropolis-Hastings algorithm, you have to check the convergence of your chains before computing your posterior estimates. Just remember that if you want to make predictions, you have to sample from the posterior predictive distribution.
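On the practical side of the question: draw whole rows (joint draws) from the converged part of the trace, not each parameter from its own marginal, so that the correlations between parameters are preserved. A minimal sketch, assuming a draws matrix with one row per joint posterior sample and a user-supplied prediction function (both names are placeholders):
# draws: S x P matrix of post-burn-in samples; predict_fn(theta, newdata): vector of model predictions
idx   <- sample(nrow(draws), 1000, replace = TRUE)                # resample joint draws
preds <- sapply(idx, function(i) predict_fn(draws[i, ], newdata)) # one prediction vector per draw
apply(preds, 1, quantile, probs = c(0.05, 0.5, 0.95))             # pointwise predictive bands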
  • asked a question related to Bayesian Inference
Question
44 answers
Assume that a measurement gives n observations y1, y2, ..., yn. The data are drawn from a normal distribution: Y ~ N(μ, σ^2), and the prior distribution is μ ~ N(y_prior, σ_prior^2). If σ^2 is known, the posterior mean is the weighted mean of the sample mean ȳ and the prior mean y_prior. This is the standard solution that can be found in many textbooks or lecture notes. When σ^2 is unknown, I expect to have a similar solution, i.e. simply replace σ^2 with its estimator s^2 in the posterior mean. However, I cannot find this solution anywhere. My question is: "does this solution exist?" If the answer is yes, where can I find a reference? If the answer is no, why?
Relevant answer
Answer
Let me point at the core difference between the "usual" Bayes' and the Jeffreys' proposal.
1. usual Bayes uses prior info in the form of a FINITE weight, reducible without loss of generality to a probability distribution, called then a prior pd.
2. Jeffreys' algorithm - also possible to be named Bayesian - uses the prior info in a form of INFINITE measure, which is a product of
2.1. (obviously finite) normal pd for the mean and
2.2. (obviously infinite) Lebesgue measure on the positive reals R+ = (0, oo).
3. In the case under consideration the resulting a posteriori info is a finite weight, and it can be used for further needs in the "usual" Bayes' approach as a prior probability distribution.
Hopefully, this explains the acceptability of Jeffreys' formulas when there is NO INFO A PRIORI ABOUT THE SIGMA.
PS1. One can use any other algorithm, since statistics is based on our "impression" that we have about the existence of some probabilities. In every case the results of the statistical inference are subject to a CONVENTION BETWEEN USERS; the mathematical derivation of algorithms is only a tool to transform principles into concrete formulas. But the principles (e.g. maximum likelihood) are our free choice, supported by some elegant consistency with our image.
PS2. Moreover, in every case when applying probabilities to a single event, we are completely lost wondering why things do not happen according to our expectations. The answer is: for a single event no probability calculus can help predict the result.
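For reference, the standard textbook result behind this discussion: with the Jeffreys-type prior p(\mu, \sigma^2) \propto 1/\sigma^2, the marginal posterior of the mean is a Student-t distribution,
\frac{\mu - \bar{y}}{s/\sqrt{n}} \,\Big|\, y \;\sim\; t_{n-1},
so "replace \sigma^2 by s^2 in the known-variance formula" is recovered only approximately, for large n. With a proper conjugate (normal-inverse-gamma) prior, the marginal posterior of \mu is again Student-t, with a posterior mean that is a weighted average of the prior mean and \bar{y}; the simple plug-in of s^2 is not the exact answer, which is presumably why it does not appear in textbooks.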
  • asked a question related to Bayesian Inference
Question
2 answers
I am trying to develop a multivariate Poisson-lognormal CAR model. My model has 3 dependent variables, 10 explanatory variables, and structured and unstructured random effects. One of my most important variables in the model (traffic volume) is getting a negative sign, which should be positive. When I develop a simple negative binomial model (with all variables), the sign of traffic volume is positive. Thinking it might be because of the influence of another variable, I tried to develop the MVCAR model with only traffic volume and the structured and unstructured random effects, but the sign is still negative. I cannot remove this variable as it is the most important one.
I should note that the variance of the heterogeneous effects for one of my dependent variables is extremely large (around 5000).
Can anyone tell me what might be the issue?
Thanks in advance
Relevant answer
Answer
The sign of a relationship can change when you put in random slopes. In standard regression the fixed slope is the general slope, but when random slopes are used, the associated estimate of the fixed slope is the precision-weighted slope averaged across all the allowed-to-vary slopes. See
  • asked a question related to Bayesian Inference
Question
2 answers
I'm not just interested in priors (probability) of unwanted outcomes and self generating hypothesis, I'm also interested in the level of relevance (values) of certain stimuli.
A paper that can explain some of my interests is: "Predictive Processing and the Varieties of Psychological Trauma" (2017) by Wilkinson, Dodgson & Meares.
Relevant answer
I suggest you search for the main papers in this area and then contact the authors or co-authors to start a conversation.
  • asked a question related to Bayesian Inference
Question
7 answers
Which of the two criteria is more appropriate for selecting the model of nucleotide substitution?
Relevant answer
Answer
Relics of Phylowar II can be found in this thread.
  • asked a question related to Bayesian Inference
Question
5 answers
Dear Researchers/Scholars,
Suppose we have time series variables X1, X2 and Y1, where Y1 is dependent on the other two. They are more or less linearly related. Data for all these variables are given from 1970 to 2018. We have to forecast values of Y1 for 2040 or 2060 based on these two variables.
What method would you like to suggest (other than a linear regression)?
We have the fact that these series have had a different pattern since 1990. I want to use the 1990-2018 data as prior information and then find a posterior for Y1. Now, please let me know how to assess this prior distribution.
or any suggestions?
Best Regards,
Abhay
Relevant answer
Answer
Let me play the devil's advocate:
You have data for the past 50 years. However, you say that there is a major break or change in the pattern around 1990, so that you want to use only the more recent 30 years ... to predict what will be in 30 or 50 years in the future?
I doubt that this makes any sense. Toss some dice. It will be as reliable as your model predictions.
If "phase changes" like the one around 1990 can happen, they can happen in the future, too. Additionally, many other things can happen that we are not even aware of today. The uncertainty about such things must be considered. Further, as you don't have any model that might be justified by subject-matter arguments, there is a universe of possibilities, again adding to the uncertainty of the prediction. If you consider all this, you will almost surely find that the prediction interval 30 or 50 years ahead will be so wide that it can't have any practical benefit.
You can surely grab one possible model, select some subset of your data, and neglect everything else; then you can make a possibly sufficiently precise forecast, which applies to this model fitted on this data, assuming that nothing else happens or can impact the dependent variable. Nice. But typically practically useless. It's a bit different when you have a model based on a theory. Then you could at least say that this theory would predict this and that. But if you select a model just because the data look like they fit, you actually have nothing.
It's important to think about all this before you invest a lot of work and time in such projects! It may turn out, in the end, that your approach is still good and helpful. But many such "data-driven forecast models" I have seen in my life have been completely worthless, pure waste. Good enough to give a useful forecast for the next 2-3 years, but not for decades.
  • asked a question related to Bayesian Inference
Question
3 answers
Hello,
I have identified clades with fair support values (BI > 0.7) in my backbone tree. I would like to reinforce these supports, so I rebuilt ingroup trees for each of the clades using their respective sister and basal groups as outgroups, and using more markers. The ingroup tree contains only a subset of individuals of the clade due to sample and sequence availability, but it does encompass all of the subclades. Suppose the ingroup and the backbone trees share the same topology; is it reasonable for me to substitute the backbone branch support with that of the ingroup (with an annotation noting that the branch support is reinforced by an ingroup tree)?
Thank you very much in advance!
Relevant answer
Answer
@LaHim, I think that approach makes sense overall. I would also suggest including the actual support values on the summary tree that are over some threshold (e.g. 60%). With the subtrees, it sounds like there is a trade-off between including additional sequence data vs. not including some taxa within the presumed clades because data are not available. There's no single right answer to how to make that decision, but I would avoid simply picking the combination that gives you the highest support values.
  • asked a question related to Bayesian Inference
Question
30 answers
I've been thinking about this topic for a while. I admit that I still need to do more work to fully understand the implications of the problem, but it seems that under certain conditions, Bayesian inference may have pathological results. Does it matter for science? Can we just avoid theories that generate those pathologies? If not, what can we do about it?
I have provided a more detailed commentary here:
Relevant answer
Answer
My practical experience suggests that different priors do not usually make much difference unless there is very little information in the data. Indeed, different results could be seen as a feature and not a problem, so that you could do a sensitivity analysis with different priors. There is a nice example of this in Clayton and Hills' Statistical Methods in Epidemiology.
  • asked a question related to Bayesian Inference
Question
1 answer
Hello,
I seem to be having issues with convergence in my Bayesian analysis. I'm using a single-gene large dataset of 418 individuals. My PSRF values say N/A in my output but my split frequency is 0.007. Also, my consensus tree gives me posterior probabilities of 0.5 or 1 with no distinguishable clades (see attached). Below is my Bayes block:
begin mrbayes;
charset F_1 = 1 - 655\3;
charset F_2 = 2 - 656\3;
charset F_3 = 3 - 657\3;
partition currentPartition = 3: F_1, F_2, F_3;
set partition = currentPartition;
lset applyto=(1) nst=6 rates=gamma;
lset applyto=(2) nst=2 rates=invgamma;
lset applyto=(3) nst=6 rates=gamma;
unlink statefreq=(all) revmat=(all) shape=(all) pinvar=(all);
prset applyto=(all) ratepr=variable;
mcmc ngen= 24000000 append=yes samplefreq=1000 nchains=8;
sump burnin = 10000;
sumt burnin = 10000;
end;
Any advice? Thanks!
Relevant answer
Answer
You have a fairly large dataset, so I would try more generations. On the other hand, I would check a model test for the 1st and 2nd codon positions, as they often have the same model when tested (I don't have much experience, but that is what I have seen).
I am not sure you need to unlink the 3 partitions and then set the priors as variable. Have you tried removing:
unlink statefreq=(all) revmat=(all) shape=(all) pinvar=(all);
prset applyto=(all) ratepr=variable;
Finally, I don't see in the code where you set the 2 independent runs (nruns = number of independent analyses with the same dataset and script), so I guess you run 8 chains but in a single run. So how come they don't converge? All the chains are dependent.
Try adding nruns=2 [the usual for Bayesian analyses]; each run has 4 chains by default, so there is no need to set that up.
mcmc ngen=24000000 append=yes samplefreq=1000 nruns=2;
  • asked a question related to Bayesian Inference
Question
5 answers
In the literature I found that we could use three different approaches (1. neighbor-joining trees, 2. maximum likelihood, and
3. maximum parsimony and Bayesian inference). I would like to know which method I could apply to my plasmids, knowing that they are assigned to the same IncL/M group.
Thank you
Relevant answer
Answer
No, neighbor-joining is not best, except for getting a rough first view of the quality of your data and a preview of what you might find with a better method such as maximum likelihood. There are so many advantages to likelihood. Salmi, please cite some reasons why you think NJ is best. See my earlier response.
  • asked a question related to Bayesian Inference
Question
5 answers
In Bayesian inference, the likelihood function may be unavailable or intractable, so the posterior distribution cannot be obtained in some cases. In these cases, is there any alternative approach (apart from MCMC-type methods) to the likelihood function for Bayesian inference?
Relevant answer
Answer
Dear brother Çağatay Çetinkaya, you can use approximate Bayesian methods such as the Integrated Nested Laplace Approximation developed by Rue, H., Martino, S. and Chopin, N. (2009). An R package named "INLA" is available, and you will find a lot about it at http://www.r-inla.org
  • asked a question related to Bayesian Inference
Question
5 answers
Hello,
I am working with an outcome variable that follows a count (Poisson) distribution.
I have 3 IVs that follow a normal distribution and 1 DV that follows a count distribution. Thus, I'd like to compute a negative binomial regression.
Yet, instead of maximum likelihood estimation, I would like to use a Bayesian inference approach to estimate my negative binomial regression model.
But I really cannot manage (yet) to understand how to compute a Bayesian negative binomial regression in R.
I would be really delighted and grateful if someone could provide any help in this regard,
Thank you!
Sincerely,
Nicolas
Relevant answer
Answer
First, note that the distribution of IVs does not matter in regression models.
The brms package in R provides Bayesian negative binomial regression. The command for a full model would be:
brm(DV ~ IV1 * IV2, family = "negbinomial", data = YourData)
You can extract and interpret the results in much the same way as Poisson regression, which I describe in chapter 7.4 of my book draft:
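A hedged sketch of the extraction step mentioned above (YourData, DV, IV1 and IV2 are placeholder names, as in the command):
library(brms)
fit <- brm(DV ~ IV1 * IV2, family = "negbinomial", data = YourData)
summary(fit)                                    # posterior means, credible intervals, shape (overdispersion) parameter
exp(fixef(fit)[, c("Estimate", "Q2.5", "Q97.5")])  # exponentiated coefficients read as rate ratios, as in Poisson regression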
  • asked a question related to Bayesian Inference
Question
7 answers
Dear Researchers,
I have a single point for my parameter as prior information and 26 data points as the current dataset.
How can I incorporate that point (a single prior value) while doing a Bayesian analysis?
(Initially, I used to run a model with a non-informative prior, without considering the old info, as it wasn't valid.)
In this particular case, I want to know: is there any way to include this old evidence (the single prior point)? If yes, how can I do it? Which way should I select, and why?
Best Regards,
Abhay
Relevant answer
Answer
If I understand you correctly, you want to concentrate the density of your prior distribution for that parameter on a single point. In other words, the prior will be a point-mass distribution. Then your posterior will also be a point-mass distribution and concentrate all density on that point. I highly doubt that is what you really want, since your data will not influence the posterior at all. If you are really confident, put an informative prior on that parameter that still allows some uncertainty.
  • asked a question related to Bayesian Inference
Question
4 answers
Suppose we have n samples (x_1, x_2, …, x_n) independently taken from a normal distribution with known variance σ^2 and unknown mean μ.
Considering non-informative prior distributions, the posterior distribution of the mean p(μ|D) follows a normal distribution with mean μ_n and variance σ_n^2, where μ_n is the sample mean of the n samples (i.e., μ_n = (x_1 + x_2 + … + x_n)/n), σ_n^2 is σ^2/n, and D = {x_1, x_2, …, x_n} (i.e., p(μ|D) ~ N(μ_n, σ^2/n)).
Let the new data D' be {x_1, x_2, …, x_n, x_1^new, x_2^new, …, x_k^new}. That is, we take an additional k (k < n) samples independently from the original distribution N(μ, σ^2). However, before taking the additional samples, we can know the posterior predictive distribution for an additional sample. According to Bayesian statistics, the posterior predictive distribution p(x^new|D) follows a normal distribution with mean μ_n and variance σ_n^2 + σ^2 (i.e., p(x^new|D) ~ N(μ_n, σ^2/n + σ^2)). Namely, the variance becomes higher to reflect the uncertainty of μ. So far, this is what I know.
My question is: if we know p(x^new|D) for the additional samples, can we predict the posterior distribution p(μ|D') before taking the additional k samples? I think that p(μ|D') can seemingly be calculated based on p(x^new|D), but I have not gotten the answer yet. So, I need help. Please lend me your wisdom. Thanks in advance.
Relevant answer
Answer
I am not sure that I am right, but let's add something to the discussion.
As the likelihood is the joint probability (product of pdfs) of the sample, and the prior/posterior predictive is the probability function of x^new, to me the likelihood function of x and x^new is the product of the likelihood of x and the prior/posterior predictive of x^new. Once you have the likelihood function you can derive the posterior p(μ|D').
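Written out, the factorisation described above is just sequential Bayesian updating:
p(\mu \mid D') \;\propto\; p(x_1^{new},\dots,x_k^{new} \mid \mu)\, p(\mu \mid D),
\qquad
p(x^{new} \mid D) \;=\; \int p(x^{new} \mid \mu)\, p(\mu \mid D)\, d\mu .
So the posterior given D' can only be written down once the new samples are observed; before that, the posterior predictive distribution describes what those samples (and hence the updated posterior) are expected to look like.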
  • asked a question related to Bayesian Inference
Question
38 answers
The question stated above is the title of a book review (see http://www.journals.uchicago.edu/doi/pdfplus/10.1086/694936 ). I thought it would be interesting to read both opinions about the book reviewed ("The Future of Phylogenetic Systematics: The Legacy of Willi Hennig") and colleagues' own answers to the question: "Does the Future of Systematics Really Rest on the Legacy of One Mid-20th-Century German Entomologist?"
Relevant answer
Answer
The question posed by David Baum in the title of his book review, "Does the future of systematics really rest on the legacy of one mid-20th-century German entomologist?", is a deliberate distortion of our book's title, "The future of phylogenetic systematics: the legacy of Willi Hennig." The book is composed of contributions from a Linnean Society symposium that celebrated Willi Hennig's 100th birthday, and it was the organizers' intent to mark his legacy with a book that both looked back at Hennig's historical impact on phylogenetics and tried to assess how his contributions are still relevant today. All the plesion discussion above is a distraction that has nothing to do with either our book or with Baum's review.
Does the future of evolutionary biology rely on the contributions of one 19th Century English biologist? Does the future of physics rely on the contributions of a 17th-Century English physicist/astronomer? Probably not. But that does not mean that we do not honor the contributions of Darwin or Newton or deny that subsequent work "rests on their legacies." As Newton himself said, "if I see further, it is because I stand on the shoulders of giants."
The fact that people are still arguing about whether or not classifications ought to be based on monophyletic (in the Hennigian sense) groups is a pretty clear indication that Hennig's ideas remain pertinent in the 21st Century.
  • asked a question related to Bayesian Inference
Question
4 answers
Dear all,
Here is my question:
In Bayesian inference, we can always iterate the process of transforming prior probabilities into posterior probability when new data is collected. I wonder if the frequency of data collection can affect the final result. For example, if I want to decide whether a coin is a fair one, I can either 1) flip the coin 10 times and get 4 heads, and then flip it for another 10 times and get 3 more heads; or 2) mark the same observation as flipping the coin 20 times and getting 7 heads. In the first case, I will do Bayesian inference twice, while in the second one, I only need to infer once. Would such a difference in the frequency of data collection affect the final result?
Thanks for your help!
Relevant answer
Answer
Many thanks to Luca, Jochen, and Christopher. You three have made it quite easy to understand!
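For later readers, the coin example can be worked out directly to see why the two bookkeeping schemes agree (assuming a Beta(a, b) prior and a binomial likelihood):
\mathrm{Beta}(a,b) \xrightarrow{4/10} \mathrm{Beta}(a+4,\,b+6) \xrightarrow{3/10} \mathrm{Beta}(a+7,\,b+13)
\quad\text{versus}\quad
\mathrm{Beta}(a,b) \xrightarrow{7/20} \mathrm{Beta}(a+7,\,b+13).
Either way the posterior is Beta(a+7, b+13), so updating twice in batches gives the same result as a single update on the pooled data, provided the flips are independent.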
  • asked a question related to Bayesian Inference
Question
2 answers
I am currently doing a phylogenetic analysis for a species of virus with many subtypes (human papillomavirus), for around 100 whole-genome specimens and 50 partial sequences belonging to ~6 subtypes. Is there a specific number of maximum likelihood / Bayesian inference trees I need to infer for the phylogenetic study to be accurate?
At the moment, for ML, I aligned the sequences with ClustalO and am using RAxML with 25 ML trees and the autoMRE stopping criterion. But I have no idea if 25 ML trees are enough for the analysis.
Relevant answer
Answer
Hi Manuel,
Concerning Bayesian analyses there are some parameters that can be checked to see if the analyses have been run for enough time. You can check the “average standard deviation of split frequencies” (if you run MrBayes for example). You can check it in the program screen log while the analyses are running. Usually values below 0.01 are considered acceptable.
Which program are you using?
Alternatively, you can check the output files in Tracer. If you have values of ESS above 200 for the different parameters, the analyses are ok.
I hope it helps,
Salvatore
  • asked a question related to Bayesian Inference
Question
4 answers
Using R's brms package, I've run two Bayesian analyses, one with "power" as a continuous predictor (the 'null' model) and one with power + condition + condition x power. The WAICs for the two models are nearly identical: -.017 difference. This suggests that there are no condition differences.
But, when I examine the credibility intervals of the condition main effect and the interaction, neither one includes zero: [-0.11, -0.03 ] and [0.05, 0.19]. Further complicating matters, when I use the "hypothesis" command in brms to test if each is zero, the evidence ratios (BFs) are .265 and .798 (indicating evidence in favor of the null, right?) but the test tells me that the expected value of zero is outside the range. I don't understand!
I have the same models tested on a different data set with a different condition manipulation, and again the WAICs are very similar, the CIs don't include zero, but now the evidence ratios are 4.38 and 4.84.
I am very confused. The WAICs for both models indicate no effect of condition but the CIs don't include zero. Furthermore, the BFs indicate a result consistent with (WAIC) no effect in the first experiment but not for the second experiment.
My guess is that this has something to do with my specification of the prior, but I would have thought that all three metrics would be affected similarly by my specification of the prior. Any ideas?
Relevant answer
As Gelman says in the BDA book, model selection is still an ongoing subject of research. I would trust the posterior intervals more, given that they are grounded in probability axioms.
If the answer is useful, please recommend for others,
  • asked a question related to Bayesian Inference
Question
4 answers
Hi,
Recently I was reading a paper by Will G. Hopkins about magnitude-based inferences and downloaded his related spreadsheet. This concept is new to me and still leaves me with a lot of doubts. Indeed, I did not understand what he means by "value of the effect statistic" and what he considers beneficial (+ive) or harmful (-ive). Can someone help me?
Thanks in advance
Relevant answer
Answer
Magnitude-based inference (MBI) is an interpretation that clinicians need in their evidence-based approach. Due to clinical heterogeneity and small sample sizes, clinicians' decision-making about a treatment is not based only on statistical significance, but on an overall interpretation of effects, including the benefits and harms of a treatment and the confidence interval (range of uncertainty).
  • asked a question related to Bayesian Inference
Question
5 answers
I have some well-grounded knowledge of Bayesian inference, linear mixed models, and probabilistic graphical models. Image processing is a new learning topic for me.
Relevant answer
Answer
I like this one: http://szeliski.org/Book/
However, if you are more interested in low-level image processing, I would suggest
  • asked a question related to Bayesian Inference
Question
10 answers
I am aware that the consistency index uses the number of changes in a matrix, but I haven't found a way to build the matrix nor to calculate this index in any software.
Relevant answer
Answer
MESQUITE is a phylogenetic analysis package (as are MEGA, PHYLIP, PAUP, DAMBE and a few others). A package such as MESQUITE has many advantages, such as having built-in functions for consistency index scoring, but a disadvantage is that you need to learn how to use the package, such as importing your data file, importing your tree file, and running the job you want. It can sometimes be frustrating to figure out exactly what data format is needed for each type of data.
Anyway, MESQUITE does have a consistency index module built in.  I do not find this built in to DAMBE or MEGA. 
Before you go to a lot of trouble calculating the consistency index value for your data and tree, I think you should find out if you will gain any useful information from this value.  Do you know what a "good" value should be for your type of data, for example?  The Consistency Index can be very useful for morphological character data sets in some organisms where morphology evolves nicely.  For DNA and amino acid sequence data the consistency index usually does not give us much information about the quality of the data or the tree.
  • asked a question related to Bayesian Inference
Question
1 answer
I've assessed the affect of a population in a field study from a dimensional perspective using the Self-Assessment Manikin. Would it be possible to calculate or infer anxiety from arousal, valence, and dominance SAM scores?
  • asked a question related to Bayesian Inference
Question
3 answers
I want to calculate the BIC value but I have a problem with it in LISREL. Does LISREL calculate the Bayesian Information Criterion (BIC)? How can I calculate the BIC value from the LISREL output?
Relevant answer
Answer
I haven't touched LISREL for maybe 6 years, but to my knowledge its output includes the number of parameters and the AIC (I am not sure if it also prints -2ln(L) or L in the output). You can use these two equations:
AIC = 2t - 2ln(L) and BIC = t·ln(N) - 2ln(L),
where t is the number of parameters.
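A tiny sketch of the conversion in R (all numbers are made up; t is the number of free parameters and N the sample size, both taken from your LISREL output and data):
aic <- 251.4                       # hypothetical AIC reported by LISREL
t   <- 12                          # hypothetical number of free parameters
N   <- 300                         # hypothetical sample size
bic <- aic - 2 * t + t * log(N)    # since BIC = AIC - 2t + t*ln(N)
bic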
  • asked a question related to Bayesian Inference
Question
11 answers
AIC and BIC are information criteria used to assess model fit while penalizing the number of estimated parameters. As I understand it, when performing model selection, the model with the lowest AIC or BIC is preferred. In a situation I am working on, the model with the lowest AIC is not necessarily the one with the lowest BIC. Is there any reason to prefer one over the other?
Relevant answer
Answer
As Geoffrey pointed out, the BIC penalizes more heavily for complex models.  So the context certainly matters here.
  • asked a question related to Bayesian Inference
Question
3 answers
I was wondering if it makes sense to keep characters that are parsimony-uninformative (autapomorphies) when performing a phylogenetic analysis using Bayesian inference. Since it is a probabilistic method, it may be informative to keep these characters. Does that make sense?
Relevant answer
OK, I will keep them then. I think it is important to show if there is a "morphological long branch". Even in parsimony it is interesting if there is a branch full of autapomorphies, because it highlights that the taxon is very different from the others, and such taxa are usually "wildcards" too.
  • asked a question related to Bayesian Inference
Question
2 answers
For my research I am required to develop a computational model that incorporates various parameters (with various measurement units) to be the basis of a visual task classifier.
I read some papers where some authors used Bayesian inference and others used Markov models.
I understand the general concept of both techniques, but I would like to understand them in a better way.
Can anyone kindly suggest easy-to-understand resources that explain how such techniques are used in developing mathematical models and in classification?
Relevant answer
Answer
Hi! If you are not familiar with mathematical modeling, there are three resources I would recommend above all:
Bishop - Pattern recognition and machine learning (a real classic in Bayesian machine learning)
Hastie, Tibshirani, Friedman - The elements of statistical learning (a frequentist view on learning)
Murphy - Machine learning (a modern book with a lot of applications)
  • asked a question related to Bayesian Inference
Question
3 answers
Hi all,
I'm running the new version of MrBayes and I'm using MEGA to look at the resulting tree. I noticed that MrBayes automatically includes a Translate block in the .tre file, substituting the taxa labels with numbers. However, I'd much rather have the original taxa labels in my tree than these numbers. Does anybody know how to avoid this? This wasn't a problem with earlier versions of MrBayes.
Thanks!
Relevant answer
Answer
Hi Laura, 
The problem is not so much with MrBayes, but rather with MEGA as a tree viewer. MrBayes does include the Translate block to store all the sampled and consensus trees from mcmc run in a more efficient way. Most tree viewer programs (FigTree, TreeGraph, TreeView, Mesquite, etc.) have no problems reading tree files with such Translate blocks and show your OTUs with the original taxa labels; MEGA only reads the newick trees from your tree files but not the Translate block, so it shows your trees with numbered OTUs as they appear in the newick format. I've been using Mesquite and FigTree to visualise trees from the new MrBayes and had no problems with taxa labels (these programs also offer more options to edit, format, and analyse trees).
Cheers,
Oksana
  • asked a question related to Bayesian Inference
Question
3 answers
MCMC imputation
Relevant answer
Answer
Okay, then ideally, these variance components should be exactly what I have explained above:
"between variance" the variance among treatments, that is, how much of the variation in the response variable(s) is explained by your explanatory variable(s) ; and the "within variance" is the variance left unexplained (how much variation there is within treatments after removing the differences among them). The total variance should simply be the sum of within and between variances
Now I don't know SAS (I mostly use R) and I don't know exactly what method they use in this procedure. If your data are not balanced, the order of the explanatory variables in the model can change the variance components, or not, depending on the type of sum of squares used. So the interpretation of the variance components is conditional on the previous components ("how much variation does Y explain after accounting for X? how much variation does Z explain after accounting for X and Y?...)
If it is not clear for you, I recommend you read more about ANOVA (similar enough to ANCOVA but better documented), sum of squares and F-tests; and the SAS documentation.
Cheers,
Timohée
  • asked a question related to Bayesian Inference
Question
3 answers
Dear All,
I am using a mesh based Monte Carlo (MC) package (TIM-OS) and want to compare surface intensity with  analytical solutions (e.g. Farrell et al).
MC packages give fluence as an output parameter; is it the same as intensity?
How could it be converted to photons/mm2?
And finally, mesh-based packages give this parameter for each element or surface; how can it be compared to analytical solutions, which are fully spatially resolved?
Any ideas or insights are greatly appreciated.
Relevant answer
Answer
Compare the deviance of both; smaller deviance is better. Moreover, a smaller posterior SD will be a better approximation, which is often seen in simulations.
a a khan
  • asked a question related to Bayesian Inference
Question
9 answers
What is the most generalized and efficient way to express the underlying rules that govern a system? What is the most efficient route to discovering such rules?
Relevant answer
Answer
Hello,
Thoroughly designed experiments where preferably only independent variables are being manipulated. 
Regards, Witold
  • asked a question related to Bayesian Inference
Question
13 answers
In some cases, e.g. with the selection of certain priors, Bayesian methods will yield the same results as ML methods. In that case, why use Bayesian? What is the advantage of using Bayesian methods if in some cases we will get the same results as ML? Also, should both methods always end up with the same conclusions for the same data or for the same statistical technique used?
Relevant answer
Answer
Ideally Bayesian analysis should get the same results but the two search processes are quite different and the tree criteria are different.
Maximum likelihood analysis searches specifically for the globally optimal (highest likelihood) tree estimate. Bayesian analysis yields a consensus tree of all the sampled trees after the burnin period. Usually (if the priors are well chosen) the set of Bayes trees will include the ML tree but not always.
  • asked a question related to Bayesian Inference
Question
3 answers
Please, does anyone have some references regarding the influence of sampling on inference when using Bayesian statistics?
I am just beginning to use Bayesian methods and I am trying to better understand some results on personal data with very heterogeneous sample sizes.
Thanks in advance
Guillaume
Relevant answer
Answer
The paper by Siu and Kelly from 1998 for example explains this nicely in my opinion.
  • asked a question related to Bayesian Inference
Question
6 answers
In Bayesian inference we can model prior knowledge using a prior distribution. There is a lot of information available on how to construct flat or weakly informative priors, but I cannot find good examples of how to construct a prior from historical data.
How would I, for example, deal with the following situations:
1) A manager has to decide whether to stop or pursue the redesign of a website. A pilot study is inconclusive about the expected revenue. But the manager has had five projects in the past, with recorded revenue increases of factors 1.1, 1.2, 1.1, 1.3, and 1. How can one add the manager's optimism as a prior to the data from the pilot study?
2) An experiment (N = 20) is conducted where response time is measured in two conditions, A and B. The same experiment has been done in dozens of studies before, and all studies have reported their sample size and the average response time and standard deviation of A and B. Again: how to put that into priors?
I would be very thankful for any pointers to accessible material on the issue.
--Martin
Relevant answer
Answer
Straightforward: use the old data and some reasonable "flat" prior and calculate the posterior for that data. This will be the prior for your new data.
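To make this concrete for situation (1) in the question, a minimal conjugate normal-normal sketch in R (the pilot-study numbers are invented; only the five past revenue factors come from the question, and the known-variance assumption is purely for illustration):
past <- c(1.1, 1.2, 1.1, 1.3, 1.0)   # the manager's five previous projects
m0 <- mean(past)                     # prior mean for the uplift factor
s0 <- sd(past)                       # prior sd: how variable past projects were
# hypothetical pilot-study summary: n observations, mean uplift ybar, residual sd s
n <- 40; ybar <- 1.05; s <- 0.4
post_prec <- 1 / s0^2 + n / s^2                        # posterior precision
post_mean <- (m0 / s0^2 + n * ybar / s^2) / post_prec  # precision-weighted compromise
post_sd   <- sqrt(1 / post_prec)
c(post_mean, post_sd)
For situation (2), the same idea applies: summarise the published means and standard deviations into a normal prior on the A-B difference (for example via a random-effects meta-analysis) and use that as the prior for the new N = 20 experiment.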
  • asked a question related to Bayesian Inference
Question
4 answers
Please see my question that is attached about prior distribution.
Relevant answer
Answer
You might consider a hierarchical model, where you have n independent Dirichlet distributions with common parameter (like in your reply to Hamideh Sadat Fatemi Ghomi), and then put a prior on the common parameter.  This would typically require MCMC to estimate. But you could also use empirical Bayes to come up with a point estimate for the common prior and solve this model exactly.
  • asked a question related to Bayesian Inference
Question
5 answers
I have used spBayes (spGLM) etc. for binary data. Now I want an R package to do analysis of ordered spatial data using Bayesian inference via MCMC. GPS coordinates of the observations are included in the data. Suggestions of an appropriate package, code or examples of usage will be appreciated.
Thanks in advance.
Relevant answer
Answer
What about CARBayes?
  • asked a question related to Bayesian Inference
Question
7 answers
I am new to modern phylogenetic analysis, so there are many things I still don't understand. For instance, recently, morphology found a second wind in phylogenetic studies, with many analyses using discrete morphological characters on equal ground with sequences in Bayesian framework. However, most if not all the works I've read were using Mk model of evolution for such characters. At first glance, it is highly unrealistic - a supposedly complex phenotypic character basically behaves like a site, completely neutral and stochastic.
Nevertheless, the trees generated from purely morphological datasets by Bayesian analysis (MAP or consensus) are always extremely similar or even identical to the trees generated by maximum parsimony analysis (the shortest tree). Why? Is it some property of the parsimony method itself that the Mk model just mimics?
Also, could someone please explain to me, what's the biological meaning of "stationary distribution" for a phenotypic character in Bayesian framework?
Thank you.
Relevant answer
Answer
Hello,
The answer to your first question is here:
It's a complex and interesting subject!
  • asked a question related to Bayesian Inference
Question
4 answers
What is the raison d'être of uninformative priors? Isn't one of the Bayesian goals that of allowing to systematically incorporate prior information into inference?
Thanks!
Relevant answer
Answer
# If we want inference to be driven solely from data, why do we even bother specifying a prior?
Inference necessarily goes beyond the data, and information is a quantity or entity that exists only in the relation between data and a context (which is here the current "state of conviction" [you may call it "knowledge"], the assumed models, and the known circumstances of the experiment/study/survey/data collection).
So data alone is not and cannot be enough, or the only thing needed, to make an inference. One needs a model and a context to assess the information in the data (relevant to and related to the model and the context). The model can often be described in some formal way, like a statistical model, either being "obviously reasonable" or derived from "obviously reasonable" basic assumptions. The likelihood is a function that relates the data to the model. This still is not enough to make an inference. We still need a context to interpret this function (or likelihood ratios, or p-values from likelihood-ratio tests, etc.). An example can demonstrate this:
If I play lotto (the experiment), the event of winning the lottery has a very small probability under the hypothesis (model) that the result was just a wild guess. Winning would get a considerably higher probability if I had psychic powers so that I could foresee the next lotto numbers. Now I perform the experiment and I win. Wow :) The p-value of this result, P(win|guessing), is very small. But would I take this result to conclude that I have psychic powers? Certainly not. But this conclusion is not driven by the data but by the context, by all that we know (or believe) about how the world works. In this context the result remains a "lucky" incident, nothing else, because there is no other data that would be more simply or better explained by such an assumption, and there is such a lot of data that is much more likely if we do not assume that I had psychic powers.
This was quite an extreme example, because we are so sure that psychic powers do not exist (which only means that we would need a hell of a lot of good data obtained from adequate experiments to reconsider their existence). There are other examples where the influence of the prior knowledge (or belief) is not that drastic.
What is the sense of a so-called "uninformative prior"?
Priors are essentially arbitrary. There is no law that would prohibit two people (agents, social groups, scientific communities) from having considerably different priors. Several aspects of a prior can be attributed to (or be seen as consequences of) considering particular (prior) information. The above example uses an extremely "informative" prior because there is very much experience of people obviously lacking psychic powers, and because we have no idea how we would integrate the existence of psychic powers into the body of our other models. But what if we could not agree on how this information is to be weighted, in other words, if we could not agree on a common wager in a bet for/against the existence of psychic powers? A hoped-for solution is to eliminate all the impact of our former experiences and of the rest of our models, which would lead to a "non-informative prior".
A sensible non-informative prior is the uniform prior, which says that one considers all possible hypotheses as equally likely. It makes sense in the way that it expresses that we don't see any reason to prefer one of the hypotheses over any other (this is related to Laplace's principle of indifference). However, the hypothesis space may be transformed, and a uniform prior in one space is non-uniform in a transformed space. Here Jeffreys proposed priors that are invariant under transformations.*
# Isn't one of the Bayesian goals that of allowing to systematically incorporate prior information into inference?
No, you turned this around a bit. It is not to incorporate prior information into inference, it is to incorporate the current information into our beliefs (or knowledge). So we do have some beliefs before seeing the data, and the data exhibits a momentum to change our beliefs to a state after accounting for the new data. The data does not tell us where we *are*, it just tells us how far we have to move and in which direction. You can see data as being a "force" that acts on masses. Defining a force makes sense only in relation to masses. No masses, no forces. The data changes the impulses, but it does not determine what impulse any mass has to have. This depends on the impulse the mass had before the force was acting on it.
The Bayesian goal is to provide an objective and systematic way to apply the "force" to a given "impulse".
---
* You may imagine a binomial experiment to infer a proportion (p). You may argue that the uniform prior for p in [0;1] is an uninformative prior. You can also express the proportion as odds, odds = p/(1-p). The prior chosen above won't be uniform on the scale of the odds. Which one is correct? Jeffreys' prior for the binomial is invariant under this transformation, so the posterior obtained via Jeffreys' prior is convertible between the proportion and the odds. Hence, "non-informative" means that the posterior won't depend on the way the data is interpreted (as a proportion or as odds). I personally doubt that there is any "really objective" non-informative prior. That would be like attempting to describe the frequency of a wave without giving any time scale ("frequency" is an entity that exists only in conjunction with "time"; as soon as I remove "time" from my model it makes no sense, or it is impossible, to talk about "frequency").
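A small numeric illustration of that footnote in Python (the number of draws is arbitrary):
import numpy as np

rng = np.random.default_rng(0)
p = rng.beta(1.0, 1.0, size=500_000)     # draws from the uniform prior on p
odds = p / (1.0 - p)                     # the same draws expressed as odds

# Flat on p is strongly non-flat on the odds scale:
print("P(0 < odds < 1) :", np.mean(odds < 1.0))                      # 0.5
print("P(9 < odds < 10):", np.mean((odds > 9.0) & (odds < 10.0)))    # about 0.009

# Jeffreys' prior for the binomial, Beta(1/2, 1/2), piles extra mass near 0 and 1:
p_j = rng.beta(0.5, 0.5, size=500_000)
print("P(p < 0.1 | uniform prior) :", np.mean(p < 0.1))      # 0.10
print("P(p < 0.1 | Jeffreys prior):", np.mean(p_j < 0.1))    # about 0.20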
  • asked a question related to Bayesian Inference
Question
3 answers
The idea is to use the Bayesian approach to estimate the distributions of spaces between uniform and exponential order statistics. Are there any papers or articles on this subject?
Thanks so much.
Relevant answer
Answer
Check this link :
The friendliest book on Bayesian statistics I have read so far. Probably easier as a first read, while containing loads of information plus some code. You can also find some material on the website.
Good luck with that
  • asked a question related to Bayesian Inference
Question
22 answers
I checked the web and found no clear definition on how these various statistical methods differ from each other and how they are estimated. Could anyone elaborate a little bit?
Relevant answer
Answer
Hello Charles,
There are a lot of different methods for making a phylogeny. Below is an answer I had to another question asking about different methods for making trees. But in short, maximum likelihood and Bayesian methods are the two most robust and commonly used methods. Neighbor joining is just a clustering algorithm that clusters haplotypes based on genetic distance and is not often used for publication in recent literature.
"Neighbor joining and UPGMA are clustering algorithms that can make quick trees but are not the most reliable, especially when dealing with deeper divergence times. These methods are good to give you an idea about your data, but are almost never acceptable for publication.
Maximum parsimony and minimum evolution are methods that try to minimize branch lengths by either minimizing distance (minimum evolution) or minimizing the number of mutations (maximum parsimony). The major problem with these methods is that they fail to take into account many factors of sequence evolution (e.g. reversals, convergence, and homoplasy). Thus, the deeper the divergence times, the more likely these methods will lead to erroneous or poorly supported groupings.
Maximum likelihood and Bayesian methods can apply a model of sequence evolution and are ideal for building a phylogeny using sequence data. These are the two methods most often used in publications and many reviewers prefer them. The main downside of these methods is that they are computationally expensive. However, with today's computers this is not too much of a problem (except for some next-generation sequencing datasets).
For maximum likelihood, MEGA is not the fastest option out there. For example, if you have a hundred samples for 16S, RAxML will complete in a matter of minutes whereas MEGA will take hours. This becomes more of a problem the more data you include. MEGA also does not offer a Bayesian method, so if you choose to use a Bayesian method you will have to look elsewhere.
If the phylogeny is the main focus of your work, my suggestion is to make both maximum likelihood and Bayesian trees. For these methods you will need to choose a model of sequence evolution. The best way to do this is to use JModelTest, in which you simply input your alignment and it will tell you the best model for your data (you will have to do this if you use MEGA or other software to make your tree). From there you can run a maximum likelihood tree (I would use PhyML or RAxML) and a Bayesian tree (I would use Mr.Bayes or BEAST). 
If you do not want to download these programs or do not have access to a computer that you can dedicate to making trees I would suggest using the CIPRES online portal. 
This portal will give you access to fast computers that can run in parallel and has all of the software I have mentioned above in an easy-to-use point-and-click environment. This is a great resource and it is free (they just ask that you cite them).
Hope this helps and if you have any questions please ask.
Max"
What is the best phylogenetic tree construction method for bacterial identification? - ResearchGate. Available from: https://www.researchgate.net/post/What_is_the_best_phylogenetic_tree_construction_method_for_bacterial_identification [accessed Aug 24, 2015].
Which tree model (Maximum Likelihood or Neighbor Joining) is better to use with mitochondrial control region haplotypes? - ResearchGate. Available from: https://www.researchgate.net/post/Which_tree_model_Maximum_Likelihood_or_Neighbor_Joining_is_better_to_use_with_mitochondrial_control_region_haplotypes [accessed Oct 27, 2015].
  • asked a question related to Bayesian Inference
Question
3 answers
In a Bayesian evolutionary analysis, I have run my experiment for 100 million generations, but the ESS is still below one hundred. What can I do?
Relevant answer
Answer
Increasing the thinning interval could help to reduce autocorrelation, although some authors discourage thinning.
Inclusion of parameter expansions is probably what you're looking for. Some packages, such as MCMCglmm (in R), already include easy ways to implement parameter expansions in the variance structure.
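To make the thinning point concrete, a small Python sketch with an AR(1) chain standing in for a slowly mixing sampler (the thinning intervals are arbitrary). Note that thinning only reduces the autocorrelation per stored draw; it does not add information, which is why some authors discourage it:
import numpy as np

rng = np.random.default_rng(1)
phi, n = 0.95, 100_000
eps = rng.normal(size=n)
chain = np.empty(n)
chain[0] = 0.0
for t in range(1, n):                    # toy "MCMC" chain with strong autocorrelation
    chain[t] = phi * chain[t - 1] + eps[t]

def lag1_autocorr(x):
    x = x - x.mean()
    return float(np.dot(x[:-1], x[1:]) / np.dot(x, x))

print("lag-1 autocorrelation, unthinned :", round(lag1_autocorr(chain), 3))
print("lag-1 autocorrelation, thin = 20 :", round(lag1_autocorr(chain[::20]), 3))
print("lag-1 autocorrelation, thin = 100:", round(lag1_autocorr(chain[::100]), 3))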
  • asked a question related to Bayesian Inference
Question
3 answers
We have a pool of hairs (around 10 for each reference sample, each coming from one individual) representing, e.g., 15 individuals.
We can characterize each of them by microscopy for several morphological characters, and by microspectrophotometry for colour information.
These methods result, for each hair, in one set of discontinuous/qualitative data (morphological characters) and one set of continuous/quantitative data (colour information). We can analyse them separately. That is not a problem.
But how can we analyse the two sets in a pooled matrix (combining qualitative and quantitative data) following a standardized protocol (one that could be reused later as such)?
The questions we need to answer are:
- to test whether all hairs coming from the same person cluster in the same group;
- for an unknown sample (of one hair at minimum), to find the group to which it is closest;
- and of course, to have a statistical estimate of the validity of the clusters, or of the similarity between unknown hairs and the closest clusters.
What is the best way to do that, and what is the best easy-to-use software (like XlStat)?
Thank you for your suggestions and ideas.
Relevant answer
Answer
I think the Taguchi-Mahalanobis strategy is appropriate for your inquiry.
We consider each set of reference samples as a class and the unknown individual as another class. This is similar to a one-class SVM.
I developed an optimal LDF based on the minimum number of misclassifications (MNM).
We discriminate the data consisting of the reference samples together with the unknown individual.
We evaluate the discriminant scores obtained from the LDFs and choose the reference class with the largest t-value.
I am willing to collaborate.
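The MNM-based LDFs mentioned above are the author's own method and are not reproduced here; the Python sketch below only illustrates the Mahalanobis-distance step of such a strategy, with made-up reference groups and traits:
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical reference data: 30 hairs per individual, 4 standardized traits each.
groups = {name: rng.normal(loc=mu, scale=1.0, size=(30, 4))
          for name, mu in {"A": 0.0, "B": 2.0, "C": 4.0}.items()}

def mahalanobis_sq(x, sample):
    # Squared Mahalanobis distance of x to the group's mean, using the group's covariance.
    mu = sample.mean(axis=0)
    cov = np.cov(sample, rowvar=False)
    diff = x - mu
    return float(diff @ np.linalg.solve(cov, diff))

unknown = rng.normal(loc=2.1, scale=1.0, size=4)          # one unknown hair
d2 = {name: mahalanobis_sq(unknown, s) for name, s in groups.items()}
print(d2, "-> assigned to", min(d2, key=d2.get))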
  • asked a question related to Bayesian Inference
Question
8 answers
Many people apply all different kinds of phylogenetic approaches (neighbor joining, maximum likelihood, Bayesian inference) to identify genes/proteins of interest in different groups of animals and/or plants. Additionally, depending on the method used, various criteria (AIC, BIC, LRT) are available for choosing a substitution model.
Let's assume that I want to generate robust gene trees to identify orthologs and paralogs of genes of interest in various distantly related species. These trees might contain a few sequences or whole gene families (e.g. homeobox genes...). What is the best-suited approach and model for that? Does anyone have good recommendations for reviews/publications that address this problem?
Thanks in advance! 
Relevant answer
Answer
Are you trying to identify gene families? Or do you already have families and are trying to understand the relationships therein?
If you are trying to find the families to explore you can use a clustering algorithm, such as OrthoMCL, to create the families.
To parse the relationships within the gene families, I would think the best approach would include blasting all of the organisms/genomes of interest with known members of the family, aligning the hits with the queries, and treeing the resulting alignment. Depending on how you define the term "family" you would also want to include known negatives in the analysis, i.e. closely related families that are not part of the one you are examining. This will allow you to see the boundaries of your family and not misidentify divergent genes as new members of it.
In contrast to Ariel Chipman, I favour ML and Bayesian methods. NJ and other less computationally intensive tree-building methods tend to fall victim to artefacts in complex data more easily than ML and Bayes, such as long-branch attraction (although they have their failings, too). I would interpret polytomies as he suggests, but would conclude that NJ is falling prey to the data rather than answering a different question. But I do not aim to start a fight. Really, we would hope that each method would yield similar or identical results; that is how you can be most confident. ML will work best if you use an appropriate model. Programs like jModelTest (DNA) and ProtTest (protein) can give you an estimate of the best model. I prefer the AICc criterion for my DNA sequences, but ideally AICc, BIC and DT will all agree on the same model. Bayesian programs, such as MrBayes, can either be instructed to use this model or you can skip the model testing and instruct the program to select the best model as part of its run using the "mixed" parameter.
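Since AICc and BIC come up here: they differ only in how they penalize the number of free parameters k given n sites. A minimal Python sketch with hypothetical numbers:
import math

def aic(lnL, k):
    return -2.0 * lnL + 2.0 * k

def aicc(lnL, k, n):
    return aic(lnL, k) + 2.0 * k * (k + 1) / (n - k - 1)   # small-sample correction

def bic(lnL, k, n):
    return -2.0 * lnL + k * math.log(n)

lnL, k, n = -5432.1, 10, 1200    # hypothetical log-likelihood, parameter count, alignment length
print(aic(lnL, k), aicc(lnL, k, n), bic(lnL, k, n))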
  • asked a question related to Bayesian Inference
Question
9 answers
I am constructing a Bayesian model with the following form to estimate the parameters from the responses:
P(parameters | responses) is proportional to P(responses | parameters) * P(parameters)
Then I use MCMC to draw samples from the posterior.
My problem is that I have more than one parameter and several responses, which may be correlated. Since the responses are correlated, I cannot factorize the likelihood like the following:
P(responses | parameters) = P(responses1 | parameters) * P(responses2 | parameters)
I was wondering if anybody could kindly help me with this problem.
Relevant answer
Answer
If you do not normalize the data and extract components from the covariance matrix instead of the correlation matrix, you retain the original standard deviations of the variables.
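Back to the original question: one standard way to keep correlated responses in a single likelihood is to model them jointly, for example with a multivariate normal whose covariance is estimated along with the other parameters. A minimal Python sketch (the two-response model and all numbers are hypothetical, not the poster's actual model); this log-likelihood can then be plugged into any MCMC sampler:
import numpy as np
from scipy.stats import multivariate_normal

def log_likelihood(params, y):
    # y: (n, 2) array of paired responses; params = (mu1, mu2, s1, s2, rho).
    mu1, mu2, s1, s2, rho = params
    mean = np.array([mu1, mu2])
    cov = np.array([[s1 ** 2,        rho * s1 * s2],
                    [rho * s1 * s2,  s2 ** 2]])
    return multivariate_normal(mean, cov).logpdf(y).sum()

rng = np.random.default_rng(3)
y = rng.multivariate_normal([1.0, 2.0], [[1.0, 0.6], [0.6, 1.0]], size=50)   # fake correlated responses
print(log_likelihood((1.0, 2.0, 1.0, 1.0, 0.6), y))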
  • asked a question related to Bayesian Inference
Question
4 answers
Model discrepancy in such cases?
Relevant answer
Answer
A likelihood function equal to a Dirac delta is a degenerate case, as the Dirac delta is not a function. It would mean that, after the observations, you are completely sure about the value of the parameter and all uncertainty is removed. In theory you can deal with such cases, but in practice they are impossible. If your question arises from a practical situation, the observation model should be reviewed.
For such likelihood functions, discrepancy analysis is also degenerate, as the final (posterior) distribution of the parameter(s) is concentrated on a single value of the parameter(s).
  • asked a question related to Bayesian Inference
Question
9 answers
Hi All,
I am working with mtDNA and want to know how I can determine the burn-in generations and the number of chains in MrBayes and other Bayesian inference software.
thanks
Hossein
Relevant answer
Answer
After your run finishes, a file called pstat is generated. There are statistics for all of your parameters in there. The manual for v3.2 suggests that the minimum ESS for each variable should be 200 or greater and that the PSRF should be 1. If memory serves, ESS is the effective sample size (you have enough samples from the run to make a judgement) and PSRF is an estimate of how similar the two chains are (i.e. whether they have converged).
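For what it's worth, a minimal Python sketch of the PSRF idea (toy chains; MrBayes' own diagnostic uses refinements of this, so treat it only as an illustration):
import numpy as np

def psrf(chains):
    # chains: (m, n) array, m independent chains of n samples of one parameter.
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    chain_vars = chains.var(axis=1, ddof=1)
    W = chain_vars.mean()                  # average within-chain variance
    B = n * chain_means.var(ddof=1)        # between-chain variance
    var_plus = (n - 1) / n * W + B / n
    return float(np.sqrt(var_plus / W))

rng = np.random.default_rng(4)
good = rng.normal(0.0, 1.0, size=(2, 5000))               # two chains sampling the same target
bad = np.stack([rng.normal(0.0, 1.0, 5000),               # two chains stuck in
                rng.normal(3.0, 1.0, 5000)])              # different places
print("PSRF, converged    :", round(psrf(good), 3))       # close to 1
print("PSRF, not converged:", round(psrf(bad), 3))        # well above 1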
  • asked a question related to Bayesian Inference
Question
3 answers
After running my data, I was surprised to get some results > 1 (i.e., more than 100% "certain"). Is that possible, or is there something wrong in my data/parameters?
Relevant answer
Answer
Tell us what you are computing. The answer to a Bayesian question is usually a posterior probability density, which is always normalized so that the INTEGRAL over all parameter space is equal to one, but the density values depend on the scaling (units) and can take on any (positive) values.
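A tiny Python illustration of that point, with an arbitrary Beta density standing in for a posterior:
from scipy.integrate import quad
from scipy.stats import beta

posterior = beta(10, 10)        # a stand-in posterior density for a proportion
print("density at p = 0.5 :", posterior.pdf(0.5))            # about 3.5, perfectly fine
print("integral over [0,1]:", quad(posterior.pdf, 0, 1)[0])  # 1.0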
  • asked a question related to Bayesian Inference
Question
5 answers
I have a dataset of COI sequences and I'd like to obtain Bayesian Skyline Plots (BSPs) with BEAST for my populations. I made 5 replicate runs, obtaining 5 .log and 5 .trees files. I used LogCombiner 2.2.0 to obtain single .log and .trees files from the 5 replicates, in order to construct the BSPs with Tracer. LogCombiner was able to construct a combined file for the trees, but it did not for the .log files. The program stops without producing any file and without giving any error message. Actually, it produces nothing! Any suggestion or hint?
Relevant answer
Answer
Thanks everybody! LogCombiner seems to work when I use an old version (1.7).
  • asked a question related to Bayesian Inference
Question
6 answers
Does anyone know a citable paper in which the marginal likelihood of a normal distribution with unknown mean and variance is derived?
A short sketch of how the procedure should look: the joint probability is given by P(X, mu, sigma2 | alpha, beta), where X is the data. Factorizing gives P(X | mu, sigma2) x P(mu | sigma2) x P(sigma2). Integrating out mu and sigma2 should yield the marginal likelihood.
I found several papers which work with the marginal likelihood for the linear regression model with a normal prior on the betas and an inverse gamma prior on sigma2 (see e.g. Fearnhead & Liu, 2007), or which derive the posterior distribution of the unknown parameters, but not the marginal likelihood.
I hope the question is understandable and someone can help me.
Greetz,
Sven.
Relevant answer
Answer
Hey,
so as far as I can tell the derivation in Xuan agrees with my point - that you can't integrate over an inverse gamma prior on the variance unless the prior on sigma2 is linked to the prior on the mean (beta in this case). The horizontal arrow in Figure 2.1 of Xuan is crucial! It is this assumption that makes 2.22 still come out as Gaussian with variance proportional to sigma2, and not some horrible combination of parameters.
The fact that the priors are linked is very important, and often under-appreciated. To understand this, imagine a slider that allows you to change the value of sigma2 in the Xuan model. As you move this slider you change the variance of your likelihood AND the variance of your prior on beta. This is not the same as a model with independent priors on beta and sigma2, in which moving the slider would have no effect on the prior on beta.
Connecting the priors up in this way is exactly equivalent to assuming a joint normal-inverse-gamma prior on both parameters - in other words the results of Xuan and Greenberg that you found are identical (notice that Xuan 2.25 is the same as Greenberg's final answer, aside from a change of variables).
As for what you want to do, I am slightly confused. You want to assume a Gaussian likelihood and integrate over a normal-inverse-gamma prior on means and variances? If so then isn't the result in Greenberg already exactly what you want?
I'm interested to hear where this goes!
Bob
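In case a concrete check is useful: under the linked normal-inverse-gamma prior the marginal likelihood has a closed form (it should match the Greenberg/Xuan result up to notation). A minimal Python sketch with arbitrary hyperparameters and fake data, verified against a Monte Carlo average of the likelihood over prior draws:
import numpy as np
from scipy.special import gammaln, logsumexp
from scipy.stats import norm

# Prior: mu | sigma2 ~ N(mu0, sigma2/kappa0), sigma2 ~ Inv-Gamma(a0, b0); arbitrary values.
mu0, kappa0, a0, b0 = 0.0, 1.0, 2.0, 2.0
rng = np.random.default_rng(5)
x = rng.normal(1.0, 1.5, size=20)                 # fake data
n, xbar = len(x), x.mean()

# Conjugate updates of the hyperparameters.
kappan = kappa0 + n
an = a0 + n / 2.0
bn = b0 + 0.5 * np.sum((x - xbar) ** 2) + kappa0 * n * (xbar - mu0) ** 2 / (2.0 * kappan)

# Closed-form log marginal likelihood with mu and sigma2 integrated out.
log_ml = (-0.5 * n * np.log(2.0 * np.pi)
          + 0.5 * (np.log(kappa0) - np.log(kappan))
          + a0 * np.log(b0) - an * np.log(bn)
          + gammaln(an) - gammaln(a0))
print("closed form :", log_ml)

# Monte Carlo check: average the likelihood over draws from the prior.
N = 100_000
sig2 = 1.0 / rng.gamma(a0, 1.0 / b0, size=N)      # sigma2 ~ Inv-Gamma(a0, b0)
mu = rng.normal(mu0, np.sqrt(sig2 / kappa0))      # mu | sigma2
loglik = norm.logpdf(x, loc=mu[:, None], scale=np.sqrt(sig2)[:, None]).sum(axis=1)
print("Monte Carlo :", logsumexp(loglik) - np.log(N))   # should be close to the closed form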
  • asked a question related to Bayesian Inference
Question
1 answer
I need an example for this package
Relevant answer
BUGS seems easier and here is a free version: http://www.openbugs.net/w/FrontPage
  • asked a question related to Bayesian Inference
Question
9 answers
Many might first think of Bayesian statistics.
"Synthetic estimates" may come to mind. (Ken Brewer included a chapter on synthetic estimation: Brewer, KRW (2002), Combined survey sampling inference: Weighing Basu's elephants, Arnold: London and Oxford University Press.)
My first thought is for "borrowing strength." I would say that if you do that, then you are using small area estimation (SAE). That is, I define SAE as any estimation technique for finite population sampling which "borrows strength."
Some references are given below.
What do you think? 
Relevant answer
Answer
Dear Jim, if I may,
Roughly, small area estimation is one of the statistical techniques (including the estimation of parameters for sub-populations), used when the sub-population in question is integrated in a larger context.
  • asked a question related to Bayesian Inference
Question
3 answers
I have selected representative strains from each cluster in my Maximum Likelihood tree for divergence dates estimation but I made sure I covered the entire time period of detection of these strains (1975-2012). Will my exclusion of some of the strains affect my estimated evolutionary rate and times of divergence at the nodes after BEAST analysis?
Relevant answer
Answer
Since it is a Bayesian method, anything related to the data influences the output. Excluding some taxa should not influence the estimate much, but the distribution of the taxa across the years may do. If you have more taxa from a specific range of time, this may pull the estimate in one direction rather than another. The point estimate may not change, but you can end up with a broader 95% HPD.
  • asked a question related to Bayesian Inference
Question
7 answers
I have a maximum clade credibility tree drawn and I am trying to get the age range at each node but I don't seem to get it right. How do I get that? I have the node bars displayed but I need the range displayed in years at the node as well. Or how do I interpret [10, 24.58] displayed at the node of my tMRCA?
Relevant answer
Answer
To expand a little bit on what Thomas said: 
1. Check the box for "Node Labels" to display values at the nodes
2. At the drop-down menu for "Display:" choose "height_95%_HPD"
The numbers given here will be the upper and lower values within the 95% highest probability density recovered from your (assuming) BEAST run. So, [10, 24.58] means that the 95% HPD for that node is between an age of 10 (assuming millions of years, but depends on what you put in here) and 24.58 (Mya). This range is exactly what is shown by the bars that are given when you select "Node bars" and choose Display -> "height_95%_HPD".
Generally, I think the node label output from FigTree can be a little messy when you are dealing with ranges - you might want to go and edit these in Illustrator. 
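If you ever need to recompute such an interval yourself, here is a minimal Python sketch of how a 95% HPD interval (the shortest interval containing 95% of the posterior samples) is obtained, with toy samples in place of real node ages:
import numpy as np

def hpd(samples, mass=0.95):
    # Shortest interval containing `mass` of the samples.
    s = np.sort(samples)
    k = int(np.ceil(mass * len(s)))            # number of samples inside the interval
    widths = s[k - 1:] - s[:len(s) - k + 1]    # width of every candidate interval
    i = int(np.argmin(widths))                 # the shortest one wins
    return s[i], s[i + k - 1]

ages = np.random.default_rng(6).gamma(shape=4.0, scale=4.0, size=20_000)   # skewed toy "node ages"
print("95% HPD interval     :", hpd(ages))
print("2.5/97.5 percentiles :", np.percentile(ages, [2.5, 97.5]))   # not the same thing for skewed posteriors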
  • asked a question related to Bayesian Inference
Question
10 answers
I am trying to run a model comparison between two JAGS models (both binomial dbern models with several categorical predictors). I am using the rjags package.
Could anyone suggest or give some references how to write such a model? I would greatly appreciate your input.
Relevant answer
Answer
As mentioned, you can resort to DIC or posterior predictive checks. Have a look at the BUGS book and Gelman's book "Bayesian Data Analysis".
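For concreteness, a minimal Python sketch of the DIC calculation itself, with a toy Bernoulli model standing in for the JAGS output (rjags users can also get DIC directly from dic.samples()):
import numpy as np

rng = np.random.default_rng(7)
y = rng.binomial(1, 0.3, size=100)                                   # fake binary data
theta = rng.beta(1 + y.sum(), 1 + len(y) - y.sum(), size=10_000)     # posterior draws (conjugate Beta)

def deviance(p):
    # Deviance D(theta) = -2 * log-likelihood of the Bernoulli model.
    return -2.0 * np.sum(y * np.log(p) + (1 - y) * np.log1p(-p))

D = np.array([deviance(t) for t in theta])
Dbar = D.mean()                          # posterior mean deviance
pD = Dbar - deviance(theta.mean())       # effective number of parameters
print("pD  =", pD)                       # close to 1 for this one-parameter model
print("DIC =", Dbar + pD)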
  • asked a question related to Bayesian Inference
Question
38 answers
There are plenty of debates in the literature about which statistical practice is better, and both approaches have many advantages but also some shortcomings. Could you suggest any references that describe which approach to choose and when? Thank you for your valuable help!
Relevant answer
Answer
There are lots of papers on this which will be a better way to inform your opinion than a small number of brief responses. Maybe if we list examples of these. I'll start with Efron at http://statweb.stanford.edu/~ckirby/brad/papers/2005BayesFreqSci.pdf which I think provides a fairly direct answer to your question from someone whose opinions about statistics are much better to listen to than mine!
  • asked a question related to Bayesian Inference
Question
5 answers
hi all,
Do you know any published study reporting that it is better (for forecasting accuracy) to have ONLY discrete data in a Bayesian network? Or, alternatively, that continuous data may lead to inference problems?
Relevant answer
Answer
Dear Prakash,
I agree with you; I was just hoping that such a comparison existed. It seems that I should do it myself to find out.
Kind regards,
Amer.
  • asked a question related to Bayesian Inference
Question
6 answers
I want to make a control chart in R. The LCL, UCL and CL are found by a Bayes estimator using the Weibull distribution, and the CL can be found by the trapezoidal rule. How can I make such a control chart in R? Is there any syntax, package or journal article that I can use or read to solve this problem? Thanks
Relevant answer
Answer
I'm sorry, I don't know more about that...
But I really thank you for your attention so far...
I'll send you my script and data...
Thanks in advance for your help
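Without the script it is hard to be specific, but here is a minimal Python sketch of the kind of calculation described in the question: fake posterior draws of the Weibull parameters stand in for your Bayes estimates, the CL comes from trapezoidal integration of the posterior-predictive density, and the LCL/UCL are predictive tail quantiles. All names and numbers are placeholders:
import numpy as np
from scipy.stats import weibull_min

rng = np.random.default_rng(8)
shape_draws = rng.normal(2.0, 0.1, size=2000)       # placeholder posterior draws of the shape
scale_draws = rng.normal(10.0, 0.4, size=2000)      # placeholder posterior draws of the scale

x = np.linspace(1e-6, 40.0, 4000)
# Posterior-predictive density: average the Weibull pdf over the posterior draws.
pred_pdf = np.mean([weibull_min.pdf(x, c, scale=s)
                    for c, s in zip(shape_draws, scale_draws)], axis=0)

dx = x[1] - x[0]
cdf = np.cumsum((pred_pdf[:-1] + pred_pdf[1:]) / 2.0) * dx     # trapezoidal predictive CDF
cdf = np.insert(cdf, 0, 0.0)

CL = np.sum((x[:-1] * pred_pdf[:-1] + x[1:] * pred_pdf[1:]) / 2.0) * dx   # trapezoidal mean
LCL = x[np.searchsorted(cdf, 0.00135)]     # conventional 3-sigma-equivalent tail probabilities
UCL = x[np.searchsorted(cdf, 0.99865)]
print("LCL, CL, UCL =", LCL, CL, UCL)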
  • asked a question related to Bayesian Inference
Question
4 answers
I'm working with continuous characters to generate a tree; so far I have tried MLtraits and parsimony (TNT). I was wondering if there is a way to run a Bayesian analysis without discretizing the data.
Thanks, 
Melissa
Relevant answer
Answer
Melissa, it seems that MrBayes can do what you ask. I have never tried that, but at least there is an option for specifying continuous characters in the input nexus file ("Datatype = Continuous"):
  • asked a question related to Bayesian Inference
Question
4 answers
Hi. In MrBayes there is the concept of a partition, where we have to create subsets of codon positions. Can anyone tell me how to decide how many subsets there should be for the codon positions? If I am analysing a whole genome, how can I divide it into subsets?
Relevant answer
Answer
Hi Amol,
If you have a whole genome, are you sure MrBayes is the best approach to use? It can take quite a long time for the runs to converge and provide reliable results with a whole-genome dataset. Any other Bayesian approach (e.g. BEAST) will have more or less the same problem. You might want to try running an ML analysis using RAxML, for example, if your aim is to generate a phylogenetic hypothesis.
Anyway, if you want to proceed with MrBayes, you have to choose arbitrarily how to divide the data, for example by locus, or by locus and codon position within each locus. Here's an example using two concatenated loci (A and B), the first with 1000 bp and the second with 500 bp.
By loci only:
charset lociA = 1-1000;
charset lociB = 1001-1500;
By loci and codons:
charset lociA1 = 1-1000\3;
charset lociA2 = 2-1000\3;
charset lociA3 = 3-1000\3;
charset lociB1 = 1001-1500\3;
charset lociB2 = 1002-1500\3;
charset lociB3 = 1003-1500\3;
Now for deciding what's the best partition strategy for your data (instead of an arbitrary choice), the best option will be using PartitionFinder: http://www.robertlanfear.com/partitionfinder/
That's a really nice software, and it's definitely better than "guessing" the best partitions strategy.
I hope that helps!
Cheers,
   Fabricius
  • asked a question related to Bayesian Inference
Question
4 answers
For a classification problem, consider two steps: training and testing.
Assume that the Bayesian network classifier is required to be designed based on data.
At first, a uniform distribution is assigned to each node because there is no prior information. Then, in the training step, the parameters of each node are determined by maximum likelihood estimation.
Is it correct to say that the learned parameters define the posterior distribution?
In addition, the next step is testing. P(Class | evidence) is calculated for each feature vector and the posterior distribution is evaluated for each class node. I would assign the feature vector to the class with the highest probability.
Is this called maximum a posteriori (MAP) classification?
Relevant answer
  • asked a question related to Bayesian Inference
Question
3 answers
Dear all,
I have been thinking about the following: I have a set of models which share the characteristic that they each have two rate parameters and another parameter that has a similar interpretation in all models. Now each model is combined with a different model of a mechanism to explain aspects that are not covered by the common part. The data I have to compare these different mechanisms is a special case for the common part of all models (not related to the mechanism I'm interested in): both rate parameters have to be equal and the third parameter has to be zero.
So my question is now: when comparing the models (transdimensional MCMC) to figure out the best mechanism, should I use the simpler parametrization (with only one rate parameter and disregarding the parameter which is zero), since my data is this special case?
Or should I use the full model (which, by the way, when compared against the simplified model in this special case data is worse regarding the Bayes factor).
I know the question is a bit abstract, however, I feel adding concrete model details would rather confuse the issue. Since I'm new to this kind of analysis, maybe this is a common case with a common solution? Although I couldn't find anything, yet.
I'm very interested in your opinions!
Thanks in advance,
Relevant answer
Answer
Jan,
I'm not an expert on this but the answer seems to me to be that you should go with the model with the simplest parametrization because as you say "data I have to compare these different mechanisms is a special case for the common part of all models (not related to the mechanism I'm interested in): both rate parameters have to be equal and the third parameter has to be zero."
Other than that, in the future I guess you may want to apply a full parametrization to new data that do not fall in the special case you describe.
Hth,
cheers,
SCM
  • asked a question related to Bayesian Inference
Question
4 answers
I want to use MrBayes for Bayesian inference and model choice across a wide range of phylogenetic and evolutionary models (MCMC analysis). Which software would be best to convert alignment files into the nexus format that MrBayes requires? Can anybody teach me the details of MrBayes?
Relevant answer
Answer