Science topic

Bayesian Analysis - Science topic

Explore the latest questions and answers in Bayesian Analysis, and find Bayesian Analysis experts.
Questions related to Bayesian Analysis
  • asked a question related to Bayesian Analysis
Question
5 answers
Dear all,
I want to conduct a multilevel regression analysis with three levels and a binary outcome in Mplus. I am able to set up the script, but I need to choose Bayes as the estimator, and the results seem to be quite fragile. I am not experienced with Bayes estimation so far and am not sure yet whether the results I obtain are trustworthy.
Now, I realized that in the Mplus User's Guide, it is said: "For TYPE=THREELEVEL, observed outcome variables can be continuous. Complex survey features are not available for
TYPE=THREELEVEL with categorical variables or TYPE=CROSSCLASSIFIED because these models are estimated using Bayesian analysis for which complex survey features have not been
generally developed." (page 262). To be honest, I have trouble understanding what that means for my model. I do not need any further commands for the model (e.g., weighting), so I am not sure whether that should worry me.
Can anyone tell me if I need to interpret results from Bayes estimation (from three-level analysis with a binary outcome) any differently? Or can anyone suggest specific literature?
In addition, I know that there is a debate about whether binary outcomes may be treated as continuous data. Whenever possible, I have always tried to treat binary outcomes as categorical. However, might treating the binary outcome as continuous be an option for me here?
(As an extra question: Does anyone know if I can perform three-level analysis with binary outcomes more easily in R? I have only very basic knowledge of R, but maybe I could work with that as well, if it is better suited for this particular analysis.)
Any help is appreciated :-) Thanks!
Saskia
Relevant answer
Answer
This model/type of analysis is very complex, and you mention that the results appear to be "fragile" or unstable. I would contact the Mplus support directly. They will be happy to look over your output to see if the results can be trusted. They can also give you further advice on how to best approach this type of analysis in Mplus.
  • asked a question related to Bayesian Analysis
Question
3 answers
I am trying to understand and use the best model selection method for my study. I have inferred models using swarm optimisation methods and used AIC for model selection. On the other hand, I am also seeing a lot of references and discussions about BIC. Apparently, many papers have used both AIC and BIC to select the best models. My question is: if I use ABC alongside AIC and BIC, how would this affect my study for the better, and what would be the pros and cons of using ABC, BIC and AIC as model selection methods?
Thanks
Relevant answer
Answer
First let me comment on a couple of things. See the first screenshot for some cautions about doing what you suggested. The second attachment shows what we do; that has worked well for us in our application. Best wishes, David Booth
  • asked a question related to Bayesian Analysis
Question
6 answers
Does this area have the potential for research?
Relevant answer
Answer
I think the main reason is related to their complexity. For example, I tried hard to reproduce the result of the following paper:
but I was not able to. Although better results than conventional monitoring methods may be obtained, Bayesian methods introduce considerable complexity.
Best regards.
  • asked a question related to Bayesian Analysis
Question
2 answers
jModelTest suggested HKY+G and TPM3uf+I+G as best-fit models. I used the following setup for the HKY+G model block in MrBayes, and it worked perfectly:
Begin mrbayes;
set autoclose=no nowarn=yes;
lset nst=2; lset rates=gamma;
prset tratiopr=fixed(4.99);
prset statefreqpr=fixed(0.2135,0.2819,0.2094,0.2953);
prset shapepr=fixed(0.1610);
mcmc ngen=1000000 nchains=4 relburnin=yes burninfrac=0.25 printfreq=1000 samplefreq=1000 savebrlens=yes;
mcmc;
sumt;
end;
But not sure how to set the TPM3uf+I+G model in MrBayes. Any suggestion would be highly appreciated.
Relevant answer
Answer
João Machado thanks heaps for your suggestions. I will try the replacement model first; if that doesn't work, I will go with MrAIC.pl
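For later readers: since MrBayes only offers nst=1, 2, or 6, a common workaround for a model like TPM3uf+I+G is to run the nearest more general model it does support (nst=6, i.e. GTR) with invgamma rates and let the extra parameters be estimated. A sketch, reusing the run settings from the block above (the bracketed text is a NEXUS comment):

```
Begin mrbayes;
set autoclose=no nowarn=yes;
lset nst=6 rates=invgamma;  [GTR+I+G, the closest superset of TPM3uf+I+G]
mcmc ngen=1000000 nchains=4 relburnin=yes burninfrac=0.25 printfreq=1000 samplefreq=1000 savebrlens=yes;
mcmc;
sumt;
end;
```

The fixed priors from the HKY+G block are dropped here because the parameter estimates from jModelTest apply to a different model.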
  • asked a question related to Bayesian Analysis
Question
3 answers
I use Stata to undertake Bayesian analysis, but I'm not certain what or where the posterior probability is in the Stata output. I have tried reading the Stata Bayesian Analysis Reference Manual repeatedly, but it hasn't helped me get a handle on this problem. I discovered one study in Malaysia employing it, but I still do not comprehend it. Please help resolve this matter.
Relevant answer
Answer
Have a look at JASP, which is free, focused on Bayesian analysis, and does very nice graphical summaries.
  • asked a question related to Bayesian Analysis
Question
4 answers
Can anyone recommend R package(s) dealing with Bayesian analysis with Dirichlet process please? Thanks!
Relevant answer
Answer
Many thanks for the suggestions. I will check out those packages!
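In case it helps later readers: independent of any particular R package, the stick-breaking construction behind a Dirichlet process is only a few lines of code. A minimal sketch (truncation level K and concentration alpha are arbitrary illustration choices):

```python
import random

def stick_breaking(alpha: float, K: int, seed: int = 0) -> list[float]:
    """Truncated stick-breaking weights for a DP with concentration alpha."""
    rng = random.Random(seed)
    weights, remaining = [], 1.0
    for _ in range(K):
        # break off a Beta(1, alpha) fraction of the remaining stick
        b = rng.betavariate(1.0, alpha)
        weights.append(remaining * b)
        remaining *= (1.0 - b)
    return weights

w = stick_breaking(alpha=2.0, K=50)
print(sum(w))  # close to, but below, 1 (the rest lies in the truncated tail)
```

Smaller alpha concentrates mass on fewer components; larger alpha spreads it out, which is the knob that controls how many clusters a DP mixture tends to use.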
  • asked a question related to Bayesian Analysis
Question
2 answers
Are there any advantages to using a Bayesian approach for the analysis of zero-inflated count data, specifically dental caries data? Are there any references that would support that analysis?
Relevant answer
Answer
It really depends where the zeros are coming from. Bayesian models have the principled advantage that one can integrate prior knowledge into the data in a natural way.
For example, assume that a rare event (e.g. a rare disease) did not occur in your study. With frequentist statistics, your estimate for the probability of this event would be zero. However, if you have prior information from previous studies telling you that the probability of this event is, say, 0.01, then you could use a prior distribution peaked around 0.01 and combine this with your data. Then, after combining your data (likelihood) with the prior, the posterior probability distribution of this rare disease probability would have a peak somewhere below 0.01 (because it did not happen in your data). However, the event would still not be considered impossible, just rare.
As Kerav said, it depends what the situation is. But if this is your problem, you could go for a Bayesian model. Otherwise, ZIP models are also a way of dealing with it.
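The rare-event logic above can be made concrete with conjugate Beta-binomial arithmetic (the prior parameters and trial counts here are illustrative, chosen so the prior mean is 0.01):

```python
# Beta(a, b) prior with mean a / (a + b) = 0.01
a_prior, b_prior = 1.0, 99.0

# observed data: the rare event never occurred in n trials
n_trials, n_events = 200, 0

# conjugate update: posterior is Beta(a + events, b + non-events)
a_post = a_prior + n_events
b_post = b_prior + (n_trials - n_events)

prior_mean = a_prior / (a_prior + b_prior)   # 0.01
post_mean = a_post / (a_post + b_post)       # 1/300, roughly 0.0033

# the event is now estimated as rarer than the prior said, but not impossible
print(prior_mean, post_mean)
```

The posterior mean sits below the prior mean (the data saw no events) but strictly above zero, which is exactly the contrast with the frequentist point estimate described above.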
  • asked a question related to Bayesian Analysis
Question
3 answers
Is there a clear justification or any advantage to using a Bayesian approach for the analysis of zero-inflated count data? Specifically dental caries
Relevant answer
Answer
A justification would be if you had prior information that you want to incorporate into the analysis in a principled way. Or if you want to make probability statements about parameter values, which requires formulating some prior knowledge about these parameters (frequentist analyses make probability statements only about the data).
  • asked a question related to Bayesian Analysis
Question
3 answers
Hey everybody!
I'm implementing a Bayesian negative binomial model using Stata 17. Because of some collinearity or convergence issues, I needed to put my variables in different blocks in the modeling process. Yet, it is a bit confusing to choose the optimal number of blocks (and their exact sets of variables) for a model. Do you have any idea about it?
Apart from that, what criteria do you suggest (DIC, acceptance rate, efficiency, variable significance, etc.) for comparing models developed using various numbers of blocks?
I appreciate any help you can provide in advance.
Relevant answer
Answer
I don't quite understand the "blocks" portion, but if you are trying to split variables, the pathologies you have observed (poor convergence) suggest you should Google more about "identifiability" in the context of modelling.
I may be too far removed from academia since leaving for industry, but quite frankly all the criteria you mentioned are garbage. What is important is what the posterior distribution looks like (which you already have as a precursor to those criteria) and how it behaves. So take the samples and do that fun part of science where you "play": look into what the predictions do and how they change when you fiddle with things, and try to gain insight.
  • asked a question related to Bayesian Analysis
Question
3 answers
Which model is better for investigating outcome – exposure relationship spatially?
· Data are counties
· Dependent variable is incidence per county
· Independent variable is median of XXX per county
· Spearman’s correlation, significant and negative
· Spatial autocorrelation of incidence values close to zero
· Local clusters, detected but majorities are single counties
· Geographically weighted regression, local coefficients mix of positive and negative but global regression coefficient negative
With this background, should we go for geographically weighted regression or a Bayesian convolution model with structured and unstructured random effects?
Relevant answer
Answer
I am not convinced that counties are in general an appropriate level for measuring exposure, due to large within-county variations. I would need to know the context in which you are doing this work.
  • asked a question related to Bayesian Analysis
Question
4 answers
I am looking for papers / sources that outline any kind of general rule for a minimum sample size (real observations) for using the bootstrap.
For now I am using it to generate the CI for a mean. It would be nice to know if there are minimum 'n' suggestions for other operations like correlation etc. PM for code etc.
Relevant answer
Answer
Dear Cory,
I'm not aware of any book or paper addressing this, but am quite confident there is no magic number as it depends on the data and research goal.
The assumption of the bootstrap is that the sample is representative of the population, and the plausibility of that assumption improves with sample size. What happens when getting bootstrapped confidence intervals with small sample sizes?
RESOLUTION
With extremely small sample sizes, bootstrap samples will just be repetitions of the same few combinations. For example, bootstrapping 3 values gives only 3^3 = 27 possible ordered resamples. If we try to get confidence intervals from those, there are only 27 possible values to choose from. Semi-parametric or parametric bootstrapping will smooth this out, but makes more assumptions...
SPURIOUS RESULTS
Doing bootstrapped confidence intervals with these will often produce spurious results (confidence intervals that exclude the population mean). A quick simulation matching the design of your project should be able to help guide you. Here's a quick demonstration using R for estimating a mean and its 95% CI from a population with a standard normal distribution N(0, 1).
##### R code start #####
set.seed(321321)  # reproducibility
iter <- 1000      # number of simulated samples per sample size
B    <- 500       # bootstrap resamples per sample
n    <- 2:20      # sample sizes considered

# percentile bootstrap 95% CI for the mean of a single sample x
boot.ci <- function(x, B) {
  boot.means <- replicate(B, mean(sample(x, replace = TRUE)))
  quantile(boot.means, probs = c(0.025, 0.975))
}

# for each sample size, count how often the bootstrap CI excludes the
# population mean (mu = 0) when sampling from N(0, 1)
out <- sapply(n, function(ni) {
  misses <- replicate(iter, {
    xi <- rnorm(ni)
    ci <- boot.ci(xi, B)
    ci[1] > 0 || ci[2] < 0
  })
  sum(misses)
})
##### R code end #####
For this particular simulation, we see that with extremely small sample sizes (e.g. 2-3), estimated means get as far as 2SD from the population mean, and the (bootstrapped) confidence intervals exclude the true population mean very often!
For this scenario, I would definitely go above 5, and most likely higher, depending on the research goal and data...
  • asked a question related to Bayesian Analysis
Question
2 answers
I am trying to set models of evolution for a DNA dataset using jModelTest2.
Some partitions, when run multiple times, will return different best-fit models (e.g. F81+G, TPM1uf+G, and F81+G after three runs).
Which model should I accept in this scenario?
I've heard a recommendation to select the simplest model if the output is inconsistent, but that was unsourced, and in a few cases the models have the same complexity.
*edit: By complexity, I mean qualifiers like +G or +I
Relevant answer
Answer
Interesting and challenging question. At first, I would say that, depending on the specific aim of your phylogeny, slightly different models should not play the main role in estimating relationships among taxa; usually, the framework (tree prior + data) does. By the way, please make sure you are not "ordering" your output by different criteria.
Perhaps some colleagues will not agree with this, but I would suggest you compare the AIC (AICc), BIC, and ML values from these different runs. Theoretically, you would pick the model with the lowest AICc value, even across different runs (assuming you ran them under the same conditions). The ML and BIC values should give you some direction as well. In addition, I would recommend you look at some different model-estimation frameworks, such as bModelTest from the BEAST package.
The problem with bModelTest is that testing the whole set of models takes a lot of time. Another drawback is that you may not be able to test partitions as in PartitionFinder (which is, by the way, another option). The bModelTest authors recommend constraining the set of models to be tested. In any case, with bModelTest you will end up with a probabilistically averaged metric for which family of models to pick.
It might not be the most orthodox solution, but it will give you some reproducible direction.
Good luck
  • asked a question related to Bayesian Analysis
Question
12 answers
I want to use BEAST to do EBSP analyses with two loci. I open two "input.nex" files in BEAUti to generate an "output.xml" file (in the Trees panel I select Extended Bayesian Skyline Plot as the tree prior), and then run BEAST. I do not know if this is right, and I do not know what to do next. I cannot construct the demographic-history trend in Tracer as I would for a BSP. I got one log file but two trees files (one per locus), and I do not know how to import both tree files into Tracer.
Relevant answer
Answer
One of the best websites to find the answers to these questions is the following link:
  • asked a question related to Bayesian Analysis
Question
4 answers
The only interpretation of the Bayes Factor calculated using the output of BayesTraits is this:
<2 Weak evidence
>2 Positive evidence
5-10 Strong evidence
>10 Very strong evidence
Is there a reason why Bayes Factor won't get a negative value?
Also, is there a Bayes Factor value that provides 'negative' evidence? Because the way I see it, weak evidence is still evidence in favour of the dependent model, just not strong enough. Or is this not the case?
Many thanks in advance!
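For what it's worth, the sign question can be seen directly from how BayesTraits-style Bayes factors are computed on the log scale: log BF = 2 × (log marginal likelihood of the complex model − log marginal likelihood of the simple model). The value goes negative whenever the simpler model has the higher marginal likelihood (the marginal-likelihood values below are made up for illustration):

```python
def log_bayes_factor(log_ml_complex: float, log_ml_simple: float) -> float:
    """Log Bayes factor on the 2*ln scale used by BayesTraits."""
    return 2.0 * (log_ml_complex - log_ml_simple)

# hypothetical marginal log likelihoods from two runs
print(log_bayes_factor(-101.3, -100.1))  # negative: favours the simple model
print(log_bayes_factor(-98.0, -102.5))   # positive: favours the complex model
```

So a negative value is perfectly possible; it is simply evidence against the more complex (dependent) model rather than weak evidence for it.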
  • asked a question related to Bayesian Analysis
Question
2 answers
I am the owner of a luxurious mall which caters to the luxury segment (sells only branded goods at high prices) in one of the posh areas in the city. I have given the space out to rent on a quarterly basis based on a) the possible revenue-generation potential, b) the rent the shopkeeper is willing to pay per square foot, and c) how many customers their shop draws to the mall. Since the past historical data I have is just an estimate for all these factors, I decide to treat the information as prior knowledge, with the prior mean being computed using past historical data. Now what type of prior pdfs shall I use? I have to think along Bayesian lines and create a strategy for allocating space to different shops in the mall so that I see long-term profit.
Relevant answer
Answer
Dear Gopal,
this sounds somewhat more complex than it looks at first sight... Anyway, in addition to Andrew's suggestion I would like to mention that the type of distribution should differ between these factors. With:
revenue (probably Normal, Exponential)
rent (probably normal)
Customer count (Poisson, most likely with an over-dispersed error term, because people often do not go shopping alone but with a friend, such that 'if the shop draws one customer, another one joins automatically'. In other words, you have a third 'unknown' variable which leads to correlated events, thus overdispersion)
Now I am speculating about what you actually try to do:
If your goal is to generate "predictions" about the revenue of a potential shop and use it as a criterion to decide whether to give space (yes vs no), then the first step probably would be to define some model which uses space, customer count and rent-willing-to-pay as predictors, and revenue as the predicted variable. Then optimize the model (fit) based on existing data. With this, you have an estimate of the above-mentioned distribution characteristics (mean and variance). This gives you a range of posterior predictions (e.g., the marginal posterior of each predictor's influence, or combined). How much revenue is enough to decide is, of course, your decision. But with this model (and the estimated predictor weights, including all their posterior uncertainty), you could use the posterior distributions directly to predict revenue for any constellation of predictors you give to the model, and see whether the credible interval (i.e., the range of the predicted revenue) passes your decision cut-off. The priors for the predictors then come directly from the posteriors (i.e., you can use the samples from the fitted posterior distributions, as if new renters are from the same renter population).
Hope this helps.
Best
René
Ps: If you make profit based on my suggestion. Thankful donations are welcome ;))
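As a small illustration of the distribution choices above (all parameter values are hypothetical), a prior-predictive draw for the three factors might look like this, with customer counts made overdispersed via a gamma-Poisson (negative binomial) mixture:

```python
import numpy as np

rng = np.random.default_rng(7)
n_shops = 10_000

# rent willing to pay per sq ft: roughly Normal (hypothetical mean/sd)
rent = rng.normal(loc=50.0, scale=8.0, size=n_shops)

# revenue potential: Normal on the log scale keeps it positive (hypothetical)
revenue = rng.lognormal(mean=11.0, sigma=0.4, size=n_shops)

# customer count: gamma-Poisson mixture -> overdispersed relative to Poisson
rate = rng.gamma(shape=2.0, scale=50.0, size=n_shops)  # per-shop draw rate
customers = rng.poisson(rate)

# overdispersion check: variance well above the mean, unlike a pure Poisson
print(customers.mean(), customers.var())
```

The gamma-mixing step is one standard way to encode the 'customers arrive in correlated groups' idea from the answer; a plain Poisson would force variance equal to the mean.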
  • asked a question related to Bayesian Analysis
Question
2 answers
It's going to be a huge shift for marketers. Tracking identity is tricky at the best of times with online/offline and multiple channels of engagement - but when the current methods of targeting, measurement and attribution get disrupted, it's going to be extremely difficult to get identity right to deliver exceptional customer experiences whilst getting compliance right.
We have put out our framework, and initial results show promising measurement techniques, including advanced neo-classical fusion models (borrowed from the financial industry, and biochemical stochastic & deterministic frameworks), with Bayesian and state-space models applied to run the optimisations. Initial results are looking very good, and we are happy to share our wider thinking through this work with everyone.
Link to our framework:
Please suggest how would you be handling this environmental change and suggest methods to measure digital landscape going forward.
#datascience #analytics #machinelearning #artificialintelligence #reinforcementlearning #cookieless #measurementsolutions #digital #digitaltransfromation #algorithms #econometrics #MMM #AI #mediastrategy #marketinganalytics #retargeting #audiencetargeting #cmo
Relevant answer
Answer
Here are a few ideas about how marketers can do this:
• Encourage site login by better authenticated experiences or other consumer-oriented rewards to increase the number of persistent IDs.
• Create a holistic customer view by combining customer and other owned first-party data (e.g., web data) and establishing a persistent cross-channel customer ID.
• Allow customer segmentation, targeting, and measurement across all organizations and platforms. Measurement and audience control can be supported by integrating martech and ad tech pipes wherever possible.
  • asked a question related to Bayesian Analysis
Question
3 answers
I am doing a Bayesian comparison between two proportions, H0 being Proportion(Protein) > Proportion(Mixed). Here the proportion is of the number of times free-ranging dog(s) ate from a box (Protein, Mixed). Being a binary variable (eat from the box: yes/no), it is of the beta-binomial family. In experiment 1, I use a uniform prior: Beta(1,1). The number of successes (x) and the number of failures (y) for Protein and Mixed are as follows:
x1= 19, y1= 25; x2= 8, y2= 33
So my beta posterior shape parameters for Protein are (19+1, 25+1) and for Mixed are (8+1, 33+1).
I am planning to use these posterior parameters as priors for my current experiment. The two experiments have the same set-up; the only difference is the number of dogs (individuals vs groups). I don't expect to see a radical difference in results.
So my question is: can I use the previous beta posteriors as current priors in the way I have written them down, i.e. Beta(20, 26) for Protein and Beta(9, 34) for Mixed?
Current experiment info: x1=41, y1= 47; x2= 41, y2= 49
Information about me: I come from a biology background with minimal math and programming knowledge. I have been learning Bayesian statistics on my own for about 2 months now. I am willing to learn, but math-heavy explanations tend to go over my head. Feel free to point me towards relevant resources. I am using R to do the calculations.
Relevant answer
Answer
Yes of course.
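To make the answer concrete: the sequential conjugate update the question describes is just addition of counts. A sketch with the numbers given above:

```python
# experiment 1 posteriors, from a Beta(1, 1) prior: Beta(x + 1, y + 1)
protein_prior = (19 + 1, 25 + 1)   # Beta(20, 26)
mixed_prior = (8 + 1, 33 + 1)      # Beta(9, 34)

def update(prior, successes, failures):
    """Conjugate Beta-binomial update: add counts to the shape parameters."""
    a, b = prior
    return (a + successes, b + failures)

# current experiment: x1=41, y1=47 (Protein); x2=41, y2=49 (Mixed)
protein_post = update(protein_prior, 41, 47)   # Beta(61, 73)
mixed_post = update(mixed_prior, 41, 49)       # Beta(50, 83)

for name, (a, b) in [("Protein", protein_post), ("Mixed", mixed_post)]:
    print(name, (a, b), "posterior mean:", round(a / (a + b), 3))
```

Whether reusing the old posterior is sensible is a judgment call about how comparable the two experiments are (individuals vs groups); a softer option is to down-weight the old counts before adding them.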
  • asked a question related to Bayesian Analysis
Question
5 answers
I am a Ph.D. research scholar in forensic microbiology. I need to find clusters of 16S rRNA sequences across various people.
#clusteranalysis #forensics #microbiology #forensicmicrobiology
Relevant answer
Answer
DataCamp has a course on that (with easy R application), you can follow the first module for free :
If you want to dig into the theory. A great free textbook for pretty much everything in terms of data analysis is here :
Good luck,
  • asked a question related to Bayesian Analysis
Question
1 answer
Hi all. I am trying to run a Bayesian analysis and construct a phylogenetic tree using MrBayes with a set of 85 sequences and an alignment of 900 bp. I am having trouble getting convergence, and the analysis just keeps running. Here are my parameters. Does anyone have ideas for different parameters that I could use to reach convergence?
nst=6 rates = invgamma samplefreq=1000 nchains=4 stoprule=YES stopval=0.01 burninfrac=0.25 temp=0.25
Thank you!
Relevant answer
Answer
Convergence can, in many cases, take a practically infinite time to reach, depending partly on your hardware. So either change the parameters and try again, or stop the analysis after X iterations and report the specific parameters in your publication.
Further, no one can say which parameter to change to reach convergence; it is very specific to the data you are analysing.
  • asked a question related to Bayesian Analysis
Question
1 answer
I am doing a Bayesian analysis of the partial CprM region of dengue virus serotype 3 using BEAST v1.10.4. At the end of the analysis I have to calculate the tMRCA and the rate of nucleotide substitution. Do I need to specify the tip dates for each individual sequence while generating the xml file using BEAUti? Does the absence of tip dates affect the results? What if I proceed with the analysis leaving the tip dates out? Please suggest.
Relevant answer
Answer
Arshi Islam You have to use tip dates to be able to infer the tMRCAs
  • asked a question related to Bayesian Analysis
Question
10 answers
I have the following equation: eating choice (Yes/No) ~ box type (A/B/C). So the question is whether a certain box type affects eating choice in groups of dogs. The eating choices of each member in a group from one (two, or all three) of the boxes, combined, will give me the final choice of the group. I want to carry out a Bayesian logistic regression, so a Bernoulli likelihood with a conjugate Beta prior is to be used. I have data from a previous experiment (analysed with frequentist methods) on the same equation (EC ~ BT), but on individual dogs. So I have the following questions:
How do I find the correct priors to use for the groups-of-dogs equation in a Bayesian GLM? Do I run the data from my previous experiment (individual dogs) through a Bayesian GLM with weakly informative or non-informative priors and then use the upper and lower limits of the posterior distribution as the prior for my new data? If this is incorrect, what should I do? Feel free to point me towards relevant articles and resources.
Other relevant information: I am using the brms package in R. I have a biology background with minimal math experience. I am open to learning, but I struggle if articles/resources are math-heavy in their explanations.
Thanks
Relevant answer
Answer
Sounds reasonable, all of it
  • asked a question related to Bayesian Analysis
Question
3 answers
Hello, everyone,
One of the aims of my current study is to place a particular plant species within the context of the whole genus phylogenetically. The systematic positions of the other species in the genus are well known, but the species that I have in hand is rare and endemic, and its phylogenetic position is obscure.
I will launch a phylogenetic analysis of the DNA sequences of this plant species. I have a few DNA sequences of 5 markers. Please I need help and answers to the following points:
1- I did a BLASTn search, should I use Mega blast search instead?
2- If the retrieved list includes a plant species with several accessions, should I download all accessions or just one accession for each species is enough for the phylogenetic analysis?
3- What is the threshold of similarity percent to the query sequence I should select?
4- How many accessions needed to cover the diversity of the plant species under investigation?
5- Which phylogenetic analysis methodology suits systematic research, (i.e.) Bayesian analysis, Maximum likelihood, or something else?
Relevant answer
Answer
1- Any homology search tool should be fine; yes, it's better to use megablast, and even better to use protein sequences rather than nucleotide sequences. In this case I recommend DELTA-BLAST (Domain Enhanced Lookup Time Accelerated BLAST).
2- Generally, the more representative sequences you include, the more robust the tree you get. A more balanced representation of sequences across groups is also recommended.
3- There is no strict similarity cut-off, but it's better to go for 90% similarity or more, keeping the sequence coverage in mind.
4- I think this point was covered in #2.
5- Both Bayesian analysis and maximum likelihood are recommended.
  • asked a question related to Bayesian Analysis
Question
5 answers
I'm trying to establish a Bayes factor for the difference between two correlation coefficients (Pearson r). (That is, what evidence is there in favor of the null hypothesis that the two correlation coefficients do not differ?)
I have searched extensively online but haven't found an answer. I appreciate any tips, preferably links to online calculators or free software tools that can calculate this.
Thank you!
Relevant answer
Answer
Is it possible to do a Bayesian reanalysis from OR data, converted to r correlation values, to estimate the Bayes factor?
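One rough, calculator-style route for the original question (not any specific package's method; using the BIC/unit-information approximation, and the total N as the effective sample size, are both assumptions here): Fisher z-transform both correlations, form the usual z statistic for their difference, and convert it to an approximate BF01:

```python
from math import atanh, exp, sqrt

def approx_bf01_corr_diff(r1, n1, r2, n2):
    """Approximate BF01 (evidence for 'no difference') between two
    independent Pearson correlations, via Fisher z and the
    BIC/unit-information approximation BF01 ~ sqrt(N) * exp(-z^2 / 2)."""
    z_stat = (atanh(r1) - atanh(r2)) / sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return sqrt(n1 + n2) * exp(-z_stat ** 2 / 2)

# nearly identical correlations -> BF01 > 1, i.e. evidence for the null
print(approx_bf01_corr_diff(0.30, 100, 0.32, 100))
# clearly different correlations -> BF01 < 1, evidence against the null
print(approx_bf01_corr_diff(0.10, 100, 0.60, 100))
```

This is only a first pass; a dedicated Bayesian tool with an explicit prior on the correlation difference would be preferable for a publication-grade answer.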
  • asked a question related to Bayesian Analysis
Question
1 answer
I wish to know the difference between the BN and Markov models. In what types of problems is one better than the other?
In case of reliability analysis of a power plant, where equipment failures are considered, which model should be used and why?
Thank You!
Relevant answer
Answer
Dear Sanchit Saran Agarwal , Here is the answer
BAYESIAN
A Bayesian network, Bayes network, belief network, Bayes(ian) model or probabilistic directed acyclic graphical model is a probabilistic graphical model (a type of statistical model) that represents a set of random variables and their conditional dependencies via a directed acyclic graph (DAG). For example, a Bayesian network could represent the probabilistic relationships between diseases and symptoms. Given symptoms, the network can be used to compute the probabilities of the presence of various diseases.
MARKOV
In the domain of physics and probability, a Markov random field (often abbreviated as MRF), Markov network or undirected graphical model is a set of random variables having a Markov property described by an undirected graph. In other words, a random field is said to be Markov random field if it satisfies Markov properties.
A Markov network or MRF is similar to a Bayesian network in its representation of dependencies; the differences being that Bayesian networks are directed and acyclic, whereas Markov networks are undirected and may be cyclic. Thus, a Markov network can represent certain dependencies that a Bayesian network cannot (such as cyclic dependencies); on the other hand, it can't represent certain dependencies that a Bayesian network can (such as induced dependencies). The underlying graph of a Markov random field may be finite or infinite.
  • asked a question related to Bayesian Analysis
Question
1 answer
For a dynamic Bayesian network (DBN) with a warm spare gate having one primary and one back-up component:
If the primary component P is active at the first time slice, then its failure rate is lambda (P) and the failure rate of back up component S1 is [alpha*lambda (S1)].
If the primary component P fails at the first time slice, then its failure rate is lambda (P) and the failure rate of back up component S1 is [lambda (S1)].
My question is: the above are the conditional probabilities of the primary and backup components. In a DBN, a prior failure probability is also required. What will be the prior failure probability of the backup component? Will it be calculated using lambda(S1) or alpha*lambda(S1)?
Thank you
regards
Sanchit
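Whichever convention the DBN tool uses, the two candidate priors differ only in which rate enters the exponential. A sketch of both per-slice failure probabilities for the spare, under the usual exponential-lifetime assumption (the rates, dormancy factor and slice length below are hypothetical):

```python
from math import exp

lam_s1 = 0.002   # active failure rate of spare S1 (per hour, hypothetical)
alpha = 0.1      # dormancy factor for the warm spare
dt = 100.0       # length of one time slice (hours, hypothetical)

def p_fail(rate, dt):
    """Probability an exponential component fails within one slice."""
    return 1.0 - exp(-rate * dt)

p_dormant = p_fail(alpha * lam_s1, dt)   # spare standing by (primary working)
p_active = p_fail(lam_s1, dt)            # spare carrying the load

print(p_dormant, p_active)  # the dormant probability is the smaller of the two
```

Since the primary is assumed active at the first slice, the dormant-rate value is the one consistent with the conditional table described in the question; but that choice should be checked against the specific DBN formulation being used.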
Relevant answer
Answer
  • asked a question related to Bayesian Analysis
Question
4 answers
I am completing a Bayesian Linear Regression in JASP in which I am trying to see whether two key variables (IVs) predict mean accuracy on a task (DV).
When I complete the analysis, for Variable 1 there is a BFinclusion value of 20.802, and for Variable 2 there is a BFinclusion value of 1.271. Given that BFinclusion values quantify the change from prior inclusion odds to posterior inclusion odds and can be interpreted as the evidence in the data for including a predictor in the model, can I directly compare the BFinclusion values for each variable?
For instance, can I say that Variable 1 is approximately 16 times more likely to be included in a model to predict accuracy than Variable 2? (Because 20.802 divided by 1.271 is 16.367 and therefore the inclusion odds for Variable one are approximately 16 times higher).
Thank you in advance for any responses, I really appreciate your time!
Relevant answer
Answer
If you performed the analyses separately, this can be reported as an indirect inference, although on this evidence the second predictor does not have a meaningful effect. If it is significant and you ran the regression with both predictors included, you could point to a possible mediating effect of the first predictor.
  • asked a question related to Bayesian Analysis
Question
4 answers
Hi there,
I have ddRADseq data, with RAD loci assembled de novo using the STACKS pipeline, for two species and their hybrids. As such, I was wondering how I could estimate Fst for each of those putative loci without using BayeScan, because many of my samples are hybrids, which violate the model assumptions of the program.
I have seen a method mentioned which uses the 'Bayesian implementation of the F-model' by Gompert et al. 2012, but I am unsure how to put this into practice. I have also tried using OutFLANK, but it needs a dataset containing no missing data.
Thanks a lot.
Rowan
Relevant answer
Answer
Hey! This is a super old question, but here is my answer in case someone else stumbles upon it.
Fst per locus can be calculated directly with STACKS in the populations module/pipeline. Then, you can use that output to run OutFLANK in R, either using this script and your Fst values: https://github.com/whitlock/OutFLANK/blob/master/R/OutFLANK.R
or run OutFLANK on a genind object, which you can convert in R from a genepop or structure input file using the import2genind function, and then use this other script to calculate all the parameters required from there: https://rpubs.com/lotterhos/outflank
Hope this helps! PS: I prefer the second...
  • asked a question related to Bayesian Analysis
Question
5 answers
I am having a difficult time modeling a binary dataset that my team has gathered. It is a dataset of different prescriptions per each patient and an outcome of a certain event.
The goal is to determine if certain prescriptions lead to decreased rates of alerts.
However, there are two main problems:
The data are fully binary, without much overlap or interaction between columns.
There is a lot of noise in the data, and much of whether the alert occurs appears to be due to chance.
I have run the data through different algorithms to determine any trends or factors that may be important, but have not been very successful.
I was thinking of doing a simple Bayesian analysis to determine whether each setting has an impact on the outcome of the alerts, but would love to be able to involve many features if possible.
Is there anything I am missing or that I could try to determine the influence of different treatments?
Relevant answer
Answer
Take a look at this nice material about logistic regression (https://gking.harvard.edu/files/gking/files/1s.pdf) or at books addressing the topic (e.g., Applied Logistic Regression by Hosmer & Lemeshow). In these materials you'll find how to treat the output of your model in order to obtain estimates for the measures I mentioned earlier.
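As a minimal sketch of what those materials walk through, here is how logistic-regression output is turned into odds ratios and predicted probabilities. The coefficient values are made up for illustration, not taken from the questioner's data:

```python
import math

# Hypothetical fitted coefficients on the log-odds scale
# (illustrative values only, not from any real fit)
b0 = -1.2   # intercept
b1 = 0.8    # effect of a binary treatment indicator

# Odds ratio for the treatment: exp(coefficient)
odds_ratio = math.exp(b1)

def predicted_prob(x):
    """Inverse-logit: convert log-odds back to a probability."""
    log_odds = b0 + b1 * x
    return 1.0 / (1.0 + math.exp(-log_odds))

# Predicted probability of the alert for untreated (x=0) vs treated (x=1)
p_untreated = predicted_prob(0)
p_treated = predicted_prob(1)
print(round(odds_ratio, 3), round(p_untreated, 3), round(p_treated, 3))
```

The same back-transformation applies regardless of which package produced the coefficients.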
Indeed, the structure of your data is tricky, and it might be necessary to gain more information, if possible. With much noise in the data, you may obtain misleading model-based results for the treatment effects.
Good luck!
  • asked a question related to Bayesian Analysis
Question
8 answers
Hi, 
I'm performing some Bayesian estimations in AMOS for censored data. For this kind of data, Bayesian estimation is actually the only option available in AMOS. However, I find no way to compare different models, since DIC is not provided when data are censored (i.e. non-numerical data).
The only information I get about model fit is a value of "p" (posterior predictive), which according to the User Guide should be around 0.5.
So:
1) Can this "p" be used to compare between models (the closer to 0.5, the better the model...)? In my particular case I'm running a full model and I get a p of 0.23, but  don't know if this is good enough.When I run alternative model I get lower "p", in some cases even 0.00, which I interpret as a model not supported by the data.
2) Another idea that came to my mind is to use bayesian imputation of the censored data, so I get a clean (all numeric) dataset, and then use it to perform bayesian estimation of the parameters, obtaining in this case a value of DIC since all data are numerical now.
What is your opinion?
Thanks
Relevant answer
Answer
I am not very familiar with SEM using censored data, but I guess the PPP values of the models can be compared, and a higher PPP will indicate a better model. But frankly speaking, I am not sure how much this will help.
  • asked a question related to Bayesian Analysis
Question
2 answers
I performed a Bayesian paired sample t-test by using JASP (Great tool by the way). In the results' table, when I have a BF10 which is really large (e.g., BF10 = 186), then the Error% produces a NaN. In other comparisons, where the BF is smaller the Error% shows a number. Though, this value of Error% is really small (e.g., ~ 9.875e -5 ).
Relevant answer
Answer
Solved. The problem was clearly the software (JASP) and not the analyses. The problem derives from the fact that "the one-sided BF is calculated by departing from the two-sided BF and then adding a correcting factor. The correction factor is close to its maximum value, which might produce the problem." So it is indeed due to the large BF and the specificity of the H1. Nevertheless, the most recent version of JASP is able to calculate even smaller Error%. I tried version 0.11.1.0 of JASP and the problem in my analyses is sorted: I received extremely small values of Error% (e.g., ~ 3.643e-24) instead of NaN. Since the values are so close to 0, one could report simply Error% ~0 (approximately zero). In the JASP forum, I noticed that even the recent version may return a NaN for Error%, which indicates that the t-value is really large (as is the BF value) and that the Error% is really small (i.e., close to zero). If someone faces similar problems with JASP, you may visit the following forum: https://forum.cogsci.nl/index.php?p=/categories/jasp-bayesfactor/p5
  • asked a question related to Bayesian Analysis
Question
3 answers
While executing a nexus file in MrBayes for phylogenetic analysis, if ntax=40, and out of these 40 taxa some have one value of nchar (characters) and others have a different value (based on the sequence length), is there any way to execute such a file? If yes, what should the <nchar=...> command contain for MrBayes to read that nexus file?
Please help.
Relevant answer
Answer
The input for MrBayes (or any phylogeny reconstruction algorithm) is a set of aligned sequences, and as such, all of them must have the same number of characters, regardless of their original lengths (the shorter sequences will be filled with gaps, typically represented with the dash symbol). So you need to align your sequences as a first step (using MAFFT, for example), and then convert the output (which is typically in FASTA format) to the NEXUS format. For this step you can use EMBOSS or any online FASTA/NEXUS converter such as:
Then, use the NEXUS file as input for MrBayes. You will notice that all of your sequences have the same length once aligned.
I hope that helps.
Best regards.
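For the conversion step, a minimal hand-rolled converter can also do the job if no tool is at hand. This sketch assumes the FASTA is already aligned (equal sequence lengths) and writes a bare-bones NEXUS data block; details such as the datatype line may need adjusting for your data:

```python
def fasta_to_nexus(fasta_text: str) -> str:
    """Convert an already-aligned FASTA string to a minimal NEXUS data block.

    Assumes the alignment step (e.g. with MAFFT) has already been done,
    so all sequences have the same length.
    """
    seqs = {}
    name = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]  # taxon name = first word after '>'
            seqs[name] = []
        else:
            seqs[name].append(line)
    seqs = {k: "".join(v) for k, v in seqs.items()}

    lengths = {len(s) for s in seqs.values()}
    if len(lengths) != 1:
        raise ValueError("Sequences are not aligned: unequal lengths")

    ntax, nchar = len(seqs), lengths.pop()
    lines = [
        "#NEXUS",
        "begin data;",
        f"  dimensions ntax={ntax} nchar={nchar};",
        "  format datatype=dna missing=? gap=-;",
        "  matrix",
    ]
    for taxon, seq in seqs.items():
        lines.append(f"    {taxon}  {seq}")
    lines += ["  ;", "end;"]
    return "\n".join(lines)
```

The MrBayes command block then goes after this data block in the same file.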
  • asked a question related to Bayesian Analysis
Question
4 answers
I am estimating a multiple regression model (with one 2-way interaction) using Maximum Likelihood estimation (MLE). Due to substantial missingness on some important covariates (30-60% missing; n=19000), I estimated the multiple regression model using two missing data treatments (listwise deletion, multiple imputation). These methods, however, produced different results - for example, the interaction was significant when using multiple imputation, but not with listwise deletion.
Is there a method/test to evaluate which approach (listwise deletion or multiple imputation) is more trustworthy? In papers I've read/reviewed, people often seem to find concordance between their model coefficients when using listwise deletion and multiple imputation.
Also, for those interested, these models were estimated in Mplus, and I implemented a multiple imputation based on bayesian analysis to generate imputed datasets followed by maximum likelihood estimation.
Thanks much,
Dan
Relevant answer
Answer
Hello Dan,
Yes; different tactics for addressing missing data frequently yield inconsistent results.
Usually the first order of business in trying to select a suitable method is to determine whether the data appear to be missing completely at random (mcar), missing at random (mar), or to have systematic relationships with presence/absence of data points.
Having said that, I suspect the multiple imputation approach you used is likely to be more warmly received than a listwise deletion approach that costs you 60% or more of the data set.
Good luck with your work.
  • asked a question related to Bayesian Analysis
Question
3 answers
I am conducting a population genetic study on several species of fishes using microsatellite alleles. I have used BayesAss (http://www.rannala.org/software/) to estimate migration percentages to and from each population in my datasets for each species. Superficially, the outputs provided seem correct and adhere to preconceived hypotheses. However, the output data do not provide me with p-values to indicate whether the estimates are statistically significant. For each estimate (mean) the SD is provided in the output file, and after reading the generated trace file (https://www.beast2.org/tracer-2/) several other outputs are calculated (SE, SD, variance, upper and lower 95% CIs), but no p-value. Could anyone help me either a) identify the p-value in the output that I am somehow missing, or b) calculate a p-value from the outputs provided? Unfortunately, statistics aren't my strong suit, so I feel like the answer might be obvious given the results I have, but I can't figure it out from what I've looked up so far.
Thanks!
Relevant answer
Answer
Hi,
First off: I assume you have decided on the "best" run using the trace files (reviewed, for example, in the Tracer software), as well as a comparison of runs by their Bayesian deviance, which is easily calculated with the R script provided by Meirmans (2013) himself (see the article below).
If so, you can calculate a 95% confidence interval as an alternative to a p-value (the concept of p-values has actually come under increasing critical review lately) by multiplying the standard deviation by 1.96. You will find various good explanations online if you search for "calculate confidence interval from standard error". A common approach is then to interpret migration rates as significant if their 95% confidence interval does not include zero.
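The zero-exclusion check can be sketched like this, with made-up numbers for one migration rate and assuming the posterior is roughly normal (if Tracer already reports a 95% interval for the estimate, use that directly instead):

```python
# Hypothetical BayesAss output for one migration rate
# (illustrative values only, not from the questioner's run)
mean_m = 0.10   # posterior mean migration rate
sd_m = 0.02     # posterior standard deviation from the output file

# Approximate 95% interval: mean +/- 1.96 * SD
lower = mean_m - 1.96 * sd_m
upper = mean_m + 1.96 * sd_m

# Common interpretation: "significant" (non-zero) if the interval excludes 0
excludes_zero = lower > 0
print(round(lower, 4), round(upper, 4), excludes_zero)
```

Here the interval (0.0608, 0.1392) excludes zero, so this rate would be reported as significantly different from zero under that convention.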
Hope this helps,
Cheers
  • asked a question related to Bayesian Analysis
Question
2 answers
I am currently using MEGA, but I find that it differs from the analysis methods commonly used in the literature. I would like to ask how to run a Bayesian analysis with PAUP or other software.
Relevant answer
Answer
PAUP does not do Bayesian analysis. It does maximum likelihood optimization of the parameters and trees. Bayesian programs (e.g., MrBayes or BEAST) optimize both the model parameters and the trees at the same time using Metropolis-coupled Markov chain Monte Carlo (MCMCMC) searches. So they are quite different.
I would choose ML optimization any time over Bayes, because Bayesian analysis depends on choosing very good priors, which is difficult even for the best analysts.
  • asked a question related to Bayesian Analysis
Question
4 answers
Dear all,
I am working on my doctoral thesis and trying to fit a generalized linear mixed effects model using the 'MCMCglmm' package in R; this is actually the first time I have worked with it. I have repeatedly read Jarrod's tutorial materials and papers, and they are very helpful for understanding the MCMCglmm method. However, there are still some problems with the prior specification that I have failed to figure out. I have been working on them for a couple of weeks but cannot solve them.
In my research, the dependent variable is the number of people participating in a household energy conservation program (a count outcome). It has been measured daily over approximately three years for each of 360 communities (the data are thus quite big: n = 371,520). In addition, these communities are located in different districts (90 in total). Thus, the longitudinal daily count data are nested within communities, which are nested within districts. My research aims to investigate which time-variant and time-invariant factors influence the (daily) number of participants in such a program. The basic model is an (over-dispersed) Poisson model; the code is as follows.
# load the data
load("dat.big.rdata")
#the requisite package
require(MCMCglmm)
#give the priors
prior.poi <- list(
  R = list(V = diag(1), nu = 0.002, n = 0, fix = 1),
  G = list(
    G1 = list(V = diag(3) * 0.02, nu = 4),
    G2 = list(V = diag(3) * 0.02, nu = 4)
  )
)
#fit the model
model.poi <- MCMCglmm(y ~ 1 + t + x + x:t + t2 + t3 + t4 + c1 + c2 + c3 + d1 + d2 + d3,
random = ~ us(1 + t + x):no_c + us(1 + t + x):no_d,
rcov = ~idh(1):units,
family = "poisson",
data = dat.big,
prior = prior.poi,
burnin = 15000, nitt = 65000, thin = 50,
pr = T, pl = T)
In the fixed effects part, ‘y’ is the count outcome; ‘t’ measures time in elapsed days since the start of the program; ‘x’ is another behavior intervention implemented for some communities. ‘t2 ~ t4’ are other time-variant factors (i.e. dummies measuring weekend and public holiday, and log term of average daily temperature); ‘c1 ~ c3’, and ‘d1 ~ d3’ measure the community and district-level characteristics respectively, which are time-invariant variables (e.g. total population, area size). In the random effects part, ‘no_c’ and ‘no_d’ are the record number of each community and district.
Since there are many excess zeros in my data, I further ran a hurdle (over-dispersed) Poisson model, as follows.
#give the priors
prior.hp <- list(
  R = list(V = diag(2), nu = 0.002, n = 0, fix = 1),
  G = list(
    G1 = list(V = diag(6) * 0.02, nu = 7),
    G2 = list(V = diag(6) * 0.02, nu = 7)
  )
)
#fit the model
model.hp <- MCMCglmm(y ~ -1 + trait + trait:t + trait:x + trait:x:t + trait:t2 + trait:t3 + trait:t4 + trait:c1 + trait:c2 + trait:c3 + trait:d1 + trait:d2 + trait:d3,
random = ~ us(trait + trait:t + trait:x):no_c + us(trait + trait:t + trait:x):no_d,
rcov = ~idh(trait):units,
family = "hupoisson",
data = dat.big,
prior = prior.hp,
burnin = 15000, nitt = 65000, thin = 50,
pr = T, pl = T)
Both the OD and hurdle Poisson models work well only when 'fix = 1' is added to the R-structure of the prior specification. When it is removed from the priors, both models return the error message "Mixed model equations singular: use a (stronger) prior" and stop running. This error does not disappear regardless of whether parameter expansion is used in the G-structure (that is, alpha.mu=rep(0, 3), alpha.V=diag(3)*25^2 for the OD Poisson model, and alpha.mu=rep(0, 6), alpha.V=diag(6)*25^2 for the hurdle model) or whether other elements in the R-structure are removed/adjusted.
In the hurdle Poisson model, since the covariance matrix for the zero-alteration process cannot be estimated, 'fix = 2' should be used in the R-structure rather than 'fix = 1'. However, the model could not run well unless the residual variance for the zero-truncated Poisson process was fixed at 1, as described above.
My question is: is it appropriate to fix the residual variance for both the zero-alteration and Poisson processes at 1 in the R-structure? Is that too 'informative' for my model estimation? Are there any other priors I could use to make the model run well?
Thanks for any idea about these questions.
Relevant answer
Answer
Thank you all for these suggestions. I will use this prior and the MCMCglmm method in my analysis for the time being, and I will try brms later. Thanks again!
  • asked a question related to Bayesian Analysis
Question
16 answers
Question edited for clarity:
My study is an observational two-wave panel study involving a single-group sample with different levels of baseline pre-outcome measures.
There are three outcome measurements that will be measured two times (pre-rest and post-rest):
1. Subjective fatigue level (measured by a visual analog scale - continuous numerical data)
2. Work engagement level (measured by a Likert scale - ordinal data)
3. Objective fatigue level (mean reaction time in milliseconds - continuous numerical data)
The independent variables consist of different types of data, i.e. continuous numerical (age, hours, etc.), categorical (yes/no, role, etc.) and ordinal (Likert scale).
To represent the concept of recovery, i.e. unwinding of the initial fatigue level, I decided to measure recovery as the difference between the pre- and post-measures for each outcome; these difference scores are operationally defined as recovery levels (subjective recovery level, objective recovery level and engagement recovery level).
I would like to determine whether the independent variables significantly predict each outcome (subjective fatigue, work engagement and objective fatigue).
Currently I am thinking of these statistical strategies. Kindly comment on whether they are appropriate.
1. Multiple linear regression; however, one outcome measure, i.e. work engagement, is ordinal data.
2. Hierarchical regression / hierarchical linear modelling / multilevel modelling, but I am not very familiar with the concepts, assumptions or other aspects of these methods.
3. I would consider reading on beta regression (sorry, this is my first time reading about this method).
4. Structural Equation Modelling.
- Can the three different types of fatigue measurement act as indicators of a latent outcome construct of Fatigue?
- Can the independent variables consist of a mix of continuous, categorical and ordinal data?
Thanks for your kind assistance.
Regards,
Fadhli
Relevant answer
Answer
Hello Fadhli,
Don't worry about your grammar. I should have been more careful, too.
I think you have enough people for your analyses.
I am attaching an article from a highly respected methodologist / statistician that should reassure you about your work engagement variable being able to be regarded as at the equal-interval level for purposes of analysis. It has some highlighting that I placed in it. I hope that's OK.
Robt.
  • asked a question related to Bayesian Analysis
Question
4 answers
Hello everyone.
My question is about the interpretation of the output obtained from an online calculator of Bayes factors. I am using the calculators provided by the Perception and Cognition Lab of the University of Missouri.
The reference paper for these calculators is Rouder et al. (2009), attached.
What I am not being able to understand is how to interpret the outputs in which the calculator states that the JZS factor is in favour of the null hypothesis. For example, in the attached output I obtain a JZS factor of 3.7 "in favour of the null hypothesis".
In other cases I find equally big or bigger JZS values "in favour of the alternative hypothesis" (also attached).
I think that this might be because the calculator reports the reciprocal of the JZS, depending on whether it is supporting one or the other hypothesis. So in both cases, the output of the calculator tells you which hypothesis is favoured (H0 or H1), and the higher the number the stronger the support for this hypothesis. Is that correct?
If this is the case, a JZS factor of 3.7 in favour of the null should mean that the data are 3.7 times more likely under the null than under the alternative...
Can anybody confirm if I understood it correctly?
Thanks!
Relevant answer
Answer
Thank you again!
  • asked a question related to Bayesian Analysis
Question
5 answers
Dear Researchers/Scholars,
Suppose we have time series variables X1, X2 and Y1, where Y1 depends on the other two. They are more or less linearly related. Data for all these variables are given from 1970 to 2018. We have to forecast values of Y1 for 2040 or 2060 based on these two variables.
What method would you suggest (other than a linear regression)?
We have the fact that these series have shown a different pattern since 1990. I want to use this 1990-2018 data as prior information and then find a posterior for Y1. Now, please let me know how to assess this prior distribution.
Or any other suggestions?
Best Regards,
Abhay
Relevant answer
Answer
Let me play the devil's advocate:
You have data for the past 50 years. However, you say that there is a major break or change in the pattern around 1990, so you want to use only the more recent 30 years ... to predict what will happen 30 or 50 years in the future?
I doubt that this makes any sense. Toss some dice. It will be as reliable as your model predictions.
If "phase changes" like around 1990 can happen, they can happen in the future, too. Additionally, many other things can happen that we are not even aware of today. The uncertainty about such things must be considerd. Further, as you don't have any model that might be justifed by subject matter arguments, there is a universe of possibilities, again adding to the uncertainty of the prediction. If you consider all this, you will almost surely find that the predcition interval 30 or 50 years ahead will be so wide that it can't have any practical benefit.
You can surely grab one possible model, select some subset of your data, and neglect anything else, then you can make a possibly sufficiently precise forecast, which applies to this model fitted on this data, assuming that nothing else happens or can impact the dependent variable. Nice. But typically practically useless. It's a bit different when you has a model, based on a theory. Then you could at least say that this theory would predict this and that. But if you select a model just because the data looks like it's fitting, you actually have nothing.
It's important to think about all this before you invest a lot of work and time in such projects! It may turn out, in the end, that your approach is still good and helpful. But many such "data-driven forecast models" I have seen in my life have benn completely worthless, pure waste. Good enough to give a useful forecast for the next 2-3 years, but not for decades.
  • asked a question related to Bayesian Analysis
Question
4 answers
Hello. I have a little question about Bayesian analysis. The outgroup I designated in my work appears inside the tree I created with Bayesian analysis. Is this normal? Or how can this be explained?
Note: I am using MrBayes for analysis and reviewing the trees with Figtree.
Relevant answer
Answer
I think what you are saying is that the outgroup taxon you included in the analysis is not the outgroup in the displayed consensus tree from your Bayesian analysis (i.e., it is not sister to all of the other taxa in the tree).
If that is the case, then consider the following:
1) Unless you explicitly tell MrBayes which taxon is the outgroup, the consensus tree will be displayed with an arbitrary rooting (or possibly a longest-branch rooting, depending on the settings/version, I think). This is because....
2) If you are using the typical GTR-style likelihood function in MrBayes (i.e., based on a time-reversible matrix model of evolution), then the outgroup can't be inferred by the program because all possible rootings of a given phylogenetic tree (topology + branch lengths) will map to the exact same likelihood score (that's what "time-reversible" means in this context). Note that all "other" fixed-matrix models (K2P, HKY, etc. for DNA; JC, WAG, JTT, etc. for proteins) are also time-reversible, and GTR is just the "generalized" form of this type of model.
If you are really confident that your chosen outgroup is the true outgroup (e.g., you are analyzing orthologous animal genes and your chosen outgroup is an orthologous fungal gene), then it's probably fine to manually re-root the phylogeny accordingly (e.g., in FigTree).
  • asked a question related to Bayesian Analysis
Question
2 answers
Hi,
I am running BAMM on different Newick-format chronograms. While all other trees run perfectly fine, I am getting some strange results when using one particular tree. This tree originally had a polytomy, which I could not resolve using the multi2di function in R, so I changed branch lengths in the raw text file. Because I could not change branch lengths manually and maintain the tree perfectly ultrametric, I then used the force.ultrametric function to fix it. However, I do not think this is the source of my problem, because the results are off all across the tree, not just in the clade I modified.
Once I run BAMM, the analysis itself is much slower than with my other, much bigger trees (for the control file and run info see attachments). The resulting event.data file is huge (65 MB) and the plot.bammdata visuals are a mess (see attachments Result1 and Result3). It looks to me as if it does not recognise branches and/or nodes correctly, so it plots rate shifts on branches instead of nodes and, for some reason, plots hundreds of them.
If anyone has seen anything like this before I would greatly appreciate your help. If you need any additional information please contact me. Thanks in advance,
Eva Turk
Relevant answer
Answer
Thanks for your comment, but I actually already figured out what the problem was. One of the parameters, updateRateLambdaShift, was set to 0 instead of 1.
  • asked a question related to Bayesian Analysis
Question
1 answer
Hello,
I seem to be having issues with convergence in my Bayesian analysis. I'm using a large single-gene dataset of 418 individuals. My PSRF values say N/A in my output, but my split frequency is 0.007. Also, my consensus tree gives me posterior probabilities of 0.5 or 1 with no distinguishable clades (see attached). Below is my Bayes block:
begin mrbayes;
charset F_1 = 1 - 655\3;
charset F_2 = 2 - 656\3;
charset F_3 = 3 - 657\3;
partition currentPartition = 3: F_1, F_2, F_3;
set partition = currentPartition;
lset applyto=(1) nst=6 rates=gamma;
lset applyto=(2) nst=2 rates=invgamma;
lset applyto=(3) nst=6 rates=gamma;
unlink statefreq=(all) revmat=(all) shape=(all) pinvar=(all);
prset applyto=(all) ratepr=variable;
mcmc ngen= 24000000 append=yes samplefreq=1000 nchains=8;
sump burnin = 10000;
sumt burnin = 10000;
end;
Any advice? Thanks!
Relevant answer
Answer
You have a fairly large dataset, so I would try more generations. On the other hand, I would run a model test for the 1st and 2nd codon positions, as they often turn out to have the same model when tested (I don't have much experience, but that is what I have seen).
I am not sure you need to unlink the 3 partitions and then set the rate priors as variable. Have you tried removing:
unlink statefreq=(all) revmat=(all) shape=(all) pinvar=(all);
prset applyto=(all) ratepr=variable;
Finally, I don't see where in the code you set the 2 independent runs (nruns = number of independent analyses with the same dataset and script), so I guess you ran 8 chains in a single run. So how come they don't converge? All the chains are dependent.
Try adding nruns=2 (usual for Bayesian analyses); each run has 4 chains by default, so no need to set that up.
mcmc ngen=24000000 append=yes samplefreq=1000 nruns=2;
  • asked a question related to Bayesian Analysis
Question
3 answers
I am running BAPS v6.0 to explore population structure in my study species. I am performing a population mixture analysis using the spatial clustering of individuals module. As input, I have a multiple sequence alignment in FASTA format and a text file with coordinates of sampling locations of my individuals.
Before running the analysis, I have set the output file in the "File" menu in the GUI. According to the manual, numerical results should then be written to a .txt file in this directory.
The analyses run fine, and the results appear in the console. However, they are not written to a .txt file in the directory specified above, and searching my computer for the expected file name turns up nothing, so it is not simply being written somewhere else.
I wonder if anybody else has encountered this problem?
Relevant answer
Answer
Debasish B Krishnatreya Thanks very much, I will give that a go!
  • asked a question related to Bayesian Analysis
Question
3 answers
I have the NEXUS file of Sequences.
Relevant answer
Answer
First, you should convert the MAS file to a NEXUS file and save it in the same folder as MrBayes.
RECOMMENDATION:
- Place the outgroup taxon in the first position
- Choose a short file name
Second, open the file and paste the following command block after the sequences; it runs a GTR+G+I model for a nuclear dataset. You should modify some commands based on your data (I have put them in [...]).
Note: there are many details, so read the manual carefully to learn how to change the commands. Since you want to analyze a ribosomal marker, choosing a suitable DNA model (the nucmodel command) is crucial.
begin mrbayes;
log start filename= [same as nexus file's name].log;
outgroup 1;
set autoclose=yes nowarn=no;
lset coding=all nucmodel= 4by4 nst=6 rates=invgamma ngammacat=4;
prset Revmatpr=dirichlet(1,1,1,1,1,1) statefreqpr=dirichlet(10,10,10,10) shapepr=exponential(0.05) pinvarpr=uniform(0,1) topologypr=uniform Brlenspr=unconstrained:exp(100.0) ratepr=variable;
mcmcp filename=[PREFERRED NAME] nchains=4 nruns=2 samplefreq=100 printfreq=1000 savebrlens=yes mcmcdiagn=yes diagnfreq=1000 stoprule=YES stopval=0.01 minpartfreq=0.05 relburnin=yes burninfrac=.25;
mcmc;
sump burninfrac=.25;
sumt burninfrac=.25;
log stop;
end;
Third: open Mrbayes and write:
exe [FILE NAME].nex
Now you can have a coffee!
  • asked a question related to Bayesian Analysis
Question
2 answers
Actually I am building a phylogenetic tree using Bayesian analysis, but unfortunately I can't run jModelTest on my computer (Windows 7), so in fact I am building my trees without first determining the best-fit model for the gene. Is it possible to do that? Is it necessary to use the best-fit model?
Thanks very much for the response.
Relevant answer
Answer
Hi Renato, as they say above, it is imperative to obtain a model before running the Bayesian analysis. I personally use PartitionFinder; it is quite easy to use and very fast. If I can help you with anything, gladly.
Regards
  • asked a question related to Bayesian Analysis
Question
8 answers
Dear Colleagues,
I am currently working on a research project that uses Bayesian analysis techniques to estimate the parameters of a distribution whose pdf is, to some extent, complicated. It occurred to me that transforming the distribution into another one whose pdf is simpler to study might help in estimating the parameters of the baseline distribution.
My question is: Is this idea known and used by statisticians to estimate parameters in the Bayesian framework?
Thanks in advance for your fruitful ideas.
Mohamed I. Riffi
Relevant answer
Answer
Mohamed,
There is no problem with such a transformation in principle. It is routine in some applications - for example, when fitting a lognormal distribution the model is based on log-transformed data and the parameters of the lognormal are subsequently inferred. Priors and posteriors for the parameters in your model imply priors and posteriors for the parameters that are the subject of inference. This is a particularly desirable property, as it allows you to specify a statistical model that is appropriate for your data without being unduly constrained by a standard frequentist/classical inference framework.
Some care should be taken when specifying prior distributions, especially if these are intended to be non-informative for the parameters that are the subject of interest (rather than those in your statistical model). It’s easier and indeed preferable if you have data to inform informative priors either directly on the parameters or on functions of your parameters.
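The lognormal example above can be sketched as follows: fit on the log scale, where the model is an ordinary normal, then back-transform the estimates to quantities of the original distribution (simulated data, illustration only):

```python
import math
import random
import statistics

# Simulate data from a lognormal (parameters chosen for illustration)
random.seed(42)
true_mu, true_sigma = 1.0, 0.25
data = [random.lognormvariate(true_mu, true_sigma) for _ in range(5000)]

# Fit on the transformed (log) scale, where the model is a plain normal
logs = [math.log(x) for x in data]
mu_hat = statistics.fmean(logs)
sigma_hat = statistics.stdev(logs)

# Back-transform to a quantity of the original distribution,
# e.g. the lognormal median exp(mu)
median_hat = math.exp(mu_hat)
print(round(mu_hat, 3), round(sigma_hat, 3), round(median_hat, 3))
```

In a Bayesian version, a prior placed on (mu, sigma) of the normal on the log scale induces a prior on the lognormal's parameters in exactly the way the answer describes.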
  • asked a question related to Bayesian Analysis
Question
11 answers
I recorded data from 17 subjects, 600 trials each. I fitted the data with a linear (2-parameter) model and an exponential (3-parameter) model. Overall, R-square for the exponential model was better and SSE was lower. I calculated AIC, BIC and HQC for both models. Averages over the 17 subjects are:
Exponential: 17.700531 (AIC), 14.388660 (BIC), 11.232778 (HQC), 0.829579 (R-square)
Linear: 19.355811 (AIC), 16.375185 (BIC), 14.153079 (HQC), 0.723486 (R-square)
Is it valid to select the exponential model?
Relevant answer
Answer
Balima Larba Hubert, statistics like AIC, AICc, and BIC are valid for comparing models with different numbers of parameters, as long as the same data are used for each model, the dependent variable is the same, the dependent variable isn't transformed, and some other conditions hold.
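One common way to put the AIC comparison on a probability-like scale is Akaike weights. A sketch using the averaged values from the question (assuming averaging per-subject AICs is acceptable for this design; summing per-subject AICs is another common choice):

```python
import math

# Average AIC values from the question
aic = {"exponential": 17.700531, "linear": 19.355811}

# Akaike weights: relative support for each model within the candidate set
min_aic = min(aic.values())
rel_lik = {m: math.exp(-0.5 * (a - min_aic)) for m, a in aic.items()}
total = sum(rel_lik.values())
weights = {m: r / total for m, r in rel_lik.items()}
print({m: round(w, 3) for m, w in weights.items()})
```

Here the exponential model carries roughly 70% of the weight, which favours it but is far from decisive on AIC alone; the consistent direction of BIC, HQC and R-square strengthens the case.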
  • asked a question related to Bayesian Analysis
Question
11 answers
I want to find the Bayes estimate under some loss function using an informative prior distribution such as a gamma prior, but I don't know the criteria and procedure for selecting the values of the hyperparameters of the prior distribution.
Is there anyone here who can guide me in choosing the hyperparameter values?
Thank you
Relevant answer
Answer
To obtain good values for the hyperparameters of the prior distribution, I suggest reading the paper "Kundu, D. (2008). Bayesian inference and life testing plan for the Weibull distribution in presence of progressive censoring. Technometrics, 50(2), 144-154". In that paper the author uses the prior mean and variance to choose the hyperparameter values.
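A common version of that approach is to choose the hyperparameters so the prior matches an elicited mean and variance. For a Gamma(shape a, rate b) prior the inversion is closed-form; a small sketch:

```python
def gamma_hyperparams(prior_mean, prior_var):
    """Shape a and rate b of a Gamma(a, b) prior with the given mean and
    variance. Mean = a/b and variance = a/b**2, so b = mean/var and
    a = mean**2/var."""
    rate = prior_mean / prior_var
    shape = prior_mean ** 2 / prior_var
    return shape, rate

# E.g. an elicited prior mean of 2.0 with variance 0.5:
print(gamma_hyperparams(2.0, 0.5))  # -> (8.0, 4.0)
```

The same mean/variance matching works for other two-parameter families (beta, inverse-gamma, normal) with the appropriate moment formulas.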
  • asked a question related to Bayesian Analysis
Question
6 answers
Hello all,
I cannot get MrBayes to install on my Mac. Each time I download the Mac version, what I get is a file in my downloads folder with no installation capability. The user manual says it should contain an installer, but none comes up. I downloaded BEAGLE separately and a CUDA driver per instructions I found on a GitHub help page, then re-downloaded MrBayes, but again no installer appears. It definitely isn't installed, as the mb command in my terminal window is still unrecognized. I tried downloading through GitHub, but this didn't work either. Has anyone had this issue? Admittedly, I know absolutely nothing about coding and fear having no GUI interface, but I understand taxonomy and I require a Bayesian analysis.
Thanks!
Relevant answer
Answer
I had the same problem; the SourceForge link above has moved its version of MrBayes!
So I downloaded the second-to-last version, 3.2.5 for Mac, from this link and it seems to work.
Good luck!
  • asked a question related to Bayesian Analysis
Question
7 answers
Dear Researchers,
I have a single point for my parameter as prior information and 26 data points as my current data set.
How can I incorporate that point (a single-point prior value) when doing a Bayesian analysis?
(Initially, I used to run the model with a non-informative prior, not considering the old information, as it wasn't validated.)
In this particular case, I want to know: is there any way to include this old evidence (the single prior point)? If yes, how? Which way should I select, and why?
Best Regards,
Abhay
Relevant answer
Answer
If I understand you correctly, you want to concentrate the density of your prior distribution for that parameter on a single point. In other words, the prior would be a point-mass distribution. But then your posterior will also be a point-mass distribution, concentrating all density on that point. I highly doubt that is what you really want, since your data would not influence the posterior at all. If you are really confident, put an informative prior on that parameter that still allows some uncertainty.
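The point-mass-versus-informative contrast is easy to see in the conjugate normal case (known data variance). A sketch, with hypothetical summary values standing in for the 26 data points:

```python
def posterior_normal(prior_mean, prior_var, xbar, n, sigma2):
    """Conjugate normal-normal update with known data variance sigma2.
    Returns the posterior mean and variance."""
    prec = 1.0 / prior_var + n / sigma2
    post_mean = (prior_mean / prior_var + n * xbar / sigma2) / prec
    return post_mean, 1.0 / prec

xbar, n, sigma2 = 5.0, 26, 4.0  # hypothetical: 26 observations, mean 5

# An informative prior centred on the single historical value 3.0:
print(posterior_normal(3.0, 1.0, xbar, n, sigma2))
# A near point-mass prior: the data barely move the posterior off 3.0.
print(posterior_normal(3.0, 1e-9, xbar, n, sigma2))
```

With the moderate prior the posterior mean lands between 3.0 and the sample mean; as the prior variance goes to zero the data stop mattering, which is exactly the problem with a point-mass prior.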
  • asked a question related to Bayesian Analysis
Question
4 answers
I am trying to estimate, via Bayesian analysis, a more confident mean value for system coefficients that are determined by their number of failures and linked consequences, but I am confused about how to calculate the likelihood. Most research papers address the number of failures in specific time intervals, but I am using the number of failures and its impact on reliability. What distribution or model can I use to determine the likelihood function? Any relevant papers or material, particularly from the railway industry, would be welcome.
Relevant answer
Answer
Dear Mary Nawaz, what progress have you made on the problem you are aiming at? Go slowly, think deeply, work it out carefully. Success is then certain.
  • asked a question related to Bayesian Analysis
Question
3 answers
Using BEAST2's fossilized birth-death model, I'm getting low (red) ESS values, specifically for my posterior and prior. I'm already at 100M generations, which seems really high; aside from increasing the generation number further, how do I increase the ESS values?
Relevant answer
Answer
I have maximized most of the scale factors in the operators tab, and my ESS increased significantly. Will that have an effect on my analysis?
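For intuition on what ESS measures: it is roughly the chain length divided by one plus twice the summed autocorrelations, so a sticky chain has far fewer effective samples than actual samples. A rough numpy sketch of that estimate (the 0.05 cutoff is an arbitrary truncation choice for this illustration, not exactly what Tracer does):

```python
import numpy as np

def ess(chain):
    """Rough effective sample size from summed positive autocorrelations."""
    x = np.asarray(chain, dtype=float) - np.mean(chain)
    n = x.size
    # Autocorrelation at lags 0..n-1 (lag 0 normalizes to 1):
    acf = np.correlate(x, x, mode="full")[n - 1:] / (np.arange(n, 0, -1) * x.var())
    s = 0.0
    for rho in acf[1:]:
        if rho < 0.05:  # truncate once correlations become negligible
            break
        s += rho
    return n / (1 + 2 * s)

rng = np.random.default_rng(5)
iid = rng.normal(size=5000)       # independent draws
ar = np.empty(5000)               # highly autocorrelated AR(1) chain
ar[0] = 0.0
for t in range(1, 5000):
    ar[t] = 0.95 * ar[t - 1] + rng.normal()

print(round(ess(iid)), round(ess(ar)))  # the correlated chain's ESS is far lower
```

This is why tuning operators (better mixing) raises ESS without adding generations: it shrinks the autocorrelation, not the chain length.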
  • asked a question related to Bayesian Analysis
Question
3 answers
It is well known and important to perform a power analysis to determine the statistical power of loci to detect genetic differentiation. But does anybody know whether there are power analyses for AMOVA (analysis of molecular variance), for Bayesian analysis of migration, and for population structure analysis (quantified as FST)? If there are, how can I do the power analysis for these?
Thanks,
Relevant answer
Answer
Following
  • asked a question related to Bayesian Analysis
Question
3 answers
I have done quite a bit of reading (the manual, the publication, papers that have used the method); however, I still have a question regarding sigma² and obtaining reliable results. The publication states:
"It is important to note that when the method fails to identify the true model, the results obtained give hints that they are not reliable. For instance, the posterior probability of models with either factor is nonnegligible (>15%) and the estimates of sigma² are very high (upper bound of the HPDI well over 1)."
This is one of my latest result:
Highest probability model: 5 (Constant, G3)
           mean   mode   95% HPDI
a0        -4.80  -4.90  [-7.67 ; -1.81]   (Constant)
a3         8.32   8.70  [ 5.02 ; 11.4 ]   (G3)
sigma²     37.9   25.6  [ 11.9 ; 75.9 ]
Am I to conclude that these results are not reliable? What might cause such a large sigma² and unreliable results? I ran the program for a long time, so I do not think that's the issue. This problem recurs in many other trials I've run. Does anyone have any advice or recommendations? Thanks!
Relevant answer
Answer
Thanks Mabi
  • asked a question related to Bayesian Analysis
Question
3 answers
Is it a valid approach, when estimating a Bayesian model, to build a 3-dimensional graph that plots the Bayesian estimate of a coefficient (y-axis) as a function of the assumed prior standard deviation (z-axis) and prior mean (x-axis)?
Is this what is meant by superpriors?
If yes, can someone point me towards some relevant papers/books?
Thanks in advance.
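Whatever the terminology, the graph described above is straightforward to produce as a prior-sensitivity surface. A sketch for a conjugate normal mean with made-up data, tabulating the posterior mean over a grid of prior means and prior standard deviations:

```python
import numpy as np

# Hypothetical data summary: 50 observations, sample mean 2.0, known variance 1.0.
xbar, n, sigma2 = 2.0, 50, 1.0

prior_means = np.linspace(-2, 2, 5)
prior_sds = np.linspace(0.1, 2.0, 5)

for m0 in prior_means:
    row = []
    for s0 in prior_sds:
        # Conjugate normal-normal posterior mean:
        prec = 1 / s0**2 + n / sigma2
        row.append((m0 / s0**2 + n * xbar / sigma2) / prec)
    print(np.round(row, 3))  # one row per prior mean; columns vary the prior sd
```

The surface flattens toward the sample mean as the prior sd grows, and tilts toward the prior mean as the prior sd shrinks; plotting this grid gives exactly the 3-D sensitivity graph asked about.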
  • asked a question related to Bayesian Analysis
Question
6 answers
I am working on a regression problem in which I achieved a very low MSE, which usually means the R² coefficient should be close to 1, especially since the regressed curve is very close to the true curve.
The problem is that R² is very low even though the MSE is very low. What does this mean from the perspective of the data, and why can it happen?
I understand that R² equals MSE divided by variance, which implies that a low R² means either a low MSE or a very high variance. I hope somebody can explain this!
Relevant answer
Answer
I could be misunderstanding your question, but R² is your sum of squares explained (a.k.a. regression) divided by your total sum of squares. It is not your MSE divided by variance, as you noted above; the size of the MSE therefore has no direct relation to R². It seems you have a very small explained (regression) sum of squares, which is what is causing the small R². Because mean squares are affected by degrees of freedom (they are simply sums of squares divided by degrees of freedom), the size of the mean square for either the explained or the error component does not make much sense to interpret without considering the other; their size relative to one another matters more than their absolute size (which is how we get F values). Without seeing the results, it looks like you have a sum of squares error that is large relative to the explained sum of squares, but many degrees of freedom, which together produce a small MSE and a small R². Best of luck.
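A tiny numeric illustration of how a very small MSE can coexist with an R² near zero: when the outcome itself has almost no variance, even a trivial constant predictor gets a tiny MSE while explaining nothing. The data here are synthetic:

```python
import numpy as np

rng = np.random.default_rng(2)
# Target with tiny variance: even a trivial predictor gets a tiny MSE.
y = 10.0 + 0.001 * rng.normal(size=1000)
yhat = np.full_like(y, 10.0)  # predicts only the constant level

mse = np.mean((y - yhat) ** 2)
r2 = 1 - np.sum((y - yhat) ** 2) / np.sum((y - y.mean()) ** 2)
print(mse, r2)  # MSE on the order of 1e-6, R^2 near 0
```

MSE is measured on the scale of y, while R² is relative to the variance of y, so the two can disagree whenever that variance is very small or very large.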
  • asked a question related to Bayesian Analysis
Question
1 answer
We ran a "regular" two-way repeated-measures ANOVA and found two main effects that were very similar in size (both around eta-squared = .313). However, a Bayesian two-way repeated-measures ANOVA revealed very strong evidence for only one main effect (B = 49.86) but inconclusive evidence for the other (B = 2.42).
First question: does the gap between the two ANOVAs sound familiar? If so, what does it mean?
Moreover, a comparison between the models revealed that the model that best explained the data was the one containing the two main effects. Following these results, we concluded that the two main effects found are valid.
Second question: Does that make sense?
Relevant answer
Answer
I doubt that this is due to the estimation engine (least squares vs MCMC). Usually, they give very similar results under weak priors.
I'd check the following potential issues in that order:
  1. Have you estimated the very same model? Perhaps one model included the interaction effect and the other didn't?
  2. Are your independent variables orthogonal, or correlated perhaps? Collinearity of predictors can do very strong things to your estimates.
  3. Does your data reasonably comply with the ANOVA assumptions? Normally distributed residuals with constant variance?
  4. Have you inadvertently set a strong prior in the Bayesian model?
  5. Are the MCMC chains of the Bayesian model mixing well? (they should look like a hairy caterpillar and have no sudden steps or plateaus).
  • asked a question related to Bayesian Analysis
Question
3 answers
When doing Bayesian analyses, is there a certain effective sample size that is considered the minimum acceptable sample size, within psychology?
Relevant answer
Answer
Hi all,
although the question is 2 months old, the (in my opinion) most important answer seems to be missing, which is a reference to this article:
Schönbrodt, F. D., Wagenmakers, E. J., Zehetleitner, M., & Perugini, M. (2017). Sequential hypothesis testing with Bayes factors: Efficiently testing mean differences. Psychological Methods, 22(2), 322.
Basic message (also raised against standard null-hypothesis testing): even if your "t-test" (or any other classic 'standard method') produces a significant result, if the Bayes factor for this result is inconclusive (roughly between 1/3 and 3), the significant result is not worth mentioning (unreliable). What you then do: just keep sampling until your Bayes factor is conclusive, either for or against your hypothesis. :) (Makes things easy, doesn't it?)
Hope this helps.
Best, René
  • asked a question related to Bayesian Analysis
Question
9 answers
There are generally three approaches to "binning" (discretizing) a continuous variable for Bayesian analysis: 1) Frequency, 2) Quintile, and 3) Entropy. Frequency divides the variable's range into bins of equal width. Quintile creates bins containing an equal number of the variable's amplitude "counts", the equivalent of a uniform distribution of the variable's amplitude across the range, so the bin widths differ from segment to segment. Entropy creates bin sizes based on (what I believe is) equal entropy, or information, in each bin. Shannon entropy is minus the sum of each probability times the log base 2 of that probability.
Therefore, I believe Quintile bins on the basis of equal probability in each bin, and Entropy on probability times log-probability in each bin. If this is true, how can the bin sizes be different with the same number of partitions?
Teach me oh Sensei...
Relevant answer
Do you mean quantile, not quintile? Quantile bins have equal probability, but entropy bins equalize expected information, which is different, and thus would produce different bin sizes :)
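The difference between the first two schemes is easy to see numerically. A sketch with skewed synthetic data, comparing equal-width edges with quantile (equal-count) edges:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.exponential(scale=1.0, size=10_000)  # skewed data

k = 5
width_edges = np.linspace(x.min(), x.max(), k + 1)      # equal-width bins
quant_edges = np.quantile(x, np.linspace(0, 1, k + 1))  # equal-count bins

print(np.round(width_edges, 2))
print(np.round(quant_edges, 2))
counts, _ = np.histogram(x, quant_edges)
print(counts)  # each quantile bin holds ~2000 of the 10,000 points
```

On skewed data the equal-width bins pile most observations into the first bin or two, while the quantile edges crowd together where the data are dense; that is exactly why equal-probability and equal-width partitions cannot coincide in general.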
  • asked a question related to Bayesian Analysis
Question
4 answers
Using R's brms package, I've run two Bayesian analyses, one with "power" as a continuous predictor (the 'null' model) and one with power + condition + condition x power. The WAICs for the two models are nearly identical: a difference of only .017. This suggests that there are no condition differences.
But when I examine the credible intervals of the condition main effect and the interaction, neither one includes zero: [-0.11, -0.03] and [0.05, 0.19]. Further complicating matters, when I use the "hypothesis" command in brms to test whether each is zero, the evidence ratios (BFs) are .265 and .798 (indicating evidence in favor of the null, right?), but the test tells me that the expected value of zero is outside the range. I don't understand!
I have the same models tested on a different data set with a different condition manipulation, and again the WAICs are very similar, the CIs don't include zero, but now the evidence ratios are 4.38 and 4.84.
I am very confused. The WAICs for both models indicate no effect of condition but the CIs don't include zero. Furthermore, the BFs indicate a result consistent with (WAIC) no effect in the first experiment but not for the second experiment.
My guess is that this has something to do with my specification of the prior, but I would have thought that all three metrics would be affected similarly by my specification of the prior. Any ideas?
Relevant answer
As Gelman says in the BDA book, model selection is still an ongoing subject of research. I would trust the posterior intervals more, given that they are grounded in the axioms of probability.
If the answer is useful, please recommend for others,
  • asked a question related to Bayesian Analysis
Question
9 answers
I have a set of disease-case counts attached as an attribute to each city polygon. There are some 180 cities (polygons); 2-5 of them recorded more than 300 cases, about 100 of them contain 0-2 cases, and the rest recorded 2-20 cases. I'm going to evaluate the possible correlation between illness and environmental factors such as temperature, precipitation, etc.
However, the distribution of the disease data is severely non-normal and violates the assumptions of many statistical methods.
Do you have any suggestion in this case?
Relevant answer
Answer
To assess the correlations in your data set you could use a non-parametric correlation measure like Spearman's rho. Also, if you analyse spatial autocorrelation in your spatial pattern, you should use a Monte Carlo/randomization approach to determine your p-values. See e.g. http://pysal.readthedocs.io/en/latest/library/esda/moran.html --> p_rand
Another thing you could use is Poisson regression to determine the strength and direction of influence of your environmental factors on the number of local disease cases.
Cheers,
Jan
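The Spearman suggestion above is straightforward to sketch with plain numpy: rho is just the Pearson correlation of the ranks. The numbers here are made up, and the double-argsort rank trick assumes no ties:

```python
import numpy as np

def spearman_rho(a, b):
    """Spearman's rank correlation: Pearson correlation of the ranks.
    (Double argsort gives ranks; assumes no ties in the data.)"""
    ra = np.argsort(np.argsort(a)).astype(float)
    rb = np.argsort(np.argsort(b)).astype(float)
    return np.corrcoef(ra, rb)[0, 1]

temp = np.array([10., 12., 15., 20., 25., 30.])   # hypothetical temperatures
cases = np.array([0., 1., 2., 5., 40., 300.])     # skewed case counts
print(spearman_rho(temp, cases))  # strictly monotone relation -> 1.0
```

Because only the ranks enter, the severe skew in the counts does not matter, which is exactly why a rank correlation suits this kind of data better than Pearson's r.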
  • asked a question related to Bayesian Analysis
Question
2 answers
I have been trying to run DIYABC on my microsatellite data. All looks fine according to the sample input files, and the program reads my file fine. However, I cannot get beyond setting my historical models. I keep getting an error message stating that I must indicate when samples were taken. Nowhere in the manual or online have I found how to do this. If I use the provided sample dataset, I get the same errors. I'm at a loss. Can anyone help?
Relevant answer
Answer
Hi Jessica, I think what you're looking for is right at the bottom of page 8 of the manual. It should look like this, for example: "0 sample 1" (meaning at time present I sampled population 1).
Copied from the manual:
population sample : time sample pop - time is the time (always counted in number of generations) at which the sample was taken and pop is the population number from which is taken the sample.
  • asked a question related to Bayesian Analysis
Question
4 answers
Hello,
Using the parameter settings below, I am trying to obtain results for further analysis in Structure Harvester. However, even though I check the option "compute the probability of the data (for estimating K)", I cannot find the related result file in any folder associated with the analysis. My input file is correct, the analysis runs without problems, and I obtain the result files in the correct folders; the only problem is the missing file for Structure Harvester, which should be named, for example, "K1ReRun_run_1_f". What am I missing? Any suggestions would be appreciated.
Length of Burnin Period: 10000
Number of MCMC Reps after Burnin: 50000
Ancestry Model Info: No Admixture Model
* Use Sampling Location Information
* Use Population IDs as Sampling Location Information
Frequency Model Info: Allele Frequencies are Correlated among Pops
* Assume Different Values of Fst for Different Subpopulations
* Prior Mean of Fst for Pops: 0.01
* Prior SD of Fst for Pops: 0.05
* Use Constant Lambda (Allele Frequencies Parameter)
* Value of Lambda: 1.0
Advanced Options
Estimate the Probability of the Data Under the Model
Frequency of Metropolis update for Q: 10
and I tried the same with different iteration numbers: 1, 5, 10
Relevant answer
Answer
Thank you for your reply. I really did not understand what the problem was with the first simulations I tried with different values, but the new ones are working. Maybe there was a temporary problem; I really don't know. For now, I get results from Harvester.
Edit: It might sound unrelated, but it may have been a virus interfering with the software. GeneClass2 was also not working; after I deleted a suspicious file, they both started to work.
Thanks a lot again!
  • asked a question related to Bayesian Analysis
Question
14 answers
My teacher said that "when a large number of data is available, the prior has little effect on the posterior, unless the prior is extremely sharp". We know that the prior reflects knowledge/understanding/experience about the parameters before observing data. (1) Does this mean that the prior is not important at all if we have enough data? (2) We know that if the prior is a uniform distribution, it will have no effect on the posterior. Does that mean that the more data we have, the more likely it is that the data obey a uniform distribution? (I know it is weird, but how do I reject this?) Thanks!
Relevant answer
Answer
Regarding (1), Jimmie Savage highlighted this aspect of Bayesian inference by dubbing it the "principle of stable estimation." A very famous and influential 1963 paper by Edwards, Lindman & Savage, "Bayesian statistical inference for psychological research," devotes a section to it. You may want to read this to learn more. The original publication is in the journal Psychological Review:
It's also included in Springer's Breakthroughs in Statistics collection:
That re-publication is accompanied by a 12pp introduction by William DuMouchel that discusses the principle. This paper is also a "Citation Classic," and you can read Edwards's recollections about writing it here:
Wiley's StatRef resource has an article on stable estimation:
All of that said, the phrase "principle of stable estimation" isn't really used any longer, partly because it's not a completely general phenomenon. (Also, speaking personally, I have trouble understanding in what sense it is a principle, i.e., a foundational idea, vs. an observed property that holds in some circumstances.)
Regarding (2), it's not really meaningful to say that a single, particular prior has "no effect on the posterior." There is no posterior without a prior. The posterior derives its very existence from the prior; I'd say that's a big "effect"! You could say that a uniform prior makes the posterior proportional to the likelihood function, but the likelihood function is not a distribution in parameter space, so this must not be interpreted as indicating that a uniform prior has no effect on the likelihood—it converts it into a probability density function, which is a pretty big effect.
What one can say along these lines is that the choice of prior within some class of candidate priors has little effect on the posterior, i.e., that the estimates found using one choice are similar to what are found with another choice. This leaves open how to specify the class to look at. The literature on robust Bayesian analysis (aka Bayesian sensitivity analysis) studies this. For example, if you're estimating a single real-valued parameter, and you assign it a normal prior, you can explore how the inferences change as you change the prior standard deviation. As long as the prior standard deviation is large compared to the width of the likelihood function, you'll find the choice of prior has little effect on estimation. The uniform prior corresponds to arbitrarily large prior standard deviation in this case, so you could say the uniform choice has little effect in this sense.
The literature on reference priors uses information theory to measure the effect of priors on posteriors, and to identify a prior that is in some sense "least informative" in that it has minimal effect on posterior inferences, averaged over possible datasets. In some cases the reference prior is uniform, but not in general.
A respondent used the term "shrinkage" to talk about an effect of the prior. There isn't a single technical definition of this term, but I don't think it should be used in this way. Bayesian methods were used long before the term was introduced into statistics, with effects of priors noted. Stein introduced the term in a specific context: estimating many related parameters, such as properties of members of a population. In this setting, he used "shrinkage" to refer to how optimal estimators for the member properties shrink toward each other. He discovered this effect in a frequentist analysis, with no prior in sight. In hierarchical Bayesian methods, the same effect arises. It comes about via the role of the prior distribution for the member properties (very differently than how Stein introduced it). This has led some (but I'd say not the majority) to talk about any influence of priors with "shrinkage" terminology. I much prefer to restrict this term to the population (or related parameters) setting. "Shrinkage" then refers to how the ensemble of parameter estimates shrinks together, not to how they shrink toward an a priori specified prior.
Finally, I think it is important to note that there are important conditions underlying the features of Bayesian inference you've asked about. (1) You are talking about parameter estimation, not hypothesis testing (comparing models). (2) You are talking about models with low dimension. (3) You are talking about parametric inference, i.e., inference with a  model with a fixed number of free parameters (vs. problems where the number or effective number of parameters depends on the data, which happens for measurement error problems, and nonparametric and semiparametric modeling). Much could be said about each of these points.  Here are brief remarks.
(1): Bayesian model comparison results depend more sensitively on features of the prior than parameter estimate results do. Bayesian model comparison requires most priors to be proper (normalized), so a uniform prior over an infinite space may not be allowed (but would be allowed over a finite space). The dependence of model comparison results on priors is actually a good thing; one can show that it accounts for the ability to "fine tune" larger models, effectively penalizing models for a measure of their complexity. This aspect of Bayesian inference has played an important (positive) role in recent philosophy of science, particularly in Bayesian confirmation theory.
(2): A flat prior in high dimensions will be very informative about some aspects of the model, in a non-obvious way. E.g., if you put a flat prior on a bounded set of parameters (i.e., a prior that is flat on a hyper-rectangle), in high dimensions you are effectively saying you believe the model lies near the surface of the hyper-rectangle, i.e., with at least one of its parameters taking an extreme value. This is a consequence of a "curse of dimensionality." There are other subtle features of flat priors in high dimensions that make them a bad default choice. As creatures in a 3-D world (or 4-D, including time), we have poor intuition about higher dimensions. It's good to remember this when you work on a high-D inference problem.
(3): This is partly a consequence of (2). Many interesting models have a number of parameters (or effective number of parameters) that is allowed to grow with the number of data. Bayesian treatment of measurement error models—e.g., regression where there is error, not just in the outcome, Y, but also in the predictors or covariates, X—introduces latent parameters for the unknown "true" X values. The number of parameters thus grows with the number of observations. Because of (2), you have to be careful about adopting uniform priors in such problems. Nonparametric and semiparametric models formally may have an infinite number of parameters (with the number of effectively operating parameters determined by the data). Bayesian estimation in this setting requires priors over an infinite dimensional space (e.g., the space of all functions, or all densities). Data are always finite; finite data cannot completely "swamp" an infinite-dimensional prior. Some features of the prior will persist in the posterior, always, in such problems. A "uniform" prior (e.g., a flat prior for the heights of a histogram whose number and location of bins adjusts to the data) is a very bad idea; as in (2), it puts large prior weight in bad places. So when you do Bayesian inference with these more complicated models, you need to think carefully about the prior. (Frequentist methods don't escape these issues; they appear in other ways, e.g., by having to worry about regularization in nonparametric estimation.) A good basic safeguard is to always draw samples from your prior and generate data based on them; this may reveal that your prior is making very strong predictions that the data should look nothing like the data you actually have. If that's the case, you probably need to think more about the prior!
Addendum: On rereading your question (2), I suspect I (and others) may not have interpreted it correctly. It sounds like you are thinking that the claim that a uniform prior "has no effect" on the posterior implies that the data should be uniformly distributed. If that's what you're wondering about, I think your confusion is coming from either (or both): (1) not distinguishing the parameter space from the sample space, or (2) thinking that "no effect on the posterior" means the posterior is the same as the prior (i.e., uniform). For (1), consider estimating the mean, mu, of a normal distribution, from many measurements (x_1, x_2, ..., x_N). The parameter space is the 1-D space of values of mu, and the sample space is the N-D space of values of the data (collections or vectors of x_i values). The prior and posterior are over the mu space, not the data space; saying that a prior or posterior is uniform is not (directly) a statement about the distribution over x_1, x_2, etc.. For (2), as I noted above, when your teacher told you the uniform prior has no effect on the posterior, your teacher didn't mean that the posterior was uniform, but rather that whatever shape it has with a uniform prior is very close to the shape it will have with a somewhat nonuniform prior. This only holds when, in fact, the posterior is very non-uniform (because the likelihood function is very non-uniform).
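As a quick numeric companion to the estimation case discussed above: in a Beta-Binomial model, posterior means under two quite different priors converge as the sample grows. The priors and "true" proportion here are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(4)
true_p = 0.3

for n in (10, 100, 10_000):
    k = rng.binomial(n, true_p)
    # Posterior means under two quite different Beta priors:
    flat = (k + 1) / (n + 2)    # Beta(1, 1) prior (uniform)
    skew = (k + 8) / (n + 10)   # Beta(8, 2) prior (favours large p)
    print(n, round(flat, 3), round(skew, 3))
```

At n = 10 the two posteriors can disagree noticeably; by n = 10,000 they agree to about three decimal places, which is the "stable estimation" behaviour in miniature, and it fails in exactly the ways (model comparison, high dimensions, nonparametrics) listed above.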
  • asked a question related to Bayesian Analysis
Question
10 answers
I am aware that the consistency index uses the number of changes in a matrix, but I haven't found a way to build the matrix or to calculate this index in any software.
Relevant answer
Answer
MESQUITE is a phylogenetic analysis package (as are MEGA, PHYLIP, PAUP, DAMBE and a few others). A package such as MESQUITE has many advantages, such as built-in functions for consistency-index scoring, but a disadvantage is that you need to learn how to use the package: importing your data file, importing your tree file, and running the job you want. It can sometimes be frustrating to figure out exactly what data format is needed for each type of data.
Anyway, MESQUITE does have a consistency-index module built in. I do not find this built into DAMBE or MEGA.
Before you go to a lot of trouble calculating the consistency index value for your data and tree, I think you should find out if you will gain any useful information from this value.  Do you know what a "good" value should be for your type of data, for example?  The Consistency Index can be very useful for morphological character data sets in some organisms where morphology evolves nicely.  For DNA and amino acid sequence data the consistency index usually does not give us much information about the quality of the data or the tree.
  • asked a question related to Bayesian Analysis
Question
1 answer
Can anyone suggest how to do spatial empirical Bayes smoothing?
Relevant answer
Answer
This is full Bayes; see the last chapter, which shows how to handle spatial weights and model changes over time.
My understanding is that EB and spatial smoothing are available in GeoDa:
"the spatial EB smoother, it is computed the same way as the regular EB smoother except that the mean and variance of the prior are taken from a local subset (as defined by the weights) rather than the study region as a whole."
  • asked a question related to Bayesian Analysis
Question
5 answers
Dear colleagues,
I am having some problems with node dating using substitution rates in MrBayes 3.2.6, even following the example in the program manual.
According to the manual I need to:
1.     Set a normal distribution as the prior for the clock rate, e.g. using 0.01 as the mean and 0.005 as the standard deviation, assuming the rate is approximately 0.01 ± 0.005 substitutions per site per million years:
      MrBayes > prset clockratepr = normal(0.01,0.005)
 2.    Modify the tree age prior to an exponential distribution with rate 0.01:
       MrBayes > prset treeagepr = exponential(0.01)
 When I run the analysis, the program does not recognize the argument "exponential" for the tree age prior:
      No valid match for argument "exponential"
      Invalid Treeagepr argument
      Error when setting parameter "Treeagepr"
I have checked the "Command Reference for MrBayes ver. 3.2.6" and, in fact, "exponential" does not appear as a valid argument for the Treeagepr parameter, so I think this is an error in the manual, but I cannot find a way around it.
Has anyone encountered this situation before?
Any solution to the problem?
Many thanks,
Yoannis
Relevant answer
Answer
Dear Yoannis
I know MrBayes well and I appreciate it a lot, but I highly recommend using BEAST for time estimation based on sequences. Both programs analyse molecular data in a Bayesian framework using MCMC, but BEAST is entirely oriented towards rooted, time-measured phylogenies. I have had very good experiences with BEAST, using BEAUti to create the input file. BEAUti is a user interface that enables uploading sequence data (NEXUS-formatted) and defining all needed parameters (taxon groups, molecular clock (relaxed or strict), substitution model, etc.). Finally, the BEAST software package includes several other programs, such as Tracer (which analyses the log file of BEAST or MrBayes) and TreeAnnotator (which finds the "best tree" among the trees obtained from BEAST).
I recommend you first work through a tutorial to learn how to apply the different programs.
Good luck!
Carolina
  • asked a question related to Bayesian Analysis
Question
2 answers
I used MrBayes 3.2.6 to build a phylogeny of nematodes, but I found that no matter how I specified the topology constraint priors for taxon groups, the resulting trees looked the same: even when the constrained taxa were grouped together, the support values were not 1, and some constrained taxa were not recovered as monophyletic. The execution log did not show any problems. Could anyone help me: is my setting wrong, or is it a bug in this version? Thanks!
-------my constraint setting-------
.....
constraint outg hard = OGHETK_Ascaridia_galliX2Y OGCOS_Cruzia_americanaX2Y OGCOS_Oxyascaris_spX2Y; [the 'hard' flag can be omitted; it makes no difference]
prset topologypr=constraints(outg);
.....
Relevant answer
Answer
Thanks a lot, Alexandre!
I found the answer; I guess it is a bug. In the MrBayes mailing list, many users posted similar questions, but only one answer helps: when the taxon-constraint commands are placed after the character-partitioning commands, the program gives the expected tree.
All the best,
Liang
  • asked a question related to Bayesian Analysis