Questions related to Bayesian Analysis
I want to conduct a multilevel regression analysis with three levels and a binary outcome in Mplus. I am able to set up the script, but I need to choose Bayes as the estimator, and the results seem quite fragile. I have no experience with Bayesian estimation so far and am not sure whether the results I obtain are trustworthy.
Now, I realized that the Mplus User's Guide says: "For TYPE=THREELEVEL, observed outcome variables can be continuous. Complex survey features are not available for TYPE=THREELEVEL with categorical variables or TYPE=CROSSCLASSIFIED because these models are estimated using Bayesian analysis for which complex survey features have not been generally developed." (page 262). To be honest, I have trouble understanding what that means for my model. I do not need any further commands in the model (e.g., weighting), so I am not sure whether that should worry me.
Can anyone tell me if I need to interpret results from Bayes estimation (from a three-level analysis with a binary outcome) any differently? Or can anyone suggest specific literature?
In addition, I know that there is a debate about whether binary outcomes may be treated as continuous data. Whenever possible, I have always tried to treat binary outcomes as categorical. However, might treating the binary outcome as continuous be an option for me here?
(As an extra question: Does anyone know if I can perform a three-level analysis with binary outcomes more easily in R? I have only very basic knowledge of R, but maybe I could work with that as well, if it is better suited to this particular analysis.)
Any help is appreciated :-) Thanks!
I am trying to understand and use the best model selection method for my study. I have inferred models using swarm optimisation methods and used AIC for model selection. On the other hand, I am also seeing a lot of references to and discussions of BIC. Apparently, many papers have concluded that both AIC and BIC should be used to select the best models. My question is: what if I use ABC alongside AIC and BIC? How would this improve my study, and what are the pros and cons of using ABC, BIC and AIC as model selection methods?
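For reference, the two information criteria named in this question are conventionally defined as below. Here $\hat{L}$ is the maximized likelihood of the fitted model, $k$ the number of estimated parameters and $n$ the number of observations; these symbols are notation introduced for this note, not taken from the question.

```latex
\mathrm{AIC} = 2k - 2\ln\hat{L},
\qquad
\mathrm{BIC} = k\ln n - 2\ln\hat{L}
```

The only difference is the penalty per parameter ($2$ versus $\ln n$), so BIC penalizes extra parameters more heavily than AIC whenever $\ln n > 2$, that is, for $n \ge 8$.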
jModelTest suggested HKY+G and TPM3uf+I+G as the best-fit models. I used the following setup to prepare the HKY+G model block in MrBayes and it worked perfectly:
set autoclose=no nowarn=yes;
lset nst=2; lset rates=gamma;
mcmc ngen=1000000 nchains=4 relburnin=yes burninfrac=0.25 printfreq=1000 samplefreq=1000 savebrlens=yes;
But I am not sure how to set up the TPM3uf+I+G model in MrBayes. Any suggestions would be highly appreciated.
I use Stata for Bayesian analysis, but I am not certain what or where the posterior probability is in the Stata output. I have repeatedly tried reading the Stata Bayesian Analysis Reference Manual, but it has not helped me with this problem. I found one study from Malaysia that uses it, but I still do not understand it. Please help me resolve this.
Is there a clear justification for, or any advantage to, using a Bayesian approach for the analysis of zero-inflated count data, specifically dental caries data?
I'm implementing a Bayesian negative binomial model in Stata 17. Because of some collinearity or convergence issues, I needed to put my variables in different blocks in the modeling process. However, it is confusing to choose the optimal number of blocks (and the exact set of variables in each) for a model. Do you have any advice?
Apart from that, what criteria do you suggest (DIC, acceptance rate, efficiency, significance of variables, etc.) for comparing models developed using different numbers of blocks?
Thank you in advance for any help you can provide.
Which model is better for investigating outcome – exposure relationship spatially?
· Data are counties
· Dependent variable is incidence per county
· Independent variable is median of XXX per county
· Spearman’s correlation, significant and negative
· Spatial autocorrelation of incidence values close to zero
· Local clusters detected, but the majority are single counties
· Geographically weighted regression: local coefficients are a mix of positive and negative, but the global regression coefficient is negative
With this background, should we go for geographically weighted regression or a Bayesian convolution model with structured and unstructured random effects?
I am looking for papers / sources that outline any kind of general rule for a minimum sample size (real observations) for using the bootstrap.
For now I am using it to generate the CI for a mean. Would be nice to know if there are minimum 'n' suggestions for other operations like correlation etc. PM for code etc.
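A minimal sketch of the percentile bootstrap for the confidence interval of a mean, as described in this question. The data values, the number of resamples and the function name `bootstrap_mean_ci` are made up for illustration; only the Python standard library is used. It does not answer the minimum-n question, but it makes it easy to see how the interval behaves as the sample shrinks.

```python
import random
import statistics


def bootstrap_mean_ci(data, n_resamples=5000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean of `data`."""
    rng = random.Random(seed)
    n = len(data)
    # Resample with replacement, same size as the original sample,
    # and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(data, k=n)) for _ in range(n_resamples)
    )
    alpha = 1.0 - level
    lo_idx = round((alpha / 2) * n_resamples)
    hi_idx = round((1 - alpha / 2) * n_resamples) - 1
    return means[lo_idx], means[hi_idx]


# Hypothetical observations, for illustration only.
sample = [4.1, 5.3, 3.8, 6.0, 5.5, 4.7, 5.1, 4.9, 6.2, 5.0]
low, high = bootstrap_mean_ci(sample)
print(f"95% percentile bootstrap CI for the mean: ({low:.2f}, {high:.2f})")
```

With very small samples the resamples can only recombine the few observed values, which is one reason the interval tends to be too narrow there.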
I am trying to set models of evolution for a DNA dataset using jModelTest2.
Some partitions, when run multiple times, will return different best-fit models (e.g., F81+G, TPM1uf+G, and F81+G after three runs).
Which model should I accept in this scenario?
I've heard a recommendation to select the simplest model if the output is inconsistent, but that was unsourced, and in a few cases the models have the same complexity.
*edit: By complexity, I mean qualifiers like G or I
I want to use BEAST to do EBSP analyses with two loci. I open two "input.nex" files in BEAUti to generate an "output.xml" file (in the Trees panel, I select Extended Bayesian skyline plot as the tree prior) and then run BEAST. I do not know if this is right, and I do not know what to do next. I cannot reconstruct the demographic history trend in Tracer as I would for a BSP. I got one log file but two trees files (one for each locus), and I do not know how to import both trees files into Tracer.
The only interpretation of the Bayes Factor calculated using the output of BayesTraits is this:
<2 Weak evidence
>2 Positive evidence
5-10 Strong evidence
>10 Very strong evidence
Is there a reason why the Bayes Factor won't take a negative value?
Also, is there a Bayes Factor value that provides 'negative' evidence? The way I see it, weak evidence is still evidence in favour of the dependent model, just not strong enough. Or is this not the case?
Many thanks in advance!
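A minimal sketch of how the scale quoted in this question can be applied, assuming the reported quantity is a log Bayes factor defined as twice the difference in log marginal likelihoods between the complex (dependent) and simple (independent) model, which is what the thresholds 2, 5 and 10 suggest. Under that assumption the statistic can be negative (when the simpler model has the higher marginal likelihood), whereas the raw Bayes factor, being a ratio of two positive quantities, cannot. The function names and the marginal likelihood values are hypothetical.

```python
def log_bayes_factor(log_ml_complex, log_ml_simple):
    """Twice the difference in log marginal likelihoods (complex minus simple)."""
    return 2.0 * (log_ml_complex - log_ml_simple)


def describe(log_bf):
    """Map a log Bayes factor onto the verbal scale quoted in the question."""
    if log_bf < 0:
        return "favours the simpler model"
    if log_bf < 2:
        return "weak evidence"
    if log_bf < 5:
        return "positive evidence"
    if log_bf <= 10:
        return "strong evidence"
    return "very strong evidence"


# Hypothetical log marginal likelihoods, for illustration only.
for complex_ml, simple_ml in [(-100.0, -103.0), (-105.0, -103.0)]:
    lbf = log_bayes_factor(complex_ml, simple_ml)
    print(f"log BF = {lbf:+.1f}: {describe(lbf)}")
```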
I am the owner of a luxurious mall that caters to the luxury segment (it sells only high-priced branded goods) in one of the posh areas of the city. I rent out the space on a quarterly basis based on a) the possible revenue generation potential, b) the rent the shopkeeper is willing to pay per square foot, and c) how many customers the shop draws to the mall. Since the historical data I have are just estimates for all these factors, I decided to treat the information as prior knowledge, with the prior mean computed from the historical data. Now, what type of prior pdfs should I use? I have to think along Bayesian lines and create a strategy for allocating space to different shops in the mall so that I see long-term profit.
It's going to be a huge shift for marketers. Tracking identity is tricky at the best of times, with online/offline and multiple channels of engagement, but when the current methods of targeting, measurement and attribution get disrupted, it's going to be extremely difficult to get identity right and deliver exceptional customer experiences whilst staying compliant.
We have put our framework and initial results show promising measurement techniques including Advanced Neo-classical fusion models (borrowed from Financial industry, Biochemical Stochastic & Deterministic frameworks) and applied Bayesian and Space models to run the optimisations. Initial results are looking very good and happy to share our wider thinking thru this work with everyone.
Link to our framework:
Please suggest how would you be handling this environmental change and suggest methods to measure digital landscape going forward.
#datascience #analytics #machinelearning #artificialintelligence #reinforcementlearning #cookieless #measurementsolutions #digital #digitaltransfromation #algorithms #econometrics #MMM #AI #mediastrategy #marketinganalytics #retargeting #audiencetargeting #cmo
I am doing a Bayesian comparison between two proportions, H0 being Proportion(Protein)> Proportion(Mixed). Here the Proportion is of no. of times a free-ranging dog(s) ate from a box(Protein, Mixed). Being a binary variable (Eat from the box: Yes/No), it is of the beta-binomial family. In experiment 1, I use a uniform prior: Beta(1,1). The no. of successes (x) and the no. of failures(y) for Protein and Mixed are as follows:
x1= 19, y1= 25; x2= 8, y2= 33
So my beta posterior shape parameters for Protein are (19+1, 25+1) and for Mixed are (8+1, 33+1).
I am planning to use these posterior parameters as priors for my current experiment. The two experiments have the same set-up. The only difference being the no. of dogs (individual vs groups). I don't expect to see a radical difference in results.
So my question is: can I use the previous beta posteriors as current priors in the way I have written them down, i.e. B(20,26) for Protein and B(9,34) for Mixed (where B stands for the beta distribution)?
Current experiment info: x1=41, y1= 47; x2= 41, y2= 49
Information about me: I come from a biology background with minimal math and programming knowledge. I have been learning Bayesian on my own for about 2 months now. I am willing to learn but math heavy explanations tend to go over my head. Feel free to point me towards relevant resources. I am using R to do the calculations.
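A minimal sketch of the updating described in this question, using the counts given there. Carrying the full first-experiment posterior forward as the prior is arithmetically the same as pooling both experiments under a single Beta(1,1) prior, so it assumes individual dogs and groups share the same underlying proportion. The function names, the number of Monte Carlo draws and the seed are choices made for this illustration; Python is used here only to keep the sketch self-contained.

```python
import random


def update_beta(prior, successes, failures):
    """Conjugate update: Beta(a, b) prior plus binomial data gives Beta(a + x, b + y)."""
    a, b = prior
    return (a + successes, b + failures)


def prob_greater(post1, post2, n_draws=100_000, seed=1):
    """Monte Carlo estimate of P(p1 > p2) for two independent Beta posteriors."""
    rng = random.Random(seed)
    wins = sum(
        rng.betavariate(*post1) > rng.betavariate(*post2) for _ in range(n_draws)
    )
    return wins / n_draws


uniform = (1, 1)

# Experiment 1 (individual dogs): posteriors under the uniform prior.
protein_exp1 = update_beta(uniform, 19, 25)  # Beta(20, 26)
mixed_exp1 = update_beta(uniform, 8, 33)     # Beta(9, 34)

# Current experiment (groups): experiment 1 posteriors used as priors.
protein_exp2 = update_beta(protein_exp1, 41, 47)  # Beta(61, 73)
mixed_exp2 = update_beta(mixed_exp1, 41, 49)      # Beta(50, 83)

p_protein_gt_mixed = prob_greater(protein_exp2, mixed_exp2)
print("Posterior P(Protein > Mixed):", round(p_protein_gt_mixed, 3))
```

If the two experiments are not expected to be fully comparable, one common alternative is to down-weight the first experiment, for example by multiplying its success and failure counts by a fraction before adding them to the prior.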
I am a Ph.D. research scholar in forensic microbiology. I need to identify clusters of 16S rRNA sequences among various people.
#clusteranalysis #forensics #microbiology #forensicmicrobiology
Hi all. I am trying to construct a Bayesian analysis and phylogenetic tree using MrBayes with a set of 85 sequences with an alignment of 900 bp. I am having trouble getting convergence and the analysis just keeps running. Here are my parameters. Does anyone have any ideas for different parameters that I could use to get convergence?
nst=6 rates = invgamma samplefreq=1000 nchains=4 stoprule=YES stopval=0.01 burninfrac=0.25 temp=0.25
I am doing a Bayesian analysis of the partial CprM region of dengue virus serotype 3 using BEAST v1.10.4. At the end of the analysis I have to calculate the tMRCA and the rate of nucleotide substitution. Do I need to specify the tip dates for each individual sequence when generating the XML file in BEAUti? Does the absence of tip dates affect the results? What if I proceed with the analysis without specifying tip dates? Please advise.
I have the following equation, Eating choice (Yes/No)~ Box type (A/B/C). So the question is whether a certain box type affects eating choice in groups of dogs. The eating choice of each member in a group from one (two or all three) of the boxes combined will give me the final choice of the group. I want to carry out a bayesian logistic regression. So bernoulli distribution with beta conjugate to be used. I have data from a previous experiment (analysed with frequentist methods) on the same equation (EC~ BT), but on individual dogs. So I have the following questions:
How do I find out the correct priors to use for the groups of dogs equation in a bayesian glm? Do I run the data from my previous experiment (individual dogs) through a bayesian glm with weakly informative or a non informative priors and then use the upper and lower limits of the posterior distribution as the prior for my new data? If this is incorrect, what should I do? Feel free to point me towards relevant articles and resources.
Other relevant information: I am using brms package in R. I have a biology background with minimal math experience. I am open to learning but I struggle if the articles/resources are math heavy in explanation.
One of the aims of my current study is to place a particular plant species within the context of the whole genus phylogenetically. The systematic position of these plant species is well known. But the species that I have in hand is rare and endemic and its phylogenetic position is obscured.
I will launch a phylogenetic analysis of the DNA sequences of this plant species. I have a few DNA sequences of 5 markers. Please I need help and answers to the following points:
1- I did a BLASTn search, should I use Mega blast search instead?
2- If the retrieved list includes a plant species with several accessions, should I download all accessions or just one accession for each species is enough for the phylogenetic analysis?
3- What is the threshold of similarity percent to the query sequence I should select?
4- How many accessions are needed to cover the diversity of the plant species under investigation?
5- Which phylogenetic analysis methodology suits systematic research, (i.e.) Bayesian analysis, Maximum likelihood, or something else?
I'm trying to establish Bayes factor for the difference between two correlation coefficients (Pearson r). (That is, what evidence is there in favor for the null hypothesis that two correlation coefficients do not differ?)
I have searched extensively online but haven't found an answer. I appreciate any tips, preferably links to online calculators or free software tools that can calculate this.
I wish to know the difference between a Bayesian network (BN) and a Markov model. For what types of problems is one better than the other?
In case of reliability analysis of a power plant, where equipment failures are considered, which model should be used and why?
For a dynamic Bayesian network (DBN) with a warm spare gate having one primary and one back-up component:
If the primary component P is active at the first time slice, then its failure rate is lambda (P) and the failure rate of back up component S1 is [alpha*lambda (S1)].
If the primary component P fails at the first time slice, then its failure rate is lambda (P) and the failure rate of back up component S1 is [lambda (S1)].
My question is this: the above are the conditional probabilities of the primary and backup components, but a DBN also requires a prior failure probability. What will the prior failure probability of the backup component be? Will it be calculated using lambda (S1) or alpha*lambda (S1)?
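A minimal sketch of how the two conditional failure probabilities of the spare described above can be turned into numbers for one time slice, assuming exponentially distributed lifetimes so that a component with rate r fails within a slice of width dt with probability 1 - exp(-r * dt). The rate, the dormancy factor and the slice width are hypothetical values, and the function names are introduced only for this illustration. It shows the two candidate rates side by side; which one belongs in the prior depends on the assumed state of the primary at the first slice.

```python
import math


def failure_prob(rate, dt):
    """Probability that an exponential lifetime with the given rate ends within dt."""
    return 1.0 - math.exp(-rate * dt)


def spare_failure_cpt(lambda_s1, alpha, dt):
    """Per-slice failure probability of spare S1, conditional on the primary's state."""
    return {
        "primary_working": failure_prob(alpha * lambda_s1, dt),  # dormant spare
        "primary_failed": failure_prob(lambda_s1, dt),           # spare in operation
    }


# Hypothetical values: rate per hour, dormancy factor, slice width in hours.
cpt = spare_failure_cpt(lambda_s1=1e-3, alpha=0.3, dt=100.0)
for state, prob in cpt.items():
    print(f"P(S1 fails in slice | {state}) = {prob:.4f}")
```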
I am completing a Bayesian Linear Regression in JASP in which I am trying to see whether two key variables (IVs) predict mean accuracy on a task (DV).
When I complete the analysis, for Variable 1 there is a BFinclusion value of 20.802, and for Variable 2 there is a BFinclusion value of 1.271. Given that BFinclusion values quantify the change from prior inclusion odds to posterior inclusion odds and can be interpreted as the evidence in the data for including a predictor in the model, can I directly compare the BFinclusion values for each variable?
For instance, can I say that Variable 1 is approximately 16 times more likely to be included in a model to predict accuracy than Variable 2? (Because 20.802 divided by 1.271 is 16.367 and therefore the inclusion odds for Variable one are approximately 16 times higher).
Thank you in advance for any responses, I really appreciate your time!
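A minimal sketch of the relationship stated in this question, posterior inclusion odds = BFinclusion × prior inclusion odds, applied to the two reported values. The prior inclusion probability of 0.5 is an assumption made for illustration; the actual value depends on the model prior chosen in JASP. The function name is hypothetical.

```python
def posterior_inclusion_prob(bf_inclusion, prior_inclusion_prob=0.5):
    """Convert an inclusion Bayes factor into a posterior inclusion probability."""
    prior_odds = prior_inclusion_prob / (1.0 - prior_inclusion_prob)
    posterior_odds = bf_inclusion * prior_odds
    return posterior_odds / (1.0 + posterior_odds)


# BFinclusion values reported in the question; prior of 0.5 is assumed.
for name, bf in [("Variable 1", 20.802), ("Variable 2", 1.271)]:
    p = posterior_inclusion_prob(bf)
    print(f"{name}: BFinclusion = {bf}, posterior inclusion probability = {p:.3f}")
```

The ratio of the two Bayes factors equals the ratio of the posterior inclusion odds only when both predictors have the same prior inclusion odds, and a ratio of odds is not a ratio of probabilities.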
I have ddRADseq data, with RAD-loci assembled de-novo using the STACKS pipeline for two species and their hybrids. As such, I was wondering how I could estimate Fst for each of those putative loci without using BayeScan. This is because many of my samples are hybrids which violate the model assumptions of the program.
I have seen a method mentioned which uses the ' Bayesian implementation of the F-model' by Gompert et al 2012, but I am unsure how to practically put this into use. I have also tried using outFLANK, but it needs a data-set containing no missing data.
Thanks a lot.
I am having a difficult time modeling a binary dataset that my team has gathered. It is a dataset of different prescriptions per each patient and an outcome of a certain event.
The goal is to determine if certain prescriptions lead to decreased rates of alerts.
However, there are two main problems:
The data is fully binary without much overlap or interaction between columns
There is a lot of noise in the data, and whether the alert occurs appears to be largely due to chance.
I have run the data through different algorithms to determine any trends or factors that may be important, but have not been very successful.
I was thinking of doing a simple bayesian analysis to determine if each setting has an impact on the outcome of the alerts, but would love to be able to involve many features if possible.
Is there anything I am missing or that I could try to determine the influence of different treatments?
I'm performing some bayesian estimations in AMOS for censored data. For this kind of data, bayesian estimation is actually the only option available in AMOS. However, I find no way to compare different models, since DIC is not provided when data are censored (i.e. non-numerical data).
The only information I get about model fit is a value of "p" (posterior predictive), which according to the User Guide should be around 0.5.
1) Can this "p" be used to compare models (the closer to 0.5, the better the model...)? In my particular case I'm running a full model and I get a p of 0.23, but I don't know if this is good enough. When I run alternative models I get a lower "p", in some cases even 0.00, which I interpret as a model not supported by the data.
2) Another idea that came to my mind is to use bayesian imputation of the censored data, so I get a clean (all numeric) dataset, and then use it to perform bayesian estimation of the parameters, obtaining in this case a value of DIC since all data are numerical now.
What is your opinion?
I performed a Bayesian paired-samples t-test using JASP (great tool, by the way). In the results table, when BF10 is really large (e.g., BF10 = 186), the Error% is reported as NaN. In other comparisons, where the BF is smaller, the Error% shows a number, though this value is really small (e.g., ~9.875e-5).
When executing a NEXUS file in MrBayes for phylogenetic analysis with ntax=40, some of the 40 taxa have one number of characters (nchar) and others have a different number, depending on the sequence length. Is there any way to execute such a file? If so, what value should be given in nchar=... so that MrBayes can read the NEXUS file?
I am estimating a multiple regression model (with one two-way interaction) using maximum likelihood estimation (MLE). Due to substantial missingness on some important covariates (30-60% missing; n=19000), I estimated the model using two missing data treatments (listwise deletion and multiple imputation). These methods, however, produced different results: for example, the interaction was significant when using multiple imputation, but not when using listwise deletion.
Is there a method/test to evaluate which approach (listwise deletion or multiple imputation) is more trustworthy? In papers I've read/reviewed, people often seem to find concordance between their model coefficients when using listwise deletion and multiple imputation.
Also, for those interested, these models were estimated in Mplus, and I implemented a multiple imputation based on bayesian analysis to generate imputed datasets followed by maximum likelihood estimation.
I am conducting a population genetic study on several species of fishes using microsatellite alleles. I have used BayesAss (http://www.rannala.org/software/) to estimate migration percentages to and from each population in my datasets for each species. Superficially, the outputs seem correct and adhere to pre-conceived hypotheses. However, the output data do not provide p-values to indicate whether the estimates are statistically significant. For each estimate (mean) the SD is provided in the output file, and after reading the generated trace file in Tracer (https://www.beast2.org/tracer-2/) several other statistics are calculated (SE, SD, variance, upper and lower 95% CIs), but no p-value. Could anyone help me either a) identify the p-value in the output that I am somehow missing, or b) calculate the p-value from the outputs provided? Unfortunately, statistics isn't my strong suit, so I feel like the answer might be obvious from the results I've been given, but I can't figure it out from what I've looked up so far.
I am working on my doctoral thesis and trying to fit a generalized linear mixed effects model using the ‘MCMCglmm’ package in R. This is the first time I have worked with it. I have repeatedly read Jarrod's tutorial materials and papers, and they are very helpful for understanding the MCMCglmm method. However, there are still some problems with the prior specification that I have failed to figure out. I have been working on them for a couple of weeks but cannot solve them.
In my research, the dependent variable is the number of people participating in a household energy conservation program (a count outcome). It has been measured every day over approximately three years for each of 360 communities (the data are thus quite big, with n = 371,520). In addition, these communities are located in different districts (there are 90 districts in total). Thus, the longitudinal daily count data are nested within communities, which are nested within districts. My research aims to investigate which time-variant and time-invariant factors influence the (daily) number of participants in the program. The basic model is an (over-dispersed) Poisson model, and the code is as follows.
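# load the data (the loading step is not shown in the original post)
# the requisite package
library(MCMCglmm)
# give the priors
prior.poi <- list(R = list(V = diag(1), nu = 0.002, n = 0, fix = 1),
                  G = list(
                    G1 = list(V = diag(3) * 0.02, nu = 4),
                    # G2 is cut off in the listing as posted; it is assumed here to
                    # mirror G1, since the model has two random terms (no_c, no_d)
                    G2 = list(V = diag(3) * 0.02, nu = 4)))
# fit the model
model.poi <- MCMCglmm(y ~ 1 + t + x + x:t + t2 + t3 + t4 + c1 + c2 + c3 + d1 + d2 + d3,
                      random = ~ us(1 + t + x):no_c + us(1 + t + x):no_d,
                      rcov = ~ idh(1):units,
                      family = "poisson",
                      data = dat.big,
                      prior = prior.poi,
                      burnin = 15000, nitt = 65000, thin = 50,
                      pr = TRUE, pl = TRUE)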
In the fixed effects part, ‘y’ is the count outcome; ‘t’ measures time in elapsed days since the start of the program; ‘x’ is another behavior intervention implemented for some communities. ‘t2 ~ t4’ are other time-variant factors (i.e. dummies measuring weekend and public holiday, and log term of average daily temperature); ‘c1 ~ c3’, and ‘d1 ~ d3’ measure the community and district-level characteristics respectively, which are time-invariant variables (e.g. total population, area size). In the random effects part, ‘no_c’ and ‘no_d’ are the record number of each community and district.
Since there are many excess zeros in my data, I also ran a hurdle (over-dispersed) Poisson model, as follows.
# give the priors
prior.hp <- list(R = list(V = diag(2), nu = 0.002, n = 0, fix = 1),
                 G = list(
                   G1 = list(V = diag(6) * 0.02, nu = 7),
                   # G2 is cut off in the listing as posted; it is assumed here to
                   # mirror G1, since the model has two random terms (no_c, no_d)
                   G2 = list(V = diag(6) * 0.02, nu = 7)))
# fit the model
model.hp <- MCMCglmm(y ~ -1 + trait + trait:t + trait:x + trait:x:t + trait:t2 + trait:t3 + trait:t4 + trait:c1 + trait:c2 + trait:c3 + trait:d1 + trait:d2 + trait:d3,
                     random = ~ us(trait + trait:t + trait:x):no_c + us(trait + trait:t + trait:x):no_d,
                     rcov = ~ idh(trait):units,
                     family = "hupoisson",
                     data = dat.big,
                     prior = prior.hp,
                     burnin = 15000, nitt = 65000, thin = 50,
                     pr = TRUE, pl = TRUE)
Both the over-dispersed (OD) and hurdle Poisson models ran only when ‘fix = 1’ was included in the R-structure of the prior specification. When it was removed from the priors, both models returned the error message “Mixed model equations singular: use a (stronger) prior” and stopped running. This error did not disappear regardless of whether parameter expansion was used in the G-structure (that is, alpha.mu=rep(0, 3), alpha.V=diag(3)*25^2 for the OD Poisson model, and alpha.mu=rep(0, 6), alpha.V=diag(6)*25^2 for the hurdle model), or whether other elements in the R-structure were removed or adjusted.
In the hurdle Poisson model, since the covariance matrix for the zero-alteration process cannot be estimated, ‘fix = 2’ should be used in the R-structure rather than ‘fix = 1’. However, the model would not run unless the residual variance for the zero-truncated Poisson process was also fixed at 1, as described above.
My question is: is it appropriate to fix the residual variance for both the zero-alteration and Poisson processes at 1 in the R-structure? Is this too ‘informative’ for my model estimation? Are there any other priors I could use to make the model run?
Thanks for any idea about these questions.
Question edited for clarity:
My study is an observational two-wave panel study involving a single group of participants with different baseline (pre-outcome) levels.
There are three outcome measurements that will be measured two times (pre-rest and post-rest):
1. Subjective fatigue level (measured by visual analog score - continuous numerical data)
2. Work engagement level (measured by Likert scale - ordinal data)
3. Objective fatigue level (mean reaction time in milliseconds - continuous numerical data)
The independent variables consist of different types of data, i.e. continuous numerical (age, hours, etc.), categorical (yes/no, role, etc.) and ordinal (Likert scale).
To represent the concept of recovery, i.e. the unwinding of the initial fatigue level, I decided to measure recovery as the difference between the pre-measure and the post-measure for each outcome. These difference scores are operationally defined as the recovery level (subjective recovery level, objective recovery level and engagement recovery level).
I would like to determine whether the independent variables significantly predict each outcome (subjective fatigue, work engagement and objective fatigue).
Currently I am thinking of the following statistical strategies. Kindly comment on whether they are appropriate.
1. Multiple linear regression; however, one outcome measure, work engagement, is ordinal.
2. Hierarchical regression, hierarchical linear modelling or multilevel modelling, but I am not very familiar with the concepts, assumptions or other aspects of these methods.
3. I would consider reading up on beta regression (sorry, this is my first time reading about this method).
4. Structural Equation Modelling.
- Can the 3 different types of fatigue measurement act as indicators of a latent outcome construct of Fatigue?
- Can the independent variables consist of a mix of continuous, categorical and ordinal data?
Thanks for your kind assistance.
My question is about the interpretation of the output obtained from an online calculator of Bayes factors. I am using the calculators provided by the Perception and Cognition Lab of the University of Missouri.
The reference paper for these calculators is Rouder et al. (2009), attached.
What I cannot understand is how to interpret the outputs in which the calculator states that the JZS factor is in favour of the null hypothesis. For example, in the attached output I obtain a JZS factor of 3.7 "in favour of the null hypothesis".
In other cases I find equally big or bigger JZS values "in favour of the alternative hypothesis" (also attached).
I think that this might be because the calculator reports the reciprocal of the JZS, depending on whether it is supporting one or the other hypothesis. So in both cases, the output of the calculator tells you which hypothesis is favoured (H0 or H1), and the higher the number the stronger the support for this hypothesis. Is that correct?
If this is the case, a JZS of 3.7 in favour of the null should mean that the null is 3.7 times more likely to be true than false, given the current data...
Can anybody confirm if I understood it correctly?
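The reciprocal relationship described in this question, and the step from a Bayes factor to posterior odds, can be written out as follows. The notation ($B_{01}$ for the Bayes factor in favour of the null, $B_{10}$ for the one in favour of the alternative, $D$ for the data) is introduced for this note.

```latex
B_{01} = \frac{p(D \mid H_0)}{p(D \mid H_1)} = \frac{1}{B_{10}},
\qquad
\frac{P(H_0 \mid D)}{P(H_1 \mid D)} = B_{01}\,\frac{P(H_0)}{P(H_1)}
```

Strictly, $B_{01} = 3.7$ says the data are 3.7 times more likely under $H_0$ than under $H_1$. Only if the two hypotheses are given equal prior probability does this become posterior odds of 3.7 to 1, that is, $P(H_0 \mid D) = 3.7 / (1 + 3.7) \approx 0.79$.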
Suppose we have time series variables X1, X2 and Y1, where Y1 depends on the other two. They are more or less linearly related. Data for all these variables are available from 1970 to 2018. We have to forecast values of Y1 for 2040 or 2060 based on these two variables.
What method would you suggest (other than linear regression)?
We know that these series have followed a different pattern since 1990. I want to use the 1990-2018 data as prior information and then find a posterior for Y1. Please let me know how to assess this prior distribution.
Or do you have any other suggestions?
Hello. I have a little question about Bayesian analysis. The external group I have identified in my work is inside the tree I created with Bayesian analysis. Is this normal? or how can this be explained?
Note: I am using MrBayes for analysis and reviewing the trees with Figtree.
I am running BAMM on different Newick-format chronograms. While all other trees run perfectly fine, I am getting some strange results when using one particular tree. This tree originally had a polytomy, which I could not resolve using the multi2di function in R, so I changed branch lengths in the raw text file. Because I could not change branch lengths manually and maintain the tree perfectly ultrametric, I then used the force.ultrametric function to fix it. However, I do not think this is the source of my problem, because the results are off all across the tree, not just in the clade I modified.
Once I run BAMM, the analysis itself is much much slower than it is with my other, much bigger trees (for control file and run info see attachments). The resulting event.data file is huge (65MB) and the plot.bammdata visuals are a mess (see attachments Result1 and Result3). It looks to me as if it does not recognise branches and/or nodes correctly, so it plots rate shifts on branches instead of nodes and, for some reason, plots hundreds of them.
If anyone has seen anything like this before I would greatly appreciate your help. If you need any additional information please contact me. Thanks in advance,
I seem to be having issues with convergence in my Bayesian analysis. I'm using a large single-gene dataset of 418 individuals. My PSRF values say N/A in my output, but my split frequency is 0.007. Also, my consensus tree gives me posterior probabilities of 0.5 or 1 with no distinguishable clades (see attached). Below is my Bayes block:
charset F_1 = 1 - 655\3;
charset F_2 = 2 - 656\3;
charset F_3 = 3 - 657\3;
partition currentPartition = 3: F_1, F_2, F_3;
set partition = currentPartition;
lset applyto=(1) nst=6 rates=gamma;
lset applyto=(2) nst=2 rates=invgamma;
lset applyto=(3) nst=6 rates=gamma;
unlink statefreq=(all) revmat=(all) shape=(all) pinvar=(all);
prset applyto=(all) ratepr=variable;
mcmc ngen= 24000000 append=yes samplefreq=1000 nchains=8;
sump burnin = 10000;
sumt burnin = 10000;
Any advice? Thanks!
I am running BAPS v6.0 to explore population structure in my study species. I am performing a population mixture analysis using the spatial clustering of individuals module. As input, I have a multiple sequence alignment in FASTA format and a text file with coordinates of sampling locations of my individuals.
Before running the analysis, I have set the output file in the "File" menu in the GUI. According to the manual, numerical results should then be written to a .txt file in this directory.
The analyses run fine, and the results appear in the console. However, they are not written to a .txt file in the directory specified above, and searching my computer for the expected file name turns up nothing, so it is not simply being written somewhere else.
I wonder if anybody else has encountered this problem?
I am building a phylogenetic tree using Bayesian analysis, but unfortunately I can't run jModelTest on my computer (Windows 7), so I am building my trees without first determining the best-fit model for one gene. Is it possible to do that? Is it necessary to use the best-fit model?
Thanks very much for any response.
I am currently working on a research project that uses Bayesian analysis techniques to estimate the parameters of a distribution whose pdf is somewhat complicated. It occurred to me that transforming the distribution to another one whose pdf is simpler to study might help in estimating the parameters of the baseline distribution.
My question is: Is this idea known and used by statisticians to estimate parameters in the Bayesian framework?
Thanks in advance for your fruitful ideas.
Mohamed I. Riffi
I recorded data from 17 subjects, 600 trials each. I fitted the data with a linear model (2 parameters) and an exponential model (3 parameters). Overall, R-squared was higher and the SSE was lower for the exponential model. I calculated AIC, BIC, and HQC for both models. The averages over the 17 subjects are:
Exponential: 17.700531 (AIC) 14.388660 (BIC) 11.232778 (HQC) 0.829579 (R-squared)
Linear: 19.355811 (AIC) 16.375185 (BIC) 14.153079 (HQC) 0.723486 (R-squared)
Is it valid to select the exponential model?
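As a rough cross-check of how such criteria are usually computed for least-squares fits, here is a minimal sketch assuming Gaussian errors. The SSE values below are hypothetical, not taken from the data above, and the formulas drop an additive constant that is the same for both models, so only differences between models are meaningful.

```python
import math

def aic_bic_from_sse(sse, n, k):
    """AIC and BIC for a Gaussian least-squares fit, up to a shared constant.

    sse: sum of squared errors, n: number of observations,
    k: number of fitted parameters.
    """
    fit_term = n * math.log(sse / n)
    aic = fit_term + 2 * k
    bic = fit_term + k * math.log(n)
    return aic, bic

# Hypothetical SSE values for one subject with 600 trials (assumption)
n = 600
aic_lin, bic_lin = aic_bic_from_sse(sse=120.0, n=n, k=2)
aic_exp, bic_exp = aic_bic_from_sse(sse=90.0, n=n, k=3)

# Positive differences favour the exponential model
delta_aic = aic_lin - aic_exp
delta_bic = bic_lin - bic_exp
print(f"delta AIC = {delta_aic:.2f}, delta BIC = {delta_bic:.2f}")
```

Lower values are better for both criteria. Since each subject has the same number of trials, it may be more informative to compare the per-subject differences (or the summed criteria) than the averages alone.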
I want to find the Bayes estimate under some loss function using an informative prior distribution such as a Gamma prior, but I don't know the criteria and procedure for selecting the values of the hyperparameters of the prior distribution.
Can anyone here guide me in choosing the hyperparameter values?
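One common way to pick hyperparameters is moment matching: state a prior guess for the parameter and how uncertain that guess is, then solve for the hyperparameters that reproduce that mean and standard deviation. A minimal sketch for a Gamma prior in the shape/rate parameterization follows; the prior mean and standard deviation used here are hypothetical and would come from earlier studies or expert judgment.

```python
def gamma_hyperparameters(prior_mean, prior_sd):
    """Shape and rate of a Gamma prior with the given mean and standard deviation.

    Uses mean = shape / rate and variance = shape / rate**2.
    """
    if prior_mean <= 0 or prior_sd <= 0:
        raise ValueError("prior_mean and prior_sd must be positive")
    shape = (prior_mean / prior_sd) ** 2
    rate = prior_mean / prior_sd ** 2
    return shape, rate

# Hypothetical prior belief: parameter is about 2, give or take 1
shape, rate = gamma_hyperparameters(prior_mean=2.0, prior_sd=1.0)
print(shape, rate)
```

A larger prior standard deviation gives a smaller shape and rate, that is, a weaker prior. Checking how much the Bayes estimate changes under a few plausible choices is a simple sensitivity analysis.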
I cannot get MrBayes to install on my Mac. Each time I download the Mac version, what I get is a file in my Downloads folder that has no installation capability. The user manual says that it should contain an installer, but none comes up. I downloaded BEAGLE separately, plus a CUDA driver, following instructions I found on a GitHub help page. Then I re-downloaded MrBayes, but again no installer appears. It definitely isn't installed, as the mb command in my terminal window is still unrecognized. I tried downloading through GitHub, but this didn't work either. Has anyone had this issue? Admittedly, I know absolutely nothing about coding and fear having no GUI, but I understand taxonomy, and I require a Bayesian analysis.
I have a single point value for my parameter as prior information and 26 data points as the current data set.
How can I incorporate that single prior value when doing a Bayesian analysis?
(Initially, I ran a model with a non-informative prior, without considering the old information, as it wasn't valid.)
In this particular case, I want to know: is there any way to include this old evidence (the single prior point)? If yes, how? Which approach should I select, and why?
I am trying to obtain a more confident estimate of the mean value of system coefficients, which are determined by the number of failures and their associated consequences, using Bayesian analysis, but I am confused about how to calculate the likelihood. Most research papers address the number of failures in specific time intervals, whereas I am using the number of failures and its impact on reliability. What distribution or model can I use to determine the likelihood function? I would appreciate any relevant paper or material, specifically from the railway industry if there is any.
Using BEAST2 fossilized birth death model, I'm receiving low/red ESS values specifically for my posterior and prior. I'm already at 100M generations, which seems really high, but aside from increasing the generation number more, how do I increase the ESS values?
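To make concrete what a low ESS means, here is a minimal sketch of one simple ESS estimator, N divided by the integrated autocorrelation time, applied to two synthetic traces. This is an illustration only and is not claimed to be the exact estimator used by BEAST2 or Tracer; the traces and the truncation rule are assumptions.

```python
import numpy as np

def effective_sample_size(trace):
    """Rough ESS: N / (1 + 2 * sum of autocorrelations at positive lags)."""
    x = np.asarray(trace, dtype=float)
    n = x.size
    x = x - x.mean()
    var = np.dot(x, x) / n
    if var == 0.0:
        return float(n)
    tau = 1.0
    for lag in range(1, n):
        rho = np.dot(x[:-lag], x[lag:]) / (n * var)
        if rho <= 0.0:  # truncate at the first non-positive autocorrelation
            break
        tau += 2.0 * rho
    return n / tau

rng = np.random.default_rng(1)
n = 5000

# Well-mixed chain: independent draws
iid = rng.normal(size=n)

# Poorly mixed chain: each sample is strongly tied to the previous one
noise = rng.normal(size=n)
sticky = np.empty(n)
sticky[0] = 0.0
for t in range(1, n):
    sticky[t] = 0.95 * sticky[t - 1] + noise[t]

ess_iid = effective_sample_size(iid)
ess_sticky = effective_sample_size(sticky)
print(f"ESS (independent) = {ess_iid:.0f}, ESS (autocorrelated) = {ess_sticky:.0f}")
```

Both traces have the same length, but the autocorrelated one carries far fewer effectively independent samples. Running longer raises ESS only in proportion to chain length, so reducing autocorrelation (better-tuned proposals) or combining independent runs is often more efficient than adding generations.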
It is well known that power analysis is important for determining the statistical power of loci to detect genetic differentiation. But does anybody know whether there are power analyses for AMOVA (analysis of molecular variance), for Bayesian analysis of migration, and for population structure analysis (quantified as FST)? If there are, how can I carry out such a power analysis?
I have done quite a bit of reading (the manual, the publication, and papers that have used the method); however, I still have a question regarding sigma² and obtaining reliable results. The publication states:
"It is important to note that when the method fails to identify the true model, the results obtained give hints that they are not reliable. For instance, the posterior probability of models with either factor is nonnegligible (>15%) and the estimates of sigma² are very high (upper bound of the HPDI well over 1)."
This is one of my latest result:
Highest probability model: 5 (Constant, G3)
mean mode 95% HPDI
a0 -4.80 -4.90 [-7.67 ; -1.81 ] (Constant)
a3 8.32 8.70 [5.02 ; 11.4 ] (G3)
sigma² 37.9 25.6 [11.9 ; 75.9 ]
Am I to conclude that these results are not reliable? What might cause such a large sigma² and unreliable results? I ran the program for a long time, so I do not think that's the issue. This problem continues to happen with many other trials that I've done. Does anyone have any advice or recommendations? Thanks!
Is it a valid approach, when estimating a Bayesian model, to build a 3-dimensional graph that plots the model's estimate of the coefficient (y-axis) as a function of different assumed values for the standard deviation (z-axis) and mean (x-axis) of the prior?
Is this what is meant by superpriors?
If yes, can someone point me towards some relevant papers/books?
Thanks in advance.
I am working on a regression problem where I achieved a very low MSE, which usually means that the R2 coefficient should be close to 1, especially since the regressed curve is very close to the true curve.
The problem is that R2 is very low even though the MSE is very low. What does this mean in terms of the nature of the data, and why can this happen?
I understand that R2 equals MSE divided by variance, which implies that low R2 means either low MSE or very high variance. I hope somebody can explain this!
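For reference, the usual definition is R2 = 1 - MSE / Var(y), so R2 is low when the MSE is large relative to the variance of the target, however small the MSE is in absolute terms. A minimal sketch with made-up numbers (not the data from the question) shows a tiny MSE together with a poor R2 when the target itself barely varies:

```python
import numpy as np

def mse_and_r2(y_true, y_pred):
    """Mean squared error and R2 = 1 - MSE / Var(y_true)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mse = float(np.mean((y_true - y_pred) ** 2))
    r2 = 1.0 - mse / float(np.var(y_true))
    return mse, r2

# Hypothetical target that is almost constant around 10
y_true = np.array([10.00, 10.01, 9.99, 10.02, 9.98])
# Predictions that look very close on the plot
y_pred = np.array([10.02, 9.99, 10.01, 10.00, 10.00])

mse, r2 = mse_and_r2(y_true, y_pred)
print(f"MSE = {mse:.6f}, R2 = {r2:.3f}")
```

Here the errors are only about 0.02 in size, but the target's own spread is even smaller, so predicting the mean would do better than the model and R2 comes out negative. Comparing the MSE with the variance of the target is therefore a quick diagnostic.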
We ran a “regular” two-way repeated measures ANOVA and found two main effects that were very similar in size (around Eta-squared = .313). However, a Bayesian two-way repeated measures ANOVA revealed very strong evidence for only one main effect (B = 49.86), but inconclusive evidence for the other effect (B = 2.42).
First question: Does the gap between the two ANOVAs sound familiar? And if so, what does it mean?
Moreover, a comparison between the models revealed that the model that best explained the data was the one containing the two main effects. Following these results, we concluded that the two main effects that were found are valid.
Second question: Does that make sense?
When doing Bayesian analyses, is there a certain effective sample size that is considered the minimum acceptable sample size, within psychology?
There are generally three approaches to "binning" (discretizing) a continuous variable for Bayesian analysis: 1) Frequency, 2) Quantile and 3) Entropy. Frequency divides the range of the variable into bins of equal width. Quantile creates bins that each contain an equal number of the variable's observations, which is equivalent to a uniform distribution of the variable across the bins, so the bin widths differ from segment to segment. Entropy creates bins based on (what I believe is) equal entropy, or information, in each bin. The Shannon entropy contribution of a bin is the negative of its probability times the log base 2 of that probability.
Therefore, I believe Quantile bins on equal probability in each bin, while Entropy bins on equal probability times the log of probability in each bin. If this is true, then how can the bin edges be different with the same number of partitions?
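A small numerical sketch may help separate these ideas. It compares equal-width and equal-count (quantile) bins on synthetic skewed data and computes the Shannon entropy of the resulting bin probabilities. The data, the number of bins, and the helper names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=10_000)  # synthetic skewed variable
n_bins = 5

# Equal-width bins: same width, very different counts
width_edges = np.linspace(x.min(), x.max(), n_bins + 1)
# Quantile bins: different widths, (nearly) equal counts
quant_edges = np.quantile(x, np.linspace(0.0, 1.0, n_bins + 1))

def bin_probabilities(values, edges):
    """Fraction of observations falling in each bin."""
    counts, _ = np.histogram(values, bins=edges)
    return counts / counts.sum()

def shannon_entropy(p):
    """Entropy in bits of a discrete distribution, ignoring empty bins."""
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

p_width = bin_probabilities(x, width_edges)
p_quant = bin_probabilities(x, quant_edges)
h_width = shannon_entropy(p_width)
h_quant = shannon_entropy(p_quant)

print("equal-width probabilities:", np.round(p_width, 3), "entropy:", round(h_width, 3))
print("quantile probabilities:   ", np.round(p_quant, 3), "entropy:", round(h_quant, 3))
```

With equal probability in every bin, each bin also contributes the same -p log2 p, and the total entropy reaches its maximum of log2(number of bins). So if "entropy" binning meant equal per-bin entropy of the variable itself, it would coincide with quantile binning. When a tool reports different edges for its entropy method, it is presumably optimizing a different criterion, for example entropy with respect to a class or target variable, which is worth checking in that tool's documentation.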
Teach me oh Sensei...
Using R's brms package, I've run two Bayesian analyses, one with "power" as a continuous predictor (the 'null' model) and one with power + condition + condition x power. The WAICs for the two models are nearly identical: a difference of -.017. This suggests that there are no condition differences.
But, when I examine the credibility intervals of the condition main effect and the interaction, neither one includes zero: [-0.11, -0.03 ] and [0.05, 0.19]. Further complicating matters, when I use the "hypothesis" command in brms to test if each is zero, the evidence ratios (BFs) are .265 and .798 (indicating evidence in favor of the null, right?) but the test tells me that the expected value of zero is outside the range. I don't understand!
I have the same models tested on a different data set with a different condition manipulation, and again the WAICs are very similar, the CIs don't include zero, but now the evidence ratios are 4.38 and 4.84.
I am very confused. The WAICs for both models indicate no effect of condition but the CIs don't include zero. Furthermore, the BFs indicate a result consistent with (WAIC) no effect in the first experiment but not for the second experiment.
My guess is that this has something to do with my specification of the prior, but I would have thought that all three metrics would be affected similarly by my specification of the prior. Any ideas?
I have a set of disease cases in the polygon form as an attribute of each city. There are some 180 cities (polygons) that 2-5 of them recorded more than 300 cases, about 100 of them contain 0-2 cases and the rest recorded 2-20 disease cases. I'm going to evaluate the possible correlation between illness and some environmental factor such as temperature, precipitation, etc.
However, the distribution of the disease data is severely non-normal and violates many statistical methods' assumptions.
Do you have any suggestion in this case?
I have been trying to run DIYABC on my microsatellite data. Everything looks fine compared with the sample input files, and the program reads my file without problems. However, I cannot get beyond setting my historical models. I keep getting an error message stating that I must indicate when samples are taken. Nowhere in the manual or online have I found out how to do this. If I use the provided sample dataset, I get the same errors. I'm at a loss. Can anyone help?
Using the parameter settings below, I am trying to obtain results for further analysis in Structure Harvester. However, even though I check the option "compute the probability of the data (for estimating K)", I cannot find the related result file in any of the folders belonging to the analysis. My input file is correct, the analysis runs without problems, and I obtain the result files in the correct folders; the only problem is the missing file to use in Structure Harvester, which should be named, for example, "K1ReRun_run_1_f". What am I missing? I would appreciate any suggestions.
Length of Burnin Period: 10000
Number of MCMC Reps after Burnin: 50000
Ancestry Model Info: No Admixture Model
* Use Sampling Location Information
* Use Population IDs as Sampling Location Information
Frequency Model Info: Allele Frequencies are Correlated among Pops
* Assume Different Values of Fst for Different Subpopulations
* Prior Mean of Fst for Pops: 0.01
* Prior SD of Fst for Pops: 0.05
* Use Constant Lambda (Allele Frequencies Parameter)
* Value of Lambda: 1.0
Estimate the Probability of the Data Under the Model
Frequency of Metropolis update for Q: 10
and I tried the same with different iteration numbers: 1, 5, 10
My teacher said that "when a large amount of data is available, the prior has little effect on the posterior, unless the prior is extremely sharp". We know that the prior reflects knowledge/understanding/experience about the parameters before observing data. (1) Does it mean that the prior is not important at all if we have enough data? (2) We know that if the prior is a uniform distribution, then it will have no effect on the posterior. Does it mean that the more data we have, the more likely it is that the data follow a uniform distribution? (I know it sounds weird, but how can this be refuted?) Thanks!
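The teacher's statement can be checked numerically. A minimal sketch with a conjugate Beta prior on a success probability follows; the three priors and the observed rate of 0.3 are made up for illustration.

```python
def beta_posterior_mean(a, b, successes, trials):
    """Posterior mean of a success probability under a Beta(a, b) prior."""
    return (a + successes) / (a + b + trials)

# Three hypothetical priors; the last two have prior mean 0.8
priors = {
    "flat Beta(1, 1)": (1, 1),
    "informative Beta(20, 5)": (20, 5),
    "very sharp Beta(20000, 5000)": (20000, 5000),
}

observed_rate = 0.3  # assumed fraction of successes in the data
results = {}
for trials in (10, 100, 10_000):
    successes = round(observed_rate * trials)
    results[trials] = {
        name: beta_posterior_mean(a, b, successes, trials)
        for name, (a, b) in priors.items()
    }
    print(trials, {name: round(m, 3) for name, m in results[trials].items()})
```

With 10 observations the flat and informative priors give clearly different posterior means; with 10,000 they nearly agree, while the very sharp prior still pulls the estimate toward 0.8. On question (2): a flat prior is a statement about the parameter, not about the data. With a flat prior the posterior is proportional to the likelihood, so more data simply makes the posterior narrower around the value the data support; it says nothing about the data being uniformly distributed.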
I am aware that the Consistency Index uses the number of changes in a character matrix, but I haven't found a way to build the matrix or to calculate this index in any software.
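For orientation, the ensemble consistency index is usually defined as the minimum possible number of character-state changes divided by the number of changes observed on the tree, summed over characters. A minimal sketch follows, assuming the observed steps per character have already been obtained from a parsimony program; the taxa, matrix, and step counts here are hypothetical.

```python
def consistency_index(matrix, observed_steps, missing="?-"):
    """Ensemble CI = sum(minimum steps) / sum(observed steps on the tree).

    matrix: dict mapping taxon name to a string of character states.
    observed_steps: steps per character on the tree (from a parsimony program).
    The minimum for a character is its number of distinct states minus one.
    """
    rows = list(matrix.values())
    n_chars = len(rows[0])
    if any(len(row) != n_chars for row in rows):
        raise ValueError("all taxa must have the same number of characters")
    if len(observed_steps) != n_chars:
        raise ValueError("need one observed step count per character")
    min_steps = []
    for i in range(n_chars):
        states = {row[i] for row in rows if row[i] not in missing}
        min_steps.append(max(len(states) - 1, 0))
    return sum(min_steps) / sum(observed_steps)

# Hypothetical binary matrix: 4 taxa, 3 characters
matrix = {
    "taxon_A": "000",
    "taxon_B": "010",
    "taxon_C": "110",
    "taxon_D": "111",
}
# Hypothetical step counts on some tree; character 2 needs one extra change
observed_steps = [1, 2, 1]

ci = consistency_index(matrix, observed_steps)
print(ci)
```

A CI of 1 means no homoplasy; lower values mean more characters changed more often than the minimum required.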
I am having some problems with node dating using substitution rates in MrBayes 3.2.6, even following the example in the program manual.
According to the manual I need to:
1. Set a normal distribution as the prior for the clock rate: e.g. using 0.01 as the mean and 0.005 as the standard deviation, assuming the rate is approximately 0.01 ± 0.005 substitutions per site per million years:
MrBayes > prset clockratepr = normal(0.01,0.005)
2. Modify the tree age prior to an exponential distribution with the rate 0.01:
MrBayes > prset treeagepr = exponential(0.01)
When I run the analysis, the program does not recognize the argument “exponential” to modify the age prior:
No valid match for argument “exponential”
Invalid Treeagepr argument
Error when setting parameter “Treeagepr”
I have checked the "Command Reference for MrBayes ver. 3.2.6" and, in fact, “exponential” does not appear as a valid argument for the Treeagepr parameter, so I think it is an error in the manual, but I cannot find a way to solve it.
Has anyone encountered this situation before?
Is there any way to solve the problem?
I used MrBayes 3.2.6 to build a phylogeny of nematodes. However, no matter how I specified the constraint priors for the taxon groups, the resulting trees looked the same: even when the constrained taxa were grouped together, the support values were not 1, and some constrained taxa were not recovered as monophyletic. The execution log did not show any problems. Could anyone help me? Is my setting wrong, or is it a bug in this version? Thanks!
-------my constraint settings look like-------
constraint outg hard = OGHETK_Ascaridia_galliX2Y OGCOS_Cruzia_americanaX2Y OGCOS_Oxyascaris_spX2Y; [the 'hard' flag could be missing, no effect]