Question
Asked 21st Jan, 2015

How can I compute Bayesian posterior probability for a given phylogenetic tree?

After searching for a suitable phylogenetic tree using RAxML, how can I compute the Bayesian posterior probability for the resulting tree? What programs should I use? Thanks.

Most recent answer

4th May, 2019
Nouari Sadrati
Ferhat Abbas University of Setif
Hello everyone,
Can you explain how I can present ML bootstrap values (BPs) and posterior probabilities (PPs) together at the nodes of the same tree? I used MEGA7 for Maximum Likelihood (ML) and MrBayes 3.2.7 for Bayesian inference (BI).
Thank you.

All Answers (13)

21st Jan, 2015
Adam James Bewick
University of Georgia
RAxML uses maximum likelihood to estimate the "best" tree. For node support using RAxML you would need to bootstrap. This seems like a handy tutorial, and it alludes to differences in the metric of support at nodes: http://bodegaphylo.wikispot.org/RAxML_Tutorial. To estimate support at nodes in terms of Bayesian posterior probabilities you would have to run a different program, like BEAST or MrBayes, that uses Bayesian statistics.
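To make the bootstrap idea concrete, here is a minimal sketch of the resampling step only. It is not RAxML's implementation. Each replicate redraws alignment columns with replacement, a tree is inferred from each replicate, and a node's bootstrap support is the percentage of replicate trees that contain it. The toy alignment and the function name `bootstrap_alignment` are hypothetical.

```python
import random

def bootstrap_alignment(alignment, rng):
    """Return one bootstrap replicate: columns resampled with replacement."""
    n_sites = len(next(iter(alignment.values())))
    # Draw as many column indices as there are sites, allowing repeats.
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[i] for i in cols) for name, seq in alignment.items()}

# Hypothetical toy alignment (taxon name -> aligned sequence).
alignment = {
    "taxonA": "ACGTACGT",
    "taxonB": "ACGTACGA",
    "taxonC": "ACGAACGA",
}
rng = random.Random(42)  # fixed seed so the sketch is repeatable
replicate = bootstrap_alignment(alignment, rng)
```

In a real analysis the tree-building program does this internally for every replicate; the sketch only shows why the replicates differ from one another.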
1 Recommendation
22nd Jan, 2015
Yong Jia
Murdoch University
Hi Adam,
I am not clear whether BEAST or MrBayes can test posterior support for a given tree. I looked at BEAST and could not find where to set this. Do you know how to set it?
BTW, according to its manual PhyloBayes seems able to do this, but when I run the pb command it always reports an error in the command. Does anyone have experience with PhyloBayes?
22nd Jan, 2015
Ferruccio Maltagliati
Università di Pisa
I use MrBayes for mtDNA sequence datasets and I have never had problems. Website: http://mrbayes.sourceforge.net
25th Jan, 2015
Brian Thomas Foley
Los Alamos National Laboratory
It looks to me as if some of the answers above are not quite hitting it. It looks like you don't want to build a new tree with Bayesian methods; you want to evaluate a tree you already built with RAxML. I am not a whiz with MrBayes, BEAST and other Bayesian phylogeny tools, but I suspect you should be able to do what you want in one or more of those tools by loading in your tree and your data and asking the software to evaluate your input tree rather than search for "the best tree".
2 Recommendations
27th Jan, 2015
Yong Jia
Murdoch University
Thanks all! Brian understood what I meant, and I have figured out how to do this. The TreeAnnotator program in BEAST allows me to load my target tree and calculate the corresponding posterior support. Now I am able to present both the bootstrap and Bayesian supports on my ML tree. I saw some people do this, but they did not state in detail how. Another question: I always get very weak maximum likelihood bootstrap support for one of my target clades. Some people say this may be a problem when your target sequences are too homologous, and that weak ML bootstrap support does not necessarily mean the clustering is wrong. Does anyone have any comment? I wonder what you do, or how you justify your results, when there is weak bootstrap support. Thanks a lot!
1 Recommendation
27th Jan, 2015
Andrew W Wilson
Denver Botanic Gardens
I was a little confused by the answers here relative to the question. Bayesian posterior probabilities are based on the results of a Bayesian phylogenetic analysis. The most widely used phylogenetic methods (RAxML, MrBayes) evaluate how well a given phylogenetic tree fits your molecular data. A maximum likelihood score is created for each comparison/iteration of the analysis. For each iteration, the tree is tweaked and its new score is compared to old scores from earlier trees. Bayesian analyses are different from simple ML analysis because Bayesian analyses perform millions of iterations, but poorer scoring phylogenies are still retained relative to better ones, as long as the poorer ones fall within a margin of error. In the end all of these resulting trees are part of your "posterior distribution". The proportion of these trees that have your clade of interest gives you your "posterior probability". That is, the probability that a specific clade/branch will be found within the distribution of the Bayesian trees.
In essence you have to perform a Bayesian analysis to get posterior probabilities. I imagine with Brian's suggestion, you're using your RAxML tree as a starting tree to begin a Bayesian analysis. This should work just fine, but don't be surprised if the Bayesian consensus tree isn't exactly identical to your RAxML one.
It's important to realize that every phylogeny is simply a hypothesis: an estimation of relationships based on the data we are using. If you are trying to find evidence for a particular systematic relationship, but the phylogenetic analysis isn't giving the support you need, then you need to ask yourself a few more questions. Are you using the right data? That is, if you're using molecular data, perhaps the genes you're using aren't sufficient by themselves, and more or different molecular characters are needed. Also, what if the phylogeny is correct? What were the preliminary observations that caused you to hypothesize these organisms were closely related? Perhaps those observations were off, and there is really something else that is driving the evolution and diversification of the organisms you are studying. Phylogenies can be powerful in showing us how things evolved and motivating us to develop new hypotheses.
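The "proportion of trees that have your clade" can be sketched as a simple count. This is an illustration under stated assumptions, not the code of MrBayes or BEAST: each sampled tree is represented as a set of clades (each clade a frozenset of taxon names), the first 25% of samples are discarded as burn-in (an assumed, adjustable choice), and the function name `clade_posterior` is hypothetical.

```python
def clade_posterior(sampled_trees, clade, burnin_fraction=0.25):
    """Fraction of post-burn-in sampled trees that contain `clade`."""
    start = int(len(sampled_trees) * burnin_fraction)
    kept = sampled_trees[start:]
    clade = frozenset(clade)
    hits = sum(1 for tree in kept if clade in tree)
    return hits / len(kept)

# Two hypothetical four-taxon topologies, each written as its set of clades.
t_ab = {frozenset({"A", "B"}), frozenset({"C", "D"})}
t_ac = {frozenset({"A", "C"}), frozenset({"B", "D"})}

# A hypothetical chain of 10 sampled trees; the first 2 are dropped as burn-in.
samples = [t_ac, t_ac, t_ab, t_ab, t_ac, t_ab, t_ab, t_ab, t_ac, t_ab]
pp_ab = clade_posterior(samples, {"A", "B"})  # 6 of the 8 kept trees -> 0.75
```

Scoring the clades of a fixed target tree (for example an ML tree) against a Bayesian tree sample amounts to running this count once per clade of the target tree.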
2 Recommendations
6th Feb, 2015
Yong Jia
Murdoch University
Hi Andrew, thanks for the detailed elaboration! Sorry about the late reply.
Yes indeed, I have a single gene family whose phylogeny I want to look at. The topologies always vary between methods, including NJ, ML, and Bayesian analysis. The NJ and ML trees are fairly close, and are consistent with the commonly recognized tree of life. But the ML support was weak for one of my target clades. Therefore, I want to test the Bayesian posterior support for my ML tree, in the hope that there might be strong Bayesian support for that clade.
The Bayesian analysis generates a pool of trees. I can calculate the best Bayesian tree from this pool, and I can also calculate the posterior support for a previously determined phylogeny (not as an initial search tree). The thing is that the best Bayesian tree topology is very different from my NJ and ML trees.
The Bayesian support calculation for my ML tree also doesn't give support to my target clade. So the problem is that I prefer to believe the NJ and ML topology, but I can't get support for my target clade. (I found it hard to interpret the best Bayesian tree although it has strong Bayesian support. It seems to me that you could always get a strongly supported Bayesian tree if you let it run long enough, right?)
You may be right: the weak support for my target clade may result from my dataset, which may not be sufficient. Then my question is why the ML method gives weak support for its "best tree". Bootstrapping with 1000 replicates for ML is already very computationally expensive.
Another question: when constructing a phylogeny for a protein-encoding gene family, do you prefer to use the protein sequence or the CDS sequence? I found they could lead to very different results.
Thanks!
28th Aug, 2015
Andrew W Wilson
Denver Botanic Gardens
Sorry. This is probably waayy too late to reply as I'm sure you've probably resolved the issue. Regardless, I feel compelled to respond to your questions. First, yes, Bayesian analyses that have 'plateaued' will just increase support for the topology with longer runs. This is assuming that there isn't a more optimal topology hiding in treespace that the Bayesian analysis will somehow magically find.  This is rare but not impossible.
As far as protein sequence vs CDS, am I correct in assuming you mean amino acid sequence vs nucleotide sequence for a particular gene? Assuming yes, I think you can analyze it both ways and reflect on the similarities more than the differences. I tend to use nucleotide sequence exclusively as there is more information there. However, if the relationships between genes are evolutionarily distant, then with nucleotides you might not be able to align the dataset sufficiently for analysis. The one time I encountered this problem I went to amino acid sequence and everything worked well. Ultimately, between the two datasets you would choose the phylogeny ("answer") that makes the most sense biologically to focus your discussion around.
1 Recommendation
1st Sep, 2015
Yong Jia
Murdoch University
Thanks Andrew! Always glad to learn more. I like the idea of a "hypothesis" in phylogenetic analysis from your previous answer. Whatever results people get from different phylogenetic methods, they have to make biological sense.
As you mentioned, simply letting the Bayesian MCMC chain run longer after convergence would increase topology support. Could these results be biased if someone intentionally let it run longer? Is there a norm that should be followed?
Nucleotide sequences definitely contain more information than amino acid sequences. However, from my understanding, amino acid sequences do not necessarily produce worse results when people try to answer a specific biological question.
Thanks!
1st Sep, 2015
Andrew W Wilson
Denver Botanic Gardens
I wouldn't call it bias. It's a statistical sampling process. More sampling = more statistical rigor. What you're essentially saying is that there is a high probability that the data you are using are going to recover a particular topology (give high posterior probability for some clades in your tree). Given that a phylogeny represents a 'hypothesis', any particular topology can be challenged or shut down given different or better data and analysis. As a result, someone can come along later and replicate any of my research, using different data and analysis, and ultimately refute my conclusions. This is the nature, and strength of the scientific method. Each new study generates greater understanding.
Beyond this philosophizing, Bayesian analyses are typically run with 10 thousand replicates. AND, I also think it is good to perform a 10K replicate analysis, multiple independent times (3x in general). Each time you can replicate a result, you are demonstrating statistical rigor in your analysis, outcome, and ultimate conclusions.
EDIT: Upon review I said 10K replicates when I should have said 10 million. My bad.
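One way to check whether independent runs "replicate a result" is to compare the clade frequencies they report. The sketch below averages, over clades, the standard deviation of each clade's frequency across runs. It is in the spirit of the "average standard deviation of split frequencies" diagnostic that MrBayes prints, but it is not MrBayes's exact formula. The population standard deviation, the `min_freq` cutoff, the function name, and the toy frequencies are all assumptions made for illustration.

```python
from statistics import pstdev

def average_sd_of_clade_frequencies(runs, min_freq=0.1):
    """Mean across clades of the SD of each clade's frequency across runs.

    `runs` is a list of dicts mapping clade -> frequency in that run.
    Clades that never reach `min_freq` in any run are ignored.
    """
    clades = set().union(*runs)
    sds = []
    for clade in clades:
        freqs = [run.get(clade, 0.0) for run in runs]  # absent clade = 0.0
        if max(freqs) >= min_freq:
            sds.append(pstdev(freqs))
    return sum(sds) / len(sds)

ab = frozenset({"A", "B"})
cd = frozenset({"C", "D"})
# Hypothetical clade frequencies from three independent runs.
runs = [
    {ab: 0.9, cd: 0.5},
    {ab: 0.9, cd: 0.7},
    {ab: 0.9, cd: 0.6},
]
asd = average_sd_of_clade_frequencies(runs)
```

A value near zero means the runs agree on how often each clade appears. A larger value suggests the runs have not yet converged on the same distribution.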
1 Recommendation
28th Sep, 2015
Fangluan Gao
Fujian Agriculture and Forestry University
Phylogenetic analysis was performed by Bayesian inference (BI) implemented in MrBayes 3.2.5 and Maximum Likelihood (ML) using RAxML v8.0, respectively. ML bootstrap values (BPs) and posterior probabilities (PPs) were plotted on Bayesian 50% majority-rule consensus trees using FigTree v1.4.2.
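The matching step behind a combined BP/PP figure can be sketched as follows. This is a hypothetical illustration, not a FigTree feature or the exact procedure used above. It assumes the support values from each analysis have already been extracted into dictionaries keyed by clade (a frozenset of taxon names), and it writes a placeholder where the ML analysis did not recover a clade of the Bayesian consensus tree.

```python
def combined_labels(pp_by_clade, bp_by_clade, missing="-"):
    """Build 'BP/PP' node labels for every clade of the Bayesian tree."""
    labels = {}
    for clade, pp in pp_by_clade.items():
        bp = bp_by_clade.get(clade)
        bp_text = str(bp) if bp is not None else missing
        labels[clade] = f"{bp_text}/{pp:.2f}"
    return labels

# Hypothetical support values keyed by clade.
pp_by_clade = {
    frozenset({"A", "B"}): 0.98,
    frozenset({"A", "B", "C"}): 1.0,
}
bp_by_clade = {
    frozenset({"A", "B"}): 87,  # the ML bootstrap trees did not support {A, B, C}
}
labels = combined_labels(pp_by_clade, bp_by_clade)
```

The resulting strings (here `87/0.98` and `-/1.00`) can then be placed on the corresponding nodes in whatever tree viewer or graphics editor is used for the final figure.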
