Question
Asked 10th Jan, 2014

How can I interpret bootstrap values on phylogenetic trees built with Maximum Likelihood method?

On different nodes I have values of 100, and on others values of 63, for example. Is there a minimum value below which the result is considered unreliable?
I couldn't find any references on that; if you have some, you're welcome to share them.
I am using the MEGA software, but most people seem to use RAxML. Is one program better than the other? And for what reason?

Most recent answer

16th May, 2019
Guido W Grimm
Trotters Independent Traders
K. Manikantan Tirumulpad takes up a few common misconceptions that I need to put right (as a network-thinker and somebody who worked at the coal-face of plant generic differentiation):
"Take 2 or more locus, atleast one resolving at genus scale and multiple locus (2 or more) which can resolve at species scale" – usually, with well-sampled plant data at the genus level, each locus will prefer a (somewhat) different tree; this is why it is always problematic to simply combine them to infer a tree and use the arbitrary "... cut off value of 50%. This will give an idea of what actually is happening with ur species and genus of interest."
No, it won't. It will only tell you which topology 50% of the segregating sites prefer, and even that may be biased by branching and data-resolution artefacts, since one's loci typically can resolve some relationships but will be wrong on others. One can easily simulate data and, just by using different mutation rates, get false positives: moderately to even highly supported branches at odds with the true tree used for simulating the data. When combining (concatenating) the loci, one gets a tree with some unambiguously supported branches (BS ~ 100; often those of the true tree, plus a few false positives) and others that are just the least-hurting resolution for conflicting topological aspects encoded in the data (BS << 100).
This is also a problem for well-sampled animal data, as in the case of bears (the example we used for our 2017 paper; Schliep et al.). One branch in the combined tree is such a least-hurting compromise: it has no support from any of the topologically conflicting datasets (mitochondrial genes, nuclear autosomes, Y-chromosome; see the PDF for the basic situation there).
Plus, if reticulation took place (common in plant species but also in animals; see e.g. the classic works by Mallet), inference will find some consensus tree for the data, which is not necessarily the coalescent, the tree that best explains all the data.
"We normally do not consider one with less than 50 bootstrap value, to avoid any false results." is the reason why many genus-level phylogenies give only aspects of intra-generic evolution and differentiation (at best; some include overlooked false branches, the result of data and branching artefacts), rather than providing a comprehensive representation of what the data show. With genus data, especially plant genera, when you have branches with BS < 100, one should always show the bootstrap consensus network and/or the consensus network of the individual locus trees.
Generic differentiation is rarely 100% tree-like; the data are not the product of a simple sequence of dichotomous splits, which is what we model when inferring a tree.
2 Recommendations

Popular Answers (1)

12th Jan, 2014
Weston Testo
University of Florida
It is important to understand what the bootstrap value represents before you can really get a good feeling for what is "good" or "poor" support.
Bootstrapping is a resampling analysis in which the columns (characters) of your alignment are resampled with replacement to build a pseudo-replicate dataset, the tree is rebuilt, and you check whether the same nodes are recovered. This is done over many (quite often 100 or 1000) iterations. If, for example, you recover the same node in 95 of 100 pseudo-replicates, then you have a good indication that the node is well supported (your bootstrap value in that case would be 0.95 or 95%). If you get low support, that suggests that only a few characters support that node, as resampling characters at random from your matrix leads to a different reconstruction of that node.
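To make the resampling step concrete, here is a minimal, illustrative Python sketch (a toy alignment and a hand-rolled resampler, not any particular phylogenetics package). Each pseudo-replicate keeps whole columns together, which is what preserves the per-site character signal:

```python
import random

def bootstrap_alignment(alignment, seed=None):
    """Resample alignment columns with replacement to build one pseudo-replicate.
    `alignment` maps taxon name -> sequence string (all of equal length)."""
    rng = random.Random(seed)
    n_sites = len(next(iter(alignment.values())))
    # Draw n_sites column indices with replacement: this is the bootstrap step.
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    # The SAME column indices are applied to every taxon, so each sampled
    # site stays intact across the whole alignment.
    return {taxon: "".join(seq[i] for i in cols) for taxon, seq in alignment.items()}

# Toy example: 3 taxa, 8 sites (hypothetical data).
aln = {"A": "ACGTACGT", "B": "ACGTACGA", "C": "ACGAACGT"}
rep = bootstrap_alignment(aln, seed=1)
# Each pseudo-replicate has the same dimensions as the original alignment.
assert all(len(s) == 8 for s in rep.values())
```

In a real analysis you would rebuild the tree from each such pseudo-replicate and count how often each node reappears.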
Nicolas' suggestion regarding the Hillis and Bull paper is good, though I would suggest that a maximum likelihood tree with bootstrap values of 70% throughout would probably not go over well with reviewers. I can't comment on the analysis of Hillis and Bull (they obviously know what they are doing) but I frequently see BS values of ~70% referred to as moderate support.
As for comparison of maximum likelihood programs, I would recommend either Garli or RAxML. MEGA seems to get a lot of support on this site, but I do not use it because there are limited options for tweaking parameters.
I hope that helps.
57 Recommendations

All Answers (24)

11th Jan, 2014
Nicolas Magain
University of Liège
A bootstrap value of 70 is often considered the threshold for good confidence. See this paper by Hillis & Bull.
I have never used MEGA, but I think that although it is very easy to use, it has a few drawbacks, such as not being flexible/tunable enough.
See the discussion here
5 Recommendations
11th Jan, 2014
Olivier Cagnac
Fermentalg
Thank you very much for your answer Nicolas.
CU,
Olivier
1 Recommendation
14th Jan, 2014
Olivier Cagnac
Fermentalg
Thanks Weston, it surely helps.
20th Jan, 2014
Lauri Kaila
University of Helsinki
Not forgetting that a topology with very low support values may still be closer to a realistic one than another with better figures – quite often the data are silently improved a little by removing unstable taxa, in order to improve support values and make the results look more convincing. One might then ask: what is the purpose of the study design – to get beautiful figures at the nodes, or to get a more realistic view of the taxa with their horns and warts? I have been involved in some papers where very low (almost non-existent) support values were not deemed fatal for the findings, and the journals are indeed quite well-known and esteemed, as were the referees who identified themselves. If reviewers or editors only look at numbers, they have a quite simplistic view of science, I daresay.
33 Recommendations
Deleted profile
Because bootstrap values (and posterior probabilities and aLRTs, etc.) are subject to the multiple-testing problem, which gets worse as trees get bigger, I now disagree with the cutoff of 70%. Applying a sort of Bonferroni correction means that a value of 70 on a single branch corrects to something much smaller than 70%.
I like values that are above 95% but will label branches that are around 90%.
Having said that it is unrealistic to find such support on any tree across all branches so you must use common sense and place your concern on a few branches that you consider important to your thesis.
I don't think that 70% and below provides anything other than an equivocal solution.
So if you find such a value on nodes that are important to your hypothesis then I would say that you lack the data to make your point.
1 Recommendation
3rd Feb, 2014
Hani Hafez
Suez University
What can I do if I get values less than 70%? I did all the alignments well and ran MEGA5. Please help me on this point.
Also, I need to calculate the MRCA; is MEGA good for that? I tried to use BEAST but I could not manage it; I have no experience with it and there is a lot of data that I don't understand.
15th Jul, 2014
Kunal Arekar
Indian Institute of Science
As said before, bootstrapping is re-sampling from your existing sample with replacement, since it is not possible to go out in the field 100 or 1000 times to collect new samples. So if you bootstrap 100 times (replicates) and 95 times you get a similar result, then your bootstrap support for that particular result is 95%.
A bootstrap support above 95% is very good and very well accepted; a bootstrap support between 75% and 95% is reasonably good; anything below 75% is very poor support; and anything below 50% is of no use: it is rejected, and such values are not even displayed on the phylogenetic tree.
Regarding software for ML analysis, I suggest GARLI or RAxML (RAxML is mostly used these days, although both are similar in performance). Never use MEGA if you seriously want to publish your data: firstly, the major journals won't accept it, and secondly, at times MEGA messes up your topology very badly (personal experience).
And dude, forget about dating your tree using MEGA; BEAST is better.
FYI:
You can also use MrBayes, which works on Bayesian principles; people use it in addition to ML analysis for better understanding and interpretation of their results, and it is widely accepted.
Cheers!
3 Recommendations
29th Jul, 2015
Narjes Alfuraiji
University of Kerbala
Hi everyone,
I am wondering if any of you can suggest a reliable program to align 1184 protein sequences, as I want to construct my tree using maximum likelihood.
I appreciate your valuable suggestions.
16th Nov, 2015
Aram Jaf
University of Nottingham
How to assess the reliability of partitions given in a tree?
Bootstrapping is one of the most popular ways to assess the reliability of branches. The term goes back to Baron Münchhausen (who pulled himself out of a swamp by his bootstraps). Briefly, positions of the aligned sequences are randomly sampled from the multiple sequence alignment with replacement. The sampled positions are assembled into new data sets, the so-called bootstrapped samples. Each position has about a 63% chance of making it into a particular bootstrapped sample. If a grouping has a lot of support, it will be supported by at least some positions in each of the bootstrapped samples, and all the bootstrapped samples will yield this grouping. Bootstrapping can be applied to all methods of phylogenetic reconstruction.
Bootstrapping thus realizes the impossible: the evolution of sequences in real life happened only once, and it is impossible to run the evolution of, let's say, small-subunit ribosomal RNAs again. Nevertheless, using the resampling approach, pseudosamples are generated whose variation resembles the variation one would have obtained if it were possible to sample 100 or 1000 parallel worlds in which the evolution of 16S rRNAs occurred over and over again. You end up with a statistical analysis using only a single original sample.
Bootstrapping has become very popular for assessing the reliability of reconstructed phylogenies. Its advantage is that it can be applied to different methods of phylogenetic reconstruction, and that it assigns a probability-like number to every possible partition of the dataset (= branch in the resulting tree). Its disadvantages are that the support for individual groups decreases as you add more sequences to the dataset, and that it only measures how much support for a partition is in your data given a method of analysis. If the method of reconstruction falls victim to a bias or an artifact, this will be reproduced for every bootstrapped sample, and it will result in high bootstrap support values.
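The ~63% figure mentioned above follows from a simple calculation: a given column is missed by one draw with probability 1 - 1/n, hence absent from a whole bootstrap sample of n draws with probability (1 - 1/n)^n, which approaches 1/e as the alignment grows. A quick illustrative check in Python:

```python
import math

def inclusion_probability(n_sites):
    # Chance that a given alignment column appears at least once in a
    # bootstrap sample of n_sites draws with replacement.
    return 1.0 - (1.0 - 1.0 / n_sites) ** n_sites

# Converges to 1 - 1/e as the number of sites grows.
for n in (10, 100, 1000):
    print(n, round(inclusion_probability(n), 4))
print(round(1 - math.exp(-1), 4))  # the limit: 0.6321
```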
You can also find more at this link:
10 Recommendations
16th Nov, 2015
Aram Jaf
University of Nottingham
You can also see this paper, at this link.
4 Recommendations
17th Nov, 2015
Lauri Kaila
University of Helsinki
Aram points out a very crucial matter: "Its disadvantage is that the support for individual groups decreases as you add more sequences to the dataset". In other words: the better your data, the lower the support values! One should really stop and think about this before blindly interpreting the reliability of results using support values. There are very recent examples from lepidopteran phylogeny studies (in highly esteemed journals) where, with inferior taxon sampling, very high bootstrap values were acquired alongside worrisomely mismatching results. But if taxon sampling is increased, bootstrap values become lower, even though the result is rather likely closer to a realistic one. I have not had much hardship in publishing results with very low bootstrap values (in highly esteemed journals), as long as there are logical reasons to explain them in the particular data sets.
8 Recommendations
30th Nov, 2015
Rafael Molina-Venegas
Universidad de Sevilla
I quite agree with Lauri Kaila's point of view. The "problem" of low bootstrap values is even more notable when dealing with DNA super-matrices, where you combine both slow-evolving and fast-evolving regions of DNA (for example, to build the angiosperm phylogeny of a particular region). Since fast-evolving regions cannot be globally aligned due to sequence saturation, they are usually clustered taxonomically (e.g. at family level), and then the global and taxonomically local alignments are concatenated in a single super-matrix (e.g. Roquet et al., 2012). As a result, you get much missing data in the final super-matrix, which necessarily makes bootstrap values lower.
2 Recommendations
11th Apr, 2016
Jyrki Muona
University of Helsinki
The basic problem is that there is no agreement about "support". Wheeler & Pickett (2008) showed with real data that one result had a clade posterior support of 0.6, whereas the same clade had a Bremer support of 2, a jackknife support of 0.57, and a log-likelihood ratio of 0.25. Same taxa, same data, same models. Trees based on bootstrapping need not agree with the optimal tree and may indeed support "wrong" conclusions more often than "right" ones.
I also recommend the Hillis & Bull (1993) paper. Simulations, for sure, but the results show quite clearly that bootstrap values are a poor measure of support. I think Rafael's example exemplifies this. The more data there are, the more difficult it is to find the optimal solution, and the more likely (in the everyday sense at least) it is that the resampling efforts give a very different result.
5 Recommendations
27th May, 2016
Eduardo Zavala
National Institute on Aging
Aram Jaf, thank you for a really great answer. I did not quite understand where you say "if it were possible to sample 100 or 1000 parallel worlds in which the evolution of 16S rRNAs occurred over and over again. You end up with a statistical analyses using a single original sample only." But it sounds cool; I am going to learn more about this boot-tying thing... I mean bootstrapping.
18th Jun, 2016
Huynh Ky
Can Tho University
Since the phylogenetic tree is constructed statistically... I think many wrong conclusions about evolution are possible...
13th Apr, 2017
Atif Idrees
Fujian Agriculture and Forestry University
The expression "bootstrap" comes from the tales of Baron von Münchhausen, who found himself stuck in a deep hole. To get out, he grabbed his boots by the bootstraps and pulled himself upwards until he could step out of the hole. (This is of course impossible, and the Baron's lies are famous!)
In the statistical context, bootstrapping refers to using the data at hand to infer the uncertainty of said data, i.e. improving the statistic by pulling on its own bootstraps. In practice, this is achieved by resampling or permuting the input data.
As an example, consider a sample of 100 sheep whose buoyancy you have tested. You can give the mean and standard deviation of these 100 sheep, but we wish to infer more about the general buoyancy of sheep. However, after our experiments we are banned from the farm and cannot test more sheep. When taking a sample of a population, we assume it is representative of the entire population; thus, if we repeatedly sample with replacement from our 100 sheep and calculate the same test statistic, we get an idea of the whole population. This is the basic idea of bootstrapping.
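The sheep thought experiment can be sketched directly in Python. The buoyancy numbers below are made up, and `bootstrap_ci` is just an illustrative percentile-bootstrap helper, not a library function:

```python
import random
import statistics

def bootstrap_ci(sample, stat, n_reps=1000, alpha=0.05, seed=42):
    """Percentile bootstrap confidence interval for an arbitrary statistic."""
    rng = random.Random(seed)
    reps = sorted(
        stat(rng.choices(sample, k=len(sample)))  # resample WITH replacement
        for _ in range(n_reps)
    )
    lo = reps[int(n_reps * alpha / 2)]
    hi = reps[int(n_reps * (1 - alpha / 2)) - 1]
    return lo, hi

# Made-up buoyancy scores for 100 sheep (hypothetical data).
rng = random.Random(0)
sheep = [rng.gauss(5.0, 1.0) for _ in range(100)]
low, high = bootstrap_ci(sheep, statistics.mean)
# The interval brackets the observed sample mean.
assert low <= statistics.mean(sheep) <= high
```

The same resample-and-recompute idea, applied to tree reconstruction instead of a mean, is what gives the branch support values discussed in this thread.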
In terms of your phylogenetic tree, the bootstrap values indicate how many times out of 100 (in your case) the same branch was observed when repeating the phylogenetic reconstruction on a re-sampled set of your data. If you get 100 out of 100 (and your data is sufficiently large to support this), we are pretty damned sure that the observed branch is not due to a single extreme datapoint. If you get 50 out of 100, we cannot be as certain.
6 Recommendations
6th Jun, 2018
Georg Hochberg
Max Planck Institute for Terrestrial Microbiology
For anyone still interested, I highly recommend using transfer bootstrap estimates (see https://www.nature.com/articles/s41586-018-0043-0). They have a more natural interpretation than Felsenstein's classic bootstrap and do not suffer from the systematically lower support for deeper nodes that Felsenstein's does. The actual bootstrapping is done exactly as before (so you can keep using MEGA, RAxML or PhyML); you simply analyse the resulting bootstrap trees using this program (https://github.com/evolbioinfo/booster/releases).
Cheers,
Georg
6 Recommendations
20th Mar, 2019
Guido W Grimm
Trotters Independent Traders
It may be a bit late, but I just accidentally stumbled over this thread, and have to note that most (all) answers above are profoundly influenced by tree-thinking and common practice, rather than addressing the very nature of unambiguous bootstrap (BS) support, i.e. why we get BS < 100 for a certain branch. Also, the question is still a good one, given the many errors in peer-reviewed phylogenetic papers across all biological disciplines (some of which surface in the comments so far).
A good starting read for the BS-unaware is this post by David Morrison:
Some things you probably don't know about the bootstrap
For those looking for the hard stuff, Felsenstein's (2004) book is still a good read:
Felsenstein J. 2004. Inferring phylogenies. Sunderland, MA, U.S.A.: Sinauer Associates Inc.
Already the first error in some responses above: with currently used tree-inference programs we never estimate "node support", but always branch support! All commonly used tree-inference programs optimise unrooted trees. Only by rooting the tree – a post-inference graphical modification, e.g. using what we consider to be the outgroup – can we interpret the estimated branch support as support for one of the two nodes terminating each inferred branch (the root-distal one; the branch is the so-called "internode"). We bootstrap the matrix and infer a tree based on this pseudo-replicate of our entire data; we repeat this process and then count how often a certain phylogenetic split, a taxon bipartition, occurs in the pseudo-replicate tree sample. A phylogenetic tree is a one-dimensional graph put together from a series of such taxon bipartitions, and the standard BS approach is to optimise a tree and then map the support from the BS sample onto that tree. Note that RAxML, for example, has options to plot branch-support values against each other and to map a tree sample onto a tree not inferred from it; e.g. you can read in a Bayesian tree sample and map the values onto the ML tree, or read in the highest-probability Bayesian tree and map the ML-BS support onto that tree.
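The counting step described above can be sketched in Python if each replicate tree is reduced to its set of taxon bipartitions. The five-taxon splits below are hand-coded toy data, not output of any inference program; a real analysis would extract splits with a tree library:

```python
TAXA = frozenset("ABCDE")

def canon(side):
    """Normalise a split to one canonical side: a bipartition is an unordered
    pair of taxon sets, so {A,B} and {C,D,E} denote the same split."""
    other = TAXA - side
    return min(side, other, key=sorted)

def split_support(replicate_splits, side):
    """Percentage of replicate trees whose split set contains the bipartition
    separating `side` from the remaining taxa."""
    target = canon(frozenset(side))
    hits = sum(any(canon(s) == target for s in tree) for tree in replicate_splits)
    return 100.0 * hits / len(replicate_splits)

# Hand-coded split sets from three hypothetical replicate trees.
reps = [
    {frozenset("AB"), frozenset("ABC")},  # replicate 1: contains A+B
    {frozenset("AB"), frozenset("ABD")},  # replicate 2: contains A+B
    {frozenset("AC"), frozenset("ABC")},  # replicate 3: A+C instead
]
print(round(split_support(reps, "AB"), 1))  # 2 of 3 replicates -> 66.7
```

Mapping such frequencies onto the branches of the optimised tree is what produces the familiar BS values; the canonicalisation step is why the support belongs to a branch (a bipartition), not to a node.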
Second, BS support values or 'bootstrap percentages' are not probabilities; they measure the robustness of the character support. This distinguishes them from Bayesian-estimated posterior probabilities, and may be the reason we write BS = 50 but PP = 0.5. However, depending on the stringency (coherence) of the signal in a matrix, BS support may converge to the actual probability.
Third, and most importantly, you have to check why BS < 100. Let's say a branch in our tree supporting A as sister of B has BS = 70, the widely used threshold for moderate vs. good BS support. The taxon bipartition we search for in the BS sample is:
A + B | all others
With perfect data, a BS support of 70 for this split means that 70% of the variable sites support A as sister of B (and assuming the root is outside A + B). In reality, the number of segregating sites supporting the split may be higher or substantially lower (it also depends on how we infer the BS replicate trees for our sample).
Now the crucial question is what do the other 30% show?
  • They either do not resolve this particular bipartition at all (A and/or B are part of a soft polytomy) or produce a great variety of random bipartitions (e.g. A but not B sister to C, D, E, F, G, H ...), all of which have frequencies of < 5% in the BS pseudo-replicate sample. This means: the sister relationship of A and B is supported by imperfect (somewhat faint) but coherent signal in the matrix.
  • They consistently support a conflicting alternative topology that places A as sister to C, which accordingly receives BS <= 30. This means: part of your data prefers A as sister to B, but another significant part prefers A as sister to C; you have internal signal conflict! And your tree only shows part of the possible truth.
Internal signal conflict, also called tree-incompatible signal, can have various reasons. For instance (some of them occur in conjunction):
  • 70% of your segregating sites are from maternally inherited genomes (plastome, mitochondriome), which are more strongly affected by genetic drift and biogeography; 30% are from the biparentally inherited nucleome.
  • Incomplete lineage sorting: part of your genes/data show different aspects of the true tree (or the coalescent).
  • Combination of data with strongly differing evolutionary rates. Fast-evolving traits and genes may get the leaves right, but will be increasingly wrong towards the root (saturation effects, branching artefacts); slow-evolving, conserved patterns can better resolve deep relationships, but will not provide any support for the tip branches (or will provide wrong support when crucial data are missing due to sequencing gaps; ML is less vulnerable to this than MP or distance-based approaches).
  • All processes of evolutionary reticulation, hybridisation, introgression, lateral gene transfer etc (except for complete takeover by unilateral introgression) can express themselves in split BS support patterns.
  • In the special application of palaeontology (morphological data sets including extinct organisms, or exclusively compiled for them): actual ancestor-descendant relationships, or, in general, the overall level of primitiveness/derivedness. For instance, if A is the ancestor of B and C, and the matrix perfectly reflects this situation, then the two equally correct and wrong alternatives (A + B) + C and (A + C) + B will both converge to BS = 50. In reality, the more primitive A is and the more derived B and C are, the more a third alternative, A + (B + C), will also take its share. The faintness of coherent signal and actual ancestor-descendant relationships (old vs. young fossils vs. modern-day relatives) are the reason why palaeontological phylogenetic studies never get high levels of BS support (and if they do, it is either a trivial relationship, where A is identical to B but different from anything else, BS ~ 100, or a branching artefact such as long-branch attraction).
So if you have BS << 100, you need to explore the cause! This is very easy to do using bootstrap consensus networks. See e.g. this post:
How did I do it – a short guide to a nice graph
You just read your tree sample into SplitsTree (www.splitstree.org; detailed walk-through in the post); for the R-affine, the PHANGORN library for R now includes this option too, plus some other functions to transfer information between trees and networks. See this open-access paper (there are vignettes introducing the new functions):
Schliep K, Potts AJ, Morrison DA, Grimm GW. 2017. Intertwining phylogenetic trees and networks. Methods in Ecology and Evolution DOI:10.1111/2041-210X.12760.
A lot of examples for why we should stop ignoring why a BS becomes ambiguous can be found at the Genealogical World of Phylogenetic Networks:
PS: Note that internal data conflict is much more readily reflected in BS support; the Bayesian PP will easily tilt to one alternative (because this is what the MCMCMC chain is trained to do: find the tree that best explains all the data). There is a much-overlooked, brilliant paper regarding split support values:
Zander RH. 2004. Minimal values of reliability of Bootstrap and Jackknife proportions, Decay index, and Bayesian posterior probability. PhyloInformatics 2:1-13.
26 Recommendations
20th Mar, 2019
Lauri Kaila
University of Helsinki
Many thanks Guido! This explanation with clarification of terminology and basis of BS seems excellent, and all of us should read it carefully.
15th May, 2019
K. Manikantan Tirumulpad
Tropical Botanical Garden and Research Institute
Guys, thank you for posting the links to the articles. They are very good publications 👍. Now, answering the query. This is based on what we do with plant specimens. Take 2 or more loci, at least one resolving at genus scale and multiple loci (2 or more) that can resolve at species scale. The bootstrap value indicates how many times out of 100 the same branch was observed when repeating the phylogenetic reconstruction on a re-sampled set of the data. While building the tree, use a cut-off value of 50%. This will give an idea of what is actually happening with your species and genus of interest. Try to include more samples from diverse environments (I mean a similar species growing in different edaphic and climatic conditions). We normally do not consider anything with less than a 50 bootstrap value, to avoid any false results, unless we have convincing evidence from a trait. Now, coming to RAxML and MEGA: RAxML is software built specifically for maximum-likelihood analysis and has great features, but MEGA is also very effective and is a better option for people working on the Windows platform.
2 Recommendations