Preprint

Abstract

Confidence intervals of divergence times and branch lengths do not reflect uncertainty about their clades or about the prior distributions and other model assumptions on which they are based. Uncertainty about the clade may be propagated to a confidence interval by multiplying its confidence level by the bootstrap proportion of its clade or by another probability that the clade is correct. (If the confidence level is 95% and the bootstrap proportion is 90%, then the uncertainty-adjusted confidence level is (0.95)(0.90) = 0.855, or about 86%.) Uncertainty about the model can be propagated to the confidence interval by reporting the union of the confidence intervals from all the plausible models. Unless there is no overlap between the confidence intervals, that results in an uncertainty-adjusted interval whose lower and upper limits are the most extreme limits among the models' intervals. The proposed methods of uncertainty quantification may be used together.

https://doi.org/10.5281/zenodo.5212069
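The two adjustments described in the abstract can be illustrated with a minimal Python sketch. The function names, the example intervals, and the choice to return a list of disjoint pieces when intervals do not overlap are assumptions made for illustration; they are not taken from the preprint.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (lower limit, upper limit)


def adjusted_confidence_level(confidence_level: float, clade_probability: float) -> float:
    """Multiply the confidence level by the probability that the clade is correct.

    clade_probability could be a bootstrap proportion; both arguments are
    proportions in [0, 1], not percentages.
    """
    for value in (confidence_level, clade_probability):
        if not 0.0 <= value <= 1.0:
            raise ValueError("levels and probabilities must lie in [0, 1]")
    return confidence_level * clade_probability


def union_of_intervals(intervals: List[Interval]) -> List[Interval]:
    """Return the union of the intervals as a sorted list of disjoint pieces.

    When all intervals overlap, the result is a single interval whose limits
    are the most extreme lower and upper limits across the models.
    """
    merged: List[Interval] = []
    for lower, upper in sorted(intervals):
        if merged and lower <= merged[-1][1]:
            # Overlaps the previous piece: extend its upper limit if needed.
            merged[-1] = (merged[-1][0], max(merged[-1][1], upper))
        else:
            merged.append((lower, upper))
    return merged


# Example from the abstract: a 95% interval for a clade with 90% bootstrap support.
level = adjusted_confidence_level(0.95, 0.90)  # about 0.855

# Hypothetical divergence-time intervals (in millions of years) from three models.
model_intervals = [(40.0, 55.0), (50.0, 62.0), (45.0, 58.0)]
combined = union_of_intervals(model_intervals)
```

Under these assumptions, the combined interval would be reported at the adjusted level rather than at the nominal 95% level when both methods are used together.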

Article
Organisms face tradeoffs in performing multiple tasks. Identifying the optimal phenotypes maximizing the organismal fitness (or Pareto front) and inferring the relevant tasks allow testing phenotypic adaptations and help delineate evolutionary constraints, tradeoffs, and critical fitness components, so are of broad interest. It has been proposed that Pareto fronts can be identified from high-dimensional phenotypic data, including molecular phenotypes such as gene expression levels, by fitting polytopes (lines, triangles, tetrahedrons, etc.), and a program named ParTI was recently introduced for this purpose. ParTI has identified Pareto fronts and inferred phenotypes best for individual tasks (or archetypes) from numerous datasets such as the beak morphologies of Darwin’s finches and mRNA concentrations in human tumors, implying evolutionary optimizations of the involved traits. Nevertheless, the reliabilities of these findings are unknown. Using real and simulated data that lack evolutionary optimization, we here report extremely high false positive rates of ParTI. The errors arise from phylogenetic relationships or population structures of the organisms analyzed and the flexibility of data analysis in ParTI that is equivalent to p-hacking. Because these problems are virtually universal, our findings cast doubt on almost all ParTI-based results and suggest that reliably identifying Pareto fronts and archetypes from high-dimensional phenotypic data is currently generally difficult.
Article
The Molecular Evolutionary Genetics Analysis (MEGA) software enables comparative analysis of molecular sequences in phylogenetics and evolutionary medicine. Here, we introduce the macOS version of the MEGA software. This new version eliminates the need for virtualization and emulation programs previously required to use MEGA on Apple computers. MEGA for macOS utilizes memory and computing resources efficiently for conducting evolutionary analyses on Apple computers. It has a native Cocoa graphical user interface that is programmed to provide a consistent user experience across macOS, Windows, and Linux. MEGA for macOS is available from www.megasoftware.net free of charge.
Article
The RelTime method estimates divergence times when evolutionary rates vary among lineages. Theoretical analyses show that RelTime relaxes the strict molecular clock throughout a molecular phylogeny, and it performs well in the analysis of empirical and computer-simulated datasets in which evolutionary rates are variable. Lozano-Fernandez et al. (2017) found that the application of RelTime to one metazoan dataset (Erwin et al. 2011) produced equal rates for several ancient lineages, which led them to speculate that RelTime imposes a strict molecular clock for deep animal divergences. RelTime does not impose a strict molecular clock. The pattern observed by Lozano-Fernandez et al. (2017) was a result of the use of an option to assign the same rate to lineages in RelTime when the rates are not statistically significantly different. The median rate difference was 5% for many deep metazoan lineages for the Erwin et al. (2011) dataset, so the rate equality was not rejected. In fact, RelTime analysis with and without the option to test rate differences produced very similar time estimates. We found that the Bayesian time estimates vary widely depending on the root priors assigned, and that the use of less restrictive priors produces Bayesian divergence times that are concordant with those from RelTime for the Erwin et al. (2011) dataset. Therefore, it is prudent to discuss Bayesian estimates obtained under a range of priors in any discourse about molecular dating, including method comparisons.
Article
The molecular evolutionary genetics analysis (Mega) software implements many analytical methods and tools for phylogenomics and phylomedicine. Here, we report a transformation of Mega to enable cross-platform use on Microsoft Windows and Linux operating systems. Mega X does not require virtualization or emulation software and provides a uniform user experience across platforms. Mega X has additionally been upgraded to use multiple computing cores for many molecular evolutionary analyses. Mega X is available in two interfaces (graphical and command line) and can be downloaded from www.megasoftware.net free of charge.
Article
Interval estimates – estimates of parameters that include an allowance for sampling uncertainty – have long been touted as a key component of statistical analyses. There are several kinds of interval estimates, but the most popular are confidence intervals (CIs): intervals that contain the true parameter value in some known proportion of repeated samples, on average. The width of confidence intervals is thought to index the precision of an estimate; CIs are thought to be a guide to which parameter values are plausible or reasonable; and the confidence coefficient of the interval (e.g., 95 %) is thought to index the plausibility that the true parameter is included in the interval. We show in a number of examples that CIs do not necessarily have any of these properties, and can lead to unjustified or arbitrary inferences. For this reason, we caution against relying upon confidence interval theory to justify interval estimates, and suggest that other theories of interval estimation should be used instead. Electronic supplementary material The online version of this article (doi:10.3758/s13423-015-0947-8) contains supplementary material, which is available to authorized users.
Article
Bootstrapping is a common method for assessing confidence in phylogenetic analyses. Although bootstrapping was first applied in phylogenetics to assess the repeatability of a given result, bootstrap results are commonly interpreted as a measure of the probability that a phylogenetic estimate represents the true phylogeny. Here we use computer simulations and a laboratory-generated phylogeny to test bootstrapping results of parsimony analyses, both as measures of repeatability (i.e., the probability of repeating a result given a new sample of characters) and accuracy (i.e., the probability that a result represents the true phylogeny). Our results indicate that any given bootstrap proportion provides an unbiased but highly imprecise measure of repeatability, unless the actual probability of replicating the relevant result is nearly one. The imprecision of the estimate is great enough to render the estimate virtually useless as a measure of repeatability. Under conditions thought to be typical of most phylogenetic analyses, however, bootstrap proportions in majority-rule consensus trees provide biased but highly conservative estimates of the probability of correctly inferring the corresponding clades. Specifically, under conditions of equal rates of change, symmetric phylogenies, and internodal change of ≤20% of the characters, bootstrap proportions of ≥70% usually correspond to a probability of ≥95% that the corresponding clade is real. However, under conditions of very high rates of internodal change (approaching randomization of the characters among taxa) or highly unequal rates of change among taxa, bootstrap proportions >50% are overestimates of accuracy.
Article
We describe a procedure for model averaging of relaxed molecular clock models in Bayesian phylogenetics. Our approach allows us to model the distribution of rates of substitution across branches, averaged over a set of models, rather than conditioned on a single model. We implement this procedure and test it on simulated data to show that our method can accurately recover the true underlying distribution of rates. We applied the method to a set of alignments taken from a data set of 12 mammalian species and uncovered evidence that lognormally distributed rates better describe this data set than do exponentially distributed rates. Additionally, our implementation of model averaging permits accurate calculation of the Bayes factor(s) between two or more relaxed molecular clock models. Finally, we introduce a new computational approach for sampling rates of substitution across branches that improves the convergence of our Markov chain Monte Carlo algorithms in this context. Our methods are implemented under the BEAST 1.6 software package, available at http://beast-mcmc.googlecode.com.
Article
The fiducial argument arose from Fisher's desire to create an inferential alternative to inverse methods. Fisher discovered such an alternative in 1930, when he realized that pivotal quantities permit the derivation of probability statements concerning an unknown parameter independent of any assumption concerning its a priori distribution. The original fiducial argument was virtually indistinguishable from the confidence approach of Neyman, although Fisher thought its application should be restricted in ways reflecting his view of inductive reasoning, thereby blending an inferential and a behaviorist viewpoint. After Fisher attempted to extend the fiducial argument to the multiparameter setting, this conflict surfaced, and he then abandoned the unconditional sampling approach of his earlier papers for the conditional approach of his later work. Initially unable to justify his intuition about the passage from a probability assertion about a statistic (conditional on a parameter) to a probability assertion about a parameter (conditional on a statistic), Fisher thought in 1956 that he had finally discovered the way out of this enigma with his concept of recognizable subset. But the crucial argument for the relevance of this concept was founded on yet another intuition--one which, now clearly stated, was later demonstrated to be false by Buehler and Feddersen in 1963.
Article
The statistical properties of sample estimation and bootstrap estimation of phylogenetic variability from a sample of nucleotide sequences are studied by using model trees of three taxa with an outgroup and by assuming a constant rate of nucleotide substitution. The maximum-parsimony method of tree reconstruction is used. An analytic formula is derived for estimating the sequence length that is required if P, the probability of obtaining the true tree from the sampled sequences, is to be equal to or higher than a given value. Bootstrap estimation is formulated as a two-step sampling procedure: (1) sampling of sequences from the evolutionary process and (2) resampling of the original sequence sample. The probability that a bootstrap resampling of an original sequence sample will support the true tree is found to depend on the model tree, the sequence length, and the probability that a randomly chosen nucleotide site is an informative site. When a trifurcating tree is used as the model tree, the probability that one of the three bifurcating trees will appear in ≥95% of the bootstrap replicates is <5%, even if the number of bootstrap replicates is only 50; therefore, the probability of accepting an erroneous tree as the true tree is <5% if that tree appears in ≥95% of the bootstrap replicates and if more than 50 bootstrap replications are conducted. However, if a particular bifurcating tree is observed in, say, <75% of the bootstrap replicates, then it cannot be claimed to be better than the trifurcating tree even if ≥1,000 bootstrap replications are conducted. When a bifurcating tree is used as the model tree, the bootstrap approach tends to overestimate P when the sequences are very short, but it tends to underestimate that probability when the sequences are long. Moreover, simulation results show that, if a tree is accepted as the true tree only if it has appeared in ≥95% of the bootstrap replicates, then the probability of failing to accept any bifurcating tree can be as large as 58% even when P = 95%, i.e., even when 95% of the samples from the evolutionary process will support the true tree. Thus, if the rate-constancy assumption holds, bootstrapping is a conservative approach for estimating the reliability of an inferred phylogeny for four taxa.
Article
A simple mathematical method is developed to estimate the number of nucleotide substitutions per site between two DNA sequences, by extending Kimura's (1980) two-parameter method to the case where a G+C-content bias exists. This method will be useful when there are strong transition-transversion and G+C-content biases, as in the case of Drosophila mitochondrial DNA.
Article
Hypothesis tests are conducted not only to determine whether a null hypothesis (H0) is true but also to determine the direction or sign of an effect. A simple estimate of the posterior probability of a sign error is PSE = (1 - PH0) p/2 + PH0, depending only on a two-sided p value and PH0, an estimate of the posterior probability of H0. A convenient option for PH0 is the posterior probability derived from estimating the Bayes factor to be its e p ln(1/p) lower bound. In that case, PSE depends only on p and an estimate of the prior probability of H0. PSE provides a continuum between significance testing and traditional Bayesian testing. The former effectively assumes the prior probability of H0 is 0, as some statisticians argue. In that case, PSE is equal to a one-sided p value. (In that sense, PSE is a calibrated p value.) In traditional Bayesian testing, on the other hand, the prior probability of H0 is at least 50%, which usually brings PSE close to PH0.
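The formula for PSE quoted in this abstract can be worked through with a short Python sketch. It rests on two assumptions that the abstract does not spell out: the Bayes factor is oriented in favor of H0, and the posterior probability PH0 is obtained from the prior odds by Bayes's rule. The function names are hypothetical, and the e p ln(1/p) bound is applied only for 0 < p < 1/e.

```python
import math


def bayes_factor_lower_bound(p: float) -> float:
    """Lower bound e * p * ln(1/p) on the Bayes factor, for 0 < p < 1/e."""
    if not 0.0 < p < 1.0 / math.e:
        raise ValueError("the bound is used here only for 0 < p < 1/e")
    return math.e * p * math.log(1.0 / p)


def posterior_h0(p: float, prior_h0: float) -> float:
    """Estimate PH0 by treating the lower bound as the Bayes factor (assumption)."""
    if not 0.0 <= prior_h0 < 1.0:
        raise ValueError("prior_h0 must lie in [0, 1)")
    prior_odds = prior_h0 / (1.0 - prior_h0)
    posterior_odds = bayes_factor_lower_bound(p) * prior_odds
    return posterior_odds / (1.0 + posterior_odds)


def prob_sign_error(p: float, prior_h0: float) -> float:
    """PSE = (1 - PH0) * p / 2 + PH0, with p a two-sided p value."""
    ph0 = posterior_h0(p, prior_h0)
    return (1.0 - ph0) * p / 2.0 + ph0


# With a prior probability of 0 for H0, PSE reduces to the one-sided p value p/2.
pse_significance = prob_sign_error(0.04, 0.0)

# With a prior probability of 50%, PSE is dominated by the estimate of PH0.
pse_bayes = prob_sign_error(0.05, 0.5)
```

In this sketch, setting prior_h0 to 0 recovers the significance-testing end of the continuum, while values of 0.5 or more correspond to the traditional Bayesian end.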
Article
A review is provided of the concept of confidence distributions. Material covered includes fundamentals, extensions, and applications of confidence distributions, as well as available computer software. We expect that this review could serve as a source of reference and encourage further research with respect to confidence distributions.
Bromham, L., 2016. An Introduction to Molecular Evolution and Phylogenetics. Oxford University Press, Oxford.
Bromham, L., Duchêne, S., Hua, X., Ritchie, A.M., Duchêne, D.A., Ho, S.Y.W., 2018. Bayesian molecular dating: opening up the black box. Biological Reviews 93, 1165-1191.
Hall, B., 2018. Phylogenetic Trees Made Easy: A How-To Manual. Sinauer Associates, New York.