John Huelsenbeck, University of California, Berkeley | UCB · Department of Integrative Biology
Doctor of Philosophy
Publications: 130
Reads: 49,724
Citations: 122,765
Publications (130)
In Bayesian phylogenetic inference, marginal likelihoods can be estimated using several different methods, including the path-sampling or stepping-stone-sampling algorithms. Both algorithms are computationally demanding because they require a series of power posterior Markov chain Monte Carlo (MCMC) simulations. Here we introduce a general parallel...
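As a rough illustration of why a series of power posterior simulations is needed, the stepping-stone estimator can be sketched as below. This is a minimal sketch, not the implementation described in the paper: the function names and the input layout (one list of log-likelihood values per power posterior, already sampled by some MCMC run) are assumptions made for the example.

```python
import math


def log_mean_exp(values):
    """Numerically stable log of the mean of exp(values)."""
    largest = max(values)
    return largest + math.log(sum(math.exp(v - largest) for v in values) / len(values))


def stepping_stone_log_marginal(betas, loglik_samples):
    """Stepping-stone estimate of the log marginal likelihood.

    betas: increasing powers starting at 0.0 (prior) and ending at 1.0 (posterior).
    loglik_samples[k]: log-likelihoods of draws from the power posterior with
        power betas[k], for k = 0 .. len(betas) - 2 (hypothetical input layout).
    """
    if len(loglik_samples) != len(betas) - 1:
        raise ValueError("need one sample list per stone except the last")
    total = 0.0
    for k in range(len(betas) - 1):
        step = betas[k + 1] - betas[k]
        # Each ratio of normalizing constants is estimated by the average of
        # likelihood**step over draws from the lower-power posterior.
        total += log_mean_exp([step * loglik for loglik in loglik_samples[k]])
    return total
```

Each entry of `loglik_samples` comes from a separate power posterior chain, which is where the computational cost mentioned in the abstract arises.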
The rooting of the SARS-CoV-2 phylogeny is important for understanding the origin and early spread of the virus. Previously published phylogenies have used different rootings that do not always provide consistent results. We investigate several different strategies for rooting the SARS-CoV-2 tree and provide measures of statistical uncertainty for...
The rooting of the SARS-CoV-2 phylogeny is important for understanding the origin and early spread of the virus. Previously published phylogenies have used different rootings that do not always provide consistent results. We use several different strategies for rooting the SARS-CoV-2 tree and provide measures of statistical uncertainty for all meth...
Sampling across tree space is one of the major challenges in Bayesian phylogenetic inference using Markov chain Monte Carlo (MCMC) algorithms. Standard MCMC tree moves consider small random perturbations of the topology, and select from candidate trees at random or based on the distance between the old and new topologies. MCMC algorithms using such...
BEAGLE is a high-performance likelihood-calculation library for phylogenetic inference. The BEAGLE library defines a simple, but flexible, application programming interface (API), and includes a collection of efficient implementations for calculation under a variety of evolutionary models on different hardware devices. The library has been integrat...
Species richness varies considerably across the tree of life, a pattern that can only be explained by heterogeneous rates of diversification (speciation and extinction). Previous approaches use phylogenetic trees to estimate branch-specific diversification rates. However, all previous approaches disregard diversification-rate shifts on extinct lineages althou...
Motivation: In Bayesian phylogenetic inference, marginal likelihoods are estimated using either the path-sampling or stepping-stone-sampling algorithms. Both algorithms are computationally demanding because they require a series of power posterior Markov chain Monte Carlo (MCMC) simulations. Here we introduce a general parallelization strategy that...
Significance: We show that Bayesian analysis of macroevolutionary mixtures (BAMM), a method for identifying lineage-specific diversification rates, is flawed. Exposing the problems with BAMM is important both to empiricists (to avoid making unreliable inferences using this method) and to theoreticians (to focus their efforts on solving the problems th...
Programs for Bayesian inference of phylogeny currently implement a unique and fixed suite of models. Consequently, users of these software packages are simultaneously forced to use a number of programs for a given study, while also lacking the freedom to explore models that have not been implemented by the developers of those programs. We developed...
Insect phylogeny has recently been the focus of renewed interest as advances in sequencing techniques make it possible to rapidly generate large amounts of genomic or transcriptomic data for a species of interest. However, large numbers of markers are not sufficient to guarantee accurate phylogenetic reconstruction, and the choice of the model of s...
Recent years have seen a rapid expansion of the model space explored in statistical phylogenetics, emphasizing the need for new approaches to statistical model representation and software development. Clear communication and representation of the chosen model is crucial for: (1) reproducibility of an analysis, (2) model development and (3) software...
Variation in the evolutionary process across the sites of nucleotide sequence alignments is well established, and is an increasingly pervasive feature of datasets composed of gene regions sampled from multiple loci and/or different genomes. Inference of phylogeny from these data demands that we adequately model the underlying process heterogeneity;...
Significance: Divergence time estimation on an absolute timescale requires external calibration information, which typically is derived from the fossil record. The common practice in Bayesian divergence time estimation involves applying calibration densities to individual nodes. Often, these priors are arbitrarily chosen and specified yet have an ex...
Paired epistatic interactions, such as those in the stem regions of RNA, play an important role in many biological processes. However, unlike protein-coding regions, paired epistatic interactions have lacked the appropriate statistical tools for the detection of departures from selective neutrality. Here, a model is presented for the analysis of pa...
Historical biogeography is increasingly studied from an explicitly statistical perspective, using stochastic models to describe the evolution of species range as a continuous-time Markov process of dispersal between and extinction within a set of discrete geographic areas. The main constraint of these methods is the computational limit on the numbe...
Since its introduction in 2001, MrBayes has grown in popularity as a software package for Bayesian phylogenetic inference using Markov chain Monte Carlo (MCMC) methods. With this note, we announce the release of version 3.2, a major upgrade to the latest official release presented in 2003. The new version provides convergence diagnostics and allows...
We introduce a new model for relaxing the assumption of a strict molecular clock for use as a prior in Bayesian methods for divergence time estimation. Lineage-specific rates of substitution are modeled using a Dirichlet process prior (DPP), a type of stochastic process that assumes lineages of a phylogenetic tree are distributed into distinct rate...
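To give a feel for how a Dirichlet process prior groups items into a random number of classes, the sketch below draws a partition using the Chinese restaurant process construction. It is only an illustration under stated assumptions: treating the items as branches of a tree, and the names `sample_crp_partition`, `n_items`, and `alpha`, are choices made for this example and are not taken from the paper.

```python
import random


def sample_crp_partition(n_items, alpha, rng):
    """Draw class labels for n_items from a Chinese restaurant process.

    alpha is the concentration parameter: larger values favor more classes.
    Returns a list of integer labels, numbered in order of first appearance.
    """
    assignments = []
    class_sizes = []
    for i in range(n_items):
        # Item i joins existing class k with probability size_k / (i + alpha)
        # and opens a new class with probability alpha / (i + alpha).
        u = rng.random() * (i + alpha)
        cumulative = 0.0
        chosen = len(class_sizes)  # default: open a new class
        for k, size in enumerate(class_sizes):
            cumulative += size
            if u < cumulative:
                chosen = k
                break
        if chosen == len(class_sizes):
            class_sizes.append(0)
        class_sizes[chosen] += 1
        assignments.append(chosen)
    return assignments
```

For example, `sample_crp_partition(20, 1.0, random.Random(1))` returns 20 labels in which items sharing a label would share one rate parameter; the number of distinct labels is itself random.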
Phylogenetic inference is fundamental to our understanding of most aspects of the origin and evolution of life, and in recent years, there has been a concentration of interest in statistical approaches such as Bayesian inference and maximum likelihood estimation. Yet, for large data sets and realistic or interesting models of evolution, these appro...
Structurama is a program for inferring population structure. Specifically, the program calculates the posterior probability of assigning individuals to different populations. The program takes as input a file containing the allelic information at some number of loci sampled from a collection of individuals. After reading a data file into computer m...
But Tuffley and Steel (1997) introduced a model called No Common Mechanism (NCM), in which characters may—but are not required to—vary their relative rates independently, both within and between branches. Because the independent variation is taken only as a possibility, not as a requirement, NCM would apply to almost any situation, and so may be ac...
Nearly all commonly used methods of phylogenetic inference assume that characters in an alignment evolve independently of one another. This assumption is attractive for simplicity and computational tractability but is not biologically reasonable for RNAs and proteins that have secondary and tertiary structures. Here, we simulate RNA and protein-cod...
Recent papers have promoted the view that model-based methods in general, and those based on Approximate Bayesian Computation (ABC) in particular, are flawed in a number of ways, and are therefore inappropriate for the analysis of phylogeographic data. These papers further argue that Nested Clade Phylogeographic Analysis (NCPA) offers the best appr...
Parallel evolution is the acquisition of identical adaptive traits in independently evolving populations. Understanding whether the genetic changes underlying adaptation to a common selective environment are parallel within and between species is interesting because it sheds light on the degree of evolutionary constraints. If parallel evolution is...
We provide a new automated statistical method for DNA barcoding based on a Bayesian phylogenetic analysis. The method is based on automated database sequence retrieval, alignment, and phylogenetic analysis using a custom-built program for Bayesian phylogenetic analysis. We show on real data that the method outperforms Blast searches as a measure of...
Models of amino acid substitution present challenges beyond those often faced with the analysis of DNA sequences. The alignments of amino acid sequences are often small, whereas the number of parameters to be estimated is potentially large when compared with the number of free parameters for nucleotide substitution models. Most approaches to the an...
Several stochastic models of character change, when implemented in a maximum likelihood framework, are known to give a correspondence between the maximum parsimony method and the method of maximum likelihood. One such model has an independently estimated branch-length parameter for each site and each branch of the phylogenetic tree. This model--the...
An important challenge in evolutionary biology is to understand how major changes in body form arise. The dramatic transition from a lizard-like to snake-like body form in squamate reptiles offers an exciting system for such research because this change is replicated dozens of times. Here, we use morphometric data for 258 species and a time-calibra...
Introduction; History; Developing an Intuition of Likelihood; Method of Maximum Likelihood; Bayesian Inference; Markov Chain Monte Carlo; Assessing Uncertainty of Phylogenies; Hypothesis Testing and Model Choice; Comparative Analysis; Conclusions; References
The main limiting factor in Bayesian MCMC analysis of phylogeny is typically the efficiency with which topology proposals sample tree space. Here we evaluate the performance of seven different proposal mechanisms, including most of those used in current Bayesian phylogenetics software. We sampled 12 empirical nucleotide data sets--ranging in size f...
The statistical methods applied to the analysis of genomic data do not account for uncertainty in the sequence alignment. Indeed, the alignment is treated as an observation, and all of the subsequent inferences depend on the alignment being correct. This may not have been too problematic for many phylogenetic studies, in which the gene is carefully...
Substitution rates are one of the most fundamental parameters in a phylogenetic analysis and are represented in phylogenetic models as the branch lengths on a tree. Variation in substitution rates across an alignment of molecular sequences is well established and likely caused by variation in functional constraint across the genes encoded in the se...
Introduction; Molecular data; Models and phylogenetic inference; Estimating phylogeny; Assessing confidence; Testing models of DNA substitution; Conclusions
To distinguish between alternative explanations for the presence of synchronous broods in the Miocene-Pliocene bivalve Transennella species, we performed in situ burial experiments on the Recent species T. confusa. All Recent Transennella species are asynchronous brooders; a single brood contains all or most developmental stages. Specimens from M...
When a beneficial mutation is fixed in a population that lacks recombination, the genetic background linked to that mutation is fixed. As a result, beneficial mutations on different backgrounds experience competition, or “clonal interference,” that can cause asexual populations to evolve more slowly than their sexual counterparts. Factors such as a...
Inferring population structure from genetic data sampled from some number of individuals is a formidable statistical problem. One widely used approach considers the number of populations to be fixed and calculates the posterior probability of assigning individuals to each population. More recently, the assignment of individuals to populations and t...
Mossel and Vigoda (Reports, 30 September 2005, p. 2207) show that nearest neighbor interchange transitions, commonly used in phylogenetic Markov chain Monte Carlo (MCMC) algorithms, perform poorly on mixtures of dissimilar trees. However, the conditions leading to their results are artificial. Standard MCMC convergence diagnostics would detect the...
Most methods for detecting Darwinian natural selection at the molecular level rely on estimating the rates or numbers of nonsynonymous and synonymous changes in an alignment of protein-coding DNA sequences. In some of these methods, the nonsynonymous rate of substitution is allowed to vary across the sequence, permitting the identification of singl...
Reversible-jump Markov chain Monte Carlo (RJ-MCMC) is a technique for simultaneously evaluating multiple related (but not necessarily nested) statistical models that has recently been applied to the problem of phylogenetic model selection. Here we use a simulation approach to assess the performance of this method and compare it to Akaike weights, a...
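For readers unfamiliar with the comparison method named above, Akaike weights can be computed from a set of AIC scores as sketched below. This is a minimal sketch of the standard formula, not code from the study; the function name `akaike_weights` and the example scores are assumptions.

```python
import math


def akaike_weights(aic_values):
    """Convert AIC scores into Akaike weights that sum to one.

    Each weight is proportional to exp(-delta / 2), where delta is the
    difference between a model's AIC and the smallest AIC in the set.
    """
    best = min(aic_values)
    relative = [math.exp(-0.5 * (aic - best)) for aic in aic_values]
    total = sum(relative)
    return [r / total for r in relative]
```

A call such as `akaike_weights([100.0, 102.0, 110.0])` gives the largest weight to the first model; the weights are often read as relative support for each candidate model within the set considered.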
Metazoan phylogeny remains one of evolutionary biology's major unsolved problems. Molecular and morphological data, as well as different analytical approaches, have produced highly conflicting results due to homoplasy resulting from more than 570 million years of evolution [1-4]. To date, parsimony has been the only feasible combined approa...
What does the posterior probability of a phylogenetic tree mean? This simulation study shows that Bayesian posterior probabilities have the meaning that is typically ascribed to them; the posterior probability of a tree is the probability that the tree is correct, assuming that the model is correct. At the same time, the Bayesian method can be sens...
Metazoan phylogeny remains one of evolutionary biology's major unsolved problems. Molecular and morphological data, as well as different analytical approaches, have produced highly conflicting results due to homoplasy resulting from more than 570 million years of evolution. To date, parsimony has been the only feasible combined approach but is high...
The unambiguous footprint of positive Darwinian selection in protein-coding DNA sequences is revealed by an excess of nonsynonymous substitutions over synonymous substitutions compared with the neutral expectation. Methods for analyzing the patterns of nonsynonymous and synonymous substitutions usually rely on stochastic models in which the selecti...
The likelihood of a phylogenetic tree is proportional to the probability of observing the comparative data (such as aligned DNA sequences) conditional on the tree. The likelihood function is important because it is the vehicle that carries the observations. The likelihood function can be used in two ways to infer phylogeny. First, the tree that max...
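The idea of a tree likelihood can be made concrete with a small sketch of Felsenstein's pruning algorithm. The example assumes the Jukes-Cantor (JC69) substitution model and a hypothetical tree encoding (a leaf is a taxon name; an internal node is a list of `(child, branch_length)` pairs); neither choice comes from the abstract.

```python
import math

STATES = "ACGT"


def jc69_prob(from_state, to_state, t):
    """JC69 probability of to_state after branch length t, given from_state."""
    decay = math.exp(-4.0 * t / 3.0)
    if from_state == to_state:
        return 0.25 + 0.75 * decay
    return 0.25 - 0.25 * decay


def conditional_likelihoods(node, site_states):
    """Probability of the data below node, for each possible state at node."""
    if isinstance(node, str):  # leaf: the observed nucleotide is certain
        return [1.0 if s == site_states[node] else 0.0 for s in STATES]
    result = [1.0] * len(STATES)
    for child, t in node:
        below = conditional_likelihoods(child, site_states)
        for x, parent_state in enumerate(STATES):
            # Sum over the unknown child state, weighted by the transition probability.
            result[x] *= sum(
                jc69_prob(parent_state, child_state, t) * below[y]
                for y, child_state in enumerate(STATES)
            )
    return result


def log_likelihood(tree, alignment):
    """Log-likelihood of an alignment (dict: taxon -> sequence) given the tree."""
    n_sites = len(next(iter(alignment.values())))
    total = 0.0
    for i in range(n_sites):
        site_states = {taxon: seq[i] for taxon, seq in alignment.items()}
        at_root = conditional_likelihoods(tree, site_states)
        # JC69 has equal stationary frequencies of 1/4 for each nucleotide.
        total += math.log(sum(0.25 * value for value in at_root))
    return total
```

For instance, `log_likelihood([("a", 0.1), ("b", 0.2)], {"a": "ACGT", "b": "ACGA"})` scores a two-taxon tree; sites are treated as independent, so their log-likelihoods add.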
In protein-coding DNA sequences, historical patterns of selection can be inferred from amino acid substitution patterns. High relative rates of nonsynonymous to synonymous changes ($\omega = d_N/d_S$) are a clear indicator of positive, or directional, selection, and several recently developed methods attempt to distinguish these sites from those under neu...
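As a simple illustration of the ratio itself, the sketch below computes it from counts of differences and sites using a counting approach with a Jukes-Cantor correction. This is deliberately not the model-based method the abstract refers to; the function names and the input counts are assumptions made for the example.

```python
import math


def jukes_cantor_distance(p):
    """Correct a proportion of differing sites p for multiple hits (Jukes-Cantor)."""
    if not 0.0 <= p < 0.75:
        raise ValueError("proportion must lie in [0, 0.75)")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def omega_ratio(nonsyn_diffs, nonsyn_sites, syn_diffs, syn_sites):
    """Ratio of nonsynonymous to synonymous distance from simple counts.

    Values above 1 suggest positive selection, values near 1 neutrality,
    and values below 1 purifying selection.
    """
    d_n = jukes_cantor_distance(nonsyn_diffs / nonsyn_sites)
    d_s = jukes_cantor_distance(syn_diffs / syn_sites)
    if d_s == 0.0:
        raise ValueError("synonymous distance is zero; the ratio is undefined")
    return d_n / d_s
```

With hypothetical counts, `omega_ratio(30, 300, 5, 100)` exceeds 1 because the nonsynonymous proportion (0.10) is larger than the synonymous one (0.05).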
A common problem in molecular phylogenetics is choosing a model of DNA substitution that does a good job of explaining the DNA sequence alignment without introducing superfluous parameters. A number of methods have been used to choose among a small set of candidate substitution models, such as the likelihood ratio test, the Akaike Information Crite...
The recent development of Bayesian phylogenetic inference using Markov chain Monte Carlo (MCMC) techniques has facilitated the exploration of parameter-rich evolutionary models. At the same time, stochastic models have become more realistic (and complex) and have been extended to new types of data, such as morphology. Based on this foundation, we d...
In this chapter, we develop a Bayesian approach to supertree construction. Bayesian inference requires that prior knowledge be specified in terms of a probability distribution and incorporates this evidence in new analyses. This provides a natural framework for the accumulation of phylogenetic evidence, but it requires that phylogenetic results be...
Although the conditions under which the parsimony method becomes inconsistent have been studied for almost two decades, the probability that the parsimony method would encounter conditions causing inconsistency under simple models of cladogenesis is unknown. Here, we examine the statistical behavior of the parsimony method under a birth-death model...
MrBayes 3 performs Bayesian phylogenetic analysis combining information from different data partitions or subsets evolving under different stochastic evolutionary models. This allows the user to analyze heterogeneous data sets consisting of different data types—e.g. morphological, nucleotide, and protein—and to explore a wide variety of structured...
The importance of accommodating the phylogenetic history of a group when performing a comparative analysis is now widely recognized. The typical approaches either assume the tree is known without error, or they base inferences on a collection of well-supported trees or on a collection of trees generated under a stochastic model of cladogenesis. How...
Many questions in evolutionary biology are best addressed by comparing traits in different species. Often such studies involve mapping characters on phylogenetic trees. Mapping characters on trees allows the nature, number, and timing of the transformations to be identified. The parsimony method is the only method available for mapping morphologica...
This book owes its origins (in more ways than one) to Mark Hafner. In 1997 he suggested that he and I edit a book on cospeciation. Mark had already assembled a list of contributors and approached some publishers, so I readily agreed to undertake what I thought would be a straightforward task as junior editor. Soon afterwards, it became clear that Mar...
Only recently has Bayesian inference of phylogeny been proposed. The method is now a practical alternative to the other methods; indeed, the method appears to possess advantages over the other methods in terms of ability to use complex models of evolution, ease of interpretation of the results, and computational efficiency. However, the method shou...
Introduction. 1.1 The $d_N/d_S$ ratio. The degree to which an amino acid site is free to vary is strongly dependent on its structural and functional importance. An amino acid that plays a critical role, perhaps as a member of a functionally important structure, is unlikely to change over evolutionary time. In fact, most methods aimed at detecting regions o...
The concomitantly variable codons hypothesis of DNA substitution argues that at any time only a fraction of the codons in a gene are capable of accepting a mutation. However, as mutations are fixed at some positions in a gene, the sites that are potentially variable also change because of changed functional constraints. This hypothesis has been ter...
Phylogenetic trees can be rooted by a number of criteria. Here, we introduce a Bayesian method for inferring the root of a phylogenetic tree by using one of several criteria: the outgroup, molecular clock, and nonreversible model of DNA substitution. We perform simulation analyses to examine the relative ability of these three criteria to correctly...
Many biogeographic problems are tested on phylogenetic trees. Typically, the uncertainty in the phylogeny is not accommodated when investigating the biogeography of the organisms. Here we present a method that accommodates uncertainty in the phylogenetic trees. Moreover, we describe a simple method for examining the support for competing biogeograp...
Identifying positively selected amino acid sites is an important approach for making inference about the function of proteins; an amino acid site that is undergoing positive selection is likely to play a key role in the function of the protein. We present a new Bayesian method for identifying positively selected amino acid sites and apply the metho...
As a discipline, phylogenetics is becoming transformed by a flood of molecular data. These data allow broad questions to be asked about the history of life, but also present difficult statistical and computational problems. Bayesian inference of phylogeny brings a new perspective to a number of outstanding issues in evolutionary biology, including...
The program MRBAYES performs Bayesian inference of phylogeny using a variant of Markov chain Monte Carlo.
Availability: MRBAYES, including the source code, documentation, sample data files, and an executable, is available at http://brahms.biology.rochester.edu/software.html.
Contact: johnh{at}brahms.biology.rochester.edu
Several methods have been proposed to infer the states at the ancestral nodes on a phylogeny. These methods assume a specific tree and set of branch lengths when estimating the ancestral character state. Inferences of the ancestral states, then, are conditioned on the tree and branch lengths being true. We develop a hierarchical Bayes method for in...
Bacteriophage of the family Leviviridae have played an important role in molecular biology where representative species, such as Q beta and MS2, have been studied as model systems for replication, translation, and the role of secondary structure in gene regulation. Using nucleotide sequences from the coat and replicase genes we present the first st...
MrBayes is a program for the Bayesian inference of phylogeny. This manual explains Bayesian inference of phylogeny and how to use the program. The program has a command-line interface and should run on a variety of computer platforms. Note that the computer should be reasonably fast and should have a lot of memory (depending on the size of the data...
Many evolutionary studies use comparisons across species to detect evidence of natural selection and to examine the rate of character evolution. Statistical analyses in these studies are usually performed by means of a species phylogeny to accommodate the effects of shared evolutionary history. The phylogeny is usually treated as known without erro...
Information on the history of cospeciation and host switching for a group of host and parasite species is contained in the DNA sequences sampled from each. Here, we develop a Bayesian framework for the analysis of cospeciation. We suggest a simple model of host switching by a parasite on a host phylogeny in which host switching events are assumed t...
The molecular clock hypothesis remains an important conceptual and analytical tool in evolutionary biology despite the repeated observation that the clock hypothesis does not perfectly explain observed DNA sequence variation. We introduce a parametric model that relaxes the molecular clock by allowing rates to vary across lineages according to a co...
All current phylogenetic methods assume that DNA substitutions are independent among sites. However, ample empirical evidence suggests that the process of substitution is not independent but is, in fact, temporally and spatially correlated. The robustness of several commonly used phylogenetic methods to the assumption of independent substitution is...