Paul O Lewis

  • Doctor of Philosophy
  • Professor at University of Connecticut

About

Publications: 92
Reads: 19,744
Citations: 10,106
Introduction
Bayesian statistics related to phylogenetics
Current institution
University of Connecticut
Current position
  • Professor

Publications (92)
Preprint
The multispecies coalescent (MSC) model applies coalescent theory to gene evolution within and among reproductively isolated populations ("species") to estimate a species tree in the face of gene tree conflict resulting from deep coalescence. Sequential Monte Carlo (SMC) uses particle filtering to sample a posterior distribution, providing a fully-...
Article
The Lowest Radial Distance (LoRaD) method is a modification of the recently introduced Partition-Weighted Kernel method for estimating the marginal likelihood of a model, a quantity important for Bayesian model selection. For analyses involving a fixed tree topology, LoRaD improves upon the Steppingstone or Thermodynamic Integration (Path Sampling)...
Article
Full-text available
Lanchester's models of combat have been invoked to explain the mechanics of group fighting in social animals. Specifically, Lanchester's square law posits that the fighting ability of the group is proportional to the square of the number of combatants. Although used to explain a variety of ecological phenomena, the models have not been thoroughly t...
Preprint
Full-text available
In 2020 and 2021, the COVID-19 pandemic led to an abrupt overhaul of many academic practices, including the transition of scientific events, such as workshops, to a fully virtual format. We describe our experiences organizing and teaching online-only statistical phylogenetics workshops and the lessons we learned along the way. We found that online...
Article
It is of great practical importance to compare and combine data from different studies in order to carry out appropriate and more powerful statistical inference. We propose a partition based measure to quantify the compatibility of two datasets using their respective posterior distributions. We further propose an information gain measure to quantif...
Data
DNA sequence alignments and the associated Python script that removes uncertainly aligned sites, which are indicated (masked) in the NEXUS file. Explanatory text is included in the NEXUS file.
Article
Full-text available
BEAGLE is a high-performance likelihood-calculation library for phylogenetic inference. The BEAGLE library defines a simple, but flexible, application programming interface (API), and includes a collection of efficient implementations for calculation under a variety of evolutionary models on different hardware devices. The library has been integrat...
Article
In the Bayesian framework, the marginal likelihood plays an important role in variable selection and model comparison. The marginal likelihood is the marginal density of the data after integrating out the parameters over the parameter space. However, this quantity is often analytically intractable due to the complexity of the model. In this paper,...
Article
Full-text available
Background: Chlorophyceae is one of the three most species-rich green algal classes and also the only class in core Chlorophyta whose monophyly remains uncontested as gene and taxon sampling improves. However, some key relationships within Chlorophyceae are less clear-cut and warrant further investigation. The present study combined genome-scale chlor...
Article
With the rapid reduction in sequencing costs of high-throughput genomic data, it has become commonplace to use hundreds of genes to infer phylogeny of any study system. While sampling a large number of genes has given us a tremendous opportunity to uncover previously unknown relationships and improve phylogenetic resolution, it also presents us wit...
Article
The computation of marginal posterior density in Bayesian analysis is essential in that it can provide complete information about parameters of interest. Furthermore, the marginal posterior density can be used for computing Bayes factors, posterior model probabilities, and diagnostic measures. The conditional marginal density estimator (CMDE) is th...
Article
Full-text available
Premise of the study: Phylogenomic analyses across the green algae are resolving relationships at the class, order, and family levels and highlighting dynamic patterns of evolution in organellar genomes. Here we present a within-family phylogenomic study to resolve genera and species relationships in the family Hydrodictyaceae (Chlorophyceae), for...
Preprint
Full-text available
With the rapid reduction in sequencing costs of high-throughput genomic data, it has become commonplace to use hundreds of genes/sites to infer phylogeny of any study system. While sampling a large number of genes has given us a tremendous opportunity to uncover previously unknown relationships and improve phylogenetic resolution, it also presents us...
Article
Full-text available
Premise of the study: Spermacoceae are mainly an herbaceous group in the Rubiaceae. However, a few lineages are woody and are found in a diverse range of habitat types. Three of the largest woody lineages (Arcytophyllum, Hedyotis, and Kadua) are characterized by their distribution in the moist tropical mountains and have disjunct distribution patt...
Article
Evaluating the marginal likelihood in Bayesian analysis is essential for model selection. Estimators based on a single Markov chain Monte Carlo sample from the posterior distribution include the harmonic mean estimator and the inflated density ratio estimator. We propose a new class of Monte Carlo estimators based on this single Markov chain Monte...
Article
Premise of the study: Estimating phylogenetic relationships in relatively recent evolutionary radiations is challenging, especially if short branches associated with recent divergence result in multiple gene tree histories. We combine anchored enrichment next-generation sequencing with species tree analyses to produce a robust estimate of phylogen...
Article
Full-text available
Measuring the phylogenetic information content of data has a long history in systematics. Here we explore a Bayesian approach to information content estimation. The entropy of the posterior distribution compared to the entropy of the prior distribution provides a natural way to measure information content. If the data have no information relevant t...
Article
PREMISE OF THE STUDY: Discovery and morphological characterization of a novel epiphytic aquatic green alga increases our understanding of Chaetopeltidales, a poorly known order in Chlorophyceae. Chloroplast genomic data from this taxon reveals an unusual architecture previously unknown in green algae. METHODS: Using light and electron microscopy, w...
Article
Full-text available
Premise of the study: Discovery and morphological characterization of a novel epiphytic aquatic green alga increases our understanding of Chaetopeltidales, a poorly known order in Chlorophyceae. Chloroplast genomic data from this taxon reveals an unusual architecture previously unknown in green algae. Methods: Using light and electron...
Article
Full-text available
The chloroplast genomes of green algae are highly variable in their architecture. In this article we summarize gene content across newly obtained and published chloroplast genomes in Chlorophyceae, including new data from nine species in Sphaeropleales (Chlorophyceae, Chlorophyta). We present genome architecture information, including genome syn...
Article
Chloroplast sequence data are widely used to infer phylogenies of plants and algae. With the increasing availability of complete chloroplast genome sequences, the opportunity arises to resolve ancient divergences that were heretofore problematic. On the flip side, properly analyzing large multi-gene data sets can be a major challenge, as these data...
Article
Full-text available
Hedyotis and related genera (here called the Hedyotis-Oldenlandia complex) are highly debated groups in the Rubiaceae family with no consensus to date on their generic delimitations. The present study focuses on Asian-Pacific taxa from these groups and aims at resolving taxonomic inconsistencies by describing monophyletic genera within the complex....
Article
Phycas is open source, freely available Bayesian phylogenetics software written primarily in C++ but with a Python interface. Phycas specializes in Bayesian model selection for nucleotide sequence data, particularly the estimation of marginal likelihoods, central to computing Bayes Factors. Marginal likelihoods can be estimated using newer methods...
Article
Full-text available
The method of fitting a hierarchical model with Dirichlet process mixing is a versatile tool for data analysts. It has been applied to density estimation, classification, clustering, and high dimensional data analysis. Many computing algorithms have been proposed to evaluate this mixture. Different labels in the algorithm that assign data points i...
Article
Full-text available
Phylogenetic relationships in the green algal phylum Chlorophyta have long been subject to debate, especially at higher taxonomic ranks (order, class). The relationships among three traditionally defined and well-studied classes, Chlorophyceae, Trebouxiophyceae, and Ulvophyceae are of particular interest, as these groups are species-rich and ecolog...
Article
Unlike most other green algae, trebouxiophyceans are predominantly aerophytic and contain many symbiotic representatives. In recent years, a number of new terrestrial trebouxiophycean taxa were described from soils, tree bark, and lichens. The present phylogenetic study reveals three new lineages of free-living trebouxiophyceans found in North Amer...
Article
Full-text available
The majority of our knowledge about mitochondrial genomes of Viridiplantae comes from land plants, but much less is known about their green algal relatives. In the green algal order Sphaeropleales (Chlorophyta), only one representative mitochondrial genome is currently available—that of Acutodesmus obliquus. Our study adds nine completely sequenced...
Article
Full-text available
We present two distinctly different posterior predictive approaches to Bayesian phylogenetic model selection and illustrate these methods using examples from green algal protein-coding cpDNA sequences and flowering plant rDNA sequences. The Gelfand–Ghosh (GG) approach allows dissection of an overall measure of model fit into components due to poste...
Article
Best known for aquatic colonial algae such as Hydrodictyon, Pediastrum, or Scenedesmus, the order Sphaeropleales also contains numerous coccoid taxa from aquatic and terrestrial habitats. Recent findings indicate that coccoid lineages in this order are very diverse genetically and may be the prevalent form, although their diversity is often hidden...
Article
Phylogenetics, the study of evolutionary relationships among groups of organisms, has played an important role in modern biological research, such as genomic comparison, detecting orthology and paralogy, estimating divergence times, reconstructing ancient proteins, identifying mutations likely to be associated with disease, determining the identity...
Article
Molecular phylogenetic analyses have had a major impact on the classification of the green algal class Chlorophyceae, corroborating some previous evolutionary hypotheses, but primarily promoting new interpretations of morphological evolution. One set of morphological traits that feature prominently in green algal systematics is the absolute orienta...
Article
Full-text available
Phylogenetic inference is fundamental to our understanding of most aspects of the origin and evolution of life, and in recent years, there has been a concentration of interest in statistical approaches such as Bayesian inference and maximum likelihood estimation. Yet, for large data sets and realistic or interesting models of evolution, these appro...
Article
Full-text available
The marginal likelihood is commonly used for comparing different evolutionary models in Bayesian phylogenetics and is the central quantity used in computing Bayes Factors for comparing model fit. A popular method for estimating marginal likelihoods, the harmonic mean (HM) method, can be easily computed from the output of a Markov chain Monte Carlo...
Article
Full-text available
Bayesian phylogenetic analyses often depend on Bayes factors (BFs) to determine the optimal way to partition the data. The marginal likelihoods used to compute BFs, in turn, are most commonly estimated using the harmonic mean (HM) method, which has been shown to be inaccurate. We describe a new more accurate method for estimating the marginal likel...
Article
Full-text available
This is an electronic version of an article published in Systematic Biology [Holder, Mark T., Jeet Sukumaran, and Paul O. Lewis. A justification for reporting majority-rule consensus tree in Bayesian phylogenetics. Systematic Biology, 57(5):814–821, 2008.] Systematic Biology is available online at informaworld http://dx.doi.org/10.1080/106351508024...
Article
The plastid genome sequence of the parasitic liverwort Aneura mirabilis revealed the loss of five chlororespiration (ndh) genes. Additionally, six ndh genes, subunits of photosystem I, photosystem II, and the cytochrome b6f complex were inferred to be pseudogenes. Pseudogenes of cysA, cysT, ccsA, and ycf3, an inversion of psbE and petL, were also d...
Article
Full-text available
In December, 2006, a group of 26 software developers from some of the most widely used life science programming toolkits and phylogenetic software projects converged on Durham, North Carolina, for a Phyloinformatics Hackathon, an intense five-day collaborative software coding event sponsored by the National Evolutionary Synthesis Center (NESCent)....
Article
Full-text available
Understanding spatial patterns of species diversity and the distributions of individual species is a consuming problem in biogeography and conservation. The Cape Floristic Region (CFR) of South Africa is a global hotspot of diversity and endemism, and the Protea Atlas Project, with some 60,000 site records across the region, provides an extraordin...
Article
Deserts are not usually considered biodiversity hotspots, but desert microbiotic crust communities exhibit a rich diversity of both eukaryotic and prokaryotic life forms. Like many communities dominated by microscopic organisms, they defy characterization by traditional species-counting approaches to assessing biodiversity. Here we use exclusive mo...
Article
Full-text available
Bayesian phylogenetic analyses are now very popular in systematics and molecular evolution because they allow the use of much more realistic models than currently possible with maximum likelihood methods. There are, however, a growing number of examples in which large Bayesian posterior clade probabilities are associated with very short branch leng...
Article
Full-text available
We examined isozyme variation in the dominant Chihuahuan Desert shrub, Larrea tridentata (creosotebush), to determine the genetic variation within and among populations, the biogeographic relationships of populations, and the potential inbreeding in the species. We surveyed 17 populations consisting of 20 to 50 individuals per population along a 16...
Article
Polygonella macrophylla Small is a rare, perennial, primarily gynodioecious plant species endemic to a narrow zone of coastal sand pine scrub habitat along the Gulf of Mexico from Carrabelle, Florida, to Gulf Shores, Alabama (U.S.A.). The species is comprised of a crimson-red flowered form (“rubra”), known from only two populations at the eastern d...
Article
Full-text available
The NEXUS Class Library (NCL) is a collection of C++ classes designed to simplify interpreting data files written in the NEXUS format used by many computer programs for phylogenetic analyses. The NEXUS format allows different programs to share the same data files, even though none of the programs can interpret all of the data stored th...
Article
Within-population variability in plant reproductive traits can influence both male and female fitness, but research on the male function of flowers has been hindered by the difficulty of measuring male fertility. Here we evaluate studies that employ paternity analysis to examine how specific plant traits affect male reproductive success (RS) in bot...
Article
Deserts are not thought of as biodiversity hotspots, but desert microbiotic crust communities represent a largely unknown community type rich in diversity of eukaryotic and prokaryotic taxa. These ecologically important communities have received much attention because of their role in nutrient cycling and soil stabilization in deserts, but they def...
Article
Full-text available
The construction of evolutionary trees is now a standard part of exploratory sequence analysis. Bayesian methods for estimating trees have recently been proposed as a faster method of incorporating the power of complex statistical models into the process. Researchers who rely on comparative analyses need to understand the theoretical and practical...
Article
Full-text available
We investigated the usefulness of a parallel genetic algorithm for phylogenetic inference under the maximum-likelihood (ML) optimality criterion. Parallelization was accomplished by assigning each "individual" in the genetic algorithm "population" to a separate processor so that the number of processors used was equal to the size of the evolving po...
Article
Molecular markers derived from polymerase chain reaction (PCR) amplification of genomic DNA are an important part of the toolkit of evolutionary geneticists. Random amplified polymorphic DNA markers (RAPDs), amplified fragment length polymorphisms (AFLPs) and intersimple sequence repeat (ISSR) polymorphisms allow analysis of species for which previ...
Article
Molecular markers derived from PCR amplification of genomic DNA are an important part of the toolkit of evolutionary geneticists. RAPDs, AFLPs, and ISSR polymorphisms allow analysis of species for which prior DNA sequence information is lacking, but dominance makes it impossible to apply standard techniques to calculate F-statistics. We describe a...
Article
Full-text available
Evolutionary biologists have adopted simple likelihood models for purposes of estimating ancestral states and evaluating character independence on specified phylogenies; however, for purposes of estimating phylogenies by using discrete morphological data, maximum parsimony remains the only option. This paper explores the possibility of using standa...
Article
The Bayesian Methods in Phylogenetics symposium was held as part of Evolution 2001 in Knoxville, TN, USA, from 26 to 30 June 2001.
Article
Full-text available
Convergence in nucleotide composition (CNC) in unrelated lineages is a factor potentially affecting the performance of most phylogeny reconstruction methods. Such convergence has deleterious effects because unrelated lineages show similarities due to similar nucleotide compositions and not shared histories. While some methods (such as the LogDet/pa...
Article
Long restricted to the domain of molecular systematics and studies of molecular evolution, likelihood methods are now being used in analyses of discrete morphological data, specifically to estimate ancestral character states and for tests of character correlation. Biologists are beginning to apply likelihood models within a Bayesian statistical fra...
Data
PAUPRat is a program to generate a text file that contains commands which, when read by PAUP* (with an open and executed data file), will implement the Parsimony Ratchet of Kevin Nixon (1999b). The Parsimony Ratchet is a search strategy that is surprisingly efficient at finding shortest trees for data sets too large for traditional heuristic search...
Article
Full-text available
Phylogeny reconstruction is a difficult computational problem, because the number of possible solutions increases with the number of included taxa. For example, for only 14 taxa, there are more than seven trillion possible unrooted phylogenetic trees. For this reason, phylogenetic inference methods commonly use clustering algorithms (e.g., the neig...
Chapter
Full-text available
Much of what we know about the phylogenetic relationships among organisms has resulted from the application of parsimony methods to discrete character data. Huelsenbeck (1995) reported that 59% of phylogenetic analyses reported in the journal Systematic Biology over the past decade used parsimony rather than compatibility, invariants, distance, or...
Article
Advanced-generation domestication programs for forest-tree species have raised some concerns about the maintenance of genetic diversity in forest-tree breeding programs. Genetic diversity in natural stands was compared with two genetic conservation options for a third-generation elite Pinus taeda breeding population. The breeding population was subd...
Article
Full-text available
We used simulated data to investigate a number of properties of maximum-likelihood (ML) phylogenetic tree estimation for the case of four taxa. Simulated data were generated under a broad range of conditions, including wide variation in branch lengths, differences in the ratio of transition and transversion substitutions, and the absence or presenc...
Article
Populations of each of the 11 species of the North American angiosperm genus Polygonella (Polygonaceae) were sampled for electrophoretically detectable allozyme diversity. In contrast to expectations based on similar surveys in many other vascular plant groups, the two most widespread species of Polygonella showed reduced within-population gene div...
Article
The Random Amplified Polymorphic DNA (RAPD) technique can potentially provide hundreds of polymorphic markers for use by ecologists studying mating systems in natural populations. We consider here the implications of the dominance displayed by RAPD markers for deterministic paternity assignment. Our goal was to provide a means for assessing the cos...
Article
Thesis (Ph. D.)--Ohio State University, 1991. Includes bibliographical references. Photocopy.
