116 Publications
14,468 Reads
7,397 Citations
Publications (116)
Hawaiian lobeliads exhibit extensive adaptive radiations and are considered the largest plant clade (143 species) endemic to any oceanic archipelago. Rapid insular radiations are prone to reticulate evolution, yet detecting hybridization is often limited by inadequate sampling of taxa or independent loci. We analyzed 633 nuclear loci (including tet...
Inference of phylogenetic networks is of increasing interest in the genomic era. However, the extent to which phylogenetic networks are identifiable from various types of data remains poorly understood, despite its crucial role in justifying methods. This work obtains strong identifiability results for large sub-classes of galled tree-child semidir...
F-statistics are commonly used to assess hybridization, admixture or introgression between populations or deeper evolutionary lineages. Their fast calculation from allele frequencies allows for rapid downstream admixture graph inference. One frequently overlooked assumption of the f4-test is a constant substitution rate. This assumption is typic...
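As a rough illustration of the allele-frequency calculation this abstract mentions, the sketch below uses the commonly cited definition of the f4-statistic for populations (A, B; C, D): the average over loci of (pA - pB)(pC - pD). The function name `f4` and the toy frequencies are hypothetical, and this is not necessarily the estimator or test procedure used in the paper (no bias correction, standard error, or rate-variation handling is included).

```python
from statistics import mean


def f4(p_a, p_b, p_c, p_d):
    """Naive f4(A, B; C, D): mean over loci of (pA - pB) * (pC - pD).

    Each argument is a sequence of allele frequencies, one per locus,
    for a single population. Illustrative only: no bias correction.
    """
    if not (len(p_a) == len(p_b) == len(p_c) == len(p_d)):
        raise ValueError("all populations need the same number of loci")
    return mean((a - b) * (c - d) for a, b, c, d in zip(p_a, p_b, p_c, p_d))


# Toy (made-up) allele frequencies at two loci for four populations.
pop_a = [0.5, 0.5]
pop_b = [0.25, 0.75]
pop_c = [1.0, 0.0]
pop_d = [0.5, 0.5]

print(f4(pop_a, pop_b, pop_c, pop_d))
```

Under a tree in which A and B form one clade and C and D another, with no gene flow, the expected value of this quantity is zero, which is why a significant departure from zero is read as a signal of admixture.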
Semidirected networks have received interest in evolutionary biology as the appropriate generalization of unrooted trees to networks, in which some but not all edges are directed. Yet these networks lack proper theoretical study. We define here a general class of semidirected phylogenetic networks, with a stable set of leaves, tree nodes and hybrid...
The current study aimed to fill the gap in research on factors predictive of word reading in French-speaking children with developmental language disorder (DLD) by finding out whether the same predictors of written word recognition evidenced in typically developing children would be retrieved in children with DLD or if some predictors could be spec...
In phylogenetic networks, it is desirable to estimate edge lengths in substitutions per site or calendar time. Yet, there is a lack of scalable methods that provide such estimates. Here we consider the problem of obtaining edge length estimates from genetic distances, in the presence of rate variation across genes and lineages, when the network top...
The evolution of molecular and phenotypic traits is commonly modelled using Markov processes along a rooted phylogeny. This phylogeny can be a tree, or a network if it includes reticulations, representing events such as hybridization or admixture. Computing the likelihood of data observed at the leaves is costly as the size and complexity of the ph...
Reticulations in a phylogenetic network represent processes such as gene flow, admixture, recombination and hybrid speciation. Extending definitions from the tree setting, an anomalous network is one in which some unrooted tree topology displayed in the network appears in gene trees with a lower frequency than a tree not displayed in the network. W...
Tradeoffs between the energetic benefits and costs of traits can shape species and trait distributions along environmental gradients. Here we test predictions based on such tradeoffs using survival, growth, and 50 photosynthetic, hydraulic, and allocational traits of ten Eucalyptus species grown in four common gardens along an 8-fold gradient in pr...
Within-species trait variation may be the result of genetic variation, environmental variation, or measurement error, for example. In phylogenetic comparative studies, failing to account for within-species variation has many adverse effects, such as increased error in testing hypotheses about evolutionary correlations, biased estimates of evolution...
The evolutionary implications and frequency of hybridization and introgression are increasingly being recognized across the tree of life. To detect hybridization from multi-locus and genome-wide sequence data, a popular class of methods are based on summary statistics from subsets of 3 or 4 taxa. However, these methods often carry the assumption of...
We consider the evolution of phylogenetic gene trees along phylogenetic species networks, according to the network multispecies coalescent process, and introduce a new network coalescent model with correlated inheritance of gene flow. This model generalizes two traditional versions of the network coalescent: with independent or common inheritance....
Mycoheterotrophy is an alternative nutritional strategy whereby plants obtain sugars and other nutrients from soil fungi. Mycoheterotrophy and associated loss of photosynthesis has evolved repeatedly in plants, particularly in monocots. Although reductive evolution of plastomes in mycoheterotrophs is well documented, the dynamics of nuclear genome...
Phylogenetic networks extend phylogenetic trees to model non-vertical inheritance, by which a lineage inherits material from multiple parents. The computational complexity of estimating phylogenetic networks from genome-wide data with likelihood-based methods limits the size of networks that can be handled. Methods based on pairwise distances could...
We assess relationships among 192 species in all 12 monocot orders and 72 of 77 families, using 602 conserved single-copy (CSC) genes and 1375 benchmarking single-copy ortholog (BUSCO) genes extracted from genomic and transcriptomic datasets. Phylogenomic inferences based on these data, using both coalescent-based and supermatrix analyses, are larg...
Motivation:
Kinship estimation is necessary for evaluating violations of assumptions or testing certain hypotheses in many population genomic studies. However, kinship estimators are usually designed for diploid systems and cannot be used in populations with mixed haploid diploid genetic systems. The only estimators for different ploidies require...
Within-species variation may be the result of genetic variation, environmental variation or measurement error for example. In phylogenetic comparative studies, failing to account for intraspecific variation has many adverse effects, such as increased error to test hypotheses about evolutionary correlations, biased estimates of evolutionary rates,...
Chiropterophily, or bat pollination, is typically considered a highly specialized pollination system that has evolved independently numerous times across the angiosperm phylogeny, with distinct lineages often converging on a similar suite of floral traits. The African baobab, Adansonia digitata, occurs widespread across continental Africa and intro...
Significance
Since Darwin’s ground-breaking monograph on carnivorous plants, scientists have recognized only 11 independent origins of plant carnivory. We report the discovery of a new lineage of carnivorous plants, represented by the North American flowering plant Triantha occidentalis. Among monocots, Triantha represents the only instance of a s...
Motivation
With growing genome-wide molecular data sets from next-generation sequencing, phylogenetic networks can be estimated using a variety of approaches. These phylogenetic networks include events like hybridization, gene flow, or horizontal gene transfer explicitly. However, the most accurate network inference methods are computationally heav...
The Tree of Life is the graphical structure that represents the evolutionary process from single-cell organisms at the origin of life to the vast biodiversity we see today. Reconstructing this tree from genomic sequences is challenging due to the variety of biological forces that shape the signal in the data, and many of those processes like incomp...
Baobabs (Adansonia) are a cohesive group of tropical trees with a disjunct distribution in Australia, Madagascar, and continental Africa, and diverse flowers associated with two pollination modes. We used custom targeted sequence capture in conjunction with new and existing phylogenetic comparative methods to explore the evolution of floral traits...
Genomic data have had a profound impact on nearly every biological discipline. In systematics and phylogenetics, the thousands of loci that are now being sequenced can be analyzed under the multispecies coalescent model (MSC) to explicitly account for gene tree discordance due to incomplete lineage sorting (ILS). However, the MSC assumes no gene fl...
[This corrects the article DOI: 10.1371/journal.pone.0076267.].
Previous research suggests that Gossypium has undergone a 5‐ to 6‐fold multiplication following its divergence from Theobroma. However, the number of events, or where they occurred in the Malvaceae phylogeny remains unknown. We analyzed transcriptomic and genomic data from representatives of eight of the nine Malvaceae subfamilies. Phylogenetic ana...
Premise of the Study
We present the first plastome phylogeny encompassing all 77 monocot families, estimate branch support, and infer monocot‐wide divergence times and rates of species diversification.
Methods
We conducted maximum likelihood analyses of phylogeny and BAMM studies of diversification rates based on 77 plastid genes across 545 monoco...
The goal of Phylogenetic Comparative Methods (PCMs) is to study the distribution of quantitative traits among related species. The observed traits are often seen as the result of a Brownian Motion (BM) along the branches of a phylogenetic tree. Reticulation events such as hybridization, gene flow or horizontal gene transfer, can substantially affec...
To study the evolution of several quantitative traits, the classical phylogenetic comparative framework consists of a multivariate random process running along the branches of a phylogenetic tree. The Ornstein-Uhlenbeck (OU) process is sometimes preferred to the simple Brownian Motion (BM) as it models stabilizing selection toward an optimum. The o...
PhyloNetworks is a Julia package for the inference, manipulation, visualization and use of phylogenetic networks in an interactive environment. Inference of phylogenetic networks is done with maximum pseudolikelihood from gene trees or multi-locus sequences (SNaQ), with possible bootstrap analysis. PhyloNetworks is the first software providing tool...
Diffusion processes on trees are commonly used in evolutionary biology to model the joint distribution of continuous traits, such as body mass, across species. Estimating the parameters of such processes from tip values presents challenges because of the intrinsic correlation between the observations produced by the shared evolutionary history, thu...
Over the past several years, phylogenetic comparative studies have increasingly approached trait evolution in a multivariate context, with a number of taxa that continues to rise dramatically. Recent methods for phylogenetic comparative studies have provided ways to incorporate measurement error and to address computational challenges. However, mis...
Coalescent-based methods are now broadly used to infer evolutionary relationships between groups of organisms under the assumption that incomplete lineage sorting (ILS) is the only source of gene tree discordance. Many of these methods are known to consistently estimate the species tree when all their assumptions are met. Nonetheless, little work h...
Since Darwin, biologists have come to recognize that the theory of descent from common ancestry is very well supported by diverse lines of evidence. However, while the qualitative evidence is overwhelming, we also need formal methods for quantifying the evidential support for common ancestry (CA) over the alternative hypothesis of separate ancestry...
Whole genome duplications (WGDs) have helped shape the genomes of land plants, and recent evidence suggests that the genomes of all angiosperms have experienced at least two ancient WGDs. In plants, WGDs often are followed by rapid fractionation, in which many homeologous gene copies are lost. Thus, it can be extremely difficult to identify, let al...
Supporting information file that contains formal definitions, proofs of network identifiability results, more details on the heuristic network search, on the simulation study and on the fish network analysis.
Quartet CFs under the coalescent with hybridization
Topology identifiability
Parameter identifiability
Heuristic search in the space of n...
While there is no doubt among evolutionary biologists that all living species, or merely all living species within a particular group (e.g., animals), share descent from a common ancestor, formal statistical methods for evaluating common ancestry from aligned DNA sequence data have received criticism. One primary criticism is that prior methods tak...
The common ancestry of life is supported by an enormous body of evidence and is universally accepted within the scientific community. However, some potential sources of data that can be used to test the thesis of common ancestry have not yet been formally analyzed.
We developed a new test of common ancestry based on nucleotide sequences at amino ac...
Phylogenetic networks are necessary to represent the tree of life expanded by edges to represent events such as horizontal gene transfers, hybridizations or gene flow. Not all species follow the paradigm of vertical inheritance of their genetic material. While a great deal of research has flourished into the inference of phylogenetic trees, statist...
Genome sequence data contain abundant information about genealogical history, but methods for extracting and interpreting this information are not yet fully developed. We analyzed genome sequences for multiple accessions of the selfing plant, Arabidopsis thaliana, with the goal of better understanding its genealogical history. As expected from acce...
Delimitation of species based exclusively on genetic data has been advocated despite a critical knowledge gap: how might such approaches fail because they rely on genetic data alone, and would their accuracy be improved by using multiple data-types. We provide here the requisite framework for addressing these key questions. Because both phenotypic...
For the study of macroevolution, phenotypic data are analysed across species on a dated phylogeny using phylogenetic comparative methods. In this context, the Ornstein‐Uhlenbeck (OU) process is now being used extensively to model selectively driven trait evolution, whereby a trait is attracted to a selection optimum μ.
We report here theoretical pr...
We developed a linear-time algorithm applicable to a large class of trait evolution models, for efficient likelihood calculations and parameter inference on very large trees. Our algorithm solves the traditional computational burden associated with two key terms, namely the determinant of the phylogenetic covariance matrix V and quadratic products...
Whole Genome Duplications (WGDs) followed by massive gene loss occurred in the evolutionary history of many groups. WGDs are usually inferred from the age distribution of paralogs (Ks-based methods) or from gene collinearity data (synteny). However, Ks-based methods are restricted to detect the recent WGDs due to saturation effects and the difficul...
We compared 31 complete and nearly complete globally derived HSV-1 genomic sequences using HSV-2 HG52 as an outgroup to investigate their phylogenetic relationships and look for evidence of recombination. The sequences were retrieved from NCBI and were then aligned using Clustal W. The generation of a maximum likelihood tree resulted in a six clade...
The Dobzhansky-Muller model of speciation posits that defects in hybrids between species are the result of negative epistatic interactions between alleles that arose in independent genetic backgrounds. Tests of one important prediction from this model, that incompatibilities "snowball," have relied on comparisons of the number of incompatibilities...
Ancestral recombination events can cause the underlying genealogy of a site to vary along the genome. We consider Bayesian models to simultaneously detect recombination breakpoints in very long sequence alignments and estimate the phylogenetic tree of each block between breakpoints. The models we consider use a dissimilarity measure between trees i...
Hierarchical autocorrelation in the error term of linear models arises when sampling units are related to each other according to a tree. The residual covariance is parametrized using the tree-distance between sampling units. When observations are modeled using an Ornstein-Uhlenbeck (OU) process along the tree, the autocorrelation between two tips...
Aims: This study analysed the growth and survival of 18 strains of the six serotypes of non‐O157 Shiga toxin‐producing Escherichia coli (STEC) (O26, O45, O103, O111, O121 and O145) most frequently implicated in human illness and compared them with Escherichia coli O157:H7 strain ATCC43895. Methods and Results: The data from growth in Luria–Bertani brot...
Conspicuous innovations in the history of life are often preceded by more cryptic genetic and developmental precursors. In many cases, these appear to be associated with recurring origins of very similar traits in close relatives (parallelisms) or striking convergences separated by deep time (deep homologies). Although the phylogenetic distribution...
Despite advances in genetic mapping of quantitative traits and in phylogenetic comparative approaches, these two perspectives are rarely combined. The joint consideration of multiple crosses among related taxa (whether species or strains) not only allows more precise mapping of the genetic loci (called quantitative trait loci, QTL) that contribute...
Recent genomic studies have drastically altered our knowledge of polyploid evolution. Wild potatoes (Solanum section Petota) are a highly diverse and economically important group of about 100 species widely distributed throughout the Americas. Thirty-six percent of the species in section Petota are polyploid or with diploid and polyploid cytotypes....
The objective of this study was to evaluate possible claims by advocates of small-scale dairy farming that milk from smaller Wisconsin farms is of higher quality than milk from larger Wisconsin farms. Reported bulk tank standard plate count (SPC) and somatic cell count (SCC) test results for Wisconsin dairy farms were obtained for February to Decem...
The iPlant Collaborative (iPlant) is a United States National Science Foundation (NSF) funded project that aims to create an innovative, comprehensive, and foundational cyberinfrastructure in support of plant biology research (PSCIC, 2006). iPlant is developing cyberinfrastructure that uniquely enables scientists throughout the diverse fields that...
With the increasing interest in recognizing the discordance between gene genealogies, various gene tree/species tree reconciliation methods have been developed. We present here the first attempt to assess and compare two such Bayesian methods, Bayesian estimation of species trees (BEST) and BUCKy (Bayesian untangling of concordance knots), in the p...
With the easy acquisition of sequence data, it is now possible to obtain and align whole genomes across multiple related species or populations. In this work, I assess the performance of a statistical method to reconstruct the whole distribution of phylogenetic trees along the genome, estimate the proportion of the genome for which a given clade is...
Motivation: BUCKy is a C++ program that implements Bayesian concordance analysis. The method uses a non-parametric clustering of genes with compatible trees, and reconstructs the primary concordance tree from clades supported by the largest proportions of genes. A population tree with branch lengths in coalescent units is estimated from quartet con...
Compliance with U.S. Department of Agriculture (USDA) composition-based labeling standards often has been regarded as evidence of the shelf stability of ready-to-eat (RTE) meats. However, the USDA now requires further proof of shelf stability. Our previous work included development of equations for predicting the probability of Staphylococcus aureu...
Distribution of locus sizes. Using a minimum description length principle, the genome was partitioned into 14,081 loci with a median size of 98,238 bp (SD 312,637 bp) and a maximum locus size of 7.21 Mb. Loci greater than 1 Mb in size are not shown.
(0.30 MB TIF)
Recombination rate within large and small loci. The 2.5% largest loci (blue) have a significantly lower recombination rate as compared to the 2.5% smallest loci (red) (p<0.00001), suggesting the minimum description length principle partitioned the genome in a biologically informative manner.
(0.33 MB TIF)
Genomic locations and posterior probabilities of the 14,081 loci (computed with a prior probability of gene tree concordance set at α = 1).
(1.29 MB XLS)
Single locus posterior probabilities. 84.9% of loci are supported by a high posterior probability (>0.9) from the single-locus Bayesian phylogenetic analyses, suggesting the minimum description length principle partitioned the genome in a phylogenetically informative manner.
(0.58 MB TIF)
Fine-scale phylogenetic discordance. The posterior probability of each topology is mapped throughout the genome to characterize fine-scale patterns of discordance. Position along the chromosomes is indicated on the x-axis (Mb) and the posterior probability of each topology is on the y-axis. Colors correspond to the three topologies.
(6.81 MB TIF)
Maximum and minimum penalties against breakpoints for the minimum description length partitioning. Both the maximum (3) and minimum (0.9039) penalties were applied to the partitioning of chromosomes 18, 19, and X. Using a minimum penalty roughly doubles the number of loci on each chromosome, but the chromosome-wide concordance factors remain simila...
Median locus size for each of the three topologies. Median locus size for each topology parallels the rank order of the concordance factors on both the autosomes and the X chromosome. Colors correspond to the three topologies.
(0.28 MB TIF)
Varied starting interval sizes for the minimum description length partitioning. A range of SNP intervals was applied to the partitioning of chromosomes 18 and 19. There are no significant differences in the concordance factors between the first three starting intervals: 25, 50, or 100 SNPs. Colors correspond to the three topologies. Error bars are...
Phylogenetic discordance and long-branch attraction. The rat sequence was randomly shuffled to erase any phylogenetic signal between rat and house mice on chromosomes 18 and 19. Without the sequence shuffled (A), topologies significantly deviate from a 1/3, 1/3, 1/3 ratio. With the rat sequence shuffled (B), the topologies converge to a 1/3, 1/3, 1...
Population genetic theory predicts discordance in the true phylogeny of different genomic regions when studying recently diverged species. Despite this expectation, genome-wide discordance in young species groups has rarely been statistically quantified. The house mouse subspecies group provides a model system for examining phylogenetic discordance...
The 40 COSII oligonucleotide primers screened in this study.
The 19 COSII markers used in this study and their characteristics.
NCBI sequence database accession numbers.
Species examined; their superseries and series relationships within sect. Petota, endosperm balance number (EBN), genomes and plastid DNA clade relationships.
Background
Phylogenies reconstructed with only one or a few independently inherited loci may be unresolved or incongruent due to taxon and gene sampling, horizontal gene transfer, or differential selection and lineage sorting at individual loci. In an effort to remedy this situation, we examined the utility of conserved orthologous set (COSII) nucl...
A major challenge in biological sciences is the reconstruction of the Tree of Life. To this effect, large genomic databases like GenBank and SwissProt are being mined for clusters from which phylogenies can be inferred. Systematists and comparative biologists commonly combine such phylogenies into informative supertrees that reveal information whic...
Studies of floral ecology and evolution are often centered on the idea that particular floral trait combinations, or syndromes, represent adaptations for particular pollinators. Despite the conceptual importance of pollination syndromes, few macroevolutionary studies have statistically examined the relationship between pollinators and flor...
This study was done to optimize accuracy of predicting growth of Salmonella serovars, Escherichia coli O157:H7, and Staphylococcus aureus in temperature-abused raw beef, poultry, and bratwurst (with salt but without added nitrite). Four mathematical approaches were used with experimentally determined lag-phase duration (LPD) and growth rate (GR) va...
U.S. Department of Agriculture (USDA) composition-based labeling standards for various ready-to-eat (RTE) meat products typically specify maximum product pH and/or moisture:protein ratio and less often maximum water activity (a(w)). Compliance with these standards often has been regarded as proof of shelf stability. However, the USDA now requires a...
The intestine of hibernating ground squirrels is protected against damage by ischemia-reperfusion (I/R) injury. This resistance does not depend on the low body temperature of torpor; rather, it is exhibited during natural interbout arousals that periodically return hibernating animals to euthermia. Here we use fluorescence two-dimensional differenc...
Several stochastic models of character change, when implemented in a maximum likelihood framework, are known to give a correspondence between the maximum parsimony method and the method of maximum likelihood. One such model has an independently estimated branch-length parameter for each site and each branch of the phylogenetic tree. This model--the...
The asymptotic behavior of estimates and information criteria in linear models are studied in the context of hierarchically correlated sampling units. The work is motivated by biological data collected on species where autocorrelation is based on the species' genealogical tree. Hierarchical autocorrelation is also found in many other kinds of data,...
Differences in floral traits among plant species have often been attributed to adaptation to pollinators. We explored the importance of pollinator shifts in explaining floral divergence among 15 species of Iochroma. We examined four continuously varying floral traits: corolla length, nectar reward, display size, and flower color. Pollinator associa...
Inference of evolutionary trees and rates from biological sequences is commonly performed using continuous-time Markov models of character change. The Markov process evolves along an unknown tree while observations arise only from the tips of the tree. Rate heterogeneity is present in most real data sets and is accounted for by the use of flexible...