Article

DendroPy: A Python Library for Phylogenetic Computing

Abstract

DendroPy is a cross-platform library for the Python programming language that provides for object-oriented reading, writing, simulation and manipulation of phylogenetic data, with an emphasis on phylogenetic tree operations. DendroPy uses a splits-hash mapping to perform rapid calculations of tree distances, similarities and shape under various metrics. It contains rich simulation routines to generate trees under a number of different phylogenetic and coalescent models. DendroPy's data simulation and manipulation facilities, in conjunction with its support of a broad range of phylogenetic data formats (NEXUS, Newick, PHYLIP, FASTA, NeXML, etc.), allow it to serve a useful role in various phyloinformatics and phylogeographic pipelines. Availability: The stable release of the library is available for download and automated installation through the Python Package Index site (http://pypi.python.org/pypi/DendroPy), while the active development source code repository is available to the public from GitHub (http://github.com/jeetsukumaran/DendroPy). Contact: jeet{at}ku.edu
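The splits-based approach to tree distances mentioned in the abstract can be illustrated with a small, library-free sketch. This is not DendroPy's implementation: the nested-tuple tree representation and the function names `leaf_names`, `split_bitmasks` and `rf_distance` are assumptions made for illustration. Each internal edge is encoded as an integer bitmask over the taxa, so the unweighted Robinson-Foulds (symmetric difference) distance reduces to a set operation on hashable integers.

```python
def leaf_names(tree):
    """Return the leaf labels of a nested-tuple tree (hypothetical representation)."""
    if isinstance(tree, str):
        return [tree]
    names = []
    for child in tree:
        names.extend(leaf_names(child))
    return names


def split_bitmasks(tree, taxon_index):
    """Collect the non-trivial bipartitions of the tree as integer bitmasks."""
    full = (1 << len(taxon_index)) - 1
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return 1 << taxon_index[node]
        mask = 0
        for child in node:
            mask |= visit(child)
        # Normalize: always store the side that does not contain taxon 0,
        # so a split and its complement map to the same integer.
        norm = full ^ mask if mask & 1 else mask
        # Keep only splits with at least two taxa on each side.
        if bin(norm).count("1") >= 2 and bin(full ^ norm).count("1") >= 2:
            splits.add(norm)
        return mask

    visit(tree)
    return splits


def rf_distance(tree_a, tree_b):
    """Unweighted Robinson-Foulds distance between two trees on the same taxa."""
    taxa = sorted(leaf_names(tree_a))
    if taxa != sorted(leaf_names(tree_b)):
        raise ValueError("trees must share the same taxon set")
    index = {name: i for i, name in enumerate(taxa)}
    return len(split_bitmasks(tree_a, index) ^ split_bitmasks(tree_b, index))


t1 = ((("A", "B"), "C"), ("D", "E"))
t2 = ((("A", "C"), "B"), ("D", "E"))
print(rf_distance(t1, t2))
```

Storing splits as integers keeps set membership tests cheap, which is the general idea behind hashing splits; a production library would also handle branch lengths, rooting and differing taxon sets.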

... The first step in our automated phylogenetic cluster identification was the creation of a pairwise distance matrix (PDM). This matrix was calculated by parsing the branch length estimates of a given phylogenetic tree using the Python phylogenetic computing library DendroPy 4.5.1 [27] and summing the branch lengths between each pair of leaves (Fig 1: "Extract pairwise distance matrix DendroPy"). Next, we used Uniform Manifold Approximation and Projection (UMAP) to reduce the dimensionality of the PDM to 2 dimensions for clustering ("Reduce dimensionality of PDM") [28]. ...
... bipartition.leafset_as_newick_string() from DendroPy Phylogenetic Computing Library 4.5.1 [27] until all monophyletic groups in the polyphyletic groups had been identified. Once the cluster was split into monophyletic groups, unique decimal values were added to the end of the integer label, creating a floating-point label for all clusters that were previously non-monophyletic ("Label partitions with a decimal and integers"). ...
... where i j, δ(i, j) is the phylogenetic distance between species i and j, and n is the number of species in the sample. We used DendroPy Phylogenetic Computing Library 4.5.1 [27] to calculate the MPD for putative subfamilies and to parse their subtending branches. ...
Article
Full-text available
Phylogenetic analysis of protein sequences provides a powerful means of identifying novel protein functions and subfamilies, and for identifying and resolving annotation errors. However, automation of functional clustering based on phylogenetic trees has been challenging and most of it is done manually. Clustering phylogenetic trees usually requires the delineation of tree-based thresholds (e.g., distances), leading to an ad hoc problem. We propose a new phylogenetic clustering approach that identifies clusters without using ad hoc distances or other pre-defined values. Our workflow combines uniform manifold approximation and projection (UMAP) with Gaussian mixture models as a k-means like procedure to automatically group sequences into clusters. We then apply a “second pass” clade identification algorithm to resolve non-monophyletic groups. We tested our approach with several well-curated protein families (outer membrane porins, acyltransferase, and nuclear receptors) and showed our automated methods recapitulated known subfamilies. We also applied our methods to a broad range of different protein families from multiple databases, including Pfam, PANTHER, and UniProt, and to alignments of RNA viral genomes. Our results showed that AutoPhy rapidly generated monophyletic clusters (subfamilies) within phylogenetic trees evolving at very different rates both within and among phylogenies. The phylogenetic clusters generated by AutoPhy resolved misannotations and identified new protein functional groups and novel viral strains.
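The pairwise distance matrix and mean pairwise distance (MPD) described in the excerpts above can be sketched without any library. The child-to-parent dictionary, the node names and the function names here are hypothetical, and the MPD is assumed to follow the usual definition (the average patristic distance over all n(n-1)/2 distinct leaf pairs), since the excerpt elides the formula.

```python
from itertools import combinations


def ancestor_distances(node, parent):
    """Map node and each of its ancestors to the branch-length distance from node."""
    distances = {node: 0.0}
    total = 0.0
    while node in parent:
        up, length = parent[node]
        total += length
        distances[up] = total
        node = up
    return distances


def patristic_distance(a, b, parent):
    """Sum branch lengths along the path between a and b via their common ancestor."""
    from_a = ancestor_distances(a, parent)
    total = 0.0
    node = b
    while node not in from_a:
        up, length = parent[node]
        total += length
        node = up
    return from_a[node] + total


def pairwise_distance_matrix(leaves, parent):
    """Return {(leaf_i, leaf_j): distance} for every unordered leaf pair."""
    return {
        (a, b): patristic_distance(a, b, parent)
        for a, b in combinations(leaves, 2)
    }


def mean_pairwise_distance(leaves, parent):
    """Average patristic distance over all distinct leaf pairs (assumed MPD definition)."""
    pdm = pairwise_distance_matrix(leaves, parent)
    return sum(pdm.values()) / len(pdm)


# Toy tree ((A:1.0,B:2.0):0.5,C:3.0), stored as child -> (parent, branch length).
parent = {
    "A": ("n1", 1.0),
    "B": ("n1", 2.0),
    "n1": ("root", 0.5),
    "C": ("root", 3.0),
}
pdm = pairwise_distance_matrix(["A", "B", "C"], parent)
print(pdm)
print(mean_pairwise_distance(["A", "B", "C"], parent))
```

This naive version walks to the root for every pair, which is fine for small trees; a matrix over thousands of leaves would call for a single traversal that accumulates distances.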
... As the first step, we simulated an ultrametric tree under a birth-death model with equal birth rate and death rate with 1k, 5k, and 10k extant leaves using the simulator described in [33]. We set the root node as a diploid without any CNA, and each leaf node represents a single cell sample from the patient. ...
... To calculate this score, we first assign the cluster label of each leaf as the profile of that leaf in the tree. Next, we use the implementation in [33] to compute the parsimony score of the tree. Here, all label transitions are treated as substitutions with a uniform weight of one. ...
Preprint
Full-text available
Single-cell sequencing technologies are producing large data sets, often with thousands or even tens of thousands of single-cell genomic data from an individual patient. Evolutionary analyses of these data sets help uncover and order genetic variants in the data as well as elucidate mutation trees and intra-tumor heterogeneity (ITH) in the case of cancer data sets. To enable such large-scale analyses computationally, we propose a divide-and-conquer approach that could be used to scale up computationally intensive inference methods. The approach consists of four steps: 1) partitioning the dataset into subsets, 2) constructing a rooted tree for each subset, 3) computing a representative genotype for each subset by utilizing its inferred tree, and 4) assembling the individual trees using a tree built on the representative genotypes. Besides its flexibility and enabling scalability, this approach also lends itself naturally to ITH analysis, as the clones would be the individual subsets, and the “assembly tree” could be the mutation tree that defines the clones. To demonstrate the effectiveness of our proposed approach, we conducted experiments employing a range of methods at each stage. In particular, as clustering and dimensionality reduction methods are commonly used to tame the complexity of large datasets in this area, we analyzed the performance of a variety of such methods within our approach.
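The parsimony score described in the excerpt above, where every label transition counts as one substitution, corresponds to unit-cost (Fitch) parsimony on a single character. The sketch below is an illustration under that assumption, not the implementation cited as [33]; it assumes a strictly binary tree written as nested tuples, and `fitch_score` is a hypothetical name.

```python
def fitch_score(tree, labels):
    """Return the minimum number of label changes on a binary nested-tuple tree."""

    def visit(node):
        # A leaf contributes its own label and no changes.
        if isinstance(node, str):
            return {labels[node]}, 0
        (left_states, left_score), (right_states, right_score) = (
            visit(child) for child in node
        )
        score = left_score + right_score
        common = left_states & right_states
        if common:
            return common, score
        # Disjoint state sets force one change on this pair of branches.
        return left_states | right_states, score + 1

    return visit(tree)[1]


tree = (("A", "B"), ("C", "D"))
print(fitch_score(tree, {"A": 0, "B": 0, "C": 1, "D": 1}))
print(fitch_score(tree, {"A": 0, "B": 1, "C": 0, "D": 1}))
```

A low score means the cluster labels change rarely along the tree, so clusters map onto clades; a high score means the labels are scattered across the topology.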
... For the assignment tools, it was necessary to make representative trees and alignments of each of the levels of designation. To do this, we used the phylogenetic distance matrix, calculated using the python package DendroPy (Sukumaran and Holder 2010), which gives pairwise phylogenetic distance between each pair of sequences. We took five sequences with coverage of over 90% which were furthest apart from each other in each of the genotypes, major lineages, and minor lineages. ...
Preprint
Full-text available
Dengue virus (DENV) is currently causing epidemics of unprecedented scope in endemic settings and expanding to new geographical areas. It is therefore critical to track this virus using genomic surveillance. However, the complex patterns of viral genomic diversity make it challenging to use the existing genotype classification system. Here we propose adding two sub-genotypic levels of virus classification, named major and minor lineages. These lineages have high thresholds for phylogenetic distance and clade size, rendering them stable between phylogenetic studies. We present an assignment tool to show that the proposed lineages are useful for regional, national and sub-national discussions of relevant DENV diversity. Moreover, the proposed lineages are robust to classification using partial genome sequences. We provide a standardized neutral descriptor of DENV diversity with which we can identify and track lineages of potential epidemiological and/or clinical importance. Information about our lineage system, including methods to assign lineages to sequence data and propose new lineages, can be found at: dengue-lineages.org .
... This project uses data formats and tools associated with the ALife Data Standards project (Lalejini et al., 2019) and benefited from many pieces of open-source scientific software (Sand et al., 2014; Virtanen et al., 2020; Harris et al., 2020; pandas development team, 2020; Wes McKinney, 2010; Sukumaran & Holder, 2010; Cock et al., 2009; Torchiano, 2016; Waskom, 2021; Hunter, 2007; Moreno & Papa, 2024; Moreno, 2024d, 2023; Hagen et al., 2021; Torchiano, 2016). ...
Preprint
Computer simulations are an important tool for studying the mechanics of biological evolution. In particular, in silico work with agent-based models provides an opportunity to collect high-quality records of ancestry relationships among simulated agents. Such phylogenies can provide insight into evolutionary dynamics within these simulations. Existing work generally tracks lineages directly, yielding an exact phylogenetic record of evolutionary history. However, direct tracking can be inefficient for large-scale, many-processor evolutionary simulations. An alternate approach to extracting phylogenetic information from simulation that scales more favorably is post hoc estimation, akin to how bioinformaticians build phylogenies by assessing genetic similarities between organisms. Recently introduced "hereditary stratigraphy" algorithms provide means for efficient inference of phylogenetic history from non-coding annotations on simulated organisms' genomes. A number of options exist in configuring hereditary stratigraphy methodology, but no work has yet tested how they impact reconstruction quality. To address this question, we surveyed reconstruction accuracy under alternate configurations across a matrix of evolutionary conditions varying in selection pressure, spatial structure, and ecological dynamics. We synthesize results from these experiments to suggest a prescriptive system of best practices for work with hereditary stratigraphy, ultimately guiding researchers in choosing appropriate instrumentation for large-scale simulation studies.
... A vast amount of bioinformatics-oriented phylogenetics software is also available. Applications typically include inferring phylogenies from extant organisms (and sometimes fossils) (Challa & Neelapu, 2019), sampling phylogenies from theoretical models of population and species dynamics (Stadler, 2011), cross-referencing phylogenies with other data (e.g., spatial species distributions) (Emerson & Gillespie, 2008), and analyzing and manipulating tree structures (Cock et al., 2009; Sand et al., 2014; Smith, 2020; Sukumaran & Holder, 2010). ...
Preprint
In silico evolution instantiates the processes of heredity, variation, and differential reproductive success (the three "ingredients" for evolution by natural selection) within digital populations of computational agents. Consequently, these populations undergo evolution, and can be used as virtual model systems for studying evolutionary dynamics. This experimental paradigm -- used across biological modeling, artificial life, and evolutionary computation -- complements research done using in vitro and in vivo systems by enabling experiments that would be impossible in the lab or field. One key benefit is complete, exact observability. For example, it is possible to perfectly record all parent-child relationships across simulation history, yielding complete phylogenies (ancestry trees). This information reveals when traits were gained or lost, and also facilitates inference of underlying evolutionary dynamics. The Phylotrack project provides libraries for tracking and analyzing phylogenies in in silico evolution. The project is composed of 1) Phylotracklib: a header-only C++ library, developed under the umbrella of the Empirical project, and 2) Phylotrackpy: a Python wrapper around Phylotracklib, created with Pybind11. Both components supply a public-facing API to attach phylogenetic tracking to digital evolution systems, as well as a stand-alone interface for measuring a variety of popular phylogenetic topology metrics. Underlying design and C++ implementation prioritizes efficiency, allowing for fast generational turnover for agent populations numbering in the tens of thousands. Several explicit features (e.g., phylogeny pruning and abstraction, etc.) are provided for reducing the memory footprint of phylogenetic information.
... In addition to R, Python stands out as a powerful tool for performing phylogenetic analysis, with an extensive set of libraries and tools such as Biopython [99] and DendroPy [100] that provide a wide range of functions and algorithms. In addition, Python provides access to machine learning and deep learning libraries such as scikit-learn [101] and PyTorch [102], which facilitate effective model building and prediction in phylogenetic analysis. ...
Article
Full-text available
A phylogenetic tree can reflect the evolutionary relationships between species or gene families, and they play a critical role in modern biological research. In this review, we summarize common methods for constructing phylogenetic trees, including distance methods, maximum parsimony, maximum likelihood, Bayesian inference, and tree-integration methods (supermatrix and supertree). Here we discuss the advantages, shortcomings, and applications of each method and offer relevant codes to construct phylogenetic trees from molecular data using packages and algorithms in R. This review aims to provide comprehensive guidance and reference for researchers seeking to construct phylogenetic trees while also promoting further development and innovation in this field. By offering a clear and concise overview of the different methods available, we hope to enable researchers to select the most appropriate approach for their specific research questions and datasets.
... The standard NJ algorithm [42] and its variants, such as FastNJ [44], are commonly used for distance-based methods. In the final step of Scuphr, the standard NJ algorithm is applied to the distance matrix to reconstruct the cell lineage tree using the implementation in the Dendropy library [45]. The tree is re-rooted, so the bulk node becomes the root and indicates the non-mutated state. ...
Article
Full-text available
Cell lineage tree reconstruction methods are developed for various tasks, such as investigating the development, differentiation, and cancer progression. Single-cell sequencing technologies enable more thorough analysis with higher resolution. We present Scuphr, a distance-based cell lineage tree reconstruction method using bulk and single-cell DNA sequencing data from healthy tissues. Common challenges of single-cell DNA sequencing, such as allelic dropouts and amplification errors, are included in Scuphr. Scuphr computes the distance between cell pairs and reconstructs the lineage tree using the neighbor-joining algorithm. With its embarrassingly parallel design, Scuphr can do faster analysis than the state-of-the-art methods while obtaining better accuracy. The method’s robustness is investigated using various synthetic datasets and a biological dataset of 18 cells.
... A consensus tree was built using a 50% majority rule with SumTree v4.1.0 in DendroPy v4.1.0 [27,28] following Rubolini et al. [29]. The MCMC was run in four parallel threads of 2,500,000 iterations, with a burn-in of 40,000 and a thinning interval of 5000, resulting in 9,840,000 iterations and between 966 and 1887 samples of the posterior distribution parameters. ...
Article
Full-text available
After mosquitoes, ticks are among the most important vectors of pathogens of concern for animal and public health, but unlike mosquitoes, ticks remain attached to their hosts for long periods, providing an opportunity to analyse their role in the dispersal and dynamics of different zoonotic pathogens. Given their public health relevance, it is important to understand which factors affect their incidence in different hosts and to establish effective surveillance programs to determine the risk of transmission and spill-over of zoonotic pathogens. Taking advantage of a large network of volunteer ornithologists, we analysed the life-history traits associated with the presence of ticks using information on 620,609 individuals of 231 avian species. Bird phylogeny, locality and year explained a large amount of variance in tick prevalence. Non-colonial species that do not breed in grasslands and do not spend the non-breeding season as gregarious groups or isolated individuals (e.g. thrushes, quails and finches) had the higher prevalence of ticks and appear to be good candidates for zoonosis surveillance programs based on the analysis of ticks collected from wild birds. Ringers underestimated tick prevalence but can be considered an important source of information on ticks for public and animal health surveillance programs if properly trained in the detection and collection of the different tick development phases.
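The 50% majority-rule consensus mentioned in the excerpt above can be sketched as a clade-frequency count over a set of rooted trees. This is a simplified illustration, not how SumTrees works internally: trees are assumed to be nested tuples on a shared taxon set, the function names are hypothetical, and only the supported clades and their frequencies are returned rather than an assembled consensus tree.

```python
from collections import Counter


def clades(tree):
    """Return the set of clades (as frozensets of leaf names) in a rooted tree."""
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(visit(child) for child in node))
        found.add(members)
        return members

    visit(tree)
    return found


def majority_rule_clades(trees, min_freq=0.5):
    """Return {clade: frequency} for clades found in more than min_freq of the trees."""
    counts = Counter()
    for tree in trees:
        counts.update(clades(tree))
    total = len(trees)
    return {
        clade: count / total
        for clade, count in counts.items()
        if count / total > min_freq
    }


sample = [
    (("A", "B"), ("C", "D")),
    (("A", "B"), ("C", "D")),
    (("A", "C"), ("B", "D")),
]
support = majority_rule_clades(sample)
for clade, freq in sorted(support.items(), key=lambda item: sorted(item[0])):
    print(sorted(clade), round(freq, 3))
```

Clades with frequency above one half are always mutually compatible, which is why they can be combined into a single consensus tree; the frequencies double as support values for its internal nodes.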
... Using the 30,066 produced gene trees, ASTRAL (v5.7.1) [61] was used to estimate the species tree under a multi-species coalescent model. Combined with the previous species tree, we used DendroPy (v4.5.2) [62] to simulate the gene trees under the ILS scenario. ...
Article
Full-text available
Background The pig (Sus scrofa) is one of the oldest domesticated livestock species that has undergone extensive improvement through modern breeding. European breeds have advantages in lean meat development and highly-productive body type, whereas Asian breeds possess extraordinary fat deposition and reproductive performance. Consequently, Eurasian breeds have been extensively used to develop modern commercial breeds for fast-growing and high prolificacy. However, limited by the sequencing technology, the genome architecture of some nascent developed breeds and the human-mediated impact on their genomes are still unknown. Results Through whole-genome analysis of 178 individuals from an Asian locally developed pig breed, Beijing Black pig, and its two ancestors from two different continents, we found the pervasive inconsistent gene trees and species trees across the genome of Beijing Black pig, which suggests its introgressive hybrid origin. Interestingly, we discovered that this developed breed has more genetic relationships with European pigs and an unexpected introgression from Asian pigs to this breed, which indicated that human-mediated introgression could form the porcine genome architecture in a completely different type compared to native introgression. We identified 554 genomic regions occupying 63.30 Mb with signals of introgression from the Asian ancestry to Beijing Black pig, and the genes in these regions were enriched in pathways associated with meat quality, fertility, and disease resistance. Additionally, a proportion of 7.77% of genomic regions were recognized as regions that have been under selection. Moreover, combined with the results of a genome-wide association study for meat quality traits in the 1537 Beijing Black pig population, two important candidate genes related to meat quality traits were identified. DNAJC6 is related to intramuscular fat content and fat deposition, and RUFY4 is related to meat pH and tenderness.
Conclusions Our research provides insight for analyzing the origins of nascent developed breeds and genome-wide selection remaining in the developed breeds mediated by humans during modern breeding.
... Φ ST and D XY provide independent measures of divergence but are correlated with each other. Third, we calculated Slatkin and Maddison's 's' statistic (Slatkin & Maddison, 1989), the number of parsimony steps required for each gene tree to be congruent with the 'true' lineage tree ( Figure 2) using dendropy version 4.4.0 (Sukumaran & Holder, 2010). This cladistic measure is sensitive to lowered gene flow or increased selection, which should both cause gene trees to be more congruent with the lineage tree. ...
Article
Full-text available
Genomes are heterogeneous during the early stages of speciation, with small ‘islands’ of DNA appearing to reflect strong adaptive differences, surrounded by vast seas of relative homogeneity. As species diverge, secondary contact zones between them can act as an interface and selectively filter through advantageous alleles of hybrid origin. Such introgression is another important adaptive process, one that allows beneficial mosaics of recombinant DNA (‘rivers’) to flow from one species into another. Although genomic islands of divergence appear to be associated with reproductive isolation, and genomic rivers form by adaptive introgression, it is unknown whether islands and rivers tend to be the same or different loci. We examined three replicate secondary contact zones for the Yosemite toad (Anaxyrus canorus) using two genomic data sets and a morphometric data set to answer the questions: (1) How predictably different are islands and rivers, both in terms of genomic location and gene function? (2) Are the adaptive genetic trait loci underlying tadpole growth and development reliably islands, rivers or neither? We found that island and river loci have significant overlap within a contact zone, suggesting that some loci are first islands, and later are predictably converted into rivers. However, gene ontology enrichment analysis showed strong overlap in gene function unique to all island loci, suggesting predictability in overall gene pathways for islands. Genome‐wide association study outliers for tadpole development included LPIN3, a lipid metabolism gene potentially involved in climate change adaptation, that is island‐like for all three contact zones, but also appears to be introgressing (as a river) across one zone. Taken together, our results suggest that adaptive divergence and introgression may be more complementary forces than currently appreciated.
... As in Duboys et al. [36], the analysis was set to run for 20 million generations, three separate runs with four chains each, sampling every 1000 generations and discarding the first 25% as burn-in. The trees obtained were summarized using SumTrees [40]. The synapomorphies of the consensus tree were accessed using Mesquite [41]. ...
Article
Full-text available
Cetotheriidae is a family of baleen whales that went nearly extinct during the Pleistocene (excluding Caperea marginata). For a long time, the Cetotheriidae family has been seen as a problematic clade, but in the past two decades there have been various studies trying to resolve the phylogeny of this group. In 1831, Alexandre Vandelli described three cetotheriid skulls, found during a gold exploration at Adiça beach (Portugal). These specimens constituted the first Portuguese vertebrate fossils ever published in the literature. Another skull was added to the “Vandelli skulls” by Jacinto Pedro Gomes, in 1914, during a survey of the Museu Nacional de História Natural collections without giving information on the origin of this skull. In 1941, Remington Kellogg states that one of the original “Vandelli skulls” is no longer present in the Museu Nacional de História Natural collections. Until today, there is no information on how, or exactly when, the fourth skull and one of the original three “Vandelli skulls” appeared and disappeared, respectively. Since their discovery, all the attempts to describe these specimens were not based on direct observations and no comprehensive phylogenetic analysis have included the three skulls. Here we provide a detailed anatomic description, a new phylogenetic analysis and a palaeoecological reconstruction of these specimens, clarifying their relationships within the Cetotheriidae family and fostering the importance of these historical specimens to the modern comprehension of fossil whale evolution. In addition, our results support that Cephalotropis nectus is a valid species with an emended diagnosis. We also concluded that two specimens belong to a new genus, forming two new fossil species (new combinations).
... We performed 100 bootstrap replicates of this species tree with a single migration edge, resampling 100 SNP blocks to generate support values for internal branches. We used the SumTrees command from DendroPy v4.5.2 (Sukumaran & Holder, 2010) to collate these support values and visualize them on the species tree. Species trees were rooted using a sample of H. diadema from mainland Papua New Guinea, following the molecular phylogeny of Lavery et al. (2014) that encompassed the study region and incorporated broader taxonomic sampling from the family Hipposideridae. ...
Article
Full-text available
Body size is a key morphological attribute, often used to delimit species boundaries among closely related taxa. But body size can evolve in parallel, reaching similar final states despite independent evolutionary and geographic origins, leading to faulty assumptions of evolutionary history. Here, we document parallel evolution in body size in the widely distributed leaf-nosed bat genus Hipposideros, which has misled both taxonomic and evolutionary inference. We sequenced reduced representation genomic loci and measured external morphological characters from three closely related species from the Solomon Islands archipelago, delimited by body size. Species tree reconstruction confirms the paraphyly of two morphologically designated species. The nonsister relationship between large-bodied H. dinops lineages found on different islands indicates that large-bodied ecomorphs have evolved independently at least twice in the history of this radiation. A lack of evidence for gene flow between sympatric, closely related taxa suggests the rapid evolution of strong reproductive isolating barriers between morphologically distinct populations. Our results position Solomon Islands Hipposideros as a novel vertebrate system for studying the repeatability of parallel evolution under natural conditions. We conclude by offering testable hypotheses for how geography and ecology could be mediating the repeated evolution of large-bodied Hipposideros lineages in the Solomon Islands.
... We mapped bootstrap support onto the ExaML tree using sumtrees.py from the DendroPy package (Sukumaran and Holder, 2010). We found that the IQ-TREE and ExaML trees 1 "allfam (all)" refers to the complete allfam dataset (396 taxa) and "allfam (ws)" refers to the well-sampled subset (369 taxa) of the allfam dataset. 2 Percentage of gaps and missing data per taxon. ...
Preprint
Full-text available
The evolutionary histories of different genomic regions typically differ from each other and from the underlying species phylogeny. This makes species tree estimation challenging. Here, we examine the performance of phylogenomic methods using a well-resolved phylogeny that nevertheless contains many difficult nodes, the species tree of living birds. We compared trees generated by maximum likelihood (ML) analysis of concatenated data, gene tree summary methods, and SVDquartets. We also conduct the first empirical test of a "new" method called METAL (Metric algorithm for Estimation of Trees based on Aggregation of Loci), which is based on evolutionary distances calculated using concatenated data. We conducted this test using a novel dataset comprising more than 4000 ultraconserved element (UCE) loci from almost all bird families and two existing UCE and intron datasets sampled from almost all avian orders. We identified "reliable clades" very likely to be present in the true avian species tree and used them to assess method performance. ML analyses of concatenated data recovered almost all reliable clades with less data and greater robustness to missing data than other methods. METAL recovered many reliable clades, but only performed well with the largest datasets. Gene tree summary methods (weighted ASTRAL and weighted ASTRID) performed well; they required less data than METAL but more data than ML concatenation. SVDquartets exhibited the worst performance of the methods tested. In addition to the methodological insights, this study provides a novel estimate of avian phylogeny with almost 99% of the currently recognized avian families. Only one of the 181 reliable clades we examined was consistently resolved differently by ML concatenation versus other methods, suggesting that it may be possible to achieve consensus on the deep phylogeny of extant birds.
... The statistical parameters, mean (µ) and standard deviation (σ) of each RTDs (4^k) for every k-mer in the range of 1 to 8 were calculated for every sequence in the reference dataset [53] and hence, every gene sequence in the reference dataset constituted a numeric vector of 2 × 4^k dimensions. Numeric vectors for each value of k were used for the generation of a Euclidean-based pairwise distance matrix that was used to infer Neighbor-joining (NJ)-based phylogeny trees [52,65] by using the DendroPy library [66] in Python. The optimum k value was chosen based on the accuracy of clustering in accordance with the lineage classification of the reference dataset. ...
Article
Full-text available
Influenza D virus (IDV) is the most recent addition to the Orthomyxoviridae family and cattle serve as the primary reservoir. IDV has been implicated in Bovine Respiratory Disease Complex (BRDC), and there is serological evidence of human infection of IDV. Evolutionary changes in the IDV genome have resulted in the expansion of genetic diversity and the emergence of multiple lineages that might expand the host tropism and potentially increase the pathogenicity to animals and humans. Therefore, there is an urgent need for automated, accurate and rapid typing tools for IDV lineage typing. Currently, IDV lineage typing is carried out using BLAST-based searches and alignment-based molecular phylogeny of the hemagglutinin-esterase fusion (HEF) gene sequences, and lineage is assigned to query sequences based on sequence similarity (BLAST search) and proximity to the reference lineages in the tree topology, respectively. To minimize human intervention and lineage typing time, we developed IDV Typer server, implementing alignment-free method based on return time distribution (RTD) of k-mers. Lineages are assigned using HEF gene sequences. The server performs with 100% sensitivity and specificity. The IDV Typer server is the first application of an RTD-based alignment-free method for typing animal viruses.
... The unit of branch lengths of the sampled trees was set to be substitutions per site. Parameter estimates were summarized with TreeAnnotator v. 2.6.0 (part of BEAST v. 2) and mapped onto the 50 % majority-rule consensus tree created by SumTrees v. 4.4.0 (Sukumaran & Holder 2015) from the Python library DendroPy v. 4.4.0 (Sukumaran & Holder 2010). The edge lengths of the summarizing tree were calculated as mean lengths for the corresponding edges in the input set of trees. ...
Article
Full-text available
During 25 surveys of global Phytophthora diversity, conducted between 1998 and 2020, 43 new species were detected in natural ecosystems and, occasionally, in nurseries and outplantings in Europe, Southeast and East Asia and the Americas. Based on a multigene phylogeny of nine nuclear and four mitochondrial gene regions they were assigned to five of the six known subclades, 2a–c, e and f, of Phytophthora major Clade 2 and the new subclade 2g. The evolutionary history of the Clade appears to have involved the pre-Gondwanan divergence of three extant subclades, 2c, 2e and 2f, all having disjunct natural distributions on separate continents and comprising species with a soilborne and aquatic lifestyle and, in addition, a few partially aerial species in Clade 2c; and the post-Gondwanan evolution of subclades 2a and 2g in Southeast/East Asia and 2b in South America, respectively, from their common ancestor. Species in Clade 2g are soilborne whereas Clade 2b comprises both soil-inhabiting and aerial species. Clade 2a has evolved further towards an aerial lifestyle comprising only species which are predominantly or partially airborne. Based on high nuclear heterozygosity levels ca. 38 % of the taxa in Clades 2a and 2b could be some form of hybrid, and the hybridity may be favoured by an A1/A2 breeding system and an aerial life style. Circumstantial evidence suggests the now 93 described species and informally designated taxa in Clade 2 result from both allopatric non-adaptive and sympatric adaptive radiations. They represent most morphological and physiological characters, breeding systems, lifestyles and forms of host specialism found across the Phytophthora clades as a whole, demonstrating the strong biological cohesiveness of the genus. 
The finding of 43 previously unknown species from a single Phytophthora clade highlights a critical lack of information on the scale of the unknown pathogen threats to forests and natural ecosystems, underlining the risk of basing plant biosecurity protocols mainly on lists of named organisms. More surveys in natural ecosystems of yet unsurveyed regions in Africa, Asia, Central and South America are needed to unveil the full diversity of the clade and the factors driving diversity, speciation and adaptation in Phytophthora.
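Several of the excerpts on this page summarize a set of sampled trees into a 50% majority-rule consensus tree with SumTrees. The core of that summary, counting how often each clade occurs across the input trees and keeping those found in more than half of them, can be sketched in plain Python. This is an illustrative sketch only, not SumTrees or DendroPy code: it assumes rooted trees written as nested tuples of taxon labels, ignores branch lengths, and the function names are hypothetical.

```python
from collections import Counter

def clades(tree):
    """Return (leaf set of `tree`, set of leaf sets of all internal nodes in it)."""
    if isinstance(tree, str):            # a leaf is just its taxon label
        return frozenset([tree]), set()
    leaves = frozenset()
    found = set()
    for child in tree:
        child_leaves, child_clades = clades(child)
        leaves |= child_leaves
        found |= child_clades
    found.add(leaves)                    # the clade defined by this internal node
    return leaves, found

def majority_rule_clades(trees, min_freq=0.5):
    """Map each clade seen in more than `min_freq` of the trees to its frequency."""
    counts = Counter()
    for tree in trees:
        _, found = clades(tree)
        counts.update(found)
    n = len(trees)
    return {clade: c / n for clade, c in counts.items() if c / n > min_freq}

# Three hypothetical input trees on the taxa A-D.
trees = [
    (("A", "B"), ("C", "D")),
    (("A", "B"), ("C", "D")),
    (("A", "C"), ("B", "D")),
]
support = majority_rule_clades(trees)
for clade, freq in sorted(support.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(clade), round(freq, 2))
```

Clades that pass a strict greater-than-50% threshold are always mutually compatible, which is why they can be assembled into a single consensus tree; the retained frequencies are the support values that such a tree would carry. The full taxon set is reported as a clade with frequency 1.0 because it corresponds to the root.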
... Trees were sampled every 10,000 generations with 25% burn-in. To select a more robust topology and to examine whether both analyses supported each node, we summarized the topologies of the two gene trees by SumTrees Version 4.0.0 (Sukumaran & Holder, 2015) under DendroPy (Sukumaran & Holder, 2010). The topology with overall higher nodal supports was set as the target tree. ...
Article
Full-text available
Madagascar is a global biodiversity hotspot, but its biodiversity continues to be underestimated and understudied. Of raft spiders, genus Dolomedes Latreille, 1804, literature only reports two species on Madagascar. Our single expedition to humid forests of eastern and northern Madagascar, however, yielded a series of Dolomedes exemplars representing both sexes of five morphospecies. To avoid only using morphological diagnostics, we devised and tested an integrative taxonomic model for Dolomedes based on the unified species concept. The model first determines morphospecies within a morphometrics framework, then tests their validity via species delimitation using COI. It then incorporates habitat preferences, geological barriers, and dispersal related traits to form hypotheses about gene flow limitations. Our results reveal four new Dolomedes species that we describe from both sexes as Dolomedes gregoric sp. nov., D. bedjanic sp. nov., D. hydatostella sp. nov., and D. rotundus sp. nov. The range of D. kalanoro Silva & Griswold, 2013, now also known from both sexes, is expanded to eastern Madagascar. By increasing the known raft spider diversity from one valid species to five, our results merely scratch the surface of the true Dolomedes species diversity on Madagascar. Our integrative taxonomic model provides the framework for future revisions of raft spiders anywhere.
... We followed the suggestion by 64 and downloaded 1000 trees for each (sub)class and summarized these into a 50% majority-rule consensus tree using the sumtree.py script in the Dendropy package 65 . The phylogeny of Actinopterygii was obtained from https://fishtreeoflife.org/ 66 . ...
Article
Full-text available
Chemoreception – the ability to smell and taste – is an essential sensory modality of most animals. The number and type of chemical stimuli that animals can perceive depends primarily on the diversity of chemoreceptors they possess and express. In vertebrates, six families of G protein-coupled receptors form the core of their chemosensory system, the olfactory/pheromone receptor gene families OR, TAAR, V1R and V2R, and the taste receptors T1R and T2R. Here, we study the vertebrate chemoreceptor gene repertoire and its evolutionary history. Through the examination of 1,527 vertebrate genomes, we uncover substantial differences in the number and composition of chemoreceptors across vertebrates. We show that the chemoreceptor gene families are co-evolving, highly dynamic, and characterized by lineage-specific expansions (for example, OR in tetrapods; TAAR, T1R in teleosts; V1R in mammals; V2R, T2R in amphibians) and losses. Overall, amphibians, followed by mammals, are the vertebrate clades with the largest chemoreceptor repertoires. While marine tetrapods feature a convergent reduction of chemoreceptor numbers, the number of OR genes correlates with habitat in mammals and birds and with migratory behavior in birds, and the taste receptor repertoire correlates with diet in mammals and with aquatic environment in fish.
... Parsnp was executed with different partition sizes p on each set of genomes. The resulting phylogenies from each run were compared to the non-partitioned phylogenies using the normalized weighted Robinson-Foulds distance (N-wRF) calculated using DendroPy [10]. * As the non-partitioned version of Parsnp failed to run on the K. pneumoniae dataset, we compared the resulting trees to the tree obtained from the p = 250. ...
Preprint
Motivation: Since 2016, the number of microbial species with available reference genomes in NCBI has more than tripled. Multiple genome alignment, the process of identifying nucleotides across multiple genomes which share a common ancestor, is used as the input to numerous downstream comparative analysis methods. Parsnp is one of the few multiple genome alignment methods able to scale to the current era of genomic data; however, there has been no major release since its initial release in 2014. Results: To address this gap, we developed Parsnp v2, which significantly improves on its original release. Parsnp v2 provides users with more control over executions of the program, allowing Parsnp to be better tailored for different use-cases. We introduce a partitioning option to Parsnp, which allows the input to be broken up into multiple parallel alignment processes which are then combined into a final alignment. The partitioning option can reduce memory usage by over 4x and reduce runtime by over 2x, all while maintaining a precise core-genome alignment. The partitioning workflow is also less susceptible to complications caused by assembly artifacts and minor variation, as alignment anchors only need to be conserved within their partition and not across the entire input set. We highlight the performance on datasets involving thousands of bacterial and viral genomes. Availability: Parsnp is available at https://github.com/marbl/parsnp
... [71]. Specifically, we built a 50% majority-rule consensus tree [72] using the Python program 'SumTrees' in 'DendroPy' [73] following the methods of Rubolini et al. [74]. We trimmed trees to include species in the baseline and stress-induced data subsets using the R package 'APE' [75]. ...
Article
Full-text available
As humans alter landscapes worldwide, land and wildlife managers need reliable tools to assess and monitor responses of wildlife populations. Glucocorticoid (GC) hormone levels are one common physiological metric used to quantify how populations are coping in the context of their environments. Understanding whether GC levels can reflect broad landscape characteristics, using data that are free and commonplace to diverse stakeholders, is an important step towards physiological biomarkers having practical application in management and conservation. We conducted a phylogenetic comparative analysis using publicly available datasets to test the efficacy of GCs as a biomarker for large spatial-scale avian population monitoring. We used hormone data from HormoneBase (51 species), natural history information and US national land cover data to determine if baseline or stress-induced corticosterone varies with the amount of usable land cover types within each species' home range. We found that stress-induced levels, but not baseline, positively correlated with per cent usable land cover both within and across species. Our results indicate that GC concentrations may be a useful biomarker for characterizing populations across a range of habitat availability, and we advocate for more physiological studies on non-traditional species in less studied populations to build on this framework. This article is part of the theme issue ‘Endocrine responses to environmental variation: conceptual approaches and recent developments’.
... We extracted the sub-tree for each clade using the UShER platform 63 , and calculated the Sackin index 65 as a measure of tree imbalance using Python's dendroPy framework 66 . Other indices were assessed (e.g., B1, Treeness) and deemed inappropriate for the task at hand. ...
Article
Full-text available
The evolution of SARS-Coronavirus-2 (SARS-CoV-2) has been characterized by the periodic emergence of highly divergent variants. One leading hypothesis suggests these variants may have emerged during chronic infections of immunocompromised individuals, but limited data from these cases hinders comprehensive analyses. Here, we harnessed millions of SARS-CoV-2 genomes to identify potential chronic infections and used language models (LM) to infer chronic-associated mutations. First, we mined the SARS-CoV-2 phylogeny and identified chronic-like clades with identical metadata (location, age, and sex) spanning over 21 days, suggesting a prolonged infection. We inferred 271 chronic-like clades, which exhibited characteristics similar to confirmed chronic infections. Chronic-associated mutations were often high-fitness immune-evasive mutations located in the spike receptor-binding domain (RBD), yet a minority were unique to chronic infections and absent in global settings. The probability of observing high-fitness RBD mutations was 10-20 times higher in chronic infections than in global transmission chains. The majority of RBD mutations in BA.1/BA.2 chronic-like clades bore predictive value, i.e., went on to display global success. Finally, we used our LM to infer hundreds of additional chronic-like clades in the absence of metadata. Our approach allows mining extensive sequencing data and providing insights into future evolutionary patterns of SARS-CoV-2.
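The excerpt for this entry uses the Sackin index as a measure of tree imbalance. As a rough illustration of what that statistic computes, the sketch below sums the depth of every leaf (the number of edges between the leaf and the root) for a rooted tree written as nested tuples. It is a plain-Python sketch under those assumptions, not the DendroPy routine the authors cite, and the function name is hypothetical.

```python
def sackin_index(tree, depth=0):
    """Sum of leaf depths for a rooted tree given as nested tuples of labels."""
    if isinstance(tree, str):            # leaf: contribute its depth below the root
        return depth
    # internal node: every child sits one edge further from the root
    return sum(sackin_index(child, depth + 1) for child in tree)

balanced = (("a", "b"), ("c", "d"))      # fully balanced four-leaf tree
caterpillar = ((("a", "b"), "c"), "d")   # maximally unbalanced four-leaf tree

print(sackin_index(balanced))
print(sackin_index(caterpillar))
```

For a fixed number of leaves, larger values indicate a more unbalanced (ladder-like) tree, so values are usually compared between trees of similar size or after normalization.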
... linkage disequilibrium) in TreeMix, as per author recommendation (Fitak, 2021). Lastly, we ran 100 bootstrap replicates for the TreeMix species tree with the optimal number of migration edges, and summarized the branch support values on this tree using DendroPy (Sukumaran & Holder, 2010). ...
Preprint
Full-text available
Accurate estimates of species diversity are essential for all biodiversity research. Delimiting species and understanding the underlying processes of speciation are also central components of systematic biology that outline our comprehension of the evolutionary mechanisms generating biodiversity. We obtained genomic data (Ultraconserved Elements and single nucleotide polymorphisms) for a widespread genus of South American tree squirrels (genus Guerlinguetus) to explore alternative hypotheses on species limits and to clarify recent and rapid speciation on continental-scale and dynamically evolving landscapes. Using a multilayered genomic approach that integrates fine-scale population genetic analyses with quantitative molecular species delimitation methods, we observed that (i) the most likely number of species within Guerlinguetus is six, contrasting with both classic morphological revision and mitochondrial species delimitation; (ii) incongruencies in species relationships still persist, which might be a response to population migration and gene flow taking place in lowlands of eastern Amazonia; and (iii) effective migration surfaces detected important geographic barriers associated with the major Amazonian riverine systems and the mountain ranges of the Guiana shield. In conclusion, we uncovered unexpected species diversity of keystone mammals that are critical in tree-seed predation and dispersal in one of the most fragile and threatened ecosystems of the world, the tropical rainforests of South America. Our results corroborate recent findings suggesting that much of the extant species-level diversity in Amazonia is young, dating back to the Quaternary, while also reinforcing long-established hypotheses on the role of rivers and climate-driven forest dynamics triggering Amazonian speciation.
... The remaining trees were used to build the consensus tree using the Phylogenetic Tree Summarization (SumTrees) program within DendroPy v. 4.3.0, https://github.com/jeetsukumaran/DendroPy 55. The topology of species resulting from ML is used, and to add the posterior probabilities (PP) of BI on the ML tree, the Phylogenetic Tree Summarization (SumTrees) program within DendroPy v. 4.3.0 was used. ...
Article
Full-text available
Earliella scabrosa is a pantropical species of Polyporales (Basidiomycota) and well-studied concerning its morphology and taxonomy. However, its pantropical intraspecific genetic diversity and population differentiation are unknown. We initiated this study to better understand the genetic variation within E. scabrosa and to test if cryptic species are present. Sequences of three DNA regions, the nuclear ribosomal internal transcribed spacer (ITS), the large subunit ribosomal DNA (LSU), and the translation elongation factor (EF1α) were analysed for 66 samples from 15 geographical locations. We found a high level of genetic diversity (haplotype diversity, Hd = 0.88) and low nucleotide diversity (π = 0.006) across the known geographical range of E. scabrosa based on ITS sequences. The analysis of molecular variance (AMOVA) indicates that the genetic variability is mainly found among geographical populations. The results of Mantel tests confirmed that the genetic distance among populations of E. scabrosa is positively correlated with the geographical distance, which indicates that geographical isolation is an important factor for the observed genetic differentiation. Based on phylogenetic analyses of combined dataset ITS-LSU-EF1α, the low intraspecific divergences (0–0.3%), and the Automated Barcode Gap Discovery (ABGD) analysis, E. scabrosa can be considered as a single species with five different geographical populations. Each population might be in the process of allopatric divergence and in the long-term they may evolve and become distinct species.
... Any type of analytical or generative procedure involving statistical models requires some form of infrastructure for specifying such models. One example is the framework adopted by the BEAST, BEAST 2 and RevBayes platforms, whereby atomic model components can be combined into an arbitrarily large Bayesian network, a probabilistic graphical model whose structure can be represented by a directed acyclic graph (DAG; or more explicitly as a factor graph, e.g., Fig. 1b; [29]). ...
Preprint
Full-text available
We introduce PhyloJunction, a computational framework designed to facilitate the prototyping, testing, and characterization of evolutionary models. PhyloJunction is distributed as an open-source Python library that can be used to implement a variety of models, through its flexible graphical modeling architecture and dedicated model specification language. Model design and use are exposed to users via command-line and graphical interfaces, which integrate the steps of simulating, summarizing, and visualizing data. This paper describes the features of PhyloJunction - which include, but are not limited to, a general implementation of a popular family of phylogenetic diversification models - and, moving forward, how it may be expanded to not only include new models, but to also become a platform for conducting and teaching statistical learning.
... org) using the Hackett backbone tree (only sequenced species; Hackett et al., 2008). Then, following Rubolini et al. (2015), trees were summarized by computing a single 50% majority-rule consensus tree in SumTrees v 4.5.1 in DendroPy (Sukumaran & Holder, 2010). ...
Article
Full-text available
Comprehending symbiont abundance among host species is a major ecological endeavour, and the metabolic theory of ecology has been proposed to understand what constrains symbiont populations. We parameterized metabolic theory equations to investigate how bird species' body size and the body size of their feather mites relate to mite abundance according to four potential energy (uropygial gland size) and space constraints (wing area, total length of barbs and number of feather barbs). Predictions were compared with the empirical scaling of feather mite abundance across 106 passerine bird species (26,604 individual birds sampled), using phylogenetic modelling and quantile regression. Feather mite abundance was strongly constrained by host space (number of feather barbs) but not by energy. Moreover, feather mite species' body size was unrelated to the body size of their host species. We discuss the implications of our results for our understanding of the bird–feather mite system and for symbiont abundance in general.
... Consensus trees across these gene trees were visualised using DensiTree 132 . Finally the Bayesian posterior probability support (based on the amount of gene trees supporting the split) for each clade was calculated and plotted on top of the consensus tree using SumTrees, part of DendroPy 133 Fig. 18). ...
Article
Full-text available
Teleost fishes, which are the largest and most diverse group of living vertebrates, have a rich history of ancient and recent polyploidy. Previous studies of allotetraploid common carp and goldfish (cyprinids) reported a dominant subgenome, which is more expressed and exhibits biased gene retention. However, the underlying mechanisms contributing to observed ‘subgenome dominance’ remain poorly understood. Here we report high-quality genomes of twenty-one cyprinids to investigate the origin and subsequent subgenome evolution patterns following three independent allopolyploidy events. We identify the closest extant relatives of the diploid progenitor species, investigate genetic and epigenetic differences among subgenomes, and conclude that observed subgenome dominance patterns are likely due to a combination of maternal dominance and transposable element densities in each polyploid. These findings provide an important foundation to understanding subgenome dominance patterns observed in teleost fishes, and ultimately the role of polyploidy in contributing to evolutionary innovations.
... We inferred individual gene trees with the GTRCAT substitution model in RAxML v.8 (Stamatakis, 2014) along with 200 bootstrap replicates. To avoid biasing species tree inference with gene tree branches of low support, we collapsed any branch on the gene trees with less than 33% gene tree bootstrap support using DendroPy v.4.4.0 (Sukumaran & Holder, 2010). We inferred a species tree using the summary coalescent method ASTRAL-III (Zhang & al., 2017), and evaluated support via local posterior probability (LPP). ...
Article
Full-text available
Traits of the spore‐bearing generation have historically provided the basis for systematic concepts across the phylogenetic spectrum and depth of mosses. Whether taxa characterized by a simple sporophytic architecture are closely related or emerged from independent reduction is often ambiguous. Phylogenomic inferences in the Funariaceae, which hold the model taxon Physcomitrium patens, revealed that several such shifts in sporophyte complexity occurred, and mostly within the Entosthodon‐Physcomitrium complex. Here, we report the rediscovery of the monospecific, Himalayan endemic genera Brachymeniopsis and Clavitheca, after nearly 100 years and 40 years since their respective descriptions. The genera are characterized by, among other traits, their short sporophytes lacking the sporangial peristome teeth controlling spore dispersal. Phylogenomic inferences reveal that Brachymeniopsis gymnostoma arose within the clade of Entosthodon s.str., a genus with typically long‐exserted capsules. We therefore propose to transfer B. gymnostoma to the genus Entosthodon, as E. gymnostomus comb. nov. Furthermore, Clavitheca poeltii, the sole species of the genus, is morphologically highly similar to E. gymnostomus, and should also be transferred to Entosthodon, but is retained as a distinct taxon, E. poeltii comb. nov., until additional populations allow for testing the robustness of the observed divergence in costa and seta length between the Nepalese and Chinese populations.
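The excerpt for this entry collapses every gene-tree branch with less than 33% bootstrap support before species-tree inference. The operation itself is simple: each weakly supported internal node is removed and its children are attached directly to its parent, producing a polytomy. The sketch below shows this on a minimal, hypothetical `Node` class in plain Python; it is not DendroPy's API, it ignores branch lengths, and the threshold is passed in by the caller.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    label: Optional[str] = None          # taxon name for leaves
    support: Optional[float] = None      # bootstrap percentage for internal nodes
    children: List["Node"] = field(default_factory=list)

def collapse_low_support(node, threshold):
    """Collapse internal nodes below `node` whose support is under `threshold`."""
    new_children = []
    for child in node.children:
        collapse_low_support(child, threshold)   # handle the subtree first
        weak = (bool(child.children)
                and child.support is not None
                and child.support < threshold)
        if weak:
            new_children.extend(child.children)  # promote grandchildren
        else:
            new_children.append(child)
    node.children = new_children
    return node

# Hypothetical gene tree: one clade with 20% support, one with 90%.
root = Node(children=[
    Node(support=20, children=[Node("A"), Node("B")]),
    Node(support=90, children=[Node("C"), Node("D")]),
])
collapse_low_support(root, 33)
print([child.label for child in root.children])
```

Processing the subtree before testing the child means that chains of weak nodes collapse in a single pass, and the root itself is never removed because it has no parent branch to collapse.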
... To test if the discordance between plastome tree and nuclear trees could be explained by ILS alone, we conducted coalescent simulations following previous studies (García et al., 2017; Morales-Briones et al., 2018; Zhou et al., 2022). The ASTRAL-III tree was chosen as a guide tree for the gene tree simulation using DENDROPY v3.12.1 (Sukumaran and Holder, 2010). To simulate plastid gene trees, branch lengths of the ASTRAL-III tree were scaled by 12 to account for organellar inheritance as paleotropical woody bamboos are hexaploid, and the effective population size of the plastome is generally expected to be one-twelfth that of the nuclear genome given the assumptions of equal sex ratios, haploidy (homoplasmic), and uniparental inheritance (McCauley, 1994; Stull et al., 2020). ...
Article
Full-text available
Neomicrocalamus and Temochloa are closely related to bamboo genera. However, when considered with newly discovered and morphologically similar material from China and Vietnam, the phylogenetic relationship among these three groups was ambiguous in the analyses based on DNA regions. Here, as a means of investigating the relationships among the three bamboo groups and exploring potential sources of genomic conflicts, we present a phylogenomic examination based on the whole plastome, single-nucleotide polymorphism (SNP), and single-copy nuclear (SCN) gene datasets. Three different phylogenetic hypotheses were found. The inconsistency is attributed to the combination of incomplete lineage sorting and introgression. The origin of newly discovered bamboos is from introgressive hybridization between Temochloa liliana (which contributed 80.7% of the genome) and Neomicrocalamus prainii (19.3%), indicating that the newly discovered bamboos are closer to T. liliana in genetics. The more similar morphology and closer distribution elevation also imply a closer relationship between Temochloa and newly discovered bamboos.
... (Jetz et al., 2012). We downloaded 1000 random trees using the Hackett backbone (Hackett et al., 2008) and merged them into a rooted, ultrametric consensus tree using the SumTrees Python library (Sukumaran and Holder, 2010). The phylogenetic signal was estimated by maximum likelihood and set to the most appropriate value in each model. ...
Article
Full-text available
Chronically high blood glucose levels (hyperglycaemia) can compromise healthy ageing and lifespan at the individual level. Elevated oxidative stress can play a central role in hyperglycaemia-induced pathologies. Nevertheless, the lifespan of birds shows no species-level association with blood glucose. This suggests that the potential pathologies of high blood glucose levels can be avoided by adaptations in oxidative physiology at the macroevolutionary scale. However, this hypothesis remains unexplored. Here, we examined this hypothesis using comparative analyses controlled for phylogeny, allometry and fecundity based on data from 51 songbird species (681 individuals with blood glucose and 1021 individuals with oxidative state data). We measured blood glucose at baseline and after stress stimulus and computed glucose stress reactivity as the magnitude of change between the two time points. We also measured three parameters of non-enzymatic antioxidants (uric acid, total antioxidants and glutathione) and a marker of oxidative lipid damage (malondialdehyde). We found no clear evidence for blood glucose concentration being correlated with either antioxidant or lipid damage levels at the macroevolutionary scale, as opposed to the hypothesis postulating that high blood glucose levels entail oxidative costs. The only exception was the moderate evidence for species with a stronger stress-induced increase in blood glucose concentration evolving moderately lower investment into antioxidant defence (uric acid and glutathione). Neither baseline nor stress-induced glucose levels were associated with oxidative physiology. Our findings support the hypothesis that birds evolved adaptations preventing the (glyc)oxidative costs of high blood glucose observed at the within-species level. Such adaptations may explain the decoupled evolution of glycaemia and lifespan in birds and possibly the paradoxical combination of long lifespan and high blood glucose levels relative to mammals.
... https://github.com/jeetsukumaran/DendroPy), distributed with the DendroPy v4.5.2 Python library (Sukumaran and Holder 2010). SumTrees was run with the minimum clade frequency set to zero and otherwise default settings. ...
Article
Full-text available
Even in the genomics era, the phylogeny of Neotropical small felids comprised in the genus Leopardus remains contentious. We used whole-genome resequencing data to construct a time-calibrated consensus phylogeny of this group, quantify phylogenomic discordance, test for inter-species introgression, and assess patterns of genetic diversity and demographic history. We infer that the Leopardus radiation started in the Early Pliocene as an initial speciation burst, followed by another in its subgenus Oncifelis during the Early Pleistocene. Our findings challenge the long-held notion that ocelot (Leopardus pardalis) and margay (L. wiedii) are sister species, and instead indicate that margay is most closely related to the enigmatic Andean cat (L. jacobita), whose whole-genome data are reported here for the first time. In addition, we found that the newly sampled Andean tiger cat (L. tigrinus pardinoides) population from Colombia associates closely with Central American tiger cats (L. tigrinus oncilla). Genealogical discordance was largely attributable to incomplete lineage sorting, yet was augmented by strong gene flow between ocelot and the ancestral branch of Oncifelis, as well as between Geoffroy’s cat (L. geoffroyi) and southern tiger cat (L. guttulus). Contrasting demographic trajectories have led to disparate levels of current genomic diversity, with a nearly tenfold difference in heterozygosity between Andean cat and ocelot, spanning the entire range of variability found in extant felids. Our analyses improved our understanding of the speciation history and diversity patterns in this felid radiation, and highlight the benefits for phylogenomic inference of embracing the many heterogeneous signals scattered across the genome.
Article
We have sequenced, assembled, and analyzed the nuclear and mitochondrial genomes and transcriptomes of Potamopyrgus estuarinus and Potamopyrgus kaitunuparaoa, two prosobranch snail species native to New Zealand that together span the continuum from estuary to freshwater. These two species are the closest known relatives of the freshwater species Potamopyrgus antipodarum—a model for studying the evolution of sex, host–parasite coevolution, and biological invasiveness—and thus provide key evolutionary context for understanding its unusual biology. The P. estuarinus and P. kaitunuparaoa genomes are very similar in size and overall gene content. Comparative analyses of genome content indicate that these two species harbor a near-identical set of genes involved in meiosis and sperm functions, including seven genes with meiosis-specific functions. These results are consistent with obligate sexual reproduction in these two species and provide a framework for future analyses of P. antipodarum—a species comprising both obligately sexual and obligately asexual lineages, each separately derived from a sexual ancestor. Genome-wide multigene phylogenetic analyses indicate that P. kaitunuparaoa is likely the closest relative to P. antipodarum. We nevertheless show that there has been considerable introgression between P. estuarinus and P. kaitunuparaoa. That introgression does not extend to the mitochondrial genome, which appears to serve as a barrier to hybridization between P. estuarinus and P. kaitunuparaoa. Nuclear-encoded genes whose products function in joint mitochondrial-nuclear enzyme complexes exhibit similar patterns of nonintrogression, indicating that incompatibilities between the mitochondrial and the nuclear genome may have prevented more extensive gene flow between these two species.
Article
Males and females often have different roles in reproduction, although the origin of these differences has remained controversial. Explaining the enigmatic reversed sex roles where males sacrifice their mating potential and provide full parental care is a particularly long-standing challenge in evolutionary biology. While most studies focused on ecological factors as the drivers of sex roles, recent research highlights the significance of social factors such as the adult sex ratio. To disentangle these propositions, here, we investigate the additive and interactive effects of several ecological and social factors on sex role variation using shorebirds (sandpipers, plovers, and allies) as model organisms that provide the full spectrum of sex role variation including some of the best-known examples of sex-role reversal. Our results consistently show that social factors play a prominent role in driving sex roles. Importantly, we show that reversed sex roles are associated with both male-skewed adult sex ratios and high breeding densities. Furthermore, phylogenetic path analyses provide general support for sex ratios driving sex role variations rather than being a consequence of sex roles. Together, these important results open future research directions by showing that different mating opportunities of males and females play a major role in generating the evolutionary diversity of sex roles, mating system, and parental care.
Article
Full-text available
Background & Aims: Lineage sorting (LS) refers to the process in which multiple populations are descended from a common ancestral population or species and are now reproductively isolated from one another. It provides an approach to gain insights into speciation, and is often classified into the phases of polyphyly, paraphyly, and monophyly. The first two phases are in the state of incomplete lineage sorting (ILS) where gene trees do not correctly reflect species trees. The third phase is in the state of complete LS where gene trees are concordant with species trees. Here, we reviewed relevant theories and summarized recent progress in methods for LS detection. Progress: We first systematically discussed the coalescent theories of how genome sites with distinct evolutionary properties (neutral or selective) in an ancestral population were transmitted to progeny populations. We discussed the potential relationships between gene trees and species trees for neutral and selective genes, respectively. Secondly, we delved into LS analyses based on the neutral DNA sequences, including construction of phylogeny under ILS and the network-based phylogenetic analysis. We then discussed the impacts of selection on LS analysis and methods for detecting both directional and balancing selection based on gene trees and species trees. Finally, we discussed a few open questions about the effects of mating system on LS, the detection of ILS, and the effects of pollen and seed flow on LS. Prospect: New theories are needed to explore how mating system shapes the LS process for both selective and neutral genes. To appropriately assess ILS for individual genes based on species trees, it is crucial to improve methods for estimating species trees and to fully utilize the potential of genome sequence data in future studies. Given a high frequency of natural hybridization in plant species, a phylogenetic network method is needed to simultaneously examine pollen and seed flow together with ILS.
Answers to these questions could help us to understand in-depth the LS process in plant species.
Article
Full-text available
In this study of evolutionary relationships in the subfamily Rubioideae (Rubiaceae), we take advantage of the off-target proportion of reads generated via previous target capture sequencing projects based on nuclear genomic data to build a plastome phylogeny and investigate cytonuclear discordance. The assembly of off-target reads resulted in a comprehensive plastome dataset and robust inference of phylogenetic relationships, where most intratribal and intertribal relationships are resolved with strong support. While the phylogenetic results were mostly in agreement with previous studies based on plastome data, novel relationships in the plastid perspective were also detected. For example, our analyses of plastome data provide strong support for the SCOUT clade and its sister relationship to the remaining members of the subfamily, which differs from previous results based on plastid data but agrees with recent results based on nuclear genomic data. However, several instances of highly supported cytonuclear discordance were identified across the Rubioideae phylogeny. Coalescent simulation analysis indicates that while ILS could, by itself, explain the majority of the discordant relationships, plastome introgression may be the better explanation in some cases. Our study further indicates that plastomes across the Rubioideae are, with few exceptions, highly conserved and mainly conform to the structure, gene content, and gene order present in the majority of the flowering plants.
Article
The fields of Metagenomics and Metatranscriptomics involve the examination of complete nucleotide sequences, gene identification, and analysis of potential biological functions within diverse organisms or environmental samples. Despite the vast opportunities for discovery in metagenomics, the sheer volume and complexity of sequence data often present challenges in processing, analysis, and visualization. This article highlights the critical role of advanced visualization tools in enabling effective exploration, querying, and analysis of these complex datasets. Emphasizing the importance of accessibility, the article categorizes various visualizers based on their intended applications and highlights their utility in empowering bioinformaticians and non-bioinformaticians to interpret and derive insights from meta-omics data effectively.
Article
Sunda-Papuan keelback snakes (Serpentes: Natricidae: Tropidonophis Jan 1863) include 20 species distributed from the Philippines south-east through the Moluccas to New Guinea and Australia. Diversity of this insular snake lineage peaks on the island of New Guinea. Previous phylogenetic studies incorporating Tropidonophis have been limited to multi-locus Sanger-sequenced datasets with broad squamate or family-level focus. We used a targeted-sequence capture approach to sequence thousands of nuclear ultraconserved elements (UCEs) to construct the most comprehensive sequence-based phylogenetic hypothesis for this genus and estimate ancestral biogeography. Phylogenies indicate the genus is monophyletic given recent taxonomic reassignment of Rhabdophis spilogaster to Tropidonophis. All UCE phylogenies recovered a monophyletic Tropidonophis with reciprocally monophyletic Philippine and New Guinean clades. Divergence dating and ancestral range estimation suggest dispersal to New Guinea from the Philippines to have occurred during the Mid-Miocene via the Oceanic Arc Terranes. From Late Miocene into the Pliocene the genus experienced rapid diversification from orogeny of the New Guinean Central Cordillera from Oceanic Arc Terrane accretion on the northern boundary of the Sahul Shelf. Future collecting of missing taxa from the Moluccas and Indonesian Papua will better the understanding of non-volant faunal biogeography and diversification in this tectonically complex Pacific arena.
Article
Escherichia coli is a diverse pathogen, causing a range of disease in humans, from self-limiting diarrhea to urinary tract infections (UTIs). Uropathogenic E. coli (UPEC) is the most frequently observed uropathogen in UTIs, a common disease in high-income countries, incurring billions of dollars yearly in treatment costs. Although E. coli is easily grown and identified in the clinical laboratory, genotyping the pathogen is more complicated, yet critical for reducing the incidence of disease. These goals can be achieved through whole-genome sequencing of E. coli isolates, but this approach is relatively slow and typically requires culturing the pathogen in the laboratory. To genotype E. coli rapidly and inexpensively directly from clinical samples, including but not limited to urine, we developed and validated a multiplex amplicon sequencing assay, called ColiSeq. The assay consists of targets designed for E. coli species confirmation, high resolution genotyping, and mixture deconvolution. To demonstrate its utility, we screened the ColiSeq assay against 230 clinical urine samples collected from a hospital system in Flagstaff, Arizona, USA. A limit of detection analysis demonstrated the ability of ColiSeq to identify E. coli at a concentration of ~2 genomic equivalents (GEs)/mL and to generate high-resolution genotyping at a concentration of 1 × 10⁵ GEs/mL. The results of this study suggest that ColiSeq could be a valuable method to understand the source of UPEC strains and guide infection mitigation efforts. As sequence-based diagnostics become accepted in the clinical laboratory, workflows such as ColiSeq will provide actionable information to improve patient outcomes. Importance: Urinary tract infections (UTIs), caused primarily by Escherichia coli, create an enormous health care burden in the United States and other high-income countries. The early detection of E. coli from clinical samples, including urine, is important to target therapy and prevent further patient complications. Additionally, understanding the source of E. coli exposure will help with future mitigation efforts. In this study, we developed, tested, and validated an amplicon sequencing assay focused on direct detection of E. coli from urine. The resulting sequence data were demonstrated to provide strain level resolution of the pathogen, not only confirming the presence of E. coli, which can focus treatment efforts, but also providing data needed for source attribution and contact tracing. This assay will generate inexpensive, rapid, and reproducible data that can be deployed by public health agencies to track, diagnose, and potentially mitigate future UTIs caused by E. coli.
Article
The genus Gracilinanus ranges from savannas to dense forests in South America, yet its systematics have never been thoroughly investigated across its wide distributional range. We assessed Gracilinanus phylogenetic relationships, species boundaries, and geographical limits using mtDNA sequences. Our analysis confirmed the distinctiveness of the six recognized species (G. aceramarcae, G. agilis, G. emiliae, G. marica, G. microtarsus, and G. peruanus), with a mean p-distance for interspecific nucleotide sequence divergences ranging from 13–16.2% and robust phylogenetic support (BPP > 0.95; BS > 75%). Refined species delimitation approaches (GMYC, PTP, ASAP) revealed potential cryptic diversity, suggesting up to 20 candidate species. Three geographically structured and divergent lineages (4.1–4.8% sequence divergence) were identified within G. agilis, extending its Cerrado range. Within G. emiliae, we found divergence values ranging from 4.7–5.7% and the first known record for the northeastern Atlantic Forest. Three divergent clades were recovered within G. microtarsus (9.0–9.8% sequence divergence), including a new lineage for the northern Atlantic Forest. For G. peruanus, we found two divergent lineages (7.2%) and the first documented occurrence for Amazonian lowland forest. This comprehensive sampling revealed greater genetic diversity in Gracilinanus, extending its geographic limits. Here we propose nine putative new species, emphasizing a hidden diversity that warrants formal description and further increases the taxonomic diversity of this genus. These newly identified lineages underscore the urgency of inventorying and conserving the threatened ecosystems of the Cerrado and Atlantic Forest hotspots.
Preprint
The evolutionary history of species has become relevant to understanding and explaining the composition and structure of biological communities; however, we need to identify species clearly and have a phylogenetic framework to consider such a historical perspective. This study seeks to understand the community-level patterns of mammals in Andean highland forest remnants associated with agricultural landscapes. Our methods included fieldwork to survey small terrestrial mammals, bats, and medium to large species during two sampling periods in avocado plantations in the Western Cordillera of Colombia. We implemented three approaches to identify mammal species: traditional morphological identification, DNA barcoding, and phylogenetic analyses. We also compared the Phylogenetic Diversity of the mammal community in this study with that of other assemblages in montane forests. Our fieldwork yielded 738 records of 37 mammal species in 13 families. Our study generated sequences for 18 mammal species of Colombia and ten new DNA barcodes, highlighting the importance of producing genetic libraries for Neotropical mammals. Our phylogenetic diversity analyses show that although our study area is more species-rich than other Andean localities, it has lower phylogenetic diversity values because many mammalian lineages are absent in these transformed ecosystems. We propose expanding the use of DNA-based species identification and Phylogenetic Diversity analyses to provide an objective characterization of the communities rather than simplistic and misleading parameters such as species richness.
Article
The myriad microorganisms that live in close association with humans have diverse effects on physiology, yet the molecular bases for these impacts remain mostly unknown1–3. Classical pathogens often invade host tissues and modulate immune responses through interactions with human extracellular and secreted proteins (the ‘exoproteome’). Commensal microorganisms may also facilitate niche colonization and shape host biology by engaging host exoproteins; however, direct exoproteome–microbiota interactions remain largely unexplored. Here we developed and validated a novel technology, BASEHIT, that enables proteome-scale assessment of human exoproteome–microbiome interactions. Using BASEHIT, we interrogated more than 1.7 million potential interactions between 519 human-associated bacterial strains from diverse phylogenies and tissues of origin and 3,324 human exoproteins. The resulting interactome revealed an extensive network of transkingdom connectivity consisting of thousands of previously undescribed host–microorganism interactions involving 383 strains and 651 host proteins. Specific binding patterns within this network implied underlying biological logic; for example, conspecific strains exhibited shared exoprotein-binding patterns, and individual tissue isolates uniquely bound tissue-specific exoproteins. Furthermore, we observed dozens of unique and often strain-specific interactions with potential roles in niche colonization, tissue remodelling and immunomodulation, and found that strains with differing host interaction profiles had divergent interactions with host cells in vitro and effects on the host immune system in vivo. Overall, these studies expose a previously unexplored landscape of molecular-level host–microbiota interactions that may underlie causal effects of indigenous microorganisms on human health and disease.
Preprint
In migratory species, the temporal phases of the annual cycle are intrinsically linked to seasonally shifting geographic ranges. Despite intense interest in the annual cycle ecology of migration, a synthetic understanding of the relationship between the biogeography and phenology of seasonal migration remains elusive. Here, we interrogate the spatiotemporal structure of the annual cycle in a novel phylogenetic comparative framework. We use eBird, a massive avian occurrence dataset, to demarcate and measure in a consistent manner among species the portions of the annual cycle when a geographic distribution is stationary versus dynamic due to migration. Through comparative analyses of the durations of annual cycle stages for 150 species of migratory birds breeding in North America, we show that the duration of the migratory periods is remarkably consistent among species and is unrelated to the distance between breeding and nonbreeding locations. In other words, the seasonal distributions of long-distance migrants shift between their geographically distant stationary phases in the same amount of time as short-distance migrants, suggesting that individuals of long-distance migratory species have more synchronous periods of migration and likely a faster individual migratory pace than short-distance migrants. Our results further show that the amount of time a species spends on the breeding grounds is strongly inversely related to time spent on the nonbreeding grounds, revealing the length of the breeding versus nonbreeding stationary period to be the primary source of species-level variation in the pacing of the annual cycle, as opposed to the time needed for the migratory period. 
Further, our study reveals that the amount of time spent annually on the breeding versus nonbreeding grounds predicts the distance between breeding and nonbreeding locations, demonstrating key linkages between the biogeography of the migratory cycle, its phenology, and the evolution of life history tradeoffs.
Article
Abstract. Vitreorana parvula was the first glassfrog described for the Atlantic Forest. The species, however, has become a taxonomic puzzle as the only known individual is the lectotype from the 19th century, which is not particularly well-preserved or accompanied by a detailed original description. To solve this problem, we collected topotypic specimens, as well as advertisement calls, tissue samples, and natural history data, and compared them to other Vitreorana species. Our results show clear morphological, acoustic, and genetic differences between V. parvula and other species of Vitreorana, except for V. uranoscopa. Following our results, we consider V. uranoscopa as a junior synonym of V. parvula and redescribe the species based on topotypic material, while summarizing relevant variation from across its distribution.
Article
Evolutionary processes may have substantial impacts on community assembly, but evidence for phylogenetic relatedness as a determinant of interspecific interaction strength remains mixed. In this perspective, we consider a possible role for discordance between gene trees and species trees in the interpretation of phylogenetic signal in studies of community ecology. Modern genomic data show that the evolutionary histories of many taxa are better described by a patchwork of histories that vary along the genome rather than a single species tree. If a subset of genomic loci harbour trait‐related genetic variation, then the phylogeny at these loci may be more informative of interspecific trait differences than the genome background. We develop a simple method to detect loci harbouring phylogenetic signal and demonstrate its application through a proof‐of‐principle analysis of Penicillium genomes and pairwise interaction strength. Our results show that phylogenetic signal that may be masked genome‐wide could be detectable using phylogenomic techniques and may provide a window into the genetic basis for interspecific interactions.
Article
The Open Tree of Life (OToL) project produces a supertree that summarizes phylogenetic knowledge from tree estimates published in the primary literature. The supertree construction algorithm iteratively calls Aho’s Build algorithm thousands of times in order to assess the compatibility of different phylogenetic groupings. We describe an incrementalized version of the Build algorithm that is able to share work between successive calls to Build. We provide details that allow a programmer to implement the incremental algorithm BuildInc, including pseudo-code and a description of data structures. We assess the effect of BuildInc on our supertree algorithm by analyzing simulated data and by analyzing a supertree problem taken from the OpenTree 13.4 synthesis tree. We find that BuildInc provides up to 550-fold speedup for our supertree algorithm.
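As a rough illustration of the kind of compatibility check this abstract refers to, the following is a minimal sketch of the classic Build procedure of Aho et al. for rooted triplets. It is not the OpenTree code and not the incremental BuildInc variant. The function name `build`, the triplet encoding `(a, b, c)` (read as "a and b are closer to each other than either is to c"), and the nested-tuple output are assumptions made for this example only.

```python
def build(leaves, triplets):
    """Return a nested-tuple tree displaying every triplet, or None if none exists.

    Hypothetical encoding: a triplet (a, b, c) means a and b share a more
    recent common ancestor with each other than with c.
    """
    leaves = set(leaves)
    if len(leaves) == 1:
        return next(iter(leaves))

    # Union-find over the current leaf set.
    parent = {x: x for x in leaves}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Only triplets whose three leaves are all present constrain this step.
    relevant = [t for t in triplets if set(t) <= leaves]
    for a, b, _c in relevant:
        parent[find(a)] = find(b)  # a and b must stay in the same subtree

    components = {}
    for x in leaves:
        components.setdefault(find(x), set()).add(x)

    # A single component means the leaves cannot be split at the root.
    if len(components) == 1:
        return None

    subtrees = []
    for component in components.values():
        subtree = build(component, relevant)
        if subtree is None:
            return None
        subtrees.append(subtree)
    return tuple(subtrees)
```

A `None` result signals that the input groupings are mutually incompatible; any other result is one tree that displays all of them. Each call here recomputes the components from scratch, which is the repeated work an incremental variant would aim to share.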
Article
Background: The bHLH transcription factor family is named after the basic helix-loop-helix (bHLH) domain that is a characteristic element of their members. Understanding the function and characteristics of this family is important for the examination of a wide range of functions. As the availability of genome sequences and transcriptome assemblies has increased significantly, the need for automated solutions that provide reliable functional annotations is emphasised. Results: A phylogenetic approach was adapted for the automatic identification and functional annotation of the bHLH transcription factor family. The bHLH_annotator, designed for the automated functional annotation of bHLHs, was implemented in Python3. Sequences of bHLHs described in literature were collected to represent the full diversity of bHLH sequences. Previously described orthologs form the basis for the functional annotation assignment to candidates which are also screened for bHLH-specific motifs. The pipeline was successfully deployed on the two Arabidopsis thaliana accessions Col-0 and Nd-1, the monocot species Dioscorea dumetorum, and a transcriptome assembly of Croton tiglium. Depending on the applied search parameters for the initial candidates in the pipeline, species-specific candidates or members of the bHLH family which experienced domain loss can be identified. Conclusions: The bHLH_annotator allows a detailed and systematic investigation of the bHLH family in land plant species and classifies candidates based on bHLH-specific characteristics, which distinguishes the pipeline from other established functional annotation tools. This provides the basis for the functional annotation of the bHLH family in land plants and the systematic examination of a wide range of functions regulated by this transcription factor family.
Article
Recent phylogeographic studies of poorly-dispersing coastal invertebrates in highly biodiverse regions have led to the discovery of high levels of cryptic diversity and complex phylogeographic patterns that suggest isolation, geological, and ecological processes have shaped their biodiversity. Studies of southern African coastal invertebrates have uncovered cryptic diversity for various taxa and phylogeographic patterns that, although sharing some similarities across taxa, do differ. These findings underscore the need for additional studies to better understand the biodiversity levels, distributional patterns, and processes responsible for producing coastal biodiversity in that region. The coastal isopod Deto echinata is of particular interest, as its complex taxonomic history, poor dispersal capabilities, and broad geographic distribution suggest the potential for cryptic diversity. We use mitochondrial and nuclear sequences to characterize D. echinata individuals from localities ranging from northern Namibia to Glentana, about 2,500 km along the coastline on the south coast of South Africa. These are used to assess whether D. echinata harbors cryptic genetic diversity and whether phylogeographic distributional patterns correlate with those previously documented for other coastal isopods in the region. Analysis of mitochondrial and nuclear sequences revealed two deeply-divergent lineages that exhibit a distributional break in the Cape Peninsula region. These findings suggest D. echinata is a cryptic species complex in need of taxonomic revision and highlight the need for further taxonomic and phylogeographic studies of similarly poorly-dispersing coastal invertebrates in southern Africa.
Article
Scaling laws are a powerful way to compare genomes because they put all organisms onto a single curve and reveal nontrivial generalities as genomes change in size. The abundance of functional categories across genomes has previously been found to show power law scaling with respect to the total number of functional categories, suggesting that universal constraints shape genomic category abundance. Here, we look across the tree of life to understand how genome evolution may be related to functional scaling. We revisit previous observations of functional genome scaling with an expanded taxonomy by analyzing 3,726 bacterial, 220 archaeal, and 79 unicellular eukaryotic genomes. We find that for some functional classes, scaling is best described by multiple exponents, revealing previously unobserved shifts in scaling as genome-encoded protein annotations increase or decrease. Furthermore, we find that scaling varies between phyletic groups at both the domain and phyla levels and is less universal than previously thought. This variability in functional scaling is not related to taxonomic phylogeny resolved at the phyla level, suggesting that differences in cell plan or physiology outweigh broad patterns of taxonomic evolution. Since genomes are maintained and replicated by the functional proteins encoded by them, these results point to functional degeneracy between taxonomic groups and unique evolutionary trajectories toward these. We also find that individual phyla frequently span scaling exponents of functional classes, revealing that individual clades can move across scaling exponents. Together, our results reveal unique shifts in functions across the tree of life and highlight that as genomes grow or shrink, proteins of various functions may be added or lost.
Article
Many bioinformatics analyses, ranging from gene clustering to phylogenetics, produce hierarchical trees as their main result. These are used to represent the relationships among different biological entities, thus facilitating their analysis and interpretation. A number of standalone programs are available that focus on tree visualization or that perform specific analyses on them. However, such applications are rarely suitable for large-scale surveys, in which a higher level of automation is required. Currently, many genome-wide analyses rely on tree-like data representation and hence there is a growing need for scalable tools to handle tree structures at large scale. Here we present the Environment for Tree Exploration (ETE), a Python programming toolkit that assists in the automated manipulation, analysis and visualization of hierarchical trees. ETE libraries provide a broad set of tree handling options as well as specific methods to analyze phylogenetic and clustering trees. Among other features, ETE allows for the independent analysis of tree partitions, has support for the extended newick format, provides an integrated node annotation system and allows trees to be linked to external data such as multiple sequence alignments or numerical arrays. In addition, ETE implements a number of built-in analytical tools, including phylogeny-based orthology prediction and cluster validation techniques. Finally, ETE's programmable tree drawing engine can be used to automate the graphical rendering of trees with customized node-specific visualizations. ETE provides a complete set of methods to manipulate tree data structures that extends current functionality in other bioinformatic toolkits of a more general purpose. ETE is free software and can be downloaded from http://ete.cgenomics.org.
Article
The Biopython project is a mature open source international collaboration of volunteer developers, providing Python libraries for a wide range of bioinformatics problems. Biopython includes modules for reading and writing different sequence file formats and multiple sequence alignments, dealing with 3D macromolecular structures, interacting with common tools such as BLAST, ClustalW and EMBOSS, accessing key online databases, as well as providing numerical methods for statistical learning. Availability: Biopython is freely available, with documentation and source code at www.biopython.org under the Biopython license. Contact: All queries should be directed to the Biopython mailing lists, see www.biopython.org/wiki/_Mailing_lists; peter.cock@scri.ac.uk.
Article
The effective population sizes of ancestral as well as modern species are important parameters in models of population genetics and human evolution. The commonly used method for estimating ancestral population sizes, based on counting mismatches between the species tree and the inferred gene trees, is highly biased as it ignores uncertainties in gene tree reconstruction. In this article, we develop a Bayes method for simultaneous estimation of the species divergence times and current and ancestral population sizes. The method uses DNA sequence data from multiple loci and extracts information about conflicts among gene tree topologies and coalescent times to estimate ancestral population sizes. The topology of the species tree is assumed known. A Markov chain Monte Carlo algorithm is implemented to integrate over uncertain gene trees and branch lengths (or coalescence times) at each locus as well as species divergence times. The method can handle any species tree and allows different numbers of sequences at different loci. We apply the method to published noncoding DNA sequences from the human and the great apes. There are strong correlations between posterior estimates of speciation times and ancestral population sizes. With the use of an informative prior for the human-chimpanzee divergence date, the population size of the common ancestor of the two species is estimated to be approximately 20,000, with a 95% credibility interval (8000, 40,000). Our estimates, however, are affected by model assumptions as well as data quality. We suggest that reliable estimates have yet to await more data and more realistic models.
Article
Analysis of Phylogenetics and Evolution (APE) is a package written in the R language for use in molecular evolution and phylogenetics. APE provides both utility functions for reading and writing data and manipulating phylogenetic trees, as well as several advanced methods for phylogenetic and evolutionary analysis (e.g. comparative and population genetic methods). APE takes advantage of the many R functions for statistics and graphics, and also provides a flexible framework for developing and implementing further statistical methods for the analysis of evolutionary processes. Availability: The program is free and available from the official R package archive at http://cran.r-project.org/src/contrib/PACKAGES.html#ape. APE is licensed under the GNU General Public License.
Article
We have implemented in Python the COmparative GENomic Toolkit, a fully integrated and thoroughly tested framework for novel probabilistic analyses of biological sequences, devising workflows, and generating publication quality graphics. PyCogent includes connectors to remote databases, built-in generalized probabilistic techniques for working with biological sequences, and controllers for third-party applications. The toolkit takes advantage of parallel architectures and runs on a range of hardware and operating systems, and is available under the general public license from http://sourceforge.net/projects/pycogent.
Article
Phylogenies reconstructed from gene sequences can be used to investigate the tempo and mode of species diversification. Here we develop and use new statistical methods to infer past patterns of speciation and extinction from molecular phylogenies. Specifically, we test the null hypothesis that per-lineage speciation and extinction rates have remained constant through time. Rejection of this hypothesis may provide evidence for evolutionary events such as adaptive radiations or key adaptations. In contrast to previous approaches, our methods are robust to incomplete taxon sampling and are conservative with respect to extinction. Using simulation we investigate, first, the adverse effects of failing to take incomplete sampling into account and, second, the power and reliability of our tests. When applied to published phylogenies our tests suggest that, in some cases, speciation rates have decreased through time.
Book
We studied sequence variation in 16S rDNA in 204 individuals from 37 populations of the land snail Candidula unifasciata (Poiret 1801) across the core species range in France, Switzerland, and Germany. Phylogeographic, nested clade, and coalescence analyses were used to elucidate the species' evolutionary history. The study revealed the presence of two major evolutionary lineages that evolved in separate refuges in southeast France as a result of previous fragmentation during the Pleistocene. Applying a recent extension of the nested clade analysis (Templeton 2001), we inferred that range expansions along river valleys in independent corridors to the north led eventually to a secondary contact zone of the major clades around the Geneva Basin. There is evidence supporting the idea that the formation of the secondary contact zone and the colonization of Germany might be postglacial events. The phylogeographic history inferred for C. unifasciata differs from general biogeographic patterns of postglacial colonization previously identified for other taxa, and it might represent a common model for species with restricted dispersal.
Article
It is not unusual for several classifications to be given for the same collection of objects. We present a method, called majority rule, which can be used to define a consensus of these classifications. We also discuss some mathematical properties of this consensus tree.
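To make the majority-rule idea concrete, here is a minimal sketch that treats each classification as a rooted tree written as nested tuples and keeps every cluster (the set of leaves below an internal node) found in more than half of the inputs. The nested-tuple encoding and the function names `clusters` and `majority_rule_clusters` are assumptions for this example; they are not taken from the cited paper or from any library.

```python
from collections import Counter


def clusters(tree):
    """Collect the leaf sets of all internal nodes except the root."""
    found = set()

    def visit(node):
        if not isinstance(node, tuple):
            return frozenset([node])  # a leaf contributes only itself
        below = frozenset().union(*(visit(child) for child in node))
        found.add(below)
        return below

    everything = visit(tree)
    found.discard(everything)  # the root cluster is shared by every tree
    return found


def majority_rule_clusters(trees):
    """Return the clusters present in strictly more than half of the trees."""
    counts = Counter(c for tree in trees for c in clusters(tree))
    return {c for c, k in counts.items() if k > len(trees) / 2}


trees = [
    (("A", "B"), ("C", "D")),
    ((("A", "B"), "C"), "D"),
    ((("A", "C"), "B"), "D"),
]
consensus = majority_rule_clusters(trees)
```

Because any two clusters that each occur in more than half of the trees must occur together in at least one tree, the retained clusters are pairwise compatible and can always be assembled into a single consensus tree.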
Article
A metric on general phylogenetic trees is presented. This extends the work of most previous authors, who constructed metrics for binary trees. The metric presented in this paper makes possible the comparison of the many nonbinary phylogenetic trees appearing in the literature. This provides an objective procedure for comparing the different methods for constructing phylogenetic trees. The metric is based on elementary operations which transform one tree into another. Various results obtained in applying these operations are given. They enable the distance between any pair of trees to be calculated efficiently. This generalizes previous work by Bourque to the case where interior vertices can be labeled, and labels may contain more than one element or may be empty.
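For the common special case in which only the leaves carry labels, this distance can be computed as the number of bipartitions (splits) of the leaf set that appear in exactly one of the two trees. The sketch below assumes that restriction, so it does not cover labeled or empty interior vertices. The nested-tuple tree encoding and the names `leaf_set`, `nontrivial_splits`, and `split_distance` are hypothetical choices for this example.

```python
def leaf_set(tree):
    """Return the frozenset of leaf labels in a nested-tuple tree."""
    if isinstance(tree, tuple):
        return frozenset().union(*(leaf_set(child) for child in tree))
    return frozenset([tree])


def nontrivial_splits(tree):
    """Return each internal split as the side that excludes a fixed anchor leaf."""
    all_leaves = leaf_set(tree)
    anchor = min(all_leaves)  # fixes one canonical side for every split
    splits = set()

    def visit(node):
        if not isinstance(node, tuple):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        side = below if anchor not in below else all_leaves - below
        # Keep only splits with at least two leaves on each side.
        if 2 <= len(side) <= len(all_leaves) - 2:
            splits.add(side)
        return below

    visit(tree)
    return splits


def split_distance(tree_a, tree_b):
    """Count the splits found in exactly one of the two trees."""
    return len(nontrivial_splits(tree_a) ^ nontrivial_splits(tree_b))


t1 = ((("A", "B"), "C"), ("D", "E"))
t2 = ((("A", "C"), "B"), ("D", "E"))
distance = split_distance(t1, t2)
```

Storing each split by the side that excludes one fixed leaf makes the comparison independent of where the tree happens to be rooted, and it handles nonbinary trees without special cases.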
Article
The n-coalescent is a continuous-time Markov chain on a finite set of states, which describes the family relationships among a sample of n members drawn from a large haploid population. Its transition probabilities can be calculated from a factorization of the chain into two independent components, a pure death process and a discrete-time jump chain. For a deeper study, it is useful to construct a more complicated Markov process in which n-coalescents for all values of n are embedded in a natural way.
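The factorization mentioned above can be sketched in a few lines: a pure death process supplies the waiting times (with k lineages, the next merger arrives at rate k(k-1)/2 in the standard coalescent time scale), and an independent jump chain chooses which pair of lineages merges, uniformly at random. The function names and the representation of a lineage as a tuple of sample indices are assumptions for this example.

```python
import random


def coalescence_rate(k):
    """Rate at which k lineages drop to k - 1: the number of pairs, k choose 2."""
    return k * (k - 1) / 2


def simulate_genealogy(n, rng):
    """Simulate one n-coalescent genealogy as a list of (waiting_time, merged_lineage)."""
    lineages = [(i,) for i in range(n)]  # each lineage lists the samples it subtends
    events = []
    while len(lineages) > 1:
        k = len(lineages)
        wait = rng.expovariate(coalescence_rate(k))  # death process component
        i, j = rng.sample(range(k), 2)               # jump chain component
        merged = lineages[i] + lineages[j]
        lineages = [x for idx, x in enumerate(lineages) if idx not in (i, j)]
        lineages.append(merged)
        events.append((wait, merged))
    return events


def expected_tmrca(n):
    """Expected time to the most recent common ancestor: sum of mean waiting times."""
    return sum(1 / coalescence_rate(k) for k in range(2, n + 1))


events = simulate_genealogy(6, random.Random(0))
```

The sum in `expected_tmrca` telescopes to 2(1 - 1/n), so the expected depth of the genealogy stays below 2 time units however large the sample is.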
Vos,R. (2008) Data standards in phylogenetics: the nexml project. In Weitzman,A.L. and Belbin,L. (eds), Proceedings of TDWG (2008), Fremantle, Australia.
Pybus,O. and Harvey,P. (2000) Testing macro-evolutionary models using incomplete molecular phylogenies. Proc. R. Soc. B: Biol. Sci., 267, 2267.
R Development Core Team (2009) R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.
Rannala,B. and Yang,Z. (2003) Bayes estimation of species divergence times and ancestral population sizes using DNA sequences from multiple loci. Genetics, 164, 1645–1656.
Foster,P. (2010) P4, a python package for phylogenetics. Available at http://bmnh.org/~pf/p4.html (last accessed date February 02, 2010).
Harmon,L. et al. (2009) geiger: Analysis of evolutionary diversification. R package version 1.3-1.
Kingman,J. (1982) The coalescent. Stochastic Processes Appl., 13, 235–248.