Identifying the rooted species tree from the distribution of unrooted gene trees under the coalescent.

Department of Mathematics and Statistics, University of Alaska Fairbanks, PO Box 756660, Fairbanks, AK 99775, USA.
Journal of Mathematical Biology (Impact Factor: 2.37). 06/2011; 62(6):833-62. DOI: 10.1007/s00285-010-0355-7
Source: PubMed

ABSTRACT Gene trees are evolutionary trees representing the ancestry of genes sampled from multiple populations. Species trees represent populations of individuals, each with many genes, splitting into new populations or species. The coalescent process, which models ancestry of gene copies within populations, is often used to model the probability distribution of gene trees given a fixed species tree. This multispecies coalescent model provides a framework for phylogeneticists to infer species trees from gene trees using maximum likelihood or Bayesian approaches. Because the coalescent models a branching process over time, all trees are typically assumed to be rooted in this setting. Often, however, gene trees inferred by traditional phylogenetic methods are unrooted. We investigate probabilities of unrooted gene trees under the multispecies coalescent model. We show that when there are four species with one gene sampled per species, the distribution of unrooted gene tree topologies identifies the unrooted species tree topology and some, but not all, information in the species tree edges (branch lengths). The location of the root on the species tree is not identifiable in this situation. However, for five or more species with one gene sampled per species, we show that the distribution of unrooted gene tree topologies identifies the rooted species tree topology and all its internal branch lengths. The length of any pendant branch leading to a leaf of the species tree is also identifiable for any species from which more than one gene is sampled.
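The four-species statement can be illustrated with a short numerical sketch. It is not taken from the paper's text: it assumes the standard coalescent calculation for one gene per species, with branch lengths in coalescent units, and the function names and the symbols x and y are hypothetical labels chosen for this example. For a caterpillar species tree (((a,b):x,c):y,d) the unrooted gene tree ab|cd has probability 1 - (2/3)exp(-x), and for a balanced tree ((a,b):x,(c,d):y) it has probability 1 - (2/3)exp(-(x+y)); the other two unrooted topologies split the remainder equally.

```python
import math


def quartet_probs_caterpillar(x):
    """Unrooted quartet probabilities for species tree (((a,b):x,c):y,d).

    Assumption: one gene per species, lengths in coalescent units.
    The result does not depend on y, so y is not passed in.
    """
    minor = math.exp(-x) / 3.0
    return {"ab|cd": 1.0 - 2.0 * minor, "ac|bd": minor, "ad|bc": minor}


def quartet_probs_balanced(x, y):
    """Unrooted quartet probabilities for species tree ((a,b):x,(c,d):y).

    Only the sum x + y enters the distribution.
    """
    minor = math.exp(-(x + y)) / 3.0
    return {"ab|cd": 1.0 - 2.0 * minor, "ac|bd": minor, "ad|bc": minor}


if __name__ == "__main__":
    # Two differently rooted species trees with the same unrooted topology
    # produce the same distribution of unrooted gene tree topologies.
    print(quartet_probs_caterpillar(1.0))
    print(quartet_probs_balanced(0.4, 0.6))
```

Under these assumptions the most probable unrooted gene tree matches the unrooted species tree, but a caterpillar tree with internal length x and a balanced tree with x + y equal to that length give identical distributions, which is one way to see why the root position and some branch-length information cannot be recovered from four species.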

  • Source
    ABSTRACT: This paper investigates the computational geometry relevant to calculations of the Frechet mean and variance for probability distributions on the phylogenetic tree space of Billera, Holmes and Vogtmann, using the theory of probability measures on spaces of nonpositive curvature developed by Sturm. We show that the combinatorics of geodesics with a specified fixed endpoint in tree space are determined by the location of the varying endpoint in a certain polyhedral subdivision of tree space. The variance function associated to a finite subset of tree space is continuously differentiable within each cell of the corresponding subdivision. We use this subdivision to establish two iterative methods for producing sequences that converge to the Frechet mean: one based on Sturm's Law of Large Numbers, and another based on descent algorithms for finding optima of smooth functions on convex polyhedra. We present properties and biological applications of Frechet means and extend our main results to more general globally nonpositively curved spaces composed of Euclidean orthants.
  • Source
    ABSTRACT: The multispecies coalescent model describes the generation of gene trees from a rooted metric species tree and thus provides a framework for the inference of species trees from sampled gene trees. We prove that the STAR method of Liu et al. (2009), and generalizations of it, are statistically consistent methods of topological species tree inference under this model. We discuss the impact of gene tree sampling schemes for species tree inference using generalized STAR methods and reinterpret the original STAR as a consensus method based on clades.
    Journal of computational biology: a journal of computational molecular cell biology 01/2013; 20(1):50-61. · 1.69 Impact Factor
  • Source
    ABSTRACT: The controversy surrounding the potential impact of birds in spirochete transmission dynamics and their capacity to serve as a reservoir has existed for a long time. The majority of analyzed bird species are able to infect larval ticks with Borrelia. Dispersal of infected ticks due to bird migration is a key to the establishment of new foci of Lyme borreliosis. The dynamics of infection in birds supports the mixing of different species, the horizontal exchange of genetic information, and appearance of recombinant genotypes. Four Borrelia burgdorferi sensu lato strains were cultured from Ixodes minor larvae and four strains were isolated from Ixodes minor nymphs collected from a single Carolina Wren (Thryothorus ludovicianus). A multilocus sequence analysis that included 16S rRNA, a 5S-23S intergenic spacer region, a 16S-23S internal transcribed spacer, flagellin, p66, and ospC separated the 8 strains into 3 distinct groups. Additional multilocus sequence typing of 8 housekeeping genes (clpA, clpX, nifS, pepX, pyrG, recG, rplB, and uvrA) was used to resolve the taxonomic status of bird-associated strains. Results of analysis of 14 genes confirmed that the level of divergence among strains is significantly higher than what would be expected for strains within a single species. The presence of cross-species recombination was revealed: the Borrelia burgdorferi sensu stricto housekeeping gene nifS was incorporated into the homologous locus of a strain previously assigned to B. americana. Genetically diverse Borrelia strains are often found within the same tick or same vertebrate host, presenting a wide opportunity for genetic exchange. We report the cross-species recombination that led to incorporation of a housekeeping gene from the B. burgdorferi sensu stricto strain into a homologous locus of another bird-associated strain.
    Our results support the hypothesis that recombination maintains a majority of sequence polymorphism within Borrelia populations because of the re-assortment of pre-existing sequence variants. Even if our findings of broad genetic diversity among 8 strains cultured from ticks that fed on a single bird are the exception rather than the rule, they support the theory that the diversity and evolution of Lyme borreliosis (LB) spirochetes are driven mainly by the host.
    Parasites & Vectors 01/2014; 7(1):4. · 3.25 Impact Factor
