Anchored Hybrid Enrichment for Massively High-Throughput Phylogenomics

Department of Scientific Computing, Florida State University, Dirac Science Library, Tallahassee, FL 32306-4102, USA.
Systematic Biology (Impact Factor: 11.53). 05/2012; 61(5):727-44. DOI: 10.1093/sysbio/sys049
Source: PubMed

ABSTRACT The field of phylogenetics is on the cusp of a major revolution, enabled by new methods of data collection that leverage both genomic resources and recent advances in DNA sequencing. Previous phylogenetic work has required labor-intensive marker development coupled with single-locus polymerase chain reaction and DNA sequencing on a clade-by-clade and locus-by-locus basis. Here, we present a new, cost-efficient, and rapid approach to obtaining data from hundreds of loci for potentially hundreds of individuals for deep and shallow phylogenetic studies. Specifically, we designed probes for target enrichment of >500 loci in highly conserved anchor regions of vertebrate genomes (flanked by less conserved regions) from five model species and tested enrichment efficiency in nonmodel species up to 508 million years divergent from the nearest model. We found that hybrid enrichment using conserved probes (anchored enrichment) can recover a large number of unlinked loci that are useful at a diversity of phylogenetic timescales. This new approach has the potential not only to expedite resolution of deep-scale portions of the Tree of Life but also to greatly accelerate resolution of the large number of shallow clades that remain unresolved. The combination of low cost (~1% of the cost of traditional Sanger sequencing and ~3.5% of the cost of high-throughput amplicon sequencing for projects on the scale of 500 loci × 100 individuals) and rapid data collection (~2 weeks of laboratory time) is expected to make this approach tractable even for researchers working on systems with limited or nonexistent genomic resources.

  • ABSTRACT: Tree alignment graphs (TAGs) provide an intuitive data structure for storing phylogenetic trees that exhibits the relationships of the individual input trees and can potentially account for nested taxonomic relationships. This paper provides a theoretical foundation for the use of TAGs in phylogenetics. We provide a formal definition of a TAG that, unlike previous definitions, does not depend on the order in which input trees are provided. In the consensus case, when all input trees have the same leaf labels, we describe algorithms for constructing majority-rule and strict consensus trees using the TAG. When the input trees do not have identical sets of leaf labels, we describe how to determine if the input trees are compatible and, if they are compatible, to construct a supertree that contains the input trees.
  • ABSTRACT: Primers from a recently published study that identified a set of low-copy nuclear genes (LCNG) in multiple angiosperms were used to obtain sequence data from three LCNG (Chlp, Agt1, and Hmgs) for phylogenetic inference at the species level. The phylogenetic utility of each of these markers was compared to ITS and seven chloroplast loci (trnL, trnG-S, ycf5, accD, rpoC1, trnK intron, psbM-trnD intergenic spacer) widely used in phylogenetic analyses. Here we use Valerianaceae as an example for two reasons: 1) the group has a well-supported “backbone” phylogeny based on numerous molecular markers; and 2) there are several species groups (e.g. the South American taxa) that have been particularly difficult to resolve, potentially due to a rapid or recent radiation. Although these new markers added nucleotide characters, they did not provide significant phylogenetic information to resolve relationships among closely related species of Valerianaceae. Likewise, relationships among some of the major clades of Valerianaceae are still not resolved with much certainty. This study indicates that several of these nuclear markers provide a significant increase in phylogenetic signal compared to traditional chloroplast markers, but information content within these regions is most useful at broader-scale phylogenetic levels. Advances in “next-generation” sequencing technologies may ultimately make their utility obsolete.
    Systematic Botany 02/2015; 40(1). DOI:10.1600/036364415X686611 · 1.11 Impact Factor
  • ABSTRACT: Despite considerable progress in unravelling the phylogenetic relationships of microhylid frogs, relationships among subfamilies remain largely unstable and many genera are not demonstrably monophyletic. Here, we used five alternative combinations of DNA sequence data (ranging from seven loci for 48 taxa to up to 73 loci for as many as 142 taxa) generated using the anchored phylogenomics sequencing method (66 loci, derived from conserved genome regions, for 48 taxa) and Sanger sequencing (seven loci for up to 142 taxa) to tackle this problem. We assess the effects of character sampling, taxon sampling, analytical methods and assumptions in phylogenetic inference of microhylid frogs. The phylogeny of microhylids shows high susceptibility to different analytical methods and datasets used for the analyses. Clades inferred from maximum-likelihood are generally more stable across datasets than those inferred from parsimony. Parsimony trees inferred within a tree-alignment framework are generally better resolved and better supported than those inferred within a similarity-alignment framework, even under the same cost matrix (equally weighted) and same treatment of gaps (as a fifth nucleotide state). We discuss potential causes for these differences in resolution and clade stability among discovery operations. We also highlight the problem that commonly used algorithms for model-based analyses do not explicitly model insertion and deletion events (i.e. gaps are treated as missing data). Our results corroborate the monophyly of Microhylidae and most currently recognized subfamilies but fail to provide support for relationships among subfamilies. Several taxonomic updates are provided, including naming of two new subfamilies, both monotypic.
    Cladistics 03/2015; DOI:10.1111/cla.12118 · 6.09 Impact Factor

