Anchored Hybrid Enrichment for Massively High-Throughput Phylogenomics

Department of Scientific Computing, Florida State University, Dirac Science Library, Tallahassee, FL 32306-4102, USA.
Systematic Biology (Impact Factor: 14.39). 05/2012; 61(5):727-44. DOI: 10.1093/sysbio/sys049
Source: PubMed


The field of phylogenetics is on the cusp of a major revolution, enabled by new methods of data collection that leverage both genomic resources and recent advances in DNA sequencing. Previous phylogenetic work has required labor-intensive marker development coupled with single-locus polymerase chain reaction and DNA sequencing on a clade-by-clade and locus-by-locus basis. Here, we present a new, cost-efficient, and rapid approach to obtaining data from hundreds of loci for potentially hundreds of individuals for deep and shallow phylogenetic studies. Specifically, we designed probes for target enrichment of >500 loci in highly conserved anchor regions of vertebrate genomes (flanked by less conserved regions) from five model species and tested enrichment efficiency in nonmodel species up to 508 million years divergent from the nearest model. We found that hybrid enrichment using conserved probes (anchored enrichment) can recover a large number of unlinked loci that are useful at a diversity of phylogenetic timescales. This new approach has the potential not only to expedite resolution of deep-scale portions of the Tree of Life but also to greatly accelerate resolution of the large number of shallow clades that remain unresolved. The combination of low cost (~1% of the cost of traditional Sanger sequencing and ~3.5% of the cost of high-throughput amplicon sequencing for projects on the scale of 500 loci × 100 individuals) and rapid data collection (~2 weeks of laboratory time) is expected to make this approach tractable even for researchers working on systems with limited or nonexistent genomic resources.
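The abstract states its cost figures only as relative percentages. Those two ratios can still be combined with simple arithmetic. The following is a minimal sketch that takes both approximate percentages at face value and derives the cost of amplicon sequencing relative to Sanger sequencing that they imply. The constant and function names are hypothetical, and no absolute prices are used, since the abstract gives none.

```python
# Arithmetic implied by the abstract's approximate cost ratios for a
# 500-locus x 100-individual project. All values are fractions, not prices;
# the inputs are the abstract's "~" estimates taken at face value.
# Names here are illustrative, not from the paper.

LOCI = 500
INDIVIDUALS = 100

AHE_VS_SANGER = 0.01      # anchored enrichment ~1% of traditional Sanger cost
AHE_VS_AMPLICON = 0.035   # anchored enrichment ~3.5% of high-throughput amplicon cost


def implied_amplicon_vs_sanger(ahe_vs_sanger: float, ahe_vs_amplicon: float) -> float:
    """Return amplicon-sequencing cost as a fraction of Sanger cost.

    If AHE = a * Sanger and AHE = b * Amplicon, then Amplicon = (a / b) * Sanger.
    """
    return ahe_vs_sanger / ahe_vs_amplicon


if __name__ == "__main__":
    cells = LOCI * INDIVIDUALS  # locus-by-individual combinations in the data matrix
    ratio = implied_amplicon_vs_sanger(AHE_VS_SANGER, AHE_VS_AMPLICON)
    print(f"{cells} locus-individual combinations")
    print(f"implied amplicon cost: {ratio:.1%} of Sanger cost")
    print(f"anchored enrichment is ~{1 / AHE_VS_SANGER:.0f}x cheaper than Sanger")
    print(f"anchored enrichment is ~{1 / AHE_VS_AMPLICON:.1f}x cheaper than amplicon sequencing")
```

Because both inputs are rounded estimates, the derived figure of roughly 29% of the Sanger cost for amplicon sequencing is only as precise as the abstract's own numbers.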

  • Source
    • "However, the ability to capture loci across relatively deep phylogenetic scales has remained challenging because of the inverse relationship between capture efficiency and the evolutionary distance from the individual(s) used to design the probes (Bi et al., 2012; Lemmon, Emme & Lemmon, 2012; Peñalba et al., 2014; Weitemier et al., 2014). For very deep divergences in animals, to understand amniote evolution or deep divergences in vertebrate evolution for example, ultra-conserved elements (Faircloth et al., 2012) and anchored hybrid enrichment (Lemmon, Emme & Lemmon, 2012) have been used to target conserved loci that are flanked by less conserved regions. However, these regions were developed using animal genomes and are unsuitable for use in plants (Reneker et al., 2012)."
    ABSTRACT: The Zingiberales are an iconic order of monocotyledonous plants comprising eight families with distinctive and diverse floral morphologies and representing an important ecological element of tropical and subtropical forests. While the eight families have been demonstrated to be monophyletic, phylogenetic relationships among these families remain unresolved. Neither combined morphological and molecular studies nor recent attempts to resolve family relationships using sequence data from whole plastomes have resulted in a well-supported, family-level phylogenetic hypothesis of relationships. Here we approach this challenge by leveraging the complete genome of one member of the order, Musa acuminata, together with transcriptome information from each of the other seven families to design a set of nuclear loci that can be enriched from highly divergent taxa with a single array-based capture of indexed genomic DNA. A total of 494 exons from 418 nuclear genes were captured for 53 ingroup taxa. The entire plastid genome was also captured for the same 53 taxa. Of the total genes captured, 308 nuclear and 68 plastid genes were used for phylogenetic estimation. The concatenated plastid and nuclear dataset supports the position of Musaceae as sister to the remaining seven families. Moreover, the combined dataset recovers known intra- and inter-family phylogenetic relationships with generally high bootstrap support. This is a flexible and cost-effective method that gives the broader plant biology community a tool for generating phylogenomic-scale sequence data in non-model systems at varying evolutionary depths.
    Full-text · Article · Jan 2016 · PeerJ
  • Source
    • "Target capture of conserved genomic regions (Faircloth et al. 2012; Lemmon et al. 2012) combined with massively parallel sequencing produce data matrices containing thousands of unlinked loci distributed across the genome suitable for phylogenetic inference. These markers can be generated efficiently, cost-effectively, and are useful across deeper evolutionary scales than restriction enzyme based reduced-representation libraries (Rubin et al. 2012)."
    ABSTRACT: Production of massive DNA sequence datasets is transforming phylogenetic inference, but best practices for analyzing such datasets are not well established. One uncertainty is robustness to missing data, particularly in coalescent frameworks. To understand the effects of increasing matrix size and loci at the cost of increasing missing data, we produced a 90-taxon, 2.2-megabase, 4,800-locus sequence matrix of landfowl using target capture of ultraconserved elements. We then compared phylogenies estimated with concatenated maximum likelihood, quartet-based methods executed on concatenated matrices, and gene tree reconciliation methods across five thresholds of missing data. Results of maximum likelihood and quartet analyses were similar, well-resolved, and demonstrated increasing support with increasing matrix size and sparseness. Conversely, gene tree reconciliation produced unexpected relationships when we included all informative loci, with certain taxa placed towards the root compared to other approaches. Inspection of these taxa identified a prevalence of short average contigs, which potentially biased gene tree inference and caused erroneous results in gene tree reconciliation. This suggests that the more problematic missing data in gene-tree-based analyses are partial sequences rather than entire missing sequences from locus alignments. Limiting gene tree reconciliation to the most informative loci solved this problem, producing well-supported topologies congruent with concatenation and quartet methods. Collectively, our analyses provide a well-resolved phylogeny of landfowl, including strong support for previously problematic relationships such as those among junglefowl (Gallus), and clarify the position of two enigmatic galliform genera (Lerwa, Melanoperdix) not sampled in previous molecular phylogenetic studies.
    Full-text · Article · Dec 2015 · Molecular Biology and Evolution
  • Source
    • "Recent advances in DNA sequencing technologies provide great opportunities for using genome-scale data to reconstruct phylogenetic history (Rokas and Abbot 2009; Hittinger et al. 2010; Faircloth et al. 2012; Lemmon et al. 2012). However, recent phylogenomic studies in diverse taxonomic groups, including plants (Zhong et al. 2013; Wickett et al. 2014), fungi (Hess and Goldman 2011; Salichos and Rokas 2013), and animals (Song et al. 2012; Jarvis et al. 2014), have shown that a large number of individual gene trees are topologically incongruent with each other."
    ABSTRACT: Comparison of individual gene trees in several recent phylogenomic studies from diverse lineages has revealed a surprising amount of topological conflict or incongruence, but we still know relatively little about its distribution across the tree of life. To further our understanding of incongruence, the factors that contribute to it and how it can be ameliorated, we examined its distribution in a clade of 20 Culicidae mosquito species through the reconstruction and analysis of the phylogenetic histories of 2,007 groups of orthologous genes. Levels of incongruence were generally low, with three exceptions: the internodes concerned with the branching of Anopheles christyi and of the subgenus Anopheles, and the previously reported incongruence within the Anopheles gambiae species complex. Two of these incongruence events (An. gambiae species complex and An. christyi) are likely due to biological factors, whereas the third (subgenus Anopheles) is likely due to analytical factors. Similar to previous studies, the use of genes or internodes with high bootstrap support or internode certainty values, both of which were positively correlated with gene alignment length, substantially reduced the observed incongruence. However, the clade support values of the internodes concerned with the branching of the subgenus Anopheles as well as within the An. gambiae species complex remained very low. Based on these results, we infer that the prevalence of incongruence in Culicidae mosquitoes is generally low, that it likely stems from both analytical and biological factors, and that it can be ameliorated through the selection of genes with strong phylogenetic signal. More generally, selection of genes with strong phylogenetic signal may be a general empirical solution for reducing incongruence and increasing the robustness of inference in phylogenomic studies.
    Full-text · Article · Nov 2015 · Genome Biology and Evolution