Anchored Hybrid Enrichment for Massively High-Throughput Phylogenomics

Department of Scientific Computing, Florida State University, Dirac Science Library, Tallahassee, FL 32306-4102, USA.
Systematic Biology (Impact Factor: 14.39). 05/2012; 61(5):727-44. DOI: 10.1093/sysbio/sys049
Source: PubMed


The field of phylogenetics is on the cusp of a major revolution, enabled by new methods of data collection that leverage both genomic resources and recent advances in DNA sequencing. Previous phylogenetic work has required labor-intensive marker development coupled with single-locus polymerase chain reaction and DNA sequencing on a clade-by-clade and locus-by-locus basis. Here, we present a new, cost-efficient, and rapid approach to obtaining data from hundreds of loci for potentially hundreds of individuals for deep and shallow phylogenetic studies. Specifically, we designed probes for target enrichment of >500 loci in highly conserved anchor regions of vertebrate genomes (flanked by less conserved regions) from five model species and tested enrichment efficiency in nonmodel species up to 508 million years divergent from the nearest model. We found that hybrid enrichment using conserved probes (anchored enrichment) can recover a large number of unlinked loci that are useful at a diversity of phylogenetic timescales. This new approach has the potential not only to expedite resolution of deep-scale portions of the Tree of Life but also to greatly accelerate resolution of the large number of shallow clades that remain unresolved. The combination of low cost (~1% of the cost of traditional Sanger sequencing and ~3.5% of the cost of high-throughput amplicon sequencing for projects on the scale of 500 loci × 100 individuals) and rapid data collection (~2 weeks of laboratory time) is expected to make this approach tractable even for researchers working on systems with limited or nonexistent genomic resources.

    • "Nowadays, enrichment techniques are no longer restricted to mitochondrial genes, but rather expanded to numerous other loci of interest for their implementation in phylogenetic analyses (e.g. Lemmon et al. 2012; Peñalba et al. 2014). In contrast, bait sequences ("bioinformatics baits") serve to identify complete mitochondrial genomes in a mixed pool of untagged samples that were sequenced and assembled together (e.g. "
    ABSTRACT: Glyceridae (Annelida) are a group of venomous annelids distributed worldwide from intertidal to abyssal depths. To trace the evolutionary history and complexity of glycerid venom cocktails, a solid backbone phylogeny of this group is essential. We therefore aimed to reconstruct the phylogenetic relationships of these annelids using Illumina sequencing technology. We constructed whole genome shotgun libraries for 19 glycerid specimens and one outgroup species (Glycinde armigera). The chosen target genes comprise 13 mitochondrial protein-coding genes, two mitochondrial ribosomal genes and four nuclear loci (18S rRNA, 28S rRNA, ITS1, ITS2). Based on partitioned maximum likelihood and Bayesian analyses of the resulting supermatrix, we were able to resolve a robust glycerid phylogeny and identified three clades comprising the majority of taxa. Furthermore, we detected group II introns inside the cox1 gene of two analysed glycerid specimens, with two different insertions in one of these species. Moreover, we generated reduced datasets comprising 10 million, 4 million and 1 million reads from the original datasets to test the influence of sequencing depth on assembling complete mitochondrial genomes from low-coverage genome data. We estimated the coverage of mitochondrial genome sequences in each dataset size by mapping the filtered Illumina reads against the respective mitochondrial contigs. Comparing the contig coverage calculated for all dataset sizes gave us an indication of the scalability of our genome skimming approach. This allows a more precise estimate of the minimum number of reads needed to reconstruct complete mitochondrial genomes in Glyceridae and probably in non-model organisms in general.
    Genome Biology and Evolution 11/2015; DOI:10.1093/gbe/evv224 · 4.23 Impact Factor
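The coverage comparison described in the abstract above can be illustrated with a small calculation. The following is a minimal sketch, not the authors' pipeline: it assumes mean coverage is simply mapped bases divided by contig length, and that coverage scales linearly with the number of reads under random subsampling. The function names and all input numbers are hypothetical and chosen only for illustration.

```python
import math


def mean_coverage(mapped_reads: int, read_length: int, contig_length: int) -> float:
    """Mean per-base coverage: total mapped bases divided by contig length."""
    if contig_length <= 0:
        raise ValueError("contig_length must be positive")
    return mapped_reads * read_length / contig_length


def reads_needed(target_coverage: float, observed_coverage: float, total_reads: int) -> int:
    """Total reads needed to reach target_coverage, assuming coverage
    scales linearly with sequencing depth (an assumption of this sketch)."""
    if observed_coverage <= 0:
        raise ValueError("observed_coverage must be positive")
    return math.ceil(target_coverage / observed_coverage * total_reads)


# Hypothetical values, NOT taken from the study: 3,000 reads of 100 bp map
# to a 15,000 bp mitochondrial contig in a dataset of 10 million reads.
coverage = mean_coverage(mapped_reads=3_000, read_length=100, contig_length=15_000)
minimum_reads = reads_needed(target_coverage=10.0, observed_coverage=coverage,
                             total_reads=10_000_000)
print(coverage, minimum_reads)
```

In practice the mapped read counts would come from an alignment of each subsampled dataset, and the linear-scaling assumption should be checked against the observed coverage at each dataset size.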

    • "Our method represents the first effort to simultaneously capture different mitochondrial DNA from mass samples to recover taxonomic composition. Although previous work has demonstrated some success in cross-taxa sequence capture (Lemmon et al. 2012; Li et al. 2013), it is crucial to understand whether and how the presence of multiple divergent species affects capture success. Our results suggest that probes based on references from a "
    ABSTRACT: Biodiversity analyses based on Next Generation Sequencing (NGS) platforms have developed by leaps and bounds in recent years. A PCR-free strategy, which can alleviate taxonomic bias, has been considered a promising approach for delivering reliable species compositions of targeted environments. The major impediment to such a method is the lack of appropriate mitochondrial DNA enrichment methods. Because mitochondrial genomes (mitogenomes) make up only a small proportion of total DNA, PCR-free methods will inevitably result in a huge excess of data (>99%). Furthermore, the massive volume of sequence data is highly demanding on computing resources. Here, we present a mitogenome enrichment pipeline via a gene capture chip that was designed by virtue of the mitogenome sequences of the 1000 Insect Transcriptome Evolution project (1KITE). A mock sample containing 49 species was used to evaluate the efficiency of the mitogenome capture method. We demonstrate that the proportion of mitochondrial DNA can be increased by ca. 100-fold (from the original 0.47% to 42.52%). Variation in phylogenetic distances of target taxa to the probe set could in principle result in bias in abundance. However, the frequencies of input taxa were largely maintained after capture (R² = 0.81). We suggest that our mitogenome capture approach coupled with PCR-free shotgun sequencing could provide ecological researchers with an efficient NGS method to deliver reliable biodiversity assessment.
    Molecular Ecology Resources 10/2015; DOI:10.1111/1755-0998.12472 · 3.71 Impact Factor
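The enrichment figure quoted in the abstract above can be checked with simple arithmetic. This is a minimal sketch that takes fold enrichment to be the ratio of the mitochondrial read fraction after capture to the fraction before capture; the function name is hypothetical, and the two percentages are those reported in the abstract.

```python
def fold_enrichment(before_pct: float, after_pct: float) -> float:
    """Ratio of the target fraction after capture to the fraction before."""
    if before_pct <= 0:
        raise ValueError("before_pct must be positive")
    return after_pct / before_pct


# Percentages of mitochondrial DNA reported in the abstract.
mito_before_pct = 0.47
mito_after_pct = 42.52

fold = fold_enrichment(mito_before_pct, mito_after_pct)
# Share of non-mitochondrial data without enrichment.
off_target_before_pct = 100.0 - mito_before_pct

print(f"fold enrichment: {fold:.1f}x")
print(f"non-mitochondrial data before capture: {off_target_before_pct:.2f}%")
```

The ratio comes out at roughly 90, in line with the "ca. 100-fold" stated in the abstract, and the non-mitochondrial share before capture matches the ">99%" excess data mentioned there.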