The PhyLoTA Browser: Processing GenBank for Molecular Phylogenetics Research

Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ 85721, USA.
Systematic Biology (Impact Factor: 14.39). 07/2008; 57(3):335-46. DOI: 10.1080/10635150802158688
Source: PubMed


As an archive of sequence data for over 165,000 species, GenBank is an indispensable resource for phylogenetic inference. Here we describe an informatics processing pipeline and online database, the PhyLoTA Browser, which offers a view of GenBank tailored for molecular phylogenetics. The first release of the Browser is computed from 2.6 million sequences representing the taxonomically enriched subset of GenBank sequences for eukaryotes (excluding most genome survey sequences, ESTs, and other high-throughput data). In addition to summarizing sequence diversity and species diversity across nodes in the NCBI taxonomy, it reports 87,000 potentially phylogenetically informative clusters of homologous sequences, which can be viewed or downloaded, along with provisional alignments and coarse phylogenetic trees. At each node in the NCBI hierarchy, the user can display a "data availability matrix" of all available sequences for entries in a subtaxa-by-clusters matrix. This matrix provides a guidepost for subsequent assembly of multigene data sets or supertrees. The database allows for comparison of results from previous GenBank releases, highlighting recent additions of either sequences or taxa to GenBank and letting investigators track progress on data availability worldwide. Although the reported alignments and trees are extremely approximate, the database reports several statistics correlated with alignment quality to help users choose from alternative data sources.
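
The "data availability matrix" described in the abstract can be pictured as a table with one row per subtaxon and one column per sequence cluster. The Python sketch below is only an illustration of that idea and is not the PhyLoTA pipeline itself: the function name `availability_matrix`, the record layout, and the toy taxon and cluster names are all assumptions made for the example.

```python
from collections import defaultdict


def availability_matrix(records):
    """Count sequences for each (subtaxon, cluster) pair.

    records: iterable of (subtaxon, cluster_id) tuples, one per sequence.
    Returns (taxa, clusters, matrix), where matrix[i][j] is the number of
    sequences available for taxa[i] in clusters[j].
    """
    counts = defaultdict(int)
    for taxon, cluster in records:
        counts[(taxon, cluster)] += 1

    # Sorted labels give a stable row and column order.
    taxa = sorted({taxon for taxon, _ in counts})
    clusters = sorted({cluster for _, cluster in counts})
    matrix = [[counts.get((t, c), 0) for c in clusters] for t in taxa]
    return taxa, clusters, matrix


# Hypothetical toy input: one tuple per sequence record.
records = [
    ("Taxon A", "cluster_1"),
    ("Taxon A", "cluster_1"),
    ("Taxon A", "cluster_2"),
    ("Taxon B", "cluster_2"),
    ("Taxon C", "cluster_1"),
]

taxa, clusters, matrix = availability_matrix(records)
for taxon, row in zip(taxa, matrix):
    print(taxon, row)
```

Rows with many non-zero cells point to well-sampled subtaxa, and columns with many non-zero cells point to clusters that are candidates for a multigene data set.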

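    • "gov/genbank/) shows that existing genetic data already cover about 1,090 genera of woody plants in China (93% of all 1,175 genera). The format of GenBank data is shown in Appendix 1. Finding and downloading the required data accurately and quickly from such a massive volume is the basis for using these data efficiently, and is also one of the obstacles that currently keep these genetic data from being widely used (Sanderson et al., 2008; Jones et al., 2011 "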
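    ABSTRACT: Nucleotide sequences are the carriers of an organism's genetic information and are fundamental data for modern biology and ecology. With advances in sequencing technology, large numbers of nucleotide sequences have been extracted and stored in public data platforms, of which GenBank is currently one of the largest. As of February 2015, the platform held more than 180 million nucleotide sequences covering more than 300,000 species worldwide. However, finding and downloading the required data accurately and quickly from such a massive volume has become one of the obstacles limiting the wide use of genetic data. To address this, we developed NCBIminer, a bioinformatics program that downloads GenBank data efficiently and accurately. Given a user-supplied nucleotide sequence name, data type, and one or more initial reference sequences, NCBIminer finds and downloads sequence data for a specific gene across multiple user-specified species or taxa. The software can be downloaded and used free of charge on Windows, Linux, and Mac operating systems; it is also simple to operate, and users need no bioinformatics background. To make the software easier to use, this article describes its workflow and algorithm, its installation, and the parameter settings involved in its use.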
    Full-text · Article · Jul 2015 · Biodiversity Science
    • "We mined GenBank for all available bat mitochondrial and nuclear sequences, utilizing scripts that automated sequence identification, cleaning, and alignment, and generally followed the supermatrix approaches of Hinchcliff and Roalson (2013) and Zanne et al. (2014). We first used the PhyLoTa Browser (Sanderson et al. 2008) to identify all loci sequenced for at least 20 unique bat species within GenBank release 194. We then downloaded the entire national center for biotechnology information (NCBI) SQLite3 database of Chiroptera nucleotide data using the program PHLAWD (Smith et al. 2009). "
    ABSTRACT: Species richness varies widely across extant clades, but the causes of this variation remain poorly understood. We investigate the role of diversification rate heterogeneity in shaping patterns of diversity across families of extant bats. To provide a robust framework for macroevolutionary inference, we assemble a time-calibrated, species-level phylogeny using a supermatrix of mitochondrial and nuclear sequence data. We analyze the phylogeny using a Bayesian method for modeling complex evolutionary dynamics. Surprisingly, we find that variation in family richness can largely be explained without invoking heterogeneous diversification dynamics. We document only a single well-supported shift in diversification dynamics across bats, occurring at the base of the subfamily Stenodermatinae. Bat diversity is phylogenetically imbalanced, but - contrary to previous hypotheses - this pattern is unexplained by any simple patterns of diversification rate heterogeneity. This discordance may indicate that diversification dynamics are more complex than can be captured using the statistical tools available for modeling data at this scale. We infer that bats as a whole are almost entirely united into one macroevolutionary cohort, with decelerating speciation through time. There is also a significant relationship between clade age and richness, suggesting that global bat diversity may still be expanding. This article is protected by copyright. All rights reserved.
    No preview · Article · May 2015 · Evolution
    • "Our dataset was assembled using the PhyLoTA browser release 1.5 (Sanderson et al., 2008). A total of five mitochondrial (cytochrome b – cytb, cytochrome oxidase I – coxI, NADH dehydrogenase subunit 2 – nd2, and ribosomal 12S and 16S) and four nuclear (myoglobin exons 2 and 3 – myo, ornithine decarboxylase exons 6 through 8 – odc, and the recombination-activating protein genes 1 and 2 – rag1, rag2) genes were selected. "
    ABSTRACT: In this study, we present a detailed family-level phylogenetic hypothesis for the largest avian order (Aves: Passeriformes) and an unmatched multi-calibrated, relaxed clock inference for the diversification of crown passerines. Extended taxon sampling allowed the recovery of many challenging clades and elucidated their position in the tree. Acanthisittia appear to have diverged from all other passerines at the early Paleogene, which is considerably later than previously suggested. Thus, Passeriformes may be younger and represent an even more intense adaptive radiation compared to the remaining avian orders. Based on our divergence time estimates, a novel hypothesis for the diversification of modern Suboscines is proposed. According to this hypothesis, the first split between New and Old World lineages would be related to the severing of the Africa-South America biotic connection during the mid-late Eocene, implying an African origin for modern Eurylaimides. The monophyletic status of groups not recovered by any subsequent study since their circumscription, viz. Sylvioidea including Paridae, Remizidae, Hyliotidae, and Stenostiridae; and Muscicapoidea including the waxwing assemblage (Bombycilloidea) were notable topological findings. We also propose possible ecological interactions that may have shaped the distinct Oscine distribution patterns in the New World. The insectivorous endemic Oscines of the Americas, Vireonidae (Corvoidea), Mimidae, and Troglodytidae (Muscicapoidea), probably interfered with autochthonous Suboscines through direct competition. Thus, the Early Miocene arrival of these lineages before any other Oscines may have occupied the few available niches left by Tyrannides, constraining the diversification of insectivorous Oscines that arrived in the Americas later. The predominantly frugivorous-nectarivorous members of Passeroidea, which account for most of the diversity of New World-endemic Oscines, may not have been subjected to competition with Tyrannides. In fact, the vast availability of frugivory niches combined with weak competition with the autochthonous passerine fauna may have been crucial for passeroids to thrive in the New World. Copyright © 2015 Elsevier Inc. All rights reserved.
    Full-text · Article · Mar 2015 · Molecular Phylogenetics and Evolution