PhyloFinder: an intelligent search engine for phylogenetic tree databases. BMC Evol Biol 8:90

Department of Computer Science, Iowa State University, Ames, IA 50011, USA.
BMC Evolutionary Biology (Impact Factor: 3.37). 02/2008; 8(1):90. DOI: 10.1186/1471-2148-8-90
Source: PubMed


Bioinformatic tools are needed to store and access the rapidly growing phylogenetic data. These tools should enable users to identify existing phylogenetic trees containing a specified taxon or set of taxa and to compare a specified phylogenetic hypothesis to existing phylogenetic trees.
PhyloFinder is an intelligent search engine for phylogenetic databases that we have implemented using trees from TreeBASE. It enables taxonomic queries, in which it identifies trees in the database containing the exact name of the query taxon and/or any synonymous taxon names, and it provides spelling suggestions for the query when there is no match. Additionally, PhyloFinder can identify trees containing descendants or direct ancestors of the query taxon. PhyloFinder also performs phylogenetic queries, in which it identifies trees that contain the query tree or topologies that are similar to the query tree.
PhyloFinder can enhance the utility of any tree database by providing tools for both taxonomic and phylogenetic queries as well as visualization tools that highlight the query results and provide links to NCBI and TBMap. An implementation of PhyloFinder using trees from TreeBASE is available from the web client application found in the availability and requirements section.
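The two query types described above can be illustrated with a minimal sketch. This is not PhyloFinder's implementation: the toy trees, the synonym table, the simplified taxonomy, and every function name below are hypothetical. Spelling suggestions use Python's standard `difflib`, and the phylogenetic query uses a plain cluster comparison on rooted trees as a stand-in for whatever matching PhyloFinder actually performs.

```python
import difflib

# Hypothetical toy database: rooted trees as nested tuples of leaf names.
TREES = {
    "T1": ((("Homo sapiens", "Pan troglodytes"), "Gorilla gorilla"), "Mus musculus"),
    "T2": ("Pan troglodytes", "Gorilla gorilla"),
    "T3": ("Mus musculus", "Rattus norvegicus"),
}
# Hypothetical synonym table (lowercase alternative name -> name used in trees).
SYNONYMS = {"human": "Homo sapiens", "house mouse": "Mus musculus"}
# Heavily simplified child -> parent taxonomy, for illustration only.
PARENT = {
    "Homo sapiens": "Homo", "Homo": "Hominidae",
    "Pan troglodytes": "Pan", "Pan": "Hominidae",
    "Gorilla gorilla": "Gorilla", "Gorilla": "Hominidae",
    "Mus musculus": "Mus", "Mus": "Muridae",
    "Rattus norvegicus": "Rattus", "Rattus": "Muridae",
}

def leaf_set(tree):
    """Return the set of leaf names of a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))

def is_descendant(taxon, ancestor):
    """Walk up the toy taxonomy from taxon, looking for ancestor."""
    while taxon in PARENT:
        taxon = PARENT[taxon]
        if taxon == ancestor:
            return True
    return False

def taxonomic_query(name, include_descendants=False):
    """Ids of trees containing the taxon, a synonym, or optionally a descendant."""
    target = SYNONYMS.get(name.lower(), name)
    hits = []
    for tree_id in sorted(TREES):
        leaves = leaf_set(TREES[tree_id])
        if target in leaves or (
            include_descendants and any(is_descendant(leaf, target) for leaf in leaves)
        ):
            hits.append(tree_id)
    return hits

def suggest(name, n=3):
    """Spelling suggestions drawn from all names known to the toy taxonomy."""
    known = sorted(set(PARENT) | set(PARENT.values()))
    return difflib.get_close_matches(name, known, n=n, cutoff=0.8)

def clusters(tree, keep):
    """Clusters (size > 1) of a rooted tree after restricting it to the taxa in keep."""
    found = set()
    def walk(node):
        if isinstance(node, str):
            return frozenset([node]) & keep
        below = frozenset().union(*(walk(child) for child in node))
        if len(below) > 1:
            found.add(below)
        return below
    walk(tree)
    return found

def phylogenetic_query(query_tree):
    """Ids of trees whose restriction to the query taxa has every query cluster."""
    taxa = leaf_set(query_tree)
    wanted = clusters(query_tree, taxa)
    return [
        tree_id for tree_id in sorted(TREES)
        if taxa <= leaf_set(TREES[tree_id]) and wanted <= clusters(TREES[tree_id], taxa)
    ]
```

Under these assumptions, `taxonomic_query("human")` resolves the synonym before matching, `taxonomic_query("Hominidae", include_descendants=True)` finds trees through the taxonomy, and `phylogenetic_query((("Homo sapiens", "Pan troglodytes"), "Mus musculus"))` matches a tree even when it contains additional taxa.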



Available from: David F. Fernández-Baca
  • ABSTRACT: Biologists are often interested in querying published phylogenetic data for research purposes. PhyQL, a web-based visual phylogenetic query engine, can be quite useful in this regard. In PhyQL, we have implemented a data model and a visual query language to interact with hierarchically classified tree elements. To hide textual query submission, PhyQL provides a design interface to build the query visually. Users can build simple to complex queries using the query operators. PhyQL separates the application layer from the data layer by a logic layer, leading to reduced development time for query tools. Moreover, PhyQL provides interactive tree views in radial, phylogram and dendrogram layouts. It can be accessed online at
    No preview · Conference Paper · Jan 2008
  • ABSTRACT: As an archive of sequence data for over 165,000 species, GenBank is an indispensable resource for phylogenetic inference. Here we describe an informatics processing pipeline and online database, the PhyLoTA Browser, which offers a view of GenBank tailored for molecular phylogenetics. The first release of the Browser is computed from 2.6 million sequences representing the taxonomically enriched subset of GenBank sequences for eukaryotes (excluding most genome survey sequences, ESTs, and other high-throughput data). In addition to summarizing sequence diversity and species diversity across nodes in the NCBI taxonomy, it reports 87,000 potentially phylogenetically informative clusters of homologous sequences, which can be viewed or downloaded, along with provisional alignments and coarse phylogenetic trees. At each node in the NCBI hierarchy, the user can display a "data availability matrix" of all available sequences for entries in a subtaxa-by-clusters matrix. This matrix provides a guidepost for subsequent assembly of multigene data sets or supertrees. The database allows for comparison of results from previous GenBank releases, highlighting recent additions of either sequences or taxa to GenBank and letting investigators track progress on data availability worldwide. Although the reported alignments and trees are extremely approximate, the database reports several statistics correlated with alignment quality to help users choose from alternative data sources.
    Full-text · Article · Jul 2008 · Systematic Biology
  • ABSTRACT: TreeBASE, the only data repository for phylogenetic studies, is not being used effectively since it does not meet the taxonomic data retrieval requirements of the systematics community. We show, through an examination of the queries performed on TreeBASE, that data retrieval using taxon names is unsatisfactory. We report on a new wrapper supporting taxon queries on TreeBASE by utilising a Taxonomy and Classification Database (TCl-Db) we created. TCl-Db holds merged and consolidated taxonomic names from multiple data sources and can be used to translate hierarchical, vernacular and synonym queries into specific query terms in TreeBASE. The query expansion supported by TCl-Db substantially improves information retrieval quality. The wrapper can be accessed online. The methodology we developed is scalable and can be applied to new data as they become available. Significantly improved data retrieval quality is shown for all queries, and additional flexibility is achieved via user-driven taxonomy selection.
    Full-text · Article · Feb 2009 · BMC Evolutionary Biology