Article

Embedding Operational Taxonomic Units in Three-Dimensional Space for Evolutionary Distance Relationship in Phylogenetic Analysis

Abstract

Phylogenetic analysis, which estimates the evolutionary relationships among operational taxonomic units, is an important problem in bioinformatics. The phylogenetic trees used in this analysis are weak at expressing the evolutionary distances among those units. This paper proposes a method for embedding the units in a three-dimensional space so that evolutionary distances can be expressed, and presents a tool that visualizes them in three-dimensional space alongside a two-dimensional tree view.
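
The abstract does not say how the embedding is computed. Purely as an illustration, classical multidimensional scaling (MDS) is one standard way to place items in three dimensions so that their Euclidean distances approximate a given distance matrix. The sketch below assumes a symmetric matrix of pairwise evolutionary distances that is close to Euclidean. The function name `classical_mds_3d` and its parameters are hypothetical, and this is not necessarily the method the paper proposes.

```python
import math
import random


def classical_mds_3d(dist, iters=500, seed=0):
    """Embed n items in 3-D from an n x n symmetric distance matrix.

    Illustrative classical MDS; assumes the three leading eigenvalues
    of the centered Gram matrix are positive (near-Euclidean input).
    """
    n = len(dist)
    sq = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in sq]
    grand_mean = sum(row_mean) / n
    # Double centering: b = -1/2 * J * D^2 * J (Gram matrix of centered points).
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]

    rng = random.Random(seed)
    coords = [[0.0] * 3 for _ in range(n)]
    for axis in range(3):
        # Power iteration converges to the dominant eigenvector of b.
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        norm_v = math.sqrt(sum(x * x for x in v))
        v = [x / norm_v for x in v]
        bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        eigval = sum(v[i] * bv[i] for i in range(n))  # Rayleigh quotient
        if eigval <= 1e-12:
            continue  # no usable positive eigenvalue; leave this axis at zero
        scale = math.sqrt(eigval)
        for i in range(n):
            coords[i][axis] = scale * v[i]
            # Deflate so the next pass finds the next eigenpair.
            for j in range(n):
                b[i][j] -= eigval * v[i] * v[j]
    return coords
```

When the input distances come from points that really lie in three dimensions, the returned coordinates reproduce those distances up to rotation and reflection. For tree-derived distances the embedding is in general only an approximation.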

... The most popular methods of data visualization, such as principal component analysis [3], map objects from a multi-dimensional space into a two- or three-dimensional space. However, using these methods to display the evolutionary process [4] does not make it possible to observe the presence of a hypercavity. At the same time, a hypercavity can be seen in some cross-section of the evolutionary space. ...
Conference Paper
This paper proposes a definition and a solution of the problem of finding a hypercavity as a data-free hypersphere of maximum radius. The problem is formulated as a multiextremal problem under constraints, both in a linear feature space and in a linear space produced by a kernel function. In the proposed approach, just as in the one-class SVM, the center of the hypersphere is sought as a linear combination of a small number of so-called "support" objects. Experiments with simulated points in a 2-dimensional feature space and with symbolic sequences modeling a global evolutionary process have demonstrated the correctness of the obtained solution.
Article
Unlabelled: This paper examines a new technique for the visualization of and the interaction with trees, objects frequently used to convey hierarchical relationships in biological data. Motivated by the quality of 2D tree interaction, we adapt the planar tree-of-life metaphor to a virtual, semi-immersive 3D environment. A 3D environment extends the utility of this metaphor by allowing the user to view an entire data set in a single screen. Interrogation of the tree is implemented using 3D input devices. This real-time interrogation of the tree itself provides a quick means by which to qualitatively analyse the hierarchical data. In this paper, we describe the techniques underlying the implementation of such an environment. We conclude by considering the utility of tree metaphors as a basis for the representation of highly dimensional data sets. Availability: Arbor3D (source code, a binary executable for SGI IRIX 6.4, Perl parsers, and sample Newick data files) are available via the Internet (http://xian.tamu.edu/Arbor3D/). Arbor3D can be displayed in "CAVE simulator" mode on an SGI workstation screen, or as an interactive virtual environment on a projection workbench. Contact: druths@rice.edu; echen@cs.rice.edu; leland@xian.tamu.edu
Article
Structural comparison of large trees is a difficult task that is only partially supported by current visualization techniques, which are mainly designed for browsing. We present TreeJuxtaposer, a system designed to support the comparison task for large trees of several hundred thousand nodes. We introduce the idea of "guaranteed visibility", where highlighted areas are treated as landmarks that must remain visually apparent at all times. We propose a new methodology for detailed structural comparison between two trees and provide a new nearly-linear algorithm for computing the best corresponding node from one tree to another. In addition, we present a new rectilinear Focus+Context technique for navigation that is well suited to the dynamic linking of side-by-side views while guaranteeing landmark visibility and constant frame rates. These three contributions result in a system delivering a fluid exploration experience that scales both in the size of the dataset and the number of pixels in the display. We have based the design decisions for our system on the needs of a target audience of biologists who must understand the structural details of many phylogenetic, or evolutionary, trees. Our tool is also useful in many other application domains where tree comparison is needed, ranging from network management to call graph optimization to genealogy.
Article
The assumptions underlying the maximum parsimony (MP) method of phylogenetic tree reconstruction were intuitively examined by studying the way the method works. Computer simulations were performed to corroborate the intuitive examination. Parsimony appears to involve very stringent assumptions concerning the process of sequence evolution, such as constancy of substitution rates between nucleotides, constancy of rates across nucleotide sites, and equal branch lengths in the tree. For practical data analysis, the requirement of equal branch lengths means similar substitution rates among lineages (the existence of an approximate molecular clock), relatively long interior branches, and also few species in the data. However, a small amount of evolution is neither a necessary nor a sufficient requirement of the method. The difficulties involved in the application of current statistical estimation theory to tree reconstruction were discussed, and it was suggested that the approach proposed by Felsenstein (1981, J. Mol. Evol. 17: 368-376) for topology estimation, as well as its many variations and extensions, differs fundamentally from the maximum likelihood estimation of a conventional statistical parameter. Evidence was presented showing that the Felsenstein approach does not share the asymptotic efficiency of the maximum likelihood estimator of a statistical parameter. Computer simulations were performed to study the probability that MP recovers the true tree under a hierarchy of models of nucleotide substitution; its performance relative to the likelihood method was especially noted. The results appeared to support the intuitive examination of the assumptions underlying MP. When a simple model of nucleotide substitution was assumed to generate data, the probability that M...
Article
Common existing phylogenetic tree visualisation tools cannot display readable trees with more than a few thousand nodes. These existing methodologies work in two-dimensional space. We introduce the idea of visualising phylogenetic trees in three-dimensional hyperbolic space with the Walrus graph visualisation tool, and we have developed a tool that converts standard phylogenetic tree formats to Walrus' format. With Walrus, it becomes possible to visualise and navigate phylogenetic trees with more than 100,000 nodes on the desktop. This application is potentially useful for visualisation of the tree of life and for functional genomics derivatives, like The Adaptive Evolution Database (TAED).
Article
We explored the use of multidimensional scaling (MDS) of tree-to-tree pairwise distances to visualize the relationships among sets of phylogenetic trees. We found the technique to be useful for exploring “tree islands” (sets of topologically related trees among larger sets of near-optimal trees), for comparing sets of trees obtained from bootstrapping and Bayesian sampling, for comparing trees obtained from the analysis of several different genes, and for comparing multiple Bayesian analyses. The technique was also useful as a teaching aid for illustrating the progress of a Bayesian analysis and as an exploratory tool for examining large sets of phylogenetic trees. We also identified some limitations to the method, including distortions of the multidimensional tree space into two dimensions through the MDS technique, and the definition of the MDS-defined space based on a limited sample of trees. Nonetheless, the technique is a useful approach for the analysis of large sets of phylogenetic trees.
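
The abstract above does not name the tree-to-tree distance it uses. As a hedged illustration of what such a pairwise distance can look like, the sketch below computes a Robinson-Foulds style distance between two rooted trees on the same leaf set, counting the clades found in one tree but not in the other. The nested-tuple tree encoding and the function names `clades` and `rf_distance` are assumptions made for this example.

```python
def clades(tree):
    """Return the non-trivial clades of a rooted tree as frozensets of leaf names.

    A tree is a nested tuple whose leaves are strings,
    e.g. (("A", "B"), ("C", "D")).
    """
    found = set()

    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(leaves(child) for child in node))
        found.add(members)
        return members

    all_leaves = leaves(tree)
    # The root clade contains every leaf and is shared by all trees, so drop it.
    found.discard(all_leaves)
    return found


def rf_distance(tree_a, tree_b):
    """Count clades present in exactly one of the two trees."""
    return len(clades(tree_a) ^ clades(tree_b))
```

A matrix of such distances over a set of trees could then be passed to an MDS routine to obtain a low-dimensional map of the trees.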
Article
In the rising tide of business transaction data, these tools help distinguish which are strategic assets and which are not worth collecting in the first place.
Article
Motivation: The rapidly increasing amount and disparity of biological data requires interpretation at many levels of description. Human judgement and intuition are important because not all data can be automatically and comprehensively analyzed. Visualization of trees and substructures corresponding to certain features are often used to analyze phylogenies or taxonomies. Unfortunately, most existing tools do not cope with the size of current datasets, the required functionality, or both. Results: We introduce a program for visualization of huge trees and also for the interactive exploration of their content. We have developed a range of new schemes which are tailored for biological problems. Users can get an overview, zoom in, filter out data and retrieve details from standard databases such as SWISS-PROT. Furthermore, it is possible to analyze the relationship between chosen leaf sets that are specified by common features on a second level of representation. On a PC (with approximately 512 MB RAM), trees of up to several tens of thousands of leaves can be loaded and both rapidly and interactively explored. We demonstrate the use of this program for the analysis of the SYSTERS data set (which contains hierarchically clustered protein sequences) to which PFAM domains were added as features.
Article
The development of powerful visualisation tools is a major challenge in bioinformatics. Phylogenetics, a field with a growing impact on a variety of life science areas, is experiencing an increasing but poorly met requirement for software supporting the advanced visualisation of phylogenetic trees. Visualisation problems within the domain are commonly experienced by its researchers, but are poorly documented. Furthermore, the applications in the domain have not been reviewed from an information visualisation perspective. In this paper, the problems are defined and the methods employed by phylogenetic applications are reviewed with respect to related research within information visualisation. The results of a survey of the visualisation needs of phylogenetics researchers are also presented.
Article
We are designing tools to visualize very large sets of phylogenetic trees. Our tools give a three-dimensional representation of treespace, with two dimensions representing the clustering of trees under multidimensional scaling, and the third dimension (the "height") representing the score of the tree (i.e. parsimony or maximum likelihood score). The user can rotate the resulting distribution to get a sense of the three-dimensional structure. This is implemented as part of the Mesquite system for phylogenetic analysis.
Article
now consist of several hundred species, and presently available tree reconstruction methods are inadequate to the task of analyzing such datasets. For example, an rbcL DNA sequence data set of 500 plants has been analyzed for several years now, without solution. The explanation for why these analyses are so difficult is simple: the optimization problems are NP-hard, and the heuristics used in an attempt to solve these optimization problems use hill-climbing techniques to search through an exponentially large space of phylogenetic trees. Statistical approaches towards phylogeny reconstruction have modeled the evolutionary process stochastically, and have studied the performance of methods for recovering phylogenetic trees in terms of the accuracy of these methods on datasets of finite length sequences generated under different model trees. These studies have shown that some methods recover the true tree topology with high probability, once the sequences
M. Salemi, A.-M. Vandamme, The Phylogenetic Handbook: A Practical Approach to DNA and Protein Phylogeny, Cambridge, 2003.
S. F. Carrizo, Phylogenetic Trees: An Information Visualization Perspective. In Proc. of the 2nd Asia-Pacific Bioinformatics Conference (APBC2004), Dunedin, New Zealand, 2004.
I. Montealegre and K. St. John, Visualizing Restricted Landscapes of Phylogenetic Trees. In Proc. of the European Conference for Computational Biology (ECCB 03), Paris, Sept. 2003.
D. M. Hillis, T. A. Heath, K. St. John, Analysis and Visualization of Tree Space, Systematic Biology, 54(3), 2005, pp. 471-482.