Fast calculation of the quartet distance between trees of arbitrary degrees

Department of Computer Science, University of Aarhus, Aabogade 34, DK-8200 Århus N, Denmark.
Algorithms for Molecular Biology (Impact Factor: 1.86). 02/2006; 1:16. DOI: 10.1186/1748-7188-1-16
Source: DBLP

ABSTRACT A number of algorithms have been developed for calculating the quartet distance between two evolutionary trees on the same set of species. The quartet distance is the number of quartets (sub-trees induced by four leaves) that differ between the trees. Most of these algorithms are restricted to binary trees, but recently we have developed algorithms that work on trees of arbitrary degree.
We present a fast algorithm for computing the quartet distance between trees of arbitrary degree. Given input trees $T$ and $T'$, the algorithm runs in time $O(n + |V| \cdot |V'| \min\{id, id'\})$ and space $O(n + |V| \cdot |V'|)$, where $n$ is the number of leaves in the two trees, $V$ and $V'$ are the sets of non-leaf nodes in $T$ and $T'$, respectively, and $id$ and $id'$ are the maximal number of non-leaf nodes adjacent to a non-leaf node in $T$ and $T'$, respectively. The fastest algorithms previously published for arbitrary degree trees run in $O(n^3)$ (independent of the degree of the tree) and $O(|V| \cdot |V'| \, id^2 \, id'^2)$, respectively. We experimentally compare the algorithm with existing algorithms for computing the quartet distance for general trees.
We present a new algorithm for computing the quartet distance between two trees of arbitrary degree. The new algorithm improves the asymptotic running time for computing the quartet distance, compared to previous methods, and experimental results indicate that the new method also performs significantly better in practice.
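To make the definition of the quartet distance concrete, here is a minimal brute-force sketch in Python. It is not the algorithm from the paper: it enumerates all quartets directly and is only practical for small trees. The nested-tuple tree encoding and the function names (`clusters`, `quartet_topology`, `quartet_distance`) are assumptions made for illustration. The sketch also assumes that an unresolved (star) quartet counts as differing from any resolved one.

```python
from itertools import combinations

# Assumed encoding: a tree is a nested tuple, a leaf is any non-tuple,
# sortable label (e.g. a string). An unrooted tree is written by picking
# an arbitrary inner node as the top-level tuple.

def clusters(tree):
    """Return (set of all leaves, leaf sets of every inner node)."""
    found = []

    def leaves(node):
        if not isinstance(node, tuple):
            return frozenset([node])
        leafset = frozenset().union(*(leaves(child) for child in node))
        found.append(leafset)
        return leafset

    return leaves(tree), found


def quartet_topology(quartet, found):
    """Return the pair grouped with the smallest leaf, or None for a star."""
    a, b, c, d = sorted(quartet)
    for partner, rest in ((b, (c, d)), (c, (b, d)), (d, (b, c))):
        for cluster in found:
            inside = [x in cluster for x in (a, partner, *rest)]
            # Some edge separates {a, partner} from the other two leaves.
            if inside in ([True, True, False, False],
                          [False, False, True, True]):
                return frozenset([a, partner])
    return None  # no separating edge: the quartet is unresolved


def quartet_distance(t1, t2):
    """Count quartets whose induced topologies differ between t1 and t2."""
    leaves1, clusters1 = clusters(t1)
    leaves2, clusters2 = clusters(t2)
    if leaves1 != leaves2:
        raise ValueError("trees must have the same leaf set")
    return sum(
        quartet_topology(q, clusters1) != quartet_topology(q, clusters2)
        for q in combinations(sorted(leaves1), 4)
    )


t1 = (("a", "b"), "c", ("d", "e"))
t2 = (("a", "c"), "b", ("d", "e"))
print(quartet_distance(t1, t2))  # 2
```

With $n$ leaves this sketch inspects all $\binom{n}{4}$ quartets, which is the cost that the bounds quoted in the abstract avoid.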

  • Source
    • "Two algorithms exist that can be directly applied to compute the parametric quartet distance (see also [11]). One runs in time $O(n^2 \min\{d_1, d_2\})$, where, for $i \in \{1, 2\}$, $d_i$ is the maximum degree of a node in $T_i$ [12]; the other takes $O(d^9 n \log n)$ time, where $d$ is the maximum degree of a node in $T_1$ and $T_2$ [34]. Our faster $O(n^2)$ algorithm offers a 2-approximate solution when an exact value of the parametric quartet distance is not required."
    ABSTRACT: We define, analyze, and give efficient algorithms for two kinds of distance measures for rooted and unrooted phylogenies. For rooted trees, our measures are based on the topologies the input trees induce on triplets; that is, on three-element subsets of the set of species. For unrooted trees, the measures are based on quartets (four-element subsets). Triplet and quartet-based distances provide a robust and fine-grained measure of the similarities between trees. The distinguishing feature of our distance measures relative to traditional quartet and triplet distances is their ability to deal cleanly with the presence of unresolved nodes, also called polytomies. For rooted trees, these are nodes with more than two children; for unrooted trees, they are nodes of degree greater than three. Our first class of measures are parametric distances, where there is a parameter that weighs the difference between an unresolved triplet/quartet topology and a resolved one. Our second class of measures are based on Hausdorff distance. Each tree is viewed as a set of all possible ways in which the tree could be refined to eliminate unresolved nodes. The distance between the original (unresolved) trees is then taken to be the Hausdorff distance between the associated sets of fully resolved trees, where the distance between trees in the sets is the triplet or quartet distance, as appropriate. Comment: 34 pages
    Theoretical Computer Science 06/2009; 412(48). DOI:10.1007/978-3-540-78773-0_7 · 0.52 Impact Factor
  • Source
    • "In [4] we developed two algorithms: the first runs in time $O(n^3)$ and space $O(n^2)$, and is thus independent of the degree of the inner nodes; the second runs in time $O(n^2 d^2)$ and space $O(n^2)$, where $d$ is the maximal degree of inner nodes in the trees, and thus depends on the degree of the nodes. The $O(n^2 d^2)$ bound was later improved to $O(n^2 d)$ [5], and by taking an approach similar to the $O(n \log n)$ algorithm of Brodal et al. [2] we developed an algorithm that is sub-quadratic in $n$ but at a significant cost in terms of $d$: $O(d^9 n \log n)$ [10]."
    ABSTRACT: We derive a quadratic time and space algorithm for computing the quartet distance between a pair of general trees, i.e. trees where inner nodes can have any degree ≥ 3. The time and space complexity of our algorithm is quadratic in the number of leaves and does not depend on the degree of the inner nodes. This makes it the fastest algorithm for computing the quartet distance between general trees independent of the degree of the inner nodes.
    International Joint Conferences on Bioinformatics, Systems Biology and Intelligent Computing, IJCBS 2009, Shanghai, China, 3-5 August 2009; 01/2009