Article

On the maximum quartet distance between phylogenetic trees


Abstract

A conjecture of Bandelt and Dress states that the maximum quartet distance between any two phylogenetic trees on n leaves is at most $(\frac{2}{3}+o(1))\binom{n}{4}$. Using the machinery of flag algebras, we improve the currently known bounds regarding this conjecture; in particular, we show that the maximum is at most $(0.69+o(1))\binom{n}{4}$. We also give further evidence that the conjecture is true by proving that the maximum distance between caterpillar trees is at most $(\frac{2}{3}+o(1))\binom{n}{4}$.


... As we are interested in monotone subwords, we consider their minimum asymptotic density, as formally defined below. $\frac{1}{(k-1)^{k-1}}$. This value was shown to hold for the special class of layered permutations by de Oliveira Bastos and Coregliano [6]. ...
... resulting matrix that are potentially nonzero). In the "bound" case, rounding is less stringent and hence suitable for SDPs with a large number of variables (see, e.g., [1,18]). As in our problem we only claim a lower bound for f(s,3), the rounding procedure we present is of the "bound" category; specifically, we will adapt to our setting the method from [18]. ...
... The slack block of $A_j$ is entirely zero except for entries [1,1] and [j+1, j+1], which are 1 (so the slack block is a diagonal matrix). The input files referenced in Table 4 contain all of these values in standard SDPA sparse format (see the manuals of either SDPA or CSDP for a description of this format, used by both programs). ...
Article
We consider the asymptotic minimum density f(s,k) of monotone k-subwords of words over a totally ordered alphabet of size s. The unrestricted alphabet case, $f(\infty,k)$, is well-studied, known for $f(\infty,3)$ and $f(\infty,4)$, and, in particular, conjectured to be rational for all k. Here we determine f(2,k) for all k and determine f(3,3), which is already irrational. We describe an explicit construction for all s which is conjectured to yield f(s,3). Using our construction and flag algebra, we determine f(4,3), f(5,3), f(6,3) up to $10^{-3}$ yet argue that flag algebra, regardless of computational power, cannot determine f(5,3) precisely. Finally, we prove that for every fixed $k \ge 3$, the gap between f(s,k) and $f(\infty,k)$ is $\Theta(\frac{1}{s})$.
... We assume that all words are over $\Sigma_2 = \{0, 1\}$. It is immediate to verify that for every k-word u, the probability that a randomly chosen k-subword of the alternating n-word 01010... equals u is $\frac{1}{2^k} + o_n(1)$. As there are precisely 2k monotone k-words over $\Sigma_2$, it follows that $f(2,k) \le \frac{2k}{2^k}$. ...
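The bound in this excerpt can be checked numerically for small cases. The following is a minimal brute-force sketch, not code from the cited paper; the function names are hypothetical, and "k-subword" is assumed to mean a subsequence of length k. It counts the monotone binary k-words and measures the density of monotone k-subwords in the alternating word, which should approach $\frac{2k}{2^k}$ as n grows.

```python
from itertools import combinations, product
from math import comb


def is_monotone(word):
    """True if the word is non-decreasing or non-increasing."""
    pairs = list(zip(word, word[1:]))
    return all(a <= b for a, b in pairs) or all(a >= b for a, b in pairs)


def monotone_density(word, k):
    """Fraction of the k-subwords (subsequences) of `word` that are monotone."""
    hits = sum(
        is_monotone([word[i] for i in positions])
        for positions in combinations(range(len(word)), k)
    )
    return hits / comb(len(word), k)


def alternating_word(n):
    """The n-word 0101... over the alphabet {0, 1}."""
    return [i % 2 for i in range(n)]


def count_monotone_binary_words(k):
    """Number of monotone k-words over {0, 1}; the excerpt states this is 2k."""
    return sum(is_monotone(w) for w in product((0, 1), repeat=k))


if __name__ == "__main__":
    k = 3
    print("monotone binary words:", count_monotone_binary_words(k), "vs 2k =", 2 * k)
    for n in (10, 20, 40):
        # Exhaustive enumeration, so keep n small.
        print(n, monotone_density(alternating_word(n), k), "limit:", 2 * k / 2**k)
```

The enumeration is exponential in k and polynomial of degree k in n, so it is only a sanity check for small parameters, not a way to compute f(s,k).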
... entry positions of the resulting matrix that are potentially nonzero). In the "bound" case, rounding is less stringent and hence suitable for SDPs with a large number of variables (see, e.g., [1,18]). As in our problem we only claim a lower bound for f(s,3), the rounding procedure we present is of the "bound" category; specifically, we will adapt to our setting the method from [18]. ...
Preprint
We consider the asymptotic minimum density f(s,k) of monotone k-subwords of words over a totally ordered alphabet of size s. The unrestricted alphabet case, $f(\infty,k)$, is well-studied, known for $f(\infty,3)$ and $f(\infty,4)$, and, in particular, conjectured to be rational for all k. Here we determine f(2,k) for all k and determine f(3,3), which is already irrational. We describe an explicit construction for all s which is conjectured to yield f(s,3). Using our construction and flag algebra, we determine f(4,3), f(5,3), f(6,3) up to $10^{-3}$ yet argue that flag algebra, regardless of computational power, cannot determine f(5,3) precisely. Finally, we prove that for every fixed $k \ge 3$, the gap between f(s,k) and $f(\infty,k)$ is $\Theta(\frac{1}{s})$.
... This raises the problem of finding a tree maximizing the number of compatible quartets: maximum quartet compatibility (MQC) [15]. As MQC is NP-hard, several approximation algorithms have been designed for it [4,13,16,17], but the best approximation to the general problem is still obtained by a naive "random tree" with expected approximation ratio of $\frac{1}{3}$. Related to the problem of compatibility is the concept of quartet distance [9,19]. ...
... Alon, Snir, and Yuster [2] improved the upper bound on the maximum quartet distance, proving it is asymptotically smaller than $\frac{9}{10}\binom{n}{4}$. Alon, Naves, and Sudakov [1] further improved the upper bound to at most $(0.69+o(1))\binom{n}{4}$. In fact, for the particular important case where both $T_1$ and $T_2$ are the caterpillar tree they established the validity of Conjecture 1.1. ...
... for every tree T of order n. The result of [1] proves that $M_n \le (0.69+o(1))\binom{n}{4}$. The caterpillar $\mathrm{CAT}_n$ of order n ≥ 4 is the tree having only two internal vertices x, y adjacent to two leaves each. ...
Preprint
Let T be an arbitrary phylogenetic tree with n leaves. It is well-known that the average quartet distance between two assignments of taxa to the leaves of T is $\frac{2}{3}\binom{n}{4}$. However, a longstanding conjecture of Bandelt and Dress asserts that $(\frac{2}{3}+o(1))\binom{n}{4}$ is also the maximum quartet distance between two assignments. While Alon, Naves, and Sudakov have shown this indeed holds for caterpillar trees, the general case of the conjecture is still unresolved. A natural extension is when partial information is given: the two assignments are known to coincide on a given subset of taxa. The partial information setting is biologically relevant as the location of some taxa (species) in the phylogenetic tree may be known, and for other taxa it might not be known. What can we then say about the average and maximum quartet distance in this more general setting? Surprisingly, even determining the average quartet distance becomes a nontrivial task in the partial information setting and determining the maximum quartet distance is even more challenging, as these turn out to be dependent on the structure of T. In this paper we prove nontrivial asymptotic bounds that are sometimes tight for the average quartet distance in the partial information setting. We also show that the Bandelt and Dress conjecture does not generally hold under the partial information setting. Specifically, we prove that there are cases where the average and maximum quartet distance substantially differ.
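The average quoted in this abstract follows from linearity of expectation. A short sketch, assuming T is binary (so every quartet is resolved) and that the second assignment is a uniformly random bijection chosen independently of the first; the symbols $\sigma_1, \sigma_2$, $d_Q$ and $q|_{\sigma}$ (the split that assignment $\sigma$ induces on the four taxa in q) are notation introduced here, not taken from the paper:

```latex
\mathbb{E}\bigl[d_Q(\sigma_1,\sigma_2)\bigr]
  = \sum_{q \in \binom{[n]}{4}} \Pr\bigl[\, q|_{\sigma_1} \neq q|_{\sigma_2} \,\bigr]
  = \binom{n}{4}\Bigl(1 - \tfrac{1}{3}\Bigr)
  = \tfrac{2}{3}\binom{n}{4}.
```

Each set of four taxa receives one of three possible splits, and by symmetry a uniformly random assignment makes the three equally likely, so the two assignments agree on a given quartet with probability $\frac{1}{3}$.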
... The main power comes from the possibility of formulating a problem as a semidefinite program and using a computer to solve it. The method can be applied in various settings such as graphs [28,44], hypergraphs [3,19], oriented graphs [29,37], edge-coloured graphs [5,12], permutations [6,55], discrete geometry [7,36], or phylogenetic trees [1]. For a detailed explanation of the flag algebra method in the setting of 3-uniform hypergraphs see [22]. ...
... Lemma 5.1. For all ε > 0 there exist δ > 0 and $n_0$ such that for all $n \ge n_0$: if G is a $K_4^3$-free 3-uniform graph on n vertices with $\mathrm{co}_2(G) \ge (1-\delta)\frac{1}{3}\frac{n^4}{2}$, then the densities of all 3-graphs on 4, 5 and 6 vertices in G that are not contained in $C_n$ are at most ε. Additionally, ...
... where the first inequality holds because when one edge is removed from a 3-uniform hypergraph, then the codegree squared sum can go down by at most 6n. (1). By (2) and the fact that G is $K_4^3$-free, we have ...
Preprint
Turán's famous tetrahedron problem is to compute the Turán density of the tetrahedron $K_4^3$. This is equivalent to determining the maximum $\ell_1$-norm of the codegree vector of a $K_4^3$-free n-vertex 3-uniform hypergraph. We will introduce a new way for measuring extremality of hypergraphs and determine asymptotically the extremal function of the tetrahedron in our notion. The codegree squared sum, $\mathrm{co}_2(G)$, of a 3-uniform hypergraph G is the sum of codegrees squared $d(x,y)^2$ over all pairs of vertices xy, or in other words, the square of the $\ell_2$-norm of the codegree vector of the pairs of vertices. Define $\mathrm{exco}_2(n,H)$ to be the maximum $\mathrm{co}_2(G)$ over all H-free n-vertex 3-uniform hypergraphs G. We use flag algebra computations to determine asymptotically the codegree squared extremal number for $K_4^3$ and $K_5^3$ and additionally prove stability results. In particular, we prove that the extremal function for $K_4^3$ in $\ell_2$-norm is asymptotically the same as the one obtained from one of the conjectured extremal $K_4^3$-free hypergraphs for the $\ell_1$-norm. Further, we prove several general properties about $\mathrm{exco}_2(n,H)$ including the existence of a scaled limit, blow-up invariance and a supersaturation result.
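To make the definition of the codegree squared sum concrete, here is a minimal sketch that evaluates it directly from an edge list. The function name and the input representation (an iterable of 3-element vertex collections) are assumptions made for illustration; nothing here reproduces the flag algebra computations of the paper.

```python
from collections import Counter
from itertools import combinations


def codegree_squared_sum(edges):
    """Sum of d(x, y)^2 over all vertex pairs of a 3-uniform hypergraph.

    d(x, y) is the number of edges containing both x and y. Pairs with
    codegree 0 contribute nothing, so only pairs inside edges are visited.
    """
    codegree = Counter()
    for edge in {frozenset(e) for e in edges}:  # ignore repeated edges
        if len(edge) != 3:
            raise ValueError("every edge must contain exactly 3 distinct vertices")
        for pair in combinations(sorted(edge), 2):
            codegree[pair] += 1
    return sum(d * d for d in codegree.values())


if __name__ == "__main__":
    # The tetrahedron: all four triples on four vertices.
    tetrahedron = list(combinations(range(4), 3))
    print(codegree_squared_sum(tetrahedron))
```

In the tetrahedron every one of the 6 vertex pairs lies in exactly 2 edges, so the sum is 6 · 2² = 24.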
... They also proved a $\frac{9}{10}\binom{N}{4}$ asymptotic upper bound on the quartet distance. Finally, using the technique of flag algebra, Alon et al. [1] have obtained a $(0.69+o(1))\binom{N}{4}$ upper bound on the normalized quartet distance (for a large enough number N of leaves). ...
... We start by analyzing case (1). There is no overlap between the ℓ + 1 long prefixes and the k + 1 long suffixes. ...
Article
Given two binary trees on N labeled leaves, the quartet distance between the trees is the number of disagreeing quartets. By permuting the leaves at random, the expected quartet distance between the two trees is $\frac{2}{3}\binom{N}{4}$. However, no strongly explicit construction reaching this bound asymptotically was known. We consider complete, balanced binary trees on $N=2^n$ leaves, labeled by n-bit sequences. Ordering the leaves in one tree by the prefix order, and in the other tree by the suffix order, we show that the resulting quartet distance is $\left(\frac{2}{3}+o(1)\right)\binom{N}{4}$, and it always exceeds the $\frac{2}{3}\binom{N}{4}$ bound.
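The construction in this abstract can be explored by brute force for small n. The sketch below is one plausible reading of it, not the authors' code: the "prefix order" tree is taken to be the complete binary trie on the n-bit labels, and the "suffix order" tree the trie on the reversed labels. All function names are hypothetical. The split a tree induces on four leaves is found with the four-point condition, which in a trie amounts to picking the pairing with the largest total common-prefix length.

```python
from itertools import combinations, product


def lcp(x, y):
    """Length of the longest common prefix of two strings."""
    length = 0
    for a, b in zip(x, y):
        if a != b:
            break
        length += 1
    return length


def quartet_split(quartet, key):
    """Split induced on four labels by the binary trie built on key(label)."""
    a, b, c, d = quartet
    pairings = (((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c)))

    def score(pairing):
        (w, x), (y, z) = pairing
        # A longer common prefix means a deeper common ancestor in the trie.
        return lcp(key(w), key(x)) + lcp(key(y), key(z))

    best = max(pairings, key=score)
    return frozenset(frozenset(pair) for pair in best)


def quartet_distance(labels, key1, key2):
    """Number of 4-subsets of labels on which the two tries disagree."""
    return sum(
        quartet_split(q, key1) != quartet_split(q, key2)
        for q in combinations(labels, 4)
    )


def prefix_suffix_distance(n):
    """Quartet distance between the prefix trie and the suffix trie on n bits."""
    labels = ["".join(bits) for bits in product("01", repeat=n)]
    return quartet_distance(labels, lambda s: s, lambda s: s[::-1])


if __name__ == "__main__":
    for n in (2, 3, 4):
        total = len(list(combinations(range(2**n), 4)))
        print(n, prefix_suffix_distance(n), "of", total, "quartets")
```

The enumeration visits all $\binom{N}{4}$ quartets, so it is practical only for very small n; it is meant for comparing computed values against $\frac{2}{3}\binom{N}{4}$, not as an efficient algorithm.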
... They also proved a $\frac{9}{10}\binom{N}{4}$ asymptotic upper bound on the quartet distance. Finally, using the technique of flag algebra, Alon, Naves, and Sudakov [1] have shown a $(0.69+o(1))\binom{N}{4}$ upper bound on the normalized quartet distance (for a large enough number of leaves, N). ...
... We start by analyzing case (1). There is no overlap between the ℓ + 1 long prefixes and the k + 1 long suffixes. ...
... 3.2. $P_{0,1} \cap S_{0,1}$. We denote the length of the longest common prefix of $x_0, x_1$ by ℓ (ℓ ≤ n − 1), and the length of the longest common suffix of $x_0, x_1$ by k (k ≤ n − ℓ − 1). For $P_{0,1}$, ℓ ≥ 1 should hold, and for $S_{0,1}$, k ≥ 1 should hold. ...
Article
Given two binary trees on N labeled leaves, the quartet distance between the trees is the number of disagreeing quartets. By permuting the leaves at random, the expected quartet distance between the two trees is $\frac{2}{3}\binom{N}{4}$. However, no explicit construction reaching this bound asymptotically was known. We consider complete binary trees on $N=2^n$ leaves, labeled by bit sequences of length n. Ordering the leaves in one tree by the prefix order, and in the other tree by the suffix order, we show that the resulting quartet distance is $\left(\frac{2}{3}+o(1)\right)\binom{N}{4}$, and it always exceeds the $\frac{2}{3}\binom{N}{4}$ bound.
... A conjecture of Bandelt and Dress [2, p.338] asserts that the diameter is at most $(2/3+o(1))\binom{n}{4}$, achieved by caterpillar trees. Noga Alon, Humberto Naves, and Benny Sudakov [1] proved that the diameter is at most $(0.69+o(1))\binom{n}{4}$, and verified that the conjecture is true for the diameter of caterpillar trees, instead of all phylogenetic trees. ...
Preprint
In phylogenetics, a key problem is to construct evolutionary trees from collections of characters where, for a set X of species, a character is simply a function from X onto a set of states. In this context, a key concept is convexity, where a character is convex on a tree with leaf set X if the collection of subtrees spanned by the leaves of the tree that have the same state are pairwise disjoint. Although collections of convex characters on a single tree have been extensively studied over the past few decades, very little is known about coconvex characters, that is, characters that are simultaneously convex on a collection of trees. As a starting point to better understand coconvexity, in this paper we prove a number of extremal results for the following question: What is the minimal number of coconvex characters on a collection of n-leaved trees taken over all collections of size t >= 2, also if we restrict to coconvex characters which map to k states? As an application of coconvexity, we introduce a new one-parameter family of tree metrics, which range between the coarse Robinson-Foulds distance and the much finer quartet distance. We show that bounds on the quantities in the above question translate into bounds for the diameter of the tree space for the new distances. Our results open up several new interesting directions and questions which have potential applications to, for example, tree spaces and phylogenomics.
... The theory of flag algebras was developed by Razborov [38]. It has been used to find new results on graphs [3,12,13,39], hypergraphs [2,19,22,25], graphons [21], permutations [4], discrete geometry [20,23], and even phylogenetic trees [1], to name a few. ...
Preprint
Finding exact Ramsey numbers is a problem typically restricted to relatively small graphs. The flag algebra method was developed to find asymptotic results for very large graphs, so it seems that the method is not suitable for finding small Ramsey numbers. But this intuition is wrong, and we will develop a technique to do just that in this paper. We find new upper bounds for many small graph and hypergraph Ramsey numbers. As a result, we prove the exact values $R(K_4^-,K_4^-,K_4^-)=28$, $R(K_8,C_5)=29$, $R(K_9,C_6)=41$, $R(Q_3,Q_3)=13$, $R(K_{3,5},K_{1,6})=17$, $R(C_3,C_5,C_5)=17$, and $R(K_4^-,K_5^-;3)=12$. We hope that this technique will be adapted to address other questions for smaller graphs with the flag algebra method.
... From the combinatorial perspective, an intriguing question is to investigate the maximum possible quartet distance between two trees on n leaves. A conjecture of Bandelt and Dress [7] is that this is always $(\frac{2}{3}+o(1))\binom{n}{4}$, with the best known bound being $(0.69+o(1))\binom{n}{4}$ by Alon et al. [4]. From the algorithmic perspective, a long-standing challenge is to compute the quartet distance efficiently. ...
Conference Paper
The quartet distance is a measure of similarity used to compare two unrooted phylogenetic trees on the same set of n leaves, defined as the number of subsets of four leaves related by a different topology in both trees. After a series of previous results, Brodal et al. [SODA 2013] presented an algorithm that computes this number in $O(nd\log n)$ time, where d is the maximum degree of a node. For the related triplet distance between rooted phylogenetic trees, the same authors were able to design an $O(n\log n)$ time algorithm, that is, with running time independent of d. This raises the question of achieving such complexity for computing the quartet distance, or at least improving the dependency on d. Our main contribution is a two-way reduction establishing that the complexity of computing the quartet distance between two trees on n leaves is the same, up to polylogarithmic factors, as the complexity of counting 4-cycles in an undirected simple graph with m edges. The latter problem has been extensively studied, and the fastest known algorithm by Vassilevska Williams [SODA 2015] works in $O(m^{1.48})$ time. In fact, even for the seemingly simpler problem of detecting a 4-cycle, the best known algorithm works in $O(m^{4/3})$ time, and a conjecture of Yuster and Zwick implies that this might be optimal. In particular, an almost-linear time for computing the quartet distance would imply a surprisingly efficient algorithm for counting 4-cycles. In the other direction, by plugging in the state-of-the-art algorithms for counting 4-cycles, our reduction allows us to significantly decrease the complexity of computing the quartet distance. For trees with unbounded degrees we obtain an $O(n^{1.48})$ time algorithm, which is a substantial improvement on the previous bound of $O(n^{2}\log n)$. For trees with degrees bounded by d, by analysing the reduction more carefully, we are able to obtain an $\tilde{O}(nd^{0.77})$ time algorithm, which is again a nontrivial improvement on the previous bound of $O(nd\log n)$.
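For reference, the definition of the quartet distance given in this abstract can be evaluated directly, without any of the fast algorithms it discusses. The following is a naive baseline sketch that enumerates all $\binom{n}{4}$ leaf subsets; it is not the algorithm of Brodal et al. nor the 4-cycle reduction. The edge-list input format and all function names are assumptions. Topologies are read off path lengths with the four-point condition, so unresolved (star) quartets in trees of higher degree are handled as a fourth possible outcome.

```python
from collections import deque
from itertools import combinations


def build_adjacency(edges):
    """Adjacency lists of an undirected tree given as (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    return adj


def bfs_distances(adj, source):
    """Number of edges on the path from source to every other node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def quartet_topology(dist, a, b, c, d):
    """Split induced on {a, b, c, d}, or None for an unresolved star quartet."""
    sums = [
        (dist[a][b] + dist[c][d], frozenset({frozenset({a, b}), frozenset({c, d})})),
        (dist[a][c] + dist[b][d], frozenset({frozenset({a, c}), frozenset({b, d})})),
        (dist[a][d] + dist[b][c], frozenset({frozenset({a, d}), frozenset({b, c})})),
    ]
    smallest = min(total for total, _ in sums)
    winners = [split for total, split in sums if total == smallest]
    # Four-point condition: a resolved quartet has a unique smallest sum.
    return winners[0] if len(winners) == 1 else None


def quartet_distance(edges1, edges2, leaves):
    """Number of 4-subsets of leaves with different topologies in the two trees."""
    leaves = sorted(leaves)
    adj1, adj2 = build_adjacency(edges1), build_adjacency(edges2)
    d1 = {x: bfs_distances(adj1, x) for x in leaves}
    d2 = {x: bfs_distances(adj2, x) for x in leaves}
    return sum(
        quartet_topology(d1, *q) != quartet_topology(d2, *q)
        for q in combinations(leaves, 4)
    )


if __name__ == "__main__":
    tree_ab_cd = [("a", "u"), ("b", "u"), ("u", "v"), ("c", "v"), ("d", "v")]
    tree_ac_bd = [("a", "u"), ("c", "u"), ("u", "v"), ("b", "v"), ("d", "v")]
    print(quartet_distance(tree_ab_cd, tree_ac_bd, "abcd"))
```

This baseline is useful mainly for testing faster implementations on small inputs, since its running time grows like $n^4$.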
... From the combinatorial perspective, an intriguing question is to investigate the maximum possible quartet distance between two trees on n leaves. A conjecture of Bandelt and Dress [7] is that this is always $(\frac{2}{3}+o(1))\binom{n}{4}$, with the best known bound being $(0.69+o(1))\binom{n}{4}$ by Alon et al. [4]. From the algorithmic perspective, a long-standing challenge is to compute the quartet distance efficiently. ...
Preprint
The quartet distance is a measure of similarity used to compare two unrooted phylogenetic trees on the same set of n leaves, defined as the number of subsets of four leaves related by a different topology in both trees. After a series of previous results, Brodal et al. [SODA 2013] presented an algorithm that computes this number in $\mathcal{O}(nd\log n)$ time, where d is the maximum degree of a node. Our main contribution is a two-way reduction establishing that the complexity of computing the quartet distance between two trees on n leaves is the same, up to polylogarithmic factors, as the complexity of counting 4-cycles in an undirected simple graph with m edges. The latter problem has been extensively studied, and the fastest known algorithm by Vassilevska Williams [SODA 2015] works in $\mathcal{O}(m^{1.48})$ time. In fact, even for the seemingly simpler problem of detecting a 4-cycle, the best known algorithm works in $\mathcal{O}(m^{4/3})$ time, and a conjecture of Yuster and Zwick implies that this might be optimal. In particular, an almost-linear time for computing the quartet distance would imply a surprisingly efficient algorithm for counting 4-cycles. In the other direction, by plugging in the state-of-the-art algorithms for counting 4-cycles, our reduction allows us to significantly decrease the complexity of computing the quartet distance. For trees with unbounded degrees we obtain an $\mathcal{O}(n^{1.48})$ time algorithm, which is a substantial improvement on the previous bound of $\mathcal{O}(n^{2}\log n)$. For trees with degrees bounded by d, by analysing the reduction more carefully, we are able to obtain an $\tilde{\mathcal{O}}(nd^{0.77})$ time algorithm, which is again a nontrivial improvement on the previous bound of $\mathcal{O}(nd\log n)$.
... 2. deciding the upper bound, which needs algebraic or combinatorial methods. We want to know whether a non-constructive method, such as the probabilistic method [109,110], could give a better lower bound, or whether flag algebra [111,112] could be modified for split systems; flag algebra has proved powerful in another extremal problem originating from phylogenetics [113]. A problem of special interest is: is the growth rate of the maximal cardinality of a split system with finite forbidden configurations always polynomial? ...
Thesis
Quartet weights encode the quantitative substructure of all quartets in the taxon set under analysis; by comparison, distances encode the substructure of all pairs. The quartet weights of a taxon set therefore contain more information than distances and can potentially infer phylogenetic history more accurately. Traditionally, quartet weights were used in tree reconstruction, but we found that they are more appropriately understood in the context of phylogenetic networks. The main part of this paper builds a theory for quartet weights and discusses some applications of quartet-weight-related methods in phylogenetic network reconstruction; the phylogenetic network is an alternative to the phylogenetic tree that allows reticulate events to be represented. This part consists of two chapters. The first aims to build a theory that is the quartet-weight analog of the theory of metrics, which gives rise to fruitful results, including a linear dependency theorem and analogs of the SplitDecomposition and Neighbornet algorithms. Even though most of the generalizations are not direct and fail to maintain all desirable properties of the theory for metrics, those methods proved practically useful in many cases. We also show that the T-theory for quartet weights fails to explain the 2-very-weakly-compatible condition. In this chapter, when trying to apply those methods to a real dataset, we found that the existing method for calculating quartet weights is not satisfactory, so a more accurate method is needed. The second chapter introduces a novel method that calculates quartet weights using Hadamard conjugation. Rate variation is also incorporated in this method, which significantly improves performance. Compared with existing methods such as pattern counting and maximum-likelihood-based methods, Hadamard conjugation generates more accurate quartet weights and is able to construct more accurate phylogenetic networks, as verified by both simulation studies and a real dataset.
Finally, an epilogue establishes some results on more general types of split systems and clusters, especially on their maximal cardinalities. Those systems arise from methods that use such higher-order data. The order of the maximal cardinalities of (p,q)-hierarchies is explicitly determined. Some other important results are: the maximal cardinality of a $(-1,3)$-hierarchy is between $n^3/9+O(n^2)$ and $n^3/6+O(n^2)$; the maximal cardinality of a $2'$-weakly compatible split system is between $3n^2/4+O(n)$ and $n^2+O(n)$; and the maximal cardinality of a 2-weakly compatible split system is between $3n^2/2+O(n)$ and $O(n^{2.5})$.
... The easiest and most popular usage is the plain flag algebra method. The theory of flag algebras was applied to graphs [3,11,12,33], hypergraphs [2,18,21,24], graphons [20], permutations [4], discrete geometry [19,22], and even phylogenetic trees [1], to name a few. Formally, the method works with homomorphisms from linear combinations of combinatorial structures (graphs) to real numbers. ...
Article
We use the theory of flag algebras to find new upper bounds for several small graph and hypergraph Ramsey numbers. In particular, we prove the exact values $R(K_4^-,K_4^-,K_4^-)=28$, $R(K_8,C_5)=29$, $R(K_9,C_6)=41$, $R(Q_3,Q_3)=13$, $R(K_{3,5},K_{1,6})=17$, $R(C_3,C_5,C_5)=17$, and $R(K_4^-,K_5^-;3)=12$, and in addition improve many other upper bounds.
Preprint
We study the approximability of a broad class of computational problems, originally motivated in evolutionary biology and phylogenetic reconstruction, concerning the aggregation of potentially inconsistent (local) information about n items of interest, and we present optimal hardness of approximation results under the Unique Games Conjecture. The class of problems studied here can be described as Constraint Satisfaction Problems (CSPs) over infinite domains, where instead of values $\{0,1\}$ or a fixed-size domain, the variables can be mapped to any of the n leaves of a phylogenetic tree. The topology of the tree then determines whether a given constraint on the variables is satisfied or not, and the resulting CSPs are called Phylogenetic CSPs. Prominent examples of Phylogenetic CSPs with a long history and applications in various disciplines include: Triplet Reconstruction, Quartet Reconstruction, Subtree Aggregation (Forbidden or Desired). For example, in Triplet Reconstruction, we are given m triplets of the form $ij|k$ (indicating that "items i,j are more similar to each other than to k") and we want to construct a hierarchical clustering on the n items that respects the constraints as much as possible. Despite more than four decades of research, the basic question of maximizing the number of satisfied constraints is not well-understood. The current best approximation is achieved by outputting a random tree (for triplets, this achieves a 1/3 approximation). Our main result is that every Phylogenetic CSP is approximation resistant, i.e., there is no polynomial-time algorithm that does asymptotically better than a (biased) random assignment. This is a generalization of the results in Guruswami, Hastad, Manokaran, Raghavendra, and Charikar (2011), who showed that ordering CSPs are approximation resistant (e.g., Max Acyclic Subgraph, Betweenness).
Article
Let T be an arbitrary phylogenetic tree with n leaves. It is well known that the average quartet distance between two assignments of taxa to the leaves of T is $\frac{2}{3}\binom{n}{4}$. However, a longstanding conjecture of Bandelt and Dress asserts that $(\frac{2}{3}+o(1))\binom{n}{4}$ is also the maximum quartet distance between two assignments. While Alon, Naves, and Sudakov have shown this indeed holds for caterpillar trees, the general case of the conjecture is still unresolved. A natural extension is when partial information is given: the two assignments are known to coincide on a given subset of taxa. The partial information setting is biologically relevant as the location of some taxa (species) in the phylogenetic tree may be known, and for other taxa it might not be known. What can we then say about the average and maximum quartet distance in this more general setting? Surprisingly, even determining the average quartet distance becomes a nontrivial task in the partial information setting and determining the maximum quartet distance is even more challenging, as these turn out to be dependent on the structure of T. In this paper we prove nontrivial asymptotic bounds that are sometimes tight for the average quartet distance in the partial information setting. We also show that the Bandelt and Dress conjecture does not generally hold under the partial information setting. Specifically, we prove that there are cases where the average and maximum quartet distance substantially differ.
Article
Synthesizing median trees from a collection of gene trees under the biologically motivated gene tree parsimony (GTP) costs has provided credible species tree estimates. GTP costs are defined for each of the classic evolutionary processes. These costs count the minimum number of events necessary to reconcile the gene tree with the species tree, where the leaf-genes are mapped to the leaf-species through a function called labeling. To better understand the synthesis of median trees under these costs, there is an increased interest in analyzing their diameters. The diameters of a GTP cost between a gene tree and a species tree are the maximum values of this cost over one or both topologies of the trees involved. We are concerned with the diameters of the GTP costs under bijective labelings. While these diameters are linear-time computable for the gene duplication and deep coalescence costs, this has been unknown for the classic gene duplication and loss cost and for the loss cost. For the first time, we show how to compute these diameters and prove that this can be achieved in linear time, thus completing the computational time analysis for all of the bijective diameters under the GTP costs.
Article
In taxonomy and other branches of classification it is useful to know when tree-like classifications on overlapping sets of labels can be consistently combined into a parent tree. This paper considers the computational complexity of this problem. Recognizing when a consistent parent tree exists is shown to be intractable (NP-complete) for sets of unrooted trees, even when each tree in the set classifies just four labels. Consequently, determining the compatibility of qualitative characters and partial binary characters is, in general, also NP-complete. However, for sets of rooted trees an algorithm is described which constructs the "strict consensus tree" of all consistent parent trees (when they exist) in polynomial time. The related question of recognizing when a set of subtrees uniquely defines a parent tree is also considered, and a simple necessary and sufficient condition is described for rooted trees.
Article
One of the earliest results in Combinatorics is Mantel's theorem from 1907 that the largest triangle-free graph on a given vertex set is complete bipartite. However, a seemingly similar question posed by Turán in 1941 is still open: what is the largest 3-uniform hypergraph on a given vertex set with no tetrahedron? This question can be considered a test case for the general hypergraph Turán problem, where given an r-uniform hypergraph F , we want to determine the maximum number of edges in an r-uniform hypergraph on n vertices that does not contain a copy of F . To date there are very few results on this problem, even asymptotically. However, recent years have seen a revitalisation of this field, via significant developments in the available methods, notably the use of stability (approximate structure) and flag algebras. This article surveys the known results and methods, and discusses some open problems.
Article
Accurate phylogenetic reconstruction methods are currently limited to a maximum of a few dozen taxa. Supertree methods construct a large tree over a large set of taxa from a set of small trees over overlapping subsets of the complete taxa set. Hence, in order to construct the tree of life over a million and a half different species, the use of a supertree method over the product of accurate methods is inevitable. Perhaps the simplest version of this task that is still widely applicable, yet quite challenging, is quartet-based reconstruction. This problem lies at the root of many tree reconstruction methods, and theoretical as well as experimental results have been reported. Nevertheless, dealing with false, conflicting quartet trees remains problematic. In this paper, we describe an algorithm for constructing a tree from a set of input quartet trees even with a significant fraction of errors. We show empirically that conflicts in the inputs are handled satisfactorily and that it significantly outperforms and outraces the Matrix Representation with Parsimony (MRP) methods that have previously been most successful in dealing with supertrees. Our algorithm is based on a divide and conquer algorithm where our divide step uses a semidefinite programming (SDP) formulation of MaxCut. We remark that this builds on previous work of ours for piecing together trees from rooted triplet trees. The recursion for unrooted quartets, however, is more complicated in that even with a completely consistent set of quartet trees the problem is NP-hard, as opposed to the problem for triples, where there is a linear time algorithm. This complexity leads to several issues and some solutions of possible independent interest.