Conference Paper

Algorithms for Rapid Error Correction for the Gene Duplication Problem.

DOI: 10.1007/978-3-642-21260-4_23 Conference: Bioinformatics Research and Applications - 7th International Symposium, ISBRA 2011, Changsha, China, May 27-29, 2011. Proceedings
Source: DBLP

ABSTRACT Gene tree - species tree reconciliation problems infer the patterns and processes of gene evolution within the context of
an organismal phylogeny. In one example, the gene duplication problem seeks the evolutionary scenario that implies the minimum
number of gene duplications needed to reconcile a gene tree and a species tree. While the gene duplication problem can effectively
link gene and species evolution, error in gene trees can profoundly bias the results. We describe novel algorithms that rapidly
search local Subtree Prune and Regraft (SPR) or Tree Bisection and Reconnection (TBR) neighborhoods of a gene tree to find
a topology that implies the fewest duplications. These algorithms improve on the current solutions by a factor of n for searching SPR neighborhoods and n
2 for searching TBR neighborhoods, where n is the number of vertices in the given gene tree. They provide a fast error correction protocol for gene trees, in which
we allow small gene tree rearrangements to improve the reconciliation cost. We tested the SPR tree rearrangement algorithm
on a collection of 1201 plant gene trees, and in every case, the SPR algorithm identified an alternate topology that implied
at least one fewer duplication. We also demonstrate a simple method to use the gene rearrangement algorithm to improve gene
tree parsimony phylogenetic analyses, which infer a species tree based on the gene duplication problem.

  • Source
    [Show abstract] [Hide abstract]
    ABSTRACT: The genome content of extant species is derived from that of ancestral genomes, distorted by evolutionary events such as gene duplications, transfers and losses. Reconciliation methods aim at recovering such events and at localizing them in the species history, by comparing gene family trees to species trees. These methods play an important role in studying genome evolution as well as in inferring orthology relationships. A major issue with reconciliation methods is that the reliability of predicted evolutionary events may be questioned for various reasons: Firstly, there may be multiple equally optimal reconciliations for a given species tree-gene tree pair. Secondly, reconciliation methods can be misled by inaccurate gene or species trees. Thirdly, predicted events may fluctuate with method parameters such as the cost or rate of elementary events. For all of these reasons, confidence values for predicted evolutionary events are sorely needed. It was recently suggested that the frequency of each event in the set of all optimal reconciliations could be used as a support measure. We put this proposition to the test here and also consider a variant where the support measure is obtained by additionally accounting for suboptimal reconciliations. Experiments on simulated data show the relevance of event supports computed by both methods, while resorting to suboptimal sampling was shown to be more effective. Unfortunately, we also show that, unlike the majority-rule consensus tree for phylogenies, there is no guarantee that a single reconciliation can contain all events having above 50% support. In this paper, we detail how to rely on the reconciliation graph to efficiently identify the median reconciliation. Such median reconciliation can be found in polynomial time within the potentially exponential set of most parsimonious reconciliations.
    PLoS ONE 10/2013; 8(10):e73667. DOI:10.1371/journal.pone.0073667 · 3.53 Impact Factor
  • [Show abstract] [Hide abstract]
    ABSTRACT: The use of genomic data sets for phylogenetics is complicated by the fact that evolutionary processes such as gene duplication and loss, or incomplete lineage sorting (deep coalescence) cause incongruence among gene trees. One well-known approach that deals with this complication is gene tree parsimony, which, given a collection of gene trees, seeks a species tree that requires the smallest number of evolutionary events to explain the incongruence of the gene trees. However, a lack of efficient algorithms has limited the use of this approach. Here, we present efficient algorithms for SPR and TBR-based local search heuristics for gene tree parsimony under the 1) duplication, 2) loss, 3) duplication-loss, and 4) deep coalescence reconciliation costs. These novel algorithms improve upon the time complexities of previous algorithms for these problems by a factor of $(n)$, where $(n)$ is the number of species in the collection of gene trees. Our algorithms provide a substantial improvement in runtime and scalability compared to previous implementations and enable large-scale gene tree parsimony analyses using any of the four reconciliation costs. Our algorithms have been implemented in the software packages DupTree and iGTP, and have already been used to perform several compelling phylogenetic studies.
    IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM 07/2013; 10(4):939-956. DOI:10.1109/TCBB.2013.103 · 1.54 Impact Factor
  • Source
    [Show abstract] [Hide abstract]
    ABSTRACT: We propose a reconciliation heuristic accounting for gene duplications, losses and horizontal transfers that specifically takes into account the uncertainties in the gene tree. Rearrangements are tried for gene tree edges that are weakly supported, and are accepted whenever they improve the reconciliation cost. We prove useful properties on the dynamic programming matrix used to compute reconciliations, which allows to speed-up the tree space exploration when rearrangements are generated by Nearest Neighbor Interchanges (NNI) edit operations. Experimental results on simulated and real data confirm that running times are greatly reduced when considering the above-mentioned optimization in comparison to the naïve rearrangement procedure. Results also show that gene trees modified by such NNI rearrangements are closer to the correct (simulated) trees and lead to more correct event predictions on average. The program is available at
    Proceedings of the 12th international conference on Algorithms in Bioinformatics; 09/2012