Conference Paper

Algorithms for Rapid Error Correction for the Gene Duplication Problem.

DOI: 10.1007/978-3-642-21260-4_23
Conference: Bioinformatics Research and Applications - 7th International Symposium, ISBRA 2011, Changsha, China, May 27-29, 2011. Proceedings
Source: DBLP

ABSTRACT Gene tree - species tree reconciliation problems infer the patterns and processes of gene evolution within the context of
an organismal phylogeny. In one example, the gene duplication problem seeks the evolutionary scenario that implies the minimum
number of gene duplications needed to reconcile a gene tree and a species tree. While the gene duplication problem can effectively
link gene and species evolution, error in gene trees can profoundly bias the results. We describe novel algorithms that rapidly
search local Subtree Prune and Regraft (SPR) or Tree Bisection and Reconnection (TBR) neighborhoods of a gene tree to find
a topology that implies the fewest duplications. These algorithms improve on the current solutions by a factor of n for searching SPR neighborhoods and
n² for searching TBR neighborhoods, where n is the number of vertices in the given gene tree. They provide a fast error correction protocol for gene trees, in which
we allow small gene tree rearrangements to improve the reconciliation cost. We tested the SPR tree rearrangement algorithm
on a collection of 1201 plant gene trees, and in every case, the SPR algorithm identified an alternate topology that implied
at least one fewer duplication. We also demonstrate a simple method to use the gene rearrangement algorithm to improve gene
tree parsimony phylogenetic analyses, which infer a species tree based on the gene duplication problem.
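
To make the quantity being minimized concrete, the following is a minimal sketch of the standard least-common-ancestor (LCA) mapping used to count duplications for one rooted, binary gene tree against a rooted species tree. It is not the paper's SPR or TBR search algorithm. The nested-tuple tree encoding, the function names, and the convention that each gene leaf is labeled with the name of its species are all assumptions made for illustration.

```python
def build_species_index(species_tree):
    """Index a nested-tuple species tree: parent, depth, and leaf-name maps."""
    parent, depth, leaf_of = {}, {}, {}
    counter = [0]

    def visit(node, par, d):
        nid = counter[0]
        counter[0] += 1
        parent[nid] = par
        depth[nid] = d
        if isinstance(node, str):
            leaf_of[node] = nid          # leaf: remember its node id by name
        else:
            for child in node:
                visit(child, nid, d + 1)

    visit(species_tree, None, 0)
    return parent, depth, leaf_of


def lca(a, b, parent, depth):
    """Least common ancestor by walking the deeper node upward (naive, O(height))."""
    while a != b:
        if depth[a] >= depth[b]:
            a = parent[a]
        else:
            b = parent[b]
    return a


def duplication_cost(gene_tree, species_tree):
    """Count gene tree nodes whose LCA mapping equals the mapping of a child."""
    parent, depth, leaf_of = build_species_index(species_tree)
    dups = 0

    def mapping(node):
        nonlocal dups
        if isinstance(node, str):
            return leaf_of[node]         # gene leaf maps to its species leaf
        left, right = (mapping(child) for child in node)
        m = lca(left, right, parent, depth)
        if m == left or m == right:      # same mapping as a child: duplication
            dups += 1
        return m

    mapping(gene_tree)
    return dups


species = (("A", "B"), "C")
print(duplication_cost((("A", "B"), "C"), species))  # 0: topologies agree
print(duplication_cost((("A", "C"), "B"), species))  # 1: one implied duplication
```

An error-correction search in the spirit of the abstract would evaluate this cost over rearranged gene trees and keep a topology with the lowest value; the speedups claimed above concern how that neighborhood is searched, which this sketch does not attempt.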

  • ABSTRACT: BACKGROUND: Reconciliation methods compare gene trees and species trees to recover evolutionary events such as duplications, transfers and losses explaining the history and composition of genomes. It is well-known that gene trees inferred from molecular sequences can be partly erroneous due to incorrect sequence alignments as well as phylogenetic reconstruction artifacts such as long branch attraction. In practice, this leads reconciliation methods to overestimate the number of evolutionary events. Several methods have been proposed to circumvent this problem, by collapsing the unsupported edges and then resolving the obtained multifurcating nodes, or by directly rearranging the binary gene trees. Yet these methods have been defined for models of evolution accounting only for duplications and losses, i.e. they cannot be applied to prokaryotic gene families.
    RESULTS: We propose a reconciliation method accounting for gene duplications, losses and horizontal transfers, that specifically takes into account the uncertainties in gene trees by rearranging their weakly supported edges. Rearrangements are performed on edges having a low confidence value, and are accepted whenever they improve the reconciliation cost. We prove useful properties on the dynamic programming matrix used to compute reconciliations, which allow us to speed up the tree space exploration when rearrangements are generated by Nearest Neighbor Interchanges (NNI) edit operations. Experiments on synthetic data show that gene trees modified by such NNI rearrangements are closer to the correct simulated trees and lead to better event predictions on average. Experiments on real data demonstrate that the proposed method leads to a decrease in the reconciliation cost and the number of inferred events. Finally, on a dataset of 30k gene families, this reconciliation method shows a ranking of prokaryotic phyla by transfer rates identical to that proposed by a different approach dedicated to transfer detection [BMCBIOINF 11:324, 2010, PNAS 109(13):4962-4967, 2012].
    CONCLUSIONS: Prokaryotic gene trees can now be reconciled with their species phylogeny while accounting for the uncertainty of the gene tree. More accurate and more precise reconciliations are obtained with respect to previous parsimony algorithms not accounting for such uncertainties [LNCS 6398:93-108, 2010, BIOINF 28(12): i283-i291, 2012]. Software implementing the method is freely available at
    Algorithms for Molecular Biology 04/2013; 8(1):12. · 1.61 Impact Factor
  • ABSTRACT: The genome content of extant species is derived from that of ancestral genomes, distorted by evolutionary events such as gene duplications, transfers and losses. Reconciliation methods aim at recovering such events and at localizing them in the species history, by comparing gene family trees to species trees. These methods play an important role in studying genome evolution as well as in inferring orthology relationships. A major issue with reconciliation methods is that the reliability of predicted evolutionary events may be questioned for various reasons: Firstly, there may be multiple equally optimal reconciliations for a given species tree-gene tree pair. Secondly, reconciliation methods can be misled by inaccurate gene or species trees. Thirdly, predicted events may fluctuate with method parameters such as the cost or rate of elementary events. For all of these reasons, confidence values for predicted evolutionary events are sorely needed. It was recently suggested that the frequency of each event in the set of all optimal reconciliations could be used as a support measure. We put this proposition to the test here and also consider a variant where the support measure is obtained by additionally accounting for suboptimal reconciliations. Experiments on simulated data show the relevance of event supports computed by both methods, while resorting to suboptimal sampling was shown to be more effective. Unfortunately, we also show that, unlike the majority-rule consensus tree for phylogenies, there is no guarantee that a single reconciliation can contain all events having above 50% support. In this paper, we detail how to rely on the reconciliation graph to efficiently identify the median reconciliation. Such median reconciliation can be found in polynomial time within the potentially exponential set of most parsimonious reconciliations.
    PLoS ONE 01/2013; 8(10):e73667. · 3.73 Impact Factor
  • ABSTRACT: We propose a reconciliation heuristic accounting for gene duplications, losses and horizontal transfers that specifically takes into account the uncertainties in the gene tree. Rearrangements are tried for gene tree edges that are weakly supported, and are accepted whenever they improve the reconciliation cost. We prove useful properties on the dynamic programming matrix used to compute reconciliations, which allow us to speed up the tree space exploration when rearrangements are generated by Nearest Neighbor Interchanges (NNI) edit operations. Experimental results on simulated and real data confirm that running times are greatly reduced when considering the above-mentioned optimization in comparison to the naïve rearrangement procedure. Results also show that gene trees modified by such NNI rearrangements are closer to the correct (simulated) trees and lead to more correct event predictions on average. The program is available at
    Proceedings of the 12th international conference on Algorithms in Bioinformatics; 09/2012
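
The naïve rearrangement procedure that the NNI-based abstracts above compare against can be sketched as a simple hill climb: generate the NNI neighbors of the current gene tree, accept the first one that lowers the reconciliation cost, and repeat until no neighbor improves. The sketch below is an illustration under stated assumptions, not the cited methods: trees are rooted, binary nested tuples, `cost` is any user-supplied function (hypothetical here), every internal edge is tried rather than only weakly supported ones, and none of the dynamic programming speedups are reproduced.

```python
def nni_neighbors(tree):
    """Yield every rooted NNI neighbor of a binary nested-tuple tree."""
    if isinstance(tree, str):
        return                            # a leaf has no internal edge
    left, right = tree
    if not isinstance(left, str):
        a, b = left                       # edge to the left child: swap a or b with `right`
        yield ((a, right), b)
        yield ((b, right), a)
    if not isinstance(right, str):
        c, d = right                      # edge to the right child: swap c or d with `left`
        yield ((left, c), d)
        yield ((left, d), c)
    for new_left in nni_neighbors(left):  # rearrangements deeper in the left subtree
        yield (new_left, right)
    for new_right in nni_neighbors(right):
        yield (left, new_right)


def hill_climb(tree, cost):
    """Accept the first cost-improving NNI neighbor until none improves."""
    best, best_cost = tree, cost(tree)
    improved = True
    while improved:
        improved = False
        for candidate in nni_neighbors(best):
            candidate_cost = cost(candidate)
            if candidate_cost < best_cost:
                best, best_cost = candidate, candidate_cost
                improved = True
                break                     # restart from the improved tree
    return best, best_cost
```

Because `cost` is passed in, the same loop works with a duplication-only cost or with a duplication, transfer and loss cost. Each step recomputes the cost from scratch, which is the overhead the cited work reports avoiding.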