Algorithms for Rapid Error Correction for the Gene Duplication Problem.
ABSTRACT Gene tree - species tree reconciliation problems infer the patterns and processes of gene evolution within the context of
an organismal phylogeny. In one example, the gene duplication problem seeks the evolutionary scenario that implies the minimum
number of gene duplications needed to reconcile a gene tree and a species tree. While the gene duplication problem can effectively
link gene and species evolution, error in gene trees can profoundly bias the results. We describe novel algorithms that rapidly
search local Subtree Prune and Regraft (SPR) or Tree Bisection and Reconnection (TBR) neighborhoods of a gene tree to find
a topology that implies the fewest duplications. These algorithms improve on the current solutions by a factor of n for searching SPR neighborhoods and n^2
for searching TBR neighborhoods, where n is the number of vertices in the given gene tree. They provide a fast error correction protocol for gene trees, in which
we allow small gene tree rearrangements to improve the reconciliation cost. We tested the SPR tree rearrangement algorithm
on a collection of 1201 plant gene trees, and in every case, the SPR algorithm identified an alternate topology that implied
at least one fewer duplication. We also demonstrate a simple method to use the gene rearrangement algorithm to improve gene
tree parsimony phylogenetic analyses, which infer a species tree based on the gene duplication problem.
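
To make the duplication count concrete, here is a minimal Python sketch of the standard least-common-ancestor (LCA) mapping used to score a gene tree against a species tree. It is a naive illustration, not the fast SPR/TBR search algorithms the abstract describes. The nested-tuple tree encoding and the function names (`leaf_set`, `species_clusters`, `lca_cluster`, `count_duplications`) are assumptions made for this example, and each gene tree leaf is assumed to be labeled with the species it was sampled from.

```python
from typing import FrozenSet, List, Tuple, Union

# Hypothetical encoding: a leaf is a species name, an internal node is a pair.
Tree = Union[str, Tuple["Tree", "Tree"]]


def leaf_set(tree: Tree) -> FrozenSet[str]:
    """Return the set of species names below a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return leaf_set(tree[0]) | leaf_set(tree[1])


def species_clusters(species_tree: Tree) -> List[FrozenSet[str]]:
    """Collect the leaf set (cluster) of every node in the species tree."""
    clusters: List[FrozenSet[str]] = []

    def walk(node: Tree) -> None:
        clusters.append(leaf_set(node))
        if not isinstance(node, str):
            walk(node[0])
            walk(node[1])

    walk(species_tree)
    return clusters


def lca_cluster(clusters: List[FrozenSet[str]],
                species: FrozenSet[str]) -> FrozenSet[str]:
    """The smallest species-tree cluster containing `species` is its LCA."""
    return min((c for c in clusters if species <= c), key=len)


def count_duplications(gene_tree: Tree, species_tree: Tree) -> int:
    """Count gene tree nodes that map to the same species node as a child."""
    clusters = species_clusters(species_tree)
    duplications = 0

    def mapping(node: Tree) -> FrozenSet[str]:
        nonlocal duplications
        if isinstance(node, str):
            return lca_cluster(clusters, frozenset([node]))
        left = mapping(node[0])
        right = mapping(node[1])
        here = lca_cluster(clusters, left | right)
        if here == left or here == right:
            duplications += 1  # the node cannot be explained by speciation
        return here

    mapping(gene_tree)
    return duplications


species = (("A", "B"), "C")
print(count_duplications((("A", "B"), "C"), species))  # congruent gene tree
print(count_duplications((("A", "C"), "B"), species))  # one misplaced leaf
```

In this toy case the congruent gene tree implies no duplications, while the topology with one misplaced leaf implies one. That is the kind of inflation a single local rearrangement of an erroneous gene tree can remove.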
- Source available from: psu.edu
ABSTRACT: Gene tree and species tree reconstruction, orthology analysis, and reconciliation are important problems in multigenome-based comparative genomics and in biology in general. In the present paper, we advance the frontier of these areas in several respects and provide important computational tools. First, exact algorithms are given for several probabilistic reconciliation problems with respect to the probabilistic gene evolution model previously developed by the authors. Until now, those problems were solved by MCMC estimation algorithms. Second, we extend the gene evolution model to the gene sequence evolution model by including sequence evolution. Third, we develop MCMC algorithms for the gene sequence evolution model that, given gene sequence data, allow: (1) orthology analysis, reconciliation analysis, and gene tree reconstruction with respect to a species tree, balancing a likely/unlikely reconciliation against a likely/unlikely gene tree, and (2) species tree reconstruction that balances a likely/unlikely reconciliation against likely/unlikely gene trees. These MCMC algorithms take advantage of the exact algorithms for the gene evolution model. We have successfully tested our dynamic programming algorithms on real data for a biogeography problem. The MCMC algorithms perform very well on both synthetic and biological data.
Proceedings of the Eighth Annual International Conference on Computational Molecular Biology, 2004, San Diego, California, USA, March 27-31, 2004; 01/2004
- ABSTRACT: Comparative genomic studies are revealing frequent gains and losses of whole genes via duplication and pseudogenization. One commonly used method for inferring the number and timing of gene gains and losses reconciles the gene tree for each gene family with the species tree of the taxa considered. Recent studies using this approach have found a large number of ancient duplications and recent losses among vertebrate genomes. I show that tree reconciliation methods are biased when the inferred gene tree is not correct. This bias places duplicates towards the root of the tree and losses towards the tips of the tree. I demonstrate that this bias is present when tree reconciliation is conducted on both multiple mammal and Drosophila genomes, and that lower bootstrap cut-off values on gene trees lead to more extreme bias. I also suggest a method for dealing with reconciliation bias, although this method only corrects for the number of gene gains on some branches of the species tree. Based on the results presented, it is likely that most tree reconciliation analyses show biases, unless the gene trees used are exceptionally well-resolved and well-supported. These results cast doubt upon previous conclusions that vertebrate genome history has been marked by many ancient duplications and many recent gene losses.
Genome Biology 02/2007; 8(7):R141. · 10.30 Impact Factor
- Journal of Computational Biology. 01/2000; 7:429-447.