Efficient Algorithms for the Reconciliation Problem with Gene Duplication, Horizontal Transfer and Loss

Computer Science and Artificial Intelligence Laboratory, Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA.
Bioinformatics (Impact Factor: 4.98). 06/2012; 28(12):i283-91. DOI: 10.1093/bioinformatics/bts225
Source: PubMed


Gene family evolution is driven by evolutionary events such as speciation, gene duplication, horizontal gene transfer and gene loss, and inferring these events in the evolutionary history of a given gene family is a fundamental problem in comparative and evolutionary genomics with numerous important applications. Solving this problem requires the use of a reconciliation framework, where the input consists of a gene family phylogeny and the corresponding species phylogeny, and the goal is to reconcile the two by postulating speciation, gene duplication, horizontal gene transfer and gene loss events. This reconciliation problem is referred to as duplication-transfer-loss (DTL) reconciliation and has been extensively studied in the literature. Yet, even the fastest existing algorithms for DTL reconciliation are too slow for reconciling large gene families and for use in more sophisticated applications such as gene tree or species tree reconstruction.
We present two new algorithms for the DTL reconciliation problem that are dramatically faster than existing algorithms, both asymptotically and in practice. We also extend the standard DTL reconciliation model by considering distance-dependent transfer costs, which allow for more accurate reconciliation, and give an efficient algorithm for DTL reconciliation under this extended model. We implemented our new algorithms and demonstrated up to 100 000-fold speed-up over existing methods, using both simulated and biological datasets. This dramatic improvement makes it possible to use DTL reconciliation for performing rigorous evolutionary analyses of large gene families and enables its use in advanced reconciliation-based gene and species tree reconstruction methods.
Our programs can be freely downloaded from
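
To make the reconciliation framework described in the abstract concrete, the following is a minimal sketch of a naive dynamic program for the DTL reconciliation cost. It is not one of the faster algorithms the paper presents: it is the straightforward textbook-style recurrence, roughly O(mn^2) for a gene tree with m nodes and a species tree with n nodes. The tree encoding (nested tuples whose leaves are species names), the function names, and the default event costs are all assumptions made for illustration. Speciations are free, time-consistency of transfers is not checked, and distance-dependent transfer costs are not modelled.

```python
import math

def index_tree(tree):
    """Flatten a nested-tuple binary tree into postorder child and label lists."""
    children, labels = [], []
    def visit(node):
        if isinstance(node, str):          # leaf, labelled by species name
            children.append(None)
            labels.append(node)
        else:                              # internal node with exactly two children
            left, right = visit(node[0]), visit(node[1])
            children.append((left, right))
            labels.append(None)
        return len(children) - 1
    root = visit(tree)
    return children, labels, root

def descendant_distances(children):
    """For each node s, map every node x in the subtree of s to its edge distance from s."""
    dist = []
    for s, kids in enumerate(children):    # postorder: children are processed first
        d = {s: 0}
        if kids:
            for k in kids:
                for x, dx in dist[k].items():
                    d[x] = dx + 1
        dist.append(d)
    return dist

def dtl_cost(gene_tree, species_tree, dup=2, transfer=3, loss=1):
    """Minimum DTL reconciliation cost (hypothetical default costs, naive recurrence)."""
    s_children, s_labels, _ = index_tree(species_tree)
    g_children, g_labels, g_root = index_tree(gene_tree)
    n = len(s_children)
    below = descendant_distances(s_children)
    # Species nodes that are neither ancestors nor descendants of s (transfer recipients).
    incomparable = [[x for x in range(n) if x not in below[s] and s not in below[x]]
                    for s in range(n)]
    INF = math.inf
    c = [[INF] * n for _ in g_children]         # c[g][s]: g mapped exactly to s
    best_in = [[INF] * n for _ in g_children]   # g mapped within subtree of s, plus losses
    best_out = [[INF] * n for _ in g_children]  # g mapped to a node incomparable to s
    for g, kids in enumerate(g_children):
        for s in range(n):
            if kids is None:
                if s_labels[s] == g_labels[g]:
                    c[g][s] = 0
                continue
            a, b = kids
            options = [dup + best_in[a][s] + best_in[b][s]]                # duplication
            options.append(transfer + min(best_in[a][s] + best_out[b][s],  # transfer
                                          best_in[b][s] + best_out[a][s]))
            if s_children[s]:                                              # speciation
                l, r = s_children[s]
                options.append(min(best_in[a][l] + best_in[b][r],
                                   best_in[a][r] + best_in[b][l]))
            c[g][s] = min(options)
        for s in range(n):
            best_in[g][s] = min(c[g][x] + loss * d for x, d in below[s].items())
            best_out[g][s] = min((c[g][x] for x in incomparable[s]), default=INF)
    return min(c[g_root])
```

For example, with the species tree (("A", "B"), "C"), the congruent gene tree (("A", "B"), "C") reconciles at cost 0, while the gene tree (("A", "C"), "B") costs 3 under these illustrative defaults (a single transfer), or 5 if transfers are made expensive with transfer=10 (one duplication and three losses).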



Available from: Manolis Kellis
  • Source
    • "The selection of a random subset of mapping sites, Φ(p_i), requires an update to the Improved Node Mapping algorithm [Drinkwater and Charleston, 2014a], in particular, providing an adaptive data structure which allows for a random subset of size k to be retained for each node p_i, along with a method to procure the random subset at each iteration. This functionality has been integrated into the RASCAL algorithm as seen in Figure 2. Node Mapping algorithms have traditionally stored the minimum cost mapping sites in a two-dimensional matrix of size O(n^2) [Yodpinyanee et al., 2011, Bansal et al., 2012]. While still possible to use a two-dimensional matrix, this time of size O(kn), we have instead stored the sub solutions within an array of lists."
    ABSTRACT: A popular method for coevolutionary inference is cophylogenetic reconstruction where the branch lengths of the phylogenies have been previously derived. This approach, unlike the more generalized reconstruction techniques that are NP-Hard, can reconcile the shared evolutionary history of a pair of phylogenetic trees in polynomial time. This approach, while proven to be highly successful, requires a high polynomial running time. This is quickly becoming a limiting factor of this approach due to the continual increase in size of coevolutionary data sets. One existing method that combats this issue proposes a trade-off of accuracy for an asymptotic time complexity reduction. This technique converges on Pareto optimal solutions in linear time in almost 70% of cases. We build on this prior work by proposing an alternate linear time algorithm (RASCAL) that offers a significant accuracy increase: RASCAL converges on Pareto optimal solutions in 85% of cases and, unlike prior methods, can ensure, with high probability, that all optimal solutions can be recovered, provided sufficient replicates are performed.
    Full-text · Article · Feb 2016 · Journal of Computational Biology
  • Source
    • "Most studies aimed at evaluating the role of gene transfer using phylogenetic approaches have tried to circumvent the problem of duplications and loss of genes by focusing on genes that are present in at most one copy in each genome (Beiko et al. 2005; Than et al. 2008; Abby et al. 2010, 2012; Puigbò et al. 2010). Only recently, new methods have been developed that can sort out the role of duplication, transfer, and loss in gene histories (Bansal et al. 2012; Szöllősi et al. 2013b; Sjöstrand et al. 2014). A crucial ingredient of any phylogenetic method that aims at detecting gene transfer is the ability to account for phylogenetic uncertainty."
    ABSTRACT: Microbes acquire DNA from a variety of sources. The last decades, which have seen the development of genome sequencing, have revealed that horizontal gene transfer has been a major evolutionary force that has constantly reshaped genomes throughout evolution. However, because the history of life must ultimately be deduced from gene phylogenies, the lack of methods to account for horizontal gene transfer has thrown into confusion the very concept of the tree of life. As a result, many questions remain open, but emerging methodological developments promise to use information conveyed by horizontal gene transfer that remains unexploited today.
    Full-text · Article · Jan 2016 · Cold Spring Harbor perspectives in biology
  • Source
    • "Reconciliations are computed with an implementation of the ILP approach and compared with the results of Jane 4 [14], TreeMap 3b [2], NOTUNG 2.8 Beta [13], and Ranger-DTL [12]. For all tools the same simulated data sets were reconciled using the respective default parameters. "
    ABSTRACT: In this paper we present an integer linear programming (ILP) approach, called CoRe-ILP, for finding an optimal time consistent cophylogenetic host-parasite reconciliation under the cophylogenetic event model with the events cospeciation, duplication, sorting, host switch, and failure to diverge. Instead of assuming event costs, a simplified model is used, maximizing primarily for cospeciations and secondarily minimizing host switching events. Duplications, sortings, and failure to diverge events are not explicitly scored. Different from existing event based reconciliation methods, CoRe-ILP can use (approximate) phylogenetic branch lengths for filtering possible ancestral host-parasite interactions. Experimentally, it is shown that CoRe-ILP can successfully use branch length information and performs well for biological and simulated data sets. The results of CoRe-ILP are compared with the results of the reconciliation tools Jane 4, TreeMap 3b, NOTUNG 2.8 Beta, and Ranger-DTL. Algorithm CoRe-ILP is implemented using IBM ILOG CPLEX™ Optimizer 12.6 and is freely available from
    Full-text · Article · Dec 2015 · IEEE/ACM Transactions on Computational Biology and Bioinformatics