Efficient Algorithms for the Reconciliation Problem with Gene Duplication, Horizontal Transfer and Loss

Computer Science and Artificial Intelligence Laboratory, Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA.
Bioinformatics (Impact Factor: 4.98). 06/2012; 28(12):i283-91. DOI: 10.1093/bioinformatics/bts225
Source: PubMed


Gene family evolution is driven by evolutionary events such as speciation, gene duplication, horizontal gene transfer and gene loss, and inferring these events in the evolutionary history of a given gene family is a fundamental problem in comparative and evolutionary genomics with numerous important applications. Solving this problem requires the use of a reconciliation framework, where the input consists of a gene family phylogeny and the corresponding species phylogeny, and the goal is to reconcile the two by postulating speciation, gene duplication, horizontal gene transfer and gene loss events. This reconciliation problem is referred to as duplication-transfer-loss (DTL) reconciliation and has been extensively studied in the literature. Yet, even the fastest existing algorithms for DTL reconciliation are too slow for reconciling large gene families and for use in more sophisticated applications such as gene tree or species tree reconstruction.
We present two new algorithms for the DTL reconciliation problem that are dramatically faster than existing algorithms, both asymptotically and in practice. We also extend the standard DTL reconciliation model by considering distance-dependent transfer costs, which allow for more accurate reconciliation, and we give an efficient algorithm for DTL reconciliation under this extended model. We implemented our new algorithms and demonstrated up to 100 000-fold speed-up over existing methods, using both simulated and biological datasets. This dramatic improvement makes it possible to use DTL reconciliation for performing rigorous evolutionary analyses of large gene families and enables its use in advanced reconciliation-based gene and species tree reconstruction methods.
Our programs can be freely downloaded from
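
To make the reconciliation framework described in the abstract concrete, the following is a minimal sketch of a naive dynamic program for the undated DTL parsimony cost. It is not the faster algorithms the paper presents, and it ignores time-consistency of transfers. The nested-tuple tree encoding, the function names (dtl_cost, subtrees, below_with_distance) and the default event costs are assumptions chosen for illustration only.

```python
import math
from functools import lru_cache

def subtrees(t):
    """Yield every node of a binary tree given as nested 2-tuples with string leaves."""
    yield t
    if isinstance(t, tuple):
        for child in t:
            yield from subtrees(child)

def below_with_distance(s, d=0):
    """Yield (node, edge distance from s) for s and all of its descendants."""
    yield s, d
    if isinstance(s, tuple):
        for child in s:
            yield from below_with_distance(child, d + 1)

def dtl_cost(gene_tree, species_tree, leaf_map, dup=2.0, transfer=3.0, loss=1.0):
    """Minimum total event cost of reconciling gene_tree with species_tree.

    leaf_map sends each gene leaf name to its species leaf name.
    The event costs are illustrative defaults, not values from the paper.
    """
    species_nodes = list(subtrees(species_tree))
    below = {s: list(below_with_distance(s)) for s in species_nodes}
    desc = {s: {x for x, _ in below[s]} for s in species_nodes}

    @lru_cache(maxsize=None)
    def cost(g, s):
        # Cheapest reconciliation of the gene subtree g with g mapped exactly to s.
        if not isinstance(g, tuple):
            return 0.0 if leaf_map[g] == s else math.inf
        g1, g2 = g
        best = math.inf
        if isinstance(s, tuple):  # speciation: the children go to different sides of s
            s1, s2 = s
            best = min(inside(g1, s1) + inside(g2, s2),
                       inside(g1, s2) + inside(g2, s1))
        # duplication: both children stay within the subtree of s
        best = min(best, dup + inside(g1, s) + inside(g2, s))
        # transfer: one child stays, the other jumps to an incomparable species
        best = min(best, transfer + min(inside(g1, s) + outside(g2, s),
                                        inside(g2, s) + outside(g1, s)))
        return best

    @lru_cache(maxsize=None)
    def inside(g, s):
        # g maps to s or a descendant x; each edge walked down costs one loss.
        return min(cost(g, x) + loss * d for x, d in below[s])

    @lru_cache(maxsize=None)
    def outside(g, s):
        # g maps to a species node that is neither an ancestor nor a descendant of s.
        return min((cost(g, x) for x in species_nodes
                    if x not in desc[s] and s not in desc[x]), default=math.inf)

    return min(cost(gene_tree, s) for s in species_nodes)

species = (("A", "B"), "C")
genes = (("a", "c"), "b")
mapping = {"a": "A", "b": "B", "c": "C"}
print(dtl_cost(genes, species, mapping))                  # one transfer is cheapest
print(dtl_cost(genes, species, mapping, transfer=100.0))  # duplication plus losses wins
```

With the illustrative costs, the incongruent gene tree is explained by a single transfer; making transfers expensive switches the optimum to one duplication and three losses. This sketch revisits every species node for every gene node, which is the kind of cost the paper's algorithms are designed to avoid.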



Available from: Manolis Kellis, Oct 01, 2015
  • Source
    • "Reconciliations are computed with an implementation of the ILP approach and compared with the results of Jane 4 [14], TreeMap 3b [2], NOTUNG 2.8 Beta [13], and Ranger-DTL [12]. For all tools the same simulated data sets were reconciled using the respective default parameters. "
    ABSTRACT: In this paper we present an integer linear programming (ILP) approach, called CoRe-ILP, for finding an optimal time-consistent cophylogenetic host-parasite reconciliation under the cophylogenetic event model with the events cospeciation, duplication, sorting, host switch, and failure to diverge. Instead of assuming event costs, a simplified model is used that primarily maximizes cospeciations and secondarily minimizes host switching events. Duplications, sortings, and failure to diverge events are not explicitly scored. Unlike existing event-based reconciliation methods, CoRe-ILP can use (approximate) phylogenetic branch lengths for filtering possible ancestral host-parasite interactions. Experimentally, it is shown that CoRe-ILP can successfully use branch length information and performs well for biological and simulated data sets. The results of CoRe-ILP are compared with the results of the reconciliation tools Jane 4, TreeMap 3b, NOTUNG 2.8 Beta, and Ranger-DTL. Algorithm CoRe-ILP is implemented using IBM ILOG CPLEX™ Optimizer 12.6 and is freely available from
    IEEE/ACM Transactions on Computational Biology and Bioinformatics 01/2015; DOI:10.1109/TCBB.2015.2430336 · 1.44 Impact Factor
  • Source
    • "Some parsimony methods (e.g. Bansal et al., 2012) do not need information on the order of speciations in time. This allows a more efficient recursion over reconciliations, but at the cost of considering reconciliations that contain transfer events that are not consistent with any ordering of the species tree (Tofigh et al., 2011). "
    ABSTRACT: Motivation: Traditionally, gene phylogenies have been reconstructed solely on the basis of molecular sequences; this, however, often does not provide enough information to distinguish between statistically equivalent relationships. To address this problem, several recent methods have incorporated information on the species phylogeny in gene tree reconstruction, leading to dramatic improvements in accuracy. Probabilistic methods are able to estimate all model parameters but are computationally expensive, whereas parsimony methods, which are generally computationally more efficient, require a prior estimate of parameters and of the statistical support. Results: Here, we present the Tree Estimation using Reconciliation (TERA) algorithm, a parsimony-based, species-tree-aware method for gene tree reconstruction based on a scoring scheme combining duplication, transfer and loss costs with an estimate of the sequence likelihood. TERA explores all reconciled gene trees that can be amalgamated from a sample of gene trees. Using a large-scale simulated dataset, we demonstrate that TERA achieves the same accuracy as the corresponding probabilistic method while being faster, and outperforms other parsimony-based methods in both accuracy and speed. Running TERA on a set of 1099 homologous gene families from complete cyanobacterial genomes, we find that incorporating knowledge of the species tree results in a two-thirds reduction in the number of apparent transfer events.
    Bioinformatics 11/2014; 31(6). DOI:10.1093/bioinformatics/btu728 · 4.98 Impact Factor
  • Source
    • "...the potential inaccuracies of supermatrix topologies (Edwards et al., 2007). Most of these methods imply only incomplete lineage sorting as the source of gene tree incongruence (Nakhleh, 2013). Although a few recent reconciliation approaches incorporate hybridization as well as deep coalescence and gene duplication in their algorithms (Bansal et al., 2012; Stolzer et al., 2012; Yu et al., 2012, 2013b), most of these require a species tree topology in addition to gene trees as input. Supermatrix and species tree estimation approaches are completely congruent in supporting the three subtribes in the tribe Cocoseae with good support (Figs 1 and 2): Attaleinae, Bactridinae, "
    ABSTRACT: Arecaceae tribe Cocoseae is the most economically important tribe of palms, including both coconut and African oil palm. It is mostly represented in the Neotropics, with one and two genera endemic to South Africa and Madagascar, respectively. Using primers for six single-copy WRKY gene family loci, we amplified DNA from 96 samples representing all genera of the palm tribe Cocoseae as well as outgroup tribes Reinhardtieae and Roystoneae. We compared parsimony (MP), maximum likelihood (ML), and Bayesian (B) analysis of the supermatrix with three species-tree estimation approaches. Subtribe Elaeidinae is sister to the Bactridinae in all analyses. Within subtribe Attaleinae, Lytocaryum, previously nested in Syagrus, is now positioned by MP and ML as sister to the former, with high support; B maintains Lytocaryum embedded within Syagrus. Both MP and ML resolve Cocos as sister to Syagrus; B positions Cocos as sister to Attalea. Bactridinae is composed of two sister clades, Bactris and Desmoncus in one, for which there is morphological support, and a second comprising Acrocomia, Astrocaryum, and Aiphanes. Two B and one ML gene tree-species estimation approaches are incongruent with the supermatrix in a few critical intergeneric clades, but resolve the same infrageneric relationships. The biogeographic history of the Cocoseae is dominated by dispersal events. The tribe originated in the late Cretaceous in South America. Evaluated together, the supermatrix and species tree analyses presented in this paper provide the most accurate picture of the evolutionary history of the tribe to date, with more congruence than incongruence among the various methodologies.
    Cladistics 10/2014; 31(5). DOI:10.1111/cla.12100 · 6.22 Impact Factor