Approximate Likelihood-Ratio Test for Branches: A Fast, Accurate, and Powerful Alternative

Equipe Méthodes et Algorithmes pour la Bioinformatique LIRMM-CNRS, Université Montpellier II, Montpellier 34392, France.
Systematic Biology (Impact Factor: 14.39). 09/2006; 55(4):539-52. DOI: 10.1080/10635150600755453
Source: PubMed


We revisit statistical tests for branches of evolutionary trees reconstructed upon molecular data. A new, fast, approximate likelihood-ratio test (aLRT) for branches is presented here as a competitive alternative to nonparametric bootstrap and Bayesian estimation of branch support. The aLRT is based on the idea of the conventional LRT, with the null hypothesis corresponding to the assumption that the inferred branch has length 0. We show that the LRT statistic is asymptotically distributed as a maximum of three random variables drawn from the χ²₀ + χ²₁ distribution. The new aLRT of interior branch uses this distribution for significance testing, but the test statistic is approximated in a slightly conservative but practical way as 2(l₁ − l₂), i.e., double the difference between the maximum log-likelihood values corresponding to the best tree and the second best topological arrangement around the branch of interest. Such a test is fast because the log-likelihood value l₂ is computed by optimizing only over the branch of interest and the four adjacent branches, whereas other parameters are fixed at their optimal values corresponding to the best ML tree. The performance of the new test was studied on simulated 4-, 12-, and 100-taxon data sets with sequences of different lengths. The aLRT is shown to be accurate, powerful, and robust to certain violations of model assumptions. The aLRT is implemented within the algorithm used by the recent fast maximum likelihood tree estimation program PHYML (Guindon and Gascuel, 2003).
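The significance calculation described in the abstract can be illustrated with a minimal Python sketch. It is not the PHYML implementation. It rests on two assumptions that the abstract does not spell out: that the chi-square mixture is an equal-weight (50:50) mixture of a point mass at zero (chi-square with 0 degrees of freedom) and a chi-square with 1 degree of freedom, and that the three variables whose maximum is taken are independent. The function names are hypothetical.

```python
import math


def mixture_cdf(x):
    """CDF of an assumed 50:50 mixture of chi-square(0 df) and chi-square(1 df).

    chi-square(0 df) is a point mass at 0; the chi-square(1 df) CDF is
    erf(sqrt(x / 2)).
    """
    if x < 0:
        return 0.0
    return 0.5 + 0.5 * math.erf(math.sqrt(x / 2.0))


def alrt_statistic(l1, l2):
    """Approximate statistic 2(l1 - l2), clipped at 0.

    l1: log-likelihood of the best tree.
    l2: log-likelihood of the second-best arrangement around the branch.
    """
    return max(0.0, 2.0 * (l1 - l2))


def alrt_p_value(l1, l2):
    """P-value under the maximum of three draws from the mixture.

    Assumption for this sketch: the three draws are independent, so the
    CDF of their maximum is the mixture CDF cubed.
    """
    stat = alrt_statistic(l1, l2)
    return 1.0 - mixture_cdf(stat) ** 3


if __name__ == "__main__":
    # Hypothetical log-likelihood values, for illustration only.
    print(alrt_p_value(-1000.0, -1005.0))
```

A larger gap between l1 and l2 gives a smaller p-value, that is, stronger support for the branch. For small tail probabilities the cubed-CDF form is nearly identical to a Bonferroni-style bound of three times the single-variable tail probability.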



Available from: Maria Anisimova
    • "(Melolonthinae: Ablaberini), while in the analyses of dung beetles, water beetles, mayflies and butterflies the trees were rooted at the most basal ingroup node established in the analysis of Monaghan et al. (2009) whose outgroups were not included here. We measured branch support using the approximate likelihood ratio test (aLRT) as implemented in PhyML v.3.0 (Anisimova and Gascuel 2006). The tree of unique haplotypes was made ultrametric using Pathd8 (Britton et al. 2007) assigning the root an arbitrary age of one. "
    ABSTRACT: DNA-based species delimitation may be compromised by limited sampling effort and species rarity, including ‘singleton’ representatives of species, which hampers estimates of intra- vs. interspecies evolutionary processes. In a case study of southern African chafers (beetles in the family Scarabaeidae) many species and subclades were poorly represented and 48.5% of species were singletons. Using cox1 sequences from >500 specimens and ~100 species, the Generalized Mixed Yule Coalescent (GMYC) analysis as well as various other approaches for DNA-based species delimitation (ABGD, PTP, Species Identifier, Statistical Parsimony), frequently produced poor results if analyzing a narrow target group only, but the performance improved when several subclades were combined. Hence, low sampling may be compensated for by “clade addition” of lineages outside of the focal group. Similar findings were obtained in reanalysis of published datasets of taxonomically poorly known species assemblages of insects from Madagascar. The low performance of undersampled trees is not due to high proportions of singletons per se, as shown in simulations (with 13%, 40% and 52% singletons). However, the GMYC method was highly sensitive to variable effective population size (Ne), which was exacerbated by variable species abundances in the simulations. Hence, low sampling success and rarity of species affect the power of the GMYC method only if they reflect great differences in Ne among species. Potential negative effects of skewed species abundances and prevalence of singletons are ultimately an issue about the variation in Ne and the degree to which this is correlated with the census population size and sampling success. Clade addition beyond a limited study group can overcome poor sampling for the GMYC method in particular under variable Ne. This effect was less pronounced for methods of species delimitation not based on coalescent models.
    Full-text · Article · Jan 2016 · Systematic Biology
    • "Convergence was determined by verifying that the standard deviations of split frequencies approached zero and that there was no obvious trend in the log likelihood plot. The ML analyses were run under the GTR+G+I model with a BioNJ starting tree, the best of NNI (nearest neighbour interchange) and SPR (subtree pruning and regrafting) tree improvement and aLRT SH-like branch support (approximate likelihood ratio test with Shimodaira-Hasegawa-like branch support) (Anisimova & Gascuel, 2006). "
    ABSTRACT: Retinitis pigmentosa (RP) comprises several heritable diseases that involve photoreceptor, and ultimately retinal, degeneration. Currently, mutations in over 50 genes have known links to RP. Despite advances in clinical characterization, molecular characterization of RP remains challenging due to the heterogeneous nature of causal genes, mutations, and clinical phenotypes. In this study, we compiled large datasets of two important visual genes associated with RP: rhodopsin, which initiates the phototransduction cascade, and the retinoid isomerase RPE65, which regenerates the visual cycle. We used a comparative evolutionary approach to investigate the relationship between interspecific sequence variation and pathogenic mutations that lead to degenerative retinal disease. Using codon-based likelihood methods, we estimated evolutionary rates (dN/dS) across both genes in a phylogenetic context to investigate differences between pathogenic and nonpathogenic amino acid sites. In both genes, disease-associated sites showed significantly lower evolutionary rates compared to nondisease sites, and were more likely to occur in functionally critical areas of the proteins. The nature of the dataset (e.g., vertebrate or mammalian sequences), as well as selection of pathogenic sites, affected the differences observed between pathogenic and nonpathogenic sites. Our results illustrate that these methods can serve as an intermediate step in understanding protein structure and function in a clinical context, particularly in predicting the relative pathogenicity (i.e., functional impact) of point mutations and their downstream phenotypic effects. Extensions of this approach may also contribute to current methods for predicting the deleterious effects of candidate mutations and to the identification of protein regions under strong constraint where we expect pathogenic mutations to occur.
    Full-text · Article · Jan 2016 · Visual Neuroscience
    • "means of Shimodaira-Hasegawa approximate likelihood ratio tests (SH-like; Anisimova and Gascuel, 2006) and 1000 bootstrap replicates (Felsenstein, 1985). "
    ABSTRACT: Matudaea is the only genus of the Hamamelidaceae found in South America. The genus comprises two extant species, M. trinervia, from Mexico and Costa Rica, and Matudaea colombiana, from the Colombian Andes; additional fossil records are present in Central Europe. Population genetics, molecular phylogenetics and niche modelling approaches were applied to explain processes related to the trans-Panamanian M. trinervia/M. colombiana split and the putative colonization of the northern Andes by the latter. The split between the two Matudaea species was dated to the Middle Miocene. The colonization of South America by Matudaea could have been facilitated by the closure of the Isthmus of Panama and the global decrease in temperature during the Miocene. Five haplotypes of M. colombiana were identified, which show an eastward decline in genetic diversity and suggest a founder effect in the colonization of the Eastern Cordillera of the Colombian Andes. We detected a niche conservatism signal between the two Matudaea species related to the Temperature of Coldest Month and Mean Temperature of Driest Quarter bioclimatic variables; this signal might be related to the narrow altitudinal range occupied by the two species.
    Full-text · Article · Dec 2015 · Biochemical Systematics and Ecology