Choosing Appropriate Substitution Models for the Phylogenetic Analysis of Protein-Coding Sequences

University of Oxford, Oxford, England, United Kingdom
Molecular Biology and Evolution (Impact Factor: 9.11). 02/2006; 23(1):7-9. DOI: 10.1093/molbev/msj021
Source: PubMed


Although phylogenetic inference of protein-coding sequences continues to dominate the literature, few analyses incorporate evolutionary models that consider the genetic code. This problem is exacerbated by the exclusion of codon-based models from commonly employed model selection techniques, presumably due to the computational cost associated with codon models. We investigated an efficient alternative to standard nucleotide substitution models, in which codon position (CP) is incorporated into the model. We determined the most appropriate model for alignments of 177 RNA virus genes and 106 yeast genes, using 11 substitution models including one codon model and four CP models. The majority of analyzed gene alignments are best described by CP substitution models, rather than by standard nucleotide models, and without the computational cost of full codon models. These results have significant implications for phylogenetic inference of coding sequences as they make it clear that substitution models incorporating CPs not only are a computationally realistic alternative to standard models but may also frequently be statistically superior.
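As a rough illustration of what incorporating codon position into a nucleotide model involves, the sketch below splits an in-frame coding sequence into codon-position partitions, each of which could then be given its own rate or substitution parameters. The abstract does not spell out the four CP models, so the two schemes shown here (positions 1+2 linked with position 3 separate, or all three positions separate) are assumptions, and the function name `split_by_codon_position` is hypothetical.

```python
def split_by_codon_position(seq, link_first_two=True):
    """Partition an in-frame coding sequence by codon position.

    Assumes `seq` starts at codon position 1 and has a length that is a
    multiple of three. This is an illustrative helper, not part of any
    phylogenetics package.
    """
    if len(seq) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")

    # Every third site, starting at offsets 0, 1 and 2.
    cp1, cp2, cp3 = seq[0::3], seq[1::3], seq[2::3]

    if link_first_two:
        # Interleave positions 1 and 2 so the original site order is kept.
        cp12 = "".join(a + b for a, b in zip(cp1, cp2))
        return {"cp12": cp12, "cp3": cp3}
    return {"cp1": cp1, "cp2": cp2, "cp3": cp3}


if __name__ == "__main__":
    # Three codons: ATG GCC TAA
    print(split_by_codon_position("ATGGCCTAA"))
    print(split_by_codon_position("ATGGCCTAA", link_first_two=False))
```

In practice the same slicing would be applied to every row of an alignment, and the resulting partitions passed to whichever inference program is being used.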



Available from: Alexei J Drummond, Nov 25, 2015
    • "ddarriba/jmodeltest2; Posada, 2008) to compare substitution models based on the Bayesian Information Criterion (Alizon and Fraser 2013); the resulting substitution models were the Tamura-Nei (TrN) model (lp-fragment) and the Hasegawa, Kishino and Yano (HKY) model (vp3- and RdRp-fragments), both with Gamma variation. For the DWV-subtype data set, we compared the general time-reversible model and the SRD06 model (Shapiro et al., 2006) using path sampling. We partitioned substitution rates between the first plus second codon positions and the third codon position as, for all data sets, the third codon position had a significantly higher rate."
    ABSTRACT: Treatment of emerging RNA viruses is hampered by the high mutation and replication rates that enable these viruses to operate as a quasispecies. Declining honey bee populations have been attributed to the ectoparasitic mite Varroa destructor and its affiliation with Deformed Wing Virus (DWV). In the current study we use next-generation sequencing to investigate the DWV quasispecies in an apiary known to suffer from overwintering colony losses. We show that the DWV species complex is made up of three master variants. Our results indicate that a new DWV Type C variant is distinct from the previously described types A and B, but together they form a distinct clade compared with other members of the Iflaviridae. The molecular clock estimation predicts that Type C diverged from the other variants ~319 years ago. The discovery of a new master variant of DWV has important implications for the positive identification of the true pathogen within global honey bee populations.
    Article · Dec 2015 · The ISME Journal
    • "A two-partition (first and second codon position linked; third position separate) HKY (Hasegawa, Kishino and Yano) (+Γ) nucleotide substitution model was used for this, as recommended for protein coding sequence data (Shapiro et al. 2006). Three replicate runs of 100 million generations were performed using a constant-size population model and with a sampling frequency that provided a total of 10,000 samples for each run. "
    ABSTRACT: The European snow vole (Chionomys nivalis) is a microtine rodent with a highly fragmented distribution range, mostly associated with the main mountain systems from southern Europe to Turkmenistan. In this paper we confirm the occurrence of the snow vole in Portugal, based on morphological characteristics, biometrics and genetic analysis of two individuals captured in the Montesinho Mountain range (northeastern Portugal). Both mitochondrial and nuclear genetic markers were used to confirm the species identity. The analysis of cytochrome b supports previous conclusions on the phylogeographic structure of the species, revealing the existence of several distinct lineages. Moreover, it shows that the Portuguese specimens are closely related to the other Iberian populations. This finding is of great interest as it adds new information regarding the spatial distribution of the snow vole, by redefining the southwestern limits of the species’ range, and it highlights the need for accurate assessment of regional small mammal population trends and conservation status.
    Article · Nov 2015 · Italian Journal of Zoology
    • "Phylogenetic trees were also reconstructed with Bayesian inference methods using BEAST v1.8.2 (Drummond et al., 2012). The SRD06 nucleotide substitution model was used in all simulations, as this model is recognized to provide better resolution for coding regions in Bayesian analysis (Shapiro et al., 2006). The posterior distribution of trees was summarized from Bayesian Markov chain Monte Carlo runs; chain lengths were 30 million generations, sampled every 1000 generations, of which the first 10% were discarded as burn-in."
    ABSTRACT: Taiwan had been declared rabies-free in humans and domestic animals for five decades until July 2013, when surprisingly, three Formosan ferret badgers (FB) were diagnosed with rabies. Since then, a variety of wild carnivores and other wildlife species have been found dead, neurologically ill, or exhibiting aggressive behaviors around the island. To determine the affected animal species, geographic areas, and environments, animal bodies were examined for rabies by direct fluorescent antibody test (FAT). The viral genomes from the brains of selected rabid animals were sequenced for the phylogeny of rabies viruses (RABV). Out of a total of 1016 wild carnivores, 276/831 (33.2%) Formosan FBs were FAT positive, with occasional biting incidents in 1 dog and suspected spillover in 1 house shrew. All other animals tested, including dogs, cats, bats, mice, house shrews, and squirrels, were rabies-negative. The rabies was badger-associated and confined to nine counties/cities in sylvatic environments. Phylogeny of nucleoprotein and glycoprotein genes from 59 Formosan FB-associated RABV revealed them to be clustered in two distinct groups, TWI and TWII, consistent with the geographic segregation into western and eastern Taiwan provided by the Central Mountain Range and into northern rabies-free and central-southern rabies-affected regions by a river bisecting western Taiwan. The unique features of geographic and genetic segregation, sylvatic enzooticity, and FB-association of RABV suggest a logical strategy for the control of rabies in this nation.
    Article · Nov 2015