Choosing Appropriate Substitution Models for the Phylogenetic Analysis of Protein-Coding Sequences

University of Oxford, Oxford, England, United Kingdom
Molecular Biology and Evolution (Impact Factor: 9.11). 02/2006; 23(1):7-9. DOI: 10.1093/molbev/msj021
Source: PubMed


Although phylogenetic inference of protein-coding sequences continues to dominate the literature, few analyses incorporate evolutionary models that consider the genetic code. This problem is exacerbated by the exclusion of codon-based models from commonly employed model selection techniques, presumably due to the computational cost associated with codon models. We investigated an efficient alternative to standard nucleotide substitution models, in which codon position (CP) is incorporated into the model. We determined the most appropriate model for alignments of 177 RNA virus genes and 106 yeast genes, using 11 substitution models including one codon model and four CP models. The majority of analyzed gene alignments are best described by CP substitution models, rather than by standard nucleotide models, and without the computational cost of full codon models. These results have significant implications for phylogenetic inference of coding sequences as they make it clear that substitution models incorporating CPs not only are a computationally realistic alternative to standard models but may also frequently be statistically superior.
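Choosing "the most appropriate model" among several candidates usually means comparing each model's maximized log-likelihood penalized by its number of free parameters. The abstract does not say which criterion was used, so the sketch below assumes the two common choices, AIC (2k − 2 lnL) and BIC (k ln n − 2 lnL, with n the alignment length). The model labels, log-likelihoods and parameter counts are hypothetical placeholders, not results from this study.

```python
import math
from dataclasses import dataclass


@dataclass
class ModelFit:
    name: str      # label for a candidate substitution model (illustrative)
    log_lik: float # maximized log-likelihood of the alignment under the model
    n_params: int  # number of free model parameters, k


def aic(fit: ModelFit) -> float:
    """Akaike information criterion: 2k - 2 lnL (lower is better)."""
    return 2 * fit.n_params - 2 * fit.log_lik


def bic(fit: ModelFit, n_sites: int) -> float:
    """Bayesian information criterion: k ln(n) - 2 lnL, n = alignment length."""
    return fit.n_params * math.log(n_sites) - 2 * fit.log_lik


def best_model(fits, n_sites=None):
    """Return the fit with the lowest AIC, or the lowest BIC if n_sites is given."""
    if n_sites is None:
        return min(fits, key=aic)
    return min(fits, key=lambda f: bic(f, n_sites))


# Hypothetical fits for one gene alignment (NOT values from the paper):
# a standard nucleotide model versus two codon-position (CP) partition schemes.
EXAMPLE_FITS = [
    ModelFit("GTR+G", -5000.0, 9),
    ModelFit("CP 1+2/3, HKY+G", -4950.0, 11),
    ModelFit("CP 1/2/3, HKY+G", -4945.0, 17),
]

if __name__ == "__main__":
    for fit in EXAMPLE_FITS:
        print(f"{fit.name:18s} AIC={aic(fit):9.2f} BIC={bic(fit, 1500):9.2f}")
    print("best by AIC:", best_model(EXAMPLE_FITS).name)
    print("best by BIC:", best_model(EXAMPLE_FITS, n_sites=1500).name)
```

The penalty term is what lets a richer CP model win only when its gain in log-likelihood outweighs its extra parameters; BIC penalizes extra parameters more heavily than AIC once n exceeds about 8 sites.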

    • "A two-partition (first and second codon position linked; third position separate) HKY (Hasegawa, Kishino and Yano) (+Γ) nucleotide substitution model was used for this, as recommended for protein coding sequence data (Shapiro et al. 2006). Three replicate runs of 100 million generations were performed using a constant-size population model and with a sampling frequency that provided a total of 10,000 samples for each run. "
    ABSTRACT: The European snow vole (Chionomys nivalis) is a microtine rodent with a highly fragmented distribution range, mostly associated with the main mountain systems from southern Europe to Turkmenistan. In this paper we confirm the occurrence of the snow vole in Portugal, based on morphological characteristics, biometrics and genetic analysis of two individuals captured in the Montesinho Mountain range (northeastern Portugal). Both mitochondrial and nuclear genetic markers were used to confirm the species identity. The analysis of cytochrome b supports previous conclusions on the phylogeographic structure of the species, revealing the existence of several distinct lineages. Moreover, it shows that the Portuguese specimens are closely related to the other Iberian populations. This finding is of great interest as it adds new information regarding the spatial distribution of the snow vole, by redefining the southwestern limits of the species’ range, and it highlights the need for accurate assessment of regional small mammal population trends and conservation status.
    Italian Journal of Zoology 11/2015; DOI:10.1080/11250003.2015.1103320 · 0.79 Impact Factor
    • "… implemented in BEAST (Drummond et al. 2012). The nucleotide substitution process was modeled by separately partitioning the codon positions into 1st + 2nd and 3rd positions (Shapiro et al. 2006) and applying a separate general time-reversible substitution model with gamma-distributed rate heterogeneity and a proportion of invariant sites (GTR + I + gamma) (Tavaré 1986), under an uncorrelated lognormal relaxed molecular clock to account for variation in rates of evolution among lineages (Drummond et al. 2006). We specified a Bayesian Skygrid coalescent tree prior that allows the population size to be estimated through time from a single or multiple unlinked genetic loci (Gill et al. 2013). "
    ABSTRACT: Rotaviruses are the most important etiological agent of acute gastroenteritis in young children worldwide. Among the first countries to introduce rotavirus vaccines into their national immunization programs were Belgium (November 2006) and Australia (July 2007). Surveillance programs in Belgium (since 1999) and Australia (since 1989) offer the opportunity to perform a detailed comparison of rotavirus strains circulating before and after vaccine introduction. G1P[8] rotaviruses are the most prominent genotype in humans, and a total of 157 G1P[8] rotaviruses isolated between 1999 and 2011 were selected from Belgium and Australia and their complete genomes were sequenced. Phylogenetic analysis showed evidence of frequent reassortment among Belgian and Australian G1P[8] rotaviruses. Although many different phylogenetic subclusters were present before and after vaccine introduction, some unique clusters were only identified after vaccine introduction, which could be due to natural fluctuation or the first signs of vaccine-driven evolution. The times to the most recent common ancestors for the Belgian and Australian G1P[8] rotaviruses ranged from 1846 to 1955 depending on the gene segment, with VP7 and NSP4 yielding the most recent estimates. We found no evidence that rotavirus population size was affected after vaccine introduction, and only six amino acid sites in VP2, VP3, VP7 and NSP1 were identified as being under positive selective pressure. Continued surveillance of G1P[8] strains is needed to determine the long-term effects of vaccine introduction, particularly now that rotavirus vaccines are implemented in the national immunization programs of an increasing number of countries worldwide. © The Author(s) 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
    Genome Biology and Evolution 08/2015; 7(9). DOI:10.1093/gbe/evv157 · 4.23 Impact Factor
    • "Preliminary analysis using Path-O-Gen (…software/pathogen/) indicated that the temporal signal of each subgenotype dataset was limited (D2: R² = 0.233; D3: R² = 0.16). To maximize the temporal signal, each subgenotype was allowed to have an independent tree, while sharing the same underlying substitution model [16], which allows for different rates at the 1st + 2nd and 3rd codon positions. The uncorrelated lognormal molecular clock model was applied to take into account rate variation along the tree branches [17]. "
    ABSTRACT: Background: Hepatitis B virus (HBV) has been classified into eight genotypes and forty subgenotypes. Genotype D is the most widely distributed HBV genotype worldwide, and subgenotype D1 has been isolated from Iranian patients. Objective: To characterize, for the first time, complete genomes of recently emerged non-D1 strains in Iran. Study design: Complete HBV genomes isolated from 9 Iranian HBV carriers were sequenced. The diversity of each ORF was mapped and evolutionary relationships were investigated. Results: Phylogenetic analysis identified four strains of subgenotype D2 and five of subgenotype D3 among the studied patients. Of note, the D2 strains clustered with strains from Lebanon and Syria. The time to the most recent common ancestor (TMRCA) of the first D2 cluster was dated to 1953 (BCI=1926, 1976), while that of the second cluster was dated to 1947 (BCI=1911, 1978). All five Iranian D3 strains formed a monophyletic cluster with an Indian strain and dated back to 1967 (BCI=1946, 1987). Surprisingly, two D3 strains had an adw2 subtype. Interestingly, more than 80% of the strains studied showed precore mutations, while two isolates carried basal core promoter variants. Conclusion: Iranian D2 and D3 isolates were introduced into Iran on at least two occasions and one occasion, respectively, and diverged from west and south Asian HBV strains, respectively. Considering the impact of the different (sub)genotypes on clinical outcome, exploring the distinct mutational patterns of Iranian D1 and non-D1 strains is of clinical importance.
    Journal of Clinical Virology 02/2015; 63:38-41. DOI:10.1016/j.jcv.2014.12.010 · 3.02 Impact Factor
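The citing studies above partition coding alignments into 1st + 2nd and 3rd codon positions (following Shapiro et al. 2006) before assigning a substitution model to each partition. A minimal sketch of that site bookkeeping follows, assuming an in-frame alignment that starts at the first position of a codon and whose length is a multiple of three. The function name, scheme labels and example sequences are hypothetical and do not correspond to the input format of BEAST or any other program.

```python
from typing import Dict


def codon_position_partitions(
    alignment: Dict[str, str], scheme: str = "12_3"
) -> Dict[str, Dict[str, str]]:
    """Split an in-frame coding alignment into codon-position partitions.

    scheme "12_3":  1st and 2nd positions pooled, 3rd position separate.
    scheme "1_2_3": each codon position in its own partition.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned (equal length)")
    (length,) = lengths
    if length % 3 != 0:
        raise ValueError("alignment length must be a multiple of 3")

    # 0-based column indices for codon positions 1, 2 and 3.
    pos = {p: list(range(p - 1, length, 3)) for p in (1, 2, 3)}
    if scheme == "12_3":
        # Sorting keeps pooled columns in their original alignment order.
        groups = {"cp12": sorted(pos[1] + pos[2]), "cp3": pos[3]}
    elif scheme == "1_2_3":
        groups = {"cp1": pos[1], "cp2": pos[2], "cp3": pos[3]}
    else:
        raise ValueError(f"unknown scheme: {scheme}")

    # Build one sub-alignment per partition, same taxa, selected columns only.
    return {
        name: {taxon: "".join(seq[i] for i in cols) for taxon, seq in alignment.items()}
        for name, cols in groups.items()
    }


# Hypothetical two-taxon alignment of three codons.
EXAMPLE_ALIGNMENT = {"taxonA": "ATGGCCTAA", "taxonB": "ATGGCTTGA"}

if __name__ == "__main__":
    for name, part in codon_position_partitions(EXAMPLE_ALIGNMENT).items():
        print(name, part)
```

Pooling the 1st and 2nd positions reflects that most changes at those positions alter the encoded amino acid, whereas many 3rd-position changes are synonymous, so the 3rd position typically evolves faster and benefits from its own rate and model parameters.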

