Accuracy and Power of Statistical Methods for Detecting Adaptive Evolution in Protein Coding Sequences and for Identifying Positively Selected Sites

Article in Genetics 168(2):1041-51 · October 2004
DOI: 10.1534/genetics.104.031153 · Source: PubMed
The parsimony method of Suzuki and Gojobori (1999) and the maximum likelihood method developed from the work of Nielsen and Yang (1998) are two widely used methods for detecting positive selection in homologous protein coding sequences. Both methods consider an excess of nonsynonymous (replacement) substitutions as evidence for positive selection. Previously published simulation studies comparing the performance of the two methods show contradictory results. Here we conduct a more thorough simulation study to cover and extend the parameter space used in previous studies. We also reanalyzed an HLA data set that was previously proposed to cause problems when analyzed using the maximum likelihood method. Our new simulations and a reanalysis of the HLA data demonstrate that the maximum likelihood method has good power and accuracy in detecting positive selection over a wide range of parameter values. Previous studies reporting poor performance of the method appear to be due to numerical problems in the optimization algorithms and did not reflect the true performance of the method. The parsimony method has a very low rate of false positives but very little power for detecting positive selection or identifying positively selected sites.
    • "M1a models genetic drift by constraining dN/dS values < 1 in comparison to M2a's assumption of dN/dS > 1 for positive selection. In the more sensitive pairwise model that is beta distributed, M7 models genetic drift (0 ≤ dN/dS ≤ 1) vs. the M8 positive selection model whereby dN/dS > 1 (Anisimova et al., 2001; Wong et al., 2004). For each model, raw dN, dS and dN/dS ratios for all sites and all lineages were estimated using the Nei-Gojobori Method (Nei and Gojobori, 1986). "
    ABSTRACT: The epidermal differentiation complex (EDC) is the most rapidly evolving locus in the human genome compared to that of the chimpanzee. Yet the EDC genes that are undergoing positive selection across mammals and in humans are not known. We sought to identify the positively selected genetic variants and determine the evolutionary events of the EDC using mammalian-wide and clade-specific branch- and branch-site likelihood ratio tests and a genetic algorithm (GA) branch test. Significant non-synonymous substitutions were found in filaggrin, SPRR4, LELP1, and S100A2 genes across 14 mammals. By contrast, we identified recent positive selection in SPRR4 in primates. Additionally, the GA branch test discovered lineage-specific evolution for distinct EDC genes occurring in each of the nodes in the 14-mammal phylogenetic tree. Multiple instances of positive selection for FLG, TCHHL1, SPRR4, LELP1, and S100A2 were noted among the primate branch nodes. Branch-site likelihood ratio tests further revealed positive selection in specific sites in SPRR4, LELP1, filaggrin, and repetin across 14 mammals. However, in addition to continuous evolution of SPRR4, site-specific positive selection was also found in S100A11, KPRP, SPRR1A, S100A7L2, and S100A3 in primates and filaggrin, filaggrin2, and S100A8 in great apes. Very recent human positive selection was identified in the filaggrin2 L41 site that was present in Neanderthal. Together, our results identifying recent positive selection in distinct EDC genes reveal an underappreciated evolution of epidermal skin barrier function in primates and humans.
    Full-text · Article · Jan 2017
    • "Predicted catalytic triad is shown by a triangle (black in colour); Active sites in yellow colour, Substrate binding sites in magenta colour. Amino acid highlighted in black experienced type-II divergence and was present in one of the active sites Maximum likelihood estimations of selection pressure were based on the ratio (ω) of the nonsynonymous (dN) and synonymous substitution rates (dS), dN/dS [40]. The parameter estimates (ω) and likelihood scores were calculated for three pairs of models: M0 (one ratio) versus M3 (discrete), M1a (nearly neutral) versus M2a (positive selection) and M7 (beta) versus M8 (beta + ω). "
    ABSTRACT: Background Subtilisin-like serine proteases or Subtilases in fungi are important for penetration and colonization of host. In Hypocreales, these proteins share several properties with other fungal, bacterial, plant and mammalian homologs. However, adoption of specific roles in entomopathogenesis may be governed by attainment of unique biochemical and structural features during the evolutionary course. Due to such functional shifts Subtilases coded by different family members of Hypocreales acquire distinct features according to respective hosts and lifestyle. We conducted phylogenetic and DIVERGE analyses and identified important protein residues that putatively assign functional specificity to Subtilases in fungal families/species under the order Hypocreales. Results A total of 161 Subtilases coded by 10 species from five different families under the fungal order Hypocreales was included in the analysis. Based on the presence of conserved domains, the Subtilase genes were divided into three subfamilies, Subtilisin (S08.005), Proteinase K (S08.054) and Serine-carboxyl peptidases (S53.001). These subfamilies were investigated for phylogenetic associations, protein residues under positive selection and functional divergence among paralogous clades. The observations were co-related with the life-styles of the fungal families/species. Phylogenetic and Divergence analyses of Subtilisin (S08.005) and Proteinase K (S08.054) families of proteins revealed that the paralogous clades were clear-cut representation of familial origin of the protein sequences. We observed divergence between the paralogous clades of plant-pathogenic fungi (Nectriaceae), insect-pathogenic fungi (Cordycipitaceae/Clavicipitaceae) and nematophagous fungi (Ophiocordycipitaceae). In addition, Subtilase genes from the nematode-parasitic fungus Purpureocillium lilacinum made a unique cluster which putatively indicated that the fungus might have developed distinctive mechanisms for nematode-pathogenesis. Our evolutionary genetics analysis revealed evidence of positive selection on the Subtilisin (S08.005) and Proteinase K (S08.054) protein sequences of the entomopathogenic and nematophagous species belonging to Cordycipitaceae, Clavicipitaceae and Ophiocordycipitaceae families of Hypocreales. Conclusions Our study provided new insights into the evolution of Subtilisin like serine proteases in Hypocreales, a fungal order largely consisting of biological control species. Subtilisin (S08.005) and Proteinase K (S08.054) proteins seemed to play important roles during life style modifications among different families and species of Hypocreales. Protein residues found significant in functional divergence analysis in the present study may provide support for protein engineering in future.
    Full-text · Article · Dec 2016
    • "As the branch-site models are relatively parameter-rich, we also fit sites models for each gene, which are designed to detect positively selected sites across the entire tree. We conducted two sets of model comparisons for each gene, M1a versus M2a and M7 versus M8 [52, 82]. The M1a model has two site classes, one conserved (0 < ω < 1) and one neutral (ω = 1), while the M2a model has an extra category of positively selected sites (ω > 1). "
    ABSTRACT: Background Phenotypic transitions, such as trait gain or loss, are predicted to carry evolutionary consequences for the genes that control their development. For example, trait losses can result in molecular decay of the pathways underlying the trait. Focusing on the Iochrominae clade (Solanaceae), we examine how repeated losses of floral anthocyanin pigmentation associated with flower color transitions have affected the molecular evolution of three anthocyanin pathway genes (Chi, F3h, and Dfr). Results We recovered intact coding regions for the three genes in all of the lineages that have lost floral pigmentation, suggesting that molecular decay is not associated with these flower color transitions. However, two of the three genes (Chi, F3h) show significantly elevated dN/dS ratios in lineages without floral pigmentation. Maximum likelihood analyses suggest that this increase is due to relaxed constraint on anthocyanin genes in the unpigmented lineages as opposed to positive selection. Despite the increase, the values for dN/dS in both pigmented and unpigmented lineages were consistent overall with purifying selection acting on these loci. Conclusions The broad conservation of anthocyanin pathway genes across lineages with and without floral anthocyanins is consistent with the growing consensus that losses of pigmentation are largely achieved by changes in gene expression as opposed to structural mutations. Moreover, this conservation maintains the potential for regain of flower color, and indicates that evolutionary losses of floral pigmentation may be readily reversible. Electronic supplementary material The online version of this article (doi:10.1186/s12862-016-0675-3) contains supplementary material, which is available to authorized users.
    Full-text · Article · Dec 2016
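
The excerpts above compare nested site models in pairs (M1a versus M2a, M7 versus M8). As a rough illustration only, such a comparison is usually summarized as a likelihood ratio test. The sketch below assumes two degrees of freedom, the conventional choice for these pairs and not something stated in the excerpts; the function name `lrt_df2` and the log-likelihood values are hypothetical. For two degrees of freedom the chi-square tail probability has the closed form exp(-x/2), so no statistics library is needed.

```python
import math


def lrt_df2(lnL_null: float, lnL_alt: float) -> tuple[float, float]:
    """Likelihood ratio test for nested models, assuming 2 degrees of freedom.

    lnL_null: maximized log-likelihood of the null model (e.g. M1a or M7).
    lnL_alt:  maximized log-likelihood of the alternative (e.g. M2a or M8).
    Returns (test statistic, p-value).
    """
    # Test statistic: twice the log-likelihood gain of the alternative model.
    stat = 2.0 * (lnL_alt - lnL_null)
    # An optimizer that stops early can report a slightly lower likelihood
    # for the larger model; clamp so the statistic is never negative.
    stat = max(stat, 0.0)
    # Chi-square survival function for exactly 2 degrees of freedom.
    p_value = math.exp(-stat / 2.0)
    return stat, p_value


# Hypothetical log-likelihoods, for illustration only.
stat, p_value = lrt_df2(lnL_null=-1234.56, lnL_alt=-1228.10)
print(f"2*delta(lnL) = {stat:.2f}, p = {p_value:.4g}")
```

A negative raw statistic would mean the larger model fit worse than the one nested inside it, which cannot happen at a true optimum. It signals the kind of numerical optimization problem the abstract describes, and rerunning from different starting values is a more reliable remedy than the clamp used here.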