Accuracy and Power of Statistical Methods for Detecting Adaptive Evolution in Protein Coding Sequences and for Identifying Positively Selected Sites

Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York 14850, USA.
Genetics (Impact Factor: 5.96). 10/2004; 168(2):1041-51. DOI: 10.1534/genetics.104.031153
Source: PubMed


The parsimony method of Suzuki and Gojobori (1999) and the maximum likelihood method developed from the work of Nielsen and Yang (1998) are two widely used methods for detecting positive selection in homologous protein coding sequences. Both methods consider an excess of nonsynonymous (replacement) substitutions as evidence for positive selection. Previously published simulation studies comparing the performance of the two methods show contradictory results. Here we conduct a more thorough simulation study to cover and extend the parameter space used in previous studies. We also reanalyzed an HLA data set that was previously proposed to cause problems when analyzed using the maximum likelihood method. Our new simulations and a reanalysis of the HLA data demonstrate that the maximum likelihood method has good power and accuracy in detecting positive selection over a wide range of parameter values. Previous studies reporting poor performance of the method appear to be due to numerical problems in the optimization algorithms and did not reflect the true performance of the method. The parsimony method has a very low rate of false positives but very little power for detecting positive selection or identifying positively selected sites.



Available from: Rasmus Nielsen
    • "ssified into 3 classes: 0 < ω < 1, ω = 1 and ω > 1). These two models were compared by LRT and when the MA best fitted the data, we proceeded to one additional comparison; between MA and MA ω=1 (where ω is forced to be 1 in the third class) to distinguish relaxed selective constraints from positive selection (Yang and Nielsen 2002; Wong et al. 2004)."
    ABSTRACT: Multi-domain proteins form the majority of proteins in eukaryotes. During their formation by tandem duplication or gene fusion, new interactions between domains may arise as a result of the structurally-forced proximity of domains. The proper function of the formed proteins likely required the molecular adjustment of these stress zones by specific amino acid replacements, which should be detectable by the molecular signature of selection that governed their changes. We used multi-domain globins from three different invertebrate lineages to investigate the selective forces that acted throughout the evolution of these molecules. In the youngest of these molecules [Branchipolynoe scaleworm; original duplication ca. 60 million years (Ma)], we were able to detect some amino acids under positive selection corresponding to the initial duplication event. In older lineages (didomain globin from bivalve mollusks and nematodes), there was no evidence of amino acid positions under positive selection, possibly the result of accumulated non-adaptive mutations since the original duplication event (165 and 245 Ma, respectively). Some amino acids under positive selection were sometimes detected in later branches, either after speciation events, or after the initial duplication event. In Branchipolynoe, the position of the amino acids under positive selection on a 3D model suggests some of them are located at the interface between two domains, while others are located in the heme pocket.
    SpringerPlus 07/2015; 4:354. DOI:10.1186/s40064-015-1124-2
    • "The M7 model assumes that ω follows a beta distribution with 10 categories, each corresponding to a distinctive ω value that is always less than 1, whereas the M8a model allows for an extra class of codons with ω = 1. The alternative model M8 has an extra category with ω > 1 (Yang et al. 2000; Swanson et al. 2001; Wong et al. 2004). Comparisons between the two nested site models were performed to evaluate the variation in ω (M3 vs. M0) and to determine the presence of a positively selected class of sites (M2a vs. M1a; M8 vs. M7 and M8 vs. M8a)."
    ABSTRACT: Following the two rounds of whole-genome duplication that occurred during deuterostome evolution, a third genome duplication event occurred in the stem lineage of ray-finned fishes. This teleost-specific genome duplication (TGD) is thought to be responsible for the biological diversification of ray-finned fishes. DEAD-box polypeptide 3 (DDX3) belongs to the DEAD-box RNA helicase family. Although their functions in humans have been well studied, limited information is available regarding their function in teleosts. In this study, two teleost Ddx3 genes were first identified in the transcriptome of Japanese flounder (Paralichthys olivaceus). We confirmed that the two genes originated from teleost-specific genome duplication through synteny and phylogenetic analysis. Additionally, comparative analysis of genome structure, molecular evolution rate, and expression pattern of the two genes in Japanese flounder revealed evidence of sub-functionalization of the duplicated Ddx3 genes in teleosts. Thus, the results of this study reveal novel insights into the evolution of the teleost Ddx3 genes and constitute important groundwork for further research on this gene family. Copyright © 2015 Author et al.
    G3-Genes Genomes Genetics 06/2015; 5(8). DOI:10.1534/g3.115.018911 · 3.20 Impact Factor
    • "Several methods have been developed to carry out genome-wide scans for genes evolving under positive selection (Nielsen 2005; Anisimova and Liberles 2007; Vitti et al. 2013). We used here a rather simple approach based on the comparison of the nonsynonymous substitution rate (dn) with the synonymous substitution rate (ds) at the codon level (Yang et al. 2000; Wong et al. 2004; Zhang et al. 2005; Yang 2007). Genes putatively under positive selection were detected on the basis of statistical evidence for a subset of codons where replacement mutations were fixed faster than mutation at silent sites. "
    ABSTRACT: Cactophilic Drosophila species provide a valuable model to study gene-environment interactions and ecological adaptation. D. buzzatii and D. mojavensis are two cactophilic species that belong to the repleta group, but have very different geographical distributions and primary host plants. To investigate the genomic basis of ecological adaptation, we sequenced the genome and developmental transcriptome of D. buzzatii and compared its gene content to that of D. mojavensis and two other non-cactophilic Drosophila species in the same subgenus. The newly sequenced D. buzzatii genome (161.5 Mb) comprises 826 scaffolds (> 3 kb) and contains 13,657 annotated protein-coding genes. Using RNA-Seq data of five life-stages we found expression of 15,026 genes, 80% protein-coding genes and 20% ncRNA genes. In total, we detected 1,294 genes putatively under positive selection. Interestingly, among genes under positive selection in the D. mojavensis lineage, there is an excess of genes involved in metabolism of heterocyclic compounds that are abundant in Stenocereus cacti and toxic to nonresident Drosophila species. We found 117 orphan genes in the shared D. buzzatii-D. mojavensis lineage. In addition, gene duplication analysis identified lineage-specific expanded families with functional annotations associated with proteolysis, zinc ion binding, chitin binding, sensory perception, ethanol tolerance, immunity, physiology and reproduction. In summary we identified genetic signatures of adaptation in the shared D. buzzatii-D. mojavensis lineage, and in the two separate D. buzzatii and D. mojavensis lineages. Many of the novel lineage-specific genomic features are promising candidates for explaining the adaptation of these species to their distinct ecological niches. © The Author(s) 2014. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
    Genome Biology and Evolution 01/2015; 7(1). DOI:10.1093/gbe/evu291 · 4.23 Impact Factor
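
The excerpts above describe comparing nested codon models (for example M7 against M8) with a likelihood ratio test (LRT). As a minimal sketch of how such a test is usually evaluated, and not the procedure of any of the cited studies, the Python below computes the statistic 2(lnL_alt - lnL_null) and a chi-square p-value using only the standard library. The function names, the log-likelihood values, and the choice of 2 degrees of freedom are illustrative assumptions; in practice the log-likelihoods would come from a model-fitting program.

```python
import math


def chi2_sf(stat, df):
    """Upper-tail probability of a chi-square distribution with integer df."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if stat <= 0:
        return 1.0
    x = stat / 2.0
    # Base cases of the regularized upper incomplete gamma function Q(a, x):
    # Q(1/2, x) = erfc(sqrt(x)) for odd df, Q(1, x) = exp(-x) for even df.
    if df % 2 == 1:
        a, q = 0.5, math.erfc(math.sqrt(x))
    else:
        a, q = 1.0, math.exp(-x)
    # Recurrence: Q(a + 1, x) = Q(a, x) + x**a * exp(-x) / Gamma(a + 1)
    while a < df / 2.0:
        q += x ** a * math.exp(-x) / math.gamma(a + 1.0)
        a += 1.0
    return q


def likelihood_ratio_test(lnl_null, lnl_alt, df):
    """Return (statistic, p-value) for two nested models.

    lnl_null and lnl_alt are maximized log-likelihoods; df is the assumed
    difference in the number of free parameters between the models.
    """
    # Clamp at zero: a negative value can only arise from optimization error.
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    return stat, chi2_sf(stat, df)


if __name__ == "__main__":
    # Hypothetical log-likelihoods, chosen only to illustrate the call.
    stat, p = likelihood_ratio_test(lnl_null=-1000.0, lnl_alt=-995.0, df=2)
    print(f"2*dlnL = {stat:.2f}, p = {p:.4g}")
```

A negative statistic would mean the more general model fit worse than the model nested within it, which should not happen when both are properly maximized. The sketch clamps such a value to zero, but in a real analysis it is better treated as a sign that the optimization needs to be rerun, in line with the abstract's remark about numerical problems in the optimization algorithms.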