Accuracy and power of statistical methods for detecting adaptive evolution in protein coding sequences and for identifying positively selected sites.

Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York 14850, USA.
Genetics (Impact Factor: 4.87). 10/2004; 168(2):1041-51. DOI: 10.1534/genetics.104.031153
Source: PubMed

ABSTRACT The parsimony method of Suzuki and Gojobori (1999) and the maximum likelihood method developed from the work of Nielsen and Yang (1998) are two widely used methods for detecting positive selection in homologous protein coding sequences. Both methods consider an excess of nonsynonymous (replacement) substitutions as evidence for positive selection. Previously published simulation studies comparing the performance of the two methods show contradictory results. Here we conduct a more thorough simulation study to cover and extend the parameter space used in previous studies. We also reanalyze an HLA data set that was previously proposed to cause problems when analyzed using the maximum likelihood method. Our new simulations and the reanalysis of the HLA data demonstrate that the maximum likelihood method has good power and accuracy in detecting positive selection over a wide range of parameter values. Previous reports of poor performance of the method appear to have resulted from numerical problems in the optimization algorithms and do not reflect the true performance of the method. The parsimony method has a very low rate of false positives but very little power for detecting positive selection or identifying positively selected sites.

Available from: Rasmus Nielsen, Jul 01, 2015
  • Source
    ABSTRACT: Cactophilic Drosophila species provide a valuable model to study gene-environment interactions and ecological adaptation. D. buzzatii and D. mojavensis are two cactophilic species that belong to the repleta group, but have very different geographical distributions and primary host plants. To investigate the genomic basis of ecological adaptation, we sequenced the genome and developmental transcriptome of D. buzzatii and compared its gene content to that of D. mojavensis and two other non-cactophilic Drosophila species in the same subgenus. The newly sequenced D. buzzatii genome (161.5 Mb) comprises 826 scaffolds (> 3 kb) and contains 13,657 annotated protein-coding genes. Using RNA-Seq data from five life stages, we found expression of 15,026 genes, 80% protein-coding genes and 20% ncRNA genes. In total, we detected 1,294 genes putatively under positive selection. Interestingly, among genes under positive selection in the D. mojavensis lineage, there is an excess of genes involved in metabolism of heterocyclic compounds that are abundant in Stenocereus cacti and toxic to nonresident Drosophila species. We found 117 orphan genes in the shared D. buzzatii-D. mojavensis lineage. In addition, gene duplication analysis identified lineage-specific expanded families with functional annotations associated with proteolysis, zinc ion binding, chitin binding, sensory perception, ethanol tolerance, immunity, physiology and reproduction. In summary, we identified genetic signatures of adaptation in the shared D. buzzatii-D. mojavensis lineage, and in the two separate D. buzzatii and D. mojavensis lineages. Many of the novel lineage-specific genomic features are promising candidates for explaining the adaptation of these species to their distinct ecological niches. © The Author(s) 2014. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
    Genome Biology and Evolution 01/2015; 7(1). DOI:10.1093/gbe/evu291 · 4.53 Impact Factor
  • Source
    ABSTRACT: The multigenic and multiallelic S-locus in plants is responsible for the gametophytic self-incompatibility system, which is important to prevent the detrimental effects of self-fertilization and inbreeding depression. Several studies have discussed the importance of point mutations, recombination, and natural selection in the generation of allelic diversity in the S-locus. However, there has been no wide-ranging study correlating the molecular evolution and structural aspects of the corresponding proteins in Solanum. Therefore, we evaluated the molecular evolution of one gene in this locus and generated a statistically well-supported phylogenetic tree, as well as evidence of positive selection, helping us to understand the diversification of S alleles in Solanum. The three-dimensional structures of some of the proteins corresponding to the major clusters of the phylogenetic tree were constructed and subsequently submitted to molecular dynamics to stabilize the folding and obtain the native structure. The positively selected amino acid residues were predominantly located in the hypervariable regions and on the surface of the protein, which appears to be fundamental for allele specificity. One of the positively selected residues was identified adjacent to a conserved strand that is crucial for enzymatic catalysis. Additionally, we have shown significant differences in the electrostatic potential among the predicted molecular surfaces in S-RNases. The structural results indicate that local changes in the three-dimensional structure are present in some regions of the molecule, although the general structure seems to be conserved. No previous study has described such structural variations in S-RNases.
    Molecular Genetics and Genomics 12/2014; 290(3). DOI:10.1007/s00438-014-0969-3 · 2.83 Impact Factor
  • Source
    ABSTRACT: PKDREJ is a testis-specific protein thought to be located on the sperm surface. Functional studies in the mouse revealed that loss of PKDREJ has effects on sperm transport and the ability to undergo an induced acrosome reaction. Thus, PKDREJ has been considered a potential target of post-copulatory sexual selection in the form of sperm competition. Proteins involved in reproductive processes often show accelerated evolution. In many cases, this rapid divergence is promoted by positive selection which may be driven, at least in part, by post-copulatory sexual selection. We analyzed the evolution of the PKDREJ protein in primates and rodents and assessed whether PKDREJ divergence is associated with testes mass relative to body mass, which is a reliable proxy of sperm competition levels. Evidence of an association between the evolutionary rate of the PKDREJ gene and testes mass relative to body mass was not found in primates. Among rodents, evidence of positive selection was detected in the Pkdrej gene in the family Cricetidae but not in Muridae. We then assessed whether Pkdrej divergence is associated with episodes of sperm competition in these families. We detected a significant positive correlation between the evolutionary rates of Pkdrej and testes mass relative to body mass in cricetids. These findings constitute the first evidence of post-copulatory sexual selection influencing the evolution of a protein that participates in the mechanisms regulating sperm transport and the acrosome reaction, strongly suggesting that positive selection may act on these fertilization steps, leading to advantages in situations of sperm competition.
    Molecular Human Reproduction 10/2014; 21(2). DOI:10.1093/molehr/gau095 · 3.48 Impact Factor