Accuracy and power of statistical methods for detecting adaptive evolution in protein coding sequences and for identifying positively selected sites.

Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York 14850, USA.
Genetics (Impact Factor: 4.87). 10/2004; 168(2):1041-51. DOI: 10.1534/genetics.104.031153
Source: PubMed

ABSTRACT The parsimony method of Suzuki and Gojobori (1999) and the maximum likelihood method developed from the work of Nielsen and Yang (1998) are two widely used methods for detecting positive selection in homologous protein coding sequences. Both methods consider an excess of nonsynonymous (replacement) substitutions as evidence for positive selection. Previously published simulation studies comparing the performance of the two methods show contradictory results. Here we conduct a more thorough simulation study to cover and extend the parameter space used in previous studies. We also reanalyzed an HLA data set that was previously proposed to cause problems when analyzed using the maximum likelihood method. Our new simulations and a reanalysis of the HLA data demonstrate that the maximum likelihood method has good power and accuracy in detecting positive selection over a wide range of parameter values. Previous studies reporting poor performance of the method appear to be due to numerical problems in the optimization algorithms and did not reflect the true performance of the method. The parsimony method has a very low rate of false positives but very little power for detecting positive selection or identifying positively selected sites.
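As a rough illustration of how the maximum likelihood approach reaches a decision, the sketch below runs a likelihood ratio test between two nested codon models, one that forbids and one that allows a class of sites with an excess of nonsynonymous substitutions. The abstract does not describe the test, so several things here are assumptions: the two-parameter difference between the models, the chi-square reference distribution, the function name `lrt_df2`, and the log-likelihood values, which are made up and not taken from the paper. A negative statistic is flagged because a nested alternative model cannot truly fit worse than its null, so it usually points to the kind of numerical optimization problem the abstract mentions.

```python
import math


def lrt_df2(lnl_null: float, lnl_alt: float) -> tuple[float, float, bool]:
    """Likelihood ratio test for nested models that differ by two free parameters.

    Returns (statistic, p_value, optimization_suspect).
    Assumption: the statistic is compared against a chi-square with 2 degrees
    of freedom, for which the survival function is exactly exp(-x / 2).
    """
    raw_stat = 2.0 * (lnl_alt - lnl_null)
    # A nested alternative cannot have a lower maximum log-likelihood than its
    # null, so a negative value suggests the optimizer did not converge.
    optimization_suspect = raw_stat < 0.0
    stat = max(raw_stat, 0.0)
    p_value = math.exp(-stat / 2.0)
    return stat, p_value, optimization_suspect


if __name__ == "__main__":
    # Hypothetical log-likelihoods for illustration only.
    stat, p_value, suspect = lrt_df2(lnl_null=-1234.56, lnl_alt=-1228.10)
    print(f"2*dlnL = {stat:.2f}, p = {p_value:.4g}, suspect = {suspect}")
```

In practice the log-likelihoods would come from fitting both models to the same alignment and tree, ideally from several starting values so that a poorly converged run is not mistaken for a real result.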


Available from: Rasmus Nielsen, Apr 20, 2015
  • ABSTRACT: Respiratory complexes are encoded by two genomes (mitochondrial DNA [mtDNA] and nuclear DNA [nDNA]). Although the importance of intergenomic coadaptation is acknowledged, the forces and constraints shaping such coevolution are largely unknown. Previous works using cytochrome c oxidase (COX) as a model enzyme have led to the so-called “optimizing interaction” hypothesis. According to this view, mtDNA-encoded residues close to nDNA-encoded residues evolve faster than the rest of positions, favoring the optimization of protein–protein interfaces. Herein, using evolutionary data in combination with structural information of COX, we show that failing to discern the effects of interaction from other structural and functional effects can lead to deceptive conclusions such as the “optimizing hypothesis.” Once spurious factors have been accounted for, data analysis shows that mtDNA-encoded residues engaged in contacts are, in general, more constrained than their noncontact counterparts. Nevertheless, noncontact residues from the surface of COX I subunit are a remarkable exception, being subjected to an exceptionally high purifying selection that may be related to the maintenance of a suitable heme environment. We also report that mtDNA-encoded residues involved in contacts with other mtDNA-encoded subunits are more constrained than mtDNA-encoded residues interacting with nDNA-encoded polypeptides. This differential behavior cannot be explained on the basis of predicted thermodynamic stability, as interactions between mtDNA-encoded subunits contribute more weakly to the complex stability than those interactions between subunits encoded by different genomes. Therefore, the higher conservation observed among mtDNA-encoded residues involved in intragenome interactions is likely due to factors other than structural stability.
    Genome Biology and Evolution 10/2014; 6(11):3064-3076. DOI:10.1093/gbe/evu240 · 4.53 Impact Factor
  • Source
    ABSTRACT: Polymorphisms in the first intron of FTO have been robustly replicated for associations with obesity. In the Sorbs, a Slavic population resident in Germany, the strongest effect on body mass index (BMI) was found for a variant in the third intron of FTO (rs17818902). Since this may indicate population specific effects of FTO variants, we initiated studies testing FTO for signatures of selection in vertebrate species and human populations. First, we analyzed the coding region of 35 vertebrate FTO orthologs with Phylogenetic Analysis by Maximum Likelihood (PAML, ω = dN/dS) to screen for signatures of selection among species. Second, we investigated human population (Europeans/CEU, Yoruba/YRI, Chinese/CHB, Japanese/JPT, Sorbs) SNP data for footprints of selection using DnaSP version 4.5 and the Haplotter/PhaseII. Finally, using ConSite we compared transcription factor (TF) binding sites at sequences harbouring FTO SNPs in intron three. PAML analyses revealed strong conservation in coding region of FTO (ω<1). Sliding-window results from population genetic analyses provided highly significant (p<0.001) signatures for balancing selection specifically in the third intron (e.g. Tajima's D in Sorbs = 2.77). We observed several alterations in TF binding sites, e.g. TCF3 binding site introduced by the rs17818902 minor allele. Population genetic analysis revealed signatures of balancing selection at the FTO locus with a prominent signal in intron three, a genomic region with strong association with BMI in the Sorbs. Our data support the hypothesis that genes associated with obesity may have been under evolutionary selective pressure.
    PLoS ONE 02/2015; 10(2):e0117093. DOI:10.1371/journal.pone.0117093 · 3.53 Impact Factor
  • Source
    ABSTRACT: Empirical evidence is accumulating that pathogens drive selection and explain common patterns of high immune gene (major histocompatibility complex, MHC) polymorphism. While most previous studies have identified that selection has acted over large time scales on the MHC, there still is a paucity of information in mammal species that demonstrates how processes operate on MHC genes in extant generations. Here we investigated 439 striped mouse individuals (Rhabdomys pumilio), trapped across seven different locations along a climatic gradient in southern Africa. Data from a previous study, conducted in the same study system, revealed that gastro-intestinal nematode infections were higher in individuals from study sites located within wetter climates compared to those from drier ones. In order to improve our understanding about the role of parasite-driven selection on the MHC in contemporary generations we tested for population divergences based on seven neutral microsatellite markers and the MHC DRB exon II locus. If divergences exist, we wanted to know if they are influenced by the spatial variation in parasite pressure mediated by different climatic conditions along the study site transect. Our analysis revealed an extensive polymorphism of 249 different MHC alleles and isolation-by-distance showed significant correlations at the microsatellite loci but not at the MHC. Nematode pressure was lowest at the driest site (Fish River Canyon, Namibia) and specifically this population revealed the highest divergence between MHC and microsatellite loci. We conclude that spatial variation in parasite pressure can facilitate local immune gene adaptations and thus mediate interactions of directional and balancing selection shaping MHC polymorphism in contemporary generations.
    Evolutionary Ecology 11/2014; 28(6):1169-1190. DOI:10.1007/s10682-014-9731-x · 2.37 Impact Factor