Zeng K, Fu YX, Shi S, Wu CI. Statistical tests for detecting positive selection by utilizing high-frequency variants. Genetics 174: 1431-1439

State Key Laboratory of Biocontrol, Ministry of Education, Sun Yat-sen University, Guangzhou, China.
Genetics (Impact Factor: 4.87). 12/2006; 174(3):1431-9. DOI: 10.1534/genetics.106.061432
Source: PubMed

ABSTRACT By comparing the low-, intermediate-, and high-frequency parts of the frequency spectrum, we gain information on the evolutionary forces that influence the pattern of polymorphism in population samples. We emphasize the high-frequency variants, on which positive selection and negative (background) selection exhibit different effects. We propose a new estimator of θ (the product of effective population size and neutral mutation rate), θL, which is sensitive to changes in high-frequency variants. The new θL allows us to revise Fay and Wu's H-test by normalization. To complement the existing statistics (the H-test and Tajima's D-test), we propose a new test, E, which relies on the difference between θL and Watterson's θW. We show that this test is most powerful in detecting the recovery phase after the loss of genetic diversity, which includes the postselective sweep phase. The sensitivities of these tests to (or robustness against) background selection and demographic changes are also considered. Overall, D and H in combination can be most effective in detecting positive selection while being insensitive to other perturbations. We thus propose a joint test, referred to as the DH test. Simulations indicate that DH is indeed sensitive primarily to directional selection and to no other driving forces.
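All of the θ estimators named in the abstract are linear functions of the unfolded site frequency spectrum ξ_i, the number of segregating sites whose derived allele is carried by i of the n sampled sequences. As a minimal sketch, assuming the standard definitions θW = S/a_n, θπ = Σ i(n−i)ξ_i / C(n,2), Fay and Wu's θH = Σ i²ξ_i / C(n,2), and θL = Σ iξ_i / (n−1), the Python below computes the four estimators, the raw differences θπ − θH (Fay and Wu's original, unnormalized H) and θL − θW (the numerator of E), and Tajima's D with its usual normalizing constants. The variance terms the paper uses to normalize H and E, and the DH joint test, are not reproduced here; all function names are hypothetical.

```python
import math
from typing import Dict, Sequence


def harmonic(n: int, power: int = 1) -> float:
    """Return sum(1 / i**power) for i = 1 .. n-1."""
    return sum(1.0 / i**power for i in range(1, n))


def theta_estimators(sfs: Sequence[float]) -> Dict[str, float]:
    """Theta estimators from an unfolded SFS (illustrative sketch).

    sfs[i - 1] = number of segregating sites whose derived allele is carried
    by i of the n sampled sequences (i = 1 .. n-1), so n = len(sfs) + 1.
    Polarizing alleles as derived vs. ancestral assumes an outgroup.
    """
    n = len(sfs) + 1
    pairs = n * (n - 1) / 2.0
    freq = list(enumerate(sfs, start=1))  # (i, xi_i) pairs
    return {
        # Watterson: number of segregating sites divided by a_n
        "W": sum(sfs) / harmonic(n),
        # Tajima: mean number of pairwise differences
        "pi": sum(i * (n - i) * x for i, x in freq) / pairs,
        # Fay and Wu: weights grow with i^2, so high-frequency variants dominate
        "H": sum(i * i * x for i, x in freq) / pairs,
        # theta_L: weights grow linearly with i
        "L": sum(i * x for i, x in freq) / (n - 1),
    }


def raw_differences(sfs: Sequence[float]) -> Dict[str, float]:
    """Unnormalized numerators only; the paper's variance terms are NOT applied."""
    t = theta_estimators(sfs)
    return {
        "pi_minus_H": t["pi"] - t["H"],  # Fay and Wu's original H
        "L_minus_W": t["L"] - t["W"],    # numerator of E
    }


def tajimas_d(sfs: Sequence[float]) -> float:
    """Tajima's (1989) D with its standard normalizing constants."""
    n = len(sfs) + 1
    s = sum(sfs)
    if n < 4 or s <= 0:
        raise ValueError("need n >= 4 sequences and at least one segregating site")
    a1, a2 = harmonic(n, 1), harmonic(n, 2)
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / (a1 * a1)
    e1, e2 = c1 / a1, c2 / (a1 * a1 + a2)
    t = theta_estimators(sfs)
    return (t["pi"] - t["W"]) / math.sqrt(e1 * s + e2 * s * (s - 1))
```

For the neutral expectation ξ_i = θ/i (for example θ = 5 and n = 10, i.e. `[5 / i for i in range(1, 10)]`), all four estimators return approximately 5.0 and Tajima's D is approximately 0. Because θπ = 2θL − θH holds for any spectrum, an excess of high-frequency derived variants raises θH and θL relative to θπ, which is what makes the H and E numerators informative about sweeps.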



Available from: Suhua Shi, Jun 03, 2015
  • Source
    • "Briefly, a large number of replicate simulations were performed for each demographic model, where the parameters of the model were drawn from prior distributions. Simulated data were summarized using θW (Watterson 1975), Tajima's D (Tajima 1989), the standardized Fay and Wu's H (Fay and Wu 2000; Zeng et al. 2006), and Kelly's ZnS (Kelly 1997) statistics. "
    ABSTRACT: Pinus krempfii Lecomte is a morphologically and ecologically unique pine, endemic to Vietnam. It is regarded as a vulnerable species, with a distribution limited to just two provinces: Khanh Hoa and Lam Dong. Although a few phylogenetic studies have included this species, almost nothing is known about its genetic features. In particular, there are no studies addressing the levels and patterns of genetic variation in natural populations of P. krempfii. In this study, we sampled 57 individuals from six natural populations of P. krempfii and analyzed their sequence variation in ten nuclear gene regions (approximately 9 kb) and 14 mitochondrial (mt) DNA regions (approximately 10 kb). We also analyzed variation at seven chloroplast (cp) microsatellite (SSR) loci. We found very low haplotype and nucleotide diversity at nuclear loci compared with other pine species. Furthermore, all investigated populations were monomorphic across all mitochondrial DNA (mtDNA) regions included in our study, which are polymorphic in other pine species. Population differentiation at nuclear loci was low (5.2%) but significant. However, structure analysis of nuclear loci did not detect genetically differentiated groups of populations. Approximate Bayesian computation (ABC) using nuclear sequence data and mismatch distribution analysis for cpSSR loci suggested recent expansion of the species. The implications of these findings for the management and conservation of P. krempfii genetic resources are discussed.
    Ecology and Evolution 05/2014; 4(11):2228-2238. DOI:10.1002/ece3.1091 · 1.66 Impact Factor
  • Source
    • "… Fay and Wu's H (Fay and Wu 2000), Zeng's E (Zeng et al. 2006), Strobeck's S (Strobeck 1987), Achaz's Y (Achaz 2009), Fu's FS (Fu 1997), Ramos-Onsins' and Rozas' R2 (Ramos-Onsins and Rozas 2002), as well as all corresponding theta values. Linkage disequilibrium: ZnS (Kelly 1997), B/Q (Wall 1999), ZA/ZZ (Rozas et al. 2001), and the correlation coefficient r² for each pair of SNPs within or between windows/regions. Recombination statistics: four-gamete test (Hudson and Kaplan 1985). Diversities: nucleotide and haplotype diversity (Hudson, Boos et al. 1992); (Nei 1979); see 'Neutrality statistics' for a list of calculated theta values. Selective sweeps: CL, CLR (Nielsen et al. 2005). FST estimates: GST (Nei 1973); FST (Hudson, Slatkin et al. 1992); GST, HST, KST (Hudson, Boos et al. 1992); Snn (Hudson 2000); ΦST (Excoffier and Smouse 1992). MKT: McDonald–Kreitman test (McDonald and Kreitman 1991). "
    ABSTRACT: While many computer programs can perform population genetics calculations, they are typically limited in the analyses and data input formats they offer; few applications can process the large datasets produced by whole-genome resequencing projects. Furthermore, there is no coherent framework for the easy integration of new statistics into existing pipelines, hindering the development and application of new population genetics and genomics approaches. Here, we present PopGenome, a population genomics package for the R software environment (a de facto standard for statistical analyses). PopGenome can efficiently process genome-scale data as well as large sets of individual loci. It reads DNA alignments and SNP datasets in most common formats, including those used by the HapMap, 1000 human genomes, and 1001 Arabidopsis genomes projects. PopGenome also reads associated annotation files in GFF format, enabling users to easily define regions or classify SNPs based on their annotation; all analyses can also be applied to sliding windows. PopGenome offers a wide range of diverse population genetics analyses, including neutrality tests as well as statistics for population differentiation, linkage disequilibrium, and recombination. PopGenome is linked to Hudson's MS and Ewing's MSMS programs to assess statistical significance based on coalescent simulations. PopGenome's integration in R facilitates effortless and reproducible downstream analyses as well as the production of publication-quality graphics. Developers can easily incorporate new analysis methods into the PopGenome framework. PopGenome and R are freely available from CRAN for all major operating systems under the GNU General Public License.
    Molecular Biology and Evolution 04/2014; 31(7). DOI:10.1093/molbev/msu136 · 14.31 Impact Factor
  • Source
    • "… D, Tajima's D statistic (Tajima 1989); D*, Fu and Li's D* test (Fu and Li 1993); H, Fay and Wu's H test (Fay et al. 2002; Zeng et al. 2006); P(HKA), HKA test P value (Hudson et al. 1987). *P value 0.025 using three different demographic models (constant size, our best-fit model, and Hey 2010). "
    ABSTRACT: Recent efforts have attempted to describe the population structure of common chimpanzee, focusing on four subspecies: Pan troglodytes verus, P. t. ellioti, P. t. troglodytes, and P. t. schweinfurthii. However, few studies have pursued the effects of natural selection in shaping their response to pathogens and reproduction. Whey acidic protein (WAP) four-disulfide core domain (WFDC) genes and neighboring semenogelin (SEMG) genes encode proteins with combined roles in immunity and fertility. They display a strikingly high rate of amino acid replacement (dN/dS), indicative of adaptive pressures during primate evolution. In human populations, three signals of selection at the WFDC locus were described, possibly influencing the proteolytic profile and antimicrobial activities of the male reproductive tract. To evaluate the patterns of genomic variation and selection at the WFDC locus in chimpanzees, we sequenced 17 WFDC genes and 47 autosomal pseudogenes in 68 chimpanzees (15 P. t. troglodytes, 22 P. t. verus, and 31 P. t. ellioti). We found a clear differentiation of P. t. verus and estimated that the P. t. troglodytes and P. t. ellioti subspecies diverged 0.173 Myr ago; further, at the WFDC locus we identified a signature of strong selective constraints common to the three subspecies in WFDC6, a recent paralog of the epididymal protease inhibitor EPPIN. Overall, chimpanzees and humans do not display similar footprints of selection across the WFDC locus, possibly due to different selective pressures between the two species related to immune response and reproductive biology.
    Genome Biology and Evolution 12/2013; 5(12):2512. DOI:10.1093/gbe/evt198 · 4.53 Impact Factor
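Two of the citing studies summarize linkage disequilibrium with Kelly's (1997) ZnS, the average of the squared allelic correlation r² over all pairs of segregating sites. As a minimal sketch, assuming haplotypes are encoded as equal-length lists of 0/1 alleles (rows = sequences, columns = segregating sites only), the Python below computes pairwise r² and ZnS. The function names are hypothetical and are not PopGenome's API.

```python
from itertools import combinations
from typing import List, Sequence


def r_squared(a: Sequence[int], b: Sequence[int]) -> float:
    """Squared allelic correlation between two biallelic sites coded 0/1."""
    n = len(a)
    pa = sum(a) / n
    pb = sum(b) / n
    pab = sum(x * y for x, y in zip(a, b)) / n  # frequency of the 1-1 haplotype
    d = pab - pa * pb                           # linkage disequilibrium D
    denom = pa * (1 - pa) * pb * (1 - pb)
    if denom == 0:
        raise ValueError("both sites must be segregating")
    return d * d / denom


def kelly_zns(haplotypes: List[Sequence[int]]) -> float:
    """Kelly's ZnS: mean r^2 over all S(S-1)/2 pairs of segregating sites."""
    sites = list(zip(*haplotypes))  # transpose: one tuple of alleles per site
    s = len(sites)
    if s < 2:
        raise ValueError("need at least two segregating sites")
    total = sum(r_squared(a, b) for a, b in combinations(sites, 2))
    return 2.0 * total / (s * (s - 1))


# Sites 1 and 3 are identical (r^2 = 1); site 2 is independent of both (r^2 = 0).
example = [[0, 0, 0], [0, 1, 0], [1, 0, 1], [1, 1, 1]]
print(kelly_zns(example))  # 1/3
```

Averaging over all site pairs makes ZnS sensitive to the long blocks of associated variants that a recent sweep can leave behind, which is why it is often paired with frequency-spectrum statistics such as D and H.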