Article

Constructing genomic maps of positive selection in humans: Where do we go from here?

Abstract

Identifying targets of positive selection in humans has, until recently, been frustratingly slow, relying on the analysis of individual candidate genes. Genomics, however, has provided the necessary resources to systematically interrogate the entire genome for signatures of natural selection. To date, 21 genome-wide scans for recent or ongoing positive selection have been performed in humans. A key challenge is to begin synthesizing these newly constructed maps of positive selection into a coherent narrative of human evolutionary history and derive a deeper mechanistic understanding of how natural populations evolve. Here, I chronicle the recent history of the burgeoning field of human population genomics, critically assess genome-wide scans for positive selection in humans, identify important gaps in knowledge, and discuss both short- and long-term strategies for traversing the path from the low-resolution, incomplete, and error-prone maps of selection today to the ultimate goal of a detailed molecular, mechanistic, phenotypic, and population genetics characterization of adaptive alleles.

... Genome-wide scans for selection in humans are now routine (Fan et al. 2016), but they have poor replicability such that different methods produce very different lists of candidate genes. This is true for positive selection (Akey 2009) and may be worse for balancing selection: among six recent studies that scan the genome for balancing selection in Africans or African Americans, the proportion of identified candidate selection targets that are shared between any two scans ranges from 0 to 9% (Andrés et al. 2009;Leffler et al. 2013;DeGiorgio et al. 2014;Siewert and Voight 2017;Bitarello et al. 2018;Cheng and DeGiorgio 2019). ...
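The replicability comparison in the snippet above comes down to measuring how much of each candidate list is shared between pairs of scans. A minimal sketch of such a comparison follows. The function name `pairwise_overlap`, the placeholder gene labels, and the choice of the Jaccard index as the "shared proportion" are all assumptions made for illustration; the cited studies may define overlap differently.

```python
from itertools import combinations


def pairwise_overlap(scans):
    """Return {(name_a, name_b): shared fraction} for every pair of scans.

    The shared fraction is |A & B| / |A | B| (Jaccard index). This definition
    is an assumption, not necessarily the one used in the cited work.
    """
    result = {}
    for (name_a, genes_a), (name_b, genes_b) in combinations(sorted(scans.items()), 2):
        union = genes_a | genes_b
        shared = genes_a & genes_b
        result[(name_a, name_b)] = len(shared) / len(union) if union else 0.0
    return result


# Hypothetical candidate lists (placeholder labels, not real scan results).
scans = {
    "scan_A": {"g1", "g2", "g3", "g4"},
    "scan_B": {"g4", "g5", "g6"},
    "scan_C": {"g7", "g8"},
}
overlaps = pairwise_overlap(scans)
for pair, fraction in overlaps.items():
    print(pair, round(fraction, 3))
```

Reporting the minimum and maximum of these pairwise values would give a range comparable in spirit to the one quoted above.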
... Thus, despite being canonical examples of adaptive polymorphism, these genes are almost never detected in genome-wide scans for partial sweeps or balancing selection (e.g. Voight et al. 2006;Akey 2009;Andrés et al. 2009;Leffler et al. 2013;DeGiorgio et al. 2014;Siewert and Voight 2017;Bitarello et al. 2018;Cheng and DeGiorgio 2019). ...
... We present three novel summary statistics that reflect evidence for selective non-neutrality in population genetic data. There are many such statistical tests (Vitti et al. 2013), which are frequently employed to find putative targets of selection across the human genome (Sabeti et al. 2007;Akey 2009;Fan et al. 2016). However, despite this plethora of statistical tools, it remains challenging to conclusively identify instances of positive or balancing selection in humans. ...
Article
Full-text available
Malaria has been one of the strongest selective pressures on our species. Many of the best-characterized cases of adaptive evolution in humans are in genes tied to malaria resistance. However, the complex evolutionary patterns at these genes are poorly captured by standard scans for non-neutral evolution. Here we present three new statistical tests for selection based on population genetic patterns that are observed more than once among key malaria resistance loci. We assess these tests using forward-time evolutionary simulations and apply them to global whole-genome sequencing data from humans, and thus we show that they are effective at distinguishing selection from neutrality. Each test captures a distinct evolutionary pattern, here called Divergent Haplotypes, Repeated Shifts, and Arrested Sweeps, associated with a particular period of human prehistory. We clarify the selective signatures at known malaria-relevant genes and identify additional genes showing similar adaptive evolutionary patterns. Among our top outliers, we see a particular enrichment for genes involved in erythropoiesis and for genes previously associated with malaria resistance, consistent with a major role for malaria in shaping these patterns of genetic diversity. Polymorphisms at these genes are likely to impact resistance to malaria infection and contribute to ongoing host-parasite coevolutionary dynamics.
... In contrast, the opposite ... We therefore turned to alternative sources of insights. Published studies have listed genes showing evidence of positive selection in genome-wide scans; although these lists differ between studies, a core set of genes detected by multiple scans has been compiled [18][19][20]. ...
... Our data throw light on two topics of current debate about selection in humans. First, some but not all studies [18] have found more evidence for recent positive selection outside Africa than inside. It has been difficult to interpret the results of tests that incorporate haplotype structure, because recombination differs between populations, with lower levels of linkage disequilibrium and different PRDM9 alleles and recombination hotspots in Africa [47]. ...
... We summarized from the literature a list of 3,467 genes that have been previously identified in genomic scans for positive selection [18][19][20] and compiled the occurrence of non-redundant genes hosting HighD sites (HighD-genes; n=542) in it. To obtain estimates of random expectation we calculated the occurrence of 100 sets of control genes in the list of positively selected genes. ...
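The comparison against 100 control gene sets described above can be sketched as an empirical enrichment test. Everything named here is hypothetical: the function `empirical_enrichment`, the toy gene labels, and in particular the use of simple random sampling from a background list to build control sets. The cited study may have matched controls on gene length or other properties.

```python
import random


def empirical_enrichment(target, selected, background, n_sets=100, seed=0):
    """Compare the overlap of `target` with `selected` against random controls.

    Control sets are drawn uniformly from `background` with the same size as
    `target`; this sampling scheme is an assumption made for illustration.
    """
    rng = random.Random(seed)
    observed = len(target & selected)
    pool = sorted(background)  # fixed order keeps sampling reproducible
    control_counts = [
        len(set(rng.sample(pool, len(target))) & selected)
        for _ in range(n_sets)
    ]
    # Add-one correction so the empirical p-value is never exactly zero.
    exceed = sum(count >= observed for count in control_counts)
    p_value = (1 + exceed) / (n_sets + 1)
    return observed, control_counts, p_value


# Toy data: 1000 background genes, 100 "previously selected", 50 target genes.
background = {f"g{i}" for i in range(1000)}
selected = {f"g{i}" for i in range(100)}
target = {f"g{i}" for i in range(20)} | {f"g{i}" for i in range(500, 530)}

observed, control_counts, p_value = empirical_enrichment(target, selected, background)
print(observed, sum(control_counts) / len(control_counts), p_value)
```

With 100 control sets the smallest attainable p-value under this correction is 1/101, so more sets would be needed to resolve stronger enrichment.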
Preprint
Full-text available
Background: Population differentiation has proved to be effective for identifying loci under geographically-localized positive selection, and has the potential to identify loci subject to balancing selection. We have previously investigated the pattern of genetic differentiation among human populations at 36.8 million genomic variants to identify sites in the genome showing high frequency differences. Here, we extend this dataset to include additional variants, survey sites with low levels of differentiation, and evaluate the extent to which highly differentiated sites are likely to result from selective or other processes. Results: We demonstrate that while sites of low differentiation represent sampling effects rather than balancing selection, sites showing extremely high population differentiation are enriched for positive selection events and that one half may be the result of classic selective sweeps. Among these, we rediscover known examples, where we actually identify the established functional SNP, and discover novel examples including the genes ABCA12, CALD1 and ZNF804, which we speculate may be linked to adaptations in skin, calcium metabolism and defense, respectively. Conclusions: We have identified known and many novel candidate regions for geographically restricted positive selection, and suggest several directions for further research.
... This integrative approach advances beyond the current phylogenomic methods that compare patterns across species, but are blind to variation segregating within a given species (Goldman and Yang 1994;Muse and Gaut 1994;Yang and Bielawski 2000;Hurst 2002;Nielsen et al. 2005;Pollard et al. 2006;Anisimova and Yang 2007;Shapiro and Alm 2008;Lindblad-Toh et al. 2011;Peter et al. 2012). It is also distinct from the current population genomic methods that utilize patterns of population variation to identify candidate adaptive genes or genetic regions, but do not distinguish specific amino acid variants (Akey et al. 2002;Li and Stephan 2006;Teshima et al. 2006;Voight et al. 2006;Sabeti et al. 2007;Akey 2009;Grossman et al. 2013;Moon and Akey 2016). Together, evolutionary information from both short-and long-term time scales is harnessed in our approach. ...
... This new adaptive allele catalog is made possible by the EP approach, which is sensitive to a timeframe that predates the out of Africa migration of modern humans, but is not limited to fixed differences between species (Goldman and Yang 1994;Muse and Gaut 1994;Yang and Bielawski 2000;Hurst 2002;Nielsen et al. 2005;Pollard et al. 2006;Anisimova and Yang 2007;Holt et al. 2008;Shapiro and Alm 2008;Lindblad-Toh et al. 2011;Peter et al. 2012). The former timeframe has been addressed by methods that are sensitive to recent classic sweeps and regionally restricted adaptation, which have been the focus of the majority of human adaptation studies to date (Akey et al. 2002;Li and Stephan 2006;Teshima et al. 2006;Voight et al. 2006;Sabeti et al. 2007;Akey 2009;Grossman et al. 2013;Moon and Akey 2016). These studies have yielded only a few adaptive coding variants, leading some to argue that regulatory variation is the predominant raw material for adaptive change (Akey 2009;Fraser 2013;Grossman et al. 2013). ...
... The former timeframe has been addressed by methods that are sensitive to recent classic sweeps and regionally restricted adaptation, which have been the focus of the majority of human adaptation studies to date (Akey et al. 2002;Li and Stephan 2006;Teshima et al. 2006;Voight et al. 2006;Sabeti et al. 2007;Akey 2009;Grossman et al. 2013;Moon and Akey 2016). These studies have yielded only a few adaptive coding variants, leading some to argue that regulatory variation is the predominant raw material for adaptive change (Akey 2009;Fraser 2013;Grossman et al. 2013). Our results suggest that the temporal sensitivity of the EP approach is able to generate a catalog of CAPs that is enriched in functional as well as beneficial variation. ...
Article
Full-text available
The human genome contains hundreds of thousands of missense mutations. However, only a handful of these variants are known to be adaptive, which implies that adaptation through protein sequence change is an extremely rare phenomenon in human evolution. Alternatively, existing methods may lack the power to pinpoint adaptive variation. We have developed and applied an Evolutionary Probability Approach (EPA) to discover candidate adaptive polymorphisms (CAPs) through the discordance between allelic evolutionary probabilities and their observed frequencies in human populations. EPA reveals thousands of missense CAPs, which suggest that a large number of previously optimal alleles experienced a reversal of fortune in the human lineage. We explored non-adaptive mechanisms to explain CAPs, including the effects of demography, mutation rate variability, and negative and positive selective pressures in modern humans. Many non-adaptive hypotheses were tested, but failed to explain the data, which suggests that a large proportion of CAP alleles have increased in frequency due to beneficial selection. This suggestion is supported by the fact that a vast majority of adaptive missense variants discovered previously in humans are CAPs, and that hundreds of CAP alleles are protective in genotype-phenotype association data. Our integrated phylogenomic and population genetic EPA approach predicts the existence of thousands of non-neutral candidate variants in the human proteome. We expect this collection to be enriched in beneficial variation. The EPA approach can be applied to discover candidate adaptive variation in any protein, population, or species for which allele frequency data and reliable multispecies alignments are available.
... Classical examples, including Tajima's D [70], Fay and Wu's H [24], and the Composite Likelihood Ratio [52], were all shown to be weighted linear combinations of the SFS values [1]. While successful, these methods are prone to both false negatives [49] and false discoveries due to confounding factors such as demography, including bottlenecks and population expansions, and ascertainment bias [3,49,51,56,57]. Nevertheless, SFS-based tests continue to be used successfully, often in combination with other tests [3,75]. ...
... While successful, these methods are prone to both false negatives [49] and false discoveries due to confounding factors such as demography, including bottlenecks and population expansions, and ascertainment bias [3,49,51,56,57]. Nevertheless, SFS-based tests continue to be used successfully, often in combination with other tests [3,75]. One of the contributions of this paper is the extension of SFS-based methods to analyze time-series data, and the identification of selection regimes where these methods perform well. ...
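The statement that classical tests are weighted linear combinations of the site frequency spectrum (SFS) can be made concrete with a short sketch. It computes Watterson's, Tajima's, and Fay and Wu's estimators of theta from an unfolded SFS using the standard textbook weights, then forms Tajima's D and the unnormalized Fay and Wu's H. The function name `sfs_statistics` and the input layout are assumptions for illustration, and only the numerators of the tests are linear in the SFS; the variance normalization of D is not. This is not the implementation used in the cited paper.

```python
import math


def sfs_statistics(sfs):
    """Compute SFS-based estimators from an unfolded SFS.

    sfs[i - 1] is the number of sites whose derived allele appears i times
    in a sample of n chromosomes, for i = 1 .. n - 1.
    """
    n = len(sfs) + 1
    idx = range(1, n)
    a1 = sum(1.0 / i for i in idx)
    a2 = sum(1.0 / i**2 for i in idx)
    pairs = n * (n - 1) / 2.0
    seg = sum(sfs)  # number of segregating sites

    # Each estimator is a weighted sum of the SFS entries.
    theta_w = seg / a1
    theta_pi = sum(i * (n - i) * x for i, x in zip(idx, sfs)) / pairs
    theta_h = sum(i * i * x for i, x in zip(idx, sfs)) / pairs

    # Tajima's (1989) variance constants for the difference theta_pi - theta_w.
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    var = e1 * seg + e2 * seg * (seg - 1)

    tajima_d = (theta_pi - theta_w) / math.sqrt(var) if var > 0 else float("nan")
    return {
        "theta_w": theta_w,
        "theta_pi": theta_pi,
        "theta_h": theta_h,
        "tajima_d": tajima_d,
        "fay_wu_h": theta_pi - theta_h,  # unnormalized H
    }


# Under the standard neutral expectation E[xi_i] = theta / i, all three
# estimators agree, so the numerators of D and H vanish.
neutral = sfs_statistics([5.0 / i for i in range(1, 10)])
# An excess of rare variants (here: singletons only) pushes D negative.
rare_excess = sfs_statistics([10, 0, 0, 0, 0, 0, 0, 0, 0])
print(neutral["tajima_d"], rare_excess["tajima_d"])
```

Because the weights differ across frequency classes, each statistic responds to a different distortion of the spectrum, which is one reason such tests are often combined.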
Preprint
Full-text available
The advent of next generation sequencing technologies has made whole-genome and whole-population sampling possible, even for eukaryotes with large genomes. With this development, experimental evolution studies can be designed to observe molecular evolution “in-action” via Evolve-and-Resequence (E&R) experiments. Among other applications, E&R studies can be used to locate the genes and variants responsible for genetic adaptation. Existing literature on time-series data analysis often assumes large population size, accurate allele frequency estimates, and wide time spans. These assumptions do not hold in many E&R studies. In this article, we propose a method, Composition of Likelihoods for Evolve-And-Resequence experiments (CLEAR), to identify signatures of selection in small-population E&R experiments. CLEAR takes whole-genome sequences of pools of individuals (pool-seq) as input, and properly addresses heterogeneous ascertainment bias resulting from uneven coverage. CLEAR also provides unbiased estimates of model parameters, including population size, selection strength and dominance, while being computationally efficient. Extensive simulations show that CLEAR achieves higher power in detecting and localizing selection over a wide range of parameters, and is robust to variation of coverage. We applied the CLEAR statistic to multiple E&R experiments, including data from a study of D. melanogaster adaptation to alternating temperatures and a study of outcrossing yeast populations, and identified multiple regions under selection with genome-wide significance.
... Likewise, populations living at lower latitudes show signatures of positive selection in the pigmentation pathway, owing to higher UVR exposure than populations living at higher latitudes [2]. Additionally, pathogens have imposed selective pressures on human populations, as indicated by more than 300 immune and immune-related genes reported to have signatures of recent positive selection correlated with specific groups of microbes such as viruses, protozoa, and parasitic worms [3][4][5]. ...
... , quantifying a dimensional free energy unit shared amongst all humans. (3) For a given allele a that can be biophysically associated with a definable environmental parameter λ (such as UV light, etc.), we defined the environmentally induced adaptive force on that allele by ≡ , with analogously defined adaptive forces on potentials characterizing SNPs, haploblocks, haplotypes, (4) genes, and even perhaps whole chromosomes. Such an expression is only meaningful if there is a functional relationship between expression of the biology of the genomic unit and a particular environmental parameter λ. ...
... As evident from the table in Fig. 4e, all, except one (rs4968839), of the nsSNVs at the ABC transporter family, which showed evidence of RPS or significant population differentiation, are predicted to either have a potentially deleterious effect on protein function or alter ESE/ESS modulating the proportion of the different splice forms. Notably, > 40% of these nsSNVs have been reported to be significantly associated with various phenotypes including clinically relevant ones [43][44][45][46][47][48][49][50][51][52][53][54][55][56][57][58][59] (Fig. 4e), highlighting the functional importance of the nsSNVs under natural selection at the ABC transporter gene family. ...
Article
Full-text available
Background: Genetic polymorphisms can contribute to phenotypic differences amongst individuals, including disease risk and drug response. Characterization of genetic polymorphisms that modulate gene expression and/or protein function may facilitate the identification of the causal variants. Here, we present the architecture of genetic polymorphisms in the human genome focusing on those predicted to be potentially functional/under natural selection and the pathways in which they reside. Results: In the human genome, polymorphisms that directly affect protein sequences and potentially affect function are the most constrained variants with the lowest single-nucleotide variant (SNV) density, least population differentiation and most significant enrichment of rare alleles. SNVs which potentially alter various regulatory sites, e.g. splicing regulatory elements, are also generally under negative selection. Interestingly, genes that regulate the expression of transcription/splicing factors and histones are conserved, as a higher proportion of these genes are non-polymorphic, contain ultra-conserved elements (UCEs) and/or have no non-synonymous SNVs (nsSNVs)/coding INDELs. On the other hand, major histocompatibility complex (MHC) genes are the most polymorphic with SNVs potentially affecting the binding of transcription/splicing factors and microRNAs (miRNA) exhibiting recent positive selection (RPS). The drug transporter genes carry the largest number of potentially deleterious nsSNVs and exhibit signatures of RPS and/or population differentiation. These observations suggest that genes that interact with the environment are highly polymorphic and targeted by RPS. Conclusions: Selective constraints are observed in coding regions, master regulator genes, and potentially functional SNVs. In contrast, genes that modulate response to the environment are highly polymorphic and under positive selection.
... We applied methods to detect signatures of selection that depend on frequency changes of alleles (CLR) and haplotypes (iHS). The detected signatures of selection may be confounded by other evolutionary forces including genetic drift and background selection [60][61][62]. ...
Article
Full-text available
Background: Autochthonous cattle breeds are an important source of genetic variation because they might carry alleles that enable them to adapt to local environment and food conditions. Original Braunvieh (OB) is a local cattle breed of Switzerland used for beef and milk production in alpine areas. Using whole-genome sequencing (WGS) data of 49 key ancestors, we characterize genomic diversity, genomic inbreeding, and signatures of selection in Swiss OB cattle at nucleotide resolution. Results: We annotated 15,722,811 SNPs and 1,580,878 Indels including 10,738 and 2763 missense deleterious and high impact variants, respectively, that were discovered in 49 OB key ancestors. Six Mendelian trait-associated variants that were previously detected in breeds other than OB, segregated in the sequenced key ancestors including variants causal for recessive xanthinuria and albinism. The average nucleotide diversity (1.6 × 10⁻³) was higher in OB than many mainstream European cattle breeds. Accordingly, the average genomic inbreeding derived from runs of homozygosity (ROH) was relatively low (F_ROH = 0.14) in the 49 OB key ancestor animals. However, genomic inbreeding was higher in OB cattle of more recent generations (F_ROH = 0.16) due to a higher number of long (> 1 Mb) runs of homozygosity. Using two complementary approaches, composite likelihood ratio test and integrated haplotype score, we identified 95 and 162 genomic regions encompassing 136 and 157 protein-coding genes, respectively, that showed evidence (P < 0.005) of past and ongoing selection. These selection signals were enriched for quantitative trait loci related to beef traits including meat quality, feed efficiency and body weight and pathways related to blood coagulation, nervous and sensory stimulus. Conclusions: We provide a comprehensive overview of sequence variation in Swiss OB cattle genomes.
With WGS data, we observe higher genomic diversity and less inbreeding in OB than many European mainstream cattle breeds. Footprints of selection were detected in genomic regions that are possibly relevant for meat quality and adaptation to local environmental conditions. Considering that the population size is low and genomic inbreeding increased in the past generations, the implementation of optimal mating strategies seems warranted to maintain genetic diversity in the Swiss OB cattle population.
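The inbreeding coefficient derived from runs of homozygosity in the abstract above is commonly defined as the fraction of the genome covered by ROH. The sketch below assumes that definition, with ROH segments given as (start, end) base-pair coordinates and a caller-supplied genome length. The function name `f_roh`, the `longer_than_bp` filter, and the toy coordinates are hypothetical and are not taken from the cited study.

```python
def f_roh(roh_segments, genome_length_bp, longer_than_bp=0):
    """Fraction of the genome covered by runs of homozygosity.

    roh_segments: iterable of (start_bp, end_bp) tuples, assumed non-overlapping.
    longer_than_bp: only segments strictly longer than this are counted.
    """
    total = sum(
        end - start
        for start, end in roh_segments
        if end - start > longer_than_bp
    )
    return total / genome_length_bp


# Toy example on a hypothetical 50 Mb genome (not real cattle data).
segments = [(0, 2_000_000), (5_000_000, 5_500_000), (10_000_000, 13_000_000)]
all_roh = f_roh(segments, 50_000_000)
long_roh = f_roh(segments, 50_000_000, longer_than_bp=1_000_000)
print(all_roh, long_roh)
```

Restricting the sum to long segments, as in the second call, isolates the contribution of recent inbreeding, which is the comparison the abstract draws between key ancestors and more recent generations.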
... A selective sweep can be either hard (Maynard Smith and Haigh, 1974;Kaplan et al., 1989;Stephan et al., 1992;Braverman et al., 1995) or soft (Orr and Betancourt, 2001;Innan and Kim, 2004;Hermisson and Pennings, 2005;Przeworski et al., 2005). A hard sweep is driven by a single beneficial haplotype, whereas a soft sweep is driven by multiple beneficial haplotypes that may form because of recurrent mutation/migration or as standing variations (Pennings and Hermisson, 2006;Teshima et al., 2006;Akey, 2009;Scheinfeldt et al., 2009;Peter et al., 2012). Fujito et al. (2018a) considered a soft sweep for generality, but they could not develop a method that can distinguish between the two modes of adaptation. ...
Article
Full-text available
The two-dimensional site frequency spectrum (2D SFS) was investigated to describe the intra-allelic variability (IAV) maintained within a derived allele (D) group that has undergone an incomplete selective sweep against an ancestral allele group. We observed that recombination certainly muddles the ancestral relationships of allelic lineages between the two allele groups; however, the 2D SFS reveals intriguing signatures of recombination as well as the genealogical structure of the D group, particularly the size of a mutation and the time to the most recent common ancestor (TMRCA). Coalescent simulations were performed to achieve powerful and robust 2D SFS-based statistics with special reference to accurate evaluation of IAV, significance of recombination effects, and distinction between hard and soft selective sweeps. These studies were extended to a case wherein an incomplete selective sweep is no longer in progress and ceased in the recent past. The 2D SFS-based method was applied to 100 intronic linkage disequilibrium regions randomly chosen from the East Asian population of modern humans to examine the P value distributions of the summary statistics under the null hypothesis of neutrality in a nonequilibrium demographic model. We argue that about 96% of intronic variants are non-adaptive with a 10% false discovery rate. Furthermore, this method was applied to six genomic regions in Eurasian populations that were claimed to have experienced recent selective sweeps. We found that two of these genomic regions did not have significant signals of selective sweeps, but the remaining four had undergone hard and soft sweeps and were dated, in terms of TMRCA, after the major out-of-Africa dispersal of modern humans.
... The underlying rationale is that nature, as a super laboratory, performs functional experiments by inducing mutagenesis across human genomes and simulating diverse conditions over evolutionary time; regions/variants that are evolutionarily conserved or under positive selection are assumed to be functional. To date, numerous constraint-based algorithms have been developed to measure the deleteriousness of both protein-coding variants [16,17] and non-coding regions [18,19], and numerous methods have emerged and been applied to larger empirical data for detecting positively selected regions [20][21][22][23][24]. For instance, a deleterious missense variant (rs80356779) located in the gene CPT1A (MIM: 600528) [25], a functional variant (rs7330796) located in the gene TBC1D4 (MIM: 612465) [26], and several variants in proteins that metabolize omega-3 polyunsaturated fatty acids [27] occur at high frequency in Arctic human populations and might adapt humans to either specific diets or a cold environment. ...
Article
Full-text available
Despite the tremendous growth of the DNA sequencing data in the last decade, our understanding of the human genome is still in its infancy. To understand the implications of genetic variants in the light of population genetics and molecular evolution, we developed a database, PGG.SNV (https://www.pggsnv.org), which gives much higher weight to previously under-investigated indigenous populations in Asia. PGG.SNV archives 265 million SNVs across 220,147 present-day genomes and 1018 ancient genomes, including 1009 newly sequenced genomes, representing 977 global populations. Moreover, estimation of population genetic diversity and evolutionary parameters is available in PGG.SNV, a unique feature compared with other databases.
... However, both positive selection and demographic history can result in a high level of genetic differentiation; therefore, selection inferences may contain biases (Akey et al. 2004;Sabeti et al. 2006;Wang et al. 2017). Thus, we performed analyses with two additional methods, π-Ratio (others/homer) and XPEHH (homer-to-others), which are less sensitive to population demography events than the other approaches (Innan and (Akey 2009;Sabeti et al. 2006). ...
Article
Full-text available
The homing pigeon was selectively bred from the domestic pigeon for a homing ability over long distances, a very fascinating but complex behavioral trait. Here, we generate a total of 95 whole genomes from diverse pigeon breeds. Comparing the genomes from the homing pigeon population with those from other breeds identifies candidate positively selected genes, including many genes involved in the central nervous system, particularly spatial learning and memory such as LRP8. Expression profiling reveals many neuronal genes displaying differential expression in the hippocampus, which is the key organ for memory and navigation and exhibits significantly larger size in the homing pigeon. In addition, we uncover a candidate gene GSR (encoding glutathione-disulfide reductase) experiencing positive selection in the homing pigeon. Expression profiling found that GSR is highly expressed in the wattle and visual pigment cell layer, and displays increased expression levels in the homing pigeon. In vitro, a magnetic field stimulates increases in calcium ion concentration in cells expressing pigeon GSR. These findings support the importance of the hippocampus (functioning in spatial memory and navigation) for homing ability, and the potential involvement of GSR in pigeon magnetoreception.
... A selective sweep alters the allele frequencies of single nucleotide polymorphisms (SNPs) in the vicinity of the selected allele, and thus causes a distorted pattern of genetic variation that can be useful for detecting selection. The scans for selection that have sought to detect such signals have largely been based on searching for a distortion in the allele frequency spectrum or haplotype structure in a single population (Tajima 1989;Fu and Li 1993;Fay and Wu 2000;Sabeti et al. 2002;Nielsen et al. 2005;Voight et al. 2006; for review, see Akey 2009). ...
... While genome-wide scans with heuristically predetermined analysis regions are an established approach, they are limited in their scope, resolution, and power by requiring a prior choice of the analysis regions. In the context of selection analysis, Akey fittingly compared the scan with a hatchet and called for more refined scalpel-like approaches [28]. We argue that in Akey's analogy, the exhaustive scan is an electron microscope, as it allows for base-pair-level analysis of genomic regions, with genome-wide, nonconservative, optimally powerful correction for multiple testing using replicates of the data generated under the null hypothesis. ...
Article
Full-text available
Region-based genome-wide scans are usually performed by use of a priori chosen analysis regions. Such an approach will likely miss the region comprising the strongest signal and, thus, may result in increased type II error rates and decreased power. Here, we propose a genomic exhaustive scan approach that analyzes all possible subsequences and does not rely on a prior definition of the analysis regions. As a prime instance, we present a computationally ultraefficient implementation using the rare-variant collapsing test for phenotypic association, the genomic exhaustive collapsing scan (GECS). Our implementation allows for the identification of regions comprising the strongest signals in large, genome-wide rare-variant association studies while controlling the family-wise error rate via permutation. Application of GECS to two genomic data sets revealed several novel significantly associated regions for age-related macular degeneration and for schizophrenia. Our approach also offers a high potential to improve genome-wide scans for selection, methylation, and other analyses.
... Similar findings were reported in a study of local adaptation in the wild progenitor of soybean Glycine soja (Bandillo et al. 2017). Different assumptions of each method may be one reason for the small number of colocalized outlier SNPs (Akey 2009). ...
Article
Full-text available
Seed mass is a key component of adaptation in plants and a determinant of yield in crops. The climatic drivers and genomic basis of seed mass variation remain poorly understood. In the cereal crop Sorghum bicolor, globally-distributed landraces harbor abundant variation in seed mass, which is associated with precipitation in their agroclimatic zones of origin. This study aimed to test the hypothesis that diversifying selection across precipitation gradients, acting on ancestral cereal grain size regulators, underlies seed mass variation in global sorghum germplasm. We tested this hypothesis in a set of 1901 georeferenced and genotyped sorghum landraces, 100-seed mass from common gardens, and bioclimatic precipitation variables. As predicted, 100-seed mass in global germplasm varies significantly among botanical races and is correlated to proxies of the precipitation gradients. With general and mixed linear model genome-wide associations, we identified 29 and 56 of 100 a priori candidate seed size genes with polymorphisms in the top 1% of seed mass association, respectively. Eleven of these genes harbor polymorphisms associated with the precipitation gradient, including orthologs of genes that regulate seed size in other cereals. With FarmCPU, 13 significant SNPs were identified, including one at an a priori candidate gene. Finally, we identified eleven colocalized outlier SNPs associated with seed mass and precipitation that also carry signatures of selection based on FST scans and PCAdapt, which represents a significant enrichment. Our findings suggest that seed mass in sorghum was shaped by diversifying selection on drought stress, and can inform genomics-enabled breeding for climate-resilient cereals.
... We applied methods to detect signatures of selection that depend on frequency changes of alleles (CLR) and haplotypes (iHS). The detected signatures of selection may be confounded by other evolutionary forces including genetic drift and background selection [19,92,93]. By considering only the top 0.5% of selection . ...
Preprint
Full-text available
Background: Autochthonous cattle breeds represent an important source of genetic variation because they might carry alleles that enable them to adapt to local environment and food conditions. Original Braunvieh (OB) is a local cattle breed of Switzerland used for beef and milk production in alpine areas. Using whole-genome sequencing (WGS) data of 49 key ancestors, we characterize genomic diversity, genomic inbreeding, and signatures of selection in Swiss OB cattle at nucleotide resolution. Results: We annotated 15,722,811 SNPs and 1,580,878 Indels including 10,738 and 2,763 missense deleterious and high impact variants, respectively, that were discovered in 49 OB key ancestors. Six Mendelian trait-associated variants that were previously detected in breeds other than OB, segregated in the sequenced key ancestors including variants causal for recessive xanthinuria and albinism. The average nucleotide diversity (1.6 × 10⁻³) was higher in OB than many mainstream European cattle breeds. Accordingly, the average genomic inbreeding quantified using runs of homozygosity (ROH) was relatively low (F_ROH = 0.14) in the 49 OB key ancestor animals. However, genomic inbreeding was higher in more recent generations of OB cattle (F_ROH = 0.16) due to a higher number of long (> 1 Mb) runs of homozygosity. Using two complementary approaches, composite likelihood ratio test and integrated haplotype score, we identified 95 and 162 genomic regions encompassing 136 and 157 protein-coding genes, respectively, that showed evidence (P < 0.005) of past and ongoing selection. These selection signals were enriched for quantitative trait loci related to beef traits including meat quality, feed efficiency and body weight and pathways related to blood coagulation, nervous and sensory stimulus. Conclusions: We provide a comprehensive overview of sequence variation in Swiss OB cattle genomes.
With WGS data, we observe higher genomic diversity and less inbreeding in OB than many European mainstream cattle breeds. Footprints of selection were detected in genomic regions that are possibly relevant for meat quality and adaptation to local environmental conditions. Considering that the population size is low and genomic inbreeding increased in the past generations, the implementation and adoption of optimal mating strategies seems warranted to maintain genetic diversity in the Swiss OB cattle population.
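As a rough illustration of the ROH-based inbreeding coefficient reported above, F_ROH can be computed as the fraction of the genome covered by runs of homozygosity. The sketch below is not the authors' pipeline: the function name `f_roh`, the segment coordinates, and the genome length are hypothetical, and segments are assumed to be non-overlapping half-open intervals in base pairs.

```python
from typing import Iterable, Tuple


def f_roh(
    roh_segments: Iterable[Tuple[int, int]],
    genome_length_bp: int,
    min_length_bp: int = 0,
) -> float:
    """Fraction of the genome covered by ROH of at least min_length_bp.

    Assumes (start, end) segments are non-overlapping and half-open.
    """
    if genome_length_bp <= 0:
        raise ValueError("genome_length_bp must be positive")
    covered = sum(
        end - start
        for start, end in roh_segments
        if end - start >= min_length_bp
    )
    return covered / genome_length_bp


# Hypothetical animal: three ROH segments on a 50 Mb toy genome.
segments = [(0, 2_000_000), (5_000_000, 5_500_000), (10_000_000, 13_000_000)]
all_roh = f_roh(segments, 50_000_000)                            # 5.5 Mb covered
long_roh = f_roh(segments, 50_000_000, min_length_bp=1_000_000)  # 5.0 Mb covered
```

Raising `min_length_bp` to 1,000,000 restricts the sum to segments of at least 1 Mb, the length class the abstract associates with the higher inbreeding of recent generations.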
... These selection pressures left signatures in the landscape of genetic variation that can be identified in our genomes today (4). Starting from single-locus studies to the first large-scale catalogs of genetic variation (5)(6)(7)(8), dozens of targets of positive selection have been identified, providing important insights into recent human evolutionary history (3,9,10). Even though genome-wide HapMap genotyping data are able to disentangle the effects of demography and selection better than single-locus approaches, they still have the problem of ascertainment bias, which may alter the site frequency spectrum (SFS) of analyzed single nucleotide polymorphisms (SNPs) (11). ...
Article
Full-text available
Since the migrations that led humans to colonize Earth, our species has faced frequent adaptive challenges that have left signatures in the landscape of genetic variation and that we can identify in our genomes today. Here, we (i) perform an outlier approach on eight different population genetic statistics for 22 non-admixed human populations of the Phase III of the 1000 Genomes Project to detect selective sweeps at different historical ages, as well as events of recurrent positive selection in the human lineage; and (ii) create PopHumanScan, an online catalog that compiles and annotates all candidate regions under selection to facilitate their validation and thorough analysis. Well-known examples of human genetic adaptation published elsewhere are included in the catalog, as well as hundreds of other attractive candidates that will require further investigation. Designed as a collaborative database, PopHumanScan aims to become a central repository to share information, guide future studies and help advance our understanding of how selection has modeled our genomes as a response to changes in the environment or lifestyle of human populations. PopHumanScan is open and freely available at https://pophumanscan.uab.cat.
... To characterize the genomic signature of longevity we used next-generation pool-sequencing (Pool-seq) (Schlötterer et al. 2014) to obtain genome-wide allele frequency estimates from four long-lived selection lines and two unselected control lines after ≥ 144 generations of selection (see Supplementary methods for details). We identified candidate SNPs by comparing allele frequency differentiation between the selection and control regimes with a stringent FST outlier approach (Lewontin and Krakauer 1973; Akey 2009) (Fig. 1A,B). The majority of SNPs (62.2%) showed no or less differentiation between the selection versus control regime as compared to differentiation within these regimes (selection signal-to-noise ratio ≤ 0; Fig. 1B,C). ...
Article
Full-text available
Much has been learned about the genetics of aging from studies in model organisms, but still little is known about naturally occurring alleles that contribute to variation in longevity. For example, analysis of mutants and transgenes has identified insulin signaling as a major regulator of longevity, yet whether standing variation in this pathway underlies microevolutionary changes in lifespan and correlated fitness traits remains largely unclear. Here, we have analyzed the genomes of a set of Drosophila melanogaster lines that have been maintained under direct selection for postponed reproduction and indirect selection for longevity, relative to unselected control lines, for over 35 years. We identified many candidate loci shaped by selection for longevity and late‐life fertility, but – contrary to expectation – we did not find overrepresentation of canonical longevity genes. Instead, we found an enrichment of immunity genes, particularly in the Toll pathway, suggesting that evolutionary changes in immune function might underpin – in part – the evolution of late‐life fertility and longevity. To test whether this genomic signature is causative, we performed functional experiments. In contrast to control flies, long‐lived flies tended to downregulate the expression of antimicrobial peptides upon infection with age yet survived fungal, bacterial, and viral infections significantly better, consistent with alleviated immunosenescence. To examine whether genes of the Toll pathway directly affect longevity, we employed conditional knockdown using in vivo RNAi. In adults, RNAi against the Toll receptor extended lifespan, whereas silencing the pathway antagonist cactus – causing immune hyperactivation – dramatically shortened lifespan. Together, our results suggest that genetic changes in the age‐dependent regulation of immune homeostasis might contribute to the evolution of longer life.
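The citing excerpt for this study describes an FST outlier approach on pooled allele frequencies. A minimal sketch of that general idea follows, assuming biallelic SNPs, two equally weighted populations, and Wright's heterozygosity-based FST. It is not the authors' method; the function names and the example frequencies are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def wright_fst(p1: float, p2: float) -> float:
    """Wright's FST for one biallelic SNP in two equally weighted populations."""
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-population heterozygosity
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)                       # total heterozygosity
    if h_t == 0:
        return 0.0  # monomorphic in both populations
    return max(0.0, (h_t - h_s) / h_t)


def top_outliers(values: Sequence[float], fraction: float) -> List[int]:
    """Indices of the highest `fraction` of values (an empirical outlier cut)."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    n_keep = math.ceil(fraction * len(values))
    ranked = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    return sorted(ranked[:n_keep])


# Hypothetical allele frequencies (selected regime, control regime) at five SNPs.
freqs: List[Tuple[float, float]] = [
    (0.5, 0.5), (0.1, 0.9), (0.4, 0.5), (0.3, 0.3), (0.2, 0.6),
]
fst_values = [wright_fst(p1, p2) for p1, p2 in freqs]
candidates = top_outliers(fst_values, fraction=0.2)  # top 20% of five SNPs
```

An empirical cut of this kind ranks loci but does not by itself supply a significance threshold, which is the caveat several of the excerpts on this page attribute to Akey (2009).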
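... hmmSweep [37] and XP-CLR (the cross-population composite likelihood ratio method) [38], as well as the PBS (population branch statistic) of Yi et al. [39], detect natural selection using methods based on linkage disequilibrium, selective sweeps, and population differentiation, respectively, and all have achieved encouraging results. According to Vitti et al. [34] and Akey [40] [57,59], FST, iHS and XP-EHH (the cross-population extended haplotype homozygosity), and also incorporated two frequency-spectrum-based detection methods proposed by the authors: ∆DAF and ∆iHH. [63]. ...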
... For every locus, a single FST value was computed. FST depends on initial allele frequency (Jakobsson, Edge, & Rosenberg, 2013) and cannot alone be used to generate significance thresholds (Akey, 2009). Therefore we implemented a simulation-based test that incorporated the demographic history of the C15-H and C15-L populations to allow for significance testing. ...
Preprint
Full-text available
Stalk lodging, breakage of the stalk at or below the ear, causes substantial yield losses in maize. The strength of the stalk rind, commonly measured as rind penetrometer resistance (RPR), is an important contributor to stalk lodging resistance. To better understand the genetic architecture of RPR, we conducted selection mapping on populations developed by 15 cycles of divergent selection for high (C15-H) and low (C15-L) RPR. We also performed time-course transcriptome and metabolic analyses on developing stalks of high (Hrpr1) and low (Lrpr1) RPR inbred lines derived from the C15-H and C15-L populations, respectively. Divergent selection significantly altered allele frequencies at 3,656 and 3,412 single nucleotide polymorphisms (SNPs) in the C15-H and C15-L populations, respectively. While the majority of the SNPs under selection were unique, 110 SNPs were common to both populations, indicating the fixation of alleles with alternative effects. Remarkably, preferential selection on the genomic regions associated with lignin and polysaccharide biosynthesis genes was observed in C15-H and C15-L populations, respectively. This observation was supported by higher lignification and lower extractability of cell wall-bound sugars in Hrpr1 compared to Lrpr1. Tricin, a monolignol important for incorporation of lignin in grass cell walls, emerged as a key determinant of the different cell wall properties of Hrpr1 and Lrpr1. Integration of selection mapping with transcriptomics and previous genetic studies on RPR identified 40 novel candidate genes including ZmMYB31, ZmNAC25, ZmMADS1, two PAL paralogues, two lichenases, ZmEXPA2, ZmIAA41, and Caleosin. Enhanced mechanistic and genetic understanding of RPR provides a foundation for improved stalk lodging resistance.
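The citing excerpt for this preprint mentions a simulation-based significance test for FST. One way such a test could look is sketched below, under strong simplifying assumptions that are not taken from the preprint: a constant-size Wright-Fisher population, pure drift from a shared starting frequency, and hypothetical values for the number of chromosomes and simulations. The function names are illustrative only.

```python
import random
from typing import List


def wright_fst(p1: float, p2: float) -> float:
    """Wright's FST for a biallelic locus in two equally weighted populations."""
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)
    return 0.0 if h_t == 0 else max(0.0, (h_t - h_s) / h_t)


def drift(p: float, n_chrom: int, generations: int, rng: random.Random) -> float:
    """Allele frequency after binomial (Wright-Fisher) sampling each generation."""
    for _ in range(generations):
        copies = sum(1 for _ in range(n_chrom) if rng.random() < p)
        p = copies / n_chrom
    return p


def null_fst_distribution(
    p0: float, n_chrom: int, generations: int, n_sims: int, seed: int = 0
) -> List[float]:
    """FST between two populations that diverged by drift alone from frequency p0."""
    rng = random.Random(seed)
    return [
        wright_fst(
            drift(p0, n_chrom, generations, rng),
            drift(p0, n_chrom, generations, rng),
        )
        for _ in range(n_sims)
    ]


def empirical_p_value(observed: float, null: List[float]) -> float:
    """Share of null values at least as large as observed (with +1 correction)."""
    return (1 + sum(v >= observed for v in null)) / (1 + len(null))


# Hypothetical setting: 15 generations, 100 chromosomes, starting frequency 0.5.
null = null_fst_distribution(p0=0.5, n_chrom=100, generations=15, n_sims=300, seed=1)
p_value = empirical_p_value(0.8, null)
```

Because the null distribution is generated for a given starting frequency, the threshold adapts to initial allele frequency, which addresses the dependence the excerpt cites as the reason FST alone cannot set significance thresholds.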
... However, they carry several caveats and limitations. Like other outlier-based tests (Akey 2009), these ...
Preprint
Full-text available
Malaria has plausibly been the single strongest selective pressure on our species. Many of the best-characterized cases of adaptive evolution in humans are in genes tied to malaria resistance. However, the complex evolutionary patterns at these genes are poorly captured by standard scans for non-neutral evolution. Here we present three new statistical tests for selection based on population genetic patterns that are observed more than once among key malaria resistance loci. We assess these tests using forward-time evolutionary simulations and apply them to global whole-genome sequencing data from humans, and thus we show that they are effective at distinguishing selection from neutrality. Each test captures a distinct evolutionary pattern, here called Divergent Haplotypes, Repeated Shifts, and Arrested Sweeps, associated with a particular period of human prehistory. We clarify the selective signatures at known malaria-relevant genes and identify additional genes showing similar adaptive evolutionary patterns. Among our top outliers, we see a particular enrichment for genes involved in erythropoiesis and for genes previously associated with malaria resistance, consistent with a major role for malaria in shaping these patterns of genetic diversity. Polymorphisms at these genes are likely to impact resistance to malaria infection and contribute to ongoing host-parasite coevolutionary dynamics.
... While genome-wide scans with heuristically predetermined analysis regions are an established approach, they are limited in their scope, resolution and power by requiring a prior choice of the analysis regions. In the context of selection analysis, Akey fittingly compared the scan to a hatchet and called for more refined scalpel-like approaches (AKEY 2009). We argue that in ...
Preprint
Full-text available
Region-based genome-wide scans are usually performed by use of a priori chosen analysis regions. Such an approach will likely miss the region comprising the strongest signal and, thus, may result in increased type II error rates and decreased power. Here, we propose a genomic exhaustive scan approach that analyzes all possible subsequences and does not rely on a prior definition of the analysis regions. As a prime instance, we present a computationally ultra-efficient implementation using the rare-variant collapsing test for phenotypic association, the genomic exhaustive collapsing scan (GECS). Our implementation allows for the identification of regions comprising the strongest signals in large, genome-wide rare-variant association studies while controlling the family-wise error rate via permutation. Application of GECS to two genomic data sets revealed several novel significantly associated regions for age-related macular degeneration and for schizophrenia. Our approach also offers a high potential for genome-wide scans for selection, methylation and other analyses.
... In this study, we compute diverse evolutionary measures on sPTB-associated genomic regions to infer the action of multiple evolutionary forces (Table 1). While various methods to detect signatures of evolutionary forces exist, many of them lack approaches for determining statistically significant observations or rely on the genome-wide background distribution as the null expectation to determine statistical significance (e.g., outlier-based methods) [34,35]. Comparison to the genome-wide background distribution is appropriate in some contexts, but such outlier-based methods do not account for genomic attributes that may influence both the identification of variants of interest and the expected distribution of the evolutionary metrics, leading to false positives. ...
Article
Full-text available
Currently, there is no comprehensive framework to evaluate the evolutionary forces acting on genomic regions associated with human complex traits and contextualize the relationship between evolution and molecular function. Here, we develop an approach to test for signatures of diverse evolutionary forces on trait-associated genomic regions. We apply our method to regions associated with spontaneous preterm birth (sPTB), a complex disorder of global health concern. We find that sPTB-associated regions harbor diverse evolutionary signatures including conservation, excess population differentiation, accelerated evolution, and balanced polymorphism. Furthermore, we integrate evolutionary context with molecular evidence to hypothesize how these regions contribute to sPTB risk. Finally, we observe enrichment in signatures of diverse evolutionary forces in sPTB-associated regions compared to genomic background. By quantifying multiple evolutionary forces acting on sPTB-associated regions, our approach improves understanding of both functional roles and the mosaic of evolutionary forces acting on loci. Our work provides a blueprint for investigating evolutionary pressures on complex traits.
... While the study of small numbers of autosomal loci is less powerful for testing evolutionary scenarios with large numbers of parameters, specific loci with well-known functional roles can provide insights into the combination of demographic and selective events that shaped the distribution and variation of a specific gene or genomic region [17]. ...
Article
Full-text available
The American continent was the last to be occupied by modern humans, and native populations bear the marks of recent expansions, bottlenecks, natural selection, and population substructure. Here we investigate how this demographic history has shaped genetic variation at the strongly selected HLA loci. In order to disentangle the relative contributions of selection and demographic processes, we assembled a dataset with genome-wide microsatellites and HLA-A, -B, -C, and -DRB1 typing data for a set of 424 Native American individuals. We find that demographic history explains a sizeable fraction of HLA variation, both within and among populations. A striking feature of HLA variation in the Americas is the existence of alleles which are present in the continent but either absent or very rare elsewhere in the world. We show that this feature is consistent with demographic history (i.e., the combination of changes in population size associated with bottlenecks and subsequent population expansions). However, signatures of selection at HLA loci are still visible, with significant evidence of selection at deeper timescales for most loci and populations, as well as population differentiation at HLA loci exceeding that seen at neutral markers. PLOS ONE | https://doi.org/10.1371/journal.pone.
... Therefore, previous claims of signals of adaptation based mainly on the spatial distribution of allele frequencies [e.g. 52,53] should probably be taken with great caution or revised [54], and could partly explain some lack of consistency between the outcome of various genome scans [55]. It has also been recently recognized that there was an overall lack of evidence for strong signals of selective sweeps in the human genome [56][57][58], with only a handful of regions with fixed differences between continents [59]. ...
Preprint
Genetic surfing describes the spatial spread and increase in frequency of variants that are not lost by genetic drift and serial migrant sampling during a range expansion. Genetic surfing does not modify the total number of derived alleles in a population or in an individual genome, but it leads to a loss of heterozygosity along the expansion axis, implying that derived alleles are more often in homozygous state. Genetic surfing also affects selected variants on the wave front, making them behave almost like neutral variants during the expansion. In agreement with theoretical predictions, human genomic data reveals an increase in recessive mutation load with distance from Africa, an expansion load likely to have developed during the expansion of human populations out of Africa.
... [The Civian et al. (2015) analyses may also have other flaws (Huang and Han, 2015)]. The fact that we find little overlap among SS regions identified by different methods mirrors the lack of overlap of SS regions identified across the human genome by different studies (Akey, 2009), between domesticated grasses (Gaut, 2015), and between independent domestication events of common bean (Gaut, 2015). Because the inferred locations of SS regions vary markedly by method, sampling and taxon, they should be interpreted with caution, particularly as markers of independent domestication events. ...
Preprint
Full-text available
Many SNPs are predicted to encode deleterious amino acid variants. These slightly deleterious mutations can provide unique insights into population history, the dynamics of selection, and the genetic bases of phenotypes. This is especially true for domesticated species, where a history of bottlenecks and selection may affect the frequency of deleterious variants and signal a ‘cost of domestication’. Here we investigated the numbers and frequencies of deleterious variants in Asian rice (O. sativa), focusing on two varieties (japonica and indica) and their wild relative (O. rufipogon). We investigated three signals of a potential cost of domestication in Asian rice relative to O. rufipogon: an increase in the frequency of deleterious SNPs (dSNPs), an enrichment of dSNPs compared to synonymous SNPs (sSNPs), and an increased number of deleterious variants. We found evidence for all three signals, and domesticated individuals contained ~3-4% more deleterious alleles than wild individuals. Deleterious variants were enriched within low recombination regions of the genome and experienced frequency increases similar to sSNPs within regions of putative selective sweeps. A characteristic feature of rice domestication was a shift in mating system from outcrossing to predominantly selfing. Forward simulations suggest that this shift in mating system may have been the dominant factor in shaping both deleterious and neutral diversity in rice.
... The availability of population genomic data has empowered efforts to uncover the selective, demographic, and stochastic forces driving patterns of genetic variation within species. Chief among these are attempts to uncover the genetic basis of recent adaptation [1]. Indeed, recent advances in genotyping and sequencing technologies have been accompanied by a proliferation of statistical methods for identifying recent positive selection [see 2 for recent review]. ...
Preprint
Full-text available
Detecting the targets of adaptive natural selection from whole genome sequencing data is a central problem for population genetics. However, to date most methods have shown sub-optimal performance under realistic demographic scenarios. Moreover, over the past decade there has been a renewed interest in determining the importance of selection from standing variation in adaptation of natural populations, yet very few methods for inferring this model of adaptation at the genome scale have been introduced. Here we introduce a new method, S/HIC, which uses supervised machine learning to precisely infer the location of both hard and soft selective sweeps. We show that S/HIC has unrivaled accuracy for detecting sweeps under demographic histories that are relevant to human populations, and distinguishing sweeps from linked as well as neutrally evolving regions. Moreover we show that S/HIC is uniquely robust among its competitors to model misspecification. Thus even if the true demographic model of a population differs catastrophically from that specified by the user, S/HIC still retains impressive discriminatory power. Finally we apply S/HIC to the case of resequencing data from human chromosome 18 in a European population sample and demonstrate that we can reliably recover selective sweeps that have been identified earlier using less specific and sensitive methods.
... The models of quantitative genetics have had a less dramatic impact on studies of evolutionary adaptation, where genomes are often scanned to identify adaptive loci with large effects (Akey 2009). ...
Preprint
Full-text available
Important traits in agricultural, natural, and human populations are increasingly being shown to be under the control of many genes that individually contribute only a small proportion of genetic variation. However, the majority of modern tools in quantitative and population genetics, including genome wide association studies and selection mapping protocols, are designed to identify individual genes with large effects. We have developed an approach to identify traits that have been under selection and are controlled by large numbers of loci. In contrast to existing methods, our technique utilizes additive effects estimates from all available markers, and relates these estimates to allele frequency change over time. Using this information, we generate a composite statistic, denoted Ĝ , which can be used to test for significant evidence of selection on a trait. Our test requires pre- and post-selection genotypic data but only a single time point with phenotypic information. Simulations demonstrate that Ĝ is powerful for identifying selection, particularly in situations where the trait being tested is controlled by many genes, which is precisely the scenario where classical approaches for selection mapping are least powerful. We apply this test to breeding populations of maize and chickens, where we demonstrate the successful identification of selection on traits that are documented to have been under selection.
... It is in this context that more sophisticated methods emerged, particularly for humans, with the aim of detecting recent selective sweeps at finer resolution. There are several categories of statistical methods for detecting selective sweeps, which can be based on different types of signals (Akey, 2009). Some are based on the allele frequency spectrum, either through statistics such as Tajima's D and its variants, which summarize it, or through a comparison of the likelihood of the data (thus exploiting the dataset more fully) under models with and without selection. ...
Thesis
Melampsora larici-populina is a fungal pathogen responsible for poplar leaf rust, causing severe damage in plantations worldwide. Almost all poplar resistances deployed in France have been overcome, and the major event occurred in 1994 with the breakdown of the R7 resistance, which was widely deployed in poplar cultivation. To identify candidate genes related to pathogenicity, I conducted a comparative genomics study based on the sequencing of 15 isolates. This analysis uncovered polymorphism patterns correlated with the distribution of virulences among the isolates, while revealing the need for a population-level study. To that end, a population genetics study based on the genotyping of 600 M. larici-populina isolates sampled from 1992 to 2012 was conducted. This analysis allowed me to describe the demographic history of M. larici-populina populations and to document the major impact of the R7 breakdown on population structure. Finally, I carried out a population genomics analysis to obtain a demographic scenario describing the historical relationships among populations and to identify regions under selection. This analysis is based on Illumina sequencing of 86 isolates distributed across four key populations highlighted by the analysis of population genetic structure. More than 1,000,000 polymorphic positions were identified. A reliable scenario was identified by ABC, from which the confidence envelopes of the population genetics indices were measured. The genome scan analysis, performed on the 86 genomes using these same indices, revealed 20 genomic regions containing 14 candidate genes potentially involved in the breakdown of R7.
... In contrast to the stochastic effects of drift, selection systematically alters allele frequencies by favoring particular alleles at the expense of others as a result of their effects on fitness. Researchers often study drift by excluding potentially selected sites 2,3,4 , or selection by focusing on site-specific patterns under the assumption that genome-wide diversity reflects primarily the action of drift 5 . ...
Preprint
Genetic diversity is shaped by the interaction of drift and selection, but the details of this interaction are not well understood. The impact of genetic drift in a population is largely determined by its demographic history, typically summarized by its long-term effective population size ( N e ). Rapidly changing population demographics complicate this relationship, however. To better understand how changing demography impacts selection, we used whole-genome sequencing data to investigate patterns of linked selection in domesticated and wild maize (teosinte). We produce the first whole-genome estimate of the demography of maize domestication, showing that maize was reduced to approximately 5% the population size of teosinte before it experienced rapid expansion post-domestication to population sizes much larger than its ancestor. Evaluation of patterns of nucleotide diversity in and near genes shows little evidence of selection on beneficial amino acid substitutions, and that the domestication bottleneck led to a decline in the efficiency of purifying selection in maize. Young alleles, however, show evidence of much stronger purifying selection in maize, reflecting the much larger effective size of present day populations. Our results demonstrate that recent demographic change — a hallmark of many species including both humans and crops — can have immediate and wide-ranging impacts on diversity that conflict with would-be expectations based on N e alone.
... There is significant recombination between FY*A and FY*B as, unlike FY*O, they coexist in many populations. Evidence of selection in DARC. Evidence of positive selection at FY*O. Despite FY*O's biological support for positive selection, it has not been identified as a potential selected region in many genome-wide selection scans [27][28][29][30][31][32][33][34][35]. Accordingly, we find the DARC promoter region is not an outlier in the genome with respect to segregating sites, average number of pairwise differences nor Tajima's D (S3 Table). ...
Preprint
Full-text available
The human DARC (Duffy antigen receptor for chemokines) gene encodes a membrane-bound chemokine receptor crucial for the infection of red blood cells by Plasmodium vivax, a major causative agent of malaria. Of the three major allelic classes segregating in human populations, the FY*O allele has been shown to protect against P. vivax infection and is near fixation in sub-Saharan Africa, while FY*B and FY*A are common in Europe and Asia, respectively. Due to the combination of its strong geographic differentiation and association with malaria resistance, DARC is considered a canonical example of a locus under positive selection in humans. Here, we use sequencing data from over 1,000 individuals in twenty-one human populations, as well as ancient human and great ape genomes, to analyze the fine scale population structure of DARC. We estimate the time to most recent common ancestor (T_MRCA) of the FY*O mutation to be 42 kya (95% CI: 34–49 kya). We infer the FY*O null mutation swept to fixation in Africa from standing variation with very low initial frequency (0.1%) and a selection coefficient of 0.043 (95% CI: 0.011–0.18), which is among the strongest estimated in the genome. We estimate the T_MRCA of the FY*A mutation to be 57 kya (95% CI: 48–65 kya) and infer that, prior to the sweep of FY*O, all three alleles were segregating in Africa, as highly diverged populations from Asia and ǂKhomani San hunter-gatherers share the same FY*A haplotypes. We test multiple models of admixture that may account for this observation and reject recent Asian or European admixture as the cause. Author Summary: Infectious diseases have undoubtedly played an important role in ancient and modern human history. Yet, there are relatively few regions of the genome involved in resistance to pathogens that have shown a strong selection signal.
We revisit the evolutionary history of a gene associated with resistance to the most common malaria-causing parasite, Plasmodium vivax, and show that it is one of the regions of the human genome that has been under strongest selective pressure in our evolutionary history (selection coefficient: 5%). Our results are consistent with a complex evolutionary history of the locus involving selection on a mutation that was at a very low frequency in the ancestral African population (standing variation) and a large differentiation between European, Asian and African populations.
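The citing excerpt for this study compares the DARC promoter with the rest of the genome using segregating sites, average pairwise differences, and Tajima's D. For readers unfamiliar with how these three summaries relate, here is a small sketch using the standard Tajima (1989) formulas. The input format (equal-length strings of alleles, one per sampled chromosome) and the toy haplotypes are hypothetical, not data from the study.

```python
from itertools import combinations
from math import sqrt
from typing import Optional, Sequence


def tajimas_d(haplotypes: Sequence[str]) -> Optional[float]:
    """Tajima's D for aligned haplotypes; None if there are no segregating sites."""
    n = len(haplotypes)
    if n < 4:
        raise ValueError("need at least 4 haplotypes for a non-zero variance term")
    length = len(haplotypes[0])
    if any(len(h) != length for h in haplotypes):
        raise ValueError("haplotypes must have equal length")

    # S: number of segregating sites (columns with more than one allele).
    s = sum(1 for site in zip(*haplotypes) if len(set(site)) > 1)
    if s == 0:
        return None

    # pi: average number of pairwise differences.
    pairs = list(combinations(haplotypes, 2))
    pi = sum(sum(a != b for a, b in zip(h1, h2)) for h1, h2 in pairs) / len(pairs)

    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    return (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


# Toy samples: an excess of singletons versus intermediate-frequency variants.
d_singletons = tajimas_d(["1000", "0100", "0010", "0001"])
d_intermediate = tajimas_d(["11", "11", "00", "00"])
```

An excess of rare variants pulls D below zero and an excess of intermediate-frequency variants pushes it above zero, so a region is only flagged when its value is extreme relative to the genome-wide distribution.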
... Combined with the PBS method, only 87 and 45 PSGs were detected by all four methods in SA and SL breeds, respectively (Figures S8 and S9). The low number of overlapped PSGs could have resulted from different signatures of population variations in the four methods (Akey, 2009; Sabeti et al., 2006; Wang et al., 2016). Finally, we obtained 942 PSGs in SA breed and 923 PSGs in SL breed after integrating all candidate selection regions from the four methods, and 150 of them were found in both SA and SL breeds (Figure S10). ...
Article
Full-text available
The genetic footprints of adaptations to naturally occurring tropical stress along with domestication are poorly reported in chickens. Here, by conducting population genomic analyses of 67 chickens inhabiting distinct climates, we found signals of gene flow from Tibetan chickens to Sri Lankan and Saudi Arabian breeds and identified 12 positively selected genes that are likely involved in genetic adaptations to both tropical desert and tropical monsoon island climates. Notably, in tropical desert climate, advantageous alleles of TLR7 and ZC3HAV1, which could inhibit replication of viruses in cells, suggest immune adaptation to the defense against zoonotic diseases in chickens. Furthermore, comparative genomic analysis showed that four genes (OC90, PLA2G12B, GPR17 and TNFRSF11A) involved in arachidonic acid metabolism have undergone convergent adaptation to tropical desert climate between birds and mammals. Our study offers insights into the genetic mechanisms of adaptations to tropical climates in birds and other animals and provides practical value for breeding design and medical research on avian viruses.
... iv) Over the past decade, both empirical data and theoretical advances have sufficiently accumulated to suggest that adaptive evolution is not mutation-limited in natural populations but natural selection have [106,107] . Many authors claimed that both empirical data and theoretical advances have sufficiently accumulated to suggest that genetic drift and small population size have no adaptive value but natural selection have [108][109][110][111] . Thus, according to numerous evolutionary biologists neutral theory is opposite to any kind of evolution. ...
Article
Full-text available
The objective of this article is to argue that seven non-Darwinian theories are opposed to evolution. Genetic drift underlies punctuated equilibrium, the shifting balance theory, the allopatric speciation theory, and the species selection theory of macroevolution. Genetic drift works rapidly in a small, isolated population and does not work in a large population; hence, genetic drift implies a small, isolated population and vice versa. But genetic drift creates no variation, and without variation (the raw material of any kind of evolution) there is no evolution. Hence, evolutionary biologists rejected genetic drift as a basis for any kind of evolution. Moreover, members of small, isolated populations must mate with close relatives and so produce homozygous organisms. Homozygous organisms have low fecundity, suffer from various diseases, are the least fit to survive, and may go extinct suddenly, e.g. the American heath hen. Thus, small and isolated populations (i.e. genetic drift) are opposed to any kind of evolution and even carry a risk of extinction. Genetic drift is also the key force of the Neutral theory, which works in small and isolated populations. Consequently, the Neutral theory is opposed to any kind of evolution, and many evolutionary biologists rejected it. Evolutionary biologists also rejected the shifting balance theory, the punctuated equilibrium theory, and Goldschmidt's theory. Gould and Wright advocated the chromosomal speciation (chromosome rearrangement) theory of macroevolution, which is not valid. Moreover, extinction is held to be the main process of macroevolution, which is quite absurd. Fossils are the best and only evidence for those theories of macroevolution, but the fossil record completely opposes macroevolution. So, those seven non-Darwinian theories are opposed to any kind of evolution. Consequently, the Darwinists, the neo-Darwinists, and the sociobiologists oppose those non-Darwinian theories. It follows that plants and animals, including humans, did not evolve via those theories.
... However, the rapid advancement of genome sequencing, including methods that use reduced genomic complexity (e.g., genotyping by sequencing (GBS), restriction-site associated DNA sequencing (RADseq)), has opened the door to more comprehensive assessments of population-level diversity and allowed for the detection of regions under selection. Although some instances of strong selection on single or few gene loci have been noted (Akey, 2009;Linnen, Kingsley, Jensen, & Hoekstra, 2009;Sella, Petrov, Przeworski, & Andolfatto, 2009), many traits of adaptive importance in plants are believed to be polygenic in nature (Holland, 2007;Le Corre & Kremer, 2012;Pritchard & Di Rienzo, 2010;Yeaman et al., 2016). Under selection, these traits can exhibit subtle changes in frequency across many loci of small effect. ...
Article
Full-text available
Uncovering the genetic basis of local adaptation is a major goal of evolutionary biology and conservation science alike. In an era of climate change, an understanding of how environmental factors shape adaptive diversity is crucial to predicting species response and directing management. Here, we investigate patterns of genomic variation in giant sequoia, an iconic and ecologically important tree species, using 1,364 bi‐allelic single nucleotide polymorphisms (SNPs). We use an FST outlier test and two genotype–environment association methods, latent factor mixed models (LFMMs) and redundancy analysis (RDA), to detect complex signatures of local adaptation. Results indicate 79 genomic regions of potential adaptive importance, with limited overlap between the detection methods. Of the 58 loci detected by LFMM, 51 showed strong correlations to a precipitation‐driven composite variable and seven to a temperature‐related variable. RDA revealed 24 outlier loci with association to climate variables, all of which showed strongest relationship to summer precipitation. Nine candidate loci were indicated by two methods. After correcting for geographic distance, RDA models using climate predictors accounted for 49% of the explained variance and showed significant correlations between SNPs and climatic factors. Here, we present evidence of local adaptation in giant sequoia along gradients of precipitation and provide a first step toward identifying genomic regions of adaptive significance. The results of this study will provide information to guide management strategies that seek to maximize adaptive potential in the face of climate change.
... We searched for further associations of this gene in the whole GWAS Catalog database, finding SNP rs675209, which also maps to RREB1 and is associated with modulation of uric acid levels in blood [99] (Figure 6). Moreover, this gene has also been pinpointed as a candidate of positive selection by Akey et al. [76]. When comparing CEU vs. YRI, XP-EHH reaches significant values between ~21,925,000 and ~21,937,000 bp, although this is not visible at the zoom level shown. FST also reached significant values for SNPs within this same region. ...
Thesis
Full-text available
Old age comes coupled with frailty and disease and, thus, the ageing of the world’s population has spurred interest in the causes and mechanisms of senescence. Senescence has long been a mystery, with no single universally accepted theory accounting for its ultimate evolutionary causes (if indeed these causes exist). Perhaps two of the most popular evolutionary explanations proposed so far are the Mutation Accumulation Theory proposed by Peter Medawar in 1951 and the Antagonistic Pleiotropy Theory, suggested by George C. Williams in 1957. The large amount of data derived from Genome-Wide Association Studies (GWAS) obtained over the last decade allows testing both theories, provided that they can make predictions in terms of the genetic architecture of complex human disease. However, if we want to take advantage of GWAS data, we need to ensure that they are sound and replicable and that they contain information that is useful for our purposes. This PhD thesis deals with both goals: we first assess the quality and replicability of information on genome-disease associations and then we use it to explore the Mutation Accumulation and Antagonistic Pleiotropy theories of senescence. Knowledge about the impact of these theories will be important for an increasingly ageing population.
... By virtue of large genome-wide association studies and population sequencing efforts, humans have become somewhat of a model organism for population genomics (see reviews by Akey (2009) and Scheinfeldt and Tishkoff (2013)). Some human studies combining selection mapping and genetic mapping would fit in the genome annotation category (for example, studies identifying selective sweeps that overlap genes associated with Mendelian traits that may give hints about their function, such as albinism-associated genes in Voight et al. (2006)). ...
Article
Full-text available
Genomic scans for signatures of selection allow us to, in principle, detect variants and genes that underlie recent adaptations. By combining selection mapping with genetic mapping of traits known to be relevant to adaptation, we can simultaneously investigate whether genes and variants show signals of recent selection and whether they impact traits that have likely been selected. There are three ways to integrate selection mapping with genetic mapping or functional genomics: (1) To use genetic mapping data from other populations as a form of genome annotation. (2) To perform experimental evolution or artificial selection to be able to study selected variants when they segregate, either by performing genetic mapping before selection or by crossing the selected individuals to some reference population. (3) To perform a comparative study of related populations facing different selection regimes. This short review discusses these different ways of integrating selection mapping with genetic mapping and functional genomics, with examples of how each has been done.
Article
Full-text available
Genetic variation among individuals is considered an important tool for the conservation of livestock animals. Molecular markers have been widely used in recent years to determine genetic variation between populations. Therefore, this study was conducted to analyze genetic diversity using F statistics in populations of Camelus dromedarius in the north of Kerman using 8 autosomal microsatellite markers (YWLL08, VOLP03, VOLP08, YWLL38, CVR01, YWLL44, VOLP32 and VOLP67). Eighty-one blood samples were collected from Shahr-e Babak, Rafsanjan and Ravar. Total DNA was extracted from the samples by the salting-out method and used for genotyping. The highest and lowest numbers of alleles and effective alleles were observed for YWLL08 (21 and 14.9) and VOLP32 (4 and 3.11), respectively. Fixation index (FST) values for markers YWLL08, VOLP03, VOLP08, YWLL38, CVR01, YWLL44, VOLP32 and VOLP67 were 0.036, 0.088, 0.080, 0.045, 0.054, 0.0698, 0.014 and 0.060, respectively, indicating low differentiation between populations. The highest gene flow was obtained between the Shahr-e Babak population and the Rafsanjan2 samples (15.83), and the lowest was observed between the two populations of Ravar (6.49). In general, it can be concluded that Camelus dromedarius in the north of Kerman has relatively high genetic diversity and that the microsatellite markers are relatively highly polymorphic and can therefore be used for genetic studies.
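The abstract above does not say how gene flow was estimated. One common approximation, under Wright's island model, is Nm ≈ (1/FST − 1)/4. The Python sketch below applies that assumed formula to the per-marker FST values reported in the abstract; the function name and the use of a simple mean across markers are illustrative choices, not the authors' method. The gene flow values in the abstract are pairwise between populations, so they are not expected to match a figure derived from per-marker FST.

```python
# Per-marker FST values as reported in the abstract.
fst_by_marker = {
    "YWLL08": 0.036, "VOLP03": 0.088, "VOLP08": 0.080, "YWLL38": 0.045,
    "CVR01": 0.054, "YWLL44": 0.0698, "VOLP32": 0.014, "VOLP67": 0.060,
}


def gene_flow_from_fst(fst: float) -> float:
    """Approximate Nm (migrants per generation) from FST.

    Assumes Wright's island model: Nm ~= (1/FST - 1) / 4.
    This is an illustrative assumption, not the estimator used in the study.
    """
    if not 0.0 < fst < 1.0:
        raise ValueError("FST must lie strictly between 0 and 1")
    return (1.0 / fst - 1.0) / 4.0


# Simple (unweighted) mean FST across the eight markers.
mean_fst = sum(fst_by_marker.values()) / len(fst_by_marker)

if __name__ == "__main__":
    print(f"mean FST across markers: {mean_fst:.5f}")
    print(f"approximate Nm at mean FST: {gene_flow_from_fst(mean_fst):.2f}")
    for marker, fst in fst_by_marker.items():
        print(f"{marker}: FST={fst}, Nm~{gene_flow_from_fst(fst):.2f}")
```

Under this approximation, lower FST implies higher gene flow, which is consistent with the abstract's reading of low differentiation between populations.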
Article
Full-text available
Modern dogs are distinguished among domesticated species by the vast breadth of phenotypic variation produced by strong and consistent human-driven selective pressure. The resulting breeds reflect the development of closed populations with well-defined physical and behavioral attributes. The sport-hunting dog group has long been employed in assistance to hunters, reflecting strong behavioral pressures to locate and pursue quarry over great distances and variable terrain. Comparison of whole-genome sequence data between sport-hunting and terrier breeds, groups at the ends of a continuum in both form and function, reveals that genes underlying cardiovascular, muscular, and neuronal functions are under strong selection in sport-hunting breeds, including ADRB1, TRPM3, RYR3, UTRN, ASIC3, and ROBO1. We also identified an allele of TRPM3 that was significantly associated with increased racing speed in Whippets, accounting for 11.6% of the total variance in racing performance. Finally, we observed a significant association of ROBO1 with breed-specific accomplishments in competitive obstacle course events. These results provide strong evidence that sport-hunting breeds have been adapted to their occupations by improved endurance, cardiac function, blood flow, and cognitive performance, demonstrating how strong behavioral selection alters physiology to create breeds with distinct capabilities.
Preprint
Full-text available
Human pregnancy requires the coordinated function of multiple tissues in both mother and fetus and has evolved in concert with major human adaptations. As a result, pregnancy-associated phenotypes and related disorders are genetically complex and have likely been sculpted by diverse evolutionary forces. However, there is no framework to comprehensively evaluate how these traits evolved or to explore the relationship of evolutionary signatures on trait-associated genetic variants to molecular function. Here we develop an approach to test for signatures of diverse evolutionary forces, including multiple types of selection, and apply it to genomic regions associated with spontaneous preterm birth (sPTB), a complex disorder of global health concern. We find that sPTB-associated regions harbor diverse evolutionary signatures including evolutionary sequence conservation (consistent with the action of negative selection), excess population differentiation (local adaptation), accelerated evolution (positive selection), and balanced polymorphism (balancing selection). Furthermore, these genomic regions show diverse functional characteristics which enables us to use evolutionary and molecular lines of evidence to develop hypotheses about how these genomic regions contribute to sPTB risk. In summary, we introduce an approach for inferring the spectrum of evolutionary forces acting on genomic regions associated with complex disorders. When applied to sPTB-associated genomic regions, this approach both improves our understanding of the potential roles of these regions in pathology and illuminates the mosaic nature of evolutionary forces acting on genomic regions associated with sPTB.
Article
Significance: Acute respiratory distress syndrome (ARDS) is a severe, highly heterogeneous critical illness with staggering mortality that is influenced by environmental factors, such as mechanical ventilation, and genetic factors. Significant unmet needs in ARDS are addressing the paucity of validated predictive biomarkers for ARDS risk and susceptibility that hamper the conduct of successful clinical trials in ARDS and the complete absence of novel disease-modifying therapeutic strategies. Recent Advances: The current ARDS definition relies on clinical characteristics that fail to capture the diversity of disease pathology, severity, and mortality risk. We undertook a comprehensive survey of the available ARDS literature to identify genes and genetic variants (candidate gene and limited genome-wide association study approaches) implicated in susceptibility to developing ARDS in hopes of uncovering novel biomarkers for ARDS risk and mortality and potentially novel therapeutic targets in ARDS. We further attempted to address the well-known health disparities that exist in susceptibility to and mortality from ARDS. Critical Issues: Bioinformatic analyses identified 201 ARDS candidate genes with pathway analysis indicating a strong predominance in key evolutionarily conserved inflammatory pathways, including reactive oxygen species, innate immunity-related inflammation, and endothelial vascular signaling pathways. Future Directions: Future studies employing a system biology approach that combines clinical characteristics, genomics, transcriptomics, and proteomics may allow for a better definition of biologically relevant pathways and genotype-phenotype connections and result in improved strategies for the sub-phenotyping of diverse ARDS patients via molecular signatures. These efforts should facilitate the potential for successful clinical trials in ARDS and yield a better fundamental understanding of ARDS pathobiology.
Article
The acute respiratory distress syndrome (ARDS) phenotype was first described over 50 years ago and since that time significant progress has been made in understanding the biologic processes underlying the syndrome. Despite this improved understanding, no pharmacologic therapies aimed at the underlying biology have been proven effective in ARDS. Increasingly, ARDS has been recognized as a heterogeneous syndrome characterized by subphenotypes with distinct clinical, radiographic, and biologic differences, distinct outcomes, and potentially distinct responses to therapy. The Berlin Definition of ARDS specifies three severity classifications: mild, moderate, and severe based on the PaO2 to FiO2 ratio. Two randomized controlled trials have demonstrated a potential benefit to prone positioning and neuromuscular blockade in moderate to severe phenotypes of ARDS only. Precipitating risk factor, direct versus indirect lung injury, and timing of ARDS onset can determine other clinical phenotypes of ARDS after admission. Radiographic phenotypes of ARDS have been described based on a diffuse versus focal pattern of infiltrates on chest imaging. Finally and most promisingly, biologic subphenotypes or endotypes have increasingly been identified using plasma biomarkers, genetics, and unbiased approaches such as latent class analysis. The potential of precision medicine lies in identifying novel therapeutics aimed at ARDS biology and the subpopulation within ARDS most likely to respond. In this review, we discuss the challenges and approaches to subphenotype ARDS into clinical, radiologic, severity, and biologic phenotypes with an eye toward the future of precision medicine in critical care.
Article
Full-text available
Selection not only increases the frequency of new beneficial mutations but also leaves signals throughout the genome. Since these regions often control economically important traits, identifying and tracking them is one of the most important issues in animal genetics. The aim of this study was to detect signals of selection in the genome of the Turkmen horse using a 70K SNP chip. Twenty-three Turkmen horses were selected from different areas of Gonbad-e kavuos. After blood sampling and DNA extraction, all samples were genotyped. To detect footprints of selection, tests based on linkage disequilibrium (LD), such as extended haplotype homozygosity (EHH) and the integrated haplotype score (iHS), were used. To identify the regions of the genome that contain the strongest signals of selection, iHS statistics were used, and 6 genomic regions in the 99.99th percentile of iHS values were selected for further analysis. These regions were located on chromosomes 4, 5, 7, 8, 9 and 10. The EHH test, together with haplotype bifurcation diagrams, confirmed signals of selection in these regions. Based on the EHH results, a sharp decay of LD was observed in some regions (chromosomes 7, 9 and 10), while in others it was less pronounced (chromosomes 4, 5 and 8). Alleles on chromosomes 4, 5 and 8 showed long-range LD at frequencies of 43%, 52%, and 37%, respectively; therefore, these regions of the Turkmen horse genome have most likely been targets of positive selection.
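To make the EHH statistic mentioned above concrete, here is a minimal Python sketch of one standard definition: among chromosomes carrying a given core allele, EHH at a site is the probability that two randomly chosen chromosomes are identical over the whole interval from the core to that site. The toy haplotype matrix, the function name `ehh`, and the 0/1 allele coding are assumptions for illustration; this is not the software or data used in the study.

```python
from collections import Counter
from math import comb


def ehh(haplotypes, core, site, core_allele=1):
    """Extended haplotype homozygosity between `core` and `site`.

    `haplotypes` is a list of equal-length sequences of 0/1 alleles
    (one per chromosome). Only chromosomes carrying `core_allele`
    at index `core` are considered.
    """
    lo, hi = sorted((core, site))
    # Extended haplotypes of the core-allele carriers over the interval.
    carriers = [tuple(h[lo:hi + 1]) for h in haplotypes if h[core] == core_allele]
    n = len(carriers)
    if n < 2:
        return 0.0  # homozygosity is undefined for fewer than two chromosomes
    counts = Counter(carriers)
    # Pairs of identical chromosomes divided by all possible pairs.
    return sum(comb(c, 2) for c in counts.values()) / comb(n, 2)


# Toy data: 6 chromosomes, 5 SNPs (hypothetical).
toy_haplotypes = [
    [0, 1, 1, 0, 1],
    [0, 1, 1, 0, 1],
    [0, 1, 1, 0, 0],
    [1, 1, 1, 1, 0],
    [1, 0, 0, 1, 0],
    [0, 0, 0, 1, 1],
]

if __name__ == "__main__":
    core_index = 2
    for site_index in range(len(toy_haplotypes[0])):
        print(site_index, round(ehh(toy_haplotypes, core_index, site_index), 3))
```

EHH equals 1 at the core itself and decays with distance as recombination and mutation break up the haplotypes; iHS is built by integrating such decay curves for the ancestral and derived alleles and standardizing their log ratio.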
Article
Classic selective sweeps occur when positive selection increases a variant's frequency from low to high in a population, and underlie some long‐studied human characteristics such as variation in skin, hair or eye colour. In such well‐studied ‘gold standard’ examples, a known variant has been associated with a plausible phenotype and underlying selective force. Signatures of classic sweeps have more recently been detected in population‐genetic data independently of any prior information about the corresponding phenotype or selective force, and usually without suggesting any insights into these. Motivated by the need to understand such candidates, we first review the gold standards and show that our understanding of them is often incomplete or unconvincing; only two of the examples we consider are compellingly explained. We assess approaches for large‐scale association of classic sweep candidate variants to phenotypes and selective forces, test these on the gold standards, and discuss the standards of evidence needed to adequately understand a selective sweep.
Preprint
Full-text available
Traditional genome-wide scans for positive selection have mainly uncovered selective sweeps associated with monogenic traits. While selection on quantitative traits is much more common, very few signals have been detected because of their polygenic nature. We searched for positive selection signals underlying coronary artery disease (CAD) in worldwide populations, using novel approaches to quantify relationships between polygenic selection signals and CAD genetic risk. We identified new candidate adaptive loci that appear to have been directly modified by disease pressures given their significant associations with CAD genetic risk. These candidates were all uniquely and consistently associated with many different male and female reproductive traits suggesting selection may have also targeted these because of their direct effects on fitness. This suggests the presence of widespread antagonistic-pleiotropic tradeoffs on CAD loci, which provides a novel explanation for the maintenance and high prevalence of CAD in modern humans. Lastly, we found that positive selection more often targeted CAD gene regulatory variants using HapMap3 lymphoblastoid cell lines, which further highlights the unique biological significance of candidate adaptive loci underlying CAD. Our study provides a novel approach for detecting selection on polygenic traits and evidence that modern human genomes have evolved in response to CAD-induced selection pressures and other early-life traits sharing pleiotropic links with CAD.
Author Summary
How genetic variation contributes to disease is complex, especially for those such as coronary artery disease (CAD) that develop over the lifetime of individuals. One of the fundamental questions about CAD, whose progression begins in young adults with arterial plaque accumulation leading to life-threatening outcomes later in life, is why natural selection has not removed or reduced this costly disease.
It is the leading cause of death worldwide and has been present in human populations for thousands of years, implying considerable pressures that natural selection should have operated on. Our study provides new evidence that genes underlying CAD have recently been modified by natural selection and that these same genes uniquely and extensively contribute to human reproduction, which suggests that natural selection may have maintained genetic variation contributing to CAD because of its beneficial effects on fitness. This study provides novel evidence that CAD has been maintained in modern humans as a byproduct of the fitness advantages those genes provide early in human lifecycles.
Article
Over the past few years several methodological and data-driven advances have greatly improved our ability to robustly detect genomic signatures of selection in humans. New methods applied to large samples of present-day genomes provide increased power, while ancient DNA allows precise estimation of timing and tempo. However, despite these advances, we are still limited in our ability to translate these signatures into understanding about which traits were actually under selection, and why. Combining information from different populations and timescales may allow interpretation of selective sweeps. Other modes of selection have proved more difficult to detect. In particular, despite strong evidence of the polygenicity of most human traits, evidence for polygenic selection is weak, and its importance in recent human evolution remains unclear. Balancing selection and archaic introgression seem important for the maintenance of potentially adaptive immune diversity, but perhaps less so for other traits.
Preprint
Full-text available
Isolated populations with novel phenotypes present an exciting opportunity to uncover the genetic basis of ecologically significant adaptation, and genomic scans for positive selection in such populations have often, but not always, led to candidate genes directly related to an adaptive phenotype. However, in many cases these populations were established by a severe bottleneck, which can make identifying targets of selection problematic. Here we simulate severe bottlenecks and subsequent selection on standing variation, mimicking adaptation after establishment of a new small population, such as an island or an artificial selection experiment. Using simulations of single loci under positive selection and population genetics theory, we examine how population size and age of the population isolate affects the ability of outlier scans for selection to identify adaptive alleles using both single site measures and haplotype structure. We find and explain an optimal combination of selection strength, starting frequency, and age of the adaptive allele, which we refer to as a Goldilocks zone, where adaptation is likely to occur, and yet the adaptive variants are most likely to derive from a single ancestor (a “hard” selective sweep); in this zone, four commonly used statistics detect selection with high power. Real-world examples of both island colonization and experimental evolution studies are discussed. Our study provides concrete considerations to be made before embarking on whole genome sequencing of differentiated populations.
Preprint
Full-text available
A composite likelihood ratio test implemented in the program SweepFinder is a commonly used method for scanning a genome for recent selective sweeps. SweepFinder uses information on the spatial pattern of the site frequency spectrum (SFS) around the selected locus. To avoid confounding effects of background selection and variation in the mutation process along the genome, the method is typically applied only to sites that are variable within species. However, the power to detect and localize selective sweeps can be greatly improved if invariable sites are also included in the analysis. In the spirit of a Hudson-Kreitman-Aguadé test, we suggest to add fixed differences relative to an outgroup to account for variation in mutation rate, thereby facilitating more robust and powerful analyses. We also develop a method for including background selection modeled as a local reduction in the effective population size. Using simulations we show that these advances lead to a gain in power while maintaining robustness to mutation rate variation. Furthermore, the new method also provides more precise localization of the causative mutation than methods using the spatial pattern of segregating sites alone.
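As a rough illustration of the input described above, the following Python sketch tabulates an unfolded site frequency spectrum from a small haplotype matrix, keeping the two monomorphic classes (invariant sites and fixed differences relative to an outgroup) alongside the segregating classes. It is a toy under stated assumptions, not SweepFinder's implementation: the 0/1 coding (0 = allele matching the outgroup, 1 = derived), the function name, and the data are all hypothetical.

```python
def site_frequency_spectrum(haplotypes):
    """Unfolded SFS including monomorphic classes.

    `haplotypes` is a list of n equal-length 0/1 sequences, where 1 means
    the derived allele relative to an outgroup. Returns a list `sfs` of
    length n + 1 in which sfs[k] is the number of sites with k derived
    copies: sfs[0] counts invariant sites, sfs[n] counts fixed differences
    relative to the outgroup, and sfs[1:n] are the segregating classes.
    """
    n = len(haplotypes)
    n_sites = len(haplotypes[0])
    sfs = [0] * (n + 1)
    for j in range(n_sites):
        derived_count = sum(h[j] for h in haplotypes)
        sfs[derived_count] += 1
    return sfs


# Toy data: 4 chromosomes, 6 sites (hypothetical).
toy_haplotypes = [
    [0, 1, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 1, 0],
]

if __name__ == "__main__":
    sfs = site_frequency_spectrum(toy_haplotypes)
    print("full SFS (with monomorphic classes):", sfs)
    print("segregating classes only:", sfs[1:-1])
```

A scan restricted to variable sites would use only the segregating classes; the extension described in the abstract corresponds to also retaining the first and last entries.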
Article
Full-text available
The cereal pathogen Fusarium graminearum is the primary cause of Fusarium head blight (FHB) and a significant threat to food safety and crop production. To elucidate population structure and identify genomic targets of selection within major FHB pathogen populations in North America we sequenced the genomes of 60 diverse F. graminearum isolates. We also assembled the first pan-genome for F. graminearum to clarify population-level differences in gene content potentially contributing to pathogen diversity. Bayesian and phylogenomic analyses revealed genetic structure associated with isolates that produce the novel NX-2 mycotoxin, suggesting a North American population that has remained genetically distinct from other endemic and introduced cereal-infecting populations. Genome scans uncovered distinct signatures of selection within populations, focused in high diversity, frequently recombining regions. These patterns suggested selection for genomic divergence at the trichothecene toxin gene cluster and thirteen additional regions containing genes potentially involved in pathogen specialization. Gene content differences further distinguished populations, in that 121 genes showed population-specific patterns of conservation. Genes that differentiated populations had predicted functions related to pathogenesis, secondary metabolism and antagonistic interactions, though a subset had unique roles in temperature and light sensitivity. Our results indicated that F. graminearum populations are distinguished by dozens of genes with signatures of selection and an array of dispensable accessory genes, suggesting that FHB pathogen populations may be equipped with different traits to exploit the agroecosystem. 
These findings provide insights into the evolutionary processes and genomic features contributing to population divergence in plant pathogens, and highlight candidate genes for future functional studies of pathogen specialization across evolutionarily and ecologically diverse fungi.
Article
Full-text available
A 2.91-billion base pair (bp) consensus sequence of the euchromatic portion of the human genome was generated by the whole-genome shotgun sequencing method. The 14.8-billion bp DNA sequence was generated over 9 months from 27,271,853 high-quality sequence reads (5.11-fold coverage of the genome) from both ends of plasmid clones made from the DNA of five individuals. Two assembly strategies—a whole-genome assembly and a regional chromosome assembly—were used, each combining sequence data from Celera and the publicly funded genome effort. The public data were shredded into 550-bp segments to create a 2.9-fold coverage of those genome regions that had been sequenced, without including biases inherent in the cloning and assembly procedure used by the publicly funded group. This brought the effective coverage in the assemblies to eightfold, reducing the number and size of gaps in the final assembly over what would be obtained with 5.11-fold coverage. The two assembly strategies yielded very similar results that largely agree with independent mapping data. The assemblies effectively cover the euchromatic regions of the human chromosomes. More than 90% of the genome is in scaffold assemblies of 100,000 bp or more, and 25% of the genome is in scaffolds of 10 million bp or larger. Analysis of the genome sequence revealed 26,588 protein-encoding transcripts for which there was strong corroborating evidence and an additional ∼12,000 computationally derived genes with mouse matches or other weak supporting evidence. Although gene-dense clusters are obvious, almost half the genes are dispersed in low G+C sequence separated by large tracts of apparently noncoding sequence. Only 1.1% of the genome is spanned by exons, whereas 24% is in introns, with 75% of the genome being intergenic DNA. Duplications of segmental blocks, ranging in size up to chromosomal lengths, are abundant throughout the genome and reveal a complex evolutionary history. 
Comparative genomic analysis indicates vertebrate expansions of genes associated with neuronal function, with tissue-specific developmental regulation, and with the hemostasis and immune systems. DNA sequence comparisons between the consensus sequence and publicly funded genome data provided locations of 2.1 million single-nucleotide polymorphisms (SNPs). A random pair of human haploid genomes differed at a rate of 1 bp per 1250 on average, but there was marked heterogeneity in the level of polymorphism across the genome. Less than 1% of all SNPs resulted in variation in proteins, but the task of determining which SNPs have functional consequences remains an open challenge.
Article
Full-text available
Genetic variation among individual humans occurs on many different scales, ranging from gross alterations in the human karyotype to single nucleotide changes. Here we explore variation on an intermediate scale—particularly insertions, deletions and inversions affecting from a few thousand to a few million base pairs. We employed a clone-based method to interrogate this intermediate structural variation in eight individuals of diverse geographic ancestry. Our analysis provides a comprehensive overview of the normal pattern of structural variation present in these genomes, refining the location of 1,695 structural variants. We find that 50% were seen in more than one individual and that nearly half lay outside regions of the genome previously described as structurally variant. We discover 525 new insertion sequences that are not present in the human reference genome and show that many of these are variable in copy number between individuals. Complete sequencing of 261 structural variants reveals considerable locus complexity and provides insights into the different mutational processes that have shaped the human genome. These data provide the first high-resolution sequence map of human structural variation—a standard for genotyping platforms and a prelude to future individual genome sequencing projects.
Article
Full-text available
Cytoscape is a free software package for visualizing, modeling and analyzing molecular and genetic interaction networks. This protocol explains how to use Cytoscape to analyze the results of mRNA expression profiling, and other functional genomics and proteomics experiments, in the context of an interaction network obtained for genes of interest. Five major steps are described: (i) obtaining a gene or protein network, (ii) displaying the network using layout algorithms, (iii) integrating with gene expression and other functional attributes, (iv) identifying putative complexes and functional modules and (v) identifying enriched Gene Ontology annotations in the network. These steps provide a broad sample of the types of analyses performed by Cytoscape.
Article
Full-text available
Recessive splice site and nonsense mutations of PCDH15, encoding protocadherin 15, are known to cause deafness and retinitis pigmentosa in Usher syndrome type 1F (USH1F). Here we report that non-syndromic recessive hearing loss (DFNB23) is caused by missense mutations of PCDH15. This suggests a genotype–phenotype correlation in which hypomorphic alleles cause non-syndromic hearing loss, while more severe mutations of this gene result in USH1F. We localized protocadherin 15 to inner ear hair cell stereocilia, and to retinal photoreceptors by immunocytochemistry. Our results further strengthen the importance of protocadherin 15 in the morphogenesis and cohesion of stereocilia bundles and retinal photoreceptor cell maintenance or function.
Article
Full-text available
Copy number variation (CNV) of DNA sequences is functionally significant but has yet to be fully ascertained. We have constructed a first-generation CNV map of the human genome through the study of 270 individuals from four populations with ancestry in Europe, Africa or Asia (the HapMap collection). DNA from these individuals was screened for CNV using two complementary technologies: single-nucleotide polymorphism (SNP) genotyping arrays, and clone-based comparative genomic hybridization. A total of 1,447 copy number variable regions (CNVRs), which can encompass overlapping or adjacent gains or losses, covering 360 megabases (12% of the genome) were identified in these populations. These CNVRs contained hundreds of genes, disease loci, functional elements and segmental duplications. Notably, the CNVRs encompassed more nucleotide content per genome than SNPs, underscoring the importance of CNV in genetic diversity and evolution. The data obtained delineate linkage disequilibrium patterns for many CNVs, and reveal marked variation in copy number among populations. We also demonstrate the utility of this resource for genetic disease studies.
Article
It is suggested that the major prehistoric human colonizations of Oceania occurred twice, namely, about 50,000 and 4,000 years ago. The first settlers are considered the ancestors of the indigenous people of New Guinea and Australia. The second settlers are Austronesian-speaking people who dispersed by voyaging in the Pacific Ocean. In this study, we performed genome-wide single-nucleotide polymorphism (SNP) typing on an indigenous Melanesian (Papuan) population, Gidra, and a Polynesian population, Tongans, by using the Affymetrix 500K assay. The SNP data were analyzed together with the data of the HapMap samples provided by Affymetrix. In agreement with previous studies, our phylogenetic analysis indicated that indigenous Melanesians are genetically closer to Asians than to Africans and European Americans. Population structure analyses revealed that the Tongan population derives about 70% of its ancestry from Asians and about 30% from indigenous Melanesians, which supports the so-called Slow train model. We also applied the SNP data to genome-wide scans for positive selection by examining haplotypic variation and identified many candidates of locally selected genes. By providing clues to human adaptation to environments, our approach based on evolutionary genetics should contribute to revealing unknown gene functions as well as functional differences between alleles. Conversely, this approach can also shed some light on otherwise invisible phenotypic differences between populations.
Article
One of this century's leading evolutionary biologists, Motoo Kimura revolutionized the field with his random drift theory of molecular evolution—the neutral theory—and his groundbreaking theoretical work in population genetics. This volume collects 57 of Kimura's most important papers and covers forty years of his diverse and original contributions to our understanding of how genetic variation affects evolutionary change. Kimura's neutral theory, first presented in 1968, challenged the notion that natural selection was the sole directive force in evolution. Arguing that mutations and random drift account for variations at the level of DNA and amino acids, Kimura advanced a theory of evolutionary change that was strongly challenged at first and that eventually earned the respect and interest of evolutionary biologists throughout the world. This volume includes the seminal papers on the neutral theory, as well as many others that cover such topics as population structure, variable selection intensity, the genetics of quantitative characters, inbreeding systems, and reversibility of changes by random drift. Background essays by Naoyuki Takahata examine Kimura's work in relation to its effects and recent developments in each area.
Article
The goal of the International HapMap Project is to determine the common patterns of DNA sequence variation in the human genome and to make this information freely available in the public domain. An international consortium is developing a map of these patterns across the genome by determining the genotypes of one million or more sequence variants, their frequencies and the degree of association between them, in DNA samples from populations with ancestry from parts of Africa, Asia and Europe. The HapMap will allow the discovery of sequence variants that affect common disease, will facilitate development of diagnostic tools, and will enhance our ability to choose targets for therapeutic intervention.
Article
Mutations in the human gene ALMS1 result in Alström Syndrome, which presents with early childhood obesity and insulin resistance leading to Type 2 diabetes. Previous genomewide scans for selection in the HapMap data based on linkage disequilibrium and population structure suggest that ALMS1 was subject to recent positive selection. Through a detailed population genomic analysis of existing genomewide data sets and new resequencing data obtained in geographically diverse populations, we find that the signature of selection at ALMS1 is considerably more complex than what would be expected for an idealized model of a selective sweep acting on a newly arisen advantageous mutation. Specifically, we observed three highly divergent and globally dispersed haplogroups, two of which carry a set of seven derived nonsynonymous single nucleotide polymorphisms that are nearly fixed in Asian populations. Our data suggest that the interaction of human demographic history and positive selection on standing variation in Eurasian populations approximately 15 thousand years ago parsimoniously explains the spectrum of extant ALMS1 variation. These results provide new insights into the evolutionary history of ALMS1 in humans and suggest that selective events identified in genomewide scans may be more complex than currently appreciated.
Article
The variation in gene frequency among populations or between generations within a population is a result of breeding structure and selection. But breeding structure should affect all loci and alleles in the same way. If there is significant heterogeneity between loci in their apparent inbreeding coefficients $F = s_p^2 / [\bar{p}(1-\bar{p})]$, this heterogeneity may be taken as evidence for selection. We have given the statistical properties of F and shown how tests of heterogeneity can be made. Using data from human populations we have shown highly significant heterogeneity in F values for human polymorphic genes over the world, thus demonstrating that a significant fraction of human polymorphisms owe their current gene frequencies to the action of natural selection. We have also applied the method to temporal variation within a population for data on Dacus oleae and have found no significant evidence of selection.
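The apparent inbreeding coefficient in this abstract can be illustrated with a short calculation. The sketch below reads F as the variance of an allele's frequency among populations divided by p(1 - p) evaluated at the mean frequency; that reading, the use of the population variance (dividing by the number of populations), the function name `lewontin_krakauer_f`, and the example frequencies are all assumptions made for illustration, not details taken from the paper.

```python
from statistics import mean


def lewontin_krakauer_f(freqs):
    """Apparent inbreeding coefficient F for one allele.

    freqs: frequencies of the same allele in several populations.
    F is computed as (variance among populations) / (p_bar * (1 - p_bar)).
    Dividing the sum of squares by len(freqs) is an assumption of this sketch.
    """
    p_bar = mean(freqs)
    if not 0.0 < p_bar < 1.0:
        raise ValueError("mean allele frequency must lie strictly between 0 and 1")
    s2 = sum((p - p_bar) ** 2 for p in freqs) / len(freqs)
    return s2 / (p_bar * (1.0 - p_bar))


if __name__ == "__main__":
    # Hypothetical allele frequencies at two loci in four populations.
    loci = {
        "locus_1": [0.48, 0.52, 0.50, 0.49],  # little differentiation
        "locus_2": [0.05, 0.90, 0.40, 0.10],  # strong differentiation
    }
    for name, freqs in loci.items():
        print(name, round(lewontin_krakauer_f(freqs), 4))
```

Comparing such per-locus values against each other is the kind of heterogeneity the abstract refers to; a formal test would additionally need the sampling distribution of F, which this sketch does not attempt.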
Article
A class of statistical tests based on molecular polymorphism data is studied to determine size and power properties. The class includes Tajima's D statistic as well as the D* and F* tests proposed by Fu and Li. A new method of constructing critical values for these tests is described. Simulations indicate that Tajima's test is generally most powerful against the alternative hypotheses of selective sweep, population bottleneck, and population subdivision, among tests within this class. However, even Tajima's test can detect a selective sweep or bottleneck only if it has occurred within a specific interval of time in the recent past or population subdivision only when it has persisted for a very long time. For greatest power against the particular alternatives studied here, it is better to sequence more alleles than more sites.
Article
... The PANTHER database (http://panther.celera.com) was designed as a resource to comprehensively and consistently treat both family and subfamily classification of proteins, focused on metazoans but also covering other organisms. Rationale. ...
Article
Inherited genetic variation has a critical but as yet largely uncharacterized role in human disease. Here we report a public database of common variation in the human genome: more than one million single nucleotide polymorphisms (SNPs) for which accurate and complete genotypes have been obtained in 269 DNA samples from four populations, including ten 500-kilobase regions in which essentially all information about common DNA variation has been extracted. These data document the generality of recombination hotspots, a block-like structure of linkage disequilibrium and low haplotype diversity, leading to substantial correlations of SNPs with many of their neighbours. We show how the HapMap resource can guide the design and analysis of genetic association studies, shed light on structural variation and recombination, and identify loci that may have been subject to natural selection during human evolution.
Article
Most evolutionary change in proteins may be due to neutral mutations and genetic drift.
Article
Can anything be less certain than chance? Nature can, according to these views.
Article
Quantitative consideration of the operation and evolution of the information-processing system that constitutes the terrestrial biosphere indicates that non-Darwinian evolutionary changes cannot be expected to account for the increase in the amount of non-random biospheric structure that constitutes the information content of the biosphere. Changes that lead to such an increase must also be selectively advantageous and lead to preferential survival through natural selection.
Article
A family of species diversity measures proposed by Hurlbert (1971) is defined as the expected number of species in a random sample of m individuals from a population. For m = 2 this measure is equivalent to Simpson's diversity index. For larger m, the measure is increasingly sensitive to rare species. In this paper we use unbiased estimation theory to obtain a minimum variance unbiased estimator for this family of diversity measures. An unbiased estimator of the sampling variance is also obtained. These results are then used to partition the variation in sample diversity between random sampling error and local variation community diversity.
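The diversity measure described here can be made concrete with a small calculation. The sketch below uses the hypergeometric rarefaction formula commonly associated with Hurlbert's measure, summing over species the probability that a species appears at least once in m individuals drawn without replacement from the observed sample. Whether this matches the paper's estimator term for term is an assumption, and the function name `expected_species` and the example counts are hypothetical.

```python
from math import comb


def expected_species(counts, m):
    """Expected number of species in m individuals drawn without replacement.

    counts: number of individuals observed for each species.
    Each species contributes 1 - C(n - c, m) / C(n, m), the probability
    that at least one of its c individuals is among the m drawn.
    """
    n = sum(counts)
    if not 1 <= m <= n:
        raise ValueError("m must be between 1 and the total sample size")
    total = comb(n, m)
    # math.comb returns 0 when m > n - c, so abundant species contribute 1.
    return sum(1 - comb(n - c, m) / total for c in counts)


if __name__ == "__main__":
    counts = [50, 30, 10, 5, 3, 1, 1]  # hypothetical community sample
    for m in (2, 10, 50):
        print(m, round(expected_species(counts, m), 3))
```

For m = 2 the value equals one plus the probability that two sampled individuals belong to different species, which is the link to Simpson's index mentioned in the abstract.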
Article
Agriculture has long been regarded as an improvement in the human condition: Once Homo sapiens made the transition from foraging to farming in the Neolithic, health and nutrition improved, longevity increased, and work load declined. Recent study of archaeological human remains worldwide by biological anthropologists has shown this characterization of the shift from hunting and gathering to agriculture to be incorrect. Contrary to earlier models, the adoption of agriculture involved an overall decline in oral and general health. This decline is indicated by elevated prevalence of various skeletal and dental pathological conditions and alterations in skeletal and dental growth patterns in prehistoric farmers compared with foragers. In addition, changes in food composition and preparation technology contributed to craniofacial and dental alterations, and activity levels and mobility decline resulted in a general decrease in skeletal robusticity. These findings indicate that the shift from food collection to...
Article
Equations (3) and (4) describe the changes undergone by a Mendelian population mating at random, and under intense selection.
Article
In considering the Origin of Species, it is quite conceivable that a naturalist, reflecting on the mutual affinities of organic beings, on their embryological relations, their geographical distribution, geological succession, and other such facts, might come to the conclusion that each species had not been independently created, but had descended, like varieties, from other species. Nevertheless, such a conclusion, even if well founded, would be unsatisfactory, until it could be shown how the innumerable species inhabiting this world have been modified, so as to acquire that perfection of structure and coadaptation which most justly excites our admiration. Naturalists continually refer to external conditions, such as climate, food, &c, as the only possible cause of variation. In one very limited sense, as we shall hereafter see, this may be true; but it is preposterous to attribute to mere external conditions, the structure, for instance, of the woodpecker, with its feet, tail, beak, and tongue, so admirably adapted to catch insects under the bark of trees. In the case of the misseltoe, which draws its nourishment from certain trees, which has seeds that must be transported by certain birds, and which has flowers with separate sexes absolutely requiring the agency of certain insects to bring pollen from one flower to the other, it is equally preposterous to account for the structure of this parasite, with its relations to several distinct organic beings, by the effects of external conditions, or of habit, or of the volition of the plant itself.
Article
A brief introduction to the field of population genetics, this text offers students and researchers an overview of the discipline. Chapter topics include: genetic drift; natural selection; non-random mating; quantitative genetics; and the evolutionary advantage of sex. While each chapter treats a specific topic or problem in genetics, the common thread throughout the book is the question "Why is there so much genetic variation in natural populations?".
Article
dbSNP is a general catalog of genetic polymorphism maintained by NCBI, mainly collating information for single nucleotide variations, many of which will be single nucleotide polymorphisms (SNPs), but also including small indels. It takes submissions from many sources, now also including large numbers of sequence variants identified by next-generation sequencing. A number of differently designed studies have attempted to estimate the error rates in data archived in dbSNP. Most recently, a study added to earlier studies identifying specific issues for duplicons and copy number variations (CNVs); earlier analyses have focused on stop codons, splice sites, and the general content of dbSNP. This article overviews dbSNP itself, these studies, and their implications.
Article
Tracing the development of population genetics through the writings of such luminaries as Darwin, Galton, Pearson, Fisher, Haldane, and Wright, William B. Provine sheds light on this complex field as well as its bearing on other branches of biology.
Article
Selection on the human genome has been studied using comparative genomics and SNP architecture in the lineage leading to modern humans. In connection with the African exodus and colonization of other continents, human populations have adapted to a range of different environmental conditions. Using a new method that jointly analyses haplotype block length and allele frequency variation ($F_{ST}$) within and between populations, we have identified chromosomal regions that are candidates for having been affected by local selection. Based on 1.6 million SNPs typed in 71 individuals of African American, European American and Han Chinese descent, we have identified a number of genes and non-coding regions that are candidates for having been subjected to local positive selection during the last 100,000 years. Among these genes are those involved in skin pigmentation (SLC24A5) and diet adaptation (LCT). The list of genes implicated in these local selective sweeps overlaps partly with those implicated in other studies of human populations using other methods, but shows little overlap with those postulated to have been under selection in the 5-7 million years since the divergence of the ancestors of human and chimpanzee. Our analysis provides focal points in the genome for detailed studies of evolutionary events that have shaped human populations as they explored different regions of the world.
Article
The physiological basis for the resistance to falciparum malaria of individuals with sickle cell trait has not been understood. Recent advances in erythrocytic Plasmodium falciparum culture have made possible a direct investigation of the development of the malaria parasite in cells with sickle cell hemoglobin. In a high (18%) oxygen atmosphere, there is no apparent sickling of cells, and the growth and multiplication of P. falciparum is identical in normal (AA), hemoglobin S homozygous (SS), and hemoglobin S heterozygous (SA) erythrocytes. Cultures under low (1-5%) oxygen, however, showed clear inhibition of growth. The sickling of SS red cells killed and lysed most or all of the intracellular parasites. Parasites in SA red cells were killed primarily at the large ring stage, probably as a result of a disruption of the parasite metabolism. Incubation in cyanate prior to culture reversed the resistance of SA erythrocytes to Plasmodium growth, but had no effect on SS red cell sickling or resistance. Thus, the mechanism of resistance in vivo may be due solely to intraerythrocytic conditions.
Article
The comparison of human and chimpanzee macromolecules leads to several inferences: 1) Amino acid sequencing, immunological, and electrophoretic methods of protein comparison yield concordant estimates of genetic resemblance. These approaches all indicate that the average human polypeptide is more than 99 percent identical to its chimpanzee counterpart. 2) Nonrepeated DNA sequences differ more than amino acid sequences. A large proportion of the nucleotide differences between the two species may be ascribed to redundancies in the genetic code or to differences in non-transcribed regions. 3) The genetic distance between humans and chimpanzees, based on electrophoretic comparison of proteins encoded by 44 loci, is very small, corresponding to the genetic distance between sibling species of fruit flies or mammals. Results obtained with other biochemical methods are consistent with this conclusion. However, the substantial anatomical and behavioral differences between humans and chimpanzees have led to their classification in separate families. This indicates that macromolecules and anatomical or behavioral features of organisms can evolve at independent rates. 4) A relatively small number of genetic changes in systems controlling the expression of genes may account for the major organismal differences between humans and chimpanzees. Some of these changes may result from the rearrangement of genes on chromosomes rather than from point mutations.
Article
Mathematical expressions are found for the effect of selection on simple Mendelian populations mating at random. Selection of a given intensity is most effective when amphimixis does not affect the character selected, e.g. in complete inbreeding or homogamy. Selection is very ineffective on autosomal recessive characters so long as they are rare.
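The claim that selection acts slowly on rare recessive characters can be checked numerically. The sketch below iterates the textbook one-locus, two-allele recursion for a random-mating population and counts the generations a favoured allele needs to rise from a low frequency to a higher one. The fitness scheme (1 + s for favoured genotypes, 1 otherwise), the function names, and the parameter values are assumptions chosen for illustration; they are not taken from the paper.

```python
def next_freq(q, s, recessive):
    """Frequency of allele a after one generation of random mating and selection.

    Genotype fitnesses (assumed): AA = 1, aa = 1 + s,
    Aa = 1 if the favoured allele is recessive, else 1 + s (dominant).
    """
    p = 1.0 - q
    w_aa = 1.0 + s
    w_het = 1.0 if recessive else 1.0 + s
    w_bar = p * p + 2.0 * p * q * w_het + q * q * w_aa  # mean fitness
    return (q * q * w_aa + p * q * w_het) / w_bar


def generations_to_reach(q0, target, s, recessive, max_gen=10_000_000):
    """Count generations until the allele frequency first reaches target."""
    q, gen = q0, 0
    while q < target and gen < max_gen:
        q = next_freq(q, s, recessive)
        gen += 1
    return gen


if __name__ == "__main__":
    # Hypothetical starting point: a rare allele with a 10% advantage.
    for label, rec in (("dominant", False), ("recessive", True)):
        print(label, generations_to_reach(0.001, 0.01, 0.1, rec))
```

While the allele is rare, almost all copies sit in heterozygotes, so a recessive advantage is seldom exposed to selection; the generation counts printed by the sketch show how large that difference becomes.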
Article
We have purified and characterized active recombinant human bone morphogenetic protein (BMP) 2A. Implantation of the recombinant protein in rats showed that a single BMP can induce bone formation in vivo. A dose-response and time-course study using the rat ectopic bone formation assay revealed that implantation of 0.5-115 micrograms of partially purified recombinant human BMP-2A resulted in cartilage by day 7 and bone formation by day 14. The time at which bone formation occurred was dependent on the amount of BMP-2A implanted; at high doses bone formation could be observed at 5 days. The cartilage- and bone-inductive activity of the recombinant BMP-2A is histologically indistinguishable from that of bone extracts. Thus, recombinant BMP-2A has therapeutic potential to promote de novo bone formation in humans.
Article
The relationship between the two estimates of genetic variation at the DNA level, namely the number of segregating sites and the average number of nucleotide differences estimated from pairwise comparison, is investigated. It is found that the correlation between these two estimates is large when the sample size is small, and decreases slowly as the sample size increases. Using the relationship obtained, a statistical method for testing the neutral mutation hypothesis is developed. This method needs only the data of DNA polymorphism, namely the genetic variation within population at the DNA level. A simple method of computer simulation, that was used in order to obtain the distribution of a new statistic developed, is also presented. Applying this statistical method to the five regions of DNA sequences in Drosophila melanogaster, it is found that large insertion/deletion (greater than 100 bp) is deleterious. It is suggested that the natural selection against large insertion/deletion is so weak that a large amount of variation is maintained in a population.
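The test described in this abstract compares the two estimates of variation, and the resulting statistic is widely known as Tajima's D. The sketch below follows the standard published form of the statistic, whose constants do not appear in the abstract itself. The function name `tajimas_d`, the representation of the sample as equal-length strings, and the toy alignments are assumptions for illustration.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(seqs):
    """Tajima's D for a list of aligned, equal-length sequences."""
    n = len(seqs)
    if n < 4:
        raise ValueError("at least 4 sequences are needed (variance is zero for n < 4)")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")

    # S: number of segregating (polymorphic) sites.
    seg_sites = sum(1 for i in range(length) if len({s[i] for s in seqs}) > 1)
    if seg_sites == 0:
        raise ValueError("no segregating sites; D is undefined")

    # pi: average number of pairwise nucleotide differences.
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)

    # Constants of the standard formula, functions of sample size only.
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - seg_sites / a1) / sqrt(variance)


if __name__ == "__main__":
    rare_variants = ["AAAA", "TAAA", "ATAA", "AATA"]   # every variant is a singleton
    common_variants = ["AA", "AA", "TT", "TT"]         # intermediate frequencies
    print(tajimas_d(rare_variants), tajimas_d(common_variants))
```

An excess of rare variants pulls the statistic below zero and an excess of intermediate-frequency variants pushes it above zero. Significance would have to be judged against critical values or simulations, which this sketch leaves out.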
Article
The number of selectively neutral polymorphic sites in a random sample of genes can be affected by ancestral selectively favored substitutions at linked loci. The degree to which this happens depends on when in the history of the sample the selected substitutions happen, the strength of selection and the amount of crossing over between the sampled locus and the loci at which the selected substitutions occur. This phenomenon is commonly called hitchhiking. Using the coalescent process for a random sample of genes from a selectively neutral locus that is linked to a locus at which selection is taking place, a stochastic, finite population model is developed that describes the steady state effect of hitchhiking on the distribution of the number of selectively neutral polymorphic sites in a random sample. A prediction of the model is that, in regions of low crossing over, strongly selected substitutions in the history of the sample can substantially reduce the number of polymorphic sites in a random sample of genes from that expected under a neutral model.
It is well known that random fluctuations in gene frequencies are due to the process of sampling with which zygotes are produced at every generation, that they are among the causes of evolution by determining what is somewhat inappropriately called ‘genetic drift’, and that this was the centre of an early and substantially unsettled controversy between Sir Ronald Fisher and Sewall Wright. It is, perhaps, less widely appreciated that Sir Ronald, who was against the importance of random fixation, was also the first to consider it theoretically under the name of ‘Hagedoorn effect’ (Fisher 1922). The reason that led Fisher to a negative conclusion is summarized in the relationship between the chance of success of the rare mutant and its selective advantage. This relationship is given in detail in his 1930 paper on the distribution of gene ratios for rare mutations, in which it is shown that the chance of success rises almost proportionately to the selective advantage, in such a way that even small changes in the selective coefficient affect deeply the chance for success. Clearly, the overall chance with which mutants showing selective disadvantage will be fixed depends also on the relative frequency with which such mutants arise, that is, on the distribution of selective coefficients for mutations.
Article
When a selectively favourable gene substitution occurs in a population, changes in gene frequencies will occur at closely linked loci. In the case of a neutral polymorphism, average heterozygosity will be reduced to an extent which varies with distance from the substituted locus. The aggregate effect of substitution on neutral polymorphism is estimated; in populations of total size $10^6$ or more (and perhaps of $10^4$ or more), this effect will be more important than that of random fixation. This may explain why the extent of polymorphism in natural populations does not vary as much as one would expect from a consideration of the equilibrium between mutation and random fixation in populations of different sizes. For a selectively maintained polymorphism at a linked locus, this process will only be important in the long run if it leads to complete fixation. If the selective coefficients at the linked locus are small compared to those at the substituted locus, it is shown that the probability of complete fixation at the linked locus is approximately $\exp(-Nc)$, where c is the recombinant fraction and N the population size. It follows that in a large population a selective substitution can occur in a cistron without eliminating a selectively maintained polymorphism in the same cistron.
Article
Calculating the rate of evolution in terms of nucleotide substitutions seems to give a value so high that many of the mutations involved must be neutral ones.
Article
Selection against deleterious alleles maintained by mutation may cause a reduction in the amount of genetic variability at linked neutral sites. This is because a new neutral variant can only remain in a large population for a long period of time if it is maintained in gametes that are free of deleterious alleles, and hence are not destined for rapid elimination from the population by selection. Approximate formulas are derived for the reduction below classical neutral values resulting from such background selection against deleterious mutations, for the mean times to fixation and loss of new mutations, nucleotide site diversity, and number of segregating sites. These formulas apply to random-mating populations with no genetic recombination, and to populations reproducing exclusively asexually or by self-fertilization. For a given selection regime and mating system, the reduction is an exponential function of the total mutation rate to deleterious mutations for the section of the genome involved. Simulations show that the effect decreases rapidly with increasing recombination frequency or rate of outcrossing. The mean time to loss of new neutral mutations and the total number of segregating neutral sites are less sensitive to background selection than the other statistics, unless the population size is of the order of a hundred thousand or more. The stationary distribution of allele frequencies at the neutral sites is correspondingly skewed in favor of rare alleles, compared with the classical neutral result. Observed reductions in molecular variation in low recombination genomic regions of sufficiently large size, for instance in the centromere-proximal regions of Drosophila autosomes or in highly selfing plant populations, may be partly due to background selection against deleterious mutations.
Article
A short history of the major features of neutral theories of molecular evolution is presented. Emphasis is placed on the nearly neutral theory, as this version of the neutral theory has explained the widest range of phenomena. The shift of interest from protein to DNA evolution is chronicled, leading to the modern view that silent and replacement substitutions are responding to different evolutionary forces. However, the exact nature and magnitude of these forces remains controversial, as all current theoretical models suffer either from assumptions that are not quite realistic or from an inability to account readily for all phenomena. Although the gathering of sequence data has been the main effort of contemporary population genetics, further exploration of theoretical models of molecular evolution would provide a more coherent framework for data analysis.
Article
The Duffy blood group locus, which encodes a chemokine receptor, is characterized by three alleles: FY*A, FY*B, and FY*O. The frequency of the FY*O allele, which corresponds to the absence of Fy antigen on red blood cells, is at or near fixation in most sub-Saharan African populations but is very rare outside Africa. The $F_{ST}$ value for the FY*O allele is the highest observed for any allele in humans, providing strong evidence for the action of natural selection at this locus. Homozygosity for the FY*O allele confers complete resistance to vivax malaria, suggesting that this allele has been the target of selection by Plasmodium vivax or some other infectious agent. To characterize the signature of directional selection at this locus, we surveyed DNA sequence variation, both in a 1.9-kb region centered on the FY*O mutation site and in a 1-kb region 5-6 kb away from it, in 17 Italians and in a total of 24 individuals from five sub-Saharan African populations. The level of variation across both regions is two- to threefold lower in the Africans than in the Italians. As a result, the pooled African sample shows a significant departure from the neutral expectation for the number of segregating sites, whereas the Italian sample does not. The FY*O allele occurs on two major haplotypes in three of the five African populations. This finding could be due to recombination, recurrent mutation, population structure, and/or mutation accumulation and drift. Although we are unable to distinguish among these alternative hypotheses, it is likely that the two major haplotypes originated prior to selection on the FY*O mutation.
Article
Selected substitutions at one locus can induce stochastic dynamics that resemble genetic drift at a closely linked neutral locus. The pseudohitchhiking model is a one-locus model that approximates these effects and can be used to describe the major consequences of linked selection. As the changes in neutral allele frequencies when hitchhiking are rapid, diffusion theory is not appropriate for studying neutral dynamics. A stationary distribution and some results on substitution processes are presented that use the theory of continuous-time Markov processes with discontinuous sample paths. The coalescent of the pseudohitchhiking model is shown to have a random number of branches at each node, which leads to a frequency spectrum that is different from that of the equilibrium neutral model. If genetic draft, the name given to these induced stochastic effects, is a more important stochastic force than genetic drift, then a number of paradoxes that have plagued population genetics disappear.
Article
Studies of nuclear sequence variation are accumulating, such that we can expect a good description of the structure of human variation across populations and genomic regions in the near future. This description will help to elucidate the evolutionary forces that shape patterns of variability. Such an understanding will be of general biological interest, but could also facilitate the design and interpretation of disease-mapping studies. Here, we integrate the results from surveys of nuclear sequence variation. When nuclear sequences are considered together with mtDNA and microsatellites, it becomes clear that neither the standard neutral model, nor a simple long-term exponential growth model, can account for all the available human variation data. A possible explanation is that a subset of loci are not evolving neutrally; even so, more-complex models of effective population size and structure might be necessary to explain the data.