
Direct evidence for positive selection of skin, hair, and eye pigmentation in Europeans during the last 5,000 y



Significance Eye, hair, and skin pigmentation are highly variable in humans, particularly in western Eurasian populations. This diversity may be explained by population history, the relaxation of selection pressures, or positive selection. To investigate whether positive natural selection is responsible for depigmentation within Europe, we estimated the strength of selection acting on three genes known to have significant effects on human pigmentation. In a direct approach, these estimates were made using ancient DNA from prehistoric Europeans and computer simulations. This allowed us to determine selection coefficients for a precisely bounded period in the deep past. Our results indicate that strong selection has been operating on pigmentation-related genes within western Eurasia for the past 5,000 y.
PNAS proof
until 3PM ET Monday
publication week
Sandra Wilde, Adrian Timpson, Karola Kirsanow, Elke Kaiser, Manfred Kayser,
Martina Unterländer, Nina Hollfelder, Inna D. Potekhina, Wolfram Schier,
Mark G. Thomas, and Joachim Burger
Institute of Anthropology, Johannes Gutenberg University Mainz, 55128 Mainz, Germany;
Research Department of Genetics, Evolution and Environment,
University College London, London WC1E 6BT, United Kingdom;
Institute of Archaeology, University College London, London WC1H 0PY, United Kingdom;
Institute of Prehistoric Archaeology, Freie Universität Berlin, 14195 Berlin, Germany;
Department of Forensic Molecular Biology, Erasmus University Medical
Center Rotterdam, 3000 CA, Rotterdam, The Netherlands; and
Institute of Archaeology, Academy of Science of the Ukraine, 04210 Kiev-210, Ukraine
Edited by Nina G. Jablonski, The Pennsylvania State University, University Park, Pennsylvania, and accepted by the Editorial Board February 1, 2014 (received
for review September 4, 2013)
Pigmentation is a polygenic trait encompassing some of the most
visible phenotypic variation observed in humans. Here we present
direct estimates of selection acting on functional alleles in three
key genes known to be involved in human pigmentation pathways
(HERC2, SLC45A2, and TYR), using allele frequency estimates
from Eneolithic, Bronze Age, and modern Eastern European samples
and forward simulations. Neutrality was overwhelmingly rejected
for all alleles studied, with point estimates of selection ranging
from around 2 to 10% per generation. Our results provide direct
evidence that strong selection favoring lighter skin, hair, and eye
pigmentation has been operating in European populations over
the last 5,000 y.
ancient DNA
computer simulations
natural selection
Neolithic/Bronze Age
Eastern Europe
Genomic signatures of natural selection in humans are usually
obtained from modern population genetic data and take the
form of patterns of variation outside those expected under
neutrality (1), including strong correlations between allele fre-
quencies and hypothesized ecological drivers of selection (2) and
identifying alleles with unusually recent age estimates for their
frequencies (1). All such indirect approaches have poor sensi-
tivity and temporal resolution, most are confounded by past
demographic processes, and many are insensitive to selection
acting on standing variation (3). With advances in ancient DNA
analysis techniques it is possible to obtain direct estimates of
natural selection over specific time periods by estimating allele
frequency change, permitting changes in selection intensity to be
detected through time and a more detailed understanding of the
forces shaping human evolution. However, to date no such esti-
mates have been made.
Pigmentation is a particularly conspicuous human phenotypic
variation and in the past has been misleadingly used as a proxy
for deep biogeographical origins (4). Dark pigmentation is thought
to be the ancestral state in humans and to have been maintained
by purifying selection in low-latitude regions with high UV radiation
(UVR) to protect against folate photolysis, UVR-induced DNA
damage, and possibly damage to immunoglobulins (5, 6). Continental-scale
correlations between skin pigmentation and incident
UVR levels strongly indicate positive ecological adaptation (5),
although sexual selection, particularly in relation to eye and hair
coloration (7), and relaxation of selective constraints (6) may also
have been important.
Melanin, a derivative of tyrosine, is the primary biopolymer
responsible for constitutive animal pigmentation and is found in
two forms in humans, eumelanin (black-brown) and pheomela-
nin (red-yellow). It is synthesized in melanosomes, organelles
located in melanocytes in several tissues including the basal layer
of the epidermis, hair follicles, and the iris. Variation in pig-
mentation depends mainly on differences in the amount and type
of melanin synthesized and the shape and distribution of mela-
nosomes in different tissues (8). The products of several genes
are involved in melanin synthesis and distribution, and known
DNA polymorphism in those genes explains a substantial proportion
of human pigmentation variation (9–12). Signatures of
recent natural selection have been detected in some pigmenta-
tion genes, including both shared and regionally specific alleles
associated with lighter skin pigmentation in eastern and western
Eurasia (13, 14), using modern population genetic data (1, 12,
13, 15, 16).
To obtain direct estimates of the strength of natural selection
driving depigmentation we analyzed three polymorphic sites in
ancient and modern samples (Table 1) that had previously been
identified through genome-wide association studies (GWAS) (1,
17) and fine-mapping SNP association (18) as influencing pigmentation
in modern Europeans: HERC2 (rs12913832 A>G) (18),
SLC45A2 (rs16891982 C>G) (13), and TYR (rs1042602 C>A) (11).
The product of the TYR gene, tyrosinase, catalyzes
the first two steps of the melanogenesis pathway (8), and its ab-
sence generates an epistatic mask on downstream pigment-coding
genes, halting melanin production. The TYR SNP rs1042602 is
highly polymorphic in Europeans, and the derived A allele has
been associated with light skin (19) and eye color (20) and the
Author contributions: S.W., M.G.T., and J.B. designed research; S.W., A.T., M.U., N.H., and
M.G.T. performed research; A.T. and M.G.T. contributed new reagents/analytic tools; E.K.
and W.S. coordinated the acquisition of the archaeological sample material and provided
background information; I.D.P. provided archaeological sample material and background
information; M.G.T. and J.B. coordinated this study; S.W., A.T., N.H., and M.G.T. analyzed
data; and S.W., A.T., K.K., E.K., M.K., M.U., N.H., I.D.P., W.S., M.G.T., and J.B. wrote
the paper.
The authors declare no conflict of interest.
This article is a PNAS Direct Submission. N.G.J. is a guest editor invited by the Editorial
Board.
Freely available online through the PNAS open access option.
Present address: Department of Ecology and Genetics, Evolutionary Biology, Uppsala
University, 75236 Uppsala, Sweden.
To whom correspondence should be addressed. E-mail:
This article contains supporting information online at
1073/pnas.1316513111/-/DCSupplemental. PNAS Early Edition
absence of freckles (11). SLC45A2 (MATP) is involved in the
distribution and intracellular processing of tyrosinase and other
pigmentation enzymes (21). The derived rs16891982 G allele,
which decreases in frequency along a north–south cline in Europe
(13), is associated with lighter skin, hair, and eye pigmentation in
modern populations (13, 22). The HERC2 SNP rs12913832 A>G
is the main determinant of iris pigmentation (brown/blue) (18, 23)
and is also associated with skin and hair pigmentation and the
propensity to tan (24). It is located within an intron of the Hect
Domain and RCC1-like Domain2 (HERC2) gene 21 kb upstream
from the OCA2 promoter and serves as an enhancer for OCA2
expression (23, 25). OCA2 expression is increased in the mela-
nocytes of carriers of the ancestral A allele, and attenuated in
carriers of the derived G allele (25). OCA2 encodes a melanoso-
mal transmembrane protein, protein P, which is involved in the
trafficking and processing of tyrosinase, the regulation of mela-
nosomal pH, and glutathione metabolism (26). Selection favoring
the SLC45A2 rs16891982 G allele has been estimated to have
begun between 11,000 and 19,000 y ago (14), well after the ex-
pansion of anatomically modern humans out-of-Africa. The age of
the derived TYR rs1042602 allele has been estimated using two
different methods: rho statistics place its emergence around 6,100
y ago, whereas the Bayesian coalescent approach using BEAST
dates the allele to 15,600 y ago (27).
However, these estimates were made by assuming that the
selected allele arose at the time selection started and so do not
accommodate the possibility of selection acting on standing
variation (3). Furthermore, such estimates are themselves de-
pendent on estimates of mutation and recombination rates. No
estimates of the strength of selection or when it started to act on
HERC2/OCA2 and TYR are available at present.
Ancient DNA was retrieved from 63 out of 150 Eneolithic (ca.
6,500–5,000 y ago) and Bronze Age (ca. 5,000–4,000 y ago)
samples from the Pontic–Caspian steppe, mainly from modern-day
Ukraine. We used multiplex-PCR enrichment and next-generation
sequencing to genotype the three pigmentation-associated
SNPs (rs12913832, rs16891982, and rs1042602) and
mtDNA hypervariable region 1 (HVR1) sequences plus 32 mtDNA
coding region SNPs and a 9-bp-indel from these individuals (Tables
S1 and S2). Consensus HVR1 sequences were successfully assem-
bled from 60 individuals. Pigmentation gene data were obtained
from 48 samples. We also genotyped the three pigmentation-
associated SNPs in a sample of 60 modern Ukrainians (28) and
observed an increase in frequency of all derived alleles between
the ancient and modern samples from the same geographic re-
gion (Table 1 and Fig. S1). This implies that the pigmentation of the
prehistoric population is likely to have differed from that of
modern humans living in the same area. Modern frequencies of
the derived alleles within all of Europe and outside of Europe
are provided for comparison (Table 1).
Inferring natural selection based on temporal differences in
allele frequency requires the assumption of population continuity.
To this end we compared the 60 mtDNA HVR1 sequences
obtained from our ancient sample to 246 homologous modern
sequences (29–31) from the same geographic region and found
low genetic differentiation (F_ST = 0.00551; P = 0.0663) (32).
Coalescent simulations based on the mtDNA data, accommo-
dating uncertainty in the ancient sample age, failed to reject
population continuity under a wide range of assumed ancestral
population size combinations (Fig. 1).
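As a rough illustration of the statistic being compared here, the sketch below computes F_ST between two sets of aligned sequences from mean pairwise differences (a Hudson-style estimator, 1 minus the ratio of within-sample to between-sample differences). This is an assumption made for illustration only: the estimator used in the study may differ, the function names are hypothetical, and the toy haplotypes are invented rather than taken from the data. The sketch covers the statistic alone, not the coalescent simulations used to obtain its null distribution.

```python
from itertools import combinations, product


def pairwise_diff(a, b):
    """Number of positions at which two aligned sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def mean_within(pop):
    """Mean pairwise difference among the sequences of one sample."""
    pairs = list(combinations(pop, 2))
    return sum(pairwise_diff(a, b) for a, b in pairs) / len(pairs)


def mean_between(pop1, pop2):
    """Mean pairwise difference between sequences of two samples."""
    pairs = list(product(pop1, pop2))
    return sum(pairwise_diff(a, b) for a, b in pairs) / len(pairs)


def fst(pop1, pop2):
    """Hudson-style F_ST = 1 - (mean within) / (mean between).

    Assumes at least one difference between the two samples;
    otherwise the ratio is undefined.
    """
    within = (mean_within(pop1) + mean_within(pop2)) / 2
    between = mean_between(pop1, pop2)
    return 1 - within / between


# Invented toy haplotypes (not data from the study).
ancient = ["ACGT", "ACGT", "ACGA"]
modern = ["ACGT", "ACGA", "ACGA"]
print(fst(ancient, modern))
```

With small samples that share most haplotypes, an estimator of this kind can return values at or below zero, which is one reason the observed value is judged against a simulated null distribution rather than read in isolation.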
Conversely, continuity between early central European farm-
ers and modern Europeans has been rejected in a previous study
(33). However, the Eneolithic and Bronze Age sequences presented
here are 500–2,000 y younger than the early Neolithic
and belong to lineages identified both in early farmers and late
hunter–gatherers from central Europe (33). A plausible explanation
for this is that the prehistoric populations sampled in this
study are a product of admixture between in situ hunter–gatherers
and immigrant early farmers during the centuries after the arrival
of farming, and that this admixture was a major process
shaping modern patterns of mtDNA variation (34) and possibly
Table 1. Allele frequencies of three functional SNPs associated with pigmentation in ancient and modern populations
Columns Europe through Ancient sample give the derived allele frequency.
Gene | SNP | Polymorphism | Europe | Asia | Africa | Modern Ukrainian sample | Ancient sample
HERC2 | rs12913832 | regulatory element (OCA2), A>G | 0.710 [758] | 0.002 [572] | 0.000 [370] | 0.651 (0.546–0.744) [86] | 0.160 (0.099–0.247) [94]
SLC45A2 (MATP) | rs16891982 | Leu374Phe, C>G | 0.970 [758] | 0.007 [572] | 0.000 [370] | 0.927 (0.849–0.965) [82] | 0.432 (0.296–0.578) [44]
TYR | rs1042602 | Ser192Tyr, C>A | 0.368 [758] | 0.002 [572] | 0.000 [370] | 0.367 (0.279–0.466) [98] | 0.043 (0.018–0.106) [92]
Modern allele frequencies are from 1000 Genomes (65). Range in parentheses indicates the equal-tailed 95% confidence
interval calculated as described using the qbeta function in R (66). Numbers in square brackets indicate 2N individuals. The African American (ASW) data from
1000 Genomes were excluded because this population is admixed to an unknown extent.
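The equal-tailed intervals in Table 1 can be approximated with a short calculation. The sketch below is a minimal illustration under stated assumptions, not the authors' R code: it assumes the interval is taken from a Beta(derived + 1, ancestral + 1) distribution, in line with the Beta draw described in the simulation methods, and it assumes that the ancient HERC2 frequency of 0.160 in 94 chromosomes corresponds to 15 derived alleles. The helper names are hypothetical. Because both Beta parameters are integers, the Beta distribution function can be written as a binomial tail sum and inverted by bisection without external libraries.

```python
from math import comb


def beta_cdf_int(x, a, b):
    """CDF of Beta(a, b) at x for positive integers a and b.

    Uses the identity I_x(a, b) = P(Binomial(a + b - 1, x) >= a).
    """
    n = a + b - 1
    return sum(comb(n, j) * x**j * (1 - x) ** (n - j) for j in range(a, n + 1))


def beta_quantile_int(q, a, b, tol=1e-10):
    """Quantile of Beta(a, b) by bisection (the CDF increases with x)."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if beta_cdf_int(mid, a, b) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def equal_tailed_interval(derived, total, level=0.95):
    """Equal-tailed interval for the derived allele frequency.

    Assumption: frequency ~ Beta(derived + 1, ancestral + 1).
    """
    a, b = derived + 1, total - derived + 1
    tail = (1 - level) / 2
    return beta_quantile_int(tail, a, b), beta_quantile_int(1 - tail, a, b)


# Ancient HERC2 sample; 15 derived alleles is inferred from 0.160 x 94.
low, high = equal_tailed_interval(derived=15, total=94)
print(round(low, 3), round(high, 3))
```

If the assumed parameterization matches the one used for the table, the printed bounds should fall close to the tabulated 0.099 and 0.247.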
Fig. 1. Probabilities of obtaining F_ST equal to or greater than that observed
(0.00551) between 60 Eneolithic (ca. 6,500–5,000 y ago) and Bronze Age (ca.
5,000–4,000 y ago) samples from the Pontic–Caspian steppe, and a combined
sample of 246 homologous modern sequences from the same geographic
region, across a range of assumed ancestral population size combinations.
Two phases of exponential growth were modeled, the first after the initial
colonization of Europe 45,000 y ago, of assumed effective female population
size N_UP (y axis, ×10^3), and ending when farming began in the region
considered 7,000 y ago, when the assumed effective female population size
was N_N (x axis, ×10^3), and the second leading up to the present, when the assumed
effective female population size is 5,444,812. The initial colonizers of Europe
were sampled from a constant ancestral African population of 5,000 effective
females. Gray shaded areas indicate P values >0.05.
also the variability observed in European hair, eye, and
skin color.
To test whether the observed increases in the three light pig-
mentation-associated alleles can be explained by genetic drift
alone or whether natural selection needs to be invoked, we
performed forward computer simulations of drift plus selection,
accommodating uncertainty in ancient and modern allele fre-
quency, population size, and ancient sample age. We assumed
codominance for both SLC45A2 rs16891982 G and TYR rs1042602
A alleles (22, 35) and that the derived HERC2 rs12913832 G allele
is recessive (36). Using these simulations, neutrality (S = 0) was
rejected under all assumed ancestral effective population sizes
ranging from 10^3 to 10^5 at the time of the ancient sample
(SLC45A2 P < 1 × 10^…, TYR P < 2 × 10^…, and HERC2 P < 1 × 10^…).
The values of selection acting on the SLC45A2 rs16891982
G allele, the TYR rs1042602 A allele, and the HERC2 rs12913832
G allele that best explained the observed derived allele frequency
changes were 0.030, 0.026, and 0.036, respectively (Fig. 2).
Whereas there is strong evidence that the derived HERC2
rs12913832 G allele is recessive (36), it is less clear whether the
SLC45A2 and TYR derived alleles are codominant, recessive, or
dominant (22, 35). Under the assumption that both SLC45A2 and
TYR derived alleles are recessive, the selection values that best
explain the observed changes in frequency are 0.022 and 0.104,
respectively, and under the assumption that they are dominant the
selection values are 0.088 and 0.016, respectively; again, neutrality
was rejected for all three alleles (P < 4 × 10^…) under all
ancestral
population sizes modeled and all assumptions of dominance/
codominance/recessivity (Fig. S2).
Our analysis indicates that positive selection on pigmentation
variants associated with depigmented hair, skin, and eyes was still
ongoing after the time period represented by our archaeological
population, 6,500–4,000 y ago. This finding suggests that either
the selection pressures that initiated the selective sweep during
the Late Pleistocene or early Holocene were still operative or that
a new selective environment had arisen in which depigmentation
was favored for a different reason.
The high selection coefficients estimated for pigmentation
genes HERC2, SLC45A2, and TYR are best understood in the
context of estimates obtained for other recently selected loci.
Using spatially explicit simulation and approximate Bayesian
computation, selection on the LCT -13,910*T allele, which is
strongly associated with lactase persistence in Europeans and
southern Asians, was inferred to fall in the range 0.0259–0.0795
and to have begun around 7,500 y ago in the region between the
Balkans and central Europe (37). However, another simulation-based
study incorporating latitudinal effects on selection resulted
in a lower estimate of S (0.008–0.018) (38). The selective advantages
of the G6PD A− and Med deficiency alleles conferring
resistance to malaria have been estimated at 0.019–0.048 and
0.014–0.049, respectively, in regions where malaria is endemic
(39). These alleles are estimated to have arisen 6,357 y ago
(G6PD A−) and 3,330 y ago (G6PD Med) (39). Thus, the estimates
of S for the three pigmentation genes examined in this
study are comparable to those for the most strongly selected loci
in the human genome.
Although these estimated selection coefficients are high, they
are comparable to previous estimates for genes in the pigmenta-
tion complex. The selective sweeps favoring the SLC45A2 de-
rived allele, as well as the derived alleles of SNPs in SLC24A5
and TYRP1, which are also implicated in the lightening of skin
pigmentation, are estimated to have begun between 11,000 and
19,000 y ago, after the separation of the ancestors of modern
Europeans and East Asians (the ages of the selective sweeps
affecting HERC2 and TYR have not yet been estimated) (14, 40).
Beleza et al. (14) recently estimated the coefficient of selection
at the SLC45A2 locus to be 0.05 under a dominant model of
inheritance and 0.04 under an additive model. Selection favoring
the derived alleles of SNPs in SLC24A5 and TYRP1 was found to
be similarly strong.
Estimating selection coefficients using the ancient DNA-based
simulation approach presented here offers considerable advan-
tages over traditional methods based on allele age and frequency
estimates (1): Selection coefficients are estimated over a defined
period; selection acting on standing variation can be accommo-
dated; and our approach is insensitive to the frequently un-
accounted for uncertainties associated with allele age estimation
using molecular or recombination clocks. This latter advantage
is likely to result in considerable improvements in precision.
However, our approach does require the assumption of pop-
ulation continuity and will not provide direct estimates of when
a selective sweep began.
Although the strength of the selection coefficients in a certain
time window can be estimated with improved precision using our
ancient DNA-based simulation approach, the actual nature of
the selection pressure remains unknown. However, temporal and
geographical information from the prehistoric skeletal pop-
ulation under study can help in formulating reasonable hy-
potheses. Geographic variation in many functional skin pigmentation
gene polymorphisms (13), and lighter skin pigmentation more
generally, correlate strongly with distance from the equator in
long-established populations, suggesting that selective pressure
also occurred along a latitudinal gradient. The samples in our
study were from between 42°N and 54°N, a latitudinal belt in which
Fig. 2. Two-tailed empirical P values for obtaining the observed (A) SLC45A2 G allele, (B) TYR A allele, and (C) HERC2 G allele frequency increase. In each
panel the x axis is log10 founding N_e (3.0 to 5.0) and the y axis is selection (%). P values
were obtained by forward simulation of drift and natural selection across a range of assumed ancestral population sizes and selection coefficients, assuming
exponential growth to a modern N_e of 4,845,710. The SLC45A2 rs16891982 G allele and the TYR rs1042602 A allele were assumed to be codominant. The
HERC2 rs12913832 G allele was assumed to be recessive (values less than 0.01 are shaded gray).
yearly average UVR is insufficient for vitamin D3 photosynthesis
in highly melanized skin (4, 41). Constraints on the ability to
photosynthesize vitamin D3 imposed by low incident UVR in-
tensity may have provided significant selective pressure favoring
lighter pigmentation populations in high-latitude regions such as
the northern Pontic steppe belt. The need to admit UVB radiation
to catalyze the synthesis of vitamin D3, together with the decreased
danger of folate photolysis at higher latitudes, may account for
the observed skin depigmentation from prehistoric to modern
times in this region (5).
Dietary change during the Neolithization process may have
reinforced selection pressure favoring depigmented skin. The
individuals analyzed in this study lived 500–2,000 y after the
arrival of farming in the region north of the Black Sea (42, 43).
In many parts of Europe, the Mesolithic–Neolithic transition is
associated with a switch from a vitamin D-rich aquatic or game-based
hunter–gatherer diet (44) to a vitamin D-poor agriculturalist
diet. In low-UV regimes such as the one prevailing in our
study region, it is difficult to meet vitamin D requirements
without the consumption of significant quantities of oily fish or
animal liver (45, 46). The vitamin D recommended dietary allowance
of 800–1,000 IU for adults requires the daily consumption
of the equivalent of 100 g of wild salmon (the dietary
input with the greatest measured vitamin D concentration).
Isotopic evidence suggests that the populations sampled in our
study continued to access aquatic resources, primarily river fish,
in the Neolithic, Eneolithic, and Bronze Age, although there was
considerable heterogeneity in fish consumption within the study
region (47–50). However, any diminution in fish consumption
may have been sufficient to generate additional selective pressure
favoring depigmentation at this low-incident-UVR latitude.
Although ecological and environmental factors may be suffi-
cient to explain the observed change in European skin pigmen-
tation, these explanations are unlikely to hold for eye and hair
color. The geographic distribution of iris and hair pigmentation
variation does not conform as well to a latitudinal cline model,
with much of the observed phenotypic variation restricted to
Europe and closely related neighboring populations (51, 52).
The blue iris phenotype characteristic of the HERC2 rs12913832
G allele, for example, is almost completely restricted to western
Eurasia and some adjacent regions, its descendant populations,
and populations containing European admixture (51, 52). It is
possible that depigmented irises or the various human hair color
morphs in Europeans are by-products of selection on skin pigmentation.
There is evidence for gene–gene interaction within
the polygenic system governing complex pigmentation traits;
interactions between HERC2, OCA2, and MC1R, in particular,
have been found to have a statistically significant effect on hair,
iris, and skin color (36). There is also evidence for epistatic
interactions between components of the melanin synthesis
pathway in other mammalian model systems, including interactions
between the products of ASIP, MC1R, and TYR (53).
Additionally, many pigmentation genes, including TYR, HERC2,
and SLC45A2, have pleiotropic effects on skin, hair, and eye
color (11, 36).
Given that intraspecific pigmentation variability in other taxa,
particularly avians, has been attributed to signaling and other
factors associated with mate choice (54) it is possible that depig-
mented irises and the various hair colors observed in Europeans
arose through sexual selection (7). Frequency-dependent sexual
selection in favor of rare variants has been observed in vertebrates
(55, 56), and such selection favoring rare pigmentation morphs
could have driven alleles associated with lighter hair and eye
colors to higher frequency. Once lighter hair and eye pigmenta-
tion phenotypes reached appreciable frequencies in European
populations, these novel traits may have continued to be preferred
as indicators of group membership, facilitating assortative mating.
Assortative mating based on coloration is common in vertebrates
(57), and skin pigmentation has been observed as a criterion for
endogamy in modern human populations (58, 59). In addition,
there is some evidence that lighter iris colors, because of their
recessive mode of inheritance, may be preferred by males in
assortative mating regimes to improve paternity confidence (60).
Consistent with positive assortative mating, an exact test of Hardy–Weinberg
equilibrium reveals an excess of HERC2 rs12913832
homozygotes in both the modern (P = 0.0543) and ancient (P =
0.0084) East European samples genotyped here (Table S3), despite
the relatively small sample sizes.
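For readers unfamiliar with the test, the sketch below shows one standard way to compute an exact Hardy–Weinberg P value: enumerate every possible heterozygote count given the observed allele counts, compute its probability under random mating, and sum the probabilities of outcomes no more likely than the one observed. This is offered as an illustration under assumptions. The text does not state which implementation was used, the function names are hypothetical, and the genotype counts in the example are invented because Table S3 is not reproduced here.

```python
from math import exp, lgamma, log


def log_fact(k):
    """Natural log of k! via the log-gamma function."""
    return lgamma(k + 1)


def het_probabilities(n_a, n_b):
    """P(heterozygote count | allele counts) under Hardy-Weinberg.

    n_a and n_b are the counts of the two alleles (their sum is even).
    Returns a dict mapping heterozygote count -> probability.
    """
    n = (n_a + n_b) // 2  # number of diploid individuals
    probs = {}
    # The heterozygote count always has the same parity as n_a.
    for n_ab in range(n_a % 2, min(n_a, n_b) + 1, 2):
        n_aa = (n_a - n_ab) // 2
        n_bb = (n_b - n_ab) // 2
        log_p = (
            n_ab * log(2)
            + log_fact(n)
            - log_fact(n_aa) - log_fact(n_ab) - log_fact(n_bb)
            + log_fact(n_a) + log_fact(n_b)
            - log_fact(2 * n)
        )
        probs[n_ab] = exp(log_p)
    return probs


def hwe_exact_p(n_aa, n_ab, n_bb):
    """Two-sided exact P value from genotype counts."""
    probs = het_probabilities(2 * n_aa + n_ab, 2 * n_bb + n_ab)
    observed = probs[n_ab]
    # Small relative slack guards against floating-point ties.
    total = sum(p for p in probs.values() if p <= observed * (1 + 1e-9))
    return min(1.0, total)


# Invented genotype counts (not the values in Table S3).
print(hwe_exact_p(n_aa=30, n_ab=6, n_bb=11))
```

A deficit of heterozygotes relative to the expectation, as in the invented counts above, is what an excess of homozygotes looks like in this framework.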
The observed excess of HERC2 rs12913832 homozygotes in
the ancient sample might be explained by population stratifica-
tion in a temporally heterogeneous population sample. Although
we do not observe any chronological or spatial patterning of the
pigmentation markers in our prehistoric sample, we cannot exclude
population stratification in the absence of additional neutral SNPs.
However, we note that neither the TYR nor the SLC45A2 SNPs
investigated here, nor three additional SNPs investigated in the same
ancient and modern samples, showed any significant observable
excess of homozygotes (Table S3), suggesting that the excess
of HERC2 rs12913832 homozygotes is less likely to be due to
population stratification.
In sum, a combination of selective pressures associated with
living in northern latitudes, the adoption of an agriculturalist
diet, and assortative mating may sufficiently explain the observed
change from a darker phenotype during the Eneolithic/Early
Bronze Age to a generally lighter one in modern Eastern Europeans,
although other selective factors cannot be discounted.
The selection coefficients inferred directly from serially sampled
data at these pigmentation loci range from 2 to 10% and are
among the strongest signals of recent selection in humans.
Ancient DNA Extraction, Amplification, and Sequencing. Sample information,
detailed descriptions of DNA extraction, amplification, and sequencing methods
as well as validation of the ancient DNA data are provided in Supporting In-
formation owing to the extensive nature of the experimental setup.
Skeletal material from 150 Eneolithic and Bronze Age individuals from the
west and north Pontic region was available for ancient DNA analyses. From
all but one sample DNA was extracted twice independently using 0.5–1.0 g
bone powder; 403 bp of the hypervariable region 1 [nucleotide positions
(np) 16,011–16,413] were amplified using seven overlapping primer pairs
(Table S4). They were integrated in a triple multiplex setup that included 32
clade-determining coding region SNPs and a 9-bp-indel (Table S5), as well as
used in single-locus PCRs. rs12913832, rs16891982, and rs1042602 were
amplified in a multiplex PCR together with 18 other nuclear loci (Table S4).
Sequencing of the PCR products was primarily done by 454 sequencing by
GATC Biotech. Before that, samples were pooled according to a protocol
modified after Meyer et al. (61). Raw data (454) were sorted by barcode and
primer sequences of the multiplex PCRs and then analyzed with SeqMan
Pro (DNASTAR Lasergene 8, 9, and 10). For authentication purposes mitochondrial
haplotypes are based on at least three, and SNP genotypes on at
least four, independent amplification products from two extracts. In cases
where the authentication scheme could not be fulfilled using the available
454 data, those loci were additionally amplified in single-locus PCRs, followed
by direct Sanger sequencing.
Population Genetic Analyses. Continuity between Neolithic and present-day
populations in the geographic region encompassing Bulgaria, Romania,
Ukraine, and the southwest of the Russian Federation was tested by calculating
the F_ST between ancient and modern observed mtDNA HVR1 samples
(29–32) and comparing this with F_ST values between ancient and modern DNA
samples generated by coalescent simulation (33). These simulated DNA
samples were generated using the program Fastsimcoal (62) under the null
model of a single continuous population, using plausible population pa-
rameter ranges, and serial ancient and modern samples that replicated the
observed sample numbers and dates. Details of the continuity test are pro-
vided in Supporting Information.
To test whether changes in HERC2 rs12913832 G, TYR rs1042602 A, or
SLC45A2 rs16891982 G allele frequencies (Fig. S1) can be explained by
genetic drift, or whether natural selection needs to be invoked, and to
estimate the strength of natural selection where appropriate, we used a
forward simulation approach. In each forward simulation we first drew the
ancestral allele frequency estimate from a random Beta (n
+1, n
+1) dis-
tribution, where n
and n
were the number of ancestral and derived alleles
observed in our ancient sample, respectively, to reflect uncertainty in an-
cestral allele frequencies. Then drift and natural selection were forward-
simulated by binomial sampling across generations and using a standard
selection equation (63), respectively. We assumed codominance for both
SLC45A2 rs16891982 G and TYR rs1042602 A alleles (22, 35) and that the
derived HERC2 rs12913832 G allele is recessive (36) (Fig. 2). However, al-
though there is strong evidence that the derived HERC2 rs12913832 G allele
is recessive (36), it is less clear whether the SLC45A2 and TYR derived alleles
are codominant, recessive, or dominant (22, 35). For this reason we also
performed the same simulations assuming both dominance and recessivity
for the SLC45A2 and TYR derived alleles (Fig. S2). Exponential population
growth was modeled from a range of values of N
at the time of the ancient
sample (50 equally spaced log
values between 1,000 and 100,000) to
a modern N
of 4,845,710 (1/10 of the census population size of Ukraine in
2001, the year that the modern Ukrainian sample was collected; http://en. The number of generations
forward-simulated was drawn at random from a pool of 600,000 date esti-
mates for the ancient samples reported here, generated by pooling each set
of 10,000 date estimates for all 60 ancient samples. In the final generation of
each forward simulation, simulated modern sample allele frequencies were
picked from a random binomial with Nequal to the modern sample size
(HERC2 n =86, SLC45A2 n =82, and TYR n =98). Forward simulations were
repeated 100,000 times for each combination of the 50 assumed N_e values at
the time of the ancient sample and 50 selection coefficient (S) values, starting
at zero. Finally, the simulated distribution of modern derived allele
frequencies was compared with those observed using the equation
1 − 2 × |0.5 − P|, where P is the proportion of simulated modern allele
frequencies that are greater than that observed. This yielded a two-tailed
empirical P value for the observed allele frequency increase for each
combination of the demographic and natural selection model parameters (64)
(Fig. 2 and Fig. S2).
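The forward-simulation procedure described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' code. The function names are hypothetical. The fitness scheme (1 + s, 1 + hs, and 1 for the three genotypes, with h = 0.5 for codominance, h = 0 for a recessive and h = 1 for a dominant derived allele) is one common form of the standard selection equation and is assumed here. The starting frequency is assumed to follow a Beta(derived count + 1, ancestral count + 1) distribution. In the demonstration, only the modern N_e and the HERC2 sample size come from the text; the ancient allele counts, generation pool, and observed frequency are invented for illustration.

```python
import math
import random


def next_freq(p, s, h):
    """Derived-allele frequency after one generation of selection.

    Assumed fitnesses: derived homozygote 1+s, heterozygote 1+h*s,
    ancestral homozygote 1.
    """
    q = 1.0 - p
    w_bar = p * p * (1 + s) + 2 * p * q * (1 + h * s) + q * q
    p_new = p * (p * (1 + s) + q * (1 + h * s)) / w_bar
    return min(1.0, max(0.0, p_new))  # guard against rounding error


def binomial(rng, n, p):
    """Binomial draw used for drift and for the final modern sample."""
    if hasattr(rng, "binomialvariate"):  # available in Python 3.12+
        return rng.binomialvariate(n, p)
    if n < 1000:  # exact but slow fallback for small n
        return sum(rng.random() < p for _ in range(n))
    # Crude normal approximation for large n (an assumption of this
    # sketch, used only to stay within the standard library).
    x = round(rng.gauss(n * p, math.sqrt(n * p * (1.0 - p))))
    return min(n, max(0, x))


def simulate_once(rng, n_anc, n_der, s, h, ne_ancient, ne_modern,
                  generations, n_modern_sample):
    """One forward simulation; returns a simulated modern sample frequency."""
    # Starting frequency reflects uncertainty in the ancient sample.
    p = rng.betavariate(n_der + 1, n_anc + 1)
    for t in range(1, generations + 1):
        # Exponential growth from ne_ancient to ne_modern.
        ne = max(1, round(ne_ancient * (ne_modern / ne_ancient) ** (t / generations)))
        p = next_freq(p, s, h)                    # selection
        p = binomial(rng, 2 * ne, p) / (2 * ne)   # drift
    return binomial(rng, n_modern_sample, p) / n_modern_sample


def two_tailed_p(simulated, observed):
    """Two-tailed empirical P value, 1 - 2 * |0.5 - P|."""
    prop_greater = sum(f > observed for f in simulated) / len(simulated)
    return 1.0 - 2.0 * abs(0.5 - prop_greater)


if __name__ == "__main__":
    rng = random.Random(42)
    generation_pool = [180, 190, 200, 210]  # hypothetical date-derived values
    sims = [
        simulate_once(rng, n_anc=80, n_der=20, s=0.02, h=0.0,
                      ne_ancient=10_000, ne_modern=4_845_710,
                      generations=rng.choice(generation_pool),
                      n_modern_sample=86)
        for _ in range(200)
    ]
    print(two_tailed_p(sims, observed=0.65))  # 0.65 is a made-up value
```

Repeating the inner simulation over a grid of ancient N_e values and selection coefficients, and storing one P value per grid cell, would give a surface of the kind summarized in Fig. 2 and Fig. S2.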
ACKNOWLEDGMENTS. We thank all colleagues who contributed their ar-
chaeological knowledge and samples to this project, notably D. Agre,
St. Alexandrov, V. Bubulici, A. N. Gei, A. A. Khokhlov, I. N. Klyuchneva,
A. Kozak, N. Neradenko, A. V. Nikolova, V. P. Petrenko, Yu. Ya. Rassamakin,
V. A. Romashko, S. N. Sanzharov, E. N. Sava, N. N. Shishlina, and D. Ya. Teslenko.
We also thank Tom Gilbert and colleagues for giving us a hands-on introduc-
tion to bar coding and 454 sequencing workflow, and providing us with pro-
tocols. Thanks to Benjamin Rieger for his sort3perl script. The radiocarbon
dates were provided by the Research Laboratory for Archaeology and
the History of Art (University of Oxford), and financially supported by the
Excellence Cluster 264 Topoi, Berlin. The authors acknowledge the use of the
UCL Legion High Performance Computing Facility and associated support serv-
ices in the completion of this work. The project was funded by German Federal
Ministry of Education and Research Grant 01UA0809A.
1. Sabeti PC, et al.; International HapMap Consortium (2007) Genome-wide detection and characterization of positive selection in human populations. Nature 449(7164):
2. Coop G, Witonsky D, Di Rienzo A, Pritchard JK (2010) Using environmental correlations to identify loci underlying local adaptation. Genetics 185(4):1411–1423.
3. Peter BM, Huerta-Sanchez E, Nielsen R (2012) Distinguishing between selective sweeps from standing variation and from a de novo mutation. PLoS Genet 8(10):
4. Jablonski NG, Chaplin G (2000) The evolution of human skin coloration. J Hum Evol
5. Jablonski NG, Chaplin G (2010) Colloquium paper: Human skin pigmentation as an adaptation to UV radiation. Proc Natl Acad Sci USA 107(Suppl 2):8962–8968.
6. Harding RM, et al. (2000) Evidence for variable selective pressures at MC1R. Am J Hum Genet 66(4):1351–1361.
7. Darwin C (1871) The Descent of Man, and Selection in Relation to Sex (John Murray,
8. Sturm RA, Teasdale RD, Box NF (2001) Human pigmentation genes: Identification, structure and consequences of polymorphic variation. Gene 277(1-2):49–62.
9. Rees JL, Harding RM (2012) Understanding the evolution of human pigmentation: Recent contributions from population genetics. J Invest Dermatol 132(3 Pt 2):846–853.
10. Myles S, Somel M, Tang K, Kelso J, Stoneking M (2007) Identifying genes underlying skin pigmentation differences among human populations. Hum Genet 120(5):613–621.
11. Sulem P, et al. (2007) Genetic determinants of hair, eye and skin pigmentation in Europeans. Nat Genet 39(12):1443–1452.
12. Lao O, de Gruijter JM, van Duijn K, Navarro A, Kayser M (2007) Signatures of positive selection in genes associated with human skin pigmentation as revealed from analyses of single nucleotide polymorphisms. Ann Hum Genet 71(Pt 3):354–369.
13. Norton HL, et al. (2007) Genetic evidence for the convergent evolution of light skin in Europeans and East Asians. Mol Biol Evol 24(3):710–722.
14. Beleza S, et al. (2013) The timing of pigmentation lightening in Europeans. Mol Biol Evol 30(1):24–35.
15. Alonso S, et al. (2008) Complex signatures of selection for the melanogenic loci TYR, TYRP1 and DCT in humans. BMC Evol Biol 8:74.
16. Chen H, Patterson N, Reich D (2010) Population differentiation as a test for selective sweeps. Genome Res 20(3):393–402.
17. Sulem P, et al. (2008) Two newly identified genetic determinants of pigmentation in Europeans. Nat Genet 40(7):835–837.
18. Eiberg H, et al. (2008) Blue eye color in humans may be caused by a perfectly associated founder mutation in a regulatory element located within the HERC2 gene inhibiting OCA2 expression. Hum Genet 123(2):177–187.
19. Shriver MD, et al. (2003) Skin pigmentation, biogeographical ancestry and admixture mapping. Hum Genet 112(4):387–399.
20. Frudakis T, et al. (2003) Sequences associated with human iris pigmentation. Genetics
21. Costin GE, Valencia JC, Vieira WD, Lamoreux ML, Hearing VJ (2003) Tyrosinase processing and intracellular trafficking is disrupted in mouse primary melanocytes carrying the underwhite (uw) mutation. A model for oculocutaneous albinism (OCA) type 4. J Cell Sci 116(Pt 15):3203–3212.
22. Cook AL, et al. (2009) Analysis of cultured human melanocytes based on polymorphisms within the SLC45A2/MATP, SLC24A5/NCKX5, and OCA2/P loci. J Invest Dermatol 129(2):392–405.
23. Sturm RA, et al. (2008) A single SNP in an evolutionary conserved region within intron 86 of the HERC2 gene determines human blue-brown eye color. Am J Hum Genet
24. Han J, et al. (2008) A genome-wide association study identifies novel alleles associated with hair color and skin pigmentation. PLoS Genet 4(5):e1000074.
25. Visser M, Kayser M, Palstra R-J (2012) HERC2 rs12913832 modulates human pigmentation by attenuating chromatin-loop formation between a long-range enhancer and the OCA2 promoter. Genome Res 22(3):446–455.
26. Donnelly MP, et al. (2012) A global view of the OCA2-HERC2 region and pigmentation. Hum Genet 131(5):683–696.
27. Hudjashov G, Villems R, Kivisild T (2013) Global patterns of diversity and selection in human tyrosinase gene. PLoS ONE 8(9):e74307.
28. Nadkarni NA, Weale ME, von Schantz M, Thomas MG (2005) Evolution of a length polymorphism in the human PER3 gene, a component of the circadian system. J Biol Rhythms 20(6):490–499.
29. Malyarchuk BA, Derenko MV (2001) Mitochondrial DNA variability in Russians and Ukrainians: Implication to the origin of the Eastern Slavs. Ann Hum Genet 65(Pt 1):
30. Malyarchuk BA, et al. (2002) Mitochondrial DNA variability in Poles and Russians. Ann Hum Genet 66(Pt 4):261–283.
31. Calafell F, Underhill P, Tolun A, Angelicheva D, Kalaydjieva L (1996) From Asia to Europe: Mitochondrial DNA sequence variability in Bulgarians and Turks. Ann Hum Genet 60(Pt 1):35–49.
32. Excoffier L, Smouse PE, Quattro JM (1992) Analysis of molecular variance inferred from metric distances among DNA haplotypes: Application to human mitochondrial DNA restriction data. Genetics 131(2):479–491.
33. Bramanti B, et al. (2009) Genetic discontinuity between local hunter-gatherers and central Europe's first farmers. Science 326(5949):137–140.
34. Bollongino R, et al. (2013) 2000 years of parallel societies in Stone Age Central Europe. Science 342(6157):479–481.
35. Stokowski RP, et al. (2007) A genomewide association study of skin pigmentation in a South Asian population. Am J Hum Genet 81(6):1119–1132.
36. Branicki W, Brudnik U, Wojas-Pelc A (2009) Interactions between HERC2, OCA2 and MC1R may influence human pigmentation phenotype. Ann Hum Genet 73(2):
37. Itan Y, Powell A, Beaumont MA, Burger J, Thomas MG (2009) The origins of lactase persistence in Europe. PLoS Comput Biol 5(8):e1000491.
38. Gerbault P, Moret C, Currat M, Sanchez-Mazas A (2009) Impact of selection and demography on the diffusion of lactase persistence. PLoS ONE 4(7):e6369.
39. Tishkoff SA, et al. (2001) Haplotype diversity and linkage disequilibrium at human G6PD: Recent origin of alleles that confer malarial resistance. Science 293(5529):
40. Soejima M, Tachida H, Ishida T, Sano A, Koda Y (2006) Evidence for recent positive selection at the human AIM1 locus in a European population. Mol Biol Evol 23(1):179–188.
41. Jablonski NG, Chaplin G (2012) Human skin pigmentation, migration and disease susceptibility. Philos Trans R Soc Lond B Biol Sci 367(1590):785–792.
42. Kotova N (2009) The Neolithization of Northern Black Sea area in the context of climate changes. Documenta Praehistorica 36:159–174.
43. Kotova N, Makhortykh S (2010) Human adaptation to past climate changes in the northern Pontic steppe. Quat Int 220(1-2):88–94.
44. Richards MP, Schulting RJ, Hedges RE (2003) Archaeology: Sharp shift in diet at onset of Neolithic. Nature 425(6956):366.
45. Lu Z, et al. (2007) An evaluation of the vitamin D3 content in fish: Is the vitamin D content adequate to satisfy the dietary requirement for vitamin D? J Steroid Biochem Mol Biol 103(3-5):642–644.
46. Holick MF (2004) Sunlight and vitamin D for bone health and prevention of autoimmune diseases, cancers, and cardiovascular disease. Am J Clin Nutr 80(6, Suppl):
47. Lillie M, Budd C, Potekhina I (2011) Stable isotope analysis of prehistoric populations from the cemeteries of the Middle and Lower Dnieper Basin, Ukraine. J Archaeol Sci
48. Lillie MC, Richards M (2000) Stable isotope analysis and dental evidence of diet at the Mesolithic–Neolithic transition in Ukraine. J Archaeol Sci 27(10):965–972.
49. Shishlina NI, et al. (2009) Paleoecology, subsistence, and C-14 chronology of the Eurasian Caspian Steppe Bronze Age. Radiocarbon 51(2):481–499.
50. Hollund HI, Higham T, Belinskij A, Korenevskij S (2010) Investigation of palaeodiet in the North Caucasus (South Russia) Bronze Age using stable isotope analysis and AMS dating of human and animal bones. J Archaeol Sci 37(12):2971–2983.
51. Sturm RA, Frudakis TN (2004) Eye colour: Portals into pigmentation genes and ancestry. Trends Genet 20(8):327–332.
52. Walsh S, et al. (2013) The HIrisPlex system for simultaneous prediction of hair and eye colour from DNA. Forensic Sci Int Genet 7(1):98–115.
53. Phillips PC (2008) Epistasis – the essential role of gene interactions in the structure and evolution of genetic systems. Nat Rev Genet 9(11):855–867.
54. Dale J (2006) Intraspecific variation in coloration. Bird Coloration 2:36–86.
55. Hughes KA, Houde AE, Price AC, Rodd FH (2013) Mating advantage for rare males in wild guppy populations. Nature 503(7474):108–110.
56. Farr JA (1977) Male rarity or novelty, female choice behavior, and sexual selection in the guppy, Poecilia reticulata Peters (Pisces: Poeciliidae). Evolution 31(1):162–168.
57. Hofreiter M, Schöneberg T (2010) The genetic and evolutionary basis of colour variation in vertebrates. Cell Mol Life Sci 67(15):2591–2603.
58. Banerjee S (1985) Assortative mating for colour in Indian populations. J Biosoc Sci
59. Roberts DF, Kahlon DP (1972) Skin pigmentation and assortative mating in Sikhs. J Biosoc Sci 4(1):91–100.
60. Laeng B, Mathisen R, Johnsen J-A (2007) Why do blue-eyed men prefer women with the same eye color? Behav Ecol Sociobiol 61(3):371–384.
61. Meyer M, Stenzel U, Hofreiter M (2008) Parallel tagged sequencing on the 454 platform. Nat Protoc 3(2):267–278.
62. Excoffier L, Foll M (2011) fastsimcoal: A continuous-time coalescent simulator of genomic diversity under arbitrarily complex evolutionary scenarios. Bioinformatics 27(9):
63. Maynard Smith J (1998) Evolutionary Genetics (Oxford Univ Press, New York), 2nd Ed.
64. Voight BF, et al. (2005) Interrogating multiple aspects of variation in a full resequencing data set to infer human population size changes. Proc Natl Acad Sci USA
65. The 1000 Genomes Project Consortium (2012) An integrated map of genetic variation from 1,092 human genomes. Nature 491(7422):56–65.
66. R Core Team (2012) R: A Language and Environment for Statistical Computing (R Foundation for Statistical Computing, Vienna).