We generated genome-wide data from 69 Europeans who lived
between 8,000–3,000 years ago by enriching ancient DNA libraries
for a target set of almost 400,000 polymorphisms. Enrichment of
these positions decreases the sequencing required for genome-wide
ancient DNA analysis by a median of around 250-fold, allowing us
to study an order of magnitude more individuals than previous studies
and to obtain new insights about the past. We show that
the populations of western and far eastern Europe followed opposite
trajectories between 8,000–5,000 years ago. At the beginning of the
Neolithic period in Europe, 8,000–7,000 years ago, closely related
groups of early farmers appeared in Germany, Hungary and Spain,
different from indigenous hunter-gatherers, whereas Russia was inhab-
ited by a distinctive population of hunter-gatherers with high affinity
to a 24,000-year-old Siberian. By 6,000–5,000 years ago, farmers
throughout much of Europe had more hunter-gatherer ancestry than
their predecessors, but in Russia, the Yamnaya steppe herders of this
time were descended not only from the preceding eastern European
hunter-gatherers, but also from a population of Near Eastern ances-
try. Western and Eastern Europe came into contact 4,500 years ago,
as the Late Neolithic Corded Ware people from Germany traced
75% of their ancestry to the Yamnaya, documenting a massive
migration into the heartland of Europe from its eastern periphery.
This steppe ancestry persisted in all sampled central Europeans until
at least 3,000 years ago, and is ubiquitous in present-day Europeans.
These results provide support for a steppe origin of at least some of
the Indo-European languages of Europe.
Genome-wide analysis of ancient DNA has emerged as a transformative
technology for studying prehistory, providing information that is
comparable in power to archaeology and linguistics. Realizing its
promise, however, requires collecting genome-wide data from an adequate
number of individuals to characterize population changes over time,
which means not only sampling a succession of archaeological cultures
but also multiple individuals per culture. To make analysis of large num-
bers of ancient DNA samples practical, we used in-solution hybridiza-
tion capture
to enrich next generation sequencing libraries for a
target set of 394,577 single nucleotide polymorphisms (SNPs) (‘390k
capture’), 354,212 of which are autosomal SNPs that have also been
genotyped using the Affymetrix Human Origins array in 2,345 humans
from 203 populations. This reduces the amount of sequencing required
to obtain genome-wide data by a minimum of 45-fold and a
median of 262-fold (Supplementary Data 1). This strategy allows us to
report genomic scale data on more than twice the number of ancient
Eurasians as has been presented in the entire preceding literature
(Extended Data Table 1).
We used this technology to study population transformations in Europe.
We began by preparing 212 DNA libraries from 119 ancient samples in
dedicated clean rooms, and testing these by light shotgun sequencing
and mitochondrial genome capture (Supplementary Information sec-
tion 1, Supplementary Data 1). We restricted the analysis to libraries
with molecular signatures of authentic ancient DNA (elevated damage
in the terminal nucleotide), negligible evidence of contamination based
on mismatches to the mitochondrial consensus and, where available,
a mitochondrial DNA haplogroup that matched previous results
(Supplementary Information section 2). For 123 libraries
prepared in the presence of uracil-DNA-glycosylase to reduce errors
due to ancient DNA damage, we performed 390k capture, carried out
paired-end sequencing and mapped the data to the human genome.
We restricted analysis to 94 libraries from 69 samples that had at least
0.06-fold average target coverage (average of 3.8-fold) and used majority
rule to call an allele at each SNP covered at least once (Supplementary
Data 1). After combining our data (Supplementary Information
section 3) with 25 ancient samples from the literature (three Upper
Paleolithic samples from Russia, seven people of European
hunter-gatherer ancestry, and fifteen European farmers), we had data
from 94 ancient Europeans. Geographically, these came from Germany
(n = 41), Spain (n = 10), Russia (n = 14), Sweden (n = 12), Hungary
(n = 15), Italy (n = 1) and Luxembourg (n = 1) (Extended Data Table 2).
Following the central European chronology, these included 19
hunter-gatherers (~43,000–2,600 BC), 28 Early Neolithic farmers
(~6,000–4,000 BC), 11 Middle Neolithic farmers (~4,000–3,000 BC)
including the Tyrolean Iceman, 9 Late Copper/Early Bronze Age
individuals (Yamnaya: ~3,300–2,700 BC), 15 Late Neolithic individuals
(~2,500–2,200 BC), 9 Early Bronze Age individuals (~2,200–1,500 BC),
two Late Bronze Age individuals (~1,200–1,100 BC) and one Iron Age
individual (~900 BC). Two individuals were excluded from analyses as
they were related to others from the same population. The average number of
SNPs covered at least once was 212,375 and the minimum was 22,869
(Fig. 1).
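The majority-rule call described above can be sketched in a few lines. This is a minimal illustration, assuming the reads overlapping each SNP have already been reduced to a list of observed bases; the function names `majority_call` and `covered_snps`, the input layout, and the random tie-breaking rule are assumptions made for this sketch and are not taken from the study's pipeline.

```python
import random
from collections import Counter


def majority_call(bases, rng=None):
    """Return a single allele call for one SNP, or None if it is uncovered.

    bases: list of bases ('A', 'C', 'G', 'T') seen in reads at the SNP.
    Ties are broken at random (an assumption of this sketch).
    """
    if not bases:
        return None  # SNP not covered by any read
    rng = rng or random.Random()
    counts = Counter(bases)
    top = max(counts.values())
    # Sort the tied alleles so the choice is reproducible for a seeded rng.
    candidates = sorted(b for b, c in counts.items() if c == top)
    return rng.choice(candidates)


def covered_snps(pileup):
    """Count SNPs covered at least once; pileup maps SNP id -> list of bases."""
    return sum(1 for bases in pileup.values() if bases)


# Hypothetical toy input: three SNPs, one of them without coverage.
toy_pileup = {"snp1": ["A", "A", "G"], "snp2": [], "snp3": ["C"]}
toy_calls = {snp: majority_call(b) for snp, b in toy_pileup.items()}
```

With low average coverage most SNPs are seen in only one or a few reads, so a call of this kind yields one allele per site rather than a diploid genotype.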
We determined that 34 of the 69 newly analysed individuals were
male and used 2,258 Y chromosome SNP targets included in the capture
to obtain high-resolution Y chromosome haplogroup calls (Supplementary
Information section 4). Outside Russia, and before the Late
Neolithic period, only a single R1b individual was found (Early Neolithic
Spain) in the combined literature (n = 70). By contrast, haplogroups
R1a and R1b were found in 60% of Late Neolithic/Bronze Age Europeans
outside Russia (n = 10), and in 100% of the samples from European
Russia from all periods (7,500–2,700 BC; n = 9). R1a and R1b are the
most common haplogroups in many European populations today,
and our results suggest that they spread into Europe from the East after
3,000 BC. Two hunter-gatherers from Russia included in our study
belonged to R1a (Karelia) and R1b (Samara), the earliest documented ancient
samples of either haplogroup discovered to date. These two
hunter-gatherers did not belong to the derived lineages M417 within R1a and
M269 within R1b that are predominant in Europeans today, but all
7 Yamnaya males did belong to the M269 subclade of haplogroup R1b.
Principal components analysis (PCA) of all ancient individuals along
with 777 present-day West Eurasians (Fig. 2a, Supplementary
Information section 5) replicates the positioning of present-day Europeans
between the Near East and European hunter-gatherers, and the
clustering of early farmers from across Europe with present-day Sardinians,
suggesting that farming expansions across the Mediterranean to Spain
and via the Danubian route to Hungary and Germany descended from
a common stock. By adding samples from later periods and additional
locations, we also observe several new patterns. All samples from Russia
have affinity to the ~24,000-year-old MA1 (ref. 6), the type specimen for
the Ancient North Eurasians (ANE) who contributed to both Europeans
and Native Americans. The two hunter-gatherers from Russia (Karelia
in the northwest of the country and Samara on the steppe near the Urals)
form an ‘eastern European hunter-gatherer’ (EHG) cluster at one end of
a hunter-gatherer cline across Europe; people of hunter-gatherer
ancestry from Luxembourg, Spain, and Hungary sit at the opposite ‘western
European hunter-gatherer’ (WHG) end, while the hunter-gatherers
from Sweden (SHG) are intermediate. Against this background of
differentiated European hunter-gatherers and homogeneous early farmers,
multiple population turnovers transpired in all parts of Europe included
in our study. Middle Neolithic Europeans from Germany, Spain, Hungary,
and Sweden from the period ~4,000–3,000 BC are intermediate between
the earlier farmers and the WHG, suggesting an increase of WHG
ancestry throughout much of Europe. By contrast, in Russia, the later Yamnaya
steppe herders of ~3,000 BC plot between the EHG and the present-day
Near East/Caucasus, suggesting a decrease of EHG ancestry during the
same time period. The Late Neolithic and Bronze Age samples from
Germany and Hungary
are distinct from the preceding Middle Neo-
lithic and plot between them and the Yamnaya. This pattern is also
seen in ADMIXTURE analysis (Fig. 2b, Supplementary Information
section 6), which implies that the Yamnaya have ancestry from popu-
lations related to the Caucasus and South Asia that is largely absent in
38 Early or Middle Neolithic farmers but present in all 25 Late Neo-
lithic or Bronze Age individuals. This ancestry appears in Central
Europe for the first time in our series with the Corded Ware around
2,500 BC (Supplementary Information section 6, Fig. 2b). The Corded
Ware shared elements of material culture with steppe groups such as
the Yamnaya, although whether this reflects movements of people has
been contentious. Our genetic data provide direct evidence of migration
and suggest that it was relatively sudden. The Corded Ware are
genetically closest to the Yamnaya ~2,600 km away, as inferred both
from PCA and ADMIXTURE (Fig. 2) and F_ST (0.011 ± 0.002) (Extended
Data Table 3). If continuous gene flow from the east, rather than migra-
tion, had occurred, we would expect successive cultures in Europe
to become increasingly differentiated from the Middle Neolithic, but
instead, the Corded Ware are both the earliest and most strongly
differentiated from the Middle Neolithic population.

Figure 1. Location and SNP coverage of samples included in this study.
a, Geographic location and time-scale (central European chronology) of the 69
newly analysed ancient individuals from this study (black outline) and 25 from
the literature for which shotgun sequencing data was available (no outline).
Periods in the legend (ky BC), shown in regional columns West, Central and
East: 43–22, Pleistocene hunter-gatherer; 6–4.6, Holocene hunter-gatherer;
6–5.5, Early Neolithic; 4–3, Mid Neolithic; 3.3–2.7, Late Copper Age (steppe);
2.5–2.2, Late Neolithic; 2.2–1.6, Early Bronze Age; 1.1, Late Bronze Age;
0.9, Iron Age. Sample labels (number of individuals): Ust Ishim (1),
Kostenki14 (1), MA1 (1), Karelia (1), Samara (1), Motala (7), Sweden MHG (1),
Sweden NHG (3), Loschbour (1), La Brana1 (1), Hungary HG (1), Starcevo (1),
LBKT (1), Hungary EN (8), LBK (12), Stuttgart (1), Els Trocs (5), Iceman (1),
La Mina (4), Baalberge (3), Esperstedt (1), Sweden MN (1), Yamnaya (9),
Hungary CA (1), Corded Ware (4), Karsdorf (1), Bell Beaker (6),
BenzigerodeHeimburg (3), Alberstedt (1), Unetice (8), Hungary BA (2),
Halberstadt (1), Hungary IA (1).
b, Number of autosomal SNPs covered at least once in the analysis data set of
94 individuals (maximum = 354,198; minimum = 22,869; mean = 212,375;
median = 231,945). Data sources: n = 69, this study (UDG treated); n = 4,
previous studies (UDG treated); n = 21, previous studies (not UDG treated).
‘Outgroup’ f3 statistics (Supplementary Information section 7), which
measure shared genetic drift between a pair of populations (Extended
Data Fig. 1), support the clustering of hunter-gatherers, Early/Middle
Neolithic, and Late Neolithic/Bronze Age populations into different
groups as in the PCA (Fig. 2a). We also analysed f4 statistics, which allow
us to test whether pairs of populations are consistent with descent from
common ancestral populations, and to assess significance using a
normally distributed Z score. Early European farmers from the Early and
Middle Neolithic were closely related but not identical. This is reflected
in the fact that Loschbour, a WHG individual from Luxembourg, shares
more alleles with post-4,000 BC European farmers from Germany, Spain,
Hungary, Sweden and Italy than with early farmers of Germany, Spain,
and Hungary, documenting an increase of hunter-gatherer ancestry in
multiple regions of Europe during the course of the Neolithic. The two
EHG form a clade with respect to all other present-day and ancient
populations (|Z| < 1.9), and MA1 shares more alleles with them (|Z| > 4.7)
than with other ancient or modern populations, suggesting that they
may be a source for the ANE ancestry in present Europeans, as they
are geographically and temporally more proximate than Upper Paleolithic
Siberians. The Yamnaya differ from the EHG by sharing fewer alleles
with MA1 (|Z| = 6.7), suggesting a dilution of ANE ancestry between
5,000–3,000 BC on the European steppe. This was likely due to admixture
of EHG with a population related to present-day Near Easterners, as the
most negative f3 statistic in the Yamnaya (giving unambiguous evidence
of admixture) is observed when we model them as a mixture of EHG
and present-day Near Eastern populations like Armenians (Z = −6.3;
Supplementary Information section 7). The Late Neolithic/Bronze Age
groups of central Europe share more alleles with Yamnaya than the
Middle Neolithic populations do (|Z| = 12.4) and more alleles with the
Middle Neolithic than the Yamnaya do (|Z| = 12.5), and have a negative
f3 statistic with the Middle Neolithic and Yamnaya as references
(Z = −20.7), indicating that they were descended from a mixture of
the local European populations and new migrants from the east. Moreover,
the Yamnaya share more alleles with the Corded Ware (|Z| ≥ 3.6)
than with any other Late Neolithic/Early Bronze Age group with at least
two individuals (Supplementary Information section 7), indicating that
they had more eastern ancestry, consistent with the PCA and ADMIXTURE
patterns (Fig. 2).
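To make the quantities in this paragraph concrete, the sketch below computes f3 and f4 statistics from per-SNP allele frequencies. It is a simplified illustration, not the analysis pipeline used in the study. It assumes the population allele frequencies are known exactly, so it omits the finite-sample correction normally applied to f3, and it does not compute the block-jackknife standard errors that underlie the reported Z scores. The function names and the toy frequencies are hypothetical.

```python
def f4(a, b, c, d):
    """f4(A, B; C, D): average over SNPs of (a - b) * (c - d).

    Each argument is a list of allele frequencies, one entry per SNP.
    A value near zero is expected when (A, B) and (C, D) form clades
    with respect to each other.
    """
    if not (len(a) == len(b) == len(c) == len(d)) or not a:
        raise ValueError("need equal-length, non-empty frequency lists")
    return sum((p - q) * (r - s) for p, q, r, s in zip(a, b, c, d)) / len(a)


def f3(target, x, y):
    """f3(Target; X, Y): average over SNPs of (t - x) * (t - y).

    With an outgroup as Target, larger values indicate more drift shared
    by X and Y ('outgroup' f3). With a candidate admixed population as
    Target, a negative value is evidence that Target is a mixture of
    populations related to X and Y.
    """
    if not (len(target) == len(x) == len(y)) or not target:
        raise ValueError("need equal-length, non-empty frequency lists")
    return sum((t - p) * (t - q) for t, p, q in zip(target, x, y)) / len(target)


# Hypothetical toy data: 'mixed' is an exact 50/50 blend of two sources,
# so its f3 with those sources as references is negative.
source1 = [0.1, 0.9, 0.2, 0.8]
source2 = [0.9, 0.1, 0.7, 0.3]
mixed = [0.5 * p + 0.5 * q for p, q in zip(source1, source2)]
admixture_f3 = f3(mixed, source1, source2)
```

In real data, drift in the admixed population after the mixture event pushes f3 upward, which is why a negative value is informative while a positive one does not rule out admixture.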
Modelling of the ancient samples shows that while Karelia is
genetically intermediate between Loschbour and MA1, the topology that
considers Karelia as a mixture of these two elements is not the only one
that can fit the data (Supplementary Information section 8). To avoid
biasing our inferences by fitting an incorrect model, we developed new
statistical methods that are substantial extensions of a previously reported
approach, which allow us to obtain precise estimates of the proportion
of mixture in later Europeans without requiring a formal model for the
relationship among the ancestral populations. The method (Supplementary
Information section 9) is based on the idea that if a Test population
has ancestry related to reference populations Ref_1, Ref_2, ..., Ref_N in
proportions α_1, α_2, ..., α_N, and the references are themselves
differentially related to a triple of outgroup populations A, B, C, then:

\[ f_4(\mathrm{Test}, A; B, C) = \sum_{i=1}^{N} \alpha_i \, f_4(\mathrm{Ref}_i, A; B, C) \]
Figure 2. Population transformations in Europe. a, PCA analysis (axes:
Dimension 1 and Dimension 2). Labels in the panel: Eastern European
hunter-gatherers (EHG), Scandinavian hunter-gatherers (SHG), Early Neolithic
(EN), Middle Neolithic (MN), Late Neolithic/Bronze Age (LN/BA), Western
European hunter-gatherers (WHG), Ancient North Eurasians (ANE) and Corded
Ware. Annotations: WHG replaced by early European farmers (>5,500 BC);
resurgence of WHG (~5,000–3,000 BC); dilution of EHG (~5,000–3,000 BC);
arrival of eastern migrants. b, ADMIXTURE analysis (K = 16). The full
ADMIXTURE analysis including present-day humans is
shown in Supplementary Information section 6.
By using a large number of outgroup populations we can fit the admixture
coefficients α_i and estimate mixture proportions (Supplementary
Information section 9, Extended Data Fig. 2). Using 15 outgroups
from Africa, Asia, Oceania and the Americas, we obtain good fits as
assessed by a formal test (Supplementary Information section 10), and
estimate that the Middle Neolithic populations of Germany and Spain
have ~18–34% more WHG-related ancestry than Early Neolithic
populations and that the Late Neolithic and Early Bronze Age populations
of Germany have ~22–39% more EHG-related ancestry than the
Middle Neolithic ones (Supplementary Information section 9). If we
model them as mixtures of Yamnaya-related and Middle Neolithic
populations, the inferred degree of population turnover is doubled to
48–80% (Supplementary Information sections 9 and 10).
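The fitting step can be illustrated for the simplest case of two sources. The sketch below assumes that statistics of the form f4(Test, A; B, C) and f4(Ref_i, A; B, C) have already been computed for many outgroup triples (A, B, C), and solves for the mixture proportion by ordinary least squares under the constraint that the two proportions sum to 1. The function name, the unweighted fit and the toy numbers are assumptions made for illustration; the procedure described in Supplementary Information section 9 may differ in its weighting and constraints.

```python
def fit_two_way(test_f4, ref1_f4, ref2_f4):
    """Least-squares estimate of alpha in
        test = alpha * ref1 + (1 - alpha) * ref2
    where each list holds one f4 value per outgroup triple.

    Rearranged as (test - ref2) = alpha * (ref1 - ref2), the solution is
    alpha = sum((test - ref2) * (ref1 - ref2)) / sum((ref1 - ref2) ** 2).
    """
    num = 0.0
    den = 0.0
    for y, x1, x2 in zip(test_f4, ref1_f4, ref2_f4):
        diff = x1 - x2
        num += (y - x2) * diff
        den += diff * diff
    if den == 0.0:
        # The references must be differentially related to the outgroups,
        # otherwise the proportion is not identifiable.
        raise ValueError("references are indistinguishable on these outgroups")
    alpha = num / den
    return alpha, 1.0 - alpha


# Hypothetical f4 values for four outgroup triples; the test population is
# constructed as 75% reference 1 and 25% reference 2.
ref1_vals = [0.010, -0.004, 0.007, 0.002]
ref2_vals = [-0.003, 0.006, 0.001, -0.005]
test_vals = [0.75 * a + 0.25 * b for a, b in zip(ref1_vals, ref2_vals)]
alpha, beta = fit_two_way(test_vals, ref1_vals, ref2_vals)
```

The identifiability check mirrors the requirement in the text that the references be differentially related to the outgroups: if both references give the same f4 values on every triple, any proportion fits equally well.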
To distinguish whether a Yamnaya or an EHG source fits the data
better, we added ancient samples as outgroups (Supplementary Infor-
mation section 9). Adding any Early or Middle Neolithic farmer results
in EHG-related genetic input into Late Neolithic populations being a
poor fit to the data (Supplementary Information section 9); thus, Late
Neolithic populations have ancestry that cannot be explained by a mixture
of EHG and Middle Neolithic. When using Yamnaya instead of
EHG, however, we obtain a good fit (Supplementary Information sec-
tions 9 and 10). These results can be explained if the new genetic material
that arrived in Germany was a composite of two elements: EHG and a
type of Near Eastern ancestry different from that which was introduced
by early farmers (also suggested by PCA and ADMIXTURE; Fig. 2, Sup-
plementary Information sections 5 and 6). We estimate that these two
elements each contributed about half the ancestry of the Yamnaya
(Supplementary Information sections 6 and 9), explaining why the
population turnover inferred using Yamnaya as a source is about twice
as high compared to the undiluted EHG. The estimate of Yamnaya-
related ancestry in the Corded Ware is consistent when using either
present populations or ancient Europeans as outgroups (Supplementary
Information sections 9 and 10), and is 73.1 ± 2.2% when both sets
are combined (Supplementary Information section 10). The best
proxies for ANE ancestry in Europe were initially Native Americans
and then the Siberian MA1 (ref. 6), but both are geographically and
temporally too remote for what appears to be a recent migration into
Europe. We can now add three new pieces to the puzzle of how ANE
ancestry was transmitted to Europe: first by the EHG, then the Yamnaya
formed by mixture between EHG and a Near Eastern related
population, and then the Corded Ware who were formed by a mixture of the
Yamnaya with Middle Neolithic Europeans. We caution that the sampled
Yamnaya individuals from Samara might not be directly ancestral to
Corded Ware individuals from Germany. It is possible that a more
western Yamnaya population, or an earlier (pre-Yamnaya) steppe
population may have migrated into central Europe, and future work may
uncover more missing links in the chain of transmission of steppe ancestry.
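The factor of two between the EHG-based and Yamnaya-based turnover estimates can be checked with simple arithmetic. As a rough sketch, assume the Yamnaya are exactly half EHG-related and half Near Eastern-related. Write p_EHG for the EHG-related fraction inferred in a later population and p_Y for the Yamnaya-related fraction needed to deliver that much EHG-related ancestry; both symbols are introduced here for illustration only.

```latex
\[
p_{\mathrm{EHG}} = \tfrac{1}{2}\, p_{Y}
\quad\Longrightarrow\quad
p_{Y} = 2\, p_{\mathrm{EHG}},
\qquad
2 \times 22\% = 44\%, \quad 2 \times 39\% = 78\%.
\]
```

These back-of-envelope values are close to the 48–80% range reported for the Yamnaya-based model. Exact agreement is not expected, since the half-and-half split is itself only an approximate estimate.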
By extending our model to a three-way mixture of WHG, Early Neolithic
and Yamnaya, we estimate that the ancestry of the Corded Ware was
79% Yamnaya-like, 4% WHG, and 17% Early Neolithic (Fig. 3). A small
contribution of the first farmers is also consistent with uniparentally
inherited DNA: for example, mitochondrial DNA haplogroup N1a and
Y chromosome haplogroup G2a, common in early central European
farmers, almost disappear during the Late Neolithic and Bronze
Age, when they are largely replaced by Y haplogroups R1a and R1b
(Supplementary Information section 4) and mtDNA haplogroups I, T1, U2, U4,
U5a, W, and subtypes of H (Supplementary Information section 2).
The uniparental data not only confirm a link to the steppe populations
but also suggest that both sexes participated in the migrations
(Supplementary Information sections 2 and 4 and Extended Data Table 2).
The magnitude of the population turnover that occurred becomes even
more evident if one considers the fact that the steppe migrants may well
have mixed with eastern European agriculturalists on their way to
central Europe. Thus, we cannot exclude a scenario in which the Corded
Ware arriving in today’s Germany had no ancestry at all from local
populations.
Our results support a view of European pre-history punctuated by
two major migrations: first, the arrival of the first farmers during the
Early Neolithic from the Near East, and second, the arrival of Yamnaya
pastoralists during the Late Neolithic from the steppe. Our data further
show that both migrations were followed by resurgences of the previous
inhabitants: first, during the Middle Neolithic, when hunter-gatherer
ancestry rose again after its Early Neolithic decline, and then between
the Late Neolithic and the present, when farmer and hunter-gatherer
ancestry rose after its Late Neolithic decline. This second resurgence
must have started during the Late Neolithic/Bronze Age period itself,
as the Bell Beaker and Unetice groups had reduced Yamnaya ancestry
compared to the earlier Corded Ware, and comparable levels to that in
some present-day Europeans (Fig. 3). Today, Yamnaya-related
ancestry is lower in southern Europe and higher in northern Europe, and all
European populations can be modelled as a three-way mixture of WHG,
Early Neolithic, and Yamnaya, whereas some outlier populations show
evidence for additional admixture with populations from Siberia and
the Near East (Extended Data Fig. 3, Supplementary Information
section 9). Further data are needed to determine whether the steppe
ancestry arrived in southern Europe at the time of the Late Neolithic/Bronze
Age, or is due to migrations in later times from northern Europe.
Figure 3. Admixture proportions. We estimate mixture proportions
using a method that gives unbiased estimates even without an accurate
model for the relationships between the test populations and the outgroup
populations (Supplementary Information section 9). Population samples
are grouped according to chronology (ancient) and Yamnaya ancestry
(present-day humans). Legend entries recoverable from the figure: Early
Neolithic (LBK_EN) and Western European hunter-gatherer (Loschbour); the
proportion axis runs from 0 to 1.0.

Our results provide new data relevant to debates on the origin and
expansion of Indo-European languages in Europe (Supplementary
Information section 11). Although the findings from ancient DNA are silent
on the question of the languages spoken by preliterate populations,
they do carry evidence about processes of migration which are invoked
by theories on Indo-European language dispersals. Such theories make
predictions about movements of people to account for the spread of
languages and material culture (Extended Data Fig. 4). The technology
of ancient DNA makes it possible to reject or confirm the proposed
migratory movements, as well as to identify new movements that
were not previously known. The best argument for the ‘Anatolian
hypothesis’ that Indo-European languages arrived in Europe from
Anatolia ~8,500 years ago is that major language replacements are
thought to require major migrations, and that after the Early Neolithic
when farmers established themselves in Europe, the population base
was likely to have been so large that later migrations would not have
made much of an impact. However, our study shows that a later
major turnover did occur, and that steppe migrants replaced ~75% of
the ancestry of central Europeans. An alternative theory is the ‘steppe
hypothesis’, which proposes that early Indo-European speakers were
pastoralists of the grasslands north of the Black and Caspian Seas, and
that their languages spread into Europe after the invention of wheeled
vehicles. Our results make a compelling case for the steppe as a source
of at least some of the Indo-European languages in Europe by
documenting a massive migration ~4,500 years ago associated with the
Yamnaya and Corded Ware cultures, which are identified by proponents
of the steppe hypothesis as vectors for the spread of Indo-European
languages into Europe. These results challenge the Anatolian hypothesis
by showing that not all Indo-European languages in Europe can plaus-
ibly derive from the first farmer migrations thousands of years earlier
(Supplementary Information section 11). We caution that the location
of the proto-Indo-European
homeland that also gave rise to the
Indo-European languages of Asia, as well as the Indo-European lan-
guages of southeastern Europe, cannot be determined from the data
reported here (Supplementary Information section 11). Studying the
mixture in the Yamnaya themselves, and understanding the genetic
relationships among a broader set of ancient and present-day Indo-
European speakers, may lead to new insight about the shared homeland.
