
Massive migration from the steppe is a source for Indo-European languages in Europe


LETTER doi:10.1038/nature14317
Massive migration from the steppe was a source for
Indo-European languages in Europe
Wolfgang Haak*, Iosif Lazaridis*, Nick Patterson, Nadin Rohland, Swapan Mallick, Bastien Llamas, Guido Brandt, Susanne Nordenfelt, Eadaoin Harney, Kristin Stewardson, Qiaomei Fu, Alissa Mittnik, Eszter Bánffy, Christos Economou, Michael Francken, Susanne Friederich, Rafael Garrido Pena, Fredrik Hallgren, Valery Khartanovich, Aleksandr Khokhlov, Michael Kunst, Pavel Kuznetsov, Harald Meller, Oleg Mochalov, Vayacheslav Moiseyev, Nicole Nicklisch, Sandra L. Pichler, Roberto Risch, Manuel A. Rojo Guerra, Christina Roth, Anna Szécsényi-Nagy, Joachim Wahl, Matthias Meyer, Johannes Krause, Dorcas Brown, David Anthony, Alan Cooper, Kurt Werner Alt & David Reich
We generated genome-wide data from 69 Europeans who lived between 8,000 and 3,000 years ago by enriching ancient DNA libraries for a target set of almost 400,000 polymorphisms. Enrichment of these positions decreases the sequencing required for genome-wide ancient DNA analysis by a median of around 250-fold, allowing us to study an order of magnitude more individuals than previous studies and to obtain new insights about the past. We show that the populations of Western and Far Eastern Europe followed opposite trajectories between 8,000 and 5,000 years ago. At the beginning of the Neolithic period in Europe, ~8,000–7,000 years ago, closely related groups of early farmers appeared in Germany, Hungary and Spain, different from indigenous hunter-gatherers, whereas Russia was inhabited by a distinctive population of hunter-gatherers with high affinity to a ~24,000-year-old Siberian (ref. 6). By ~6,000–5,000 years ago, farmers throughout much of Europe had more hunter-gatherer ancestry than their predecessors, but in Russia, the Yamnaya steppe herders of this time were descended not only from the preceding eastern European hunter-gatherers, but also from a population of Near Eastern ancestry. Western and Eastern Europe came into contact ~4,500 years ago, as the Late Neolithic Corded Ware people from Germany traced ~75% of their ancestry to the Yamnaya, documenting a massive migration into the heartland of Europe from its eastern periphery. This steppe ancestry persisted in all sampled central Europeans until at least ~3,000 years ago, and is ubiquitous in present-day Europeans. These results provide support for a steppe origin of at least some of the Indo-European languages of Europe.
Genome-wide analysis of ancient DNA has emerged as a transformative technology for studying prehistory, providing information that is comparable in power to archaeology and linguistics. Realizing its promise, however, requires collecting genome-wide data from an adequate number of individuals to characterize population changes over time, which means sampling not only a succession of archaeological cultures but also multiple individuals per culture. To make analysis of large numbers of ancient DNA samples practical, we used in-solution hybridization capture to enrich next-generation sequencing libraries for a target set of 394,577 single nucleotide polymorphisms (SNPs) ('390k capture'), 354,212 of which are autosomal SNPs that have also been genotyped using the Affymetrix Human Origins array in 2,345 humans from 203 populations. This reduces the amount of sequencing required to obtain genome-wide data by a minimum of 45-fold and a median of 262-fold (Supplementary Data 1). This strategy allows us to report genome-scale data on more than twice as many ancient Eurasians as have been presented in the entire preceding literature (Extended Data Table 1).
We used this technology to study population transformations in Europe. We began by preparing 212 DNA libraries from 119 ancient samples in dedicated clean rooms, and testing these by light shotgun sequencing and mitochondrial genome capture (Supplementary Information section 1, Supplementary Data 1). We restricted the analysis to libraries with molecular signatures of authentic ancient DNA (elevated damage in the terminal nucleotide), negligible evidence of contamination based on mismatches to the mitochondrial consensus and, where available, a mitochondrial DNA haplogroup that matched previously published results (Supplementary Information section 2). For 123 libraries prepared in the presence of uracil-DNA-glycosylase to reduce errors due to ancient DNA damage, we performed 390k capture, carried out paired-end sequencing and mapped the data to the human genome. We restricted analysis to 94 libraries from 69 samples that had at least 0.06-fold average target coverage (average of 3.8-fold) and used majority rule to call an allele at each SNP covered at least once (Supplementary Data 1). After combining our data (Supplementary Information section 3) with 25 ancient samples from the literature (three Upper Paleolithic samples from Russia, seven people of European hunter-gatherer ancestry and fifteen European farmers), we had data from 94 ancient Europeans. Geographically, these came from Germany (n = 41), Spain (n = 10), Russia (n = 14), Sweden (n = 12), Hungary (n = 15), Italy (n = 1) and Luxembourg (n = 1) (Extended Data Table 2). Following the central European chronology, these included 19 hunter-gatherers (~43,000–2,600 BC), 28 Early Neolithic farmers (~6,000–4,000 BC), 11 Middle Neolithic farmers (~4,000–3,000 BC) including
*These authors contributed equally to this work.
Australian Centre for Ancient DNA, School of Earth and Environmental Sciences & Environment Institute, University of Adelaide, Adelaide, South Australia 5005, Australia.
Department of Genetics, Harvard Medical School, Boston, Massachusetts 02115, USA.
Broad Institute of Harvard and MIT, Cambridge, Massachusetts 02142, USA.
Howard Hughes Medical Institute, Harvard Medical School, Boston, Massachusetts 02115, USA.
Institute of Anthropology, Johannes Gutenberg University of Mainz, D-55128 Mainz, Germany.
Max Planck Institute for Evolutionary Anthropology, D-04103 Leipzig, Germany.
Key Laboratory of Vertebrate Evolution and Human Origins of Chinese Academy of Sciences, IVPP, CAS, Beijing 100049, China.
Institute for Archaeological Sciences, University of Tübingen, D-72070 Tübingen, Germany.
Institute of Archaeology, Research Centre for the Humanities, Hungarian Academy of Science, H-1014 Budapest, Hungary.
Römisch-Germanische Kommission (RGK) Frankfurt, D-60325 Frankfurt, Germany.
Archaeological Research Laboratory, Stockholm University, 114 18 Stockholm, Sweden.
Departments of Paleoanthropology and Archaeogenetics, Senckenberg Center for Human Evolution and Paleoenvironment, University of Tübingen, D-72070 Tübingen, Germany.
State Office for Heritage Management and Archaeology Saxony-Anhalt and State Museum of Prehistory, D-06114 Halle, Germany.
Departamento de Prehistoria y Arqueología, Facultad de Filosofía y Letras, Universidad Autónoma de Madrid, E-28049 Madrid, Spain.
The Cultural Heritage Foundation, Västerås 722 12, Sweden.
Peter the Great Museum of Anthropology and Ethnography (Kunstkamera) RAS, St Petersburg 199034, Russia.
Volga State Academy of Social Sciences and Humanities, Samara 443099, Russia.
Deutsches Archaeologisches Institut, Abteilung Madrid, E-28002 Madrid, Spain.
Danube Private University, A-3500 Krems, Austria.
Institute for Prehistory and Archaeological Science, University of Basel, CH-4003 Basel, Switzerland.
Departamento de Prehistòria, Universitat Autònoma de Barcelona, E-08193 Barcelona, Spain.
Departamento de Prehistoria y Arqueología, Universidad de Valladolid, E-47002 Valladolid, Spain.
State Office for Cultural Heritage Management Baden-Württemberg, Osteology, D-78467 Konstanz, Germany.
Max Planck Institute for the Science of Human History, D-07745 Jena, Germany.
Anthropology Department, Hartwick College, Oneonta, New York 13820, USA.
00 MONTH 2015 | VOL 000 | NATURE | 1
Macmillan Publishers Limited. All rights reserved
the Tyrolean Iceman, 9 Late Copper/Early Bronze Age individuals (Yamnaya: ~3,300–2,700 BC), 15 Late Neolithic individuals (~2,500–2,200 BC), 9 Early Bronze Age individuals (~2,200–1,500 BC), two Late Bronze Age individuals (~1,200–1,100 BC) and one Iron Age individual (~900 BC). Two individuals were excluded from analyses as they were related to others from the same population. The average number of SNPs covered at least once was 212,375 and the minimum was 22,869 (Fig. 1).
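The coverage filter and majority-rule allele calling described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names are hypothetical, read-level filtering (mapping and base quality, duplicate removal) is omitted, and the random tie-break is an assumption because the text does not say how ties between equally frequent alleles were resolved.

```python
import random
from collections import Counter
from typing import List, Optional, Sequence

# Minimum average coverage of targeted SNPs for a library to be analysed
# (0.06-fold, as stated in the text).
MIN_TARGET_COVERAGE = 0.06


def library_passes(mean_target_coverage: float,
                   threshold: float = MIN_TARGET_COVERAGE) -> bool:
    """Return True if a library reaches the coverage threshold."""
    return mean_target_coverage >= threshold


def majority_call(read_alleles: Sequence[str],
                  rng: random.Random) -> Optional[str]:
    """Call a single allele at one SNP from the bases seen in overlapping reads.

    Returns None when no read covers the SNP. The most frequent allele is
    called; ties are broken at random (assumption: the text does not say
    how ties were handled).
    """
    if not read_alleles:
        return None
    counts = Counter(read_alleles)
    best = max(counts.values())
    tied = sorted(allele for allele, n in counts.items() if n == best)
    return tied[0] if len(tied) == 1 else rng.choice(tied)


def call_sample(reads_per_snp: Sequence[Sequence[str]],
                seed: int = 0) -> List[Optional[str]]:
    """Apply majority_call to every targeted SNP of one sample."""
    rng = random.Random(seed)
    return [majority_call(reads, rng) for reads in reads_per_snp]


# Hypothetical reads at four targeted SNPs of a library that passed the filter.
calls = call_sample([["A", "A", "G"], ["C"], [], ["T", "C"]])
print(calls)  # "A", "C", None, then "C" or "T" (random tie-break)
```

Because only one allele is called per SNP, each individual is represented by a single haploid call at every covered position, which is why even low-coverage samples can enter the analysis.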
We determined that 34 of the 69 newly analysed individuals were male and used 2,258 Y chromosome SNP targets included in the capture to obtain high-resolution Y chromosome haplogroup calls (Supplementary Information section 4). Outside Russia, and before the Late Neolithic period, only a single R1b individual was found (Early Neolithic Spain) in the combined literature (n = 70). By contrast, haplogroups R1a and R1b were found in 60% of Late Neolithic/Bronze Age Europeans outside Russia (n = 10), and in 100% of the samples from European Russia from all periods (7,500–2,700 BC; n = 9). R1a and R1b are the most common haplogroups in many European populations today, and our results suggest that they spread into Europe from the East after 3,000 BC. Two hunter-gatherers from Russia included in our study belonged to R1a (Karelia) and R1b (Samara), the earliest documented ancient samples of either haplogroup discovered to date. These two hunter-gatherers did not belong to the derived lineages M417 within R1a and M269 within R1b that are predominant in Europeans today, but all 7 Yamnaya males did belong to the M269 subclade of haplogroup R1b.
Principal components analysis (PCA) of all ancient individuals along with 777 present-day West Eurasians (Fig. 2a, Supplementary Information section 5) replicates the positioning of present-day Europeans between the Near East and European hunter-gatherers, and the clustering of early farmers from across Europe with present-day Sardinians, suggesting that farming expansions across the Mediterranean to Spain and via the Danubian route to Hungary and Germany descended from a common stock. By adding samples from later periods and additional locations, we also observe several new patterns. All samples from Russia have affinity to the ~24,000-year-old MA1 (ref. 6), the type specimen for the Ancient North Eurasians (ANE) who contributed to both Europeans and Native Americans. The two hunter-gatherers from Russia (Karelia in the northwest of the country and Samara on the steppe near the Urals) form an 'eastern European hunter-gatherer' (EHG) cluster at one end of a hunter-gatherer cline across Europe; people of hunter-gatherer ancestry from Luxembourg, Spain, and Hungary sit at the opposite 'western European hunter-gatherer' (WHG) end, while the hunter-gatherers from Sweden (SHG) are intermediate. Against this background of differentiated European hunter-gatherers and homogeneous early farmers, multiple population turnovers transpired in all parts of Europe included in our study. Middle Neolithic Europeans from Germany, Spain, Hungary, and Sweden from the period ~4,000–3,000 BC are intermediate between the earlier farmers and the WHG, suggesting an increase of WHG ancestry throughout much of Europe. By contrast, in Russia, the later Yamnaya steppe herders of ~3,000 BC plot between the EHG and the present-day Near East/Caucasus, suggesting a decrease of EHG ancestry during the same time period. The Late Neolithic and Bronze Age samples from Germany and Hungary are distinct from the preceding Middle Neolithic and plot between them and the Yamnaya. This pattern is also
seen in ADMIXTURE analysis (Fig. 2b, Supplementary Information
section 6), which implies that the Yamnaya have ancestry from popu-
lations related to the Caucasus and South Asia that is largely absent in
38 Early or Middle Neolithic farmers but present in all 25 Late Neo-
lithic or Bronze Age individuals. This ancestry appears in Central
Europe for the first time in our series with the Corded Ware around
2,500 BC (Supplementary Information section 6, Fig. 2b). The Corded
Ware shared elements of material culture with steppe groups such as the Yamnaya, although whether this reflects movements of people has been contentious. Our genetic data provide direct evidence of migration and suggest that it was relatively sudden. The Corded Ware are genetically closest to the Yamnaya ~2,600 km away, as inferred both from PCA and ADMIXTURE (Fig. 2) and FST (0.011 ± 0.002) (Extended Data Table 3). If continuous gene flow from the east, rather than migration, had occurred, we would expect successive cultures in Europe to become increasingly differentiated from the Middle Neolithic, but instead, the Corded Ware are both the earliest and most strongly differentiated from the Middle Neolithic population.

Figure 1
Location and SNP coverage of samples included in this study. a, Geographic location and time-scale (central European chronology) of the 69 newly analysed ancient individuals from this study (black outline) and 25 from the literature for which shotgun sequencing data were available (no outline). b, Number of autosomal SNPs covered at least once in the analysis data set of 94 individuals (69 from this study, UDG-treated; 4 from previous studies, UDG-treated; 21 from previous studies, not UDG-treated): minimum 22,869, maximum 354,198, mean 212,375, median 231,945.
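The text does not describe how ancient individuals, with their very uneven SNP coverage, were placed on the PCA. The sketch below assumes one common approach: principal axes are computed from fully genotyped present-day samples only, and each ancient individual is then projected by least squares over just the SNPs it covers (the same idea as the `lsqproject` option of EIGENSOFT's smartpca, but not that implementation). All names and the synthetic data are hypothetical.

```python
import numpy as np


def fit_pca(ref_geno: np.ndarray, k: int = 2):
    """Principal axes from fully genotyped reference samples.

    ref_geno: (samples x SNPs) array of 0/1/2 allele counts.
    Each SNP is centred and divided by sqrt(p(1 - p)), a common scaling
    for genotype PCA. Returns (loadings, mean, scale), with loadings an
    (SNPs x k) matrix whose columns are orthonormal.
    """
    mean = ref_geno.mean(axis=0)
    p = mean / 2.0
    scale = np.sqrt(p * (1.0 - p))
    scale[scale == 0] = 1.0  # monomorphic SNPs are all zero after centring anyway
    x = (ref_geno - mean) / scale
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    return vt[:k].T, mean, scale


def project(sample: np.ndarray, loadings, mean, scale) -> np.ndarray:
    """Coordinates of one sample on existing axes, using only covered SNPs.

    Missing genotypes are NaN. Least squares over the observed SNPs avoids
    pulling low-coverage samples toward the origin, as would happen if
    missing entries were simply set to zero.
    """
    obs = ~np.isnan(sample)
    x = (sample[obs] - mean[obs]) / scale[obs]
    coords, *_ = np.linalg.lstsq(loadings[obs], x, rcond=None)
    return coords


# Synthetic example: 50 reference samples, 500 SNPs, and one sparse sample.
rng = np.random.default_rng(0)
ref = rng.integers(0, 3, size=(50, 500)).astype(float)
loadings, mean, scale = fit_pca(ref, k=2)
ancient = rng.choice([0.0, 2.0], size=500)  # assumption: haploid calls coded as 0 or 2
ancient[rng.random(500) < 0.7] = np.nan     # ~70% of SNPs uncovered
print(project(ancient, loadings, mean, scale))
```

For a sample with no missing data the projection reduces to the ordinary PC score, so fully genotyped and sparse individuals share one coordinate system.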
'Outgroup' f3 statistics (Supplementary Information section 7), which measure shared genetic drift between a pair of populations (Extended Data Fig. 1), support the clustering of hunter-gatherers, Early/Middle Neolithic, and Late Neolithic/Bronze Age populations into different groups as in the PCA (Fig. 2a). We also analysed f4 statistics, which allow us to test whether pairs of populations are consistent with descent from common ancestral populations, and to assess significance using a normally distributed Z score. Early European farmers from the Early and Middle Neolithic were closely related but not identical. This is reflected in the fact that Loschbour, a WHG individual from Luxembourg, shares more alleles with post-4,000 BC European farmers from Germany, Spain, Hungary, Sweden and Italy than with early farmers of Germany, Spain, and Hungary, documenting an increase of hunter-gatherer ancestry in multiple regions of Europe during the course of the Neolithic. The two EHG form a clade with respect to all other present-day and ancient populations (|Z| < 1.9), and MA1 shares more alleles with them (|Z| > 4.7) than with other ancient or modern populations, suggesting that they may be a source for the ANE ancestry in present-day Europeans, as they are geographically and temporally more proximate than Upper Paleolithic Siberians. The Yamnaya differ from the EHG by sharing fewer alleles with MA1 (|Z| = 6.7), suggesting a dilution of ANE ancestry between 5,000 and 3,000 BC on the European steppe. This was likely due to admixture of EHG with a population related to present-day Near Easterners, as the most negative f3 statistic in the Yamnaya (giving unambiguous evidence of admixture) is observed when we model them as a mixture of EHG and present-day Near Eastern populations like Armenians (Z = −6.3; Supplementary Information section 7). The Late Neolithic/Bronze Age groups of central Europe share more alleles with Yamnaya than the Middle Neolithic populations do (|Z| = 12.4) and more alleles with the Middle Neolithic than the Yamnaya do (|Z| = 12.5), and have a negative f3 statistic with the Middle Neolithic and Yamnaya as references (Z = −20.7), indicating that they were descended from a mixture of the local European populations and new migrants from the east. Moreover, the Yamnaya share more alleles with the Corded Ware (|Z| ≥ 3.6) than with any other Late Neolithic/Early Bronze Age group with at least two individuals (Supplementary Information section 7), indicating that the Corded Ware had more eastern ancestry, consistent with the PCA and ADMIXTURE patterns (Fig. 2).
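The f3 and f4 statistics used in this paragraph can be sketched from population allele frequencies. This is a simplified illustration, assuming frequencies are NumPy arrays over the same SNPs in genomic order; it omits the finite-sample-size correction normally applied to f3, and it estimates the standard error with a delete-one-block jackknife over equal-sized blocks of SNPs rather than blocks of fixed genetic length. An outgroup f3 statistic is f3(O; A, B) with O a distant outgroup (larger values mean more drift shared by A and B); an admixture f3 statistic f3(C; A, B) that is significantly negative indicates that C is a mixture of populations related to A and B; an f4(W, X; Y, Z) statistic consistent with zero indicates that the pairs (W, X) and (Y, Z) are consistent with forming clades.

```python
import numpy as np


def f3_per_snp(c, a, b):
    """f3(C; A, B) at each SNP: (c - a)(c - b)."""
    return (c - a) * (c - b)


def f4_per_snp(w, x, y, z):
    """f4(W, X; Y, Z) at each SNP: (w - x)(y - z)."""
    return (w - x) * (y - z)


def jackknife_z(per_snp: np.ndarray, n_blocks: int = 100):
    """Genome-wide mean of a per-SNP statistic and its Z score.

    The standard error comes from a delete-one-block jackknife over
    contiguous, (nearly) equal-sized blocks of SNPs, so SNPs must be in
    genomic order. Returns (estimate, standard_error, z).
    """
    n = per_snp.size
    estimate = per_snp.mean()
    blocks = np.array_split(per_snp, n_blocks)
    # Statistic recomputed with each block left out in turn.
    loo = np.array([(per_snp.sum() - blk.sum()) / (n - blk.size) for blk in blocks])
    g = len(blocks)
    se = np.sqrt((g - 1) / g * np.sum((loo - loo.mean()) ** 2))
    return estimate, se, estimate / se


# Synthetic example: C is a 50/50 mixture of A and B, so the admixture
# f3(C; A, B) is negative at every SNP.
rng = np.random.default_rng(1)
a = rng.uniform(0.05, 0.95, 20_000)
b = rng.uniform(0.05, 0.95, 20_000)
c = 0.5 * a + 0.5 * b
est, se, z = jackknife_z(f3_per_snp(c, a, b))
print(f"f3(C; A, B) = {est:.4f}, Z = {z:.1f}")
```

The block jackknife is used because neighbouring SNPs are correlated through linkage, so treating SNPs as independent would understate the standard error.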
Modelling of the ancient samples shows that while Karelia is genetically intermediate between Loschbour and MA1, the topology that considers Karelia as a mixture of these two elements is not the only one that can fit the data (Supplementary Information section 8). To avoid biasing our inferences by fitting an incorrect model, we developed new statistical methods that are substantial extensions of a previously reported method, which allow us to obtain precise estimates of the proportion of mixture in later Europeans without requiring a formal model for the relationship among the ancestral populations. The method (Supplementary Information section 9) is based on the idea that if a Test population has ancestry related to reference populations Ref1, ..., RefN in proportions α1, ..., αN (summing to 1), and the references are themselves differentially related to a triple of outgroup populations A, B, C, then:

$$
f_4(\mathrm{Test}, A; B, C) = \sum_{i=1}^{N} \alpha_i \, f_4(\mathrm{Ref}_i, A; B, C), \qquad \sum_{i=1}^{N} \alpha_i = 1
$$
Figure 2
Population transformations in Europe. a, PCA analysis (Dimension 1 versus Dimension 2) of eastern European hunter-gatherers (EHG), Scandinavian hunter-gatherers (SHG), western European hunter-gatherers (WHG), Ancient North Eurasians (ANE), Early Neolithic (EN), Middle Neolithic (MN) and Late Neolithic/Bronze Age (LN/BA) individuals and the Corded Ware, annotated with 'WHG replaced by early European farmers' (>5,500 BC), 'Resurgence of WHG' (~5,000–3,000 BC), 'Dilution of EHG' (~5,000–3,000 BC) and 'Arrival of eastern migrants'. b, ADMIXTURE analysis (K = 16). The full ADMIXTURE analysis including present-day humans is shown in Supplementary Information section 6.
By using a large number of outgroup populations we can fit the admixture coefficients αi and estimate mixture proportions (Supplementary Information section 9, Extended Data Fig. 2). Using 15 outgroups from Africa, Asia, Oceania and the Americas, we obtain good fits as assessed by a formal test (Supplementary Information section 10), and estimate that the Middle Neolithic populations of Germany and Spain have ~18–34% more WHG-related ancestry than Early Neolithic populations and that the Late Neolithic and Early Bronze Age populations of Germany have ~22–39% more EHG-related ancestry than the Middle Neolithic ones (Supplementary Information section 9). If we model them as mixtures of Yamnaya-related and Middle Neolithic populations, the inferred degree of population turnover is doubled to 48–80% (Supplementary Information sections 9 and 10).
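A minimal sketch of how the f4 relation for Test and Ref1, ..., RefN can be turned into estimates of the mixture proportions. It is an illustration under simplifying assumptions, not the method of Supplementary Information section 9: statistics come from noise-free allele frequencies, every outgroup triple is weighted equally in an ordinary least-squares fit (real data would also call for accounting for the correlation among statistics), and no goodness-of-fit test such as the one in Supplementary Information section 10 is performed. The function names are hypothetical; the sum-to-one constraint is imposed by eliminating the last coefficient.

```python
import itertools
import numpy as np


def f4(w, x, y, z) -> float:
    """Genome-wide f4(W, X; Y, Z) from allele-frequency arrays."""
    return float(np.mean((w - x) * (y - z)))


def estimate_mixture(test, refs, outgroups) -> np.ndarray:
    """Least-squares mixture proportions alpha_1..alpha_N (summing to 1).

    Fits f4(Test, A; B, C) = sum_i alpha_i * f4(Ref_i, A; B, C), with A the
    first outgroup and (B, C) every pair of the remaining outgroups.
    """
    a, rest = outgroups[0], outgroups[1:]
    pairs = list(itertools.combinations(rest, 2))
    y = np.array([f4(test, a, b, c) for b, c in pairs])
    x = np.array([[f4(ref, a, b, c) for ref in refs] for b, c in pairs])
    # Impose sum(alpha) = 1 by writing alpha_N = 1 - sum(alpha_1..alpha_{N-1}):
    # y - x_N = sum_{i<N} alpha_i * (x_i - x_N)
    design = x[:, :-1] - x[:, [-1]]
    target = y - x[:, -1]
    head, *_ = np.linalg.lstsq(design, target, rcond=None)
    return np.append(head, 1.0 - head.sum())


# Synthetic check: Test is built as exactly 70% Ref1 + 30% Ref2.
rng = np.random.default_rng(2)
n_snps = 10_000
ref1, ref2 = rng.uniform(0.05, 0.95, (2, n_snps))
outgroups = list(rng.uniform(0.05, 0.95, (5, n_snps)))
test = 0.7 * ref1 + 0.3 * ref2
print(estimate_mixture(test, [ref1, ref2], outgroups))
```

Because f4 is linear in its first argument when the proportions sum to one, the synthetic mixture is recovered exactly; a three-way model (for example WHG, Early Neolithic and Yamnaya) would simply pass three reference arrays.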
To distinguish whether a Yamnaya or an EHG source fits the data better, we added ancient samples as outgroups (Supplementary Information section 9). Adding any Early or Middle Neolithic farmer results in EHG-related genetic input into Late Neolithic populations being a poor fit to the data (Supplementary Information section 9); thus, Late Neolithic populations have ancestry that cannot be explained by a mixture of EHG and Middle Neolithic. When using Yamnaya instead of EHG, however, we obtain a good fit (Supplementary Information sections 9 and 10). These results can be explained if the new genetic material that arrived in Germany was a composite of two elements: EHG and a type of Near Eastern ancestry different from that which was introduced by early farmers (also suggested by PCA and ADMIXTURE; Fig. 2, Supplementary Information sections 5 and 6). We estimate that these two elements each contributed about half the ancestry of the Yamnaya (Supplementary Information sections 6 and 9), explaining why the population turnover inferred using Yamnaya as a source is about twice as high as that inferred using undiluted EHG. The estimate of Yamnaya-related ancestry in the Corded Ware is consistent when using either present-day populations or ancient Europeans as outgroups (Supplementary Information sections 9 and 10), and is 73.1 ± 2.2% when both sets are combined (Supplementary Information section 10). The best proxies for ANE ancestry in Europe were initially Native Americans and then the Siberian MA1 (ref. 6), but both are geographically and temporally too remote for what appears to be a recent migration into Europe. We can now add three new pieces to the puzzle of how ANE ancestry was transmitted to Europe: first by the EHG, then by the Yamnaya, formed by mixture between EHG and a Near Eastern-related population, and then by the Corded Ware, who were formed by a mixture of the Yamnaya with Middle Neolithic Europeans. We caution that the sampled Yamnaya individuals from Samara might not be directly ancestral to Corded Ware individuals from Germany. It is possible that a more western Yamnaya population, or an earlier (pre-Yamnaya) steppe population, may have migrated into central Europe, and future work may uncover more missing links in the chain of transmission of steppe ancestry.
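The factor of two follows from a simple dilution argument. As a rough worked example, assume the Yamnaya are exactly one-half EHG-related (the text gives "about half"). A population whose Yamnaya-related fraction is $y$ then receives an EHG-related fraction $e \approx y/2$ through that source, so

$$
y \approx 2e, \qquad 2 \times (22\text{–}39\%) = 44\text{–}78\%,
$$

which is close to the 48–80% obtained when the Yamnaya are used directly as the source. The agreement is only approximate because the two ranges come from separate model fits.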
By extending our model to a three-way mixture of WHG, Early Neolithic
and Yamnaya, we estimate that the ancestry of the Corded Ware was
79% Yamnaya-like, 4% WHG, and 17% Early Neolithic (Fig. 3). A small contribution of the first farmers is also consistent with uniparentally inherited DNA: for example, mitochondrial DNA haplogroup N1a and Y chromosome haplogroup G2a, common in early central European farmers, almost disappear during the Late Neolithic and Bronze Age, when they are largely replaced by Y haplogroups R1a and R1b (Supplementary Information section 4) and mtDNA haplogroups I, T1, U2, U4, U5a, W, and subtypes of H (Supplementary Information section 2).
The uniparental data not only confirm a link to the steppe populations
but also suggest that both sexes participated in the migrations (Sup-
plementary Information sections 2 and 4 and Extended Data Table 2).
The magnitude of the population turnover that occurred becomes even more evident if one considers that the steppe migrants may well have mixed with eastern European agriculturalists on their way to central Europe. Thus, we cannot exclude a scenario in which the Corded Ware arriving in today's Germany had no ancestry at all from local […]

Our results support a view of European prehistory punctuated by two major migrations: first, the arrival of the first farmers during the Early Neolithic from the Near East, and second, the arrival of Yamnaya pastoralists during the Late Neolithic from the steppe. Our data further show that both migrations were followed by resurgences of the previous inhabitants: first, during the Middle Neolithic, when hunter-gatherer ancestry rose again after its Early Neolithic decline, and then between the Late Neolithic and the present, when farmer and hunter-gatherer ancestry rose after its Late Neolithic decline. This second resurgence must have started during the Late Neolithic/Bronze Age period itself, as the Bell Beaker and Unetice groups had reduced Yamnaya ancestry compared to the earlier Corded Ware, and levels comparable to those in
some present-day Europeans (Fig. 3). Today, Yamnaya-related ancestry is lower in southern Europe and higher in northern Europe, and all European populations can be modelled as a three-way mixture of WHG, Early Neolithic, and Yamnaya, whereas some outlier populations show evidence for additional admixture with populations from Siberia and the Near East (Extended Data Fig. 3, Supplementary Information section 9). Further data are needed to determine whether the steppe ancestry arrived in southern Europe at the time of the Late Neolithic/Bronze Age, or is due to migrations in later times from northern Europe.

Figure 3
Admixture proportions. We estimate mixture proportions using a method that gives unbiased estimates even without an accurate model for the relationships between the test populations and the outgroup populations (Supplementary Information section 9). Population samples are grouped according to chronology (ancient) and Yamnaya ancestry (present-day humans). Components shown include Early Neolithic (LBK_EN) and Western European hunter-gatherer (Loschbour), on a 0–1 scale.

Our results provide new data relevant to debates on the origin and expansion of Indo-European languages in Europe (Supplementary Information section 11). Although the findings from ancient DNA are silent on the question of the languages spoken by preliterate populations, they do carry evidence about processes of migration which are invoked by theories on Indo-European language dispersals. Such theories make predictions about movements of people to account for the spread of languages and material culture (Extended Data Fig. 4). The technology
of ancient DNA makes it possible to reject or confirm the proposed
migratory movements, as well as to identify new movements that
were not previously known. The best argument for the 'Anatolian hypothesis' that Indo-European languages arrived in Europe from Anatolia ~8,500 years ago is that major language replacements are thought to require major migrations, and that after the Early Neolithic, when farmers established themselves in Europe, the population base was likely to have been so large that later migrations would not have made much of an impact. However, our study shows that a later major turnover did occur, and that steppe migrants replaced ~75% of the ancestry of central Europeans. An alternative theory is the 'steppe hypothesis', which proposes that early Indo-European speakers were pastoralists of the grasslands north of the Black and Caspian Seas, and that their languages spread into Europe after the invention of wheeled vehicles. Our results make a compelling case for the steppe as a source of at least some of the Indo-European languages in Europe by documenting a massive migration ~4,500 years ago associated with the Yamnaya and Corded Ware cultures, which are identified by proponents of the steppe hypothesis as vectors for the spread of Indo-European languages into Europe. These results challenge the Anatolian hypothesis by showing that not all Indo-European languages in Europe can plausibly derive from the first farmer migrations thousands of years earlier (Supplementary Information section 11). We caution that the location of the proto-Indo-European homeland that also gave rise to the
Indo-European languages of Asia, as well as the Indo-European lan-
guages of southeastern Europe, cannot be determined from the data
reported here (Supplementary Information section 11). Studying the
mixture in the Yamnaya themselves, and understanding the genetic
relationships among a broader set of ancient and present-day Indo-
European speakers, may lead to new insight about the shared homeland.
Online Content Methods, along with any additional Extended Data display items and Source Data, are available in the online version of the paper; references unique to these sections appear only in the online paper.
Received 29 December 2014; accepted 12 February 2015.
Published online 2 March 2015.
1. Fu, Q. et al. Genome sequence of a 45,000-year-old modern human from western
Siberia. Nature 514, 445–449 (2014).
2. Gamba, C. et al. Genome flux and stasisin a five millennium transect of European
prehistory. Nature Commun. 5, 5257 (2014).
3. Keller, A. et al. New insights into the Tyrolean Iceman’s origin and phenotype as
inferred by whole-genome sequencing. Nature Commun. 3, 698 (2012).
4. Lazaridis, I. et al. Ancient humangenomes suggest three ancestral populations for
present-day Europeans. Nature 513, 409–413 (2014).
5. Olalde,I. et al. Derived immune and ancestralpigmentation allelesin a 7,000-year-
old Mesolithic European. Nature 507, 225–228 (2014).
6. Raghavan, M. et al. Upper Palaeolithic Siberian genome reveals dual ancestry of
Native Americans. Nature 505, 87–91 (2014).
7. Seguin-Orlando, A. et al. Genomic structure in Europeans dating back at least
36,200 years. Science 346, 1113–1118 (2014) .
8. Skoglund, P. et al. Genomic diversity and admixture differs for Stone-Age
Scandinavian foragers and farmers. Science 344, 747–750 (2014).
9. Anthony,D. W. The Horse, the Wheel, and Language:How Bronze-Age Ridersfrom the
Eurasian Steppes Shaped the Modern World (Princeton Univ. Press, 2007).
10. Fu, Q. et al. DNA analysis of an early modern human from Tianyuan Cave, China.
Proc. Natl Acad. Sci. USA 110, 2223–2227 (2013).
11. Rohland, N., Harney, E., Mallick, S., Nordenfelt, S. & Reich, D. Partial uracil–DNA–
glycosylasetreatment for screeningof ancient DNA. Phil.Trans. R. Soc. Lond. B 370,
20130624 (2015).
12. Patterson,N. et al. Ancient admixture in human history.Genetics 192, 1065–1093
13. Fu, Q. et al. A revised timescale for human evolution based on ancient
mitochondrial genomes. Curr. Biol. 23, 553–559 (2013).
14. Brandt, G. et al. Ancient DNA reveals key stages in the formation of central
European mitochondrial genetic diversity. Science 342, 257–261 (2013).
15. Der Sarkissian, C. et al. Ancient DNA reveals prehistoric gene-flow from Siberia in
the complex human population history of North East Europe. PLoS Genet. 9,
e1003296 (2013).
16. Briggs, A. W. et al. Removal of deaminated cytosines and detection of in vivo
methylation in ancient DNA. Nucleic Acids Res. 38, e87 (2010).
17. Briggs, A. W. et al. Patterns of damage in genomic DNA sequences from a
Neandertal. Proc. Natl Acad. Sci. USA 104, 14616–14621 (2007).
18. Myres, N. M. et al. A major Y-chromosome haplogroup R1b Holocene era
founder effect in Central and Western Europe. Eur. J. Hum. Genet. 19, 95–101
19. Underhill, P. A. et al. The phylogenetic and geographic structure of Y-chromosome
haplogroup R1a. Eur. J. Hum. Genet. 23, 124–131 (2015).
20. Skoglund, P. et al. Origins and genetic legacy of Neolithic farmers and hunter-
gatherers in Europe. Science 336, 466–469 (2012).
21. Czebreszuk, J. in Ancient Europe, 8000 B.C. to A.D. 1000: Encyclopedia of the
Barbarian World (eds Bogucki, P. I. & Crabtree, P. J.) 467–475 (Charles Scribners &
Sons, 2003).
22. Lipson, M. et al. Efficient moment-based inference of admixture parameters and
sources of gene flow. Mol. Biol. Evol. 30, 1788–1802 (2013).
23. Szécsényi-Nagy, A. et al. Tracing the genetic origin of Europe's first farmers reveals insights into their social organization. Preprint at bioRxiv 10.1101/008664 (2014).
24. Haak, W. et al. Ancient DNA from European early Neolithic farmers reveals their
Near Eastern affinities. PLoS Biol. 8, e1000536 (2010).
25. Hellenthal, G. et al. A genetic atlas of human admixture history. Science 343,
747–751 (2014).
26. Ralph, P. & Coop, G. The geography of recent genetic ancestry across Europe. PLoS
Biol. 11, e1001555 (2013).
27. Renfrew, C. Archaeology and Language: The Puzzle of Indo-European Origins
(Pimlico, 1987).
28. Bellwood, P. First Farmers: The Origins of Agricultural Societies (Wiley-Blackwell,
29. Gamkrelidze, T. V. & Ivanov, V. V. The early history of Indo-European languages. Sci.
Am. 262, 110–116 (1990).
30. Mallory, J. P. In Search of the Indo-Europeans: Language, Archaeology and Myth
(Thames and Hudson, 1991).
Supplementary Information is available in the online version of the paper.
Acknowledgements We thank P. Bellwood, J. Burger, P. Heggarty, M. Lipson,
C. Renfrew, J. Diamond, S. Pääbo, R. Pinhasi and P. Skoglund for critical comments, and
the Initiative for the Science of the Human Past at Harvard for organizing a workshop
around the issues touched on by this paper. We thank S. Pääbo for support for
establishing the ancient DNA facilities in Boston, and P. Skoglund for detecting the
presence of two related individuals in our data set. We thank L. Orlando,
T. S. Korneliussen, and C. Gamba for help in obtaining data. We thank Agilent
Technologies and G. Frommer for help in developing the capture reagents. We thank
C. Der Sarkissian, G. Valverde, L. Papac and B. Nickel for wet laboratory support. We
thank archaeologists V. Dresely, R. Ganslmeier, O. Balanovsky, J. Ignacio Royo Guillén,
A. Osztás, V. Majerik, T. Paluch, K. Somogyi and V. Voicsek for sharing samples and
discussion about archaeological context. This research was supported by an Australian
Research Council grant to W.H. and B.L. (DP130102158), and German Research
Foundation grants to K.W.A. (Al 287/7-1 and 7-3, Al 287/10-1 and Al 287/14-1) and to
H.M. (Me 3245/1-1 and 1-3). D.R. was supported by US National Science Foundation
HOMINID grant BCS-1032255, US National Institutes of Health grant GM100233, and
the Howard Hughes Medical Institute.
Author Contributions W.H., N.P., N.R., J.K., K.W.A. and D.R. supervised the study. W.H.,
E.B., C.E., M.F., S.F., R.G.P., F.H., V.K., A.K., M.K., P.K., H.M., O.M., V.M., N.N., S.L.P., R.R.,
M.A.R.G., C.R., A.S.-N., J.W., J.K., D.B., D.A., A.C., K.W.A. and D.R. assembled archaeological
material. W.H., I.L., N.P., N.R., S.M., A.M. and D.R. analysed genetic data. I.L., N.P. and D.R.
developed methods using f-statistics for inferring admixture proportions. W.H., N.R.,
B.L., G.B., S.N., E.H., K.S. and A.M. performed wet laboratory ancient DNA work. I.L., N.R.,
S.M., B.L., Q.F., M.M. and D.R. developed the 390k capture reagent. W.H., I.L. and D.R.
wrote the manuscript with help from all co-authors.
Author Information The aligned sequences are available through the European
Nucleotide Archive under accession number PRJEB8448. The Human Origins
genotype dataset including ancient individuals can be found at (http:// …). Reprints and
permissions information is available at …. The authors declare
no competing financial interests. Readers are welcome to comment on the online
version of the paper. Correspondence and requests for materials should be addressed
to D.R. (…).
00 MONTH 2015 | VOL 000 | NATURE | 5
Macmillan Publishers Limited. All rights reserved
Screening of libraries (shotgun sequencing and mitochondrial capture). The
212 libraries screened in this study (Supplementary Information section 1) from
a total of 119 samples (Supplementary Information section 3) were produced
at Adelaide (n = 151), Tübingen (n = 16) and Boston (n = 45) (Supplementary Data 1).
The libraries from Adelaide and Boston had internal barcodes directly attached to both sides of the molecules from the DNA extract, so that each sequence begins with the barcode. The Adelaide libraries had 5 base pair (bp) barcodes on both sides, while the Boston libraries had 7 bp barcodes. Libraries from Tübingen had no internal barcodes, but were differentiated by the sequence of the indexing primer.
We adapted a reported protocol for enriching for mitochondrial DNA, with the difference that we adjusted the blocking oligonucleotides and PCR primers to fit our libraries with shorter adapters. Over the course of this project, we also lowered the hybridization temperature from 65 °C to 60 °C and performed stringent washes at 55 °C instead of 60 °C.
We used an aliquot of approximately 500 ng of each library for target enrichment of the complete mitochondrial genome in two consecutive rounds, using a bait set for human mtDNA. We performed enrichment in 96-well plates with one library per well, and used a liquid handler (Evolution P3, Perkin Elmer) for the capture and washing steps. We used blocking oligonucleotides in hybridization appropriate to the adapters of the truncated libraries. After each of the two enrichment rounds, we amplified the enriched library molecules with the primer pair that keeps the adapters short (PreHyb) using Herculase Fusion II PCR Polymerase. We performed an indexing PCR of the final capture product using one or two indexing primers. We cleaned up all PCR reactions using SPRI technology and the liquid handler. Libraries from Tübingen were amplified with the primer pair IS5/IS6.
For libraries from Boston and Adelaide, we used a second aliquot of each library for shotgun sequencing after performing an indexing PCR. We used unique index combinations for each library and experiment, allowing us to distinguish shotgun sequencing and mitochondrial DNA capture data even if both experiments were in the same sequencing run. We sequenced shotgun libraries and mtDNA-captured libraries from Tübingen in independent sequencing runs, since the index was already attached at the library preparation stage.
We quantified the sequencing pool with the BioAnalyzer (Agilent) and/or the
KAPA Library Quantification kit (KAPA Biosystems) and sequenced on Illumina
MiSeq, HiSeq2500 or NextSeq500 instruments for 2 × 75, 2 × 100 or 2 × 150
cycles along with the indexing read(s).
Enrichment for 394,577 SNP targets (‘390k capture’). The protocol for enrichment for SNP targets was similar to the mitochondrial DNA capture, with the exception that we used another bait set (390k) and about twice as much library (up to 1,000 ng) compared to the mtDNA capture.
The specific capture reagent used in this study is described for the first time here. To target each SNP, we used a different oligonucleotide probe design compared to ref. 10. We used four 52 base pair probes for each SNP target. One probe ends just before the SNP, and one starts just after. Two probes are centred on the SNP, and are identical except for carrying the alternate alleles. This probe design avoids systematic bias towards one SNP allele or the other. For the template sequence for designing the San and Yoruba panel baits, we used the sequence that was submitted for these same SNPs during the design of the Affymetrix Human Origins SNP array. For SNPs that were in both the San and Yoruba panels, we used the Yoruba template sequence in preference. For all other SNPs, we used the human genome reference sequence as a template. Supplementary Data 2a–d gives the list of SNPs that we targeted, along with details of the probes used. The breakdown of SNPs into different classes is as follows.
124,106 ‘Yoruba SNPs’: all SNPs in ‘panel 5’ of the Affymetrix Human Origins array (discovered as heterozygous in a Yoruba male: HGDP00927) that passed the probe design criteria specified in ref. 11.
146,135 ‘San SNPs’: all SNPs in ‘panel 4’ of the Affymetrix Human Origins array (discovered as heterozygous in a San male: HGDP01029) that passed the probe design criteria. The full Affymetrix Human Origins array panel 4 contains several tens of thousands of additional SNPs overlapping those from panel 5, but we did not wish to redundantly capture panel 5 SNPs.
98,166 ‘compatibility SNPs’: SNPs that overlap between the Affymetrix Human Origins, the Affymetrix 6.0 and the Illumina 610-Quad arrays, which are not already included in the ‘Yoruba SNPs’ or ‘San SNPs’ lists and that also passed the probe design criteria.
26,170 ‘miscellaneous SNPs’: SNPs that did not overlap the Human Origins array. The subset analysed in this study comprised 2,258 Y chromosome SNPs (http:// …) that we used for Y haplogroup determination.
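The four-probe tiling per SNP can be illustrated with a short sketch. It is not the authors' design pipeline: `tile_probes` and `ProbeSet` are hypothetical names, and because a 52 bp probe has no single central base, we assume the SNP sits at 0-based offset 26 of the two centred probes (26 bases upstream, 25 downstream). The sketch also assumes the template has at least 52 bases on each side of the SNP.

```python
from typing import NamedTuple

PROBE_LEN = 52  # probe length given in the Methods


class ProbeSet(NamedTuple):
    left_flank: str   # ends immediately before the SNP
    right_flank: str  # starts immediately after the SNP
    centred_ref: str  # covers the SNP, carrying the reference allele
    centred_alt: str  # identical, but carrying the alternate allele


def tile_probes(template: str, snp_index: int, ref: str, alt: str,
                length: int = PROBE_LEN) -> ProbeSet:
    """Hypothetical helper: build the four probes for one SNP from a template."""
    if template[snp_index] != ref:
        raise ValueError("template base at snp_index does not match ref")
    if snp_index < length or len(template) - snp_index - 1 < length:
        raise ValueError("template needs >= `length` bases on each side of the SNP")
    # Assumption: an even-length probe places the SNP at offset length // 2.
    up = length // 2
    down = length - up - 1
    upstream = template[snp_index - up:snp_index]
    downstream = template[snp_index + 1:snp_index + 1 + down]
    return ProbeSet(
        left_flank=template[snp_index - length:snp_index],
        right_flank=template[snp_index + 1:snp_index + 1 + length],
        centred_ref=upstream + ref + downstream,
        centred_alt=upstream + alt + downstream,
    )


template = "A" * 60 + "C" + "G" * 60
probes = tile_probes(template, 60, ref="C", alt="T")
print([len(p) for p in probes])  # [52, 52, 52, 52]
```

Because the two centred probes differ only at the SNP, neither allele is favoured during hybridization, which is the stated purpose of the design.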
Processing of sequencing data. We restricted analysis to read pairs that passed
quality control according to the Illumina software (‘PF reads’).
We assigned read pairs to libraries by searching for matches to the expected
index and barcode sequences (if present, as for the Adelaide and Boston libraries).
We allowed no more than 1 mismatch per index or barcode, and zero mismatches
if there was ambiguity in sequence assignment or if barcodes of 5 bp length were
used (Adelaide libraries).
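The assignment rule above can be sketched as a small matcher. This is a minimal sketch, not the authors' script: `Library`, `assign_library` and the other names are hypothetical; we assume each internal barcode occupies the first bases of its read; and we interpret "ambiguity" as more than one library matching when one mismatch is allowed, in which case only an exact match is accepted.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Library:
    name: str
    index: str
    barcodes: Tuple[str, ...] = ()  # internal barcodes (read 1, read 2); empty if none


def mismatches(expected: str, observed: str) -> int:
    """Hamming distance; a length difference counts as a complete mismatch."""
    if len(expected) != len(observed):
        return max(len(expected), len(observed))
    return sum(e != o for e, o in zip(expected, observed))


def allowed_mismatches(lib: Library, strict: bool) -> int:
    # Zero mismatches when resolving ambiguity, or for 5 bp (Adelaide-style) barcodes.
    if strict or any(len(b) == 5 for b in lib.barcodes):
        return 0
    return 1


def matches(lib: Library, index: str, reads: Sequence[str], strict: bool) -> bool:
    limit = allowed_mismatches(lib, strict)
    if mismatches(lib.index, index) > limit:
        return False
    # Compare each expected barcode with the start of the corresponding read.
    return all(mismatches(bc, read[:len(bc)]) <= limit
               for bc, read in zip(lib.barcodes, reads))


def assign_library(index: str, reads: Sequence[str],
                   libraries: List[Library]) -> Optional[str]:
    candidates = [lib for lib in libraries if matches(lib, index, reads, strict=False)]
    if len(candidates) > 1:  # ambiguous: keep exact matches only
        candidates = [lib for lib in candidates if matches(lib, index, reads, strict=True)]
    return candidates[0].name if len(candidates) == 1 else None
```

A library without internal barcodes (as for Tübingen) is matched on its index alone, because `zip` over an empty barcode tuple checks nothing.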
We used Seqprep (…) to search for overlapping sequence between the forward and reverse read, and restricted to molecules where we could identify a minimum of 15 bp of overlap. We collapsed the two reads into a single sequence, using the consensus nucleotide if both reads agreed, and the read with higher base quality in the case of disagreement. For each merged nucleotide, we assigned the base quality to be the higher of the two reads. We further used Seqprep to search for the expected adaptor sequences at either end of the merged sequence, and to produce a trimmed sequence for alignment.
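The overlap-and-collapse step can be illustrated with a simplified merger. This is not Seqprep's algorithm: the function names are hypothetical, the 10% mismatch tolerance used to accept an overlap is our assumption, adapter trimming is omitted, and when two disagreeing bases have equal quality the forward read wins.

```python
from typing import List, Optional, Tuple

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def find_overlap(a: str, b: str, min_overlap: int = 15,
                 max_mismatch_frac: float = 0.1) -> Optional[int]:
    """Longest overlap in which a suffix of `a` aligns to a prefix of `b`."""
    for olen in range(min(len(a), len(b)), min_overlap - 1, -1):
        mism = sum(x != y for x, y in zip(a[len(a) - olen:], b[:olen]))
        if mism <= max_mismatch_frac * olen:
            return olen
    return None


def merge_pair(fwd: str, fwd_q: List[int], rev: str, rev_q: List[int],
               min_overlap: int = 15) -> Optional[Tuple[str, List[int]]]:
    """Collapse a read pair into one sequence, or return None if the overlap is too short."""
    b, b_q = revcomp(rev), rev_q[::-1]  # put read 2 on the same strand as read 1
    olen = find_overlap(fwd, b, min_overlap)
    if olen is None:
        return None
    start = len(fwd) - olen
    seq, qual = list(fwd[:start]), list(fwd_q[:start])
    for i in range(olen):
        x, qx = fwd[start + i], fwd_q[start + i]
        y, qy = b[i], b_q[i]
        # Agreeing bases are kept; on disagreement take the higher-quality base.
        seq.append(x if x == y or qx >= qy else y)
        qual.append(max(qx, qy))  # merged quality = higher of the two reads
    seq.extend(b[olen:])
    qual.extend(b_q[olen:])
    return "".join(seq), qual
```

Pairs with less than 15 bp of overlap return `None`, mirroring the restriction to molecules with a detectable overlap.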
We mapped all sequences using BWA-0.6.1 (ref. 35). For mitochondrial analysis we mapped to the mitochondrial genome RSRS. For whole-genome analysis we mapped to the human reference genome hg19. We restricted all analyses to sequences that had a mapping quality of MAPQ ≥ 37.
We sorted all mapped sequences by position, and used a custom script to search for mapped sequences that had the same orientation and start and stop positions. We stripped all but one of these sequences (keeping the best quality one) as duplicates.
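The mapping-quality filter and duplicate removal can be sketched together. This is a minimal sketch assuming alignments have already been parsed into a simple record; `Alignment` and `filter_and_dedup` are hypothetical names, and "best quality" is taken to mean the highest mean base quality, which the text does not define. Ties keep the first record seen.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Alignment:
    read_id: str
    chrom: str
    start: int      # leftmost aligned position
    end: int        # rightmost aligned position
    reverse: bool   # orientation on the reference
    mapq: int
    quals: Tuple[int, ...]

    @property
    def mean_quality(self) -> float:
        return sum(self.quals) / len(self.quals)


def filter_and_dedup(alignments: Iterable[Alignment], min_mapq: int = 37) -> List[Alignment]:
    best: Dict[Tuple[str, int, int, bool], Alignment] = {}
    for aln in alignments:
        if aln.mapq < min_mapq:  # keep only MAPQ >= 37
            continue
        # Duplicates share orientation and start and stop positions.
        key = (aln.chrom, aln.start, aln.end, aln.reverse)
        kept = best.get(key)
        if kept is None or aln.mean_quality > kept.mean_quality:
            best[key] = aln
    # Return the surviving alignments sorted by position.
    return sorted(best.values(), key=lambda a: (a.chrom, a.start, a.end, a.reverse))
```

Reads with the same coordinates but opposite orientation are kept as distinct molecules, as the text specifies.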
Mitochondrial sequence analysis and assessment of ancient DNA authenticity.
For each library for which we had average coverage of the mitochondrial genome
of at least tenfold after removal of duplicated molecules, we built a mitochondrial
consensus sequence, assigning haplogroups for each library as described in Sup-
plementary Information section 2.
We used contamMix-1.0.9 to search for evidence of contamination in the mitochondrial DNA. This software estimates the fraction of mitochondrial DNA sequences that match the consensus more closely than a comparison set of 311 worldwide mitochondrial genomes. This is done by taking the consensus sequence of reads aligning to the RSRS mitochondrial genome, requiring a minimum coverage of 5 after filtering out bases with quality < 30. Raw reads are then realigned to this consensus. In addition, the consensus is multiply aligned with the other 311 mitochondrial genomes using kalign (2.0.4) to build the necessary inputs for contamMix, trimming the first and last 5 bases of every read to mitigate the confounding effect of ancient damage. This software had difficulty running on data sets with higher coverage, and for these data sets we downsampled to 50,000 reads.
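The consensus-building and read-trimming inputs described for contamMix can be illustrated as follows. This sketch is not contamMix or the authors' pipeline: the helper names are hypothetical, the consensus is called by simple majority among bases with quality ≥ 30 (ties go to the first base seen), positions with fewer than 5 such bases become N, and the circular mitochondrial genome is treated as linear.

```python
from collections import Counter
from typing import Iterable, List, Optional, Sequence, Tuple

Read = Tuple[int, str, Sequence[int]]  # (0-based start on the reference, bases, base qualities)


def consensus(reads: Iterable[Read], ref_len: int,
              min_q: int = 30, min_cov: int = 5) -> str:
    counts: List[Counter] = [Counter() for _ in range(ref_len)]
    for start, seq, qual in reads:
        for offset, (base, q) in enumerate(zip(seq, qual)):
            pos = start + offset
            if q >= min_q and 0 <= pos < ref_len:  # drop bases with quality < 30
                counts[pos][base] += 1
    # Majority base where at least min_cov good bases remain, otherwise N.
    return "".join(
        c.most_common(1)[0][0] if sum(c.values()) >= min_cov else "N"
        for c in counts
    )


def trim_ends(read: Read, n: int = 5) -> Optional[Read]:
    """Remove n bases from each end of a read to limit the effect of damage."""
    start, seq, qual = read
    if len(seq) <= 2 * n:
        return None
    return start + n, seq[n:-n], qual[n:-n]
```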
For all sequences mapping to the mitochondrial DNA for which the consensus
mitochondrial DNA sequence had a cytosine at the terminal nucleotide, we mea-
sured the proportion of sequences with a thymine at that position. For population
genetic analysis, we only used partially UDG-treated libraries with a minimum of
3% C→T substitutions as recommended by ref. 33. In cases where we used a fully UDG-treated library for 390k analysis, we examined mitochondrial capture data from a non-UDG-treated library made from the same extract, and verified that the non-UDG library had a minimum of 10% C→T at the first nucleotide as recommended by ref. 38. Metrics for the mitochondrial DNA analysis on each library are given in Supplementary Data 1.
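The damage-based authenticity check can be sketched as below. Assumptions not stated in the source: reads are given in reference orientation; "terminal nucleotide" means the 5′ end of the molecule, so for reverse-strand reads the check becomes G→A at the rightmost aligned base; and the helper names and library-type labels are hypothetical. The 3% and 10% thresholds are the values given above.

```python
import math
from typing import Iterable, Tuple

# Minimum C->T rate at the first nucleotide, by library treatment (values from the text).
MIN_CT_RATE = {"partial_udg": 0.03, "non_udg": 0.10}

Aln = Tuple[int, str, bool]  # (0-based start, bases in reference orientation, is_reverse)


def terminal_ct_rate(alignments: Iterable[Aln], consensus: str) -> float:
    eligible = damaged = 0
    for start, seq, reverse in alignments:
        if reverse:  # the molecule's 5' end is the rightmost base; C->T shows as G->A
            pos, read_base, target, damage = start + len(seq) - 1, seq[-1], "G", "A"
        else:
            pos, read_base, target, damage = start, seq[0], "C", "T"
        if consensus[pos] != target:
            continue  # consensus is not C (or G on the reverse strand) here
        eligible += 1
        damaged += read_base == damage
    return damaged / eligible if eligible else math.nan


def passes_damage_filter(rate: float, library_type: str) -> bool:
    return rate >= MIN_CT_RATE[library_type]
```

For a fully UDG-treated library, the check would be applied to the non-UDG library from the same extract with the `"non_udg"` threshold.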
390k capture, sequence analysis and quality control. For 390k analysis, we
restricted to reads that not only mapped to the human reference genome hg19
but that also overlapped the 354,212 autosomal SNPs genotyped on the Human
Origins array. We trimmed the last two nucleotides from each sequence because we found that these are highly enriched in ancient DNA damage even for UDG-treated libraries. We further restricted analyses to sites with base quality ≥ 30.
We made no attempt to determine a diploid genotype at each SNP in each sample. Instead, we used a single allele, randomly drawn from the two alleles in the individual, to represent the individual at that site. Specifically, we made an allele call at each target SNP using majority rule over all sequences overlapping the SNP. When each of the possible alleles was supported by an equal number of sequences, we picked an allele at random. We set the allele to ‘no call’ for SNPs at which there was no read coverage.
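The pseudo-haploid calling rule can be sketched for a single SNP. This illustration is not the authors' code: `call_pseudohaploid` is a hypothetical name, reads are given as (start, bases, qualities) in reference coordinates, and we read "trimmed the last two nucleotides" as ignoring two bases at both ends of each sequence.

```python
import random
from collections import Counter
from typing import Iterable, Optional, Sequence, Tuple

Read = Tuple[int, str, Sequence[int]]  # (0-based start, bases, base qualities)


def call_pseudohaploid(reads: Iterable[Read], snp_pos: int, trim: int = 2,
                       min_bq: int = 30,
                       rng: Optional[random.Random] = None) -> Optional[str]:
    """Return one allele for the SNP by majority rule, or None for 'no call'."""
    counts: Counter = Counter()
    for start, seq, qual in reads:
        offset = snp_pos - start
        # Skip reads that miss the SNP or place it within `trim` bases of either end.
        if offset < trim or offset >= len(seq) - trim:
            continue
        if qual[offset] >= min_bq:  # base quality >= 30
            counts[seq[offset]] += 1
    if not counts:
        return None
    top = max(counts.values())
    tied = sorted(base for base, n in counts.items() if n == top)
    if rng is None:
        rng = random.Random()
    return rng.choice(tied)  # only random when alleles are tied
```

Passing a seeded `random.Random` makes tie-breaking reproducible across runs.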
We restricted population genetic analysis to libraries with a minimum of 0.06-fold average coverage on the 390k SNP targets, and for which there was an unambiguous sex determination based on the ratio of X to Y chromosome reads (Supplementary Information section 4 and Supplementary Data 1). For individuals with multiple libraries per sample, we performed a series of quality control analyses. First, we used the ADMIXTURE software in supervised mode, using Kharia, Onge, Karitiana, Han, French, Mbuti, Ulchi and Eskimo as reference populations. We visually inspected the inferred ancestry components in each individual, and removed individuals with evidence of heterogeneity in inferred ancestry components across libraries. For all possible pairs of libraries for each sample, we also computed statistics of the form D(Library1, Library2; Probe, Mbuti), where Probe is any of the same set of eight reference populations, to determine whether there was significant evidence of the Probe population being more closely related to one library from an ancient individual than to another library from that same individual. None of the individuals that we used had strong evidence of ancestry heterogeneity across libraries. For samples passing quality control for which there were multiple libraries per sample, we merged the sequences into a single BAM.
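A statistic of the form D(Library1, Library2; Probe, Mbuti) can be computed from pseudo-haploid calls coded 0/1 with the standard ratio D = Σ(w − x)(y − z) / Σ(w + x − 2wx)(y + z − 2yz). The sketch below is illustrative only: function names are hypothetical, `None` marks a missing call, and the Z-score uses an unweighted delete-one block jackknife over fixed-size blocks of consecutive SNPs, a simplification of the weighted jackknife over genomic blocks used by published tools.

```python
import math
from typing import Optional, Sequence, Tuple

Call = Optional[int]  # 0 or 1 for a pseudo-haploid allele call, None if missing


def site_terms(w: Call, x: Call, y: Call, z: Call) -> Tuple[float, float]:
    if None in (w, x, y, z):
        return 0.0, 0.0  # sites with any missing call contribute nothing
    num = (w - x) * (y - z)
    den = (w + x - 2 * w * x) * (y + z - 2 * y * z)
    return float(num), float(den)


def d_statistic(W: Sequence[Call], X: Sequence[Call],
                Y: Sequence[Call], Z: Sequence[Call]) -> float:
    terms = [site_terms(*s) for s in zip(W, X, Y, Z)]
    return sum(t[0] for t in terms) / sum(t[1] for t in terms)


def d_with_jackknife_z(W: Sequence[Call], X: Sequence[Call], Y: Sequence[Call],
                       Z: Sequence[Call], block_size: int) -> Tuple[float, float]:
    terms = [site_terms(*s) for s in zip(W, X, Y, Z)]
    blocks = [terms[i:i + block_size] for i in range(0, len(terms), block_size)]
    sums = [(sum(t[0] for t in b), sum(t[1] for t in b)) for b in blocks]
    num, den = sum(s[0] for s in sums), sum(s[1] for s in sums)
    d = num / den
    loo = [(num - bn) / (den - bd) for bn, bd in sums]  # D with one block left out
    g = len(loo)
    mean = sum(loo) / g
    se = math.sqrt((g - 1) / g * sum((v - mean) ** 2 for v in loo))
    return d, d / se
```

A common convention is to treat |Z| ≥ 3 as significant, but the Methods here do not state the cutoff used.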
We called alleles on each merged BAM using the same procedure as for the individual libraries. We used ADMIXTURE as well as PCA as implemented in smartpca (using the lsqproject: YES option to project the ancient samples) to visualize the genetic relationships of each set of samples with the same culture label with respect to 777 diverse present-day West Eurasians. We visually identified outlier individuals, and renamed them for analysis either as outliers or by the name of the site at which they were sampled (Extended Data Table 1). We also identified two pairs of related individuals based on the proportion of sites covered in pairs of ancient samples from the same population that had identical allele calls using PLINK. From each pair of related individuals, we kept the one with the most SNPs.
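The relatedness screen compares allele calls at sites covered in both samples. In this minimal sketch with hypothetical names, calls are single alleles per SNP (`None` for 'no call') and the statistic is the fraction of co-covered sites with identical calls. The text gives no cutoff for declaring a pair related, so the sketch only computes the statistic and chooses which member of a flagged pair to keep.

```python
from typing import Optional, Sequence

Calls = Sequence[Optional[str]]  # one pseudo-haploid allele per SNP, None = no call


def identity_rate(a: Calls, b: Calls) -> Optional[float]:
    """Fraction of sites covered in both samples where the calls agree."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return None
    return sum(x == y for x, y in shared) / len(shared)


def keep_from_related_pair(name_a: str, a: Calls, name_b: str, b: Calls) -> str:
    """Keep the individual with more covered SNPs (ties keep the first)."""
    covered_a = sum(c is not None for c in a)
    covered_b = sum(c is not None for c in b)
    return name_a if covered_a >= covered_b else name_b
```

Because each sample contributes one randomly drawn allele per site, even two samples from the same person would not agree everywhere; related pairs stand out by having higher identity than other pairs from the same population.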
Population genetic analyses. We determined genetic sex using the ratio of X and Y chromosome alignments (Supplementary Information section 4), and the Y chromosome haplogroup for the male samples (Supplementary Information section 4). We studied population structure (Supplementary Information sections 5 and 6). We used f-statistics to carry out formal tests of population relationships (Supplementary Information section 6) and built explicit models of population history consistent with the data (Supplementary Information section 7). We estimated mixture proportions in a way that was robust to uncertainty about the exact population history that applied (Supplementary Information section 8). We estimated the minimum number of streams of migration into Europe needed to explain the data (Supplementary Information sections 9 and 10). The estimated mixture proportions shown in Fig. 3 were obtained using the lsqlin function of Matlab and the optimization method described in Supplementary Information section 9 with 15 world outgroups.
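The constrained fit behind the mixture proportions can be sketched as: find weights x ≥ 0 with Σx = 1 that minimise the squared 2-norm of Ax − b (the 'resnorm' of Extended Data Figs 2 and 3), where each column of A holds statistics for one candidate source and b holds the same statistics for the target. The paper used Matlab's lsqlin; the sketch below swaps in projected gradient descent with a sort-based Euclidean projection onto the simplex, in pure Python. How A and b are built from f-statistics with the 15 outgroups (Supplementary Information section 9) is not reproduced, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def project_to_simplex(v: Sequence[float]) -> List[float]:
    """Euclidean projection onto {x : x >= 0, sum(x) = 1} (sort-based method)."""
    u = sorted(v, reverse=True)
    cumsum, theta = 0.0, 0.0
    for j, uj in enumerate(u, start=1):
        cumsum += uj
        t = (cumsum - 1.0) / j
        if uj - t > 0:
            theta = t
    return [max(vi - theta, 0.0) for vi in v]


def residuals(A: Matrix, x: Sequence[float], b: Sequence[float]) -> List[float]:
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) - b_i for row, b_i in zip(A, b)]


def fit_mixture(A: Matrix, b: Sequence[float], iters: int = 5000) -> Tuple[List[float], float]:
    k = len(A[0])
    x = [1.0 / k] * k  # start from equal mixture proportions
    # Step 1 / (2 ||A||_F^2) is at most 1/L for the gradient of ||Ax - b||^2.
    step = 1.0 / (2.0 * sum(a * a for row in A for a in row))
    for _ in range(iters):
        r = residuals(A, x, b)
        grad = [2.0 * sum(row[j] * r_i for row, r_i in zip(A, r)) for j in range(k)]
        x = project_to_simplex([x_j - step * g_j for x_j, g_j in zip(x, grad)])
    resnorm = sum(r_i * r_i for r_i in residuals(A, x, b))
    return x, resnorm
```

Fitting the same target with one, two or three candidate sources and comparing the returned `resnorm` values mirrors, in simplified form, the model comparisons summarized in Extended Data Figs 2 and 3.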
Sample size. No statistical methods were used to predetermine sample size.
31. Kircher, M., Sawyer, S. & Meyer, M. Double indexing overcomes inaccuracies in
multiplex sequencing on the Illumina platform. Nucleic Acids Res. 40, e3 (2012).
32. Meyer, M. et al. A mitochondrial genome sequence of a hominin from Sima de los
Huesos. Nature 505, 403–406 (2014).
33. Rohland, N., Harney, E., Mallick, S., Nordenfelt, S. & Reich, D. Partial uracil–DNA–
glycosylase treatment for screening of ancient DNA. Phil. Trans. R. Soc. Lond. B 370,
20130624 (2015).
34. Rohland, N. & Reich, D. Cost-effective, high-throughput DNA sequencing libraries
for multiplexed target capture. Genome Res. 22, 939–946 (2012).
35. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler
transform. Bioinformatics 25, 1754–1760 (2009).
36. Behar, D. M. et al. A “Copernican” reassessment of the human mitochondrial DNA
tree from its root. Am. J. Hum. Genet. 90, 675–684 (2012).
37. Lassmann, T. & Sonnhammer, E. L. L. Kalign—an accurate and fast multiple
sequence alignment algorithm. BMC Bioinformatics 6, 298 (2005).
38. Sawyer, S., Krause, J., Guschanski, K., Savolainen, V. & Pääbo, S. Temporal patterns
of nucleotide misincorporations and DNA fragmentation in ancient DNA. PLoS
ONE 7, e34131 (2012).
39. Green, R. E. et al. A draft sequence of the Neandertal genome. Science 328,
710–722 (2010).
40. Alexander, D. H. & Lange, K. Enhancements to the ADMIXTURE algorithm for
individual ancestry estimation. BMC Bioinformatics 12, 246 (2011).
41. Alexander, D. H., Novembre, J. & Lange, K. Fast model-based estimation of ancestry
in unrelated individuals. Genome Res. 19, 1655–1664 (2009).
42. Reich, D., Price, A. L. & Patterson, N. Principal component analysis of genetic data.
Nature Genet. 40, 491–492 (2008).
43. Purcell, S. et al. PLINK: a tool set for whole-genome association and population-
based linkage analyses. Am. J. Hum. Genet. 81, 559–575 (2007).
44. Skoglund, P., Storå, J., Götherström, A. & Jakobsson, M. Accurate sex identification
of ancient human remains using DNA shotgun sequencing. J. Archaeol. Sci. 40,
4477–4482 (2013).
Extended Data Figure 1
… measuring the degree of shared drift among pairs of ancient individuals.
Extended Data Figure 2
Modelling Corded Ware as a mixture of N = 1, 2, or 3 ancestral populations. a, The left column shows a histogram of raw f-statistic residuals and the right column shows Z-scores for the best-fitting (lowest squared 2-norm of the residuals, or resnorm) model at each N. b, The data on the left show resnorm and on the right show the maximum score change for different N. c, resnorm of different N = 2 models. The set of outgroups used in this analysis, in the terminology of Supplementary Information section 9, is ‘World Foci 15 + Ancients’.
Extended Data Figure 3
Modelling Europeans as mixtures of an increasing number of ancestral populations: N = 1 (EN), N = 2 (EN, WHG), N = 3 (EN, WHG, Yamnaya), N = 4 (EN, WHG, Yamnaya, Nganasan), N = 5 (EN, WHG, Yamnaya, Nganasan, BedouinB). The residual norm of the fitted model (Supplementary Information section 9) and its changes are indicated.
Extended Data Figure 4
Geographic distribution of archaeological cultures and graphic illustration of proposed population movements/turnovers discussed in the main text. a, Proposed routes of migration by early farmers into Europe ~9,000–7,000 years ago. b, Resurgence of hunter-gatherer ancestry during the Middle Neolithic 7,000–5,000 years ago. c, Arrival of steppe ancestry in central Europe during the Late Neolithic ~4,500 years ago. White arrows indicate the two possible scenarios of the arrival of Indo-European language groups. Symbols of samples are identical to those in Fig. 1.
Extended Data Table 1
Number of ancient Eurasian modern human samples screened in genome-wide studies to date
Only studies that produced at least one sample at ≥0.05× coverage are listed.
Extended Data Table 2
Summary of the archaeological context for the 69 newly reported samples
Samples with direct radiocarbon dates are indicated by a calibrated date “cal BC” along with associated laboratory numbers. Dates that are estimated based on faunal elements associated with the samples are not
indicated with ‘cal’ (although they are still calibrated, absolute dates).
Extended Data Table 3
Pairwise FST for all ancient groups with ≥2 individuals, present-day Europeans with ≥10 individuals, and selected other groups: FST (below the diagonal), standard deviation (above the diagonal).
Macmillan Publishers Limited. All rights reserved
... Advances in DNA sequencing have led to rapidly increasing numbers of ancient genomes available for demographic inference. Understanding the relationships among such temporally distributed genomes is not only of fundamental interest (Sjödin, Skoglund, and Jakobsson, 2014;Malmström et al., 2009;Lazaridis et al., 2014;Slatkin, 2016;Schraiber, 2017), but can also help reveal historical demographic patterns that would be impossible to detect using modern genomes alone (Green et al., 2010;Lazaridis et al., 2016;Raghavan et al., 2014;Haak et al., 2015;Rasmussen et al., 2015). In the field of human evolution in particular, ancient genomes have been key to discriminating between models of population continuity, admixture and replacement that have accompanied the emergence and spread of technological innovations, cultures and languages around the world (Lazaridis et al., 2016;Skoglund et al., 2012;Raghavan et al., 2014;Haak et al., 2015;Rasmussen et al., 2015;Olalde et al., 2018). ...
... Understanding the relationships among such temporally distributed genomes is not only of fundamental interest (Sjödin, Skoglund, and Jakobsson, 2014;Malmström et al., 2009;Lazaridis et al., 2014;Slatkin, 2016;Schraiber, 2017), but can also help reveal historical demographic patterns that would be impossible to detect using modern genomes alone (Green et al., 2010;Lazaridis et al., 2016;Raghavan et al., 2014;Haak et al., 2015;Rasmussen et al., 2015). In the field of human evolution in particular, ancient genomes have been key to discriminating between models of population continuity, admixture and replacement that have accompanied the emergence and spread of technological innovations, cultures and languages around the world (Lazaridis et al., 2016;Skoglund et al., 2012;Raghavan et al., 2014;Haak et al., 2015;Rasmussen et al., 2015;Olalde et al., 2018). ...
... Various forms of 2, 3 and 4 populations or individuals tests, f-statistics, represents a powerful framework widely applied in population genetics studies (e.g., Green et al., 2010;Reich et al., 2010;Patterson et al., 2012;Haak et al., 2015;Olalde et al., 2018). Built upon the the expected covariance in allele frequency differences between populations, f 2 , f 3 and f 4 -statistics have been used to provide evidence for admixture events in the deep ancestry of modern humans (Green et al., 2010;Reich et al., 2010;Patterson et al., 2012). ...
Full-text available
Ancient DNA (aDNA) can prove a valuable resource when investigating the evolutionary relationships between ancient and modern populations. Performing demographic inference using datasets that include aDNA samples however, requires statistical methods that explicitly account for the differences in drift expected among a temporally distributed sample. Such drift due to temporal structure can be challenging to discriminate from admixture from an unsampled, or "ghost", population, which can give rise to very similar summary statistics and confound methods commonly used in population genetics. Sequence data from ancient individuals also have unique characteristics, including short fragments, increased sequencing-error rates, and often limited genome-coverage that poses further challenges. Here we present a novel and conceptually simple approach for assessing questions of population continuity among a temporally distributed sample. We note that conditional on heterozygote sites in an individual genome at a particular point in time, the mean proportion of derived variants at those sites in other individuals has different expectations forwards in time and backwards in time. The difference in these processes enables us to construct a statistic that can detect population continuity in a temporal sample of genomes. We can show that the statistic is sensitive to historical admixture events from unsampled populations. Simulations are used to evaluate the power of this approach. We investigate a set of ancient genomes from Early Neolithic Scandinavia to assess levels of population continuity to an earlier Mesolithic individual. Individuals from hunter-gathering Neolithic Pitted Ware culture show marked continuity with the Mesolithic individual, whereas the contemporary Neolithic individuals from the and farming Funnel Beaker culture display much less continuity.
... Its geographical origin(s), emergence mechanisms, diffusion, and integration throughout Europe and northern Africa during the 3rd millennium BCE have been strongly debated and are still actively discussed today (Besse, 2015;Bosch-Gimpera, 1926;Delibes de Castro & Guerra Doce, 2019;Gallay, 2014;Harrison, 1974;Lanting, Mook, & van der Waals, 1973;Lemercier, 2020;Salanova, 2004;Sangmeister, 1963;Siret, 1913, among others). Much of the latest research on this phenomenon has been focused mainly on broad aDNA studies on a continental, or otherwise vast, scale (Allentoft et al., 2015;Brotherton et al., 2013;Haak et al., 2015;Marcus et al., 2019;Olalde et al., 2018Olalde et al., , 2019. ...
Full-text available
The spread of the Bell Beaker phenomenon across Europe is still strongly debated today. Small-scale technological studies investigating its integration in local contexts remain rare, even though these are crucial to observing disruptions in traditions. In this article, we studied the ceramic technology of Final Neolithic, Bell Beaker period, and Early Bronze Age settlements of the Upper Rhône valley in Switzerland (3300–1600 BCE). We reconstructed and compared their pottery traditions to those from the contemporaneous megalithic necropolis of Sion ‘Petit-Chasseur’, a major funerary and ritual site located in the centre of the valley. Our findings showed that the Bell Beaker period saw an abundance of simultaneous technical changes, mirroring disruptions identified by other fields, and confirmed that this cultural phenomenon did not blend seamlessly with the local context. More importantly, they revealed the role played by human mobility, with the arrival of potters shortly after 2500 BCE.
... In North-Western Russia, the first signs of agricultural development were observed between 2,7 and 2 kyrs BC (Fowler, Harding and Hofmann, 2015). Some studies have suggested that farmers from the Near East would have immigrate to Europe (Fowler, Harding and Hofmann, 2015) and substantially replaced the local huntergatherer population, except on the Western and Northern margin of the continent, where Mesolithic societies persisted longer (Haak et al., 2015). DNA analysis have revealed an unbroken chain of ancestry from Central and SouthWestern Europe to Greece and NorthWestern Anatolia, suggesting that these migrations would have been rather limited (Hofmanová et al., 2016). ...
The major cultural and techno-economic changes that occurred in Europe between 7,000 and 4,000 BC, including the development of agriculture, had major repercussions on the animals that lived close to humans. The dog, the only animal that has been domesticated for thousands of years is probably a good marker of the evolution of human societies at that time. Although many data inform us about its status and genetic diversity, very few studies have documented its morphological variability and the resulting possible functional adaptations in relation toanthropogenic constraints. Furthermore, to date no studies have explored the variability in ancient red foxes although they are likely to develop the same adaptations as dogs (but to a lesser extent due to their commensal nature). In this thesis, an innovative morpho-functional approach is used to describe the evolution of mandible (the best preserved bone in archaeological series and an important functional element of the masticatory apparatus) from the Mesolithic to the very early Bronze Age in Western Europe and Southern Romania. Photogrammetry and geometric morphometrics are used to quantify the shape of the bones in3D. In a first step, shape drivers and form-function relationships within the masticatory apparatus are explored in a sample of modern dogs and foxes. The masticatory muscles of approximately 120 dogs of various breeds and foxes were dissected. A biomechanical model for estimating bite force using muscle data is established and validated by in vivo measurements. Strong interrelationships between the cranium, mandible, masticatory muscles and bite force are demonstrated for both species, highlighting the strong integration despite the extreme artificial selections in modern dogs. A predictive model of bite force using theshape of mandibular fragments is therefore developed to interpret the variations in shape in the archaeological sample. 
The impacts of developmental and environmental factors (climate, urbanism, diet) on the form or function are quantified by studying 433 Australian foxes. Secondly, the variability of ancient dogs and foxes (528 dogs and 50 foxes) is compared with that of modern canids (70 dogs, 8 dingoes, 8 wolves, 68 foxes). Strong morphological differences are demonstrated for both species, suggesting functional differences. Ancient dogs appear highly variable in terms of size and shape, although less variable than modern dogs. Modern hypertypes have no equivalent in our archaeological sample. More surprisingly, some ancient shapes are not found in the extant sample. Finally, the variability existing in dogs prior to the Bronze Age is explored and linked to the information already available. Strong differences between eastern and western Europe are highlighted, reflecting the very different histories of dog populations in these two areas. In each geographical area, temporal but also cultural differences in the size and shape of the dogs are demonstrated. The study of foxes, although limited due to the scarcity of remains, reveals the existence of a relatively large diversity. Variation in size and shape are then probably more related to geographical andclimatic variation than to anthropogenic constraints. Differences in bite force over time are suggested for both dogs and foxes, suggesting changes in dog function, and possibly functional adaptations to a diet that has become increasingly influenced by human practices.
Full-text available
We present a spatiotemporal picture of human genetic diversity in Anatolia, Iran, the Levant, the South Caucasus, and the Aegean, a broad region that experienced the earliest Neolithic transition and the emergence of complex hierarchical societies. Combining 35 new ancient shotgun genomes with 382 ancient and 23 present-day published genomes, we found that genetic diversity within each region steadily increased through the Holocene. We further observed that the inferred sources of gene flow shifted over time. In the first half of the Holocene, Southwest Asian and East Mediterranean populations homogenized among themselves. Starting with the Bronze Age, however, regional populations diverged from each other, most likely driven by gene flow from external sources, which we term "the expanding mobility model." Interestingly, this increase in inter-regional divergence can be captured by outgroup-f3-based genetic distances, but not by the commonly used FST statistic, because FST, unlike outgroup-f3, is sensitive to within-population diversity. Finally, we report a temporal trend of increasing male bias in admixture events through the Holocene.
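A toy calculation can make the FST versus outgroup-f3 point concrete. The sketch below is an illustration under stated assumptions, not the study's method: it treats population allele frequencies as known exactly (no sample-size correction), uses a per-locus Hudson-style FST, and all frequencies are hypothetical. At each locus the between-population heterozygosity equals (a - b)^2 plus the within-population term a(1 - a) + b(1 - b), so FST is f2 divided by (f2 + within-population diversity). Outgroup f3(O; A, B) is a plain combination of f2 terms, [f2(A, O) + f2(B, O) - f2(A, B)] / 2, and carries no such denominator.

```python
import math

# Toy illustration only: hypothetical population allele frequencies, one value per locus.
# Assumes frequencies are known exactly, so no sample-size bias correction is applied.


def f2(a: list[float], b: list[float]) -> float:
    """Mean squared allele-frequency difference between populations A and B."""
    return math.fsum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def outgroup_f3(o: list[float], a: list[float], b: list[float]) -> float:
    """Outgroup f3(O; A, B): mean of (a - o)(b - o), the drift A and B share since splitting from O."""
    return math.fsum((x - z) * (y - z) for x, y, z in zip(a, b, o)) / len(a)


def hudson_fst(a: list[float], b: list[float]) -> float:
    """Hudson-style F_ST as a ratio of averages across loci.

    Per locus, H_between = a(1-b) + b(1-a) = (a-b)^2 + H_within,
    with H_within = a(1-a) + b(1-b), so F_ST = f2 / (f2 + H_within).
    """
    num = math.fsum((x - y) ** 2 for x, y in zip(a, b))
    den = math.fsum(x * (1 - y) + y * (1 - x) for x, y in zip(a, b))
    return num / den


if __name__ == "__main__":
    # Same absolute frequency difference (0.1 at every locus), different diversity.
    low_div = ([0.0, 0.0], [0.1, 0.1])   # alleles near fixation: low within-population diversity
    high_div = ([0.5, 0.5], [0.6, 0.6])  # intermediate frequencies: high within-population diversity
    for label, (a, b) in (("low diversity", low_div), ("high diversity", high_div)):
        print(f"{label}: f2 = {f2(a, b):.3f}, F_ST = {hudson_fst(a, b):.3f}")
```

Both hypothetical pairs have f2 = 0.01, yet FST is 0.1 for the low-diversity pair and 0.02 for the high-diversity pair: the same absolute divergence looks smaller to FST wherever within-population diversity is higher, while f2-based statistics such as outgroup f3 are unaffected by that normalization.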
Fascioliasis is a plant- and waterborne zoonotic parasitic disease caused by two trematode species: (i) Fasciola hepatica in Europe, Asia, Africa, the Americas, and Oceania and (ii) F. gigantica, which is restricted to Africa and Asia. Fasciolid liver flukes mainly infect herbivores such as ruminants, equids, and camelids, but also omnivorous mammals such as humans and pigs, and are transmitted by freshwater Lymnaeidae snail vectors. Two phases may be distinguished in fasciolid evolution. The long predomestication period includes the origin of F. gigantica in east-southern Africa around the mid-Miocene, the origin of F. hepatica in the Near-Middle East of Asia around the latest Miocene to Early Pliocene, and their subsequent local spread. The short postdomestication period includes the worldwide spread by human-guided movements of animals in the last 12,000 years and the more recent transoceanic anthropogenic introductions of F. hepatica into the Americas and Oceania and of F. gigantica into several large Pacific islands, with ships transporting livestock in the last 500 years. The routes and chronology of the spreading waves followed by both fasciolids into the five continents are redefined on the basis of recently generated knowledge of human-guided movements of domesticated hosts. No local, zonal, or regional situation in disagreement with historical records was found, although in a few world zones the available knowledge is still insufficient. The anthropogenically accelerated evolution of fasciolids allows us to call them "peridomestic endoparasites." Given its multidisciplinary implications for crucial aspects of the disease, this baseline update should be taken into account in future research studies.
The cultural layers of ancient (3rd–2nd millennia BCE) settlements are unique study objects. From top to bottom, they consist of modern-day soils overlying an ancient buried soil strongly altered by anthropogenic pressure. Cultural layers always contain the remains of artifacts and of human life in the settlement, such as bones and ceramics. These cultural layers are a promising object for studying ancient anthropogenic mineral formation, but such studies should come after a study of the principal physical and chemical properties of the soils. Soils of a Bronze Age settlement were studied along with the natural soils of the floodplain terrace of the Volga River, which forms a common area with the terrace of the Samara River. Paleourbanozems (soils formed on the cultural layers of ancient settlements), with anthropogenic horizons built into the sequence of natural soil horizons, are formed on the settlement site. The Krasnosamarskoe settlement revealed two generations of solonetzic soils, located one above the other and differing in the thickness of solonetzic jointing (including thin-columnar solonetzic soils). These solonetzic soils formed during different stages of the Bronze Age but subsequently merged morphologically into a single horizon. The author investigated the stages of soil-cover formation in river valleys in connection with long-term anthropogenic impact, with a specific focus on the Bronze Age societies of the Samara Volga region.
“Cretan hieroglyphic is very much influenced by Luwian hieroglyphic – almost 75 percent of the signary of Cretan hieroglyphic consists of Luwian hieroglyphic.” “There is no question about the values of the signs in Linear A – they are almost the same as in Linear B. It is indeed very easy to read Linear A, there are no challenges: you fill in the signs, look up the Semitic words and then you are able to translate the text.” “In Byblos, people invented a new provincial style of writing Egyptian hieroglyphic. It was these newly invented signs which were then also adapted for the Byblos script. Byblos script documents were written in a pure Semitic dialect closely related to Ugaritic.” “The cylinder seals in Cypro-Minoan script reflect economic registration since the Hittite overlords were anxious to charge taxes. The system used for economic registration is the same that we already know from the lead strip with Luwian hieroglyphic inscription that was found in Kayseri-Kululu in central Anatolia.” “Etruscan is basically a seventh century Luwian dialect. The Liber Linteus, a whole book in Etruscan, is a manual for priests. The language of the enigmatic Lemnos Stele is closely related to Etruscan, but it is not identical; it has dialectal features.” “Carian, Etruscan and Lycian are all Luwian dialects! Etruscan is indeed very closely related to the Carian language, and both are ultimately derived from Luwian hieroglyphic."
As a historical nomadic group in Central Asia, Kazaks have mainly inhabited the steppe zone from the Altay Mountains in the east to the Caspian Sea in the west. Fine-scale characterization of the genetic profile and population structure of Kazaks would be invaluable for understanding their population history and modeling prehistoric human expansions across the Eurasian steppes. With this in mind, we characterized the maternal lineages of 200 Kazaks from Jetisuu at the mitochondrial genome level. Our results reveal that Jetisuu Kazaks have unique mtDNA haplotypes, including those belonging to the basal branches of both West Eurasian (R0, H, HV) and East Eurasian (A, B, C, D) lineages. The great diversity observed in their maternal lineages may reflect the pivotal geographic location of Kazaks in Eurasia and implies a complex history for this population. Comparative analyses of mitochondrial genomes of human populations in Central Eurasia reveal a common maternal genetic ancestry for Turko-Mongolian speakers and suggest that their expansion was responsible for the presence of East Eurasian maternal lineages in Central Eurasia. Our analyses further indicate a maternal genetic affinity between the Sherpas of the Tibetan Plateau and Turko-Mongolian speakers.
There have been numerous attempts to find relatives of Proto-Indo-European, not the least of which is the Indo-Uralic Hypothesis. According to this hypothesis, Proto-Indo-European and Proto-Uralic descend from a common ancestor. However, attempts to prove this hypothesis have run into numerous difficulties. One difficulty is the inability to reconstruct the ancestral morphological system in detail; another is the rather small shared vocabulary. This latter problem is further complicated by the fact that many scholars think in terms of borrowing rather than inheritance. Moreover, the lack of agreement in vocabulary limits the ability to establish viable sound correspondences and rules of combinability. This paper will attempt to show that these and other difficulties are caused, at least in large part, by the question of the origins of the Indo-European parent language. Evidence will be presented to demonstrate that Proto-Indo-European is the result of the imposition of a Eurasiatic language (to use Joseph Greenberg's term) on a population speaking one or more primordial Northwest Caucasian languages.
Within the framework of the Indo-European discourse, the key issue is the linguistic status of the carriers of the Corded Ware cultures of the Bronze Age. Were they already Indo-Europeans in the Neolithic, or did they become Indo-Europeans at some point in their history? Unfortunately, over the past half century no new concepts of the origin of these cultures have been proposed, including for the variant that existed on the Middle Dnieper. At the same time, in the light of the new radiocarbon dates and genetic data that have appeared over the past decades, archaeological sources introduced into scientific circulation long ago have made it possible to clarify the specific mechanism by which the Corded Ware cultures formed. The purpose of the research is to determine the origin of the Middle Dnieper variant of the Pit Grave culture as ancestral to the Corded Ware culture of the region. The origin of the funeral rite has been analyzed from the point of view of its occurrence in adjacent regions in earlier periods. As a result, it has been found that the funeral rite of the future carriers of Corded Ware came to the Middle Dnieper from the area of the Shcherbanev-Penezhkovskaya group of Trypillia. This conclusion is systematically connected with the origin of the cord-ornamentation technique in the same group of the Trypillia culture. This allows us to conclude that the future Slavic-Balto-Germanic tribes, during the period when they constituted a single linguistic (Nordic) group, were not inhabitants of the steppes but traditional Trypillia farmers who switched to a pastoral way of life in the Bronze Age.
Archaeological reconstruction of Indo-European migrations
Bone and tooth samples from sixteen individuals of the Vedrovice skeletal collection were submitted to ancient DNA (aDNA) analyses of mitochondrial as well as nuclear DNA. Compared with other prehistoric aDNA samples analysed at the University of Mainz aDNA laboratories, the Vedrovice samples are generally not among the best preserved, owing to a low content of severely damaged DNA molecules. Only 37.5% of the individuals (6 of 16) yielded consistent results reproducible from different extracts. It was possible to type mitochondrial DNA from three male and three female individuals. The resulting six different DNA sequences (haplotypes) were classified into 4 haplogroups: haplogroup K (represented by two individuals), haplogroup T2 (also represented by two individuals), and haplogroups H and J1c, each represented by one individual. All of these haplogroups have been identified among modern European populations, although the individual haplotypes are predominantly represented among today's Eastern European populations. Two of the Vedrovice haplotypes are unique and have not yet been identified among the currently known modern lineages. Haplogroup N1a, whose incidence among LBK individuals is relatively high elsewhere (Haak et al. 2005), was not recovered among the analysed individuals from Vedrovice.
Excavations of a burial of the Yamnaya culture at Kutuluk have uncovered the remains of a large copper weapon analogous to both the later copper bar celts found in India and the vajra, the mythological weapon wielded by Indra.
The first complete history of Central Eurasia from ancient times to the present day, Empires of the Silk Road represents a fundamental rethinking of the origins, history, and significance of this major world region. Christopher Beckwith describes the rise and fall of the great Central Eurasian empires, including those of the Scythians, Attila the Hun, the Turks and Tibetans, and Genghis Khan and the Mongols. In addition, he explains why the heartland of Central Eurasia led the world economically, scientifically, and artistically for many centuries despite invasions by Persians, Greeks, Arabs, Chinese, and others. In retelling the story of the Old World from the perspective of Central Eurasia, Beckwith provides a new understanding of the internal and external dynamics of the Central Eurasian states and shows how their people repeatedly revolutionized Eurasian civilization. Beckwith recounts the Indo-Europeans' migration out of Central Eurasia, their mixture with local peoples, and the resulting development of the Graeco-Roman, Persian, Indian, and Chinese civilizations; he details the basis for the thriving economy of premodern Central Eurasia, the economy's disintegration following the region's partition by the Chinese and Russians in the eighteenth and nineteenth centuries, and the damaging of Central Eurasian culture by Modernism; and he discusses the significance for world history of the partial reemergence of Central Eurasian nations after the collapse of the Soviet Union.
Roughly half the world's population speaks languages derived from a shared linguistic source known as Proto-Indo-European. But who were the early speakers of this ancient mother tongue, and how did they manage to spread it around the globe? Until now their identity has remained a tantalizing mystery to linguists, archaeologists, and even Nazis seeking the roots of the Aryan race. The Horse, the Wheel, and Language lifts the veil that has long shrouded these original Indo-European speakers, and reveals how their domestication of horses and use of the wheel spread language and transformed civilization. David Anthony identifies the prehistoric peoples of central Eurasia's steppe grasslands as the original speakers of Proto-Indo-European, and shows how their innovative use of the ox wagon, horseback riding, and the warrior's chariot turned the Eurasian steppes into a thriving transcontinental corridor of communication, commerce, and cultural exchange. He explains how they spread their traditions and gave rise to important advances in copper mining, warfare, and patron-client political institutions, thereby ushering in an era of vibrant social change. Anthony describes his discovery of how the wear from bits on ancient horse teeth reveals the origins of horseback riding. And he introduces a new approach to linking prehistoric archaeological remains with the development of language. The Horse, the Wheel, and Language solves a puzzle that has vexed scholars for two centuries: the source of the Indo-European languages and English. It also recovers a magnificent and influential civilization from the past.