published: 21 June 2017
doi: 10.3389/fgene.2017.00087
Frontiers in Genetics | June 2017 | Volume 8 | Article 87
Edited by:
Stéphane Joost,
École Polytechnique Fédérale de
Lausanne, Switzerland
Reviewed by:
Pavel Flegontov,
University of Ostrava, Czechia
Lounès Chikhi,
Centre National de la Recherche
Scientifique (CNRS), France
Erika Hagelberg,
University of Oslo, Norway
Eran Elhaik
Specialty section:
This article was submitted to
Evolutionary and Population Genetics,
a section of the journal
Frontiers in Genetics
Received: 02 October 2016
Accepted: 07 June 2017
Published: 21 June 2017
Das R, Wexler P, Pirooznia M and
Elhaik E (2017) The Origins of
Ashkenaz, Ashkenazic Jews, and
Yiddish. Front. Genet. 8:87.
doi: 10.3389/fgene.2017.00087
The Origins of Ashkenaz, Ashkenazic
Jews, and Yiddish
Ranajit Das 1, Paul Wexler 2, Mehdi Pirooznia 3 and Eran Elhaik 4*
1 Manipal Centre for Natural Sciences, Manipal University, Manipal, India, 2 Department of Linguistics, Tel Aviv University,
Tel-Aviv, Israel, 3 Department of Psychiatry and Behavioral Sciences, Johns Hopkins University, Baltimore, MD, United States,
4 Department of Animal and Plant Sciences, University of Sheffield, Sheffield, United Kingdom
Recently, the geographical origins of Ashkenazic Jews (AJs) and their native language
Yiddish were investigated by applying the Geographic Population Structure (GPS) to a
cohort of exclusively Yiddish-speaking and multilingual AJs. GPS localized most AJs
along major ancient trade routes in northeastern Turkey adjacent to primeval villages
with names that resemble the word “Ashkenaz.” These findings were compatible with
the hypothesis of an Irano-Turko-Slavic origin for AJs and a Slavic origin for Yiddish and
at odds with the Rhineland hypothesis advocating a Levantine origin for AJs and German
origins for Yiddish. We discuss how these findings advance three ongoing debates
concerning (1) the historical meaning of the term “Ashkenaz;” (2) the genetic structure
of AJs and their geographical origins as inferred from multiple studies employing both
modern and ancient DNA and original ancient DNA analyses; and (3) the development of
Yiddish. We provide additional validation to the non-Levantine origin of AJs using ancient
DNA from the Near East and the Levant. Due to the rising popularity of geo-localization
tools to address questions of origin, we briefly discuss the advantages and limitations of
popular tools with focus on the GPS approach. Our results reinforce the non-Levantine
origins of AJs.
Keywords: Yiddish, Ashkenazic Jews, Ashkenaz, geographic population structure (GPS), Archaeogenetics,
Rhineland hypothesis, ancient DNA
The geographical origins of the Biblical “Ashkenaz,” Ashkenazic Jews (AJs), and Yiddish are among
the longest-standing questions in history, genetics, and linguistics.
Uncertainties concerning the meaning of “Ashkenaz” arose in the Eleventh century when the
term shifted from a designation of the Iranian Scythians to become that of Slavs and Germans and
finally of “German” (Ashkenazic) Jews in the Eleventh to Thirteenth centuries (Wexler, 1993). The
first known discussion of the origin of German Jews and Yiddish surfaced in the writings of the
Hebrew grammarian Elia Baxur in the first half of the Sixteenth century (Wexler, 1993).
It is well established that history is also reflected in the DNA through relationships
between genetics, geography, and language (e.g., Cavalli-Sforza, 1997; Weinreich, 2008). Max
Weinreich, the doyen of the field of modern Yiddish linguistics, has already emphasized the
truism that the history of Yiddish mirrors the history of its speakers. These relationships
prompted Das et al. (2016) to address the question of Yiddish origin by analyzing
the genomes of Yiddish-speaking AJs, multilingual AJs, and Sephardic Jews using the
Geographic Population Structure (GPS), which localizes genomes to where they experienced
the last major admixture event. GPS traced nearly all AJs to major ancient trade
routes in northeastern Turkey adjacent to four primeval villages whose names resemble
“Ashkenaz:” İskenaz (or Eşkenaz), Eşkenez (or Eşkens), Aşhanas,
and Aschuz. Evaluated in light of the Rhineland and Irano-Turko-Slavic
hypotheses (Das et al., 2016, Table 1), the findings
supported the latter, implying that Yiddish was created by Slavo-Iranian
Jewish merchants plying the Silk Roads. We discuss
these findings from historical, genetic, and linguistic perspectives
and calculate the genetic similarity of AJs and Middle Eastern
populations to ancient genomes from Anatolia, Iran, and the
Levant. Lastly, we briefly review the advantages and limitations of
bio-localization tools and their application in genetic research.
“Ashkenaz” is one of the most disputed Biblical placenames.
It appears in the Hebrew Bible as the name of one of Noah’s
descendants (Genesis 10:3) and as a reference to the kingdom
of Ashkenaz, prophesied to be called together with Ararat
and Minnai to wage war against Babylon (Jeremiah 51:27). In
addition to tracing AJs to the ancient Iranian lands of Ashkenaz
and uncovering the villages whose names may derive from
“Ashkenaz,” the partial Iranian origin of AJs, inferred by Das
et al. (2016), was further supported by the genetic similarity of
AJs to Sephardic Mountain Jews and Iranian Jews as well as their
similarity to Near Eastern populations and simulated “native”
Turkish and Caucasus populations.

TABLE 1 | Major open questions regarding the origin of the term “Ashkenaz,” AJs, and Yiddish as explained by two competing hypotheses.

Open question: The term “Ashkenaz”
Rhineland hypothesis: Originally affiliated with the people living north of Biblical Israel
(Aptroot, 2016) or north of the Black Sea (Wexler, 1991). Used in Hebrew and Yiddish sources from
the Eleventh century onward to denote a region in what is now roughly Southern Germany
(Wexler, 1991; Aptroot, 2016).
Irano-Turko-Slavic hypothesis: Denotes an Iranian people “near Armenia,” presumably Scythians
known as aškuza, ašguza, or išguza in Assyrian inscriptions of the early Seventh century B.C.
(Wexler, 2012,
Evidence in favor of the Irano-Turko-Slavic hypothesis: GPS analysis uncovered four primeval
villages in northeastern Turkey whose names resemble “Ashkenaz,” at least one of which predates
any major Jewish settlement in Germany (Das et al., 2016). “Ashkenaz” is thereby a placename
associated with the Near East and its inhabitants both Jews and non-Jews.

Open question: The ancestral origin of Ashkenazic Jews
Rhineland hypothesis: Judaean living in Judaea until 70 A.D. who were exiled by the Romans
(King, 2001) and remained in relative isolation from neighboring non-Jewish communities during
and after the Diaspora (Hammer et al., 2000; Ostrer, 2001). This scenario has no historical
(Sand, 2009) nor genetic support (Figure 1B) (e.g., Elhaik, 2013, 2016; Xue et al., 2017).
Irano-Turko-Slavic hypothesis: A minority of Judaean emigrants and a majority of
Irano-Turko-Slavic converts to Judaism (Wexler, 2012).
Evidence in favor of the Irano-Turko-Slavic hypothesis: AJs exhibit high genetic similarity to
populations living in Turkey and the Caucasus (Das et al., 2016). All bio-location analyses
predicted AJs to Turkey (Figure 1A). Ancient DNA analyses provide strong evidence of the
Iranian Neolithic ancestry of AJs (Figure 1B) (Lazaridis et al., 2016).

Open question: The arrival of Jews to German lands
Rhineland hypothesis: After the arrival of Palestinian Jews to Roman lands, Jewish merchants and
soldiers arrived to German lands with the Roman army and settled there (King, 2001). This
scenario has no historical support (Wexler, 1993; Sand, 2009).
Irano-Turko-Slavic hypothesis: Jews from the Khazar Empire and the former Iranian Empire plying
the old Roman trade routes (Rabinowitz, 1945, 1948) and Silk Roads began to settle in the mixed
Germano-Sorbian lands during the first Millennium (Sand, 2009; Wexler, 2011).
Evidence in favor of the Irano-Turko-Slavic hypothesis: Ashkenazic Jews were predicted to a Near
Eastern hub of ancient trade routes that connected Europe, Asia, and the northern Caucasus
(Das et al., 2016). The findings imply that migration to Europe took place initially through
trade routes going west and later through Khazar lands.

Open question: Yiddish’s emergence in the 9th century
Rhineland hypothesis: Between the Ninth and Tenth centuries, French- and Italian-speaking Jewish
immigrants adopted and adapted the local German dialects (Weinreich, 2008).
Irano-Turko-Slavic hypothesis: Upon arrival to German lands, Western and Eastern Slavic went
through a relexification to German, creating what became known as Yiddish (Wexler, 2012).
Evidence in favor of the Irano-Turko-Slavic hypothesis: Xue et al.’s (2017) inferred “admixture
time” of 960–1,416 AD corresponds to a time period during which AJ have experienced major
demographic changes. At that time, AJs were speculated to have absorbed Slavic people, developed
Slavic Yiddish, and intensified the migration to Europe (Das et al., 2016).

Open question: Growth of Eastern European Jewry
Rhineland hypothesis: A small group of German Jews migrated to Eastern Europe and reproduced via a
so-called “demographic miracle” (Ben-Sasson, 1976; Atzmon et al., 2010; Ostrer, 2012), which
resulted in an unnatural growth rate (1.7–2% annually) (van Straten and Snel, 2006; van Straten,
2007) over half a millennium acting only on Jews residing in Eastern Europe. This explanation is
unsupported by the data.
Irano-Turko-Slavic hypothesis: During the half millennium (740–1,250 CE), Khazar and Iranian
lands harbored the largest Eurasian Jewish centers. Ashkenazic, Khazar, and Iranian Jews then
sent offshoots into the Slavic lands (Baron, 1957; Sand, 2009).
Evidence in favor of the Irano-Turko-Slavic hypothesis: Most of the Ashkenazic Jews were predicted
to Northeastern Turkey and the remaining individuals clustered along a gradient going from Turkey
to Eastern European lands (Das et al., 2016). This is in agreement with the recorded conversions
of populations living along the southern shores of the Black Sea to Judaism (Baron, 1937). A
German origin of AJs is unsupported by the data (Figure 1A).

The genetic evidence produced by Das et al. (2016) is shown in the last column.
There are good grounds, therefore, for inferring that Jews who
considered themselves Ashkenazic adopted this name and spoke
of their lands as Ashkenaz, since they perceived themselves as of
Iranian origin. That we find varied evidence of the knowledge
of Iranian language among Moroccan and Andalusian Jews and
Karaites prior to the Eleventh century is a compelling point
of reference to assess the shared Iranian origins of Sephardic
and Ashkenazic Jews (Wexler, 1996). Moreover, Iranian-speaking
Jews in the Caucasus (the so-called Juhuris) and Turkic-speaking
Jews in the Crimea prior to World War II called themselves
“Ashkenazim” (Weinreich, 2008).

FIGURE 1 | The localization of AJs and their ancient admixture proportions compared to neighboring populations. (A) Geographical predictions of individuals analyzed
in three separate studies employing different tools: Elhaik (2013, Figure 4) (blue), Behar et al. (2013, Figure 2B) (red), and Das et al. (2016, Figure 4) (dark green for AJs
who have four AJ grandparents and light green for the rest) are shown. Color-matched means and standard deviations (bars) of the longitude and latitude are shown for
each cohort. Since we were unsuccessful in obtaining the data points of Behar et al. (2013, Figure 2B) from the corresponding author, we procured 78% of the data
points from their figure. Due to the low quality of their figure we were unable to reliably extract the remaining data points. (B) Supervised ADMIXTURE results. For
brevity, subpopulations were collapsed. The x-axis represents individuals. Each individual is represented by a vertical stacked column of color-coded admixture
proportions that reflect genetic contributions from ancient Hunter-Gatherer, Anatolian, Levantine, and Iranian individuals.
The Rhineland hypothesis cannot explain why a name that
denotes “Scythians” and was associated with the Near East
became associated with German lands in the Eleventh to
Thirteenth centuries (Wexler, 1993). Aptroot (2016) suggested
that Jewish immigrants in Europe transferred Biblical names
onto the regions in which they settled. This is unconvincing.
Biblical names were used as place names only when they had
similar sounds. Not only do Germany and Ashkenaz not share
similar sounds, but Germany was already named “Germana,” or
“Germamja” in the Iranian (“Babylonian”) Talmud (completed in
the Fifth century A.D.) and, not surprisingly, was associated with
Noah’s grandson Gomer (Talmud, Yoma 10a). Name adoption
also occurred when the exact place names were in doubt as
in the case of Sefarad (Spain). This is not the case here, as
Aptroot too notes, since “Ashkenaz” had a known and clear
geographical affiliation (Table 1). Finally, Germany was known
to French scholars like the RaDaK (1160–1235) as “Almania”
(Sp. Alemania, Fr. Allemagne), after the Almani tribes, a term
that was also adopted by Arab scholars. Had the French scholar
Rashi (1040?–1105) interpreted aškenaz as “Germany,” it would
have been known to the RaDaK who used Rashi’s symbols.
Therefore, Wexler’s proposal that Rashi used aškenaz in the
meaning of “Slavic” and that the term aškenaz assumed the
solitary meaning “German lands” only after the Eleventh century
in Western Europe as a result of the rise of Yiddish, is more
reasonable (Wexler, 2011). This is also supported by Das et al.’s
major findings of the only known primeval villages whose names
derive from the word “Ashkenaz” located in the ancient lands
of Ashkenaz. Our inference is therefore supported by historical,
linguistic, and genetic evidence, and a simple origin that can be
easily explained carries more weight than a more complex
scenario that involves multiple translocations.
AJs were localized to modern-day Turkey and found to be
genetically closest to Turkic, southern Caucasian, and Iranian
populations, suggesting a common origin in Iranian “Ashkenaz”
lands (Das et al., 2016). These findings were more compatible
with an Irano-Turko-Slavic origin for AJs and a Slavic origin
for Yiddish than with the Rhineland hypothesis, which lacks
historical, genetic, and linguistic support (Table 1) (van Straten,
2004; Elhaik, 2013). The findings have also highlighted the strong
social-cultural and genetic bonds of Ashkenazic and Iranian
Judaism and their shared Iranian origins (Das et al., 2016).
Thus far, all analyses aimed to geo-localize AJs (Behar et al.,
2013, Figure 2B; Elhaik, 2013, Figure 4; Das et al., 2016, Figure 4)
identified Turkey as the predominant origin of AJs, although they
used different approaches and datasets, in support of the Irano-Turko-Slavic
hypothesis (Figure 1A, Table 1). The existence of
both major Southern European and Near Eastern ancestries in
AJ genomes is also a strong indicator of the Irano-Turko-Slavic
hypothesis, given the Greco-Roman history of the region
south of the Black Sea (Baron, 1937; Kraemer, 2010). Recently,
Xue et al. (2017) applied GLOBETROTTER to a dataset of 2,540
AJs genotyped over 252,358 SNPs. The inferred ancestry profile
for AJs was 5% Western Europe, 10% Eastern Europe, 30%
Levant, and 55% Southern Europe (a Near East ancestry was not
considered by the authors). Elhaik (2013) portrayed a similar
profile for European Jews, consisting of 25–30% Middle East
and large Near Eastern–Caucasus (32–38%) and West European
(30%) ancestries. Remarkably, Xue et al. (2017) also inferred
an “admixture time” of 960–1,416 AD (24–40 generations
ago), which corresponds to the time AJs experienced major
geographical shifts as the Judaized Khazar kingdom diminished
and their trading networks collapsed forcing them to relocate
to Europe (Das et al., 2016). The lower boundary of that date
corresponds to the time Slavic Yiddish originated, to the best of
our knowledge.
The non-Levantine origin of AJs is further supported by
an ancient DNA analysis of six Natufians and a Levantine
Neolithic (Lazaridis et al., 2016), some of the most likely Judaean
progenitors (Finkelstein and Silberman, 2002; Frendo, 2004). In
a principal component analysis (PCA), the ancient Levantines
clustered predominantly with modern-day Palestinians and
Bedouins and marginally overlapped with Arabian Jews, whereas
AJs clustered away from Levantine individuals and adjacent
to Neolithic Anatolians and Late Neolithic and Bronze Age
Europeans. To evaluate these findings, we inferred the ancient
ancestries of AJs using the admixture analysis described in
Marshall et al. (2016). Briefly, we analyzed 18,757 autosomal
SNPs genotyped in 46 Palestinians, 45 Bedouins, 16 Syrians,
and eight Lebanese (Li et al., 2008) alongside 467 AJs (367 AJs
previously analyzed and 100 individuals with an AJ mother) (Das
et al., 2016) that overlapped with both the GenoChip (Elhaik
et al., 2013) and ancient DNA data (Lazaridis et al., 2016). We
then carried out a supervised ADMIXTURE analysis (Alexander
and Lange, 2011) using three East European Hunter Gatherers
from Russia (EHGs) alongside six Epipaleolithic Levantines, 24
Neolithic Anatolians, and six Neolithic Iranians as reference
populations (Table S0). Remarkably, AJs exhibit dominant
Iranian (88%) and residual Levantine (3%) ancestries, as opposed
to Bedouins (14% and 68%, respectively) and Palestinians (58%, respectively).
Only two AJs exhibit Levantine ancestries
typical of Levantine populations (Figure 1B). Repeating the
analysis with qpAdm (AdmixTools, version 4.1) (Patterson et al.,
2012), we found that AJ admixture could be modeled using
either three- (Neolithic Anatolians [46%], Neolithic Iranians
[32%], and EHGs [22%]) or two-way (Neolithic Iranians [71%]
and EHGs [29%]) migration waves (Supplementary Text).
These findings should be reevaluated when Medieval DNA
becomes available. Overall, the combined results are in
strong agreement with the predictions of the Irano-Turko-Slavic
hypothesis (Table 1) and rule out an ancient Levantine
origin for AJs, which is predominant among modern-day
Levantine populations (e.g., Bedouins and Palestinians). This is
not surprising since Jews differed in cultural practices and norms
(Sand, 2011) and tended to adopt local customs (Falk, 2006).
Very little Palestinian Jewish culture survived outside of Palestine
(Sand, 2009). For example, the folklore and folkways of the Jews
in northern Europe are distinctly pre-Christian German (Patai,
1983) and Slavic in origin, which disappeared among the latter
(Wexler, 1993, 2012).
The hypothesis that Yiddish has a German origin ignores
the mechanics of relexification, the linguistic process which
produced Yiddish and other “Old Jewish” languages (i.e., those
created by the Ninth to Tenth century). Understanding how
relexification operates is essential to understanding the evolution
of languages. This argument has a similar context to that of the
evolution of powered flight. Rejecting the theory of evolution
may lead one to conclude that birds and bats are close relatives.
By disregarding the literature on relexification and Jewish history
in the early Middle Ages, authors (e.g., Aptroot, 2016; Flegontov
et al., 2016) reach conclusions that have weak historical support.
The advantage of a geo-localization analysis is that it allows us
to infer the geographical origin of the speakers of Yiddish, where
they resided and with whom they intermingled, independently
of historical controversies, which provides a data-driven view
on the question of geographical origins. This allows an objective
review of potential linguistic influences on Yiddish (Table 1),
which exposes the dangers in adopting a “linguistic creationism”
view in linguistics.
The historical evidence in favor of an Irano-Turko-Slavic
origin for Yiddish is paramount (e.g., Wexler, 1993, 2010). Jews
played a major role on the Silk Roads in the Ninth to Eleventh
centuries. In the mid-Ninth century, in roughly the same years,
Jewish merchants at both Mainz and Xi’an received special
trading privileges from the Holy Roman Empire and the Tang
dynasty court (Robert, 2014). These roads linked Xi’an to Mainz
and Andalusia, and further to sub-Saharan Africa and across
to the Arabian Peninsula and India-Pakistan. The Silk Roads
provided the motivation for Jewish settlement in Afro-Eurasia
in the Ninth to Eleventh centuries since the Jews played a
dominant role on these routes as a neutral trading guild with
no political agendas (Gil, 1974; Cansdale, 1996, 1998). Hence,
the Jewish traders had contact with a wealth of languages in the
areas that they traversed (Hadj-Sadok, 1949; Khordadhbeh, 1889;
Hansen, 2012; Wexler TBD), which they brought back to their
communities nested in major trading hubs (Rabinowitz, 1945,
1948; Das et al., 2016). The central Eurasian Silk Roads were
controlled by Iranian polities, which provided opportunities for
Iranian-speaking Jews, who constituted the overwhelming bulk
of the world’s Jews from the time of Christ to the Eleventh century
(Baron, 1952). It should not come as a surprise to find that
Yiddish (and other Old Jewish languages) contains components
and rules from a large variety of languages, all of them spoken
on the Silk Roads (Khordadhbeh, 1889; Wexler, 2011, 2012,
In addition to language contacts, the Silk Roads also provided
the motivation for widespread conversion to Judaism by
populations eager to participate in the extremely lucrative trade,
which had become a Jewish quasi-monopoly along the trade
routes (Rabinowitz, 1945, 1948; Baron, 1957). These conversions
are discussed in Jewish literature between the Sixth and Eleventh
centuries, both in Europe and Iraq (Sand, 2009; Kraemer, 2010).
Yiddish and other Old Jewish languages were all created by
the peripatetic merchants as secret languages that would isolate
them from their customers and non-Jewish trading partners
(Hadj-Sadok, 1949; Gil, 1974; Khordadhbeh, 1889; Cansdale,
1998; Robert, 2014). The study of Yiddish genesis, thereby,
necessitates the study of all the Old Jewish languages of this time
There is also a quantifiable amount of Iranian and Turkic
elements in Yiddish. The Babylonian Talmud, completed by the
Sixth century A.D., is rich in Iranian linguistic, legalistic, and
religious influences. From the Talmud, a large Iranian vocabulary
has entered Hebrew and Judeo-Aramaic, and from there spread
to Yiddish. This corpus has been known since the 1930s and is
common knowledge to Talmud scholars (Telegdi, 1933). In the
Khazar Empire, the Eurasian Jews, plying the Silk Roads, became
speakers of Slavic—an important language because of the trading
activities of the Rus’ (pre-Ukrainians) with whom the Jews were
undoubtedly allied on the routes linking Baghdad and Bavaria.
This is evident from the existence of newly invented Hebroidisms
inspired by Slavic patterns of discourse in Yiddish (Wexler, 2010).
We advocate for implementing a more evolutionary
understanding in linguistics. That includes giving more attention
to the linguistic processes that alter languages (e.g., relexification)
and acquiring more competence in other languages and histories.
When studying the origin of Ashkenazic Jews and Yiddish, such
knowledge should include the history of the Silk Roads and
Irano-Turkish languages.
Deciphering the origin of human populations is not a new
challenge for geneticists, yet only in the past decade have
high-throughput genetic data been harnessed to answer these
questions. Here, we briefly discuss the differences between the
available tools based on identity by distance. Existing PCA or
PCA-like approaches (e.g., Novembre et al., 2008; Yang et al.,
2012) can localize Europeans to countries (understood as the
last place where a major admixture event took place or the place
where the four ancestors of “unmixed” individuals came from)
with less than 50% accuracy (Yang et al., 2012). The limitations of
PCA (discussed in Novembre and Stephens, 2008) appear to be
inherent in the framework where continental populations plotted
along the two primary PCs cluster in the vertices of a triangle-
like shape and the remaining populations cluster along or within
the edges (e.g., Elhaik et al., 2013). There is therefore reason
to question the applicability of ambitious PCA-based methods
(Yang et al., 2012, 2014) aiming to infer multiple ancestral
locations outside of Europe. Overall, accurate localization of
worldwide individuals remains a significant challenge (Elhaik
et al., 2014).
The GPS framework assumes that humans are mixed and
that their genetic variation (admixture) can be modeled by the
proportion of genotypes assigned to any number of fixed regional
putative ancestral populations (Elhaik et al., 2014). GPS employs
a supervised ADMIXTURE analysis where the admixture
components are fixed, which allows evaluating both the test
individuals and reference populations against the same putative
ancestral populations. GPS infers the geographical coordinates
of an individual by matching their admixture proportions
with those of reference populations. Reference populations are
populations known to reside in a certain geographical region
for a substantial period of time in a time frame of hundreds
to a thousand years and can be predicted to their geographical
locations while absent from the reference population panel (Das
et al., 2016). The final geographic location of a test individual is
determined by converting the genetic distance of the individual to
m reference populations into geographic distances (Elhaik et al.,
2014). Intuitively, the reference populations can be thought of
as “pulling” the individual in their direction with a strength
proportional to their genetic similarity until a consensus is
reached (Figure S1). Interpreting the results, particularly when
the predicted location differs from the contemporary location of
the studied population, demands caution.
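The “pulling” intuition above can be illustrated with a minimal Python sketch. It is not the published GPS implementation: it assumes, for illustration only, that genetic distance is the Euclidean distance between admixture-proportion vectors and that each of the m closest reference populations pulls with a weight of one over that distance. The function names, the two-component admixture vectors, and the coordinates of the toy panel are all hypothetical.

```python
import math


def genetic_distance(a, b):
    """Euclidean distance between two admixture-proportion vectors (an assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def localize(sample, references, m=3):
    """Estimate (latitude, longitude) from the m genetically closest references.

    `references` is a list of (admixture_vector, (lat, lon)) pairs.
    Coordinates are averaged naively, ignoring the curvature of the Earth.
    """
    ranked = sorted(
        ((genetic_distance(sample, adm), coords) for adm, coords in references),
        key=lambda pair: pair[0],
    )[:m]
    # An exact genetic match pins the sample to that reference's location.
    if ranked[0][0] == 0.0:
        return ranked[0][1]
    # Closer references pull harder: weight = 1 / genetic distance.
    weights = [1.0 / d for d, _ in ranked]
    total = sum(weights)
    lat = sum(w * c[0] for w, (_, c) in zip(weights, ranked)) / total
    lon = sum(w * c[1] for w, (_, c) in zip(weights, ranked)) / total
    return (lat, lon)


# Hypothetical reference panel: admixture vectors and (lat, lon) coordinates.
panel = [
    ([1.0, 0.0], (0.0, 0.0)),
    ([0.0, 1.0], (10.0, 10.0)),
    ([0.5, 0.5], (20.0, 0.0)),
]
print(localize([0.75, 0.25], panel, m=2))
```

In this toy panel the sample is equally distant from the first and third references, so with m=2 it is placed midway between them. A real analysis would also need a calibrated conversion from genetic to geographic distance, which this sketch omits.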
Population structure is affected by biological and
demographic processes like genetic drift, which can act rapidly
on small, relatively isolated populations, as opposed to large
non-isolated populations, and migration, which occurs more
frequently (Jobling et al., 2013). Understanding the geography-
admixture relationships necessitates knowing how relative
isolation and migration history affected the allele frequencies
of populations. Unfortunately, oftentimes we lack information
about both processes. GPS addresses this problem by analyzing
the relative proportions of admixture in a global network of
reference populations that provide us with different “snapshots”
of historical admixture events. These global admixture events
occurred at different times through different biological and
demographic processes, and their long-lasting effect is related
to our ability to associate an individual with their matching
admixture event.
In relatively isolated populations the admixture event is likely
old, and GPS would localize a test individual with their parental
population more accurately. By contrast, if the admixture event
was recent and the population did not maintain relative isolation,
GPS prediction would be erroneous (Figure S2). This is the
case of Caribbean populations, whose admixture proportions
still reflect the massive Nineteenth and Twentieth centuries’
mixture events involving Native Americans, West Europeans,
and Africans (Elhaik et al., 2014). While the original level
of isolation remains unknown, these two scenarios can be
distinguished by comparing the admixture proportions of the
test individual and adjacent populations. If this similarity is high,
we can conclude that we have inferred the likely location of the
admixture event that shaped the admixture proportion of the test
individual. If the opposite is true, the individual is either mixed
and thereby violates the assumptions of the GPS model or the
parental populations do not exist either in GPS’s reference panel
or in reality. Most of the time (83%) GPS predicted unmixed
individuals to their true locations with most of the remaining
individuals predicted to neighboring countries (Elhaik et al.,
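The comparison between a test individual and adjacent populations can be sketched as follows. This is a minimal illustration rather than the procedure used in the cited work: the Euclidean metric, the threshold of 0.1, and the function names are assumptions introduced here, and the admixture vectors are hypothetical.

```python
import math


def admixture_distance(a, b):
    """Euclidean distance between two admixture-proportion vectors (an assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def resembles_neighbors(individual, neighbor_means, threshold=0.1):
    """Return True if the individual lies within `threshold` of the mean
    admixture vector of at least one population adjacent to the predicted location.

    The threshold is an arbitrary illustrative value, not a published cutoff.
    """
    return any(
        admixture_distance(individual, mean) <= threshold
        for mean in neighbor_means
    )


# Hypothetical two-component admixture vectors.
print(resembles_neighbors([0.6, 0.4], [[0.62, 0.38], [0.1, 0.9]]))
print(resembles_neighbors([0.6, 0.4], [[0.1, 0.9]]))
```

A high similarity (the first call) corresponds to the case where the inferred location is plausible. A low similarity (the second call) flags an individual who may be mixed or whose parental populations are missing from the panel.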
To understand how migration modifies the admixture
proportions of the migratory and host populations, we can
consider two simple cases of point or massive migration
followed by assimilation and a third case of migration followed
by isolation. Point migration events have little effect on the
admixture proportions of the host population, particularly when
it absorbs a paucity of migrants, in which case the migrants’
admixture proportions would resemble those of the host
population within a few generations and their resting place would
represent that of the host population. Massive demographic
movements, such as large-scale invasion or migration that
affect a large part of the population are rare and create
temporal shifts in the admixture proportions of the host
population. The host population would temporarily appear as
a two-way mixed population, reflecting the components of
the host and invading populations (e.g., European and Native
American, in the case of Puerto Ricans) until the admixture
proportions would homogenize population-wise. If this process
is completed, the admixture signature of this region may be
altered and the geographical placement of the host population
would represent again the last place where the admixture
event took place for both the host and invading populations.
GPS would, thereby, predict the host population’s location for
both populations. Populations that migrate from A to B and
maintain genetic isolation would be predicted to point A in
the leave-one-out population analysis. While human migrations
are not uncommon, maintaining a perfect genetic isolation
over a long period of time is very difficult (e.g., Veeramah
et al., 2011; Behar et al., 2012; Elhaik, 2016; Hellenthal et al.,
2016), and GPS predictions for the vast majority of worldwide
populations indicate that these cases are indeed exceptional
(Elhaik et al., 2014). Despite its advantages, GPS has several
limitations. First, it yields the most accurate predictions for
unmixed individuals. Second, using migratory or highly mixed
populations (both are detectable through the leave-one-out
population analysis) as reference populations may bias the
predictions. Further developments are necessary to overcome
these limitations and make GPS applicable to mixed population
groups (e.g., African Americans).
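The contrast between point and massive migration described above can be made concrete with a toy linear mixing model. The model is an assumption made here for illustration and is not taken from the GPS framework: once admixture proportions have homogenized, the population is treated as a weighted average of the host and migrant proportions. The function name and the numbers are hypothetical.

```python
def homogenized_admixture(host, migrants, migrant_fraction):
    """Admixture proportions after migrants are fully assimilated.

    Assumes simple linear mixing: the result is a weighted average of the
    host and migrant admixture vectors, weighted by population share.
    """
    if not 0.0 <= migrant_fraction <= 1.0:
        raise ValueError("migrant_fraction must lie in [0, 1]")
    if len(host) != len(migrants):
        raise ValueError("admixture vectors must have the same length")
    return [
        (1.0 - migrant_fraction) * h + migrant_fraction * g
        for h, g in zip(host, migrants)
    ]


# Hypothetical two-component admixture vectors.
host = [0.8, 0.2]
migrants = [0.2, 0.8]

# Point migration: migrants are 1% of the population, so the host profile barely moves.
point = homogenized_admixture(host, migrants, 0.01)

# Massive migration: migrants are half of the population, so the profile shifts markedly.
massive = homogenized_admixture(host, migrants, 0.5)

print(point, massive)
```

Under this assumption a small influx leaves the host signature almost unchanged, whereas a large one produces a new signature shared by both groups, which is why both would be predicted to the host location.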
The meaning of the term “Ashkenaz” and the geographical
origins of AJs and Yiddish are some of the longest standing
questions in history, genetics, and linguistics. In our previous
work we have identified “ancient Ashkenaz,” a region in
northeastern Turkey that harbors four primeval villages whose
names resemble Ashkenaz. Here, we elaborate on the meaning
of this term and argue that it acquired its modern meaning only
after a critical mass of Ashkenazic Jews arrived in Germany.
We show that all bio-localization analyses have localized AJs
to Turkey and that the non-Levantine origins of AJs are
supported by ancient genome analyses. Overall, these findings
are compatible with the hypothesis of an Irano-Turko-Slavic
origin for AJs and a Slavic origin for Yiddish and contradict the
predictions of the Rhineland hypothesis, which lacks historical, genetic,
and linguistic support (Table 1).
EE conceived the paper. MP processed the ancient DNA data. RD
and EE carried out the analyses. EE co-wrote it with PW and RD.
All authors approved the paper.
EE was partially supported by The Royal Society International
Exchanges Award to EE and Michael Neely (IE140020), MRC
Confidence in Concept Scheme award 2014-University of
Sheffield to EE (Ref: MC_PC_14115), and a National Science
Foundation grant DEB-1456634 to Tatiana Tatarinova and EE.
We thank the many public participants for donating their DNA
sequences for scientific studies and The Genographic Project’s
public database for providing us with their data.
The Supplementary Material for this article can be found
online at:
Alexander, D. H., and Lange, K. (2011). Enhancements to the ADMIXTURE
algorithm for individual ancestry estimation. BMC Bioinformatics 12:246.
doi: 10.1186/1471-2105-12-246
Aptroot, M. (2016). Yiddish language and Ashkenazic Jews: a perspective
from culture, language and literature. Genome Biol. Evol. 8, 1948–1949.
doi: 10.1093/gbe/evw131
Atzmon, G., Hao, L., Pe’er, I., Velez, C., Pearlman, A., Palamara, P. F., et al. (2010).
Abraham’s children in the genome era: major Jewish diaspora populations
comprise distinct genetic clusters with shared Middle Eastern ancestry. Am. J.
Hum. Genet. 86, 850–859. doi: 10.1016/j.ajhg.2010.04.015
Baron, S. W. (1937). Social and Religious History of the Jews, vol. 1. New York, NY:
Columbia University Press.
Baron, S. W. (1952). Social and Religious History of the Jews, vol. 2. New York, NY:
Columbia University Press.
Baron, S. W. (1957). Social and Religious History of the Jews, vol. 3. High Middle
Ages: Heirs of Rome and Persia. New York, NY: Columbia University Press.
Behar, D. M., Harmant, C., Manry, J., van Oven, M., Haak, W., Martinez-Cruz, B.,
et al. (2012). The Basque paradigm: genetic evidence of a maternal continuity
in the Franco-Cantabrian region since pre-Neolithic times. Am. J. Hum. Genet.
90, 486–493. doi: 10.1016/j.ajhg.2012.01.002
Behar, D. M., Metspalu, M., Baran, Y., Kopelman, N. M., Yunusbayev,
B., Gladstein, A., et al. (2013). No evidence from genome-wide data
of a Khazar origin for the Ashkenazi Jews. Hum. Biol. 85, 859–900.
doi: 10.3378/027.085.0604
Ben-Sasson, H. H. (1976). A History of the Jewish People. Cambridge, MA: Harvard
University Press.
Cansdale, L. (1996). The Radhanites: ninth century Jewish international traders.
Aust. J. Jewish Stud. 10, 65–77.
Cansdale, L. (1998). “Jews on the Silk Road,” in Worlds of the Silk Roads: Ancient
and Modern, eds D. Christian and C. Benjamin (Turnhout: Brepols), 23–30.
doi: 10.1484/M.SRS-EB.4.00037
Cavalli-Sforza, L. L. (1997). Genes, peoples, and languages. Proc. Natl. Acad. Sci.
U.S.A. 94, 7719–7724. doi: 10.1073/pnas.94.15.7719
Das, R., Wexler, P., Pirooznia, M., and Elhaik, E. (2016). Localizing Ashkenazic
Jews to primeval villages in the ancient Iranian lands of Ashkenaz. Genome Biol.
Evol. 8, 1132–1149. doi: 10.1093/gbe/evw046
Elhaik, E. (2013). The missing link of Jewish European ancestry: Contrasting
the Rhineland and the Khazarian hypotheses. Genome Biol. Evol. 5, 61–74.
doi: 10.1093/gbe/evs119
Elhaik, E. (2016). In search of the jüdische Typus: a proposed benchmark to test the
genetic basis of Jewishness challenges notions of “Jewish biomarkers.” Front.
Genet. 7:141. doi: 10.3389/fgene.2016.00141
Elhaik, E., Greenspan, E., Staats, S., Krahn, T., Tyler-Smith, C., Xue, Y., et al.
(2013). The GenoChip: a new tool for genetic anthropology. Genome Biol. Evol.
5, 1021–1031. doi: 10.1093/gbe/evt066
Elhaik, E., Tatarinova, T., Chebotarev, D., Piras, I. S., Maria Calò, C., De Montis,
A., et al. (2014). Geographic population structure analysis of worldwide
human populations infers their biogeographical origins. Nat. Commun. 5:3513.
doi: 10.1038/ncomms4513
Falk, R. (2006). Zionism and the Biology of Jews (Hebrew). Tel Aviv: Resling.
Finkelstein, I., and Silberman, N. A. (2002). The Bible Unearthed: Archaeology’s
New Vision of Ancient Israel and the Origin of Its Sacred Texts. New York, NY:
Simon and Schuster.
Flegontov, P., Kassian, A., Thomas, M. G., Fedchenko, V., Changmai, P., Starostin,
G., et al. (2016). Pitfalls of the geographic population structure (GPS) approach
applied to human genetic history: a case study of Ashkenazi Jews. Genome Biol.
Evol. 8, 2259–2265. doi: 10.1093/gbe/evw162
Frendo, A. J. (2004). “Back to basics: a holistic approach to the
problem of the emergence of ancient Israel,” in In Search of Pre-Exilic
Israel, ed J. Day (New York, NY: T&T Clark International), 41–64.
doi: 10.1097/00152193-200410000-00004
Gil, M. (1974). The Rādhānite merchants and the land of Rādhān. J. Econ. Soc. Hist.
Orient. 17, 299–328.
Hadj-Sadok, M. (1949). Description du Maghreb et de l’Europe au IIIe-IXe siecle.
Algiers: Carbonel.
Hammer, M. F., Redd, A. J., Wood, E. T., Bonner, M. R., Jarjanazi, H., Karafet,
T., et al. (2000). Jewish and Middle Eastern non-Jewish populations share a
common pool of Y-chromosome biallelic haplotypes. Proc. Natl. Acad. Sci.
U.S.A. 97, 6769–6774. doi: 10.1073/pnas.100115997
Hansen, V. (2012). The Silk Road: A New History. New York, NY: Oxford
University Press.
Hellenthal, G., Myers, S., Reich, D., Busby, G. B. J., Lipson, M., Capelli,
C., et al. (2016). The Kalash genetic isolate? the evidence for recent
admixture. Am. J. Hum. Genet. 98, 396–397. doi: 10.1016/j.ajhg.2015.
Jobling, M., Hurles, M. E., and Tyler-Smith, C. (2013). Human Evolutionary
Genetics: Origins, Peoples and Disease. New York, NY: Garland Science.
Khordadhbeh, I. (1889). The Book of Roads and Kingdoms (Kitab al-Masalik
Wa-’al-Mamalik), p. 114 in Bibliotheca Geographorum Arabicorum, Edited by
de Goeje. Leiden: Brill.
King, R. D. (2001). The paradox of creativity in diaspora: the Yiddish language and
Jewish identity. Stud. Ling. Sci. 31, 213–229.
Kraemer, R. S. (2010). Unreliable Witnesses: Religion, Gender, and History in
the Greco-Roman Mediterranean. New York, NY: Oxford University Press.
doi: 10.1093/acprof:oso/9780199743186.001.0001
Lazaridis, I., Nadel, D., Rollefson, G., Merrett, D. C., Rohland, N., Mallick, S., et al.
(2016). Genomic insights into the origin of farming in the ancient Near East.
Nature 536, 419–424. doi: 10.1038/nature19310
Li, J. Z., Absher, D. M., Tang, H., Southwick, A. M., Casto, A. M., Ramachandran,
S., et al. (2008). Worldwide human relationships inferred from genome-
wide patterns of variation. Science 319, 1100–1104. doi: 10.1126/science.
Marshall, S., Das, R., Pirooznia, M., and Elhaik, E. (2016). Reconstructing Druze
population history. Sci. Rep. 6:35837. doi: 10.1038/srep35837
Novembre, J., Johnson, T., Bryc, K., Kutalik, Z., Boyko, A. R., Auton, A.,
et al. (2008). Genes mirror geography within Europe. Nature 456, 98–101.
doi: 10.1038/nature07331
Novembre, J., and Stephens, M. (2008). Interpreting principal component
analyses of spatial population genetic variation. Nat. Genet. 40, 646–649.
doi: 10.1038/ng.139
Ostrer, H. (2001). A genetic profile of contemporary Jewish populations. Nat. Rev.
Genet. 2, 891–898. doi: 10.1038/35098506
Ostrer, H. (2012). Legacy: A Genetic History of the Jewish People. Oxford: Oxford
University Press.
Patai, R. (1983). On Jewish Folklore. Detroit, MI: Wayne State University Press.
Patterson, N. J., Moorjani, P., Luo, Y., Mallick, S., Rohland, N., Zhan, Y.,
et al. (2012). Ancient admixture in human history. Genetics 192, 1065–1093.
doi: 10.1534/genetics.112.145037
Rabinowitz, L. I. (1945). The routes of the Radanites. Jew. Q. Rev. 35, 251–280.
doi: 10.2307/1452187
Rabinowitz, L. I. (1948). Jewish Merchant Adventurers: A Study of the Radanites.
London: Goldston.
Robert, J. N. (2014). De Rome à la Chine. Sur les Routes de la soie au Temps des
Césars. Paris: Les Belles Lettres.
Sand, S. (2009). The Invention of the Jewish People. London: Verso.
Sand, S. (2011). The Words and the Land: Israeli Intellectuals and the Nationalist
Myth. Los Angeles, CA: Semiotext(e).
Telegdi, Z. (1933). A Talmudi Irodalom iráni Kölcsönszavainak Hangtana.
Budapest: Kertész József Ny.
van Straten, J. (2004). Jewish migrations from Germany to Poland: the Rhineland
hypothesis revisited. Mankind Q. 44, 367–384.
van Straten, J. (2007). Early modern Polish Jewry: the Rhineland hypothesis
revisited. Hist. Methods 40, 39–50. doi: 10.3200/HMTS.40.1.39-50
van Straten, J., and Snel, H. (2006). The Jewish “demographic miracle” in
nineteenth-century Europe: fact or fiction? Hist. Methods 39, 123–131.
doi: 10.3200/HMTS.39.3.123-131
Veeramah, K. R., Tönjes, A., Kovacs, P., Gross, A., Wegmann, D., Geary, P.,
et al. (2011). Genetic variation in the Sorbs of eastern Germany in the context
of broader European genetic diversity. Eur. J. Hum. Genet. 19, 995–1001.
doi: 10.1038/ejhg.2011.65
Weinreich, M. (2008). History of the Yiddish Language. New Haven, CT: Yale
University Press.
Wexler, P. (1991). Yiddish—the fifteenth Slavic language. A study of partial
language shift from Judeo-Sorbian to German. Int. J. Soc. Lang. 1991, 9–150,
215–225. doi: 10.1515/ijsl.1991.91.9
Wexler, P. (1993). The Ashkenazic Jews: a Slavo-Turkic People in Search of a Jewish
identity. Columbus, OH: Slavica.
Wexler, P. (1996). The Non-Jewish Origins of the Sephardic Jews. Albany, NY: State
University of New York Press.
Wexler, P. (2010). “Do Jewish Ashkenazim (i.e. “Scythians”) originate in
Iran and the Caucasus and is Yiddish Slavic?,” in Sprache und Leben der
frühmittelalterlichen Slaven: Festschrift für Radoslav Katičić zum 80. Geburtstag,
eds E. Stadnik-Holzer and G. Holzer (Frankfurt: Peter Lang), 189–216.
Wexler, P. (2011). A covert Irano-Turko-Slavic population and its two covert Slavic
languages: The Jewish Ashkenazim (Scythians), Yiddish and ‘Hebrew’. Zbornik
Matice srpske za Slavistiku 80, 7–46.
Wexler, P. (2012). “Relexification in Yiddish: a Slavic language masquerading as
a High German dialect?,” in Studien zu Sprache, Literatur und Kultur bei den
Slaven: Gedenkschrift für George Y. Shevelov aus Anlass seines 100. Geburtstages
und 10. Todestages, eds A. Danylenko and S. H. Vakulenko (München, Berlin:
Verlag Otto Sagner), 212–230.
Wexler, P. (2016). “Cross-border Turkic and Iranian language retention in
the West and East Slavic lands and beyond: a tentative classification,” in
The Palgrave Handbook of Slavic Languages, Identities and Borders, eds T.
Kamusella, M. Nomachi, and C. Gibson (London: Palgrave Macmillan), 8–25.
Wexler, P. (2017). Looking at the overlooked. (The Iranian and other Asian and
African components of the Slavic, Iranian and Turkic “Yiddishes” and their
common Hebrew lexicon along the Silk Roads).
Xue, J., Lencz, T., Darvasi, A., Pe’er, I., and Carmi, S. (2017). The time and place
of European admixture in Ashkenazi Jewish history. PLoS Genet. 13:e1006644.
doi: 10.1371/journal.pgen.1006644
Yang, W. Y., Novembre, J., Eskin, E., and Halperin, E. (2012). A model-based
approach for analysis of spatial structure in genetic data. Nat. Genet. 44,
725–731. doi: 10.1038/ng.2285
Yang, W. Y., Platt, A., Chiang, C. W.-K., Eskin, E., Novembre, J.,
and Pasaniuc, B. (2014). Spatial localization of recent ancestors for
admixed individuals. G3 (Bethesda) 4, 2505–2518. doi: 10.1534/g3.114.
Conflict of Interest Statement: EE is a consultant for DNA Diagnostic Centre.
The other authors declare that the research was conducted in the absence of
any commercial or financial relationships that could be construed as a potential
conflict of interest.
The reviewer PF declared a past co-authorship with one of the authors to
the handling Editor, who ensured that the process nevertheless met the standards
of a fair and objective review.
Copyright © 2017 Das, Wexler, Pirooznia and Elhaik. This is an open-access article
distributed under the terms of the Creative Commons Attribution License (CC BY).
The use, distribution or reproduction in other forums is permitted, provided the
original author(s) or licensor are credited and that the original publication in this
journal is cited, in accordance with accepted academic practice. No use, distribution
or reproduction is permitted which does not comply with these terms.

Supplementary resource (1)

... Given the strong correspondence between geography and genetics [8,9], a number of strategies have focused on the delineation of the precise geographic origin of human populations using high-resolution genetic data. The Geographic Population Structure (GPS) algorithm is an admixture based tool that has so far been employed for the biogeographical analyses of human populations and is likely superior to other existing methods for the same [9][10][11][12][13]. It has been successfully used to reconstruct history of several human populations worldwide [9,[13][14][15][16][17][18][19]. ...
... The Geographic Population Structure (GPS) algorithm is an admixture based tool that has so far been employed for the biogeographical analyses of human populations and is likely superior to other existing methods for the same [9][10][11][12][13]. It has been successfully used to reconstruct history of several human populations worldwide [9,[13][14][15][16][17][18][19]. In brief, it deduces the genomic proximity between the query and reference individuals to determine the likely biogeographical affinity of the former using the geographic coordinates (latitude and longitude) corresponding to the latter as reference. ...
... Here we sought to evaluate whether the GPS algorithm, largely employed for biogeographical analyses of human populations [9,[13][14][15][16] could be applied to non-human species, and to estimate its efficacy in doing the same. We applied the GPS tool to interrogate available gorilla genomes [7] and estimated the ancestral Inference of the biogeographic proximity of individuals, based on genetic data has been challenging and of interest to biologists over decades. ...
Full-text available
Background The utilization of high resolution genome data has important implications for the phylogeographical evaluation of non-human species. Biogeographical analyses can yield detailed understanding of their population biology and facilitate the geo-localization of individuals to promote their efficacious management, particularly when bred in captivity. The Geographic Population Structure (GPS) algorithm is an admixture based tool for inference of biogeographical affinities and has been employed for the geo-localization of various human populations worldwide. Here, we applied the GPS tool for biogeographical analyses and localization of the ancestral origins of wild and captive gorilla genomes, of unknown geographic source, available in the Great Ape Genome Project (GAGP), employing Gorillas with known ancestral origin as the reference data. Results Our findings suggest that GPS was successful in recapitulating the population history and estimating the geographic origins of all gorilla genomes queried and localized the wild gorillas with unknown geographical origin < 150 km of National Parks/Wildlife Reserves within the political boundaries of countries, considered as prominent modern-day abode for gorillas in the wild. Further, the GPS localization of most captive-born gorillas was congruent with their previously presumed ancestral homes. Conclusions Currently there is limited knowledge of the ancestral origins of most North American captive gorillas, and our study highlights the usefulness of GPS for inferring ancestry of captive gorillas. Determination of the native geographical source of captive gorillas can provide valuable information to guide breeding programs and ensure their appropriate management at the population level. Finally, our findings shine light on the broader applicability of GPS for protecting the genetic integrity of other endangered non-human species, where controlled breeding is a vital component of their conservation.
... The Geographic Population Structure (GPS) algorithm is a recently devised admixture based tool for biogeographical analyses. While GPS has been demonstrated to be superior to other existing methods for tracing the ancestry of human populations [2][3][4][5][6][7], it may not be accurate for tracing ancestry of recently admixed individuals and groups (up to 1000 years before present) [2,8]. It relies on extrapolating the genomic similarity between the query and reference populations to infer the likely biogeographical affinity of the former using the geographic locations (latitude and longitude) corresponding to the latter as a reference. ...
... It relies on extrapolating the genomic similarity between the query and reference populations to infer the likely biogeographical affinity of the former using the geographic locations (latitude and longitude) corresponding to the latter as a reference. GPS has been effectively employed for reconstructing the population history of several populations worldwide [2,6,7,[9][10][11]. However, so far its utility and robustness in accurately localizing highly admixed populations whose genetic structure has been modified by significant demographic, biological and social factors has remained largely unexplored. ...
... Presently several biogeographical approaches using highresolution next-generation sequencing data are available that are based on identity by distance, nevertheless the accurate geo-localization of populations has remained a challenge. GPS has been used successfully for determination of the biogeographical affinity of several worldwide populations [2,6,7,[9][10][11]. This approach correlates the admixture proportions of the query populations with that of the reference groups known to have resided in a specific geographic location for a substantial period of time and infers the geographic coordinates (latitude and longitude) of the former based on the geographic information pertaining to the latter. ...
Full-text available
Background: The utilization of biological data to infer the geographic origins of human populations has been a long standing quest for biologists and anthropologists. Several biogeographical analysis tools have been developed to infer the geographical origins of human populations utilizing genetic data. However due to the inherent complexity of genetic information these approaches are prone to misinterpretations. The Geographic Population Structure (GPS) algorithm is an admixture based tool for biogeographical analyses and has been employed for the geo-localization of various populations worldwide. Here we sought to dissect its sensitivity and accuracy for localizing highly admixed groups. Given the complex history of population dispersal and gene flow in the Indian subcontinent, we have employed the GPS tool to localize five South Asian populations, Punjabi, Gujarati, Tamil, Telugu and Bengali from the 1000 Genomes project, some of whom were recent migrants to USA and UK, using populations from the Indian subcontinent available in Human Genome Diversity Panel (HGDP) and those previously described as reference. Results: Our findings demonstrate reasonably high accuracy with regards to GPS assignment even for recent migrant populations sampled elsewhere, namely the Tamil, Telugu and Gujarati individuals, where 96%, 87% and 79% of the individuals, respectively, were positioned within 600 km of their native locations. While the absence of appropriate reference populations resulted in moderate-to-low levels of precision in positioning of Punjabi and Bengali genomes. Conclusions: Our findings reflect that the GPS approach is useful but likely overtly dependent on the relative proportions of admixture in the reference populations for determination of the biogeographical origins of test individuals. We conclude that further modifications are desired to make this approach more suitable for highly admixed individuals.
... Biogeographical analysis was performed using the Geographical Population Structure (GPS) algorithm, which has been successfully used to reconstruct history of several populations worldwide [45][46][47][48][49][50][51][52][53]. GPS correlates the admixture patterns of individuals of unknown origins using the admixture fractions (GEN file) and geographical locations or coordinates (GEO file) of reference individuals with known geographical origin. ...
Full-text available
Background The population structure of the Indian subcontinent is a tapestry of extraordinary diversity characterized by the amalgamation of autochthonous and immigrant ancestries and rigid enforcement of sociocultural stratification. Here we investigated the genetic origin and population history of the Kumhars, a group of people who inhabit large parts of northern India. We compared 27 previously published Kumhar SNP genotype data sampled from Uttar Pradesh in north India to various modern day and ancient populations. Results Various approaches such as Principal Component Analysis (PCA), Admixture, TreeMix concurred that Kumhars have high ASI ancestry, minimal Steppe component and high genomic proximity to the Kurchas, a small and relatively little-known population found ~ 2500 km away in Kerala, south India. Given the same, biogeographical mapping using Geographic Population Structure (GPS) assigned most Kumhar samples in areas neighboring to those where Kurchas are found in south India. Conclusions We hypothesize that the significant genomic similarity between two apparently distinct modern-day Indian populations that inhabit well separated geographical areas with no known overlapping history or links, likely alludes to their common origin during or post the decline of the Indus Valley Civilization (estimated by ALDER). Thereafter, while they dispersed towards opposite ends of the Indian subcontinent, their genomic integrity and likeness remained preserved due to endogamous social practices. Our findings illuminate the genomic history of two Indian populations, allowing a glimpse into one or few of numerous of human migrations that likely occurred across the Indian subcontinent and contributed to shape its varied and vibrant evolutionary past.
... The admixture components can be used to correct for population stratification (Elhaik and Ryan, 2019), in the same manner as principal components are used, excepting that they model admixture directly, whereas PCA does not. This approach, first employed for biogeography (Elhaik et al., 2014), has been routinely for population genetic investigations and was shown to be applicable to both modern and ancient populations (Flegontov et al., 2016;Marshall et al., 2016;Das et al., 2017). Despite its premise it has yet been implemented in biobanks; the barriers resemble those of Mixture Models in that a "correct" set of gene pools is hard to establish. ...
The past years saw the rise of genomic biobanks and mega-scale meta-analysis of genomic data that promise to reveal the genetic underpinnings of health and disease. However, the over-representation of Europeans in genomic studies not only limit the global understanding of disease risk and intervention efficacy, but also inhibit viable research into the genomic differences between carriers and patients. Whilst the community has agreed that more diverse samples are required, it is not enough to blindly increase diversity; the diversity must be quantified, compared, and annotated to lead to insight. Genetic annotations from separate biobanks need to be comparable, computable, operate without access to raw data due to privacy concerns. But they must be comparable, both for regular research and to allow international comparison in response to pandemics. Here, we evaluate the appropriateness of commonly used genomic tools used to depict population structure in a standardized and comparable manner. The end goal is to reduce the effects of confounding and learn from genuine variation in genetic effects on phenotypes across populations, which will improve the value of biobanks, locally and internationally, increase the accuracy of association analyses, and inform developmental efforts.
... Due to the success of the admixture-based method that employs AIMs in describing and classifying individuals to populations [17,[43][44][45], we sought to develop such a method that employs aAIMs. ...
Full-text available
The rapid accumulation of ancient human genomes from various areas and time periods potentially enables the expansion of studies of biodiversity, biogeography, forensics, population history, and epidemiology into past populations. However, most ancient DNA (aDNA) data were generated through microarrays designed for modern-day populations, which are known to misrepresent the population structure. Past studies addressed these problems by using ancestry informative markers (AIMs). It is, thereby, unclear whether AIMs derived from contemporary human genomes can capture ancient population structures, and whether AIM-finding methods are applicable to aDNA, provided that the high missingness rates in ancient—and oftentimes haploid—DNA can also distort the population structure. Here, we define ancient AIMs (aAIMs) and develop a framework to evaluate established and novel AIM-finding methods in identifying the most informative markers. We show that aAIMs identified by a novel principal component analysis (PCA)-based method outperform all of the competing methods in classifying ancient individuals into populations and identifying admixed individuals. In some cases, predictions made using the aAIMs were more accurate than those made with a complete marker set. We discuss the features of the ancient Eurasian population structure and strategies to identify aAIMs. This work informs the design of single nucleotide polymorphism (SNP) microarrays and the interpretation of aDNA results, which enables a population-wide testing of primordialist theories.
Full-text available
Motivation: In clinical trials, individuals are matched using demographic criteria, paired, and then randomly assigned to treatment and control groups to determine a drug's efficacy. A chief cause for the irreproducibility of results across pilot to Phase III trials is population stratification bias caused by the uneven distribution of ancestries in the treatment and control groups. Results: Pair Matcher (PaM) addresses stratification bias by optimising pairing assignments a priori and/or a posteriori to the trial using both genetic and demographic criteria. Using simulated and real datasets, we show that PaM identifies ideal and near-ideal pairs that are more genetically homogeneous than those identified based on competing methods, including the commonly used principal component analysis (PCA). Homogenising the treatment (or case) and control groups can be expected to improve the accuracy and reproducibility of the trial or genetic study. PaM's ancestral inferences also allow characterizing responders and developing a precision medicine approach to treatment. Availability: PaM is freely available via R and a web-interface at Supplementary information: Supplementary data are available at Bioinformatics online.
Full-text available
The human population displays wide variety in demographic history, ancestry, content of DNA derived from hominins or ancient populations, adaptation, traits, copy number variation (CNVs), drug response, and more. These polymorphisms are of broad interest to population geneticists, forensics investigators, and medical professionals. Historically, much of that knowledge was gained from population survey projects. While many commercial arrays exist for genome-wide single-nucleotide polymorphism (SNP) genotyping, their design specifications are limited and they do not allow a full exploration of biodiversity. We thereby aimed to design the Diversity of REcent and Ancient huMan (DREAM) - an all-inclusive microarray that would allow both identification of known associations and exploration of standing questions in genetic anthropology, forensics, and personalized medicine. DREAM includes probes to interrogate ancestry informative markers obtained from over 450 human populations, over 200 ancient genomes, and 10 archaic hominins. DREAM can identify 94% and 61% of all known Y and mitochondrial haplogroups, respectively and was vetted to avoid interrogation of clinically relevant markers. To demonstrate its capabilities, we compared its FST distributions with those of the 1000 Genomes Project and commercial arrays. Although all arrays yielded similarly shaped (inverse J) FST distributions, DREAM's autosomal and X-chromosomal distributions had the highest mean FST, attesting to its ability to discern subpopulations. DREAM performances are further illustrated in biogeographical, identical by descent (IBD), and CNV analyses. In summary, with approximately 800,000 markers spanning nearly 2,000 genes, DREAM is a useful tool for genetic anthropology, forensic, and personalized medicine studies.
Full-text available
The Ashkenazi Jewish (AJ) population is important in genetics due to its high rate of Mendelian disorders. AJ appeared in Europe in the 10th century, and their ancestry is thought to comprise European (EU) and Middle-Eastern (ME) components. However, both the time and place of admixture are subject to debate. Here, we attempt to characterize the AJ admixture history using a careful application of new and existing methods on a large AJ sample. Our main approach was based on local ancestry inference, in which we first classified each AJ genomic segment as EU or ME, and then compared allele frequencies along the EU segments to those of different EU populations. The contribution of each EU source was also estimated using GLOBETROTTER and haplotype sharing. The time of admixture was inferred based on multiple statistics, including ME segment lengths, the total EU ancestry per chromosome, and the correlation of ancestries along the chromosome. The major source of EU ancestry in AJ was found to be Southern Europe (≈60–80% of EU ancestry), with the rest being likely Eastern European. The inferred admixture time was ≈30 generations ago, but multiple lines of evidence suggest that it represents an average over two or more events, pre- and post-dating the founder event experienced by AJ in late medieval times. The time of the pre-bottleneck admixture event, which was likely Southern European, was estimated to ≈25–50 generations ago.
Full-text available
The Druze are an aggregate of communities in the Levant and Near East living almost exclusively in the mountains of Syria, Lebanon and Israel whose ~1000 year old religion formally opposes mixed marriages and conversions. Despite increasing interest in genetics of the population structure of the Druze, their population history remains unknown. We investigated the genetic relationships between Israeli Druze and both modern and ancient populations. We evaluated our findings in light of three hypotheses purporting to explain Druze history that posit Arabian, Persian or mixed Near Eastern- Levantine roots. The biogeographical analysis localised proto-Druze to the mountainous regions of southeastern Turkey, northern Iraq and southeast Syria and their descendants clustered along a trajectory between these two regions. The mixed Near Eastern–Middle Eastern localisation of the Druze, shown using both modern and ancient DNA data, is distinct from that of neighbouring Syrians, Palestinians and most of the Lebanese, who exhibit a high affinity to the Levant. Druze biogeographic affinity, migration patterns, time of emergence and genetic similarity to Near Eastern populations are highly suggestive of Armenian-Turkish ancestries for the proto-Druze.
The debate as to whether Jewishness is a biological trait inherent from an “authentic” “Jewish type” (jüdische Typus) ancestor or a system of beliefs has been raging for over two centuries. While the accumulated biological and anthropological evidence supports the latter argument, recent genetic findings, bolstered by the direct-to-consumer genetic industry, purport to identify Jews or quantify one’s Jewishness from genomic data. To test the merit of claims that Jews and non-Jews are genetically distinguishable, we propose a benchmark where genomic data of Jews and non-Jews are hybridized over two generations and the observed and predicted Jewishness of the terminal offspring according to either the Orthodox religious law (Halacha) or the Israeli Law of Return are compared. Members of academia, the public, and 23andMe were invited to use the benchmark to test claims that Jews are genetically distinct from non-Jews. Here, we report the findings from these trials. We also compare the genomic similarity of ∼300 individuals from nearly thirty Afro-Eurasian Jewish communities to a simulated jüdische Typus population. The results are discussed in light of modern trends in the genetics of Jews and related fields and provide a tentative answer to the ageless question “who is a Jew?”
We report genome-wide ancient DNA from 44 ancient Near Easterners ranging in time between ~12,000 and 1,400 BCE, from Natufian hunter–gatherers to Bronze Age farmers. We show that the earliest populations of the Near East derived around half their ancestry from a ‘Basal Eurasian’ lineage that had little if any Neanderthal admixture and that separated from other non-African lineages before their separation from each other. The first farmers of the southern Levant (Israel and Jordan) and Zagros Mountains (Iran) were strongly genetically differentiated, and each descended from local hunter–gatherers. By the time of the Bronze Age, these two populations and Anatolian-related farmers had mixed with each other and with the hunter–gatherers of Europe to drastically reduce genetic differentiation. The impact of the Near Eastern farmers extended beyond the Near East: farmers related to those of Anatolia spread westward into Europe; farmers related to those of the Levant spread southward into East Afri
In a recent interdisciplinary study, Das and co-authors have attempted to trace the homeland of Ashkenazi Jews and of their historical language, Yiddish (Das et al. 2016. Localizing Ashkenazic Jews to Primeval Villages in the Ancient Iranian Lands of Ashkenaz. Genome Biology and Evolution). Das and co-authors applied the geographic population structure (GPS) method to autosomal genotyping data and inferred geographic coordinates of populations supposedly ancestral to Ashkenazi Jews, placing them in Eastern Turkey. They argued that this unexpected genetic result goes against the widely accepted notion of Ashkenazi origin in the Levant, and speculated that Yiddish was originally a Slavic language strongly influenced by Iranian and Turkic languages, and later remodeled completely under Germanic influence. In our view, there are major conceptual problems with both the genetic and linguistic parts of the work. We argue that GPS is a provenancing tool suited to inferring the geographic region where a modern and recently unadmixed genome is most likely to arise, but is hardly suitable for admixed populations and for tracing ancestry up to 1000 years before present, as its authors have previously claimed. Moreover, all methods of historical linguistics concur that Yiddish is a Germanic language, with no reliable evidence for Slavic, Iranian, or Turkic substrata.
The typology of Yiddish and the name Ashkenaz cannot serve as arguments to support the theory put forward by Das et al. (2016). (Localizing Ashkenazic Jews to primeval villages in the ancient Iranian lands of Ashkenaz. Genome Biol Evol. 8:1132–1149.) that the origin of Ashkenazic Jews can be located in ancient Iran. Yiddish is a Germanic, not a Slavic language. The history of the use of the term Ashkenaz from the Middle Ages onward is well documented. Ashkenazic Jewry is named for the Hebrew and Yiddish designation for Germany, originally a Biblical term.
The Yiddish language is over one thousand years old and incorporates German, Slavic, and Hebrew elements. The prevalent view claims Yiddish has a German origin, whereas the opposing view posits a Slavic origin with strong Iranian and weak Turkic substrata. One of the major difficulties in deciding between these hypotheses is the unknown geographical origin of Yiddish-speaking Ashkenazic Jews (AJs). An analysis of 393 Ashkenazic, Iranian, and mountain Jews and over 600 non-Jewish genomes demonstrated that Greeks, Romans, Iranians, and Turks exhibit the highest genetic similarity with AJs. The Geographic Population Structure (GPS) analysis localized most AJs along major primeval trade routes in northeastern Turkey adjacent to primeval villages with names that may be derived from “Ashkenaz.” Iranian and mountain Jews were localized along trade routes on Turkey’s eastern border. Loss of maternal haplogroups was evident in non-Yiddish-speaking AJs. Our results suggest that AJs originated from a Slavo-Iranian confederation, which the Jews call “Ashkenazic” (i.e., “Scythian”), though these Jews probably spoke Persian and/or Ossete. This is compatible with linguistic evidence suggesting that Yiddish is a Slavic language created by Irano-Turko-Slavic Jewish merchants along the Silk Roads as a cryptic trade language, spoken only by its originators to gain an advantage in trade. Later, in the 9th century, Yiddish underwent relexification by adopting a new vocabulary that consists of a minority of German and Hebrew and a majority of newly coined Germanoid and Hebroid elements that replaced most of the original Eastern Slavic and Sorbian vocabularies, while keeping the original grammars intact.
Speakers of Aramaic, Indo-Iranian, Turkic, Finno-Ugric, Mongolic, and possibly Indic languages migrated into Europe between the third millennium BCE and 1300 CE; see, for instance, Syrians, Jews; Sarmatians, Scythians, Jas, Jazygians, Chvalis, Alans, Avars, Huns, Roma (on their possible Iranian origins, see below), Cumans, Khazars, Kovars, Pechenegs, proto-Bulgarians, Hungarians, Mongolians, respectively. Apart from Hungarians and Turkic Karaites (at least until recently), most migratory groups eventually assimilated to the local majority ethnic groups and adopted their languages.