Article

Abraham's Children in the Genome Era: Major Jewish Diaspora Populations Comprise Distinct Genetic Clusters with Shared Middle Eastern Ancestry

Abstract

For more than a century, Jews and non-Jews alike have tried to define the relatedness of contemporary Jewish people. Previous genetic studies of blood group and serum markers suggested that Jewish groups had a Middle Eastern origin, with greater genetic similarity between paired Jewish populations. However, these and successor studies of monoallelic Y chromosomal and mitochondrial genetic markers did not resolve the issues of within- and between-group Jewish genetic identity. Here, genome-wide analysis of seven Jewish groups (Iranian, Iraqi, Syrian, Italian, Turkish, Greek, and Ashkenazi) and comparison with non-Jewish groups demonstrated distinctive Jewish population clusters, each with shared Middle Eastern ancestry, proximity to contemporary Middle Eastern populations, and variable degrees of European and North African admixture. Two major groups were identified by principal component, phylogenetic, and identity-by-descent (IBD) analysis: Middle Eastern Jews and European/Syrian Jews. The IBD segment sharing and the proximity of European Jews to each other and to southern European populations suggested similar origins for European Jewry and refuted large-scale genetic contributions of Central and Eastern European and Slavic populations to the formation of Ashkenazi Jewry. Rapid decay of IBD in Ashkenazi Jewish genomes was consistent with a severe bottleneck followed by large expansion, such as occurred with the so-called demographic miracle of population expansion from 50,000 people at the beginning of the 15th century to 5,000,000 people at the beginning of the 19th century. Thus, this study demonstrates that European/Syrian and Middle Eastern Jews represent a series of geographical isolates or clusters woven together by shared IBD genetic threads.
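The "demographic miracle" figures imply a specific average growth rate. As a back-of-the-envelope sketch (assuming a 400-year interval from 1400 to 1800, smooth exponential growth, and a hypothetical 25-year generation time, none of which the abstract specifies), the hundredfold increase works out as follows:

```python
import math


def annual_growth_rate(start, end, years):
    """Constant annual growth rate r such that start * (1 + r) ** years == end."""
    return (end / start) ** (1.0 / years) - 1.0


def doubling_time(rate):
    """Years needed to double in size at a constant annual growth rate."""
    return math.log(2) / math.log(1.0 + rate)


# Population figures from the abstract; the 400-year span and the
# 25-year generation time are assumptions made for this illustration.
START, END, YEARS, GENERATION = 50_000, 5_000_000, 400, 25

rate = annual_growth_rate(START, END, YEARS)
per_generation = (1.0 + rate) ** GENERATION

print(f"annual growth: {rate:.4%}")                       # about 1.16% per year
print(f"doubling time: {doubling_time(rate):.1f} years")  # about 60 years
print(f"growth per generation: x{per_generation:.2f}")    # about 1.33-fold
```

Under these assumptions the population doubles roughly every 60 years for four centuries, which is the sustained "large expansion" the abstract pairs with the preceding bottleneck.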

... This is the opinion of Zoossmann-Diskin (2010), whose analysis of genetic markers on autosomes, sex chromosomes, and mtDNA positioned Eastern European Jews closer to their non-Jewish neighbors, especially Italians, than to other Jewish populations. Their genetic proximity to Italians led Zoossmann-Diskin (2010), as well as the authors of similar studies (Atzmon et al. 2010; Behar et al. 2010), to assert an origin for Ashkenazim among Roman converts to Judaism. ...
... Despite these issues, this paper, along with those of Atzmon et al. (2010) and Behar et al. (2010), represents a great improvement over the landmark study by Hammer et al. (2000), which aggregates Russians, Germans, and Austrians into a combined population (a fictitious population in this sense) and moreover applies the χ² test to the aggregates, which should be avoided in cases ...
... While it has been claimed that these four lineages are of general Middle Eastern origin (Behar et al. 2004, pp. 4-5; see also Behar et al. 2006), the most recent research (Costa et al. 2013) suggests that they are, by and large, rooted in European prehistory, and that their presence must therefore be due to conversion. Whole-genome and autosomal studies (Atzmon et al. 2010; Bray et al. 2010; Kopelman et al. 2009; Need et al. 2009; Zoossmann-Diskin 2010) have bridged these seemingly divergent results by confirming the genetic proximity of Ashkenazi Jews to both Middle Eastern and European groups, concluding essentially that the Jewish genome is a series of geographical clusters tied loosely together. Overall, the closest genetic neighbors to most Jewish groups are Palestinian Arabs, Bedouins, and Druze (see Hammer et al. 2000; Nebel et al. 2000), southern European populations such as Cypriots and Italians (see Atzmon et al. 2010; Zoossmann-Diskin 2010; see also Behar et al. 2010), and, somewhat surprisingly, North ...
Preprint
The debate over the ethnogenesis of Ashkenazi Jewry is longstanding, and has been hampered by a lack of Jewish historiographical work between the Biblical and the early Modern eras. Most historians, as well as geneticists, situate Ashkenazi Jews as the descendants of Israelite tribes whose presence in Europe is owed to deportations during the Roman conquest of Palestine, as well as migration from Babylonia, and eventual settlement along the Rhine. By contrast, a few historians and other writers, most famously Arthur Koestler, have looked to migrations following the decline of the little-understood Medieval Jewish kingdom of Khazaria as the main source for Ashkenazi Jewry. A recent study of genetic variation in southeastern European populations (Elhaik 2012) also proposed a Khazarian origin for Ashkenazi Jews, eliciting considerable criticism from other scholars investigating Jewish ancestry who favor a Near Eastern origin of Ashkenazi populations. This paper re-examines the genetic data and analytical approaches used in these studies of Jewish ancestry, and situates them in the context of historical, linguistic, and archaeological evidence from the Caucasus, Europe, and the Near East. Based on this reanalysis, it appears not only that the Khazar Hypothesis per se is without serious merit, but also that the veracity of the ‘Rhineland Hypothesis’ may be questionable.
... The remaining authors use an arbitrary number of PCs or adopt ad hoc strategies to aid their decision, e.g., Ref. 33. Pardiñas et al. (Ref. 34), for example, selected the first five PCs "as recommended for most GWAS approaches" plus PCs 6, 9, 11, 12, 13, and 19, whereas Wainschtein et al. (Ref. 35) preferred the top 280 PCs. There are no proper usage guidelines for PCA, and "innovations" toward less restrictive usage are adopted quickly. ...
... Such analyses have a thematic interpretation, where the clustering of AJ samples is evidence of a shared Levantine origin, e.g., Refs. 12, 13; "short" distances between AJs and Levantines indicate close genetic relationships in support of a shared Levantine past, e.g., Ref. 12; whereas "short" distances between AJs and Europeans are evidence of admixture (Ref. 13). Finally, as a rule, the much shorter distances between AJs and the Caucasus or Turkish populations, observed by all recent studies, were ignored (Refs. 12, 13, 47, 48). ...
Article
Principal Component Analysis (PCA) is a multivariate analysis that reduces the complexity of datasets while preserving data covariance. The outcome can be visualized on colorful scatterplots, ideally with only a minimal loss of information. PCA applications, implemented in well-cited packages like EIGENSOFT and PLINK, are extensively used as the foremost analyses in population genetics and related fields (e.g., animal and plant or medical genetics). PCA outcomes are used to shape study design, identify and characterize individuals and populations, and draw historical and ethnobiological conclusions on origins, evolution, dispersion, and relatedness. The replicability crisis in science has prompted us to evaluate whether PCA results are reliable, robust, and replicable. We analyzed twelve common test cases using an intuitive color-based model alongside human population data. We demonstrate that PCA results can be artifacts of the data and can be easily manipulated to generate desired outcomes. PCA adjustment also yielded unfavorable outcomes in association studies. PCA results may not be reliable, robust, or replicable as the field assumes. Our findings raise concerns about the validity of results reported in the population genetics literature and related fields that place a disproportionate reliance upon PCA outcomes and the insights derived from them. We conclude that PCA may have a biasing role in genetic investigations and that 32,000-216,000 genetic studies should be reevaluated. An alternative mixed-admixture population genetic model is discussed.
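The color-based test cases can be imitated in miniature. The sketch below is not the authors' implementation: it assumes individuals are encoded as RGB vectors (red, green, blue, black), computes the sample covariance matrix, and obtains the principal-component variances by power iteration with deflation, using only the standard library. Changing nothing but the number of red individuals shifts the share of variance on PC1, a small instance of the abstract's point that PCA outcomes depend on how the data are sampled.

```python
import math
import random

# Hypothetical "color populations": every individual is an RGB vector.
RED, GREEN, BLUE, BLACK = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0), (0.0, 0.0, 0.0)


def covariance(points):
    """Sample covariance matrix of a list of equal-length vectors."""
    n, d = len(points), len(points[0])
    means = [sum(p[j] for p in points) / n for j in range(d)]
    return [
        [sum((p[i] - means[i]) * (p[j] - means[j]) for p in points) / (n - 1)
         for j in range(d)]
        for i in range(d)
    ]


def mat_vec(m, v):
    """Matrix-vector product for lists of lists."""
    return [sum(row[j] * v[j] for j in range(len(v))) for row in m]


def pc_variances(cov, k, iters=1000, seed=0):
    """Top-k eigenvalues of a symmetric covariance matrix (the PC variances),
    found by power iteration, deflating each component once it is found."""
    rng = random.Random(seed)
    m = [row[:] for row in cov]
    d = len(m)
    values = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(d)]  # generic positive start vector
        for _ in range(iters):
            w = mat_vec(m, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(a * b for a, b in zip(v, mat_vec(m, v)))  # Rayleigh quotient
        values.append(lam)
        # Remove the found component so the next pass converges to the next one.
        m = [[m[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return values


def explained_variance(points, k=2):
    """Fraction of total variance captured by each of the first k PCs."""
    cov = covariance(points)
    total = sum(cov[i][i] for i in range(len(cov)))
    return [lam / total for lam in pc_variances(cov, k)]


balanced = [RED] * 10 + [GREEN] * 10 + [BLUE] * 10 + [BLACK] * 10
skewed = [RED] * 100 + [GREEN] * 10 + [BLUE] * 10 + [BLACK] * 10
print([round(f, 3) for f in explained_variance(balanced)])  # [0.444, 0.444]
print([round(f, 3) for f in explained_variance(skewed)])    # [0.695, 0.241]
```

With equal sample sizes the two leading PCs carry the same variance, so their orientation is arbitrary; oversampling one color makes PC1 dominate, although the underlying "populations" are unchanged.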
... The first part of this paper focuses on some well-known genetic population studies of Jews and the criticism voiced against them by other geneticists. It weighs the arguments brought forward by various scientists and attempts to answer the question of whether the two much-discussed genome-wide studies of worldwide Jewry from 2010 (Atzmon et al. 2010; Behar et al. 2010) are biased and convey a biological view of Judaism. ...
... In the introduction, the team made it clear that their objective was to take up the old question of a biological definition of "the Jews". They (Atzmon et al. 2010) explicitly take on the notion of race: "For more than a century, Jews and non-Jews alike have tried to define the relatedness of contemporary Jewish people [...] whether the Jews constitute a race, a religious group, or something else." Atzmon and co-authors use race as a shorthand for a biologically definable entity, which might or might not be detected or verified by genetic research. Two years later, one of the paper's coauthors claimed that the study had "demonstrated a biological basis for Jewishness." ...
... The wording in the scientific paper itself is more careful: "In this study, Jewish populations […] formed a distinctive population cluster […], albeit one that is closely related to European and Middle Eastern, non-Jewish populations." (Atzmon et al. 2010). ...
Article
Scientific studies on the genetic proximity of Jews undertake to shed light on "who or what Jews really are". However, various scientists and scholars have warned that such studies reify racial thinking. This essay delineates and contextualizes the debate held between various geneticists and social scientists on the danger of reification within the Jewish context. This is mainly a debate about the impact of (traditional, religious, and Zionist) narratives on scientific research as well as on the ethical responsibility of scientists. The paper claims that such genetic studies test Jewish religious narratives against genetic research results and do not necessarily enforce old notions of distinctiveness.
... The worldwide scattered Jewish populations are shown to be a genetic mosaic of peoples of autosomally, patrilineally, and matrilineally different origins, who historically follow elements of Judaic tradition (Atzmon et al., 2010; Behar et al., 2010; Yardumian and Schurr, 2019). Jewish diaspora groups migrated to the Indian subcontinent in at least three known episodes, setting the stage for the rise of three distinct Jewish communities in three different coastal areas of India: the Cochin Jews in the Kerala port city of Cochin, the Bene Israel Jews in Mumbai, and the Baghdadi Jews in Kolkata (Katz, 2000; Israel, 2002). ...
... Later, we compared them with worldwide population genetic data from modern and ancient individuals. In the case of Jews, we used previously published genome-wide data (Atzmon et al., 2010; Behar et al., 2010) and analyzed them in the context of previously published modern and ancient data. ...
... However, considering the growing number of ancient samples, it is imperative to understand the genetic relationship of Indian Jews to ancient individuals from West Eurasia and South Asia (Allentoft et al., 2015; Haak et al., 2015; Mathieson et al., 2015; Lazaridis et al., 2016; Damgaard et al., 2018; Mittnik et al., 2018; Narasimhan et al., 2018). We studied the previously published genome-wide SNP data of Indian Jews (Atzmon et al., 2010; Behar et al., 2010), applying f3 statistics, D statistics, and qpAdm to fulfill this objective. We observed a substantially higher affinity between Mumbai Jews and ancient Steppe people in both the f3 and D statistics analyses compared with South Asian Dravidians (Figure 4, Table 1 in Ref III). ...
Thesis
This PhD thesis, prepared at the University of Tartu, addresses the genetics of the population history of South Asian peoples. Inhabited well before the Last Glacial Maximum, the region is now home to about 1.8 billion people, almost a quarter of the global population. Therefore, present-day human genetic variation, in particular outside sub-Saharan Africa, cannot be understood without deeper knowledge of the genetics of South Asian populations. This thesis is based on four published papers. The first focuses on selected populations inhabiting the northeastern Indus Valley, with particular reference to the ancient Indus Valley Civilization and the Vedic period that followed it. The second and third papers address historically somewhat better known migrations that brought the religiously distinct Parsi and Jewish peoples to India. The fourth paper analyzes the genetic variation of the populous Tharu tribe, living predominantly in Nepal but also in the northern provinces of India. Perhaps the most interesting finding of the first paper is that the Ror population, presumably already identified in Vedic texts, exhibits significant genetic affinity with northern Steppe and West European peoples, testifying to prehistoric north-to-south migration(s). The arrival of Parsis in South Asia in the 7th century was a consequence of the Islamization of Iran. Comparing Parsi genomes in their historical context, we observed extensive admixture with South Asians, asymmetric between paternal and maternal lineages. Nearly the same can be said about the different Indian communities that preserved Judaic traditions: their genomes show affinities to peoples living in the Near and Middle East. As for the genetically highly diverse Tharu tribe, a clearly distinct East Asian contribution can be seen, admixed with South Asian genetic heritage. It seems justified to identify the Tharu as a cultural rather than a demic phenomenon.
... PCA is used as the first data-exploration and data-description analysis in most population genetic studies (e.g., Atzmon et al. 2010; Behar et al. 2010; Campbell et al. 2012; Lazaridis et al. 2016). It has a wide range of applications. ...
... PCA has been used extensively to investigate the origins of AJs. In such analyses, it was assumed that the clustering of AJs by itself is evidence of Levantine origins (e.g., Atzmon et al. 2010; Behar et al. 2010; Carmi et al. 2014) and that the "short" distance between AJs and Levantine populations indicates their close genetic relationships and thereby the Levantine origins of AJs (e.g., Behar et al. 2010). The "short" distance to European populations was also interpreted as evidence of admixture (Atzmon et al. 2010). As a rule, the much shorter distances between AJs and the Caucasus or Turkish populations, observed by all recent studies, were ignored (Need et al. 2009; Atzmon et al. 2010; Behar et al. 2010; Bray et al. 2010; Carmi et al. 2014) in favor of Levantine or South European populations. ... (bioRxiv preprint, https://doi.org/10.1101/2021.04.11.439381)
Preprint
Principal Component Analysis (PCA) is a multivariate analysis that reduces the complexity of datasets while preserving their covariance and visualizes the information on colorful scatterplots, ideally with only a minimal loss of information. PCA applications are extensively used as the foremost analyses in population genetics and related fields (e.g., animal and plant or medical genetics), implemented in well-cited packages like EIGENSOFT and PLINK. PCA outcomes are used to shape study design, identify and characterize individuals and populations, and draw historical and ethnobiological conclusions on origins, evolution, whereabouts, and relatedness. The replicability crisis in science has prompted us to evaluate whether PCA results are reliable, robust, and replicable. We employed an intuitive color-based model alongside human population data for eleven common test cases. We demonstrate that PCA results are artifacts of the data and that they can be easily manipulated to generate desired outcomes. PCA results may not be reliable, robust, or replicable as the field assumes. Our findings raise concerns about the validity of results reported in the literature of population genetics and related fields that place a disproportionate reliance upon PCA outcomes and the insights derived from them. We conclude that PCA may have a biasing role in genetic investigations. An alternative mixed-admixture population genetic model is discussed.
... In the last decades, genetic information has become an important source for the study of human history and has been applied numerous times to various Jewish populations, first based on uniparental Y-chromosomal and mitochondrial DNA (mtDNA) markers [6-11] and later using genome-wide markers [12-15]. These studies found that most Jewish Diasporas share ancestry that can be traced back to the Middle East, in accordance with historical records [12-15]. Some of these studies included Bene Israel members, though with inconclusive results [15]. ...
... We genotyped Bene Israel individuals and combined the data with 14 other Jewish populations from the worldwide Diaspora previously genotyped using the same array [12,14]. We applied various quality control (QC) steps to these samples, resulting in 18 individuals of the Bene Israel community together with 347 samples from the other 14 Jewish populations. ...
Preprint
The Bene Israel Jewish community from West India is a unique population whose history before the 18th century remains largely unknown. Bene Israel members consider themselves descendants of Jews, yet the identity of their Jewish ancestors and the time of their arrival in India are unknown, with speculations on the arrival time ranging from the 8th century BCE to the 6th century CE. Here, we characterize the genetic history of Bene Israel by collecting and genotyping 18 Bene Israel individuals. Combining these with 486 individuals from 41 other Jewish, Indian, and Pakistani populations, and additional individuals from worldwide populations, we conducted comprehensive genome-wide analyses based on F_ST, principal component analysis, ADMIXTURE, identity-by-descent sharing, admixture linkage disequilibrium decay, haplotype sharing, and allele sharing autocorrelation decay, and contrasted patterns between the X chromosome and the autosomes. The genetics of Bene Israel individuals resemble those of local Indian populations, while at the same time the Bene Israel constitute a clearly separated and unique population in India. They are unique among the Indian and Pakistani populations we analyzed in sharing considerable genetic ancestry with other Jewish populations. Taken together, the results from all analyses point to Bene Israel being an admixed population with both Jewish and Indian ancestry, with a substantial genetic contribution from each of these ancestral populations. The admixture took place in the last millennium, about 19-33 generations ago. It involved Middle Eastern Jews and was sex-biased, with more male Jewish and female local contribution. It was followed by a population bottleneck and high endogamy, which can lead to increased prevalence of recessive diseases in this population. This study provides an example of how genetic analysis advances our knowledge of human history in cases where other disciplines lack the relevant data to do so.
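To relate the admixture date to calendar time, generations must be multiplied by a generation interval. The abstract does not state one; the sketch below assumes a hypothetical range of 25 to 30 years per generation, a common convention rather than a value from the study.

```python
def generations_to_years(generations, years_per_generation):
    """Convert a time in generations to years before present."""
    return generations * years_per_generation


# 19-33 generations comes from the abstract; 25-30 years/generation is an assumption.
low = generations_to_years(19, 25)   # youngest plausible date under the assumption
high = generations_to_years(33, 30)  # oldest plausible date under the assumption
print(f"admixture roughly {low}-{high} years ago")
```

Both ends of the resulting range fall within the last millennium, consistent with the abstract; a different assumed generation interval would shift the range proportionally.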
... In particular, following a period in which Jewish population genetics emphasized uniparental Y-chromosomal and mitochondrial loci, which follow male and female lineages respectively and often show differing patterns (reviewed in [7]), recent analyses of genome-wide autosomal SNPs have introduced a new era in which it has been possible to clarify population relationships at a fine scale from genome-wide averages. Indeed, some of the first genomic studies spanning large numbers of Jewish populations represented considerable advances in placing Jewish populations in relation to non-Jewish populations [9,10]. In population structure inference from autosomal genomes, studies of large numbers of genome-wide polymorphisms in Jewish and closely related non-Jewish populations [9-22] have produced agreement on two main patterns. (1) Genetic clustering of many Jewish populations places them as intermediate between non-Jewish European and Middle Eastern groups. ...
... In this study, we examine Jewish population structure using genome-wide SNPs. Our work builds primarily upon five studies [9,10,15,19,20], each including a variety of Jewish populations, but each having less geographic coverage or smaller sample sizes per population than the current investigation, or both, and with generally more emphasis on relationships of Jewish and non-Jewish groups. Kopelman et al. [15] studied four Jewish populations, finding that they could be distinguished from neighboring European and Middle Eastern non-Jewish populations, and that the Tunisian Jewish group was the most distinctive of the four. ...
Article
Recent studies have used genome-wide single-nucleotide polymorphisms (SNPs) to investigate relationships among various Jewish populations and their non-Jewish historical neighbors, often focusing on small subsets of populations from a limited geographic range or relatively small samples within populations. Here, building on the significant progress that has emerged from genomic SNP studies in the placement of Jewish populations in relation to non-Jewish populations, we focus on population structure among Jewish populations. In particular, we examine Jewish population-genetic structure in samples that span much of the historical range of Jewish populations in Europe, the Middle East, North Africa, and South Asia. Combining 429 newly genotyped samples from 29 Jewish and 3 non-Jewish populations with previously reported genotypes on Jewish and non-Jewish populations, we investigate variation in 2789 individuals from 114 populations at 486,592 genome-wide autosomal SNPs. Using multidimensional scaling analysis, unsupervised model-based clustering, and population trees, we find that, genetically, most Jewish samples fall into four major clusters that largely represent four culturally defined groupings, namely the Ashkenazi, Mizrahi, North African, and Sephardi subdivisions of the Jewish population. We detect high-resolution population structure, including separation of the Ashkenazi and Sephardi groups and distinctions among populations within the Mizrahi and North African groups. Our results refine knowledge of Jewish population-genetic structure and contribute to a growing understanding of the distinctive genetic ancestry evident in closely related but historically separate Jewish communities.
... The Ashkenazi "founder event" is also evident in four mitochondrial lineages carried by 40% of AJ (Behar et al., 2006;Costa et al., 2013). More recently, studies found high rates of identical-by-descent (IBD) sharing in AJ, that is, nearly identical long haplotypes present in unrelated individuals, a hallmark of founder populations (Atzmon et al., 2010;Carmi et al., 2014a;Gusev et al., 2012;Henn et al., 2012). Quantitative modeling suggested that AJ experienced a sharp reduction in size (a "bottleneck") in the late Middle Ages and that the (effective) number of founders was in the hundreds (Carmi et al., 2014a;Granot-Hershkovitz et al., 2018;Palamara et al., 2012;Santiago et al., 2020;Tournebize et al., 2022). ...
... Genetic evidence supports a mixed Middle Eastern (ME) and European (EU) ancestry in AJ. This is based on uniparental markers with origins in either region (Behar et al., 2006, 2017; Costa et al., 2013; Hammer et al., 2000, 2009; Nebel et al., 2001), as well as autosomal studies showing that AJ have ancestry intermediate between ME and EU populations (Atzmon et al., 2010; Behar et al., 2010, 2013; Bray et al., 2010; Carmi et al., 2014a; Granot-Hershkovitz et al., 2018; Guha et al., 2012; Kopelman et al., 2020). These and other autosomal studies also showed that individuals with AJ ancestry are genetically distinguishable from those of other ancestries. ...
Article
We report genome-wide data from 33 Ashkenazi Jews (AJ), dated to the 14th century, obtained following a salvage excavation at the medieval Jewish cemetery of Erfurt, Germany. The Erfurt individuals are genetically similar to modern AJ, but they show more variability in Eastern European-related ancestry than modern AJ. A third of the Erfurt individuals carried a mitochondrial lineage common in modern AJ and eight carried pathogenic variants known to affect AJ today. These observations, together with high levels of runs of homozygosity, suggest that the Erfurt community had already experienced the major reduction in size that affected modern AJ. The Erfurt bottleneck was more severe, implying substructure in medieval AJ. Overall, our results suggest that the AJ founder event and the acquisition of the main sources of ancestry pre-dated the 14th century and highlight late medieval genetic heterogeneity no longer present in modern AJ.
... Applying ASCEND to this dataset, we obtained highly concordant results for all groups except one (Sahariya), where the fit in the original study looked noisy (Notes S5.1 in S1 Text). We also applied ASCEND to Ashkenazi Jews (AJ) and Finns, whose history of founder events has been studied previously [3,21-23]. Applying ASCEND to nine Finns and seven AJ in the HO37 dataset, we obtained significant evidence of founder events in both groups (S2 Table). ...
... Across worldwide populations, we identified 53 groups that have experienced more extreme founder events (with significantly higher founder intensity) than AJs, who have high rates of recessive diseases due to their history of founder events [1,21-23]. These populations are particularly interesting from a population and medical genetics perspective for understanding the genetic consequences of population bottlenecks. ...
Article
Founder events play a critical role in shaping genetic diversity, fitness and disease risk in a population. Yet our understanding of the prevalence and distribution of founder events in humans and other species remains incomplete, as most existing methods require large sample sizes or phased genomes. Thus, we developed ASCEND, which measures the correlation in allele sharing between pairs of individuals across the genome to infer the age and strength of founder events. We show that ASCEND can reliably estimate the parameters of founder events under a range of demographic scenarios. We then apply ASCEND to two species with contrasting evolutionary histories: ~460 worldwide human populations and ~40 modern dog breeds. In humans, we find that over half of the analyzed populations have evidence for recent founder events, associated with geographic isolation, modes of sustenance, or cultural practices such as endogamy. Notably, island populations have lower population sizes than continental groups, and most hunter-gatherer, nomadic, and indigenous groups have evidence of recent founder events. Many present-day groups, including Native Americans, Oceanians, and South Asians, have experienced more extreme founder events than Ashkenazi Jews, who have high rates of recessive diseases due to their known history of founder events. Using ancient genomes, we show that the strength of founder events differs markedly across geographic regions and time, with three major founder events related to the peopling of the Americas and a trend of decreasing strength of founder events in Europe following the Neolithic transition and steppe migrations. In dogs, we estimate extreme founder events in most breeds that occurred in the last 25 generations, concordant with the establishment of many dog breeds during the Victorian era. Our analysis highlights a widespread history of founder events in humans and dogs and elucidates some of the demographic and cultural practices related to these events.
... When analysing IBD segments, it has been typical to sum the total length (W_ab) of segments shared between a pair of individuals (a and b), one from each of a pair of populations (A and B), and then sum over all such pairs to arrive at a total sum of IBD sharing between each pair of populations [97]. This sum can be normalized by dividing by the total number of possible cross-population pairs of individuals, one from each of the populations (n_A n_B), to give the average total IBD length shared (W_AB) per cross-population individual pair [94,97,98]. ...
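The normalization just described, W_AB = (1 / (n_A n_B)) Σ_{a∈A} Σ_{b∈B} W_ab, can be sketched in a few lines. This is an illustrative sketch, not code from the cited studies: it assumes pairwise IBD segments have already been called and are stored in a hypothetical dictionary keyed by unordered individual pairs, with segment lengths in cM.

```python
from itertools import product


def total_ibd(segments):
    """W_ab: total length (cM) of the IBD segments shared by one pair."""
    return sum(segments)


def mean_cross_population_ibd(shared, pop_a, pop_b):
    """Average total IBD length per cross-population pair (W_AB).

    `shared` maps frozenset({a, b}) -> list of segment lengths in cM;
    pairs with no detected segments are simply absent (W_ab = 0).
    """
    total = sum(
        total_ibd(shared.get(frozenset((a, b)), []))
        for a, b in product(pop_a, pop_b)
    )
    return total / (len(pop_a) * len(pop_b))


# Hypothetical toy data: two individuals in A, three in B.
pop_a = ["a1", "a2"]
pop_b = ["b1", "b2", "b3"]
shared = {
    frozenset(("a1", "b1")): [5.0, 3.0],
    frozenset(("a2", "b3")): [4.0],
}
print(mean_cross_population_ibd(shared, pop_a, pop_b))  # 12 cM / 6 pairs = 2.0
```

Counting pairs with no detected sharing in the denominator is what makes W_AB an average over all n_A n_B pairs rather than over sharing pairs only.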
... If we are considering recombination segments shared between two present-day individuals stemming from the same common ancestor, that is, IBD segments, we must adjust the rate for the number of recombination events per unit length that have occurred down both sides of the pedigree from this common ancestor, T generations back, which gives a total rate of λ = 2T [95,98,101]. Each of these 2T opportunities for recombination to occur along the genome is called a meiosis event. ...
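Under the simple model described above (recombination as a Poisson process along the genome, ignoring chromosome ends, genotyping error, and the minimum-length thresholds used by real IBD callers), IBD segment lengths are exponentially distributed with rate λ = 2T per Morgan, so the mean segment length is 1/(2T) Morgans, or 50/T cM, and P(L > x) = e^(-2Tx). The sketch below, with a hypothetical T = 20, evaluates both quantities and checks them against simulated draws.

```python
import math
import random


def mean_ibd_length_cm(t):
    """Expected IBD segment length in cM for a common ancestor t generations ago."""
    return 100.0 / (2 * t)  # 1 / (2T) Morgans, converted to cM


def prob_longer_than(x_cm, t):
    """P(L > x) for an exponential length with rate 2T per Morgan."""
    return math.exp(-2 * t * x_cm / 100.0)


def simulate_lengths_cm(t, n, seed=1):
    """Draw n segment lengths (cM) from Exp(rate = 2T per Morgan)."""
    rng = random.Random(seed)
    return [100.0 * rng.expovariate(2 * t) for _ in range(n)]


T = 20  # hypothetical number of generations to the common ancestor
draws = simulate_lengths_cm(T, 100_000)
print(f"expected mean {mean_ibd_length_cm(T):.2f} cM, "
      f"simulated {sum(draws) / len(draws):.2f} cM")
print(f"P(L > 2 cM) = {prob_longer_than(2.0, T):.3f}, "
      f"simulated {sum(d > 2.0 for d in draws) / len(draws):.3f}")
```

Older common ancestors (larger T) leave shorter segments, which is why the length distribution of shared segments carries information about when the shared ancestors lived.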
Article
Polynesia was settled in a series of extraordinary voyages across an ocean spanning one third of the Earth¹, but the sequences of islands settled remain unknown and their timings disputed. Currently, several centuries separate the dates suggested by different archaeological surveys²⁻⁴. Here, using genome-wide data from merely 430 modern individuals from 21 key Pacific island populations and novel ancestry-specific computational analyses, we unravel the detailed genetic history of this vast, dispersed island network. Our reconstruction of the branching Polynesian migration sequence reveals a serial founder expansion, characterized by directional loss of variants, that originated in Samoa and spread first through the Cook Islands (Rarotonga), then to the Society (Tōtaiete mā) Islands (11th century), the western Austral (Tuha’a Pae) Islands and Tuāmotu Archipelago (12th century), and finally to the widely separated, but genetically connected, megalithic statue-building cultures of the Marquesas (Te Henua ‘Enana) Islands in the north, Raivavae in the south, and Easter Island (Rapa Nui), the easternmost of the Polynesian islands, settled in approximately AD 1200 via Mangareva.
... We believe that the NOD2 variants determining the occurrence of Crohn's disease came from common ancestors, reflecting a shared history. In the 13th century, Ashkenazi communities emerged in Poland and multiplied until the 20th century, reaching millions in size and a wide geographic spread across Europe [44,45]. Assessing genetic distance, Atzmon et al. showed that AJs are more closely related to some host Europeans than to the ancestral Levantines [44]. ...
... Xue et al. suggested a model of at least two events of European admixture: the first slightly pre-dated a late medieval founder event, was probably of Southern European origin, and was estimated to have occurred 25–50 generations ago. ...
Article
The genetic background and the determinants influencing the form, course, and onset of inflammatory bowel disease (IBD) remain unresolved. We aimed to determine the NOD2 gene haplotypes and their relationship with IBD occurrence, clinical presentation, and onset, analyzing a cohort of 578 patients with IBD, including children, and 888 controls. IBD was diagnosed by imaging or endoscopy with histopathological confirmation. Genotyping was performed to assess differences in genotypic and allelic frequencies. Linkage disequilibrium was analyzed, and associations between haplotypes and clinical data were evaluated. We observed a higher prevalence of risk alleles at all analyzed loci in patients with Crohn disease (CD). Interestingly, the c.2722G>C and c.3019_3020insC alleles were also overrepresented in ulcerative colitis (UC). The T-C-G-C-insC, T-C-G-T-insC, and T-T-G-T-wt haplotypes were correlated with the late-onset form of CD (OR = 23.01, 5.09, and 17.71, respectively), while T-T-G-T-wt and C-C-G-T-wt were prevalent only in children with CD (OR = 29.36 and 12.93, respectively; p-value = 0.001). In conclusion, c.3019_3020insC together with c.802C>T emerged as the most important contributing diplotype in late-onset CD, while in children with CD, the allele shared by all predisposing haplotypes was c.2798+158T. Identifying unique, high-impact haplotypes supports further studies of the NOD2 gene, including its haplotypic backgrounds.
... Previous studies have documented a pervasive history of founder events in Jewish groups [21,22]. To characterize the impact of founder events in diverse Jewish populations, we applied ASCEND to 11 groups, including Ashkenazi Jews (AJs). ...
... ~560 years ago). Our estimate for Mbuti pygmies of ~21 (18–24) generations ago is consistent with, but more precise than, the previously published estimate of ~10–100 generations [30]. These results are interesting in light of the fact that most African populations, including hunter-gatherers, have high diversity and historically large population sizes. ...
Preprint
Founder events play a critical role in shaping genetic diversity, impacting the fitness of a species and disease risk in humans. Yet our understanding of the prevalence and distribution of founder events in humans and other species remains incomplete, as most existing methods for characterizing founder events require large sample sizes or phased genomes. To learn about the frequency and evolutionary history of founder events, we introduce ASCEND (Allele Sharing Correlation for the Estimation of Non-equilibrium Demography), a flexible two-locus method to infer the age and strength of founder events. This method uses the correlation in allele sharing across the genome between pairs of individuals to recover signatures of past bottlenecks. By performing coalescent simulations, we show that ASCEND can reliably estimate the parameters of founder events under a range of demographic scenarios, with genotype or sequence data. We apply ASCEND to ~5,000 worldwide human samples (~3,500 present-day and ~1,500 ancient individuals) and ~1,000 domesticated dog samples. In both species, we find pervasive evidence of founder events in the recent past. In humans, over half of the populations surveyed in our study had evidence of a founder event in the past 10,000 years, associated with geographic isolation, modes of sustenance, and historical invasions and epidemics. We document that island populations have historically maintained lower population sizes than continental groups, that ancient hunter-gatherers had stronger founder events than Neolithic farmers or Steppe pastoralists, and that periods of epidemics such as smallpox were accompanied by major population crashes. Many present-day groups, including Central & South Americans, Oceanians and South Asians, have experienced founder events stronger than that estimated for Ashkenazi Jews, who have high rates of recessive diseases due to their history of founder events.
In dogs, we uncovered extreme founder events in most groups, more than ten times stronger than the median strength of founder events in humans. These founder events occurred during the last 25 generations and are likely related to the establishment of dog breeds during Victorian times. Our results highlight a widespread history of founder events in humans and dogs, and provide insights about the demographic and cultural processes underlying these events.
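The core idea of ASCEND described above, reading the age of a founder event off how the allele-sharing correlation decays with genetic distance, can be illustrated with a toy fit. This is a minimal sketch under our own assumptions, not the ASCEND implementation: we assume the correlation at distance $d$ (in Morgans) behaves like $A e^{-2Td} + c$, with $T$ the founder-event age in generations, $A$ an amplitude related to its strength, and a known baseline $c$. The fit is a straight line through $\log(C(d) - c)$ by ordinary least squares. The function name is hypothetical.

```python
import math
from typing import Sequence, Tuple


def fit_exponential_decay(
    distances_morgans: Sequence[float],
    correlations: Sequence[float],
    baseline: float = 0.0,
) -> Tuple[float, float]:
    """Fit C(d) = A * exp(-2 * T * d) + baseline and return (T, A).

    Taking logs gives log(C(d) - baseline) = log(A) - 2T * d, a straight
    line in d, so least squares on the log scale recovers the slope (-2T)
    and the intercept (log A). Bins whose excess over the baseline is not
    positive are skipped because their log is undefined.
    """
    xs, ys = [], []
    for d, c in zip(distances_morgans, correlations):
        excess = c - baseline
        if excess > 0:
            xs.append(d)
            ys.append(math.log(excess))
    if len(xs) < 2:
        raise ValueError("need at least two bins above the baseline")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct distances")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    founder_age_generations = -slope / 2.0  # rate 2T -> T
    amplitude = math.exp(intercept)
    return founder_age_generations, amplitude
```

On noise-free synthetic data generated with $T = 30$, $A = 0.02$ and $c = 0.001$ over 0.1–3 cM, the fit recovers $T \approx 30$ and $A \approx 0.02$. Real data are noisy and the baseline is unknown, so a practical tool needs a more careful, nonlinear fit than this log-linear shortcut.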
... We show that many of the communities exhibit evidence of founder effects, as demonstrated by elevated levels of autozygosity and median within-community IBD sharing, and by applying methods that measure how far apart communities lie in the network topology. This includes canonical founder populations with known historical evidence of founder events, including Ashkenazi Jewish [35] ... Overall, approximately one quarter of BioMe participants harbor genetic signatures of founder effects, and extrapolating this observation to NYC, we estimate that approximately 15% of New Yorkers can be genetically linked to one or more founder populations. This finding mirrors simi- ...
..., Finnish [33] and Garifuna populations [35,36]. We also show evidence for the timing and magnitude of founder effects in less-characterized Hispanic/Latino (H/L) founder populations in NYC, namely populations of Puerto Rican, Colombian, Ecuadorian, and Dominican descent. ...
Preprint
Understanding population health disparities is an essential component of equitable precision health efforts. Epidemiology research often relies on definitions of race and ethnicity, but these population labels may not adequately capture disease burdens specific to sub-populations. Here we propose a framework for repurposing data from Electronic Health Records (EHRs) in concert with genomic data to explore enrichment of disease within sub-populations. Using data from a diverse biobank in New York City, we genetically identified 17 sub-populations, and noted the presence of genetic founder effects in 7. By then linking community membership to the EHR, we were able to identify over 600 health outcomes that were statistically enriched within a specific population, with many representing known associations, and many others being novel. This work reinforces the utility of linking genomic data to EHRs, and provides a framework towards fine-scale monitoring of population health.
... Recent studies of microsatellite polymorphisms (Kopelman et al. 2009; Listman et al. 2010) and SNPs (Atzmon et al. 2010; Behar et al. 2010; Campbell et al. 2012) on autosomes have been able to statistically distinguish European, North African, and Middle Eastern Jewish populations from their non-Jewish neighbors. Using $F_{ST}$, we see that, next to the Cohanim, the closest group to the Samaritans is the Yemeni Jews, whereas $(\delta\mu)^2$ indicates it is the Bedouins, whose autosomal $F_{ST}$ data indicate they are the closest to the Samaritans (Table 5). ...
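The $(\delta\mu)^2$ distance mentioned above is the microsatellite distance of Goldstein and colleagues. For each locus it squares the difference between the two populations' mean allele sizes (repeat counts), and it then averages over loci. A minimal sketch, assuming allele sizes are already available as repeat counts per locus; the function name and input layout are illustrative, and $F_{ST}$ is not computed here.

```python
from statistics import fmean
from typing import Sequence


def delta_mu_squared(
    pop_x: Sequence[Sequence[float]],
    pop_y: Sequence[Sequence[float]],
) -> float:
    """Goldstein-style (delta mu)^2 distance between two populations.

    pop_x[k] and pop_y[k] hold the allele sizes (repeat counts) sampled
    at microsatellite locus k in each population. For every locus, take
    the squared difference of the two mean allele sizes, then average
    these values over all loci.
    """
    if len(pop_x) != len(pop_y) or not pop_x:
        raise ValueError("both populations need the same, non-zero number of loci")
    per_locus = [(fmean(x) - fmean(y)) ** 2 for x, y in zip(pop_x, pop_y)]
    return fmean(per_locus)
```

With two loci whose mean sizes differ by 3 and by 1 repeat, the distance is (9 + 1) / 2 = 5. Because it works on allele sizes rather than allele identities, $(\delta\mu)^2$ can rank population pairs differently from $F_{ST}$, as in the Samaritan comparison above.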
... For a reconstruction of the genetic history, origin and affiliation of Indian Jewish populations using a large sample and a combination of high-resolution biparental (autosomal) and uniparental (Y-chromosome and mitochondrial DNA) markers, see for example Chaubey et al., 2016. Also see Behar et al., 2006; Sutton et al., 2006; Feder et al., 2007; Behar et al., 2010; Atzmon et al., 2010; Moorjani et al., 2011; Campbell et al., 2012; Carmi et al., 2014; Behar, 2008; Rootsi et al., 2013; Nebel et al., 2005. ...
Article
For publication on Épisciences. The authors, comparative legal scholars working on a broad project mapping how law conceptualizes and operationalizes race, ethnicity and nationality, assess the triadic relationship between law, identity (identity-making and the recognition of claims) and science. The project focuses on race and ethnicity and excludes gender identity, but the latter serves as a point of reference to demonstrate the transformative changes of recent years in how the terms of identity are assigned meaning and conceptualized in the social sciences and humanities, and to a certain degree in politics and law. Yet there is a debilitating lack of linguistic and conceptual resources, cultural tools, and a solid and proper vocabulary for thinking about racial identity. This is particularly stark in law, especially international law, which habitually operates with the concepts of race, ethnicity, and nationality when setting standards for the recognition of collective rights or protection from discrimination, establishing criteria for asylum, labeling actions as genocide, or requiring a “genuine link” in citizenship law, without actually defining these groups or the criteria for membership within these legal constructs. The paper provides an overview of the obstacles, challenges and controversies in legal institutionalization. In technical terms, the operationalization of ethnic/racial/national group affiliation can follow several options: self-identification; authority given to elected or appointed members (representatives) of the group (leaving aside questions of legitimacy, or ontological questions regarding the authenticity of these actors); classification by outsiders, through the perception of the majority; or classification by outsiders using “objective” criteria, such as names, residence, et cetera.
The paper also assesses how “objective” criteria, data and constructions provided by science translate into legal discourse. Case studies range from anthropological/historical “scientific knowledge” and the operationalization of (performative) whiteness and otherness in the US to contemporary examples: requirements for DNA-heritage certificates in naturalization and diaspora programs (for example, birthright schemes in Israel); race-focused forensic datasets; and race-based medicine and reproductive technologies, where the methodology and conceptualization of “scientific race” are analyzed in a comparative and critical framework.
... The Jewish HapMap dataset [6,51] and 112 Europeans in the HapMap dataset [52] were used to identify individuals of 100% Ashkenazi Jewish ancestry among the IBDGC samples. Jewish samples from Eastern Europe and the Middle East, together with the European samples, were used as a reference panel for PCA, to validate the distribution of the genetically identified AJs relative to the AJ reference panel. ...
Article
Inflammatory bowel disease (IBD) is a group of chronic inflammatory conditions of the digestive tract whose genetic etiology is still poorly understood. The incidence of IBD is particularly high among Ashkenazi Jews. Here, we identify 8 novel and plausible IBD-causing genes from the exomes of 4453 genetically identified Ashkenazi Jewish IBD cases (1734) and controls (2719). Various biological pathway analyses are performed, along with bulk and single-cell RNA sequencing, to demonstrate the likely physiological relatedness of the novel genes to IBD. Importantly, we demonstrate that the rare, high-impact genetic architecture of Ashkenazi Jewish adult IBD displays significant overlap with the genetics of very early onset IBD. Moreover, by performing biobank phenome-wide analyses, we find that IBD genes have pleiotropic effects that involve other immune responses. Finally, we show that polygenic risk score analyses based on genome-wide high-impact variants have high power to predict IBD susceptibility.
... Using the inferred IBD segment data, we calculated the average pairwise IBD sharing between Swabians and various populations with the following formula according to Atzmon et al.: $\overline{IBD}_{IJ} = \frac{1}{nm}\sum_{i \in I}\sum_{j \in J} IBD_{ij}$, where $IBD_{ij}$ is the length of the IBD segment shared between individuals $i$ and $j$, and $n$ and $m$ are the numbers of individuals in groups $I$ and $J$ [28]. ...
Article
Background: German-derived ethnicities are one of the largest ethnic groups in Hungary, dating back to the formation of the Kingdom of Hungary at the beginning of the 11th century. Germans arrived in Hungary in many waves. The most significant immigration wave followed the collapse of Ottoman rule in East-Central Europe, which ended the 150-year-long Ottoman occupation. To date, there are no comprehensive genome-wide studies of the genetic makeup of the Danube Swabians. Here we analyzed 47 Danube Swabian samples collected from elderly Swabian individuals living in the Dunaszekcső-Bár area, in Danube-side villages of Southwest Hungary. According to self-declaration, these Swabians did not admix with other ethnic groups for 3–6 successive generations. Using Illumina Infinium 720K BeadChip genotype data, we applied allele frequency-based and haplotype-based genome-wide marker analyses to investigate the ancestry and genetic composition of the collected Danube Swabian samples. Results: Haplotype-based analyses such as identity-by-descent segment analysis show that the investigated Danube Swabians possess significant German and other West European ancestry, but their Hungarian ancestry is also prominent. Our results suggest that their main source of ancestry can be traced back to Western Europe, presumably to the region of Germany. Conclusion: This is the first analysis of Danube Swabian population samples based on genome-wide autosomal data. Our results establish the basis for conducting further comprehensive research on Danube Swabians and on other German ethnicities of the Carpathian basin, which can help reconstruct their origin and identify their major archaic genomic patterns.
... Present-day Ashkenazim are descendants of medieval Jewish populations with histories primarily in northern and eastern Europe. As a result, they carry distinctive ancestries [7–10], and Jewish and non-Jewish medieval individuals living in the same regions would likely show characteristic patterns of genetic variation [11]. Hereditary disorders in Ashkenazi Jewish populations have been the focus of considerable medical research [12–16], with genetic screening now commonplace to mitigate risks. ...
Article
We report genome sequence data from six individuals excavated from the base of a medieval well at a site in Norwich, UK. A revised radiocarbon analysis of the assemblage is consistent with these individuals being part of a historically attested episode of antisemitic violence on 6 February 1190 CE. We find that four of these individuals were closely related and all six have strong genetic affinities with modern Ashkenazi Jews. We identify four alleles associated with genetic disease in Ashkenazi Jewish populations and infer variation in pigmentation traits, including the presence of red hair. Simulations indicate that Ashkenazi-associated genetic disease alleles were already at appreciable frequencies, centuries earlier than previously hypothesized. These findings provide new insights into a significant historical crime, into Ashkenazi population history, and into the origins of genetic diseases associated with modern Jewish populations.
... $IBD_{ij}$ is the length of the IBD segment shared between individuals $i$ and $j$, and $n$ and $m$ are the numbers of individuals in groups $I$ and $J$ [17]. ...
Preprint
Background: German-derived ethnicities are one of the largest ethnic groups in Hungary, dating back to the formation of the Kingdom of Hungary at the beginning of the 11th century. Germans came to Hungary in many waves. The most significant immigration wave followed the collapse of Ottoman rule in East-Central Europe, which ended the 150-year-long Ottoman occupation. To date, there are no comprehensive genome-wide studies of the genetic makeup of the Danube Swabians. Here we analyzed 47 Danube Swabian samples collected from elderly Danube Swabian individuals living in the Dunaszekcső-Bár area, in Danube-side villages of Southwest Hungary. Based on self-declaration, they did not admix with other ethnic groups for 3–6 successive generations. Using Illumina Infinium 720K BeadChip genotype data, we applied allele frequency-based and haplotype-based genome-wide marker analyses to investigate the ancestry and genetic composition of the collected Danube Swabian samples. Results: Haplotype-based analyses such as identity-by-descent and homozygosity-by-descent analyses show that the investigated Danube Swabians remained isolated from other ethnic groups, which was supported by the D-statistics results. According to our results, the investigated Danube Swabian individuals do not possess detectable recent admixture, and their main source of ancestry can be traced back to the region of Germany. Conclusions: This is the first analysis of Danube Swabian population samples based on genome-wide autosomal data. Our results establish the basis for conducting further comprehensive research on Danube Swabians and on other German ethnicities of the Carpathian basin, which can help reconstruct their origin and identify their major archaic genomic patterns.
... $IBD_{ij}$ is the length of the IBD segment shared between individuals $i$ and $j$, and $n$ and $m$ are the numbers of individuals in groups $I$ and $J$ [17]. ...
Preprint
Background: German-derived ethnicities are one of the largest ethnic groups in Hungary, dating back to the formation of the Kingdom of Hungary at the beginning of the 11th century. Germans came to Hungary in many waves. The most significant immigration wave followed the collapse of Ottoman rule in East-Central Europe, which ended the 150-year-long Ottoman occupation. To date, there are no comprehensive genome-wide studies of the genetic makeup of the Danube Swabians. Here we analyzed 47 Danube Swabian samples collected from elderly Danube Swabian individuals living in the Dunaszekcső-Bár area, in Danube-side villages of Southwest Hungary. Based on self-declaration, they did not admix with other ethnic groups for 3–6 successive generations. Using Illumina Infinium 720K BeadChip genotype data, we applied allele frequency-based and haplotype-based genome-wide marker analyses to investigate the ancestry and genetic composition of the collected Danube Swabian samples. Results: Haplotype-based analyses such as identity-by-descent and homozygosity-by-descent analyses show that the investigated Danube Swabians remained isolated from other ethnic groups, which was supported by the D-statistics results. According to our results, the investigated Danube Swabian individuals do not possess detectable recent admixture, and their main source of ancestry can be traced back to the region of Germany. Conclusions: This is the first analysis of Danube Swabian population samples based on genome-wide autosomal data. Our results establish the basis for conducting further comprehensive research on Danube Swabians and on other German ethnicities of the Carpathian basin, which can help reconstruct their origin and identify their major archaic genomic patterns.
... We were among the first to document a shared genotype between Ashkenazi Jewish individuals and Mexican Americans with no known Jewish ancestry who carried BRCA1 c.66_67del [6]. Subsequent publications have reinforced this observation and suggest that these carriers are likely descendants of conversos or crypto-Jews who emigrated to the Americas in the late 15th century and, over generations, assimilated into the larger Hispanic society, representing an underappreciated diaspora to the New World [13,42,43]. ...
Article
The prevalence and contribution of BRCA1/2 (BRCA) pathogenic variants (PVs) to the cancer burden in Latin America are not well understood. This study aims to address this disparity. BRCA analyses were performed on prospectively enrolled Latin American Clinical Cancer Genomics Community Research Network participants via a combination of methods: a Hispanic Mutation Panel (HISPANEL) on MassARRAY; semiconductor sequencing; and copy number variant (CNV) detection. BRCA PV probability was calculated using BRCAPRO. Among 1,627 participants (95.2% with cancer), we detected 236 (14.5%) BRCA PVs: 160 in BRCA1 (31% CNVs) and 76 in BRCA2. PV frequency varied by country: 26% Brazil, 9% Colombia, 13% Peru, and 17% Mexico. Recurrent PVs (seen ≥3 times), some region-specific, represented 42.8% (101/236) of PVs. There was no ClinVar entry for 14% (17/125) of unique PVs and 57% (111/196) of unique variants of uncertain significance (VUS). The area under the ROC curve for BRCAPRO was 0.76. In summary, we implemented a low-cost BRCA testing strategy and documented a significant burden of BRCA PVs not reported in ClinVar among Latin Americans. There are recurrent, population-specific PVs and CNVs, and the BRCAPRO mutation probability model performs adequately. This study helps address the gap in our understanding of BRCA-associated cancer in Latin America.
... At least since the beginning of the Zionist movement, and especially since the establishment of the State of Israel, studies on the genetic proximity among Jews have assumed that Jews share a common biological identity (Efron 1994; Kirsh 2003; Lipphardt 2008, 2012; Egorova 2014; Falk 2015). NGS has also shaped the growing body of research in the field of "Jewish genetics", which includes, among others, two genome-wide studies on the interrelatedness of world Jewry (Atzmon et al. 2010; Behar et al. 2010) that have received much media and academic attention. Historians and anthropologists, as well as geneticists, have pointed out the pitfalls of such studies: they are informed by pre-existing notions and narratives about group identity, (national) history, and origins, and they assign genetic markers to supposedly clear-cut ethnic population groups, so that "Jewishness" is embedded in the biological rather than in the cultural or social realm (Glenn 2002; Gibel-Azoulay 2003; Abu El-Haj 2012; Egorova 2014; Falk 2015; Elhaik 2016). ...
Article
In Israel, several hundred thousand citizens form a minority group that wishes to be acknowledged as Jewish by the state authorities. Most of them immigrated from the former Soviet Union and cannot provide sufficient evidence of their maternal ancestors’ affiliation with a Jewish community. This has a direct impact on their civil rights. Based on a scientific research article on matrilineal genetic markers among Eastern and Central European Jews, the rabbinical dean of an institute for advanced Jewish studies in Jerusalem proposed to accept, under certain conditions, the presence of specific genetic markers as legal proof of “Jewishness.” Genetic testing here is meant to become a tool for empowerment and (re)claiming Jewish status. This case raises many questions concerning a biological understanding of Judaism and shows how genetic ancestry testing could be used to uphold the religious orthodox narrative.
... At least since the beginning of the Zionist movement, and especially since the establishment of the State of Israel, studies on the genetic proximity among Jews have assumed that Jews share a common biological identity (Efron 1994; Kirsh 2003; Lipphardt 2008, 2012; Egorova 2014; Falk 2015). NGS has also shaped the growing body of research in the field of "Jewish genetics", which includes, among others, two genome-wide studies on the interrelatedness of world Jewry (Atzmon et al. 2010; Behar et al. 2010) that have received much media and academic attention. Historians and anthropologists, as well as geneticists, have pointed out the pitfalls of such studies: they are informed by pre-existing notions and narratives about group identity, (national) history, and origins, and they assign genetic markers to supposedly clear-cut ethnic population groups, so that "Jewishness" is embedded in the biological rather than in the cultural or social realm (Glenn 2002; Gibel-Azoulay 2003; Abu El-Haj 2012; Egorova 2014; Falk 2015; Elhaik 2016). ...
Article
Next Generation Sequencing led to major knowledge gains in the molecular life sciences. But the new technology provides data that pose new challenges to both science and society. New fields of research are emerging and questions of identity on the basis of genetic analyses are being negotiated.
... Syrian and Iranian Jews are genetically quite similar (Atzmon et al., 2010), yet, whereas disease-causing variants common in Iranian Jews have been documented (Dagan & Gershoni-Baruch, 2010), little is known about the frequency of pathogenic variants in Syrian Jews. Hence, we chose Iranian Jews as a comparison group for the assessment of allele frequencies within the Syrian Jewish population. ...
Article
Background: There is a paucity of information available regarding the carrier frequency for autosomal recessive pathogenic variants among Syrian Jews. This report provides data to support carrier screening for a group of autosomal recessive conditions among Syrian Jews based on the population frequency of 40 different pathogenic variants in a cohort of over 3800 individuals with Syrian Jewish ancestry. Methods: High-throughput PCR amplicon sequencing was used to genotype 40 disease-causing variants in 3840 and 5279 individuals of Syrian and Iranian Jewish ancestry, respectively. These data were compared with Ashkenazi Jewish carrier frequencies for the same variants, based on roughly 370,000 Ashkenazi Jewish individuals in the Dor Yeshorim database. Results: Carrier screening identified pathogenic variants shared among Syrian, Iranian, and Ashkenazi Jewish groups. In addition, alleles unique to each group were identified. Importantly, 8.2% of 3401 individuals of mixed Syrian Jewish ancestry were carriers of at least one pathogenic variant. Conclusion: The findings of this study support the clinical usefulness of premarital genetic screening for individuals with Syrian Jewish ancestry to reduce the incidence of autosomal recessive disease among persons with Syrian Jewish heritage.
... To compare haplotype sharing between the two populations, we calculated the haplotype sharing index of each population pair using the following equation (Atzmon et al. 2010): ...
Article
The Ryukyu Archipelago is located in the southwest of the Japanese islands and is composed of dozens of islands, grouped into the Miyako Islands, Yaeyama Islands, and Okinawa Islands. Principal component analysis (PCA) of genome-wide single nucleotide polymorphisms (SNPs) has revealed genetic differentiation among the island groups of the Ryukyu Archipelago. However, a detailed population structure analysis of the Ryukyu Archipelago has not yet been completed. We obtained genomic DNA samples from 1,240 individuals living in the Miyako Islands and genotyped 665,326 SNPs to infer population history within the Miyako Islands, including Miyakojima, Irabu and Ikema islands. The haplotype-based analysis showed that populations in the Miyako Islands were divided into three subpopulations located on Miyakojima northeast, Miyakojima southwest, and Irabu/Ikema. The results of the haplotype sharing and D-statistics analyses showed that the Irabu/Ikema subpopulation received gene flows different from those of the Miyakojima subpopulations, which may be related to the historically attested immigration during the Gusuku period (900–500 BP). A coalescent-based demographic inference suggests that the Irabu/Ikema population first split from the ancestral Ryukyu population about 41 generations ago, followed by the split of the Miyako southwest population from the ancestral Ryukyu population (about 16 generations ago), and the differentiation of the ancestral Ryukyu population into two populations (Miyako northeast and Okinawajima) about 7 generations ago. Such genetic information is useful for explaining the population history of modern Miyako people and must be taken into account when performing disease association studies.
... genome-wide SNP arrays [13][14][15][16][17][18]. The combined analysis of millions of polymorphic markers along the genome has led to greater precision in the clustering of different Jewish groups and to the ability to estimate the Middle Eastern, European, and African components in each group. ...
Article
Chuetas are a group of descendants of Majorcan Crypto-Jews (Balearic Islands, Spain) who were socially stigmatized and segregated by their Majorcan neighbours until recently, generating a community that, although it no longer contained Judaic religious elements after the seventeenth century, maintained strong group cohesion, a consciousness of Jewishness, and endogamy. Collective memory fixed 15 surnames as the most important defining element of Chueta families. Previous studies demonstrated that Chuetas were a differentiated population that retained a considerable proportion of their original genetic make-up. Genetic data on Y-chromosome polymorphisms and the mtDNA control region showed, in Chuetas' paternal lineages, a high prevalence of haplogroups J2-M172 (33%) and J1-M267 (18%). In maternal lineages, the Chueta hallmark is the presence of a new sub-branch of the rare haplogroup R0a2m as their modal haplogroup (21%). Genetic diversity in both the Y chromosome and mtDNA indicates that the Chueta community has managed to avoid the expected decrease in heterogeneity of its gene pool after centuries of isolation and inbreeding. Moreover, the composition of their uniparentally transmitted lineages shows a remarkable signature of Middle Eastern ancestry, despite some degree of host admixture, confirming that Chuetas have retained over the centuries a considerable degree of their ancestral genetic signature along with the cultural memory of their Jewish origin.
... Recently, this feature gained tremendous popularity in investigative genetic genealogy, where it has been used to solve hundreds of violent crimes and identify unclaimed bodies by finding distant relatives of crime-scene samples (Ney et al. 2020; Erlich et al. 2018). Finally, finding genealogical relatives from DNA also has applications in medical genetics (Speed & Balding 2015) and population genetics (Atzmon et al. 2010). ...
Preprint
Finding familial relatives using DNA has multiple applications in genetic genealogy, population genetics, and forensics. So far, most relative-matching algorithms rely on detecting identity-by-descent (IBD) segments in high-quality genotype data. Recently, low-coverage sequencing (LCS) has received growing attention as a promising, cost-effective method to ascertain genomic information. However, given its higher error rates, it is unclear whether existing IBD detection methods can work on LCS datasets. Here, we developed and tested a framework for relative matching using sequencing at 1× coverage (1×LCS). We started by exploring the error characteristics of this method compared with array data. Our results show that, after some optimization, 1×LCS can exhibit the same genotyping discordance rates as the discordance between two array platforms. Using this observation, we developed a hybrid framework for relative matching and tuned this framework with >2,700 pairs of confirmed genealogical relatives that were genotyped using heterogeneous datasets. We then obtained array and 1×LCS data on 19 samples and used our framework to find relatives in a database of over 3 million individuals. The total length of shared segments obtained by 1×LCS was virtually indistinguishable from that obtained by genotyping arrays for matches with a total sharing >200 cM (second cousins or closer). For more distant relatives, as long as those were detected by both technologies, the total length obtained by LCS and by genotyping arrays was highly correlated, with no evidence of over- or underestimation. Taken together, our results show that 1×LCS can be a valid alternative to arrays for relative matching, opening the possibility for further democratization of genomic data.
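The comparison between 1×LCS and arrays rests on a genotype discordance rate. A minimal sketch, assuming discordance is the fraction of mismatched calls among sites genotyped by both platforms (some studies instead restrict the denominator to sites where at least one call is non-reference), could look like this; the function name and the 0/1/2 encoding are illustrative choices.

```python
from typing import Optional, Sequence


def discordance_rate(calls_a: Sequence[Optional[int]],
                     calls_b: Sequence[Optional[int]]) -> float:
    """Fraction of sites where two platforms disagree, over sites called by both.

    Genotypes are alternate-allele counts (0, 1, 2); None marks a missing call.
    """
    if len(calls_a) != len(calls_b):
        raise ValueError("call sets must cover the same sites")
    compared = 0
    mismatched = 0
    for a, b in zip(calls_a, calls_b):
        if a is None or b is None:
            continue  # site not called on one of the platforms
        compared += 1
        if a != b:
            mismatched += 1
    if compared == 0:
        raise ValueError("no sites called on both platforms")
    return mismatched / compared


print(discordance_rate([0, 1, 2, None], [0, 2, 2, 1]))  # 1 mismatch among 3 compared sites
```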
... Human population structure is a complex product of the forces of migration and drift acting on both local and global scales, patterned by geography [Novembre et al., 2008, Ralph and Coop, 2013], time [Skoglund et al., ..., 2014], admixture [Hellenthal et al., 2014], landscape and environment [Beall et al., 2010, Bigham et al., 2010, Bradburd et al., 2013], and shaped by culture [Reich et al., 2009, Atzmon et al., 2010, Moorjani et al., 2011]. To visualize the patterns these processes have induced, we create a geogenetic map for a worldwide sample of modern human populations. ...
Preprint
Geographic patterns of genetic variation within modern populations, produced by complex histories of migration, can be difficult to infer and visually summarize. A general consequence of geographically limited dispersal is that samples from nearby locations tend to be more closely related than samples from distant locations, and so genetic covariance often recapitulates geographic proximity. We use genome-wide polymorphism data to build "geogenetic maps," which, when applied to stationary populations, produce a map of the geographic positions of the populations, but with distances distorted to reflect historical rates of gene flow. In the underlying model, allele frequency covariance is a decreasing function of geogenetic distance, and nonlocal gene flow such as admixture can be identified as anomalously strong covariance over long distances. This admixture is explicitly co-estimated and depicted as arrows, from the source of admixture to the recipient, on the geogenetic map. We demonstrate the utility of this method on a circum-Tibetan sampling of the greenish warbler (Phylloscopus trochiloides), in which we find evidence for gene flow between the adjacent, terminal populations of the ring species. We also analyze a global sampling of human populations, for which we largely recover the geography of the sampling, with support for significant histories of admixture in many samples. This new tool for understanding and visualizing patterns of population structure is implemented in a Bayesian framework in the program SpaceMix.
... The IBD segments were further filtered using the program merge-ibd-segments, to remove breaks and short gaps in IBD segments (>0.6 cM in length). The output of merge-ibd-segments was used to compute the average pairwise IBD sharing between the different NC groups, using the previously described expression [57]. ...
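The gap-merging step can be illustrated with a simple interval merge. This is a hypothetical simplification, not the merge-ibd-segments algorithm: it joins consecutive segments for one pair of individuals on one chromosome only when the gap between them is at most a chosen length in cM, whereas the actual tool may also apply other criteria (for example, on genotypes within the gap). The 0.6 cM value in the usage line only illustrates the parameter.

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # (start_cM, end_cM) for one pair on one chromosome


def merge_segments(segments: List[Segment], max_gap_cm: float) -> List[Segment]:
    """Merge IBD segments whose gap to the previous merged segment is <= max_gap_cm."""
    merged: List[Segment] = []
    for start, end in sorted(segments):
        if merged and start - merged[-1][1] <= max_gap_cm:
            prev_start, prev_end = merged[-1]
            # Extend the previous segment to cover this one.
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


print(merge_segments([(0.0, 5.0), (5.4, 9.0), (12.0, 15.0)], max_gap_cm=0.6))
# [(0.0, 9.0), (12.0, 15.0)]
```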
Article
The African continent is regarded as the cradle of modern humans and African genomes contain more genetic variation than those from any other continent, yet only a fraction of the genetic diversity among African individuals has been surveyed [1]. Here we performed whole-genome sequencing analyses of 426 individuals, comprising 50 ethnolinguistic groups including previously unsampled populations, to explore the breadth of genomic diversity across Africa. We uncovered more than 3 million previously undescribed variants, most of which were found among individuals from newly sampled ethnolinguistic groups, as well as 62 previously unreported loci that are under strong selection, which were predominantly found in genes that are involved in viral immunity, DNA repair and metabolism. We observed complex patterns of ancestral admixture and putatively damaging and novel variation, both within and between populations, alongside evidence that Zambia was a likely intermediate site along the routes of expansion of Bantu-speaking populations. Pathogenic variants in genes that are currently characterized as medically relevant were uncommon, but in other genes, variants denoted as ‘likely pathogenic’ in the ClinVar database were commonly observed. Collectively, these findings refine our current understanding of continental migration, identify gene flow and the response to human disease as strong drivers of genome-level population variation, and underscore the scientific imperative for a broader characterization of the genomic diversity of African individuals to understand human ancestry and improve health.
... where $\mathrm{IBD}_{ij}$ is the length of IBD shared between individuals $i$ and $j$, and $n$ and $m$ are the numbers of individuals in populations $I$ and $J$ [37]. ...
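The expression itself is elided from this snippet. Given the definitions of $\mathrm{IBD}_{ij}$, $n$ and $m$, a natural reading is the mean over all $n \cdot m$ cross-population pairs, $\frac{1}{nm}\sum_{i \in I}\sum_{j \in J} \mathrm{IBD}_{ij}$. The sketch below assumes that form and should not be taken as the cited paper's exact formula. It also assumes $I$ and $J$ are disjoint; within-population sharing would need self-pairs excluded and $n(n-1)/2$ unordered pairs in the denominator, which this sketch does not handle.

```python
from itertools import product
from typing import Sequence


def mean_pairwise_ibd(pop_i: Sequence[str], pop_j: Sequence[str], ibd_length) -> float:
    """Mean total IBD length (cM) over all n*m pairs (i in I, j in J).

    ibd_length(i, j) must return the summed length of IBD segments shared by
    individuals i and j (0.0 if they share none). I and J are assumed disjoint.
    """
    n, m = len(pop_i), len(pop_j)
    if n == 0 or m == 0:
        raise ValueError("both populations need at least one individual")
    total = sum(ibd_length(i, j) for i, j in product(pop_i, pop_j))
    return total / (n * m)


# Hypothetical pairwise totals in cM; pairs not listed share nothing.
shared = {frozenset(("a1", "b1")): 12.5, frozenset(("a2", "b2")): 3.5}


def lookup(i: str, j: str) -> float:
    return shared.get(frozenset((i, j)), 0.0)


print(mean_pairwise_ibd(["a1", "a2"], ["b1", "b2"], lookup))  # 16.0 cM over 4 pairs -> 4.0
```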
... Ashkenazi Jewish sample identification. The Jewish HapMap dataset [56] and 112 Europeans in the HapMap dataset [57] were used to identify individuals of 100% Ashkenazi Jewish ancestry among IBDGC samples. Jewish samples from Eastern Europe and the Middle East, together with Europeans, were used as a reference panel to perform PCA, aiming to validate the distribution of genetically identified AJs compared with the AJ reference panel. ...
Preprint
Inflammatory bowel disease (IBD) is a group of chronic diseases affecting different parts of the gastrointestinal tract, mainly comprising Crohn's disease (CD) and ulcerative colitis (UC). Most IBD genomic research to date has involved genome-wide association studies (GWAS) of common genetic variants, mostly in Europeans, resulting in the identification of over 200 risk loci. The incidence of IBD in Ashkenazi Jews (AJ) is particularly high compared with other population groups, and rare protein-coding variants are significantly enriched in AJ. These variants are expected to have larger phenotypic effects and are hypothesized to account for part of the missing heritability in IBD that GWAS cannot fully address. Therefore, we genetically identified 4,974 AJ IBD cases and controls from whole exome sequencing (WES) data from the NIDDK IBD Genetics Consortium (IBDGC). We selected credible rare variants with high predicted impact, aggregated them into genes, and performed gene burden and pathway enrichment analyses to identify 7 novel plausible IBD-causing genes: NCF1, CES1, ICAM1, INPP5D, ABCB1, IL33 and TLR4. We further performed bulk and single-cell RNA sequencing, demonstrating the likely relatedness of the novel genes to IBD. Importantly, we demonstrate that the rare, high-impact genetic architecture of AJ adult IBD displays a significant overlap with very early onset IBD (VEOIBD) genetics. At the variant level, we performed phenome-wide association studies (PheWAS) in the UK Biobank to replicate risk sites in IBD and to reveal risk sites shared with other diseases. Finally, we showed that a polygenic risk score (PRS) has high power to differentiate AJ IBD cases from controls when using rare, high-impact variants.
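The abstract does not say how the PRS was constructed. As a hedged illustration of the usual additive form, a score is the sum over variants of a per-variant weight times the individual's allele dosage (0, 1 or 2). The variant IDs and weights below are hypothetical, and treating a missing dosage as 0 is a simplification (many tools impute it instead).

```python
from typing import Dict


def polygenic_score(dosages: Dict[str, float], weights: Dict[str, float]) -> float:
    """Additive score: sum of weight * allele dosage over the weighted variants.

    Variants absent from an individual's dosages contribute 0 (a simplification).
    """
    return sum(w * dosages.get(variant, 0.0) for variant, w in weights.items())


# Hypothetical variant IDs and weights.
weights = {"rsA": 0.5, "rsB": 0.25, "rsC": 1.0}
print(polygenic_score({"rsA": 1, "rsB": 2}, weights))  # 0.5 + 0.5 + 0.0 = 1.0
```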
... Sequenced samples (n=1346) were derived from subjects described previously from multiple case-control cohorts summarized in Supplementary Table 1. All samples were self-reported to be Ashkenazi Jewish and were also verified as AJ by principal components analysis of previously collected SNP array data, as described in our prior publications [29,30]. Informed consent was obtained in accordance with institutional policies and the studies were approved by the corresponding institutional review boards. ...
Preprint
Identification of rare genetic variants associated with schizophrenia has proven challenging due to multiple sources of heterogeneity, which may be reduced in founder populations. We examined ultra-rare exonic variants in 786 patients with schizophrenia and 463 healthy comparison subjects, all drawn from the Ashkenazi Jewish population. Cases had a higher frequency of novel missense or loss of function (MisLoF) variants compared to controls. Characterizing 141 “case-only” genes (in which ≥ 3 cases in our dataset had MisLoF variants with none found in controls), we identified cadherins as a novel gene set associated with schizophrenia, including a recurrent mutation in PCDHA3 . Modeling the effects of purifying selection demonstrated that deleterious ultra-rare variants are greatly over-represented in the Ashkenazi population, resulting in enhanced power for rare variant association. Identification of cell adhesion genes in the cadherin/protocadherin family helps specify the synaptic abnormalities central to the disorder, and suggests novel potential treatment strategies.
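The "case-only" gene definition is a simple filter. A minimal sketch, assuming per-gene sets of case and control IDs that carry a qualifying MisLoF variant have already been computed (the upstream variant annotation is not shown, and the gene and sample names are hypothetical), is:

```python
from typing import Dict, List, Set


def case_only_genes(case_carriers: Dict[str, Set[str]],
                    control_carriers: Dict[str, Set[str]],
                    min_cases: int = 3) -> List[str]:
    """Genes where >= min_cases distinct cases carry a qualifying variant
    and no control carries one."""
    return sorted(
        gene for gene, cases in case_carriers.items()
        if len(cases) >= min_cases and not control_carriers.get(gene)
    )


cases = {"G1": {"c1", "c2", "c3"}, "G2": {"c1", "c2", "c3"}, "G3": {"c4"}}
controls = {"G2": {"k1"}}
print(case_only_genes(cases, controls))  # ['G1']
```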
... It exists in all Arabic-speaking populations (apart from the Druze). The European-related component is highest in the European control populations (English and Tuscan), as well as in Ashkenazi and Moroccan Jews, both having a history in Europe (Atzmon et al., 2010;Carmi et al., 2014;Schroeter, 2008). This component is present, although in smaller amount, in all other populations except for Bedouin B and Ethiopian Jews. ...
Article
We report genome-wide DNA data for 73 individuals from five archaeological sites across the Bronze and Iron Ages Southern Levant. These individuals, who share the “Canaanite” material culture, can be modeled as descending from two sources: (1) earlier local Neolithic populations and (2) populations related to the Chalcolithic Zagros or the Bronze Age Caucasus. The non-local contribution increased over time, as evinced by three outliers who can be modeled as descendants of recent migrants. We show evidence that different “Canaanite” groups genetically resemble each other more than other populations. We find that Levant-related modern populations typically have substantial ancestry coming from populations related to the Chalcolithic Zagros and the Bronze Age Southern Levant. These groups also harbor ancestry from sources we cannot fully model with the available data, highlighting the critical role of post-Bronze-Age migrations into the region over the past 3,000 years.
... The Ashkenazi Jewish population has repeatedly been shown to derive its ancestral origins from the Levant and Europe (Atzmon et al., 2010; Behar et al., 2010), and its strong isolation from its host population suggested a strong parental founding event (Behar et al., 2006; ...). ...
Article
The spectrum of BRCA1 and BRCA2 pathogenic sequence variants in Middle Eastern, North African, and South European countries
... The Ashkenazi Jewish population has repeatedly been shown to derive its ancestral origins from the Levant and Europe (Atzmon et al., 2010; Behar et al., 2010), and its strong isolation from its host population suggested a strong parental founding event (Behar et al., 2006; ...). ...
Article
The BRCA1/BRCA2 mutational spectrum in the Middle East, North Africa, and Southern Europe is not well characterized. The unique history and cultural practices characterizing these regions, often involving consanguinity and inbreeding, plausibly led to the accumulation of population-specific founder pathogenic sequence variants (PSVs). To determine recurring BRCA PSVs in these locales, a search of PUBMED, EMBASE, BIC, and CIMBA was carried out, combined with outreach to researchers from the relevant countries for unpublished data. We identified 232 PSVs in BRCA1 and 239 in BRCA2 in 25 of 33 countries surveyed. Common PSVs that were detected in four or more countries were c.5266dup (p.Gln1756Profs), c.181T>G (p.Cys61Gly), c.68_69del (p.Glu23Valfs), c.5030_5033del (p.Thr1677Ilefs), c.4327C>T (p.Arg1443Ter), c.5251C>T (p.Arg1751Ter), c.1016dup (p.Val340Glyfs), c.3700_3704del (p.Val1234Glnfs), c.4065_4068del (p.Asn1355Lysfs), c.1504_1508del (p.Leu502Alafs), c.843_846del (p.Ser282Tyrfs), c.798_799del (p.Ser267Lysfs), and c.3607C>T (p.Arg1203Ter) in BRCA1 and c.2808_2811del (p.Ala938Profs), c.5722_5723del (p.Leu1908Argfs), c.9097dup (p.Thr3033Asnfs), c.1310_1313del (p.Lys437Ilefs), and c.5946del (p.Ser1982Argfs) in BRCA2. Notably, some mutations (e.g., p.Asn257Lysfs (c.771_775del)) were observed in unrelated populations. Thus, genotyping recurring BRCA PSVs in specific populations may provide a first-pass BRCA genotyping platform. Data repository information (LOVD): https://databases.lovd.nl/shared/variants#order=VariantOnGenome&search_VariantOnGenome/Reference=Laitman
... In the PCA of Crete vs Europe, the Cretans overlap with three populations: the Peloponneseans, the Sicilians and the Ashkenazi Jews (see Figures 4a, S17, and S18). Southern European and Mediterranean ancestry of the Ashkenazi Jews has also been demonstrated before (Atzmon et al., 2010;Behar et al., 2010;Bauchet et al., 2007;Price et al., 2008;Seldin et al., 2006;Tian et al., 2008). Furthermore, we find in both PCA and ADMIXTURE analysis, that the Ashkenazi are more similar to the Cretans than to the two Levantine Semitic populations. ...
Article
The medieval history of several populations often suffers from a scarcity of contemporary records, resulting in contradictory and sometimes biased interpretations by historians. This is the situation with the population of the island of Crete, which remained relatively undisturbed until the Middle Ages, when multiple wars, invasions, and occupations by foreigners took place. Historians have considered the effects of the occupation of Crete by the Arabs (in the 9th and 10th centuries C.E.) and the Venetians (in the 13th to the 17th centuries C.E.) on the local population. To obtain insights on such effects from a genetic perspective, we studied representative samples from 17 Cretan districts using the Illumina 1 million or 2.5 million arrays and compared the Cretans to the populations of origin of the medieval conquerors and settlers. Highlights of our findings include (1) a small genetic contribution from the Arab occupation to the extant Cretan population, (2) a low genetic contribution of the Venetians to the extant Cretan population, and (3) evidence of a genetic relationship between the Cretans and Central, Northern, and Eastern Europeans, which could be explained by the settlement on the island of tribes of northern origin during the medieval period. Our results show how combining genetic data with historical sources can help shed light on the historical record.
Article
HLA frequencies show widespread variation across human populations. Demographic factors as well as selection are thought to have shaped HLA variation across continents. In this study, a worldwide comparison of HLA class I and class II diversity was carried out. Multidimensional scaling techniques were applied to 50 HLA-A and HLA-B (class I) as well as 13 HLA-DRB1 (class II) first-field frequencies in 200 populations from all continents. Our results confirm a strong effect of geography on the distribution of HLA class I allele groups, with principal coordinates analysis closely resembling the geographical location of populations, especially those of Africa-Eurasia. Conversely, class II frequencies stratify populations along a continuum of differentiation less clearly correlated with actual geographic location. Double clustering analysis revealed finer intra-continental sub-clusters (e.g., Northern and Western Europe vs. South East Europe, North Africa and Southwest Asia; South and East Africa vs. West Africa), and HLA allele group patterns characteristic of these clusters. Ancient (Austronesian expansion) and more recent (Romani people in Europe) migrations, as well as extreme differentiation (Taiwan indigenous peoples, Native Americans), and interregional gene flow (Sámi, Egyptians) are also reflected by the results. Barrier analysis comparing $D_{ST}$ and geographic location identified genetic discontinuities, caused by natural barriers or human behavior, that explain inter- and intra-continental HLA borders for class I and class II. Overall, a progressive reduction in HLA diversity from African to Oceanian and Native American populations is noted. This analysis of HLA frequencies in a unique set of worldwide populations confirms previous findings on the remarkable similarity of class I frequencies to geography, but also shows a more complex development for class II, with implications for both human evolutionary studies and biomedical research.
Article
Objectives: The increased availability of genome-wide data allows capturing the fine genetic structure of present-day populations. Here we analyze, at a fine scale, the genetic ancestry of a population from Argentinean Patagonia to understand its origins beyond the three-hybrid model, and to compare these results with volunteers' self-perceived ancestry in a broad context encompassed by historical and family information. Materials and Methods: We compare high-throughput genotyping data that we generated for 92 individuals to data sets from the literature by applying fully haplotype-based methods to examine patterns of human population substructure. The volunteers filled out a semi-structured questionnaire, including questions about their history, ancestors, and self-perceived ancestry. Finally, we used non-parametric tests to compare genomic ancestry against self-perception. Results: Genetic ancestry from Iberian populations accounted for 0.176 (Spain and Basque origins), while the component associated with Italian populations accounted for 0.140. We observed a 0.169 Native American genetic ancestry. Participants significantly over-perceived their Native American origins and under-perceived their European origins. Components of origins from North Africa to Central South Asia accounted for 0.225 of the genetic ancestry in the sample, with significantly higher proportions for people who mentioned such origins in their genealogical history. Discussion: We captured the fine-scale genetic architecture of a Puerto Madryn population sample in Chubut province, showing that self-perceived ancestry remains a poor proxy for genetic ancestry. The presence of North Africa to Central South Asia components, and their correlation with self-perception of these origins, justifies their inclusion in future miscegenation studies in Argentina.
Preprint
European-ancestry populations are recognized as stratified but not as admixed, implying that residual confounding by locus-specific ancestry can affect studies of association, polygenic adaptation, and polygenic risk scores. We integrated individual-level genome-wide data from ~ 19,000 European-ancestry individuals across 79 European populations and five European American cohorts. We generated a new reference panel that captures ancestral diversity missed by both the 1000 Genomes and Human Genome Diversity Projects. Both Europeans and European-Americans are admixed at subcontinental level, with admixture dates differing among subgroups of European Americans. After adjustment for both genome-wide and locus-specific ancestry, associations between a highly differentiated variant in LCT (rs4988235) and height or LDL-cholesterol were confirmed to be false positives whereas the association between LCT and body mass index was genuine. We provide formal evidence of subcontinental admixture in individuals with European ancestry, which, if not properly accounted for, can produce spurious results in genetic epidemiology studies.
Article
Genome-wide genotype data from 48 carefully selected population samples of Transylvania-living Szeklers and non-Szekler Hungarians were analyzed by comparative analysis. Our analyses involved contemporary Hungarians living in Hungary, other Europeans, and Eurasian samples counting 530 individuals altogether. The source of the Szekler samples was the commune of Korond, Transylvania. The analyzed non-Szekler Hungarian samples were collected from villages with a history dating back to the era of the Árpád Dynasty. Population structure by principal component analysis and ancestry analysis also revealed a great within-group similarity of the analyzed Szeklers and non-Szekler Transylvanian Hungarians. These groups also showed similar genetic patterns with each other. Haplotype analyses using identity-by-descent segment discovering tools showed that average pairwise identity-by-descent sharing is similar in the investigated populations, but the Korond Szekler samples had higher average sharing with the Hungarians from Hungary than non-Szekler Transylvanian Hungarians. Average sharing results showed that both groups are isolated compared to other Europeans, and pointed out that the non-Szekler Transylvanian Hungarian inhabitants of the investigated Árpád Age villages are more isolated than investigated Szeklers from Korond. This was confirmed by our autozygosity analysis as well. Identity-by-descent segment analyses and 4-population tests also confirmed that these Hungarian-speaking Transylvanian ethnic groups are strongly related to Hungarians living in Hungary.
Article
Understanding population health disparities is an essential component of equitable precision health efforts. Epidemiology research often relies on definitions of race and ethnicity, but these population labels may not adequately capture disease burdens and environmental factors impacting specific sub-populations. Here, we propose a framework for repurposing data from electronic health records (EHRs) in concert with genomic data to explore the demographic ties that can impact disease burdens. Using data from a diverse biobank in New York City, we identified 17 communities sharing recent genetic ancestry. We observed 1,177 health outcomes that were statistically associated with a specific group and demonstrated significant differences in the segregation of genetic variants contributing to Mendelian diseases. We also demonstrated that fine-scale population structure can impact the prediction of complex disease risk within groups. This work reinforces the utility of linking genomic data to EHRs and provides a framework toward fine-scale monitoring of population health.
Article
The identification of rare variants associated with schizophrenia has proven challenging due to genetic heterogeneity, which is reduced in founder populations. In samples from the Ashkenazi Jewish population, we report that schizophrenia cases had a greater frequency of novel missense or loss of function (MisLoF) ultra-rare variants (URVs) compared to controls, and the MisLoF URV burden was inversely correlated with polygenic risk scores in cases. Characterizing 141 “case-only” genes (MisLoF URVs in ≥3 cases with none in controls), the cadherin gene set was associated with schizophrenia. We report a recurrent case mutation in PCDHA3 that results in the formation of cytoplasmic aggregates and failure to engage in homophilic interactions on the plasma membrane in cultured cells. Modeling purifying selection, we demonstrate that deleterious URVs are greatly overrepresented in the Ashkenazi population, yielding enhanced power for association studies. Identification of the cadherin/protocadherin family as risk genes helps specify the synaptic abnormalities central to schizophrenia.
Preprint
The Ashkenazi Jewish (AJ) population is important in medical genetics due to its high rate of Mendelian disorders and other unique genetic characteristics. Ashkenazi Jews appeared in Europe in the 10th century, and their ancestry is thought to involve an admixture of European (EU) and Middle-Eastern (ME) groups. However, both the time and place of admixture in Europe are obscure and subject to intense debate. Here, we attempt to characterize the Ashkenazi admixture history using a large Ashkenazi sample and careful application of new and existing methods. Our main approach is based on local ancestry inference, assigning each Ashkenazi genomic segment as EU or ME, and comparing allele frequencies across EU segments to those of different EU populations. The contribution of each EU source was also evaluated using GLOBETROTTER and analysis of IBD sharing. The time of admixture was inferred using multiple tools, relying on statistics such as the distribution of EU segment lengths, the total EU ancestry per chromosome, and the correlation of ancestries along the chromosome. Our simulations demonstrated that distinguishing EU vs ME ancestry is subject to considerable noise at the single-segment level, but conclusions could nevertheless be drawn from chromosome-wide statistics. The predominant source of EU ancestry in AJ was found to be Southern European (≈60-80%), with the rest likely being Eastern European. The inferred admixture time was ≈35 generations ago, but multiple lines of evidence suggest that it represents an average over two or more admixture events, pre- and post-dating the founder event experienced by AJ in late medieval times, with the pre-bottleneck admixture event bounded between 25 and 55 generations ago.
Author Summary
The Ashkenazi Jewish population has dwelt in Europe for much of its 1000-year existence. However, the ethnic and geographic origins of Ashkenazi Jews are controversial, due to the lack of reliable historical records.
Previous genetic studies have exposed links to Middle-Eastern and European ancestries, but the history of admixture in Europe has not been studied in detail yet, partly due to technical difficulties in disentangling signals from multiple admixture events. Here, we address this challenge by presenting an in-depth analysis of the sources of European gene flow and the time of admixture events, using a wide spectrum of genetic methods, extensive simulations, and a number of new approaches. Specifically, to ensure minimal confounding by the Ashkenazi Middle-Eastern ancestry, we mask out genomic regions with Middle-Eastern ancestry, and investigate the lengths and geographic sources of the remaining regions. Our results suggest a model of at least two events of European admixture. One event slightly pre-dated a late medieval founder event and was likely from a Southern European source. Another event post-dated the founder event and was likely in Eastern Europe. These results, as well as the methods introduced, will be highly valuable for geneticists and other researchers interested in Ashkenazi Jewish origins and medical genetics.
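For intuition about how segment lengths carry timing information: under a single-pulse admixture model, tracts of one ancestry are approximately exponentially distributed with rate g(1 − m) per Morgan, where g is the number of generations since admixture and m is the admixture fraction of that ancestry (some treatments use g − 1 instead of g). The sketch below inverts the mean tract length under that assumption. It is a rough illustration, not the method used in the study; it ignores undetected short tracts and, as the abstract itself notes, multiple admixture events, in which case a single-pulse estimate is only an average.

```python
from statistics import mean
from typing import Sequence


def admixture_time_from_tracts(tract_lengths_cm: Sequence[float],
                               ancestry_fraction: float) -> float:
    """Rough single-pulse estimate of generations since admixture.

    Assumes tract lengths of one ancestry are ~exponential with rate
    g * (1 - m) per Morgan, so g ~= 1 / (mean_length_in_Morgans * (1 - m)).
    """
    if not 0.0 <= ancestry_fraction < 1.0:
        raise ValueError("ancestry fraction must be in [0, 1)")
    if not tract_lengths_cm:
        raise ValueError("need at least one tract")
    mean_morgans = mean(tract_lengths_cm) / 100.0  # cM -> Morgans
    return 1.0 / (mean_morgans * (1.0 - ancestry_fraction))


# Hypothetical tracts with mean 5 cM and m = 0.5.
print(round(admixture_time_from_tracts([4.0, 6.0], 0.5), 6))  # 40.0
```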
Article
Csango people are an East–Central European ethnographic group living mostly in the historical region of Moldavia, Romania. Their traditional language, Csango, is an old Hungarian dialect and is severely endangered due to language shift. Their origin is still disputed among experts, and many hypotheses have been proposed since the 19th century. Previous genetic studies found a connection with ethnic groups living in Hungary and provided evidence that might support their Hungarian origin. Another study found Inner Asian Altaic ancestry in their genetic makeup. The goal of this study was to analyze the genetic characteristics of the Csango people by comparing them to contemporary Eurasian populations based on genome-wide autosomal marker data. Our findings suggest that the genetic affinity of Csangos is stronger to Hungarians than to Romanians. They also have a detectable connection with Central-Asian and Siberian Turkic ethnic groups. Besides the presumable Middle Eastern/Central-Asian Turkic ancestry, Csangos show ~4% Turkic ancestry from Central Asia/Siberia, which makes them unique in comparison to all other East–Central European populations investigated in this study. The admixture that resulted in this Turkic ancestry could have occurred 30–40 generations ago, an interval that corresponds to Hungarian historical events regarding their migration and the conquest of the Carpathian basin.
Article
South Asian populations have a mosaic of ancestries, likely due to interactions between long-term populations of the landmass and those of East and West Eurasia. Apart from prehistoric dispersals, there are some known population movements to India. In this study, we focused on the migration of Jewish and Parsi populations on temporal and spatial scales. The existence of Jewish and Parsi communities in India has been recorded since ancient times. However, due to the lack of high-resolution genetic data, their origin and affiliation with other Indian and non-Indian populations remain shrouded in legends. Earlier genetic studies of Indian Jewish populations found evidence of a minor shared ancestry with Middle Eastern Jewish populations, whereas for Parsis an Iranian link was proposed. Recently, in our high-resolution study, we were able to quantify the admixture dynamics of these groups, which suggested a male-biased admixture. Here, we added newly available ancient samples and revisited the interplay of genes and cultures. Thus, in this study we reconstructed a broad genetic profile of Indian Jews and Parsis to paint a fine-grained picture of these ethnic groups.
Article
The history of East-Central Europe has long been intertwined with that of the Turks. A significant part of this region fell under Ottoman control during the 150 years of Ottoman occupation in the 16th–17th centuries. The presence of the Ottoman Empire affected this area not only culturally but also demographically. The Romani people, the largest ethnic minority in East-Central Europe, share an even more eventful past with Turkish people, dating from their migration across Eurasia, and they were already a notable ethnic group in East-Central Europe in the Ottoman era. The relationship of Turks with East-Central European ethnic groups and with the regional Roma was investigated using genome-wide autosomal single nucleotide polymorphism data. Population structure analysis, ancestry estimation, various formal tests of admixture and DNA segment analyses were carried out to shed light on the genome-wide consequences of these events. The analyses show that the Ottoman occupation of Europe left a detectable impact in the affected East-Central European area and also shaped the ancestry of the Romani people. We estimate that the investigated European populations have an average identity-by-descent share of 0.61 with Turks, which is notable compared with other European populations living in West and North Europe, far from the affected area, and compared with Sardinians, who lived in isolation from these events. Admixture between Roma and Turks during Ottoman rule was also extensive.
Article
Formulae are given for estimators for the parameters F, θ, f (FIT, FST, FIS) of population structure. As with all such estimators, ratios are used so that their properties are not known exactly, but they have been found to perform satisfactorily in simulations. Unlike the estimators in general use, the formulae do not make assumptions concerning numbers of populations, sample sizes, or heterozygote frequencies. As such, they are suited to small data sets and will aid the comparisons of results of different investigators. A simple weighting procedure is suggested for combining information over alleles and loci, and sample variances may be estimated by a jackknife procedure.
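The ratio estimators summarized above can be sketched in Python for a single allele. This follows the commonly cited Weir & Cockerham (1984) variance-component formulas as we understand them; the `Sample` container and the function names are hypothetical, and the input assumes diploid samples with an observed heterozygote proportion per allele. Summing the components over alleles and loci before taking ratios stands in for the simple weighting procedure; the jackknife is omitted.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One population sample, for one allele at one locus (hypothetical container)."""
    n: int    # number of diploid individuals sampled
    p: float  # sample frequency of the allele
    h: float  # observed proportion of individuals heterozygous for the allele


def wc_components(samples):
    """Variance components (a, b, c) for one allele from r >= 2 samples."""
    r = len(samples)
    n_total = sum(s.n for s in samples)
    n_bar = n_total / r
    # n_c corrects for unequal sample sizes
    n_c = (n_total - sum(s.n ** 2 for s in samples) / n_total) / (r - 1)
    p_bar = sum(s.n * s.p for s in samples) / n_total
    s2 = sum(s.n * (s.p - p_bar) ** 2 for s in samples) / ((r - 1) * n_bar)
    h_bar = sum(s.n * s.h for s in samples) / n_total
    pq = p_bar * (1 - p_bar)
    a = (n_bar / n_c) * (s2 - (pq - (r - 1) / r * s2 - h_bar / 4) / (n_bar - 1))
    b = (n_bar / (n_bar - 1)) * (pq - (r - 1) / r * s2 - (2 * n_bar - 1) / (4 * n_bar) * h_bar)
    c = h_bar / 2
    return a, b, c


def wc_estimates(components):
    """Sum (a, b, c) over alleles/loci, then return the ratio estimates (F, theta, f)."""
    a = sum(comp[0] for comp in components)
    b = sum(comp[1] for comp in components)
    c = sum(comp[2] for comp in components)
    F = 1 - c / (a + b + c)      # F_IT
    theta = a / (a + b + c)      # F_ST
    f = 1 - c / (b + c)          # F_IS
    return F, theta, f


diverged = [Sample(50, 0.9, 0.18), Sample(50, 0.1, 0.18)]
F, theta, f = wc_estimates([wc_components(diverged)])
print(round(theta, 3))  # 0.778
```

For two samples of 50 individuals with allele frequencies 0.9 and 0.1 in Hardy-Weinberg proportions, theta comes out near 0.778; two identical samples give a theta slightly below zero, which can happen because the estimator corrects for sampling.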
Article
Genetic studies have often produced conflicting results on the question of whether distant Jewish populations in different geographic locations share greater genetic similarity to each other or instead, to nearby non-Jewish populations. We perform a genome-wide population-genetic study of Jewish populations, analyzing 678 autosomal microsatellite loci in 78 individuals from four Jewish groups together with similar data on 321 individuals from 12 non-Jewish Middle Eastern and European populations. We find that the Jewish populations show a high level of genetic similarity to each other, clustering together in several types of analysis of population structure. Further, Bayesian clustering, neighbor-joining trees, and multidimensional scaling place the Jewish populations as intermediate between the non-Jewish Middle Eastern and European populations. These results support the view that the Jewish populations largely share a common Middle Eastern ancestry and that over their history they have undergone varying degrees of admixture with non-Jewish populations of European descent.
Article
To date, most genome-wide association studies (GWAS) and studies of fine-scale population structure have been conducted primarily in Europeans. The Han Chinese, the largest ethnic group in the world and about 20% of the entire global human population, are largely underrepresented in such studies. A well-recognized challenge is that population structure can cause spurious associations in GWAS. In this study, we examined population substructure in a diverse set of over 1,700 Han Chinese samples collected from 26 regions across China, each genotyped at approximately 160K single-nucleotide polymorphisms (SNPs). Our results showed that the Han Chinese population is intricately substructured, with the main observed clusters corresponding roughly to northern Han, central Han, and southern Han. Simulated case-control studies showed that genetic differentiation among these clusters, although very small (F_ST = 0.0002–0.0009), is sufficient to inflate the false-positive rate even when the sample size is moderate. The top two SNPs with the greatest frequency differences between the northern and southern Han clusters (F_ST > 0.06) were found in the FADS2 gene, which is associated with the fatty acid composition of phospholipids, and in the HLA complex P5 gene (HCP5), which is associated with HIV infection, psoriasis, and psoriatic arthritis. Ingenuity Pathway Analysis (IPA) showed that most of the genes differentiated among clusters are involved in cardiac arteriopathy (p < 10^-101). Signals reflecting these differences among Han Chinese subpopulations should be interpreted with care if they are also detected in association studies, especially when sample sources are diverse.
Article
It was recently shown that the genetic distinction between self-identified Ashkenazi Jewish and non-Jewish individuals is a prominent component of genome-wide patterns of genetic variation in European Americans. However, no study has yet assessed how accurately self-identified (Ashkenazi) Jewish ancestry can be inferred from genomic information, nor whether the degree of Jewish ancestry can be inferred among individuals with fewer than four Jewish grandparents. Using principal components analysis, we found that individuals with full Jewish ancestry formed a cluster clearly distinct from individuals with no Jewish ancestry. On the first principal component axis, every individual with self-reported full Jewish ancestry had a higher score than any individual with no Jewish ancestry. We thus show that, within Americans of European ancestry, there is a perfect genetic correlate of Jewish ancestry, which in principle would permit near-perfect genetic inference of Ashkenazi Jewish ancestry. In fact, even subjects with a single Jewish grandparent can be statistically distinguished from those without Jewish ancestry. We also found that subjects with Jewish ancestry were slightly more heterozygous than subjects with no Jewish ancestry, suggesting that the genetic distinction between Jews and non-Jews may be more attributable to a Near Eastern origin for Jewish populations than to population bottlenecks.
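The PC1-based separation described above can be illustrated with a dependency-free sketch: standardize 0/1/2 genotype counts per SNP, form the individual-by-individual covariance matrix, and extract its leading eigenvector by power iteration. The simulated two-population data, the `simulate` helper, and the scaling by sqrt(p(1 - p)) are assumptions for illustration, not the study's pipeline; real analyses would use an optimized PCA tool.

```python
import math
import random


def simulate(n_per_pop, m, delta, seed=1):
    """Hypothetical demo data: two populations whose allele frequencies
    differ by 2 * delta at every SNP; genotypes are 0/1/2 allele counts."""
    rng = random.Random(seed)
    base = [rng.uniform(0.2, 0.8) for _ in range(m)]
    signs = [rng.choice((-1, 1)) for _ in range(m)]
    genotypes, labels = [], []
    for label, direction in (("A", 1), ("B", -1)):
        for _ in range(n_per_pop):
            row = []
            for f, s in zip(base, signs):
                p = f + direction * s * delta
                row.append((rng.random() < p) + (rng.random() < p))
            genotypes.append(row)
            labels.append(label)
    return genotypes, labels


def standardize(genotypes):
    """Center each SNP at 2p and scale by sqrt(p(1 - p)); drop monomorphic SNPs."""
    n = len(genotypes)
    cols = []
    for j in range(len(genotypes[0])):
        col = [row[j] for row in genotypes]
        p = sum(col) / (2 * n)
        if p in (0.0, 1.0):
            continue
        sd = math.sqrt(p * (1 - p))
        cols.append([(g - 2 * p) / sd for g in col])
    return [list(r) for r in zip(*cols)]  # back to individuals x SNPs


def pc1_scores(genotypes, iters=100, seed=0):
    """Leading eigenvector of X X^T (one score per individual; sign is arbitrary)."""
    x = standardize(genotypes)
    n = len(x)
    cov = [[sum(a * b for a, b in zip(x[i], x[k])) for k in range(n)] for i in range(n)]
    rng = random.Random(seed)
    # random start: an all-ones vector lies in the null space of centred data
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        w = [sum(cov[i][k] * v[k] for k in range(n)) for i in range(n)]
        norm = math.sqrt(sum(t * t for t in w))
        v = [t / norm for t in w]
    return v


genotypes, labels = simulate(20, 300, 0.15)
scores = pc1_scores(genotypes)
group_a = [s for s, l in zip(scores, labels) if l == "A"]
group_b = [s for s, l in zip(scores, labels) if l == "B"]
print(max(group_a) < min(group_b) or max(group_b) < min(group_a))
```

With strongly diverged simulated groups, every member of one group scores on one side of the other group along PC1, mirroring the kind of complete separation the abstract reports.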
Article
We present GERMLINE, a robust algorithm for identifying segmental sharing indicative of recent common ancestry between pairs of individuals. Unlike methods with comparable objectives, GERMLINE scales linearly with the number of samples, enabling analysis of whole-genome data in large cohorts. Our approach is based on a dictionary of haplotypes that is used to efficiently discover short exact matches between individuals. We then expand these matches using dynamic programming to identify long, nearly identical segmental sharing that is indicative of relatedness. We use GERMLINE to comprehensively survey hidden relatedness both in the HapMap and in a densely typed island population of 3000 individuals. We verify that GERMLINE agrees with other methods where they can process the data, and that it also enables analysis of larger studies. We bolster these results by demonstrating novel applications of precise analysis of hidden relatedness for (1) identifying and resolving phasing errors and (2) exposing polymorphic deletions that are otherwise challenging to detect. The deletion finding is supported by concordance of the detected deletions with other evidence from independent databases and by statistical analyses of fluorescence intensity not used by GERMLINE.
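The dictionary-of-haplotypes idea can be illustrated with a simplified seed-and-extend sketch. It is not GERMLINE itself: words must match exactly, there is no dynamic-programming extension across mismatches, and the parameter names `word_len` and `min_words` are hypothetical. Haplotypes are assumed to be phased, equal-length lists of 0/1 alleles.

```python
from collections import defaultdict


def shared_segments(haplotypes, word_len=4, min_words=3):
    """Return (i, j, start, end) for haplotype pairs i < j that match exactly
    over at least `min_words` consecutive words; [start, end) are marker indices."""
    n_words = len(haplotypes[0]) // word_len
    # word indices at which each haplotype pair matches exactly
    matches = defaultdict(set)
    for w in range(n_words):
        lo, hi = w * word_len, (w + 1) * word_len
        # dictionary keyed by the allele pattern of this word
        buckets = defaultdict(list)
        for h, hap in enumerate(haplotypes):
            buckets[tuple(hap[lo:hi])].append(h)
        # every pair in the same bucket is a seed at word w
        for members in buckets.values():
            for x in range(len(members)):
                for y in range(x + 1, len(members)):
                    matches[(members[x], members[y])].add(w)
    segments = []
    for (i, j), words in sorted(matches.items()):
        ordered = sorted(words)
        start = prev = ordered[0]
        # extend seeds through consecutive matching words; None flushes the last run
        for w in ordered[1:] + [None]:
            if w is not None and w == prev + 1:
                prev = w
                continue
            if prev - start + 1 >= min_words:
                segments.append((i, j, start * word_len, (prev + 1) * word_len))
            if w is not None:
                start = prev = w
    return segments


haps = [
    [0] * 24,
    [0] * 16 + [1] * 8,   # shares the first 16 markers with haplotype 0
    [1, 0] * 12,
]
print(shared_segments(haps))  # [(0, 1, 0, 16)]
```

Hashing each word means only haplotypes that land in the same bucket are ever compared, which is what keeps this style of search close to linear in the number of samples when buckets stay small.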
Article
The Roman Jewish community has lived continuously in Rome since pre-Christian times and may have been a progenitor of the Ashkenazi Jewish community. Despite a history of endogamy over the past 2000 yr, the historical record suggests admixture with Ashkenazi and Sephardic Jews during the Middle Ages. To determine whether Roman and Ashkenazi Jews share common signature mutations, we tested a group of 107 Roman Jews, representing 176 haploid sets of chromosomes. No mutations were found for Bloom syndrome, BRCA1, BRCA2, Canavan disease, Fanconi anemia complementation group C, or Tay-Sachs disease. Two unrelated individuals were positive for the 3849+10kb C→T cystic fibrosis mutation; one carried the N370S Gaucher disease mutation, and one carried the connexin 26 167delT mutation. Each of these was shown to be associated with the same haplotype of tightly linked microsatellite markers as that found among Ashkenazi Jews. In addition, 14 individuals had mutations in the familial Mediterranean fever gene, and three unrelated individuals carried the factor XI type III mutation previously observed exclusively among Ashkenazi Jews. These findings suggest that the Gaucher, connexin 26, and familial Mediterranean fever mutations are over 2000 yr old, that the cystic fibrosis 3849+10kb C→T and factor XI type III mutations had a common origin in Ashkenazi and Roman Jews, and that other mutations prevalent among Ashkenazi Jews are of more recent origin.
Article
Haplotypes constructed from Y-chromosome markers were used to trace the paternal origins of the Jewish Diaspora. A set of 18 biallelic polymorphisms was genotyped in 1,371 males from 29 populations, including 7 Jewish (Ashkenazi, Roman, North African, Kurdish, Near Eastern, Yemenite, and Ethiopian) and 16 non-Jewish groups from similar geographic locations. The Jewish populations were characterized by a diverse set of 13 haplotypes that were also present in non-Jewish populations from Africa, Asia, and Europe. A series of analyses was performed to address whether modern Jewish Y-chromosome diversity derives mainly from a common Middle Eastern source population or from admixture with neighboring non-Jewish populations during and after the Diaspora. Despite their long-term residence in different countries and isolation from one another, most Jewish populations were not significantly different from one another at the genetic level. Admixture estimates suggested low levels of European Y-chromosome gene flow into Ashkenazi and Roman Jewish communities. A multidimensional scaling plot placed six of the seven Jewish populations in a relatively tight cluster that was interspersed with Middle Eastern non-Jewish populations, including Palestinians and Syrians. Pairwise differentiation tests further indicated that these Jewish and Middle Eastern non-Jewish populations were not statistically different. The results support the hypothesis that the paternal gene pools of Jewish communities from Europe, North Africa, and the Middle East descended from a common Middle Eastern ancestral population, and suggest that most Jewish communities have remained relatively isolated from neighboring non-Jewish communities during and after the Diaspora.
Article
A sample of 526 Y chromosomes representing six Middle Eastern populations (Ashkenazi, Sephardic, and Kurdish Jews from Israel; Muslim Kurds; Muslim Arabs from Israel and the Palestinian Authority Area; and Bedouin from the Negev) was analyzed for 13 binary polymorphisms and six microsatellite loci. The investigation of the genetic relationship among three Jewish communities revealed that Kurdish and Sephardic Jews were indistinguishable from one another, whereas both differed slightly, yet significantly, from Ashkenazi Jews. The differences observed for the Ashkenazim may result from low-level gene flow from European populations and/or genetic drift during isolation. Admixture between Kurdish Jews and their former Muslim host population in Kurdistan appeared to be negligible. In comparison with data available from other relevant populations in the region, Jews were found to be more closely related to groups in the north of the Fertile Crescent (Kurds, Turks, and Armenians) than to their Arab neighbors. The two haplogroups Eu 9 and Eu 10 constitute a major part of the Y chromosome pool in the analyzed sample. Our data suggest that Eu 9 originated in the northern part, and Eu 10 in the southern part, of the Fertile Crescent. Genetic dating yielded estimates of the expansion of both haplogroups that cover the Neolithic period in the region. Palestinian Arabs and Bedouin differed from the other Middle Eastern populations studied here, mainly in specific high-frequency Eu 10 haplotypes not found in the non-Arab groups. These chromosomes might have been introduced through migrations from the Arabian Peninsula during the last two millennia. The present study contributes to the elucidation of the complex demographic history that shaped the present-day genetic landscape in the region.
Article
The Jews are an ancient people with a history spanning several millennia. Genetic studies over the past 50 years have shed light on Jewish origins, the relatedness of Jewish communities and the genetic basis of Mendelian disorders among Jewish peoples. In turn, these observations have been used to develop genetic testing programmes and, more recently, to attempt to discover new genes for susceptibility to common diseases.
Article
We studied human population structure using genotypes at 377 autosomal microsatellite loci in 1056 individuals from 52 populations. Within-population differences among individuals account for 93 to 95% of genetic variation; differences among major groups constitute only 3 to 5%. Nevertheless, without using prior information about the origins of individuals, we identified six main genetic clusters, five of which correspond to major geographic regions, and subclusters that often correspond to individual populations. General agreement of genetic and predefined populations suggests that self-reported ancestry can facilitate assessments of epidemiological risks but does not obviate the need to use genetic information in genetic association studies.
Article
The molecular basis of more than 25 genetic diseases has been described in Ashkenazi Jewish populations. Most of these diseases are characterized by one or two major founder mutations that are present in the Ashkenazi population at elevated frequencies. One explanation for this preponderance of recessive diseases is accentuated genetic drift resulting from a series of dispersals to and within Europe, endogamy, and/or recent rapid population growth. However, a clear picture of the manner in which neutral genetic variation has been affected by such a demographic history has not yet emerged. We have examined a set of 32 binary markers (single nucleotide polymorphisms; SNPs) and 10 microsatellites on the non-recombining portion of the Y chromosome (NRY) to investigate the ways in which patterns of variation differ between Ashkenazi Jewish populations and their non-Jewish host populations in Europe. This set of SNPs defines a total of 20 NRY haplogroups in these populations, at least four of which are likely to have been part of the ancestral Ashkenazi gene pool in the Near East, and at least three of which may have introgressed to some degree into Ashkenazi populations after their dispersal to Europe. It is striking that whereas Ashkenazi populations are genetically more diverse at both the SNP and STR level than their European non-Jewish counterparts, they have greatly reduced within-haplogroup STR variability, especially in those founder haplogroups that migrated from the Near East. This contrasting pattern of diversity in Ashkenazi populations is evidence for a reduction in male effective population size, possibly resulting from a series of founder events and high rates of endogamy within Europe. This reduced effective population size may explain the high incidence of founder disease mutations despite overall high levels of NRY diversity.
Article
Recent genetic studies, based on Y chromosome polymorphic markers, showed that Ashkenazi Jews are more closely related to other Jewish and Middle Eastern groups than to their host populations in Europe. However, Ashkenazim have an elevated frequency of R-M17, the dominant Y chromosome haplogroup in Eastern Europeans, suggesting possible gene flow. In the present study of 495 Y chromosomes of Ashkenazim, 57 (11.5%) were found to belong to R-M17. Detailed analyses of haplotype structure, diversity and geographic distribution suggest a founder effect for this haplogroup, introduced at an early stage into the evolving Ashkenazi community in Europe. R-M17 chromosomes in Ashkenazim may represent vestiges of the mysterious Khazars.
Article
Current methods for inferring population structure from genetic data do not provide formal significance tests for population differentiation. We discuss an approach to studying population structure (principal components analysis) that was first applied to genetic data by Cavalli-Sforza and colleagues. We place the method on a solid statistical footing, using results from modern statistics to develop formal significance tests. We also uncover a general "phase change" phenomenon about the ability to detect structure in genetic data, which emerges from the statistical theory we use, and has an important implication for the ability to discover structure in genetic data: for a fixed but large dataset size, divergence between two populations (as measured, for example, by a statistic like FST) below a threshold is essentially undetectable, but a little above threshold, detection will be easy. This means that we can predict the dataset size needed to detect structure.
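The "phase change" lends itself to a back-of-the-envelope calculation. A commonly quoted approximation from this line of work places the detection threshold near F_ST ≈ 1/sqrt(nm) for n individuals split between two populations and m markers; the sketch below assumes that form (roughly equal group sizes, approximately independent markers) and uses hypothetical function names. It shows how the dataset size needed to detect a given divergence can be predicted.

```python
import math


def fst_detection_threshold(n_individuals, n_markers):
    """Approximate F_ST below which two-population structure is essentially
    undetectable, assuming the threshold form 1 / sqrt(n * m)."""
    return 1.0 / math.sqrt(n_individuals * n_markers)


def markers_needed(fst, n_individuals):
    """Marker count m such that 1 / sqrt(n * m) <= fst (same assumption)."""
    return math.ceil(1.0 / (fst ** 2 * n_individuals))


print(fst_detection_threshold(1000, 100_000))  # about 1e-4
print(markers_needed(0.001, 1000))             # about 1,000 markers
```

Because the threshold scales with 1/sqrt(nm), doubling both the number of individuals and the number of markers halves the smallest divergence that can be detected.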
Article
GENOME proposes a rapid coalescent-based approach to simulate whole genome data. In addition to features of standard coalescent simulators, the program allows for recombination rates to vary along the genome and for flexible population histories. Within small regions, we have evaluated samples simulated by GENOME to verify that GENOME provides the expected LD patterns and frequency spectra. The program can be used to study the sampling properties of any statistic for a whole genome study. Availability: The program and C++ source code are available online at http://www.sph.umich.edu/csg/liang/genome/
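GENOME's own features (variable recombination rates along the genome, flexible population histories) are beyond a short example, but the standard coalescent that such simulators build on can be sketched for a single non-recombining locus in a constant-size population. Time is in coalescent units; while k lineages remain, the waiting time to the next merger is exponential with rate k(k - 1)/2, so the expected time to the most recent common ancestor of n samples is 2(1 - 1/n). The function names are hypothetical.

```python
import random


def coalescence_waiting_times(n, rng):
    """Waiting times (coalescent units) while k = n, n-1, ..., 2 lineages remain."""
    return [rng.expovariate(k * (k - 1) / 2) for k in range(n, 1, -1)]


def mean_tmrca(n, reps=20_000, seed=42):
    """Monte Carlo estimate of the expected time to the most recent common ancestor."""
    rng = random.Random(seed)
    return sum(sum(coalescence_waiting_times(n, rng)) for _ in range(reps)) / reps


n = 10
print(mean_tmrca(n), 2 * (1 - 1 / n))  # simulated vs theoretical (1.8)
```

Checking a simulated summary against its known expectation, as done here for the TMRCA, is the same kind of validation the abstract describes for LD patterns and frequency spectra.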
Article
Clustering of individuals into populations on the basis of multilocus genotypes is informative in a variety of settings. In population-genetic clustering algorithms, such as BAPS, STRUCTURE and TESS, individual multilocus genotypes are partitioned over a set of clusters, often using unsupervised approaches that involve stochastic simulation. As a result, replicate cluster analyses of the same data may produce several distinct solutions for estimated cluster membership coefficients, even though the same initial conditions were used. Major differences among clustering solutions have two main sources: (1) 'label switching' of clusters across replicates, caused by the arbitrary way in which clusters in an unsupervised analysis are labeled, and (2) 'genuine multimodality,' truly distinct solutions across replicates. To facilitate the interpretation of population-genetic clustering results, we describe three algorithms for aligning multiple replicate analyses of the same data set. We have implemented these algorithms in the computer program CLUMPP (CLUster Matching and Permutation Program). We illustrate the use of CLUMPP by aligning the cluster membership coefficients from 100 replicate cluster analyses of 600 chickens from 20 different breeds. CLUMPP is freely available at http://rosenberglab.bioinformatics.med.umich.edu/clumpp.html.
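Label switching can be undone by searching over column permutations. This sketch aligns one replicate's membership matrix to a reference by trying all K! permutations of its cluster columns and keeping the one with the smallest Frobenius distance. The distance criterion and the function names are assumptions for illustration; CLUMPP defines its own similarity statistic and alignment algorithms.

```python
from itertools import permutations


def frobenius_distance(q1, q2):
    """Square root of the summed squared differences between two Q matrices."""
    return sum((a - b) ** 2 for r1, r2 in zip(q1, q2) for a, b in zip(r1, r2)) ** 0.5


def align_replicate(reference, replicate):
    """Permute the cluster columns of `replicate` to best match `reference`.
    Returns (aligned matrix, perm), where aligned column j = replicate column perm[j]."""
    k = len(reference[0])
    best_perm, best_dist = None, float("inf")
    for perm in permutations(range(k)):
        permuted = [[row[c] for c in perm] for row in replicate]
        d = frobenius_distance(reference, permuted)
        if d < best_dist:
            best_perm, best_dist = perm, d
    return [[row[c] for c in best_perm] for row in replicate], best_perm


reference = [[0.9, 0.1, 0.0], [0.2, 0.7, 0.1], [0.0, 0.1, 0.9]]
replicate = [[row[2], row[0], row[1]] for row in reference]  # same solution, labels shuffled
aligned, perm = align_replicate(reference, replicate)
print(perm)  # (1, 2, 0)
```

The search cost grows as K!, so this brute-force approach is only practical for a small number of clusters.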
Article
We announce the release of the fourth version of MEGA software, which expands on the existing facilities for editing DNA sequence data from autosequencers, mining Web databases, performing automatic and manual sequence alignment, analyzing sequence alignments to estimate evolutionary distances, inferring phylogenetic trees, and testing evolutionary hypotheses. Version 4 includes a unique facility to generate captions, written in figure legend format, in order to provide natural language descriptions of the models and methods used in the analyses. This facility aims to promote a better understanding of the underlying assumptions used in analyses, and of the results generated. Another new feature is the Maximum Composite Likelihood (MCL) method for estimating evolutionary distances between all pairs of sequences simultaneously, with and without incorporating rate variation among sites and substitution pattern heterogeneities among lineages. This MCL method can also be used to estimate transition/transversion bias and nucleotide substitution pattern without knowledge of the phylogenetic tree. This new version is a native 32-bit Windows application with multi-threading and multi-user support, and it is also available to run in a Linux desktop environment (via the Wine compatibility layer) and on Intel-based Macintosh computers under the Parallels program. The current version of MEGA is available free of charge at (http://www.megasoftware.net).
Article
European population genetic substructure was examined in a diverse set of >1,000 individuals of European descent, each genotyped with >300 K SNPs. Both STRUCTURE and principal component analyses (PCA) showed the largest division/principal component (PC) differentiated northern from southern European ancestry. A second PC further separated Italian, Spanish, and Greek individuals from those of Ashkenazi Jewish ancestry as well as distinguishing among northern European populations. In separate analyses of northern European participants other substructure relationships were discerned showing a west to east gradient. Application of this substructure information was critical in examining a real dataset in whole genome association (WGA) analyses for rheumatoid arthritis in European Americans to reduce false positive signals. In addition, two sets of European substructure ancestry informative markers (ESAIMs) were identified that provide substantial substructure information. The results provide further insight into European population genetic substructure and show that this information can be used for improving error rates in association testing of candidate genes and in replication studies of WGA scans.
Article
Author Summary
Genetic association studies analyze both phenotypes (such as disease status) and genotypes (at sites of DNA variation) of a given set of individuals. The goal of association studies is to identify DNA variants that affect disease risk or other traits of interest. However, association studies can be confounded by differences in ancestry. For example, misleading results can arise if individuals selected as disease cases have different ancestry, on average, than healthy controls. Although geographic ancestry explains only a small fraction of human genetic variation, there exist genetic variants that are much more frequent in populations with particular ancestries, and such variants would falsely appear to be related to disease. In an effort to avoid these spurious results, association studies often restrict their focus to a single continental group. European Americans are one such group that is commonly studied in the United States. Here, we analyze multiple large European American datasets to show that important differences in ancestry exist even within European Americans, and that components roughly corresponding to northwest European, southeast European, and Ashkenazi Jewish ancestry are the major, consistent sources of variation. We provide an approach that is able to account for these ancestry differences in association studies even if only a small number of genes is studied.
Article
Genetic isolates such as the Ashkenazi Jews (AJ) potentially offer advantages in mapping novel loci in whole genome disease association studies. To analyze patterns of genetic variation in AJ, genotypes of 101 healthy individuals were determined using the Affymetrix EAv3 500 K SNP array and compared to 60 CEPH-derived HapMap (CEU) individuals. In total, 435,632 SNPs overlapped and met annotation criteria in the two groups. A small but significant global difference in allele frequencies between AJ and CEU was demonstrated by a mean FST of 0.009 (P < 0.001); large regions that differed were found on chromosomes 2 and 6. Haplotype blocks inferred from pairwise linkage disequilibrium (LD) statistics (Haploview) and by expectation-maximization haplotype phase inference (HAP) were more numerous in AJ than in CEU, both by Haploview (50,397 vs. 44,169) and by HAP (59,269 vs. 54,457). Average haplotype blocks were smaller in AJ than in CEU (e.g., 36.8 kb vs. 40.5 kb by HAP). Analysis of global patterns of local LD decay showed more LD in CEU for closely spaced SNPs, whereas for SNPs further apart LD was slightly greater in AJ. A likelihood ratio approach showed that runs of homozygous SNPs were approximately 20% longer in AJ. Principal components analysis was sufficient to completely resolve CEU from AJ. LD in AJ versus CEU was lower than expected by some measures and higher by others. Any putative advantage of the AJ population for whole genome association mapping will be highly dependent on regional LD structure.
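Runs of homozygosity can be illustrated with a simple count-based scan over one individual's genotypes. This is not the likelihood ratio approach used in the study: the sketch assumes ordered 0/1/2 calls along a chromosome with no missing data, ends a run at any heterozygous call, and uses a hypothetical minimum run length `min_snps`.

```python
def runs_of_homozygosity(genotypes, min_snps=50):
    """Return half-open (start, end) index ranges of consecutive homozygous
    calls (0 or 2) at least `min_snps` long; a heterozygous call (1) ends a run."""
    runs = []
    start = None
    for i, g in enumerate(list(genotypes) + [1]):  # sentinel het closes a trailing run
        if g != 1:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_snps:
                runs.append((start, i))
            start = None
    return runs


calls = [0] * 60 + [1] + [2] * 10 + [1] + [0, 2] * 30
print(runs_of_homozygosity(calls))  # [(0, 60), (72, 132)]
```

Comparing mean run lengths between two groups, run by run in this way, is one simple route to the kind of "runs were about 20% longer" comparison reported above, though a count-based scan is more sensitive to genotyping errors than a likelihood-based method.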
Article
Human genetic diversity is shaped by both demographic and biological factors and has fundamental implications for understanding the genetic basis of diseases. We studied 938 unrelated individuals from 51 populations of the Human Genome Diversity Panel at 650,000 common single-nucleotide polymorphism loci. Individual ancestry and population substructure were detectable with very high resolution. The relationship between haplotype heterozygosity and geography was consistent with the hypothesis of a serial founder effect with a single origin in sub-Saharan Africa. In addition, we observed a pattern of ancestral allele frequency distributions that reflects variation in population dynamics among geographic regions. This data set allows the most comprehensive characterization to date of human genetic variation.
Article
The history of the Jewish Diaspora dates back to the Assyrian and Babylonian conquests in the Levant, followed by complex demographic and migratory trajectories over the ensuing millennia, which pose a serious challenge to unraveling population genetic patterns. Here we ask whether phylogenetic analysis, based on highly resolved mitochondrial DNA (mtDNA) phylogenies, can discern among maternal ancestries of the Diaspora. Accordingly, 1,142 samples from 14 different non-Ashkenazi Jewish communities were analyzed. A list of complete mtDNA sequences was established for all variants present at high frequency in the communities studied, along with high-resolution genotyping of all samples. Unlike the previously reported pattern observed among Ashkenazi Jews, the numerically major portion of the non-Ashkenazi Jews, currently estimated at 5 million people and comprising the Moroccan, Iraqi, Iranian and Iberian Exile Jewish communities, showed no evidence for a narrow founder effect, which did, however, characterize the smaller and more remote Belmonte, Indian and two Caucasus communities. The Indian and Ethiopian Jewish sample sets suggested local female introgression, while mtDNAs in all other communities studied belong to a well-characterized West Eurasian pool of maternal lineages. The absence of sub-Saharan African mtDNA lineages among the North African Jewish communities suggests negligible or low-level admixture with females of the host populations, among whom variants of the African haplogroup (Hg) L0–L3 sub-clades are common. In contrast, the North African and Iberian Exile Jewish communities show the influence of putative Iberian admixture, as documented by mtDNA Hg HV0 variants. These findings highlight striking differences in the demographic history of the widespread Jewish Diaspora.
Article
Technological and scientific advances, stemming in large part from the Human Genome and HapMap projects, have made large-scale, genome-wide investigations feasible and cost effective. These advances have the potential to dramatically impact drug discovery and development by identifying genetic factors that contribute to variation in disease risk as well as drug pharmacokinetics, treatment efficacy, and adverse drug reactions. In spite of the technological advancements, successful application in biomedical research would be limited without access to suitable sample collections. To facilitate exploratory genetics research, we have assembled a DNA resource from a large number of subjects participating in multiple studies throughout the world. This growing resource was initially genotyped with a commercially available genome-wide 500,000 single-nucleotide polymorphism panel. The project includes nearly 6,000 subjects of African American, East Asian, South Asian, Mexican, and European origin. Seven informative axes of variation identified via principal component analysis (PCA) of these data confirm the overall integrity of the data and highlight important features of the genetic structure of diverse populations. The potential value of such extensively genotyped collections is illustrated by the selection of genetically matched population controls in a genome-wide analysis of abacavir-associated hypersensitivity reaction. We find that matching based on country of origin, identity-by-state distance, and multidimensional PCA performs similarly well in controlling the type I error rate. The genotype and demographic data from this reference sample are freely available through the NCBI database of Genotypes and Phenotypes (dbGaP).
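Matching on identity-by-state (IBS) distance can be sketched as follows. Here IBS distance is the mean over loci of |g1 - g2| / 2 for genotypes coded as 0/1/2 allele counts (0 when two genotypes are identical, 1 for opposite homozygotes). The nearest-neighbour matching with replacement and the function names are illustrative assumptions, not the procedure used in the study.

```python
def ibs_distance(g1, g2):
    """Mean IBS distance between two 0/1/2 genotype vectors of equal length."""
    return sum(abs(a - b) for a, b in zip(g1, g2)) / (2 * len(g1))


def match_controls(cases, controls):
    """For each case, return the index of the closest control by IBS distance
    (controls may be reused, i.e. matching is with replacement)."""
    return [
        min(range(len(controls)), key=lambda j: ibs_distance(case, controls[j]))
        for case in cases
    ]


cases = [[0, 0, 2]]
controls = [[2, 2, 0], [0, 1, 2]]
print(match_controls(cases, controls))  # [1]
```

Matching without replacement, or on several nearest controls per case, would be natural variations; which one suits a given study depends on the size of the control pool.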