High-density SNP genotyping detects homogeneity of Spanish and French Basques, and confirms their genomic distinctiveness from other European populations

Abstract

A recent study reported that Basques do not constitute a genetically distinct population, and that Basques from Spanish and French provinces do not show significant genetic similarity. These conclusions disagree with numerous previous studies, and are not consistent with the historical and linguistic evidence that supports the distinctiveness of Basques. In order to further investigate this controversy, we have genotyped 83 Spanish Basque individuals and used these data to infer population structure based on more than 60,000 single nucleotide polymorphisms of several European populations. Here, we present the first high-throughput analysis including Basques from Spanish and French provinces, and show that all Basques constitute a homogeneous group that can be clearly differentiated from other European populations.
... Although numerous studies have focused on the genetics of Basques, a lively debate on their population history is still ongoing (see, for instance, Laayouni et al. 7 and Rodríguez-Ezpeleta et al. 8). Interest in the genetics of Basques started with the remarkable observation of a high frequency of the Rh-negative blood group, 9 a genetic variant associated with hemolytic disease of the newborn, which was confirmed in subsequent studies. ...
... Genome-wide data in Basque groups have shown contradictory results, with some studies suggesting that French Basques are markedly different from Spanish Basques, the latter being similar to other Iberian populations, 7 whereas other data were interpreted as showing internal homogeneity within Basques and marked genetic differentiation from non-Basque groups. 8 These remarkably contradictory results might be explained by limited methodology and resolution. The low number of samples used in these analyses to represent the Basque groups and their neighboring areas has been the major limiting factor. ...
Article
Basques have historically lived along the Western Pyrenees, in the Franco-Cantabrian region, straddling the current Spanish and French territories. Over the last decades, they have been the focus of intense research due to their singular cultural and biological traits that, with high controversy, placed them as a heterogeneous, isolated, and unique population. Their non-Indo-European language, Euskara, is thought to be a major factor shaping the genetic landscape of the Basques. Yet there is still a lively debate about their history and assumed singularity due to the limitations of previous studies. Here, we analyze genome-wide data of Basque and surrounding groups that do not speak Euskara at a micro-geographical level. A total of ∼629,000 genome-wide variants were analyzed in 1,970 modern and ancient samples, including 190 new individuals from 18 sampling locations in the Basque area. For the first time, local- and wide-scale analyses from genome-wide data have been performed covering the whole Franco-Cantabrian region, combining allele frequency and haplotype-based methods. Our results show a clear differentiation of Basques from the surrounding populations, with the non-Euskara-speaking Franco-Cantabrians located in an intermediate position. Moreover, a sharp genetic heterogeneity within Basques is observed with significant correlation with geography. Finally, the detected Basque differentiation cannot be attributed to an external origin compared to other Iberian and surrounding populations. Instead, we show that such differentiation results from genetic continuity since the Iron Age, characterized by periods of isolation and lack of recent gene flow that might have been reinforced by the language barrier.
... Our result highlights other communities relevant to genetic mapping efforts, some that are well established in the literature, such as Orkney and Shetland (20,21,46,47) or the Basque (48)(49)(50)(51), and some novel, e.g., the Channel Islands. The Channel Islands are an archipelago off the northern coast of France and are a British dependency. ...
Article
Haplotype-based analyses have recently been leveraged to interrogate the fine-scale structure in specific geographic regions, notably in Europe, although an equivalent haplotype-based understanding across the whole of Europe with these tools is lacking. Furthermore, study of identity-by-descent (IBD) sharing in a large sample of haplotypes across Europe would allow a direct comparison between the demographic histories of different regions. The UK Biobank (UKBB) is a population-scale dataset of genotype and phenotype data collected from the United Kingdom, with established sampling of worldwide ancestries. The exact content of these non-UK ancestries is largely uncharacterized; studying it could highlight valuable intracontinental ancestry references with deep phenotyping within the UKBB. In this context, we sought to investigate the sample of European ancestry captured in the UKBB. We studied the haplotypes of 5,500 UKBB individuals with a European birthplace; investigated the population structure and demographic history in Europe, showing in parallel the variety of footprints of demographic history in different genetic regions around Europe; and expanded knowledge of the genetic landscape of the east and southeast of Europe. Providing an updated map of European genetics, we leverage IBD-segment sharing to explore the extent of population isolation and size across the continent. In addition to building and expanding upon previous knowledge in Europe, our results show the UKBB as a source of diverse ancestries beyond Britain. These worldwide ancestries sampled in the UKBB may complement and inform researchers interested in specific communities or regions not limited to Britain.
... The Iberian Peninsula is linguistically diverse, has a complex demographic history, and is unique among European regions in having a centuries-long period of Muslim rule 10. Previous genetic studies of Spain have examined either a small fraction of the genome [12][13][14] or only a few Spanish regions 15,16. Thus, the overall pattern of fine-scale population structure within Spain remains uncharacterised. ...
Preprint
Genetic differences within or between human populations (population structure) have been studied using a variety of approaches over many years. Recently there has been an increasing focus on studying genetic differentiation at fine geographic scales, such as within countries. Identifying such structure allows the study of recent population history, and identifies the potential for confounding in association studies, particularly when testing rare, often recently arisen variants. The Iberian Peninsula is linguistically diverse, has a complex demographic history, and is unique among European regions in having a centuries-long period of Muslim rule. Previous genetic studies of Spain have examined either a small fraction of the genome or only a few Spanish regions. Thus, the overall pattern of fine-scale population structure within Spain remains uncharacterised. Here we analyse genome-wide genotyping array data for 1,413 Spanish individuals sampled from all regions of Spain. We identify extensive fine-scale structure, down to unprecedented scales, smaller than 10 km in some places. We observe a major axis of genetic differentiation that runs from east to west of the peninsula. In contrast, we observe remarkable genetic similarity in the north-south direction, and evidence of historical north-south population movement. Finally, without making particular prior assumptions about source populations, we show that modern Spanish people have regionally varying fractions of ancestry from a group most similar to modern north Moroccans. The north African ancestry results from an admixture event, which we date to 860–1120 CE, corresponding to the early half of Muslim rule. Our results indicate that it is possible to discern clear genetic impacts of the Muslim conquest and population movements associated with the subsequent Reconquista.
... Natural selection ends up distorting the picture (because of environmental adaptations), so that, in the search for new tools, pseudogenes (non-functional versions of genes, duplications of ge- 36 Although the selection of markers is questioned from the Basque Country (Rodríguez-Ezpeleta et al. 2010). 37 When speaking of more or less complex languages, I am referring to the phonological inventory, without questioning their validity as tools of communication; there is no doubt that the languages mentioned are suitable for discussing any discipline, as was recently demonstrated in the case of the Munduruku, who have no words for geometric concepts but are able to handle geometric relations to locate objects (Dehaene et al. 2006). ...
Article
Fossil-hunters unearthed early hominin specimens in the African savannah, but archaeological evidence is not by itself enough to pinpoint the emergence of modern humans. From the vantage point provided by molecular genetics, Cavalli-Sforza and others have studied DNA patterns in extant populations to generate a family tree rooted in Africa 100,000 years ago. The distribution of genetic markers can be used to track human migrations. Principal component analysis of gene frequencies shows different distributions associated with major historical events. Such an approach was used to confirm the spread of agriculture throughout Europe alongside proto-Indo-European languages (5,000 years ago). Agriculture spread gradually from the Fertile Crescent (area between Iraq and Turkey) along the Mediterranean coast and rivers of central Europe, reaching Britain, Denmark and Spain (the farthest regions) in 4,000 years. This genetic dissection approach also reveals Ice Age-related adaptations; focusing on the Iberian Peninsula, PCA exposes an Atlantic/Mediterranean gradient, highlighting the divide between both cultures. Scientific approaches provide an explanation for human linguistic diversity without relying on cultural myths.
... and of autosomal DNA (Rodríguez-Ezpeleta et al. 2010) corroborated the hypothesis that the Euskaldun people, appearing homogeneous, could be considered partially distinct from neighboring human groups. In the Basque Country, moreover, the relationship between genealogy and genetics was developed in order to investigate the origins of fatal familial insomnia (FFI) as an indirect estimator of levels of consanguinity (Rodríguez Martínez et al. 2007). It is important to pay attention to the language used by these scholars: just like the archaeological narratives of the late nineteenth century, some of the arguments put forward by geneticists can, if taken up and reworked within a political arena, reshape and re-propose pre-existing narratives about national identity (Kent et al. 2014, 736). Towards the end of the 1990s we witnessed in Euskal Herria, as in various parts of the world, a great proliferation of websites dedicated to the practice of DNA consulting. ...
Article
In this paper I propose to demonstrate how the anthropological view can highlight the effects of the relationship between ethnic nationalism and global processes. Presenting the results of ethnographic research conducted in 2015, I focus on the representation of social identity in the contemporary Basque Country. I emphasize the central role played by new technologies in identifying the boundaries of the Basque communities. The creation of "the eighth province" (or province of the diaspora) shows how, in this context, the Internet could transform the "imagined community" into a virtual reality. The ethnographic view proves useful for understanding how local practices and discourses can interact with global phenomena: particularly significant is the spread of archaeogenetic investigations in Euskal Herria, aimed at verifying the hypothesis of a reproductive isolation of the Basque people. Moreover, a large part of the local population is using genetic tests offered online by DNA consulting agencies. It is important to identify how these genetic narratives are absorbed and reused by local populations and whether they can reshape the past of a mnemonic community, influencing the representation of its future.
... It is interesting to note that the presence of two distinct groups in the Southwestern region underscored the outcome of the isolation the Basque-speaking group experienced, splitting from their non-Basque-speaking neighbors from the very same department (PA and PAB groups). This finding is in agreement with their recognized distinct cultural entity (Calafell and Bertranpetit 1994) and their genetic outlier position in the European landscape (Rodríguez-Ezpeleta et al. 2010), as well as with the lower internal levels of differentiation we detected with the F_ST analysis, and the low effective migration rates evidenced by EEMS, resulting in a barrier to migration in the southwestern corner of France. ...
Article
Unlike other European countries, the human population genetics and demographic history of Metropolitan France is surprisingly understudied. In this work, we combined newly genotyped samples from various zones in France with publicly available data and applied both allele frequency and haplotype-based methods to describe the internal structure of this country, using genome-wide single nucleotide polymorphism (SNP) array genotypes. We found that French Basques, already known for their linguistic uniqueness, are genetically distinct from all other groups and that the populations from southwest France (namely the Gascony region) share a large proportion of their ancestry with Basques. Otherwise, the genetic makeup of the French population is relatively homogeneous and mostly related to Southern and Central European groups. However, a fine-grained, haplotype-based analysis revealed that Bretons are slightly separated from the rest of the groups, due mostly to gene flow from the British Isles in a time frame that coincides both with historically attested Celtic population movements to this area between the 3rd and the 9th centuries CE and with a more ancient genetic continuity between Brittany and the British Isles related to the shared drift with hunter-gatherer populations. Haplotype-based methods also unveiled subtle internal structures and connections with the surrounding modern populations, particularly in the periphery of the country.
... Received 12 August 2019; Received in revised form 26 November 2019; Accepted 1 February 2020 exist when analysed at the genomic level. On the contrary, Rodriguez Ezpeleta et al. [12] analysed genome wide SNPs and concluded that Basques (French and Spanish) form a homogeneous population, which seems to cluster apart from the rest of the European populations. Nevertheless, the inclusion of further samples of "resident" Basque individuals in the Basque Country (individuals with at least one grandparent from outside the Basque Country) seems to fill in this gap between Basque and Europeans, with some of these "resident" Basque fitting well into the presumed Basque cluster. ...
Article
The Basque Country has been the focus of population (genetic) and evolutionary studies for decades, as it represents an interesting evolutionary feature: it is the only European country where a non-Indo-European language is still spoken today for which there are no known living or extinct relatives. Early studies that were based on anatomical and serological methods, along with subsequent molecular genetic investigations, contain controversial interpretations of their data. Additionally, the analysis of mitochondrial DNA, which is maternally inherited and thus suitable for the examination of the maternal phylogeny of the population, was the focus of some studies. Early mtDNA studies were, however, restricted to the information provided by the control region or its hypervariable segments only. These are known to harbour little phylogenetic information, particularly for haplogroup H, which is dominant in West Eurasian populations including the Basques. Later studies analysed complete mitogenome sequences. Their information content is, however, limited, either because the number of samples was low or because these studies only considered particular haplogroups. In this study we present the full mitogenome sequences of 178 autochthonous Basque individuals who were carefully selected based on their familial descent, and discuss the observed phylogenetic signals in the light of earlier published findings. We confirm the presence of Basque-specific mtDNA lineages and extend the knowledge of these lineages by providing data on their distribution in comparison to other Basque and non-Basque populations. This dataset improves our understanding of the Basque mtDNA phylogeny and serves as a high-quality dataset that is provided via EMPOP for forensic genetic purposes.
Article
Why do individuals engage in violence against the state? This research investigates the biological and environmental determinants of individual-level participation in political violence through a candidate-gene association study of gene-environment interaction. Existing research has demonstrated that variation in a specific gene (called MAO-A) is associated with aggression. However, relatively little scholarly attention has been paid to the interaction with the environment; specifically, the ways in which repressive political environments differentially incite acts of violence. Using original genetic, survey and experimental data collected on participants and non-participants of political violence, I find that under conditions of political repression, individuals with the low MAO-A genetic variant are significantly more likely to engage in acts of political violence. By examining both the genetic and environmental factors influencing political violence, the results make a significant contribution to our understanding of how genetic variation may lead to violence.
Article
One of the main challenges of human population genetics has been the reconstruction of the population history of humans at different scales, from the origin of modern humans to the history of specific groups. In all cases, information from other historical sciences (including archaeology, linguistics and physical anthropology) should match in the unique frame of population history. Cavalli-Sforza had a pioneering role in defining the problem and putting together a database of classical genetic markers and statistical methods to make the genetic approach highly relevant. One of the problems studied refers to the Basque population, establishing its distinctiveness and “origin”. As in many other settings, research in the area in the last few decades has flourished by adding much DNA information and statistical analysis to corroborate or correct the initial hypotheses. In the case of the Basques, the differentiation without strong external genetic influences has been confirmed as due to isolation, and instead of being pre-Neolithic, it is currently dated to the Iron Age, only some 2,500 years ago. Based on: “Bertranpetit J, Cavalli-Sforza LL. A genetic reconstruction of the history of the population of the Iberian Peninsula. Ann Hum Genet 1991; 55:51-67.”
Article
Here we report on the Y haplogroup and Y-STR diversity of the three autochthonous Basque populations of Alava (n = 54), Guipuzcoa (n = 30) and Vizcaya (n = 61). The same samples genotyped for Y-chromosome SNPs were typed for 17 Y-STR loci (DYS19, DYS385a/b, DYS389I/II, DYS390, DYS391, DYS392, DYS393, DYS437, DYS438, DYS439, DYS448, DYS456, DYS458, DYS635, Y-GATA H4) using the AmpFlSTR Yfiler system. Six major haplogroups (R, I, E, J, G, and DE) were detected, with haplogroup R-S116 (P312) the most abundant, at 75.0% in Alava, 86.7% in Guipuzcoa and 87.3% in Vizcaya. Age estimates for the R-S116 mutation in the Basque Country are 3975 ± 303, 3680 ± 345 and 4553 ± 285 years for Alava, Guipuzcoa and Vizcaya, respectively. Pairwise Rst genetic distances demonstrated close Y-chromosome affinities among the three autochthonous Basque populations and between them and the male populations of Ireland and Gascony. In an MDS plot, the population of Ireland segregates within the Basque cluster and closest to the population of Guipuzcoa, which plots closer to Ireland than to any of the other Basque populations. Overall, the results support the notion that during the Bronze Age a dispersal of individuals carrying the R-S116 mutation reached the Basque Country, replacing the Paleolithic/Neolithic Y chromosome of the region.
Article
As we move forward from the current generation of genome-wide association (GWA) studies, additional cohorts of different ancestries will be studied to increase power, fine map association signals, and generalize association results to additional populations. Knowledge of genetic ancestry as well as population substructure will become increasingly important for GWA studies in populations of unknown ancestry. Here we propose genotyping pooled DNA samples using genome-wide SNP arrays as a viable option to efficiently and inexpensively estimate admixture proportion and identify ancestry informative markers (AIMs) in populations of unknown origin. We constructed DNA pools from African American, Native Hawaiian, Latina, and Jamaican samples and genotyped them using the Affymetrix 6.0 array. Aided by individual genotype data from the African American cohort, we established quality control filters to remove poorly performing SNPs and estimated allele frequencies for the remaining SNPs in each panel. We then applied a regression-based method to estimate the proportion of admixture in each cohort using the allele frequencies estimated from pooling and populations from the International HapMap Consortium as reference panels, and identified AIMs unique to each population. In this study, we demonstrated that genotyping pooled DNA samples yields estimates of admixture proportion that are both consistent with our knowledge of population history and similar to those obtained by genotyping known AIMs. Furthermore, through validation by individual genotyping, we demonstrated that pooling is quite effective for identifying SNPs with large allele frequency differences (i.e., AIMs) and that these AIMs are able to differentiate two closely related populations (HapMap JPT and CHB).
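The abstract above does not spell out its regression model, so the following is only a minimal sketch of one common formulation, assuming exactly two reference panels and linear mixing of allele frequencies (p_mix = m * p_A + (1 - m) * p_B at every SNP). The function names `estimate_admixture` and `rank_aims` and all frequencies are hypothetical illustrations, not the authors' method or data.

```python
def estimate_admixture(p_mix, p_ref_a, p_ref_b):
    """Least-squares estimate of the ancestry fraction m from reference A.

    Assumes the linear mixing model p_mix = m * p_A + (1 - m) * p_B,
    which rearranges to (p_mix - p_B) = m * (p_A - p_B): a regression
    through the origin with slope m.
    """
    num = 0.0
    den = 0.0
    for pm, pa, pb in zip(p_mix, p_ref_a, p_ref_b):
        delta = pa - pb
        num += (pm - pb) * delta
        den += delta * delta
    if den == 0.0:
        raise ValueError("reference panels have identical allele frequencies")
    # Clamp to the valid range, since noisy data can push the slope outside [0, 1].
    return min(1.0, max(0.0, num / den))


def rank_aims(p_ref_a, p_ref_b, top):
    """Indices of the `top` SNPs with the largest allele frequency difference."""
    order = sorted(
        range(len(p_ref_a)),
        key=lambda i: abs(p_ref_a[i] - p_ref_b[i]),
        reverse=True,
    )
    return order[:top]


# Hypothetical toy frequencies for four SNPs in two reference panels.
ref_a = [0.10, 0.90, 0.35, 0.60]
ref_b = [0.70, 0.20, 0.55, 0.10]

# Simulate a pooled sample with 25% ancestry from panel A.
true_m = 0.25
mix = [true_m * a + (1 - true_m) * b for a, b in zip(ref_a, ref_b)]

estimate = estimate_admixture(mix, ref_a, ref_b)
best_markers = rank_aims(ref_a, ref_b, 2)
```

With noise-free toy data the estimate recovers the simulated fraction; with real pooled-array frequencies the fit would also need quality-control filtering of SNPs, which this sketch omits.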
Article
The Basques are a culturally isolated population, living across the western border between France and Spain and speaking a non-Indo-European language. They show outlier allele frequencies in the ABO, RH, and HLA loci. To test whether Basques are a genetic isolate with the features that would make them good candidates in genetic association studies, we genotyped 123 SNPs in a 1-Mb region in chromosome 22 in Basque samples from France and Spain, as well as in samples from northern and southern Spain, and in three North African samples. Both Basque samples showed similar levels of heterozygosity to the other populations, and the decay of linkage disequilibrium with physical distance was not different between Basques and non-Basques. Thus, Basques do not show the genetic properties expected in population isolates.
Article
We describe extensions to the method of Pritchard et al. for inferring population structure from multilocus genotype data. Most importantly, we develop methods that allow for linkage between loci. The new model accounts for the correlations between linked loci that arise in admixed populations (“admixture linkage disequilibrium”). This modification has several advantages, allowing (1) detection of admixture events farther back into the past, (2) inference of the population of origin of chromosomal regions, and (3) more accurate estimates of statistical uncertainty when linked loci are used. It is also of potential use for admixture mapping. In addition, we describe a new prior model for the allele frequencies within each population, which allows identification of subtle population subdivisions that were not detectable using the existing method. We present results applying the new methods to study admixture in African-Americans, recombination in Helicobacter pylori, and drift in populations of Drosophila melanogaster. The methods are implemented in a program, structure, version 2.0, which is available at http://pritch.bsd.uchicago.edu.
Article
In analysis of multilocus genotypes from structured populations, individual coefficients of membership in subpopulations are often estimated using programs such as structure. distruct provides a general method for visualizing these estimated membership coefficients. Subpopulations are represented as colours, and individuals are depicted as bars partitioned into coloured segments that correspond to membership coefficients in the subgroups. distruct, available at http://www.cmb.usc.edu/~noahr/distruct.html, can also be used to display subpopulation assignment probabilities when individuals are assumed to have ancestry in only one group.
Article
Basques are a cultural isolate and, according mainly to allele frequencies of classical polymorphisms, also a genetic isolate. We investigated the differentiation of Spanish Basques from the rest of Iberian populations by means of a dense, genome-wide SNP array. We found that F_ST distances between Spanish Basques and other populations were similar to those between pairs of non-Basque populations. The same result is found in a PCA of individuals, showing a general distinction between Iberians and other South Europeans, regardless of whether they are Basques. Pathogen-mediated natural selection may be responsible for the high differentiation previously reported for Basques at very specific genes such as ABO, RH, and HLA. Thus, Basques cannot be considered a genetic outlier under a general genome scope and interpretations on their origin may have to be revised.
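The abstract above compares populations by F_ST distance but does not say which estimator was used. As a rough illustration only, the sketch below computes the textbook heterozygosity-based F_ST = (H_T - H_S) / H_T for two populations from biallelic SNP allele frequencies, summing numerator and denominator across SNPs. It assumes equal population weights and applies no sample-size correction; the function name `fst_two_populations` and the frequencies are hypothetical.

```python
def fst_two_populations(freqs_1, freqs_2):
    """Heterozygosity-based F_ST between two populations.

    freqs_1 and freqs_2 hold the frequency of the same allele at each
    biallelic SNP. H_S is the mean within-population expected
    heterozygosity, H_T the expected heterozygosity of the pooled
    population; both are summed over SNPs before taking the ratio.
    """
    if len(freqs_1) != len(freqs_2):
        raise ValueError("both populations need the same SNPs")
    h_within = 0.0
    h_total = 0.0
    for p1, p2 in zip(freqs_1, freqs_2):
        p_bar = (p1 + p2) / 2.0
        h_within += (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2.0
        h_total += 2 * p_bar * (1 - p_bar)
    if h_total == 0.0:
        # Every SNP is monomorphic for the same allele: no differentiation to measure.
        return 0.0
    return (h_total - h_within) / h_total


# Hypothetical frequencies at a single SNP: 0.4 in one population, 0.6 in the other.
fst_example = fst_two_populations([0.4], [0.6])
```

Identical frequencies give 0 and a fixed difference gives 1, so the statement that Basque versus non-Basque values are "similar to those between pairs of non-Basque populations" amounts to comparing numbers of this kind across population pairs.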
Article
This is the version as published in the American Journal of Human Genetics by the University Of Chicago Press. Their website is http://www.journals.uchicago.edu/AJHG.home.html We have examined the worldwide distribution of a Y-chromosomal base-substitution polymorphism, the T/C transition at SRY-2627, where the T allele defines haplogroup 22; sequencing of primate homologues shows that the ancestral state cannot be determined unambiguously but is probably the C allele. Of 1,191 human Y chromosomes analyzed, 33 belong to haplogroup 22. Twenty-nine come from Iberia, and the highest frequencies are in Basques (11%; n=117) and Catalans (22%; n=32). Microsatellite and minisatellite (MSY1) diversity analysis shows that non-Iberian haplogroup-22 chromosomes are not significantly different from Iberian ones. The simplest interpretation of these data is that haplogroup 22 arose in Iberia and that non-Iberian cases reflect Iberian emigrants. Several different methods were used to date the origin of the polymorphism: microsatellite data gave ages of 1,650, 2,700, 3,100, or 3,450 years, and MSY1 gave ages of 1,000, 2,300, or 2,650 years, although 95% confidence intervals on all of these figures are wide. The age of the split between Basque and Catalan haplogroup-22 chromosomes was calculated as only 20% of the age of the lineage as a whole. This study thus provides evidence for direct or indirect gene flow over the substantial linguistic barrier between the Indo-European and non-Indo-European-speaking populations of the Catalans and the Basques, during the past few thousand years.
Article
Different analyses of genetic polymorphisms performed on the Basque population have suggested a possible heterogeneity of the Basques and a singularity of their genetic characteristics. In this paper, both aspects are analyzed by means of the genetic study of seven polymorphic systems--ACP, ADA, AK, ESD, PGD, GC, and HP--in 854 autochthonous individuals from the province of Vizcaya. The individuals were classified as being from the regions of Arratia, Guernica, Durango, Uribe, Marquina, Lea, and Bilbao, on the basis of the birthplaces of their four grandparents. Analyses for heterogeneity of the gene frequencies distribution suggest that there is a moderate genetic heterogeneity, probably produced by centuries of geographical and administrative isolation of these regions. The comparison with caucasoid populations, performed using the principal components analysis and Cavalli-Sforza and Edwards arc distance, indicates that the subpopulations of the province of Vizcaya have experienced little genetic exchange with other caucasoids and that the distribution of their genetic frequencies differentiates them from other populations.