ABSTRACT: The identification of the genetic structure of populations from multilocus genotype data has become a central component of modern population-genetic data analysis. Application of model-based clustering programs often entails a number of steps, in which the user considers different modeling assumptions, compares results across different pre-determined values of the number of assumed clusters (a parameter typically denoted K), examines multiple independent runs for each fixed value of K, and distinguishes among runs belonging to substantially distinct clustering solutions. Here, we present Clumpak (Cluster Markov Packager Across K), a method that automates the post-processing of results of model-based population structure analyses. For analyzing multiple independent runs at a single K value, Clumpak identifies sets of highly similar runs, separating distinct groups of runs that represent distinct modes in the space of possible solutions. This procedure, which generates a consensus solution for each distinct mode, is performed by the use of a Markov clustering algorithm that relies on a similarity matrix between replicate runs, as computed by the software Clumpp. Next, Clumpak identifies an optimal alignment of inferred clusters across different values of K, extending a similar approach implemented for a fixed K in Clumpp, and simplifying the comparison of clustering results across different K values. Clumpak incorporates additional features, such as implementations of methods for choosing K and comparing solutions obtained by different programs, models, or data subsets. Clumpak, available at http://clumpak.tau.ac.il, simplifies the use of model-based analyses of population structure in population genetics and molecular ecology. This article is protected by copyright. All rights reserved.
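The idea of separating replicate runs into distinct modes from a pairwise similarity matrix can be illustrated with a minimal sketch. This is not Clumpak's implementation: the abstract states that Clumpak uses a Markov clustering algorithm on similarities computed by Clumpp, whereas the sketch below substitutes a much simpler threshold-based grouping (connected components via union-find). The function name `group_runs`, the threshold of 0.9, and the similarity values are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def group_runs(similarity, threshold=0.9):
    """Group run indices that are linked, directly or transitively,
    by a pairwise similarity of at least `threshold`.

    Simplified stand-in for Markov clustering; `threshold` is an
    illustrative assumption, not a Clumpak parameter.
    """
    n = len(similarity)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge every pair of runs whose similarity reaches the threshold.
    for i in range(n):
        for j in range(i + 1, n):
            if similarity[i][j] >= threshold:
                parent[find(i)] = find(j)

    groups = defaultdict(list)
    for i in range(n):
        groups[find(i)].append(i)

    # Largest group first; ties are broken by the smallest run index.
    return sorted(groups.values(), key=lambda g: (-len(g), g[0]))


# Hypothetical similarity matrix for five replicate runs at one value of K.
similarity = [
    [1.00, 0.98, 0.97, 0.55, 0.52],
    [0.98, 1.00, 0.99, 0.54, 0.50],
    [0.97, 0.99, 1.00, 0.56, 0.53],
    [0.55, 0.54, 0.56, 1.00, 0.96],
    [0.52, 0.50, 0.53, 0.96, 1.00],
]

modes = group_runs(similarity)
print(modes)  # [[0, 1, 2], [3, 4]]
```

With these made-up values, runs 0 to 2 form the larger group and runs 3 and 4 a smaller one. A consensus solution per group, as described in the abstract, would then be computed separately within each group.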
ABSTRACT: The majority of sub-Saharan Africans today speak a number of closely related languages collectively referred to as ‘Bantu’ languages. The current distribution of Bantu-speaking populations has been found to be largely a consequence of the movement of people rather than a diffusion of language alone. Linguistic and single-marker genetic studies have generated various hypotheses regarding the timing and the routes of the Bantu expansion, but these hypotheses have not been thoroughly investigated. In this study, we re-analysed microsatellite markers typed for a large number of African populations; owing to their fast mutation rates, these markers capture signatures of recent population history. We confirm the spread of west African people across most of sub-Saharan Africa and estimate, using a Bayesian approach, that the expansion of Bantu-speaking groups began around 5600 years ago. We tested four different divergence models for Bantu-speaking populations with a distribution comprising three geographical regions in Africa. We found that the most likely model for the movement of the eastern branch of Bantu-speakers involves migration of Bantu-speaking groups to the east followed by migration to the south. This model, however, is only marginally more likely than other models, which might indicate direct movement from the west and/or significant gene flow with the western branch of Bantu-speakers. Our study uses multilocus genetic data to explicitly investigate the timing and mode of the Bantu expansion, and it demonstrates that west African groups rapidly expanded both in numbers and over a large geographical area, affirming that the Bantu expansion was one of the most dramatic demographic events in human history.
Proceedings of the Royal Society B: Biological Sciences 09/2014; 281(1793). · 5.29 Impact Factor
ABSTRACT: The New World Arctic, the last region of the Americas to be populated by humans, has a relatively well-researched archaeology, but an understanding of its genetic history is lacking. We present genome-wide sequence data from ancient and present-day humans from Greenland, Arctic Canada, Alaska, Aleutian Islands, and Siberia. We show that Paleo-Eskimos (~3000 BCE to 1300 CE) represent a migration pulse into the Americas independent of both Native American and Inuit expansions. Furthermore, the genetic continuity characterizing the Paleo-Eskimo period was interrupted by the arrival of a new population, representing the ancestors of present-day Inuit, with evidence of past gene flow between these lineages. Despite periodic abandonment of major Arctic regions, a single Paleo-Eskimo metapopulation likely survived in near-isolation for more than 4000 years, only to vanish around 700 years ago.
ABSTRACT: The rapid advance of sequencing technology, coupled with improvements in molecular methods for obtaining genetic data from ancient sources, holds the promise of producing a wealth of genomic data from time-separated individuals. However, the population-genetic properties of time-structured samples have not been extensively explored. Here, we consider the implications of temporal sampling for analyses of genetic differentiation, and use a temporal coalescent framework to show that complex historical events such as size reductions, population replacements, and transient genetic barriers between populations leave a footprint of genetic differentiation that can be traced through history using temporal samples. Our results emphasize explicit consideration of the temporal structure when making inferences, and indicate that genomic data from ancient individuals will greatly increase our ability to reconstruct population history.
Molecular Biology and Evolution 06/2014; · 14.31 Impact Factor
ABSTRACT: Genome-wide scans for regions that demonstrate deviating patterns of genetic variation have become common approaches for finding genes targeted by selection. Several genomic patterns have been utilized for this purpose, including deviations in haplotype homozygosity, frequency spectra and genetic differentiation between populations.
ABSTRACT: Prehistoric population structure associated with the transition to an agricultural lifestyle in Europe remains contentious. Population-genomic data from eleven Scandinavian Stone-Age human remains suggest that hunter-gatherers had lower genetic diversity than farmers. Despite their close geographical proximity, the genetic differentiation between the two Stone-Age groups was greater than that observed among extant European populations. Additionally, the Scandinavian Neolithic farmers exhibited a greater degree of hunter-gatherer-related admixture than that of the Tyrolean Iceman, who also originated from a farming context. In contrast, Scandinavian hunter-gatherers displayed no significant evidence of introgression from farmers. Our findings suggest that Stone-Age foraging groups were historically in low numbers, likely due to oscillating living conditions or restricted carrying-capacity, and that they were partially incorporated into expanding farming groups.
ABSTRACT: The ability to digest milk into adulthood, lactase persistence (LP), as well as specific genetic variants associated with LP, is heterogeneously distributed in global populations [1-4]. These variants were most likely targets of selection when some populations converted from hunter-gatherer to pastoralist or farming lifestyles [5-7]. Specific LP polymorphisms are associated with particular geographic regions and populations [1-4, 8-10]; however, they have not been extensively studied in southern Africa. We investigate the LP-regulatory region in 267 individuals from 13 southern African populations (including descendants of hunter-gatherers, pastoralists, and agropastoralists), providing the first comprehensive study of the LP-regulatory region in a large group of southern Africans. The "East African" LP single-nucleotide polymorphism (SNP) (14010G>C) was found at high frequency (>20%) in a strict pastoralist Khoe population, the Nama of Namibia, suggesting a connection to East Africa, whereas the "European" LP SNP (13910C>T) was found in populations of mixed ancestry. Using genome-wide data from various African populations, we identify admixture (13%) in the Nama, from an Afro-Asiatic group dating to >1,300 years ago, with the remaining fraction of their genomes being from San hunter-gatherers. We also find evidence of selection around the LCT gene among Khoe-speaking groups, and the substantial frequency of the 14010C variant among the Nama is best explained by adaptation to digesting milk. These genome-local and genome-wide results support a model in which an East African group brought pastoralist practices to southern Africa and admixed with local hunter-gatherers to form the ancestors of Khoe people.
Current biology: CB 04/2014; · 10.99 Impact Factor
ABSTRACT: Clovis, with its distinctive biface, blade and osseous technologies, is the oldest widespread archaeological complex defined in North America, dating from 11,100 to 10,700 ¹⁴C years before present (bp) (13,000 to 12,600 calendar years bp). Nearly 50 years of archaeological research point to the Clovis complex as having developed south of the North American ice sheets from an ancestral technology. However, both the origins and the genetic legacy of the people who manufactured Clovis tools remain under debate. It is generally believed that these people ultimately derived from Asia and were directly related to contemporary Native Americans. An alternative, Solutrean, hypothesis posits that the Clovis predecessors emigrated from southwestern Europe during the Last Glacial Maximum. Here we report the genome sequence of a male infant (Anzick-1) recovered from the Anzick burial site in western Montana. The human bones date to 10,705 ± 35 ¹⁴C years bp (approximately 12,707-12,556 calendar years bp) and were directly associated with Clovis tools. We sequenced the genome to an average depth of 14.4× and show that the gene flow from the Siberian Upper Palaeolithic Mal'ta population into Native American ancestors is also shared by the Anzick-1 individual and thus happened before 12,600 years bp. We also show that the Anzick-1 individual is more closely related to all indigenous American populations than to any other group. Our data are compatible with the hypothesis that Anzick-1 belonged to a population directly ancestral to many contemporary Native Americans. Finally, we find evidence of a deep divergence in Native American populations that predates the Anzick-1 individual.
ABSTRACT: Ancestral relationships between populations separated by time represent an often neglected dimension in population genetics, a field which historically has focused on analysis of spatially distributed samples from the same point in time. Models are usually straightforward when two time-separated populations are assumed to be completely isolated from all other populations. However, this is usually an unrealistically stringent assumption when there is gene flow with other populations. Here we investigate continuity in the presence of gene flow from unknown populations. This set-up allows a more nuanced treatment of questions regarding population continuity in terms of "level of contribution" from a particular ancient population to a more recent population. We propose a statistical framework which makes use of a biallelic marker sampled at two different points in time to assess population contribution, and present two different interpretations of the concept. We apply the approach to published data from a prehistoric human population in Scandinavia (Malmström et al. 2009) and Pleistocene woolly mammoth (Barnes et al. 2007; Debruyne et al. 2008).
Molecular Biology and Evolution 02/2014; · 14.31 Impact Factor
ABSTRACT: One of the main impediments to obtaining DNA sequences from ancient human skeletons is the presence of contaminating modern human DNA molecules in many fossil samples and laboratory reagents. However, DNA fragments isolated from ancient specimens show a characteristic DNA damage pattern, caused by miscoding lesions, that distinguishes them from present-day DNA sequences. Here, we develop a framework for evaluating the likelihood of a sequence originating from a model with postmortem degradation (summarized in a postmortem degradation score), which allows the identification of DNA fragments that are unlikely to originate from present-day sources. We apply this approach to a contaminated Neandertal specimen from Okladnikov Cave in Siberia to isolate its endogenous DNA from modern human contaminants and show that the reconstructed mitochondrial genome sequence is more closely related to the variation of Western Neandertals than was discernible from previous analyses. Our method opens up the potential for genomic analysis of contaminated fossil material.
Proceedings of the National Academy of Sciences 01/2014; · 9.81 Impact Factor
ABSTRACT: The origins of the First Americans remain contentious. Although Native Americans seem to be genetically most closely related to east Asians, there is no consensus with regard to which specific Old World populations they are closest to. Here we sequence the draft genome of an approximately 24,000-year-old individual (MA-1), from Mal'ta in south-central Siberia, to an average depth of 1×. To our knowledge this is the oldest anatomically modern human genome reported to date. The MA-1 mitochondrial genome belongs to haplogroup U, which has also been found at high frequency among Upper Palaeolithic and Mesolithic European hunter-gatherers, and the Y chromosome of MA-1 is basal to modern-day western Eurasians and near the root of most Native American lineages. Similarly, we find autosomal evidence that MA-1 is basal to modern-day western Eurasians and genetically closely related to modern-day Native Americans, with no close affinity to east Asians. This suggests that populations related to contemporary western Eurasians had a more north-easterly distribution 24,000 years ago than commonly thought. Furthermore, we estimate that 14 to 38% of Native American ancestry may originate through gene flow from this ancient population. This is likely to have occurred after the divergence of Native American ancestors from east Asian ancestors, but before the diversification of Native American populations in the New World. Gene flow from the MA-1 lineage into Native American ancestors could explain why several crania from the First Americans have been reported as bearing morphological characteristics that do not resemble those of east Asians. Sequencing of another south-central Siberian, Afontova Gora-2 dating to approximately 17,000 years ago, revealed similar autosomal genetic signatures as MA-1, suggesting that the region was continuously occupied by humans throughout the Last Glacial Maximum. Our findings reveal that western Eurasian genetic signatures in modern-day Native Americans derive not only from post-Columbian admixture, as commonly thought, but also from a mixed ancestry of the First Americans.
ABSTRACT: Reconstructing historical variation of population size from sequence and single-nucleotide polymorphism (SNP) data is valuable for understanding the evolutionary history of species. Changes in the population size of humans have been thoroughly investigated, and we review different methodologies of demographic reconstruction, specifically focusing on human bottlenecks. In addition to the classical approaches based on the site-frequency spectrum (SFS) or based on linkage disequilibrium, we also review more recent approaches that utilize atypical shared genomic fragments, such as identical by descent or homozygous segments between or within individuals. Compared with methods based on the SFS, these methods are well suited for detecting recent bottlenecks. In general, all these various methods suffer from bias and dependencies on confounding factors such as population structure or poor specification of the mutational and recombination processes, which can affect the demographic reconstruction. With the exception of SFS-based methods, the effects of confounding factors on the inference methods remain poorly investigated. We conclude that an important step when investigating population size changes rests on validating the demographic model by investigating to what extent the fitted demographic model can reproduce the main features of the polymorphism data.
Heredity advance online publication, 20 February 2013; doi:10.1038/hdy.2012.120.
ABSTRACT: Genetic differentiation among human populations is greatly influenced by geography due to the accumulation of local allele frequency differences. However, little is known about whether genetic differentiation increases at different rates along different geographical axes (north-south, east-west, etc.). Here we provide new methods to examine the asymmetrical patterns of genetic differentiation. We analyzed genome-wide polymorphism data from populations in Africa (n = 29), Asia (n = 26), America (n = 9) and Europe (n = 38), and we found that the major orientations of genetic differentiation are north-south in Europe and Africa and east-west in Asia, whereas no preferential orientation was found in the Americas. Additionally, we showed that the localization of individual geographic origins based on SNP data was not equally precise along all orientations. Confirming our findings, we found that in each continent the orientation along which the precision is maximal corresponds to the orientation of maximum differentiation. Our results have implications for interpreting human genetic variation in terms of isolation by distance and spatial range expansion processes. In Europe, for instance, the precise NNW-SSE axis of main European differentiation cannot be explained by a simple Neolithic demic diffusion model without admixture with the local populations, because in that case the orientation of greatest differentiation should be perpendicular to the direction of expansion. In addition to humans, anisotropic analyses can guide the description of genetic differentiation for other organisms and provide information on expansions of invasive species or the processes of plant dispersal.
Molecular Biology and Evolution 11/2012; · 14.31 Impact Factor