Unraveling the Complex Maternal History of
Southern African Khoisan Populations
Chiara Barbieri,* Tom Güldemann, Christfried Naumann, Linda Gerlach, Falko Berthold, Hirosi Nakagawa, Sununguko W. Mpoloka, Mark Stoneking, and Brigitte Pakendorf
Max Planck Research Group on Comparative Population Linguistics, MPI for Evolutionary
Anthropology, Leipzig 04103, Germany.
Department of African Studies, Humboldt University, Berlin 10099, Germany
Department of Linguistics, MPI for Evolutionary Anthropology, Leipzig 04103, Germany
Institute of Global Studies, Tokyo University of Foreign Studies, Tokyo 183-8534, Japan
Department of Biological Sciences, University of Botswana, Gaborone, Botswana
Department of Evolutionary Genetics, MPI for Evolutionary Anthropology, Leipzig 04103, Germany
KEY WORDS mtDNA; haplogroup; foragers
ABSTRACT The Khoisan populations of southern Africa are known to harbor some of the deepest-rooting lineages of human mtDNA; however, their relationships are as yet poorly understood. Here, we report the results of analyses of complete mtDNA genome sequences from nearly 700 individuals representing 26 populations of southern Africa who speak diverse Khoisan and Bantu languages. Our data reveal a multilayered history of the indigenous populations of southern Africa, who are likely to be the result of admixture of different genetic substrates, such as resident forager populations and pre-Bantu pastoralists from East Africa. We find high levels of genetic differentiation of the Khoisan populations, which can be explained by the effect of drift together with a partial uxorilocal/multilocal residence pattern. Furthermore, there is evidence of extensive contact, not only between geographically proximate groups, but also across wider areas. The results of this contact, which may have played a role in the diffusion of common cultural and linguistic features, are especially evident in the Khoisan populations of the central Kalahari. Am J Phys Anthropol 153:435–448, 2014. © 2013 Wiley Periodicals, Inc.
African populations are increasingly the focus of
genetic studies, in particular those characterized by the
simultaneous presence of an ancestral way of subsist-
ence (predominantly foraging) together with deep-
rooting genetic lineages, like the Pygmies of Central
Africa and the Khoisan of southern Africa (Tishkoff
et al., 2009; Batini et al., 2011; Henn et al., 2011; Veera-
mah et al., 2011; Lachance et al., 2012; Pickrell et al.,
2012; Schlebusch et al., 2012; Verdu et al., 2013). With
the term “Khoisan” we refer to the hunter-gatherer and
pastoralist populations of southern Africa that speak
indigenous non-Bantu languages characterized by heavy
use of click consonants, without any assumption about
their genetic or linguistic unity (cf. Barnard, 1992).
There is archeological evidence for the continuous
presence of foragers in the Kalahari region since the
Late Stone Age 30,000 years ago (Denbow, 1984; Dea-
con and Deacon, 1999). Much later in time, signals of
pastoralist and Iron Age agriculturalist cultures begin to
appear in the archeological record. Pottery and remains
of domesticated animals appear almost simultaneously
2000 years ago in the coastal regions of what is now
South Africa and Namibia, and in northern Botswana.
One hypothesis suggests that this pastoralist culture ori-
ginated in East Africa, where domesticated species are
found as early as 4000 years ago (Deacon and Deacon,
1999; Phillipson, 2005), and was brought to southern
Africa by an immigration of East African herders,
spreading rapidly over the entire territory (Deacon and
Deacon, 1999; Mitchell, 2002; Pleurdeau et al., 2012). A
contrasting explanation is cultural diffusion, according
to which “hunters with sheep” would have autonomously
embarked on the transition to a new way of subsistence
after coming into contact with populations of herders
from the north (Kinahan, 1991; Sadr, 1998). However,
such a rapid shift in lifestyle and cultural paradigm is hard to reconcile with the ethnographic evidence (Smith, 1990; Barnard, 2008). The pastoralist tradition predates the arrival of the agriculturalist Bantu speakers, whose culture appears in the archeological record of southern Africa not earlier than 2000–1200 years ago (Reid et al., 1998; Phillipson, 2005; Kinahan, 2011).
Additional Supporting Information may be found in the online version of this article.
Grant sponsor: Deutsche Forschungsgemeinschaft (as part of the European Science Foundation EUROCORES Programme EuroBABEL); Grant sponsor: Japan Society for the Promotion of Science (B), Grant-in-Aid for Scientific Research; Grant number: 19401019; Grant sponsor: Max Planck Society.
*Correspondence to: Chiara Barbieri, MPI for Evolutionary Anthropology, Deutscher Platz 6, Leipzig 04103, Germany (or) Brigitte Pakendorf, Institut des Sciences de l’Homme, DDL, 14 avenue Berthelot, 69363 Lyon CEDEX 07, France.
Present Affiliation of Chiara Barbieri: Department of Evolutionary Genetics, MPI for Evolutionary Anthropology, Leipzig, 04103, Germany.
Present Affiliation of Linda Gerlach and Falko Berthold: Department of Linguistics, MPI for Evolutionary Anthropology, Leipzig, 04103; Department of African Studies, Humboldt University, Berlin 10099, Germany.
Present Affiliation of Brigitte Pakendorf: Laboratoire Dynamique du Langage, UMR5596, CNRS and Université Lyon Lumière 2, Lyon, 69007, France.
Received 22 July 2013; revised 12 November 2013; accepted 15 November 2013
DOI: 10.1002/ajpa.22441
Published online 9 December 2013 in Wiley Online Library
Southern African Khoisan populations speak languages that belong to three distinct families (Fig. 1): Kx’a (Heine and Honken, 2010), Tuu (Güldemann, 2005), and Khoe-Kwadi (Güldemann, 2004; Güldemann and Elderkin, 2012). Kx’a and Tuu share some linguistic
features, and their distribution is mostly centered in the
Kalahari and its immediate surroundings. Speakers of
dialects belonging to the Ju branch of Kx’a are settled
mainly to the northwest of the Kalahari, in
northwest Botswana, northern Namibia, and southern
Angola. The Tuu language family was formerly more
widespread than today, covering most of South Africa as
well as parts of Botswana and Namibia; however, in
South Africa most Khoisan populations have assimilated
culturally and linguistically to neighboring populations.
Khoe-Kwadi languages are distributed over a large geo-
graphic area, including the Kalahari, western Namibia,
the Okavango river delta, and the salt pans to the east
of the Kalahari; the now extinct language Kwadi was
spoken in southern Angola. While all speakers of Kx’a
and Tuu languages are (or were) foragers, Khoe-Kwadi
speakers exhibit diverse subsistence strategies: the
majority are (or were) foragers (with a focus on fishing
in the Okavango river), but the now extinct Kwadi of
Angola and the Nama of Namibia were traditionally pas-
toralists, and the Damara had a mixed pattern of sub-
sistence involving hunting and gathering as well as
herding of small livestock (Barnard, 1992). Lastly, there
are phenotypic differences: while the majority of Khoi-
san populations have on average light skin pigmentation
and relatively short stature (a phenotype we here refer
to as the “Khoisan phenotype”), the Damara from Nami-
bia as well as populations of the eastern Kalahari and of
the Okavango region are characterized by on average
taller stature and darker skin pigmentation; the latter
two groups were therefore known as “Black Bushmen”
(e.g., Weiner et al., 1964; Jenkins, 1986).
From a genetic perspective, Khoisan populations are
known to harbor the deepest-rooting clades of uniparen-
tal lineages (Behar et al., 2008; Naidoo et al., 2010; Bar-
bieri et al., 2013a; Schlebusch et al., 2013), but until
recently not much was known about the relationships
between individual populations and the distribution of
genetic variation in these populations. Two novel studies
of autosomal DNA diversity in extended datasets of
Khoisan populations from southern Africa demonstrated
an ancient split that dates within the past 30,000
years, dividing Khoisan populations of the northwest
Kalahari Basin from those settled to the southeast or
south (Pickrell et al., 2012; Schlebusch et al., 2012). Fur-
thermore, both studies detect genetic links with East
Africa in the Nama and other Khoe-Kwadi speakers.
This is in good accordance with the hypothesis that the ancestor of the Khoe-Kwadi languages was brought to southern Africa by the pre-Bantu immigration of pastoralists detectable in the archeological record (Güldemann, 2008). A similar link of the Khwe from southern Angola and the Caprivi Strip, who speak a Khoe-Kwadi language, with East African pastoralists was detected in the shared presence at high frequency of the Y-chromosomal haplogroup E-M293 (Henn et al., 2008) that is rare elsewhere in Africa (de Filippo et al., 2011). Finally, autosomal and mtDNA studies display evidence of varying degrees of non-indigenous ancestry in all Khoisan populations, which could reflect contact between indigenous populations and immigrating pre-Bantu pastoralists and/or Bantu-speaking populations that took place at different periods of time in different areas (Pickrell et al., 2012; Schlebusch et al., 2013).
Fig. 1. Studied populations. (a) Map of approximate location of the 26 populations included in this study, with symbols indicating their linguistic affiliation. The gray area indicates the Kalahari semi-desert. (b) Schema of Khoisan linguistic relationships.
American Journal of Physical Anthropology
The mtDNA variation of most Khoisan populations is
characterized by high frequencies of the deepest clades of
the mtDNA phylogeny, namely haplogroups L0d and L0k
(Behar et al., 2008; Barbieri et al., 2013a; Schlebusch
et al., 2013). A minor presence of these haplogroups in
neighboring Bantu-speaking populations can be explained
by gene flow after the ancestors of these populations
reached these southernmost areas of their migration; the
proportion of L0d and L0k in Bantu-speaking populations
is higher than that of the characteristic Khoisan Y-
chromosomal haplogroups A-M91 and B-M112, in line
with sex-biased gene flow after contact (Coelho et al.,
2009; Quintana-Murci et al., 2010; Schlebusch et al.,
2011; Barbieri et al., 2013b). The source of haplogroups
other than mtDNA L0d/L0k and Y-chromosome A-M91/B-
M112 in Khoisan foragers has been identified with Bantu
agriculturalists (Schlebusch et al., 2013); however, the
possibility of gene flow from pastoralists or other pre-
Bantu populations should not be dismissed out of hand.
This study is one of the first to investigate the history
of Khoisan populations using complete mtDNA genome
sequences from a large set of populations. We analyze a
total of nearly 700 complete mtDNA genome sequences
from 19 Khoisan populations covering the three linguistic
families Kx’a, Tuu and Khoe-Kwadi and including both
hunter-gatherers and pastoralists, as well as from seven
neighboring Bantu-speaking populations. Our dataset
covers most of the extant variability in Khoisan popula-
tions, but lacks samples from South African populations
whose heritage languages belonged to the Tuu family and
the Khoekhoe group of Khoe-Kwadi, as well as the
extinct Kwadi of Angola; for this reason we refer to the
“Khoe family” and “Khoe speakers” instead of the “Khoe-
Kwadi family” and “Khoe-Kwadi speakers” in the remain-
der of this article. With these data, we aim at investigat-
ing the relationships among Khoisan populations as well
as evidence for gene flow among them. In particular, we
focus on the following research questions: 1) How is the
maternal genetic component structured in Khoisan, and
does it mirror the genetic structure emerging from the
genome-wide data? 2) How much contact was there
between different Khoisan populations and to what
extent does contact correlate with geographic proximity?
3) Can we detect traces of the hypothesized East African
ancestry of populations speaking Khoe languages?
The dataset
Samples were collected in Botswana and Namibia
between 2009 and 2011 in the framework of a multidisci-
plinary research project focusing on the prehistory of
southern African Khoisan (
kba/). The collection was approved by the ethical review
board of the University of Leipzig and authorized by the
governments of Botswana and of Namibia (Research per-
mit CYSC 1/17/2 IV (8) from the Ministry of Youth Sport
and Culture of Botswana, and 17/3/3 from the Ministry
of Health and Social Services of Namibia). Each individual gave written consent after the purpose of the study was explained with the help of local translators. Details of the sample collection and DNA extraction from saliva have been reported in the Supporting Information of Pickrell et al. (2012). While in that study a reduced set of 187 individuals was chosen for genome-wide SNP typing from a total of 22 African populations, in this study we consider almost all the unrelated individuals from the same sample collection from Botswana and Namibia. Relatives were excluded from the analysis as far as they could be ascertained from the information provided, as were individuals with unclear ethnolinguistic family background, resulting in a dataset of 665 individuals belonging to 19 Khoisan and five Bantu-speaking populations from Botswana and Namibia. This dataset was augmented with 22 Tonga and 12 Mbukushu sequences from Zambia (Barbieri et al., 2013b); these Mbukushu sequences were merged with data from Mbukushu samples obtained in Namibia, after checking for genetic homogeneity.
TABLE 1. Populations included in the study with values of diversity
Population      Linguistic affiliation  Subsistence  Phenotype    Cluster        n   Nuc. div (π)  Variance  Sequence diversity  sd
Taa East        Tuu                     Forager      Khoisan      SOUTH-CENTRAL  30  0.0015        0.000001  0.95                0.02
Taa North       Tuu                     Forager      Khoisan      SOUTH-CENTRAL  25  0.0022        0.000001  0.94                0.03
Taa West        Tuu                     Forager      Khoisan      SOUTH-CENTRAL  31  0.0028        0.000002  0.96                0.02
Hoan            Kx’a                    Forager      Khoisan      SOUTH-CENTRAL  13  0.0010        0.000000  0.79                0.11
G|ui            Khoe                    Forager      Khoisan      CENTRAL        31  0.0022        0.000001  0.92                0.03
G||ana          Khoe                    Forager      Khoisan      CENTRAL        15  0.0018        0.000001  0.98                0.03
Naro            Khoe                    Forager      Khoisan      CENTRAL        35  0.0029        0.000002  0.99                0.01
Ju|’hoan North  Kx’a                    Forager      Khoisan      NORTHWEST      40  0.0028        0.000002  0.92                0.03
Ju|’hoan South  Kx’a                    Forager      Khoisan      NORTHWEST      44  0.0029        0.000002  0.98                0.01
!Xuun           Kx’a                    Forager      Khoisan      NORTHWEST      27  0.0031        0.000002  0.99                0.02
Hai||om         Khoe                    Various      Khoisan      NORTHWEST      51  0.0035        0.000003  0.98                0.01
Nama            Khoe                    Pastoralist  Khoisan      NAMA           29  0.0033        0.000003  0.99                0.01
||Ani           Khoe                    Various      Non-Khoisan  OKAVANGO       18  0.0037        0.000004  0.96                0.03
Buga            Khoe                    Various      Non-Khoisan  OKAVANGO       14  0.0037        0.000004  0.90                0.06
||Xo            Khoe                    Various      Non-Khoisan  OKAVANGO       17  0.0041        0.000004  0.86                0.07
Tshwa           Khoe                    Various      Non-Khoisan  EAST           22  0.0039        0.000004  0.94                0.03
Tcire Tcire     Khoe                    Various      Non-Khoisan  EAST           12  0.0039        0.000004  0.97                0.04
Shua            Khoe                    Various      Non-Khoisan  EAST           42  0.0039        0.000004  0.95                0.02
Damara          Khoe                    Pastoralist  Non-Khoisan  NW-NAMIBIA     38  0.0028        0.000002  0.89                0.04
Herero          Bantu                   Pastoralist  –            NW-NAMIBIA     30  0.0025        0.000002  0.94                0.03
Himba           Bantu                   Pastoralist  –            NW-NAMIBIA     21  0.0024        0.000002  0.93                0.04
Kgalagadi       Bantu                   Various      –            BANTU          19  0.0037        0.000003  0.97                0.03
Tswana          Bantu                   Various      –            BANTU          17  0.0037        0.000004  0.99                0.02
Kalanga         Bantu                   Various      –            BANTU          17  0.0042        0.000005  1.00                0.02
Tonga           Bantu                   Various      –            BANTU          22  0.0042        0.000004  1.00                0.01
Mbukushu        Bantu                   Various      –            BANTU          20  0.0042        0.000005  0.99                0.02
Nuc. div: Nucleotide Diversity, sd: Standard Deviation.
Nineteen sequences were not included in analyses
based on population comparisons because they belong to
populations with sample sizes below 12 individuals;
these are speakers of Khoe languages from Botswana (8
individuals) and of Bantu languages from Namibia (11
individuals). These sequences were included only in com-
parisons of haplotypes (i.e., network analyses). We
assigned the remaining 680 individuals to 26 popula-
tions on the basis of ethnolinguistic self-affiliation; the
populations and their linguistic affiliation are provided
in Table 1 and Figure 1. Populations were grouped
together according to their geographic distribution, and
in some cases taking into consideration their linguistic
affiliation and way of subsistence, into eight clusters
(see Table 1). This was done to simplify the interpreta-
tion of sequence sharing and networks, and for analyses
performed in BEAST, where larger sample sizes improve
the performance of the methods.
Sequence and data analysis
Genomic libraries were made from sheared DNA,
tagged with either single or double indexes, and
enriched for mtDNA following protocols described in
Meyer and Kircher (2010) and Maricic et al. (2010); see
also Supporting Information in Barbieri et al. (2012).
The libraries were sequenced on the Illumina GAIIx platform, using either single or paired end runs of 76 bp length, resulting in an average coverage of 400×. Read adaptors were trimmed, and reads were filtered for having at most 5 bases with a quality score <15 and indexes for having no base with a quality score <10. Sequences were manually checked with Bioedit, and read alignments were screened with ma (Briggs et al., 2009) to exclude alignment errors and confirm indels. The sequences belonging to haplogroups L0d and L0k were already submitted to GenBank (Barbieri et al., 2013a) and given accession numbers KC345764-KC346248; the remaining 218 sequences were given accession numbers KC622055-KC622272. The two poly-C regions (np 303-315, 16183-16194), which are prone to sequencing errors, were trimmed from the final alignment used in the analysis.
In the final alignment of 699 sequences, 97 samples
have between one and eight missing nucleotides (result-
ing in a maximum of 0.05% missing data per sequence
and a total of 160 missing nucleotides in the dataset). Of
these missing nucleotides, 81 occurred among the 1,233
polymorphic positions detected in the dataset. To mini-
mize the impact of missing data on the polymorphic
positions, we applied imputation using stringent criteria,
replacing missing sites with the nucleotide that was
present in at least two otherwise identical haplotypes of
the dataset. One hundred thirty nine positions, 75 of
which were among the polymorphic sites, were imputed
in 79 individuals. After imputation, the maximum num-
ber of missing sites per sample was three (with 18 sam-
ples still containing missing sites), and in the final
alignment only a total of 21 sites with Ns in one or more
individuals were excluded from the analysis. Haplogroup assignment was performed with the online tool Haplogrep (Kloss-Brandstätter et al., 2011).
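The imputation rule described above can be illustrated with a short Python sketch. This is not the authors' pipeline: the function name `impute_missing`, the use of `N` for a missing base, and the requirement that all matching donor sequences agree on the base are assumptions made for the example. Sequences are assumed to be aligned and of equal length.

```python
from collections import Counter

MISSING = "N"  # assumed symbol for a missing nucleotide


def impute_missing(seqs, min_support=2):
    """Fill missing sites from complete, otherwise identical haplotypes.

    A site is imputed only if at least `min_support` complete sequences
    match the target at every non-missing site and all of them carry the
    same base at the missing site.
    """
    complete = [s for s in seqs if MISSING not in s]
    imputed = []
    for seq in seqs:
        gaps = [i for i, base in enumerate(seq) if base == MISSING]
        if not gaps:
            imputed.append(seq)
            continue
        # Donors: complete sequences identical to seq at all known sites.
        donors = [d for d in complete
                  if all(a == b for a, b in zip(seq, d) if a != MISSING)]
        chars = list(seq)
        for i in gaps:
            counts = Counter(d[i] for d in donors)
            if len(counts) == 1:  # donors are unanimous at this site
                base, support = next(iter(counts.items()))
                if support >= min_support:
                    chars[i] = base
        imputed.append("".join(chars))
    return imputed
```

With toy input, a sequence such as `ACNT` is completed when two copies of `ACGT` are present, whereas a sequence with only one matching complete haplotype keeps its `N`.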
Values of nucleotide diversity and variance were calcu-
lated in R with the package Pegas (Paradis, 2010). Corre-
spondence analysis (CA) was performed with the package
ca (Nenadic and Greenacre, 2007) using the haplogroup
frequencies reported in the Supporting Information table.
Nonmetric multidimensional scaling (MDS) analyses were performed with the function “isoMDS” from the package MASS (Venables and Ripley, 2002). AMOVA, values of sequence diversity and ΦST matrices of distances were computed in Arlequin ver. 3.11. A Mantel test was performed between genetic (ΦST) and geographic distances with the R package vegan (Oksanen et al., 2012); geographic distances between populations were averaged over GPS data from the individual sampling locations with a function of the package fields (Furrer et al., 2012). A neighbor-joining tree of the populations was generated from a ΦST matrix of distances with the function “nj” of the package ape (Paradis et al., 2004). A heatplot of haplotypes shared between at least two populations was generated in R, with the frequency of the respective haplotypes in each population indicated by variable shading.
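As an illustration of the Mantel test between genetic and geographic distances, the following Python sketch computes great-circle distances and a permutation-based one-sided p-value. It is a generic textbook version, not the implementation in vegan or fields; the function names `haversine_km` and `mantel`, the Earth radius, and the default of 999 permutations are assumptions for the example.

```python
import math
import random


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance between two points given in decimal degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def _upper(m, order):
    """Upper-triangle entries of matrix m after reordering rows and columns."""
    n = len(order)
    return [m[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def _pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)  # fails if either vector is constant


def mantel(gen, geo, permutations=999, seed=0):
    """Return (r, p) for two square distance matrices of equal size."""
    rng = random.Random(seed)
    ids = list(range(len(gen)))
    geo_vec = _upper(geo, ids)
    observed = _pearson(_upper(gen, ids), geo_vec)
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        perm = ids[:]
        rng.shuffle(perm)  # permute populations in the genetic matrix only
        if _pearson(_upper(gen, perm), geo_vec) >= observed:
            hits += 1
    return observed, hits / (permutations + 1)
```

Rows and columns of one matrix are permuted together, which preserves the dependence structure among the pairwise distances of each population.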
Median-joining networks (Bandelt et al., 1999) with
all sites given equal weights and no pre- or post-processing
steps were computed with Network 4.11 and visualized in Network
Publisher. Branches showing starlike signals of expansions
were dated using the rho statistic (Forster et al.,
1996) implemented in Network, with the calculator pro-
vided as a Supporting Information by Soares et al.
(2009). In the L0d1 network, branches are labeled with
subhaplogroup names, according to the nomenclature
proposed in Barbieri et al. (2013a).
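For readers unfamiliar with rho dating, a minimal Python sketch follows. Rho is the mean number of mutations separating the sampled sequences from the root haplotype of a clade. The conversion to years shown here is a plain linear one, whereas the calculator of Soares et al. (2009) applies a correction for purifying selection, so the numbers will differ from those obtained with that calculator. The constants (a genome length of 16,569 sites and a rate of 1.665 × 10^-8 substitutions per site per year) and the function names are assumptions for illustration.

```python
SITES = 16569      # assumed length of the complete mtDNA genome
RATE = 1.665e-8    # assumed rate in substitutions per site per year


def rho_statistic(distances_to_root):
    """Mean number of mutations between sampled sequences and the root."""
    return sum(distances_to_root) / len(distances_to_root)


def rho_to_years_linear(rho, rate=RATE, sites=SITES):
    """Linear conversion of rho to years (no purifying-selection correction)."""
    return rho / (rate * sites)
```

Under these assumptions one mutation corresponds to roughly 3,600 years, so a clade with rho = 2 would be dated to about 7,200 years before any correction is applied.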
BEAST (v1.7.2; Drummond et al., 2012) was used to construct Bayesian Skyline Plots, based on the whole mtDNA sequence and using the mutation rate of 1.665 × 10^-8 from Soares et al. (2009). A Generalized Time Reversible model was applied, and multiple runs were performed for each dataset, using chains of 30 million steps.
Simulations were performed in Serial Simcoal (Ander-
son et al., 2005) to estimate the probability of retaining
identical whole mtDNA sequence types after a given
number of generations following a population split,
starting from effective population sizes of 100, 1,000,
5,000 and 10,000 individuals. We based our simulations
on the two groups emerging from the autosomal data—
NW Kalahari and SE Kalahari—which are estimated to
have split within the last 30,000 years (Pickrell et al.,
2012). The populations included in the two groups were
chosen according to Supporting Information Figure S18
of Pickrell et al. (2012): the Northwest Kalahari group
(NW Kalahari) included the Ju|’hoan South, Ju|’hoan
North, !Xuun, and Hai||om (and thus corresponds to our
NORTHWEST cluster), and the Southeast Kalahari group
(SE Kalahari) included the Taa North, Taa East, Taa
West, Hoan, G|ui, G||ana, Shua and Tshwa. The resulting
groups had sample sizes of 162 for the NW Kalahari and
209 for the SE Kalahari, with seven haplotypes shared
between the groups. We proceeded as follows: the initial
population was split in two populations, Ne was kept
constant, and no migration was considered. The time
after the split was calculated applying a generation time
of 25 years (Fenner, 2005). The possibility of generating
new haplotypes was taken into account: mutations could
occur following a Kimura 2-Parameter mutation model
with the mutation rate for full mtDNA genomes from
Soares et al. (2009), which followed a gamma distribu-
tion. For each effective population size and split time we
ran 1000 iterations, and calculated both the probability
of retaining identical haplotypes and the average num-
ber of haplotypes retained, sampling 162 and 209
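The logic of the simulations can be sketched in Python with a much simpler model than the one used here. The code below is a forward-time haploid Wright-Fisher simulation with an infinite-alleles mutation model, not the Serial SimCoal setup with a Kimura 2-Parameter model. The function names, the assumption that every ancestral female carries a distinct haplotype, and the assumption that each daughter population keeps the full effective size are choices made for the example only.

```python
import random


def evolve(pop, generations, mu, rng, next_id):
    """Advance one haploid Wright-Fisher population of constant size."""
    n = len(pop)
    for _ in range(generations):
        new = []
        for _ in range(n):
            hap = pop[rng.randrange(n)]   # draw a mother at random
            if rng.random() < mu:         # mutation creates a new haplotype
                hap = next_id[0]
                next_id[0] += 1
            new.append(hap)
        pop = new
    return pop


def shared_after_split(ne, years, mu, sample_sizes, gen_time=25, seed=0):
    """Number of haplotypes shared between samples of two daughter populations.

    mu is the per-genome, per-generation probability of a new mutation.
    """
    rng = random.Random(seed)
    generations = years // gen_time
    ancestral = list(range(ne))   # assumption: all ancestral haplotypes distinct
    next_id = [ne]                # counter shared by both daughter populations
    pop_a = evolve(ancestral[:], generations, mu, rng, next_id)
    pop_b = evolve(ancestral[:], generations, mu, rng, next_id)
    sample_a = set(rng.sample(pop_a, min(sample_sizes[0], ne)))
    sample_b = set(rng.sample(pop_b, min(sample_sizes[1], ne)))
    return len(sample_a & sample_b)
```

Calling `shared_after_split` repeatedly with different seeds and counting the runs that return a value above zero gives an estimate of the probability of retaining identical haplotypes for a given effective size and split time.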
Khoisan mtDNA variation, population
size, and demography
The haplogroups L0d and L0k are the most common
haplogroups in our dataset: L0d1 is present at 38%,
L0d2 at 16% and L0k at 11%. As discussed in detail in
Barbieri et al. (2013a), these haplogroups are present
in higher proportions in most Khoisan populations than
in populations speaking Bantu languages (Supporting
Information Table). Apart from L0d and L0k, the other
haplogroups found in the dataset have a non-uniform
distribution. They mainly characterize and distinguish
Bantu-speaking populations from each other, although
some are also present in certain Khoisan populations,
especially in those of the OKAVANGO and EAST clusters (cf.
Supporting Information Fig. S1a).
The MDS analysis based on pairwise ΦST values (Fig. 2)
demonstrates a lack of clear structure in the data,
with no distinct linguistically or geographically defined
groupings emerging. The only apparent groups are
formed by the Bantu-speaking Himba and Herero with
the Khoe-speaking Damara, on the one hand, and the
Khoe-speaking G|ui and Kx’a-speaking Hoan on the
other; furthermore, the Taa East are another outlier.
Notwithstanding their geographic location in southern-
central Namibia and their pastoralist subsistence, the
Nama are genetically similar to foraging populations of
northern Namibia and central and eastern Botswana.
The separation of some populations and the striking
genetic proximity of others is also reflected in the matrix
of pairwise genetic distances (Supporting Information
Fig. S2): here, several populations are visibly distin-
guished as having large genetic distances from almost
all of the other populations, for example the Himba,
Herero, Damara, ||Xo, Tonga, and Mbukushu. In contrast,
populations of the SOUTH-CENTRAL, CENTRAL and
NORTHWEST clusters appear genetically close to each
other, with the exception of the G|ui, Taa East, and
Hoan; these are also separated in the MDS plot.
In the CA analysis (Supporting Information Fig.
S1a,b), the distinction between most Khoisan and the
Bantu-speaking populations is emphasized more than in
the MDS analysis, as is the distinction between the
Khoisan populations of the Kalahari (NORTHWEST, SOUTH-CENTRAL,
and CENTRAL) and the populations of the
OKAVANGO and EAST clusters (cf. Table 1 for a definition of
the clusters). The absence of genetic outliers among the
Khoisan populations of the Kalahari suggests that the
G|ui and Hoan, who are separated in the MDS, do not
differ from their Khoisan neighbors with respect to their
haplogroup composition. While strong genetic drift as
well as the small sample size might account for the dis-
tinction of the Hoan, the G|ui are characterized by high
frequencies of divergent sequence types belonging to
haplogroup L0d2 (Supporting Information Fig. S3).
The overall lack of ethnolinguistic or geographic dis-
tinctions between the populations evident in the MDS
and CA plots is confirmed by AMOVA analyses (Table 2).
These underline the considerable heterogeneity of the
maternal genepool in southern Africa, with a very high
and significant variance observed between populations,
both for the whole dataset of 26 populations (21%), as
well as for the set of 19 Khoisan populations (16.6%).
Focusing on the 19 Khoisan populations, different group-
ings were tested (Table 2). The variance between groups
is very low (3.4%) and nonsignificant when grouping by
the three language families, suggesting that simple lin-
guistic classification is not a good predictor of genetic
variation between populations. Dividing the populations
in four groups by rough geographic criteria results in a
significant between-group variance of 6.7%, but the
between-population variance is still higher (11.3%). The
highly significant between-group variance (16.7%) is
higher than that between populations (7.5%) when
grouping by the two phenotypes, that is, “Khoisan
phenotype” vs. “non-Khoisan phenotype”; phenotypic
variation therefore correlates with genetic structure.
This result is not unexpected, given that phenotypic
traits have a biological basis and are thus more likely to
be linked to populations than their linguistic affiliation
or geographic location. Nevertheless, the highest
between-group variance (19.4%, as opposed to only 3.9%
variance between populations) is found when grouping
the Khoisan populations by the clusters defined here
on geographic, linguistic and subsistence criteria (cf. Table 1), suggesting that all these factors contribute to structuring the genetic variation in Khoisan (cf. Schlebusch et al., 2012).
Fig. 2. Multidimensional Scaling plot based on ΦST distances. Population symbols indicate their linguistic affiliation, as shown in Figure 1. Stress value: 7.97%.
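To make the variance percentages reported for the AMOVA easier to interpret, the following Python sketch computes a single-level AMOVA (populations within one group) from aligned sequences, using the number of pairwise differences as the squared distance. It is a simplified illustration, not the Arlequin implementation: it omits the hierarchical grouping level and the permutation test for significance, and the function names are hypothetical.

```python
from itertools import combinations


def n_diff(a, b):
    """Number of sites at which two aligned sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def amova_one_level(pops):
    """pops maps population name -> list of aligned sequences.

    Returns (% variance among populations, % within populations, phi_st).
    """
    all_seqs = [s for seqs in pops.values() for s in seqs]
    n_total, n_pops = len(all_seqs), len(pops)
    # Sums of squared deviations derived from pairwise distances.
    ssd_total = sum(n_diff(a, b) for a, b in combinations(all_seqs, 2)) / n_total
    ssd_within = sum(
        sum(n_diff(a, b) for a, b in combinations(seqs, 2)) / len(seqs)
        for seqs in pops.values()
    )
    ssd_among = ssd_total - ssd_within
    msd_among = ssd_among / (n_pops - 1)
    msd_within = ssd_within / (n_total - n_pops)
    # Average sample size corrected for unequal population sizes.
    n0 = (n_total - sum(len(s) ** 2 for s in pops.values()) / n_total) / (n_pops - 1)
    var_among = (msd_among - msd_within) / n0
    var_within = msd_within
    total = var_among + var_within
    return 100 * var_among / total, 100 * var_within / total, var_among / total
```

In this scheme the between-population percentage equals ΦST expressed in percent, so a value of about 21% for the whole dataset means that roughly one fifth of the total molecular variance is attributable to differences between populations.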
The high level of between-population variance at the
maternal level emerging from the AMOVA is an impor-
tant feature of our dataset. In fact, this value of
between-population diversity is strikingly different from
that found in other African datasets of full mtDNA
sequences (Barbieri et al., 2012; Barbieri et al., 2013b),
where the variance between distinct ethnolinguistic pop-
ulations is <2% of the total. These studies focused on
agriculturalist patrilocal societies with a social structure
that has been shown to homogenize the maternal gene
pool across different ethnolinguistic groups (Gunnarsdot-
tir et al., 2011; Barbieri et al., 2012, 2013b) in the pres-
ence of strict exogamy (Kumar et al., 2006). The
majority of Khoisan societies, however, are traditionally
foragers, and patrilocality is not the predominant sys-
tem. While the ethnographic record for the populations
included in this study is often incomplete (Barnard,
1992), uxorilocal postmarital residence is documented
for several foraging populations: it implies residence
with the bride’s band for the first years after marriage
and up to the birth of the third child, in association with
bride service that the husband has to provide for the
bride’s father (Silberbauer, 1981; Lee, 1984; Heinz, 1994;
Widlok, 1999). In addition, this extended period of stay
with the bride’s parents frequently results in permanent
settlement of the young couple with the woman’s band.
While not strictly uxorilocal, this social behavior results
in reduced female mobility in comparison to the more
common patrilocal practice, and could have influenced
the distribution of the maternal lineages through gener-
ations. Notably, Verdu et al. (2013) find a similar pattern
for Pygmy populations of Cameroon and Gabon using a
dataset of mtDNA HVR-I sequences; they also associate
this result to the less pronounced patrilocality typical of
these foraging populations. A comparison with the pater-
nal gene pool might shed further light on this hypothesis
and complete the genetic picture of a potentially sex-
biased social structure (cf. Oota et al., 2001; Gunnarsdot-
tir et al., 2011; Heyer et al., 2012).
Mitochondrial genetic drift might have further
increased the structure of the maternal genepool caused
by reduced female mobility, since most Khoisan popula-
tions traditionally led a nomadic lifestyle within a
restricted territory, where the core unit was represented
by small bands of related individuals (Barnard, 1992).
This is consistent with the low nucleotide diversity values
found in some populations of the CENTRAL and SOUTH-
CENTRAL clusters (Table 1), like Hoan, Taa East, and
G||ana (values below 0.002), while the Bantu-speaking
sedentary agriculturalists Tonga, Mbukushu, and
Kalanga have the highest values (0.0042). Bayesian Sky-
line plots (Supporting Information Fig. S4), too, show
reduced effective population sizes in the populations of
the Kalahari area (especially for the SOUTH-CENTRAL and
CENTRAL clusters), and higher population sizes in the
Bantu speakers.
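The nucleotide diversity values compared above can be understood as the average proportion of sites that differ between two randomly chosen sequences. A minimal Python sketch of this estimator follows; it is not the pegas implementation, it assumes aligned sequences of equal length without missing data, and the function name is chosen for the example.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Average number of pairwise differences per site over all pairs."""
    n_sites = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    total = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs)
    return total / (len(pairs) * n_sites)
```

Assuming an alignment of roughly 16,500 sites, a value of 0.0042 corresponds to about 70 differences between two random sequences, whereas a value of 0.0010 corresponds to fewer than 20.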
To summarize, the majority of Khoisan populations
are confirmed to be distinct in their mtDNA from their
Bantu-speaking neighbors and more generally from sub-
Saharan Africans. They are also quite heterogeneous in
their mtDNA composition, irrespective of the high frequency
of haplogroups L0d and L0k in several populations
living in the Kalahari and in contrast to received
wisdom of their constituting a linguistically, culturally,
and biologically unified group: this population heteroge-
neity matches the autosomal data to a certain degree
(Pickrell et al., 2012; Schlebusch et al., 2012). The major
social factor that could have played a role in shaping
this high mtDNA diversity is the tendency for multilocal
postmarital residence patterns, with a strong uxorilocal
tradition in the first years after marriage, which charac-
terizes some of the populations. In addition, in the Kala-
hari populations in particular, low diversity values
reflect low effective population size, making it likely that
genetic drift has further increased population differences.
While there is genetic structure overall, Khoisan
populations cannot be split into distinct groups; however,
their genetic variability is best explained by the small
clusters defined here on the grounds of geographic,
linguistic and subsistence variation, indicating that all
these factors helped shape the maternal diversity of
Khoisan populations.
TABLE 2. AMOVA analyses based on
Percentage of variance
1 Group  Between pops  Within pops
All 26 populations  20.99
19 Khoisan populations  16.59
11 Kalahari forager populations
Ju dialect cluster  1.87  98.13
Grouping Criteria (only Khoisan)  Between groups  Between pops/within groups  Within pops
3 Language families (Tuu, Kx’a, and Khoe)  3.38  14.37
4 Geographic groups (West, North, Center, and East)
2 Phenotypes (“Khoisan,” “non-Khoisan”)
7 Geolinguistic clusters, excluding Bantu  19.39
2 Groups, NW Kalahari vs. SE Kalahari  0.86  11.15
P value <0.01.
P value <0.05.
Kalahari forager populations: Taa North, Taa East, Taa West, Hoan, Ju|’hoan North, Ju|’hoan South, !Xuun, Hai||om, G|ui, G||ana, Naro.
Ju dialect cluster: !Xuun, Ju|’hoan North, Ju|’hoan South (see Fig. 1b).
Geolinguistic clusters: as shown in Table 1.
Geographic groups: West: Damara, Nama, Hai||om; North: Ju|’hoan North, !Xuun, ||Ani, Buga, ||Xo; Center: Taa East, Taa North, Taa West, Hoan, G|ui, G||ana, Naro, Ju|’hoan South; East: Tshwa, Tcire Tcire, Shua.
NW Kalahari: Ju|’hoan South, Ju|’hoan North, !Xuun, and Hai||om. SE Kalahari: Taa North, Taa East, Taa West, Hoan, G||ana, Shua, and Tshwa (as indicated in main text).
American Journal of Physical Anthropology
The impact of geography on mtDNA variation
and the northwestern-southeastern split
There is a significant association between ΦST distances
and geographic distances for the 19 Khoisan populations
(Mantel test, Z = 0.33, P = 0.001), indicating that
geography plays a role in shaping genetic variation at a
local scale (cf. Schlebusch et al., 2013). The distribution
of sequence types as seen in networks and analyses of
haplotype sharing can provide further insights into the
geographic component of the mtDNA variation. A net-
work based on sequences belonging to haplogroup L0d1
(Fig. 3) highlights the presence of both long isolated
branches consistent with a considerable time depth and
development in isolation (cf. Barbieri et al., 2013a) as
well as common haplotypes shared between different
geographic/linguistic clusters. L0d1c1 is the most widely
represented subhaplogroup, with frequencies of 14% in
the NORTHWEST, 34% in the SOUTH-CENTRAL, and 31% in
the CENTRAL clusters. A branch of haplogroup L0d1c1 is
characterized by a haplotype shared by several clusters
Nama and one Tswana individual) surrounded by many
star-shaped pattern, with other Khoe and Bantu haplo-
types represented to a lesser extent. Out of a total of 40
haplotypes found on this branch, only nine are shared
(22.5%), with clear evidence of close ties between the
SOUTH-CENTRAL and the CENTRAL clusters.
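The Mantel test cited at the start of this section asks whether two distance matrices over the same populations are correlated more strongly than expected when population labels are shuffled. The sketch below is an illustration only, assuming a Pearson correlation as the test statistic and a one-sided permutation P value; the function names and toy matrices are hypothetical, and the software and statistic used for the reported result may differ.

```python
import random
from math import sqrt

def _pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def mantel(d1, d2, permutations=999, seed=0):
    """Return (correlation, one-sided permutation P value) for two square matrices."""
    n = len(d1)
    idx = [(i, j) for i in range(n) for j in range(i + 1, n)]
    x = [d1[i][j] for i, j in idx]
    observed = _pearson(x, [d2[i][j] for i, j in idx])
    rng = random.Random(seed)
    order = list(range(n))
    hits = 0
    for _ in range(permutations):
        # Shuffle population labels: rows and columns of d2 are permuted together.
        rng.shuffle(order)
        y = [d2[order[i]][order[j]] for i, j in idx]
        if _pearson(x, y) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)

# Toy data (hypothetical): six populations on a line, with genetic distance
# growing with the square root of geographic distance.
positions = [0, 1, 2, 3, 4, 5]
geo = [[abs(a - b) for b in positions] for a in positions]
gen = [[0.1 * sqrt(abs(a - b)) for b in positions] for a in positions]
r, p = mantel(geo, gen)
```

Permuting whole rows and columns, rather than individual matrix cells, is what keeps the test valid when the pairwise distances are not independent of one another.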
The striking star-like pattern in L0d1c1 is consistent
with a population expansion, which is dated with the
rho statistic (Forster et al., 1996) and the calculator provided
in Soares et al. (2009) to be 5,247 (±2,700) years
old. An explanation for this genetically detectable expansion
is not obvious: the signal of expansion is restricted
to this branch of L0d1c1, which is hard to reconcile with
a demographic expansion that would have affected all of
the populations represented in this star-like cluster, and
that should thus have left a trace in several hap-
logroups. An alternative explanation for the expansion
detectable solely in L0d1c1 is positive selection. How-
ever, although there is one nonsynonymous mutation on
the branch leading to L0d1c1, this mutation is not exclu-
sive to this haplogroup; it is present eight additional
times in the entire human mtDNA phylogeny (according
to Phylotree v. 15, van Oven and Kayser, 2009), with
two events occurring within the African haplogroup L2.
It is thus not obvious why selection might have occurred
on L0d1c1.
Fig. 3. Network of L0d1 haplotypes. The nodes are colored by geo-linguistic clusters, as shown in Table 1.
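The rho statistic used for the dating above is the average number of mutations separating the sampled sequences from the founder haplotype at the center of the star. A minimal sketch follows, assuming a plain linear conversion at one mutation per 3,624 years (the whole-genome rate of Soares et al., 2009, quoted later in the text). This linear step is a simplification: the calculator used in the paper may apply further corrections, so the sketch is not expected to reproduce the reported date. The function name and the toy counts are hypothetical.

```python
YEARS_PER_MUTATION = 3624  # rate quoted in the text (Soares et al., 2009)

def rho_age(mutation_counts, years_per_mutation=YEARS_PER_MUTATION):
    """Return (rho, age in years) from per-individual mutation counts to the founder."""
    if not mutation_counts:
        raise ValueError("need at least one sequence")
    rho = sum(mutation_counts) / len(mutation_counts)
    # Assumption: linear clock, with no correction for purifying selection.
    return rho, rho * years_per_mutation

# Toy example (hypothetical): ten sequences, 0 to 3 mutations away from the founder.
rho, age = rho_age([0, 0, 0, 1, 1, 2, 1, 0, 3, 2])
```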
From the heatplot of haplotypes shared between clusters
(Fig. 4), we can see how the majority of haplotype-sharing
is between the NORTHWEST, SOUTH-CENTRAL, and
CENTRAL clusters. CENTRAL displays the most sharing,
with 29 haplotypes (53% of 55 haplotypes) shared with
other clusters. The SOUTH-CENTRAL populations share 18
of their 44 haplotypes (41%); of these, 66% are shared
with CENTRAL as opposed to only 22% shared with NORTHWEST.
In contrast, NORTHWEST populations share only 23%
of their 94 haplotypes with other populations; of these
22 shared haplotypes, they share 50% with CENTRAL and
18% with SOUTH-CENTRAL. These numbers indicate a
closer connection between SOUTH-CENTRAL and CENTRAL
than between SOUTH-CENTRAL and NORTHWEST, as was
also seen in the L0d1 network (Fig. 3). Furthermore, the
NORTHWEST cluster emerges as being somewhat isolated
from the other clusters, as evidenced by the relatively
low number of haplotypes they share with others (23%),
in spite of their having the largest sample size of the
dataset (162 individuals and 94 haplotypes); this pre-
dominance of exclusive haplotypes in the NORTHWEST
cluster can also be seen in Figure 3.
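The sharing percentages in this paragraph are simple set counts: for each cluster, the number of its haplotypes that also occur in at least one other cluster, divided by its total number of haplotypes. A minimal sketch of that bookkeeping, using hypothetical cluster names and haplotype labels rather than the study data:

```python
def sharing_summary(clusters):
    """Map each cluster to (shared count, total count, percent shared with any other cluster)."""
    summary = {}
    for name, haps in clusters.items():
        # Pool the haplotypes of every other cluster.
        others = set().union(*(h for n, h in clusters.items() if n != name))
        shared = haps & others
        summary[name] = (len(shared), len(haps), 100 * len(shared) / len(haps))
    return summary

# Toy input (hypothetical): three clusters with labelled haplotypes.
toy = {
    "A": {"h1", "h2", "h3", "h4"},
    "B": {"h3", "h4", "h5"},
    "C": {"h4", "h6", "h7", "h8", "h9"},
}
result = sharing_summary(toy)
```

The same arithmetic gives the figures in the text, for example 29 of 55 haplotypes is 53% and 18 of 44 is 41%.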
Sharing is frequent between populations that belong
to the same geographic cluster (Supporting Information
Fig. S5), as expected from the positive correlation
between genetic and geographic distances emerging in
the Mantel test, which could easily derive from contact
between neighbors. However, many haplotypes are
shared more widely. For example, apart from the
most common haplotype, which is shared only between
populations from Namibia (Himba, Herero, Damara,
Nama, and Hai||om), the second most common haplotype
is shared among the Taa, Hoan, G|ui, Naro, Shua, and
Tshwa (thus connecting SOUTH-CENTRAL and CENTRAL
with EAST), and the third most common is found in
Buga, ||Xo, Nama, Damara, Himba, and Tonga, and is
therefore found mostly in the north (with the exception
of the Nama; Supporting Information Fig. S5). This close
proximity of populations belonging to different geo-
graphic clusters also emerges from the matrix of pair-
wise genetic distances (Supporting Information Fig. S2),
where nonsignificant genetic distances at a threshold of
0.05 (without any correction) are highlighted: they occur
between populations of the same cluster but also
between populations from different clusters, such as
between Buga and ||Ani (OKAVANGO) and the !Xuun and
Hai||om (NORTHWEST), who are geographically close, and
between the Bantu speakers from Botswana, especially
the Kalanga, and the EAST and OKAVANGO clusters.
Fig. 4. Heatplot of haplotype sharing. The plot displays the amount of haplotypes shared between geo-linguistic clusters. The most common haplotypes are at the bottom of the plot.
Autosomal DNA data indicate a clear split between
northwestern (NW) and southeastern (SE) Kalahari
Khoisan groups that dates to roughly 30,000 years ago
(Pickrell et al., 2012; Schlebusch et al., 2012). The NW
group in Pickrell et al. (2012) corresponds to the NORTHWEST
cluster defined here while the SE group corresponds
roughly to our SOUTH-CENTRAL, EAST, and CENTRAL
clusters. In our data, the NW and SE Kalahari groups
each contain a total of 94 haplotypes, with a large
amount of haplotype sharing within each group (29% for
NW Kalahari, 50% for SE Kalahari); in contrast, only
seven haplotypes (7.5%) are shared between the two
groups. However, in other analyses the division of the
NW and SE Kalahari groups, based on mtDNA, is not so
clear-cut: i) an AMOVA performed with populations
grouped into NW and SE Kalahari as defined in Figure
S18 of Pickrell et al. (2012) gives a very low and non-sig-
nificant between-group variance of 0.86 (Table 2); ii) the
two groups are not separated as clearly in the MDS plot
(Fig. 2) as in the PCA plot based on the autosomal data;
iii) some populations falling into the NW Kalahari and
SE Kalahari group are not significantly differentiated
(for example the Taa West, which are not significantly
differentiated from any of the NORTHWEST populations, or
the Ju|’hoan North, which are not differentiated from
the Taa North or Taa West; cf. Supporting Information
Fig. S2); and iv) there is some sharing of haplotypes
between groups (Fig. 4).
The split between the NW and SE Kalahari popula-
tions detected in the autosomal DNA data (Pickrell
et al., 2013) was based on analyses biased towards
genetic variation specific to central Kalahari Khoisan
populations (with a PC plot based on SNPs ascertained
in a Ju|’hoan and with a tree constructed after excluding
the effect of non-Khoisan admixture). We therefore constructed
a neighbor-joining tree based on ΦST distances
using only L0d and L0k sequences (Fig. 5) for those populations
with at least 10 individuals carrying L0d and
L0k haplogroups. This separates populations of the SE
Kalahari group (Taa North, Taa East, Hoan, G||ana,
G|ui, and Tshwa) from those of the NW Kalahari group
(Ju|’hoan North, Ju|’hoan South, !Xuun, and Hai||om).
However, differences between the mtDNA sequences and
the autosomal data emerge, too: the Taa West and the
Shua, who in the autosomal analyses fall into the SE
Kalahari group, fall on the branch with the NW Kalahari
populations in the tree based on L0d/L0k sequences.
Overall, the mtDNA analyses thus suggest an initial
population divergence between the NW and SE Kalahari
groups followed by more recent contact, which was not
captured in the previous autosomal DNA studies (Pick-
rell et al., 2012; Schlebusch et al., 2012).
To investigate whether the mtDNA sequences shared
between the NW and SE Kalahari groups are compatible
with a 30,000 year old separation, we performed simula-
tions to test how long shared haplotypes are retained
after a population split (Table 3). Since new mutations
(calculated as one every 3,624 years, with the rate of
Soares et al., 2009) will eventually erase the signal of
shared haplotypes, our simulations investigated how
long shared haplotypes are retained after two popula-
tions diverge, in the absence of any further contact. The
results show that the probability of keeping shared hap-
lotypes when the populations split more than 15,000
years ago is zero. Shared haplotypes are present with a
probability >0.05 only up to 7,500 years after the split.
If we take into consideration that there are seven unique
haplotypes shared between the NW and SE Kalahari
groups, the split would have had to occur 1,000–1,250
years ago in the absence of subsequent migration. Our
results thus suggest that some migration and exchange
throughout the area must have taken place after the
split that was inferred with autosomal data to have hap-
pened within the last 30,000 years. Distinguishing
shared ancestry from contact is difficult with autosomal
SNP data; mtDNA analyses can thus complement such
data, as shared mtDNA genome sequences provide a
clear signal of recent contact.
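The logic of these simulations can be illustrated with a small forward-in-time sketch: two populations start from the same haplotype pool, evolve independently with drift and new mutations, and the number of haplotypes they still share is counted afterwards. This is not the simulation software used in the paper. It is a simplified haploid Wright-Fisher model with assumptions of my own: every mutation creates a new, unique haplotype; each daughter population starts as an identical copy of an ancestral pool with `start_haps` equally frequent haplotypes (an arbitrary choice that strongly affects the counts); and the generation time is 25 years, as implied by Table 3 (30 generations = 750 years). It is therefore not expected to reproduce the values in Table 3.

```python
import random
from itertools import count

GENERATION_YEARS = 25          # implied by Table 3 (30 generations = 750 years)
MU = GENERATION_YEARS / 3624   # mutations per genome per generation (rate from the text)

def evolve(pop, mu, rng, new_id):
    """One Wright-Fisher generation: resample with replacement, then mutate."""
    nxt = []
    for _ in range(len(pop)):
        hap = rng.choice(pop)
        if rng.random() < mu:
            hap = next(new_id)  # a mutation creates a haplotype never seen before
        nxt.append(hap)
    return nxt

def simulate_split(n_e, generations, mu=MU, start_haps=20, reps=200, seed=1):
    """Return (P of retaining any shared haplotype, mean number retained or None)."""
    rng = random.Random(seed)
    kept = []
    for _ in range(reps):
        new_id = count(start_haps)
        # Assumption: both daughter populations start as copies of the same pool.
        ancestral = [i % start_haps for i in range(n_e)]
        a, b = list(ancestral), list(ancestral)
        for _ in range(generations):
            a = evolve(a, mu, rng, new_id)
            b = evolve(b, mu, rng, new_id)
        shared = len(set(a) & set(b))
        if shared:
            kept.append(shared)
    p = len(kept) / reps
    mean_n = sum(kept) / len(kept) if kept else None
    return p, mean_n

# Example call: effective size 100, 30 generations (750 years) after the split.
p_shared, mean_shared = simulate_split(100, 30, reps=100)
```

The mean is taken only over replicates in which at least one haplotype is still shared, which mirrors the NA entries in Table 3 for the time depths at which sharing is never observed.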
Nowadays the Kalahari and surrounding areas repre-
sent the core area of settlement of the indigenous popu-
lations of southern Africa (Barnard, 1992), but the
presence of these populations in the central Kalahari
itself can only be relatively recent: this area was covered
with water until 10,000 years ago, when postglacial
conditions dried the Makgadikgadi Lake (one of the larg-
est ancient basins) and filled it with alluvial debris
(Ebert and Hitchcock, 1978; Cooke, 1979). The lake
could have represented a geographic barrier dividing
northwestern populations [currently mainly speakers of
Ju dialects (Kx’a family)] from southeastern populations
[currently speakers of Taa (Tuu family), Hoan (Kx’a
family), and Khoe languages], resulting in the signal of
genetic structure observed in the autosomal data (Pick-
rell et al., 2012; Schlebusch et al., 2012). This deep divi-
sion may also be reflected in the divergent branches in
the L0d1 network, especially in L0d1b2, which makes
up 15% of the NORTHWEST haplotypes, which in turn
represent almost half of the total haplotypes of this
branch (Fig. 3). A subsequent colonization of the basin,
once it dried up, is compatible with the signal of recent
areal contact that emerges from the shared haplotype
analyses.
Fig. 5. Neighbor Joining tree based on ΦST distances of L0d and L0k sequences.
In conclusion, geography plays a role in connecting
neighboring populations, but the effect of contact also
involves populations that are distant geographically and
linguistically. Some differences emerge between northwest-
ern and southeastern Kalahari populations, with the
NORTHWEST cluster in particular appearing distinct from the
southeastern populations. The possibility of an early diver-
gence of the NW and SE Kalahari groups, which is strongly
supported by the autosomal data, is complemented by the
added signal of recent contact emerging from mtDNA.
Thus, comparing the structure emerging from the autoso-
mal and the mtDNA data reveals a highly complex pattern
of prehistoric population movements. However, for compre-
hensive insights into the prehistoric processes that may
have had an impact on Khoisan genetic structure, data
from extant representatives of South African and Angolan
Khoisan populations are needed.
Contact among the Kalahari foragers
In the previous section, a major signal of contact and
sharing emerged between three clusters: NORTHWEST,
CENTRAL and SOUTH-CENTRAL, confirmed by the sharing of
haplotypes in the L0d1 network and in the heatplot
(Figs. 3 and 4) and in mostly low and non-significant
genetic distances between populations (Supporting Infor-
mation Fig. S2). The populations from these three clus-
ters belong to the same geographic region: the core area
of the Kalahari Basin. They also share common traits
like the “Khoisan phenotype” and a traditional way of
subsistence based on foraging. Genetically, they are
characterized by very high frequencies of mtDNA hap-
logroups L0d and L0k and a common trend for low val-
ues of nucleotide diversity associated with not so low (or
even high) values of sequence diversity (with the excep-
tion of the Hoan, who are characterized by very low
sequence diversity; Table 1). Low nucleotide diversity
values indicate reduced admixture with populations
with a different genetic composition, such as the herders
who migrated to the area 2,000 years ago, or the Bantu-
speaking agriculturalists who arrived later. Neverthe-
less, the presence of a non-Khoisan genetic component
in the autosomal data (Pickrell et al., 2012) indicates
that some admixture must have occurred, probably in
the paternal line.
The common features displayed are probably the
result of areal contact. However, this contact is not
strong enough to make these populations genetically
homogeneous (when pooled together in one group, the
between-population variance of 7% is significant, cf.
Table 2). Further evidence of potential contact can be
revealed by comparisons of linguistic and genetic rela-
tionships. For instance, speakers of Ju languages of the
Kx’a family (Fig. 1), who are settled in the northwestern
Kalahari area, are genetically undifferentiated in the
maternal line (cf. Table 2). In contrast, their linguistic
relatives the Hoan, who live in southern Botswana, dif-
fer from the Ju|’hoan North and Ju|’hoan South, but
share haplotypes with the geographically neighboring
G|ui, Taa, Naro, and the Tshwa and Shua from the EAST
cluster (Supporting Information Fig. S5); furthermore,
they are not significantly differentiated from the G||ana
(Supporting Information Fig. S2). This proximity of the
Hoan to their geographic neighbors rather than to
their linguistic relatives mirrors the results from the
autosomal data (Pickrell et al., 2012) and is in good
agreement with linguistic evidence for contact among
these populations (Traill and Nakagawa, 2000;
Güldemann and Loughnane, 2012).
The CENTRAL cluster includes foragers of the Kalahari
who speak a West Kalahari Khoe language: these are
the G|ui, G||ana and Naro. The Naro are genetically
closely related to both the Ju and the Taa (Supporting
Information Fig. S2), which is in agreement with autoso-
mal evidence that they are the result of admixture
between northwestern and southeastern Kalahari popu-
lations (Pickrell et al., 2012). Irrespective of their genetic
affinities with the Taa and Hoan, the G|ui and G||ana
are distinct with respect to mtDNA from other popula-
tions speaking Khoe languages. This is in good accord-
ance with the hypothesis of a language shift of the G|ui
and G||ana to the Khoe languages they speak nowadays
(Güldemann, 2008). There is also linguistic and historical
evidence for contact between speakers of G|ui and
Taa (Traill and Nakagawa, 2000).
Summing up, similarities between Khoisan popula-
tions are particularly evident in the core area of the
Kalahari Basin, where contact has played a large role in
shaping their genetic makeup. Admixture with immi-
grants did not leave evident traces in the maternal
genetic material, in accordance with low levels of exog-
amy also emerging in the diversity values.
TABLE 3. Results of simulations
Generations                30     40     50    100    150    200    250    300    400    600    800
Years after split         750  1,000  1,250  2,500  3,750  5,000  6,250  7,500 10,000 15,000 20,000
Ne = 100           P     0.84   0.70   0.62   0.28   0.13   0.06   0.02   0.01      0      0      0
                   n      1.4    1.2    1.2    1.0    1.0    1.0    1.0    1.0    1.0     NA     NA
Ne = 1000          P     1.00   0.99   0.97   0.70   0.35   0.19   0.07   0.03   0.01      0      0
                   n      4.4    3.5    2.9    1.6    1.1    1.0    1.0    1.0    1.0     NA     NA
Ne = 5000          P     1.00   1.00   1.00   0.92   0.66   0.34   0.16   0.08   0.01      0      0
                   n     10.2    8.2    6.5    2.6    1.6    1.2    1.1    1.1    1.1     NA     NA
Ne = 10,000        P     1.00   1.00   1.00   0.96   0.74   0.45   0.24   0.09   0.03      0      0
                   n     14.0   11.1    8.9    3.5    1.8    1.3    1.1    1.1    1.0     NA     NA
P (probability of retaining shared haplotypes) and n (average number of haplotypes retained) were calculated for populations with four different effective sizes (Ne).
Khoe pastoralists and a putative
East African origin
The majority of the Khoe-speaking populations live in
peripheral areas of the Kalahari, and it has been
hypothesized that they represent the descendants of a
migration of Khoe-Kwadi speakers with a herding economy
(Güldemann, 2008). The putative origin of these
Khoe-Kwadi populations is in East Africa, where live-
stock is present from 4,000 years ago (Phillipson, 2005;
Deacon and Deacon, 1999). There is some genetic evi-
dence in support of this hypothesis: the distribution of Y
chromosome haplogroup E-M293, in association with
microsatellite diversity, suggests an expansion from Tan-
zania to southern Africa that does not overlap with the
Bantu migration (Henn et al., 2008). Autosomal data
(Schlebusch et al., 2012) provide evidence of shared
ancestry between the Nama and East African Maasai,
together with the presence of the same haplotype associ-
ated with a lactase persistence allele in both popula-
tions, which supports the suggested pastoralist
character of this demographic event. Autosomal data
(Pickrell et al., 2012) also suggest a tentative link to
East Africa for the Nama as well as other Khoe popula-
tions, especially the Shua. Once the migrating pastoral-
ists reached the Kalahari, it is likely that there was
intensive exchange and sex-biased gene flow with resi-
dent foraging populations (Deacon and Deacon, 1999):
this would be reflected in a major contribution of
mtDNA haplogroups L0d and L0k to the immigrating
pastoralists, and a consequent homogenization of the for-
ager and pastoralist populations.
Can a genetic signature of the pastoralist Khoe migra-
tion be identified from the mtDNA data? A potential sig-
nature would be mtDNA haplogroups and haplotypes
shared among modern Khoe speakers if the pastoralist
migration included female migrants, since this is
assumed to have taken place not more than 2,000 years
ago. The lineages mostly shared by Khoe populations
are haplogroups L0d (present in all populations) and
L0k (present in most). These might represent retentions
from an original shared East African ancestor, which
would explain the traces of L0d in the Sandawe of Tanzania
(Tishkoff et al., 2007), who speak a language possibly
related to the Khoe languages (Güldemann and
Elderkin, 2012). However, L0d and L0k are rare outside
of southern Africa (Barbieri et al., 2013a), and are
highly characteristic of the NORTHWEST, CENTRAL, and
SOUTH-CENTRAL clusters. Thus, the presence of these lineages
in the Khoe populations might rather be the result
of contact with local foragers.
As found for the Y chromosome, some haplogroups
might retain traces of the putative East African origin of
the Khoe, assuming that not all of these lineages were
incorporated via direct contact with Bantu-speaking
agriculturalists. A potential East African candidate is
haplogroup L5, common in East Africa and present
exclusively in the Shua and Tshwa (at 5 and 18%,
respectively); however, this is notably absent in the OKA-
VANGO and NAMA populations.
A further trace of the East African migration might be
sought in the presence of a minimal common genetic
denominator that could be interpreted as a genetic sig-
nal of shared ancestry of these populations; however, the
Khoe clusters of putative East African origin (OKAVANGO,
EAST, NAMA) harbor different proportions of non-L0d/L0k
haplogroups (Supporting Information Table). Genetic
drift and/or subsequent contact with other Khoisan pop-
ulations may have played a role in increasing this differ-
entiation. A possible exception is represented by
haplogroup L3d, which is present in Khoe-speaking individuals
belonging to the EAST, NAMA, and OKAVANGO clusters,
and in three Hai||om and one G|ui individual (as
well as two !Xuun). However, haplogroup L3d is present
at highest frequency in NW-NAMIBIA, which comprises the
Khoe-speaking Damara and the Bantu-speaking Himba
and Herero. The L3d network (Fig. 6) shows a common
haplotype shared by 28 individuals (26 NW-NAMIBIA, one
Nama, and one Hai||om, indicated with an asterisk in the
network) and surrounded by 15 other haplotypes in a
star-shaped form, suggesting a recent expansion. The
time of this expansion is dated with the rho statistic (Forster
et al., 1996) and the calculator provided in Soares
et al. (2009) to 1,373 years ago (±700 years), which
would have followed the arrival of the pastoralist
migrants. This haplotype stems from a motif carried by
seven Khoe-speaking individuals from various regional
clusters (indicated by an arrow), suggesting that the
ancestors of the Khoe-Kwadi speakers could have initially
carried it to the area and subsequently spread it, creating
the resulting signal of expansion. Strong female gene flow
could then have incorporated L3d lineages into the gene
pool of the ancestors of the pastoralist Himba, Herero,
and Damara (NW-NAMIBIA cluster).
Among Khoisan populations, the pastoralist Nama
show the clearest signal of ancestry with East Africa in
the autosomal data (Schlebusch et al., 2012; Pickrell
et al., 2012), which strongly contrasts with the mtDNA
results: the Nama do not harbor any characteristic East
African mtDNA lineages, and they are genetically close
to the populations from the NORTHWEST, SOUTH-CENTRAL
and CENTRAL clusters, especially to the linguistically
closely related Hai||om (Fig. 2, Supporting Information
Fig. S2 and S5). It is possible that high levels of contact
with local foragers in the maternal line erased any original
signal of East African maternal ancestry in the
Nama, while a signal of East African ancestry was
retained in the autosomal data, and/or that the pastoralist
migration was heavily male-mediated.
Fig. 6. Network of L3d haplotypes. The dashed line indicates a branch that has been shortened for graphic purposes.
In summary, the variation present in the non-L0d/L0k
lineages (which are less likely to stem from contact with
autochthonous foragers) does not provide a strong
genetic link of the Khoe-speaking populations with east-
ern Africa. Haplogroup L5 might represent a relic of this
putative immigration of eastern African pastoralists, but
it is present in only two of the seven Khoe-speaking pop-
ulations most likely to have eastern African ancestry.
L3d is another genetic marker that may have been
brought to southern Africa by the Khoe-Kwadi immigra-
tion, but this signal, too, is not unequivocal. The puta-
tive genetic background carried by the maternal
ancestors of the Khoe-Kwadi may have been diluted
through gene flow from local foragers and Bantu-
speaking migrants and further been erased by drift in
some of the populations. A male-dominated migration
could also have played a role in leaving a more evident
signal of eastern African origin in the Y chromosome
(Henn et al., 2008), while the maternal genetic compo-
nent would stem from autochthonous foragers. The
hypothesized eastern African origin of the Khoe ances-
tors requires more investigation, and this line of
research would greatly benefit from the availability of
more representative samples, in particular from more
pastoralist populations of East Africa.
With this dataset of complete mtDNA genome sequen-
ces we greatly extend our knowledge about the history
and demography of Khoisan populations of southern
Africa. Most importantly, we show that they are geneti-
cally differentiated, with populations of the NORTHWEST
geolinguistic cluster somewhat isolated from other popu-
lations. However, in contrast to the deep split emerging
from previous analyses of genome-wide data, contact in
the maternal line between geographically distant popu-
lations can be shown to have taken place. This areal
contact, involving especially the populations of the cen-
tral Kalahari, has played a role in shaping their mtDNA
diversity and may have played a role in the diffusion of
common cultural and linguistic features. Furthermore,
gene flow in the maternal line is most probably the rea-
son why no strong and unambiguous signal of the
hypothesized pre-Bantu pastoralist immigration from
eastern Africa can be detected in the Khoe-speaking pop-
ulations. However, the picture presented here is limited
by our lack of comparable data from descendants of
Khoisan populations from South Africa and Angola. In
future work, analyses of the Y-chromosome will contrib-
ute to our understanding of the genetic variation of
these populations, and will complete the picture of the
socio-demographic factors (in particular, those that are
sex-biased) that have had an impact during Khoisan prehistory.
This study focuses on the prehistory of populations as
reflected in their genetic variation. It does not intend to
evaluate the self-identification or cultural identity of
any group, which consists of much more than just genetic
ancestry. The authors sincerely thank all the sample
donors for their participation in this study, the govern-
ments of Botswana, Namibia, and Zambia for supporting
their research, Blesswell Kure, Justin Magabe, and
Berendt Nakwe for assistance with sample collection,
Mingkun Li for bioinformatics assistance, and Serena
Tucci, Vera Lede, Roland Schröder, and Anne Butthof for
assistance with sample preparation. They thank Gertrud
Boden for helpful comments on the manuscript.
Anderson CNK, Ramakrishnan U, Chan YL, Hadly EA. 2005.
Serial SimCoal: a population genetics model for data from
multiple populations and points in time. Bioinformatics 21:
Bandelt HJ, Forster P, Rohl A. 1999. Median-joining networks
for inferring intraspecific phylogenies. Mol Biol Evol 16:37–
Barbieri C, Vicente M, Rocha J, Mpoloka SW,
Stoneking M, Pakendorf B. 2013a. Ancient Substructure in
Early mtDNA Lineages of Southern Africa. Am J Hum Genet
Barbieri C, Butthof A, Bostoen K, Pakendorf B. 2013b. Genetic
perspectives on the origin of clicks in Bantu languages from
southwestern Zambia. Eur J Hum Genet 21:430–436.
Barbieri C, Whitten M, Beyer K, Schreiber H, Li M, Pakendorf
B. 2012. Contrasting maternal and paternal histories in the
linguistic context of Burkina Faso. Mol Biol Evol 29:1213–
Barnard A. 1992. Hunters and herders of southern Africa: a
comparative ethnography of the Khoisan peoples. Cambridge,
New York: Cambridge University Press.
Barnard A. 2008. Ethnographic analogy and the reconstruction
of early Khoekhoe society. South Afr Human 20:61–75.
Batini C, Lopes J, Behar DM, Calafell F, Jorde LB, van der
Veen L, Quintana-Murci L, Spedini G, Destro-Bisol G, Comas
D. 2011. Insights into the Demographic History of African
Pygmies from Complete Mitochondrial Genomes. Mol Biol
Evol 28:1099–1110.
Behar DM, Villems R, Soodyall H, Blue-Smith J, Pereira L,
Metspalu E, Scozzari R, Makkan H, Tzur S, Comas D,
Bertranpetit J, Quintana-Murci L, Tyler-Smith C, Wells SR,
Rosset S, The Genographic Consortium. 2008. The Dawn of
Human Matrilineal Diversity. Am J Hum Genet 82:1130–
Briggs AW, Good JM, Green RE, Krause J, Maricic T, Stenzel
U, Lalueza-Fox C, Rudan P, Brajkovic D, Kucan Z, Gu
Schmitz R, Doronichev VB, Golovanova LV, de la Rasilla M,
Fortea J, Rosas A, Pääbo S. 2009. Targeted retrieval and
analysis of five Neandertal mtDNA genomes. Science 325:
Coelho M, Sequeira F, Luiselli D, Beleza S, Rocha J. 2009. On
the edge of Bantu expansions: mtDNA, Y chromosome and
lactase persistence genetic variation in southwestern Angola.
BMC Evol Biol 9:80.
Cooke HJ. 1979. The origin of the Makgadikgadi Pans. Bots
Notes Records 11:37–42.
de Filippo C, Barbieri C, Whitten M, Mpoloka SW,
Gunnarsdottir ED, Bostoen K, Nyambe T, Beyer K, Schreiber
H, de Knijff P, Luiselli D, Stoneking M, Pakendorf B. 2011.
Y-chromosomal variation in sub-Saharan Africa: insights into
the history of Niger-Congo groups. Mol Biol Evol 28:1255–
Deacon HJ, Deacon J. 1999. Human beginnings in South Africa:
uncovering the secrets of the Stone Age. Walnut Creek, CA:
Altamira Press.
Denbow J. 1984. Prehistoric herders and foragers of the Kala-
hari: the evidence for 1500 years of interaction. In: C Schrire,
editor. Past and Present in Hunter Gatherer Studies.
Orlando: Academic Press. p 175–193.
Drummond AJ, Suchard MA, Xie D, Rambaut A. 2012. Bayes-
ian Phylogenetics with BEAUti and the BEAST 1.7. Mol Biol
Evol 29:1969–1973.
Ebert JI, Hitchcock RK. 1978. Ancient Lake Makgadikgadi,
Botswana: mapping, measurement and palaeoclimatic signifi-
cance. Palaeoecology Africa 10:47–56.
Fenner JN. 2005. Cross-cultural estimation of the human gen-
eration interval for use in genetics-based population diver-
gence studies. Am J Phys Anthropol 128:415–423.
Forster P, Harding R, Torroni A, Bandelt HJ. 1996. Origin and
evolution of Native American mtDNA variation: a reappraisal.
Am J Hum Genet 59:935–945.
Furrer R, Nychka D, Sain S. 2012. Fields: tools for spatial data.
R package version 6.7
Güldemann T. 2004. Reconstruction through de-construction:
the marking of person, gender, and number in the Khoe family
and Kwadi. Diachronica 21:251–306.
Güldemann T. 2005. Studies in Tuu (Southern Khoisan). Papers
on Africa, Languages and Literatures 23. Leipzig: Institut für
Afrikanistik, Universität Leipzig.
Güldemann T. 2008. A linguist’s view: Khoe-Kwadi speakers as
the earliest food-producers of southern Africa. South Afr
Human 20:93–132.
Güldemann T, Elderkin ED. 2010. On external genealogical
relationships of the Khoe family. In: M Brenzinger, C König,
editors. Khoisan languages and linguistics: proceedings of the
1st International Symposium January 4–8, 2003: Riezlern/
Kleinwalsertal. Quellen zur Khoisan-Forschung. Köln:
Rüdiger Köppe. p 15–52.
Güldemann T, Loughnane R. 2012. Are there “Khoisan” roots
in body-part vocabulary? On linguistic inheritance and contact
in the Kalahari Basin. Language Dynamics & Change 2:
Gunnarsdottir ED, Nandineni MR, Li M, Myles S, Gil D,
Pakendorf B, Stoneking M. 2011. Larger mitochondrial DNA
than Y-chromosome differences between matrilocal and patri-
local groups from Sumatra. Nat Commun 2:228.
Heine B, Honken H. 2010. The Kx’a family: a new Khoisan
Genealogy. J Asian Afr Stud 79:5–36.
Heinz HJ. 1994. Social organization of the !Kõ Bushmen. Köln:
R. Köppe.
Henn BM, Gignoux C, Lin AA, Oefner PJ, Shen P, Scozzari R,
Cruciani F, Tishkoff SA, Mountain JL, Underhill PA. 2008. Y-
chromosomal evidence of a pastoralist migration through Tan-
zania to southern Africa. Proc Natl Acad Sci USA 105:10693–
Henn BM, Gignoux CR, Jobin M, Granka JM, Macpherson JM,
Kidd JM, Rodriguez-Botigue L, Ramachandran S, Hon L,
Brisbin A, Lin AA, Underhill PA, Comas D, Kidd KK,
Norman PJ, Parham P, Bustamante CD, Mountain JL,
Feldman MW. 2011. Hunter-gatherer genomic diversity sug-
gests a southern African origin for modern humans. Proc
Natl Acad Sci USA 108:5154–5162.
Heyer E, Chaix R, Pavard S, Austerlitz F. 2012. Sex-specific
demographic behaviours that shape human genomic varia-
tion. Mol Ecol 21:597–612.
Jenkins T. 1986. The prehistory of the San and Khoikhoi as
recorded in their blood. In: R Vossen, K Keuthmann, editors.
Contemporary Studies on Khoisan, Vol. 2. Hamburg: Helmut
Buske Verlag. p 51–77.
Kinahan J. 1991. Pastoral Nomads of the central Namib Desert:
the people history forgot. Windhoek: Namibia Archaeological
Kinahan J. 2011. From the beginning: the archaeological evi-
dence. In: M Wallace. A History of Namibia: From the Begin-
ning to 1990. London: Hurst and Company. p 15–43.
Kloss-Brandstätter A, Pacher D, Schönherr S, Weissensteiner
H, Binna R, Specht G, Kronenberg F. 2011. HaploGrep: a fast
and reliable algorithm for automatic classification of
mitochondrial DNA haplogroups. Hum Mutat 32:25–32.
Kumar V, Langstieh BT, Madhavi KV, Naidu VM, Singh HP,
Biswas S, Thangaraj K, Singh L, Reddy BM. 2006. Global
patterns in human mitochondrial DNA and Y-chromosome
variation caused by spatial instability of the local cultural
processes. PLoS Genet 2:e53.
Lachance J, Vernot B, Elbers CC, Ferwerda B, Froment A, Bodo
JM, Lema G, Fu W, Nyambo TB, Rebbeck TR, Zhang K, Akey
JM, Tishkoff SA. 2012. Evolutionary history and adaptation
from high-coverage whole-genome sequences of diverse African
hunter-gatherers. Cell 150:457–469.
Lee RB. 1984. The Dobe !Kung. Case studies in cultural
anthropology. New York: Holt, Rinehart and Winston.
Maricic T, Whitten M, Pääbo S. 2010. Multiplexed DNA
sequence capture of mitochondrial genomes using PCR
products. PLoS ONE 5:e14004.
Meyer M, Kircher M. 2010. Illumina sequencing library
preparation for highly multiplexed target capture and sequencing.
Cold Spring Harbor Protoc 2010:pdb.prot5448.
Mitchell P. 2002. The Archaeology of Southern Africa. Cam-
bridge: Cambridge University Press.
Naidoo T, Schlebusch CM, Makkan H, Patel P, Mahabeer R,
Erasmus JC, Soodyall H. 2010. Development of a single base
extension method to resolve Y chromosome haplogroups in
sub-Saharan African populations. Investig Genet 1:6.
Nenadic O, Greenacre M. 2007. Correspondence analysis in R,
with two- and three-dimensional graphics: the ca package. J
Stat Software 20:1–13.
Oksanen J, Blanchet FG, Kindt R, Legendre P, Minchin PR,
O’Hara RB, Simpson GL, Solymos P, Stevens MRH, Wagner
H. 2012. vegan: Community Ecology Package. R package
version 2.0-5. http://CRAN.R-project.org/package=vegan.
Oota H, Settheetham-Ishida W, Tiwawech D, Ishida T,
Stoneking M. 2001. Human mtDNA and Y-chromosome varia-
tion is correlated with matrilocal versus patrilocal residence.
Nat Genet 29:20–21.
Paradis E. 2010. pegas: an R package for population genetics
with an integrated–modular approach. Bioinformatics 26:
Paradis E, Claude J, Strimmer K. 2004. APE: analyses of phylo-
genetics and evolution in R language. Bioinformatics 20:289–
Phillipson DW. 2005. African archaeology. Cambridge: Cam-
bridge University Press.
Pickrell JK, Patterson N, Barbieri C, Berthold F, Gerlach L,
Lipson M, Po-Ru L, Lachance J, Güldemann T, Kure B, Wata
SM, Nakagawa H, Naumann C, Mountain JL, Bustamante
CD, Berger B, Stoneking M, Reich D, Pakendorf B. 2012. The
genetic prehistory of southern Africa. Nat Commun 3. doi:
Pleurdeau D, Imalwa E, Detroit F, Lesur J, Veldman A, Bahain
JJ, Marais E. 2012. "Of sheep and men": earliest direct
evidence of caprine domestication in southern Africa at Leopard
Cave (Erongo, Namibia). PLoS ONE 7:e40340.
Quintana-Murci L, Harmant C, Quach H, Balanovsky O,
Zaporozhchenko V, Bormans C, van Helden PD, Hoal EG,
Behar DM. 2010. Strong maternal Khoisan contribution to
the South African coloured population: a case of gender-
biased admixture. Am J Hum Genet 86:611–620.
Reid A, Sadr K, Hanson-James N. 1998. Herding traditions. In:
Lane P, Reid A, Segobye A, editors. Ditswa MMung: The
Archaeology of Botswana. Gaborone: Pula Press and The
Botswana Society. p 81–100.
Sadr K. 1998. The first herders at the Cape of Good Hope. Afr
Archaeol Rev 15:101–132.
Schlebusch CM, de Jongh M, Soodyall H. 2011. Different contri-
butions of ancient mitochondrial and Y-chromosomal lineages
in ’Karretjie people’ of the Great Karoo in South Africa. J
Hum Genetics 56:623–630.
Schlebusch CM, Skoglund P, Sjodin P, Gattepaille LM,
Hernandez D, Jay F, Li S, De Jongh M, Singleton A, Blum
MG, Soodyall H, Jakobsson M. 2012. Genomic variation in
seven Khoe-San groups reveals adaptation and complex Afri-
can history. Science 338:374–379.
Schlebusch CM, Lombard M, Soodyall H. 2013. MtDNA control
region variation affirms diversity and deep sub-structure in
populations from southern Africa. BMC Evol Biol 13:1–21.
Silberbauer GB. 1981. Hunter and habitat in the central Kala-
hari Desert. Cambridge: Cambridge University Press.
Smith AB. 1990. On becoming herders: Khoikhoi and San eth-
nicity in southern Africa. Afr Studies 49:51–73.
Soares P, Ermini L, Thomson N, Mormina M, Rito T, Rohl A,
Salas A, Oppenheimer S, Macaulay V, Richards MB. 2009.
Correcting for purifying selection: an improved human mito-
chondrial molecular clock. Am J Hum Genet 84:740–759.
Tishkoff SA, Gonder MK, Henn BM, Mortensen H, Knight A,
Gignoux C, Fernandopulle N, Lema G, Nyambo TB,
Ramakrishnan U, Reed FA, Mountain JL. 2007. History of
click-speaking Populations of Africa inferred from mtDNA and
Y chromosome genetic variation. Mol Biol Evol 24:2180–2195.
Tishkoff SA, Reed FA, Friedlaender FR, Ehret C, Ranciaro A,
Froment A, Hirbo JB, Awomoyi AA, Bodo JM, Doumbo O,
Ibrahim M, Juma AT, Kotze MJ, Lema G, Moore JH,
Mortensen H, Nyambo TB, Omar SA, Powell K, Pretorius GS,
Smith MW, Thera MA, Wambebe C, Weber JL, Williams SM.
2009. The genetic structure and history of Africans and Afri-
can Americans. Science 324:1035–1044.
Traill A, Nakagawa H. 2000. A historical !Xóõ-|Gui contact
zone: linguistic and other relations. In: Batibo H, Tsonope J,
editors. The state of Khoesan languages in Botswana. Gabor-
one: Basarwa Languages Project. p 1–17.
van Oven M, Kayser M. 2009. Updated comprehensive phyloge-
netic tree of global human mitochondrial DNA variation.
Hum Mutat 30:E386–E394.
Veeramah KR, Wegmann D, Woerner A, Mendez FL, Watkins
JC, Destro-Bisol G, Soodyall H, Louie L, Hammer MF. 2011.
An early divergence of KhoeSan ancestors from those of other
modern humans is supported by an ABC-based analysis of
autosomal re-sequencing data. Mol Biol Evol 29:617–630.
Venables WN, Ripley BD. 2002. MASS: modern applied statis-
tics with S. New York: Springer.
Verdu P, Becker NSA, Froment A, Georges M, Grugni V,
Quintana-Murci L, Hombert JM, Van der Veen L, Le Bomin
S, Bahuchet S, Heyer E, Austerlitz F. 2013. Sociocultural
behavior, sex-biased admixture, and effective population sizes
in Central African Pygmies and Non-Pygmies. Mol Biol Evol
Weiner JS, Harris R, Harrison GA, Singer R, Jopp W. 1964.
Skin Colour in Southern Africa. Hum Biol 36:294–&.
Widlok T. 1999. Living on Mangetti: ’Bushman’ autonomy and
Namibian independence. Oxford: Oxford University Press.