Semipermeable species boundaries between Anopheles gambiae and Anopheles arabiensis: evidence from multilocus DNA sequence variation.
ABSTRACT Attempts to reconstruct the phylogenetic history of the Anopheles gambiae cryptic species complex have yielded strongly conflicting results. In particular, An. gambiae, the primary African malaria vector, is variously placed as a sister taxon to either Anopheles arabiensis or Anopheles merus. The recent divergence times for members of this complex complicate phylogenetic analysis, making it difficult to unambiguously implicate interspecific gene flow, versus retained ancestral polymorphism, as the source of conflict. Using sequences at four unlinked loci, which were determined from multiple specimens within each of five species in the complex, we found contrasting patterns of sequence divergence between the X chromosome and the autosomes. The isolation model of speciation assumes a lack of gene flow between species since their separation. This model could not be rejected for An. gambiae and An. arabiensis, although the data fit the model poorly. On the other hand, evidence from gene trees supports genetic introgression of chromosome 2 inversions between An. gambiae and An. arabiensis, and also points to more broad scale genetic exchange of autosomal sequences between this species pair. That such exchange has been relatively recent is suggested not only by the lack of fixed differences at three autosomal loci but also by the sharing of full haplotypes at two of the three loci, which is in contrast to several fixed differences and considerably deeper divergence on the X. The proposed acquisition by An. gambiae of sequences from the more arid-adapted An. arabiensis may have contributed to the spread and ecological dominance of this malaria vector.
Revista portuguesa de cardiologia: orgao oficial da Sociedade Portuguesa de Cardiologia = Portuguese journal of cardiology: an official journal of the Portuguese Society of Cardiology 10/1997; 16(9):673-6.
Semipermeable species boundaries between
Anopheles gambiae and Anopheles arabiensis:
Evidence from multilocus DNA sequence variation
N. J. Besansky†‡, J. Krzywinski†, T. Lehmann§, F. Simard¶, M. Kern†, O. Mukabayire†, D. Fontenille¶, Y. Toure ´?,
and N?F. Sagnon††
†University of Notre Dame, Center for Tropical Disease Research and Training, Department of Biological Sciences, Notre Dame, IN 46556;§Centers for
Disease Control and Prevention, Division of Parasitic Diseases, Chamblee, GA 30341;¶Laboratoire de l’Institut de Recherche pour le Developpement,
Organisation de Lutte Contre les Grandes Endemies en Afrique Centrale, BP 288 Yaounde ´, Cameroun;?Ecole Nationale de Me ´dicine et de Pharmacie,
BP 1805 Bamako, Mali; and††Centre National de Recherche et de Formation sur le Paludisme, 01 BP2208 Ouagadougou, Burkina Faso
Communicated by Fotis C. Kafatos, European Molecular Biology Laboratory, Heidelberg, Germany, July 11, 2003 (received for review March 10, 2003)
Attempts to reconstruct the phylogenetic history of the Anopheles
gambiae cryptic species complex have yielded strongly conflicting
is variously placed as a sister taxon to either Anopheles arabiensis or
Anopheles merus. The recent divergence times for members of this
complex complicate phylogenetic analysis, making it difficult to
unambiguously implicate interspecific gene flow, versus retained
ancestral polymorphism, as the source of conflict. Using sequences at
four unlinked loci, which were determined from multiple specimens
within each of five species in the complex, we found contrasting
autosomes. The isolation model of speciation assumes a lack of gene
flow between species since their separation. This model could not be
model poorly. On the other hand, evidence from gene trees supports
genetic introgression of chromosome 2 inversions betweenAn. gam-
biae and An. arabiensis, and also points to more broad scale genetic
exchange of autosomal sequences between this species pair. That
lack of fixed differences at three autosomal loci but also by the
to several fixed differences and considerably deeper divergence on
the X. The proposed acquisition by An. gambiae of sequences from
the more arid-adapted An. arabiensis may have contributed to the
spread and ecological dominance of this malaria vector.
dominant biological species concept, species are groups of actually
or potentially interbreeding natural populations that are reproduc-
tively isolated from other such groups (1). Under a strict interpre-
that it occurs at all, is inconsequential to species evolution, because
hybrids are presumed less fit. Yet recent studies of Drosophila and
Anopheles species (e.g., refs. 2–5), suggest that introgression is not
necessarily rare or inconsequential; indeed, it may be advantageous
(2). This alternative viewpoint is not new. In a comprehensive
treatment of the systematics of mosquito fauna of the South Pacific,
published just one year before Mayr’s Animal Species and Evolution
(1), Belkin (ref. 6, p. 60) hypothesized that hybridization was an
important mechanism in the speciation, the formation of new types,
and the evolution of mosquitoes. Evidence for the importance of
introgressive hybridization in mosquito evolution comes from a
group of sibling species, the Anopheles gambiae complex.
This complex, considered a single polytypic species before 1956
(7), is now known to consist of at least seven partially intersterile
cryptic species found in Africa south of the Sahara (8, 9). Three
halophilic species are minor malaria vectors because of limited
density and distribution: Anopheles melas and Anopheles merus,
species extending inland in South Africa), and Anopheles bwambae,
he role of introgressive hybridization in evolution remains
contentious, especially among zoologists. As defined by the
from mineral springs in the Semliki Forest of Uganda. Anopheles
quadriannulatus species A and B, freshwater species patchily dis-
tributed in the south and east, respectively, pose no threat to public
where An. quadriannulatus A contacts An. merus in South Africa,
but their ranges overlap those of An. gambiae and Anopheles
arabiensis. The latter two species, both widespread and extensively
sympatric, are major malaria vectors. These two species also were
the subject of recent controversy, because conflicting evidence
supported either An. arabiensis or An. merus because the true sister
taxon to An. gambiae (2, 10, 11).
At the center of the conflict, the X chromosome is the source of
fixed chromosome inversion and sequence differences by which
these isomorphic taxa are recognized. Inversion Xag, a paracentric
An. gambiae and An. merus from other species, and identifies them
as sister taxa (8). (An. arabiensis possesses an independent auta-
pomorphic compound inversion on the X.) Also X-linked but
outside of the inversion in pericentromeric heterochromatin, inter-
genic spacer sequences of ribosomal DNA distinguish each species
An. arabiensis is allied with An. gambiae (2). Resolution of the
conflict requires invoking either lineage sorting or introgression of
X chromosome sequences between An. gambiae, and either An.
merus or An. arabiensis. In nature, important premating isolating
mechanisms exist between sympatric populations of these species,
Nevertheless, postmating isolating mechanisms are incomplete,
because F1 hybrid females are fertile and potentially broker inter-
specific gene exchange. The An. gambiae–An. arabiensis relation-
ship inferred from other markers, especially the paraphyletic,
intertwining mtDNA (2, 14), has been interpreted as reflecting the
flow of nuclear and mtDNA sequences across species boundaries,
rather than the persistence of ancestral alleles.
There is no defined hybrid zone between An. gambiae and An.
arabiensis; their ranges coincide even at a microgeographic scale.
An alternative approach to studying the role of introgressive
hybridization in the history of these species is to examine patterns
of genetic variation at multiple loci within and between sibling
species (e.g., refs. 5 and 15). Previously (14, 16), we found extensive
mtDNA haplotype sharing across Africa between An. gambiae and
An. arabiensis. Here, by using the same and additional population
samples, as well as three additional sibling species (An. merus, An.
melas, and An. quadriannulatus), we extend this approach to four
loci representing each chromosome (2N ? 6). These loci include
the white gene on the X, the tryptophan oxygenase (tox) gene on
database (accession nos. AY117771–AY118063).
‡To whom correspondence should be addressed. E-mail: email@example.com.
© 2003 by The National Academy of Sciences of the USA
September 16, 2003 ?
vol. 100 ?
chromosome 2, and two genes on the left and right arms of
chromosome 3: glucose-6-phosphate dehydrogenase (G6pd) and
xanthine dehdrogenase (xdh), respectively. The white gene, located
tox gene, located within inversion 2Rb common to An. gambiae and
An. arabiensis, were previously studied in Mali, West Africa (17).
Not previously studied in the An. gambiae complex, the G6pd and
xdh genes are located outside of any chromosomal inversions. We
analyze these sets of sequences within population genetic and
phylogenetic frameworks, and test whether the data are consistent
with a model of reproductively isolated species sharing ancestral
polymorphism, or whether introgression must be invoked.
Materials and Methods
Mosquito Sampling. Table 1 lists the geographic origins and desig-
nations for all specimens. Anophelines were identified as members
of the An. gambiae complex by using morphological keys (18), and
species identification was performed with a ribosomal DNA-PCR
assay (19). With the exception of the Malian specimens, the
karyotype with respect to inversion 2Rb could not be determined,
as ovarian nurse cell polytene chromosomes were unavailable. The
South Africa are derived from samples previously analyzed for
mtDNA sequence variation (14); specimens from Mali were char-
acterized previously at white and tox (17). Further sampling details
can be obtained from the authors.
DNA Methods. DNA was extracted from individual specimens
according to protocol 48 (20). Standard procedures were used for
amplification and sequencing of white, tox, G6pd, and xdh (17, 21);
detailed protocols and oligonucleotides used are available from the
authors. Sequences from the guanylate cyclase (gua) and mtDNA
ND5 genes were obtained from GenBank (accession nos. U42609–
U42614 and U42619-U42621; ref. 10) and (AF020983–AF020987,
AF020991–AF020993, AF020998–AF020999, AF021004–
AF021008, and AF021019–AF021025; ref. 14).
Because male anophelines are hemizygous for X-linked genes,
white PCR products from males (available only from Kenya and
(Promega), and were directly cycle sequenced. All other PCR
products were cloned by using pGEM-T Easy Vector (Promega)
before sequencing. Sequences are available under GenBank acces-
AY117993–AY118033 (G6pd), and AY118034–AY118063 (xdh).
FRAGMENT ASSEMBLY programs of the Genetics Computer Group
[(GCG), Madison, WI] software package, or by using SEQUENCE
alignment was performed with the PILEUP program of the GCG.
PCR Error. In this article, direct sequencing was not possible in most
the white locus, the lack of males from most samples. To adjust for
the effect of PCR error on diversity indices and statistics relying on
these indices, we used a statistical approach (by using a program
written in SAS language by T.L., which is available from the
authors). The original set of aligned sequences (minus those ob-
tained by direct sequencing) was used to generate multiple error-
was corrected to the level expected without polymerase error.
Adjustments to the original dataset were conditioned on the prob-
ability that a given base in a column of aligned sequences was
misincorporated, assuming an overall error rate of 0.15% per base
(estimated by sequencing five clones from each of three field
specimens). The conditional probability of PCR error at each site
was calculated based on (i) whether the position was invariant,
singleton, or polymorphic; (ii) the frequency of the base at that
position; and (iii) the expected number of invariant, singleton, and
polymorphic sites in the original data. Further, the algorithm
considered the markedly different probabilities of observing a
particular misincorporated nucleotide given an inferred source
nucleotide. The performance of the algorithm (to be described in
detail elsewhere) was evaluated in two pairs of experimental data-
sets, where one set contained PCR errors and the matching set was
empirically corrected (by direct sequencing or by sequencing mul-
of the bias in the polymorphism statistics (F.S., M. Licht, and T.L.,
unpublished data). Here, we analyzed each of six replicate error-
adjusted datasets per locus, and compared the results to those
obtained from the original data. With the exception of tests of the
isolation model of speciation (see below), all trends statistically
supported in the original dataset were supported in the error-
Data Analysis. DNA polymorphism statistics, tests of neutrality, and
measuresofdivergencewerecomputedwithDNASP 3.5(22).SITES 1.1,
which can be accessed at: http:??lifesci.rutgers.edu??heylab, was
used to estimate the recombination parameter, ?, and to generate
data for input into HKA and WH, programs distributed by J. Hey
(Rutgers University, Piscataway, NJ). HKA, which implements the
method described in ref. 23, was used to test for neutral molecular
evolution. WH, which implements methods described in refs. 24 and
25, was used to fit an isolation model of speciation. This model
assumes that the two species from which data have been sampled
arose from a single ancestral species T generations ago, without
observed numbers of shared, fixed, and exclusive polymorphic sites
summed across multiple loci, under the assumption of constant
effective population sizes and neutral mutation, the program re-
turns estimates of ? and T that most closely fit the data. The quality
of fit of the data to this model was assessed by comparing ?2and
WWH (Wang–Wakeley–Hey) test statistic values (15, 25) to a
distribution of values generated by 10,000 computer simulations of
the coalescent process based on parameters estimated from the
The genetic structure of An. gambiae and An. arabiensis popu-
lations was examined by a hierarchical analysis of molecular vari-
ance, and by population pairwise FSTstatistics, with each being
2.0 (26). Both analyses were performed on data in which gapped
with zero weight. Because there was no significant difference in the
outcome, data reported are from the latter treatment.
Genealogical relationships among sequences were estimated by
using maximum parsimony as implemented in PAUP* 4.d65 (27), by
Table 1. Source of An. gambiae sensu lato specimens
No. of specimens
gam, An. gambiae; ara, An. arabiensis; mer, An. merus; qua, An. quadri-
annulatus; mel, An. melas.
Besansky et al.
September 16, 2003 ?
vol. 100 ?
no. 19 ?
using heuristic searches and TBR branch swapping. Gaps were
treated as missing data, but were scored as additional characters in
the tox analysis. Analyses were done by stepwise random addition
of taxa with 1,000 replications. Bootstrapping was performed over
100–500 replicates, each with 10 random additions of sequences.
Significance of character incongruence was evaluated by imple-
menting the incongruence length difference (ILD) test in PAUP*,
by using 10,000 replicates, each with 10 random additions of
DNA Polymorphism. SamplesofAn.gambiaeandAn.arabiensiswere
collected concurrently from villages in East Africa (Kenya) and
West Africa (Mali, Senegal) up to 6,000 km apart, to maximize the
level of diversity represented. Sampling of An. merus, An. quadri-
annulatus, and An. melas was more limited in scope (Table 1).
Wherever possible, sequences at each locus were determined from
the same specimens. Physical locations of the four loci (white, tox,
G6pd, and xdh) with respect to chromosome arm, relevant para-
centric inversions, and previously studied loci are indicated in Fig.
1, which is published as supporting information on the PNAS web
site, www.pnas.org. Multiple sequence alignments for each locus
from five species (which are available in the PopSet database of
GenBank) revealed substantial insertion?deletion variation as well
as base pair substitutions. A summary of sequence variation and
other basic statistics is given in Table 5, which is published as
error averaged 2.6% and 1.9%, respectively, whereas values for the
X-linked white locus were lower by at least half. These two species
are both polymorphic across their ranges for the cytologically
identical chromosomal inversion 2Rb, which includes the tox locus.
The total tox samples of An. gambiae and An. arabiensis included
individuals of unknown karyotype. For the An. gambiae and An.
arabiensis specimens from Mali with known karyotype, statistics
were computed separately for subsamples containing the ho-
mokaryotypic inverted (2Rb?b) and homokaryotypic standard (un-
inverted: 2R?b??b) arrangements. No apparent difference in poly-
morphism level between alternative arrangements was noted.
Natural Selection in An. gambiae and An. arabiensis. The pattern of
sequence variation was consistent with neutral evolution at all loci
except for that of white. Tajima’s D, a statistic that measures the
difference between two estimators of 4N? (? and ?), is expected to
be zero under a neutral model with constant population size (28).
The D statistic was not significantly different from zero except for
negative, indicating an excess of low frequency mutations. In
principle, this result could be due to a population bottleneck
a locus nearby. Because the results of the Tajima test appeared to
be locus-specific rather than genome-wide; they are more consis-
tent with purifying selection at or near white than recent demo-
graphic expansions in these species. Nevertheless, a second test
(HKA) was unable to detect departures from neutral evolution.
The HKA test is premised on a prediction of the neutral model
of molecular evolution, that levels of interspecific divergence and
intraspecific polymorphism should be correlated (23). HKA testing
was conducted by using multiple sequences per species pair at each
of six loci: four from this article, mtDNA ND5 (14), and X-linked
guanylate cyclase (10). Tests were carried out both on the original
and PCR error-adjusted data. The significance of the test statistic
was evaluated against a distribution generated by 10,000 coalescent
simulations, by using parameters estimated from the data. In no
instance was there a significant deviation from neutral expectation:
An. gambiae–An. merus, ?2? 3.23, ?95% (?2adj? 4.47, ?84%) of
?2? 7.69, ?29% (?2adj? 11.02, ?14%) of simulated values.
Interspecific Divergence. Locus-by-locus estimates of differentiation
between An. gambiae and other species in the complex are given in
Table 2. In the absence of gene flow between species, an equilib-
rium-neutral model predicts that interspecific differentiation will
increase with time since speciation. The most closely related species
pair should show the smallest net divergence (Da), the fewest fixed
adjusting for X versus autosome linkage). Based on the shared Xag
inversion, the most closely related species pair is expected to be An.
gambiae and An. merus. Although species could not be ranked
consistently according to levels of divergence from An. gambiae,
some striking trends emerged that proved robust to PCR error. For
each species pair, the highest levels of net divergence and fixed
differences were measured at white, whereas those at xdh were
notably lower. In general, these measures were usually higher
between An. gambiae and halophilic (An. melas and An. merus)
rather than freshwater (An. arabiensis and An. quadriannulatus)
species. In particular, divergence and fixed differences between An.
gambiae and An. merus were always higher, usually much higher,
than between An. gambiae and An. arabiensis, which was contrary to
expectation. It is especially interesting to note that this trend held
even at the white locus, within the Xag inversion. This trend is in
contrast to the gua locus within Xag, at which An. gambiae and An.
merus had the most shared polymorphisms and relatively few fixed
differences (10), suggesting a complex history for the Xag inversion.
Neither introgression nor ancestral polymorphism alone can ac-
count for these results. The data exclude neither possibility, and it
is likely that both have left their signature across loci and between
species. Other evolutionary forces, possibly selection, may also be
An interesting pattern was also revealed at the tox locus, within
the 2Rb inversion shared by An. gambiae and An. arabiensis. FST
values indicated significant differentiation between all species pairs
at all loci, except for tox between An. gambiae and An. arabiensis.
The apparent differentiation at tox was because of alternative
chromosomal arrangements, not because of interspecific differ-
ences (Table 2). When samples were partitioned by inverted
arrangement, only interspecific (and even intraspecific) compari-
arrangements were significant. As proposed by Mukabayire et al.
(17), this finding suggests a common origin for the 2Rb inversions
in both species. However, the sharing of this 2Rb arrangement
between species cannot explain the sharing of so many polymor-
phisms at tox, of which 12 derive from uninverted An. gambiae and
inverted An. arabiensis arrangements.
Beyond merely sharing polymorphic sites at these four loci, An.
gambiae and An. arabiensis share identical haplotypes at both G6pd
and tox loci across Africa. For G6pd, this sharing includes An.
arabiensis haplotypes from Kenya, South Africa, and Senegal that
are identical to An. gambiae haplotypes from Senegal (A?A7?10 ?
G?17385; A?SA4 ? A?17376 ? G?17390). For tox, this sharing
includes An. gambiae haplotypes from Kenya and Senegal that are
identical to An. arabiensis haplotypes from Senegal (G?A4?11 ?
A?14145; G?14817 ? A?1536 ? A?14153), and if gapped alignment
positions are completely excluded from consideration, An. gambiae
and An. arabiensis haplotypes from Mali (G?S?b6 ? A?b93). Given
unlikely that entire haplotypes would be preserved intact between
species since their splitting, although this event has not been dated
and is presumed relatively recent.
The interspecific sharing of entire haplotypes at freely recom-
bining loci is consistent with secondary contact and introgressive
hybridization, rather than with retained ancestral polymorphism.
On the other hand, because any instance of introgression is local,
finding introgressed haplotypes distributed across 6,000 km of
www.pnas.org?cgi?doi?10.1073?pnas.1434337100Besansky et al.
Africa is unexpected, even for a flying insect. To shed light on this
and between An. gambiae and An. arabiensis at each locus by
performing a hierarchical analysis of molecular variance (Table 3).
For this analysis, populations were defined as species-specific
groups were defined as species (and alternative 2R arrangements,
for tox). At each locus except for white, most variation was distrib-
uted within populations, with nearly all of the remaining variation
only locus for which the fixation index FCT was significant by
2), a significant amount of variation was distributed between
to the paradox of broadly distributed shared haplotypes, it is
noteworthy that for each of the four loci, the amount of variation
distributed among populations within species was either insignifi-
cant (white, tox, 2R arrangements, and xdh) or very minor (G6pd).
This finding is consistent with previous mtDNA and microsatellite
studies that have suggested shallow population structure (14, 29,
30). Without well defined geographic population structure, local-
ized introgression events cannot readily be perceived as such (16).
Gene Trees. Traditional phylogenetic methods that produce bifur-
cating gene trees cannot accurately portray genealogical relation-
ships at the intraspecific level, due to recombination. However, our
purpose here was to assess whether sequences from a given species
are monophyletic, and to examine interspecific relationships. Fig. 2,
which is published as supporting information on the PNAS web site,
shows maximum parsimony trees constructed from sequences at
each locus from five members of the An. gambiae complex. In
with the notable exception being those from An. gambiae and An.
arabiensis at autosomal loci. In the tox, G6pd, and xdh gene trees,
intertwine, which is reminiscent of previous findings based on
mtDNA and esterase gene sequences (2, 14). By contrast, the white
gene tree is congruent with the traditional chromosomal inversion-
based phylogeny of the group (8), as well as that of another gene
(guanylate cyclase) located inside the Xag inversion (10). In this case,
monophyletic An. gambiae and An. merus sequences cluster as sister
taxa. Attempts to root the tree with the white gene sequence from
the closely related species Anopheles christyi were unsuccessful,
because the intron was too divergent in length and sequence to
permit reliable alignment to the ingroup sequences. Although our
trees are unrooted, the conclusion of (An. gambiae ? An. merus) is
supported by a previous phylogenetic analysis of mtDNA that
included An. christyi as an outgroup (31). The root could not be
is that it did not lie in the clade consisting of An. gambiae or An.
merus (31). Incongruence between trees constructed from the white
and autosomal gene sequences is not a trivial result of weak
phylogenetic signal, as suggested by significant ILD tests (P ? 0.001
in all cases).
Shared Variation and the Isolation Model of Speciation. At issue is
whether the amount of shared nucleotide variation between An.
gambiae and An. arabiensis is consistent with reproductive isolation
since species splitting (the null hypothesis), or whether secondary
contact and introgressive hybridization must be invoked. Put an-
other way, do the contrasting levels of shared polymorphisms and
fixed differences found across multiple loci fall within a range that
could be expected of isolated populations evolving neutrally, given
the stochastic nature of lineage sorting? This problem was ap-
proached by fitting the data from six loci (four from this article, gua,
An. gambiae and An. arabiensis are assumed to have arisen from a
common ancestral species at some point in the past, without
Table 2. Differentiation between An. gambiae and other species in the An. gambiae complex
white (X: 2A)
tox (2: 12E)
xdh (3: 29C)
FST, P value
FST, P value
FST, P value
FST, P value
gam 2Rb?b–ara 2Rb?b
gam 2R?b??b–ara 2Rb?b
gam 2R?b??b–gam 2Rb?b
Dxy, Davalues have been multiplied by 100; shared polymorphisms are reported as observed values and, in parentheses, the number expected by recurrent mutation (35). NS, not significant.
Besansky et al.
September 16, 2003 ?
vol. 100 ?
no. 19 ?
subsequent gene flow between them. Four tests were carried out on
different datasets: (i) the original data; (ii) the original data cor-
rected for chance occurrence of shared polymorphisms, as in Table
2; (iii) PCR error-adjusted data; and (iv) PCR error-adjusted data
corrected for chance occurrence of shared polymorphisms. Esti-
mates of the population mutation rate parameter for An. gambiae,
An. arabiensis and the ancestral species (?gam, ?ara, and ?A, respec-
tively), and time of separation derived under this model for all four
tests are given in Table 4, together with statistics for goodness of fit.
The relatively large value of ?A, a consequence of accommodating
large numbers of fixed differences as well as shared polymorphisms
generally do not allow rejection of the model. The ?2statistic
measures departures from the expected numbers in four categories
of sites: (i) polymorphisms exclusive to species 1; (ii) those exclusive
to species 2; (iii) shared polymorphisms; and (iv) fixed differences.
The WWH statistic measures the variance in the number of fixed
differences and shared polymorphisms. That only tests 1 and 2 (not
by the ?2statistic, suggests that PCR error was responsible, by
inflating the number of exclusive polymorphisms. There may be at
least three reasons, not mutually exclusive, for failure to reject the
model in the other tests. First is the fact that most loci included had
large amounts of shared polymorphism; only two (white and gua)
had fixed differences. Second, the isolation model relies strongly on
two assumptions: neutral evolution, possibly violated at white; and
constant effective population size, also likely violated (32). Third,
the simulation results are sensitive to the amount of recombination,
which is underestimated by ? (33). This result raises the variance
around the simulated numbers of exclusive, shared, and fixed sites,
broadening the distribution of test statistics of the fit of the model
to the data and making the test more conservative (WH documen-
tation; which can be accessed at: http:??lifesci.rutgers.edu?
?heylab). With this conclusion in mind, the results of test 4,
marginally significant by the ?2statistic, are provocative.
If An. gambiae and An. arabiensis were species with deeply struc-
for interspecific gene flow by comparing levels of sequence diver-
gence in sympatric and allopatric samples. This was the premise
underlying the design of a previous mtDNA study of the same
samples (14), where it was predicted that within-species differen-
tiation between samples from Kenya and Senegal might exceed
interspecific differentiation in sympatric samples from those lo-
cales. However, that and other studies (32) using mtDNA and
microsatellites showed surprisingly shallow population structure
but probably also due to relatively recent population expansions in
response to human population expansion and environmental al-
terations. Moreover, An. gambiae and An. arabiensis were found to
across 6,000 km of Africa. This result was interpreted in terms of
mtDNA gene flow across species boundaries, but incomplete
sorting of ancestral polymorphism could not be ruled out without
evidence from additional loci.
This article used a multilocus approach to explore patterns of
DNA polymorphism and divergence at four nuclear genes within
and among five species in the An. gambiae complex. These data
present a compelling argument for introgressive hybridization
between An. gambiae and An. arabiensis. Taken together, they
reveal genomes that are mosaics with respect to gene flow, as
described previously for Drosophila pseudoobscura and its close
relatives (ref. 5 and references therein). In contrast to shared
polymorphisms and even full haplotype sharing of mtDNA as well
as autosomal sequences, X-linked sequences showed fixed differ-
ences and relatively deep divergences when compared between
these species. It has been argued that radiation of the An. gambiae
complex has been quite recent, but a notable feature of gene trees
reconstructed from mtDNA as well as nuclear genes is monophyly
of sequences from taxa other than An. gambiae and An. arabiensis.
Thus, retention of ancestral polymorphism is not generally ob-
served between species in the complex, other than between An.
gambiae and An. arabiensis. In mtDNA gene trees, as well as trees
from the autosomal genes tox, G6pd, and xdh, sequences from this
and refs. 2, 3, and 31), which was in contrast to the sister taxa
relationship between An. gambiae and An. merus specified by genes
Table 3. Hierarchical AMOVA for An. gambiae and An. arabiensis
Source of variation
white tox: (gam–ara)
Among populations†within groups
NS, not significant;*, P ? 0.05;**, P ? 0.01;***, P ? 0.001.
*Groups are defined as species (gam–ara) or as 2R arrangements (2R?b??b–2Rb?b).
†Populations are defined as species-specific samples from Kenya, Senegal, South Africa, Burkina Faso, and Mali; or as 2R arrangements
partitioned by species, as applicable.
Table 4. Fitting An. gambiae and An. arabiensis to the isolation model of speciation
Parameter estimates and their 99% confidence intervals are given. Time of and separation (T) is measured in
for PCR error without and with correction for chance sharing of polymorphisms.
www.pnas.org?cgi?doi?10.1073?pnas.1434337100Besansky et al.
inside inversion Xag. These data are consistent with a model in
which there are semipermeable barriers to gene flow between An.
gambiae and An. arabiensis, with Xag-linked sequences barred from
exchange in the face of introgression at tox, xdh, and possibly G6pd.
of Wang et al. (25) was not rejected, but the strong dependence of
the model on both recombination rates and neutral evolution may
have compromised its applicability here. These limitations point to
the need for more powerful analytical approaches for testing an
isolation model of speciation.
If An. gambiae and An. merus are indeed sister taxa, as suggested
by (i) sequences within the Xag inversion, and (ii) autosomal
sequences within the uninverted arrangement of 2La, and (iii) the
degree to which other genome regions support the conflicting
relationship of An. gambiae plus An. arabiensis and by inference,
introgressive hybridization, is striking. Consider the list of such
regions: mtDNA, chromosome 2 inversions 2Rb, 2La, and se-
quences within or tightly linked to these, genes and microsatellite
sequences on chromosome 3, and X chromosome genes and
microsatellite sequences outside of the Xag inversion (this study,
refs. 2, 3, 17, and 31, and A. Wang, C. della Torre, C. Schwager, G.
unpublished data). This situation is sobering in terms of its impli-
cations for phylogeny reconstruction. Without the relevant back-
increasingly confident in the ‘‘wrong’’ answer as more loci were
sequenced. The risk of being either misled or of coming to the
‘‘right’’ conclusion by lucky accident would surely increase by
species, it is only through a careful locus-by-locus assessment of
sequence divergence that history can be deciphered.
The apparent mixing of An. gambiae and An. arabiensis genomes
also raises the fundamentally important question of what propor-
Despite some mixing, the species have not fused, nor would any
vector biologist familiar with these species in Africa be prepared to
question or revoke their status as good species, owing to charac-
teristically different behaviors. Given their extensive sympatry in
the narrow sense, that hybrids are found so rarely (despite intense
sampling efforts) is testament to strong premating isolation barri-
ers. There are fixed ribosomal DNA and chromosomal inversion
differences on the X, and polymorphic inversions on chromosome
2 (e.g., 2Rd, 2Rd?, and 2Ra) that are exclusive to one species. Those
regions of the genome that resist introgression (34) are expected to
include the genes directly involved in speciation. The availability of
the complete An. gambiae genome sequence should help guide the
discovery these ‘‘speciation genes.’’
Whereas genes involved in reproductive isolation may be pro-
tected against gene flow, this is not necessarily true for genes that
could have an important bearing on malaria vectorial capacity.
that increase the probability of contact between humans and
vectors, such as preference for human bloodmeals, indoor resting
behavior, oviposition in sites created by human activity, and toler-
ance of aridity which would allow range expansion into formerly
inhospitable environments. With respect to the latter trait, over-
whelming evidence supports the role of inversions 2Rb and 2La in
conferring tolerance to arid climatic conditions. In West Africa,
frequencies of these inversions are strongly correlated with humid-
ity and rainfall, both seasonally and spatially. Thus, the transitions
from rainy to dry season within a locale, or from rainforest locales
in the south to Sahel in the north, are both accompanied by
As this and other studies have shown, these inversions are identical
they have most likely been captured by An. gambiae from An.
arabiensis (11). Acquisition of these inversions is thought to have
conferred upon An. gambiae the ecological and genetic flexibility
that allowed it to invade the savannas beyond its ancestral range in
the rainforest, thereby helping to establish this species as the
foremost malaria vector (11). That semipermeable barriers to gene
flow remain between An. gambiae and An. arabiensis is both an
invitation to the study of speciation, and a threat to controlling the
vectorial potential of these disease vectors.
comments on the manuscript; Thomas Fahey, Jane Carter, and Cristina
Rafferty for technical assistance; and F. Collins, L. Braack, S. Koenraadt,
and W. Takken for samples. This work was supported by National Institutes
of Health Grant R01 AI44003 (to N.J.B.) and the United Nations Devel-
opment Program?World Bank?World Health Organization Special Pro-
gram for Research and Training in Tropical Diseases.
1. Mayr, E. (1963) Animal Species and Evolution (Harvard Univ. Press, Cambridge,
2. Besansky, N. J., Powell, J. R., Caccone, A., Hamm, D. M., Scott, J. A. & Collins,
F. H. (1994) Proc. Natl. Acad. Sci. USA 91, 6885–6888.
3. Caccone, A., Min, G. S. & Powell, J. R. (1998) Genetics 150, 807–814.
4. Walton, C., Handley, J. M., Collins, F. H., Baimai, V., Harbach, R. E., Deesin,
V. & Butlin, R. K. (2001) Mol. Ecol. 10, 569–580.
5. Machado, C. A., Kliman, R. M., Markert, J. A. & Hey, J. (2002) Mol. Biol. Evol.
6. Belkin, J. N. (1962) Mosquitoes of the South Pacific (Univ. of California Press,
7. Davidson, G. (1956) Nature 178, 861–863.
8. Coluzzi, M., Sabatini, A., Petrarca, V. & Di Deco, M. A. (1979) Trans. R. Soc.
Trop. Med. Hyg. 73, 483–497.
9. Hunt, R. H., Coetzee, M. & Fettene, M. (1998) Trans. R. Soc. Trop. Med. Hyg.
10. Garcia, B. A., Caccone, A., Mathiopoulos, K. D. & Powell, J. R. (1996) Genetics 143,
11. Powell, J. R., Petrarca, V., della Torre, A., Caccone, A. & Coluzzi, M. (1999)
Parassitologia (Rome) 41, 101–113.
12. Temu, E. A., Hunt, R. H., Coetzee, M., Minjas, J. N. & Shiff, C. J. (1997) Ann.
Trop. Med. Parasitol. 91, 963–965.
13. Toure, Y. T., Petrarca, V., Traore, S. F., Coulibaly, A., Maiga, H. M., Sankare, O.,
Sow, M., DiDeco, M. A. & Coluzzi, M. (1998) Parassitologia (Rome) 40, 477–511.
14. Besansky, N. J., Lehmann, T., Fahey, G. T., Fontenille, D., Braack, L. E., Hawley,
W. A. & Collins, F. H. (1997) Genetics 147, 1817–1828.
15. Kliman, R. M., Andolfatto, P., Coyne, J. A., Depaulis, F., Kreitman, M., Berry,
A. J., McCarter, J., Wakeley, J. & Hey, J. (2000) Genetics 156, 1913–1931.
16. Donnelly, M. J., Pinto, J., Girod, R., Besansky, N. J. & Lehmann, T. (2003)
Heredity, in press.
17. Mukabayire, O., Caridi, J., Wang, X., Toure, Y. T., Coluzzi, M. & Besansky, N. J.
(2001) Insect Mol. Biol. 10, 33–46.
18. Gillies, M. T. & De Meillon, B. (1968) The Anophelinae of Africa South of the
Sahara (South African Institute for Medical Research, Johannesburg).
19. Scott, J. A., Brogdon, W. G. & Collins, F. H. (1993) Am. J. Trop. Med. Hyg. 49,
20. Ashburner, M. (1989) Drosophila: A Laboratory Manual (Cold Spring Harbor
Lab. Press, Plainview, NY).
21. Krzywinski, J., Wilkerson, R. C. & Besansky, N. J. (2001) Syst. Biol. 50, 540–556.
22. Rozas, J. & Rozas, R. (1999) Bioinformatics 15, 174–175.
23. Hudson, R. R., Kreitman, M. & Aguade, M. (1987) Genetics 116, 153–159.
24. Wakeley, J. & Hey, J. (1997) Genetics 145, 847–855.
25. Wang, R. L., Wakeley, J. & Hey, J. (1997) Genetics 147, 1091–1106.
26. Schneider, S., Kueffer, J.-M., Roessli, D. & Excoffier, L. (1997) ARLEQUIN (Genetics
and Biometry Laboratory, University of Geneva, Geneva), Version 2.0.
27. Swofford, D. L. (1999) PAUP*: Phylogeny Analysis Using Parsimony (*and Other Methods),
Version 4.0b2 (Sinauer, Sunderland, MA).
28. Tajima, F. (1989) Genetics 123, 585–595.
29. Lehmann, T., Hawley, W. A., Kamau, L., Fontenille, D., Simard, F. & Collins,
F. H. (1996) Heredity 77, 192–200.
A. A. & Collins, F. H. (1999) Insect Mol. Biol. 8, 287–297.
31. Caccone, A., Garcia, B. A. & Powell, J. R. (1996) Insect Mol. Biol. 5, 51–59.
32. Donnelly, M. J., Licht, M. C. & Lehmann, T. (2001) Mol. Biol. Evol. 18,
33. Hey, J. & Wakeley, J. (1997) Genetics 145, 833–846.
34. della Torre, A., Merzagora, L., Powell, J. R. & Coluzzi, M. (1997) Genetics 146,
35. Clark, A. G. (1997) Proc. Natl. Acad. Sci. USA 94, 7730–7734.
Besansky et al.
September 16, 2003 ?
vol. 100 ?
no. 19 ?