Modelling contemporary evolution in stickleback.
Quantifying population structure on short timescales
JOOST A. M. RAEYMAEKERS,*1 LUC LENS,‡ FREDERIK VAN DEN BROECK,* STEFAN VAN
DONGEN§ and FILIP A. M. VOLCKAERT*
*Laboratory of Biodiversity and Evolutionary Genomics, University of Leuven, Ch. Deberiotstraat, 32, B-3000 Leuven, Belgium,
‡Terrestrial Ecology Unit, Ghent University, K.L. Ledeganckstraat 35, B-9000 Gent, Belgium, §Evolutionary Ecology Group,
University of Antwerp, Antwerp, Belgium
Abstract
Quantifying the contribution of the various processes that influence population genetic
structure is important, but difficult. One of the reasons is that no single measure
appropriately quantifies all aspects of genetic structure. An increasing number of studies
are analysing population structure using the statistic D, which measures genetic
differentiation, next to GST, which quantifies the standardized variance in allele
frequencies among populations. Few studies have evaluated which statistic is most
appropriate in particular situations. In this study, we evaluated which index is more
suitable in quantifying postglacial divergence between three-spined stickleback (Gast-
erosteus aculeatus) populations from Western Europe. Population structure on this short
timescale (10 000 generations) is probably shaped by colonization history, followed by
migration and drift. Using microsatellite markers and anticipating that D and GST might
have different capacities to reveal these processes, we evaluated population structure at
two levels: (i) between lowland and upland populations, aiming to infer historical
processes; and (ii) among upland populations, aiming to quantify contemporary
processes. In the first case, only D revealed clear clusters of populations, putatively
indicative of population ancestry. In the second case, only GST was indicative of the
balance between migration and drift. Simulations of colonization and subsequent
divergence in a hierarchical stepping stone model confirmed this discrepancy, which
becomes particularly strong for markers with moderate to high mutation rates. We
conclude that on short timescales, and across strong clines in population size and
connectivity, D is useful to infer colonization history, whereas GST is sensitive to more
recent demographic events.
Keywords: genetic divergence, isolation by distance, landscape genetics, microsatellites, parallel
evolution, population genetics
Received 16 June 2011; revision received 22 March 2012; accepted 12 April 2012
Introduction
Understanding processes that generate and maintain
biodiversity requires insight into the evolutionary history
of populations. Assessing the contribution of the vari-
ous processes shaping population structure is a central
goal in this work. Population structure is the outcome
of neutral diversification driven by gene flow and
drift, and adaptive diversification driven by selection
(Holsinger & Weir 2009; Hurst 2009). Quantifying the
contribution of neutral processes to population structure
is particularly important. First, it allows inferring ances-
tral relationships and population connectivity. Second,
it represents a baseline to assess the contribution of nat-
ural selection to phenotypic divergence (Leinonen et al.
2008; Edelaar & Bjorklund 2011), as well as identifying
signatures of selection at the genomic level (Storz 2005).
A number of complications arise when quantifying
the contribution of neutral processes to population
Correspondence: Joost Raeymaekers, Fax: +32 16 32 45 75;
E-mail: joost.raeymaekers@bio.kuleuven.be
1Present address: Zoological Institute, University of Basel,
Vesalgasse 1, CH-4051 Basel, Switzerland
© 2012 Blackwell Publishing Ltd
Molecular Ecology (2012) 21, 3458–3473; doi: 10.1111/j.1365-294X.2012.05628.x
structure. In particular, no single measure appropriately
quantifies all neutral aspects of population structure.
Two metrics for this purpose are the subject of ongoing
debate: GST (Nei 1987), which quantifies the standardized
variance in allele frequencies among populations,
and its recently proposed alternative D (Jost 2008),
which measures genetic differentiation. GST and its
equivalent FST are widely accepted and have important
applications as measures of population structure (Holsinger
& Weir 2009). However, D was introduced
because GST does not provide a straightforward assessment
of how different populations are (Jost 2008). For
instance, even when two populations are completely
differentiated, GST can approach zero, especially when
using highly polymorphic markers (Jost 2008; Gerlach
et al. 2010). This particular feature of GST hampers the
interpretation of population genetic data. D provides a
good alternative because it increases monotonically
with increasing allelic differentiation (Jost 2009), and for
this reason, empirical studies now apply D instead of,
or next to, GST. Unfortunately, D is not a perfect mea-
sure of population structure either. Ryman & Leimar
(2009) argued that D cannot be interpreted exclusively
in terms of effective population size and gene flow and
therefore is not useful when the focus is on demo-
graphic processes. Whitlock (2011) showed that in most
cases GST is the preferable summary of population
structure, as D is locus specific and heavily dependent
on mutation. Most recently, however, Leng & Zhang
(2011) argued that under nonequilibrium conditions, no
general advice can be given with regard to the appro-
priateness of D and GST.
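To make the contrast between the two indices concrete, the following minimal Python sketch computes both for a single locus from population allele frequencies. It assumes the standard definitions GST = (HT − HS)/HT (Nei 1987) and D = [(HT − HS)/(1 − HS)] × [n/(n − 1)] (Jost 2008), with equal population weights and without the sample-size corrections that estimation software applies. The function name gst_and_d is illustrative only and is not taken from any package used in this study.

```python
def gst_and_d(freqs):
    """Return (H_S, H_T, G_ST, D) for one locus.

    freqs: one allele-frequency list per population; every list covers the
    same alleles in the same order and sums to 1.
    Assumption: equal population weights, no sample-size correction.
    """
    n = len(freqs)
    # H_S: mean expected heterozygosity within populations.
    h_s = sum(1.0 - sum(p * p for p in pop) for pop in freqs) / n
    # H_T: expected heterozygosity of the pooled (mean) allele frequencies.
    n_alleles = len(freqs[0])
    mean_p = [sum(pop[a] for pop in freqs) / n for a in range(n_alleles)]
    h_t = 1.0 - sum(p * p for p in mean_p)
    g_st = (h_t - h_s) / h_t
    d = ((h_t - h_s) / (1.0 - h_s)) * (n / (n - 1))
    return h_s, h_t, g_st, d


# Two populations without shared alleles, each with ten equally frequent alleles.
pop_a = [0.1] * 10 + [0.0] * 10
pop_b = [0.0] * 10 + [0.1] * 10
h_s, h_t, g_st, d = gst_and_d([pop_a, pop_b])
```

For this pair of completely differentiated but highly polymorphic populations, the sketch gives GST of about 0.05 whereas D equals 1, which is the behaviour described above.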
The above considerations are based on theoretical
and simulation-based studies. One solution to over-
come this open debate is to search for the measure
which best answers the biological questions that motivate
the study (Whitlock 2011). Indeed, a combination
of historical and contemporary processes, including
past colonization, migration, drift, mutation and selection,
drives population structure, and the choice for D
or GST depends on how both indices respond to these
processes. Postglacial populations of fishes such as the
three-spined stickleback (Gasterosteus aculeatus) from
Western Europe represent ideal systems to test this.
These fishes have colonized different freshwater habi-
tats upon glacial retreat, after which they diversified
and adapted to local environments (Bell & Andrews
1997). Past colonization, migration, drift and selection
typically drive the diversification of these fishes. Infer-
ence on the importance and temporal scale of these
various processes is facilitated by the knowledge of
the geological history of the populations involved. For
instance, time since divergence among populations can
be inferred from the geological timing of post-glacial
retreat (Bell & Andrews 1997). Furthermore, popula-
tions from different watersheds often represent inde-
pendent replicates for the study of natural selection
(Schluter 1996; Taylor 1999).
In this study, we evaluated whether either D or GST
is more suitable for population genetic inference at two
levels: between lowland and upland stickleback popula-
tions, and among upland stickleback populations. Low-
land populations are anadromous or landlocked and
live in estuaries or polder creeks. The latter represent
dyked semi-natural brackish and freshwater bodies of
Holocene origin (i.e. <10 000 years) with varying con-
nectivity to adjacent estuaries or the open sea. Upland
populations live permanently in natural freshwater riv-
ers and streams that have not experienced coastal influ-
ences since the last Ice Age. Geographically, the
lowland–upland system extends from the southern
North Sea to the western Baltic (Wootton 1976). Phylogeographically,
the system clusters with other European
three-spined stickleback populations, forming a
pan-European clade of late Pleistocene origin (Mäkinen
et al. 2006; Mäkinen & Merilä 2008).
In the case of the diversification between lowland
and upland populations, studies based on allozyme
and microsatellite markers suggested that population
structure is shaped by multiple colonization events
(Raeymaekers et al. 2005, 2007). However, contempo-
rary gene flow could not be ruled out as an alternative
or an additional explanation for the observed popula-
tion structure. In the case of the diversification among
upland populations, studies based on microsatellite
markers attributed population structure to isolation by
human-made barriers, rather than to isolation by dis-
tance (Raeymaekers et al. 2008, 2009). In each of these
studies, FST (which estimates the same quantity as
GST) was used to quantify population structure. In this
study, we test whether these conclusions depend on
the choice between D and GST. Reconsidering the
genetic structure of lowland and upland populations,
we expect that historical colonization would leave a
genetic signature with moderate to strong clusters of
populations according to geography or hydrology. We
anticipate that such population structure would be best
revealed using D, as GST (and FST) might be too sensitive
to more recent demographic processes leading to
allelic fixation—masking historical contingency. Recon-
sidering the genetic structure of upland populations,
we expect genetic differences to become stronger with
barrier-induced geographical isolation, but also more
unpredictable because of the stochastic effect of genetic
drift. We anticipate that such population structure
would be best revealed using GST (or FST) rather than
D, as in this case its higher sensitivity to recent demo-
graphic processes might enhance its ability to detect
genetic differences caused by barrier-induced demo-
graphic shifts.
We test our predictions in an extended set of lowland
and upland three-spined stickleback populations (49
populations; 2450 individuals), including samples from
the previous studies (Raeymaekers et al. 2008; Van
Dongen et al. 2009) as well as a new collection of sam-
ples. We then use a simulation approach to compare
population structure based on D and GST after parallel
postglacial colonization of two upland basins by a com-
mon lowland source population. Specifically, we test
which metric best reflects historical contingency on the
one hand and contemporary processes on the other
hand. While our empirical comparison of D-based and
GST-based population structure only considers microsatellite
markers, the simulations also include markers
with lower mutation rates and levels of polymorphism,
such as allozymes.
Materials and methods
Data collection
Three-spined sticklebacks (Gasterosteus aculeatus) were
sampled at 49 lowland and upland sites in Belgium, the
Netherlands and France in the spring of 2002 and in the
spring and autumn of 2004 (Fig. 1). The populations
from 2004 (n = 28) comprised 18 populations obtained from a
study by Van Dongen et al. (2009), complemented with ten
new populations (Table 1). The populations from 2002
(n = 21) were analysed previously by Raeymaekers
et al. (2008) (Table 2). All populations originated from
connected systems such as ditches, rivers and streams.
The data set from 2004 was used to re-evaluate the
diversification between lowland and upland populations,
as assessed previously in Raeymaekers et al. (2005). Six
lowland populations (coded L) were caught in the pol-
ders bordering the Scheldt estuary (L1b, L2b, L3b and
L6) and the North Sea (L4 and L5). Eight populations
originated from the Meuse basin (coded M; M1, M2a,
M2b, M3-M7), ten populations originated from the cen-
tral and eastern section of the Scheldt basin (coded S; S4a,
S4b, S5a, S5b, S6a, S9a, S11a, S12a, S13a, S14), and four
populations were collected in the western section of the
Scheldt basin (S3b, Z2b, Z5b, Z7b). Later on, we refer to
this data set as the ‘lowland–upland’ data set.
The data set from 2002 was collected at 21 upland
sites across the central and eastern section of the
Scheldt River in Belgium and was used to re-evaluate
the diversification between upland populations, as
assessed previously in Raeymaekers et al. (2008)
(Fig. 1). The populations included eight sites (coded
S4a–S13a, S14) that were chosen at regular distances in,
or as close as possible to, the main channel (Nete, Dijle
and Demer). The remaining 13 sites (S5b, S9b–S9l, S13b)
were chosen to be downstream and upstream on principal
tributaries. Later on, we refer to this data set as the
‘upland’ data set.
Fifty adult individuals per site were caught with a
dip net or by electrofishing and flash frozen on dry ice.
Fin clips were taken and stored in 100% ethanol for
DNA analysis.
DNA extraction and genotyping
Individuals from all 49 populations were genotyped at
six microsatellite loci (Gac1097, Gac1125, Gac2111,
Gac4170, Gac5196, Gac7033), developed by Largiadèr
et al. (1999). Individuals from the 18 populations
obtained from the study by Van Dongen et al. (2009)
were genotyped at eight additional microsatellite loci
(Stn9, Stn23, Stn37, Stn96, Stn84, Stn130, Stn131, Stn174),
developed by Peichel et al. (2001). Genomic DNA was
extracted from fin clips using a silica-based purification
method (Elphinstone et al. 2003). The amplification of
loci was organized in a multiplex reaction using the Qia-
gen Multiplex PCR Kit (Qiagen, Venlo, The Netherlands).
PCR products were visualized on an ABI3130 Avant
Genetic analyzer (Applied Biosystems, Foster City, CA,
USA). Allele sizes were determined by means of an inter-
nal GeneScan 500-LIZ size standard, and genotypes were
scored using Genemapper 3.7 (Applied Biosystems).
ALLELOGRAM 2.2 was used for binning of allele lengths (Mo-
rin et al. 2009). We checked genotypes for scoring errors
that might be attributable to stutter products, large allele
dropout or to the presence of null-alleles, using the soft-
ware MICRO-CHECKER V. 2.2.3 (van Oosterhout et al. 2004).
Data analysis
For the lowland–upland data set, our analyses aimed at
identifying the signature of historical colonization. For
the upland data set, the aim was to identify geographi-
cal determinants of population connectivity.
For both data sets, population structure was analysed
in four ways. First, genetic diversity was calculated as
the observed heterozygosity (HO), using GENETIX 4.04
(Belkhir et al. 2002), and as the allelic richness (AR; that
is, the number of alleles standardized for sample size
and averaged over loci) as implemented in Fstat 2.9.3.2
(Goudet 1995). Second, the global and pairwise population
differentiation (D) and the global and pairwise standardized
variance in allele frequencies (GST) were
quantified using the R package DEMEtics (Gerlach et al.
2010). Confidence intervals for global values were
obtained by bootstrapping over loci. Pairwise D and GST
values were used to visualize population structure with
a two-dimensional classical multidimensional scaling
[Fig. 1 map not reproduced. Map labels: North Sea, Scheldt basin, Meuse basin. Legend: Lowland, Upland, flow direction, mills (N = 46), weirs (N = 57), tunnels (N = 14), sluices (N = 4). Scale bars: 50 km.]
Fig. 1 (a) Locations of 28 populations of the three-spined stickleback from Belgium, the Netherlands and France, included in the
‘lowland–upland’ data set. These populations were collected in 2004. The study area includes the North Sea polders, the Scheldt river
basin and the Meuse river basin. The shaded area refers to the distribution zone of lowland estuarine and polder populations. The
inset refers to panel b. (b) Locations of 20 freshwater populations from the eastern sub-basins (Dijle and Demer) of the Scheldt River
in Belgium, included in the ‘upland’ data set. These populations were collected in 2002. The most downstream population (S4a; not
shown) and one more barrier are located 37 km west of population S5a. Population codes as in Tables 1 and 2.
(CMDS) plot with the function cmdscale in R and with a
neighbour-joining tree using the PHYLIP 3.5 (Felsenstein
1995) subprogram neighbour. The resulting trees were
visualized with MEGA 4.0 (Tamura et al. 2007). Third, and
for comparative purposes, we assessed population struc-
ture with a Bayesian Markov chain Monte Carlo
(MCMC) assignment method based on multilocus geno-
types, implemented in STRUCTURE 2.3.2. (Pritchard et al.
2000). The most likely structure was calculated assuming
admixture and correlated allele frequencies. Each run,
considering population structure according to a specific
number of groups (1 < K < 28 for the lowland–upland
data set, and 1 < K < 21 for the upland data set), consisted
of three chains of 10^5 MCMC replicates, initiated
by 10^4 burn-in steps. The most likely result was compared
with population structure based on D and GST.
Fourth, we also performed a comparative analysis of
population structure with the discriminant analysis of
principal components (DAPC) ordination method for
allele frequencies (Jombart et al. 2010).
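The CMDS ordination used for both indices can be illustrated outside R. The sketch below is a plain Python stand-in for the kind of ordination produced by cmdscale, not the code used in this study: it double-centres the squared pairwise distances and extracts the leading axes by power iteration with deflation, which assumes that the leading eigenvalues are positive and well separated. The function name classical_mds and its parameters are hypothetical.

```python
import math
import random


def classical_mds(dist, k=2, iters=500, seed=0):
    """Classical (Torgerson) scaling of a symmetric distance matrix.

    Returns one coordinate list of length k per object.
    Assumption: the k leading eigenvalues are positive and distinct.
    """
    n = len(dist)
    sq = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row = [sum(r) / n for r in sq]
    total = sum(row) / n
    # Double centring: B = -1/2 * J * D^2 * J.
    b = [[-0.5 * (sq[i][j] - row[i] - row[j] + total) for j in range(n)]
         for i in range(n)]
    rng = random.Random(seed)
    coords = [[0.0] * k for _ in range(n)]
    for axis in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue with its sign.
        bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = sum(v[i] * bv[i] for i in range(n))
        scale = math.sqrt(max(lam, 0.0))
        for i in range(n):
            coords[i][axis] = v[i] * scale
        # Deflation removes the axis just found before the next iteration.
        for i in range(n):
            for j in range(n):
                b[i][j] -= lam * v[i] * v[j]
    return coords
```

Applied to a matrix of pairwise D or GST values, the two returned columns correspond to Dimension 1 and Dimension 2 of a CMDS plot, up to reflection of the axes.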
Because of a recent postglacial history, the contribu-
tion of mutation to population structure seems negligi-
ble in three-spined stickleback from Western Europe
(Mäkinen et al. 2006; Mäkinen & Merilä 2008). Mutation
is therefore not expected to strongly influence popula-
tion structure. Nevertheless, for the lowland–upland
data set, a permutation test was performed to assess the
influence of stepwise-like mutation vs. drift and migra-
tion using the software SPAGEDI (Hardy & Vekemans
2002). Allele size at each locus was randomly permuted
among allelic states (2000 permutations) to simulate a
distribution of RST values (pRST) and 95% confidence
intervals (CIs) under the null hypothesis that differ-
ences in allele sizes do not contribute to population
structure (Hardy et al. 2003). As microsatellite markers
might as well follow a nonstepwise-like mutation
model, D and GST were also calculated after discarding
upland alleles that were not observed in the lowlands.
Here, potential mutation bias is reduced by only considering
the upland alleles that probably belong to the
ancestral standing variation, hence excluding those
that might have arisen through de novo mutation.
Table 1 Characteristics of 28 lowland and upland populations of three-spined stickleback sampled in 2004 in three Western
European countries
Population                          Code   Basin (country)         N   HE    HO    AR
St. Jan-in-Eremo - Boerenkreek (L)  L1b*   Scheldt estuary (B)     50  0.86  0.86  12.26
Borssele (L)                        L2b*   Scheldt estuary (NL)    50  0.89  0.89  15.39
Westkerke (L)                       L4*    North Sea (B)           50  0.87  0.85  14.23
De Moeren (L)                       L5*    North Sea (B)           50  0.89  0.87  16.10
Yerseke (L)                         L6*    Scheldt estuary (NL)    50  0.88  0.85  12.89
Weert (L)                           L3b*   Scheldt (B)             50  0.84  0.83  13.05
Lede (U)                            S3b    Scheldt (B)             50  0.80  0.72  11.25
Zwalm (U)                           Z2b*   Scheldt (Zwalm) (B)     50  0.79  0.78   9.26
Zwalm (U)                           Z5b*   Scheldt (Zwalm) (B)     50  0.81  0.81  10.03
Zwalm (U)                           Z7b*   Scheldt (Zwalm) (B)     50  0.76  0.69   8.59
Mechelen (U)                        S4a    Scheldt (Zennegat) (B)  50  0.69  0.68   9.44
Dessel (U)                          S4b    Scheldt (Nete) (B)      50  0.76  0.74   9.98
Werchter (U)                        S5a*   Scheldt (Demer) (B)     50  0.76  0.73  10.44
Vaalbeek (U)                        S5b    Scheldt (Dijle) (B)     50  0.66  0.63   8.15
Aarschot (U)                        S6a    Scheldt (Demer) (B)     50  0.76  0.73   7.81
Zelem (U)                           S9a*   Scheldt (Demer) (B)     50  0.79  0.74  11.33
Kermt (U)                           S11a   Scheldt (Demer) (B)     50  0.75  0.74   9.00
Diepenbeek (U)                      S12a   Scheldt (Demer) (B)     50  0.81  0.82  10.42
Bilzen (U)                          S13a*  Scheldt (Demer) (B)     50  0.81  0.80   9.85
Alt-Hoeselt (U)                     S14*   Scheldt (Demer) (B)     50  0.74  0.70   7.85
Oss (U-hybrid)                      M1*    Meuse (NL)              50  0.85  0.82  11.69
Venlo (U)                           M2*    Meuse (NL)              50  0.85  0.82  11.87
Peer (U)                            M2b    Meuse (B)               50  0.73  0.69   6.87
Tongeren (U)                        M3*    Meuse (B)               50  0.79  0.77   9.91
Méhaigne (U)                        M4*    Meuse (B)               50  0.80  0.78  10.39
Annevoie (U)                        M5*    Meuse (B)               50  0.79  0.77   9.40
Issancourt (U)                      M6     Meuse (F)               50  0.66  0.66   6.01
Balan (U)                           M7     Meuse (F)               50  0.72  0.73   5.95
N, sample size; HE, expected (unbiased) heterozygosity; HO, observed heterozygosity; AR, allelic richness standardized for 38
individuals; (L), lowland population; (U), upland population; (B), Belgium; (NL), the Netherlands; (F), France.
*Samples obtained from Van Dongen et al. (2009).
For the upland data set, population structure was also
investigated with a landscape genetic analysis based on
the AICc criterion (the smaller the better; models with
ΔAICc < 2 are equivalent) to identify which combination
of landscape features (pairwise distance or pairwise
number of barriers along waterways) best explained D
and GST (Raeymaekers et al. 2008). In addition, correlations
between pairwise D and GST and landscape features
were calculated and tested with simple and
partial Mantel tests in the R package Vegan (Oksanen
et al. 2007). Furthermore, under migration-drift equilibrium,
genetic differences between populations are
expected to become more unpredictable with increasing
isolation because of drift (Hutchison & Templeton
1999). To test this prediction, we calculated the absolute
residuals of the isolation-by-distance and isolation-by-
barrier plots and tested their correlation with distance
or barriers.
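A simple Mantel test of the kind referred to above can be sketched as follows. This is an illustrative Python permutation test on two symmetric matrices (for example pairwise GST and pairwise waterway distance), not the vegan implementation; the partial Mantel test and the AICc-based model selection are not covered. The function names, the default number of permutations and the one-sided P value are assumptions made for the example.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel(a, b, permutations=999, seed=1):
    """Simple Mantel test between two symmetric n x n matrices.

    Returns (r, p): the matrix correlation and a one-sided permutation
    P value for the hypothesis that the correlation is positive.
    """
    n = len(a)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    x = [a[i][j] for i, j in pairs]
    r_obs = pearson(x, [b[i][j] for i, j in pairs])
    rng = random.Random(seed)
    order = list(range(n))
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        # Permute rows and columns of b jointly to keep the matrix structure.
        rng.shuffle(order)
        y = [b[order[i]][order[j]] for i, j in pairs]
        if pearson(x, y) >= r_obs:
            hits += 1
    return r_obs, hits / (permutations + 1)
```

Rows and columns are permuted together because pairwise values that share a population are not independent, which is the reason a Mantel test is used instead of an ordinary correlation test.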
Simulations
We used the software EASYPOP (Balloux 2001) to simulate
an episode of parallel postglacial colonization of two
upland basins by a common lowland source population.
Assuming a generation time of 1 year, 10^4 postglacial
generations were simulated using a hierarchical stepping
stone migration model containing the ‘Meuse’
basin with populations m1 to m8, and the ‘Scheldt’
basin with populations s1 to s8. In each basin, population
size declined from 800 individuals in the most
‘downstream’ populations (m1 and s1) to 100 individuals
in the most ‘upstream’ populations (m8 and s8), in
steps of 100. A downstream ‘lowland’ connection
between the basins was established between m1 and s1.
The migration rate between (mb) and within (mw)
basins was set to 0 and 0.01, respectively. Genotypes at
ten freely recombining microsatellite loci were generated
assuming an infinite allele model with 25 possible
allelic states, maximal genetic variability at the onset of
the simulation and a mutation rate (μ) of 10^-4.
After 10^4 generations, the genotypes of 800 individuals
(50 per population) were used to calculate overall
and pairwise D and GST. Colonization history is
expected to generate larger values of D and GST among
pairs of populations between basins than among pairs
of populations within basins. This expectation is
reflected by a positive correlation (further referred to as
R1) between pairwise D or GST and a dummy variable
with values −1 for all within-basin and values 1 for all
between-basin population pairs. In contrast, migration
is expected to generate smaller values of D and GST
between large and well-connected downstream populations
than between small and isolated upstream populations.
This expectation is reflected by a negative
correlation (further referred to as R2) between the average
size of each population pair and pairwise D or GST.
In addition, drift is expected to generate more unpredictable
values of D and GST between small and isolated
upstream populations than between large and
well-connected downstream populations. This expectation
is reflected by a negative correlation (further
referred to as R3) between the average size of each population
pair and the absolute residuals of the regression
of pairwise D or GST on average size. On the basis of
R1, R2 and R3, we tested which metric is the better
match for each of these expectations. If D is to be more
strongly linked to colonization than GST, and GST more
strongly to recent migration and drift than D, R1 should
be more positive for D than for GST, whereas R2 and R3
should be more negative for GST than for D. A Wilcoxon
matched pairs test was used to compare R1, R2 and R3
values for D with values for GST over 20 replicates of
the simulation.
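The three summary correlations can be written out explicitly. The Python sketch below is an assumed reading of the definitions of R1, R2 and R3 given above, applied to one matrix of pairwise D or GST values from one replicate; it is not the code used for the simulations. The function and argument names are hypothetical, and the Wilcoxon comparison across replicates is left out.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def r1_r2_r3(stat, basin, size):
    """Return (R1, R2, R3) for one pairwise matrix of D or GST.

    stat:  symmetric matrix of pairwise values
    basin: basin label of each population (e.g. 'm' or 's')
    size:  population size of each population
    """
    n = len(basin)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    y = [stat[i][j] for i, j in pairs]
    # Dummy variable: -1 within basins, +1 between basins.
    dummy = [1.0 if basin[i] != basin[j] else -1.0 for i, j in pairs]
    avg = [(size[i] + size[j]) / 2.0 for i, j in pairs]
    r1 = pearson(y, dummy)
    r2 = pearson(avg, y)
    # Least-squares regression of the pairwise value on average size.
    mx, my = sum(avg) / len(avg), sum(y) / len(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(avg, y))
             / sum((a - mx) ** 2 for a in avg))
    resid = [abs(b - (my + slope * (a - mx))) for a, b in zip(avg, y)]
    r3 = pearson(avg, resid)
    return r1, r2, r3
```

Under the expectations stated above, R1 should be positive and R2 and R3 negative; the comparison of interest is which index shows the stronger value of each.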
Table 2 Characteristics of 21 upland populations of three-
spined stickleback sampled in 2002 in the Nete-Dijle-Demer
basin (Belgium)
Population           Code  Basin  N   HE    HO    AR
Mechelen             S4a   Nete   50  0.83  0.83  10.34
Werchter             S5a   Dijle  50  0.78  0.77   9.46
Vaalbeek             S5b   Dijle  50  0.65  0.63   6.22
Aarschot             S6a   Demer  50  0.79  0.76   8.20
Zelem                S9a   Demer  50  0.73  0.72   7.66
Boutersem            S9b   Demer  50  0.66  0.68   5.78
Zoutleeuw            S9c   Demer  50  0.78  0.75   8.89
Hoegaarden           S9d   Demer  50  0.79  0.77   9.47
Landen               S9e   Demer  50  0.74  0.76   6.50
Gingelom             S9f   Demer  50  0.69  0.74   5.35
Stevoort             S9g   Demer  50  0.78  0.80   7.02
Mechelen-Bovelingen  S9h   Demer  50  0.55  0.56   3.67
Borgloon             S9i   Demer  50  0.44  0.43   3.46
Kortenaken           S9j   Demer  50  0.75  0.74   7.74
St-Truiden           S9k   Demer  50  0.80  0.81   7.32
Wellen               S9l   Demer  50  0.77  0.78   7.28
Kermt                S11a  Demer  50  0.73  0.72   7.56
Diepenbeek           S12a  Demer  50  0.79  0.73   7.59
Bilzen               S13a  Demer  50  0.81  0.83   7.73
Zutendaal            S13b  Demer  50  0.64  0.66   4.37
Alt-Hoeselt          S14   Demer  50  0.79  0.71   7.39
N, sample size; HE, expected (unbiased) heterozygosity; HO,
observed heterozygosity; AR, allelic richness standardized for
18 individuals. All samples have been analysed previously by
Raeymaekers et al. (2008).
The discrepancy between pairwise D and pairwise
GST was assessed in the following way. For each replicate,
we first applied a CMDS (as above) to represent
the simulated pairwise D-based or GST-based population
structure in a multivariate Euclidian space. We then
used a Procrustes analysis (PA) (Gower 1975; Larmuseau
et al. 2009) implemented in the vegan library in R
to calculate the best match between the resulting D-based
and GST-based population configuration. To find
this best match, PA minimizes the sum of the
squared differences between both configurations (SSP).
Values of SSP were averaged over all 20 replicates of
the simulation as an overall measure of the discrepancy
between pairwise D and GST. For comparison, we also
calculated the ‘Procrustes correlation’ as
RP = (1 − SSP)^(1/2), as well as the usual Mantel correlation
RM, both reflecting the quality of the match
between pairwise D and GST.
All of the above tests were also performed based on
additional simulations testing a wider range of parameter
values. This extension was performed to consider
parameter values that are realistic for microsatellite
markers (μ: 10^-3 to 10^-4; 8–30 possible allelic states) as
well as markers with a lower polymorphism and mutation
rate, such as allozymes (μ: 10^-6; 8 possible allelic
states). We also evaluated the influence of the migration
rate between the basins (0 vs. 0.005) on the outcome of
the simulations. This was carried out to increase the
dependence among the two colonization events, which
might occur when gene flow between basins is possible
via the lowlands, or when subsequent sets of founder
populations within the same basin start to exchange
genes. In total, we performed 10 different simulations,
eight for microsatellite markers and two for allozyme
markers, as specified in Table 4.
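The Procrustes comparison of the D-based and GST-based configurations described above can be sketched for the two-dimensional case. The Python code below is an illustration under stated assumptions, not the vegan routine: both configurations are centred and scaled to unit sum of squares (a symmetric fit), rotation and reflection are both allowed, and the sum of the singular values of the 2 x 2 cross-product matrix is obtained from the identity (s1 + s2)^2 = ||M||^2 + 2|det M|. The function name procrustes_2d is hypothetical.

```python
import math


def procrustes_2d(x, y):
    """Symmetric Procrustes fit of two 2-D configurations.

    x, y: lists of (x, y) coordinates for the same populations, same order.
    Returns (ss, r): residual sum of squares after optimal rotation or
    reflection and scaling, and the Procrustes correlation sqrt(1 - ss).
    """
    def standardise(pts):
        n = len(pts)
        cx = sum(p[0] for p in pts) / n
        cy = sum(p[1] for p in pts) / n
        centred = [(p[0] - cx, p[1] - cy) for p in pts]
        norm = math.sqrt(sum(a * a + b * b for a, b in centred))
        return [(a / norm, b / norm) for a, b in centred]

    xs, ys = standardise(x), standardise(y)
    # Cross-product matrix M = X^T Y (2 x 2).
    m00 = sum(a[0] * b[0] for a, b in zip(xs, ys))
    m01 = sum(a[0] * b[1] for a, b in zip(xs, ys))
    m10 = sum(a[1] * b[0] for a, b in zip(xs, ys))
    m11 = sum(a[1] * b[1] for a, b in zip(xs, ys))
    frob = m00 ** 2 + m01 ** 2 + m10 ** 2 + m11 ** 2
    det = m00 * m11 - m01 * m10
    # Sum of the two singular values of M.
    trace_sigma = math.sqrt(frob + 2.0 * abs(det))
    ss = max(0.0, 1.0 - trace_sigma ** 2)
    return ss, math.sqrt(1.0 - ss)
```

A value of ss near 0 (correlation near 1) indicates that the two configurations differ only by rotation, reflection, translation and scale, whereas larger values indicate a discrepancy between the D-based and GST-based population structure.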
Results
Genetic structure of lowland and upland populations
The observed heterozygosity was higher in lowland
populations than in upland populations (lowland: 0.83–
0.89; upland: 0.63–0.82; Table 1). A similar pattern was
observed for AR (lowland: 12.3–16.1; upland: 5.9–11.9;
Table 1). Among upland populations, AR reached very
low levels in the southern part of the Meuse basin
(M2b, M6, M7) and at upstream positions of the Scheldt
basin (Z7b, S5b, S6a, S14).
D revealed moderately strong population structure,
while GST was moderately low (D = 0.41, 95% CI:
0.403–0.425, P = 0.001; GST = 0.090, 95% CI: 0.086–0.093,
P = 0.001). However, both metrics were lower among
lowland populations than among upland populations
(lowland: D = 0.16, 95% CI: 0.131–0.183, P = 0.001;
GST = 0.017, 95% CI: 0.013–0.021, P = 0.001; upland:
D = 0.42, 95% CI: 0.412–0.436, P = 0.001; GST = 0.103,
95% CI: 0.098–0.108, P = 0.001). The CMDS plot
based on D revealed three clusters of populations
(Fig. 2; Fig. S1, Supporting information): (i) the lowland
populations (coded L) together with the western section
of the Scheldt basin (S3b and populations coded Z) and
the downstream (northern) section of the Meuse basin
(M1 and M2); (ii) the central section of the Scheldt basin
(S4a to S9a) together with the upstream (southern) section
of the Meuse basin (M4 to M7); and (iii) the eastern
section of the Scheldt basin (S12a to S14) together with
the central section of the Meuse basin (M2b and M3).
The GST-based CMDS plot showed a much more diffuse
population structure, with strong aggregation of
lowland populations in the centre and isolated upland
populations (M2b, M6, M7, S5b, S14) at the periphery
(Fig. 2; Fig. S1, Supporting information). The neigh-
bour-joining tree based on D confirmed the presence of
the above clusters (Fig. 2). The GST-based neighbour-
joining tree approximately showed the same structure,
but populations with low genetic variability (M2b, M6,
M7, S5b) were rearranged (Fig. 2). This tree also
revealed stronger aggregation of lowland populations
than the D-based tree.
Bayesian analysis revealed an optimal structure with
eleven clusters, most of which were nested within the
three main clusters suggested by D (Fig. S2, Supporting
information). The DAPC ordination also revealed three
main clusters that were highly congruent with those
revealed by D (Fig. S3, Supporting information).
Randomly permuting allele sizes at each locus among
allelic states in SPAGeDI suggested that there was no
significant influence of stepwise-like mutation on population
differentiation (all samples: RST = 0.102; pRST =
0.090; P = 0.24; lowland: RST = 0.0226; pRST = 0.0202;
P = 0.35; upland: RST = 0.106; pRST = 0.103; P = 0.42).
Furthermore, when potential mutation bias was avoided
by discarding upland alleles that might have arisen
through de novo mutation, the magnitude of D and GST
did not change (D = 0.41; GST = 0.092). Moreover, CMDS
based on pairwise D still revealed the same clusters,
whereas CMDS based on pairwise GST still revealed a
central core of lowland populations along with more
peripheral upland populations
(results not shown).
Focusing the analyses on the 18 populations from
Van Dongen et al. (2009), which were genotyped at
eight additional loci, the overall value of D and GST
was similar to the total data set (D = 0.36; GST= 0.087).
CMDS plots revealed that the discrepancy between
population structure based on pairwise D and GST was
smaller than for the total data set (Fig. S4, Supporting
information), probably because the most peripheral
upstream populations with low genetic variability were
not included. However, pairwise D again revealed
weaker aggregation of lowland populations than pairwise
GST (Fig. S4, Supporting information).
Genetic structure of upland populations
The observed heterozygosity was higher in the down-
stream section of the basin than in the upstream section
(range: 0.43–0.83; Table 2). A similar pattern was
observed for AR, with a value of 10.34 for the most
downstream populations (S4a), and values 4.37 (S13b),
3.67 (S9h) and 3.46 (S9i) for populations located far
upstream (Table 2).
D and GST revealed strong and moderate population
structure, respectively (D = 0.498, 95% CI: 0.487–0.509,
P = 0.001; GST= 0.147, 95% CI: 0.142–0.152, P = 0.001).
The D-based CMDS plot revealed a separation between
the eastern and the western section of the basin, and
[Fig. 2 panels not reproduced. CMDS panels: axes Dimension 1 and Dimension 2. Neighbour-joining trees: scale bars 0.05 and 0.005.]
Fig. 2 Genetic structure of 28 three-spined stickleback populations from the ‘lowland–upland’ data set. (a) Classical multidimensional
scaling (CMDS) plot based on pairwise D values. (b) CMDS plot based on pairwise GST values. (c) Neighbour-joining tree
based on pairwise D values. (d) NJ tree based on pairwise GST values. Population codes as in Table 1. Three-dimensional versions of
plots (a) and (b) are provided in Fig. S1, Supporting information.
strong isolation for one population with low genetic
variability (S9h) (Fig. 3). The GST -based CMDS plot
revealed small differences between the eastern and the
western section of the basin, but identified four highly
divergent populations with low genetic variability (S5b,
S9h, S9i and S13b) (Fig. 3). Bayesian analysis in struc-
ture suggested an optimal structure with eleven clus-
ters, largely matching the geography of the basin. These
clusters were nested in the structure suggested by both
D and GSTand revealed stronger substructure in partic-
ular in the eastern section of the basin (Fig. S2, Sup-
porting information). The DAPC ordination identified
the four highly divergent populations, as well as the
substructure between the eastern and the western sec-
tion of the basin (Fig. S3, Supporting information).
Landscape genetic analyses indicated that pairwise
D could be best explained by a model including distance
(AICc = −30.02), followed by a model including
both distance and barriers (AICc = −29.38), but not by
barriers alone (AICc = −26.46). For pairwise GST, only
an effect of barriers was supported (AICc = −84.12).
Simple and partial correlations corroborated these
results, as pairwise D was more strongly correlated
with distance than with barriers, whereas pairwise GST
was more strongly correlated with barriers than with
distance (Table 3; Fig. 4). Correlations between absolute
residuals and geographical features were positive
for pairwise GST (Table 3), indicating that GST becomes
more unpredictable when populations are more isolated
by distance or barriers. This pattern, which becomes
particularly apparent in Fig. 4d (barriers vs. GST),
probably reflects the action of drift. In contrast, correlations
between absolute residuals and geographical features
were negative for pairwise D (Table 3).
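The permutation logic behind the Mantel P-values reported in Table 3 can be illustrated with a minimal sketch. It is not the software used in the study: the function names, the one-sided test, the use of Pearson's correlation and the five-site example matrices are all assumptions made for illustration only.

```python
import random
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mantel(genetic, geographic, permutations=1000, seed=1):
    """One-sided Mantel test between two symmetric square matrices.

    Returns the observed correlation and the proportion of arrangements
    (the observed one included) with a correlation at least as high.
    """
    n = len(genetic)
    pairs = list(combinations(range(n), 2))
    geo = [geographic[i][j] for i, j in pairs]

    def correlation(order):
        # Relabel the populations in the genetic matrix only.
        gen = [genetic[order[i]][order[j]] for i, j in pairs]
        return pearson(gen, geo)

    order = list(range(n))
    observed = correlation(order)
    rng = random.Random(seed)
    at_least_as_high = 1  # the observed arrangement counts once
    for _ in range(permutations):
        rng.shuffle(order)
        if correlation(order) >= observed:
            at_least_as_high += 1
    return observed, at_least_as_high / (permutations + 1)


# Hypothetical data: five sites along one waterway (positions in km).
km = [0, 10, 25, 45, 70]
waterway_km = [[abs(a - b) for b in km] for a in km]
pairwise_d = [[0.004 * abs(a - b) for b in km] for a in km]
pairwise_d[0][4] = pairwise_d[4][0] = 0.35  # one pair deviating from the trend

r, p = mantel(pairwise_d, waterway_km)
print(f"Mantel r = {r:.3f}, one-sided P = {p:.3f}")
```

Rows and columns are permuted together because the pairwise values in a distance matrix are not independent observations; shuffling population labels keeps that dependence structure intact.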
Fig. 3 Genetic structure of 21 three-spined stickleback populations from the ‘upland’ data set. (a) Classical multidimensional scaling (CMDS) plot based on pairwise D values. (b) CMDS plot based on pairwise GST values. Four central points in panel b are left unlabelled (S9c, S9d, S9j, S9k). Population codes as in Table 2.
Table 3 Simple, partial and residual correlations between geographical features (geographical distance along waterways and barriers along waterways) and genetic divergence (D or GST) in 21 upland three-spined stickleback populations (see Table 2)

            D                                                   GST
            Simple           Partial          Residual          Simple           Partial          Residual
Distance    0.6267 (0.001)   0.4861 (0.001)   −0.2382 (0.996)   0.4523 (0.007)   0.16 (0.154)     0.1557 (0.151)
Barriers    0.5296 (0.001)   0.3082 (0.046)   −0.2039 (0.987)   0.6809 (0.001)   0.5856 (0.003)   0.2178 (0.063)

Mantel test P-values based on 1000 permutations are between brackets. Significant P-values are in bold.
Simulations
In almost every single replicate of the simulations, R1
was higher for pairwise D than for pairwise GST, resulting
in highly significant P-values for the Wilcoxon
matched pairs test (Table 4). Likewise, R2 and R3 values
were consistently more negative for pairwise GST than
for pairwise D (Table 4). The simulations hence confirmed
that, under a wide range of parameter values, D
is more sensitive to colonization history than GST, and
GST is more sensitive to clines in population size
(promoting differential migration and drift) than D. Various
aspects of this effect are visualized in Fig. 5 for one
replicate of model μsat-6. The CMDS plot revealed that GST
(Fig. 5b) succeeded in isolating the most upstream sites,
but failed to discriminate between basins. The opposite
was observed for D (Fig. 5a). Overall, the discrepancy
between D-based and GST-based population structure
was larger for microsatellites than for allozymes (higher
SSP and lower RP and RM values in models μsat-1 to
μsat-8 than in models allo-1 and allo-2; Table 4). Furthermore,
increasing the dependence among the colonization
events by allowing migration between basins obviously
caused a drop of the D and GST values (Table 4). However,
it also caused a larger discrepancy between
D-based and GST-based population structure (higher SSP
and lower RP and RM values in models μsat-2 vs. μsat-1,
μsat-4 vs. μsat-3 (except for RM), μsat-6 vs. μsat-5, μsat-8
vs. μsat-7 and allo-2 vs. allo-1; Table 4). Finally, mutation
also enhanced the discrepancy between D-based and
GST-based population structure (higher SSP and lower RP
and RM values in models μsat-3 vs. μsat-1, μsat-4 vs.
μsat-2, μsat-7 vs. allo-1, and μsat-8 vs. allo-2), whereas marker
polymorphism did not (similar SSP, RP and RM values in
models μsat-1 vs. μsat-5 vs. μsat-7, and in models μsat-2
vs. μsat-6 vs. μsat-8; Table 4).
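As defined in the note to Table 4, R1 is the correlation between pairwise D or GST and a dummy variable (−1 for within-basin pairs, 1 for between-basin pairs), and R2 is the correlation between the average size of each population pair and pairwise D or GST. The sketch below shows one way these two summaries could be computed. It assumes Pearson's correlation, and the helper names, the four-population layout and the pairwise values are hypothetical, not simulation output.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def r1_and_r2(pairwise, basin, size):
    """Return (R1, R2) for one set of pairwise D or GST values.

    pairwise: dict mapping frozenset({pop_a, pop_b}) to a pairwise value
    basin:    dict mapping population name to basin label
    size:     dict mapping population name to population size
    """
    values, dummy, mean_size = [], [], []
    for a, b in combinations(sorted(basin), 2):
        values.append(pairwise[frozenset((a, b))])
        # -1 for pairs within a basin, 1 for pairs from different basins
        dummy.append(-1 if basin[a] == basin[b] else 1)
        mean_size.append((size[a] + size[b]) / 2)
    return pearson(values, dummy), pearson(values, mean_size)


# Hypothetical example: one large and one small population per basin.
basin = {"m1": "Meuse", "m8": "Meuse", "s1": "Scheldt", "s8": "Scheldt"}
size = {"m1": 800, "m8": 100, "s1": 800, "s8": 100}
pairwise = {
    frozenset(("m1", "m8")): 0.10,
    frozenset(("s1", "s8")): 0.10,
    frozenset(("m1", "s1")): 0.15,
    frozenset(("m1", "s8")): 0.25,
    frozenset(("m8", "s1")): 0.25,
    frozenset(("m8", "s8")): 0.40,
}

r1, r2 = r1_and_r2(pairwise, basin, size)
print(f"R1 = {r1:.3f}, R2 = {r2:.3f}")
```

In this made-up example R1 is positive (between-basin pairs are more divergent) and R2 is negative (pairs of small populations are more divergent), which is the direction of the effects summarized in Table 4.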
Fig. 4 Relationship between geographical features and genetic divergence in 21 three-spined stickleback populations from the ‘upland’ data set. (a) Geographical distance along waterways vs. D. (b) Geographical distance along waterways vs. GST. (c) Number of barriers along waterways vs. D. (d) Number of barriers along waterways vs. GST. For correlations, see Table 3.

Table 4 Simulations of an episode of parallel postglacial colonization in two upland basins by a common lowland source population. Assuming a generation time of 1 year, 10^4 postglacial generations were simulated using a hierarchical stepping stone migration model containing the ‘Meuse’ basin with populations m1 to m8, and the ‘Scheldt’ basin with populations s1 to s8. In each basin, population size declined from 800 individuals in the most ‘downstream’ populations (m1 and s1) to 100 individuals in the most ‘upstream’ populations (m8 and s8), in steps of 100. A downstream ‘lowland’ connection between the basins was established between m1 and s1. Genotypes at freely recombining loci were generated assuming an infinite allele model and maximal genetic variability at the onset of the simulations. Other parameters, including the migration rate between (mb) and within (mw) basins, the number of loci (loci), the mutation rate (μ) and the number of possible allelic states (PAS) are specified below

Model parameters and overall statistics

Model    mb/mw        Loci  μ       PAS  D               GST             RM              RP              SSP
μsat-1   0.00/0.01    10    10^-4   25   0.557 ± 0.010   0.313 ± 0.008   0.966 ± 0.004   0.973 ± 0.002   0.053 ± 0.005
μsat-2   0.005/0.01   10    10^-4   25   0.333 ± 0.009   0.153 ± 0.003   0.944 ± 0.004   0.942 ± 0.004   0.113 ± 0.008
μsat-3   0.00/0.01    10    10^-3   25   0.648 ± 0.004   0.119 ± 0.002   0.792 ± 0.008   0.915 ± 0.006   0.163 ± 0.011
μsat-4   0.005/0.01   10    10^-3   25   0.626 ± 0.004   0.110 ± 0.001   0.829 ± 0.008   0.887 ± 0.005   0.213 ± 0.009
μsat-5   0.00/0.01    10    10^-4   30   0.584 ± 0.006   0.333 ± 0.007   0.966 ± 0.004   0.974 ± 0.003   0.051 ± 0.005
μsat-6   0.005/0.01   10    10^-4   30   0.369 ± 0.010   0.163 ± 0.004   0.938 ± 0.005   0.934 ± 0.005   0.128 ± 0.010
μsat-7   0.00/0.01    10    10^-4   8    0.472 ± 0.010   0.323 ± 0.010   0.975 ± 0.002   0.974 ± 0.002   0.049 ± 0.003
μsat-8   0.005/0.01   10    10^-4   8    0.284 ± 0.006   0.160 ± 0.003   0.948 ± 0.004   0.941 ± 0.005   0.113 ± 0.009
allo-1   0.00/0.01    20    10^-6   8    0.458 ± 0.007   0.623 ± 0.014   0.998 ± 0.001   0.997 ± 0.001   0.006 ± 0.001
allo-2   0.005/0.01   20    10^-6   8    0.142 ± 0.004   0.160 ± 0.003   0.982 ± 0.002   0.978 ± 0.001   0.043 ± 0.003

R1, R2 and R3 for D and GST, with P-values

Model    R1,D            R1,GST          R1 P-value   R2,D             R2,GST           R2 P-value   R3,D             R3,GST           R3 P-value
μsat-1   0.977 ± 0.002   0.933 ± 0.006   <0.0001      −0.040 ± 0.007   −0.179 ± 0.011   <0.0001      0.155 ± 0.027    −0.140 ± 0.033   <0.0001
μsat-2   0.495 ± 0.015   0.390 ± 0.012   <0.0001      −0.543 ± 0.013   −0.658 ± 0.007   <0.0001      −0.427 ± 0.021   −0.538 ± 0.016   <0.0001
μsat-3   0.725 ± 0.001   0.428 ± 0.013   <0.0001      −0.239 ± 0.009   −0.585 ± 0.008   <0.0001      0.029 ± 0.017    −0.289 ± 0.013   <0.0001
μsat-4   0.493 ± 0.010   0.278 ± 0.010   <0.0001      −0.480 ± 0.009   −0.737 ± 0.005   <0.0001      −0.217 ± 0.018   −0.509 ± 0.009   <0.0001
μsat-5   0.977 ± 0.002   0.934 ± 0.005   <0.0001      −0.041 ± 0.006   −0.180 ± 0.013   <0.0001      0.197 ± 0.027    −0.126 ± 0.039   <0.0001
μsat-6   0.522 ± 0.014   0.406 ± 0.014   <0.0001      −0.528 ± 0.014   −0.649 ± 0.010   <0.0001      −0.480 ± 0.018   −0.582 ± 0.013   <0.0001
μsat-7   0.968 ± 0.003   0.931 ± 0.006   <0.0001      −0.071 ± 0.015   −0.165 ± 0.020   0.0001       0.054 ± 0.037    −0.134 ± 0.041   0.0002
μsat-8   0.522 ± 0.015   0.428 ± 0.013   <0.0001      −0.547 ± 0.012   −0.652 ± 0.006   <0.0001      −0.431 ± 0.021   −0.550 ± 0.017   <0.0001
allo-1   0.997 ± 0.001   0.994 ± 0.001   0.0001       −0.018 ± 0.003   −0.035 ± 0.005   0.0008       0.063 ± 0.024    −0.040 ± 0.033   0.0017
allo-2   0.488 ± 0.018   0.450 ± 0.019   0.0007       −0.579 ± 0.014   −0.623 ± 0.009   0.0002       −0.564 ± 0.017   −0.598 ± 0.014   0.0064

The model output is based on the genotypes of 800 individuals (50 per population) and includes the overall value for D, the overall value for GST, the Mantel correlation RM, the Procrustes correlation RP, the Procrustes sum of squares SSP, R1 (the correlation between pairwise D or GST and a dummy variable with values −1 for all within-basin and values 1 for all between-basin population pairs), R2 (the correlation between the average size of each population pair and pairwise D or GST) and R3 (the correlation between the average size of each population pair and the absolute residuals of the regression of pairwise D or GST on average population size) (all values ± standard error). P-values (Wilcoxon matched pairs test based on 20 replicates for each simulation) comparing R1, R2 and R3 values for D with R1, R2 and R3 values for GST are provided.

Discussion
So far, the debate on which is the most appropriate
metric for assessing population structure is based on
theoretical considerations and numerical examples (Jost
2008; Ryman & Leimar 2009; Whitlock 2011), simulation
studies (Leng & Zhang 2011) and meta-analyses (Heller
& Siegismund 2009; Meirmans & Hedrick 2011). Despite
the insights from this debate (see Introduction), empirical
studies play an important role to identify which
metric best answers the biological questions that motivate
our research (Whitlock 2011). Few studies so far
have made explicit comparisons between D and GST in
natural systems. Those studies that did calculate D next
to GST found that D is suitable for comparison across
species and across markers with similar mutation rates
(Johnson et al. 2010; Callens et al. 2011; Pennings et al.
2011).
In this study, we evaluated whether either D or GST
was more suitable for population genetic inference on a
set of three-spined stickleback (Gasterosteus aculeatus)
populations from Western Europe. We had two biologi-
cal questions to address. First, we aimed at identifying
the signature of postglacial colonization in the genetic
structure of lowland and upland populations. Second,
we aimed at quantifying the contribution of contempo-
rary gene flow and drift to the genetic structure of
upland populations. Importantly, D and GST might have
different capacities to reveal these different processes.
Strictly speaking, empirical data cannot be used to evaluate
these different capacities, as the true population
structure is unknown. However, we know that the natural
history of three-spined stickleback is characterized
by marine ancestry (Bell & Andrews 1997), and studies
based on mitochondrial DNA and microsatellite
markers confirm that stickleback populations from Western
Europe have a postglacial marine origin (Mäkinen
et al. 2006; Mäkinen & Merilä 2008). We hence do know
that population structure has been shaped by a number
of coastal-inland colonization events. Furthermore, pop-
ulation structure is largely constrained by the river-
scape, resulting in a strong isolation-by-distance pattern
as well as a strong cline in genetic diversity from the
lowland populations to the most upstream upland pop-
ulations (Raeymaekers et al. 2005, 2007, 2008, 2009). The
known ancestry and constrained population structure
strongly reduce the number of possible scenarios of
diversification and allow us to make reasonably strong
assumptions about the true population structure. Below,
we outline these scenarios in more detail, discuss the
compatibility of D and GST with these scenarios, and
summarize how simulated data support our conclu-
sions.
Genetic structure of lowland and upland populations
Lowland and upland stickleback are markedly different
in a number of phenotypic traits (Raeymaekers et al.
2007). For instance, lowland populations contain vary-
ing percentages of low-plated, partially plated and com-
pletely plated stickleback, whereas upland populations
are exclusively low plated. Such differences can evolve
rapidly (Bell et al. 2004; Le Rouzic et al. 2011; Raeymae-
kers 2011), and assessing population structure is crucial
to understand how these differences evolved. Our
earlier research suggested that the division between
lowland and upland populations has a Holocene origin
and that lowland populations are ancestral (Raeymaekers
et al. 2005). Furthermore, the observation of stronger
genetic differences among upland populations than
between upland and lowland populations led to the
conclusion that stickleback have colonized upland rivers
and streams multiple times independently. However,
the same observation could also be explained by
contemporary (habitat specific) gene flow and drift, as
effective population size (Ne) is larger and connectivity
between populations is stronger in the lowlands. Importantly,
the scenarios of past independent colonization
vs. contemporary gene flow are not mutually exclusive,
but might mask each other. For instance, contemporary
processes might quickly dominate the balance between
migration and drift, such that colonization history
becomes untraceable. This is not desirable, as identifying
the signature of colonization is crucial to infer
the number of times the upland ecotype evolved in
parallel.

Fig. 5 Replicate of a simulation of an episode of parallel postglacial colonization of the ‘Meuse’ basin with populations m1 to m8, and the ‘Scheldt’ basin with populations s1 to s8. Parameter settings are listed in Table 4 (model μsat-6). (a) Classical multidimensional scaling (CMDS) plot based on pairwise D values. (b) CMDS plot based on pairwise GST values. (c) Pairwise D within and between basins. (d) Pairwise GST within and between basins. (e) Pairwise D vs. average population size. (f) Pairwise GST vs. average population size. (g) Absolute residual pairwise D vs. average population size. (h) Absolute residual pairwise GST vs. average population size. (i) Pairwise D vs. pairwise GST.
Our comparison of D and GST led to different conclusions
with respect to population structure. In particular,
D revealed strong genetic structure with a cluster of
lowland populations and two clusters of upland populations.
In contrast, GST showed a more diffuse genetic
structure, with a central core of lowland populations
with high genetic variability and more peripheral
upland populations with low genetic variability. As GST
provides a good description of the influence of recent
demographic events on genetic variation (Meirmans &
Hedrick 2011), this pattern is probably indicative of
differences in effective population size or migration rate.
However, this quality of GST might mask colonization
history and founder effects. D, on the other hand, is
independent of effective population size and less sensitive
to recent demographic events (Meirmans &
Hedrick 2011), and therefore probably reflects historical
colonization more reliably than GST. Our simulations in
a postglacial hierarchical stepping stone model with
clinal variation in population size confirmed that D
maintains a closer link with colonization history than
GST.
The presence of two upland clusters in our study
area detected by D was also confirmed by the multivariate
DAPC method. These clusters suggest that
sticklebacks colonized the upstream basins at least two
times independently. Interestingly, we observed a mismatch
between these clusters and current hydrology.
This implies that at some point gene flow between
Scheldt and Meuse populations has occurred (e.g.
between M3 and S14). Recent natural or human-induced
translocation between basins cannot be
excluded, but it is highly unlikely that this has happened
on a large scale. Also, translocation between
basins is not expected to exceed migration within
basins, and hence cannot generate larger similarities
between populations from different basins than
between populations from the same basin. It therefore
seems that the two clusters of upland populations
reflect different waves of historical colonization (followed
by changes in hydrology) rather than recent
processes. This is an important result modifying previous
insights into the colonization history of these stickleback
populations (Raeymaekers et al. 2005).
Genetic structure of upland populations
Our previous work on upland stickleback populations
aimed at evaluating the impact of human-made barriers
on the connectivity between riverine stickleback populations
(Raeymaekers et al. 2008). Based on an FST-based
landscape genetics approach, Raeymaekers et al. (2008)
concluded that these barriers represent the most important
geographical feature shaping population structure.
In particular, it appeared that barriers controlled the
balance between gene flow and drift more strongly than
geographical distance. First, the correlation between FST
and barriers was stronger than between FST and geographical
distance, indicating that gene flow decreases
with barriers rather than with distance. Second, the
scatter in FST values also increased more strongly with
barriers than with distance, suggesting that barriers also
enhance drift (Raeymaekers et al. 2008). Note that FST
and GST are estimating the same quantity, so both indices
are highly correlated.
We re-evaluated the same data set, this time using D
and GST. We did not expect here that D would provide
more insight into contemporary population connectivity
than GST. D approaches equilibrium more slowly than GST
(Ryman & Leimar 2009), such that D is less sensitive
to recent demographic changes such as those induced
by human-made barriers (in this case constructed on
average <300 years ago). Our comparison of D and
GST indeed supported these expectations. D did correlate
with human-made barriers, but was more strongly
correlated with geographical distance. Furthermore,
there was no pattern of increasing scatter in D with
distance or the number of barriers, suggesting that D
indeed has a low sensitivity to drift. As such, only GST
(and FST) was indicative of the balance between contemporary
migration and drift, in this case shaped by
human-made barriers rather than geographical distance.
Our simulations in a postglacial hierarchical
stepping stone model confirmed that GST is more sensitive
to population size declining with geographical
isolation than D, and hence more sensitive to migration
and drift.
GST vs. D
Our empirical results suggested that D and GST have
different sensitivities to processes shaping short-term
(i.e. postglacial) population structure at two different
levels: between lowland and upland populations, and
among upland populations. In the first case, D
appeared more insightful than GST to trace back how
ancestral lowland populations colonized upstream
freshwater basins after postglacial retreat. In the second
case, GST appeared more useful than D to identify contemporary
population genetic processes. Simulations of
postglacial colonization and subsequent diversification
in a hierarchical stepping stone migration model with
clinal variation in population size confirmed that D and
GST indeed behave accordingly. For instance, when
‘masking’ the signature of colonization by allowing
migration between basins, D still revealed the signature
of colonization, while GST identified the signature of
increased connectivity between the basins.
What is the origin of the discrepancy between D and
GST in this study? Correlations between both metrics
show a positive nonlinear relationship where pairwise D
increases steeply for low values of pairwise GST and
levels off for high values of pairwise GST (Fig. 5; Fig. S5,
Supporting information). The result of this rescaling is
that D has a greater capacity than GST to discriminate
between demes when within-deme diversity and
between-deme connectivity are high (such as for the
lowland populations), while GST more easily identifies
demes when within-deme diversity and between-deme
connectivity are low (such as for the upland populations
approaching fixation). For the diversification between
lowland and upland populations, the result is that D is
less sensitive to differences in gene flow and drift, and
more sensitive to the historical contingency between
populations. The same property makes D less suitable to
describe the diversification among upland populations.
Here, differences in gene flow and drift are important
determinants of population connectivity, and only GST
sufficiently takes such differences into account.
In conclusion, both D and GST revealed insightful
aspects of population structure on short timescales, and
across strong clines in population size and connectivity.
However, both metrics differed in their suitability to
address the particular hypotheses that motivated our
study. Furthermore, while our empirical data suggested
that the overall contribution of mutation to population
structure was negligible, simulations revealed that the
discrepancy between D-based and GST-based population
structure depends on the mutation rate. Indeed,
D-based and GST-based population structures were particularly
different for markers with moderate to high
mutation rates (Table 4). This was expected given the
differential effects of mutation observed in previous
studies (Jost 2008; Meirmans & Hedrick 2011; Whitlock
2011). It is therefore important to remember that D and
GST are more likely to generate different insights when
using markers with high mutation rates.
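The rescaling between GST and D discussed in this section can be made concrete with a small numerical sketch. It uses the parameter definitions GST = (HT − HS)/HT and D = [(HT − HS)/(1 − HS)] × [n/(n − 1)] (Jost 2008) with equally weighted demes, and not the sample-size-corrected estimators that an empirical analysis would normally apply. The function names and the allele frequencies are hypothetical.

```python
def heterozygosities(demes):
    """Return (HS, HT) for a list of per-deme allele-frequency dicts.

    HS is the mean within-deme expected heterozygosity and HT the expected
    heterozygosity of the pooled demes, all demes weighted equally.
    """
    n = len(demes)
    hs = sum(1 - sum(p * p for p in deme.values()) for deme in demes) / n
    alleles = set().union(*demes)
    pooled = [sum(deme.get(a, 0.0) for deme in demes) / n for a in alleles]
    ht = 1 - sum(p * p for p in pooled)
    return hs, ht


def gst_and_d(demes):
    """Return (GST, D) from the parameter definitions, without bias correction."""
    n = len(demes)
    hs, ht = heterozygosities(demes)
    gst = (ht - hs) / ht
    d = (ht - hs) / (1 - hs) * n / (n - 1)
    return gst, d


# Hypothetical case 1: two highly diverse demes sharing no alleles
# (ten equally frequent private alleles each).
diverse = [
    {f"a{i}": 0.1 for i in range(10)},
    {f"b{i}": 0.1 for i in range(10)},
]

# Hypothetical case 2: two low-diversity demes close to fixation
# for alternative alleles.
near_fixed = [{"A": 0.9, "B": 0.1}, {"A": 0.1, "B": 0.9}]

gst_diverse, d_diverse = gst_and_d(diverse)
gst_fixed, d_fixed = gst_and_d(near_fixed)
print(f"diverse demes:    GST = {gst_diverse:.3f}, D = {d_diverse:.3f}")
print(f"near-fixed demes: GST = {gst_fixed:.3f}, D = {d_fixed:.3f}")
```

In the first case D equals 1 while GST stays near 0.05, because high within-deme diversity caps GST; in the second case GST rises to 0.64 while D is about 0.78. This is the pattern described above: D separates diverse demes that GST barely distinguishes, whereas GST responds strongly once within-deme diversity is low.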
Acknowledgements
We thank Bart Christiaen, Ingrid Hontis, Ellen Pape, David
Monnier, Simala Souvannavong and the Conseil Supérieur de
la Pêche (France) for field support, and Sarah Geldof for
genotyping. We thank Lou Jost, Nolan Kane, Pascal Hablützel,
Maarten Larmuseau, Gregory Maes, Joachim Mergeay, Luisa
Orsini, Joost Vanoverbeke and two anonymous reviewers for
inspiring discussion, and Theo Raeymaekers and Agnes Deconinck
for proofreading. Research was sponsored by the Research
Foundation – Flanders (project G.0142.03 and research
community W0.037.10N ‘Eco-evolutionary dynamics in natural
and anthropogenic communities’) and the K.U.Leuven
(GOA/2008/06 and PF/2010/07).
References
Balloux F (2001) EASYPOP (version 1.7): a computer program
for the simulation of population genetics. Journal of Heredity,
92, 301–302.
Belkhir K, Borsa P, Chikhi L, Raufaste N, Bonhomme F (2002)
Genetix 4.04: logiciel sous Windows TM pour la génétique des
populations. Laboratoire Génome, Populations, Interactions,
CNRS UMR 5000, Université de Montpellier II, Montpellier,
France.
Bell MA, Andrews CA (1997) Evolutionary consequences of
postglacial colonisation of fresh water by primitively
anadromous fishes. In: Evolutionary Ecology of Freshwater
Animals (eds Streit B, Städler T and Lively CM), pp. 323–363.
Birkhäuser Verlag, Basel.
Bell MA, Aguirre WE, Buck NJ (2004) Twelve years of
contemporary armor evolution in a threespine stickleback
population. Evolution, 58, 814–824.
Callens T, Galbusera P, Matthysen E et al. (2011) Genetic
signature of population fragmentation varies with mobility
in seven bird species of a fragmented Kenyan cloud forest.
Molecular Ecology, 20, 1829–1844.
Edelaar P, Bjorklund M (2011) If FST does not measure neutral
genetic differentiation, then comparing it with QST is
misleading. Or is it? Molecular Ecology, 20, 1805–1812.
Elphinstone MS, Hinten GN, Anderson MJ, Nock CJ (2003) An
inexpensive and high-throughput procedure to extract and
purify total genomic DNA for population studies. Molecular
Ecology Notes, 3, 317–320.
Felsenstein J (1995) PHYLIP (Phylogeny inference package)
version 3.5. Distributed by the author. Department of
Genetics, University of Washington, Seattle.
Gerlach G, Jueterbock A, Kraemer P, Deppermann J, Harmand P
(2010) Calculations of population differentiation based on GST
and D: forget GST but not all of statistics!. Molecular Ecology,
19, 3845–3852.
Goudet J (1995) Fstat (Version 1.2): a computer program to
calculate F-statistics. Journal of Heredity, 86, 485–486.
Gower JC (1975) Generalized procrustes analysis.
Psychometrika, 40, 33–51.
Hardy OJ, Vekemans X (2002) SPAGeDI: a versatile computer
program to analyse spatial genetic structure at the individual
or population levels. Molecular Ecology Notes, 2, 618–620.
Hardy OJ, Charbonnel N, Freville H, Heuertz M (2003)
Microsatellite allele sizes: a simple test to assess their
significance on genetic differentiation. Genetics, 163, 1467–1482.
Heller R, Siegismund HR (2009) Relationship between three
measures of genetic differentiation GST, DEST and G’ST: how
wrong have we been? Molecular Ecology, 18, 2080–2083.
Holsinger KE, Weir BS (2009) Genetics in geographically
structured populations: defining, estimating and interpreting
FST. Nature Reviews Genetics, 10, 639–650.
Hurst LD (2009) Genetics and the understanding of selection.
Nature Reviews Genetics, 10, 83–93.
Hutchison DW, Templeton AR (1999) Correlation of pairwise
genetic and geographic distance measures: inferring the
relative influences of gene flow and drift on the distribution
of genetic variability. Evolution, 53, 1898–1914.
Johnson JA, Talbot SL, Sage GK et al. (2010) The use of
genetics for the management of a recovering population:
temporal assessment of migratory peregrine falcons in North
America. PLoS ONE, 5, 15.
Jombart T, Devillard S, Balloux F (2010) Discriminant analysis
of principal components: a new method for the analysis of
genetically structured populations. BMC Genetics, 11, 94.
Jost L (2008) GST and its relatives do not measure
differentiation. Molecular Ecology, 17, 4015–4026.
Jost L (2009) D vs. GST: response to Heller and Siegismund
(2009) and Ryman and Leimar (2009). Molecular Ecology, 18,
2088–2091.
Largiadèr CR, Fries V, Kobler B, Bakker TCM (1999) Isolation
and characterization of microsatellite loci from the
three-spined stickleback (Gasterosteus aculeatus L.). Molecular
Ecology, 8, 342–344.
Larmuseau MHD, Raeymaekers JAM, Ruddick KG, Van Houdt
JKJ, Volckaert FAM (2009) To see in different seas: spatial
variation in the rhodopsin gene of the sand goby
(Pomatoschistus minutus). Molecular Ecology, 18, 4227–4239.
Le Rouzic A, Østbye K, Klepaker TO et al. (2011) Strong and
consistent natural selection associated with armor reduction
in sticklebacks. Molecular Ecology, 20, 2483–2493.
Leinonen T, O’Hara RB, Cano JM, Merilä J (2008) Comparative
studies of quantitative trait and neutral marker divergence: a
meta-analysis. Journal of Evolutionary Biology, 21, 1–17.
Leng L, Zhang DX (2011) Measuring population differentiation
using GST or D? A simulation study with microsatellite DNA
markers under a finite island model and nonequilibrium
conditions. Molecular Ecology, 20, 2494–2509.
Mäkinen HS, Merilä J (2008) Mitochondrial DNA
phylogeography of the three-spined stickleback (Gasterosteus
aculeatus) in Europe – Evidence for multiple glacial refugia.
Molecular Phylogenetics and Evolution, 46, 167–182.
Mäkinen HS, Cano JM, Merilä J (2006) Genetic relationships
among marine and freshwater populations of the European
three-spined stickleback (Gasterosteus aculeatus) revealed by
microsatellites. Molecular Ecology, 15, 1519–1534.
Meirmans PG, Hedrick PW (2011) Assessing population
structure: FST and related measures. Molecular Ecology
Resources, 11, 5–18.
Morin PA, Manaster C, Mesnick SL, Holland R (2009)
Normalization and binning of historical and multi-source
microsatellite data: overcoming the problems of allele size
shift with allelogram. Molecular Ecology Resources, 9, 1451–
1455.
Nei M (1987) Molecular Evolutionary Genetics. Columbia
University Press, New York.
Oksanen J, Kindt R, Legendre P, O’Hara RB, Stevens MHH
(2007) Vegan: Community Ecology Package. R package version
1.8-8. http://r-forge.r-project.org/projects/vegan.
van Oosterhout C, Hutchinson WF, Wills DPM, Shipley P
(2004) Micro-Checker: software for identifying and correcting
genotyping errors in microsatellite data. Molecular Ecology
Notes, 4, 535–538.
Peichel CL, Nereng KS, Ohgi KA et al. (2001) The genetic
architecture of divergence between threespine stickleback
species. Nature, 414, 901–905.
Pennings PS, Achenbach A, Foitzik S (2011) Similar
evolutionary potentials in an obligate ant parasite and its
two host species. Journal of Evolutionary Biology, 24, 871–
886.
Pritchard JK, Stephens M, Donnelly P (2000) Inference of
population structure using multilocus genotype data.
Genetics, 155, 945–959.
Raeymaekers JAM (2011) Modelling contemporary evolution in
stickleback. Molecular Ecology, 20, 2465–2467.
Raeymaekers JAM, Maes GE, Audenaert E, Volckaert FAM
(2005) Detecting Holocene divergence in the anadromous-
freshwater three-spined stickleback (Gasterosteus aculeatus)
system. Molecular Ecology, 14, 1001–1014.
Raeymaekers JAM, Van Houdt JKJ, Larmuseau MHD, Geldof S,
Volckaert FAM (2007) Divergent selection as revealed by PST
and QTL-based FST in three-spined stickleback (Gasterosteus
aculeatus) populations along a coastal-inland gradient.
Molecular Ecology, 16, 891–905.
Raeymaekers JAM, Maes GE, Geldof S et al. (2008) Modeling
genetic connectivity in sticklebacks as a guideline for river
restoration. Evolutionary Applications, 1, 475–488.
Raeymaekers JAM, Raeymaekers D, Koizumi I, Geldof S,
Volckaert FAM (2009) Guidelines for restoring connectivity
around water mills: a population genetic approach to the
management of riverine fish. Journal of Applied Ecology, 46,
562–571.
Ryman N, Leimar O (2009) GST is still a useful measure of
genetic differentiation – a comment on Jost’s D. Molecular
Ecology, 18, 2084–2087.
Schluter D (1996) Ecological speciation in postglacial fishes.
Philosophical Transactions of the Royal Society of London Series
B: Biological Sciences, 351, 807–814.
Storz JF (2005) Using genome scans of DNA polymorphism to
infer adaptive population divergence. Molecular Ecology, 14,
671–688.
Tamura K, Dudley J, Nei M, Kumar S (2007) MEGA4:
molecular evolutionary genetics analysis (MEGA) software
version 4.0. Molecular Biology and Evolution, 24, 1596–1599.
Taylor EB (1999) Species pairs of north temperate freshwater
fishes: evolution, taxonomy, and conservation. Reviews in
Fish Biology and Fisheries, 9, 299–324.
Van Dongen S, Lens L, Pape E, Volckaert FAM, Raeymaekers
JAM (2009) Evolutionary history shapes the association
between developmental instability and population-level