Challenges in Estimating Insecticide Selection Pressures
from Mosquito Field Data
Susana Barbosa1*, William C. Black IV2, Ian Hastings1
1Molecular and Biochemical Parasitology Group, Liverpool School of Tropical Medicine, Liverpool, United Kingdom, 2Department of Microbiology, Colorado State
University, Fort Collins, Colorado, United States of America
Insecticide resistance has the potential to compromise the enormous effort put into the control of dengue and malaria
vector populations. It is therefore important to quantify the amount of selection acting on resistance alleles, their
contributions to fitness in heterozygotes (dominance) and their initial frequencies, as a means to predict the rate of spread
of resistance in natural populations. We investigate practical problems of obtaining such estimates, with particular emphasis
on Mexican populations of the dengue vector Aedes aegypti. Selection and dominance coefficients can be estimated by
fitting genetic models to field data using maximum likelihood (ML) methodology. This methodology, although widely used,
makes many assumptions so we investigated how well such models perform when data are sparse or when spatial and
temporal heterogeneity occur. As expected, ML methodologies reliably estimated selection and dominance coefficients
under idealised conditions but it was difficult to recover the true values when datasets were sparse during the time that
resistance alleles increased in frequency, or when spatial and temporal heterogeneity occurred. We analysed published data
on pyrethroid resistance in Mexico that consist of the frequency of an Ile1,016 mutation. The estimates of the selection
coefficient and initial allele frequency from the field dataset were in the expected range, and the dominance coefficient points to
incomplete dominance as observed in the laboratory, although these estimates are accompanied by strong caveats about
the possible impact of spatial and temporal heterogeneity in selection.
Citation: Barbosa S, Black WC IV, Hastings I (2011) Challenges in Estimating Insecticide Selection Pressures from Mosquito Field Data. PLoS Negl Trop Dis 5(11):
Editor: Rhoel Ramos Dinglasan, Johns Hopkins Bloomberg School of Public Health, United States of America
Received June 15, 2011; Accepted September 19, 2011; Published November 1, 2011
Copyright: © 2011 Barbosa et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits
unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: S.B. was supported by research grant SFRH/BD/33534/2008 from Fundação para a Ciência e Tecnologia, Portugal and Siemens Portugal under a Ph.D.
Program in Computational Biology of the Instituto Gulbenkian de Ciência, Oeiras, Portugal. The funders had no role in study design, data collection and analysis,
decision to publish, or preparation of the manuscript.
Competing Interests: The authors have declared that no competing interests exist.
* E-mail: firstname.lastname@example.org
Mosquito-borne diseases are prevalent in the tropics and
subtropics and constitute a large proportion of the health problems
in developing countries. The major mosquito vectors occur in the
genera Culex, Aedes, and Anopheles, which transmit Filaria spp,
Japanese encephalitis, dengue and yellow fever viruses and
malaria. The control of vector populations is often based on
insecticides, applied through larviciding, indoor residual spraying (IRS) and
personal protection through insecticide treated materials (ITM),
and their use has been shown to have a powerful impact on
mosquito abundance and disease transmission. The foraging
and resting behaviors of mosquitoes ensure a number of
potentially lethal interactions with insecticide-treated surfaces
during parts of the mosquito lifecycle , but prolonged exposure
to an insecticide over many generations runs the risk that
mosquitoes will develop resistance. In general only four different
chemical classes of synthetic insecticides are used in the field:
organochlorines, organophosphates, carbamates and pyrethroids.
Pyrethroids are particularly important because they are the only
class of insecticides recommended by The World Health
Organization to use on ITM. Since ITM are being widely
distributed across malaria and dengue affected countries and
pyrethroids are employed in some areas for agricultural pest
control, there is concern that the emergence of resistance will
compromise these efforts. The widespread use of a small portfolio
of compounds against large mosquito populations with many
generations per year (estimated at 12 per year for Anopheles gambiae
and 20 for Aedes aegypti) has raised fears that high levels of
resistance may arise very quickly.
We consider the problem of measuring the strength of selection
for insecticide resistance in mosquito field populations and show
how changes in the frequencies of the alleles at a single locus can
be used to estimate the selection acting on each genotype. This
type of data is collected for the identification of genetic
mechanisms of resistance and/or during monitoring programs of
vector control campaigns. The method we developed extends that
described earlier by DuMouchel and Anderson in 1968  for
laboratory populations. Laboratory conditions differ
significantly from the field. In the laboratory, insecticide assays
are conducted over a standardized range of doses and concentrations
that do not account for field situations such as decay rates
and exposure characteristics. Following insecticide deployment in
the field, concentration decreases and there is a selective window
of time at lower concentrations (see Figure 1) in which resistant
heterozygotes do not die but susceptible homozygotes are still
killed, so the resistance allele acts as dominant even though under more
standardized conditions it may appear to be recessive. This is relevant
because dominance relationships between susceptible and resistance
alleles affect the rate of spread of resistance.
www.plosntds.org1November 2011 | Volume 5 | Issue 11 | e1387
Using a maximum likelihood (ML) procedure and a recursive
genetic model that tracks the changes in the resistance allele
frequencies at a single locus it is possible to estimate a selection
coefficient (s), a coefficient quantifying dominance (h) and the
initial frequency of the resistance allele (p0), key parameters that
determine the dynamics of resistance. The model provides a
straightforward way to obtain these values with the least complex
dataset possible. However, field data on the spread of resistance is
often suboptimal: datasets may be small, may only track one
period of dynamics (typically early and late stage of spread) or may
be pooled from different locations. In this paper we discuss the
challenges associated with this approach. We used published data
on pyrethroid resistance from Aedes aegypti, throughout Mexico 
on the frequency of the Ile1,016 mutation, one of the mutations in
the voltage-gated sodium channel gene known to confer resistance
to pyrethroids (this is known as knockdown resistance, a term
applied to insects that fail to lose coordinated activity immediately
Model and Methods
The genetic model we employ assumes a single autosomal locus
conferring insecticide resistance in a diploid sexually reproducing
population, with non-overlapping generations and assuming
random mating; these are standard assumptions in population
genetics models. There are two possible alleles, resistant (R) or
susceptible (S), and three possible genotypes (SR, RR, SS). The
fitness coefficient, which is a measure of survival and reproduction
of the different genotypes, was defined as 1 for the susceptible
homozygotes SS, 1+s (s is the selection coefficient) for resistant
homozygotes RR and 1+hs (h is the dominance coefficient) for
heterozygotes SR. The level of dominance is a measure of the
relative position of the phenotype of the heterozygote relative to
the phenotype of the two corresponding homozygotes. Complete
dominance of the susceptible allele is represented by h=0 and
complete dominance of the resistance allele by h=1; alleles are
codominant or additive when h=0.5. The fitness coefficients are
composite measures of fitness in both the exposed and unexposed
mosquito groups and are assumed to be the same for males and females.
We also assume a large population, so that genetic drift can be
ignored, which enabled us to predict the frequency of the resistant
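allele at any time t according to the recursion expression:
$$p_{t+1} = \frac{p_t^2 (1+s) + p_t q_t (1+hs)}{p_t^2 (1+s) + 2 p_t q_t (1+hs) + q_t^2} \qquad (1)$$
$p_t$: frequency of the resistant allele at time/generation t
$q_t = 1 - p_t$: frequency of the susceptible allele at time/generation t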
This recursive equation is the basic formula of selection of a
favourable gene [7,8]. We defined the initial allele frequency p0 as
the frequency at the first sampling time point, set as generation 0.
Each subsequent generation can be converted onto a real
timescale of years by assuming a constant number of generations
per calendar year.
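The recursion can be sketched in a few lines of Python. This is only an illustration under the stated fitness scheme (SS = 1, SR = 1+hs, RR = 1+s); the authors worked in R, and the function names `next_frequency`, `trajectory` and `generation_to_year`, as well as the example parameter values and start year, are assumptions made here for illustration.

```python
def next_frequency(p, s, h):
    """One generation of selection at a single locus.

    Fitnesses follow the text: SS = 1, SR = 1 + h*s, RR = 1 + s.
    """
    q = 1.0 - p
    w_rr, w_sr, w_ss = 1.0 + s, 1.0 + h * s, 1.0
    mean_fitness = p * p * w_rr + 2.0 * p * q * w_sr + q * q * w_ss
    # R alleles come from RR homozygotes and from half of the SR heterozygotes.
    return (p * p * w_rr + p * q * w_sr) / mean_fitness


def trajectory(p0, s, h, generations):
    """Return the list [p_0, p_1, ..., p_generations]."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(next_frequency(freqs[-1], s, h))
    return freqs


def generation_to_year(t, start_year, generations_per_year):
    """Map generation t to calendar time, assuming a constant generation rate."""
    return start_year + t / generations_per_year


# Illustrative values only (not estimates from the paper).
freqs = trajectory(p0=0.001, s=0.2, h=0.8, generations=120)
print(freqs[0], freqs[-1])
print(generation_to_year(24, start_year=1999, generations_per_year=12))
```

With no selection (s = 0) the function returns the input frequency unchanged, which is a quick sanity check on the algebra.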
The ML approach to estimate the unknown parameters h and s
and p0, based on this genetic model, involved selecting initial
values of s, h, and p0, and then testing how well the predicted allele
frequencies matched those observed in the dataset.
Field datasets usually consist of the number of resistant alleles xt
and the total number of sampled alleles n at different time points t.
The probability of observing x resistant alleles among n alleles
follows a binomial distribution, with a probability of success (being
a resistance allele) p for each sampled time point t.
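A minimal sketch of the binomial likelihood for such data is given below. Working on the log scale (summing log-likelihoods rather than multiplying likelihoods) is a numerical convenience assumed here, not something stated in the text; the function names and the small clipping constant `eps` are likewise illustrative.

```python
import math


def log_binomial_likelihood(x, n, p):
    """Log of C(n, x) * p**x * (1 - p)**(n - x)."""
    eps = 1e-12  # assumption: clip p so that log() stays finite at 0 or 1
    p = min(max(p, eps), 1.0 - eps)
    log_comb = math.lgamma(n + 1) - math.lgamma(x + 1) - math.lgamma(n - x + 1)
    return log_comb + x * math.log(p) + (n - x) * math.log(1.0 - p)


def dataset_log_likelihood(samples, predicted):
    """Sum the log-likelihood over time points.

    samples   -- list of (x_t, n_t): resistant alleles and alleles sampled
    predicted -- list of model frequencies p_t for the same time points
    """
    return sum(log_binomial_likelihood(x, n, p)
               for (x, n), p in zip(samples, predicted))


# Hypothetical counts: 100 alleles sampled at three time points.
samples = [(2, 100), (10, 100), (40, 100)]
print(dataset_log_likelihood(samples, [0.02, 0.10, 0.40]))
print(dataset_log_likelihood(samples, [0.20, 0.30, 0.70]))
```

Predicted frequencies that track the observed proportions give the larger (less negative) value.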
Figure 1. The typical change in insecticide concentration in the
field over time. As concentration decays with time after deployment
there is a differential survival of genotypes. In period A the RR genotype
will survive while the RS and SS genotypes die: this makes the R allele recessive in
this period. In period B both RR and RS survive making the R allele
dominant in this period. In period C all genotypes can survive so no
selection occurs. These are windows of selection, adapted from .
The emergence and spread of insecticide resistance
compromise the control of mosquito borne diseases such
as dengue or malaria, which are responsible for millions of
deaths every year in tropical and subtropical areas. There
are currently no easily implemented methodologies to
quantify the strength of selection for resistance occurring
in nature. Using field data from Mexico on the frequency of
an allele mutation conferring resistance in the mosquito
Aedes aegypti we use maximum likelihood (ML) to estimate
the selection and dominance coefficients driving the
evolution of resistance. We explored the impact of poor
data collection, data that combine information from
different locations and the consequences of selection
and dominance coefficients varying over the sampling
time period. The ML method can accurately estimate these
parameters with simulated data in ideal sampling situa-
tions but it is difficult to recover true values when spatial
and temporal heterogeneity occurs. The analysis high-
lighted factors relevant to field work such as the need for
frequent surveillance in discrete sentinel sites.
Estimating Selection from Mosquito Field Data
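In this binomial model:
$p_t$: probability of sampling an R allele, i.e., the probability of success
$\binom{n_t}{x_t}$: combinatorial term to account for the number of ways of sampling $x_t$ resistant alleles among $n_t$ total alleles
The corresponding binomial likelihood function is:
$$L(p_t \mid x_t, n_t) = \binom{n_t}{x_t}\, p_t^{x_t} (1-p_t)^{n_t - x_t} \qquad (2)$$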
The likelihood function returns the likelihood of the value pt given
the observed data of xt resistant alleles among the sample of nt at
each generation. Essentially, it tells us how consistent the data are
with predicted values of pt (Equation 1). The likelihood value for
the dataset is the product of the likelihoods across all sampled time points.
We implemented this ML methodology in R using the constrOptim()
function (from the stats package), which does not require
analytic derivatives and can minimize or maximize a
function subject to linear inequality constraints. Three constraints
on the parameter values were enforced: 0<p<1, 0<s<1, 0<h<1,
except when analysing the field data, when a constraint on h (0–1.5)
was imposed. The Nelder-Mead optimization algorithm
was used, which generates a new test position by extrapolating the
behavior of the objective function measured at each test point
arranged as a simplex. The algorithm then chooses to replace one
of these test points with the new test point and the algorithm
progresses. The simplest step is to replace the worst point with a
point reflected through the centroid of the remaining points. If this
point is better than the best current point, then it will expand
exponentially along this line. On the other hand, if this new point
is not much better than the previous value, the simplex returns to the
previous point. The standard error (s.e.) of the estimates was
determined by inverting the Hessian matrix evaluated at the ML
estimate and the 95% confidence interval endpoints were
calculated as parameter estimate ± 1.96 × s.e.
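The standard-error step can be illustrated with a one-parameter toy case, sketched below in Python rather than R. It is not the authors' three-parameter fit: it estimates a single binomial frequency, approximates the second derivative of the negative log-likelihood by finite differences (the 1×1 analogue of the Hessian), inverts it, and forms the interval as estimate ± 1.96 × s.e. In the three-parameter model the same idea would use the square roots of the diagonal of the inverted 3×3 Hessian. The counts and the step size are assumptions chosen for illustration.

```python
import math


def neg_log_lik(p, x, n):
    """Negative binomial log-likelihood, omitting the constant combinatorial term."""
    return -(x * math.log(p) + (n - x) * math.log(1.0 - p))


def second_derivative(f, theta, step=1e-4):
    """Central finite-difference estimate of f''(theta)."""
    return (f(theta + step) - 2.0 * f(theta) + f(theta - step)) / step ** 2


# Hypothetical sample: 30 resistant alleles among 100.
x, n = 30, 100
p_hat = x / n  # closed-form ML estimate for a single binomial sample

hessian = second_derivative(lambda p: neg_log_lik(p, x, n), p_hat)
se = math.sqrt(1.0 / hessian)  # invert the 1x1 Hessian, then take the square root
ci = (p_hat - 1.96 * se, p_hat + 1.96 * se)
print(p_hat, se, ci)
```

For this toy case the result can be checked against the textbook binomial standard error sqrt(p(1−p)/n). Note that such intervals can extend beyond the permitted parameter range, as some of the intervals reported in Table 2 do.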
Maximum likelihood estimation is an optimization technique
and there is no guarantee that the set of parameters that uniquely
maximizes the likelihood will always be found because the
algorithm may converge onto local optima whose likelihood is
below the global maximum. To overcome this problem 1000 runs
of the ML iteration procedure were performed in every estimation,
with random starting values of the parameter estimates used to
initialise the optimization routine . In the analyses described
here, the runs that converged to other estimates had ML values
sufficiently less than the global maximum that a likelihood ratio
test considered them different, so that the set of parameters could
be safely discarded. However, a small percentage of the runs
converged to a set of different parameters with a similar likelihood
value that could not be considered different using a likelihood ratio
test. The criterion used to exclude these results as potential best
estimates was that the estimated value of h lay on the boundary of
the constrained parameter range, which is expected to reflect erratic
behavior of the algorithm when using a small sample.
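The multi-start idea can be sketched as follows. To keep the example dependency-free, a simple compass (coordinate pattern) search stands in for Nelder-Mead and constrOptim(); it is a different, cruder local optimizer and is used here only to show the restart logic. The bounds, step sizes, number of starts and the noise-free dataset are all assumptions for illustration.

```python
import math
import random

LOWER, UPPER = (1e-4, 1e-4, 1e-4), (0.999, 0.999, 0.999)  # assumed box for (p0, s, h)


def trajectory(p0, s, h, generations):
    """Allele frequencies under selection with fitnesses 1, 1+hs, 1+s."""
    freqs = [p0]
    for _ in range(generations):
        p = freqs[-1]
        q = 1.0 - p
        mean_w = p * p * (1 + s) + 2 * p * q * (1 + h * s) + q * q
        freqs.append((p * p * (1 + s) + p * q * (1 + h * s)) / mean_w)
    return freqs


def neg_log_lik(params, counts, n):
    """Negative binomial log-likelihood of the counts (constant term omitted)."""
    p0, s, h = params
    total = 0.0
    for x, p in zip(counts, trajectory(p0, s, h, len(counts) - 1)):
        p = min(max(p, 1e-12), 1 - 1e-12)
        total -= x * math.log(p) + (n - x) * math.log(1 - p)
    return total


def compass_search(f, start, step=0.1, tol=1e-3):
    """Derivative-free local search inside the box (stand-in for Nelder-Mead)."""
    best, best_val = list(start), f(start)
    while step > tol:
        improved = False
        for i in range(len(best)):
            for delta in (step, -step):
                trial = list(best)
                trial[i] = min(max(trial[i] + delta, LOWER[i]), UPPER[i])
                val = f(trial)
                if val < best_val:
                    best, best_val, improved = trial, val, True
        if not improved:
            step /= 2.0  # refine the mesh only when no neighbour is better
    return best, best_val


def multi_start(f, n_starts, rng):
    """Run the local search from random starts; return the best run and all runs."""
    runs = []
    for _ in range(n_starts):
        start = [rng.uniform(lo, hi) for lo, hi in zip(LOWER, UPPER)]
        runs.append(compass_search(f, start))
    return min(runs, key=lambda run: run[1]), runs


n = 100
counts = [round(n * p) for p in trajectory(0.02, 0.3, 0.6, 30)]  # noise-free toy data
(best_params, best_nll), runs = multi_start(
    lambda theta: neg_log_lik(theta, counts, n), 20, random.Random(1))
print(best_params, best_nll)
print(sorted(round(val, 2) for _, val in runs))
```

Printing the sorted objective values of all runs shows how many starts reach the same optimum and how many stall elsewhere, which is the diagnostic described in the text.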
We tested the algorithm and program by analyzing 100 datasets
simulated under idealized conditions using Equations 1 and 2.
Initial frequency, dominance and selection coefficient were drawn from
uniform distributions in the ranges 0.01–0.04, 0.3–0.8 and 0.1–0.3,
respectively. Three parameter values were selected
for each dataset and held constant during the simulation, i.e., there
was no temporal or spatial variation in parameter values, and
population sizes were sufficiently large that stochastic changes in
allele frequencies could be ignored. Data were available for each
generation, 100 alleles (50 mosquitoes) were sampled each
generation (Equation 2) and the simulations were run until the
resistance allele frequency exceeded 0.99. Accuracy of analysis was
gauged by the correlation coefficient between true and estimated
parameter values, and by checking how frequently the true values
fell within the estimated 95% confidence intervals.
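One idealized dataset of this kind could be generated roughly as follows. This is a Python sketch, not the authors' R code: the binomial draw is done by summing Bernoulli trials, and the seed and the function name `simulate_dataset` are arbitrary choices made here.

```python
import random


def next_frequency(p, s, h):
    """One generation of selection with fitnesses SS = 1, SR = 1+hs, RR = 1+s."""
    q = 1.0 - p
    mean_w = p * p * (1 + s) + 2 * p * q * (1 + h * s) + q * q
    return (p * p * (1 + s) + p * q * (1 + h * s)) / mean_w


def simulate_dataset(p0, s, h, n_alleles, rng, stop_at=0.99):
    """Sample n_alleles every generation until the true frequency exceeds stop_at."""
    p, data = p0, []
    while True:
        # Binomial draw: count sampled alleles that are resistant.
        x = sum(1 for _ in range(n_alleles) if rng.random() < p)
        data.append((x, n_alleles))
        if p > stop_at:
            return data
        p = next_frequency(p, s, h)


rng = random.Random(2011)       # arbitrary seed for reproducibility
p0 = rng.uniform(0.01, 0.04)    # parameter ranges as given in the text
h = rng.uniform(0.3, 0.8)
s = rng.uniform(0.1, 0.3)
data = simulate_dataset(p0, s, h, n_alleles=100, rng=rng)
print(len(data), data[:3], data[-1])
```

Repeating this for 100 parameter draws and fitting each dataset would give the pairs of true and estimated values needed for the correlation and coverage checks.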
Next we examined the impact of suboptimal datasets. Equation
1 was used to predict allele frequency for 120 generations and we
assumed that 100 alleles were sampled in each generation. Two
optimal datasets with different dominance values were produced to
check if the ML method accurately recovered the parameters
when data from all generations was available (as above) and to
investigate the effect of different degrees of dominance on
estimations. Subsets of the data were used to examine the
influence of incomplete sampling when only a few generations of
data are available, or when only the initial stages of spread are
available for analysis.
Field data, collected and analyzed by Garcia et al., were
available for analysis. There were a total of 78 field collections
(some as much as 2000 km apart) containing 3,808 Aedes aegypti.
Each mosquito was genotyped at the Ile1,016 locus. We pooled
data from the different locations and analysed them assuming different
numbers of generations of mosquitoes per year (6, 9, 12, 16 and 20),
to check the consistency of the estimations. Intuitively, we would
expect spatial and temporal variation in the selection parameter in
the Garcia et al. dataset and in many other datasets obtained under
field conditions. It was therefore vital to ascertain how heteroge-
neity would affect the algorithm’s ability to recover the underlying
parameters from pooled data. Spatial heterogeneity was investi-
gated by simulating allele frequencies for 80 different locations
over 50 generations using Equation 1; 100 alleles were sampled
from each generation (Equation 2) and data from each generation
in each location were used in the analyses. Parameters p0 and h
were randomly selected from a uniform probability distribution
(p0 ~ U(0,1), h ~ U(0,1)) while s was randomly drawn from a normal
distribution (s ~ N(0.15, 0.025)); the resulting values of s are
within a reasonable range for a field setting. Once selected for a
location, the values of h and s did not change, i.e., there was no
temporal heterogeneity. Two simulation strategies were used: (i) p0
and h were allowed to vary while s was held constant at 0.1, 0.3,
0.6, 0.8 or 1; (ii) p0 and s were allowed to vary while h was held
constant at 0, 0.25, 0.5, 0.75 or 1. The data across the simulated
locations were pooled for analyses. Each simulation strategy was
run 300 times for each fixed value, giving a total of 300 × 5 = 1500 runs per strategy. The
mean value of each parameter over all simulated locations was
assumed to be the true value and the accuracy of the program was,
as before, gauged by the correlation coefficient between the
estimated and true values, and by the proportion of the true values
falling within the 95% CI.
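A sketch of how such a pooled, spatially heterogeneous dataset might be built is shown below. Two points are assumptions: 0.025 is treated as the standard deviation of the normal distribution for s (the text does not say whether it is a standard deviation or a variance), and "50 generations" is read as 50 sampled time points. The function and variable names are illustrative.

```python
import random


def next_frequency(p, s, h):
    """One generation of selection with fitnesses SS = 1, SR = 1+hs, RR = 1+s."""
    q = 1.0 - p
    mean_w = p * p * (1 + s) + 2 * p * q * (1 + h * s) + q * q
    return (p * p * (1 + s) + p * q * (1 + h * s)) / mean_w


def pooled_spatial_dataset(n_locations, generations, n_alleles, rng,
                           fixed_s=None, fixed_h=None):
    """Simulate independent locations and pool the allele counts per generation."""
    pooled = [[0, 0] for _ in range(generations)]  # [resistant alleles, total alleles]
    draws = []
    for _ in range(n_locations):
        p = rng.uniform(0.0, 1.0)
        h = rng.uniform(0.0, 1.0) if fixed_h is None else fixed_h
        s = rng.gauss(0.15, 0.025) if fixed_s is None else fixed_s  # sd assumed
        draws.append((p, h, s))
        for t in range(generations):
            x = sum(1 for _ in range(n_alleles) if rng.random() < p)
            pooled[t][0] += x
            pooled[t][1] += n_alleles
            p = next_frequency(p, s, h)
    # The mean over locations is what the text treats as the "true" value.
    means = {name: sum(d[i] for d in draws) / n_locations
             for i, name in enumerate(("p0", "h", "s"))}
    return pooled, means


# Strategy (i): p0 and h vary between locations, s held constant at 0.3.
pooled, means = pooled_spatial_dataset(80, 50, 100, random.Random(42), fixed_s=0.3)
print(pooled[0], pooled[-1], means)
```

Because each location starts from a different p0, the pooled frequency at generation 0 is already an average of very different local frequencies, which hints at why a single trajectory fits such data poorly.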
The effect of temporal heterogeneity on estimations was also
investigated by varying s and h over 50 generations in a single
location, i.e., different s and h values in different generations. The
distributions of values were the same as those used for spatial
heterogeneity. Three scenarios were considered: (i) s and h both
varied over generations, (ii) h could vary while s was held constant,
(iii) s could vary while h was held constant. In the simulations of
spatial heterogeneity the values of h and s had to be fixed across
locations (e.g. h=0, 0.25, 0.5, 0.75 or 1) but in the simulation of
temporal heterogeneity only one location was examined in each
simulation so the values could be drawn from the underlying
distributions. As before, 300 datasets were produced for each
scenario but because the fixed values of h and s could be drawn
from a distribution, the total number of runs was 300 × 3 = 900. As
before, the performance of the algorithm under conditions of
temporal heterogeneity was assessed by defining the true value as
the mean over the generations, and calculating the correlation
coefficient between the estimated and true values and how
frequently the true value was included in the 95% confidence interval.
Finally, it is important to note two features of our analyses that
may not be obvious to non-specialists. Firstly, that the genetic
parameters h and s describe the overall, net rate of spread of
resistance alleles through natural populations and cannot formally
distinguish where selection is acting. For example, they cannot
determine whether selection was acting differentially on the adult
or larval stages, whether fitness costs were associated with
resistance, whether there was differential selection on the sexes,
nor whether killing was likely to be in early or later adult stages,
the latter being a topic of contemporary interest given the recent
suggestion by Koella and colleagues  that killing older adults
will reduce the selective pressures for insecticide resistance.
Secondly, the analyses were designed to recover the genetic
parameters that resulted from past control programs and, as such,
they cannot explore the issue of how differing patterns of
insecticide deployment drive resistance. This requires a separate,
formal modelling approach explicitly designed to investigate the
differing impact of deployment strategies on driving resistance.
Such analyses have been described elsewhere, particularly for
agricultural pesticides [12–15].
The analysis of idealized datasets (Table 1) suggests ML can
accurately recover the underlying parameter values from optimal datasets.
Figure 2 shows six example simulations of the increase in
resistance allele frequencies over 120 generations, under two
dominance conditions (semi-recessive, h=0.2 bottom panel, and
semi-dominant, h=0.8 top panel). Values of p0, s, and h appear in
Table 2. The program appears less accurate when analysing
subsets of the original data, particularly if the resistance allele is
semi-recessive. When the resistance allele was semi-dominant,
resistance increased rapidly and the true estimates were recovered
if the subset included points that captured the pattern of increase,
such as Subset 1. When the resistance allele was semi-recessive,
the frequency was maintained at low levels for a long period and the
true parameter values were either recovered (Subset 1), but within
confidence intervals that were so large as to be uninformative, or
were not even contained in the confidence intervals (Subset 2), even
with the inclusion of the last generation, the only sampling point in
the subset that captures the incipient frequency rise. The ML
parameter estimates in Table 2 were achieved in 34 to 83% of the
1000 ML runs indicating that a significant proportion of the
estimation routines converged onto local maxima.
Analysis of the Aedes aegypti dataset resulted in the parameter
estimates in Table 3. These ML estimations were obtained
assuming 6, 9, 12, 16 and 20 generations per year. The estimation
converged on the same ML value in 76 to 96% of the runs, with a
small percentage of the runs (0.005 to 0.16%) converging to a set of
different parameters with a similar likelihood value but which were
excluded because the estimated value of h was on the boundary of
the constrained parameter range. The estimates of p0 and h were
highly consistent irrespective of the assumed number of generations
per year and ranged from 0.0032 to 0.0035 and from 0.77 to 0.78,
respectively. As expected, s was strongly dependent on the
assumed number of generations per year and ranged from 0.042 to
Results from spatially heterogeneous datasets pooling data from
80 different locations are shown in Figures 3 and 4. The algorithm
appears unable to consistently obtain accurate estimations of the
parameters s and h under such heterogeneous settings, manifested
by low values of correlation coefficients and many true values
outside the 95% confidence interval of the estimate. For example,
with the dominance estimations in Figure 3 when selection was
constant at 0.6, only 12% of the true values fell within the
confidence interval. Initial frequency values were accurately
recovered in all simulated scenarios, possibly due to the
dependency of the recursion on the initial frequency. However, the
estimation of selection and dominance coefficients was achieved
with very low values of correlation coefficients between the
estimates and the mean of the parameter over the 80 simulated
locations (i.e., it was not very precise), in all the different hypothetical scenarios.
Additionally, if most of the values were in the confidence
interval, the mean range of the interval was as wide as the
parameter range. For example in Figure 4 note that when
dominance is constant at 0.75, 100% of the true values are in the
confidence interval, but the average mean range is 0.86 (the
parameter range is 0–1). The plotted simulated data and estimates
do not traverse the entire range of the parameters values because
they are the mean over the 80 locations, the central limit theorem
predicts that these estimates will converge to the center of the
Simulations of a location with temporally heterogeneous selection
pressure (dominance and/or selection changing in every generation)
are shown in Figure 5. Again, the model does not accurately
recover the true parameters under conditions of temporal
heterogeneity. The exception was the dominance parameter when
it was held constant in a particular location with selection varying
in each generation, the correlation coefficient between the
estimate and the mean dominance value over the 80 locations
Insecticide resistance research is largely focused on the
identification of the mechanisms responsible for resistance, and
whether the genetic mechanism is monogenic or polygenic,
general or population specific and if there are associated fitness
costs and developmental patterns . The emergence and spread
of resistance is well documented, but there is still a worrying lack of
quantification of the evolution dynamics in populations under
control  and its persistence in populations following cessation
of control. The quantification of the strength of selection acting in
Table 1. Details of 100 idealized simulated datasets.
                   p0          h          s
Parameter range    0.01–0.04   0.2–0.8    0.1–0.3
r*                 0.94        0.99       0.99
TV (%)*            91          92         97
[ ]*               0.021       0.014      0.002
The simulated datasets were used to check the precision and accuracy of the
*r correlation coefficient between original value and estimate, TV percentage of
true values in the estimates 95% confidence interval and [ ] mean range value of
the confidence interval.
the wild has previously been attempted using direct laboratory and
field trials, and indirect approaches using a variety of data,
including patterns of DNA variability and spatial and temporal
changes in allele frequencies [17,18]. Selection acting on
insecticide resistance genes in the field was first estimated using
genetic models for species in the genus Anopheles by Curtis et al.
and Wood and Cook; both were based on the observed
changes in gene frequency over regular intervals and the latter also
discussed estimation by deviations from the expected Hardy-
Weinberg equilibrium frequencies. Both methods assumed a fixed
level of effective dominance under field conditions. A recent
example is the estimation of relative fitness by Livingston and
Fackler for pyrethroid resistance in insects that infest crops. In
this case the magnitude of the estimates was similar to that
obtained using traditional direct laboratory approaches using nonlinear
least squares estimation. The most refined work that we are
Figure 2. Simulated evolution of resistance allele frequency over 120 generations under two different scenarios of dominance
relationship and analysing the full dataset or subsets of data. Specifications in Table 2.
Table 2. Specifications of datasets of Figure 2.
Dataset   Sampled generations       True p0  True h  True s  Estimate p0 [95% CI]       Estimate h [95% CI]  Estimate s [95% CI]
Complete  1:120                     0.001    0.2     0.2     0.0010 [0, 0.0018]         0.17 [0.13, 0.38]    0.24 [0, 0.48]
                                    0.001    0.8     0.2     0.0009 [0.0005, 0.0013]    0.81 [0.77, 0.84]    0.20 [0.19, 0.21]
Subset 1  1:5,15:18,116:120         0.001    0.2     0.2     0.0025 [0, 0.0062]         0.45 [0, 1.5]        0.07 [0, 1]
                                    0.001    0.8     0.2     0.0016 [−0.0017, 0.0049]   0.77 [0.36, 1.18]    0.19 [0.08, 0.29]
Subset 2  1,13,25,37,73,85,97,120   0.001    0.2     0.2     0.0014 [0, 0.0030]         0.02 [0, 0.18]       1.00 [0.96, 1]
                                    0.001    0.8     0.2     0.0009 [−0.0007, 0.0025]   0.80 [0.64, 0.96]    0.20 [0.16, 0.25]
Sampled generations and true parameter values and ML parameter estimates with respective 95% confidence intervals.
aware of, quantifies selection coefficients and costs associated with
resistance for Culex pipiens in Southern France, using spatial
information from clines to estimate selective advantages and costs,
and temporal information from a long-term survey to estimate the
selection coefficients of alleles in each environment using a
standard ML estimation approach [22,23].
We have described a ML method for simultaneously estimating
the selection and dominance coefficients and an initial resistance
allele frequency similar to that of , but we also tackled the
effects of spatial and temporal differences in selection intensity that
can arise as a result of different strategies of deployment of the
insecticide, migration patterns and/or infrequent and sparse field
sampling of mosquitoes. The approach described in this paper was
accurate with simulated data but proved less robust when
analysing few intermediate allele frequencies, especially when the
resistance allele is recessive. The reason is that all resistance
dynamics start from the same point (very low frequency) and end
at the same point (very high frequencies) but in the absence of
intermediate time points it is impossible to reconstruct the
dynamics in between. If the sampling period covers only the
onset of resistance or the final stages, when resistance is close to
fixation, the accurate estimation of selection and dominance
coefficients can be difficult. The estimation is problematic because
in the early stages heterozygotes prevail in the population, with a
fitness Wrs=1+hs for which there is a range of values of h and s
that yield the same product hs. This is illustrated using the subsets in
Figure 2. For subset 1, the true values of h and s were both 0.2 but the
estimates were 0.45 and 0.07, respectively (Table 2).
Nevertheless, the calculated heterozygote fitness (1+hs) is very similar (1.04 and
1.03). The situation was even
worse for subset 2 of the data (Figure 2) where the analysis inferred
a completely different trajectory of resistance spread and the true
values of h=s=0.2 were estimated as h=0.02 and s=1.0
(Table 2). Once again, note that the fitness of the heterozygote
was estimated as 1+hs=1.02, which was relatively close to the true
value of 1.04, and that it is the fitness of the resistant homozygotes,
which were largely absent from this subset of the data, that was
badly estimated (as Wrr=2.0 rather than the true value of 1.2).
On the other extreme, when resistance is almost fixed, there
will be mainly homozygotes in the dataset (with fitness Wrr=1+s),
so estimating a dominance value will also be problematic because
of the lack of heterozygotes (with fitness Wrs=1+hs). Unfortunate-
ly, this is a very common type of data where genetic surveys
initially indicate resistance was absent then, once its presence was
detected, a second survey was run and higher levels were detected.
Our analyses indicate that it is highly unlikely that any robust
genetic parameters can be obtained from this kind of fragmented
datasets. Future surveillance surveys should consequently be
optimized by choice of a proper sampling strategy and timeframe.
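The identifiability problem discussed above for the semi-recessive subsets can be made concrete with a few lines of arithmetic. The sketch below takes the true values and the Subset 1 and Subset 2 estimates from Table 2, reading the estimate columns in the order p0, h, s (an assumption about the flattened table), and compares the implied genotype fitnesses. The function name `genotype_fitness` is illustrative.

```python
def genotype_fitness(h, s):
    """Fitness of each genotype under the model: SS = 1, SR = 1 + h*s, RR = 1 + s."""
    return {"SS": 1.0, "SR": 1.0 + h * s, "RR": 1.0 + s}


cases = [
    ("true values", 0.2, 0.2),
    ("Subset 1 estimate", 0.45, 0.07),
    ("Subset 2 estimate", 0.02, 1.0),
]
for label, h, s in cases:
    w = genotype_fitness(h, s)
    print(f"{label}: W_SR = {w['SR']:.4f}, W_RR = {w['RR']:.2f}")
```

The heterozygote fitness is nearly the same in all three cases (1.04, about 1.03 and 1.02), while the homozygote fitness ranges from 1.07 to 2.0. Data dominated by heterozygotes therefore constrain the product hs but say little about h and s separately.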
It is therefore of extreme importance to sample as many
generations as possible, even if it means collecting fewer
individuals. There is an important difference between standard
statistics and ML estimation. In standard statistics, the 95% CI
should capture the likely variation in magnitude of parameter
estimates. In ML it only captures the likely variation provided the
model has identified the correct trajectory of allele frequency
changes. This is problematic in incomplete datasets where many
trajectories may provide similar fits to the observed data. It is
absolutely essential to run numerous analyses from randomly
selected starting parameter values to check for the presence of
numerous trajectories of similar ML but with widely different
Pooling data from different locations can be seen as a
reasonable option to minimize the lack of sampled generations
and small sample size. The Ile1,016 mutation frequency dataset of
Garcia et al.  provided the opportunity to apply the model to
real field data. These data contain allele frequencies of mosquitoes
collected in 78 different locations around Mexico since 1999.
Insecticide use was not uniform across cities and towns in Mexico
and probably differed between years; in addition, migration
probably led to different initial resistance allele frequencies among locations.
The estimates obtained from simulated pooled data demonstrated
that this kind of data-pooling, which is probably inevitable in most
surveys, is not very robust. The coefficients reported in Table 3
should simply be regarded as rough estimates for the years
1999 to 2008 that may vary, albeit by an unknown
amount, over time and space.
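A small sketch of why pooling can mislead, again assuming the textbook selection recursion as a stand-in for Equation 1. Two hypothetical locations share p0 and h but differ in s; their pooled frequency (equal sample sizes assumed) is compared with the single trajectory generated by the mean selection coefficient. All numerical values here are illustrative assumptions.

```python
def next_freq(p, h, s):
    """One generation of selection (Wss = 1, Wrs = 1 + h*s, Wrr = 1 + s)."""
    q = 1.0 - p
    w_rs, w_rr = 1.0 + h * s, 1.0 + s
    mean_w = p * p * w_rr + 2.0 * p * q * w_rs + q * q
    return p * (p * w_rr + q * w_rs) / mean_w

def trajectory(p0, h, s, generations):
    """Allele frequencies for generations 0..generations."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(next_freq(freqs[-1], h, s))
    return freqs

# Hypothetical settings: two locations with different selection coefficients.
P0, H = 0.01, 0.5
site_s = [0.05, 0.25]
GENERATIONS = 30

site_trajs = [trajectory(P0, H, s, GENERATIONS) for s in site_s]

# Pooled frequency, assuming equal numbers of mosquitoes from each location.
pooled = [sum(ps) / len(ps) for ps in zip(*site_trajs)]

# Trajectory produced by the mean selection coefficient across locations.
mean_s = sum(site_s) / len(site_s)
mean_param = trajectory(P0, H, mean_s, GENERATIONS)

print("pooled frequency at generation 30:", pooled[-1])
print("mean-parameter frequency at generation 30:", mean_param[-1])
```

Because the pooled curve is not the curve generated by the mean parameters, a single (p0, h, s) fitted to pooled data need not recover the mean values, in line with the caution above.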
Equation 1 describes a highly idealized population, i.e., one
that is large, randomly mating, and homogenous in time and
space. It is therefore important to consider the extent to which
our population differs from this paradigm and what consequence
this may have for the results. A large population is required so
that we can ignore genetic drift, i.e., random fluctuations in allele
frequency around our predicted values. Drift is important in
laboratory studies (see  for discussion) but in natural
populations there is a consensus that genetic drift can be ignored
provided $4N_e\tilde{s} > 10$, where $N_e$ is effective population size  and
$\tilde{s}$ is the weighted mean fitness of the resistance heterozygotes and
homozygotes. Estimates of $N_e$ provided by  for Aedes aegypti
ranged from 10–22 in different regions of Mexico. These
estimates seem intuitively to be very small. The most likely
explanation is that they are measures of historical population size,
so may have been caused by founder effects and population
bottlenecks in the distant past. Estimates of contemporary
population sizes are more appropriate in the current context
and most estimates of contemporary effective population sizes of
vectors are much higher, for example, in the region of 1000+ for
Anopheles gambiae [27–29]. It would be possible to introduce the
effects of drift by simulating small population sizes and sampling
(with replacement) the parents of the next generation. However,
one of the key conclusions of this study is the difficulty of
Table 3. Estimated p0, h and s parameters from field data.
9 0.0033 0.0033 0.0033
16 0.0034 0.0034 0.0034
The dataset corresponds to field-collected data on Ile1,016 resistance allele
frequencies in Ae. aegypti from Mexico, assuming 6, 9, 12, 16 and 20
generations per year.
Estimating Selection from Mosquito Field Data
www.plosntds.org6November 2011 | Volume 5 | Issue 11 | e1387
obtaining good quality estimates of genetic parameters from field
data, so we prefer to ignore the effects of drift, and simply point
out that the stochastic variation introduced by drift will likely
further decrease our ability to recover accurate genetic parameter
values from field data.
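The drift extension described above (selection followed by sampling the parents of the next generation with replacement) could be sketched as follows. This is an illustration under stated assumptions, not the analysis used in the paper: the recursion is the textbook stand-in for Equation 1, the mean selective advantage `S_MEAN`, the dominance `H`, the starting frequency and the random seed are hypothetical, and only the effective population sizes (10, 22 and 1000) are taken from the text.

```python
import random

def next_freq(p, h, s):
    """One generation of selection (Wss = 1, Wrs = 1 + h*s, Wrr = 1 + s)."""
    q = 1.0 - p
    w_rs, w_rr = 1.0 + h * s, 1.0 + s
    mean_w = p * p * w_rr + 2.0 * p * q * w_rs + q * q
    return p * (p * w_rr + q * w_rs) / mean_w

def drift_trajectory(p0, h, s, ne, generations, rng):
    """Selection, then binomial sampling of 2*ne gametes each generation."""
    p = p0
    freqs = [p]
    for _ in range(generations):
        expected = next_freq(p, h, s)
        # Sample parents with replacement: each of the 2*ne gametes
        # carries the resistance allele with probability `expected`.
        copies = sum(1 for _ in range(2 * ne) if rng.random() < expected)
        p = copies / (2 * ne)
        freqs.append(p)
    return freqs

def drift_negligible(ne, s_mean, threshold=10.0):
    """Rule of thumb from the text: drift can be ignored if 4*Ne*s > 10."""
    return 4 * ne * s_mean > threshold

S_MEAN = 0.05   # hypothetical mean selective advantage
H = 0.77        # hypothetical dominance

for ne in (10, 22, 1000):
    print(ne, 4 * ne * S_MEAN, drift_negligible(ne, S_MEAN))

rng = random.Random(1)  # fixed seed so the run is repeatable
small_pop = drift_trajectory(0.1, H, S_MEAN, ne=22, generations=50, rng=rng)
large_pop = drift_trajectory(0.1, H, S_MEAN, ne=1000, generations=50, rng=rng)
```

With these assumed values the criterion fails for Ne of 10 or 22 and holds for Ne of 1000, consistent with the argument that contemporary population sizes are the relevant ones.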
The second requirement, that mating occurs at random, is
unlikely to be true given the geographical scale of our surveys. It
would be relatively straightforward to incorporate this effect by
including Wright's F statistics in Equation 1. However, there was
no evidence of significant departure from Hardy-Weinberg in our
Figure 3. Effect of spatial heterogeneity (pooled data from 80 simulated locations) on estimates of initial allele frequency,
dominance and selection parameters. The value of the selection coefficient was held constant at 0.1, 0.3, 0.6, 0.8 or 1 in all locations and in every
generation, hence there are five rows of results corresponding to each of the 5 values of the selection coefficient. Dominance ($h \sim U(0,1)$) varied
between simulated locations, but was constant over time within each location. The true value is the mean parameter value over all locations. The
Pearson correlation coefficient (r) is between estimated and true values. TV is the percentage of the true values that are included in the 95%
confidence interval of the estimate. [ ] is the mean width of the 95% confidence interval in all runs.
dataset (results not shown) so this strategy was not required. The
assumption that the population is homogenous in space is clearly
untrue. Pooling of data from different regions was required to
increase sample size and frequency because mosquito collections
were not uniform at the same location. The simulation results
demonstrate the dangers of this approach, and work on malaria
vectors in Africa shows unpredictably high levels of heterogeneity
in resistance even across relatively small distances . As
mentioned by  simple models cannot account for the alteration
of selection pressure by long term changes in the environment.
More complex models that consider geographic clines and the
antagonistic effects of selection and migration should be more accurate,
Figure 4. Effect of spatial heterogeneity on estimates of initial allele frequency, dominance and selection parameters. The value of
the dominance coefficient was held constant at 0, 0.25, 0.5, 0.75 or 1 in all locations and in every generation, hence there are five rows of results
corresponding to each of the 5 values of the dominance coefficient. The value of the selection coefficient ($s \sim N(0.15, 0.025)$) varied between locations,
but was held constant over time in each location. The true value is the mean parameter value over all locations. The Pearson correlation coefficient (r)
is between estimated and true values. TV is the percentage of the true values that are included in the 95% confidence interval of the estimate. [ ] is
the mean width of the 95% confidence interval in all runs.
but the amount of data necessary makes their implementation
unlikely in most settings. The present model, in its simplicity, offers a
straightforward way to obtain estimates of fitness parameters. The
fact that only information about resistant allele frequencies is
necessary should make it easier to apply, and yet even such a
simple data design is difficult to implement.
Nevertheless, the estimated value for p0 (0.0032–0.0035) was in
the higher range of $10^{-2}$ to $10^{-13}$ expected when a pesticide is
first introduced, based on mutation-selection equilibrium .
This initial p0 value reflects the frequency prior to the sampling
period. Since 1950, vector control programs in Mexico have used
a series of insecticides. DDT was used extensively for indoor
Figure 5. Effect of temporal heterogeneity on estimates of initial allele frequency, dominance and selection coefficients
parameters. Three different scenarios were simulated: (A) dominance and selection are different in every generation, (B) selection coefficient was
held constant in all generations but dominance was allowed to vary, (C) dominance was held constant in all generations while the selection
coefficient was allowed to vary. The Pearson correlation coefficient (r) between estimate and true value is shown. TV refers to the percentage of the
true values that are included in the 95% confidence interval of the estimate. [ ] is the mean range of the 95% confidence interval in all runs.
house spraying from 1950–1960 and was still used in some
locations up until 1998. Malathion was later used for ultra-low
volume space spraying of wide areas from 1981 to 1999. In 2000,
programs switched to permethrin based insecticides . The
spread of resistance genes in a treated region will depend on the
initial resistant allele frequency and it is known that resistance
development in pest organisms can occur within 5–100
generations . The relatively high initial frequency estimated
explains the immediate, dramatic increase in frequencies of
Ile1,016 from the late 1990s to 2006–2008  neglecting genetic
As expected, the estimated strength of selection increased as the
number of generations per year decreased, because there were
fewer generations in which to reach the same frequency of resistance allele.
Selection coefficients ranging from 0.042 to 0.053 (assuming 20
and 16 generations per year) are similar to the selection
coefficients of DDT and dieldrin resistant phenotypes in
Anopheles mosquitoes that have been previously estimated to
be on the order of 0.013–0.061 . The values of 0.071 and
0.097 (12 and 9 generations per year) are in the range of what
was estimated for antimalarial drug resistance: 0.05–0.1 .
However, the value of 0.147 for 6 generations per year was higher than
any previous estimates. This is the first time selection for
insecticide resistance has been quantified in this species and
should be seen as a preliminary estimate.
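The inverse relationship between the assumed number of generations per year and the estimated selection coefficient can be checked with simple arithmetic on the values quoted above. The sketch below is only a reading aid: it shows that the selection accumulated per year, expressed either as g·s or as g·ln(1+s), is roughly constant across the five assumptions.

```python
import math

# Selection coefficients quoted in the text, keyed by the assumed
# number of generations per year.
estimates = {6: 0.147, 9: 0.097, 12: 0.071, 16: 0.053, 20: 0.042}

per_year_linear = {}
per_year_log = {}
for g, s in sorted(estimates.items()):
    per_year_linear[g] = g * s                 # first-order yearly selection
    per_year_log[g] = g * math.log(1.0 + s)    # yearly log fitness of homozygotes
    print(g, round(per_year_linear[g], 3), round(per_year_log[g], 3))
```

Fewer generations per year therefore require a proportionally larger per-generation coefficient to explain the same observed rise in frequency.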
The estimated values of h, 0.77 to 0.78, point to partial
dominance of the resistance allele under field settings. Alleles
conferring knockdown resistance were found to be
recessive or semi-recessive in their influence in Anopheles gambiae
s.s. , but there is strong evidence for partial dominance or
additive effects of Ile1,016 from two laboratory studies of
knockdown and survival in strains or families of Aedes aegypti
segregating for the Ile1,016 allele. Saavedra-Rodriguez et al.
 found that 127 of 221 heterozygotes recovered from
permethrin knockdown and showed later  that when
considering overall survival the differences among the three
phenotypes appear additive. Dominance in the field
depends on the concentration and decay of the insecticide
(see Figure 1). As the insecticide decays to concentrations that
heterozygotes survive, the resistant allele becomes
effectively dominant, and we think that our results of
intermediate dominance of Ile1,016 reflect this effect .
This interpretation is supported by Roush and Tabashnik ,
who reported the same situation of partial dominance for
cyclodienes and lindane, diazinon, malathion and also for
pyrethroids, where 20–60% of the heterozygotes survived
exposure in a field setting. There is ongoing debate about
differences between laboratory and field settings that extends
to the evolution of insecticide resistance itself, with some suggesting
that resistance in the field tends to be based on an allele of
major effect at a single locus whereas resistance obtained in the
laboratory is usually polygenically based . Our results show
rapid selection of mutations at a single locus.
The number of generations under natural conditions for this
species was estimated at 20 or more among strains in field
conditions in Brazil. This leads us to consider the results with the
highest number of generations per year as the most likely, but because Aedes
aegypti eggs can survive desiccation for months and hatch once
submerged in water , the number of generations is variable.
Nevertheless, the predicted resistance frequency trajectories obtained using
Equation 1 and the different estimates obtained assuming different
generations per year will be approximately the same in a timescale
of 20 years.
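The similarity of the predicted trajectories can be illustrated with a short sketch. It assumes the textbook selection recursion as a stand-in for Equation 1 and, as a simplification, uses a single p0 (0.0033) and h (0.77) for all five cases; the selection coefficients are those quoted earlier for each assumed number of generations per year.

```python
def next_freq(p, h, s):
    """One generation of selection (Wss = 1, Wrs = 1 + h*s, Wrr = 1 + s)."""
    q = 1.0 - p
    w_rs, w_rr = 1.0 + h * s, 1.0 + s
    mean_w = p * p * w_rr + 2.0 * p * q * w_rs + q * q
    return p * (p * w_rr + q * w_rs) / mean_w

# Simplifying assumption: the same p0 and h for every case.
P0, H = 0.0033, 0.77
estimates = {6: 0.147, 9: 0.097, 12: 0.071, 16: 0.053, 20: 0.042}
YEARS = 20

yearly = {}
for g, s in estimates.items():
    p = P0
    freqs = [p]
    for _ in range(YEARS):
        for _ in range(g):          # g generations elapse per year
            p = next_freq(p, H, s)
        freqs.append(p)             # record frequency at each year end
    yearly[g] = freqs

# Largest disagreement between the five predictions in any year.
spread = [max(yearly[g][y] for g in yearly) - min(yearly[g][y] for g in yearly)
          for y in range(YEARS + 1)]
print("largest between-assumption spread:", max(spread))
for g in sorted(yearly):
    print(g, round(yearly[g][10], 3), round(yearly[g][20], 3))
```

Under these assumptions the yearly frequencies stay close to one another, so uncertainty in the number of generations per year matters less for prediction on a calendar timescale than for the per-generation coefficients themselves.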
Most mutations encoding insecticide resistance are expected to
incur a fitness penalty, compared to unmutated genes, in the
absence of insecticide. There is some field evidence of reduced
fitness of Ile1,016 mutations in Aedes aegypti in permethrin free
environments  which leads us to make two technical points.
Firstly, the selection and dominance coefficients reported here
are overall values that combine the mutation's benefit when
encountering insecticide and any fitness effect in insecticide-free
areas. Secondly, the method can equally be applied to measure
negative selection pressures (i.e., when a mutation is being lost
from a population) from field data on the mutation after insecticide
Two factors are of particular relevance to field work. Firstly,
surveillance needs to be continuous so that a full dataset
covering the whole period of resistance spread becomes available
upon which to base these estimates. This may mean monitoring
sentinel sites for long periods when resistance is rare or absent,
but a continuous dataset is a prerequisite for accurately
estimating the dynamics underlying the spread of resistance.
Note that a continuous dataset does not necessarily mean
collecting samples every generation. The analysis
could fail to recover the true parameters (Table 2) because of
large gaps in the survey: simulations of semi-dominant mutations
lacked samples from periods of intermediate frequency, while
simulations of semi-recessive mutations only contained data from
the early stages (Figure 2). Operationally, this suggests that
regular, rather than intensive but periodic, sampling is the best
strategy. As an example, we re-analysed the semi-dominant
dataset but incorporated samples only every 10 generations, i.e., at
generations 1, 10, 20, 30, ..., 120. This resulted in estimates of
p0=0.0006 (95% CI: 0.0001–0.0010), h=0.89 (95% CI: 0.84–
0.95), s=0.20 (95% CI: 0.18–0.21) which are similar to those
obtained using data from all generations (Table 2, the dominance
coefficient is higher but the confidence intervals overlap). The
second point is that dominance levels acting in the field may be
much higher than those observed in the laboratory. The most
plausible explanation is that mosquitoes in the wild are
encountering low levels of insecticide that are insufficient to kill
heterozygotes. Increasing dominance greatly increases the rate at
which resistance develops. This suggests that insecticide applications
should be carried out in a way that ensures high coverage
with high doses. Our results suggest that the doses being applied
may be inadequate and that continuing with the current deployment
settings will lead to a rapid increase in resistant mosquitoes and
eventually to the complete ineffectiveness of permethrin in the
combat of dengue in Mexico.
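The effect of dominance on the speed of resistance spread can be made concrete with a minimal sketch, assuming the textbook selection recursion as a stand-in for Equation 1. The starting frequency (0.0033) and selection coefficient (0.15) are illustrative choices, and the helper `generations_to_reach` is hypothetical.

```python
def next_freq(p, h, s):
    """One generation of selection (Wss = 1, Wrs = 1 + h*s, Wrr = 1 + s)."""
    q = 1.0 - p
    w_rs, w_rr = 1.0 + h * s, 1.0 + s
    mean_w = p * p * w_rr + 2.0 * p * q * w_rs + q * q
    return p * (p * w_rr + q * w_rs) / mean_w

def generations_to_reach(target, p0, h, s, max_generations=100_000):
    """Number of generations until the allele frequency first reaches target."""
    p, gens = p0, 0
    while p < target and gens < max_generations:
        p = next_freq(p, h, s)
        gens += 1
    return gens

P0, S = 0.0033, 0.15   # illustrative values, not estimates from the field data
dominance_levels = (0.0, 0.25, 0.5, 0.75, 1.0)
results = {h: generations_to_reach(0.5, P0, h, S) for h in dominance_levels}

for h in dominance_levels:
    print("h =", h, "generations to 50% resistance:", results[h])
```

In this sketch a fully recessive allele takes far longer to reach 50% than a partially or fully dominant one, which is the reasoning behind favouring doses high enough to kill heterozygotes.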
The authors would like to thank Hilary Ranson and Tiago Antão for
providing useful comments on earlier versions of the manuscript, and three
anonymous reviewers for helpful comments and discussion regarding the
Conceived and designed the experiments: SB IH. Performed the
experiments: SB. Analyzed the data: SB IH. Contributed reagents/
materials/analysis tools: SB IH WCBIV. Wrote the paper: SB IH.
1. WHO (2009) World malaria report 2009. pp 1–78.
2. Devine G, Perea E, Killeen G, Stancil J, Clark S, et al. (2009) Using adult
mosquitoes to transfer insecticides to aedes aegypti larval habitats. Proc Natl
Acad Sci USA 106: 11530–11534.
3. Chandre F, Darriet F, Duchon S, Finot L, Manguin S, et al. (2000)
Modifications of pyrethroid effects associated with kdr mutation in anopheles
gambiae. Med Vet Entomol 14: 81–88.
4. Fernandes C (2006) Biologia e exigências térmicas de aedes aegypti (l.) (diptera:
Culicidae) provenientes de quatro regiões bioclimáticas da paraíba. Neotrop
Entomol 35: 853–860.
5. DuMouchel W, Anderson W (1968) The analysis of selection in experimental
populations. Genetics 58: 435.
6. García GP, Flores AE, Fernández-Salas I, Saavedra-Rodríguez K, Reyes-Solis G,
et al. (2009) Recent rapid rise of a permethrin knock down resistance
allele in aedes aegypti in méxico. PLoS Negl Trop Dis 3: e531.
7. Fisher R, Bennett J (1999) The genetical theory of natural selection: a complete
variorum edition Oxford University Press. 318 p.
8. Smith JM (1998) Evolutionary genetics Oxford University Press. 354 p.
9. R Development Core Team (2010) R: A Language and Environment for
Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.
URL http://www.R-project.org. ISBN 3-900051-07-0.
10. Myung I (2003) Tutorial on maximum likelihood estimation. J Math Biol 47:
11. Koella JC, Lynch PA, Thomas MB, Read AF (2009) Towards evolution-proof
malaria control with insecticides. Evol Appl 2: 469–480.
12. Scott M, Diwell K, McKenzie J (2000) Dieldrin resistance in lucilia cuprina (the
australian sheep blowfly): chance, selection and response. Heredity 84: 599–604.
13. McKenzie J, Yen J (1995) Genotype, environment and the assymetry phenotype.
dieldrin-resistance in lucilia cuprina (the australian sheep blowfly). Heredity 75:
14. Hoy M (1998) Myths, models and mitigation of resistance to pesticides. Phil
Trans R Soc Lond B 353: 1787–1795.
15. Lenormand T, Raymond M (1998) Resistance management: the stable zone
strategy. Proc R Soc Lond B 265: 1985–1990.
16. Roush R, McKenzie J (1987) Ecological genetics of insecticide and acaricide
resistance. Annu Rev Entomol 32: 361–380.
17. Endler J (1986) Natural selection in the wild Princeton University Press. 336 p.
18. Linnen CR, Hoekstra HE (2009) Measuring natural selection on genotypes and
phenotypes in the wild. Cold Spring Harb Symp Quant Biol 74: 155–68.
19. Curtis C, Cook L, Wood R (1978) Selection for and against insecticide resistance
and possible methods of inhibiting the evolution of resistance in mosquitoes. Ecol
Entomol 3: 273–287.
20. Wood R, Cook L (1983) A note on estimating selection pressures on insecticide-
resistance genes. Bull World Health Organ 61: 129–134.
21. Livingston M, Carlson G, Fackler P (2002) Use of mathematical models to
estimate characteristics of pyrethroid resistance in tobacco budworm and
bollworm (lepidoptera : Noctuidae) field populations. J Econ Entomol 95:
22. Labbé P, Sidos N, Raymond M, Lenormand T (2009) Resistance gene
replacement in the mosquito culex pipiens: Fitness estimation from long-term
cline series. Genetics 182: 303–312.
23. Lenormand T, Bourguet D, Guillemaud T, et al. (1999) Tracking the evolution
of insecticide resistance in the mosquito culex pipiens. Nature 400: 861.
24. Templeton A (1974) Analysis of selection in populations observed over a
sequence of consecutive generations. Theor Appl Genet 45: 179–191.
25. Ohta T (1973) Slightly deleterious mutant substitutions in evolution. Nature 246:
26. Gorrochotegui-Escalante N, Gomez-Machorro C, Lozano-Fuentes S, Fernandez-Salas I,
Munoz M, et al. (2002) Breeding structure of aedes aegypti
populations in mexico varies by region. Am J Trop Med Hyg 66: 213–222.
27. Lehmann T, Hawley WA, Grebert H, Collins FH (1998) The effective
population size of anopheles gambiae in kenya: implications for population
structure. Mol Biol Evol 15: 264–76.
28. Pinto J, Donnelly MJ, Sousa CA, Malta-Vacas J, Gil V, et al. (2003) An island
within an island: genetic differentiation of anopheles gambiae in são tomé, west
africa, and its relevance to malaria vector control. Heredity 91: 407–14.
29. Solano P, Ravel S, Bouyer J, Camara M, Kagbadouno MS, et al. (2009) The
population structure of glossina palpalis gambiensis from island and continental
locations in coastal guinea. PLoS Negl Trop Dis 3: e392.
30. Ranson H, Abdallah H, Badolo A, Guelbeogo WM, Kerah-Hinzoumbe C, et al.
(2009) Insecticide resistance in anopheles gambiae: data from the first year of a
multi-country study highlight the extent of the problem. Malar J 8: 299.
31. Roush RT, Tabashnik BE (1990) Pesticide resistance in arthropods Chapman
and Hall. 303 p.
32. Flores AE, Grajales JS, Salas IF, Garcia GP, Becerra MHL, et al. (2006)
Mechanisms of insecticide resistance in field populations of aedes aegypti (l.)
from quintana roo, southern mexico. J Am Mosq Control Assoc 22: 672–7.
33. Hastings I, Donnelly M (2005) The impact of antimalarial drug resistance
mutations on parasite fitness, and its implications for the evolution of resistance.
Drug Resistance Updates 8: 43–50.
34. Hemingway J, Ranson H (2000) Insecticide resistance in insect vectors of human
disease. Annu Rev Entomol.
35. Saavedra-Rodriguez K, Urdaneta-Marquez L, Rajatileka S, Moulton M,
Flores A, et al. (2007) A mutation in the voltage-gated sodium channel gene
associated with pyrethroid resistance in latin american aedes aegypti. Insect Mol
Biol 16: 785–798.
36. Saavedra-Rodriguez K, Strode C, Suarez AF, Salas IF, Ranson H, et al. (2008)
Quantitative trait loci mapping of genome regions controlling permethrin
resistance in the mosquito aedes aegypti. Genetics 180: 1137–1152.
37. Hastings I, Watkins W (2006) Tolerance is the key to understanding antimalarial
drug resistance. Trends Parasitol 22: 71–77.
38. McKenzie J, Batterham P (1994) The genetic, molecular and phenotypic
consequences of selection for insecticide resistance. Trends Ecol Evolut 9:
39. Clements A (1992) The Biology of Mosquitoes: Development, nutrition, and
reproduction Chapman & Hall, URL http://books.google.com/books? id=