Copyright 1998 by the Genetics Society of America
Maximum Likelihood Estimation of Population Growth Rates
Based on the Coalescent
Mary K. Kuhner, Jon Yamato and Joseph Felsenstein
Department of Genetics, University of Washington, Seattle, Washington 98195
Manuscript received February 24, 1997
Accepted for publication January 2, 1998
We describe a method for co-estimating 4Neμ (four times the product of effective population size and neutral mutation rate) and population growth rate from sequence samples using Metropolis-Hastings sampling. Population growth (or decline) is assumed to be exponential. The estimates of growth rate are biased upwards, especially when 4Neμ is low; there is also a slight upwards bias in the estimate of 4Neμ itself due to correlation between the parameters. This bias cannot be attributed solely to Metropolis-Hastings sampling but appears to be an inherent property of the estimator and is expected to appear in any approach which estimates growth rate from genealogy structure. Sampling additional unlinked loci is much more effective in reducing the bias than increasing the number or length of sequences from the same locus.
THE genealogical structure of a sample from a population […] history. The distribution of coalescence times (times at which two of the sampled individuals have a common ancestor) depends on the effective population size Ne: in a diploid population the distribution is proportional to 4Ne. Since coalescence times cannot be directly observed but must be inferred from the accumulation of mutations, we rescale time proportional to the per-site neutral mutation rate μ. Thus, though we cannot estimate 4Ne directly, we can estimate the product 4Neμ, which we will call Θ.

If the population size has changed over time, the distribution of coalescence times will differ from its expectation in a population where Θ is constant, and in principle this should be detectable. In particular, if the population has been growing, the most rootward branches will be relatively short, whereas if it has been shrinking, the most rootward branches will be relatively long.

We have previously described a method for estimating Θ in a population of constant size (Kuhner et al. 1995), using Metropolis-Hastings sampling (Metropolis et al. 1953; Hastings 1970) of genealogies. The basic strategy is to sample genealogies based on their posterior probability with regard to the data and a trial value of Θ, and then use the sampled genealogies to evaluate the relative likelihood of other values of Θ. This importance sampling approach concentrates the sampled genealogies in regions of high posterior probability, which is much more efficient than using random genealogies and […]. This algorithm is implemented in our Coalesce program.
In this paper we extend the Metropolis-Hastings genealogy sampling approach to the case of a population experiencing exponential growth or decline. In this case population size is represented by two parameters: the exponential growth rate g and the present-day value of Θ (that is, the value at the time when the organisms were sampled). The parameters are not independent: the more rapidly a population has grown, the larger its current size is expected to be compared to its "average" size. We have written a program, Fluctuate, which implements this sampler.
Both analytic and simulation results show that the estimate of the growth rate g is biased upwards when a finite number of individuals are analyzed. At least two factors are at work in this bias: the nonlinear relationship between coalescence times and the estimate of g, and truncation of the coalescent distribution, in genealogies of finite numbers of individuals, by the bottommost coalescence. There is also a smaller upwards bias in the estimate of Θ, due to correlation between the parameters. The bias in these estimators can most effectively be reduced by sampling multiple loci.
[…] for estimation of growth rate uses a different strategy for defining and sampling genealogies, but shares a common mathematical rationale. It should therefore experience the same bias. Further testing will be required to compare the effectiveness of these two methods. Other approaches to estimating growth, such as the pairwise measures of Slatkin and Hudson (1991) and Rogers and Harpending (1992), use less of the information present in the data and should be less efficient (Felsenstein 1992); the genealogical methods are at a particular advantage when the growth rate g is low or negative, a case in which pairwise methods tend to fail due to the confounding influence of the genealogical structure (Slatkin and Hudson 1991).

Corresponding author: Mary K. Kuhner, University of Washington, Department of Genetics, Box 357360, Seattle, WA 98195-7360.

Genetics 149: 429–434 (May, 1998)

430 M. K. Kuhner, J. Yamato and J. Felsenstein

MATERIALS AND METHODS

A series of genealogies generated under a given Θ0 and g0 can be used to determine the likelihood L(Θ,g) for other values of Θ and g. For each genealogy G a product is taken over all coalescence intervals i: in each interval, k is the number of lineages in the genealogy during that interval, ts is the time at the tipward end, and te is the time at the rootward end. Note that these are not rescaled times (times before the present are negative, so te < ts ≤ 0):

$$P(G \mid \Theta, g) = \prod_i \frac{2}{\Theta} \exp\left[ -\frac{k(k-1)}{g\,\Theta}\left(e^{-g t_e} - e^{-g t_s}\right) - g\,t_e \right]$$
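The per-genealogy product can be evaluated in log space, interval by interval. The Python function below is a minimal sketch, not the Fluctuate implementation. The function name and the (k, tau_s, tau_e) input format are assumptions made for illustration. It measures times as positive durations τ before the present, so that each interval contributes log(2/Θ) − k(k − 1)(e^{gτ_e} − e^{gτ_s})/(gΘ) + gτ_e. It also uses the g → 0 limit, in which each interval contributes its ordinary length and Kingman's constant-population prior is recovered.

```python
import math


def log_prior_genealogy(intervals, theta, g):
    """Log P(G | theta, g) for one genealogy under exponential growth.

    `intervals` holds one (k, tau_s, tau_e) tuple per coalescence interval:
    k lineages exist between tau_s (tipward end) and tau_e (rootward end),
    and the interval ends in a coalescence. Times are positive durations
    before the present (tau = -t), in units of 1/mu generations.
    """
    logp = 0.0
    for k, tau_s, tau_e in intervals:
        if abs(g) < 1e-12:
            # g -> 0 limit: (e^{g tau_e} - e^{g tau_s}) / g -> tau_e - tau_s
            scaled_length = tau_e - tau_s
        else:
            scaled_length = (math.exp(g * tau_e) - math.exp(g * tau_s)) / g
        logp += (math.log(2.0 / theta)                  # density for the coalescing pair
                 - k * (k - 1) / theta * scaled_length  # no earlier coalescence in interval
                 + g * tau_e)                           # growth factor at the coalescence
    return logp


if __name__ == "__main__":
    # Two tips, constant size: an exponential density with rate 2/theta.
    print(log_prior_genealogy([(2, 0.0, 0.5)], theta=1.0, g=0.0))
```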
The Metropolis-Hastings genealogy sampler for constant-sized populations (Kuhner et al. 1995) works by a two-phase process. It begins with an initial genealogy and an initial value of Θ, called Θ0. In the first phase, a new genealogy is created by locally rearranging the previous genealogy in proportion to the coalescent prior probability P(G|Θ0) (given by Kingman 1982a,b). In the second phase, this genealogy is accepted or rejected based on P(D|G), the probability of the sequence data on the genealogy. This is equivalent to sampling from the posterior probability, which is proportional to P(G|Θ0)P(D|G). This process is repeated, with samples taken from it at intervals, to produce a set of genealogies from which a maximum likelihood estimate of Θ can be made. The estimate is most efficient when Θ0 is close to Θ, so it is useful to run several iterations of the sampler, using the estimated Θ of each iteration as the starting Θ0 of the next.
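The two-phase logic can be illustrated on a toy state space that is far simpler than genealogy rearrangement. In this hypothetical sketch each proposal is drawn directly from the prior. That is an independence proposal, not the local rearrangements the sampler actually makes, but the effect on the acceptance rule is the same: the Metropolis-Hastings ratio reduces to the ratio of data likelihoods, P(D|G')/P(D|G). Visit frequencies then approach the posterior, which is proportional to P(G|Θ0)P(D|G). The state weights are invented for illustration.

```python
import math
import random


def sample_posterior(prior_weights, log_lik, n_steps, seed=1):
    """Toy two-phase Metropolis-Hastings sampler over discrete states.

    prior_weights: unnormalized prior P(G | theta0) for each toy state.
    log_lik: log P(D | G) for each toy state.
    Returns the fraction of steps spent in each state.
    """
    rng = random.Random(seed)
    states = range(len(prior_weights))
    current = 0
    counts = [0] * len(prior_weights)
    for _ in range(n_steps):
        # Phase 1: propose a new state from the prior.
        proposal = rng.choices(states, weights=prior_weights)[0]
        # Phase 2: accept with probability min(1, P(D|G') / P(D|G));
        # the prior cancels because it is also the proposal distribution.
        log_ratio = log_lik[proposal] - log_lik[current]
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            current = proposal
        counts[current] += 1
    return [c / n_steps for c in counts]


if __name__ == "__main__":
    prior = [1.0, 2.0, 1.0]
    lik = [0.5, 0.1, 0.4]
    freqs = sample_posterior(prior, [math.log(x) for x in lik], 200_000)
    post = [p * l for p, l in zip(prior, lik)]
    print(freqs, [x / sum(post) for x in post])
```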
Like most calculations involving the coalescent, these equations hold exactly only in the limit as the population size N goes to infinity. In practice the approximation involved should be insignificant as long as the number of individuals sampled is less than the square root of the population size.
Mutational model: We used the DNA/RNA sequence model of Felsenstein (1981), which allows unequal base frequencies and transition/transversion bias, extended as in Felsenstein and Churchill (1996) to allow for variable rates among sites and autocorrelation of those rates. It is simple to substitute any other mutational model for which P(D|G) can be calculated, for example models appropriate to protein or microsatellite data. The algorithm as designed does not estimate parameters of the mutational model.
Scaling for population growth: When the size of the population changes exponentially through time, the coalescent prior becomes P(G|Θ0,g0), where g0 is a trial value of the exponential growth rate g. (Positive values of g indicate population growth, and negative values indicate decline.) The units of g are 1/μ […]
In order to sample coalescence times from this prior, we use a time rescaling under which it becomes identical to the simpler constant-population prior. Time is scaled proportional to growth, so that the same expected amount of coalescence occurs in one unit of time regardless of population size. Under this transformation, the coalescent structure of the genealogy becomes identical to the constant-population expectation.

The rescaled time T is derived from the original time t by the following relation (Slatkin and Hudson 1991). The negative sign in the exponent is due to the fact that we are considering times previous to the present:
This formula can be shown to be equivalent to that given in Griffiths and Tavaré (1994), bearing in mind that they scaled time in units of N generations rather than 1/μ generations, and that they considered a haploid rather than a diploid case; they also retained some combinatorial constants which we omit, since we are concerned only with ratios of probabilities.
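This probability is then corrected for the importance sampling function P(D|G)P(G|Θ0,g0), where n is the number of sampled genealogies:

$$\frac{L(\Theta, g)}{L(\Theta_0, g_0)} \approx \frac{1}{n} \sum_{j=1}^{n} \frac{P(G_j \mid \Theta, g)}{P(G_j \mid \Theta_0, g_0)}$$

[The terms P(D|G) drop out as they are the same for all values of Θ and g.]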
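The importance-sampling average can be computed stably in log space with the log-sum-exp trick, which also avoids the exponentiation overflow mentioned below. This Python sketch is illustrative only. The function names and the genealogy encoding (lists of (k, τ_s, τ_e) intervals, with τ = −t as a positive duration before the present) are assumptions, and the coarse grid search merely stands in for whatever optimizer Fluctuate uses.

```python
import math


def log_prior(intervals, theta, g):
    """Log P(G | theta, g); intervals are (k, tau_s, tau_e) with tau = -t >= 0."""
    total = 0.0
    for k, tau_s, tau_e in intervals:
        if abs(g) < 1e-12:
            length = tau_e - tau_s
        else:
            length = (math.exp(g * tau_e) - math.exp(g * tau_s)) / g
        total += math.log(2.0 / theta) - k * (k - 1) / theta * length + g * tau_e
    return total


def log_relative_likelihood(genealogies, theta, g, theta0, g0):
    """log of (1/n) * sum_j P(G_j | theta, g) / P(G_j | theta0, g0)."""
    terms = [log_prior(G, theta, g) - log_prior(G, theta0, g0)
             for G in genealogies]
    top = max(terms)  # log-sum-exp: factor out the largest term
    return top + math.log(sum(math.exp(x - top) for x in terms) / len(terms))


def grid_maximum(genealogies, theta0, g0, thetas, gs):
    """Return (log relative likelihood, theta, g) at the best grid point."""
    return max((log_relative_likelihood(genealogies, th, g, theta0, g0), th, g)
               for th in thetas for g in gs)


if __name__ == "__main__":
    # Two hypothetical sampled genealogies of three tips each.
    sample = [[(3, 0.0, 0.1), (2, 0.1, 0.6)],
              [(3, 0.0, 0.2), (2, 0.2, 0.9)]]
    thetas = [0.2 * i for i in range(1, 16)]
    gs = [0.5 * i for i in range(-4, 11)]
    print(grid_maximum(sample, 1.0, 0.0, thetas, gs))
```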
The maximum of this function, which is a joint maximum likelihood estimate of Θ and g, can be found by standard methods. Technical difficulties are often encountered due to arithmetic overflow in exponentiation and the characteristic curving-ridge shape of the likelihood surface.
Multiple loci: The likelihoods can be multiplied together across unlinked loci to generate an overall multi-locus likelihood. Doing so should greatly improve the efficiency of the estimate, especially for g, since doubling the number of loci doubles the amount of information available about the most rootward parts of the genealogy (which are the most informative for growth rate, since they represent the population size most divergent from the modern-day size). Adding additional sequences mainly adds information about the most tipward parts of the genealogy, which contain relatively little information about growth.
If the loci to be combined cannot be assumed to have the same values for the parameters, this must be taken into account when combining them. It is reasonable to assume that the population growth rate affects all loci equally (barring selection), but both the neutral mutation rate μ and the effective population size Ne can vary among loci (for example, Ne is lower for a mitochondrial locus than for a nuclear one). This can easily be accommodated if the relative values of the parameters for different loci are known (or can be assumed): we simply replace Ne and μ with appropriate locus-dependent functions when calculating the multi-locus likelihood. In the future, a method for dealing with unknown variability in μ among loci could be developed by assuming Gamma distributions for the parameters and integrating over the range of possibilities.
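Multiplying likelihoods across unlinked loci amounts to adding their log likelihoods. When the relative parameter values are known or assumed, Θ is rescaled per locus while g is shared. The Python sketch below is minimal, and both the per-locus log-likelihood functions and the multipliers are hypothetical. For example, a multiplier of 0.25 for a mitochondrial locus relative to a nuclear one would follow from the usual one-quarter effective size of mtDNA only if the two loci also had equal mutation rates, which is itself an assumption.

```python
import math


def multilocus_log_likelihood(locus_log_liks, multipliers, theta, g):
    """Sum the log likelihoods of unlinked loci (their likelihoods multiply).

    locus_log_liks[j](theta_j, g) returns log L_j for locus j (hypothetical).
    multipliers[j] converts the shared theta into locus j's value
    (relative Ne times relative mu, assumed known); g is shared by all loci.
    """
    if len(locus_log_liks) != len(multipliers):
        raise ValueError("need one multiplier per locus")
    return sum(log_lik(c * theta, g)
               for log_lik, c in zip(locus_log_liks, multipliers))


def toy_locus(theta_j, g):
    """Hypothetical per-locus log likelihood peaked at theta_j = 1, g = 0.5."""
    return -math.log(theta_j) ** 2 - (g - 0.5) ** 2


if __name__ == "__main__":
    # A nuclear locus plus a locus assumed to have 1/4 the theta of the nuclear one.
    print(multilocus_log_likelihood([toy_locus, toy_locus], [1.0, 0.25], 1.0, 0.5))
```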
Assessing the accuracy of the estimate: An advantage of likelihood methods is that information about the accuracy of the estimate can be gleaned from the likelihood curve. We will consider the confidence interval as the set of all parameter values which would not be rejected (via a likelihood ratio test) at the given level. Asymptotically, as the number of loci approaches infinity, the shape of the likelihood curve becomes Gaussian (normal), and we can construct a variance for it using a χ² metric with two degrees of freedom (Cox and Hinkley 1974, p. 314). Using this approach, the area of the parameter space in which the log likelihood is no more than three units below the maximum can be taken as a rough 95% confidence interval. (The 95% point of χ² with two degrees of freedom is 5.99, and the likelihood ratio statistic is twice the drop in log likelihood, so the cutoff is about 3 log-likelihood units.)
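The three-unit cutoff can be computed directly. A χ² variable with two degrees of freedom has upper tail probability exp(−x/2), so its 95% point is −2 ln 0.05 ≈ 5.99, and half of that is about 3. The Python sketch below applies this cutoff to a grid of log likelihoods. The dict-keyed-by-(Θ, g) representation and the function names are assumptions for illustration. The same comparison gives the rejection test used in the simulation procedures.

```python
import math


def chi2_2df_cutoff(level=0.95):
    """Upper quantile of chi-square with 2 df, using P(X > x) = exp(-x / 2)."""
    return -2.0 * math.log(1.0 - level)


def confidence_region(grid_loglik, level=0.95):
    """Grid points whose log likelihood lies within half the cutoff of the max."""
    drop = chi2_2df_cutoff(level) / 2.0  # about 3.0 log-likelihood units at 95%
    best = max(grid_loglik.values())
    return sorted(p for p, ll in grid_loglik.items() if best - ll <= drop)


def truth_rejected(loglik_at_truth, loglik_at_max, level=0.95):
    """Likelihood-ratio rejection of the true (theta, g)."""
    return loglik_at_max - loglik_at_truth > chi2_2df_cutoff(level) / 2.0


if __name__ == "__main__":
    grid = {(1.0, 0.0): -10.0, (1.0, 1.0): -11.5,
            (2.0, 0.0): -13.5, (2.0, 1.0): -14.0}
    print(round(chi2_2df_cutoff(), 3))  # 5.991
    print(confidence_region(grid))      # [(1.0, 0.0), (1.0, 1.0)]
```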
$$T = \frac{1}{g}\left(1 - e^{-gt}\right)$$
This rescaled time is then substituted for ordinary time in constructing rearrangements of the genealogy. In cases where g is less than zero, some proportion of the rescaled times will correspond to infinite ordinary time. Our implementation rejects genealogies which contain infinite times, on the grounds that their likelihood for biologically reasonable data will tend to be very small. An upwards bias may be created by this procedure, but in practice it should be trivial.
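The rescaling and its inverse can be sketched as below, with hypothetical function names. Times are taken as positive durations τ = −t before the present, so the magnitude of the rescaled time is |T| = (e^{gτ} − 1)/g and the inverse is τ = ln(1 + g|T|)/g. For g < 0 the inverse exists only while g|T| > −1. Larger rescaled times correspond to infinite ordinary time, which is the case the implementation rejects.

```python
import math


def rescaled_time(tau, g):
    """Magnitude of the rescaled time for a duration tau before the present."""
    if abs(g) < 1e-12:
        return tau  # constant population: no rescaling
    return math.expm1(g * tau) / g


def ordinary_time(rescaled, g):
    """Inverse of rescaled_time; math.inf if the time is never reached."""
    if abs(g) < 1e-12:
        return rescaled
    arg = g * rescaled
    if arg <= -1.0:
        # Declining population (g < 0): rescaled times >= 1/|g| map to
        # infinite ordinary time.
        return math.inf
    return math.log1p(arg) / g


if __name__ == "__main__":
    print(ordinary_time(rescaled_time(0.3, 2.0), 2.0))  # about 0.3
    print(ordinary_time(0.2, -10.0))                    # inf
```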
431 Estimating Growth Rates
Such confidence intervals will be approximate at best for finite numbers of loci. It is not obvious a priori whether bias present in the maximum likelihood parameter estimates will also strongly affect the confidence intervals. We have not solved this problem analytically, but we can assess the usefulness of the approximate confidence intervals by simulation.
Simulation procedures: Each simulation consisted of 100 replicates. Genealogies of 25 sequences were randomly generated according to given values of Θ and g, and DNA data were generated randomly from these genealogies using a Kimura 2-parameter model (Kimura 1980) with a transition/transversion ratio of 2.0. In the following description a "step" is the construction of a single genealogy; a "chain" is a set of such genealogies used to make a parameter estimate, which can then be used to set initial parameters for the following chain. For both the exponential-growth program Fluctuate and our constant-population-size program Coalesce (used for comparison), we used the following search strategy: for each locus, 10 short chains of 1000 steps each were run, followed by 2 long chains of 15,000 steps each, sampling every 20th step. We provided the programs with the correct transition/transversion ratio. For initial estimates of Θ we used Watterson's estimate (Watterson 1975); for initial estimates of g we arbitrarily chose 1.0. Initial genealogies were generated using Phylip programs (Felsenstein 1993, version 3.5c): Dnadist to produce corrected distances from the sequence data, and Neighbor to generate Unweighted Pair-Group Method using Averages (UPGMA) genealogies from these distances. […] likelihood estimates assuming that the true genealogy was known without error. This is equivalent to using infinitely long sequences, since with such sequences the Metropolis-Hastings sampler should unerringly generate the true genealogy. We have called these results "infinite sites" in the Tables.
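Watterson's estimate, which was used to start each run, divides the number of segregating sites S by a_n = Σ_{i=1}^{n−1} 1/i for n sequences. Dividing also by the sequence length gives a per-site value on the same scale as Θ. The Python sketch below assumes aligned sequences of equal length, without gaps or ambiguity codes, and its function name is hypothetical.

```python
def watterson_theta(sequences):
    """Per-site Watterson estimate S / (a_n * L) for aligned sequences."""
    n = len(sequences)
    if n < 2:
        raise ValueError("need at least two sequences")
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned to equal length")
    # A site is segregating if more than one base appears in its column.
    segregating = sum(1 for column in zip(*sequences) if len(set(column)) > 1)
    a_n = sum(1.0 / i for i in range(1, n))
    return segregating / (a_n * length)


if __name__ == "__main__":
    # Two segregating sites among 4 sites, n = 3 so a_n = 1.5: 2 / (1.5 * 4).
    print(watterson_theta(["AAAA", "AAAT", "AGAT"]))
```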
For each estimation, we noted whether or not the log likelihood for the true Θ and g was within three units of the log likelihood at the maximum, i.e., whether or not the truth could be rejected at the approximate 95% level.
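Random genealogies under exponential growth can be generated through the time rescaling described in the Methods. The procedure draws constant-population coalescence intervals in rescaled time and then maps the accumulated times back to ordinary time. The Python sketch below returns only coalescence times, with no topology or sequence data. It uses the positive-duration convention τ = −t and reports a failure whenever a declining population (g < 0) would need infinite time to coalesce. All names and the seed are hypothetical.

```python
import math
import random


def ordinary_time(rescaled, g):
    """Map rescaled time back to tau = -t; math.inf if it is never reached."""
    if abs(g) < 1e-12:
        return rescaled
    arg = g * rescaled
    return math.inf if arg <= -1.0 else math.log1p(arg) / g


def simulate_coalescence_times(n, theta, g, rng):
    """Coalescence times (tipward to rootward) for n tips, or None on failure."""
    times, rescaled = [], 0.0
    for k in range(n, 1, -1):
        # Constant-population coalescent: k lineages coalesce at rate k(k-1)/theta.
        rescaled += rng.expovariate(k * (k - 1) / theta)
        tau = ordinary_time(rescaled, g)
        if math.isinf(tau):
            return None  # the genealogy fails to coalesce in finite time
        times.append(tau)
    return times


def failure_fraction(n, theta, g, reps=2000, seed=3):
    """Fraction of simulated genealogies that fail to coalesce."""
    rng = random.Random(seed)
    fails = sum(simulate_coalescence_times(n, theta, g, rng) is None
                for _ in range(reps))
    return fails / reps


if __name__ == "__main__":
    for theta in (0.01, 0.1):
        print(theta, failure_fraction(25, theta, -10.0))
```

Because failure occurs once the rescaled time exceeds 1/|g|, and rescaled times scale with Θ, the failure rate in this sketch depends on the product of Θ and g.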
deviation of g was much less for high true values of Θ than for low ones, even with infinite numbers of sites.

Estimates of Θ also tended to be biased upwards, in contrast to the constant-population case, in which they appear nearly unbiased (Kuhner et al. 1995).

With few exceptions, doubling the number of loci was more effective in reducing bias and standard deviation than doubling the number of sites.

In most cases the true values of Θ and g were rejected at the 95% level slightly more often than the desired 5%.

Table 2 shows comparable results, for the case in which the true g was zero, from the program Coalesce (Kuhner et al. 1995), which uses a similar Metropolis-Hastings strategy but does not allow changes in population […] growth as a parameter approximately doubles the standard deviation of Θ.
Why is the estimate of g biased? We have identified two processes that contribute to this bias. Both are intrinsic to the estimation of exponential growth from genealogical data and are not due to the Metropolis-Hastings sampler itself: they can be shown in simple cases that do not require any of the Metropolis-Hastings […] relationship between the coalescence times and the estimate of g. A simple two-sequence case provides a concrete demonstration. In genealogies of two tips where the true growth rate is zero and Θ is known without error, the distribution of the coalescence time t follows directly from coalescent theory (Kingman 1982a,b). Centiles of this distribution can then be used to make a distribution of ĝ values (Table 3). The distribution of ĝ is highly skewed, with a mean far above the true value. Essentially, the nonlinear relationship between t and ĝ transforms variance in t into bias in ĝ. Thus, bias is expected not only in our method but in any method that uses t (or measurements depending on it, such as number of mutations) as a basis for estimating exponential growth. For example, the star-phylogeny method of Slatkin and Hudson (1991), which counts variable sites, shows a similar upwards bias; we have confirmed this in simulation tests (data not shown).
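The two-tip argument can be reproduced numerically. Take one locus with true g = 0 and known Θ. The coalescence time, written as a positive duration τ before the present, is then exponential with rate 2/Θ. For each centile of that distribution, the sketch solves the score equation of the two-tip log likelihood, log(2/Θ) − (2/Θ)(e^{gτ} − 1)/g + gτ, for ĝ. Because this log likelihood is concave in g, bisection suffices. The use of the 100 midpoint centiles (i + 0.5)/100 is an assumption, and the sketch makes no attempt to reproduce Table 3's values.

```python
import math


def _dh_dg(g, tau):
    """d/dg of (exp(g*tau) - 1)/g, i.e. the integral of s*exp(g*s) on [0, tau]."""
    x = g * tau
    if abs(x) < 1e-3:
        # Taylor series avoids cancellation near g = 0.
        return tau ** 2 / 2 + g * tau ** 3 / 3 + g ** 2 * tau ** 4 / 8
    return (math.exp(x) * (x - 1.0) + 1.0) / g ** 2


def score(g, tau, theta):
    """Derivative in g of the two-tip log likelihood; decreasing in g."""
    return tau - (2.0 / theta) * _dh_dg(g, tau)


def g_hat_two_tips(tau, theta):
    """Maximum likelihood g from one two-tip coalescence time."""
    lo, hi = -1.0, 1.0
    while score(lo, tau, theta) < 0:   # widen until the root is bracketed
        lo *= 2.0
    while score(hi, tau, theta) > 0:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if score(mid, tau, theta) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def centile_summary(theta=1.0, n_centiles=100):
    """Mean and median of g-hat over centiles of tau when the true g is 0."""
    estimates = []
    for i in range(n_centiles):
        p = (i + 0.5) / n_centiles
        tau = -(theta / 2.0) * math.log1p(-p)  # inverse CDF, rate 2/theta
        estimates.append(g_hat_two_tips(tau, theta))
    estimates.sort()
    mid = n_centiles // 2  # even n_centiles assumed for this median
    median = 0.5 * (estimates[mid - 1] + estimates[mid])
    return sum(estimates) / n_centiles, median


if __name__ == "__main__":
    mean, median = centile_summary()
    print(f"mean g-hat {mean:.1f}, median g-hat {median:.2f}")
```

In this sketch ĝ is zero only when τ equals Θ, which is twice the mean of τ. Short coalescence times are common and push ĝ to very large values, so the mean of ĝ lands far above its median.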
However, even in the absence of variability in coalescence times some bias is present. Table 4 shows results based on analysis of a "perfect" coalescent genealogy, in which each interval has exactly its expected length; there is no variance in t. A bias is clearly visible in Table 4, although the 95% confidence intervals do include the true value. This component of the bias results from the fact that any genealogy with finite tips truncates the distribution of coalescence times: it has a "final coalescence" at the root, prior to which no further information is available. This presents likelihood estimation with an attractive hypothesis involving a population bottleneck at the time of the final coalescence; such a hypothesis has high likelihood because it maximizes the probability of the final event. Attraction towards this degenerate hypothesis produces a bias in ĝ.

Table 1 shows results from simulation tests of Fluctuate. We do not present results for the case of Θ = 0.01, g = 100 with finite numbers of sites because data sets simulated at these values frequently contained no variable sites […] data set to produce a zero estimate of Θ and an indeterminate estimate of g (all values are equally likely).

Cases where g is negative entail the possibility that infinite time will be required for coalescence when simulating the genealogy. The probability that this will happen depends on the product of Θ and g. In practice, the case of Θ = 0.01, g = −10.0 could be simulated (less than 1% failure to coalesce), but in the case of Θ = 0.1 a substantial fraction of simulated genealogies failed to coalesce in finite time, and so no results are presented.

In general, estimates of g showed a strong upwards bias, decreasing somewhat with number of sites and more markedly with number of loci. The only exception was the case of Θ = 0.1, g = 100, in which the estimates appear biased downwards with finite amounts of data, possibly due to saturation of variable sites. The standard […]

TABLE 1
Fluctuate simulation results
                     Θ = 0.01                           Θ = 0.1
Loci      bp = 500   bp = 1000   bp = ∞      bp = 500   bp = 1000   bp = ∞
A. Estimate of Θ
  g = −10
  g = 0
  g = 100
B. Standard deviation of Θ
  g = −10
  g = 0
  g = 100
C. Estimate of g
  g = −10
  g = 0
  g = 100
D. Standard deviation of g
  g = −10
  g = 0
  g = 100
E. Number of samples (out of 100) in which the true values were rejected at the 95% level
  g = −10
  g = 0
  g = 100
Estimates of Θ and g based on 100 simulated data sets each, with 25 sequences of the given number of base pairs. Columns headed bp = ∞ were created by assuming that the genealogy could be reconstructed without error. Table 1E shows the number of times that the true values of Θ and g could be rejected at the nominal 95% level, out of 100 data sets. ND, not determined.
Correctness of the sampler: It is difficult to prove a complex computer program correct, but we tested Fluctuate in several ways to help assure ourselves that the observed bias was not due to program error. If the sampler is run with 100% acceptance (that is, the data are ignored and every proposed genealogy accepted), the genealogies produced should be an autocorrelated but otherwise random sample from a coalescent distribution with the given Θ and g. We examined large samples of such genealogies and found them consistent with the random coalescent (data not shown). We also tested the sampler with g = 0 and found its results […] which dealt with the constant-population case (data not shown). Based on these tests, we believe the sampler to be correct. In any case, as is shown in Tables 3 and 4, bias would be expected in a perfectly functioning sampler.
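The truncation effect can be explored with a "perfect" genealogy. Here that term is assumed to mean a constant-population genealogy with true Θ = 1 in which the interval with k lineages has exactly its expected length, Θ/(k(k − 1)). For fixed g the likelihood is maximized at Θ̂(g) = Σ k(k − 1)S_i(g)/m, where S_i(g) is the rescaled length of interval i and m = n − 1 is the number of intervals. The sketch below therefore profiles out Θ and searches an arbitrary grid of g values. It is illustrative only and is not the analysis behind Table 4. It shows only that the profile maximum falls at ĝ > 0 even when no coalescence time varies. With just two tips, the profile likelihood in this sketch grows without bound as g increases, which is consistent with the attraction to a bottleneck at the final coalescence described above. For that reason the sketch requires at least three tips.

```python
import math


def perfect_intervals(n, theta=1.0):
    """(k, tau_s, tau_e) intervals with exactly their expected lengths."""
    intervals, tau = [], 0.0
    for k in range(n, 1, -1):
        length = theta / (k * (k - 1))  # expected constant-population waiting time
        intervals.append((k, tau, tau + length))
        tau += length
    return intervals


def scaled_length(tau_s, tau_e, g):
    """Rescaled interval length (e^{g tau_e} - e^{g tau_s}) / g."""
    if abs(g) < 1e-12:
        return tau_e - tau_s
    return (math.exp(g * tau_e) - math.exp(g * tau_s)) / g


def profile(intervals, g):
    """(profile log likelihood, theta-hat) for a fixed g."""
    m = len(intervals)
    theta_hat = sum(k * (k - 1) * scaled_length(ts, te, g)
                    for k, ts, te in intervals) / m
    loglik = (m * math.log(2.0) - m * math.log(theta_hat) - m
              + g * sum(te for _, _, te in intervals))
    return loglik, theta_hat


def perfect_genealogy_mle(n, g_grid):
    """(theta-hat, g-hat) for a perfect genealogy of n tips, by grid search."""
    if n < 3:
        raise ValueError("two tips give an unbounded profile likelihood")
    intervals = perfect_intervals(n)
    _, theta_hat, g_hat = max(profile(intervals, g) + (g,) for g in g_grid)
    return theta_hat, g_hat


if __name__ == "__main__":
    grid = [i / 100 for i in range(-500, 5001)]  # g from -5 to 50
    for n in (5, 10, 25):
        print(n, perfect_genealogy_mle(n, grid))
```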
Overcoming bias: Given that this method (and other methods involving use of t to estimate g) has bias, how can the most accurate results be obtained? Tables 3 and 4 show clearly that adding additional sites or sequences is ineffectual, whereas adding additional unlinked loci rapidly reduces the bias. Each new locus will provide additional information about the region of the early […], and the independent variation in coalescence times among loci helps counteract the bias introduced […]

TABLE 2
Coalesce (constant-population) simulation results
                          SD of Θ
Loci    500 bp   1000 bp   500 bp   1000 bp
A. Low Θ (0.01), low g (0)
B. High Θ (0.1), low g (0)
Estimates of Θ and g based on 100 simulated data sets each, with 25 sequences of the given number of base pairs. SD, standard deviation.

TABLE 3
Theoretical results for tree of two tips
Loci    Mean ĝ    SD of mean    Median ĝ
The expected distribution of t for trees of two tips was determined, and centiles of the distribution were used to construct a distribution for ĝ. Given values are mean, standard deviation, and median of ĝ for one, two, and three loci. The true value of g was 0.0. Θ was assumed to be known without error. The result for 100 loci is an approximation based on 1000 replications using values of t drawn at random from the distribution for each locus.
It appears that the small bias seen in Θ is a consequence of correlation between Θ and g, since it does not appear when g is held constant at zero (as in Coalesce). One positive aspect of these findings is that it is quite possible to estimate current Θ accurately even if the population has been growing or shrinking; the bias in Θ is small even when g is far from zero.
Future directions: Real biological populations often grow or decline in ways more complicated than simple exponential growth, but the bias in the estimator interferes with attempts to fit more complex models. For example, one could imagine fitting a two-stage model with exponential growth followed by a steady-state period; however, because of the sparseness of the rootward part of the genealogy, this model would be attracted to wrong solutions featuring very rapid early growth. It is possible that using a sufficiently large number of loci would allow such models to work.
Since relatively little power is available for estimating growth, attempts to differentiate between different models of growth (for example, exponential versus geometric or linear) are unlikely to succeed with reasonably sized data sets. In principle, however, this method could accommodate any growth model for which the time transformation can be worked out.
The algorithm can readily be adapted to data types other than nucleotide sequence data, such as protein sequences, allozyme alleles, or restriction site polymorphisms, as long as an appropriate evolutionary model […]
It is possible to extend this family of algorithms by including recombination, which will greatly facilitate the analysis of nuclear loci. This may also allow a single […] loci, since recombination turns the single genealogy into several partially correlated genealogies. However, the algorithm with recombination will be technically challenging due to the more complex data structures and rearrangement scheme required. Griffiths and Marjoram (1996) have developed an alternative approach to genealogical sampling in the presence of recombination […] it will be interesting to compare these approaches in […]
TABLE 4
Results from perfectly coalescent genealogies
No. of tips
Estimates of Θ and g, and upper and lower approximate 95% confidence limits, for "perfect" genealogies of the given number of sequences. True Θ, 1.0; true g, 0.0.
[…] Monte Carlo algorithm described here is available from the authors as program Fluctuate in the package Lamarc, which […] (The program is written in C and can be obtained by anonymous ftp at evolution.genetics.washington.edu in directory pub/lamarc or via the World Wide Web at http: […])

We thank Monty Slatkin and Simon Tavaré for helpful discussion and Peter Beerli for assistance in finding maxima of likelihood surfaces. We also thank the Organizing Committee of the fourth annual meeting of the Society of Molecular Biology and Evolution for inviting the first author to a highly productive meeting. This research was supported by National Science Foundation grants BIR-8918333 and DEB-9207558 and National Institutes of Health grant 2-R55GM41716-04 (all to J.F.).

LITERATURE CITED

Cox, D. R., and D. V. Hinkley, 1974 Theoretical Statistics. Chapman and Hall, London.
Felsenstein, J., 1981 Evolutionary trees from DNA sequences: a maximum likelihood approach. J. Mol. Evol. 17: 368–376.
Felsenstein, J., 1992 Estimating effective population size from samples of sequences: inefficiency of pairwise and segregating sites as compared to phylogenetic estimates. Genet. Res. 59: 139–147.
Felsenstein, J., 1993 Phylip (Phylogeny Inference Package) version 3.5c. Distributed by the author. Department of Genetics, University of Washington, Seattle.
Felsenstein, J., and G. Churchill, 1996 A Hidden Markov Model approach to variation among sites in rate of evolution. Mol. Biol. Evol. 13: 93–104.
Griffiths, R. C., and P. Marjoram, 1996 Ancestral inference from samples of DNA sequences with recombination. J. Comput. Biol. […]
Griffiths, R. C., and S. Tavaré, 1994 Sampling theory for neutral alleles in a varying environment. Philos. Trans. R. Soc. Lond. B Biol. Sci. 344: 403–410.
Hastings, W. K., 1970 Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57: 97–109.
Kimura, M., 1980 A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. J. Mol. Evol. 16: 111–120.
Kingman, J. F. C., 1982a The coalescent. Stochastic Processes and Their Applications 13: 235–248.
Kingman, J. F. C., 1982b On the genealogy of large populations. J. Appl. Probab. 19A: 27–43.
Kuhner, M., J. Yamato and J. Felsenstein, 1995 Estimating effective population size and mutation rate from sequence data using Metropolis-Hastings sampling. Genetics 140: 1421–1430.
Metropolis, N., A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller and E. Teller, 1953 Equation of state calculations by fast computing machines. J. Chem. Phys. 21: 1087–1092.
Rogers, A. R., and H. Harpending, 1992 Population growth makes waves in the distribution of pairwise genetic differences. Mol. Biol. Evol. 9: 552–569.
Slatkin, M., and R. R. Hudson, 1991 Pairwise comparisons of mitochondrial DNA sequences in stable and exponentially growing populations. Genetics 129: 555–562.
Watterson, G. A., 1975 On the number of segregating sites in genetical models without recombination. Theor. Popul. Biol. 7: […]

Communicating editor: S. Tavaré