Vol. 22 no. 3 2006, pages 341–345
BIOINFORMATICS ORIGINAL PAPER
Genetics and population analysis
Comparison of Bayesian and maximum-likelihood inference of
population genetic parameters
School of Computational Science and Department of Biological Sciences, Florida State University, Tallahassee,
FL 32306-4120, USA
Received on September 8, 2005; revised on November 23, 2005; accepted on November 25, 2005
Advance Access publication November 29, 2005
Associate Editor: Frank Dudbridge
Comparison of the performance and accuracy of different inference
methods, such as maximum likelihood (ML) and Bayesian inference,
is difficult because the inference methods are implemented in different
programs,oftenwrittenbydifferent authors. Bothmethodswereimple-
mented in the program MIGRATE, that estimates population genetic
cence theory. Both inference methods use the same Markov chain
Monte Carlo algorithm and differ from each other in only two aspects:
tion. Using simulated datasets, the Bayesian method generally fares
better than the ML approach in accuracy and coverage, although for
some values the two approaches are equal in performance.
Motivation: The Markov chain Monte Carlo-based ML framework can
fail on sparse data and can deliver non-conservative support intervals.
A Bayesian framework with appropriate prior distribution is able to
remedy some of these problems.
Results: The program MIGRATE was extended to allow not only for
ML(-) maximum likelihood estimation of population genetics paramet-
Bayesian approach and the ML approach are facilitated because both
Availability: The program is available from http://popgen.csit.fsu.edu/
introduced the n-coalescent. The n-coalescent (coalescent for
short) allows us to calculate probabilities of relationships among
a random population sample. This in turn facilitates calculations
of probabilities of whole genealogies under a specific population
model, for example, two populations exchanging migrants at a
constant rate. The first applications that calculated the likelihood
of the population size parameter based on DNA samples were
described by Griffiths and Tavare ´ (1994) and Kuhner et al.
(1995). Bahlo and Griffiths (2000) and Beerli and Felsenstein
(1999, 2001) extended the basic estimation of a single parameter
to joint estimations of migration rates and population sizes, whereas
Kuhner et al. (2000) allowed for the estimation of recombination
rate. These maximum-likelihood (ML) approaches were comple-
mented by several Bayesian approaches (Nielsen, 1998, 2000; Hey
and Nielsen, 2004; Beaumont, 1999 and others). All of these
approaches try to estimate population genetic parameters. They
typically treat the genealogy as a nuisance parameter and summar-
ize over all possible genealogies G; to be precise, they sample over
all possible labeled histories T and branch lengths B, taking into
account the genetic data and the population genetic model. The
likelihood of the data given the model parameters is
where k(T,B|p) is the Kingman coalescent probability density
and L(D|T,B) is the likelihood
Nielsen (personal communication, 2001) suggested that the ML
approach is hampered by several problems. Maximizing the like-
lihood function L(D|p) for complicated scenarios with many para-
meters is very difficult. The Metropolis–Hastings algorithm
(Metropolis et al., 1953; Hastings, 1970) with static driving values
p0, as implemented in MIGRATE and other programs, can take a
prohibitively long run time required to explore completely all the
possible genealogies. These problems have been shown by Abdo
the problems are far less serious when using biologically reasonable
datasets and when the guidelines about convergence outlined in the
MIGRATE-manual (available from http://popgen.csit.fsu.edu) are
The program MIGRATE uses a Metropolis–Hastings algorithm to
explore all possible genealogies (Beerli and Felsenstein, 1999). The
adaptation of the program to a Bayesian framework was not difficult
because only a module handling the prior distributions and a minor
change in the program flow need to be added, together with changes
in the input and output user interfaces.
The program MIGRATE calculates the posterior probability
distribution per locus, treating each locus as completely unlinked
to the others. This assumption is reasonable: most biologists
would prefer to sample such loci rather than partially linked
or completely linked loci because unlinked loci can be treated
as independent replicates of the genealogical history. MIGRATE
?To whom correspondence should be addressed.
? The Author 2005. Published by Oxford University Press. All rights reserved. For Permissions, please email: firstname.lastname@example.org
by guest on June 1, 2013
approximates the posterior distribution
fðp j DÞ ¼rðpÞR
using a Metropolis–Hastings approach. The integral over G is a
condensed expression of the sum over topologies and integral
over all branch lengths. The denominator is
where we integrate over all possible parameter values p.
The updating scheme of the genealogies is the same in the ML
and the Bayesian approach and was described by Beerli Felsenstein
(1999). The updating scheme of the parameters is based on arbitrary
prior distributions r(p). MIGRATE allows the user to choose
between a small number of prior distributions.
value for each parameter.
? Exponential prior distribution with a minimum, mean and
maximum value for each parameter.
The incorporation of additional prior distributions, such as a
gamma distribution, are planned.
A key issue in Metropolis–Hastings algorithms is the acceptance
or not of a change of the current state in the Markov chain. The
algorithmshouldaccept fairlyoften sothatthe chain canexplorethe
solution space more efficiently; poor algorithms will reject often
and force very long runs to achieve equilibrium and an appropriate
sample of the possible states. Typically, the acceptance or rejection
parts: (1)the ratio ofprobabilities tomovefromanoldstatetoanew
state using a prior distribution and the effect of the data (Metropolis
et al., 1953), the Metropolis ratio rM; and (2) the ratio of probab-
ilities to be in the old or new state and go to the new or old state
(Hastings, 1970), the Hastings ratio, rH. In the Bayesian imple-
mentation in MIGRATE the ratio of accepting a move suggested
by the parameter prior is only dependent on the Kingman coalescent
probability density. The acceptance/rejection ratio is
r ¼ rMrH¼rðpðnÞ
which reduces to
If we consider the uniform random prior distribution (URP) then
iÞ ¼ rðpðoÞ
and the Hasting ratio rHwill turn into
For the exponential-prior distribution a similar logic applies,
although moving from pðoÞ
not have equal probability as with the URP (6). In this case the prior
versus from pðnÞ
probabilities in the Hastings-ratio will cancel with the prior
probabilities in the Metropolis ratio [formula (3)].
The performance of the improvements were illustrated on a data-
set for four populations with a unidirectional migration pattern
(Fig. 1). Simulated DNA sequence alignments, generated using
the population model described in Figure 1, were analyzed to
show the performance of the Bayesian and the ML approach.
One dataset with 10 loci, and four groups of 100 single locus
datasets were analyzed. Each dataset contained 20 individuals
from each of the four populations. Using a coalescence-based simu-
lator (cf. Hudson, 2002) ‘true’ genealogies using population sizes
(QT) for all populations of 0.1, 0.01, 0.001 and 0.0001, and Mji
referenced in Figure 1 were created. DNA sequences of 10 000 bp
length were then simulated on these true genealogies using an F84
model with equal base frequencies and transition/transversion ratio
of 2.0. These datasets were then analyzed using either the ML
inference mode in MIGRATE. The ML mode was run for 10 short
chains visiting 100 000 genealogies and storing 5000, updating the
driving parameter after each chain, and two long chains with
10000000 visited genealogies and 50 000 sampled using an adapt-
ive heating scheme. The Bayesian inference was run for 10 000 000
updates, approximately half of which were updates of the 16 para-
meters and approximately half (?5 000 000 because of random
switching between genealogy and parameter updates) were genea-
logy updates. New parameters were proposed using an exponential
prior distribution with population size mean of 2QTand boundaries
of QT/10 and 10QT, and scaled migration rate mean M of 200 and
boundaries of 0 and 1000. Results for uniform priors with the same
boundaries were very similar, and therefore are not shown.
The results and problematic issues are shown only for population
4, but the pattern is identical for the other three populations. The
scenario chosen for an example is difficult for any gene flow estim-
ator because it requires the estimation of 12 migration rates and
4 population sizes. With high migration rates, haplotypes are dis-
tributed evenly over all populations, so that establishing the direc-
tionality of gene flow from estimated migration rates is difficult.
Fig. 1. Population scenario used in the example: four populations exchange
population sizes Qi(4· effective population size · mutation rate per site per
generation), and scaled immigration rates Mji(immigration rate divided by
mutation rate). Migration along routes indicated by solid arrows was simu-
routes was simulated with a value of M ¼ 0. Migration along the dashed
arrows are discussed in the Results section.
by guest on June 1, 2013
With low migration rates, however, the difference from zero, and
thus the directionality, is also difficult to establish. The number of
variable sites or the number of alleles in the dataset is crucial for
accurate estimation of population size and migration rates of any
magnitude. Single locus datasets with low variability do not allow
estimating migration rates with great precision.
Despite these difficulties, with sufficient data, estimates are
expected to be useful for inferring direction and magnitude of
gene flow and magnitude of population size. Using the 16 parameter
model analyzed here will produce very variable parameter estimates
for real biological data.
2.1 Multilocus analysis
Figure 2 shows that the variability of individual loci resulting
from the coalescent can be large and that there are difficulties in
reaching convergence; the combined estimate over all loci, how-
ever, gives a rather accurate picture. The variability for migration
rate estimates is much larger than for the population size estimates.
It is difficult to establish the gene flow direction (M41versus M14)
for the single locus estimates. The estimate over all loci clearly
allows the distinction between the two directions: M14is much
bigger than M41. The estimation of migration parameter values
between populations with no direct connections, for example,
migration rate M24between population 2 and 4, is consistently
low (Fig. 2).
2.2 Comparison of Bayesian and ML inference
MIGRATE allows direct comparison of the success of parameter
inference using the Bayesian approach and the ML approach.
In theory the results should be very similar. Table 1 shows
medians and quartiles of 100 single locus runs. The medians and
quartiles were chosen because they are a better indicator of the
distribution of the results than mean and standard deviation because
these are heavily influenced by large outliers. The median of
the maximum posterior probabilities is similar to the median
of the ML estimates for moderate values of the population size
(Q¼0.01). The results for the low-variability datasets are mixed;
the medians of the two methods are still comparable but the range of
the quartiles of ML M estimates are very large; standard deviations
(data not shown) were even larger because of outliers in the ML
analysis. Several of the 100 runs reported values that were very
different from the true value. The datasets with the smallest true Q
(0.0001) shows even more problems with the ML approach because
the medians for Q is strongly overestimated and the range of the
quartiles for M is huge. In contrast to ML the Bayesian runs recover
the population size, but report very low values for the migration
rate. Figure 3 shows a comparison of posterior distributions of the
scaled migration rate M of the first datasets of each population size
category (Table 1). The power to make inferences about the mag-
nitude of the migrationrate is directlycorrelated withthe magnitude
of the population size. For very small population sizes there is no
power to estimate such low migration rates in the chosen 16 para-
meter problems with a single locus dataset of 10 000 bp for each of
the 100 individuals. The posterior distribution is similar to the
exponential prior distribution used. In contrast to the problems
encountered in the migration rate estimations, the posterior distri-
butions for Q are strongly peaked near 0.0001 (data not shown).
The ML method has difficulty is recovering the expected values
when the dataset is very variable, whereas the Bayesian inference is
closer to the ‘true’ values for all the scenarios. The range of the
quartiles of the ML approach is often much larger than the range of
the Bayesian approach.
includes the ‘true’ values in the 95% credibility interval with fre-
quencies of 0.85–1.00 for the migration and population size para-
meters (Table 2), whereas the ML approach has difficulty with
Fig. 2. Posterior distributions f estimated using exponential priors: expected
M24are zero, expected mode for the scaled effective population size Q4is
0.01. The posterior distributions of 10 independent loci (thin lines) and the
combined posterior distribution (thick line) are shown. The relationship
among the populations is explained in Figure 1.
Table 1. Comparison of Maximum likelihood and Bayesian inference
Medians and quartiles of 100 single-locus datasets for the two inference methods (I):
four different values of ‘true’ population sizes (Q(t)). Q is 4· effective population size ·
mutation rate per site per generation, and M is immigration rate over mutation rate. The
range of number of migrants per generation Nem ¼ QM/4 covers a wide range from
generation. Run conditions for ML and Bayes inferences are specified in the text.
Bayesian inference using MIGRATE
by guest on June 1, 2013
convergence, especially on low-variability datasets, and so has a
rather low coverage (frequencies between 0.06 and 0.94).
The scenario chosen for an example is difficult for any gene flow
estimation program that uses only a single sample in time. The
problem stems from the fact that the only information about the
directionality are the mutationsinthe dataset. If themigration rateis
high, all mutations, even the rare ones, are distributed over all popu-
lations and any directionality estimation based on a single locus
will fail. With low migration rates among the populations, each
population will acquire unique mutations and in principle the mag-
nitude and directionality can be estimated even for single locus
dataset, if there is enough variability in the dataset. In reality,
however, such an estimation has proven difficult because the dif-
ference between the migration rates between two populations is
small and often close to zero. Estimation based on single locus
datasets thus often cannot recover the directionality, but multilocus
estimateswillallow theinference ofthe migrationdirection(Fig.2).
The power to estimate migration rate is crucially dependent on
the number of variable sites or number of alleles in the dataset. Too
little variation leads to haphazard results in the ML method because
the MCMC process has no strong guidance whether to insert or
remove migration events during the course of the analysis; the
process is more dependent on the static driving parameters. Com-
parison of several runs will deliver very different results and there-
fore show non-convergence. The only remedy is to run these
analyses much longer to get a better estimate of the uncertainty
of the estimate. Bayesian analysis is straightforward in such cases
because when the posterior distribution is similar to the prior dis-
tribution, we can conclude that the dataset does not contain enough
information for the inference. The ML method also has difficulties
exploring the distribution around the ML estimate with highly vari-
able data because the genealogy is very well defined by the large
number of variable sites: the static driving value and the updating
scheme (Beerli and Felsenstein, 1999) will not explore many dif-
ferent migration scenarios and therefore the tails of the distribution
are not visited. This results in too narrow support intervals with
small coverage values. In contrast, Bayesian inference manipulates
the parameters using a diffuse prior. This forces more changes of
the genealogy, therefore exploring more different migration scen-
arios and visiting the tails of the posterior distribution more
The coverage shown for the Bayesian runs might be conservative
but this is preferable to the coverage reported for ML, especially in
not really converge and were estimating either very large or zero
Many users of MIGRATE have reported in numerous email queries
that achieving convergence with the ML approach with low-
information data, such as single locus datasets or data with a low
mutation rate, is difficult and needs special attention. Bayesian
inference seems to allow such users to achieve reliable results
with less effort than the ML approach. It seems appropriate that
if only the parameters and their support intervals are ofinterest, then
biologists should prefer the Bayesian approach, although it will be
interesting to see whether this will hold for all biological datasets.
The author thank Rasmus Nielsen for the suggestion to implement a
Bayesian method into MIGRATE. At first, the author was uncertain
Thomas Uzzell, Koffi Sampson and two anonymous reviewers
helped to improve the manuscript. Funding for this research was
supplied through Florida State University and National Science
Foundation grant DEB-0108249 to Scott Edwards and P.B.
Abdo,Z. et al. (2004) Evaluating the performance of likelihood methods for detecting
population structure and migration. Mol. Ecol., 13, 837–851.
Fig. 3. Posterior distribution of the scaled migration rate M14 for four
different values of Q4of a single locus dataset. The population model is
explained in Figure 1. Graphs are results from the first replicate of the four
replicate groups shown in Table 1. Data were simulated with M14¼ 100.
Table 2. Coverage of Maximum likelihood and Bayesian inferences
Coverage analysis of 100 simulated data sets that were generated with four different
values of ‘true’ population sizes (Q(t)). Q is 4· effective population size · mutation rate
per site per generation, and M is immigration rate over mutation rate. Coverage is
measured as the percentage of times the true value is within the estimated 95% support
interval. Run conditions for ML and Bayes inferences are specified in the text.
by guest on June 1, 2013
Bahlo,M. and Griffiths,R.C. (2000) Inference from gene trees in a subdivided
population. Theor. Popul. Biol., 57, 79–95.
Beaumont,M.A.(1999) Detecting population
microsatellites. Genetics, 153, 2013–2029.
Beerli,P. and Felsenstein,J. (1999) Maximum-likelihood estimation of migration rates
and effective population numbers in two populations using a coalescent approach.
Genetic, 152, 763–773.
Beerli,P. and Felsenstein,J. (2001) Maximum likelihood estimation of migration rates
and effective population numbers in two populations using a coalescent approach.
Proc. Nath Acad. Sci. USA, 98, 4563–4568.
Griffiths,R. and Tavare ´,S. (1994) Sampling theory for neutral alleles in a varying
environment. Philos. Trans. R. Soc. Lond. Ser. B Biol. Sci., 344, 403–410.
Hastings,W.K. (1970) Monte Carlo sampling methods using markov chains and their
application. Biometrika, 57, 97–109.
Hey,J. and Nielsen,R. (2004) Multilocus methods for estimating population sizes,
migration rates and divergence time, with applications to the divergence of
Drosophila pseudoobscura and D. persimilis. Genetics, 167, 747–760.
Hudson,R.R. (2002) Generating samples under a Wright–Fisher neutral model of
genetic variation. Bioinformatics, 18, 337–338.
Kingman,J.F.C. (1982a) Exchangeability and the evolution of large populations.
In Koch,G. and Spizzichino,F. (eds), Exchangeability in Probability and Statistics.
North-Holland, Amsterdam, pp. 97–112.
Kingman,J.F.C. (1982b) On the genealogy of large populations. J. Appl. Probab., 19A,
Kingman,J.F.C. (1982c)The coalescent.
Kuhner,M.K. et al. (1995) Estimating effective population size and mutation rate
from sequence data using Metropolis–Hastings sampling. Genetics, 140,
Kuhner,M.K. etal. (2000)Maximumlikelihoodestimation ofrecombinationrates from
population data. Genetics, 156, 1393–1401.
Metropolis,N. et al. (1953) Equation of state calculations by fast computing machines.
J. Chem. Phys., 21, 1087–1092.
Nielsen,R. (1998) Maximum likelihood estimation of population divergence times and
population phylogenies under the infinite sites model. J. Theor. Popul. Biol., 53,
Nielsen,R. (2000) Estimation of population parameters and recombination rates from
single nucleotide polymorphisms. Genetics, 154, 931–942.
Bayesian inference using MIGRATE
by guest on June 1, 2013