
Vol. 22 no. 3 2006, pages 341–345

doi:10.1093/bioinformatics/bti803

BIOINFORMATICS ORIGINAL PAPER

Genetics and population analysis

Comparison of Bayesian and maximum-likelihood inference of

population genetic parameters

Peter Beerli

School of Computational Science and Department of Biological Sciences, Florida State University, Tallahassee,

FL 32306-4120, USA

Received on September 8, 2005; revised on November 23, 2005; accepted on November 25, 2005

Advance Access publication November 29, 2005

Associate Editor: Frank Dudbridge

ABSTRACT

Comparison of the performance and accuracy of different inference methods, such as maximum likelihood (ML) and Bayesian inference, is difficult because the inference methods are implemented in different programs, often written by different authors. Both methods were implemented in the program MIGRATE, which estimates population genetic parameters, such as population sizes and migration rates, using coalescence theory. Both inference methods use the same Markov chain Monte Carlo algorithm and differ from each other in only two aspects: parameter proposal distribution and maximization of the likelihood function. Using simulated datasets, the Bayesian method generally fares better than the ML approach in accuracy and coverage, although for some values the two approaches are equal in performance.

Motivation: The Markov chain Monte Carlo-based ML framework can

fail on sparse data and can deliver non-conservative support intervals.

A Bayesian framework with appropriate prior distribution is able to

remedy some of these problems.

Results: The program MIGRATE was extended to allow not only for maximum-likelihood (ML) estimation of population genetics parameters but also for using a Bayesian framework. Comparisons between the Bayesian approach and the ML approach are facilitated because both modes estimate the same parameters under the same population model and assumptions.

Availability: The program is available from http://popgen.csit.fsu.edu/

Contact: beerli@csit.fsu.edu

1 INTRODUCTION

Population genetics changed considerably after Kingman (1982a,b,c)

introduced the n-coalescent. The n-coalescent (coalescent for

short) allows us to calculate probabilities of relationships among

a random population sample. This in turn facilitates calculations

of probabilities of whole genealogies under a specific population

model, for example, two populations exchanging migrants at a

constant rate. The first applications that calculated the likelihood

of the population size parameter based on DNA samples were

described by Griffiths and Tavaré (1994) and Kuhner et al.

(1995). Bahlo and Griffiths (2000) and Beerli and Felsenstein

(1999, 2001) extended the basic estimation of a single parameter

to joint estimations of migration rates and population sizes, whereas

Kuhner et al. (2000) allowed for the estimation of recombination

rate. These maximum-likelihood (ML) approaches were comple-

mented by several Bayesian approaches (Nielsen, 1998, 2000; Hey

and Nielsen, 2004; Beaumont, 1999 and others). All of these

approaches try to estimate population genetic parameters. They

typically treat the genealogy as a nuisance parameter and summar-

ize over all possible genealogies G; to be precise, they sample over

all possible labeled histories T and branch lengths B, taking into

account the genetic data and the population genetic model. The

likelihood of the data given the model parameters is

$$L(D \mid p) = \sum_{T} \int_{B} k(T, B \mid p)\, L(D \mid T, B)\, dB, \tag{1}$$

where k(T,B|p) is the Kingman coalescent probability density and L(D|T,B) is the likelihood of the data given the genealogy.

Nielsen (personal communication, 2001) suggested that the ML approach is hampered by several problems. Maximizing the likelihood function L(D|p) for complicated scenarios with many parameters is very difficult. The Metropolis–Hastings algorithm (Metropolis et al., 1953; Hastings, 1970) with static driving values p_0, as implemented in MIGRATE and other programs, can require a prohibitively long run time to explore completely all the possible genealogies. These problems have been shown by Abdo et al. (2004), although Abdo et al. apparently failed to recognize that the problems are far less serious when using biologically reasonable datasets and when the guidelines about convergence outlined in the MIGRATE manual (available from http://popgen.csit.fsu.edu) are followed.
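The role of the static driving values can be illustrated with a small sketch. In this family of methods, genealogies G_i are sampled with the parameters fixed at a driving value, and the likelihood at another value is estimated relative to it as the average of k(G_i|p)/k(G_i|p_0). The code below assumes a single population with one parameter (theta, the scaled population size) and genealogies summarized by their coalescent interval lengths; the function names and the data layout are hypothetical and are not taken from MIGRATE.

```python
import math

def log_kingman_density(intervals, theta):
    """Log coalescent density of one genealogy in a single population.

    intervals maps the number of lineages j to the length of the time
    interval with j lineages, in mutation-scaled time units (assumption).
    """
    return sum(math.log(2.0 / theta) - j * (j - 1) * u / theta
               for j, u in intervals.items())

def log_relative_likelihood(sampled, theta, theta0):
    """Estimate log[L(theta) / L(theta0)] from genealogies sampled at theta0."""
    log_ratios = [log_kingman_density(g, theta) - log_kingman_density(g, theta0)
                  for g in sampled]
    peak = max(log_ratios)  # subtract the maximum for numerical stability
    mean_ratio = sum(math.exp(x - peak) for x in log_ratios) / len(log_ratios)
    return peak + math.log(mean_ratio)
```

The estimate is only reliable near the driving value, which is why a driving value far from the maximum requires very long runs or repeated updating.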

2 APPROACH

The program MIGRATE uses a Metropolis–Hastings algorithm to

explore all possible genealogies (Beerli and Felsenstein, 1999). The

adaptation of the program to a Bayesian framework was not difficult

because only a module handling the prior distributions and a minor

change in the program flow need to be added, together with changes

in the input and output user interfaces.

The program MIGRATE calculates the posterior probability

distribution per locus, treating each locus as completely unlinked

to the others. This assumption is reasonable: most biologists

would prefer to sample such loci rather than partially linked

or completely linked loci because unlinked loci can be treated

as independent replicates of the genealogical history. MIGRATE

*To whom correspondence should be addressed.

© The Author 2005. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org


by guest on June 1, 2013

http://bioinformatics.oxfordjournals.org/

Downloaded from


approximates the posterior distribution

$$f(p \mid D) = \frac{r(p) \int_{G} k(G \mid p)\, L(D \mid G)\, dG}{P(D)} \tag{2}$$

using a Metropolis–Hastings approach. The integral over G is a condensed expression of the sum over topologies and integral over all branch lengths. The denominator is

$$P(D) = \int_{p \in \Omega} r(p) \int_{G} k(G \mid p)\, L(D \mid G)\, dG\, dp,$$

where we integrate over all possible parameter values p.

The updating scheme of the genealogies is the same in the ML and the Bayesian approach and was described by Beerli and Felsenstein (1999). The updating scheme of the parameters is based on arbitrary prior distributions r(p). MIGRATE allows the user to choose between a small number of prior distributions.

• Uniform prior distribution between a minimum and a maximum value for each parameter.

• Exponential prior distribution with a minimum, mean and maximum value for each parameter.

The incorporation of additional prior distributions, such as a gamma distribution, is planned.
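The two priors listed above can be sketched as simple samplers. The code below is an illustration only: the function names are hypothetical, and it assumes that the exponential prior is an exponential distribution shifted to start at the minimum and truncated at the maximum, with `mean` taken as the mean of the untruncated distribution. How MIGRATE itself interprets the mean is not stated in the text.

```python
import math
import random

def sample_uniform_prior(lo, hi, rng):
    """Draw a value uniformly between lo and hi."""
    return lo + (hi - lo) * rng.random()

def sample_exponential_prior(lo, mean, hi, rng):
    """Draw from an exponential shifted to lo and truncated at hi (inverse CDF).

    Assumption: scale = mean - lo, i.e. `mean` refers to the untruncated
    distribution; requires lo < mean.
    """
    scale = mean - lo
    mass = 1.0 - math.exp(-(hi - lo) / scale)  # probability mass inside [lo, hi]
    u = rng.random()
    return lo - scale * math.log(1.0 - u * mass)

rng = random.Random(7)
# Example bounds: minimum 0, mean 200, maximum 1000.
print(sample_exponential_prior(0.0, 200.0, 1000.0, rng))
```

Because of the truncation, the mean of the sampled values is somewhat smaller than the nominal mean.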

A key issue in Metropolis–Hastings algorithms is whether a proposed change of the current state of the Markov chain is accepted. The algorithm should accept fairly often so that the chain can explore the solution space efficiently; poor algorithms will reject often and force very long runs to achieve equilibrium and an appropriate sample of the possible states. Typically, the acceptance or rejection of a move in the Markov chain is based on a ratio that consists of two parts: (1) the ratio of probabilities to move from an old state to a new state using a prior distribution and the effect of the data (Metropolis et al., 1953), the Metropolis ratio r_M; and (2) the ratio of probabilities to be in the old or new state and go to the new or old state (Hastings, 1970), the Hastings ratio r_H. In the Bayesian implementation in MIGRATE the ratio of accepting a move suggested by the parameter prior is only dependent on the Kingman coalescent probability density. The acceptance/rejection ratio is

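$$r = r_M r_H = \frac{r(p_i^{(n)})\, k(G \mid p_i^{(n)})\, L(D \mid G)}{r(p_i^{(o)})\, k(G \mid p_i^{(o)})\, L(D \mid G)} \cdot \frac{\mathrm{prob}(p_i^{(o)} \mid p_i^{(n)})}{\mathrm{prob}(p_i^{(n)} \mid p_i^{(o)})}, \tag{3}$$

which reduces to

$$r = \frac{k(G \mid p_i^{(n)})}{k(G \mid p_i^{(o)})}. \tag{4}$$

If we consider the uniform random prior distribution (URP) then

$$r(p_i^{(n)}) = r(p_i^{(o)}) \tag{5}$$

and the Hastings ratio r_H will turn into

$$\frac{\mathrm{prob}(p_i^{(o)} \mid p_i^{(n)})}{\mathrm{prob}(p_i^{(n)} \mid p_i^{(o)})} = \frac{r(p_i^{(o)})}{r(p_i^{(n)})} = 1. \tag{6}$$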

For the exponential prior distribution a similar logic applies, although moving from p_i^(o) to p_i^(n) versus from p_i^(n) to p_i^(o) will not have equal probability as with the URP (6). In this case the prior probabilities in the Hastings ratio will cancel with the prior probabilities in the Metropolis ratio [formula (3)].
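The parameter update described above can be sketched in a few lines. The example proposes new values directly from a uniform prior and accepts them with the ratio of coalescent densities, as in formula (4). It is a simplified illustration, not MIGRATE code: it assumes a single population with one parameter (theta), keeps the genealogy fixed instead of alternating with genealogy updates, and uses hypothetical function names and toy interval lengths.

```python
import math
import random

def log_kingman_density(intervals, theta):
    """Log of k(G|theta) for one population (illustrative simplification).

    intervals maps the number of lineages j to the length of the time
    interval with j lineages, in mutation-scaled time units.
    """
    return sum(math.log(2.0 / theta) - j * (j - 1) * u / theta
               for j, u in intervals.items())

def sample_theta(intervals, lo, hi, n_steps, rng):
    """Sample theta given a fixed genealogy, proposing from a uniform prior."""
    theta = lo + (hi - lo) * rng.random()
    log_k_old = log_kingman_density(intervals, theta)
    samples = []
    for _ in range(n_steps):
        proposal = lo + (hi - lo) * rng.random()  # draw the proposal from the prior
        log_k_new = log_kingman_density(intervals, proposal)
        # Formula (4): r = k(G|new) / k(G|old); accept with probability min(1, r).
        if math.log(1.0 - rng.random()) < log_k_new - log_k_old:
            theta, log_k_old = proposal, log_k_new
        samples.append(theta)
    return samples

# Toy genealogy of three sequences (hypothetical interval lengths).
toy = {3: 0.01, 2: 0.02}
draws = sample_theta(toy, lo=0.005, hi=0.5, n_steps=20000, rng=random.Random(1))
print(sorted(draws)[len(draws) // 2])  # posterior median of theta
```

Because the proposal is the prior itself, neither the prior nor the data likelihood L(D|G) has to be evaluated in the parameter update; only the coalescent density changes.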

The performance of the improvements was illustrated on a dataset for four populations with a unidirectional migration pattern (Fig. 1). Simulated DNA sequence alignments, generated using the population model described in Figure 1, were analyzed to show the performance of the Bayesian and the ML approach. One dataset with 10 loci, and four groups of 100 single-locus datasets were analyzed. Each dataset contained 20 individuals from each of the four populations. Using a coalescence-based simulator (cf. Hudson, 2002), ‘true’ genealogies were created using population sizes (Θ_T) for all populations of 0.1, 0.01, 0.001 and 0.0001, and the M_ji referenced in Figure 1. DNA sequences of 10 000 bp length were then simulated on these true genealogies using an F84 model with equal base frequencies and a transition/transversion ratio of 2.0. These datasets were then analyzed using either the ML inference mode (Beerli and Felsenstein, 1999, 2001) or the Bayesian inference mode in MIGRATE. The ML mode was run for 10 short chains visiting 100 000 genealogies and storing 5000, updating the driving parameter after each chain, and two long chains with 10 000 000 visited genealogies and 50 000 sampled using an adaptive heating scheme. The Bayesian inference was run for 10 000 000 updates, approximately half of which were updates of the 16 parameters and approximately half (≈5 000 000 because of random switching between genealogy and parameter updates) were genealogy updates. New parameters were proposed using an exponential prior distribution with a population size mean of 2Θ_T and boundaries of Θ_T/10 and 10Θ_T, and a scaled migration rate M with a mean of 200 and boundaries of 0 and 1000. Results for uniform priors with the same boundaries were very similar, and therefore are not shown.

The results and problematic issues are shown only for population

4, but the pattern is identical for the other three populations. The

scenario chosen for an example is difficult for any gene flow estim-

ator because it requires the estimation of 12 migration rates and

4 population sizes. With high migration rates, haplotypes are dis-

tributed evenly over all populations, so that establishing the direc-

tionality of gene flow from estimated migration rates is difficult.

Fig. 1. Population scenario used in the example: four populations exchange migrants unidirectionally as follows: from population 1 to 4 (M_14), from 4 to 3 (M_43), from 3 to 2 (M_32) and from 2 to 1 (M_21). Parameters are scaled effective population sizes Θ_i (4 × effective population size × mutation rate per site per generation), and scaled immigration rates M_ji (immigration rate divided by mutation rate). Migration along routes indicated by solid arrows was simulated using ‘true’ values of M = 100; migration along all eight other migration routes was simulated with a value of M = 0. Migration along the dashed arrows is discussed in the Results section.


With low migration rates, however, the difference from zero, and

thus the directionality, is also difficult to establish. The number of

variable sites or the number of alleles in the dataset is crucial for

accurate estimation of population size and migration rates of any

magnitude. Single locus datasets with low variability do not allow

estimating migration rates with great precision.

Despite these difficulties, with sufficient data, estimates are expected to be useful for inferring direction and magnitude of gene flow and magnitude of population size. Using the 16-parameter model analyzed here will produce very variable parameter estimates from single-locus data, however, and such analyses are not advisable for real biological data.

2.1 Multilocus analysis

Figure 2 shows that the variability of individual loci resulting from the coalescent can be large and that there are difficulties in reaching convergence; the combined estimate over all loci, however, gives a rather accurate picture. The variability for migration rate estimates is much larger than for the population size estimates. It is difficult to establish the gene flow direction (M_41 versus M_14) for the single-locus estimates. The estimate over all loci clearly allows the distinction between the two directions: M_14 is much bigger than M_41. The estimates of migration parameters between populations with no direct connections, for example, migration rate M_24 between population 2 and 4, are consistently low (Fig. 2).
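Because the loci are treated as unlinked, their likelihoods multiply. One way to obtain a combined posterior from per-locus posteriors that share the same prior is therefore to multiply them and divide by the prior raised to the power (number of loci − 1). The sketch below does this on an evenly spaced parameter grid. It is an assumption about how such a combination could be computed, not a description of MIGRATE's internal procedure; the function name and the grid representation are hypothetical, and all density values must be positive.

```python
import math

def combine_locus_posteriors(grid, locus_posteriors, prior):
    """Combine per-locus posterior densities evaluated on an evenly spaced grid."""
    n_loci = len(locus_posteriors)
    log_post = []
    for j in range(len(grid)):
        value = sum(math.log(post[j]) for post in locus_posteriors)
        value -= (n_loci - 1) * math.log(prior[j])  # count the shared prior only once
        log_post.append(value)
    peak = max(log_post)  # subtract the maximum for numerical stability
    unnorm = [math.exp(v - peak) for v in log_post]
    step = grid[1] - grid[0]
    total = sum(unnorm) * step  # simple rectangle-rule normalization
    return [u / total for u in unnorm]
```

With a flat prior, combining several broad single-locus posteriors that peak in similar places yields a narrower combined curve, which is the qualitative pattern described for Figure 2.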

2.2 Comparison of Bayesian and ML inference

MIGRATE allows direct comparison of the success of parameter inference using the Bayesian approach and the ML approach. In theory the results should be very similar. Table 1 shows medians and quartiles of 100 single-locus runs. Medians and quartiles were chosen because they indicate the distribution of the results better than the mean and standard deviation, which are heavily influenced by large outliers. The median of the maximum posterior probabilities is similar to the median of the ML estimates for moderate values of the population size (Θ = 0.01). The results for the low-variability datasets are mixed; the medians of the two methods are still comparable but the range of the quartiles of the ML estimates of M is very large; standard deviations (data not shown) were even larger because of outliers in the ML analysis. Several of the 100 runs reported values that were very different from the true value. The datasets with the smallest true Θ (0.0001) show even more problems with the ML approach because the median for Θ is strongly overestimated and the range of the quartiles for M is huge. In contrast to ML, the Bayesian runs recover the population size but report very low values for the migration rate. Figure 3 shows a comparison of posterior distributions of the scaled migration rate M of the first datasets of each population size category (Table 1). The power to make inferences about the magnitude of the migration rate is directly correlated with the magnitude of the population size. For very small population sizes there is no power to estimate such low migration rates in the chosen 16-parameter problem with a single-locus dataset of 10 000 bp for each of the 100 individuals. The posterior distribution is similar to the exponential prior distribution used. In contrast to the problems encountered in the migration rate estimations, the posterior distributions for Θ are strongly peaked near 0.0001 (data not shown).

The ML method has difficulty in recovering the expected values when the dataset is very variable, whereas the Bayesian inference is closer to the ‘true’ values for all the scenarios. The range of the quartiles of the ML approach is often much larger than the range of the Bayesian approach.

The coverage of the Bayesian approach is rather conservative and includes the ‘true’ values in the 95% credibility interval with frequencies of 0.85–1.00 for the migration and population size parameters (Table 2), whereas the ML approach has difficulty with

Fig. 2. Posterior distributions f estimated using exponential priors: expected mode for the scaled migration rate M_14 is 100, expected modes for M_41 and for M_24 are zero, expected mode for the scaled effective population size Θ_4 is 0.01. The posterior distributions of 10 independent loci (thin lines) and the combined posterior distribution (thick line) are shown. The relationship among the populations is explained in Figure 1.

Table 1. Comparison of maximum likelihood and Bayesian inference

Θ_4^(t)   M_14^(t)   I    Θ_4                              M_14
                          25%       Med       75%       25%     Med     75%
0.0001    100        M    0.0004    0.00092   0.0028    0.0     0.2     643.6
                     B    0.00006   0.00009   0.00013   7.0     9.0     41.0
0.001     100        M    0.0010    0.0017    0.0036    0.0     46.3    171.5
                     B    0.0013    0.0015    0.0017    65.0    79.0    117.0
0.01      100        M    0.0089    0.0104    0.0128    20.0    53.7    108.1
                     B    0.0085    0.0101    0.0012    63.0    90.0    125.0
0.1       100        M    0.0295    0.0573    0.0825    36.1    66.5    100.5
                     B    0.0698    0.0891    0.1143    45.5    69.0    116.5

Medians and quartiles of 100 single-locus datasets for the two inference methods (I): maximum likelihood (M) and Bayesian (B). The simulated datasets were generated with four different values of ‘true’ population sizes (Θ^(t)). Θ is 4 × effective population size × mutation rate per site per generation, and M is immigration rate over mutation rate. The number of migrants per generation N_e m = ΘM/4 covers a wide range from 0.0025 (corresponding to a Θ = 0.0001) to 2.5 (corresponding to a Θ = 0.1) migrants per generation. Run conditions for ML and Bayes inferences are specified in the text.
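The conversion in the caption follows from the definitions: Θ is 4 × effective population size × mutation rate and M is immigration rate over mutation rate, so ΘM = 4 N_e m. A minimal check of the quoted range, using a hypothetical helper function:

```python
def migrants_per_generation(theta, m_scaled):
    """N_e m = Theta * M / 4, from Theta = 4 N_e mu and M = m / mu."""
    return theta * m_scaled / 4.0

# The four simulated population sizes, all with M = 100.
for theta in (0.0001, 0.001, 0.01, 0.1):
    print(theta, migrants_per_generation(theta, 100.0))
```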


convergence, especially on low-variability datasets, and so has a

rather low coverage (frequencies between 0.06 and 0.94).
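Coverage, as used here, is the fraction of replicate analyses whose 95% interval contains the true value. The sketch below shows the calculation; the function name and the four example intervals are hypothetical and are not results from the paper.

```python
def coverage(intervals, true_value):
    """Fraction of (lower, upper) intervals that contain true_value."""
    hits = sum(1 for lower, upper in intervals if lower <= true_value <= upper)
    return hits / len(intervals)

# Four made-up 95% intervals for a migration rate whose true value is 100.
example = [(50.0, 150.0), (0.0, 90.0), (120.0, 300.0), (80.0, 400.0)]
print(coverage(example, 100.0))
```

A method whose intervals are too narrow or that has not converged will miss the true value often and report low coverage; very wide intervals give coverage close to 1.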

3 DISCUSSION

The scenario chosen for an example is difficult for any gene flow estimation program that uses only a single sample in time. The problem stems from the fact that the only information about the directionality comes from the mutations in the dataset. If the migration rate is high, all mutations, even the rare ones, are distributed over all populations and any directionality estimation based on a single locus will fail. With low migration rates among the populations, each population will acquire unique mutations and in principle the magnitude and directionality can be estimated even for a single-locus dataset, if there is enough variability in the dataset. In reality, however, such an estimation has proven difficult because the difference between the migration rates between two populations is small and often close to zero. Estimation based on single-locus datasets thus often cannot recover the directionality, but multilocus estimates will allow the inference of the migration direction (Fig. 2).

The power to estimate migration rate is crucially dependent on

the number of variable sites or number of alleles in the dataset. Too

little variation leads to haphazard results in the ML method because

the MCMC process has no strong guidance whether to insert or

remove migration events during the course of the analysis; the

process is more dependent on the static driving parameters. Com-

parison of several runs will deliver very different results and there-

fore show non-convergence. The only remedy is to run these

analyses much longer to get a better estimate of the uncertainty

of the estimate. Bayesian analysis is straightforward in such cases

because when the posterior distribution is similar to the prior dis-

tribution, we can conclude that the dataset does not contain enough

information for the inference. The ML method also has difficulties

exploring the distribution around the ML estimate with highly vari-

able data because the genealogy is very well defined by the large

number of variable sites: the static driving value and the updating

scheme (Beerli and Felsenstein, 1999) will not explore many dif-

ferent migration scenarios and therefore the tails of the distribution

are not visited. This results in too narrow support intervals with

small coverage values. In contrast, Bayesian inference manipulates

the parameters using a diffuse prior. This forces more changes of

the genealogy, therefore exploring more different migration scen-

arios and visiting the tails of the posterior distribution more

efficiently.

The coverage shown for the Bayesian runs might be conservative but this is preferable to the coverage reported for ML, especially in the low-variability datasets (Θ ≤ 0.001, Table 2). Some ML runs did not really converge and were estimating either very large or zero migration rates.

4 CONCLUSION

Many users of MIGRATE have reported in numerous email queries

that achieving convergence with the ML approach with low-

information data, such as single locus datasets or data with a low

mutation rate, is difficult and needs special attention. Bayesian

inference seems to allow such users to achieve reliable results

with less effort than the ML approach. It seems appropriate that if only the parameters and their support intervals are of interest, then biologists should prefer the Bayesian approach, although it will be interesting to see whether this will hold for all biological datasets.

ACKNOWLEDGEMENTS

The author thanks Rasmus Nielsen for the suggestion to implement a Bayesian method into MIGRATE. At first, the author was uncertain about its success for complicated scenarios, like the one used (Fig. 1). Thomas Uzzell, Koffi Sampson and two anonymous reviewers helped to improve the manuscript. Funding for this research was supplied through Florida State University and National Science Foundation grant DEB-0108249 to Scott Edwards and P.B.

REFERENCES

Abdo,Z. et al. (2004) Evaluating the performance of likelihood methods for detecting

population structure and migration. Mol. Ecol., 13, 837–851.

Fig. 3. Posterior distribution of the scaled migration rate M_14 for four different values of Θ_4 of a single-locus dataset. The population model is explained in Figure 1. Graphs are results from the first replicate of the four replicate groups shown in Table 1. Data were simulated with M_14 = 100.

Table 2. Coverage of maximum likelihood and Bayesian inferences

Θ_4^(t)   M_14^(t)   Coverage (%)
                     Θ_4              M_14
                     ML      Bayes    ML      Bayes
0.0001    100        6       98       33      100
0.001     100        47      100      55      99
0.01      100        94      96       62      96
0.1       100        51      91       49      85

Coverage analysis of 100 simulated datasets that were generated with four different values of ‘true’ population sizes (Θ^(t)). Θ is 4 × effective population size × mutation rate per site per generation, and M is immigration rate over mutation rate. Coverage is measured as the percentage of times the true value is within the estimated 95% support interval. Run conditions for ML and Bayes inferences are specified in the text.


Bahlo,M. and Griffiths,R.C. (2000) Inference from gene trees in a subdivided population. Theor. Popul. Biol., 57, 79–95.

Beaumont,M.A. (1999) Detecting population expansion and decline using microsatellites. Genetics, 153, 2013–2029.

Beerli,P. and Felsenstein,J. (1999) Maximum-likelihood estimation of migration rates and effective population numbers in two populations using a coalescent approach. Genetics, 152, 763–773.

Beerli,P. and Felsenstein,J. (2001) Maximum likelihood estimation of a migration matrix and effective population sizes in n subpopulations by using a coalescent approach. Proc. Natl Acad. Sci. USA, 98, 4563–4568.

Griffiths,R. and Tavaré,S. (1994) Sampling theory for neutral alleles in a varying environment. Philos. Trans. R. Soc. Lond. Ser. B Biol. Sci., 344, 403–410.

Hastings,W.K. (1970) Monte Carlo sampling methods using Markov chains and their applications. Biometrika, 57, 97–109.

Hey,J. and Nielsen,R. (2004) Multilocus methods for estimating population sizes, migration rates and divergence time, with applications to the divergence of Drosophila pseudoobscura and D. persimilis. Genetics, 167, 747–760.

Hudson,R.R. (2002) Generating samples under a Wright–Fisher neutral model of genetic variation. Bioinformatics, 18, 337–338.

Kingman,J.F.C. (1982a) Exchangeability and the evolution of large populations. In Koch,G. and Spizzichino,F. (eds), Exchangeability in Probability and Statistics. North-Holland, Amsterdam, pp. 97–112.

Kingman,J.F.C. (1982b) On the genealogy of large populations. J. Appl. Probab., 19A, 27–43.

Kingman,J.F.C. (1982c) The coalescent. Stochastic Proc. Appl., 13, 235–248.

Kuhner,M.K. et al. (1995) Estimating effective population size and mutation rate from sequence data using Metropolis–Hastings sampling. Genetics, 140, 1421–1430.

Kuhner,M.K. et al. (2000) Maximum likelihood estimation of recombination rates from population data. Genetics, 156, 1393–1401.

Metropolis,N. et al. (1953) Equation of state calculations by fast computing machines. J. Chem. Phys., 21, 1087–1092.

Nielsen,R. (1998) Maximum likelihood estimation of population divergence times and population phylogenies under the infinite sites model. Theor. Popul. Biol., 53, 143–151.

Nielsen,R. (2000) Estimation of population parameters and recombination rates from single nucleotide polymorphisms. Genetics, 154, 931–942.
