arXiv:math/0506145v1 [math.PR] 8 Jun 2005
Continuous and Tractable models for the Variation
of Evolutionary Rates
Thomas Lepage∗, Stephan Lawi†, Paul Tupper∗ and David Bryant∗
February 1, 2008
McGill Centre for Bioinformatics
Montréal, Québec H3A 2B4
ph: 1 514 398-4578. fax: 1 514 398-3387.
∗Dept. of Mathematics and Statistics, McGill University, Montréal
†Laboratoire de Probabilités et Modèles Aléatoires, Université Pierre et Marie Curie,
We propose a continuous model for evolutionary rate variation
across sites and over the tree and derive exact transition probabilities
under this model. Changes in rate are modelled using the CIR process,
a diffusion widely used in financial applications. The model directly
extends the standard gamma-distributed rates-across-sites model, with
one additional parameter governing changes in rate down the tree. The
parameters of the model can be estimated directly from two well-known
statistics: the index of dispersion and the gamma shape parameter of
the rates across sites model. The CIR model can be readily incor-
porated into probabilistic models for sequence evolution. We provide
here an exact formula for the likelihood of a three-taxa tree. Larger
trees can be evaluated using Monte-Carlo methods.
Keywords Evolutionary rate; Molecular clocks; CIR process; Diffusion
processes; Covarion; Phylogenetics.
Understanding evolutionary rates and how they vary is one of the central
concerns of molecular evolution. It has been clearly shown that inadequate
models of rate variation, between lineages and between loci, can dramatically
affect the accuracy of phylogenetic inference [1, 2, 3]. The dependency of
molecular dating on evolutionary rate models is even more critical: we will
only obtain precise divergence time estimates from molecular data once we
can model the rate at which sequences evolve [4, 5].
Modelling the evolutionary rate is made difficult by the number and variety
of factors influencing it. The base rate of mutation can vary because of
changes in the accuracy of transcription machinery, DNA repair mechanisms,
and metabolic rate. At the cellular level, selective pressures
can lead to variation of rate between loci and over time, as evidenced by
the differential rates of the three codon positions [9, 10], the slower
evolutionary rate of highly expressed genes, and the effect of tertiary
structure on patterns of sequence conservation.
Selection also affects the evolutionary rate at the level of populations.
For the most part, the only mutations that affect phylogenetics are those
that are fixed in the population. Hence evolutionary rate is a combination
of mutation rate and fixation rate. Fluctuations in population size, gener-
ation times, and environmental pressures affect fixation rates and thereby
influence evolutionary rate [13, 14, 15].
Because of this complexity, the strategies employed for modelling evolu-
tionary rate have tended to be statistical in nature. As with all statistical
inference, there is an iterative sequence of model formulation, model as-
sessment, and model improvement. The aim is to construct a model that
accurately explains the observed variation but is as simple, and tractable,
as possible.
Our goal in this paper is to derive a continuous model for rate evolu-
tion that avoids many of the problems of existing approaches. We base
our model on the CIR process, a continuous Markov process that is widely
used in finance to model interest rates. As we shall see, the model fits
well into existing protocols for phylogenetic inference. The process has a
stationary distribution given by a gamma distribution and yet, unlike the
rates-across-sites (RAS) model of Uzzell and Corbin, the rate is allowed
to vary along lineages. The CIR model adds only one parameter to the
RAS model, and this parameter can be estimated directly from the index of
dispersion or the autocorrelation (see below). Furthermore, we can derive
exact transition probabilities when we incorporate CIR based rate variation
into the standard models for sequence evolution.
The outline of the paper is as follows:
• In the following section we summarise the key characteristics of models
for rate evolution, and show how existing models are classified with
respect to these characteristics.
• In Section 3 we present the CIR model for rate evolution and discuss
its basic properties.
• In Section 4 we derive transition probabilities for standard mutation
models where the rate is described as a Markov process.
• In Section 5 we focus on the case where the rate is modelled by a CIR
process.
• In Section 6 we extend this one step further to derive an expression
for likelihood of a three-taxa tree using a mutation model with rate
determined by the CIR process. We note that three-taxa trees are
often used to study differences in evolutionary rate.
We conclude with an outline of future work and work in progress.
In a companion paper (in preparation) we describe the incorporation of
this model into software for Bayesian phylogenetic inference, and use this to
show how our model captures important information lost in standard RAS
models.
2 Properties of models for rate variation
In this section we examine several important characteristics that can be used
to distinguish, and choose between, different models for rate variation. We
discuss how the different existing models fit into this scheme and summarise
the differences between them in Table 1.
The rate of evolution for a given locus at time t ≥ 0 is denoted by Rt.
For each t > 0, Rt is a non-negative random variable, and different models
of rate evolution give different distributions for the rates Rt, t ≥ 0.
Here and throughout the paper we will restrict our attention to Markov
processes. That is, for any t1 ≤ t2 ≤ t3, we assume that R(t3) conditioned
on R(t2) is independent of R(t1). In other words, the future depends on the
past only through the present.
Property I: Continuous or Discontinuous Sample Paths
The first characteristic is whether sample paths of the process are continuous
or discontinuous with respect to time. Typically, models with discontinuous
paths have rates Rt that are constant except for discrete points
in time at which there is a jump in the value (Figure 1-1(a)). If the number
of possible values for the rate is finite, then the rate can easily be described
as a continuous-time Markov chain with an infinitesimal rate matrix. For
example, in the covarion process the basic rates are ‘off’ (Rt = 0) or ‘on’
(Rt = 1) and transitions occur between them at exponentially distributed
random time intervals. Galtier [19, 20] generalizes this process to one with
more than two possible states. In other models, the range of possible values
for the rate is continuous, as in the model of Huelsenbeck, where
a rate change event consists of multiplying the previous rate by a gamma
random variable. The rate change events are still discrete and exponentially
distributed.
There are also models that describe the rate as a continuous function
of time, and the most important class of Markov processes with continuous
paths is the diffusions (Figure 1-2(a)). Examples include the CIR process
presented here, the Ornstein-Uhlenbeck model of Aris-Brosou and Yang [4,
22], and the log-normal model of Kishino et al. [5, 23].
Finally, it is also possible for Rt to make jumps in value at a discrete set
of times while also changing continuously in between these points.
Property II: Long Term Behaviour and Ergodicity
The second property we consider is the distribution of Rt as t goes to
infinity, that is, the distribution of the rate of evolution in the long term.
Surprisingly, many models of rate evolution are very badly behaved in the
long term.
One problematic class of processes that have already been applied to
rates in phylogenetics is the martingales. We say that a Markov process Mt is
a martingale if, for all s, t ≥ 0, we have E[Mt+s | Mt] = Mt. An example
of a Markov martingale is Brownian motion. As a result of this fairly
innocuous looking condition, a martingale Mt has the property that either
E[|Mt|] is unbounded in time or Mt converges to a random constant.
Either possibility is undesirable from a modelling point of view. This may
not be a problem if we only look at the process over a finite time, but
neither is it particularly desirable. The processes of Kishino et al. [5, 23]
and Huelsenbeck et al. all have the property that either Rt or log(Rt)
is a martingale.
Figure 1: A representation of the two classes of rate process with respect
to the classification of property I. On top are examples of the rate history.
Below are the corresponding integrated rates \tau(t) = \int_{s=0}^{t} R_s\,ds.
The figures 1(a-b) refer to a continuous-time Markov chain with discrete rate
change events, and in figures 2(a-b), R(t) is modelled as a diffusion process.
At a purely theoretical level, we observe that an ever-increasing variance
will result for almost any signal that is only driven by its initial value and a
stochastic force, with no directional bias. The position of a particle subjected
to a random force produced by collisions with other particles is a classical
example of such a case. In our context, the effects on the evolutionary rate
are not independent of the actual rate: whatever the theoretical framework
we consider, a high evolutionary rate is not as likely to increase (or to stay
at high values) as to go back to smaller values. Episodic evolution
fits particularly well with this idea, where periods of drastic adaptation with
high evolutionary rates are naturally followed by periods where a population
is adapted and its genome evolves much more slowly. Even according to
the neutral theory, as argued by Takahata, the overall dynamic of the
rate should behave as a random function that takes high values whenever
bottlenecks occur, and goes back to small values afterwards.
The concept of ergodicity naturally arises from this observation. We say
that a Markov process is ergodic if for any initial rate R0 the distribution
of Rt converges to a unique distribution as t goes to infinity. The limiting
distribution is known as the invariant or stationary distribution. Examples
of ergodic processes include the OU process, the CIR model and (usually)
the discrete space covarion and covarion-type models [26, 19, 20].
One possible way for a process to not be ergodic is if the distribution of
Rt does not converge for large t for some initial rate R0. This must be the
case if Rt is a martingale and does not converge to a constant, as is the case
with Brownian motion. Another possibility is that Rt converges to different
stationary distributions for different values of R0.
Property III: Tractability
A highly desirable feature of any model is its tractability, both math-
ematical (does there exist a closed formula?) and computational (can we
compute probabilities efficiently?). Nowadays, Monte Carlo methods make
it possible to use arbitrarily complex models: however, explicit analytical
formulae allow for more efficient sampling.
There are several probability distribution functions that are important
to have when working with rate processes. The most basic is the distribution
of the rate Rt given the rate at time t = 0. This we have for the models
[4, 22, 23, 5] and for the CIR model, but not for the models of .
In phylogenetics we incorporate the model for evolutionary rate into the
mutation model for sequence evolution at a site. These interact to give a
joint process (Rt, Xt) for both the rate Rt at time t and the nucleotide or
amino acid Xt at time t. To evaluate the likelihood we require an expression for
the joint conditional probability
P[Xt = j, Rt = s | X0 = i, R0 = r] (1)
of going from one nucleotide (or amino acid) state and rate state to another
pair of states. Even though it is sometimes possible to perform Monte
Carlo computations to estimate this probability without a formula, having
a formula will speed up the computations significantly without
having to resort to approximations, as in [23, 5].
Property IV: Autocovariance and dispersion
There is general agreement on the relevance of autocorrelation
in the modelling of evolutionary rate. Broadly speaking, if the various
causes that explain rate variation (generation time, population size,
environmental fitness) vary with time, this should be reflected in rate
variations. The extent to which the rate varies can be studied using the
index of dispersion (Kimura; Langley and Fitch). Let N(t) be the number of
substitutions or mutations of a sequence over time t. The index of dispersion
I(t) is defined as
\[ I(t) = \frac{\mathrm{Var}[N(t)]}{\mathrm{E}[N(t)]}. \tag{2} \]
This statistic can be estimated by comparing the number of substitutions
that have accumulated in different lineages [32, 34]. The population genetics
community has proposed different models to account for a high index of
dispersion, and any reasonable model should yield an index of
dispersion of at least one.
The index of dispersion resulting from a particular model of rate varia-
tion is a function of the autocovariance of that model. The autocovariance
for a process Rt is defined by
ρ(t) = Cov(R0, Rt). (3)
For many processes we can derive an explicit formula for the autocovariance.
If we assume that the substitutions occur according to a Poisson process with
rate governed by our rate process (that is, the substitutions follow a doubly
stochastic or Cox process, Section 4) and the rate process has autocovariance
function ρ(t) then
\[ I(t) = 1 + \frac{2}{\mu t} \int_0^t (t - s)\,\rho(s)\,ds \tag{4} \]
(a standard result for Cox processes), and the stationary index of dispersion is
\[ I(\infty) = \lim_{t\to\infty} I(t) = 1 + \frac{2}{\mu} \int_0^\infty \rho(s)\,ds, \tag{5} \]
provided that µ, the stationary mean of the process R(t), and the limit,
exist. Note that if there is any variation in rate then the index of dispersion
will be greater than one.
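As an illustration only, the relation I(t) = 1 + (2/(µt)) ∫_0^t (t − s) ρ(s) ds between the autocovariance and the index of dispersion can be evaluated numerically. The sketch below assumes a stationary rate process with mean µ; the function name `index_of_dispersion`, the midpoint rule, and the exponentially decaying autocovariance used as input are all choices made here for the example, not part of the model description.

```python
import math

def index_of_dispersion(rho, mu, t, n_steps=10_000):
    """Approximate I(t) = 1 + 2/(mu*t) * int_0^t (t - s) * rho(s) ds
    with the midpoint rule (an illustrative choice of quadrature)."""
    h = t / n_steps
    integral = 0.0
    for k in range(n_steps):
        s = (k + 0.5) * h          # midpoint of the k-th sub-interval
        integral += (t - s) * rho(s)
    integral *= h
    return 1.0 + 2.0 * integral / (mu * t)

# Hypothetical autocovariance: rho(s) = v * exp(-b * s), stationary mean mu.
v, b, mu = 0.5, 2.0, 1.0
rho = lambda s: v * math.exp(-b * s)

print(index_of_dispersion(rho, mu, t=1.0))
# For large t the value approaches 1 + (2/mu) * int_0^inf rho(s) ds = 1 + 2*v/(mu*b).
print(index_of_dispersion(rho, mu, t=100.0), 1.0 + 2.0 * v / (mu * b))
```

With a constant autocovariance in place of the decaying one, the same function grows without bound as t increases, which is the divergence discussed for martingale-type models.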
Some rate models in phylogenetics [23, 22] do not explicitly model the
rate, but instead assign a (fixed) rate to each branch, so that the expected
number of substitutions on a particular branch is equal to its length times
its assigned (constant) rate.
A close look at the log-normal model of Thorne et al., which differs
from their previous version in that the rate is explicitly modelled,
suggests that the rate has constant autocovariance, since this rate process is
close to a transform of Brownian motion, and Brownian motion has a
constant autocovariance function. Substituting into equation (5), we see that
the index of dispersion diverges. This problematic result illustrates the need
for a balance between the presence of autocorrelation on the one hand and the
decay of autocorrelation over large time scales on the other.
Property V: Heterotachy or Homotachy
There are two general ways that models for evolutionary rate can be
incorporated into phylogenetics. On one hand, we have rate variation among
lineages that applies to all sites (or loci) together. This can be modelled by
trees for which the paths from the root to the leaves have different lengths.
The rate variation explains the extent to which the evolution of the sequences
has violated the molecular clock. Alternatively, we can introduce a distinct
rate process for each site or locus. This models heterotachy, where the
lineage rate changes are site-specific. The transition probabilities that we
derive in Section 4 can be applied to homotachous as well as heterotachous
models.
3 A Continuous diffusion model for the evolutionary rate
A Markov process with continuous paths and satisfying some additional
smoothness conditions on its transition probabilities is called a diffusion.
There are many ways of specifying a diffusion process: perhaps the
most intuitive one is by giving the probability density function (pdf)
of Rt given R0 = r0, for arbitrary r0. We denote this pdf by fR[Rt|r0]. For
example, Brownian motion with parameter σ² is defined by the condition
that fR[Rt|r0] is a normal density with mean r0 and variance σ²t.
Table 1: Models for the substitution rate, classified according to the
properties of Section 2. CTMC stands for “Continuous-Time Markov Chain”.
A mathematically convenient representation of a diffusion is by means of
a stochastic differential equation (SDE). In the same way that a dynamical
system can be defined as the solution of a differential equation, a diffusion
process Rt can be defined as the solution of an equation taking the general
form
dRt = α(t, Rt) dt + β(t, Rt) dBt. (6)
Here, α(t, Rt) represents the deterministic effect on Rt, β(t, Rt) the
stochastic part, and dBt is an infinitesimal “random” increment. Brownian motion
corresponds to the case when α(t, Rt) = 0 for all t and β(t, Rt) = σ is
constant, so that the SDE becomes
dRt = σ dBt.
Note that if β(t, Rt) = 0 for all t and Rt then (6) becomes a deterministic
ordinary differential equation.
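To make the roles of the drift α and the noise coefficient β concrete, a path of an SDE of this general form can be approximated by the Euler-Maruyama scheme. This scheme is a standard numerical method introduced here purely for illustration; it is not used in the derivations of this paper, and the function name `euler_maruyama` and all parameter values are hypothetical.

```python
import math
import random

def euler_maruyama(alpha, beta, r0, t_end, n_steps, rng):
    """Approximate one path of dR = alpha(t, R) dt + beta(t, R) dB on [0, t_end]."""
    dt = t_end / n_steps
    r = r0
    path = [r0]
    for k in range(n_steps):
        t = k * dt
        dB = rng.gauss(0.0, math.sqrt(dt))   # Brownian increment, N(0, dt)
        r = r + alpha(t, r) * dt + beta(t, r) * dB
        path.append(r)
    return path

rng = random.Random(0)

# Brownian motion: alpha = 0 and beta constant (here 0.3).
bm = euler_maruyama(lambda t, r: 0.0, lambda t, r: 0.3, 1.0, 1.0, 1000, rng)

# Deterministic case: beta = 0 leaves the ODE dR/dt = -R, with solution exp(-t).
ode = euler_maruyama(lambda t, r: -r, lambda t, r: 0.0, 1.0, 1.0, 1000, rng)

print(bm[-1])
print(ode[-1], math.exp(-1.0))
```

The second call shows the remark about β = 0: the random increments no longer matter and the scheme reduces to the ordinary Euler method for an ODE.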
Going from an SDE such as (6) to a pdf for the diffusion involves solving
a variable-coefficient second-order partial differential equation (PDE). For
general functions α and β this PDE has no analytic solution. There are
very few diffusions known that have closed form equations for their pdfs,
and even fewer of these are ergodic. The simplest ergodic diffusions with
closed-form expressions for the pdf are the Ornstein-Uhlenbeck and the CIR
(Cox-Ingersoll-Ross) processes.
The Ornstein-Uhlenbeck (OU) process is described by the SDE
dRt = −b Rt dt + σ dBt.
The pdf for Rt given R0 = r0 is the normal density with mean r0 e^{−bt} and
variance σ²(1 − e^{−2bt})/(2b). Its stationary distribution is normal with mean
0 and variance σ²/(2b). The OU process was used by Aris-Brosou and Yang
to model evolutionary rates. However, the OU process can take on
negative values, and it is not clear how it can be used directly without
any transformation, such as a reflected OU or a squared OU. Aris-Brosou
and Yang also proposed another model, the EXP (for exponential) model,
defined as follows: the rate assigned to a branch is drawn from an
exponential distribution with mean equal to the rate of its ancestral branch.
Their EXP model is therefore a martingale. They reported
that the OU model seemed to provide a better fit to their data than the
EXP model. Although the reason for this better fit remains to be investigated,
it seems reasonable to think that the ergodic property of the OU model
could be an important factor. They also mentioned that the σ² parameter
of the OU model was hard to infer, perhaps because the OU model has an
insufficient number of free parameters.
The use of the CIR model solves the problem, since it is a generaliza-
tion of the squared OU process, where the mean and the variance can be
independently inferred by the addition of a third parameter. If the mean
is fixed to one, we avoid any identifiability problem with branch lengths
without fixing the variance, which can therefore be inferred as well as the
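The CIR process satisfies the SDE
\[ dR_t = b(a - R_t)\,dt + \sigma\sqrt{R_t}\,dB_t, \tag{7} \]
and the pdf fR(Rt|r0) for Rt given R0 = r0 is, up to the scale factor
σ²(1 − e^{−bt})/(4b), a non-central χ² distribution
with 4ab/σ² degrees of freedom and parameter of non-centrality
\[ \frac{4b\,r_0\,e^{-bt}}{\sigma^2\,(1 - e^{-bt})}. \tag{8} \]
Its mean and variance are equal to
\[ \begin{aligned}
E[R_t] &= r_0 e^{-bt} + a\,(1 - e^{-bt}), \\
\mathrm{Var}[R_t] &= r_0\,\frac{\sigma^2}{b}\,(e^{-bt} - e^{-2bt}) + \frac{a\sigma^2}{2b}\,(1 - e^{-bt})^2.
\end{aligned} \tag{9} \]
The stationary distribution of Rt is a gamma distribution with shape
parameter 2ab/σ² and scale parameter σ²/(2b). Hence the mean of the
stationary distribution is a and the variance is aσ²/(2b).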
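The conditional moments of the CIR process can be checked against exact draws of Rt given R0 = r0. The sketch below relies on the standard representation of the CIR transition law as a scaled non-central χ² variate, with scale c = σ²(1 − e^{−bt})/(4b), 4ab/σ² degrees of freedom, and non-centrality r0 e^{−bt}/c. The function names and the parameter values are hypothetical, and NumPy's `Generator.noncentral_chisquare` is used as the sampler.

```python
import math
import numpy as np

def cir_mean(r0, a, b, t):
    """E[R_t | R_0 = r0] for dR = b(a - R) dt + sigma sqrt(R) dB."""
    e = math.exp(-b * t)
    return r0 * e + a * (1.0 - e)

def cir_var(r0, a, b, sigma2, t):
    """Var[R_t | R_0 = r0]; sigma2 denotes sigma squared."""
    e = math.exp(-b * t)
    return r0 * sigma2 / b * (e - e * e) + a * sigma2 / (2.0 * b) * (1.0 - e) ** 2

def cir_sample(r0, a, b, sigma2, t, size, rng):
    """Draw R_t | R_0 = r0 exactly as a scaled non-central chi-square variate."""
    e = math.exp(-b * t)
    c = sigma2 * (1.0 - e) / (4.0 * b)   # scale factor
    df = 4.0 * a * b / sigma2            # degrees of freedom
    nonc = r0 * e / c                    # non-centrality parameter
    return c * rng.noncentral_chisquare(df, nonc, size=size)

rng = np.random.default_rng(0)
x = cir_sample(r0=2.0, a=1.0, b=1.0, sigma2=0.5, t=0.7, size=200_000, rng=rng)
print(x.mean(), cir_mean(2.0, 1.0, 1.0, 0.7))
print(x.var(), cir_var(2.0, 1.0, 1.0, 0.5, 0.7))
```

For large t the two moment functions approach a and aσ²/(2b), the mean and variance of the stationary gamma distribution.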
Unlike an OU process, if r0, a, and b are all positive a CIR process is
always non-negative. The square of an OU process is a special case of the
CIR process. Furthermore, by multiplying Rt by a constant in equation (7),
we see that multiplying a CIR process by a positive constant gives another
CIR process.
The covariance of the stationary CIR process can be exactly computed as
\[ \rho(t) = \mathrm{Cov}(R_0, R_t) = \frac{a\sigma^2}{2b}\,e^{-bt}. \]
From this, (4) leads to a closed formula for the index of dispersion:
\[ I_{CIR}(t) = 1 + \frac{\sigma^2}{b^3 t}\,(bt - 1 + e^{-bt}), \]
and
\[ I_{CIR}(\infty) = \lim_{t\to\infty} I_{CIR}(t) = 1 + \frac{\sigma^2}{b^2}. \tag{11} \]
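For readers who wish to verify the closed form, one possible route is sketched below. It assumes the stationary CIR process with mean a and autocovariance (aσ²/(2b)) e^{−bt}, together with the general relation I(t) = 1 + (2/(µt)) ∫_0^t (t − s) ρ(s) ds with µ = a; the intermediate steps are a reconstruction and are not taken from the text.

```latex
\begin{align*}
\int_0^t (t-s)\,e^{-bs}\,ds
  &= \frac{t}{b}\left(1 - e^{-bt}\right) - \int_0^t s\,e^{-bs}\,ds \\
  &= \frac{t}{b}\left(1 - e^{-bt}\right)
     + \frac{t\,e^{-bt}}{b} - \frac{1 - e^{-bt}}{b^2}
   = \frac{bt - 1 + e^{-bt}}{b^2}, \\[4pt]
I_{CIR}(t)
  &= 1 + \frac{2}{a\,t}\cdot\frac{a\sigma^2}{2b}\cdot\frac{bt - 1 + e^{-bt}}{b^2}
   = 1 + \frac{\sigma^2}{b^3 t}\left(bt - 1 + e^{-bt}\right), \\[4pt]
\lim_{t\to\infty} I_{CIR}(t)
  &= 1 + \frac{\sigma^2}{b^2},
  \qquad\text{since}\quad \frac{bt - 1 + e^{-bt}}{b^3 t} \to \frac{1}{b^2}.
\end{align*}
```

Note that the stationary mean a cancels, so the index of dispersion depends only on σ² and b.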
In Section 2 we emphasized that the concept of autocovariance is close
to the index of dispersion. As Zheng showed, the effect of complex
infinitesimal rate matrices on the index of dispersion (with constant rate) is
not likely to explain alone the observed large empirical values. If the rate
varies, Cutler showed that an elevated index of dispersion can only be
achieved if the time-scale of the rate process is approximately of the same
magnitude as the substitution process itself. The CIR process provides the
possibility to satisfy this property, while incorporating autocovariance. It is
the consensus of these two ideas that should guide our choice for the rate of
From (7) we see that the CIR process possesses three parameters, namely
the stationary mean a, the parameter σ², which together with a and b sets the
stationary variance aσ²/(2b), and the intensity of the
force that drives the process to its stationary distribution, b. The parameter
b determines how fast the autocovariance of the process goes to 0 as t increases.
The three parameters of the CIR process can be quickly estimated from
standard statistics in molecular evolution. The parameter a is a scale
parameter. It determines the expected rate at any time given no other information.
Throughout the paper, we will assume that a = 1, so that the model has an
expected rate equal to one. This parallels the constraint that the gamma
distribution has an expected rate equal to one in the Rate-Across-Site (RAS)
model.
The CIR process has a stationary distribution given by a gamma distribution.
To make the stationary distribution coincide with the gamma
distribution of a RAS model with parameter Γ we choose σ and b such that
\[ \frac{2b}{\sigma^2} = \Gamma. \tag{12} \]
The stationary index of dispersion, ICIR(∞), can be estimated empirically
[33, 28]. We can then use (11) and (12) to obtain the estimates
\[ \hat{b} = \frac{2}{\Gamma\,(\hat{I}_{CIR}(\infty) - 1)}, \qquad
   \hat{\sigma}^2 = \frac{2\hat{b}}{\Gamma} = \frac{4}{\Gamma^2\,(\hat{I}_{CIR}(\infty) - 1)}. \]
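The estimation step can be written as a small helper. The sketch below assumes a = 1, the matching condition 2b/σ² = Γ for the gamma shape parameter, and the stationary index of dispersion 1 + σ²/b²; solving these two relations gives b = 2/(Γ(I − 1)) and σ² = 2b/Γ. The function name and the input values are hypothetical.

```python
def cir_params_from_statistics(gamma_shape, dispersion_inf):
    """Solve 2b/sigma^2 = gamma_shape and 1 + sigma^2/b^2 = dispersion_inf
    for (b, sigma^2), assuming the stationary mean a = 1."""
    if gamma_shape <= 0.0:
        raise ValueError("the gamma shape parameter must be positive")
    if dispersion_inf <= 1.0:
        raise ValueError("the stationary index of dispersion must exceed one")
    b = 2.0 / (gamma_shape * (dispersion_inf - 1.0))
    sigma2 = 2.0 * b / gamma_shape
    return b, sigma2

# Hypothetical inputs: gamma shape 0.5 and stationary index of dispersion 3.
b_hat, sigma2_hat = cir_params_from_statistics(0.5, 3.0)
print(b_hat, sigma2_hat)
# Round trip: both defining relations are recovered.
print(2.0 * b_hat / sigma2_hat, 1.0 + sigma2_hat / b_hat**2)
```

An index of dispersion close to one yields a very large b, that is, a rate process that decorrelates almost immediately.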
4 Mutation models with a rate process
The standard model for the substitution process at a particular locus is a
continuous-time Markov chain. This kind of process is defined by a square
matrix Q called the infinitesimal rate matrix. Suppose, to begin, that there
is a constant evolutionary rate r0. As above, we let Xt denote the state (e.g.
amino acid) at time t. The transition probabilities are then given by
\[ \Pr[X_t = j \mid X_0 = i] = [e^{Q r_0 t}]_{ij}. \tag{13} \]
We suppose that the process has a unique stationary distribution π, where
πj is the stationary probability of state j and
\[ \pi_j = \lim_{t\to\infty} \Pr[X_t = j \mid X_0 = i] \]
for all i and j. We assume that Q has been normalised so that in the
stationary distribution the expected number of mutations over time t equals
r0 t. Note that the transition probabilities (13) depend only on the product
r0 t, so will be the same if we double the rate and halve the time, for example.
Suppose now that the rate is not constant, but instead varies according
to some fixed function rs, s ≥ 0. Equation (13) then becomes
\[ \Pr[X_t = j \mid X_0 = i, r] = [e^{Q \tau_r}]_{ij}, \tag{14} \]
where
\[ \tau_r = \int_0^t r_s\,ds \]
is the area under the curve rs.
In the models we will consider, the fixed function r = (rt)t≥0 is replaced
by a random process R = (Rt)t≥0 that is dependent only on the starting
rate r0. The integral
\[ \tau_R = \int_0^t R_s\,ds \tag{15} \]
is also random in this case; let gR denote its pdf. The transition probabilities
can be determined from the expected value of (14) with τr replaced by the
random variable τR. By the law of total expectation, this simplifies to
\[ \Pr[X_t = j \mid X_0 = i] = \int_0^\infty [e^{Q\tau}]_{ij}\,g_R(\tau)\,d\tau. \tag{16} \]
Let M(η) = E[e^{ητR}] denote the moment generating function (mgf) for
the random variable τR. Then (16) can be rewritten
Pr[Xt = j | X0 = i] = [M(Q)]ij
where the function M is interpreted as a matrix function. We assume
that Q can be diagonalised as Q = V Λ V⁻¹, where Λ = diag(λ1, ..., λn) is
a diagonal matrix formed from the eigenvalues of Q. The matrix function
M(Q) can then be evaluated as M(Q) = V M(Λ) V⁻¹, where
M(Λ) = diag(M(λ1), ..., M(λn)).
The problem of determining
pattern probabilities therefore boils down to the problem of determining the
moment generating function of the integrated rate, τR (eqn. 15). Tuffley
and Steel use this approach to derive distance estimates for the covarion
model.
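The evaluation of M(Q) through the eigendecomposition of Q can be sketched in a few lines. The example below is illustrative only: it uses the Jukes-Cantor rate matrix and the mgf of τ = tR with R gamma distributed with mean one (the rates-across-sites setting), both chosen here because the result has a well-known closed form to compare against. The function name `matrix_function` and the numerical values are hypothetical.

```python
import numpy as np

def matrix_function(Q, M):
    """Evaluate M(Q) = V diag(M(lambda_1), ..., M(lambda_n)) V^{-1},
    assuming Q is diagonalisable."""
    eigvals, V = np.linalg.eig(Q)
    M_lambda = np.diag([M(lam) for lam in eigvals])
    return (V @ M_lambda @ np.linalg.inv(V)).real

# Jukes-Cantor rate matrix, normalised to one expected substitution per unit time.
Q = np.full((4, 4), 1.0 / 3.0)
np.fill_diagonal(Q, -1.0)

t, shape = 0.8, 0.5
# mgf of tau = t * R, with R ~ Gamma(shape, scale = 1/shape), so E[R] = 1.
mgf_gamma = lambda eta: (1.0 - eta * t / shape) ** (-shape)

P = matrix_function(Q, mgf_gamma)
print(P[0, 0], 0.25 + 0.75 * (1.0 + 4.0 * t / (3.0 * shape)) ** (-shape))
print(P.sum(axis=1))  # each row should sum to one
```

Passing `lambda eta: np.exp(eta * r0 * t)` instead recovers the constant-rate transition matrix, since that is the mgf of the deterministic integrated rate r0 t.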
For applications in phylogenetics, we need the mgf of τR conditioned on
just the starting rate, or both the starting and finishing rate. The mgf of
τR, conditioned on a starting rate of r0, is then
\[ M_{r_0}(\eta) = E\left[\exp(\eta\tau_R) \mid R_0 = r_0\right]. \tag{17} \]
As before, we let fR(Rt|r0) denote the pdf of Rt conditioned on R0 = r0.
Let δ(x) denote the Dirac delta function with δ(0) = 1 and δ(x) = 0 for all
x ≠ 0. The mgf of τR conditioned on both the starting and finishing rates is
\[ \begin{aligned}
M_{r_0,r_t}(\eta) &= E\left[\exp(\eta\tau_R) \mid R_0 = r_0,\ R_t = r_t\right] \\
 &= \frac{1}{f_R(r_t|r_0)}\,E\left[\exp(\eta\tau_R)\,\delta(R_t - r_t) \mid R_0 = r_0\right].
\end{aligned} \tag{18} \]
Equations (17) and (18) hold irrespective of whether R is discrete or
continuous, a diffusion, jump process, or a continuous time Markov chain.
We note in passing that analytic formulae for Mr0(η) and Mr0,rt(η) exist
in the case that R is a continuous time Markov chain, for example in the
covarion-type model of Galtier. Suppose that the evolutionary rate
switches between rate values g1, g2, ..., gk following a continuous time Markov
chain with infinitesimal rate matrix G. Let D be the k × k diagonal matrix
with entries g1,g2,...,gk. A careful reworking of the proof of Theorem 1 in
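gives the mgf of τR conditioned on both the starting and finishing rate.
The mgf for τR conditioned on r0 = gi is then
\[ M_{g_i}(\eta) = \sum_{j=1}^{k} \left[e^{(G + \eta D)t}\right]_{ij}, \]
while the mgf of τR conditioned on r0 = gi and rt = gj is
\[ M_{g_i,g_j}(\eta) = \frac{\left[e^{(G + \eta D)t}\right]_{ij}}{\left[e^{Gt}\right]_{ij}}. \]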
This provides an independent derivation of the formula in  for transition
probabilities under a covarion-type model.
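For a rate that follows a finite continuous-time Markov chain, the two conditional mgfs can be computed from a matrix exponential. The sketch below assumes the standard identity E[exp(ητR) 1{Rt = gj} | R0 = gi] = [exp((G + ηD)t)]ij; summing over j gives the mgf conditioned on the starting rate, and dividing by [exp(Gt)]ij gives the mgf conditioned on both end points. The helper names, the eigendecomposition-based exponential (which assumes diagonalisable matrices), and the two-state on/off generator are hypothetical choices made for this example.

```python
import numpy as np

def expm_eig(A):
    """Matrix exponential via eigendecomposition (assumes A is diagonalisable)."""
    w, V = np.linalg.eig(A)
    return (V @ np.diag(np.exp(w)) @ np.linalg.inv(V)).real

def mgf_start(G, g, eta, t):
    """mgf of the integrated rate given each starting state i:
    row sums of exp((G + eta * D) t), with D = diag(g)."""
    return expm_eig((G + eta * np.diag(g)) * t).sum(axis=1)

def mgf_bridge(G, g, eta, t):
    """mgf given starting state i and finishing state j:
    entries of exp((G + eta * D) t) divided by those of exp(G t)."""
    return expm_eig((G + eta * np.diag(g)) * t) / expm_eig(G * t)

# Hypothetical two-state rate process: 'off' (rate 0) and 'on' (rate 1).
G = np.array([[-0.5, 0.5],
              [1.0, -1.0]])
g = np.array([0.0, 1.0])

print(mgf_start(G, g, eta=-1.0, t=1.5))
print(mgf_bridge(G, g, eta=-1.0, t=1.5))
```

In the substitution model these functions would be evaluated at the eigenvalues of Q, which are non-positive, so the values lie between zero and one.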
5 Moment generating functions and transition probabilities for the CIR model
In this section we derive expressions for the (joint) transition probabilities
Pr[Xt = j | X0 = i, R0 = r0] (19)
and
Pr[Xt = j, Rt = s | X0 = i, R0 = r0]. (20)