arXiv:math/0506145v1 [math.PR] 8 Jun 2005
Continuous and Tractable models for the Variation
of Evolutionary Rates
Thomas Lepage∗, Stephan Lawi†, Paul Tupper∗and David Bryant∗
February 1, 2008
McGill Centre for Bioinformatics
Montréal, Québec H3A 2B4
ph: 1 514 398-4578. fax: 1 514 398-3387.
∗Dept. of Mathematics and Statistics, McGill University, Montréal
†Laboratoire de Probabilités et Modèles Aléatoires, Université Pierre et Marie Curie,
We propose a continuous model for evolutionary rate variation
across sites and over the tree and derive exact transition probabilities
under this model. Changes in rate are modelled using the CIR process,
a diffusion widely used in financial applications. The model directly
extends the standard gamma-distributed rates-across-sites model, with
one additional parameter governing changes in rate down the tree. The
parameters of the model can be estimated directly from two well-known
statistics: the index of dispersion and the gamma shape parameter of
the rates across sites model. The CIR model can be readily incor-
porated into probabilistic models for sequence evolution. We provide
here an exact formula for the likelihood of a three taxa tree. Larger
trees can be evaluated using Monte-Carlo methods.
Keywords Evolutionary rate; Molecular clocks; CIR process; Diffusion
processes; Covarion; Phylogenetics.
Understanding evolutionary rates and how they vary is one of the central
concerns of molecular evolution. It has been clearly shown that inadequate
models of rate variation, between lineages and between loci, can dramatically
affect the accuracy of phylogenetic inference [1, 2, 3]. The dependency of
molecular dating on evolutionary rate models is even more critical: we will
only obtain precise divergence time estimates from molecular data once we
can model the rate at which sequences evolve [4, 5].
Modelling the evolutionary rate is made difficult by the number and variety
of factors influencing it. The base rate of mutation can vary because of
changes in the accuracy of transcription machinery, DNA repair mechanisms,
and metabolic rate. At the cellular level, selective pressures
can lead to variation of rate between loci and over time, as evidenced by
differential rates of the three codon positions [9, 10], the slower evolutionary
rate of highly expressed genes, and the effect of tertiary structure on
patterns of sequence conservation.
Selection also affects the evolutionary rate at the level of populations.
For the most part, the only mutations that affect phylogenetics are those
that are fixed in the population. Hence evolutionary rate is a combination
of mutation rate and fixation rate. Fluctuations in population size, gener-
ation times, and environmental pressures affect fixation rates and thereby
influence evolutionary rate [13, 14, 15].
Because of this complexity, the strategies employed for modelling evolu-
tionary rate have tended to be statistical in nature. As with all statistical
inference, there is an iterative sequence of model formulation, model as-
sessment, and model improvement. The aim is to construct a model that
accurately explains the observed variation but is as simple, and tractable,
as possible.
Our goal in this paper is to derive a continuous model for rate evolu-
tion that avoids many of the problems of existing approaches. We base
our model on the CIR process, a continuous Markov process that is widely
used in finance to model interest rates. As we shall see, the model fits
well into existing protocols for phylogenetic inference. The process has a
stationary distribution given by a gamma distribution and yet, unlike the
rates-across-sites (RAS) model of Uzzell and Corbin, the rate is allowed
to vary along lineages. The CIR model adds only one parameter to the
RAS model, and this parameter can be estimated directly from the index of
dispersion or the autocorrelation (see below). Furthermore, we can derive
exact transition probabilities when we incorporate CIR based rate variation
into the standard models for sequence evolution.
The outline of the paper is as follows:
• In the following section we summarise the key characteristics of models
for rate evolution, and show how existing models are classified with
respect to these characteristics.
• In Section 3 we present the CIR model for rate evolution and discuss
its basic properties.
• In Section 4 we derive transition probabilities for standard mutation
models where the rate is described as a Markov process.
• In Section 5 we focus on the case where the rate is modelled by a CIR
process.
• In Section 6 we extend this one step further to derive an expression
for likelihood of a three-taxa tree using a mutation model with rate
determined by the CIR process. We note that three-taxa trees are
often used to study differences in evolutionary rate.
We conclude with an outline of future work and work in progress.
In a companion paper (in preparation) we describe the incorporation of
this model into software for Bayesian phylogenetic inference, and use this to
show how our model captures important information lost in standard RAS
models.
2 Properties of models for rate variation
In this section we examine several important characteristics that can be used
to distinguish, and choose between, different models for rate variation. We
discuss how the different existing models fit into this scheme and summarise
the differences between them in Table 1.
The rate of evolution for a given locus at time $t \ge 0$ is denoted by $R_t$.
For each $t > 0$, $R_t$ is a non-negative random variable, and different models
of rate evolution give different distributions for the rates $R_t$, $t \ge 0$.
Here and throughout the paper we will restrict our attention to Markov
processes. That is, for any $t_1 \le t_2 \le t_3$, we assume that $R_{t_3}$ conditioned
on $R_{t_2}$ is independent of $R_{t_1}$. In other words, the future depends on the
past only through the present.
Property I: Continuous or Discontinuous Sample Paths
The first characteristic is whether sample paths of the process are
continuous or discontinuous with respect to time. Typically, models with
discontinuous paths have rates $R_t$ that are constant except for discrete points
in time at which there is a jump in the value (Figure 1-1(a)). If the number
of possible values for the rate is finite, then the rate can easily be described
as a continuous-time Markov chain with an infinitesimal rate matrix. For
example, in the covarion process the basic rates are ‘off’ ($R_t = 0$) or ‘on’
($R_t = 1$) and transitions occur between them at exponentially distributed
random time intervals. Galtier [19, 20] generalizes this process to one with
more than two possible states. In other models, the range of possible values
for the rate is continuous, as in the model of Huelsenbeck, where
a rate change event consists of multiplying the previous rate by a gamma
random variable. The rate change events are still discrete and exponentially
distributed.
There are also models that describe the rate as a continuous function
of time, and the most important class of Markov processes with continuous
paths are diffusions (Figure 1-2(a)). Examples include the CIR process
presented here, the Ornstein-Uhlenbeck model of Aris-Brosou and Yang [4,
22], and the log-normal model of Kishino et al. [5, 23].
Finally, it is also possible for $R_t$ to make jumps in value at a discrete set
of times while also changing continuously in between these points.
Property II: Long Term Behaviour and Ergodicity
The second property we consider is the distribution of $R_t$ as t goes to
infinity, that is, the distribution of the rate of evolution in the long term.
Surprisingly, many models of rate evolution are very badly behaved in the
long term.
One problematic class of processes that have already been applied to
rates in phylogenetics is the martingales. We say that a Markov process $M_t$ is
a martingale if, for all $s,t \ge 0$ we have $E[M_{t+s}|M_t] = M_t$. An example
of a Markov martingale is Brownian motion. As a result of this fairly
innocuous looking condition, a martingale $M_t$ has the property that either
$E[|M_t|]$ is unbounded in time or $M_t$ converges to a random constant.
Either possibility is undesirable from a modelling point of view. This may
not be a problem if we only look at the process over a finite time, but
neither is it particularly desirable. The processes of Kishino et al. [5, 23]
and Huelsenbeck et al. all have the property that either $R_t$ or $\log(R_t)$
is a martingale.

Figure 1: A representation of the two classes of rate process with respect
to the classification of Property I. On top are examples of the rate history.
Below are the corresponding integrated rates $\tau(t) = \int_{s=0}^{t} R_s\,ds$. Panels
1(a-b) refer to a continuous-time Markov chain with discrete rate change
events, and in panels 2(a-b), R(t) is modelled as a diffusion process.
At a purely theoretical level, we observe that an ever-increasing variance
will result for almost any signal that is only driven by its initial value and a
stochastic force, with no directional bias. The position of a particle subjected
to a random force produced by collisions with other particles is a classical
example of such a case. In our context, the effects on the evolutionary rate
are not independent of the actual rate: whatever the theoretical framework
we consider, a high evolutionary rate is not as likely to increase (or to stay
in high values) as to go back to smaller values. The episodic evolution
fits particularly well with this idea, where periods of drastic adaptation with
high evolutionary rates are naturally followed by periods where a population
is adapted and its genome evolves much more slowly. Even according to
the neutral theory, as argued by Takahata, the overall dynamic of the
rate should behave as a random function that takes high values whenever
bottlenecks occur, and goes back to small values afterwards.
The concept of ergodicity naturally arises from this observation. We say
that a Markov process is ergodic if for any initial rate $R_0$ the distribution
of $R_t$ converges to a unique distribution as t goes to infinity. The limiting
distribution is known as the invariant or stationary distribution. Examples
of ergodic processes include the OU process, the CIR model and (usually)
the discrete space covarion and covarion-type models [26, 19, 20].
One possible way for a process to not be ergodic is if the distribution of
$R_t$ does not converge for large t for some initial rate $R_0$. This must be the
case if $R_t$ is a martingale and does not converge to a constant, as is the case
with Brownian motion. Another possibility is that $R_t$ converges to different
stationary distributions for different values of $R_0$.
Property III: Tractability
A highly desirable feature of any model is its tractability, both math-
ematical (does there exist a closed formula?) and computational (can we
compute probabilities efficiently?). Nowadays, Monte Carlo methods make
it possible to use arbitrarily complex models: however, explicit analytical
formulae allow for more efficient sampling.
There are several probability distribution functions that are important
to have when working with rate processes. The most basic is the distribution
of the rate $R_t$ given the rate at time t = 0. This we have for the models
[4, 22, 23, 5] and for the CIR model, but not for the models of .
In phylogenetics we incorporate the model for evolutionary rate into the
mutation model for sequence evolution at a site. These interact to give a
joint process $(R_t, X_t)$ for both the rate $R_t$ at time t and the nucleotide or
protein $X_t$ at time t. To evaluate the likelihood we require an expression for
the joint conditional probability
\[ P[X_t = j, R_t = s \mid X_0 = i, R_0 = r] \tag{1} \]
of going from one nucleotide (or amino acid) state and rate state to another
pair of states. Even though it is sometimes possible to perform Monte
Carlo computations to estimate this probability without a formula,
having a formula will speed up the computations significantly without
having to resort to approximations, as in [23, 5].
Property IV: Autocovariance and dispersion
There is general agreement on the relevance of autocorrelation
in the modelling of evolutionary rate. Broadly speaking, if the various
causes that explain rate variation (generation time, population size,
environmental fitness) vary with time, this should be reflected in rate variations. The
extent to which the rate varies can be studied using the index of dispersion
(Kimura; Langley and Fitch). Let N(t) be the number of substitutions
or mutations of a sequence over time t. The index of dispersion
I(t) is defined as
\[ I(t) = \frac{\mathrm{Var}[N(t)]}{E[N(t)]}. \tag{2} \]
This statistic can be estimated by comparing the number of substitutions
that have accumulated in different lineages [32, 34]. The population genetics
community has proposed different models to account for a high index of
dispersion, and any reasonable model should yield an index of
dispersion of at least one.
The index of dispersion resulting from a particular model of rate varia-
tion is a function of the autocovariance of that model. The autocovariance
for a process $R_t$ is defined by
\[ \rho(t) = \mathrm{Cov}(R_0, R_t). \tag{3} \]
For many processes we can derive an explicit formula for the autocovariance.
If we assume that the substitutions occur according to a Poisson process with
rate governed by our rate process (that is, the substitutions follow a doubly
stochastic or Cox process, Section 4) and the rate process has autocovariance
function $\rho(t)$ then
\[ I(t) = 1 + \frac{2}{\mu t} \int_0^t (t - s)\,\rho(s)\,ds, \tag{4} \]
as stated by a known theorem, and the stationary index of dispersion is
\[ I(\infty) = \lim_{t\to\infty} I(t) = 1 + \frac{2}{\mu} \int_0^\infty \rho(s)\,ds, \tag{5} \]
provided that $\mu$, the stationary mean of the process R(t), and the limit,
exist. Note that if there is any variation in rate then the index of dispersion
will be greater than one.
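As a numerical illustration of how an autocovariance function determines the index of dispersion, the sketch below assumes the relation $I(t) = 1 + \frac{2}{\mu t}\int_0^t (t-s)\rho(s)\,ds$ for a stationary rate process and evaluates it with a trapezoid rule. The function names, the exponentially decaying test autocovariance and its parameter values are illustrative assumptions, not taken from the paper.

```python
import math

def index_of_dispersion(rho, mu, t, steps=20000):
    """Approximate I(t) = 1 + 2/(mu*t) * int_0^t (t - s) rho(s) ds (trapezoid rule)."""
    h = t / steps
    total = 0.0
    for k in range(steps + 1):
        s = k * h
        weight = 0.5 if k == 0 or k == steps else 1.0
        total += weight * (t - s) * rho(s)
    return 1.0 + 2.0 * h * total / (mu * t)

# Hypothetical autocovariance rho(s) = v * exp(-b * s) with stationary mean mu.
b, v, mu = 0.5, 2.0, 1.0

def rho(s):
    return v * math.exp(-b * s)

def closed_form(t):
    # int_0^t (t - s) v exp(-b s) ds = v (b t - 1 + exp(-b t)) / b^2
    return 1.0 + 2.0 * v * (b * t - 1.0 + math.exp(-b * t)) / (mu * t * b * b)

numeric = index_of_dispersion(rho, mu, 3.0)
exact = closed_form(3.0)
```

For this test autocovariance the long-time limit is $1 + 2v/(\mu b)$, which `closed_form` approaches for large t.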
Some rate models in phylogenetics [23, 22] do not explicitly model the
rate, but instead assign a (fixed) rate to each branch, so that the expected
number of substitutions on a particular branch is equal to its length times
its assigned (constant) rate.
A close look at the log-normal model of Thorne et al., which differs
from their previous version in that the rate is explicitly modelled,
suggests that the rate has constant autocovariance, since this rate process is
close to a transform of Brownian motion, and Brownian motion has a
constant autocovariance function. Substituting this into equation (5), we see that the
index of dispersion diverges. This problematic result illustrates the need
for a balance between the presence of autocorrelation on one side and the
decay of autocorrelation over large time scales on the other.
Property V: Heterotachy or Homotachy
There are two general ways that models for evolutionary rate can be
incorporated into phylogenetics. On one hand, we have rate variation among
lineages that applies to all sites (or loci) together. This can be modelled by
trees for which the paths from the root to the leaves have different lengths.
The rate variation explains the extent to which the evolution of the sequences
has violated the molecular clock. Alternatively, we can introduce a distinct
rate process for each site or locus. This models heterotachy, where the
lineage rate changes are site-specific. The transition probabilities that we
derive in Section 4 can be applied to homotachous as well as heterotachous
models.
3 A Continuous diffusion model for the evolutionary rate
A Markov process with continuous paths and satisfying some additional
smoothness conditions on its transition probabilities is called a diffusion.
There are many ways of specifying a diffusion process: perhaps the
most intuitive one is by giving the probability distribution function (pdf)
of $R_t$ given $R_0 = r_0$, for arbitrary $r_0$. We denote this pdf by $f_R[R_t|r_0]$. For
example, Brownian motion with parameter $\sigma^2$ is defined by the condition
that $f_R[R_t|r_0]$ is a normal density with mean $r_0$ and variance $\sigma^2 t$.
Table 1: Models for the substitution rate, classified according to the
properties of Section 2. CTMC stands for “Continuous-Time Markov Chain”.

A mathematically convenient representation of a diffusion is by means of
a stochastic differential equation (SDE). In the same way that a dynamical
system can be defined as the solution of a differential equation, a diffusion
process $R_t$ can be defined as the solution of an equation taking the general
form
\[ dR_t = \alpha(t,R_t)\,dt + \beta(t,R_t)\,dB_t. \tag{6} \]
Here, $\alpha(t,R_t)$ represents the deterministic effect on $R_t$, $\beta(t,R_t)$ the
stochastic part, and $dB_t$ is an infinitesimal “random” increment. Brownian motion
corresponds to the case when $\alpha(t,R_t) = 0$ for all t, $\beta(t,R_t) = \sigma$ is constant and
the SDE becomes
\[ dR_t = \sigma\,dB_t. \]
Note that if $\beta(t,R_t) = 0$ for all t and $R_t$ then (6) becomes a deterministic
ordinary differential equation.
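To make the role of the two coefficients concrete, here is a minimal Euler-Maruyama sketch for an SDE of the form $dR_t = \alpha(t,R_t)\,dt + \beta(t,R_t)\,dB_t$. The scheme is a standard discretisation and is not a method described in the paper; the function name, step counts and test coefficients are assumptions chosen for illustration.

```python
import math
import random

def euler_maruyama(alpha, beta, r0, t_end, n_steps, rng):
    """Simulate dR = alpha(t, R) dt + beta(t, R) dB on [0, t_end]; return the path."""
    dt = t_end / n_steps
    path = [r0]
    r, t = r0, 0.0
    for _ in range(n_steps):
        # Brownian increment over a step of length dt: normal with variance dt.
        dB = rng.gauss(0.0, math.sqrt(dt))
        r = r + alpha(t, r) * dt + beta(t, r) * dB
        t += dt
        path.append(r)
    return path

rng = random.Random(0)

# beta = 0: the SDE reduces to the ODE dR/dt = -R, whose solution is exp(-t).
ode_path = euler_maruyama(lambda t, r: -r, lambda t, r: 0.0, 1.0, 1.0, 10000, rng)

# alpha = 0 and constant beta: Brownian motion with parameter sigma^2 = 0.25.
bm_path = euler_maruyama(lambda t, r: 0.0, lambda t, r: 0.5, 0.0, 1.0, 1000, rng)
```

With the noise switched off the simulated end point agrees with the exact ODE solution up to the discretisation error of the Euler step.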
Going from an SDE such as (6) to a pdf for the diffusion involves solving
a variable-coefficient second-order partial differential equation (PDE). For
general functions α and β this PDE has no analytic solution. There are
very few diffusions known that have closed form equations for their pdfs,
and even fewer of these are ergodic. The simplest ergodic diffusions with
closed-form expressions for the pdf are the Ornstein-Uhlenbeck and the CIR
(Cox-Ingersoll-Ross) processes.
The Ornstein-Uhlenbeck (OU) process is described by the SDE
\[ dR_t = -b R_t\,dt + \sigma\,dB_t. \]
The pdf for $R_t$ given $R_0 = r_0$ is the normal density with mean $r_0 e^{-bt}$ and
variance $\frac{\sigma^2}{2b}(1 - e^{-2bt})$. Its stationary distribution is normal with mean
0 and variance $\sigma^2/2b$. The OU process was used by Aris-Brosou and Yang
to model evolutionary rates. However, the OU process can take on
negative values, and it is not clear how it can be used directly without
some transformation, such as a reflected OU or a squared OU. Aris-Brosou
and Yang also proposed another model, the EXP (for exponential) model,
defined as follows: the rate assigned to a branch is drawn from an
exponential distribution with mean equal to the rate of its ancestral branch.
Their EXP model is therefore a martingale. They reported
that the OU model seemed to provide a better fit to their data than the
EXP model. Although the reason for this better fit is still to be investigated,
it seems reasonable to think that the ergodic property of the OU model
could be an important factor. They also mentioned that the $\sigma^2$ parameter
of the OU model was hard to infer, perhaps because the OU model has an
insufficient number of free parameters.
The use of the CIR model solves the problem, since it is a generaliza-
tion of the squared OU process, where the mean and the variance can be
independently inferred by the addition of a third parameter. If the mean
is fixed to one, we avoid any identifiability problem with branch lengths
without fixing the variance, which can therefore be inferred as well as the
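The CIR process satisfies the SDE
\[ dR_t = b(a - R_t)\,dt + \sigma\sqrt{R_t}\,dB_t \tag{7} \]
and the pdf $f_R(R_t|r_0)$ for $R_t$ given $R_0 = r_0$ is, after rescaling $R_t$ by the
factor $4b/(\sigma^2(1 - e^{-bt}))$, a non-central $\chi^2$ distribution
with degree of freedom $4ab/\sigma^2$ and parameter of non-centrality
\[ \frac{4b\,r_0\,e^{-bt}}{\sigma^2(1 - e^{-bt})}. \]
Its mean and variance are equal to
\[ E[R_t] = r_0 e^{-bt} + a(1 - e^{-bt}), \]
\[ \mathrm{Var}[R_t] = \frac{r_0\sigma^2}{b}\,(e^{-bt} - e^{-2bt}) + \frac{a\sigma^2}{2b}\,(1 - e^{-bt})^2. \tag{9} \]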
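The conditional moments of the CIR process can be cross-checked against its non-central chi-square representation. The sketch below assumes the standard result that $R_t$ divided by the scale $\sigma^2(1 - e^{-bt})/(4b)$ is non-central chi-square with $4ab/\sigma^2$ degrees of freedom and non-centrality $4b r_0 e^{-bt}/(\sigma^2(1 - e^{-bt}))$; the function names and parameter values are illustrative.

```python
import math

def cir_moments(r0, a, b, sigma2, t):
    """Conditional mean and variance of R_t given R_0 = r0 for the CIR process."""
    e = math.exp(-b * t)
    mean = r0 * e + a * (1.0 - e)
    var = (r0 * sigma2 / b) * (e - e * e) + (a * sigma2 / (2.0 * b)) * (1.0 - e) ** 2
    return mean, var

def cir_chi2_params(r0, a, b, sigma2, t):
    """Return (scale, dof, noncentrality) such that R_t / scale is non-central chi-square."""
    e = math.exp(-b * t)
    scale = sigma2 * (1.0 - e) / (4.0 * b)
    dof = 4.0 * a * b / sigma2
    nonc = 4.0 * b * r0 * e / (sigma2 * (1.0 - e))
    return scale, dof, nonc

mean, var = cir_moments(2.0, 1.0, 0.8, 0.5, 1.5)
scale, dof, nonc = cir_chi2_params(2.0, 1.0, 0.8, 0.5, 1.5)

# A non-central chi-square variable has mean dof + nonc and variance 2*dof + 4*nonc.
chi2_mean = scale * (dof + nonc)
chi2_var = scale ** 2 * (2.0 * dof + 4.0 * nonc)

# For large t the moments approach those of the stationary distribution.
long_mean, long_var = cir_moments(2.0, 1.0, 0.8, 0.5, 200.0)
```

The two routes to the mean and variance agree to rounding error, and the long-time values approach a and $a\sigma^2/2b$.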
The stationary distribution of $R_t$ is a gamma distribution with shape
parameter $2ab/\sigma^2$ and scale parameter $\sigma^2/2b$. Hence the mean of the
stationary distribution is a and the variance is $a\sigma^2/2b$.
Unlike an OU process, if $r_0$, a, and b are all positive a CIR process is
always non-negative. The square of an OU process is a special case of the
CIR process. Furthermore, by multiplying $R_t$ by a constant in equation (7),
we see that multiplying a CIR process by a positive constant gives another
CIR process.
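The covariance of the stationary CIR process can be exactly computed:
\[ \rho(t) = \mathrm{Cov}(R_0, R_t) = \frac{a\sigma^2}{2b}\,e^{-bt}. \]
From this, (4) leads to a closed formula for the index of dispersion:
\[ I_{CIR}(t) = 1 + \frac{\sigma^2}{b^3 t}\,(bt - 1 + e^{-bt}), \]
so that
\[ I_{CIR}(\infty) = \lim_{t\to\infty} I_{CIR}(t) = 1 + \frac{\sigma^2}{b^2}. \tag{11} \]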
In Section 2 we emphasized that the concept of autocovariance is closely
related to the index of dispersion. As Zheng showed, the effect of complex
infinitesimal rate matrices on the index of dispersion (with constant rate) is
not likely to explain alone the observed large empirical values. If the rate
varies, Cutler showed that an elevated index of dispersion can only be
achieved if the time-scale of the rate process is approximately of the same
magnitude as the substitution process itself. The CIR process provides the
possibility to satisfy this property, while incorporating autocovariance. It is
the consensus of these two ideas that should guide our choice for the rate of
From (7) we see that the CIR process possesses three parameters, namely
the stationary mean a, the volatility parameter $\sigma^2$ (which, together with a
and b, sets the stationary variance $a\sigma^2/2b$), and the intensity b of the
force that drives the process to its stationary distribution. The parameter
b determines how fast the autocovariance of the process goes to 0 as t increases.
The three parameters of the CIR process can be quickly estimated from
standard statistics in molecular evolution. The parameter a is a scale param-
eter. It determines the expected rate at any time given no other information.
Throughout the paper, we will assume that a = 1, so that the model has an
expected rate equal to one. This parallels the constraint that the gamma
distribution has an expected rate equal to one in the rates-across-sites (RAS)
model.
The CIR process has a stationary distribution given by a gamma distribution.
To make the stationary distribution coincide with the gamma
distribution of a RAS model with parameter $\Gamma$ we choose $\sigma$ and b such that
\[ \frac{2b}{\sigma^2} = \Gamma. \tag{12} \]
The stationary index of dispersion, $I_{CIR}(\infty)$, can be estimated empirically
[33, 28]. We can then use (11) and (12) to obtain the estimates
\[ \hat{b} = \frac{2}{\Gamma\,(\hat{I}_{CIR}(\infty) - 1)}, \qquad \hat{\sigma}^2 = \frac{4}{\Gamma^2\,(\hat{I}_{CIR}(\infty) - 1)}. \]
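The two-statistic estimation recipe can be written as a small helper. This sketch assumes a = 1, a stationary gamma shape equal to $2b/\sigma^2$ and a stationary index of dispersion equal to $1 + \sigma^2/b^2$, and solves these two relations for b and $\sigma^2$. The function name and the example values are hypothetical.

```python
def estimate_cir_parameters(gamma_shape, index_of_dispersion):
    """Solve 2b/sigma^2 = gamma_shape and 1 + sigma^2/b^2 = index_of_dispersion."""
    excess = index_of_dispersion - 1.0
    if gamma_shape <= 0.0 or excess <= 0.0:
        raise ValueError("need gamma_shape > 0 and index_of_dispersion > 1")
    b = 2.0 / (gamma_shape * excess)
    sigma2 = 2.0 * b / gamma_shape
    return b, sigma2

# Hypothetical inputs: RAS gamma shape 0.5 and stationary index of dispersion 3.
b_hat, sigma2_hat = estimate_cir_parameters(0.5, 3.0)

# Round trip: the estimates reproduce the two input statistics.
shape_back = 2.0 * b_hat / sigma2_hat
dispersion_back = 1.0 + sigma2_hat / b_hat ** 2
```

An index of dispersion equal to one corresponds to no rate variation over time, so the helper rejects it rather than dividing by zero.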
4 Mutation models with a rate process
The standard model for the substitution process at a particular locus is a
continuous-time Markov chain. This kind of process is defined by a square
matrix Q called the infinitesimal rate matrix. Suppose, to begin, that there
is a constant evolutionary rate $r_0$. As above, we let $X_t$ denote the state (e.g.
amino acid) at time t. The transition probabilities are then given by
\[ \Pr[X_t = j \mid X_0 = i] = [e^{Q r_0 t}]_{ij}. \tag{13} \]
We suppose that the process has a unique stationary distribution $\pi$, where
$\pi_j$ is the stationary probability of state j and
\[ \pi_j = \lim_{t\to\infty} \Pr[X_t = j \mid X_0 = i] \]
for all i and j. We assume that Q has been normalised so that in the
stationary distribution the expected number of mutations over time t equals
$r_0 t$. Note that the transition probabilities (13) depend only on the product
$r_0 t$, so will be the same if we double the rate and halve the time, for example.
Suppose now that the rate is not constant, but instead varies according
to some fixed function $r_s$, $s \ge 0$. Equation (13) then becomes
\[ \Pr[X_t = j \mid X_0 = i, r] = [e^{Q \tau_r}]_{ij}, \tag{14} \]
where
\[ \tau_r = \int_0^t r_s\,ds \]
is the area under the curve $r_s$.
In the models we will consider, the fixed function $r = (r_t)_{t \ge 0}$ is replaced
by a random process $R = (R_t)_{t \ge 0}$ that is dependent only on the starting
rate $r_0$. The integral
\[ \tau_R = \int_0^t R_s\,ds \tag{15} \]
is also random in this case; let $g_R$ denote its pdf. The transition probabilities
can be determined from the expected value of (14) with $\tau_r$ replaced by the
random variable $\tau_R$. By the law of total expectation, this simplifies to
\[ \Pr[X_t = j \mid X_0 = i] = \int_0^\infty [e^{Q\tau}]_{ij}\, g_R(\tau)\,d\tau. \tag{16} \]
Let $M(\eta) = E[e^{\eta\tau_R}]$ denote the moment generating function (mgf) for
the random variable $\tau_R$. Then (16) can be rewritten
\[ \Pr[X_t = j \mid X_0 = i] = [M(Q)]_{ij} \]
where the function M is interpreted as a matrix function. We assume
that Q can be diagonalised as $Q = V \Lambda V^{-1}$, where $\Lambda = \mathrm{diag}(\lambda_1,\ldots,\lambda_n)$ is
a diagonal matrix formed from the eigenvalues of Q. The matrix function
M(Q) can then be evaluated as $M(Q) = V M(\Lambda) V^{-1}$, where
\[ M(\Lambda) = \mathrm{diag}(M(\lambda_1),\ldots,M(\lambda_n)). \]
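A small worked case shows how the mgf of the integrated rate turns into transition probabilities. The sketch assumes the Jukes-Cantor (JC69) model normalised to one expected substitution per unit time, whose rate matrix has eigenvalues 0 and -4/3, so that the matrix function only needs the mgf at -4/3. The choice of JC69, the function names and the two example mgfs (constant rate, and a rate drawn once from a mean-one gamma distribution as in a RAS model) are illustrative assumptions.

```python
import math

JC69_EIGENVALUE = -4.0 / 3.0  # non-zero eigenvalue of the normalised JC69 rate matrix

def jc69_transition_matrix(mgf):
    """Return M(Q) for JC69, given mgf(eta) = E[exp(eta * tau)] of the integrated rate."""
    m = mgf(JC69_EIGENVALUE)
    same = 0.25 + 0.75 * m   # probability of observing the starting state
    diff = 0.25 - 0.25 * m   # probability of observing each other state
    return [[same if i == j else diff for j in range(4)] for i in range(4)]

def constant_rate_mgf(r0, t):
    # tau = r0 * t is deterministic, so E[exp(eta * tau)] = exp(eta * r0 * t).
    return lambda eta: math.exp(eta * r0 * t)

def gamma_rate_mgf(shape, t):
    # Rate held fixed along the branch, gamma distributed with mean one (RAS-style).
    return lambda eta: (1.0 - eta * t / shape) ** (-shape)

P_const = jc69_transition_matrix(constant_rate_mgf(1.0, 0.5))
P_gamma = jc69_transition_matrix(gamma_rate_mgf(0.5, 0.5))
row_sums = [sum(row) for row in P_const] + [sum(row) for row in P_gamma]
```

With a constant rate the result is the usual JC69 matrix; with a random rate of the same mean, the probability of no observed change is larger, as expected from Jensen's inequality.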
See the literature on matrix functions for more details. The problem of determining
pattern probabilities therefore boils down to the problem of determining the
moment generating function of the integrated rate, $\tau_R$ (eqn. 15). Tuffley
and Steel use this approach to derive distance estimates for the covarion
model.
For applications in phylogenetics, we need the mgf of $\tau_R$ conditioned on
just the starting rate, or on both the starting and finishing rates. The mgf of
$\tau_R$, conditioned on a starting rate of $r_0$, is
\[ M_{r_0}(\eta) = E[\exp(\eta\tau_R) \mid R_0 = r_0]. \tag{17} \]
As before, we let $f_R(R_t|r_0)$ denote the pdf of $R_t$ conditioned on $R_0 = r_0$.
Let $\delta(x)$ denote the Dirac delta function, which vanishes for all
$x \ne 0$ and integrates to one. The mgf of $\tau_R$ conditioned on both the starting and finishing rates is
\[ M_{r_0,r_t}(\eta) = E[\exp(\eta\tau_R) \mid R_0 = r_0, R_t = r_t]
   = \frac{1}{f_R(r_t|r_0)}\, E[\exp(\eta\tau_R)\,\delta(R_t - r_t) \mid R_0 = r_0]. \tag{18} \]
Equations (17) and (18) hold irrespective of whether R is discrete or
continuous, a diffusion, jump process, or a continuous time Markov chain.
We note in passing that analytic formulae for $M_{r_0}(\eta)$ and $M_{r_0,r_t}(\eta)$ exist
in the case that R is a continuous time Markov chain, for example in the
covarion-type model of Galtier. Suppose that the evolutionary rate
switches between rate values $g_1, g_2, \ldots, g_k$ following a continuous time Markov
chain with infinitesimal rate matrix G. Let D be the $k \times k$ diagonal matrix
with entries $g_1, g_2, \ldots, g_k$. A careful reworking of the proof of Theorem 1 in
the cited reference gives the mgf of $\tau_R$ conditioned on both the starting and finishing rate.
The mgf for $\tau_R$ conditioned on $r_0 = g_i$ is then
\[ M_{g_i}(\eta) = \sum_{j=1}^{k} [e^{(G + \eta D)t}]_{ij}, \]
while the mgf of $\tau_R$ conditioned on $r_0 = g_i$ and $r_t = g_j$ is
\[ M_{g_i,g_j}(\eta) = \frac{[e^{(G + \eta D)t}]_{ij}}{[e^{Gt}]_{ij}}. \]
This provides an independent derivation of the known formula for transition
probabilities under a covarion-type model.
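For a rate that switches between finitely many values, the mgf of the integrated rate can be computed with a matrix exponential. The sketch below assumes the standard identity $E[e^{\eta\tau_R}\,1\{R_t = g_j\} \mid R_0 = g_i] = [e^{(G+\eta D)t}]_{ij}$; the helper names, the small Taylor-based matrix exponential and the on/off example rates are illustrative choices, not the authors' implementation.

```python
import math

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def mat_exp(A, squarings=10, terms=20):
    """Matrix exponential by scaling and squaring with a truncated Taylor series."""
    n = len(A)
    scale = 2.0 ** squarings
    S = [[A[i][j] / scale for j in range(n)] for i in range(n)]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in mat_mul(term, S)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result

def ctmc_rate_mgf(G, rates, eta, t):
    """Return (mgf given start state, mgf given start and end states) of tau_R."""
    n = len(G)
    A = [[(G[i][j] + (eta * rates[i] if i == j else 0.0)) * t for j in range(n)]
         for i in range(n)]
    joint = mat_exp(A)
    rate_transitions = mat_exp([[G[i][j] * t for j in range(n)] for i in range(n)])
    start_only = [sum(row) for row in joint]
    start_and_end = [[joint[i][j] / rate_transitions[i][j] for j in range(n)]
                     for i in range(n)]
    return start_only, start_and_end

# Hypothetical two-state switching chain (off/on), with made-up switching rates.
G = [[-0.3, 0.3], [0.7, -0.7]]
eta, t = -1.3, 0.8
mgf_zero, cond_zero = ctmc_rate_mgf(G, [0.0, 1.0], 0.0, t)
mgf_const, _ = ctmc_rate_mgf(G, [2.0, 2.0], eta, t)
mgf_onoff, _ = ctmc_rate_mgf(G, [0.0, 1.0], eta, t)
```

Two sanity checks are built in: at eta = 0 every mgf equals one, and when both states carry the same rate the result collapses to the constant-rate mgf.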
5 Moment generating functions and transition prob-
abilities for the CIR model
In this section we derive expressions for the (joint) transition probabilities
\[ \Pr[X_t = j \mid X_0 = i, R_0 = r_0] \tag{19} \]
and
\[ \Pr[X_t = j, R_t = s \mid X_0 = i, R_0 = r_0]. \tag{20} \]
As we have seen, to evaluate these probabilities we need to determine the
moment generating functions (mgfs) defined in equations (17) and (18).
We use the Feynman-Kac formula [44, 24] to derive analytic formulae
for $M_{r_0}(\eta)$ and $M_{r_0,r_t}(\eta)$ under the CIR model. Let $g(\cdot)$ be a real-valued
function. Define the function v = v(t,x) by
\[ v(t,x) = E\Big[ \exp\Big( \eta \int_0^t R_s\,ds \Big)\, g(R_t) \,\Big|\, R_0 = x \Big]. \tag{21} \]
The Feynman-Kac formula asserts that v(t,x) solves the following
partial differential equation (PDE)
\[ \frac{\partial}{\partial t} v(t,x) = b(1 - x)\,\frac{\partial}{\partial x} v(t,x)
   + \frac{\sigma^2}{2}\,x\,\frac{\partial^2}{\partial x^2} v(t,x) + \eta x\,v(t,x) \tag{22} \]
for t > 0, x ∈ R, and with boundary condition
\[ v(0,x) = g(x) \quad \text{for all } x \in \mathbb{R}. \tag{23} \]
We apply known methods to solve this PDE with the different
boundary conditions.
First consider the case when we condition only on the initial rate, eqn. (17).
To make (21) equal to (17) we set g(x) = 1 for all x. The boundary condition
(23) in this case becomes
\[ v(0,x) = 1 \quad \text{for all } x \in \mathbb{R}. \]
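With this boundary condition, the PDE (22) has solution
\[ v(t,x) = \Psi(\eta,t)\,e^{-x\,\Xi(\eta,t)}, \]
where
\[ \Psi(\eta,t) = \left( \frac{\bar{b}\,e^{bt/2}}{\bar{b}\cosh(\bar{b}t/2) + b\sinh(\bar{b}t/2)} \right)^{2b/\sigma^2}, \]
\[ \Xi(\eta,t) = \frac{-2\eta\,\sinh(\bar{b}t/2)}{\bar{b}\cosh(\bar{b}t/2) + b\sinh(\bar{b}t/2)}, \]
and $\bar{b} = \sqrt{b^2 - 2\eta\sigma^2}$.
We therefore have
\[ M_{r_0}(\eta) = \Psi(\eta,t)\,e^{-r_0\,\Xi(\eta,t)}. \tag{27} \]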
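The mgf of the integrated CIR rate given the starting rate is cheap to evaluate and easy to sanity-check. The sketch below assumes a = 1 and the closed forms $\Psi = \big(\bar{b}e^{bt/2}/(\bar{b}\cosh(\bar{b}t/2) + b\sinh(\bar{b}t/2))\big)^{2b/\sigma^2}$ and $\Xi = -2\eta\sinh(\bar{b}t/2)/(\bar{b}\cosh(\bar{b}t/2) + b\sinh(\bar{b}t/2))$ with $\bar{b} = \sqrt{b^2 - 2\eta\sigma^2}$. The function names and parameter values are illustrative.

```python
import math

def cir_psi_xi(eta, t, b, sigma2):
    """Psi(eta, t) and Xi(eta, t) for the CIR process with stationary mean a = 1."""
    bbar = math.sqrt(b * b - 2.0 * eta * sigma2)  # real whenever eta <= b^2 / (2 sigma^2)
    denom = bbar * math.cosh(bbar * t / 2.0) + b * math.sinh(bbar * t / 2.0)
    psi = (bbar * math.exp(b * t / 2.0) / denom) ** (2.0 * b / sigma2)
    xi = -2.0 * eta * math.sinh(bbar * t / 2.0) / denom
    return psi, xi

def cir_mgf(eta, t, r0, b, sigma2):
    """E[exp(eta * tau_R) | R_0 = r0] for the integrated CIR rate tau_R over [0, t]."""
    psi, xi = cir_psi_xi(eta, t, b, sigma2)
    return psi * math.exp(-r0 * xi)

b, sigma2, t, r0 = 0.8, 0.5, 1.5, 2.0

# The derivative of the mgf at eta = 0 is the expected integrated rate,
# E[tau_R] = t + (r0 - 1) (1 - exp(-b t)) / b when a = 1.
h = 1e-5
slope = (cir_mgf(h, t, r0, b, sigma2) - cir_mgf(-h, t, r0, b, sigma2)) / (2.0 * h)
expected_tau = t + (r0 - 1.0) * (1.0 - math.exp(-b * t)) / b
at_zero = cir_mgf(0.0, t, r0, b, sigma2)
at_eigenvalue = cir_mgf(-4.0 / 3.0, t, r0, b, sigma2)
```

In phylogenetic use eta is an eigenvalue of Q, hence non-positive, so the square root is always real and the mgf lies between zero and one.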
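The case when both the starting and finishing rates are specified is more
complicated. From (18) the mgf $M_{r_0,r_t}$ can be written
\[ M_{r_0,r_t}(\eta) = \frac{v(t,r_0)}{f_R(r_t|r_0)}, \]
where, in this case, v(t,x) is given by (21) with $g(x) = \delta(x - r_t)$. The
boundary condition (23) therefore becomes
\[ v(0,x) = \delta(x - r_t). \]
With this new boundary condition, the PDE (22) has solution
\[ v(t,x) = c\,\exp\Big[ \frac{bt}{\sigma^2}(b - \bar{b}) + \frac{b - \bar{b}}{\sigma^2}\,x
   - \frac{b + \bar{b}}{\sigma^2}\,r_t - c\,(r_t + x)\,e^{-\bar{b}t} \Big]
   \Big( \frac{r_t}{x\,e^{-\bar{b}t}} \Big)^{\nu/2}
   I_\nu\Big( 2c\sqrt{x\,r_t\,e^{-\bar{b}t}} \Big), \]
where
\[ c = \frac{2\bar{b}}{\sigma^2(1 - e^{-\bar{b}t})}, \qquad \nu = \frac{2b}{\sigma^2} - 1, \qquad \bar{b} = \sqrt{b^2 - 2\eta\sigma^2}, \]
and $I_\nu(x)$ is the modified Bessel function of the first kind with parameter $\nu$.
Hence the mgf conditioned on initial and final rates is given by
\[ M_{r_0,r_t}(\eta) = \frac{c}{f_R(r_t|r_0)}\,\exp\Big[ \frac{bt}{\sigma^2}(b - \bar{b}) + \frac{b - \bar{b}}{\sigma^2}\,r_0
   - \frac{b + \bar{b}}{\sigma^2}\,r_t - c\,(r_t + r_0)\,e^{-\bar{b}t} \Big]
   \Big( \frac{r_t}{r_0\,e^{-\bar{b}t}} \Big)^{\nu/2}
   I_\nu\Big( 2c\sqrt{r_0\,r_t\,e^{-\bar{b}t}} \Big), \]
where c and $\bar{b}$ are defined above and, from Section 3, $f_R(R_t|R_0 = r_0)$ is
the pdf of $R_t$ given $R_0 = r_0$, which after rescaling by $4b/(\sigma^2(1 - e^{-bt}))$ is a
non-central $\chi^2$ distribution with degree of freedom $4ab/\sigma^2$ and
parameter of non-centrality
\[ \frac{4b\,r_0\,e^{-bt}}{\sigma^2(1 - e^{-bt})}. \]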
Bringing everything together, we have our main result.
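Theorem 1 Define P by $P_{ij} = \Pr[X_t = j, R_t = r_t \mid X_0 = i, R_0 = r_0]$.
Suppose that $Q = V \Lambda V^{-1}$ where $\Lambda$ is a diagonal matrix containing the
eigenvalues $\lambda_1,\ldots,\lambda_n$ of Q. Then
\[ P = M(Q) = V M(\Lambda) V^{-1} \]
where $M(\Lambda)$ is the diagonal matrix where, for all i,
\[ M(\Lambda)_{ii} = f_R(r_t|r_0)\,M_{r_0,r_t}(\lambda_i). \]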
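The joint quantity $E[e^{\eta\tau_R}\,\delta(R_t - r_t) \mid R_0 = r_0]$ that enters the diagonal of $M(\Lambda)$ can be checked numerically: integrating it over the final rate $r_t$ should recover the mgf conditioned on the starting rate alone. The sketch below assumes a = 1 and the Bessel-function form of the kernel with $c = 2\bar{b}/(\sigma^2(1 - e^{-\bar{b}t}))$, $\nu = 2b/\sigma^2 - 1$ and $\bar{b} = \sqrt{b^2 - 2\eta\sigma^2}$. The series-based Bessel routine and all names are illustrative, and the series is only suitable for moderate arguments.

```python
import math

def bessel_i(nu, x, terms=200):
    """Modified Bessel function of the first kind via its power series (nu > -1)."""
    if x == 0.0:
        return 1.0 if nu == 0 else 0.0
    log_half_x = math.log(x / 2.0)
    return sum(math.exp((2 * m + nu) * log_half_x - math.lgamma(m + 1) - math.lgamma(m + nu + 1))
               for m in range(terms))

def cir_joint_kernel(eta, t, r0, rt, b, sigma2):
    """E[exp(eta * tau_R) delta(R_t - rt) | R_0 = r0] for the CIR process with a = 1."""
    bbar = math.sqrt(b * b - 2.0 * eta * sigma2)
    decay = math.exp(-bbar * t)
    c = 2.0 * bbar / (sigma2 * (1.0 - decay))
    nu = 2.0 * b / sigma2 - 1.0
    exponent = (b * t * (b - bbar) + (b - bbar) * r0 - (b + bbar) * rt) / sigma2 \
        - c * (rt + r0) * decay
    power = (rt / (r0 * decay)) ** (nu / 2.0)
    return c * math.exp(exponent) * power * bessel_i(nu, 2.0 * c * math.sqrt(r0 * rt * decay))

def cir_mgf(eta, t, r0, b, sigma2):
    """Closed-form mgf of tau_R given only the starting rate (a = 1)."""
    bbar = math.sqrt(b * b - 2.0 * eta * sigma2)
    denom = bbar * math.cosh(bbar * t / 2.0) + b * math.sinh(bbar * t / 2.0)
    psi = (bbar * math.exp(b * t / 2.0) / denom) ** (2.0 * b / sigma2)
    xi = -2.0 * eta * math.sinh(bbar * t / 2.0) / denom
    return psi * math.exp(-r0 * xi)

def integrate_over_final_rate(eta, t, r0, b, sigma2, r_max=20.0, steps=2000):
    """Trapezoid rule over rt in [0, r_max]; assumes nu > 0 so the kernel vanishes at 0."""
    h = r_max / steps
    total = 0.0
    for k in range(steps + 1):
        weight = 0.5 if k == 0 or k == steps else 1.0
        total += weight * cir_joint_kernel(eta, t, r0, k * h, b, sigma2)
    return total * h

b, sigma2, t, r0 = 1.0, 1.0, 1.0, 1.0   # gives nu = 1
marginal = integrate_over_final_rate(-0.5, t, r0, b, sigma2)
closed = cir_mgf(-0.5, t, r0, b, sigma2)
density_mass = integrate_over_final_rate(0.0, t, r0, b, sigma2)
```

At eta = 0 the kernel is simply the CIR transition density, so its integral over the final rate is one.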
6 Three taxa phylogenies
The simplest phylogeny for which we can distinguish between constant and
variable evolutionary rates is a tree with three taxa. For this reason, there
are many methods for testing, and estimating, rate variation that are based
on three taxon analyses . Here we show that the likelihood for a three
taxa tree, under the CIR model of rate variation, can be computed exactly.
The problem for general phylogenies is more complex since we have to in-
tegrate out rates for the internal nodes. Here, we consider a heterotachous
model, so that each site has its own rate history. Because the sites (and the
rate at each site) evolve independently from each other, the likelihood of a
sequence will be the product of all site-specific likelihoods. Therefore, we
only require the likelihood computation for one site.
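We recall that the stationary distribution of the CIR is a gamma
distribution $\Gamma(\omega,\nu)$, with shape $\omega = 2b/\sigma^2$ and rate $\nu = 2b/\sigma^2$, i.e.
\[ f_{R_0}(r) = \frac{\nu^\omega}{\Gamma(\omega)}\,r^{\omega - 1} e^{-\nu r}, \qquad r \ge 0. \tag{30} \]
Therefore the stationary mean and variance are 1 and $\sigma^2/2b$.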
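In order to get the transition probabilities, we will use the mgf of $\tau_R$
unconditioned on the final rate, given by equation (27). The transition
probability matrix of the substitution process, given initial rates, can be
obtained by equations (27) and (29). Let $\lambda_1,\ldots,\lambda_n$ be the eigenvalues of
Q. Using eigenvalue decomposition, we can find vectors $u^{(1)},\ldots,u^{(n)}$ and
$v^{(1)},\ldots,v^{(n)}$ such that
\[ \Pr[X_t = i \mid X_0 = j, R_0 = r_0] = \sum_{k=1}^{n} u^{(k)}_j v^{(k)}_i\,M_{r_0}(\lambda_k, t), \tag{31} \]
where we changed slightly our notation and explicitly wrote the dependency
of $M_{r_0}$ on t.
Now consider the 3-taxa tree with branches of lengths $t_1, t_2, t_3$ leading
to leaves labelled with states $x_1, x_2, x_3$ (Figure 2). If we condition on a rate
$r_0$ and state $x_0$ at the root then the probability of observing $x_1, x_2, x_3$ at
the leaves is given by
\[ L(x_1,x_2,x_3 \mid x_0,r_0) = P[X_{t_1} = x_1 \mid x_0,r_0]\,P[X_{t_2} = x_2 \mid x_0,r_0]\,P[X_{t_3} = x_3 \mid x_0,r_0]
   = \sum_{i,j,k} C_{ijk}\,M_{r_0}(\lambda_i,t_1)\,M_{r_0}(\lambda_j,t_2)\,M_{r_0}(\lambda_k,t_3), \tag{32} \]
where $C_{ijk} = u^{(i)}_{x_0} v^{(i)}_{x_1}\,u^{(j)}_{x_0} v^{(j)}_{x_2}\,u^{(k)}_{x_0} v^{(k)}_{x_3}$.

Figure 2: A three taxa unrooted tree, with branch lengths and one character
state and rate value associated to each leaf.

The rate at the root is assumed to have the stationary distribution $f_{R_0}$
given by (30). The likelihood integrated with respect to $r_0$ is then
\[ L(x_1,x_2,x_3 \mid x_0) = \int_0^\infty L(x_1,x_2,x_3 \mid x_0,r_0)\,f_{R_0}(r_0)\,dr_0, \]
which by (32) equals
\[ \sum_{i,j,k} C_{ijk} \int_0^\infty M_{r_0}(\lambda_i,t_1)\,M_{r_0}(\lambda_j,t_2)\,M_{r_0}(\lambda_k,t_3)\,f_{R_0}(r_0)\,dr_0. \tag{33} \]
We now use the formula (27) for the mgf’s derived above.
Using integration by parts, or simply using the fact that the gamma
distribution integrates to 1, we get
\[ \int_0^\infty M_{r_0}(\lambda_i,t_1)\,M_{r_0}(\lambda_j,t_2)\,M_{r_0}(\lambda_k,t_3)\,f_{R_0}(r_0)\,dr_0
   = \Psi(\lambda_i,t_1)\,\Psi(\lambda_j,t_2)\,\Psi(\lambda_k,t_3)
   \left( \frac{\omega}{\omega + \Xi(\lambda_i,t_1) + \Xi(\lambda_j,t_2) + \Xi(\lambda_k,t_3)} \right)^{\omega}. \]
Finally, we can substitute this back into (33) to obtain
\[ L(x_1,x_2,x_3 \mid x_0) = \sum_{i,j,k} C_{ijk}\,\Psi(\lambda_i,t_1)\,\Psi(\lambda_j,t_2)\,\Psi(\lambda_k,t_3)
   \left( \frac{\omega}{\omega + \Xi(\lambda_i,t_1) + \Xi(\lambda_j,t_2) + \Xi(\lambda_k,t_3)} \right)^{\omega}. \]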
The formula extends immediately to phylogenies with n leaves attached
to the root, though the number of terms in the summation increases ex-
ponentially. Our approach has been to evaluate likelihoods on complete
phylogenies using Monte-Carlo techniques, together with the exact transi-
tion probabilities derived here.
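The closed-form three-taxon likelihood can be sketched and checked against brute-force integration over the root rate. The example assumes the JC69 substitution model (eigenvalues 0 and -4/3, an illustrative choice), a CIR rate with a = 1, independent rate paths on the three branches given the root rate, and a gamma-distributed root rate with shape and rate both equal to $2b/\sigma^2$. All function names and parameter values are hypothetical.

```python
import math
from itertools import product

LAMBDA = -4.0 / 3.0  # non-zero eigenvalue of the normalised JC69 rate matrix

def cir_psi_xi(eta, t, b, sigma2):
    bbar = math.sqrt(b * b - 2.0 * eta * sigma2)
    denom = bbar * math.cosh(bbar * t / 2.0) + b * math.sinh(bbar * t / 2.0)
    psi = (bbar * math.exp(b * t / 2.0) / denom) ** (2.0 * b / sigma2)
    return psi, -2.0 * eta * math.sinh(bbar * t / 2.0) / denom

def three_taxon_likelihood(x0, leaves, times, b, sigma2):
    """Closed form for L(x1, x2, x3 | x0) with the root rate integrated out."""
    omega = 2.0 * b / sigma2
    psi_xi = [cir_psi_xi(LAMBDA, t, b, sigma2) for t in times]
    total = 0.0
    # Each branch contributes either the eigenvalue-0 term (mgf = 1) or the -4/3 term.
    for picks in product((0, 1), repeat=3):
        coeff, psi_prod, xi_sum = 1.0, 1.0, 0.0
        for m, pick in enumerate(picks):
            if pick == 0:
                coeff *= 0.25
            else:
                coeff *= 0.75 if leaves[m] == x0 else -0.25
                psi_prod *= psi_xi[m][0]
                xi_sum += psi_xi[m][1]
        total += coeff * psi_prod * (omega / (omega + xi_sum)) ** omega
    return total

def brute_force_likelihood(x0, leaves, times, b, sigma2, r_max=30.0, steps=30000):
    """Trapezoid integration over the root rate; assumes omega >= 1 so the density is finite at 0."""
    omega = 2.0 * b / sigma2
    psi_xi = [cir_psi_xi(LAMBDA, t, b, sigma2) for t in times]
    h = r_max / steps
    total = 0.0
    for n in range(steps + 1):
        r0 = n * h
        density = omega ** omega * r0 ** (omega - 1.0) * math.exp(-omega * r0) / math.gamma(omega)
        prob = 1.0
        for (psi, xi), leaf in zip(psi_xi, leaves):
            prob *= 0.25 + (0.75 if leaf == x0 else -0.25) * psi * math.exp(-r0 * xi)
        total += (0.5 if n in (0, steps) else 1.0) * density * prob
    return total * h

b, sigma2, times = 1.0, 1.0, (0.3, 0.7, 1.2)
exact = three_taxon_likelihood(0, (0, 1, 0), times, b, sigma2)
numeric = brute_force_likelihood(0, (0, 1, 0), times, b, sigma2)
total_prob = sum(three_taxon_likelihood(0, leaves, times, b, sigma2)
                 for leaves in product(range(4), repeat=3))
```

Because JC69 has only two distinct eigenvalues, the sum over eigenvalue triples has 8 terms here; for a general model with n distinct eigenvalues it has $n^3$ terms.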
We have shown how, given a few natural criteria for our model selection,
the CIR appeared as the simplest continuous model that is at the same time
ergodic, has a non-zero autocovariance function and that can account for an
arbitrarily large index of dispersion. Moreover, we provided simple ways to
estimate its parameters with the help of two observable statistics, namely
the RAS gamma parameter and empirical index of dispersion. Another very
interesting practical aspect of the CIR process is that it can be easily, and
without approximations, implemented in the MCMC framework.
A possible future extension of our model could involve jump models, in which
the rate path is discontinuous as in the continuous-time Markov chain, but
also varies as diffusion between these discontinuities. However, the use of
such a model implies the use of more parameters, and it may well be the
case that the relative weakness of the rate of evolution signal cannot allow
the use of more than two parameters, because of identifiability problems.
[1] Z. Yang, Maximum likelihood estimation of phylogeny from DNA sequences
when substitution rates differ over sites, Mol. Biol. Evol. 10
[2] P. Lopez, D. Casane, H. Philippe, Heterotachy, an important process
of protein evolution, Mol. Biol. Evol. 19 (2002) 1-7.
[3] E. Susko, Y. Inagaki, C. Field, M. E. Holder, A. J. Roger, Testing for
differences in rates-across-sites distributions in phylogenetic subtrees,
Mol. Biol. Evol. 19 (2002) 1514-1523.
[4] S. Aris-Brosou, Z. Yang, Effects of models of rate evolution on estimation
of divergence dates with special reference to the metazoan 18S
ribosomal RNA phylogeny, Syst. Biol. 51 (2002) 703-714.
[5] H. Kishino, J. L. Thorne, W. J. Bruno, Performance of a divergence
time estimation method under a probabilistic model of rate evolution,
Mol. Biol. Evol. 18 (2001) 352-361.
[6] R. J. Britten, Rates of DNA sequence evolution differ between taxonomic
groups, Science 231 (1986) 1393-1398.
[7] K. H. Wolfe, P. M. Sharp, W.-H. Li, Mutation rates differ among
regions of the mammalian genome, Nature 337 (1989) 283-285.
[8] A. P. Martin, S. R. Palumbi, Body size, metabolic rate, generation
time, and the molecular clock, Proc. Natl. Acad. Sci. USA 90 (1993) 4087-
[9] N. Goldman, Z. Yang, A codon-based model of nucleotide substitution
for protein-coding DNA sequences, Mol. Biol. Evol. 11 (1994) 725-736.
[10] S. V. Muse, B. S. Gaut, A likelihood method for comparing synonymous
and nonsynonymous nucleotide substitution rates, with application to
the chloroplast genome, Mol. Biol. Evol. 11 (1994) 715-724.
[11] C. Pál, B. Papp, L. D. Hurst, Highly expressed genes in yeast evolve
slowly, Genetics 158 (2001) 927-931.
[12] D. M. Robinson, D. T. Jones, H. Kishino, N. Goldman, J. L. Thorne,
Protein evolution with dependence among codons due to tertiary
structure, Mol. Biol. Evol. 20 (2003) 1692-1704.
[13] T. Ohta, H. Tachida, Theoretical study of near neutrality. I. Heterozygosity
and rate of mutant substitution, Genetics 126 (1990) 210-229.
[14] N. Takahata, On the overdispersed molecular clock, Genetics 116
[15] C. D. Laird, B. L. McConaughy, B. J. McCarthy, Rate of fixation of
nucleotide substitutions in evolution, Nature 224 (1969) 149-154.
[16] J. C. Cox, J. E. Ingersoll, S. A. Ross, A theory of the term structure
of interest rates, Econometrica 53 (1985) 385-408.
[17] T. Uzzell, K. W. Corbin, Fitting discrete probability distributions to
evolutionary events, Science 172 (1971) 1089-1096.
[18] W. M. Fitch, E. Markowitz, An improved method for determining
codon variability in a gene and its application to the rate of fixation
of mutations in evolution, Biochem. Genet. 4 (1970) 579-593.
[19] N. Galtier, Maximum-likelihood phylogenetic analysis under a
covarion-like model, Mol. Biol. Evol. 18 (2001) 866-873.
[20] N. Galtier, Markov-modulated Markov chains and the covarion process
of molecular evolution, Journal of Computational Biology 11 (2004)
 J. P. Huelsenbeck, B. Larget, D. Swofford, A compound Poisson pro-
cess for relaxing the molecular clock, Genetics 154 (2000) 1879-1892
 S. Aris-Brosou, Z. Yang, Bayesian models of episodic evolution support
a late precambrian explosive diversification of the metazoa, Mol. Biol.
Evol. 20 (2003) 1947-1954.
 J.L. Thorne, H. Kishino, I.S. Painter, Estimating the rate of evolution
of the rate of molecular evolution, Mol. Biol. Evol. 15 (1998) 1647-
 B. Øksendal, Stochastic Differential Equations: an Introduction with
Applications, Springer, 1998.
 D. Williams, Probabilities with Martingales, Cambridge mathematical
 C. Tuffley, M. A. Steel, Modeling the covarion hypothesis of nucleotide
substitution, Math. Biosci. 147 (1998) 63-91.
 J. S. Liu, Monte Carlo Strategies in Scientific Computing, Springer,
 J.H. Gillespie, Lineage effects and the index of dispersion of molecular
evolution, Mol. Biol. Evol. 6 (1989) 636-647.
 L. Chao, D. E. Carr, The molecular clock and the relationship between
population size and generation time, Evolution 47 (1993) 688-690.
 M. J. Sanderson, A nonparametric approach to estimating divergence
times in the absence of rate constancy, Mol. Biol. Evol. 14 (1997)
 T. Ohta, M. Kimura, On the constancy of the evolutionary rate in
cistrons, J. Mol. Evol. 1 (1971) 18-25.
 M. Kimura, The Neutral Theory of Molecular Evolution, Cambridge
University Press, Cambridge, 1983.
 C.H. Langley, W.M. Fitch, The constancy of evolution: a statistical
anlalysis of the α and β haemoglobins, cytochrome c, and fibrinopep-
tide A, in: Genetic Structure of Populations, Univ. of Hawaii press,
 M. Bulmer, Estimating the variablility of substitution rates, Genetics
123 (1989) 615-619.
 D. J. Cutler, Understanding the overdispersed molecular clock, Ge-
netics 154 (2000) 1403-1417.
 D. R. Cox, V. Isham, Point Processes, Chapman and Hall, NewYork,
 J. H., Gillespie, The Causes of Molecular Evolution, Oxford University
 S. Karlin, H. M. Taylor, H. M., A Second Course in Stochastic Pro-
cesses, New York academic press, 1981.
 Q. Zheng, On the dispersion index of a Markovian molecular clock,
Math. Biosci. 172 (2001) 115-128.
 D. J. Cutler, The index of dispersion of molecular evolution: slow
fluctuations, Theor. Pop. Biol. 57 (2000) 177-186.
 R. Horn, C. Johnson, Matrix Analysis, Cambridge University Press, Download full-text
 G. H. Golub, C. F. Van Loan, Matrix Computations, John Hopkins
University Press, 1996.
 J. N. Darroch, K. W. Morris, Passage-time generating functions for
continuous-time finite Markov chains, J. Appl. Prob. 5 (1968) 414-426.
 M. Kac, On Some connections between probability theory and differen-
tial and integral equations, Proceedings of the second Berkeley sympo-
sium on probability and statistics, University of California, Berkeley,
 S. E. Shreve, Stochastic Calculus for Finance, Springer Finance, 2004.
 C. Albanese, S. Lawi, Laplace Transforms for Integrals of Markov
Processes, Submitted to Markov Proc. Rel. Fields (2005).
 M. Abramowitz, I. A. Stegun, Handbook of Mathematical Functions
with Formulas, Graphs, and Mathematical Tables, Dover New York,