Continuous and tractable models for the variation of evolutionary rates.
ABSTRACT We propose a continuous model for variation in the evolutionary rate across sites and over the phylogenetic tree. We derive exact transition probabilities of substitutions under this model. Changes in rate are modelled using the CIR process, a diffusion widely used in financial applications. The model directly extends the standard gamma distributed rates across site model, with one additional parameter governing changes in rate down the tree. The parameters of the model can be estimated directly from two well-known statistics: the index of dispersion and the gamma shape parameter of the rates across sites model. The CIR model can be readily incorporated into probabilistic models for sequence evolution. We provide here an exact formula for the likelihood of a three-taxon tree. The likelihoods of larger trees can be evaluated using Monte-Carlo methods.
arXiv:math/0506145v1 [math.PR] 8 Jun 2005
Continuous and Tractable models for the Variation
of Evolutionary Rates
Thomas Lepage∗, Stephan Lawi†, Paul Tupper∗ and David Bryant∗
February 1, 2008
McGill Centre for Bioinformatics
Montréal, Québec H3A 2B4
ph: 1 514 398-4578. fax: 1 514 398-3387.
∗Dept. of Mathematics and Statistics, McGill University, Montréal
†Laboratoire de Probabilités et Modèles Aléatoires, Université Pierre et Marie Curie,
We propose a continuous model for evolutionary rate variation
across sites and over the tree and derive exact transition probabilities
under this model. Changes in rate are modelled using the CIR process,
a diffusion widely used in financial applications. The model directly
extends the standard gamma distributed rates across site model, with
one additional parameter governing changes in rate down the tree. The
parameters of the model can be estimated directly from two well-known
statistics: the index of dispersion and the gamma shape parameter of
the rates across sites model. The CIR model can be readily incor-
porated into probabilistic models for sequence evolution. We provide
here an exact formula for the likelihood of a three taxa tree. Larger
trees can be evaluated using Monte-Carlo methods.
Keywords Evolutionary rate; Molecular clocks; CIR process; Diffusion
processes; Covarion; Phylogenetics.
1 Introduction
Understanding evolutionary rates and how they vary is one of the central
concerns of molecular evolution. It has been clearly shown that inadequate
models of rate variation, between lineages and between loci, can dramatically
affect the accuracy of phylogenetic inference [1, 2, 3]. The dependency of
molecular dating on evolutionary rate models is even more critical: we will
only obtain precise divergence time estimates from molecular data once we
can model the rate at which sequences evolve [4, 5].
Modelling the evolutionary rate is made difficult by the number and va-
riety of factors influencing it. The base rate of mutation can vary because of
changes in the accuracy of transcription machinery , DNA repair mech-
anisms , and metabolic rate . At the cellular level, selective pressures
can lead to variation of rate between loci and over time, as evidenced by
differential rates of the three codon positions [9, 10], the slower evolutionary
rate of highly expressed genes , and the effect of tertiary structure on
patterns of sequence conservation .
Selection also affects the evolutionary rate at the level of populations.
For the most part, the only mutations that affect phylogenetics are those
that are fixed in the population. Hence evolutionary rate is a combination
of mutation rate and fixation rate. Fluctuations in population size, gener-
ation times, and environmental pressures affect fixation rates and thereby
influence evolutionary rate [13, 14, 15].
Because of this complexity, the strategies employed for modelling evolu-
tionary rate have tended to be statistical in nature. As with all statistical
inference, there is an iterative sequence of model formulation, model as-
sessment, and model improvement. The aim is to construct a model that
accurately explains the observed variation but is as simple, and tractable,
as possible.
Our goal in this paper is to derive a continuous model for rate evolu-
tion that avoids many of the problems of existing approaches. We base
our model on the CIR process, a continuous Markov process that is widely
used in finance to model interest rates . As we shall see, the model fits
well into existing protocols for phylogenetic inference. The process has a
stationary distribution given by a gamma distribution and yet, unlike the
rates-across-sites (RAS) model of Uzzell and Corbin , the rate is allowed
to vary along lineages. The CIR model adds only one parameter to the
RAS model, and this parameter can be estimated directly from the index of
dispersion or the autocorrelation (see below). Furthermore, we can derive
exact transition probabilities when we incorporate CIR based rate variation
into the standard models for sequence evolution.
The outline of the paper is as follows:
• In the following section we summarise the key characteristics of models
for rate evolution, and show how existing models are classified with
respect to these characteristics.
• In Section 3 we present the CIR model for rate evolution and discuss
its basic properties.
• In Section 4 we derive transition probabilities for standard mutation
models where the rate is described as a Markov process.
• In Section 5 we focus on the case where the rate is modelled by a CIR
process.
• In Section 6 we extend this one step further to derive an expression
for likelihood of a three-taxa tree using a mutation model with rate
determined by the CIR process. We note that three-taxa trees are
often used to study differences in evolutionary rate.
We conclude with an outline of future work and work in progress.
In a companion paper (in preparation) we describe the incorporation of
this model into software for Bayesian phylogenetic inference, and use this to
show how our model captures important information lost in standard RAS
models.
2 Properties of models for rate variation
In this section we examine several important characteristics that can be used
to distinguish, and choose between, different models for rate variation. We
discuss how the different existing models fit into this scheme and summarise
the differences between them in Table 1.
The rate of evolution for a given locus at time t ≥ 0 is denoted by Rt.
For each t > 0, Rt is a non-negative random variable, and different models
of rate evolution give different distributions for the rates Rt, t ≥ 0.
Here and throughout the paper we will restrict our attention to Markov
processes. That is, for any t1 ≤ t2 ≤ t3, we assume that R(t3) conditioned
on R(t2) is independent of R(t1). In other words, the future depends on the
past only through the present.
Property I: Continuous or Discontinuous Sample Paths
The first characteristic is whether sample paths of the process are
continuous or discontinuous with respect to time. Typically, models with
discontinuous paths have rates Rt that are constant except at discrete points
in time at which there is a jump in the value (Figure 1-1(a)). If the number
of possible values for the rate is finite, then the rate can easily be described
as a continuous-time Markov chain with an infinitesimal rate matrix. For
example, in the covarion process the basic rates are ‘off’ (Rt = 0) or ‘on’
(Rt = 1) and transitions occur between them at exponentially distributed
random time intervals. Galtier [19, 20] generalizes this process to one with
more than two possible states. In other models, the range of possible
values for the rate is continuous, as in the model of Huelsenbeck , where
a rate change event consists of multiplying the previous rate by a gamma
random variable. The rate change events are still discrete and exponentially
distributed.
There are also models that describe the rate as a continuous function
of time, and the most important class of Markov processes with continuous
paths are diffusions (Figure 1-2(a)). Examples include the CIR process
presented here, the Ornstein-Uhlenbeck model of Aris-Brosou and Yang [4,
22], and the log-normal model of Kishino et al. [5, 23].
Finally, it is also possible for Rt to make jumps in value at a discrete set
of times while also changing continuously in between these points.
Property II: Long Term Behaviour and Ergodicity
The second property we consider is the distribution of Rt as t goes to
infinity, that is, the distribution of the rate of evolution in the long term.
Surprisingly, many models of rate evolution are very badly behaved in the
long term.
One problematic class of processes that have already been applied to
rates in phylogenetics is the martingales. We say that a Markov process is
a martingale if, for all s, t ≥ 0, we have $E[M_{t+s} \mid M_t] = M_t$. An example
of a Markov martingale is Brownian motion. As a result of this fairly
innocuous looking condition, a martingale Mt has the property that either
E[|Mt|] is unbounded in time or Mt converges to a random constant .
Either possibility is undesirable from a modelling point of view. This may
not be a problem if we only look at the process over a finite time, but
neither is it particularly desirable. The processes of Kishino et al. [5, 23]
and Huelsenbeck et al. all have the property that either Rt or log(Rt)
is a martingale.
Figure 1: A representation of the two classes of rate process with respect
to the classification of property I. On top are examples of the rate history.
Below are the corresponding integrated rates $\tau(t) = \int_{s=0}^{t} R_s\,ds$.
The figures 1(a-b) refer to a continuous-time Markov chain with discrete rate
change events, and in figures 2(a-b), R(t) is modelled as a diffusion process, with
At a purely theoretical level, we observe that an ever-increasing variance
will result for almost any signal that is only driven by its initial value and a
stochastic force, with no directional bias. The position of a particle subjected
to a random force produced by collisions with other particles is a classical
example of such a case. In our context, the effects on the evolutionary rate
are not independent of the actual rate: whatever the theoretical framework
we consider, a high evolutionary rate is not as likely to increase (or to stay
in high values) as to go back to smaller values. The episodic evolution
fits particularly well to this idea, where periods of drastic adaptation with
high evolutionary rates are naturally followed by periods where a population
is adapted and its genome evolves much more slowly. Even according to
the neutral theory, as argued by Takahata , the overall dynamic of the
rate should behave as a random function that takes high values whenever
bottlenecks occur, and goes back to small values afterwards.
The concept of ergodicity naturally arises from this observation. We say
that a Markov process is ergodic if for any initial rate R0 the distribution
of Rt converges to a unique distribution as t goes to infinity. The limiting
distribution is known as the invariant or stationary distribution. Examples
of ergodic processes include the OU process, the CIR model and (usually)
the discrete space covarion and covarion-type models [26, 19, 20].
One possible way for a process to not be ergodic is if the distribution of
Rt does not converge for large t for some initial rate R0. This must be the
case if Rt is a martingale and does not converge to a constant, as is the case
with Brownian motion. Another possibility is that Rt converges to different
stationary distributions for different values of R0.
Property III: Tractability
A highly desirable feature of any model is its tractability, both math-
ematical (does there exist a closed formula?) and computational (can we
compute probabilities efficiently?). Nowadays, Monte Carlo methods make
it possible to use arbitrarily complex models: however, explicit analytical
formulae allow for more efficient sampling .
There are several probability distribution functions that are important
to have when working with rate processes. The most basic is the distribution
of the rate Rt given the rate at time t = 0. This we have for the models
[4, 22, 23, 5] and for the CIR model, but not for the models of .
In phylogenetics we incorporate the model for evolutionary rate into the
mutation model for sequence evolution at a site. These interact to give a
joint process (Rt, Xt) for both the rate Rt at time t and the nucleotide or
protein Xt at time t. To evaluate the likelihood we require an expression for
the joint conditional probability
$$P[X_t = j, R_t = s \mid X_0 = i, R_0 = r] \tag{1}$$
of going from one nucleotide (or amino acid) state and rate state to another
pair of states. Even though it is sometimes possible to perform Monte
Carlo computations to estimate this probability without a formula (as in
), having a formula will speed up the computations significantly without
having to resort to approximations, as in [23, 5].
Property IV: Autocovariance and dispersion
There is general agreement , ,  on the relevance of autocorre-
lation in the modelling of evolutionary rate. Broadly speaking, if the various
causes that explain rate variation (generation time, population size, environ-
mental fitness) vary with time, it should be reflected in rate variations. The
extent to which the rate varies can be studied using the index of dispersion
(Kimura , Langley and Fitch ). Let N(t) be the number of substitutions
or mutations of a sequence over time t. The index of dispersion
I(t) is defined as
$$I(t) = \frac{\mathrm{Var}[N(t)]}{E[N(t)]}. \tag{2}$$
This statistic can be estimated by comparing the number of substitutions
that have accumulated in different lineages [32, 34]. The population genetics
community has proposed different models to account for a high index of
dispersion (, ), and any reasonable model should yield an index of
dispersion of at least one.
The index of dispersion resulting from a particular model of rate varia-
tion is a function of the autocovariance of that model. The autocovariance
for a process Rtis defined by
ρ(t) = Cov(R0,Rt). (3)
For many processes we can derive an explicit formula for the autocovariance.
If we assume that the substitutions occur according to a Poisson process with
rate governed by our rate process (that is, the substitutions follow a doubly
stochastic or Cox process, Section 4) and the rate process has autocovariance
function ρ(t) then
$$I(t) = 1 + \frac{2}{\mu t}\int_0^t (t - s)\,\rho(s)\,ds, \tag{4}$$
as stated by a theorem in , and the stationary index of dispersion  is
$$I(\infty) = \lim_{t\to\infty} I(t) = 1 + \frac{2}{\mu}\int_0^\infty \rho(s)\,ds, \tag{5}$$
provided that µ, the stationary mean of the process R(t), and the limit,
exist. Note that if there is any variation in rate then the index of dispersion
will be greater than one .
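The link between the autocovariance and the index of dispersion can be checked numerically. The Python sketch below is only an illustration: it evaluates I(t) = 1 + (2/(µt)) ∫_0^t (t − s) ρ(s) ds with a midpoint rule. The function name `index_of_dispersion`, the number of integration steps, and the exponentially decaying autocovariance used as input are assumptions made for this example.

```python
import math

def index_of_dispersion(rho, mu, t, steps=20000):
    """I(t) = 1 + 2/(mu*t) * integral_0^t (t - s) * rho(s) ds (midpoint rule)."""
    h = t / steps
    integral = 0.0
    for k in range(steps):
        s = (k + 0.5) * h              # midpoint of the k-th sub-interval
        integral += (t - s) * rho(s)
    integral *= h
    return 1.0 + 2.0 * integral / (mu * t)

# Hypothetical inputs: autocovariance rho(s) = v * exp(-b * s), stationary mean mu
v, b, mu = 0.5, 2.0, 1.0
rho = lambda s: v * math.exp(-b * s)
```

With ρ identically zero the function returns 1, the value for a Poisson process with constant rate; any positive autocovariance pushes the result above 1.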
Some rate models in phylogenetics [23, 22] do not explicitly model the
rate, but instead assign a (fixed) rate to each branch, so that the expected
number of substitutions on a particular branch is equal to its length times
its assigned (constant) rate.
A close look at the log-normal model of Thorne et al. , which differs
from their previous version  in that the rate is explicitly modelled,
suggests that the rate has constant autocovariance, since this rate process is
close to a transform of Brownian motion, and Brownian motion has a
constant autocovariance function. Put into equation (5), we see that the
index of dispersion diverges. This problematic result illustrates the necessity
of a balance between the presence of autocorrelation on one side, and the
decrease of autocorrelation on a large time scale.
Property V: Heterotachy or Homotachy
There are two general ways that models for evolutionary rate can be
incorporated into phylogenetics. On one hand, we have rate variation among
lineages that applies to all sites (or loci) together. This can be modelled by
trees for which the paths from the root to the leaves have different lengths.
The rate variation explains the extent to which the evolution of the sequences
has violated the molecular clock. Alternatively, we can introduce a distinct
rate process for each site or locus. This models heterotachy, where the
lineage rate changes are site-specific . The transition probabilities that we
derive in Section 4 can be applied to homotachous as well as heterotachous
models.
3 A continuous diffusion model for the evolutionary rate
A Markov process with continuous paths and satisfying some additional
smoothness conditions on its transition probabilities  is called a diffusion.
There are many ways of specifying a diffusion process: perhaps the
most intuitive one is by giving the probability distribution function (pdf)
of Rt given R0 = r0, for arbitrary r0. We denote this pdf by fR[Rt|r0]. For
example, Brownian motion with parameter σ² is defined by the condition
that fR[Rt|r0] is a normal density with mean r0 and variance σ²t.
Table 1: Models for the substitution rate, classified according to the
properties of Section 2. CTMC stands for “Continuous-Time Markov Chain”.
A mathematically convenient representation of a diffusion is by means of
a stochastic differential equation (SDE). In the same way that a dynamical
system can be defined as the solution of a differential equation, a diffusion
process Rt can be defined as the solution of an equation taking the general
form (see  p.61)
$$dR_t = \alpha(t, R_t)\,dt + \beta(t, R_t)\,dB_t. \tag{6}$$
Here, α(t,Rt) represents the deterministic effect on Rt, β(t,Rt) the stochastic
part, and dBt is an infinitesimal “random” increment. Brownian motion
corresponds to the case when α(t,Rt) = 0 for all t, β(t,Rt) = σ is constant, and
the SDE becomes
$$dR_t = \sigma\,dB_t.$$
Note that if β(t,Rt) = 0 for all t and Rt then (6) becomes a deterministic
ordinary differential equation.
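To make the roles of the drift and the stochastic term concrete, a path of an SDE of the form dR = α(t,R)dt + β(t,R)dB can be approximated with the Euler-Maruyama scheme. The Python sketch below is an illustration only: the function name `euler_maruyama`, its arguments, and the fixed step size are assumptions for this example, and the scheme is a generic numerical approximation rather than the exact transition densities discussed in this paper.

```python
import math
import random

def euler_maruyama(alpha, beta, r0, t_end, n_steps, rng):
    """Approximate one path of dR = alpha(t, R) dt + beta(t, R) dB on [0, t_end]."""
    dt = t_end / n_steps
    path = [r0]
    r, t = r0, 0.0
    for _ in range(n_steps):
        dB = rng.gauss(0.0, math.sqrt(dt))   # Brownian increment, N(0, dt)
        r = r + alpha(t, r) * dt + beta(t, r) * dB
        t += dt
        path.append(r)
    return path

# Brownian motion with sigma = 2: zero drift, constant stochastic term
bm_path = euler_maruyama(lambda t, r: 0.0, lambda t, r: 2.0,
                         0.0, 1.0, 100, random.Random(1))
```

Setting `beta` to zero removes all randomness, and the scheme reduces to the forward Euler method for the ordinary differential equation dR/dt = α(t,R).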
Going from an SDE such as (6) to a pdf for the diffusion involves solving
a variable-coefficient second-order partial differential equation (PDE). For
general functions α and β this PDE has no analytic solution. There are
very few diffusions known that have closed form equations for their pdfs,
and even fewer of these are ergodic. The simplest ergodic diffusions with
closed-form expressions for the pdf are the Ornstein-Uhlenbeck and the CIR
(Cox-Ingersoll-Ross)  processes.
The Ornstein-Uhlenbeck (OU) process is described by the SDE
$$dR_t = -bR_t\,dt + \sigma\,dB_t.$$
The pdf for Rt given R0 = r0 is the normal density with mean $r_0 e^{-bt}$ and
variance $\frac{\sigma^2}{2b}(1 - e^{-2bt})$. Its stationary distribution is normal with mean
0 and variance $\sigma^2/(2b)$. The OU process was used by Aris-Brosou and Yang
 to model evolutionary rates. However, the OU process can take on
negative values, and it is not clear how it can be used directly without
any transformation, such as a reflected OU or a squared OU. Aris-Brosou
and Yang also proposed another model, the EXP (for exponential) model,
defined as follows: the rate assigned to a branch is drawn from an
exponential distribution with mean equal to the rate of its ancestral branch.
Their EXP model is therefore a martingale. They noted
that the OU model seemed to provide a better fit to their data than the
EXP model. Although the reason for this better fit remains to be investigated,
it seems reasonable to think that the ergodic property of the OU model
could be an important factor. They also mentioned that the σ² parameter
of the OU model was hard to infer, perhaps because the OU model has an
insufficient number of free parameters.
The use of the CIR model solves the problem, since it is a generaliza-
tion of the squared OU process, where the mean and the variance can be
independently inferred by the addition of a third parameter. If the mean
is fixed to one, we avoid any identifiability problem with branch lengths
without fixing the variance, which can therefore be inferred as well as the
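The CIR process satisfies the SDE
$$dR_t = b(a - R_t)\,dt + \sigma\sqrt{R_t}\,dB_t \tag{7}$$
and the pdf fR(Rt|r0) for Rt given R0 = r0 is, up to the scale factor
$\sigma^2(1 - e^{-bt})/(4b)$, a non-central χ² distribution
with degree of freedom 4ab/σ² and parameter of non-centrality
$$\lambda = \frac{4b\,r_0\,e^{-bt}}{\sigma^2(1 - e^{-bt})}. \tag{8}$$
Its mean and variance are equal to
$$E[R_t] = r_0 e^{-bt} + a(1 - e^{-bt}),$$
$$\mathrm{Var}[R_t] = \frac{r_0\sigma^2}{b}\,(e^{-bt} - e^{-2bt}) + \frac{a\sigma^2}{2b}\,(1 - e^{-bt})^2. \tag{9}$$
The stationary distribution of Rt is a gamma distribution with shape
parameter 2ab/σ² and scale parameter σ²/2b. Hence the mean of the
stationary distribution is a and the variance is aσ²/2b.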
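The conditional moments and the stationary gamma parameters are simple enough to evaluate directly. The Python sketch below is an illustration under our own naming: `cir_mean`, `cir_variance` and `cir_stationary_gamma` are hypothetical helper names, and the formulas coded are E[Rt] = r0 e^{−bt} + a(1 − e^{−bt}), Var[Rt] = (r0σ²/b)(e^{−bt} − e^{−2bt}) + (aσ²/2b)(1 − e^{−bt})², and the gamma shape 2ab/σ² and scale σ²/2b.

```python
import math

def cir_mean(t, r0, a, b):
    """E[R_t | R_0 = r0] for the CIR process."""
    return r0 * math.exp(-b * t) + a * (1.0 - math.exp(-b * t))

def cir_variance(t, r0, a, b, sigma):
    """Var[R_t | R_0 = r0] for the CIR process."""
    e = math.exp(-b * t)
    transient = (r0 * sigma ** 2 / b) * (e - e * e)
    stationary_part = (a * sigma ** 2 / (2.0 * b)) * (1.0 - e) ** 2
    return transient + stationary_part

def cir_stationary_gamma(a, b, sigma):
    """Shape and scale of the stationary gamma distribution."""
    shape = 2.0 * a * b / sigma ** 2
    scale = sigma ** 2 / (2.0 * b)
    return shape, scale
```

As t grows, the conditional mean and variance approach shape × scale = a and shape × scale² = aσ²/2b, so the dependence on the starting rate r0 is forgotten at rate b.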
Unlike an OU process, if r0, a, and b are all positive a CIR process is
always non-negative. The square of an OU process is a special case of the
CIR process. Furthermore, by multiplying Rt by a constant in equation (7),
we see that multiplying a CIR process by a positive constant gives another
CIR process.
The covariance of the stationary CIR process can be exactly computed:
$$\rho(t) = \mathrm{Cov}(R_0, R_t) = \frac{a\sigma^2}{2b}\,e^{-bt}. \tag{10}$$
From this, (4) leads to a closed formula for the index of dispersion:
$$I_{CIR}(t) = 1 + \frac{\sigma^2}{b^3 t}\,(bt - 1 + e^{-bt}),$$
$$I_{CIR}(\infty) = \lim_{t\to\infty} I_{CIR}(t) = 1 + \frac{\sigma^2}{b^2}. \tag{11}$$
In Section 2 we emphasized that the concept of autocovariance is close
to the index of dispersion. As Zheng showed , the effect of complex
infinitesimal rate matrices on the index of dispersion (with constant rate) is
not likely to explain alone the observed large empirical values. If the rate
varies, Cutler  showed that an elevated index of dispersion can only be
achieved if the time-scale of the rate process is approximately of the same
magnitude as the substitution process itself. The CIR process provides the
possibility to satisfy this property, while incorporating autocovariance. It is
the consensus of these two ideas that should guide our choice for the rate of
From (7) we see that the CIR process possesses three parameters, namely
the stationary mean a, the parameter σ², which sets the stationary variance
aσ²/2b, and the intensity of the force that drives the process to its
stationary distribution, b. The parameter b determines how fast the process
autocovariance goes to 0 as t increases.
The three parameters of the CIR process can be quickly estimated from
standard statistics in molecular evolution. The parameter a is a scale param-
eter. It determines the expected rate at any time given no other information.
Throughout the paper, we will assume that a = 1, so that the model has an
expected rate equal to one. This parallels the constraint that the gamma
distribution has an expected rate equal to one in the Rate-Across-Site (RAS)
The CIR process has a stationary distribution given by a gamma dis-
tribution. To make the stationary distribution coincide with the gamma
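distribution of a RAS model with parameter Γ we choose σ and b such that
$$\frac{2b}{\sigma^2} = \Gamma. \tag{12}$$
The stationary index of dispersion, ICIR(∞), can be estimated empirically
[33, 28]. We can then use (11) and (12) to obtain the estimates
$$\hat{b} = \frac{2}{\Gamma\,(\hat{I}_{CIR}(\infty) - 1)}, \qquad \hat{\sigma}^2 = \frac{4}{\Gamma^2\,(\hat{I}_{CIR}(\infty) - 1)}.$$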
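The estimation step amounts to solving two equations in two unknowns. The Python sketch below is an illustration, assuming a = 1, the shape constraint 2b/σ² = Γ, and the stationary index of dispersion 1 + σ²/b²; the function name `cir_parameters` and its argument names are hypothetical.

```python
def cir_parameters(gamma_shape, dispersion_inf):
    """Return (b, sigma2) for a CIR process with a = 1.

    Solves 2*b/sigma2 = gamma_shape and 1 + sigma2/b**2 = dispersion_inf.
    """
    if gamma_shape <= 0.0:
        raise ValueError("the gamma shape parameter must be positive")
    if dispersion_inf <= 1.0:
        raise ValueError("the index of dispersion must exceed one")
    excess = dispersion_inf - 1.0
    b = 2.0 / (gamma_shape * excess)
    sigma2 = 2.0 * b / gamma_shape      # equals 4 / (gamma_shape**2 * excess)
    return b, sigma2
```

For example, a shape parameter of 0.5 and a stationary index of dispersion of 3 give b = 2 and σ² = 8. An index of dispersion close to one corresponds to a very large b, that is, to rates that decorrelate almost immediately.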
4 Mutation models with a rate process
The standard model for the substitution process at a particular locus is a
continuous-time Markov chain. This kind of process is defined by a square
matrix Q called the infinitesimal rate matrix. Suppose, to begin, that there
is a constant evolutionary rate r0. As above, we let Xt denote the state (e.g.
amino acid) at time t. The transition probabilities are then given by
$$\Pr[X_t = j \mid X_0 = i] = [e^{Q r_0 t}]_{ij}. \tag{13}$$
We suppose that the process has a unique stationary distribution π, where
πj is the stationary probability of state j and
$$\pi_j = \lim_{t\to\infty} \Pr[X_t = j \mid X_0 = i]$$
for all i and j. We assume that Q has been normalised so that in the
stationary distribution the expected number of mutations over time t equals
r0t. Note that the transition probabilities (13) depend only on the product
r0t, so will be the same if we double the rate and halve the time, for example.
Suppose now that the rate is not constant, but instead varies according
to some fixed function rs, s ≥ 0. Equation (13) then becomes
$$\Pr[X_t = j \mid X_0 = i, r] = [e^{Q\tau_r}]_{ij}, \tag{14}$$
where
$$\tau_r = \int_0^t r_s\,ds$$
is the area under the curve rs.
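Both the constant-rate and the fixed-function cases reduce to a matrix exponential of Q multiplied by the integrated rate. The Python sketch below illustrates this with the Jukes-Cantor rate matrix, which is our choice of example and not a model singled out in the text. The helper names `mat_mul`, `mat_exp` and `transition_matrix` are hypothetical, and the matrix exponential uses a plain scaling-and-squaring scheme with a truncated Taylor series, which is adequate for small, well-conditioned matrices such as this one.

```python
import math

def mat_mul(A, B):
    """Product of two square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(A, squarings=10, terms=12):
    """exp(A) by scaling and squaring with a truncated Taylor series."""
    n = len(A)
    scaled = [[x / 2.0 ** squarings for x in row] for row in A]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in mat_mul(term, scaled)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result

def transition_matrix(Q, tau):
    """P = exp(Q * tau); tau is the integrated rate (r0 * t for a constant rate)."""
    return mat_exp([[q * tau for q in row] for row in Q])

# Jukes-Cantor rate matrix, normalised to one expected substitution per unit time
Q = [[-1.0 if i == j else 1.0 / 3.0 for j in range(4)] for i in range(4)]
```

Calling `transition_matrix(Q, 2.0 * 0.5)` and `transition_matrix(Q, 1.0 * 1.0)` gives the same matrix, which is the confounding of rate and time noted above.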
In the models we will consider, the fixed function r = (rt)t≥0 is replaced
by a random process R = (Rt)t≥0 that is dependent only on the starting
rate r0. The integral
$$\tau_R = \int_0^t R_s\,ds \tag{15}$$
is also random in this case; let gR denote its pdf. The transition probabilities
can be determined from the expected value of (14) with τr replaced by the
random variable τR. By the law of total expectation, this simplifies to
$$\Pr[X_t = j \mid X_0 = i] = \left[\int_0^\infty e^{Q\tau}\,g_R(\tau)\,d\tau\right]_{ij}. \tag{16}$$
Let $M(\eta) = E[e^{\eta\tau_R}]$ denote the moment generating function (mgf) for
the random variable τR. Then (16) can be rewritten
$$\Pr[X_t = j \mid X_0 = i] = [M(Q)]_{ij}$$
where the function M is interpreted as a matrix function . We assume
that Q can be diagonalised as Q = VΛV⁻¹, where Λ = diag(λ1,...,λn) is
a diagonal matrix formed from the eigenvalues of Q. The matrix function
M(Q) can then be evaluated as M(Q) = V M(Λ)V⁻¹, where
M(Λ) = diag(M(λ1),...,M(λn)).
See  for more details on matrix functions. The problem of determining
pattern probabilities therefore boils down to the problem of determining the
moment generating function of the integrated rate, τR (eqn. 15). Tuffley
and Steel use this approach to derive distance estimates for the covarion
model.
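As a small worked case of evaluating M(Q) through the eigenvalues of Q, consider the Jukes-Cantor matrix, whose eigenvalues are 0 and −4/3 (the latter with multiplicity three). Its spectral decomposition gives [M(Q)]ii = M(0)/4 + 3M(−4/3)/4 and [M(Q)]ij = M(0)/4 − M(−4/3)/4 for i ≠ j. The Python sketch below is an illustration only: the choice of the Jukes-Cantor model, the two example mgfs (a constant rate, and a rate drawn once from a gamma distribution with mean one, as in a RAS model), and all function names are assumptions made for the example.

```python
import math

def jc_transition_from_mgf(mgf):
    """[M(Q)]_ij for the Jukes-Cantor Q, whose eigenvalues are 0 and -4/3."""
    m0, m1 = mgf(0.0), mgf(-4.0 / 3.0)
    same = 0.25 * m0 + 0.75 * m1       # entries with i == j
    diff = 0.25 * m0 - 0.25 * m1       # entries with i != j
    return [[same if i == j else diff for j in range(4)] for i in range(4)]

def constant_rate_mgf(r0, t):
    """mgf of tau = r0 * t (no randomness): M(eta) = exp(eta * r0 * t)."""
    return lambda eta: math.exp(eta * r0 * t)

def gamma_rate_mgf(shape, t):
    """mgf of tau = R * t with R ~ Gamma(shape, scale 1/shape); needs eta < shape/t."""
    return lambda eta: (1.0 - eta * t / shape) ** (-shape)
```

With the constant-rate mgf the function reproduces the usual Jukes-Cantor probabilities. With the gamma mgf the probability of observing the same state at both ends is higher for the same expected number of substitutions, because e^{ητ} is convex in τ.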
For applications in phylogenetics, we need the mgf of τR conditioned on
just the starting rate, or both the starting and finishing rate. The mgf of
τR, conditioned on a starting rate of r0, is then
$$M_{r_0}(\eta) = E\left[\exp(\eta\tau_R) \mid R_0 = r_0\right]. \tag{17}$$
As before, we let fR(Rt|r0) denote the pdf of Rt conditioned on R0 = r0.
Let δ(x) denote the Dirac delta function with δ(0) = 1 and δ(x) = 0 for all
x ≠ 0. The mgf of τR conditioned on both the starting and finishing rates is
$$M_{r_0,r_t}(\eta) = E\left[\exp(\eta\tau_R) \mid R_0 = r_0,\ R_t = r_t\right] = \frac{1}{f_R(r_t \mid r_0)}\,E\left[\exp(\eta\tau_R)\,\delta(R_t - r_t) \mid R_0 = r_0\right]. \tag{18}$$
Equations (17) and (18) hold irrespective of whether R is discrete or
continuous, a diffusion, jump process, or a continuous time Markov chain.
We note in passing that analytic formulae for Mr0(η) and Mr0,rt(η) exist
in the case that R is a continuous time Markov chain, for example in the
covarion-type model of Galtier . Suppose that the evolutionary rate
switches between rate values g1, g2, ..., gk following a continuous time Markov
chain with infinitesimal rate matrix G. Let D be the k × k diagonal matrix
with entries g1, g2, ..., gk. A careful reworking of the proof of Theorem 1 in
 gives the mgf of τR conditioned on both the starting and finishing rate.
The mgf for τR conditioned on r0 = gi is then
$$M_{g_i}(\eta) = \sum_{j=1}^{k} \left[e^{(G + \eta D)t}\right]_{ij},$$
while the mgf of τR conditioned on r0 = gi and rt = gj is
$$M_{g_i,g_j}(\eta) = \frac{\left[e^{(G + \eta D)t}\right]_{ij}}{\left[e^{Gt}\right]_{ij}}.$$
This provides an independent derivation of the formula in  for transition
probabilities under a covarion-type model.
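For a two-state rate chain these mgfs can be computed in a few lines. The Python sketch below is an illustration under stated assumptions: it takes rate values g = (0, 1) with switching rates `k_on` and `k_off` (hypothetical names), uses the identity E[exp(ητR) 1{Rt = gj} | R0 = gi] = [exp((G + ηD)t)]ij, and evaluates the 2 × 2 matrix exponential with Sylvester's formula, which requires distinct real eigenvalues (guaranteed here because both switching rates are positive).

```python
import math

def expm_2x2(A):
    """exp(A) for a 2x2 matrix with distinct real eigenvalues (Sylvester's formula)."""
    (a, b), (c, d) = A
    disc = math.sqrt((a - d) ** 2 + 4.0 * b * c)
    l1, l2 = (a + d + disc) / 2.0, (a + d - disc) / 2.0
    e1, e2 = math.exp(l1), math.exp(l2)

    def entry(x, is_diag):
        # entry of (e1 * (A - l2*I) - e2 * (A - l1*I)) / (l1 - l2)
        s1 = l2 if is_diag else 0.0
        s2 = l1 if is_diag else 0.0
        return (e1 * (x - s1) - e2 * (x - s2)) / (l1 - l2)

    return [[entry(a, True), entry(b, False)],
            [entry(c, False), entry(d, True)]]

def covarion_mgf(eta, t, k_on, k_off):
    """mgfs of tau_R for a two-state rate chain with rate values (0, 1)."""
    G = [[-k_on, k_on], [k_off, -k_off]]   # 'off' -> 'on' at k_on, 'on' -> 'off' at k_off
    D = [0.0, 1.0]                         # rate values g1 = 0 ('off'), g2 = 1 ('on')
    A = [[(G[i][j] + (eta * D[i] if i == j else 0.0)) * t for j in range(2)]
         for i in range(2)]
    E = expm_2x2(A)                        # joint term with the indicator of R_t
    P = expm_2x2([[g * t for g in row] for row in G])  # rate-chain transition probs
    start_only = [sum(row) for row in E]
    start_and_end = [[E[i][j] / P[i][j] for j in range(2)] for i in range(2)]
    return start_only, start_and_end
```

At η = 0 every mgf equals one, and differentiating the start-only mgf at η = 0 recovers the expected time spent in the 'on' state.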
5 Moment generating functions and transition probabilities
for the CIR model
In this section we derive expressions for the (joint) transition probabilities
$$\Pr[X_t = j \mid X_0 = i, R_0 = r_0]. \tag{19}$$
$$\Pr[X_t = j, R_t = s \mid X_0 = i, R_0 = r_0] \tag{20}$$