Copyright © Society of Systematic Biologists
Point of View
Time Dependency of Molecular Rates in Ancient DNA Data Sets, A Sampling Artifact?
REGIS DEBRUYNE∗ AND HENDRIK N. POINAR
Ancient DNA Centre, Department of Anthropology, McMaster University, 1280 Main Street West, Hamilton, Ontario L8S 4L9, Canada;
∗Correspondence to be sent to: Ancient DNA Centre, Department of Anthropology, McMaster University, 1280 Main Street West, Hamilton, Ontario
L8S 4L9, Canada; E-mail: email@example.com.
It is common knowledge that the instantaneous rate
of mutation (RoM) in DNA sequences exceeds the long-
term rate of substitution (RoS) when measured in in-
terspecific phylogenetic analyses. The neutral theory
of molecular evolution describes this temporary excess
diversity as transient polymorphisms either removed
from the population through the actions of purifying
selection or fixed by random genetic drift over a few
generations (Kimura 1983). Observations of these “ac-
celerations” in the molecular rates within recent evolu-
tionary time have been documented (Parsons et al. 1997;
Lambert et al. 2002); however, they did not resolve the
magnitude and duration of this phenomenon. Howell
et al. (2003) have addressed these issues through pedi-
gree analyses of human mitochondrial (mt) hypervari-
able region (HVR) sequences and have suggested a
5- to 10-fold acceleration compared with the long-term
RoS. In addition, Burridge et al. (2008) have shown that
the calibration of the mt clock for galaxiid fishes using
geological divergence dates with cytochrome b and
control region sequences supports a transition period
during which the RoM would decrease toward the RoS
extending up to ∼200 kyr. However, the general
applicability of these specific results re-
The recent publication (Ho, Phillips, Cooper, et al.
2005) of a Bayesian analysis suggesting that a drastic
acceleration in the molecular rates as time approaches 0
is a general and predictable feature has become a hotly
debated topic in both systematics and evolutionary bi-
ology (Bandelt 2007; Emerson 2007; Ho, Kolokotronis,
et al. 2007; Ho, Shapiro, et al. 2007; Howell et al. 2008).
When using the program BEAST (Drummond and
Rambaut 2007), with data sets containing sequences of
vertebrate taxa in conjunction with a wide range of cal-
ibration points, Ho, Phillips, Cooper, et al. (2005) have
proposed a “time-dependency” model in which the rate
responds directly to the divergence time between
the terminals. In this model, the molecular rate, referred
to as the rate of change (RoC), is allowed to decrease
rapidly along a vertically translated exponential decay
curve until it reaches the long-term RoS at equilibrium. Ho,
Phillips, Cooper, et al. (2005) have evaluated the puta-
tive effects of sequencing and calibration errors as well
as mutational saturation in their model and have con-
cluded that purifying selection was the most likely con-
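The vertically translated exponential decay described above can be sketched as follows. This is a minimal illustration, assuming the form RoC(t) = RoS + excess × exp(−decay × t); the function name, the parameter names, and all numeric values are hypothetical and are not taken from Ho, Phillips, Cooper, et al. (2005).

```python
import math


def rate_of_change(t_years, ros, excess, decay):
    """Apparent rate (RoC) at divergence time t_years.

    Assumed form: an exponential decay shifted upward by the long-term
    rate of substitution (ros), so the curve flattens onto ros as t grows.
    """
    if t_years < 0:
        raise ValueError("divergence time must be non-negative")
    return ros + excess * math.exp(-decay * t_years)


# Hypothetical parameters, chosen only so that the rate at t = 0 is
# 20-fold the long-term rate; they are not published estimates.
ROS = 1.0e-8      # long-term rate, substitutions per site per year
EXCESS = 1.9e-7   # extra rate at t = 0 on top of ROS
DECAY = 5.0e-6    # decay constant, per year

for t in (0, 1e5, 5e5, 1e6, 2e6):
    print(f"t = {t:>9.0f} yr  RoC = {rate_of_change(t, ROS, EXCESS, DECAY):.3e}")
```

Under these assumed values the curve starts at 20 times the long-term rate and is close to it by about 2 million years, which is the order of magnitude and duration discussed in the text.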
This model, however, describes a phenomenon an or-
of rates (up to and beyond 20-fold) during an extended
period of time (up to 2 million years). Although intu-
Phillips, Cooper, et al. (2005) well, this model supports a
serious and prolonged impact of deleterious mutations
and would thus require a few adjustments to the current
Recently, Bandelt (2007) and Emerson (2007) have
questioned both the model of the time dependency and
the significance of the rate acceleration phenomenon.
Emerson (2007) has emphasized the critical role of the
selection of priors in BEAST analyses and has shown
that the results described by Ho, Phillips, Cooper, et al.
(2005) could only be retrieved under specific conditions
within a full Bayesian framework (where the level of en-
forcement over the priors is minimal). In this paper, our
primary objective was thus to re-address the nature of
the causal factor(s) of the rate acceleration described by
Ho, Phillips, Cooper, et al. (2005) as well as their biolog-
ical meaning. Based on previously published material,
we suggest that the emphasis placed on the divergence
time in the current explanation of this phenomenon
might have hidden other relevant factors such as the in-
formation content of the data sets. In order to compare
the performance of the strict “time-dependency” model
with a more inclusive “signal-dependency” hypothesis,
we examine the impact of sequence length over the es-
timates of the RoC for 2 different calibrations. We show
that our hypothesis for a signal-dependent artifact ap-
pears to model the data presented here more accurately
and may explain some inconsistencies between pub-
lished reports on evolutionary rates.
EVIDENCE SUPPORTING THE PHENOMENON OF RATE ACCELERATION
Systematic Biology Advance Access published July 1, 2009
In their original paper, Ho, Phillips, Cooper, et al.
(2005) reported an apparent acceleration of the RoC
in recent times (<2 Ma) for 3 groups of mt data sets
calibrated with dates ranging from 29 to 42 ka for tip
14C calibrations to 0.125–35 Ma for node (internal)
calibrations using the software application BEAST.
This software, developed by Drummond and Rambaut
(2007), simultaneously analyzes genetic data with their
associated dates/ages of terminals and/or nodes, in
order to infer, under various clock models within a
Bayesian framework, the substitution rates, phyloge-
netic structure, branch lengths, and demographic pa-
rameters, which best describe the data at hand. In his
re-analysis of the same data sets, Emerson (2007) has
documented evidence of 3 main sources of error: 1) the
accuracy of the phylogenetic methodology to estimate
distances/branch lengths, 2) the quality of the molec-
ular sampling, and 3) the accuracy of the time calibra-
tions. He has shown that when different sets of priors
were selected in BEAST, the exponential decay rate as
seen by Ho, Phillips, Cooper, et al. (2005) could either
not be retrieved (for the data sets using primate control
region and protein-coding sequences) or not with the
same magnitude (for data sets of protein-coding genes
in avian taxa). Of the original data set, only the human
Neandertal HVR sequences (8 sequences of 356 bp with
34 informative characters) produced a consistently ele-
vated RoC in both papers when the calibration method
used was based exclusively on the 14C radiocarbon
dates of the 4 Neandertal sequences. However, Emerson
(2007) observed no rate acceleration when the time of
the most recent common ancestor (tMRCA) of the Ne-
andertal sequences was implemented as a node calibra-
tion. These results exemplify how the rate acceleration
itself appears to be heavily dependent on the analytical
framework of BEAST analyses. Ho, Phillips, Cooper,
et al. (2005) advocate for a “full Bayesian framework”
with parameter-rich models and unconstrained priors,
whereas Emerson (2007) and Bandelt (2007) rather pro-
mote more tightly controlled priors and calibrations
within simple models.
Within a full Bayesian framework, Ho, Kolokotronis,
et al. (2007) have consistently documented elevated
RoCs for BEAST analyses based on 19 data sets includ-
ing ancient DNA (aDNA) sequences and calibrations
(Fig. 1) except for a Chlorobium data set (which also had
the oldest calibration point at 206 ka). These analyses
shared striking similarities with the original human Ne-
andertal comparison: 1) all were derived from relatively
short, potentially poorly informative sequences (range
114–741 bp; 350 bp average), 2) all were performed using
time calibration priors that relied exclusively on 14C
radiocarbon dates of the terminals, and 3) all yielded
elevated RoCs associated with widely distributed high-
est posterior densities (HPD). We will explore below
the reasons why these conditions might be inseparable
from the rate acceleration phenomenon described by
Ho, Phillips, Cooper, et al. (2005).
Inconsistencies in the Rate Estimates between Simulated and
Real Data Sets
Comparison of the RoC estimate through BEAST analyses
using real (in black) or simulated data sets (in gray). The width
of the 95% HPD for the simulated data sets is shaded in order to
compare with the real data sets. Figure modified after figure 1 of Ho,
Kolokotronis, et al. (2007, p. 3).
Although much recent literature confirms the repeated
occurrence of the rate acceleration, Emerson
(2007) has argued that this result might be artificial. In
response to this criticism, Ho, Kolokotronis, et al. (2007)
have performed BEAST analyses on simulated data sets.
The rationale for this analysis was that if BEAST yielded
accurate posterior estimates of the RoC that had been
used as a prior to generate the simulated data sets in
a full Bayesian framework (using recent tip calibration
only), it would symmetrically confirm the accuracy of
the posterior rate estimates recovered for the real data.
However, for those comparisons to be relevant, the data
and results from the simulated and real data sets should
be as similar as possible.
Ho, Kolokotronis, et al. (2007) generated nucleotide
sequences of 1 kb in length representing 30,000 years of
variation with a RoS as high as 5 × 10⁻⁷ per site per year
(about 25-fold the estimate of the substitution rate for
the mt genome of vertebrates). With such a high RoS,
one would expect on average 1 change per sequence ev-
ery 2000 years, so that a representative picture of their
divergence could be captured within radiocarbon time.
For these simulated data sets, the estimate of the poste-
tremely precise. The precision defined the width of the
HPD for the RoC, which generally spanned about 50%
of the average estimate (95% HPD typically spanning
∼2.5 × 10⁻⁷ substitutions per site per year for a mean
RoC of ∼5 × 10⁻⁷).
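The arithmetic behind these figures can be checked directly. The short sketch below uses only the values quoted in the text (simulated rate, sequence length, time span, HPD width, and the "about 25-fold" ratio); the variable names are illustrative.

```python
# Values quoted in the text for the simulated data sets.
SIM_RATE = 5e-7            # substitutions per site per year
SEQ_LENGTH = 1000          # sites (1 kb)
TIME_SPAN = 30_000         # years of simulated variation
HPD_WIDTH = 2.5e-7         # typical width of the 95% HPD
FOLD_OVER_LONG_TERM = 25   # "about 25-fold" the vertebrate mt estimate

# Expected substitutions per sequence per year, and the waiting time
# between substitutions along one lineage.
changes_per_year = SIM_RATE * SEQ_LENGTH
years_per_change = 1.0 / changes_per_year

# Expected substitutions per sequence over the whole simulated span.
expected_changes = changes_per_year * TIME_SPAN

# Width of the HPD relative to the mean rate.
relative_hpd_width = HPD_WIDTH / SIM_RATE

# Long-term rate implied by the 25-fold ratio.
implied_long_term_rate = SIM_RATE / FOLD_OVER_LONG_TERM

print(f"years per change:       {years_per_change:.0f}")
print(f"changes over the span:  {expected_changes:.0f}")
print(f"relative HPD width:     {relative_hpd_width:.0%}")
print(f"implied long-term rate: {implied_long_term_rate:.1e}")
```

The waiting time of about 2000 years per change and the relative HPD width of about 50% match the statements above; the implied long-term rate is simply the simulated rate divided by 25.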
However, the precision of the posterior RoC for the
real aDNA mt data sets analyzed in the same paper was
generally much lower (Ho, Kolokotronis, et al. 2007):
with the exception of the bison alignment, the HPDs for
the rates were on average >3 times wider than for the
alignments, the posterior RoC was not only more pre-
cisely estimated than for the 615-bp alignment, but its
average estimate was also consistently lower.
The comparison of these results with the ones de-
rived from the short calibration range (0–10 ka, Fig. 7)
shows that the signal-dependent artifact affects the latter
data set even more heavily, with a larger increase in both the
average RoC and the width of its HPD as the sequence shortens.
As for the real proboscidean sequences, the rate variation for
the bison data sets can also be modeled by a hyperbolic
fit (Fig. 7). Both hyperbolic fits vary in almost parallel
fashion over 5-fold of the original data set and thus
never converge to similar estimates: the RoS for the
0–60 ka data set could be as low as 1.2 × 10⁻⁷ (rather
than the estimate of 3.0 × 10⁻⁷ derived from the 615-bp
alignment), whereas the RoS for the 0–10 ka range is
more than double (2.7 × 10⁻⁷). This difference suggests
that fragment length and time depth are not the only
contributing factors to the rate acceleration. Contrary
to the proboscidean study (where the composition in
terminals was constant), the 2 bison sequence sets an-
alyzed here differ in the number of terminals and the
overall phylogenetic pattern, 2 factors that may con-
tribute to the signal-dependent bias. Alternatively, the
difference in RoS estimates could also support a predic-
tion by Emerson that the signal-dependent bias may not
be sufficient to explain the entire phenomenon of rate
acceleration. In all cases, the approach of the decom-
position/concatenation of the original data provides a
useful (and relatively easy to implement) tool to eval-
uate to what extent the RoC derived from any data set
limited in both calibration depth and sequence length
might be biased due to a signal-dependent artifact.
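A hyperbolic fit of the kind mentioned above can be sketched as follows. This is a minimal illustration, assuming the form RoC(L) = a + b / L, where L is the sequence length relative to the original alignment and a is the rate approached as L grows; the exact functional form used for the published fits is not stated in this section. The function name is hypothetical, and the rates below are synthetic values generated from the assumed curve, not posterior estimates from any BEAST run.

```python
def fit_hyperbola(rel_lengths, rates):
    """Least-squares fit of rate = a + b / L.

    The model is linear in x = 1 / L, so ordinary least squares on
    (x, rate) gives the asymptote a (intercept) and the slope b.
    """
    xs = [1.0 / length for length in rel_lengths]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(rates) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, rates))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


# Relative lengths used for the decomposed/concatenated data sets.
REL_LENGTHS = [0.25, 0.5, 0.75, 1.0, 2.0, 5.0, 10.0]

# Synthetic rates (NOT real results): generated from a = 1.2e-7 and
# b = 1.8e-7, so that the value at the original length (L = 1) is 3.0e-7.
TRUE_A, TRUE_B = 1.2e-7, 1.8e-7
SYNTHETIC_RATES = [TRUE_A + TRUE_B / length for length in REL_LENGTHS]

a_hat, b_hat = fit_hyperbola(REL_LENGTHS, SYNTHETIC_RATES)
print(f"asymptotic rate a = {a_hat:.2e}, slope b = {b_hat:.2e}")
```

With real posterior means in place of the synthetic rates, the fitted intercept a would give an estimate of the rate expected once sequence length no longer limits the signal, which is the quantity compared between the two calibration ranges in the text.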
Posterior estimates of the RoC (average and 95% HPD)
for the re-analyzed bison data spanning (from left to right) 1/4, 1/2,
3/4, 1×, 2×, 5×, and 10× the original data. Both the complete 0–60
ka calibration range (empty circles) and the 0–10 ka range (filled
triangles) are displayed. The asterisk indicates the results from the
original data without length modification. The hyperbolic fits adjusted to
the data are displayed by the dashed lines with associated correlation
Our Bayesian analyses presented here have implica-
tions of general significance for aDNA data sets ana-
lyzed with BEAST. Based on the proboscidean mt data
set, we were able to show that the apparent time de-
pendency of the RoC recovered for inferences built on
poorly informative data sets calibrated in time with only
recent radiocarbon dates is more likely explained by an
artifact than an actual evolutionary paradigm. The lim-
ited phylogenetic content of short sequences appears to
relax the constraint over the substitution rates, which
can vary so greatly that their mean estimate becomes
uninformative, producing a reproducible bias toward an
apparent acceleration of the molecular rates. By showing
how the pattern described from the analyses of the
proboscidean data can be extended to other published
material, we suggest that all aDNA data sets be tested
for such a signal dependency through the analytical
framework provided here to evaluate the risk of such a
Once the effects of the signal-dependent artifact are
accounted for, the difference between the RoC estimates
this artifact alone does not account for the entire rate ac-
celeration. It does, however, show that the acceleration
has been previously reported by Ho, Phillips, Cooper,
et al. (2005).
We have also attempted to address the relative
merits of recent versus deep calibration approaches.
The shallow calibration approach is more pertinent to
a full Bayesian framework as it relies on the inherent
structure and quality of the data to converge to both
accurate and precise estimates. However, we have ex-
emplified how this approach can be misleading when
the data are poorly structured (Rannala 2002): inaccu-
rate tMRCAs and both inaccurate and imprecise RoCs
are then recovered. Conversely, the consistent and ac-
curate results obtained for the RoC with an enforced
deep calibration provide support for this methodolog-
ical approach provided accurate information is avail-
able for the enforced calibration(s). Despite the obvious
efficiency of BEAST algorithms in a full Bayesian frame-
work when analyzing long DNA sequences, our analy-
ses suggest that data sets of very limited phylogenetic
content might remain out of the range of precise and
accurate estimates of divergence dates. Dating
divergence events exclusively from short, potentially
uninformative, 14C-dated data sets is thus of
limited value, despite being the norm for aDNA studies.
This work was funded by the Natural Sciences and
Engineering Research Council (grant 299103-2004),
the Canada Research Chairs program, a fellowship
from the Human Frontier Science Program (HFSP-
00285C/2005), and McMaster University.
We thank Melanie Kuch and Carsten Schwarz for dis-
cussions and critical evaluation of our protocol. We are
grateful to J. Sullivan, K. Zamudio, B. Emerson, and an
anonymous reviewer for their constructive comments.
We are grateful to Beth Shapiro and Simon Ho for shar-
ing preformatted files of the bison data set.
Bandelt H.J. 2007. Clock debate: when times are a-changin’: time
dependency of molecular rate estimates: tempest in a teacup. He-
Barnes I., Shapiro B., Lister A., Kuznetsova T., Sher A., Guthrie D.,
Thomas M.G. 2007. Genetic structure and extinction of the woolly
mammoth, Mammuthus primigenius. Curr. Biol. 17:1072–1075.
Burridge C.P., Craw D., Fletcher D., Waters J.M. 2008. Geological dates
and molecular rates: fish DNA sheds light on time dependency.
Mol. Biol. Evol. 25:624–633.
Grocke D.R., Matheus P., Zazula G., Guthrie D., Froese D., Buigues
B., de Marliave C., Flemming C., Poinar D., Fisher D., Southon J.,
Tikhonov A.N., Macphee R.D., Poinar H.N. 2008. Out of America:
ancient DNA evidence for a new world origin of late quaternary
woolly mammoths. Curr. Biol. 18:1320–1326.
Drummond A.J., Rambaut A. 2007. BEAST: Bayesian evolutionary
analysis by sampling trees. BMC Evol. Biol. 7:214.
Emerson B.C. 2007. Alarm bells for the molecular clock? No support
for Ho et al.’s model of time-dependent molecular rate estimates.
Syst. Biol. 56:337–345.
Endicott P., Ho S.Y. 2008. A Bayesian evaluation of human mitochon-
drial substitution rates. Am. J. Hum. Genet. 82:895–902.
Seattle (WA): University of Washington.
Gilbert M.T., Drautz D.I., Lesk A.M., Ho S.Y., Qi J., Ratan A., Hsu
C.H., Sher A., Dalen L., Gotherstrom A., Tomsho L.P., Rendulic S.,
A., Willerslev E., Iacumin P., Buigues B., Ericson P.G., Germonpre
M., Kosintsev P., Nikolaev V., Nowak-Kemp M., Knight J.R., Irzyk
G.P., Perbost C.S., Fredrikson K.M., Harkins T.T.,
Miller W., Schuster S.C. 2008. Intraspecific phylogenetic analy-
sis of Siberian woolly mammoths using complete mitochondrial
genomes. Proc. Natl. Acad. Sci. USA. 105:8327–8332.
Gilbert M.T., Tomsho L.P., Rendulic S., Packard M., Drautz D.I., Sher
A., Tikhonov A., Dalen L., Kuznetsova T., Kosintsev P., Campos P.F.,
Higham T., Collins M.J., Wilson A.S., Shidlovskiy F., Buigues B.,
Ericson P.G., Germonpre M., Gotherstrom A., Iacumin P.,
Nikolaev V., Nowak-Kemp M., Willerslev E., Knight J.R., Irzyk
G.P., Perbost C.S., Fredrikson K.M., Harkins T.T., Sheridan
S., Miller W., Schuster S.C. 2007. Whole-genome shotgun se-
quencing of mitochondria from ancient hair shafts. Science. 317:
Hall T.A. 1999. BioEdit: a user-friendly biological sequence align-
ment editor and analysis program for Windows 95/98/NT. Nucleic
Acids. Symp. Ser. 41:95–98.
Hauf J., Waddell P., Chalwatzis N., Joger U., Zimmermann F.K.
2000. The complete mitochondrial genome sequence of the African
elephant (Loxodonta africana) phylogenetic relationships of Pro-
boscidea to other mammals and D-loop heteroplasmy. Zoology.
Ho S.Y., Kolokotronis S.O., Allaby R.G. 2007. Elevated substitution
rates estimated from ancient DNA sequences. Biol. Lett. 3:702–705.
Ho S.Y., Phillips M.J., Cooper A., Drummond A.J. 2005. Time depen-
dency of molecular rate estimates and systematic overestimation of
recent divergence times. Mol. Biol. Evol. 22:1561–1568.
Ho S.Y., Phillips M.J., Drummond A.J., Cooper A. 2005. Accuracy of
rate estimation using relaxed-clock models with a critical focus on
the early metazoan radiation. Mol. Biol. Evol. 22:1355–1363.
Ho S.Y., Shapiro B., Phillips M.J., Cooper A., Drummond A.J. 2007. Ev-
idence for time dependency of molecular rate estimates. Syst. Biol.
Howell N., Howell C., Elson J.L. 2008. Molecular clock debate: time
dependency of molecular rate estimates for mtDNA: this is not the
time for wishful thinking. Heredity. 101:107–108.
Howell N., Smejkal C.B., Mackey D.A., Chinnery P.F., Turnbull D.M.,
Herrnstadt C. 2003. The pedigree rate of sequence divergence in the
human mitochondrial genome: there is a difference between phylo-
genetic and pedigree rates. Am. J. Hum. Genet. 72:659–670.
Kimura M. 1983. The neutral theory of molecular evolution.
Cambridge (UK): Cambridge University Press.
Krause J., Dear P.H., Pollack J.L., Slatkin M., Spriggs H., Barnes I.,
Lister A.M., Ebersberger I., Paabo S., Hofreiter M. 2006. Multiplex
amplification of the mammoth mitochondrial genome and the evo-
lution of Elephantidae. Nature. 439:724–727.
Lambert D.M., Ritchie P.A., Millar C.D., Holland B., Drummond A.J.,
Baroni C. 2002. Rates of evolution in ancient DNA from Adelie pen-
guins. Science. 295:2270–2273.
Parsons T.J., Muniec D.S., Sullivan K., Woodyatt N., Alliston-Greiner
R., Wilson M.R., Berry D.L., Holland K.A., Weedn V.W., Gill P.,
Holland M.M. 1997. A high observed substitution rate in the hu-
man mitochondrial DNA control region. Nat. Genet. 15:363–368.
Penny D. 2005. Relativity for molecular clocks. Nature. 436:183–184.
Poinar H.N., Schwarz C., Qi J., Shapiro B., Macphee R.D., Buigues B.,
Tikhonov A., Huson D.H., Tomsho L.P., Auch A., Rampp M., Miller
W., Schuster S.C. 2006. Metagenomics to paleogenomics: large-scale
sequencing of mammoth DNA. Science. 311:392–394.
Posada D., Crandall K.A. 1998. Modeltest: testing the model of DNA
substitution. Bioinformatics. 14:817–818.
Rambaut A., Drummond A.J. 2007. Tracer. Version 1.4. Available from:
Rannala B. 2002. Identifiability of parameters in MCMC Bayesian in-
ference of phylogeny. Syst. Biol. 51:754–760.
Rogaev E.I., Moliaka Y.K., Malyarchuk B.A., Kondrashov F.A.,
Derenko M.V., Chumakov I., Grigorenko A.P. 2006. Complete mito-
chondrial genome and phylogeny of Pleistocene mammoth Mam-
muthus primigenius. PLoS Biol. 4:e73.
Rohland N., Malaspinas A.S., Pollack J.L., Slatkin M., Matheus P.,
Hofreiter M. 2007. Proboscidean mitogenomics: chronology and
mode of elephant evolution using mastodon as outgroup. PLoS
Shapiro B., Drummond A.J., Rambaut A., Wilson M.C., Matheus P.E.,
E., Hansen A.J., Baryshnikov G.F., Burns J.A., Davydov S., Driver
J.C., Froese D.G., Harington C.R., Keddie G., Kosintsev P., Kunz
M.L., Martin L.D., Stephenson R.O., Storer J., Tedford R., Zimov S.,
Cooper A. 2004. Rise and fall of the Beringian steppe bison. Science.
Shoshani J., Tassy P. 2005. Advances in proboscidean taxonomy &
classification, anatomy & physiology, and ecology & behavior.
Quat. Int. 126–128:5–20.
Shoshani J., Walter R.C., Abraha M., Berhe S., Tassy P., Sanders W.J.,
Marchant G.H., Libsekal Y., Ghirmai T., Zinner D. 2006. A proboscidean
from the late Oligocene of Eritrea, a “missing link”
between early Elephantiformes and Elephantimorpha, and biogeo-
graphic implications. Proc. Natl. Acad. Sci. USA. 103:17296–17301.
Tassy P. 2003. Elephantoidea from Lothagam. In: Leakey M.G., Harris
J.M., editors. Lothagam: the dawn of humanity in Eastern Africa.
New York: Columbia University Press. p. 331–358.
Received 26 September 2008; revised 5 December 2008; accepted 24 March
Associate Editor: Kelly Zamudio