Figure 1 - uploaded by Alexei J Drummond

The Wright-Fisher and Kingman coalescent processes, with generation time ρ. (left) K = 40 generations of a Wright-Fisher population of N_e = 20 individuals, (center) the ancestral lineages of contemporary individuals 5 and 15 in the population at left coalesce M = 20 generations back, (right) the continuous time coalescent tree summarizing the discrete time ancestry in the center.
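The discrete-time process in the caption can be sketched in a few lines of Python. This is only an illustrative sketch, assuming a haploid Wright-Fisher population of constant size in which every individual picks its parent uniformly at random from the previous generation; the function names, the random seed, and the use of 0-based individual indices are assumptions made here, not part of the source publication. Under this assumption a pair of lineages coalesces in any given generation with probability 1/N_e, so the waiting time is geometric with mean N_e generations, and multiplying a generation count by the generation time ρ gives the corresponding time on the continuous scale.

```python
import random


def simulate_wright_fisher(n_e, k, rng):
    """Simulate k generations of parent choices for a population of size n_e.

    Returns a list of k lists. Entry [g][i] is the index of the parent of
    individual i after stepping one generation further back from step g
    (step 0 is the contemporary generation).
    """
    return [[rng.randrange(n_e) for _ in range(n_e)] for _ in range(k)]


def generations_to_coalescence(parents, i, j):
    """Trace the lineages of individuals i and j back in time.

    Returns the number of generations back at which the two lineages first
    share an ancestor, or None if they have not coalesced within the
    simulated generations.
    """
    a, b = i, j
    for generations_back, parent_of in enumerate(parents, start=1):
        a, b = parent_of[a], parent_of[b]
        if a == b:
            return generations_back
    return None


# Values taken from the caption: N_e = 20 individuals, K = 40 generations,
# lineages of individuals 5 and 15. The seed is arbitrary.
rng = random.Random(1)
parents = simulate_wright_fisher(n_e=20, k=40, rng=rng)
m = generations_to_coalescence(parents, 5, 15)
print("generations to coalescence:", m)
```

Each run with a different seed gives a different coalescence time; the M = 20 shown in the figure is one such realization.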
Source publication
This chapter focuses on on-going research into chronology building tools based on genetic data from DNA sequences. By combining genetic information and radiocarbon data from fossil remains it is possible to recover genealogical structures, population size information, mutation rates and, hence, approximate chronologies for genetic trees. Since this...
Similar publications
Epidemiology and public health planning will increasingly rely on the analysis of genetic sequence data. In particular, genetic data coupled with dates and locations of sampled isolates can be used to reconstruct the spatiotemporal dynamics of pathogens during outbreaks. Thus far, phylogenetic methods have been used to tackle this issue. Although t...
Recent studies show that the PHASE algorithm is a state-of-the-art method for population-based haplotyping from individually genotyped data. We present a modified version of PHASE for estimating population haplotype frequencies from pooled DNA data. The algorithm is compared with (i) a maximum likelihood estimation under the multinomial model and (...
Our understanding of the distribution of genetic variation in natural populations has been driven by mathematical models of the underlying biological and demographic processes. A key strength of such coalescent models is that they enable efficient simulation of data we might see under a variety of evolutionary scenarios. However, current me...
A genealogical relationship among genes at a locus (gene tree) sampled from three related populations was examined with special reference to population relatedness (population tree). A phylogenetically informative event in a gene tree constructed from nucleotide differences consists of interspecific coalescences of genes in each of which two genes...
Genome studies of facultative sexual species are providing insight into the evolutionary consequences of mixed reproductive modes. However, it is unclear if the evolutionary history of facultative sexuals' genomes can be captured by standard population genetic models; in particular, whether they can be approximated by Wright-Fisher dynamics while a...
Citations
... When the tips of a tree are all contemporaneous an isochronous sample results. Extension of the coalescent process to heterochronous samples allowed multiple collections at successive epochs ( [15], [16], [17], [18]). Enumeration of heterochronous genealogies proven in Sections 2 and 3 turns out to be iterative in the form of nested sum-products, thereby disproving a conjectured recursion in the literature of computational biology (equation 1, [19], Section 2.2 [20]). ...
The enumeration formula of a non-contemporaneous genealogy with total sample size n = n1 + n2 requires a nested sum-product. The set of ancestral patterns in the non-contemporaneous genealogy yields a multiplicity factor that translates from the set of ancestral patterns in the isochronous genealogy. A computation formula of the multiplicity factor proves to be non-recursive. Evaluation of small sample sizes demonstrates the emergent complexity. Extension to the enumeration formula in the heterochronous genealogy with m samples of total size n = n1 + · · · + nm yields a non-recursive nested sum-product. These enumeration formulae measure sample spaces of Bayesian prior distributions of trees relevant to theoretical and computational phylogenetics. © 2017 Academic Publications, Ltd.
... Several of the rpoB1 positive marmosets show late qPCR amplification signals, which reflect results produced by mismatches between the primer/probe and endogenous DNA and suggest that the DNA present in these marmosets likely belongs to a mycobacterial species distantly related to the MTBC. Although this study did not identify any IS6110 positives, previous studies have identified the presence of this locus in New World primates [37]. However, the IS6110 mobile element has been found to be homologous in some related soil mycobacteria [5], and targeting this element with PCR has previously produced false-positives [38]. ...
Zoonotic pathogens that cause leprosy (Mycobacterium leprae) and tuberculosis (Mycobacterium tuberculosis complex, MTBC) continue to impact modern human populations. Therefore, methods able to survey mycobacterial infection in potential animal hosts are necessary for proper evaluation of human exposure threats. Here we tested for mycobacterial-specific single- and multi-copy loci using qPCR. In a trial study in which armadillos were artificially infected with M. leprae, these techniques were specific and sensitive to pathogen detection, while more traditional ELISAs were only specific. These assays were then employed in a case study to detect M. leprae as well as MTBC in wild marmosets. All marmosets were negative for M. leprae DNA, but 14 were positive for the mycobacterial rpoB gene assay. Targeted capture and sequencing of rpoB and other MTBC genes validated the presence of mycobacterial DNA in these samples and revealed that qPCR is useful for identifying mycobacterial-infected animal hosts.
... Although there are proof-of-concept examples in several application areas in Buck et al. (1996) and others have also published examples (Orton, 2000; Millard, 2002; Millard and Gowland, 2002; Byers and Roberts, 2003; Millard, 2004, 2005; Finke et al., 2008; van Leusen et al., 2009; Fernandes et al., 2014), there is really only one application area where Bayesian methods can be said to be routine: absolute, scientific-dating-based chronology construction. There is one other application area with close connections to archaeology, that of phylogeny (both genetic and linguistic), where use of Bayesian methods is also increasingly routine (Drummond et al., 2004; Edwards et al., 2007; Kitchen et al., 2009; Drummond et al., 2012; Bouckaert et al., 2014). Here, however, methodological development was driven largely by the genetics and linguistic research communities. ...
Bayesianism is fast becoming the dominant paradigm in archaeological chronology construction. This paradigm shift has been brought about in large part by widespread access to tailored computer software which provides users with powerful tools for complex statistical inference with little need to learn about statistical modelling or computer programming. As a result, we run the risk that such software will be reduced to the status of black boxes. This would be a dangerous position for our community since good, principled use of Bayesian methods requires mindfulness when selecting the initial model, defining prior information, checking the reliability and sensitivity of the software runs and interpreting the results obtained. In this article, we provide users with a brief review of the nature of the care required and offer some comments and suggestions to help ensure that our community continues to be respected for its philosophically rigorous scientific approach.
... Although paleontological or geological calibrations can be used, there is evidence that these are inappropriate for intraspecific analyses because of the effects of saturation, purifying selection, and other factors (Ho and Larson 2006; Ho et al. 2008). Therefore, it is preferable to employ calibrations within the intraspecific genealogy, using DNA sequences from dated material sampled at various points in time, that is, time-stamped DNA sequences (Rambaut 2000; Drummond et al. 2004). Provided that these sequences are sufficiently variable, owing to either a high mutation rate or a broad temporal span, it is possible to estimate the rate of molecular evolution (Drummond et al. 2002, 2003). ...
In recent years, ancient DNA has increasingly been used for estimating molecular timescales, particularly in studies of substitution rates and demographic histories. Molecular clocks can be calibrated using temporal information from ancient DNA sequences. This information comes from the ages of the ancient samples, which can be estimated by radiocarbon dating the source material or by dating the layers in which the material was deposited. Both methods involve sources of uncertainty. The performance of Bayesian phylogenetic inference depends on the information content of the data set, which includes variation in the DNA sequences and the structure of the sample ages. Various sources of estimation error can reduce our ability to estimate rates and timescales accurately and precisely. We investigated the impact of sample-dating uncertainties on the estimation of evolutionary timescale parameters using the software BEAST. Our analyses involved 11 published data sets and focused on estimates of substitution rate and root age. We show that, provided that samples have been accurately dated and have a broad temporal span, it might be unnecessary to account for sample-dating uncertainty in Bayesian phylogenetic analyses of ancient DNA. We also investigated the sample size and temporal span of the ancient DNA sequences needed to estimate phylogenetic timescales reliably. Our results show that the range of sample ages plays a crucial role in determining the quality of the results but that accurate and precise phylogenetic estimates of timescales can be made even with only a few ancient sequences. These findings have important practical consequences for studies of molecular rates, timescales, and population dynamics.
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has caused millions of deaths and substantial morbidity worldwide. Intense scientific effort to understand the biology of SARS-CoV-2 has resulted in daunting numbers of genomic sequences. We witnessed evolutionary events that could mostly be inferred indirectly before, such as the emergence of variants with distinct phenotypes, for example transmissibility, severity and immune evasion. This Review explores the mechanisms that generate genetic variation in SARS-CoV-2, underlying the within-host and population-level processes that underpin these events. We examine the selective forces that likely drove the evolution of higher transmissibility and, in some cases, higher severity during the first year of the pandemic and the role of antigenic evolution during the second and third years, together with the implications of immune escape and reinfections, and the increasing evidence for and potential relevance of recombination. In order to understand how major lineages, such as variants of concern (VOCs), are generated, we contrast the evidence for the chronic infection model underlying the emergence of VOCs with the possibility of an animal reservoir playing a role in SARS-CoV-2 evolution, and conclude that the former is more likely. We evaluate uncertainties and outline scenarios for the possible future evolutionary trajectories of SARS-CoV-2.
The field of ancient DNA (aDNA) has rapidly accelerated in recent years as a result of new methods in next-generation sequencing, library preparation and targeted enrichment. Such research is restricted, however, by the highly variable DNA preservation within different tissues, especially when isolating ancient pathogens from human remains. Identifying positive candidate samples via quantitative PCR (qPCR) for downstream procedures can reduce reagent costs, increase capture efficiency and maximize the number of sequencing reads of the target. This study uses four qPCR assays designed to target regions within the Mycobacterium tuberculosis complex (MTBC) to examine 133 human skeletal samples from a wide geographical and temporal range, identified by the presence of skeletal lesions typical of chronic disseminated tuberculosis. Given the inherent challenges working with ancient mycobacteria, strict criteria must be used and primer/probe design continually re-evaluated as new data from bacteria become available. Seven samples tested positive for multiple MTBC loci, supporting them as strong candidates for downstream analyses. Using strict and conservative criteria, qPCR remains a fast and effective screening tool when compared with screening by more expensive sequencing and enrichment technologies.
© 2014 The Author(s) Published by the Royal Society. All rights reserved.
The use of fossil evidence to calibrate divergence time estimation has a long history. More recently, Bayesian Markov chain Monte Carlo has become the dominant method of divergence time estimation, and fossil evidence has been reinterpreted as the specification of prior distributions on the divergence times of calibration nodes. These so-called "soft calibrations" have become widely used but the statistical properties of calibrated tree priors in a Bayesian setting have not been carefully investigated. Here, we clarify that calibration densities, such as those defined in BEAST 1.5, do not represent the marginal prior distribution of the calibration node. We illustrate this with a number of analytical results on small trees. We also describe an alternative construction for a calibrated Yule prior on trees that allows direct specification of the marginal prior distribution of the calibrated divergence time, with or without the restriction of monophyly. This method requires the computation of the Yule prior conditional on the height of the divergence being calibrated. Unfortunately, a practical solution for multiple calibrations remains elusive. Our results suggest that direct estimation of the prior induced by specifying multiple calibration densities should be a prerequisite of any divergence time dating analysis.
We provide a framework for Bayesian coalescent inference from microsatellite data that enables inference of population history parameters averaged over microsatellite mutation models. To achieve this we first implemented a rich family of microsatellite mutation models and related components in the software package BEAST. BEAST is a powerful tool that performs Bayesian MCMC analysis on molecular data to make coalescent and evolutionary inferences. Our implementation permits the application of existing nonparametric methods to microsatellite data. The implemented microsatellite models are based on the replication slippage mechanism and focus on three properties of microsatellite mutation: length dependency of mutation rate, mutational bias toward expansion or contraction, and number of repeat units changed in a single mutation event. We develop a new model that facilitates microsatellite model averaging and Bayesian model selection by transdimensional MCMC. With Bayesian model averaging, the posterior distributions of population history parameters are integrated across a set of microsatellite models and thus account for model uncertainty. Simulated data are used to evaluate our method in terms of accuracy and precision of θ estimation and also identification of the true mutation model. Finally we apply our method to a red colobus monkey data set as an example.
Microsatellites, also called short tandem repeats (STRs), are repetitions of a DNA sequence motif with length between 1 and 6 bp. Because they are abundant, widely distributed in the genome, and highly polymorphic, microsatellites have become one of the most popular genetic markers for making inferences on molecular evolution and population genetics (Shikano et al. 2010; Spong et al. 2010).