
Graham Jones
- PhD
- Researcher at University of Gothenburg
About
29
Publications
10,953
Reads
4,335
Citations
Additional affiliations
June 1991 - December 2006
Own business
Position
- Self employed
Description
- Freelance computer programmer. 1991-97: various programs for a company called BEEBUG. 1997-2006: wrote and marketed the music OCR program SharpEye.
September 1980 - September 1983
Publications (29)
The focus of this article is a Bayesian method for inferring both species delimitations and species trees under the multispecies coalescent model using molecular sequences from multiple loci. The species delimitation requires no a priori assignment of individuals to species, and no guide tree. The method is implemented in a package called STACEY fo...
The multispecies coalescent model provides a formal framework for the assignment of individual organisms to species, where the species are modeled as the branches of the species tree. None of the available approaches so far have simultaneously co-estimated all the relevant parameters in the model, without restricting the parameter space by requirin...
Polyploidy is an important speciation mechanism, particularly in land plants. Allopolyploid species are formed after hybridization between otherwise intersterile parental species. Recent theoretical progress has led to successful implementation of species tree models that take population genetic parameters into account. However, these models have n...
This article focuses on the problem of estimating a species tree from multilocus data in the presence of incomplete lineage sorting and migration. I develop a mathematical model similar to IMa2 (Hey 2010) for the relevant evolutionary processes which allows both the population size parameters and the migration rates between pairs of species tree br...
Ten replicates of data generated to mimic a small SARS-CoV-2 outbreak. There are 5-10 hosts. The transmission time distribution, mutation rate, and genome size approximately match observations. The effective population size within host, bottleneck, recombination rate, and sequencing error rate are guesses.
Elaboration of Bayesian phylogenetic inference methods has continued at pace in recent years with major new advances in nearly all aspects of the joint modelling of evolutionary data. It is increasingly appreciated that some evolutionary questions can only be adequately answered by combining evidence from multiple independent sources of data, inclu...
Building the Tree of Life (ToL) is a major challenge of modern biology, requiring advances in cyberinfrastructure, data collection, theory, and more. Here, we argue that phylogenomics stands to benefit by embracing the many heterogeneous genomic signals emerging from the first decade of large-scale phylogenetic analysis spawned by high-throughput s...
Building the Tree of Life (ToL) is a major challenge of modern biology, requiring major advances in cyberinfrastructure, data collection, theory, and more. Here, we argue that phylogenomics stands to benefit by embracing the many heterogeneous genomic signals emerging from the first decade of large-scale phylogenetic analysis spawned by High-throug...
Building the Tree of Life (ToL) is a major challenge of modern biology, requiring major advances in cyberinfrastructure, data collection, theory, and more. Here, we argue that phylogenomics stands to benefit by embracing the many heterogeneous genomic signals emerging from the first decade of large-scale phylogenetic analysis spawned by next-genera...
We give an overview of recently developed methods to reconstruct phylogenies of taxa that include allopolyploids that have originated in relatively recent times; in other words, taxa for which at least some of the parental lineages of lower ploidy levels are not extinct and for which ploidy information is clearly shown by variation in chromosome co...
Mayrose et al. (2011) and Arrigo and Barker (2012) concluded that neopolyploid lineages diversify more slowly than the diploid lineages from which they arise. We expressed concerns about this statement in Soltis et al. (2014a) to which Mayrose et al. (2014) responded. This article continues the discussion. We demonstrate a statistical problem with...
Motivation
The multispecies coalescent model provides a formal framework for the assignment of individual organisms to species, where the species are modeled as the branches of the species tree. None of the available approaches so far have simultaneously co-estimated all the relevant parameters in the model, without restricting the parameter space...
We consider a stochastic process for the generation of species which combines a Yule process with a simple model for hybridization between pairs of co-existent species. We assume that the origin of the process, when there was one species, occurred at an unknown time in the past, and we condition the process on producing n species via the Yule proce...
It has long been recognized that phylogenetic trees are more unbalanced than those generated by a Yule process. Recently, the degree of this imbalance has been quantified using the large set of phylogenetic trees available in the TreeBASE data set. In this article, a more precise analysis of imbalance is undertaken. Trees simulated under a range of...
This article provides a method for calculating the joint probability density for the topology and the node times of a tree which has been produced by a multi-type age-dependent binary branching process and then sampled at a given time. These processes are a generalization, in two ways, of the constant rate birth-death process. There are a finite n...
This article discusses possible reasons why posterior clade probabilities obtained from Bayesian phylogenetic analyses might be inaccurate. It attempts to list all possible sources of uncertainty and error in Bayesian phylogenetic analysis. The choice of priors on trees has been suggested by several authors as a cause of inaccurate posterior clade...
The problem of segmenting an image with an unknown number of unknown textures is addressed. An operator is defined which gives a low output in the middle of homogeneous regions and a high output near boundaries. A hierarchical segmentation is then obtained using a novel algorithm which finds significant hollows in the output of the operator.
Questions
Question (1)
I have experience in phylogenetic analysis, but not virology. I would like to develop a substitution model for the mutations that occur in SARS-CoV-2, and would like feedback from virologists. A substitution model provides values for the 12 mutation rates A->C, A->G, ..., T->G at a particular site. There are lots of substitution models used in phylogenetic analysis, from the simplest Jukes-Cantor model, which says all 12 rates are equal, to one with 12 individual rates, and more complicated ones still, which take into account neighbouring nucleotides.
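As a minimal sketch of the simplest end of that range, the 12 rates at a site can be arranged as the off-diagonal entries of a 4x4 rate matrix. The function names and the value of mu below are illustrative assumptions, not taken from any particular phylogenetics package.

```python
NUCLEOTIDES = "ACGT"

def jukes_cantor_rates(mu):
    """Return the 12 off-diagonal rates under Jukes-Cantor: all equal to mu."""
    return {(x, y): mu for x in NUCLEOTIDES for y in NUCLEOTIDES if x != y}

def rate_matrix(rates):
    """Build a 4x4 rate matrix from a dict of 12 rates keyed by (from, to).

    Each diagonal entry is minus the sum of the other entries in its row,
    so every row sums to zero.
    """
    q = []
    for x in NUCLEOTIDES:
        row = [rates[(x, y)] if x != y else 0.0 for y in NUCLEOTIDES]
        row[NUCLEOTIDES.index(x)] = -sum(row)
        q.append(row)
    return q

jc = jukes_cantor_rates(mu=1.0)  # mu = 1.0 is an arbitrary illustrative value
Q = rate_matrix(jc)
for nucleotide, row in zip(NUCLEOTIDES, Q):
    print(nucleotide, row)
```

A model with 12 individual rates would use the same rate_matrix function with a dict holding 12 different values.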
Here are some observed values:
from \ to    A      C      G      T
A            -      52     308    68
C            58     -      18     1098
G            255    46     -      437
T            56     327    52     -
These may be mutations produced by the virus, or by host editing, and there are sequencing errors which can confuse matters. Note the large number of C->Ts and G->Ts, and high C->T/G->A and G->T/C->A ratios.
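The two ratios mentioned can be computed directly from the observed values. This is only a small sketch: the dictionary layout and variable names are assumptions made for illustration, with rows of the table read as the original nucleotide and columns as the new one.

```python
# Observed counts from the table above, keyed by (from, to).
counts = {
    ("A", "C"): 52,  ("A", "G"): 308, ("A", "T"): 68,
    ("C", "A"): 58,  ("C", "G"): 18,  ("C", "T"): 1098,
    ("G", "A"): 255, ("G", "C"): 46,  ("G", "T"): 437,
    ("T", "A"): 56,  ("T", "C"): 327, ("T", "G"): 52,
}

ct_over_ga = counts[("C", "T")] / counts[("G", "A")]
gt_over_ca = counts[("G", "T")] / counts[("C", "A")]

print(f"C->T / G->A = {ct_over_ga:.2f}")
print(f"G->T / C->A = {gt_over_ca:.2f}")
```

With these counts the ratios come out at roughly 4.3 and 7.5.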
Please correct me if I am wrong, but this is how I understand the mechanisms within a cell:
Virus-mediated mutations. If the polymerase mismatches a pair like G:T instead of G:C, at a certain rate, it will do this when copying positive sense to negative sense, and negative to positive, at the same rate. (If not, why not?) Every virus genome that exits a cell must be the result of an equal number of positive-to-negative and negative-to-positive copies. This implies a symmetry among the rates like this:
A->C ~= T->G
A->G ~= T->C
A->T ~= T->A
C->A ~= G->T
C->G ~= G->C
C->T ~= G->A
For example,
C mispaired with A, A correctly paired with T produces a C->T.
G correctly paired with C, C mispaired with A produces a G->A.
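The six symmetry pairs listed above follow from complementing both nucleotides of a mutation, and the observed values can be compared pair by pair. The sketch below is a rough illustration under stated assumptions: the names are hypothetical, and it compares raw counts, which also depend on how often each nucleotide occurs in the genome, so the ratios are not estimates of rate ratios.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def strand_partner(mutation):
    """Return the mutation seen on the same strand when the equivalent
    mispairing happens while copying the opposite strand."""
    src, dst = mutation
    return (COMPLEMENT[src], COMPLEMENT[dst])

# Observed counts from the question, keyed by (from, to).
counts = {
    ("A", "C"): 52,  ("A", "G"): 308, ("A", "T"): 68,
    ("C", "A"): 58,  ("C", "G"): 18,  ("C", "T"): 1098,
    ("G", "A"): 255, ("G", "C"): 46,  ("G", "T"): 437,
    ("T", "A"): 56,  ("T", "C"): 327, ("T", "G"): 52,
}

# Collect each symmetric pair once.
pairs = []
seen = set()
for mutation in sorted(counts):
    partner = strand_partner(mutation)
    if mutation in seen or partner in seen:
        continue
    seen.update((mutation, partner))
    pairs.append((mutation, partner))

# Under perfect strand symmetry (and equal base frequencies) each ratio
# would be close to 1.
for (a, b), (c, d) in pairs:
    ratio = counts[(a, b)] / counts[(c, d)]
    print(f"{a}->{b} vs {c}->{d}: {counts[(a, b)]} vs {counts[(c, d)]}, ratio {ratio:.2f}")
```

Pairs whose ratio is far from 1 are the ones that a purely strand-symmetric, virus-mediated process would struggle to explain.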
Host-mediated mutations. (See refs below.) These do not have to obey the above symmetry if they act preferentially on positive or negative sense RNA. The APOBEC proteins can cause C->T mutations, but they would produce as many G->A mutations as C->T mutations if they acted on positive and negative sense RNA equally. So it seems they must mainly act on positive sense RNA. What would be the most likely reason for this? (I can think of various ideas, but don't know how plausible they are.)
Are there any known mechanisms (virus or host) which can produce more G->T mutations than C->A mutations?
References:
The divergence between SARS-CoV-2 and RaTG13 might be overestimated due to the extensive RNA modification,
Yue Li, Xinai Yang, Na Wang, Haiyan Wang, Bin Yin, Xiaoping Yang & Wenqing Jiang.
Evidence for host-dependent RNA editing in the transcriptome of SARS-CoV-2,
Salvatore Di Giorgio, Filippo Martignano, Maria Gabriella Torcia, Giorgio Mattiuz, Silvestro G. Conticello.
Evidence for strong mutation bias towards, and selection against, T/U content in SARS-CoV2: implications for attenuated vaccine design,
Alan M Rice, Atahualpa Castillo Morales, Alexander T Ho, Christine Mordstein, Stefanie Mühlhausen, Samir Watson, Laura Cano, Bethan Young, Grzegorz Kudla, Laurence D. Hurst.
Rampant C to U hypermutation in the genomes of SARS-CoV-2 and other coronaviruses – causes and consequences for their short and long evolutionary trajectories,
Peter Simmonds.