Questions related to Molecular Clocks
I would appreciate your help in setting the parameters for a two-node fossil calibration when reconstructing a phylogenetic chronogram in BEAST.
I have the divergence time between the outgroup (two species) and the crown group of species that I would like to date (43 mya), and the divergence time between the two outgroup species (8.7 mya).
Is there any way to obtain the posterior mutation rate so that I can use it later to estimate divergence times among populations from among-population genetic distances?
Many thanks in advance for your help.
I am trying to estimate the origin time of some bacterial lineages, and I am testing BEAST2 with a very simple dataset of only 12 taxa and one protein sequence of 1000 amino acids, under the WAG model. Under "Priors" in BEAUti I set a root age prior of 3500 Ma and one cyanobacterial lineage at 1200 Ma with a normal distribution, and I used a calibrated Yule model with a fixed starting tree (with 4 parameters set to 0). However, I keep getting results with very short branch lengths, and the ESS is always low even when I set the chain length to 40000000. Could anyone offer some suggestions? Thanks a lot!
What is the best way to date a phylogenomic tree using fossil calibration? It is more or less straightforward with a few Sanger loci using programs like BEAST, but it becomes intractable with hundreds of genes, as produced with phylogenomic approaches (e.g., target capture). Just wondering if anyone had any opinions?
Thanks a lot!
I have long been wondering about the following problem.
The regular, conventional theory of evolution states that mutations occur randomly.
The molecular clock states that mutations occur at a relatively constant rate over geological time (thus making it possible to establish phylogenetic relationships).
Are not these two statements completely incompatible?
How can random events lead to a regular mechanism?
Albert Jacquard stated that the simplest hypothesis to explain the constant rate of amino-acid substitutions in polypeptides was to postulate that spontaneous mutations occur at a constant rate over time. Is not such an explanation tautological?
How is it possible to establish a logical correlation between random mutations and the molecular clock?
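One way to look at the apparent paradox is with a toy simulation. The sketch below assumes that substitutions in a lineage arise as a Poisson process, i.e. each event is random but the expected number per unit time is constant. The function names, the rate of 2 substitutions per million years and the number of lineages are all hypothetical choices for illustration; the model ignores selection and rate variation among lineages.

```python
import random

def simulate_substitutions(rate_per_my, duration_my, rng):
    """Count events of a Poisson process: waiting times between events are exponential."""
    elapsed = 0.0
    count = 0
    while True:
        elapsed += rng.expovariate(rate_per_my)  # random waiting time to the next substitution
        if elapsed > duration_my:
            return count
        count += 1

def mean_count(rate_per_my, duration_my, n_lineages, seed):
    """Average substitution count over several independent simulated lineages."""
    rng = random.Random(seed)
    counts = [simulate_substitutions(rate_per_my, duration_my, rng)
              for _ in range(n_lineages)]
    return sum(counts) / n_lineages

if __name__ == "__main__":
    rate = 2.0  # hypothetical expected substitutions per million years
    for duration in (10, 100, 1000):
        avg = mean_count(rate, duration, n_lineages=200, seed=1)
        # The ratio avg / duration approaches the underlying rate as duration grows.
        print(duration, avg, avg / duration)
```

Individual events are unpredictable, yet the count accumulated over long intervals is close to rate × time, with a relative scatter that shrinks as the interval grows. In this toy model that is the sense in which random events can behave like a clock.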
I did several runs of nested sampling (NS) analysis in BEAST2 (through the CIPRES Science Gateway) for particular combinations of priors (fossil calibration, molecular clock, paleogeographical events) to find the model that best describes my data. However, the outputs of the NS analysis fluctuate too much for certain combinations of priors, even though I used a high number of particles (40), resulting in a low SD (between 1 and 2). For example, one run gave a marginal likelihood of 11432.23, whereas another gave 8343.235. What should I improve in the analysis to get stable results for a specific combination of priors, or how should I choose which of these is better?
Thanks for advice.
The 16S rRNA gene is used for phylogenetic studies as it is highly conserved between different species of bacteria and archaea. ... It is suggested that 16S rRNA gene can be used as a reliable molecular clock because 16S rRNA sequences from distantly related bacterial lineages are shown to have similar functionalities.
While running fossil-calibrated molecular-clocks analysis in BEAST, I keep receiving some strange numbers as node ages. I input the numbers in millions of years (see Fig.1) and yet I am receiving mean node ages in numbers like 0.899, 0.371 etc. (see Fig.2).
I am basically rerunning a published time-calibrated phylogenetic analysis after including new OTUs. Therefore, I have some idea of how the results should look, and it seems that the node ages are correct in a relative sense (i.e. in their relative positions), in accordance with the published study.
In other words, the tree itself seems to be fine but my time axis looks like this (Fig.3), while it should look like this (Fig.4, btw I received this picture with correct units on the axis by forcing the root age in FigTree to be in accordance to the published study mentioned above, which is a step I wish to avoid).
I bet this will be some minor issue but perhaps someone will share their experience and save me a bit of time. Does anyone have any ideas?
I'm wondering if someone could offer some guidance on what should be considered a "minimum" sample size for estimating accurate clock rates/divergence dates within a single BEAST analysis.
I have several E. coli SNP datasets from different MLST sequence types (same species) that are quite divergent from each other (~30K SNPs between sequence types; <100 SNPs within sequence types), and I was considering analyzing each dataset separately, rather than as one analysis, as this allows me to use more appropriate (i.e. closely related) references for SNP calling, etc. From the literature, the expected clock rate is ~2x10^(-7) SNPs/site/year, but when I include ALL isolates from all sequence types in a single root-to-tip regression (in TempEst, to test for a temporal signal), the rate estimate ends up on the order of 10^(-2) SNPs/site/year, which is much higher than expected (and quite unrealistic for bacteria!). I suspect this has something to do with how divergent the sequence types are from each other, although I do not know how to test this (any thoughts on this would also be welcome!).
The problem is that some datasets end up with only 3-4 bacterial isolates per sequence type - is this too small a sample size for getting accurate results from a BEAST analysis? I have tried searching online, reading other papers, and the BEAST handbook but I can't seem to find anything discussing minimum sample sizes for accurate estimation. Intuitively, I would think that as long as there is a temporal signal in the data (i.e. a root-to-tip regression has a high R^2), then I don't see why the analysis shouldn't work with only a few samples...but from my naive statistical knowledge, small sample sizes are generally a negative...is this true for such Bayesian approaches as well?
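For reference, the regression behind a root-to-tip plot is an ordinary least-squares fit of root-to-tip distance against sampling date. The sketch below is not TempEst's code, just a minimal illustration; the function name and the example numbers are hypothetical. The slope is the rate estimate, the x-intercept is the implied root date, and R^2 measures how clock-like the data look.

```python
def root_to_tip_regression(dates, distances):
    """Least-squares fit of root-to-tip distance (subs/site) against sampling date (years)."""
    n = len(dates)
    if n < 3 or n != len(distances):
        raise ValueError("need at least 3 tips and one distance per date")
    mean_x = sum(dates) / n
    mean_y = sum(distances) / n
    sxx = sum((x - mean_x) ** 2 for x in dates)
    syy = sum((y - mean_y) ** 2 for y in distances)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dates, distances))
    slope = sxy / sxx                      # estimated rate, subs/site/year
    intercept = mean_y - slope * mean_x    # distance at year 0 (rarely meaningful by itself)
    r_squared = (sxy * sxy) / (sxx * syy)  # fraction of variance explained by the fit
    root_date = -intercept / slope         # x-intercept: date at which distance is zero
    return slope, root_date, r_squared

if __name__ == "__main__":
    # Hypothetical sampling years and root-to-tip distances for four isolates.
    dates = [2001.0, 2005.5, 2010.2, 2016.8]
    distances = [1.02e-5, 1.13e-5, 1.19e-5, 1.35e-5]
    rate, root_date, r2 = root_to_tip_regression(dates, distances)
    print(rate, root_date, r2)
```

With only 3 or 4 tips the fit has just 1 or 2 residual degrees of freedom, so a high R^2 is much easier to obtain by chance than with a larger sample. The tips also share branches and are therefore not independent observations, which is why the regression is usually treated as a rough diagnostic rather than a formal test.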
Any help is much appreciated!!
I have a phylogenetic tree obtained using BEAST2 and inferred from two genes (COI and 18S). I would like to use this tree to improve my understanding of the evolution of each of the groups, therefore I thought about a molecular clock.
Unfortunately, I don't have fossils or geologic events that can help me set this molecular clock.
I am working with psyllids (Hemiptera: Psylloidea) and I was wondering if any known (or similar) mutation rate can be applied.
Question within the question, given that my phylogenetic tree was inferred from two genes (having different mutation rates) does this complicate the situation?
Thank you for your time,
I wonder what an average time-span of genera may be for evolutionary lineages of various supergeneric taxonomical level across the tree of life (e.g. fish, mammals, vertebrates, beetles, insects, ecdysozoans, metazoans, flowering plants, embryophytes, fungi etc.). In other words, how long on average may genera live in certain lineages? I am aware of the subjectivity of higher taxonomic categories, but there must be some time-span in which the genus is being found in the paleontological record. Similarly, using molecular clocks calibrated with fossils, we may assume the age of extant genera. Does anyone have some tips for relevant literature?
I am having some problems with node dating using substitution rates in MrBayes 3.2.6, even following the example in the program manual.
According to the manual I need to:
1. Set a normal distribution as the prior for the clock rate: e.g. using 0.01 as the mean and 0.005 as the standard deviation, assuming the rate is approximately 0.01 ± 0.005 substitutions per site per million years:
MrBayes > prset clockratepr = normal(0.01,0.005)
2. Modify the tree age prior to an exponential distribution with the rate 0.01:
MrBayes > prset treeagepr = exponential(0.01)
When I run the analysis, the program does not recognize the argument “exponential” to modify the age prior:
No valid match for argument “exponential”
Invalid Treeagepr argument
Error when setting parameter “Treeagepr”
I have checked the "Command Reference for MrBayes ver. 3.2.6" and, in fact, “exponential” does not appear as a valid argument for the Treeagepr parameter, so I think it is an error in the manual, but I cannot find a way to solve it.
Has anyone encountered this situation before?
Is there any solution to the problem?
The posterior probabilities are mostly high (>0.90), the ESSs are great, and the MCMC runs converge very well, but I am getting overlapping HPDs for the divergence time estimates. I used a relaxed molecular clock with a lognormal distribution. I am analyzing a mitochondrial gene with two calibration nodes, one mid-interior and the other very recent.
Hi, I want to estimate the molecular divergence times of birds using some specific nuclear genes. Could anyone suggest the best and easiest software for estimating molecular divergence? I am trying to do this in BEAST; if anyone knows of an easy protocol for using BEAST, please share it with me. Thanks in advance.
In a molecular phylogeny of fishes produced using a cytb marker of 704 bp, Sota et al. (2005) (http://www.ncbi.nlm.nih.gov/pubmed/15684588) calibrated an ML clock tree with a node corresponding to the MRCA of two lineages that are assumed to have diverged in allopatry for 3.5 million years.
The 'node height' of this calibrated node is 0.047 (in Fig. 3, illustrating the ML clock tree, 'node height' apparently = number of substitutions per site).
The authors state that the calibration resulted in a "substitution rate of 2.7% per million years". Later on, they state that "3.5 million years corresponds to 9.4% sequence difference, giving a molecular clock of 2.7% per My".
I suppose that: 9.4/3.5 = 2.7 ...
The node height (0.047) should in fact be the branch length, or the number of substitutions separating the MRCA from one of the two sister lineages, divided by the length of the sequence (704), that is, the (average) number of substitutions per site. In this case, the 'divergence' between the two sister sequences should be twice this amount (the number of substitutions per site between the two sequences, along both branches), or 0.094.
By dividing the divergence (0.094 or 9.4/100, or '9.4%') by 3.5 million years, the authors found a 'divergence rate' of 2.7% per million years.
This however is referred to as the "molecular clock", or the "substitution rate".
Indeed, many authors (including me) would in this case use the term 'substitution rate' to indicate the average number of substitutions per site between the MRCA and one of its descendants, that is 0.047/3.5 = 0.0134 per million years, or '1.3% per million years'.
(incidentally, it always puzzled me why this complication of the '%', which should correspond to a 'rate per 100 million years').
When Sota et al. (2005) compare their "fish cytb molecular clock" of 2.7% per million years with the estimates of different studies (Orti et al. 1994; Cantatore et al. 1994), they find a range of 0.8-2.8% per million years that is perfectly compatible with both the 'divergence rate' (2.7%) and the 'substitution rate' (1.3%) calculated above ... a misunderstanding of these rates is obviously very easy, since it is entirely possible that these other authors reported 'substitution rates', and not 'divergence rates'.
I'd be happy to hear your thoughts on this topic.
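The arithmetic above can be written out as a short sketch. It uses only the two numbers quoted in this post (node height 0.047 substitutions per site, calibration age 3.5 million years); the function and variable names are made up for illustration, and the assumption that the node height is the per-lineage branch length is the reading proposed above, not something confirmed by the authors.

```python
def rates_from_node_height(node_height, age_my):
    """Return (pairwise divergence rate, per-lineage substitution rate), per site per My."""
    divergence = 2.0 * node_height          # both branches: sequence A -> MRCA -> sequence B
    divergence_rate = divergence / age_my   # pairwise 'divergence rate'
    lineage_rate = node_height / age_my     # per-lineage 'substitution rate'
    return divergence_rate, lineage_rate

if __name__ == "__main__":
    div_rate, sub_rate = rates_from_node_height(0.047, 3.5)
    print(f"divergence rate:   {div_rate * 100:.1f}% per My")  # 2.7% per My
    print(f"substitution rate: {sub_rate * 100:.1f}% per My")  # 1.3% per My
```

The two values differ by exactly a factor of two, so when comparing published rates it is worth checking which of the two definitions each study used.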
I would like to know whether continuous light (LL) and continuous darkness (DD) have any effect on the circadian rhythm of clock gene expression in the fish pineal over a 24-hour period, mainly for the clock and bmal genes. LL and DD are free-running conditions, in which no environmental cues act. In this situation, do the clock genes still show a rhythm, as they do under the 12L-12D condition?
I am interested to know the estimated mitochondrial mutation rate (substitutions/site/my) in rodents (actually shrews) with the idea of calibrating a molecular clock (COXII gene). I have seen from the literature that it is common to see Cytb third codon positions for such purpose. I have also come across some papers on the Control Region, but being non coding is not ideal for comparisons. I have little variation (population study) and I just want an 'average' estimate to start from. I really want to avoid the 1%-2% estimate of Brown (1979) as it is most likely to be too conservative and outdated.
Thanks to all for any help/advice on this one.
Hi, molecular clock rates are widely used to link genetic divergence in invertebrates to vicariance; for example, geological events in the Pleistocene, or earlier in the Miocene. My question is how far back in time is appropriate for (invertebrate) mtCOI dating analysis? Is the Mesozoic too far back in time? (btw, I realise the use of mtCOI molecular clock rates is controversial)
I was trying to do Bayesian analysis on some of my sequence data using BEAST 1.7.5 to see how closely related they are and their migration patterns.
The substitution model used was GTR+I+G (strict molecular clock). I did 10 million iterations primarily to have a better ESS thus a rich posterior probability. Well it worked fine and for each run, I had ESS <700.
But once their locations (discrete trait) were added to the analysis, the ESS dropped to <10. Even after combining 4 independent runs, the ESS remained low (<75). The trees generated by each run were significantly different, and the location patterns don't seem right. The branch colours were really confusing.
Can anyone help me to get this analysis right with the discrete trait (location)?
I guess if everything goes right, the posterior probability values I got w/o locations should be similar to with locations, right?
My expertise with Bayesian algorithms and BEAST/ beauti is extremely low.
The quartet method is one of the methods for molecular clock dating. It has some advantages over other methods and some weaknesses compared with them. I am looking into the strengths and weaknesses of the quartet method; if you have any experience with it or know of a relevant paper, please explain.
What I mean: I have sequences of one mitochondrial gene of one species from different regions of the world. I see micro-variation in it (haplotypes). It's difficult to get a specific and good value to describe the diversity of the networks I get with my data. So I thought about using a molecular clock to compare my data. Ideally, I want to get an estimated time-value of evolution for each of my datasets, so that I could say, for example: in place A an estimated 1.2 million years have passed, whereas in place B only an estimated 0.92 million years have passed.
Is there a software to use for such a thing? I searched already, but only found clocks for interspecific questions, or to get a timeline for some trees and so on.
Thank you all in advance for your help.
Molecular clock estimation has been used frequently in phylogeographic studies to determine the divergence time of a specific taxon within a group of taxa and to explore phylogeographical scenarios. How can I do this? I'm not able to run the BEAST software. Do I have to use this software? Are there any other tools to do this?
My current research focuses on the family of polyketide synthase (PKS) genes in filamentous fungi, particularly on entomopathogenic fungi. Our previous work identified a group of reducing clade III PKSs that are highly specific and highly conserved for these insect-pathogenic fungi. Interestingly unlike other groups of PKSs, a reducing clade III PKS is present as a single copy gene in a fungal genome, verified by the data from two available genome sequences of two fungi, Beauveria bassiana and Cordyceps militaris.
From an evolutionary viewpoint, the fact that this reducing clade III PKS gene is highly conserved and present as a single copy in the genome might lead to a hypothesis (my hypothesis) that this PKS could be an ancestor of this PKS gene family.
I would like to ask an expert in the field how to determine which clade is an ancestor and which clades are descendants and/or the ratio of nonsynonymous to synonymous substitution using a software. I understand that PAML can do that. However, the program seems difficult to use and we are not evolution people. I have tried the graphical user interface version of this software, PAMLX, still I could not complete the analysis, likely due to the incorrect settings or options selected. I was wondering if anyone can give me a clue in this analysis. Any input or comment is highly appreciated.
Does somebody know/use a method for phylogeny calibration of groups without fossil records? I've read about calibration using genetic distance of sequences to estimate approximate divergence times but I really don't know how it works.
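As far as the distance-based approach goes, the basic idea can be sketched as follows: compute the genetic distance between two aligned sequences, correct it for multiple substitutions at the same site, and divide by twice a per-lineage rate borrowed from a related group. The sketch assumes a strict clock and uses the Jukes-Cantor (JC69) correction; the function names, the toy sequences and the rate of 0.01 substitutions/site/My are hypothetical and chosen only for illustration.

```python
import math

def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences, skipping gaps and Ns."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    different = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "-N?" or b in "-N?":
            continue  # ignore gaps and ambiguous positions
        compared += 1
        if a != b:
            different += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return different / compared

def jc69_distance(p):
    """Jukes-Cantor correction for multiple hits; defined only for p < 0.75."""
    if p >= 0.75:
        raise ValueError("p-distance too large for the JC69 correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

def divergence_time_my(distance, lineage_rate_per_my):
    """Time since the common ancestor; the distance accumulates along two lineages."""
    return distance / (2.0 * lineage_rate_per_my)

if __name__ == "__main__":
    seq_a = "ACGTACGTACGTACGTACGT"  # toy aligned sequences, one difference in 20 sites
    seq_b = "ACGTACGTACGAACGTACGT"
    d = jc69_distance(p_distance(seq_a, seq_b))
    print(divergence_time_my(d, lineage_rate_per_my=0.01))
```

The result is only as good as the borrowed rate, and a single point estimate carries no uncertainty, so estimates of this kind are best treated as rough approximations.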