Project

Molecular evolution models⭐️⭐️⭐️

Goal: Create models of evolution as working hypotheses for the analysis of DNA and protein sequence data. Develop methods of error propagation to quantify the uncertainty in the results of such data analysis.


Project log

David R. Bickel
added a research item
Confidence intervals of divergence times and branch lengths do not reflect uncertainty about their clades or about the prior distributions and other model assumptions on which they are based. Uncertainty about the clade may be propagated to a confidence interval by multiplying its confidence level by the bootstrap proportion of its clade or by another probability that the clade is correct. (If the confidence level is 95% and the bootstrap proportion is 90%, then the uncertainty-adjusted confidence level is (0.95)(0.90) ≈ 86%.) Uncertainty about the model can be propagated to the confidence interval by reporting the union of the confidence intervals from all the plausible models. Unless there is no overlap between the confidence intervals, that results in an uncertainty-adjusted interval that has as its lower and upper limits the most extreme limits of the models. The proposed methods of uncertainty quantification may be used together.
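The two adjustments described in this abstract can be sketched in a few lines of Python. This is a minimal illustration, not the author's implementation: the function names and the example divergence-time intervals are hypothetical, and the interval merge simply returns several disjoint intervals when the models' intervals do not overlap.

```python
def adjusted_confidence_level(confidence_level, clade_probability):
    """Multiply the nominal confidence level by the probability that the clade is correct."""
    return confidence_level * clade_probability


def union_of_intervals(intervals):
    """Merge (lower, upper) intervals into a sorted list of disjoint intervals."""
    merged = []
    for lower, upper in sorted(intervals):
        if merged and lower <= merged[-1][1]:
            # Overlaps the previous interval: extend its upper limit if needed.
            merged[-1] = (merged[-1][0], max(merged[-1][1], upper))
        else:
            merged.append((lower, upper))
    return merged


# Worked example from the text: 95% confidence level, 90% bootstrap proportion.
level = adjusted_confidence_level(0.95, 0.90)
print(f"uncertainty-adjusted confidence level: {level:.3f}")

# Hypothetical divergence-time intervals from three plausible models (assumed values).
model_intervals = [(10.0, 20.0), (15.0, 30.0), (12.0, 18.0)]
print(union_of_intervals(model_intervals))
```

When all intervals overlap, the merged list holds a single interval whose limits are the most extreme limits across the models, which matches the case described in the abstract.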
David R. Bickel
added 3 research items
Null hypothesis significance testing is generalized by controlling the Type I error rate conditional on the existence of a non-empty confidence interval. The control of that conditional error rate results in corrected p-values called c-values. A further generalization from point null hypotheses to composite hypotheses generates C-values. The framework has implications for the following areas of application. First, for bounded parameter spaces, C-values of unspecified catch-all hypotheses provide conditions under which the entire statistical model would be rejected. Second, the C-value of a point estimate or confidence interval from a previous study determines whether the conclusion of the study is replicated, discredited, or neither replicated nor discredited by a new study. Third, c-values of a finite number of hypotheses, theories, or other models facilitate both incorporating previous information into frequentist hypothesis testing and the comparison of scientific models such as those of molecular evolution. In all cases, the corrections of p-values are simple enough to be performed on a handheld device. https://doi.org/10.5281/zenodo.5123388
Confidence intervals of divergence times and branch lengths do not reflect uncertainty about their clades or about the prior distributions and other model assumptions on which they are based. Uncertainty about the clade may be propagated to a confidence interval by multiplying its confidence level by the bootstrap proportion of its clade or by another probability that the clade is correct. (If the confidence level is 95% and the bootstrap proportion is 90%, then the uncertainty-adjusted confidence level is (0.95)(0.90) ≈ 86%.) Uncertainty about the model can be propagated to the confidence interval by reporting the union of the confidence intervals from all the plausible models. Unless there is no overlap between the confidence intervals, that results in an uncertainty-adjusted interval that has as its lower and upper limits the most extreme limits of the models. The proposed methods of uncertainty quantification may be used together. https://doi.org/10.5281/zenodo.5212069
A number of physical and biological phenomena are intermittent in the sense that they tend to have large departures from their typical dynamics. The intermittency of a multifractal can be qualified and quantified by differential or nondifferential multifractality, the extent to which the generalized Hurst exponents differ. Multifractality is related to the generalized dimension of a singular measure, but also applies to other signals, including noises, walks, anomalous diffusion, and point processes. Multifractality has uses in data-model and data-data comparisons; e.g., the multifractality of the heart rate reveals the inadequacy of unifractal models and distinguishes healthy subjects from those with heart failure. In addition, the multifractality of human activity quantifies restfulness at night.
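One common way to estimate generalized Hurst exponents is from the scaling of the q-th absolute moments of increments, E|X(t+τ) − X(t)|^q ∝ τ^(qH(q)). The sketch below uses that approach on a simulated Gaussian random walk, which is unifractal, so H(q) should be close to 0.5 for every q. The estimator, the lag set, and the use of max H(q) − min H(q) as a crude spread are assumptions made for illustration and are not necessarily the multifractality measures defined in the paper.

```python
import math
import random


def generalized_hurst(series, q, lags):
    """Estimate H(q) from the scaling E|X(t+lag) - X(t)|^q ~ lag^(q*H(q))."""
    log_lags, log_moments = [], []
    for lag in lags:
        increments = [abs(series[i + lag] - series[i]) for i in range(len(series) - lag)]
        moment = sum(d ** q for d in increments) / len(increments)
        log_lags.append(math.log(lag))
        log_moments.append(math.log(moment))
    # The least-squares slope of log-moment against log-lag estimates q * H(q).
    mean_x = sum(log_lags) / len(log_lags)
    mean_y = sum(log_moments) / len(log_moments)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log_lags, log_moments))
    sxx = sum((x - mean_x) ** 2 for x in log_lags)
    return (sxy / sxx) / q


# Simulated unifractal signal: a Gaussian random walk (assumed test data).
rng = random.Random(0)
walk = [0.0]
for _ in range(20000):
    walk.append(walk[-1] + rng.gauss(0.0, 1.0))

lags = [1, 2, 4, 8, 16, 32]
hurst = {q: generalized_hurst(walk, q, lags) for q in (1, 2, 3)}
spread = max(hurst.values()) - min(hurst.values())  # crude spread of H(q) across q
print(hurst, spread)
```

For a multifractal signal the estimated H(q) would vary noticeably with q, so the spread would be well above the sampling noise seen for the random walk.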
David R. Bickel
added an update
Multiplicative model of molecular evolution
Fractal-rate Poisson model of molecular evolution
Fractal renewal model of molecular evolution
Lévy-stable model of molecular evolution
Fractional-difference model of molecular evolution
Fractal-Gaussian-rate model of molecular evolution
Quantifying the intermittency of point processes
 
David R. Bickel
added 6 research items
Modeling the rate of nucleotide substitutions in DNA as a dichotomous stochastic process with an inverse power-law correlation function describes evolution by a fractal stochastic process (FSP). This FSP model agrees with recent findings on the relationship between the variance and mean number of synonymous and nonsynonymous substitutions in 49 different genes in mammals, that being a power-law increase in the ratio of the variance to the mean, the index of dispersion, with the number of substitutions in a protein. The probability of a given number of substitutions occurring in a time t is determined by a fractional diffusion equation whose solution is a truncated Lévy distribution, implying that evolution is a Lévy process in time, and yields the same functional behavior for the variance in the number of substitutions as does the FSP model. In addition to obtaining these relationships, the FSP model implies lognormal statistics for the index of dispersion as a function of the mean number of substitutions in a protein, which is confirmed in the regression of the FSP model to data. Lognormal statistics suggest that molecular evolution can be viewed as a multiplicative stochastic process, rather than the linear additive processes of Darwinian selection and drift.
The fractal doubly stochastic Poisson process (FDSPP) model of molecular evolution, like other doubly stochastic Poisson models, agrees with the high estimates for the index of dispersion found from sequence comparisons. Unlike certain previous models, the FDSPP also predicts a positive geometric correlation between the index of dispersion and the mean number of substitutions. Such a relationship is statistically proven herein using comparisons between 49 mammalian genes. There is no characteristic rate associated with molecular evolution according to this model, but there is a scaling relationship in rates according to a fractal dimension of evolution. The FDSPP is a suitable replacement for the homogeneous Poisson process in tests of the lineage dependence of rates and in estimating confidence intervals for divergence times. As opposed to other fractal models, this model can be interpreted in terms of Darwinian selection and drift.
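The index of dispersion is the ratio of the variance to the mean of the substitution counts; it equals 1 for a homogeneous Poisson process and exceeds 1 when the rate itself is random. The sketch below illustrates only that general property of doubly stochastic Poisson models. It is not the FDSPP: the lognormal rate distribution, the parameter values, and the function names are assumptions chosen for a simple toy comparison, and no fractal rate process is simulated.

```python
import math
import random


def poisson_sample(rng, mean):
    """Draw a Poisson variate by Knuth's multiplication method (fine for modest means)."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def index_of_dispersion(counts):
    """Ratio of the sample variance to the sample mean of the counts."""
    mean = sum(counts) / len(counts)
    variance = sum((c - mean) ** 2 for c in counts) / (len(counts) - 1)
    return variance / mean


rng = random.Random(1)
n_lineages, mean_rate, sigma = 5000, 5.0, 0.5  # assumed toy parameters

# Homogeneous Poisson process: every lineage shares the same expected count.
homogeneous = [poisson_sample(rng, mean_rate) for _ in range(n_lineages)]

# Doubly stochastic Poisson: the expected count is itself random (lognormal with mean 1 scaling).
doubly_stochastic = [
    poisson_sample(rng, mean_rate * rng.lognormvariate(-0.5 * sigma ** 2, sigma))
    for _ in range(n_lineages)
]

print(index_of_dispersion(homogeneous))
print(index_of_dispersion(doubly_stochastic))
```

Under these assumptions the doubly stochastic counts have variance E[rate] + Var[rate], so their index of dispersion is 1 + Var[rate]/E[rate], which is greater than 1 whenever the rate varies.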
Darwin's theory of evolution by natural selection revolutionized science in the nineteenth century. Not only did it provide a new paradigm for biology, but the theory also formed the basis for analogous interpretations of complex systems studied by other disciplines, such as sociology and psychology. With the subsequent linking of macroscopic phenomena to microscopic processes, the Darwinian interpretation was applied to patterns observed in molecular evolution by assuming that natural selection operates fundamentally at the level of DNA. Thus, patterns of molecular evolution have important implications in many fields of science. Although the evolution rate of a given gene seems to be of approximately the same order of magnitude in all species, genes appear to differ in rate from one another by orders of magnitude, a fact which standard theory does not adequately explain. An understanding of the statistics of rates across different genes may shed light on this problem. The evolution rates of mammalian DNA, based on recent estimates of numbers of nonsynonymous substitutions in 49 genes of humans, rodents, and artiodactyls, are studied. We find that the rate variations are better described by lognormal statistics, as would be the case for a multiplicative process, than by Gaussian statistics, which would correspond to a linear, additive process. Thus, we introduce a multiplicative evolution statistical hypothesis (MESH), in which the theoretical explanation of these statistics requires the evolution of different substitution rates in different genes to be a multiplicative process in that each rate results from the interaction of a number of interdependent contingency processes.
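The link between multiplicative processes and lognormal statistics can be seen in a small simulation: a product of many positive random factors has a strongly skewed distribution, while its logarithm, being a sum, is approximately Gaussian. The sketch below is a toy illustration only. The factors are drawn independently and uniformly from an assumed range, whereas MESH refers to interdependent contingency processes, and the gene count, factor count, and names are hypothetical.

```python
import math
import random


def skewness(values):
    """Sample skewness: third central moment divided by the 1.5 power of the second."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


rng = random.Random(2)
n_genes, n_factors = 5000, 30  # assumed toy sizes

# Each simulated rate is the product of many independent positive factors.
rates = []
for _ in range(n_genes):
    rate = 1.0
    for _ in range(n_factors):
        rate *= rng.uniform(0.7, 1.3)
    rates.append(rate)

log_rates = [math.log(r) for r in rates]

print(skewness(rates))      # strongly right-skewed, unlike a Gaussian
print(skewness(log_rates))  # near zero, as expected if the rates are roughly lognormal
```

An additive process would instead give near-zero skewness for the rates themselves, which is the contrast the abstract draws between Gaussian and lognormal statistics.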
David R. Bickel
added an update
Six articles in molecular evolution, including three articles in Journal of Molecular Evolution and Molecular Biology and Evolution
 
David R. Bickel
added a project goal
Create models of evolution as working hypotheses for the analysis of DNA and protein sequence data. Develop methods of error propagation to quantify the uncertainty in the results of such data analysis.