Divakar Viswanath’s research while affiliated with University of Michigan and other places
The Whitney embedding theorem is a basic result of differential topology and it is natural to ask for versions of the embedding theorem for time series data. Embedding theorems for dynamical time series data were claimed by Takens (1981) and proved in a different setting by Sauer, Yorke, and Casdagli (1991). Those results were stated assuming the observation function to be parametrized. A possibly more pertinent setting is to assume the dynamical system to be parametrized, with the observation function fixed, for example, as a projection to a certain coordinate. We prove an embedding theorem in such a setting. Our proof introduces a technique that relies on the notion of Lebesgue points.
Delay coordinates are a widely used technique to pass from observations of a dynamical system to a representation of the dynamical system as an embedding in Euclidean space. Current proofs show that delay coordinates of a given dynamical system result in embeddings generically with respect to the observation function (Sauer et al., 1991). Motivated by applications of the embedding theory, we consider flow along a single periodic orbit where the observation function is fixed but the dynamics is perturbed. For an observation function that is fixed (as a nonzero linear combination of coordinates) and for the special case of periodic solutions, we prove that delay coordinates result in an embedding generically over the space of vector fields in the C^(r−1) topology with r ≥ 2.
The first terms of the Wright–Fisher (WF) site frequency spectrum that follow the coalescent approximation are determined precisely, with a view to understanding the accuracy of the coalescent approximation for large samples. The perturbing terms show that the probability of a single mutant in the sample (singleton probability) is elevated in WF but the rest of the frequency spectrum is lowered. A part of the perturbation can be attributed to a mismatch in rates of merger between WF and the coalescent. The rest of it can be attributed to the difference in the way WF and the coalescent partition children between parents. In particular, the number of children of a parent is approximately Poisson under WF and approximately geometric under the coalescent. Whereas the mismatch in rates raises the probability of singletons under WF, its offspring distribution being approximately Poisson lowers it. The two effects are of opposite sense everywhere except at the tail of the frequency spectrum. The WF frequency spectrum begins to depart from that of the coalescent only for sample sizes that are comparable to the population size. These conclusions are confirmed by a separate analysis that assumes the sample size n to be equal to the population size N. Partly thanks to the canceling effects, the total variation distance of WF minus coalescent is 0.12/log(N) for a population sized sample with n = N, which is only 1% for N = 2 × 10⁴. The coalescent remains a good approximation for the site frequency spectrum of large samples.
The diversity in genomes is due to the accumulation of mutations and the site frequency spectrum (SFS) is a popular statistic for summarizing genomic data. The current coalescent algorithm for calculating the SFS for a given demography assumes the μ → 0 limit, where μ is the mutation probability (or rate) per base pair per generation. The algorithm is applicable when μN, N being the haploid population size, is negligible. We derive a coalescent based algorithm for calculating the SFS that allows the mutation rate μ(t) as well as the population size N(t) to vary arbitrarily as a function of time. That algorithm shows that the probability of two mutations in the genealogy becomes noticeable already for μ = 10⁻⁸ for samples of n = 10⁵ haploid human genomes and increases rapidly with μ. Our algorithm calculates the SFS under the assumption of a single mutation in the genealogy, and the part of the SFS due to a single mutation depends only mildly on the finiteness of μ. However, the dependence of the SFS on variation in μ can be substantial for even n = 100 samples. In addition, increasing and decreasing mutation rates alter the SFS in different ways and to different extents.
Let x_{j+1} = φ(x_j), x_j ∈ R^d, be a dynamical system with φ being a diffeomorphism. Although the state vector x_j is often unobservable, the dynamics can be recovered from the delay vector (o(x_1), …, o(x_D)), where o is the scalar-valued observation function and D is the embedding dimension. The delay map is an embedding for generic o, and more strongly, the embedding property is prevalent. We consider the situation where the observation function is fixed at o = π_1, with π_1 being the projection to the first coordinate. However, we allow polynomial perturbations to be applied directly to the diffeomorphism φ, thus mimicking the way dynamical systems are parametrized. We prove that the delay map is an embedding with probability one with respect to the perturbations. Our proof introduces a new technique for proving prevalence using the concept of Lebesgue points.
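As an illustration of the delay vectors described in this abstract, the following minimal sketch builds them from D successive observations of a map, with the observation fixed as the projection to the first coordinate. The Hénon map, its parameter values, and the names `henon` and `delay_vectors` are assumptions chosen for the example; they do not come from the paper.

```python
import numpy as np

def henon(x, a=1.4, b=0.3):
    """One step of the Henon map (illustrative choice; a diffeomorphism for b != 0)."""
    return np.array([1.0 - a * x[0] ** 2 + x[1], b * x[0]])

def delay_vectors(phi, x0, D, steps):
    """Return an array whose row j holds D successive observations starting at x_j.

    The observation function is fixed as the projection to the first coordinate.
    """
    x = np.asarray(x0, dtype=float)
    obs = []
    for _ in range(steps + D - 1):
        obs.append(x[0])   # observe only the first coordinate
        x = phi(x)         # advance the dynamics by one step
    obs = np.array(obs)
    return np.stack([obs[j:j + D] for j in range(steps)])

vecs = delay_vectors(henon, [0.1, 0.1], D=3, steps=5)
print(vecs.shape)
```

Each row overlaps the next by D − 1 entries, which is the sense in which the delay vectors are assembled from a single scalar time series.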
The first terms of the Wright-Fisher (WF) site frequency spectrum that follow the coalescent approximation are determined precisely, with a view to understanding the accuracy of the coalescent approximation for large samples. The perturbing terms show that the probability of a single mutant in the sample (singleton probability) is elevated in WF but the rest of the frequency spectrum is lowered. A part of the perturbation can be attributed to a mismatch in rates of merger between WF and the coalescent. The rest of it can be attributed to the difference in the way WF and the coalescent partition children between parents. In particular, the number of children of a parent is approximately Poisson under WF and approximately geometric under the coalescent. Whereas the mismatch in rates raises the probability of singletons under WF, its offspring distribution being approximately Poisson lowers it. The two effects are of opposite sense everywhere except at the tail of the frequency spectrum. The WF frequency spectrum begins to depart from that of the coalescent only for sample sizes that are comparable to the population size. These conclusions are confirmed by a separate analysis that assumes the sample size n to be equal to the population size N. Partly thanks to the canceling effects, the total variation distance of WF minus coalescent is 0.12/log(N) for a population sized sample with n = N, which is only 1% for N = 2 × 10⁴.
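The quoted figure of about 1% can be checked with a line of arithmetic. The sketch below assumes that log denotes the natural logarithm, which the abstract does not state explicitly; under that assumption the formula reproduces the quoted value. The function name `tv_distance_estimate` is hypothetical.

```python
import math

def tv_distance_estimate(N):
    """Evaluate 0.12 / log(N), taking log as the natural logarithm (an assumption)."""
    return 0.12 / math.log(N)

tv = tv_distance_estimate(2e4)
print(f"{tv:.4f}")  # roughly 0.012, i.e. about 1%
```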
The Kingman coalescent is a commonly used model in genetics, which is often justified with reference to the Wright-Fisher (WF) model. Current proofs of convergence of WF and other models to the Kingman coalescent assume a constant sample size. However, sample sizes have become quite large in human genetics. Therefore, we develop a convergence theory that allows the sample size to increase with population size. If the haploid population size is N and the sample size is N^(1/3−ϵ), ϵ > 0, we prove that Wright-Fisher genealogies involve at most a single binary merger in each generation with probability converging to 1 in the limit of large N. Single binary merger or no merger in each generation of the genealogy implies that the Kingman partition distribution is obtained exactly. If the sample size is N^(1/2−ϵ), Wright-Fisher genealogies may involve simultaneous binary mergers in a single generation but do not involve triple mergers in the large N limit. The asymptotic theory is verified using numerical calculations. Variable population sizes are handled algorithmically. It is found that even distant bottlenecks can increase the probability of triple mergers as well as simultaneous binary mergers in WF genealogies.
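The merger types named in this abstract can be made concrete with a small Monte Carlo sketch of a single Wright-Fisher generation, in which each of n sampled lineages picks its parent uniformly at random from N individuals. This is only an illustration under that standard assumption: it looks at one generation rather than a whole genealogy, it is not the paper's algorithm, and the names `classify_generation` and `estimate` are hypothetical.

```python
import random
from collections import Counter

def classify_generation(n, N, rng):
    """Classify one WF generation: each of n lineages picks a parent uniformly from N."""
    counts = Counter(rng.randrange(N) for _ in range(n))
    shared = [c for c in counts.values() if c > 1]  # parents with 2+ sampled children
    if any(c >= 3 for c in shared):
        return "triple_or_larger"
    if len(shared) > 1:
        return "simultaneous_binary"
    if len(shared) == 1:
        return "single_binary"
    return "no_merger"

def estimate(n, N, trials=20000, seed=0):
    """Estimate the frequency of each merger type over independent generations."""
    rng = random.Random(seed)
    tally = Counter(classify_generation(n, N, rng) for _ in range(trials))
    return {kind: count / trials for kind, count in tally.items()}

freqs = estimate(n=20, N=10_000)
print(freqs)
```

Increasing n toward the square root of N in this sketch makes simultaneous binary mergers more frequent, which is the qualitative effect the abstract describes.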
Delay coordinates are a widely used technique to pass from observations of a dynamical system to a representation of the dynamical system as an embedding in Euclidean space. Current proofs show that delay coordinates of a given dynamical system result in embeddings generically over a space of observations (Sauer, Yorke, Casdagli, J. Stat. Phys., vol. 65 (1991), p. 579-616). Motivated by applications of the embedding theory, we consider the situation where the observation function is fixed. For example, the observation function may simply be some fixed coordinate of the state vector. For a fixed observation function (any nonzero linear combination of coordinates) and for the special case of periodic solutions, we prove that delay coordinates result in an embedding generically over the space of flows in the C^r topology with r ≥ 2.
The Kingman coalescent, widely used in genetics, is known to be a good approximation when the sample size is small relative to the population size. In this article, we investigate how large the sample size can get without violating the coalescent approximation. If the haploid population size is 2N, we prove that for samples of size N^(1/3−ϵ), ϵ > 0, coalescence under the Wright-Fisher (WF) model converges in probability to the Kingman coalescent in the limit of large N. For samples of size N^(2/5−ϵ) or smaller, the WF coalescent converges to a mixture of the Kingman coalescent and what we call the mod-2 coalescent. For samples of size N^(1/2) or larger, triple collisions in the WF genealogy of the sample become important. The sample size for which the probability of conformance with the Kingman coalescent is 95% is found to be 1.47 × N^0.31 for N ∈ [10³, 10⁵], showing the pertinence of the asymptotic theory. The probability of no triple collisions is found to be 95% for sample sizes equal to 0.92 × N^0.49, which too is in accord with the asymptotic theory.
Varying population sizes are handled using algorithms that calculate the probability of WF coalescence agreeing with the Kingman model or taking place without triple collisions. For a sample of size 100, the probabilities of coalescence according to the Kingman model are 2%, 0%, 1%, and 0% in four models of human population with constant N, constant N except for two bottlenecks, recent exponential growth, and increasing recent exponential growth, respectively. For the same four demographic models and the same sample size, the probabilities of coalescence with no triple collision are 92%, 73%, 88%, and 87%, respectively. Visualizations of the algorithm show that even distant bottlenecks can impede agreement between the coalescent and the WF model.
Finally, we prove that the WF sample frequency spectrum for samples of size N^(1/3−ϵ) or smaller converges to the classical answer for the coalescent.
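To give a feel for the two empirical fits quoted in this abstract, 1.47 × N^0.31 and 0.92 × N^0.49, the sketch below evaluates them at a few population sizes. The fit for Kingman conformance is stated in the abstract for N between 10³ and 10⁵; applying the second fit over the same range is an assumption, and the function names are hypothetical.

```python
def kingman_95_sample_size(N):
    """Sample size with 95% conformance to the Kingman coalescent, per the quoted fit."""
    return 1.47 * N ** 0.31

def no_triple_95_sample_size(N):
    """Sample size with 95% probability of no triple collisions, per the quoted fit."""
    return 0.92 * N ** 0.49

for N in (10 ** 3, 10 ** 4, 10 ** 5):
    print(N, round(kingman_95_sample_size(N), 1), round(no_triple_95_sample_size(N), 1))
```

For N = 10⁴ the fits give sample sizes of roughly 26 and 84, both far below the sample size of 100 considered in the demographic examples.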
Citations (34)
... [16,22,37,38]). There have also appeared various mathematical generalizations of the theorem [6,10,13,19,27,30–36]. Furthermore recently a probabilistic point of view has been introduced to the theory [2–4]. ...
... The main technical novelty in our approach is related to the concept of Lebesgue points. Our delay embedding theorem for the o = π_1 case requires D ≥ 4d + 2, although our earlier work [4] suggests D ≥ 2d + 1. In the concluding section, we express the hope that the technique of Lebesgue points may prove useful in obtaining prevalence versions of some classical results in dynamical systems theory. ...
... (but see Melfi and Viswanath 2018b). In this regime, we see that both our approximation and the diffusion approximation accurately reconstruct the true normalized SFS, but with our approximation being about 4× more accurate in terms of total variation distance, and about 30× more accurate in terms of symmetrized KL. ...
... Several alternative approaches have therefore been put forward in an attempt to reduce or altogether avoid using forward-based solvers, with varying levels of success: via likelihood-free sequential Monte Carlo methods [32], local approximations [33] and manifold-constrained Gaussian processes [34]. Sometimes a purely statistical approach (such as spline and kernel regression methods in [35]) is used instead to model a smooth temporal pattern. With the exception of the Gaussian-process-based approach in [34], none of the above methods are guaranteed to solve the underlying dynamical system; our weak-form approach in contrast guarantees that. ...
... This inaccuracy has been noted several times, usually in the context of the coalescent, but the coalescent and WF diffusion are dual processes, so these inaccuracies also apply to the WF diffusion. There begin to be notable discrepancies between the DTWF model and the WF diffusion for the likelihood of observing rare alleles in the sample when the sample size, n, is larger than roughly the square root of the population size, √N [14,19,20,21,22]. ...
... On a conceptual level, unpredictability in chaotic systems arises from the rapid growth of the gap between the true trajectory and its approximation by a forecast model, motivating the intuition that Lyapunov time bounds predictability. The long lookback of models like Chronos allows them to leverage information from multiple past timepoints, and thus stabilize accumulation of error relative to forecast models that only consider the most recent timepoints (Viswanath, 2001). In this sense, long-context forecasting resembles multistep integration (Zhang & Cornelius, 2023, 2024). ...
... For floating point arithmetic, we mostly follow Higham [5], with a few modifications from [8]. The axiom of floating point arithmetic is fl(x.op.y) = (x.op.y)(1 + δ) with |δ| ≤ u, where u is the unit-roundoff (2^(−53) for double precision arithmetic). ...
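The floating point axiom quoted in this excerpt can be checked numerically for individual operations. The sketch below measures the relative error δ of a double precision addition exactly, using rational arithmetic, and compares it with the unit roundoff. It assumes IEEE 754 double precision with round-to-nearest and operands away from overflow; the helper name `relative_error_of_sum` is hypothetical.

```python
from fractions import Fraction

# Unit roundoff for double precision, as quoted in the excerpt.
u = Fraction(1, 2 ** 53)

def relative_error_of_sum(x, y):
    """Return |fl(x + y) - (x + y)| / |x + y|, computed exactly with rationals."""
    exact = Fraction(x) + Fraction(y)   # Fraction(float) converts without rounding
    computed = Fraction(x + y)          # the rounded double precision result
    return abs(computed - exact) / abs(exact)

delta = relative_error_of_sum(0.1, 0.2)
print(delta <= u)
```

A handful of checks like this does not prove the axiom; it only shows what the bound |δ| ≤ u means for specific operands.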
... These bounds are, however, already tight; consider, for instance, how error propagates in a generic numerical scheme applied to the special system of ẋ = x. It is possible to show that the increase of global errors is linear in time only for a restricted class of ODEs (using techniques from Lyapunov's theory of stability [115]). Notice that the constant C in the exponential of our bound does not scale with −1 , and therefore the bound is uniform and rather tight. ...
... The plane Couette geometry we study is a shear flow in which two infinite plates move in opposite directions at constant speed, with turbulent behavior beginning to set in approximately above Reynolds number Re = 325 [11]. Eulerian equilibrium velocity fields have been computed for this setup over a number of years, and plane Couette flow also admits periodic, relative periodic, and traveling wave solutions [11,12]. ...
... In a parallel line of investigation, classical periodic orbit theory has been used to calculate the fractal dimension, the topological entropy, and the Lyapunov exponents of strange attractors [8,33–39]. Symbolic dynamics [40–46] has been closely related to unstable periodic orbits as a tool for qualitative analysis of a dynamical system. ...