Article · PDF Available

A stochastic estimator of the trace of the influence matrix for Laplacian smoothing splines

Authors: M. F. Hutchinson

Abstract

An unbiased stochastic estimator of tr(I-A), where A is the influence matrix associated with the calculation of Laplacian smoothing splines, is described. The estimator is similar to one recently developed by Girard but satisfies a minimum variance criterion and does not require the simulation of a standard normal variable. It uses instead simulations of the discrete random variable which takes the values 1, -1 each with probability 1/2. Bounds on the variance of the estimator, similar to those established by Girard, are obtained using elementary methods. The estimator can be used to approximately minimize generalised cross validation (GCV) when using discretized iterative methods for fitting Laplacian smoothing splines to very large data sets. Simulated examples show that the estimated trace values, using either the estimator presented here or the estimator of Girard, perform almost as well as the exact values when applied to the minimization of GCV for n as small as a few hundred, where n is the number ...
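The estimator can be sketched in a few lines. The following is an illustrative JAX sketch, not the paper's implementation: the function names, the probe count, and the random symmetric stand-in for the influence matrix A are assumptions made for the example; the quantity estimated is the tr(I−A) that appears in the GCV denominator.

```python
# Rademacher-probe (Girard/Hutchinson-type) trace estimation: for a symmetric
# matrix B, E[z^T B z] = tr(B) when the entries of z are independent and take
# the values +1 and -1 with probability 1/2 each.
import jax
import jax.numpy as jnp

def estimate_trace(matvec, n, num_probes, key):
    """Estimate tr(B) using only matrix-vector products z -> B @ z."""
    keys = jax.random.split(key, num_probes)
    def one_probe(k):
        z = jax.random.rademacher(k, (n,), dtype=jnp.float32)
        return z @ matvec(z)
    return jnp.mean(jax.vmap(one_probe)(keys))

# Toy usage: a symmetric stand-in for the influence matrix A, and the quantity
# tr(I - A) entering the GCV criterion V(lambda) = n ||(I - A) y||^2 / tr(I - A)^2.
key = jax.random.PRNGKey(0)
n = 500
X = jax.random.normal(key, (n, n))
A = (X @ X.T) / (2.0 * n)                      # arbitrary symmetric matrix
estimate = estimate_trace(lambda z: z - A @ z, n, num_probes=50, key=key)
print(estimate, n - jnp.trace(A))              # stochastic estimate vs exact value
```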
... Using the add and sub operators we can construct the O^c_{NtNN} operator that connects NtNN pairs corresponding to Eq. (31). Looking at Eq. (30), the first term with ∑_{μ<ν} can be done by combinations of add and sub matching those Kronecker deltas in the first term. ...
... This can be done by recasting the trace as an expectation value and using QME. A similar classical approach can be carried out stochastically [31,32], and this approach has been pursued in the lattice QCD community for decades [33,34]. Quantum mean estimation uses quantum phase estimation (QPE). ...
... The trace is then computed using QME. We find an improved scaling with this quantum algorithm relative to the classical computing algorithms, which are either exact, based on sparse LU decomposition [38] to compute the determinant, or stochastic trace estimation [31]; the present algorithm scales like O(V log(V)) while the traditional classical algorithm scales like O(V³). ...
Article
Full-text available
We present a quantum algorithm to compute the logarithm of the determinant of the fermion matrix, assuming access to a classical lattice gauge field configuration. The algorithm uses the quantum eigenvalue transform and quantum mean estimation, giving a query complexity that scales like O(V log(V)) in the matrix dimension V. Published by the American Physical Society 2025
... The covariance matrix is a dense symmetric positive definite matrix H ∈ ℝ^{p×p}, unlike its inverse, the precision matrix A ∈ ℝ^{p×p}, which is often sparse. If only the diagonal of H is required, Hutchinson's stochastic estimator [11] can be applied: ...
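The equation is truncated in the excerpt above; a common form of the stochastic diagonal estimator it refers to is sketched below. This is an illustrative JAX sketch with assumed names, using Rademacher probes: the elementwise average of z ⊙ (Hz) over probes z converges to diag(H) while touching H only through matrix-vector products.

```python
import jax
import jax.numpy as jnp

def estimate_diagonal(matvec, p, num_probes, key):
    """Estimate diag(H) from products v -> H @ v with +/-1 probe vectors."""
    keys = jax.random.split(key, num_probes)
    def one_probe(k):
        z = jax.random.rademacher(k, (p,), dtype=jnp.float32)
        return z * matvec(z)        # elementwise; off-diagonal terms average out
    return jnp.mean(jax.vmap(one_probe)(keys), axis=0)

# Usage: estimate_diagonal(lambda v: H @ v, p, num_probes=100, key=jax.random.PRNGKey(0))
```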
... The exponential and inverse quadratic covariance kernels took only one iteration for Algorithm 2.1 to converge, irrespective of the dimension of the covariance matrix, as shown in Table 2. The RBF kernel took more iterations to converge for smaller matrices, but did converge in one iteration for covariance matrices of dimension greater than or equal to 2^{11}. We note here that for certain cases, such as ill-conditioned matrices, Algorithm 2.1 will likely require more iterations to converge and will therefore perform more slowly than direct methods. ...
Preprint
Full-text available
Obtaining the inverse of a large symmetric positive definite matrix $\mathcal{A}\in\mathbb{R}^{p\times p}$ is a continual challenge across many mathematical disciplines. The computational complexity associated with direct methods can be prohibitively expensive, making it infeasible to compute the inverse. In this paper, we present a novel iterative algorithm (IBMI), which is designed to approximate the inverse of a large, dense, symmetric positive definite matrix. The matrix is first partitioned into blocks, and an iterative process using block matrix inversion is repeated until the matrix approximation reaches a satisfactory level of accuracy. We demonstrate that the two-block, non-overlapping approach converges for any positive definite matrix, while numerical results provide strong evidence that the multi-block, overlapping approach also converges for such matrices.
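The IBMI iteration itself is not spelled out in this abstract; as background for the two-block partitioning it mentions, the sketch below shows only the standard Schur-complement identity for inverting a symmetric positive definite matrix split into two blocks (illustrative code, not the authors' algorithm).

```python
import jax
import jax.numpy as jnp

def two_block_inverse(H, k):
    """Invert SPD H by partitioning it as [[A, B], [B^T, D]] at row/column k."""
    A, B, D = H[:k, :k], H[:k, k:], H[k:, k:]
    A_inv = jnp.linalg.inv(A)
    S = D - B.T @ A_inv @ B                     # Schur complement of A in H
    S_inv = jnp.linalg.inv(S)
    top_left = A_inv + A_inv @ B @ S_inv @ B.T @ A_inv
    top_right = -A_inv @ B @ S_inv
    return jnp.block([[top_left, top_right],
                      [top_right.T, S_inv]])

# Sanity check on a random SPD matrix.
X = jax.random.normal(jax.random.PRNGKey(0), (6, 6))
H = X @ X.T + 6.0 * jnp.eye(6)
print(jnp.allclose(two_block_inverse(H, 3) @ H, jnp.eye(6), atol=1e-4))
```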
... where the divergence can be approximated by the Hutchinson trace estimator. 36 Neural ODEs yield a change of coordinates that smoothly deforms p(x) into q(x ′ ), and the neural ODE can be simulated forward or reverse in time to transport samples between the prior and target distributions and compute free energy differences. However, they are potentially difficult and expensive to parameterize since Eq. ...
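As a generic illustration of that divergence approximation (not the cited implementation; the vector field and names are invented for the example), the Hutchinson estimator replaces tr(J_f(x)) by an average of zᵀJ_f(x)z, where each term needs only one Jacobian-vector product:

```python
import jax
import jax.numpy as jnp

def hutchinson_divergence(f, x, key, num_probes=1):
    """Estimate div f(x) = tr(J_f(x)) via E[z^T J_f(x) z] with +/-1 probes."""
    keys = jax.random.split(key, num_probes)
    def one_probe(k):
        z = jax.random.rademacher(k, x.shape, dtype=x.dtype)
        _, jz = jax.jvp(f, (x,), (z,))    # Jacobian-vector product J_f(x) @ z
        return z @ jz
    return jnp.mean(jax.vmap(one_probe)(keys))

# Toy vector field: f(x) = sin(x) elementwise, whose exact divergence is sum(cos(x)).
f = lambda x: jnp.sin(x)
x = jnp.arange(4.0)
print(hutchinson_divergence(f, x, jax.random.PRNGKey(0), num_probes=64))
print(jnp.sum(jnp.cos(x)))
```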
Article
Full-text available
Generative artificial intelligence is now a widely used tool in molecular science. Despite the popularity of probabilistic generative models, numerical experiments benchmarking their performance on molecular data are lacking. In this work, we introduce and explain several classes of generative models, broadly sorted into two categories: flow-based models and diffusion models. We select three representative models: neural spline flows, conditional flow matching, and denoising diffusion probabilistic models, and examine their accuracy, computational cost, and generation speed across datasets with tunable dimensionality, complexity, and modal asymmetry. Our findings are varied, with no one framework being the best for all purposes. In a nutshell, (i) neural spline flows do best at capturing mode asymmetry present in low-dimensional data, (ii) conditional flow matching outperforms other models for high-dimensional data with low complexity, and (iii) denoising diffusion probabilistic models appear the best for low-dimensional data with high complexity. Our datasets include a Gaussian mixture model and the dihedral torsion angle distribution of the Aib9 peptide, generated via a molecular dynamics simulation. We hope our taxonomy of probabilistic generative frameworks and numerical results may guide model selection for a wide range of molecular tasks.
... Lastly, for the estimation of the operator trace tr[D_z] in a matrix-free way, a popular way of approximating the trace is through Monte Carlo methods using the Hutchinson estimator [69] ...
Preprint
Full-text available
We show how to efficiently compute asymptotically sharp estimates of extreme event probabilities in stochastic differential equations (SDEs) with small multiplicative Brownian noise. The underlying approximation is known as sharp large deviation theory or precise Laplace asymptotics in mathematics, the second-order reliability method (SORM) in reliability engineering, and the instanton or optimal fluctuation method with 1-loop corrections in physics. It is based on approximating the tail probability in question with the most probable realization of the stochastic process, and local perturbations around this realization. We first recall and contextualize the relevant classical theoretical result on precise Laplace asymptotics of diffusion processes [Ben Arous (1988), Stochastics, 25(3), 125-153], and then show how to compute the involved infinite-dimensional quantities - operator traces and Carleman-Fredholm determinants - numerically in a way that is scalable with respect to the time discretization and remains feasible in high spatial dimensions. Using tools from automatic differentiation, we achieve a straightforward black-box numerical computation of the SORM estimates in JAX. The method is illustrated in examples of SDEs and stochastic partial differential equations, including a two-dimensional random advection-diffusion model of a passive scalar. We thereby demonstrate that it is possible to obtain efficient and accurate SORM estimates for very high-dimensional problems, as long as the infinite-dimensional structure of the problem is correctly taken into account. Our JAX implementation of the method is made publicly available.
... New contributions. We analyze three well-established randomized trace estimators when they are applied to parameter-dependent matrices with constant randomness (with respect to t): the Girard-Hutchinson estimator [Gir89; Hut90], the trace of the Nyström low-rank approximation [GM13], and the Nyström++ estimator [PCK22]. Combined with Chebyshev approximation, this allows one to reuse the majority of computations across different values of the parameter t, making the estimators scale favorably with respect to the number of parameter evaluations. ...
Preprint
Full-text available
Stochastic trace estimation is a well-established tool for approximating the trace of a large symmetric matrix $\mathbf{B}$. Several applications involve a matrix that depends continuously on a parameter $t \in [a,b]$, and require trace estimates of $\mathbf{B}(t)$ for many values of t. This is, for example, the case when approximating the spectral density of a matrix. Approximating the trace separately for each matrix $\mathbf{B}(t_1), \dots, \mathbf{B}(t_m)$ clearly incurs redundancies and a cost that scales linearly with m. To address this issue, we propose and analyze modifications for three stochastic trace estimators, the Girard-Hutchinson, Nyström, and Nyström++ estimators. Our modification uses \emph{constant} randomization across different values of t, that is, every matrix $\mathbf{B}(t_1), \dots, \mathbf{B}(t_m)$ is multiplied with the \emph{same} set of random vectors. When combined with Chebyshev approximation in t, the use of such constant random matrices allows one to reuse matrix-vector products across different values of t, leading to significant cost reduction. Our analysis shows that the loss of stochastic independence across different t does not lead to deterioration. In particular, we show that $\mathcal{O}(\varepsilon^{-1})$ random matrix-vector products suffice to ensure an error of $\varepsilon > 0$ for Nyström++, independent of low-rank properties of $\mathbf{B}(t)$. We discuss in detail how the combination of Nyström++ with Chebyshev approximation applies to spectral density estimation and provide an analysis of the resulting method. This improves various aspects of an existing stochastic estimator for spectral density estimation. Several numerical experiments from electronic structure interaction, statistical thermodynamics, and neural network optimization validate our findings.
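A schematic of the constant-randomization idea (illustrative code with assumed names, not the authors' implementation): the same set of Rademacher probes is drawn once and reused for every parameter value t, so that matrix-vector products can be shared or cheaply updated across the parameter grid.

```python
import jax
import jax.numpy as jnp

def traces_with_shared_probes(matvec_at, ts, n, num_probes, key):
    """Girard-Hutchinson estimates of tr(B(t)) for each t, using one probe set."""
    Z = jax.random.rademacher(key, (num_probes, n), dtype=jnp.float32)
    def trace_at(t):
        return jnp.mean(jax.vmap(lambda z: z @ matvec_at(t, z))(Z))
    return jax.vmap(trace_at)(ts)

# Example: B(t) = t*I + C with fixed symmetric C, so tr(B(t)) = t*n + tr(C).
n = 200
C = jax.random.normal(jax.random.PRNGKey(1), (n, n))
C = (C + C.T) / 2.0
matvec_at = lambda t, z: t * z + C @ z
ts = jnp.linspace(0.0, 1.0, 5)
print(traces_with_shared_probes(matvec_at, ts, n, 100, jax.random.PRNGKey(0)))
print(ts * n + jnp.trace(C))
```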
... where z is a vector of i.i.d. random variables with zero mean and unit variance, 1 is a vector of ones, and is a matrix collecting copies of z column-wise [18,23]. The MC-sampling reduces diag( ( )) calculations to simple matvec operations, which can be efficiently updated for the next order of moments +1,• due to the recurrent definition of ( ): indeed, assume we store the values of ( ) for = 0 . . . ...
Preprint
Full-text available
Simplicial complexes (SCs), a generalization of graph models for relational data that account for higher-order relations between data items, have become a popular abstraction for analyzing complex data using tools from topological data analysis or topological signal processing. However, the analysis of many real-world datasets leads to dense SCs with a large number of higher-order interactions. Unfortunately, analyzing such large SCs often has a prohibitive cost in terms of computation time and memory consumption. The sparsification of such complexes, i.e., the approximation of an original SC with a sparser simplicial complex with only a log-linear number of high-order simplices while maintaining a spectrum close to the original SC, is of broad interest. In this work, we develop a novel method for a probabilistic sparsification of SCs. At its core lies the efficient computation of sparsifying sampling probability through local densities of states as functional descriptors of the spectral information. To avoid pathological structures in the spectrum of the corresponding Hodge Laplacian operators, we suggest a "kernel-ignoring" decomposition for approximating the sampling probability; additionally, we exploit error estimates to show asymptotically prevailing algorithmic complexity of the developed method. The performance of the framework is demonstrated on the family of Vietoris–Rips filtered simplicial complexes.
... The only practical way to access information about the entries of A⁻¹ is through matrix-vector multiplications with A⁻¹, that is, by solving linear systems involving the matrix A. This situation necessitates the use of stochastic estimation techniques, such as Hutchinson's method [3], which uses random vectors in ℂⁿ whose components follow an isotropic distribution, meaning ...
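Writing the matrix generically as A (the excerpt's own symbols were lost in extraction), a matrix-free sketch of this idea estimates tr(A⁻¹) by pairing Hutchinson probes with iterative linear solves; real ±1 probes are used here for simplicity, whereas the excerpt works with complex isotropic vectors, and all names are illustrative.

```python
import jax
import jax.numpy as jnp
from jax.scipy.sparse.linalg import cg

def trace_of_inverse(matvec, n, num_probes, key):
    """Estimate tr(A^{-1}) using only solves of A x = z for random probes z."""
    keys = jax.random.split(key, num_probes)
    total = 0.0
    for k in keys:
        z = jax.random.rademacher(k, (n,), dtype=jnp.float32)
        x, _ = cg(matvec, z)          # solve A x = z without forming A^{-1}
        total = total + z @ x         # accumulate z^T A^{-1} z
    return total / num_probes

# Example with a well-conditioned SPD matrix A = I + 0.1 * (1-D discrete Laplacian).
n = 100
L = (jnp.diag(2.0 * jnp.ones(n))
     - jnp.diag(jnp.ones(n - 1), 1)
     - jnp.diag(jnp.ones(n - 1), -1))
A = jnp.eye(n) + 0.1 * L
print(trace_of_inverse(lambda v: A @ v, n, 50, jax.random.PRNGKey(0)))
print(jnp.trace(jnp.linalg.inv(A)))
```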
... In the future, MLFFs could be specialized in ways beyond chemical subgroups, such as performing high-temperature simulations (Stocker et al., 2022) or modeling phase transitions (Jinnouchi et al., 2019) using Hessian-based KD. Exploring techniques like sketching (Woodruff et al., 2014) and stochastic estimators (Hutchinson, 1989) to accelerate Hessian computation would also be a fruitful direction. Additionally, applying sampling techniques when pre-computing the teacher Hessians would reduce the upfront cost of our approach. ...
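A speculative sketch of that direction (assumed names; the toy quadratic loss stands in for an MLFF energy model): a Hutchinson-type estimator combined with Hessian-vector products gives tr(H) of a loss Hessian without ever materializing H.

```python
import jax
import jax.numpy as jnp

def hessian_trace(loss, params, num_probes, key):
    """Estimate tr(Hessian(loss)(params)) with +/-1 probes and HVPs."""
    grad_fn = jax.grad(loss)
    def hvp(v):
        # Forward-over-reverse Hessian-vector product.
        return jax.jvp(grad_fn, (params,), (v,))[1]
    keys = jax.random.split(key, num_probes)
    def one_probe(k):
        z = jax.random.rademacher(k, params.shape, dtype=params.dtype)
        return z @ hvp(z)
    return jnp.mean(jax.vmap(one_probe)(keys))

# Toy check: for loss(w) = 0.5 * w^T M w the Hessian is M, so tr(H) = tr(M).
M = jnp.diag(jnp.arange(1.0, 6.0))
loss = lambda w: 0.5 * w @ (M @ w)
print(hessian_trace(loss, jnp.ones(5), num_probes=64, key=jax.random.PRNGKey(0)))
print(jnp.trace(M))
```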
Preprint
Full-text available
The foundation model (FM) paradigm is transforming Machine Learning Force Fields (MLFFs), leveraging general-purpose representations and scalable training to perform a variety of computational chemistry tasks. Although MLFF FMs have begun to close the accuracy gap relative to first-principles methods, there is still a strong need for faster inference speed. Additionally, while research is increasingly focused on general-purpose models which transfer across chemical space, practitioners typically only study a small subset of systems at a given time. This underscores the need for fast, specialized MLFFs relevant to specific downstream applications, which preserve test-time physical soundness while maintaining train-time scalability. In this work, we introduce a method for transferring general-purpose representations from MLFF foundation models to smaller, faster MLFFs specialized to specific regions of chemical space. We formulate our approach as a knowledge distillation procedure, where the smaller "student" MLFF is trained to match the Hessians of the energy predictions of the "teacher" foundation model. Our specialized MLFFs can be up to 20× faster than the original foundation model, while retaining, and in some cases exceeding, its performance and that of undistilled models. We also show that distilling from a teacher model with a direct force parameterization into a student model trained with conservative forces (i.e., computed as derivatives of the potential energy) successfully leverages the representations from the large-scale teacher for improved accuracy, while maintaining energy conservation during test-time molecular dynamics simulations. More broadly, our work suggests a new paradigm for MLFF development, in which foundation models are released along with smaller, specialized simulation "engines" for common chemical subsets.
Article
Full-text available
In numerical weather prediction (NWP), a large number of observations are used to create initial conditions for weather forecasting through a process known as data assimilation. An assessment of the value of these observations for NWP can guide us in the design of future observation networks, help us to identify problems with the assimilation system, and allow us to assess changes to the assimilation system. However, assessment can be challenging in convection‐permitting NWP. This is because verification of convection‐permitting forecasts is not easy, the forecast model is strongly nonlinear, a limited‐area model is used, and the observations used often contain complex error statistics and are often associated with nonlinear observation operators. We compare methods that can be used to assess the value of observations in convection‐permitting NWP and discuss operational considerations when using these methods. We focus on their applicability to ensemble forecasting systems, as these systems are becoming increasingly dominant for convection‐permitting NWP. We also identify several future research directions, which include comparing results from different methods, comparing forecast validation using analyses versus using observations, applying flow‐dependent covariance localization, investigating the effect of ensemble size on the assessment, and generating and validating the nature run in observing‐system simulation experiments.
Article
Full-text available
Let Φ(x,y,p,t) be a meteorological field of interest, say, height, temperature, a component of the wind field, etc. We suppose that data concerning the field of the form Φ_i = L_iΦ + ε_i are available, where each L_i is an arbitrary continuous linear functional and ε_i is a measurement error. The data Φ_i may be the result of theory, direct measurements, remote soundings or a combination of these. We develop a new mathematical formalism exploiting the method of Generalized Cross Validation (GCV), and some recently developed optimization results, for analyzing these data. The analyzed field Φ_{N,m,λ} is the solution to the minimization problem: Find Φ in a suitable space of functions to minimize N^{-1} Σ_{i=1}^N (L_i Φ - Φ_i)^2 σ_i^{-2} + λ J_m(Φ), (1) where J_m(Φ) = Σ_{α_1+α_2+α_3+α_4=m} (m!/({α_1}! {α_2}! {α_3}! {α_4}!)) ∫∫∫∫ (∂^m Φ / (∂x^{α_1} ∂y^{α_2} ∂p^{α_3} ∂t^{α_4}))^2 dx dy dp dt. Functions of d = 1, 2 or 3 of the four variables x, y, p, t are also considered. The approach can be used to analyze temperature fields from radiosonde-measured temperatures and satellite radiance measurements simultaneously, to incorporate the geostrophic wind approximation and other information. In a test of the method (for d = 2), simulated 500 mb height data were obtained at discrete points corresponding to the U.S. radiosonde network, by using an analytic representation of a 500 mb wave and superimposing realistic random errors. The analytic representation was recovered on a fine grid with what appear to be impressive results. An explicit representation for the minimizer of Eq. (1) is found, and used as the basis for a direct (as opposed to iterative) numerical algorithm, which is accurate and efficient for N² somewhat less than the high-speed storage capacity of the computer. The results extend those of Sasaki and others in several directions. In particular, no starting guesses and no preliminary interpolation of the data are required, and it is not necessary to solve a boundary-value problem or even assume boundary conditions to obtain a solution. Different types of data can be combined in a natural way. Prior climatologically estimated covariances are not used. This method may be thought of as a very general form of low-pass filter. The parameter λ controls the half-power point of the implied data filter, while m controls the rate of "roll off" of the power spectrum of the analyzed field. From another point of view, λ and m play the roles of the most important free parameters in an (implicit) prior covariance. The correct choice of the parameter λ, and to some extent m, is important. These parameters are estimated from the data being analyzed by the GCV method. This method estimates the λ and m for which the implied data filter has maximum internal predictive capability. This capability is assessed by the GCV method by implicitly leaving out one data point at a time and determining how well the missing datum can be predicted from the remaining data. The numerical algorithm given provides for the efficient calculation of the optimum λ and m.
Article
Full-text available
A procedure for calculating the trace of the influence matrix associated with a polynomial smoothing spline of degree 2m−1 fitted to n distinct, not necessarily equally spaced or uniformly weighted, data points is presented. The procedure requires order m²n operations and therefore permits efficient order m²n calculation of statistics associated with a polynomial smoothing spline, including the generalized cross validation. The method is a significant improvement over an existing method which requires order n³ operations.
Article
Smoothing splines are well known to provide nice curves which smooth discrete, noisy data. We obtain a practical, effective method for estimating the optimum amount of smoothing from the data. Derivatives can be estimated from the data by differentiating the resulting (nearly) optimally smoothed spline. We consider the model y_i = g(t_i) + ε_i, i = 1, 2, ..., n, t_i ∈ [0, 1], where g ∈ W_2^{(m)} = {f : f, f′, ..., f^{(m-1)} abs. cont., f^{(m)} ∈ ℒ_2[0,1]}, and the {ε_i} are random errors with Eε_i = 0, Eε_iε_j = σ²δ_ij. The error variance σ² may be unknown. As an estimate of g we take the solution g_{n,λ} to the problem: Find f ∈ W_2^{(m)} to minimize (1/n) Σ_{j=1}^n (f(t_j) − y_j)² + λ ∫_0^1 (f^{(m)}(u))² du. The function g_{n,λ} is a smoothing polynomial spline of degree 2m−1. The parameter λ controls the tradeoff between the "roughness" of the solution, as measured by ∫_0^1 (f^{(m)}(u))² du, and the infidelity to the data as measured by (1/n) Σ_{j=1}^n (f(t_j) − y_j)², and so governs the average square error R(λ; g) = R(λ) defined by R(λ) = (1/n) Σ_{j=1}^n (g_{n,λ}(t_j) − g(t_j))². We provide an estimate λ̂, called the generalized cross-validation estimate, for the minimizer of R(λ). The estimate λ̂ is the minimizer of V(λ) defined by V(λ) = (1/n) ||(I − A(λ))y||² / [(1/n) tr(I − A(λ))]², where y = (y_1, ..., y_n)^t and A(λ) is the n×n matrix satisfying (g_{n,λ}(t_1), ..., g_{n,λ}(t_n))^t = A(λ) y. We prove that there exist a sequence of minimizers λ̃ of EV(λ), such that as the (regular) mesh {t_i}_{i=1}^n becomes finer, ER(λ̃)/min_λ ER(λ) → 1. A Monte Carlo experiment with several smooth g's was tried with m = 2, n = 50 and several values of σ², and typical values of the inefficiency R(λ̂)/min_λ R(λ) were found to be in the range 1.01–1.4. The derivative g′ of g can be estimated by g′_{n,λ̂}. In the Monte Carlo examples tried, the minimizer of (1/n) Σ_{j=1}^n (g′_{n,λ}(t_j) − g′(t_j))² tended to be close to the minimizer of R(λ), so that λ̂ was also a good value of the smoothing parameter for estimating the derivative.
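To make the criterion concrete, the sketch below evaluates V(λ) on a grid for a simple discrete second-difference smoother, with A(λ) = (I + λDᵀD)⁻¹ standing in for the spline influence matrix (an illustration of the GCV recipe only; the polynomial smoothing spline itself is not reimplemented here).

```python
import jax
import jax.numpy as jnp

def gcv_score(lam, y, DtD):
    """V(lambda) = (1/n)||(I - A)y||^2 / [(1/n) tr(I - A)]^2 for A = (I + lam*DtD)^{-1}."""
    n = y.shape[0]
    A = jnp.linalg.inv(jnp.eye(n) + lam * DtD)          # influence matrix A(lambda)
    resid = y - A @ y
    return (resid @ resid / n) / (jnp.trace(jnp.eye(n) - A) / n) ** 2

# Noisy samples of a smooth curve.
key = jax.random.PRNGKey(0)
n = 100
t = jnp.linspace(0.0, 1.0, n)
y = jnp.sin(2.0 * jnp.pi * t) + 0.1 * jax.random.normal(key, (n,))

# Second-difference penalty matrix D, a discrete roughness measure.
D = jnp.eye(n - 2, n) - 2.0 * jnp.eye(n - 2, n, 1) + jnp.eye(n - 2, n, 2)
DtD = D.T @ D

lams = jnp.logspace(-4, 2, 25)
scores = jnp.array([gcv_score(l, y, DtD) for l in lams])
print("GCV-selected lambda:", lams[jnp.argmin(scores)])
```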
Article
Machine contouring must not introduce information which is not present in the data. The one-dimensional spline fit has well defined smoothness properties. These are duplicated for two-dimensional interpolation in this paper, by solving the corresponding differential equation. Finite difference equations are deduced from a principle of minimum total curvature, and an iterative method of solution is outlined. Observations do not have to lie on a regular grid. Gravity and aeromagnetic surveys provide examples which compare favorably with the work of draftsmen.
Article
Non‐parametric regression using cubic splines is an attractive, flexible and widely‐applicable approach to curve estimation. Although the basic idea was formulated many years ago, the method is not as widely known or adopted as perhaps it should be. The topics and examples discussed in this paper are intended to promote the understanding and extend the practicability of the spline smoothing methodology. Particular subjects covered include the basic principles of the method; the relation with moving average and other smoothing methods; the automatic choice of the amount of smoothing; and the use of residuals for diagnostic checking and model adaptation. The question of providing inference regions for curves – and for relevant properties of curves – is approached via a finite‐dimensional Bayesian formulation.
Article
This paper is the first of three dealing with the three-dimensional wind field analysis from dual-Doppler radar data. Here we deal with the first step of the analysis, which consists in interpolating and filtering the raw radial velocity fields within each coplane (or common plane simultaneously scanned by the two radars). To carry out such interpolation and filtering, a new method is proposed based on the principles of numerical variational analysis described by Sasaki (1970): the 'filtered' representation of the observed field should be both 'close' to the data points (in a least-squares sense) and verify some imperative of mathematical regularity. Any method for interpolating and smoothing data is inherently a filtering process. The proposed variational method enables this filtering to be controlled. The presented method is developed for any function of two variables but could be extended to the case of three or more variables. Numerical simulations substantiate the theoretically predicted filtering characteristics and show an improvement on other filtering schemes. It is found, compared to the classical filtering using the Cressman weighting function, that the variational method brings a substantial improvement of the gain curve (in the sense of a steeper cut-off) when the 'regularity' of the second-order derivatives is imposed. It is worth noting that this improvement is achieved without increasing the computing time. It is also emphasized that an elaborate numerical differentiation scheme should be used to estimate the divergence, otherwise the gain curve for this parameter may be different from that for the Cartesian coplane velocities (which may induce distortion in the final three-dimensional wind field).
Article
A procedure for calculating the mean squared residual and the trace of the influence matrix associated with a polynomial smoothing spline of degree 2m−1 using an orthogonal factorization is presented. The procedure substantially overcomes the problem of ill-conditioning encountered by a recently developed method which employs a Cholesky factorization, but still requires only order m²n operations and order mn storage.
Article
We consider the problem of studying the behaviour of the eigenvalues associated with spline functions with equally spaced knots. We show that they are O(i^{2m}/n), i = 1, ..., n − m, where m is the order of the spline and n the number of knots. This result is of particular interest to prove optimality properties of the Generalized Cross-Validation method and had been conjectured by Craven and Wahba in a recent paper.