Aku Kammonen’s research while affiliated with King Abdullah University of Science and Technology and other places


Publications (9)


Figure 8: Test 6, i.e. (7) with B in (10) and the real-valued neural network (11), illustrating Algorithm 1 as a pre-trainer for Adam. Validation loss with respect to wall-clock time; pre-training is done with Algorithm 1 with (R, A) = (1, true), compared with training by Algorithm 1 alone for (R, A) = (1, false), (0, true), and (1, true).
Adaptive Random Fourier Features Training Stabilized By Resampling With Applications in Image Regression
  • Preprint
  • File available

October 2024 · 38 Reads

Aku Kammonen · [...]

This paper presents an enhanced adaptive random Fourier features (ARFF) training algorithm for shallow neural networks, building upon the work introduced in "Adaptive Random Fourier Features with Metropolis Sampling", Kammonen et al., Foundations of Data Science, 2(3):309--332, 2020. The improved method uses a particle-filter-type resampling technique to stabilize the training process and reduce sensitivity to parameter choices. With resampling, the Metropolis test may also be omitted, which reduces both the number of hyperparameters and the computational cost per iteration compared with ARFF. We present comprehensive numerical experiments demonstrating the efficacy of our proposed algorithm in function regression tasks, both as a standalone method and as a pre-training step before gradient-based optimization, here Adam. Furthermore, we apply our algorithm to a simple image regression problem, showcasing its utility in sampling frequencies for the random Fourier features (RFF) layer of coordinate-based multilayer perceptrons (MLPs). In this context, we use the proposed algorithm to sample the parameters of the RFF layer in an automated manner.
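
As a rough orientation, the sketch below shows one possible iteration of a resampling-stabilized ARFF method in the spirit described above: the amplitudes are fitted by least squares for fixed frequencies, and the frequencies are then resampled with replacement in proportion to their fitted amplitude moduli, followed by a small jitter to keep diversity. The least-squares fit, the jitter width delta, and all names are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def arff_resampling_step(x, y, omega, delta=0.1, rng=np.random.default_rng(0)):
    """One illustrative ARFF iteration with particle-filter-type resampling.
    x: (N, d) inputs, y: (N,) targets, omega: (K, d) current frequencies."""
    K = omega.shape[0]
    S = np.exp(1j * x @ omega.T)                        # (N, K) random Fourier feature matrix
    beta, *_ = np.linalg.lstsq(S, y.astype(complex), rcond=None)  # amplitude fit for fixed omega
    weights = np.abs(beta) / np.abs(beta).sum()         # resampling weights from |beta_k|
    idx = rng.choice(K, size=K, p=weights)              # resample frequencies by weight
    omega_new = omega[idx] + delta * rng.standard_normal(omega.shape)  # jitter resampled frequencies
    return omega_new, beta
```

Iterating such steps, and optionally handing the resulting frequencies to a gradient-based optimizer such as Adam, corresponds to the pre-training use case mentioned in the abstract.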


Comparing Spectral Bias and Robustness for Two-Layer Neural Networks: SGD vs Adaptive Random Fourier Features

May 2024 · 19 Reads · 1 Citation

We present experimental results highlighting two key differences resulting from the choice of training algorithm for two-layer neural networks. The spectral bias of neural networks is well known, while its dependence on the choice of training algorithm is less studied. Our experiments demonstrate that an adaptive random Fourier features algorithm (ARFF) can yield a spectral bias closer to zero than the stochastic gradient descent optimizer (SGD). Additionally, we train two identically structured classifiers, employing SGD and ARFF, to the same accuracy levels and empirically assess their robustness against adversarial noise attacks.
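
As an illustration of how spectral bias can be probed numerically (a sketch under assumptions, not the paper's experimental protocol), one can compare, for two trained models, how the energy of the fit residual is distributed across frequencies; the function names below are hypothetical.

```python
import numpy as np

def residual_spectrum(model, target, n=1024):
    """Diagnostic for spectral bias on [0, 1): energy of the fit residual per frequency.
    `model` and `target` are callables mapping an array of points to function values."""
    x = np.linspace(0.0, 1.0, n, endpoint=False)
    r = target(x) - model(x)                       # pointwise residual of the trained model
    freqs = np.fft.rfftfreq(n, d=1.0 / n)          # frequencies 0, 1, ..., n/2 (cycles per unit interval)
    energy = np.abs(np.fft.rfft(r)) ** 2 / n       # residual energy per frequency
    return freqs, energy
```

A method with spectral bias closer to zero would leave the residual energy spread more evenly across frequencies, whereas a strongly low-frequency-biased method concentrates the remaining error at high frequencies.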


Smaller generalization error derived for a deep residual neural network compared with shallow networks

September 2022 · 99 Reads · 7 Citations

IMA Journal of Numerical Analysis

Aku Kammonen · [...]

Estimates of the generalization error are proved for a residual neural network with $L$ random Fourier features layers $\bar z_{\ell+1}=\bar z_\ell + \mathrm{Re}\sum_{k=1}^K\bar b_{\ell k}\,e^{\mathrm{i}\omega_{\ell k}\bar z_\ell}+ \mathrm{Re}\sum_{k=1}^K\bar c_{\ell k}\,e^{\mathrm{i}\omega'_{\ell k}\cdot x}$. An optimal distribution for the frequencies $(\omega_{\ell k},\omega'_{\ell k})$ of the random Fourier features $e^{\mathrm{i}\omega_{\ell k}\bar z_\ell}$ and $e^{\mathrm{i}\omega'_{\ell k}\cdot x}$ is derived. This derivation is based on the corresponding generalization error for the approximation of the function values $f(x)$. The generalization error turns out to be smaller than the estimate $\|\hat f\|^2_{L^1(\mathbb{R}^d)}/(KL)$ of the generalization error for random Fourier features with one hidden layer and the same total number of nodes $KL$, in the case where the $L^\infty$-norm of $f$ is much less than the $L^1$-norm of its Fourier transform $\hat f$. This understanding of an optimal distribution for random features is used to construct a new training method for a deep residual network. Promising performance of the proposed new algorithm is demonstrated in computational experiments.
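
To make the layer map above concrete, here is a minimal NumPy sketch of the forward pass, assuming a scalar hidden state $\bar z_\ell$, a zero initial state, and complex amplitudes; the names are illustrative, not the paper's implementation.

```python
import numpy as np

def residual_rff_forward(x, omega_z, omega_x, b, c):
    """Forward pass of the residual random Fourier features network
    z_{l+1} = z_l + Re sum_k b_{lk} exp(i omega_{lk} z_l) + Re sum_k c_{lk} exp(i omega'_{lk} . x).
    x: input in R^d, omega_z: (L, K) frequencies acting on the scalar state,
    omega_x: (L, K, d) frequencies acting on x, b and c: (L, K) complex amplitudes."""
    L, K = omega_z.shape
    z = 0.0                                            # initial scalar state z_0
    for l in range(L):
        z = (z
             + np.real(np.sum(b[l] * np.exp(1j * omega_z[l] * z)))
             + np.real(np.sum(c[l] * np.exp(1j * (omega_x[l] @ x)))))
    return z                                           # scalar output approximating f(x)
```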


Figure 1. Stratified sampling of t_k with four layers and the density proportional to q. The red curve is the graph Q : [0, 1] → [0, 1].
Figure 3. Target function f_1 and the approximating neural network: the subfigures to the left show a slice of the functions along the x_1-axis and the subfigures to the right show the network values plotted against the target function values. The problem is in dimension d = 10.
Figure 4. Target function f_2 and the approximating neural network: the subfigures to the left show all the function values along the x_1-axis and the subfigures to the right show the network values plotted against the target function values. The problem is in dimension d = 10. For Method 3, Layer by Layer & ADAM, the neural network values lie under the target function values.
Figure 6. Generalization error as a function of the total number of nodes, LK, in Method 3. For each value of LK, the 11 blue dots and the 10 red dots show the outcomes of the generalization error for L = 1 and L = 5, respectively. The blue dots are almost on top of each other. The problem is in dimension d = 4.
Smaller generalization error derived for deep compared to shallow residual neural networks

October 2020 · 163 Reads

Estimates of the generalization error are proved for a residual neural network with $L$ random Fourier features layers $\bar z_{\ell+1}=\bar z_\ell + \mathrm{Re}\sum_{k=1}^K\bar b_{\ell k}e^{\mathrm{i}\omega_{\ell k}\bar z_\ell}+ \mathrm{Re}\sum_{k=1}^K\bar c_{\ell k}e^{\mathrm{i}\omega'_{\ell k}\cdot x}$. An optimal distribution for the frequencies $(\omega_{\ell k},\omega'_{\ell k})$ of the random Fourier features $e^{\mathrm{i}\omega_{\ell k}\bar z_\ell}$ and $e^{\mathrm{i}\omega'_{\ell k}\cdot x}$ is derived. The derivation is based on the corresponding generalization error for approximating the function values $f(x)$. The generalization error turns out to be smaller than the estimate $\|\hat f\|^2_{L^1(\mathbb{R}^d)}/(LK)$ of the generalization error for random Fourier features with one hidden layer and the same total number of nodes $LK$, in the case where the $L^\infty$-norm of $f$ is much less than the $L^1$-norm of its Fourier transform $\hat f$. This understanding of an optimal distribution for random features is used to construct a new training method for a deep residual network that shows promising results.
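
For context, the one-layer estimate quoted in both abstracts follows from a standard importance-sampling argument, sketched here under the usual independence assumptions (the precise conditions and constants are in the papers):

\[
f(x)=\int_{\mathbb{R}^d}\hat f(\omega)\,e^{\mathrm{i}\omega\cdot x}\,\mathrm{d}\omega
=\mathbb{E}_{\omega\sim p}\!\left[\frac{\hat f(\omega)}{p(\omega)}\,e^{\mathrm{i}\omega\cdot x}\right],
\qquad
\mathbb{E}\!\left[\Big|f(x)-\frac{1}{K}\sum_{k=1}^{K}\frac{\hat f(\omega_k)}{p(\omega_k)}\,e^{\mathrm{i}\omega_k\cdot x}\Big|^2\right]
\le\frac{1}{K}\,\mathbb{E}_{p}\!\left[\frac{|\hat f(\omega)|^2}{p(\omega)^2}\right],
\]

and the right-hand side is minimized over densities $p$ by $p^*(\omega)=|\hat f(\omega)|/\|\hat f\|_{L^1(\mathbb{R}^d)}$, which gives the bound $\|\hat f\|^2_{L^1(\mathbb{R}^d)}/K$, i.e. $\|\hat f\|^2_{L^1(\mathbb{R}^d)}/(LK)$ for $LK$ independent nodes.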


Figure 1. Case 1: Graph of the target function f with the sampled data set points (x_n, y_n) marked (red online). The inset shows the modulus |f̂| of its Fourier transform and the detail of its behaviour at the origin.
Figure 3. Case 2: Generalization error with respect to K for a target function in dimension d = 5.
Figure 5. Case 4: Dependence on K of the misclassification percentage on MNIST.
Adaptive random Fourier features with Metropolis sampling

July 2020 · 51 Reads

The supervised learning problem to determine a neural network approximation $\mathbb{R}^d\ni x\mapsto\sum_{k=1}^K\hat\beta_k e^{\mathrm{i}\omega_k\cdot x}$ with one hidden layer is studied as a random Fourier features algorithm. The Fourier features, i.e., the frequencies $\omega_k\in\mathbb{R}^d$, are sampled using an adaptive Metropolis sampler. The Metropolis test accepts proposal frequencies $\omega_k'$, having corresponding amplitudes $\hat\beta_k'$, with the probability $\min\{1,(|\hat\beta_k'|/|\hat\beta_k|)^\gamma\}$, for a certain positive parameter $\gamma$, determined by minimizing the approximation error for given computational work. This adaptive, non-parametric stochastic method leads asymptotically, as $K\to\infty$, to equidistributed amplitudes $|\hat\beta_k|$, analogous to deterministic adaptive algorithms for differential equations. The equidistributed amplitudes are shown to asymptotically correspond to the optimal density for independent samples in random Fourier features methods. Numerical evidence is provided to demonstrate the approximation properties and efficiency of the proposed algorithm. The algorithm is tested both on synthetic data and a real-world high-dimensional benchmark.
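
A minimal sketch of the adaptive Metropolis step described above, assuming the amplitudes are refitted by least squares after each proposal; the Gaussian random-walk proposal, its width delta, the default gamma, and all names are illustrative assumptions rather than the paper's exact choices.

```python
import numpy as np

def arff_metropolis(x, y, K=64, iters=200, gamma=3.0, delta=0.1, rng=np.random.default_rng(0)):
    """Adaptive random Fourier features with a Metropolis test on the frequencies.
    x: (N, d) inputs, y: (N,) targets. Returns frequencies (K, d) and amplitudes (K,)."""
    N, d = x.shape
    omega = rng.standard_normal((K, d))                     # initial frequencies

    def amplitudes(om):
        S = np.exp(1j * x @ om.T)                           # (N, K) feature matrix
        beta, *_ = np.linalg.lstsq(S, y.astype(complex), rcond=None)
        return beta                                         # least-squares amplitudes for fixed om

    beta = amplitudes(omega)
    for _ in range(iters):
        omega_prop = omega + delta * rng.standard_normal((K, d))      # random-walk proposal
        beta_prop = amplitudes(omega_prop)
        # Accept omega_k' with probability min{1, (|beta_k'| / |beta_k|)^gamma}
        accept = rng.random(K) < np.minimum(1.0, (np.abs(beta_prop) / (np.abs(beta) + 1e-12)) ** gamma)
        omega[accept] = omega_prop[accept]
        beta = amplitudes(omega)
    return omega, beta
```

For a real-valued target, the prediction at a new point x is then the real part of sum_k beta_k exp(i omega_k . x).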




Figure 1. The eigenvalue functions λ_1(x) and λ_2(x) of V(x).
Canonical quantum observables for molecular systems approximated by ab initio molecular dynamics

November 2016 · 1 Read

It is known that ab initio molecular dynamics based on the electron ground state eigenvalue can be used to approximate quantum observables in the canonical ensemble when the temperature is low compared to the first electron eigenvalue gap. This work proves that a certain weighted average of the different ab initio dynamics, corresponding to each electron eigenvalue, approximates quantum observables for any temperature. The proof uses the semiclassical Weyl law to show that canonical quantum observables of nuclei-electron systems, based on matrix valued Hamiltonian symbols, can be approximated by ab initio molecular dynamics with the error proportional to the electron-nuclei mass ratio. The result covers observables that depend on time-correlations. A combination of the Hilbert-Schmidt inner product for quantum operators and Weyl's law shows that the error estimate holds for observables and Hamiltonian symbols that have three and five bounded derivatives, respectively, provided the electron eigenvalues are distinct for any nuclei position and the observables are in the diagonal form with respect to the electron eigenstates.


Canonical Quantum Observables for Molecular Systems Approximated by Ab Initio Molecular Dynamics

November 2016 · 37 Reads · 10 Citations

Annales Henri Poincaré

Ab initio molecular dynamics based on the electron ground state eigenvalue can be used to approximate quantum observables in the canonical ensemble when the temperature is low compared to the first electron eigenvalue gap. This work proves that a certain weighted average of the different ab initio dynamics, corresponding to each electron eigenvalue, approximates quantum observables for all temperatures. The proof uses the semi-classical Weyl law to show that canonical quantum observables of nuclei-electron systems, based on matrix valued Hamiltonian symbols, can be approximated by ab initio molecular dynamics with the error proportional to the electron-nuclei mass ratio. The result includes observables that depend on correlations in time. A combination of the Hilbert-Schmidt inner product for quantum operators and Weyl's law shows that the error estimate holds for observables and Hamiltonian symbols that have three and five bounded derivatives, respectively, provided the electron eigenvalues are distinct for any nuclei position.
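
Schematically (a hedged sketch of the statement above, not the paper's precise formulation, weights, or constants), writing $\lambda_j(x)$ for the electron eigenvalues and $H_j(x,p)=\tfrac12|p|^2+\lambda_j(x)$ for the corresponding classical Hamiltonians, the claim is that the canonical quantum observable is approximated by the Gibbs-weighted average of the ab initio dynamics on each eigenvalue surface,

\[
\frac{\operatorname{Tr}\big(\widehat A\,e^{-\widehat H/T}\big)}{\operatorname{Tr}\big(e^{-\widehat H/T}\big)}
=\frac{\sum_j \int_{\mathbb{R}^{2d}} A_j\big(x^j_t,p^j_t\big)\,e^{-H_j(x,p)/T}\,\mathrm{d}x\,\mathrm{d}p}
      {\sum_j \int_{\mathbb{R}^{2d}} e^{-H_j(x,p)/T}\,\mathrm{d}x\,\mathrm{d}p}
+\mathcal{O}(M^{-1}),
\]

where $(x^j_t,p^j_t)$ is the classical flow generated by $H_j$ with initial data $(x,p)$, $A_j$ is the $j$th diagonal element of the observable symbol, and $M^{-1}$ is the electron-nuclei mass ratio.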

Citations (4)


... Training algorithms based on randomizing the feature weights have been resurfacing in the machine learning community, largely motivated by the fact that randomization, in many cases, is computationally cheaper than optimization. There have been many interesting studies reported that highlight the idea of randomizing feature weights instead of training them, demonstrating both numerical success [22,9,6] and providing theoretical understanding [25,20,14,6]. ...

Reference:

Adaptive Random Fourier Features Training Stabilized By Resampling With Applications in Image Regression
Comparing Spectral Bias and Robustness for Two-Layer Neural Networks: SGD vs Adaptive Random Fourier Features

... More modern methods learn a generative model for the feature distribution (Li et al. 2019; Falk et al. 2022); this results in more flexible RF methods. Another line of research derives explicit formulas for the "optimal" RF distribution based on minimization of upper bounds on the approximation error (Kammonen et al. 2020, 2023). The authors then approximately sample from this inaccessible distribution with Markov chain Monte Carlo (MCMC) methods. ...

Smaller generalization error derived for a deep residual neural network compared with shallow networks

IMA Journal of Numerical Analysis

... More modern methods learn a generative model for the feature distribution (Li et al. 2019; Falk et al. 2022); this results in more flexible RF methods. Another line of research derives explicit formulas for the "optimal" RF distribution based on minimization of upper bounds on the approximation error (Kammonen et al. 2020, 2023). The authors then approximately sample from this inaccessible distribution with Markov chain Monte Carlo (MCMC) methods. ...

Adaptive random Fourier features with Metropolis sampling
  • Citing Article
  • January 2019

Foundations of Data Science

... Section 5 presents numerical results comparing quantum mechanics to the three different numerical approximations based on: the ground state potential λ_0, the mean-field potential λ_*, and excited state dynamics. The excited state molecular dynamics studied in [14] uses several paths related to different electron eigenvalues and is defined by ...

Canonical Quantum Observables for Molecular Systems Approximated by Ab Initio Molecular Dynamics

Annales Henri Poincaré