Yuji Nakatsukasa

University of Oxford | OX

About

138
Publications
14,849
Reads
2,171
Citations

Publications (138)
Preprint
Stochastic gradient descent (SGD) is a workhorse algorithm for solving large-scale optimization problems in data science and machine learning. Understanding the convergence of SGD is hence of fundamental importance. In this work we examine the SGD convergence (with various step sizes) when applied to unconstrained convex quadratic programming (esse...
Preprint
Full-text available
Approximating a univariate function on the interval $[-1,1]$ with a polynomial is among the most classical problems in numerical analysis. When the function evaluations come with noise, a least-squares fit is known to reduce the effect of noise as more samples are taken. The generic algorithm for the least-squares problem requires $O(Nn^2)$ operati...
Preprint
Full-text available
Given (orthonormal) approximations $\tilde{U}$ and $\tilde{V}$ to the left and right subspaces spanned by the leading singular vectors of a matrix $A$, we discuss methods to approximate the leading singular values of $A$ and study their accuracy. In particular, we focus our analysis on the generalized Nystr\"om approximation, as surprisingly, it is...
Preprint
A low-rank approximation of a parameter-dependent matrix $A(t)$ is an important task in the computational sciences appearing for example in dynamical systems and compression of a series of images. In this work, we introduce AdaCUR, an efficient algorithm for computing a low-rank approximation of parameter-dependent matrices via CUR decomposition. T...
Preprint
Full-text available
We present a simple formula to update the pseudoinverse of a full-rank rectangular matrix that undergoes a low-rank modification, and demonstrate its utility for solving least squares problems. The resulting algorithm can be dramatically faster than solving the modified least squares problem from scratch, just like the speedup enabled by Sherman--M...
Preprint
One of the greatest success stories of randomized algorithms for linear algebra has been the development of fast, randomized algorithms for highly overdetermined linear least-squares problems. However, none of the existing algorithms is backward stable, preventing them from being deployed as drop-in replacements for existing QR-based solvers. This...
Preprint
Full-text available
This work investigates the accuracy and numerical stability of CUR decompositions with oversampling. The CUR decomposition approximates a matrix using a subset of columns and rows of the matrix. When the number of columns and the rows are the same, the CUR decomposition can become unstable and less accurate due to the presence of the matrix inverse...
Article
Unless special conditions apply, the attempt to solve ill-conditioned systems of linear equations with standard numerical methods leads to uncontrollably high numerical error and often slow convergence of an iterative solver. In many cases, such systems arise from the discretization of operator equations with a large number of discrete variables an...
Preprint
Full-text available
The computation of a matrix function f(A) is an important task in scientific computing appearing in machine learning, network analysis and the solution of partial differential equations. In this work, we use only matrix-vector products x → Ax to approximate functions of sparse matrices and matrices with similar structures such as sparse matrices A...
Article
Full-text available
Randomized algorithms in numerical linear algebra can be fast, scalable and robust. This paper examines the effect of sketching on the right singular vectors corresponding to the smallest singular values of a tall–skinny matrix. We analyze a fast algorithm by Gilbert, Park and Wakin for finding the trailing right singular vectors using randomizatio...
Preprint
Full-text available
AAA rational approximation has normally been carried out on a discrete set, typically hundreds or thousands of points in a real interval or complex domain. Here we introduce a continuum AAA algorithm that discretizes a domain adaptively as it goes. This enables fast computation of high-accuracy rational approximations on domains such as the unit in...
Preprint
Full-text available
Sketch-and-precondition techniques are popular for solving large least squares (LS) problems of the form $Ax=b$ with $A\in\mathbb{R}^{m\times n}$ and $m\gg n$. This is where $A$ is ``sketched'' to a smaller matrix $SA$ with $S\in\mathbb{R}^{\lceil cn\rceil\times m}$ for some constant $c>1$ before an iterative LS solver computes the solution to $Ax=b...
Preprint
Full-text available
Vandermonde matrices are usually exponentially ill-conditioned and often result in unstable approximations. In this paper, we introduce and analyze the \textit{multivariate Vandermonde with Arnoldi (V+A) method}, which is based on least-squares approximation together with a Stieltjes orthogonalization process, for approximating continuous, multivar...
Preprint
Full-text available
This work is concerned with the computation of the action of a matrix function f(A), such as the matrix exponential or the matrix square root, on a vector b. For a general matrix A, this can be done by computing the compression of A onto a suitable Krylov subspace. Such compression is usually computed by forming an orthonormal basis of the Krylov s...
Preprint
Full-text available
The Nystr\"om method is a popular choice for finding a low-rank approximation to a symmetric positive semi-definite matrix. The method can fail when applied to symmetric indefinite matrices, for which the error can be unboundedly large. In this work, we first identify the main challenges in finding a Nystr\"om approximation to symmetric indefinite...
Article
Full-text available
Often the easiest way to discretize an ordinary or partial differential equation is by a rectangular numerical method, in which n basis functions are sampled at m ≫ n collocation points. We show how eigenvalue problems can be solved in this setting by QR reduction to square matrix generalized eigenvalue problems. The method applies equally in the l...
Preprint
Randomized subspace approximation with "matrix sketching" is an effective approach for constructing approximate partial singular value decompositions (SVDs) of large matrices. The performance of such techniques has been extensively analyzed, and very precise estimates on the distribution of the residual errors have been derived. However, our unders...
Preprint
Full-text available
We explore the concept of eigenvalue avoidance, which is well understood for real symmetric and Hermitian matrices, for other classes of structured matrices. We adopt a differential geometric perspective and study the generic behaviour of the eigenvalues of regular and injective curves $t \in ]a,b[ \mapsto A(t) \in \mathcal{N} $ where $\mathcal{N}$...
Preprint
Full-text available
Randomized algorithms in numerical linear algebra can be fast, scalable and robust. This paper examines the effect of sketching on the right singular vectors corresponding to the smallest singular values of a tall-skinny matrix. We devise a fast algorithm for finding the trailing right singular vectors using randomization and examine the quality of...
Preprint
We devise a spectral divide-and-conquer scheme for matrices that are self-adjoint with respect to a given indefinite scalar product (i.e. pseudosymmetric matrices). The pseudosymmetric structure of the matrix is preserved in the spectral division, such that the method can be applied recursively to achieve full diagonalization. The method is well-sui...
Preprint
We describe two algorithms to efficiently solve regularized linear least squares systems based on sketching. The algorithms compute preconditioners for $\min \|Ax-b\|^2_2 + \lambda \|x\|^2_2$, where $A\in\mathbb{R}^{m\times n}$ and $\lambda>0$ is a regularization parameter, such that LSQR converges in $\mathcal{O}(\log(1/\epsilon))$ iterations for...
Article
We present a new approach to compute selected eigenvalues and eigenvectors of the two-parameter eigenvalue problem. Our method requires computing generalized eigenvalue problems of the same size as the matrices of the initial two-parameter eigenvalue problem. The method is applicable for right definite problems, possibly after performing an affine...
Preprint
We study the problem of estimating the diagonal of an implicitly given matrix $A$. For such a matrix we have access to an oracle that allows us to evaluate the matrix vector product $Av$. For random variable $v$ drawn from an appropriate distribution, this may be used to return an estimate of the diagonal of the matrix $A$. Whilst results exist for...
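The oracle-based estimator described in this abstract can be illustrated with a minimal sketch. It assumes Rademacher (random ±1) probe vectors as the "appropriate distribution" and averages the entrywise product of v and Av; the function name `estimate_diagonal` and the test matrices are hypothetical and are not taken from the paper.

```python
import numpy as np

def estimate_diagonal(matvec, n, num_probes, rng):
    """Monte Carlo diagonal estimate: average of v * (A v) over random probes v."""
    acc = np.zeros(n)
    for _ in range(num_probes):
        # Rademacher probe: entries are +1 or -1 (an assumed choice of distribution)
        v = rng.choice([-1.0, 1.0], size=n)
        # Entrywise product v * (A v); its expectation is diag(A)
        acc += v * matvec(v)
    return acc / num_probes

rng = np.random.default_rng(0)
n = 20

# Dense random test matrix, accessed only through matrix-vector products
A = rng.standard_normal((n, n))
est = estimate_diagonal(lambda v: A @ v, n, 2000, rng)
max_err = np.max(np.abs(est - np.diag(A)))

# For a diagonal matrix a single Rademacher probe already gives the exact diagonal
D = np.diag(np.arange(1.0, n + 1))
est_diag_exact = estimate_diagonal(lambda v: D @ v, n, 1, rng)
```

The off-diagonal entries contribute zero-mean noise, so the error shrinks as more probes are averaged.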
Article
Full-text available
Tenfold improvements in computation speed can be brought to the alternating direction method of multipliers (ADMM) for Semidefinite Programming with virtually no decrease in robustness and provable convergence simply by projecting approximately to the Semidefinite cone. Instead of computing the projections via “exact” eigendecompositions that scale...
Preprint
Often the easiest way to discretize an ordinary or partial differential equation is by a rectangular numerical method, in which n basis functions are sampled at m ≫ n collocation points. We show how eigenvalue problems can be solved in this setting by QR reduction to square matrix generalized eigenvalue problems. The method applies equally in the li...
Preprint
This paper develops a new class of algorithms for general linear systems and eigenvalue problems. These algorithms apply fast randomized sketching to accelerate subspace projection methods, such as GMRES and Rayleigh--Ritz. This approach offers great flexibility in designing the basis for the approximation subspace, which can improve scalability in...
Preprint
Quantum subspace diagonalization methods are an exciting new class of algorithms for solving large scale eigenvalue problems using quantum computers. Unfortunately, these methods require the solution of an ill-conditioned generalized eigenvalue problem, with a matrix pencil corrupted by a non-negligible amount of noise that is far above the machine...
Preprint
We develop spectral methods for ODEs and operator eigenvalue problems that are based on a least-squares formulation of the problem. The key tool is a method for rectangular generalized eigenvalue problems, which we extend to quasimatrices and objects combining quasimatrices and matrices. The strength of the approach is its flexibility that lies in...
Preprint
Full-text available
Unless special conditions apply, the attempt to solve ill-conditioned systems of linear equations with standard numerical methods leads to uncontrollably high numerical error. Often, such systems arise from the discretization of operator equations with a large number of discrete variables. In this paper we show that the accuracy can be improved sig...
Preprint
Matrices with low-rank structure are ubiquitous in scientific computing. Choosing an appropriate rank is a key step in many computational algorithms that exploit low-rank structure. However, estimating the rank has been done largely in an ad-hoc fashion in previous studies. In this work we develop a randomized algorithm for estimating the numerical...
Preprint
Full-text available
Current dense symmetric eigenvalue (EIG) and singular value decomposition (SVD) implementations may suffer from the lack of concurrency during the tridiagonal and bidiagonal reductions, respectively. This performance bottleneck is typical for the two-sided transformations due to the Level-2 BLAS memory-bound calls. Therefore, the current state-of-t...
Preprint
Full-text available
We present methods for computing the generalized polar decomposition of a matrix based on the dynamically weighted Halley (DWH) iteration. This method is well established for computing the standard polar decomposition. A stable implementation is available, where matrix inversion is avoided and QR decompositions are used instead. We establish a natu...
Article
Full-text available
We present an algorithm for the minimization of a nonconvex quadratic function subject to linear inequality constraints and a two-sided bound on the 2-norm of its solution. The algorithm minimizes the objective using an active-set method by solving a series of trust-region subproblems (TRS). Underpinning the efficiency of this approach is that the...
Article
A landmark result from rational approximation theory states that $x^{1/p}$ on $[0,1]$ can be approximated by a type-$(n,n)$ rational function with root-exponential accuracy. Motivated by the recursive optimality property of Zolotarev functions (for the square root and sign functions), we investigate approximating $x^{1/p}$ by composite rational functions of the...
Article
Full-text available
Rational approximations of functions with singularities can converge at a root-exponential rate if the poles are exponentially clustered. We begin by reviewing this effect in minimax, least-squares, and AAA approximations on intervals and complex domains, conformal mapping, and the numerical solution of Laplace, Helmholtz, and biharmonic equations...
Preprint
In an influential 1877 paper, Zolotarev asked and answered four questions about polynomial and rational approximation. We ask and answer two questions: what are the best rational approximants $r$ and $s$ to $\sqrt{z}$ and $\mbox{sign}(z)$ on the unit circle (excluding certain arcs near the discontinuities), with the property that $|r(z)|=|s(z)|=1$...
Preprint
Full-text available
We analyze the stability of a class of eigensolvers that target interior eigenvalues with rational filters. We show that subspace iteration with a rational filter is stable even when an eigenvalue is located near a pole of the filter. These dangerous eigenvalues contribute to large round-off errors in the first iteration, but are self-correcting in...
Preprint
This article is about both approximation theory and the numerical solution of partial differential equations (PDEs). First we introduce the notion of {\em reciprocal-log} or {\em log-lightning approximation} of analytic functions with branch point singularities at points $\{z_k\}$ by functions of the form $g(z) = \sum_k c_k /(\log(z-z_k) - s_k)$, w...
Preprint
Full-text available
Randomized SVD has become an extremely successful approach for efficiently computing a low-rank approximation of matrices. In particular the paper by Halko, Martinsson, and Tropp (SIREV 2011) contains extensive analysis, and has made it a very popular method. The typical complexity for a rank-$r$ approximation of $m\times n$ matrices is $O(mn\log n...
Preprint
Full-text available
We present a new approach to compute selected eigenvalues and eigenvectors of the two-parameter eigenvalue problem. Our method requires computing generalized eigenvalue problems of the same size as the matrices of the initial two-parameter eigenvalue problem. The method is applicable for right definite problems, possibly after performing an affine...
Preprint
Rational approximations of functions with singularities can converge at a root-exponential rate if the poles are exponentially clustered. We begin by reviewing this effect in minimax, least-squares, and AAA approximations on intervals and complex domains, conformal mapping, and the numerical solution of Laplace, Helmholtz, and biharmonic equations...
Preprint
Full-text available
We consider neural networks with rational activation functions. The choice of the nonlinear activation function in deep learning architectures is crucial and heavily impacts the performance of a neural network. We establish optimal bounds in terms of network complexity and prove that rational neural networks approximate smooth functions more effici...
Article
Graph sampling set selection, where a subset of nodes are chosen to collect samples to reconstruct a bandlimited or smooth graph signal, is a fundamental problem in graph signal processing (GSP). Previous works employ an unbiased least square (LS) signal reconstruction scheme and select samples via expensive extreme eigenvector computation. Instead...
Article
When a projection of a symmetric or Hermitian matrix to the positive semidefinite cone is computed approximately (or to working precision on a computer), a natural question is to quantify its accuracy. A straightforward bound invoking standard eigenvalue perturbation theory (e.g. Davis-Kahan and Weyl bounds) suggests that the accuracy would be inve...
Preprint
Tenfold speedups can be brought to ADMM for Semidefinite Programming with virtually no decrease in robustness and provable convergence simply by projecting approximately to the Semidefinite cone. Instead of computing the projections via "exact" eigendecompositions that scale cubically with the matrix size and cannot be warm-started, we suggest usin...
Article
Full-text available
We use standard deep neural networks to classify univariate time series generated by discrete and continuous dynamical systems based on their chaotic or non-chaotic behaviour. Our approach to circumvent the lack of precise models for some of the most challenging real-life applications is to train different neural networks on a data set from a dynam...
Preprint
Vandermonde matrices are exponentially ill-conditioned, rendering the familiar "polyval(polyfit)" algorithm for polynomial interpolation and least-squares fitting ineffective at higher degrees. We show that Arnoldi orthogonalization fixes the problem.
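A minimal sketch of polynomial least-squares fitting with Arnoldi orthogonalization, in the spirit of this abstract: instead of forming a Vandermonde matrix, a discretely orthogonal basis is built by repeatedly multiplying by x and orthogonalizing, and the recurrence coefficients are reused for evaluation. The function names `polyfit_arnoldi` and `polyval_arnoldi`, the sample points, and the test function are assumptions made for illustration.

```python
import numpy as np

def polyfit_arnoldi(x, f, n):
    """Degree-n least-squares fit at points x using an Arnoldi-generated basis."""
    m = len(x)
    Q = np.ones((m, n + 1))           # columns: orthogonal basis sampled at x
    H = np.zeros((n + 1, n))          # Hessenberg matrix of recurrence coefficients
    for k in range(n):
        q = x * Q[:, k]               # multiply the latest basis vector by x
        for j in range(k + 1):        # orthogonalize against earlier columns
            H[j, k] = Q[:, j] @ q / m
            q = q - H[j, k] * Q[:, j]
        H[k + 1, k] = np.linalg.norm(q) / np.sqrt(m)
        Q[:, k + 1] = q / H[k + 1, k]
    d, *_ = np.linalg.lstsq(Q, f, rcond=None)   # coefficients in the new basis
    return d, H

def polyval_arnoldi(d, H, s):
    """Evaluate the fitted polynomial at new points s by replaying the recurrence."""
    n = H.shape[1]
    W = np.ones((len(s), n + 1))
    for k in range(n):
        w = s * W[:, k]
        for j in range(k + 1):
            w = w - H[j, k] * W[:, j]
        W[:, k + 1] = w / H[k + 1, k]
    return W @ d

# Usage: fit cos(3x) at 400 Chebyshev points with a degree-40 polynomial
x = np.cos(np.pi * np.arange(400) / 399)
d, H = polyfit_arnoldi(x, np.cos(3 * x), 40)
s = np.linspace(-1, 1, 1000)
err = np.max(np.abs(polyval_arnoldi(d, H, s) - np.cos(3 * s)))
```

Because the basis columns are orthogonal by construction, the least-squares matrix stays well conditioned at degrees where a monomial Vandermonde matrix would not.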
Preprint
Full-text available
We propose a methodology for computing single and multi-asset European option prices, and more generally expectations of scalar functions of (multivariate) random variables. This new approach combines the ability of Monte Carlo simulation to handle high-dimensional problems with the efficiency of function approximation. Specifically, we first gener...
Preprint
Rational minimax approximation of real functions on real intervals is an established topic, but when it comes to complex functions or domains, there appear to be no algorithms currently in use. Such a method is introduced here, the {\em AAA-Lawson algorithm,} available in Chebfun. The new algorithm solves a wide range of problems on arbitrary domai...
Preprint
When a projection of a symmetric or Hermitian matrix to the positive semidefinite cone is computed approximately (or to working precision on a computer), a natural question is to quantify its accuracy. A straightforward bound invoking standard eigenvalue perturbation theory (e.g. Davis-Kahan and Weyl bounds) suggests that the accuracy would be inve...
Preprint
Full-text available
We use deep neural networks to classify time series generated by discrete and continuous dynamical systems based on their chaotic behaviour. Our approach to circumvent the lack of precise models for some of the most challenging real-life applications is to train different neural networks on a data set from a dynamical system with a basic or low-dim...
Preprint
Graph sampling set selection, where a subset of nodes are chosen to collect samples to reconstruct a bandlimited or smooth graph signal, is a fundamental problem in graph signal processing (GSP). Previous works employ an unbiased least square (LS) signal reconstruction scheme and select samples via expensive extreme eigenvector computation. Instead...
Preprint
Full-text available
A landmark result from rational approximation theory states that $x^{1/p}$ on $[0,1]$ can be approximated by a type-$(n,n)$ rational function with root-exponential accuracy. Motivated by the recursive optimality property of Zolotarev functions (for the square root and sign functions), we investigate approximating $x^{1/p}$ by composite rational fun...
Preprint
We present an algorithm for the minimization of a nonconvex quadratic function subject to linear inequality constraints and a two-sided bound on the 2-norm of its solution. The algorithm minimizes the objective using an active-set method by solving a series of Trust-Region Subproblems (TRS). Underpinning the efficiency of this approach is that the...
Article
We present a high-performance implementation of the Polar Decomposition (PD) on distributed-memory systems. Building upon the QR-based Dynamically Weighted Halley (QDWH) algorithm, the key idea lies in finding the best rational approximation for the scalar sign function, which also corresponds to the polar factor for symmetric matrices, to furth...
Preprint
The nonzero eigenvalues of $AB$ are equal to those of $BA$: an identity that holds as long as the products are square, even when $A,B$ are rectangular. This fact naturally suggests an efficient algorithm for computing eigenvalues and eigenvectors of a low-rank matrix $X= AB$ with $A,B^T\in\mathbb{C}^{N\times r}, N\gg r$: form the small $r\times r$...
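The identity in this abstract can be checked numerically with a short sketch. It assumes random real test matrices and uses the standard fact that if $(BA)y = \lambda y$ then $(AB)(Ay) = \lambda (Ay)$; the helper name `lowrank_eigs` is hypothetical, and the sketch does not attempt to reproduce the stability analysis of the paper.

```python
import numpy as np

def lowrank_eigs(A, B):
    """Eigenpairs of X = A @ B (N x N, rank at most r) from the small r x r matrix B @ A."""
    lam, Y = np.linalg.eig(B @ A)     # small r x r eigenvalue problem
    V = A @ Y                         # lift eigenvectors: (AB)(A y) = lam (A y)
    return lam, V

rng = np.random.default_rng(0)
N, r = 200, 5
A = rng.standard_normal((N, r))
B = rng.standard_normal((r, N))

lam, V = lowrank_eigs(A, B)

# Relative residual of X V = V diag(lam), computed without forming the N x N matrix X
residual = np.linalg.norm(A @ (B @ V) - V * lam) / np.linalg.norm(V)
```

The cost is dominated by products with the thin factors, so the large $N \times N$ matrix never has to be formed.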
Preprint
An important observation in compressed sensing is that the $\ell_0$ minimizer of an underdetermined linear system is equal to the $\ell_1$ minimizer when there exists a sparse solution vector. Here, we develop a continuous analogue of this observation and show that the best $L_0$ and $L_1$ polynomial approximants of a polynomial that is corrupted o...
Preprint
We derive sharp bounds for the accuracy of approximate eigenvectors (Ritz vectors) obtained by the Rayleigh-Ritz process for symmetric eigenvalue problems. Using information that is available or easy to estimate, our bounds improve the classical Davis-Kahan $\sin\theta$ theorem by a factor that can be arbitrarily large, and can give nontrivial info...
Preprint
Full-text available
The Cholesky QR algorithm is an efficient communication-minimizing algorithm for computing the QR factorization of a tall-skinny matrix. Unfortunately, it is inherently numerically unstable and can break down when the matrix is ill-conditioned. A recent work establishes that the instability can be cured by repeating the algorithm twice (called Choles...
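A minimal sketch of Cholesky QR and of the "repeat it twice" remedy mentioned in this abstract, assuming a moderately ill-conditioned test matrix (condition number about 1e4) for which the Gram matrix is still numerically positive definite. The function names are hypothetical, and the sketch does not cover the more ill-conditioned case that the abstract goes on to address.

```python
import numpy as np

def cholesky_qr(A):
    """One pass of Cholesky QR: R from the Cholesky factor of A^T A, then Q = A R^{-1}."""
    R = np.linalg.cholesky(A.T @ A).T     # upper triangular factor of the Gram matrix
    Q = np.linalg.solve(R.T, A.T).T       # solves R^T Q^T = A^T, i.e. Q = A R^{-1}
    return Q, R

def cholesky_qr_twice(A):
    """Apply Cholesky QR to A, then again to the computed Q, and combine the R factors."""
    Q1, R1 = cholesky_qr(A)
    Q, R2 = cholesky_qr(Q1)
    return Q, R2 @ R1

# Tall-skinny test matrix with prescribed singular values from 1 down to 1e-4
rng = np.random.default_rng(0)
m, n = 500, 10
U, _ = np.linalg.qr(rng.standard_normal((m, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = U @ np.diag(np.logspace(0, -4, n)) @ V.T

Q, R = cholesky_qr_twice(A)
orth_err = np.linalg.norm(Q.T @ Q - np.eye(n))              # loss of orthogonality
fact_err = np.linalg.norm(Q @ R - A) / np.linalg.norm(A)    # factorization residual
```

A single pass loses orthogonality roughly in proportion to the squared condition number; the second pass operates on a nearly orthonormal matrix and restores it.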
Article
Full-text available
A common way of finding the poles of a meromorphic function f in a domain, where an explicit expression of f is unknown but f can be evaluated at any given z, is to interpolate f by a rational function $r = \frac{p}{q}$ such that $r(\gamma_i)=f(\gamma_i)$ at prescribed sample points $\{\gamma_i\}_{i=1}^L$, and then find the roots of q. This is...
Preprint
Full-text available
Classical algorithms in numerical analysis for numerical integration (quadrature/cubature) follow the principle of approximate and integrate: the integrand is approximated by a simple function (e.g. a polynomial), which is then integrated exactly. In high-dimensional integration, such methods quickly become infeasible due to the curse of dimensiona...
Article
Using a variational approach applied to generalized Rayleigh functionals, we extend the concepts of singular values and singular functions to trivariate functions defined on a rectangular parallelepiped. We also consider eigenvalues and eigenfunctions for trivariate functions whose domain is a cube. For a general finite-rank trivariate function, we...
Article
Full-text available
We introduce a backward stable algorithm for computing the CS decomposition of a partitioned $2n \times n$ matrix with orthonormal columns, or a rank-deficient partial isometry. The algorithm computes two $n \times n$ polar decompositions (which can be carried out in parallel) followed by an eigendecomposition of a judiciously crafted $n \times n$...
Preprint
We introduce a backward stable algorithm for computing the CS decomposition of a partitioned $2n \times n$ matrix with orthonormal columns, or a rank-deficient partial isometry. The algorithm computes two $n \times n$ polar decompositions (which can be carried out in parallel) followed by an eigendecomposition of a judiciously crafted $n \times n$...
Article
Full-text available
Let $E_{kk}^{(n)}$ denote the minimax (i.e., best supremum norm) error in approximation of $x^n$ on $[0,1]$ by rational functions of type $(k,k)$ with $k<n$. We show that in an appropriate limit $E_{kk}^{(n)} \sim 2H^{k+1/2}$ independently of $n$, where $H \approx 1/9.28903$ is Halphen's constant. This is the same formula as f...
Preprint
Let $E_{kk}^{(n)}$ denote the minimax (i.e., best supremum norm) error in approximation of $x^n$ on $[0,1]$ by rational functions of type $(k,k)$ with $k<n$. We show that in an appropriate limit $E_{kk}^{(n)} \sim 2H^{k+1/2}$ independently of $n$, where $H \approx 1/9.28903$ is Halphen's constant. This is the same formula as f...
Article
Full-text available
A square matrix can be reduced to simpler form via similarity transformations. Here "simpler form" may refer to diagonal (when possible), triangular (Schur), or Hessenberg form. Similar reductions exist for matrix pencils if we consider general equivalence transformations instead of similarity transformations. For both matrices and matrix pencils,...
Article
Full-text available
A nonconvex quadratically constrained quadratic programming (QCQP) with one constraint is usually solved via a dual SDP problem, or Moré’s algorithm based on iteratively solving linear systems. In this work we introduce an algorithm for QCQP that requires finding just one eigenpair of a generalized eigenvalue problem, and involves no outer iteratio...
Article
Sylvester's law of inertia states that the number of positive, negative and zero eigenvalues of Hermitian matrices is preserved under congruence transformations. The same is true of generalized Hermitian definite eigenvalue problems, in which the two matrices are allowed to undergo different congruence transformations, but not for the indefinite ca...
Preprint
Sylvester's law of inertia states that the number of positive, negative and zero eigenvalues of Hermitian matrices is preserved under congruence transformations. The same is true of generalized Hermitian definite eigenvalue problems, in which the two matrices are allowed to undergo different congruence transformations, but not for the indefinite ca...
Article
Full-text available
As is well known, the smallest possible ratio between the spectral norm and the Frobenius norm of an $m \times n$ matrix with $m \le n$ is $1/\sqrt{m}$ and is (up to scalar scaling) attained only by matrices having pairwise orthonormal rows. In the present paper, the smallest possible ratio between spectral and Frobenius norms of $n_1 \times \dots...
Preprint
As is well known, the smallest possible ratio between the spectral norm and the Frobenius norm of an $m \times n$ matrix with $m \le n$ is $1/\sqrt{m}$ and is (up to scalar scaling) attained only by matrices having pairwise orthonormal rows. In the present paper, the smallest possible ratio between spectral and Frobenius norms of $n_1 \times \dots...
Article
Full-text available
The standard approach to computing an approximate SVD of a large-scale matrix is to project it onto lower-dimensional trial subspaces from both sides, compute the SVD of the small projected matrix, and project it back to the original space. This results in a low-rank approximate SVD to the original matrix, and we can then obtain approximate left an...
Article
Computing rational minimax approximations can be very challenging when there are singularities on or near the interval of approximation - precisely the case where rational functions outperform polynomials by a landslide. We show that far more robust algorithms than previously available can be developed by making use of rational barycentric represen...
Preprint
Computing rational minimax approximations can be very challenging when there are singularities on or near the interval of approximation - precisely the case where rational functions outperform polynomials by a landslide. We show that far more robust algorithms than previously available can be developed by making use of rational barycentric represen...
Preprint
We present in this paper algorithms for solving stiff PDEs on the unit sphere with spectral accuracy in space and fourth-order accuracy in time. These are based on a variant of the double Fourier sphere method in coefficient space with multiplication matrices that differ from the usual ones, and implicit-explicit time-stepping schemes. Operating in...
Article
We present in this paper algorithms for solving stiff PDEs on the unit sphere with spectral accuracy in space and fourth-order accuracy in time. These are based on a variant of the double Fourier sphere method in coefficient space with multiplication matrices that differ from the usual ones, and implicit-explicit time-stepping schemes. Operating in...
Article
Full-text available
The state-of-the-art algorithms for solving the trust-region subproblem (TRS) are based on an iterative process, involving solutions of many linear systems, eigenvalue problems, subspace optimization, or line search steps. A relatively underappreciated fact, due to Gander, Golub, and von Matt [Linear Algebra Appl., 114 (1989), pp. 815--839], is tha...
