
Yuji Nakatsukasa, University of Oxford | OX
138
Publications
14,849
Reads
2,171
Citations
Publications (138)
Stochastic gradient descent (SGD) is a workhorse algorithm for solving large-scale optimization problems in data science and machine learning. Understanding the convergence of SGD is hence of fundamental importance. In this work we examine the SGD convergence (with various step sizes) when applied to unconstrained convex quadratic programming (esse...
Approximating a univariate function on the interval $[-1,1]$ with a polynomial is among the most classical problems in numerical analysis. When the function evaluations come with noise, a least-squares fit is known to reduce the effect of noise as more samples are taken. The generic algorithm for the least-squares problem requires $O(Nn^2)$ operati...
Given (orthonormal) approximations $\tilde{U}$ and $\tilde{V}$ to the left and right subspaces spanned by the leading singular vectors of a matrix $A$, we discuss methods to approximate the leading singular values of $A$ and study their accuracy. In particular, we focus our analysis on the generalized Nyström approximation, as surprisingly, it is...
A low-rank approximation of a parameter-dependent matrix $A(t)$ is an important task in the computational sciences appearing for example in dynamical systems and compression of a series of images. In this work, we introduce AdaCUR, an efficient algorithm for computing a low-rank approximation of parameter-dependent matrices via CUR decomposition. T...
We present a simple formula to update the pseudoinverse of a full-rank rectangular matrix that undergoes a low-rank modification, and demonstrate its utility for solving least squares problems. The resulting algorithm can be dramatically faster than solving the modified least squares problem from scratch, just like the speedup enabled by Sherman--M...
One of the greatest success stories of randomized algorithms for linear algebra has been the development of fast, randomized algorithms for highly overdetermined linear least-squares problems. However, none of the existing algorithms is backward stable, preventing them from being deployed as drop-in replacements for existing QR-based solvers. This...
This work investigates the accuracy and numerical stability of CUR decompositions with oversampling. The CUR decomposition approximates a matrix using a subset of columns and rows of the matrix. When the numbers of columns and rows are the same, the CUR decomposition can become unstable and less accurate due to the presence of the matrix inverse...
Unless special conditions apply, the attempt to solve ill-conditioned systems of linear equations with standard numerical methods leads to uncontrollably high numerical error and often slow convergence of an iterative solver. In many cases, such systems arise from the discretization of operator equations with a large number of discrete variables an...
The computation of a matrix function f(A) is an important task in scientific computing appearing in machine learning, network analysis and the solution of partial differential equations. In this work, we use only matrix-vector products x → Ax to approximate functions of sparse matrices and matrices with similar structures such as sparse matrices A...
Randomized algorithms in numerical linear algebra can be fast, scalable and robust. This paper examines the effect of sketching on the right singular vectors corresponding to the smallest singular values of a tall–skinny matrix. We analyze a fast algorithm by Gilbert, Park and Wakin for finding the trailing right singular vectors using randomizatio...
AAA rational approximation has normally been carried out on a discrete set, typically hundreds or thousands of points in a real interval or complex domain. Here we introduce a continuum AAA algorithm that discretizes a domain adaptively as it goes. This enables fast computation of high-accuracy rational approximations on domains such as the unit in...
Sketch-and-precondition techniques are popular for solving large least squares (LS) problems of the form $Ax=b$ with $A\in\mathbb{R}^{m\times n}$ and $m\gg n$. This is where $A$ is "sketched" to a smaller matrix $SA$ with $S\in\mathbb{R}^{\lceil cn\rceil\times m}$ for some constant $c>1$ before an iterative LS solver computes the solution to $Ax=b...
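As a rough illustration of the sketch-and-precondition idea described in the entry above, the following NumPy sketch forms $SA$, takes its triangular QR factor $R$ as a right preconditioner, and runs a CGLS iteration on $AR^{-1}$. The Gaussian sketch, the choice c = 4, the use of CGLS rather than LSQR, and the function name `sketch_precondition_lstsq` are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def sketch_precondition_lstsq(A, b, c=4, tol=1e-12, maxit=100, seed=0):
    """Solve min ||Ax - b||_2 by preconditioning A with the R factor of S @ A."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    s = int(np.ceil(c * n))
    S = rng.standard_normal((s, m)) / np.sqrt(s)  # Gaussian sketch (assumed choice)
    _, R = np.linalg.qr(S @ A)                    # S A = Q R, with R of size n x n

    # B = A @ inv(R) is applied implicitly; a triangular solver would be
    # used in practice instead of the general np.linalg.solve.
    def B_mul(y):
        return A @ np.linalg.solve(R, y)

    def Bt_mul(r):
        return np.linalg.solve(R.T, A.T @ r)

    # CGLS on min ||B y - b||_2
    y = np.zeros(n)
    r = b.astype(float)
    g = Bt_mul(r)
    p = g.copy()
    gamma = g @ g
    gamma0 = gamma
    for _ in range(maxit):
        if gamma <= (tol ** 2) * gamma0:
            break
        q = B_mul(p)
        alpha = gamma / (q @ q)
        y += alpha * p
        r -= alpha * q
        g = Bt_mul(r)
        gamma_new = g @ g
        p = g + (gamma_new / gamma) * p
        gamma = gamma_new
    return np.linalg.solve(R, y)                  # map back: x = inv(R) y

rng = np.random.default_rng(1)
m, n = 2000, 20
A = rng.standard_normal((m, n)) * np.logspace(0, 2, n)  # moderately ill-conditioned
b = rng.standard_normal(m)
x = sketch_precondition_lstsq(A, b)
x_ref = np.linalg.lstsq(A, b, rcond=None)[0]
rel_err = np.linalg.norm(x - x_ref) / np.linalg.norm(x_ref)
```

Because $AR^{-1}$ is expected to be well conditioned for a sketch of this size, the inner iteration typically needs only a few dozen steps regardless of the conditioning of $A$.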
Vandermonde matrices are usually exponentially ill-conditioned and often result in unstable approximations. In this paper, we introduce and analyze the multivariate Vandermonde with Arnoldi (V+A) method, which is based on least-squares approximation together with a Stieltjes orthogonalization process, for approximating continuous, multivar...
This work is concerned with the computation of the action of a matrix function f(A), such as the matrix exponential or the matrix square root, on a vector b. For a general matrix A, this can be done by computing the compression of A onto a suitable Krylov subspace. Such compression is usually computed by forming an orthonormal basis of the Krylov s...
The Nyström method is a popular choice for finding a low-rank approximation to a symmetric positive semi-definite matrix. The method can fail when applied to symmetric indefinite matrices, for which the error can be unboundedly large. In this work, we first identify the main challenges in finding a Nyström approximation to symmetric indefinite...
Often the easiest way to discretize an ordinary or partial differential equation is by a rectangular numerical method, in which n basis functions are sampled at m ≫ n collocation points. We show how eigenvalue problems can be solved in this setting by QR reduction to square matrix generalized eigenvalue problems. The method applies equally in the l...
Randomized subspace approximation with "matrix sketching" is an effective approach for constructing approximate partial singular value decompositions (SVDs) of large matrices. The performance of such techniques has been extensively analyzed, and very precise estimates on the distribution of the residual errors have been derived. However, our unders...
We explore the concept of eigenvalue avoidance, which is well understood for real symmetric and Hermitian matrices, for other classes of structured matrices. We adopt a differential geometric perspective and study the generic behaviour of the eigenvalues of regular and injective curves $t \in ]a,b[ \mapsto A(t) \in \mathcal{N} $ where $\mathcal{N}$...
Randomized algorithms in numerical linear algebra can be fast, scalable and robust. This paper examines the effect of sketching on the right singular vectors corresponding to the smallest singular values of a tall-skinny matrix. We devise a fast algorithm for finding the trailing right singular vectors using randomization and examine the quality of...
We devise a spectral divide-and-conquer scheme for matrices that are self-adjoint with respect to a given indefinite scalar product (i.e. pseudosymmetric matrices). The pseudosymmetric structure of the matrix is preserved in the spectral division, such that the method can be applied recursively to achieve full diagonalization. The method is well-sui...
We describe two algorithms to efficiently solve regularized linear least squares systems based on sketching. The algorithms compute preconditioners for $\min \|Ax-b\|^2_2 + \lambda \|x\|^2_2$, where $A\in\mathbb{R}^{m\times n}$ and $\lambda>0$ is a regularization parameter, such that LSQR converges in $\mathcal{O}(\log(1/\epsilon))$ iterations for...
We present a new approach to compute selected eigenvalues and eigenvectors of the two-parameter eigenvalue problem. Our method requires computing generalized eigenvalue problems of the same size as the matrices of the initial two-parameter eigenvalue problem. The method is applicable for right definite problems, possibly after performing an affine...
We study the problem of estimating the diagonal of an implicitly given matrix $A$. For such a matrix we have access to an oracle that allows us to evaluate the matrix vector product $Av$. For random variable $v$ drawn from an appropriate distribution, this may be used to return an estimate of the diagonal of the matrix $A$. Whilst results exist for...
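One common estimator of this kind, shown here only as an assumed example and not necessarily the one analyzed in the paper, averages the entrywise product of $v$ and $Av$ over random Rademacher vectors $v$. The function name `estimate_diagonal` and the sample count are hypothetical choices.

```python
import numpy as np

def estimate_diagonal(matvec, n, num_samples=200, seed=0):
    """Estimate diag(A) using only products v -> A v with Rademacher vectors."""
    rng = np.random.default_rng(seed)
    num = np.zeros(n)
    den = np.zeros(n)
    for _ in range(num_samples):
        v = rng.choice([-1.0, 1.0], size=n)  # random +/-1 query vector
        num += v * matvec(v)                  # entrywise v_i * (A v)_i
        den += v * v                          # equals 1 per sample for Rademacher v
    return num / den

rng = np.random.default_rng(1)
n = 50
A = np.diag(np.arange(1.0, n + 1)) + 0.01 * rng.standard_normal((n, n))
d_est = estimate_diagonal(lambda v: A @ v, n)
max_err = np.max(np.abs(d_est - np.diag(A)))
```

The off-diagonal entries of $A$ contribute zero-mean noise to each sample, so the error shrinks as more query vectors are averaged.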
Tenfold improvements in computation speed can be brought to the alternating direction method of multipliers (ADMM) for Semidefinite Programming with virtually no decrease in robustness and provable convergence simply by projecting approximately to the Semidefinite cone. Instead of computing the projections via “exact” eigendecompositions that scale...
Often the easiest way to discretize an ordinary or partial differential equation is by a rectangular numerical method, in which n basis functions are sampled at m ≫ n collocation points. We show how eigenvalue problems can be solved in this setting by QR reduction to square matrix generalized eigenvalue problems. The method applies equally in the li...
This paper develops a new class of algorithms for general linear systems and eigenvalue problems. These algorithms apply fast randomized sketching to accelerate subspace projection methods, such as GMRES and Rayleigh-Ritz. This approach offers great flexibility in designing the basis for the approximation subspace, which can improve scalability in...
Quantum subspace diagonalization methods are an exciting new class of algorithms for solving large scale eigenvalue problems using quantum computers. Unfortunately, these methods require the solution of an ill-conditioned generalized eigenvalue problem, with a matrix pencil corrupted by a non-negligible amount of noise that is far above the machine...
We develop spectral methods for ODEs and operator eigenvalue problems that are based on a least-squares formulation of the problem. The key tool is a method for rectangular generalized eigenvalue problems, which we extend to quasimatrices and objects combining quasimatrices and matrices. The strength of the approach is its flexibility that lies in...
Unless special conditions apply, the attempt to solve ill-conditioned systems of linear equations with standard numerical methods leads to uncontrollably high numerical error. Often, such systems arise from the discretization of operator equations with a large number of discrete variables. In this paper we show that the accuracy can be improved sig...
Matrices with low-rank structure are ubiquitous in scientific computing. Choosing an appropriate rank is a key step in many computational algorithms that exploit low-rank structure. However, estimating the rank has been done largely in an ad-hoc fashion in previous studies. In this work we develop a randomized algorithm for estimating the numerical...
Current dense symmetric eigenvalue (EIG) and singular value decomposition (SVD) implementations may suffer from the lack of concurrency during the tridiagonal and bidiagonal reductions, respectively. This performance bottleneck is typical for the two-sided transformations due to the Level-2 BLAS memory-bound calls. Therefore, the current state-of-t...
We present methods for computing the generalized polar decomposition of a matrix based on the dynamically weighted Halley (DWH) iteration. This method is well established for computing the standard polar decomposition. A stable implementation is available, where matrix inversion is avoided and QR decompositions are used instead. We establish a natu...
We present an algorithm for the minimization of a nonconvex quadratic function subject to linear inequality constraints and a two-sided bound on the 2-norm of its solution. The algorithm minimizes the objective using an active-set method by solving a series of trust-region subproblems (TRS). Underpinning the efficiency of this approach is that the...
A landmark result from rational approximation theory states that $x^{1/p}$ on $[0,1]$ can be approximated by a type-$(n,n)$ rational function with root-exponential accuracy. Motivated by the recursive optimality property of Zolotarev functions (for the square root and sign functions), we investigate approximating $x^{1/p}$ by composite rational functions of the...
Rational approximations of functions with singularities can converge at a root-exponential rate if the poles are exponentially clustered. We begin by reviewing this effect in minimax, least-squares, and AAA approximations on intervals and complex domains, conformal mapping, and the numerical solution of Laplace, Helmholtz, and biharmonic equations...
In an influential 1877 paper, Zolotarev asked and answered four questions about polynomial and rational approximation. We ask and answer two questions: what are the best rational approximants $r$ and $s$ to $\sqrt{z}$ and $\mbox{sign}(z)$ on the unit circle (excluding certain arcs near the discontinuities), with the property that $|r(z)|=|s(z)|=1$...
We analyze the stability of a class of eigensolvers that target interior eigenvalues with rational filters. We show that subspace iteration with a rational filter is stable even when an eigenvalue is located near a pole of the filter. These dangerous eigenvalues contribute to large round-off errors in the first iteration, but are self-correcting in...
This article is about both approximation theory and the numerical solution of partial differential equations (PDEs). First we introduce the notion of reciprocal-log or log-lightning approximation of analytic functions with branch point singularities at points $\{z_k\}$ by functions of the form $g(z) = \sum_k c_k /(\log(z-z_k) - s_k)$, w...
Randomized SVD has become an extremely successful approach for efficiently computing a low-rank approximation of matrices. In particular the paper by Halko, Martinsson, and Tropp (SIREV 2011) contains extensive analysis, and has made it a very popular method. The typical complexity for a rank-$r$ approximation of $m\times n$ matrices is $O(mn\log n...
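For orientation, the basic randomized SVD scheme associated with Halko, Martinsson, and Tropp can be sketched as below. This is a minimal version with a dense Gaussian test matrix, so it does not reproduce the complexity quoted in the entry above; the function name `randomized_svd` and the oversampling value are assumptions.

```python
import numpy as np

def randomized_svd(A, r, oversample=10, seed=0):
    """Rank-r approximate SVD of A via a Gaussian sketch of its range."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    Omega = rng.standard_normal((n, r + oversample))  # Gaussian test matrix
    Q, _ = np.linalg.qr(A @ Omega)      # orthonormal basis for the sketched range
    B = Q.T @ A                         # small (r + oversample) x n matrix
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ Ub                          # lift left singular vectors back to R^m
    return U[:, :r], s[:r], Vt[:r, :]

rng = np.random.default_rng(1)
A = rng.standard_normal((300, 8)) @ rng.standard_normal((8, 200))  # exact rank 8
U, s, Vt = randomized_svd(A, r=8)
rel_err = np.linalg.norm(A - (U * s) @ Vt) / np.linalg.norm(A)
```

On an exactly rank-8 matrix the sketch captures the whole range, so the relative error is at the level of rounding; for matrices with slowly decaying singular values the error depends on the tail beyond rank r.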
We consider neural networks with rational activation functions. The choice of the nonlinear activation function in deep learning architectures is crucial and heavily impacts the performance of a neural network. We establish optimal bounds in terms of network complexity and prove that rational neural networks approximate smooth functions more effici...
Graph sampling set selection, where a subset of nodes is chosen to collect samples to reconstruct a bandlimited or smooth graph signal, is a fundamental problem in graph signal processing (GSP). Previous works employ an unbiased least-squares (LS) signal reconstruction scheme and select samples via expensive extreme eigenvector computation. Instead...
When a projection of a symmetric or Hermitian matrix to the positive semidefinite cone is computed approximately (or to working precision on a computer), a natural question is to quantify its accuracy. A straightforward bound invoking standard eigenvalue perturbation theory (e.g. Davis-Kahan and Weyl bounds) suggests that the accuracy would be inve...
Tenfold speedups can be brought to ADMM for Semidefinite Programming with virtually no decrease in robustness and provable convergence simply by projecting approximately to the Semidefinite cone. Instead of computing the projections via "exact" eigendecompositions that scale cubically with the matrix size and cannot be warm-started, we suggest usin...
We use standard deep neural networks to classify univariate time series generated by discrete and continuous dynamical systems based on their chaotic or non-chaotic behaviour. Our approach to circumvent the lack of precise models for some of the most challenging real-life applications is to train different neural networks on a data set from a dynam...
Vandermonde matrices are exponentially ill-conditioned, rendering the familiar "polyval(polyfit)" algorithm for polynomial interpolation and least-squares fitting ineffective at higher degrees. We show that Arnoldi orthogonalization fixes the problem.
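The idea in the entry above can be sketched in NumPy as follows: instead of forming the monomial basis, each new basis vector is obtained by multiplying the previous one by x and orthogonalizing against the earlier vectors, and the recurrence coefficients are stored so the same basis can be regenerated at new points. The function names `polyfit_arnoldi` and `polyval_arnoldi` are hypothetical, and the details are an illustrative assumption rather than the authors' code.

```python
import numpy as np

def polyfit_arnoldi(x, f, n):
    """Degree-n least-squares fit in an Arnoldi-orthogonalized basis.

    Returns coefficients d and the (n+1) x n Hessenberg matrix H of
    recurrence coefficients needed to evaluate the fit elsewhere.
    """
    m = len(x)
    Q = np.ones((m, n + 1))
    H = np.zeros((n + 1, n))
    for k in range(n):
        q = x * Q[:, k]                    # multiply previous basis vector by x
        for j in range(k + 1):             # orthogonalize against earlier columns
            H[j, k] = Q[:, j] @ q / m
            q = q - H[j, k] * Q[:, j]
        H[k + 1, k] = np.linalg.norm(q) / np.sqrt(m)
        Q[:, k + 1] = q / H[k + 1, k]
    d = np.linalg.lstsq(Q, f, rcond=None)[0]
    return d, H

def polyval_arnoldi(d, H, s):
    """Evaluate the fit at points s by replaying the stored recurrence."""
    n = H.shape[1]
    W = np.ones((len(s), n + 1))
    for k in range(n):
        w = s * W[:, k]
        for j in range(k + 1):
            w = w - H[j, k] * W[:, j]
        W[:, k + 1] = w / H[k + 1, k]
    return W @ d

m = 200
x = np.cos(np.pi * np.arange(m) / (m - 1))   # Chebyshev points in [-1, 1]
d, H = polyfit_arnoldi(x, np.exp(x), 20)
s = np.linspace(-1, 1, 1000)
max_err = np.max(np.abs(polyval_arnoldi(d, H, s) - np.exp(s)))
```

The evaluation routine never forms powers of s directly, which is what avoids the ill-conditioning of the Vandermonde matrix.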
We propose a methodology for computing single and multi-asset European option prices, and more generally expectations of scalar functions of (multivariate) random variables. This new approach combines the ability of Monte Carlo simulation to handle high-dimensional problems with the efficiency of function approximation. Specifically, we first gener...
Rational minimax approximation of real functions on real intervals is an established topic, but when it comes to complex functions or domains, there appear to be no algorithms currently in use. Such a method is introduced here, the AAA-Lawson algorithm, available in Chebfun. The new algorithm solves a wide range of problems on arbitrary domai...
We use deep neural networks to classify time series generated by discrete and continuous dynamical systems based on their chaotic behaviour. Our approach to circumvent the lack of precise models for some of the most challenging real-life applications is to train different neural networks on a data set from a dynamical system with a basic or low-dim...
A landmark result from rational approximation theory states that $x^{1/p}$ on $[0,1]$ can be approximated by a type-$(n,n)$ rational function with root-exponential accuracy. Motivated by the recursive optimality property of Zolotarev functions (for the square root and sign functions), we investigate approximating $x^{1/p}$ by composite rational fun...
We present an algorithm for the minimization of a nonconvex quadratic function subject to linear inequality constraints and a two-sided bound on the 2-norm of its solution. The algorithm minimizes the objective using an active-set method by solving a series of Trust-Region Subproblems (TRS). Underpinning the efficiency of this approach is that the...
We present a high-performance implementation of the Polar Decomposition (PD) on distributed-memory systems. Building upon the QR-based Dynamically Weighted Halley (QDWH) algorithm, the key idea lies in finding the best rational approximation for the scalar sign function, which also corresponds to the polar factor for symmetric matrices, to furth...
The nonzero eigenvalues of $AB$ are equal to those of $BA$: an identity that holds as long as the products are square, even when $A,B$ are rectangular. This fact naturally suggests an efficient algorithm for computing eigenvalues and eigenvectors of a low-rank matrix $X= AB$ with $A,B^T\in\mathbb{C}^{N\times r}, N\gg r$: form the small $r\times r$...
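The identity in the entry above can be checked numerically with a short sketch: if $(BA)w = \lambda w$ with $\lambda \neq 0$, then $(AB)(Aw) = \lambda (Aw)$, so eigenpairs of the large matrix follow from an $r \times r$ eigenproblem. The function name `lowrank_eig` is hypothetical, and this is only the naive approach the abstract alludes to, not necessarily the algorithm the paper ends up recommending.

```python
import numpy as np

def lowrank_eig(A, B):
    """Nonzero eigenpairs of X = A @ B (N x N) from the r x r matrix B @ A."""
    lam, W = np.linalg.eig(B @ A)        # small r x r eigenproblem
    V = A @ W                            # (AB)(A w) = A (BA w) = lam * (A w)
    V = V / np.linalg.norm(V, axis=0)    # normalize each eigenvector (needs lam != 0)
    return lam, V

rng = np.random.default_rng(0)
N, r = 500, 5
A = rng.standard_normal((N, r))
B = rng.standard_normal((r, N))
lam, V = lowrank_eig(A, B)

# Relative residual of X V = V diag(lam), computed without forming X = A @ B
residual = np.linalg.norm(A @ (B @ V) - V * lam) / np.linalg.norm(lam)
```

The N x N product is never formed, so the cost is dominated by products with the thin factors.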
An important observation in compressed sensing is that the $\ell_0$ minimizer of an underdetermined linear system is equal to the $\ell_1$ minimizer when there exists a sparse solution vector. Here, we develop a continuous analogue of this observation and show that the best $L_0$ and $L_1$ polynomial approximants of a polynomial that is corrupted o...
We derive sharp bounds for the accuracy of approximate eigenvectors (Ritz vectors) obtained by the Rayleigh-Ritz process for symmetric eigenvalue problems. Using information that is available or easy to estimate, our bounds improve the classical Davis-Kahan $\sin\theta$ theorem by a factor that can be arbitrarily large, and can give nontrivial info...
The Cholesky QR algorithm is an efficient communication-minimizing algorithm for computing the QR factorization of a tall-skinny matrix. Unfortunately, it is inherently numerically unstable and can break down when the matrix is ill-conditioned. A recent work establishes that the instability can be cured by repeating the algorithm twice (called Choles...
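A minimal NumPy sketch of Cholesky QR and of the repeated variant mentioned above is given below. The function names and the test matrix (condition number about $10^4$) are assumptions for illustration; the further remedy developed in the paper itself is not shown, since the abstract is truncated.

```python
import numpy as np

def cholesky_qr(A):
    """One pass of Cholesky QR: A = Q R with R from the Gram matrix."""
    G = A.T @ A                        # n x n Gram matrix
    L = np.linalg.cholesky(G)          # G = L @ L.T, so R = L.T
    # Q = A @ inv(R); a triangular solver would be used in practice
    Q = np.linalg.solve(L, A.T).T
    return Q, L.T

def cholesky_qr2(A):
    """Cholesky QR applied twice to restore orthogonality."""
    Q1, R1 = cholesky_qr(A)
    Q, R2 = cholesky_qr(Q1)
    return Q, R2 @ R1

rng = np.random.default_rng(0)
m, n = 1000, 20
U, _ = np.linalg.qr(rng.standard_normal((m, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = (U * np.logspace(0, -4, n)) @ V.T  # singular values from 1 down to 1e-4

Q1, R1 = cholesky_qr(A)
Q2, R2 = cholesky_qr2(A)
err1 = np.linalg.norm(Q1.T @ Q1 - np.eye(n))  # orthogonality after one pass
err2 = np.linalg.norm(Q2.T @ Q2 - np.eye(n))  # orthogonality after two passes
```

Comparing `err1` and `err2` shows the effect of the second pass; for much worse conditioning the first Cholesky factorization can fail outright, which is the breakdown the abstract refers to.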
A common way of finding the poles of a meromorphic function f in a domain, where an explicit expression of f is unknown but f can be evaluated at any given z, is to interpolate f by a rational function $r = \frac{p}{q}$ such that $r(\gamma_i)=f(\gamma_i)$ at prescribed sample points $\{\gamma_i\}_{i=1}^L$, and then find the roots of q. This is...
Classical algorithms in numerical analysis for numerical integration (quadrature/cubature) follow the principle of approximate and integrate: the integrand is approximated by a simple function (e.g. a polynomial), which is then integrated exactly. In high-dimensional integration, such methods quickly become infeasible due to the curse of dimensiona...
Using a variational approach applied to generalized Rayleigh functionals, we extend the concepts of singular values and singular functions to trivariate functions defined on a rectangular parallelepiped. We also consider eigenvalues and eigenfunctions for trivariate functions whose domain is a cube. For a general finite-rank trivariate function, we...
We introduce a backward stable algorithm for computing the CS decomposition of a partitioned $2n \times n$ matrix with orthonormal columns, or a rank-deficient partial isometry. The algorithm computes two $n \times n$ polar decompositions (which can be carried out in parallel) followed by an eigendecomposition of a judiciously crafted $n \times n$...
Let $E_{kk}^{(n)}$ denote the minimax (i.e., best supremum norm) error in approximation of $x^n$ on $[0,1]$ by rational functions of type $(k,k)$ with $k<n$. We show that in an appropriate limit $E_{kk}^{(n)} \sim 2H^{k+1/2}$ independently of $n$, where $H \approx 1/9.28903$ is Halphen's constant. This is the same formula as f...
A square matrix can be reduced to simpler form via similarity transformations. Here "simpler form" may refer to diagonal (when possible), triangular (Schur), or Hessenberg form. Similar reductions exist for matrix pencils if we consider general equivalence transformations instead of similarity transformations. For both matrices and matrix pencils,...
A nonconvex quadratically constrained quadratic programming (QCQP) with one constraint is usually solved via a dual SDP problem, or Moré’s algorithm based on iteratively solving linear systems. In this work we introduce an algorithm for QCQP that requires finding just one eigenpair of a generalized eigenvalue problem, and involves no outer iteratio...
Sylvester's law of inertia states that the number of positive, negative and zero eigenvalues of Hermitian matrices is preserved under congruence transformations. The same is true of generalized Hermitian definite eigenvalue problems, in which the two matrices are allowed to undergo different congruence transformations, but not for the indefinite ca...
As is well known, the smallest possible ratio between the spectral norm and the Frobenius norm of an $m \times n$ matrix with $m \le n$ is $1/\sqrt{m}$ and is (up to scalar scaling) attained only by matrices having pairwise orthonormal rows. In the present paper, the smallest possible ratio between spectral and Frobenius norms of $n_1 \times \dots...
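The matrix case stated at the start of this entry is easy to check numerically. The sketch below, with the hypothetical helper `norm_ratio`, compares the ratio for a matrix with orthonormal rows against a random matrix of the same size; it does not address the tensor case studied in the paper.

```python
import numpy as np

def norm_ratio(A):
    """Ratio of spectral norm to Frobenius norm."""
    return np.linalg.norm(A, 2) / np.linalg.norm(A, 'fro')

rng = np.random.default_rng(0)
m, n = 4, 7
Q, _ = np.linalg.qr(rng.standard_normal((n, m)))
A_orth = Q.T                          # m x n matrix with orthonormal rows
A_rand = rng.standard_normal((m, n))

ratio_orth = norm_ratio(A_orth)       # spectral norm 1, Frobenius norm sqrt(m)
ratio_rand = norm_ratio(A_rand)       # bounded below by 1/sqrt(m)
```

With orthonormal rows all $m$ singular values equal 1, which is what makes the ratio equal to $1/\sqrt{m}$.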
The standard approach to computing an approximate SVD of a large-scale matrix is to project it onto lower-dimensional trial subspaces from both sides, compute the SVD of the small projected matrix, and project it back to the original space. This results in a low-rank approximate SVD to the original matrix, and we can then obtain approximate left an...
Computing rational minimax approximations can be very challenging when there are singularities on or near the interval of approximation - precisely the case where rational functions outperform polynomials by a landslide. We show that far more robust algorithms than previously available can be developed by making use of rational barycentric represen...
We present in this paper algorithms for solving stiff PDEs on the unit sphere with spectral accuracy in space and fourth-order accuracy in time. These are based on a variant of the double Fourier sphere method in coefficient space with multiplication matrices that differ from the usual ones, and implicit-explicit time-stepping schemes. Operating in...
The state-of-the-art algorithms for solving the trust-region subproblem (TRS) are based on an iterative process, involving solutions of many linear systems, eigenvalue problems, subspace optimization, or line search steps. A relatively underappreciated fact, due to Gander, Golub, and von Matt [Linear Algebra Appl., 114 (1989), pp. 815-839], is tha...