Article

Structured Quasi-Newton Methods for Optimization with Orthogonality Constraints

... Fundamental work on the Grassmann manifold also includes [1,38]. In recent years, more and more advanced algorithms for optimization over the Stiefel and Grassmann manifolds have been proposed, including gradient-type methods [32,19], CG methods [37,49,51], second-order methods [28,29,27,25], proximal gradient and Newton methods [12,13,30,40], stochastic variance reduced gradient methods [39,33], etc. ...
... This follows from the proof of Corollary 1 in [23], but for self-containedness, we prove the result as follows. By (25) we have ...
... Alg2, OptM, and NAG2 still succeeded in all tests while Grad failed in 9 tests. It can be seen from these test problems that the Riemannian version of Nesterov's AG method (25) is effective on the Stiefel manifold but possibly numerically problematic on the Grassmann manifold, while the proposed AG methods are effective on both of the Grassmann and Stiefel manifolds. ...
Preprint
Full-text available
In this paper we extend a nonconvex Nesterov-type accelerated gradient (AG) method to optimization over the Grassmann and Stiefel manifolds. We propose an exponential-based AG algorithm for the Grassmann manifold and a retraction-based AG algorithm that exploits the Cayley transform for both the Grassmann and Stiefel manifolds. Under some mild assumptions, we obtain the global rate of convergence of the exponential-based AG algorithm. With additional but reasonable assumptions on the retraction and vector transport, the same global rate of convergence is obtained for the retraction-based AG algorithm. Details of computing the geometric objects as ingredients of our AG algorithms are also discussed. Preliminary numerical results demonstrate the potential effectiveness of our AG methods.
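The retraction-based accelerated step described above combines a momentum (extrapolation) term with a retraction back to the manifold. Below is a minimal Python sketch of that structure; treating the momentum term in the ambient space, the embedded-metric tangent projection, and all function names are our own illustrative assumptions, and the `retract` argument stands in for the Cayley-transform or QR retraction rather than reproducing the paper's exact algorithm.

```python
import numpy as np

def accelerated_gradient_step(X, X_prev, egrad, retract, alpha, beta):
    """One retraction-based accelerated-gradient step on the Stiefel manifold:
    form an extrapolated (momentum) point from the two latest iterates, take a
    gradient step in the tangent space there, and retract back to the manifold.

    retract : callable mapping an n-by-p matrix to a feasible point, e.g. a
              Cayley-transform or QR-based retraction.
    """
    Y = retract(X + beta * (X - X_prev))            # extrapolation / momentum step
    G = egrad(Y)                                    # Euclidean gradient at Y
    rgrad = G - Y @ (Y.T @ G + G.T @ Y) / 2         # projection onto the tangent space at Y
    return retract(Y - alpha * rgrad)
```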
... In addition, extra restrictions on the vector transport and the retraction are required for better convergence properties or even for convergence [61,64,70-74]. A non-vector-transport-based quasi-Newton method is also explored in [75]. ...
... Furthermore, if the Euclidean Hessian itself consists of cheap and expensive parts, i.e., (3.22), where the computational cost of H_e(x_k) is much higher than that of H_c(x_k), an approximation of ∇²f(x_k) is constructed as in [75]; the resulting method is presented in Algorithm 6. ...
... To explain the differences between the two quasi-Newton algorithms more straightforwardly, we take the HF total energy minimization problem (2.10) as an example. From the calculation in [75], we have the Euclidean gradients ...
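The excerpts above describe the structured idea of keeping the cheap part of the Euclidean Hessian exact while approximating only the expensive part with a secant update. The following Python sketch illustrates that splitting in a generic Euclidean setting; the update rule, the positive-definiteness safeguard, and all function names are illustrative assumptions rather than the exact construction in [75].

```python
import numpy as np

def structured_quasi_newton_direction(H_c, E, g):
    """Direction d solving (H_c(x_k) + E_k) d = -g, where H_c is the exact
    cheap Hessian part and E_k approximates the expensive part H_e(x_k)."""
    return np.linalg.solve(H_c + E, -g)

def update_expensive_part(E, s, y_e):
    """BFGS-style secant update of E using s = x_{k+1} - x_k and
    y_e = gradient of the expensive term at x_{k+1} minus that at x_k.
    E should be initialized to a small multiple of the identity."""
    sy = s @ y_e
    if sy <= 1e-12 * np.linalg.norm(s) * np.linalg.norm(y_e):
        return E                                  # skip update; keep E positive definite
    Es = E @ s
    return E - np.outer(Es, Es) / (s @ Es) + np.outer(y_e, y_e) / sy
```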
Article
Full-text available
Manifold optimization is ubiquitous in computational and applied mathematics, statistics, engineering, machine learning, physics, chemistry, etc. One of the main challenges usually is the non-convexity of the manifold constraints. By utilizing the geometry of the manifold, a large class of constrained optimization problems can be viewed as unconstrained optimization problems on the manifold. From this perspective, intrinsic structures, optimality conditions and numerical algorithms for manifold optimization are investigated. Some recent progress on the theoretical results of manifold optimization is also presented.
... In addition to being a flexible tool for new algorithm development, KSSOLV 2.0 can also be easily used to study the properties of molecules and solids. It serves as both a research and teaching tool for researchers engaged in the simulation and prediction of chemical and material properties, such as linear-response time-dependent density functional theory [11], many-electron self energy calculations [18], structure optimization [19], photocatalytic ...
... where Ψ_C represents the leading N_e rows of the row-permuted Ψ, where the permutation is defined by the permutation matrix Π obtained in (19). Note that the localized columns in Φ are not necessarily orthonormal. ...
Preprint
Full-text available
KSSOLV (Kohn-Sham Solver) is a MATLAB toolbox for performing Kohn-Sham density functional theory (DFT) calculations with a plane-wave basis set. KSSOLV 2.0 preserves the design features of the original KSSOLV software to allow users and developers to easily set up a problem and perform ground-state calculations as well as to prototype and test new algorithms. Furthermore, it includes new functionalities such as new iterative diagonalization algorithms, k-point sampling for electron band structures, geometry optimization and advanced algorithms for performing DFT calculations with local, semi-local, and hybrid exchange-correlation functionals. It can be used to study the electronic structures of both molecules and solids. We describe these new capabilities in this work through a few use cases. We also demonstrate the numerical accuracy and computational efficiency of KSSOLV on a variety of examples.
Article
Full-text available
KSSOLV (Kohn-Sham Solver) is a MATLAB toolbox for performing Kohn-Sham density functional theory (DFT) calculations with a plane-wave basis set. KSSOLV 2.0 preserves the design features of the original KSSOLV software to allow users and developers to easily set up a problem and perform ground-state calculations as well as to prototype and test new algorithms. Furthermore, it includes new functionalities such as new iterative diagonalization algorithms, k-point sampling for electron band structures, geometry optimization and advanced algorithms for performing DFT calculations with local, semi-local, and hybrid exchange-correlation functionals. It can be used to study the electronic structures of both molecules and solids. We describe these new capabilities in this work through a few use cases. We also demonstrate the numerical accuracy and computational efficiency of KSSOLV on a variety of examples.
Program summary
Program title: Kohn-Sham Solver 2.0 (KSSOLV 2.0)
CPC Library link to program files: https://doi.org/10.17632/pp8vgvfcv4.1
Developer's repository link: https://bitbucket.org/berkeleylab/kssolv2.0/src/release/
Licensing provisions: BSD 3-clause
Programming language: MATLAB
Nature of problem: KSSOLV 2.0 is used to perform Kohn-Sham density functional theory based electronic structure calculations to study chemical and material properties of molecules and solids. The key problem to be solved is a constrained energy minimization problem, which can also be formulated as a nonlinear eigenvalue problem.
Solution method: KSSOLV 2.0 implements both the self-consistent field (SCF) iteration with a variety of acceleration strategies and a direct constrained minimization algorithm. It is written completely in MATLAB and uses MATLAB's object oriented programming features to make it easy to use and modify.
... such as tensor approximation (with missing entries), joint diagonalization, joint t-SVD, (sparse) tensor PCA, and beyond; these will be introduced in Sect. 4. In fact, when l = 1, (1.2) boils down to optimization over the orthogonal matrix constraint, namely, the Stiefel manifold, which is a special Riemannian manifold. In recent years, Riemannian manifold optimization has drawn much attention; see, e.g., [7,10,14-17]; fundamental concepts, tools, and algorithms can be found in [1,3,36]. Classical methods in the Euclidean space, including the gradient descent/conjugate gradient/(quasi-)Newton's method/trust region method, have been generalized to Riemannian manifolds. ...
... We expect that the derived formulas can serve as fundamental tools for designing and analyzing Riemannian gradient/conjugate gradient/(quasi-)Newton methods, etc., for optimization over the tensor Stiefel manifold. In particular, as these components are consistent with their matrix counterparts, it is expected that the recently developed algorithms over the matrix Stiefel manifold, such as [7,10,14], can be transplanted to the tensor setting without many modifications. ...
Preprint
Let * denote the t-product between two third-order tensors. The purpose of this work is to study fundamental computation over the set $St(n,p,l) := \{\mathcal{Q} \in \mathbb{R}^{n\times p\times l} \mid \mathcal{Q}^{\top} * \mathcal{Q} = \mathcal{I}\}$, where $\mathcal{Q}$ is a third-order tensor of size $n\times p\times l$ ($n\geq p$) and $\mathcal{I}$ is the identity tensor. It is first verified that St(n,p,l) endowed with the usual Frobenius norm forms a Riemannian manifold, which is termed as the (third-order) tensor Stiefel manifold in this work. We then derive the tangent space, Riemannian gradient, and Riemannian Hessian on St(n,p,l). In addition, formulas of various retractions based on t-QR, t-polar decomposition, Cayley transform, and t-exponential, as well as vector transports, are presented. It is expected that analogous to their matrix counterparts, the formulas derived in this study may provide basic building blocks for designing and analyzing Riemannian algorithms, such as Riemannian gradient/conjugate gradient/(quasi-)Newton methods for optimization over the tensor Stiefel manifold.
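One of the retractions mentioned here, the t-QR retraction, can be sketched directly from the usual definition of the t-product via the DFT along the third mode. The following Python snippet is a minimal illustration under that assumption; the function names, the omission of the sign normalization that makes t-QR unique, and the small feasibility check at the end are our own simplifications, not the paper's exact formulas.

```python
import numpy as np

def t_qr_q(X):
    """Q factor of the t-QR factorization of a third-order tensor X of size
    n x p x l, computed slice-wise in the Fourier domain along the third mode."""
    Xf = np.fft.fft(X, axis=2)
    Qf = np.empty_like(Xf)
    for k in range(X.shape[2]):
        Qf[:, :, k], _ = np.linalg.qr(Xf[:, :, k])
    return np.real(np.fft.ifft(Qf, axis=2))

def t_product(A, B):
    """t-product A * B: frontal-slice products in the Fourier domain."""
    Af, Bf = np.fft.fft(A, axis=2), np.fft.fft(B, axis=2)
    Cf = np.einsum('ijk,jmk->imk', Af, Bf)
    return np.real(np.fft.ifft(Cf, axis=2))

def t_transpose(A):
    """Tensor transpose for the t-product: transpose each frontal slice and
    reverse the order of slices 2..l."""
    At = np.transpose(A, (1, 0, 2))
    return np.concatenate([At[:, :, :1], At[:, :, :0:-1]], axis=2)

def t_qr_retraction(Q, xi):
    """Retract a tangent perturbation xi at Q back onto St(n, p, l) via t-QR."""
    return t_qr_q(Q + xi)

# quick feasibility check: Q^T * Q should be (numerically) the identity tensor
rng = np.random.default_rng(0)
Q0 = t_qr_q(rng.standard_normal((6, 3, 4)))
I_tensor = t_product(t_transpose(Q0), Q0)   # slice 0 ~ identity, other slices ~ 0
```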
... Optimization problems with orthogonality constraints have been adequately investigated in recent decades. Quite a few algorithms and solvers have emerged, such as gradient approaches [31,32,4], conjugate gradient approaches [15,3], constraint preserving updating schemes [40,27], Newton methods [25,24], trust-region methods [5], multipliers correction frameworks [18,38], and orthonormalization-free approaches [19,42]. The aforementioned algorithms, designed for smooth objective functions, are not applicable to the problem (2). ...
... Combining (23), (24) and (25), we obtain the assertion (21). Then it follows from the definition of h ...
Preprint
As a prominent variant of principal component analysis (PCA), sparse PCA attempts to find sparse loading vectors when conducting dimension reduction. This paper aims to compute sparse PCA by solving an optimization problem that pursues orthogonality and sparsity simultaneously. We propose a splitting and alternating approach, leading to an efficient distributed algorithm, called DAL1, for solving this nonconvex and nonsmooth optimization problem. Convergence of DAL1 to stationary points has been rigorously established. Computational experiments demonstrate that, due to its fast convergence in terms of iteration count, DAL1 requires far fewer rounds of communication to reach the prescribed accuracy than existing peer methods. Unlike existing algorithms, DAL1 carries a relatively small risk of data leakage.
... In [38], Wen and Yin applied the Cayley transform to preserve the orthogonality constraints and developed curvilinear search algorithms with lower flops compared to those based on projections and geodesics. In [24], structured quasi-Newton methods were studied for optimization problems with orthogonality constraints. In [14], Gao et al. proposed a proximal linearized augmented Lagrangian algorithm for solving optimization problems with orthogonality constraints. ...
Preprint
Full-text available
Quadratic minimization problems with orthogonality constraints (QMPO) play an important role in many applications of science and engineering. However, some existing methods may suffer from low accuracy or a heavy workload for large-scale QMPO. Krylov subspace methods are popular for large-scale optimization problems. In this work, we propose a block Lanczos method for solving the large-scale QMPO. In the proposed method, the original problem is projected into a small-sized one, and the Riemannian Trust-Region method is employed to solve the reduced QMPO. Convergence results on the optimal solution, the optimal objective function value, the multiplier and the KKT error are established. Moreover, we give the convergence speed of the optimal solution, and show that if the block Lanczos process terminates, then an exact KKT solution is derived. Numerical experiments illustrate the numerical behavior of the proposed algorithm, and demonstrate that it is more powerful than many state-of-the-art algorithms for large-scale quadratic minimization problems with orthogonality constraints.
... Related works Optimization problems with orthogonality constraints have been actively investigated in recent decades, for which many algorithms and solvers have been developed, such as gradient approaches [20-23], conjugate gradient approaches [24-26], constraint preserving updating schemes [27,28], Newton methods [29,30], trust-region methods [31], multipliers correction frameworks [32,33], and orthonormalization-free approaches [34,35]. These aforementioned algorithms are designed for smooth objective functions, and are generally not suitable for problem (2). ...
Article
Full-text available
Sparse principal component analysis (PCA) improves interpretability of the classic PCA by introducing sparsity into the dimension-reduction process. Optimization models for sparse PCA, however, are generally non-convex, non-smooth and more difficult to solve, especially on large-scale datasets requiring distributed computation over a wide network. In this paper, we develop a distributed and centralized algorithm called DSSAL1 for sparse PCA that aims to achieve low communication overheads by adapting a newly proposed subspace-splitting strategy to accelerate convergence. Theoretically, convergence to stationary points is established for DSSAL1. Extensive numerical results show that DSSAL1 requires far fewer rounds of communication than state-of-the-art peer methods. In addition, we make the case that since messages exchanged in DSSAL1 are well-masked, the possibility of private-data leakage in DSSAL1 is much lower than in some other distributed algorithms.
... On the other hand, the optimization methods also play an important role; they can be grouped into three categories [43]. 1) First-order methods, e.g., SGD [38] and its variants [27,16,33], have been used with a wide range of models including ConvNet [29,42,22], LSTM [23], Transformer [47,15] and MLP [45,46]. 2) High-order methods [40,25,35] utilize the curvature information but face the challenge of calculating or approximating the Hessian matrix [8,39]. ...
Preprint
The well-designed structures in neural networks reflect the prior knowledge incorporated into the models. However, though different models have various priors, we are used to training them with model-agnostic optimizers (e.g., SGD). In this paper, we propose a novel paradigm of incorporating model-specific prior knowledge into optimizers and using them to train generic (simple) models. As an implementation, we propose a novel methodology to add prior knowledge by modifying the gradients according to a set of model-specific hyper-parameters, which is referred to as Gradient Re-parameterization, and the optimizers are named RepOptimizers. For the extreme simplicity of model structure, we focus on a VGG-style plain model and showcase that such a simple model trained with a RepOptimizer, which is referred to as RepOpt-VGG, performs on par with the recent well-designed models. From a practical perspective, RepOpt-VGG is a favorable base model because of its simple structure, high inference speed and training efficiency. Compared to Structural Re-parameterization, which adds priors into models via constructing extra training-time structures, RepOptimizers require no extra forward/backward computations and solve the problem of quantization. The code and models are publicly available at https://github.com/DingXiaoH/RepOptimizers.
... We also mention the existence of useful algorithms for electronic structure calculations that are in fact Stiefel optimizers (e.g., [40-45]). However, to the best of our knowledge they are momentumless, based on retractions, sometimes not exactly manifold preserving, and sometimes even implicit. ...
Preprint
The problem of optimization on the Stiefel manifold, i.e., minimizing functions of (not necessarily square) matrices that satisfy orthogonality constraints, has been extensively studied, partly due to rich machine learning applications. Yet, a new approach is proposed based on, for the first time, an interplay between thoughtfully designed continuous and discrete dynamics. It leads to a gradient-based optimizer with intrinsically added momentum. This method exactly preserves the manifold structure but does not require commonly used projection or retraction, and thus has low computational costs when compared to existing algorithms. Its generalization to adaptive learning rates is also demonstrated. Pleasant performances are observed in various practical tasks. For instance, we discover that placing orthogonal constraints on attention heads of trained-from-scratch Vision Transformer [Dosovitskiy et al. 2022] could remarkably improve its performance, when our optimizer is used, and it is better that each head is made orthogonal within itself but not necessarily to other heads. This optimizer also makes the useful notion of Projection Robust Wasserstein Distance [Paty & Cuturi 2019][Lin et al. 2020] for high-dim. optimal transport even more effective.
... Additionally, [2] analyzed the global convergence of general Riemannian line-search methods that use the Armijo rule to determine the step-size, under the assumption that all the search directions are gradient-related. The asymptotic convergence of several specific line-search methods in the Riemannian setting has been analyzed in many papers; for instance, the Newton method is analyzed in [4,17,36], Riemannian quasi-Newton methods in [15,18,40], Riemannian conjugate gradient methods in [23,33,36,37,39,46], and several gradient-type methods are proposed and analyzed in [7,8,16,19,32,34]. All the papers listed above present convergence analyses in which the step-size is chosen so that there is a monotone decrease in the objective function values, achieved using the Armijo rule, the strong Wolfe conditions, among others. ...
Article
Full-text available
In this paper, we analyze the global convergence of a general non-monotone line search method on Riemannian manifolds. To this end, we introduce some properties for the tangent search directions that guarantee the convergence, to a stationary point, of this family of optimization methods under appropriate assumptions. A modified version of the non-monotone line search of Zhang and Hager is the chosen globalization strategy to determine the step-size at each iteration. In addition, we develop a new globally convergent Riemannian conjugate gradient method that satisfies the direction assumptions introduced in this work. Finally, some numerical experiments are performed in order to demonstrate the effectiveness of the new procedure.
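For reference, the Zhang-Hager non-monotone strategy mentioned above replaces the latest function value in the Armijo test with a weighted running average C_k. A minimal Euclidean sketch is given below; the Riemannian version analyzed in the paper would replace x + alpha*d with a retraction, and the parameter values and function names here are typical defaults of ours, not those of the paper.

```python
def zhang_hager_linesearch(f, x, fx_ref, g_dot_d, d, alpha0=1.0,
                           delta=1e-4, shrink=0.5, max_backtracks=30):
    """Backtracking line search with the Zhang-Hager nonmonotone reference
    value fx_ref (C_k) in place of f(x_k). g_dot_d = grad f(x)^T d < 0.
    Returns (alpha, f_new)."""
    alpha = alpha0
    for _ in range(max_backtracks):
        f_new = f(x + alpha * d)
        if f_new <= fx_ref + delta * alpha * g_dot_d:   # nonmonotone Armijo test
            return alpha, f_new
        alpha *= shrink
    return alpha, f_new

def update_reference(C, Q, f_new, eta=0.85):
    """Zhang-Hager averaging: Q_{k+1} = eta*Q_k + 1 and
    C_{k+1} = (eta*Q_k*C_k + f(x_{k+1})) / Q_{k+1}."""
    Q_new = eta * Q + 1.0
    C_new = (eta * Q * C + f_new) / Q_new
    return C_new, Q_new
```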
... Recent years have witnessed the development of algorithms for optimization problems over the Stiefel manifold, including gradient descent approaches [24,28,1,3], conjugate gradient approaches [9,33,53], constraint preserving updating schemes [41,20], Newton methods [19,18], trust-region methods [2], multipliers correction frameworks [13,38], augmented Lagrangian methods [14], sequential linearized proximal gradient algorithms [45], and so on. Moreover, exact penalty models have been constructed for this special type of problem, such as PenC [44] and ExPen [43]. ...
Preprint
In this paper, we focus on the decentralized optimization problem over the Stiefel manifold, which is defined on a connected network of d agents. The objective is an average of d local functions, and each function is privately held by an agent and encodes its data. The agents can only communicate with their neighbors in a collaborative effort to solve this problem. In existing methods, multiple rounds of communications are required to guarantee the convergence, giving rise to high communication costs. In contrast, this paper proposes a decentralized algorithm, called DESTINY, which only invokes a single round of communications per iteration. DESTINY combines gradient tracking techniques with a novel approximate augmented Lagrangian function. The global convergence to stationary points is rigorously established. Comprehensive numerical experiments demonstrate that DESTINY has a strong potential to deliver a cutting-edge performance in solving a variety of testing problems.
... ≤ λ_n. The computation of eigenvalues and eigenvectors is a fundamental problem with important applications in scientific computing and engineering such as principal component analysis [18] and electronic structure calculation [26]. In practice, it is usually realistic to compute only the first r ≪ n largest/smallest eigenvalues and their corresponding eigenvectors of the matrix A. ...
Chapter
The Barzilai-Borwein (BB) gradient method, which employs two-point stepsizes computed by the information of two consecutive iterations, is efficient in solving large-scale unconstrained optimization. In this paper, motivated by the success of BB method and the multi-point stepsize proposed by Dai and Fletcher (Mathematical Programming, 2006, 106: 403–421), we develop a new efficient gradient method, called MPSG, which adaptively uses the Dai-Fletcher long stepsize and Dai-Fletcher short stepsize. The R-linear convergence of MPSG for general n-dimensional strictly convex quadratic functions is established. By making use of two modified multi-point stepsizes and nonmonotone line searches, MPSG is extended to solve general unconstrained optimization. Moreover, the proposed MPSG method is further generalized for solving extreme eigenvalue problems. Numerical experiments on quadratic optimization, general unconstrained optimization and extreme eigenvalue problems demonstrate the efficiency of our method.
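The two-point BB stepsizes referred to in this abstract can be written down in a few lines. The sketch below is a plain gradient method that alternates the long and short BB stepsizes on a strictly convex quadratic; it only illustrates the stepsizes themselves, not the Dai-Fletcher multi-point rule or the MPSG method, and all names and parameter values are our own.

```python
import numpy as np

def bb_gradient_method(grad, x0, max_iter=500, tol=1e-8, alpha0=1e-3):
    """Gradient method with alternating Barzilai-Borwein stepsizes
    BB1 = s^T s / s^T y and BB2 = s^T y / y^T y, where s = x_k - x_{k-1}
    and y = g_k - g_{k-1}."""
    x = x0.copy()
    g = grad(x)
    alpha = alpha0
    for k in range(max_iter):
        if np.linalg.norm(g) <= tol:
            break
        x_new = x - alpha * g
        g_new = grad(x_new)
        s, y = x_new - x, g_new - g
        sy = s @ y
        if sy > 0:
            # alternate between the long (BB1) and short (BB2) stepsizes
            alpha = (s @ s) / sy if k % 2 == 0 else sy / (y @ y)
        x, g = x_new, g_new
    return x

# usage on a convex quadratic f(x) = 0.5 x^T A x - b^T x
A = np.diag(np.arange(1.0, 101.0)); b = np.ones(100)
x_star = bb_gradient_method(lambda x: A @ x - b, np.zeros(100))
```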
... For a given n×n real symmetric positive definite matrix A, we are interested in the first r ≪ n largest/smallest eigenvalues and their corresponding eigenvectors, which have important applications in scientific and engineering computing such as principal component analysis [19] and electronic structure calculation [29]. This problem can be formulated as an unconstrained optimization problem [1] $\min_{X\in \mathbb{R}^{n\times r}} \mathrm{tr}(X^{\top} A X (X^{\top} X)^{-1})$ (4.1) or a constrained optimization problem with orthogonality constraints [40,41] $\min_{X\in \mathbb{R}^{n\times r}} \mathrm{tr}(X^{\top} A X)$ s.t. ...
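As a quick sanity check of the equivalence quoted above, the following Python snippet evaluates the unconstrained trace-ratio objective at an orthonormal set of eigenvectors and confirms that it equals the sum of the r smallest eigenvalues, which is also the optimal value of the constrained formulation; the matrix sizes and names are arbitrary illustrations of ours.

```python
import numpy as np

rng = np.random.default_rng(1)
n, r = 50, 3
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)          # symmetric positive definite test matrix

def unconstrained_objective(X):
    """tr(X^T A X (X^T X)^{-1}) -- well defined without any orthogonality constraint."""
    return np.trace(X.T @ A @ X @ np.linalg.inv(X.T @ X))

eigvals, eigvecs = np.linalg.eigh(A)
X_opt = eigvecs[:, :r]               # eigenvectors of the r smallest eigenvalues
# at the minimizer both formulations give the sum of the r smallest eigenvalues
assert np.isclose(unconstrained_objective(X_opt), eigvals[:r].sum())
```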
Preprint
Full-text available
A novel gradient stepsize is derived with the motivation of equipping the Barzilai-Borwein (BB) method with the two-dimensional quadratic termination property. A remarkable feature of the novel stepsize is that its computation only depends on the BB stepsizes in previous iterations and does not require any exact line search or the Hessian, and hence it can easily be extended to nonlinear optimization. By adaptively taking long BB steps and some short steps associated with the new stepsize, we develop an efficient gradient method for quadratic optimization and general unconstrained optimization and extend it to solve extreme eigenvalue problems. The proposed method is further extended for box-constrained optimization and singly linearly box-constrained optimization by incorporating gradient projection techniques. Numerical experiments demonstrate that the proposed method outperforms the most successful gradient methods in the literature.
... Regarding second-order algorithms, the Riemannian trust-region method is a commonly used approach. Recently, [161] proposed an adaptive regularized Newton algorithm on manifolds, [165,166] extended cubic regularization methods to manifold optimization, and [167] designed a structured quasi-Newton algorithm for problems with orthogonality constraints. Tensor decomposition was first proposed by Hitchcock [175]; this decomposition is now known as the CANDECOMP/PARAFAC (CP) decomposition. ...
Article
Over the past few decades, significant progress has been made in the field of operations research, particularly in optimization theory and its applications. This article investigates many subjects in this field, such as linear programming, nonlinear programming, online optimization, machine learning, combinatorial optimization, integer programming, mechanism design, inventory management, revenue management, etc. This survey paper does not intend to provide an encyclopedic review of the whole field, but rather focuses on mainstream methodology, research frameworks, and the newest advances in some important subjects. It particularly emphasizes several recent interesting and meaningful discoveries, which could potentially stimulate more quality research in the field.
... On large-residual problems, approaches that compensate the Gauss-Newton matrix with a quasi-Newton approximation to the complicated part of the Hessian matrix are often much better [25,34]. This idea has been further verified for optimization problems with orthogonality constraints in [17]. In this paper, we exploit the partial Hessian information in the stochastic setting for problem (1). ...
Preprint
In this paper, we consider large-scale finite-sum nonconvex problems arising from machine learning. Since the Hessian is often a summation of a relatively cheap and accessible part and an expensive or even inaccessible part, a stochastic quasi-Newton matrix is constructed using partial Hessian information as much as possible. By further exploiting the low-rank structures based on the Nyström approximation, the computation of the quasi-Newton direction is affordable. To make full use of the gradient estimation, we also develop an extra-step strategy for this framework. Global convergence to a stationary point in expectation and a local superlinear convergence rate are established under some mild assumptions. Numerical experiments on logistic regression, deep autoencoder networks and deep learning problems show that the efficiency of our proposed method is at least comparable with that of state-of-the-art methods.
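The Nyström approximation mentioned in this abstract builds a low-rank surrogate of a symmetric positive semidefinite operator from a handful of matrix-vector products. A minimal Python sketch of that generic construction follows; it is not the specific quasi-Newton matrix of the paper, and the function names and the small test case are our own.

```python
import numpy as np

def nystrom_approximation(matvec, n, sketch_size, rng=None):
    """Nystrom approximation of an n-by-n symmetric PSD operator from
    matrix-vector products: H ~ (H Omega) (Omega^T H Omega)^+ (H Omega)^T."""
    rng = rng or np.random.default_rng()
    Omega = rng.standard_normal((n, sketch_size))
    Y = np.column_stack([matvec(Omega[:, j]) for j in range(sketch_size)])  # H @ Omega
    core = np.linalg.pinv(Omega.T @ Y)          # (Omega^T H Omega)^+
    return Y @ core @ Y.T

# usage: approximate a low-rank PSD "expensive" Hessian part from a few products
n = 200
U = np.random.default_rng(2).standard_normal((n, 5))
H_e = U @ U.T                                   # rank-5 PSD matrix
H_e_approx = nystrom_approximation(lambda v: H_e @ v, n, sketch_size=10)
print(np.linalg.norm(H_e - H_e_approx) / np.linalg.norm(H_e))  # ~ machine precision
```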
... Absil et al. [2009] provide a comprehensive treatment, showing how first-order and second-order algorithms are extended to the Riemannian setting and proving asymptotic convergence to first-order stationary points. Global sublinear convergence results have been established for Riemannian gradient descent and Riemannian trust region algorithms, and the latter approach has further been shown to converge to a second-order stationary point in polynomial time; see also Kasai and Mishra [2018], Hu et al. [2018, 2019]. In contradistinction to the Euclidean setting, the Riemannian trust region algorithm requires a Hessian oracle. ...
Preprint
Full-text available
Projection robust Wasserstein (PRW) distance, or Wasserstein projection pursuit (WPP), is a robust variant of the Wasserstein distance. Recent work suggests that this quantity is more robust than the standard Wasserstein distance, in particular when comparing probability measures in high dimensions. However, it has been ruled out for practical application because the optimization model is essentially non-convex and non-smooth, which makes the computation intractable. Our contribution in this paper is to revisit the original motivation behind WPP/PRW, but take the hard route of showing that, despite its non-convexity and non-smoothness, and even despite some hardness results proved by Niles-Weed and Rigollet [2019] in a minimax sense, the original formulation for PRW/WPP can be efficiently computed in practice using Riemannian optimization, yielding in relevant cases better behavior than its convex relaxation. More specifically, we provide three simple algorithms with solid theoretical guarantees on their complexity bounds (one in the appendix), and demonstrate their effectiveness and efficiency by conducting extensive experiments on synthetic and real data. This paper provides a first step into a computational theory of the PRW distance and provides the links between optimal transport and Riemannian optimization.
... This search makes SGD very computationally expensive on problems with large training data. Also, being sequential, SGD is more difficult to parallelize, allowing only data parallelism when training the models [17,18]. In the reinforcement learning space, Rafati et al. [7] note that deep reinforcement learning applications using SGD require a large memory store (Replay Buffers) and that the optimization often gets stuck in local optima, presenting further challenges to model generalization in learning applications. ...
Preprint
With advances in deep learning, exponential data growth and increasing model complexity, developing efficient optimization methods is attracting much research attention. Several implementations favor the use of Conjugate Gradient (CG) and Stochastic Gradient Descent (SGD) as practical and elegant solutions to achieve quick convergence; however, these optimization processes also present many limitations in learning across deep learning applications. Recent research is exploring higher-order optimization functions as better approaches, but these present very complex computational challenges for practical use. Comparing first- and higher-order optimization functions, our experiments in this paper reveal that Levenberg-Marquardt (LM) significantly supersedes optimal convergence but suffers from very large processing times, increasing the training complexity of both classification and reinforcement learning problems. Our experiments compare off-the-shelf optimization functions (CG, SGD, LM and L-BFGS) in standard CIFAR, MNIST, CartPole and FlappyBird experiments. The paper presents arguments on which optimization functions to use and, further, which functions would benefit from parallelization efforts to improve pretraining time and learning-rate convergence.
... Compared with first-order optimization methods, high-order methods [3], [4], [5] converge at a faster speed, as the curvature information makes the search direction more effective. High-order optimization methods attract widespread attention but face more challenges. ...
Article
Machine learning has developed rapidly, achieving many theoretical breakthroughs and being widely applied in various fields. Optimization, as an important part of machine learning, has attracted much attention from researchers. With the exponential growth of data volume and the increase of model complexity, optimization methods in machine learning face more and more challenges. A lot of work on solving optimization problems or improving optimization methods in machine learning has been proposed. A systematic retrospective and summary of optimization methods from the perspective of machine learning is of great significance, offering guidance for developments in both optimization and machine learning research. In this article, we first describe the optimization problems in machine learning. Then, we introduce the principles and progress of commonly used optimization methods. Finally, we explore and give some challenges and open problems for optimization in machine learning.
... Due to the wide applicability and fundamental difficulty of minimizing a differentiable function over St(n, p), several iterative methods have been proposed to solve problem (1). Most of the proposed methods correspond to first-order methods based on Riemannian gradient methods [2,15-20], generalizations of the non-linear conjugate gradient method [15,21-23], and also Newton and quasi-Newton methods [24-27]. Other methods based on generalized power iterations, augmented Lagrangian multiplier algorithms and Bregman iterations have been proposed in [5,13,28]. ...
Conference Paper
Full-text available
In this paper we consider a class of iterative gradient projection methods for solving optimization problems with orthogonality constraints. The proposed method can be seen as a forward-backward gradient projection method which is an extension of a gradient method based on the Cayley transform. The proposal incorporates a self-adaptive scaling matrix and the Barzilai-Borwein step-sizes that accelerate the convergence of the method. In order to preserve feasibility, we adopt a projection operator based on the QR factorization. We demonstrate the efficiency of our procedure on several test problems including eigenvalue computations and sparse principal component analysis. Numerical comparisons show that our proposal is effective for solving these kinds of problems and presents competitive results compared with some state-of-the-art methods.
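To make the feasibility-preserving mechanism described above concrete, the following Python sketch performs one gradient step and restores orthogonality with a QR-based projection. It uses a fixed stepsize and the embedded-metric tangent projection; the self-adaptive scaling and BB stepsizes of the paper are not reproduced, and all names are illustrative.

```python
import numpy as np

def qr_projection(Y):
    """Map a full-rank n-by-p matrix to the Stiefel manifold via the Q factor
    of its QR factorization (columns sign-fixed so the map is well defined)."""
    Q, R = np.linalg.qr(Y)
    return Q * np.sign(np.diag(R))

def gradient_projection_step(X, egrad, alpha):
    """One feasible step: move along the projected gradient, then re-orthogonalize."""
    G = egrad(X)
    rgrad = G - X @ (X.T @ G + G.T @ X) / 2   # tangent-space projection of G at X
    return qr_projection(X - alpha * rgrad)
```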
Preprint
Full-text available
Machine learning has developed rapidly, achieving many theoretical breakthroughs and being widely applied in various fields. Optimization, as an important part of machine learning, has attracted much attention from researchers. With the exponential growth of data volume and the increase of model complexity, optimization methods in machine learning face more and more challenges. A lot of work on solving optimization problems or improving optimization methods in machine learning has been proposed. A systematic retrospective and summary of optimization methods from the perspective of machine learning is of great significance, offering guidance for developments in both optimization and machine learning research. In this paper, we first describe the optimization problems in machine learning. Then, we introduce the principles and progress of commonly used optimization methods. Next, we summarize the applications and developments of optimization methods in some popular machine learning fields. Finally, we explore and give some challenges and open problems for optimization in machine learning.
Article
Nowadays, anodized coatings on additively manufactured (AM) or 3D-printed Al-10Si-Mg alloy are used for various components in spacecraft such as antenna feeds, wave guides, structural brackets, collimators, thermal radiators, etc. In this study, Artificial Neural Network (ANN) and power-law-based models are developed from experimental nanoindentation data for predicting the elastic modulus and hardness of anodized AM Al-10Si-Mg at any desired load. Data from nanoindentation experiments conducted on plan- and cross-sections of the anodized coating on AM Al-10Si-Mg alloy were considered for modelling. Apart from nanomechanical properties, load and displacement curves were predicted in Python from the ANN model and the power-law model of nanoindentation. It is observed that the ANN model of the 50 mN nanoindentation experimental data can accurately predict the loading pattern at any desired load below 50 mN. The elastic modulus and hardness of anodized AM Al-10Si-Mg computed from the ANN and power-law models of the unloading curve are also comparable with the values obtained from the Weibull distribution analysis reported elsewhere. The derived models were also used to predict nanomechanical properties at 25 and 35 mN, for which no experimental data were available. The computed hardness of the plan section of the anodic coating is 3.99 and 4.02 GPa for 25 and 35 mN, respectively. The computed hardness of the cross-section of the anodic coating is 7.16 and 6.61 GPa for 25 and 35 mN, respectively. Thus, the ANN and power-law models of nanoindentation can predict elastic modulus and hardness at different loads from a minimum number of experiments. This novel approach to predicting nanomechanical properties using ANN provides realistic and design-specific data on the hardness and modulus of the anodized coating on AM Al-10Si-Mg alloy.
Article
Let * denote the t-product between two third-order tensors proposed by Kilmer and Martin (Linear Algebra Appl 435(3): 641–658, 2011). The purpose of this work is to study fundamental computation over the set $\textrm{St}(n,p,l) := \{\mathcal{X} \in \mathbb{R}^{n\times p \times l} \mid \mathcal{X}^{\top} * \mathcal{X} = \mathcal{I}\}$, where $\mathcal{X}$ is a third-order tensor of size $n\times p \times l$ ($n\geqslant p$) and $\mathcal{I}$ is the identity tensor. It is first verified that $\textrm{St}(n,p,l)$ endowed with the Euclidean metric forms a Riemannian manifold, which is termed as the (third-order) tensor Stiefel manifold in this work. We then derive the tangent space, Riemannian gradient, and Riemannian Hessian on $\textrm{St}(n,p,l)$. In addition, formulas of various retractions based on t-QR, t-polar decomposition, t-Cayley transform, and t-exponential, as well as vector transports, are presented. It is expected that analogous to their matrix counterparts, the derived formulas may serve as building blocks for analyzing optimization problems over the tensor Stiefel manifold and designing Riemannian algorithms.
Chapter
With today's modern technology and lifestyle, vast amounts of data are generated exponentially day by day. Storing, processing, and analyzing such huge data is a complex problem and challenging for many applications. Since the data are extremely large, a conventional machine learning algorithm may not perform well with regard to time complexity, data scalability, and accuracy. A good amount of research work is being carried out by different research groups to solve such problems, including distributed and parallel computing, GPU-based parallel and distributed computing, etc. A Support Vector Machine (SVM) is a popular machine learning algorithm for classification problems, and a systematic retrospective of SVM algorithms from the perspective of big data is of ample significance, which motivates the development of SVM optimization in parallel distributed computing. In this paper, we describe state-of-the-art SVM algorithms with their pros, cons, and challenges in parallel and distributed computing methods. Mechanisms for efficient computation of SVM for big data and avenues for future research are also explored.
Preprint
This paper considers the optimization problem in the form of $\min_{X \in \mathcal{F}_v} f(X) + \lambda \|X\|_1$, where f is smooth, $\mathcal{F}_v = \{X \in \mathbb{R}^{n \times q} : X^T X = I_q, v \in \mathrm{span}(X)\}$, and v is a given positive vector. The clustering models including but not limited to the models used by k-means, community detection, and normalized cut can be reformulated as such optimization problems. It is proven that the domain $\mathcal{F}_v$ forms a compact embedded submanifold of $\mathbb{R}^{n \times q}$ and optimization-related tools are derived. An inexact accelerated Riemannian proximal gradient method is proposed and its global convergence is established. Numerical experiments on community detection in networks and normalized cut for image segmentation are used to demonstrate the performance of the proposed method.
Chapter
Hundreds of billions of connected IoT devices will populate the earth in the future. These devices have restricted resources and interact with their environment. Machine learning models will be applied on these devices to analyse the behaviour of sensor data and deliver better predictions. With so many interconnected devices, network congestion is a problem. As a result of introducing optimisation into machine learning, computations can now be conducted on edge devices, reducing network congestion. Many researchers are interested in optimisation as part of machine learning. We continue to face obstacles in adopting optimisation approaches in machine learning as the quantity of data and the complexity of machine learning models grow. Many studies have been proposed to improve machine learning optimisation techniques and solve optimisation difficulties. The major goal of this research is to learn more about machine learning optimisation strategies that ensure the execution of these models in IoT. The first step is to conduct a survey of machine learning techniques utilised in the Internet of Things. Second, issues in using optimisation in machine learning are discussed, along with a review of several optimisation approaches used in machine learning models. Finally, some of the open challenges in machine learning optimisation are discussed.
Article
In this paper, we focus on the decentralized optimization problem over the Stiefel manifold, which is defined on a connected network of d agents. The objective is an average of d local functions, and each function is privately held by an agent and encodes its data. The agents can only communicate with their neighbors in a collaborative effort to solve this problem. In existing methods, multiple rounds of communications are required to guarantee the convergence, giving rise to high communication costs. In contrast, this paper proposes a decentralized algorithm, called DESTINY, which only invokes a single round of communications per iteration. DESTINY combines gradient tracking techniques with a novel approximate augmented Lagrangian function. The global convergence to stationary points is rigorously established. Comprehensive numerical experiments demonstrate that DESTINY has a strong potential to deliver a cutting-edge performance in solving a variety of testing problems.
Preprint
Full-text available
From optimal transport to robust dimensionality reduction, a plethora of machine learning applications can be cast as min-max optimization problems over Riemannian manifolds. Though many min-max algorithms have been analyzed in the Euclidean setting, it has proved elusive to translate these results to the Riemannian case. Zhang et al. [2022] have recently shown that geodesic convex-concave Riemannian problems always admit saddle-point solutions. Inspired by this result, we study whether a performance gap between Riemannian and optimal Euclidean space convex-concave algorithms is necessary. We answer this question in the negative: we prove that the Riemannian corrected extragradient (RCEG) method achieves last-iterate convergence at a linear rate in the geodesically strongly-convex-concave case, matching the Euclidean result. Our results also extend to the stochastic or non-smooth case, where RCEG and Riemannian gradient ascent descent (RGDA) achieve near-optimal convergence rates up to factors depending on the curvature of the manifold.
Chapter
Artificial neural networks (ANNs) are today the most popular machine learning algorithms. ANNs are widely applied in various fields such as medical imaging and remote sensing. One of the main challenges related to the use of ANNs is the inherent optimization problem to be solved during the training phase. This optimization step is generally performed using a gradient-based approach with a backpropagation strategy. For the sake of efficiency, regularization is generally used. When non-smooth regularizers are used to promote sparse networks, this optimization becomes challenging. Classical gradient-based optimizers cannot be used due to differentiability issues. In this paper, we propose an efficient optimization scheme formulated in a Bayesian framework. Hamiltonian dynamics are used to design an efficient sampling scheme. Promising results show the usefulness of the proposed method in allowing ANNs with low complexity levels to reach high accuracy rates while running faster than with other optimizers.
Article
Optimization with nonnegative orthogonality constraints has wide applications in machine learning and data sciences. It is NP-hard due to some combinatorial properties of the constraints. We first propose an equivalent optimization formulation with nonnegative and multiple spherical constraints and an additional single nonlinear constraint. Various constraint qualifications, the first- and second-order optimality conditions of the equivalent formulation are discussed. By establishing a local error bound of the feasible set, we design a class of (smooth) exact penalty models via keeping the nonnegative and multiple spherical constraints. The penalty models are exact if the penalty parameter is sufficiently large but finite. A practical penalty algorithm with postprocessing is then developed to approximately solve a series of subproblems with nonnegative and multiple spherical constraints. We study the asymptotic convergence and establish that any limit point is a weakly stationary point of the original problem and becomes a stationary point under some additional mild conditions. Extensive numerical results on the problem of computing the orthogonal projection onto nonnegative orthogonality constraints, the orthogonal nonnegative matrix factorization problems and the K-indicators model show the effectiveness of our proposed approach.
Preprint
We consider the vector embedding problem. We are given a finite set of items, with the goal of assigning a representative vector to each one, possibly under some constraints (such as the collection of vectors being standardized, i.e., having zero mean and unit covariance). We are given data indicating that some pairs of items are similar, and optionally, some other pairs are dissimilar. For pairs of similar items, we want the corresponding vectors to be near each other, and for dissimilar pairs, we want the corresponding vectors to not be near each other, measured in Euclidean distance. We formalize this by introducing distortion functions, defined for some pairs of the items. Our goal is to choose an embedding that minimizes the total distortion, subject to the constraints. We call this the minimum-distortion embedding (MDE) problem. The MDE framework is simple but general. It includes a wide variety of embedding methods, such as spectral embedding, principal component analysis, multidimensional scaling, dimensionality reduction methods (like Isomap and UMAP), force-directed layout, and others. It also includes new embeddings, and provides principled ways of validating historical and new embeddings alike. We develop a projected quasi-Newton method that approximately solves MDE problems and scales to large data sets. We implement this method in PyMDE, an open-source Python package. In PyMDE, users can select from a library of distortion functions and constraints or specify custom ones, making it easy to rapidly experiment with different embeddings. Our software scales to data sets with millions of items and tens of millions of distortion functions. To demonstrate our method, we compute embeddings for several real-world data sets, including images, an academic co-author network, US county demographic data, and single-cell mRNA transcriptomes.
Article
Kohn–Sham density functional theory (DFT) is the most widely used electronic structure theory. Despite significant progress in the past few decades, the numerical solution of Kohn–Sham DFT problems remains challenging, especially for large-scale systems. In this paper we review the basics as well as state-of-the-art numerical methods, and focus on the unique numerical challenges of DFT.
Article
Full-text available
The commutator direct inversion of the iterative subspace (commutator DIIS or C-DIIS) method developed by Pulay is an efficient scheme, and the most widely used one in quantum chemistry, to accelerate the convergence of self-consistent field (SCF) iterations in Hartree-Fock theory and Kohn-Sham density functional theory. The C-DIIS method requires the explicit storage of the density matrix, the Fock matrix and the commutator matrix. Hence the method can only be used for systems with a relatively small basis set, such as a Gaussian basis set. We develop a new method that, for the first time, enables the C-DIIS method to be efficiently employed in electronic structure calculations with a large basis set such as planewaves. The key ingredient is the projection of both the density matrix and the commutator matrix to an auxiliary matrix called the gauge-fixing matrix. The resulting projected commutator-DIIS method (PC-DIIS) only operates on matrices of the same dimension as the matrix that consists of Kohn-Sham orbitals. The cost of the method is comparable to that of standard charge mixing schemes used in large basis set calculations. The PC-DIIS method is gauge-invariant, which guarantees that its performance is invariant with respect to any unitary transformation of the Kohn-Sham orbitals. We demonstrate that the PC-DIIS method can be viewed as an extension of an iterative eigensolver for nonlinear problems. We use the PC-DIIS method for accelerating Kohn-Sham density functional theory calculations with hybrid exchange-correlation functionals, and demonstrate its superior performance compared to the commonly used nested two-level SCF iteration procedure.
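For orientation, the extrapolation step that all DIIS variants share, including the projected commutator version described above, solves a small constrained least-squares problem for the mixing coefficients. The Python sketch below shows the classic Pulay construction; it is not the gauge-fixed PC-DIIS algorithm itself, and the function and variable names are ours.

```python
import numpy as np

def diis_extrapolate(trial_vectors, error_vectors):
    """Classic DIIS extrapolation: find coefficients c with sum(c) = 1 that
    minimize || sum_i c_i e_i ||, then mix the stored trial vectors with c."""
    m = len(error_vectors)
    B = np.empty((m + 1, m + 1))
    for i in range(m):
        for j in range(m):
            B[i, j] = np.vdot(error_vectors[i], error_vectors[j])
    B[m, :m] = B[:m, m] = -1.0          # border enforcing the sum(c) = 1 constraint
    B[m, m] = 0.0
    rhs = np.zeros(m + 1)
    rhs[m] = -1.0
    coeffs = np.linalg.solve(B, rhs)[:m]   # the Lagrange multiplier is discarded
    return sum(c * v for c, v in zip(coeffs, trial_vectors))
```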
Article
Full-text available
Optimization on Riemannian manifolds widely arises in eigenvalue computation, density functional theory, Bose-Einstein condensates, low-rank nearest correlation, image registration, and signal processing, etc. We propose an adaptive regularized Newton method which approximates the original objective function by its second-order Taylor expansion in Euclidean space but keeps the Riemannian manifold constraints. The regularization term in the objective function of the subproblem enables us to establish a Cauchy-point-like condition, as in the standard trust-region method, for proving global convergence. The subproblem can be solved inexactly either by first-order methods or by a modified Riemannian Newton method. In the latter case, it can further take advantage of negative curvature directions. Both global convergence and superlinear local convergence are guaranteed under mild conditions. Extensive computational experiments and comparisons with other state-of-the-art methods indicate that the proposed algorithm is very promising.
Article
Full-text available
We consider the minimization of a cost function f on a manifold M using Riemannian gradient descent and Riemannian trust regions (RTR). We focus on satisfying necessary optimality conditions within a tolerance ε. Specifically, we show that, under Lipschitz-type assumptions on the pullbacks of f to the tangent spaces of M, both of these algorithms produce points with Riemannian gradient smaller than ε in O(1/ε^2) iterations. Furthermore, RTR returns a point where also the Riemannian Hessian's least eigenvalue is larger than -ε in O(1/ε^3) iterations. There are no assumptions on initialization. The rates match their (sharp) unconstrained counterparts as a function of the accuracy ε (up to constants) and hence are sharp in that sense. These are the first general results for global rates of convergence to approximate first- and second-order KKT points on manifolds. They apply in particular for optimization constrained to compact submanifolds of R^n, under simpler assumptions.
Article
Full-text available
QUANTUM ESPRESSO is an integrated suite of computer codes for electronic-structure calculations and materials modeling, based on density-functional theory, plane waves, and pseudopotentials (norm-conserving, ultrasoft, and projector-augmented wave). The acronym ESPRESSO stands for opEn Source Package for Research in Electronic Structure, Simulation, and Optimization. It is freely available to researchers around the world under the terms of the GNU General Public License. QUANTUM ESPRESSO builds upon newly-restructured electronic-structure codes that have been developed and tested by some of the original authors of novel electronic-structure algorithms and applied in the last twenty years by some of the leading materials modeling groups worldwide. Innovation and efficiency are still its main focus, with special attention paid to massively parallel architectures, and a great effort being devoted to user friendliness. QUANTUM ESPRESSO is evolving towards a distribution of independent and interoperable codes in the spirit of an open-source project, where researchers active in the field of electronic-structure calculations are encouraged to participate in the project by contributing their own codes or by implementing their own ideas into existing codes.
Article
Full-text available
Minimization with orthogonality constraints (e.g., X^T X = I) and/or spherical constraints (e.g., ‖x‖_2 = 1) has wide applications in polynomial optimization, combinatorial optimization, eigenvalue problems, sparse PCA, p-harmonic flows, 1-bit compressive sensing, matrix rank minimization, etc. These problems are difficult because the constraints are not only non-convex but numerically expensive to preserve during iterations. To deal with these difficulties, we propose to use a Crank-Nicolson-like update scheme to preserve the constraints and, based on it, develop curvilinear search algorithms with lower per-iteration cost compared to those based on projections and geodesics. The efficiency of the proposed algorithms is demonstrated on a variety of test problems. In particular, for the maxcut problem, it exactly solves a decomposition formulation for the SDP relaxation. For polynomial optimization, nearest correlation matrix estimation and extreme eigenvalue problems, the proposed algorithms run very fast and return solutions no worse than those from the state-of-the-art algorithms. For the quadratic assignment problem, a gap of 0.842% to the best known solution on the largest problem "tai256c" in QAPLIB can be reached in 5 minutes on a typical laptop.
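The Crank-Nicolson-like feasible update described in this abstract moves along the curve Y(τ) = (I + τ/2 W)^(-1)(I - τ/2 W)X with the skew-symmetric matrix W = G X^T - X G^T, which keeps X^T X = I for every τ. Below is a compact Python sketch of that update combined with a simple monotone backtracking search; the algorithms in the paper use nonmonotone BB steps, so this is only an illustration, and the function names are ours.

```python
import numpy as np

def cayley_curve(X, W, tau):
    """Feasible curve Y(tau) = (I + tau/2 W)^{-1} (I - tau/2 W) X; since W is
    skew-symmetric, Y(tau)^T Y(tau) = X^T X = I for every tau."""
    I = np.eye(X.shape[0])
    return np.linalg.solve(I + 0.5 * tau * W, (I - 0.5 * tau * W) @ X)

def curvilinear_search(f, egrad, X, tau0=1.0, rho=1e-4, shrink=0.5, max_backtracks=30):
    """Backtracking along the Cayley curve until an Armijo-type decrease holds.
    The derivative of f(Y(tau)) at tau = 0 equals -0.5 * ||W||_F^2."""
    fX = f(X)
    G = egrad(X)
    W = G @ X.T - X @ G.T
    slope = -0.5 * np.linalg.norm(W) ** 2
    tau = tau0
    for _ in range(max_backtracks):
        Y = cayley_curve(X, W, tau)
        if f(Y) <= fX + rho * tau * slope:
            break
        tau *= shrink
    return Y, tau
```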
Article
Full-text available
A general scheme for trust-region methods on Riemannian manifolds is proposed and analyzed. Among the various approaches available to (approximately) solve the trust-region subproblems, particular attention is paid to the truncated conjugate-gradient technique. The method is illustrated on problems from numerical linear algebra.
Article
Full-text available
We give an overview of the fundamental concepts of density functional theory. We give a careful discussion of the several density functionals and their differentiability properties. We show that for nondegenerate ground states we can calculate the necessary functional derivatives by means of linear response theory, but that there are some differentiability problems for degenerate ground states. These problems can be overcome by extending the domains of the functionals. We further show that for every interacting v-representable density we can find a noninteracting v-representable density arbitrarily close and show that this is sufficient to set up a Kohn–Sham scheme. We finally describe two systematic approaches for the construction of density functionals.
Article
Full-text available
In this paper, we provide a theoretical analysis for a cubic regularization of the Newton method as applied to the unconstrained minimization problem. For this scheme, we prove general local convergence results. However, the main contribution of the paper is related to global worst-case complexity bounds for different problem classes, including some nonconvex cases. It is shown that the search direction can be computed by standard linear algebra techniques.
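The search direction referred to here is the minimizer of the cubic model m(s) = g^T s + 1/2 s^T H s + (σ/3)‖s‖^3, which in the so-called easy case can indeed be found with standard linear algebra: solve the scalar secular equation λ = σ‖s(λ)‖ with s(λ) = -(H + λI)^(-1) g. The Python sketch below does that by bisection; it assumes σ > 0, ignores the hard case, and all names and tolerances are illustrative.

```python
import numpy as np

def cubic_regularized_step(g, H, sigma, tol=1e-10):
    """Minimizer of g^T s + 0.5 s^T H s + (sigma/3)||s||^3 in the easy case,
    obtained from the monotone secular equation lam = sigma * ||s(lam)||."""
    n = len(g)
    eigmin = np.linalg.eigvalsh(H)[0]
    def s_of(lam):
        return np.linalg.solve(H + lam * np.eye(n), -g)
    def phi(lam):
        return lam - sigma * np.linalg.norm(s_of(lam))
    lo = max(0.0, -eigmin) + 1e-12          # keep H + lam I positive definite
    if phi(lo) >= 0:
        return s_of(lo)
    hi = lo + 1.0
    while phi(hi) < 0:                      # bracket the root of the increasing phi
        hi *= 2.0
    for _ in range(200):                    # bisection on the secular equation
        mid = 0.5 * (lo + hi)
        if phi(mid) < 0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return s_of(0.5 * (lo + hi))
```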
Article
Full-text available
We describe the design and implementation of KSSOLV, a MATLAB toolbox for solving a class of nonlinear eigenvalue problems known as the Kohn-Sham equations. These types of problems arise in electronic structure calculations, which are nowadays essential for studying the microscopic quantum mechanical properties of molecules, solids, and other nanoscale materials. KSSOLV is well suited for developing new algorithms for solving the Kohn-Sham equations and is designed to enable researchers in computational and applied mathematics to investigate the convergence properties of the existing algorithms. The toolbox makes use of the object-oriented programming features available in MATLAB so that the process of setting up a physical system is straightforward and the amount of coding effort required to prototype, test, and compare new algorithms is significantly reduced. All of these features should also make this package attractive to other computational scientists and students who wish to study small- to medium-size systems.
Article
Full-text available
The self-consistent field (SCF) iteration, widely used for computing the ground state energy and the corresponding single particle wave functions associated with a many-electron atomistic system, is viewed in this paper as an optimization procedure that minimizes the Kohn-Sham (KS) total energy indirectly by minimizing a sequence of quadratic surrogate functions. We point out the similarity and difference between the total energy and the surrogate, and show how the SCF iteration can fail when the minimizer of the surrogate produces an increase in the KS total energy. A trust region technique is introduced as a way to restrict the update of the wave functions within a small neighborhood of an approximate solution at which the gradient of the total energy agrees with that of the surrogate. The use of trust regions in SCF is not new. However, it has been observed that directly applying a trust region-based SCF (TRSCF) to the KS total energy often leads to slow convergence. We propose to use TRSCF within a direct constrained minimization (DCM) algorithm we developed in [J. Comput. Phys., 217 (2006), pp. 709-721]. The key ingredients of the DCM algorithm involve projecting the total energy function into a sequence of subspaces of small dimensions and seeking the minimizer of the total energy function within each subspace. The minimizer of a subspace energy function, which is computed by the TRSCF, not only provides a search direction along which the KS total energy function decreases, but also gives an optimal “step length” that yields a sufficient decrease in total energy. A numerical example is provided to demonstrate that the combination of TRSCF and DCM is more efficient than SCF.
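The surrogate-minimization view of SCF can be illustrated with a bare-bones fixed-point loop. The callables `build_hamiltonian` and `density_from_orbitals` are hypothetical placeholders, and the trust-region and DCM safeguards discussed in the paper are deliberately omitted; each step minimizes the current quadratic surrogate exactly by diagonalization, which need not decrease the true total energy.

```python
import numpy as np

def scf_loop(build_hamiltonian, density_from_orbitals, rho0, n_occ,
             mix=0.5, tol=1e-8, maxit=100):
    """Generic SCF fixed-point iteration with simple linear (charge) mixing."""
    rho = rho0
    for _ in range(maxit):
        H = build_hamiltonian(rho)                 # surrogate defined by rho
        _, C = np.linalg.eigh(H)                   # fill the lowest n_occ states
        rho_new = density_from_orbitals(C[:, :n_occ])
        if np.linalg.norm(rho_new - rho) < tol:
            return rho_new
        rho = (1 - mix) * rho + mix * rho_new      # heuristic charge mixing
    return rho
```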
Article
Full-text available
We propose a hybrid Gauss-Newton structured BFGS method with a new update formula and a new switch criterion for the iterative matrix to solve nonlinear least squares problems. We approximate the second term in the Hessian by a positive definite BFGS matrix. Under suitable conditions, global convergence of the proposed method with a backtracking line search is established. Moreover, the proposed method automatically reduces to the Gauss-Newton method for zero residual problems and the structured BFGS method for nonzero residual problems in a neighborhood of an accumulation point. A locally quadratic convergence rate for zero residual problems and a locally superlinear convergence rate for nonzero residual problems are obtained for the proposed method. Some numerical results are given to compare the proposed method with some existing methods.
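The structured idea (keep the cheap Gauss-Newton term exact and approximate only the residual curvature term) can be sketched as follows. This is a generic structured-secant BFGS update with a simple curvature safeguard, not the paper's exact hybrid formula or switch criterion, and all names are illustrative.

```python
import numpy as np

def structured_bfgs_step(A, s, grad_new, grad_old, J_new):
    """Structured-secant BFGS sketch for f(x) = 0.5 ||r(x)||^2.
    The Hessian is modeled as B = J^T J + A, where A approximates the residual
    curvature term sum_i r_i * Hess r_i.  Imposing B_new s = grad_new - grad_old
    leaves the structured secant condition A_new s = y_sharp below; A stays
    positive definite under the curvature test (initialize A, e.g., as eps*I)."""
    y = grad_new - grad_old
    y_sharp = y - J_new.T @ (J_new @ s)
    sy = s @ y_sharp
    if sy > 1e-8 * np.linalg.norm(s) * np.linalg.norm(y_sharp):
        As = A @ s
        A = A - np.outer(As, As) / (s @ As) + np.outer(y_sharp, y_sharp) / sy
    B = J_new.T @ J_new + A
    return B, A
```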
Article
Full-text available
To generalize the descent methods of unconstrained optimization to the constrained case, we define intrinsically the gradient field of the objective function on the constraint manifold and analyze descent methods along geodesics, including the gradient projection and reduced gradient methods for special choices of coordinate systems. In particular, we generalize the quasi-Newton methods and establish their superlinear convergence; we show that they only require the updating of a reduced-size matrix. In practice, the geodesic search is approximated by a tangent step followed by a restoration of the constraints, or by a simple arc search again followed by a restoration step.
Article
Full-text available
The trust-region self-consistent field (TRSCF) method is extended to the optimization of the Kohn-Sham energy. In the TRSCF method, both the Roothaan-Hall step and the density-subspace minimization step are replaced by trust-region optimizations of local approximations to the Kohn-Sham energy, leading to a controlled, monotonic convergence towards the optimized energy. Previously the TRSCF method has been developed for optimization of the Hartree-Fock energy, which is a simple quadratic function in the density matrix. However, since the Kohn-Sham energy is a nonquadratic function of the density matrix, the local energy functions must be generalized for use with the Kohn-Sham model. Such a generalization, which contains the Hartree-Fock model as a special case, is presented here. For comparison, a rederivation of the popular direct inversion in the iterative subspace (DIIS) algorithm is performed, demonstrating that the DIIS method may be viewed as a quasi-Newton method, explaining its fast local convergence. In the global region the convergence behavior of DIIS is less predictable. The related energy DIIS technique is also discussed and shown to be inappropriate for the optimization of the Kohn-Sham energy.
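For readers unfamiliar with DIIS, the following is a minimal NumPy sketch of the generic Pulay extrapolation that the paper rederives and interprets as a quasi-Newton method; the TRSCF-specific local energy models are not reproduced, and the function name and interface are illustrative.

```python
import numpy as np

def diis_coefficients(residuals):
    """Pulay DIIS mixing: find coefficients c minimizing ||sum_i c_i r_i||
    subject to sum_i c_i = 1, via the standard augmented linear system."""
    m = len(residuals)
    B = np.empty((m + 1, m + 1))
    for i, ri in enumerate(residuals):
        for j, rj in enumerate(residuals):
            B[i, j] = np.vdot(ri, rj).real     # Gram matrix of residuals
    B[-1, :m] = -1.0
    B[:m, -1] = -1.0
    B[-1, -1] = 0.0
    rhs = np.zeros(m + 1)
    rhs[-1] = -1.0
    c = np.linalg.solve(B, rhs)[:m]
    return c    # next iterate: the same linear combination of trial quantities
```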
Article
Full-text available
We describe new algorithms of the Locally Optimal Block Preconditioned Conjugate Gradient (LOBPCG) method for symmetric eigenvalue problems, based on a local optimization of a three-term recurrence, and suggest several other new methods. To compare numerically different methods in this class, with different preconditioners, we propose a common system of model tests, using random preconditioners and initial guesses. As the "ideal" control algorithm, we advocate the standard preconditioned conjugate gradient method for finding an eigenvector as an element of the null-space of the corresponding homogeneous system of linear equations under the assumption that the eigenvalue is known. We recommend that every new preconditioned eigensolver be compared with this "ideal" algorithm on our model test problems in terms of the speed of convergence, cost of every iteration, and memory requirements. We provide such a comparison for our LOBPCG method. Numerical results establish that our algorithm...
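A minimal single-vector sketch of the locally optimal three-term recurrence described above, assuming a symmetric matrix A: each step performs a Rayleigh-Ritz projection onto the span of the current iterate, the (optionally preconditioned) residual, and the previous direction. The block version, deflation, and the paper's model tests are omitted, and the interface is illustrative.

```python
import numpy as np

def lobpcg_smallest(A, x0, precond=None, tol=1e-8, maxit=200):
    """Single-vector LOBPCG sketch for the smallest eigenpair of symmetric A."""
    x = x0 / np.linalg.norm(x0)
    p = None
    for _ in range(maxit):
        Ax = A @ x
        lam = x @ Ax
        r = Ax - lam * x
        if np.linalg.norm(r) < tol:
            break
        w = precond(r) if precond is not None else r
        cols = [x, w] if p is None else [x, w, p]
        S, _ = np.linalg.qr(np.column_stack(cols))   # orthonormal trial basis
        theta, Z = np.linalg.eigh(S.T @ A @ S)       # Rayleigh-Ritz
        x_new = S @ Z[:, 0]                          # smallest Ritz vector
        p = x_new - (x @ x_new) * x                  # implicit previous step
        x = x_new
    return x @ (A @ x), x
```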
Article
Full-text available
In this paper we develop new Newton and conjugate gradient algorithms on the Grassmann and Stiefel manifolds. These manifolds represent the constraints that arise in such areas as the symmetric eigenvalue problem, nonlinear eigenvalue problems, electronic structure computations, and signal processing. In addition to the new algorithms, we show how the geometrical framework gives penetrating new insights allowing us to create, understand, and compare algorithms. The theory proposed here provides a taxonomy for numerical linear algebra algorithms that provides a top-level mathematical view of previously unrelated algorithms. It is our hope that developers of new algorithms and perturbation theories will benefit from the theory, methods, and examples in this paper. Comment: The condensed matter interest is as new methods for minimizing Kohn-Sham orbitals under the constraints of orthonormality and as "geometrically correct" generalizations and extensions of the analytically continued functional approach, Phys. Rev. Lett. 69, 1077 (1992). The problem of orthonormality constraints is quite general and the methods discussed are also applicable in a wide range of fields. To appear in SIAM Journal on Matrix Analysis and Applications.
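As a concrete illustration of the geometric ingredients such manifold algorithms rely on, the following NumPy sketch shows the tangent-space projection, the Riemannian gradient under the metric inherited from the embedding, and a QR-based retraction on the Stiefel manifold. It is a generic sketch, not the paper's Newton or conjugate gradient iteration (which works with geodesics), and all names are illustrative.

```python
import numpy as np

def stiefel_tangent_projection(X, Z):
    """Project an ambient matrix Z onto the tangent space of
    {X : X^T X = I} at X (embedded/Euclidean metric)."""
    XtZ = X.T @ Z
    return Z - X @ (0.5 * (XtZ + XtZ.T))

def riemannian_gradient(X, euclidean_grad):
    # With the embedded metric, the Riemannian gradient is simply the
    # tangent projection of the Euclidean gradient.
    return stiefel_tangent_projection(X, euclidean_grad)

def qr_retraction(X, xi):
    # A common first-order retraction: orthonormalize X + xi via QR,
    # with column signs fixed so the factor is unique.
    Q, R = np.linalg.qr(X + xi)
    signs = np.where(np.diag(R) < 0, -1.0, 1.0)
    return Q * signs
```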
Article
Riemannian optimization is the task of finding an optimum of a real-valued function defined on a Riemannian manifold. Riemannian optimization has been a topic of much interest over the past few years due to many applications including computer vision, signal processing, and numerical linear algebra. The substantial background required to successfully design and apply Riemannian optimization algorithms is a significant impediment for many potential users. Therefore, multiple packages, such as Manopt (in Matlab) and Pymanopt (in Python), have been developed. This article describes ROPTLIB, a C++ library for Riemannian optimization. Unlike prior packages, ROPTLIB simultaneously achieves the following goals: (i) it has user-friendly interfaces in Matlab, Julia, and C++; (ii) users do not need to implement manifold- and algorithm-related objects; (iii) it provides efficient computational time due to its C++ core; (iv) it implements state-of-the-art generic Riemannian optimization algorithms, including quasi-Newton algorithms; and (v) it is based on object-oriented programming, allowing users to rapidly add new algorithms and manifolds.
Article
In this paper, a Riemannian BFGS method for minimizing a smooth function on a Riemannian manifold is defined, based on a Riemannian generalization of a cautious update and a weak line search condition. It is proven that the Riemannian BFGS method converges (i) globally to stationary points without assuming the objective function to be convex and (ii) superlinearly to a nondegenerate minimizer. Using the weak line search condition removes the need for information from differentiated retraction. The joint matrix diagonalization problem is chosen to demonstrate the performance of the algorithms with various parameters, line search conditions, and pairs of retraction and vector transport. A preliminary version can be found in [Numerical Mathematics and Advanced Applications: ENUMATH 2015, Lect. Notes Comput. Sci. Eng. 112, Springer, New York, 2016, pp. 627-634].
Article
The adaptively compressed exchange (ACE) method provides an efficient way for solving Hartree-Fock-like equations in quantum physics, chemistry, and materials science. The key step of the ACE method is to adaptively compress an operator that is possibly dense and full-rank. In this paper, we present a detailed study of the adaptive compression operation, and establish rigorous convergence properties of the adaptive compression method in the context of solving linear eigenvalue problems. Our analysis also elucidates the potential use of the adaptive compression method in a wide range of problems.
Article
The Fock exchange operator plays a central role in modern quantum chemistry. The large computational cost associated with the Fock exchange operator hinders Hartree-Fock calculations and Kohn-Sham density functional theory calculations with hybrid exchange-correlation functionals, even for systems consisting of hundreds of atoms. We develop the adaptively compressed exchange operator (ACE) formulation, which greatly reduces the computational cost associated with the Fock exchange operator without loss of accuracy. The ACE formulation does not depend on the size of the band gap, and thus can be applied to insulating, semiconducting as well as metallic systems. In an iterative framework for solving Hartree-Fock-like systems such as in planewave based methods, the ACE formulation only requires moderate modification of the code. The ACE formulation can also be advantageous for other types of basis sets, especially when the storage cost of the exchange operator is expensive. Numerical results indicate that the ACE formulation can become advantageous even for small systems with tens of atoms. In particular, the cost of each self-consistent field iteration for the electron density in the ACE formulation is only marginally larger than that of the generalized gradient approximation (GGA) calculation, and thus offers orders of magnitude speedup for Hartree-Fock-like calculations.
Article
This paper develops and analyzes a generalization of the Broyden class of quasi-Newton methods to the problem of minimizing a smooth objective function f on a Riemannian manifold. A condition on vector transport and retraction that guarantees convergence and facilitates efficient computation is derived. Experimental evidence is presented demonstrating the value of the extension to the Riemannian Broyden class through superior performance for some problems compared to existing Riemannian BFGS methods, in particular those that depend on differentiated retraction.
Article
The well-known symmetric rank-one trust-region method—where the Hessian approximation is generated by the symmetric rank-one update—is generalized to the problem of minimizing a real-valued function over a d-dimensional Riemannian manifold. The generalization relies on basic differential-geometric concepts, such as tangent spaces, Riemannian metrics, and the Riemannian gradient, as well as on the more recent notions of (first-order) retraction and vector transport. The new method, called RTR-SR1, is shown to converge globally and (d+1)-step q-superlinearly to stationary points of the objective function. A limited-memory version, referred to as LRTR-SR1, is also introduced. In this context, novel efficient strategies are presented to construct a vector transport on a submanifold of a Euclidean space. Numerical experiments—Rayleigh quotient minimization on the sphere and a joint diagonalization problem on the Stiefel manifold—illustrate the value of the new methods.
Article
The self-consistent field (SCF) iteration has been used ubiquitously for solving the Kohn-Sham (KS) equation or the minimization of the KS total energy functional with respect to orthogonality constraints in electronic structure calculations. Although SCF with heuristics such as charge mixing often works remarkably well on many problems, it is well known that its convergence can be unpredictable and there is no general theoretical analysis on its performance. We regularize the SCF iteration and establish rigorous global convergence to the first-order optimality conditions. The Hessian of the total energy functional is further exploited. By adding the part of the Hessian which is not considered in SCF, our methods can always achieve a highly accurate solution on problems for which SCF fails and exhibit a better convergence rate than SCF in the KSSOLV toolbox under the MATLAB environment.
Article
In many data-intensive applications, the use of principal component analysis (PCA) and other related techniques is ubiquitous for dimension reduction, data mining or other transformational purposes. Such transformations often require efficiently, reliably and accurately computing dominant singular value decompositions (SVDs) of large unstructured matrices. In this paper, we propose and study a subspace optimization technique to significantly accelerate the classic simultaneous iteration method. We analyze the convergence of the proposed algorithm, and numerically compare it with several state-of-the-art SVD solvers under the MATLAB environment. Extensive computational results show that on a wide range of large unstructured matrices, the proposed algorithm can often provide improved efficiency or robustness over existing algorithms.
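As a baseline for what the paper accelerates, the classic simultaneous (block power) iteration with a final Rayleigh-Ritz step can be sketched as follows; this is a textbook baseline, not the proposed subspace optimization algorithm, and the interface is illustrative.

```python
import numpy as np

def simultaneous_iteration_svd(A, k, iters=50, seed=0):
    """Block power iteration for the k dominant singular triplets of A."""
    m, n = A.shape
    rng = np.random.default_rng(seed)
    X = np.linalg.qr(rng.standard_normal((n, k)))[0]
    for _ in range(iters):
        Y = np.linalg.qr(A @ X)[0]        # left subspace
        X = np.linalg.qr(A.T @ Y)[0]      # right subspace
    # Rayleigh-Ritz: small SVD of the projected k-by-k matrix
    U_s, s, Vt_s = np.linalg.svd(Y.T @ A @ X)
    return Y @ U_s, s, X @ Vt_s.T
```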
Article
We extend the scope of analysis for linesearch optimization algorithms on (possibly infinite-dimensional) Riemannian manifolds to the convergence analysis of the BFGS quasi-Newton scheme and the Fletcher–Reeves conjugate gradient iteration. Numerical implementations for exemplary problems in shape spaces show the practical applicability of these methods.
Article
This paper deals with the ground state of an interacting electron gas in an external potential v(r). It is proved that there exists a universal functional of the density, F[n(r)], independent of v(r), such that the expression E ≡ ∫ v(r) n(r) dr + F[n(r)] has as its minimum value the correct ground-state energy associated with v(r). The functional F[n(r)] is then discussed for two situations: (1) n(r) = n₀ + ñ(r), with ñ/n₀ ≪ 1, and (2) n(r) = φ(r/r₀) with φ arbitrary and r₀ → ∞. In both cases F can be expressed entirely in terms of the correlation energy and linear and higher order electronic polarizabilities of a uniform electron gas. This approach also sheds some light on generalized Thomas-Fermi methods and their limitations. Some new extensions of these methods are presented.
Article
Several recent computational studies have shown that the symmetric rank-one (SR1) update is a very competitive quasi-Newton update in optimization algorithms. This paper gives a new analysis of a trust region SR1 method for unconstrained optimization and shows that the method has an n+1 step q-superlinear rate of convergence. The analysis makes neither of the assumptions of uniform linear independence of the iterates nor positive definiteness of the Hessian approximations that have been made in other recent analyses of SR1 methods. The trust region method that is analyzed is fairly standard, except that it includes the feature that the Hessian approximation is updated after all steps, including rejected steps. We also present computational results that show that this feature, safeguarded in a way that is consistent with the convergence analysis, does not harm the efficiency of the SR1 trust region method.
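For reference, the symmetric rank-one update with the standard skipping safeguard can be written in a few lines. This is a generic NumPy sketch and does not include the trust-region acceptance logic analyzed in the paper; the threshold name is illustrative.

```python
import numpy as np

def sr1_update(B, s, y, r=1e-8):
    """SR1 update of a symmetric Hessian approximation B from a step s and
    gradient difference y, skipped when the denominator is too small."""
    v = y - B @ s
    denom = v @ s
    if abs(denom) >= r * np.linalg.norm(s) * np.linalg.norm(v):
        B = B + np.outer(v, v) / denom
    return B   # in the paper's method, applied after accepted and rejected steps
```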
Article
Hybrid density functionals are very successful in describing a wide range of molecular properties accurately. In large molecules and solids, however, calculating the exact (Hartree-Fock) exchange is computationally expensive, especially for systems with metallic characteristics. In the present work, we develop a new hybrid density functional based on a screened Coulomb potential for the exchange interaction which circumvents this bottleneck. The results obtained for structural and thermodynamic properties of molecules are comparable in quality to the most widely used hybrid functionals. In addition, we present results of periodic boundary condition calculations for both semiconducting and metallic single wall carbon nanotubes. Using a screened Coulomb potential for Hartree-Fock exchange enables fast and accurate hybrid calculations, even of usually difficult metallic systems. The high accuracy of the new screened Coulomb potential hybrid, combined with its computational advantages, makes it widely applicable to large molecules and periodic systems.
Article
We present the field of computational chemistry from the standpoint of numerical analysis. We introduce the most commonly used models and comment on their applicability. We briefly outline the results of mathematical analysis and then mostly concentrate on the main issues raised by numerical simulations. A special emphasis is laid on recent results in numerical analysis, recent developments of new methods and challenging open issues.
Article
We study conditions under which line search Newton methods for nonlinear systems of equations and optimization fail due to the presence of singular non-stationary points. These points are not solutions of the problem and are characterized by the fact that Jacobian or Hessian matrices are singular. It is shown that, for systems of nonlinear equations, the interaction between the Newton direction and the merit function can prevent the iterates from escaping such non-stationary points. The unconstrained minimization problem is also studied, and conditions under which false convergence cannot occur are presented. Several examples illustrating failure of Newton iterations for constrained optimization are also presented. The paper also shows that a class of line search feasible interior methods cannot exhibit convergence to non-stationary points.
Article
An Adaptive Regularisation framework using Cubics (ARC) was proposed for unconstrained optimization and analysed in Cartis, Gould and Toint (Part I, Math Program, doi:10.1007/s10107-009-0286-5, 2009), generalizing at the same time an unpublished method due to Griewank (Technical Report NA/12, 1981, DAMTP, University of Cambridge), an algorithm by Nesterov and Polyak (Math Program 108(1):177–205, 2006) and a proposal by Weiser, Deuflhard and Erdmann (Optim Methods Softw 22(3):413–431, 2007). In this companion paper, we further the analysis by providing worst-case global iteration complexity bounds for ARC and a second-order variant to achieve approximate first-order, and for the latter second-order, criticality of the iterates. In particular, the second-order ARC algorithm requires at most O(ε^{-3/2}) iterations, or equivalently, function- and gradient-evaluations, to drive the norm of the gradient of the objective below the desired accuracy ε, and O(ε^{-3}) iterations to reach approximate nonnegative curvature in a subspace. The orders of these bounds match those proved for Algorithm 3.3 of Nesterov and Polyak, which minimizes the cubic model globally on each iteration. Our approach is more general in that it allows the cubic model to be solved only approximately and may employ approximate Hessians.
Article
An Adaptive Regularisation algorithm using Cubics (ARC) is proposed for unconstrained optimization, generalizing at the same time an unpublished method due to Griewank (Technical Report NA/12, 1981, DAMTP, University of Cambridge), an algorithm by Nesterov and Polyak (Math Program 108(1):177–205, 2006) and a proposal by Weiser et al. (Optim Methods Softw 22(3):413–431, 2007). At each iteration of our approach, an approximate global minimizer of a local cubic regularisation of the objective function is determined, and this ensures a significant improvement in the objective so long as the Hessian of the objective is locally Lipschitz continuous. The new method uses an adaptive estimation of the local Lipschitz constant and approximations to the global model-minimizer which remain computationally viable even for large-scale problems. We show that the excellent global and local convergence properties obtained by Nesterov and Polyak are retained, and sometimes extended to a wider class of problems, by our ARC approach. Preliminary numerical experiments with small-scale test problems from the CUTEr set show encouraging performance of the ARC algorithm when compared to a basic trust-region implementation.
Article
Despite the remarkable thermochemical accuracy of Kohn–Sham density-functional theories with gradient corrections for exchange-correlation [see, for example, A. D. Becke, J. Chem. Phys. 96, 2155 (1992)], we believe that further improvements are unlikely unless exact-exchange information is considered. Arguments to support this view are presented, and a semiempirical exchange-correlation functional containing local-spin-density, gradient, and exact-exchange terms is tested on 56 atomization energies, 42 ionization potentials, 8 proton affinities, and 10 total atomic energies of first- and second-row systems. This functional performs significantly better than previous functionals with gradient corrections only, and fits experimental atomization energies with an impressively small average absolute deviation of 2.4 kcal/mol.
Article
From a theory of Hohenberg and Kohn, approximation methods for treating an inhomogeneous system of interacting electrons are developed. These methods are exact for systems of slowly varying or high density. For the ground state, they lead to self-consistent equations analogous to the Hartree and Hartree-Fock equations, respectively. In these equations the exchange and correlation portions of the chemical potential of a uniform electron gas appear as additional effective potentials. (The exchange portion of our effective potential differs from that due to Slater by a factor of 2/3.) Electronic systems at finite temperatures and in magnetic fields are also treated by similar methods. An appendix deals with a further correction for systems with short-wavelength density oscillations.
Article
We derive compact representations of BFGS and symmetric rank-one matrices for optimization. These representations allow us to efficiently implement limited memory methods for large constrained optimization problems. In particular, we discuss how to compute projections of limited memory matrices onto subspaces. We also present a compact representation of the matrices generated by Broyden's update for solving systems of nonlinear equations. Limited memory quasi-Newton methods are known to be effective techniques for solving certain classes of large-scale unconstrained optimization problems (Buckley and Le Nir, 1983; Liu and Nocedal, 1989; Gilbert and Lemaréchal, 1989). They make simple approximations of Hessian matrices, which are often good enough to provide a fast rate of linear convergence...
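The compact BFGS representation referred to above can be sketched as follows, assuming a diagonal initial matrix B0 and that all stored pairs satisfy the curvature condition; this is an illustrative NumPy sketch of the outer-product form, not the authors' implementation. For small random data it can be checked against k explicit BFGS updates applied in sequence.

```python
import numpy as np

def compact_bfgs(B0_diag, S, Y):
    """Compact representation of the BFGS matrix after the updates stored in
    S = [s_0, ..., s_{k-1}] and Y = [y_0, ..., y_{k-1}]:
        B = B0 - [B0 S, Y] W^{-1} [B0 S, Y]^T,
        W = [[S^T B0 S, L], [L^T, -D]],
    where L is the strictly lower triangular part of S^T Y and D its diagonal."""
    B0S = B0_diag[:, None] * S              # B0 @ S with diagonal B0
    StY = S.T @ Y
    L = np.tril(StY, -1)
    D = np.diag(np.diag(StY))
    W = np.block([[S.T @ B0S, L], [L.T, -D]])
    U = np.hstack([B0S, Y])
    return np.diag(B0_diag) - U @ np.linalg.solve(W, U.T)
```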