Preprint

Optimistic Noise-Aware Sequential Quadratic Programming for Equality Constrained Optimization with Rank-Deficient Jacobians

Authors:

Abstract

We propose and analyze a sequential quadratic programming algorithm for minimizing a noisy nonlinear smooth function subject to noisy nonlinear smooth equality constraints. The algorithm uses a step decomposition strategy and, as a result, is robust to potential rank-deficiency in the constraints, allows for two different step size strategies, and has an early stopping mechanism. Under the linear independence constraint qualification, convergence is established to a neighborhood of a first-order stationary point, where the radius of the neighborhood is proportional to the noise levels in the objective function and constraints. Moreover, in the rank-deficient setting, the merit parameter may converge to zero, and convergence to a neighborhood of an infeasible stationary point is established. Numerical experiments demonstrate the efficiency and robustness of the proposed method.
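For illustration, the step decomposition strategy mentioned in the abstract can be sketched as follows. This is a generic normal/tangential splitting written with NumPy, not the paper's actual algorithm: the pseudoinverse normal step, the projected steepest-descent tangential step, and the bound omega are simplifying assumptions made here for brevity.

```python
import numpy as np

def decomposed_step(g, J, c, omega=1.0):
    """Generic SQP step split into a normal and a tangential component.

    g     : (noisy) objective gradient estimate, shape (n,)
    J     : (noisy) constraint Jacobian estimate, shape (m, n), may be rank-deficient
    c     : (noisy) constraint values, shape (m,)
    omega : bound on the normal step length (a trust-region-style safeguard)
    """
    # Normal step: reduce the linearized infeasibility ||c + J v|| via a
    # pseudoinverse solve, which is well defined even when J loses rank.
    v = -np.linalg.pinv(J) @ c
    nv = np.linalg.norm(v)
    if nv > omega:
        v *= omega / nv

    # Tangential step: move within the null space of J to reduce the objective.
    # A projected steepest-descent direction stands in for the quadratic
    # subproblem an actual SQP method would solve.
    _, s, Vt = np.linalg.svd(J)
    tol = 1e-12 * (s[0] if s.size else 1.0)
    Z = Vt[int(np.sum(s > tol)):].T          # columns span null(J)
    u = -Z @ (Z.T @ g) if Z.size else np.zeros_like(v)
    return v + u
```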


References
Article
In this paper, we propose a stochastic method for solving equality constrained optimization problems that utilizes predictive variance reduction. Specifically, we develop a method based on the sequential quadratic programming paradigm that employs variance reduction in the gradient approximations. Under reasonable assumptions, we prove that a measure of first-order stationarity evaluated at the iterates generated by our proposed algorithm converges to zero in expectation from arbitrary starting points, for both constant and adaptive step size strategies. Finally, we demonstrate the practical performance of our proposed algorithm on constrained binary classification problems that arise in machine learning.
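Predictive variance reduction of the kind referenced above is usually implemented with an SVRG-style gradient estimator. The sketch below assumes a finite-sum objective and a user-supplied per-sample gradient; all names are illustrative, and the surrounding SQP machinery is omitted.

```python
import numpy as np

def svrg_gradient(grad_i, x, x_ref, full_grad_ref, batch):
    """SVRG-style variance-reduced gradient estimate at x.

    grad_i        : callable, grad_i(i, x) -> gradient of the i-th sample at x
    x_ref         : snapshot point at which the full gradient was last computed
    full_grad_ref : full gradient at x_ref (computed once per outer iteration)
    batch         : list of sample indices drawn for this iteration
    """
    g = np.zeros_like(x, dtype=float)
    for i in batch:
        # Correct the stale snapshot gradient by the sampled difference; the
        # correction has zero mean, so the estimate stays unbiased while its
        # variance shrinks as x approaches x_ref.
        g += grad_i(i, x) - grad_i(i, x_ref)
    return g / max(len(batch), 1) + full_grad_ref
```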
Article
Intrinsic noise in objective function and derivatives evaluations may cause premature termination of optimization algorithms. Evaluation complexity bounds taking this situation into account are presented in the framework of a deterministic trust-region method. The results show that the presence of intrinsic noise may dominate these bounds, in contrast with what is known for methods in which the inexactness in function and derivatives’ evaluations is fully controllable. Moreover, the new analysis provides estimates of the optimality level achievable, should noise cause early termination. Numerical experiments are reported that support the theory. The analysis finally sheds some light on the impact of inexact computer arithmetic on evaluation complexity.
Book
Full text available at: https://mdobook.github.io/ Based on course-tested material, this rigorous yet accessible graduate textbook covers both fundamental and advanced optimization theory and algorithms. It covers a wide range of numerical methods and topics, including both gradient-based and gradient-free algorithms, multidisciplinary design optimization, and uncertainty, with instruction on how to determine which algorithm should be used for a given application. It also provides an overview of models and how to prepare them for use with numerical optimization, including derivative computation. Over 400 high-quality visualizations and numerous examples facilitate understanding of the theory, and practical tips address common issues encountered in practical engineering design optimization and how to address them. Numerous end-of-chapter homework problems, progressing in difficulty, help put knowledge into practice. Accompanied online by a solutions manual for instructors and source code for problems, this is ideal for a one- or two-semester graduate course on optimization in aerospace, civil, mechanical, electrical, and chemical engineering departments.
Article
Simulation Optimization (SO) refers to the optimization of an objective function subject to constraints, both of which can be evaluated through a stochastic simulation. To address specific features of a particular simulation---discrete or continuous decisions, expensive or cheap simulations, single or multiple outputs, homogeneous or heterogeneous noise---various algorithms have been proposed in the literature. As one can imagine, there exist several competing algorithms for each of these classes of problems. This document emphasizes the difficulties in simulation optimization as compared to mathematical programming, makes reference to state-of-the-art algorithms in the field, examines and contrasts the different approaches used, reviews some of the diverse applications that have been tackled by these methods, and speculates on future directions in the field.
Article
We present a line search algorithm for large-scale constrained optimization that is robust and efficient even for problems with (nearly) rank-deficient Jacobian matrices. The method is matrix-free (i.e., it does not require explicit storage or factorizations of derivative matrices), allows for inexact step computations, and is applicable for nonconvex problems. The main components of the approach are a trust region subproblem for handling ill-conditioned or inconsistent linear models of the constraints and a process for attaining a sufficient reduction in a local model of a penalty function. We show that the algorithm is globally convergent to first-order optimal points or to stationary points of an infeasibility measure. Numerical results are presented.
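A matrix-free normal step of the kind described above can be obtained with a Krylov least-squares solver. The following sketch uses SciPy's LSQR through a LinearOperator, so the Jacobian is accessed only via products; the damping value and iteration limit are illustrative choices rather than the paper's safeguards.

```python
from scipy.sparse.linalg import LinearOperator, lsqr

def normal_step(jvp, jtvp, c, n, damp=1e-8, iters=50):
    """Inexact step v approximately minimizing ||c + J v|| without forming J.

    jvp  : callable, jvp(v)  -> J @ v
    jtvp : callable, jtvp(w) -> J.T @ w
    c    : constraint values at the current iterate, length m
    n    : number of variables
    """
    J = LinearOperator((len(c), n), matvec=jvp, rmatvec=jtvp)
    # A small damping term keeps the least-squares problem well posed when
    # J is (nearly) rank-deficient; iter_lim caps the work per step.
    result = lsqr(J, -c, damp=damp, iter_lim=iters)
    return result[0]
```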
Article
Information from various public and private data sources of extremely large sample sizes are now increasingly available for research purposes. Statistical methods are needed for utilizing information from such big data sources while analyzing data from individual studies that may collect more detailed information required for addressing specific hypotheses of interest. In this article, we consider the problem of building regression models based on individual-level data from an “internal” study while utilizing summary-level information, such as information on parameters for reduced models, from an “external” big data source. We identify a set of very general constraints that link internal and external models. These constraints are used to develop a framework for semiparametric maximum likelihood inference that allows the distribution of covariates to be estimated using either the internal sample or an external reference sample. We develop extensions for handling complex stratified sampling designs, such as case-control sampling, for the internal study. Asymptotic theory and variance estimators are developed for each case. We use simulation studies and a real data application to assess the performance of the proposed methods in contrast to the generalized regression (GR) calibration methodology that is popular in the sample survey literature.
Article
Line search methods are proposed for nonlinear programming using Fletcher and Leyffer's filter method [Math. Program., 91 (2002), pp. 239--269], which replaces the traditional merit function. Their global convergence properties are analyzed. The presented framework is applied to active set sequential quadratic programming (SQP) and barrier interior point algorithms. Under mild assumptions it is shown that every limit point of the sequence of iterates generated by the algorithm is feasible, and that there exists at least one limit point that is a stationary point for the problem under consideration. A new alternative filter approach employing the Lagrangian function instead of the objective function with identical global convergence properties is briefly discussed.
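The filter idea above replaces a merit function by dominance tests on pairs of constraint violation and objective value. A minimal sketch of such an acceptance test and filter update, with illustrative margin parameters, is:

```python
def filter_acceptable(theta_new, f_new, filt, gamma_theta=1e-5, gamma_f=1e-5):
    """Accept a trial point unless some filter entry dominates it.

    theta_new : constraint violation at the trial point
    f_new     : objective value at the trial point
    filt      : list of (theta, f) pairs of previously accepted points
    """
    for theta, f in filt:
        # Dominated: no sufficient improvement in feasibility or objective.
        if theta_new >= (1.0 - gamma_theta) * theta and f_new >= f - gamma_f * theta:
            return False
    return True

def filter_add(filt, theta, f):
    """Insert a new entry and discard entries it dominates."""
    filt[:] = [(t, v) for (t, v) in filt if not (theta <= t and f <= v)]
    filt.append((theta, f))
```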
Article
CG, SYMMLQ, and MINRES are Krylov subspace methods for solving symmetric systems of linear equations. When these methods are applied to an incompatible system (that is, a singular symmetric least-squares problem), CG could break down and SYMMLQ's solution could explode, while MINRES would give a least-squares solution but not necessarily the minimum-length (pseudoinverse) solution. This understanding motivates us to design a MINRES-like algorithm to compute minimum-length solutions to singular symmetric systems. MINRES uses QR factors of the tridiagonal matrix from the Lanczos process (where R is upper-tridiagonal). MINRES-QLP uses a QLP decomposition (where rotations on the right reduce R to lower-tridiagonal form). On ill-conditioned systems (singular or not), MINRES-QLP can give more accurate solutions than MINRES. We derive preconditioned MINRES-QLP, new stopping rules, and better estimates of the solution and residual norms, the matrix norm, and the condition number.
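SciPy provides MINRES (though not MINRES-QLP), so the behaviour described above is easy to probe on a toy singular system. The example below builds a consistent singular symmetric system and checks the residual; note that plain MINRES does not guarantee the minimum-length solution, which is the gap MINRES-QLP addresses.

```python
import numpy as np
from scipy.sparse.linalg import minres

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((50, 50)))
eigs = np.linspace(1.0, 10.0, 50)
eigs[:5] = 0.0                       # five zero eigenvalues: A is singular
A = Q @ np.diag(eigs) @ Q.T
A = 0.5 * (A + A.T)                  # enforce exact symmetry
b = A @ rng.standard_normal(50)      # right-hand side in range(A): consistent

x, info = minres(A, b)
print("exit flag:", info)                              # 0 on successful convergence
print("residual norm:", np.linalg.norm(A @ x - b))     # small for this consistent system
# x is a solution, but not necessarily the minimum-norm (pseudoinverse) one.
```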
Article
The approximate minimization of a quadratic function within an ellipsoidal trust region is an important subproblem for many nonlinear programming methods. When the number of variables is large, the most widely-used strategy is to trace the path of conjugate gradient iterates either to convergence or until it reaches the trust-region boundary. In this paper, we investigate ways of continuing the process once the boundary has been encountered. The key is to observe that the trust-region problem within the currently generated Krylov subspace has very special structure which enables it to be solved very efficiently. We compare the new strategy with existing methods. The resulting software package is available as HSL VF05 within the Harwell Subroutine Library.
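The conjugate-gradient path referred to above is the Steihaug-Toint truncated CG iteration; a compact version for the Euclidean trust region is sketched below with illustrative tolerances. The paper's contribution, continuing the minimization within the Krylov subspace after the boundary is reached, is not reproduced here.

```python
import numpy as np

def steihaug_cg(H, g, delta, tol=1e-8, max_iter=100):
    """Approximately minimize g^T p + 0.5 p^T H p subject to ||p|| <= delta."""
    p = np.zeros_like(g, dtype=float)
    r = g.astype(float).copy()          # gradient of the model at p = 0
    d = -r
    if np.linalg.norm(r) < tol:
        return p
    for _ in range(max_iter):
        Hd = H @ d
        dHd = d @ Hd
        if dHd <= 0:                         # negative curvature: go to the boundary
            return p + _boundary_tau(p, d, delta) * d
        alpha = (r @ r) / dHd
        p_next = p + alpha * d
        if np.linalg.norm(p_next) >= delta:  # step would leave the trust region
            return p + _boundary_tau(p, d, delta) * d
        r_next = r + alpha * Hd
        if np.linalg.norm(r_next) < tol:
            return p_next
        beta = (r_next @ r_next) / (r @ r)
        d = -r_next + beta * d
        p, r = p_next, r_next
    return p

def _boundary_tau(p, d, delta):
    """Positive tau with ||p + tau d|| = delta."""
    a, b, c = d @ d, 2.0 * (p @ d), p @ p - delta ** 2
    return (-b + np.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)
```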
Article
For each $n \in \mathbb{N}$ let $X_n = [(X_n)_{jk}]_{j,k=1}^{n}$ be a random Hermitian matrix such that the $n^2$ random variables $\sqrt{n}(X_n)_{ii}$, $\sqrt{2n}\,\mathrm{Re}(X_n)_{ij}$ ($i<j$), $\sqrt{2n}\,\mathrm{Im}(X_n)_{ij}$ ($i<j$) are independent, identically distributed, with common distribution $\mu$ on $\mathbb{R}$. Let $X_n^{(1)},\ldots,X_n^{(r)}$ be $r$ independent copies of $X_n$ and let $(x_1,\ldots,x_r)$ be a semicircular system in a $C^*$-probability space with a faithful state. Assuming that $\mu$ is symmetric and satisfies a Poincaré inequality, we show that, almost everywhere, for any non-commutative polynomial $p$ in $r$ variables, $\lim_{n\to+\infty}\|p(X_n^{(1)},\ldots,X_n^{(r)})\| = \|p(x_1,\ldots,x_r)\|$ (0.1). We follow the method of [10] and [17], which gave (0.1) in the Gaussian (complex, real or symplectic) case. We also obtain that (0.1) remains true when the $X_n^{(i)}$ are Wishart matrices while the $x_i$ are Marchenko-Pastur distributed.
Article
A key component of variational quantum algorithms (VQAs) is the choice of classical optimizer employed to update the parameterization of an ansatz. It is well recognized that quantum algorithms will, for the foreseeable future, necessarily be run on noisy devices with limited fidelities. Thus, the evaluation of an objective function (e.g., the guiding function in the quantum approximate optimization algorithm (QAOA) or the expectation of the electronic Hamiltonian in variational quantum eigensolver (VQE)) required by a classical optimizer is subject not only to stochastic error from estimating an expected value but also to error resulting from intermittent hardware noise. Model-based derivative-free optimization methods have emerged as popular choices of a classical optimizer in the noisy VQA setting, based on empirical studies. However, these optimization methods were not explicitly designed with the consideration of noise. In this work we adapt recent developments from the “noise-aware numerical optimization” literature to these commonly used derivative-free model-based methods. We introduce the key defining characteristics of these novel noise-aware derivative-free model-based methods that separate them from standard model-based methods. We study an implementation of such noise-aware derivative-free model-based methods and compare its performance on demonstrative VQA simulations to classical solvers packaged in scikit-quant. History: Accepted by Giacomo Nannicini, Area Editor for Quantum Computing and Operations Research. Accepted for Special Issue. Funding: This material is based upon work supported by the U.S. Department of Energy, Office of Science, National Quantum Information Science Research Centers and the Office of Advanced Scientific Computing Research, Accelerated Research for Quantum Computing program under contract number DE-AC02-06CH11357. Supplemental Material: The software that supports the findings of this study is available within the paper and its Supplemental Information ( https://pubsonline.informs.org/doi/suppl/10.1287/ijoc.2023.0177 ) as well as from the IJOC GitHub software repository ( https://github.com/INFORMSJoC/2023.0177 ). The complete IJOC Software and Data Repository is available at https://informsjoc.github.io/ .
Article
A stochastic algorithm is proposed, analyzed, and tested experimentally for solving continuous optimization problems with nonlinear equality constraints. It is assumed that constraint function and derivative values can be computed but that only stochastic approximations are available for the objective function and its derivatives. The algorithm is of the sequential quadratic optimization variety. Distinguishing features of the algorithm are that it only employs stochastic objective gradient estimates that satisfy a relatively weak set of assumptions (while using neither objective function values nor estimates of them) and that it allows inexact subproblem solutions to be employed, the latter of which is particularly useful in large-scale settings when the matrices defining the subproblems are too large to form and/or factorize. Conditions are imposed on the inexact subproblem solutions that account for the fact that only stochastic objective gradient estimates are employed. Convergence results are established for the method. Numerical experiments show that the proposed method vastly outperforms a stochastic subgradient method and can outperform an alternative sequential quadratic programming algorithm that employs highly accurate subproblem solutions in every iteration. Funding: This material is based upon work supported by the National Science Foundation [Awards CCF-1740796 and CCF-2139735] and the Office of Naval Research [Award N00014-21-1-2532].
Article
A sequential quadratic optimization algorithm is proposed for solving smooth nonlinear-equality-constrained optimization problems in which the objective function is defined by an expectation. The algorithmic structure of the proposed method is based on a step decomposition strategy that is known in the literature to be widely effective in practice, wherein each search direction is computed as the sum of a normal step (toward linearized feasibility) and a tangential step (toward objective decrease in the null space of the constraint Jacobian). However, the proposed method is unique from others in the literature in that it both allows the use of stochastic objective gradient estimates and possesses convergence guarantees even in the setting in which the constraint Jacobians may be rank-deficient. The results of numerical experiments demonstrate that the algorithm offers superior performance when compared with popular alternatives. Funding: This material is based upon work supported by the U.S. National Science Foundation’s Division of Computing and Communication Foundations under award [CF-1740796], by the Office of Naval Research under award [N00014-21-1-2532], and by the National Science Foundation under award [2030859] to the Computing Research Association for the CIFellows Project.
Article
In this paper, we present convergence guarantees for a modified trust-region method designed for minimizing objective functions whose value and gradient and Hessian estimates are computed with noise. These estimates are produced by generic stochastic oracles, which are not assumed to be unbiased or consistent. We introduce these oracles and show that they are more general and have more relaxed assumptions than the stochastic oracles used in prior literature on stochastic trust-region methods. Our method utilizes a relaxed step acceptance criterion and a cautious trust-region radius updating strategy which allows us to derive exponentially decaying tail bounds on the iteration complexity for convergence to points that satisfy approximate first- and second-order optimality conditions. Finally, we present two sets of numerical results. We first explore the tightness of our theoretical results on an example with adversarial zeroth- and first-order oracles. We then investigate the performance of the modified trust-region algorithm on standard noisy derivative-free optimization problems.
Article
A worst-case complexity bound is proved for a sequential quadratic optimization (commonly known as SQP) algorithm that has been designed for solving optimization problems involving a stochastic objective function and deterministic nonlinear equality constraints. Barring additional terms that arise due to the adaptivity of the monotonically nonincreasing merit parameter sequence, the proved complexity bound is comparable to that known for the stochastic gradient algorithm for unconstrained nonconvex optimization. The overall complexity bound, which accounts for the adaptivity of the merit parameter sequence, shows that a result comparable to the unconstrained setting (with additional logarithmic factors) holds with high probability.
Article
Classical trust region methods were designed to solve problems in which function and gradient information are exact. This paper considers the case when there are errors (or noise) in the above computations and proposes a simple modification of the trust region method to cope with these errors. The new algorithm only requires information about the size/standard deviation of the errors in the function evaluations and incurs no additional computational expense. It is shown that, when applied to a smooth (but not necessarily convex) objective function, the iterates of the algorithm visit a neighborhood of stationarity infinitely often, assuming errors in the function and gradient evaluations are bounded. It is also shown that, after visiting the above neighborhood for the first time, the iterates cannot stray too far from it, as measured by the objective value. Numerical results illustrate how the classical trust region algorithm may fail in the presence of noise, and how the proposed algorithm ensures steady progress towards stationarity in these cases.
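The modification described above can be summarized as relaxing the standard ratio test by the known noise level in the function values. A minimal sketch of such a relaxed acceptance test and radius update, with illustrative constants rather than the paper's, is:

```python
def noisy_tr_update(f_old, f_new, model_decrease, delta, eps_f,
                    eta=0.1, shrink=0.5, grow=2.0, delta_max=1e3):
    """Trust-region acceptance when f is evaluated with noise of size about eps_f."""
    # Adding eps_f to numerator and denominator prevents noise alone from
    # rejecting a genuinely productive step or collapsing the radius.
    rho = (f_old - f_new + eps_f) / (model_decrease + eps_f)
    accept = rho >= eta
    if rho >= 0.75:
        delta = min(grow * delta, delta_max)
    elif rho < eta:
        delta = shrink * delta
    return accept, delta
```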
Article
We consider solving nonlinear optimization problems with a stochastic objective and deterministic equality constraints. We assume for the objective that its evaluation, gradient, and Hessian are inaccessible, while one can compute their stochastic estimates by, for example, subsampling. We propose a stochastic algorithm based on sequential quadratic programming (SQP) that uses a differentiable exact augmented Lagrangian as the merit function. To motivate our algorithm design, we first revisit and simplify an old SQP method of Lucidi (J. Optim. Theory Appl. 67(2): 227–245, 1990) developed for solving deterministic problems, which serves as the skeleton of our stochastic algorithm. Based on the simplified deterministic algorithm, we then propose a non-adaptive SQP for dealing with the stochastic objective, where the gradient and Hessian are replaced by stochastic estimates but the stepsizes are deterministic and prespecified. Finally, we incorporate a recent stochastic line search procedure of Paquette and Scheinberg (SIAM J. Optim. 30(1): 349–376, 2020) into the non-adaptive stochastic SQP to adaptively select the random stepsizes, which leads to an adaptive stochastic SQP. The global "almost sure" convergence for both non-adaptive and adaptive SQP methods is established. Numerical experiments on nonlinear problems in the CUTEst test set demonstrate the superiority of the adaptive algorithm.
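For orientation only, a plain augmented Lagrangian merit function and its gradient are sketched below; the paper instead uses a differentiable exact augmented Lagrangian in the sense of Lucidi, which carries additional multiplier-dependent terms not shown here. All names are illustrative.

```python
import numpy as np

def aug_lagrangian(f, c, x, lam, mu):
    """L_A(x, lam; mu) = f(x) + lam^T c(x) + (mu / 2) ||c(x)||^2."""
    cx = c(x)
    return f(x) + lam @ cx + 0.5 * mu * (cx @ cx)

def aug_lagrangian_grad(grad_f, c, jac_c, x, lam, mu):
    """Gradient of L_A with respect to x."""
    cx = c(x)
    return grad_f(x) + jac_c(x).T @ (lam + mu * cx)
```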
Article
In many optimization problems arising from scientific, engineering and artificial intelligence applications, objective and constraint functions are available only as the output of a black-box or simulation oracle that does not provide derivative information. Such settings necessitate the use of methods for derivative-free, or zeroth-order, optimization. We provide a review and perspectives on developments in these methods, with an emphasis on highlighting recent developments and on unifying treatment of such problems in the non-linear optimization and machine learning literature. We categorize methods based on assumed properties of the black-box functions, as well as features of the methods. We first overview the primary setting of deterministic methods applied to unconstrained, non-convex optimization problems where the objective function is defined by a deterministic black-box oracle. We then discuss developments in randomized methods, methods that assume some additional structure about the objective (including convexity, separability and general non-smooth compositions), methods for problems where the output of the black-box oracle is stochastic, and methods for handling different types of constraints.
Article
This paper presents a finite difference quasi-Newton method for the minimization of noisy functions. The method takes advantage of the scalability and power of BFGS updating, and employs an adaptive procedure for choosing the differencing interval h based on the noise estimation techniques of Hamming (2012) and Moré and Wild (2011). This noise estimation procedure and the selection of h are inexpensive but not always accurate, and to prevent failures the algorithm incorporates a recovery mechanism that takes appropriate action in the case when the line search procedure is unable to produce an acceptable point. A novel convergence analysis is presented that considers the effect of a noisy line search procedure. Numerical experiments comparing the method to a model-based trust region method are presented.
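The differencing-interval selection described above is driven by an estimate eps_f of the noise in f. As a stand-in for the paper's adaptive procedure, the sketch below uses the classical balance between truncation error (about L*h/2) and noise error (about 2*eps_f/h), which gives h = 2*sqrt(eps_f / L) for forward differences; the curvature bound L is an assumed input.

```python
import numpy as np

def fd_gradient(f, x, eps_f, curvature=1.0):
    """Forward-difference gradient of a noisy function f at x.

    eps_f     : estimated noise level in evaluations of f
    curvature : rough bound L on |f''| along each coordinate (illustrative default)
    """
    h = 2.0 * np.sqrt(eps_f / max(curvature, 1e-16))   # noise-aware interval
    f0 = f(x)
    g = np.zeros_like(x, dtype=float)
    for i in range(x.size):
        e = np.zeros_like(x, dtype=float)
        e[i] = h
        g[i] = (f(x + e) - f0) / h
    return g
```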
Book
Numerical Optimization presents a comprehensive and up-to-date description of the most effective methods in continuous optimization. It responds to the growing interest in optimization in engineering, science, and business by focusing on the methods that are best suited to practical problems. For this new edition the book has been thoroughly updated throughout. There are new chapters on nonlinear interior methods and derivative-free methods for optimization, both of which are used widely in practice and the focus of much current research. Because of the emphasis on practical methods, as well as the extensive illustrations and exercises, the book is accessible to a wide audience. It can be used as a graduate text in engineering, operations research, mathematics, computer science, and business. It also serves as a handbook for researchers and practitioners in the field. The authors have strived to produce a text that is pleasant to read, informative, and rigorous - one that reveals both the beautiful nature of the discipline and its practical side.
Conference Paper
We study Frank-Wolfe methods for nonconvex stochastic and finite-sum optimization problems. Frank-Wolfe methods (in the convex case) have gained tremendous recent interest in machine learning and optimization due to their projection-free property and their ability to exploit structured constraints. However, our understanding of these algorithms in the nonconvex setting is fairly limited. In this paper, we propose nonconvex stochastic Frank-Wolfe methods and analyze their convergence properties. Furthermore, for objective functions that decompose into a finite-sum, we leverage ideas from variance reduction for convex optimization to obtain new variance reduced nonconvex Frank-Wolfe methods that have provably faster convergence than the classical Frank-Wolfe method.
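For reference, the basic deterministic Frank-Wolfe iteration that these methods build on is shown below for an l1-ball constraint, chosen only because its linear minimization oracle is one line; the stochastic and variance-reduced variants discussed above replace grad with an estimator.

```python
import numpy as np

def frank_wolfe(grad, x0, radius, n_iters=100):
    """Frank-Wolfe on the l1 ball {x : ||x||_1 <= radius}."""
    x = x0.astype(float).copy()
    for t in range(n_iters):
        g = grad(x)
        # Linear minimization oracle: minimize <g, s> over the l1 ball,
        # attained at a signed, scaled coordinate vector.
        i = int(np.argmax(np.abs(g)))
        s = np.zeros_like(x)
        s[i] = -radius * np.sign(g[i])
        gamma = 2.0 / (t + 2.0)          # classical open-loop step size
        x = (1.0 - gamma) * x + gamma * s
    return x
```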
Article
We present a trust region-based method for the general nonlinearly equality constrained optimization problem. The method works by iteratively minimizing a quadratic model of the Lagrangian subject to a possibly relaxed linearization of the problem constraints and a trust region constraint. The model minimization may be done approximately with a dogleg-type approach. We show that this method is globally convergent even if singular or indefinite Hessian approximations are made. A second-order correction step that brings the iterates closer to the feasible set is described.
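The dogleg-type model minimization mentioned above is, in its unconstrained form, the following two-segment path between the Cauchy point and the (quasi-)Newton step; the relaxation of the linearized constraints that the paper combines it with is not shown. The sketch assumes a positive definite model Hessian B.

```python
import numpy as np

def dogleg_step(g, B, delta):
    """Approximate minimizer of g^T p + 0.5 p^T B p subject to ||p|| <= delta."""
    p_newton = -np.linalg.solve(B, g)
    if np.linalg.norm(p_newton) <= delta:
        return p_newton                              # full step fits in the region
    p_cauchy = -(g @ g) / (g @ (B @ g)) * g          # model minimizer along -g
    if np.linalg.norm(p_cauchy) >= delta:
        return -delta * g / np.linalg.norm(g)        # truncated steepest descent
    # Otherwise follow the segment from the Cauchy point toward the Newton
    # step until it meets the trust-region boundary.
    d = p_newton - p_cauchy
    a, b, c = d @ d, 2.0 * (p_cauchy @ d), p_cauchy @ p_cauchy - delta ** 2
    tau = (-b + np.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)
    return p_cauchy + tau * d
```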
Article
Algorithms based on trust regions have been shown to be robust methods for unconstrained optimization problems. All existing methods, based either on the dogleg strategy or on Hebden-Moré iterations, require the solution of a system of linear equations. In large-scale optimization this may be prohibitively expensive. It is shown in this paper that an approximate solution of the trust region problem may be found by the preconditioned conjugate gradient method. This may be regarded as a generalized dogleg technique where we asymptotically take the inexact quasi-Newton step. We also show that we have the same convergence properties as existing methods based on the dogleg strategy using an approximate Hessian.
Article
Perturbation estimates for the square root and the Pythagorean sum of complex matrices are proved. We present bounds in the spectral norm for the cases in which the square root R and the Pythagorean sum P are accretive, i.e., have positive definite Hermitian parts, with special attention to the case of a large condition number of A. The results are based on bounds for the solution of the matrix Sylvester equation and for the separation of two matrices.
Article
Recently developed Newton and quasi-Newton methods for nonlinear programming possess only local convergence properties. Adopting the concept of the damped Newton method in unconstrained optimization, we propose a stepsize procedure to maintain the monotone decrease of an exact penalty function. In so doing, the convergence of the method is globalized.
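The globalizing step-size procedure described above monitors an exact penalty function. A minimal backtracking sketch on the l1 penalty phi(x) = f(x) + nu * ||c(x)||_1 is given below; the simple-decrease test and the constants are illustrative, and a practical method would use an Armijo-type sufficient-decrease condition tied to the model.

```python
import numpy as np

def backtrack_on_penalty(f, c, x, d, nu, alpha0=1.0, rho=0.5, max_backtracks=30):
    """Backtracking line search on the exact penalty phi(x) = f(x) + nu * ||c(x)||_1."""
    phi = lambda z: f(z) + nu * np.linalg.norm(c(z), 1)
    phi0 = phi(x)
    alpha = alpha0
    for _ in range(max_backtracks):
        if phi(x + alpha * d) < phi0:    # simple decrease in the penalty function
            return alpha
        alpha *= rho                     # shrink the damped-Newton step
    return alpha
```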
Article
In this paper, we consider the singular values and singular vectors of finite, low-rank perturbations of large rectangular random matrices. Specifically, we prove almost sure convergence of the extreme singular values and appropriate projections of the corresponding singular vectors of the perturbed matrix. As in the prequel, where we considered the eigenvalue aspect of the problem, the non-random limiting value is shown to depend explicitly on the limiting singular value distribution of the unperturbed matrix via an integral transform that linearizes rectangular additive convolution in free probability theory. The large matrix limit of the extreme singular values of the perturbed matrix differs from that of the original matrix if and only if the singular values of the perturbing matrix are above a certain critical threshold which depends on this same aforementioned integral transform. We examine the consequence of this singular value phase transition on the associated left and right singular vectors and discuss the finite-n fluctuations above these non-random limits.
Article
Thesis (Ph. D.)--University of Colorado, 1989. Includes bibliographical references (leaves [107]-112).
Article
We propose performance profiles-distribution functions for a performance metric-as a tool for benchmarking and comparing optimization software. We show that performance profiles combine the best features of other tools for performance evaluation.
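A performance profile can be computed in a few lines: for each solver, the profile value at tau is the fraction of problems on which that solver's cost is within a factor tau of the best cost recorded for the problem. A minimal sketch (failures encoded as np.inf) follows.

```python
import numpy as np

def performance_profile(T, taus):
    """T[i, j] is the cost of solver j on problem i (np.inf marks a failure).

    Returns P with P[k, j] = fraction of problems on which solver j is within
    a factor taus[k] of the best solver for that problem.
    """
    best = np.min(T, axis=1, keepdims=True)                  # best cost per problem
    ratios = T / best                                        # performance ratios
    return np.array([(ratios <= tau).mean(axis=0) for tau in taus])

# Tiny example: 3 problems, 2 solvers; solver 0 fails on the last problem.
T = np.array([[1.0, 2.0], [3.0, 3.0], [np.inf, 5.0]])
print(performance_profile(T, taus=[1.0, 2.0, 10.0]))
```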
Modified line search sequential quadratic methods for equality-constrained optimization with unified global and local convergence guarantees
  • Albert S Berahas
  • Raghu Bollapragada
  • Jiahao Shi
Albert S Berahas, Raghu Bollapragada, and Jiahao Shi. Modified line search sequential quadratic methods for equality-constrained optimization with unified global and local convergence guarantees. arXiv preprint arXiv:2406.11144, 2024.
An interior-point algorithm for continuous nonlinearly constrained optimization with noisy function and derivative evaluations
  • Frank E Curtis
  • Shima Dezfulian
  • Andreas Waechter
Frank E Curtis, Shima Dezfulian, and Andreas Waechter. An interior-point algorithm for continuous nonlinearly constrained optimization with noisy function and derivative evaluations. arXiv preprint arXiv:2502.11302, 2025.
A stochastic-gradient-based interior-point algorithm for solving smooth bound-constrained optimization problems
  • Frank E Curtis
  • Vyacheslav Kungurtsev
  • Daniel P Robinson
  • Qi Wang
Frank E Curtis, Vyacheslav Kungurtsev, Daniel P Robinson, and Qi Wang. A stochastic-gradient-based interior-point algorithm for solving smooth bound-constrained optimization problems. arXiv preprint arXiv:2304.14907, 2023.
On the convergence of interior-point methods for bound-constrained nonlinear optimization problems with noise
  • Shima Dezfulian
  • Andreas Wächter
Shima Dezfulian and Andreas Wächter. On the convergence of interior-point methods for bound-constrained nonlinear optimization problems with noise. arXiv preprint arXiv:2405.11400, 2024.
S2MPJ and CUTEst optimization problems for Matlab, Python and Julia
  • Serge Gratton
  • Philippe L Toint
Serge Gratton and Philippe L Toint. S2MPJ and CUTEst optimization problems for Matlab, Python and Julia. arXiv preprint arXiv:2407.07812, 2024.
Noise-tolerant optimization methods for the solution of a robust design problem
  • Yuchen Lou
  • Shigeng Sun
  • Jorge Nocedal
Yuchen Lou, Shigeng Sun, and Jorge Nocedal. Noise-tolerant optimization methods for the solution of a robust design problem. arXiv preprint arXiv:2401.15007, 2024.
Imposing hard constraints on deep networks: Promises and limitations
  • Pablo Márquez-Neila
  • Mathieu Salzmann
  • Pascal Fua
Pablo Márquez-Neila, Mathieu Salzmann, and Pascal Fua. Imposing hard constraints on deep networks: Promises and limitations. arXiv preprint arXiv:1706.02025, 2017.
A fast algorithm for nonlinearly constrained optimization calculations
  • Michael JD Powell
Michael JD Powell. A fast algorithm for nonlinearly constrained optimization calculations. In Numerical Analysis: Proceedings of the Biennial Conference Held at Dundee, June 28-July 1, 1977, pages 144-157. Springer, 2006.
Matrix perturbation theory
  • Gilbert W Stewart
Gilbert W Stewart. Matrix perturbation theory. 1990.
A trust-region algorithm for noisy equality constrained optimization
  • Shigeng Sun
  • Jorge Nocedal
Shigeng Sun and Jorge Nocedal. A trust-region algorithm for noisy equality constrained optimization. arXiv preprint arXiv:2411.02665, 2024.