Jorge Nocedal’s research while affiliated with Northwestern University and other places

Publications (149)


Figure 4.1: Distance to optimality, log₂(x_k − x*), versus iteration number, with both noise parameters set to 10⁻³
Constrained Optimization in the Presence of Noise
  • Preprint
  • File available

October 2021

·

104 Reads

Figen Oztoprak

·

Richard Byrd

·

Jorge Nocedal

The problem of interest is the minimization of a nonlinear function subject to nonlinear equality constraints using a sequential quadratic programming (SQP) method. The minimization must be performed while observing only noisy evaluations of the objective and constraint functions. In order to obtain stability, the classical SQP method is modified by relaxing the standard Armijo line search based on the noise level in the functions, which is assumed to be known. Convergence theory is presented giving conditions under which the iterates converge to a neighborhood of the solution characterized by the noise level and the problem conditioning. The analysis assumes that the SQP algorithm does not require regularization or trust regions. Numerical experiments indicate that the relaxed line search improves the practical performance of the method on problems involving uniformly distributed noise. One important application of this work is in the field of derivative-free optimization, when finite differences are employed to estimate gradients.
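
As a rough illustration of the relaxed line search described in the abstract, the sketch below loosens the classical Armijo test by the assumed noise level; the names phi, dphi and eps_noise are placeholders, not the authors' code.

```python
def relaxed_armijo_backtracking(phi, x, d, dphi, eps_noise,
                                c1=1e-4, alpha0=1.0, rho=0.5, max_trials=30):
    """Backtracking line search with an Armijo condition relaxed by the
    noise level. `phi` is a (noisy) merit function, `dphi` an estimate of
    its directional derivative along `d`, and `eps_noise` a bound on the
    noise in merit-function evaluations. All names are illustrative."""
    phi_x = phi(x)
    alpha = alpha0
    for _ in range(max_trials):
        # Classical Armijo test, loosened by 2*eps_noise so that noise in
        # the two function evaluations cannot by itself reject a good step.
        if phi(x + alpha * d) <= phi_x + c1 * alpha * dphi + 2.0 * eps_noise:
            return alpha
        alpha *= rho
    return alpha  # fall back to the smallest trial step
```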


On the Numerical Performance of Derivative-Free Optimization Methods Based on Finite-Difference Approximations

February 2021

·

53 Reads

The goal of this paper is to investigate an approach for derivative-free optimization that has not received sufficient attention in the literature and yet is one of the simplest to implement and parallelize. It consists of computing gradients of a smoothed approximation of the objective function (and constraints), and employing them within established codes. These gradient approximations are calculated by finite differences, with a differencing interval determined by the noise level in the functions and a bound on the second or third derivatives. It is assumed that the noise level is known or can be estimated by means of difference tables or sampling. The use of finite differences has been largely dismissed in the derivative-free optimization literature as too expensive in terms of function evaluations and/or as impractical when the objective function contains noise. The test results presented in this paper suggest that such views should be re-examined and that the finite-difference approach has much to recommend it. The tests compared NEWUOA, DFO-LS and COBYLA against the finite-difference approach on three classes of problems: general unconstrained problems, nonlinear least squares, and general nonlinear programs with equality constraints.
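
A minimal sketch of the noise-aware differencing rule mentioned above, assuming a forward-difference scheme, a known noise level eps_f and a bound on the second derivative; it is an illustration, not the paper's implementation.

```python
import numpy as np

def fd_gradient(f, x, eps_f, second_deriv_bound=1.0):
    """Forward-difference gradient of a noisy function f at x. The
    differencing interval balances truncation error (~ h*L/2) against
    noise error (~ 2*eps_f/h); the balance point is h = 2*sqrt(eps_f/L).
    `eps_f` (noise level) and `second_deriv_bound` (L) are assumed known,
    as in the paper; the function and parameter names are illustrative."""
    x = np.asarray(x, dtype=float)
    h = 2.0 * np.sqrt(eps_f / second_deriv_bound)
    f0 = f(x)
    g = np.empty_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f0) / h   # one extra evaluation per coordinate
    return g
```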


Constrained and Composite Optimization via Adaptive Sampling Methods

December 2020

·

33 Reads

The motivation for this paper stems from the desire to develop an adaptive sampling method for solving constrained optimization problems in which the objective function is stochastic and the constraints are deterministic. The method proposed in this paper is a proximal gradient method that can also be applied to the composite optimization problem min f(x) + h(x), where f is stochastic and h is convex (but not necessarily differentiable). Adaptive sampling methods employ a mechanism for gradually improving the quality of the gradient approximation so as to keep computational cost to a minimum. The mechanism commonly employed in unconstrained optimization is no longer reliable in the constrained or composite optimization settings because it is based on pointwise decisions that cannot correctly predict the quality of the proximal gradient step. The method proposed in this paper measures the result of a complete step to determine if the gradient approximation is accurate enough; otherwise a more accurate gradient is generated and a new step is computed. Convergence results are established both for strongly convex and general convex f. Numerical experiments are presented to illustrate the practical behavior of the method.
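
The step-based accuracy test can be sketched roughly as follows; grad_sample, prox_h and grow are hypothetical callables, and the acceptance criterion shown is a simple stand-in for the one analyzed in the paper.

```python
import numpy as np

def adaptive_prox_gradient_step(grad_sample, prox_h, x, alpha, sample, grow,
                                theta=0.9, max_tries=5):
    """One proximal-gradient step with adaptive sampling. `grad_sample(x, S)`
    returns the averaged stochastic gradient and a scalar variance estimate
    on the sample S, `prox_h(z, alpha)` is the proximal operator of h, and
    `grow(S)` enlarges the sample. The test below compares the sampling
    error with the length of the *complete* proximal step; it is a simple
    stand-in for the criterion analyzed in the paper."""
    for _ in range(max_tries):
        g, var = grad_sample(x, sample)
        x_trial = prox_h(x - alpha * g, alpha)       # full proximal step
        step = x_trial - x
        # Accept if the estimated gradient error is small relative to the step.
        if alpha ** 2 * var / len(sample) <= theta ** 2 * np.dot(step, step):
            return x_trial, sample
        sample = grow(sample)                        # otherwise refine the gradient
    return x_trial, sample
```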


A Noise-Tolerant Quasi-Newton Algorithm for Unconstrained Optimization

October 2020

·

52 Reads

This paper describes an extension of the BFGS and L-BFGS methods for the minimization of a nonlinear function subject to errors. This work is motivated by applications that contain computational noise, employ low-precision arithmetic, or are subject to statistical noise. The classical BFGS and L-BFGS methods can fail in such circumstances because the updating procedure can be corrupted and the line search can behave erratically. The proposed method addresses these difficulties and ensures that the BFGS update is stable by employing a lengthening procedure that spaces out the points at which gradient differences are collected. A new line search, designed to tolerate errors, guarantees that the Armijo-Wolfe conditions are satisfied under most reasonable conditions, and works in conjunction with the lengthening procedure. The proposed methods are shown to enjoy convergence guarantees for strongly convex functions. Detailed implementations of the methods are presented, together with encouraging numerical results.
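
A minimal sketch of the lengthening idea, with a hypothetical lengthening parameter ell; the paper's rule for choosing the lengthening and its interaction with the noise-tolerant line search are more refined.

```python
import numpy as np

def curvature_pair_with_lengthening(grad, x, d, alpha, ell):
    """Form the (s, y) pair for a noise-tolerant (L-)BFGS update. `grad`
    returns noisy gradients, alpha*d is the accepted step, and `ell` is a
    lengthening parameter (e.g. proportional to the gradient-noise level).
    If the step is shorter than `ell`, the gradient difference is collected
    over a lengthened interval along d so that y is not dominated by noise.
    Hypothetical sketch of the idea, not the authors' exact rule."""
    g_x = grad(x)
    d = np.asarray(d, dtype=float)
    s = alpha * d
    if np.linalg.norm(s) < ell:
        # Lengthening: space out the two gradient evaluations.
        s = (ell / np.linalg.norm(d)) * d
    y = grad(x + s) - g_x
    return s, y
```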



Analysis of the BFGS Method with Errors

January 2019

·

80 Reads

The classical convergence analysis of quasi-Newton methods assumes that the function and gradients employed at each iteration are exact. In this paper, we consider the case when there are (bounded) errors in both computations and establish conditions under which a slight modification of the BFGS algorithm with an Armijo-Wolfe line search converges to a neighborhood of the solution that is determined by the size of the errors. One of our results is an extension of the analysis presented in Byrd, R. H., & Nocedal, J. (1989), which establishes that, for strongly convex functions, a fraction of the BFGS iterates are good iterates. We present numerical results illustrating the performance of the new BFGS method in the presence of noise.


Derivative-Free Optimization of Noisy Functions via Quasi-Newton Methods

March 2018

·

79 Reads

·

115 Citations

SIAM Journal on Optimization

This paper presents a finite-difference quasi-Newton method for the minimization of noisy functions. The method takes advantage of the scalability and power of BFGS updating, and employs an adaptive procedure for choosing the differencing interval h based on the noise estimation techniques of Hamming (2012) and Moré and Wild (2011). This noise estimation procedure and the selection of h are inexpensive but not always accurate, and to prevent failures the algorithm incorporates a recovery mechanism that takes appropriate action in the case when the line search procedure is unable to produce an acceptable point. A novel convergence analysis is presented that considers the effect of a noisy line search procedure. Numerical experiments comparing the method to a model-based trust-region method are presented.
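
The difference-table noise estimation on which the choice of h rests can be sketched as follows; this is a simplified, illustrative routine in the spirit of the Moré-Wild procedure, not the published algorithm.

```python
import numpy as np
from math import factorial

def estimate_noise(f, x, d, m=7, delta=1e-2):
    """Rough noise-level estimate in the spirit of the difference-table
    procedure of Moré and Wild: evaluate f at m+1 equally spaced points
    along the direction d and use higher-order differences, which mostly
    annihilate the smooth part of f and leave the noise. Simplified
    illustration; `m` and `delta` are arbitrary choices, and the published
    algorithm includes additional safeguards."""
    t = delta * np.linspace(-m / 2.0, m / 2.0, m + 1)
    fvals = np.array([f(x + ti * d) for ti in t])
    diffs = fvals.copy()
    estimates = []
    for k in range(1, m + 1):
        diffs = np.diff(diffs)                        # k-th order differences
        gamma = factorial(k) ** 2 / factorial(2 * k)  # scaling so pure noise gives sigma^2
        estimates.append(np.sqrt(gamma * np.mean(diffs ** 2)))
    return np.median(estimates)
```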


Derivative-Free Optimization of Noisy Functions via Quasi-Newton Methods

March 2018

This paper presents a finite-difference quasi-Newton method for the minimization of noisy functions. The method takes advantage of the scalability and power of BFGS updating, and employs an adaptive procedure for choosing the differencing interval h based on the noise estimation techniques of Hamming (2012) and Moré and Wild (2011). This noise estimation procedure and the selection of h are inexpensive but not always accurate, and to prevent failures the algorithm incorporates a recovery mechanism that takes appropriate action in the case when the line search procedure is unable to produce an acceptable point. A novel convergence analysis is presented that considers the effect of a noisy line search procedure. Numerical experiments comparing the method to a function-interpolating trust-region method are presented.


A Progressive Batching L-BFGS Method for Machine Learning

February 2018

·

508 Reads

·

91 Citations

·

Jorge Nocedal

·

[...]

The standard L-BFGS method relies on gradient approximations that are not dominated by noise, so that search directions are descent directions, the line search is reliable, and quasi-Newton updating yields useful quadratic models of the objective function. All of this appears to call for a full batch approach, but since small batch sizes give rise to faster algorithms with better generalization properties, L-BFGS is currently not considered an algorithm of choice for large-scale machine learning applications. One need not, however, choose between the two extremes represented by the full batch or highly stochastic regimes, and may instead follow a progressive batching approach in which the sample size increases during the course of the optimization. In this paper, we present a new version of the L-BFGS algorithm that combines three basic components - progressive batching, a stochastic line search, and stable quasi-Newton updating - and that performs well on training logistic regression and deep neural networks. We provide supporting convergence theory for the method.
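
A skeleton of the progressive batching idea is sketched below; batch_grad is a hypothetical callable, a fixed step length stands in for the stochastic line search, and a simple variance test stands in for the paper's batch-size control.

```python
import numpy as np

def two_loop(g, s_list, y_list):
    """Standard L-BFGS two-loop recursion: returns the search direction
    -H*g built from the stored curvature pairs."""
    q = g.copy()
    alphas = []
    for s, y in zip(reversed(s_list), reversed(y_list)):
        a = np.dot(s, q) / np.dot(y, s)
        alphas.append(a)
        q -= a * y
    if s_list:  # scale by the most recent pair
        q *= np.dot(s_list[-1], y_list[-1]) / np.dot(y_list[-1], y_list[-1])
    for (s, y), a in zip(zip(s_list, y_list), reversed(alphas)):
        b = np.dot(y, q) / np.dot(y, s)
        q += (a - b) * s
    return -q

def progressive_batching_lbfgs(batch_grad, x, n_data, batch0=64, theta=0.9,
                               alpha=0.1, mem=10, iters=100, rng=None):
    """Skeleton of a progressive-batching L-BFGS loop. `batch_grad(x, idx)`
    is a hypothetical callable returning the averaged gradient and a scalar
    variance estimate on the index set idx. The batch grows when a simple
    variance test fails (a stand-in for the paper's inner product test),
    and the curvature pair uses gradients evaluated on the SAME batch at
    the old and new iterates, which keeps the quasi-Newton update stable."""
    rng = rng or np.random.default_rng(0)
    batch, s_list, y_list = batch0, [], []
    x = np.asarray(x, dtype=float)
    for _ in range(iters):
        idx = rng.choice(n_data, size=min(batch, n_data), replace=False)
        g, var = batch_grad(x, idx)
        if var / len(idx) > theta ** 2 * np.dot(g, g):
            batch = min(2 * batch, n_data)           # progressive batching
        d = two_loop(g, s_list, y_list)
        x_new = x + alpha * d                        # fixed step in this sketch
        g_new, _ = batch_grad(x_new, idx)            # same batch -> stable (s, y)
        s, y = x_new - x, g_new - g
        if np.dot(s, y) > 1e-10 * np.dot(s, s):      # curvature safeguard
            s_list.append(s); y_list.append(y)
            if len(s_list) > mem:
                s_list.pop(0); y_list.pop(0)
        x = x_new
    return x
```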


Adaptive Sampling Strategies for Stochastic Optimization

October 2017

·

2 Reads

In this paper, we propose a stochastic optimization method that adaptively controls the sample size used in the computation of gradient approximations. Unlike other variance reduction techniques that either require additional storage or the regular computation of full gradients, the proposed method reduces variance by increasing the sample size as needed. The decision to increase the sample size is governed by an inner product test that ensures that search directions are descent directions with high probability. We show that the inner product test improves upon the well known norm test, and can be used as a basis for an algorithm that is globally convergent on nonconvex functions and enjoys a global linear rate of convergence on strongly convex functions. Numerical experiments on logistic regression problems illustrate the performance of the algorithm.
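
The inner product test can be sketched as follows, assuming access to the individual per-sample gradients; this is an illustrative reading of the test, not the authors' implementation.

```python
import numpy as np

def inner_product_test(per_sample_grads, theta=0.9):
    """Inner product test for judging whether the current sample size is
    adequate. `per_sample_grads` is an (n, dim) array of individual
    gradients (n >= 2) and g is their average. The test requires the sample
    variance of the inner products grad_i^T g, divided by the sample size,
    to be at most theta^2 * ||g||^4, which is aimed at ensuring that -g is
    a descent direction with high probability. Illustrative reading of the
    test, not the authors' code."""
    g = per_sample_grads.mean(axis=0)
    inner = per_sample_grads @ g                     # grad_i^T g for each sample
    n = per_sample_grads.shape[0]
    return inner.var(ddof=1) / n <= theta ** 2 * np.dot(g, g) ** 2
```

When the test fails, the sample size would be increased (roughly in proportion to the violation) before the next step is taken.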


Citations (83)


... Various algorithms have been designed to solve deterministic equality-constrained optimization problems (see [6,11] for further references), while recent research has focused on developing stochastic optimization algorithms. There has been a growing interest in adapting line search and trust region methods to a stochastic framework for unconstrained optimization problems [1-5, 9, 10, 12, 14, 20, 22, 24, 26-28], but significantly fewer algorithms have been proposed to solve stochastic equality-constrained optimization problems (see [6] for further references and [7,13,15,34,37]). ...

Reference:

IPAS: An Adaptive Sample Size Method for Weighted Finite Sum Problems with Linear Equality Constraints
Constrained Optimization in the Presence of Noise
  • Citing Article
  • August 2023

SIAM Journal on Optimization

... A deterministic version of this condition has been used in [2] for the analysis of a proximal inexact trust-region algorithm. A stochastic version imposed in expectation was used in [4], and an alternative that is meant to be more practical is suggested in [60]. Further variants for general constrained optimization are proposed in [8]. ...

Constrained and composite optimization via adaptive sampling methods
  • Citing Article
  • May 2023

IMA Journal of Numerical Analysis

... In this paper, we focus on noise-aware algorithms for solving such problems, i.e., algorithms that exploit information about the noise and that are adaptive. In the unconstrained and bounded noise setting, several noise-aware algorithms that leverage noise-level dependent constants (e.g., ϵ f and ϵ g ) to evaluate the acceptability of steps within line search [5,6,29,48] or trust region [2,12,30,44] methods have been proposed. A natural extension of these algorithms to the constrained setting assumes bounded noise in the objective function and associated derivatives, and possibly in the constraint functions. ...

A trust region method for noisy unconstrained optimization
  • Citing Article
  • March 2023

Mathematical Programming

... Our findings indicate a general superiority of the newly developed methods over the basic version of the IGD (inexact gradient descent) method without momentum. As discussed in [20,45], IGD in general outperforms other well-developed methods in derivative-free optimization, including FMINSEARCH, i.e., the Nelder-Mead simplex-based method from [25], the implicit filtering algorithms [10], and the random gradient-free algorithm for smooth optimization proposed by Nesterov and Spokoiny [34]. As a consequence, IGDm can be recommended as a preferable optimizer for derivative-free smooth (convex and nonconvex) optimization problems. ...

On the numerical performance of finite-difference-based methods for derivative-free optimization
  • Citing Article
  • September 2022

Optimization Methods and Software

... Finally, it would be interesting to compare our methods with recent results on adaptive finite-difference methods [35], which automatically adjust the finite-difference interval to balance truncation error and measurement error, making them suitable for noisy derivativefree optimization. We keep these questions for further research. ...

Adaptive Finite-Difference Interval Estimation for Noisy Derivative-Free Optimization
  • Citing Article
  • August 2022

SIAM Journal on Scientific Computing

... Byrd et al. [3] have proposed a stochastic quasi-Newton method in limited memory form through subsampled Hessian-vector products. Shi et al. [23] have proposed practical extensions of the BFGS and L-BFGS methods for nonlinear optimization that are capable of dealing with noise by employing a new linesearch technique. Xie et al. [24] have considered the convergence analysis of quasi-Newton methods when there are (bounded) errors in both function and gradient evaluations, and established conditions under which an Armijo-Wolfe linesearch on the noisy function yields sufficient decrease in the true objective function. ...

A Noise-Tolerant Quasi-Newton Algorithm for Unconstrained Optimization
  • Citing Article
  • March 2022

SIAM Journal on Optimization

... In this paper, we focus on noise-aware algorithms for solving such problems, i.e., algorithms that exploit information about the noise and that are adaptive. In the unconstrained and bounded noise setting, several noise-aware algorithms that leverage noise-level dependent constants (e.g., ϵ f and ϵ g ) to evaluate the acceptability of steps within line search [5,6,29,48] or trust region [2,12,30,44] methods have been proposed. A natural extension of these algorithms to the constrained setting assumes bounded noise in the objective function and associated derivatives, and possibly in the constraint functions. ...

Analysis of the BFGS Method with Errors
  • Citing Article
  • January 2020

SIAM Journal on Optimization

... In this paper, we focus on noise-aware algorithms for solving such problems, i.e., algorithms that exploit information about the noise and that are adaptive. In the unconstrained and bounded noise setting, several noise-aware algorithms that leverage noise-level dependent constants (e.g., ϵ f and ϵ g ) to evaluate the acceptability of steps within line search [5,6,29,48] or trust region [2,12,30,44] methods have been proposed. A natural extension of these algorithms to the constrained setting assumes bounded noise in the objective function and associated derivatives, and possibly in the constraint functions. ...

Derivative-Free Optimization of Noisy Functions via Quasi-Newton Methods
  • Citing Article
  • March 2018

SIAM Journal on Optimization

... Due to the importance of machine learning and deep learning, [29] and [30] analyze the performance of quasi-Newton methods in these fields. Also, [31] and [32] seek to determine a suitable batch selection method for training machine learning models. ...

A Progressive Batching L-BFGS Method for Machine Learning