Book

Numerical Optimization

Authors: Jorge Nocedal, Stephen J. Wright

Abstract

Numerical Optimization presents a comprehensive and up-to-date description of the most effective methods in continuous optimization. It responds to the growing interest in optimization in engineering, science, and business by focusing on the methods that are best suited to practical problems. For this new edition the book has been thoroughly updated throughout. There are new chapters on nonlinear interior methods and derivative-free methods for optimization, both of which are used widely in practice and the focus of much current research. Because of the emphasis on practical methods, as well as the extensive illustrations and exercises, the book is accessible to a wide audience. It can be used as a graduate text in engineering, operations research, mathematics, computer science, and business. It also serves as a handbook for researchers and practitioners in the field. The authors have strived to produce a text that is pleasant to read, informative, and rigorous - one that reveals both the beautiful nature of the discipline and its practical side.

Chapters (10)

... The new iterate is well defined in case DR_MMD(X_k, λ⃗_k) is invertible, e.g., if J has full row rank and if ∇²MMD(X) + S is positive definite on the tangent space of the constraints [26]. In the case where all the constraints h_i are linear, the matrix S vanishes, leading to a significant reduction in the computational cost, as there is no need to compute the second-order derivative of the constraints. ...
... To ensure the Newton update in Eq. (10) is a descent direction, we must precondition the Hessian matrix. To realize this, we utilize the preconditioning Algorithm 3.3 from [26] (details in Appendix C.1). Lastly, we employ a backtracking line search with Armijo's condition [26] to determine the step-size s of the Newton step. ...
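As a rough illustration of the kind of Hessian preconditioning referred to above (a sketch only, not the authors' implementation; the function name is ours): one common scheme, in the spirit of Algorithm 3.3 of Nocedal and Wright, adds a multiple of the identity until a Cholesky factorization succeeds, which guarantees the resulting Newton step is a descent direction; a backtracking Armijo search would then pick the step size.

```python
import numpy as np

def modified_newton_direction(grad, hess, beta=1e-3, max_tries=60):
    """Compute a descent direction p solving (H + tau*I) p = -grad, where tau >= 0
    is increased until the shifted Hessian admits a Cholesky factorization."""
    n = hess.shape[0]
    min_diag = np.min(np.diag(hess))
    tau = 0.0 if min_diag > 0 else beta - min_diag
    for _ in range(max_tries):
        try:
            L = np.linalg.cholesky(hess + tau * np.eye(n))
            y = np.linalg.solve(L, -grad)       # forward substitution
            return np.linalg.solve(L.T, y)      # back substitution
        except np.linalg.LinAlgError:
            tau = max(2.0 * tau, beta)          # shift more aggressively and retry
    raise RuntimeError("Hessian could not be regularized")
```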
Preprint
Maximum mean discrepancy (MMD) has been widely employed to measure the distance between probability distributions. In this paper, we propose using MMD to solve continuous multi-objective optimization problems (MOPs). For solving MOPs, a common approach is to minimize the distance (e.g., Hausdorff) between a finite approximate set of the Pareto front and a reference set. Viewing these two sets as empirical measures, we propose using MMD to measure the distance between them. To minimize the MMD value, we provide the analytical expression of its gradient and Hessian matrix w.r.t. the search variables, and use them to devise a novel set-oriented, MMD-based Newton (MMDN) method. Also, we analyze the theoretical properties of MMD's gradient and Hessian, including the first-order stationary condition and the eigenspectrum of the Hessian, which are important for verifying the correctness of MMDN. To solve complicated problems, we propose hybridizing MMDN with multiobjective evolutionary algorithms (MOEAs), where we first execute an EA for several iterations to get close to the global Pareto front and then warm-start MMDN with the result of the MOEA to efficiently refine the approximation. We empirically test the hybrid algorithm on 11 widely used benchmark problems, and the results show the hybrid (MMDN + MOEA) can achieve a much better optimization accuracy than EA alone with the same computation budget.
... Mixed-Integer Linear Programming (MILP) models are widely used for hydropower scheduling [42,51,63]. MILP formulations permit the inclusion of various constraints, allowing for a better representation of the STHS problem, such as reservoir storage, water release policies, and energy generation targets, to name a few [4,18,39]. In [57], a two-phase STHS optimization model is developed to first obtain the water discharge, the volume of the reservoir and the number of units working in each period, then determine which combination of turbines to use. ...
... parameters, which are known values that define the problem; variables, whose values are chosen to optimize the objective function; constraints, which are restrictions on the problem; and bounds, which must be respected by the variable values in order to obtain a feasible solution [39]. The following formulation represents a generic form of an MILP minimization: ...
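For orientation, a standard generic statement of an MILP minimization (a textbook form, not necessarily identical to the formulation referenced in [39]) is

min_{x,y} c^⊤ x + d^⊤ y
s.t. A x + B y ≤ b,
     l ≤ (x, y) ≤ u,
     x ∈ R^n, y ∈ Z^p,

where the parameters c, d, A, B, b and the bounds l, u are known data, (x, y) are the decision variables, and the integrality requirement on y is what distinguishes the MILP from an ordinary LP.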
... In hydropower, the modelling of an MILP for the STHS problem is based on transforming the problem into mathematical functions, parameters and variables, using proper algorithms to find the optimal value, given the scope of the problem [39]. Compared to the MILP formulation, a machine learning model is built with historical values related to the problem. ...
Preprint
Full-text available
Hydropower generation plays a crucial role in the global energy landscape, offering a renewable and sustainable source of electricity. Accurate forecasting of hydropower output is essential for efficient energy management and maintaining grid stability. This paper presents an autoregressive Long Short-Term Memory (LSTM) model designed to predict short-term hydropower production, specifically targeting the hourly water output decisions of two interconnected hydropower plants located on the Péribonka River in Québec, Canada. Given the critical role of efficient scheduling in hydropower operations, especially within the Short-Term Hydropower Scheduling (STHS) problem, our model aims to offer a viable machine learning-based solution to complement traditional optimization approaches. We evaluated the LSTM model by comparing its predictive performance with historical operational data and results derived from a deterministic Mixed-Integer Linear Programming (MILP) model. Our analysis covers multiple validation instances, showcasing the capabilities of the model and highlighting its strengths and limitations. The results demonstrate that the autoregressive LSTM approach successfully captures the underlying patterns in water discharge decisions, providing predictions that are generally aligned with operational realities and optimized benchmarks. However, the study also underscores challenges such as maintaining reservoir volume constraints, particularly in periods of high inflow variability. Despite these challenges, the LSTM model presents promising predictive performance, laying the foundation for further improvements in integrating machine learning into short-term hydropower management. To our knowledge, this is the first study to apply an autoregressive supervised LSTM model to predict hourly water flow decisions in hydropower systems, thus significantly contributing to the advancement of machine learning applications in hydropower scheduling.
... In practice, many real-world optimization tasks are large-scale, nonconvex, and constrained. Typically, such problems lack closed-form analytical solutions and cannot be solved efficiently by polynomial-time algorithms [2]. Consequently, the design of efficient numerical solvers remains a crucial research topic in optimization [3,4]. ...
... Nevertheless, traditional numerical iterative solvers inherently rely on sequential computation, which poses a significant obstacle to parallel implementation [12]. Generally, an unconstrained optimization iteration adopts the following form [2]: ...
... where x_k denotes the current solution iterate, α_k is the step size selected by line search strategies (e.g., golden section or Armijo rule [13]), and p_k represents a descent direction (e.g., gradient or Newton-like direction). For constrained optimization problems, methods like penalty-based reformulation or Lagrangian dualization are commonly employed to transform constrained problems into unconstrained forms [2]. ...
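The elided update referred to above is the standard iteration (reconstructed from the surrounding definitions):

x_{k+1} = x_k + α_k p_k,   with ∇f(x_k)^⊤ p_k < 0,

i.e., each step moves along a descent direction p_k scaled by the line-search step size α_k.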
Preprint
Full-text available
We propose an input convex neural network (ICNN)-based self-supervised learning framework to solve continuous constrained optimization problems. By integrating the augmented Lagrangian method (ALM) with the constraint correction mechanism, our framework ensures \emph{non-strict constraint feasibility}, \emph{better optimality gap}, and \emph{best convergence rate} with respect to the state-of-the-art learning-based methods. We provide a rigorous convergence analysis, showing that the algorithm converges to a Karush-Kuhn-Tucker (KKT) point of the original problem even when the internal solver is a neural network, and the approximation error is bounded. We test our approach on a range of benchmark tasks including quadratic programming (QP), nonconvex programming, and large-scale AC optimal power flow problems. The results demonstrate that compared to existing solvers (e.g., \texttt{OSQP}, \texttt{IPOPT}) and the latest learning-based methods (e.g., DC3, PDL), our approach achieves a superior balance among accuracy, feasibility, and computational efficiency.
... The set C(U, G) here is called the critical cone. It is a classical result that if U_0 satisfies (6) and (7) then it is a strict local minimum; see [12, Theorem 12.6]. However, as pointed out by Murty and Kadabi [11], verifying that (7) holds is NP-hard in general, as it amounts to testing whether a submatrix of the Hessian is copositive. ...
... A point U_0 that satisfies (6) and (8) is called a second-order critical point; it can be either a local minimum or a higher-order saddle point, see [12, Theorem 12.5]. Our proof of the nonexistence of spurious local minima for δ = 0 and r⋆ = 1 works by using (6) and (8) to imply U U^⊤ = U⋆ U⋆^⊤. ...
... We also recall the derivatives of the balancing regularizer from (12). We are now ready to verify that the asymmetric case LR^⊤ inherits the same spurious local minimum L = R = U_0 = αQ_2 from the symmetric case by verifying the first-order necessary condition (6) and the second-order sufficient condition (7). ...
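For reference, the conditions referenced by the equation numbers (6)-(8) have the following generic textbook form (stated schematically for an objective f and critical cone C; in the constrained case the Hessian of the Lagrangian appears instead, see [12]):

∇f(U_0) = 0   (first-order necessary, cf. (6)),
d^⊤ ∇²f(U_0) d > 0 for all d ∈ C \ {0}   (second-order sufficient, cf. (7)),
d^⊤ ∇²f(U_0) d ≥ 0 for all d ∈ C   (second-order necessary, cf. (8)).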
Preprint
The classical low-rank matrix recovery problem is well-known to exhibit \emph{benign nonconvexity} under the restricted isometry property (RIP): local optimization is guaranteed to converge to the global optimum, where the ground truth is recovered. We investigate whether benign nonconvexity continues to hold when the factor matrices are constrained to be elementwise nonnegative -- a common practical requirement. In the simple setting of a rank-1 nonnegative ground truth, we confirm that benign nonconvexity holds in the fully-observed case with RIP constant $\delta=0$. Surprisingly, however, this property fails to extend to the partially-observed case with any arbitrarily small RIP constant $\delta\to0^{+}$, irrespective of rank overparameterization. This finding exposes a critical theoretical gap: the continuity argument widely used to explain the empirical robustness of low-rank matrix recovery fundamentally breaks down once nonnegative constraints are imposed.
... Since this dual formulation is convex and unconstrained, it can be tackled using standard convex optimization algorithms. In practice, we solve it using the BFGS algorithm [21] available through SciPy's minimize function. Example. ...
... , λ_K]^⊤ ∈ R^{K+1}. We will find a solution to (21) by solving the dual problem: min ...
... Since its partial derivatives are equal to ∂d(λ)/∂λ_k = m_k − ∫_Ω x^k g_λ(x) dx, we have that whenever the gradient of d(λ) is zero, then g_λ satisfies the moment constraints (20). In other words, if λ* is a solution to the dual problem (23), then the function g_{λ*} given by (24) both maximizes the Lagrangian (22) and satisfies the moment constraints (20), i.e., it is a solution to the primal problem (21). This implies that strong duality holds for the maximum entropy problem. ...
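A minimal numerical sketch of this dual approach (a toy example on Ω = [0, 1] with polynomial moments, not the cited implementation): the dual objective d(λ) = ∫_Ω g_λ(x) dx + Σ_k λ_k m_k is convex and unconstrained, so SciPy's BFGS can minimize it directly, and the recovered moments can be checked against the targets.

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import minimize

# Target moments m_k = ∫_0^1 x^k g(x) dx for k = 0, 1, 2 (those of the uniform density).
m = np.array([1.0, 0.5, 1.0 / 3.0])

def g_lam(x, lam):
    # Maximum-entropy ansatz obtained from the Lagrangian stationarity condition.
    return np.exp(-1.0 - sum(l * x**k for k, l in enumerate(lam)))

def dual(lam):
    # Convex, unconstrained dual objective: ∫ g_lam dx + Σ_k lam_k m_k.
    integral, _ = quad(lambda x: g_lam(x, lam), 0.0, 1.0)
    return integral + lam @ m

res = minimize(dual, x0=np.zeros(3), method="BFGS")   # gradient via finite differences
lam_star = res.x
# Zero dual gradient <=> the moment constraints hold for g_{lam*}.
recovered = [quad(lambda x: x**k * g_lam(x, lam_star), 0.0, 1.0)[0] for k in range(3)]
print(lam_star, recovered)   # expect lam* ≈ (-1, 0, 0), moments ≈ (1, 0.5, 0.333)
```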
Preprint
Full-text available
Modeling of intricate relational patterns through the analysis of network data has become a cornerstone of contemporary statistical research and related data science fields. Networks, represented as graphs, offer a natural framework for this analysis. This paper extends the Random Dot Product Graph (RDPG) model to accommodate weighted graphs, markedly broadening the model's scope to scenarios where edges exhibit heterogeneous weight distributions. We propose a nonparametric weighted (W)RDPG model that assigns a sequence of latent positions to each node. Inner products of these nodal vectors specify the moments of their incident edge weights' distribution via moment-generating functions. In this way, and unlike prior art, the WRDPG can discriminate between weight distributions that share the same mean but differ in other higher-order moments. We derive statistical guarantees for an estimator of the nodes' latent positions adapted from the workhorse adjacency spectral embedding, establishing its consistency and asymptotic normality. We also contribute a generative framework that enables sampling of graphs that adhere to a (prescribed or data-fitted) WRDPG, facilitating, e.g., the analysis and testing of observed graph metrics using judicious reference distributions. The paper is organized to formalize the model's definition, the estimation (or nodal embedding) process and its guarantees, as well as the methodologies for generating weighted graphs, all complemented by illustrative and reproducible examples showcasing the WRDPG's effectiveness in various network analytic applications.
... At a high level, derivatives offer a local characterization of a function's steepest ascent or descent directions. In practice, this property is frequently employed in numerical optimization, where derivatives guide the iterative process of navigating downhill through the landscape of a function [19]. For example, derivative-based optimization is widely used in robotics for tasks such as inverse kinematics, trajectory optimization, physics simulation, control, learning, and constrained planning. ...
... We provide a closed-form solution to this minimization problem by directly solving its corresponding Karush-Kuhn-Tucker (KKT) system [9,19]. Our algorithm that uses this minimization also incorporates an error detection and correction mechanism that automatically identifies when its outputs drift too far from the ground-truth derivatives, allocating additional iterations to realign its results as needed. ...
... where Λ ∈ R^{m×1} are the Lagrange multipliers. A first-order necessary condition for an optimal solution is that the Karush-Kuhn-Tucker (KKT) conditions are satisfied [19]. Specifically, for an equality constrained problem, this means that the partial derivatives of the Lagrangian with respect to both the decision variables and the Lagrange multipliers (associated with the equality constraints) are zero: ...
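Written out for an equality-constrained problem min_x f(x) subject to h(x) = 0, with Lagrangian L(x, Λ) = f(x) + Λ^⊤ h(x) (a generic statement of the conditions described above; sign conventions may differ from the paper's):

∇_x L(x, Λ) = ∇f(x) + J_h(x)^⊤ Λ = 0,
∇_Λ L(x, Λ) = h(x) = 0,

where J_h is the Jacobian of the constraints; together these equations form the KKT system whose closed-form solution the excerpt refers to.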
Preprint
Computing derivatives is a crucial subroutine in computer science and related fields as it provides a local characterization of a function's steepest directions of ascent or descent. In this work, we recognize that derivatives are often not computed in isolation; conversely, it is quite common to compute a \textit{sequence} of derivatives, each one somewhat related to the last. Thus, we propose accelerating derivative computation by reusing information from previous, related calculations-a general strategy known as \textit{coherence}. We introduce the first instantiation of this strategy through a novel approach called the Web of Affine Spaces (WASP) Optimization. This approach provides an accurate approximation of a function's derivative object (i.e. gradient, Jacobian matrix, etc.) at the current input within a sequence. Each derivative within the sequence only requires a small number of forward passes through the function (typically two), regardless of the number of function inputs and outputs. We demonstrate the efficacy of our approach through several numerical experiments, comparing it with alternative derivative computation methods on benchmark functions. We show that our method significantly improves the performance of derivative computation on small to medium-sized functions, i.e., functions with approximately fewer than 500 combined inputs and outputs. Furthermore, we show that this method can be effectively applied in a robotics optimization context. We conclude with a discussion of the limitations and implications of our work. Open-source code, visual explanations, and videos are located at the paper website: \href{https://apollo-lab-yale.github.io/25-RSS-WASP-website/}{https://apollo-lab-yale.github.io/25-RSS-WASP-website/}.
... A rich literature of methods from classical optimization has emerged in the context of VQA-based quantum optimization [46][47][48][49]. For reference, we can list the Nelder-Mead simplex algorithm [50], SPSA [51], collective and swarm optimization (ant-based, particle-swarm), Bayesian optimization, gradient-based reinforcement learning, Powell's method, Conjugate Gradient, Constrained Optimization by Linear Approximation (COBYLA), Sequential Least Squares Programming (SLSQP), Broyden-Fletcher-Goldfarb-Shanno (BFGS), and Byrd-Omojokun Trust Region Sequential Quadratic Programming (trust-constr) [52]. Another family of promising methods comes from the field of machine learning, which has seen significant success over the last two decades. ...
... These methods have been extensively studied in various fields and are represented by the well-known (L-)BFGS method [93], which has been applied to diverse quantum optimization problems [41,43,52,94]. Other quasi-Newton methods include the Davidon-Fletcher-Powell (DFP) method, which is less robust but more efficient per iteration; the Symmetric Rank One (SR1) method [61], which improves upon the dynamical scaling of Hessian approximations; and the Quasi-Newton Conjugate Gradient (NCG) [95], which employs the conjugate gradient method for the Hessian approximation. ...
... Below, we provide a brief overview of some standard quasi-Newton approximations. For more detailed descriptions, please refer to the references [52,60,93]. ...
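For completeness, the two most common quasi-Newton updates mentioned above, written with s_k = x_{k+1} − x_k and y_k = ∇f(x_{k+1}) − ∇f(x_k) (standard textbook forms, cf. [52,60,93]):

BFGS:  B_{k+1} = B_k − (B_k s_k s_k^⊤ B_k)/(s_k^⊤ B_k s_k) + (y_k y_k^⊤)/(y_k^⊤ s_k),
SR1:   B_{k+1} = B_k + ((y_k − B_k s_k)(y_k − B_k s_k)^⊤)/((y_k − B_k s_k)^⊤ s_k).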
Preprint
Full-text available
The optimization of parametric quantum circuits is technically hindered by three major obstacles: the non-convex nature of the objective function, noisy gradient evaluations, and the presence of barren plateaus. As a result, the selection of the classical optimizer becomes a critical factor in assessing and exploiting quantum-classical applications. One promising approach to tackle these challenges involves incorporating curvature information into the parameter update. The most prominent methods in this field are quasi-Newton and quantum natural gradient methods, which can facilitate faster convergence compared to first-order approaches. Second-order methods, however, exhibit a significant trade-off between computational cost and accuracy, as well as heightened sensitivity to noise. This study evaluates the performance of three families of optimizers on synthetically generated MaxCut problems using a shallow QAOA algorithm. To address noise sensitivity and iteration cost, we demonstrate that incorporating secant-penalization in the BFGS update rule (SP-BFGS) yields improved outcomes for QAOA optimization problems, introducing a novel approach to stabilizing BFGS updates against gradient noise.
... Despite recent progress in first-order differentiable physics enabled by adjoint-based implicit differentiable programming, second-order implicit differentiation, i.e., Hessian matrices, remains underexplored. Hessians encode curvature information about the landscape of the objective function, allowing Newton-type algorithms (e.g., Newton-CG [18,19]) to achieve quadratic convergence rates near minima [20,21]. The potential advantage of accelerated optimization convergence using the Hessian information is appealing and desired in the differentiable programming community (see issue #474 in the discussion forum of the implicit differentiation package JAXopt [12]), yet this Hessian information for implicit differentiation is currently not available. ...
... where θ̂ ∈ R^M is an arbitrarily given incremental parameter vector, ŷ := (∂y/∂θ) θ̂ is the incremental state vector, and λ̂ := (∂λ/∂θ) θ̂ is the incremental adjoint vector. Accordingly, the incremental forward problem in Eq. (20) and the incremental adjoint problem in Eq. (21) can be modified as ...
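A small sketch of how such Hessian-vector products are typically assembled from primitive AD tools (forward-over-reverse, shown here with JAX on a toy function; this is an illustration, not the paper's implementation):

```python
import jax
import jax.numpy as jnp

def hvp(f, x, v):
    # Hessian-vector product via a JVP of the gradient (forward-over-reverse).
    return jax.jvp(jax.grad(f), (x,), (v,))[1]

# Toy smooth function standing in for a PDE-constrained objective.
f = lambda x: jnp.sum(jnp.sin(x) ** 2) + 0.5 * x @ x
x = jnp.array([0.3, -1.2, 2.0])
v = jnp.array([1.0, 0.0, -1.0])
print(hvp(f, x, v))
print(jax.hessian(f)(x) @ v)  # dense reference; should agree with the matrix-free result
```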
Preprint
Differentiable programming is revolutionizing computational science by enabling automatic differentiation (AD) of numerical simulations. While first-order gradients are well-established, second-order derivatives (Hessians) for implicit functions in finite-element-based differentiable physics remain underexplored. This work bridges this gap by deriving and implementing a framework for implicit Hessian computation in PDE-constrained optimization problems. We leverage primitive AD tools (Jacobian-vector product/vector-Jacobian product) to build an algorithm for Hessian-vector products and validate the accuracy against finite difference approximations. Four benchmarks spanning linear/nonlinear, 2D/3D, and single/coupled-variable problems demonstrate the utility of second-order information. Results show that the Newton-CG method with exact Hessians accelerates convergence for nonlinear inverse problems (e.g., traction force identification, shape optimization), while the L-BFGS-B method suffices for linear cases. Our work provides a robust foundation for integrating second-order implicit differentiation into differentiable physics engines, enabling faster and more reliable optimization.
... where x_k denotes the kth iterate, α_k and p_k are the step length and search direction at the kth iterate, respectively, and x_0 is typically a user-provided "guess". The search direction p_k is some function of the gradient of the objective and constraints, and if α_k is chosen to satisfy certain "sufficient decrease" and "curvature" conditions, then this sequence of iterates is guaranteed to converge to a stationary point x* ∈ Ω, provided the objective and constraints are smooth and bounded from below [37]. However, much work goes into finding a good α_k (involving several evaluations of the objective, constraints and their gradients), which also impacts their convergence rate. ...
... Overall, from the perspective of minimizing the objective and reducing the budget of queries to an expensive aerodynamic model, derivative-free methods perform quite competitively while frequently outperforming derivative-based methods. We note that for some of the derivative-based algorithms, we are conservative in estimating the number of function and gradient evaluations to be two per iteration; in practice, computing line-search step lengths can cost much more than that [37]. Despite this, the competitive performance demonstrated by derivative-free methods provides substantial evidence, at least in moderate dimensions, that they are a realistic option for practical aerodynamic design optimization. ...
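The "sufficient decrease" and "curvature" conditions mentioned above are usually the Wolfe conditions [37]: for constants 0 < c_1 < c_2 < 1,

f(x_k + α_k p_k) ≤ f(x_k) + c_1 α_k ∇f(x_k)^⊤ p_k   (sufficient decrease),
∇f(x_k + α_k p_k)^⊤ p_k ≥ c_2 ∇f(x_k)^⊤ p_k   (curvature),

and each trial value of α_k requires fresh objective (and possibly gradient) evaluations, which is what drives the evaluation counts discussed here.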
Preprint
Full-text available
Aerodynamic design optimization is an important problem in aircraft design that depends on the interplay between a numerical optimizer and a high-fidelity flow physics solver. Derivative-based, first and (quasi) second order, optimization techniques are the de facto choice, particularly given the availability of the adjoint method and its ability to efficiently compute gradients at the cost of just one solution of the forward problem. However, implementation of the adjoint method requires careful mathematical treatment, and its sensitivity to changes in mesh quality limits widespread applicability. Derivative-free approaches are often overlooked for large scale optimization, citing their lack of scalability in higher dimensions and/or the lack of practical interest in globally optimal solutions that they often target. However, breaking free from an adjoint solver can be paradigm-shifting in broadening the applicability of aerodynamic design optimization. We provide a systematic benchmarking of a select sample of widely used derivative-based and derivative-free optimization algorithms on the design optimization of three canonical aerodynamic bodies, namely, the NACA0012 and RAE2822 airfoils, and the ONERAM6 wing. Our results demonstrate that derivative-free methods are competitive with derivative-based methods, while outperforming them consistently in the high-dimensional setting. These findings highlight the practical competitiveness of modern derivative-free strategies, offering a scalable and robust alternative for aerodynamic design optimization when adjoint-based gradients are unavailable or unreliable.
... The LM method (Nocedal and Wright 2006) is an optimization technique that is particularly effective for nonlinear least-squares problems, making it suitable for optimizing the parameters in the RR model. The LM algorithm is a hybrid approach that combines the Gauss-Newton method with gradient descent. ...
... using the QN method (Nocedal and Wright 2006) involves adjusting the parameters a, d_0, k to minimize the difference between the model F(d_i; a, d_0, k) in Equation (1) and the N observed data points (d_i, F_i). This can be formulated as the objective function of the sum of squared errors (SSE): Compute the gradient of the objective function with respect to the parameters a, d_0, k ...
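A minimal sketch of such a fit in Python (assuming the common Rosin–Rammler cumulative form F(d) = a(1 − exp(−(d/d0)^k)); the article's exact parameterization and data may differ), using SciPy's Levenberg–Marquardt least-squares driver:

```python
import numpy as np
from scipy.optimize import least_squares

def rr_model(d, a, d0, k):
    # Assumed Rosin-Rammler cumulative form; the article's parameterization may differ.
    return a * (1.0 - np.exp(-(d / d0) ** k))

def residuals(params, d, F_obs):
    a, d0, k = params
    return rr_model(d, a, d0, k) - F_obs

# Synthetic PSD data standing in for the laser-diffraction measurements.
d = np.linspace(50.0, 1200.0, 30)   # particle sizes in micrometres
F_obs = rr_model(d, 1.0, 400.0, 1.8) + np.random.default_rng(0).normal(0, 0.01, d.size)

fit = least_squares(residuals, x0=[1.0, 300.0, 1.5], args=(d, F_obs), method="lm")
a_hat, d0_hat, k_hat = fit.x
print(a_hat, d0_hat, k_hat)   # recovered parameters (a, d0, k)
```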
Article
Full-text available
The particle size distribution (PSD) of ground coffee significantly influences its extraction, flavor, and overall beverage quality. This study aimed to develop, validate, and optimize PSD models for the coffee grinding process. Arabica coffee beans subjected to light, medium, and dark roasting were ground to 12 distinct levels ranging from fine to coarse. The PSDs were examined using laser diffraction. The Rosin–Rammler (RR) model was applied to the data by employing quasi‐Newton (QN) and Levenberg–Marquardt (LM) optimization methods. Indicators of uniformity, including the uniformity index (k), coefficient of uniformity (Cu), size span (Span), and coefficient of variation (CV), were computed and subsequently compared across various grinding levels and roasting types. Both the QN and LM methodologies demonstrated an excellent fit to the PSD data, evidenced by high R² values across all grinding levels. The medium grinding level exhibited optimal uniformity, as indicated by the high k and low Cu, Span, and CV values. Although the medium roast displayed slightly superior uniformity, the Kruskal–Wallis analysis revealed no statistically significant differences in grind consistency across the various roast types. This study demonstrated the effectiveness of PSD modeling for characterizing coffee grind consistency. The results provide insights for optimizing grinding parameters to improve coffee quality, while suggesting that roast type may have a limited influence on grind uniformity compared to grinder settings. The developed models and approaches can inform coffee grinding processes and quality control.
... (3) We then decompose the two-qubit gates into CNOTs and single-qubit gates [78][79][80]. (4) Finally, we further optimize the parametrized single-qubit gates using the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm [82]. ...
... This leads to very slow convergence close to a maximum of the fidelity. To speed up convergence, we therefore switch to the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm [82] to optimize the parameters in the circuit after the gate decomposition. The BFGS algorithm is also an iterative method, similar to gradient ascent, except that the update rule additionally incorporates information about the Hessian of the function, making it a second-order method. ...
Preprint
Quantum machine learning (QML) is an emerging field that investigates the capabilities of quantum computers for learning tasks. While QML models can theoretically offer advantages such as exponential speed-ups, challenges in data loading and the ability to scale to relevant problem sizes have prevented demonstrations of such advantages on practical problems. In particular, the encoding of arbitrary classical data into quantum states usually comes at a high computational cost, either in terms of qubits or gate count. However, real-world data typically exhibits some inherent structure (such as image data) which can be leveraged to load them with a much smaller cost on a quantum computer. This work further develops an efficient algorithm for finding low-depth quantum circuits to load classical image data as quantum states. To evaluate its effectiveness, we conduct systematic studies on the MNIST, Fashion-MNIST, CIFAR-10, and Imagenette datasets. The corresponding circuits for loading the full large-scale datasets are available publicly as PennyLane datasets and can be used by the community for their own benchmarks. We further analyze the performance of various quantum classifiers, such as quantum kernel methods, parameterized quantum circuits, and tensor-network classifiers, and we compare them to convolutional neural networks. In particular, we focus on the performance of the quantum classifiers as we introduce nonlinear functions of the input state, e.g., by letting the circuit parameters depend on the input state.
... Exact line searches are computationally expensive, so other than in special cases, they are rarely used in practice. Common inexact line search methods include backtracking line search (Nocedal & Wright, 1999), the Polyak step size (Polyak, 1987), spectral methods such as (Barzilai & Borwein, 1988), and learning rate scheduling (Duchi et al., 2011). Among these, backtracking line search is particularly popular due to its simplicity and explainable design, often employing stopping criteria like the Armijo and Wolfe conditions (Nocedal & Wright, 1999). ...
... Common inexact line search methods include backtracking line search (Nocedal & Wright, 1999), the Polyak step size (Polyak, 1987), spectral methods such as (Barzilai & Borwein, 1988), and learning rate scheduling (Duchi et al., 2011). Among these, backtracking line search is particularly popular due to its simplicity and explainable design, often employing stopping criteria like the Armijo and Wolfe conditions (Nocedal & Wright, 1999). However, backtracking line search increases the overall computational costs considerably due to the numerous function evaluations required at each iteration. ...
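A bare-bones backtracking line search with the Armijo stopping criterion (a generic sketch of the procedure discussed above, not tied to any of the cited implementations; the function name is ours):

```python
import numpy as np

def backtracking_armijo(f, grad_f, x, p, alpha0=1.0, c1=1e-4, rho=0.5, max_iter=50):
    """Shrink alpha until f(x + alpha*p) <= f(x) + c1*alpha*grad_f(x)^T p."""
    fx, slope = f(x), grad_f(x) @ p   # slope must be negative for a descent direction
    alpha = alpha0
    for _ in range(max_iter):
        if f(x + alpha * p) <= fx + c1 * alpha * slope:
            break
        alpha *= rho                  # each trial costs one extra function evaluation
    return alpha

# Example on a simple quadratic with a steepest-descent direction.
f = lambda x: 0.5 * x @ x
g = lambda x: x
x0 = np.array([3.0, -4.0])
print(backtracking_armijo(f, g, x0, -g(x0)))
```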
Preprint
Full-text available
Efficient optimization remains a fundamental challenge across numerous scientific and engineering domains, especially when objective function and gradient evaluations are computationally expensive. While zeroth-order optimization methods offer effective approaches when gradients are inaccessible, their practical performance can be limited by the high cost associated with function queries. This work introduces the bi-fidelity stochastic subspace descent (BF-SSD) algorithm, a novel zeroth-order optimization method designed to reduce this computational burden. BF-SSD leverages a bi-fidelity framework, constructing a surrogate model from a combination of computationally inexpensive low-fidelity (LF) and accurate high-fidelity (HF) function evaluations. This surrogate model facilitates an efficient backtracking line search for step size selection, for which we provide theoretical convergence guarantees under standard assumptions. We perform a comprehensive empirical evaluation of BF-SSD across four distinct problems: a synthetic optimization benchmark, dual-form kernel ridge regression, black-box adversarial attacks on machine learning models, and transformer-based black-box language model fine-tuning. Numerical results demonstrate that BF-SSD consistently achieves superior optimization performance while requiring significantly fewer HF function evaluations compared to relevant baseline methods. This study highlights the efficacy of integrating bi-fidelity strategies within zeroth-order optimization, positioning BF-SSD as a promising and computationally efficient approach for tackling large-scale, high-dimensional problems encountered in various real-world applications.
... Given the nature of the scalarized objective in the Weighted Sum Method, various single-objective optimization techniques can be applied to solve problem (5.1). Among these, Sequential Quadratic Programming (SQP) is particularly well-suited due to its numerical stability and well-established convergence properties in constrained nonlinear optimization [47]. SQP iteratively solves a sequence of quadratic programming (QP) subproblems which approximate the original problem by linearizing constraints and using a second-order approximation of the objective function. ...
... To reduce computational complexity, quasi-Newton methods such as BFGS are commonly used to approximate the Hessian matrix ∇²L(x_k, λ_k, ν_k) by a positive definite symmetric matrix H_k, ensuring positive definiteness while leveraging curvature information from gradients [11,26,29,48,58]. The iterative update is expressed as [47]: ...
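Schematically, the SQP step elided above solves, at the current iterate x_k and with the BFGS approximation H_k, the QP subproblem

min_p  ½ p^⊤ H_k p + ∇f(x_k)^⊤ p
s.t.   ∇g_i(x_k)^⊤ p + g_i(x_k) ≤ 0,   ∇h_j(x_k)^⊤ p + h_j(x_k) = 0,

and then sets x_{k+1} = x_k + α_k p_k with a line-search step α_k; this is the standard textbook form [47] rather than the exact expression omitted from the excerpt.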
Preprint
High quality and cost-efficiency are two critical yet often conflicting objectives in manufacturing and maintenance processes. Quality standards vary depending on the specific application, while cost-effectiveness remains a constant priority. These competing objectives lead to multi-objective optimization problems, where algorithms are employed to identify Pareto-optimal solutions: compromise points which provide decision-makers with feasible parameter settings. The successful application of such optimization algorithms relies on the ability to model the underlying physical system, which is typically complex, through either physical or data-driven approaches, and to represent it mathematically. This paper applies three multi-objective optimization algorithms to determine optimal process parameters for high-velocity oxygen fuel (HVOF) thermal spraying. Their ability to enhance coating performance while maintaining process efficiency is systematically evaluated, considering practical constraints and industrial feasibility. Practical validation trials are conducted to verify the theoretical solutions generated by the algorithms, ensuring their applicability and reliability in real-world scenarios. By exploring the performance of these diverse algorithms in an industrial setting, this study offers insights into their practical applicability, guiding both researchers and practitioners in enhancing process efficiency and product quality in the coating industry.
... This smoothing results in a convex optimization problem, ensuring the existence of a Pareto minimum that simultaneously minimizes the log-likelihood and satisfies the fairness constraint. This minimum can be found using iterative methods such as Sequential Quadratic Programming (SQP) [22]. Empirically, c proves to be a parameter that allows users to trade off between accuracy and fairness metrics. ...
... C-LRT inherits the advantages of C-LR, as this relaxation implies that both constraints (inequalities) are convex functions with respect to the parameter θ_jt and can be used with iterative methods that ensure convergence and optimality [22]. ...
Preprint
Full-text available
Given the high computational complexity of decision tree estimation, classical methods construct a tree by adding one node at a time in a recursive way. To facilitate promoting fairness, we propose a fairness criterion local to the tree nodes. We prove how it is related to the Statistical Parity criterion, popular in the Algorithmic Fairness literature, and show how to incorporate it into standard recursive tree estimation algorithms. We present a tree estimation algorithm called Constrained Logistic Regression Tree (C-LRT), which is a modification of the standard CART algorithm using locally linear classifiers and imposing restrictions as done in Constrained Logistic Regression. Finally, we evaluate the performance of trees estimated with C-LRT on datasets commonly used in the Algorithmic Fairness literature, using various classification and fairness metrics. The results confirm that C-LRT successfully allows to control and balance accuracy and fairness.
... Across disciplines the computational task of expanding m-variate functions into closed form expressions is omnipresent. This includes solving partial differential equations [11,36,53,59], optimization tasks [34,41,43,47], inverse problems [5,65], and uncertainty quantification [2,17,32,63]. ...
Preprint
We present the Fast Newton Transform (FNT), an algorithm for performing m-variate Newton interpolation in downward closed polynomial spaces with time complexity $\mathcal{O}(|A| m \overline{n})$. Here, A is a downward closed set of cardinality $|A|$ equal to the dimension of the associated downward closed polynomial space $\Pi_A$, where $\overline{n}$ denotes the mean of the maximum polynomial degrees across the spatial dimensions. For functions being analytic in an open Bernstein poly-ellipse, geometric approximation rates apply when interpolating in non-tensorial Leja-ordered Chebyshev-Lobatto grids or Leja nodes. To mitigate the curse of dimensionality, we utilize $\ell^p$-sets, with the Euclidean case (p=2) turning out to be the pivotal choice, leading to $|A|/(n+1)^m \in \mathcal{O}(e^{-m})$. Expanding non-periodic functions, the FNT complements the approximation capabilities of the Fast Fourier Transform (FFT). Choosing $\ell^2$-sets for A renders the FNT time complexity to be less than the FFT time complexity $\mathcal{O}((n+1)^m m \log(n))$ in a range of n, behaving as $\mathcal{O}(m e^m)$. Maintaining this advantage true for the differentials, the FNT sets a new standard in m-variate interpolation and approximation practice.
... The energy minimization is terminated when one of three conditions is satisfied: the energy tolerance reaches 10⁻⁷, the energy gradient reaches 10⁻⁶, or the optimization reaches 1000 steps. For every value of U/t and for each lattice size, we perform ten independent optimizations (using the L-BFGS method [62]), starting from different random parameters θ, and we take the minimum value among these as our estimate of the ground state energy. In this manner, we reduce the chances of the system becoming trapped in a local minimum. ...
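A compact sketch of such a multi-start strategy with SciPy's L-BFGS-B (the energy below is a toy stand-in for the variational ansatz energy; the tolerances mirror those quoted above):

```python
import numpy as np
from scipy.optimize import minimize

def energy(theta):
    # Toy non-convex "energy landscape" standing in for the variational energy.
    return np.sum(np.sin(3.0 * theta) ** 2) + 0.1 * np.sum(theta ** 2)

rng = np.random.default_rng(42)
best = None
for _ in range(10):                      # ten independent random restarts
    theta0 = rng.uniform(-np.pi, np.pi, size=8)
    res = minimize(energy, theta0, method="L-BFGS-B",
                   options={"maxiter": 1000, "ftol": 1e-7, "gtol": 1e-6})
    if best is None or res.fun < best.fun:
        best = res                       # keep the lowest energy found
print(best.fun)
```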
Article
Full-text available
Simulating the Hubbard model is of great interest to a wide range of applications within condensed matter physics, however its solution on classical computers remains challenging in dimensions larger than one. The relative simplicity of this model, embodied by the sparseness of the Hamiltonian matrix, allows for its efficient implementation on quantum computers, and for its approximate solution using variational algorithms such as the variational quantum eigensolver. While these algorithms have been shown to reproduce the qualitative features of the Hubbard model, their quantitative accuracy in terms of producing true ground state energies and other properties, and the dependence of this accuracy on the system size and interaction strength, the choice of variational ansatz, and the degree of spatial inhomogeneity in the model, remains unknown. Here we present a rigorous classical benchmarking study, demonstrating the potential impact of these factors on the accuracy of the variational solution of the Hubbard model on quantum hardware, for systems with up to 32 qubits. We find that even when using the most accurate wavefunction ansätze for the Hubbard model, the error in its ground state energy and wavefunction plateaus for larger lattices, while stronger electronic correlations magnify this issue. Concurrently, spatially inhomogeneous parameters and the presence of off-site Coulomb interactions only have a small effect on the accuracy of the computed ground state energies. Our study highlights the capabilities and limitations of current approaches for solving the Hubbard model on quantum hardware, and we discuss potential future avenues of research.
... First, the zig-zag motion in the empirical trajectory (Fig. 4c) can be attributed to the competing influences of the two loss components, whose gradient directions in the complex plane are nearly opposed. While such zig-zag patterns are well documented in ill-conditioned optimization (e.g., due to large Hessian condition numbers [39,40]), here they result from a structured conflict between short- and long-term objectives, a hallmark of closed-loop learning. Notably, this conflict is mitigated by adaptive optimizers such as Adam [41] (see Appendix C.7). ...
Preprint
Full-text available
Recurrent neural networks (RNNs) trained on neuroscience-inspired tasks offer powerful models of brain computation. However, typical training paradigms rely on open-loop, supervised settings, whereas real-world learning unfolds in closed-loop environments. Here, we develop a mathematical theory describing the learning dynamics of linear RNNs trained in closed-loop contexts. We first demonstrate that two otherwise identical RNNs, trained in either closed- or open-loop modes, follow markedly different learning trajectories. To probe this divergence, we analytically characterize the closed-loop case, revealing distinct stages aligned with the evolution of the training loss. Specifically, we show that the learning dynamics of closed-loop RNNs, in contrast to open-loop ones, are governed by an interplay between two competing objectives: short-term policy improvement and long-term stability of the agent-environment interaction. Finally, we apply our framework to a realistic motor control task, highlighting its broader applicability. Taken together, our results underscore the importance of modeling closed-loop dynamics in a biologically plausible setting.
... Projected Gradient (SPG) and Newton's method in AI optimisation [13,14,15,12], with much faster convergence and much lower computational cost than is possible within the standard AI tools. ...
Preprint
Full-text available
Shannon entropy (SE) and its quantum mechanical analogue, von Neumann entropy, are key components in many tools used in physics, information theory, machine learning (ML) and quantum computing. Besides the significant amount of SE computation required in these fields, the singularity of the SE gradient is one of the central mathematical reasons behind the high cost, frequently low robustness and slow convergence of such tools. Here we propose the Fast Entropy Approximation (FEA) - a non-singular rational approximation of Shannon entropy and its gradient that achieves a mean absolute error of 10⁻³, which is approximately 20 times lower than comparable state-of-the-art methods. FEA allows around 50% faster computation, requiring only 5 to 6 elementary computational operations, as compared to tens of elementary operations behind the fastest entropy computation algorithms with table look-ups, bitshifts, or series approximations. On a set of common benchmarks for the feature selection problem in machine learning, we show that the combined effect of fewer elementary operations, low approximation error, and a non-singular gradient allows significantly better model quality and enables ML feature extraction that is two to three orders of magnitude faster and computationally cheaper when incorporating FEA into AI tools.
... where N denotes the length of the observation sequence; S_k represents the number of satellites at epoch t_k. The Gauss-Newton method [35] was used to solve the above optimization problem. However, in complex urban environments, satellite signals are susceptible to multipath and non-line-of-sight effects, and there are significant errors in the pseudorange and Doppler observations, which can prevent the optimization from converging to the global minimum. ...
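For reference, the Gauss-Newton update referred to above: with residual vector r(θ) (here, the stacked pseudorange and Doppler residuals over the window) and Jacobian J_k = ∂r/∂θ evaluated at θ_k,

θ_{k+1} = θ_k − (J_k^⊤ J_k)^{-1} J_k^⊤ r(θ_k),

i.e., each iteration solves a linearized least-squares problem; outlier-contaminated residuals distort exactly this linearization, which is why convergence to the global minimum can fail in urban environments.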
Article
Accurate and rapid INS state initialization is crucial to ensure the performance of vehicular GNSS/INS integrated navigation. However, in typical urban environments (such as under viaducts and in urban canyons), existing GNSS-assisted INS state initialization methods are sensitive to observation outliers. This paper proposes a robust INS state initialization method for vehicle-mounted GNSS/INS integrated navigation. The proposed method first derives the error propagation between the short-term relative navigation (i.e., position, velocity, attitude) and the INS initial state, and the GNSS observation error model; then, the high-precision relative pose generated by the INS is used to construct constraints between GNSS observation sequences, and the INS state initialization problem is converted into an optimization problem; finally, a two-step optimization strategy is designed to mitigate the high computational complexity of solving the full-state optimization problem. We use six datasets collected in a typical urban environment to verify the feasibility of the proposed method. The proposed method uses observation sequences within a 10-second window to initialize the heading, velocity, and horizontal position with errors of 2.50°, 0.30 m/s, and 11.1 m, respectively, which are reduced by 73%, 41%, and 14% compared with existing methods.
... In our initial experimental series, we evaluated widely used gradient-based optimization methods (Dembo and Steihaug [1983], Nocedal and Wright [2006]) from the SciPy library: BFGS (Broyden-Fletcher-Goldfarb-Shanno algorithm); CG (Conjugate Gradient algorithm); L-BFGS-B (limited-memory BFGS variant with box constraints); Newton-CG; SLSQP (Sequential Least Squares Programming); and TNC (Truncated Newton Algorithm). The computation results are reported in Table 1 and Table 2. ...
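For illustration, all six solvers can be driven through the same SciPy interface; a minimal harness on the Rosenbrock test function (not the experiments of the paper) looks as follows:

```python
import numpy as np
from scipy.optimize import minimize, rosen, rosen_der, rosen_hess

x0 = np.full(10, 2.0)   # common starting point for all solvers
for method in ["BFGS", "CG", "L-BFGS-B", "Newton-CG", "SLSQP", "TNC"]:
    extra = {"hess": rosen_hess} if method == "Newton-CG" else {}   # only Newton-CG uses the Hessian
    res = minimize(rosen, x0, jac=rosen_der, method=method, **extra)
    print(f"{method:10s}  f* = {res.fun:.3e}  nfev = {res.nfev}")
```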
Preprint
Full-text available
We present a novel method called TESALOCS (TEnsor SAmpling and LOCal Search) for multidimensional optimization, combining the strengths of gradient-free discrete methods and gradient-based approaches. The discrete optimization in our method is based on low-rank tensor techniques, which, thanks to their low-parameter representation, enable efficient optimization of high-dimensional problems. For the second part, i.e., local search, any effective gradient-based method can be used, whether existing (such as quasi-Newton methods) or any other developed in the future. Our approach addresses the limitations of gradient-based methods, such as getting stuck in local optima; the limitations of discrete methods, which cannot be directly applied to continuous functions; and limitations of gradient-free methods that require large computational budgets. Note that we are not limited to a single type of low-rank tensor decomposition for discrete optimization, but for illustrative purposes, we consider a specific efficient low-rank tensor train decomposition. For 20 challenging 100-dimensional functions, we demonstrate that our method can significantly outperform results obtained with gradient-based methods like Conjugate Gradient, BFGS, SLSQP, and other methods, improving them by orders of magnitude with the same computing budget.
... By minimising the total interaction energy of the network with respect to aggregate positions and orientations, the equilibrium structure formed due to particle interactions was found. In this work, the nearly-exact trust region method was used to perform the minimization, and the minimization proceeded until the Euclidean norm of the gradient vector fell below a defined tolerance gtol [32]. To examine the types of structures formed due to van der Waals interactions alone, the following system was simulated: The electrical network of filler particles was also described as an undirected weighted graph. ...
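A small SciPy sketch of this kind of minimization (method "trust-exact" is SciPy's nearly exact trust-region algorithm; the energy below is a toy stand-in for the network's interaction energy, with gtol playing the role described above):

```python
import numpy as np
from scipy.optimize import minimize

# Toy smooth "energy" with analytic derivatives, standing in for the interaction energy.
def energy(x): return np.sum(x**4) - np.sum(x**2)
def grad(x):   return 4.0 * x**3 - 2.0 * x
def hess(x):   return np.diag(12.0 * x**2 - 2.0)

x0 = np.random.default_rng(3).normal(size=6)
res = minimize(energy, x0, method="trust-exact", jac=grad, hess=hess,
               options={"gtol": 1e-8})   # stop once the gradient norm falls below gtol
print(res.x, np.linalg.norm(grad(res.x)))
```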
Preprint
Carbon-elastomer composites exhibit complex piezoresistive behaviour that cannot be fully explained by existing macroscopic or microstructural models. In this work, we introduce a network-based modelling methodology to explore the hypothesis that van der Waals interactions between carbon particles contribute to the formation of a conductivity-promoting network structure prior to curing. We combine a discrete aggregate-based representation of filler with a mesh-free, quasi-static viscoelastic model adapted from bond-based peridynamics, resolving equilibrium states through energy minimization. The resulting particle networks are analysed using graph-theoretic measures of connectivity and conductivity. Our simulations reproduce several unexplained experimental phenomena, including long-timescale resistivity decay, non-monotonic secondary peaks upon strain release, and the increasing prominence of these features with higher filler density. Crucially, these behaviours emerge from the interplay between viscoelastic stresses and van der Waals interactions. We show that the resistance response of the network operates over different characteristic timescales to the viscoelastic stress response. The approach has potential for understanding and predicting emergent behaviour in composite materials more broadly, where material characteristics often depend on percolating network structure.
... The conditioning of a matrix is measured by its condition number, defined as the ratio of its largest singular value to its smallest. A high condition number indicates ill-conditioning, which is a well-known challenge for the convergence of gradient-based optimization methods [15]. ...
Preprint
Transformers have transformed modern machine learning, driving breakthroughs in computer vision, natural language processing, and robotics. At the core of their success lies the attention mechanism, which enables the modeling of global dependencies among input tokens. However, we reveal that the attention block in transformers suffers from inherent ill-conditioning, which hampers gradient-based optimization and leads to inefficient training. To address this, we develop a theoretical framework that establishes a direct relationship between the conditioning of the attention block and that of the embedded tokenized data. Building on this insight, we introduce conditioned embedded tokens, a method that systematically modifies the embedded tokens to improve the conditioning of the attention mechanism. Our analysis demonstrates that this approach significantly mitigates ill-conditioning, leading to more stable and efficient training. We validate our methodology across various transformer architectures, achieving consistent improvements in image classification, object detection, instance segmentation, and natural language processing, highlighting its broad applicability and effectiveness.
... In the decoupled approach, their values w_s and w_{b,s} would have to be updated each iteration to compensate for this fact. This means that the constraints of the NLP change during the solving process, which interferes with globalisation strategies of NLP solvers [8,13,21]. To limit the frequency at which these problem parameters change between iterations, and thus the negative effect on the globalisation, two filters are implemented: a broad-phase and a trust-region filter. ...
Preprint
Full-text available
This paper details an approach to linearise differentiable but non-convex collision avoidance constraints tailored to convex shapes. It revisits introducing differential collision avoidance constraints for convex objects into an optimal control problem (OCP) using the separating hyperplane theorem. By framing this theorem as a classification problem, the hyperplanes are eliminated as optimisation variables from the OCP. This effectively transforms non-convex constraints into linear constraints. A bi-level algorithm computes the hyperplanes between the iterations of an optimisation solver and subsequently embeds them as parameters into the OCP. Experiments demonstrate the approach's favourable scalability towards cluttered environments and its applicability to various motion planning approaches. It decreases trajectory computation times between 50\% and 90\% compared to a state-of-the-art approach that directly includes the hyperplanes as variables in the optimal control problem.
... The probability function of the GMM can be expressed as in [49] or by using the expectation-maximization framework [50]. In this work, we use the latter, which is more commonly used. ...
Article
Full-text available
Designing an aircraft that operates optimally under various flight conditions requires the consideration of flight operational data in the aircraft design process. Traditionally, aircraft design optimization assumes one nominal or multiple conditions mostly during the cruise stage, neglecting operational variations. This study introduces a data-driven, cluster-based approach to address this gap, incorporating real-world operational data into the aerodynamic shape optimization of commercial aircraft wings. Using the NASA Common Research Model wing configuration, this study employs a compact modal parameterization method to efficiently capture wing shape variations. A physics-based mission analysis and performance model is employed to extract relevant flight conditions from flight data and evaluate fuel consumption. The Gaussian mixture model is performed on the flight data distributions to identify relevant clusters and derive the multipoint objective function, which is modeled as a weighted average of drag coefficients. The proposed cluster-based 17-point optimization formulation shows a notable total fuel burn reduction of around 2.3% compared to conventional methods (evaluated across 100 of our airline partner's most flown flight missions), demonstrating the efficacy of incorporating operational information into the optimization problem formulation to improve the aerodynamic performance and fuel efficiency.
... The ability to estimate Hessian elements enables the implementation of a second-order method, often referred to as Newton's method, for optimizing f(µ). Considering that the objective function f(µ) is smooth and concave, this approach can offer faster local convergence than first-order techniques [14,85]. Moreover, the equivalence of the Hessian with the Kubo-Mori information matrix as described in (63) implies that the Euclidean curvature of f(µ) coincides with the information-geometric curvature induced by the (Umegaki) relative entropy [79,80]. ...
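Concretely, the second-order update referred to above takes the usual Newton form: since f(µ) is smooth and concave, the step

µ^{(t+1)} = µ^{(t)} − [∇²f(µ^{(t)})]^{-1} ∇f(µ^{(t)})

is an ascent step (the Hessian is negative semidefinite), giving locally quadratic convergence to the maximizer whenever the Hessian is nonsingular.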
Preprint
In quantum thermodynamics, a system is described by a Hamiltonian and a list of non-commuting charges representing conserved quantities like particle number or electric charge, and an important goal is to determine the system's minimum energy in the presence of these conserved charges. In optimization theory, a semi-definite program (SDP) involves a linear objective function optimized over the cone of positive semi-definite operators intersected with an affine space. These problems arise from differing motivations in the physics and optimization communities and are phrased using very different terminology, yet they are essentially identical mathematically. By adopting Jaynes' mindset motivated by quantum thermodynamics, we observe that minimizing free energy in the aforementioned thermodynamics problem, instead of energy, leads to an elegant solution in terms of a dual chemical potential maximization problem that is concave in the chemical potential parameters. As such, one can employ standard (stochastic) gradient ascent methods to find the optimal values of these parameters, and these methods are guaranteed to converge quickly. At low temperature, the minimum free energy provides an excellent approximation for the minimum energy. We then show how this Jaynes-inspired gradient-ascent approach can be used in both first- and second-order classical and hybrid quantum-classical algorithms for minimizing energy, and equivalently, how it can be used for solving SDPs, with guarantees on the runtimes of the algorithms. The approach discussed here is well grounded in quantum thermodynamics and, as such, provides physical motivation underpinning why algorithms published fifty years after Jaynes' seminal work, including the matrix multiplicative weights update method, the matrix exponentiated gradient update method, and their quantum algorithmic generalizations, perform well at solving SDPs.
... Given the nonlinear nature of the objective function and the presence of multiple local minima, a two-stage hybrid optimization strategy is adopted. This strategy combines the global search capability of the Genetic Algorithm (GA) [16] with the local refinement efficiency of the Sequential Quadratic Programming (SQP) algorithm [17]. This hybrid approach effectively mitigates the limitations associated with using a single optimization method. ...
Article
Full-text available
The increasing deployment of robotic systems in industrial applications has driven widespread use of two-link robots, valued for their high speed and precision. However, their inherent nonlinear dynamics and strong coupling effects present substantial challenges to achieving high-precision trajectory tracking. To address these issues, this paper proposes a feedforward-PID control strategy optimized using a hybrid Genetic Algorithm-Sequential Quadratic Programming (GA-SQP) approach. The proposed method combines the anticipatory capabilities of feedforward control with the corrective feedback of PID control, enabling automatic and efficient parameter tuning. Simulation results demonstrate that, in comparison to conventional PID control, the proposed approach enhances trajectory tracking accuracy by approximately 39.61%. Specifically, the GA-SQP-optimized controller reduces the Root Mean Square Error (RMSE) to 0.48 mm for an Archimedean spiral trajectory, and further to 0.01 mm for a sine-like trajectory, confirming its adaptability across various trajectory profiles. Torque analysis further highlights the complementary interaction between the feedforward and PID components, substantiating the method's effectiveness. These results underscore the proposed strategy's potential to significantly improve trajectory tracking accuracy and robustness for two-link robots, especially in complex dynamic environments.
... The middle number on the connecting arrow indicates the volume transported between the locations. ... numerical optimization algorithms that approximate second-order information (the Hessian matrix) using only first-order gradient evaluations [44]. ...
Preprint
Full-text available
We present an integrated framework for truckload procurement in container logistics, bridging strategic and operational aspects that are often treated independently in existing research. Drayage, the short-haul trucking of containers, plays a critical role in intermodal container logistics. Using dynamic programming, we identify optimal operational policies for allocating drayage volumes among capacitated carriers under uncertain container flows and spot rates. The computational complexity of optimization under uncertainty is mitigated through sample average approximation. These optimal policies serve as the basis for evaluating specific capacity arrangements. To optimize capacity reservations with strategic and spot carriers, we employ an efficient quasi-Newton method. Numerical experiments demonstrate significant cost-efficiency improvements, including a 21.2% cost reduction in a four-period scenario. Monte Carlo simulations further highlight the strong generalization capabilities of the proposed joint optimization method across out-of-sample scenarios. These findings underscore the importance of integrating strategic and operational decisions to enhance cost efficiency in truckload procurement under uncertainty.
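The excerpt accompanying this entry points to quasi-Newton methods that build second-order information from gradient evaluations alone. For reference, the classical BFGS update of the Hessian approximation B_k, with step s_k = x_{k+1} - x_k and gradient difference y_k = \nabla f(x_{k+1}) - \nabla f(x_k), reads

B_{k+1} = B_k - \frac{B_k s_k s_k^{\top} B_k}{s_k^{\top} B_k s_k} + \frac{y_k y_k^{\top}}{y_k^{\top} s_k},

which satisfies the secant condition B_{k+1} s_k = y_k and preserves positive definiteness whenever y_k^{\top} s_k > 0.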
... Lastly, a new filtering algorithm has been derived where the likelihood function is analytically computed and included within the propagated uncertainties, evaluated inside the KO framework. The resulting posterior is maximized using a backtracking line search gradient descent algorithm [29], providing the estimate of the system according to the MAP principle. ...
Article
This paper proposes a method to propagate uncertainties undergoing nonlinear dynamics using the Koopman Operator (KO). Probability density functions are propagated directly using the Koopman approximation of the solution flow of the system, where the dynamics have been projected on a well-defined set of basis functions. The prediction technique is derived following both the analytical (Galerkin) and numerical (EDMD) derivation of the KO, and a least-squares reduction algorithm ensures the recursivity of the proposed methodology. Furthermore, a complete filtering algorithm is proposed, where the predicted uncertainties are updated analytically using the likelihood function, following Bayes' formulation. Estimates are provided after optimization according to the Maximum A Posteriori formulation, where a backtracking Newton solver identifies the global most likely posterior state.
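The excerpt above refers to maximizing the posterior with a backtracking line-search gradient method. The sketch below shows only the generic mechanism (Armijo backtracking on a negative log-posterior), applied to an assumed toy Gaussian posterior; it is not the paper's Koopman-based filter.

```python
import numpy as np

def backtracking_gradient_descent(f, grad, x0, alpha0=1.0, rho=0.5, c=1e-4,
                                  tol=1e-8, max_iter=500):
    """Gradient descent with Armijo backtracking; f is, e.g., a negative log-posterior."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        alpha = alpha0
        # Shrink the step until the Armijo sufficient-decrease condition holds.
        while f(x - alpha * g) > f(x) - c * alpha * (g @ g):
            alpha *= rho
        x = x - alpha * g
    return x

# Toy MAP problem: a Gaussian posterior whose mode (the MAP estimate) is at mu.
mu = np.array([1.0, -2.0])
neg_log_post = lambda x: 0.5 * np.sum((x - mu) ** 2)
grad_neg_log_post = lambda x: x - mu
print(backtracking_gradient_descent(neg_log_post, grad_neg_log_post, np.zeros(2)))
```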
... We now perform a simulation study of the models in Sections 2.1 and 2.2, estimating parameters by the methods in Section 2.3. To minimize the sum of squared errors (11), we use optim in R [24] with the Broyden-Fletcher-Goldfarb-Shanno (BFGS) method [19]. ...
Preprint
Ordinary and stochastic differential equations (ODEs and SDEs) are widely used to model continuous-time processes across various scientific fields. While ODEs offer interpretability and simplicity, SDEs incorporate randomness, providing robustness to noise and model misspecifications. Recent research highlights the statistical advantages of SDEs, such as improved parameter identifiability and stability under perturbations. This paper investigates the robustness of parameter estimation in SDEs versus ODEs under three types of model misspecifications: unrecognized noise sources, external perturbations, and simplified models. Furthermore, the effect of missing data is explored. Through simulations and an analysis of Danish COVID-19 data, we demonstrate that SDEs yield more stable and reliable parameter estimates, making them a strong alternative to traditional ODE modeling in the presence of uncertainty.
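The excerpt above minimizes a sum of squared errors with R's optim and the BFGS method. A rough Python analogue with SciPy, using an assumed toy exponential-decay model rather than the paper's ODE/SDE models, might look like this:

```python
import numpy as np
from scipy.optimize import minimize

# Assumed toy model: y = a * exp(-b * t), observed with noise.
t = np.linspace(0.0, 10.0, 50)
rng = np.random.default_rng(1)
y_obs = 2.0 * np.exp(-0.3 * t) + 0.05 * rng.standard_normal(t.size)

def sse(params):
    a, b = params
    return np.sum((y_obs - a * np.exp(-b * t)) ** 2)

# BFGS with finite-difference gradients, mirroring optim(par, fn, method = "BFGS") in R.
result = minimize(sse, x0=[1.0, 1.0], method="BFGS")
print(result.x)
```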
... Although DPA provides detailed insights into battery properties, this process is usually invasive and time-consuming. As a promising alternative to DPA, numerical optimization [20] provides a non-destructive route to parameter identification. The model parameters are iteratively updated to minimize the discrepancy between model predictions and experimental measurements. ...
Preprint
The physics-based Doyle-Fuller-Newman (DFN) model, widely adopted for its precise electrochemical modeling, stands out among various simulation models of lithium-ion batteries (LIBs). Although the DFN model is powerful in forward predictive analysis, the inverse identification of its model parameters has remained a long-standing challenge. The numerous unknown parameters associated with the nonlinear, time-dependent, and multi-scale DFN model are extremely difficult to determine accurately and efficiently, hindering the practical use of such battery simulation models in industrial applications. To tackle this challenge, we introduce DiffLiB, a high-fidelity finite-element-based LIB simulation framework, equipped with advanced differentiable programming techniques so that efficient gradient-based inverse parameter identification is enabled. Customized automatic differentiation rules are defined by identifying the VJP (vector-Jacobian product) structure in the chain rule and implemented using adjoint-based implicit differentiation methods. Four numerical examples, including both 2D and 3D forward predictions and inverse parameter identification, are presented to validate the accuracy and computational efficiency of DiffLiB. Benchmarking against COMSOL demonstrates excellent agreement in forward predictions, with terminal voltage discrepancies maintaining a root-mean-square error (RMSE) below 2 mV across all test conditions. In parameter identification tasks using experimentally measured voltage data, the proposed gradient-based optimization scheme achieves superior computational performance, with 96% fewer forward predictions and 72% less computational time compared with gradient-free approaches. These results demonstrate that DiffLiB is a versatile and powerful computational framework for the development of advanced LIBs.
... Assume that the necessary condition (38) is satisfied; then we have ...
Preprint
Full-text available
High-order methods for convex and nonconvex optimization, particularly pth-order Adaptive Regularization Methods (ARp), have attracted significant research interest by naturally incorporating high-order Taylor models into adaptive regularization frameworks, resulting in algorithms with faster global and local convergence rates than first- and second-order methods. This paper establishes global optimality conditions for general, nonconvex cubic polynomials with quartic regularization. These criteria generalise existing results, recovering the optimality results for regularized quadratic polynomials, and can be further simplified in the low-rank and diagonal tensor cases. Under suitable assumptions on the Taylor polynomial, we derive a lower bound for the regularization parameter such that the necessary and sufficient criteria coincide, establishing a connection between this bound and the subproblem's convexification and sum-of-squares (SoS) convexification techniques. Leveraging the optimality characterization, we develop a Diagonal Tensor Method (DTM) for minimizing quartically-regularized cubic Taylor polynomials by iteratively minimizing a sequence of local models that incorporate both diagonal cubic terms and quartic regularization (DTM model). We show that the DTM algorithm is provably convergent, with a global evaluation complexity of $\mathcal{O}(\epsilon^{-3/2})$. Furthermore, when special structure is present (such as low rank or diagonal), DTM can exactly solve the given problem (in one iteration). In our numerical experiments, we propose practical DTM variants that exploit local problem information for model construction, which we then show to be competitive with cubic regularization and other subproblem solvers, with superior performance on problems with special structure.
... I positioned the molecule 2.5 Å above the surface as an initial guess and performed local structure relaxation using an "optimize" module from ASE. As the optimizer, I used the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm [47], and set the force convergence criterion to 0.01 eV/Å. During the structural relaxation process, the crystal cell remained fixed while the atoms in the bottom layer of the slab model were frozen in their positions, as they would be in the bulk environment. ...
Article
Full-text available
It is essential that one understands how the surface degrees of freedom influence molecular spin switching to successfully integrate spin crossover (SCO) molecules into devices. This study uses density functional theory calculations to investigate how spin state energetics and molecular vibrations change in a Fe(II) SCO compound named [Fe(py)₂bpym(NCS)₂] when deposited on an Al(100) surface. The calculations consider an environment-dependent U to assess the local Coulomb correlation of 3d electrons. The results show that the adsorption configurations heavily affect the spin state splitting, which increases by 10–40 kJ mol⁻¹ on the surface, and this is detrimental to spin conversion. This effect is due to the surface binding energy variation across the spin transition. The preference for the low-spin state originates partly from the strong correlation effect. Furthermore, the surface environment constrains the vibrational entropy difference, which decreases by 8–17 J mol⁻¹ K⁻¹ (at 300 K) and leads to higher critical temperatures. These results suggest that the electronic energy splitting and vibrational level shifting are suitable features for characterizing the spin transition process on surfaces, and they can provide access to high-throughput screening of spin crossover devices.
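The excerpt above describes a BFGS geometry relaxation in ASE with a 0.01 eV/Å force criterion, a fixed cell, and frozen bottom-layer atoms. The following is a minimal sketch of that workflow; the adsorbate (CO) and the EMT calculator are stand-ins for the actual SCO molecule and the DFT setup used in the study.

```python
from ase.build import fcc100, molecule, add_adsorbate
from ase.calculators.emt import EMT
from ase.constraints import FixAtoms
from ase.optimize import BFGS

# Assumed toy system: an Al(100) slab with a CO molecule placed 2.5 A above the surface.
slab = fcc100("Al", size=(3, 3, 4), vacuum=10.0)
add_adsorbate(slab, molecule("CO"), height=2.5, position="ontop")

# Freeze the bottom-layer atoms so they stay at their bulk-like positions.
z = slab.positions[:, 2]
slab.set_constraint(FixAtoms(mask=z < z.min() + 1e-3))

slab.calc = EMT()  # stand-in for the DFT calculator used in the cited work
BFGS(slab, logfile="relax.log").run(fmax=0.01)  # force criterion of 0.01 eV/A
```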
... Unless otherwise stated in this paper we use a Broyden-Fletcher-Goldfarb-Shanno (BFGS) quasi-Newton optimizer (Nocedal and Wright 2006) with empirical gradient estimates for optimizing scoring functions, which falls back onto a Nelder-Mead optimizer (Nelder and Mead 1965) should it fail. Using analytic gradients is also possible and might increase the optimization efficiency, but we leave such exploration for future work. ...
Article
Full-text available
Accurate forecasts of extreme wind speeds are of high importance for many applications. Such forecasts are usually generated by ensembles of numerical weather prediction (NWP) models, which however can be biased and have errors in dispersion, thus necessitating the application of statistical post-processing techniques. In this work we aim to improve statistical post-processing models for probabilistic predictions of extreme wind speeds. We do this by adjusting the training procedure used to fit ensemble model output statistics (EMOS) models – a commonly applied post-processing technique – and propose estimating parameters using the so-called threshold-weighted continuous ranked probability score (twCRPS), a proper scoring rule that places special emphasis on predictions over a threshold. We show that training using the twCRPS leads to improved extreme event performance of post-processing models for a variety of thresholds. We find a distribution body-tail trade-off where improved performance for probabilistic predictions of extreme events comes with worse performance for predictions of the distribution body. However, we introduce strategies to mitigate this trade-off based on weighted training and linear pooling. Finally, we consider some synthetic experiments to explain the training impact of the twCRPS and derive closed-form expressions of the twCRPS for a number of distributions, giving the first such collection in the literature. The results will enable researchers and practitioners alike to improve the performance of probabilistic forecasting models for extremes and other events of interest.
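The excerpt above describes fitting scoring functions with a BFGS quasi-Newton optimizer using empirical gradients, falling back to Nelder-Mead on failure. A minimal SciPy sketch of that fallback logic, with a hypothetical scoring function in place of the twCRPS, is:

```python
import numpy as np
from scipy.optimize import minimize

def fit_parameters(score, x0):
    """Try BFGS (finite-difference gradients) first; fall back to Nelder-Mead if it fails."""
    result = minimize(score, x0, method="BFGS")
    if not result.success or not np.all(np.isfinite(result.x)):
        result = minimize(score, x0, method="Nelder-Mead")
    return result

# Hypothetical scoring function standing in for the (tw)CRPS of the post-processing model.
score = lambda p: (p[0] - 1.0) ** 2 + np.abs(p[1]) ** 1.5
print(fit_parameters(score, np.zeros(2)).x)
```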
... The Karush-Kuhn-Tucker (KKT) conditions (Nocedal and Wright, 2009; Beck, 2023) are necessary (and sometimes sufficient) optimality conditions for a solution of a constrained optimization problem. Denote a local minimizer of (5) by ⋆ ∈ A_h and the gradient of F at ⋆ by g⋆ = M⁻¹ dF(⋆). ...
Article
Full-text available
We introduce a novel method for solving density-based topology optimization problems: Sigmoidal Mirror descent with a Projected Latent variable (SiMPL). The SiMPL method (pronounced as “the simple method”) optimizes a design using only first-order derivative information of the objective function. The bound constraints on the density field are enforced with the help of the (negative) Fermi–Dirac entropy, which is also used to define a non-symmetric distance function called a Bregman divergence on the set of admissible designs. This Bregman divergence leads to a simple update rule that is further simplified with the help of a so-called latent variable. Because the SiMPL method involves discretizing the latent variable, it produces a sequence of pointwise-feasible iterates, even when high-order finite elements are used in the discretization. Numerical experiments demonstrate that the method outperforms other popular first-order optimization algorithms. To outline the general applicability of the technique, we include examples with (self-load) compliance minimization and compliant mechanism optimization problems.
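For reference, the KKT conditions cited in the excerpt above take the following textbook form for a problem min_x f(x) subject to c_i(x) = 0 for i ∈ E and c_i(x) ≥ 0 for i ∈ I (this is the generic statement, not the paper's discretized topology-optimization setting):

\nabla f(x^\star) - \sum_{i \in E \cup I} \lambda_i^\star \nabla c_i(x^\star) = 0, \qquad c_i(x^\star) = 0 \ (i \in E), \qquad c_i(x^\star) \ge 0, \ \lambda_i^\star \ge 0, \ \lambda_i^\star c_i(x^\star) = 0 \ (i \in I).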
... In such situations, a common alternative consists of relying on iterative solvers; classical representatives of this type of approach are the Gauss-Seidel or conjugate-gradient (CG) methods, for instance (see e.g. [1,2,5,6,12,15]). ...
Preprint
We investigate the properties of a class of piecewise-fractional maps arising from the introduction of an invariance under rescaling into convex quadratic maps. The subsequent maps are quasiconvex, and pseudoconvex on specific convex cones; they can be optimised via exact line search along admissible directions, and the iterates then inherit a bidimensional optimality property. We study the minimisation of such relaxed maps via coordinate descents with gradient-based rules, placing a special emphasis on coordinate directions verifying a maximum-alignment property in the reproducing kernel Hilbert spaces related to the underlying positive-semidefinite matrices. In this setting, we illustrate that accounting for the optimal rescaling of the iterates can in certain situations substantially accelerate the unconstrained minimisation of convex quadratic maps.
... If we denote the fractional optical depth variations as y and the radial displacements from the nominal resonance location as x, then we can fit these data to the following function using the scipy.optimize.curve_fit program in the SciPy Python package (Virtanen et al., 2020) using the Trust Region Reflective Method (TRF) (Dennis & Schnabel, 1996; Gill et al., 2019; Moré & Sorensen, 1983; Nocedal & Wright, 2006; Press et al., 1986): ...
Article
Full-text available
Certain spiral density waves in Saturn's rings are generated through resonances with planetary normal modes, making them valuable probes of Saturn's internal structure. Previous research has primarily focused on the rotation rates of these waves. However, other characteristics of these waves also contain valuable information about the planet's interior. In this work, we investigate the amplitudes of the waves across the C‐ring by analyzing high signal‐to‐noise profiles derived from phase‐corrected averages of occultation profiles obtained by Cassini's Visual and Infrared Mapping Spectrometer (VIMS). By fitting these wave profiles to linear density wave models, we estimate the ring's surface mass density, mass extinction coefficient, and effective kinematic viscosity at 34 locations in the C‐ring, as well as the amplitude of the gravitational potential perturbations associated with 6 satellite resonances and 28 planetary normal mode resonances. Our estimates of the C‐ring's mass extinction coefficient indicate that the typical particle mass density is around 0.3 g/cm³ interior to 84,000 km, but can get as low as 0.03 g/cm³ exterior to 84,000 km. We also find the ring's viscosity is reduced in the outer C‐ring, which is consistent with the exceptionally high porosity of the particles in this region. Meanwhile, we find the amplitudes of Saturn's normal modes are complex functions of frequency, ℓ, and m, implying that multiple factors influence how efficiently these modes are excited. This analysis identified two primary sources of these normal‐mode oscillations: a deep source located close to Saturn's core, and a shallow source residing near the surface.
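The excerpt above fits optical-depth variations with scipy.optimize.curve_fit and the Trust Region Reflective method. The sketch below uses an assumed toy sinusoidal model rather than the paper's linear density-wave model, simply to show the call pattern:

```python
import numpy as np
from scipy.optimize import curve_fit

# Assumed toy model: y(x) = A * sin(k * x + phi) + c, with x the radial displacement
# from the resonance location and y the fractional optical depth variation.
def wave_model(x, A, k, phi, c):
    return A * np.sin(k * x + phi) + c

x = np.linspace(-50.0, 50.0, 400)
rng = np.random.default_rng(2)
y = wave_model(x, 0.2, 0.3, 0.5, 0.0) + 0.02 * rng.standard_normal(x.size)

popt, pcov = curve_fit(wave_model, x, y, p0=[0.1, 0.2, 0.0, 0.0], method="trf")
print(popt)
```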
Chapter
Iterative methods are based on the successive improvement of a solution estimate. Starting from an initial guess, a sequence of improving approximate solutions is obtained. This means that an unambiguous optimal solution cannot be obtained "at once"; moreover, the sequence of solution guesses can differ between techniques and their settings. Nevertheless, if the technique suits a particular problem well, the sequence (at least asymptotically) approaches the actual optimum.
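As a concrete illustration of this point, the sketch below runs plain gradient descent on a simple quadratic from two different starting guesses: the intermediate iterates differ, but both sequences approach the same minimizer.

```python
import numpy as np

def gradient_descent(grad, x0, step=0.1, n_iter=50):
    """Generic iterative improvement: repeatedly move against the gradient."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iter):
        x = x - step * grad(x)
    return x

grad = lambda x: 2.0 * (x - np.array([1.0, 2.0]))  # quadratic with minimizer at (1, 2)
print(gradient_descent(grad, [5.0, -3.0]))          # different initial guesses ...
print(gradient_descent(grad, [-4.0, 8.0]))          # ... converge to the same point
```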
Preprint
Full-text available
We consider the problem of sampling from a product-of-experts-type model that encompasses many standard prior and posterior distributions commonly found in Bayesian imaging. We show that this model can be easily lifted into a novel latent variable model, which we refer to as a Gaussian latent machine. This leads to a general sampling approach that unifies and generalizes many existing sampling algorithms in the literature. Most notably, it yields a highly efficient and effective two-block Gibbs sampling approach in the general case, while also specializing to direct sampling algorithms in particular cases. Finally, we present detailed numerical experiments that demonstrate the efficiency and effectiveness of our proposed sampling approach across a wide range of prior and posterior sampling problems from Bayesian imaging.
Article
The distinction between “reinforcement” and “cloaking” has been overlooked in optimization-based design of devices intended to conceal a defect in an elastic medium. In the former, a so-called “cloak” is severely biased toward one or a few specific elastic disturbances, whereas in the latter, an “unbiased cloak” is effective under any elastic disturbance. We propose a two-stage approach for optimization-based design of elastostatic cloaks that targets true, unbiased cloaks. First, we perform load-case optimization to find a finite set of worst-case design loads. Then we perform topology optimization of the cloak microstructure under these worst-case loads using a judicious choice of the objective function, formulated in terms of energy mismatch. Although a small subset of the infinite load cases that the cloak must handle, these highly nonintuitive, worst-case loads lead to designs that approach perfect and unbiased elastostatic cloaking. In demonstration, we consider elastic media composed of spinodal architected materials, which provides an ideal testbed for exploring elastostatic cloaks in media with varying anisotropy and porosity, without sacrificing manufacturability. To numerically verify the universal nature of our cloaks, we compare the elastic response of the medium containing the cloaked defect to that of the undisturbed medium under many random load cases not considered during design. By using digital light processing additive manufacturing to realize the elastic media containing cloaked defects and analyzing their response experimentally using compression testing with digital image correlation, this study provides a physical demonstration of elastostatic cloaking of a three-dimensional defect in a three-dimensional medium.
Preprint
We introduce ReplaceMe, a generalized training-free depth pruning method that effectively replaces transformer blocks with a linear operation, while maintaining high performance for low compression ratios. In contrast to conventional pruning approaches that require additional training or fine-tuning, our approach requires only a small calibration dataset that is used to estimate a linear transformation to approximate the pruned blocks. This estimated linear mapping can be seamlessly merged with the remaining transformer blocks, eliminating the need for any additional network parameters. Our experiments show that ReplaceMe consistently outperforms other training-free approaches and remains highly competitive with state-of-the-art pruning methods that involve extensive retraining/fine-tuning and architectural modifications. Applied to several large language models (LLMs), ReplaceMe achieves up to 25% pruning while retaining approximately 90% of the original model's performance on open benchmarks - without any training or healing steps, resulting in minimal computational overhead (see Fig.1). We provide an open-source library implementing ReplaceMe alongside several state-of-the-art depth pruning techniques, available at this repository.
Preprint
Full-text available
In recent years, the compression of large language models (LLMs) has emerged as a key problem in facilitating LLM deployment on resource-limited devices, reducing compute costs, and mitigating the environmental footprint due to large-scale AI infrastructure. Here, we establish the foundations of LLM quantization from a rate-distortion theory perspective and propose a quantization technique based on simple rate-distortion optimization. Our technique scales to models containing hundreds of billions of weight parameters and offers users the flexibility to compress models, post-training, to a model size or accuracy specified by the user.
Article
Full-text available
Financial markets exhibit quasi-stationary patterns, known as market states, which previous studies have identified through the correlation structure of asset returns using clustering techniques. In this paper, we propose a novel approach to the portfolio selection problem by leveraging market states through an unsupervised method that extends beyond conventional correlation analysis. We introduce market distance metrics for both price and non-price data to estimate market states. Our proposed model allocates assets based on portfolio choices optimized within specified market states. In addition to using asset return correlations, we experiment with two non-price data sources: (1) a text-based market distance metric derived from Federal Open Market Committee (FOMC) policy statements, and (2) the correlation of multivariate time series search volumes from Google Trends. Using a multi-asset portfolio encompassing major U.S. asset classes, we trained our model from 2016 to 2019 and evaluated its out-of-sample performance from 2020 to 2023. Our findings demonstrate the superiority of our ensemble solution, incorporating multiple market distance metrics with price and non-price data, over established baselines with statistically significant results.
Article
The nonlinear optimization problem with possibly infeasible constraints was studied early by Burke (J Math Anal Appl, 139:19–351, 1989) and was revisited by Dai and Zhang (CSIAM Trans Appl Math, 2:551–584, 2021; Math Program, 200:633–667, 2023) in a broad perspective. This paper considers nonlinear optimization with a least ℓ₁-norm measure of constraint violations and introduces the concepts of the D-stationary point, the DL-stationary point, and the DZ-stationary point with the help of an exact penalty function. If the stationary point is feasible, they correspond to the Fritz–John stationary point, the KKT stationary point, and the singular stationary point, respectively. In order to show the usefulness of these specific stationary points, we propose an exact penalty sequential quadratic programming (SQP) method with inner and outer iterations and analyze its global and local convergence. The proposed method admits convergence to a D-stationary point and rapid infeasibility detection without driving the penalty parameter to zero, which demonstrates the commentary given in Byrd et al. (SIAM J Optim, 20:2281–2299, 2010) and can be regarded as a supplement to the theory of nonlinear optimization on rapid detection of infeasibility. Some illustrative examples and preliminary numerical results demonstrate that the proposed method is robust and efficient in solving infeasible nonlinear problems and a degenerate problem without LICQ in the literature.
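For context, the exact ℓ₁ penalty function underlying the stationarity concepts above is, in its standard textbook form for a problem with equality constraints c_i(x) = 0, i ∈ E, and inequality constraints c_j(x) ≥ 0, j ∈ I (the paper's notation may differ):

P_1(x; \nu) = f(x) + \nu \sum_{i \in E} |c_i(x)| + \nu \sum_{j \in I} \max(0, -c_j(x)),

where ν > 0 is the penalty parameter and the last two terms are the ℓ₁-norm measure of constraint violation.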
Article
Large-scale optimization algorithms frequently require sparse Hessian matrices that are not readily available. Existing methods for approximating large sparse Hessian matrices either do not impose sparsity or are computationally prohibitive. To try to overcome these limitations, we propose a novel approach that seeks to satisfy as many componentwise secant equations as necessary to define each row of the Hessian matrix. A naive application of this approach is too expensive for Hessian matrices that have some relatively dense rows but, by carefully taking into account the symmetry and connectivity of the Hessian matrix, we are able to devise an approximation algorithm that is fast and efficient with scope for parallelism. Example sparse Hessian matrices from the CUTEst test collection for optimization illustrate the effectiveness and robustness of our proposed method.
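The secant equations referenced above are, in their standard quasi-Newton form, B_{k+1} s_k = y_k with s_k = x_{k+1} - x_k and y_k = \nabla f(x_{k+1}) - \nabla f(x_k). Read row by row, as the abstract suggests, this asks each row b_i^{\top} of the approximation to satisfy

b_i^{\top} s_k = (y_k)_i

for enough past steps k to determine the nonzero entries of that row; this is a paraphrase of the idea, not the authors' precise formulation.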
Article
Full-text available
Pathophysiological conditions in arteries, such as stenosis or aneurysms, have a great impact on blood flow dynamics, necessitating the numerical study of such pathologies. Computational fluid dynamics (CFD) could provide the means for the calculation and interpretation of pressure and velocity fields, wall stresses, and important biomedical factors in such pathologies. Additionally, most of these pathological conditions are connected with geometric vessel changes. In this study, the numerical solution of the 2D flow in a branching artery and a multiscale model of 3D flow are presented utilizing CFD. In the 3D case, a multiscale approach (3D and 0D–1D) is pursued, in which a dynamically altered parabolic velocity profile is applied at the inlet of the geometry. The obtained waveforms are derived from a 0D–1D mathematical model of the entire arterial tree. The geometries of interest are patient-specific 3D reconstructed abdominal aortic aneurysms after fenestrated (FEVAR) and branched endovascular aneurysm repair (BEVAR). Critical hemodynamic parameters such as velocity, wall shear stress, time-averaged wall shear stress, and local normalized helicity are presented, evaluated, and compared.