
# Some methods of speeding up the convergence of iteration methods


## Abstract

For the solution of the functional equation P(x) = 0 (1), where P is an operator, usually linear, from B into B and B is a Banach space, iteration methods are generally used. These consist of the construction of a sequence $x_0, \dots, x_n, \dots$ which converges to the solution (see, for example, [1]). Continuous analogues of these methods are also known, in which a trajectory x(t), 0 ⩽ t ⩽ ∞, is constructed which satisfies an ordinary differential equation in B and is such that x(t) approaches the solution of (1) as t → ∞ (see [2]). We shall call a method a k-step method if the construction of each successive iterate $x_{n+1}$ uses the k previous iterates $x_n, \dots, x_{n-k+1}$. The same term will also be used for continuous methods if x(t) satisfies a differential equation of the k-th order or k-th degree. The most widely used iteration methods are one-step methods (e.g. the method of successive approximations). They are generally simple computationally but often converge very slowly. This is confirmed both by estimates of the rate of convergence and by practical computation (for more details see below). The question of the rate of convergence is therefore most important. Some multistep methods, considered further on, are only slightly more complicated than the corresponding one-step methods yet make it possible to speed up the convergence substantially. Note that all the methods mentioned below are also applicable to the problem of minimizing a differentiable functional f(x) in Hilbert space, so long as this problem reduces to the solution of the equation grad f(x) = 0.
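The difference between a one-step and a two-step method can be illustrated with a small numerical sketch. The abstract gives no formulas, so everything below is an assumption made for illustration: the linear operator P(x) = Ax − b with a diagonal test matrix, the one-step update $x_{n+1} = x_n - \alpha P(x_n)$, the two-step update $x_{n+1} = x_n - \alpha P(x_n) + \beta (x_n - x_{n-1})$ (the form usually called the heavy ball method), and the parameter choices based on the extreme eigenvalues `l` and `L`.

```python
import numpy as np

def one_step(A, b, alpha, iters):
    """One-step iteration x_{n+1} = x_n - alpha * P(x_n), with P(x) = A x - b."""
    x = np.zeros_like(b)
    for _ in range(iters):
        x = x - alpha * (A @ x - b)
    return x

def two_step(A, b, alpha, beta, iters):
    """Two-step iteration: adds beta * (x_n - x_{n-1}) to the one-step update."""
    x_prev = np.zeros_like(b)
    x = np.zeros_like(b)
    for _ in range(iters):
        x_next = x - alpha * (A @ x - b) + beta * (x - x_prev)
        x_prev, x = x, x_next
    return x

# Hypothetical ill-conditioned test problem (symmetric positive definite).
eigs = np.array([1.0, 10.0, 100.0])
A = np.diag(eigs)
b = np.ones(3)
x_star = b / eigs                      # exact solution of A x = b

l, L = eigs.min(), eigs.max()
iters = 100

# Assumed parameter choices: standard tunings for a quadratic problem.
alpha_one = 2.0 / (L + l)
alpha_two = 4.0 / (np.sqrt(L) + np.sqrt(l)) ** 2
beta_two = ((np.sqrt(L) - np.sqrt(l)) / (np.sqrt(L) + np.sqrt(l))) ** 2

err_one = np.linalg.norm(one_step(A, b, alpha_one, iters) - x_star)
err_two = np.linalg.norm(two_step(A, b, alpha_two, beta_two, iters) - x_star)
print(f"one-step error: {err_one:.3e}, two-step error: {err_two:.3e}")
```

With these assumed settings the two-step iteration reaches a much smaller error in the same number of iterations, at the cost of storing one extra iterate.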
... The algorithm we are going to investigate in this subsection is related to the B.T. Polyak Heavy Ball Method [74]. Actually, as shown by Su, Boyd and Candes [81], the celebrated Nesterov minimization algorithm belongs to the heavy ball family. ...
... The slowest one is almost always the RK(4,5) scheme. One exception is observed on the G-set where the Lie scheme slowed down at the end, which is due to the fact that there are very large-scale instances with 20000 variables leading to the solution of the linear system (74) in the Lie scheme quite slow. ...
Article
In this article, we discuss the numerical solution of Boolean polynomial programs by algorithms borrowing from numerical methods for differential equations, namely the Houbolt scheme, the Lie scheme, and a Runge-Kutta scheme. We first introduce a quartic penalty functional (of Ginzburg-Landau type) to approximate the Boolean program by a continuous one and prove some convergence results as the penalty parameter $\varepsilon$ converges to 0. We prove also that, under reasonable assumptions, the distance between local minimizers of the penalized problem and the set $\{\pm 1\}^n$ is of order $O(\sqrt{n}\varepsilon)$. Next, we introduce algorithms for the numerical solution of the penalized problem, these algorithms relying on the Houbolt, Lie and Runge-Kutta schemes, classical methods for the numerical solution of ordinary or partial differential equations. We performed numerical experiments to investigate the impact of various parameters on the convergence of the algorithms. We have tested our ODE approaches and compared with the classical nonlinear optimization solver IPOPT and a quadratic binary formulation approach (QB-G) as well as an exhaustive method using parallel computing techniques.
The numerical results on various datasets (including small and large-scale randomly generated synthetic datasets of general Boolean polynomial optimization problems, and a large-scale heterogeneous MQLib benchmark dataset of Max-Cut and Quadratic Unconstrained Binary Optimization (QUBO) problems) show good performances for our ODE approaches. As a result, our ODE algorithms often converge faster than the other compared methods to better integer solutions of the Boolean program.
... The first step taken to develop the proposed fire detection system was the training of the YOLOv4 network. The training was performed using the Mini-batch Gradient Descent algorithm [63] over 30,000 iterations and with a momentum [64] 0.9 and a mini-batch size of 64. Each mini-batch was further subdivided into 16 partitions, so that only 64/16 = 4 images were simultaneously processed by the GPU at a time. ...
Article
Large-scale fires have been increasingly reported in the news media. These events can cause a variety of irreversible damage, which encourages the search for effective solutions to prevent and fight fires. A promising solution is an automatic system based on computer vision capable of detecting fire in early stages, enabling rapid suppression to mitigate damage, minimizing combat and restoration costs. Currently, the most effective systems are typically based on convolutional neural networks (CNNs). However, these networks are computationally expensive and consume a large amount of memory, usually requiring graphics processing units to operate properly in emergency situations. Thus, we propose a CNN-based fire detector system suitable for low-power, resource-constrained devices. Our approach consists of training a deep detection network and then removing its less important convolutional filters in order to reduce its computational cost while trying to preserve its original performance. Through an investigation of different pruning techniques, our results show that we can reduce the computational cost by up to 83.60% and the memory consumption by up to 83.86% without degrading the system's performance. A case study was performed on a Raspberry Pi 4 where the results demonstrate the viability of implementing our proposed system on a low-end device.
... The inertial method is a powerful heuristic technique for accelerating the convergence of iterative methods. The original inertial method is the heavy ball method, developed by Polyak in the seminal work (Polyak 1964), to accelerate the gradient descent for minimizing a smooth objective function. Since then, this method has been employed in many gradient-type methods for acceleration. ...
Article
In this paper, we propose an inertial version of the Proximal Incremental Aggregated Gradient (abbreviated by iPIAG) method for minimizing the sum of smooth convex component functions and a possibly nonsmooth convex regularization function. First, we prove that iPIAG converges linearly under the gradient Lipschitz continuity and the strong convexity, along with an upper bound estimation of the inertial parameter. Then, by employing the recent Lyapunov-function-based method, we derive a weaker linear convergence guarantee, which replaces the strong convexity by the quadratic growth condition. At last, we present two numerical tests to illustrate that iPIAG outperforms the original PIAG.
... FSI and Du Fort-Frankel schemes are just two representatives of a large class of extrapolation strategies, see e.g. [74,81,103]. The ongoing success of using momentum methods for training [91,100] and constructing [69,78,111] neural networks warrants an extensive investigation of these strategies in both worlds. ...
Article
We investigate numerous structural connections between numerical algorithms for partial differential equations (PDEs) and neural architectures. Our goal is to transfer the rich set of mathematical foundations from the world of PDEs to neural networks. Besides structural insights, we provide concrete examples and experimental evaluations of the resulting architectures. Using the example of generalised nonlinear diffusion in 1D, we consider explicit schemes, acceleration strategies thereof, implicit schemes, and multigrid approaches. We connect these concepts to residual networks, recurrent neural networks, and U-net architectures. Our findings inspire a symmetric residual network design with provable stability guarantees and justify the effectiveness of skip connections in neural networks from a numerical perspective. Moreover, we present U-net architectures that implement multigrid techniques for learning efficient solutions of partial differential equation models, and motivate uncommon design choices such as trainable nonmonotone activation functions. Experimental evaluations show that the proposed architectures save half of the trainable parameters and can thus outperform standard ones with the same model complexity. Our considerations serve as a basis for explaining the success of popular neural architectures and provide a blueprint for developing new mathematically well-founded neural building blocks.
... By the definition of M η in (20), the last equation is the same as (24). ...
Preprint
The unit-modulus least squares (UMLS) problem has a wide spectrum of applications in signal processing, e.g., phase-only beamforming, phase retrieval, radar code design, and sensor network localization. Scalable first-order methods such as projected gradient descent (PGD) have recently been studied as a simple yet efficient approach to solving the UMLS problem. Existing results on the convergence of PGD for UMLS often focus on global convergence to stationary points. As a non-convex problem, only sublinear convergence rate has been established. However, these results do not explain the fast convergence of PGD frequently observed in practice. This manuscript presents a novel analysis of convergence of PGD for UMLS, justifying the linear convergence behavior of the algorithm near the solution. By exploiting the local structure of the objective function and the constraint set, we establish an exact expression of the convergence rate and characterize the conditions for linear convergence. Simulations show that our theoretical analysis corroborates numerical examples. Furthermore, variants of PGD with adaptive step sizes are proposed based on the new insight revealed in our convergence analysis. The variants show substantial acceleration in practice.
... Forward-Backward with Polyak momentum [30] can be written as a fixed point iteration of the frugal splitting operator ...
Preprint
We consider frugal splitting operators for finite sum monotone inclusion problems, i.e., splitting operators that use exactly one direct or resolvent evaluation of each operator of the sum. A novel representation of these operators in terms of what we call a generalized primal-dual resolvent is presented. This representation reveals a number of new results regarding lifting numbers, existence of solution maps, and parallelizability of the forward/backward evaluations. We show that the minimal lifting is $n-1-f$ where $n$ is the number of monotone operators and $f$ is the number of direct evaluations in the splitting. Furthermore, we show that this lifting number is only achievable as long as the first and last evaluation are resolvent evaluations. In the case of frugal resolvent splitting operators, these results recover the results of Ryu and Malitsky–Tam. The representation also enables a unified convergence analysis and we present a generally applicable theorem for the convergence and Fejér monotonicity of fixed point iterations of frugal splitting operators with cocoercive direct evaluations. We conclude by constructing a new convergent and parallelizable frugal splitting operator with minimal lifting.
... We would like to view our updates as analogous to Polyak's heavy ball momentum [61,67]. In the context of optimization, momentum allows previous update directions to influence the current update direction, typically in the form of an exponentially-decaying average. ...
Article
To minimize the average of a set of log-convex functions, the stochastic Newton method iteratively updates its estimate using subsampled versions of the full objective’s gradient and Hessian. We contextualize this optimization problem as sequential Bayesian inference on a latent state-space model with a discriminatively-specified observation process. Applying Bayesian filtering then yields a novel optimization algorithm that considers the entire history of gradients and Hessians when forming an update. We establish matrix-based conditions under which the effect of older observations diminishes over time, in a manner analogous to Polyak’s heavy ball momentum. We illustrate various aspects of our approach with an example and review other relevant innovations for the stochastic Newton method.
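The way older contributions fade under momentum can be written out explicitly. As a sketch, assume the common form of the momentum update with a coefficient $\beta \in [0, 1)$, update directions $g_n$, and accumulated direction $v_n$ (this notation is introduced here for illustration and does not come from the cited papers). Unrolling the recursion gives

$$v_n = \beta v_{n-1} + g_n, \quad v_0 = 0 \quad\Longrightarrow\quad v_n = \sum_{j=1}^{n} \beta^{\,n-j} g_j ,$$

so the direction from step $j$ enters the current update with weight $\beta^{\,n-j}$, which decays geometrically as that direction ages.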
... To speed up the convergence rate, following the heavy ball method of Polyak [35], Nesterov [34] introduced a modified heavy ball method as follows: ...
Chapter
We consider the problem of applying adaptive optimization methods in neural network regression tasks like estimating image quality or aligning objects in the frame. Such tasks are common in preprocessing for image analysis. Two sample tasks are presented. The first is evaluating the degree of blurring in the iris recognition system. Eye images are taken from BATH and CASIA databases, and samples are generated by Gaussian blurring. The second sample task is the alignment of the face in an image. Training samples are obtained by rotating face images, and the rotation angle is estimated. Both tasks are solved by direct estimation of parameters with a neural network. The resulting accuracy of parameter estimation is acceptable for practical use. The Adam algorithm and its modifications, such as AdamW and Radam, are applied. A modification of the Radam algorithm that changes the weight decay strategy is proposed. This modification reduces the error by a factor of 1.5 in comparison with the model trained by the original algorithm.
Nonlinear functional equations and continuous analogues of iterative methods. Izv. VUZov. Ser. Mat.
• M K Gavurin
Gavurin, M. K., Nonlinear functional equations and continuous analogues of iterative methods. Izv. VUZov. Ser. Mat., 5, 16-31, 1958.
The problem of speeding up the convergence of iteration processes in the approximate solution of linear operational equations
• A S Buledx
Buledx, A. S., The problem of speeding up the convergence of iteration processes in the approximate solution of linear operational equations.
Commutative Normalized Rings (Kommutativnye normirovannye kol'tsa)
• I M Gel'fand
• D A Raikov
• G E Shilov
Gel'fand, I. M., Raikov, D. A. and Shilov, G. E., Commutative Normalized Rings (Kommutativnye normirovannye kol'tsa). Fizmatgiz, Moscow, 1960.