Conference Paper

# Learning rate adaptation by line search in evolution strategies with recombination

Authors:
• Inria (National Institute for Research in Computer Science and Control)

Article
Scaling-invariant functions preserve the order of points when the points are scaled by the same positive scalar (usually with respect to a unique reference point). Composites of strictly monotonic functions with positively homogeneous functions are scaling-invariant with respect to zero. We prove in this paper that the converse also holds for large classes of scaling-invariant functions. Specifically, we give necessary and sufficient conditions for scaling-invariant functions to be composites of a strictly monotonic function with a positively homogeneous function. We also study sublevel sets of scaling-invariant functions, generalizing well-known properties of positively homogeneous functions.
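The definition in the abstract above can be checked numerically. The sketch below is only an illustration under assumed choices: `p` (squared Euclidean norm, positively homogeneous of degree 2), `g` (`log1p`, strictly increasing on nonnegative inputs), and the helper `preserves_order` are hypothetical names that do not come from the paper.

```python
import math
import random


def p(x):
    # Assumed example: positively homogeneous of degree 2,
    # i.e. p(rho * x) == rho**2 * p(x) for every rho > 0.
    return sum(xi * xi for xi in x)


def g(t):
    # Assumed example: strictly increasing for t >= 0.
    return math.log1p(t)


def f(x):
    # Composite of a strictly monotonic and a positively homogeneous function.
    return g(p(x))


def preserves_order(func, x, y, rho):
    # Scaling both points by the same rho > 0 (reference point zero)
    # must not change which of the two has the smaller function value.
    sx = [rho * xi for xi in x]
    sy = [rho * yi for yi in y]
    return (func(x) <= func(y)) == (func(sx) <= func(sy))


rng = random.Random(0)
pairs = [
    ([rng.gauss(0.0, 1.0) for _ in range(3)],
     [rng.gauss(0.0, 1.0) for _ in range(3)])
    for _ in range(100)
]
all_ok = all(
    preserves_order(f, x, y, rho)
    for x, y in pairs
    for rho in (0.01, 0.5, 3.0, 100.0)
)
```

A random test of this kind can only fail to find a counterexample; it does not replace the proof that such composites are scaling-invariant.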
Article
In the context of unconstrained numerical optimization, this paper investigates the global linear convergence of a simple probabilistic derivative-free optimization (DFO) algorithm. The algorithm samples a candidate solution from a standard multivariate normal distribution scaled by a step-size and centered in the current solution. This solution is accepted if it has a better objective function value than the current one. Crucial to the algorithm is the adaptation of the step-size, which is done in order to maintain a certain probability of success. The algorithm, already proposed in the 1960s, is a generalization of Rechenberg's well-known $(1+1)$ Evolution Strategy (ES) with one-fifth success rule, which was also proposed by Devroye under the name compound random search and by Schumer and Steiglitz under the name step-size adaptive random search. In addition to being derivative-free, the algorithm is function-value-free: it exploits the objective function only through comparisons. It belongs to the class of comparison-based step-size adaptive randomized search (CB-SARS). For the convergence analysis, we follow the methodology developed in a companion paper for investigating linear convergence of CB-SARS: by exploiting invariance properties of the algorithm, we turn the study of global linear convergence on scaling-invariant functions into the study of the stability of an underlying normalized Markov chain (MC). We hence prove global linear convergence by studying the stability (irreducibility, recurrence, positivity, geometric ergodicity) of the normalized MC associated with the $(1+1)$-ES. More precisely, we prove that starting from any initial solution and any step-size, linear convergence occurs with probability one and in expectation. Our proof holds on unimodal functions that are composites of strictly increasing functions with positively homogeneous functions of degree $\alpha$ (assumed also to be continuously differentiable). This function class includes composites of norm functions but also non-quasi-convex functions. Because of the composition with a strictly increasing function, it includes non-continuous functions. We find that a sufficient condition for global linear convergence is the step-size increase on linear functions, a condition typically satisfied for standard parameter choices. Although the algorithm was introduced more than 40 years ago, we provide here the first proof of global linear convergence for the $(1+1)$-ES with generalized one-fifth success rule and the first proof of linear convergence for a CB-SARS on such a class of functions that includes non-quasi-convex and non-continuous functions. Our proof also holds on functions where linear convergence of some CB-SARS was previously proven, namely convex-quadratic functions (including the well-known sphere function).
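The algorithm described in the abstract above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name `one_plus_one_es`, the step-size factors `up` and `down`, and the test function `sphere` are assumptions chosen so that the step-size is stationary at a success probability of one fifth.

```python
import math
import random


def one_plus_one_es(f, x0, sigma0, iterations=3000, seed=1):
    """(1+1)-ES with a one-fifth success rule (illustrative sketch)."""
    rng = random.Random(seed)
    x, sigma = list(x0), sigma0
    fx = f(x)
    # Assumed factors: increase on success, decrease on failure.
    # up * down**4 == 1, so the step-size stays put on average when
    # one candidate in five is accepted.
    up = math.exp(1.0 / 3.0)
    down = up ** (-0.25)
    for _ in range(iterations):
        # Candidate: standard normal vector scaled by sigma, centered at x.
        candidate = [xi + sigma * rng.gauss(0.0, 1.0) for xi in x]
        fc = f(candidate)
        # Only the comparison fc < fx is used, never the values themselves.
        if fc < fx:
            x, fx = candidate, fc
            sigma *= up
        else:
            sigma *= down
    return x, fx, sigma


def sphere(x):
    return sum(xi * xi for xi in x)


x_best, f_best, sigma_final = one_plus_one_es(sphere, [1.0] * 5, 1.0)
```

Because acceptance depends only on the comparison `fc < fx`, replacing `sphere` by any strictly increasing transformation of it produces the same sequence of iterates for the same seed.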
Chapter
The paper presents the asymptotic analysis of a technique for improving the convergence of evolution strategies (ES) on noisy fitness data. This technique, which may be called “Mutate large, but inherit small”, is discussed in light of the EPP (evolutionary progress principle). The derivation of the progress rate formula is sketched, its predictions are compared with experiments, and its limitations are shown. The dynamical behavior of the ES is investigated. It will be shown that standard self-adaptation has considerable difficulty driving the ES into its optimal working regime. Remedies are provided to improve the self-adaptation.
Conference Paper
This paper investigates the limits of the predictions based on the classical progress rate theory for Evolution Strategies. We explain on the sphere function why positive progress rates give convergence in mean, negative progress rates divergence in mean and show that almost sure convergence can take place despite divergence in mean. Hence step-sizes associated to negative progress can actually lead to almost sure convergence. Based on these results we provide an alternative progress rate definition related to almost sure convergence. We present Monte Carlo simulations to investigate the discrepancy between both progress rates and therefore both types of convergence. This discrepancy vanishes when dimension increases. The observation is supported by an asymptotic estimation of the new progress rate definition. Categories and Subject Descriptors: G.1.6 (Numerical Analysis): Optimization (Global optimization, Unconstrained optimization); F.2.1 (Analysis of Algorithms and Problem Complexity): Numerical Algorithms and Problems
Conference Paper
This paper introduces mirrored sampling into evolution strategies (ESs) with weighted multi-recombination. Two further heuristics are introduced. Pairwise selection selects at most one of two mirrored vectors in order to avoid a bias due to recombination. Selective mirroring only mirrors the worst solutions of the population. Convergence rates on the sphere function are derived that also yield upper bounds for the convergence rate on any spherical function. Regardless of pairwise selection, the optimal fraction of offspring to be mirrored is one without selective mirroring and about 19% with selective mirroring, where the convergence rate reaches a value of 0.390. This is an improvement of 56% compared to the best known convergence rate of 0.25 with positive recombination weights.
Article
We consider unconstrained randomized optimization of convex objective functions. We analyze the Random Pursuit algorithm, which iteratively computes an approximate solution to the optimization problem by repeated optimization over a randomly chosen one-dimensional subspace. This randomized method only uses zeroth-order information about the objective function and does not need any problem-specific parametrization. We prove convergence and give convergence rates for smooth objectives assuming that the one-dimensional optimization can be solved exactly or approximately by an oracle. A convenient property of Random Pursuit is its invariance under strictly monotone transformations of the objective function. It thus enjoys identical convergence behavior on a wider function class. To support the theoretical results we present extensive numerical performance results of Random Pursuit, two gradient-free algorithms recently proposed by Nesterov, and a classical adaptive step-size random search scheme. We also present an accelerated heuristic version of the Random Pursuit algorithm which significantly improves standard Random Pursuit on all numerical benchmark problems. A general comparison of the experimental results reveals that (i) standard Random Pursuit is effective on strongly convex functions with moderate condition number, and (ii) the accelerated scheme is comparable to Nesterov's fast gradient method and outperforms adaptive step-size strategies. The appendix contains additional supporting online material.
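The basic Random Pursuit iteration described in the abstract above (optimize along a randomly chosen line using only function evaluations) might be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the line-search oracle is approximated by a golden-section search on a fixed bracket, and the names `golden_section`, `random_pursuit`, `bracket`, and `quadratic` are hypothetical.

```python
import math
import random


def golden_section(phi, lo, hi, tol=1e-8):
    # Approximate minimizer of a unimodal 1-D function phi on [lo, hi].
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = phi(c), phi(d)
    while b - a > tol:
        if fc < fd:
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = phi(c)
        else:
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = phi(d)
    return 0.5 * (a + b)


def random_pursuit(f, x0, iterations=500, bracket=10.0, seed=2):
    rng = random.Random(seed)
    x = list(x0)
    fx = f(x)
    for _ in range(iterations):
        # Direction drawn uniformly on the unit sphere.
        d = [rng.gauss(0.0, 1.0) for _ in x]
        length = math.sqrt(sum(di * di for di in d))
        d = [di / length for di in d]
        # Approximate line search along x + t * d (zeroth-order only).
        t = golden_section(
            lambda s: f([xi + s * di for xi, di in zip(x, d)]),
            -bracket, bracket)
        candidate = [xi + t * di for xi, di in zip(x, d)]
        fc = f(candidate)
        # Guard against an inexact line search making things worse.
        if fc < fx:
            x, fx = candidate, fc
    return x, fx


def quadratic(x):
    # Assumed test function: convex quadratic with condition number 3.
    return sum((i + 1) * xi * xi for i, xi in enumerate(x))


x_rp, f_rp = random_pursuit(quadratic, [1.0, 1.0, 1.0])
```

The fixed bracket is a simplification: it assumes the one-dimensional minimizer lies within distance `bracket` of the current point, which holds for this test function and starting point.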
Article
Quality gain is the expected relative improvement of the function value in a single step of a search algorithm. Quality gain analysis reveals the dependencies of the quality gain on the parameters of a search algorithm, based on which one can derive the optimal values for the parameters. In this paper, we investigate evolution strategies with weighted recombination on general convex quadratic functions. We derive a bound for the quality gain and two limit expressions of the quality gain. From the limit expressions, we derive the optimal recombination weights and the optimal step-size, and find that the optimal recombination weights are independent of the Hessian of the objective function. Moreover, the dependencies of the optimal parameters on the dimension and the population size are revealed. Unlike previous works, in which the population size is implicitly assumed to be smaller than the dimension, our results cover population sizes proportional to or greater than the dimension. Numerical simulation shows that the asymptotically optimal step-size approximates the empirically optimal step-size well for a finite-dimensional convex quadratic function.
Article
In this paper, we consider comparison-based adaptive stochastic algorithms for solving numerical optimisation problems. We consider a specific subclass of algorithms called comparison-based step-size adaptive randomized search (CB-SARS), where the state variables at a given iteration are a vector of the search space and a positive parameter, the step-size, typically controlling the overall standard deviation of the underlying search distribution. We investigate the linear convergence of CB-SARS on scaling-invariant objective functions. Scaling-invariant functions preserve the ordering of points with respect to their function value when the points are scaled with the same positive parameter (the scaling is done w.r.t. a fixed reference point). This class of functions includes norms composed with strictly increasing functions as well as non-quasi-convex and non-continuous functions. On scaling-invariant functions, we show the existence of a homogeneous Markov chain, as a consequence of natural invariance properties of CB-SARS (essentially scale-invariance and invariance to strictly increasing transformation of the objective function). We then derive sufficient conditions for asymptotic global linear convergence of CB-SARS, expressed in terms of different stability conditions of the normalised homogeneous Markov chain (irreducibility, positivity, Harris recurrence, geometric ergodicity) and thus define a general methodology for proving global linear convergence of CB-SARS algorithms on scaling-invariant functions.
Article
Evolution strategies (ESs) are stochastic optimization algorithms recognized as powerful algorithms for difficult optimization problems in a black-box scenario. Together with other stochastic search algorithms for continuous domain (like differential evolution, estimation of distribution algorithms, particle swarm optimization, simulated annealing, ...) they are so-called global optimization algorithms, as opposed to gradient-based algorithms usually referred to as local search algorithms. Many theoretical works on stochastic optimization algorithms focus on investigating convergence to the global optimum with probability one, under very mild assumptions on the objective functions. On the other hand, the theory of evolution strategies has been restricted for a long time to the so-called progress rate theory, analyzing the one-step progress of ESs on unimodal, possibly noisy functions. This chapter covers global convergence results, revealing slow convergence rates on a wide class of functions, and fast convergence results on more restricted function classes. After reviewing the important components of ES algorithms, we illustrate how global convergence with probability one can be proven easily. We recall two important classes of convergence, namely sub-linear and linear convergence, corresponding to the convergence class of the pure random search and to the optimal convergence class for rank-based algorithms respectively. We review different lower and upper bounds for adaptive ESs, and explain the link between lower bounds and the progress rate theory. In the last part, we focus on recent results on linear convergence of adaptive ESs for the class of spherical and ellipsoidal functions, and explain how almost sure linear convergence can be proven using different laws of large numbers (LLN).
Conference Paper
Evolution Strategies (ESs) are population-based methods well suited for parallelization. In this paper, we study the convergence of the (μ/μ_w, λ)-ES, an ES with weighted recombination, and derive its optimal convergence rate and optimal μ especially for large population sizes. First, we theoretically prove the log-linear convergence of the algorithm using a scale-invariant adaptation rule for the step-size and minimizing spherical objective functions and identify its convergence rate as the expectation of an underlying random variable. Then, using Monte-Carlo computations of the convergence rate in the case of equal weights, we derive optimal values for μ that we compare with previously proposed rules. Our numerical computations also show a dependency of the optimal convergence rate on ln(λ), in agreement with previous theoretical results.
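The setting of the abstract above, an ES with weighted recombination and a scale-invariant step-size on the sphere function, can be simulated with a short script. Everything specific here is an assumption for illustration: the logarithmic weights, the constant `sigma_star`, the step-size rule `sigma = sigma_star * ||mean|| / n`, and the function name `es_scale_invariant` are hypothetical choices, not values taken from the paper.

```python
import math
import random


def norm(x):
    return math.sqrt(sum(xi * xi for xi in x))


def es_scale_invariant(x0, lam=10, mu=5, sigma_star=1.0,
                       iterations=400, seed=3):
    rng = random.Random(seed)
    n = len(x0)
    # Assumed positive, decreasing recombination weights summing to one.
    raw = [math.log(mu + 0.5) - math.log(i) for i in range(1, mu + 1)]
    total = sum(raw)
    weights = [w / total for w in raw]
    mean = list(x0)
    history = [norm(mean)]
    for _ in range(iterations):
        # Scale-invariant rule: step-size proportional to the distance
        # to the optimum (here the origin). Usable only in simulations,
        # since it requires knowing the optimum.
        sigma = sigma_star * norm(mean) / n
        offspring = [
            [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
            for _ in range(lam)
        ]
        # On the sphere, ranking by function value equals ranking by norm.
        offspring.sort(key=norm)
        # Weighted recombination of the mu best offspring
        # (zip stops after the mu weights).
        mean = [
            sum(w * child[j] for w, child in zip(weights, offspring))
            for j in range(n)
        ]
        history.append(norm(mean))
    return mean, history


final_mean, distances = es_scale_invariant([1.0] * 10)
# Empirical log-linear rate: average change of ln(distance) per iteration.
rate = math.log(distances[-1] / distances[0]) / (len(distances) - 1)
```

A negative `rate` corresponds to log-linear convergence; plotting `distances` on a logarithmic axis should show an approximately straight line.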
Conference Paper
“Hit-and-run is fast and fun” to generate a random point in a high dimensional convex set K (Lovász/Vempala, MSR-TR-2003-05). More precisely, the hit-and-run random walk mixes fast independently of where it is started inside the convex set. To hit-and-run from a point $x \in \mathbb{R}^n$, a line L through x is randomly chosen (uniformly over all directions). Subsequently, the walk’s next point is sampled from L ∩ K using a membership oracle which tells us whether a point is in K or not. Here the focus is on black-box optimization, however, where the function $f: \mathbb{R}^n \rightarrow \mathbb{R}$ to be minimized is given as an oracle, namely a black box for f-evaluations. We obtain in an obvious way a direct-search method when we substitute the f-oracle for the K-membership oracle to do a line search over L, and, naturally, we are interested in how fast such a hit-and-run direct-search heuristic converges to the optimum point $x^*$ in the search space $\mathbb{R}^n$. We prove that, even under the assumption of perfect line search, the search converges (at best) linearly at an expected rate larger (i.e. worse) than $1 - 1/n$. This implies a lower bound of $0.5n$ on the expected number of line searches necessary to halve the approximation error. Moreover, we show that $0.4n$ line searches suffice to halve the approximation error only with an exponentially small probability of $\exp(-\Omega(n^{1/3}))$. Since each line search requires at least one query to the f-oracle, the lower bounds obtained hold also for the number of f-evaluations.
Article
This paper puts forward two useful methods for self-adaptation of the mutation distribution - the concepts of derandomization and cumulation. Principal shortcomings of the concept of mutative strategy parameter control and two levels of derandomization are reviewed. Basic demands on the self-adaptation of arbitrary (normal) mutation distributions are developed. Applying arbitrary, normal mutation distributions is equivalent to applying a general, linear problem encoding. The underlying objective of mutative strategy parameter control is roughly to favor previously selected mutation steps in the future. If this objective is pursued rigorously, a completely derandomized self-adaptation scheme results, which adapts arbitrary normal mutation distributions. This scheme, called covariance matrix adaptation (CMA), meets the previously stated demands. It can still be considerably improved by cumulation - utilizing an evolution path rather than single search steps. Simulations on various test functions reveal local and global search properties of the evolution strategy with and without covariance matrix adaptation. Their performances are comparable only on perfectly scaled functions. On badly scaled, non-separable functions usually a speed-up factor of several orders of magnitude is observed. On moderately mis-scaled functions a speed-up factor of three to ten can be expected.
Armand Gissler, Anne Auger, and Nikolaus Hansen. Supplementary material for Learning rate adaptation by line search in evolution strategies with recombination. hal-03626292, 2022.
• Armand Gissler
• Anne Auger
• Nikolaus Hansen
Global linear convergence of evolution strategies with recombination on scaling-invariant functions
• Cheikh Touré
• Anne Auger
• Nikolaus Hansen