
Scaling-invariant Functions versus Positively Homogeneous Functions


Abstract and Figures

Scaling-invariant functions preserve the order of points when the points are scaled by the same positive scalar (usually with respect to a unique reference point). Composites of strictly monotonic functions with positively homogeneous functions are scaling-invariant with respect to zero. We prove in this paper that also the reverse is true for large classes of scaling-invariant functions. Specifically, we give necessary and sufficient conditions for scaling-invariant functions to be composites of a strictly monotonic function with a positively homogeneous function. We also study sublevel sets of scaling-invariant functions generalizing well-known properties of positively homogeneous functions.
Level sets of SI functions with respect to the red star x⋆. The four functions are strictly increasing transformations of x ↦ p(x − x⋆), where p is a PH function. From left to right: p(x) = ‖x‖; p(x) = x⊤Ax for A symmetric positive definite; p(x) = (∑ᵢ √|xᵢ|)², the 1/2-norm; a randomly generated SI function from a “smoothly” randomly perturbed sphere function.
The first two functions from the left have convex sublevel sets, contrary to the last two.
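As a quick numerical illustration of the abstract's first claim (a hypothetical sketch, not code from the paper): composing a strictly increasing function g with a positively homogeneous function p yields a function that is scaling-invariant with respect to a reference point x⋆, i.e., the order of any two points is preserved when both are scaled about x⋆.

```python
import numpy as np

rng = np.random.default_rng(0)

def p(x):
    # positively homogeneous of degree 1: p(rho * x) = rho * p(x) for rho > 0
    return np.linalg.norm(x)

def f(x, x_star):
    # strictly increasing transformation of p(x - x_star): SI w.r.t. x_star
    return np.log1p(p(x - x_star))

x_star = np.array([1.0, -2.0])
for _ in range(1000):
    x, y = rng.normal(size=2), rng.normal(size=2)
    rho = rng.uniform(0.1, 10.0)
    xs = x_star + rho * (x - x_star)   # both points scaled about x_star
    ys = x_star + rho * (y - x_star)
    # scaling-invariance: the order of f-values is preserved under scaling
    assert (f(x, x_star) <= f(y, x_star)) == (f(xs, x_star) <= f(ys, x_star))
```

The choice of p, the increasing transform log1p, and the reference point are illustrative; any strictly monotonic g and PH function p would do.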
Journal of Optimization Theory and Applications (2021) 191:363–383
https://doi.org/10.1007/s10957-021-01943-7
Scaling-invariant Functions versus Positively Homogeneous Functions
Cheikh Toure¹ · Armand Gissler¹ · Anne Auger¹ · Nikolaus Hansen¹
Received: 8 January 2021 / Accepted: 7 September 2021 / Published online: 23 September 2021
© The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2021
Keywords Scaling-invariant function · Positively homogeneous function · Compact level set
Mathematics Subject Classification 49J52 · 54C35
Communicated by Juan-Enrique Martínez-Legaz.
Anne Auger (corresponding author)
anne.auger@nospam-inria.fr
Cheikh Toure
cheikh.toure@polytechnique.edu
Armand Gissler
armand.gissler@nospam-inria.fr
Nikolaus Hansen
nikolaus.hansen@nospam-inria.fr
1 Inria and CMAP, École Polytechnique, IP Paris, Palaiseau, France
... Surprisingly, we highlight that scaling-invariant functions can have highly pathological behaviors, by showing real-valued scaling-invariant functions that are not monotonic on any nontrivial interval. The results of Chapter 3 are published in the Journal of Optimization Theory and Applications (JOTA) and are presented in [184]: Cheikh Toure, Armand Gissler, Anne Auger and Nikolaus Hansen, Scaling-invariant functions versus positively homogeneous functions, Journal of Optimization Theory and Applications, 2021. ...
... The content of this chapter is published in the Journal of Optimization Theory and Applications (JOTA). It is presented in [184]: Cheikh Toure, Armand Gissler, Anne Auger and Nikolaus Hansen, Scaling-invariant functions versus positively homogeneous functions, Journal of Optimization Theory and Applications, 2021. ...
... A nontrivial linear function is a continuous scaling-invariant function with Lebesgue negligible level sets. Also, [184, Proposition 4.2] implies that f still has Lebesgue negligible level sets in the case where it is a C¹ scaling-invariant function with a unique global argmin. ...
Full-text available
Thesis
This work is dedicated to zero-order black-box optimization, where only a sequence of function evaluations is available to update the optimization algorithm. Evolutionary algorithms are commonly used to solve this type of problem. Among them, evolution strategies like CMA-ES are state-of-the-art optimization algorithms for zero-order black-box optimization problems with a continuous search space. Particular aspects of CMA-ES are the recombination mechanism and the non-elitist selection scheme, which are crucial for dealing with local irregularities and multimodality. A multiobjective CMA-ES (with recombination) is therefore particularly in demand for real-world applications, to tackle multiobjective problems with local Pareto fronts. We design this type of multiobjective optimizer. More specifically, a new multiobjective indicator called Uncrowded Hypervolume Improvement (UHVI) is created, along with a framework of multiobjective optimizers called Sofomore. By instantiating Sofomore with CMA-ES, COMO-CMA-ES is obtained. The COMO-CMA-ES algorithm is tested on bi-objective functions that we analyze in detail in this thesis, namely the bi-objective convex quadratic problems. Interestingly, linear convergence is empirically observed, which is the optimal behavior we can expect, since CMA-ES converges linearly on strictly convex quadratic functions. A Python package called pycomocma and a Matlab interface are developed in this work for COMO-CMA-ES and the Sofomore framework. From a theoretical perspective, we analyze the global linear convergence of evolution strategies with recombination, which include well-known optimization algorithms, on a wide class of functions: the scaling-invariant functions. Our main condition for convergence is that the expected logarithm of the step-size must increase on nontrivial linear functions.
We analyze thoroughly the class of scaling-invariant functions and emphasize similar properties that they share with positively homogeneous functions.
... A nontrivial linear function is a continuous scaling-invariant function with Lebesgue negligible level sets. Also, [46, Proposition 4.2] implies that f still has Lebesgue negligible level sets in the case where it is a C¹ scaling-invariant function with a unique global argmin. ...
... By Proposition 5, we obtain that for all z ∈ ℝⁿ, p_{f,z} is defined as in (15). In addition, f has Lebesgue negligible level sets (see [46, Proposition 4.2] and Lemma 1). Therefore p_{f,z} > 0 almost everywhere. ...
... Note beforehand that α_f(x + z, U_1) = α_f(z, U_1), so that we assume without loss of generality that x = 0 and f(0) = 0. If f is a C¹ scaling-invariant function with a unique global argmin, we can construct, thanks to [46, Proposition 4.11], a positive number δ_f such that z⊤∇f(z) > 0 for every element z of the compact set L_{f,z_0^f} + B(0, 2δ_f). In particular, this result produces a compact neighborhood of the level set L_{f,z_0^f} where ∇f does not vanish. This helps to establish the limit of E[ϕ(α_f(z, U_1))] when z goes to ∞. ...
Full-text available
Preprint
Evolution Strategies (ES) are stochastic derivative-free optimization algorithms whose most prominent representative, the CMA-ES algorithm, is widely used to solve difficult numerical optimization problems. We provide the first rigorous investigation of the linear convergence of step-size adaptive ES involving a population and recombination, two ingredients crucially important in practice to be robust to local irregularities or multimodality. Our methodology relies on investigating the stability of a Markov chain associated to the algorithm. Our stability study is crucially based on recent developments connecting the stability of deterministic control models to the stability of associated Markov chains. We investigate convergence on composites of strictly increasing functions with continuously differentiable scaling-invariant functions with a global optimum. This function class includes functions with non-convex sublevel sets and discontinuous functions. We prove the existence of a constant r such that the logarithm of the distance to the optimum divided by the number of iterations of step-size adaptive ES with weighted recombination converges to r. The constant is given as an expectation with respect to the stationary distribution of a Markov chain; its sign allows us to infer linear convergence or divergence of the ES and is found numerically. Our main condition for convergence is the increase of the expected log step-size on linear functions. In contrast to previous results, our condition is equivalent to the almost sure geometric divergence of the step-size.
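The main convergence condition, the increase of the expected log step-size on linear functions, can be observed in a toy simulation. This is a hedged sketch using a plain (1+1)-ES with 1/5-success-rule constants chosen for illustration, not the weighted-recombination ES analyzed in the preprint: on a linear function an offspring improves with probability 1/2, so the log step-size drifts upward and the step-size diverges geometrically.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy check of the condition: on a linear function, a (1+1)-ES with a
# 1/5-success-rule step-size adaptation succeeds with probability 1/2,
# so log(sigma) increases on average: geometric divergence of the step-size.
linear = lambda x: float(x[0])
x, log_sigma = np.zeros(5), 0.0
for _ in range(1500):
    y = x + np.exp(log_sigma) * rng.normal(size=5)
    if linear(y) <= linear(x):        # success, probability 1/2 here
        x = y
        log_sigma += 0.8              # enlarge step on success
    else:
        log_sigma -= 0.2              # shrink step on failure
drift = log_sigma / 1500.0            # expected drift: 0.5*0.8 - 0.5*0.2 = 0.3
```

The update constants 0.8 and -0.2 are illustrative assumptions; any pair with positive expected drift at success rate 1/2 exhibits the same divergence.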
... A nontrivial linear function is a continuous scaling-invariant function with Lebesgue negligible level sets. Also, f still has Lebesgue negligible level sets in the case where it is a C¹ scaling-invariant function with a unique global argmin [66, Proposition 4.2]. ...
Full-text available
Article
Evolution Strategies (ESs) are stochastic derivative-free optimization algorithms whose most prominent representative, the CMA-ES algorithm, is widely used to solve difficult numerical optimization problems. We provide the first rigorous investigation of the linear convergence of step-size adaptive ESs involving a population and recombination, two ingredients crucially important in practice to be robust to local irregularities or multimodality. We investigate the convergence of step-size adaptive ESs with weighted recombination on composites of strictly increasing functions with continuously differentiable scaling-invariant functions with a global optimum. This function class includes functions with non-convex sublevel sets and discontinuous functions. We prove the existence of a constant r such that the logarithm of the distance to the optimum divided by the number of iterations converges to r. The constant is given as an expectation with respect to the stationary distribution of a Markov chain; its sign allows us to infer linear convergence or divergence of the ES and is found numerically. Our main condition for convergence is the increase of the expected log step-size on linear functions. In contrast to previous results, our condition is equivalent to the almost sure geometric divergence of the step-size on linear functions.
Full-text available
Article
We say that a positively homogeneous function admits a saddle representation by linear functions iff it admits both an inf-sup-representation and a sup-inf-representation with the same two-index family of linear functions. In the paper we show that each continuous positively homogeneous function can be associated with a two-index family of linear functions which provides its saddle representation. We also establish characteristic properties of those two-index families of linear functions which provide saddle representations of functions belonging to the subspace of Lipschitz continuous positively homogeneous functions, as well as the subspaces of difference sublinear and piecewise linear functions.
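As a minimal numerical illustration (a plain sup-representation only, not the two-index saddle representation studied in the paper): the ℓ1-norm, a sublinear and hence positively homogeneous function, is the pointwise supremum of the linear functions x ↦ ⟨s, x⟩ over sign vectors s ∈ {−1, 1}ⁿ.

```python
import numpy as np
from itertools import product

# The l1-norm equals the pointwise supremum of the 2^n linear functions
# x -> <s, x> indexed by sign vectors s in {-1, 1}^n.
def l1_via_sup(x):
    return max(np.dot(s, x) for s in product([-1.0, 1.0], repeat=len(x)))

x = np.array([2.0, -3.0, 0.5])
assert abs(l1_via_sup(x) - np.abs(x).sum()) < 1e-12
```

For each x, the supremum is attained at s = sign(x), which is why the two sides agree exactly.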
Full-text available
Article
The paper deals with positively homogeneous functions defined on a finite-dimensional space. Our attention is mainly focused on those subspaces of positively homogeneous functions that are important in nonsmooth analysis and optimization: the subspace of continuous positively homogeneous functions, of Lipschitz continuous positively homogeneous functions, of difference sublinear functions, and of piecewise linear functions. We reproduce some known results and present a number of new ones, in particular, those that concern Lipschitz continuous positively homogeneous functions.
Conference Paper
We prove the linear convergence of the (1+1)-Evolution Strategy (ES) with a success-based step-size adaptation on a broad class of functions, including strongly convex functions with Lipschitz continuous gradients, a class often assumed in the analysis of gradient-based methods. Our proof is based on the methodology recently developed to analyze the same algorithm on the spherical function, namely the additive drift analysis on an unbounded continuous domain. An upper bound on the expected first hitting time is derived, from which we can conclude that our algorithm converges linearly. We investigate the class of functions that satisfy the assumptions of our main theorem, revealing that strongly convex functions with Lipschitz continuous gradients and their strictly increasing transformations satisfy the assumptions. To the best of our knowledge, this is the first paper showing the linear convergence of the (1+1)-ES on such a broad class of functions. This opens the possibility to compare the (1+1)-ES and gradient-based methods in theory.
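A minimal sketch of the algorithm class discussed above, with illustrative 1/5-success-rule constants that are assumptions rather than the paper's exact setting; on a strongly convex quadratic, the logarithm of the distance to the optimum decreases roughly linearly in the iteration count.

```python
import numpy as np

rng = np.random.default_rng(1)

def one_plus_one_es(f, x0, iters=2000):
    """(1+1)-ES with a success-based (1/5-rule style) step-size adaptation."""
    x = np.array(x0, dtype=float)
    fx, log_sigma = f(x), 0.0
    for _ in range(iters):
        y = x + np.exp(log_sigma) * rng.normal(size=x.size)
        fy = f(y)
        if fy <= fx:              # success: accept and enlarge the step
            x, fx = y, fy
            log_sigma += 0.8
        else:                     # failure: shrink the step
            log_sigma -= 0.2
    return x

# a strongly convex quadratic with Lipschitz continuous gradient
quad = lambda x: float(x @ np.diag([1.0, 4.0, 9.0]) @ x)
x_end = one_plus_one_es(quad, [3.0, -2.0, 1.0])
```

With these constants the step-size equilibrates at a success rate of roughly 1/5, which is enough to observe linear convergence empirically on this function.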
Article
We consider Markov chains that obey the following general nonlinear state-space model: Φ_{k+1} = F(Φ_k, α(Φ_k, U_{k+1})), where the function F is C¹ while α is typically discontinuous, and {U_k : k ∈ ℤ_{>0}} is an independent and identically distributed process. We assume that for all x, the random variable α(x, U_1) admits a density p_x such that (x, w) ↦ p_x(w) is lower semi-continuous. We generalize and extend previous results that connect properties of the underlying deterministic control model to provide conditions for the chain to be ϕ-irreducible and aperiodic. By building on those results, we show that if a rank condition on the controllability matrix is satisfied for all x, there is equivalence between the existence of a globally attracting state for the control model and ϕ-irreducibility of the Markov chain. Additionally, under the same rank condition on the controllability matrix, we prove that there is equivalence between the existence of a steadily attracting state and the ϕ-irreducibility and aperiodicity of the chain. The notion of steadily attracting state is new. We additionally derive practical conditions by showing that the rank condition on the controllability matrix needs to be verified only at a globally attracting state (resp. steadily attracting state) for the chain to be a ϕ-irreducible T-chain (resp. ϕ-irreducible aperiodic T-chain). Those results hold under considerably weaker assumptions on the model than previous ones that would require (x, u) ↦ F(x, α(x, u)) to be C∞ (while it can be discontinuous here). Additionally, the establishment of a necessary and sufficient condition on the control model for ϕ-irreducibility and aperiodicity without a structural assumption on the control set is novel, even for Markov chains where (x, u) ↦ F(x, α(x, u)) is C∞.
We illustrate that the conditions are easy to verify on a non-trivial and non-artificial example of Markov chain arising in the context of adaptive stochastic search algorithms to optimize continuous functions in a black-box scenario.
Article
A nonlinear duality operation is defined for the class of increasing positively homogeneous functions defined on the positive orthant (including zero). This class of function and the associated class of normal sets are used extensively in Mathematical Economics. Various examples are provided along with a discussion of duality for a class of optimization problems involving increasing functions and normal sets.
Book
Marek Kuczma was born in 1935 in Katowice, Poland, and died there in 1991. After finishing high school in his home town, he studied at the Jagiellonian University in Kraków. He defended his doctoral dissertation under the supervision of Stanisław Gołąb. In the year of his habilitation, in 1963, he obtained a position at the Katowice branch of the Jagiellonian University (now University of Silesia, Katowice), and worked there till his death. Besides his several administrative positions and his outstanding teaching activity, he accomplished excellent and rich scientific work, publishing three monographs and 180 scientific papers. He is considered to be the founder of the celebrated Polish school of functional equations and inequalities. "The second half of the title of this book describes its contents adequately. Probably even the most devoted specialist would not have thought that about 300 pages can be written just about the Cauchy equation (and on some closely related equations and inequalities). And the book is by no means chatty, and does not even claim completeness. Part I lists the required preliminary knowledge in set and measure theory, topology and algebra. Part II gives details on solutions of the Cauchy equation and of the Jensen inequality [...], in particular on continuous convex functions, Hamel bases, on inequalities following from the Jensen inequality [...]. Part III deals with related equations and inequalities (in particular, Pexider, Hosszú, and conditional equations, derivations, convex functions of higher order, subadditive functions and stability theorems). It concludes with an excursion into the field of extensions of homomorphisms in general." (János Aczél, Mathematical Reviews) "This book is a real holiday for all mathematicians, independently of their strict speciality. One can imagine what a delicacy this book represents for functional equationists." (B. Crstici, Zentralblatt für Mathematik).
Article
In this paper, we consider comparison-based adaptive stochastic algorithms for solving numerical optimisation problems. We consider a specific subclass of algorithms called comparison-based step-size adaptive randomized search (CB-SARS), where the state variables at a given iteration are a vector of the search space and a positive parameter, the step-size, typically controlling the overall standard deviation of the underlying search distribution. We investigate the linear convergence of CB-SARS on scaling-invariant objective functions. Scaling-invariant functions preserve the ordering of points with respect to their function value when the points are scaled with the same positive parameter (the scaling is done w.r.t. a fixed reference point). This class of functions includes norms composed with strictly increasing functions as well as non-quasi-convex and non-continuous functions. On scaling-invariant functions, we show the existence of a homogeneous Markov chain, as a consequence of natural invariance properties of CB-SARS (essentially scale-invariance and invariance to strictly increasing transformations of the objective function). We then derive sufficient conditions for asymptotic global linear convergence of CB-SARS, expressed in terms of different stability conditions of the normalised homogeneous Markov chain (irreducibility, positivity, Harris recurrence, geometric ergodicity), and thus define a general methodology for proving global linear convergence of CB-SARS algorithms on scaling-invariant functions.
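The normalised homogeneous Markov chain mentioned above can be observed numerically. The sketch below is a hypothetical instance (a (1+1)-ES, one of the simplest step-size adaptive randomized searches, with illustrative constants) on the sphere function, which is scaling-invariant with respect to zero: while x_k and the step-size both converge to zero geometrically, the normalized state z_k = x_k/σ_k fluctuates within a stable range.

```python
import numpy as np

rng = np.random.default_rng(3)

# Sketch: (1+1)-ES on the sphere function, scaling-invariant w.r.t. 0.
# The normalized state z_k = x_k / sigma_k forms a homogeneous Markov chain
# whose stability underlies linear convergence of the algorithm.
f = lambda x: float(np.linalg.norm(x))
x, log_sigma = np.array([10.0, -7.0]), 0.0
z_norms = []
for _ in range(3000):
    y = x + np.exp(log_sigma) * rng.normal(size=2)
    if f(y) <= f(x):
        x = y
        log_sigma += 0.8          # illustrative success-rule constants
    else:
        log_sigma -= 0.2
    z_norms.append(f(x) / np.exp(log_sigma))
# x_k and sigma_k both vanish geometrically, yet ||z_k|| stays bounded away
# from 0 and infinity, reflecting the stability of the normalized chain.
```

Tracking log σ instead of σ avoids floating-point underflow as the step-size shrinks over thousands of iterations.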