SIAM J. Optim., Vol. 25, No. 3, pp. 1388–1410. © 2015 Society for Industrial and Applied Mathematics.
SMOOTHING SQP METHODS FOR SOLVING DEGENERATE NONSMOOTH CONSTRAINED OPTIMIZATION PROBLEMS WITH APPLICATIONS TO BILEVEL PROGRAMS

MENGWEI XU, JANE J. YE, AND LIWEI ZHANG
Abstract. We consider a degenerate nonsmooth and nonconvex optimization problem for which the standard constraint qualification such as the generalized Mangasarian–Fromovitz constraint qualification (GMFCQ) may not hold. We use smoothing functions with the gradient consistency property to approximate the nonsmooth functions and introduce a smoothing sequential quadratic programming (SQP) algorithm under the l∞ penalty framework. We show that any accumulation point of a selected subsequence of the iteration sequence generated by the smoothing SQP algorithm is a Clarke stationary point, provided that the sequence of multipliers and the sequence of penalty parameters are bounded. Furthermore, we propose a new condition called the weakly generalized Mangasarian–Fromovitz constraint qualification (WGMFCQ) that is weaker than the GMFCQ. We show that the extended version of the WGMFCQ guarantees the boundedness of the sequence of multipliers and the sequence of penalty parameters and thus guarantees the global convergence of the smoothing SQP algorithm. We demonstrate that the WGMFCQ can be satisfied by bilevel programs for which the GMFCQ never holds. Preliminary numerical experiments show that the algorithm is efficient for solving degenerate nonsmooth optimization problems such as the simple bilevel program.
Key words. nonsmooth optimization, constrained optimization, smoothing function, sequential quadratic programming algorithm, bilevel program, constraint qualification
AMS subject classifications. 65K10, 90C26, 90C30
DOI. 10.1137/140971580
1. Introduction. In this paper, we consider the constrained optimization problem of the form

(P)    min  f(x)
       s.t. g_i(x) ≤ 0,  i = 1,...,p,
            h_j(x) = 0,  j = p+1,...,q,

where the objective function and the constraint functions f, g_i (i = 1,...,p), h_j (j = p+1,...,q) : R^n → R are locally Lipschitz. In particular, our focus is on solving a degenerate problem for which the generalized Mangasarian–Fromovitz constraint qualification (GMFCQ) may not hold at a stationary point.
The sequential quadratic programming (SQP) method is one of the most effective methods for solving smooth constrained optimization problems. For the current iteration point x_k, the basic idea of the SQP method is to generate a descent direction
Received by the editors June 4, 2014; accepted for publication (in revised form) May 8, 2015; published electronically July 14, 2015. http://www.siam.org/journals/siopt/25-3/97158.html
Department of Mathematics, School of Science, Tianjin University, Tianjin 300072, China (xumengw@hotmail.com).
Department of Mathematics and Statistics, University of Victoria, Victoria, BC V8W 2Y2, Canada (janeye@uvic.ca). The research of this author was partially supported by NSERC.
School of Mathematical Sciences, Dalian University of Technology, Dalian 116024, China (lwzhang@dlut.edu.cn). The research of this author was supported by the National Natural Science Foundation of China under project 91330206.
d_k by solving the following quadratic programming subproblem:

    min_d   ∇f(x_k)^T d + (1/2) d^T W_k d
    s.t.    g_i(x_k) + ∇g_i(x_k)^T d ≤ 0,  i = 1,...,p,
            h_j(x_k) + ∇h_j(x_k)^T d = 0,  j = p+1,...,q,

where ∇f(x) denotes the gradient of the function f at x and W_k is a symmetric positive definite matrix that approximates the Hessian matrix of the Lagrangian function. Then d_k is used to generate the next iterate x_{k+1} := x_k + α_k d_k, where the stepsize α_k is chosen to yield a sufficient decrease of a suitable merit function. The SQP algorithm with α_k = 1 was first studied by Wilson in [44], where the exact Hessian matrix of the Lagrangian function was used as W_k. Garcia-Palomares and Mangasarian [19] proposed to use an estimate to approximate the Hessian matrix. Han [21] proposed to update the matrix W_k by the Broyden–Fletcher–Goldfarb–Shanno (BFGS) formula. With the stepsize α_k = 1, the convergence is only local. To obtain global convergence, Han [22] proposed to use the classical l1 penalty function as a merit function to determine the stepsize. Since the l1 penalty function is not differentiable, the authors of [36] suggested using the augmented Lagrangian function, which is smooth, as a merit function. The possible inconsistency of the system of linearized constraints is a serious limitation of the SQP method, and several techniques have been introduced to deal with it. For example, Pantoja and Mayne [35] proposed to replace the standard SQP subproblem by the following
penalized SQP subproblem:

    min_{d,ξ}  ∇f(x_k)^T d + (1/2) d^T W_k d + r_k ξ
    s.t.       g_i(x_k) + ∇g_i(x_k)^T d ≤ ξ,   i = 1,...,p,
               −ξ ≤ h_j(x_k) + ∇h_j(x_k)^T d ≤ ξ,   j = p+1,...,q,
               ξ ≥ 0,
where the penalty parameter r_k > 0. Unlike the standard SQP subproblem, which may not have feasible solutions, the penalized SQP subproblem is always feasible. Other methods for dealing with the inconsistency of the SQP subproblem have also been presented [3, 17, 20, 29, 40, 41, 50]. For nonlinear programs with simple bound constraints on some of the variables, Heinkenschloss [23] proposed a projected SQP method which combines the ideas of projected Newton methods and the SQP method.
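The always-feasible slack mechanism above can be checked directly: for any data, the point (d, ξ) = (0, ξ0) with ξ0 = max{0, g_i(x_k), |h_j(x_k)|} satisfies every constraint of the penalized subproblem. A minimal numerical sketch (the constraint values and gradients below are hypothetical illustration data, not from the paper):

```python
import numpy as np

# Two linearized inequality constraints whose linearizations are inconsistent:
#   g_1(x_k) + v_1^T d <= 0  and  g_2(x_k) + v_2^T d <= 0  with v_2 = -v_1,
# so the standard SQP subproblem demands d <= -1 and d >= 1: no feasible d.
g = np.array([1.0, 1.0])            # g_i(x_k)
V = np.array([[1.0], [-1.0]])       # rows are the gradients of g_i at x_k

# The Pantoja-Mayne subproblem relaxes every constraint by a slack xi >= 0.
# The point (d, xi) = (0, xi0) with xi0 = max(0, g_i(x_k)) is always feasible:
d = np.zeros(1)
xi0 = max(0.0, *(g + V @ d))
assert all(g + V @ d <= xi0) and xi0 >= 0
print("feasible slack xi0 =", xi0)   # xi0 = 1.0 here
```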
Recently, Curtis and Overton [12] pointed out that applying SQP methods directly to a general nonsmooth and nonconvex constrained optimization problem will fail in theory and in practice. They employed a gradient sampling (GS) process to make the search direction effective in nonsmooth regions and proved that the iteration points generated by the SQP-GS method converge globally to a stationary point of the penalty function with probability one. A smoothing method is a well-recognized technique for the numerical solution of a nonsmooth optimization problem. Using a smoothing method, one replaces the nonsmooth function by a suitable smooth approximation, solves a sequence of smooth problems, and drives the approximation closer and closer to the original problem. The fundamental question is as follows: What property should a family of smoothing functions have in order for the stationary points of the smoothing problems to approach a stationary point of
the original problem? In most of the literature, a particular smoothing function is employed for the particular problem studied. It turns out that not all smooth approximations of the nonsmooth function can be used in the smoothing technique to obtain the desired result; an example for which the smoothing method fails to converge with almost all initial points was given by Kummer [26]. Zhang and Chen [49] (see also the recent survey on the subject by Chen [8]) identified the desired property as the gradient consistency property. Zhang and Chen [49] proposed a smoothing projected gradient algorithm for solving optimization problems with a convex set constraint by using a family of smoothing functions with the gradient consistency property to approximate the nonsmooth objective function. They proved that any accumulation point of the iteration sequence is a Clarke stationary point of the original nonsmooth optimization problem. Recently, [27, 45] extended the result of [49] to a class of nonsmooth constrained optimization problems using the projected gradient method and the augmented Lagrangian method, respectively. Smoothing functions were proposed and the SQP method was applied to the smoothed problem in [18, 25] to solve mathematical programs with complementarity constraints (MPCC) and in [28, 42] to solve semi-infinite programming (SIP) problems. In this paper we combine the SQP method and the smoothing technique to design a smoothing SQP method for a class of general constrained optimization problems with smoothing functions satisfying the gradient consistency property.
For the SQP method under a penalty framework to converge globally, the set of multipliers is usually required to be bounded (see, e.g., [2]). This amounts to requiring the MFCQ to hold. For nonsmooth optimization problems, the corresponding MFCQ is referred to as the GMFCQ. Unfortunately, the GMFCQ is quite strong for certain classes of problems. For example, it is well known by now that the GMFCQ never holds for the bilevel program [46]. Another example of a nonsmooth optimization problem which does not satisfy the GMFCQ is a reformulation of an SIP [28]. In this paper we propose a new constraint qualification that is much weaker than the GMFCQ. We call it the weakly generalized Mangasarian–Fromovitz constraint qualification (WGMFCQ). The WGMFCQ is not a constraint qualification in the classical sense: it is defined in terms of the smoothing functions and the sequence of iteration points generated by the smoothing algorithm. In our numerical experiments, the WGMFCQ is very easy to satisfy for bilevel programs.
In our setting, both the objective function and the constraint functions may be nonsmooth. We first approximate the nonsmooth functions by smoothing functions and then consider the robust subproblem formulation proposed by Pantoja and Mayne. Under the extended WGMFCQ (EWGMFCQ), global convergence can be obtained.
The rest of the paper is organized as follows. In section 2, we present preliminaries which will be used in this paper and introduce the new constraint qualification WGMFCQ. In section 3, we consider smoothing approximations of the original problem, propose the smoothing SQP method under an l∞ penalty framework, and establish the global convergence of the algorithm. In section 4, we apply the smoothing SQP method to bilevel programs. The final section contains some concluding remarks.
We adopt the following standard notation in this paper. For any two vectors a and b in R^n, we denote their inner product by a^T b. Given a function G : R^n → R^m, we denote its Jacobian by ∇G(z) ∈ R^{m×n} and, if m = 1, the gradient ∇G(z) ∈ R^n is considered as a column vector. For a set Ω ⊆ R^n, we denote the interior, the relative interior, the closure, the convex hull, and the distance from x to Ω by int Ω, ri Ω, cl Ω, co Ω, and dist(x, Ω), respectively. For a matrix A ∈ R^{n×m}, A^T denotes its transpose. In addition, we let N be the set of nonnegative integers and exp[z] be the exponential function.
2. Preliminaries and the new constraint qualifications. In this section, we first present some background materials and results which will be used later. We then discuss the issue of constraint qualifications.
Let ϕ : R^n → R be Lipschitz continuous near x̄. The directional derivative of ϕ at x̄ in direction d is defined by

    ϕ′(x̄; d) := lim_{t↓0} [ϕ(x̄ + td) − ϕ(x̄)] / t.

The Clarke generalized directional derivative of ϕ at x̄ in direction d is defined by

    ϕ°(x̄; d) := limsup_{x→x̄, t↓0} [ϕ(x + td) − ϕ(x)] / t.

The Clarke generalized gradient of ϕ at x̄ is the convex and compact subset of R^n defined by

    ∂ϕ(x̄) := {ξ ∈ R^n : ξ^T d ≤ ϕ°(x̄; d) for all d ∈ R^n}.

Note that when ϕ is convex, the Clarke generalized gradient coincides with the subdifferential in the sense of convex analysis, i.e.,

    ∂ϕ(x̄) = {ξ ∈ R^n : ξ^T (x − x̄) ≤ ϕ(x) − ϕ(x̄) for all x ∈ R^n},

and, when ϕ is continuously differentiable at x̄, we have ∂ϕ(x̄) = {∇ϕ(x̄)}. Detailed discussions of the Clarke generalized gradient and its properties can be found in [10, 11].
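The difference between the ordinary and the Clarke generalized directional derivative can be seen numerically on the classical example ϕ(x) = −|x| at x̄ = 0: in direction d = 1 the ordinary derivative is −1, while taking the limsup over nearby base points x → 0 yields ϕ°(0; 1) = 1. A small sketch of ours (the grids and tolerances are arbitrary choices):

```python
import numpy as np

phi = lambda x: -abs(x)

# Ordinary directional derivative at 0 in direction d = 1:
# the quotient (phi(0 + t) - phi(0))/t equals -1 for every t > 0.
t = 1e-8
assert abs((phi(t) - phi(0)) / t + 1.0) < 1e-6

# Clarke generalized directional derivative: limsup over base points x -> 0
# and t -> 0 of (phi(x + t) - phi(x))/t.  Taking x = -t gives the quotient
# (phi(0) - phi(-t))/t = +1, so the limsup is +1, not -1.
best = -np.inf
for t in [10.0**-k for k in range(2, 7)]:
    for x in np.linspace(-2*t, 2*t, 81):
        best = max(best, (phi(x + t) - phi(x)) / t)
print(best)  # close to 1.0
assert abs(best - 1.0) < 1e-6
```

This matches the definition of ∂ϕ(0): for ϕ(x) = −|x| one has ∂ϕ(0) = [−1, 1], and ϕ°(0; 1) = max{ξ · 1 : ξ ∈ [−1, 1]} = 1.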
For x̄, a feasible solution of problem (P), we denote by I(x̄) := {i = 1,...,p : g_i(x̄) = 0} the active set at x̄. The following nonsmooth Fritz John–type multiplier rule holds by Clarke [10, Theorem 6.1.1] and the nonsmooth calculus (see, e.g., [10]).
Theorem 2.1 (Fritz John multiplier rule). Let x̄ be a local optimal solution of problem (P). Then there exist r ≥ 0, λ_i ≥ 0 (i ∈ I(x̄)), λ_j ∈ R (j = p+1,...,q), not all zero, such that

(2.1)    0 ∈ r ∂f(x̄) + Σ_{i∈I(x̄)} λ_i ∂g_i(x̄) + Σ_{j=p+1}^{q} λ_j ∂h_j(x̄).
There are two possible cases in the Fritz John multiplier rule: r > 0 or r = 0. Let x̄ be a feasible solution of problem (P). If the Fritz John condition (2.1) holds with r > 0, then we call x̄ a (Clarke) stationary point of (P). According to Clarke [10], any multiplier λ ∈ R^q with λ_i ≥ 0, i = 1,...,p, satisfying the Fritz John condition (2.1) with r = 0 is an abnormal multiplier. From the Fritz John multiplier rule, it is easy to see that if there is no nonzero abnormal multiplier, then any local optimal solution x̄ must be a stationary point. Hence it is natural to define the following constraint qualification.
Definition 2.1 (NNAMCQ). We say that the no nonzero abnormal multiplier constraint qualification (NNAMCQ) holds at a feasible point x̄ of problem (P) if

    0 ∈ Σ_{i∈I(x̄)} λ_i ∂g_i(x̄) + Σ_{j=p+1}^{q} λ_j ∂h_j(x̄)  and  λ_i ≥ 0, i ∈ I(x̄),
    ⟹ λ_i = 0, λ_j = 0,  i ∈ I(x̄), j = p+1,...,q.
It is easy to see that the NNAMCQ amounts to saying that any collection of vectors

    {v_i, i ∈ I(x̄), v_{p+1},...,v_q},

where v_i ∈ ∂g_i(x̄) (i ∈ I(x̄)), v_j ∈ ∂h_j(x̄) (j = p+1,...,q), is positively linearly independent. The NNAMCQ is equivalent to the generalized MFCQ, which was first introduced by Hiriart-Urruty [24].
Definition 2.2 (GMFCQ). A feasible point x̄ is said to satisfy the generalized Mangasarian–Fromovitz constraint qualification (GMFCQ) for problem (P) if for any given collection of vectors {v_i, i ∈ I(x̄), v_{p+1},...,v_q}, where v_i ∈ ∂g_i(x̄) (i ∈ I(x̄)), v_j ∈ ∂h_j(x̄) (j = p+1,...,q), the following two conditions hold:
(i) v_{p+1},...,v_q are linearly independent;
(ii) there exists a direction d such that

    v_i^T d < 0,  i ∈ I(x̄),
    v_j^T d = 0,  j = p+1,...,q.
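Condition (ii) is positively homogeneous in d, so for a fixed collection of vectors it can be tested with a linear program: ask for d with v_i^T d ≤ −1 and v_j^T d = 0. The sketch below is our own illustration (the helper `mfcq_direction` and the sample gradients are assumptions, not from the paper):

```python
import numpy as np
from scipy.optimize import linprog

def mfcq_direction(V_ineq, V_eq=None):
    """Look for d with v_i^T d <= -1 (hence < 0) and v_j^T d = 0.
    By positive homogeneity, such d exists iff condition (ii) holds."""
    n = V_ineq.shape[1]
    res = linprog(c=np.zeros(n),
                  A_ub=V_ineq, b_ub=-np.ones(len(V_ineq)),
                  A_eq=V_eq,
                  b_eq=None if V_eq is None else np.zeros(len(V_eq)),
                  bounds=[(None, None)] * n)
    return res.x if res.status == 0 else None

# Two active-inequality gradients admitting a common descent direction:
d = mfcq_direction(np.array([[1.0, 0.0], [0.0, 1.0]]))
assert d is not None and all(np.array([[1.0, 0.0], [0.0, 1.0]]) @ d < 0)

# Opposite gradients (as for g_1 = x, g_2 = -x at 0): no such d exists,
# which is exactly how the (G)MFCQ fails.
assert mfcq_direction(np.array([[1.0], [-1.0]])) is None
```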
In order to accommodate infeasible accumulation points in the numerical algorithm, we now extend the NNAMCQ and the GMFCQ to allow infeasible points. Note that when x̄ is feasible, the ENNAMCQ and the EGMFCQ (see Definitions 2.3 and 2.4) reduce to the NNAMCQ and the GMFCQ, respectively.
Definition 2.3 (ENNAMCQ). We say that the extended no nonzero abnormal multiplier constraint qualification (ENNAMCQ) holds at x̄ ∈ R^n if

    0 ∈ Σ_{i=1}^{p} λ_i ∂g_i(x̄) + Σ_{j=p+1}^{q} λ_j ∂h_j(x̄)  and  λ_i ≥ 0, i = 1,...,p,
    Σ_{i=1}^{p} λ_i g_i(x̄) + Σ_{j=p+1}^{q} λ_j h_j(x̄) ≥ 0

imply that λ_i = 0, λ_j = 0 for all i = 1,...,p, j = p+1,...,q.
Definition 2.4 (EGMFCQ). A point x̄ ∈ R^n is said to satisfy the extended generalized Mangasarian–Fromovitz constraint qualification (EGMFCQ) for problem (P) if for any given collection of vectors {v_i, v_j : i = 1,...,p, j = p+1,...,q}, where v_i ∈ ∂g_i(x̄), v_j ∈ ∂h_j(x̄), the following two conditions hold:
(i) v_{p+1},...,v_q are linearly independent;
(ii) there exists a direction d such that

    g_i(x̄) + v_i^T d < 0,  i = 1,...,p,
    h_j(x̄) + v_j^T d = 0,  j = p+1,...,q.

Note that under the extra assumption that the functions g_i are directionally differentiable, the EGMFCQ coincides with conditions (B4) and (B5) in [25].
Since the set of the Clarke generalized gradient can be large, the ENNAMCQ and the EGMFCQ may be too strong for some problems to hold. In what follows, we propose two conditions that are much weaker than the ENNAMCQ and the EGMFCQ, respectively. For this purpose, we first recall the definition of smoothing functions.
Definition 2.5. Let g : R^n → R be a locally Lipschitz function. Assume that, for a given ρ > 0, g_ρ : R^n → R is a continuously differentiable function. We say that {g_ρ : ρ > 0} is a family of smoothing functions of g if lim_{z→x, ρ↑∞} g_ρ(z) = g(x) for any fixed x ∈ R^n.
Such a sequence g_ρ(·) converges continuously to g(·) as defined in [38].
Definition 2.6 (see [4, 9]). Let g : R^n → R be a locally Lipschitz continuous function. We say that a family of smoothing functions {g_ρ : ρ > 0} of g satisfies the gradient consistency property if limsup_{z→x, ρ↑∞} ∇g_ρ(z) is nonempty and limsup_{z→x, ρ↑∞} ∇g_ρ(z) ⊆ ∂g(x) for any x ∈ R^n, where limsup_{z→x, ρ↑∞} ∇g_ρ(z) denotes the set of all limiting points

    limsup_{z→x, ρ↑∞} ∇g_ρ(z) := { lim_{k→∞} ∇g_{ρ_k}(z_k) : z_k → x, ρ_k ↑ ∞ }.

Note that according to [38, Theorem 9.61 and Corollary 8.47(b)], for a locally Lipschitz function g and its smoothing family {g_ρ : ρ > 0}, one always has the inclusion

    ∂g(x) ⊆ co limsup_{z→x, ρ↑∞} ∇g_ρ(z).

Thus our definition of gradient consistency is equivalent to saying that

    ∂g(x) = co limsup_{z→x, ρ↑∞} ∇g_ρ(z),

which is the definition used in [5, 8].
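A standard illustration, included here as our own assumption rather than taken from the paper: for g(x) = |x| the family g_ρ(x) = √(x² + ρ⁻²) is a family of smoothing functions, and gradients along different sequences z_k → 0 converge to every point of (−1, 1), with the extremes ±1 attained along slower sequences, so co limsup ∇g_ρ(z) = [−1, 1] = ∂g(0):

```python
import numpy as np

# Smoothing family for g(x) = |x| (an assumption of this sketch):
#   g_rho(x) = sqrt(x^2 + rho^-2),  grad g_rho(x) = x / sqrt(x^2 + rho^-2).
grad = lambda z, rho: z / np.sqrt(z**2 + rho**-2)

rhos = 10.0 ** np.arange(1, 9)

# Sequences z_k = t / rho_k -> 0: the gradients converge to t / sqrt(t^2 + 1),
# which sweeps out (-1, 1) as t ranges over the reals.
for t in (-3.0, -1.0, 0.0, 1.0, 3.0):
    lim = grad(t / rhos[-1], rhos[-1])
    assert abs(lim - t / np.sqrt(t**2 + 1)) < 1e-9

# A slower sequence z_k = 1/sqrt(rho_k) gives the extreme limit +1.
assert abs(grad(1 / np.sqrt(rhos[-1]), rhos[-1]) - 1.0) < 1e-3
# Every limit lies in [-1, 1], the Clarke generalized gradient of |x| at 0,
# so gradient consistency holds at x = 0 for this family.
```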
It is natural to ask whether one can always find a family of smoothing functions with the gradient consistency property for a locally Lipschitz function. The answer is yes. Rockafellar and Wets [38, Example 7.19 and Theorem 9.67] show that for any locally Lipschitz function g, one can construct a family of smoothing functions of g with the gradient consistency property by the integral convolution

    g_ρ(x) := ∫_{R^n} g(x − y) φ_ρ(y) dy = ∫_{R^n} g(y) φ_ρ(x − y) dy,

where φ_ρ : R^n → R_+ is a sequence of bounded, measurable functions with ∫_{R^n} φ_ρ(x) dx = 1 such that the sets B_ρ = {x : φ_ρ(x) > 0} form a bounded sequence converging to {0} as ρ ↑ ∞. Although one can always generate a family of smoothing functions with the gradient consistency property by integral convolution with bounded supports, there are many other smoothing functions which are not generated by the integral convolution with bounded supports [5, 6, 7, 8, 32].
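For g(x) = |x| in one dimension, taking φ_ρ to be the uniform density (ρ/2)·1_{[−1/ρ, 1/ρ]} gives the closed form g_ρ(x) = ρx²/2 + 1/(2ρ) for |x| ≤ 1/ρ and g_ρ(x) = |x| otherwise. The sketch below checks this against a midpoint-rule approximation of the convolution (the kernel choice, grid size, and tolerances are our assumptions):

```python
import numpy as np

def g_rho_numeric(x, rho, n=200000):
    """Midpoint-rule approximation of the convolution of |.| with the
    uniform density phi_rho = (rho/2) on the support [-1/rho, 1/rho]."""
    h = (2.0 / rho) / n
    y = -1.0 / rho + (np.arange(n) + 0.5) * h   # midpoints of the cells
    return float(np.sum(np.abs(x - y) * (rho / 2.0) * h))

rho = 4.0
# Inside the support, the smoothed function is the quadratic rho*x^2/2 + 1/(2*rho).
for x in (0.0, 0.1, 0.2):
    assert abs(g_rho_numeric(x, rho) - (rho * x**2 / 2 + 1 / (2 * rho))) < 1e-6
# Outside the support the kernel no longer sees the kink and g_rho(x) = |x|.
assert abs(g_rho_numeric(2.0, rho) - 2.0) < 1e-9
```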
Using the smoothing technique, we approximate the locally Lipschitz functions f(x), g_i(x), i = 1,...,p, and h_j(x), j = p+1,...,q, by families of smoothing functions {f_ρ(x) : ρ > 0}, {g^i_ρ(x) : ρ > 0}, i = 1,...,p, and {h^j_ρ(x) : ρ > 0}, j = p+1,...,q. We also assume that these families of smoothing functions satisfy the gradient consistency property. We use certain algorithms to solve the smoothed problem and drive the smoothing parameter ρ to infinity. Based on the sequence of iteration points of the algorithm, we now define the new conditions.
Definition 2.7 (WNNAMCQ). Let {x_k} be a sequence of iteration points for problem (P), and let ρ_k ↑ ∞ as k → ∞. Suppose that x̄ is a feasible accumulation point of the sequence {x_k}. We say that the weakly no nonzero abnormal multiplier constraint qualification (WNNAMCQ) based on the smoothing functions {g^i_ρ(x) : ρ > 0}, i = 1,...,p, {h^j_ρ(x) : ρ > 0}, j = p+1,...,q, holds at x̄, provided that for any K_0 ⊆ K ⊆ N such that lim_{k→∞, k∈K} x_k = x̄ and any collection of vectors {v_i (i ∈ I(x̄)), v_j (j = p+1,...,q)}, where

    v_i = lim_{k→∞, k∈K_0} ∇g^i_{ρ_k}(x_k), i ∈ I(x̄),    v_j = lim_{k→∞, k∈K_0} ∇h^j_{ρ_k}(x_k), j = p+1,...,q,

one has

    0 = Σ_{i∈I(x̄)} λ_i v_i + Σ_{j=p+1}^{q} λ_j v_j  and  λ_i ≥ 0, i ∈ I(x̄)
    ⟹ λ_i = 0, λ_j = 0,  i ∈ I(x̄), j = p+1,...,q.
Definition 2.8 (WGMFCQ). Let {x_k} be a sequence of iteration points for problem (P), and let ρ_k ↑ ∞ as k → ∞. Let x̄ be a feasible accumulation point of the sequence {x_k}. We say that the weakly generalized Mangasarian–Fromovitz constraint qualification (WGMFCQ) based on the smoothing functions {g^i_ρ(x) : ρ > 0}, i = 1,...,p, {h^j_ρ(x) : ρ > 0}, j = p+1,...,q, holds at x̄, provided that the following conditions hold for any K_0 ⊆ K ⊆ N such that lim_{k→∞, k∈K} x_k = x̄ and any collection of vectors {v_i (i ∈ I(x̄)), v_j (j = p+1,...,q)}, where v_i = lim_{k→∞, k∈K_0} ∇g^i_{ρ_k}(x_k), i ∈ I(x̄), v_j = lim_{k→∞, k∈K_0} ∇h^j_{ρ_k}(x_k), j = p+1,...,q:
(i) v_{p+1},...,v_q are linearly independent;
(ii) there exists a direction d such that

    v_i^T d < 0,  i ∈ I(x̄),
    v_j^T d = 0,  j = p+1,...,q.
We now extend the WNNAMCQ and the WGMFCQ to accommodate infeasible points.
Definition 2.9 (EWNNAMCQ). Let {x_k} be a sequence of iteration points for problem (P), and let ρ_k ↑ ∞ as k → ∞. Let x̄ be an accumulation point of the sequence {x_k}. We say that the extended weakly no nonzero abnormal multiplier constraint qualification (EWNNAMCQ) based on the smoothing functions {g^i_ρ(x) : ρ > 0}, i = 1,...,p, {h^j_ρ(x) : ρ > 0}, j = p+1,...,q, holds at x̄, provided that the following condition holds. For any K_0 ⊆ K ⊆ N such that lim_{k→∞, k∈K} x_k = x̄ and any

    v_i = lim_{k→∞, k∈K_0} ∇g^i_{ρ_k}(x_k), i = 1,...,p,    v_j = lim_{k→∞, k∈K_0} ∇h^j_{ρ_k}(x_k), j = p+1,...,q,

(2.2)    0 = Σ_{i=1}^{p} λ_i v_i + Σ_{j=p+1}^{q} λ_j v_j  and  λ_i ≥ 0, i = 1,...,p,
(2.3)    Σ_{i=1}^{p} λ_i g_i(x̄) + Σ_{j=p+1}^{q} λ_j h_j(x̄) ≥ 0

imply that λ_i = 0, λ_j = 0, i = 1,...,p, j = p+1,...,q.
Definition 2.10 (EWGMFCQ). Let {x_k} be a sequence of iteration points for problem (P), and let ρ_k ↑ ∞ as k → ∞. Let x̄ be an accumulation point of the sequence {x_k}. We say that the extended weakly generalized Mangasarian–Fromovitz constraint qualification (EWGMFCQ) based on the smoothing functions {g^i_ρ(x) : ρ > 0}, i = 1,...,p, {h^j_ρ(x) : ρ > 0}, j = p+1,...,q, holds at x̄, provided that the following conditions hold. For any K_0 ⊆ K ⊆ N such that lim_{k→∞, k∈K} x_k = x̄ and any collection of vectors {v_i, v_j : i = 1,...,p, j = p+1,...,q}, where

    v_i = lim_{k→∞, k∈K_0} ∇g^i_{ρ_k}(x_k), i = 1,...,p,    v_j = lim_{k→∞, k∈K_0} ∇h^j_{ρ_k}(x_k), j = p+1,...,q,

(i) v_{p+1},...,v_q are linearly independent;
(ii) there exists a direction d such that

(2.4)    g_i(x̄) + v_i^T d < 0,  i = 1,...,p,
(2.5)    h_j(x̄) + v_j^T d = 0,  j = p+1,...,q.
Due to the gradient consistency property, it is easy to see that, in general, the EWNNAMCQ and the EWGMFCQ are weaker than the ENNAMCQ and the EGMFCQ, respectively. We finish this section with an equivalence between the EWGMFCQ and the EWNNAMCQ.
Theorem 2.2. The following equivalence always holds:

    EWGMFCQ ⟺ EWNNAMCQ.
Proof. We first show that the EWGMFCQ implies the EWNNAMCQ. To the contrary, suppose that the EWGMFCQ holds but the EWNNAMCQ does not hold, which means that there exist scalars λ_i ∈ R, i = 1,...,q, not all zero, such that conditions (2.2)–(2.3) hold. Suppose that d is the direction that satisfies condition (ii) of the EWGMFCQ. Due to the linear independence of v_{p+1},...,v_q (condition (i) of the EWGMFCQ), the scalars λ_i, i = 1,...,p, cannot all be equal to zero. Multiplying both sides of condition (2.2) by d, it follows from conditions (2.4) and (2.5) that

    0 = Σ_{i=1}^{p} λ_i v_i^T d + Σ_{j=p+1}^{q} λ_j v_j^T d
      < −Σ_{i=1}^{p} λ_i g_i(x̄) − Σ_{j=p+1}^{q} λ_j h_j(x̄) ≤ 0,

which is a contradiction. Therefore, the EWNNAMCQ holds.
We now prove the reverse implication. Assume that the EWNNAMCQ holds. The EWNNAMCQ implies condition (i) of the EWGMFCQ. If both (i) and (ii) of the EWGMFCQ hold, we are done. Suppose that condition (ii) of the EWGMFCQ does not hold; that is, there exist a subsequence K_0 ⊆ K ⊆ N and v_1,...,v_q with lim_{k→∞, k∈K} x_k = x̄ and

    v_i = lim_{k→∞, k∈K_0} ∇g^i_{ρ_k}(x_k), i = 1,...,p,
    v_j = lim_{k→∞, k∈K_0} ∇h^j_{ρ_k}(x_k), j = p+1,...,q,

such that for all directions d, (2.4) or (2.5) fails to hold. Let A := [v_1,...,v_q] be the matrix with columns v_1,...,v_q, and define

    S_1 := {z : ∃ d such that z = A^T d},
    S_2 := {z : z_i < −g_i(x̄), i = 1,...,p,  z_j = −h_j(x̄), j = p+1,...,q}.

Since the convex sets S_1 and cl S_2 are nonempty, and ri S_1 and ri cl S_2 have no point in common by the violation of condition (ii) of the EWGMFCQ, it follows from [37, Theorem 11.3] that there exists a hyperplane separating S_1 and cl S_2 properly. Since S_1 is a subspace and thus a cone, from [37, Theorem 11.7], there exists a hyperplane separating S_1 and cl S_2 properly and passing through the origin. By the separation theorem (see, e.g., [37, Theorem 11.1]), there exists a vector y such that

(2.6)    inf{y^T z : z ∈ S_1} ≥ 0 ≥ sup{y^T z : z ∈ cl S_2},
         sup{y^T z : z ∈ S_1} > inf{y^T z : z ∈ cl S_2}.

From (2.6), we know that y ≠ 0. Therefore, there exists y ∈ R^q, y ≠ 0, such that y^T z ≥ 0 for all z ∈ S_1 and y^T z ≤ 0 for all z ∈ cl S_2.
(a) We first consider the inequality y^T z ≤ 0 for all z ∈ cl S_2. By taking z^0 ∈ cl S_2 such that z^0_j, j = p+1,...,q, are constants and z^0_i → −∞, i ∈ {1,...,p}, we conclude that

(2.7)    y_i ≥ 0,  i = 1,...,p.

Choosing z^2 ∈ cl S_2 with z^2_i = −g_i(x̄), i = 1,...,p, z^2_j = −h_j(x̄), j = p+1,...,q, we have

(2.8)    Σ_{i=1}^{p} y_i g_i(x̄) + Σ_{j=p+1}^{q} y_j h_j(x̄) = −y^T z^2 ≥ 0.

(b) We now consider the inequality y^T z ≥ 0 for all z ∈ S_1. Select an arbitrary d. Then z^1 := A^T d ∈ S_1, z := −z^1 = A^T(−d) ∈ S_1, and hence

    Σ_{i=1}^{p} y_i v_i^T d + Σ_{j=p+1}^{q} y_j v_j^T d = y^T z^1 ≥ 0,
    Σ_{i=1}^{p} y_i v_i^T (−d) + Σ_{j=p+1}^{q} y_j v_j^T (−d) = y^T z ≥ 0.

That is,

(2.9)    Σ_{i=1}^{p} y_i v_i + Σ_{j=p+1}^{q} y_j v_j = 0.

Therefore, if there exists a nonzero vector y such that y^T z ≥ 0 for all z ∈ S_1 and y^T z ≤ 0 for all z ∈ cl S_2, the vector should also satisfy conditions (2.7)–(2.9). However, from the EWNNAMCQ, conditions (2.7)–(2.9) imply that y = 0, which is a contradiction. Thus condition (ii) of the EWGMFCQ must hold. The proof is therefore complete.
In the case when there is only one inequality constraint and no equality constraints in problem (P), the EWNNAMCQ and the EWGMFCQ at x̄ reduce to the following condition: there is no K_0 ⊆ K ⊆ N such that lim_{k→∞, k∈K} x_k = x̄ and lim_{k→∞, k∈K_0} ∇g^1_{ρ_k}(x_k) = 0. This condition is slightly weaker than a similar condition [28, Assumption (B4)], which requires that there is no K_0 ⊆ N such that lim_{k→∞, k∈K_0} ∇g^1_{ρ_k}(x_k) = 0.
3. Smoothing SQP method. In this section we design the smoothing SQP algorithm and prove its convergence.
Suppose that {g^i_ρ(x) : ρ > 0} and {h^j_ρ(x) : ρ > 0} are families of smoothing functions for g_i and h_j, respectively. Let x_k be the current iterate, and let (W_k, r_k, ρ_k) be the current updates of the positive definite matrix, the penalty parameter, and the smoothing parameter, respectively. We will try to find a descent direction of a smoothing merit function by using the smoothing SQP subproblem. In order to overcome the inconsistency of the smoothing SQP subproblems, following Pantoja and Mayne [35], we solve the penalized smoothing SQP subproblem:

(QP)_k    min_{(d,ξ)∈R^n×R}  ∇f_{ρ_k}(x_k)^T d + (1/2) d^T W_k d + r_k ξ
          s.t.  g^i_{ρ_k}(x_k) + ∇g^i_{ρ_k}(x_k)^T d ≤ ξ,   i = 1,...,p,
                h^j_{ρ_k}(x_k) + ∇h^j_{ρ_k}(x_k)^T d ≤ ξ,   j = p+1,...,q,
                −h^j_{ρ_k}(x_k) − ∇h^j_{ρ_k}(x_k)^T d ≤ ξ,  j = p+1,...,q,
                ξ ≥ 0.
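Since (QP)_k is a convex quadratic program in (d, ξ), any QP or general NLP solver can be used for the subproblem. The following is our own sketch with scipy's SLSQP for a single smoothed inequality constraint in R² (all numerical data are hypothetical illustration values; note that SLSQP's "ineq" convention is fun(z) ≥ 0):

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical subproblem data at an iterate x_k:
gval, ggrad = 0.5, np.array([1.0, 0.0])    # g_rho(x_k), grad g_rho(x_k)
fgrad = np.array([1.0, 2.0])               # grad f_rho(x_k)
W = np.eye(2)                              # positive definite Hessian model
rk = 10.0                                  # penalty parameter

# Decision variable z = (d_1, d_2, xi).
obj = lambda z: fgrad @ z[:2] + 0.5 * z[:2] @ W @ z[:2] + rk * z[2]
cons = [
    # g_rho(x_k) + grad g_rho(x_k)^T d <= xi
    {"type": "ineq", "fun": lambda z: z[2] - gval - ggrad @ z[:2]},
    {"type": "ineq", "fun": lambda z: z[2]},       # xi >= 0
]
res = minimize(obj, np.zeros(3), constraints=cons, method="SLSQP")
d, xi = res.x[:2], res.x[2]
assert res.success and xi >= -1e-8
assert gval + ggrad @ d <= xi + 1e-6       # feasibility at the solution
```

For these data the unconstrained minimizer d = (−1, −2) already satisfies the linearized constraint with ξ = 0, so the penalty term is inactive.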
If (d_k, ξ_k) is a solution of (QP)_k, then its Karush–Kuhn–Tucker (KKT) conditions can be written as

(3.1)    0 = ∇f_{ρ_k}(x_k) + W_k d_k + Σ_{i=1}^{p} λ^g_{i,k} ∇g^i_{ρ_k}(x_k) + Σ_{j=p+1}^{q} (λ^+_{j,k} − λ^−_{j,k}) ∇h^j_{ρ_k}(x_k),
(3.2)    0 = r_k − Σ_{i=1}^{p} λ^g_{i,k} − Σ_{j=p+1}^{q} (λ^+_{j,k} + λ^−_{j,k}) − λ^ξ_k,
(3.3)    0 ≤ λ^g_{i,k} ⊥ (g^i_{ρ_k}(x_k) + ∇g^i_{ρ_k}(x_k)^T d_k − ξ_k) ≤ 0,  i = 1,...,p,
(3.4)    0 ≤ λ^+_{j,k} ⊥ (h^j_{ρ_k}(x_k) + ∇h^j_{ρ_k}(x_k)^T d_k − ξ_k) ≤ 0,  j = p+1,...,q,
(3.5)    0 ≤ λ^−_{j,k} ⊥ (−h^j_{ρ_k}(x_k) − ∇h^j_{ρ_k}(x_k)^T d_k − ξ_k) ≤ 0,  j = p+1,...,q,
(3.6)    0 ≤ λ^ξ_k ⊥ −ξ_k ≤ 0,

where λ_k = (λ^g_k, λ^+_k, λ^−_k, λ^ξ_k) is a corresponding Lagrange multiplier.
Let ρ > 0, r > 0. We define the smoothing merit function by

    θ_{ρ,r}(x) := f_ρ(x) + r φ_ρ(x),

where φ_ρ(x) := max{0, g^i_ρ(x), i = 1,...,p, |h^j_ρ(x)|, j = p+1,...,q}, and propose the following smoothing SQP algorithm.
Algorithm 3.1. Let β, σ_1 be constants in (0, 1), and let σ′, σ″, η̂ be constants in (1, ∞). Choose an initial point x_0, an initial smoothing parameter ρ_0 > 0, an initial penalty parameter r_0 > 0, and an initial positive definite matrix W_0 ∈ R^{n×n}, and set k := 0.
1. Solve (QP)_k to obtain (d_k, ξ_k) with the corresponding Lagrange multiplier λ_k = (λ^g_k, λ^+_k, λ^−_k, λ^ξ_k); go to step 2.
2. If ξ_k = 0, set r_{k+1} := r_k and go to step 3. Otherwise, set r_{k+1} := σ′ r_k and go to step 3.
3. Let x_{k+1} := x_k + α_k d_k, where α_k := β^l and l ∈ {0, 1, 2, ...} is the smallest nonnegative integer satisfying

(3.7)    θ_{ρ_k,r_k}(x_{k+1}) − θ_{ρ_k,r_k}(x_k) ≤ −σ_1 α_k d_k^T W_k d_k.

If

(3.8)    ‖d_k‖ ≤ η̂ ρ_k^{−1},

set ρ_{k+1} := σ″ ρ_k and go to step 4. Otherwise, set ρ_{k+1} := ρ_k and go to step 1. In either case, update to a symmetric positive definite matrix W_{k+1} and set k := k + 1.
4. If a stopping criterion holds, terminate. Otherwise, go to step 1.
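A minimal one-dimensional sketch of the loop, under simplifying assumptions of ours: the problem is unconstrained (min |x| with smoothing f_ρ(x) = √(x² + ρ⁻²)), so (QP)_k reduces to d_k = −W_k⁻¹∇f_ρ_k(x_k) with ξ_k = 0, and the merit function is f_ρ itself. All parameter values are arbitrary choices within the ranges stated in Algorithm 3.1:

```python
import numpy as np

# Smoothing family for f(x) = |x| and its gradient (assumption of this sketch).
f = lambda x, rho: np.sqrt(x**2 + rho**-2)
grad = lambda x, rho: x / np.sqrt(x**2 + rho**-2)

beta, sigma1 = 0.5, 0.1          # constants in (0, 1)
sigma2, eta = 2.0, 2.0           # sigma'' and eta-hat, constants in (1, inf)
x, rho, Wk = 2.0, 1.0, 1.0       # initial point, smoothing parameter, W_0

for k in range(2000):
    d = -grad(x, rho) / Wk                     # step 1: unconstrained (QP)_k
    alpha = 1.0                                # step 3: Armijo test (3.7)
    while f(x + alpha * d, rho) - f(x, rho) > -sigma1 * alpha * d * Wk * d:
        alpha *= beta
    x = x + alpha * d
    if abs(d) <= eta / rho:                    # test (3.8)
        rho *= sigma2                          # increase smoothing parameter
    if rho > 1e6:                              # step 4: stopping criterion
        break

print(x)   # near the minimizer 0 of |x|
assert abs(x) < 1e-2
```

The run illustrates the interplay of the two loops: the Armijo backtracking drives x toward the minimizer of the current smoothed function, and test (3.8) increases ρ only once the step length is on the order of the smoothing scale 1/ρ.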
We now show the global convergence of the smoothing SQP algorithm. For this purpose, we need the following standard assumption.
Assumption 3.1. There exist two positive constants m and M, m ≤ M, such that for each k and each d ∈ R^n,

    m ‖d‖² ≤ d^T W_k d ≤ M ‖d‖².
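For symmetric W_k, Assumption 3.1 is exactly a uniform eigenvalue bound: m‖d‖² ≤ dᵀW_k d ≤ M‖d‖² for all d if and only if every eigenvalue of every W_k lies in [m, M]. A quick numerical check of the Rayleigh-quotient bounds on one sample matrix of our choosing:

```python
import numpy as np

W = np.array([[2.0, 0.5], [0.5, 1.0]])     # a sample symmetric W_k
eigs = np.linalg.eigvalsh(W)
m, M = eigs.min(), eigs.max()              # the tightest constants for this W

rng = np.random.default_rng(0)
for _ in range(100):
    d = rng.standard_normal(2)
    q = d @ W @ d
    # Rayleigh quotient bounds: m ||d||^2 <= d^T W d <= M ||d||^2.
    assert m * (d @ d) - 1e-12 <= q <= M * (d @ d) + 1e-12
```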
Theorem 3.1. Suppose that {(x_k, ξ_k, d_k, λ_k, ρ_k, r_k, W_k)} is a sequence generated by Algorithm 3.1. Then for every k,

(3.9)    θ′_{ρ_k,r_k}(x_k; d_k) ≤ −d_k^T W_k d_k,

and d_k is a descent direction of the function θ_{ρ_k,r_k}(x) at x_k, provided that Assumption 3.1 holds. Furthermore, suppose that Algorithm 3.1 does not terminate within finitely many iterations, and suppose that the sequences {x_k}, {λ_k}, and {r_k} are bounded. Then K̄ := {k : ‖d_k‖ ≤ η̂ ρ_k^{−1}} is an infinite set, and any accumulation point of the sequence {x_k}_{K̄} is a stationary point of problem (P).
Proof. Since (d_k, ξ_k) is a solution of (QP)_k, the KKT conditions (3.1)–(3.6) hold. The directional derivative of the function x → |h^j_{ρ_k}(x)| at x_k in direction d_k is

    −∇h^j_{ρ_k}(x_k)^T d_k    if h^j_{ρ_k}(x_k) < 0,
    |∇h^j_{ρ_k}(x_k)^T d_k|   if h^j_{ρ_k}(x_k) = 0,
    ∇h^j_{ρ_k}(x_k)^T d_k     if h^j_{ρ_k}(x_k) > 0.
Denote the index sets

    I_k := {i = 1,...,p : g^i_{ρ_k}(x_k) = φ_{ρ_k}(x_k)},
    J^+_k := {j = p+1,...,q : h^j_{ρ_k}(x_k) = φ_{ρ_k}(x_k)},
    J^−_k := {j = p+1,...,q : −h^j_{ρ_k}(x_k) = φ_{ρ_k}(x_k)},

and Γ_k := I_k ∪ J^+_k ∪ J^−_k. Therefore the directional derivative of the function x → φ_{ρ_k}(x) at x_k in direction d_k is

    0    if φ_{ρ_k}(x_k) = 0 and Γ_k = ∅,
    max{0, ∇g^i_{ρ_k}(x_k)^T d_k, i ∈ I_k, |∇h^j_{ρ_k}(x_k)^T d_k|, j ∈ J^+_k}    if φ_{ρ_k}(x_k) = 0 and Γ_k ≠ ∅,
    max{∇g^i_{ρ_k}(x_k)^T d_k, i ∈ I_k, ∇h^j_{ρ_k}(x_k)^T d_k, j ∈ J^+_k, −∇h^j_{ρ_k}(x_k)^T d_k, j ∈ J^−_k}    if φ_{ρ_k}(x_k) > 0.

From (3.3)–(3.5), we have

    ∇g^i_{ρ_k}(x_k)^T d_k ≤ ξ_k − g^i_{ρ_k}(x_k) = ξ_k − φ_{ρ_k}(x_k),   i ∈ I_k,
    ∇h^j_{ρ_k}(x_k)^T d_k ≤ ξ_k − h^j_{ρ_k}(x_k) = ξ_k − φ_{ρ_k}(x_k),   j ∈ J^+_k,
    −∇h^j_{ρ_k}(x_k)^T d_k ≤ ξ_k + h^j_{ρ_k}(x_k) = ξ_k − φ_{ρ_k}(x_k),  j ∈ J^−_k.

Thus φ′_{ρ_k}(x_k; d_k) ≤ ξ_k − φ_{ρ_k}(x_k). Therefore,

    θ′_{ρ_k,r_k}(x_k; d_k) = ∇f_{ρ_k}(x_k)^T d_k + r_k φ′_{ρ_k}(x_k; d_k)
                           ≤ ∇f_{ρ_k}(x_k)^T d_k + r_k (ξ_k − φ_{ρ_k}(x_k)).
From (3.2) and (3.6), we know that if ξ_k > 0, then

    r_k = Σ_{i=1}^{p} λ^g_{i,k} + Σ_{j=p+1}^{q} (λ^+_{j,k} + λ^−_{j,k}),

which means

(3.10)    r_k ξ_k = [ Σ_{i=1}^{p} λ^g_{i,k} + Σ_{j=p+1}^{q} (λ^+_{j,k} + λ^−_{j,k}) ] ξ_k.
By taking conditions (3.1), (3.3)–(3.5), and (3.10) into account, we obtain that for each $k$,
$$\begin{aligned}
\theta'_{\rho_k,r_k}(x_k;d_k)&=\theta'_{\rho_k,r_k}(x_k;d_k)+\sum_{i=1}^p\lambda^g_{i,k}\bigl(g^i_{\rho_k}(x_k)+\nabla g^i_{\rho_k}(x_k)^Td_k-\xi_k\bigr)\\
&\quad+\sum_{j=p+1}^q\lambda^+_{j,k}\bigl(h^j_{\rho_k}(x_k)+\nabla h^j_{\rho_k}(x_k)^Td_k-\xi_k\bigr)+\sum_{j=p+1}^q\lambda^-_{j,k}\bigl(-h^j_{\rho_k}(x_k)-\nabla h^j_{\rho_k}(x_k)^Td_k-\xi_k\bigr)\\
&\le-d_k^TW_kd_k+\sum_{i=1}^p\lambda^g_{i,k}\bigl(g^i_{\rho_k}(x_k)-\xi_k\bigr)+\sum_{j=p+1}^q\lambda^+_{j,k}\bigl(h^j_{\rho_k}(x_k)-\xi_k\bigr)\\
&\quad+\sum_{j=p+1}^q\lambda^-_{j,k}\bigl(-h^j_{\rho_k}(x_k)-\xi_k\bigr)+r_k\bigl(\xi_k-\phi_{\rho_k}(x_k)\bigr)\\
&\le-d_k^TW_kd_k+r_k\bigl(\xi_k-\phi_{\rho_k}(x_k)\bigr)+\Bigl(\sum_{i=1}^p\lambda^g_{i,k}+\sum_{j=p+1}^q\lambda^+_{j,k}+\sum_{j=p+1}^q\lambda^-_{j,k}\Bigr)\bigl(\phi_{\rho_k}(x_k)-\xi_k\bigr)\\
&=-d_k^TW_kd_k-\Bigl(r_k-\sum_{i=1}^p\lambda^g_{i,k}-\sum_{j=p+1}^q\lambda^+_{j,k}-\sum_{j=p+1}^q\lambda^-_{j,k}\Bigr)\phi_{\rho_k}(x_k)\\
&\le-d_k^TW_kd_k.
\end{aligned}$$
Hence inequality (3.9) holds. Since $W_k$ is assumed to be positive definite, it follows that $d_k$ is a descent direction of the function $\theta_{\rho_k,r_k}(x)$ at $x_k$ for every $k$. Therefore, the algorithm is well defined.
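To make the $\ell_\infty$ merit function concrete, here is a minimal numerical sketch with our own toy data (not taken from the paper): it evaluates $\phi(x)=\max\{0,\ g(x),\ |h(x)|\}$ and $\theta_r(x)=f(x)+r\,\phi(x)$ for a smooth two-variable problem and checks that a hand-picked direction decreases the merit function.

```python
# Toy data (ours, not from the paper): min f s.t. g <= 0, h = 0.
def f(x):  return x[0] ** 2 + x[1] ** 2
def g(x):  return 1.0 - x[0]          # violated at the start point
def h(x):  return x[1] - 0.2

def phi(x):
    """l_infinity constraint violation: max{0, g(x), |h(x)|}."""
    return max(0.0, g(x), abs(h(x)))

def theta(x, r):
    """l_infinity penalty (merit) function theta_r(x) = f(x) + r * phi(x)."""
    return f(x) + r * phi(x)

x = (0.0, 0.5)          # infeasible point: g = 1 > 0, |h| = 0.3
d = (1.0, -0.3)         # direction reducing the constraint violation
r, t = 2.0, 0.1

x_new = (x[0] + t * d[0], x[1] + t * d[1])
print(theta(x, r), theta(x_new, r))   # the merit value drops along d
```

Here the direction $d$ is chosen by hand rather than from a QP subproblem; the sketch only illustrates how the merit function penalizes the largest violation.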
We now suppose that Algorithm 3.1 does not terminate within finitely many iterations. We first prove that there always exists some $d_k$ such that (3.8) holds; thus $\bar K$ is an infinite set.

To the contrary, suppose that $\|d_k\|\ge c_0>0$ for each $k$. Then Assumption 3.1 together with condition (3.7) implies the existence of a positive constant $c$ such that $\theta_{\rho_k,r_k}(x_{k+1})-\theta_{\rho_k,r_k}(x_k)\le-c$. Consequently, (3.8) fails. From the boundedness of $\{r_k\}$, we know that $\xi_k=0$ when $k$ is large. We can then assume that there exists a $\bar k$ large enough such that $\rho_k=\rho_{\bar k}$ and $r_k\equiv\bar r$ for $k\ge\bar k$, by the updating rules of $\rho_k$ and $r_k$.

Since the sequence $\{x_k\}$ is bounded, the sequence $\{\theta_{\rho_{\bar k},\bar r}(x_k)\}$ is bounded below. Moreover, $\theta_{\rho_k,r_k}(x_{k+1})-\theta_{\rho_k,r_k}(x_k)\le-c$ with $c>0$, which implies that the sequence $\{\theta_{\rho_{\bar k},\bar r}(x_k)\}$ is monotonically decreasing. Hence we have
$$\sum_{k\ge\bar k}c\le\sum_{k\ge\bar k}\bigl[\theta_{\rho_{\bar k},\bar r}(x_k)-\theta_{\rho_{\bar k},\bar r}(x_{k+1})\bigr]=\theta_{\rho_{\bar k},\bar r}(x_{\bar k})-\lim_{k\to\infty}\theta_{\rho_{\bar k},\bar r}(x_k)<\infty,$$
which is a contradiction. Therefore $\bar K$ is an infinite set, which also implies that $\rho_k\uparrow\infty$ as $k\to\infty$.
Suppose there exist $K\subseteq\bar K$ and $\bar x$ such that $\lim_{k\to\infty,k\in K}x_k=\bar x$. Since the sequence $\{\lambda_k\}$ is bounded, without loss of generality, assume there exists a subsequence $K_1\subseteq K$ such that $(\lambda^g_k,\lambda^+_k,\lambda^-_k,\lambda^\xi_k)\to(\bar\lambda^g,\bar\lambda^+,\bar\lambda^-,\bar\lambda^\xi)$ as $k\to\infty$, $k\in K_1$, and
$\bar\lambda\ge0$. By the gradient consistency property of $f_\rho(\cdot)$, $g^i_\rho(\cdot)$, $i=1,\dots,p$, and $h^j_\rho(\cdot)$, $j=p+1,\dots,q$, there exists a subsequence $\tilde K_1\subseteq K_1$ such that
$$\lim_{k\to\infty,k\in\tilde K_1}\nabla f_{\rho_k}(x_k)\in\partial f(\bar x),$$
$$\lim_{k\to\infty,k\in\tilde K_1}\nabla g^i_{\rho_k}(x_k)\in\partial g^i(\bar x),\quad i=1,\dots,p,$$
$$\lim_{k\to\infty,k\in\tilde K_1}\nabla h^j_{\rho_k}(x_k)\in\partial h^j(\bar x),\quad j=p+1,\dots,q.$$
Taking limits in (3.1) and (3.4)–(3.6) as $k\to\infty$, $k\in\tilde K_1$, by the gradient consistency properties and $\xi_k\to0$, it is easy to see that $\bar x$ is a stationary point of problem (P). The proof of the theorem is complete.
In the rest of this section, we give a sufficient condition for the boundedness of the sequences $\{r_k\}$ and $\{\lambda_k\}$. We first give the following result on error bounds.

Lemma 3.1. For each $k\in\mathbb{N}$, $j=1,\dots,l$, let $F^j_k,F^j:\mathbb{R}^n\to\mathbb{R}$ be continuously differentiable. Assume that for each $j=1,\dots,l$, $\{F^j_k(\cdot)\}$ and $\{\nabla F^j_k(\cdot)\}$ converge pointwise to $F^j(\cdot)$ and $\nabla F^j(\cdot)$, respectively, as $k$ goes to infinity. Let $\hat d$ be a point such that $F^j(\hat d)=0$, $j=1,\dots,l$. Suppose that there exist $\kappa>0$ and $\delta>0$ such that for all $\mu_j\in[-1,1]$, $j=1,\dots,l$, not all zero, and all $d\in\hat d+\delta B$ it holds that
$$\Bigl\|\sum_{j=1}^l\mu_j\nabla F^j(d)\Bigr\|>\frac1\kappa.$$
Then for sufficiently large $k$,
$$\mathrm{dist}(\hat d,S_k)\le\kappa\sum_{j=1}^l|F^j_k(\hat d)|,\qquad(3.11)$$
where $S_k:=\{d\in\mathbb{R}^n:\ F^j_k(d)=0,\ j=1,\dots,l\}$.
Proof. Denote $F(d):=\sum_{j=1}^l|F^j(d)|$ and $F_k(d):=\sum_{j=1}^l|F^j_k(d)|$. If $\hat d\in S_k$, then (3.11) holds trivially. Now suppose that $\hat d\notin S_k$. Since $F_k(\hat d)\to F(\hat d)$ as $k\to\infty$, there exists a $\bar k\in\mathbb{N}$ such that $F_k(\hat d)\le\kappa^{-1}\delta$ when $k\ge\bar k$. Let $\varepsilon:=F_k(\hat d)$. Then $\varepsilon\kappa<\delta$. Take $\lambda\in(\varepsilon\kappa,\delta)$. Then by Ekeland's variational principle [38, Proposition 1.43], there exists an $\omega$ such that $\|\omega-\hat d\|\le\lambda$, $F_k(\omega)\le F_k(\hat d)$, and the function $\varphi(d):=F_k(d)+\frac{\varepsilon}{\lambda}\|d-\omega\|$ attains its minimum at $\omega$. Hence by the nonsmooth calculus of the Clarke generalized gradient, we have
$$0\in\partial F_k(\omega)+\frac{\varepsilon}{\lambda}B,$$
where $B$ denotes the closed unit ball of $\mathbb{R}^n$. Thus $\|v_k\|\le\frac{\varepsilon}{\lambda}<\frac1\kappa$ for some $v_k\in\partial F_k(\omega)$, for $k\ge\bar k$. We now show that $F_k(\omega)=0$ by contradiction. Suppose that $F_k(\omega)\neq0$. Then there exists at least one $j$ such that $F^j_k(\omega)\neq0$. For such a $j$, $\partial|F^j_k(\omega)|=\{\pm\nabla F^j_k(\omega)\}$. Therefore there exist $\mu^k_j\in[-1,1]$, $j=1,\dots,l$, not all zero, such that $v_k=\sum_{j=1}^l\mu^k_j\nabla F^j_k(\omega)$. We assume that there exist a subsequence $K\subseteq\mathbb{N}$ and $\mu_j\in[-1,1]$, $j=1,\dots,l$, not all zero, such that for every $k\in K$, $F_k(\omega)\neq0$ and $\lim_{k\to\infty,k\in K}\mu^k_j=\mu_j$, $j=1,\dots,l$. Since $\{\nabla F^j_k(\omega)\}_k$ converge to $\nabla F^j(\omega)$, we have $v:=\lim_{k\to\infty,k\in K}v_k=\sum_{j=1}^l\mu_j\nabla F^j(\omega)$ and $\|v\|\le\frac1\kappa$, which is a contradiction. The
contradiction shows that we must have $F_k(\omega)=0$ and hence $\omega\in S_k$. Therefore we have
$$\mathrm{dist}(\hat d,S_k)\le\|\hat d-\omega\|\le\lambda.$$
Since this is true for every $\lambda\in(\varepsilon\kappa,\delta)$, we have that for all $k\ge\bar k$,
$$\mathrm{dist}(\hat d,S_k)\le\varepsilon\kappa=\kappa F_k(\hat d).$$
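As a sanity check on the error bound (3.11) in the linear case, the following sketch (our own toy instance, not from the paper) perturbs a solvable $2\times2$ linear system $F^j_k(d)=a_j^Td+b_{j,k}=0$ and compares the exact distance from $\hat d$ to the perturbed solution set $S_k$ with the bound $\kappa\sum_j|F^j_k(\hat d)|$.

```python
# Toy instance of Lemma 3.1 with linear F^j (our own data):
# F^j(d) = a_j . d + b_j with d_hat solving the limit system,
# F^j_k(d) = a_j . d + b_{j,k} with b_{j,k} -> b_j as k -> infinity.

a = [(1.0, 0.0), (0.0, 2.0)]      # linearly independent gradients
b = [-1.0, -2.0]                  # limit right-hand side
d_hat = (1.0, 1.0)                # solves a_j . d + b_j = 0, j = 1, 2

# kappa must satisfy ||mu_1 a_1 + mu_2 a_2|| > 1/kappa for the relevant mu;
# for this diagonal choice of a, kappa = 1 works.
kappa = 1.0

for k in range(1, 6):
    bk = [b[0] + 1.0 / k, b[1] - 1.0 / k]            # perturbed data
    # S_k is the single point solving the square nonsingular system
    sol = (-bk[0] / a[0][0], -bk[1] / a[1][1])
    dist = ((d_hat[0] - sol[0]) ** 2 + (d_hat[1] - sol[1]) ** 2) ** 0.5
    residual = abs(a[0][0] * d_hat[0] + a[0][1] * d_hat[1] + bk[0]) \
             + abs(a[1][0] * d_hat[0] + a[1][1] * d_hat[1] + bk[1])
    assert dist <= kappa * residual + 1e-12          # the bound (3.11)
```

For this data both sides shrink like $1/k$, matching the statement that the distance to $S_k$ is controlled by the residual of $\hat d$ in the perturbed system.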
Theorem 3.2. Assume that Assumption 3.1 holds. Suppose that Algorithm 3.1 does not terminate within finitely many iterations and that $\{(x_k,\lambda_k,d_k,\xi_k,\rho_k,r_k)\}$ is a sequence generated by Algorithm 3.1. If the EWGMFCQ holds (or, equivalently, the EWNNAMCQ holds) at any accumulation point $\bar x$, then the following two statements are true:
(a) $\{d_k\}$ and $\{\xi_k\}$ are bounded.
(b) $\{r_k\}$ and $\{\lambda_k\}$ are bounded. Furthermore, when $k$ is large enough, $\xi_k=0$.
Proof. (a) Assume that there exists a subset $K\subseteq\mathbb{N}$ such that $\lim_{k\to\infty,k\in K}x_k=\bar x$. To the contrary, suppose that $\{d_k\}_K$ is unbounded. Then there exists a subset $K_0\subseteq K$ such that $\lim_{k\to\infty,k\in K_0}\|d_k\|=\infty$ and $\lim_{k\to\infty,k\in K_0}x_k=\bar x$. By the gradient consistency property, without loss of generality we may assume that
$$v_i=\lim_{k\to\infty,k\in K_0}\nabla g^i_{\rho_k}(x_k),\quad i=1,\dots,p,$$
$$v_j=\lim_{k\to\infty,k\in K_0}\nabla h^j_{\rho_k}(x_k),\quad j=p+1,\dots,q.$$
By the EWGMFCQ, $v_{p+1},\dots,v_q$ are linearly independent and there exists $\hat d$ such that
$$g^i(\bar x)+v_i^T\hat d<0,\quad i=1,\dots,p,$$
$$h^j(\bar x)+v_j^T\hat d=0,\quad j=p+1,\dots,q.$$
Since the vectors $\{\lim_{k\to\infty,k\in K_0}\nabla h^j_{\rho_k}(x_k):\ j=p+1,\dots,q\}$ are linearly independent, it is easy to see that for sufficiently large $k\in K_0$, the vectors $\{\nabla h^j_{\rho_k}(x_k),\ j=p+1,\dots,q\}$ are also linearly independent. Denote
$$F^j(d):=h^j(\bar x)+v_j^Td,\quad j=p+1,\dots,q,$$
$$F^j_k(d):=h^j_{\rho_k}(x_k)+\nabla h^j_{\rho_k}(x_k)^Td,\quad j=p+1,\dots,q.$$
Then $F^j(\hat d)=0$, $j=p+1,\dots,q$. Since $v_{p+1},\dots,v_q$ are linearly independent, there is $\kappa$ such that $0<\frac1\kappa<\min\{\|\sum_{j=p+1}^q\mu_jv_j\|:\ \mu_j\in[-1,1]\ \text{not all equal to zero}\}$. By Lemma 3.1, for sufficiently large $k$,
$$\mathrm{dist}(\hat d,S_k)\le\kappa\sum_{j=p+1}^q|F^j_k(\hat d)|,\qquad(3.12)$$
where $S_k:=\{d\in\mathbb{R}^n:\ F^j_k(d)=0,\ j=p+1,\dots,q\}$. Since $S_k$ is closed, there exists $\hat d_k\in S_k$ such that $\|\hat d-\hat d_k\|=\mathrm{dist}(\hat d,S_k)$. Moreover, by virtue of (3.12), the fact that $\lim_{k\to\infty,k\in K_0}F^j_k(\hat d)=F^j(\hat d)=0$ for all $j=p+1,\dots,q$ implies that $\|\hat d-\hat d_k\|\to0$ as $k\to\infty$, $k\in K_0$. Hence for sufficiently large $k$, we have
$$h^j_{\rho_k}(x_k)+\nabla h^j_{\rho_k}(x_k)^T\hat d_k=0,\quad j=p+1,\dots,q,\qquad(3.13)$$
$$g^i_{\rho_k}(x_k)+\nabla g^i_{\rho_k}(x_k)^T\hat d_k<0,\quad i=1,\dots,p.\qquad(3.14)$$
Conditions (3.13)–(3.14) imply that $(\hat d_k,0)$ is a feasible solution for $(QP)_k$. Since
$(d_k,\xi_k)$ is an optimal solution to problem $(QP)_k$, we have that for any $k\ge\bar k$, $k\in K_0$,
$$\nabla f_{\rho_k}(x_k)^Td_k+\tfrac12d_k^TW_kd_k\le\nabla f_{\rho_k}(x_k)^Td_k+\tfrac12d_k^TW_kd_k+r_k\xi_k\le\nabla f_{\rho_k}(x_k)^T\hat d_k+\tfrac12\hat d_k^TW_k\hat d_k.\qquad(3.15)$$
Since $\nabla f_{\rho_k}(x_k)^T\hat d_k+\tfrac12\hat d_k^TW_k\hat d_k$ is bounded, it follows from Assumption 3.1 that $\{d_k\}_K$ is bounded. Since $(d_k,\xi_k)$ are feasible for problem $(QP)_k$, by the definition of the smoothing function and the gradient consistency property, it is easy to see that if $\{d_k\}_K$ is bounded, then $\{\xi_k\}_K$ is also bounded. Since $K$ and $\bar x$ are an arbitrary subset and an arbitrary accumulation point, $\{d_k\}$ and $\{\xi_k\}$ are bounded for the whole sequence.
(b) To the contrary, suppose that $\{\lambda_k\}$ is unbounded. Then there exists a subset $K_1\subseteq K$ such that $\lim_{k\to\infty,k\in K_1}\|\lambda_k\|=\infty$ and $\xi_k>0$ for $k\in K_1$ sufficiently large. By the gradient consistency property, without loss of generality we may assume that
$$v_i=\lim_{k\to\infty,k\in K_1}\nabla g^i_{\rho_k}(x_k),\quad i=1,\dots,p,$$
$$v_j=\lim_{k\to\infty,k\in K_1}\nabla h^j_{\rho_k}(x_k),\quad j=p+1,\dots,q,$$
and $\lim_{k\to\infty,k\in K_1}\frac{\lambda_k}{\|\lambda_k\|}=\bar\lambda$ for some nonzero vector $\bar\lambda=(\bar\lambda^g,\bar\lambda^+,\bar\lambda^-,\bar\lambda^\xi)\ge0$. Dividing both sides of (3.1) by $\|\lambda_k\|$ and letting $k\to\infty$, $k\in K_1$, we have
$$0=\sum_{i=1}^p\bar\lambda^g_iv_i+\sum_{j=p+1}^q\bigl(\bar\lambda^+_j-\bar\lambda^-_j\bigr)v_j.\qquad(3.16)$$
Letting $k\to\infty$, $k\in K_1$, in conditions (3.3)–(3.6) and assuming that $(\bar d,\bar\xi)$ is the limiting point of $\{(d_k,\xi_k)\}_{K_1}$, we have
$$0\le\bar\lambda^g_i\perp\bigl(g^i(\bar x)+v_i^T\bar d-\bar\xi\bigr)\le0,\quad i=1,\dots,p,$$
$$0\le\bar\lambda^+_j\perp\bigl(h^j(\bar x)+v_j^T\bar d-\bar\xi\bigr)\le0,\quad j=p+1,\dots,q,$$
$$0\le\bar\lambda^-_j\perp\bigl(-h^j(\bar x)-v_j^T\bar d-\bar\xi\bigr)\le0,\quad j=p+1,\dots,q,$$
$$0\le\bar\lambda^\xi\perp-\bar\xi\le0.$$
Multiplying both sides of (3.16) by $\bar d$, since
$$\bar\lambda^g_i\bigl(g^i(\bar x)+v_i^T\bar d-\bar\xi\bigr)=0,\quad i=1,\dots,p,$$
$$\bar\lambda^+_j\bigl(h^j(\bar x)+v_j^T\bar d-\bar\xi\bigr)=0,\quad j=p+1,\dots,q,$$
$$\bar\lambda^-_j\bigl(-h^j(\bar x)-v_j^T\bar d-\bar\xi\bigr)=0,\quad j=p+1,\dots,q,$$
we have
$$0=\sum_{i=1}^p\bar\lambda^g_iv_i^T\bar d+\sum_{j=p+1}^q\bigl(\bar\lambda^+_j-\bar\lambda^-_j\bigr)v_j^T\bar d=\sum_{i=1}^p\bar\lambda^g_i\bigl(\bar\xi-g^i(\bar x)\bigr)+\sum_{j=p+1}^q\bar\lambda^+_j\bigl(\bar\xi-h^j(\bar x)\bigr)+\sum_{j=p+1}^q\bar\lambda^-_j\bigl(\bar\xi+h^j(\bar x)\bigr).$$
Thus,
$$\sum_{i=1}^p\bar\lambda^g_ig^i(\bar x)+\sum_{j=p+1}^q\bigl(\bar\lambda^+_j-\bar\lambda^-_j\bigr)h^j(\bar x)=\Bigl(\sum_{i=1}^p\bar\lambda^g_i+\sum_{j=p+1}^q\bigl(\bar\lambda^+_j+\bar\lambda^-_j\bigr)\Bigr)\bar\xi\ge0.\qquad(3.17)$$
From the EWGMFCQ (equivalently, the EWNNAMCQ), condition (3.17) together with condition (3.16) implies that $\bar\lambda^g_i=0$, $i=1,\dots,p$, and $\bar\lambda^+_j-\bar\lambda^-_j=0$, $j=p+1,\dots,q$.

Consider the case where $\bar\lambda^g_i=0$, $i=1,\dots,p$, and there exists an index $j\in\{p+1,\dots,q\}$ such that $\bar\lambda^+_j=\bar\lambda^-_j>0$. Then for sufficiently large $k\in K_1$, $\lambda^+_{j,k}>0$ and $\lambda^-_{j,k}>0$. From the complementarity conditions (3.4)–(3.5), we must have $\xi_k=0$ for sufficiently large $k\in K_1$, which is a contradiction.

Otherwise, consider the case where $\bar\lambda^g_i=0$, $i=1,\dots,p$, and $\bar\lambda^+_j=\bar\lambda^-_j=0$, $j=p+1,\dots,q$. Then since $\bar\lambda$ is a nonzero vector, we must have $\bar\lambda^\xi>0$, which implies that $\lambda^\xi_k>0$ for sufficiently large $k\in K_1$. From the complementarity condition (3.6), $\xi_k=0$ for sufficiently large $k\in K_1$, which is a contradiction.

The contradiction shows that $\{\lambda_k\}$ must be bounded. By the relationship between $\{\lambda_k\}$ and $\{r_k\}$ given in (3.2), the boundedness of $\{\lambda_k\}$ implies the boundedness of $\{r_k\}$. Furthermore, from the updating rule of the algorithm, the boundedness of the sequences $\{\lambda_k\}$ and $\{r_k\}$ implies that $\xi_k=0$ when $k$ is large enough. This completes the proof.
The following corollary follows immediately from Theorems 3.1 and 3.2.

Corollary 3.2. Let Assumption 3.1 hold, and suppose that Algorithm 3.1 does not terminate within finitely many iterations. Suppose that the sequence $\{x_k\}$ is bounded. Assume that the EWGMFCQ (or, equivalently, the EWNNAMCQ) holds at any accumulation point of the sequence $\{x_k\}$; then $\bar K:=\{k:\|d_k\|\le\hat\eta\rho_k^{-1}\}$ is an infinite set, and any accumulation point of the sequence $\{x_k\}_{\bar K}$ is a stationary point of problem (P).

In the case where the objective function is smooth and there is only one inequality constraint and no equality constraints in problem (P), Corollary 3.2 extends [28, Theorem 4.3] by allowing a general smoothing function instead of the specific smoothing function used there.
4. Applications to bilevel programs. The purpose of this section is to apply the smoothing SQP algorithm to the bilevel program. We illustrate how our algorithm can be applied to solve the bilevel program, and we demonstrate through some numerical examples that although the GMFCQ never holds for bilevel programs, the WGMFCQ may be satisfied easily.

In this section we consider the simple bilevel program
$$\text{(SBP)}\qquad\min\ F(x,y)\quad\text{s.t.}\ y\in S(x),$$
where $S(x)$ denotes the set of solutions of the lower level program
$$(P_x)\qquad\min_{y\in Y}\ f(x,y),$$
where $F,f:\mathbb{R}^n\times\mathbb{R}^m\to\mathbb{R}$ are continuously differentiable and twice continuously differentiable, respectively, and $Y$ is a compact subset of $\mathbb{R}^m$. Our smoothing SQP algorithm can easily handle any extra upper level constraint, but we omit it for simplicity. For a general bilevel program, the lower level constraint may depend on the
upper level variables. By "simple," we mean that the lower level constraint $Y$ is independent of $x$. Although (SBP) is a simple case of the general bilevel program, it has many applications, such as the principal-agent problem [30] in economics. We refer the reader to [1, 15, 16, 39, 43] for applications of general bilevel programs.

When the lower level program is a convex program in the variable $y$, the first order approach to solving a bilevel program is to replace the lower level program by its KKT conditions. In the case where $f$ is not convex in the variable $y$, Mirrlees [30] showed that this approach may not be valid, in the sense that the true optimal solution of the bilevel problem may not even be a stationary point of the problem reformulated by the first order approach.
For numerical purposes, Outrata [34] proposed to reformulate a bilevel program as a nonsmooth single level optimization problem by replacing the lower level program by its value function constraint, which in our simple case is
$$\text{(VP)}\qquad\min\ F(x,y)\quad\text{s.t.}\ f(x,y)-V(x)=0,\qquad(4.1)$$
$$x\in\mathbb{R}^n,\ y\in Y,$$
where $V(x):=\min_{y\in Y}f(x,y)$ is the value function of the lower level problem. By Danskin's theorem (see [11, page 99] or [14]), the value function is Lipschitz continuous but not necessarily differentiable, and hence problem (VP) is a nonsmooth optimization problem with Lipschitz continuous problem data. Ye and Zhu [46] pointed out that the usual constraint qualifications such as the GMFCQ never hold for problem (VP). Ye and Zhu [46, 47] derived the first order necessary optimality condition for the general bilevel program under the so-called partial calmness condition, under which the difficult constraint (4.1) is moved to the objective function with a penalty.

Based on the value function approach, Lin, Xu, and Ye [27] recently proposed to approximate the value function by its integral entropy function, i.e.,
$$\gamma_\rho(x):=-\rho^{-1}\ln\int_Y\exp[-\rho f(x,y)]\,dy=V(x)-\rho^{-1}\ln\int_Y\exp[-\rho(f(x,y)-V(x))]\,dy,$$
and developed a smoothing projected gradient algorithm to solve problem (VP) when problem (SBP) is partially calm, and to solve an approximate bilevel problem (VP)$_\varepsilon$, where the constraint (4.1) is replaced by $f(x,y)-V(x)\le\varepsilon$ for small $\varepsilon>0$, when (SBP) is not partially calm.
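To see the integral entropy approximation at work, here is a small sketch of ours; it uses the lower level data of Example 4.2 below, $f(x,y)=y^3/3-xy$ on $Y=[-1,1]$, as an assumed test case, evaluates $\gamma_\rho(x)$ by quadrature with a log-sum-exp shift to avoid overflow, and checks that $\gamma_\rho(x)\to V(x)$ as $\rho\uparrow\infty$.

```python
import math

def f(x, y):
    # lower level objective of Example 4.2 (assumed test case)
    return y ** 3 / 3.0 - x * y

def gamma(x, rho, n=20000):
    """Integral entropy function gamma_rho(x) = -(1/rho) ln int_Y e^{-rho f} dy,
    computed on Y = [-1, 1] with a log-sum-exp shift for numerical stability."""
    h = 2.0 / n
    ys = [-1.0 + i * h for i in range(n + 1)]
    exponents = [-rho * f(x, y) for y in ys]
    m = max(exponents)
    s = sum(math.exp(e - m) for e in exponents) * h   # e^{-m} * integral
    return -(m + math.log(s)) / rho

x = 0.25
V = -1.0 / 12.0        # exact value function at x = 1/4 (two global minimizers)
err_small_rho = abs(gamma(x, 1e2) - V)
err_large_rho = abs(gamma(x, 1e4) - V)
print(err_small_rho, err_large_rho)   # the error shrinks as rho grows
```

Without the shift by the maximum exponent, $e^{-\rho f}$ overflows double precision already for moderate $\rho$, which is why the log-sum-exp form is used.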
Unfortunately, the partial calmness condition is rather strong, and hence a local optimal solution of a bilevel program may not be a stationary point of (VP). Ye and Zhu [48] proposed to study the following combined program, obtained by adding the first order condition of the lower level problem to problem (VP). Although the partial calmness condition is a very strong condition for (VP), it is likely to hold for the combined problem under some reasonable conditions [48].

Recently Xu and Ye [45] proposed a smoothing augmented Lagrangian method to solve the combined problem under the assumption that each lower level solution lies in the interior of $Y$:
$$\text{(CP)}\qquad\min_{(x,y)\in\mathbb{R}^n\times Y}\ F(x,y)$$
$$\text{s.t.}\ f(x,y)-V(x)\le0,\qquad(4.2)$$
$$\nabla_yf(x,y)=0.\qquad(4.3)$$
They showed that if the sequence of penalty parameters is bounded, then any accumulation point is a Clarke stationary point of (CP). They argued that since the problem (CP) is very likely to satisfy the partial calmness or the weak calmness condition (see [48]), the sequence of penalty parameters is likely to be bounded.

To simplify our discussion so that we can concentrate on the main idea, we make the following assumption.

Assumption 4.1. Every optimal solution of the lower level problem is an interior point of the set $Y$.

Under Assumption 4.1, every optimal solution to the lower level constrained problem is a local minimizer of the objective function of the lower level problem, and hence the necessary optimality condition of the lower level problem is simply $\nabla_yf(x,y)=0$. For some practical problems, it may be possible to set the set $Y$ large enough so that all optimal solutions of the lower level problem are contained in the interior of $Y$. For example, for the principal-agent problem in economics [30], a very important application of simple bilevel programs, the lower level constraint is an interval and the solution of the lower level problem can usually be estimated to lie in the interior of a certain bounded interval $Y$. If it is difficult to find a compact set $Y$ that includes all optimal solutions of the lower level problem, but the set $Y$ can be represented by some equality or inequality constraints, then one can use the KKT conditions to replace the constraint (4.3) in the problem (CP). In this case the problem (CP) will become a nonsmooth mathematical program with equilibrium constraints. We will study this case in a separate paper.

Since problem (CP) is a nonconvex and nonsmooth optimization problem, in general the best we can do is to look for its Clarke stationary points. Since we assume that all lower level solutions lie in the interior of the set $Y$, any local optimal solution of (CP) must be a Clarke stationary point of (CP) with the constraint $y\in Y$ removed. Hence the smoothing SQP method introduced in this paper can be used to find the stationary points of (CP).
Let $(\bar x,\bar y)$ be a local optimal solution of (CP). Then by the Fritz John–type multiplier rule, there exist $r\ge0$, $\lambda_1\ge0$, $\lambda_2\in\mathbb{R}^m$, not all zero, such that
$$0\in r\nabla F(\bar x,\bar y)+\lambda_1\bigl(\nabla f(\bar x,\bar y)-\partial V(\bar x)\times\{0\}\bigr)+\nabla(\nabla_yf)(\bar x,\bar y)^T\lambda_2.\qquad(4.4)$$
In the case when $r$ is positive, $(\bar x,\bar y)$ is a stationary point of (CP). A sufficient condition for $r$ to be positive is that, in the Fritz John condition, $r=0$ implies that $\lambda_1,\lambda_2$ are all equal to zero. Unfortunately, we now show that $r$ can always be taken as zero in the above Fritz John condition for problem (CP). Indeed, from the definition of $V(x)$, we always have $f(x,y)-V(x)\ge0$ for any $y\in Y$. Hence any feasible point $(\bar x,\bar y)$ of problem (CP) is always an optimal solution of the problem
$$\min_{(x,y)\in\mathbb{R}^n\times Y}\ f(x,y)-V(x)\quad\text{s.t.}\ \nabla_yf(x,y)=0.$$
By the Fritz John–type multiplier rule, there exist $\lambda_1\ge0$, $\lambda_2\in\mathbb{R}^m$, not all equal to zero, such that
$$0\in\lambda_1\bigl(\nabla f(\bar x,\bar y)-\partial V(\bar x)\times\{0\}\bigr)+\nabla(\nabla_yf)(\bar x,\bar y)^T\lambda_2.\qquad(4.5)$$
Observe that (4.5) is (4.4) with $r=0$. Since $(\lambda_1,\lambda_2)$ is nonzero, we have shown that the Fritz John condition (4.4) for problem (CP) holds with $r=0$. In other words, the NNAMCQ (or, equivalently, the GMFCQ) for problem (CP) never holds.
It follows from [27, Theorems 5.1 and 5.5] that the integral entropy function $\gamma_\rho(x)$ is a smoothing function with the gradient consistency property for the value function $V(x)$. That is,
$$\lim_{z\to x,\ \rho\uparrow\infty}\gamma_\rho(z)=V(x)\quad\text{and}\quad\emptyset\neq\limsup_{z\to x,\ \rho\uparrow\infty}\nabla\gamma_\rho(z)\subseteq\partial V(x).$$
For a sequence of iteration points $\{(x_k,y_k)\}$, the set $\limsup_{k\to\infty}\nabla\gamma_{\rho_k}(x_k)$ may be strictly contained in $\partial V(\bar x)$. Therefore, while (4.5) holds for some $\lambda_1\ge0$, $\lambda_2\in\mathbb{R}^m$ not all equal to zero, the following inclusion may hold only when $\lambda_1=0$, $\lambda_2=0$:
$$0\in\lambda_1\Bigl(\nabla f(\bar x,\bar y)-\limsup_{k\to\infty}\nabla\gamma_{\rho_k}(x_k)\times\{0\}\Bigr)+\nabla(\nabla_yf)(\bar x,\bar y)^T\lambda_2.$$
Consequently, the WNNAMCQ may then hold. We illustrate this point by using some numerical examples. In these examples, since $y\in\mathbb{R}$, the problem (CP) has one inequality constraint $f(x,y)-V(x)\le0$ and one equality constraint $\nabla_yf(x,y)=0$. Hence the WNNAMCQ,
$$0\in\lambda_1\Bigl(\nabla f(\bar x,\bar y)-\limsup_{k\to\infty}\nabla\gamma_{\rho_k}(x_k)\times\{0\}\Bigr)+\lambda_2\nabla(\nabla_yf)(\bar x,\bar y),\ \lambda_1\ge0\ \Longrightarrow\ \lambda_1=\lambda_2=0,$$
amounts to saying that for $\lim_{k\to\infty}(x_k,y_k)=(\bar x,\bar y)$ and $v=\lim_{k\to\infty}\nabla\gamma_{\rho_k}(x_k)$, the vectors
$$\nabla f(\bar x,\bar y)-(v,0)\quad\text{and}\quad\nabla(\nabla_yf)(\bar x,\bar y)$$
are linearly independent.
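Since in the scalar-$y$ case the WNNAMCQ reduces to linear independence of two vectors in $\mathbb{R}^2$, it can be checked numerically with a $2\times2$ determinant. A minimal sketch of ours, using as sample data the two gradient vectors reported in Example 4.1 below:

```python
def wnnamcq_check(u, v, tol=1e-8):
    """Scalar-y WNNAMCQ check: the two vectors in R^2 are linearly
    independent iff the 2x2 determinant is (numerically) nonzero."""
    det = u[0] * v[1] - u[1] * v[0]
    return abs(det) > tol

# Gradient vectors reported for Example 4.1 at the final iterate:
u = (0.01784, 0.00015)      # grad f - (grad gamma_rho, 0)
v = (0.084813, 1.70049)     # gradient of grad_y f
print(wnnamcq_check(u, v))  # True: the WNNAMCQ holds numerically
```

The tolerance `tol` is our own choice; in practice it should be scaled relative to the vector norms.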
In our numerical experiments, we use the so-called limited-memory Broyden–Fletcher–Goldfarb–Shanno (LBFGS) approach proposed by Nocedal [33], which is a modification of the BFGS method for unconstrained optimization problems, to update the matrix $W_k$. Define $s_k:=x_{k+1}-x_k$ and
$$y_k:=\nabla f_{\rho_k}(x_{k+1})-\nabla f_{\rho_k}(x_k)+\sum_{i=1}^p\lambda^g_{i,k}\bigl(\nabla g^i_{\rho_k}(x_{k+1})-\nabla g^i_{\rho_k}(x_k)\bigr)+\sum_{j=p+1}^q\bigl(\lambda^+_{j,k}-\lambda^-_{j,k}\bigr)\bigl(\nabla h^j_{\rho_k}(x_{k+1})-\nabla h^j_{\rho_k}(x_k)\bigr).$$
We update $W_{k+1}$ by
$$W_{k+1}=W_k-\frac{W_ks_ks_k^TW_k}{s_k^TW_ks_k}+\frac{y_ky_k^T}{s_k^Ty_k}$$
if and only if
$$\|s_k\|\le\gamma_s,\qquad\|y_k\|\le\gamma_y,\qquad\text{and}\qquad s_k^Ty_k\ge\gamma_{sy}\|s_k\|^2$$
for given $(\gamma_s,\gamma_y,\gamma_{sy})>0$. Otherwise, we skip the update. As shown in [13], these restrictions guarantee the existence of $M\ge m>0$ such that
$$m\|d\|^2\le d^TW_kd\le M\|d\|^2.$$
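The guarded update above can be sketched as follows. This is a minimal dense $2\times2$ illustration of the skipping rule only (not the limited-memory implementation actually used), and the guard constants are our own sample values:

```python
def bfgs_update(W, s, y, gs=10.0, gy=10.0, gsy=0.1):
    """BFGS update of the Hessian approximation W, skipped unless
    ||s|| <= gs, ||y|| <= gy and s'y >= gsy * ||s||^2 (toy 2x2 version;
    gs, gy, gsy stand in for gamma_s, gamma_y, gamma_sy)."""
    norm_s = (s[0] ** 2 + s[1] ** 2) ** 0.5
    norm_y = (y[0] ** 2 + y[1] ** 2) ** 0.5
    sy = s[0] * y[0] + s[1] * y[1]
    if not (norm_s <= gs and norm_y <= gy and sy >= gsy * norm_s ** 2):
        return W                                   # skip the update
    Ws = [W[0][0] * s[0] + W[0][1] * s[1],
          W[1][0] * s[0] + W[1][1] * s[1]]
    sWs = s[0] * Ws[0] + s[1] * Ws[1]
    return [[W[i][j] - Ws[i] * Ws[j] / sWs + y[i] * y[j] / sy
             for j in range(2)] for i in range(2)]

W0 = [[1.0, 0.0], [0.0, 1.0]]
W1 = bfgs_update(W0, s=(1.0, 0.0), y=(0.5, 0.1))
# W1 stays symmetric positive definite: check the 2x2 determinant
det = W1[0][0] * W1[1][1] - W1[0][1] * W1[1][0]
print(W1, det)
```

The curvature guard `s'y >= gsy * ||s||^2` is what keeps the updated matrix uniformly positive definite, which is exactly the role of the restrictions quoted from [13].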
In numerical practice, it is impossible to obtain an exact "0"; thus we select some small enough $\varepsilon>0$ and $\varepsilon'>0$ and replace the update tests for $r_k$ and $\rho_k$ by
$$\xi_k>\varepsilon'$$
and
$$\|d_k\|\le\max\{\hat\eta\rho_k^{-1},\varepsilon\},$$
respectively. Also, the stopping criterion is as follows: for a given $\varepsilon_1>0$, we terminate the algorithm at the $k$th iteration if
$$\|d_k\|\le\varepsilon_1\quad\text{and}\quad\xi_k\le\varepsilon_1.$$
In the remainder of this section, we test the algorithm on some bilevel problems.

Example 4.1 (see [30]). Consider Mirrlees' problem. Note that the solution of Mirrlees' problem does not change if we add the constraint $y\in[-2,2]$ to the problem:
$$\min\ (x-2)^2+(y-1)^2\quad\text{s.t.}\ y\in S(x),$$
where $S(x)$ is the solution set of the lower level program
$$\min\ -x\exp[-(y+1)^2]-\exp[-(y-1)^2]\quad\text{s.t.}\ y\in[-2,2].$$
It was shown in [30] that the unique optimal solution is $(\bar x,\bar y)$ with $\bar x=1$ and $\bar y\approx0.958$ being the positive solution of the equation
$$(1+y)=(1-y)\exp[4y].$$
In our test, we chose the initial point $(x_0,y_0)=(0.6,0.3)$ and the parameters $\beta=0.8$, $\varepsilon_1=10^{-6}$, $\rho_0=100$, $r_0=100$, $\hat\eta=5\times10^5$, the two updating factors both equal to $10$, $\varepsilon=10^{-7}$, and $\varepsilon'=10^{-10}$. Since the stopping criteria hold, we terminate at the 16th iteration with $(x_k,y_k)=(1,0.95759)$. It seems that the sequence converges to $(\bar x,\bar y)$. Since
$$\nabla f(x_k,y_k)-(\nabla\gamma_{\rho_k}(x_k),0)=(0.01784,0.00015),$$
$$\nabla(\nabla_yf)(x_k,y_k)=(0.084813,1.70049),$$
by virtue of the continuity of the gradients it is easy to see that the vectors
$$\nabla f(\bar x,\bar y)-\Bigl(\lim_{k\to\infty}\nabla\gamma_{\rho_k}(x_k),0\Bigr)\quad\text{and}\quad\nabla(\nabla_yf)(\bar x,\bar y)$$
are linearly independent. Thus the WNNAMCQ holds at $(\bar x,\bar y)$, and our algorithm guarantees that $(\bar x,\bar y)$ is a stationary point of (CP). Indeed, $(\bar x,\bar y)$ is the unique global minimizer of Mirrlees' problem.
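The optimal $\bar y$ can be checked independently: the equation $(1+y)=(1-y)e^{4y}$ is exactly the lower level stationarity condition $\nabla_yf(1,y)=0$, and a bisection sketch of ours recovers the value reported above:

```python
import math

def resid(y):
    # (1 + y) - (1 - y) e^{4y}; its positive root is Mirrlees' y-bar
    return (1.0 + y) - (1.0 - y) * math.exp(4.0 * y)

def grad_y_f(x, y):
    # partial_y of f(x, y) = -x e^{-(y+1)^2} - e^{-(y-1)^2}
    return (2.0 * x * (y + 1.0) * math.exp(-(y + 1.0) ** 2)
            + 2.0 * (y - 1.0) * math.exp(-(y - 1.0) ** 2))

lo, hi = 0.5, 0.99              # resid changes sign on [0.5, 0.99]
for _ in range(60):             # plain bisection
    mid = 0.5 * (lo + hi)
    if resid(lo) * resid(mid) <= 0.0:
        hi = mid
    else:
        lo = mid
y_bar = 0.5 * (lo + hi)
print(y_bar, grad_y_f(1.0, y_bar))   # about 0.9575..., gradient near zero
```

That the two conditions agree follows from $4y-(y+1)^2=-(y-1)^2$, so a root of one is a root of the other.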
Example 4.2 (see [31, Example 3.14]). The bilevel program
$$\min\ F(x,y):=\Bigl(x-\frac14\Bigr)^2+y^2\quad\text{s.t.}\ y\in S(x):=\operatorname*{argmin}_{y\in[-1,1]}\ f(x,y):=\frac{y^3}3-xy$$
has the optimal solution $(\bar x,\bar y)=(\frac14,\frac12)$ with an objective value of $\frac14$.

In our test, we chose the initial point $(x_0,y_0)=(0.3,0.3)$ and the parameters $\beta=0.9$, $\varepsilon_1=10^{-6}$, $\rho_0=100$, $r_0=100$, $\hat\eta=5000$, the two updating factors both equal to $10$, $\varepsilon=10^{-7}$, and $\varepsilon'=10^{-10}$. Since the stopping criteria hold, we terminate at the 7th iteration with $(x_k,y_k)=(0.25,0.5)$. It seems that the sequence converges to $(\bar x,\bar y)$. Since
$$\nabla f(x_k,y_k)-(\nabla\gamma_{\rho_k}(x_k),0)=(-1.5,0),$$
$$\nabla(\nabla_yf)(x_k,y_k)=(-1,1),$$
by virtue of the continuity of the gradients it is easy to see that the vectors
$$\nabla f(\bar x,\bar y)-\Bigl(\lim_{k\to\infty}\nabla\gamma_{\rho_k}(x_k),0\Bigr)\quad\text{and}\quad\nabla(\nabla_yf)(\bar x,\bar y)$$
are linearly independent. Thus the WNNAMCQ holds at $(\bar x,\bar y)$, and our algorithm guarantees that $(\bar x,\bar y)$ is a stationary point of (CP). Indeed, $(\bar x,\bar y)$ is the unique global minimizer of the problem.
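The nondifferentiability of $V$ at $\bar x=\frac14$ in this example can be seen directly: for $x$ slightly below $\frac14$ the lower level global minimizer is $y=-1$, while for $x$ slightly above it is $y=\sqrt x$, so the value function has a kink. A grid-search sketch of ours:

```python
def f(x, y):
    return y ** 3 / 3.0 - x * y

def argmin_lower(x, n=4000):
    # brute-force global minimizer of f(x, .) over Y = [-1, 1]
    best_y, best_v = -1.0, f(x, -1.0)
    for i in range(n + 1):
        y = -1.0 + 2.0 * i / n
        v = f(x, y)
        if v < best_v:
            best_y, best_v = y, v
    return best_y

# the global minimizer jumps across x = 1/4, producing a kink in V
print(argmin_lower(0.20), argmin_lower(0.30))
```

This jump is also why $\limsup_k\nabla\gamma_{\rho_k}(x_k)$ can pick out only part of $\partial V(\bar x)$, depending on the side from which the iterates approach $\bar x$.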
Example 4.3 (see [31, Example 3.20]). The bilevel program
$$\min\ F(x,y):=(x-0.25)^2+y^2\quad\text{s.t.}\ y\in S(x):=\operatorname*{argmin}_{y\in[-1,1]}\ f(x,y):=\frac13y^3-x^2y$$
has the optimal solution $(\bar x,\bar y)=(\frac12,\frac12)$ with an objective value of $\frac5{16}$.

In our test, we chose the parameters $\beta=0.9$, $\varepsilon_1=10^{-6}$, $\rho_0=100$, $r_0=100$, $\hat\eta=500$, the two updating factors both equal to $10$, $\varepsilon=10^{-7}$, and $\varepsilon'=10^{-10}$. We chose the initial point $(x_0,y_0)=(0.3,0.8)$. Since the stopping criteria hold, we terminate at the 8th iteration with $(x_k,y_k)=(0.4999996,0.4999996)$. It seems that the sequence converges to $(\bar x,\bar y)$. Since
$$\nabla f(x_k,y_k)-(\nabla\gamma_{\rho_k}(x_k),0)=(-1.499898,0),$$
$$\nabla(\nabla_yf)(x_k,y_k)=(-1,1),$$
by virtue of the continuity of the gradients it is easy to see that the vectors
$$\nabla f(\bar x,\bar y)-\Bigl(\lim_{k\to\infty}\nabla\gamma_{\rho_k}(x_k),0\Bigr)\quad\text{and}\quad\nabla(\nabla_yf)(\bar x,\bar y)$$
are linearly independent. Thus the WNNAMCQ holds at $(\bar x,\bar y)$, and our algorithm guarantees that $(\bar x,\bar y)$ is a stationary point of (CP). Indeed, $(\bar x,\bar y)$ is the unique global minimizer of the problem.
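As a check on the reported solution of Example 4.3, the point $(\bar x,\bar y)=(\frac12,\frac12)$ satisfies both (CP) constraints: the value function constraint (4.2) is tight and the lower level stationarity (4.3) holds. A small verification sketch of ours:

```python
def f(x, y):
    return y ** 3 / 3.0 - x ** 2 * y

def V(x, n=4000):
    # value function of the lower level problem by grid search over [-1, 1]
    return min(f(x, -1.0 + 2.0 * i / n) for i in range(n + 1))

x_bar, y_bar = 0.5, 0.5
tight = f(x_bar, y_bar) - V(x_bar)        # constraint (4.2), should be ~0
stationary = y_bar ** 2 - x_bar ** 2      # grad_y f, constraint (4.3)
print(tight, stationary)
```

Note that $V(\frac12)=-\frac1{12}$ is attained both at $y=\frac12$ and at $y=-1$, so this example, like Example 4.2, has a nonsmooth value function.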
5. Conclusion. In this paper, we propose a smoothing SQP method for solv-
ing nonsmooth and nonconvex optimization problems with Lipschitz inequality and
equality constraints. The algorithm is applicable even to degenerate constrained op-
timization problems which do not satisfy the GMFCQ, the standard constraint qual-
ification for a local minimizer to satisfy the KKT conditions. Our main motivation
comes from solving the bilevel program which is nonsmooth, nonconvex, and never
satisfies the GMFCQ. In this paper, we have proposed the concept of the WGMFCQ
(equivalently, WNNAMCQ), a weaker version of the GMFCQ, and have shown the
global convergence of the smoothing SQP algorithm under the WGMFCQ. Moreover,
we have demonstrated the applicability of the smoothing SQP algorithm for solving the combined program of a simple bilevel program with a nonconvex lower level
problem. For smooth optimization problems, it is well known that the SQP methods
converge very quickly when the iterates are close to the solution. The rapid local
convergence of the SQP is due to the fact that the positive definite matrix Wkin the
SQP subproblem is an approximation of the Hessian matrix of the Lagrangian func-
tion. For our nonsmooth problem, the Lagrangian function is only locally Lipschitz,
and no classical Hessian matrix can be defined. However, it would be interesting to
study the local behavior of the smoothing SQP algorithm by using the generalized
second order subderivatives [38] of the Lagrangian function. This remains a topic of
our future research.
Acknowledgments. The authors are grateful to the anonymous referees for
their careful reading of the paper and helpful comments and to Shaoyan Guo for
helpful discussions.
REFERENCES
[1] J.F. Bard, Practical Bilevel Optimization: Algorithms and Applications, Kluwer Academic Publishers, Dordrecht, The Netherlands, 1998.
[2] D.P. Bertsekas,Constrained Optimization and Lagrange Multiplier Methods, Academic Press,
New York, 1982.
[3] J.V. Burke and S.-P. Han,A robust sequential quadratic programming method, Math. Pro-
gramming, 43 (1989), pp. 277–303.
[4] J.V. Burke and T. Hoheisel,Epi-convergent smoothing with applications to convex composite
functions, SIAM J. Optim., 23 (2013), pp. 1457–1479.
[5] J.V. Burke, T. Hoheisel, and C. Kanzow,Gradient consistency for integral-convolution
smoothing functions, Set-Valued Var. Anal., 21 (2013), pp. 359–376.
[6] B. Chen and X. Chen, A global and local superlinear continuation-smoothing method for P0 and R0 NCP or monotone NCP, SIAM J. Optim., 9 (1999), pp. 624–645.
[7] C. Chen and O.L. Mangasarian,A class of smoothing functions for nonlinear and mixed
complementarity problems, Math. Programming, 71 (1995), pp. 51–70.
[8] X. Chen,Smoothing methods for nonsmooth, nonconvex minimization, Math. Program., 134
(2012), pp. 71–99.
[9] X. Chen, R.S. Womersley, and J.J. Ye,Minimizing the condition number of a Gram matrix,
SIAM J. Optim., 21 (2011), pp. 127–148.
[10] F.H. Clarke,Optimization and Nonsmooth Analysis, John Wiley & Sons, New York, 1983.
[11] F.H. Clarke, Yu.S. Ledyaev, R.J. Stern, and P.R. Wolenski,Nonsmooth Analysis and
Control Theory, Springer, New York, 1998.
[12] F.E. Curtis and M.L. Overton,A sequential quadratic programming algorithm for noncon-
vex, nonsmooth constrained optimization, SIAM J. Optim., 22 (2012), pp. 474–500.
[13] F.E. Curtis and X. Que, An adaptive gradient sampling algorithm for nonsmooth optimization, Optim. Methods Softw., 28 (2013), pp. 1302–1324.
[14] J.M. Danskin,The Theory of Max-Min and Its Applications to Weapons Allocation Problems,
Springer, New York, 1967.
[15] S. Dempe,Foundations of Bilevel Programming, Kluwer Academic Publishers, Dordrecht, The
Netherlands, 2002.
[16] S. Dempe,Annotated bibliography on bilevel programming and mathematical programs with
equilibrium constraints, Optimization, 52 (2003), pp. 333–359.
[17] F. Facchinei,Robust recursive quadratic programming algorithm model with global and super-
linear convergence properties, J. Optim. Theory Appl., 92 (1997), pp. 543–579.
[18] M. Fukushima and J.-S. Pang,Some feasibility issues in mathematical programs with equi-
librium constraints, SIAM J. Optim., 8 (1998), pp. 673–681.
[19] U.M. Garcia-Palomares and O.L. Mangasarian,Superlinearly convergent quasi-Newton
methods for nonlinearly constrained optimization problems, Math. Programming, 11
(1976), pp. 1–13.
[20] P.E. Gill and E. Wong,Sequential quadratic programming methods, in Mixed Integer Nonlin-
ear Programming, IMA Vol. Math. Appl. 154, Springer-Verlag, Berlin, 2012, pp. 147–224.
[21] S.P. Han,Superlinearly convergent variable metric algorithms for general nonlinear program-
ming problems, Math. Programming, 11 (1976), pp. 263–282.
[22] S.P. Han,A globally convergent method for nonlinear programming, J. Optim. Theory Appl.,
22 (1977), pp. 297–309.
[23] M. Heinkenschloss,Projected sequential quadratic programming methods, SIAM J. Optim., 6
(1996), pp. 373–417.
[24] J.B. Hiriart-Urruty,Refinements of necessary optimality conditions in nondifferentiable
programming. I, Appl. Math. Optim., 5 (1979), pp. 63–82.
[25] H. Jiang and D. Ralph, Smooth SQP methods for mathematical programs with nonlinear complementarity constraints, SIAM J. Optim., 10 (2000), pp. 779–808.
[26] B. Kummer,Newton’s method for nondifferentiable functions, in Advances in Mathematical
Optimization, Math. Res. 45, Akademie-Verlag, Berlin, 1988.
[27] G.-H. Lin, M. Xu, and J.J. Ye,On solving simple bilevel programs with a nonconvex lower
level program, Math. Program. Ser. A, 144 (2014), pp. 277–305.
[28] C. Ling, L. Qi, G.L. Zhou, and S.Y. Wu,Global convergence of a robust smoothing SQP
method for semi-infinite programming, J. Optim. Theory Appl., 129 (2006), pp. 147–164.
[29] X.-W. Liu and Y.-X. Yuan, A robust algorithm for optimization with general equality and inequality constraints, SIAM J. Sci. Comput., 22 (2000), pp. 517–534.
[30] J.A. Mirrlees,The theory of moral hazard and unobservable behaviour: Part I, Rev. Econ.
Stud., 66 (1999), pp. 3–21.
[31] A. Mitsos and P.I. Barton,A Test Set for Bilevel Programs, Technical report, Department
of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, 2006.
[32] Y. Nesterov, Smooth minimization of non-smooth functions, Math. Program., 103 (2005),
pp. 127–152.
[33] J. Nocedal,Updating quasi-Newton matrices with limited storage, Math. Comp., 35 (1980),
pp. 773–782.
[34] J.V. Outrata, On the numerical solution of a class of Stackelberg problems, Z. Oper. Res., 34
(1990), pp. 255–277.
[35] J.F.A. Pantoja and D.Q. Mayne,Exact penalty function algorithm with simple updating of
the penalty parameter, J. Optim. Theory Appl., 69 (1991), pp. 441–467.
[36] M.J.D. Powell and Y. Yuan,A recursive quadratic programming algorithm that uses differ-
entiable exact penalty functions, Math. Programming, 35 (1986), pp. 265–278.
[37] R.T. Rockafellar,Convex Analysis, Princeton University Press, Princeton, NJ, 1970.
[38] R.T. Rockafellar and R.J.-B. Wets, Variational Analysis, Springer-Verlag, Berlin, 1998.
[39] K. Shimizu, Y. Ishizuka, and J.F. Bard,Nondifferentiable and Two-Level Mathematical
Programming, Kluwer Academic Publishers, Boston, 1997.
[40] P. Spellucci, A new technique for inconsistent QP problems in the SQP method, Math. Methods Oper. Res., 47 (1998), pp. 355–400.
[41] K. Tone, Revision of constraint approximations in the successive QP-method for nonlinear programming problems, Math. Programming, 26 (1983), pp. 144–152.
[42] X. Tong, L.Q. Qi, G.L. Zhou, and S.Y. Wu,A smoothing SQP method for nonlinear programs
with stability constraints arising from power systems, Comput. Optim. Appl., 51 (2012),
pp. 175–197.
[43] L.N. Vicente and P.H. Calamai,Bilevel and multilevel programming: A bibliography review,
J. Global Optim., 5 (1994), pp. 291–306.
[44] R.B. Wilson,A Simplicial Algorithm for Concave Programming, Ph.D. thesis, Graduate
School of Business Administration, Harvard University, Cambridge, MA, 1963.
[45] M. Xu and J.J. Ye,A smoothing augmented Lagrangian method for solving simple bilevel
programs, Comput. Optim. Appl., 59 (2014), pp. 353–377.
[46] J.J. Ye and D.L. Zhu,Optimality conditions for bilevel programming problems, Optimization,
33 (1995), pp. 9–27.
[47] J.J. Ye and D.L. Zhu,A note on: “Optimality conditions for bilevel programming problems,”
Optimization, 39 (1997), pp. 361–366.
[48] J.J. Ye and D. Zhu,New necessary optimality conditions for bilevel programs by combining
the MPEC and value function approaches, SIAM J. Optim., 20 (2010), pp. 1885–1905.
[49] C. Zhang and X. Chen,Smoothing projected gradient method and its application to stochastic
linear complementarity problems, SIAM J. Optim., 20 (2009), pp. 627–649.
[50] J. Zhang and X. Zhang,A robust SQP method for optimization with inequality constraints,
J. Comput. Math, 21 (2003), pp. 247–256.
... Over the past two decades, many numerical algorithms have been proposed for solving bilevel programs. However, most of them assume that the lower level program is convex, with few exceptions [20,25,26,31,38,39,40]. In [25,26], an algorithm combining branch and bound with the exchange technique was proposed to find approximate global optimal solutions. Recently, smoothing techniques have been used to find stationary points of the value function or the combined reformulation of simple bilevel programs [20,38,39,40]. ...
Preprint
A bilevel program is an optimization problem whose constraints involve another optimization problem. This paper studies bilevel polynomial programs (BPPs), i.e., bilevel programs in which all the functions are polynomials. We reformulate BPPs equivalently as semi-infinite polynomial programs (SIPPs), using Fritz John conditions and Jacobian representations. Combining the exchange technique with Lasserre-type semidefinite relaxations, we propose numerical methods for solving both simple and general BPPs. For simple BPPs, we prove convergence to global optimal solutions. Numerical experiments are presented to show the efficiency of the proposed algorithms.
... The SQO-type method proposed by Wilson [28] has an excellent rate of convergence and computational efficiency, and is one of the most effective methods for solving smooth constrained optimization problems, especially small-to-medium-scale ones. It has therefore received extensive attention; see, e.g., [29–41]. The SQO-type method, also known as the sequential quadratic programming (SQP) method, is often used to solve the case where the objective functions f and θ of problem (P3) are sufficiently smooth (not necessarily convex). ...
Article
Full-text available
This paper discusses a class of two-block smooth large-scale optimization problems with both linear equality and linear inequality constraints, which have a wide range of applications, such as economic power dispatch, data mining, and signal processing. Our goal is to develop a novel partially feasible distributed (PFD) sequential quadratic optimization (SQO) method (PFD-SQOM) for this class of problems. The design of the method is based on the ideas of the SQO method, an augmented Lagrangian Jacobi splitting scheme, and the feasible direction method, which decomposes the quadratic optimization (QO) subproblem into two small-scale QOs that can be solved independently and in parallel. A novel disturbance contraction term that can be suitably adjusted is introduced into the inequality constraints so that the feasible step size along the search direction can be increased to 1. The new iteration points are generated by the Armijo line search, with the partially augmented Lagrangian function that contains only the equality constraints serving as the merit function. The iteration points always satisfy all the inequality constraints of the problem. The global convergence and iteration complexity O(1/ε²) of the proposed PFD-SQOM are obtained under appropriate assumptions without the Kurdyka–Łojasiewicz (KL) property. Furthermore, superlinear and quadratic convergence rates of the proposed method are analyzed when the equality constraint vanishes. Finally, the numerical effectiveness of the method is tested on a class of academic examples and an economic power dispatch problem, which shows that the proposed method is promising.
... Building on the value function reformulation proposed by Outrata [30], Lin et al. introduced smoothing methods for nonconvex bilevel programs in which the constraint set of (P_x) is independent of the upper level variable [19,39]. Based on partial calmness, researchers [8,15,38] introduced an equation system for the value function reformulation, parameterized by partial exact penalization. ...
Preprint
Full-text available
Bilevel optimization problems, encountered in fields such as economics, engineering, and machine learning, pose significant computational challenges due to their hierarchical structure and constraints at both upper and lower levels. Traditional gradient-based methods are effective for unconstrained bilevel programs with unique lower level solutions, but struggle with constrained bilevel problems due to the nonsmoothness of lower level solution mappings. To overcome these challenges, this paper introduces the Enhanced Barrier-Smoothing Algorithm (EBSA), a novel approach that integrates gradient-based techniques with an augmented Lagrangian framework. EBSA utilizes innovative smoothing functions to approximate the primal-dual solution mapping of the lower level problem, and then transforms the bilevel problem into a sequence of smooth single-level problems. This approach not only addresses the nonsmoothness but also enhances convergence properties. Theoretical analysis demonstrates its superiority in achieving Clarke and, under certain conditions, Bouligand stationary points for bilevel problems. Both theoretical analysis and preliminary numerical experiments confirm the robustness and efficiency of EBSA.
... To address the first difficulty mentioned above, smoothing techniques for the objective function and smoothing methods are promising [10,11,25,39–42]. Smoothing techniques for deterministic nonsmooth convex programming problems are well known, including integral convolution [11], Nesterov's smoothing technique [25], and the inf-conv smoothing approximation [6]. ...
Preprint
We propose a novel stochastic smoothing accelerated gradient (SSAG) method for general constrained nonsmooth convex composite optimization, and analyze its convergence rates. The SSAG method allows various smoothing techniques and can handle nonsmooth terms whose proximal operator is not easy to compute or that do not have the linear max structure. To the best of our knowledge, it is the first stochastic approximation type method with solid convergence results for convex composite optimization problems whose nonsmooth term is the maximum of numerous nonlinear convex functions. We prove that the SSAG method achieves the best-known complexity bounds in terms of the stochastic first-order oracle (SFO), using either diminishing smoothing parameters or a fixed smoothing parameter. We give two applications of our results to distributionally robust optimization problems. Numerical results on the two applications demonstrate the effectiveness and efficiency of the proposed SSAG method.
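As a minimal sketch of the kind of smoothing such methods rely on (our own toy, not the SSAG algorithm itself), the following approximates a finite max of affine pieces by the log-sum-exp surrogate, whose gap to the true max is at most μ log m:

```python
import numpy as np

def logsumexp_smooth(A, b, x, mu):
    """Nesterov-style smoothing of f(x) = max_i (A[i] @ x + b[i]):
    f_mu(x) = mu * log(sum_i exp((A[i] @ x + b[i]) / mu)).
    It satisfies f(x) <= f_mu(x) <= f(x) + mu * log(m)."""
    z = (A @ x + b) / mu
    zmax = z.max()                      # shift for numerical stability
    w = np.exp(z - zmax)
    val = mu * (zmax + np.log(w.sum()))
    grad = A.T @ (w / w.sum())          # gradient of the smooth surrogate
    return val, grad

# three affine pieces; at this x the active piece is A[0], with max value 0.3
A = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
b = np.zeros(3)
x = np.array([0.3, -0.2])
for mu in (1.0, 0.1, 0.01):
    val, grad = logsumexp_smooth(A, b, x, mu)
    # val decreases toward the true max value 0.3 as mu -> 0
```

As μ shrinks, the gradient of the surrogate concentrates on the active piece, which is the behavior gradient consistency formalizes.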
... Moreover, when the lower level multipliers are not unique, the resulting optimization problem usually has local optimizers different from those of the original problem [13,33]. In recent years, numerical algorithms for bilevel programs that are not reformulated as MPECs have been proposed in [20,33,37,41,51,60,61,64]. ...
Preprint
This paper studies bilevel polynomial optimization in which lower level constraining functions depend linearly on lower level variables. We show that such a bilevel program can be reformulated as a disjunctive program using partial Lagrange multiplier expressions (PLMEs). An advantage of this approach is that branch problems of the disjunctive program are easier to solve. In particular, since the PLME can be easily obtained, these branch problems can be efficiently solved by polynomial optimization techniques. Solving each branch problem either returns infeasibility or gives a candidate local or global optimizer for the original bilevel optimization. We give necessary and sufficient conditions for these candidates to be global optimizers, and sufficient conditions for the local optimality. Numerical experiments are also presented to show the efficiency of the method.
... Inspired by the smoothing technique [23], Zhang and Chen studied a smoothing projected gradient method in [53] for minimizing (1) with locally Lipschitz continuous c and g = 0, and showed that any accumulation point of the iterates is a stationary point of the problem associated with a smoothing function. Numerical algorithms based on smoothing methods for solving nonsmooth optimization problems have been studied extensively [15,24,35,50,54]. However, we find that most of the first-order acceleration algorithms mentioned above for solving nonsmooth convex optimization problems do not have sequential convergence. ...
Article
Full-text available
We propose a smoothing accelerated proximal gradient (SAPG) method with a fast convergence rate for finding a minimizer of a decomposable nonsmooth convex function over a closed convex set. The proposed algorithm combines the smoothing method with the proximal gradient algorithm with extrapolation coefficient (k-1)/(k+α-1), where α > 3. The updating rule for the smoothing parameter μ_k is a carefully designed scheme that guarantees a global convergence rate of o(ln^σ k / k), with σ ∈ (1/2, 1], on the objective function values. Moreover, we prove that the sequence of iterates converges to an optimal solution of the problem. We then introduce an error term into the SAPG algorithm to obtain an inexact smoothing accelerated proximal gradient algorithm, and we obtain the same convergence results as for the SAPG algorithm under a summability condition on the errors. Finally, numerical experiments show the effectiveness and efficiency of the proposed algorithm.
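The extrapolation weight (k-1)/(k+α-1) is easy to illustrate. The sketch below applies it to the Huber smoothing of f(x) = |x|; the schedule μ_k = 1/k and the step size are our own simple stand-ins, not the cited paper's exact updating rule:

```python
# Illustrative smoothing accelerated gradient sketch (not the paper's scheme):
# minimize f(x) = |x| through its Huber smoothing f_mu, using the
# extrapolation weight (k-1)/(k+alpha-1) with alpha > 3. The schedule
# mu_k = 1/k and step size mu_k = 1/Lipschitz(grad f_mu) are our own choices.
alpha = 4.0
x = x_prev = 5.0
for k in range(1, 201):
    mu = 1.0 / k
    y = x + (k - 1) / (k + alpha - 1) * (x - x_prev)   # extrapolation step
    g = max(-1.0, min(1.0, y / mu))                    # grad of Huber smoothing of |x|
    x_prev, x = x, y - mu * g                          # gradient step of size mu
# x approaches the minimizer 0 of |x|
```

Because the Huber gradient has Lipschitz constant 1/μ, taking the step size equal to μ keeps the smooth inner iteration stable as μ_k shrinks.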
... Then (1.2) can be solved by employing existing approaches for constrained optimization, including augmented Lagrangian methods [29], sequential quadratic programming methods [8,34], etc. However, these approaches treat BLO as a constrained optimization problem with p equality constraints, and hence are usually not as efficient in practice as the aforementioned single-loop and double-loop approaches [16]. ...
Preprint
In this paper, we focus on the nonconvex-strongly-convex bilevel optimization problem (BLO). In this BLO, the objective function of the upper-level problem is nonconvex and possibly nonsmooth, and the lower-level problem is smooth and strongly convex with respect to the underlying variable y. We show that the feasible region of BLO is a Riemannian manifold. Then we transform BLO to its corresponding unconstrained constraint dissolving problem (CDB), whose objective function is explicitly formulated from the objective functions in BLO. We prove that BLO is equivalent to the unconstrained optimization problem CDB. Therefore, various efficient unconstrained approaches, together with their theoretical results, can be directly applied to BLO through CDB. We propose a unified framework for developing subgradient-based methods for CDB. Remarkably, we show that several existing efficient algorithms can fit the unified framework and be interpreted as descent algorithms for CDB. These examples further demonstrate the great potential of our proposed approach.
Article
In this paper, we study the generalized subdifferentials and the Riemannian gradient subconsistency that are the basis for non-Lipschitz optimization on embedded submanifolds of [Formula: see text]. We then propose a Riemannian smoothing steepest descent method for non-Lipschitz optimization on complete embedded submanifolds of [Formula: see text]. We prove that any accumulation point of the sequence generated by the Riemannian smoothing steepest descent method is a stationary point associated with the smoothing function employed in the method, which is necessary for the local optimality of the original non-Lipschitz problem. We also prove that any accumulation point of the sequence generated by our method that satisfies the Riemannian gradient subconsistency is a limiting stationary point of the original non-Lipschitz problem. Numerical experiments are conducted to demonstrate the advantages of Riemannian [Formula: see text] [Formula: see text] optimization over Riemannian [Formula: see text] optimization for finding sparse solutions and the effectiveness of the proposed method. Funding: C. Zhang was supported in part by the National Natural Science Foundation of China [Grant 12171027] and the Natural Science Foundation of Beijing [Grant 1202021]. X. Chen was supported in part by the Hong Kong Research Council [Grant PolyU15300219]. S. Ma was supported in part by the National Science Foundation [Grants DMS-2243650 and CCF-2308597], the UC Davis Center for Data Science and Artificial Intelligence Research Innovative Data Science Seed Funding Program, and a startup fund from Rice University.
Article
Full-text available
In this paper, we design a numerical algorithm for solving a simple bilevel program where the lower level program is a nonconvex minimization problem with a convex set constraint. We propose to solve a combined problem in which the first order condition and the value function are both present in the constraints. Since the value function is in general nonsmooth, the combined problem is in general a nonsmooth and nonconvex optimization problem. We propose a smoothing augmented Lagrangian method for solving a general class of nonsmooth and nonconvex constrained optimization problems. We show that, if the sequence of penalty parameters is bounded, then any accumulation point is a Karush-Kuhn-Tucker (KKT) point of the nonsmooth optimization problem. The smoothing augmented Lagrangian method is used to solve the combined problem. Numerical experiments show that the algorithm is efficient for solving the simple bilevel program.
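To make the value-function idea concrete, here is a toy simple bilevel program solved through its value-function reformulation, with an exact-penalty objective and a crude grid search standing in for the smoothing augmented Lagrangian solver. Everything below is an illustrative assumption, not the cited paper's method:

```python
import numpy as np

# Toy simple bilevel program via the value-function reformulation:
#   min_{x,y} F(x,y) = (x-1)^2 + y^2   s.t.  y in argmin_y f(x,y) = (y-x)^2.
# Here V(x) = min_y f(x,y) = 0 in closed form, so the reformulated
# constraint f(x,y) - V(x) <= 0 forces y = x, and the bilevel solution
# is x = y = 0.5 (minimizing (x-1)^2 + x^2 along y = x).
def penalized(z, rho):
    x, y = z
    F = (x - 1.0) ** 2 + y ** 2
    viol = max((y - x) ** 2 - 0.0, 0.0)   # f(x,y) - V(x), with V(x) = 0
    return F + rho * viol                 # exact penalty of the constraint

grid = np.linspace(-2, 2, 401)            # grid search replaces the inner solver
rho = 100.0
best = min(((penalized((x, y), rho), x, y) for x in grid for y in grid))
_, xs, ys = best
```

The penalty term plays the role of the constraint f(x, y) ≤ V(x); for large ρ the penalized minimizer lands on the lower level solution set.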
Article
Full-text available
In J. Guddat et al., eds., Advances in Mathematical Optimization, Ser. Math. Res. 45, Akademie Verlag, Berlin, pp. 114–125 (1988). Necessary and sufficient conditions for convergence of Newton's method are presented for nonsmooth equations. Sufficient conditions for generalized equations and an example with perpetually alternating Newton iterates are also included. Proposition 3 covers the known semismooth Newton approaches.
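A toy instance of Newton's method for a nonsmooth equation (our own example, not one from the cited work) replaces f'(x) with an element of the generalized derivative:

```python
# Semismooth Newton sketch: solve f(x) = x^3 + |x| - 2 = 0, root x = 1,
# using an element of Clarke's generalized derivative in place of f'(x).
def f(x):
    return x ** 3 + abs(x) - 2.0

def g(x):
    # 3x^2 + s with s an element of the generalized derivative of |x|
    # (any s in [-1, 1] is admissible at x = 0)
    return 3.0 * x * x + (1.0 if x >= 0 else -1.0)

x = 2.0
for _ in range(20):
    x = x - f(x) / g(x)   # Newton step with a generalized derivative
# locally superlinear convergence to the root x = 1
```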
Article
Full-text available
Chen and Mangasarian (Comput Optim Appl 5:97–138, 1996) developed smoothing approximations to the plus function built on integral convolution with density functions. X. Chen (Math Program 134:71–99, 2012) recently picked up this idea, constructing a large class of smoothing functions for nonsmooth minimization through composition with smooth mappings. In this paper, we generalize this idea by replacing the plus function with an arbitrary finite max-function. Calculus rules such as inner and outer composition with smooth mappings are provided, showing that the new class of smoothing functions satisfies, under reasonable assumptions, gradient consistency, a fundamental concept coined by Chen (Math Program 134:71–99, 2012). In particular, this guarantees the desired limiting behavior of critical points of the smooth approximations.
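For instance, the classical "neural network" member of the Chen–Mangasarian family smooths the plus function as φ_μ(t) = μ log(1 + exp(t/μ)), with the standard uniform bound 0 ≤ φ_μ(t) - (t)_+ ≤ μ log 2, which can be checked numerically:

```python
import numpy as np

# Chen-Mangasarian "neural network" smoothing of the plus function (t)_+,
# obtained by integral convolution with the density e^{-s} / (1 + e^{-s})^2:
#   phi_mu(t) = mu * log(1 + exp(t / mu)).
def phi(t, mu):
    # logaddexp avoids overflow for large positive t / mu
    return mu * np.logaddexp(0.0, t / mu)

t = np.linspace(-3, 3, 7)
for mu in (1.0, 0.1, 0.001):
    gap = phi(t, mu) - np.maximum(t, 0.0)
    # uniform approximation: 0 <= gap <= mu * log(2), attained at t = 0
```

The derivative φ_μ'(t) = 1/(1 + exp(-t/μ)) tends to 0 or 1 away from the kink, illustrating the gradient consistency discussed above.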
Article
Full-text available
In this paper, we consider a simple bilevel program where the lower level program is a nonconvex minimization problem with a convex set constraint and the upper level program has a convex set constraint. By using the value function of the lower level program, we reformulate the bilevel program as a single level optimization problem with a nonsmooth inequality constraint and a convex set constraint. To deal with such a nonsmooth and nonconvex optimization problem, we design a smoothing projected gradient algorithm for a general optimization problem with a nonsmooth inequality constraint and a convex set constraint. We show that, if the sequence of penalty parameters is bounded then any accumulation point is a stationary point of the nonsmooth optimization problem and, if the generated sequence is convergent and the extended Mangasarian-Fromovitz constraint qualification holds at the limit then the limit point is a stationary point of the nonsmooth optimization problem. We apply the smoothing projected gradient algorithm to the bilevel program if a calmness condition holds and to an approximate bilevel program otherwise. Preliminary numerical experiments show that the algorithm is efficient for solving the simple bilevel program.
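The basic iteration, a projection after a gradient step on the smoothed objective, can be sketched on a one-dimensional toy problem; the μ-update below is our own simple stand-in for the algorithm's rule:

```python
# Smoothing projected gradient sketch (illustrative schedule, not the paper's):
# minimize f(x) = |x - 1| over the box X = [-2, 0.5], using the Huber
# smoothing f_mu of f and the Euclidean projection onto X.
def proj(t):
    return min(max(t, -2.0), 0.5)          # projection onto the convex set X

def grad(t, mu):
    return min(max((t - 1.0) / mu, -1.0), 1.0)  # gradient of the Huber smoothing

x, mu = -2.0, 1.0
for k in range(1, 201):
    x = proj(x - mu * grad(x, mu))   # step size mu = 1 / Lipschitz(grad f_mu)
    if k % 20 == 0:
        mu *= 0.5                    # shrink the smoothing parameter
# the constrained minimizer is the boundary point x = 0.5
```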
Article
A new algorithm for inequality constrained optimization is presented, which solves a linear programming subproblem and a quadratic subproblem at each iteration. The algorithm can circumvent the difficulties associated with the possible inconsistency of the QP subproblem of the original SQP method. Moreover, the algorithm can converge to a point satisfying a certain first-order necessary condition even if the original problem is itself infeasible. Under certain conditions, global convergence results are proved, and local superlinear convergence results are also obtained. Preliminary numerical results are reported.
Article
We consider a class of smoothing methods for minimization problems where the feasible set is convex but the objective function is not convex, not differentiable, and perhaps not even locally Lipschitz at the solutions. Such optimization problems arise in a wide range of applications, including image restoration, signal reconstruction, variable selection, optimal control, stochastic equilibrium, and spherical approximations. In this paper, we focus on smoothing methods for solving such optimization problems, which use the structure of the minimization problems and compositions of smoothing functions for the plus function (x)_+. Many existing optimization algorithms and codes can be used in the inner iteration of the smoothing methods. We present properties of the smoothing functions and the gradient consistency of the subdifferential associated with a smoothing function. Moreover, we describe how to update the smoothing parameter in the outer iteration of the smoothing methods to guarantee convergence of the smoothing methods to a stationary point of the original minimization problem.
Article
In this paper we introduce and analyze a class of optimization methods, called projected sequential quadratic programming (SQP) methods, for the solution of optimization problems with nonlinear equality constraints and simple bound constraints on parts of the variables. Such problems frequently arise in the numerical solution of optimal control problems. Projected SQP methods combine the ideas of projected Newton methods and SQP methods. They use the simple projection onto the set defined by the bound constraints and maintain feasibility with respect to these constraints. The iterates are computed using an extension of SQP methods and require only the solution of the linearized equality constraint. Global convergence of these methods is enforced using a constrained merit function and an Armijo-like line search. We discuss global and local convergence properties of these methods, the identification of active indices, and we present numerical examples for an optimal control problem governed by a nonlinear heat equation.
Article
We propose a continuation method for a class of nonlinear complementarity problems (NCPs), including the NCP with a P_0 and R_0 function and the monotone NCP with a feasible interior point. The continuation method is based on a class of Chen–Mangasarian smooth functions. Unlike many existing continuation methods, the method follows non-interior smoothing paths, and as a result an initial point can be easily constructed. In addition, we introduce a procedure to dynamically update the neighborhoods associated with the smoothing paths, so that the algorithm is both globally convergent and locally superlinearly convergent under suitable assumptions. Finally, a hybrid continuation-smoothing method is proposed and is shown to have the same convergence properties under weaker conditions. 1 Introduction. Let F : R^n → R^n be a continuously differentiable function. The nonlinear complementarity problem, denoted by NCP(F), is to find a vector (x, y) ∈ R^n × R^n such that F(x)...
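A one-dimensional illustration of such a continuation (our own toy, using the CHKS member of the Chen–Mangasarian family):

```python
import math

# CHKS smoothing of the complementarity condition min(a, b) = 0:
#   phi_mu(a, b) = a + b - sqrt((a-b)^2 + 4*mu^2),
# which recovers 2*min(a, b) as mu -> 0. Toy 1-D NCP with F(x) = x - 1
# (our own example; its solution is x = 1), following the usual
# non-interior continuation idea: Newton steps on phi_mu = 0 while mu -> 0.
F = lambda x: x - 1.0

def phi(x, mu):
    return x + F(x) - math.sqrt((x - F(x)) ** 2 + 4.0 * mu * mu)

def dphi(x):
    # for this F, x - F(x) = 1 is constant, so only x + F(x) varies: slope 2
    return 2.0

x, mu = 5.0, 1.0
for _ in range(30):
    x -= phi(x, mu) / dphi(x)   # one Newton step on the smoothed equation
    mu *= 0.5                   # drive the smoothing parameter to zero
# x approaches the NCP solution x = 1
```

Because the smoothed path x(μ) = (1 + sqrt(1 + 4μ²))/2 is well defined for every μ > 0, the iteration can start from an arbitrary point, which is the practical advantage of non-interior paths noted in the abstract.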