SIAM J. OPTIM., Vol. 25, No. 3, pp. 1388–1410. © 2015 Society for Industrial and Applied Mathematics.
SMOOTHING SQP METHODS FOR SOLVING DEGENERATE
NONSMOOTH CONSTRAINED OPTIMIZATION PROBLEMS WITH
APPLICATIONS TO BILEVEL PROGRAMS∗
MENGWEI XU†, JANE J. YE‡, AND LIWEI ZHANG§
Abstract. We consider a degenerate nonsmooth and nonconvex optimization problem for which
the standard constraint qualification such as the generalized Mangasarian–Fromovitz constraint qual-
ification (GMFCQ) may not hold. We use smoothing functions with the gradient consistency property
to approximate the nonsmooth functions and introduce a smoothing sequential quadratic program-
ming (SQP) algorithm under the l∞ penalty framework. We show that any accumulation point of a
selected subsequence of the iteration sequence generated by the smoothing SQP algorithm is a Clarke
stationary point, provided that the sequence of multipliers and the sequence of penalty parameters
are bounded. Furthermore, we propose a new condition called the weakly generalized Mangasarian–
Fromovitz constraint qualification (WGMFCQ) that is weaker than the GMFCQ. We show that the
extended version of the WGMFCQ guarantees the boundedness of the sequence of multipliers and
the sequence of penalty parameters and thus guarantees the global convergence of the smoothing
SQP algorithm. We demonstrate that the WGMFCQ can be satisfied by bilevel programs for which
the GMFCQ never holds. Preliminary numerical experiments show that the algorithm is efficient for
solving degenerate nonsmooth optimization problems such as the simple bilevel program.
Key words. nonsmooth optimization, constrained optimization, smoothing function, sequential
quadratic programming algorithm, bilevel program, constraint qualification
AMS subject classifications. 65K10, 90C26, 90C30
DOI. 10.1137/140971580
1. Introduction. In this paper, we consider the constrained optimization prob-
lem of the form
    (P)    min  f(x)
           s.t. g_i(x) ≤ 0,   i = 1, ..., p,
                h_j(x) = 0,   j = p+1, ..., q,

where the objective function and constraint functions f, g_i (i = 1, ..., p), h_j (j = p+1, ..., q): R^n → R are locally Lipschitz. In particular, our focus is on solving
a degenerate problem for which the generalized Mangasarian–Fromovitz constraint
qualification (GMFCQ) may not hold at a stationary point.
∗Received by the editors June 4, 2014; accepted for publication (in revised form) May 8, 2015; published electronically July 14, 2015.
http://www.siam.org/journals/siopt/25-3/97158.html
†Department of Mathematics, School of Science, Tianjin University, Tianjin 300072, China (xumengw@hotmail.com).
‡Department of Mathematics and Statistics, University of Victoria, Victoria, BC V8W 2Y2, Canada (janeye@uvic.ca). The research of this author was partially supported by NSERC.
§School of Mathematical Sciences, Dalian University of Technology, Dalian 116024, China (lwzhang@dlut.edu.cn). The research of this author was supported by the National Natural Science Foundation of China under project 91330206.
The sequential quadratic programming (SQP) method is one of the most effective methods for solving smooth constrained optimization problems. For the current iteration point x_k, the basic idea of the SQP method is to generate a descent direction d_k by solving the following quadratic programming problem:

    min_d   ∇f(x_k)^T d + (1/2) d^T W_k d
    s.t.    g_i(x_k) + ∇g_i(x_k)^T d ≤ 0,   i = 1, ..., p,
            h_j(x_k) + ∇h_j(x_k)^T d = 0,   j = p+1, ..., q,

where ∇f(x) denotes the gradient of the function f at x and W_k is a symmetric positive definite matrix that approximates the Hessian matrix of the Lagrangian function. Then d_k is used to generate the next iteration point: x_{k+1} := x_k + α_k d_k, where the stepsize α_k is chosen to yield a sufficient decrease of a suitable merit function.
SQP algorithm with αk= 1 was first studied by Wilson in [44], where the exact
Hessian matrix of the Lagrangian function was used as Wk. Garcia-Palomares and
Mangasarian [19] proposed to use an estimate to approximate the Hessian matrix. Han
[21] proposed to update the matrix Wkby the Broyden–Fletcher–Goldfarb–Shanno
(BFGS) formula. When the stepsize αk= 1, the convergence is only local. To
obtain a global convergence, Han [22] proposed to use the classical l1penalty function
as a merit function to determine the step size. While the l1penalty function is not
differentiable, the authors of [36] suggested using the augmented Lagrangian function,
which is a smooth function as a merit function. The inconsistency of the system of the
linearized constraints is a serious limitation of the SQP method. Several techniques
have been introduced to deal with the possible inconsistency. For example, Pantoja
and Mayne [35] proposed to replace the standard SQP subproblem by the following
penalized SQP subproblem:
    min_{d,ξ}   ∇f(x_k)^T d + (1/2) d^T W_k d + r_k ξ
    s.t.        g_i(x_k) + ∇g_i(x_k)^T d ≤ ξ,   i = 1, ..., p,
                −ξ ≤ h_j(x_k) + ∇h_j(x_k)^T d ≤ ξ,   j = p+1, ..., q,
                ξ ≥ 0,

where the penalty parameter r_k > 0. Unlike the standard SQP subproblem, which may not have feasible solutions, the penalized SQP subproblem is always feasible.
Other methods for dealing with the possible inconsistency of the SQP subproblem have also been presented [3, 17, 20, 29, 40, 41, 50]. For nonlinear programs which have some simple bound
constraints on some of the variables, Heinkenschloss [23] proposed a projected SQP
method which combines the ideas of the projected Newton methods and the SQP
method.
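To make the structure of the Pantoja–Mayne subproblem concrete, the sketch below solves one instance of the penalized QP in the variables (d, ξ) with an off-the-shelf convex solver. It is only an illustration added here: the problem data (gradients, constraint values, the matrix W_k, and the penalty parameter r_k) are placeholder inputs, and the use of cvxpy is an assumption of this sketch, not a solver prescribed by the paper.

```python
import numpy as np
import cvxpy as cp

def solve_penalized_qp(grad_f, W, g, G, h, H, r):
    """Solve the Pantoja-Mayne penalized QP subproblem in (d, xi).

    grad_f : gradient of the objective at x_k, shape (n,)
    W      : symmetric positive definite matrix, shape (n, n)
    g, G   : inequality values g_i(x_k), shape (p,), and gradients, shape (p, n)
    h, H   : equality values h_j(x_k), shape (q-p,), and gradients, shape (q-p, n)
    r      : penalty parameter r_k > 0
    """
    n = grad_f.size
    d = cp.Variable(n)
    xi = cp.Variable(nonneg=True)              # xi >= 0
    objective = cp.Minimize(grad_f @ d + 0.5 * cp.quad_form(d, W) + r * xi)
    constraints = [
        g + G @ d <= xi,                       # linearized inequalities, relaxed by xi
        h + H @ d <= xi,                       # |linearized equalities| <= xi
        -(h + H @ d) <= xi,
    ]
    cp.Problem(objective, constraints).solve()
    return d.value, xi.value

# Tiny made-up instance with two variables, one inequality, and one equality.
grad_f = np.array([1.0, -2.0])
W = np.eye(2)
g, G = np.array([0.5]), np.array([[1.0, 1.0]])
h, H = np.array([-0.3]), np.array([[1.0, -1.0]])
d, xi = solve_penalized_qp(grad_f, W, g, G, h, H, r=10.0)
print("d =", d, "xi =", xi)
```

Because ξ can absorb any infeasibility of the linearized constraints (d = 0 with ξ large is always feasible), this subproblem is solvable regardless of the iterate, which is exactly the property inherited by the smoothing SQP subproblem (QP)_k of section 3.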
Recently Curtis and Overton [12] pointed out that applying SQP methods di-
rectly to a general nonsmooth and nonconvex constrained optimization problem will
fail in theory and in practice. They employed a gradient sampling (GS) procedure to make the search direction effective in nonsmooth regions and proved that
the iteration points generated by the SQP-GS method converge globally to a station-
ary point of the penalty function with probability one. A smoothing method is a
well-recognized technique for numerical solution of a nonsmooth optimization prob-
lem. Using a smoothing method, one replaces the nonsmooth function by a suitable
smooth approximation, solves a sequence of smooth problems, and drives the approx-
imation closer and closer to the original problem. The fundamental question is as
follows: What property should a family of the smoothing functions have in order for
the stationary points of the smoothing problems to approach a stationary point of
the original problem? In most of the literature, a particular smoothing function is
employed for the particular problem studied. It turns out that not all smooth approx-
imations of the nonsmooth function can be used in the smoothing technique to obtain
the desired result; an example for which the smoothing method fails to converge with
almost all initial points was given by Kummer [26]. Zhang and Chen [49] (see also
the recent survey on the subject by Chen [8]) identified the desired property as the
gradient consistency property. Zhang and Chen [49] proposed a smoothing projected
gradient algorithm for solving optimization problems with a convex set constraint
by using a family of smoothing functions with the gradient consistency property to
approximate the nonsmooth objective function. They proved that any accumulation
point of the iteration sequence is a Clarke stationary point of the original nonsmooth
optimization problem. Recently [27, 45] extended the result of [49] to a class of non-
smooth constrained optimization problems using the projected gradient method and
the augmented Lagrangian method, respectively. Smoothing functions were proposed
and the SQP method was used for the smooth problem in [18, 25] to solve the mathe-
matical programs with complementarity constraints (MPCC) and in [28, 42] to solve
the semi-infinite programming (SIP). In this paper we will combine the SQP method
and the smoothing technique to design a smoothing SQP method for a class of general
constrained optimization problems with smoothing functions satisfying the gradient
consistency property.
For the SQP method under a penalty framework to converge globally, usually the
set of the multipliers is required to be bounded (see, e.g., [2]). This amounts to saying
that the MFCQ is required to hold. For the nonsmooth optimization problem, the
corresponding MFCQ is referred to as the GMFCQ. Unfortunately, the GMFCQ is
quite strong for certain classes of problems. For example, it is well known by now
that the GMFCQ never holds for the bilevel program [46]. Another example of a non-
smooth optimization problem which does not satisfy the GMFCQ is a reformulation
of an SIP [28]. In this paper we propose a new constraint qualification that is much
weaker than the GMFCQ. We call it the weakly generalized Mangasarian–Fromovitz
constraint qualification (WGMFCQ). WGMFCQ is not a constraint qualification in
the classical sense. It is defined in terms of the smoothing functions and the se-
quence of iteration points generated by the smoothing algorithm. In our numerical
experiments, the WGMFCQ is easily satisfied for the bilevel programs.
In our problem, both the objective function and the constraint functions may be nonsmooth. We first use smoothing functions to approximate the nonsmooth functions and then consider the robust formulation proposed by Pantoja and Mayne. Under the EWGMFCQ, an extended version of the WGMFCQ, global convergence can be obtained.
The rest of the paper is organized as follows. In section 2, we present prelimi-
naries which will be used in this paper and introduce the new constraint qualification
WGMFCQ. In section 3, we consider the smoothing approximations of the original
problem and propose the smoothing SQP method under an l∞penalty framework.
Then we establish the global convergence for the algorithm. In section 4, we apply
the smoothing SQP method to bilevel programs. The final section contains some
concluding remarks.
We adopt the following standard notation in this paper. For any two vectors a and b in R^n, we denote their inner product by a^T b. Given a function G: R^n → R^m, we denote its Jacobian by ∇G(z) ∈ R^{m×n}, and, if m = 1, the gradient ∇G(z) ∈ R^n is considered as a column vector. For a set Ω ⊆ R^n, we denote the interior, the relative interior, the closure, the convex hull, and the distance from x to Ω by int Ω, ri Ω, cl Ω, co Ω, and dist(x, Ω), respectively. For a matrix A ∈ R^{n×m}, A^T denotes its transpose.
In addition, we let N be the set of nonnegative integers and exp[z] be the exponential function.
2. Preliminaries and the new constraint qualifications. In this section,
we first present some background materials and results which will be used later. We
then discuss the issue of constraint qualifications.
Let ϕ: R^n → R be Lipschitz continuous near x̄. The directional derivative of ϕ at x̄ in direction d is defined by

    ϕ′(x̄; d) := lim_{t↓0} [ϕ(x̄ + td) − ϕ(x̄)] / t.

The Clarke generalized directional derivative of ϕ at x̄ in direction d is defined by

    ϕ°(x̄; d) := lim sup_{x→x̄, t↓0} [ϕ(x + td) − ϕ(x)] / t.

The Clarke generalized gradient of ϕ at x̄ is a convex and compact subset of R^n defined by

    ∂ϕ(x̄) := {ξ ∈ R^n : ξ^T d ≤ ϕ°(x̄; d) for all d ∈ R^n}.

Note that when ϕ is convex, the Clarke generalized gradient coincides with the subdifferential in the sense of convex analysis, i.e.,

    ∂ϕ(x̄) = {ξ ∈ R^n : ξ^T (x − x̄) ≤ ϕ(x) − ϕ(x̄) for all x ∈ R^n},

and, when ϕ is continuously differentiable at x̄, we have ∂ϕ(x̄) = {∇ϕ(x̄)}. Detailed discussions of the Clarke generalized gradient and its properties can be found in [10, 11].
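For a quick illustration (an example added here, not taken from the paper), consider ϕ(x) = |x| on R. Working directly from the definitions above,

```latex
\varphi^{\circ}(0;d)=\limsup_{x\to 0,\ t\downarrow 0}\frac{|x+td|-|x|}{t}=|d|,
\qquad\text{so}\qquad
\partial\varphi(0)=\{\xi\in\mathbb{R}:\ \xi d\le|d|\ \text{for all } d\in\mathbb{R}\}=[-1,1],
```

while ∂ϕ(x) = {sign(x)} = {∇ϕ(x)} at every x ≠ 0, in agreement with the continuously differentiable case noted above.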
For x̄, a feasible solution of problem (P), we denote by I(x̄) := {i = 1, ..., p : g_i(x̄) = 0} the active set at x̄. The following nonsmooth Fritz John–type multiplier rule holds by Clarke [10, Theorem 6.1.1] and the nonsmooth calculus (see, e.g., [10]).

Theorem 2.1 (Fritz John multiplier rule). Let x̄ be a local optimal solution of problem (P). Then there exist r ≥ 0, λ_i ≥ 0 (i ∈ I(x̄)), λ_j ∈ R (j = p+1, ..., q), not all zero, such that

    0 ∈ r ∂f(x̄) + Σ_{i∈I(x̄)} λ_i ∂g_i(x̄) + Σ_{j=p+1}^{q} λ_j ∂h_j(x̄).    (2.1)

There are two possible cases in the Fritz John multiplier rule: r > 0 or r = 0. Let x̄ be a feasible solution of problem (P). If the Fritz John condition (2.1) holds with r > 0, then we call x̄ a (Clarke) stationary point of (P). According to Clarke [10], any multiplier λ ∈ R^q with λ_i ≥ 0, i = 1, ..., p, satisfying the Fritz John condition (2.1) with r = 0 is an abnormal multiplier. From the Fritz John multiplier rule, it is easy to see that if there is no nonzero abnormal multiplier, then any local optimal solution x̄ must be a stationary point. Hence it is natural to define the following constraint qualification.
Definition 2.1 (NNAMCQ). We say that the no nonzero abnormal multiplier constraint qualification (NNAMCQ) holds at a feasible point x̄ of problem (P) if

    0 ∈ Σ_{i∈I(x̄)} λ_i ∂g_i(x̄) + Σ_{j=p+1}^{q} λ_j ∂h_j(x̄)   and   λ_i ≥ 0, i ∈ I(x̄),
    ⟹  λ_i = 0, i ∈ I(x̄),   λ_j = 0, j = p+1, ..., q.
It is easy to see that the NNAMCQ amounts to saying that any collection of vectors

    {v_i, i ∈ I(x̄), v_{p+1}, ..., v_q},

where v_i ∈ ∂g_i(x̄) (i ∈ I(x̄)) and v_j ∈ ∂h_j(x̄) (j = p+1, ..., q), is positively linearly independent. The NNAMCQ is equivalent to the generalized MFCQ, which was first introduced by Hiriart-Urruty [24].
Definition 2.2 (GMFCQ). A feasible point x̄ is said to satisfy the generalized Mangasarian–Fromovitz constraint qualification (GMFCQ) for problem (P) if for any given collection of vectors {v_i, i ∈ I(x̄), v_{p+1}, ..., v_q}, where v_i ∈ ∂g_i(x̄) (i ∈ I(x̄)) and v_j ∈ ∂h_j(x̄) (j = p+1, ..., q), the following two conditions hold:
(i) v_{p+1}, ..., v_q are linearly independent.
(ii) There exists a direction d such that

    v_i^T d < 0,   i ∈ I(x̄),
    v_j^T d = 0,   j = p+1, ..., q.
In order to accommodate infeasible accumulation points in the numerical algo-
rithm, we now extend the NNAMCQ and the GMFCQ to allow infeasible points.
Note that when x̄ is feasible, the ENNAMCQ and EGMFCQ (see Definitions 2.3 and 2.4) reduce to the NNAMCQ and GMFCQ, respectively.
Definition 2.3 (ENNAMCQ). We say that the extended no nonzero abnormal multiplier constraint qualification (ENNAMCQ) holds at x̄ ∈ R^n if

    0 ∈ Σ_{i=1}^{p} λ_i ∂g_i(x̄) + Σ_{j=p+1}^{q} λ_j ∂h_j(x̄)   and   λ_i ≥ 0, i = 1, ..., p,
    Σ_{i=1}^{p} λ_i g_i(x̄) + Σ_{j=p+1}^{q} λ_j h_j(x̄) ≥ 0

imply that λ_i = 0, λ_j = 0 for all i = 1, ..., p, j = p+1, ..., q.
Definition 2.4 (EGMFCQ). A point x̄ ∈ R^n is said to satisfy the extended generalized Mangasarian–Fromovitz constraint qualification (EGMFCQ) for problem (P) if for any given collection of vectors {v_i, v_j : i = 1, ..., p, j = p+1, ..., q}, where v_i ∈ ∂g_i(x̄) and v_j ∈ ∂h_j(x̄), the following two conditions hold:
(i) v_{p+1}, ..., v_q are linearly independent.
(ii) There exists a direction d such that

    g_i(x̄) + v_i^T d < 0,   i = 1, ..., p,
    h_j(x̄) + v_j^T d = 0,   j = p+1, ..., q.

Note that under the extra assumption that the functions g_i are directionally differentiable, the EGMFCQ coincides with conditions (B4) and (B5) in [25].
Since the set of the Clarke generalized gradient can be large, the ENNAMCQ and
the EGMFCQ may be too strong for some problems to hold. In what follows, we pro-
pose two conditions that are much weaker than the ENNAMCQ and the EGMFCQ,
respectively. For this purpose, we first recall the definition of smoothing functions.
Definition 2.5. Let g: R^n → R be a locally Lipschitz function. Assume that, for a given ρ > 0, g_ρ: R^n → R is a continuously differentiable function. We say that {g_ρ : ρ > 0} is a family of smoothing functions of g if lim_{z→x, ρ↑∞} g_ρ(z) = g(x) for any fixed x ∈ R^n.
Such a sequence g_ρ(·) converges continuously to g(·) as defined in [38].
Definition 2.6 (see [4, 9]). Let g: R^n → R be a locally Lipschitz continuous function. We say that a family of smoothing functions {g_ρ : ρ > 0} of g satisfies the gradient consistency property if lim sup_{z→x, ρ↑∞} ∇g_ρ(z) is nonempty and lim sup_{z→x, ρ↑∞} ∇g_ρ(z) ⊆ ∂g(x) for any x ∈ R^n, where lim sup_{z→x, ρ↑∞} ∇g_ρ(z) denotes the set of all limiting points

    lim sup_{z→x, ρ↑∞} ∇g_ρ(z) := { lim_{k→∞} ∇g_{ρ_k}(z_k) : z_k → x, ρ_k ↑ ∞ }.
Note that according to [38, Theorem 9.61 and Corollary 8.47(b)], for a locally Lipschitz function g and its smoothing family {g_ρ : ρ > 0}, one always has the inclusion

    ∂g(x) ⊆ co lim sup_{z→x, ρ↑∞} ∇g_ρ(z).

Thus our definition of gradient consistency is equivalent to saying that

    ∂g(x) = co lim sup_{z→x, ρ↑∞} ∇g_ρ(z),

which is the definition used in [5, 8].
It is natural to ask whether one can always find a family of smoothing functions with the gradient consistency property for a locally Lipschitz function. The answer is yes. Rockafellar and Wets [38, Example 7.19 and Theorem 9.67] show that for any locally Lipschitz function g, one can construct a family of smoothing functions of g with the gradient consistency property by the integral convolution:

    g_ρ(x) := ∫_{R^n} g(x − y) φ_ρ(y) dy = ∫_{R^n} g(y) φ_ρ(x − y) dy,

where φ_ρ: R^n → R_+ is a sequence of bounded, measurable functions with ∫_{R^n} φ_ρ(x) dx = 1 such that the sets B_ρ = {x : φ_ρ(x) > 0} form a bounded sequence converging to {0} as ρ ↑ ∞. Although one can always generate a family of smoothing functions with the gradient consistency property by integral convolution with bounded supports, there are many other smoothing functions which are not generated by the integral convolution with bounded supports [5, 6, 7, 8, 32].
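As a concrete illustration (added here, not part of the paper), the family g_ρ(x) = sqrt(x^2 + ρ^{−2}) smooths g(x) = |x| and satisfies the gradient consistency property: its gradients z/sqrt(z^2 + ρ^{−2}) accumulate on {sign(x)} for x ≠ 0 and on a subset of ∂|x|(0) = [−1, 1] at x = 0. The short Python check below evaluates these limits numerically along the paths z_k = c/ρ_k; this particular smoothing family is an assumption made only for the sketch.

```python
import numpy as np

def g(x):
    """The nonsmooth function being smoothed: g(x) = |x|."""
    return abs(x)

def g_rho(z, rho):
    """An illustrative smoothing family for |x|: g_rho(z) = sqrt(z^2 + rho^-2)."""
    return np.sqrt(z * z + rho ** (-2.0))

def grad_g_rho(z, rho):
    """Gradient of the smoothing function: z / sqrt(z^2 + rho^-2)."""
    return z / np.sqrt(z * z + rho ** (-2.0))

# Convergence g_rho(z_k) -> g(0) = 0 and gradient consistency at x = 0:
# along z_k = c / rho_k the gradients tend to c / sqrt(c^2 + 1), an element of
# the subdifferential [-1, 1]; different c recover different elements.
for c in (-5.0, 0.0, 1.0, 5.0):
    rhos = 10.0 ** np.arange(1, 8)
    zs = c / rhos
    print(f"c = {c:+.1f}:  g_rho -> {g_rho(zs, rhos)[-1]:.1e},  "
          f"grad -> {grad_g_rho(zs, rhos)[-1]:+.4f}")
```

At any x ≠ 0 the gradients converge to sign(x) = ∇g(x), so the set of limiting points is contained in ∂g(x) everywhere, which is the inclusion required in Definition 2.6.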
Using the smoothing technique, we approximate the locally Lipschitz functions f(x), g_i(x), i = 1, ..., p, and h_j(x), j = p+1, ..., q, by families of smoothing functions {f_ρ(x) : ρ > 0}, {g^i_ρ(x) : ρ > 0}, i = 1, ..., p, and {h^j_ρ(x) : ρ > 0}, j = p+1, ..., q. We also assume that these families of smoothing functions satisfy the gradient consistency property. We use certain algorithms to solve the smooth problem and drive the smoothing parameter ρ to infinity. Based on the sequence of iteration points of the algorithm, we now define the new conditions.
Definition 2.7 (WNNAMCQ). Let {x_k} be a sequence of iteration points for problem (P), and let ρ_k ↑ ∞ as k → ∞. Suppose that x̄ is a feasible accumulation point of the sequence {x_k}. We say that the weakly no nonzero abnormal multiplier constraint qualification (WNNAMCQ) based on the smoothing functions {g^i_ρ(x) : ρ > 0}, i = 1, ..., p, and {h^j_ρ(x) : ρ > 0}, j = p+1, ..., q, holds at x̄, provided that for any K_0 ⊂ K ⊂ N such that lim_{k→∞, k∈K} x_k = x̄ and any collection of vectors {v_i (i ∈ I(x̄)), v_j (j = p+1, ..., q)}, where

    v_i = lim_{k→∞, k∈K_0} ∇g^i_{ρ_k}(x_k), i ∈ I(x̄),   v_j = lim_{k→∞, k∈K_0} ∇h^j_{ρ_k}(x_k), j = p+1, ..., q,

    0 = Σ_{i∈I(x̄)} λ_i v_i + Σ_{j=p+1}^{q} λ_j v_j   and   λ_i ≥ 0, i ∈ I(x̄)
    ⟹  λ_i = 0, i ∈ I(x̄),   λ_j = 0, j = p+1, ..., q.
Definition 2.8 (WGMFCQ). Let {x_k} be a sequence of iteration points for problem (P), and let ρ_k ↑ ∞ as k → ∞. Let x̄ be a feasible accumulation point of the sequence {x_k}. We say that the weakly generalized Mangasarian–Fromovitz constraint qualification (WGMFCQ) based on the smoothing functions {g^i_ρ(x) : ρ > 0}, i = 1, ..., p, and {h^j_ρ(x) : ρ > 0}, j = p+1, ..., q, holds at x̄, provided that the following conditions hold. For any K_0 ⊂ K ⊂ N such that lim_{k→∞, k∈K} x_k = x̄ and any collection of vectors {v_i (i ∈ I(x̄)), v_j (j = p+1, ..., q)}, where v_i = lim_{k→∞, k∈K_0} ∇g^i_{ρ_k}(x_k), i ∈ I(x̄), and v_j = lim_{k→∞, k∈K_0} ∇h^j_{ρ_k}(x_k), j = p+1, ..., q,
(i) v_{p+1}, ..., v_q are linearly independent;
(ii) there exists a direction d such that

    v_i^T d < 0   for all i ∈ I(x̄),
    v_j^T d = 0   for all j = p+1, ..., q.
We now extend the WNNAMCQ and the WGMFCQ to accommodate infeasible
points.
Definition 2.9 (EWNNAMCQ). Let {x_k} be a sequence of iteration points for problem (P), and let ρ_k ↑ ∞ as k → ∞. Let x̄ be an accumulation point of the sequence {x_k}. We say that the extended weakly no nonzero abnormal multiplier constraint qualification (EWNNAMCQ) based on the smoothing functions {g^i_ρ(x) : ρ > 0}, i = 1, ..., p, and {h^j_ρ(x) : ρ > 0}, j = p+1, ..., q, holds at x̄, provided that the following condition holds. For any K_0 ⊂ K ⊂ N such that lim_{k→∞, k∈K} x_k = x̄ and any

    v_i = lim_{k→∞, k∈K_0} ∇g^i_{ρ_k}(x_k), i = 1, ..., p,   v_j = lim_{k→∞, k∈K_0} ∇h^j_{ρ_k}(x_k), j = p+1, ..., q,

    0 = Σ_{i=1}^{p} λ_i v_i + Σ_{j=p+1}^{q} λ_j v_j   and   λ_i ≥ 0, i = 1, ..., p,    (2.2)
    Σ_{i=1}^{p} λ_i g_i(x̄) + Σ_{j=p+1}^{q} λ_j h_j(x̄) ≥ 0    (2.3)

imply that λ_i = 0, λ_j = 0, i = 1, ..., p, j = p+1, ..., q.
Definition 2.10 (EWGMFCQ). Let {x_k} be a sequence of iteration points for problem (P), and let ρ_k ↑ ∞ as k → ∞. Let x̄ be an accumulation point of the sequence {x_k}. We say that the extended weakly generalized Mangasarian–Fromovitz constraint qualification (EWGMFCQ) based on the smoothing functions {g^i_ρ(x) : ρ > 0}, i = 1, ..., p, and {h^j_ρ(x) : ρ > 0}, j = p+1, ..., q, holds at x̄, provided that the following conditions hold. For any K_0 ⊂ K ⊂ N such that lim_{k→∞, k∈K} x_k = x̄ and any collection of vectors {v_i, v_j : i = 1, ..., p, j = p+1, ..., q}, where

    v_i = lim_{k→∞, k∈K_0} ∇g^i_{ρ_k}(x_k), i = 1, ..., p,   v_j = lim_{k→∞, k∈K_0} ∇h^j_{ρ_k}(x_k), j = p+1, ..., q,

(i) v_{p+1}, ..., v_q are linearly independent;
(ii) there exists a direction d such that

    g_i(x̄) + v_i^T d < 0   for all i = 1, ..., p,    (2.4)
    h_j(x̄) + v_j^T d = 0   for all j = p+1, ..., q.    (2.5)
Due to the gradient consistency property, it is easy to see that, in general, the
EWNNAMCQ and the EWGMFCQ are weaker than the ENNAMCQ and the EGM-
FCQ, respectively. We finish this section with an equivalence between the EWGM-
FCQ and EWNNAMCQ.
Theorem 2.2. The following equivalence always holds:

    EWGMFCQ ⟺ EWNNAMCQ.
Proof. We first show that EWGMFCQ implies EWNNAMCQ. To the contrary, we suppose that EWGMFCQ holds but EWNNAMCQ does not hold, which means that there exist scalars λ_i ∈ R, i = 1, ..., q, not all zero, such that conditions (2.2)–(2.3) hold. Suppose that d is the direction that satisfies condition (ii) of EWGMFCQ. Due to the linear independence of v_{p+1}, ..., v_q (condition (i) of EWGMFCQ), the scalars λ_i, i = 1, ..., p, cannot all be equal to zero. Multiplying both sides of condition (2.2) by d, it follows from conditions (2.4) and (2.5) that

    0 = Σ_{i=1}^{p} λ_i v_i^T d + Σ_{j=p+1}^{q} λ_j v_j^T d < − Σ_{i=1}^{p} λ_i g_i(x̄) − Σ_{j=p+1}^{q} λ_j h_j(x̄) ≤ 0,

which is a contradiction. Therefore, EWNNAMCQ holds.
We now prove the reverse implication. Assume that EWNNAMCQ holds. EWNNAMCQ implies condition (i) of EWGMFCQ. If both (i) and (ii) of EWGMFCQ hold, we are done. Suppose that condition (ii) of EWGMFCQ does not hold; that is, there exist a subsequence K_0 ⊂ K ⊂ N and v_1, ..., v_q with lim_{k→∞, k∈K} x_k = x̄ and

    v_i = lim_{k→∞, k∈K_0} ∇g^i_{ρ_k}(x_k), i = 1, ..., p,   v_j = lim_{k→∞, k∈K_0} ∇h^j_{ρ_k}(x_k), j = p+1, ..., q,

such that for all directions d, (2.4) or (2.5) fails to hold. Let A := [v_1, ..., v_q] be the matrix with columns v_1, ..., v_q, and define

    S_1 := {z : there exists d such that z = A^T d},
    S_2 := {z : z_i < −g_i(x̄), i = 1, ..., p,  z_j = −h_j(x̄), j = p+1, ..., q}.

Since the convex sets S_1 and cl S_2 are nonempty and ri S_1 and ri cl S_2 have no point in common by the violation of condition (ii) of EWGMFCQ, from [37, Theorem 11.3] there exists a hyperplane separating S_1 and cl S_2 properly. Since S_1 is a subspace and thus a cone, from [37, Theorem 11.7] there exists a hyperplane separating S_1 and cl S_2 properly and passing through the origin. By the separation theorem (see, e.g., [37, Theorem 11.1]), there exists a vector y such that

    inf{y^T z : z ∈ S_1} ≥ 0 ≥ sup{y^T z : z ∈ cl S_2},
    sup{y^T z : z ∈ S_1} > inf{y^T z : z ∈ cl S_2}.    (2.6)
From (2.6), we know that y ≠ 0. Therefore, there exists y ∈ R^q, y ≠ 0, such that y^T z ≥ 0 for all z ∈ S_1 and y^T z ≤ 0 for all z ∈ cl S_2.
(a) We first consider the inequality y^T z ≤ 0 for all z ∈ cl S_2. By taking z^0 ∈ cl S_2 such that z^0_j, j = p+1, ..., q, are constants and z^0_i → −∞, i ∈ {1, ..., p}, we conclude that

    y_i ≥ 0,   i = 1, ..., p.    (2.7)

Choosing z^2 ∈ cl S_2 with z^2_i = −g_i(x̄), i = 1, ..., p, and z^2_j = −h_j(x̄), j = p+1, ..., q, we have

    Σ_{i=1}^{p} y_i g_i(x̄) + Σ_{j=p+1}^{q} y_j h_j(x̄) = −y^T z^2 ≥ 0.    (2.8)
(b) We now consider the inequality y^T z ≥ 0 for all z ∈ S_1. Select an arbitrary d. Then z^1 := A^T d ∈ S_1 and z := −z^1 = A^T(−d) ∈ S_1, and hence

    Σ_{i=1}^{p} y_i v_i^T d + Σ_{j=p+1}^{q} y_j v_j^T d = y^T z^1 ≥ 0,
    Σ_{i=1}^{p} y_i v_i^T (−d) + Σ_{j=p+1}^{q} y_j v_j^T (−d) = y^T z ≥ 0.

That is,

    Σ_{i=1}^{p} y_i v_i + Σ_{j=p+1}^{q} y_j v_j = 0.    (2.9)

Therefore, if there exists a nonzero vector y such that y^T z ≥ 0 for all z ∈ S_1 and y^T z ≤ 0 for all z ∈ cl S_2, then this vector must also satisfy conditions (2.7)–(2.9). However, from the EWNNAMCQ, conditions (2.7)–(2.9) imply that y = 0, which is a contradiction. Thus condition (ii) of EWGMFCQ must hold. The proof is therefore complete.
In the case when there is only one inequality constraint and no equality constraints in problem (P), the EWNNAMCQ and EWGMFCQ at x̄ reduce to the following condition: There is no K_0 ⊂ K ⊂ N such that lim_{k→∞, k∈K} x_k = x̄ and lim_{k→∞, k∈K_0} ∇g^1_{ρ_k}(x_k) = 0. This condition is slightly weaker than a similar condition [28, Assumption (B4)], which requires that there is no K_0 ⊂ N such that lim_{k→∞, k∈K_0} ∇g^1_{ρ_k}(x_k) = 0.
3. Smoothing SQP method. In this section we design the smoothing SQP
algorithm and prove its convergence.
Suppose that {g^i_ρ(x) : ρ > 0} and {h^j_ρ(x) : ρ > 0} are families of smoothing functions for g_i and h_j, respectively. Let x_k be the current iterate, and let (W_k, r_k, ρ_k) be the current updates of the positive definite matrix, the penalty parameter, and the smoothing parameter, respectively. We will try to find a descent direction of a smoothing merit function by using the smoothing SQP subprogram. In order to overcome the inconsistency of the smoothing SQP subprograms, following Pantoja and Mayne [35], we solve the penalized smoothing SQP subprogram:
    (QP)_k    min_{d∈R^n, ξ∈R}   ∇f_{ρ_k}(x_k)^T d + (1/2) d^T W_k d + r_k ξ
              s.t.   g^i_{ρ_k}(x_k) + ∇g^i_{ρ_k}(x_k)^T d ≤ ξ,   i = 1, ..., p,
                     h^j_{ρ_k}(x_k) + ∇h^j_{ρ_k}(x_k)^T d ≤ ξ,   j = p+1, ..., q,
                     −h^j_{ρ_k}(x_k) − ∇h^j_{ρ_k}(x_k)^T d ≤ ξ,   j = p+1, ..., q,
                     ξ ≥ 0.
If (d_k, ξ_k) is a solution of (QP)_k, then its Karush–Kuhn–Tucker (KKT) conditions can be written as

    0 = ∇f_{ρ_k}(x_k) + W_k d_k + Σ_{i=1}^{p} λ^g_{i,k} ∇g^i_{ρ_k}(x_k) + Σ_{j=p+1}^{q} (λ^+_{j,k} − λ^−_{j,k}) ∇h^j_{ρ_k}(x_k),    (3.1)
    0 = r_k − ( Σ_{i=1}^{p} λ^g_{i,k} + Σ_{j=p+1}^{q} (λ^+_{j,k} + λ^−_{j,k}) + λ^ξ_k ),    (3.2)
    0 ≤ λ^g_{i,k} ⊥ (g^i_{ρ_k}(x_k) + ∇g^i_{ρ_k}(x_k)^T d_k − ξ_k) ≤ 0,   i = 1, ..., p,    (3.3)
    0 ≤ λ^+_{j,k} ⊥ (h^j_{ρ_k}(x_k) + ∇h^j_{ρ_k}(x_k)^T d_k − ξ_k) ≤ 0,   j = p+1, ..., q,    (3.4)
    0 ≤ λ^−_{j,k} ⊥ (−h^j_{ρ_k}(x_k) − ∇h^j_{ρ_k}(x_k)^T d_k − ξ_k) ≤ 0,   j = p+1, ..., q,    (3.5)
    0 ≤ λ^ξ_k ⊥ −ξ_k ≤ 0,    (3.6)

where λ_k = (λ^g_k, λ^+_k, λ^−_k, λ^ξ_k) is a corresponding Lagrange multiplier.
Let ρ > 0 and r > 0. We define the smoothing merit function by

    θ_{ρ,r}(x) := f_ρ(x) + r φ_ρ(x),

where φ_ρ(x) := max{0, g^i_ρ(x), i = 1, ..., p, |h^j_ρ(x)|, j = p+1, ..., q}, and propose the following smoothing SQP algorithm.
Algorithm 3.1. Let β and σ_1 be constants in (0, 1), and let σ, σ′, and η̂ be constants in (1, ∞). Choose an initial point x_0, an initial smoothing parameter ρ_0 > 0, an initial penalty parameter r_0 > 0, and an initial positive definite matrix W_0 ∈ R^{n×n}, and set k := 0.
1. Solve (QP)_k to obtain (d_k, ξ_k) with the corresponding Lagrange multiplier λ_k = (λ^g_k, λ^+_k, λ^−_k, λ^ξ_k); go to step 2.
2. If ξ_k = 0, set r_{k+1} := r_k and go to step 3. Otherwise, set r_{k+1} := σ′ r_k and go to step 3.
3. Let x_{k+1} := x_k + α_k d_k, where α_k := β^l and l ∈ {0, 1, 2, ...} is the smallest nonnegative integer satisfying

    θ_{ρ_k,r_k}(x_{k+1}) − θ_{ρ_k,r_k}(x_k) ≤ −σ_1 α_k d_k^T W_k d_k.    (3.7)

If

    ‖d_k‖ ≤ η̂ ρ_k^{−1},    (3.8)

set ρ_{k+1} := σ ρ_k and go to step 4. Otherwise, set ρ_{k+1} := ρ_k and go to step 1. In either case, update to a symmetric positive definite matrix W_{k+1} and set k := k + 1.
4. If a stopping criterion holds, terminate. Otherwise, go to step 1.
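For orientation, the following Python skeleton shows how the steps of Algorithm 3.1 fit together. It is a schematic sketch rather than the authors' implementation: `solve_qp_subproblem` stands for any routine returning (d_k, ξ_k) and the multipliers of (QP)_k (for instance a QP solver applied to the smoothed data), `update_W` stands for a safeguarded (L)BFGS update as discussed in section 4, the handles f_rho, g_rho, h_rho are user-supplied smoothing functions, and the default parameter values and small tolerances merely mirror those reported in the numerical section.

```python
import numpy as np

def phi(x, rho, g_rho, h_rho):
    """Constraint violation phi_rho(x) = max{0, g^i_rho(x), |h^j_rho(x)|}."""
    return max([0.0] + [gi(x, rho) for gi in g_rho] + [abs(hj(x, rho)) for hj in h_rho])

def merit(x, rho, r, f_rho, g_rho, h_rho):
    """Smoothing merit function theta_{rho,r}(x) = f_rho(x) + r * phi_rho(x)."""
    return f_rho(x, rho) + r * phi(x, rho, g_rho, h_rho)

def smoothing_sqp(x, f_rho, g_rho, h_rho, solve_qp_subproblem, update_W,
                  beta=0.8, sigma1=1e-6, sigma=10.0, sigma_prime=10.0,
                  eta_hat=5e5, rho=100.0, r=100.0, eps=1e-7, eps_xi=1e-10,
                  max_iter=200):
    """Schematic sketch of Algorithm 3.1 (smoothing SQP under an l_inf penalty)."""
    W = np.eye(x.size)
    for _ in range(max_iter):
        # Step 1: solve the penalized smoothing QP subproblem (QP)_k.
        d, xi, lam = solve_qp_subproblem(x, rho, r, W, f_rho, g_rho, h_rho)

        # Step 4 (checked here for convenience): stopping test with small tolerances.
        if np.linalg.norm(d) < eps and xi < eps_xi:
            break

        # Step 3: Armijo-type backtracking on theta_{rho_k, r_k}, cf. (3.7).
        theta0 = merit(x, rho, r, f_rho, g_rho, h_rho)
        dWd, alpha = float(d @ W @ d), 1.0
        for _ in range(60):
            if merit(x + alpha * d, rho, r, f_rho, g_rho, h_rho) - theta0 <= -sigma1 * alpha * dWd:
                break
            alpha *= beta
        x_new = x + alpha * d

        # Step 2: enlarge the penalty parameter for the next iteration when xi_k > 0.
        if xi >= eps_xi:
            r *= sigma_prime

        # Drive the smoothing parameter to infinity when the step is small, cf. (3.8).
        if np.linalg.norm(d) <= max(eta_hat / rho, eps):
            rho *= sigma

        # Update the positive definite matrix and move to the next iterate.
        W = update_W(W, x, x_new, lam, rho)
        x = x_new
    return x
```

In an actual run, `solve_qp_subproblem` would assemble (QP)_k from the values and gradients of the smoothing functions at x_k, in the same way as the penalized QP sketch of section 1, with the smoothed data in place of the original functions.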
We now show the global convergence of the smoothing SQP algorithm. For this purpose, we need the following standard assumption.

Assumption 3.1. There exist two positive constants m and M, m ≤ M, such that for each k and each d ∈ R^n,

    m‖d‖^2 ≤ d^T W_k d ≤ M‖d‖^2.
Theorem 3.1. Suppose that {(x_k, ρ_k, d_k, ξ_k, λ_k, r_k, W_k)} is a sequence generated by Algorithm 3.1. Then for every k,

    θ′_{ρ_k,r_k}(x_k; d_k) ≤ −d_k^T W_k d_k,    (3.9)

and d_k is a descent direction of the function θ_{ρ_k,r_k}(x) at x_k, provided that Assumption 3.1 holds. Furthermore, suppose that Algorithm 3.1 does not terminate within finitely many iterations and that the sequences {x_k}, {λ_k}, and {r_k} are bounded. Then K̄ := {k : ‖d_k‖ ≤ η̂ ρ_k^{−1}} is an infinite set, and any accumulation point of the sequence {x_k}_{K̄} is a stationary point of problem (P).
Proof. Since (d_k, ξ_k) is a solution of (QP)_k, the KKT conditions (3.1)–(3.6) hold. The directional derivative of the function x ↦ |h^j_{ρ_k}(x)| at x_k in direction d_k is

    −∇h^j_{ρ_k}(x_k)^T d_k    if h^j_{ρ_k}(x_k) < 0,
    |∇h^j_{ρ_k}(x_k)^T d_k|   if h^j_{ρ_k}(x_k) = 0,
    ∇h^j_{ρ_k}(x_k)^T d_k     if h^j_{ρ_k}(x_k) > 0.

Denote the index sets

    I_k := {i = 1, ..., p : g^i_{ρ_k}(x_k) = φ_{ρ_k}(x_k)},
    J^+_k := {j = p+1, ..., q : h^j_{ρ_k}(x_k) = φ_{ρ_k}(x_k)},
    J^−_k := {j = p+1, ..., q : −h^j_{ρ_k}(x_k) = φ_{ρ_k}(x_k)},

and Γ_k := I_k ∪ J^+_k ∪ J^−_k. Therefore the directional derivative of the function x ↦ φ_{ρ_k}(x) at x_k in direction d_k is

    0    if φ_{ρ_k}(x_k) = 0 and Γ_k = ∅,
    max{0, ∇g^i_{ρ_k}(x_k)^T d_k, i ∈ I_k, |∇h^j_{ρ_k}(x_k)^T d_k|, j ∈ J^+_k}    if φ_{ρ_k}(x_k) = 0 and Γ_k ≠ ∅,
    max{∇g^i_{ρ_k}(x_k)^T d_k, i ∈ I_k, ∇h^j_{ρ_k}(x_k)^T d_k, j ∈ J^+_k, −∇h^j_{ρ_k}(x_k)^T d_k, j ∈ J^−_k}    if φ_{ρ_k}(x_k) > 0.
From (3.3)–(3.5), we have

    ∇g^i_{ρ_k}(x_k)^T d_k ≤ ξ_k − g^i_{ρ_k}(x_k) = ξ_k − φ_{ρ_k}(x_k),   i ∈ I_k,
    ∇h^j_{ρ_k}(x_k)^T d_k ≤ ξ_k − h^j_{ρ_k}(x_k) = ξ_k − φ_{ρ_k}(x_k),   j ∈ J^+_k,
    −∇h^j_{ρ_k}(x_k)^T d_k ≤ ξ_k + h^j_{ρ_k}(x_k) = ξ_k − φ_{ρ_k}(x_k),   j ∈ J^−_k.

Thus φ′_{ρ_k}(x_k; d_k) ≤ ξ_k − φ_{ρ_k}(x_k). Therefore,

    θ′_{ρ_k,r_k}(x_k; d_k) = ∇f_{ρ_k}(x_k)^T d_k + r_k φ′_{ρ_k}(x_k; d_k)
                           ≤ ∇f_{ρ_k}(x_k)^T d_k + r_k (ξ_k − φ_{ρ_k}(x_k)).
From (3.2) and (3.6), we know that if ξ_k > 0, then

    r_k = Σ_{i=1}^{p} λ^g_{i,k} + Σ_{j=p+1}^{q} (λ^+_{j,k} + λ^−_{j,k}),

which means

    r_k ξ_k = ( Σ_{i=1}^{p} λ^g_{i,k} + Σ_{j=p+1}^{q} (λ^+_{j,k} + λ^−_{j,k}) ) ξ_k.    (3.10)
By taking conditions (3.1), (3.3)–(3.5), and (3.10) into account, we obtain that for each k,

    θ′_{ρ_k,r_k}(x_k; d_k)
      = θ′_{ρ_k,r_k}(x_k; d_k) + Σ_{i=1}^{p} λ^g_{i,k} (g^i_{ρ_k}(x_k) + ∇g^i_{ρ_k}(x_k)^T d_k − ξ_k)
        + Σ_{j=p+1}^{q} λ^+_{j,k} (h^j_{ρ_k}(x_k) + ∇h^j_{ρ_k}(x_k)^T d_k − ξ_k) + Σ_{j=p+1}^{q} λ^−_{j,k} (−h^j_{ρ_k}(x_k) − ∇h^j_{ρ_k}(x_k)^T d_k − ξ_k)
      ≤ −d_k^T W_k d_k + Σ_{i=1}^{p} λ^g_{i,k} (g^i_{ρ_k}(x_k) − ξ_k) + Σ_{j=p+1}^{q} λ^+_{j,k} (h^j_{ρ_k}(x_k) − ξ_k)
        + Σ_{j=p+1}^{q} λ^−_{j,k} (−h^j_{ρ_k}(x_k) − ξ_k) + r_k (ξ_k − φ_{ρ_k}(x_k))
      ≤ −d_k^T W_k d_k + r_k (ξ_k − φ_{ρ_k}(x_k)) + ( Σ_{i=1}^{p} λ^g_{i,k} + Σ_{j=p+1}^{q} λ^+_{j,k} + Σ_{j=p+1}^{q} λ^−_{j,k} ) (φ_{ρ_k}(x_k) − ξ_k)
      = −d_k^T W_k d_k − ( r_k − Σ_{i=1}^{p} λ^g_{i,k} − Σ_{j=p+1}^{q} λ^+_{j,k} − Σ_{j=p+1}^{q} λ^−_{j,k} ) φ_{ρ_k}(x_k)
      ≤ −d_k^T W_k d_k.

Hence inequality (3.9) holds. Since W_k is assumed to be positive definite, it follows that d_k is a descent direction of the function θ_{ρ_k,r_k}(x) at x_k for every k. Therefore, the algorithm is well defined.
We now suppose that Algorithm 3.1 does not terminate within finitely many iterations. We first prove that there always exists some d_k such that (3.8) holds; thus K̄ is an infinite set.
To the contrary, suppose that ‖d_k‖ ≥ c_0 > 0 for each k. Then Assumption 3.1 together with condition (3.7) implies the existence of a positive constant c such that θ_{ρ_k,r_k}(x_{k+1}) ≤ θ_{ρ_k,r_k}(x_k) − c. Consequently, (3.8) fails for all k large enough. From the boundedness of {r_k}, we know that ξ_k = 0 when k is large. We can then assume that there exists a k̄ large enough such that ρ_k = ρ_{k̄} and r_k = r̄ for k ≥ k̄, by the updating rules of ρ_k and r_k.
Since the sequence {x_k} is bounded, the sequence {θ_{ρ_{k̄},r̄}(x_k)} is bounded below. Moreover, θ_{ρ_k,r_k}(x_{k+1}) ≤ θ_{ρ_k,r_k}(x_k) − c with c > 0, which implies that the sequence {θ_{ρ_{k̄},r̄}(x_k)} is monotonically decreasing. Hence we have

    Σ_{k≥k̄} c ≤ Σ_{k≥k̄} [θ_{ρ_{k̄},r̄}(x_k) − θ_{ρ_{k̄},r̄}(x_{k+1})] = θ_{ρ_{k̄},r̄}(x_{k̄}) − lim_{k→∞} θ_{ρ_{k̄},r̄}(x_k) < ∞,

which is a contradiction. Therefore K̄ is an infinite set, which also implies that ρ_k ↑ ∞ as k → ∞.
Suppose there exist K ⊆ K̄ and x̄ such that lim_{k→∞, k∈K} x_k = x̄. Since the sequence {λ_k} is bounded, without loss of generality, assume there exists a subsequence K_1 ⊂ K such that (λ^g_k, λ^+_k, λ^−_k, λ^ξ_k) → (λ̄^g, λ̄^+, λ̄^−, λ̄^ξ) as k → ∞, k ∈ K_1, with λ̄ ≥ 0. By the gradient consistency property of f_ρ(·), g^i_ρ(·), i = 1, ..., p, and h^j_ρ(·), j = p+1, ..., q, there exists a subsequence K̃_1 ⊂ K_1 such that

    lim_{k→∞, k∈K̃_1} ∇f_{ρ_k}(x_k) ∈ ∂f(x̄),
    lim_{k→∞, k∈K̃_1} ∇g^i_{ρ_k}(x_k) ∈ ∂g_i(x̄),   i = 1, ..., p,
    lim_{k→∞, k∈K̃_1} ∇h^j_{ρ_k}(x_k) ∈ ∂h_j(x̄),   j = p+1, ..., q.

Taking limits in (3.1) and (3.4)–(3.6) as k → ∞, k ∈ K̃_1, by the gradient consistency properties and ξ_k → 0, it is easy to see that x̄ is a stationary point of problem (P). The proof of the theorem is complete.
In the rest of this section, we give a sufficient condition for the boundedness of the sequences {r_k} and {λ_k}. We first give the following result on error bounds.
Lemma 3.1. For each k ∈ N and j = 1, ..., l, let F^j_k, F^j: R^n → R be continuously differentiable. Assume that for each j = 1, ..., l, {F^j_k(·)} and {∇F^j_k(·)} converge pointwise to F^j(·) and ∇F^j(·), respectively, as k goes to infinity. Let d̂ be the point such that F^j(d̂) = 0, j = 1, ..., l. Suppose that there exist κ > 0 and δ > 0 such that for all μ_j ∈ [−1, 1], j = 1, ..., l, not all zero, and all d ∈ d̂ + δB it holds that

    ‖ Σ_{j=1}^{l} μ_j ∇F^j(d) ‖ > 1/κ.

Then for sufficiently large k,

    dist(d̂, S_k) ≤ κ Σ_{j=1}^{l} |F^j_k(d̂)|,    (3.11)

where S_k := {d ∈ R^n : F^j_k(d) = 0, j = 1, ..., l}.
Proof. Denote F(d) := Σ_{j=1}^{l} |F^j(d)| and F_k(d) := Σ_{j=1}^{l} |F^j_k(d)|. If d̂ ∈ S_k, then (3.11) holds trivially. Now suppose that d̂ ∉ S_k. Since F_k(d̂) → F(d̂) as k → ∞, there exists a k̄ ∈ N such that F_k(d̂) < κ^{−1}δ when k ≥ k̄. Let ε := F_k(d̂). Then εκ < δ. Take λ ∈ (εκ, δ). Then by Ekeland's variational principle [38, Proposition 1.43], there exists an ω such that ‖ω − d̂‖ ≤ λ, F_k(ω) ≤ F_k(d̂), and the function ϕ(d) := F_k(d) + (ε/λ)‖d − ω‖ attains its minimum at ω. Hence by the nonsmooth calculus of the Clarke generalized gradient, we have

    0 ∈ ∂F_k(ω) + (ε/λ)B,

where B denotes the closed unit ball of R^n. Thus ‖v_k‖ ≤ ε/λ < 1/κ for all v_k ∈ ∂F_k(ω), for k ≥ k̄. We now show that F_k(ω) = 0 by contradiction. Suppose that F_k(ω) ≠ 0. Then there exists at least one j such that F^j_k(ω) ≠ 0. For such a j, ∂|F^j_k(ω)| = {±∇F^j_k(ω)}. Therefore there exist μ^k_j ∈ [−1, 1], j = 1, ..., l, not all zero, such that v_k = Σ_{j=1}^{l} μ^k_j ∇F^j_k(ω). We assume that there exist a subsequence K ⊂ N and μ_j ∈ [−1, 1], j = 1, ..., l, not all zero, such that for every k ∈ K, F_k(ω) ≠ 0 and lim_{k→∞, k∈K} μ^k_j = μ_j, j = 1, ..., l. Since {∇F^j_k(ω)}_k converge to ∇F^j(ω), we have v := lim_{k→∞, k∈K} v_k = Σ_{j=1}^{l} μ_j ∇F^j(ω) and ‖v‖ ≤ 1/κ, which is a contradiction. The contradiction shows that we must have F_k(ω) = 0 and hence ω ∈ S_k. Therefore we have

    dist(d̂, S_k) ≤ ‖d̂ − ω‖ ≤ λ.

Since this is true for every λ ∈ (εκ, δ), we have that for all k ≥ k̄,

    dist(d̂, S_k) ≤ εκ = κ |F_k(d̂)|.
Theorem 3.2. Assume that Assumption 3.1 holds. Suppose that Algorithm 3.1 does not terminate within finitely many iterations and that {(x_k, ρ_k, d_k, ξ_k, λ_k, r_k)} is a sequence generated by Algorithm 3.1. If the EWGMFCQ holds (or, equivalently, the EWNNAMCQ holds) at any accumulation point x̄, then the following two statements are true:
(a) {d_k} and {ξ_k} are bounded.
(b) {r_k} and {λ_k} are bounded. Furthermore, when k is large enough, ξ_k = 0.
Proof. (a) Assume that there exists a subset K ⊆ N such that lim_{k→∞, k∈K} x_k = x̄. To the contrary, suppose that {d_k}_K is unbounded. Then there exists a subset K_0 ⊆ K such that lim_{k→∞, k∈K_0} ‖d_k‖ = ∞ and lim_{k→∞, k∈K_0} x_k = x̄. By the gradient consistency property, without loss of generality we may assume that

    v_i = lim_{k→∞, k∈K_0} ∇g^i_{ρ_k}(x_k),   i = 1, ..., p,
    v_j = lim_{k→∞, k∈K_0} ∇h^j_{ρ_k}(x_k),   j = p+1, ..., q.

By the EWGMFCQ, v_{p+1}, ..., v_q are linearly independent and there exists d̂ such that

    g_i(x̄) + v_i^T d̂ < 0,   i = 1, ..., p,
    h_j(x̄) + v_j^T d̂ = 0,   j = p+1, ..., q.

Since the vectors {lim_{k→∞, k∈K_0} ∇h^j_{ρ_k}(x_k) : j = p+1, ..., q} are linearly independent, it is easy to see that for sufficiently large k ∈ K_0, the vectors {∇h^j_{ρ_k}(x_k), j = p+1, ..., q} are also linearly independent. Denote

    F^j(d) := h_j(x̄) + v_j^T d,   j = p+1, ..., q,
    F^j_k(d) := h^j_{ρ_k}(x_k) + ∇h^j_{ρ_k}(x_k)^T d,   j = p+1, ..., q.

Then F^j(d̂) = 0, j = p+1, ..., q. Since v_{p+1}, ..., v_q are linearly independent, there is κ such that 0 < 1/κ < min{ ‖Σ_{j=p+1}^{q} μ_j v_j‖ : μ_j ∈ [−1, 1] not all equal to zero }. By Lemma 3.1, for sufficiently large k,

    dist(d̂, S_k) ≤ κ Σ_{j=p+1}^{q} |F^j_k(d̂)|,    (3.12)

where S_k := {d ∈ R^n : F^j_k(d) = 0, j = p+1, ..., q}. Since S_k is closed, there exists d̂_k ∈ S_k such that ‖d̂ − d̂_k‖ = dist(d̂, S_k). Moreover, by virtue of (3.12), the fact that lim_{k→∞, k∈K_0} F^j_k(d̂) = F^j(d̂) = 0 for all j = p+1, ..., q implies that ‖d̂ − d̂_k‖ → 0 as k → ∞, k ∈ K_0. Hence for sufficiently large k, we have

    h^j_{ρ_k}(x_k) + ∇h^j_{ρ_k}(x_k)^T d̂_k = 0,   j = p+1, ..., q,    (3.13)
    g^i_{ρ_k}(x_k) + ∇g^i_{ρ_k}(x_k)^T d̂_k < 0,   i = 1, ..., p.    (3.14)

Conditions (3.13)–(3.14) imply that (d̂_k, 0) is a feasible solution for (QP)_k. Since (d_k, ξ_k) is an optimal solution to problem (QP)_k, we have that for all sufficiently large k ∈ K_0,

    ∇f_{ρ_k}(x_k)^T d_k + (1/2) d_k^T W_k d_k ≤ ∇f_{ρ_k}(x_k)^T d_k + (1/2) d_k^T W_k d_k + r_k ξ_k
                                              ≤ ∇f_{ρ_k}(x_k)^T d̂_k + (1/2) d̂_k^T W_k d̂_k.    (3.15)

Since ∇f_{ρ_k}(x_k)^T d̂_k + (1/2) d̂_k^T W_k d̂_k is bounded, it follows from Assumption 3.1 that {d_k}_K is bounded. Since (d_k, ξ_k) are feasible for problem (QP)_k, by the definition of the smoothing function and the gradient consistency property, it is easy to see that if {d_k}_K is bounded, then {ξ_k}_K is also bounded. Since K and x̄ are an arbitrary subset and an arbitrary accumulation point, {d_k} and {ξ_k} are bounded for the whole sequence.
(b) To the contrary, suppose that {λ_k} is unbounded. Then there exists a subset K_1 ⊆ K such that lim_{k→∞, k∈K_1} ‖λ_k‖ = ∞ and ξ_k > 0 for k ∈ K_1 sufficiently large. By the gradient consistency property, without loss of generality we may assume that

    v_i = lim_{k→∞, k∈K_1} ∇g^i_{ρ_k}(x_k),   i = 1, ..., p,
    v_j = lim_{k→∞, k∈K_1} ∇h^j_{ρ_k}(x_k),   j = p+1, ..., q,

and lim_{k→∞, k∈K_1} λ_k/‖λ_k‖ = λ̄ for some nonzero vector λ̄ = (λ̄^g, λ̄^+, λ̄^−, λ̄^ξ) ≥ 0. Dividing both sides of (3.1) by ‖λ_k‖ and letting k → ∞, k ∈ K_1, we have

    0 = Σ_{i=1}^{p} λ̄^g_i v_i + Σ_{j=p+1}^{q} (λ̄^+_j − λ̄^−_j) v_j.    (3.16)
Letting k → ∞, k ∈ K_1, in conditions (3.3)–(3.6) and assuming that (d̄, ξ̄) is the limiting point of {(d_k, ξ_k)}_{K_1}, we have

    0 ≤ λ̄^g_i ⊥ (g_i(x̄) + v_i^T d̄ − ξ̄) ≤ 0,   i = 1, ..., p,
    0 ≤ λ̄^+_j ⊥ (h_j(x̄) + v_j^T d̄ − ξ̄) ≤ 0,   j = p+1, ..., q,
    0 ≤ λ̄^−_j ⊥ (−h_j(x̄) − v_j^T d̄ − ξ̄) ≤ 0,   j = p+1, ..., q,
    0 ≤ λ̄^ξ ⊥ −ξ̄ ≤ 0.

Multiplying both sides of (3.16) by d̄, since

    λ̄^g_i (g_i(x̄) + v_i^T d̄ − ξ̄) = 0,   i = 1, ..., p,
    λ̄^+_j (h_j(x̄) + v_j^T d̄ − ξ̄) = 0,   j = p+1, ..., q,
    λ̄^−_j (−h_j(x̄) − v_j^T d̄ − ξ̄) = 0,   j = p+1, ..., q,

we have

    0 = Σ_{i=1}^{p} λ̄^g_i v_i^T d̄ + Σ_{j=p+1}^{q} (λ̄^+_j − λ̄^−_j) v_j^T d̄
      = Σ_{i=1}^{p} λ̄^g_i (ξ̄ − g_i(x̄)) + Σ_{j=p+1}^{q} λ̄^+_j (ξ̄ − h_j(x̄)) + Σ_{j=p+1}^{q} λ̄^−_j (ξ̄ + h_j(x̄)).
Thus,

    Σ_{i=1}^{p} λ̄^g_i g_i(x̄) + Σ_{j=p+1}^{q} (λ̄^+_j − λ̄^−_j) h_j(x̄) = Σ_{i=1}^{p} λ̄^g_i ξ̄ + Σ_{j=p+1}^{q} (λ̄^+_j + λ̄^−_j) ξ̄ ≥ 0.    (3.17)
From the EWGMFCQ (equivalently, the EWNNAMCQ), condition (3.17) together with condition (3.16) implies that λ̄^g_i = 0, i = 1, ..., p, and λ̄^+_j − λ̄^−_j = 0, j = p+1, ..., q.
Consider the case where λ̄^g_i = 0, i = 1, ..., p, and there exists an index j ∈ {p+1, ..., q} such that λ̄^+_j = λ̄^−_j > 0. Then for sufficiently large k ∈ K_1, λ^+_{j,k} > 0 and λ^−_{j,k} > 0. From the complementarity conditions (3.4)–(3.5), we must have ξ_k = 0 for sufficiently large k ∈ K_1, which is a contradiction.
Otherwise, consider the case where λ̄^g_i = 0, i = 1, ..., p, and λ̄^+_j = λ̄^−_j = 0, j = p+1, ..., q. Then since λ̄ is a nonzero vector, we must have λ̄^ξ > 0, which implies that λ^ξ_k > 0 for sufficiently large k ∈ K_1. From the complementarity condition (3.6), ξ_k = 0 for sufficiently large k ∈ K_1, which is a contradiction.
The contradiction shows that {λ_k} must be bounded. By the relationship between {λ_k} and {r_k} given in (3.2), the boundedness of {λ_k} implies the boundedness of {r_k}. Furthermore, from the updating rule of the algorithm, the boundedness of the sequences {λ_k} and {r_k} implies that when k is large enough, ξ_k = 0. This completes the proof.
The following corollary follows immediately from Theorems 3.1 and 3.2.

Corollary 3.2. Let Assumption 3.1 hold, and suppose that Algorithm 3.1 does not terminate within finitely many iterations. Suppose that the sequence {x_k} is bounded. Assume that the EWGMFCQ (or, equivalently, the EWNNAMCQ) holds at any accumulation point of the sequence {x_k}; then K̄ := {k : ‖d_k‖ ≤ η̂ ρ_k^{−1}} is an infinite set, and any accumulation point of the sequence {x_k}_{K̄} is a stationary point of problem (P).

In the case where the objective function is smooth and there is only one inequality constraint and no equality constraints in problem (P), Corollary 3.2 extends [28, Theorem 4.3] by allowing a general smoothing function instead of the specific smoothing function.
4. Applications to bilevel programs. The purpose of this section is to apply
the smoothing SQP algorithm to the bilevel program. We illustrate how we can
apply our algorithm to solve the bilevel program, and we demonstrate through some
numerical examples that although the GMFCQ never holds for bilevel programs, the
WGMFCQ may be satisfied easily.
In this section we consider the simple bilevel program

    (SBP)    min  F(x, y)
             s.t. y ∈ S(x),

where S(x) denotes the set of solutions of the lower level program

    (P_x)    min_{y∈Y}  f(x, y),

where F, f : R^n × R^m → R are continuously differentiable and twice continuously differentiable, respectively, and Y is a compact subset of R^m. Our smoothing SQP algorithm can easily handle any extra upper level constraint, but we omit it for simplicity. For a general bilevel program, the lower level constraint may depend on the upper level variables. By "simple," we mean that the lower level constraint Y is independent of x. Although (SBP) is a simple case of the general bilevel program, it has many applications such as the principal-agent problem [30] in economics. We refer the reader to [1, 15, 16, 39, 43] for applications of general bilevel programs.
When the lower level program is a convex program in variable y, the first order
approach to solving a bilevel program is to replace the lower level program by its
KKT conditions. In the case where fis not convex in variable y, Mirrlees [30] showed
that this approach may not be valid in the sense that the true optimal solution for
the bilevel problem may not even be a stationary point of the reformulated problem
by the first order approach.
For numerical purposes, Outrata [34] proposed to reformulate a bilevel program as a nonsmooth single level optimization problem by replacing the lower level program by its value function constraint, which in our simple case is

    (VP)    min  F(x, y)
            s.t. f(x, y) − V(x) = 0,    (4.1)
                 x ∈ R^n, y ∈ Y,

where V(x) := min_{y∈Y} f(x, y) is the value function of the lower level problem. By Danskin's theorem (see [11, page 99] or [14]), the value function is Lipschitz continuous but not necessarily differentiable, and hence problem (VP) is a nonsmooth optimization problem with Lipschitz continuous problem data. Ye and Zhu [46] pointed out that the usual constraint qualifications such as the GMFCQ never hold for problem (VP). Ye and Zhu [46, 47] derived the first order necessary optimality condition for the general bilevel program under the so-called partial calmness condition, under which the difficult constraint (4.1) is moved to the objective function with a penalty.
Based on the value function approach, Lin, Xu, and Ye [27] recently proposed to approximate the value function by its integral entropy function, i.e.,

    γ_ρ(x) := −ρ^{−1} ln ∫_Y exp[−ρ f(x, y)] dy
            = V(x) − ρ^{−1} ln ∫_Y exp[−ρ (f(x, y) − V(x))] dy,

and developed a smoothing projected gradient algorithm to solve problem (VP) when problem (SBP) is partially calm, and to solve an approximate bilevel problem (VP)_ε, where the constraint (4.1) is replaced by f(x, y) − V(x) ≤ ε for small ε > 0, when (SBP) is not partially calm.
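To make the entropy approximation concrete, the sketch below evaluates γ_ρ(x) and its derivative for a one-dimensional Y = [a, b]. Differentiating under the integral sign gives ∇γ_ρ(x) = ∫_Y ∇_x f(x, y) exp[−ρ f(x, y)] dy / ∫_Y exp[−ρ f(x, y)] dy, a softmin-weighted average of the partial gradients; the dense trapezoidal quadrature, the stabilizing shift by the grid minimum, and the scalar test problem are implementation choices made only for this illustration.

```python
import numpy as np

def gamma_rho(x, rho, f, a=-2.0, b=2.0, n=8001):
    """Entropy approximation gamma_rho(x) of V(x) = min_{y in [a,b]} f(x, y).

    A dense trapezoidal rule is used for the integral; the integrand is shifted by
    m = min_y f(x, y) on the grid for numerical stability, using
    gamma_rho(x) = m - rho^{-1} * ln  int_Y exp[-rho (f(x, y) - m)] dy.
    """
    ys = np.linspace(a, b, n)
    fv = f(x, ys)
    m = fv.min()
    return m - np.log(np.trapz(np.exp(-rho * (fv - m)), ys)) / rho

def grad_gamma_rho(x, rho, f, fx, a=-2.0, b=2.0, n=8001):
    """Derivative of gamma_rho(x): a softmin-weighted average of f_x(x, y) over Y."""
    ys = np.linspace(a, b, n)
    fv = f(x, ys)
    w = np.exp(-rho * (fv - fv.min()))          # unnormalized softmin weights
    return np.trapz(fx(x, ys) * w, ys) / np.trapz(w, ys)

# Illustration with the lower level objective of Mirrlees' problem (Example 4.1):
f  = lambda x, y: -x * np.exp(-(y + 1) ** 2) - np.exp(-(y - 1) ** 2)
fx = lambda x, y: -np.exp(-(y + 1) ** 2)        # partial derivative with respect to x
for rho in (10.0, 100.0, 1000.0):
    print(rho, gamma_rho(0.6, rho, f), grad_gamma_rho(0.6, rho, f, fx))
```

As ρ grows, γ_ρ(x) approaches V(x) from below and the weights concentrate on the lower level minimizers, which is the mechanism behind the gradient consistency property quoted below.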
Unfortunately, the partial calmness condition is rather strong, and hence a local
optimal solution of a bilevel program may not be a stationary point of (VP). Ye and
Zhu [48] proposed to study the following combined program by adding the first order
condition of the lower level problem into the problem (VP). Although the partial
calmness condition is a very strong condition for (VP), it is likely to hold for the
combined problem under some reasonable conditions [48].
Recently Xu and Ye [45] proposed a smoothing augmented Lagrangian method to solve the combined problem with the assumption that each lower level solution lies in the interior of Y:

    (CP)    min_{(x,y)∈R^n×Y}  F(x, y)
            s.t. f(x, y) − V(x) ≤ 0,    (4.2)
                 ∇_y f(x, y) = 0.    (4.3)
They showed that if the sequence of penalty parameters is bounded, then any accumu-
lation point is a Clarke stationary point of (CP). They argued that since the problem
(CP) is very likely to satisfy the partial calmness or the weak calmness condition (see
[48]), the sequence of penalty parameters is likely to be bounded.
To simplify our discussion so that we can concentrate on the main idea, we make
the following assumption.
Assumption 4.1. Every optimal solution of the lower level problem is an interior
point of set Y.
Under Assumption 4.1, every optimal solution to the lower level constrained prob-
lem is a local minimizer to the objective function of the lower level problem, and
hence the necessary optimality condition of the lower level problem is simply equal
to ∇yf(x, y) = 0. For some practical problems, it may be possible to set the set Y
large enough so that all optimal solutions of the lower level problem are contained in
the interior of Y. For example, for the principal-agent problem in economics [30], a
very important application of simple bilevel programs, the lower level constraint is an
interval and the solution of the lower level problem can usually be estimated to lie
in the interior of a certain bounded interval Y. If it is difficult to find a compact set
Ythat includes all optimal solutions of the lower level problem, but the set Ycan
be represented by some equality or inequality constraints, then one can use the KKT
condition to replace the constraint (4.3) in the problem (CP). In this case the problem
(CP) will become a nonsmooth mathematical program with equilibrium constraints.
We will study this case in a separate paper.
Since problem (CP) is a nonconvex and nonsmooth optimization problem, in
general the best we can do is to look for its Clarke stationary points. Since we assume
that all lower level solutions lie in the interior of set Y, any local optimal solution of
(CP) must be the Clarke stationary point of (CP) with the constraint y∈Yremoved.
Hence the smoothing SQP method introduced in this paper can be used to find the
stationary points of (CP).
Let (x̄, ȳ) be a local optimal solution of (CP). Then by the Fritz John–type multiplier rule, there exist r ≥ 0, λ_1 ≥ 0, λ_2 ∈ R^m, not all zero, such that

    0 ∈ r ∇F(x̄, ȳ) + λ_1 (∇f(x̄, ȳ) − ∂V(x̄) × {0}) + ∇(∇_y f)(x̄, ȳ)^T λ_2.    (4.4)

In the case when r is positive, (x̄, ȳ) is a stationary point of (CP). A sufficient condition for r to be positive is that, in the Fritz John condition, r = 0 implies that λ_1, λ_2 are all equal to zero. Unfortunately, we now show that r can always be taken as zero in the above Fritz John condition for problem (CP). Indeed, from the definition of V(x), we always have f(x, y) − V(x) ≥ 0 for any y ∈ Y. Hence any feasible point (x̄, ȳ) of problem (CP) is always an optimal solution of the problem

    min_{(x,y)∈R^n×Y}  f(x, y) − V(x)   s.t.  ∇_y f(x, y) = 0.

By the Fritz John–type multiplier rule, there exist λ_1 ≥ 0, λ_2 ∈ R^m, not all equal to zero, such that

    0 ∈ λ_1 (∇f(x̄, ȳ) − ∂V(x̄) × {0}) + ∇(∇_y f)(x̄, ȳ)^T λ_2.    (4.5)

Observe that (4.5) is (4.4) with r = 0. Since (λ_1, λ_2) is nonzero, we have shown that the Fritz John condition (4.4) for problem (CP) holds with r = 0. In other words, the NNAMCQ (or, equivalently, the GMFCQ) for problem (CP) never holds.
It follows from [27, Theorems 5.1 and 5.5] that the integral entropy function γ_ρ(x) is a smoothing function with the gradient consistency property for the value function V(x). That is,

    lim_{z→x, ρ↑∞} γ_ρ(z) = V(x)   and   ∅ ≠ lim sup_{z→x, ρ↑∞} ∇γ_ρ(z) ⊆ ∂V(x).

For a sequence of iteration points {(x_k, y_k)}, the set lim sup_{k→∞} ∇γ_{ρ_k}(x_k) may be strictly contained in ∂V(x̄). Therefore, while (4.5) holds for some λ_1 ≥ 0, λ_2 ∈ R^m not all equal to zero, the following inclusion may hold only when λ_1 = 0, λ_2 = 0:

    0 ∈ λ_1 (∇f(x̄, ȳ) − lim sup_{k→∞} ∇γ_{ρ_k}(x_k) × {0}) + ∇(∇_y f)(x̄, ȳ)^T λ_2.

Consequently, the WNNAMCQ may hold. We illustrate this point by using some numerical examples. In these examples, since y ∈ R, the problem (CP) has one inequality constraint f(x, y) − V(x) ≤ 0 and one equality constraint ∇_y f(x, y) = 0. Hence the WNNAMCQ,

    0 ∈ λ_1 (∇f(x̄, ȳ) − lim sup_{k→∞} ∇γ_{ρ_k}(x_k) × {0}) + λ_2 ∇(∇_y f)(x̄, ȳ),  λ_1 ≥ 0   ⟹   λ_1 = λ_2 = 0,

amounts to saying that for lim_{k→∞} (x_k, y_k) = (x̄, ȳ) and v = lim_{k→∞} ∇γ_{ρ_k}(x_k), the vectors

    ∇f(x̄, ȳ) − (v, 0)   and   ∇(∇_y f)(x̄, ȳ)

are linearly independent.
In our numerical experiments, we use the so-called limited-memory Broyden–Fletcher–Goldfarb–Shanno (LBFGS) approach proposed by Nocedal [33], which is a modification of the BFGS method for unconstrained optimization problems, to update the matrix W_k. Define s_k := x_{k+1} − x_k and

    y_k := ∇f_{ρ_k}(x_{k+1}) − ∇f_{ρ_k}(x_k) − Σ_{i=1}^{p} λ^g_{i,k} (∇g^i_{ρ_k}(x_{k+1}) − ∇g^i_{ρ_k}(x_k))
           − Σ_{j=p+1}^{q} (λ^+_{j,k} − λ^−_{j,k}) (∇h^j_{ρ_k}(x_{k+1}) − ∇h^j_{ρ_k}(x_k)).

We update W_{k+1} by

    W_{k+1} = W_k − (W_k s_k s_k^T W_k) / (s_k^T W_k s_k) + (y_k y_k^T) / (s_k^T y_k)

if and only if

    ‖s_k‖ ≤ γ_s,   ‖y_k‖ ≤ γ_y,   and   s_k^T y_k ≥ γ_{sy} ‖s_k‖^2

for given (γ_s, γ_y, γ_{sy}) > 0. Otherwise, we skip the update. As shown in [13], these restrictions guarantee the existence of M ≥ m > 0 such that

    m‖d‖^2 ≤ d^T W_k d ≤ M‖d‖^2.
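A minimal sketch of this safeguarded update is given below, with s_k and y_k as defined above; the numerical values of the thresholds γ_s, γ_y, γ_sy are placeholders, and the full limited-memory bookkeeping of [33] (storing only a few recent pairs) is not reproduced.

```python
import numpy as np

def safeguarded_bfgs_update(W, s, y, gamma_s=1e2, gamma_y=1e2, gamma_sy=1e-8):
    """BFGS update of the Hessian approximation W, skipped unless the safeguards hold.

    The update W <- W - (W s s^T W)/(s^T W s) + (y y^T)/(s^T y) is applied only when
    ||s|| <= gamma_s, ||y|| <= gamma_y, and s^T y >= gamma_sy * ||s||^2; otherwise W
    is returned unchanged, which (following the discussion above and [13]) keeps
    d^T W d between m * ||d||^2 and M * ||d||^2 for some constants M >= m > 0.
    """
    sty = float(s @ y)
    if (np.linalg.norm(s) > gamma_s or np.linalg.norm(y) > gamma_y
            or sty < gamma_sy * float(s @ s)):
        return W                                   # skip the update
    Ws = W @ s
    return W - np.outer(Ws, Ws) / float(s @ Ws) + np.outer(y, y) / sty
```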
In numerical practice, it is impossible to obtain an exact "0"; thus we select small enough tolerances ε > 0 and ε′ > 0 and change the update rules of r_k and ρ_k to the cases

    ξ_k < ε′

and

    ‖d_k‖ ≤ max{η̂ ρ_k^{−1}, ε},

respectively. The stopping criterion is as follows: we terminate the algorithm at the kth iteration if

    ‖d_k‖ < ε   and   ξ_k < ε′.
In the remainder of this section, we test the algorithm for some bilevel problems.
Example 4.1 (see [30]). Consider Mirrlees' problem. Note that the solution of Mirrlees' problem does not change if we add the constraint y ∈ [−2, 2] to the problem:

    min  (x − 2)^2 + (y − 1)^2
    s.t. y ∈ S(x),

where S(x) is the solution set of the lower level program

    min  −x exp[−(y + 1)^2] − exp[−(y − 1)^2]
    s.t. y ∈ [−2, 2].

It was shown in [30] that the unique optimal solution is (x̄, ȳ) with x̄ = 1 and ȳ ≈ 0.958 being the positive solution of the equation

    (1 + y) = (1 − y) exp[4y].

In our test, we chose the initial point (x_0, y_0) = (0.6, 0.3) and the parameters β = 0.8, σ_1 = 10^{−6}, ρ_0 = 100, r_0 = 100, η̂ = 5 × 10^5, σ = 10, σ′ = 10, ε = 10^{−7}, and ε′ = 10^{−10}. Since the stopping criteria hold, we terminate at the 16th iteration with (x_k, y_k) = (1, 0.95759). It seems that the sequence converges to (x̄, ȳ).
Since

    ∇f(x_k, y_k) − (∇γ_{ρ_k}(x_k), 0) = (0.01784, 0.00015),
    ∇(∇_y f)(x_k, y_k) = (0.084813, 1.70049),

by virtue of the continuity of the gradients it is easy to see that the vectors

    ∇f(x̄, ȳ) − (lim_{k→∞} ∇γ_{ρ_k}(x_k), 0)   and   ∇(∇_y f)(x̄, ȳ)

are linearly independent. Thus the WNNAMCQ holds at (x̄, ȳ), and our algorithm guarantees that (x̄, ȳ) is a stationary point of (CP). Indeed, (x̄, ȳ) is the unique global minimizer of Mirrlees' problem.
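As an illustration of the WNNAMCQ check reported above (added here, not part of the original text), the snippet below forms the two vectors ∇f(x_k, y_k) − (∇γ_{ρ_k}(x_k), 0) and ∇(∇_y f)(x_k, y_k) for Mirrlees' problem at the reported final iterate and tests their linear independence. The smoothing parameter, the grid-based quadrature, and the closed-form derivatives are choices made for this sketch; since the lower level problem has two global minimizers at x = 1, the value of ∇γ_ρ(x_k) is sensitive to ρ and to the iterate, so the printed numbers need not match the quoted ones, only the rank conclusion does.

```python
import numpy as np

# Lower level objective of Mirrlees' problem and the derivatives needed for the check.
f   = lambda x, y: -x * np.exp(-(y + 1) ** 2) - np.exp(-(y - 1) ** 2)
fx  = lambda x, y: -np.exp(-(y + 1) ** 2)
fy  = lambda x, y: 2 * x * (y + 1) * np.exp(-(y + 1) ** 2) + 2 * (y - 1) * np.exp(-(y - 1) ** 2)
fyx = lambda x, y: 2 * (y + 1) * np.exp(-(y + 1) ** 2)
fyy = lambda x, y: (2 - 4 * (y + 1) ** 2) * x * np.exp(-(y + 1) ** 2) \
                   + (2 - 4 * (y - 1) ** 2) * np.exp(-(y - 1) ** 2)

def grad_gamma(x, rho, a=-2.0, b=2.0, n=8001):
    """d/dx of the entropy approximation of V(x): softmin-weighted average of f_x."""
    ys = np.linspace(a, b, n)
    fv = f(x, ys)
    w = np.exp(-rho * (fv - fv.min()))
    return np.trapz(fx(x, ys) * w, ys) / np.trapz(w, ys)

xk, yk, rho_k = 1.0, 0.95759, 1e3                                 # reported final iterate
v1 = np.array([fx(xk, yk) - grad_gamma(xk, rho_k), fy(xk, yk)])   # grad f - (grad gamma, 0)
v2 = np.array([fyx(xk, yk), fyy(xk, yk)])                         # grad of grad_y f
print("v1 =", v1)
print("v2 =", v2)
print("linearly independent:", np.linalg.matrix_rank(np.column_stack([v1, v2])) == 2)
```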
Example 4.2 (see [31, Example 3.14]). The bilevel program

    min  F(x, y) := (x − 1/4)^2 + y^2
    s.t. y ∈ S(x) := argmin_{y∈[−1,1]}  f(x, y) := y^3/3 − xy

has the optimal solution (x̄, ȳ) = (1/4, 1/2) with an objective value of 1/4.
In our test, we chose the initial point (x_0, y_0) = (0.3, 0.3) and the parameters β = 0.9, σ_1 = 10^{−6}, ρ_0 = 100, r_0 = 100, η̂ = 5000, σ = 10, σ′ = 10, ε = 10^{−7}, and ε′ = 10^{−10}. Since the stopping criteria hold, we terminate at the 7th iteration with (x_k, y_k) = (0.25, 0.5). It seems that the sequence converges to (x̄, ȳ).
Since

    ∇f(x_k, y_k) − (∇γ_{ρ_k}(x_k), 0) = (−1.5, 0),
    ∇(∇_y f)(x_k, y_k) = (−1, 1),

by virtue of the continuity of the gradients it is easy to see that the vectors

    ∇f(x̄, ȳ) − (lim_{k→∞} ∇γ_{ρ_k}(x_k), 0)   and   ∇(∇_y f)(x̄, ȳ)

are linearly independent. Thus the WNNAMCQ holds at (x̄, ȳ), and our algorithm guarantees that (x̄, ȳ) is a stationary point of (CP). Indeed, (x̄, ȳ) is the unique global minimizer of the problem.
Example 4.3 (see [31, Example 3.20]). The bilevel program

    min  F(x, y) := (x − 0.25)^2 + y^2
    s.t. y ∈ S(x) := argmin_{y∈[−1,1]}  f(x, y) := (1/3) y^3 − x^2 y

has the optimal solution (x̄, ȳ) = (1/2, 1/2) with an objective value of 5/16.
In our test, we chose the parameters β = 0.9, σ_1 = 10^{−6}, ρ_0 = 100, r_0 = 100, η̂ = 500, σ = 10, σ′ = 10, ε = 10^{−7}, and ε′ = 10^{−10}. We chose the initial point (x_0, y_0) = (0.3, 0.8). Since the stopping criteria hold, we terminate at the 8th iteration with (x_k, y_k) = (0.4999996, 0.4999996). It seems that the sequence converges to (x̄, ȳ).
Since

    ∇f(x_k, y_k) − (∇γ_{ρ_k}(x_k), 0) = (−1.499898, 0),
    ∇(∇_y f)(x_k, y_k) = (−1, 1),

by virtue of the continuity of the gradients it is easy to see that the vectors

    ∇f(x̄, ȳ) − (lim_{k→∞} ∇γ_{ρ_k}(x_k), 0)   and   ∇(∇_y f)(x̄, ȳ)

are linearly independent. Thus the WNNAMCQ holds at (x̄, ȳ), and our algorithm guarantees that (x̄, ȳ) is a stationary point of (CP). Indeed, (x̄, ȳ) is the unique global minimizer of the problem.
5. Conclusion. In this paper, we propose a smoothing SQP method for solv-
ing nonsmooth and nonconvex optimization problems with Lipschitz inequality and
equality constraints. The algorithm is applicable even to degenerate constrained op-
timization problems which do not satisfy the GMFCQ, the standard constraint qual-
ification for a local minimizer to satisfy the KKT conditions. Our main motivation
comes from solving the bilevel program which is nonsmooth, nonconvex, and never
satisfies the GMFCQ. In this paper, we have proposed the concept of the WGMFCQ
(equivalently, WNNAMCQ), a weaker version of the GMFCQ, and have shown the
global convergence of the smoothing SQP algorithm under the WGMFCQ. Moreover,
we have demonstrated the applicability of the smoothing SQP algorithm for solv-
ing the combined program of a simple bilevel program with a nonconvex lower level
problem. For smooth optimization problems, it is well known that the SQP methods
converge very quickly when the iterates are close to the solution. The rapid local
convergence of the SQP is due to the fact that the positive definite matrix Wkin the
SQP subproblem is an approximation of the Hessian matrix of the Lagrangian func-
tion. For our nonsmooth problem, the Lagrangian function is only locally Lipschitz,
and no classical Hessian matrix can be defined. However, it would be interesting to
study the local behavior of the smoothing SQP algorithm by using the generalized
second order subderivatives [38] of the Lagrangian function. This remains a topic of
our future research.
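To make the role of $W_k$ concrete, one standard device in smooth SQP implementations is Powell's damped BFGS update, which keeps $W_k$ positive definite even when the exact Hessian of the Lagrangian would not be. The sketch below illustrates that classical update; it is not claimed to be the update used in our implementation.

```python
# Powell's damped BFGS update: a classical way to maintain a positive
# definite approximation W_k of the Hessian of the Lagrangian in SQP
# methods (illustrative sketch only, not the update used in the paper).
import numpy as np

def damped_bfgs_update(W, s, y, damping=0.2):
    """W: current positive definite matrix; s = x_{k+1} - x_k;
    y: change in the gradient of the Lagrangian between the two iterates."""
    Ws = W @ s
    sWs = s @ Ws
    sy = s @ y
    # Damping: replace y by a convex combination r of y and W s so that
    # s^T r >= damping * s^T W s, which preserves positive definiteness.
    if sy >= damping * sWs:
        theta = 1.0
    else:
        theta = (1.0 - damping) * sWs / (sWs - sy)
    r = theta * y + (1.0 - theta) * Ws
    return W - np.outer(Ws, Ws) / sWs + np.outer(r, r) / (s @ r)
```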
Acknowledgments. The authors are grateful to the anonymous referees for
their careful reading of the paper and helpful comments and to Shaoyan Guo for
helpful discussions.
REFERENCES
[1] J.F. Bard, Practical Bilevel Optimization: Algorithms and Applications, Kluwer Academic Publishers, Dordrecht, The Netherlands, 1998.
[2] D.P. Bertsekas, Constrained Optimization and Lagrange Multiplier Methods, Academic Press, New York, 1982.
[3] J.V. Burke and S.-P. Han, A robust sequential quadratic programming method, Math. Programming, 43 (1989), pp. 277–303.
[4] J.V. Burke and T. Hoheisel, Epi-convergent smoothing with applications to convex composite functions, SIAM J. Optim., 23 (2013), pp. 1457–1479.
[5] J.V. Burke, T. Hoheisel, and C. Kanzow, Gradient consistency for integral-convolution smoothing functions, Set-Valued Var. Anal., 21 (2013), pp. 359–376.
[6] B. Chen and X. Chen, A global and local superlinear continuation-smoothing method for P0 and R0 NCP or monotone NCP, SIAM J. Optim., 9 (1999), pp. 624–645.
[7] C. Chen and O.L. Mangasarian, A class of smoothing functions for nonlinear and mixed complementarity problems, Math. Programming, 71 (1995), pp. 51–70.
[8] X. Chen, Smoothing methods for nonsmooth, nonconvex minimization, Math. Program., 134 (2012), pp. 71–99.
[9] X. Chen, R.S. Womersley, and J.J. Ye, Minimizing the condition number of a Gram matrix, SIAM J. Optim., 21 (2011), pp. 127–148.
[10] F.H. Clarke, Optimization and Nonsmooth Analysis, John Wiley & Sons, New York, 1983.
[11] F.H. Clarke, Yu.S. Ledyaev, R.J. Stern, and P.R. Wolenski, Nonsmooth Analysis and Control Theory, Springer, New York, 1998.
[12] F.E. Curtis and M.L. Overton, A sequential quadratic programming algorithm for nonconvex, nonsmooth constrained optimization, SIAM J. Optim., 22 (2012), pp. 474–500.
[13] F.E. Curtis and X. Que, An adaptive gradient sampling algorithm for nonsmooth optimization, Optim. Methods Softw., 28 (2013), pp. 1302–1324.
[14] J.M. Danskin, The Theory of Max-Min and Its Applications to Weapons Allocation Problems, Springer, New York, 1967.
[15] S. Dempe, Foundations of Bilevel Programming, Kluwer Academic Publishers, Dordrecht, The Netherlands, 2002.
[16] S. Dempe, Annotated bibliography on bilevel programming and mathematical programs with equilibrium constraints, Optimization, 52 (2003), pp. 333–359.
[17] F. Facchinei, Robust recursive quadratic programming algorithm model with global and superlinear convergence properties, J. Optim. Theory Appl., 92 (1997), pp. 543–579.
[18] M. Fukushima and J.-S. Pang, Some feasibility issues in mathematical programs with equilibrium constraints, SIAM J. Optim., 8 (1998), pp. 673–681.
[19] U.M. Garcia-Palomares and O.L. Mangasarian, Superlinearly convergent quasi-Newton methods for nonlinearly constrained optimization problems, Math. Programming, 11 (1976), pp. 1–13.
[20] P.E. Gill and E. Wong, Sequential quadratic programming methods, in Mixed Integer Nonlinear Programming, IMA Vol. Math. Appl. 154, Springer-Verlag, Berlin, 2012, pp. 147–224.
[21] S.P. Han, Superlinearly convergent variable metric algorithms for general nonlinear programming problems, Math. Programming, 11 (1976), pp. 263–282.
[22] S.P. Han, A globally convergent method for nonlinear programming, J. Optim. Theory Appl., 22 (1977), pp. 297–309.
[23] M. Heinkenschloss, Projected sequential quadratic programming methods, SIAM J. Optim., 6 (1996), pp. 373–417.
[24] J.B. Hiriart-Urruty, Refinements of necessary optimality conditions in nondifferentiable programming. I, Appl. Math. Optim., 5 (1979), pp. 63–82.
[25] H. Jiang and D. Ralph, Smooth SQP methods for mathematical programs with nonlinear complementarity constraints, SIAM J. Optim., 10 (2000), pp. 779–808.
[26] B. Kummer, Newton's method for nondifferentiable functions, in Advances in Mathematical Optimization, Math. Res. 45, Akademie-Verlag, Berlin, 1988.
[27] G.-H. Lin, M. Xu, and J.J. Ye, On solving simple bilevel programs with a nonconvex lower level program, Math. Program. Ser. A, 144 (2014), pp. 277–305.
[28] C. Ling, L. Qi, G.L. Zhou, and S.Y. Wu, Global convergence of a robust smoothing SQP method for semi-infinite programming, J. Optim. Theory Appl., 129 (2006), pp. 147–164.
[29] X.-W. Liu and Y.-X. Yuan, A robust algorithm for optimization with general equality and inequality constraints, SIAM J. Sci. Comput., 22 (2000), pp. 517–534.
[30] J.A. Mirrlees, The theory of moral hazard and unobservable behaviour: Part I, Rev. Econ. Stud., 66 (1999), pp. 3–21.
[31] A. Mitsos and P.I. Barton, A Test Set for Bilevel Programs, Technical report, Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, 2006.
[32] Y. Nesterov, Smooth minimization of non-smooth functions, Math. Program., 103 (2005), pp. 127–152.
[33] J. Nocedal, Updating quasi-Newton matrices with limited storage, Math. Comp., 35 (1980), pp. 773–782.
[34] J.V. Outrata, On the numerical solution of a class of Stackelberg problems, Z. Oper. Res., 34 (1990), pp. 255–277.
[35] J.F.A. Pantoja and D.Q. Mayne, Exact penalty function algorithm with simple updating of the penalty parameter, J. Optim. Theory Appl., 69 (1991), pp. 441–467.
[36] M.J.D. Powell and Y. Yuan, A recursive quadratic programming algorithm that uses differentiable exact penalty functions, Math. Programming, 35 (1986), pp. 265–278.
[37] R.T. Rockafellar, Convex Analysis, Princeton University Press, Princeton, NJ, 1970.
[38] R.T. Rockafellar and R.J.-B. Wets, Variational Analysis, Springer-Verlag, Berlin, 1998.
[39] K. Shimizu, Y. Ishizuka, and J.F. Bard, Nondifferentiable and Two-Level Mathematical Programming, Kluwer Academic Publishers, Boston, 1997.
[40] P. Spellucci, A new technique for inconsistent QP problems in the SQP method, Math. Methods Oper. Res., 47 (1998), pp. 355–400.
[41] K. Tone, Revision of constraint approximations in the successive QP-method for nonlinear programming problems, Math. Programming, 26 (1983), pp. 144–152.
[42] X. Tong, L.Q. Qi, G.L. Zhou, and S.Y. Wu, A smoothing SQP method for nonlinear programs with stability constraints arising from power systems, Comput. Optim. Appl., 51 (2012), pp. 175–197.
[43] L.N. Vicente and P.H. Calamai, Bilevel and multilevel programming: A bibliography review, J. Global Optim., 5 (1994), pp. 291–306.
[44] R.B. Wilson, A Simplicial Algorithm for Concave Programming, Ph.D. thesis, Graduate School of Business Administration, Harvard University, Cambridge, MA, 1963.
[45] M. Xu and J.J. Ye, A smoothing augmented Lagrangian method for solving simple bilevel programs, Comput. Optim. Appl., 59 (2014), pp. 353–377.
[46] J.J. Ye and D.L. Zhu, Optimality conditions for bilevel programming problems, Optimization, 33 (1995), pp. 9–27.
[47] J.J. Ye and D.L. Zhu, A note on: "Optimality conditions for bilevel programming problems," Optimization, 39 (1997), pp. 361–366.
[48] J.J. Ye and D. Zhu, New necessary optimality conditions for bilevel programs by combining the MPEC and value function approaches, SIAM J. Optim., 20 (2010), pp. 1885–1905.
[49] C. Zhang and X. Chen, Smoothing projected gradient method and its application to stochastic linear complementarity problems, SIAM J. Optim., 20 (2009), pp. 627–649.
[50] J. Zhang and X. Zhang, A robust SQP method for optimization with inequality constraints, J. Comput. Math., 21 (2003), pp. 247–256.