"MatrixSens-EJOR-v3" — 2019/11/25 — 17:31
Generalized derivatives of the optimal value of a
linear program with respect to matrix coefficients
Daniel DE WOLF (1,2) and Yves SMEERS (2)
(1) TVES (EA 4477), Université du Littoral Côte d'Opale, 220 avenue de l'Université, F-59140 Dunkerque, France, Email: daniel.dewolf@univ-littoral.fr
(2) CORE, Université Catholique de Louvain, Voie du Roman Pays 34, 1348 Louvain-La-Neuve, Belgique, Email: yves.smeers@uclouvain.be
Received at Editorial Office: February 15th, 2019, Article revised: October 24th, 2019, Article accepted: November 12th, 2019.
Abstract
We present here a characterization of the Clarke subdifferential
of the optimal value function of a linear program as a function of
matrix coefficients. We generalize the result of Freund (1985) to the
cases where derivatives may not be defined because of the existence
of multiple primal or dual solutions.
Keywords: Linear programming, Parametric linear programming, Nondifferentiable programming.
1 Introduction
In the framework of linear programming, we consider the problem of es-
timating the variation of the objective function resulting from changes
in some matrix coefficients. Our objective is to extend results already
available for the right-hand side to this more general problem.
The interpretation of the dual variables as derivatives of the optimal value
of the objective function with respect to the elements of the right-hand side
is well known in mathematical programming. This result can be extended
to the case of multiple dual solutions. The set of all dual solutions is then
the subdifferential of the optimal value of the objective function, seen as
a convex function of the right-hand side. The object of this paper is to
extend these well known results to the derivative of the optimal value of
the objective function with respect to matrix coefficients.
Accepted by the European Journal of Operational Research
It is easy to show on a simple example that the objective function value of
a linear program is not a convex function of the matrix coefficients. The
subdifferential concept is thus inappropriate here. One must therefore re-
sort to Clarke’s notion of a generalized gradient. A characterization of this
generalized gradient will be derived and sufficient conditions of existence
of the generalized gradient will be given for this particular application of
nonsmooth analysis.
The paper is organized as follows. Section 2 reviews the literature on generalized derivatives and subdifferentials of optimal value functions. Section 3 presents the basic definitions and main properties of nonsmooth analysis that will be useful for our application. In Section 4, we recall the result of Freund (1985) in the smooth case, namely the gradient of the optimal value function when the optimal primal and dual solutions of the linear problem are unique. In Section 5, a complete characterization of the generalized gradient for the case where the primal or dual solutions are not unique is established. Since the locally Lipschitz property plays an essential role in this characterization, we also give sufficient conditions under which it holds. Section 6 presents a practical application of this characterization coming from the gas industry, namely the two-stage problem of optimally dimensioning and operating a gas transmission network. Section 7 presents some conclusions.
2 Generalized derivatives and subdifferentials of
optimal value functions
Before establishing the main result concerning the generalized gradient of the optimal value function of a linear program with respect to matrix coefficients, let us briefly review the literature on the computation of generalized derivatives and subdifferentials of the optimal value function in general optimization problems.
One of the first papers on the subject is that of Gauvin (1979), who considered a general mathematical programming problem with equality and inequality constraints and a perturbation of the right-hand side of the constraints, denoted u_i for constraint i. He estimated the generalized gradient of the optimal value of the problem, considered as a function of the right-hand side perturbation:

z(u) = max_x f(x), x ∈ R^n
       s.t. g_i(x) ≤ u_i, i = 1, ..., m
            h_i(x) = u_i, i = m + 1, ..., p
Note that, for the particular case of a linear program, this corresponds to a perturbation of the right-hand side b of the linear programming problem, whereas we consider a perturbation (denoted A in our case) that appears as a coefficient of the variables x of the problem.
Several developments concerning this case (i.e. a perturbation only in the right-hand side of the constraints) have been made since then. For example, the regularity properties of the optimal value function in nonlinear programs where the perturbation parameter appears only in the right-hand side were studied by Craven and Janin (1993). For the affine case, namely when g(x) = Ax + b, an expression is given for the directional derivative, without assuming the optimum to be unique. More recently, Höffner et al. (2016) considered the computation of generalized derivatives of dynamic systems with a linear program embedded. They consider the optimal value of a linear program as a function of the right-hand side of the constraints and present an approach to compute an element of the generalized gradient. The approach is illustrated through a large-scale dynamic flux balance analysis example. Gomez et al. (2018) studied the generalized derivatives of parametric lexicographic linear programs using the lexicographic directional derivative (see Barton et al. (2018) for a survey on the lexicographic directional derivative).
Unfortunately, these results cannot be applied to our problem, since in them the perturbation appears only in the right-hand side of the constraints and does not appear as a coefficient of the decision variables.
Rockafellar (1984) considered the directional derivative of the optimal value function in a nonlinear programming problem with a perturbation u that appears in the left-hand side of the constraints:

z(u) = max_x f(x, u), x ∈ R^n
       s.t. g_i(x, u) ≤ 0, i = 1, ..., m        (1)
            h_i(x, u) = 0, i = m + 1, ..., p

Under the assumption that every optimal solution x satisfies the second-order constraint qualification condition, Rockafellar proved that the function z(u) is locally Lipschitz and finite. Rockafellar also gives an upper bound on the Clarke generalized derivative of the function z(u).
Note also that many developments followed this original paper for the general case where the perturbation appears in the left-hand side of the constraints. For example, Thibault (1991) considered a general mathematical programming problem in which the constraints are defined by multifunctions and depend on a parameter u. A special study is made of problems in which the multifunctions defining the constraints take convex values. For these problems, generalized gradients of z(u) are given in terms of the generalized gradients of the support functions of the multifunctions. Bonnans and Shapiro (2000) studied the first-order differentiability analysis of the optimal value function as a function of a parameter that appears in the objective function and in the left-hand side of the constraints. Under a constraint qualification condition, they give an upper bound on the directional derivative. Note that in our case of a linear program, we will give a complete characterization of the generalized gradient, not only an upper bound on the directional derivative. Penot (2004) considers the differentiability properties of optimal value functions for the particular case where the perturbation parameter appears only in the objective function. More recently, Mordukhovich et al. (2007) considered the subgradients of marginal functions in parametric mathematical programming. The authors show that the subdifferentials obtained for the corresponding marginal value function are given in terms of Lagrange multipliers. Finally, Im (2018) studied sensitivity analysis for the special case of linear optimization. In particular, he gives conditions for the objective function value of a linear problem to be a locally Lipschitz function of the matrix coefficients. We will use these conditions in our main characterization of the generalized gradient.

In the present paper, we shall give a complete characterization of the generalized gradient for a particular case, namely the linear case, and not only an upper bound on the directional derivative.
3 Basic definitions and properties
This section recalls some basic concepts and properties of nonsmooth optimization useful for our application. An introduction to the first-order generalized derivative can be found in Clarke (1990) for the case of a locally Lipschitz function.

Definition 3.1 A function f from R^n (or a subset of R^n) into R is locally Lipschitz if for any bounded set B from the interior of the domain of f there exists a positive scalar K such that

|f(x) − f(y)| ≤ K ‖x − y‖  ∀x, y ∈ B

where |·| denotes the absolute value and ‖·‖ the usual Euclidean norm.

The locally Lipschitz property can be interpreted as a finite bound on the variation of the function. It is well known that the locally Lipschitz property implies the continuity of f.
The Rademacher theorem says that a locally Lipschitz function f has a gradient almost everywhere (i.e. everywhere except on a set Z_f of zero (Lebesgue) measure on R^n).

Definition 3.2 In the locally Lipschitz case, the generalized gradient is defined as the convex hull of all the points lim ∇f(x_k), where {x_k} is any sequence which converges to x while avoiding the points where ∇f(x) does not exist:

∂f(x) = conv{ lim_{k→∞} ∇f(x_k) : x_k → x, ∇f(x_k) exists }        (2)

where conv denotes the convex hull.
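As a standard textbook illustration (not from the original text), consider the absolute value function, whose generalized gradient at the kink is the interval spanned by the one-sided slopes:

```latex
% For f(x) = |x|, the gradient exists everywhere except at x = 0:
%   \nabla f(x) = -1 \text{ for } x < 0, \qquad \nabla f(x) = +1 \text{ for } x > 0.
% Definition (2) therefore gives
\partial f(0) = \operatorname{conv}\{-1, +1\} = [-1, 1].
```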
Another essential concept in nonsmooth optimization is the directional derivative. This notion can also be generalized to the nonconvex case.

Definition 3.3 The generalized directional derivative of f evaluated at x in the direction d is defined (using the notation of Clarke) as

f°(x; d) = lim sup_{y→x, t↓0} [f(y + td) − f(y)] / t

In the convex case, this notion reduces to the classical notion of directional derivative:

f′(x; d) = lim_{t↓0} [f(x + td) − f(x)] / t

We shall also use the following proposition for the proof of our characterization of the generalized gradient:

Proposition 3.1 Let f be a function from R^n into R that is almost everywhere continuously differentiable. Then f is continuously differentiable at x if and only if ∂f(x) reduces to a singleton.

Proof: See Clarke (1990).
We conclude with a general remark. The definition (2) of the generalized gradient is only valid in the locally Lipschitz case. If the function is merely almost everywhere differentiable, one can construct examples for which the generalized gradient is not defined. A more general definition based on the cone of normals is given by Clarke (1990) for the case of a lower semi-continuous function.
4 Gradient of the optimal value function.

Returning now to our problem, we consider the optimal value of a linear problem as a function of the matrix coefficients:

z(A) = max_x c^T x
       subject to Ax = b        (3)
                  x ≥ 0

where c is the n-column vector of objective coefficients, x is the n-column vector of variables, A is the m × n matrix of left-hand side coefficients and b is the m-column vector of right-hand side coefficients.
We first recall the result for the smooth case. It is established in Freund (1985) under the two following assumptions:

(H1) The optimal solution of the primal problem (3) is unique;
(H2) The optimal solution of the dual problem of (3) is unique.

The result in the smooth case can then be written as follows:

Proposition 4.1 (Freund, 1985). If assumptions (H1) and (H2) are both satisfied for A, then z(A) is continuously differentiable and we have that

∂z/∂a_ij = −u*_i x*_j        (4)

where u*_i is the optimal dual variable associated to row i and x*_j is the optimal primal variable associated to column j.

A complete analysis of the subject in the differentiable case can be found in Gal (1995).
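As a quick numerical sanity check of formula (4) (a sketch, not part of the original paper: the instance data are hypothetical and scipy.optimize.linprog is used as the LP solver), one can compare −u*_i x*_j with a finite-difference estimate of ∂z/∂a_ij:

```python
import numpy as np
from scipy.optimize import linprog

# Small instance of problem (3): max c^T x  s.t. Ax = b, x >= 0.
# scipy minimizes, so we solve min (-c)^T x and negate the optimal value.
c = np.array([3.0, 2.0])
A = np.array([[1.0, 1.0]])
b = np.array([1.0])

def solve(A):
    res = linprog(-c, A_eq=A, b_eq=b, bounds=[(0, None)] * len(c), method="highs")
    # value, primal x*, and dual u* in the maximization sign convention
    return -res.fun, res.x, -res.eqlin.marginals

val, x_star, u_star = solve(A)        # here x* = (1, 0), u* = (3,), z = 3

# Freund's formula (4): dz/da_ij = -u*_i x*_j, here -3 for (i, j) = (0, 0).
i, j, eps = 0, 0, 1e-6
A_pert = A.copy()
A_pert[i, j] += eps
fd = (solve(A_pert)[0] - val) / eps   # finite-difference estimate
print(fd, -u_star[i] * x_star[j])
```

In this instance the primal and dual optima are unique, so (H1)-(H2) hold and the two numbers agree up to the finite-difference error.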
5 Generalized gradient characterization.

Before examining the case where the optimal basis is not unique, we show on an example that z(A) does not enjoy any convexity property. Consider the following linear problem with a single parametric matrix coefficient:

z(a) = min_x x_1 + x_2
       s.t. a x_1 + x_2 = 1
            x_1, x_2 ≥ 0

Using the constraint, x_2 can be substituted out:

z(a) = min 1 + (1 − a) x_1
       s.t. 0 ≤ x_1 ≤ 1/a
The optimal objective function can thus be written explicitly as:

z(a) = 1    if a < 1
       1/a  if a ≥ 1

It is clear that z(a) is neither convex nor concave. Because of this lack of convexity, the notion to be used is Clarke's generalized gradient.
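This piecewise expression is easy to confirm numerically (a sketch using scipy.optimize.linprog, not part of the original text):

```python
from scipy.optimize import linprog

# z(a) = min x1 + x2  s.t. a*x1 + x2 = 1, x1, x2 >= 0  (the example above, a > 0).
def z(a):
    res = linprog([1.0, 1.0], A_eq=[[a, 1.0]], b_eq=[1.0],
                  bounds=[(0, None), (0, None)], method="highs")
    return res.fun

for a in (0.5, 1.0, 2.0, 4.0):
    closed_form = 1.0 if a < 1 else 1.0 / a   # the expression derived above
    print(a, z(a), closed_form)
```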
If A is such that the linear program is infeasible, we define z(A) = −∞. Denote by dom(z) the domain where z(A) is finite. Before stating the characterization of the generalized gradient, we first recall the following propositions, which result from Renegar (1994).

Proposition 5.1 If the set of optimal primal solutions for A is unbounded, then A is not an interior point of dom(z).

Proposition 5.2 If the set of optimal dual solutions for A is unbounded, then A is not an interior point of dom(z).

We will use the notation u × x for the outer product of the m-column vector u with the n-row vector x^T. The following theorem states a characterization of the generalized gradient.

Theorem 5.1 If A is an interior point of dom(z) and if z(A) is locally Lipschitz in a neighborhood of A, then

∂z(A) = conv{−u × x, where u is any optimal dual solution and x is any optimal primal solution of (3)}
Proof:

1. Suppose first that there is a single optimal basis. Since A is an interior point of dom(z), we know by Propositions 5.1 and 5.2 that there are no extreme rays of primal or dual optimal solutions. In this case, a single optimal basis is a sufficient condition for primal and dual nondegeneracy. We know from Proposition 4.1 that ∂z(A) reduces to a single matrix, which can be computed by the following formula:

∂z(A) = {−u × x}

where u and x are the dual and primal solutions associated to the unique optimal basis for (3). This proves the theorem in this case.
2. Suppose next that there are several optimal bases. Let us first introduce some useful notation from linear programming. For a particular optimal basis, we denote by B the columns of the matrix A corresponding to the basic variables, denoted x_B, and by N the columns of A corresponding to the non-basic variables, denoted x_N. The constraints can be rewritten as follows:

Ax = b ⇔ (B, N)(x_B, x_N)^T = b ⇔ B x_B = b ⇔ x_B = B^{−1} b  (with x_N = 0)

Denoting by c_B the objective coefficients corresponding to the basic variables, it follows that the objective function can be rewritten for a particular basis as:

z(A) = c_B^T B^{−1} b

We first prove the following inclusion:

conv{−u × x such that u^T = c_B^T B^{−1}, x_B = B^{−1} b, x_N = 0 and B corresponds to an optimal basis of (3)} ⊂ ∂z(A)
Let B correspond to an optimal basis of (3). Since this optimal basis is not unique, there must be at least one non-basic variable x_j with a zero reduced cost:

c_j − c_B^T B^{−1} a_j = 0

where a_j denotes column j of the matrix A. Using the definition of the vector of dual variables, u^T = c_B^T B^{−1}, this condition can be rewritten:

c_j − u^T a_j = 0

We can exclude the pathological case where u = 0. Indeed, this case can be treated by a perturbation of the objective coefficients; it only requires considering A as the extended matrix of the system in which the objective is added as a constraint.

We can thus take some u_i different from zero and define the following perturbation of the column a_j for any column with zero reduced cost: if u_i > 0, subtract ε > 0 (ε < 0 if u_i < 0) from the i-th component of a_j. For the perturbed problem, all the reduced costs are strictly positive, and therefore the optimal solution becomes unique.

More specifically, consider a sequence of matrices A(ε), at which the objective function is differentiable, converging to the matrix A, at which it is not differentiable. The primal and dual optimality conditions hold at each point of the sequence, and the optimal primal and dual variables are continuous functions of the perturbed matrix, since the latter is a perturbation of a matrix that is invertible at the limit point. The limits of the primal and dual variables along the sequence thus exist and satisfy the primal-dual relations at the limit point.

It can be concluded that all the points associated with all the optimal bases of problem (3) belong to the generalized gradient, since
they can be written as limits of perturbed problems for which the optimal basis is unique, and hence for which the function z is continuously differentiable at A(ε) for all ε > 0.
Now consider the reverse inclusion:

∂z(A) ⊂ conv{−u × x such that u^T = c_B^T B^{−1}, x_B = B^{−1} b, x_N = 0 and B corresponds to an optimal basis of (3)}

Let {A_k} be any sequence such that ∇z(A_k) exists and converges. The optimal basis associated with each A_k need not be unique. As shown by Freund (1985), we can have, for a given matrix A, several degenerate optimal bases although z is continuously differentiable at A. We will show that, in this case, any optimal basis associated with A_k must give the same point (..., −u_i x_j, ...). Suppose the opposite, that is, that there are two different optimal bases for A_k, giving two different points (..., −u_i^1 x_j^1, ...) and (..., −u_i^2 x_j^2, ...) respectively. As done in part 1 of the proof, the matrix A_k can be perturbed so that no more than one of the two bases remains optimal. Taking the limit, we obtain that the first point is in the generalized gradient. Applying the same perturbation procedure to the second basis, we show that the second point is also in the generalized gradient. We can therefore conclude that ∂z(A_k) is not a singleton. By Proposition 3.1, this contradicts the fact that z is continuously differentiable at A_k.

The gradient can therefore be associated with any of the optimal bases. Denote by {β_k} a sequence of optimal bases for A_k (i.e. β_k is an optimal basis for A_k). By a basis β, we mean here a partition of the variables between basic and non-basic variables. As {β_k} is an infinite sequence of bases, and as there is only a finite number of choices of m columns among the n columns of the matrix A, there must be a particular basis β which is repeated infinitely often in the sequence. Let {B_l} be the subsequence corresponding to this basis. The corresponding subsequence {(..., −u_i x_j, ...)}_l of gradients associated with this basis converges to the same point as the original sequence. As

c_N^T − c_B^T (B_l)^{−1} N_l ≤ 0
(B_l)^{−1} b ≥ 0

for all l, these inequalities remain true as l → ∞, and so {B_l} converges to an optimal basis for (3). This completes the proof of the reverse inclusion.
3. We finally show that

∂z(A) = conv{−u × x, where u is any optimal dual solution and x is any optimal primal solution of (3)}

Because the sets of primal and dual optimal solutions corresponding to the point A are bounded by Propositions 5.1 and 5.2, u and x are convex combinations of extreme dual and primal solutions respectively. Let

u = Σ_k µ_k u^k, where Σ_k µ_k = 1 and µ_k ≥ 0
x = Σ_l λ_l x^l, where Σ_l λ_l = 1 and λ_l ≥ 0

Suppose first that u is a convex combination of extreme u^k while x is an extreme optimal point. One has

−u_i x_j = Σ_k µ_k (−u_i^k x_j)

for all i and j. Therefore

−u × x = Σ_k µ_k (−u^k × x)

This implies that

conv{−u × x, where u is any optimal dual solution and x_B = B^{−1} b, x_N = 0, where B is an optimal basis}
= conv{−u × x, where u^T = c_B^T B^{−1}, x_B = B^{−1} b, x_N = 0 and B is any optimal basis of (3)}

The same reasoning can then be repeated to relax the requirement that x is an extreme solution into the weaker one that x is any optimal solution of problem (3).
Before illustrating the theorem on an example, let us say a few words about the requirements that A be an interior point of dom(z) and that z(A) be Lipschitz in a neighborhood of A. Im (2018) proves that these two requirements hold true if the following condition is satisfied:

Assumption 5.1 The matrix A is of full rank and the Slater constraint qualification is satisfied.
Proposition 5.3 If the matrix A is of full rank and if the Slater constraint qualification is satisfied, then A is an interior point of dom(z) and the function z(A) is Lipschitz in a neighborhood of A.

Proof: See Im (2018), pages 74-76.

As indicated by Höffner et al. (2016), any linear program can be reduced to an equivalent linear program that satisfies the full rank property for A by removing linearly dependent rows.
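As a small sketch of that reduction (hypothetical data, using numpy; a production code would prefer a single rank-revealing factorization over repeated rank computations):

```python
import numpy as np

def drop_dependent_rows(A, b, tol=1e-10):
    """Keep a maximal linearly independent subset of the rows of A
    (the system is assumed consistent, so the dropped rows are redundant)."""
    keep = []
    for i in range(A.shape[0]):
        candidate = keep + [i]
        # keep row i only if it increases the rank of the retained rows
        if np.linalg.matrix_rank(A[candidate], tol=tol) == len(candidate):
            keep = candidate
    return A[keep], b[keep]

# The third row is the sum of the first two, hence redundant.
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 2.0]])
b = np.array([1.0, 2.0, 3.0])
A_full, b_full = drop_dependent_rows(A, b)
print(A_full.shape)  # (2, 3)
```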
The following simple example illustrates Theorem 5.1:

z(a) = max_x x_1 + x_2
       s.t. x_1 + 2 x_2 ≤ 3
            x_1 + a x_2 ≤ 2
            x_1, x_2 ≥ 0

The feasible region and the objective function are represented in Figure 1 for the particular choice a = 1.

Figure 1: Illustrative example.

For a = 1, there exist two different basic solutions. The first one is obtained with x_1 and x_2 in the basis: (x_1, x_2) = (1, 1), and the reduced cost of s_1, the first slack variable, is zero. The second solution is obtained by taking x_1 and s_1 in the basis: (x_1, x_2) = (2, 0), and the reduced cost of x_2 is zero. In both cases, the optimal dual values are given by (u_1, u_2) = (0, 1).

Take a = 1 − ε and let ε go to zero. We obtain the first solution, and the reduced cost associated to s_1 is strictly negative. Take a = 1 + ε and let ε go to zero. We obtain the second solution, and the reduced cost associated
to x2is strictly negative. The extreme points of the generalized gradient
are thus:
u2x2=1 (first case)
u2x2= 0 (second case).
One therefore obtains:
∂z(1) = [1,0]
In fact, the general expression of z(a) can be computed explicitly as:
z(a) =
3a5
a2if a < 1
2 if a1
The graph of the optimal value of the function z(a) is represented in Figure
2 as a function of parameter a. The two points 1 and 0 correspond thus
Figure 2: Graph of the optimal value of the objective function.
to the left- and right-derivatives of z(a) at point a= 1 respectively.
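The one-sided derivatives above can be reproduced numerically (a sketch using scipy.optimize.linprog, not part of the original text):

```python
from scipy.optimize import linprog

# The illustrative example: z(a) = max x1 + x2
#   s.t. x1 + 2*x2 <= 3,  x1 + a*x2 <= 2,  x1, x2 >= 0.
def z(a):
    res = linprog([-1.0, -1.0], A_ub=[[1.0, 2.0], [1.0, a]], b_ub=[3.0, 2.0],
                  bounds=[(0, None), (0, None)], method="highs")
    return -res.fun

eps = 1e-6
left = (z(1.0) - z(1.0 - eps)) / eps    # left-derivative at a = 1, close to -1
right = (z(1.0 + eps) - z(1.0)) / eps   # right-derivative at a = 1, close to 0
print(left, right)
```

The two finite-difference quotients approach the extreme points −1 and 0 of ∂z(1).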
6 Practical application

The motivation for considering the computation of the generalized gradient of the objective function of a linear problem with respect to the matrix coefficients is the general two-stage problem in which a capacity investment decision is made at the first stage, and the operation of the system is optimized at the second stage, taking this investment decision into account. In some cases, the investment decision
appears in the right-hand side (such as a capacity level decision). We consider here the case where the investment decision appears as a coefficient of the second-stage decision variables. Let us illustrate this with an example from the gas industry.
Consider the problem of the optimal dimensioning of pipe networks for the transmission of natural gas. See, for example, De Wolf and Smeers (1996), who consider the following two-stage problem for the investment in and exploitation of gas transmission networks. At the first stage, the optimal diameters of the pipelines, denoted D, must be determined in order to minimize the sum of the direct investment cost function, denoted C(D), and Q(D), the future operating cost function:

min_D F(D) = C(D) + Q(D)
s.t. D_ij ≥ 0, ∀(i, j) ∈ SA        (5)

where SA denotes the set of arcs of the network.
The operations problem for a given choice of the diameters can thus be formulated as follows:

Q(D) = min_{f,s,p} Σ_{j∈N_s} c_j s_j
s.t. Σ_{j|(i,j)∈A} f_ij − Σ_{j|(j,i)∈A} f_ji = s_i, ∀i ∈ N
     sign(f_ij) f_ij² = K_ij² D_ij⁵ (p_i² − p_j²), ∀(i, j) ∈ SA        (6)
     s̲_i ≤ s_i ≤ s̄_i, ∀i ∈ N
     p̲_i ≤ p_i ≤ p̄_i, ∀i ∈ N

where the variables of the problem are f_ij, the flow in the arc (i, j), s_i, the net supply at node i, and p_i, the pressure at node i. The set of nodes is denoted N. For simplicity of notation, we define the variable π_i as the square of the pressure at node i:

π_i = p_i².
Let us replace, in the only nonlinear relation of the exploitation problem (6), the variable D_ij by the following substitute:

x_ij = D_ij⁵

Indeed, taking the x_ij as parameters, we find that they appear as linear coefficients of the squared pressure variables in the equation:

sign(f_ij) f_ij² − K_ij² x_ij (π_i − π_j) = 0.        (7)
De Wolf and Smeers (2000) solve the gas transmission problem by an extension of the Simplex algorithm using a piecewise linearization of the first term sign(f_ij) f_ij². We are thus back in the linear case. Let w*_ij be the optimal value of the dual variable associated with constraint (7). Applying Theorem 5.1, one obtains an element of the generalized gradient by:

∂Q/∂x_ij = w*_ij K_ij² (π*_i − π*_j)        (8)

Now, to obtain an element of the generalized gradient with respect to the original variables D_ij, one uses the chain rule with:

∂x_ij/∂D_ij = 5 D_ij⁴.

It is then easy to prove that the following expression gives an element of the generalized gradient of Q(D):

∂Q(D)/∂D_ij = w*_ij (π*_i − π*_j) 5 K_ij² D_ij⁴        (9)

This formula thus gives an element of the generalized gradient, which is the only information required by the bundle method (see Lemaréchal (1989)) used to solve the two-stage problem. See De Wolf and Smeers (1996) for the application of the bundle method to this two-stage problem.
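As a minimal numerical sketch of how formulas (8) and (9) combine through the chain rule (all values below are hypothetical, chosen only to illustrate the computation):

```python
# Hypothetical data for one arc (i, j): dual value w* of constraint (7),
# squared pressures pi_i and pi_j, pipe constant K_ij and diameter D_ij.
w, pi_i, pi_j, K, D = 2.0, 9.0, 4.0, 0.5, 1.2

dQ_dx = w * K**2 * (pi_i - pi_j)   # formula (8): gradient element w.r.t. x_ij
dx_dD = 5.0 * D**4                 # derivative of the substitution x_ij = D_ij^5
dQ_dD = dQ_dx * dx_dD              # chain rule, i.e. formula (9)
print(dQ_dD)
```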
7 Conclusions.

It has been shown in this paper how the first-order derivatives of the optimal value of a linear program with respect to matrix coefficients can be generalized to the nonsmooth case, even when the optimal value, seen as a function of the matrix coefficients, admits breakpoints. Our result, Theorem 5.1, emphasizes the fundamental role played by bases in this respect: the extreme points of the generalized gradient correspond to the different optimal bases. A practical application to gas transmission network optimization, which was in fact the motivation for considering such a formula, was then presented.
Acknowledgments

We would like to thank Professor Guy de Ghellinck for valuable discussions and suggestions. We would also like to thank Professors Jean-Jacques Strodiot and Claude Lemaréchal for insightful discussions on the state of the art in nonsmooth optimization. Finally, we would like to thank the two anonymous reviewers for valuable suggestions to improve the paper.
References
[1] Barton P.I., K. A. Khan, P. Stechlinski and H. A.J. Watson, (2018),
Computationally relevant generalized derivatives: theory, evaluation
and applications, Optimization Methods & Software, Volume 33, Nos.
4-6, Pages 1030-1072.
[2] Bonnans J. F. and A. Shapiro, (2000), Perturbation Analysis of Opti-
mization Problems, Springer Series in Operations Research, Springer-
Verlag, New York.
[3] Clarke F.H., (1990), Optimization and Nonsmooth Analysis, Classics
in Applied Mathematics 5, Society for Industrial and Applied Mathe-
matics.
[4] Craven B.D. and R. Janin (1993), Regularity properties of the optimal
value function in non linear programming. Optimization, Volume 28,
Pages 1-7.
[5] De Wolf D. and Y. Smeers (1996), Optimal dimensioning of pipe networks with application to gas transmission networks. Operations Research, Volume 44, No 4, Pages 596-608.
[6] De Wolf D. and Y. Smeers (2000), The Gas Transmission Problem
Solved by an Extension of the Simplex Algorithm. Management Sci-
ence, Volume 46, No 11, Pages 1454-1465.
[7] Freund R.M., (1985), Postoptimal analysis of a linear program under
simultaneous changes in matrix coefficients. In: Cottle R.W. (eds)
Mathematical Programming Essays in Honor of George B. Dantzig
Part I., Mathematical Programming Studies, Volume 24, Pages 1-13.
[8] Gal T., (1995), Postoptimal Analyses, Parametric Programming and
Related Topics, Walter de Gruyter, Berlin, New York.
[9] Gauvin J., (1979), The Generalized Gradient of a Marginal Function
in Mathematical Programming, Mathematics of Operations Research,
Volume 4, Pages 458-463.
[10] Gomez J. A., Höffner K., Khan K. A. and P. I. Barton, (2018), Generalized Derivatives of Lexicographic Linear Programs, Journal of Optimization Theory and Applications 178, Pages 477-501.
[11] Höffner K., Khan K. A. and P. I. Barton, (2016), Generalized derivatives of dynamic systems with a linear program embedded, Automatica, Volume 63, Pages 198-208.
[12] Im J., (2018), Sensitivity Analysis and Robust Optimization: A Geo-
metric Approach for the Special Case of Linear Optimization, Master’s
thesis, University of Waterloo, Ontario, Canada, Pages 1-97.
[13] Lemaréchal C., (1989), Nondifferentiable optimization, in: G.L. Nemhauser et al., eds., Handbooks in Operations Research and Management Science Volume 1: Optimization, Elsevier Science Publishers, Pages 529-572.
[14] Mordukhovich B. S., N. M. Nam and N. D. Yen, (2007), Subgradi-
ents of marginal functions in parametric mathematical programming,
Math. Program., Ser. B, No 116, Pages 369-396.
[15] Penot J.-P., (2004), Differentiability Properties of Optimal Value
Functions, Canad. J. Math. Volume 56 (4), Pages 825-842.
[16] Renegar, J., (1994), Some perturbation theory for linear program-
ming, Mathematical Programming 65, Pages 73-91.
[17] Rockafellar, R.T., (1984), Directional differentiability of the opti-
mal value function in a nonlinear programming problem. In: Fiacco
A.V.(eds) Sensitivity, Stability and Parametric Analysis, Mathematical
Programming Studies, Volume 21, North-Holland, Pages 213-226.
[18] Thibault L., (1991), On Subdifferentials of optimal value functions,
SIAM J. Control and Optimization, Volume 29, No 5, Pages 1019-1036.
Preprint
Full-text available
Many approaches to grasp synthesis optimize analytic quality metrics that measure grasp robustness based on finger placements and local surface geometry. However, generating feasible dexterous grasps by optimizing these metrics is slow, often taking minutes. To address this issue, this paper presents FRoGGeR: a method that quickly generates robust precision grasps using the min-weight metric, a novel, almost-everywhere differentiable approximation of the classical epsilon grasp metric. The min-weight metric is simple and interpretable, provides a reasonable measure of grasp robustness, and admits numerically efficient gradients for smooth optimization. We leverage these properties to rapidly synthesize collision-free robust grasps - typically in less than a second. FRoGGeR can refine the candidate grasps generated by other methods (heuristic, data-driven, etc.) and is compatible with many object representations (SDFs, meshes, etc.). We study FRoGGeR's performance on over 40 objects drawn from the YCB dataset, outperforming a competitive baseline in computation time, feasibility rate of grasp synthesis, and picking success in simulation. We conclude that FRoGGeR is fast: it has a median synthesis time of 0.834s over hundreds of experiments.
... Moreover, such LDderivatives follow a sharp chain rule for composite functions, and thus allowing treatment of complex problems with φ embedded (in contrary to methods only computing generalized Jacobian elements e.g. [5,9]). We consider three cases of parameterized convex NLPs of the form (1): ...
Article
Full-text available
This article proposes new practical methods for furnishing generalized derivative information of optimal-value functions with embedded parameterized convex programs, with potential applications in nonsmooth equation-solving and optimization. We consider three cases of parameterized convex programs: (1) partial convexity—functions in the convex programs are convex with respect to decision variables for fixed values of parameters, (2) joint convexity—the functions are convex with respect to both decision variables and parameters, and (3) linear programs where the parameters appear in the objective function. These new methods calculate an LD-derivative, which is a recently established useful generalized derivative concept, by constructing and solving a sequence of auxiliary linear programs. In the general partial convexity case, our new method requires that the strong Slater conditions are satisfied for the embedded convex program’s decision space, and requires that the convex program has a unique optimal solution. It is shown that these conditions are essentially less stringent than the regularity conditions required by certain established methods, and our new method is at the same time computationally preferable over these methods. In the joint convexity case, the uniqueness requirement of an optimal solution is further relaxed, and to our knowledge, there is no established method for computing generalized derivatives prior to this work. In the linear program case, both the Slater conditions and the uniqueness of an optimal solution are not required by our new method.
... However, generalized gradients can be defined. Proposition 2. [Gal75]; [Fre85]; [DWS00] If z * = LP (c, A, b) is finite at (c, A, b) and in some neighborhood of (c, A, b), then generalized gradients of z * with respect to c, b, and A exist and are ...
Preprint
Full-text available
When samples have internal structure, we often see a mismatch between the objective optimized during training and the model's goal during inference. For example, in sequence-to-sequence modeling we are interested in high-quality translated sentences, but training typically uses maximum likelihood at the word level. Learning to recognize individual faces from group photos, each captioned with the correct but unordered list of people in it, is another example where a mismatch between training and inference objectives occurs. In both cases, the natural training-time loss would involve a combinatorial problem -- dynamic programming-based global sequence alignment and weighted bipartite graph matching, respectively -- but solutions to combinatorial problems are not differentiable with respect to their input parameters, so surrogate, differentiable losses are used instead. Here, we show how to perform gradient descent over combinatorial optimization algorithms that involve continuous parameters, for example edge weights, and can be efficiently expressed as integer, linear, or mixed-integer linear programs. We demonstrate usefulness of gradient descent over combinatorial optimization in sequence-to-sequence modeling using differentiable encoder-decoder architecture with softmax or Gumbel-softmax, and in weakly supervised learning involving a convolutional, residual feed-forward network for image classification.
Article
Full-text available
Calculation of loss scenarios is a fundamental requirement of simulation-based capital models and these are commonly approximated. Within a life insurance setting, a loss scenario may involve an asset-liability optimization. When cashflows and asset values are dependent on only a small number of risk factor components, low-dimensional approximations may be used as inputs into the optimization and resulting in loss approximation. By considering these loss approximations as perturbations of linear optimization problems, approximation errors in loss scenarios can be bounded to first order and attributed to specific proxies. This attribution creates a mechanism for approximation improvements and for the eventual elimination of approximation errors in capital estimates through targeted exact computation. The results are demonstrated through a stylized worked example and corresponding numerical study. Advances in error analysis of proxy models enhance confidence in capital estimates. Beyond error analysis, the presented methods can be applied to general sensitivity analysis and the calculation of risk.
Article
Full-text available
Lexicographic linear programs are fixed-priority multiobjective linear programs that are a useful model of biological systems using flux balance analysis and for goal-programming problems. The objective function values of a lexicographic linear program as a function of its right-hand side are nonsmooth. This work derives generalized derivative information for lexicographic linear programs using lexicographic directional derivatives to obtain elements of the Bouligand subdifferential (limiting Jacobian). It is shown that elements of the limiting Jacobian can be obtained by solving related linear programs. A nonsmooth equation-solving problem is solved to illustrate the benefits of using elements of the limiting Jacobian of lexicographic linear programs.
Article
Full-text available
Differentiability properties of optimal value functions associated with perturbed optimization problems require strong assumptions. We consider such a set of assumptions which does not use compactness hypothesis but which involves a kind of coherence property. Moreover, a strict differentiability property is obtained by using techniques of Ekeland and Lebourg and a result of Preiss. Such a strengthening is required in order to obtain genericity results.
Article
A new method for evaluating generalized derivatives in nonsmooth problems is reviewed. Lexicographic directional (LD-)derivatives are a recently developed tool in nonsmooth analysis for evaluating generalized derivative elements in a tractable and robust way. Applicable to problems in both steady-state and dynamic settings, LD-derivatives exhibit a number of advantages over current theory and algorithms. As highlighted in this article, the LD-derivative approach now admits a suitable theory for inverse and implicit functions, nonsmooth dynamical systems and optimization problems, among others. Moreover, this technique includes an extension of the standard vector forward mode of automatic differentiation (AD) and acts as the natural extension of classical calculus results to the nonsmooth case in many ways. The theory of LD-derivatives is placed in the context of state-of-the-art methods in nonsmooth analysis, with an application in multistream heat exchanger modelling and design used to illustrate the usefulness of the approach.
Article
Dynamic systems with a linear program (LP) embedded can be found in control and optimization of bioreactor models based on dynamic flux balance analysis (DFBA). Derivatives of the dynamic states with respect to a parameter vector are essential for open and closed-loop dynamic optimization and parameter estimation of such systems. These derivatives, given by a forward sensitivity system, may not exist because the optimal value of a linear program as a function of the right-hand side of the constraints is not continuously differentiable. Therefore, nonsmooth analysis must be applied which provides optimality conditions in terms of subgradients, for convex functions, or Clarke's generalized gradient, for nonconvex functions. This work presents an approach to compute the necessary information for nonsmooth optimization, i.e., an element of the generalized gradient. Moreover, a numerical implementation of the results is introduced. The approach is illustrated through a large-scale dynamic flux balance analysis example.
Chapter
This chapter discusses the nondifferentiable optimization (NDO). Nondifferentiable optimization or nonsmooth optimization (NSO) deals with the situations in operations research where a function that fails to have derivatives for some values of the variables has to be optimized. For this situation, new tools are required to replace standard differential calculus, and these new tools come from convex analysis. Functions with discontinuous derivatives are frequent in operations research. Sometimes they arise when modeling the problem, sometimes they are introduced artificially during the solution procedure. The chapter discusses the necessary concepts and the basic properties and some examples of practical problems motivating the use of NSO. It is shown how and why classical methods fail. The chapter also discusses some possibilities that can be used when a special structure exists in the nonsmooth problem. It also presents subgradient methods and more recent methods and also covers some orientations for future research.
Chapter
In this chapter we study parameterized variational inequalities (generalized equations) and discuss applications of the theory to nonlinear, semi-definite and semi-infinite programming problems. Various aspects of these specific applications of the general theory have been discussed in the previous chapters. However, we recommand to readers primarily interested in one of these applications to read the corresponding section of this chapter first, since it allows one to have a global view of the power of optimality conditions and perturbation theory for these topics. In addition, this chapter provides some results that were not presented in the previous chapters.
Book
The main subject of this book is perturbation analysis of continuous optimization problems. In the last two decades considerable progress has been made in that area, and it seems that it is time now to present a synthetic view of many important results that apply to various classes of problems.
Article
A general mathematical programming problem in which the constraints are defined by multifunctions and depend on a parameter u, and the resulting value function m(u) are considered. In the context of Banach spaces admitting equivalent Frechet differentiable norms estimates for the generalized gradient ∂m of m are established. A special study is made of problems in which the multifunctions defining the constraints take convex values. For these problems, estimates for ∂m are given in terms of the generalized gradients of the support functions of these multifunctions.