RC25103 (W1101-112) January 28, 2011
Computer Science
IBM Research Report
A Probing Algorithm for MINLP with Failure
Prediction by SVM
Giacomo Nannicini1, Pietro Belotti2, Jon Lee3, Jeff Linderoth4,
François Margot1, Andreas Wächter3
1Tepper School of Business
Carnegie Mellon University
Pittsburgh, PA
2Department of Mathematical Sciences
Clemson University
Clemson, SC
3IBM Research Division
Thomas J. Watson Research Center
P.O. Box 218
Yorktown Heights, NY 10598
4Industrial and Systems Engineering
University of Wisconsin-Madison
Madison, WI
Research Division
Almaden - Austin - Beijing - Cambridge - Haifa - India - T. J. Watson - Tokyo - Zurich
LIMITED DISTRIBUTION NOTICE: This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties). Copies may be requested from IBM T. J. Watson Research Center, P.O. Box 218, Yorktown Heights, NY 10598 USA (email: reports@us.ibm.com). Some reports are available on the internet at http://domino.watson.ibm.com/library/CyberDig.nsf/home.
A probing algorithm for MINLP
with failure prediction by SVM
Giacomo Nannicini1⋆, Pietro Belotti2, Jon Lee3,
Jeff Linderoth4⋆⋆, François Margot1⋆⋆⋆, Andreas Wächter3
1Tepper School of Business, Carnegie Mellon University, Pittsburgh, PA
{nannicin,fmargot}@andrew.cmu.edu
2Dept. of Mathematical Sciences, Clemson University, Clemson, SC
pbelott@clemson.edu
3IBM T. J. Watson Research Center, Yorktown Heights, NY
{jonlee,andreasw}@us.ibm.com
4Industrial and Systems Eng., University of Wisconsin-Madison, Madison, WI
linderoth@wisc.edu
Abstract. Bound tightening is an important component of algorithms
for solving nonconvex Mixed Integer Nonlinear Programs. A probing al-
gorithm is a bound-tightening procedure that explores the consequences
of restricting a variable to a subinterval with the goal of tightening its
bounds. We propose a variant of probing where exploration is based on it-
eratively applying a truncated Branch-and-Bound algorithm. As this ap-
proach is computationally expensive, we use a Support-Vector-Machine
classifier to infer whether or not the probing algorithm should be used.
Computational experiments demonstrate that the use of this classifier
saves a substantial amount of CPU time at the cost of a marginally
weaker bound tightening.
1 Introduction
A Mixed Integer Nonlinear Program (MINLP) is a mathematical program with
continuous nonlinear objective and constraints, where some of the variables are
required to take integer values. Without loss of generality, we assume that the
problem is a minimization problem. MINLPs naturally arise in numerous applied
problems, see e.g. [1,2]. In this paper, we address nonconvex MINLPs where
neither the objective function nor the constraints are required to be convex —
a class of problems typically difficult to solve in practice. An exact solution
method for nonconvex MINLPs is Branch-and-Bound [3], where lower bounds
are obtained by convexifying the feasible region using under-estimators, often
linear inequalities [4,5]. The convexification depends on the variable bounds,
⋆Supported by an IBM grant and by NSF grant OCI-0750826.
⋆⋆Supported by U.S. Department of Energy grant DE-FG02-08ER25861 and by NSF
grant CCF-0830153.
⋆ ⋆ ⋆Supported by NSF grant OCI-0750826.
with tighter bounds resulting generally in a tighter convexification. As such,
bound tightening is an important part of any MINLP solver.
Probing is a bound-tightening technique often applied to Mixed Integer Linear Programs (MILPs) [6]. The idea is to tentatively fix a binary variable to 0 and
then to 1, and use the information obtained to strengthen the linear relaxation
of the problem. Similar techniques have been applied to MINLPs as well [5].
In this paper, we propose a probing technique based on truncated Branch-and-Bound searches. Let z̄ be the objective value of the best solution of the original problem found so far. In each Branch-and-Bound search, we choose a variable, say x_i, and impose x_i ∈ S, where S is a subinterval of the current domain of x_i. In addition, we add a constraint bounding the objective value of the solution to at most z̄. If that problem is infeasible, we can discard S from the domain of x_i. On the other hand, if we are able to solve the modified problem to optimality, with an optimal value z* < z̄, we update z̄ and can again discard S from the domain of x_i. Details on the choice of x_i and S are given in Section 3.
This probing algorithm potentially requires a significant amount of CPU
time. To limit this drawback, we use a Support Vector Machine (SVM) classifier
[7] before performing a Branch-and-Bound search, to predict the success or fail-
ure of the search. If we conclude that the probing algorithm is unlikely to tighten
the bounds on the variable, we skip its application. Machine learning methods
have been used in the OR community for various tasks, such as parameter tun-
ing [8] and solver selection [9]. In this paper, machine learning is used to predict
failures of an algorithm based on characteristics of its input data. The features
on which the SVM prediction is based are problem and subinterval dependent,
and are related to the outcome of the application of a fast bound-tightening
technique (Feasibility-Based Bound Tightening [5]) using the same subinterval.
We provide preliminary computational results to assess the practical effi-
ciency of the approach. The experiments show that the proposed probing algo-
rithm is very effective in tightening the variable bounds, and it is helpful for
solving MINLPs with Branch-and-Bound. By using SVM to predict failures of
the probing algorithm, we save on average 30% of the total bound-tightening
time, without much deterioration of the quality of the bounds.
The rest of this paper is organized as follows. In Section 2, we introduce
the necessary background. In Section 3, we describe the probing algorithm. In
Section 4, we discuss how we can integrate a machine learning method in our
algorithm to save CPU time. In Section 5, we provide computational testing of
the proposed ideas and Section 6 has conclusions.
2 Background
A function is factorable if it can be computed in a finite number of simple steps,
starting with model variables and real constants, using elementary unary and
binary operators. We consider an MINLP of the form:

    min    f(x)
    s.t.   g_j(x) ≤ 0                ∀j ∈ M
           x_i^L ≤ x_i ≤ x_i^U       ∀i ∈ N          (P)
           f(x) ≤ z̄
           x_i ∈ Z                   ∀i ∈ N_I,

where f and the g_j are factorable functions, N = {1,...,n} is the set of variable indices, M = {1,...,m} is the set of constraint indices, x ∈ R^n is the vector of variables with lower/upper bounds x^L ∈ (R ∪ {−∞})^n, x^U ∈ (R ∪ {+∞})^n, and z̄ is an upper bound on the optimal objective value, which can be infinite. The variables with indices in N_I ⊂ N are constrained to take on integer values in the solution.
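For illustration, the ingredients of problem P can be collected in a small container; the class and the toy instance below are ours, not from the paper, and serve only to make the formulation concrete:

```python
import math

class MINLP:
    """Container for an instance of P: min f(x) s.t. g_j(x) <= 0,
    x^L <= x <= x^U, f(x) <= z_bar, and x_i integer for i in N_I."""

    def __init__(self, f, g, xL, xU, NI, z_bar=math.inf):
        self.f, self.g = f, g          # objective and list of constraint callables
        self.xL, self.xU = xL, xU      # bound vectors (entries may be +-inf)
        self.NI = set(NI)              # indices of integer-constrained variables
        self.z_bar = z_bar             # upper bound on the optimal objective value

    def is_feasible(self, x, tol=1e-8):
        """Check all constraints of P at the point x, up to tolerance tol."""
        if any(gj(x) > tol for gj in self.g):
            return False
        if any(not (l - tol <= xi <= u + tol)
               for xi, l, u in zip(x, self.xL, self.xU)):
            return False
        if self.f(x) > self.z_bar + tol:
            return False
        return all(abs(x[i] - round(x[i])) <= tol for i in self.NI)

# A toy nonconvex instance: min x0*x1 s.t. x0^2 + x1^2 <= 4, x1 integer.
P = MINLP(f=lambda x: x[0] * x[1],
          g=[lambda x: x[0] ** 2 + x[1] ** 2 - 4],
          xL=[-2.0, -2.0], xU=[2.0, 2.0], NI=[1])
```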
A Linear Programming (LP) based Branch-and-Bound algorithm can be used
to solve P [4]. In such a method, subproblems of P are generated by restricting
the variables to reduced interval domains, [x̄^L, x̄^U] ⊂ [x^L, x^U]. A key step is the
creation of an LP relaxation of the feasible region of a subproblem, which we
refer to as convexification. This convexification is used to obtain a lower bound
on the optimal objective value of the subproblem. In general, the tighter the
variable bounds, the tighter the convexification, and the stronger the resulting
lower bound. Therefore, bound-tightening techniques aim to deduce improved
variable bounds implied by the constraint structure of the subproblem, and are
widely used by existing software, such as Baron [10] and Couenne [11], for the
solution of MINLPs.
A commonly used bound-tightening procedure is Feasibility-Based Bound
Tightening (FBBT), which uses a symbolic representation of the problem in
order to propagate bound changes on a variable to other variables. For instance,
suppose that P contains the equation x_3 = x_1 + x_2, with variable bounds x_1 ∈ [0,1], x_2 ∈ [0,3], x_3 ∈ [0,4]; if we tighten the bounds on x_2 and restrict this variable to the interval [1,2], then we can propagate the change to x_3 and impose x_3 ∈ [1,3]. A full description of FBBT can be found in [12,13].
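The x_3 = x_1 + x_2 example corresponds to one propagation step over an interval sum. A minimal sketch of such a step (our own illustration, not Couenne's FBBT implementation):

```python
def propagate_sum(b1, b2, b3):
    """One FBBT step for the equation x3 = x1 + x2: intersect each variable's
    interval with the interval implied by the other two.
    Intervals are (lo, hi) pairs."""
    (l1, u1), (l2, u2), (l3, u3) = b1, b2, b3
    # x3 = x1 + x2 lies in [l1+l2, u1+u2]; x1 = x3 - x2 lies in [l3-u2, u3-l2];
    # x2 = x3 - x1 lies in [l3-u1, u3-l1].
    b3 = (max(l3, l1 + l2), min(u3, u1 + u2))
    b1 = (max(l1, l3 - u2), min(u1, u3 - l2))
    b2 = (max(l2, l3 - u1), min(u2, u3 - l1))
    return b1, b2, b3
```

With the bounds from the text (x_1 ∈ [0,1], x_2 restricted to [1,2], x_3 ∈ [0,4]), one step tightens x_3 to [1,3].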
The other aspects of the Branch-and-Bound algorithm are similar to those
of any Branch-and-Bound for solving MILPs; see [5] for more details.
3 The probing algorithm

In this section we describe the probing algorithm to increase the lower bound on variable x_i, where the current bounds on that variable are x_i ∈ [x_i^L, x_i^U] with x_i^L > −∞. The special case x_i^L = −∞ is treated below. The probing algorithm for decreasing the upper bound is similar. For simplicity, we describe the procedure applied to the root node P.

Let ℓ and u be such that x_i^L ≤ ℓ ≤ u ≤ x_i^U. We denote by P_i[ℓ,u] the problem obtained from P by adding the constraint x_i ∈ [ℓ,u]. For s > 0, an s-probing iteration for x_i consists of the following: set ℓ = x_i^L, u = min{x_i^L + s, x_i^U}, and perform a Branch-and-Bound search on P_i[ℓ,u] with a time limit. If we have then proved that P_i[ℓ,u] is infeasible, we update the lower bound x_i^L ← u. If we are able to solve P_i[ℓ,u] to optimality, finding a solution with objective value z*, we update the best incumbent value z̄ ← z* and the lower bound x_i^L ← u. In both cases, the s-probing iteration is deemed a success. Otherwise, it is a failure.
The Aggressive Probing algorithm for variable x_i (see Algorithm 1) takes an initial value for s as input and runs an s-probing iteration. While an exit condition is not met, if the s-probing iteration is successful, the value of s is doubled and a new s-probing iteration is executed. If an s-probing iteration fails, the value of s is halved and a new s-probing iteration is performed.

For integer variables, we round the probing interval endpoints appropriately. Additionally, if the search on P_i[ℓ,u] is completed and u is integer, we set x_i^L ← u + 1 instead of x_i^L ← u.

With sufficiently large time limits, Algorithm 1 will provide an optimality certificate if z̄ is the optimal objective value. If run to completion, the algorithm proves that no better solution exists with x_i in the interval [x_i^L, x_i^U].

The special case x_i^L = −∞ is handled as follows. Define a positive number B. We perform a Branch-and-Bound search on P_i[−∞,−B]. If P_i[−∞,−B] is proved infeasible or solved to optimality within the time limit, we set x_i^L ← −B and execute Algorithm 1. Otherwise, we conclude that x_i^L cannot be tightened. In our experiments, we use B = 10^10.
Two details of the algorithm still need to be specified: the exit condition and
the initial choice of s. We use two exit conditions: a maximum CPU time for the
application of Aggressive Probing, and a maximum number of consecutive
failed s-probing iterations. We experimented with several choices for the initial
value of s that took into account the distance between the variable bounds and
the solution of the LP relaxation (or a feasible solution to P, if available). But
the results were not better than a simpler method that seems to work well: the
initial value of s is chosen to be a small, fixed value size, depending on the
variable type. In our experiments, we use size = 0.5 for continuous variables,
size = 1.0 for general integer variables, and size = 0.0 for binary variables.
The update strategy and the initial choice for s work well because the interval length is adjusted dynamically and geometrically during Aggressive Probing. If the initial interval can easily be proven infeasible, the s-probing iteration will terminate very quickly, typically with a single
application of FBBT, or at the root node by solving the LP relaxation. In this
case, because the interval size is increased in a geometric fashion, in a few it-
erations we will reach the scale that is needed for the probing interval to be
“not trivially infeasible”. On the other hand, using a small interval size yields
better chances of completing the s-probing iteration within the time limit, in
case Pi[ℓ,u] is difficult to solve even with u close to ℓ.
In our experiments, we set the time limit for the Branch-and-Bound search
during an s-probing iteration to min{(2/3)·time_limit, time_limit − current_time}.
This avoids investing all of the CPU time in the first probing iteration in cases
where the initial interval-size guess is too large.
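These two choices amount to the following small helpers (a sketch; the function names are ours, the constants are those reported in the text):

```python
def initial_interval_size(var_type):
    """Initial value of s by variable type, as used in the experiments."""
    return {"continuous": 0.5, "integer": 1.0, "binary": 0.0}[var_type]

def iteration_time_limit(total_limit, elapsed):
    """Time budget for the next s-probing Branch-and-Bound search:
    min{2/3 of the probing time limit, time remaining}. This prevents the
    first iteration from consuming the whole budget."""
    return min(2.0 / 3.0 * total_limit, total_limit - elapsed)
```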
Algorithm 1 The Aggressive Probing algorithm.

Input: variable index i, time_limit, size, max_failures
Set s ← size, fail ← 0
while current_time < time_limit and fail < max_failures and x_i^L < x_i^U do
    Set ℓ ← x_i^L
    Set u ← min{ℓ + s, x_i^U}; if x_i is integer constrained, round u ← ⌊u⌋
    Execute limited-time Branch-and-Bound on P_i[ℓ,u]
    if solution x̄ found then
        Set z̄ ← min{z̄, f(x̄)}
    if search complete then
        if x_i is integer constrained then
            Set x_i^L ← u + 1
        else
            Set x_i^L ← u
        Set s ← 2s, fail ← 0
    else
        Set s ← s/2, fail ← fail + 1
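In code, the doubling/halving loop of Algorithm 1 might look as follows; `solve_truncated_bb` is a hypothetical stand-in for the limited-time Branch-and-Bound on P_i[ℓ,u], and the exit conditions are simplified to a callback:

```python
import math

def aggressive_probing(xL, xU, solve_truncated_bb, time_left, size=0.5,
                       max_failures=10, is_integer=False):
    """Sketch of Algorithm 1 for tightening the lower bound of one variable.
    solve_truncated_bb(lo, hi) -> (search_complete, incumbent_or_None) stands
    in for the truncated Branch-and-Bound on P_i[lo, hi]; time_left() returns
    the remaining CPU budget."""
    s, fail, z_bar = size, 0, math.inf
    while time_left() > 0 and fail < max_failures and xL < xU:
        lo = xL
        hi = min(lo + s, xU)
        if is_integer:
            hi = math.floor(hi)
        complete, incumbent = solve_truncated_bb(lo, hi)
        if incumbent is not None:            # feasible solution found
            z_bar = min(z_bar, incumbent)
        if complete:                         # P_i[lo, hi] solved or infeasible
            xL = hi + 1 if is_integer else hi
            s, fail = 2 * s, 0               # success: double the interval
        else:
            s, fail = s / 2, fail + 1        # failure: halve the interval
    return xL, z_bar
```

For example, against a fake solver that completes only when the probed interval lies below 2, the loop pushes the lower bound up to 2 and then stops after max_failures consecutive failures.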
4 Support Vector Machine for failure prediction
Applying Aggressive Probing to tighten all variables of even a moderately-
sized MINLP can take a considerable amount of CPU time. We would like to
avoid wasting time in trying to tighten the bounds of a variable for which there
is little hope of success.
Observe that if a probing subproblem P_i[ℓ,u] is not solved to optimality, the CPU time invested in that probing iteration is wasted. To avoid this situation, we suggest using the degree of success of the fast FBBT algorithm on P_i[ℓ,u] as a factor in deciding whether or not to run the expensive Aggressive Probing algorithm. Note that FBBT is the first step of the Branch-and-Bound algorithm used in Aggressive Probing, so this does not require additional work. Our hypothesis is that if the constraint x_i ∈ [ℓ,u] used during an s-probing iteration does not result in tighter bounds on other variables when FBBT is used, then P_i[ℓ,u] is approximately as difficult as P, so the limited-time Branch-and-Bound algorithm is likely to fail. This intuition is confirmed by empirical tests: on 84 nontrivial MINLP instances taken from various sources, we perform Aggressive Probing with max_failures = 10 to tighten the lower and upper bounds of all variables, processing them in the order in which they appear in the problem, with a time limit of 1 minute per variable and a total time limit of 1 hour per instance. We record, for each s-probing iteration, whether FBBT is able to use the probing interval x_i ∈ [ℓ,u] to tighten the bounds on other variables. We observe that, in 587 cases out of 11,747, no stronger bounds are obtained. In 528 of these 587 cases (90%), the subsequent Branch-and-Bound search on P_i[ℓ,u] could not be completed within the time limit. Therefore, the success of FBBT in using x_i ∈ [ℓ,u] to tighten other variable bounds indeed gives an indication of the difficulty of solving P_i[ℓ,u].
Supported by this observation, we use the following strategy. Before applying Branch-and-Bound to P_i[ℓ,u], we execute FBBT and compute a measure of the bound reduction obtained on the variables x_j, j ∈ N \ {i}. Based on this measure, we use an algorithm to decide whether to perform Branch-and-Bound on P_i[ℓ,u].
In the remainder of this section we discuss our choice of the bound reduction
measure and the decision method.
4.1 Measuring the effect of FBBT
Several bound-reduction measures are possible. Because our aim is to save CPU
time, the bound-reduction measure computation should be fast. A simple way
of measuring the bound reduction obtained for the variables x_j, j ∈ N \ {i}, is to count the number of tightened variables and, for each of these, to compute the interval reduction γ(x_j) = 1 − (x̃_j^U − x̃_j^L)/(x_j^U − x_j^L), where x̃^L (resp. x̃^U) are the vectors of variable lower (resp. upper) bounds after applying FBBT, and x^L, x^U are those of the original problem P. (Infinity is treated like a number in the following way: 1/∞ = 0, ∞/2∞ = 0.5.) Hence, to quantify the bound reduction associated with the application of FBBT on the problem P_i[ℓ,u], we use a vector (η,ρ) ∈ [0,1]², where η is the fraction of tightened variables, and ρ is the average value of γ(x_j) over all x_j that were successfully tightened.
4.2 Support Vector Machines
Once a vector (η,ρ) is computed for variable x_i, the decision of whether to perform Branch-and-Bound is taken by a predictor trained by a Support Vector
Machine (SVM). While it could be argued that SVM is not really required for
classifying our 2-dimensional data, we use SVM for three reasons. First, in our
experiments SVM performs better than a predictor based on a simple Gaussian
model for the data, see Section 5.2. Second, our future research efforts will utilize
additional input features besides (η,ρ) (see Section 6 for details); therefore, the
flexibility and extensibility of SVM is desirable. Finally, SVM is a parameterized
method allowing better control of the trade-off between Precision and Recall
of the classifier (see below for details) than simpler methods. Because we are
interested in a classifier with high Precision and good Recall, the ability to tune
the classifier is an advantage.
Next, we provide a brief description of the basic concepts behind SVM; see
[14,15] for a comprehensive introduction to the topic. Given training data D = {(z_i, y_i) : z_i ∈ R^p, y_i ∈ {−1,1}, i = 1,...,q}, SVM seeks an affine hyperplane that separates the points with label −1 from those with label 1 by the largest amount, i.e., the width of the strip containing the hyperplane and no data point in its interior is as large as possible.
In its simplest form, the associated optimization problem can be written as:

    min    ‖h‖²
    s.t.   y_i(h⊤z_i − b) ≥ 1       ∀(z_i, y_i) ∈ D          (SVM)
           h ∈ R^p, b ∈ R,
where the hyperplane is defined by h⊤z = b. Instead of seeking a separating hyperplane in R^p, which may not exist, SVM implicitly maps each data point into a higher-dimensional feature space where linear separation may be possible. The mapping is implicit because we do not need explicit knowledge of the feature space. In the optimization problem (SVM), we express the separating hyperplane in terms of the training points z_i (see, e.g., [15]), and substitute the dot-products between vectors in R^p with a possibly nonlinear kernel function K : R^p × R^p → R. The kernel function can be interpreted as the dot-product in the higher-dimensional space. The separation hyperplane in the feature space translates into a nonlinear separation surface in the original space R^p. Furthermore, SVM handles data that is not separable in the feature space by using a soft margin, i.e., allowing the optimal separation hyperplane to misclassify some points, imposing a penalty for each misclassification. The outcome of the SVM training algorithm is a subset V of {z_i : ∃y ∈ {−1,1} with (z_i, y) ∈ D} with corresponding scalar multipliers α_v, v ∈ V, and a scalar b. The elements of V are called support vectors. To classify a new data point w ∈ R^p, we compute the value of Σ_{v∈V} α_v K(v,w) − b and use its sign to classify w. Hence, the complexity of storing an SVM model depends on the number of support vectors |V|, and the time required to classify a new data point depends on |V| and K.

Commonly used kernel functions are:
– linear: K(u,v) = u⊤v,
– polynomial: K(u,v) = (λu⊤v + β)^d,
– radial basis: K(u,v) = e^(−λ‖u−v‖²),
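These kernels, together with the decision function sign(Σ_v α_v K(v,w) − b), are straightforward to write down. A plain-Python sketch (our own illustration, not LIBSVM's implementation):

```python
import math

# The three kernels listed above; lam, beta, d are the tuning parameters.
def linear_kernel(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def polynomial_kernel(u, v, lam=1.0, beta=0.0, d=2):
    return (lam * linear_kernel(u, v) + beta) ** d

def radial_basis_kernel(u, v, lam=1.0):
    return math.exp(-lam * sum((ui - vi) ** 2 for ui, vi in zip(u, v)))

def svm_predict(support_vectors, alphas, b, kernel, w):
    """Classify point w by the sign of sum_v alpha_v * K(v, w) - b."""
    score = sum(a * kernel(v, w) for v, a in zip(support_vectors, alphas)) - b
    return 1 if score >= 0 else -1
```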
where λ,β and d are input parameters. Problem-specific kernel functions can
be devised as well. Another commonly adjusted tuning parameter is the mis-
classification cost ω, which determines the ratio between the penalty paid for
misclassifying an example of label 1 and the penalty paid for misclassifying an
example of label −1. The ratio ω can be adjusted to handle unbalanced data
sets where one class is much more frequent than the other.
4.3 Aggressive probing failure prediction with SVM
In this section we assume that we have an SVM model trained on a data set of the form D = {(η_i, ρ_i, y_i) : (η_i, ρ_i) ∈ [0,1]², y_i ∈ {−1,1}, i = 1,...,q}, where each point corresponds to an s-probing iteration, η_i, ρ_i are as defined in Section 4.1, and y_i = 1 if the limited-time Branch-and-Bound search applied to the corresponding probing subproblem did not complete, y_i = −1 otherwise. Generating the set D and computational experiments with model training will be discussed in Section 5.
Given such an SVM model, we proceed as follows. At an s-probing iteration corresponding to problem P_i[ℓ,u], we apply FBBT and compute the resulting w_j = (η_j, ρ_j) as described in Section 4.1. If w_j = (0,0), FBBT could not tighten the bounds on any variable; in this case, as discussed at the beginning of Section 4, we do not execute Branch-and-Bound and continue to the subsequent probing iteration as if the s-probing iteration had failed. If w_j ≠ (0,0), we predict the label y_j of w_j using our SVM classifier. The Branch-and-Bound search on P_i[ℓ,u] is thus executed only if the predicted label is y_j = −1; otherwise, we continue with the algorithm as if the s-probing iteration had failed.

Note that we could apply the SVM classifier even on points of the form (0,0). However, in our experiments this point was always labeled as 1 by the tested SVM models, so we save CPU time by not running the SVM predictor. Additionally, we exclude points (0,0,y_i) from the data set D on which the model is trained; this yields an additional advantage that will be discussed in Section 5.
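The resulting decision rule is compact; in this sketch, `predict_label` is a hypothetical stand-in for the trained SVM classifier (label 1 = predicted failure, −1 = predicted success):

```python
def should_run_branch_and_bound(eta, rho, predict_label):
    """Decide whether to run the truncated Branch-and-Bound on P_i[l,u].
    predict_label(eta, rho) -> 1 or -1 stands in for the trained SVM model."""
    if (eta, rho) == (0.0, 0.0):
        return False            # FBBT tightened nothing: treat as a failure
    return predict_label(eta, rho) == -1
```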
5 Computational experiments
We implemented Aggressive Probing within Couenne, an open-source Branch-
and-Bound solver for nonconvex MINLPs [11]. We are mainly interested in ap-
plying our probing technique to difficult instances P to improve a Branch-and-
Bound search; thus, in our implementation Aggressive Probing reuses as
much previously computed information as possible. The root node of each prob-
ing subproblem Pi[ℓ,u] is generated by modifying the root node of P, changing
variable bounds and possibly generating new linear inequalities to improve the
convexification, so that the problem instance is read and processed only once.
The branching strategy of Couenne was set to Strong Branching [5,16] in all
experiments.
We utilized LIBSVM [17], a library for Support Vector Machines written
in C. Given the availability of LIBSVM’s source code, it could be efficiently
integrated within Couenne for our tests. The experiments were conducted on a
2.6 GHz AMD Opteron 852 machine with 64GB of RAM, running Linux.
5.1 Test instances
The test instances are a subset of MINLPLib [18], a freely available collection
of convex and nonconvex MINLPs. We excluded instances with more than 1,000
variables and instances for which the default version of Couenne took more than
2 minutes to process the root node, or ran into numerical problems. Additionally,
we excluded the instances for which Aggressive Probing was able to find the
optimal solution and provide an optimality certificate in less than 2 hours. These
are easy instances that can be quickly solved by default Couenne, therefore there
is no need for expensive bound-tightening methods. We are left with 32 instances,
which are listed in Table 1.
5.2 Training the SVM classifier
As a first step in training an SVM to classify failures of the probing algorithm,
we obtained a large-enough set of training examples. We used a superset of the
test problems described in Section 5.1, including some additional problems from
MINLPLib as well as problems from [19] with less than 1,000 variables, giving a
[Table 1: per-instance results; the numeric entries of the table could not be recovered from the source. The 32 test instances are csched2, fo7_2, fo7, fo8_ar2_1, fo8_ar25_1, fo8_ar3_1, fo8_ar5_1, fo8, fo9_ar2_1, fo9_ar25_1, fo9_ar3_1, fo9_ar4_1, fo9_ar5_1, fo9, lop97icx, nvs23, nvs24, o7_2, o7_ar4_1, o7, o8_ar4_1, o9_ar4_1, space25a, tln12, tln5, tln6, tln7, tls12, tls6, water4, waterx, waterz.]
Table 1. Performance of Aggressive Probing, with and without failure prediction by SVM. We report, after probing (columns
“Probing”) and after applying FBBT and recomputing the convexification (columns “Probing + FBBT + Conv.”): the fraction of
tightened variables with finite bounds, and the average bound reduction. We also report the amount of optimality gap closed by the new
convexification, and the total probing time.
total of 84 instances. We applied Aggressive Probing on all variables, with a
time limit of 30 seconds for each s-probing iteration, 1 minute for each variable
bound, and 2 hours per problem instance. We did not include data for s-probing
iterations started with less than 20 seconds of CPU time left within the time
limit, or iterations in which a feasible MINLP solution was discovered during the
Branch-and-Bound search. For the remaining s-probing iterations, we recorded the values of (η_i, ρ_i) (see Section 4.1) and a label y_i = 1 if the probing iteration failed, y_i = −1 if it succeeded. The reason for excluding s-probing iterations performed with less than 20 seconds of CPU time left is that they are likely to fail simply because they are not given enough time to complete, regardless of the difficulty of the s-probing subproblem. Similarly, we excluded s-probing iterations in which an improved solution was found, because such a discovery cannot be predicted by considering only the s-probing subproblem, yet it can be used to infer tighter variable bounds through FBBT, therefore making P_i[ℓ,u] easier than initially estimated. This yields a data set D with q = 11,747 data points that can be used for training, as explained in Section 4.3. Eliminating all points with (η_i, ρ_i) = (0,0) (see end of Section 4.3) leaves 11,160 points, of which 4,186 have the label y_i = 1. By removing the points (0,0) from the training set, the number of support vectors in the final model is likely to be smaller.
It is known that SVM is very sensitive to its algorithm settings, hence a grid
search on a set of input parameters is typically applied in order to find the values
that yield the best performance on the input data. We tested three types of kernel
functions: linear, polynomial, and radial. For each of these kernel functions, we
performed grid search on the input parameters (see end of Section 4.2), using
the following values: λ = 2^k with k = −3,...,2, β = 2^k with k = −3,...,2, d = 1,...,5, and ω = 2^k with k = −3,...,4. Each parameter was considered only when appropriate; e.g., d was used for the polynomial kernel only. Overall, we tested 1,057 combinations of input parameters.
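A naive enumeration of this grid can be sketched as follows. Note that it yields more combinations than the 1,057 reported, since the paper does not spell out exactly which parameter subsets were varied for each kernel; the grouping below is our guess:

```python
from itertools import product

lams = [2.0 ** k for k in range(-3, 3)]    # lambda = 2^k, k = -3,...,2
betas = [2.0 ** k for k in range(-3, 3)]   # beta = 2^k, k = -3,...,2
ds = list(range(1, 6))                     # d = 1,...,5
omegas = [2.0 ** k for k in range(-3, 5)]  # omega = 2^k, k = -3,...,4

# (kernel, lambda, beta, d, omega); None marks a parameter the kernel ignores.
grid = (
    [("linear", None, None, None, w) for w in omegas]
    + [("polynomial", l, b, d, w)
       for l, b, d, w in product(lams, betas, ds, omegas)]
    + [("radial", l, None, None, w) for l, w in product(lams, omegas)]
)
```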
In our first set of experiments pertaining to the training of an SVM on
D, we performed 3-fold cross validation. We trained the model on 2/3 of D,
and used the remaining 1/3 to estimate the performance of the model. The
resulting model consisted of 4,500 to 5,000 support vectors. Such a large number
of support vectors would yield a slow classifier and may indicate overfitting. To
obtain a model with fewer support vectors, we first attempted ν-classification
[14], without success. In the end, training the model on a small subset of the
full data set D was found to be very effective in reducing the number of support
vectors, without deterioration in the accuracy of the model; this approach has
been used before in the machine learning community [20].
With this setup, experiments for testing parameter values of the model were
performed as follows. We randomly selected 10 different training sets, each one
containing 1/10 of the full data set D. Each training experiment, corresponding
to a set of input parameter values, was performed on each of the 10 training
sets, and the performance of the resulting models was evaluated on the 9/10 of
D that was not used for training. To measure performance, we use Precision and
[Figure: scatter plot of Precision vs. Recall, both in %, axes 0-100, with
separate markers for the polynomial, linear, and radial-basis kernels and for
the chosen model.]
Fig. 1. Average values of Precision and Recall for all tested combinations of training
input parameters.
Recall, commonly defined as follows:
Precision: TP/(TP + FP),    Recall: TP/(TP + FN),
where TP is the number of True Positives, i.e., the examples with label 1 that
are classified with label 1 by the model. FP is the number of False Positives,
i.e., the examples with label −1 that are classified with label 1. Finally, FN is
the number of False Negatives, i.e., the examples with label 1 that are classified
with label −1. Intuitively, Precision is the fraction of data points labeled 1 by
the classifier that are indeed of class 1, whereas Recall is the fraction of class 1
data points processed by the classifier that are correctly labeled as 1. Overall,
for each set of training parameters, we have 10 values for Precision and 10 values
for Recall. We compute the average and the standard deviation of these values,
and use them to choose the best set of parameters.
Results of this experiment are summarized in Figure 1. Each point represents
the average values of Precision and Recall corresponding to a set of parameter
values. When producing the figure, we eliminated points for which the standard
deviation of either Precision or Recall was more than 1/4 of its mean, because
these points correspond to experiments with unreliable results.
Figure 1 shows the trade-off between Precision and Recall that can be achieved
by varying the learning parameters. We are interested in the set of Pareto op-
tima with respect to these two criteria. Most Pareto optima are obtained with
a polynomial kernel, and the remaining with a radial-basis kernel. The linear
kernel yields inferior results, implying that the data set is difficult to separate
in the original space. There are points with very high Precision (> 85%) but
low Recall that represent “conservative” classifiers: very few probing iterations
are labeled as 1 (failure), but in that case the classifier is almost always correct.
Such a classifier is of limited value for our purpose. We are more interested in
the region with roughly 80% Precision and 60% Recall: approximately 60% of
the unsuccessful probing iterations in the test set are predicted correctly, while
keeping good Precision. These models use a polynomial kernel with degree d ≥ 3
and ω = 1; additionally, we found that using β = 1, 2, or 4 seems effective. These
models have between 500 and 800 support vectors. The standard deviation of
Precision and Recall for all models achieving a Pareto optimum is fairly small,
typically less than 2. Therefore, we can assume that the performance of the SVM
model does not depend heavily on the particular subset of D that is used for
training.
For comparison, we also fit a simple 2-dimensional Gaussian model: each
class is assumed to be normally distributed, and a 2-dimensional Gaussian is
fit to each class using maximum-likelihood estimation. Then,
we classify points in the corresponding test set by computing the probability that
they are generated by the two normal distributions and by picking the class that
maximizes this probability. Over the 10 training/test sets, the Gaussian model
gives a classifier with mean Precision 47.90%, standard deviation 0.91, mean
Recall 87.47%, standard deviation 1.31. This performance is comparable to a
particular choice of parameters of SVM to obtain high Recall and low Precision.
Computational experiments (not reported in detail) demonstrate that using the
Gaussian model leads to weaker bound tightening compared to SVM, with no
saving of CPU time on average.
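This Gaussian baseline can be sketched in pure Python; the function names are ours, and the closed-form 2x2 inverse below is standard linear algebra, not taken from the paper's code:

```python
from math import log, pi

def fit_gaussian_2d(points):
    """Maximum-likelihood mean and covariance of 2-D points [(x, y), ...]."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    return (mx, my), (sxx, sxy, syy)

def log_density(p, mean, cov):
    """Log of the 2-D normal density at point p."""
    (mx, my), (sxx, sxy, syy) = mean, cov
    det = sxx * syy - sxy * sxy
    dx, dy = p[0] - mx, p[1] - my
    # Quadratic form (p - mu)^T Sigma^{-1} (p - mu), via the 2x2 inverse.
    q = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return -log(2.0 * pi) - 0.5 * log(det) - 0.5 * q

def classify(p, model_pos, model_neg):
    """Pick the class whose fitted Gaussian assigns higher density to p."""
    return 1 if log_density(p, *model_pos) > log_density(p, *model_neg) else -1
```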
Based on these results, we use an SVM model trained with a polynomial
kernel of degree 4, λ = 4, β = 4, ω = 1 for the experiments in the remainder of
this section; this model has 580 support vectors, yielding fast classification. Note
that 580 is almost half the size of the training set, suggesting that some overfit-
ting might occur. However, the model shows good performance on examples not
included in the training set.
5.3 Testing the probing algorithm
In this section, we discuss the effect of applying Aggressive Probing on a
set of difficult MINLPs. In addition to FBBT, we also use Optimality-Based
Bound Tightening (OBBT) [12]. This bound-tightening technique maximizes
and minimizes the value of each variable over the convexification computed by
Couenne at the root node, and uses the optimal values as variable bounds. For
each test instance, we first apply FBBT and OBBT. Then, for each variable, we
apply Aggressive Probing to tighten both the lower and upper bounds, with
a time limit of 60 seconds per variable, and 36 hours per instance. The parameter
max_failures is set to 10. Variables are processed in the order in which they
are stored in Couenne. Note that Couenne uses a standardized representation
of the problem where extra variables, called auxiliary variables, are typically
added to represent expressions in the original problem formulation [5]. In our
experiments, to limit CPU time, OBBT and Aggressive Probing are applied
only to original variables; in principle, both can be applied to auxiliary variables
without modification.
After Aggressive Probing has been applied to all of the original variables
or the global time limit is reached, we record the fraction of tightened variables η,
and the average bound reduction ρ, as described in Section 4.1. Then, we apply
an additional iteration of FBBT to propagate the new bounds and generate
convexification inequalities. This gives a strengthened convexification C′ of P
that is compared to the initial one, C. We record the fraction of variables for
which at least one bound could be tightened in C′, as well as the average bound
reduction ρ of the tightened variables. Additionally, we compute the percentage
of the optimality gap of C that is closed by C′, i.e., (z(C′) − z(C))/(z(P) − z(C)),
where z(C) is the optimal objective value of C, and z(P) is the value of the best
known solution for the particular instance. The value of z(P) for each instance
was obtained from the MINLPLib website.
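The gap-closed measure is a one-line computation; the numbers in the example below are illustrative, not from Table 1:

```python
def gap_closed(z_C, z_Cprime, z_P):
    """Fraction of the optimality gap of the initial convexification C closed
    by the tightened convexification C': (z(C') - z(C)) / (z(P) - z(C)),
    where z(P) is the best known solution value."""
    return (z_Cprime - z_C) / (z_P - z_C)

# E.g., if the relaxation bound improves from 10 to 55 against a best known
# solution of value 100, half of the initial gap is closed:
half_closed = gap_closed(10.0, 55.0, 100.0)  # 0.5
```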
Results are reported in Table 1. The fraction of tightened variables is relative
to the number of original variables for “Probing”. For “Probing + FBBT +
Conv.”, it is relative to the total number of variables, because auxiliary variables
can also be tightened after bound propagation through FBBT. The fraction of
tightened variables takes into account variables with finite bounds only. Infinite
variable bounds are tightened to a finite value only for the three water instances,
independent of whether SVM is used.
First, we discuss the effect of Aggressive Probing alone. Table 1 shows
that the effect of probing is problem-dependent; for example, for lop97icx, no
variable is tightened by our algorithm, and for nvs23 and nvs24, more than 90%
of the variables are tightened. On average, approximately 25% of the original
variables are tightened by Aggressive Probing, and after applying FBBT,
approximately 30% of the total number of variables (original plus auxiliary)
gained tighter bounds. The average bound reduction is close to 50%. The amount
of optimality gap closed by adding convexification inequalities after tightening
the bound is largely problem dependent as well. The new convexification is
much stronger for the water, nvs and csched2 instances, but for the remaining
instances, the optimality gap is unchanged. This is probably due to the geometry
of the initial convexification, for which the LP solution is extremely difficult to
cut off without branching, so that no optimality gap is closed by Aggressive
Probing. In summary, on all but one test instance, Aggressive Probing is
able to provide better variable bounds compared to traditional bound-tightening
techniques (FBBT followed by OBBT). This comes at a large computational cost,
but may be worth the effort for some difficult instances that cannot be solved
otherwise, or when parallel computing resources provide a large amount of CPU
power.
Table 2. Number of successful and failed s-probing iterations recorded by applying
Aggressive Probing on the full test set of Table 1.

              # s-probing iterations
              Success    Failure
Without SVM      1600      18131
With SVM         1634       8998

Comparing the Aggressive Probing algorithm with and without SVM
for failure prediction, we observe on average a 30% computing-time saving
when using SVM, while the number of tightened variables and the average bound
tightening are only slightly weaker. CPU time savings are problem dependent: the
difference can be huge (csched2a, lop97icx), or negligible. In only two cases
(nvs24 and tln6), using SVM for failure prediction results in an overall longer
probing time, but the increase is negligible. Summarizing, using an SVM model
to predict likely failures of the Aggressive Probing algorithm leads to CPU
time savings that depend on the problem instance at hand and are sometimes
very large, sometimes moderate, while variable bounds are tightened by almost
the same amount.
Table 2 reports the total number of successful and failed s-probing iterations
performed over all test instances. The use of an SVM classifier decreases the
number of failed s-probing iterations by a factor two, and increases the percent-
age of successful s-probing iterations from 8% to 15%. These improvements come
at essentially no cost.
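The quoted percentages follow directly from the counts in Table 2:

```python
# Success/failure counts of s-probing iterations from Table 2.
without_svm = {"success": 1600, "failure": 18131}
with_svm    = {"success": 1634, "failure": 8998}

def success_rate(counts):
    """Fraction of s-probing iterations that succeeded."""
    return counts["success"] / (counts["success"] + counts["failure"])

rate_without = success_rate(without_svm)  # about 8%
rate_with = success_rate(with_svm)        # about 15%
```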
5.4 Branch-and-Bound after probing
The main purpose of a bound-tightening technique is to improve the performance
of a Branch-and-Bound search. In this section, we report Branch-and-Bound ex-
periments with and without Aggressive Probing on a few selected instances.
Table 1 indicates that the probing algorithm proposed in this paper may be
effective on the three water instances. Therefore, we execute the Branch-and-
Bound algorithm of Couenne on these instances with a time limit of 24 hours,
using the variable bounds obtained after applying FBBT and OBBT at the root
node. Then we perform the same experiment using the variable bounds provided
by Aggressive Probing with SVM for failure prediction. Results are reported
in Table 3, where we include the time spent by probing in the total CPU time.
The water4 instance is solved with and without Aggressive Probing;
Branch-and-Bound without probing is 30% faster, but it explores 20 times as
many nodes. Thus, probing is very effective in reducing the size of the enumer-
ation tree. The waterx instance remains unsolved after 24 hours. However, em-
ploying Aggressive Probing yields a much better lower bound when the time
limit is reached (we close an additional 37% of optimality gap). Finally, waterz
is not solved by Branch-and-Bound unless Aggressive Probing is used. Due
to tighter variable bounds, we can solve the instance to optimality in approx-
imately 12 hours, whereas it is unsolved in 24 hours (with 1.2 million active
nodes and 23% optimality gap left) if Aggressive Probing is not employed.
To the best of our knowledge, an optimality certificate for the solutions to the