RC25103 (W1101-112) January 28, 2011

Computer Science

IBM Research Report

A Probing Algorithm for MINLP with Failure

Prediction by SVM

Giacomo Nannicini1, Pietro Belotti2, Jon Lee3, Jeff Linderoth4,

François Margot1, Andreas Wächter3

1Tepper School of Business

Carnegie Mellon University

Pittsburgh, PA

2Department of Mathematical Sciences

Clemson University

Clemson, SC

3IBM Research Division

Thomas J. Watson Research Center

P.O. Box 218

Yorktown Heights, NY 10598

4Industrial and Systems Engineering

University of Wisconsin-Madison

Madison, WI

Research Division

Almaden - Austin - Beijing - Cambridge - Haifa - India - T. J. Watson - Tokyo - Zurich

LIMITED DISTRIBUTION NOTICE: This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research

Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific

requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g. , payment of royalties). Copies may be requested from IBM T. J. Watson Research Center , P.

O. Box 218, Yorktown Heights, NY 10598 USA (email: reports@us.ibm.com). Some reports are available on the internet at http://domino.watson.ibm.com/library/CyberDig.nsf/home .

A probing algorithm for MINLP

with failure prediction by SVM

Giacomo Nannicini1⋆, Pietro Belotti2, Jon Lee3,

Jeff Linderoth4⋆⋆, François Margot1⋆⋆⋆, Andreas Wächter3

1Tepper School of Business, Carnegie Mellon University, Pittsburgh, PA

{nannicin,fmargot}@andrew.cmu.edu

2Dept. of Mathematical Sciences, Clemson University, Clemson, SC

pbelott@clemson.edu

3IBM T. J. Watson Research Center, Yorktown Heights, NY

{jonlee,andreasw}@us.ibm.com

4Industrial and Systems Eng., University of Wisconsin-Madison, Madison, WI

linderoth@wisc.edu

Abstract. Bound tightening is an important component of algorithms for solving nonconvex Mixed Integer Nonlinear Programs. A probing algorithm is a bound-tightening procedure that explores the consequences of restricting a variable to a subinterval with the goal of tightening its bounds. We propose a variant of probing where exploration is based on iteratively applying a truncated Branch-and-Bound algorithm. As this approach is computationally expensive, we use a Support-Vector-Machine classifier to infer whether or not the probing algorithm should be used. Computational experiments demonstrate that the use of this classifier saves a substantial amount of CPU time at the cost of a marginally weaker bound tightening.

⋆ Supported by an IBM grant and by NSF grant OCI-0750826.

⋆⋆ Supported by U.S. Department of Energy grant DE-FG02-08ER25861 and by NSF grant CCF-0830153.

⋆⋆⋆ Supported by NSF grant OCI-0750826.

1 Introduction

A Mixed Integer Nonlinear Program (MINLP) is a mathematical program with continuous nonlinear objective and constraints, where some of the variables are required to take integer values. Without loss of generality, we assume that the problem is a minimization problem. MINLPs naturally arise in numerous applied problems; see, e.g., [1,2]. In this paper, we address nonconvex MINLPs, where neither the objective function nor the constraints are required to be convex, a class of problems that is typically difficult to solve in practice. An exact solution method for nonconvex MINLPs is Branch-and-Bound [3], where lower bounds are obtained by convexifying the feasible region using under-estimators, often linear inequalities [4,5]. The convexification depends on the variable bounds, with tighter bounds generally resulting in a tighter convexification. As such, bound tightening is an important part of any MINLP solver.

Probing is a bound-tightening technique often applied to Mixed Integer Linear Programs (MILPs) [6]. The idea is to tentatively fix a binary variable to 0 and then to 1, and use the information obtained to strengthen the linear relaxation of the problem. Similar techniques have been applied to MINLPs as well [5]. In this paper, we propose a probing technique based on truncated Branch-and-Bound searches. Let $\bar{z}$ be the objective value of the best solution of the original problem found so far. In each Branch-and-Bound search, we choose a variable, say $x_i$, and impose $x_i \in S$, where $S$ is a subinterval of the current domain of $x_i$. In addition, we add a constraint bounding the objective value of the solution to at most $\bar{z}$. If that problem is infeasible, we can discard $S$ from the domain of $x_i$. On the other hand, if we are able to solve the modified problem to optimality, with an optimal value $\bar{z}^* < \bar{z}$, we update $\bar{z}$ and can again discard $S$ from the domain of $x_i$. Details on the choice of $x_i$ and $S$ are given in Section 3.

This probing algorithm potentially requires a significant amount of CPU time. To limit this drawback, we use a Support Vector Machine (SVM) classifier [7] before performing a Branch-and-Bound search, to predict the success or failure of the search. If we conclude that the probing algorithm is unlikely to tighten the bounds on the variable, we skip its application. Machine learning methods have been used in the OR community for various tasks, such as parameter tuning [8] and solver selection [9]. In this paper, machine learning is used to predict failures of an algorithm based on characteristics of its input data. The features on which the SVM prediction is based are problem and subinterval dependent, and are related to the outcome of the application of a fast bound-tightening technique (Feasibility-Based Bound Tightening [5]) using the same subinterval.

We provide preliminary computational results to assess the practical efficiency of the approach. The experiments show that the proposed probing algorithm is very effective in tightening the variable bounds, and it is helpful for solving MINLPs with Branch-and-Bound. By using SVM to predict failures of the probing algorithm, we save on average 30% of the total bound-tightening time, without much deterioration of the quality of the bounds.

The rest of this paper is organized as follows. In Section 2, we introduce the necessary background. In Section 3, we describe the probing algorithm. In Section 4, we discuss how we can integrate a machine learning method in our algorithm to save CPU time. In Section 5, we provide computational testing of the proposed ideas, and Section 6 presents conclusions.

2 Background

A function is factorable if it can be computed in a finite number of simple steps, starting with model variables and real constants, using elementary unary and binary operators. We consider an MINLP of the form:

\[
\begin{array}{lll}
\min & f(x) & \\
\text{s.t.} & g_j(x) \le 0 & \forall j \in M \\
 & x_i^L \le x_i \le x_i^U & \forall i \in N \\
 & f(x) \le \bar{z} & \\
 & x_i \in \mathbb{Z} & \forall i \in N_I,
\end{array}
\qquad (P)
\]

where $f$ and $g_j$ are factorable functions, $N = \{1,\dots,n\}$ is the set of variable indices, $M = \{1,\dots,m\}$ is the set of constraint indices, $x \in \mathbb{R}^n$ is the vector of variables with lower/upper bounds $x^L \in (\mathbb{R} \cup \{-\infty\})^n$, $x^U \in (\mathbb{R} \cup \{+\infty\})^n$, and $\bar{z}$ is an upper bound on the optimal objective value, which can be infinite. The variables with indices in $N_I \subset N$ are constrained to take on integer values in the solution.

A Linear Programming (LP) based Branch-and-Bound algorithm can be used to solve $P$ [4]. In such a method, subproblems of $P$ are generated by restricting the variables to reduced interval domains, $[\bar{x}^L, \bar{x}^U] \subset [x^L, x^U]$. A key step is the creation of an LP relaxation of the feasible region of a subproblem, which we refer to as convexification. This convexification is used to obtain a lower bound on the optimal objective value of the subproblem. In general, the tighter the variable bounds, the tighter the convexification, and the stronger the resulting lower bound. Therefore, bound-tightening techniques aim to deduce improved variable bounds implied by the constraint structure of the subproblem, and are widely used by existing software, such as Baron [10] and Couenne [11], for the solution of MINLPs.

A commonly used bound-tightening procedure is Feasibility-Based Bound Tightening (FBBT), which uses a symbolic representation of the problem in order to propagate bound changes on a variable to other variables. For instance, suppose that $P$ contains the equation $x_3 = x_1 + x_2$, with variable bounds $x_1 \in [0,1]$, $x_2 \in [0,3]$, $x_3 \in [0,4]$; if we tighten the bounds on $x_2$ and restrict this variable to the interval $[1,2]$, then we can propagate the change to $x_3$ and impose $x_3 \in [1,3]$. A full description of FBBT can be found in [12,13].
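The propagation step in this example can be sketched with plain interval arithmetic. The snippet below is only an illustration for the single constraint $x_3 = x_1 + x_2$; the names `intersect` and `propagate_sum` are hypothetical and are not taken from any solver, and a full FBBT implementation works on the whole expression graph of the problem.

```python
from typing import Tuple

Interval = Tuple[float, float]


def intersect(a: Interval, b: Interval) -> Interval:
    """Intersect two closed intervals; an empty result signals infeasibility."""
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    if lo > hi:
        raise ValueError("empty interval: the constraint cannot be satisfied")
    return (lo, hi)


def propagate_sum(x1: Interval, x2: Interval, x3: Interval):
    """One propagation pass over the single constraint x3 = x1 + x2."""
    # Forward step: x3 must lie in [x1L + x2L, x1U + x2U].
    x3 = intersect(x3, (x1[0] + x2[0], x1[1] + x2[1]))
    # Backward steps: x1 = x3 - x2 and x2 = x3 - x1.
    x1 = intersect(x1, (x3[0] - x2[1], x3[1] - x2[0]))
    x2 = intersect(x2, (x3[0] - x1[1], x3[1] - x1[0]))
    return x1, x2, x3


# The example from the text: x2 has just been restricted to [1, 2].
x1, x2, x3 = propagate_sum((0.0, 1.0), (1.0, 2.0), (0.0, 4.0))
print(x3)  # (1.0, 3.0)
```

Raising an exception on an empty intersection is a design choice of this sketch; it mirrors the fact that bound propagation can prove a subproblem infeasible without any search.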

The other aspects of the Branch-and-Bound algorithm are similar to those of any Branch-and-Bound for solving MILPs; see [5] for more details.

3 The probing algorithm

In this section we describe the probing algorithm to increase the lower bound on variable $x_i$, where the current bounds on that variable are $x_i \in [x_i^L, x_i^U]$ with $x_i^L > -\infty$. The special case $x_i^L = -\infty$ is treated below. The probing algorithm for decreasing the upper bound is similar. For simplicity, we describe the procedure applied to the root node $P$.

Let $\ell$ and $u$ be such that $x_i^L \le \ell \le u \le x_i^U$. We denote by $P_i[\ell,u]$ the problem obtained from $P$ by adding the constraint $x_i \in [\ell,u]$. For $s > 0$, an $s$-probing iteration for $x_i$ consists of the following: set $\ell = x_i^L$, $u = \min\{x_i^L + s, x_i^U\}$, and perform a Branch-and-Bound search on $P_i[\ell,u]$ with a time limit. If we have then proved that $P_i[\ell,u]$ is infeasible, we update the lower bound $x_i^L \leftarrow u$. If we are able to solve $P_i[\ell,u]$ to optimality, finding a solution with objective value $z^*$, we update the best incumbent value $\bar{z} \leftarrow z^*$ and the lower bound $x_i^L \leftarrow u$. In both cases, the $s$-probing iteration is deemed a success. Otherwise, it is a failure.

The Aggressive Probing algorithm for variable $x_i$ (see Algorithm 1) has an initial value for $s$ as input and runs an $s$-probing iteration. While an exit condition is not met, if the $s$-probing iteration is successful, the value of $s$ is doubled and a new $s$-probing iteration is executed. If an $s$-probing iteration fails, the value of $s$ is halved and a new $s$-probing iteration is performed.

For integer variables, we round the probing interval endpoints appropriately. Additionally, if the search on $P_i[\ell,u]$ is completed and $u$ is integer, we set $x_i^L \leftarrow u + 1$ instead of $x_i^L \leftarrow u$.

With sufficiently large time limits, Algorithm 1 will provide an optimality certificate if $\bar{z}$ is the optimal objective value. If run to completion, the algorithm proves that no better solution exists with $x_i$ in the interval $[x_i^L, x_i^U]$.

The special case $x_i^L = -\infty$ is handled as follows. Define a positive number $B$. We perform a Branch-and-Bound search on $P_i[-\infty,-B]$. If $P_i[-\infty,-B]$ is proved infeasible or solved to optimality within the time limit, we set $x_i^L \leftarrow -B$ and execute Algorithm 1. Otherwise, we conclude that $x_i^L$ cannot be tightened. In our experiments, we use $B = 10^{10}$.

Two details of the algorithm still need to be specified: the exit condition and the initial choice of $s$. We use two exit conditions: a maximum CPU time for the application of Aggressive Probing, and a maximum number of consecutive failed $s$-probing iterations. We experimented with several choices for the initial value of $s$ that took into account the distance between the variable bounds and the solution of the LP relaxation (or a feasible solution to $P$, if available). But the results were not better than a simpler method that seems to work well: the initial value of $s$ is chosen to be a small, fixed value size, depending on the variable type. In our experiments, we use size = 0.5 for continuous variables, size = 1.0 for general integer variables, and size = 0.0 for binary variables.

The good performance of the update strategy and of the initial choice for the value of $s$ stems from the dynamic and geometric adjustment of the interval length during Aggressive Probing. If the initial interval can easily be proven infeasible, the $s$-probing iteration will terminate very quickly, typically with a single application of FBBT, or at the root node by solving the LP relaxation. In this case, because the interval size is increased in a geometric fashion, in a few iterations we will reach the scale that is needed for the probing interval to be “not trivially infeasible”. On the other hand, using a small interval size yields better chances of completing the $s$-probing iteration within the time limit, in case $P_i[\ell,u]$ is difficult to solve even with $u$ close to $\ell$.

In our experiments, we set the time limit for the Branch-and-Bound search during an $s$-probing iteration to $\min\{\tfrac{2}{3}\,\text{time\_limit},\ \text{time\_limit} - \text{current\_time}\}$. This avoids investing all of the CPU time in the first probing iteration in cases where the initial interval-size guess is too large.

Algorithm 1 The Aggressive Probing algorithm.

Input: variable index i, time_limit, size, max_failures

Set s ← size, fail ← 0
while current_time < time_limit and fail < max_failures and x_i^L < x_i^U do
    Set ℓ ← x_i^L
    Set u ← min{ℓ + s, x_i^U}; if x_i is integer constrained, round u ← ⌊u⌋
    Execute limited-time Branch-and-Bound on P_i[ℓ,u]
    if solution x̄ found then
        Set z̄ ← min{z̄, f(x̄)}
    if search complete then
        if x_i is integer constrained then
            Set x_i^L ← u + 1
        else
            Set x_i^L ← u
        Set s ← 2s, fail ← 0
    else
        Set s ← s/2, fail ← fail + 1
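The control flow of Algorithm 1 can be sketched in a few lines of Python. This is only a sketch under stated assumptions: the Branch-and-Bound search is replaced by a hypothetical callback `oracle(l, u, budget)` that returns whether the search completed and the objective value of the best solution found (or `None`), and the names `aggressive_probing`, `oracle`, and `clock` are illustrative and do not come from Couenne. The per-iteration budget follows the min{2/3 time_limit, time_limit − current_time} rule described in the text.

```python
import math
import time
from typing import Callable, Optional, Tuple

# Hypothetical stand-in for a time-limited Branch-and-Bound on P_i[l, u]:
# returns (search_complete, objective_of_best_solution_found_or_None).
Oracle = Callable[[float, float, float], Tuple[bool, Optional[float]]]


def aggressive_probing(x_lo: float, x_up: float, z_bar: float, oracle: Oracle,
                       size: float, time_limit: float, max_failures: int,
                       is_integer: bool = False,
                       clock: Callable[[], float] = time.monotonic):
    """Try to raise the lower bound x_lo of one variable; returns (x_lo, z_bar)."""
    s, fail = size, 0
    start = clock()
    while clock() - start < time_limit and fail < max_failures and x_lo < x_up:
        elapsed = clock() - start
        # Never spend more than 2/3 of the total budget on one iteration.
        budget = min(2.0 / 3.0 * time_limit, time_limit - elapsed)
        l = x_lo
        u = min(l + s, x_up)
        if is_integer:
            u = math.floor(u)
        complete, value = oracle(l, u, budget)
        if value is not None:
            z_bar = min(z_bar, value)      # improved incumbent
        if complete:
            # [l, u] holds no solution better than z_bar: discard it.
            x_lo = u + 1 if is_integer else u
            s, fail = 2 * s, 0             # success: double the step
        else:
            s, fail = s / 2, fail + 1      # failure: halve the step
    return x_lo, z_bar


# Toy oracle: the search "completes" only on intervals of width <= 1 below 5.
toy = lambda l, u, budget: (u <= 5.0 and u - l <= 1.0, None)
print(aggressive_probing(0.0, 10.0, math.inf, toy, 0.5, 60.0, 3))
```

With the toy oracle, the step size grows until the intervals become too wide, shrinks again after each failure, and the loop stops after three consecutive failures with the lower bound raised to 5.0.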

4 Support Vector Machine for failure prediction

Applying Aggressive Probing to tighten all variables of even a moderately-sized MINLP can take a considerable amount of CPU time. We would like to avoid wasting time in trying to tighten the bounds of a variable for which there is little hope of success.

Observe that if a probing subproblem $P_i[\ell,u]$ is not solved to optimality, the CPU time invested in that probing iteration is wasted. To avoid this situation, we propose using the degree of success in applying the fast FBBT algorithm on $P_i[\ell,u]$ as a factor in deciding whether or not to run the expensive Aggressive Probing algorithm. Note that FBBT is the first step of the Branch-and-Bound algorithm used in Aggressive Probing, so this does not require additional work. Our hypothesis is that if the constraint $x_i \in [\ell,u]$ used during an $s$-probing iteration does not result in tighter bounds on other variables when FBBT is used, then $P_i[\ell,u]$ is approximately as difficult as $P$, so the limited-time Branch-and-Bound algorithm is likely to fail. This intuition is confirmed by empirical tests: on 84 nontrivial MINLP instances taken from various sources, we perform Aggressive Probing with max_failures = 10 to tighten the lower and upper bounds of all variables, processing them in the order in which they appear in the problem, with a time limit of 1 minute per variable and a total time limit of 1 hour per instance. We record, for each $s$-probing iteration, whether FBBT is able to use the probing interval $x_i \in [\ell,u]$ to tighten the bounds on other variables. We observe that, in 587 cases out of 11,747, no stronger bounds are obtained. In 528 of these 587 cases (90%), the subsequent Branch-and-Bound search on $P_i[\ell,u]$ could not be completed within the time limit. Therefore, the success of FBBT in using $x_i \in [\ell,u]$ to tighten other variable bounds indeed gives an indication of the difficulty of solving $P_i[\ell,u]$.

Supported by this observation, we use the following strategy. Before applying Branch-and-Bound to $P_i[\ell,u]$, we execute FBBT and compute a measure of the bound reduction obtained on the variables $x_j$, $j \in N \setminus \{i\}$. Based on this measure, we use an algorithm to decide whether to perform Branch-and-Bound on $P_i[\ell,u]$. In the remainder of this section we discuss our choice of the bound reduction measure and the decision method.

4.1 Measuring the effect of FBBT

Several bound-reduction measures are possible. Because our aim is to save CPU time, the bound-reduction measure computation should be fast. A simple way of measuring the bound reduction obtained for the variables $x_j$, $j \in N \setminus \{i\}$, is to count the number of tightened variables, and for each of these, to compute the interval reduction

\[
\gamma(x_j) = 1 - \frac{\tilde{x}_j^U - \tilde{x}_j^L}{x_j^U - x_j^L},
\]

where $\tilde{x}^L$ (resp. $\tilde{x}^U$) is the vector of variable lower (resp. upper) bounds after applying FBBT, and $x^L$, $x^U$ are those of the original problem $P$. (Infinity is treated like a number in the following way: $1/\infty = 0$, $\infty/2\infty = 0.5$.) Hence, to quantify the bound reduction associated with the application of FBBT on the problem $P_i[\ell,u]$, we use a vector $(\eta,\rho) \in [0,1]^2$, where $\eta$ is the fraction of tightened variables, and $\rho$ is the average value of $\gamma(x_j)$ over all $x_j$ that were successfully tightened.
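A possible way to compute $(\eta,\rho)$ from the bounds before and after FBBT is sketched below. The function names `gamma` and `bound_reduction` are hypothetical. The handling of infinite bounds is an interpretation of the two conventions stated above (a finite width divided by an infinite one gives 0, a half-infinite interval inside a doubly infinite one gives 0.5) and may differ from the actual implementation; the choice of denominator for $\eta$ (all variables other than the probed one) is likewise an assumption.

```python
import math
from typing import Sequence, Tuple

Interval = Tuple[float, float]


def gamma(old: Interval, new: Interval) -> float:
    """Interval reduction 1 - (new width) / (old width) for one variable."""
    old_inf = sum(math.isinf(b) for b in old)   # number of infinite endpoints
    new_inf = sum(math.isinf(b) for b in new)
    if old_inf == 0:
        old_width = old[1] - old[0]
        if old_width <= 0.0:
            return 0.0                          # a fixed variable cannot shrink
        return 1.0 - (new[1] - new[0]) / old_width
    if new_inf == 0:
        return 1.0                              # finite / infinite = 0
    # Assumed convention: compare infinite widths by counting infinite ends,
    # so that inf / (2 inf) = 0.5 as in the text.
    return 1.0 - new_inf / old_inf


def bound_reduction(old: Sequence[Interval], new: Sequence[Interval],
                    probed: int) -> Tuple[float, float]:
    """Return (eta, rho) over all variables except the probed one."""
    reductions = [
        gamma(o, n)
        for j, (o, n) in enumerate(zip(old, new))
        if j != probed and (n[0] > o[0] or n[1] < o[1])
    ]
    others = len(old) - 1
    if not reductions or others <= 0:
        return (0.0, 0.0)
    return (len(reductions) / others, sum(reductions) / len(reductions))


# Bounds of (x1, x2, x3) before and after probing on x2 (index 1).
before = [(0.0, 1.0), (0.0, 3.0), (0.0, 4.0)]
after = [(0.0, 1.0), (1.0, 2.0), (1.0, 3.0)]
print(bound_reduction(before, after, probed=1))  # (0.5, 0.5)
```

In this small example only $x_3$ is tightened among the two non-probed variables, so $\eta = 0.5$, and its width shrinks from 4 to 2, so $\rho = 0.5$.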

4.2 Support Vector Machines

Once a vector $(\eta,\rho)$ is computed for variable $x_i$, the decision of whether to perform Branch-and-Bound is taken by a predictor trained by a Support Vector Machine (SVM). While it could be argued that SVM is not really required for classifying our 2-dimensional data, we use SVM for three reasons. First, in our experiments SVM performs better than a predictor based on a simple Gaussian model for the data; see Section 5.2. Second, our future research efforts will utilize additional input features besides $(\eta,\rho)$ (see Section 6 for details); therefore, the flexibility and extensibility of SVM is desirable. Finally, SVM is a parameterized method allowing better control of the trade-off between Precision and Recall of the classifier (defined in Section 5.2) than simpler methods. Because we are interested in a classifier with high Precision and good Recall, the ability to tune the classifier is an advantage.

Next, we provide a brief description of the basic concepts behind SVM; see [14,15] for a comprehensive introduction to the topic. Given training data $D = \{(z_i,y_i) : z_i \in \mathbb{R}^p, y_i \in \{-1,1\}, i = 1,\dots,q\}$, SVM seeks an affine hyperplane that separates the points with label $-1$ from those with label $1$ by the largest amount, i.e., the width of the strip containing the hyperplane and no data point in its interior is as large as possible.

In its simplest form, the associated optimization problem can be written as:

\[
\begin{array}{lll}
\min & \|h\|^2 & \\
\text{s.t.} & y_i (h^\top z_i - b) \ge 1 & \forall (z_i,y_i) \in D \\
 & h \in \mathbb{R}^p,\ b \in \mathbb{R}, &
\end{array}
\qquad \text{(SVM)}
\]

where the hyperplane is defined by $h^\top z = b$. Instead of seeking a separating hyperplane in $\mathbb{R}^p$, which may not exist, SVM implicitly maps each data point into a higher dimensional feature space where linear separation may be possible. The mapping is implicit because we do not need explicit knowledge of the feature space. In the optimization problem (SVM), we express the separating hyperplane in terms of the training points $z_i$ (see e.g., [15]), and substitute the dot-products between vectors in $\mathbb{R}^p$ with a possibly nonlinear kernel function $K : \mathbb{R}^p \times \mathbb{R}^p \to \mathbb{R}$. The kernel function can be interpreted as the dot-product in the higher-dimensional space. The separation hyperplane in the feature space translates into a nonlinear separation surface in the original space $\mathbb{R}^p$. Furthermore, SVM handles data that is not separable in the feature space by using a soft margin, i.e., allowing the optimal separation hyperplane to misclassify some points, imposing a penalty for each misclassification. The outcome of the SVM training algorithm is a subset $V$ of $\{z_i : \exists y \in \{-1,1\} \text{ with } (z_i,y) \in D\}$ with corresponding scalar multipliers $\alpha_v$, $v \in V$, and a scalar $b$. The elements of $V$ are called support vectors. To classify a new data point $w \in \mathbb{R}^p$, we compute the value of $\sum_{v \in V} \alpha_v K(v,w) - b$ and use its sign to classify $w$. Hence, the complexity of storing an SVM model depends on the number of support vectors $|V|$, and the time required to classify a new data point depends on $|V|$ and $K$.

Commonly used kernel functions are:

– linear: $K(u,v) = u^\top v$,

– polynomial: $K(u,v) = (\lambda u^\top v + \beta)^d$,

– radial basis: $K(u,v) = e^{-\lambda \|u-v\|^2}$,

where $\lambda$, $\beta$ and $d$ are input parameters. Problem-specific kernel functions can be devised as well. Another commonly adjusted tuning parameter is the misclassification cost $\omega$, which determines the ratio between the penalty paid for misclassifying an example of label $1$ and the penalty paid for misclassifying an example of label $-1$. The ratio $\omega$ can be adjusted to handle unbalanced data sets where one class is much more frequent than the other.
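The classification rule and the three kernels listed above can be sketched as follows. This is a plain-Python illustration, not the LIBSVM implementation: the function names are hypothetical, the multipliers `alphas` are assumed to already carry the sign of the training label, and a score of exactly zero is arbitrarily mapped to label 1.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def linear_kernel(u: Vector, v: Vector) -> float:
    return dot(u, v)


def polynomial_kernel(u: Vector, v: Vector, lam: float, beta: float, d: int) -> float:
    return (lam * dot(u, v) + beta) ** d


def radial_basis_kernel(u: Vector, v: Vector, lam: float) -> float:
    squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-lam * squared_distance)


def classify(w: Vector, support_vectors: Sequence[Vector],
             alphas: Sequence[float], b: float,
             kernel: Callable[[Vector, Vector], float]) -> int:
    """Sign of sum_{v in V} alpha_v K(v, w) - b, returned as a label in {-1, 1}."""
    score = sum(a * kernel(v, w) for v, a in zip(support_vectors, alphas)) - b
    return 1 if score >= 0.0 else -1


# One support vector, linear kernel: points with first coordinate >= 0.5 get label 1.
print(classify((1.0, 0.0), [(1.0, 0.0)], [1.0], 0.5, linear_kernel))  # 1
print(classify((0.0, 1.0), [(1.0, 0.0)], [1.0], 0.5, linear_kernel))  # -1
```

The loop over `support_vectors` makes the cost statement in the text concrete: each prediction requires one kernel evaluation per support vector.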

4.3 Aggressive probing failure prediction with SVM

In this section we assume that we have an SVM model trained on a data set of the form $D = \{(\eta_i,\rho_i,y_i) : (\eta_i,\rho_i) \in [0,1]^2, y_i \in \{-1,1\}, i = 1,\dots,q\}$, where each point corresponds to an $s$-probing iteration, $\eta_i$, $\rho_i$ are as defined in Section 4.1, and $y_i = 1$ if the limited-time Branch-and-Bound search applied to the corresponding probing subproblem did not complete, $y_i = -1$ otherwise. Generating the set $D$ and computational experiments with model training will be discussed in Section 5.

Given such an SVM model, we proceed as follows. At an $s$-probing iteration corresponding to problem $P_i[\ell,u]$, we apply FBBT and compute the resulting $w_j = (\eta_j,\rho_j)$ as described in Section 4.1. If $w_j = (0,0)$, FBBT could not tighten the bounds on any variable; in this case, as discussed at the beginning of Section 4, we do not execute Branch-and-Bound and continue to the subsequent probing iteration as if the $s$-probing iteration failed. If $w_j \ne (0,0)$, we predict the label $y_j$ of $w_j$ using our SVM classifier. The Branch-and-Bound search on $P_i[\ell,u]$ is thus executed only if the predicted label is $y_j = -1$; otherwise, we continue with the algorithm as if the $s$-probing iteration failed.
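The decision rule of this paragraph amounts to a small gate placed in front of each Branch-and-Bound call. The sketch below assumes a hypothetical callback `predict_label` that wraps the trained classifier and returns 1 (predicted failure) or −1 (predicted success); the function name `should_run_branch_and_bound` is illustrative.

```python
from typing import Callable, Tuple

Features = Tuple[float, float]   # the pair (eta, rho) computed after FBBT


def should_run_branch_and_bound(features: Features,
                                predict_label: Callable[[Features], int]) -> bool:
    """Return True only when the s-probing search is predicted to complete."""
    if features == (0.0, 0.0):
        # FBBT tightened nothing: skip without even querying the classifier.
        return False
    return predict_label(features) == -1


# Stand-in predictor for illustration only (not a trained SVM).
stub = lambda w: 1 if w[0] < 0.1 else -1
print(should_run_branch_and_bound((0.0, 0.0), stub))   # False
print(should_run_branch_and_bound((0.4, 0.3), stub))   # True
```

When the function returns False, the caller would treat the iteration as a failure, i.e., halve $s$ and increase the failure counter, exactly as after an incomplete search.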

Note that we could apply the SVM classifier even on points of the form $(0,0)$. However, in our experiments this point was always labeled as 1 by the tested SVM models; therefore, we save CPU time by not running the SVM predictor. Additionally, we exclude points $(0,0,y_i)$ from the data set $D$ on which the model is trained; this yields an additional advantage that will be discussed in Section 5.

5 Computational experiments

We implemented Aggressive Probing within Couenne, an open-source Branch-and-Bound solver for nonconvex MINLPs [11]. We are mainly interested in applying our probing technique to difficult instances $P$ to improve a Branch-and-Bound search; thus, in our implementation Aggressive Probing reuses as much previously computed information as possible. The root node of each probing subproblem $P_i[\ell,u]$ is generated by modifying the root node of $P$, changing variable bounds and possibly generating new linear inequalities to improve the convexification, so that the problem instance is read and processed only once. The branching strategy of Couenne was set to Strong Branching [5,16] in all experiments.

We utilized LIBSVM [17], a library for Support Vector Machines written in C. Given the availability of LIBSVM’s source code, it could be efficiently integrated within Couenne for our tests. The experiments were conducted on a 2.6 GHz AMD Opteron 852 machine with 64GB of RAM, running Linux.

5.1 Test instances

The test instances are a subset of MINLPLib [18], a freely available collection of convex and nonconvex MINLPs. We excluded instances with more than 1,000 variables and instances for which the default version of Couenne took more than 2 minutes to process the root node, or ran into numerical problems. Additionally, we excluded the instances for which Aggressive Probing was able to find the optimal solution and provide an optimality certificate in less than 2 hours. These are easy instances that can be quickly solved by default Couenne; therefore, there is no need for expensive bound-tightening methods. We are left with 32 instances, which are listed in Table 1.

5.2 Training the SVM classifier

As a first step in training an SVM to classify failures of the probing algorithm, we obtained a large-enough set of training examples. We used a superset of the test problems described in Section 5.1, including some additional problems from MINLPLib as well as problems from [19] with less than 1,000 variables, giving a total of 84 instances. We applied Aggressive Probing on all variables, with a time limit of 30 seconds for each $s$-probing iteration, 1 minute for each variable bound, and 2 hours per problem instance. We did not include data for $s$-probing iterations started with less than 20 seconds of CPU time left within the time limit, or iterations in which a feasible MINLP solution was discovered during the Branch-and-Bound search. For the remaining $s$-probing iterations, we recorded the values of $(\eta_i,\rho_i)$ (see Section 4.1) and a label $y_i = 1$ if the probing iteration fails, $y_i = -1$ if it succeeds. The reason for excluding $s$-probing iterations performed with less than 20 seconds of CPU time left is that they are likely to fail simply because they are not given enough time to complete, regardless of the difficulty of the $s$-probing subproblem. Similarly, we excluded $s$-probing iterations in which an improved solution was found, because such a discovery cannot be predicted by only considering the $s$-probing subproblem, yet it can be used to infer tighter variable bounds through FBBT, therefore making $P_i[\ell,u]$ easier than initially estimated. This yields a data set $D$ with $q = 11,747$ data points that can be used for training, as explained in Section 4.3. Eliminating all points with $(\eta_i,\rho_i) = (0,0)$ (see end of Section 4.3) leaves 11,160 points, of which 4,186 have the label $y_i = 1$. By removing the points $(0,0)$ from the training set, the number of support vectors in the final model is likely to be smaller.

Without SVM
                 # vars             Probing      Probing + FBBT + Conv.
Instance       orig   total   Tght. %  Red. %   Tght. %  Red. %   Gap %       Time
csched2         401     582      2.00    5.05      3.44    7.12   97.76    36474.7
fo7_2           115     253     12.17   25.00     13.04   43.62    0.00    11133.1
fo7             115     253     10.43   31.84      9.88   57.82    0.00    11129.9
fo8_ar2_1       145     371     46.90   50.12     38.81   47.23    0.00    16372.5
fo8_ar25_1      145     371     46.90   50.97     38.81   47.70    0.00    16373.4
fo8_ar3_1       145     371     51.72   48.53     46.63   41.42    0.00    16448.0
fo8_ar5_1       145     371     49.66   47.91     42.59   43.50    0.00    16385.6
fo8             147     325      9.52   24.80      8.92   55.21    0.00    14186.1
fo9_ar2_1       181     462     47.51   49.45     39.18   46.19    0.00    20435.2
fo9_ar25_1      181     462     47.51   49.57     39.18   46.16    0.00    20439.2
fo9_ar3_1       181     462     49.72   50.20     42.42   44.66    0.00    20433.2
fo9_ar4_1       181     462     49.72   48.85     42.42   43.70    0.00    20443.4
fo9_ar5_1       181     462     49.72   48.26     42.42   43.29    0.00    20437.1
fo9             183     406      8.20   16.81      7.88   52.66    0.00    17546.6
lop97icx        987    1393      0.00    0.00      0.00    0.00    0.00   120602.0
nvs23            10      64     90.00   92.07    100.00   98.41   97.02     1080.8
nvs24            11      76     90.91   88.25    100.00   97.47   94.96     1201.0
o7_2            115     253     19.13   26.50     19.37   46.38    0.00    11292.3
o7_ar4_1        113     290     54.87   47.15     48.28   47.16    0.00    12918.8
o7              115     253     17.39   28.12     16.21   53.76    0.00    11294.3
o8_ar4_1        145     371     59.31   45.35     52.29   47.52    0.00    16696.9
o9_ar4_1        181     462     56.35   45.76     49.13   45.96    0.00    20717.9
space25a        384     502     17.19   38.16     32.67   30.61    0.00    36746.0
tln12           169     361     35.50   35.67     37.95   34.10    0.00    19243.3
tln5             36      81     41.67   33.33     62.96   44.86    0.00     3997.8
tln6             49     109     61.22   28.33     71.56   41.24    0.00     5532.1
tln7             64     141     43.75   25.60     64.54   32.90    0.00     7285.5
tls12           813    1285     14.39   67.42     32.14   48.11    0.00   129660.0
tls6            216     359      2.78  100.00     25.07   45.60    0.00    18514.3
water4          196     319     25.51   56.54     40.13   50.49   41.54    18305.9
waterx           71     174     14.08   25.38     32.18   20.77   34.83     7926.7
waterz          196     319     15.31   62.02     25.71   55.10   21.47    18637.7
Avg.         197.41  388.28     23.92   43.53     30.50   45.65   12.11    22496.6

With SVM
                  Probing      Probing + FBBT + Conv.
Instance      Tght. %  Red. %   Tght. %  Red. %   Gap %       Time
csched2          1.25    4.74      1.20   17.50   97.76      288.2
fo7_2           12.17    7.87     13.04   35.37    0.00     6731.2
fo7             10.43    5.87      9.88   43.28    0.00     5487.6
fo8_ar2_1       46.90   49.99     38.81   47.15    0.00    13494.5
fo8_ar25_1      46.90   50.22     38.81   47.24    0.00    13523.9
fo8_ar3_1       52.41   46.31     48.25   39.41    0.00    14474.2
fo8_ar5_1       49.66   48.06     42.59   43.60    0.00    14715.3
fo8              8.16    3.23      8.31   46.29    0.00     6107.0
fo9_ar2_1       47.51   46.50     39.18   44.30    0.00    13768.2
fo9_ar25_1      47.51   46.45     39.18   44.18    0.00    13821.3
fo9_ar3_1       49.72   45.41     42.42   41.80    0.00    17838.2
fo9_ar4_1       49.72   44.90     42.42   41.48    0.00    17977.8
fo9_ar5_1       49.72   44.95     42.42   41.44    0.00    18037.8
fo9              8.20    2.50      7.88   45.12    0.00     7021.7
lop97icx         0.00    0.00      0.00    0.00    0.00       21.1
nvs23           90.00   91.80    100.00   98.38   97.10     1080.7
nvs24           90.91   88.25    100.00   97.47   94.96     1201.2
o7_2            19.13   26.50     19.37   46.38    0.00    10191.1
o7_ar4_1        54.87   47.25     48.28   47.22    0.00    12918.4
o7              17.39   27.97     16.21   53.67    0.00     6872.0
o8_ar4_1        59.31   45.80     52.29   47.81    0.00    16697.2
o9_ar4_1        56.35   45.76     49.13   45.96    0.00    20474.3
space25a        16.93   36.64     29.28   32.22    0.00    36704.2
tln12           35.50   35.67     37.95   34.10    0.00    17973.4
tln5            41.67   33.33     62.96   44.86    0.00     3997.8
tln6            61.22   28.33     71.56   41.24    0.00     5619.6
tln7            43.75   25.60     64.54   32.90    0.00     7281.3
tls12           14.39   67.42     32.14   48.11    0.00   128592.0
tls6             2.78  100.00     25.07   45.60    0.00    18401.0
water4          27.55   60.80     41.69   53.47   50.47    18013.8
waterx          15.49   23.08     34.48   19.36   34.83     7874.8
waterz          14.80   67.28     24.14   60.07   29.35    18617.8
Avg.            23.90   40.58     30.32   44.59   12.64    15494.3

Table 1. Performance of Aggressive Probing, with and without failure prediction by SVM. We report, after probing (columns “Probing”) and after applying FBBT and recomputing the convexification (columns “Probing + FBBT + Conv.”): the fraction of tightened variables with finite bounds (Tght. %), and the average bound reduction (Red. %). We also report the amount of optimality gap closed by the new convexification (Gap %), and the total probing time (Time).

It is known that SVM is very sensitive to its algorithm settings, hence a grid search on a set of input parameters is typically applied in order to find the values that yield the best performance on the input data. We tested three types of kernel functions: linear, polynomial, and radial. For each of these kernel functions, we performed grid search on the input parameters (see end of Section 4.2), using the following values: $\lambda = 2^k$ with $k = -3,\dots,2$, $\beta = 2^k$ with $k = -3,\dots,2$, $d = 1,\dots,5$, and $\omega = 2^k$ with $k = -3,\dots,4$. Each parameter was considered only when appropriate; e.g., $d$ was used for the polynomial kernel only. Overall, we tested 1,057 combinations of input parameters.

In our first set of experiments pertaining to the training of an SVM on $D$, we performed 3-fold cross validation. We trained the model on 2/3 of $D$, and used the remaining 1/3 to estimate the performance of the model. The resulting model consisted of 4,500 to 5,000 support vectors. Such a large number of support vectors would yield a slow classifier and may indicate overfitting. To obtain a model with fewer support vectors, we first attempted $\nu$-classification [14], without success. In the end, training the model on a small subset of the full data set $D$ was found to be very effective in reducing the number of support vectors, without deterioration in the accuracy of the model; this approach has been used before in the machine learning community [20].

With this setup, experiments for testing parameter values of the model were

performed as follows. We randomly selected 10 different training sets, each one

containing 1/10 of the full data set D. Each training experiment, corresponding

to a set of input parameter values, was performed on each of the 10 training

sets, and the performance of the resulting models was evaluated on the 9/10 of

D that was not used for training. To measure performance, we use Precision and

Page 12

A probing algorithm for MINLP11

0

20

40

60

80

100

0 20 40 60 80 100

Recall %

Precision %

polynomial

linear

radial basis

chosen model

Fig.1. Average values of Precision and Recall for all tested combinations of training

input parameters.

Recall, commonly defined as follows:

Precision: TP/(TP + FP),Recall: TP/(TP + FN),

where TP is the number of True Positives, i.e., the examples with label 1 that

are classified with label 1 by the model. FP is the number of False Positives,

i.e., the examples with label −1 that are classified with label 1. Finally, FN is

the number of False Negatives, i.e., the examples with label 1 that are classified

with label −1. Intuitively, Precision is the fraction of data points labeled 1 by

the classifier that are indeed of class 1, whereas Recall is the fraction of class 1

data points processed by the classifier that are correctly labeled as 1. Overall,

for each set of training parameters, we have 10 values for Precision and 10 values

for Recall. We compute the average and the standard deviation of these values,

and use them to choose the best set of parameters.
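The evaluation protocol described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the sample standard deviation is used (the text does not say which estimator was used), and a zero denominator is reported as 0.0 by convention. The classifier itself is not included here.

```python
import random
import statistics

def random_split(n_examples, rng, train_parts=10):
    """Return (train, test) index lists; train holds 1/train_parts of the data."""
    indices = list(range(n_examples))
    rng.shuffle(indices)
    n_train = n_examples // train_parts
    return indices[:n_train], indices[n_train:]

def precision_recall(y_true, y_pred):
    """Precision and Recall for labels in {1, -1}, with 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == -1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == -1)
    # Assumed convention: report 0.0 when a denominator is zero.
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0
    return precision, recall

def summarize(values):
    """Mean and sample standard deviation over the repeated training runs."""
    return statistics.mean(values), statistics.stdev(values)
```

For each parameter setting, one would call `random_split` ten times, train on each training set, apply `precision_recall` to the corresponding test set, and pass the ten Precision values and the ten Recall values to `summarize`.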

Results of this experiment are summarized in Figure 1. Each point represents

the average values of Precision and Recall corresponding to a set of parameter

values. When producing the figure, we eliminated points for which the standard

deviation of either Precision or Recall was more than 1/4 of its mean, because

these points correspond to experiments with unreliable results.
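The filtering rule used for the figure can be written as a one-line test. The sketch below is illustrative only; the function name and the sample values in the usage lines are hypothetical.

```python
def is_reliable(mean_precision, std_precision, mean_recall, std_recall, ratio=0.25):
    """Keep a point unless either standard deviation exceeds ratio * its mean."""
    return (std_precision <= ratio * mean_precision
            and std_recall <= ratio * mean_recall)

# Hypothetical values, in percent.
print(is_reliable(80.0, 2.0, 60.0, 1.5))  # stable on both measures
print(is_reliable(80.0, 2.0, 10.0, 3.0))  # Recall deviation above 1/4 of its mean
```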

Figure 1 shows the trade-off between Precision and Recall that can be achieved

by varying the learning parameters. We are interested in the set of Pareto op-

tima with respect to these two criteria. Most Pareto optima are obtained with

a polynomial kernel, and the remaining with a radial-basis kernel. The linear

kernel yields inferior results, implying that the data set is difficult to separate

in the original space. There are points with very high Precision (> 85%) but

low Recall that represent “conservative” classifiers: very few probing iterations

are labeled as 1 (failure), but in that case the classifier is almost always correct.

Such a classifier is of limited value for our purpose. We are more interested in

the region with roughly 80% Precision and 60% Recall: approximately 60% of

the unsuccessful probing iterations in the test set are predicted correctly, while

keeping good Precision. These models use a polynomial kernel with degree d ≥ 3

and ω = 1; additionally, we found that using β = 1,2, or 4 seems effective. These

models have between 500 and 800 support vectors. The standard deviation of

Precision and Recall for all models achieving a Pareto optimum is fairly small,

typically less than 2 percentage points. Therefore, we can assume that the performance of the SVM

model does not depend heavily on the particular subset of D that is used for

training.
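The set of Pareto optima with respect to Precision and Recall can be extracted with a simple dominance check. The following sketch assumes both criteria are to be maximized; the function name and the sample points are hypothetical and are not the values plotted in Figure 1.

```python
def pareto_front(points):
    """Return the (precision, recall) pairs not dominated by any other pair.

    A pair dominates another if it is at least as good on both criteria
    and strictly better on at least one.
    """
    front = []
    for i, (p, r) in enumerate(points):
        dominated = any(
            q >= p and s >= r and (q > p or s > r)
            for j, (q, s) in enumerate(points)
            if j != i
        )
        if not dominated:
            front.append((p, r))
    return front

# Hypothetical (Precision %, Recall %) averages for five parameter settings.
candidates = [(90, 20), (80, 60), (48, 87), (70, 50), (80, 55)]
print(pareto_front(candidates))
```

The quadratic-time check is adequate for roughly a thousand points, which is the scale of the grid search described here.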

For comparison, we also fit a simple 2-dimensional Gaussian model. In this

model, each class is assumed to be normally distributed, and a 2-dimensional

Gaussian model is fit to each class using maximum-likelihood estimation. Then,

we classify points in the corresponding test set by computing the probability that

they are generated by the two normal distributions and by picking the class that

maximizes this probability. Over the 10 training/test sets, the Gaussian model

gives a classifier with mean Precision 47.90%, standard deviation 0.91, mean

Recall 87.47%, standard deviation 1.31. This performance is comparable to that of

SVM models whose parameters are chosen to obtain high Recall and low Precision.

Computational experiments (not reported in detail) demonstrate that using the

Gaussian model leads to weaker bound tightening compared to SVM, with no

saving of CPU time on average.
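A classifier of the kind described in this paragraph can be sketched in a few lines. This is an illustration under assumptions, not the authors' implementation: the function names are hypothetical, each class covariance is assumed to be nonsingular, and the two classes are compared by density alone (no class priors), since the text does not mention priors.

```python
import math

def fit_gaussian_2d(points):
    """Maximum-likelihood mean and covariance (sxx, sxy, syy) of 2-D points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    return (mx, my), (sxx, sxy, syy)

def log_density(point, mean, cov):
    """Log of the bivariate normal density at point."""
    (mx, my), (sxx, sxy, syy) = mean, cov
    det = sxx * syy - sxy * sxy
    dx, dy = point[0] - mx, point[1] - my
    # Quadratic form with the inverse of the 2x2 covariance matrix.
    quad = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return -0.5 * quad - 0.5 * math.log(det) - math.log(2.0 * math.pi)

def classify(point, model_pos, model_neg):
    """Return 1 or -1, whichever class gives the higher density at point."""
    if log_density(point, *model_pos) >= log_density(point, *model_neg):
        return 1
    return -1

# Hypothetical training data for the two classes.
model_pos = fit_gaussian_2d([(0, 0), (1, 0), (0, 1), (1, 1)])
model_neg = fit_gaussian_2d([(5, 5), (6, 5), (5, 6), (6, 6)])
print(classify((0.2, 0.3), model_pos, model_neg))
print(classify((5.5, 5.2), model_pos, model_neg))
```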

Based on these results, we use an SVM model trained with a polynomial

kernel of degree 4, λ = 4, β = 4, ω = 1 for the experiments in the remainder of

this section; this model has 580 support vectors yielding fast classification. Note

that 580 is almost half the size of the training set, suggesting that some overfit-

ting might occur. However, the model shows good performance on examples not

included in the training set.

5.3 Testing the probing algorithm

In this section, we discuss the effect of applying Aggressive Probing on a

set of difficult MINLPs. In addition to FBBT, we also use Optimality-Based

Bound Tightening (OBBT) [12]. This bound-tightening technique maximizes

and minimizes the value of each variable over the convexification computed by

Couenne at the root node, and uses the optimal values as variable bounds. For

each test instance, we first apply FBBT and OBBT. Then, for each variable, we

apply Aggressive Probing to tighten both the lower and upper bounds, with

a time limit of 60 seconds per variable, and 36 hours per instance. The parameter

max failures is set to 10. Variables are processed in the order in which they

are stored in Couenne. Note that Couenne uses a standardized representation

of the problem where extra variables, called auxiliary variables, are typically

added to represent expressions in the original problem formulation [5]. In our

experiments, to limit CPU time, OBBT and Aggressive Probing are applied

only to original variables; in principle, both can be applied to auxiliary variables

without modification.

After Aggressive Probing has been applied to all of the original variables

or the global time limit is reached, we record the fraction of tightened variables η,

and the average bound reduction ρ, as described in Section 4.1. Then, we apply

an additional iteration of FBBT to propagate the new bounds and generate

convexification inequalities. This gives a strengthened convexification C′ of P

that is compared to the initial one, C. We record the fraction of variables for

which at least one bound could be tightened in C′, as well as the average bound

reduction ρ of the tightened variables. Additionally, we compute the percentage

of the optimality gap of C that is closed by C′, i.e., (z(C′)−z(C))/(z(P)−z(C)),

where z(C) is the optimal objective value of C, and z(P) is the value of the best

known solution for the particular instance. The value of z(P) for each instance

was obtained from the MINLPLib website.
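The gap-closed measure (z(C′)−z(C))/(z(P)−z(C)) can be computed as in the short sketch below. The function name and the numbers in the usage line are hypothetical, chosen only to illustrate the arithmetic.

```python
def gap_closed(z_c, z_c_prime, z_p):
    """Fraction of the optimality gap of C that is closed by C'.

    z_c:       optimal objective value of the initial convexification C
    z_c_prime: optimal objective value of the strengthened convexification C'
    z_p:       value of the best known solution of the original problem P
    """
    gap = z_p - z_c
    if gap == 0:
        raise ValueError("C has no optimality gap to close")
    return (z_c_prime - z_c) / gap

# Hypothetical values: the bound rises from 100 to 130 with a best known value of 200.
print(gap_closed(100.0, 130.0, 200.0))
```

A value of 0 means the strengthened convexification gives the same bound as the initial one, and a value of 1 means the bound reaches the best known solution value.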

Results are reported in Table 1. The fraction of tightened variables is relative

to the number of original variables for “Probing”. For “Probing + FBBT +

Conv.”, it is relative to the total number of variables, because auxiliary variables

can also be tightened after bound propagation through FBBT. The fraction of

tightened variables takes into account variables with finite bounds only. Infinite

variable bounds are tightened to a finite value only for the three water instances,

independent of whether SVM is used.

First, we discuss the effect of Aggressive Probing alone. Table 1 shows

that the effect of probing is problem-dependent; for example, for lop97icx, no

variable is tightened by our algorithm, and for nvs23 and nvs24, more than 90%

of the variables are tightened. On average, approximately 25% of the original

variables are tightened by Aggressive Probing, and after applying FBBT,

approximately 30% of the total number of variables (original plus auxiliary)

gained tighter bounds. The average bound reduction is close to 50%. The amount

of optimality gap closed by adding convexification inequalities after tightening

the bounds is largely problem dependent as well. The new convexification is

much stronger for the water, nvs and csched2 instances, but for the remaining

instances, the optimality gap is unchanged. This is probably due to the geometry

of the initial convexification, for which the LP solution is extremely difficult to

cut off without branching, so that no optimality gap is closed by Aggressive

Probing. In summary, on all but one test instance, Aggressive Probing is

able to provide better variable bounds compared to traditional bound-tightening

techniques (FBBT followed by OBBT). This comes at a large computational cost,

but may be worth the effort for some difficult instances that cannot be solved

otherwise, or when parallel computing resources provide a large amount of CPU

power.

Comparing the Aggressive Probing algorithm with and without SVM
for failure prediction, we observe an average saving of 30% in computing time
when using SVM, while the number of tightened variables and the average bound
tightening are only slightly weaker. CPU time savings are problem dependent: the
difference can be huge (csched2a, lop97icx) or negligible. In only two cases
(nvs24 and tln6), using SVM for failure prediction results in an overall longer
probing time, but the increase is negligible. In summary, using an SVM model
to predict likely failures of the Aggressive Probing algorithm leads to CPU
time savings that depend on the problem instance at hand and are sometimes
very large, sometimes moderate, while variable bounds are tightened by almost
the same amount.

               # s-prob. iter.
               Success   Failure
Without SVM    1600      18131
With SVM       1634      8998

Table 2. Number of successful and failed s-probing iterations recorded by applying
Aggressive Probing on the full test set of Table 1.

Table 2 reports the total number of successful and failed s-probing iterations

performed over all test instances. The use of an SVM classifier decreases the

number of failed s-probing iterations by a factor of two, and increases the percentage

of successful s-probing iterations from 8% to 15%. These improvements come

at essentially no cost.
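The percentages quoted above follow directly from the counts in Table 2 (1600 successes and 18131 failures without SVM; 1634 successes and 8998 failures with SVM). The short check below is only an arithmetic illustration; the function name is ours.

```python
def success_rate(successes, failures):
    """Fraction of s-probing iterations that were successful."""
    return successes / (successes + failures)

without_svm = success_rate(1600, 18131)  # about 0.081
with_svm = success_rate(1634, 8998)      # about 0.154
failure_ratio = 18131 / 8998             # about 2.0

print(f"{100 * without_svm:.1f}% -> {100 * with_svm:.1f}%, "
      f"failures reduced by a factor of {failure_ratio:.2f}")
```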

5.4 Branch-and-Bound after probing

The main purpose of a bound-tightening technique is to improve the performance

of a Branch-and-Bound search. In this section, we report Branch-and-Bound ex-

periments with and without Aggressive Probing on a few selected instances.

Table 1 indicates that the probing algorithm proposed in this paper may be

effective on the three water instances. Therefore, we execute the Branch-and-

Bound algorithm of Couenne on these instances with a time limit of 24 hours,

using the variable bounds obtained after applying FBBT and OBBT at the root

node. Then we perform the same experiment using the variable bounds provided

by Aggressive Probing with SVM for failure prediction. Results are reported

in Table 3, where we include the time spent by probing in the total CPU time.

The water4 instance is solved with and without Aggressive Probing;

Branch-and-Bound without probing is 30% faster, but it explores 20 times as

many nodes. Thus, probing is very effective in reducing the size of the enumer-

ation tree. The waterx instance remains unsolved after 24 hours. However, em-

ploying Aggressive Probing yields a much better lower bound when the time

limit is reached (we close an additional 37% of optimality gap). Finally, waterz

is not solved by Branch-and-Bound unless Aggressive Probing is used. Due

to tighter variable bounds, we can solve the instance to optimality in approx-

imately 12 hours, whereas it is unsolved in 24 hours (with 1.2 million active

nodes and 23% optimality gap left) if Aggressive Probing is not employed.

To the best of our knowledge, an optimality certificate for the solutions to the