An optimal adaptive algorithm for the approximation of concave functions.
ABSTRACT Motivated by the study of parametric convex programs, we consider approximation of concave functions by piecewise affine functions.
Using dynamic programming, we derive a procedure for selecting the knots at which an oracle provides the function value and
one supergradient. The procedure is adaptive in that the choice of a knot is dependent on the choice of the previous knots.
It is also optimal in that the approximation error, in the integral sense, is minimized in the worst case.

Article: A Method For Approximating Univariate Convex Functions Using Only Function Value Evaluations
[Show abstract] [Hide abstract]
ABSTRACT: In this paper, piecewise linear upper and lower bounds for univariate convex functions are derived that are only based on function value information. These upper and lower bounds can be used to approximate univariate convex functions. Furthermore, new Sandwich algo rithms are proposed, that iteratively add new input data points in a systematic way, until a desired accuracy of the approximation is obtained. We show that our new algorithms that use only functionvalue evaluations converge quadratically under certain conditions on the derivatives. Under other conditions, linear convergence can be shown. Some numeri cal examples, including a Strategic investment model, that illustrate the usefulness of the algorithm, are given.02/2007;
Page 1
Mathematical Programming manuscript No.
(will be inserted by the editor)
J. Gu´ erin · P. Marcotte · G. Savard
An Optimal Adaptive Algorithm for the
Approximation of Concave Functions?
Received: date / Revised version: date
Abstract. We consider the piecewise linear approximation of a concave function representing
the value function of a parameterized convex program. Using dynamic programming, we derive
a procedure for selecting the knots at which an oracle provides the function value and one
supergradient. The procedure is adaptive in that the choice of a knot is dependent on the
choice of the previous knots. It is also optimal in that the approximation error, in the integral
sense, is minimized in the worst case.
Key words. Dynamic programming. Approximation. Adaptive algorithm.
1. Introduction
The present work is motivated by a bicriterion network equilibrium problem
modelled as a variational inequality (see Marcotte and Zhu [5]). In the lineariza
tion algorithm whose implementation is discussed in Marcotte, Nguyen and Tan
guay [6], parametric shortest path problems have to be solved repeatedly. Since
this is computationally costly, it is natural to consider the approximation of the
value function defined by this parametric program by a piecewise linear function
involving a small number of evaluation points (knots). In order to be consistent
with the stopping criterion used in the linearization algorithm, the quality of the
approximation has to be measured in the integral sense, i.e., with respect to the
L1norm. This yields the problem of selecting the knots such as to minimize the
approximation error, in the worst case.
In a general setting, consider a proper, concave function f defined over the
interval [0,1], normalized such that f(0) = 0 and f(1) = 1. At each point
¯ x ∈ (0,1) an oracle gives the value f(¯ x) and that of a supergradient ξ ∈ ∂f(¯ x),
i.e., a point satisfying the inequality
f(x) ≤ f(¯ x) + ξ(x − ¯ x)
∀x ∈ [0,1].
J. Gu´ erin: D´ epartement de math´ ematiques et g´ enie industriel,´Ecole Polytechnique, C.P. 6079,
succursale Centreville Montr´ eal (Qu´ ebec) H3C 3A7
P. Marcotte: CRT and DIRO, Universit´ e de Montr´ eal, CP 6128, succursale CentreVille,
Montr´ eal, Canada H3C 3J7
G. Savard: GERAD and D´ epartement de math´ ematiques et g´ enie industriel,´Ecole Polytech
nique, C.P. 6079, succursale Centreville, Montr´ eal (Qu´ ebec) H3C 3A7
Mathematics Subject Classification (1991): 20E28, 20G40, 20C20
?This work was partially supported by NSERC (Canada) and FCAR (Qu´ ebec)
Page 2
2J. Gu´ erin et al.
In what follows we will denote an arbitrary supergradient ξ by f?(¯ x), even when
f is not differentiable at ¯ x. From this information we derive the obvious under
and overapproximations of f over the interval (0,1):
?f(¯ x)
U(t) = f(¯ x) + f?(¯ x)(t − ¯ x)
which provide the error bound?1
prove that this procedure is optimal in the sense that it minimizes the error
term in the worst case. The procedure is adaptive: the selection of a knot is
dependent on the choice of the previous knots.
Novak [7] and Sonnevend [9] have shown that, for the problem of approxi
mating the integral of a convex function using function values and derivatives,
adaption does not improve the accuracy of the approximation with respect to
passive algorithms, in the worst case. This integration problem is equivalent to
ours and therefore adaption does not help here either. However, since the worst
case is unlikely to occur in practice, an adaptive algorithm might be able to take
advantage of favorable information to yield an improved approximation. This led
Sukharev [10] to the definition of sequentially optimal algorithms, i.e., adaptive
algorithms that make optimal use, at each step, of available information.
The algorithm that we propose in section 4 is not truly sequentially optimal
since, although evaluation points are chosen according to previous information,
they are selected in a lefttoright fashion. This may seem like a severe restriction,
as would any other fixed ordering of the points, but since adaption does not help
in the worst case no ordering can guarantee an better accuracy. The same is
true for any other more complex scheme to determine the order in which the
evaluation points are chosen. This does not mean that adaptive methods are not
useful for our problem, but that the advantage of such methods can be only be
seen in practice.
We have chosen the lefttoright order for simplicity. This gives an approxi
mation method that may be less efficient in practice than a sequentially optimal
algorithm, but such algorithms typically have high computational complexity,
offsetting accuracy gain. In contrast, our algorithm has low complexity, namely
O(n), where n denotes the number of evaluation points. This is the same as the
complexity of Sonnevend’s optimal passive algorithm.
Approximation algorithms based on the bounding functions L and U have
been studied in the literature under the name of “sandwich algorithms”, the
difference between L and U being measured with respect to the uniform, L1
or Hausdorff norm (the Hausdorff distance between the graphs of the functions
L and U). At a given iteration of a sandwich algorithm, a knot that lies in
the interval of largest estimated error is determined. Fruhwirth, Burkard and
Rote [3] propose, in the case of the Hausdorff distance, three subdivision rules
that achieve the optimal asymptotic bound O(n−2). A bound of the same order
was also obtained by Burkard, Hamacher and Rote [2] for the uniform norm. In
L(t) = min
¯ x
t,1 − f(¯ x)
1 − ¯ x
(t − ¯ x) + f(¯ x)
?
0(U(t) − L(t))dt. We propose to minimize the
above error term through a sequential selection procedure for the knots, and
Page 3
An Optimal Adaptive Algorithm for the Approximation of Concave Functions3
Fig. 1. Upper and lower approximation of the function f (n = 2)
the paper by Rote [8] four subdivision rules are studied both from a theoretical
and numerical point of view. However, two of the subdivision methods require,
at each iteration, the solution of an optimization problem involving the function
f itself; this violates a condition of our problem which states that no more than
n function evaluations must be performed. Indeed, Yang and Goh [11] showed
that, if f is easy to compute, the sandwich algorithm can dispense altogether
with firstorder (derivative or supergradient) information.
In discussing optimal sandwich methods, Rote mentions the problem of de
termining an evaluation strategy that minimizes the maximal error. Our analysis
brings a partial answer to this problem and improves upon previous works in
two important respects:
– We obtain both the optimal convergence rate and optimal selection rules for
each n.
– Our result is parameterfree: no a priori information about the function to
be approximated is required.
2. Problem definition
Let f be a proper concave function defined over the unit interval [x0,xn+1] =
[0,1] and normalized so that f(0) = 0 and f(1) = 1. We wish to sequentially
select n points x1,...,xnin order to minimize the measure
?1
where (see Figure 1)
?
U(t) =min
E(x1,...,xn) =1
2
0
[U(t) − L(t)]dt
(1)
L(t) =min
i=1,...,n+1
f(xσ(i)−1)
(t − xσ(i))
xσ(i)−1− xσ(i)
+ f(xσ(i))
(t − xσ(i)−1)
xσ(i)− xσ(i)−1
?
i=0,...,n+1{f(xi) + f?(xi)(t − xi)}
and σ is the permutation that reorders the knots and the two endpoints from
left to right:
0 = xσ(0)< xσ(1)< ... < xσ(n)< xσ(n+1)= 1.
Consider a class of functions F ⊂ C[0,1]. Let Endenote the worstcase error
corresponding to an optimal selection of n knots (n > 0). More precisely, let A
be the class of all algorithms that construct the approximationU+L
formation (f(x1),f?(x1),...,f(xn),f?(xn)) obtained by evaluating the function
f ∈ F and one of its supergradients at n points x1,...,xnof [0,1]. We consider
2
using the in
Page 4
4 J. Gu´ erin et al.
here deterministic adaptive algorithms, where the choice of the point ximay de
pend on the previous information x1,f(x1),f?(x1),...,xi−1,f(xi−1),f?(xi−1).
Denote the application of α ∈ A to f ∈ F by α(f). The optimal worstcase
error for the approximation of functions from F in the L1norm is defined to be
En(a,b) = inf
α∈Asup
f∈Ff − α(f)1.
It is not difficult to show that the supremum in the above expression is exactly
the integral on the righthand side of (1).
In the terminology of Sukharev [10], we consider adaptive algorithms of the
form α = (N,φ), where the information operator is
N(f) = (f(x1),f?(x1),...,f(xn),f?(xn))
and the terminal operation φ is defined by
φ(N(f)) =U + L
2
.
It can be shown that φ is central and thus optimal in the sense that for each f,
and N(f) already computed, it minimizes
sup
˜ f∈Ff
˜f − α(f)1,
where Ffis the subset of functions˜f in F with˜f(xi) = f(xi) and˜f?(xi) = f?(xi)
for all i.
The information operator N described above is imposed by the information
available for our problem and the choice of the particular terminal operation φ
is justified by its optimality. Therefore, the construction of an approximation
algorithm reduces here to the choice of the n evaluation points x1,...,xn. The
optimal worstcase error Enand the optimal evaluation points can be computed
through the recursion
En−k(zk) =min
xk+1∈(0,1)
max
(f(xk+1),f?(xk+1))∈C= En−(k+1)(zk+1)
k = 0,...,n − 1
where
zk=?x1,...,xk,f(x1),...,f(xk),f?(x1),...,f?(xk)?
and C represents the set of constraints that must be satisfied by the values of
f(xk+1) and f?(xk+1) in order to be compatible with the first k functional and
supergradient values of the concave function f.
The above system is in most likelihood too complex to be reduced to a
closed form expression. For this reason, we limit our analysis to the identity
permutation, i.e., the knots will be determined in a lefttoright fashion.
Page 5
An Optimal Adaptive Algorithm for the Approximation of Concave Functions5
Fig. 2. Case n = 1.
Let a = f?(0), b = f?(1) and denote by En(a,b) the optimal worstcase error
when knots are selected in a from left to right. By definition E0(a,b) is equal to
the area of the triangle OPT in Figure 2, i.e.
E0(a,b) =1
2
(1 − b)(a − 1)
a − b
.
In the case where n = 1, let x denote the evaluation point and set v = f(x),
µ = f?(x)1. Since the graph of f is entirely contained within the triangle OPT
of Figure 2, the following requirements must be met by v and µ:
x ≤ v ≤ ax
if
x ∈ [0,1−b
a−b]
x ≤ v ≤ bx + 1 − b if
1 − v
x ∈ [1−b
a−b,1]
1 − x≤ µ ≤v
x.
(2)
The error bound E1(a,b) corresponds to the sum of the areas of the triangles
OML and LNT of Figure 2 and can be expressed in term of E0:
E1(a,b) =
min
vµ
x∈(0,1)maxmax
?
xvE0
?x
va,x vµ
?
+ (1 − x)(1 − v)E0
?1 − x
1 − vµ,1 − x
1 − vb
??
,
where v and µ must satisfy the geometric constraints (2), and the scaling
factors multiplying E0, a, b and µ are obtained by elementary geometric argu
ments. Now, for an arbitrary number n, the worstcase error term can be defined
recursively as
En(a,b) =
min
x∈(0,1)max
v
max
µ
?
xvE0
?x
va,x vµ
?
+ (1 − x)(1 − v)En−1
?1 − x
1 − vµ,1 − x
1 − vb
??
(3)
with constraints (2). Any minimizer x of En(a,b) is called optimal. The next
point of evaluation is then set to a minimizer of En−1
and so on to the nth knot. Our main result follows.
?
1−x
1−f(x)f?(x),
1−x
1−f(x)b
?
,
1From now on, we drop knot indices, with the exception of Figure 2, where it has been
retained in order to avoid a collision of the symbol x with itself.
Page 6
6 J. Gu´ erin et al.
Fig. 3. The function φ (case µmax= µ+)
Theorem 1. The optimal worstcase error is equal to
En(a,b) =
1
2(n + 1)2
(a − 1)(1 − b)
(a − b)
(4)
and the minimum in (3) is achieved at the point
x∗=
1
(n + 1)2
?
1 + 2n1 − b
a − b
?
.
(5)
As proved by Sonnevend [9], this choice is optimal and cannot be improved
by adaptive methods, although the latter may prove superior in practice.
Note that, if f is concave increasing and no a priori information is available
on the slopes a and b, i.e., a = +∞ and b = 0, then we obtain
En(a,b) =
1
2(n + 1)2.
3. Proof of the theorem
As the proof of Theorem 1 is lengthy, due to the many cases and subcases that
have to be probed, we only provide an outline. The reader interested in the
complete proof is referred to Gu´ erin [4]. The proof proceeds by induction on n.
The result clearly holds for n = 0. For n ≥ 1 we evaluate the expression
?
working backwards with respect to the two “max” operators. For fixed x and v,
let us consider the function
?
Using the induction hypothesis to eliminate the terms E0 and En−1, φ can be
written as
φ(µ) = Av − µx
a − µ
where A = ax − v and B = 1 − v − (1 − x)b are nonnegative scalars.
The graph of φ is given on Figure 3. This function has two local optima,
denoted µ−and µ+. Its maximum µmaxon [(1−v)/(1−x),v/x] is attained either
at µ+or at one of the endpoints of the interval. This yields three cases, each
one corresponding to the location of the point (x,v) within the triangle OPT of
Figure 4. The three regions I, II and III are defined, respectively, as quadrilateral
PQRS, triangle RST and triangle ORQ. The maximum of φ occurs at µ+if
Rn(x) = max
v
max
µ
xvE0
?x
va,x vµ
?
+ (1 − x)(1 − v)En−1
?1 − x
1 − vµ,1 − x
1 − vb
??
,
φ(µ) = 2
xvE0
?x
va,xvµ
?
+ (1 − x)(1 − v)En−1
?1 − x
1 − vµ,1 − x
1 − vb
??
.
+ B(1 − x)µ − (1 − v)
n2(µ − b)
,
Page 7
An Optimal Adaptive Algorithm for the Approximation of Concave Functions7
Fig. 4. Three cases for µ.
(x,v) belongs to region I, at (1 − v)/(1 − x) if (x,v) belongs to region II and at
v/x if (x,v) belongs to region III.
We introduce, for fixed x, the function ψ:
ψ(v) = n2(a − b)φ(µmax).
The value vmax at which ψ reaches its maximum defines a piecewise smooth
function of x consisting of three linear and one quadratic pieces. There are two
cases to be considered, depending on whether n is larger or less than (a−1)/(1−
b). The function vmax, illustrated on Figure 5, is defined as
in the case n ≤ (a − 1)/(1 − b), and by
if n ≥ (a − 1)/(1 − b). (Subscript “1” refers to the xcoordinate of a point.)
vmax=
ax
if x ∈ (0,W1]
if x ∈ [W1,D1]
bx + (1 − b)√x
a + b
2
x +2(n + 1) − a(2n + 1)b
2(n + 1)2
if x ∈ [D1,E1]
bx + (1 − b)if x ∈ [E1,1)
vmax=
ax
if x ∈ (0,F1]
a + b
2
x +2(n + 1) − a(2n + 1)b
2(n + 1)2
if x ∈ [F1,G1]
1 − a + ax + (a − 1)√1 − x
bx + (1 − b)
if x ∈ [G1,V1]
if x ∈ [V1,1)
Fig. 5. The function vmax
To conclude the proof, we determine the minimum of the function Rn(x) over
the interval (0,1) by computing the minimal value of Rnover each subinterval.
Next, we check that the minimum occurs at the point x∗, with minimal value
given by the formula of Theorem 1. This corresponds to (x,v) belonging to region
I and
vmax=a + b
2
x +2(n + 1) − a(2n + 1)b
2(n + 1)2
.
Page 8
8J. Gu´ erin et al.
4. Numerical tests
Algorithm DYN is based on the optimal formula provided by Theorem 1. It
computes the optimal location of n points from left to right or right to left, using
the formula for x∗and a straightforward scaling procedure. The choice of the
direction is determined by a heuristic procedure based on a priori information.
The performance of DYN is compared with that of SONN, the optimal passive
algorithm proposed by Sonnevend [9]. The computational complexity of both
algorithms is O(n) function and derivative evaluations.
The performance of DYN and SONN was tested on sets of randomly gener
ated concave functions. Three functional forms were considered: smooth, piece
wise linear (PL) and piecewise smooth (PS). For each form, two samples were
produced: one consisting of concave increasing functions and the other of general
concave functions.
For each sample, the average error was computed for values of n ranging
from 1 to 20. This is consistent with the range considered in the bicriteria traffic
equilibrium problem discussed in the introduction. The results from both algo
rithms, which were compared by taking the ratio of the average error of SONN
over the average error of DYN, are illustrated in Figures 6(a)6(f).
• For nearly all functions tested, DYN performed at least as well as SONN. In
the case were DYN’s performance is worse, the difference in accuracy is at
most 2%.
• On some specific functions, the gain in accuracy achieved by DYN is as high
as 400%. Large gains were observed on functions exhibiting strong curvature
near the endpoints.
• On average, DYN performed better that SONN on all six samples. Gains in
accuracy were largest for piecewise smooth functions and least for smooth
functions, with gains for piecewise linear functions falling in between.
Fig. 6. Ratio SONN over DYN of average errors for (a) smooth concave increasing, (b) smooth
concave, (c) PL concave increasing, (d) PL concave, (e) PS concave increasing and (f) PS
concave functions.
(a)(b)
(c)(d)
(e)(f)
References
1. R. Bellman and S. Dreyfus. Applied Dynamic Programming. Princeton University Press,
Princeton (1962).
Page 9
An Optimal Adaptive Algorithm for the Approximation of Concave Functions9
2. R.E. Burkard, H.W. Hamacher and G. Rote. Sandwich approximation of univariate convex
functions with an application to separable convex programming. Naval Research Logistics
38 (1991), 911924.
3. B Fruhwirth, R.E. Burkard and G. Rote. Approximation of convex curves with application
to the bicriterial minimum cost flow problem. European Journal of Operational Research.
42 (1989), 326338.
4. J. Gu´ erin. Une m´ ethode adaptative pour l’approximation de fonctions concaves crois
santes. M´ emoire de maˆ ıtrise de l’´Ecole Polytechnique de Montr´ eal (2000).
5. P. Marcotte and D.L. Zhu. Equilibria with infinitely many differentiated classes of cus
tomers. Complementary and Variational Problems, State of the Art. JongShi Pang and
Michael Ferris eds., SIAM. Philadelphia (1997), 234258.
6. P. Marcotte, S. Nguyen and K. Tanguay. Implementation of an efficient algorithm for the
multiclass traffic assignment problem. Proceedings of the 13th International Symposium
on Transportation and Traffic Theory. JeanBaptiste Lesort ed., Pergamon (1996), 217
236.
7. E. Novak. Quadrature formulas for convex classes of functions. International Series of
Numerical Mathematics 112. Birkh¨ auser Verlag, Basel (1993), 283296.
8. G. Rote. The convergence rate of the sandwich algorithm for approximating convex func
tions. Computing 48 (1992), 337361.
9. G. Sonnevend. Optimal passive and sequential algorithms for the approximation of convex
functions in Lp[0,1]s, p = 1,∞. Constructive function theory ’81. Sofia (1983).
10. A. G. Sukharev. Minimax models in the theory of numerical methods, Theory and Decision
Library Series B Vol. 21. Kluwer Academic Publishers (1992).
11. X.Q. Yang and C.J. Goh. A method for convex curve approximation. European Journal
of Operational Research 97 (1997), 205212.