A RATE RESULT FOR SIMULATION OPTIMIZATION WITH CONDITIONAL VALUE-AT-RISK CONSTRAINTS
Soumyadip Ghosh
Business Analytics and Math Sciences Division
1101 Kitchawan Road
Yorktown Heights, NY 10598, USA.
ABSTRACT
We study a stochastic optimization problem that has its roots in financial portfolio design. The problem has a specified deterministic objective function and constraints on the conditional value-at-risk of the portfolio. Approximate optimal solutions to this problem are usually obtained by solving a sample-average approximation. We derive bounds on the gap in the objective value between the true optimal solution and an approximate solution so obtained. We show that under certain regularity conditions the approximate optimal value converges to the true optimal value at the canonical rate $O(n^{-1/2})$, where $n$ represents the sample size. The constants in the expression are explicitly defined.
1 INTRODUCTION
Financial markets have seen explosive growth in the number of investment vehicles available, each of which comes with its own risk-to-reward trade-off. As the industry gathers more knowledge of and experience with various exotic investment opportunities, it has become increasingly clear that a portfolio manager must actively assess and manage the risk inherent in a portfolio. Of particular interest is the fact that, though the risk in the returns of each individual instrument might be easy to determine, the nature of the joint risk or volatility of a diverse portfolio is relatively less understood.
This article concerns itself with a portfolio manager’s task of designing a portfolio by allocating all or part of a budget over a fixed set of high-return but also high-risk assets. Let the random variables $\{\xi_k, k = 1,\dots,d\}$ represent the change in value of investment vehicle $k$ over a fixed time interval, and let $x \in \mathbb{R}^d$ denote how the marginal dollar is divided amongst the $d$ investments. We then concern ourselves with the stochastic program
$$\max_{x} \left\{ c^t x \;:\; x \in X \cap X_0 \text{ and } R(g(\xi,x)) \le b \right\}, \qquad (1)$$
where the $R$-constraints are risk-based. The allocation $x$ is scaled to that of a nominal dollar, and so $x \in X_0 \triangleq \{x \in \mathbb{R}^d_+ : \sum_k x_k \le 1\}$, a subset of the nonnegative orthant $\mathbb{R}^d_+$. The set $X \subset \mathbb{R}^d$ is a convex polytope that represents any additional (deterministic) constraints on the chosen allocation. The cost coefficients $c$ are known and deterministic, and $c^t x$ can be thought of as the total revenue from a portfolio with allocation $x$.
The function $g$ defines a random outcome that depends both on the choice of $x$ and on an independent random variable $\xi$ in $\mathbb{R}^d$. It represents a notion of the loss experienced with decisions $x$ and the change $\xi$ in the values of the underlying random variables. The (deterministic) function $R$ measures the risk inherent in the loss $g$ for a particular choice of $x$. We shall limit this article to the case in which $g$ has the one-dimensional linear form $g(\xi,x) = -\xi^t x$.
The stochastic program thus seeks the maximum-revenue portfolio allocation among the feasible allocations, i.e., those that incur a risk of at most $b$. Markowitz (1952), who laid the foundation of portfolio optimization and management theory, considered the mean-variance relation as representative of the risk-return trade-off. Since then, various measures of risk have been studied in this framework. J. P. Morgan’s RiskMetrics™ (1996) was an important advocate for the use of the value-at-risk (VaR) measure in financial portfolio management. The VaR $V_\beta(\cdot)$ at level $\beta$ is the smallest loss level that is exceeded with probability at most $1-\beta$, and is thus a natural candidate for a risk measure. It is indeed widely used in the industry.
The VaR measure is known to exhibit behavior that can run counter to expectations, which limits its effectiveness to special conditions. Artzner et al. (1999) argue that a risk measure should satisfy the following conditions in order to be coherent: it should be translation invariant, subadditive, positive homogeneous, and monotone in the random variable. The VaR measure fails, for instance, the subadditivity test.
The risk measure $R$ we consider is the conditional value-at-risk (CVaR) at level $\beta$, denoted as $C_\beta(\cdot)$. The CVaR risk measure is defined in terms of the VaR risk measure $V_\beta(\cdot)$ and has garnered a lot of attention recently. (Definitions for both are given in Section 2.) The CVaR $C_\beta(\cdot)$ represents the average loss experienced given that the loss is at least $V_\beta(\cdot)$. Pflug (2000) and Acerbi and Tasche (2002) show that the CVaR measure is coherent. Coherent measures possess strong functional properties that make them more amenable to use in a wide variety of applications, and they are now widely accepted in the academic community.

Proceedings of the 2008 Winter Simulation Conference
S. J. Mason, R. R. Hill, L. Mönch, O. Rose, T. Jefferson, J. W. Fowler, eds.
978-1-4244-2708-6/08/$25.00 ©2008 IEEE
The problem of estimating CVaR measures $C_\beta(Y)$ for a random variable $Y$ is of interest in itself, particularly as a rare-event estimation problem (Juneja and Shahabuddin 2007 provide a good review). This is because the estimation requires generating samples from a low-probability set when $\beta$ is close to 1, as is typical in practice. The estimation of VaR has been well studied in this framework: for instance, Glasserman et al. (2000) provide a general variance-reduction framework to estimate VaR for light-tailed distributions, while Glasserman et al. (2002) look at the estimation problem for heavy-tailed $Y$. Focus is now beginning to shift to the CVaR measure, which awaits a similarly thorough treatment.
Rockafellar and Uryasev (2000) introduce a different estimator that is perhaps better suited to optimization applications like (1). An approximation problem constructed using their estimator usually results in a convex or even a linear program (Rockafellar and Uryasev 2002), and thus leads to efficient implementations for large-scale problems of the type (1).
Problem (1) is a member of the general class of stochastic problems that include a constraint involving an expectation which, in general, cannot be written down in closed form. The simulation literature provides a diverse set of tools to tackle such stochastic convex problems. A standard approach called sample average approximation estimates the expectation of the random function via samples of the underlying random variable and then constructs a constraint that approximates the true CVaR constraint in (1). One usually expects the solution to the approximated problem to be close to the true optimal solution. In Section 3, we provide a bound on the relative optimality gap between the approximate solution generated by a sample-average algorithm and a true optimal solution of (1). For a sufficiently large sample size $n$, the solution found has an objective value within a relative gap of $O(n^{-1/2})$ of the optimal. Wang and Ahmed (2007) provide bounds of the same order on the quality of sample-average-approximation solutions to general convex problems with stochastic constraints. Their results are derived using large deviations theory and require $R(g(\xi,x))$ to satisfy certain conditions, which do hold in the case we study. We derive bounds with similar rates of convergence, but we provide a geometric argument using the coherence properties of $C_\beta(x)$. In our case, the constant in the expression is defined in a manner that allows it to be calculated analytically in some cases, or estimated. Moreover, the constant does not include the right-hand side $b$ in its definition.
The rest of the article is laid out as follows: Section 2 provides the mathematical background on the problem we are interested in, Section 3 describes the main result of this paper, and Section 4 discusses some directions in which these results can be extended and/or utilized.
2 THE OPTIMIZATION PROBLEM
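The $\beta$-VaR $V_\beta(Y)$ of a random variable $Y$ is its $\beta$-quantile, i.e., the level that $Y$ exceeds with probability at most $1-\beta$, and is defined as
$$V_\beta(Y) \triangleq F_Y^{-1}(\beta) = \inf_{y \in \mathbb{R}} \{ y : P(Y > y) \le 1-\beta \}.$$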
The conditional value-at-risk $C_\beta(Y)$ of a random variable $Y$ with a continuous distribution is
$$C_\beta(Y) = E[\,Y \mid Y \ge V_\beta(Y)\,].$$
In problem (1), we are interested in the CVaR $C_\beta(g(\xi,x))$ of a portfolio with allocation $x$, and we treat only the case $g(\xi,x) = -\xi^t x$ here. We shall use the shorthand $C_\beta(x)$ for $C_\beta(-\xi^t x)$.
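To make the two definitions concrete, the following minimal Python sketch (our illustration, not code from this paper) computes empirical versions of $V_\beta$ and $C_\beta$ from a loss sample and compares them with closed-form values for a normal loss. The helper names `empirical_var_cvar` and `normal_var_cvar` are hypothetical, and the normal test case is an assumption chosen only because its VaR and CVaR are known exactly: for $Y \sim N(m, s^2)$, $V_\beta = m + s z_\beta$ and $C_\beta = m + s\,\mathrm{pdf}(z_\beta)/(1-\beta)$, where $z_\beta$ is the standard normal $\beta$-quantile and pdf the standard normal density.

```python
import math
import random
from statistics import NormalDist


def empirical_var_cvar(losses, beta):
    """Empirical V_beta and C_beta of a loss sample (0 < beta < 1).

    The empirical version of inf{y : P(Y > y) <= 1 - beta} is the
    ceil(N * beta)-th smallest loss; C_beta averages the losses at or
    above that order statistic, mirroring E[Y | Y >= V_beta(Y)].
    """
    ys = sorted(losses)
    k = max(1, math.ceil(len(ys) * beta))  # 1-based index of the empirical quantile
    var = ys[k - 1]
    tail = ys[k - 1:]
    return var, sum(tail) / len(tail)


def normal_var_cvar(m, s, beta):
    """Exact V_beta and C_beta for a loss Y ~ N(m, s^2)."""
    std = NormalDist()
    z = std.inv_cdf(beta)
    return m + s * z, m + s * std.pdf(z) / (1.0 - beta)


if __name__ == "__main__":
    rng = random.Random(0)
    # Loss of one asset whose gain has mean 0.05 and sd 0.2 (illustrative numbers).
    losses = [rng.gauss(-0.05, 0.2) for _ in range(100_000)]
    print(empirical_var_cvar(losses, 0.95))
    print(normal_var_cvar(-0.05, 0.2, 0.95))
```

With a large sample the two printed pairs should agree to about two decimal places; the sort makes the sketch $O(N \log N)$, which is adequate for illustration.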
The CVaR of a random variable with a non-continuous distribution is harder to define. The continuous-distribution requirement on $\xi^t x$ is usually not overly restrictive: the linear functional $\xi^t x$ has a continuous distribution whenever one component $\xi_k$ with $x_k \neq 0$ has a continuous distribution conditional on the remaining components (for instance, when $\xi_k$ is continuous and independent of them). Henderson (2007) notes that the distribution function of the convolution $\xi^t x$ can be obtained by conditioning on a component $\xi_k$ that has a continuous distribution with a density, which leads to an expression for the density of $\xi^t x$.

The function $C_\beta(Y)$ is subadditive, positive homogeneous, translation invariant, and monotone in $Y$. The first two properties, in particular, imply that $C_\beta(Y)$ is convex in $Y$. Since our choice of $g$ is linear in $x$, $C_\beta(x)$ is also subadditive and convex in $x$, and the feasible region carved out by the CVaR constraint, a level set, is convex. The stochastic program (1) can thus be rewritten as a convex optimization problem:
$$\max\ c^t x \quad \text{s.t.} \quad C_\beta(x) \le b,\quad x \in X_0 \cap X. \qquad (2)$$
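The conditioning argument attributed to Henderson (2007) above can be turned into a conditional Monte Carlo estimate of the density of $\xi^t x$. The sketch below is our own illustration under stated assumptions: $\xi_1$ has a known density $f_1$, is independent of $(\xi_2,\dots,\xi_d)$, and $x_1 > 0$. Conditioning on the other components gives $P(\xi^t x \le s \mid \xi_2,\dots,\xi_d) = F_1\big((s - \sum_{j\ge 2} x_j\xi_j)/x_1\big)$, so differentiating in $s$ and averaging over draws of the remaining components estimates the density. The function name `conditional_density` and the two-asset normal example are hypothetical.

```python
import random
from statistics import NormalDist


def conditional_density(s, x, sample_rest, f1, n=20_000, rng=None):
    """Conditional Monte Carlo estimate of the density of S = xi^t x at s.

    Assumes xi_1 has density f1, xi_1 is independent of (xi_2, ..., xi_d),
    and x[0] > 0. sample_rest(rng) must return one draw of (xi_2, ..., xi_d).
    Given those components, P(S <= s | rest) = F1((s - partial) / x[0]),
    whose derivative in s is f1((s - partial) / x[0]) / x[0].
    """
    if x[0] <= 0:
        raise ValueError("conditioning on xi_1 requires x[0] > 0")
    rng = rng if rng is not None else random.Random()
    total = 0.0
    for _ in range(n):
        rest = sample_rest(rng)
        partial = sum(xj * zj for xj, zj in zip(x[1:], rest))
        total += f1((s - partial) / x[0])
    return total / (n * x[0])


if __name__ == "__main__":
    # Hypothetical two-asset example with independent normal components.
    f1 = NormalDist(0.05, 0.2).pdf
    draw_rest = lambda rng: (rng.gauss(0.03, 0.1),)
    est = conditional_density(0.0, (0.6, 0.4), draw_rest, f1, rng=random.Random(1))
    exact = NormalDist(0.042, 0.016 ** 0.5).pdf(0.0)  # xi^t x is exactly normal here
    print(est, exact)
```

In the normal example $\xi^t x$ is itself normal with mean $0.042$ and variance $0.016$, which gives an exact value to compare against; the conditional estimator needs no kernel bandwidth because the smoothing comes from $f_1$ itself.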
Random variables $\{\xi_k, k = 1,\dots,d\}$ represent the change in real value of the assets under consideration over the fixed decision time period. Let $\mu$ represent the mean and $\Sigma$ the covariance matrix of $\xi$. In this exercise, one would typically consider only those investments that have an expected net positive growth outcome $E\xi_k = \mu_k > 0$. To avoid trivialities, we additionally assume that the $\xi_k$ satisfy
Assumption 1
There exists a positive constant $\delta$ such that $C_\beta(-\xi_k) \ge \delta > 0$, $\forall k = 1,\dots,d$.
This means that each asset considered is expected to net us a positive return $\mu_k$, but that it is accompanied by distribution tails fat enough to result in a risk of a positive loss at the $\beta$ risk-tolerance level. This assumption is reasonable in most cases, since instruments that violate it typically do not yield positive real returns: instruments such as Treasury bills that are traditionally considered “safe” or relatively risk-free are typically expected only to track inflation in value. From formulation (2) we see that an optimal allocation $x^*$ might not sum to 1; the rest of the marginal dollar is assumed to be invested in such safe instruments.
The program (2) is a convex problem, and thus large instances could potentially be solved efficiently. The principal difficulty lies in the fact that the $C_\beta(\cdot)$ constraint cannot be written down in explicit form given the distribution of $\xi$. One approach to overcoming this difficulty is to construct a sample approximation of the original problem (2), in which the CVaR constraint is replaced with an estimate constructed from samples of the random variable $\xi$.
The sample average approximation of problem (2) is of the general form
$$\max\ c^t x \quad \text{s.t.} \quad \hat C_\beta(x) \le b,\quad x \in X_0 \cap X, \qquad (3)$$
where $\hat C_\beta(\cdot)$ is an estimator of $C_\beta(\cdot)$ constructed from samples of $\xi$. The approximation $\hat C_\beta(Y)$ can be provided by a canonical estimator of the expected value of the random variable $Y \mid Y \ge V_\beta(Y)$, because by definition $C_\beta(Y)$ is the expected value of a random variable that follows the same distribution as $Y$ to the right of the (fixed) point $V_\beta(Y)$.
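Rockafellar and Uryasev (2000) propose a different, sample-average-based estimator of $C_\beta(x)$, built on the representation
$$C_\beta(x) = \min_{\alpha \in \mathbb{R}} \left\{ \alpha + \frac{1}{1-\beta} \int_{y \in \mathbb{R}^d} [-y^t x - \alpha]^+ \, p_\xi(y)\, dy \right\},$$
where $p_\xi$ is the density of $\xi$; the estimator replaces the integral with an average over samples of $\xi$.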
This estimator has been designed with optimization in mind and is convex in $x$ for our choice of a linear $g$. The sample average approximation problem (3) is then a convex program.
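As a minimal sketch of the sample version of the Rockafellar and Uryasev (2000) representation, the function below (hypothetical name `ru_cvar`) evaluates $\min_\alpha \{\alpha + \frac{1}{N(1-\beta)}\sum_i [Y_i - \alpha]^+\}$ for sample losses $Y_i = -\xi_i^t x$ at a fixed $x$. The objective is convex and piecewise linear in $\alpha$ with breakpoints at the sample losses (its slope is $1 - 1/(1-\beta) < 0$ to the left of all of them and $1$ to the right), so its minimum is attained at one of them, and a single pass over the sorted losses with running suffix sums suffices. In an optimization setting one would instead keep $\alpha$ and auxiliary variables for the positive parts as decision variables, which is what yields the linear programs mentioned in Section 1; this sketch only evaluates the estimator.

```python
import math


def ru_cvar(losses, beta):
    """Sample Rockafellar-Uryasev CVaR estimate for losses Y_1, ..., Y_N.

    Minimizes  alpha + sum((Y_i - alpha)^+) / (N * (1 - beta))  over alpha.
    The objective is convex and piecewise linear with breakpoints at the
    sample losses, so it suffices to evaluate it at each sample loss.
    """
    ys = sorted(losses)
    n = len(ys)
    scale = 1.0 / (n * (1.0 - beta))
    best = math.inf
    suffix = 0.0  # running sum of ys[j+1:], accumulated from the right
    for j in range(n - 1, -1, -1):
        alpha = ys[j]
        # Only ys[j+1:] can exceed alpha, and each of them does so by y - alpha.
        excess = suffix - (n - 1 - j) * alpha
        best = min(best, alpha + scale * excess)
        suffix += alpha
    return best
```

When $N(1-\beta)$ is an integer, the minimum equals the average of the $N(1-\beta)$ largest losses; for example, `ru_cvar([float(i) for i in range(1, 21)], 0.8)` gives 18.5 up to rounding, the mean of 17, 18, 19 and 20. Scaling all losses by a positive constant scales the result by the same constant, consistent with Assumption 2b.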
The results we derive in Section 3 assume that the estimator $\hat C_\beta(x)$ satisfies the following conditions:
Assumption 2
2a. The estimator $\hat C_\beta(Y)$ of $C_\beta(Y)$ is consistent and satisfies a central limit theorem of the form
$$\sqrt{n}\,\big(\hat C_\beta(Y) - C_\beta(Y)\big) \Rightarrow \sigma N(0,1), \qquad (4)$$
where $\sigma^2$ is the variance associated with the estimation,
2b. The estimator $\hat C_\beta(x)$ of $C_\beta(x)$ is positive homogeneous in $x$, and
2c. Let $\sigma^2(x)$ be the CLT (4) variance for the random variable $-\xi^t x$, and set $\Theta \triangleq \{\theta \in \mathbb{R}^d_+ : \sum_k \theta_k = 1\}$. Then the supremum $\sup_{\theta \in \Theta} \sigma(\theta)$ is finite.
The set $\Theta$ is the collection of allocations in which the entire nominal dollar is invested in the $d$ assets under consideration, and thus is one of the faces of the boundary of the set $X_0$.
These assumptions are not overly restrictive, and reasonable estimators are expected to satisfy them; Lemma 1 shows that the canonical estimator complies. Let $\{\xi_i : i = 1,\dots,N\}$ be $N$ i.i.d. samples of $\xi$, and let $Y_{(1)}(x) \le \cdots \le Y_{(N)}(x)$ be the nondecreasing ordering of the sample losses $Y_i(x) = -\xi_i^t x$. Then the canonical estimator of $C_\beta(x)$ at point $x$ is
$$\hat C_\beta(x) \triangleq \frac{1}{N - \lceil N\beta \rceil} \sum_{i=\lceil N\beta \rceil}^{N} Y_{(i)}(x). \qquad (5)$$
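A direct transcription of estimator (5) is sketched below, assuming the samples are given as a list of $d$-vectors; the function name `canonical_cvar` is hypothetical. It sorts the sample losses $Y_i(x) = -\xi_i^t x$ and averages the upper order statistics with the normalization written in (5). Because scaling $x$ by a positive constant scales every sample loss by that constant without changing their order, the sketch can also be used to check the positive homogeneity required by Assumption 2b.

```python
import math


def canonical_cvar(samples, x, beta):
    """Estimator (5): C_hat_beta(x) from i.i.d. samples xi_1, ..., xi_N of xi.

    Sorts the sample losses Y_i(x) = -xi_i^t x in nondecreasing order and
    sums Y_(i) for i = ceil(N*beta), ..., N (1-based), dividing by
    N - ceil(N*beta) as in (5). The sum has one more term than the divisor;
    the difference vanishes as N grows.
    """
    n = len(samples)
    k = math.ceil(n * beta)
    if not 1 <= k < n:
        raise ValueError("need 1 <= ceil(N*beta) < N")
    losses = sorted(-sum(xi_j * x_j for xi_j, x_j in zip(xi, x)) for xi in samples)
    return sum(losses[k - 1:]) / (n - k)
```

For example, with the one-dimensional samples $1, 2, \dots, 10$, $x = 1$ and $\beta = 0.5$, the losses at or above the fifth order statistic are $-6, \dots, -1$, and the estimate is $-21/5 = -4.2$ (negative because every sampled outcome is a gain).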
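Lemma 1
The estimator (5) satisfies Assumption 2.
Proof: For notational convenience, let $Y(x) = -\xi^t x$; we drop the argument $x$ of $Y(x)$ when the context makes the meaning clear. Note that $E[Y(x)] = -\mu^t x$ and $\mathrm{Var}[Y(x)] = x^t \Sigma x$. The set being optimized over is compact, so to prove 2c it suffices to show that the variances of the random variables $\{Y(\theta) \mid Y(\theta) \ge V_\beta(\theta)\}$ are bounded above. For $\theta \in \Theta$, using $P(Y \ge V_\beta(\theta)) = 1-\beta$, the variance can be written as
$$\begin{aligned}
\mathrm{Var}[Y \mid Y \ge V_\beta(\theta)] &= E[Y^2 \mid Y \ge V_\beta(\theta)] - E^2[Y \mid Y \ge V_\beta(\theta)] \\
&\le \frac{E[Y^2]}{1-\beta} - \delta^2 = \frac{\mathrm{Var}[Y] + E^2[Y]}{1-\beta} - \delta^2 = \frac{\theta^t \Sigma \theta + (\mu^t \theta)^2}{1-\beta} - \delta^2.
\end{aligned}$$
The inequality uses $E[Y^2 \mid Y \ge V_\beta(\theta)] \le E[Y^2]/P(Y \ge V_\beta(\theta))$ and Assumption 1. The upper bound is a quadratic in $\theta$, which attains a finite maximum within the compact set $\Theta$. Thus, the variance term in Assumption 2a is finite and Assumption 2c is satisfied. Assumption 2b holds because of the linear form of (5) and the fact that the ordering in (5) does not change if $x$ is scaled by a positive value. □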
The Rockafellar and Uryasev (2000) estimator can be
verified to satisfy Assumption 2b. It is not immediately
clear that it obeys a limit theorem such as (4), though we
suspect that this is the case.
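Assumption 2a can also be probed numerically. The sketch below, our illustration rather than anything from this paper, replicates estimator (5) on standard normal losses and compares the empirical spread of $\sqrt{n}\,(\hat C_\beta - C_\beta)$ with $\sqrt{\mathrm{Var}[(Y - V_\beta)^+]}/(1-\beta)$. That expression is assumed here to be the CLT standard deviation $\sigma$ for sample CVaR estimators of a continuous loss; it is brought in from outside this paper. For $Y \sim N(0,1)$ it has a closed form in terms of $z_\beta$, the standard normal density, and the tail probability $1-\beta$. All function names are hypothetical.

```python
import math
import random
import statistics
from statistics import NormalDist


def canonical_cvar(losses, beta):
    """Estimator (5) applied directly to a list of sample losses."""
    ys = sorted(losses)
    n = len(ys)
    k = math.ceil(n * beta)
    return sum(ys[k - 1:]) / (n - k)


def asymptotic_sd_std_normal(beta):
    """sqrt(Var[(Y - V_beta)^+]) / (1 - beta) for Y ~ N(0, 1) (assumed CLT sd)."""
    std = NormalDist()
    z = std.inv_cdf(beta)
    tail = 1.0 - beta
    m1 = std.pdf(z) - z * tail  # E[(Y - z)^+]
    m2 = (1.0 + z * z) * tail - z * std.pdf(z)  # E[((Y - z)^+)^2]
    return math.sqrt(m2 - m1 * m1) / tail


def replicated_sd(beta, n, reps, rng):
    """Sample sd of sqrt(n) * C_hat over independent replications of size n."""
    estimates = [
        canonical_cvar([rng.gauss(0.0, 1.0) for _ in range(n)], beta)
        for _ in range(reps)
    ]
    return math.sqrt(n) * statistics.stdev(estimates)
```

For instance, `replicated_sd(0.9, 2000, 300, random.Random(7)) / asymptotic_sd_std_normal(0.9)` should come out close to 1; with only a few hundred replications, a deviation of several percent is ordinary sampling noise.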
3 OPTIMALITY GAP
The central limit theorem (4) obeyed by the CVaR estimator $\hat C_\beta(x)$ ultimately leads us to our central result, Theorem 1, which demonstrates that the optimal objective value $c^t \hat x^*_n$ output by the approximation problem (3) is within a relative gap of $O(n^{-1/2})$ of the optimal objective value $c^t x^*$ of the original problem (2) for sufficiently large $n$. The convergence rate is as we should expect given the CLT (4), but interestingly the constant in the bound does not depend on $b$. Let $\phi(\cdot)$ represent the distribution function of the standard normal distribution $N(0,1)$, and let $x^*$ be an optimal solution to the original problem (2).
Theorem 1
The objective value $c^t \hat x^*_n$ of the optimal solution returned for the $n$-sample approximation problem (3) (with a sufficiently large $n$) satisfies
a)
$$c^t \hat x^*_n \le c^t x^* \cdot \left(1 + \sqrt{\tfrac{M}{n}}\,\phi^{-1}(p) + O\!\left(\tfrac{1}{n}\right)\right)$$
b)
$$c^t \hat x^*_n \ge c^t x^* \cdot \left(1 + \sqrt{\tfrac{M}{n}}\,\phi^{-1}(p)\right)^{-1}$$
with probability $p$. The constant $M$ is defined in Lemma 2 and is independent of the parameters $c$ and $b$ in (3).
The inequalities should be interpreted in the same sense as when they are used in deriving standard confidence intervals for sampling-based estimators.
We shall provide a set of preliminary results that will in turn lead to the proof of Theorem 1. In the proof we show that the feasible set created by the sample average approximation problem, convex or not, is contained within a scaled-up version of the convex feasible set of the original problem (2). In turn, the approximation feasible set contains a scaled-down version of the original convex set. The scaling factors differ from 1 by $O(n^{-1/2})$ terms, which gives us Theorem 1. The idea is demonstrated for the $\mathbb{R}^2$ case in Figure 1.
We start with the ratio of the variance $\sigma^2(x)$ to the square of $C_\beta(x)$, otherwise known as the squared coefficient of variation of the random variable $-\xi^t x \mid -\xi^t x \ge V_\beta(x)$.
Lemma 2
Define
$$M \triangleq \max_{\theta} \left\{ \frac{\sigma^2(\theta)}{C_\beta^2(\theta)} : \theta \in \Theta \right\}.$$
$M$ exists and is finite.
Proof: The function $C_\beta(x)$ is concave in its argument $x$, and hence $C_\beta(\theta) \ge \sum_k \lambda_k C_\beta(e_k) \ge \delta$. Here $\theta = \sum_k \lambda_k e_k$, the $\lambda_k$ constitute a convex combination, $e_k$ represents the unit vector with one in the $k$th component, and we also use Assumption 1. Thus, the term $1/C_\beta^2(\theta)$ is bounded above by the constant $1/\delta^2$. This, combined with Assumption 2c, gives the result. □
The constant $M$ depends on the distribution of $\xi$ in (2). In many cases, $M$ can be explicitly evaluated; for instance, when $\xi$ is multivariate normally distributed as $N(\mu,\Sigma)$, explicit expressions for $\sigma(\theta)$ and $C_\beta(\theta)$ can be written down in terms of $\mu^t\theta$ and the quadratic form $\theta^t\Sigma\theta$, and the optimal value over $\Theta$ can then be determined. At first glance the fact that $M$ does not use $c$ or $b$ in its definition seems remarkable, but this is to be expected given the strong positive scaling property of $C_\beta(\cdot)$.
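As an illustration of evaluating $M$ for normally distributed $\xi$, the sketch below (ours, not the paper's) uses the closed form $C_\beta(\theta) = -\mu^t\theta + s(\theta)\,\mathrm{pdf}(z_\beta)/(1-\beta)$ with $s(\theta)^2 = \theta^t\Sigma\theta$, and takes $\sigma^2(\theta) = \mathrm{Var}[(Y - V_\beta)^+]/(1-\beta)^2$ for $Y = -\xi^t\theta$. This choice of $\sigma^2$ is an assumption (the asymptotic variance of a sample CVaR estimator), not a formula from the paper. For $d = 2$ the maximum over $\Theta$ is approximated on a grid; the function names and the parameter values in the usage note are hypothetical.

```python
import math
from statistics import NormalDist


def normal_cv2(theta, mu, cov, beta):
    """sigma^2(theta) / C_beta(theta)^2 for the loss -xi^t theta, xi ~ N(mu, cov)."""
    std = NormalDist()
    z = std.inv_cdf(beta)
    tail = 1.0 - beta
    d = len(theta)
    s = math.sqrt(sum(theta[i] * cov[i][j] * theta[j] for i in range(d) for j in range(d)))
    cvar = -sum(m * t for m, t in zip(mu, theta)) + s * std.pdf(z) / tail
    if cvar <= 0.0:
        raise ValueError("Lemma 2 needs C_beta(theta) > 0 on Theta")
    m1 = std.pdf(z) - z * tail  # E[(Z - z)^+] for Z ~ N(0, 1)
    m2 = (1.0 + z * z) * tail - z * std.pdf(z)  # E[((Z - z)^+)^2]
    sigma = s * math.sqrt(m2 - m1 * m1) / tail  # assumed CLT sd for this theta
    return (sigma / cvar) ** 2


def constant_m(mu, cov, beta, grid=1001):
    """Grid approximation of M = max over Theta of sigma^2 / C_beta^2 (d = 2)."""
    return max(
        normal_cv2((t, 1.0 - t), mu, cov, beta)
        for t in (i / (grid - 1) for i in range(grid))
    )
```

For example, `constant_m((0.05, 0.03), [[0.04, 0.01], [0.01, 0.02]], 0.9)` returns a grid approximation of $M$ for those made-up parameters. With $\mu = 0$ the ratio is the same for every $\theta$, because $\sigma(\theta)$ and $C_\beta(\theta)$ are then both proportional to $s(\theta)$; positive means lower $C_\beta(\theta)$ and therefore raise $M$.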
We need some more notation to state our next result. Let $\Omega$ represent the intersection of the convex set defined by the CVaR constraint in (2) with the nonnegative orthant $\mathbb{R}^d_+$, i.e., $\Omega \triangleq \{x \in \mathbb{R}^d_+ : C_\beta(x) \le b\}$, and let $\Omega_n$ correspondingly be its approximating set constructed from $\hat C_\beta(x)$ in (3). The set $\Omega_n$ need not be convex. We denote the boundary of a set $A$ by $\partial A$. For an $x \in \partial\Omega$, let $\theta(x) = x/\|x\|_1$ and $\bar r(x) = \|x\|_1$. (The $l_1$-norm of $x \in \mathbb{R}^d_+$ is $\|x\|_1 = \sum_k x_k$.) Denote by $P_n(x)$ the point in $\partial\Omega_n$ that lies along $\theta(x)$, and let $\bar r_n(x) = \|P_n(x)\|_1$. Figure 1 depicts these definitions.
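[Figure: a ray from the origin 0 in direction $\theta$ meets $\partial\Omega$ at $x = \bar r\theta$ and $\partial\Omega_n$ at $P_n(x) = \bar r_n\theta$.]
Figure 1: The sets $\Omega$, $\Omega_n$, $\Omega_L$ and $\Omega_U$ in $\mathbb{R}^2$.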
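Lemma 3
For each $x \in \partial\Omega$,
$$\bar r \left(1 + \sqrt{\tfrac{M}{n}}\,\phi^{-1}(p)\right)^{-1} \le \bar r_n \le \bar r \left(1 - \sqrt{\tfrac{M}{n}}\,\phi^{-1}(p)\right)^{-1},$$
with probability $p$.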
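Proof: The function $C_\beta(x)$ and its estimator $\hat C_\beta(x)$ (as defined in (5)) are both positive homogeneous in $x \in \mathbb{R}^d$. Writing $x = \bar r\theta$ with $\bar r = \|x\|_1$ and $\theta \in \Theta$, we have for any $x \in \Omega$
$$\big|C_\beta(x) - \hat C_\beta(x)\big| = \bar r\,\big|C_\beta(\theta) - \hat C_\beta(\theta)\big| \le \bar r \cdot \frac{\sigma(\theta)}{\sqrt{n}}\,\phi^{-1}(p). \qquad (6)$$
The first equation uses the homogeneity of the functions (Assumption 2b). The inequality holds with probability $p$, following the standard confidence-interval derivation using the CLT (4).
For an $x$ in the boundary $\partial\Omega$, the constraint $C_\beta(x) \le b$ is tight, so $b = C_\beta(x) = \bar r\,C_\beta(\theta)$ and $\hat C_\beta(x) - C_\beta(x) = \hat C_\beta(x) - b$. Moreover, since $P_n(x) = \bar r_n\theta$ lies on $\partial\Omega_n$, we have $\hat C_\beta(\theta) = b/\bar r_n$, and hence $\hat C_\beta(x) = \bar r\,\hat C_\beta(\theta) = \bar r\,b/\bar r_n$. In other words,
$$\frac{\bar r}{\bar r_n} - 1 = \frac{1}{b}\big(\hat C_\beta(x) - b\big) \le \frac{\bar r}{b}\cdot\frac{\sigma(\theta)}{\sqrt{n}}\,\phi^{-1}(p) = \frac{\sigma(\theta)}{C_\beta(\theta)}\cdot\frac{\phi^{-1}(p)}{\sqrt{n}} \le \sqrt{M}\,\frac{\phi^{-1}(p)}{\sqrt{n}},$$
with probability $p$. The first inequality uses (6), and the last uses Lemma 2. Rearranging gives the lower bound on $\bar r_n$ in the statement of the lemma; applying the same argument to the other side of the absolute value in (6) gives the upper bound. □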
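The ray argument can be checked by simulation. Along a fixed direction $\theta$, positive homogeneity gives $\bar r = b/C_\beta(\theta)$ and $\bar r_n = b/\hat C_\beta(\theta)$, so $\bar r/\bar r_n - 1 = \hat C_\beta(\theta)/C_\beta(\theta) - 1$. The sketch below estimates how often the one-sided bound from the proof, $\bar r/\bar r_n - 1 \le (\sigma(\theta)/C_\beta(\theta))\,\phi^{-1}(p)/\sqrt{n}$, holds; it uses the direction-specific ratio rather than its maximum $\sqrt{M}$, so the expected coverage is about $p$. The assumptions are ours: independent normal components, the empirical tail average as $\hat C_\beta$, and $\sigma^2(\theta) = \mathrm{Var}[(Y-V_\beta)^+]/(1-\beta)^2$ as the CLT variance. The function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

STD = NormalDist()


def tail_mean_cvar(losses, beta):
    """Empirical CVaR: mean of the losses at or above the ceil(N*beta)-th smallest."""
    ys = sorted(losses)
    k = max(1, math.ceil(len(ys) * beta))
    tail = ys[k - 1:]
    return sum(tail) / len(tail)


def normal_cvar_and_cv(mean_loss, sd_loss, beta):
    """Exact C_beta and sigma / C_beta for a N(mean_loss, sd_loss^2) loss."""
    z = STD.inv_cdf(beta)
    tail = 1.0 - beta
    cvar = mean_loss + sd_loss * STD.pdf(z) / tail
    m1 = STD.pdf(z) - z * tail
    m2 = (1.0 + z * z) * tail - z * STD.pdf(z)
    sigma = sd_loss * math.sqrt(m2 - m1 * m1) / tail  # assumed CLT sd
    return cvar, sigma / cvar


def lemma3_coverage(theta, mu, sd, b, beta, n, p, reps, rng):
    """Fraction of replications with r_bar / r_bar_n - 1 <= (sigma/C) * Phi^{-1}(p) / sqrt(n)."""
    mean_loss = -sum(m * t for m, t in zip(mu, theta))
    sd_loss = math.sqrt(sum((s * t) ** 2 for s, t in zip(sd, theta)))
    cvar, cv = normal_cvar_and_cv(mean_loss, sd_loss, beta)
    r_bar = b / cvar  # the ray meets the boundary of Omega at r_bar * theta
    bound = cv * STD.inv_cdf(p) / math.sqrt(n)
    hits = 0
    for _ in range(reps):
        losses = [
            -sum(t * rng.gauss(m, s) for t, m, s in zip(theta, mu, sd))
            for _ in range(n)
        ]
        r_bar_n = b / tail_mean_cvar(losses, beta)  # assumes the estimate is positive
        hits += (r_bar / r_bar_n - 1.0) <= bound
    return hits / reps
```

For example, `lemma3_coverage((0.5, 0.5), (0.05, 0.03), (0.2, 0.1), 0.1, 0.9, 1000, 0.9, 300, random.Random(11))` should return a value near 0.9; finite-sample skewness of the tail average can pull it slightly below $p$.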
For a set $S$ and a scalar $a$, let $aS = \{ax \mid x \in S\}$. Lemma 3 leads to the following corollary:
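Corollary 2
For $n$ sufficiently large,
a)
$$\Omega_L \triangleq \Omega \cdot \left(1 + \sqrt{\tfrac{M}{n}}\,\phi^{-1}(p)\right)^{-1} \subseteq \Omega_n \qquad (7)$$
b)
$$\Omega_n \subseteq \Omega_U \triangleq \Omega \cdot \left(1 - \sqrt{\tfrac{M}{n}}\,\phi^{-1}(p)\right)^{-1} \qquad (8)$$
with probability $p$.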
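Proof: We shall prove the upper bound b); the lower bound follows similarly. For any $x \in \Omega_n$, we need to show that $x \in \Omega_U$. Write $x = r(x)\theta$, where $\theta \in \Theta$ is a unit vector in the $l_1$-norm. Consider the ray $\{r\theta : r > 0\}$ along $\theta$. As before, let $\bar r$ represent the $r$-value that delimits the part of the ray contained within $\Omega$ (i.e., $\bar r = \max_r\{r : r\theta \in \Omega\}$ and $\bar r\theta \in \partial\Omega$), and similarly define $\bar r_n$ and $\bar r_U$ for $\Omega_n$ and $\Omega_U$ respectively. From Lemma 3 we have that (w.p. $p$) $\bar r_n \le K\bar r$, where $K = \big(1 - \sqrt{M/n}\,\phi^{-1}(p)\big)^{-1}$ is independent of $\theta$. Moreover, $\bar r_U = K\bar r$. Since $x \in \Omega_n$, $r(x) \le \bar r_n \le \bar r_U$, and so $x \in \Omega_U$. This holds for any $\theta \in \Theta$ and thus for any $x \in \Omega_n$, which establishes b). □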
We now have all the results we need and shall proceed to
prove Theorem 1.
Proof of Theorem 1: We prove part a) here; the other part follows in a similar fashion. The feasible region of the original problem (2) is given by $\Omega \cap (X_0 \cap X)$. Similarly, $\Omega_n \cap (X_0 \cap X)$ defines the feasible set of the approximation problem (3). The literature on convex bodies (closed, compact, convex sets; cf. Schneider 1993) tells us that if (8) holds, then so does
$$\big(\Omega_n \cap (X_0 \cap X)\big) \subseteq \big(\Omega_U \cap (X_0 \cap X)\big). \qquad (9)$$
Let $x^*$ be an optimal solution to (2). Now, for any scalar constant $a > 0$, $c^t(ax^*) \ge c^t(ax)$ for all $x \in \Omega \cap X_0 \cap X$; i.e., $ax^*$ is also optimal for the objective $c^t x$ over $x \in a(\Omega \cap X_0 \cap X)$. Combining this with (9), we have that
$$c^t \hat x^*_n \le c^t x^* \cdot \left(1 - \sqrt{\tfrac{M}{n}}\,\phi^{-1}(p)\right)^{-1} = c^t x^* \cdot \left(1 + \sqrt{\tfrac{M}{n}}\,\phi^{-1}(p) + O\!\left(\tfrac{1}{n}\right)\right).$$
The final expression uses the expansion $(1-y)^{-1} = 1 + y + O(y^2)$, valid when $0 \le y < 1$, which is true for sufficiently large $n$. □
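The ray decomposition used in these proofs also suggests a simple way to solve small instances of (2) and (3): because both $C_\beta$ and its estimator are positive homogeneous, each direction $\theta \in \Theta$ contributes the best point on its ray, namely $r\theta$ with $r = \min\{1, b/C_\beta(\theta)\}$ when $C_\beta(\theta) > 0$. The sketch below is ours and rests on assumptions not made in the paper: $d = 2$, $X = \mathbb{R}^d$ so that only $X_0$ and the CVaR constraint are imposed, independent normal components (so that the exact $C_\beta$ is available for comparison), the empirical tail average as $\hat C_\beta$, and a finite grid over $\Theta$, so both optimal values are grid approximations. All function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

STD = NormalDist()


def exact_cvar(theta, mu, sd, beta):
    """C_beta(theta) for the loss -xi^t theta with independent normal xi_k."""
    mean_loss = -sum(m * t for m, t in zip(mu, theta))
    sd_loss = math.sqrt(sum((s * t) ** 2 for s, t in zip(sd, theta)))
    return mean_loss + sd_loss * STD.pdf(STD.inv_cdf(beta)) / (1.0 - beta)


def sample_cvar(theta, samples, beta):
    """Empirical CVaR (tail mean) of -xi^t theta over a fixed sample of xi."""
    ys = sorted(-sum(t * v for t, v in zip(theta, xi)) for xi in samples)
    k = max(1, math.ceil(len(ys) * beta))
    tail = ys[k - 1:]
    return sum(tail) / len(tail)


def ray_value(c, theta, cvar_theta, b):
    """Best objective on {r * theta}: budget r <= 1 and CVaR constraint r * C(theta) <= b."""
    r_max = 1.0 if cvar_theta <= 0.0 else min(1.0, b / cvar_theta)
    return max(0.0, sum(ci * ti for ci, ti in zip(c, theta))) * r_max


def solve_on_grid(c, b, cvar_of_theta, grid=101):
    """Approximate max c^t x over {x >= 0, sum x <= 1, C(x) <= b} for d = 2."""
    best = 0.0  # x = 0 is always feasible
    for i in range(grid):
        t = i / (grid - 1)
        theta = (t, 1.0 - t)
        best = max(best, ray_value(c, theta, cvar_of_theta(theta), b))
    return best
```

A usage sketch with made-up parameters: draw `samples = [tuple(rng.gauss(m, s) for m, s in zip(mu, sd)) for _ in range(4000)]` and compare `solve_on_grid(c, b, lambda th: sample_cvar(th, samples, beta))` with `solve_on_grid(c, b, lambda th: exact_cvar(th, mu, sd, beta))`. Reusing one sample for every direction (common random numbers) keeps $\hat C_\beta$ positive homogeneous across rays, as the estimator in (3) is.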
Remark 1
The arguments in this section primarily use the coherence properties of $C_\beta(\cdot)$ as described by Artzner et al. (1999), and so these results presumably hold if $C_\beta(\cdot)$ is replaced by other coherent measures.
Remark 2
This article treats only the case of a linear unidimensional function $g(\xi,x) = -\xi^t x$. The results above can be generalized to multidimensional linear settings. Nonlinear functions that satisfy certain properties might also be good candidates: for instance, functions that are convex and positive homogeneous, or Lipschitz continuous.
4 FUTURE DIRECTIONS
The constant $M$ that appears in the rate relation in Theorem 1 depends on the CVaR estimator used, via the CLT (4). This leads to the natural suggestion that an estimator can be designed with a lower value of $M$, or even one that minimizes it, which would in turn lead to better estimation of the true optimal value for the same sample size $n$. Finding such an estimator falls under the general purview of variance reduction, with the added twist that $M$ is defined as the maximum squared coefficient of variation over $\Theta$. Whether this demands special attention when constructing variance-reduction estimators, as opposed to applying standard techniques, is not clear, but the question merits further investigation.
REFERENCES
Acerbi, C., and D. Tasche. 2002. On the coherence of expected shortfall. Journal of Banking and Finance 26:1487–1503.
Artzner, P., F. Delbaen, J. M. Eber, and D. Heath. 1999. Coherent measures of risk. Mathematical Finance 9:203–228.
Glasserman, P., P. Heidelberger, and P. Shahabuddin. 2000. Variance reduction techniques for estimating Value-at-Risk. Management Science 46:1349–1364.
Glasserman, P., P. Heidelberger, and P. Shahabuddin. 2002. Portfolio value-at-risk with heavy-tailed risk factors. Mathematical Finance 12:239–269.
Henderson, S. G. 2007. Mathematics for simulation. In Handbooks in Operations Research and Management Science: Simulation, ed. B. L. Nelson and S. G. Henderson, 19–54. Amsterdam: Elsevier Science.
Juneja, S., and P. Shahabuddin. 2007. Rare event simulation techniques: An introduction and recent advances. In Handbooks in Operations Research and Management Science: Simulation, ed. B. L. Nelson and S. G. Henderson, 291–350. Amsterdam: Elsevier Science.
Markowitz, H. M. 1952. Portfolio selection. Journal of
Finance 7:77–91.
Pflug, G. 2000. Some remarks on the value-at-risk and conditional value-at-risk. In Probabilistic Constrained Optimization: Methodology and Applications, ed. S. Uryasev. Dordrecht: Kluwer.
RiskMetrics™. 1996. Technical document, 4th edition. J. P. Morgan.
Rockafellar, R. T., and S. Uryasev. 2000. Optimization of conditional value-at-risk. Journal of Risk 2:21–41.
Rockafellar, R. T., and S. Uryasev. 2002. Conditional Value-at-Risk for general loss distributions. Journal of Banking and Finance 26:1443–1471.
Schneider, R. 1993. Convex surfaces, curvature and surface
area measures. In Handbook of Convex Geometry, ed.
P. M. Gruber and J. M. Wills, Volume A, 273–299.
Amsterdam: Elsevier Science.
Wang, W., and S. Ahmed. 2007. Sample average approximation of expected value constrained stochastic programs. Technical report, submitted for publication.
AUTHOR BIOGRAPHIES
SOUMYADIP GHOSH is a Research Staff Member of the Mathematical Sciences Division at IBM T.J. Watson Research Center, Yorktown Heights, NY. His simulation research interests include theory and practice, in particular input dependence modeling, risk modeling, and simulation-based optimization techniques. His other interests are in supply-chain analysis and the queueing-theory-based scheduling of large-scale production systems. He can be contacted at <ghoshs@us.ibm.com>.