Capacity Management in Rental Businesses
with Two Customer Bases
Sergei V. Savin, Morris A. Cohen, Noah Gans, Ziv Katalan
Department of Operations and Information Management,
The Wharton School, University of Pennsylvania,
Philadelphia, PA 19104
May 27, 2003
Abstract
We consider the allocation of capacity in a system in which rental equipment is accessed
by two classes of customers. We formulate the problem as a continuous-time analogue of the
one-shot allocation problems found in the more traditional literature on revenue management,
and we analyze a queueing control model that approximates its dynamics. Our investigation
yields three sets of results.
First, we use dynamic programming to characterize properties of optimal capacity alloca-
tion policies. We identify conditions under which “complete sharing” – in which both classes
of customer have unlimited access to the rental fleet – is optimal.
Next, we develop a computationally efficient “aggregate threshold” heuristic that is based
on a fluid approximation of the original stochastic model. We obtain closed-form expressions
for the heuristic’s control parameters and show that the heuristic performs well in numer-
ical experiments. The closed-form expressions also show that, in the context of the fluid
approximation, revenues are concave and increasing in the fleet size.
Finally, we consider the effect of the ability to allocate capacity on optimal fleet size.
We show that the optimal fleet size under allocation policies may be lower, the same as,
or higher than that under complete sharing. As capacity costs increase, allocation policies
allow for larger relative fleet sizes. Numerical results show that, even in cases in which dollar
profits under complete sharing may be close to those under allocation policies, the capacity
reductions enabled by allocation schemes can help to lift profit margins significantly.
Keywords: Service Systems, Queueing Control, Stochastic Knapsack, Fluid Models.
1 Introduction
Rental businesses found in many sectors of the economy share some fundamental attributes.
The rental company invests in equipment for which there is a potential demand, and a stream
of customers patronizes the company, renting its equipment. After each rental, the equipment is
returned to the company, and rental durations are typically significantly shorter than the life of
the equipment, so that each unit may be used repeatedly.
For those who manage rental businesses, important managerial decisions focus on matching
rental demand with the equipment supply. These decisions create a hierarchy of managerial
controls at the company’s disposal. Longer term, capital-investment decisions set the company’s
overall level of rental capacity and attempt to capture as much demand for rental services as
is (marginally) profitable. While they provide for long-term matching between supply and de-
mand, fleet sizing decisions may not be used to counterbalance short-term supply and demand
mismatches. On a tactical time scale, capacity allocation decisions may be needed to determine
which customers are served when rental capacity becomes scarce.
In this paper we consider a simple, stationary model of a rental problem in which capacity
must be rationed among two classes of arriving customers. We address both the lower-level,
allocation problem and the higher-level capacity sizing problems, with an emphasis on the former.
Our approach to the tactical allocation problem follows in the spirit of early formulations of
seat allocation problems in the airline yield-management literature. (For example, see Littlewood
(1972), Alstrup et al. (1986), and Belobaba (1989).) When should arriving customers of each of
the classes be allowed to rent equipment, and when would they be “closed out?”
Two common assumptions made in traditional revenue management models make them in-
adequate for our purposes, however: they assume that there exists a finite horizon over which
units of capacity can be sold and that each unit of capacity can be used only once. For example,
in aviation there are c seats on a flight, and once they are sold or the plane takes off they are
not available for sale.
While hotel problems could (and perhaps should) in principle be formulated as rental prob-
lems, most academic literature only addresses the problem of allocating the rooms available
on a single night. (For example, see Rothstein (1974), Ladany (1977), Williams (1977), Liber-
man and Yechiali (1978), Bitran and Gilbert (1996).) An exception is the application of linear-
programming (LP) based “bid price” controls to hotel stays. (See Williamson (1992) and Weath-
erford (1995)). In this case, multiple nights are considered, but the problem is modeled as
deterministic.
In rental businesses, however, the problem is most naturally treated as a problem in dynamic
and stochastic control. An arriving customer rents a unit, which becomes unavailable for the
length of the person’s rental. When the rental period ends, the unit becomes available again.
Over any short period of time, the numbers of arriving and departing customers may be uncertain,
and managers must develop effective policies for controlling the rental of system capacity.
We view the allocation of rental capacity as a continuous time, infinite horizon problem
in which arrivals of customers and durations of rentals are both uncertain. We formulate this
problem as one of admission control to a multiple-server loss system. We assume that, if admitted
into service, a customer pays a daily rental fee which depends on the class to which she belongs.
If the rental request is rejected then a class-dependent, lump-sum penalty is incurred. We show
that this capacity allocation problem can be reduced to a special case of the stochastic knapsack
problem introduced in the telecommunications literature (Ross and Tsang (1989)), one in which
arriving “objects” (demands) are all of size one.
We note that this formulation does not capture the use of prior information on rental duration.
In some contexts, such as truck-trailer leasing (the application that originally motivated this
paper) and storage-locker rentals, this information may not be available. In others, such as hotel
systems, customer-stated projections of expected duration are readily available and can be of
great value in improving the effectiveness of capacity allocation decisions. Thus, our approach
has important limits.
Nevertheless, the simplicity of our approach allows us to make a number of contributions:
1. We demonstrate that the allocation problem with lump-sum penalties can be reduced to one
with no penalties by appropriately adjusting the values of the rental fees. The adjustment
factors are proportional to the penalty values and the service rates.
2. We characterize two conditions under which the complete sharing policy that is often used
in practice is optimal: the first is in the “off-season,” when the overall demand for service
is low relative to capacity; the second is in the “peak season” of high demand, given that
different customer classes are sufficiently similar.
3. We analyze a fluid approximation to the original system, and we derive closed-form expres-
sions that characterize the controls and the performance obtained when allocating capacity
using an “aggregate threshold” policy. These expressions allow us to efficiently calculate
admission thresholds that appear to perform well in the original, stochastic model.
4. Closed-form expressions for the fluid model also allow us to demonstrate the concavity of
the fluid model’s revenues with respect to the fleet size when the aggregate threshold policy
is used. This concavity is the essential property required for the efficient solution of the
related, long-term problem of capacity sizing.
5. We show that, in the presence of capacity rationing, the optimal fleet size can be either
higher or lower than that obtained when no rationing is employed. The relationship between
the two fleet sizes varies systematically with the cost of capacity.
6. We present numerical experiments that highlight the potential benefit of jointly optimizing
fleet size and tactical controls. In particular, there appear to be cases in which the sub-
optimal use of complete sharing results in near-optimal dollar profits. Even in these cases,
however, the return on investment in capacity suffers significantly.
More broadly, these numerical results complement our characterization of sufficient conditions
for the optimality of complete sharing policies. Complete sharing policies maximize physical
measures of system utilization. When complete sharing is optimal, this physical measure of
system utilization is a good proxy for economic utilization. When complete sharing is not optimal,
however, its use can degrade profit margins and, by extension, economic measures of resource
efficiency, such as return on assets. In this case, physical and economic measures of efficiency do
not coincide.
Thus, within the context of the stationary problem developed in this paper, we are able to
characterize how the use of tactical controls affects longer-term decisions regarding fleet size, as
well as longer-term and economic efficiency. While a complete analysis of the problem, which
should account for seasonal changes in demand patterns, is beyond the scope of this paper, our
current results represent a promising first step.
Finally, we note that our analysis and results complement that of two recent papers that
have independently considered the stochastic knapsack. Our analysis parallels that of Altman
et al. (2001), which uses dynamic programming techniques to study optimal capacity allocation
rules and develops and solves (numerically) a fluid approximation to the problem. Our special
problem structure, however, allows us to more fully characterize properties of optimal and heuris-
tic admission controls. We are able to develop a number of additional useful structural results
concerning optimal policies and to develop precise, closed-form characterizations in the context
of fluid control. Örmeci et al. (2001) also uses dynamic programming techniques to develop
similar characterizations of structural properties of the optimal policy. It does not, however,
consider heuristic controls. Neither of these papers considers how the use of tactical controls
affects longer-term fleet-sizing decisions.
The remainder of the paper is organized as follows. In the next section we formulate and ana-
lyze the capacity allocation problem and demonstrate how the problem with lump-sum penalties
can be reduced to one without penalties. We also discuss properties of optimal capacity allo-
cation policies and establish conditions for the optimality of the complete sharing policy. In
Section 3 we introduce a heuristic aggregate threshold policy based on a fluid-model version of
our system, and we compare the performance of this heuristic to that of the optimal policy. In
Section 4, we investigate the interaction between capacity sizing and capacity allocation problems
and establish how optimal fleet capacity changes in the presence of capacity rationing. We then
conclude with a discussion of the results and describe open issues and worthwhile extensions. All
proofs may be found in the Appendix.
2 The Capacity Allocation Problem
In this section we analyze the capacity allocation decision. We formulate it as a problem in the
control of queues, and we use dynamic programming techniques to investigate properties of the
optimal control policies.
2.1 Model Description
Consider a fleet of $c$ identical vehicles or pieces of rental equipment accessed by two customer classes whose arrival processes are independent and Poisson with intensities $\lambda_1$ and $\lambda_2$. Let the durations of their rentals be independent, exponentially distributed random variables of mean $\mu_1^{-1}$ and $\mu_2^{-1}$. Suppose, further, that each arrival wishes to rent exactly one unit of capacity.
At each arrival epoch a system controller, such as the manager of the rental location, can
decide whether or not to admit an arriving customer for service – if one of the c units of capacity
is free – or to reject the arrival. Arrivals that are admitted to service are permitted to complete
the duration of their (randomly distributed) rental periods uninterrupted. Rejected customers
do not queue; they exit the system. Similarly, customers that arrive when all c units of capacity
are rented are lost.
Rewards and penalties associated with the system state and action are as follows. Arrivals that are admitted to service pay respective rental fees of $a_1$ and $a_2$ per unit of time. When a customer's rental request is denied – either due to the absence of available rental capacity or because of the particular capacity allocation policy used – a lump-sum penalty of $\pi_1$ or $\pi_2$ is incurred, depending on the customer's class. (For more on rejection penalties and their relationship to service-level constraints, please see Appendix A.)
The assumption that interarrival and service times are exponentially distributed implies that,
at times between these event epochs, the system evolves as a continuous time Markov chain. At
these times, the system state can be completely described by the numbers of class-1 and class-2
customers renting units. Furthermore, system control – in the form of acceptance or rejection
of an arriving customer – is exercised only at arrival epochs, and it is sufficient to consider only
the discrete-time process embedded at arrival and departure epochs when determining the form
of effective system controls (see Chapter 11 in Puterman (1994)). That is, the system can be
modeled as a discrete time Markov Decision Process (MDP).
In Appendix B we formally define discounted and average-cost versions of this MDP. For
both cases, we also indicate why there exist stationary, deterministic policies that are optimal.
Therefore, we will only consider policies of this class. Furthermore, rather than directly analyze
the MDPs’ objective functions, we use well-known results concerning the convergence of the
value-iteration procedure to analyze the problems.
2.2 Value Iteration Formulation
We begin our definition of the value iteration procedure by "uniformizing" the system. (See Lippman (1975) and Serfozo (1979).) Formally, we let $\Gamma = \lambda_1 + \lambda_2 + c\mu_1 + c\mu_2$ and, for the discounted problem with a continuous-time discount rate of $\alpha$, we uniformize the system at rate $\alpha + \Gamma$.

Without loss of generality, we can define the time unit so that $\alpha + \Gamma = 1$. Thus, $\lambda_i \equiv \frac{\lambda_i}{\alpha+\Gamma}$ and $\mu_i \equiv \frac{\mu_i}{\alpha+\Gamma}$ become, respectively, the probability that the next uniformized transition is a type-$i$ arrival or service completion. Similarly, $a_i \equiv \frac{a_i}{\alpha+\Gamma}$ is the expected discounted revenue per type-$i$ rental until the time of the next uniformized transition.
Note that the uniformization rate includes the discount factor, $\alpha$. In fact, it is well known that discounting at rate $\alpha$ is equivalent to including a constant intensity at which the process terminates, after which no more profits will be earned. Thus, one may think of $\alpha$ as the per-period probability that the next transition is a terminating one. (For example, see Section 5.3 in Puterman (1994).)
The rate also includes rental completions of "phantom" customers. For example, if the current system state is $(k_1, k_2)$, then the probability that one of the $(c - k_1)$ phantom type-1 customers or $(c - k_2)$ phantom type-2 customers completes a rental is $(c - k_1)\mu_1 + (c - k_2)\mu_2$. At the end of such a phantom rental, the observed state remains the same, $(k_1, k_2)$.
Given these uniformized system parameters, we define the value-iteration operator $T$ as
$$\begin{aligned} Tf(k_1,k_2) = {} & a_1 k_1 + a_2 k_2 + \lambda_1 H_1[f(k_1,k_2)] + \lambda_2 H_2[f(k_1,k_2)] \\ & + \mu_1 k_1\, f(k_1-1,k_2) + \mu_2 k_2\, f(k_1,k_2-1) \\ & + \left((\mu_1+\mu_2)c - \mu_1 k_1 - \mu_2 k_2\right) f(k_1,k_2). \end{aligned} \qquad (1)$$
The heart of the procedure is carried out via the maximizations
$$H_1[f(k_1,k_2)] = \begin{cases} \max\left[f(k_1,k_2) - \pi_1,\; f(k_1+1,k_2)\right], & \text{when } k_1 + k_2 < c, \\ f(k_1,k_2) - \pi_1, & \text{when } k_1 + k_2 = c, \end{cases} \qquad (2)$$
and
$$H_2[f(k_1,k_2)] = \begin{cases} \max\left[f(k_1,k_2) - \pi_2,\; f(k_1,k_2+1)\right], & \text{when } k_1 + k_2 < c, \\ f(k_1,k_2) - \pi_2, & \text{when } k_1 + k_2 = c, \end{cases} \qquad (3)$$
which are specified for any function $f$ defined on the state space $S = \{(k_1,k_2) \in \mathbb{Z}^2 \mid k_1 \geq 0,\ k_2 \geq 0,\ k_1 + k_2 \leq c\}$.
Let $v_0(k_1,k_2) \equiv 0$ represent an initial estimate of the optimal expected discounted profit, and let $v_n$ represent the estimated value after $n$ iterations of the value-iteration algorithm:
$$\begin{aligned} v_n(k_1,k_2) = {} & a_1 k_1 + a_2 k_2 + \lambda_1 H_1[v_{n-1}(k_1,k_2)] + \lambda_2 H_2[v_{n-1}(k_1,k_2)] \\ & + \mu_1 k_1\, v_{n-1}(k_1-1,k_2) + \mu_2 k_2\, v_{n-1}(k_1,k_2-1) \\ & + \left((\mu_1+\mu_2)c - \mu_1 k_1 - \mu_2 k_2\right) v_{n-1}(k_1,k_2). \end{aligned} \qquad (4)$$
Then the fact that
$$\lambda_1 + \lambda_2 + (\mu_1 + \mu_2)c < 1 \qquad (5)$$
for $\alpha > 0$ ensures that $T$ is a contraction operator and that $\{v_n\}$ converges to the optimal "value function"
$$\begin{aligned} v(k_1,k_2) = {} & a_1 k_1 + a_2 k_2 + \lambda_1 H_1[v(k_1,k_2)] + \lambda_2 H_2[v(k_1,k_2)] \\ & + \mu_1 k_1\, v(k_1-1,k_2) + \mu_2 k_2\, v(k_1,k_2-1) \\ & + \left((\mu_1+\mu_2)c - \mu_1 k_1 - \mu_2 k_2\right) v(k_1,k_2), \end{aligned} \qquad (6)$$
whose value equals that of the MDP’s optimal objective function (see Porteus (1982)).
The first two terms on the right-hand side of (6) represent the expected discounted revenue
earned until the next uniformized transition. The following four represent the probabilities and
associated profits-to-go associated with system arrivals and service completions. The last term
represents the probability and profit-to-go of a “phantom” rental completion. (Without loss of
generality, we omit the probability, α, and value, 0, associated with a terminating transition.)
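The uniformized recursion (1)-(4) is straightforward to implement directly. The following is a minimal Python sketch of one way to do so; it is our own illustration, not the authors' implementation, and the parameter values are illustrative rather than taken from the paper's examples.

    import numpy as np

    # Illustrative parameters (not the paper's): two classes, a fleet of c units.
    lam   = {1: 25.0, 2: 10.0}   # arrival rates
    mu    = {1: 5.0,  2: 1.0}    # service rates
    a     = {1: 10.0, 2: 5.0}    # rental fees per unit time
    pi    = {1: 0.0,  2: 0.0}    # lump-sum rejection penalties
    c, alpha = 10, 1.0           # fleet size, continuous-time discount rate

    # Uniformize and rescale the time unit so that alpha + Gamma = 1.
    Gamma = lam[1] + lam[2] + c * (mu[1] + mu[2])
    scale = 1.0 / (alpha + Gamma)
    lam = {i: lam[i] * scale for i in (1, 2)}
    mu  = {i: mu[i] * scale for i in (1, 2)}
    a   = {i: a[i] * scale for i in (1, 2)}

    def T(f):
        """One application of the value-iteration operator (1), using the
        admission maximizations H_1 and H_2 of (2)-(3)."""
        g = np.zeros_like(f)
        for k1 in range(c + 1):
            for k2 in range(c + 1 - k1):
                if k1 + k2 < c:
                    H1 = max(f[k1, k2] - pi[1], f[k1 + 1, k2])
                    H2 = max(f[k1, k2] - pi[2], f[k1, k2 + 1])
                else:
                    H1 = f[k1, k2] - pi[1]
                    H2 = f[k1, k2] - pi[2]
                g[k1, k2] = (a[1] * k1 + a[2] * k2
                             + lam[1] * H1 + lam[2] * H2
                             + (mu[1] * k1 * f[k1 - 1, k2] if k1 > 0 else 0.0)
                             + (mu[2] * k2 * f[k1, k2 - 1] if k2 > 0 else 0.0)
                             + ((mu[1] + mu[2]) * c - mu[1] * k1 - mu[2] * k2) * f[k1, k2])
        return g

    v = np.zeros((c + 1, c + 1))          # v_0 = 0
    for n in range(5000):                  # value iteration until convergence
        v_new = T(v)
        if np.max(np.abs(v_new - v)) < 1e-8:
            break
        v = v_new
    print("estimated optimal discounted profit from the empty system:", v[0, 0])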
If no rejection penalties are used ($\pi_1 = \pi_2 = 0$), then (6) directly reduces to the stochastic knapsack problem, well known from the telecommunications literature (Ross and Tsang (1989)). Furthermore, for any given rental fees and penalty values $(a_1, a_2, \pi_1, \pi_2)$, there exists an equivalent stochastic knapsack formulation with adjusted rental fees: $(\bar a_1, \bar a_2, 0, 0)$.
Theorem 1
For any problem with rewards and penalties $(a_1, a_2, \pi_1, \pi_2)$ and optimal value function $v(k_1,k_2)$, there exists an alternative formulation with rewards
$$\bar a_i = a_i + \pi_i(\mu_i + \alpha), \quad i = 1, 2, \qquad (7)$$
zero penalties, and optimal value function $\bar v(k_1,k_2)$ for which
$$\bar v(k_1,k_2) = v(k_1,k_2) + \frac{\lambda_1 \pi_1}{\alpha} + \frac{\lambda_2 \pi_2}{\alpha} + \pi_1 k_1 + \pi_2 k_2. \qquad (8)$$
Furthermore, a policy is optimal for the original problem if and only if it is optimal for the transformed problem with adjusted revenues and zero penalties.
Therefore, in the analysis that follows we will consider only the transformed problem $\bar v_n(k_1,k_2)$ with adjusted fees $(\bar a_1, \bar a_2)$. Observe that the adjustment factors are linear in the penalty values and the service rates.

We note that the paper's numerical results are performed using an average-cost MDP formulation. (Because they do not depend on the starting state, "average-cost" results are easier than discounted results to interpret.) In this case, a similar result holds, with
$$\bar a_i = a_i + \mu_i \pi_i, \quad i = 1, 2. \qquad (9)$$
For a formal development of the value iteration procedure and the analogue of Theorem 1 for the average cost problem, please see Appendix C.
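As a small concrete illustration (ours, not the paper's), the fee adjustments of (7) and (9) can be written as a one-line helper:

    def adjusted_fee(a, pi, mu, alpha=None):
        """Penalty-adjusted rental fee: eq. (7) for the discounted problem when
        alpha is supplied, eq. (9) for the average-cost problem otherwise."""
        return a + pi * (mu + alpha) if alpha is not None else a + mu * pi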
2.3 Optimality of switching-curve policies
To establish structural properties of the optimal control policy, it is sufficient to show that certain properties of the functions defined on $S$ are preserved under the action of the value iteration operator, $T$ (see Porteus (1982)). In particular, we are interested in submodularity. We say that $f(k_1,k_2)$ is submodular in $k_1$ and $k_2$ if
$$f(k_1+1,k_2+1) - f(k_1,k_2+1) \leq f(k_1+1,k_2) - f(k_1,k_2), \quad k_1 + k_2 + 2 \leq c. \qquad (10)$$
Let $F$ be the set of all $f$ defined on $S$ that are submodular in $k_1$ and $k_2$.
The following theorem states that $F$ is closed under $T$, so that the value iteration operator preserves submodularity of the value function. This, in turn, implies that the optimal capacity allocation policy is of a special form: it is a "switching curve" policy.

Theorem 2 (Altman et al. (2001); Örmeci et al. (2001); Savin (2001))
a) $f \in F \Rightarrow Tf \in F$, and therefore $v(k_1,k_2) \in F$.
b) In turn, for each $k_1$ it is optimal to admit customers of class 1 when in state $(k_1,k_2)$ if and only if $k_2 < k_2^{\min}(k_1)$, where
$$k_2^{\min}(k_1) = \begin{cases} c - k_1, & \text{if } v(k_1+1, c-k_1-1) > v(k_1, c-k_1-1), \\ \min\left(k_2 : 0 \leq k_2 \leq c-k_1-1,\ v(k_1+1,k_2) \leq v(k_1,k_2)\right), & \text{otherwise}. \end{cases}$$
Similarly, for each $k_2$ it is optimal to admit customers of class 2 when in state $(k_1,k_2)$ if and only if $k_1 < k_1^{\min}(k_2)$, where
$$k_1^{\min}(k_2) = \begin{cases} c - k_2, & \text{if } v(c-k_2-1, k_2+1) > v(c-k_2-1, k_2), \\ \min\left(k_1 : 0 \leq k_1 \leq c-k_2-1,\ v(k_1,k_2+1) \leq v(k_1,k_2)\right), & \text{otherwise}. \end{cases}$$

Part b) of the theorem can be interpreted as follows: when a given number of customers of a particular class is already renting equipment, the "next" customer of the same class is admitted if and only if the number of customers of the other class present in the system does not exceed some critical value. This is a switching-curve policy, characterized by $c$ critical indices for each of the customer classes.
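Given a computed value function, the switching-curve indices of Theorem 2(b) are easy to read off. The following is a minimal sketch (ours); it assumes a two-dimensional array v holding the transformed-problem value function over the state space, for example the output of a value-iteration computation such as the sketch above, and it assumes k1 < c.

    def k2_min(v, c, k1):
        """Switching-curve index of Theorem 2(b): with k1 class-1 customers in
        service, an arriving class-1 customer is admitted iff k2 < k2_min(k1)."""
        if v[k1 + 1, c - k1 - 1] > v[k1, c - k1 - 1]:
            return c - k1
        return min(k2 for k2 in range(c - k1)
                   if v[k1 + 1, k2] <= v[k1, k2])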
For the average cost case we can develop analogous results. At every pass of the value iteration
procedure, the operator preserves the submodularity of the estimate of the value function. This
ensures that the results of Theorem 2 apply to the optimal control policy for this case as well.
As an illustration of the optimal capacity allocation policies we consider an example with $a_1 = 10$, $a_2 = 5$, $\lambda_1 = 25$, $\lambda_2 = 10$, $\mu_1 = 5$, $\mu_2 = 1$, $c = 10$ for the case when average revenue per period is maximized. Figure 1 describes the capacity allocation decisions for class 2 customers and illustrates the notion of the "switching curve."

Figure 1: The optimal capacity allocation policy for class 2 customers when the average adjusted revenue per period is maximized ($a_1 = 10$, $a_2 = 5$, $\lambda_1 = 25$, $\lambda_2 = 10$, $\mu_1 = 5$, $\mu_2 = 1$, $c = 10$).
One feature of this example worth noting is the following: class 1 customers are always allowed to rent equipment, i.e., $k_2^{\min}(k_1) = c - k_1$ for all feasible $k_1$ (and so we did not include the graph of optimal allocation for class 1). In this case, we say that class 1 customers are a preferred class. While in every numerical example we tested there existed a preferred class, we have not been able to prove that such a class exists universally. Nevertheless, we have been able to characterize a great deal about preferred customer classes.
2.4 Preferred classes and the optimality of the complete sharing policy
In this section we investigate the conditions which make a particular customer class a preferred
one. Closely connected to the question about the nature of preferred classes is the issue of
the optimality of the complete sharing policy: complete sharing is optimal when both customer
classes are preferred. The following theorem provides sufficient conditions under which one – or
both – classes may be preferred.
Theorem 3
a) Define $\lambda = \lambda_1 + \lambda_2$, $\mu = \min(\mu_1, \mu_2)$, $a = \max(a_1, a_2)$ and
$$c_i^* = 2 + \frac{\lambda}{\mu}\,\frac{a}{a_i}\,\frac{\mu_i + \alpha}{\mu_i}\left[6 + 4\,\frac{\lambda + 2\mu}{\mu_i} + \frac{\mu_i}{\mu + \alpha}\left(2 + \frac{\lambda + 2\mu}{\mu_i + \alpha}\right) - 1\right], \quad i = 1, 2. \qquad (11)$$
Then for systems with capacity $c > c_i^*$, it is always optimal to admit class $i$ customers, $i = 1, 2$.

b) In turn, for $c \geq \max(c_1^*, c_2^*)$ the policy of complete sharing of the service fleet is optimal.
Theorem 3 provides a lower bound on the amount of capacity sufficient to ensure that a
particular customer class (or both classes) has unrestricted access to the available equipment.
Of course, for profit-maximizing firms, capacity costs may prevent c from becoming large enough
to optimally operate in the complete-sharing regime. In Section 4 we investigate the interaction
among capacity cost, fleet size, and tactical control in more detail.
We note that for each customer class this lower bound is, as expected, a non-increasing function of the penalty-adjusted fee paid by customers of this class. We observe that in the simple case of $\mu_1 = \mu_2 \gg \alpha$, (11) implies that $c_1^*, c_2^* \gg \lambda/\mu$. Thus, in the presence of seasonal demand patterns, these results describe the "off-peak" season when the demand for rentals may be significantly lower than the available capacity.
Note that Theorem 3 is stronger than a limiting statement. In general, it is not hard to
imagine that as c → +∞, a complete sharing policy will be asymptotically optimal. Theorem
3, however, says that there is a fixed, finite c above which complete sharing is optimal. This is
because, as more and more pieces of equipment are rented, the probability that the next event
is a service completion, rather than an arrival, grows. Thus, the busier the system, the stronger
its drift toward emptying out. For large enough c the expected loss of revenue due to blocking
becomes small when compared to the immediate gain of taking the next customer, no matter
which class she belongs to.
Theorem 3 states that for sufficiently high service capacity the complete sharing policy is optimal. It is also possible to show that complete sharing is optimal even in the "peak season", when capacity is tight, provided that the customer classes are similar in terms of their penalty-adjusted rental fees:

Theorem 4
For either class $i \in \{1, 2\}$ and $j \neq i$, if
$$\frac{a_i}{\max[\mu_i, \mu_j]} \geq \frac{\lambda_j}{\lambda_j + \mu_i}\,\frac{a_j}{\mu_j}, \qquad (12)$$
then it is always optimal to admit type $i$ customers.

The statement of Theorem 4 is intuitively appealing: all other parameters of the problem being fixed, there exists a minimum value of the adjusted rental fee $a_i$ which ensures that customers of this class should be freely admitted into the system. Complete sharing of the service fleet is optimal when (12) is satisfied for both classes, i.e., when $a_1$ and $a_2$ are "close".
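Condition (12) is simple to check numerically. The sketch below is our own illustration with made-up parameter values (chosen so that both classes satisfy the condition, in which case complete sharing is optimal); here a, lam, and mu are dictionaries of penalty-adjusted fees, arrival rates, and service rates keyed by class.

    def class_preferred(a, lam, mu, i, j):
        """Sufficient condition (12) for class i to be preferred, i.e. for it
        always to be optimal to admit class-i customers."""
        lhs = a[i] / max(mu[i], mu[j])
        rhs = (lam[j] / (lam[j] + mu[i])) * (a[j] / mu[j])
        return lhs >= rhs

    # Two classes with "close" adjusted fees: (12) holds for both classes.
    a = {1: 10.0, 2: 9.5}; lam = {1: 10.0, 2: 10.0}; mu = {1: 1.0, 2: 1.0}
    print(class_preferred(a, lam, mu, 1, 2), class_preferred(a, lam, mu, 2, 1))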
Furthermore, recall that $\bar a_i = a_i + \pi_i(\mu_i + \alpha)$ depends on both the revenue earned when accepting a class $i$ customer and the penalty paid when rejecting class $i$ demand. That is, a preferred customer may be profitable to serve, unprofitable not to serve, or some combination of the two. For example, a high-volume customer, such as a national account, may receive a favorable rental rate in return for a large stream of rentals. At the same time, contractual service-level requirements or the customer's market power may imply a large rejection penalty, so that class $i$ arrivals become VIPs. (For more on the relationship between service-level constraints and rejection penalties, see Appendix A.)

The sufficient conditions of Theorem 4 are direct analogues to expressions for protection levels in airline seat allocation models. (For example, see Belobaba (1989).) Both sets of inequalities can be interpreted in terms of simple marginal analysis. For instance, for $i = 1$, the right hand side of (12) describes (a bound on) the expected cost of admitting an arriving class 1 customer. It is the expected revenue lost from a blocked class 2 customer that might have been served. Here $\frac{\lambda_2}{\lambda_2 + \mu_1}$ is the probability that a class 2 customer arrives before the admitted class 1 customer finishes service, and $\frac{a_2}{\mu_2}$ is the expected revenue lost, given that the blocking occurs.
In fact, Örmeci et al. (2001) develops a characterization of preferred classes that mirrors this "marginal analysis" result. The left hand side of (12) is more complex – and more stringent – than simply $\frac{a_1}{\mu_1}$, however. This difference better reflects the more complex dynamics of our system.
Observe that there exists a broad range of circumstances under which a class of customers may be preferred. First, note that if $a_i > a_j$ and $a_i/\mu_i \geq a_j/\mu_j$, then type-$i$ customers have higher penalty-adjusted rental rates and higher expected rental durations – and they are preferred. Second, even though $a_j/\mu_j \leq a_i/\mu_i$, type-$j$ customers may also be preferred, as long as $a_j$ is not too far below $a_i$.
Conversely, it is possible to construct examples in which neither of the sufficient conditions of Theorem 4 is satisfied. This occurs when $a_i > a_j$, $\mu_i > \mu_j$, and $a_i/\mu_i < a_j/\mu_j$. Of course, failure
to satisfy the sufficient conditions does not demonstrate that there exists no preferred class.
Finally, we note that the conditions of Theorem 4 are broadly applicable in that they do
not depend on the service capacity, c, or on the intensity of arrivals of the customer class being
considered for admission. The required parameters are simple to estimate from observable data,
and the results are simple to interpret.
3 Heuristic Capacity Allocation Policies
In general, it is optimal to base the control of admissions into the service on the numbers of
customers of both classes 1 and 2 that are in the system at the time each control decision is
made. In practice, however, these “vector” policies may be difficult to implement, especially for
rental systems with large capacities.
Admission control decisions that are based on the value of a particular scalar metric derived
from this vector state, rather than the detailed state of the system, may also provide effective (if
suboptimal) controls. One of the most widely used heuristics is the aggregate threshold (trunk
reservation) policy.
The aggregate threshold (AT) policy assumes that there exists a preferred customer class,
and it is the class that offers higher revenue per unit of time. The AT policy admits second-class
customers as long as the total number of customers already in the system does not exceed some
critical threshold value.
Besides being intuitively appealing, aggregate threshold policies have been proven to be
optimal whenever $\mu_1 = \mu_2$
(see Miller (1969)). More generally, we expect them to perform well
in cases when the expected service times for different customer classes are similar.
Figure 2 illustrates the best AT policy, as well as the optimal control policy, for the same
example shown in Figure 1. While the control exercised by the AT policy differs from that of the
optimal policy, the revenues it generates are nearly optimal, falling below optimality by about
0.15%.
Figure 2: The optimal and the best AT policies for class 2 customers ($a_1 = 10$, $a_2 = 5$, $\lambda_1 = 25$, $\lambda_2 = 10$, $\mu_1 = 5$, $\mu_2 = 1$, $c = 10$).
AT policies, however, do not yield closed-form expressions for system performance measures.
In general, the task of computing the value of the best aggregate threshold level can be compa-
rable in its complexity to the task of computing the optimal control policy.
Ideally, we would like to have a policy that combines ease of calculation with the robust
performance of AT controls. In the following section we develop such a heuristic. It uses a fluid-
model approximation of the stochastic model to derive closed-form expressions for the aggregate
threshold values.
3.1 Fluid models and scaling
In many practical situations, both the size of the rental fleet $c$ and the offered rental intensities $\rho_1 = \frac{\lambda_1}{\mu_1}$ and $\rho_2 = \frac{\lambda_2}{\mu_2}$ are large. Under these conditions a deterministic fluid model may offer a good approximation to the original control problem. Indeed, Altman et al. (2001) offer a heuristic derivation of such a fluid model as the limit of a linearly scaled sequence of MDPs, and they numerically evaluate the resulting Hamilton-Jacobi-Bellman equations.
We follow the approach of Altman et al. (2001), but given the underlying structure of our
problem, in which there are two classes of customers, we can directly analyze the trajectory of
the fluid system. This allows us to develop an aggregate threshold heuristic whose performance is
robust and whose closed-form expressions allow for immediate calculation of policy parameters.
Furthermore, our analysis also allows us to demonstrate the concavity of discounted revenues
(of a “µ-scaled” version of our model), with respect to the fleet size, c, a property that becomes
important in the capacity-sizing analysis of Section 4.
We start by defining the state space and dynamics for fluid approximations (in general). Time $t$ is continuous, and the state parameters $k_1(t)$ and $k_2(t)$ of the original model become continuous state variables, restricted to the set $S_f = \{k_1(t) \geq 0,\ k_2(t) \geq 0,\ k_1(t) + k_2(t) \leq c\}$. Poisson customer arrivals are replaced by deterministic continuous "flow" arrivals with intensities $\lambda_1$ and $\lambda_2$. The departure process becomes deterministic as well: for the state $(k_1(t), k_2(t))$ it is represented by an outflow at rate $\mu_1 k_1(t) + \mu_2 k_2(t)$.
Arrivals are controlled as follows: at time $t$, a control policy $(u_1(t), u_2(t))$ results in the total customer inflow of $u_1(t)\lambda_1 + u_2(t)\lambda_2$. Thus, for the control trajectories $(u_1(t), u_2(t))$ ($0 \leq u_i(t) \leq 1$, $i = 1, 2$) the Kolmogorov evolution equations for the original system are replaced by
$$\frac{dk_1(t)}{dt} = u_1(t)\lambda_1 - \mu_1 k_1(t) \quad \text{and} \quad \frac{dk_2(t)}{dt} = u_2(t)\lambda_2 - \mu_2 k_2(t), \qquad (13)$$
with a constraint that reflects the finite size of the service fleet:
$$\lambda_1 u_1(t) + \lambda_2 u_2(t) \leq \mu_1 k_1(t) + \mu_2 k_2(t) \quad \text{whenever } k_1(t) + k_2(t) = c. \qquad (14)$$
The total discounted revenue is then the objective to be maximized. If at $t = 0$ the system is in the state $(k_1, k_2)$, then – for a feasible (under (14)) control policy $\Delta$ which uses $(u_1(t), u_2(t))$ – the total discounted revenue is
$$\hat R_\alpha(k_1, k_2, \Delta) = \int_0^\infty \left(a_1 k_1(t) + a_2 k_2(t)\right) e^{-\alpha t}\, dt = \frac{a_1 k_1}{\mu_1 + \alpha} + \frac{a_2 k_2}{\mu_2 + \alpha} + R_\alpha(k_1, k_2, \Delta), \qquad (15)$$
where
$$R_\alpha(k_1, k_2, \Delta) = \int_0^\infty \left( \frac{a_1 \lambda_1 u_1(t)}{\mu_1 + \alpha} + \frac{a_2 \lambda_2 u_2(t)}{\mu_2 + \alpha} \right) e^{-\alpha t}\, dt \qquad (16)$$
is the part of the revenue that actually depends on the control policy chosen. In what follows, the term "revenue" is used to designate $R_\alpha(k_1, k_2, \Delta)$.
Our aggregate threshold heuristic is based on a "scaled" version of the fluid model:

Definition 1
A $\mu$-scaled version of the fluid model with parameters $\lambda_1$, $\lambda_2$, $\mu_1$, and $\mu_2$ is the problem with parameters $\lambda_1^s = \frac{\lambda_1 \mu}{\mu_1}$, $\lambda_2^s = \frac{\lambda_2 \mu}{\mu_2}$, $\mu_1^s = \mu_2^s = \mu$ for $\mu \in [\mu_1, \mu_2]$.

Note that in every $\mu$-scaled version of the fluid model, the departure rates of both customer classes are equal and $\frac{\lambda_1^s}{\mu_1^s} = \frac{\lambda_1}{\mu_1}$, $\frac{\lambda_2^s}{\mu_2^s} = \frac{\lambda_2}{\mu_2}$. Since the departure rates of both classes are the same, one can use arguments similar to those in Miller (1969) to show that the optimal admission control decisions depend only on the total number of customers $k(t) = k_1(t) + k_2(t)$ in the system. Thus, given $a_1 > a_2$, a control policy which admits as many class 1 customers as possible and limits the admissions of class 2 customers is optimal for any $\mu$-scaled problem.
3.2 Fluid aggregate threshold heuristic
In the $\mu$-scaled model, system dynamics simplify to
$$\frac{dk(t)}{dt} = u_1(t)\,\lambda_1^s + u_2(t)\,\lambda_2^s - \mu k(t). \qquad (17)$$
In turn, a fluid analog of the original, stochastic system's AT policy admits class-2 customers if and only if the total system occupancy, $k(t)$, does not exceed a "fluid aggregate threshold" (FAT), $k_{FAT}$. When $\rho_1 \geq c$ or $\rho_1 + \rho_2 \leq c$, such a FAT policy is a direct analog of AT policies in the original, stochastic system. For $\rho_1 < c < \rho_1 + \rho_2$, however, there does not exist a neat correspondence. Therefore, in the following sections we define and analyze the FAT policy within each subset of the relevant parameter range.
3.2.1 The FAT policy when $\rho_1 \geq c$.

For systems with $\rho_1 \geq c$, class-1 traffic alone is sufficient to ensure complete utilization of the rental fleet, and a threshold policy can be defined and analyzed in a straightforward fashion. In this case, the control $(u_1(t), u_2(t))$ is defined as follows:
$$(u_1(t), u_2(t)) = \begin{cases} (1, 1), & \text{for } k(t) < k_{FAT}, \\ (1, 0), & \text{for } k_{FAT} \leq k(t) < c, \\ \left(\frac{c\mu}{\lambda_1^s},\, 0\right), & \text{for } k(t) = c. \end{cases} \qquad (18)$$
Note that, once the system hits the boundary and $k(t) = c$, customers continue to be admitted at the maximum feasible rate, and the system state remains at the boundary thereafter.

Control (18) then implies that, at time $t$, the revenue generation rate $r(t) = \frac{a_1 \lambda_1^s u_1(t) + a_2 \lambda_2^s u_2(t)}{\mu + \alpha}$ for $\rho_1 \geq c$ is given by
$$r(t \mid \rho_1 \geq c) = \begin{cases} \frac{a_1 \lambda_1^s + a_2 \lambda_2^s}{\mu + \alpha}, & \text{for } k(t) < k_{FAT}, \\ \frac{a_1 \lambda_1^s}{\mu + \alpha}, & \text{for } k_{FAT} \leq k(t) < c, \\ \frac{a_1 \mu c}{\mu + \alpha}, & \text{for } k(t) = c. \end{cases} \qquad (19)$$
To compute the total discounted revenues for a given $k_{FAT}$, we must also account for the starting state $k \equiv k(0)$.

When $k < k_{FAT} \leq c$, there are three elements to the discounted revenues: those earned as $k(t)$ approaches $k_{FAT}$; those earned when $k_{FAT} \leq k(t) \leq c$; and those earned after the boundary has been hit. We calculate each in turn. Let $t_{FAT} = \frac{1}{\mu}\ln\!\left(\frac{\rho_1 + \rho_2 - k}{\rho_1 + \rho_2 - k_{FAT}}\right)$ be the time that the system state hits $k_{FAT}$, so that $k(t_{FAT}) = k_{FAT}$. Then from (19) we have
$$\int_0^{t_{FAT}} \frac{a_1 \lambda_1^s u_1(t) + a_2 \lambda_2^s u_2(t)}{\mu + \alpha}\, e^{-\alpha t}\, dt = \frac{a_1 \lambda_1^s + a_2 \lambda_2^s}{\mu + \alpha} \cdot \frac{1 - \exp(-\alpha t_{FAT})}{\alpha}. \qquad (20)$$
Similarly, let $t_c = t_{FAT} + \frac{1}{\mu}\ln\!\left(\frac{\rho_1 - k_{FAT}}{\rho_1 - c}\right)$ be the time at which the system state hits $c$, so that $k(t_c) = c$. Then using (19) we have
$$\int_{t_{FAT}}^{t_c} \frac{a_1 \lambda_1^s u_1(t) + a_2 \lambda_2^s u_2(t)}{\mu + \alpha}\, e^{-\alpha t}\, dt = \frac{a_1 \lambda_1^s}{\mu + \alpha} \cdot \frac{\exp(-\alpha t_{FAT}) - \exp(-\alpha t_c)}{\alpha}. \qquad (21)$$
Finally, from (19) the revenues earned after reaching the boundary are given by
$$\int_{t_c}^{+\infty} \frac{a_1 \lambda_1^s u_1(t) + a_2 \lambda_2^s u_2(t)}{\mu + \alpha}\, e^{-\alpha t}\, dt = \frac{a_1 \mu c}{\mu + \alpha} \cdot \frac{\exp(-\alpha t_c)}{\alpha}. \qquad (22)$$
Collecting the revenue terms (20)-(22), substituting for $t_{FAT}$ and $t_c$, and simplifying, we then obtain the discounted revenues for the FAT policy when $\rho_1 \geq c$ and $k \leq k_{FAT} < c$:
$$R^{FAT}_\alpha(k, k_{FAT} \mid \rho_1 \geq c,\ k \leq k_{FAT} < c) = \frac{\mu}{\alpha(\alpha + \mu)}\left[a_1\rho_1 + a_2\rho_2 - \left(\frac{\rho_1 + \rho_2 - k_{FAT}}{\rho_1 + \rho_2 - k}\right)^{\frac{\alpha}{\mu}}\left(a_2\rho_2 + a_1\,\frac{(\rho_1 - c)^{\frac{\alpha+\mu}{\mu}}}{(\rho_1 - k_{FAT})^{\frac{\alpha}{\mu}}}\right)\right]. \qquad (23)$$
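A small numerical self-check (ours, not from the paper) is to Euler-integrate the $\mu$-scaled dynamics (17) under control (18), accumulate the discounted revenue rate (19), and compare with the closed form (23). The parameter values below are illustrative and are chosen so that $\rho_1 \geq c$ and $k(0) \leq k_{FAT} < c$.

    import numpy as np

    rho1, rho2, mu, alpha = 15.0, 8.0, 1.0, 0.1
    a1, a2, c = 10.0, 5.0, 12.0
    k_FAT, k0 = 8.0, 2.0
    lam1s, lam2s = rho1 * mu, rho2 * mu

    dt, horizon = 1e-3, 150.0
    k, disc_rev = k0, 0.0
    for step in range(int(horizon / dt)):
        t = step * dt
        if k < k_FAT:                       # admit both classes
            u1, u2 = 1.0, 1.0
        elif k < c:                         # admit class 1 only
            u1, u2 = 1.0, 0.0
        else:                               # hold the state at the boundary
            u1, u2 = c * mu / lam1s, 0.0
        rate = (a1 * lam1s * u1 + a2 * lam2s * u2) / (mu + alpha)
        disc_rev += rate * np.exp(-alpha * t) * dt
        k = min(c, k + (u1 * lam1s + u2 * lam2s - mu * k) * dt)

    x = alpha / mu
    closed = mu / (alpha * (alpha + mu)) * (
        a1 * rho1 + a2 * rho2
        - ((rho1 + rho2 - k_FAT) / (rho1 + rho2 - k0)) ** x
        * (a2 * rho2 + a1 * (rho1 - c) ** (x + 1) / (rho1 - k_FAT) ** x))
    print(disc_rev, closed)   # the two numbers should agree closely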
When $k_{FAT} \leq k < c$, only type-1 customers are admitted to the system. In this case, in the above analysis we replace $t_{FAT}$ by 0 and $k_{FAT}$ by $k$. Then analogous calculations yield
$$R^{FAT}_\alpha(k, k_{FAT} \mid \rho_1 \geq c,\ k_{FAT} \leq k < c) = \frac{\mu a_1}{\alpha(\alpha + \mu)}\left[\rho_1 - \frac{(\rho_1 - c)^{\frac{\alpha+\mu}{\mu}}}{(\rho_1 - k)^{\frac{\alpha}{\mu}}}\right]. \qquad (24)$$
3.2.2 FAT policy when $\rho_1 + \rho_2 < c$

When $\rho_1 + \rho_2 < c$, a threshold policy with $k_{FAT} < c$ leads to incomplete utilization of the rental fleet and may trivially be improved by setting $k_{FAT} = c$ so that all customers are admitted for service, no matter what the initial state of the system, $k(0)$. Here, the policy is, again, a direct analog of AT policies in the original, stochastic system. Specifically, the optimal fluid threshold of $c$ corresponds to complete sharing, an AT policy with a threshold of $c$.

Because $\rho_1 + \rho_2 < c$, even with no control the boundary $k(t) = c$ is never hit (for $t > 0$). In this case, the optimal control is
$$(u_1(t), u_2(t)) = (1, 1),$$
for any system state, $k(t)$, and the rate at which revenue is earned is
$$r(t \mid \rho_1 + \rho_2 < c) = \frac{a_1 \lambda_1^s + a_2 \lambda_2^s}{\mu + \alpha}.$$
In turn, the revenue calculation is
$$R^{FAT}_\alpha(k, c \mid \rho_1 + \rho_2 < c) = \int_0^\infty \frac{a_1 \lambda_1^s u_1(t) + a_2 \lambda_2^s u_2(t)}{\mu + \alpha}\, e^{-\alpha t}\, dt = \frac{\mu}{\alpha(\mu + \alpha)}\left(a_1 \rho_1 + a_2 \rho_2\right). \qquad (25)$$
3.2.3 FAT policy when $\rho_1 < c \leq \rho_1 + \rho_2$.

Finally, when $\rho_1 < c \leq \rho_1 + \rho_2$ there does not appear to exist a fluid analog of a threshold policy that is both effective and straightforward to implement. On the one hand, a threshold of $k_{FAT} < c$ results in incomplete utilization of the rental fleet and can be improved upon by admitting some class-2 customers. On the other, setting $k_{FAT} = c$ and admitting all class-2 customers is infeasible, since the maximum rate at which the system can be cleared is strictly less than the rate at which customers are arriving: $c\mu < \lambda_1^s + \lambda_2^s$.

In this case, a natural interpretation of the threshold rule defines a "soft" threshold when $k(t) = c$, one that limits, but does not eliminate, the flow of class-2 customers into the system:
$$(u_1(t), u_2(t)) = \begin{cases} (1, 1), & \text{for } k(t) < c, \\ \left(1,\, \frac{\mu c - \lambda_1^s}{\lambda_2^s}\right), & \text{for } k(t) = c, \end{cases} \qquad (26)$$
so that
$$r(t \mid \rho_1 < c \leq \rho_1 + \rho_2) = \begin{cases} \frac{a_1 \lambda_1^s + a_2 \lambda_2^s}{\mu + \alpha}, & \text{for } k(t) < c, \\ \frac{a_1 \lambda_1^s + a_2 (\mu c - \lambda_1^s)}{\mu + \alpha}, & \text{for } k(t) = c. \end{cases} \qquad (27)$$
Thus for $\rho_1 < c < \rho_1 + \rho_2$, the control generates system behavior and revenue that differ from those when $k_{FAT} < c$ or $k_{FAT} = c$, and we denote this soft threshold as $k_{FAT} = c^-$.
Given $k_{FAT} = c^-$ and any $k \equiv k(0) \in [0, c]$, the system's revenues can be split into two components: those earned before reaching $c$, and those earned after. If $t_c = \frac{1}{\mu}\ln\!\left(\frac{\rho_1 + \rho_2 - k}{\rho_1 + \rho_2 - c}\right)$ is the time required for the system to reach the boundary, then the first revenue component in (27) gives us
$$\int_0^{t_c} \frac{a_1 \lambda_1^s u_1(t) + a_2 \lambda_2^s u_2(t)}{\mu + \alpha}\, e^{-\alpha t}\, dt = \frac{a_1 \lambda_1^s + a_2 \lambda_2^s}{\mu + \alpha} \cdot \frac{1 - \exp(-\alpha t_c)}{\alpha}. \qquad (28)$$
After the full capacity is reached, we use the bottom revenue generation rate within (27) to
obtain
$$\int_{t_c}^{+\infty} \frac{a_1 \lambda_1^s u_1(t) + a_2 \lambda_2^s u_2(t)}{\mu + \alpha}\, e^{-\alpha t}\, dt = \frac{a_1 \lambda_1^s + a_2 (\mu c - \lambda_1^s)}{\mu + \alpha} \cdot \frac{\exp(-\alpha t_c)}{\alpha}. \qquad (29)$$
Adding (28) and (29), and using the expression for $t_c$, we then have
$$R^{FAT}_\alpha(k \mid \rho_1 < c \leq \rho_1 + \rho_2) = \frac{\mu}{\alpha(\alpha + \mu)}\left[(a_1 \rho_1 + a_2 \rho_2) - a_2\,\frac{(\rho_1 + \rho_2 - c)^{\frac{\alpha}{\mu}+1}}{(\rho_1 + \rho_2 - k)^{\frac{\alpha}{\mu}}}\right]. \qquad (30)$$
3.2.4 Optimal Thresholds and Revenues for the FAT Policy
We can use the expressions we have derived for discounted revenues to determine both opti-
mal thresholds and optimal discounted revenues. In both cases, we obtain simple, closed-form
expressions.
First we address the optimal threshold, $k^*_{FAT}$. For $\rho_1 \geq c$, its determination follows from differentiation of (23) with respect to $k_{FAT}$:

Theorem 5
The optimal value of the aggregate threshold, $k^*_{FAT}$, is independent of the starting state, $k$, and is given by
$$k^*_{FAT}(c) = \begin{cases} 0, & \text{for } c < \rho_1\left[1 - \left(\frac{a_2}{a_1}\right)^{\frac{\mu}{\mu+\alpha}}\right], \\ c - (\rho_1 - c)\left[\left(\frac{a_1}{a_2}\right)^{\frac{\mu}{\mu+\alpha}} - 1\right], & \text{for } \rho_1\left[1 - \left(\frac{a_2}{a_1}\right)^{\frac{\mu}{\mu+\alpha}}\right] \leq c \leq \rho_1, \\ c^-, & \text{for } \rho_1 < c \leq \rho_1 + \rho_2, \\ c, & \text{for } \rho_1 + \rho_2 < c. \end{cases} \qquad (31)$$

We observe that, all other problem parameters being fixed, the optimal aggregate threshold value described by (31) is a non-decreasing function of the fleet size $c$. In particular, if the available rental capacity falls below the critical value $c_{\min} = \rho_1\left[1 - \left(\frac{a_2}{a_1}\right)^{\frac{\mu}{\mu+\alpha}}\right]$, then it is optimal not to admit any class 2 customers into service. Conversely, if the rental capacity is sufficiently large, exceeding the offered load from class 1, then the control on admissions of class 2 customers should be postponed until the entire rental fleet is utilized. For rental fleet values in between these two critical quantities, some form of admission control on class 2 customers is optimal, even in states in which some rental capacity is available. We observe that the critical index $c_{\min}$ is a decreasing function of the ratio of penalty-adjusted rental fees $a_2/a_1$.

When the time discounting factor $\alpha$ is much smaller than $\mu$, the optimal aggregate threshold level, described in Theorem 5, is not particularly sensitive to the choice of $\mu$. Even for rental durations of several months, the service rates (inverse of the expected service time) are about $\mu \approx 10^{-3}$ per day and are at least an order of magnitude higher than any realistic values for $\alpha$ (for example, a 30%-40% annual discounting rate results in $\alpha \approx 10^{-4}$ per day). The same argument suggests that $k^*_{FAT}$ is not sensitive to the choice of $\alpha$. Thus, it is straightforward to use $k^*_{FAT}$ as a threshold for both discounted and "average-cost" versions of the problem.
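Expression (31) is easy to evaluate directly. The sketch below (ours) transcribes it; the function name and the parameter values in the example are illustrative, and the string 'c-' is used to denote the "soft" threshold case.

    def k_fat_star(c, rho1, rho2, a1, a2, mu, alpha):
        """Optimal fluid aggregate threshold of (31); returns 'c-' for the
        soft-threshold case rho1 < c <= rho1 + rho2."""
        power = mu / (mu + alpha)
        c_min = rho1 * (1.0 - (a2 / a1) ** power)
        if c < c_min:
            return 0.0
        if c <= rho1:
            return c - (rho1 - c) * ((a1 / a2) ** power - 1.0)
        if c <= rho1 + rho2:
            return "c-"
        return float(c)

    # Thresholds for a range of fleet sizes (illustrative parameters)
    print([k_fat_star(c, 20.0, 15.0, 10.0, 5.0, 1.0, 0.05) for c in (5, 12, 18, 25, 40)])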
Using the expression for the optimal aggregate threshold (31), we obtain

Theorem 6
Given fixed $\lambda_1^s$, $\lambda_2^s$, $\mu$, $a_1$, $a_2$ and $\alpha$, define $c_{\min} = \rho_1\left[1 - \left(\frac{a_2}{a_1}\right)^{\frac{\mu}{\mu+\alpha}}\right]$.
a) If the rental system starts in state $k$, then the optimal total discounted revenue is
$$R^{FAT}_\alpha(k, k^*_{FAT}(c)) = \begin{cases} \frac{\mu}{\alpha(\alpha+\mu)}\left[a_1\rho_1 - a_1\frac{(\rho_1-c)^{\frac{\alpha+\mu}{\mu}}}{(\rho_1-k)^{\frac{\alpha}{\mu}}}\right], & \text{for } c \leq c_{\min}, \\[2mm] \frac{\mu}{\alpha(\alpha+\mu)}\left[a_1\rho_1 + a_2\rho_2 - a_2\frac{\left[\rho_2 + (\rho_1-c)\left(\frac{a_1}{a_2}\right)^{\frac{\mu}{\mu+\alpha}}\right]^{\frac{\alpha+\mu}{\mu}}}{(\rho_1+\rho_2-k)^{\frac{\alpha}{\mu}}}\right], & \text{for } c_{\min} \leq c < \rho_1,\ k < k^*_{FAT}(c), \\[2mm] \frac{\mu}{\alpha(\alpha+\mu)}\left[a_1\rho_1 - a_1\frac{(\rho_1-c)^{\frac{\alpha+\mu}{\mu}}}{(\rho_1-k)^{\frac{\alpha}{\mu}}}\right], & \text{for } c_{\min} \leq c < \rho_1,\ k \geq k^*_{FAT}(c), \\[2mm] \frac{\mu}{\alpha(\alpha+\mu)}\left[a_1\rho_1 + a_2\rho_2 - a_2\frac{(\rho_1+\rho_2-c)^{\frac{\alpha+\mu}{\mu}}}{(\rho_1+\rho_2-k)^{\frac{\alpha}{\mu}}}\right], & \text{for } \rho_1 \leq c \leq \rho_1 + \rho_2, \\[2mm] \frac{\mu}{\alpha(\alpha+\mu)}\left(a_1\rho_1 + a_2\rho_2\right), & \text{for } \rho_1 + \rho_2 < c. \end{cases} \qquad (32)$$

b) For fixed values of rental fees, demand and service parameters, $R^{FAT}_\alpha(k, k^*_{FAT}(c))$ is a non-decreasing concave function of the rental fleet size $c$ for every $k \leq c$.
Inspection of (32) shows that $R^{FAT}_\alpha(k, k^*_{FAT}(c))$, like $k^*_{FAT}(c)$, is insensitive to the choice of $\mu$ for $\alpha \ll \mu$. (Of course, this insensitivity follows from the $\mu$-scaled problem, not necessarily from the two-class problem in which $\mu_1 \neq \mu_2$.) Part b) of Theorem 6 also states that, for any starting state, FAT revenues are concave in $c$. Thus, although the concavity of revenue with respect to fleet size is difficult to demonstrate in the context of the original MDP, it emerges naturally from the $\mu$-scaled fluid approximation. This concavity property becomes important in the context of fleet sizing decisions, which we discuss in Section 4.
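The revenue expressions can also be checked empirically. The sketch below (our own, with illustrative parameters) evaluates the FAT revenue at the optimal threshold, using the regime-by-regime formulas (23)-(25) and (30) that underlie (32), and verifies numerically the monotonicity and concavity in $c$ claimed in Theorem 6(b).

    import numpy as np

    def R_fat(k, c, rho1, rho2, a1, a2, mu, alpha):
        """Discounted FAT-policy fluid revenue evaluated at k*_FAT(c)."""
        pre = mu / (alpha * (alpha + mu))
        x = alpha / mu
        if c >= rho1 + rho2:                                  # eq. (25)
            return pre * (a1 * rho1 + a2 * rho2)
        if c >= rho1:                                         # soft threshold, eq. (30)
            return pre * (a1 * rho1 + a2 * rho2
                          - a2 * (rho1 + rho2 - c) ** (x + 1) / (rho1 + rho2 - k) ** x)
        kf = max(0.0, c - (rho1 - c) * ((a1 / a2) ** (mu / (mu + alpha)) - 1.0))
        if k >= kf:                                           # eq. (24)
            return pre * a1 * (rho1 - (rho1 - c) ** (x + 1) / (rho1 - k) ** x)
        return pre * (a1 * rho1 + a2 * rho2                   # eq. (23) at k_FAT = kf
                      - ((rho1 + rho2 - kf) / (rho1 + rho2 - k)) ** x
                      * (a2 * rho2 + a1 * (rho1 - c) ** (x + 1) / (rho1 - kf) ** x))

    # Check Theorem 6(b): revenue is non-decreasing and concave in c.
    rho1, rho2, a1, a2, mu, alpha = 20.0, 15.0, 10.0, 5.0, 1.0, 0.05
    R = np.array([R_fat(0.0, c, rho1, rho2, a1, a2, mu, alpha) for c in range(1, 41)])
    print("non-decreasing:", np.all(np.diff(R) >= -1e-9))
    print("concave:", np.all(np.diff(R, 2) <= 1e-9))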
3.3 Numerical study of the performance of the FAT heuristic
Our motivation for developing the FAT heuristic was that it should perform well and be easy to
implement. Therefore, to test the policy’s performance we have undertaken a series of numerical
studies which compare its average revenues to those obtained using the optimal control and the
complete sharing policy.
In two of the three cases analyzed above, translation of the FAT policy (31) to the context of
a discrete, stochastic system is straightforward. For $\rho_1 \geq c$ we assume $\frac{\mu}{\mu+\alpha} \approx 1$, when necessary, and then round the resulting $k^*_{FAT}$ down to the nearest integer. For $\rho_1 + \rho_2 < c$, we set the aggregate system threshold equal to $c$, effectively implementing a complete sharing policy.
When $\rho_1 < c < \rho_1 + \rho_2$, however, $k^*_{FAT} = c^-$, and the inflow of class-2 rentals is partially controlled. In this case, there is not a clear correspondence in a discrete system: setting the aggregate threshold to $c$ implements complete sharing, which does not control class-2 customers at all; conversely, setting the threshold to $c - 1$ completely stops the flow of class-2 customers at the boundary.

Because both alternatives of the FAT policy are trivial to compute, we include them both in our numerical tests. In total, in each numerical experiment, we test four policies: the optimal policy; FAT with $c^-$ set to $c$ ("$c^- = c$"); FAT with $c^-$ set to $c - 1$ ("$c^- = c - 1$"); and complete sharing (CS). For each set of system parameters, we evaluate the Markov chains induced by the four policies (in the case of the optimal policy, via value iteration) to calculate long-run average revenues.
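For a threshold policy, this evaluation amounts to solving the stationary distribution of the induced continuous-time Markov chain. A minimal sketch of one way to do this is given below; it is our own illustration, not the authors' code, and the example call uses the parameter values of the example in Figures 1 and 2 (penalty-adjusted revenue, so rejection penalties are already folded into the fees).

    import numpy as np

    def avg_revenue_AT(c, L, lam1, lam2, mu1, mu2, a1, a2):
        """Long-run average (penalty-adjusted) revenue per unit time of an
        aggregate-threshold policy that always admits class 1 when space is
        available and admits class 2 only when k1 + k2 < L."""
        states = [(k1, k2) for k1 in range(c + 1) for k2 in range(c + 1 - k1)]
        idx = {s: i for i, s in enumerate(states)}
        Q = np.zeros((len(states), len(states)))
        for (k1, k2), i in idx.items():
            if k1 + k2 < c:
                Q[i, idx[(k1 + 1, k2)]] += lam1          # admit class 1
            if k1 + k2 < L:
                Q[i, idx[(k1, k2 + 1)]] += lam2          # admit class 2
            if k1 > 0:
                Q[i, idx[(k1 - 1, k2)]] += k1 * mu1      # class-1 departures
            if k2 > 0:
                Q[i, idx[(k1, k2 - 1)]] += k2 * mu2      # class-2 departures
            Q[i, i] = -Q[i].sum()
        # stationary distribution: pi Q = 0 with sum(pi) = 1
        A = np.vstack([Q.T, np.ones(len(states))])
        b = np.zeros(len(states) + 1); b[-1] = 1.0
        pi, *_ = np.linalg.lstsq(A, b, rcond=None)
        return sum(p * (a1 * k1 + a2 * k2) for p, (k1, k2) in zip(pi, states))

    # Compare complete sharing (L = c) with a threshold of c - 1
    print(avg_revenue_AT(10, 10, 25.0, 10.0, 5.0, 1.0, 10.0, 5.0),
          avg_revenue_AT(10, 9, 25.0, 10.0, 5.0, 1.0, 10.0, 5.0))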
In our numerical tests, we fix the expected rental duration of class-1 rentals at $1/\mu_1 = 1$, and we run sets of tests in which we systematically vary the offered load, $\theta = \frac{\rho_1 + \rho_2}{c}$, as well as the relative processing rate of class-2 customers, $\mu_2$. Within each test set, $\theta$ and $\mu_2$ also remain fixed, and we run $(10\theta + 1)$ experiments in which we systematically vary $\lambda_1$ and $\lambda_2$.
| $\theta$ | CS, $\mu_2/\mu_1$=0.1 | CS, $\mu_2/\mu_1$=1 | CS, $\mu_2/\mu_1$=10 | FAT $c^-$=$c-1$, $\mu_2/\mu_1$=0.1 | FAT $c^-$=$c-1$, $\mu_2/\mu_1$=1 | FAT $c^-$=$c-1$, $\mu_2/\mu_1$=10 | FAT $c^-$=$c$, $\mu_2/\mu_1$=0.1 | FAT $c^-$=$c$, $\mu_2/\mu_1$=1 | FAT $c^-$=$c$, $\mu_2/\mu_1$=10 |
|---|---|---|---|---|---|---|---|---|---|
| 0.5 | 0.0 (0.0) | 0.0 (0.0) | 0.0 (0.0) | 0.0 (0.0) | 0.0 (0.0) | 0.0 (0.0) | 0.0 (0.0) | 0.0 (0.0) | 0.0 (0.0) |
| 0.8 | 0.0 (0.0) | 0.2 (0.4) | 1.5 (2.8) | 0.0 (0.0) | 0.2 (0.4) | 1.5 (2.8) | 0.0 (0.0) | 0.2 (0.4) | 1.5 (2.8) |
| 1.0 | 0.3 (1.8) | 0.9 (1.9) | 1.0 (1.9) | 0.3 (1.8) | 0.9 (1.9) | 1.0 (1.9) | 0.3 (1.8) | 0.9 (1.9) | 1.0 (1.9) |
| 1.5 | 3.1 (6.2) | 4.1 (6.5) | 6.7 (11.7) | 1.2 (4.0) | 0.7 (3.2) | 0.2 (2.8) | 1.9 (5.8) | 2.8 (7.3) | 4.3 (11.6) |
| 2.0 | 7.0 (13.1) | 6.3 (12.7) | 9.6 (16.1) | 1.5 (5.9) | 0.6 (2.7) | 0.3 (2.3) | 2.7 (10.9) | 3.3 (12.5) | 4.7 (16.1) |
| 2.5 | 9.2 (16.9) | 9.6 (16.5) | 11.6 (19.6) | 1.6 (7.9) | 0.7 (4.6) | 0.2 (1.9) | 3.1 (14.6) | 3.5 (16.3) | 4.6 (19.6) |
| 3.0 | 10.9 (20.5) | 11.3 (20.1) | 12.3 (22.3) | 1.6 (9.4) | 0.7 (5.3) | 0.2 (1.2) | 3.5 (17.4) | 3.6 (19.1) | 4.2 (22.3) |

Table 1: Numerical results. Average and maximum (in parentheses) percent shortfall from optimal revenue for $10\theta + 1$ test cases. Penalty-adjusted service fees are $a_1 = 10$ and $a_2 = 5$ for class 1 and 2 customers, respectively. The service rate for class 1 customers is $\mu_1 = 1$. The rental fleet size is $c = 10$.
More specifically, in each test set we begin with $(10\theta + 1)$ equally spaced $\lambda_1$'s – from $\lambda_1 = 0$ to $\lambda_1 = \mu_1 c$ – and then choose $\lambda_2$ in each case so that $\frac{\lambda_1}{\mu_1 c} + \frac{\lambda_2}{\mu_2 c} = \theta$. We then modify the endpoints – where either $\lambda_1$ or $\lambda_2$ equals zero – so that the arrival rate that would be zero actually equals 0.01. For example, in the set in which $\theta = 1$ and $\mu_2/\mu_1 = 1$, there are $10\theta + 1 = 11$ test points, and their $(\lambda_1, \lambda_2)$ values are $\{(0.01, 9.99), (1, 9), (2, 8), (3, 7), (4, 6), (5, 5), (6, 4), (7, 3), (8, 2), (9, 1), (9.99, 0.01)\}$.
Table 1 shows results for the 21 sets of experiments. In each experiment within a set we record the average penalty-adjusted revenue per period using the optimal policy ($R^*$), as well as that obtained from the FAT and CS policies ($R^{FAT}$ and $R^{CS}$). For each experiment we then calculate the percentage revenue lost when using the heuristic controls ($(1 - R^{FAT}/R^*) \times 100\%$ and $(1 - R^{CS}/R^*) \times 100\%$). Finally, within each cell of Table 1 we report two statistics that summarize the results across all $10\theta + 1$ experiments: the average of the percentage shortfalls, as well as the maximum shortfall recorded over all cases (in parentheses).

Table 1's results show that all three policies perform well at low offered loads. For $\theta \leq 1$, none of the three policies controls the inflow of class-2 requests, and all three perform consistently close to optimality. It is also worth noting that, in these examples, the CS policy is consistently optimal at $\theta = 0.5$. While the sufficient $c_i^*$'s of Theorem 3 can be very large – in the thousands in many of these examples – the offered loads at which the CS policy is actually optimal appear to be much less extreme.
As $\theta$ climbs above 1, the three policies diverge, and the FAT heuristics outperform CS. At $\theta = 2$ – when the offered load is twice the system's capacity – the FAT with $c^- = c - 1$ still performs quite well, with a worst optimality gap of less than 6% and an average gap in each table cell that is consistently below 1%. Here, the performance of FAT with $c^- = c$ is noticeably worse, with a maximum gap of 16.1% and an average gap ranging from 2.7% to 4.7%. The CS policy's worst-case performance is also 16.1% below optimal, and its average performance in each cell trails that of FAT with $c^- = c$, falling 7.0% to 9.6% below optimality.

At very high $\theta$'s, the FAT heuristic with $c^- = c - 1$ consistently outperforms the other heuristics. For example, when $\theta = 3$, the average revenue generated by the FAT policy with $c^- = c - 1$ ranged from 0.2% to 1.9% below optimal, and the worst-case examples of each of the $10\theta + 1$ test sets ranged from 1.2% to 9.4% below optimal. In contrast, the average and worst-case performance of the FAT with $c^- = c$ and CS policies were 3 to 4 times worse.

Thus, as the offered load increases, the performance of all three heuristics deteriorates with respect to optimality. In general, the heuristics are exercising insufficient control of class-2 customers. The relatively strong performance of the FAT heuristic with "$c^-$" set to $c - 1$ reflects the benefit of reserving the last unit of rental capacity for "preferred" class-1 customers when the traffic intensity is high.
Figure 3 provides additional detail on how the setting of $c^-$ affects the performance of the FAT heuristic. In the figure, rental capacity is $c = 10$, service rates are $\mu_1 = 1.0$ and $\mu_2 = 0.1$, and penalty-adjusted revenues are $a_1 = 10$ and $a_2 = 5$. The aggregate offered load is fixed at $\theta = \frac{\rho_1+\rho_2}{c} = 2$, and the x-axis of the figure's parametric analysis tracks the fraction of the offered load due to class-1 customers as it is systematically increased from 0% to 100% of the total: from $\rho_1/c = 0$ to $\rho_1/c = 2$. The y-axis reports the two FAT policies' resulting percentage shortfall from long-run average optimal revenue.

Figure 3: Performance of alternative FAT heuristics with $c^-$ interpreted as $c$ (dashed line) and as $c - 1$ (solid line). System has $c = 10$, $\theta = \frac{\rho_1+\rho_2}{c} = 2$, $\mu_1 = 1$, $\mu_2 = 0.1$, $a_1 = 10$, and $a_2 = 5$.
As Fig. 3 indicates, whenever $\rho_1/c \geq 1.0$, the two policies are identical – with the same threshold, $k_{FAT} \leq c-1$, and the same long-run average revenues. When $\rho_1/c < 1.0$, however, the two heuristics' recommendations differ – $k_{FAT} = c - 1$ versus $k_{FAT} = c$ – and average revenues differ as well. For moderate $\rho_1$'s, the "$c^- = c - 1$" policy outperforms the "$c^- = c$" one, and for $\rho_1 \ll c$, the reverse is true.

It is worth noting that numerical experiments using other $\theta$'s yield plots whose gross features are directly analogous to those of Figure 3. Larger values of $\theta$ lead to more extreme performance differences between the $c^- = c - 1$ and $c^- = c$ variants of the FAT at moderate to very low values of $\rho_1$.
4 The Effect of Capacity Allocation on Optimal Fleet Size
The allocation policies investigated in Sections 2 and 3 are tactical controls intended to address
instances in which the number of rental units available falls short of the anticipated near-term
demand. The total fleet size c clearly affects the nature of the control. In particular, Theorem 3
shows that, given ample capacity, the optimal control is to give free access to all customers.
It is also natural to ask the converse question. How does the use of tactical control affect the
fleet size the rental company should use? When is the optimal fleet size large enough so that, as
in Theorem 3, complete sharing is (nearly) optimal? More generally, given the ability to change
fleet size, what is the economic value to a firm of exercising tactical controls? In this section we
address both of these questions.
In fact, the effect of capacity allocation on optimal fleet size is not immediately clear. One
might argue that, given any fixed fleet size, optimal rationing increases revenue per unit of time.
This revenue increase, in turn, allows the firm to more profitably sustain higher overall capacity
levels. Alternatively, one might argue that rationing reduces the aggregate arrival rate to the
rental fleet and that, in turn, fewer units of capacity are required to process the arrivals that are
actually served.
We can provide some insight into these trade-offs by directly comparing the optimal fleet
size under active allocation policies to that under complete sharing, which passively allows all
customers access to rental capacity whenever it is available. We formulate the problem of finding
the optimal fleet size as
$$\Pi(\Delta) = \max_c \left( R(c, \Delta(c)) - hc \right), \qquad (33)$$
where $R(c, \Delta(c))$ is the average revenue per period when operating $c$ units under allocation policy $\Delta(c)$ and the capacity cost of $h$ per unit per period is fixed for all $c$. Note that, given a fixed offered load, $\rho_1 + \rho_2$, the allocation policy, $\Delta(c)$, may vary with $c$.

We then compare the maximizer of (33) under two regimes. In one we use $\Delta(c) = CS(c)$, the complete sharing policy, for all $c$. In the other $\Delta(c) = \Delta^*(c)$, which we define as any family of allocation policies for which the following attributes hold:

1. For any fixed $c$, $R(c, \Delta^*(c)) \geq R(c, CS(c))$.

2. There exists a $\bar c < \infty$ such that for all $c \geq \bar c$, $R(c, \Delta^*(c)) = R(c, CS(c))$.

3. $R(c, \Delta^*(c)) - R(c-1, \Delta^*(c-1)) \leq R(c-1, \Delta^*(c-1)) - R(c-2, \Delta^*(c-2))$.
Condition 1 states that, for any $c$, $\Delta^*(c)$ performs at least as well as complete sharing. Condition 2 states that there exists a finite fleet size above which complete sharing performs as well as $\Delta^*(c)$. Note that Theorem 3 demonstrates that such a $\bar c$ exists in the context of the discounted problem.

Condition 3 requires that average revenues per period under $\Delta^*$ are concave in $c$. Theorem 6 proves that this type of concavity exists for the FAT policy in the context of the discounted fluid model, and the result also suggests that the condition (roughly) holds for AT policies more generally. Similarly, though we have not been able to prove that the condition holds for the optimal policy, it has consistently been present in the numerical tests we have run.
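When Condition 3 holds, the maximizer of (33) can be found by simple marginal analysis: keep adding units as long as the marginal revenue of the next unit covers its cost. The sketch below is our own illustration; the revenue function used in the example is an arbitrary concave curve, not the paper's model.

    def optimal_fleet_size(R, h, c_max):
        """Maximize R(c) - h*c over c = 0, 1, ..., c_max by marginal analysis,
        valid when R is concave in c (Condition 3)."""
        c = 0
        while c < c_max and R(c + 1) - R(c) >= h:
            c += 1
        return c

    def R(c):
        return 100.0 * (1.0 - 0.9 ** c)   # illustrative concave revenue curve

    print([optimal_fleet_size(R, h, 50) for h in (8.0, 4.0, 2.0, 0.5)])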
Without loss of generality, we assume that $a_1 \geq a_2$, and we define
$$h^*_{\min} = R(\bar c, \Delta^*(\bar c)) - R(\bar c - 1, \Delta^*(\bar c - 1)),$$
$$h^{CS}_{\min} = R(\bar c, CS(\bar c)) - R(\bar c - 1, CS(\bar c - 1)),$$
$$h^*_{\max} = R(1, \Delta^*(1)) - R(0, \Delta^*(0)) \geq \max\left(\frac{a_1 \rho_1}{1 + \rho_1},\ \frac{a_1 \rho_1 + a_2 \rho_2}{1 + \rho_1 + \rho_2}\right), \ \text{and}$$
$$h^{CS}_{\max} = R(1, CS(1)) - R(0, CS(0)) = \frac{a_1 \rho_1 + a_2 \rho_2}{1 + \rho_1 + \rho_2}. \qquad (34)$$

Observe that $h^*_{\min}$ and $h^{CS}_{\min}$ are the marginal values of adding the last piece of equipment, as it becomes optimal to take all arrivals, first-come first-served. Similarly, $h^*_{\max}$ and $h^{CS}_{\max}$ are the marginal values of the first piece of equipment under the two schemes. It is not difficult to see that $h^*_{\min} \leq h^{CS}_{\min} \leq h^{CS}_{\max} \leq h^*_{\max}$.
The following result uses these relationships to parameterize how the fleet size under capacity
allocation policies differs from that under complete sharing:
Theorem 7
Let $c^*(h)$ and $c^{CS}(h)$ be the maximizers of (33) under $\Delta^*$ and CS.

a) If $h < h^*_{\min}$ then $c^*(h) = c^{CS}(h) \geq \bar c$.

b) If $h \in [h^*_{\min}, h^{CS}_{\min}]$ then $c^*(h) \leq c^{CS}(h)$.

c) If $h \in [h^{CS}_{\min}, h^{CS}_{\max}]$ then $c^*(h)$ may be smaller than, equal to, or larger than $c^{CS}(h)$.

d) If $h \in [h^{CS}_{\max}, h^*_{\max}]$ then $c^*(h) \geq c^{CS}(h)$.

e) If $h > h^*_{\max}$ then $c^*(h) = c^{CS}(h) = 0$.
We note that the results of Theorem 7 can be extended to the multi-class case, as well as to
multi-period capacity sizing models with more complex cost structures. We briefly discuss the
latter in the discussion at the end of the paper.
Thus, the optimal fleet size using capacity rationing may be either higher or lower than that
under the complete sharing policy. The theorem shows that the relationship between the two
depends fundamentally on the unit cost of capacity.
Parts (a) and (e) of the theorem show that optimal capacity levels for CS and ∆^* coincide for very high and very low values of holding costs. If holding costs are extremely high, the expected revenues cannot justify the acquisition of even a single unit of capacity, even under rationing. On the other hand, if the holding costs are extremely low, then Theorem 3 implies that the optimal rationing policy is complete sharing, and in this case the profit maximizing capacity levels of the two policies again coincide.
Parts (b) and (d) identify ranges of h over which c^*(h) is unambiguously dominated by, or dominates, c^CS(h). Part (b) shows that for low values of h the lower marginal value to the rationing policy of adding the "last" unit of capacity (before complete sharing becomes optimal) drives c^*(h) below c^CS(h). Part (d) shows that for high values of h the benefit of being able to reject lower-revenue customers allows c^*(h) to climb above c^CS(h).
Finally, part (c) defines a set of intermediate values of h for which c^*(h) can be higher, the same as, or lower than c^CS(h). The ordering of the relationships reflects the proximity of h to the boundaries, h^CS_min and h^CS_max.
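The content of Theorem 7 is easy to explore numerically. The sketch below is our own illustration: the two revenue sequences are hypothetical concave curves standing in for R(c, ∆^*(c)) and R(c, CS(c)) and satisfying Conditions 1–3, and the profit-maximizing fleet size is found by comparing marginal revenue with the unit cost h.

def optimal_fleet_size(revenue, h):
    """Largest c whose marginal revenue still covers the unit capacity cost h.
    revenue[c] is assumed increasing and concave in c, with revenue[0] = 0."""
    c = 0
    while c + 1 < len(revenue) and revenue[c + 1] - revenue[c] >= h:
        c += 1
    return c

# Hypothetical concave revenue curves; they coincide for c >= 7 (Condition 2).
rev_cs    = [0.0, 7.0, 13.0, 18.0, 22.0, 25.0, 27.0, 28.0, 28.5, 28.7]
rev_dstar = [0.0, 9.0, 15.0, 19.5, 23.0, 25.5, 27.2, 28.0, 28.5, 28.7]

for h in (8.0, 5.0, 2.0, 0.4):
    print(h, optimal_fleet_size(rev_dstar, h), optimal_fleet_size(rev_cs, h))

For h = 8 (between the two maximal marginal values) only the rationing policy justifies a unit, as in part (d); for h = 0.4 (below both minimal marginal values) the two optimal fleet sizes coincide, as in part (a).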
While the relationships described in the theorem are not strict inequalities, it is not difficult to develop examples in which c^*(h) differs from c^CS(h). Figure 4 illustrates an example in which the ∆^*(c) used for each c is the optimal policy for that c. For the problem parameters used, c̄ = 30, h^*_max = 9.09, h^CS_max = 7.14, h^*_min = 0.65, and h^CS_min = 0.63. Between h^*_min and h^CS_max, the optimal fleet sizes of the two policies are equal at h = 6.5.
Figure 4: Optimal capacity size as a function of the holding cost under the optimal and complete sharing policies. Fixed problem parameters: a_1 = 10, a_2 = 5, λ_1 = λ_2 = 10, and µ_1 = µ_2 = 1. Capacity cost per unit of time, h, is systematically varied.
We now turn to the economic benefit of capacity rationing. Table 2 presents a set of 25 numerical experiments that compare the performance of the optimal (OPT) and complete sharing (CS) policy. For each example, the table reports the optimal fleet size, profit per period, and (percent) profit margin for both policies. In all of the experiments, the aggregate arrival rate (λ_1 + λ_2), service rates (µ_1 and µ_2), and penalty-adjusted revenues (a_1 and a_2) remain fixed. Then the fraction of the offered load due to class-1 customers (λ_1/(λ_1+λ_2)) and the holding cost per unit of time per unit of capacity (h) are systematically varied. The table's results reflect three phenomena that are worth noting.
The first is the effect of increased holding costs on fleet sizes, already displayed in Figure 4. At lower relative holding costs, capacity rationing reduces the optimal fleet size relative to that of a complete-sharing company. As one looks down each pair of columns, however, one sees that rationing allows a company to maintain larger capacity than would be optimal under complete sharing. Interestingly, in looking across each row, one sees that for very small and very
large fractions of class-1 customers, optimal capacities from the two policies are the same. In the
former case, this is due to the optimality of complete sharing; the policies themselves coincide.
In the latter case, however, the optimal policy reserves capacity for class-1 customers. While
complete sharing is suboptimal, the blocking of class-1 customers due to class-2 admissions is a rare enough event that it does not significantly affect optimal fleet size (or, for that matter, profits).

                     λ1/(λ1+λ2)=0.1    λ1/(λ1+λ2)=0.3    λ1/(λ1+λ2)=0.5    λ1/(λ1+λ2)=0.7    λ1/(λ1+λ2)=0.9
          h/a2        OPT      CS       OPT      CS       OPT      CS       OPT      CS       OPT      CS
Fleet      0.5         11      11        12      13        12      13        13      13        14      14
Sizes      0.7          9       9         9      10        10      11        11      12        12      12
           0.9          5       5         6       8         8       9         9      10        11      11
           1.1          0       0         3       4         6       7         8       9         9      10
           1.3          0       0         2       0         4       3         6       6         8       8
Profits    0.5      18.52   18.52     27.22   27.02     36.40   36.17     45.66   45.33     54.71   54.60
           0.7       8.47    8.47     16.27   16.05     24.98   24.26     33.46   32.82     41.94   41.62
           0.9       1.48    1.48      8.31    7.00     15.59   14.01     23.01   21.76     30.50   29.99
           1.1          -       -      3.46    0.97      8.71    5.82     14.50   12.28     20.50   19.61
           1.3          -       -      1.12       -      4.15    0.60      7.90    4.82     12.03   10.86
Profit     0.5      67.3%   67.3%     90.7%   83.1%    121.3%  111.3%    140.5%  139.5%    156.3%  156.0%
Margin     0.7      26.9%   26.9%     51.7%   45.9%     71.4%   63.0%     86.9%   78.1%     99.9%   99.1%
           0.9       6.6%    6.6%     30.8%   19.4%     43.3%   34.6%     56.8%   48.4%     61.6%   60.6%
           1.1          -       -     21.0%    4.4%     26.4%   15.1%     33.0%   24.8%     41.4%   35.7%
           1.3          -       -      8.6%       -     16.0%    3.1%     20.3%   12.4%     23.1%   20.9%

Table 2: Numerical results. Optimal fleet sizes, profit per unit time, and profit margins for the optimal (OPT) and complete sharing (CS) policies. In all test cases, the following parameters are fixed: a_1 = 10, a_2 = 5, µ_1 = µ_2 = 1, λ_1 + λ_2 = 10. The relative value of the holding cost, h/a_2, and the fraction of demand due to the 'preferred' class, λ_1/(λ_1+λ_2), are systematically varied.
The second is the fact that, when capacity costs are relatively low, complete sharing appears to be fairly robust with respect to average profit per unit of time. In particular, when h/a_2 = 0.5, the profit advantage derived from capacity rationing is minimal, less than 1%, and when h/a_2 = 0.7, the advantage is no more than 3%. Here, increased costs, due to additional capacity, are made up for by increased revenues, due to additional class-2 traffic. Rather, it is when capacity costs are high – as high as or higher than class-2 penalty-adjusted revenues – that the profit increase due to restrictions on class-2 access becomes significant.
The last effect is that, conversely, capacity rationing provides a more consistently significant increase in profit margins over complete sharing. For example, even when capacity costs are half the penalty-adjusted class-2 revenues, margins may increase by as much as 9%. As capacity costs approach and exceed class-2 fees, the benefit derived from the ability to limit class-2 admissions increases far more sharply.
Of dollar profit and profit margin, which is more indicative of the value of rationing to the
rental company? For capital-constrained companies, we would argue it is the latter. Indeed,
in many rental businesses, capacity is a significant capital investment, and measures, such as return on assets, that track the (dollar) efficiency of asset utilization become critical measures of performance for managers and for investors. In our numerical experiments, h – the cost per unit of capacity per unit of time – reflects interest expenses of capacity investment (as well as maintenance expenses). In dividing profit by h·c (or, equivalently for us, by c), profit margin accounts for this investment in capacity.
Thus, absolute profits lost, due to lack of control, may not be large. Nevertheless, the ability
to ration capacity and, in turn, to adjust the size of a rental fleet can, at the same time, lead
to a significant improvement in the economic utilization of the assets employed. The complete
sharing policy maximizes physical – rather than economic – utilization of assets. When complete
sharing is optimal, the two notions naturally coincide. When it is not optimal, however, it leads
to lower economic productivity.
5 Discussion
Our formulation of the rental capacity allocation problem captures some essential features that
the more traditional yield management literature does not address. In it, we explicitly represent
the fact that customers arrive at random, use pieces of equipment for rental periods of uncertain
duration, and then return the equipment to be used again.
Using dynamic programming techniques, we are able to characterize “switching curve” poli-
cies as being optimal. We also demonstrate that there are two sets of conditions under which a
customer class should be labeled a “VIP” and have unrestricted access to the available service
capacity: one in which there is ample excess capacity and another in which the penalty-adjusted
revenue and service rate parameters are favorable. In particular, we find that customers may be
assigned the VIP tag even when their rental fees are lower than those of the other class.
When applied to both customer classes, the sufficient conditions for VIP status become
conditions in which “complete sharing” policies are optimal. These policies are of interest, since
service companies often use equipment utilization as a criterion for measuring system performance
and may be reluctant to turn away customers. Theorem 4 implies that the goals of maximizing
utilization and of maximizing revenues are properly aligned, even in the peak season, if the
penalty-adjusted rental fees and service rates of the different customer classes are similar.
We also analyze a “fluid aggregate threshold” (FAT) policy that is based on a fluid approximation of the original problem. Our numerical tests show that the performance of the FAT heuristic is close to that of the optimal admission policies over a broad range of operating regimes. In
addition to providing a simple and effective capacity allocation policy, the fluid model results in
revenue which is a concave function of the rental fleet size. This concavity is essential for the
analysis of related capacity sizing decisions and for an understanding of how capacity allocation
schemes affect them.
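For reference, the optimal fluid threshold has the closed form derived in the proof of Theorem 5. The sketch below is our own illustration of that expression (the function name and the clipping convention are ours); it applies only to the case ρ_1 ≥ c treated there.

def fat_threshold(c, rho1, a1, a2, mu, alpha):
    """k* = c - (rho1 - c) * ((a1/a2)**(mu/(mu+alpha)) - 1), for the case rho1 >= c
    considered in the proof of Theorem 5; the result is clipped below at zero."""
    if rho1 < c:
        raise ValueError("closed form derived for the case rho1 >= c")
    k_star = c - (rho1 - c) * ((a1 / a2) ** (mu / (mu + alpha)) - 1.0)
    return max(0.0, k_star)

# Example: c = 30, rho1 = 40, a1 = 10, a2 = 5, mu = 1, alpha = 0.1  ->  about 21.2
print(fat_threshold(30, 40.0, 10.0, 5.0, 1.0, 0.1))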
We then demonstrate that, given this concavity property, the optimal fleet size using capacity
rationing may be either higher or lower than that under the complete sharing policy. As capacity
costs grow, the optimal fleet size under rationing grows relative to that under complete sharing.
Finally, we show that appropriate adjustment of the fleet size under complete sharing may produce nearly optimal profits. Even in this case, however, the economic productivity of assets can suffer significantly. When complete sharing is not optimal, its maximization of physical utilization leads to economic underutilization of resources.
Thus, the formulation and results represent a promising step in furthering the understanding
of the management of rental systems. Of course, more work remains to be done. There are
several aspects of the allocation problem itself that merit additional analysis.
First, as we noted in Section 2, rental companies may have prior estimates of the expected
duration of the rental period, and this information would be of value when deciding whether
to admit a customer to the system. At the same time, the use of this information will also
significantly complicate the analysis. For example, it will likely require expanding the state
space of the system from numbers of pieces of equipment in use to estimates of the duration of
the remaining rental period for every piece of equipment.
Similarly, our description of rental dynamics does not include the treatment of reservation
systems, which may provide additional information about rental demand. Again, the inclusion of
reservation systems should help to improve system performance, and it will also add an additional
layer of complexity to the analysis.
One may also consider price, in addition to capacity allocation, as a mechanism for control. In
particular, an interesting case exists in which one class of customers represents national accounts,
whose prices are fixed by long-term contracts, while the other represents “rack rate” customers
for whom price may be used as a short-term control. Then the rental company may use capacity
allocation to maintain service levels for national-account clients at the same time it uses prices
to maximize profits from rack-rate customers.
The relationship between capacity allocation and fleet sizing can also be explored over a longer
time horizon. For example, consider a longer-term, discrete-time problem in which each period
represents a season. At the start of each season, rental capacity is adjusted by buying and selling
units, and during the season tactical controls, such as the ones developed in this paper, are used
to manage short-term capacity shortages. Then if the season’s expected revenues are concave
in the fleet size, it is not difficult to show that the optimal fleet-sizing policy is a “buy-up-to
/ sell-down-to” policy that is an analogue of “order-up-to” policies in the inventory literature
(Heyman and Sobel (1984)).
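A minimal sketch of such a policy (our own simplified, single-season illustration; the revenue function and prices are hypothetical): with a concave seasonal revenue R(c), a unit purchase price, and a lower unit resale price, the two critical levels follow from marginal-revenue comparisons, and the start-of-season adjustment buys up to the lower level when the fleet is below it and sells down to the higher level when the fleet is above it.

def target_levels(seasonal_revenue, buy_cost, sell_price):
    """Buy another unit while its marginal seasonal revenue covers buy_cost; sell a
    unit while its marginal revenue falls below sell_price (sell_price <= buy_cost)."""
    def marginal(c):
        return seasonal_revenue(c) - seasonal_revenue(c - 1)
    buy_up_to = 0
    while marginal(buy_up_to + 1) >= buy_cost:
        buy_up_to += 1
    sell_down_to = buy_up_to
    while marginal(sell_down_to + 1) >= sell_price:
        sell_down_to += 1
    return buy_up_to, sell_down_to

def adjust_fleet(current, buy_up_to, sell_down_to):
    """'Buy-up-to / sell-down-to' adjustment applied at the start of a season."""
    if current < buy_up_to:
        return buy_up_to
    if current > sell_down_to:
        return sell_down_to
    return current

# Hypothetical concave seasonal revenue and prices.
R = lambda c: 20.0 * c - 0.5 * c * c if c <= 20 else 200.0
low, high = target_levels(R, buy_cost=6.0, sell_price=2.0)
print(low, high, adjust_fleet(25, low, high))    # 14 18 18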
Finally, we recall that our formulation uses lump-sum penalty costs to capture the long-term
cost of denying access to customers. An alternative would be to impose service-level constraints
on the blocking probabilities of arrivals. While we believe that the current policies should be
“nearly” feasible, particularly for large systems, a thorough analysis of the relationship between
the two formulations would be of broad interest.
Acknowledgments
Research supported by NSF Grant SBR-9733739 and by the Fishman-Davidson Center for Service and Operations Management.
References
[1] Alstrup, J., S. Boas, O. B. G. Madsen, and R. V. V. Vidal, “Booking Policy for Flights with
Two Types of Passengers,” European Journal of Operational Research, 27 (1986), 274-288.
[2] Altman, E., G. Koole and T. Jiménez, “On optimal call admission control in a resource-
sharing system”, IEEE Trans. on Communications 49 (2001), 1659–1668.
[3] Belobaba, P. P. “Application of a Probabilistic Decision Model to Airline Seat Inventory
Control,” Operations Research, 37 (1989), 183-197.
[4] Bitran, G. R., and S. M. Gilbert, “Managing Hotel Reservations with Uncertain Arrivals,”
Operations Research, 44 (1996), 35-49.
[5] Heyman, D. P., and M. J. Sobel, Stochastic Models in Operations Research, vol. 2, McGraw-Hill (1984), 391-407.
[6] Ladany, S. “Bayesian Dynamic Operating Rules for Optimal Hotel Reservation,” Z. Opera-
tions Research, 21 (1977), B165-176.
[7] Liberman, V., and U. Yechiali, “On the Hotel Overbooking Problem: An Inventory System
with Stochastic Cancelations,” Management Science, 24 (1978), 1117-1126.
[8] Littlewood, K., “Forecasting and Control of Passenger Bookings,” AGIFORS Symposium
Proceedings, 12 (1972).
[9] Miller, B., “A Queueing Reward System with Several Customer Classes,” Management
Science, 16 (1969), 234-245.
[10] Örmeci, L., A. Burnetas, and J. van der Wal, “Admission Policies to a Two-class Loss System,” Stochastic Models, 17 (2001), 513–539.
[11] Porteus, E., “Conditions for Characterizing the Structure of Optimal Strategies in Infinite-
Horizon Dynamic Programs,” Journal of Optimization Theory and Applications 36 (1982),
419-432.
[12] Puterman, M.L., Discrete Stochastic Dynamic Programming, John Wiley and Sons (1994).
[13] Ross, K. W., and D. H. K. Tsang, “The Stochastic Knapsack Problem,” IEEE Transactions
on Communications, 37 (1989), 740-747.
[14] Rothstein, M., “Hotel Overbooking as a Markovian Sequential Decision Process,” Decision
Sciences, 5 (1974), 389-404.
[15] Savin, S.V., “Managing Capital Intensive Rental Businesses”, PhD Thesis, The Wharton
School, University of Pennsylvania, 2001.
[16] Weatherford, L. R. “Length of Stay Heuristics: Do They Really Make a Difference?” Cornell
Hotel and Restaurant Administration Quarterly, 36 (1995), 70-79.
[17] Williams, F. E. “Decision Theory and the Innkeeper: An Approach for Setting Hotel Reser-
vation Policy,” Interfaces, 7 (1977), 18-30.
[18] Williamson, E. L. “Airline Network Seat Inventory Control: Methodologies and Revenue
Impact,” Ph.D. Dissertation, MIT Flight Transportation Laboratory Report R92-3 (1992).
Appendix
A Relationship Between Penalties and Service-Level Constraints
It may be the case that some part or all of the rejection penalties, (π_1, π_2), represents money that
the rental company pays to customers that it cannot accommodate. Our primary motivation for
their inclusion, however, is as “good will” costs.
An alternative approach would be to impose service-level constraints on the blocking prob-
abilities of the two classes. If one dualizes the constraints then the optimal solution to the
Lagrangian relaxation yields an objective value that equals that of the original, constrained
problem. In this case, lump-sum rejection costs naturally emerge as the problem’s Lagrange
multipliers, and they capture the value of the service-level constraints (see Chapters 3 and 4 in
Altman (1999)).
Well-known results concerning this type of constrained MDP show that the inclusion of
service-level constraints causes the optimal policy to randomize its actions in at most two of
its states, one for each constraint. Thus, the form of the optimal policy changes from that
for an analogous unconstrained problem, for which deterministic policies are optimal (see Ross
(1989) and Altman (1999)). In turn, because the optimal policy for the Lagrangian relaxation
is deterministic, it may not be feasible for the constrained problem.
At the same time, there is a common class of problems in which an optimal policy for the
Lagrangian relaxation can be shown to be feasible for the original problem with constraints. In
particular, when only one of the constraints is binding – for example, when one of the classes
represents “casual” or “rack rate” customers whose long-run arrival rate is not affected by in-
cidences of blocking – the optimal policy for the constrained problem randomizes between two
stationary, deterministic policies, each of which is optimal for the Lagrangian relaxation. One of
the policies is feasible for the constrained problem but is not tight on the service-level constraint.
The other is not feasible but obtains a higher objective value. By randomizing between these
two policies, the optimal constrained policy improves upon the feasible policy and eliminates the
slack on the service-level constraint.
Furthermore, the actions of these two policies are identical in all states but one (see Sennott
(2001)). Thus, when only one of the original service-level constraints is binding, optimal policies
for the relaxation are known to be nearly identical to optimal policies with constraints. When
both constraints are binding, the theory breaks down, however.
Therefore, rather than defining service-level constraints, we define analogous dual prices, the lump-sum rejection penalties $π_1 and $π_2. This formulation allows us to maintain the analytical tractability of the problem. Furthermore, in Section 2.2 we demonstrate that the relaxation can be further simplified by directly embedding Lagrange multipliers within the rental revenues, $a_1 and $a_2.
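When the dual-price interpretation is taken literally, a penalty that meets a given service-level target can be found by a one-dimensional search. The sketch below is our own, heavily simplified illustration: it assumes the induced blocking probability is non-increasing in the penalty, and the toy_blocking function is a hypothetical stand-in for actually re-solving the MDP at each candidate penalty.

def calibrate_penalty(blocking_prob, target, pi_max=1000.0, tol=1e-6):
    """Bisection for the smallest rejection penalty whose induced blocking
    probability meets the service-level target; blocking_prob(pi) is assumed
    non-increasing in pi."""
    lo, hi = 0.0, pi_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if blocking_prob(mid) <= target:
            hi = mid
        else:
            lo = mid
    return hi

# Hypothetical, monotone stand-in for the MDP-induced blocking probability.
toy_blocking = lambda pi: 0.2 / (1.0 + pi)
print(calibrate_penalty(toy_blocking, target=0.05))    # about 3.0 for this toy curve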
B Formal Definition of the MDPs
We formally define the discounted and average-cost formulations of the allocation problem’s
MDP. In both cases, we also sketch out why there exist stationary, deterministic policies that
are optimal.
We define the system state {Ŝ_t | t = 0, 1, ...} that evolves at these event epochs as Ŝ_t = (k̂^t_1, k̂^t_2, ĝ^t_1, ĝ^t_2). Here, k̂^t_i represents the number of type-i customers currently renting equipment. Clearly 0 ≤ k̂^t_1, k̂^t_2 ≤ c, and 0 ≤ k̂^t_1 + k̂^t_2 ≤ c as well. We let ĝ^t_i ∈ {0, 1} equal 1 when the event is an arrival of a class-i customer and 0 otherwise.
Note that Ŝ_t represents the before-action state of the system at event epoch t. Alternatively, we may record the system state at transition t after action, after the system manager has decided to accept or reject an arriving customer, if one exists. We define this after-action state S_t = (k^t_1, k^t_2) to be the numbers of units being rented after the t-th decision epoch.
To analyze the discrete-time process embedded at event epochs, we uniformize the underlying continuous-time Markov chain to evolve at constant rate Γ = λ_1 + λ_2 + (µ_1 + µ_2)c. Then if the after-action state at epoch t is S_t = (k^t_1, k^t_2), we have the following set of transition probabilities:

Ŝ_{t+1} =  (k^t_1, k^t_2, 1, 0)        w.p. λ_1 / Γ,
           (k^t_1, k^t_2, 0, 1)        w.p. λ_2 / Γ,
           (k^t_1 − 1, k^t_2, 0, 0)    w.p. µ_1 k^t_1 / Γ,
           (k^t_1, k^t_2 − 1, 0, 0)    w.p. µ_2 k^t_2 / Γ,
           (k^t_1, k^t_2, 0, 0)        w.p. ( µ_1 (c − k^t_1) + µ_2 (c − k^t_2) ) / Γ.    (35)

Note that departures that drive the system occupancy to be negative occur with probability zero and that the last transition probability reflects uniformization at rate Γ.
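To illustrate (35), the next before-action state can be sampled directly from an after-action state; the short Python sketch below is our own (names and structure are ours, not from the paper):

import random

def next_state(k1, k2, c, lam1, lam2, mu1, mu2, rng=random):
    """Sample the next before-action state (k1, k2, g1, g2) from the after-action
    state (k1, k2), following the uniformized transition probabilities in (35)."""
    gamma = lam1 + lam2 + (mu1 + mu2) * c
    u = rng.random() * gamma
    if u < lam1:                      # class-1 arrival
        return (k1, k2, 1, 0)
    u -= lam1
    if u < lam2:                      # class-2 arrival
        return (k1, k2, 0, 1)
    u -= lam2
    if u < mu1 * k1:                  # class-1 service completion
        return (k1 - 1, k2, 0, 0)
    u -= mu1 * k1
    if u < mu2 * k2:                  # class-2 service completion
        return (k1, k2 - 1, 0, 0)
    return (k1, k2, 0, 0)             # fictitious transition from uniformization

print(next_state(3, 2, c=10, lam1=1.0, lam2=1.0, mu1=0.5, mu2=0.5))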
Let u_t ∈ {0, 1} denote the action taken at event epoch t. If the action is to accept an arriving customer, then u_t = 1, and if the action is to reject an arriving customer, then u_t = 0. At event epochs that represent customer departures we let u_t = 0 as well.
A policy ∆ is a set of decision rules used by the system controller when choosing whether to accept or reject an arrival at each epoch t. Define the history of the system up to event epoch t to be H_t = {(Ŝ_0, u_0), ..., (Ŝ_{t−1}, u_{t−1})} ∪ {Ŝ_t}, the record of all states and actions taken up through event epoch t. A non-anticipating policy ∆ is a rule which chooses an action u_t, possibly at random, using only the information available in H_t. We consider only such non-anticipating rules, and we denote the action taken at t under ∆ as u^∆_t. Finally, a stationary policy considers only the current state Ŝ_t when determining u_t.
Our analysis of capacity allocation rules primarily considers the maximization of expected discounted profits over an infinite horizon, and the formal results of this section are stated in this context. Let α > 0 be the continuous-time discount rate. We can always select time units so that Γ + α = 1. Then, we seek a non-anticipating policy ∆ to maximize

lim_{t→∞} Σ_{s=0}^{t} α^s E_∆ [ a_1 (k̂^s_1 + ĝ^s_1 u^∆_s) + a_2 (k̂^s_2 + ĝ^s_2 u^∆_s) − (π_1 ĝ^s_1 + π_2 ĝ^s_2)(1 − u^∆_s) ].    (36)
The fact that the state and action spaces are finite, one-period rewards and costs are station-
ary and bounded, and α<1 implies that the maximum in (36) is achieved and that there exists
a stationary, deterministic policy that is optimal (see Chapter 6 in Puterman (1994)). In turn,
this implies that we may restrict our attention to this class of policies.
We also consider the maximization of average profit per period, often referred to as the
“average cost” criterion. In particular, numerical comparisons are more transparent in this
context, since average profits per period do not depend on an initial state, and all of the numerical
results in the paper are stated in the context of average-cost problems.
In the average-cost formulation, we define the time scale so that Γ = λ_1 + λ_2 + cµ_1 + cµ_2 = 1 and the expected one-period revenue earned from renting a unit to a class-i customer is a_i ≡ a_i / Γ.
In turn, we seek a policy ∆ to maximize

lim_{t→∞} (1/t) Σ_{s=0}^{t} E_∆ [ a_1 (k̂^s_1 + ĝ^s_1 u^∆_s) + a_2 (k̂^s_2 + ĝ^s_2 u^∆_s) − (π_1 ĝ^s_1 + π_2 ĝ^s_2)(1 − u^∆_s) ].    (37)
In this case we can also restrict our analysis to that of stationary, deterministic policies. Note that, for any stationary policy, the before-action state (0, 0, 0, 0) is positive recurrent. Furthermore, under any such policy, state (0, 0, 0, 0) is accessible from all other states. Together, these facts imply that, for any stationary policy, each state that is accessible from (0, 0, 0, 0) is positive recurrent and each state that is not is transient. Thus, each stationary, deterministic policy induces a single class of recurrent states, so that the resulting problem is unichain. In addition, the system is aperiodic, since the last transition of (35) implies that with positive probability the system remains in the current state after one transition. Together with the finiteness of the state and action spaces and the stationary, bounded nature of costs and rewards, these conditions imply that the maximum in (37) is achieved and that there exists a stationary, deterministic policy that is optimal (see Chapter 8 in Puterman (1994)).
C Proofs
Proof of Theorem 1
Proof
Let v(k_1, k_2) be a solution to the original optimality equation, and let v̂(k_1, k_2) be the adjusted value function related to v through (8). We show that v̂ satisfies the adjusted optimality equation

v̂(k_1, k_2) = â_1 k_1 + â_2 k_2 + λ_1 Ĥ_1[v̂(k_1, k_2)] + λ_2 Ĥ_2[v̂(k_1, k_2)]
              + µ_1 k_1 v̂(k_1 − 1, k_2) + µ_2 k_2 v̂(k_1, k_2 − 1)
              + ((µ_1 + µ_2)c − µ_1 k_1 − µ_2 k_2) v̂(k_1, k_2),    (38)

where â_i = a_i + π_i(µ_i + α), as in (7), and

Ĥ_1[f(k_1, k_2)] = max[f(k_1, k_2), f(k_1 + 1, k_2)]  when k_1 + k_2 < c,   f(k_1, k_2)  when k_1 + k_2 = c,    (39)
Ĥ_2[f(k_1, k_2)] = max[f(k_1, k_2), f(k_1, k_2 + 1)]  when k_1 + k_2 < c,   f(k_1, k_2)  when k_1 + k_2 = c.    (40)

Using (8) to substitute for v(k_1, k_2), we observe that

H_1[v(k_1, k_2)] = H_1[ v̂(k_1, k_2) − ( (λ_1π_1 + λ_2π_2)/α + π_1 k_1 + π_2 k_2 ) ]
                 = −( (λ_1π_1 + λ_2π_2)/α + π_1(k_1 + 1) + π_2 k_2 ) + Ĥ_1[v̂(k_1, k_2)].    (41)

Similarly,

H_2[v(k_1, k_2)] = −( (λ_1π_1 + λ_2π_2)/α + π_1 k_1 + π_2(k_2 + 1) ) + Ĥ_2[v̂(k_1, k_2)].    (42)

Again, using (8) to substitute into the optimality equation for v(k_1, k_2), we obtain

v̂(k_1, k_2) − ( (λ_1π_1 + λ_2π_2)/α + π_1 k_1 + π_2 k_2 )
 = a_1 k_1 + a_2 k_2 + λ_1 Ĥ_1[v̂(k_1, k_2)] + λ_2 Ĥ_2[v̂(k_1, k_2)]
   + λ_1 ( −(λ_1π_1 + λ_2π_2)/α − π_1(k_1 + 1) − π_2 k_2 )
   + λ_2 ( −(λ_1π_1 + λ_2π_2)/α − π_1 k_1 − π_2(k_2 + 1) )
   + µ_1 k_1 ( v̂(k_1 − 1, k_2) − (λ_1π_1 + λ_2π_2)/α − π_1(k_1 − 1) − π_2 k_2 )
   + µ_2 k_2 ( v̂(k_1, k_2 − 1) − (λ_1π_1 + λ_2π_2)/α − π_1 k_1 − π_2(k_2 − 1) )
   + ((µ_1 + µ_2)c − µ_1 k_1 − µ_2 k_2)( v̂(k_1, k_2) − (λ_1π_1 + λ_2π_2)/α − π_1 k_1 − π_2 k_2 )
 = a_1 k_1 + a_2 k_2 + λ_1 Ĥ_1[v̂(k_1, k_2)] + λ_2 Ĥ_2[v̂(k_1, k_2)]
   + µ_1 k_1 v̂(k_1 − 1, k_2) + µ_2 k_2 v̂(k_1, k_2 − 1)
   + ((µ_1 + µ_2)c − µ_1 k_1 − µ_2 k_2) v̂(k_1, k_2)
   − (λ_1 + λ_2 + (µ_1 + µ_2)c) ( (λ_1π_1 + λ_2π_2)/α + π_1 k_1 + π_2 k_2 )
   − λ_1π_1 − λ_2π_2 + µ_1π_1 k_1 + µ_2π_2 k_2.    (43)

Transferring −( (λ_1π_1 + λ_2π_2)/α + π_1 k_1 + π_2 k_2 ) to the right-hand side of (43), we obtain

v̂(k_1, k_2) = a_1 k_1 + a_2 k_2 + λ_1 Ĥ_1[v̂(k_1, k_2)] + λ_2 Ĥ_2[v̂(k_1, k_2)]
              + µ_1 k_1 v̂(k_1 − 1, k_2) + µ_2 k_2 v̂(k_1, k_2 − 1)
              + ((µ_1 + µ_2)c − µ_1 k_1 − µ_2 k_2) v̂(k_1, k_2)
              + (1 − (λ_1 + λ_2 + (µ_1 + µ_2)c)) ( (λ_1π_1 + λ_2π_2)/α + π_1 k_1 + π_2 k_2 )
              − λ_1π_1 − λ_2π_2 + µ_1π_1 k_1 + µ_2π_2 k_2
            = a_1 k_1 + a_2 k_2 + λ_1 Ĥ_1[v̂(k_1, k_2)] + λ_2 Ĥ_2[v̂(k_1, k_2)]
              + µ_1 k_1 v̂(k_1 − 1, k_2) + µ_2 k_2 v̂(k_1, k_2 − 1)
              + ((µ_1 + µ_2)c − µ_1 k_1 − µ_2 k_2) v̂(k_1, k_2)
              + α ( (λ_1π_1 + λ_2π_2)/α + π_1 k_1 + π_2 k_2 )
              − λ_1π_1 − λ_2π_2 + µ_1π_1 k_1 + µ_2π_2 k_2.    (44)

Algebraic manipulation of (44) then yields

v̂(k_1, k_2) = (a_1 + π_1(α + µ_1)) k_1 + (a_2 + π_2(α + µ_2)) k_2
              + λ_1 Ĥ_1[v̂(k_1, k_2)] + λ_2 Ĥ_2[v̂(k_1, k_2)]
              + µ_1 k_1 v̂(k_1 − 1, k_2) + µ_2 k_2 v̂(k_1, k_2 − 1)
              + ((µ_1 + µ_2)c − µ_1 k_1 − µ_2 k_2) v̂(k_1, k_2),    (45)

which is (38). Finally, a comparison of (2)–(3) (with f ≡ v) with (39)–(40) (with f ≡ v̂) shows that a policy optimally accepts a customer in the original problem if and only if it accepts a customer in the transformed problem.
The Adjusted Value Iteration Formulation for the Average Cost Case

First, we define value iteration for the average cost case. As in (37) we select time units so that Γ = λ_1 + λ_2 + (µ_1 + µ_2)c = 1. Here, the result of the n-th trial of the procedure can be expressed as

V_n(k_1, k_2) = a_1 k_1 + a_2 k_2 + λ_1 H_1[V_{n−1}(k_1, k_2)] + λ_2 H_2[V_{n−1}(k_1, k_2)]
                + µ_1 k_1 V_{n−1}(k_1 − 1, k_2) + µ_2 k_2 V_{n−1}(k_1, k_2 − 1)
                + ((µ_1 + µ_2)c − µ_1 k_1 − µ_2 k_2) V_{n−1}(k_1, k_2),    (46)

where the H_i's are defined as in (2)–(3).

In this case, the same type of convergence to a value function holds, though both the necessary conditions and the statement of the result are a bit more delicate. In particular, we note that the system has finite state and action spaces and is unichain and aperiodic. Therefore, there exists an optimal policy that is stationary and deterministic, with average revenue per period V (the “gain”). Furthermore, lim_{n→∞} V_n(k_1, k_2)/n = V (see Chapter 8 in Puterman (1994)).
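A direct implementation of this recursion is straightforward. The Python sketch below is our own; it iterates the adjusted recursion (48)–(50) (adjusted rewards, zero rejection penalties), whose operators are written out explicitly below, and estimates the gain as V_n(0, 0)/n:

def adjusted_value_iteration(c, lam1, lam2, mu1, mu2, a1_hat, a2_hat, n_iter=2000):
    """Average-cost value iteration for the adjusted problem (zero penalties,
    rewards a_i_hat).  Time is scaled so that lam1 + lam2 + (mu1 + mu2)*c = 1.
    Returns the estimated gain V_n(0, 0) / n."""
    gamma = lam1 + lam2 + (mu1 + mu2) * c
    lam1, lam2, mu1, mu2 = (x / gamma for x in (lam1, lam2, mu1, mu2))
    a1_hat, a2_hat = a1_hat / gamma, a2_hat / gamma
    states = [(k1, k2) for k1 in range(c + 1) for k2 in range(c + 1) if k1 + k2 <= c]
    V = {s: 0.0 for s in states}
    for _ in range(n_iter):
        V_new = {}
        for (k1, k2) in states:
            # Adjusted operators: an arrival may be accepted only if capacity remains.
            h1 = max(V[(k1, k2)], V[(k1 + 1, k2)]) if k1 + k2 < c else V[(k1, k2)]
            h2 = max(V[(k1, k2)], V[(k1, k2 + 1)]) if k1 + k2 < c else V[(k1, k2)]
            down1 = V[(k1 - 1, k2)] if k1 > 0 else 0.0
            down2 = V[(k1, k2 - 1)] if k2 > 0 else 0.0
            V_new[(k1, k2)] = (a1_hat * k1 + a2_hat * k2 + lam1 * h1 + lam2 * h2
                               + mu1 * k1 * down1 + mu2 * k2 * down2
                               + ((mu1 + mu2) * c - mu1 * k1 - mu2 * k2) * V[(k1, k2)])
        V = V_new
    return V[(0, 0)] / n_iter

print(adjusted_value_iteration(c=5, lam1=1.0, lam2=1.0, mu1=1.0, mu2=1.0,
                               a1_hat=10.0, a2_hat=5.0))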
Similarly, we prove the average-cost analogue of Theorem 1 by analyzing the value iteration operator, rather than the value function. Formally, we state the result as follows:

Theorem A1
For any average cost problem, with (a_1, a_2, π_1, π_2), for which lim_{n→∞} V_n(k_1, k_2)/n = V, there exists an alternative formulation, with rewards â_i = a_i + µ_iπ_i, i = 1, 2, and zero penalties, for which lim_{n→∞} V̂_n(k_1, k_2)/n = V̂ and

V̂ = V + λ_1π_1 + λ_2π_2.    (47)

Proof
We let the adjusted value iteration operator T̂ be

V̂_{n+1}(k_1, k_2) = â_1 k_1 + â_2 k_2 + λ_1 Ĥ_1[V̂_n(k_1, k_2)] + λ_2 Ĥ_2[V̂_n(k_1, k_2)]
                    + µ_1 k_1 V̂_n(k_1 − 1, k_2) + µ_2 k_2 V̂_n(k_1, k_2 − 1)
                    + ((µ_1 + µ_2)c − µ_1 k_1 − µ_2 k_2) V̂_n(k_1, k_2),    (48)

where â_i = a_i + π_iµ_i is the expected adjusted class-i revenue per period, and

Ĥ_1[f(k_1, k_2)] = max[f(k_1, k_2), f(k_1 + 1, k_2)]  when k_1 + k_2 < c,   f(k_1, k_2)  when k_1 + k_2 = c,    (49)
Ĥ_2[f(k_1, k_2)] = max[f(k_1, k_2), f(k_1, k_2 + 1)]  when k_1 + k_2 < c,   f(k_1, k_2)  when k_1 + k_2 = c.    (50)

Given V_0 = 0 and V̂_0 ≡ π_1 k_1 + π_2 k_2, we will prove by induction that the relationship

V̂_n(k_1, k_2) = V_n(k_1, k_2) + n(λ_1π_1 + λ_2π_2) + π_1 k_1 + π_2 k_2    (51)

holds for all n. Then V_n + n(λ_1π_1 + λ_2π_2) ≤ V̂_n ≤ V_n + n(λ_1π_1 + λ_2π_2) + µ_1π_1 c + µ_2π_2 c for all n, and lim_{n→∞} V̂_n/n = V + λ_1π_1 + λ_2π_2.

Using (51) to substitute for V_n(k_1, k_2) in H_1, we observe that

H_1[V_n(k_1, k_2)] = H_1[ V̂_n(k_1, k_2) − ( n(λ_1π_1 + λ_2π_2) + π_1 k_1 + π_2 k_2 ) ]
                   = −n(λ_1π_1 + λ_2π_2) − π_1(k_1 + 1) − π_2 k_2 + Ĥ_1[V̂_n(k_1, k_2)].    (52)

Similarly,

H_2[V_n(k_1, k_2)] = −n(λ_1π_1 + λ_2π_2) − π_1 k_1 − π_2(k_2 + 1) + Ĥ_2[V̂_n(k_1, k_2)].    (53)

Substituting for â_i, Ĥ_i, and V̂_n on the right-hand side of (48) we have

V̂_{n+1}(k_1, k_2) = (a_1 + µ_1π_1) k_1 + (a_2 + µ_2π_2) k_2
  + λ_1 H_1[ V_n(k_1, k_2) + n(λ_1π_1 + λ_2π_2) + π_1(k_1 + 1) + π_2 k_2 ]
  + λ_2 H_2[ V_n(k_1, k_2) + n(λ_1π_1 + λ_2π_2) + π_1 k_1 + π_2(k_2 + 1) ]
  + µ_1 k_1 ( V_n(k_1 − 1, k_2) + n(λ_1π_1 + λ_2π_2) + π_1(k_1 − 1) + π_2 k_2 )
  + µ_2 k_2 ( V_n(k_1, k_2 − 1) + n(λ_1π_1 + λ_2π_2) + π_1 k_1 + π_2(k_2 − 1) )
  + ((µ_1 + µ_2)c − µ_1 k_1 − µ_2 k_2)( V_n(k_1, k_2) + n(λ_1π_1 + λ_2π_2) + π_1 k_1 + π_2 k_2 ).

Then collecting terms and using λ_1 + λ_2 + (µ_1 + µ_2)c = 1 we obtain

V̂_{n+1}(k_1, k_2) = V_{n+1}(k_1, k_2) + (n + 1)(λ_1π_1 + λ_2π_2) + π_1 k_1 + π_2 k_2.    (54)
Proof of Theorem 2
Please see Altman et al. (1998) or Savin (2001).
Proof of Theorem 3
Proof
Below we prove (11) for i = 1, since the proof for i = 2 is trivially obtained from it. By contradiction, suppose that at time 0 there are k_1 + k_2 = c − 1 customers in the system and that the optimal policy, π, rejects an arriving class-1 customer with service time t̃_0 ∼ exp(µ_1). Consider an alternative policy, π′, that accepts the class-1 customer at time 0. Furthermore, suppose that π′ follows π as closely as possible thereafter: whenever π rejects a customer, so does π′; whenever π accepts a customer, π′ attempts to accept the customer as well. The only case in which π′ rejects a customer that π accepts is the one in which blocking occurs. This blocking is due, ultimately, to the acceptance of the class-1 customer at time 0. Below we show that the expected discounted revenue is greater under π′ than under π as long as the service capacity is large enough.

Consider the possible states of the system under policies π and π′ on (0, t̃_0), just before the service completion of the customer accepted under π′ at time 0. (We will sometimes use π and π′ to denote the systems themselves.) For simplicity, below we denote the number of class-i customers in the π system, k^π_i(t | (k_1, k_2)), as k^π_i(t), and the number of class-i customers in the π′ system, k^π′_i(t | (k_1 + 1, k_2)), as k^π′_i(t). The following lemma shows that for any t ∈ (0, t̃_0), the two systems will vary by at most one customer.

Lemma A1
For all t ∈ (0, t̃_0) all but one of the customers are identical in the two systems. i) The customer admitted at time 0 to π′ does not appear in π. ii) There may be one fewer customer in π than in π′, or there may be one customer in π – of type 1 or type 2 – that does not appear in π′. iii) This implies 0 ≤ k^π′_1(t) − k^π_1(t) ≤ 1 and 0 ≤ k^π_2(t) − k^π′_2(t) ≤ 1.
Proof
We prove the lemma by induction. At time 0, π′ has one more customer, and this customer is of type 1. The other c − 1 customers are identical in both systems.

Suppose first that all but one customer are identical and an arrival occurs. If both systems accept or reject the arrival, then the systems still differ by at most one. If system π accepts the arrival but system π′ does not, then there must have been c − 1 identical customers, and system π′ was full because of the customer it accepted at time 0. Again the induction holds, this time with c − 1 identical customers and different customers occupying the c-th slot in the two systems.

Next, suppose all but one customer in both systems are identical and a departure occurs. If one of the customers common to the two systems has left, the induction holds. Otherwise the departure may be a customer that was admitted to system π but blocked from π′, in which case all but the one remain identical, and π′ is left with one more customer than π, the customer admitted at time 0. Finally, the departure may be that of the type-1 customer admitted to π′ at time zero, in which case the induction assumption holds and the stopping time t̃_0 is attained.
It follows directly from the lemma that at t̃_0, just after the customer admitted to π′ at time 0 has left, the systems will be in one of the following three states:

A_0 = { k^π_1(t̃_0) = k^π′_1(t̃_0); k^π_2(t̃_0) = k^π′_2(t̃_0) },    (55)
A_1 = { k^π_1(t̃_0) = k^π′_1(t̃_0) + 1; k^π_2(t̃_0) = k^π′_2(t̃_0) },  or    (56)
A_2 = { k^π_1(t̃_0) = k^π′_1(t̃_0); k^π_2(t̃_0) = k^π′_2(t̃_0) + 1 },    (57)

where P{A_0} + P{A_1} + P{A_2} = 1. We can also define the “blocking event” to be

B = { a customer arrival on (0, t̃_0) is blocked under π′ but not π }.
After t̃_0 policy π′ can exactly match the actions of π. Given event A_1 or A_2 occurs, however, at t̃_0 π will have one more customer in the system than π′. Given A_1 occurs, we define a second random time, t̃_1, to be the remaining service time, after t̃_0, of the extra type-1 customer in system π. Here t̃_1 ∼ exp(µ_1), independent of t̃_0. Similarly, given A_2 occurs, we define t̃_2 to be the remaining service time, after t̃_0, of the extra type-2 customer in system π, where t̃_2 ∼ exp(µ_2).

Thus, for each of the three events, we can define a random time t̃ at which the system under π′ couples with that under π: given A_0 they couple at t̃ = t̃_0; given A_1 they couple at t̃ = t̃_0 + t̃_1; and given A_2 they couple at t̃ = t̃_0 + t̃_2. Furthermore, in each of these cases we can use Lemma A1 to bound the difference in discounted revenues earned by the two systems until the coupling time. When there is no blocking in either system, policy π′ earns a_1 units of revenue more per unit of time until t̃_0, due to the extra type-1 customer taken at time 0. When there is blocking in π′, however, system π may earn a_1 or a_2 units per unit of time until t̃, depending on the type of customer blocked. A simple upper bound on the revenue lost would be ā = max(a_1, a_2) for t̃ units of time. To prove the Theorem, we will use the bounds and stopping times to show that for systems with large service capacities the expected discounted revenue until coupling is greater under π′ than under π.
Let ∆_+ be the extra discounted revenue earned on (0, t̃] in π′ from accepting the class-1 customer at time 0, let ∆_− be the discounted revenue foregone in π′ due to blocking that might occur, and let ∆ = ∆_+ − ∆_−. Then,

E[∆] = E[∆_+] − E[∆_−]
     = ∫_0^∞ ∫_0^t a_1 e^{−αs} ds dF_{t̃_0}(t)
       − ∫_0^∞ P{B | t̃_0 = t} E[∆_− | B ∩ {t̃_0 = t}] dF_{t̃_0}(t)
       − ∫_0^∞ P{B̄ | t̃_0 = t} E[∆_− | B̄ ∩ {t̃_0 = t}] dF_{t̃_0}(t)
     ≥ ∫_0^∞ ∫_0^t a_1 e^{−αs} ds dF_{t̃_0}(t)
       − ∫_0^∞ P{B | t̃_0 = t} [ ∫_0^t ā e^{−αs} ds + e^{−αt} ∫_0^∞ (ā/α)(1 − e^{−αs}) µ̄ e^{−µ̄s} ds ] dF_{t̃_0}(t),    (58)

where {B | t̃_0 = t} conditions event B on the event {t̃_0 = t}, µ̄ = min(µ_1, µ_2), the first term in the square brackets is an upper bound on the revenue lost on (0, t̃_0), and the second term in the square brackets is an upper bound on the revenue lost on (t̃_0, t̃). In (58) we have also used the fact that no revenue is lost if blocking event B does not occur. Substituting µ_1 e^{−µ_1 t} dt for dF_{t̃_0} and integrating, we have

E[∆] ≥ a_1/(µ_1 + α) − ∫_0^∞ P{B | t̃_0 = t} [ (ā/α)(1 − e^{−αt}) + e^{−αt} ā/(µ̄ + α) ] µ_1 e^{−µ_1 t} dt.    (59)
We plan to show that

∫_0^∞ e^{−(µ_1+α)t} P{B | t̃_0 = t} dt ≤ [ (λ_1 + λ_2) / ((λ_1 + λ_2 + (c − 2)µ̄)(µ_1 + α)) ] · [ 2 + (λ + 2µ̄)/(µ_1 + α) ],    (60)

and

∫_0^∞ (1 − e^{−αt}) e^{−µ_1 t} P{B | t̃_0 = t} dt ≤ [ (λ_1 + λ_2) α / ((λ_1 + λ_2 + (c − 2)µ̄) µ_1^2) ] · [ 6 + 4(λ + 2µ̄)/µ_1 ],    (61)

or, equivalently, that E[∆] ≥ 0 for c ≥ c^*_1.

We cannot directly characterize P{B}, since we do not know the details of how π and π′ behave on (0, t̃_0). Instead, we will develop an upper bound on P{B} by analyzing a simpler, well-defined system for which we prove that the probability of blocking is greater than that of the original system. We derive the system in two steps.
First, consider the probability of blocking when the complete sharing (CS) policy, which accepts all arriving customers as long as there is available capacity, is used, and let P_CS{B} be the probability of blocking on (0, t̃_0) under complete sharing. Then

Lemma A2
P_CS{B | t̃_0 = t} ≥ P{B | t̃_0 = t}.

Proof
We use a sample-path argument. Consider any sample path in which t̃_0 = t and in which blocking occurs on (0, t) under π′. In particular, consider the moment that blocking first occurs under π′. If blocking has already occurred under CS, then we are done. If blocking has not yet occurred, then under CS the system has at least as many customers as in π′, since CS has rejected none of the customers accepted under π′, and blocking also occurs under CS at this time. Therefore, whenever there is blocking under π′, there will be blocking under CS as well.
Second, we note that by conditioning on t̃_0 = t, we effectively reduce the size of the system under consideration by one unit of capacity. That is,

P_CS{B | t̃_0 = t} = P_CS{ ∃ blocking on (0, t) in a (c − 1)-server system that is full at time 0 }.

Because this probability is still difficult to analyze, we consider the following three-state Markov chain that is designed to allow us to characterize an upper bound on the probability:

M = [ 1 − p   p   0
      1 − p   0   p
        0     0   1 ],

where λ = λ_1 + λ_2 and p = λ/(λ + (c − 2)µ̄).

Let T_CS be the first time blocking occurs in the (c − 1)-server system with complete sharing and let T_M be the first time the Markov chain M passes to state 3, given it starts in state 2. Then

Lemma A3
T_CS ≥_st T_M, which implies P_CS{B | t̃_0 = t} ≤ P_M{T_M ≤ t}.

Proof
(Sketch) First, compare the original (c − 1)-server system under CS to another CS system in which all customers in service have the service rate µ̄ = min(µ_1, µ_2), rather than the original µ_1 and µ_2. By coupling the two sequences of service times, we can show that the probability that the system with the slow services (both with rate µ̄) will experience blocking by time t is greater than the probability that the original system does.

Next, consider the CS system with slow services, µ̄. The system starts out with c − 1 customers in service and experiences blocking on the first transition with probability λ/(λ + (c − 1)µ̄). If the next event is a departure, however, there are c − 2 customers in the system, and the analogous probability that the next event is an arrival (though not a blocking one) is higher, λ/(λ + (c − 2)µ̄).

Then, observe that the Markov chain M is constructed to mimic the (c − 1)-server system as follows. 1) State 1 corresponds to c − 2 customers in service, state 2 corresponds to c − 1 customers in service, and a transition to state 3 corresponds to the blocking event in the (c − 1)-server system. 2) The rate at which arrivals occur is the same in both M and the (c − 1)-server system. 3) The rate at which service completions occur is less in system M than that in the (c − 1)-server system. 4) The “occupancy” in M never drops below c − 2, which corresponds to state 1.

Thus, if we start system M in state 2, then its first passage time to state 3 is constructed to be stochastically smaller than the time to blocking in system CS with service rates µ̄. In particular, the sequence of arrivals that triggers the blocking event in CS can be coupled to that in M. Similarly, the number of departures from system M up to the blocking event in CS is no more than that in the CS system. The result follows.
For Markov chain M we obtain

Lemma A4

∫_0^∞ e^{−(µ_1+α)t} P_M(T_M ≤ t) dt ≤ [ (λ_1 + λ_2) / ((λ_1 + λ_2 + (c − 2)µ̄)(µ_1 + α)) ] · [ 2 + (λ + 2µ̄)/(µ_1 + α) ],    (62)

and

∫_0^∞ (1 − e^{−αt}) e^{−µ_1 t} P_M(T_M ≤ t) dt ≤ [ (λ_1 + λ_2) α / ((λ_1 + λ_2 + (c − 2)µ̄) µ_1^2) ] · [ 6 + 4(λ + 2µ̄)/µ_1 ].    (63)

Proof
From the definition of T_M we obtain

P_M{T_M ≤ t} = Σ_{k=1}^{∞} q_k F_E(k, Λ, t),    (64)

where F_E(k, Λ, t) = 1 − exp(−Λt) Σ_{i=0}^{k−1} (Λt)^i / i! is the degree-k Erlang CDF, Λ = λ_1 + λ_2 + cµ̄, and q_k is the probability that the Markov chain M reaches state 3 in exactly k steps starting in state 2. We note that q_1 = p, q_2 = 0 and

q_k = (1 − p) p b_{k−2},   k ≥ 3,    (65)

where b_k is the probability that M reaches state 2 in exactly k steps starting in state 1. This last probability satisfies the recursion

b_k = (1 − p) b_{k−1} + p(1 − p) b_{k−2},   k ≥ 3,    (66)

with initial conditions b_1 = p, b_2 = p(1 − p). From (66) we obtain

A = Σ_{k=1}^{∞} b_k Q^k = pQ + p(1 − p)Q^2 + (1 − p)Q (A − pQ) + p(1 − p)Q^2 A,    (67)

for any Q < 1, so that

Σ_{k=1}^{∞} b_k Q^k = pQ / ( 1 − (1 − p)(1 + pQ)Q ).    (68)
Then, from (64), (65) and (68), we obtain for any ω > 0

∫_0^∞ e^{−ωt} P_M{T_M ≤ t} dt
 = (1/ω) Σ_{k=1}^{∞} q_k ( Λ/(ω + Λ) )^k
 = (1/ω) [ p Λ/(ω + Λ) + p(1 − p) ( Λ/(ω + Λ) )^2 Σ_{k=1}^{∞} b_k ( Λ/(ω + Λ) )^k ]
 = (p/ω) [ Q(ω) + p(1 − p)Q^3(ω) / ( 1 − Q(ω) + pQ(ω) − p(1 − p)Q^2(ω) ) ]
 ≤ (p/ω) [ 1 + p/(1 − Q(ω)) ]
 ≤ (p/ω) [ 1 + (λ/ω) · ( λ + µ̄(c − 2) + ω + 2µ̄ ) / ( λ + µ̄(c − 2) ) ]
 ≤ (p/ω) [ 2 + (λ + 2µ̄)/ω ],    (69)

where Q(ω) = Λ/(ω + Λ) and Λ = λ_1 + λ_2 + cµ̄. Finally,
∫_0^∞ ( e^{−µ_1 t} − e^{−(α+µ_1)t} ) P_M{T_M ≤ t} dt ≤ α ∫_0^∞ t e^{−µ_1 t} P_M{T_M ≤ t} dt
 = −α d/dω [ (p/ω) ( Q(ω) + p(1 − p)Q^3(ω) / ( 1 − Q(ω) + pQ(ω) − p(1 − p)Q^2(ω) ) ) ] |_{ω=µ_1}
 = α (p/ω^2) ( Q(ω) + p(1 − p)Q^3(ω) / ( 1 − Q(ω) + pQ(ω) − p(1 − p)Q^2(ω) ) ) |_{ω=µ_1}
   + α (pQ(ω)/ω^2) ( 1 − Q(ω) + p(1 − p)Q^2(ω)(1 − Q(ω))( 3 − Q(ω)(2 + pQ(ω))(1 − p) ) / ( 1 − Q(ω) + pQ(ω) − p(1 − p)Q^2(ω) )^2 ) |_{ω=µ_1}
 ≤ α (p/µ_1^2) ( 2 + 4p/(1 − Q(µ_1)) )
 ≤ α (p/µ_1^2) ( 6 + 4(λ + 2µ̄)/µ_1 ).    (70)

Now, (60) and (61) are obtained by combining the results of Lemmas A2, A3 and A4. This completes the theorem's proof.
Proof of Theorem 4
Proof
Here we will prove (12) for i = 1, since the proof for i = 2 can be obtained from it by a simple exchange of indices. Consider the class of functions F^* defined on the set S such that each member f(k_1, k_2) of this class is a submodular function satisfying the following relations:

f(k_1, k_2) − f(k_1 + 1, k_2) ≤ 0,   k_1 + k_2 = c − 1,    (71)
f(k_1 + 1, k_2) − f(k_1, k_2) ≤ a_1/µ_1,   k_1 + k_2 + 1 ≤ c, and    (72)
f(k_1, k_2 + 1) − f(k_1 + 1, k_2) ≤ a_1/λ_2,   k_1 + k_2 + 1 ≤ c.    (73)

Because of the submodularity of f(k_1, k_2), (71) is, in fact, valid for every pair (k_1, k_2) ∈ S. Below we show that F^* is closed under T if condition (12) is satisfied. Indeed, using the expected discounted profit optimality equation for k_1 + k_2 + 1 = c, we obtain

Tf(k_1, k_2) − Tf(k_1 + 1, k_2)
 = −a_1 + λ_2 ( max[f(k_1, k_2), f(k_1, k_2 + 1)] − f(k_1 + 1, k_2) )
   + µ_1 k_1 ( f(k_1 − 1, k_2) − f(k_1, k_2) )
   + µ_2 k_2 ( f(k_1, k_2 − 1) − f(k_1 + 1, k_2 − 1) )
   + ((µ_1 + µ_2)c − µ_1(k_1 + 1) − µ_2 k_2)( f(k_1, k_2) − f(k_1 + 1, k_2) )
 ≤ 0.    (74)

Also, for any (k_1, k_2) ∈ S such that k_1 + k_2 + 1 < c, we have

Tf(k_1 + 1, k_2) − Tf(k_1, k_2)
 = a_1 + λ_1 ( f(k_1 + 2, k_2) − f(k_1 + 1, k_2) )
   + λ_2 ( max[f(k_1 + 1, k_2), f(k_1 + 1, k_2 + 1)] − max[f(k_1, k_2), f(k_1, k_2 + 1)] )
   + µ_1 k_1 ( f(k_1, k_2) − f(k_1 − 1, k_2) ) + µ_2 k_2 ( f(k_1 + 1, k_2 − 1) − f(k_1, k_2 − 1) )
   + ((µ_1 + µ_2)c − µ_1(k_1 + 1) − µ_2 k_2)( f(k_1 + 1, k_2) − f(k_1, k_2) )
 ≤ (λ_1 + λ_2 + (µ_1 + µ_2)c) a_1/µ_1 ≤ a_1/µ_1.    (75)

For the case k_1 + k_2 + 1 = c we obtain

Tf(k_1 + 1, k_2) − Tf(k_1, k_2)
 = a_1 + λ_2 ( f(k_1 + 1, k_2) − max[f(k_1, k_2), f(k_1, k_2 + 1)] )
   + µ_1 k_1 ( f(k_1, k_2) − f(k_1 − 1, k_2) )
   + µ_2 k_2 ( f(k_1 + 1, k_2 − 1) − f(k_1, k_2 − 1) )
   + ((µ_1 + µ_2)c − µ_1(k_1 + 1) − µ_2 k_2)( f(k_1 + 1, k_2) − f(k_1, k_2) )
 ≤ a_1/µ_1.    (76)

Further, considering Tf(k_1, k_2 + 1) − Tf(k_1 + 1, k_2) for the case k_1 + k_2 + 1 < c, we obtain

Tf(k_1, k_2 + 1) − Tf(k_1 + 1, k_2)
 = a_2 − a_1 + λ_1 ( f(k_1 + 1, k_2 + 1) − f(k_1 + 2, k_2) )
   + λ_2 ( max[f(k_1, k_2 + 1), f(k_1, k_2 + 2)] − max[f(k_1 + 1, k_2), f(k_1 + 1, k_2 + 1)] )
   + µ_1 k_1 ( f(k_1 − 1, k_2 + 1) − f(k_1, k_2) ) + µ_2 k_2 ( f(k_1, k_2) − f(k_1 + 1, k_2 − 1) )
   + ((µ_1 + µ_2)c − µ_1 k_1 − µ_2(k_2 + 1))( f(k_1, k_2 + 1) − f(k_1 + 1, k_2) )
   + (µ_1 − µ_2)( f(k_1 + 1, k_2) − f(k_1, k_2) ).    (77)

Now, if µ_1 ≤ µ_2, then

Tf(k_1, k_2 + 1) − Tf(k_1 + 1, k_2) ≤ a_2 − a_1 + (λ_1 + λ_2 + (µ_1 + µ_2)c) a_1/λ_2 − µ_2 a_1/λ_2 ≤ a_1/λ_2    (78)

whenever

a_1 ≥ ( λ_2/(λ_2 + µ_1) ) a_2.    (79)

If, on the other hand, µ_1 > µ_2, then

Tf(k_1, k_2 + 1) − Tf(k_1 + 1, k_2) ≤ a_2 − a_1 + (λ_1 + λ_2 + (µ_1 + µ_2)c) a_1/λ_2 − µ_2 a_1/λ_2 + (µ_1 − µ_2) a_1/µ_1 ≤ a_1/λ_2    (80)

for

a_1/µ_1 ≥ ( λ_2/(λ_2 + µ_1) ) (a_2/µ_2).    (81)

The proof for the case when k_1 + k_2 + 1 = c is easily obtained from the above arguments.
Proof of Theorem 5
Proof
Here we consider the only non-trivial case, ρ_1 ≥ c, so that k_FAT < c is optimal. Suppose the initial state is k = 0. Then differentiating (23) with respect to k_FAT, we obtain

∂R^FAT_α(k, k_FAT)/∂k_FAT
 = [ (ρ_1 + ρ_2 − k_FAT)^{α/µ − 1} / ( (µ + α)(ρ_1 + ρ_2 − k)^{α/µ} ) ]
   × [ a_2 ρ_2 + a_1 (ρ_1 − c)^{α/µ + 1} / (ρ_1 − k_FAT)^{α/µ} − (ρ_1 + ρ_2 − k_FAT) a_1 (ρ_1 − c)^{α/µ + 1} / (ρ_1 − k_FAT)^{α/µ + 1} ]
 = [ ρ_2 (ρ_1 + ρ_2 − k_FAT)^{α/µ − 1} / ( (µ + α)(ρ_1 + ρ_2 − k)^{α/µ} ) ] [ a_2 − a_1 ( (ρ_1 − c)/(ρ_1 − k_FAT) )^{(α+µ)/µ} ].    (82)

In turn, solving the first-order condition, ∂R^FAT_α(k, k_FAT)/∂k_FAT = 0, for k_FAT provides

k^* = c − (ρ_1 − c) [ (a_1/a_2)^{µ/(µ+α)} − 1 ].

Furthermore, from (82) it can be seen that ∂R^FAT_α(k, k_FAT)/∂k_FAT < 0 for all k_FAT > k^* and ∂R^FAT_α(k, k_FAT)/∂k_FAT > 0 for all k_FAT < k^*. Thus for k = 0 the optimal threshold k^*_FAT = k^* if and only if ρ_1 ( 1 − (a_2/a_1)^{µ/(µ+α)} ) ≤ c < ρ_1. For c below this range, k^*_FAT = 0 is optimal.

We also claim that the optimal threshold, k^*_FAT, is independent of the starting state, k, so that the argument above, stated for k = 0, holds for all k ∈ [0, c]. First, note that the expression for k^* is independent of k. Thus, for all k < k^*_FAT the differentiation by which k^* was obtained is well-defined and k^*_FAT is optimal.

For k ≥ k^*_FAT we prove the claim by contradiction. Suppose there exists a starting state k_1 > k^*_FAT with optimal threshold k^1_FAT ≠ k^*_FAT. If k^1_FAT < k_1 then, without loss of generality, we can redefine k^1_FAT to be k^*_FAT, since from (24) we see that whenever k > k_FAT discounted revenues do not depend on k_FAT. Otherwise k_1 < k^1_FAT and, given the optimality of k^1_FAT, the following non-threshold policy earns higher discounted revenues than the optimal k^*_FAT threshold policy: when k(t) ∈ [0, k^*_FAT], accept both class-1 and class-2 customers; then when k(t) ∈ [k^*_FAT, k_1], accept only class-1 customers; then when k(t) ∈ [k_1, k^1_FAT], accept both class-1 and class-2 customers; then when k(t) ∈ [k^1_FAT, c], accept only class-1 customers; finally, after k(t) hits c, process according to the FAT policy. But if this non-threshold policy earns higher discounted revenues than a FAT policy with threshold k^*_FAT, then a FAT policy with threshold k^*_FAT + (k^1_FAT − k_1) would also earn higher discounted revenues, and this contradicts the optimality of the k^*_FAT policy for k = 0.
Proof of Theorem 6
Proof
Here we focus on proving part b), since (32) is obtained by substituting (31) into (23) and (24). Consider 0 < k ≤ c_min. By taking first and second derivatives of (32) with respect to c for any k ≤ c, we observe that the optimal fluid revenue function is an increasing, piecewise concave function of c; that is, it is concave in each of the intervals k ≤ c < c_min, c_min ≤ c < ρ_1, ρ_1 ≤ c < ρ_1 + ρ_2, and ρ_1 + ρ_2 ≤ c. In addition,

∂R^FAT_α(k, k^*_FAT(c))/∂c (c = c_min − 0) ≥ ∂R^FAT_α(k, k^*_FAT(c))/∂c (c = c_min + 0),
∂R^FAT_α(k, k^*_FAT(c))/∂c (c = ρ_1 − 0) ≥ ∂R^FAT_α(k, k^*_FAT(c))/∂c (c = ρ_1 + 0),
∂R^FAT_α(k, k^*_FAT(c))/∂c (c = ρ_1 + ρ_2 − 0) ≥ ∂R^FAT_α(k, k^*_FAT(c))/∂c (c = ρ_1 + ρ_2 + 0),

which ensures overall concavity. In exactly the same way, monotonicity and concavity of (32) with respect to c ≥ k is demonstrated for c_min < k ≤ ρ_1, ρ_1 < k ≤ ρ_1 + ρ_2, and ρ_1 + ρ_2 ≤ k.
Proof of Theorem 7
Proof
It is well known that, under complete sharing, the average revenue per period is increasing and concave in the fleet size (see Messerli (1972)). Then statements a) and e) follow from the definitions of h^*_max and h^*_min and from the concavity of R(c, ∆^*(c)) and R(c, CS(c)). From the definitions of h^CS_max and h^*_max it follows that for values of h between h^CS_max and h^*_max we have c^*(h) ≥ 1 > c^CS(h) = 0, and d) follows. Similarly, for h^*_min < h < h^CS_min, we have c^CS(h) ≥ c̄ > c̄ − 1 ≥ c^*(h), and we obtain b). Finally, c) follows from b) and d) and the piecewise continuity of c^CS(h) and c^*(h).
Additional References Cited in the Appendices
A1. Altman, E., Constrained Markov Decision Processes, Chapman and Hall (1999).
A2. Messerli, E. J., “Proof of a Convexity Property of the Erlang B Formula”, The Bell System
Technical Journal, 51 (1972), 951-953.
A3. Sennott, L.I., “Computing Average Optimal Constrained Policies in Stochastic Dynamic
Programming,” Probability in the Engineering and Informational Sciences, 15 (2001), 103-
133.