Conference Paper

Bounding Procedures for Stochastic Dynamic Programs with Application to the Perimeter Patrol Problem

DOI: 10.1109/ACC.2012.6314780 Conference: American Control Conference, Montreal, QC, Canada
Source: DBLP

ABSTRACT

One often encounters the curse of dimensionality when applying dynamic programming to determine optimal policies for controlled Markov chains. In this paper, we provide a linear programming based method for constructing sub-optimal policies, along with a bound on the deviation of such a policy from the optimum. The state space is partitioned and the optimal cost-to-go, or value function, is approximated by a constant over each partition. By minimizing a positive cost function defined on the partitions, one can construct an approximate value function which is also an upper bound for the optimal value function of the original Markov Decision Process (MDP). As a key result, we show that this approximate value function is independent of the positive cost function (or state-dependent weights, as they are referred to in the literature) and, moreover, that it is the least upper bound one can obtain once the partitions are specified. We apply the linear programming approach to a perimeter surveillance stochastic optimal control problem whose structure enables efficient computation of the upper bound.
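
For concreteness, the partition-restricted linear program described above can be written schematically as follows, here in the discounted payoff-maximization form used for the perimeter patrol application. The notation (blocks S_1, ..., S_K, weights c, per-stage payoff r, discount factor α, transition kernel P) is generic and chosen only for illustration; it is not taken verbatim from the paper.

\begin{align*}
\min_{v \in \mathbb{R}^{K}} \quad & \sum_{k=1}^{K} \Big( \sum_{x \in S_k} c(x) \Big)\, v_k, \qquad c(x) > 0, \\
\text{subject to} \quad & v_{k(x)} \;\ge\; r(x,u) + \alpha \sum_{y} P(y \mid x, u)\, v_{k(y)} \quad \text{for every state } x \text{ and admissible action } u,
\end{align*}

where k(x) denotes the block of the partition containing state x. Any feasible v defines a partition-constant function Ṽ(x) = v_{k(x)} satisfying the Bellman inequality, hence Ṽ ≥ V*; the paper's key result is that the minimizer is the least such upper bound and does not depend on the choice of the positive weights c.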

  • "An added advantage of the partitioning technique is that it allows symmetries present in the problem to be exploited. A lower bound can be computed as the solution of a disjunctive linear program as in [15]. However, for the special case when ψ_s(x_s) = ‖x_s‖_∞, as we have noted in Lemma 1, the cardinality of the set T(k, u) is one."
    ABSTRACT: This paper deals with the development of sub-optimal decision making algorithms for a collection of robots in order to aid a remotely located human operator in the task of classifying incursions across a perimeter in a surveillance application. The operator is tasked with classifying each incursion as either a nuisance or a threat. Whenever there is an incursion into the perimeter, Unattended Ground Sensors (UGS) raise an alert, and the robots service the alerts by visiting the alert location, collecting evidence in the form of video and other images, and transmitting them to the operator. There are two competing needs for a robot: it needs to spend more time at an alert location to aid the operator in accurate classification, and it needs to service the alerts as quickly as possible so that the evidence collected is relevant. A natural problem is to determine the optimal amount of time a robot must spend servicing an alert. In this paper, we discretize the problem spatially and temporally and recast the optimization problem as follows: is it better, in terms of maximizing the expected discounted payoff, for a robot to spend the next time interval at the alert location? The payoff associated with a state is an increasing function of the time spent by a robot servicing an alert and a decreasing function of the number of unserviced alerts. This problem can easily be cast as a Markov Decision Process (MDP). However, the number of states runs into billions even for a modest-sized problem. We consider Approximate Dynamic Programming via linear programming, as this approach provides an upper (and lower) bound on the optimal expected discounted payoff and enables the construction of a sub-optimal policy. The bounds may then be used to estimate the quality of the sub-optimal policy employed. We also provide a computationally tractable way of computing the lower bound using linear programming. Finally, numerical results supporting our method are provided.
    Full-text · Conference Paper · Oct 2012
  • ABSTRACT: One encounters the curse of dimensionality in the application of dynamic programming to determine optimal policies for large scale controlled Markov chains. In this article, we consider a perimeter patrol stochastic optimal control problem. To determine the optimal control policy, one has to solve a Markov decision problem, whose large size renders exact dynamic programming methods intractable. So, we propose a state aggregation based approximate linear programming method to construct provably good sub-optimal policies instead. The state-space is partitioned and the optimal cost-to-go or value function is restricted to be a constant over each partition. We show that the resulting restricted system of linear inequalities embeds a family of Markov chains of lower dimension, one of which can be used to construct a tight lower bound on the optimal value function. In general, the construction of the lower bound requires the solution to a combinatorial problem. But the perimeter patrol problem exhibits a special structure that enables a tractable linear programming formulation for the lower bound. We demonstrate this and also provide numerical results that corroborate the efficacy of the proposed methodology.
    Full-text · Conference Paper · Jun 2012
  • ABSTRACT: A common approximate dynamic programming method entails state partitioning and the use of linear programming, i.e., the state-space is partitioned and the optimal value function is approximated by a constant over each partition. By minimizing a positive cost function defined on the partitions, one can construct an upper bound for the optimal value function. We show that this approximate value function is independent of the positive cost function and that it is the least upper bound, given the partitions.
    No preview · Article · Nov 2012 · Operations Research Letters
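
As a hedged illustration of the bounding procedure described in the abstracts above, the following self-contained sketch builds a tiny synthetic payoff-maximization MDP, solves the partition-restricted approximate LP with two different sets of positive weights, and compares the resulting upper bound with the exact value function obtained by value iteration. The MDP data, the partition, and names such as alp_upper_bound are invented for the example (numpy and scipy are assumed available); only the LP structure follows the papers' description, including a numerical check of the claimed weight-independence of the bound.

import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)

# Tiny synthetic payoff-maximization MDP (all data invented for illustration).
n_states, n_actions, alpha = 6, 2, 0.9
P = [rng.dirichlet(np.ones(n_states), size=n_states) for _ in range(n_actions)]  # P[u][x, y]
r = rng.uniform(0.0, 1.0, size=(n_states, n_actions))                            # r[x, u]

# Partition of the state space into blocks; the value function is forced to be
# constant on each block (states 0-2 -> block 0, states 3-5 -> block 1).
part = np.array([0, 0, 0, 1, 1, 1])
n_blocks = int(part.max()) + 1

def alp_upper_bound(c):
    """Solve  min sum_x c(x) v[part(x)]  s.t.  v[part(x)] >= r(x,u) + alpha*sum_y P(y|x,u) v[part(y)]."""
    A_ub, b_ub = [], []
    for x in range(n_states):
        for u in range(n_actions):
            row = np.zeros(n_blocks)
            row[part[x]] -= 1.0
            for y in range(n_states):
                row[part[y]] += alpha * P[u][x, y]
            A_ub.append(row)      # row @ v <= -r[x, u] is the Bellman inequality rearranged
            b_ub.append(-r[x, u])
    obj = np.zeros(n_blocks)
    for x in range(n_states):
        obj[part[x]] += c[x]      # positive state-dependent weights
    res = linprog(obj, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  bounds=[(None, None)] * n_blocks, method="highs")
    return res.x

# Exact optimal value function by value iteration, for comparison.
V = np.zeros(n_states)
for _ in range(3000):
    Q = r + alpha * np.column_stack([P[u] @ V for u in range(n_actions)])
    V = Q.max(axis=1)

v_uniform = alp_upper_bound(np.ones(n_states))
v_random = alp_upper_bound(rng.uniform(0.5, 2.0, size=n_states))

print("exact V* (value iteration):", np.round(V, 3))
print("ALP upper bound, c = 1    :", np.round(v_uniform[part], 3))
print("ALP upper bound, random c :", np.round(v_random[part], 3))  # same bound: independent of c

With singleton blocks (one state per partition) the LP reproduces V* exactly; coarser partitions trade bound tightness for a much smaller program, which is the point of the aggregation.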