Approximating Optimal Policies for Agents with Limited Execution Resources

IJCAI International Joint Conference on Artificial Intelligence; 03/2004
Source: CiteSeer


An agent with limited consumable execution resources needs policies that attempt to achieve good performance while respecting these limitations. Otherwise, an agent (such as a plane) might fail catastrophically (crash) when it runs out of resources (fuel) at the wrong time (in midair). We present a new approach to constructing policies for agents with limited execution resources that builds on principles of real-time AI, as well as research in constrained Markov decision processes. Specifically, we formulate, solve, and analyze the policy optimization problem where constraints are imposed on the probability of exceeding the resource limits. We describe and empirically evaluate our solution technique to show that it is computationally reasonable, and that it generates policies that sacrifice some potential reward in order to make the kinds of precise guarantees about the probability of resource overutilization that are crucial for mission-critical applications.
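For concreteness, the constrained-MDP optimization described above can be sketched as a linear program over state-action occupancy measures, with the chance constraint on resource overutilization replaced by a linear surrogate. The sketch below is illustrative only: the toy MDP and all names (P, R, C, gamma, mu, c_hat, theta) are assumptions, not the paper's notation, and the Markov-inequality relaxation is one simple way (discussed in the citing work below) to make the constraint linear.

```python
# A minimal sketch (not the paper's implementation) of the constrained-MDP idea:
# optimize a randomized policy via a linear program over state-action occupancy
# measures x(s, a), and replace the chance constraint
#     P(total resource use >= c_hat) <= theta
# with its Markov-inequality surrogate  E[total resource use] <= theta * c_hat.
# The MDP below (P, R, C, gamma, mu, c_hat, theta) is an assumed toy example.
import numpy as np
from scipy.optimize import linprog

S, A = 3, 2                                   # tiny state / action spaces
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(S), size=(S, A))    # P[s, a, s'] transition probabilities
R = rng.uniform(0.0, 1.0, size=(S, A))
R[:, 1] += 1.0                                # action 1 is more rewarding ...
C = np.zeros((S, A))
C[:, 1] = 1.0                                 # ... but consumes one unit of resource
gamma = 0.95                                  # discount factor
mu = np.full(S, 1.0 / S)                      # initial state distribution
c_hat, theta = 10.0, 0.3                      # resource limit and risk tolerance

n = S * A                                     # one LP variable per (s, a) pair
# Flow conservation: sum_a x(s',a) - gamma * sum_{s,a} P(s'|s,a) x(s,a) = mu(s')
A_eq = np.zeros((S, n))
for sp in range(S):
    for s in range(S):
        for a in range(A):
            A_eq[sp, s * A + a] = float(s == sp) - gamma * P[s, a, sp]

# Markov-inequality surrogate of the chance constraint (linear in x).
A_ub = C.reshape(1, n)
b_ub = np.array([theta * c_hat])

# linprog minimizes, so negate rewards to maximize expected discounted reward.
res = linprog(-R.reshape(n), A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=mu)
assert res.success
x = res.x.reshape(S, A)
policy = x / x.sum(axis=1, keepdims=True)     # pi(a | s) proportional to x(s, a)
print(np.round(policy, 3))
```

Tightening theta trades expected reward for a stronger, if conservative, guarantee on the probability of overutilization, which is the trade-off the abstract describes.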

  • Source
    • "However, to our best knowledge, MDPs with hard constraints have yet been studied. Attempts were made to solve MDP with soft constraints (Dolgov & Durfee 2003) but they focused on discrete-time MDP problems with positive costs and presented approximate solutions. We study the hard constrained (HC) problem in the sMDP domain for better modelling power where the sojourn time between states is continuous and random. "
    ABSTRACT: In multiple criteria Markov Decision Processes (MDP) where multiple costs are incurred at every decision point, current methods solve them by minimising the expected primary cost criterion while constraining the expectations of other cost criteria to some critical values. However, systems are often faced with hard constraints where the cost criteria should never exceed some critical values at any time, rather than constraints based on the expected cost criteria. For example, a resource-limited sensor network no longer functions once its energy is depleted. Based on the semi-MDP (sMDP) model, we study the hard constrained (HC) problem in continuous time, state and action spaces with respect to both finite and infinite horizons, and various cost criteria. We show that the HCsMDP problem is NP-hard and that there exists an equivalent discrete-time MDP to every HCsMDP. Hence, classical methods such as reinforcement learning can solve HCsMDPs.
    Proceedings, The Twenty-First National Conference on Artificial Intelligence and the Eighteenth Innovative Applications of Artificial Intelligence Conference, July 16-20, 2006, Boston, Massachusetts, USA; 01/2006
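One standard way an equivalence between a hard-constrained problem and an ordinary discrete-time MDP can be realized is by folding the remaining resource budget into the state, so that overspending becomes an absorbing failure. The sketch below illustrates this general idea only; it is not necessarily the construction used in the cited paper, and the function name and integer-budget discretization are assumptions.

```python
# A minimal illustration (not necessarily the cited paper's construction) of
# folding a hard resource constraint into an ordinary MDP: augment each state
# with the remaining (discretized, integer) budget, and send any transition
# that would overspend to an absorbing zero-reward failure state.
import numpy as np

def augment_with_budget(P, R, cost, budget):
    """Build an unconstrained MDP over augmented states (s, remaining budget).

    P[s, a, s'] -- base transition probabilities
    R[s, a]     -- base rewards
    cost[s, a]  -- nonnegative integer resource consumption
    budget      -- initial integer budget

    Augmented state index: (s, b) -> s * (budget + 1) + b; the last index is
    an absorbing 'failed' state entered whenever the budget would be exceeded.
    """
    S, A = R.shape
    B = budget + 1
    N = S * B + 1
    failed = N - 1
    P_aug = np.zeros((N, A, N))
    R_aug = np.zeros((N, A))
    P_aug[failed, :, failed] = 1.0            # failure is absorbing, reward 0
    for s in range(S):
        for b in range(B):
            i = s * B + b
            for a in range(A):
                c = int(cost[s, a])
                if c > b:                     # hard constraint would be violated
                    P_aug[i, a, failed] = 1.0
                else:
                    R_aug[i, a] = R[s, a]
                    for sp in range(S):
                        P_aug[i, a, sp * B + (b - c)] += P[s, a, sp]
    return P_aug, R_aug
```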
  • Source
    • "For some domains that involve resources whose overutilization can have dire consequences, it might not be sufficient to bound the expected consumption of a resource, and more expressive risk-sensitive constraints might be required (Ross & Chen 1988; Sobel 1985). In particular, it might be desirable to bound the probability that the resource consumption exceeds a given upper bound (Dolgov & Durfee 2003; 2004). "
    ABSTRACT: The problem of optimal policy formulation for teams of resource-limited agents in stochastic environments is composed of two strongly-coupled subproblems: a resource allocation problem and a policy optimization problem. We show how to combine the two problems into a single constrained optimization problem that yields optimal resource allocations and policies that are optimal under these allocations. We model the system as a multiagent Markov decision process (MDP), with social welfare of the group as the optimization criterion. The straightforward approach of modeling both the resource allocation and the actual operation of the agents as a multiagent MDP on the joint state and action spaces of all agents is not feasible, because of the exponential increase in the size of the state space. As an alternative, we describe a technique that exploits problem structure by recognizing that agents are only loosely-coupled via the shared resource constraints. This allows us to formulate a constrained policy optimization problem that yields optimal policies among the class of realizable ones given the shared resource limitations. Although our complexity analysis shows the constrained optimization problem to be NP-complete, our results demonstrate that, by exploiting problem structure and via a reduction to a mixed integer program, we are able to solve problems orders of magnitude larger than what is possible using a traditional multiagent MDP formulation.
    Proceedings of the Fourteenth International Conference on Automated Planning and Scheduling (ICAPS 2004), June 3-7 2004, Whistler, British Columbia, Canada; 01/2004
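A rough schematic of the kind of mixed integer program such a reduction yields (the notation below is assumed for illustration and is not taken from the paper): each agent m retains its own occupancy-measure LP, agents are coupled only through shared resource constraints, and binary variables Δ_mk indicate whether agent m is allocated resource k.

```latex
\begin{align*}
\max_{x,\,\Delta}\quad
  & \sum_{m}\sum_{s,a} r_m(s,a)\, x_m(s,a) \\
\text{s.t.}\quad
  & \sum_{a} x_m(s',a) - \gamma \sum_{s,a} p_m(s' \mid s,a)\, x_m(s,a) = \mu_m(s')
      && \forall m,\ s' \quad \text{(per-agent policy LP)} \\
  & x_m(s,a) \le M\, \Delta_{mk}
      && \forall m,\ \forall (s,a)\ \text{requiring resource}\ k \quad \text{(linking)} \\
  & \sum_{m} \kappa_k\, \Delta_{mk} \le \widehat{\kappa}_k
      && \forall k \quad \text{(shared resource limits)} \\
  & x_m(s,a) \ge 0, \qquad \Delta_{mk} \in \{0,1\}
\end{align*}
```

The binary allocation variables are what make the program a mixed integer program rather than a pure LP, consistent with the NP-completeness noted in the abstract.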
  • Source
    • "which allows one to express È ´ ¼ µ as a linear function of the occupancy measure Ü. Our investigation [5] of this approximation showed, unsurprisingly, that this linear approximation is computationally cheap but usually leads to suboptimal policies, because the Markov inequality provides a very rough upper bound on the probability that the total cost exceed a given limit ¼ . The purpose of this work is to improve this approximation. "
    ABSTRACT: The majority of the work in the area of Markov decision processes has focused on expected values of rewards in the objective function and expected costs in the constraints. Although several methods have been proposed to model risk-sensitive utility functions and constraints, they are only applicable to certain classes of utility functions and allow limited expressiveness in the constraints. We propose a construction that extends the standard linear programming formulation of MDPs by augmenting it with additional optimization variables, which allows us to compute the higher order moments of the total costs (and/or reward). This greatly increases the expressive power of the model, and supports reasoning about the probability distributions of the total costs (reward). Consequently, this allows us to formulate more interesting constraints and to model a wide range of utility functions. In particular, in this work we show how to formulate the constraint that bounds the probability of the total incurred costs falling within a given range. Constraints of that type arise, for example, when one needs to bound the probability of overutilizing a consumable resource. Our construction, which greatly increases the expressive power of our model, unfortunately comes at the cost of significantly increasing the size and the complexity of the optimization program. On the other hand, it allows one to choose how many higher order moments of the costs (and/or reward) are modeled as a means of balancing accuracy against computational effort.
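The excerpt and abstract above refer to the Markov-inequality linearization and its refinement via higher-order moments of the total cost C. As a sketch of the general idea (not the paper's exact construction), the bounds involved look like this:

```latex
\Pr(C \ge \widehat{c}) \;\le\; \frac{\mathbb{E}[C]}{\widehat{c}}
  \;=\; \frac{1}{\widehat{c}} \sum_{s,a} c(s,a)\, x(s,a)
  \qquad \text{(Markov inequality; linear in the occupancy measure } x\text{)}
```

and, once higher moments of C are available as additional optimization variables, tighter but no-longer-linear bounds such as

```latex
\Pr\bigl(\lvert C - \mathbb{E}[C] \rvert \ge t\bigr) \;\le\; \frac{\operatorname{Var}(C)}{t^{2}},
\qquad
\Pr(C \ge \widehat{c}) \;\le\; \frac{\mathbb{E}[C^{k}]}{\widehat{c}^{\,k}} \quad (\text{for } C \ge 0)
```

become expressible, which is exactly the accuracy-versus-program-size trade-off described in the abstract above.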