# Approximating Optimal Policies for Agents with Limited Execution Resources

### Full-text

Edmund H. Durfee, Sep 11, 2013
##### Conference Paper: Hard Constrained Semi-Markov Decision Processes.


**ABSTRACT:** In multiple-criteria Markov decision processes (MDPs), where multiple costs are incurred at every decision point, current methods minimise the expected primary cost criterion while constraining the expectations of the other cost criteria to some critical values. However, systems are often faced with hard constraints, where the cost criteria should never exceed their critical values at any time, rather than constraints on the expected cost criteria. For example, a resource-limited sensor network no longer functions once its energy is depleted. Based on the semi-MDP (sMDP) model, we study the hard-constrained (HC) problem in continuous time, state, and action spaces with respect to both finite and infinite horizons and various cost criteria. We show that the HCsMDP problem is NP-hard and that there exists an equivalent discrete-time MDP for every HCsMDP. Hence, classical methods such as reinforcement learning can solve HCsMDPs.

Proceedings of the Twenty-First National Conference on Artificial Intelligence and the Eighteenth Innovative Applications of Artificial Intelligence Conference, July 16-20, 2006, Boston, Massachusetts, USA; 01/2006
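The key reduction in the abstract above (a hard-constrained problem becomes an ordinary MDP) can be illustrated by folding the remaining resource budget into the state, so that any action whose cost would exceed the budget is simply infeasible. The following is a minimal sketch of that idea with made-up toy numbers; it is not the paper's exact construction, which handles continuous time and spaces.

```python
# Hedged sketch: hard resource constraints via state augmentation.
# Idea: augment the state with the remaining (integer) budget, forbid
# actions whose cost exceeds it, then run plain finite-horizon value
# iteration on the augmented MDP. All numbers are illustrative.

def solve_hard_constrained(n_states, budget, transitions, reward, cost, horizon):
    """Finite-horizon value iteration on augmented states (s, remaining).

    transitions[s][a] -> list of (prob, next_state)
    reward[s][a] -> float; cost[s][a] -> nonnegative int (resource used).
    Because over-budget actions are never available, the hard constraint
    holds on every trajectory, not just in expectation.
    """
    V = {(s, b): 0.0 for s in range(n_states) for b in range(budget + 1)}
    for _ in range(horizon):
        newV = {}
        for s in range(n_states):
            for b in range(budget + 1):
                vals = [0.0]  # stopping (taking no action) is always allowed
                for a, succ in enumerate(transitions[s]):
                    c = cost[s][a]
                    if c > b:          # hard constraint: action infeasible
                        continue
                    vals.append(reward[s][a] +
                                sum(p * V[(s2, b - c)] for p, s2 in succ))
                newV[(s, b)] = max(vals)
        V = newV
    return V

# Toy 2-state example (hypothetical values):
transitions = [
    [[(1.0, 1)], [(1.0, 0)]],   # state 0: two actions
    [[(1.0, 0)]],               # state 1: one action
]
reward = [[5.0, 1.0], [2.0]]
cost = [[2, 1], [1]]
V = solve_hard_constrained(2, budget=3, transitions=transitions,
                           reward=reward, cost=cost, horizon=2)
```

With a budget of 3 and horizon 2, the best plan from state 0 spends 2 units on the high-reward action and 1 unit afterwards; a plan costing more than 3 units is never even enumerated.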
##### Conference Paper: Optimal Resource Allocation and Policy Formulation in Loosely-Coupled Markov Decision Processes.


**ABSTRACT:** The problem of optimal policy formulation for teams of resource-limited agents in stochastic environments is composed of two strongly coupled subproblems: a resource allocation problem and a policy optimization problem. We show how to combine the two into a single constrained optimization problem that yields optimal resource allocations, together with policies that are optimal under those allocations. We model the system as a multiagent Markov decision process (MDP), with the social welfare of the group as the optimization criterion. The straightforward approach of modeling both the resource allocation and the actual operation of the agents as a multiagent MDP on the joint state and action spaces of all agents is not feasible because of the exponential increase in the size of the state space. As an alternative, we describe a technique that exploits problem structure by recognizing that the agents are only loosely coupled via the shared resource constraints. This allows us to formulate a constrained policy optimization problem that yields optimal policies among the class of policies realizable under the shared resource limitations. Although our complexity analysis shows the constrained optimization problem to be NP-complete, our results demonstrate that, by exploiting problem structure and via a reduction to a mixed integer program, we are able to solve problems orders of magnitude larger than is possible with a traditional multiagent MDP formulation.

Proceedings of the Fourteenth International Conference on Automated Planning and Scheduling (ICAPS 2004), June 3-7, 2004, Whistler, British Columbia, Canada; 01/2004
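The loose-coupling observation above can be made concrete with a toy decomposition: if agents interact only through a shared budget, each agent's optimal value under every possible allocation can be tabulated independently, and the allocation itself chosen by a knapsack-style dynamic program. This is a hedged simplification of the paper's single mixed-integer program (which solves allocation and policies jointly); the `values` table and budget below are made up.

```python
# Hedged sketch of exploiting loose coupling: agents share only a
# resource budget, so welfare maximization decomposes into per-agent
# value tables plus one allocation DP. Toy numbers throughout.

def allocate(values, budget):
    """values[i][r] = agent i's optimal policy value given r resource units.
    Returns (best total welfare, per-agent allocation)."""
    n = len(values)
    # best[i][b] = max welfare achievable by agents i.. with b units left
    best = [[0.0] * (budget + 1) for _ in range(n + 1)]
    choice = [[0] * (budget + 1) for _ in range(n)]
    for i in range(n - 1, -1, -1):
        for b in range(budget + 1):
            for r in range(min(b, len(values[i]) - 1) + 1):
                v = values[i][r] + best[i + 1][b - r]
                if v > best[i][b]:
                    best[i][b] = v
                    choice[i][b] = r
    # Recover the allocation that achieves best[0][budget].
    alloc, b = [], budget
    for i in range(n):
        r = choice[i][b]
        alloc.append(r)
        b -= r
    return best[0][budget], alloc

# Two agents, up to 2 units each, shared budget of 3 (hypothetical values):
values = [[0, 3, 4], [0, 2, 5]]
total, alloc = allocate(values, 3)
```

The DP touches only `n * budget^2` cells instead of the joint state space, which is the flavor of savings the paper obtains; the real problem is NP-complete because policies and allocations must be optimized together, hence the MIP reduction.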

**ABSTRACT:** The majority of the work on Markov decision processes has focused on expected values of rewards in the objective function and expected costs in the constraints. Although several methods have been proposed to model risk-sensitive utility functions and constraints, they apply only to certain classes of utility functions and allow limited expressiveness in the constraints. We propose a construction that extends the standard linear programming formulation of MDPs by augmenting it with additional optimization variables, which allows us to compute the higher-order moments of the total costs (and/or reward). This greatly increases the expressive power of the model and supports reasoning about the probability distributions of the total costs (reward). Consequently, it allows us to formulate more interesting constraints and to model a wide range of utility functions. In particular, in this work we show how to formulate a constraint that bounds the probability of the total incurred costs falling within a given range. Constraints of that type arise, for example, when one needs to bound the probability of over-utilizing a consumable resource. This added expressiveness unfortunately comes at the cost of a significant increase in the size and complexity of the optimization program. On the other hand, it allows one to choose how many higher-order moments of the costs (and/or reward) are modeled, as a means of balancing accuracy against computational effort.
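To see why moments support reasoning about cost distributions, consider the simpler setting of a fixed policy on an absorbing chain: the first two moments of the total cost satisfy linear recursions and can be obtained by solving two linear systems. The sketch below is only this moment recursion, not the paper's augmented LP (which optimizes over policies); the one-state example is made up.

```python
import numpy as np

# Hedged illustration: first and second moments of the total cost C
# (until absorption) under a fixed policy, via two linear solves.
# E[C]   = c + P E[C']                      ->  (I - P) m1 = c
# E[C^2] = c^2 + 2 c (P m1) + P E[C'^2]    ->  (I - P) m2 = c^2 + 2 c (P m1)

def cost_moments(P, c):
    """P: transient-to-transient transitions (row deficits = absorption
    probability); c: per-step cost vector. Returns (E[C], E[C^2])."""
    n = len(c)
    I = np.eye(n)
    m1 = np.linalg.solve(I - P, c)
    rhs = c**2 + 2 * c * (P @ m1)
    m2 = np.linalg.solve(I - P, rhs)
    return m1, m2

# One transient state, absorbed with prob 0.5 each step, unit cost per
# step: the total cost is Geometric(0.5), so E[C] = 2 and E[C^2] = 6.
P = np.array([[0.5]])
c = np.array([1.0])
m1, m2 = cost_moments(P, c)
```

From the two moments, the variance is `m2 - m1**2` (here 2), and a Chebyshev-type inequality already bounds the probability of the total cost landing outside a range around its mean; the paper's contribution is encoding such moment variables, for arbitrarily many moments, inside the MDP's optimization program itself.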