Leonore Winterer’s research while affiliated with University of Freiburg and other places


Publications (9)


Strong Simple Policies for POMDPs
  • Article
  • Full-text available

June 2024 · 39 Reads · International Journal on Software Tools for Technology Transfer

Leonore Winterer · [...] · Bernd Becker

The synthesis problem for partially observable Markov decision processes (POMDPs) is to compute a policy that provably adheres to one or more specifications. Yet, the general problem is undecidable, and policies require full (and thus potentially unbounded) traces of execution history. To provide good approximations of such policies, POMDP agents often employ randomization over action choices. We consider the problem of computing simpler policies for POMDPs, and provide several approaches that still ensure their expressiveness. Key aspects are (1) the combination of an arbitrary number of specifications the policies need to adhere to, (2) a restricted form of randomization, and (3) a lightweight preprocessing of the POMDP model to encode memory. We provide a novel encoding as a mixed-integer linear program as a baseline to solve the underlying problems. Our experiments demonstrate that the policies we obtain are more robust, smaller, and easier to implement for an engineer than those obtained from state-of-the-art POMDP solvers.
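For orientation, one common shape for such a baseline MILP, sketched here for a single reachability specification (notation assumed: observations z from a set Z, observation function O, target states T, initial state s_0), is roughly:

maximize    p_{s_0}
subject to  \sum_{a} \sigma_{z,a} = 1                                          for every observation z, with \sigma_{z,a} \in \{0,1\} and 0 \le p_s \le 1,
            p_s = 1                                                            for every s \in T,
            p_s \le \sum_{s'} P(s,a,s') \, p_{s'} + (1 - \sigma_{O(s),a})      for every s \notin T and every action a.

A specification Pr(◇ T) ≥ λ then becomes the linear constraint p_{s_0} ≥ λ, and several specifications simply contribute several such constraints. The encodings in the article are richer than this sketch; in particular, extra care is needed to rule out spurious solutions in end components that never reach T, and to support the restricted randomization and memory preprocessing mentioned above.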


Strengthening Deterministic Policies for POMDPs

August 2020 · 36 Reads · 6 Citations · Lecture Notes in Computer Science

The synthesis problem for partially observable Markov decision processes (POMDPs) is to compute a policy that satisfies a given specification. Such policies have to take the full execution history of a POMDP into account, rendering the problem undecidable in general. A common approach is to use a limited amount of memory and randomize over potential choices. Yet, this problem is still NP-hard and often computationally intractable in practice. A restricted problem is to use neither history nor randomization, yielding policies that are called stationary and deterministic. Previous approaches to compute such policies employ mixed-integer linear programming (MILP). We provide a novel MILP encoding that supports sophisticated specifications in the form of temporal logic constraints. It is able to handle an arbitrary number of such specifications. Yet, randomization and memory are often mandatory to achieve satisfactory policies. First, we extend our encoding to deliver a restricted class of randomized policies. Second, based on the results of the original MILP, we employ a preprocessing of the POMDP to encompass memory-based decisions. The advantages of our approach over state-of-the-art POMDP solvers lie (1) in the flexibility to strengthen simple deterministic policies without losing computational tractability and (2) in the ability to enforce the provable satisfaction of arbitrarily many specifications. The latter point allows taking trade-offs between performance and safety aspects of typical POMDP examples into account. We show the effectiveness of our method on a broad range of benchmarks.
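To make this concrete, the following Python sketch encodes a deterministic, observation-based policy for a small invented POMDP with a single reachability objective, using the PuLP modelling library and its bundled CBC solver. The toy model, all names, and all probabilities are illustrative only; the encoding in the paper is considerably richer (temporal-logic specifications, restricted randomization, memory) and adds safeguards that this sketch omits.

import pulp

states = ["init", "left", "right", "goal", "bad"]
actions = ["a", "b"]
obs = {"init": "start", "left": "mid", "right": "mid", "goal": "goal", "bad": "bad"}

# P[s][a] = list of (successor, probability); 'goal' and 'bad' are absorbing
P = {
    "init":  {"a": [("left", 0.5), ("right", 0.5)], "b": [("left", 0.5), ("right", 0.5)]},
    "left":  {"a": [("goal", 0.9), ("bad", 0.1)],   "b": [("goal", 0.2), ("bad", 0.8)]},
    "right": {"a": [("goal", 0.3), ("bad", 0.7)],   "b": [("goal", 0.8), ("bad", 0.2)]},
    "goal":  {"a": [("goal", 1.0)], "b": [("goal", 1.0)]},
    "bad":   {"a": [("bad", 1.0)],  "b": [("bad", 1.0)]},
}
observations = sorted(set(obs.values()))

m = pulp.LpProblem("pomdp_deterministic_policy", pulp.LpMaximize)

# sigma[z, a] = 1 iff the policy picks action a under observation z
sigma = pulp.LpVariable.dicts("sigma", [(z, a) for z in observations for a in actions], cat="Binary")
# p[s] = probability of reaching 'goal' from s under the chosen policy
p = pulp.LpVariable.dicts("p", states, lowBound=0, upBound=1)

for z in observations:
    m += pulp.lpSum(sigma[(z, a)] for a in actions) == 1      # exactly one action per observation

m += p["goal"] == 1
m += p["bad"] == 0
for s in ["init", "left", "right"]:
    for a in actions:
        # big-M linearisation (M = 1): the Bellman bound is only active for the chosen action
        m += p[s] <= pulp.lpSum(pr * p[t] for t, pr in P[s][a]) + (1 - sigma[(obs[s], a)])

# maximising the sum is exact here because 'goal' and 'bad' are the only end components
m += pulp.lpSum(p[s] for s in states)
m.solve(pulp.PULP_CBC_CMD(msg=False))

print("P(reach goal | init) =", pulp.value(p["init"]))
print({z: a for z in observations for a in actions if sigma[(z, a)].value() > 0.5})

Because the two middle states share the observation "mid", the policy must choose one action for both; the solver picks action "a" and reports a reachability probability of 0.6 from "init", whereas a fully observable agent could achieve 0.85.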


Strengthening Deterministic Policies for POMDPs

July 2020 · 26 Reads

The synthesis problem for partially observable Markov decision processes (POMDPs) is to compute a policy that satisfies a given specification. Such policies have to take the full execution history of a POMDP into account, rendering the problem undecidable in general. A common approach is to use a limited amount of memory and randomize over potential choices. Yet, this problem is still NP-hard and often computationally intractable in practice. A restricted problem is to use neither history nor randomization, yielding policies that are called stationary and deterministic. Previous approaches to compute such policies employ mixed-integer linear programming (MILP). We provide a novel MILP encoding that supports sophisticated specifications in the form of temporal logic constraints. It is able to handle an arbitrary number of such specifications. Yet, randomization and memory are often mandatory to achieve satisfactory policies. First, we extend our encoding to deliver a restricted class of randomized policies. Second, based on the results of the original MILP, we employ a preprocessing of the POMDP to encompass memory-based decisions. The advantages of our approach over state-of-the-art POMDP solvers lie (1) in the flexibility to strengthen simple deterministic policies without losing computational tractability and (2) in the ability to enforce the provable satisfaction of arbitrarily many specifications. The latter point allows taking trade-offs between performance and safety aspects of typical POMDP examples into account. We show the effectiveness of our method on a broad range of benchmarks.
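The "preprocessing of the POMDP to encompass memory-based decisions" can be pictured as a product construction; a rough sketch, assuming a memory set N = {0, ..., k-1}: the unfolded POMDP has states S \times N, observations O'((s,n)) = (O(s), n), and actions Act \times N (the second component being the next memory value), with transitions

P'\big((s,n), (a,n'), (s',n')\big) = P(s,a,s').

A stationary, observation-based policy for this product corresponds to a finite-state controller with k memory states for the original POMDP, so an encoding such as the MILP sketched above can be reused unchanged on the larger model.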


Strategy Synthesis for POMDPs in Robot Planning via Game-Based Abstractions

April 2020 · 30 Reads · 15 Citations · IEEE Transactions on Automatic Control

Leonore Winterer · [...] · Bernd Becker

We study synthesis problems with constraints in partially observable Markov decision processes (POMDPs), where the objective is to compute a strategy for an agent that is guaranteed to satisfy certain safety and performance specifications. Verification and strategy synthesis for POMDPs are, however, computationally intractable in general. We alleviate this difficulty by focusing on planning applications and exploiting typical structural properties of such scenarios; for instance, we assume that the agent has the ability to observe its own position inside an environment. We propose an abstraction refinement framework which turns such a POMDP model into a (fully observable) probabilistic two-player game (PG). For the obtained PGs, efficient verification and synthesis tools make it possible to determine strategies with optimal safety and performance measures, which approximate optimal schedulers on the POMDP. If the approximation is too coarse to satisfy the given specifications, a refinement scheme improves the computed strategies. As a running example, we use planning problems where an agent moves inside an environment with randomly moving obstacles and restricted observability. We demonstrate that the proposed method advances the state of the art by solving problems several orders of magnitude larger than those that can be handled by existing POMDP solvers. Furthermore, this method gives guarantees on safety constraints, which is not supported by the majority of the existing solvers.
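To illustrate the abstraction step, here is a small Python sketch on an invented corridor example: POMDP states are (agent cell, obstacle cell) pairs, the agent observes only its own cell, and the obstacle moves randomly. The abstraction groups concrete states by observation and lets an adversarial environment player choose which concrete state of that class holds. All names and the toy dynamics are illustrative; the construction in the article tracks more structure and is combined with the refinement scheme described in the abstract.

from collections import defaultdict

CELLS = [0, 1, 2, 3]                              # a 1-D corridor
ACTIONS = {"left": -1, "right": +1, "stay": 0}

def step(cell, delta):                            # movement clamped to the corridor
    return min(max(cell + delta, CELLS[0]), CELLS[-1])

def observe(state):                               # the agent observes only its own cell
    return state[0]

def pomdp_transition(state, action):
    """(agent, obstacle) -> distribution over successor states."""
    agent, obstacle = state
    dist = defaultdict(float)
    for agent2, p_move in ((step(agent, ACTIONS[action]), 0.8), (agent, 0.2)):  # movement may slip
        for d in (-1, +1):                        # the obstacle moves left or right uniformly
            dist[(agent2, step(obstacle, d))] += p_move * 0.5
    return dict(dist)

states = [(a, o) for a in CELLS for o in CELLS]

# Abstract probabilistic two-player game:
#  - player 1 (the agent) owns the observation classes and picks an action,
#  - player 2 (the environment) then picks which concrete state of that class holds,
#  - the probabilistic outcome is projected back onto observation classes.
player1_vertices = sorted({observe(s) for s in states})
player2_vertices = defaultdict(set)               # (observation, action) -> possible concrete states
game_transitions = {}                             # (concrete state, action) -> distribution over observations

for s in states:
    for a in ACTIONS:
        player2_vertices[(observe(s), a)].add(s)
        dist = defaultdict(float)
        for s2, pr in pomdp_transition(s, a).items():
            dist[observe(s2)] += pr
        game_transitions[(s, a)] = dict(dist)

print(player2_vertices[(1, "right")])             # environment choices when the agent sits in cell 1
print(game_transitions[((1, 2), "right")])        # e.g. {2: 0.8, 1: 0.2}

Since the environment player resolves the hidden obstacle position adversarially, a strategy that satisfies a safety specification in the game also satisfies it in the POMDP, which is, roughly, how the guarantees mentioned in the abstract are obtained.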


Correct-by-construction policies for POMDPs

April 2019 · 36 Reads · 3 Citations

In this extended abstract, we discuss how to compute policies with finite memory---so-called finite-state controllers (FSCs)---for partially observable Markov decision processes (POMDPs) that are provably correct with respect to given specifications. In particular, for a POMDP M and a specification φ, we want to solve the decision problem whether there is a policy σ for M with k memory states, such that φ is satisfied by M and σ (M^σ |= φ). The underlying method is achieved via a marriage of formal verification and artificial intelligence. The key insight is that computing (randomized) FSCs on POMDPs is equivalent to---and computationally as hard as---synthesis for parametric Markov chains (pMCs). The parameter synthesis problem is to decide whether for a parametric Markov chain (pMC) M and a specification φ there is a parameter instantiation u such that in the Markov chain (MC) induced by u the specification is satisfied (M[u] |= φ). The correspondence---depicted in Figure 1---allows us to utilize efficient tools for synthesis in pMCs to compute correct-by-construction FSCs on POMDPs.
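For the memoryless case (k = 1 memory state) the correspondence can be sketched as follows, with one parameter p_{z,a} per observation z and action a (notation assumed): the induced pMC D has transition probabilities

P_D(s,s') = \sum_{a} p_{O(s),a} \cdot P(s,a,s'), \qquad \text{with } \sum_{a} p_{z,a} = 1 \text{ and } p_{z,a} \ge 0 \text{ for every observation } z.

A randomized memoryless policy σ with M^σ |= φ exists iff there is a well-defined instantiation u of these parameters with D[u] |= φ; for k > 1 memory states the same construction is applied to a memory-unfolded copy of the POMDP.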


[Figure] Fig. 2: Grid for SC4. The cameras observe the shaded area.
Motion planning under partial observability using game-based abstraction

December 2017 · 63 Reads · 21 Citations


[Table] Table 2: Benchmarks (a) Instances.
[Figure] Fig. 5: Unfolding a POMDP for two memory states.
Permissive Finite-State Controllers of POMDPs using Parameter Synthesis

October 2017 · 138 Reads · 94 Citations

We study finite-state controllers (FSCs) for partially observable Markov decision processes (POMDPs). The key insight is that computing (randomized) FSCs on POMDPs is equivalent to synthesis for parametric Markov chains (pMCs). This correspondence enables using parameter synthesis techniques to compute FSCs for POMDPs in a black-box fashion. We investigate how typical restrictions on parameter values affect the quality of the obtained FSCs. Permissive strategies for POMDPs are obtained as regions of parameter values, a natural output of parameter synthesis techniques. Experimental evaluation on several POMDP benchmarks shows promising results.
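A minimal, hand-rolled sketch of this pMC view, using a toy POMDP of the same shape as the one in the MILP sketch earlier on this page and the sympy library for the symbolic computation; the names and the final substitution are illustrative, and the paper relies on dedicated parameter-synthesis tools rather than a symbolic linear solve:

import sympy as sp

states = ["s0", "s1", "s2", "goal", "bad"]
actions = ["a", "b"]
obs = {"s0": "start", "s1": "mid", "s2": "mid", "goal": "goal", "bad": "bad"}

P = {  # POMDP transition probabilities P[s][a][s']
    "s0":   {"a": {"s1": sp.Rational(1, 2), "s2": sp.Rational(1, 2)},
             "b": {"s1": sp.Rational(1, 2), "s2": sp.Rational(1, 2)}},
    "s1":   {"a": {"goal": sp.Rational(9, 10), "bad": sp.Rational(1, 10)},
             "b": {"goal": sp.Rational(2, 10), "bad": sp.Rational(8, 10)}},
    "s2":   {"a": {"goal": sp.Rational(3, 10), "bad": sp.Rational(7, 10)},
             "b": {"goal": sp.Rational(8, 10), "bad": sp.Rational(2, 10)}},
    "goal": {"a": {"goal": 1}, "b": {"goal": 1}},
    "bad":  {"a": {"bad": 1},  "b": {"bad": 1}},
}

# One parameter per (observation, action); each observation's parameters must sum to 1.
param = {(z, a): sp.Symbol(f"p_{z}_{a}", nonnegative=True)
         for z in set(obs.values()) for a in actions}

# Induced parametric Markov chain: P'(s, s') = sum_a p_{O(s), a} * P(s, a, s')
pmc = {s: {} for s in states}
for s in states:
    for a in actions:
        for s2, pr in P[s][a].items():
            pmc[s][s2] = pmc[s].get(s2, 0) + param[(obs[s], a)] * pr

# Symbolic reachability of 'goal': solve x_s = sum_{s'} P'(s, s') * x_{s'}
x = {s: sp.Symbol(f"x_{s}") for s in states}
eqs = [sp.Eq(x["goal"], 1), sp.Eq(x["bad"], 0)]
eqs += [sp.Eq(x[s], sum(pr * x[s2] for s2, pr in pmc[s].items())) for s in ["s0", "s1", "s2"]]
sol = sp.solve(eqs, list(x.values()), dict=True)[0]

# Eliminate the simplex constraints p_{z,b} = 1 - p_{z,a}
reach = sp.simplify(sol[x["s0"]].subs({param[("mid", "b")]: 1 - param[("mid", "a")],
                                       param[("start", "b")]: 1 - param[("start", "a")]}))
print(reach)   # reachability of 'goal' as a function of the controller parameters

For this toy model the printed expression simplifies to 1/2 + p_mid_a/10, so a specification such as Pr(reach goal) ≥ 0.55 is satisfied exactly on the parameter region p_mid_a ≥ 1/2; such regions are the permissive strategies mentioned in the abstract.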


[Figure] Fig. 1: Schematic overview of the approach.
[Figure] Fig. 2: Grid for SC4. The cameras observe the shaded area.
Motion Planning under Partial Observability using Game-Based Abstraction

August 2017 · 133 Reads · 6 Citations

We study motion planning problems where agents move inside environments that are not fully observable and subject to uncertainties. The goal is to compute a strategy for an agent that is guaranteed to satisfy certain safety and performance specifications. Such problems are naturally modelled by partially observable Markov decision processes (POMDPs). Because of the potentially huge or even infinite belief space of POMDPs, verification and strategy synthesis are in general computationally intractable. We tackle this difficulty by exploiting typical structural properties of such scenarios; for instance, we assume that agents have the ability to observe their own positions inside an environment. Ambiguity in the state of the environment is abstracted into non-deterministic choices over the possible states of the environment. Technically, this abstraction transforms POMDPs into probabilistic two-player games (PGs). For these PGs, efficient verification tools are able to determine strategies that approximate certain measures on the POMDP. If an approximation is too coarse to provide guarantees, an abstraction refinement scheme further resolves the belief space of the POMDP. We demonstrate that our method improves the state of the art by orders of magnitude compared to a direct solution of the POMDP.


Citations (6)


... Our contributions. This article is an extension of [58]. In comparison to our earlier work, we added a section on [...] (footnote: https://www.gurobi.com/). ...

Reference:

Strong Simple Policies for POMDPs
Strengthening Deterministic Policies for POMDPs
  • Citing Chapter
  • August 2020

Lecture Notes in Computer Science

... Game-based abstraction. Game-based abstraction was primarily explored towards the verification of very large or infinite MDPs [27,36]. [58] is conceptually close to our game. However, for POMDPs, the abstraction cannot use the specific structure of our problem, and for refinement towards a robust policy, it can (and must) add memory to the policies, whereas we aim to find a concise policy tree via divide-and-conquer. ...

Strategy Synthesis for POMDPs in Robot Planning via Game-Based Abstractions
  • Citing Article
  • April 2020

IEEE Transactions on Automatic Control

... Similarly, if computational complexity, rather than decidability, is the limiting factor for finding or creating effective risk management policies or plans, then further restricting the systems, environments, policies, and objectives considered may restore tractability. For example, it is often possible to design restricted policies (such as limited-memory controllers for POMDPs) for which behaviors can be guaranteed to conform with desired specifications within a limited set of allowed specifications (Jansen et al., 2019). Similarly, there are many restricted but useful classes of discrete, continuous, and hybrid dynamic systems for which risk analysis questions can be answered, including questions of reachability, controllability, and decision optimization that are undecidable for less restricted systems (Lafferriere, Pappas, & Yovine, 1999). ...

Correct-by-construction policies for POMDPs
  • Citing Conference Paper
  • April 2019

... Analysis under Partial Observation. Partially observable Markov decision processes (POMDPs) extend MDPs with partial observability and have many applications in fields such as AI, scheduling and planning (e.g., [Grady et al. 2015;Jagannathan et al. 2013;Patil et al. 2014;Winterer et al. 2017]). The formal verification community has studied POMDPs and presented theoretical results, e.g., [Chatterjee et al. 2013a;Chatterjee and Doyen 2014], however, progress on developing practical methods has been limited. ...

Motion planning under partial observability using game-based abstraction

... These approaches represent policies as a set of α-vectors. For policy search, alternatives are to search for randomised policies via gradient descent (Heck et al., 2022) or via convex optimisation (Junges et al., 2018;Cubuktepe et al., 2021). Recent approaches extract FSCs via deep reinforcement learning (Carr, Jansen, & Topcu, 2021). ...

Permissive Finite-State Controllers of POMDPs using Parameter Synthesis

... Previously proposed methods to solve the problem are e.g. to use approximate value iteration [22], optimisation and search techniques [1,12], dynamic programming [6], Monte Carlo simulation [43], game-based abstraction [51], and machine learning [13,14,19]. Other approaches restrict the memory size of the policies [35]. ...

Motion Planning under Partial Observability using Game-Based Abstraction