Fabio Somenzi’s research while affiliated with University of Colorado Boulder and other places


Publications (281)


Multi-Agent Reinforcement Learning for Alternating-Time Logic
  • Chapter

October 2024

·

8 Reads

Ernst Moritz Hahn

·

Mateo Perez

·

Sven Schewe

·

[...]

·

Dominik Wojtczak

Alternating-time temporal logic (ATL) extends branching time logic by enabling quantification over paths that result from the strategic choices made by multiple agents in various coalitions within the system. While classical temporal logics express properties of “closed” systems, ATL can express properties of “open” systems resulting from interactions among several agents. Reinforcement learning (RL) is a sampling-based approach to decision-making where learning agents, guided by a scalar reward function, discover optimal policies through repeated interactions with the environment. The challenge of translating high-level objectives into scalar rewards for RL has garnered increased interest, particularly following the success of model-free RL algorithms. This paper presents an approach for deploying model-free RL to verify multi-agent systems against ATL specifications. The key contribution of this paper is a verification procedure for model-free RL of quantitative and non-nested classic ATL properties, based on Q-learning, demonstrated on a natural subclass of non-nested ATL formulas.
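
The paper's verification procedure is not reproduced here. As a rough, assumed illustration of the kind of machinery the abstract describes (Q-learning over strategic interaction), the sketch below runs tabular Q-learning on a made-up turn-based reach-avoid game in which a protagonist maximizes the probability of reaching a goal state while an adversary minimizes it; the game dynamics, reward, and hyper-parameters are all invented for illustration and are not the paper's construction.

```python
import random

N_STATES, GOAL, TRAP, ACTIONS = 6, 5, 0, (0, 1)

def step(state, action):
    """Toy dynamics: action 0 moves left, action 1 moves right, 10% slip."""
    move = -1 if action == 0 else 1
    if random.random() < 0.1:
        move = -move
    nxt = min(max(state + move, 0), N_STATES - 1)
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt in (GOAL, TRAP)

# Q-values indexed by (position, player-to-move): player 0 is the protagonist
# (maximizer), player 1 the adversary (minimizer).
Q = {((s, p), a): 0.0 for s in range(N_STATES) for p in (0, 1) for a in ACTIONS}
alpha, gamma, epsilon = 0.1, 0.95, 0.2

for _ in range(5000):
    s, p, done = 2, 0, False
    while not done:
        opt = max if p == 0 else min
        a = (random.choice(ACTIONS) if random.random() < epsilon
             else opt(ACTIONS, key=lambda a: Q[((s, p), a)]))
        nxt, r, done = step(s, a)
        nxt_p = 1 - p
        # Backup uses max at protagonist states and min at adversary states.
        best_next = 0.0 if done else (max if nxt_p == 0 else min)(
            Q[((nxt, nxt_p), b)] for b in ACTIONS)
        Q[((s, p), a)] += alpha * (r + gamma * best_next - Q[((s, p), a)])
        s, p = nxt, nxt_p

print({s: max(Q[((s, 0), a)] for a in ACTIONS) for s in range(N_STATES)})
```

Indexing the Q-table by (position, player-to-move) keeps the maximizing and minimizing backups separate, mirroring how alternation is resolved in turn-based games.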


Figure captions: the FST from the proof of Theorem 1, simulating the transition function of a Turing machine over a binary alphabet (Fig. 3); automata used to represent even and odd equivalence classes (Fig. 5); reward curves for the token passing case study (Fig. 6); execution of the optimal policy for the duplicating pebbles case study (Fig. 7); reward curve for the duplicating pebbles case study (Fig. 8).


Regular Reinforcement Learning
  • Chapter
  • Full-text available

July 2024

·

148 Reads

In reinforcement learning, an agent incrementally refines a behavioral policy through a series of episodic interactions with its environment. This process can be characterized as explicit reinforcement learning, as it deals with explicit states and concrete transitions. Building upon the concept of symbolic model checking, we propose a symbolic variant of reinforcement learning, in which sets of states are represented through predicates and transitions are represented by predicate transformers. Drawing inspiration from regular model checking, we choose regular languages over the states as our predicates, and rational transductions as predicate transformations. We refer to this framework as regular reinforcement learning, and study its utility as a symbolic approach to reinforcement learning. Theoretically, we establish results around decidability, approximability, and efficient learnability in the context of regular reinforcement learning. Towards practical applications, we develop a deep regular reinforcement learning algorithm, enabled by the use of graph neural networks. We showcase the applicability and effectiveness of (deep) regular reinforcement learning through empirical evaluation on a diverse set of case studies.
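
As a toy illustration of the symbolic ingredients named above, the sketch below represents a predicate as a DFA over states encoded as strings and applies a letter-to-letter transducer as a one-step predicate transformer. The DFA, the transducer, and the sample states are assumptions made up for this example, not artifacts from the paper.

```python
# Predicate: the regular language of strings over {'a','b'} with an even
# number of 'a', encoded as a DFA.
DFA = {("even", "a"): "odd", ("even", "b"): "even",
       ("odd", "a"): "even", ("odd", "b"): "odd"}
ACCEPTING = {"even"}

def in_predicate(word):
    state = "even"
    for ch in word:
        state = DFA[(state, ch)]
    return state in ACCEPTING

def transduce(word):
    """Letter-to-letter transducer: swap 'a' and 'b' (a rational function)."""
    return "".join("b" if ch == "a" else "a" for ch in word)

samples = ["ab", "aab", "bbb", "abab"]
for w in samples:
    print(w, "->", transduce(w), "in predicate after step:",
          in_predicate(transduce(w)))
```

A full regular-model-checking pipeline would compute the image of the whole language under the transduction rather than test individual samples; this snippet only shows the membership side of that picture.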


Assume-Guarantee Reinforcement Learning

March 2024

·

14 Reads

·

1 Citation

Proceedings of the AAAI Conference on Artificial Intelligence

We present a modular approach to reinforcement learning (RL) in environments consisting of simpler components evolving in parallel. A monolithic view of such modular environments may be prohibitively large to learn, or may require unrealizable communication between the components in the form of a centralized controller. Our proposed approach is based on the assume-guarantee paradigm where the optimal control for the individual components is synthesized in isolation by making assumptions about the behaviors of neighboring components, and providing guarantees about their own behavior. We express these assume-guarantee contracts as regular languages and provide automatic translations to scalar rewards to be used in RL. By combining local probabilities of satisfaction for each component, we provide a lower bound on the probability of satisfaction of the complete system. By solving a Markov game for each component, RL can produce a controller for each component that maximizes this lower bound. The controller utilizes the information it receives through communication, observations, and any knowledge of a coarse model of other agents. We experimentally demonstrate the efficiency of the proposed approach on a variety of case studies.
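
For a feel of how local results might roll up into a global figure, the toy computation below multiplies per-component satisfaction probabilities into a conservative bound. The component names and numbers are hypothetical, and the paper's actual composition rule for assume-guarantee contracts may be structured differently.

```python
# Illustrative only: one simple way local satisfaction probabilities could be
# combined into a global lower bound is a product over components.
local_probs = {"traffic_light": 0.98, "intersection_ctrl": 0.95, "sensor": 0.99}

lower_bound = 1.0
for name, p in local_probs.items():
    lower_bound *= p

print(f"lower bound on global satisfaction: {lower_bound:.4f}")  # ~0.9217
```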


Omega-Regular Decision Processes

March 2024

·

3 Reads

Proceedings of the AAAI Conference on Artificial Intelligence

Regular decision processes (RDPs) are a subclass of non-Markovian decision processes where the transition and reward functions are guarded by some regular property of the past (a lookback). While RDPs enable intuitive and succinct representation of non-Markovian decision processes, their expressive power coincides with finite-state Markov decision processes (MDPs). We introduce omega-regular decision processes (ODPs) where the non-Markovian aspect of the transition and reward functions is extended to an omega-regular lookahead over the system evolution. Semantically, these lookaheads can be considered as promises made by the decision maker or the learning agent about her future behavior. In particular, we assume that, if the promised lookaheads are not met, then the payoff to the decision maker is falsum (least desirable payoff), overriding any rewards collected by the decision maker. We enable optimization and learning for ODPs under the discounted-reward objective by reducing them to lexicographic optimization and learning over finite MDPs. We present experimental results demonstrating the effectiveness of the proposed reduction.
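
The reduction to lexicographic optimization suggests a payoff in which honoring the promised lookahead strictly dominates any collected reward. The sketch below expresses that ordering with Python tuples; the runs, rewards, and discount factor are invented for illustration, and the snippet is not the paper's construction.

```python
# Primary component: was the omega-regular promise honored (1) or broken (0,
# i.e. "falsum")? Secondary component: the discounted reward of the run.
def lexicographic_payoff(promise_kept, rewards, gamma=0.99):
    discounted = sum(r * gamma**t for t, r in enumerate(rewards))
    return (1 if promise_kept else 0, discounted)

runs = [
    (True,  [0.0, 1.0, 0.0, 2.0]),   # promise kept, modest reward
    (False, [5.0, 5.0, 5.0, 5.0]),   # promise broken, high reward
]
best = max(runs, key=lambda run: lexicographic_payoff(*run))
print("preferred run:", best)
```

Because tuples compare elementwise from the left, the run that keeps its promise wins even though it collects far less reward.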


A PAC Learning Algorithm for LTL and Omega-Regular Objectives in MDPs

March 2024

·

3 Reads

·

2 Citations

Proceedings of the AAAI Conference on Artificial Intelligence

Linear temporal logic (LTL) and omega-regular objectives---a superset of LTL---have seen recent use as a way to express non-Markovian objectives in reinforcement learning. We introduce a model-based probably approximately correct (PAC) learning algorithm for omega-regular objectives in Markov decision processes (MDPs). As part of the development of our algorithm, we introduce the epsilon-recurrence time: a measure of the speed at which a policy converges to the satisfaction of the omega-regular objective in the limit. We prove that our algorithm only requires a polynomial number of samples in the relevant parameters, and perform experiments which confirm our theory.
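
The paper's sample-complexity bound is stated in terms of the epsilon-recurrence time and other MDP parameters and is not reproduced here; the generic Hoeffding-style calculation below is included only to illustrate what "a polynomial number of samples" looks like for estimating a single transition probability.

```python
import math

def hoeffding_samples(epsilon, delta):
    """Samples needed to estimate a Bernoulli mean within epsilon, w.p. 1-delta."""
    return math.ceil(math.log(2 / delta) / (2 * epsilon**2))

print(hoeffding_samples(epsilon=0.05, delta=0.01))  # 1060
```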


Omega-Regular Reward Machines

September 2023

·

28 Reads

Reinforcement learning (RL) is a powerful approach for training agents to perform tasks, but designing an appropriate reward mechanism is critical to its success. However, in many cases, the complexity of the learning objectives goes beyond the capabilities of the Markovian assumption, necessitating a more sophisticated reward mechanism. Reward machines and ω-regular languages are two formalisms used to express non-Markovian rewards for quantitative and qualitative objectives, respectively. This paper introduces ω-regular reward machines, which integrate reward machines with ω-regular languages to enable an expressive and effective reward mechanism for RL. We present a model-free RL algorithm to compute ε-optimal strategies against ω-regular reward machines and evaluate the effectiveness of the proposed algorithm through experiments.
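
For readers unfamiliar with the base formalism, the sketch below encodes a plain (non-ω-regular) reward machine as a table of guarded, reward-labeled transitions and evaluates it on a finite observation sequence. The "deliver coffee to the office" task, its guards, and its rewards are standard illustrative choices rather than material from this paper.

```python
REWARD_MACHINE = {
    # (machine_state, observation) -> (next_machine_state, reward)
    ("u0", "coffee"): ("u1", 0.0),
    ("u0", "office"): ("u0", 0.0),
    ("u0", "none"):   ("u0", 0.0),
    ("u1", "coffee"): ("u1", 0.0),
    ("u1", "office"): ("u0", 1.0),   # delivered: reward and reset
    ("u1", "none"):   ("u1", 0.0),
}

def run(observations):
    state, total = "u0", 0.0
    for obs in observations:
        state, r = REWARD_MACHINE[(state, obs)]
        total += r
    return total

print(run(["none", "coffee", "none", "office", "coffee", "office"]))  # 2.0
```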


Omega-Regular Reward Machines

August 2023

·

28 Reads

Reinforcement learning (RL) is a powerful approach for training agents to perform tasks, but designing an appropriate reward mechanism is critical to its success. However, in many cases, the complexity of the learning objectives goes beyond the capabilities of the Markovian assumption, necessitating a more sophisticated reward mechanism. Reward machines and omega-regular languages are two formalisms used to express non-Markovian rewards for quantitative and qualitative objectives, respectively. This paper introduces omega-regular reward machines, which integrate reward machines with omega-regular languages to enable an expressive and effective reward mechanism for RL. We present a model-free RL algorithm to compute epsilon-optimal strategies against omega-regular reward machines and evaluate the effectiveness of the proposed algorithm through experiments.


Figure and table captions: example showing non-robustness of safety specifications (Fig. 1); reward machines for ϕ = p (left) and ϕ = X_λ q (right), with transitions labeled by the guard and reward (Fig. 4); policy synthesis in MDPs for different classes of specifications.
Policy Synthesis and Reinforcement Learning for Discounted LTL

July 2023

·

38 Reads

·

3 Citations

Lecture Notes in Computer Science

The difficulty of manually specifying reward functions has led to an interest in using linear temporal logic (LTL) to express objectives for reinforcement learning (RL). However, LTL has the downside that it is sensitive to small perturbations in the transition probabilities, which prevents probably approximately correct (PAC) learning without additional assumptions. Time discounting provides a way of removing this sensitivity, while retaining the high expressivity of the logic. We study the use of discounted LTL for policy synthesis in Markov decision processes with unknown transition probabilities, and show how to reduce discounted LTL to discounted-sum reward via a reward machine when all discount factors are identical.
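
Once such a reward machine is in hand (its construction is the technical content of the paper and is not shown here), valuing a run is just a discounted sum of the machine's per-step rewards. The rewards and discount factor in the snippet below are assumptions chosen for illustration.

```python
def discounted_sum(rewards, gamma):
    """Value of a run: sum of per-step rewards weighted by gamma**t."""
    return sum(r * gamma**t for t, r in enumerate(rewards))

per_step_rewards = [0.0, 0.0, 1.0, 0.0, 1.0]
print(discounted_sum(per_step_rewards, gamma=0.9))  # 0.81 + 0.6561 ≈ 1.4661
```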


Multi-Objective Omega-Regular Reinforcement Learning

June 2023

·

19 Reads

·

15 Citations

Formal Aspects of Computing

The expanding role of reinforcement learning (RL) in safety-critical system design has promoted ω-automata as a way to express learning requirements—often non-Markovian—with greater ease of expression and interpretation than scalar reward signals. However, real-world sequential decision-making situations often involve multiple, potentially conflicting, objectives. Two dominant approaches to express relative preferences over multiple objectives are: 1) weighted preference, where the decision maker provides scalar weights for various objectives, and 2) lexicographic preference, where the decision maker provides an order over the objectives such that any amount of satisfaction of a higher-ordered objective is preferable to any amount of a lower-ordered one. In this paper we study and develop RL algorithms to compute optimal strategies in Markov decision processes against multiple ω-regular objectives under weighted and lexicographic preferences. We provide a translation from multiple ω-regular objectives to a scalar reward signal that is both faithful (maximising reward means maximising probability of achieving the objectives under the corresponding preference) and effective (RL quickly converges to optimal strategies). We have implemented the translations in the formal reinforcement learning tool Mungojerrie and we present an experimental evaluation of our technique on benchmark learning problems.
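
The two preference schemes can be contrasted on a toy example. In the sketch below, each policy is summarized by its (hypothetical) satisfaction probabilities for two objectives; the weighted ranking scalarizes them with user-chosen weights, while the lexicographic ranking compares the objectives in priority order. Policies, probabilities, and weights are all made up and unrelated to the Mungojerrie benchmarks.

```python
policies = {
    "policy_A": [0.9, 0.4],   # [P(obj1 satisfied), P(obj2 satisfied)]
    "policy_B": [0.8, 0.9],
}
weights = [0.7, 0.3]

def weighted(probs):
    return sum(w * p for w, p in zip(weights, probs))

best_weighted = max(policies, key=lambda k: weighted(policies[k]))
best_lex = max(policies, key=lambda k: tuple(policies[k]))  # obj1 has priority

print("weighted winner:", best_weighted)       # policy_B (0.83 vs 0.75)
print("lexicographic winner:", best_lex)       # policy_A (higher on obj1)
```

The example shows that the two preferences can disagree: the weighted ranking trades a small loss on the first objective for a large gain on the second, while the lexicographic ranking never does.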


Policy Synthesis and Reinforcement Learning for Discounted LTL

May 2023

·

7 Reads

The difficulty of manually specifying reward functions has led to an interest in using linear temporal logic (LTL) to express objectives for reinforcement learning (RL). However, LTL has the downside that it is sensitive to small perturbations in the transition probabilities, which prevents probably approximately correct (PAC) learning without additional assumptions. Time discounting provides a way of removing this sensitivity, while retaining the high expressivity of the logic. We study the use of discounted LTL for policy synthesis in Markov decision processes with unknown transition probabilities, and show how to reduce discounted LTL to discounted-sum reward via a reward machine when all discount factors are identical.


Citations (62)


... A prevalent approach for drift detection involves monitoring the classification performance of the model. According to the Probably Approximately Correct (PAC) theory [1], classifier performance tends to stabilize with increased learning time under stable conditions. Therefore, significant changes in the error rate of the base classifier can be viewed as indicators of concept drift. ...

Reference:

Variance Feedback Drift Detection Method for Evolving Data Streams Mining
A PAC Learning Algorithm for LTL and Omega-Regular Objectives in MDPs
  • Citing Article
  • March 2024

Proceedings of the AAAI Conference on Artificial Intelligence

... Even more, CBCs for specifications beyond simple reachability yield, e.g., sequential reachability problems (Jagtap et al., 2020), greatly impeding their applicability. Further work on model-free reinforcement learning studies synthesizing robust temporal logic controllers without constructing an explicit model of the system (Hasanbeig et al., 2019; Kazemi and Soudjani, 2020; Kazemi et al., 2024b). An approach based on computing reachable sets that leverages random set theory to obtain infinite-sample guarantees is provided by Lew and Pavone (2021), without analyzing finite-sample convergence rates. ...

Assume-Guarantee Reinforcement Learning
  • Citing Article
  • March 2024

Proceedings of the AAAI Conference on Artificial Intelligence

... Our approach falls into the category of provably safe RL (PSRL) techniques [7,16], treating safety as a hard constraint that must never be violated. This is in contrast to statistically safe RL techniques, which provide only statistical bounds on the system's safety by constraining the training objectives [3,19,20,21,22]. These soft guarantees, however, are insufficient for domains like autonomous driving, where each failure can be catastrophic. ...

Policy Synthesis and Reinforcement Learning for Discounted LTL

Lecture Notes in Computer Science

... However, rewards are not sufficient for other properties, for instance, the probability that trains will arrive in a specific order. Model checking (Baier and Katoen 2008) is not limited to properties that can be expressed by rewards (Hahn et al. 2019; Hasanbeig, Kroening, and Abate 2020; Vamplew et al. 2022), but supports a broader range of properties that can be expressed in probabilistic computation tree logic (PCTL; Hansson and Jonsson 1994). ...

Multi-Objective Omega-Regular Reinforcement Learning
  • Citing Article
  • June 2023

Formal Aspects of Computing

... Recently, using RL with lexicographic ordering began to attract attention from other communities as well. For example, Hahn et al. [4] use formal methods to construct single objective MDPs when all of the objectives are ω-regular. ...

Model-Free Reinforcement Learning for Lexicographic Omega-Regular Objectives
  • Citing Chapter
  • November 2021

Lecture Notes in Computer Science

... Due to this step, Mungojerrie has been connected to external linear program solvers. This enabled the extension of Mungojerrie to compute reward maximizing policies via a linear program for branching Markov decision processes in [18]. ...

Model-Free Reinforcement Learning for Branching Markov Decision Processes

Lecture Notes in Computer Science

... However, the work of Hahn et al. [12] revealed challenges in translating formal specifications to reward machines, and proposed a correct translation from more general ω-automata based requirements to reward machines. Since then, several formally correct reward schemes [14,13,3,20] have been proposed to automate ω-regular reward translation. ...

Faithful and Effective Reward Schemes for Model-Free Reinforcement Learning of Omega-Regular Objectives
  • Citing Chapter
  • October 2020

Lecture Notes in Computer Science

... Also, the symbolic approach can be extended to various control structures, such as output feedback control [13], [14], [5]. Data-driven and learning-based control concepts can also be integrated into the symbolic setting [15], [16], [17]. The optimal solution of the reachability control problem for a finite-state system is considered in a number of works [1], [18]. ...

Formal Controller Synthesis for Continuous-Space MDPs via Model-Free Reinforcement Learning
  • Citing Conference Paper
  • April 2020

... those encoding stability problems. As a consequence, this class of LDBAs is known as Good-for-MDPs [15], since the non-determinism can be resolved on the fly in arbitrary MDPs without changing acceptance. Figure 2: Illustrative task T0 (left), and LDBA encoding the formula (right, with starting state marked as 0 and accepting state in green). ...

Good-for-MDPs Automata for Probabilistic Analysis and Reinforcement Learning

Lecture Notes in Computer Science