
Qualitative control learning can be much faster than reinforcement learning


Machine Learning (2025) 114:4
https://doi.org/10.1007/s10994-024-06724-7
DomenŠoberl1· IvanBratko2
Received: 5 April 2024 / Revised: 20 August 2024 / Accepted: 27 September 2024 /
Published online: 14 January 2025
© The Author(s), under exclusive licence to Springer Science+Business Media LLC, part of Springer Nature 2025
Abstract
Reinforcement learning has emerged as a prominent method for controlling dynamic systems in the absence of a precise mathematical model. However, its reliance on extensive interactions with the environment often leads to prolonged training periods. In this paper, we propose an alternative approach to learning control policies that focuses on learning qualitative models and uses symbolic planning to derive a qualitative plan for the control task, which is executed by an adaptive reactive controller. We conduct experiments utilizing our approach on the cart-pole problem, a standard benchmark in dynamic system control. We additionally extend this problem domain to include uneven terrains, such as driving over craters or hills, to assess the robustness of learned controllers. Our results indicate that qualitative learning offers significant advantages over reinforcement learning in terms of sample efficiency, transferability, and interpretability. We demonstrate that our proposed approach is at least two orders of magnitude more sample efficient in the cart-pole domain than the usual variants of reinforcement learning.
Keywords: Qualitative modeling · Qualitative reasoning · Qualitative control · Transfer learning
Editors:Rita P.Ribeiro, Ana Carolina Lorena and Albert Bifet.
* Domen Šoberl
domen.soberl@famnit.upr.si
Ivan Bratko
ivan.bratko@fri.uni-lj.si
1 Department ofInformation Sciences andTechnologies, Faculty ofMathematics, Natural Sciences
andInformation Technologies, University ofPrimorska, Glagoljaška 8, 6000Koper, Slovenia
2 Artificial Intelligence Laboratory, Faculty ofComputer andInformation Science, University
ofLjubljana, Večna pot 113, 1000Ljubljana, Slovenia
References
Article
Full-text available
In this paper, three recently introduced reinforcement learning (RL) methods are used to generate human-interpretable policies for the cart-pole balancing benchmark. The novel RL methods learn human-interpretable policies in the form of compact fuzzy controllers and simple algebraic equations. The representations as well as the achieved control performances are compared with two classical controller design methods and three non-interpretable RL methods. All eight methods utilize the same previously generated data batch and produce their controller offline, without interaction with the real benchmark dynamics. The experiments show that the novel RL methods are able to automatically generate well-performing policies which are at the same time human-interpretable. Furthermore, one of the methods is applied to automatically learn an equation-based policy for a hardware cart-pole demonstrator by using only human-player-generated batch data. The solution generated in the first attempt already represents a successful balancing policy, which demonstrates the method's applicability to real-world problems.
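For a sense of what such an equation-based policy can look like, here is a hypothetical example for the cart-pole benchmark (the weights are illustrative, not taken from the paper):

```python
def policy(x, x_dot, theta, theta_dot):
    """Hypothetical compact algebraic policy: push right (+1) when the
    weighted state sum is positive, otherwise push left (-1)."""
    return 1 if 0.1 * x + 0.3 * x_dot + 3.0 * theta + 1.0 * theta_dot > 0 else -1
```

A human can read the dominant term (the pole angle) directly off such an equation, which is exactly the kind of interpretability these methods target.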
Article
Full-text available
Qualitative modeling allows autonomous agents to learn comprehensible control models, formulated in a way that is close to human intuition. By abstracting away certain numerical information, qualitative models can provide better insight into the operating principles of a dynamic system than traditional numerical models. We show that qualitative models, learned from numerical traces, contain enough information to allow motion planning and path following. We demonstrate our methods on the task of flying a quadcopter. A qualitative control model is learned through motor babbling. Training is significantly faster than the training times reported for reinforcement learning in similar quadcopter experiments. A qualitative collision-free trajectory is computed by means of qualitative simulation, and executed reactively while dynamically adapting to the numerical characteristics of the system. Experiments have been conducted and assessed in the V-REP robotic simulator.
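The motor-babbling phase can be pictured as follows; this is a simplified stand-in (a 1-D hover system instead of a quadcopter), with all names and constants our own:

```python
import numpy as np

def babble(steps=500, dt=0.02, g=9.81, seed=0):
    """Apply random thrust commands and record numerical traces
    (state, action, observed acceleration) for qualitative learning."""
    rng = np.random.default_rng(seed)
    z, z_dot, trace = 0.0, 0.0, []
    for _ in range(steps):
        thrust = rng.uniform(0.0, 20.0)    # random exploratory action
        z_dd = thrust - g                  # toy vertical dynamics
        trace.append((z, z_dot, thrust, z_dd))
        z_dot += z_dd * dt
        z += z_dot * dt
    return np.array(trace)
```

A qualitative model is then induced from traces of this kind; a toy version of that induction step is sketched after the chapter summary further below.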
Conference Paper
Full-text available
Designing optimal controllers continues to be challenging as systems are becoming complex and are inherently nonlinear. The principal advantage of reinforcement learning (RL) is its ability to learn from interaction with the environment and provide an optimal control strategy. In this paper, RL is explored in the context of control of the benchmark cart-pole dynamical system with no prior knowledge of the dynamics. RL algorithms such as temporal-difference, policy gradient actor-critic, and value function approximation are compared in this context with the standard LQR solution. Further, we propose a novel approach to integrate RL and swing-up controllers.
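The LQR baseline referred to above is standard; here is a sketch using a textbook linearization of the cart-pole about the upright equilibrium (the mass and length values are assumed, not from the paper):

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# State: [x, x_dot, theta, theta_dot]; point-mass pole on a frictionless cart.
M, m, l, g = 1.0, 0.1, 0.5, 9.81                 # assumed parameters
A = np.array([[0, 1, 0,                      0],
              [0, 0, -m * g / M,             0],
              [0, 0, 0,                      1],
              [0, 0, (M + m) * g / (M * l),  0]])
B = np.array([[0.0], [1 / M], [0.0], [-1 / (M * l)]])
Q = np.diag([1.0, 1.0, 10.0, 1.0])               # penalize pole angle most
R = np.array([[0.1]])

P = solve_continuous_are(A, B, Q, R)             # solve the Riccati equation
K = np.linalg.solve(R, B.T @ P)                  # optimal state feedback u = -K x
print(K.round(2))
```

Unlike the model-free RL methods compared in the paper, this baseline requires the linearized dynamics (A, B) up front, which is precisely the knowledge RL is meant to dispense with.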
Article
Full-text available
We introduce TensorFlow Agents, an efficient infrastructure paradigm for building parallel reinforcement learning algorithms in TensorFlow. We simulate multiple environments in parallel, and group them to perform the neural network computation on a batch rather than individual observations. This allows the TensorFlow execution engine to parallelize computation, without the need for manual synchronization. Environments are stepped in separate Python processes to progress them in parallel without interference of the global interpreter lock. As part of this project, we introduce BatchPPO, an efficient implementation of the proximal policy optimization algorithm. By open sourcing TensorFlow Agents, we hope to provide a flexible starting point for future projects that accelerates future research in the field.
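The core idea, stepping environments in separate processes and batching their observations for a single network call, can be sketched without TensorFlow (a generic illustration, not the TensorFlow Agents API):

```python
import multiprocessing as mp
import numpy as np

class ToyEnv:
    """Stand-in environment; a real setup would wrap a simulator here."""
    def __init__(self, seed):
        self.rng = np.random.default_rng(seed)
        self.state = np.zeros(4)

    def step(self, action):
        self.state = self.state + 0.01 * action + 0.01 * self.rng.normal(size=4)
        return self.state

def worker(conn, seed):
    """Runs in its own process, so stepping avoids the global interpreter lock."""
    env = ToyEnv(seed)
    while True:
        action = conn.recv()
        if action is None:               # shutdown signal
            break
        conn.send(env.step(action))

if __name__ == "__main__":
    pipes, procs = [], []
    for i in range(4):                   # four parallel environments
        parent, child = mp.Pipe()
        p = mp.Process(target=worker, args=(child, i))
        p.start()
        pipes.append(parent)
        procs.append(p)
    for _ in range(3):                   # a few synchronous batched steps
        for conn in pipes:
            conn.send(1.0)               # send all actions before receiving
        batch = np.stack([conn.recv() for conn in pipes])
        print(batch.shape)               # the policy network would run once here
    for conn in pipes:
        conn.send(None)
    for p in procs:
        p.join()
```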
Chapter
The classical problem of balancing an inverted pendulum is commonly used to evaluate control learning techniques. Traditional learning methods aim to improve the performance of the learned controller, often disregarding comprehensibility of the learned control policies. Recently, Explainable AI (XAI) has become of great interest in the areas where humans can benefit from insights discovered by AI, or need to check whether AI’s decisions make sense. Learning qualitative models allows formulation of learned hypotheses in a comprehensible way, closer to human intuition than traditional numerical learning. In this paper, we use a qualitative approach to learning control strategies, which we demonstrate on the problem of balancing an inverted pendulum. We use qualitative induction to learn a qualitative model from experimentally collected numerical traces, and qualitative simulation to search for possible qualitative control strategies, which are tested through reactive execution. Successful behaviors provide a clear explanation of the learned control strategy.
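A toy version of the qualitative induction step, reading only the sign of a monotonic dependence out of numerical traces, might look like this (our simplification; the actual qualitative induction algorithm is considerably more general):

```python
import numpy as np

def induce_monotonicity(u, y, u_name, y_name, thresh=0.9):
    """Report a candidate M+/M- constraint from the sign of the empirical
    correlation between an input u and an observed derivative y."""
    r = np.corrcoef(u, y)[0, 1]
    if r > thresh:
        return f"M+({u_name}, {y_name})"
    if r < -thresh:
        return f"M-({u_name}, {y_name})"
    return "no monotonic relation found"

# Tiny synthetic trace: force opposes angular acceleration (illustrative).
rng = np.random.default_rng(0)
force = rng.uniform(-10, 10, size=200)
theta_dd = -0.8 * force + rng.normal(scale=0.1, size=200)
print(induce_monotonicity(force, theta_dd, "F", "theta_dd"))   # M-(F, theta_dd)
```

Constraints induced this way are then composed by qualitative simulation to enumerate candidate control strategies, which is the search step the chapter describes.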
Book
In this book, Kenneth Forbus proposes that qualitative representations hold the key to one of the deepest mysteries of cognitive science: how we reason and learn about the continuous phenomena surrounding us. Forbus argues that qualitative representations (symbolic representations that carve continuous phenomena into meaningful units) are central to human cognition. Qualitative representations provide a basis for commonsense reasoning, because they enable practical reasoning with very little data; this makes qualitative representations a useful component of natural language semantics. Qualitative representations also provide a foundation for expert reasoning in science and engineering by making explicit the broad categories of things that might happen and enabling causal models that help guide the application of more quantitative knowledge as needed. Qualitative representations are important for creating more human-like artificial intelligence systems with capabilities for spatial reasoning, vision, question answering, and understanding natural language. Forbus discusses, among other topics, basic ideas of knowledge representation and reasoning; qualitative process theory; qualitative simulation and reasoning about change; compositional modeling; qualitative spatial reasoning; and learning and conceptual change. His argument is notable both for presenting an approach to qualitative reasoning in which analogical reasoning and learning play crucial roles and for marshaling a wide variety of evidence, including the performance of AI systems. Cognitive scientists will find Forbus's account of qualitative representations illuminating; AI scientists will value Forbus's new approach to qualitative representations and the overview he offers.
Article
Atari games are an excellent testbed for studying intelligent behavior, as they offer a range of tasks that differ widely in their visual representation, game dynamics, and goals presented to an agent. The last two years have seen a spate of research into artificial agents that use a single algorithm to learn to play these games. The best of these artificial agents perform at better-than-human levels on most games, but require hundreds of hours of game-play experience to produce such behavior. Humans, on the other hand, can learn to perform well on these tasks in a matter of minutes. In this paper we present data on human learning trajectories for several Atari games, and test several hypotheses about the mechanisms that lead to such rapid learning.