About
Publications: 84
Reads: 42,729
Citations: 3,343
Introduction
Dr Patrick Mannion is a full-time member of academic staff at National University of Ireland Galway, where he lectures in Computer Science. From January 2017 to May 2019 he served as an Assistant Lecturer in Computing at Galway-Mayo Institute of Technology. He completed his PhD in Machine Learning at NUI Galway in 2017. He also holds a BEng in Civil Engineering (NUI Galway, 2012), a HDipAppSc in Software Development (NUI Galway, 2013), and a PgCert in Teaching and Learning (GMIT, 2018). He is a former Irish Research Council Scholar and a former Fulbright-TechImpact Scholar. His research interests include autonomous vehicles, reinforcement learning, multi-agent systems, and multi-objective optimisation.
Current institution
National University of Ireland Galway

Additional affiliations
June 2019 - present: Lecturer in Computer Science, National University of Ireland Galway
January 2017 - May 2019: Assistant Lecturer in Computing, Galway-Mayo Institute of Technology
October 2013 - August 2017: PhD Researcher, National University of Ireland Galway

Education
October 2013 - August 2017: PhD in Machine Learning, National University of Ireland Galway
September 2012 - August 2013: HDipAppSc in Software Development, National University of Ireland Galway
September 2008 - May 2012: BEng in Civil Engineering, National University of Ireland Galway
Publications (84)
Reinforcement Learning (RL) is a powerful and well-studied Machine Learning paradigm, where an agent learns to improve its performance in an environment by maximising a reward signal. In multi-objective Reinforcement Learning (MORL) the reward signal is a vector, where each component represents the performance on a different objective. Reward shapi...
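As a rough illustration of the vector-valued reward signal described in the entry above, a tabular Q-learning update can be generalised to MORL by keeping one value estimate per objective. This is a minimal sketch under assumed problem sizes and a hypothetical linear scalarisation for action selection, not code from the publication itself.

```python
import numpy as np

# Minimal sketch of a tabular multi-objective Q-learning update with a vector
# reward (one component per objective). Sizes, weights and hyperparameters are
# hypothetical, chosen only for illustration.
n_states, n_actions, n_objectives = 10, 4, 2
Q = np.zeros((n_states, n_actions, n_objectives))    # one value estimate per objective
alpha, gamma = 0.1, 0.95
weights = np.array([0.7, 0.3])                       # example scalarisation weights

def update(s, a, reward_vec, s_next):
    """One Q-learning step with a vector-valued reward."""
    # Greedy next action chosen on linearly scalarised values here; other
    # action-selection schemes (e.g. thresholded or Pareto-based) are possible.
    a_next = np.argmax(Q[s_next] @ weights)
    td_error = reward_vec + gamma * Q[s_next, a_next] - Q[s, a]   # vector TD error
    Q[s, a] += alpha * td_error
```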
Reward shaping is a well-established family of techniques that have been successfully used to improve the performance and learning speed of Reinforcement Learning agents in single-objective problems. Here we extend the guarantees of Potential-Based Reward Shaping (PBRS) by providing theoretical proof that PBRS does not alter the true Pareto front i...
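For context, potential-based reward shaping augments the environment reward with the difference of a potential function over successive states, F(s, s') = γΦ(s') − Φ(s). The snippet below is a minimal sketch with a hypothetical distance-based potential; it is not the shaping function used in the paper above.

```python
# Sketch of potential-based reward shaping (PBRS). The potential function is a
# hypothetical placeholder for whatever domain knowledge is available; with a
# vector reward, a separate potential can be applied per objective.
gamma = 0.95
goal_state = 9          # hypothetical goal used by the example potential

def potential(state):
    return -abs(state - goal_state)   # e.g. negative distance to the goal

def shaped_reward(reward, state, next_state):
    shaping = gamma * potential(next_state) - potential(state)
    return reward + shaping
```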
Urban traffic congestion has become a serious issue, and improving the flow of traffic through cities is critical for environmental, social and economic reasons. Improvements in Adaptive Traffic Signal Control (ATSC) have a pivotal role to play in the future development of Smart Cities, and in the alleviation of traffic congestion. Here we describe...
The majority of multi-agent reinforcement learning (MARL) implementations aim to optimise systems with respect to a single objective, despite the fact that many real-world problems are inherently multi-objective in nature. Research into multi-objective MARL is still in its infancy, and few studies to date have dealt with the issue of credit assignm...
Special issue on adaptive and learning agents 2017 - Volume 33 - Patrick Mannion, Anna Harutyunyan, Kaushik Subramanian
Many decision-making problems feature multiple objectives where it is not always possible to know the preferences of a human or agent decision-maker for different objectives. However, demonstrated behaviors from the decision-maker are often available. This research proposes a dynamic weight-based preference inference (DWPI) algorithm that can infer...
Many decision-making problems feature multiple objectives where it is not always possible to know the preferences of a human or agent decision-maker for different objectives. However, demonstrated behaviors from the decision-maker are often available. This research proposes a dynamic weight-based preference inference (DWPI) algorithm that can infer...
Addressing the question of how to achieve optimal decision-making under risk and uncertainty is crucial for enhancing the capabilities of artificial agents that collaborate with or support humans. In this work, we address this question in the context of Public Goods Games. We study learning in a novel multi-objective version of the Public Goods Gam...
Many challenging tasks such as managing traffic systems, electricity grids, or supply chains involve complex decision-making processes that must balance multiple conflicting objectives and coordinate the actions of various independent decision-makers (DMs). One perspective for formalising and addressing such tasks is multi-objective multi-agent rei...
Effective residential appliance scheduling is crucial for sustainable living. While multi-objective reinforcement learning (MORL) has proven effective in balancing user preferences in appliance scheduling, traditional MORL struggles with limited data in non-stationary residential settings characterized by renewable generation variations. Significan...
Reinforcement learning is commonly applied in residential energy management, particularly for optimizing energy costs. However, RL agents often face challenges when dealing with deceptive and sparse rewards in the energy control domain, especially with stochastic rewards. In such situations, thorough exploration becomes crucial for learning an opti...
For effective decision support in scenarios with conflicting objectives, sets of potentially optimal solutions can be presented to the decision maker. We explore both what policies these sets should contain and how such sets can be computed efficiently. With this in mind, we take a distributional approach and introduce a novel dominance criterion r...
It has been shown that an agent can be trained with an adversarial policy which achieves high degrees of success against a state-of-the-art DRL victim despite taking unintuitive actions. This prompts the question: is this adversarial behaviour detectable through the observations of the victim alone? We find that widely used classification methods s...
Evolutionary Algorithms and Deep Reinforcement Learning have both successfully solved control problems across a variety of domains. Recently, algorithms have been proposed which combine these two methods, aiming to leverage the strengths and mitigate the weaknesses of both approaches. In this paper we introduce a new Evolutionary Reinforcement Lear...
For effective decision support in scenarios with conflicting objectives, sets of potentially optimal solutions can be presented to the decision maker. We explore both what policies these sets should contain and how such sets can be computed efficiently. With this in mind, we take a distributional approach and introduce a novel dominance criterion r...
In many risk-aware and multi-objective reinforcement learning settings, the utility of the user is derived from a single execution of a policy. In these settings, making decisions based on the average future returns is not suitable. For example, in a medical setting a patient may only have one opportunity to treat their illness. Making decisions us...
Many decision-making problems feature multiple objectives. In such problems, it is not always possible to know the preferences of a decision-maker for different objectives. However, it is often possible to observe the behavior of decision-makers. In multi-objective decision-making, preference inference is the process of inferring the preferences of...
In many risk-aware and multi-objective reinforcement learning settings, the utility of the user is derived from a single execution of a policy. In these settings, making decisions based on the average future returns is not suitable. For example, in a medical setting a patient may only have one opportunity to treat their illness. Making decisions us...
Real-world sequential decision-making tasks are usually complex, and require trade-offs between multiple, often conflicting, objectives. However, the majority of research in reinforcement learning (RL) and decision-theoretic planning assumes a single objective, or that multiple objectives can be handled via a predefined weighted sum over the objecti...
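For reference, the "predefined weighted sum over the objectives" mentioned above corresponds to a linear utility function applied to the return vector; this is a standard formulation rather than anything specific to the paper:

\[
u_{\mathbf{w}}(\mathbf{r}) = \mathbf{w}^{\top}\mathbf{r} = \sum_{i=1}^{d} w_i r_i,
\qquad w_i \ge 0,\ \sum_{i=1}^{d} w_i = 1 .
\]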
In many real-world scenarios, the utility of a user is derived from a single execution of a policy. In this case, to apply multi-objective reinforcement learning, the expected utility of the returns must be optimised. Various scenarios exist where a user's preferences over objectives (also known as the utility function) are unknown or difficult to...
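The "expected utility of the returns" criterion referred to above (commonly called ESR in the multi-objective RL literature) can be contrasted with the scalarised expected returns (SER) criterion as follows; these are standard formulations, not equations reproduced from the paper:

\[
\text{ESR:}\quad \max_{\pi}\; \mathbb{E}\!\left[\, u\!\left(\textstyle\sum_{t} \gamma^{t}\mathbf{r}_{t}\right) \,\middle|\, \pi \right],
\qquad
\text{SER:}\quad \max_{\pi}\; u\!\left( \mathbb{E}\!\left[\, \textstyle\sum_{t} \gamma^{t}\mathbf{r}_{t} \,\middle|\, \pi \right] \right),
\]

where \(u\) is the user's utility function and \(\mathbf{r}_t\) is the vector reward at time \(t\).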
The recent paper “Reward is Enough” by Silver, Singh, Precup and Sutton posits that the concept of reward maximisation is sufficient to underpin all intelligence, both natural and artificial, and provides a suitable basis for the creation of artificial general intelligence. We contest the underlying assumption of Silver et al. that such reward can...
In many real-world scenarios, the utility of a user is derived from a single execution of a policy. In this case, to apply multi-objective reinforcement learning, the expected utility of the returns must be optimised. Various scenarios exist where a user’s preferences over objectives (also known as the utility function) are unknown or difficult to...
Many real-world problems contain both multiple objectives and agents, where a trade-off exists between objectives. Key to solving such problems is to exploit sparse dependency structures that exist between agents. For example in wind farm control a trade-off exists between maximising power and minimising stress on the systems components. Dependenci...
Residential buildings are large consumers of energy. They contribute significantly to the demand placed on the grid, particularly during hours of peak demand. Demand-side management is crucial to reducing this demand and increasing renewable utilisation. This research study presents a multi-objective tunable deep reinforcement le...
In sequential multi-objective decision making (MODeM) settings, when the utility of a user is derived from a single execution of a policy, policies for the expected scalarised returns (ESR) criterion should be computed. In multi-objective settings, a user's preferences over objectives, or utility function, may be unknown at the time of planning. Wh...
In sequential multi-objective decision making (MODeM) settings, when the utility of a user is derived from a single execution of a policy, policies for the expected scalarised returns (ESR) criterion should be computed. In multi-objective settings, a user's preferences over objectives, or utility function, may be unknown at the time of planning. Wh...
Real-world sequential decision-making tasks are generally complex, requiring trade-offs between multiple, often conflicting, objectives. Despite this, the majority of research in reinforcement learning and decision-theoretic planning either assumes only a single objective, or that multiple objectives can be adequately handled via a simple linear co...
Infectious disease outbreaks can have a disruptive impact on public health and societal processes. As decision making in the context of epidemic mitigation is hard, reinforcement learning provides a methodology to automatically learn prevention strategies in combination with complex epidemic models. Current research focuses on optimizing policies w...
Many real-world multi-agent interactions consider multiple distinct criteria, i.e. the payoffs are multi-objective in nature. However, the same multi-objective payoff vector may lead to different utilities for each participant. Therefore, it is essential for an agent to learn about the behaviour of other agents in the system. In this work, we prese...
The recent paper "Reward is Enough" by Silver, Singh, Precup and Sutton posits that the concept of reward maximisation is sufficient to underpin all intelligence, both natural and artificial. We contest the underlying assumption of Silver et al. that such reward can be scalar-valued. In this paper we explain why scalar rewards are insufficient to...
In many real-world scenarios, the utility of a user is derived from the single execution of a policy. In this case, to apply multi-objective reinforcement learning, the expected utility of the returns must be optimised. Various scenarios exist where a user's preferences over objectives (also known as the utility function) are unknown or difficult t...
In many risk-aware and multi-objective reinforcement learning settings, the utility of the user is derived from the single execution of a policy. In these settings, making decisions based on the average future returns is not suitable. For example, in a medical setting a patient may only have one opportunity to treat their illness. When making a dec...
In many real-world scenarios, the utility of a user is derived from the single execution of a policy. In this case, to apply multi-objective reinforcement learning, the expected utility of the returns must be optimised. Various scenarios exist where a user's preferences over objectives (also known as the utility function) are unknown or difficult t...
Real-world decision-making tasks are generally complex, requiring trade-offs between multiple, often conflicting, objectives. Despite this, the majority of research in reinforcement learning and decision-theoretic planning either assumes only a single objective, or that multiple objectives can be adequately handled via a simple linear combination....
Anaerobic Digestion (AD) is a waste treatment technology widely used for wastewater and solid waste treatment, with the advantage of being a source of renewable energy in the form of biogas. Anaerobic digestion model number 1 (ADM1) is the most common mathematical model available for AD modelling. Commercial software implementations of ADM1 are ava...
With the development of deep representation learning, the domain of reinforcement learning (RL) has become a powerful learning framework now capable of learning complex policies in high dimensional environments. This review summarises deep reinforcement learning (DRL) algorithms and provides a taxonomy of automated driving tasks where (D)RL methods...
In many risk-aware and multi-objective reinforcement learning settings, the utility of the user is derived from the single execution of a policy. In these settings, making decisions based on the average future returns is not suitable. For example, in a medical setting a patient may only have one opportunity to treat their illness. When making a dec...
When developing reinforcement learning agents, the standard approach is to train an agent to converge to a fixed policy that is as close to optimal as possible for a single fixed reward function. If different agent behaviour is required in the future, an agent trained in this way must normally be either fully or partially retrained, wasting valuabl...
Many real-world multi-agent interactions consider multiple distinct criteria, i.e. the payoffs are multi-objective in nature. However, the same multi-objective payoff vector may lead to different utilities for each participant. Therefore, it is essential for an agent to learn about the behaviour of other agents in the system. In this work, we prese...
In multi-objective multi-agent systems (MOMASs), agents explicitly consider the possible trade-offs between conflicting objective functions. We argue that compromises between competing objectives in MOMAS should be analyzed on the basis of the utility that these compromises have for the users of a system, where an agent’s utility function maps thei...
Special issue on adaptive and learning agents 2019 - Volume 35 - Patrick Mannion, Patrick MacAlpine, Bei Peng, Roxana Rădulescu
The goal of multi-objective problems is to find solutions that balance different objectives. When solving multi-objective problems using reinforcement learning, linear scalarisation techniques are generally used; however, system expertise is required to optimise the weights for linear scalarisation. Thresholded Lexicographic Ordering (TLO) is one tec...
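Thresholded Lexicographic Ordering compares value vectors objective-by-objective in priority order, clipping every objective except the last at a threshold so that values beyond the threshold count as "good enough". The sketch below is a hypothetical illustration of that comparison, not the paper's implementation.

```python
# Sketch of action selection under Thresholded Lexicographic Ordering (TLO).
# Every objective except the last is clipped at its threshold, then value
# vectors are compared lexicographically. Values here are purely illustrative.
def tlo_key(values, thresholds):
    clipped = [min(v, t) for v, t in zip(values[:-1], thresholds)]
    return tuple(clipped) + (values[-1],)

def tlo_best(action_values, thresholds):
    """Pick the (action, value-vector) pair that ranks highest under TLO."""
    return max(action_values, key=lambda item: tlo_key(item[1], thresholds))

# Example: two objectives, threshold 0.8 on the first. a1 and a2 both reach the
# threshold on objective 1, so a2 wins on objective 2; a3 misses the threshold.
actions = [("a1", [0.9, 0.2]), ("a2", [0.85, 0.6]), ("a3", [0.5, 0.9])]
best_action, best_values = tlo_best(actions, thresholds=[0.8])
```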
With the development of deep representation learning, the domain of reinforcement learning (RL) has become a powerful learning framework now capable of learning complex policies in high dimensional environments. This review summarises deep reinforcement learning (DRL) algorithms, provides a taxonomy of automated driving tasks where (D)RL methods ha...
In multi-objective multi-agent systems (MOMAS), agents explicitly consider the possible tradeoffs between conflicting objective functions. We argue that compromises between competing objectives in MOMAS should be analysed on the basis of the utility that these compromises have for the users of a system, where an agent's utility function maps their...
The majority of multi-agent system implementations aim to optimise agents’ policies with respect to a single objective, despite the fact that many real-world problem domains are inherently multi-objective in nature. Multi-objective multi-agent systems (MOMAS) explicitly consider the possible trade-offs between conflicting objective functions. We ar...
The majority of multi-agent system (MAS) implementations aim to optimise agents' policies with respect to a single objective, despite the fact that many real-world problem domains are inherently multi-objective in nature. Multi-objective multi-agent systems (MOMAS) explicitly consider the possible trade-offs between conflicting objective functions....
In this paper, we leverage curriculum learning (CL) to improve the performance of multiagent systems (MAS) that are trained with the cooperative coevolution of artificial neural networks. We design curricula to progressively change two dimensions: scale (i.e. domain size) and coupling (i.e. the number of agents required to complete a subtask). We d...
In multi-objective multi-agent systems (MOMAS), agents explicitly consider the possible tradeoffs between conflicting objective functions. We argue that compromises between competing objectives in MOMAS should be analysed on the basis of the utility that these compromises have for the users of a system, where an agent's utility function maps their...
Correctly identifying vulnerable road users (VRUs), e.g. cyclists and pedestrians, remains one of the most challenging environment perception tasks for autonomous vehicles (AVs). This work surveys the current state-of-the-art in VRU detection, covering topics such as benchmarks and datasets, object detection techniques and relevant machine learning...
Deep Reinforcement Learning (DRL) has become increasingly powerful in recent years, with notable achievements such as DeepMind's AlphaGo. It has been successfully deployed in commercial vehicles like Mobileye's path planning system. However, the vast majority of work on DRL is focused on toy examples in controlled synthetic car simulator environments...
Multi-Agent Systems (MAS) are a form of distributed intelligence, where multiple autonomous agents act in a common environment. Numerous complex, real world systems have been successfully optimised using Multi-Agent Reinforcement Learning (MARL) in conjunction with the MAS framework. In MARL agents learn by maximising a scalar reward signal from th...
Multi-Agent Reinforcement Learning (MARL) is a powerful Machine Learning paradigm, where multiple autonomous agents can learn to improve the performance of a system through experience. The majority of MARL implementations aim to optimise systems with respect to a single objective, despite the fact that many real world problems are inherently multi-...
Reward shaping has been proposed as a means to address the credit assignment problem in Multi-Agent Systems (MAS). Two popular shaping methods are Potential-Based Reward Shaping and difference rewards, and both have been shown to improve learning speed and the quality of joint policies learned by agents in single-objective MAS. In this work we disc...
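The difference reward mentioned above is usually written D_i = G(z) − G(z_{−i}): the global reward minus the global reward recomputed with agent i's contribution removed or replaced by a default action. A minimal sketch, with a hypothetical global reward function supplied by the caller:

```python
# Sketch of a difference reward D_i = G(z) - G(z_{-i}). The global_reward
# callable is a hypothetical, domain-specific system-level objective.
def difference_reward(global_reward, joint_action, agent_idx, default_action=None):
    counterfactual = list(joint_action)
    counterfactual[agent_idx] = default_action   # null out agent i's contribution
    return global_reward(tuple(joint_action)) - global_reward(tuple(counterfactual))
```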
In order to accelerate the learning process in high dimensional reinforcement learning problems, TD methods such as Q-learning and Sarsa are usually combined with eligibility traces. The recently introduced DQN (Deep Q-Network) algorithm, which is a combination of Q-learning with a deep neural network, has achieved good performance on several games...
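As background to the combination described above, the snippet below sketches the classical tabular form of Q-learning with Watkins-style eligibility traces; DQN replaces the table with a neural network, and the hyperparameters here are hypothetical.

```python
import numpy as np

def q_lambda_step(Q, E, s, a, r, s_next,
                  alpha=0.1, gamma=0.99, lam=0.9, exploratory=False):
    """One Watkins' Q(lambda) update; Q (values) and E (traces) are modified in place."""
    a_greedy = np.argmax(Q[s_next])
    delta = r + gamma * Q[s_next, a_greedy] - Q[s, a]   # one-step TD error
    E[s, a] += 1.0                                      # accumulating eligibility trace
    Q += alpha * delta * E                              # propagate the error along the trace
    if exploratory:
        E[:] = 0.0          # Watkins' variant cuts traces after exploratory actions
    else:
        E *= gamma * lam    # otherwise decay the traces
```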
The majority of Multi-Agent Reinforcement Learning (MARL) implementations aim to optimise systems with respect to a single objective, despite the fact that many real world problems are inherently multi-objective in nature. Research into multi-objective MARL is still in its infancy, and few studies to date have dealt with the issue of credit assignm...
Multi-Agent Reinforcement Learning (MARL) is a powerful Machine Learning paradigm, in which agents learn to improve their performance at a task by maximising a scalar reward signal from the environment. The design of the reward function affects the joint policies learned, and therefore the issue of credit assignment in MARL is an area of active res...
In this paper, we examine the application of Multi-Agent Reinforcement Learning (MARL) to a Dynamic Economic Emissions Dispatch problem. This is a multi-objective problem domain, where the conflicting objectives of fuel cost and emissions must be minimised. We evaluate the performance of several different MARL credit assignment structures in this d...
Multi-Agent Reinforcement Learning (MARL) is a powerful Machine Learning paradigm, where multiple autonomous agents can learn to improve the performance of a system through experience. In this paper, we examine the application of MARL to a Dynamic Economic Emissions Dispatch (DEED) problem. This is a multi-objective problem domain, where the confli...
In a Multi-Agent System (MAS), multiple agents act autonomously in a common environment. Agents in competitive MAS are self-interested, so they typically come into conflict with each other when trying to achieve their own goals. One such example is that of multiple agents sharing a common resource, where each agent seeks to maximise its own gain wi...
Multi-Agent Reinforcement Learning (MARL) is an area of research that combines Reinforcement Learning (RL) with Multi-Agent Systems (MAS). In MARL, agents learn over time, by trial and error, what actions to take depending on the state of the environment. The focus of this paper will be to apply MARL to the Watershed management problem. This problem...
In Reinforcement Learning (RL), an agent learns to improve its performance by maximizing the return from a reward function. Reward shaping augments the reward function with additional knowledge provided by the system designer. Potential-Based Reward Shaping (PBRS) is a form of reward shaping that provides theoretical guarantees including policy inv...
Parallel Reinforcement Learning (PRL) is an emerging paradigm within Reinforcement Learning (RL) literature, where multiple agents share their experiences while learning in parallel on separate instances of a problem. Here we propose a novel variant of PRL with State Action Space Partitioning (SASP). PRL agents are each assigned to a specific regio...
Developing Adaptive Traffic Signal Control strategies for efficient urban traffic management is a challenging problem. Reinforcement Learning (RL) has been shown to be a promising approach when applied to traffic signal control (TSC) problems. When using RL agents for TSC, difficulties may arise with respect to convergen...
Reinforcement Learning (RL) is a commonly used and effective Machine Learning technique, but can perform poorly when faced with complex problems, leading to a slow rate of convergence. Parallel Learning (PL) is a novel paradigm within RL that seeks to address these concerns. In PL, multiple agents pool their experiences while learning concurrently o...
The development of Adaptive Traffic Signal Control strategies for efficient urban traffic management is a major challenge faced by traffic engineers today. Reinforcement Learning (RL) has been shown to be a promising approach when applied to traffic signal control (TSC) problems. When using RL agents for TSC, difficulties may arise with learning sp...
The development of Adaptive Traffic Signal Control strategies for efficient urban traffic management is a major challenge faced by traffic engineers today. Reinforcement Learning (RL) has been shown to be a promising approach when applied to traffic signal control (TSC) problems. When using RL agents for TSC, difficulties may arise with learning spe...
The expansion rates of a pyritiferous Irish mudstone-siltstone fill material have been measured over a period of 19 months in an apparatus devised to replicate underfloor conditions. The testing, performed in a temperature-controlled environment, has shown that both fill density and depth submerged in water have significant influences on the progre...
The expansion rates of a pyritiferous Irish mudstone–siltstone fill material have been measured over a period of 19 months in an apparatus devised to replicate underfloor conditions. The testing, performed in a temperature-controlled environment, has shown that both fill density and depth submerged in water have significant influences on the progre...
Urban traffic congestion is now a serious issue, and improving the flow of traffic through cities is critical for environmental, social and economic reasons. The aim of this research is to develop more efficient traffic management strategies by applying Multi Agent Reinforcement Learning methods to control traffic lights. This article presents some...