Preprint

Dynamic neighbourhood optimisation for task allocation using multi-agent learning


Abstract

In large-scale systems there are fundamental challenges when centralised techniques are used for task allocation. The number of interactions is limited by resource constraints on computation, storage, and network communication. Scalability can be increased by implementing the system as a distributed task-allocation system, sharing tasks across many agents. However, distribution raises the resource cost of communication and synchronisation, which is itself difficult to scale. In this paper we present three algorithms that combine to solve these problems. We focus on distributed agent systems where learning behaviours are constrained by resource usage limits, restricting agents to local rather than system-wide knowledge. We evaluate these algorithms in a simulated environment where agents are given a task composed of multiple subtasks that must be allocated to other agents with differing capabilities for carrying out tasks. We also simulate real-life system effects such as networking instability. Our solution is shown to solve the task allocation problem to within 6.7% of the theoretical optimum for the system configurations considered. It provides 5× better performance recovery than approaches without knowledge retention when system connectivity is impacted, and it is tested on systems of up to 100 agents with less than a 9% impact on the algorithms' performance.
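The paper's three algorithms are not reproduced on this page, but the following minimal Python sketch (all names hypothetical) illustrates the setting the abstract describes: an agent that holds value estimates only for its local neighbourhood and learns, from allocation outcomes, which neighbour to hand each subtask type to. Retaining the learned table across connectivity disruptions is the kind of knowledge retention the abstract's recovery comparison refers to; a no-retention baseline would simply reinitialise it.

    import random
    from collections import defaultdict

    class NeighbourhoodAllocator:
        """Illustrative agent: learns which local neighbours handle
        each subtask type well, using local knowledge only."""

        def __init__(self, neighbours, alpha=0.1, epsilon=0.1):
            self.neighbours = list(neighbours)  # local view, not system-wide
            self.alpha = alpha                  # learning rate
            self.epsilon = epsilon              # exploration rate
            self.q = defaultdict(float)         # (subtask_type, neighbour) -> value

        def allocate(self, subtask_type):
            # Epsilon-greedy choice among currently reachable neighbours.
            if random.random() < self.epsilon:
                return random.choice(self.neighbours)
            return max(self.neighbours,
                       key=lambda n: self.q[(subtask_type, n)])

        def feedback(self, subtask_type, neighbour, reward):
            # One-step value update from the observed allocation outcome.
            key = (subtask_type, neighbour)
            self.q[key] += self.alpha * (reward - self.q[key])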


References
Conference Paper
Full-text available
Hierarchical Reinforcement Learning (HRL) outperforms many 'flat' Reinforcement Learning (RL) algorithms in some application domains. However, HRL may need more time to obtain the optimal policy because of its large action space. Potential Based Reward Shaping (PBRS) has been widely used to incorporate heuristics into flat RL algorithms so as to reduce their exploration. In this paper, we investigate the integration of PBRS and HRL, and propose a new algorithm: PBRS-MAXQ-0. We prove that under certain conditions, PBRS-MAXQ-0 is guaranteed to converge. Empirical results show that PBRS-MAXQ-0 significantly outperforms MAXQ-0 given good heuristics, and can converge even when given misleading heuristics.
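For reference, potential-based shaping augments the environment reward with F(s, s') = gamma * Phi(s') - Phi(s) for a potential function Phi over states, a transformation shown by Ng, Harada and Russell (cited below) to preserve the optimal policy. A minimal sketch, with Phi supplied by the designer:

    def shaped_reward(reward, phi_s, phi_s_next, gamma=0.99):
        # Potential-based shaping term F = gamma*Phi(s') - Phi(s);
        # adding it to the reward leaves the optimal policy unchanged.
        return reward + gamma * phi_s_next - phi_s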
Article
Full-text available
The Infrastructure-as-a-Service model of cloud computing allocates resources in the form of virtual machines that can be resized and live migrated at runtime. This paper presents a novel distributed resource allocation approach for VM consolidation relying on multi-agent systems. Our approach uses a utility function based on host CPU utilization to drive live migration actions. Experimental results show reduced service level agreement violations and a better overall performance compared to a centralized approach and a threshold-based distributed approach.
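The paper's exact utility function is not given here, but a toy version of a utility-driven migration rule of this kind (illustrative shape and thresholds) could look like:

    def host_utility(cpu_util, target=0.7, width=0.2):
        # Peak utility near a target utilisation; penalise idle and
        # overloaded hosts alike (illustrative shape only).
        return 1.0 - ((cpu_util - target) / width) ** 2

    def should_migrate(vm_load, src_util, dst_util):
        # Trigger a live migration only if total utility improves.
        before = host_utility(src_util) + host_utility(dst_util)
        after = (host_utility(src_util - vm_load)
                 + host_utility(dst_util + vm_load))
        return after > before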
Article
Full-text available
Congestion problems are omnipresent in today's complex networks and represent a challenge in many research domains. In the context of Multi-agent Reinforcement Learning (MARL), approaches like difference rewards and resource abstraction have shown promising results in tackling such problems. Resource abstraction was shown to be an ideal candidate for solving large-scale resource allocation problems in a fully decentralized manner. However, its performance and applicability strongly depends on some, until now, undocumented assumptions. Two of the main congestion benchmark problems considered in the literature are: the Beach Problem Domain and the Traffic Lane Domain. In both settings the highest system utility is achieved when overcrowding one resource and keeping the rest at optimum capacity. We analyse how abstract grouping can promote this behaviour and how feasible it is to apply this approach in a real-world domain (i.e., what assumptions need to be satisfied and what knowledge is necessary). We introduce a new test problem, the Road Network Domain (RND), where the resources are no longer independent, but rather part of a network (e.g., road network), thus choosing one path will also impact the load on other paths having common road segments. We demonstrate the application of state-of-the-art MARL methods for this new congestion model and analyse their performance. RND allows us to highlight an important limitation of resource abstraction and show that the difference rewards approach manages to better capture and inform the agents about the dynamics of the environment.
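The difference reward referred to here is commonly written D_i = G(z) - G(z_{-i}), the global utility with agent i's action included versus replaced by a default; it isolates each agent's marginal contribution. A minimal sketch, assuming a caller-supplied system_utility function over joint actions:

    def difference_reward(system_utility, joint_action, i, default_action):
        # D_i = G(z) - G(z with agent i's action replaced by a default,
        # e.g. "stay home" in the Beach Problem Domain).
        counterfactual = list(joint_action)
        counterfactual[i] = default_action
        return system_utility(joint_action) - system_utility(counterfactual)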
Article
Full-text available
Reinforcement learning has significant applications for multi-agent systems, especially in unknown dynamic environments. However, most multi-agent reinforcement learning (MARL) algorithms suffer from such problems as exponential computation complexity in the joint state-action space, which makes it difficult to scale up to realistic multi-agent problems. In this paper, a novel algorithm named negotiation-based MARL with sparse interactions (NegoSI) is presented. In contrast to traditional sparse-interaction based MARL algorithms, NegoSI adopts the equilibrium concept and makes it possible for agents to select the non-strict Equilibrium Dominating Strategy Profile (non-strict EDSP) or Meta equilibrium for their joint actions. The presented NegoSI algorithm consists of four parts: the equilibrium-based framework for sparse interactions, the negotiation for the equilibrium set, the minimum variance method for selecting one joint action and the knowledge transfer of local Q-values. In this integrated algorithm, three techniques, i.e., unshared value functions, equilibrium solutions and sparse interactions are adopted to achieve privacy protection, better coordination and lower computational complexity, respectively. To evaluate the performance of the presented NegoSI algorithm, two groups of experiments are carried out regarding three criteria: steps of each episode (SEE), rewards of each episode (REE) and average runtime (AR). The first group of experiments is conducted using six grid world games and shows fast convergence and high scalability of the presented algorithm. Then in the second group of experiments NegoSI is applied to an intelligent warehouse problem and simulated results demonstrate the effectiveness of the presented NegoSI algorithm compared with other state-of-the-art MARL algorithms.
Article
Full-text available
In complex, open, and heterogeneous environments, agents must be able to reorganize towards the most appropriate organizations to adapt to unpredictable environment changes within Multi-Agent Systems (MAS). Reorganization can be viewed at two levels: the individual agent level (micro-level), in which an agent changes its behaviors and interactions with other agents to adapt to its local environment, and the organizational level (macro-level), in which the whole system changes its structure by adding or removing agents. This chapter is dedicated to an overview of the different aspects of what is called MAS Organization, including its motivations, paradigms, models, and techniques adopted for statically or dynamically organizing agents in MAS.
Article
Full-text available
Wireless sensor networks (WSNs) have been widely investigated in recent years. One of the fundamental issues in WSNs is packet routing, because in many application domains, packets have to be routed from source nodes to destination nodes as soon and as energy efficiently as possible. To address this issue, a large number of routing approaches have been proposed. Although every existing routing approach has advantages, they also have some disadvantages. In this paper, a multi-agent framework is proposed that can assist existing routing approaches to improve their routing performance. This framework enables each sensor node to build a cooperative neighbour set based on past routing experience. Such cooperative neighbours, in turn, can help the sensor to effectively relay packets in the future. This framework is independent of existing routing approaches and can be used to assist many existing routing approaches. Simulation results demonstrate the good performance of this framework in terms of four metrics: average delivery latency, successful delivery ratio, number of live nodes and total sensing coverage.
Article
Full-text available
Reinforcement learning offers to robotics a framework and set of tools for the design of sophisticated and hard-to-engineer behaviors. Conversely, the challenges of robotic problems provide inspiration, impact, and validation for developments in reinforcement learning. The relationship between the disciplines has sufficient promise to be likened to that between physics and mathematics. In this article, we attempt to strengthen the links between the two research communities by providing a survey of work in reinforcement learning for behavior generation in robots. We highlight key challenges in robot reinforcement learning as well as notable successes. We discuss how contributions tamed the complexity of the domain and study the role of algorithms, representations, and prior knowledge in achieving these successes. As a result, a particular focus of our paper lies on the choice between model-based and model-free as well as between value-function-based and policy-search methods. By analyzing a simple problem in some detail we demonstrate how reinforcement learning approaches may be profitably applied, and we note throughout open questions and the tremendous potential for future research.
Article
Full-text available
The dynamicity of distributed wireless networks, caused by node mobility, dynamic network topology, and other factors, has been a major challenge to routing in such networks. In traditional routing schemes, routing decisions of a wireless node may solely depend on a predefined set of routing policies, which may only be suitable for certain network circumstances. Reinforcement Learning (RL) has been shown to address this routing challenge by enabling wireless nodes to observe and gather information from their dynamic local operating environment, learn, and make efficient routing decisions on the fly. In this article, we focus on the application of the traditional, as well as the enhanced, RL models to routing in wireless networks. The routing challenges associated with different types of distributed wireless networks, and the advantages brought about by the application of RL to routing, are identified. In general, three types of RL models have been applied to routing schemes in order to improve network performance, namely Q-routing, multi-agent reinforcement learning, and partially observable Markov decision processes. We provide an extensive review of new features in RL-based routing, and how various routing challenges and problems have been approached using RL. We also present a real hardware implementation of an RL-based routing scheme. Subsequently, we present performance enhancements achieved by the RL-based routing schemes. Finally, we discuss various open issues related to RL-based routing schemes in distributed wireless networks, which help to explore new research directions in this area. Discussions in this article are presented in a tutorial manner in order to establish a foundation for further research in this field.
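The Q-routing model mentioned here follows Boyan and Littman's update, in which node x refines its estimate Q_x(d, y) of the time to deliver a packet to destination d via neighbour y: Q_x(d, y) <- Q_x(d, y) + alpha * (q + s + min_z Q_y(d, z) - Q_x(d, y)), where q is the time spent in x's queue and s the transmission time to y. A minimal sketch over nested dictionaries (structure assumed for illustration):

    def q_routing_update(Q, x, d, y, queue_delay, link_delay,
                         neighbours_of_y, alpha=0.5):
        # Q[x][d][y]: node x's estimated delivery time to d via neighbour y.
        # If y is the destination itself, its remaining time is zero.
        best_from_y = 0.0 if y == d else min(Q[y][d][z] for z in neighbours_of_y)
        target = queue_delay + link_delay + best_from_y
        Q[x][d][y] += alpha * (target - Q[x][d][y])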
Article
Full-text available
Multiagent systems (MAS) are widely accepted as an important method for solving problems of a distributed nature. A key to the success of MAS is efficient and effective multiagent learning (MAL). The past twenty-five years have seen great interest and tremendous progress in the field of MAL. This article introduces and overviews this field by presenting its fundamentals, sketching its historical development and describing some key algorithms for MAL. Moreover, the main challenges that the field is facing today are identified.
Article
Full-text available
This paper describes the concept of sensor networks which has been made viable by the convergence of micro-electro-mechanical systems technology, wireless communications and digital electronics. First, the sensing tasks and the potential sensor networks applications are explored, and a review of factors influencing the design of sensor networks is provided. Then, the communication architecture for sensor networks is outlined, and the algorithms and protocols developed for each layer in the literature are explored. Open research issues for the realization of sensor networks are also discussed.
Conference Paper
Full-text available
Creating coordinated multiagent policies in environments with uncertainty is a challenging problem, which can be greatly simplified if the coordination needs are known to be limited to specific parts of the state space, as previous work has successfully shown. In this work, we assume that such needs are unknown and we investigate coordination learning in multiagent settings. We contribute a reinforcement learning based algorithm in which independent decision-makers/agents learn both individual policies and when and how to coordinate. We focus on problems in which the interaction between the agents is sparse, exploiting this property to minimize the coupling of the learning processes for the different agents. We introduce a two-layer extension of Q-learning, in which we augment the action space of each agent with a coordination action that uses information from other agents to decide the correct action. Our results show that our agents learn both to act in a coordinated manner and to act independently, in the different regions of the space where they need to, and need not to, coordinate, respectively.
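A minimal sketch of the two-layer idea (structure and names hypothetical; the paper's algorithm is richer): each agent's primitive action set is augmented with a coordination meta-action that, when chosen, resolves the final action using state information from another agent.

    import random

    COORDINATE = "coordinate"

    def select_action(q_local, q_joint, my_state, other_state,
                      actions, epsilon=0.1):
        # Layer 1: epsilon-greedy over primitive actions plus the
        # coordination meta-action (unseen pairs default to 0.0).
        options = actions + [COORDINATE]
        if random.random() < epsilon:
            choice = random.choice(options)
        else:
            choice = max(options, key=lambda a: q_local.get((my_state, a), 0.0))
        if choice != COORDINATE:
            return choice
        # Layer 2: resolve the meta-action with the state augmented
        # by the other agent's observed state.
        return max(actions,
                   key=lambda a: q_joint.get((my_state, other_state, a), 0.0))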
Conference Paper
Full-text available
We introduce the use of learned shaping rewards in reinforcement learning tasks, where an agent uses prior experience on a sequence of tasks to learn a portable predictor that estimates intermediate rewards, resulting in accelerated learning in later tasks that are related but distinct. Such agents can be trained on a sequence of relatively easy tasks in order to develop a more informative measure of reward that can be transferred to improve performance on more difficult tasks without requiring a hand-coded shaping function. The method maintains one representation in problem-space that is Markov for the particular task at hand, and one in agent-space that may not be Markov but that is retained across successive task instances (each of which may require a new problem-space, possibly of a different size). The agent learns to initially estimate reward for novel states from "sensations" in agent-space in order to speed up reinforcement learning in problem-space. Although this method also applies to other types of sequential decision problems, we focus here on goal-directed exploration tasks because they most clearly illustrate the point, and we present the results of a rod-positioning experiment in which the method significantly improves performance after only a brief period of training.
Conference Paper
Full-text available
In Cloud service composition, collaboration between brokers and service providers is essential to promptly satisfy incoming Cloud consumer requirements. These requirements should be mapped to Cloud resources, which are accessed via web services, in an automated manner. However, distributed and constantly changing Cloud-computing environments pose new challenges to automated service composition such as: (i) dynamically contracting service providers, which set service fees on a supply-and-demand basis, and (ii) dealing with incomplete information regarding Cloud resources (e.g., location and providers). To address these issues, in this work, an agent-based Cloud service composition approach is presented. Cloud participants and resources are implemented and instantiated by agents. These agents sustain a three-layered self-organizing multi-agent system that establishes a Cloud service composition framework and an experimental test bed. The self-organizing agents make use of acquaintance networks and the contract net protocol to evolve and adapt Cloud service compositions. The experimental results indicate that service composition is efficiently achieved despite dealing with incomplete information as well as coping with dynamic service fees.
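The contract net protocol the agents use follows an announce-bid-award cycle. A compressed sketch, assuming a hypothetical contractor interface with can_perform, bid and award methods:

    def contract_net(task, contractors):
        # 1. Announce: offer the task to acquainted contractor agents.
        # 2. Bid: each capable contractor prices the task, e.g. on a
        #    supply-and-demand basis.
        bids = {c: c.bid(task) for c in contractors if c.can_perform(task)}
        if not bids:
            return None  # no capable contractor reachable
        # 3. Award: grant the contract to the cheapest bidder.
        winner = min(bids, key=bids.get)
        winner.award(task)
        return winner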
Conference Paper
Full-text available
Resource allocation in computing clusters is traditionally centralized, which limits the cluster scale. Effective resource allocation in a network of computing clusters may enable building larger computing infrastructures. We consider this problem as a novel application for multiagent learning (MAL). We propose a MAL algorithm and apply it for optimizing online resource allocation in cluster networks. The learning is distributed to each cluster, using local information only and without access to the global system reward. Experimental results are encouraging: our multiagent learning approach performs reasonably well, compared to an optimal solution, and better than a centralized myopic allocation approach in some cases.
Article
Full-text available
Cassandra is a distributed storage system for managing very large amounts of structured data spread out across many commodity servers, while providing highly available service with no single point of failure. Cassandra aims to run on top of an infrastructure of hundreds of nodes (possibly spread across different data centers). At this scale, small and large components fail continuously. The way Cassandra manages the persistent state in the face of these failures drives the reliability and scalability of the software systems relying on this service. While in many ways Cassandra resembles a database and shares many design and implementation strategies therewith, Cassandra does not support a full relational data model; instead, it provides clients with a simple data model that supports dynamic control over data layout and format. The Cassandra system was designed to run on cheap commodity hardware and handle high write throughput while not sacrificing read efficiency.
Article
Full-text available
Monitoring of the marine environment has come to be a field of scientific interest in the last ten years. The instruments used in this work have ranged from small-scale sensor networks to complex observation systems. Among small-scale networks, Wireless Sensor Networks (WSNs) are a highly attractive solution in that they are easy to deploy, operate and dismantle and are relatively inexpensive. The aim of this paper is to identify, appraise, select and synthesize all high quality research evidence relevant to the use of WSNs in oceanographic monitoring. The literature is systematically reviewed to offer an overview of the present state of this field of study and identify the principal resources that have been used to implement networks of this kind. Finally, this article details the challenges and difficulties that have to be overcome if these networks are to be successfully deployed.
Article
Full-text available
Multiagent systems are rapidly finding applications in a variety of domains, including robotics, distributed control, telecommunications, and economics. The complexity of many tasks arising in these domains makes them difficult to solve with preprogrammed agent behaviors. The agents must, instead, discover a solution on their own, using learning. A significant part of the research on multiagent learning concerns reinforcement learning techniques. This paper provides a comprehensive survey of multiagent reinforcement learning (MARL). A central issue in the field is the formal statement of the multiagent learning goal. Different viewpoints on this issue have led to the proposal of many different goals, among which two focal points can be distinguished: stability of the agents' learning dynamics, and adaptation to the changing behavior of the other agents. The MARL algorithms described in the literature aim---either explicitly or implicitly---at one of these two goals or at a combination of both, in a fully cooperative, fully competitive, or more general setting. A representative selection of these algorithms is discussed in detail in this paper, together with the specific issues that arise in each category. Additionally, the benefits and challenges of MARL are described along with some of the problem domains where the MARL techniques have been applied. Finally, an outlook for the field is provided.
Conference Paper
Autonomic computing is being advocated as a tool for maintaining and managing large and complex computing systems. Self-organising multi-agent systems provide a suitable paradigm for developing such autonomic systems. Towards this goal, we demonstrate a robust, decentralised approach for structural adaptation in explicitly modelled problem solving agent organisations. Our method is based on self-organisation principles and enables the agents to modify the organisational structure to achieve a better allocation of tasks across the organisation in a simulated task-solving environment. The agents forge and dissolve relations with other agents using their history of interactions as guidance. We empirically show that the efficiency of organisations using our approach is close to that of organisations having an omniscient central allocator and considerably better than static organisations or those changing the structure randomly.
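A minimal sketch of such history-guided restructuring, assuming relations carry a scalar strength that is reinforced by successful interactions and decays otherwise (update rule and thresholds are illustrative, not the paper's):

    def adapt_relations(strengths, partner, outcome,
                        decay=0.95, forge_at=0.6, dissolve_at=0.1):
        # Decay all relation strengths, reinforce the partner just
        # interacted with, then report which relations to forge or
        # dissolve based on the thresholds.
        for p in strengths:
            strengths[p] *= decay
        strengths[partner] = strengths.get(partner, 0.0) + outcome
        forged = [p for p, s in strengths.items() if s >= forge_at]
        dissolved = [p for p, s in strengths.items() if s <= dissolve_at]
        return forged, dissolved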
Article
Safe Reinforcement Learning can be defined as the process of learning policies that maximize the expectation of the return in problems in which it is important to ensure reasonable system performance and/or respect safety constraints during the learning and/or deployment processes. We categorize and analyze two approaches to Safe Reinforcement Learning. The first is based on the modification of the optimality criterion, the classic discounted finite/infinite horizon, with a safety factor. The second is based on the modification of the exploration process through the incorporation of external knowledge or the guidance of a risk metric. We use the proposed classification to survey the existing literature, as well as to suggest future directions for Safe Reinforcement Learning.
Article
Cloud computing is becoming an attractive alternative as a flexible and affordable on-demand environment for deploying custom applications in the form of services. In this work, a self-organizing system model based on a dynamic relation network is proposed. In the model, autonomic elements self-adapt the weights of their relationships under the guidance of the self-organizing policy. Based on this organization model, we present a service-oriented dynamic self-organizing algorithm for Cloud computing, which implements autonomic-element self-organization through service finding, service composition, task implementation and service optimization.
Article
Reward functions are an essential component of many robot learning methods. Defining such functions, however, remains hard in many practical applications. For tasks such as grasping, there are no reliable success measures available. Defining reward functions by hand requires extensive task knowledge and often leads to undesired emergent behavior. We introduce a framework, wherein the robot simultaneously learns an action policy and a model of the reward function by actively querying a human expert for ratings. We represent the reward model using a Gaussian process and evaluate several classical acquisition functions (AFs) from the Bayesian optimization literature in this context. Furthermore, we present a novel AF, expected policy divergence. We demonstrate results of our method for a robot grasping task and show that the learned reward function generalizes to a similar task. Additionally, we evaluate the proposed novel AF on a real robot pendulum swing-up task.
Conference Paper
The complexity of computer applications is increasing rapidly and will continue to do so in the near future. This complexity comes from the inherent properties of such applications: the great number of components involved, the distribution of their control and skills, the nonlinearity of their processes and their increasing openness. It is also caused by unpredictable coupling with their environment due to high dynamicity. To meet these requirements, systems have to adapt themselves in order to be robust and efficient. This paper deals with self-adaptation in software systems, particularly from a multi-agent viewpoint, and focuses on the Adaptive Multi-Agent Systems theory.
Article
Service composition in multi-Cloud environments must coordinate self-interested participants, automate service selection, (re)configure distributed services, and deal with incomplete information about Cloud providers and their services. This work proposes an agent-based approach to compose services in multi-Cloud environments for different types of Cloud services: one-time virtualized services, e.g., processing a rendering job, persistent virtualized services, e.g., infrastructure-as-a-service scenarios, vertical services, e.g., integrating homogeneous services, and horizontal services, e.g., integrating heterogeneous services. Agents are endowed with a semi-recursive contract net protocol and service capability tables (information catalogs about Cloud participants) to compose services based on consumer requirements. Empirical results obtained from an agent-based testbed show that agents in this work can: successfully compose services to satisfy service requirements, autonomously select services based on dynamic fees, effectively cope with constantly changing consumers' service needs that trigger updates, and compose services in multiple Clouds even with incomplete information about Cloud participants.
Article
As computer networks (and computational grids) become increasingly complex, the problem of allocating resources within such networks, in a distributed fashion, will become more and more of a design and implementation concern. This is especially true where the allocation involves distributed collections of resources rather than just a single resource, where there are alternative patterns of resources with different levels of utility that can satisfy the desired allocation, and where this allocation process must be done in soft real-time. This book is the first of its kind to examine solutions to this problem using ideas taken from the field of multiagent systems. The field of multiagent systems has itself seen an exponential growth in the past decade, and has developed a variety of techniques for distributed resource allocation.
The component placement problem involves mapping a component to a particular location and maximising component utility in grid and cloud systems. It is an NP-hard resource allocation and deployment problem, and many common grid and cloud computing libraries, such as MPICH and Hadoop, do not address it, even though large performance gains can result from optimising communications between nodes. This paper provides four contributions to research on the component placement problem for grid and cloud computing environments. First, we present the multi-agent distributed adaptive resource allocation (MADARA) toolkit, which is designed to address grid and cloud allocation and deployment needs. Second, we present a heuristic called the comparison-based iteration by degree (CID) heuristic, which we use to approximate optimal deployments in MADARA. Third, we analyse the performance of applying the CID heuristic to approximate common grid and cloud operations, such as broadcast, gather and reduce. Fourth, we evaluate the results of applying genetic programming mutation to improve our CID heuristic.
Article
This paper describes a formulation of reinforcement learning that enables learning in noisy, dynamic environments such as in the complex concurrent multi-robot learning domain. The methodology involves minimizing the learning space through the use of behaviors and conditions, and dealing with the credit assignment problem through shaped reinforcement in the form of heterogeneous reinforcement functions and progress estimators. We experimentally validate the approach on a group of four mobile robots learning a foraging task.
Article
In today's competitive industry marketplace, companies face growing demands to improve process efficiencies, comply with environmental regulations, and meet corporate financial objectives. Given the increasing age of many industrial systems and the dynamic industrial manufacturing market, intelligent and low-cost industrial automation systems are required to improve the productivity and efficiency of such systems. The collaborative nature of industrial wireless sensor networks (IWSNs) brings several advantages over traditional wired industrial monitoring and control systems, including self-organization, rapid deployment, flexibility, and inherent intelligent-processing capability. In this regard, IWSN plays a vital role in creating a highly reliable and self-healing industrial system that rapidly responds to real-time events with appropriate actions. In this paper, first, technical challenges and design principles are introduced in terms of hardware development, system architectures and protocols, and software development. Specifically, radio technologies, energy-harvesting techniques, and cross-layer design for IWSNs have been discussed. In addition, IWSN standards are presented for the system owners, who plan to utilize new IWSN technologies for industrial automation applications. In this paper, our aim is to provide a contemporary look at the current state of the art in IWSNs and discuss the still-open research issues in this field and, hence, to make the decision-making process more effective and direct.
Article
This paper seeks to demonstrate that autonomic behaviour is not restricted to resource-rich systems, as typified by large servers, but can be incorporated into distributed and computationally challenged devices. Methods by which wireless sensor networks can benefit from the use of autonomic techniques, without being overburdened with additional computing costs, are discussed. This is achieved through the use of multi-agent systems (MAS). The discussion is grounded in the development of an autonomic wireless sensor network aimed at environmental sensing, an environmental nervous system. Finally, we provide empirical evidence of self-management via the use of distributed agents.
Conference Paper
We consider the scaling of the number of examples necessary to achieve good performance in distributed, cooperative, multi-agent reinforcement learning, as a function of the number of agents n. We prove a worst-case lower bound showing that algorithms that rely solely on a global reward signal to learn policies confront a fundamental limit: they require a number of real-world examples that scales roughly linearly in the number of agents. For settings of interest with a very large number of agents, this is impractical. We demonstrate, however, that there is a class of algorithms that, by taking advantage of local reward signals in large distributed Markov Decision Processes, are able to ensure good performance with a number of samples that scales as O(log n). This makes them applicable even in settings with a very large number of agents n.
Conference Paper
One of the main advantages of Reinforcement Learning is the capability of dealing with a delayed reward signal. Using an appropriate backup diagram, rewards are backpropagated through the state space. This allows agents to learn to take the correct action that results in the highest future (discounted) reward, even if that action results in a suboptimal immediate reward in the current state. In a multi-agent environment, agents can use the same principles as in single agent RL, but have to apply them in a complete joint-state-joint-action space to guarantee optimality. Learning in such a state space can however be very slow. In this paper we present our approach for mitigating this problem. Future Coordinating Q-learning (FCQ-learning) detects strategic interactions between agents several timesteps before these interactions occur. FCQ-learning uses the same principles as CQ-learning [3] to detect the states in which interaction is required, but several timesteps before this is reflected in the reward signal. In these states, the algorithm will augment the state information to include information about other agents which is used to select actions. The techniques presented in this paper are the first to explicitly deal with a delayed reward signal when learning using sparse interactions.
Conference Paper
Air traffic flow management is one of the fundamental challenges facing the Federal Aviation Administration (FAA) today. The FAA estimates that in 2005 alone, there were over 322,000 hours of delays at a cost to the industry in excess of three billion dollars. Finding reliable and adaptive solutions to the flow management problem is of paramount importance if the Next Generation Air Transportation Systems are to achieve the stated goal of accommodating three times the current traffic volume. This problem is particularly complex as it requires the integration and/or coordination of many factors including: new data (e.g., changing weather info), potentially conflicting priorities (e.g., different airlines), limited resources (e.g., air traffic controllers) and very heavy traffic volume (e.g., over 40,000 flights over the US airspace). In this paper we use FACET - an air traffic flow simulator developed at NASA and used extensively by the FAA and industry - to test a multi-agent algorithm for traffic flow management. An agent is associated with a fix (a specific location in 2D space) and its action consists of setting the separation required among the airplanes going through that fix. Agents use reinforcement learning to set this separation and their actions speed up or slow down traffic to manage congestion. Our FACET-based results show that agents receiving personalized rewards reduce congestion by up to 45% over agents receiving a global reward and by up to 67% over a current industry approach (Monte Carlo estimation).
Conference Paper
In many multi-agent learning problems, it is difficult to determine, a priori, the agent reward structure that will lead to good performance. This problem is particularly pronounced in continuous, noisy domains ill-suited to simple table backup schemes commonly used in TD(λ)/Q-learning. In this paper, we present a new reward evaluation method that provides a visualization of the tradeoff between coordination among the agents and the difficulty of the learning problem each agent faces. This method is independent of the learning algorithm and is only a function of the problem domain and the agents' reward structure. We then use this reward property visualization method to determine an effective reward without performing extensive simulations. We test this method in both a static and a dynamic multi-rover learning domain where the agents have continuous state spaces and where their actions are noisy (e.g., the agents' movement decisions are not always carried out properly). Our results show that in the more difficult dynamic domain, the reward efficiency visualization method provides a two order of magnitude speedup in selecting a good reward. Most importantly it allows one to quickly create and verify rewards tailored to the observational limitations of the domain.
Conference Paper
Most existing algorithms for structural credit assignment are developed for competitive reinforcement learning systems. In a competitive reinforcement learning system, agents are activated one by one, so there is only one active agent at a time and structural credit assignment can be implemented by temporal credit assignment algorithms. In collaborative reinforcement learning systems, agents are activated simultaneously, so transforming the global reinforcement signal fed back from the environment into a reinforcement vector is a crucial difficulty that cannot be sidestepped. In this article, a first practical and efficient structural credit assignment algorithm for collaborative reinforcement learning systems is presented. The experiments show that the algorithm converges very rapidly and that the assignment results are quite satisfactory.
Article
Typescript. Thesis (Ph. D.)--University of Massachusetts at Amherst, 1984. Includes bibliographical references (leaves 200-210). Microfilm.
Conference Paper
Service-oriented computing (SOC) research and applications have been largely limited to software development in electronic and Web based applications. Service-oriented robotics software development extends SOC from its current fields to a new domain, which was considered not feasible because of the efficiency issues in terms of computing and communication. This paper presents the concepts, principles, and methods in SOC and SOC-based robotics software development and applies them in the design of distributed robotics applications. Two case studies, Intel security robot design and a maze-traversing robot application in Microsoft Robotics Studio, are used to illustrate the concepts and methods.
Article
In large multiagent games, partial observability, coordination, and credit assignment persistently plague attempts to design good learning algorithms. We provide a simple and efficient algorithm that in part uses a linear system to model the world from a single agent’s limited perspective, and takes advantage of Kalman filtering to allow an agent to construct a good training signal and effectively learn a near-optimal policy in a wide variety of settings. A sequence of increasingly complex empirical tests verifies the efficacy of this technique.
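A scalar sketch of that filtering idea (simplified relative to the paper; parameters illustrative): model the global reward as the agent's own reward plus a slowly drifting term b contributed by the other agents, track b with a Kalman filter, and train on what remains.

    def kalman_credit(global_reward, b_est, p, q=0.01, r=1.0):
        # Model: global_reward = personal_reward + b, with b following
        # a random walk (process variance q) and personal_reward acting
        # as measurement noise (variance r). Filtering out b leaves a
        # cleaner individual training signal.
        p = p + q                      # predict: b may have drifted
        k = p / (p + r)                # Kalman gain
        b_est = b_est + k * (global_reward - b_est)
        p = (1.0 - k) * p              # update uncertainty
        personal = global_reward - b_est
        return personal, b_est, p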
Multi-agent Reinforcement Learning: An Overview
  • L Buşoniu
  • R Babuška
  • De Schutter
Buşoniu, L., Babuška, R., and De Schutter, B. Multi-agent Reinforcement Learning: An Overview. Proceedings of the 2nd International Conference on Multi-Agent Systems 19 (2010), 183-221.
Shaping fitness functions for coevolving cooperative multiagent systems
  • M Colby
  • K Tumer
Colby, M., and Tumer, K. Shaping fitness functions for coevolving cooperative multiagent systems. In AAMAS '12: Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems (2012), 1-8.
Learning multi-agent state space representations
  • Y M De Hauwere
  • P Vrancx
  • A Nowé
De Hauwere, Y.-M., Vrancx, P., and Nowé, A. Learning multi-agent state space representations. In Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS (2010).
Self-organisation: Paradigms and applications
  • G Di Marzo Serugendo
  • N Foukia
  • S Hassas
  • A Karageorgos
  • S K Mostéfaoui
  • O F Rana
  • M Ulieru
  • P Valckenaers
  • C Van Aart
Di Marzo Serugendo, G., Foukia, N., Hassas, S., Karageorgos, A., Mostéfaoui, S. K., Rana, O. F., Ulieru, M., Valckenaers, P., and Van Aart, C. Self-organisation: Paradigms and applications. In Lecture Notes in Artificial Intelligence (Subseries of Lecture Notes in Computer Science) (2004).
Raft refloated: Do we have consensus?
  • H Howard
  • M Schwarzkopf
  • A Madhavapeddy
  • J Crowcroft
Howard, H., Schwarzkopf, M., Madhavapeddy, A., and Crowcroft, J. Raft refloated: Do we have consensus? In Operating Systems Review (ACM) (2015).
Microservices as agents in IoT systems
  • P Krivic
  • P Skocir
  • M Kusek
  • G Jezic
Krivic, P., Skocir, P., Kusek, M., and Jezic, G. Microservices as agents in IoT systems. Smart Innovation, Systems and Technologies 74, January (2018), 22-31.
A theoretical and empirical analysis of reward transformations in multi-objective stochastic games
  • P Mannion
  • J Duggan
  • E Howley
Mannion, P., Duggan, J., and Howley, E. A theoretical and empirical analysis of reward transformations in multi-objective stochastic games. In Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS (2017).
Reward Design in Cooperative Multi-agent Reinforcement Learning for Packet Routing
  • H Mao
  • Z Gong
  • Z Xiao
Mao, H., Gong, Z., and Xiao, Z. Reward Design in Cooperative Multi-agent Reinforcement Learning for Packet Routing, 2020.
Automatic shaping and decomposition of reward functions
  • B Marthi
Marthi, B. Automatic shaping and decomposition of reward functions. In ACM International Conference Proceeding Series (2007).
Policy invariance under reward transformations: Theory and application to reward shaping
  • A Y Ng
  • D Harada
  • S Russell
Ng, A. Y., Harada, D., and Russell, S. Policy invariance under reward transformations: Theory and application to reward shaping. In Proceedings of the Sixteenth International Conference on Machine Learning (1999).
In search of an understandable consensus algorithm
  • D Ongaro
  • J Ousterhout
Ongaro, D., and Ousterhout, J. In search of an understandable consensus algorithm. In Proceedings of the 2014 USENIX Annual Technical Conference, USENIX ATC 2014 (2014).