Chapter

Self-learning Governance of Black-Box Multi-Agent Systems

Abstract

Agents in Multi-Agent Systems (MAS) are not always built and controlled by the system designer, e.g., on electronic trading platforms. In this case, there is often a system objective that can differ from the agents' own goals (e.g., price stability). While much effort has been put into modeling and optimizing agent behavior, this paper is concerned with the platform perspective. Our model extends Stochastic Games (SG) with dynamic restriction of action spaces, yielding a new self-learning governance approach for black-box MAS. This governance learns an optimal restriction policy via Reinforcement Learning. As an alternative to the two straightforward approaches, fully centralized control and fully independent learners, this novel method combines a sufficient degree of autonomy for the agents with selective restriction of their action spaces. We demonstrate that the governance, though not explicitly instructed to leave any freedom of decision to the agents, learns that combining the agents' and its own capabilities is better than controlling all actions. As shown experimentally, the self-learning approach outperforms (w.r.t. the system objective) both "full control", where actions are always dictated without any agent autonomy, and "ungoverned MAS", where the agents simply pursue their individual goals.

Keywords: Multi-Agent System · Governance · Self-learning system · Reinforcement Learning · Electronic institution
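As an illustration of the governance loop described in the abstract, here is a minimal, hypothetical Python sketch: a tabular RL governance repeatedly chooses which subset of actions the black-box agents may use, observing only the system objective as reward. The toy environment, agent policies, and all names (agent_a, agent_b, RESTRICTIONS) are assumptions for illustration, not the chapter's actual model or experiments.

```python
import random
from collections import defaultdict

# Hypothetical toy: a governance learner restricts black-box agents' action
# spaces and is rewarded on the system objective only.
ACTIONS = (0, 1, 2)
RESTRICTIONS = [frozenset({0, 1, 2}),   # ungoverned: full autonomy
                frozenset({0, 1}),      # selective: only action 2 forbidden
                frozenset({0})]         # "full control": one action dictated

def agent_a(state, allowed):            # cooperative: matches the state
    return state if state in allowed else min(allowed)

def agent_b(state, allowed):            # deviant: prefers the harmful action 2
    if 2 in allowed:
        return 2
    return state if state in allowed else min(allowed)

q = defaultdict(float)                  # governance value per restriction
EPS, ALPHA = 0.1, 0.05
for _ in range(20000):
    state = random.randint(0, 1)        # seen by agents, NOT by governance
    i = (random.randrange(len(RESTRICTIONS)) if random.random() < EPS
         else max(range(len(RESTRICTIONS)), key=lambda j: q[j]))
    joint = (agent_a(state, RESTRICTIONS[i]), agent_b(state, RESTRICTIONS[i]))
    reward = sum(1.0 for a in joint if a == state)   # system objective
    q[i] += ALPHA * (reward - q[i])
print({i: round(q[i], 2) for i in range(3)})
# Typically the selective restriction {0, 1} scores highest here: it blocks
# the harmful action while leaving agents the autonomy to use their local
# state, mirroring the abstract's "selective restriction beats full control".
```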

References
Chapter
We introduce a modular and transparent approach for augmenting the ability of reinforcement learning agents to comply with a given norm base. The normative supervisor module functions as both an event recorder and real-time compliance checker w.r.t. an external norm base. We have implemented this module with a theorem prover for defeasible deontic logic, in a reinforcement learning agent that we task with playing a “vegan” version of the arcade game Pac-Man.
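The abstract suggests a simple interface one could imagine for such a supervisor: intercept the agent's proposed action and substitute a compliant one when a norm forbids it. The sketch below uses a plain set of prohibitions in place of the defeasible deontic logic theorem prover described in the chapter; all identifiers are hypothetical.

```python
# Minimal compliance filter in the spirit of a normative supervisor: the
# learner's chosen action is checked against a norm base and replaced by the
# best-valued compliant alternative if forbidden. The norm representation is
# a plain set of prohibitions, not the cited work's defeasible deontic logic.

def supervise(state, proposed_action, q_values, norm_base):
    """Return a norm-compliant action, preferring the agent's own ranking."""
    if (state, proposed_action) not in norm_base:
        return proposed_action
    compliant = [a for a in q_values[state] if (state, a) not in norm_base]
    if not compliant:                    # no compliant action: fall back
        return proposed_action
    return max(compliant, key=lambda a: q_values[state][a])

# Example: in state "ghost_nearby", eating a ghost is forbidden ("vegan").
norms = {("ghost_nearby", "eat_ghost")}
q = {"ghost_nearby": {"eat_ghost": 5.0, "flee": 2.0, "wait": 0.1}}
print(supervise("ghost_nearby", "eat_ghost", q, norms))   # -> "flee"
```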
Article
The advances in reinforcement learning have recorded sublime success in various domains. Although the multi-agent domain has been overshadowed by its single-agent counterpart during this progress, multi-agent reinforcement learning is gaining rapid traction, and the latest accomplishments address problems with real-world complexity. This article provides an overview of the current developments in the field of multi-agent deep reinforcement learning. We focus primarily on literature from recent years that combines deep reinforcement learning methods with a multi-agent scenario. To survey the works that constitute the contemporary landscape, the main contents are divided into three parts. First, we analyze the structure of training schemes that are applied to train multiple agents. Second, we consider the emergent patterns of agent behavior in cooperative, competitive and mixed scenarios. Third, we systematically enumerate challenges that exclusively arise in the multi-agent domain and review methods that are leveraged to cope with these challenges. To conclude this survey, we discuss advances, identify trends, and outline possible directions for future work in this research area.
Article
Reinforcement learning (RL) methods learn optimal decisions in the presence of a stationary environment. However, the stationary assumption on the environment is very restrictive. In many real world problems like traffic signal control, robotic applications, etc., one often encounters situations with non-stationary environments, and in these scenarios, RL methods yield sub-optimal decisions. In this paper, we thus consider the problem of developing RL methods that obtain optimal decisions in a non-stationary environment. The goal of this problem is to maximize the long-term discounted reward accrued when the underlying model of the environment changes over time. To achieve this, we first adapt a change point algorithm to detect change in the statistics of the environment and then develop an RL algorithm that maximizes the long-run reward accrued. We illustrate that our change point method detects change in the model of the environment effectively and thus facilitates the RL algorithm in maximizing the long-run reward. We further validate the effectiveness of the proposed solution on non-stationary random Markov decision processes, a sensor energy management problem, and a traffic signal control problem.
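A minimal sketch of the overall scheme follows, under the assumption of a naive moving-average shift test standing in for the paper's actual change-point detection algorithm.

```python
# Hypothetical minimal version of the idea: monitor reward statistics with a
# change detector and restart learning when the environment appears to have
# changed. The moving-average test is an assumed stand-in, not the paper's
# change-point method.

def detect_change(rewards, window=50, threshold=1.0):
    """Flag a change when the recent mean reward drifts from the older mean."""
    if len(rewards) < 2 * window:
        return False
    old = sum(rewards[-2 * window:-window]) / window
    new = sum(rewards[-window:]) / window
    return abs(new - old) > threshold

q_table = {}         # (state, action) -> value, filled by any RL update rule
reward_history = []

def after_each_step(reward):
    reward_history.append(reward)
    if detect_change(reward_history):
        q_table.clear()          # forget the stale model of the old regime
        reward_history.clear()   # and restart the detector as well
```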
Article
Given recent requirements for ensuring the robustness of algorithmic trading strategies laid out in the Markets in Financial Instruments Directive II, this paper proposes a novel agent-based simulation for exploring algorithmic trading strategies. Five different types of agents are present in the market. The statistical properties of the simulated market are compared with equity market depth data from the Chi-X exchange and found to be significantly similar. The model is able to reproduce a number of stylised market properties including: clustered volatility, autocorrelation of returns, long memory in order flow, concave price impact and the presence of extreme price events. The results are found to be insensitive to reasonable parameter variations.
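For readers unfamiliar with this style of model, the following deliberately tiny price loop shows only the shape of an agent-based market simulation (noise, momentum, and fundamentalist components); it is not the paper's calibrated five-agent-type model and is not claimed to reproduce the stylised facts reported there.

```python
import random

# Toy agent-based price loop: three stylised behavioural forces move the
# price each step. Purely illustrative; parameters are arbitrary assumptions.
random.seed(0)
price, fundamental = 100.0, 100.0
prices = [price]
for t in range(1000):
    noise = random.gauss(0, 0.5)                        # noise traders
    momentum = 0.3 * (prices[-1] - prices[-2]) if t > 0 else 0.0
    reversion = 0.05 * (fundamental - price)            # fundamentalists
    price += noise + momentum + reversion
    prices.append(price)
print("final price:", round(price, 2))
```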
Article
Intelligent transport systems, efficient electric grids, and sensor networks for data collection and analysis are some examples of the multi-agent systems (MAS) that cooperate to achieve common goals. Decision making is an integral part of intelligent agents and MAS that will allow such systems to accomplish increasingly complex tasks. In this survey, we investigate state of the art work within the past five years on cooperative MAS decision making models, including Markov decision processes, game theory, swarm intelligence and graph theoretic models. We survey algorithms that result in optimal and sub-optimal policies such as reinforcement learning, dynamic programming, evolutionary computing and neural networks. We also discuss the application of these models to robotics, wireless sensor networks, cognitive radio networks, intelligent transport systems and smart electric grids. In addition, we define key terms in the area and discuss remaining challenges that include incorporating big data advancements to decision making, developing autonomous, scalable and computationally efficient algorithms, tackling more complex tasks and developing standardized evaluation metrics. While recent surveys have been published on this topic, we present a broader discussion of related models and applications.
Article
Since multi-agent systems are inspired by human societies, they not only borrow their coordination mechanisms, such as conventions and norms, but also need to consider the processes that describe how norms come about, how they propagate in the society, and how they change over time. In the NorMAS community, this is best reflected in various norm life cycle conceptions that look at normative processes from a holistic perspective. While the earliest life cycle model emerged in the research field of international relations, the first life cycle model in the AI community was proposed at the 2009 NorMAS Dagstuhl workshop by Savarimuthu and Cranefield [2009] and is based on a comprehensive survey of then existing contributions to the research field. Subsequently, two further models have been proposed that offer more refined accounts of the fundamental underlying processes. In this article, we review all existing norm life cycle models (Section 2), introducing the individual life cycle models and contextualizing them with specific contributions that exemplify life cycle processes.
Chapter
Reinforcement Learning was originally developed for Markov Decision Processes (MDPs). It allows a single agent to learn a policy that maximizes a possibly delayed reward signal in a stochastic stationary environment. It guarantees convergence to the optimal policy, provided that the agent can sufficiently experiment and the environment in which it is operating is Markovian. However, when multiple agents apply reinforcement learning in a shared environment, this might be beyond the MDP model. In such systems, the optimal policy of an agent depends not only on the environment, but on the policies of the other agents as well. These situations arise naturally in a variety of domains, such as robotics, telecommunications, economics, distributed control, auctions, and traffic light control. In these domains multi-agent learning is used, either because of the complexity of the domain or because control is inherently decentralized. In such systems it is important that agents are capable of discovering good solutions to the problem at hand, either by coordinating with other learners or by competing with them. This chapter focuses on the application of reinforcement learning techniques in multi-agent systems. We describe a basic learning framework based on the economic research into game theory, and illustrate the additional complexity that arises in such systems. We also describe a representative selection of algorithms for the different areas of multi-agent reinforcement learning research.
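The core complication the chapter describes can be seen in a few lines: two independent Q-learners in a repeated coordination game, where each learner's environment is non-stationary because it contains the other learner. This toy is illustrative only, not an algorithm from the chapter.

```python
import random
from collections import defaultdict

# Two independent Q-learners in a repeated 2x2 coordination game. Each treats
# the other as part of the environment, which makes that environment
# non-stationary from its own point of view.
PAYOFF = {(0, 0): (1, 1), (1, 1): (1, 1), (0, 1): (0, 0), (1, 0): (0, 0)}

class IQL:
    def __init__(self, eps=0.1, alpha=0.1):
        self.q = defaultdict(float)
        self.eps, self.alpha = eps, alpha
    def act(self):
        if random.random() < self.eps:
            return random.randrange(2)
        return max((0, 1), key=lambda a: self.q[a])
    def learn(self, a, r):
        self.q[a] += self.alpha * (r - self.q[a])

a, b = IQL(), IQL()
for _ in range(5000):
    ja = (a.act(), b.act())
    ra, rb = PAYOFF[ja]
    a.learn(ja[0], ra); b.learn(ja[1], rb)
print(dict(a.q), dict(b.q))   # usually both settle on the same action
```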
Article
In a multiagent system, if agents' experiences can be accessed and assessed between peers for environmental modeling, they can alleviate the burden of exploration for unvisited states or unseen situations and thereby accelerate the learning process. Since building an effective and accurate model within a limited time is an important issue, especially for complex environments, this paper introduces a model-based reinforcement learning method based on a tree structure to achieve efficient modeling and lower memory consumption. The proposed algorithm tailors a Dyna-Q architecture to multiagent systems by means of a tree structure for modeling. The tree model built from real experiences is used to generate virtual experiences so that the elapsed time in learning can be reduced. This model is also suitable for knowledge sharing. The paper is inspired by the concept of knowledge sharing in multiagent systems, where an agent can construct a global model from scattered local models held by individual agents. Consequently, it can increase modeling accuracy so as to provide valid simulated experiences for indirect learning at the early stage of learning. To simplify the sharing process, the proposed method applies resampling techniques to graft partial branches of trees containing required and useful experiences disseminated from experienced peers, instead of merging whole trees. The simulation results demonstrate that the proposed sharing method achieves sample efficiency and learning acceleration in multiagent cooperation applications.
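The underlying Dyna-Q loop, in which a learned model replays simulated experience alongside real transitions, can be sketched as follows; a plain dictionary stands in here for the paper's shareable tree model, so this is an assumption-laden simplification.

```python
import random
from collections import defaultdict

# Dyna-Q skeleton: every real transition updates both the Q-table and a
# learned one-step model, and the model then replays simulated experience.
GAMMA, ALPHA, PLAN_STEPS = 0.95, 0.1, 10
q = defaultdict(float)                 # (state, action) -> value
model = {}                             # (state, action) -> (reward, next state)

def best(s, actions=(0, 1)):
    return max(actions, key=lambda a: q[(s, a)])

def dyna_q_step(s, a, r, s2):
    q[(s, a)] += ALPHA * (r + GAMMA * q[(s2, best(s2))] - q[(s, a)])
    model[(s, a)] = (r, s2)            # deterministic one-step model
    for _ in range(PLAN_STEPS):        # planning from simulated experience
        (ps, pa), (pr, ps2) = random.choice(list(model.items()))
        q[(ps, pa)] += ALPHA * (pr + GAMMA * q[(ps2, best(ps2))] - q[(ps, pa)])

dyna_q_step("s0", 0, 1.0, "s1")        # example usage with toy states
print(q[("s0", 0)])
```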
Chapter
Electronic institutions (EIs) have been proposed as a means of regulating open agent societies. EIs define the rules of the game in agent societies by fixing what agents are permitted and forbidden to do and under what circumstances. And yet, there is a need for EIs to adapt their regulations to comply with their goals while coping with varying populations of self-interested agents. In this paper we focus on the extension of EIs with autonomic capabilities to allow them to yield a dynamic answer to changing circumstances through the adaptation of their norms.
Chapter
In this article we argue that open agent organisations can be effectively designed and implemented as institutionalized electronic organisations (electronic institutions) composed of a vast amount of heterogeneous (human and software) agents playing different roles and interacting by means of speech acts. Here we take the view that the design and development of electronic institutions must be guided by a principled methodology. Along this direction, we advocate for the presence of an underlying formal method that underpins the use of structured design techniques and formal analysis, facilitating development, composition and reuse. For this purpose we propose a specification formalism for electronic institutions that founds their design, analysis and development.
Conference Paper
Although variants of value iteration have been proposed for finding Nash or correlated equilibria in general-sum Markov games, these variants have not been shown to be effective in general. In this paper, we demonstrate by construction that existing variants of value iteration cannot find stationary equilibrium policies in arbitrary general-sum Markov games. Instead, we propose an alternative interpretation of the output of value iteration based on a new (non-stationary) equilibrium concept that we call "cyclic equilibria." We prove that value iteration identifies cyclic equilibria in a class of games in which it fails to find stationary equilibria. We also demonstrate empirically that value iteration finds cyclic equilibria in nearly all examples drawn from a random distribution of Markov games.
Conference Paper
In this paper we present the Electronic Institutions Development Environment (EIDE) to support the engineering of multiagent systems as Electronic Institutions. An electronic institution defines a set of rules that structure agent interactions, establishing what agents are permitted and forbidden to do, as well as the consequences of their actions. EIDE supports and facilitates all stages of electronic institutions' engineering, from the specification of an institution's rules to their execution and monitoring.
Conference Paper
There is a growing interest in the study and development of self-* systems, motivated by the need for information systems capable of self-management in distributed, open, and dynamic scenarios. Unfortunately, there is a lack of frameworks that support the intricate task of developing self-* systems. We try to make headway along this direction by introducing a framework, EIDE-*, to support the engineering of a particular type of self-* system, namely autonomic electronic institutions: regulated environments capable of adapting their norms to comply with institutional goals despite the varying behaviours of their participating agents.
Article
This article introduces the research issues related to and definition of normative multiagent systems.
Article
This paper introduces Correlated-Q (CE-Q) learning, a multiagent Q-learning algorithm based on the correlated equilibrium (CE) solution concept. CE-Q generalizes both Nash-Q and Friend-and-Foe-Q: in general-sum games, the set of correlated equilibria contains the set of Nash equilibria; in constant-sum games, the set of correlated equilibria contains the set of minimax equilibria. This paper describes experiments with four variants of CE-Q, demonstrating empirical convergence to equilibrium policies on a testbed of general-sum Markov games.
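One building block of CE-Q can be made concrete: computing a (here utilitarian) correlated equilibrium of a stage game from the players' Q-values via linear programming. The sketch below (requiring NumPy and SciPy) shows only this subroutine; the full CE-Q algorithm additionally feeds the equilibrium value back into the Q-updates. The payoff matrices are illustrative.

```python
import numpy as np
from scipy.optimize import linprog

# LP for a utilitarian correlated equilibrium of a two-player stage game:
# maximize total expected payoff subject to the standard incentive
# constraints on the joint action distribution p.
def utilitarian_ce(Q1, Q2):
    n1, n2 = Q1.shape
    c = -(Q1 + Q2).flatten()                 # maximize total expected payoff
    A_ub, b_ub = [], []
    for a in range(n1):                      # player 1 incentive constraints
        for a2 in range(n1):
            if a2 == a:
                continue
            row = np.zeros((n1, n2))
            row[a, :] = Q1[a2, :] - Q1[a, :]  # deviating a -> a2 can't profit
            A_ub.append(row.flatten()); b_ub.append(0.0)
    for b in range(n2):                      # player 2 incentive constraints
        for b2 in range(n2):
            if b2 == b:
                continue
            row = np.zeros((n1, n2))
            row[:, b] = Q2[:, b2] - Q2[:, b]
            A_ub.append(row.flatten()); b_ub.append(0.0)
    A_eq = [np.ones(n1 * n2)]                # distribution sums to one
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * (n1 * n2))
    return res.x.reshape(n1, n2)

# Chicken-like game: a CE can put mass on the asymmetric outcomes.
Q1 = np.array([[6.0, 2.0], [7.0, 0.0]])
Q2 = np.array([[6.0, 7.0], [2.0, 0.0]])
print(utilitarian_ce(Q1, Q2).round(3))
```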
Article
We are concerned with the utility of social laws in a computational environment: laws which guarantee the successful coexistence of multiple programs and programmers. In this paper we are interested in the off-line design of social laws, where we as designers must decide ahead of time on useful social laws. In the first part of this paper we suggest the use of social laws in the domain of mobile robots, and prove analytic results about the usefulness of this approach in that setting. In the second part we present a general model of social law in a computational system and investigate some of its properties. This includes a definition of the basic computational problem involved in the design of multi-agent systems, and an investigation of the automatic synthesis of useful social laws in the framework of a model which refers explicitly to social laws.
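A toy version of the basic computational question, under the simplifying assumption that a social law is just a set of forbidden transitions: check that each agent can still reach its goal once the law is imposed.

```python
from collections import deque

# Minimal "usefulness" check for a social law in the offline-design sense:
# after forbidding some transitions, the goal must remain reachable. This is
# a plain reachability test, not the paper's synthesis algorithm.
def reachable(edges, forbidden, start, goal):
    allowed = {e for e in edges if e not in forbidden}
    frontier, seen = deque([start]), {start}
    while frontier:
        u = frontier.popleft()
        if u == goal:
            return True
        for (a, b) in allowed:
            if a == u and b not in seen:
                seen.add(b); frontier.append(b)
    return False

edges = {("s", "m"), ("m", "g"), ("s", "g")}
law = {("s", "g")}                         # the direct shortcut is forbidden
print(reachable(edges, law, "s", "g"))     # True: goal still reachable via m
```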
Article
Multiagent learning is a key problem in game theory and AI. It involves two interrelated learning problems: identifying the game and learning to play. These two problems prevail even in team games where the agents' interests do not conflict. Even team games can have multiple Nash equilibria, only some of which are optimal. We present optimal adaptive learning (OAL), the first algorithm that converges to an optimal Nash equilibrium for any team Markov game. We provide a convergence proof, and show that the algorithm's parameters are easy to set so that the convergence conditions are met. Our experiments show that existing algorithms do not converge in many of these problems while OAL does. We also demonstrate the importance of the fundamental ideas behind OAL: incomplete history sampling and biased action selection.
Chapter
Norms represent behavioural aspects that are encouraged by a social group of agents or the majority of agents in a system. Normative systems enable coordinating the synthesised norms of heterogeneous agents in complex multi-agent systems autonomously. In real applications, agents have multiple objectives that may contradict each other or contradict the synthesised norms. Therefore, agents need a mechanism to understand the impact of a suggested norm on their objectives and decide whether or not to adopt it. To address these challenges, a utility-based norm synthesis (UNS) model is proposed which allows agents to coordinate their behaviour while achieving their conflicting objectives. UNS combines a case-based reasoning technique for run-time norm synthesis in a centralised approach with a utility function, derived from the objectives of the system and its operating agents, to decide whether or not to adopt a norm. The model is evaluated using a traffic junction scenario, and the results show its efficacy in optimising multiple objectives while adopting synthesised norms.

Keywords: Norms synthesis · Multi-objective · Heterogeneous multi-agent systems
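The adoption decision itself admits a very small sketch: weigh the agent's objectives with and without the suggested norm and adopt only if the weighted utility does not degrade. The linear utility form and the weights below are assumptions for illustration, not the UNS model.

```python
# Toy adoption decision for a utility-based norm synthesis setting: compare
# weighted objective scores with and without the suggested norm.
def utility(objective_scores, weights):
    return sum(weights[k] * v for k, v in objective_scores.items())

def adopt_norm(scores_without, scores_with, weights):
    return utility(scores_with, weights) >= utility(scores_without, weights)

weights = {"throughput": 0.4, "safety": 0.6}
without = {"throughput": 0.9, "safety": 0.5}    # e.g. ignoring a give-way norm
with_norm = {"throughput": 0.7, "safety": 0.9}
print(adopt_norm(without, with_norm, weights))  # True: safety gain dominates
```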
Article
We consider un-discounted reinforcement learning (RL) in Markov decision processes (MDPs) under drifting non-stationarity, i.e., both the reward and state transition distributions are allowed to evolve over time, as long as their respective total variations, quantified by suitable metrics, do not exceed certain variation budgets. We first develop the Sliding Window Upper-Confidence bound for Reinforcement Learning with Confidence Widening (SWUCRL2-CW) algorithm, and establish its dynamic regret bound when the variation budgets are known. In addition, we propose the Bandit-over-Reinforcement Learning (BORL) algorithm to adaptively tune the SWUCRL2-CW algorithm to achieve the same dynamic regret bound, but in a parameter-free manner, i.e., without knowing the variation budgets. Notably, learning non-stationary MDPs via the conventional optimistic exploration technique presents a unique challenge absent in existing (non-stationary) bandit learning settings. We overcome the challenge by a novel confidence widening technique that incorporates additional optimism.
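The paper's SWUCRL2-CW algorithm targets full non-stationary MDPs; as a far simpler illustration of the sliding-window idea alone, here is a sliding-window UCB rule for a two-armed bandit whose arm means swap mid-run (a standard simplification, not the paper's algorithm).

```python
import math, random

# Sliding-window UCB for a non-stationary two-armed bandit: statistics are
# computed only over the most recent `window` pulls, so the learner can
# track a changing best arm.
def sw_ucb(pull, horizon=2000, window=200, n_arms=2):
    history = []                                  # (arm, reward) pairs
    for t in range(horizon):
        recent = history[-window:]
        counts = [sum(1 for a, _ in recent if a == i) for i in range(n_arms)]
        if min(counts) == 0:                      # ensure each arm in window
            arm = counts.index(0)
        else:
            means = [sum(r for a, r in recent if a == i) / counts[i]
                     for i in range(n_arms)]
            bonus = [math.sqrt(2 * math.log(min(t + 1, window)) / counts[i])
                     for i in range(n_arms)]
            arm = max(range(n_arms), key=lambda i: means[i] + bonus[i])
        history.append((arm, pull(arm, t)))
    return history

# Arm 1 is best early; arm 0 becomes best after the change at t = 1000.
def pull(arm, t):
    p = (0.3, 0.7) if t < 1000 else (0.7, 0.3)
    return 1.0 if random.random() < p[arm] else 0.0

hist = sw_ucb(pull)
print("late-phase pulls of arm 0:", sum(1 for a, _ in hist[-500:] if a == 0))
```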
Chapter
When creating (open) agent systems it has become common practice to use social concepts such as social practices, norms and conventions to model the way the interactions between the agents are regulated. However, most papers in the literature concentrate on only one of these aspects at a time. As a result, there is hardly any research on how these social concepts relate. It is also unclear whether something like a norm evolves from a social practice or convention, or whether they are completely independent entities. In this paper we investigate some of the conceptual differences between these concepts: whether they fundamentally stem from a single social object or should be seen as different types of objects altogether, and, finally, when one should use which type of concept in an implementation, or a combination of them.
Book
This book constitutes the thoroughly refereed proceedings of the International Workshop on Engineering Environment-Mediated Multi-Agent Systems, EEMMAS 2007, held in Dresden, Germany, in October 2007, in conjunction with ECCS 2007, the European Conference on Complex Systems. The volume includes 16 thoroughly revised papers, selected from the lectures given at the workshop, together with 2 papers resulting from invited talks by prominent researchers in the field. The papers are organized in sections on engineering self-organizing applications, stigmergic interaction, modeling and structuring mediating environments, and environment-based support for context and organizations.
Conference Paper
Contracts formally represent agreements between two or more parties in the form of deontic statements or norms within their clauses. If not carefully designed, these norms may conflict with one another, and such conflicts can invalidate an entire contract; human reviewers therefore invest great effort in writing conflict-free contracts, which, for complex and long contracts, can be time-consuming and error-prone. In this work, we develop an approach to automate the identification of potential conflicts between norms in contracts. We build a two-phase approach that uses traditional machine learning together with deep learning to extract and compare norms in order to identify conflicts between them. Using a manually annotated set of conflicts as training and test sets, our approach obtains 85% accuracy, establishing a new state of the art.
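One narrow class of such conflicts, an obligation and a prohibition over the same party and action, can be checked with a few lines once norms are already structured; the sketch below covers only that comparison step, whereas the cited work learns to extract and compare norms from contract text.

```python
from dataclasses import dataclass

# Rule-based toy for one easy class of norm conflicts: an obligation and a
# prohibition over the same party and action. Structured clauses are assumed
# as input; extraction from contract text (the cited work's ML part) is not
# shown.
@dataclass(frozen=True)
class Norm:
    party: str
    modality: str   # "obliged" or "forbidden"
    action: str

def conflicts(norms):
    found = []
    for i, n in enumerate(norms):
        for m in norms[i + 1:]:
            if (n.party, n.action) == (m.party, m.action) \
                    and {n.modality, m.modality} == {"obliged", "forbidden"}:
                found.append((n, m))
    return found

clauses = [Norm("supplier", "obliged", "deliver_by_friday"),
           Norm("supplier", "forbidden", "deliver_by_friday")]
print(conflicts(clauses))   # reports the obligation/prohibition clash
```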
Book
This book addresses the question of how to achieve social coordination in Socio-Cognitive Technical Systems (SCTS). SCTS are a class of Socio-Technical Systems that are complex, open, systems where several humans and digital entities interact in order to achieve some collective endeavour. The book approaches the question from the conceptual background of regulated open multiagent systems, with the question being motivated by their design and construction requirements. The book captures the collective effort of eight groups from leading research centres and universities, each of which has developed a conceptual framework for the design of regulated multiagent systems and most have also developed technological artefacts that support the processes from specification to implementation of that type of systems. The first, introductory part of the book describes the challenge of developing frameworks for SCTS and articulates the premises and the main concepts involved in those frameworks. The second part discusses the eight frameworks and contrasts their main components. The final part maps the new field by discussing the types of activities in which SCTS are likely to be used, the features that such uses will exhibit, and the challenges that will drive the evolution of this field.
Chapter
The notion of electronic institution draws inspiration from traditional institutions. Both can be seen as “coordination artefacts that serve as an interface between the internal decision making of individuals and their (collective) goals”. However, electronic institutions, unlike conventional ones, are intended to work on-line and may involve the participation of humans as well as software agents. The EI/EIDE framework that we present in this chapter includes the formal metamodel (EI) for electronic institutions and a particular development environment (EIDE) for implementing EI-based models. One models an electronic institution as a network of scenes where agents establish and discharge commitments through “conversations” that are constrained by procedural and functional conventions. The EI metamodel includes the formal languages used to specify an institution, as well as the data structure, operations and operational semantics that need to be supported by a technological environment to run it.
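A minimal data model for this "network of scenes" view might look as follows; the field names and the admission/auction example are illustrative assumptions, not the EI metamodel's own vocabulary.

```python
from dataclasses import dataclass, field

# Toy data model: scenes with roles and conversation states, connected by
# transitions that agents traverse in a given role.
@dataclass
class Scene:
    name: str
    roles: set
    states: set
    initial: str
    final: set

@dataclass
class Institution:
    scenes: dict = field(default_factory=dict)
    transitions: list = field(default_factory=list)  # (from, to, role)
    def add_scene(self, s: Scene):
        self.scenes[s.name] = s
    def connect(self, src, dst, role):
        assert role in self.scenes[src].roles and role in self.scenes[dst].roles
        self.transitions.append((src, dst, role))

ei = Institution()
ei.add_scene(Scene("admission", {"buyer"}, {"w0", "w1"}, "w0", {"w1"}))
ei.add_scene(Scene("auction", {"buyer", "auctioneer"}, {"w0", "wf"}, "w0", {"wf"}))
ei.connect("admission", "auction", "buyer")
print(ei.transitions)
```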
Chapter
This chapter presents the INGENIAS framework for Social Coordination. INGENIAS is an agent-oriented methodology that comprises a modeling language, a development process, and a set of tools. The modeling language captures the MAS specification as well as the requirements of the system to be developed. The development process is an iterative process producing the necessary artifacts based on the elements captured by the modeling language. The tools make up the INGENIAS Development Kit, which includes, among others, a specification editor and a code generator. Key features of INGENIAS are the extensive use of visual diagrams for modeling, grounding in the BDI concepts of agents, and translation to a JADE-based architecture called the INGENIAS Agent Framework.
Chapter
This chapter presents the ANTE framework for Social Coordination. In multi-agent systems, social coordination is related with a number of agreement technologies: negotiation as a means of reaching agreements; organizational or normative structures as social constructs to regulate interactions; trust as a mechanism to assess agent performance when acting while subject to norms. The ANTE framework addresses the issue of social coordination from a comprehensive perspective, exploring negotiation as a mechanism for establishing some normative coordination infrastructure, based on the notion of a contract. Contracts are monitored for compliance, and their enactment phase enables the collection of behavioral data that can be used by computational trust models.
Article
In the last few years, the financial industry has witnessed a growing demand for an integrated multi-asset, multi-strategy trading system that allows traders to simultaneously trade different types of assets, and provides real-time risk assessments, status, and performance of the diversified portfolio. However, many factors contribute to the complexity of developing such integrated decision-making systems. Some of these factors include: the inherent diversity of financial assets, the heterogeneity of trading and risk assessment strategies, and the highly dynamic nature of financial markets. Moreover, the large volume of data to be analyzed severely affects the system's ability to make timely decisions, especially for high-frequency trading. This paper proposes a novel Holonic Intelligent Multi-Agent Algorithmic Trading System (HIMAATS) to address the software functional requirements (multi-asset, multi-strategy, real-time risk assessment, etc.) and non-functional requirements (autonomy, high-throughput, low-latency, modularity, scalability, etc.).
Article
This special issue contains four selected and revised papers from the second international workshop on normative multiagent systems, for short NorMAS07 (Boella et al. (eds) Normative multiagent systems. Dagstuhl seminar proceedings 07122, 2007), held at Schloss Dagstuhl, Germany, in March 2007. At the workshop a shift was identified in the research community from a legal to an interactionist view on normative multiagent systems. In this editorial we discuss the shift, examples, and 10 new challenges in this more dynamic setting, which we use to introduce the papers of this special issue.
Article
Nowadays, with the expansion of the Internet, there is a need for methodologies and software tools to ease the development of applications where distributed homogeneous entities can participate. Multiagent systems, and electronic institutions in particular, can play a main role in the development of these types of systems. Electronic institutions define the rules of the game in agent societies by fixing what agents are permitted and forbidden to do and under what circumstances. The goal of this paper is to present EIDE, an integrated development environment for supporting the engineering of multiagent systems as electronic institutions. Key words: Multiagent system, Agent Oriented Software Engineering
Article
Artificial intelligence (AI) has approached normative concepts and phenomena within two frameworks: the theory of law and related computational applications, and the theory of multi-agent systems (MAS) and related computational applications. A wide gap exists between these two frameworks. This report presents a summary of the different approaches to norms adopted in the two domains. Some open questions are formulated and possible answers detailed.
Article
Of several responses made to the same situation, those which are accompanied or closely followed by satisfaction to the animal will, other things being equal, be more firmly connected with the situation, so that, when it recurs, they will be more likely to recur; those which are accompanied or closely followed by discomfort to the animal will, other things being equal, have their connections with that situation weakened, so that, when it recurs, they will be less likely to occur. The greater the satisfaction or discomfort, the greater the strengthening or weakening of the bond. (Thorndike, 1911) The idea of learning to make appropriate responses based on reinforcing events has its roots in early psychological theories such as Thorndike's "law of effect" (quoted above). Although several important contributions were made in the 1950s, 1960s and 1970s by illustrious luminaries such as Bellman, Minsky, Klopf and others (Farley and Clark, 1954; Bellman, 1957; Minsky, 1961; Samuel, 1963; Michie and Chambers, 1968; Grossberg, 1975; Klopf, 1982), the last two decades have witnessed perhaps the strongest advances in the mathematical foundations of reinforcement learning, in addition to several impressive demonstrations of the performance of reinforcement learning algorithms in real world tasks. The introductory book by Sutton and Barto, two of the most influential and recognized leaders in the field, is therefore both timely and welcome. The book is divided into three parts. In the first part, the authors introduce and elaborate on the essential characteristics of the reinforcement learning problem, namely, the problem of learning "policies" or mappings from environmental states to actions so as to maximize the amount of "reward".
Article
In the Markov decision process (MDP) formalization of reinforcement learning, a single adaptive agent interacts with an environment defined by a probabilistic transition function. In this solipsistic view, secondary agents can only be part of the environment and are therefore fixed in their behavior. The framework of Markov games allows us to widen this view to include multiple adaptive agents with interacting or competing goals. This paper considers a step in this direction in which exactly two agents with diametrically opposed goals share an environment. It describes a Q-learning-like algorithm for finding optimal policies and demonstrates its application to a simple two-player game in which the optimal policy is probabilistic.
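The probabilistic optimal policy is exactly what the minimax linear program inside such a Q-learning-like (minimax-Q) algorithm produces. The sketch below (NumPy and SciPy assumed) computes the worst-case-optimal mixed policy for one state's Q matrix; matching pennies is used as the illustrative input.

```python
import numpy as np
from scipy.optimize import linprog

# Core subroutine of minimax-Q for two-player zero-sum games: given
# Q(s, a, o) over own action a and opponent action o, find the mixed policy
# maximizing the worst-case value.
def minimax_value(Q):
    n_a, n_o = Q.shape
    # Variables: pi(a) for each a, plus the value v. Maximize v.
    c = np.zeros(n_a + 1); c[-1] = -1.0
    # For every opponent action o: v - sum_a pi(a) * Q[a, o] <= 0.
    A_ub = np.hstack([-Q.T, np.ones((n_o, 1))])
    b_ub = np.zeros(n_o)
    A_eq = np.zeros((1, n_a + 1)); A_eq[0, :n_a] = 1.0   # pi sums to one
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * n_a + [(None, None)])
    return res.x[:n_a], res.x[-1]

# Matching pennies: the optimal policy is uniform and the game value is 0.
pi, v = minimax_value(np.array([[1.0, -1.0], [-1.0, 1.0]]))
print(pi.round(3), round(v, 3))
```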
Article
We explore the view that coordinated behavior is explained by the social constraints that agents in organizations are subject to. In this framework, agents adopt those goals that are requested by their obligations, knowing that not fulfilling obligations induces a price to pay or a loss of utility. Based on this idea we build a coordination system where we represent the organization, the roles played by agents, the obligations imposed among roles, and the goals and plans that agents may adopt. Once a goal is adopted, a special brand of plans, called conversation plans, is available to the agents for effectively carrying out coordinated action. Conversation plans explicitly represent interactions by message exchange, and their actions are dynamically reordered using the theory of Markov Decision Processes to ensure the optimization of various criteria. The framework is applied to model supply chains of distributed enterprises.
Article
A satisfactory multiagent learning algorithm should, at a minimum, learn to play optimally against stationary opponents and converge to a Nash equilibrium in self-play. The algorithm that has come closest, WoLF-IGA, has been proven to have these two properties in 2-player 2-action repeated games, assuming that the opponent's (mixed) strategy is observable. In this paper we present AWESOME, the first algorithm that is guaranteed to have these two properties in all repeated (finite) games. It requires only that the other players' actual actions (not their strategies) can be observed at each step. It also learns to play optimally against opponents that eventually become stationary. The basic idea behind AWESOME (Adapt When Everybody is Stationary, Otherwise Move to Equilibrium) is to try to adapt to the others' strategies when they appear stationary, but otherwise to retreat to a precomputed equilibrium strategy. The techniques used to prove the properties of AWESOME are fundamentally different from those used for previous algorithms, and may help in analyzing other multiagent learning algorithms as well.
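The two-mode structure can be sketched compactly: best-respond to the opponent's empirical mixture while it looks stationary, otherwise retreat to the precomputed equilibrium. The stationarity test and thresholds below are crude stand-ins for the paper's scheduled epochs, so this is illustrative only.

```python
import random
from collections import Counter

# AWESOME-style skeleton: compare the opponent's action frequencies across
# two windows; if they look alike, best-respond to the empirical mixture,
# otherwise fall back to a precomputed equilibrium strategy.
def looks_stationary(old_counts, new_counts, tol=0.1):
    total_old, total_new = sum(old_counts.values()), sum(new_counts.values())
    if total_old == 0 or total_new == 0:
        return False
    acts = set(old_counts) | set(new_counts)
    return all(abs(old_counts[a] / total_old - new_counts[a] / total_new) < tol
               for a in acts)

def choose(payoff_row, opp_old, opp_new, equilibrium_strategy):
    if looks_stationary(opp_old, opp_new):
        total = sum(opp_new.values())
        # Best response to the empirical opponent mixture.
        return max(range(len(payoff_row)),
                   key=lambda a: sum(payoff_row[a][o] * n / total
                                     for o, n in opp_new.items()))
    return random.choices(range(len(payoff_row)),
                          weights=equilibrium_strategy)[0]

# Matching-pennies row player: the equilibrium is uniform over both actions.
payoff = [[1.0, -1.0], [-1.0, 1.0]]
old, new = Counter({0: 48, 1: 52}), Counter({0: 50, 1: 50})
print(choose(payoff, old, new, [0.5, 0.5]))
```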
Multi-agent coordination in adversarial environments through signal mediated strategies (F. Cacciamani, A. Celli, M. Ciccone, N. Gatti)
Social Coordination Frameworks for Social Technical Systems (J. J. Gomez-Sanz, H. Aldewereld, O. Boissier, V. Dignum, P. Noriega)
Social Coordination Frameworks for Social Technical Systems (H. Lopes Cardoso, J. Urbano, A. Rocha, A. J. M. Castro, E. Oliveira, H. Aldewereld, O. Boissier, V. Dignum, P. Noriega)
RLlib: abstractions for distributed reinforcement learning (E. Liang)
On-line norm synthesis for open Multi-Agent systems (J. Morales)
International Foundation for Autonomous Agents and Multiagent Systems (M. Esteva)