Figure - available from: Autonomous Agents and Multi-Agent Systems
Battle scenario with large and small perception. The blue squares are the controllable agents, and the red squares are the enemies controlled by the environment


Source publication
Article
This work explores the large-scale multi-agent communication mechanism for multi-agent reinforcement learning (MARL). We summarize the general topology categories for communication structures, which are often manually specified in MARL literature. A novel framework termed Learning Structured Communication (LSC) is proposed by learning a flexible an...

Citations

... DGN (Jiang et al. 2020) introduces graph convolutional communication within the observable field. LSC (Sheng et al. 2022) clusters agents into communication groups based on a predefined radius. ...
Article
Communication plays a crucial role in information sharing within the field of multi-agent reinforcement learning (MARL). However, how to transmit information that meets individual needs remains a long-standing challenge. Some existing work focuses on using a common channel for information transfer, which limits the capability for local communication. Meanwhile, other work attempts to establish peer-to-peer communication topologies but suffers from quadratic complexity. In this paper, we propose Personalized Multi-Agent Communication (PMAC), which enables the formation of peer-to-peer communication topologies, personalized message sending, and personalized message receiving. All modules in PMAC are implemented using only multilayer perceptrons (MLPs) with linear computational complexity. Empirically, we show the strength of personalized communication in a variety of cooperative scenarios. Our approach exhibits competitive performance compared to existing methods while maintaining notable computational efficiency.
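To make the linear-complexity claim above concrete, here is a minimal sketch of how MLP-only personalized sending and receiving could be wired: messages are pooled once (a single O(n) pass), and each receiver then personalizes the pooled summary using its own inputs. The class name, pooling choice, and dimensions are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class PMACSketch(nn.Module):
    """Illustrative sketch of MLP-only personalized communication.

    Pairwise (quadratic) attention is avoided by pooling all sent
    messages once and letting each receiver personalize the pooled
    summary with its own MLP, which is linear in the number of agents.
    All names and dimensions are assumptions, not the paper's code.
    """

    def __init__(self, obs_dim: int, msg_dim: int, hidden: int = 64):
        super().__init__()
        self.sender = nn.Sequential(   # personalized message sending
            nn.Linear(obs_dim, hidden), nn.ReLU(), nn.Linear(hidden, msg_dim))
        self.receiver = nn.Sequential( # personalized message receiving
            nn.Linear(obs_dim + msg_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, msg_dim))

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # obs: (n_agents, obs_dim)
        sent = self.sender(obs)                  # (n_agents, msg_dim)
        pooled = sent.mean(dim=0, keepdim=True)  # one O(n) aggregation
        pooled = pooled.expand_as(sent)
        # each agent filters the shared summary through its own inputs
        return self.receiver(torch.cat([obs, pooled], dim=-1))
```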
... 3) We propose a general MARL framework that can flexibly integrate the proposed communication learning mechanism with any value function factorization method. 4) We evaluate the proposed method on several MARL environments, including SMAC and MAgent (Sheng et al. 2020). Experimental results demonstrate that MAGI is more robust and efficient than the state-of-the-art MACRL methods. ...
... HAMA (Ryu, Shin, and Park 2020) presents a hierarchical attentional communication protocol based on GNNs, which effectively models the relations between agents. LSC (Sheng et al. 2020) introduces the hierarchical GNN to realize effective communication learning by exchanging messages among groups and agents. GA2NET (Liu et al. 2020) introduces a two-stage attention mechanism to model the complete graph for multi-agent communication learning. ...
Article
Efficient communication learning among agents has been shown to be crucial for cooperative multi-agent reinforcement learning (MARL), as it can promote the action coordination of agents and ultimately improve performance. Graph neural networks (GNNs) provide a general paradigm for communication learning, which considers agents and communication channels as nodes and edges in a graph, with action selection corresponding to node labeling. Under such a paradigm, an agent aggregates information from neighboring agents, which can reduce uncertainty in local decision-making and induce implicit action coordination. However, this communication paradigm is vulnerable to adversarial attacks and noise, and how to learn robust and efficient communication under perturbations has largely gone unstudied. To this end, this paper introduces a novel Multi-Agent communication mechanism via Graph Information bottleneck (MAGI), which can optimally balance the robustness and expressiveness of the message representation learned by agents. This communication mechanism aims at learning the minimal sufficient message representation for an agent by maximizing the mutual information (MI) between the message representation and the selected action, while simultaneously constraining the MI between the message representation and the agent feature. Empirical results demonstrate that MAGI is more robust and efficient than state-of-the-art GNN-based MARL methods.
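For concreteness, the information-bottleneck trade-off this abstract describes can be written as the following objective, where $m_i$ is agent $i$'s message representation, $a_i$ the selected action, $x_i$ the agent feature, and $\beta$ a trade-off coefficient. This is a reading of the abstract; the paper's exact formulation may differ.

```latex
\max_{\theta} \; I(m_i; a_i) \;-\; \beta \, I(m_i; x_i)
```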
... Furthermore, standard techniques used in deep learning, such as dropout [52], have inspired works in which the messages of other agents are dropped out during learning, so that agents perform well even when only limited communication is feasible [19]. However, in all these works, the goal is to learn inter-agent communication alongside local policies, which suffers from the bottleneck of simultaneously achieving effective communication and global collaboration [48]. They also face difficulty in extracting essential, high-quality information for exchange among agents [48]. Further, unlike HAMMER, these works expect that more sophisticated agents are available in the environment in terms of communication capabilities or the ability to run complex algorithms to model other agents present, which might not always be feasible. ...
Article
Cooperative multi-agent reinforcement learning (MARL) has achieved significant results, most notably by leveraging the representation-learning abilities of deep neural networks. However, large centralized approaches quickly become infeasible as the number of agents scales, and fully decentralized approaches can miss important opportunities for information sharing and coordination. Furthermore, not all agents are equal: in some cases, individual agents may not even have the ability to send communication to other agents or explicitly model other agents. This paper considers the case where there is a single, powerful, central agent that can observe the entire observation space, and there are multiple, low-powered local agents that can only receive local observations and are not able to communicate with each other. The central agent's job is to learn what message needs to be sent to different local agents based on the global observations, not by centrally solving the entire problem and sending action commands, but by determining what additional information an individual agent should receive so that it can make a better decision. In this work, we present our MARL algorithm HAMMER, describe where it would be most applicable, and implement it in the cooperative navigation and multi-agent walker domains. Empirical results show that (1) learned communication does indeed improve system performance, (2) results generalize to heterogeneous local agents, and (3) results generalize to different reward structures.
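A minimal sketch of the central/local split this abstract describes: one central network maps the global observation to per-agent messages, and each low-powered local agent acts on its own observation plus the message addressed to it. All names and layer sizes are assumptions; this is not the authors' implementation.

```python
import torch
import torch.nn as nn

class CentralMessenger(nn.Module):
    """Sketch of a HAMMER-style setup: a central network reads the
    global observation and emits one message per local agent; local
    agents never talk to each other. Sizes are illustrative."""

    def __init__(self, global_dim: int, n_agents: int, msg_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(global_dim, 128), nn.ReLU(),
            nn.Linear(128, n_agents * msg_dim))
        self.n_agents, self.msg_dim = n_agents, msg_dim

    def forward(self, global_obs: torch.Tensor) -> torch.Tensor:
        return self.net(global_obs).view(self.n_agents, self.msg_dim)

class LocalAgent(nn.Module):
    """Low-powered local policy: acts on its own observation plus the
    message addressed to it."""

    def __init__(self, obs_dim: int, msg_dim: int, n_actions: int):
        super().__init__()
        self.pi = nn.Sequential(
            nn.Linear(obs_dim + msg_dim, 64), nn.ReLU(),
            nn.Linear(64, n_actions))

    def forward(self, obs: torch.Tensor, msg: torch.Tensor) -> torch.Tensor:
        return self.pi(torch.cat([obs, msg], dim=-1))  # action logits
```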
... Previous research models a multi-agent communication system as a message-passing graph neural network [11][13], where each node in the graph represents an agent and each edge models a communication pathway equipped with message encoding and decoding. Different graph topologies have been studied [21], and recent works focus on improving the efficiency and reducing the cost of communication using gated message passing [16], attention [5], scheduled communication [10], and event- or memory-driven processing [7][19][22]. ...
Chapter
In a multi-agent system, agents share their local observations to gain global situational awareness for decision making and collaboration using a message passing system. When to send a message, how to encode a message, and how to leverage the received messages directly affect the effectiveness of the collaboration among agents. When training a multi-agent cooperative game using reinforcement learning (RL), the message passing system needs to be optimized together with the agent policies. This consequently increases the model's complexity and poses significant challenges to the convergence and performance of learning. To address this issue, we propose the Belief-map Assisted Multi-agent System (BAMS), which leverages a neuro-symbolic belief map to enhance training. The belief map decodes the agent's hidden state to provide a symbolic representation of the agent's understanding of the environment and other agents' status. The simplicity of the symbolic representation allows ground-truth information to be gathered and compared with the belief, which provides an additional channel of feedback for learning. Compared to the sporadic and delayed feedback coming from the reward in RL, the feedback from the belief map is more consistent and reliable. Agents using BAMS can learn a more effective message-passing network to better understand each other, resulting in better performance in the game. We evaluate BAMS's performance in a cooperative predator-and-prey game with varying levels of map complexity and compare it to previous multi-agent message passing models. The simulation results showed that BAMS reduced training epochs by 66%, and agents that apply the BAMS model completed the game with 34.62% fewer steps on average.
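The extra feedback channel described above can be pictured as an auxiliary supervised head: decode the agent's hidden state into a symbolic grid and compare it against ground truth available during training. The grid shape, class count, and loss choice below are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BeliefDecoder(nn.Module):
    """Sketch of the belief-map idea: decode an agent's hidden state
    into a symbolic grid (per-cell class logits) and supervise it with
    ground truth available during training. Shapes are assumptions."""

    def __init__(self, hidden_dim: int, grid: int = 8, n_classes: int = 4):
        super().__init__()
        self.grid, self.n_classes = grid, n_classes
        self.decode = nn.Linear(hidden_dim, grid * grid * n_classes)

    def belief_loss(self, h: torch.Tensor, true_map: torch.Tensor) -> torch.Tensor:
        # h: (batch, hidden_dim); true_map: (batch, grid, grid) int class ids
        logits = self.decode(h).view(-1, self.n_classes, self.grid, self.grid)
        return F.cross_entropy(logits, true_map)  # extra feedback channel
```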
... [54] propose a graph-attention communication protocol to help with the problems of when to communicate and whom to address messages to. LSC [39] adopts an auxiliary task to learn hierarchical communication structures. ...
Preprint
Communication is crucial for solving cooperative Multi-Agent Reinforcement Learning tasks in Partially-Observable Markov Decision Processes. Existing works often rely on black-box methods to encode local information/features into messages shared with other agents. However, such black-box approaches are unable to provide any quantitative guarantees on the expected return and often lead to the generation of continuous messages with high communication overhead and poor interpretability. In this paper, we establish an upper bound on the return gap between an ideal policy with full observability and an optimal partially-observable policy with discrete communication. This result enables us to recast multi-agent communication as a novel online clustering problem over the local observations at each agent, with messages as cluster labels and the upper bound on the return gap as the clustering loss. By minimizing the upper bound, we propose a surprisingly simple design of message generation functions in multi-agent communication and integrate it with reinforcement learning using a Regularized Information Maximization loss function. Evaluations show that the proposed discrete communication significantly outperforms state-of-the-art multi-agent communication baselines and can achieve near-optimal returns with few-bit messages that are naturally interpretable.
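A hedged sketch of the message design this abstract outlines: the message is a discrete cluster label over the local observation, trained with a Regularized Information Maximization (RIM)-style loss that prefers confident assignments while using all labels. The architecture and the exact loss terms (RIM's model-complexity regularizer is omitted here) are assumptions, not the paper's code.

```python
import torch
import torch.nn as nn

class DiscreteMessenger(nn.Module):
    """Sketch: an agent's message is a discrete cluster label over its
    local observation, trained with a RIM-style loss. The paper's
    exact loss and architecture may differ."""

    def __init__(self, obs_dim: int, n_labels: int = 8):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                               nn.Linear(64, n_labels))

    def message(self, obs: torch.Tensor) -> torch.Tensor:
        return self.f(obs).argmax(dim=-1)  # few-bit message: a label

    def rim_loss(self, obs: torch.Tensor) -> torch.Tensor:
        p = self.f(obs).softmax(dim=-1)                        # (batch, K)
        cond_ent = -(p * p.clamp_min(1e-8).log()).sum(-1).mean()
        marg = p.mean(0)
        marg_ent = -(marg * marg.clamp_min(1e-8).log()).sum()
        # confident assignments (low H(y|x)) that use all labels (high H(y))
        return cond_ent - marg_ent
```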
... Das et al. [10] leverage a GNN with a soft attention mechanism to learn whom to receive messages from and what messages to pass. Sheng et al. [11] utilize a hierarchical GNN to achieve effective communication learning by sharing information among agents and groups. Ryu et al. [12] present a hierarchical attention mechanism based on GNNs, which effectively models the relationships between agents. ...
... We utilize the communication learning mechanism designed above to generate the negative message embedding $m^c_i$ based on the set $\mathcal{N}^-_i$. The designed mutual-information loss function in Eq. (11) can be utilized to maximize the MI. ...
... Communication methods [45, 46] allow agents to receive information directly from others and thus enrich the observation information to improve decision-making accuracy. These methods directly access the information of all other agents and are unsuitable for competitive tasks. ...
Article
During dynamic social interaction, inferring and predicting others' behaviors through theory of mind (ToM) is crucial for obtaining benefits in cooperative and competitive tasks. Current multi-agent reinforcement learning (MARL) methods primarily rely on agent observations to select behaviors, but they lack inspiration from ToM, which limits performance. In this article, we propose a multi-agent ToM decision-making (MAToM-DM) model, which consists of a MAToM spiking neural network (MAToM-SNN) module and a decision-making module. We design two brain-inspired ToM modules (Self-MAToM and Other-MAToM) to predict others' behaviors based on self-experience and observations of others, respectively. Each agent can adjust its behavior according to the predicted actions of others. The effectiveness of the proposed model has been demonstrated through experiments conducted in cooperative and competitive tasks. The results indicate that integrating the ToM mechanism can enhance cooperation and competition efficiency and lead to higher rewards compared with traditional MARL models.
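One way to picture the "predict others, then act" loop of the Other-MAToM module is sketched below. Note that a plain GRU stands in for the paper's spiking network, an intentional simplification, and every name and size is an assumption.

```python
import torch
import torch.nn as nn

class OtherMAToM(nn.Module):
    """Sketch: a predictor infers other agents' next actions from their
    observed trajectories, and the policy conditions on the prediction.
    A GRU replaces the paper's spiking network for simplicity."""

    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 32):
        super().__init__()
        self.tom = nn.GRU(obs_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_actions)      # others' next action
        self.policy = nn.Linear(obs_dim + n_actions, n_actions)

    def forward(self, my_obs: torch.Tensor, others_traj: torch.Tensor) -> torch.Tensor:
        # others_traj: (n_others, T, obs_dim)
        _, h = self.tom(others_traj)
        pred = self.head(h[-1]).softmax(-1).mean(0)   # avg predicted action dist
        return self.policy(torch.cat([my_obs, pred], dim=-1))
```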
... Notably, the emergence of natural-language-like properties can be observed by maximizing the agents' objective [215], [216]. Recent work has focused on learnable and possibly dynamic communication topologies [217], [218]. ...
Preprint
Both Minsky's "society of mind" and Schmidhuber's "learning to think" inspire diverse societies of large multimodal neural networks (NNs) that solve problems by interviewing each other in a "mindstorm." Recent implementations of NN-based societies of minds consist of large language models (LLMs) and other NN-based experts communicating through a natural language interface. In doing so, they overcome the limitations of single LLMs, improving multimodal zero-shot reasoning. In these natural language-based societies of mind (NLSOMs), new agents, all communicating through the same universal symbolic language, are easily added in a modular fashion. To demonstrate the power of NLSOMs, we assemble and experiment with several of them (having up to 129 members), leveraging mindstorms in them to solve some practical AI tasks: visual question answering, image captioning, text-to-image synthesis, 3D generation, egocentric retrieval, embodied AI, and general language-based task solving. We view this as a starting point towards much larger NLSOMs with billions of agents, some of which may be humans. And with this emergence of great societies of heterogeneous minds, many new research questions have suddenly become paramount to the future of artificial intelligence. What should be the social structure of an NLSOM? What would be the (dis)advantages of having a monarchical rather than a democratic structure? How can principles of NN economies be used to maximize the total reward of a reinforcement learning NLSOM? In this work, we identify, discuss, and try to answer some of these questions.
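A toy sketch of the "mindstorm" protocol as the abstract describes it: agents exchange natural-language messages on a shared transcript for a few rounds, and an organizer produces the final answer. The function signature and loop below are illustrative assumptions only.

```python
def mindstorm(agents, organizer, question: str, rounds: int = 3) -> str:
    """Toy mindstorm: `agents` and `organizer` are any callables that
    map a transcript string to a natural-language reply (e.g. wrapped
    LLM calls). Everything here is an illustrative assumption."""
    board = [question]
    for _ in range(rounds):
        # each agent reads the whole transcript and appends a reply
        board += [agent("\n".join(board)) for agent in agents]
    # a designated organizer aggregates the discussion into an answer
    return organizer("\n".join(board))
```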
... Learning to communicate effectively among agents has been shown to be crucial for strengthening inter-agent collaboration and ultimately improving the quality of the policies learned by MARL. Sheng et al. [20] categorized existing designs for communication topology into five patterns: fully-connected [21], star [22], tree [23], neighboring [24], and hierarchical [20], with the hierarchical communication topology being the most effective of the five. ...
Preprint
Wind power is becoming an increasingly important source of renewable energy worldwide. However, wind farm power control faces significant challenges due to the high system complexity inherent in these farms. A novel communication-based multi-agent deep reinforcement learning approach for large-scale wind farm multivariate control is proposed to handle this challenge and maximize power output. A wind farm multivariate power model is proposed to study the influence of wind turbine (WT) wakes on power. The multivariate model includes the axial induction factor, yaw angle, and tilt angle as controllable variables. The hierarchical communication multi-agent proximal policy optimization (HCMAPPO) algorithm is proposed to coordinate the continuous multivariate control of the large-scale wind farm. The large-scale wind farm is divided into multiple wind turbine aggregators (WTAs), and neighboring WTAs can exchange information through hierarchical communication to maximize the wind farm's power output. Simulation results demonstrate that the proposed multivariate HCMAPPO can significantly increase wind farm power output compared to traditional PID control, coordinated model-based predictive control, and the multi-agent deep deterministic policy gradient algorithm. In particular, the HCMAPPO algorithm can be trained in an environment based on a thirteen-turbine wind farm and effectively applied to larger wind farms. At the same time, there is no significant increase in the fatigue damage of the wind turbine blades from wake control as the wind farm scale increases. The multivariate HCMAPPO control can realize the maximum collective power output of a large-scale wind farm.
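For reference, the three controllable variables the abstract names can be grouped per turbine as below; the bounds in the comments are typical values from wake-control practice, not necessarily the paper's limits.

```python
from dataclasses import dataclass

@dataclass
class TurbineAction:
    """The three controllable variables named in the abstract, with
    illustrative ranges (the paper's actual limits may differ)."""
    axial_induction: float  # typically within [0, 1/3] (Betz limit)
    yaw_deg: float          # yaw misalignment angle, e.g. [-30, 30]
    tilt_deg: float         # tilt angle of the rotor plane
```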
... Across a wide variety of economics and social psychology studies, when individuals are given a chance to talk with each other, cooperation increases significantly (Orbell et al., 1988, 1990). Although there are many works (Sheng et al., 2020; Ahilan and Dayan, 2021) on communication learning in MARL, little attention has been paid to the role of communication in solving ISD. Pretorius et al. (2020) first used empirical game-theoretic analysis (Tuyls et al., 2018) to study existing communication learning methods in ISD and to verify the effects of these methods experimentally. ...
Preprint
Social dilemmas can be considered situations where individual rationality leads to collective irrationality. The multi-agent reinforcement learning community has leveraged ideas from social science, such as social value orientations (SVOs), to solve social dilemmas in complex cooperative tasks. In this paper, by first introducing the typical "division of labor or roles" mechanism in human society, we provide a promising solution for intertemporal social dilemmas (ISD) with SVOs. A novel learning framework, called Learning Roles with Emergent SVOs (RESVO), is proposed to transform the learning of roles into the emergence of social value orientations, which is symmetrically solved by endowing agents with altruism to share rewards with other agents. An SVO-based role embedding space is then constructed by conditioning individual policies on roles with a novel rank regularizer and mutual information maximizer. Experiments show that RESVO achieves a stable division of labor and cooperation in ISDs of varying complexity.
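The reward sharing that the abstract attributes to SVOs is commonly formalized in the literature as an angle-parameterized mix of an agent's own reward and others' rewards, as below, where $\theta_i$ is agent $i$'s social value orientation and $N$ the number of agents. RESVO's exact objective may differ; treat this as background notation.

```latex
\tilde{r}_i \;=\; \cos(\theta_i)\, r_i \;+\; \sin(\theta_i)\, \frac{1}{N-1} \sum_{j \neq i} r_j
```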