Figure - available from: Autonomous Agents and Multi-Agent Systems
Battle scenario with large and small perception. The blue squares are the controllable agents, and the red squares are the enemies controlled by the environment


Source publication
Article
This work explores the large-scale multi-agent communication mechanism for multi-agent reinforcement learning (MARL). We summarize the general topology categories for communication structures, which are often manually specified in MARL literature. A novel framework termed Learning Structured Communication (LSC) is proposed by learning a flexible an...

Citations

... DGN (Jiang et al. 2020) introduces graph convolutional communication within the observable field. LSC (Sheng et al. 2022) clusters agents into communication groups based on a predefined radius. ...
Article
Communication plays a crucial role in information sharing within the field of multi-agent reinforcement learning (MARL). However, how to transmit information that meets individual needs remains a long-standing challenge. Some existing work focuses on using a common channel for information transfer, which limits the capability for local communication. Meanwhile, other work attempts to establish peer-to-peer communication topologies but suffers from quadratic complexity. In this paper, we propose Personalized Multi-Agent Communication (PMAC), which enables the formation of peer-to-peer communication topologies, personalized message sending, and personalized message receiving. All modules in PMAC are implemented using only multilayer perceptrons (MLPs) with linear computational complexity. Empirically, we show the strength of personalized communication in a variety of cooperative scenarios. Our approach exhibits competitive performance compared to existing methods while maintaining notable computational efficiency.
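To make the linear-complexity claim above concrete, here is a minimal sketch of how MLP-only personalized sending and receiving could be wired: messages are pooled once (a single O(n) pass), and each receiver then personalizes the pooled summary using its own inputs. The class name, pooling choice, and dimensions are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class PMACSketch(nn.Module):
    """Illustrative sketch of MLP-only personalized communication.

    Pairwise (quadratic) attention is avoided by pooling all sent
    messages once and letting each receiver personalize the pooled
    summary with its own MLP, which is linear in the number of agents.
    All names and dimensions are assumptions, not the paper's code.
    """

    def __init__(self, obs_dim: int, msg_dim: int, hidden: int = 64):
        super().__init__()
        self.sender = nn.Sequential(   # personalized message sending
            nn.Linear(obs_dim, hidden), nn.ReLU(), nn.Linear(hidden, msg_dim))
        self.receiver = nn.Sequential( # personalized message receiving
            nn.Linear(obs_dim + msg_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, msg_dim))

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # obs: (n_agents, obs_dim)
        sent = self.sender(obs)                  # (n_agents, msg_dim)
        pooled = sent.mean(dim=0, keepdim=True)  # one O(n) aggregation
        pooled = pooled.expand_as(sent)
        # each agent filters the shared summary through its own inputs
        return self.receiver(torch.cat([obs, pooled], dim=-1))
```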
... 3) We propose a general MARL framework that can flexibly integrate the proposed communication learning mechanism with any value function factorization method. 4) We evaluate the proposed method on several MARL environments, including SMAC and MAgent (Sheng et al. 2020). Experimental results demonstrate that MAGI is more robust and efficient than the state-of-the-art MACRL methods. ...
... HAMA (Ryu, Shin, and Park 2020) presents a hierarchical attentional communication protocol based on GNNs, which effectively models the relations between agents. LSC (Sheng et al. 2020) introduces the hierarchical GNN to realize effective communication learning by exchanging messages among groups and agents. GA2NET (Liu et al. 2020) introduces a two-stage attention mechanism to model the complete graph for multi-agent communication learning. ...
Article
Efficient communication learning among agents has been shown to be crucial for cooperative multi-agent reinforcement learning (MARL), as it can promote the action coordination of agents and ultimately improve performance. Graph neural networks (GNNs) provide a general paradigm for communication learning, which considers agents and communication channels as nodes and edges in a graph, with action selection corresponding to node labeling. Under such a paradigm, an agent aggregates information from neighboring agents, which can reduce uncertainty in local decision-making and induce implicit action coordination. However, this communication paradigm is vulnerable to adversarial attacks and noise, and how to learn robust and efficient communication under perturbations has largely gone unstudied. To this end, this paper introduces a novel Multi-Agent communication mechanism via Graph Information bottleneck (MAGI), which can optimally balance the robustness and expressiveness of the message representation learned by agents. This communication mechanism aims at learning the minimal sufficient message representation for an agent by maximizing the mutual information (MI) between the message representation and the selected action, while simultaneously constraining the MI between the message representation and the agent feature. Empirical results demonstrate that MAGI is more robust and efficient than state-of-the-art GNN-based MARL methods.
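For concreteness, the information-bottleneck trade-off this abstract describes can be written as the following objective, where $m_i$ is agent $i$'s message representation, $a_i$ the selected action, $x_i$ the agent feature, and $\beta$ a trade-off coefficient. This is a reading of the abstract; the paper's exact formulation may differ.

```latex
\max_{\theta} \; I(m_i; a_i) \;-\; \beta \, I(m_i; x_i)
```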
... Furthermore, standard techniques used in deep learning, such as dropout [52], have inspired works in which the messages of other agents are dropped out during learning, so that agents perform well even when only limited communication is feasible [19]. However, in all these works, the goal is to learn inter-agent communication alongside local policies, which suffers from the bottleneck of simultaneously achieving effective communication and global collaboration [48]. They also face difficulty in extracting essential, high-quality information for exchange among agents [48]. Further, unlike HAMMER, these works expect that more sophisticated agents are available in the environment in terms of communication capabilities or the ability to run complex algorithms to model other agents present, which might not always be feasible. ...
Article
Cooperative multi-agent reinforcement learning (MARL) has achieved significant results, most notably by leveraging the representation-learning abilities of deep neural networks. However, large centralized approaches quickly become infeasible as the number of agents scales, and fully decentralized approaches can miss important opportunities for information sharing and coordination. Furthermore, not all agents are equal: in some cases, individual agents may not even have the ability to send communication to other agents or explicitly model other agents. This paper considers the case where there is a single, powerful, central agent that can observe the entire observation space, and there are multiple, low-powered local agents that can only receive local observations and are not able to communicate with each other. The central agent's job is to learn what message needs to be sent to different local agents based on the global observations, not by centrally solving the entire problem and sending action commands, but by determining what additional information an individual agent should receive so that it can make a better decision. In this work, we present our MARL algorithm HAMMER, describe where it would be most applicable, and implement it in the cooperative navigation and multi-agent walker domains. Empirical results show that (1) learned communication does indeed improve system performance, (2) results generalize to heterogeneous local agents, and (3) results generalize to different reward structures.
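A minimal sketch of the central/local split this abstract describes: one central network maps the global observation to per-agent messages, and each low-powered local agent acts on its own observation plus the message addressed to it. All names and layer sizes are assumptions; this is not the authors' implementation.

```python
import torch
import torch.nn as nn

class CentralMessenger(nn.Module):
    """Sketch of a HAMMER-style setup: a central network reads the
    global observation and emits one message per local agent; local
    agents never talk to each other. Sizes are illustrative."""

    def __init__(self, global_dim: int, n_agents: int, msg_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(global_dim, 128), nn.ReLU(),
            nn.Linear(128, n_agents * msg_dim))
        self.n_agents, self.msg_dim = n_agents, msg_dim

    def forward(self, global_obs: torch.Tensor) -> torch.Tensor:
        return self.net(global_obs).view(self.n_agents, self.msg_dim)

class LocalAgent(nn.Module):
    """Low-powered local policy: acts on its own observation plus the
    message addressed to it."""

    def __init__(self, obs_dim: int, msg_dim: int, n_actions: int):
        super().__init__()
        self.pi = nn.Sequential(
            nn.Linear(obs_dim + msg_dim, 64), nn.ReLU(),
            nn.Linear(64, n_actions))

    def forward(self, obs: torch.Tensor, msg: torch.Tensor) -> torch.Tensor:
        return self.pi(torch.cat([obs, msg], dim=-1))  # action logits
```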
... Previous research models a multi-agent communication system as a message-passing graph neural network [11][13], where each node in the graph represents an agent and each edge models a communication pathway equipped with message encoding and decoding. Different graph topologies have been studied [21], and recent works focus on improving the efficiency and reducing the cost of communication using gated message passing [16], attention [5], scheduled communication [10], and event- or memory-driven processing [7][19][22]. ...
Chapter
In a multi-agent system, agents share their local observations to gain global situational awareness for decision making and collaboration using a message passing system. When to send a message, how to encode a message, and how to leverage the received messages directly affect the effectiveness of the collaboration among agents. When training a multi-agent cooperative game using reinforcement learning (RL), the message passing system needs to be optimized together with the agent policies. This consequently increases the model's complexity and poses significant challenges to the convergence and performance of learning. To address this issue, we propose the Belief-map Assisted Multi-agent System (BAMS), which leverages a neuro-symbolic belief map to enhance training. The belief map decodes the agent's hidden state to provide a symbolic representation of the agent's understanding of the environment and other agents' status. The simplicity of the symbolic representation allows ground-truth information to be gathered and compared with the belief, which provides an additional channel of feedback for learning. Compared to the sporadic and delayed feedback coming from the reward in RL, the feedback from the belief map is more consistent and reliable. Agents using BAMS can learn a more effective message-passing network to better understand each other, resulting in better performance in the game. We evaluate BAMS's performance in a cooperative predator-and-prey game with varying levels of map complexity and compare it to previous multi-agent message passing models. The simulation results showed that BAMS reduced training epochs by 66%, and agents that apply the BAMS model completed the game with 34.62% fewer steps on average.
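The extra feedback channel described above can be pictured as an auxiliary supervised head: decode the agent's hidden state into a symbolic grid and compare it against ground truth available during training. The grid shape, class count, and loss choice below are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BeliefDecoder(nn.Module):
    """Sketch of the belief-map idea: decode an agent's hidden state
    into a symbolic grid (per-cell class logits) and supervise it with
    ground truth available during training. Shapes are assumptions."""

    def __init__(self, hidden_dim: int, grid: int = 8, n_classes: int = 4):
        super().__init__()
        self.grid, self.n_classes = grid, n_classes
        self.decode = nn.Linear(hidden_dim, grid * grid * n_classes)

    def belief_loss(self, h: torch.Tensor, true_map: torch.Tensor) -> torch.Tensor:
        # h: (batch, hidden_dim); true_map: (batch, grid, grid) int class ids
        logits = self.decode(h).view(-1, self.n_classes, self.grid, self.grid)
        return F.cross_entropy(logits, true_map)  # extra feedback channel
```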
... [54] propose a graph-attention communication protocol to help with the problems of when to communicate and whom to address messages to. LSC [39] adopts an auxiliary task to learn hierarchical communication structures. ...
Preprint
Communication is crucial for solving cooperative Multi-Agent Reinforcement Learning tasks in Partially-Observable Markov Decision Processes. Existing works often rely on black-box methods to encode local information/features into messages shared with other agents. However, such black-box approaches are unable to provide any quantitative guarantees on the expected return and often lead to the generation of continuous messages with high communication overhead and poor interpretability. In this paper, we establish an upper bound on the return gap between an ideal policy with full observability and an optimal partially-observable policy with discrete communication. This result enables us to recast multi-agent communication as a novel online clustering problem over the local observations at each agent, with messages as cluster labels and the upper bound on the return gap as the clustering loss. By minimizing the upper bound, we propose a surprisingly simple design of message generation functions in multi-agent communication and integrate it with reinforcement learning using a Regularized Information Maximization loss function. Evaluations show that the proposed discrete communication significantly outperforms state-of-the-art multi-agent communication baselines and can achieve near-optimal returns with few-bit messages that are naturally interpretable.
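A hedged sketch of the message design this abstract outlines: the message is a discrete cluster label over the local observation, trained with a Regularized Information Maximization (RIM)-style loss that prefers confident assignments while using all labels. The architecture and the exact loss terms (RIM's model-complexity regularizer is omitted here) are assumptions, not the paper's code.

```python
import torch
import torch.nn as nn

class DiscreteMessenger(nn.Module):
    """Sketch: an agent's message is a discrete cluster label over its
    local observation, trained with a RIM-style loss. The paper's
    exact loss and architecture may differ."""

    def __init__(self, obs_dim: int, n_labels: int = 8):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                               nn.Linear(64, n_labels))

    def message(self, obs: torch.Tensor) -> torch.Tensor:
        return self.f(obs).argmax(dim=-1)  # few-bit message: a label

    def rim_loss(self, obs: torch.Tensor) -> torch.Tensor:
        p = self.f(obs).softmax(dim=-1)                        # (batch, K)
        cond_ent = -(p * p.clamp_min(1e-8).log()).sum(-1).mean()
        marg = p.mean(0)
        marg_ent = -(marg * marg.clamp_min(1e-8).log()).sum()
        # confident assignments (low H(y|x)) that use all labels (high H(y))
        return cond_ent - marg_ent
```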
... Das et al. [10] leverage a GNN with a soft attention mechanism to learn whom to receive messages from and what messages to pass. Sheng et al. [11] utilize a hierarchical GNN to achieve effective communication learning by sharing information among agents and groups. Ryu et al. [12] present a hierarchical attention mechanism based on GNNs, which effectively models the relationships between agents. ...
... We utilize the communication learning mechanism designed above to generate the negative message embedding $m^c_i$ based on the set $\mathcal{N}^-_i$. The designed mutual-information loss function in Eq. (11) can be utilized to maximize the MI. ...
... Communication methods [45, 46] allow agents to receive information directly from others and thus enrich the observation information to improve decision-making accuracy. These methods directly access the information of all other agents and are unsuitable for competitive tasks. ...
Article
During dynamic social interaction, inferring and predicting others' behaviors through theory of mind (ToM) is crucial for obtaining benefits in cooperative and competitive tasks. Current multi-agent reinforcement learning (MARL) methods primarily rely on agent observations to select behaviors, but they lack inspiration from ToM, which limits performance. In this article, we propose a multi-agent ToM decision-making (MAToM-DM) model, which consists of a MAToM spiking neural network (MAToM-SNN) module and a decision-making module. We design two brain-inspired ToM modules (Self-MAToM and Other-MAToM) to predict others' behaviors based on self-experience and observations of others, respectively. Each agent can adjust its behavior according to the predicted actions of others. The effectiveness of the proposed model has been demonstrated through experiments conducted in cooperative and competitive tasks. The results indicate that integrating the ToM mechanism can enhance cooperation and competition efficiency and lead to higher rewards compared with traditional MARL models.
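One way to picture the "predict others, then act" loop of the Other-MAToM module is sketched below. Note that a plain GRU stands in for the paper's spiking network, an intentional simplification, and every name and size is an assumption.

```python
import torch
import torch.nn as nn

class OtherMAToM(nn.Module):
    """Sketch: a predictor infers other agents' next actions from their
    observed trajectories, and the policy conditions on the prediction.
    A GRU replaces the paper's spiking network for simplicity."""

    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 32):
        super().__init__()
        self.tom = nn.GRU(obs_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_actions)      # others' next action
        self.policy = nn.Linear(obs_dim + n_actions, n_actions)

    def forward(self, my_obs: torch.Tensor, others_traj: torch.Tensor) -> torch.Tensor:
        # others_traj: (n_others, T, obs_dim)
        _, h = self.tom(others_traj)
        pred = self.head(h[-1]).softmax(-1).mean(0)   # avg predicted action dist
        return self.policy(torch.cat([my_obs, pred], dim=-1))
```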
... Notably, the emergence of natural-language-like properties can be observed by maximizing the agents' objective [215], [216]. Recent work has focused on learnable and possibly dynamic communication topologies [217], [218]. ...
Preprint
Both Minsky's "society of mind" and Schmidhuber's "learning to think" inspire diverse societies of large multimodal neural networks (NNs) that solve problems by interviewing each other in a "mindstorm." Recent implementations of NN-based societies of minds consist of large language models (LLMs) and other NN-based experts communicating through a natural language interface. In doing so, they overcome the limitations of single LLMs, improving multimodal zero-shot reasoning. In these natural language-based societies of mind (NLSOMs), new agents, all communicating through the same universal symbolic language, are easily added in a modular fashion. To demonstrate the power of NLSOMs, we assemble and experiment with several of them (having up to 129 members), leveraging mindstorms in them to solve some practical AI tasks: visual question answering, image captioning, text-to-image synthesis, 3D generation, egocentric retrieval, embodied AI, and general language-based task solving. We view this as a starting point towards much larger NLSOMs with billions of agents, some of which may be humans. And with this emergence of great societies of heterogeneous minds, many new research questions have suddenly become paramount to the future of artificial intelligence. What should be the social structure of an NLSOM? What would be the (dis)advantages of having a monarchical rather than a democratic structure? How can principles of NN economies be used to maximize the total reward of a reinforcement learning NLSOM? In this work, we identify, discuss, and try to answer some of these questions.
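A toy sketch of the "mindstorm" protocol as the abstract describes it: agents exchange natural-language messages on a shared transcript for a few rounds, and an organizer produces the final answer. The function signature and loop below are illustrative assumptions only.

```python
def mindstorm(agents, organizer, question: str, rounds: int = 3) -> str:
    """Toy mindstorm: `agents` and `organizer` are any callables that
    map a transcript string to a natural-language reply (e.g. wrapped
    LLM calls). Everything here is an illustrative assumption."""
    board = [question]
    for _ in range(rounds):
        # each agent reads the whole transcript and appends a reply
        board += [agent("\n".join(board)) for agent in agents]
    # a designated organizer aggregates the discussion into an answer
    return organizer("\n".join(board))
```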
... Learning to communicate effectively among agents has been shown to be crucial for strengthening inter-agent collaboration and ultimately improving the quality of the policies learned by MARL. Sheng et al. [20] categorized existing designs for communication topology into five patterns: fully-connected [21], star [22], tree [23], neighboring [24], and hierarchical [20], with the hierarchical communication topology being the most effective of the five. ...
Preprint
Wind power is becoming an increasingly important source of renewable energy worldwide. However, wind farm power control faces significant challenges due to the high system complexity inherent in these farms. A novel communication-based multi-agent deep reinforcement learning approach for large-scale wind farm multivariate control is proposed to handle this challenge and maximize power output. A wind farm multivariate power model is proposed to study the influence of wind turbine (WT) wakes on power. The multivariate model includes the axial induction factor, yaw angle, and tilt angle as controllable variables. The hierarchical communication multi-agent proximal policy optimization (HCMAPPO) algorithm is proposed to coordinate the continuous multivariate control of the large-scale wind farm. The large-scale wind farm is divided into multiple wind turbine aggregators (WTAs), and neighboring WTAs can exchange information through hierarchical communication to maximize the wind farm's power output. Simulation results demonstrate that the proposed multivariate HCMAPPO can significantly increase wind farm power output compared to traditional PID control, coordinated model-based predictive control, and the multi-agent deep deterministic policy gradient algorithm. In particular, the HCMAPPO algorithm can be trained in an environment based on a thirteen-turbine wind farm and effectively applied to larger wind farms. At the same time, there is no significant increase in the fatigue damage of the wind turbine blades from wake control as the wind farm scale increases. The multivariate HCMAPPO control can realize the maximum collective power output of a large-scale wind farm.
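For reference, the three controllable variables the abstract names can be grouped per turbine as below; the bounds in the comments are typical values from wake-control practice, not necessarily the paper's limits.

```python
from dataclasses import dataclass

@dataclass
class TurbineAction:
    """The three controllable variables named in the abstract, with
    illustrative ranges (the paper's actual limits may differ)."""
    axial_induction: float  # typically within [0, 1/3] (Betz limit)
    yaw_deg: float          # yaw misalignment angle, e.g. [-30, 30]
    tilt_deg: float         # tilt angle of the rotor plane
```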
... Across a wide variety of economics and social psychology studies, when individuals are given a chance to talk with each other, cooperation increases significantly (Orbell et al., 1988, 1990). Although there are many works (Sheng et al., 2020; Ahilan and Dayan, 2021) on communication learning in MARL, little attention has been paid to the role of communication in solving ISD. Pretorius et al. (2020) first used empirical game-theoretic analysis (Tuyls et al., 2018) to study existing communication learning methods in ISD and to verify the effects of these methods experimentally. ...
Preprint
Social dilemmas can be considered situations where individual rationality leads to collective irrationality. The multi-agent reinforcement learning community has leveraged ideas from social science, such as social value orientations (SVOs), to solve social dilemmas in complex cooperative tasks. In this paper, by first introducing the typical "division of labor or roles" mechanism in human society, we provide a promising solution for intertemporal social dilemmas (ISD) with SVOs. A novel learning framework, called Learning Roles with Emergent SVOs (RESVO), is proposed to transform the learning of roles into the emergence of social value orientations, which is symmetrically solved by endowing agents with altruism to share rewards with other agents. An SVO-based role embedding space is then constructed by conditioning individual policies on roles with a novel rank regularizer and mutual information maximizer. Experiments show that RESVO achieves a stable division of labor and cooperation in ISDs of varying complexity.
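The reward sharing that the abstract attributes to SVOs is commonly formalized in the literature as an angle-parameterized mix of an agent's own reward and others' rewards, as below, where $\theta_i$ is agent $i$'s social value orientation and $N$ the number of agents. RESVO's exact objective may differ; treat this as background notation.

```latex
\tilde{r}_i \;=\; \cos(\theta_i)\, r_i \;+\; \sin(\theta_i)\, \frac{1}{N-1} \sum_{j \neq i} r_j
```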