January 2025 · IEEE Transactions on Cognitive and Developmental Systems
Safety is a crucial challenge in applying reinforcement learning. Multi-agent safe reinforcement learning is an emerging field focused on learning control policies that maximize cumulative reward while satisfying safety constraints. Existing research, however, remains limited and faces challenges such as environmental non-stationarity and the curse of dimensionality in joint action spaces, making it difficult to balance performance and safety. To address these challenges, this paper proposes a multi-agent safe reinforcement learning algorithm based on the Transformer (MAST). Our key contribution is the multi-agent total advantage decomposition theorem, which establishes a connection between multi-agent safe reinforcement learning and sequence models. MAST employs a Transformer-based actor network that generates joint actions in parallel during training and autoregressively during inference. Empirical evaluations on the Safe MAMuJoCo benchmark show that MAST achieves a 13.06% improvement over state-of-the-art algorithms. Our attention-based reward and safety critics yield a 22.10% increase in reward and an 83.58% reduction in safety cost, and the Transformer-based actor improves performance by 53.60% to 111.93% over RNN-based methods.
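The abstract's key mechanism, generating joint actions in parallel (teacher-forced) during training but one agent at a time during inference, relies on causal masking: with a causal attention mask, a single parallel pass over the full action prefix produces exactly the same per-agent logits as step-by-step decoding. The sketch below illustrates this consistency with a toy one-layer causal-attention actor in NumPy. All names, dimensions, and parameters here are hypothetical illustrations, not MAST's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
N_AGENTS, D, N_ACTIONS = 4, 8, 5

# Hypothetical parameters of a one-layer causal-attention actor.
Wq, Wk, Wv = (rng.normal(size=(D, D)) for _ in range(3))
Wo = rng.normal(size=(D, N_ACTIONS))
act_embed = rng.normal(size=(N_ACTIONS, D))   # embedding of each discrete action
start = rng.normal(size=D)                    # assumed learned start token
obs = rng.normal(size=(N_AGENTS, D))          # per-agent observation embeddings

def logits(prev_actions):
    """Causal self-attention over [start, a_1, ..., a_k]; row i holds agent
    (i+1)'s action logits, so one call scores all agents in parallel."""
    toks = np.vstack([start] + [act_embed[a] for a in prev_actions])
    x = toks + obs[: len(toks)]               # condition each slot on its agent's obs
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(D)
    mask = np.triu(np.full((len(toks), len(toks)), -np.inf), k=1)
    attn = np.exp(scores + mask)              # causal mask: no peeking ahead
    attn /= attn.sum(axis=1, keepdims=True)
    return (attn @ v) @ Wo                    # shape (len(toks), N_ACTIONS)

# Inference: decode the joint action one agent at a time (greedy).
actions = []
for _ in range(N_AGENTS):
    actions.append(int(np.argmax(logits(actions)[-1])))

# Training-style pass: one parallel call over the full prefix reproduces
# the per-step decisions, so the two modes are consistent.
parallel = logits(actions[:-1])
assert [int(np.argmax(parallel[i])) for i in range(N_AGENTS)] == actions
```

The final assertion is the point: because the causal mask makes row i depend only on the start token and actions 1..i, teacher-forced training and autoregressive inference compute the same function, which is what lets a sequence-model actor train efficiently yet act sequentially.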