March 2025
IEEJ Transactions on Electronics, Information and Systems
This paper develops a learning method for effective collaborative transportation by multiple robots in a warehouse environment. In large-scale, complex environments, agents need many learning iterations, for example in reinforcement learning, before they can make appropriate behavioral choices. Conventional multi-agent methods such as MADDPG (Multi-Agent Deep Deterministic Policy Gradient) and QMIX require extensive computation time for environmental exploration. This paper therefore proposes a two-stage learning procedure that separates overall optimization, which includes formulating general task execution procedures, from individual optimization based on assessments of the local situation. The effectiveness of the proposed method is demonstrated through analysis with a simulation system adapted to the target environment.