Agents in Multi-Agent Systems (MAS) are not always built and controlled by the system designer, e.g., on electronic trading platforms. In such settings, there is often a system objective which can differ from the agents' own goals (e.g., price stability). While much effort has been put into modeling and optimizing agent behavior, in this paper we are concerned with the platform perspective. Our model extends Stochastic Games (SG) with dynamic restriction of action spaces, yielding a new self-learning governance approach for black-box MAS. This governance learns an optimal restriction policy via Reinforcement Learning. As an alternative to the two straightforward approaches of fully centralized control and fully independent learners, this novel method combines a sufficient degree of autonomy for the agents with selective restriction of their action spaces. We demonstrate that the governance, though not explicitly instructed to leave any freedom of decision to the agents, learns that combining the agents' and its own capabilities is better than controlling all actions. As shown experimentally, the self-learning approach outperforms (w.r.t. the system objective) both "full control", where actions are always dictated without any agent autonomy, and "ungoverned MAS", where the agents simply pursue their individual goals.
Keywords: Multi-Agent System · Governance · Self-learning system · Reinforcement Learning · Electronic institution
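To make the core idea concrete, the following is a minimal, self-contained sketch of a governance layer that learns a restriction policy with tabular Q-learning. The toy price environment, the `black_box_agent` behavior, the `RESTRICTIONS` set, and the override rule for forbidden actions are all illustrative assumptions, not the paper's actual model or experimental setup.

```python
"""Hypothetical sketch: a governance learns (via tabular Q-learning) which
action-space restriction to impose on black-box agents so that the system
objective (price stability around a target) is met."""
import random
from collections import defaultdict

ACTIONS = [-1, 0, 1]                                 # sell / hold / buy
RESTRICTIONS = [(-1, 0, 1), (-1, 0), (0, 1), (0,)]   # allowed-action sets the governance may impose
TARGET, N_AGENTS = 10, 5

def black_box_agent(price):
    """Self-interested agent: tends to buy when the price is low, sell when high."""
    preferred = 1 if price < TARGET else -1
    return preferred if random.random() < 0.8 else random.choice(ACTIONS)

def step(price, restriction):
    """Agents act within the restricted action set; the price moves with net demand."""
    joint = []
    for _ in range(N_AGENTS):
        a = black_box_agent(price)
        if a not in restriction:                     # forbidden action mapped to the closest allowed one
            a = min(restriction, key=lambda r: abs(r - a))
        joint.append(a)
    new_price = max(0, min(20, price + sum(joint)))
    reward = -abs(new_price - TARGET)                # system objective: price stability
    return new_price, reward

# Governance: Q-learning over (price state, restriction index)
Q = defaultdict(float)
alpha, gamma, eps = 0.1, 0.95, 0.1
for episode in range(2000):
    price = random.randint(0, 20)
    for t in range(50):
        if random.random() < eps:
            r_idx = random.randrange(len(RESTRICTIONS))
        else:
            r_idx = max(range(len(RESTRICTIONS)), key=lambda i: Q[(price, i)])
        new_price, reward = step(price, RESTRICTIONS[r_idx])
        best_next = max(Q[(new_price, i)] for i in range(len(RESTRICTIONS)))
        Q[(price, r_idx)] += alpha * (reward + gamma * best_next - Q[(price, r_idx)])
        price = new_price
```

In this sketch, the governance never controls individual agents directly; it only selects which subset of actions remains available, and learning can settle on restrictions that leave the agents' own behavior room to operate whenever that serves the system objective better than the fully restrictive set `(0,)`.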