Reinforcement learning (RL) is a kind of machine learning. It aims to optimize agents' policies by adapting the agents to an environment according to rewards. In this paper, we propose a method for improving policies by using stochastic knowledge, in which reinforcement learning agents obtain. We use a Bayesian Network (BN), which is a stochastic model, as knowledge of an agent. Its structure is
... [Show full abstract] decided by minimum description length criterion using series of an agent's input-output and rewards as sample data. A BN constructed in our study represents stochastic dependences between input-output and rewards. In our proposed method, policies are improved by supervised learning using the structure of BN (i.e. stochastic knowledge). The proposed improvement mechanism makes RL agents acquire more effective policies. We carry out simulations in the pursuit problem in order to show the effectiveness of our proposed method. learning mechanism. We verify the effectiveness of our method by carrying out simulations in the pursuit problem, which is a typical multi-agent problem. The experimental results show that the agent learns its own policy more effectively by using our method. Moreover, we discuss about the environmental information represented by the structure of BN.