Article

State Space Construction for Behavior Acquisition in Multi Agent Environments with Vision and Action

Dept. of Adaptive Machine Syst., Osaka Univ.
11/1998; DOI: 10.1109/ICCV.1998.710819
Source: IEEE Xplore

ABSTRACT This paper proposes a method that estimates the relationships between the learner's behaviors and those of other agents in the environment through interactions (observation and action), using the method of system identification. To identify the model of each agent, Akaike's Information Criterion (AIC) is applied to the results of Canonical Variate Analysis (CVA) of the relationship between the observed action data and future observations. Reinforcement learning based on the estimated state vectors is then performed to obtain the optimal behavior. The proposed method is applied to a soccer-playing situation, in which a rolling ball and other moving agents are well modeled and the learner's behaviors are successfully acquired by the method. Computer simulations and real experiments are shown and a discussion is given.

1 Introduction Building a robot that learns to accomplish a task through visual information has been acknowledged as one of the major challenges facing vision, robotics, a...
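The abstract combines CVA-based system identification with AIC model-order selection, followed by reinforcement learning. As a rough, self-contained illustration of the AIC step alone, the sketch below selects the order of a simple autoregressive model fitted by least squares; the AR stand-in, the synthetic data, and all names are assumptions for illustration, not the paper's actual CVA procedure.

```python
import numpy as np

def fit_ar(y, order):
    """Least-squares fit of an AR(order) model; returns coefficients and residual variance."""
    N = len(y)
    # Row t predicts y[t] from the previous `order` samples.
    X = np.column_stack([y[order - k - 1 : N - k - 1] for k in range(order)])
    target = y[order:]
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ coef
    return coef, float(np.mean(resid ** 2))

def aic(n_samples, resid_var, n_params):
    """Gaussian AIC up to a constant: n * ln(sigma^2) + 2k."""
    return n_samples * np.log(resid_var) + 2 * n_params

# Synthetic second-order process standing in for observed agent motion.
rng = np.random.default_rng(0)
y = np.zeros(500)
for t in range(2, 500):
    y[t] = 1.5 * y[t - 1] - 0.7 * y[t - 2] + 0.1 * rng.standard_normal()

orders = list(range(1, 8))
scores = [aic(len(y) - p, fit_ar(y, p)[1], p) for p in orders]
print("AIC-selected order:", orders[int(np.argmin(scores))])  # likely 2
```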

  • Source
    • "Robot soccer is a good domain for researchers to study the multi-agent cooperation problem. Under the robot soccer simulation environment, Stone and Veloso [2] proposed a layered learning method that consists of two levels of learned behaviors; Uchibe [3] et al proposed a scheme in which the relationship between a learner's behaviors and those of other robots is estimated based on the method of system identification and the cooperative behaviors is acquired by reinforcement learning. These methods have developed efficient action selections of individual robot. "
    ABSTRACT: In a multi-robot environment, overlap among the actions selected by individual robots makes the acquisition of cooperative behaviors less efficient. In this paper an approach is proposed to determine an action selection priority level, based on which cooperative behaviors can be readily controlled. First, eight levels are defined for the action selection priority, which are correspondingly mapped to eight subspaces of actions. Second, using the local potential field method, the action selection priority level for each robot is calculated, and thus its action subspace is obtained. Then reinforcement learning (RL) is used to choose a proper action for each robot within its action subspace. Finally, the proposed method was implemented in a soccer game, and the efficiency of the proposed scheme was verified by the results of both computer simulations and real experiments.
    In a multi-agent environment, cooperative behaviors can usually be acquired by learning. However, conventional reinforcement learning algorithms, which work well in the single-agent case, are less efficient in the multi-agent case, since an environment containing other learners may appear to change randomly from the viewpoint of an individual learning agent [1]. For multi-agent learning it is therefore very important for an agent to perceive the influence of both the other learners and the objectives. Robot soccer is a good domain for studying the multi-agent cooperation problem. Under the robot soccer simulation environment, Stone and Veloso [2] proposed a layered learning method that consists of two levels of learned behaviors; Uchibe et al. [3] proposed a scheme in which the relationship between a learner's behaviors and those of other robots is estimated based on the method of system identification and the cooperative behaviors are acquired by reinforcement learning. However, none of these methods considers that the actions selected by different robots may overlap unnecessarily. In this paper, a method based on a potential function is proposed to deal with the overlap of action selection. The method first defines the concept of an Action Selection Priority Level (ASPL); eight ASPLs are introduced to indicate actions with different purposes, i.e., the action space is divided into eight subspaces, one per ASPL. The ASPL, and hence the action subspace for each robot, is determined by the local potential field method, in which additional factors relating to a robot's decision are considered. We then incorporated the ASPL model into the conventional RL algorithm, with which the proper action for each robot is chosen. In addition, the creation of the ASPL model enables the robot to search for proper actions among ...
    Supported by the National Natural Science Foundation of China under Grant No. 69985002. CHU Hai-tao was born in 1972. He is a Ph.D. candidate at the Department of Computer Science and Technology, Harbin Institute of Technology. His research interests are multi-agent systems and machine learning. HONG Bing-rong was born in 1937. He is a professor and doctoral supervisor in Computer Application at Harbin Institute of Technology. His current research areas include intelligent robots, robot soccer, and multi-agent systems.
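The entry above maps a locally computed potential value to one of eight priority levels, each owning an action subspace within which RL selects actions. The sketch below is a hypothetical reconstruction of that pipeline; the potential terms, weights, thresholds, and action names are invented for illustration and are not taken from the paper.

```python
import numpy as np

def local_potential(robot, ball, goal, opponents):
    """Illustrative potential: attraction toward ball and goal, repulsion from opponents."""
    attract = np.linalg.norm(ball - robot) + 0.5 * np.linalg.norm(goal - robot)
    repel = sum(1.0 / (np.linalg.norm(o - robot) + 1e-6) for o in opponents)
    return attract + repel

def priority_level(potential, bounds):
    """Map a scalar potential onto one of eight levels (7 thresholds -> 8 bins)."""
    return int(np.searchsorted(bounds, potential))

# Each level owns a subspace of the full action set; RL then chooses within it.
ACTION_SUBSPACES = {lvl: [f"action_{lvl}_{i}" for i in range(3)] for lvl in range(8)}

robot, ball = np.array([0.0, 0.0]), np.array([1.0, 2.0])
goal, opponents = np.array([4.0, 0.0]), [np.array([1.5, 0.5])]
bounds = np.linspace(2.0, 9.0, 7)  # illustrative level thresholds

lvl = priority_level(local_potential(robot, ball, goal, opponents), bounds)
print("ASPL:", lvl, "subspace:", ACTION_SUBSPACES[lvl])
```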
  • Source
    • "In order to construct the local predictive model of other agents, Akaike's Information Criterion(AIC) [1] is applied to the result of Canonical Variate Analysis(CVA) [5]. We just briefly explained the method (for the details of the local predictive model, see [9] [10]). CVA uses a discrete time, linear, state space model as follows: "
    ABSTRACT: Discusses how a robot can develop its state vector according to the complexity of its interactions with the environment. A method for controlling this complexity is proposed for a vision-based mobile robot whose task is to shoot a ball into a goal while avoiding collisions with a goalkeeper. First, we provide the most difficult situation (the goalkeeper moving at maximum speed with ball-chasing behavior), and the robot estimates the full set of state vectors, with the order of the major vector components, by a method of system identification. The environmental complexity is defined in terms of the speed of the goalkeeper, while the complexity of the state vector is the number of its dimensions. As the speed of the goalkeeper increases, the dimension of the state vector is increased by trading off the size of the state space (the dimension) against the learning time. Simulations are shown, and other issues of complexity control are discussed.
    Proceedings of the 1998 IEEE International Conference on Robotics and Automation; 06/1998
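The quoted excerpt in the entry above breaks off at "as follows:". The discrete-time, linear state-space model standardly assumed by CVA-based identification has the form:

```latex
% Standard discrete-time linear state-space model assumed by CVA;
% reconstructed here because the quoted excerpt cuts off before it.
\begin{align}
  \mathbf{x}(t+1) &= A\,\mathbf{x}(t) + B\,\mathbf{u}(t) + \mathbf{w}(t),\\
  \mathbf{y}(t)   &= C\,\mathbf{x}(t) + D\,\mathbf{u}(t) + \mathbf{v}(t),
\end{align}
```

where x(t) is the state vector, u(t) the action (input), y(t) the observation (output), and w(t), v(t) are noise terms. CVA estimates the matrices and the state dimension from observed action-observation sequences, with AIC used to select the model order.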
  • Source
    • "The rest of this article is structured as follows: at first we show our basic idea, then we give brief explanation of the local predictive model and reinforcement learning. The details of the local predictive model and learning algorithms are described in [8] and [7], respectively. Finally, we show simulation results and real experiments and give a discussion. "
    ABSTRACT: This paper proposes a method that acquires robots' behaviors based on the estimation of state vectors. In order to acquire cooperative behaviors in multi-robot environments, each learning robot separately estimates a local predictive model between the learner and each of the other objects. Based on the local predictive models, the robots learn the desired behaviors using reinforcement learning. The proposed method is applied to a soccer-playing situation, in which a rolling ball and other moving robots are well modeled and the learner's behaviors are successfully acquired by the method. Computer simulations and real experiments are shown and a discussion is given.
    Proceedings of the 1998 IEEE International Conference on Robotics and Automation; 06/1998
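The entry above pairs locally estimated state vectors with reinforcement learning. As a minimal sketch of that combination (not the papers' actual algorithm), the snippet below runs tabular Q-learning over a discretized estimated state; the discretization, action set, reward, and parameters are illustrative assumptions.

```python
import numpy as np
from collections import defaultdict

ACTIONS = ["forward", "left", "right", "stop"]  # illustrative action set
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1               # learning rate, discount, exploration
Q = defaultdict(lambda: np.zeros(len(ACTIONS)))
rng = np.random.default_rng(0)

def discretize(x_hat, bins=8, lo=-1.0, hi=1.0):
    """Bucket each component of the estimated state vector into `bins` cells."""
    idx = np.clip(((x_hat - lo) / (hi - lo) * bins).astype(int), 0, bins - 1)
    return tuple(idx)

def select_action(s):
    """Epsilon-greedy choice over the tabular Q-values."""
    if rng.random() < EPS:
        return int(rng.integers(len(ACTIONS)))
    return int(np.argmax(Q[s]))

def update(s, a, r, s_next):
    """One-step Q-learning backup."""
    Q[s][a] += ALPHA * (r + GAMMA * np.max(Q[s_next]) - Q[s][a])

# One illustrative transition on a 2-D estimated state.
s = discretize(np.array([0.2, -0.5]))
a = select_action(s)
update(s, a, 1.0, discretize(np.array([0.25, -0.45])))
```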