ABSTRACT: In a multiagent system, if agents' experiences can be accessed and assessed by peers for environmental modeling, they can alleviate the burden of exploring unvisited states or unseen situations and thereby accelerate learning. Because building an effective and accurate model within a limited time is an important issue, especially for complex environments, this paper introduces a model-based reinforcement learning method based on a tree structure to achieve efficient modeling with less memory consumption. The proposed algorithm tailors a Dyna-Q architecture to multiagent systems by using a tree structure for modeling. The tree model built from real experiences is used to generate virtual experiences so that the time spent learning can be reduced. The model is also suitable for knowledge sharing. This paper is inspired by knowledge-sharing methods in multiagent systems, in which an agent can construct a global model from scattered local models held by individual agents. Sharing thus increases modeling accuracy and provides valid simulated experiences for indirect learning at the early stage of learning. To simplify the sharing process, the proposed method applies resampling techniques to graft partial branches of trees containing required and useful experiences disseminated from experienced peers, instead of merging whole trees. The simulation results demonstrate that the proposed sharing method achieves sample efficiency and learning acceleration in multiagent cooperation applications.
ABSTRACT: Dyna-Q, a well-known model-based reinforcement learning (RL) method, interleaves offline simulations and action executions to update Q functions. It creates a world model that predicts the feature values of the next state and the reward function of the domain directly from data, and uses the model to train Q functions to accelerate policy learning. Tabular methods are commonly used in Dyna-Q to establish the model, but a tabular model needs many more samples of experience to approximate the environment. In this article, an adaptive model learning method based on tree structures is presented to enhance sampling efficiency in learning the world model. The proposed method produces simulated experiences for indirect learning, so the agent has additional experience for updating the policy. The agent works backwards from collections of state transitions and associated rewards, utilizing coarse coding to learn their definitions for the region of state space that tracks back to the preceding states. The proposed method estimates the reward and transition probabilities between states from past experience. Because the resulting tree is always concise and small, the agent can use value iteration to quickly estimate the Q-values of each action in the induced states and determine a policy. The effectiveness and generality of the method are demonstrated in two numerical simulations: a mountain car and a mobile robot in a maze. The simulation results demonstrate that the method clearly improves the training rate.
Cybernetics and Systems 11/2013; 44(8):641-662. · 0.97 Impact Factor
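The Dyna-Q loop described in the abstract above (direct Q-learning updates from real steps, a learned world model, and extra planning updates from simulated experience) can be sketched as follows. This is a minimal illustration and not the authors' tree-based method: the model here is a plain lookup table for a deterministic environment, and the names `dyna_q` and `chain_step`, the five-state chain task, and all parameter values are assumptions chosen for the example.

```python
import random
from collections import defaultdict


def dyna_q(step, n_actions, start, goal, episodes=50, planning_steps=10,
           alpha=0.5, gamma=0.95, epsilon=0.1, seed=0):
    """Tabular Dyna-Q for a deterministic, episodic task (illustrative only)."""
    rng = random.Random(seed)
    q = defaultdict(float)   # Q[(state, action)], initialised to 0
    model = {}               # learned model: (state, action) -> (reward, next_state)

    def greedy(s):
        # Pick a highest-valued action, breaking ties at random.
        values = [q[(s, a)] for a in range(n_actions)]
        best = max(values)
        return rng.choice([a for a, v in enumerate(values) if v == best])

    def update(s, a, r, s2):
        # One-step Q-learning backup.
        target = r + gamma * max(q[(s2, b)] for b in range(n_actions))
        q[(s, a)] += alpha * (target - q[(s, a)])

    for _ in range(episodes):
        s = start
        while s != goal:
            # Epsilon-greedy action selection on the real environment.
            a = rng.randrange(n_actions) if rng.random() < epsilon else greedy(s)
            r, s2 = step(s, a)
            update(s, a, r, s2)            # direct learning from real experience
            model[(s, a)] = (r, s2)        # model learning
            for _ in range(planning_steps):
                # Planning: replay transitions simulated by the learned model.
                ps, pa = rng.choice(list(model))
                pr, ps2 = model[(ps, pa)]
                update(ps, pa, pr, ps2)
            s = s2
    return q


def chain_step(s, a):
    """Hypothetical 5-state chain: action 0 moves left, 1 moves right; goal is state 4."""
    s2 = max(0, s - 1) if a == 0 else min(4, s + 1)
    return (1.0 if s2 == 4 else 0.0), s2


q_values = dyna_q(chain_step, n_actions=2, start=0, goal=4)
```

Replacing the dictionary `model` with a more compact learned structure is the kind of change the abstract argues for; the sketch only shows where such a model plugs into the loop.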
ABSTRACT: In this paper, a model learning method based on tree structures is presented to achieve sample efficiency in stochastic environments. The proposed method is combined with the Q-Learning algorithm to form a Dyna agent that can be used to speed up learning. Q-Learning is used to learn the policy, and the proposed method is used for model learning. The model represents the environment and simulates virtual experience. The virtual experience reduces the interaction needed between the agent and the environment and lets the agent perform value iterations quickly. Thus, the proposed agent has additional experience for updating the policy. A simulation task, a mobile robot in a maze, is used to compare Q-Learning, Dyna-Q, and the proposed method. The simulation results confirm that the proposed method achieves the goal of sample efficiency.
Proceedings of the 2013 IEEE International Conference on Systems, Man, and Cybernetics; 10/2013
ABSTRACT: Reinforcement learning is one of the more prominent machine learning technologies because of its unsupervised learning structure and its ability to keep learning, even in a dynamic operating environment. Applying this learning to cooperative multi-agent systems not only allows each individual agent to learn from its own experience, but also offers the opportunity for individual agents to learn from other agents in the system, in order to increase the speed of learning. In the proposed learning algorithm, an agent stores its experience in terms of a state aggregation, by use of a decision tree, such that policy sharing between multiple agents is eventually accomplished by merging the different decision trees of peers. Unlike lookup tables, which have a homogeneous structure for state aggregation, the decision trees carried within agents have a heterogeneous structure. The method detailed in this study allows policy sharing between cooperative agents by merging their trees into a hyper-structure, instead of forcefully merging entire trees. The proposed scheme initially allows the entire decision tree to be translated from one agent to others. Based on the evidence, only some of the leaf nodes have useful experience for policy sharing. The proposed method induces a hyper decision tree from a large number of samples drawn from the shared nodes. The results from simulations in a multi-agent cooperative domain illustrate that the proposed algorithms perform better than the algorithm that does not allow sharing.
Information Sciences 06/2013; 234:112–120. · 3.64 Impact Factor
ABSTRACT: In this paper, a method for sharing model construction among multiple agents is presented to shorten modeling time. The sharing method allows the agents to share their knowledge in modeling. In the proposed method, the individual model held by each agent can be implemented with a heterogeneous structure such as a decision tree. To decrease the complexity of the sharing process, the proposed method performs model sharing between cooperative agents by means of the leaf nodes of trees instead of forcibly merging whole trees. The simulation results in a multi-agent cooperative domain illustrate that the proposed algorithm performs better than the one without sharing.
System Science and Engineering (ICSSE), 2013 International Conference on; 01/2013
ABSTRACT: This paper develops a tree-construction method based on the framework of reinforcement learning (RL). The induction of a decision tree is regarded as an RL problem, in which the optimal policy should be found to obtain the maximal accumulated information gain. The proposed approach consists of two stages. In the first stage, the emulation/demonstration stage, sensory-action data in a mechatronic system or samples of training patterns are generated by an operator or stimulator. The records of these emulation data are aggregated into components of the state space represented by a decision tree. State aggregation for discretization of a state space consists of two phases: split estimation and tree growing. In the split estimation phase, an inducer estimates long-term evaluations of splits at visited nodes. In the tree growing phase, the inducer grows the tree according to the predicted long-term evaluations, which are approximated by a neural network model. In the second stage, the learned behavior or classifier is shaped by the RL scheme with a discretized state space constructed by the decision tree derived from the previous stage. Unlike the conventional greedy procedure for constructing and pruning the tree, the proposed method casts the sequential process of tree induction as policy iteration, in which policies for node splitting are evaluated and improved repeatedly until an optimal or near-optimal policy is obtained. The splitting criterion, regarded as an action policy, is based on long-term evaluations of payoff instead of immediate evaluations of impurity. A comparison with CART (classification and regression tree) and C4.5 on several benchmark datasets is presented. Furthermore, to show its application to learning control, the proposed method is used to construct a tree-based reinforcement learning method that works with a discrete state space derived from the proposed method. The results show the feasibility and high performance of the proposed system as a state partition in comparison with the renowned Adaptive Heuristic Critic (AHC) model.
ABSTRACT: This study introduces a method that enables a robot to learn how to perform new tasks through human demonstration and independent practice. The proposed process consists of two interconnected phases. In the first phase, state-action data are obtained from human demonstrations, and an aggregated state space is learned in terms of a decision tree that groups similar states together through reinforcement learning. Without a trimming postprocess in tree induction, the tree encodes a control policy that can be used to control the robot and that repeatedly improves itself. Once a variety of behaviors is learned, more elaborate behaviors can be generated by selectively organizing several behaviors using another Q-learning algorithm. The composed outputs of the organized basic behaviors on the motor level are weighted using the policy learned through Q-learning. This approach uses three diverse Q-learning algorithms to learn complex behaviors. The experimental results show that the learned complicated behaviors, organized according to individual basic behaviors by the three Q-learning algorithms on different levels, can function more adaptively in a dynamic environment.
IEEE Transactions on Systems Man and Cybernetics - Part A Systems and Humans 07/2012; 42(4):999-1004. · 2.18 Impact Factor
ABSTRACT: State partition is an important issue in reinforcement learning because it has a significant effect on performance. In this paper, an adaptive state partition method is presented that discretizes the state space adaptively and makes effective use of decision trees. The proposed method splits the state space according to the temporal difference generated by reinforcement learning. At the same time, reinforcement learning uses the state space partitioned by the decision tree to learn the policy. To avoid a trivial partition, sibling nodes are pruned according to the Activity and the Reliability. A Monte-Carlo Tree Search (MCTS) is also proposed to explore the policy. A goal-approaching simulation was conducted to demonstrate that the proposed method can achieve the design goal.
Systems, Man, and Cybernetics (SMC), 2012 IEEE International Conference on; 01/2012
ABSTRACT: This article presents a Dyna-Q learning method based on a tree-structured world model to enhance sampling efficiency in reinforcement learning problems. The Q-Learning mechanism is used for policy learning while the tree learns the world model by observing the transitions between states after actions are taken. In the early stages of learning, the learning agent does not have an accurate model but explores the environment as much as possible to collect sufficient experiences to approximate the environment model. When the agent develops a more accurate model, a planning method can use the model to produce simulated experiences to accelerate value iterations. Thus, the agent with the proposed method can obtain virtual experiences for updating the policy. Simulations of a mobile robot escaping from a labyrinth are used to verify the performance of the robot equipped with the proposed method. The results show that the tree-based Dyna-Q agent can speed up the learning process.
Advanced Intelligent Mechatronics (AIM), 2012 IEEE/ASME International Conference on; 01/2012
ABSTRACT: The original Q-learning method has difficulty achieving sample efficiency, for example when training a policy to reach a goal within a limited number of time steps. The Dyna-Q agent was therefore proposed to speed up policy learning. However, Dyna-Q does not specify how to build the model, so a table is commonly used as the model. In this paper, we propose an adaptive model learning method based on tree structures and combine it with Q-Learning to form a Tree-Based Dyna-Q agent that enhances policy learning. When the tree-based model becomes accurate, a planning method can use the model to produce simulated experiences to accelerate value iterations. Thus, the agent with the proposed method can obtain virtual experiences for updating the policy. The simulation results show that the method clearly improves training time.
SICE Annual Conference (SICE), 2012 Proceedings of; 01/2012
ABSTRACT: Q-learning, one of the most widely used reinforcement learning methods, normally needs well-defined quantized state and action spaces to obtain an optimal policy for accomplishing a given task. This makes it difficult to apply to real robot tasks, because the learned behavior performs poorly when the quantization of continuous state and action spaces fails. In this paper, we propose a fuzzy-based Cerebellar Model Articulation Controller method that calculates contribution values to estimate a continuous action value, in order to make motion smooth and effective. We implement it in a multi-agent system for real robot applications.
ABSTRACT: Reinforcement learning is one of the more prominent machine learning technologies due to its unsupervised learning structure and ability to learn continually, even in a dynamic operating environment. Applying this learning to cooperative multi-agent systems not only allows each individual agent to learn from its own experience, but also offers the opportunity for individual agents to learn from the other agents in the system to increase the speed of learning. In the proposed learning algorithm, an agent stores its experience in terms of state aggregation implemented with a decision tree, such that policy sharing between multiple agents is eventually accomplished by merging the different decision trees of peers. Unlike lookup tables, which have a homogeneous structure for state aggregation, the decision trees carried in agents have a heterogeneous structure. This work performs policy sharing between cooperative agents by forming a hyper structure from their trees instead of forcibly merging whole trees. The proposed scheme initially translates the whole decision tree from one agent to others. Based on the evidence, only some of the leaf nodes hold helpful experience for policy sharing. The proposed method induces a hyper decision tree from a large number of samples drawn from the shared nodes. Results from simulations in a multi-agent cooperative domain illustrate that the proposed algorithms perform better than the one without sharing.
ABSTRACT: The fundamental approach of Q-learning is based on finite discrete state spaces and on incrementally estimating Q-values from the reward received from the environment and the agent's previous Q-value estimates. Unfortunately, robots always learn and behave in a continuous perceptual space, where the observed perceptions are transformed into or coarsely regarded as states. Currently, there is no elegant way to combine discrete actions with continuous observations or states. Therefore, accommodating continuous states with a finite discrete set of actions has become an important and intriguing issue in this research area. We propose an algorithm that maps an action policy from a discrete space to a real-valued domain; that is, the method selects an action from a discrete set and then imposes a slight bias on its magnitude before the action is taken. From the viewpoint of exploration and exploitation, the method searches for a better action around a paradigm action in the solution space, with variation within the biased region. Further, the proposed method uses the well-known epsilon-greedy strategy to explore a better trace, but with a narrowed Tabu search.
Proceedings of the IEEE International Conference on Systems, Man and Cybernetics, Istanbul, Turkey, 10-13 October 2010; 01/2010
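For reference, the incremental Q-value estimate that the abstract above refers to is the standard one-step Q-learning update. The symbols are the usual textbook notation and are not taken from the paper itself: \alpha is the learning rate, \gamma the discount factor, and r_{t+1} the reward received after taking action a_t in state s_t.

```latex
Q(s_t, a_t) \leftarrow Q(s_t, a_t)
  + \alpha \left[ r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right]
```

The maximum runs over a finite action set, which is why continuous action spaces need an extra mechanism such as the bias step the abstract describes.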
ABSTRACT: To design appropriate actions for mobile robots, designers usually observe the sensory signals on the robots and decide the actions according to some desired purposes. This approach needs careful consideration and abundant knowledge of robotics for a variety of situations. When improving the actions of robots, it is hard to detect errors by eye, and trial and error takes time. In this article, we propose a novel learning algorithm, the fused behavior Q-learning algorithm (FBQL), to deal with such situations. The proposed algorithm simplifies the design of individual behaviors by means of a decision tree approach to state aggregation, which in effect recodes the domain knowledge. Furthermore, these learned behaviors are fused into a more complicated behavior by a set of appropriate weighting parameters through a Q-learning mechanism, such that the robots can behave adaptively and optimally in a dynamic environment.
Proceedings of the IEEE International Conference on Systems, Man and Cybernetics, San Antonio, TX, USA, 11-14 October 2009; 01/2009
ABSTRACT: Q-learning, one of the most widely used reinforcement learning methods, normally needs well-defined quantized state and action spaces to obtain an optimal policy for accomplishing a given task. This makes it difficult to apply to real robot tasks, because the learned behavior performs poorly when the quantization of continuous state and action spaces fails. In this paper, we propose a fuzzy-based CMAC method that calculates contribution values to estimate a continuous action value, in order to make motion smooth and effective. We implement it in a multi-agent system for real robot applications.
ABSTRACT: State value estimation is an important issue in reinforcement learning and affects performance significantly. Lookup-table methods have advantages in convergence rate, but they need prior knowledge about how to partition the state space. They are also unreasonable in a real system, since the values associated with different sensory inputs that belong to the same representing state are identical. We propose a method that discretizes the state space adaptively and effectively, using an approach akin to decision tree methods. In each (discretized) representing state, function approximators based on the tree structure estimate the values precisely.
Industrial Electronics Society, 2007. IECON 2007. 33rd Annual Conference of the IEEE; 12/2007
ABSTRACT: This correspondence presents a multistrategy decision-making system for robot soccer games. Through reinforcement processes, the coordination between robots is learned in the course of the game. Meanwhile, a better action can be obtained after an iterative learning process. The experimental scenario is a five-versus-five soccer game, where the proposed system dynamically assigns each player to a position in a primitive role, such as attacker or goalkeeper. The responsibility of each player varies as its role changes in state transitions. Therefore, the system uses several strategies, such as an offensive strategy and a defensive strategy, for a variety of scenarios, and the decision-making mechanism can choose a better strategy according to the circumstances encountered. In each strategy, a robot should behave in coordination with its teammates and resolve conflicts aggressively. The major task assigned to robots in each strategy is simply to take up good positions. Therefore, the problem of dispatching robots to good positions in a reasonable manner should be handled effectively. This kind of problem is similar to assignment problems in linear programming research. Utilizing the Hungarian method, each robot can be assigned to a spot with minimal cost. Consequently, robots based on the proposed decision-making system can accomplish each situational task in coordination.
IEEE Transactions on Systems Man and Cybernetics - Part A Systems and Humans 12/2007; · 2.18 Impact Factor
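The dispatching step in the abstract above is an assignment problem: each robot gets exactly one target spot so that the total cost is minimal. The sketch below illustrates that problem only. It uses exhaustive search over permutations as a stand-in for the Hungarian method, which returns the same optimum in polynomial time; the function name `assign_positions`, the use of Euclidean distance as the cost, and the coordinates are all assumptions made for the example.

```python
from itertools import permutations
from math import hypot


def assign_positions(robots, spots):
    """Return (assignment, total_cost), where assignment[i] is the spot index for robot i.

    Exhaustive search: fine for a five-player team (120 permutations),
    but not a substitute for the Hungarian method on large inputs.
    """
    n = len(robots)
    # cost[i][j] = straight-line distance from robot i to spot j (assumed cost model).
    cost = [[hypot(rx - sx, ry - sy) for (sx, sy) in spots] for (rx, ry) in robots]
    best = min(permutations(range(n)),
               key=lambda p: sum(cost[i][p[i]] for i in range(n)))
    total = sum(cost[i][best[i]] for i in range(n))
    return list(best), total


# Hypothetical field coordinates for three robots and three target spots.
robots = [(0, 0), (4, 0), (0, 3)]
spots = [(4, 1), (0, 4), (1, 0)]
assignment, total = assign_positions(robots, spots)
```

In this example every robot has a spot at distance 1, so the optimal assignment sends robot 0 to spot 2, robot 1 to spot 0, and robot 2 to spot 1, for a total cost of 3.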
ABSTRACT: It is hard to define a state space or a proper reward function in reinforcement learning that makes a robot act as expected. In this paper, we demonstrate the expected behavior for a robot. Then an RL-based decision tree approach, which decides splits according to long-term evaluations instead of a top-down greedy strategy, finds the relationship between the input and output from the demonstration data. We use this method to teach a robot a target-seeking problem. To improve performance on the target-seeking problem, we add Q-learning on top of the state space produced by the RL-based decision tree. The experimental results show that Q-learning improves performance quickly. For demonstration, we build a mobile robot powered by an embedded board. The robot can detect the hall of the range in any direction with an omni-directional vision system. With such powerful embedded computing capability and an efficient machine vision system, the robot can inherit the learned behavior from a simulator that has learned the empirical behavior, and continue to learn with Q-learning to improve its performance on the target-seeking problem.
Integration Technology, 2007. ICIT '07. IEEE International Conference on; 04/2007
ABSTRACT: A self-organizing control mechanism with a capability for reinforcement learning is proposed. The method is realized by a reinforcement signal predictor based on grey theory and a policy learning unit implemented by a neural network. In consideration of the stability problem in learning, a temporal difference algorithm is used as the weight-update rule of the connectionist. The results of the simulations and experiments demonstrate that, with the proposed method, a control task can be learned even with very little a priori knowledge.
ABSTRACT: In general, Q-learning needs well-defined quantized state spaces and action spaces to obtain an optimal policy for accomplishing a given task. This makes it difficult to apply to real robot tasks, because the learned behavior performs poorly when the quantization of continuous state and action spaces fails. In this paper, we propose a fuzzy-based CMAC method that calculates the contribution of each neighboring state to generate a continuous action value, in order to make motion smooth and effective. A momentum term to speed up training has been designed and implemented in a multi-agent system for real robot applications.
Advances in Neural Networks - ISNN 2006, Third International Symposium on Neural Networks, Chengdu, China, May 28 - June 1, 2006, Proceedings, Part I; 01/2006