Conference Paper

Multilayered reinforcement learning for complicated collision avoidance problems

RIKEN, Inst. of Phys. & Chem. Res., Saitama
DOI: 10.1109/ROBOT.1998.680648 Conference: Robotics and Automation, 1998. Proceedings. 1998 IEEE International Conference on, Volume: 3
Source: IEEE Xplore

ABSTRACT We have proposed collision avoidance methods for a multirobot
system based on information exchanged via the "LOCISS: Locally
Communicable Infrared Sensory System", which was developed by the
authors. One problem with the LOCISS-based methods is that the number
of situations which must be considered grows rapidly as the number of
robots and stationary obstacles in the working environment increases.
In order to reduce the computational power and memory capacity required
for such a large number of situations, we propose, in this paper, a
multilayered reinforcement learning scheme to acquire appropriate
collision avoidance behaviors. The feasibility and performance of the
proposed scheme are examined through experiments using actual mobile
robots.
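The reinforcement learning schemes the abstract refers to typically build on a tabular Q-learning update over a discretized situation space. The sketch below is a minimal illustration of that kind of update and greedy behavior selection; the state names, action set, and rewards are illustrative assumptions, not the paper's actual formulation.

```python
import random

# Hypothetical discrete behavior set for collision avoidance.
ACTIONS = ["go_straight", "turn_left", "turn_right", "stop"]

def q_learning_step(q, state, action, reward, next_state,
                    alpha=0.1, gamma=0.9):
    """One tabular Q-learning update:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(q.get((next_state, a), 0.0) for a in ACTIONS)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q[(state, action)]

def select_action(q, state, epsilon=0.1, rng=random):
    """Epsilon-greedy behavior selection over the learned Q-table."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q.get((state, a), 0.0))
```

A multilayered scheme would stack such learners, e.g. a lower layer handling a single approaching robot and a higher layer arbitrating among learned behaviors when several robots and obstacles are present at once.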

  • ABSTRACT: In this chapter, we have shown a method by which multiple modules are assigned to different situations caused by the alternation of the other agent's policy, so that an agent may learn purposive behaviors for the specified situations as consequences of the other agent's behaviors. Macro actions are introduced to realize simultaneous learning of competitive behaviors in a multi-agent system. Results in a soccer situation were shown, demonstrating the importance of learning scheduling in the case of non-simultaneous learning without macro actions, as well as the validity of the macro actions in the case of simultaneous learning in the multi-agent system. We have also shown another learning system that uses state values instead of physical sensor values and macro actions instead of physical motor commands, and that adopts receiver state-value estimation modules, which estimate how easy it is for each receiver to receive the ball, in order to accelerate the learning. The state and action space abstraction (the use of state values and macro actions) contributed to the reduction of the learning time, while the use of the receiver state-value estimation modules contributed to the improvement of the teamwork performance.
    01/2008; ISBN: 978-3-902613-14-1
  • ABSTRACT: Navigation, consisting of two essential components known as localization and planning, is the art of steering a course through a medium. Localization matches an actual position in the real world to a location inside a map; in other words, each location in the map refers to an actual position in the environment. Planning is finding a short, collision-free path from the starting position to the predefined ending location. This study is a survey that focuses on introducing classic and heuristic-based path planning approaches and investigates their achievements in search optimization problems. The methods are categorized, their strengths and drawbacks are discussed, and the applications in which they have been utilized are explained.
    International Journal of Advancements in Computing Technology (IJACT). 01/2013; 5(14):1-14.
  • ABSTRACT: Particle Swarm Optimization (PSO) is a search method inspired by the social behaviors of animals. PSO has been found to outperform other methods in various tasks. Area Extended PSO (AEPSO) is an enhanced version of PSO that achieves better performance by balancing its essential behaviors more intelligently. AEPSO incorporates knowledge with the aim of choosing proper behaviors in each situation. This study provides a comparison between variations of basic PSO and AEPSO, aiming to address dynamic and time-dependent constraint problems in simulated robotic search. The problem is set up in a multi-robot learning scenario based on the use of a team of simulated robots (hereafter referred to as agents) who participate in survivor-rescue missions. The experiments are classified into three simulations. First, agents employ variations of basic PSO as their decision makers and movement controllers; this simulation investigates the impacts of swarm size, parameter adjustment, and population density on the agents' performance. Later, AEPSO is employed to improve the performance of the swarm in the same simulations. The final simulation investigates the feasibility of AEPSO in time-dependent, dynamic, and uncertain environments. As shown by the results, AEPSO achieves an appreciable level of performance in dynamic, time-dependent, and uncertain simulated environments and outperforms the variations of basic PSO, linear search, and random search used in the simulations.
    Journal of Intelligent and Robotic Systems 01/2010; 58(3-4):253-285.
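The basic (global-best) PSO that the abstract above builds on can be sketched compactly. The sketch below minimizes a 1-D objective; it is plain PSO, not AEPSO, and the parameter values (inertia w, acceleration coefficients c1, c2) are conventional defaults, not those of the cited study.

```python
import random

def pso_minimize(f, lo, hi, n_particles=20, n_iter=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best PSO on a 1-D objective f over [lo, hi]."""
    rng = random.Random(seed)
    xs = [rng.uniform(lo, hi) for _ in range(n_particles)]  # positions
    vs = [0.0] * n_particles                                # velocities
    pbest = xs[:]                       # each particle's best position
    pbest_f = [f(x) for x in xs]
    g = min(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g], pbest_f[g]  # swarm-wide best
    for _ in range(n_iter):
        for i in range(n_particles):
            r1, r2 = rng.random(), rng.random()
            # Velocity update: inertia + cognitive pull + social pull.
            vs[i] = (w * vs[i]
                     + c1 * r1 * (pbest[i] - xs[i])
                     + c2 * r2 * (gbest - xs[i]))
            xs[i] = min(max(xs[i] + vs[i], lo), hi)  # clamp to bounds
            fx = f(xs[i])
            if fx < pbest_f[i]:
                pbest[i], pbest_f[i] = xs[i], fx
                if fx < gbest_f:
                    gbest, gbest_f = xs[i], fx
    return gbest, gbest_f
```

In a robotic-search setting such as the one described, f would encode a fitness signal over candidate positions (e.g. sensed proximity to a survivor) rather than an analytic function.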