Conference Paper

Multilayered reinforcement learning for complicated collision avoidance problems

RIKEN, Inst. of Phys. & Chem. Res., Saitama
DOI: 10.1109/ROBOT.1998.680648 Conference: 1998 IEEE International Conference on Robotics and Automation (ICRA 1998), Volume 3
Source: IEEE Xplore


We have proposed collision avoidance methods for a multirobot system based on
the information exchanged through the “LOCISS: Locally Communicable Infrared
Sensory System” developed by the authors. One problem with the LOCISS-based
methods is that the number of situations that must be considered grows rapidly
as the number of robots and stationary obstacles in the working environment
increases. To reduce the computational power and memory capacity required for
such a large number of situations, we propose in this paper a multilayered
reinforcement learning scheme for acquiring appropriate collision avoidance
behaviors. The feasibility and performance of the proposed scheme are examined
through experiments using actual mobile robots.
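
The abstract describes the layered scheme only at this level of detail. As a rough, hypothetical illustration of how splitting the learning problem into layers keeps the tables small, the Python sketch below uses one small Q-table that is shared across all nearby robots/obstacles plus a second table for arbitration; the state encodings, layer split, and parameters are assumptions for illustration, not the authors' design.

```python
import random
from collections import defaultdict

ACTIONS = ["keep_course", "turn_left", "turn_right", "stop"]

class LayeredAvoidance:
    """Two-layer sketch: a lower layer scores actions against each nearby
    object individually, an upper layer arbitrates with a coarse scene summary.
    Memory stays roughly constant as the number of robots/obstacles grows."""

    def __init__(self, alpha=0.1, gamma=0.9, eps=0.1):
        self.q_pair = defaultdict(float)   # lower layer: (object_state, action) -> value
        self.q_arb = defaultdict(float)    # upper layer: (summary_state, action) -> value
        self.alpha, self.gamma, self.eps = alpha, gamma, eps

    def act(self, object_states, summary_state):
        if random.random() < self.eps:
            return random.choice(ACTIONS)  # epsilon-greedy exploration
        def score(a):
            # every nearby object is evaluated with the same pairwise table
            return (sum(self.q_pair[(s, a)] for s in object_states)
                    + self.q_arb[(summary_state, a)])
        return max(ACTIONS, key=score)

    def update(self, object_states, summary_state, action, reward,
               next_object_states, next_summary_state):
        # one-step Q-learning update, applied layer by layer
        for s, s_next in zip(object_states, next_object_states):
            best = max(self.q_pair[(s_next, a)] for a in ACTIONS)
            self.q_pair[(s, action)] += self.alpha * (
                reward + self.gamma * best - self.q_pair[(s, action)])
        best = max(self.q_arb[(next_summary_state, a)] for a in ACTIONS)
        self.q_arb[(summary_state, action)] += self.alpha * (
            reward + self.gamma * best - self.q_arb[(summary_state, action)])
```

The point of such a decomposition is that the pairwise table grows with the number of distinguishable single-object situations rather than with the number of combinations of robots and obstacles, which is the explosion the paper aims to avoid.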

    • "The fuzzy system is automatically designed to train the neural network weights. Fujii et al. [9] suggested a multi-layered methodology for collision-free navigation via reinforcement learning. However, the planned vehicle motions using learning based approaches are not optimal, particularly at the initial learning stage. "
    ABSTRACT: Real-time safety-aware navigation of an intelligent vehicle is one of the major challenges in intelligent vehicle systems. Many studies have focused on obstacle avoidance to prevent an intelligent vehicle from approaching obstacles “too close” or “too far”, but it is difficult to obtain an optimal trajectory. In this paper, a novel biologically inspired neural network methodology for real-time collision-free navigation of an intelligent vehicle with safety consideration in a non-stationary environment is proposed. The real-time vehicle trajectory is planned through the varying neural activity landscape, which represents the dynamic environment, in conjunction with a safety-aware navigation algorithm. The proposed model for intelligent vehicle trajectory planning with safety consideration is capable of planning a real-time “comfortable” trajectory by overcoming both the “too close” and the “too far” shortcomings. Simulation results are presented to demonstrate the effectiveness and efficiency of the proposed methodology, which performs safer collision-free navigation of an intelligent vehicle.
    IEEE International Joint Conference on Neural Networks (IJCNN'2014), Beijing, China; 06/2014
    • "To handle direction and fitness criteria: AEPSO takes advantage from two heuristics known as Credit Assignment and Environment Reduction to addresses the clue de sacs problem (Suranga, 2006; Majid et al., 2001). Environment Reduction Heuristic: The environment reduction heuristic is inspired from the work done by (Fujii et al., 1998; Park et al., 2001; Yang and Gu, 2004) in which it is stated that it is possible to separate a large learning space to several small learning spaces with the aim of easing the exploration. "
    ABSTRACT: Navigation in dynamic and uncertain environments in the absence of a reliable environment map is challenging. In this study, we investigate the effectiveness of two variations of Particle Swarm Optimization (PSO), called Area Extended PSO (AEPSO) and Cooperative AEPSO (CAEPSO), in noisy environments in which the noise does not represent random noise originating from a single type of source but the combination of noises originating from different sources located at nearby or faraway positions. Knowledge transfer and transfer learning, which represent the use of expertise and knowledge gained from previous experiments, can improve the robots' decision making and reduce the number of wrong decisions in such uncertain environments. This study investigates the impact of transfer learning on the robots' search in such a hostile environment. The results highlight the feasibility of CAEPSO as the movement controller and decision maker of a swarm of robots in the simulated uncertain environment when expertise gained from past training is transferred to the robots in the testing phase.
    10th International Conference on Informatics in Control, Automation and Robotics (ICINCO); 07/2013
    • "This proposed modification tries to solve the existing credit assignment problem for a multiagent system. Fujii et all [8] have used multilayered reinforcement learning scheme to select the appropriate collision avoidance behaviors so as to reduce the computational power and memory capacity. This helps to move numbers of robots using LOCISS (Locally Communicable Infrared Sensory System) safely in an environment full with large numbers of static obstacles. "
    ABSTRACT: With the advancement of robotic exploration in diversified fields (such as planetary exploration, exploration in Antarctica, and military applications for mine detection, surveillance, and rescue), autonomous exploration has become popular for unknown and unstructured environments. To help such exploration in fully unknown environments, robot learning is very useful. Among various well-known learning methods, Q-learning, a sub-part of reinforcement learning, has proven suitable. It is a continuous system-environment interaction model using a reward-punishment scheme. Usually a single agent looks after this online system-environment interaction, but if the learning scheme is complex, a single agent may not perform well. For such situations, multi-agent learning is more appropriate: the whole task is broken down into several sub-tasks and different agents are assigned to complete the different sub-tasks. The current research work describes a methodology which uses a simplified multi-agent scheme for the autonomous exploration of a robot. Here the most important and appropriate task/agent, the one with the highest priority, is identified using a popular behaviour-based architecture, and the corresponding action is taken. The online update of the reinforcement function is applied only to that agent/task, not to all the others as in a two-player zero-sum game. The current work differs from the popularly used multi-agent Q-learning, which is essentially a two-player zero-sum game, in that it is tested on a robotic system rather than in simulation. It also does not use two (or more) completely different systems, as in previous cases of multi-agent Q-learning; rather, it uses different agents in a single system (robot) for the same environment.
    26th International Conference on CAD/CAM, Robotics and Factories of the Future, Kuala Lumpur, Malaysia; 01/2011
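
The first excerpt above plans the vehicle trajectory through a varying neural activity landscape. A minimal sketch of one common form of such a landscape is given below: a shunting-equation grid where the target excites and obstacles inhibit neighbouring neurons. The grid layout, parameter values, and the wrap-around neighbourhood are illustrative assumptions, not the cited paper's exact model.

```python
import numpy as np

def shunting_landscape(shape, target, obstacles,
                       A=10.0, B=1.0, D=1.0, E=100.0,
                       mu=1.0, dt=0.01, steps=500):
    """Evolve a grid of neural activities x with shunting dynamics:
    dx/dt = -A*x + (B - x) * ([I]^+ + mu * sum of neighbours' [x]^+) - (D + x) * [I]^-,
    where I = +E at the target cell, -E at obstacle cells, and 0 elsewhere."""
    x = np.zeros(shape)
    I = np.zeros(shape)
    I[target] = E
    for ob in obstacles:
        I[ob] = -E
    for _ in range(steps):
        xp = np.clip(x, 0.0, None)                                 # [x]^+
        lateral = mu * (np.roll(xp, 1, 0) + np.roll(xp, -1, 0)     # 4-neighbour excitation
                        + np.roll(xp, 1, 1) + np.roll(xp, -1, 1))  # (wraps at the edges)
        dx = (-A * x + (B - x) * (np.clip(I, 0.0, None) + lateral)
              - (D + x) * np.clip(-I, 0.0, None))
        x = x + dt * dx
    return x

# The vehicle would then repeatedly move to the neighbouring cell with the
# highest activity, which pulls it toward the target and away from obstacles.
activity = shunting_landscape((20, 20), target=(15, 15),
                              obstacles=[(8, 8), (8, 9), (9, 8)])
```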
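
The last excerpt above describes a behaviour-based arbitration in which only the currently selected, highest-priority agent receives the reinforcement update. A minimal Python sketch of that update pattern follows; the behaviour names, priority function, and environment hooks are hypothetical, not taken from the cited work.

```python
import random
from collections import defaultdict

ACTIONS = ["forward", "left", "right", "stop"]

class BehaviourAgent:
    """One Q-learning agent per behaviour/sub-task (names are hypothetical)."""
    def __init__(self, name, alpha=0.2, gamma=0.9, eps=0.1):
        self.name, self.alpha, self.gamma, self.eps = name, alpha, gamma, eps
        self.q = defaultdict(float)                    # (state, action) -> value

    def act(self, state):
        if random.random() < self.eps:
            return random.choice(ACTIONS)              # occasional exploration
        return max(ACTIONS, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        best_next = max(self.q[(next_state, a)] for a in ACTIONS)
        self.q[(state, action)] += self.alpha * (reward + self.gamma * best_next
                                                 - self.q[(state, action)])

def step(agents, priority_of, state, reward_fn, transition_fn):
    """Behaviour-based arbitration: choose the highest-priority agent for the
    current state, act with it, and update ONLY that agent's Q-table."""
    active = max(agents, key=lambda a: priority_of(a.name, state))
    action = active.act(state)
    next_state = transition_fn(state, action)
    active.update(state, action, reward_fn(state, action, next_state), next_state)
    return next_state
```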