Conference Paper

A hierarchical learning architecture with multiple-goal representations based on adaptive dynamic programming

Dept. of Electr., Comput., & Biomed. Eng., Univ. of Rhode Island, Kingston, RI, USA
DOI: 10.1109/ICNSC.2010.5461483
Conference: Networking, Sensing and Control (ICNSC), 2010 International Conference on
Source: IEEE Xplore

ABSTRACT In this paper we propose a hierarchical learning architecture with multiple-goal representations based on adaptive dynamic programming (ADP). The key idea is to integrate a reference network that provides an internal reinforcement representation (a secondary reinforcement signal) to interact with the operation of the learning system. This reference network plays an important role in building the internal goal representations. Furthermore, motivated by recent research in neurobiology and psychology, the proposed ADP architecture can be designed hierarchically, so that different levels of internal reinforcement signals represent multi-level goals for the intelligent system. The system-level architecture, the learning and adaptation principles, and simulation results are presented to demonstrate the effectiveness of the proposed approach.
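As a rough illustration of the signal flow described in the abstract, the sketch below wires an action network, a reference network, and a critic network together for a single forward pass. It is not the authors' implementation: the network sizes, the tanh activations, the random weights, and every name (`mlp`, `forward`, `step`, `STATE_DIM`, `HIDDEN`) are assumptions made for illustration, and no learning rule is shown.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(n_in, n_hidden):
    """Random weights for a one-hidden-layer network with a scalar output."""
    return rng.normal(0.0, 0.1, (n_hidden, n_in)), rng.normal(0.0, 0.1, n_hidden)

def forward(weights, x):
    """tanh hidden layer followed by a tanh scalar output."""
    w_hidden, w_out = weights
    return float(np.tanh(w_out @ np.tanh(w_hidden @ x)))

# Hypothetical sizes, chosen only for this example.
STATE_DIM, HIDDEN = 4, 6

action_net = mlp(STATE_DIM, HIDDEN)          # state -> control action u
reference_net = mlp(STATE_DIM + 1, HIDDEN)   # (state, u) -> internal reinforcement s
critic_net = mlp(STATE_DIM + 2, HIDDEN)      # (state, u, s) -> value estimate J

def step(state):
    """One forward pass: the critic sees the internal signal s, not only the raw reward."""
    u = forward(action_net, state)
    s = forward(reference_net, np.append(state, u))
    j = forward(critic_net, np.append(state, [u, s]))
    return u, s, j

u, s, j = step(np.array([0.1, -0.2, 0.05, 0.0]))
print(f"action={u:.4f} internal_reinforcement={s:.4f} value_estimate={j:.4f}")
```

Under the same assumptions, a hierarchical variant would stack several reference networks, each feeding its internal signal to the next level before the critic.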

  • ABSTRACT: Adaptive dynamic programming (ADP) is an effective learning method, while fuzzy controllers are used in many applications because they are simple and do not require an accurate mathematical model. The combination of ADP and fuzzy control has been studied extensively. In earlier work, we used ADP to learn the fuzzy rules of a monotonic controller, which showed good performance. In this paper, a hyperbolic fuzzy model is adopted as an improvement, so that both the membership functions and the fuzzy rules are learned. With the ADP algorithm, the fuzzy controller gains the capacity to learn and adapt. Simulations on a single cart-pole plant and a rotational inverted pendulum are carried out to observe the performance, including under uncertainties and disturbances.
    Intelligent Control and Automation (WCICA), 2012 10th World Congress on; 01/2012
  • ABSTRACT: Reinforcement learning (RL) is a powerful paradigm for sequential decision-making under uncertainty, and most RL algorithms aim to maximize a numerical value that represents only one long-term objective. However, many real-world decision and control systems exhibit multiple long-term objectives, so there has recently been growing interest in solving multiobjective reinforcement learning (MORL) problems, in which there are multiple conflicting objectives. The aim of this paper is to present a comprehensive overview of MORL. The basic architecture, research topics, and naïve solutions of MORL are introduced first. Then, several representative MORL approaches and some important directions of recent research are comprehensively reviewed. The relationships between MORL and other related research areas, including multiobjective optimization, hierarchical RL, and multiagent RL, are also discussed. Moreover, research challenges and open problems of MORL techniques are suggested.
    03/2015; 45(3):385-398. DOI:10.1109/TSMC.2014.2358639
  • ABSTRACT: We are interested in developing a multi-goal generator that provides detailed goal representations to help improve the performance of the adaptive critic design (ACD). In this paper we propose a hierarchical structure of goal generator networks that cascades the external reinforcement into more informative internal goal representations in the ACD. This is in contrast with previous designs, in which the external reward signal is assigned to the critic network directly. The performance of the ACD control system is evaluated on the ball-and-beam balancing benchmark under noise-free and various noisy conditions. Simulation results in the form of a comparative study demonstrate the effectiveness of our approach.
    Neural Networks (IJCNN), The 2012 International Joint Conference on; 01/2012