A hierarchical learning architecture with multiple-goal representations based on adaptive dynamic programming
ABSTRACT In this paper we propose a hierarchical learning architecture with multiple-goal representations based on adaptive dynamic programming (ADP). The key idea of this architecture is to integrate a reference network that provides an internal reinforcement representation (a secondary reinforcement signal) to interact with the operation of the learning system. Such a reference network plays an important role in building the internal goal representations. Furthermore, motivated by recent research in neurobiology and psychology, the proposed ADP architecture can be designed in a hierarchical way, in which different levels of internal reinforcement signals can be developed to represent multi-level goals for the intelligent system. The detailed system-level architecture, the learning and adaptation principle, and simulation results are presented to demonstrate the effectiveness of the proposed approach.
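The signal flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `LinearNet` and `GoalHierarchy`, the use of linear approximators in place of neural networks, the discount factor, and the TD(0)-style training target are all assumptions made for illustration. The sketch only shows how an external reinforcement signal might be cascaded through reference networks into internal reinforcement signals, with the last network acting as the critic.

```python
import random


class LinearNet:
    """Tiny linear approximator, output = w . x (hypothetical stand-in for a neural network)."""

    def __init__(self, n_inputs, lr=0.05, seed=0):
        rng = random.Random(seed)
        self.w = [rng.uniform(-0.1, 0.1) for _ in range(n_inputs)]
        self.lr = lr

    def predict(self, x):
        return sum(wi * xi for wi, xi in zip(self.w, x))

    def update(self, x, error):
        # Gradient step on 0.5 * error**2, where error = predict(x) - target.
        for i, xi in enumerate(x):
            self.w[i] -= self.lr * error * xi


class GoalHierarchy:
    """Cascade: external reinforcement -> reference nets -> critic (assumed layout)."""

    def __init__(self, n_state, n_levels=2, gamma=0.95):
        # Each net sees [state..., action, signal from the level below].
        # The first n_levels nets are reference networks; the last one is the critic.
        self.nets = [LinearNet(n_state + 2, seed=k) for k in range(n_levels + 1)]
        self.gamma = gamma

    def forward(self, state, action, r_ext):
        """Return [r_ext, s_1, ..., s_N, J]: each level's signal feeds the next."""
        signals = [r_ext]
        for net in self.nets:
            x = list(state) + [action, signals[-1]]
            signals.append(net.predict(x))
        return signals

    def train_step(self, state, action, r_ext, next_state, next_action, r_next):
        """One update per level, using an assumed TD(0)-style target:
        level k+1 output ~ level k signal + gamma * next level k+1 output."""
        cur = self.forward(state, action, r_ext)
        nxt = self.forward(next_state, next_action, r_next)
        for k, net in enumerate(self.nets):
            x = list(state) + [action, cur[k]]
            target = cur[k] + self.gamma * nxt[k + 1]
            net.update(x, cur[k + 1] - target)
```

In this sketch the critic never sees the external reinforcement directly; it is trained only against the top-level internal signal, which is the property the abstract emphasizes. An action network, which a full ADP design would also include, is omitted here.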
ABSTRACT: We are interested in developing a multi-goal generator that provides detailed goal representations to help improve the performance of the adaptive critic design (ACD). In this paper we propose a hierarchical structure of goal generator networks that cascades the external reinforcement into more informative internal goal representations in the ACD. This is in contrast with previous designs, in which the external reward signal is assigned to the critic network directly. The performance of the ACD control system is evaluated on the ball-and-beam balancing benchmark under noise-free and various noisy conditions. Simulation results in the form of a comparative study demonstrate the effectiveness of our approach.
Neural Networks (IJCNN), The 2012 International Joint Conference on; 01/2012
Conference Paper: Neural and fuzzy dynamic programming for under-actuated systems
ABSTRACT: This paper integrates fuzzy control with the adaptive dynamic programming (ADP) scheme to provide optimized fuzzy control performance, together with faster ADP convergence aided by the fuzzy prior knowledge. ADP usually consists of two neural networks: the Actor, which serves as the controller, and the Critic, which serves as the performance evaluator. A fuzzy controller, as applied in many fields, can be used instead as the Actor to speed up learning convergence, because of its simplicity and its prior information on fuzzy membership and rules. The parameters of the fuzzy rules are learned by the ADP scheme to approach optimal control performance. The features of the fuzzy controller make the system steady and robust to system states and uncertainties. Simulations are implemented on two under-actuated systems, a cart-pole plant and a pendubot plant. It is verified that the proposed scheme is capable of balancing under-actuated systems and has a wider control zone.
Neural Networks (IJCNN), The 2012 International Joint Conference on; 01/2012
ABSTRACT: Learning problems over high-dimensional data are common in real-world applications. In this study, a challenging, large and lifelike database, the German traffic sign benchmark data, containing 43 classes and 51840 images, is used to demonstrate the strength of our proposed boosted support vector machine with deep learning architecture. Recognition of traffic signs is difficult: it involves multiple categories, contains subsets of classes that may appear very similar to each other, and tends to have large within-class variations in visual appearance due to illumination changes, partial occlusions, rotations and weather conditions. By combining a low variance error boosting algorithm, a low bias error support vector machine and a deep learning architecture, an efficient and effective boosting support vector machine method is presented. It has been shown to greatly reduce data dimension and build classification models with higher prediction accuracy while utilizing fewer features and training instances. In evaluation, the proposed method outperforms Adaboost.M1, cw-Boost, and support vector machine, and it achieves ultra-fast processing time (0.0038 per prediction) and high accuracy (93.5 %) on prediction of separate test data while utilizing less than 35 % of the training instances. Moreover, the method is applicable to a standard standalone PC without requiring supercomputers with enormous memory.
Applied Intelligence 10/2013; 39(3):465-474. · 1.85 Impact Factor