Article

Reinforcement Learning in Continuous Spaces by using Learning Fuzzy Classifier Systems


Abstract

Despite their proven effectiveness, many Michigan learning classifier systems (LCSs) cannot perform multistep reinforcement learning in continuous spaces. To meet this technical challenge, some LCSs have been designed to learn fuzzy logic rules. They can be largely classified into strength-based and accuracy-based systems; the latter has gained increasing research attention over the last decade. However, existing accuracy-based learning systems either address primarily single-step learning problems or require the action space to be discrete. In this paper, a new accuracy-based learning fuzzy classifier system (LFCS) is developed to explicitly handle continuous state input and continuous action output during multistep reinforcement learning. Several technical improvements have been achieved while developing the new learning algorithm. In particular, we have successfully extended Q-learning-like credit assignment methods to continuous spaces. To enable direct learning of stochastic strategies for action selection, we have also proposed a new fuzzy logic system with stochastic action outputs. Moreover, fine-grained learning of fuzzy rules has been achieved effectively in our algorithm by using a natural gradient learning method. This is the first time that these techniques have been utilized substantially in any accuracy-based LFCS. Meanwhile, in comparison with several recently proposed learning algorithms, our algorithm is shown to perform highly competitively on four benchmark learning problems and a robotics problem. The practical usefulness of our algorithm is also demonstrated by improving the performance of a wireless body area network.
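The abstract's central idea of a fuzzy logic system with stochastic action outputs can be sketched roughly as follows. This is an illustrative toy, not the paper's algorithm: the rule parameters, Gaussian memberships, and mixing scheme are all assumptions made for the sake of a runnable example.

```python
import math, random

# Hypothetical two-rule fuzzy system with stochastic (Gaussian) action
# outputs: each rule has a fuzzy condition over the state and a Gaussian
# action distribution; matched rules are mixed by firing strength.
RULES = [
    # (c, w) define the fuzzy condition; (mu, sigma) the action output
    {"c": 0.2, "w": 0.3, "mu": -1.0, "sigma": 0.2},
    {"c": 0.8, "w": 0.3, "mu": 1.0, "sigma": 0.2},
]

def membership(rule, x):
    """Gaussian membership of state x in the rule's fuzzy condition."""
    return math.exp(-((x - rule["c"]) / rule["w"]) ** 2)

def sample_action(x, rng=random):
    """Blend the rules' Gaussian action outputs by normalized firing
    strength, then sample a continuous action from the blend."""
    weights = [membership(r, x) for r in RULES]
    total = sum(weights)
    mu = sum(w * r["mu"] for w, r in zip(weights, RULES)) / total
    sigma = sum(w * r["sigma"] for w, r in zip(weights, RULES)) / total
    return rng.gauss(mu, sigma)
```

For states near 0.2 the first rule dominates and sampled actions cluster near -1; near 0.8 the second rule dominates. A natural gradient method, as in the paper, would then tune each rule's mu and sigma from reward feedback.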


... Genetic Programming has been integrated as an RL agent to solve real-world robotic problems with notable success [10]. Many other evolutionary algorithms, such as Learning Classifier Systems, have also been successfully applied to addressing sophisticated RL problems [4,12]. Unlike NEAT-RAC-PGS, these methods only focus on improving learning performance, but their outputs cannot be used to solve other different but similar problems. ...
Conference Paper
To improve the effectiveness of commonly used Policy Gradient Search (PGS) algorithms for Reinforcement Learning (RL), many existing works considered the importance of extracting useful state features from raw environment inputs. However, these works only studied the feature extraction process; the learned features have not been demonstrated to improve reinforcement learning performance. In this paper, we consider NeuroEvolution of Augmenting Topology (NEAT) for automated feature extraction, as it can evolve Neural Networks with suitable topologies that help extract useful features. Following this idea, we develop a new algorithm called NEAT with Regular Actor Critic for Policy Gradient Search, which integrates a popular Actor-Critic PGS algorithm (i.e., Regular Actor-Critic) with NEAT-based feature extraction. The algorithm manages to learn useful state features as well as good policies to tackle complex RL problems. The results on benchmark problems confirm that our proposed algorithm is significantly more effective than NEAT in terms of learning performance, and that the features it learns on one problem maintain their effectiveness when used with RAC on another related learning problem.
Conference Paper
Reinforcement learning aims at solving stochastic sequential decision making problems through direct trial-and-error interactions with the learning environment. In this paper, we will develop generalized compatible features to approximate value functions for reliable Reinforcement Learning. Further guided by an Actor-Critic Reinforcement Learning paradigm, we will also develop a generalized updating rule for policy gradient search in order to constantly improve learning performance. Our new updating rule has been examined on several benchmark learning problems. The experimental results on two problems will be reported specifically in this paper. Our results show that, under suitable generalization of the updating rule, the learning performance and reliability can be noticeably improved.
Conference Paper
Full-text available
Over the past decade, advances in electronics, computer science, and wireless technologies have brought Wireless Body Area Network (WBAN) into many interesting applications. Particularly for healthcare applications, reliability is considered a very important aspect of WBAN. Being the main focus of this paper, we aim at improving reliability by reducing the collision rate and increasing the packet delivery ratio. We also strive to enhance the performance in terms of throughput in the WBAN while maintaining message latency at a reasonable level. In an effort to achieve these goals, we introduce a new cross-layer fuzzy logic based backoff mechanism. Through this method, instead of relying merely on Medium Access Control (MAC), information from physical and application layers will be exploited as well. Moreover, since independent decision making is supported by each sensor without relying on any coordinating devices, communication in the WBAN becomes very flexible. Specifically, rather than determining the Backoff Exponent (BE) in IEEE 802.15.4 through a blind trial-and-error process, the proposed fuzzy logic system determines the BE by considering both the current channel condition and application requirements. This feature gives the proposed system a higher level of adaptability. The simulation results clearly show noticeable improvement in reliability and performance without significantly increasing the message latency.
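A fuzzy choice of backoff exponent of the kind described above can be sketched in a few lines. The membership functions, rule base, and BE mapping below are invented for illustration; the paper's actual cross-layer inputs and rules are not reproduced here.

```python
# Illustrative Mamdani-style sketch: pick the IEEE 802.15.4 Backoff
# Exponent (BE) from channel busyness and application urgency rather
# than by blind trial and error. All rules and constants are assumptions.

def tri(x, a, b, c):
    """Triangular membership function peaking at b, zero outside (a, c)."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def backoff_exponent(channel_busy, urgency):
    """channel_busy and urgency in [0, 1]; returns an integer BE in 0..5."""
    # Firing strengths of three coarse rules.
    low  = min(tri(channel_busy, -0.5, 0.0, 0.5), tri(urgency, 0.5, 1.0, 1.5))
    mid  = tri(channel_busy, 0.0, 0.5, 1.0)
    high = min(tri(channel_busy, 0.5, 1.0, 1.5), tri(urgency, -0.5, 0.0, 0.5))
    total = low + mid + high or 1.0
    # Weighted-average defuzzification onto representative BE values 1, 3, 5.
    be = (low * 1 + mid * 3 + high * 5) / total
    return max(0, min(5, round(be)))
```

An idle channel with an urgent message yields a small BE (short backoff), while a busy channel with low urgency yields a large BE, which is the adaptive behaviour the abstract describes.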
Article
Full-text available
Learning Classifier Systems (LCS) are population-based reinforcement learners that were originally designed to model various cognitive phenomena. This paper presents an explicitly cognitive LCS by using spiking neural networks as classifiers, providing each classifier with a measure of temporal dynamism. We employ a constructivist model of growth of both neurons and synaptic connections, which permits a Genetic Algorithm (GA) to automatically evolve sufficiently-complex neural structures. The spiking classifiers are coupled with a temporally-sensitive reinforcement learning algorithm, which allows the system to perform temporal state decomposition by appropriately rewarding "macro-actions," created by chaining together multiple atomic actions. The combination of temporal reinforcement learning and neural information processing is shown to outperform benchmark neural classifier systems, and successfully solve a robotic navigation task.
Article
Full-text available
Reinforcement learning offers to robotics a framework and set of tools for the design of sophisticated and hard-to-engineer behaviors. Conversely, the challenges of robotic problems provide inspiration, impact, and validation for developments in reinforcement learning. The relationship between the disciplines has sufficient promise to be likened to that between physics and mathematics. In this article, we attempt to strengthen the links between the two research communities by providing a survey of work in reinforcement learning for behavior generation in robots. We highlight both key challenges in robot reinforcement learning and notable successes. We discuss how contributions tamed the complexity of the domain and study the role of algorithms, representations, and prior knowledge in achieving these successes. As a result, a particular focus of our paper lies on the choice between model-based and model-free as well as between value-function-based and policy-search methods. By analyzing a simple problem in some detail we demonstrate how reinforcement learning approaches may be profitably applied, and we note throughout open questions and the tremendous potential for future research.
Article
Full-text available
The issue of finding fuzzy models with interpretability as good as possible without decreasing the accuracy is one of the main research topics on genetic fuzzy systems. When they are used to perform on-line reinforcement learning by means of Michigan-style fuzzy classifier systems, this issue becomes even more difficult. Indeed, rule generalization (description of state-action relationships with rules as compact as possible) has received a great deal of attention in the discrete-valued learning classifier system field (e.g., XCS is the subject of extensive ongoing research). However, the same issue does not appear to have received a similar level of attention in the case of Michigan-style fuzzy classifier systems. This may be due to the difficulty in extending the discrete-valued system operation to the continuous case. The intention of this contribution is to propose an approach to properly develop a fuzzy XCS system for immediate-reward problems.
Article
Full-text available
The behaviour of pulses of Belousov-Zhabotinski (BZ) reaction-diffusion waves can be controlled automatically through machine learning. By extension, a form of chemical network computing, i.e., a massively parallel non-linear computer, can be realised by such an approach. In this initial study, a light-sensitive sub-excitable BZ reaction, in which a checkerboard image comprising cells of varying light intensity is projected onto the surface of a thin silica gel impregnated with tris(bipyridyl) ruthenium(II) catalyst and indicator, is used to make the network. As a catalyst BZ solution is swept past the gel, pulses of wave
Article
Full-text available
Reinforcement learning (RL) allows agents to learn how to optimally interact with complex environments. Fueled by recent advances in approximation-based algorithms, RL has obtained impressive successes in robotics, artificial intelligence, control, operations research, etc. However, the scarcity of survey papers about approximate RL makes it difficult for newcomers to grasp this intricate field. With the present overview, we take a step toward alleviating this situation. We review methods for approximate RL, starting from their dynamic programming roots and organizing them into three major classes: approximate value iteration, policy iteration, and policy search. Each class is subdivided into representative categories, highlighting among others offline and online algorithms, policy gradient methods, and simulation-based techniques. We also compare the different categories of methods, and outline possible ways to enhance the reviewed algorithms.
Article
Full-text available
Wireless body sensor networks (BSNs) in healthcare systems operate under conflicting requirements. These are the maintenance of the desired reliability and message latency of data transmissions, while simultaneously maximizing battery lifetime of individual body sensors. In doing so, the characteristics of the entire system, including physical, medium access control (MAC), and application layers have to be considered. The aim of this paper is to develop a new MAC model for BSNs to fulfill all these specific rigorous requirements under realistic medical settings. For that purpose, a novel cross-layer fuzzy-rule scheduling algorithm and energy-aware radio activation policies are introduced. The main idea is to integrate a fuzzy-logic system in each body sensor to deal with multiple cross-layer input variables of diverse nature in an independent manner. By being autonomously aware of their current condition, body sensors are able to demand a "collision-free" time slot, whenever they consider it strictly required (e.g. high system packet delay or low body sensor residual battery lifetime). Similarly, they may refuse to transmit, if there is a bad channel link, thus permitting another body sensor to do so. This results in improving the system overall performance. The proposed MAC model is evaluated by computer simulations in terms of quality of service and energy consumption under specific healthcare scenarios.
Article
Full-text available
In this paper, we suggest a novel reinforcement learning architecture, the Natural Actor-Critic. The actor updates are achieved using stochastic policy gradients employing Amari's natural gradient approach, while the critic obtains both the natural policy gradient and additional parameters of a value function simultaneously by linear regression. We show that actor improvements with natural policy gradients are particularly appealing as these are independent of coordinate frame of the chosen policy representation, and can be estimated more efficiently than regular policy gradients. The critic makes use of a special basis function parameterization motivated by the policy-gradient compatible function approximation. We show that several well-known reinforcement learning methods such as the original Actor-Critic and Bradtke's Linear Quadratic Q-Learning are in fact Natural Actor-Critic algorithms. Empirical evaluations illustrate the effectiveness of our techniques in comparison to previous methods, and also demonstrate their applicability for learning control on an anthropomorphic robot arm.
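The natural actor update described above can be illustrated in one dimension. For a Gaussian policy N(theta, sigma^2) with fixed sigma, the Fisher information of the mean is 1/sigma^2, so the natural gradient is the ordinary score-function gradient scaled by sigma^2. The bandit-style reward function, step sizes, and baseline below are all illustrative assumptions, not the paper's experimental setup.

```python
import random

# Minimal 1-D natural-gradient actor sketch for a Gaussian policy.
# Natural gradient = F^{-1} * score = sigma^2 * (a - theta)/sigma^2 = a - theta.

def reward(a):
    return -(a - 2.0) ** 2  # illustrative reward, maximized at a = 2

def train(steps=3000, alpha=0.05, sigma=0.5, seed=0):
    rng = random.Random(seed)
    theta, baseline = 0.0, 0.0
    for _ in range(steps):
        a = rng.gauss(theta, sigma)          # sample action from the policy
        r = reward(a)
        baseline += 0.01 * (r - baseline)    # running-average reward baseline
        natural_grad = a - theta             # Fisher-preconditioned score
        theta += alpha * (r - baseline) * natural_grad
    return theta
```

The policy mean drifts toward the reward-maximizing action, and the preconditioning makes the step size independent of how the Gaussian mean is parameterized, which is the coordinate-independence property the abstract highlights.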
Conference Paper
Full-text available
This paper shows for the first time that a Learning Classifier System, namely XCSF, can learn to control a realistic arm model with four degrees of freedom in a three-dimensional workspace. XCSF learns a locally linear approximation of the Jacobian of the arm kinematics, that is, it learns linear predictions of hand location changes given joint angle changes, where the predictions are conditioned on current joint angles. To control the arm, the linear mappings are inverted—deriving appropriate motor commands given desired hand movement directions. Due to the locally linear model, the inversely desired joint angle changes can be easily derived, while effectively resolving kinematic redundancies on the fly. Adaptive PD controllers are used to finally translate the desired joint angle changes into appropriate motor commands. This paper shows that XCSF scales to three dimensional workspaces. It reliably learns to control a four degree of freedom arm in a three dimensional work space accurately and effectively while flexibly incorporating additional task constraints.
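The control loop above (invert a locally linear Jacobian model to turn desired hand displacements into joint-angle changes) can be sketched with a toy 2-link planar arm. Here the Jacobian is computed analytically and inverted exactly; in the paper XCSF instead learns locally linear approximations of it, and the arm has four degrees of freedom. Link lengths and angles are illustrative.

```python
import math

# Toy 2-link planar arm: derive joint-angle changes for a desired hand
# displacement by inverting the (analytic) 2x2 kinematic Jacobian.
L1, L2 = 1.0, 1.0  # illustrative link lengths

def hand_position(q1, q2):
    x = L1 * math.cos(q1) + L2 * math.cos(q1 + q2)
    y = L1 * math.sin(q1) + L2 * math.sin(q1 + q2)
    return x, y

def jacobian(q1, q2):
    j11 = -L1 * math.sin(q1) - L2 * math.sin(q1 + q2)
    j12 = -L2 * math.sin(q1 + q2)
    j21 = L1 * math.cos(q1) + L2 * math.cos(q1 + q2)
    j22 = L2 * math.cos(q1 + q2)
    return ((j11, j12), (j21, j22))

def joint_step(q1, q2, dx, dy):
    """Invert the 2x2 Jacobian to map a small hand displacement (dx, dy)
    onto joint-angle changes (dq1, dq2)."""
    (a, b), (c, d) = jacobian(q1, q2)
    det = a * d - b * c  # assumes a non-singular configuration
    dq1 = (d * dx - b * dy) / det
    dq2 = (-c * dx + a * dy) / det
    return dq1, dq2
```

Applying the returned joint step moves the hand approximately along the requested direction, with error shrinking quadratically as the step gets smaller; with redundant arms (as in the paper) the inversion becomes a pseudo-inverse that also resolves the redundancy.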
Conference Paper
Full-text available
We present a class of Learning Classifier Systems that learn fuzzy rule-based models, instead of interval-based or Boolean models. We discuss some motivations to consider Learning Fuzzy Classifier Systems (LFCS) as a promising approach to learn mappings from real-valued input to real-valued output, based on data interpretation implemented by fuzzy sets. We describe some of the approaches explicitly or implicitly referring to this research area, presented in the literature since the beginning of the last decade. We also show how the general LFCS model can be considered as a framework for a wide range of systems, each implementing in a different way the modules composing the basic architecture. We also mention some of the applications of LFCS presented in the literature, which show the potential of this type of system. Finally, we introduce a general methodology to extend reinforcement distribution algorithms usually not designed to learn fuzzy models. This opens new application possibilities.
Conference Paper
Full-text available
Over the recent years, research on Learning Classifier Systems (LCSs) got more and more pronounced and diverse. There have been significant advances of the LCS field on various fronts including system understanding, representations, computational models, and successful applications. In comparison to other machine learning techniques, the advantages of LCSs have become more pronounced: (1) rule-comprehensibility and thus knowledge extraction is straightforward; (2) online learning is possible; (3) local minima are avoided due to the evolutionary learning component; (4) distributed solution representations evolve; or (5) larger problem domains can be handled. After the tenth edition of the International Workshop on LCSs, more than ever before, we are looking towards an exciting future. More diverse and challenging applications, efficiency enhancements, studies of dynamical systems, and applications to cognitive control approaches appear imminent. The aim of this paper is to provide a look back at the LCS field, whereby we place our emphasis on the recent advances. Moreover, we take a glimpse ahead by discussing future challenges and opportunities for successful system applications in various domains.
Conference Paper
Full-text available
Recognizing that many payoff functions are continuous and depend on the input state x, the classifier system architecture XCS is extended so that a classifier's prediction is a linear function of x. On a continuous nonlinear problem, the extended system, XCS-LP, exhibits high performance and low error, as well as dramatically smaller evolved populations compared with XCS. Linear predictions are seen as a new direction in the quest for powerful generalization in classifier systems.
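The linear-prediction idea can be sketched as a single classifier whose payoff prediction is a linear function of x, adapted with a normalized delta rule. The class name, learning rate, and constant input x0 below are illustrative assumptions; XCS-LP additionally evolves the conditions of many such classifiers.

```python
# Sketch of "computed prediction": a classifier predicts payoff as a
# linear function of the input and adapts its weights by a normalized
# Widrow-Hoff (delta-rule) update. Constants are illustrative.

class LinearPredictionClassifier:
    def __init__(self, eta=0.2, x0=1.0):
        self.w = [0.0, 0.0]   # [bias weight, slope weight]
        self.eta = eta        # learning rate
        self.x0 = x0          # constant input augmenting the state

    def predict(self, x):
        return self.w[0] * self.x0 + self.w[1] * x

    def update(self, x, payoff):
        error = payoff - self.predict(x)
        # Normalized least-mean-squares step on the augmented input (x0, x).
        norm = self.x0 ** 2 + x ** 2
        self.w[0] += self.eta * error * self.x0 / norm
        self.w[1] += self.eta * error * x / norm
```

Trained on samples of a payoff that varies linearly with x, a single such classifier converges to the exact line, which is why far fewer classifiers are needed than with constant-prediction XCS.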
Article
Full-text available
Broadly conceived as computational models of cognition and tools for modeling complex adaptive systems, later extended for use in adaptive robotics, and today also applied to effective classification and data-mining–what has happened to learning classifier systems in the last decade? This paper addresses this question by examining the current state of learning classifier system research.
Article
Full-text available
The use of genetic algorithms for designing fuzzy systems provides them with learning and adaptation capabilities and is called genetic fuzzy systems (GFSs). This topic has attracted considerable attention in the Computational Intelligence community in the last few years. This paper gives an overview of the field of GFSs, being organized in the following four parts: (a) a taxonomy proposal focused on the fuzzy system components involved in the genetic learning process; (b) a quick snapshot of the GFSs status paying attention to the pioneer GFSs contributions, showing the GFSs visibility at ISI Web of Science including the most cited papers and pointing out the milestones covered by the books and the special issues in the topic; (c) the current research lines together with a discussion on critical considerations of the recent developments; and (d) some potential future research directions.
Article
Full-text available
A number of representation schemes have been presented for use within Learning Classifier Systems, ranging from binary encodings to neural networks. This paper presents results from an investigation into using discrete and fuzzy dynamical system representations within the XCSF Learning Classifier System. In particular, asynchronous Random Boolean Networks are used to represent the traditional condition-action production system rules in the discrete case and asynchronous Fuzzy Logic Networks in the continuous-valued case. It is shown possible to use self-adaptive, open-ended evolution to design an ensemble of such dynamical systems within XCSF to solve a number of well-known test problems.
Article
Full-text available
An important strength of learning classifier systems (LCSs) lies in the combination of genetic optimization techniques with gradient-based approximation techniques. The chosen approximation technique develops locally optimal approximations, such as accurate classification estimates, Q-value predictions, or linear function approximations. The genetic optimization technique is designed to distribute these local approximations efficiently over the problem space. Together, the two components develop a distributed, locally optimized problem solution in the form of a population of expert rules, often called classifiers. In function approximation problems, the XCSF classifier system develops a problem solution in the form of overlapping, piecewise linear approximations. This paper shows that XCSF performance on function approximation problems additively benefits from: 1) improved representations; 2) improved genetic operators; and 3) improved approximation techniques. Additionally, this paper introduces a novel closest classifier matching mechanism for the efficient compaction of XCS's final problem solution. The resulting compaction mechanism can boil the population size down by 90% on average, while decreasing prediction accuracy only marginally. Performance evaluations show that the additional mechanisms enable XCSF to reliably, accurately, and compactly approximate even seven dimensional functions. Performance comparisons with other, heuristic function approximation techniques show that XCSF yields competitive or even superior noise-robust performance.
Article
Full-text available
The accuracy-based XCS classifier system has been shown to solve typical data mining problems in a machine-learning competitive way. However, successful applications in multistep problems, modeled by a Markov decision process, were restricted to very small problems. Until now, the temporal difference learning technique in XCS was based on deterministic updates. However, since a prediction is actually generated by a set of rules in XCS and Learning Classifier Systems in general, gradient-based update methods are applicable. The extension of XCS to gradient-based update methods results in a classifier system that is more robust and more parameter independent, solving large and difficult maze problems reliably. Additionally, the extension to gradient methods highlights the relation of XCS to other function approximation methods in reinforcement learning.
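Because the system prediction in XCS is a fitness-weighted sum over the rules in the action set, the gradient of that prediction with respect to each rule's own prediction is the rule's fitness share, which is exactly the scaling the gradient-based update introduces. A schematic sketch, with an invented rule representation and an illustrative learning rate:

```python
# Schematic gradient-based prediction update: each rule's update is
# scaled by its fitness share of the system prediction (the gradient
# term). Rule structure and constants are illustrative, not the paper's.

def gradient_update(action_set, target, beta=0.2):
    """action_set: list of dicts with 'prediction' and 'fitness' keys."""
    fitness_sum = sum(cl["fitness"] for cl in action_set)
    system_prediction = sum(
        cl["prediction"] * cl["fitness"] for cl in action_set) / fitness_sum
    error = target - system_prediction
    for cl in action_set:
        gradient = cl["fitness"] / fitness_sum  # d(system pred)/d(rule pred)
        cl["prediction"] += beta * error * gradient
```

High-fitness rules absorb most of the temporal-difference error while unreliable rules are barely perturbed, which is the source of the added robustness reported in the abstract.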
Article
Full-text available
This paper presents a Q-learning method that works in continuous domains. Other characteristics of our approach are the use of an incremental topology preserving map (ITPM) to partition the input space, and the incorporation of bias to initialize the learning process. A unit of the ITPM represents a limited region of the input space and maps it onto the Q-values of M possible discrete actions. The resulting continuous action is an average of the discrete actions of the “winning unit” weighted by their Q-values. Then, TD(λ) updates the Q-values of the discrete actions according to their contribution. Units are created incrementally and their associated Q-values are initialized by means of domain knowledge. Experimental results in robotics domains show the superiority of the proposed continuous-action Q-learning over the standard discrete-action version in terms of both asymptotic performance and speed of learning. The paper also reports a comparison of discounted-reward against average-reward Q-learning in an infinite horizon robotics task.
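The continuous-action readout described above (a Q-weighted average of a unit's discrete actions) can be sketched directly. The non-negativity shift is an assumption added so that weighting by Q-values is well defined when some are negative; the ITPM bookkeeping is omitted.

```python
# Toy sketch of the winning unit's continuous-action readout: emit the
# Q-value-weighted average of the unit's M discrete actions. Values and
# the non-negativity shift are illustrative assumptions.

def continuous_action(discrete_actions, q_values):
    """Average the discrete actions, weighted by their (shifted) Q-values."""
    q_min = min(q_values)
    weights = [q - q_min + 1e-9 for q in q_values]  # make weights >= 0
    total = sum(weights)
    return sum(a * w for a, w in zip(discrete_actions, weights)) / total
```

As learning concentrates Q-value on one discrete action, the emitted action approaches it, while similar Q-values on neighbouring actions produce smooth intermediate outputs.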
Conference Paper
Full-text available
Evolutionary Learning Classifier Systems (LCSs) are rule-based systems that have been used effectively in concept learning. XCS is a prominent LCS that uses genetic algorithms and reinforcement learning techniques. In traditional machine learning, early stopping has been investigated extensively, to the extent that it is now a default mechanism in many systems. There has been a belief that EC methods are more resilient to overfitting. Therefore, this topic is under-investigated in the evolutionary computation literature and has not been investigated in LCS. In this paper, we show that it is necessary to stop evolution in LCS using a stopping criterion other than a maximum number of generations, and that evolution may suffer from overfitting similar to other ML methods.
Article
Full-text available
If complexity is your problem, learning classifier systems (LCSs) may offer a solution. These rule-based, multifaceted, machine learning algorithms originated and have evolved in the cradle of evolutionary biology and artificial intelligence. The LCS concept has inspired a multitude of implementations adapted to manage the different problem domains to which it has been applied (e.g., autonomous robotics, classification, knowledge discovery, and modeling). One field that is taking increasing notice of LCS is epidemiology, where there is a growing demand for powerful tools to facilitate etiological discovery. Unfortunately, implementation optimization is nontrivial, and a cohesive encapsulation of implementation alternatives seems to be lacking. This paper aims to provide an accessible foundation for researchers of different backgrounds interested in selecting or developing their own LCS. Included is a simple yet thorough introduction, a historical review, and a roadmap of algorithmic components, emphasizing differences in alternative LCS implementations.
Conference Paper
Full-text available
We apply XCS with computed prediction (XCSF) to tackle multistep reinforcement learning problems involving continuous inputs. In essence we use XCSF as a method of generalized reinforcement learning. We show that in domains involving continuous inputs and delayed rewards XCSF can evolve compact populations of accurate maximally general classifiers which represent the optimal solution to the target problem. We compare the performance of XCSF with that of tabular Q-learning adapted to the continuous domains considered here. The results we present show that XCSF can converge much faster than tabular techniques while producing more compact solutions. Our results also suggest that when exploration is less effective in some areas of the problem space, XCSF can exploit effective generalizations to extend the evolved knowledge beyond the frequently explored areas. In contrast, in the same situations, the convergence speed of tabular Q-learning worsens.
Conference Paper
Aimed at achieving multi-step reinforcement learning in continuous spaces, many Learning Classifier Systems have been developed recently to learn fuzzy logic rules. Among these systems, accuracy-based Michigan learning fuzzy classifier systems are gaining increasing research attention. However, in order to learn effectively, existing accuracy-based systems often require the action space to be discrete. Without this restriction, only single-step learning may be supported. In this paper, we will develop a new accuracy-based learning fuzzy classifier system that can perform multi-step reinforcement learning in completely continuous domains. To achieve this goal, a special fuzzy logic system will be introduced in this paper where the output action from the system is modelled through a continuous probability distribution. A natural gradient learning technique will be further exploited to fine-tune the action outputs of individual fuzzy rules. The effectiveness of our learning system has been verified on several benchmark problems.
Conference Paper
Learning classifier systems (LCSs) are rule-based machine learning technologies designed to learn optimal decision-making policies in the form of a compact set of maximally general and accurate rules. A study of the literature reveals that most of the existing LCSs focused primarily on learning deterministic policies. However, a desirable policy may often be stochastic, in particular when the environment is partially observable. To fill this gap, based on XCS, which is one of the most successful accuracy-based LCSs, a new Michigan-style LCS called Natural XCS (i.e., NXCS) is proposed in this paper. NXCS enables direct learning of stochastic policies by utilizing a natural gradient learning technology under a policy gradient framework. Its effectiveness is experimentally compared with XCS and one of its variants known as XCSμ in this paper. Our results show that NXCS can achieve competitive performance in both deterministic and stochastic multi-step problems.
Conference Paper
In this article, a novel consensus clustering method (voting-XCSc) based on a learning classifier system is proposed, which aims (1) to automatically determine the clustering number and (2) to achieve consensus results by reducing the influence of randomness. When clustering the data points, the proposed voting-XCSc first employs XCSc to generate a set of clustering results with different clustering numbers, and then adopts a dissociation-based strategy to experimentally determine the clustering number among all the candidates. Finally, a majority-voting-based consensus method is applied to obtain the final clustering results. The proposed voting-XCSc has been evaluated on toy examples as well as two real clustering-related applications, i.e., lung cancer image identification and image segmentation. The results demonstrate that voting-XCSc obtains superior performance compared with XCSc, K-means, and other state-of-the-art methods.
Article
To solve reinforcement learning problems, many learning classifier systems (LCSs) are designed to learn state-action value functions through a compact set of maximally general and accurate rules. Most of these systems focus primarily on learning deterministic policies by using a greedy action selection strategy. However, in practice, it may be more flexible and desirable to learn stochastic policies, which can be considered as direct extensions of their deterministic counterparts. In this paper, we aim to achieve this goal by extending each rule with a new policy parameter. Meanwhile, a new method for adaptive learning of stochastic action selection strategies based on a policy gradient framework has also been introduced. Using this method, we have developed two new learning systems, one based on a regular gradient learning technology and the other based on a new natural gradient learning method. Both learning systems have been evaluated on three different types of reinforcement learning problems. The promising performance of the two systems clearly shows that LCSs provide a suitable platform for efficient and reliable learning of stochastic policies.
Article
This paper describes the fuzzy classifier system and a new payoff distribution scheme that performs true reinforcement learning. The fuzzy classifier system is a crossover between learning classifier systems and fuzzy logic controllers. By the use of fuzzy logic, the fuzzy classifier system allows variables to take continuous values, and thus could be applied to the identification and control of continuous dynamic systems. The fuzzy classifier system adapts the mechanics of learning classifier systems to fuzzy logic to evolve sets of coadapted fuzzy rules. The payoff distribution scheme presented here opens the way for the use of the fuzzy classifier system in control tasks. Additionally, other mechanisms that improve learning speed are presented.
Article
This paper concerns the design of user-oriented, interactive systems that combine knowledge-acquisition techniques with the basic techniques of database searching and word processing. The object of such a system is to create, for each individual user, a database and acquisition system relevant to his changing needs and purposes. Six criteria for an interactive knowledge-acquisition system employing learning are presented. Then a prototype, based on extant systems and using an adaptive algorithm as an inference procedure, is used to explore these criteria in detail.
Article
A mathematical tool to build a fuzzy model of a system where fuzzy implications and reasoning are used is presented. The premise of an implication describes a fuzzy subspace of the inputs, and its consequence is a linear input-output relation. The method of identifying a system from its input-output data is then shown. Two applications of the method to industrial processes are also discussed: a water cleaning process and a converter in a steel-making process.
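The premise/consequence structure described here (a fuzzy premise paired with a linear consequent) is the Takagi-Sugeno style of fuzzy modeling. A minimal inference step might look like the following sketch, where the membership shapes and rule parameters are invented for illustration.

```python
# Takagi-Sugeno style inference: each rule pairs a fuzzy premise over
# the input with a linear consequent, and the model output is the
# firing-strength-weighted average of the consequents.

def tri(x, a, b, c):
    """Triangular membership function rising from a, peaking at b, falling to c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

# Rules: (premise membership over x, linear consequent y = p*x + q);
# both rules below are hypothetical examples.
rules = [
    (lambda x: tri(x, -1.0, 0.0, 1.0), (0.5, 0.0)),   # "x is small"
    (lambda x: tri(x,  0.0, 1.0, 2.0), (2.0, -1.0)),  # "x is large"
]

def ts_output(x):
    num = den = 0.0
    for mu, (p, q) in rules:
        w = mu(x)               # firing strength of this rule's premise
        num += w * (p * x + q)  # weight the linear consequent
        den += w
    return num / den if den else 0.0

print(ts_output(0.5))  # blends the two local linear models
```

The key property is that each rule is only locally linear; the weighted average stitches the local models into a smooth global input-output map, which is what makes the approach usable for identification from data.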
Conference Paper
Traffic control in large cities is a difficult and non-trivial optimization problem. Most of the automated urban traffic control systems are based on deterministic algorithms and have a multi-level architecture; to achieve global optimality, hierarchical control algorithms are generally employed. However, these algorithms are often slow to react to varying conditions, and it has been recognized that incorporating computational intelligence into the lower levels can remove some burdens of algorithm calculation and decision making from higher levels. An alternative approach is to use a fully distributed architecture in which there is effectively only one (low) level of control. Such systems are aimed at improving the responsiveness of the controller and, again, these often incorporate computational intelligence techniques. This paper presents preliminary work into designing an intelligent local controller primarily for distributed traffic control systems. The idea is to use a classifier system with a fuzzy rule representation to determine useful junction control rules within the dynamic environment.
Article
Fuzzy logic controllers (FLCs) constitute knowledge-based systems that include fuzzy rules and fuzzy membership functions to incorporate human knowledge into their knowledge base. The specification of fuzzy rules and fuzzy membership functions is one of the key questions when designing FLCs, and is generally affected by subjective decisions. Some efforts have been made to improve system performance by incorporating learning mechanisms that modify the rules and/or membership functions of the FLC. Genetic algorithms are probabilistic search and optimization procedures based on natural genetics. This paper proposes a way to apply genetic algorithms (with a learning purpose) to FLCs, and presents an application designed to control the synthesis of the walk of a simulated 2-D biped robot.
Article
The synthesis of genetics-based machine learning and fuzzy logic is beginning to show promise as a potent tool in solving complex control problems in multi-variate non-linear systems. In this paper an overview of current research applying the genetic algorithm to fuzzy rule based control is presented. A novel approach to genetics-based machine learning of fuzzy controllers, called a Pittsburgh Fuzzy Classifier System # 1 (P-FCS1), is proposed. P-FCS1 is based on the Pittsburgh model of learning classifier systems and employs variable-length rule-sets and simultaneously evolves fuzzy set membership functions and relations. A new crossover operator which respects the functional linkage between fuzzy rules with overlapping input fuzzy set membership functions is introduced. Experimental results using P-FCS1 are reported and compared with other published results. Application of P-FCS1 to a distributed control problem (dynamic routing in computer networks) is also described and experimental results are presented.
Article
A classifier system is a machine learning system that learns syntactically simple string rules (called classifiers) through a genetic algorithm to guide its performance in an arbitrary environment. In a classifier system, the bucket brigade algorithm is used to solve the problem of credit assignment, which is a critical problem in the field of reinforcement learning. In this paper, we propose a new approach to fuzzy classifier systems and a neuro-fuzzy system referred to as ACSNFIS to implement the proposed fuzzy classifier system. The proposed system is tested on the cart-pole balancing problem and the truck back-driving problem to demonstrate its performance.
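The bucket brigade idea of passing credit backwards along a chain of rule activations can be illustrated with a toy sketch. The classifier chain, bid ratio, and initial strengths below are hypothetical, not taken from ACSNFIS.

```python
# Bucket-brigade credit assignment sketch: each winning classifier pays
# a fraction of its strength (its bid) to the previously active
# classifier, and the final winner collects the environmental reward.
# Repeated episodes let the reward flow backwards along the chain.

BID_RATIO = 0.1   # fraction of strength bid away each activation
strengths = {"c1": 10.0, "c2": 10.0, "c3": 10.0}

def bucket_brigade(chain, reward):
    prev = None
    for name in chain:
        bid = BID_RATIO * strengths[name]
        strengths[name] -= bid       # winner pays its bid...
        if prev is not None:
            strengths[prev] += bid   # ...to the previous winner
        prev = name
    strengths[prev] += reward        # last winner gets the payoff

for _ in range(3):                   # same chain fires in 3 episodes
    bucket_brigade(["c1", "c2", "c3"], reward=5.0)

print({k: round(v, 2) for k, v in strengths.items()})
```

After a few episodes the classifier nearest the reward is strongest and its predecessors have begun to gain, which is exactly the backward credit flow that makes multistep chains learnable.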
Article
We present ELF, a learning fuzzy classifier system (LFCS), and its application to the field of Learning Autonomous Agents. In particular, we show how this kind of reinforcement learning system can be successfully applied to learn both behaviors and their coordination for autonomous agents. We discuss the importance of a knowledge representation approach based on fuzzy sets to reduce the search space without losing the required precision. Moreover, we show how we have applied ELF to learn the distributed coordination among agents which can exchange information with each other. The experimental validation has been done on software agents interacting in a real-time task.
Article
We present four new reinforcement learning algorithms based on actor–critic, natural-gradient and function-approximation ideas, and we provide their convergence proofs. Actor–critic reinforcement learning methods are online approximations to policy iteration in which the value-function parameters are estimated using temporal difference learning and the policy parameters are updated by stochastic gradient descent. Methods based on policy gradients in this way are of special interest because of their compatibility with function-approximation methods, which are needed to handle large or infinite state spaces. The use of temporal difference learning in this way is of special interest because in many applications it dramatically reduces the variance of the gradient estimates. The use of the natural gradient is of interest because it can produce better conditioned parameterizations and has been shown to further reduce variance in some cases. Our results extend prior two-timescale convergence results for actor–critic methods by Konda and Tsitsiklis by using temporal difference learning in the actor and by incorporating natural gradients. Our results extend prior empirical studies of natural actor–critic methods by Peters, Vijayakumar and Schaal by providing the first convergence proofs and the first fully incremental algorithms.
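A stripped-down actor-critic loop may help fix the roles of the two learners described above: the critic estimates a value baseline by a temporal-difference update, and the actor ascends the (plain, not natural) policy gradient scaled by the critic's TD error. The one-state problem, rewards, and step sizes are illustrative assumptions, not the algorithms from the paper.

```python
# Minimal actor-critic on a one-state, two-action problem.
# Critic: TD-style value baseline. Actor: softmax policy updated by
# grad log pi(a) * TD error. Vanilla gradient, for illustration only.
import math
import random

theta = [0.0, 0.0]    # actor: softmax preferences for actions 0 and 1
v = 0.0               # critic: value baseline
ALPHA_A, ALPHA_C = 0.1, 0.1
REWARDS = (0.0, 1.0)  # action 1 is the better arm

def softmax(prefs):
    m = max(prefs)
    e = [math.exp(p - m) for p in prefs]
    z = sum(e)
    return [x / z for x in e]

random.seed(1)
for _ in range(2000):
    probs = softmax(theta)
    a = 0 if random.random() < probs[0] else 1
    r = REWARDS[a]
    delta = r - v                 # TD error (no successor state here)
    v += ALPHA_C * delta          # critic update
    for i in range(2):            # actor update: grad log softmax
        grad = (1.0 if i == a else 0.0) - probs[i]
        theta[i] += ALPHA_A * delta * grad

print(round(softmax(theta)[1], 2))  # probability of the better action
```

Subtracting the learned baseline `v` from the reward is what reduces the variance of the gradient estimate, the property the abstract highlights; a natural-gradient actor would additionally precondition the update by the inverse Fisher information.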
Conference Paper
Anticipatory Classifier Systems (ACS) are classifier systems that learn by using the cognitive mechanism of anticipatory behavioral control, which was introduced in cognitive psychology by Hoffmann [4]. They can learn in deterministic multi-step environments. A stepwise introduction to ACS is given. We start with the basic algorithm and apply it in simple “woods” environments. It will be shown that this algorithm can only learn in a special kind of deterministic multi-step environment. Two extensions are discussed. The first one enables an ACS to learn in any deterministic multi-step environment. The second one allows an ACS to deal with a special kind of non-Markov state.
Conference Paper
This paper describes a fuzzy classifier system using the Pittsburgh model. In this model genetic operations and fitness assignment apply to complete rule-sets, rather than to individual rules, thus overcoming the problem of conflicting individual and collective interests of classifiers. The fuzzy classifier system presented here dynamically adjusts both membership functions and fuzzy relations. A modified crossover operator for particular use in Pittsburgh-style fuzzy classifier systems, with variable length rule-sets, is introduced and evaluated. Experimental results of the new system, which appear encouraging, are presented and discussed.
Article
In this paper, we focus on the coordination issues in a multiagent setting. Two coordination algorithms based on reinforcement learning are presented and theoretically analyzed. Our Fuzzy Subjective Task Structure (FSTS) model is described and extended so that the information essential to the agent coordination is effectively and explicitly modeled and incorporated into a general reinforcement learning structure. When compared with other learning based coordination approaches, we argue that due to the explicit modeling and exploitation of the interdependencies among agents, our approach is more efficient and effective, thus widely applicable.
Article
In this paper we propose GP-COACH, a Genetic Programming-based method for the learning of COmpact and ACcurate fuzzy rule-based classification systems for High-dimensional problems. GP-COACH learns disjunctive normal form rules (generated by means of a context-free grammar) coded as one rule per tree. The population constitutes the rule base, so it is a genetic cooperative-competitive learning approach. GP-COACH uses a token competition mechanism to maintain the diversity of the population and this obliges the rules to compete and cooperate among themselves and allows the obtaining of a compact set of fuzzy rules. The results obtained have been validated by the use of non-parametric statistical tests, showing a good performance in terms of accuracy and interpretability.
Article
One of the difficulties encountered in the application of reinforcement learning methods to real-world problems is their limited ability to cope with large-scale or continuous spaces. In order to solve the curse of the dimensionality problem, resulting from making continuous state or action spaces discrete, a new fuzzy Actor–Critic reinforcement learning network (FACRLN) based on a fuzzy radial basis function (FRBF) neural network is proposed. The architecture of FACRLN is realized by a four-layer FRBF neural network that is used to approximate both the action value function of the Actor and the state value function of the Critic simultaneously. The Actor and the Critic networks share the input, rule and normalized layers of the FRBF network, which can reduce the demands for storage space from the learning system and avoid repeated computations for the outputs of the rule units. Moreover, the FRBF network is able to adjust its structure and parameters in an adaptive way with a novel self-organizing approach according to the complexity of the task and the progress in learning, which ensures an economic size of the network. Experimental studies concerning a cart–pole balancing control illustrate the performance and applicability of the proposed FACRLN.
Article
Advances in wireless communication technologies, such as wearable and implantable biosensors, along with recent developments in the embedded computing area are enabling the design, development, and implementation of body area networks. This class of networks is paving the way for the deployment of innovative healthcare monitoring applications. In the past few years, much of the research in the area of body area networks has focused on issues related to wireless sensor designs, sensor miniaturization, low-power sensor circuitry, signal processing, and communications protocols. In this paper, we present an overview of body area networks, and a discussion of BAN communications types and their related issues. We provide a detailed investigation of sensor devices, physical layer, data link layer, and radio technology aspects of BAN research. We also present a taxonomy of BAN projects that have been introduced/proposed to date. Finally, we highlight some of the design challenges and open issues that still need to be addressed to make BANs truly ubiquitous for a wide range of applications.
Article
In spite of the existence of a large diversity in the literature related to scheduling algorithms in computational grids, only a few deal efficiently with the inherent uncertainty and dynamism of the resources and applications of these systems. Further, the need to meet both users' and providers' QoS requirements, such as tardiness or resource utilization, calls for new adaptive scheduling strategies that consider the current and future status of the grid. Fuzzy Rule-Based Systems (FRBSs) are knowledge-based systems that are emerging as an alternative for the development of grid scheduling middleware. Their main strength resides in their adaptability to changes in the environment and their ability to model vagueness. However, since their performance strongly depends on the quality of their acquired knowledge, new automatic learning strategies are pursued. In this work, an FRBS meta-scheduler for scheduling jobs in computational grids is suggested which incorporates a novel knowledge acquisition method based on Swarm Intelligence. Simulation results show that the fuzzy meta-scheduler improves on six classical queue-based and schedule-based approaches present in today's production systems and is able to adapt easily to changes in the grid conditions.
Conference Paper
This paper describes work using an artificial, behaving, animal model (termed an “animat”) to study intelligence at a primitive level. The motivation for our somewhat unusual approach is the view that the essence of intelligence is exhibited by animals surviving in real environments. Therefore, insight into intelligence should be obtainable from simulated animals and environments, even simple ones, provided the simulations suitably reflect the animal’s survival problems. The starting point for the research is an explicit definition of intelligence which guides model construction. In experiments, a particular animat is placed in an environment and evaluated as to its rates of improvement in performance and perceptual generalization. Learning is central, because we wish to provide the animat with adaptive mechanisms which yield rapid and solid improvement but themselves contain minimal a priori information.
Article
Of several responses made to the same situation, those which are accompanied or closely followed by satisfaction to the animal will, other things being equal, be more firmly connected with the situation, so that, when it recurs, they will be more likely to recur; those which are accompanied or closely followed by discomfort to the animal will, other things being equal, have their connections with that situation weakened, so that, when it recurs, they will be less likely to occur. The greater the satisfaction or discomfort, the greater the strengthening or weakening of the bond. (Thorndike, 1911) The idea of learning to make appropriate responses based on reinforcing events has its roots in early psychological theories such as Thorndike's "law of effect" (quoted above). Although several important contributions were made in the 1950s, 1960s and 1970s by illustrious luminaries such as Bellman, Minsky, Klopf and others (Farley and Clark, 1954; Bellman, 1957; Minsky, 1961; Samuel, 1963; Michie and Chambers, 1968; Grossberg, 1975; Klopf, 1982), the last two decades have witnessed perhaps the strongest advances in the mathematical foundations of reinforcement learning, in addition to several impressive demonstrations of the performance of reinforcement learning algorithms in real world tasks. The introductory book by Sutton and Barto, two of the most influential and recognized leaders in the field, is therefore both timely and welcome. The book is divided into three parts. In the first part, the authors introduce and elaborate on the essential characteristics of the reinforcement learning problem, namely, the problem of learning "policies" or mappings from environmental states to actions so as to maximize the amount of "reward"
Conference Paper
Most research in the field of learning classifier systems today concentrates on the accuracy-based XCS. This paper presents initial results from an extension of XCS that operates in continuous environments on a physical robot. This is compared with a similar extension based upon the simpler ZCS. The new system is shown to be capable of near-optimal performance in a simple robotic task. To the best of our knowledge, this is the first application of an accuracy-based LCS to controlling a physical agent in the real world without a priori discretization.