Conference Paper

Use of Genetic Programming for the Search of a New Learning Rule for Neural Networks

Abstract and Figures

In previous work we explained how to use standard optimization methods such as simulated annealing, gradient descent and genetic algorithms to optimize a parametric function that could serve as a learning rule for neural networks. To use these methods, we had to choose a fixed number of parameters and a rigid form for the learning rule. In this article, we propose to use genetic programming to find not only the values of the rule parameters but also the optimal number of parameters and the form of the rule. Experiments on classification tasks suggest that genetic programming finds better learning rules than the other optimization methods. Furthermore, the best rule found with genetic programming outperformed the well-known backpropagation algorithm on the given set of tasks.
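The idea in the abstract can be illustrated with a minimal genetic-programming sketch: candidate learning rules are expression trees over local signals, and each rule's fitness is the accuracy of a perceptron trained with it. The primitive set, toy task, and hyperparameters below are illustrative assumptions, not the paper's actual setup.

```python
import random

# Candidate rules are trees over x (input), y (output), t (target), w (weight)
# plus constants; dw = rule(x, y, t, w) is applied with a small learning rate.
OPS = {'+': lambda a, b: a + b, '-': lambda a, b: a - b, '*': lambda a, b: a * b}
TERMS = ['x', 'y', 't', 'w', 1.0, 0.1]

def random_tree(depth=2):
    if depth == 0 or random.random() < 0.3:
        return random.choice(TERMS)
    return (random.choice(list(OPS)), random_tree(depth - 1), random_tree(depth - 1))

def evaluate(tree, env):
    if isinstance(tree, tuple):
        op, left, right = tree
        return OPS[op](evaluate(left, env), evaluate(right, env))
    return env.get(tree, tree)  # terminal: named signal or numeric constant

def fitness(rule):
    rng = random.Random(0)  # fixed data so every rule faces the same task
    data = []
    for _ in range(40):
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        data.append((x, 1.0 if x[0] > 0 else 0.0))
    w = [0.0, 0.0]
    for _ in range(5):  # train on the first 30 examples
        for x, t in data[:30]:
            y = 1.0 if w[0] * x[0] + w[1] * x[1] > 0 else 0.0
            for i in range(2):
                dw = evaluate(rule, {'x': x[i], 'y': y, 't': t, 'w': w[i]})
                w[i] = max(-1e6, min(1e6, w[i] + 0.1 * dw))  # clamp against blow-up
    correct = sum((1.0 if w[0] * x[0] + w[1] * x[1] > 0 else 0.0) == t
                  for x, t in data[30:])
    return correct / 10

random.seed(42)
population = [random_tree() for _ in range(50)]
for _ in range(10):  # crude evolution: truncation selection plus fresh random trees
    population.sort(key=fitness, reverse=True)
    population = population[:10] + [random_tree() for _ in range(40)]
best = max(population, key=fitness)
print('best rule:', best, 'accuracy:', fitness(best))
```

Note that the classic delta rule, `('*', ('-', 't', 'y'), 'x')`, lives inside this search space, which is what lets GP rediscover known rules as well as new ones.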
[Figure: schematic of a neuromodulated synapse (presynaptic and postsynaptic neurons, a facilitatory neuron acting through a neuromodulatory synapse, and a chemical modulator) alongside a network diagram with inputs, outputs, and an error signal.]
... Substantial and varied work on the evolution of learning has taken place since the early 1990s (e.g., [3]- [6], [19], [20]) and continues to this day (e.g., [2], [10], [21]), with work ranging from evolving learning rules for single neurons by encoding the parameters of a template formula [3] to evolving the structure of the rules themselves [6] to evolving the topology, components, and hyperparameters of deep ANNs [21]; covering supervised [3] and reinforcement learning [10], [22]; and showing that known rules could be rediscovered [3] as well as that new rules could be discovered [19]. ...
... Their forms hinge on manual discovery and expert crafting. A symbolic optimizer has little memory overhead, and can be easily interpreted or transferred like an equation or a computer program (Bello et al., 2017; Runarsson & Jonsson, 2000; Orchard & Wang, 2016; Bengio et al., 1994). A handful of L2O methods tried to search for good symbolic rules from scratch, using evolutionary or reinforcement learning (Bello et al., 2017; Real et al., 2020; Runarsson & Jonsson, 2000; Orchard & Wang, 2016; Bengio et al., 1994). Unfortunately, those direct search methods quickly become inefficient when the search space of symbols becomes large. ...
Preprint
Recent studies on Learning to Optimize (L2O) suggest a promising path to automating and accelerating the optimization procedure for complicated tasks. Existing L2O models parameterize optimization rules by neural networks, and learn those numerical rules via meta-training. However, they face two common pitfalls: (1) scalability: the numerical rules represented by neural networks create extra memory overhead for applying L2O models, and limit their applicability to optimizing larger tasks; (2) interpretability: it is unclear what an L2O model has learned in its black-box optimization rule, nor is it straightforward to compare different L2O models in an explainable way. To avoid both pitfalls, this paper provides a proof of concept that we can "kill two birds with one stone" by introducing the powerful tool of symbolic regression to L2O. We establish a holistic symbolic representation and analysis framework for L2O, which yields a series of insights for learnable optimizers. Leveraging these findings, we further propose a lightweight L2O model that can be meta-trained on large-scale problems and outperforms human-designed and tuned optimizers. Our work supplies a brand-new perspective on L2O research. Code is available at: https://github.com/VITA-Group/Symbolic-Learning-To-Optimize.
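The symbolic side of this idea can be sketched very compactly: represent optimizer update rules as small symbolic expressions over operands such as the gradient g and its running average m, then select the rule that minimizes a test problem fastest. The operand/operator set and the quadratic test problem below are illustrative stand-ins, not the paper's actual search space.

```python
def run_rule(update, steps=50, lr=0.1):
    """Minimize f(x) = x^2 from x = 5 with the given symbolic update rule."""
    x, m = 5.0, 0.0
    for _ in range(steps):
        g = 2 * x                      # gradient of x^2
        m = 0.9 * m + 0.1 * g          # running average of the gradient
        x -= lr * update(g, m)
    return x * x                       # final loss

sign = lambda v: (v > 0) - (v < 0)
RULES = {                              # a tiny hand-enumerated symbolic space
    'g':         lambda g, m: g,
    'm':         lambda g, m: m,
    'sign(g)':   lambda g, m: sign(g),
    'g+m':       lambda g, m: g + m,
    'g*sign(m)': lambda g, m: g * sign(m),
}
ranked = sorted(RULES, key=lambda name: run_rule(RULES[name]))
print('best symbolic rule:', ranked[0])
```

Unlike a neural-network optimizer, the winning rule here is a one-line equation with zero memory overhead, which is exactly the interpretability/scalability argument the abstract makes.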
... Each candidate solution encodes a trained network, and its quality is determined by its performance on a specific task. There are many approaches to the evolutionary learning of ANNs; they can be divided into three groups: optimisation of the (i) learning algorithm parameters [12,17]; (ii) learning rules [5,18]; or (iii) weights and bias values [21,8,10,3]. ...
... There are several approaches where GP is used to evolve learning rules of ANNs (e.g. [5,18]). GP is well-known for its results in symbolic regression tasks. ...
... Furthermore, they have the potential to discover domain-specific solutions that are more efficient than general-purpose algorithms. Early experiments focusing on learning in artificial neural networks (ANNs) made use of gradient descent or genetic algorithms to optimize parameterized learning rules (Bengio et al., 1990; Bengio et al., 1992; Bengio et al., 1993) or genetic programming to evolve less constrained learning rules (Bengio et al., 1994; Radi and Poli, 2003), rediscovering mechanisms resembling the backpropagation of errors (Linnainmaa, 1970; Ivakhnenko, 1971; Rumelhart et al., 1985). Recent experiments demonstrate how optimization methods can design optimization algorithms for recurrent ANNs (Andrychowicz et al., 2016), evolve machine learning algorithms from scratch (Real et al., 2020), and optimize parametrized learning rules in neuronal networks to achieve a desired function (Confavreux et al., 2020). ...
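The "parameterized learning rule" line of work this excerpt contrasts with GP can be sketched as follows: the rule has a fixed functional form and only its coefficients are optimized. The rule form dw_i = lr * (a*x_i*err + b*x_i*y + c*err + d), the toy task, and the use of random search are illustrative assumptions, not the cited papers' exact setup.

```python
import random

def accuracy(theta):
    a, b, c, d = theta
    rng = random.Random(0)  # fixed toy task: label = 1 iff x0 + x1 > 0
    data = []
    for _ in range(40):
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        data.append((x, 1.0 if x[0] + x[1] > 0 else 0.0))
    w = [0.0, 0.0]
    for _ in range(5):  # train a perceptron with the parametric rule
        for x, t in data[:30]:
            y = 1.0 if w[0] * x[0] + w[1] * x[1] > 0 else 0.0
            err = t - y
            for i in range(2):
                w[i] += 0.1 * (a * x[i] * err + b * x[i] * y + c * err + d)
    return sum((1.0 if w[0] * x[0] + w[1] * x[1] > 0 else 0.0) == t
               for x, t in data[30:]) / 10  # accuracy on held-out examples

meta = random.Random(1)
best_theta, best_acc = None, -1.0
for _ in range(200):  # random search over the four coefficients only
    theta = [meta.uniform(-1, 1) for _ in range(4)]
    acc = accuracy(theta)
    if acc > best_acc:
        best_theta, best_acc = theta, acc
print('best coefficients:', best_theta, 'accuracy:', best_acc)
```

The contrast with GP is that here the *form* of the rule is fixed in advance; only the four numbers (a, b, c, d) are searched, which is exactly the restriction the 1994 GP work removed.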
Article
Continuous adaptation allows survival in an ever-changing world. Adjustments in the synaptic coupling strength between neurons are essential for this capability, setting us apart from simpler, hard-wired organisms. How these changes can be mathematically described at the phenomenological level, as so-called ‘plasticity rules’, is essential both for understanding biological information processing and for developing cognitively performant artificial systems. We suggest an automated approach for discovering biophysically plausible plasticity rules based on the definition of task families, associated performance measures and biophysical constraints. By evolving compact symbolic expressions, we ensure the discovered plasticity rules are amenable to intuitive understanding, fundamental for successful communication and human-guided generalization. We successfully apply our approach to typical learning scenarios and discover previously unknown mechanisms for learning efficiently from rewards, recover efficient gradient-descent methods for learning from target signals, and uncover various functionally equivalent STDP-like rules with tuned homeostatic mechanisms.
... Learning to learn is an established idea in supervised learning, including meta-learning with genetic programming (Schmidhuber, 1987; Holland, 1975; Koza, 1993), learning a neural network update rule (Bengio et al., 1991), and self-modifying RNNs (Schmidhuber, 1993). Genetic programming has been used to find new loss functions (Bengio et al., 1994; Trujillo & Olague, 2006). More recently, AutoML (Hutter et al., 2018) aims to automate the machine learning training process. ...
Preprint
We propose a method for meta-learning reinforcement learning algorithms by searching over the space of computational graphs which compute the loss function for a value-based model-free RL agent to optimize. The learned algorithms are domain-agnostic and can generalize to new environments not seen during training. Our method can both learn from scratch and bootstrap off known existing algorithms, like DQN, enabling interpretable modifications which improve performance. Learning from scratch on simple classical control and gridworld tasks, our method rediscovers the temporal-difference (TD) algorithm. Bootstrapped from DQN, we highlight two learned algorithms which obtain good generalization performance over other classical control tasks, gridworld type tasks, and Atari games. The analysis of the learned algorithm behavior shows resemblance to recently proposed RL algorithms that address overestimation in value-based methods.
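The "rediscovering TD" result in this abstract can be illustrated with a toy search: candidate update targets for a tabular value function are scored on a small chain MDP, and the TD target wins. The candidate set below is hand-picked for brevity; the paper actually searches over full computational graphs.

```python
GAMMA = 0.9
TRANSITIONS = [(0, 1, 0.0), (1, 2, 1.0)]   # (state, next_state, reward); state 2 is terminal
TRUE_V = {0: GAMMA, 1: 1.0, 2: 0.0}        # exact values for this deterministic chain

TARGETS = {
    'r':      lambda r, v_next: r,                    # myopic: ignores the future
    'td':     lambda r, v_next: r + GAMMA * v_next,   # TD(0) target
    'next_v': lambda r, v_next: v_next,               # ignores the reward
}

def value_error(target_fn, episodes=200, alpha=0.1):
    V = {0: 0.0, 1: 0.0, 2: 0.0}
    for _ in range(episodes):
        for s, s2, r in TRANSITIONS:
            V[s] += alpha * (target_fn(r, V[s2]) - V[s])  # move V[s] toward the target
    return sum((V[s] - TRUE_V[s]) ** 2 for s in V)        # distance to true values

errors = {name: value_error(fn) for name, fn in TARGETS.items()}
best = min(errors, key=errors.get)
print('recovered update:', best, errors)
```

Only the TD target drives the value estimates to the true values, so a search over such candidates selects it automatically, which is the essence of rediscovering temporal-difference learning from scratch.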
... Furthermore, they have the potential to discover domain-specific solutions that are more efficient than general-purpose algorithms. Early experiments focusing on learning in artificial neural networks (ANNs) made use of gradient descent or genetic algorithms to optimize parameterized learning rules (Bengio et al., 1990, 1992, 1993) or genetic programming to evolve less constrained learning rules (Bengio et al., 1994; Radi and Poli, 2003), rediscovering mechanisms resembling the backpropagation of errors (Ivakhnenko, 1971; Rumelhart et al., 1985). Recent experiments demonstrate how optimization methods can design optimization algorithms for recurrent ANNs (Andrychowicz et al., 2016). ...
Preprint
Continuous adaptation allows survival in an ever-changing world. Adjustments in the synaptic coupling strength between neurons are essential for this capability, setting us apart from simpler, hard-wired organisms. How these adjustments come about is essential both for understanding biological information processing and for developing cognitively performant artificial systems. We suggest an automated approach for finding biophysically plausible plasticity rules based on the definition of task families, associated performance measures and biophysical constraints. This approach makes the relative weighting of guiding factors explicit, explores large search spaces, encourages diverse sets of hypotheses, and can discover domain-specific solutions. By evolving compact symbolic expressions we ensure the discovered plasticity rules are amenable to intuitive understanding. This is fundamental for successful communication and human-guided generalization, for example to different network architectures or task domains. We demonstrate the flexibility of our approach by discovering efficient plasticity rules in typical learning scenarios.
... More general than choosing between a limited set of learning rules and/or adjusting the parameters of existing learning rules is evolving the structure of the learning-rule update equation itself. This was done as early as 1994 for supervised learning, using genetic programming to evolve trees of operators and operands [29], but we can find no record of learning-rule structure evolution in RL to date. ...
... A related approach is using genetic programming to evolve update equations for neural networks (e.g., Bengio et al. (1994); Runarsson & Jonsson (2000); Orchard & Wang (2016)). Genetic programming, however, is often slow and requires many heuristics to work well. ...
Article
We present an approach to automate the process of discovering optimization methods, with a focus on deep learning architectures. We train a Recurrent Neural Network controller to generate a string in a domain specific language that describes a mathematical update equation based on a list of primitive functions, such as the gradient, running average of the gradient, etc. The controller is trained with Reinforcement Learning to maximize the performance of a model after a few epochs. On CIFAR-10, our method discovers several update rules that are better than many commonly used optimizers, such as Adam, RMSProp, or SGD with and without Momentum on a ConvNet model. We introduce two new optimizers, named PowerSign and AddSign, which we show transfer well and improve training on a variety of different tasks and architectures, including ImageNet classification and Google's neural machine translation system.
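The two optimizers named in this abstract have simple closed forms: both scale the gradient by a factor that depends on whether the gradient g and its running average m agree in sign. Below they are shown in their base published forms (decay schedules omitted), applied to a toy quadratic; the task and hyperparameters are illustrative choices.

```python
import math

def sign(v):
    return (v > 0) - (v < 0)

def optimize(rule, steps=40, lr=0.1, beta=0.9):
    x, m = 5.0, 0.0                    # minimize f(x) = x^2 starting from x = 5
    for _ in range(steps):
        g = 2 * x                      # gradient
        m = beta * m + (1 - beta) * g  # running average of the gradient
        x -= lr * rule(g, m)
    return x * x                       # final loss

RULES = {
    # amplify the step by e when g and m agree, damp by 1/e when they disagree
    'PowerSign': lambda g, m: math.e ** (sign(g) * sign(m)) * g,
    # double the step on agreement, zero it on disagreement
    'AddSign':   lambda g, m: (1 + sign(g) * sign(m)) * g,
    'SGD':       lambda g, m: g,
}
results = {name: optimize(rule) for name, rule in RULES.items()}
print(results)
```

On this well-conditioned problem the sign-agreement factor acts like a cheap momentum signal, so both discovered rules converge faster than plain SGD.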
... An interesting idea would be to tune not only the parameters but also the actual optimization algorithm. This has already been studied to some extent for evolving specialized optimizers by Bengio et al. [138] and Radi and Poli [139], and for evolving PSO variants by Poli et al. [140]. They used Genetic Programming (GP), which employs basic evolutionary concepts to construct computer programs (see chapter 1). ...
Thesis
This thesis is about the tuning and simplification of black-box (direct-search, derivative-free) optimization methods, which by definition do not use gradient information to guide their search for an optimum but merely need a fitness (cost, error, objective) measure for each candidate solution to the optimization problem. Such optimization methods often have parameters that influence their behaviour and efficacy. A meta-optimization technique is presented here for tuning the behavioural parameters of an optimization method by employing an additional layer of optimization. This is used in a number of experiments on two popular optimization methods, Differential Evolution and Particle Swarm Optimization, and unveils the true performance capabilities of an optimizer in different usage scenarios. It is found that state-of-the-art optimizer variants, with their supposedly adaptive behavioural parameters, do not have a general and consistent performance advantage but are outperformed in several cases by simplified optimizers whose behavioural parameters are properly tuned.
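The meta-optimization idea in this abstract amounts to two nested loops: an outer optimizer tunes the behavioural parameters of an inner optimizer, using the inner optimizer's final fitness as the meta-level objective. The sketch below tunes the F and CR parameters of a minimal Differential Evolution on the sphere function; the benchmark, budgets, and the choice of random search as the outer method are illustrative, not the thesis's actual setup.

```python
import random

def sphere(x):
    return sum(v * v for v in x)

def de(F, CR, dim=5, pop_size=10, gens=30, seed=0):
    """Minimal DE/rand/1 with binomial-style crossover; returns best fitness."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(gens):
        for i in range(pop_size):
            a, b, c = rng.sample([p for j, p in enumerate(pop) if j != i], 3)
            trial = [a[k] + F * (b[k] - c[k]) if rng.random() < CR else pop[i][k]
                     for k in range(dim)]
            if sphere(trial) <= sphere(pop[i]):   # greedy one-to-one selection
                pop[i] = trial
    return min(sphere(p) for p in pop)

meta_rng = random.Random(1)
best_params, best_val = None, float('inf')
for _ in range(30):                               # outer layer: tune (F, CR)
    F, CR = meta_rng.uniform(0.1, 1.0), meta_rng.uniform(0.1, 1.0)
    val = de(F, CR)
    if val < best_val:
        best_params, best_val = (F, CR), val
print('tuned (F, CR):', best_params, 'inner fitness:', best_val)
```

In the thesis the outer layer is itself a proper optimizer rather than random search, but the nesting structure is the same.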
Chapter
Genetic learning processes that follow the ceaseless rule learning (CRL) approach tackle the learning problem in several stages. As a result, they consist of at least two stages: a generation process, which builds a preliminary set of fuzzy rules representing the knowledge present in the data set, and a post-processing stage, which refines that rule set in order to discard the redundant rules produced during generation and to select the fuzzy rules that cooperate best. Genetic Fuzzy Rule-Based Systems (GFRBSs) that follow the CRL approach are usually called multi-stage GFRBSs. The multi-stage structure is a direct consequence of the way in which CRL-based GFRBSs resolve the cooperation-competition problem (CCP). Such systems attempt to solve the CCP in a way that combines the advantages of the Pittsburgh and Michigan approaches [14]. The goal of the CRL approach is to scale back the size of the search space by encoding individual rules in a chromosome, as in the Michigan approach, while the evaluation scheme takes the cooperation of rules into account, as in the Pittsburgh approach. The generation process forces competition between fuzzy rules, as in genetic learning processes grounded in the Michigan approach, to obtain a fuzzy rule set composed of the simplest possible fuzzy rules. To do so, a fuzzy-rule-generating method is run several times inside an iterative covering method that wraps it and analyses the covering that the successively learnt rules induce on the training data set. Hence, cooperation among the fuzzy rules generated in the different runs is only briefly addressed, by means of a rule penalty criterion.
The later post-processing stage forces cooperation between the fuzzy rules produced by the generation process, refining or eliminating the redundant or excessive fuzzy rules so as to obtain a final fuzzy rule set with efficient performance.
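The two-stage scheme described here (an iterative covering loop that learns one rule at a time, followed by a post-processing pass that drops redundant rules) can be sketched with crisp interval rules in place of fuzzy ones for brevity. The task, the random-search rule generator, and the pruning criterion are all illustrative simplifications.

```python
import random

def covers(rule, x):
    """A rule is a list of (lo, hi) intervals, one per input dimension."""
    return all(lo <= xi <= hi for (lo, hi), xi in zip(rule, x))

def find_rule(examples, rng, tries=300):
    """Generation stage: search for the interval rule covering the most
    remaining positive examples while covering no negative ones."""
    best, best_score = None, -1
    for _ in range(tries):
        rule = [tuple(sorted((rng.uniform(0, 1), rng.uniform(0, 1)))) for _ in range(2)]
        if any(covers(rule, x) for x, y in examples if y == 0):
            continue  # reject rules that hit a negative example
        score = sum(covers(rule, x) for x, y in examples if y == 1)
        if score > best_score:
            best, best_score = rule, score
    return best, best_score

rng = random.Random(0)
examples = [([rng.random(), rng.random()], 0) for _ in range(30)]
examples = [(x, 1 if x[0] > 0.6 else 0) for x, _ in examples]  # toy concept

rule_set, remaining = [], list(examples)
while any(y == 1 for _, y in remaining):           # iterative covering loop
    rule, score = find_rule(remaining, rng)
    if rule is None or score == 0:
        break
    rule_set.append(rule)
    # "penalize" covered positives by removing them before the next run
    remaining = [(x, y) for x, y in remaining if not (y == 1 and covers(rule, x))]

# post-processing stage: keep only rules that uniquely cover some positive
pruned = [r for i, r in enumerate(rule_set)
          if any(covers(r, x) and not any(covers(q, x)
                                          for j, q in enumerate(rule_set) if j != i)
                 for x, y in examples if y == 1)]
print('generated rules:', len(rule_set), 'after pruning:', len(pruned))
```

The removal of covered positives plays the role of the rule penalty criterion, and the pruning pass plays the role of the cooperative post-processing stage.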