Book · PDF Available

A Field Guide to Genetic Programming

Authors: Riccardo Poli, William B. Langdon, Nicholas F. McPhee
... Some important distributions for queuing modeling can be found in (Koole, 2014). Symbolic regression performed by GP is discussed in the books (Koza, 1998) and (Poli, Langdon, & McPhee, 2008). Other variants of evolutionary computation are covered in (Eiben & Smith, 2015). ...
... Koza shows that various problems, such as symbolic regression, can be solved automatically by evolving computer programs. Because the programs are evolved, GP does not require the form or structure of the solution to be known or specified in advance (Poli et al., 2008). Koza originally expressed programs as syntax trees rather than as lines of code (1998). ...
... In this section, we describe the main components of Tree-based Genetic Programming. Poli et al. (2008) is used as a guide for discussing these components. ...
Research
Full-text available
In queuing theory, closed-form expressions for key performance metrics (such as the waiting time distribution, the number of customers in the system, etc.) are useful as they show how the performance of a queuing system depends on the system parameters. Unfortunately, many queuing systems prohibit the derivation of closed-form expressions. Alternatively, mathematical approximations or simulation approaches are very useful, but they fail to give fundamental insight into the functional relationship between system parameters and the performance measures of a queuing system. This paper proposes a data-driven approach to obtain closed-form expressions for key performance metrics by symbolic regression. The space of mathematical expressions is searched by genetic programming, a variant of evolutionary algorithms. Data sets are created by selecting system parameters for a variety of single-node queuing systems and obtaining the key performance metrics by simulation when these metrics are not derivable. Three different sampling techniques are used for selecting parameters: single random sampling, stratified sampling, and systematic sampling. This research shows that for the M/M/1, M/G/1, and M/M/s queuing systems, genetic programming is able to obtain exact performance metrics. Prior knowledge, such as heavy-traffic behavior, can improve the speed of convergence, for example when this behavior is supplied as an explanatory variable. Furthermore, none of the sampling techniques is shown to improve the speed of convergence. For the M/G/s queue, genetic programming is able to find accurate approximations for some performance metrics when using prior knowledge of the heavy-traffic behavior and the probability of waiting.
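For concreteness, the kind of training data described above can be sketched for the simplest case: the M/M/1 queue, whose mean waiting time in queue is known in closed form as Wq = rho / (mu * (1 - rho)) with rho = lambda / mu. The Python sketch below is illustrative only; the sampling ranges, function names and stability cut-off are assumptions, not taken from the paper, and for systems without a closed form the last step would be replaced by simulation.

```python
import random

def mm1_mean_wait(lam, mu):
    """Closed-form mean waiting time in queue for an M/M/1 system:
    Wq = rho / (mu * (1 - rho)) with rho = lam / mu, valid for rho < 1."""
    rho = lam / mu
    return rho / (mu * (1.0 - rho))

def make_dataset(n_samples=1000, seed=42):
    """Sample stable (lambda, mu) pairs and record the target metric.
    In the paper the target is obtained by simulation when no closed form
    exists; here the M/M/1 formula stands in for that step."""
    rng = random.Random(seed)
    rows = []
    while len(rows) < n_samples:
        lam = rng.uniform(0.1, 10.0)
        mu = rng.uniform(0.1, 10.0)
        if lam / mu < 0.95:              # keep the system stable (rho < 1)
            rows.append((lam, mu, mm1_mean_wait(lam, mu)))
    return rows

if __name__ == "__main__":
    for lam, mu, wq in make_dataset(5):
        print(f"lambda={lam:.2f}  mu={mu:.2f}  Wq={wq:.4f}")
```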
... Table 1: The distribution of gene categories for randomly generated genes and genomes, which is used to create the initial population and supply genes for the mutation operators. Some literal genes are sampled from a discrete set that depends on the problem, while others are created via ephemeral random constant (ERC) generators [20]. This distribution was selected on the basis of human intuition and is potentially sub-optimal. ...
Preprint
Full-text available
General program synthesis has become an important application area for genetic programming (GP), and for artificial intelligence more generally. Code Building Genetic Programming (CBGP) is a recently introduced GP method for general program synthesis that leverages reflection and first-class specifications to support the evolution of programs that may use arbitrary data types, polymorphism, and functions drawn from existing codebases. However, neither a formal description nor a thorough benchmarking of CBGP has yet been reported. In this work, we formalize the method of CBGP using algorithms from type theory. Specifically, we show that a functional programming language and a Hindley-Milner type system can be used to evolve type-safe programs using the process abstractly described in the original CBGP paper. Furthermore, we perform a comprehensive analysis of the search performance of this functional variant of CBGP compared to other contemporary GP program synthesis methods.
... Therefore, maintaining a balance that rewards performance yet encourages diversity in the population is a crucial consideration in the design of methods for selecting parents. For example, tournament selection - a popular selection scheme in EAs [57,58] - randomly selects several individuals from the population and then, tournament-style, chooses the fittest of the sample as the winner that will breed (together with the winner of another tournament, if the genetic operator is crossover). The tournament size is normally much smaller than the population size (typically 2-7 [57], while population sizes are 100+). ...
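A minimal Python sketch of tournament selection as described above (the fitness convention and tie handling are illustrative assumptions, not drawn from the cited works):

```python
import random

def tournament_select(population, fitness, k=3, rng=random):
    """Pick k individuals uniformly at random and return the fittest.

    `population` is a list of individuals and `fitness` maps an individual
    to a score to be maximised; k is the tournament size (typically 2-7)."""
    contestants = [rng.choice(population) for _ in range(k)]
    return max(contestants, key=fitness)

# Example: selecting two parents for crossover from a toy population.
if __name__ == "__main__":
    pop = list(range(100))                  # stand-in individuals
    fit = lambda x: -abs(x - 42)            # fittest individual is 42
    parent_a = tournament_select(pop, fit, k=3)
    parent_b = tournament_select(pop, fit, k=3)
    print(parent_a, parent_b)
```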
Thesis
Full-text available
The quest for simple solutions is not new in machine learning (ML) and related methods such as genetic programming (GP). GP is a nature-inspired approach to the automatic programming of computers used to create solutions to a broad range of computational problems. However, the evolving solutions can grow unnecessarily complex, which presents considerable challenges. Typically, the control of complexity in GP means reducing the sizes of the evolved expressions – known as bloat-control. However, size is a function of solution representation, and hence it does not consistently capture complexity across diverse GP applications. Instead, this thesis proposes to estimate the complexity of the evolving solutions by their evaluation time – the computational time required to evaluate a GP evolved solution on the given task. After all, the evaluation time depends not only on the size of the evolved expressions but also on other aspects such as their composition, thus acting as a more nuanced measure of model complexity than the expression size alone. Also, GP evaluates all the solutions in a population identically to determine their relative performance, for example, with the same dataset. Therefore, evaluation time can consistently compare the relative complexity. To discourage complexity using the proposed evaluation time, two approaches are used. The first approach explicitly penalises models with long evaluation times by customising well-tested techniques that traditionally control the size. The second uses a novel technique that implicitly discourages long evaluation times by incorporating a race condition in the GP process. The proposed methods yield accurate yet simple solutions; furthermore, the implicit method improves the runtime and training speed of GP. Across a diverse suite of GP applications, the evaluation time methods proffer several qualitative advantages over the bloat-control methods. They effectively manage the functional complexity of regression models to enable them to predict unseen data (generalise) better than those produced by bloat-control. In two feature engineering applications, they decrease the number of features – principally responsible for model complexity – while bloat-control does not. In a robot control application, they evolve accurate and efficient routines – efficient routines use fewer time steps to complete their tasks; bloat-control could not detect the efficiency of the programs. In Boolean logic problems where size emerges as the major cause of complexity, these methods are not hindered and perform at least as well as bloat-control. Overall, the proposed system characterises and manages various forms of complexity; also, it is broadly applicable and, hence, suitable for an automatic programming system.
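One way the explicit, penalty-based variant could look is sketched below in Python. It applies parsimony-style pressure using measured evaluation time instead of expression size; the penalty coefficient, the minimisation convention and the function names are assumptions for illustration, not the thesis's exact formulation.

```python
import time

def timed_fitness(individual, error_fn, alpha=0.01):
    """Return a fitness that trades raw error against evaluation time.

    `error_fn(individual)` evaluates the individual on the task (lower is
    better); `alpha` weights the measured evaluation time.  This mirrors
    classic parsimony pressure but penalises time rather than size."""
    start = time.perf_counter()
    error = error_fn(individual)
    elapsed = time.perf_counter() - start
    return error + alpha * elapsed          # penalised fitness (minimise)
```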
... Genetic Programming [30], like many Artificial Intelligence techniques based on search, can be expensive in terms of computer resources. Indeed, in the first GP book [30] Koza describes ways to speed it up [69]. ...
Preprint
Full-text available
We summarise how a 3.0 GHz 16-core AVX512 computer can interpret the equivalent of, on average, up to 1,103,370,000,000 GPop/s. Citations to existing publications are given. Implementation stress is placed on parallel computing, bandwidth limits, and avoiding repeated calculation. Information theory suggests that, in digital computing, failed disruption propagation (FDP) gives huge speed-ups, since FDP and incremental evaluation can be used to reduce fitness evaluation time in phenotypically converged populations. Conversely, FDP may be responsible for evolution stagnation, so the wider Evolutionary Computing, Artificial Life, Unconventional Computing and Software Engineering community may need to avoid deep nesting.
... This structure will be replicated in the construction of the GA applied in this paper, and the concepts already presented will be translated into operations in the programming language chosen for building the tool, namely Python. Python was chosen because of the facilities this language offers: it is free and open-source software and has been gaining considerable ground in the specialized academic community (SHEPPARD, 2016; POLI et al., 2008). ...
Article
Full-text available
The main aim of the research work described in this paper is to investigate the effectiveness of the Genetic Algorithm in solving a Logistics Engineering problem. In addition, the paper presents the underlying theory in some depth in order to be of use to researchers in this area. With these objectives, the paper first clarifies the technical concepts used. Subsequently, the problem to be solved and its modeling are presented, and finally an algorithm is given, focusing on the construction of its logic as well as on the data obtained, which demonstrate the effectiveness of the tool as a way of solving the defined problem. The concepts behind the algorithm used here are derived from recent studies in Artificial Intelligence and are based on biological studies of the theory of evolution and genetics.
Article
Full-text available
To solve constrained optimization problems (COPs), teaching-learning-based optimization (TLBO) is used in this study as the baseline algorithm. Different constraint handling techniques (CHTs) are incorporated into the framework of TLBO. The superiority of feasibility (SF) is one of the most commonly used and most effective CHTs, and it can employ various decisive factors. The most commonly used decision-making factors in SF are the number of constraints violated (NCV) and the weighted mean (WM) value for comparing solutions. In this paper, SF based on the number of constraints violated (NCVSF) and on the weighted mean (WMSF) is incorporated into the framework of TLBO and applied to the CEC-2006 constrained benchmark functions. Using a single factor to decide the winner might not be a good idea. The combined use of the NCV and WM factors in the hybrid superiority of feasibility (HSF) has shown the dominant role of NCV over WM. We employ NCVSF, WMSF, and HSF in the TLBO framework and propose three constrained versions, namely NCVSF-TLBO, WMSF-TLBO, and HSF-TLBO. The performance of the proposed algorithms is evaluated on the CEC-2006 constrained benchmark functions. Among them, HSF-TLBO shows better performance on most of the constrained optimization problems used, in terms of proximity and diversity.
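A hedged sketch of how a superiority-of-feasibility comparison with the NCV and WM factors might be coded is given below; the field names and the exact tie-breaking order in the HSF variant are assumptions, not necessarily those used in the paper.

```python
def better(a, b):
    """Compare two candidate solutions under superiority of feasibility (SF).

    Each solution is a dict with keys:
      'obj' - objective value (minimised),
      'ncv' - number of constraints violated,
      'wm'  - weighted mean constraint violation.
    Returns True if `a` should win.  The HSF-style rule sketched here
    compares infeasible solutions first by NCV and then by WM."""
    a_feas, b_feas = a['ncv'] == 0, b['ncv'] == 0
    if a_feas and b_feas:                  # both feasible: compare objectives
        return a['obj'] < b['obj']
    if a_feas != b_feas:                   # feasible dominates infeasible
        return a_feas
    if a['ncv'] != b['ncv']:               # both infeasible: fewer violations wins
        return a['ncv'] < b['ncv']
    return a['wm'] < b['wm']               # then smaller weighted mean violation
```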
Article
The resource-constrained project scheduling problem (RCPSP) is one of the scheduling problems that belong to the class of NP-hard problems. Therefore, heuristic approaches are usually used to solve it. Among the most commonly used heuristic approaches are priority rules (PRs). PRs are easy to use, fast, and able to respond to system changes, which makes them applicable in a dynamic environment. The disadvantage of PRs is that, when applied in a static environment, they do not achieve results of the same quality as heuristic approaches designed for a static environment. Moreover, a new PR must be evolved separately for each optimization criterion, which is a challenging process. Therefore, significant effort has recently been put into the automatic development of PRs. Although PRs are mainly used in a dynamic environment, they are also used in a static environment in situations where speed and simplicity are more important than the quality of the obtained solution. Since PRs evolved for the dynamic environment do not use all the information available in a static environment, this paper analyzes two adaptations for evolving PRs for the RCPSP: iterative priority rules and a rollout approach. This paper shows that these approaches achieve better results than PRs evolved and used without these adaptations. The results of the approaches presented in the paper are also compared with the results obtained with a genetic algorithm, as a representative of the heuristic approaches used mainly in the static environment.
Chapter
EFSMs provide a way to model systems with internal data variables. In situations where they do not already exist, we need to infer them from system behaviour. A key challenge here is inferring the functions which relate inputs, outputs, and internal variables. Existing approaches either work with white-box traces, which expose variable values, or rely upon the user to provide heuristics to recognise and generalise particular data-usage patterns. This paper presents a preprocessing technique for the inference process which generalises the concrete values from the traces into symbolic functions which calculate output from input, even when this depends on values not present in the original traces. Our results show that our technique leads to more accurate models than are produced by the current state-of-the-art and that somewhat accurate models can still be inferred even when the output of particular transitions depends on values not present in the original traces.
Article
Full-text available
In Part 1 of this report, we outlined a framework for creating an intelligent agent based upon modeling the large-scale functionality of the human brain. Building on those results, we begin Part 2 by specifying the behavioral requirements of a large-scale neurocognitive architecture. The core of our long-term approach remains focused on creating a network of neuromorphic regions that provide the mechanisms needed to meet these requirements. However, for the short term of the next few years, it is likely that optimal results will be obtained by using a hybrid design that also includes symbolic methods from AI/cognitive science and control processes from the field of artificial life. We accordingly propose a three-tiered architecture that integrates these different methods, and describe an ongoing computational study of a prototype "mini-Roboscout" based on this architecture. We also examine the implications of some non-standard computational methods for developing a neurocognitive agent. This examination included computational experiments assessing the effectiveness of genetic programming as a design tool for recurrent neural networks for sequence processing, and experiments measuring the speed-up obtained for adaptive neural networks when they are executed on a graphical processing unit (GPU) rather than a conventional CPU. We conclude that the implementation of a large-scale neurocognitive architecture is feasible, and outline a roadmap for achieving this goal.
Conference Paper
Full-text available
We present a novel algorithm for the one-dimensional offline bin packing problem with discrete item sizes, based on the notion of matching the item-size histogram with the bin-gap histogram. The approach is controlled by a constructive heuristic function which decides how to prioritise items in order to minimise the difference between the histograms. We evolve such a function using a form of linear register-based genetic programming system. We test our evolved heuristics and compare them with hand-designed ones, including the well-known best fit decreasing heuristic. The evolved heuristics are human-competitive, generally being able to outperform high-performance human-designed heuristics.
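For reference, the best fit decreasing baseline mentioned above can be sketched in a few lines of Python (the evolved heuristics themselves are not reproduced here; this is only an illustrative implementation of the classic hand-designed rule):

```python
def best_fit_decreasing(items, capacity):
    """Pack items into bins using the classic best fit decreasing heuristic.

    Items are considered largest first; each item goes into the open bin
    whose remaining gap is smallest but still fits, otherwise a new bin is
    opened.  Returns a list of bins, each a list of item sizes."""
    bins, gaps = [], []
    for item in sorted(items, reverse=True):
        best = None
        for i, gap in enumerate(gaps):
            if item <= gap and (best is None or gap < gaps[best]):
                best = i
        if best is None:                    # no open bin fits: open a new one
            bins.append([item])
            gaps.append(capacity - item)
        else:
            bins[best].append(item)
            gaps[best] -= item
    return bins

# Example: seven items packed into bins of capacity 10.
print(best_fit_decreasing([7, 5, 4, 4, 3, 2, 2], capacity=10))
```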
Conference Paper
Full-text available
Tournament selection performs tournaments by first sampling individuals uniformly at random from the population and then selecting the best of the sample for some genetic operation. This sampling process needs to be repeated many times when creating a new generation. However, even with this repetition, some individuals in the population may happen never to be sampled. These individuals can therefore play no role in future generations. Under conditions of low selection pressure, the fraction of individuals not involved in any way in the selection process may be substantial. In this paper we investigate how we can model this process, and we explore the possibility, methods and consequences of not generating and evaluating those individuals, with the aim of increasing the efficiency of evolutionary algorithms based on tournament selection. In some conditions, considerable savings in terms of fitness evaluations are easily achievable, without altering in any way the expected behaviour of such algorithms.
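The scale of the effect can be estimated with a standard back-of-the-envelope argument, shown below under the usual assumption of uniform sampling with replacement (this is an illustration, not necessarily the exact model developed in the paper):

```latex
% N = population size, k = tournament size, n = tournaments per generation.
% Assumes individuals are drawn uniformly at random with replacement.
\[
  \Pr[\text{a given individual is never sampled}]
  = \left(1 - \frac{1}{N}\right)^{kn}
  \approx e^{-kn/N}.
\]
% With roughly n = N tournaments per generation this is about e^{-k};
% for binary tournaments (k = 2) that is e^{-2}, i.e. roughly 13.5% of
% the population never entering any tournament.
```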
Chapter
A generic, optimal feature extraction method using multi-objective genetic programming (MOGP) is presented. This methodology has been applied to the well-known edge detection problem in image processing and detailed comparisons made with the Canny edge detector. We show that the superior performance from MOGP in terms of minimizing the misclassification is due to its effective optimal feature extraction. Furthermore, to compare different evolutionary approaches, two popular techniques - PCGA and SPGA - have been extended to genetic programming as PCGP and SPGP, and applied to five datasets from the UCI database. Both of these evolutionary approaches provide comparable misclassification errors within the present framework but PCGP produces more compact transformations.
Chapter
Rheological structure-property models play a crucial role in the manufacturing and processing of polymers. Traditionally, rheological models are developed by design of experiments that measure a rheological property as a function of the moments of molar mass distributions. These empirical models lack the capacity to apply to a wide range of distributions due to the limited availability of experimental data. In recent years fundamental models have been developed to cover a wider range of distributions, but they are expressed in terms of variables not readily available during processing or manufacturing. Genetic programming can be used to bridge the gap between the practical, but limited, empirical models and the more general, but less practical, fundamental models. This is a novel approach for generating rheological models that are both practical and valid for a wide set of distributions. Keywords: genetic programming, rheology, molar mass distribution
Conference Paper
Researchers wishing to create computational systems that themselves generate artworks face two interacting challenges. The first is that the standards by which artistic output is judged are notoriously difficult to quantify. The larger AI community is currently involved in a rich internal dialogue on methodological issues, standards, and rigor, and hence murkiness with regard to the assessment of output must be faced squarely. The second challenge is that any artwork exists within an extraordinarily rich cultural and historical context, and it is rare that an artist who is ignorant of this context will produce acceptable works. In this paper we assert that these considerations argue for case-based AI/Art systems that take critical criteria as parameters. We describe an example system that produces new bebop jazz melodies from a case-base of melodies, using genetic programming techniques and a fitness function based on user-provided critical criteria. We discuss the role that such techniques may play in future work on AI and the arts.
Conference Paper
We present an abstraction of the genetic algorithm (GA), termed population-based incremental learning (PBIL), that explicitly maintains the statistics contained in a GA's population, but which abstracts away the crossover operator and redefines the role of the population. This results in PBIL being simpler, both computationally and theoretically, than the GA. Empirical results reported elsewhere show that PBIL is faster and more effective than the GA on a large set of commonly used benchmark problems. Here we present results on a problem custom designed to benefit both from the GA's crossover operator and from its use of a population. The results show that PBIL performs as well as, or better than, GAs carefully tuned to do well on this problem. This suggests that even on problems custom designed for GAs, much of the power of the GA may derive from the statistics maintained implicitly in its population, and not from the population itself nor from the crossover operator.
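A minimal Python sketch of the PBIL probability-vector update on a bit-string problem is given below; the learning rate, sample size and OneMax example are illustrative assumptions, not the settings used in the cited work.

```python
import random

def pbil_step(prob, fitness, n_samples=50, lr=0.1, rng=random):
    """One PBIL iteration on a bit-string problem.

    `prob` is the probability vector (one probability per bit) and
    `fitness` maps a bit list to a score to maximise.  A population is
    sampled from `prob`, and `prob` is nudged towards the best sample -
    there is no explicit crossover, only the statistics kept in `prob`."""
    samples = [[1 if rng.random() < p else 0 for p in prob]
               for _ in range(n_samples)]
    best = max(samples, key=fitness)
    return [p + lr * (b - p) for p, b in zip(prob, best)]

# Example: maximise the number of ones (OneMax) over 20 bits.
if __name__ == "__main__":
    prob = [0.5] * 20
    for _ in range(100):
        prob = pbil_step(prob, fitness=sum)
    print([round(p, 2) for p in prob])
```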
Conference Paper
In real-time rendering, objects are represented using polygons or triangles. Triangles are easy to render, and graphics hardware is highly optimized for rendering triangles. Initially, the shading computations were carried out by dedicated hardwired algorithms for each vertex and then interpolated by the rasterizer. Today's graphics hardware contains vertex and pixel shaders which can be reprogrammed by the user. Vertex and pixel shaders allow almost arbitrary computations per vertex and per pixel, respectively. We have developed a system to evolve such programs. The system runs on a variety of graphics hardware due to the use of NVIDIA's high-level Cg shader language. Fitness of the shaders is determined by user interaction. Both fixed-length and variable-length genomes are supported. The system is highly customizable. Each individual consists of a series of meta commands. The resulting Cg program is translated into the low-level commands which are required for the particular graphics hardware.