# Genetic Programming: On the Programming of Computers by Means of Natural Selection (Complex Adaptive Systems)

... The full method generates individuals in which each branch has a depth equal to the predefined maximum initial depth, while the grow method produces individuals with varied shapes that also respect that maximum. The ramped half-and-half (RHH) method mixes the full and grow mechanisms, and initialises GP trees with depths ranging between the predefined minimum and maximum initial depths [8]. Sensible initialisation is the application of RHH to GE [9]: it starts by identifying which production rules are recursive and, for each production rule, the minimum depth necessary to complete the mapping process starting from the associated non-terminal symbol. ...
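The three initialisation methods described above can be sketched in a few lines of Python. This is a hypothetical minimal illustration, not code from the paper: the function and terminal sets, and the stop probability in the grow method, are our own arbitrary choices.

```python
import random

# Illustrative sets only; all functions are binary for simplicity.
FUNCTIONS = ["add", "sub", "mul"]
TERMINALS = ["x", "y", "1"]

def gen_full(depth):
    """Full method: every branch reaches exactly the given depth."""
    if depth == 0:
        return random.choice(TERMINALS)
    return [random.choice(FUNCTIONS), gen_full(depth - 1), gen_full(depth - 1)]

def gen_grow(depth):
    """Grow method: branches may stop early, giving varied shapes."""
    if depth == 0 or random.random() < 0.3:   # 0.3 is an arbitrary stop rate
        return random.choice(TERMINALS)
    return [random.choice(FUNCTIONS), gen_grow(depth - 1), gen_grow(depth - 1)]

def ramped_half_and_half(pop_size, min_depth, max_depth):
    """RHH: half full, half grow, ramped over depths in [min_depth, max_depth]."""
    population = []
    for i in range(pop_size):
        depth = min_depth + i % (max_depth - min_depth + 1)
        method = gen_full if i % 2 == 0 else gen_grow
        population.append(method(depth))
    return population

def tree_depth(tree):
    """Depth of a nested-list tree; a bare terminal has depth 0."""
    if isinstance(tree, str):
        return 0
    return 1 + max(tree_depth(child) for child in tree[1:])
```

Every tree produced by `gen_full(d)` has depth exactly `d`, while `gen_grow(d)` only guarantees depth at most `d`, which is what gives RHH its mix of shapes and sizes.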

... We use numerical operators with non-Boolean features and floating-point numbers, and conditional branches that convert these results into Booleans, enabling operations with Boolean features and yielding a binary result. For the Boolean problems, we use the same function sets as Koza [8]: the operators AND, OR and NOT, and the IF function, which executes the IF-THEN-ELSE operation. ...
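As a concrete sketch, the Boolean primitives above can be written directly in Python. This is an illustration only (eager functions, not the paper's implementation), and the thresholding helper `GT0` is our own hypothetical stand-in for a conditional branch that binarises a numeric subtree's result.

```python
# Koza-style Boolean primitives, including the three-argument IF
# (IF-THEN-ELSE); illustrative, not the authors' code.
def AND(a, b):
    return a and b

def OR(a, b):
    return a or b

def NOT(a):
    return not a

def IF(cond, then_branch, else_branch):
    return then_branch if cond else else_branch

# A conditional branch can likewise binarise a numeric result,
# e.g. by thresholding it at zero (hypothetical helper):
def GT0(x):
    return x > 0
```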

... We summarise in Figure 2 the information related to the regression problems using a style similar to [6,8], and Table 3 shows the hyperparameters used in all experiments. We chose these parameters according to the results of some initial runs. ...

GRAPE is an implementation of Grammatical Evolution (GE) in DEAP, an Evolutionary Computation framework in Python, which consists of the classes and functions necessary to evolve a population of grammar-based solutions, while reporting essential measures. This tool was developed at the Bio-computing and Developmental Systems (BDS) Research Group, the birthplace of GE, as an easy-to-use tool (compared to the canonical C++ implementation, libGE) that inherits all the advantages of DEAP, such as selection methods, parallelism and multiple search techniques, all of which can be used with GRAPE. In this paper, we address a number of problems to exemplify the use of GRAPE and to perform a comparison with PonyGE2, an existing implementation of GE in Python. The results show that GRAPE has similar performance, but is able to avail of all the extra facilities and functionality found in the DEAP framework. We further show that GRAPE enables GE to be applied to system identification problems, and we demonstrate this on two benchmark problems.
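For readers unfamiliar with GE's genotype-to-phenotype mapping, here is a minimal sketch of the standard codon-modulo mechanism. This is not GRAPE's actual API; the toy grammar, the wrapping limit, and the function names are all our own illustrative assumptions.

```python
# Toy grammar: each non-terminal maps to a list of productions.
GRAMMAR = {
    "<expr>": [["<expr>", "<op>", "<expr>"], ["<var>"]],
    "<op>": [["+"], ["-"], ["*"]],
    "<var>": [["x"], ["y"]],
}

def ge_map(genome, start="<expr>", max_wraps=2):
    """Map a list of integer codons to a string phenotype.

    Each codon selects a production via `codon % len(choices)`; the genome
    wraps around at most `max_wraps` times before the mapping is abandoned.
    """
    symbols = [start]     # derivation frontier, leftmost-first
    out = []
    used = 0
    while symbols:
        sym = symbols.pop(0)
        if sym not in GRAMMAR:            # terminal symbol: emit it
            out.append(sym)
            continue
        if used >= len(genome) * (max_wraps + 1):
            return None                   # mapping failed: ran out of codons
        choices = GRAMMAR[sym]
        codon = genome[used % len(genome)]
        used += 1
        symbols = list(choices[codon % len(choices)]) + symbols
    return "".join(out)
```

For instance, `ge_map([0, 1, 2, 1, 1])` derives the phenotype `"x-x"`, while a genome that forces unbounded recursion, such as `[0]`, exhausts its wraps and yields `None` (an invalid individual).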

... In particular, [14] were able to obtain an approximate solution of the Schrödinger equation by applying an approach known as grammatical evolution, that is, a technique related to genetic programming whose output is a computer program. In 1988, Koza [15] introduced genetic programming and the application of this technique to optimization and search problems. Whether with a genetic algorithm or with genetic programming, the general protocol and the goal are always the same: to search for the optimal solution. ...

... Note how, in term (15), the numerator is calculated with the following formula: ...

... Once the Z(i) value has been calculated with (15), the genetic algorithm obtains the fitness value of chromosome i via expression (18). In expression (15), l is the x-axis length and a, b the real lower and upper bounds, respectively. Note that the optimization problem solved by the genetic algorithm is the minimization of the fitness function F satisfying the Schrödinger equation. ...

The Schrödinger equation is one of the most important equations in physics and chemistry and can be solved in the simplest cases by computer numerical methods. Since the beginning of the 1970s, the computer began to be used to solve this equation in elementary quantum systems, and, in the most complex case, a ‘hydrogen-like’ system. Obtaining the solution means finding the wave function, which allows predicting the physical and chemical properties of the quantum system. However, when a quantum system is more complex than a ‘hydrogen-like’ system, we must be satisfied with an approximate solution of the equation. During the last decade, application of algorithms and principles of quantum computation in disciplines other than physics and chemistry, such as biology and artificial intelligence, has led to the search for alternative techniques with which to obtain approximate solutions of the Schrödinger equation. In this work, we review and illustrate the application of genetic algorithms, i.e., stochastic optimization procedures inspired by Darwinian evolution, in elementary quantum systems and in quantum models of artificial intelligence. In this last field, we illustrate with two ‘toy models’ how to solve the Schrödinger equation in an elementary model of a quantum neuron and in the synthesis of quantum circuits controlling the behavior of a Braitenberg vehicle.

... Genetic Programming refers to a class of evolutionary algorithms introduced by (Koza 1992) to evolve computer programs. Traditionally, syntax trees have been used to represent programs. ...

... A polynomially-sized training set can be used to produce a solution with a polynomially small generalisation error. In this case we restrict the maximum size of the tree the RLS-GP algorithm will accept (as is common in applications of GP, to avoid the rapid increase of program size without significant return in fitness, i.e., bloat (Koza 1992; Poli, Langdon, and McPhee 2008)), and compare the fitness of two solutions by sampling S rows from the complete truth table independently at random in each iteration. Theorem 19. ...
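The mechanism described (a size cap plus noisy fitness comparisons on S sampled truth-table rows) can be sketched as follows. Note this is a deliberate simplification, not the paper's algorithm: a program is represented as a set of variable indices, i.e. a flat conjunction, rather than a full GP tree.

```python
import random

def target(row):
    """AND_n: true only when every variable is 1."""
    return all(row)

def program_output(prog, row):
    return all(row[i] for i in prog)   # empty conjunction is vacuously true

def sampled_error(prog, n, sample_size, rng):
    """Errors on `sample_size` rows of the truth table, sampled uniformly."""
    rows = [tuple(rng.randint(0, 1) for _ in range(n)) for _ in range(sample_size)]
    return sum(program_output(prog, r) != target(r) for r in rows)

def rls_gp(n, max_size, iterations, sample_size, seed=0):
    """Random local search: flip one variable in/out per iteration."""
    rng = random.Random(seed)
    prog = set()
    for _ in range(iterations):
        cand = set(prog)
        cand.symmetric_difference_update({rng.randrange(n)})
        if len(cand) > max_size:
            continue   # reject: size cap against bloat
        # Non-strict selection on independently sampled fitness values.
        if sampled_error(cand, n, sample_size, rng) <= sampled_error(prog, n, sample_size, rng):
            prog = cand
    return prog
```

Because both candidates are evaluated on fresh independent samples each iteration, the comparison is noisy, which is exactly the complication the sampled-fitness analysis addresses.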

... Overall, defining an "easy" benchmark function for evaluating the generalisation capabilities of GP systems remains an open problem. Benchmark functions where problems such as bloat and overfitting (Koza 1992) can be studied in further detail also need to be devised. Further directions for future work are to consider analyses of algorithms with more comprehensive terminal and function sets, on the way towards the analysis of more sophisticated and realistic population-based GP systems. ...

Genetic Programming (GP) is a general-purpose bio-inspired meta-heuristic for the evolution of computer programs. In contrast to its several successful applications, there is little understanding of the working principles behind GP. In this paper we present a performance analysis that sheds light on the behaviour of simple GP systems for evolving conjunctions of n variables (AND_n). The analysis of a random local search GP system with minimal terminal and function sets reveals the relationship between the number of iterations and the expected error of the evolved program on the complete training set. Afterwards, we consider a more realistic GP system equipped with a global mutation operator and prove that it can efficiently solve AND_n by producing programs of linear size that fit a training set to optimality and with high probability generalise well. Additionally, we consider more general problems which extend the terminal set with undesired variables or negated variables. In the presence of undesired variables, we prove that, if non-strict selection is used, the algorithm fits the complete training set efficiently, while the strict selection algorithm may fail with high probability unless the substitution operator is switched off. In the presence of negations, we show that while the algorithms fail to fit the complete training set, the constructed solutions generalise well. Finally, from a problem hardness perspective, we reveal the existence of small training sets that allow the evolution of the exact conjunctions even in the presence of negations or of undesired variables.

... GP is an evolutionary computation technique in which a population of individuals (called a GP population here) is evolved stochastically to a population that is potentially better than the previous one, given a specific fitness function (performance measure) that determines the fitness (quality) of each individual [28]. It cannot guarantee optimality but is equipped with processes to avoid local minima/maxima in which deterministic methods potentially get trapped [9]. ...

... The GP settings used for the implementation of this approach are presented in Table 2, which were determined based on [28] and [6]. The function set includes four basic arithmetic operators. ...

... The number of offspring generated in each generation is set to twice the size of the population (mu). To prevent bloat [28,36], the maximum tree depth is set to 10. The GP process is terminated when a maximum of 50 generations is reached, which always sufficed for convergence in the conducted experiments. ...

The widespread and increased use of smartphones equipped with the global positioning system (GPS) has facilitated the automation of travel data collection. Most studies on travel mode detection that used GPS data have employed hand-crafted features, which may lack the capacity to detect all complex travel behaviours, since their performance depends heavily on the skills of domain experts and may limit the performance of classifiers. In this study, a genetic programming (GP) approach is proposed to select and construct features for GPS trajectories. GP increased the macro-average F1-score from 77.3 to 80.0 in feature construction when applied to the GeoLife dataset. It could transform the decision tree into a classifier competitive with support vector machines (SVMs) and neural networks, both of which are able to extract high-level features. Simplicity, interpretability, and a relatively lower risk of overfitting allow the proposed model to be readily used for passive travel data collection, even on smartphones with limited computational capacities. The model was validated on a second dataset from Australia and New Zealand, which indicated that a decision tree with the GP-constructed features as its input has considerably higher transferability than SVMs and neural networks.

... The third model was obtained using symbolic regression techniques as applied in genetic programming. Genetic programming is a random-based technique (Koza, 1992) for automatically learning computer programmes based on artificial evolution. It has been successfully used in many applications (Edwards, 2006; Barati et al., 2014). ...

... Based on high-quality datasets, highly accurate prediction models can be obtained using symbolic regression techniques as applied in genetic programming. Genetic programming is a random-based technique (Koza, 1992) for automatically learning computer programmes based on artificial evolution. It has been successfully used in many applications (Edwards, 2006; Barati et al., 2014). ...

... Accordingly, a data-driven model proposed by Kramer et al. (2020a,b), based on dimensionless numbers (the Rep1Frp model), was used to predict the effective voidage. In addition, a model based on symbolic regression was considered (Koza, 1992). Minimum fluidisation prediction, fluidisation modelling details and graphs are given in the Supplementary material (Section 6). ...

In drinking water treatment plants, multiphase flows are a frequent phenomenon. Examples of such flows are pellet-softening and filter backwashing where liquid-solid fluidisation is applied. A better grasp of these fluidisation processes is needed to be able to determine optimal hydraulic states. In this research, models were developed, and experiments performed to gain such hydraulic knowledge. As a result, treatment processes can be made more flexible. In a rapidly changing environment, drinking water production must be flexible to ensure robustness and to tackle challenges related to sustainability and long-term changes. In the hydraulic models, the voidage in the fluidised bed and the particle size of the suspended granules are crucial variables. Voidage prediction is challenging as the fluidised bed is a dynamic environment showing highly heterogeneous behaviour that is hard to describe with an effective model. Particle size, in turn, causes a conundrum due to the irregular shapes of the applied granules. Through the combination of hydraulic dimensionless Reynolds and Froude numbers, an accurate voidage prediction model has now been developed. With a straightforward pseudo-3D image analysis for non-spherical particles measuring particle mass and density, the dimensioned shapes of, for instance, ellipsoids can be determined. Particle shape factors included in models are not constant as is commonly believed, but dynamic. Applying advanced computational fluid dynamics simulations confirmed significant heterogeneous particle-fluid patterns in fluidised beds. Comprehensive sedimentation experiments showed that the average drag coefficient and terminal settling velocity of individual grains can be estimated reasonably well, but with a significant degree of data spread around the mean values. For engineering purposes, this is relevant information which should be taken into consideration.
A new soft-sensor was designed to determine the voidage gradient and particle size profile in a fluidised bed. The expansion degree of highly erratic, polydisperse and porous granular activated carbon grains can be predicted with a model, but in full-scale processes the grains are subject to change, and therefore it is most likely that the prediction accuracy will deteriorate rapidly. For reliable drinking water quality, smart models provide solutions to complex challenges, but they are only effective when they are calibrated and validated in advanced pilot plants and are applied in full-scale processes with diligence and commitment on the part of multidisciplinary teams.

... Genetic Programming (GP), introduced by Koza (1992), is an extension of the genetic algorithm (GA). The main difference between GP and GA is the representation of the solution (Gandomi et al. 2011; Mousavi et al. 2012). ...

... The main difference between GP and GA is the representation of the solution (Gandomi et al. 2011; Mousavi et al. 2012). GP solutions are computer programs with hierarchical structures (Koza 1992; Gandomi et al. 2011; Mousavi et al. 2012), while GA solutions are strings of numbers (Mousavi et al. 2012). Hierarchical structures and the dynamic variability of computer programs are important features of GP (Koza 1992). ...

... GP solutions are computer programs with hierarchical structures (Koza 1992; Gandomi et al. 2011; Mousavi et al. 2012), while GA solutions are strings of numbers (Mousavi et al. 2012). Hierarchical structures and the dynamic variability of computer programs are important features of GP (Koza 1992). GP starts with a random population of computer programs. ...
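The representational contrast can be made concrete with a toy sketch (illustrative structures, not code from the cited works): a GA individual is a flat, fixed-length string of numbers, whereas a GP individual is a hierarchical tree of variable shape that can be interpreted as a program.

```python
# GA individual: a fixed-length string of numbers.
ga_individual = [0.7, 1.3, 0.2, 0.9]

# GP individual: a hierarchical tree for the program x*x + (x - 1).
gp_individual = ("+", ("*", "x", "x"), ("-", "x", 1.0))

def run(tree, x):
    """Interpret a GP tree bottom-up."""
    if tree == "x":
        return x                      # variable leaf
    if not isinstance(tree, tuple):
        return tree                   # constant leaf
    op, left, right = tree
    a, b = run(left, x), run(right, x)
    return a + b if op == "+" else a * b if op == "*" else a - b
```

The GA string only has meaning once an external decoder assigns each position a role, while the GP tree carries its own executable structure, which can grow or shrink under variation operators.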

Bushfire susceptibility mapping helps government authorities predict and provide the disaster management plans required to reduce the adverse impacts of bushfires. In this paper, we investigated Gene Expression Programming (GEP) and ensemble methods to create bushfire susceptibility maps for Victoria, Australia, as a case study. The bushfire susceptibility maps indicate that the eastern part of Victoria, where forests are predominant, has the highest probability of bushfire. The western part of Victoria, which is covered by cropland, shrubland and grassland, has the lowest bushfire probability. Two ensemble methods, namely an ensemble of GEP and Frequency Ratio (GEPFR) and an ensemble of Logistic Regression and Frequency Ratio (LRFR), were proposed and compared with stand-alone GEP and stand-alone Frequency Ratio (FR) methods. The proposed methods were evaluated by the Area Under Curve (AUC). The AUCs of GEPFR, LRFR, GEP and FR are 0.860, 0.852, 0.850, and 0.840, respectively. It can be concluded that GEPFR outperforms the other three methods, and that the ensemble methods outperform the stand-alone methods. GEPFR, LRFR and GEP produced bushfire probabilities with an accuracy in the range of 90.79%–92.27%, and are therefore equally useful for policy makers and managers seeking better natural hazard management plans.

... Genetic Programming (GP) is an evolutionary learning technique in which computer programs are evolved over generations [5]. GP's functional structure lends itself well to be used as a mapping technique for manifold learning, and there has been some work combining the two [6]. ...

... GP is an evolutionary computation technique that represents solutions to a problem as evolvable computer programs [5]. GP begins with a population of randomly initialised individuals. ...

... The NRMSE fitness works by first using the GP individual to produce the low-dimensional embedding from the original data. The error between this embedding and the low-dimensional UMAP embedding can then be calculated, first by computing the root mean square error (RMSE) as in Eq. (5). ...
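A sketch of that computation for a one-dimensional embedding follows. Normalising the RMSE by the range of the target embedding is our assumption here; the paper's Eq. (5) may normalise differently (e.g. by the standard deviation).

```python
import math

def rmse(pred, target):
    """Root mean square error between two equal-length embeddings."""
    n = len(pred)
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)) / n)

def nrmse(pred, target):
    """RMSE normalised by the range of the target embedding (assumption)."""
    spread = max(target) - min(target)
    return rmse(pred, target) / spread

# e.g. comparing a GP-produced embedding against a UMAP reference:
# nrmse([0.1, 0.5, 0.9], [0.0, 0.5, 1.0])
```

Normalisation makes fitness comparable across datasets whose embeddings live on different scales, which matters when one GP system is evaluated on several problems.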

... • Crossovers. Geometric crossovers occur often in practice and can be designed in a principled manner across different representations [100]; however, geometric crossovers are inherently limited to metric spaces and thus exclude useful non-geometric crossovers [103] such as Koza's subtree swap [86] or Davis's order [31]. By contrast, few crossover examples of recombination P-structures are known [51,150], basically multi-point string-based crossovers (e.g.

... The previous three families laid a basis for new EAs, whose differences often blurred. Koza proposed genetic programming (GP) [86] which, in plain words, teaches a computer how to automatically evolve 'intelligent' computer programs. GP differs from traditional EP, GAs and ESs in the representation of individuals as hierarchical tree-like structures of variable size, these representing the programs' syntax trees. ...

... However, Koza [86] and Radcliffe and Surry [117,143] argue that indirect representations are unnecessary and prevent search operators from fully exploiting problem-specific knowledge compared with search operators defined on phenotypes directly. Whether direct or indirect representations are used, many agree [52,68,96,153] that EAs should be designed for a certain class of problems, abandoning the pure 'black box' setting where EAs make no assumptions about (and thus do not exploit) the underlying structure of the problem at hand, in view of Wolpert and Macready's NFL theorems [155]. ...

Evolutionary algorithms (EAs) are randomised general-purpose strategies, inspired by natural evolution, often used for finding (near) optimal solutions to problems in combinatorial optimisation. Over the last 50 years, many theoretical approaches in evolutionary computation have been developed to analyse the performance of EAs, design EAs or measure problem difficulty via fitness landscape analysis. An open challenge is to formally explain why a general class of EAs perform better, or worse, than others on a class of combinatorial problems across representations. However, the lack of a general unified theory of EAs and fitness landscapes, across problems and representations, makes it harder to characterise pairs of general classes of EAs and combinatorial problems where good performance can be guaranteed provably. This thesis explores a unification between a geometric framework of EAs and elementary landscapes theory, not tied to a specific representation nor problem, with complementary strengths in the analysis of population-based EAs and combinatorial landscapes. This unification organises around three essential aspects: search space structure induced by crossovers, search behaviour of population-based EAs and structure of fitness landscapes. First, this thesis builds a crossover classification to systematically compare crossovers in the geometric framework and elementary landscapes theory, revealing a shared general subclass of crossovers: geometric recombination P-structures, which covers well-known crossovers. The crossover classification is then extended to a general framework for axiomatically analysing the population behaviour induced by crossover classes on associated EAs. This shows that the shared general class of all EAs using geometric recombination P-structures, but no mutation, always does the same abstract form of convex evolutionary search.
Finally, this thesis characterises a class of globally convex combinatorial landscapes shared by the geometric framework and elementary landscapes theory: abstract convex elementary landscapes. It is formally explained why geometric recombination P-structure EAs expectedly can outperform random search on abstract convex elementary landscapes related to low-order graph Laplacian eigenvalues. Altogether, this thesis paves a way towards a general unified theory of EAs and combinatorial fitness landscapes.
(Available online at the University of Exeter: http://hdl.handle.net/10871/126174)

... Genetic Programming (GP) [Koza 1992] is a variant of evolutionary algorithms and is at the heart of our work. It is based on the tree representation in its standard version, but also on linear (Linear GP) or graph-based (Cartesian GP) representations. ...

... Genetic Programming (GP), proposed by John Koza in [Koza 1992], is a system for the automated inference of programs that improve iteratively, also called automatic programming. A program is a structured expression composed of a variable number of symbols, which must be interpretable or compilable and then executable by a machine. ...

... It was Cramer [Cramer 1985a] who first used tree structures in a genetic algorithm. But the adoption of this representation to define genetic programming as a new evolutionary algorithm was due to John Koza in 1992 [Koza 1992]. His initial objective was to evolve subprograms of the LISP language (figure 1.4). ...

In this thesis, we study the adaptation of Genetic Programming (GP) to overcome the obstacle of data volume in Big Data problems. GP is a meta-heuristic that has proven itself on classification problems. Nevertheless, its computational cost hinders its use with large training sets. First, we carry out a thorough review, enriched by an experimental comparative study, of the sampling algorithms used with GP. Then, building on the results of that study, we propose several extensions based on hierarchical sampling. The latter combines active sampling algorithms at several levels and has proven to be an appropriate solution for scaling techniques such as TBS and for applying GP to a Big Data problem (the case of Higgs boson classification). Furthermore, we formulate a new sampling approach called adaptive sampling, based on controlling the sampling frequency as a function of the learning process, according to fixed, deterministic and adaptive schemes. Finally, we present how to transform an existing GP implementation (DEAP) by distributing the evaluations over a Spark cluster. We demonstrate how this implementation can be run on clusters with a reduced number of nodes thanks to sampling. The experiments show the great benefits of using Spark for the parallelisation of GP.

... In this paper, we apply Genetic Programming (GP) [5] for the automatic synthesis of a congestion window management protocol. We employ the NS3 simulator [6] to evaluate the effectiveness of the evolved protocols in a point-to-point WiFi scenario. ...

... In this work, we employ Genetic Programming (GP) [5] to evolve congestion control policies in the form of C++ programs. The function set is shown in Table 1, while the terminal set is shown in Table 2. ...

The Transmission Control Protocol (TCP), one of the most widely used network protocols, has a crucial role in the functioning of the Internet. Its performance relies heavily on the management of the congestion window, which regulates the number of packets that can be transmitted on the network. In this paper, we employ Genetic Programming (GP) for evolving novel congestion policies, encoded as C++ programs. We optimize the function that manages the size of the congestion window in a point-to-point WiFi scenario, by using the NS3 simulator. The results show that, in the protocols discovered by GP, the Additive-Increase-Multiplicative-Decrease principle is exploited differently than in traditional protocols, by using a more aggressive window increasing policy. More importantly, the evolved protocols show an improvement of the throughput of the network of about 5%. Keywords: Genetic programming, NS3, TCP, Network protocols
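For context, the classic Additive-Increase-Multiplicative-Decrease rule that the evolved policies vary can be sketched as follows. The constants are the textbook values (alpha = 1 segment per RTT, beta = 0.5), not the more aggressive ones GP discovered.

```python
def aimd_update(cwnd, loss, alpha=1.0, beta=0.5):
    """Return the new congestion window (in segments) after one RTT.

    Textbook AIMD: grow additively while the network accepts traffic,
    shrink multiplicatively on a loss signal.
    """
    if loss:
        return max(1.0, cwnd * beta)   # multiplicative decrease on loss
    return cwnd + alpha                # additive increase otherwise
```

An evolved policy would replace this hand-written update with a GP-synthesised expression over the same inputs, which is where the more aggressive increase behaviour reported above comes from.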

... While NODE [4] (with a large body of follow-up work) is perhaps the most prominent method to learn ODEs from data in black-box form, we focus on various works that infer governing laws in symbolic form. Classically, symbolic regression aims at regular functional relationships (mapping (x, f(x)) pairs to f instead of mapping trajectories (t, y(t)) to the governing ODE ẏ = f(y, t)) and has been approached by heuristics-based search, most prominently via genetic programming [10]. Genetic programming randomly evolves a population of prospective mathematical expressions over many iterations and mimics natural selection by keeping only the best contenders across iterations, where superiority is measured by user-defined and problem-specific fitness functions. ...

... • testset-classic: To validate our approach on existing datasets we turn to benchmarks in the classic symbolic regression literature (inferring just the functional relationship between input-output pairs) and simply interpret the functions as ODEs. In particular, we include all scalar functions listed in the overview in [15], which includes equations from many different benchmarks [9,10,11,22,26]. For example, we interpret the function f(y) = y^3 + y^2 + y from Uy et al. [22] as an autonomous ODE ẏ(t) = f(y(t)) = y(t)^3 + y(t)^2 + y(t), which we solve numerically for a randomly sampled initial value as described before. ...
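That reinterpretation can be sketched numerically: read the benchmark function as the right-hand side of an autonomous ODE and integrate it forward. The explicit Euler step below is our own illustrative choice of integrator, not necessarily the solver used in the paper.

```python
def f(y):
    """Benchmark function from Uy et al., read as an ODE right-hand side."""
    return y ** 3 + y ** 2 + y

def euler_trajectory(y0, t_end, steps):
    """Integrate y'(t) = f(y(t)) from y(0) = y0 with explicit Euler."""
    dt = t_end / steps
    ys = [y0]
    for _ in range(steps):
        ys.append(ys[-1] + dt * f(ys[-1]))
    return ys

# euler_trajectory(0.1, 1.0, 1000) yields the y(t) samples that, paired
# with their time stamps, form the observed-solution training signal.
```

Note that for cubic right-hand sides, solutions can blow up in finite time, so the sampled initial value and time horizon have to be chosen with some care.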

Natural laws are often described through differential equations, yet finding a differential equation that describes the governing law underlying observed data is a challenging and still mostly manual task. In this paper we make a step towards the automation of this process: we propose a transformer-based sequence-to-sequence model that recovers scalar autonomous ordinary differential equations (ODEs) in symbolic form from time-series data of a single observed solution of the ODE. Our method is efficiently scalable: after one-time pretraining on a large set of ODEs, we can infer the governing laws of a new observed solution in a few forward passes of the model. Then we show that our model performs better or on par with existing methods in various test cases in terms of accurate symbolic recovery of the ODE, especially for more complex expressions.

... Although several numerical implementations of SR have been published over the past decades, it is possible to distinguish two main approaches of completely different natures. One makes use of a population-based heuristic method known as Genetic Programming (GP) [23]. An example of the other approach, based on deterministic arguments, is known as Fast Function Extraction (FFX) [24]. ...

... Throughout this work we make use of a GP-based SR. Although the basic principles of GP can be found elsewhere [23,25], for the sake of completeness we will briefly describe them in the following paragraphs. ...

In this contribution we explore the possibilities and limitations of symbolic regression as an alternative to the approaches currently used to characterize the dispersive behavior of a given material. To this end, we make use of genetic programming to retrieve, from either ellipsometric or spectral data, closed-form expressions that model the optical properties of the materials studied. In a first stage we consider transparent dielectrics for our numerical experiments. Next we increase the complexity of the problem and consider absorbing dielectrics, which not only require the use of complex functions to model their dielectric function, but also imply a supplementary constraint imposed by the verification of the causality principle.

... As the space of possible expressions is vast, most of the existing work focuses on developing optimization algorithms. Genetic Programming [17] has been widely used for that task [36]. A different strategy has been employed in AI Feynman [41,42] that uses neural networks to reduce the search space by identifying simplifying properties like symmetry or separability. ...

... We use B-Splines [9] as the testing functions and we estimate the fields in Step 2 of D-CIPHER with a Gaussian Process [45]. The outer optimization in Step 3 is performed using a modified genetic programming algorithm [17] and the inner optimization by CoLLie (Section 6). ...

Closed-form differential equations, including partial differential equations and higher-order ordinary differential equations, are one of the most important tools used by scientists to model and better understand natural phenomena. Discovering these equations directly from data is challenging because it requires modeling relationships between various derivatives that are not observed in the data (the equation-data mismatch) and it involves searching across a huge space of possible equations. Current approaches make strong assumptions about the form of the equation and thus fail to discover many well-known systems. Moreover, many of them resolve the equation-data mismatch by estimating the derivatives, which makes them inadequate for noisy and infrequently sampled systems. To this end, we propose D-CIPHER, which is robust to measurement artifacts and can uncover a new and very general class of differential equations. We further design a novel optimization procedure, CoLLie, to help D-CIPHER search through this class efficiently. Finally, we demonstrate empirically that it can discover many well-known equations that are beyond the capabilities of current methods.

... Nevertheless, this is still an open issue when it comes to Evolutionary Algorithms such as CGPANN, even though Genetic Programs are known for being naturally parallel. As pointed out in [23], the main approaches are data-parallel, which evaluates fitness cases in parallel, and population-parallel, which evaluates GP programs in parallel. ...

... In general, the evaluation of the fitness corresponds to most of the computational effort spent by the algorithm, while the genetic operations (such as mutation and selection) do not contribute significantly to the computational cost. Considering this, Koza [23] points out two main approaches to calculate fitness in parallel: at the individual level, or at the level of fitness cases, where each individual is executed sequentially, but in parallel over the dataset, followed by the accumulation of partial results. ...
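The fitness-case-level approach described above can be sketched in a few lines of Python; the program and fitness cases here are hypothetical stand-ins for an evolved GP individual and its training data:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical evolved program and fitness cases (input, expected output).
program = lambda x: x * x + 1
cases = [(0, 1), (1, 2), (2, 5), (3, 10), (4, 18)]

def case_error(case):
    # Squared error of the program on a single fitness case.
    x, target = case
    return (program(x) - target) ** 2

# The individual is executed over the fitness cases in parallel,
# and the partial results are then accumulated into one fitness value.
with ThreadPoolExecutor(max_workers=4) as pool:
    partial_errors = list(pool.map(case_error, cases))

fitness = sum(partial_errors)  # lower is better
```

For CPU-bound fitness functions a process pool or vectorised evaluation would typically replace the thread pool, but the structure — parallel map over fitness cases followed by accumulation — is the same.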

... Many different GI frameworks and toolkits have been developed and used, often with variations in their core search process. Nowadays GI search processes are based on either genetic programming (GP) [12]-[14], with, for example, the GISMOE framework [4], or stochastic local search [15], with, for example, the recent PyGGI [16]-[18] or Gin [19], [20] frameworks. However, although these search strategies have proven effective in practice, they have not yet been empirically compared and analysed. ...

... Genetic programming (GP) [12] has been used in GI since the inception of the field [29], and since then in most GI work, both for the improvement of functional and non-functional properties. In contrast to random search, which is a strictly exploratory procedure, GP simultaneously evolves a population of solutions, using both mutation and crossover. ...

Genetic improvement uses automated search to improve existing software. It has been successfully used to optimise various program properties, such as runtime or energy consumption, as well as for the purpose of bug fixing. Genetic improvement typically navigates a space of thousands of patches in search of the program mutation that best improves the desired software property. While genetic programming has been the dominant search strategy, more recently other search strategies, such as local search, have been tried. It is, however, still unclear which strategy is the most effective and efficient. In this paper, we conduct an in-depth empirical comparison of a total of 18 search processes using a set of 8 improvement scenarios. Additionally, we provide new genetic improvement benchmarks and report on new software patches found. Our results show that, overall, local search approaches achieve better effectiveness and efficiency than genetic programming approaches. Moreover, improvements were found in all scenarios (between 15% and 68%). A replication package can be found online: https://github.com/bloa/tevc_2020_artefact.

... EAs, including Genetic Algorithms (GA) [25], Genetic Programming (GP) [26], and Evolution Strategies (ES) [27], are almost identical in their basic evolution steps. Neuroevolution research mainly focuses on the implementation details of the evolution procedure, especially the two most important issues: the encoding scheme and the evolution operations [28]. ...

Neuroevolution has greatly promoted Deep Neural Network (DNN) architecture design and its applications, while there is a lack of methods available across different DNN types concerning both their scale and performance. In this study, we propose a self-adaptive neuroevolution (SANE) approach to automatically construct various lightweight DNN architectures for different tasks. One of the key settings in SANE is the search space defined by cells and organs self-adapted to different DNN types. Based on this search space, a constructive evolution strategy with uniform evolution settings and operations is designed to grow DNN architectures gradually. SANE is able to self-adaptively adjust evolution exploration and exploitation to improve search efficiency. Moreover, a speciation scheme is developed to protect evolution from early convergence by restricting selection competition within species. To evaluate SANE, we carry out neuroevolution experiments to generate different DNN architectures including convolutional neural network, generative adversarial network and long short-term memory. The results illustrate that the obtained DNN architectures could have smaller scale with similar performance compared to existing DNN architectures. Our proposed SANE provides an efficient approach to self-adaptively search DNN architectures across different types.

... Our evolutionary computation for program synthesis differs from genetic programming [9] or evolutionary programming [1]: we did not directly apply simulated evolution to programs; instead, our framework improves the search mechanism for deriving correct programs through evolution. We take this approach to get the best of both worlds: the correctness of resulting programs guaranteed by the deductive synthesis and its certification tool, and the search heuristics enhanced through evolutionary computation. ...

A deductive program synthesis tool takes a specification as input and derives a program that satisfies the specification. The drawback of this approach is that search spaces for such correct programs tend to be enormous, making it difficult to derive correct programs within a realistic timeout. To speed up such program derivation, we improve the search strategy of a deductive program synthesis tool, SuSLik, using evolutionary computation. Our cross-validation shows that the improvement brought by evolutionary computation generalises to unforeseen problems.

... Evolutionary algorithms are naturally parallelisable, since the population of candidate solutions at each iteration/generation can be evaluated simultaneously, given that there is no dependency between the evaluations of the individuals. Evaluation accounts for most of the computational effort of this kind of method, and there are two approaches to performing it in parallel [Koza 1992]: (i) parallelising the complete evaluation of each individual, so that it happens simultaneously for several individuals, and (ii) at the instance level, where each individual is executed sequentially but in parallel over the dataset. ...

The inference of Gene Regulatory Networks (GRNs) is important in Systems Biology, as it enables the understanding of patterns of interaction between genes. These discoveries are useful for providing insight into diseases and for supporting drug development. Evolutionary computation techniques, such as Cartesian Genetic Programming (CGP), have been used to infer GRNs with promising results. However, CGP has scalability problems. Here, GRNs are inferred efficiently using high-performance computing approaches. Computational experiments show that the method developed in this undergraduate research project is able to infer GRNs faster than other methods from the literature that produce symbolic solutions. The speed-up of the presented parallel technique over the sequential version is up to 104%.

... Using these machine learning methods, the effective knowledge of a system can also be distinguished [32]. A variety of AR algorithms have been introduced by different researchers, including genetic programming (GP) [33], artificial bee colony programming (ABCP) [34], gene expression programming (GEP) [35], artificial bee colony expression programming (ABCEP) [35], biogeography-based programming (BBP) [36], and artificial immune system programming [37]. The main difference between these algorithms stems from their nature. ...

The use of recycled aggregate concrete (RAC) in the construction industry can help to prevent irreparable environmental damages and to mitigate the depletion rate of natural resources. However, the quality of the RAC should be investigated before its practical applications. Compressive strength of the RAC (fRAC′) is one of the most crucial design parameters, which is measured by time-consuming and cost-extensive experiments. One solution to restrict the number of experiments and achieve reliable fRAC′ estimation is through employing machine learning methods. Artificial Bee Colony Expression Programming (ABCEP) is a newly proposed automatic regression technique that is used in this study to predict the fRAC′. For comparison purposes, four extensions of artificial bee colony programming techniques (i.e., Artificial Bee Colony Programming (ABCP), quick Artificial Bee Colony Programming (qABCP), quick semantic Artificial Bee Colony Programming (qsABCP), and semantic Artificial Bee Colony Programming (sABCP)) were employed as well. To analyse the results, the average and best performances of all algorithms, regression analysis, execution times, the Wilcoxon signed-rank test, and the behavior of the algorithms in dealing with local optima were investigated. The results show that the ABCEP method is the most effective technique, with an average root mean squared error of 10.36 MPa compared to 16.23 MPa, 10.82 MPa, 17.71 MPa, and 14.20 MPa for the developed ABCP, qABCP, qsABCP, and sABCP, respectively, at a colony size of 30. In addition, the run time of this algorithm is remarkably less than that of the other developed algorithms.

... Genetic Programming (GP) makes use of a population of individuals, each of which is represented by a tree structure that constitutes a mathematical equation defining the connection between the output Y and a set of input variables a_i (i = 1, 2, ..., j) (Koza, 2003). Based on these concepts, MGGP generalizes GP by defining each individual as a structure of trees, often known as genes, that receives the a_i and attempts to forecast Y, as shown in Fig. 5. ...
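A minimal sketch of this multi-gene structure, with illustrative gene trees, weights and bias standing in for values MGGP would evolve (the weights and bias are typically fitted by least squares against the training targets):

```python
import math

# Hypothetical gene trees, each a function of the input vector a = (a_1, a_2).
genes = [
    lambda a: a[0] * a[1],     # gene 1: a_1 * a_2
    lambda a: math.sin(a[0]),  # gene 2: sin(a_1)
]

# Illustrative bias and per-gene weights (in MGGP, fitted by least squares).
bias, weights = 0.5, [2.0, -1.0]

def predict(a):
    # The MGGP output is a weighted sum of the gene outputs plus a bias.
    return bias + sum(w * g(a) for w, g in zip(weights, genes))
```

Evolution then operates on the gene trees themselves, while the linear combination keeps each individual's output calibrated to the data.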

Stability is a primary requirement of the electrical power system for its flawless, secure, and economical operation. Low-frequency oscillations (LFOs), commonly seen in interconnected power systems, initiate the possibility of instability and, therefore, require sophisticated care to deal with. This paper proposes an original approach to tuning the parameters of the power system stabilizer (PSS), which plays a crucial role in power system networks by damping unwanted oscillations. The ensemble method combines multiple machine learning techniques and has been used for tuning the PSS parameters in real time for two PSS-connected power system networks. The first system is a single-machine infinite bus power system, while the second is a unified power flow controller (UPFC) device. The proposed ensemble model, based on the backtracking search algorithm (BSA), is formed by combining three machine learning (ML) techniques, namely the extreme learning machine (ELM), the neurogenetic (NG) system, and multi-gene genetic programming (MGGP). To validate the stability of the network, eigenvalues, well-recognized statistical parameters, and minimum damping ratios were analyzed, besides the time-domain simulation results. Furthermore, results for various loading conditions were prepared to check the robustness of the proposed model. A comparative study of the proposed approach with the NG, ELM, and MGGP models, and with two reference cases along with the conventional method, validates the superiority of the employed ML approach.

... Step 2: The quality of each food source is assessed using a fitness measuring approach, which assesses how well the discovered food sources suit the target (Koza, 2003). In this case, a metric called raw fitness is defined as the sum of the fitness errors between anticipated results and experimental values, with a value of zero being preferable. ...
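The raw-fitness measure described above is simply an accumulated error; a minimal sketch (the function and variable names are illustrative, not from the cited work):

```python
def raw_fitness(predicted, observed):
    # Raw fitness: sum of the fitness errors between anticipated results
    # and experimental values; a value of zero indicates a perfect fit.
    return sum(abs(p - o) for p, o in zip(predicted, observed))
```

For example, `raw_fitness([1.0, 2.0, 3.0], [1.0, 2.5, 3.0])` accumulates a single error of 0.5, while identical sequences score the ideal value of zero.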

Waste from concrete demolition is a sustainability concern that can be mitigated when used as recycled aggregate in concrete instead of virgin natural aggregates. However, the durability of recycled aggregate concrete (RAC), including concrete carbonation, needs to be investigated before the widespread applications of RAs in construction. Developing artificial intelligence-based predictive models for estimating the carbonation depth of RAC using the available data can reduce the need for experimental studies to generate reliable models for the service life assessment of concrete structures. In this study, artificial bee colony expression programming (ABCEP), as a novel branch of automatic regression technique, was used to predict the carbonation depth of RAC from a large dataset consisting of 655 data samples. Several ABCEP architectures were developed, different analyses were conducted, and a comparison study between the best ABCEP model and previous models published in the literature was conducted. The findings show that the best structure of the ABCEP model could estimate the carbonation depth of RAC with a reasonable root mean square error of 3.33 mm. The exposure time was the most influential parameter affecting the carbonation depth of RAC. Furthermore, the ABCEP model could outperform the previous models, despite the larger unknown dataset used to test its performance.

... Genetic Programming (GP) examines the automatic generation of computer programs, inspired by the theory of evolution. The initial representation of GP was in tree form [21]. CGP [24] is a flavor of GP with approximately 20 years of interesting and varied research addressing a wide range of problem domains. ...

A novel approach to induce Fuzzy Pattern Trees using Grammatical Evolution is presented in this paper. This new method, called Fuzzy Grammatical Evolution, is applied to a set of benchmark classification problems. Experimental results show that Fuzzy Grammatical Evolution attains similar and oftentimes better results when compared with state-of-the-art Fuzzy Pattern Tree composing methods, namely Fuzzy Pattern Trees evolved using Cartesian Genetic Programming, on a set of benchmark problems. We show that, although Cartesian Genetic Programming produces smaller trees, Fuzzy Grammatical Evolution produces better performing trees. Fuzzy Grammatical Evolution also benefits from a reduction in the number of necessary user-selectable parameters, while Cartesian Genetic Programming requires the selection of three crucial graph parameters before each experiment. To address the issue of bloat, an additional version of Fuzzy Grammatical Evolution using parsimony pressure was tested. The experimental results show that Fuzzy Grammatical Evolution with this extension routinely finds smaller trees than those using Cartesian Genetic Programming without any compromise in performance. To improve the performance of Fuzzy Grammatical Evolution, various ensemble methods were investigated. Boosting was seen to find the best individuals on half the benchmarks investigated.

... On the other hand, traditional regression imposes a single fixed model structure during training, frequently chosen to be expressive (e.g., neural network, random forest, etc.) at the expense of being easily interpretable. Because SR is believed to be an NP-hard problem [54], evolutionary methods have been developed to obtain approximate solutions [32,31,1,42]. The symbolic regression challenge has recently regained popularity, and novel approaches combining classical genetic programming and modern deep reinforcement learning have emerged [45,36,44,52,53]. ...

This paper seeks to answer the following question: "What can we learn by predicting accuracy?" Indeed, classification is one of the most popular tasks in machine learning, and many loss functions have been developed to maximize this non-differentiable objective. Unlike past work on loss function design, which was mostly guided by intuition and theory before being validated by experimentation, here we propose to approach this problem in the opposite way: we seek to extract knowledge from experiments. This data-driven approach is similar to that used in physics to discover general laws from data. We used a symbolic regression method to automatically find a mathematical expression that is highly correlated with the accuracy of a linear classifier. The formula discovered on more than 260 datasets has a Pearson correlation of 0.96 and an r² of 0.93. More interestingly, this formula is highly explainable and confirms insights from various previous papers on loss design. We hope this work will open new perspectives in the search for new heuristics, leading to a deeper understanding of machine learning theory.

... It might be the case that there exists some sort of "critical mass" of the controller such that, after a local optimum in the fitness space is reached, evolution keeps adding redundant genetic material (i.e., neurons and edges that do not necessarily contribute to the locomotion skills of the VSR). Since we do not disfavor in any way the addition of new edges and nodes that do not modify the functionality of the overall controller, we can speculate we are witnessing an instance of the bloat phenomenon (Silva & Costa, 2008) observed in the evolution of computer programs (also known as genetic programming, see Koza (1993)). It has been argued that bloat is beneficial to the individual as it provides a buffer against the deleterious effects of mutation and recombination (López et al., 2011). ...

Modularity is a desirable property for embodied agents, as it could foster their suitability to different domains by disassembling them into transferable modules that can be reassembled differently. We focus on a class of embodied agents known as voxel-based soft robots (VSRs). They are aggregations of elastic blocks of soft material; as such, their morphologies are intrinsically modular. Nevertheless, controllers used until now for VSRs act as abstract, disembodied processing units: Disassembling such controllers for the purpose of module transferability is a challenging problem. Thus, the full potential of modularity for VSRs still remains untapped. In this work, we propose a novel self-organizing, embodied neural controller for VSRs. We optimize it for a given task and morphology by means of evolutionary computation: While evolving, the controller spreads across the VSR morphology in a way that permits emergence of modularity. We experimentally investigate whether such a controller (i) is effective and (ii) allows tuning of its degree of modularity, and with what kind of impact. To this end, we consider the task of locomotion on rugged terrains and evolve controllers for two morphologies. Our experiments confirm that our self-organizing, embodied controller is indeed effective. Moreover, by mimicking the structural modularity observed in biological neural networks, different levels of modularity can be achieved. Our findings suggest that the self-organization of modularity could be the basis for an automatic pipeline for assembling, disassembling, and reassembling embodied agents.

... Genetic algorithms (GA) [55] and genetic programming (GP) [56] are two evolutionary computation techniques [57] that consist of methods for solving multi-objective optimization problems, where a "population" of solutions, defined as "individuals", is evolved through a series of "generations" in a way inspired by the Darwinian theory of evolution. A population is a list of individuals, a generation is an iteration of the optimization (evolutionary) process, and a multi-objective fitness function is a set of m ∈ N real functions g_i : S for i ∈ {1, . . . ...
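The multi-objective fitness described above, a set of m real-valued functions evaluated on each individual, can be sketched as follows; the two objectives and the dictionary representation of an individual are illustrative assumptions, not the cited paper's encoding:

```python
# Hypothetical objectives g_1, g_2 to be minimised: prediction error and
# model complexity (e.g. expression-tree size).
objectives = (
    lambda ind: ind["error"],
    lambda ind: ind["size"],
)

def evaluate(individual):
    # The multi-objective fitness is the tuple (g_1(ind), ..., g_m(ind)).
    return tuple(g(individual) for g in objectives)

population = [{"error": 0.3, "size": 12}, {"error": 0.1, "size": 40}]
fitnesses = [evaluate(ind) for ind in population]
# Neither individual dominates the other here: one is more accurate,
# the other is smaller, so a Pareto-based selection would keep both.
```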

Bot accounts are automated software programs that act as legitimate human profiles on social networks. Identifying these kinds of accounts is a challenging problem due to the high variety and heterogeneity that bot accounts exhibit. In this work, we use genetic algorithms and genetic programming to discover interpretable classification models for Twitter bot detection with competitive qualitative performance, high scalability, and good generalization capabilities. Specifically, we use a genetic programming method with a set of primitives that involves simple mathematical operators. This enables us to discover a human-readable detection algorithm that exhibits a detection accuracy close to the top state-of-the-art methods on the TwiBot-20 dataset while providing predictions that can be interpreted, and whose uncertainty can be easily measured. To the best of our knowledge, this work is the first attempt at adopting evolutionary computation techniques for detecting bot profiles on social media platforms.

... Genetic Programming (GP) (Koza (1992)) has been applied with success to complex realworld problems, becoming very popular for the automatic generation of programs, such as a mathematical expression. Grammatical Evolution (GE) (Ryan et al. (1998); O'Neill and Ryan (2001)) is a GP variant which uses a formal grammar to define the syntax of the language and a binary string to represent the candidate solutions. ...
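GE's genotype-to-phenotype mapping can be illustrated with a toy grammar. This is a sketch of the standard GE mod-rule mapping with an integer-coded genotype, not GRAPE's actual API: each codon selects one production for the leftmost non-terminal.

```python
# Toy grammar: each non-terminal maps to a list of productions.
grammar = {
    "<expr>": [["<expr>", "<op>", "<expr>"], ["x"], ["1"]],
    "<op>": [["+"], ["*"]],
}

def map_genotype(codons, start="<expr>"):
    symbols, out, i = [start], [], 0
    while symbols:
        sym = symbols.pop(0)
        if sym in grammar:
            if i >= len(codons):
                return None  # codons exhausted: invalid (incomplete) mapping
            rules = grammar[sym]
            # Standard GE mod rule: codon value modulo number of productions.
            symbols = list(rules[codons[i] % len(rules)]) + symbols
            i += 1
        else:
            out.append(sym)
    return " ".join(out)
```

For instance, `map_genotype([0, 1, 1, 2])` derives `"x * 1"`. Real GE implementations typically add a wrapping operator or a fitness penalty for individuals that run out of codons instead of simply discarding them.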

There are situations in engineering where one is faced with the problem of inferring a mathematical model from a set of observed data. In its simplest form, the model is defined as a linear combination of basis functions, and their numerical coefficients are computed by minimizing the error. This technique requires a strong a priori domain knowledge so that an adequate model structure is chosen. In order to acquire a better understanding in more complex situations it is often necessary to find a model that fits the given data without making any prior assumptions about model structure. Grammar-based immune programming (GIP), a genetic programming variant, is able to perform symbolic regression, where the structure of the model is no longer pre-defined by the analyst but rather built from a given set of primitives via an evolutionary process. Here, GIP is applied to the determination of a symbolic expression for one component of the strain tensor of a locally deformed pipe as a function of the geometry of the pipe and its deformed region. This component value combined with the other components (evaluated by other means) may be used to validate the deformed pipeline operation or to recommend its repair.

... We implemented genetic programming (GP) [26] search on top of an existing genetic improvement framework, PyGGI [27]. Chromosomes are patches to the AST. ...

Maintaining confidential information control in software is a persistent security problem where failure means secrets can be revealed via program behaviors. Information flow control techniques traditionally have been based on static or symbolic analyses — limited in scalability and specialized to particular languages. When programs do leak secrets there are no approaches to automatically repair them unless the leak causes a functional test to fail. We present our vision for HyperGI, a genetic improvement framework that detects, localizes and repairs information leakage. Key elements of HyperGI include (1) the use of two orthogonal test suites, (2) a dynamic leak detection approach which estimates and localizes potential leaks, and (3) a repair component that produces a candidate patch using genetic improvement. We demonstrate the successful use of HyperGI on several programs with no failing functional test cases. We manually examine the resulting patches and identify trade-offs and future directions for fully realizing our vision.

... The other family of methods belongs to the Genetic Programming (GP) (Koza 1992) field. Unlike the previous methods, the representation is based on a hierarchical graph, usually representing a mathematical expression. ...

Graphic design is the process of creating graphics to meet specific commercial needs based on knowledge of layout principles and esthetic concepts. This is usually an iterative trial and error process which requires a lot of time even for expert designers. This expert knowledge can be modelled, represented and used by a computer to perform design activities. This paper describes a novel approach named Gaudii (standing for "Intelligent Automated Graphic Design Generator") which utilizes principles and techniques known from the fields of Evolutionary Computation and Fuzzy Logic to automatically obtain design elements. Experimental results that demonstrate the potential of the proposed approach are presented in the area of poster design.

... Accordingly, a data-driven model was used to predict the effective voidage proposed by Kramer et al. [66] based on dimensionless numbers (Rep1Frp model). In addition, a model based on symbolic regression was considered [82]. Minimum fluidisation prediction, fluidisation modelling details and graphs are given in the Supplementary materials (Section 6). ...

Granular activated carbon (GAC) filtration is an important unit operation in drinking water treatment. GAC filtration is widely used for its filtration and adsorption capabilities as a barrier for undesired organic macro- and micro-pollutants. GAC filtration consists of two successive phases: adsorption and filtration, capturing the impurities from the water in conjunction with a backwash procedure in which the suspended particles are flushed out of the system. Available literature predominantly focusses on adsorption. A less frequently discussed but nevertheless equally crucial aspect of this operation is the backwash procedure of GAC beds. To prevent accumulation of suspended particles and to avoid additional operation costs, optimal backwashing is required. Another factor is sustainability: water utilities are showing increasing interest in exploring new sustainable GAC media. As these have different bed expansion tendencies due to different GAC characteristics with varying geometries, operational developments are needed for prediction models to estimate the expansion degree during backwashing. The prediction of the bed expansion of GAC is complex as the particles are non-spherical, porous and polydisperse. Through a combination of advanced particle laboratory and fluidisation experiments, we demonstrate a new approach which leads to an improved expansion prediction model for the backwashing of GAC filters.

... The GA module implemented in the ENIIGMA fitting tool is built on Pyevolve 4 (Perone 2009), an open-source and extensible library dedicated to performing evolutionary computation in the Python programming language. The GA computation has three main characteristics, namely, the generation of a random population of probable solutions, fitness orientation to evaluate the population, and variation to improve the next population (Holland 1975; Koza 1992). The population follows a chromosome-like structure, in which the vector $w_{ij} \in \mathbb{R}^{m \times n}$ is given by: ...

Context. A variety of laboratory ice spectra simulating different chemical environments, ice morphologies, and thermal and energetic processing are needed to provide an accurate interpretation of the infrared spectra of protostars. To determine which combination of laboratory data best fits the observations, an automated, statistically based computational approach becomes necessary. Aims. To introduce a new approach, based on evolutionary algorithms, to search for molecules in ice mantles via spectral decomposition of infrared observational data with laboratory ice spectra. Methods. A publicly available and open-source fitting tool, called ENIIGMA (dEcompositioN of Infrared Ice features using Genetic Modelling Algorithms), is introduced. The tool has dedicated Python functions to carry out continuum determination of the protostellar spectra, silicate extraction, spectral decomposition and statistical analysis to calculate confidence intervals and quantify degeneracy. As an assessment of the code, several tests were conducted with known ice samples and constructed mixtures. A complete analysis of the Elias 29 spectrum was performed as well. Results. The ENIIGMA fitting tool can identify the correct ice samples and their fractions in all checks with known samples tested in this paper. Concerning the Elias 29 spectrum, the broad spectral range between 2.5-20 $\mu$m was successfully decomposed after continuum determination and silicate extraction. This analysis allowed the identification of different molecules in the ice mantle, including a tentative detection of CH$_3$CH$_2$OH. Conclusions. ENIIGMA is a toolbox for the spectroscopic analysis of infrared spectra that is well-timed with the launch of the James Webb Space Telescope. Additionally, it allows for exploring different chemical environments and irradiation fields in order to correctly interpret astronomical observations.

... Evolutionary algorithms: evolutionary (populationbased) algorithms mimic the behaviour of evolution in nature, such as the Genetic Algorithm by Holland et al. (1992) and Genetic Programming and Biogeography-Based Optimisers proposed by Simon (2008). ...

This work investigates the use of the Moth-Flame Optimisation (MFO) algorithm in solving the Permutation Flow Shop Scheduling Problem and proposes further optimisations. MFO is a population-based approach that simulates the behaviour of real moths by exploring the search space randomly, without employing any local searches that may get stuck in local optima. Therefore, we propose a Hybrid Moth Optimisation Algorithm (HMOA) that embeds a local search to better exploit the search space. HMOA employs three search procedures to intensify and diversify the search in order to prevent the algorithm from becoming trapped in local optima. Furthermore, HMOA adaptively selects the search procedure based on improvement ranks. To evaluate the performance of MFO and HMOA, we perform a comparison against other approaches drawn from the literature. Experimental results demonstrate that HMOA is able to produce better-quality solutions and outperforms many other approaches on the Taillard benchmark.

... Developing GP systems that facilitate the evolution of modular program architectures has long captured the attention of the genetic programming community. Koza introduced automatically defined functions (ADFs) where callable functions can evolve as separate branches of GP syntax trees [34,35]. Angeline and Pollack developed compression and expansion genetic operators to automatically modularize existing code into libraries of parameterized subroutines [4]. ...

We introduce and experimentally demonstrate the utility of tag-based genetic regulation, a new genetic programming (GP) technique that allows programs to dynamically adjust which code modules to express. Tags are evolvable labels that provide a flexible mechanism for referencing code modules. Tag-based genetic regulation extends existing tag-based naming schemes to allow programs to “promote” and “repress” code modules in order to alter expression patterns. This extension allows evolution to structure a program as a gene regulatory network where modules are regulated based on instruction executions. We demonstrate the functionality of tag-based regulation on a range of program synthesis problems. We find that tag-based regulation improves problem-solving performance on context-dependent problems; that is, problems where programs must adjust how they respond to current inputs based on prior inputs. Indeed, the system could not evolve solutions to some context-dependent problems until regulation was added. Our implementation of tag-based genetic regulation is not universally beneficial, however. We identify scenarios where the correct response to a particular input never changes, rendering tag-based regulation an unneeded functionality that can sometimes impede adaptive evolution. Tag-based genetic regulation broadens our repertoire of techniques for evolving more dynamic genetic programs and can easily be incorporated into existing tag-enabled GP systems.
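As a toy illustration of the mechanism the abstract describes, tag-based referencing with regulation can be sketched as follows. The bit-string tags, the bit-match similarity measure, and the additive regulatory offset are assumptions made for illustration only, not the paper's actual implementation:

```python
def similarity(a, b):
    # Fraction of matching bits between two fixed-length tags.
    return sum(x == y for x, y in zip(a, b)) / len(a)

# Hypothetical code modules, each labelled with an evolvable tag.
modules = {"1010": "module_A", "0011": "module_B"}

# Regulation state: promoted modules get a positive offset, repressed negative.
regulation = {"1010": 0.0, "0011": 0.0}

def reference(call_tag):
    # A call binds to the module whose regulated match score is highest.
    best = max(modules, key=lambda t: similarity(call_tag, t) + regulation[t])
    return modules[best]
```

Repressing a module lowers its effective match score, so the same call tag can resolve to a different module, which mirrors how regulation instructions alter a program's expression pattern over time.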

... Accurate expansion experiments [9] combined with symbolic regression techniques [40] have provided an empirical data-driven model to predict the voidage as a function of the fluid and particle properties. The same approach was applied inversely to predict the (spherical) particle size as a function of the superficial fluid velocity, kinematic viscosity, particle density and measured voidage, following Eq. ...

Liquid-solid fluidisation is frequently encountered in drinking water treatment processes, often to obtain a large liquid-solid interfacial surface area. A large surface area is crucial for optimal seeded crystallisation in full-scale softening reactors. Due to crystallisation, particles grow and migrate to a lower zone in the reactor which leads to a stratified bed. Larger particles adversely affect the surface area. To maintain optimal process conditions in the fluidised beds, information is needed about the distribution of particle size, local voidage and available surface area, over the reactor height.
In this work, a sensor is developed to obtain the hydraulic state gradient, based on Archimedes’ principle. A cylindrical heavy object is submerged in the fluidised bed and lowered gradually while its weight is measured at various heights using a sensitive force measuring device.
Based on accurate fluidisation experiments with calcite grains, the voidage is determined and a straightforward empirical model is developed to estimate the particle size as a function of superficial fluid velocity, kinematic viscosity, suspension density, voidage and particle density. The surface area and specific space velocity can be estimated accordingly, which represent key performance indicators regarding the hydraulic state of the fluidised bed reactor. The prediction error for voidage is 5 ± 2 % and for particle size 9 ± 4 %.
The newly developed soft sensor is a more time-effective method for obtaining the hydraulic state in full-scale liquid-solid fluidised bed reactors.
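The measurement principle above can be made concrete with a short worked sketch: by Archimedes' principle, the submerged object's apparent weight drops by the weight of displaced suspension, so a force reading at each height yields the local suspension density, and a two-phase density balance then yields the voidage. The symbols and numbers below are generic illustrations, not the paper's notation, model or data.

```python
G = 9.81  # gravitational acceleration, m/s^2

def suspension_density(weight_in_air, apparent_weight, object_volume):
    """rho_susp = (W_air - W_apparent) / (g * V), from Archimedes' principle."""
    return (weight_in_air - apparent_weight) / (G * object_volume)

def voidage(rho_susp, rho_particle, rho_fluid):
    """Two-phase balance: rho_susp = eps*rho_fluid + (1 - eps)*rho_particle."""
    return (rho_particle - rho_susp) / (rho_particle - rho_fluid)

# Illustrative reading: a 1 L object weighing 78.0 N in air reads 66.5 N
# inside the bed; calcite density ~2650 kg/m^3 in water (~1000 kg/m^3).
rho = suspension_density(78.0, 66.5, 1e-3)
eps = voidage(rho, rho_particle=2650.0, rho_fluid=1000.0)
```

Repeating this at several heights gives the stratification profile, from which surface area per unit bed volume can be estimated once a particle size model is available.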

... As a powerful technique for evolving programs, genetic programming (GP) [11] provides an effective framework for model revision. GP has been successfully applied to real-world problems in various fields [12]- [14], and has the theoretical advantage that the output is interpretable, unlike blackbox models. ...

Modeling real-world phenomena is a focus of many science and engineering efforts, such as ecological modeling and financial forecasting, to name a few. Building an accurate model for complex and dynamic systems improves understanding of underlying processes and leads to resource efficiency. Towards this goal, knowledge-driven modeling builds a model based on human expertise, yet is often suboptimal. At the opposite extreme, data-driven modeling learns a model directly from data, requiring extensive data and potentially generating overfitting. We focus on an intermediate approach, model revision, in which prior knowledge and data are combined to achieve the best of both worlds. In this paper, we propose a genetic model revision framework based on tree-adjoining grammar (TAG) guided genetic programming (GP), using the TAG formalism and GP operators in an effective mechanism to incorporate prior knowledge and make data-driven revisions in a way that complies with prior knowledge. Our framework is designed to address the high computational cost of evolutionary modeling of complex systems. Via a case study on the challenging problem of river water quality modeling, we show that the framework efficiently learns an interpretable model, with higher modeling accuracy than existing methods.

... Under an assumption of continuity, symbolic equations can predict the dynamic behavior of the system in regions of the state space near the operating point used for training data collection. Automated generation of symbolic equations from data was first implemented in [3] and [4] using genetic programming [5]. These implementations were successful, but subject to overfitting and scaling limitations. ...

Recent progress in sparse regression has enabled the construction of nonlinear dynamic models from data. This paper presents a method for discovering governing dynamic equations of data using Least Angle Regression (LARS) with an orthogonalization step. A library of candidate symbolic expressions is evaluated using measurements of the state components to create candidate covariates, which are vectors onto which the data is regressed. LARS is applied to select and appropriately scale covariates to fit time-derivative measurements of the state components. The symbolic expressions that correspond to the selected covariates are similarly weighted and summed to create a model of the system's dynamics. If the candidate covariate vectors are not orthogonal, performing Singular Value Decomposition of the matrix containing the covariates provides right-singular vectors that can serve as an orthogonal basis for LARS. The three-variable Lorenz system and two-vortex flow models provide test cases with known dynamics. The proposed model-construction method successfully recovers the Lorenz equations from data and identifies the governing equations of motion for two vortices advecting in a background potential flow.
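The library-plus-sparse-regression pipeline described above can be sketched on the first Lorenz equation. A hedge: for brevity this toy uses exact derivatives evaluated at random state samples (no integration) and a greedy forward-stepwise selector as a stand-in for LARS proper (which advances all active coefficients equiangularly), so it illustrates the structure of the method rather than reproducing the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-20, 20, size=(500, 3))          # samples of the state (x, y, z)
x, y, z = X.T
sigma = 10.0
dxdt = sigma * (y - x)                            # true dx/dt at each sample

# Candidate library: constant, linear and quadratic monomials.
names = ["1", "x", "y", "z", "xy", "xz", "yz", "x2", "y2", "z2"]
lib = np.column_stack([np.ones(500), x, y, z,
                       x * y, x * z, y * z, x**2, y**2, z**2])

def forward_select(A, b, n_terms):
    """Greedily add the (normalised) column most correlated with the
    residual, then refit by least squares -- a crude stand-in for LARS."""
    norms = np.linalg.norm(A, axis=0)
    active, residual = [], b.copy()
    for _ in range(n_terms):
        corr = np.abs(A.T @ residual) / norms
        corr[active] = -np.inf                    # skip already-chosen columns
        active.append(int(np.argmax(corr)))
        coef, *_ = np.linalg.lstsq(A[:, active], b, rcond=None)
        residual = b - A[:, active] @ coef
    return active, coef

active, coef = forward_select(lib, dxdt, n_terms=2)
recovered = {names[i] for i in active}            # selected library terms
```

With exact derivatives the selector recovers the two active terms x and y with coefficients close to -10 and +10, i.e. dx/dt = sigma*(y - x); with noisy numerically differentiated data, the orthogonalization step the paper describes becomes important.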

... In general, nature-inspired algorithms can be categorized into EAs and swarm intelligence-based algorithms, as shown in Figure 1. The EAs, including the genetic algorithm (GA) [20,25,27,28], genetic programming (GP) [29], evolution strategies (ESs) [30,31], and evolutionary programming (EP) [32], are classified as classical paradigms of EC, while differential evolution (DE) [33][34][35][36], particle swarm optimization (PSO) [37,38], ant colony optimization [39,40], artificial bee colony (ABC) [41,42], cuckoo search (CS) [43], the bat algorithm (BA) [44][45][46], the bee algorithm [47,48], and the firefly algorithm (FA) [49] are newer population-based algorithms [50]. Despite the many key features of the aforementioned algorithms, they demand a large number of function evaluations and huge computation time to solve optimization problems with complicated search spaces. ...

In the last two decades, evolutionary computing has become mainstream and attracted the attention of experts in both academia and industry, aided by the advent of fast multi-core GHz processors capable of executing over 100 billion instructions per second. Many evolutionary algorithms are found in the existing literature, belonging mainly to the swarm intelligence and nature-inspired families. In practice, no single evolutionary algorithm performs well on all kinds of optimization and search problems. Recently, ensemble-based techniques have been considered a good alternative for dealing with various benchmark functions and real-world problems. In this paper, an improved ensemble-strategy-based evolutionary algorithm is developed for solving large-scale global optimization problems. The suggested algorithm employs particle swarm optimization, teaching-learning-based optimization, differential evolution, and the bat algorithm, with a self-adaptive procedure to evolve their randomly generated sets of solutions. The performance of the proposed ensemble-strategy-based evolutionary algorithm is evaluated over thirty benchmark functions designed for the special session of the 2017 IEEE Congress on Evolutionary Computation (CEC'17). The experimental results provided by the suggested algorithm over most CEC'17 benchmark functions are promising in terms of proximity and diversity.

... This data-driven discovery of governing equations has been shown to perform well for extracting the equations of fluid dynamics such as NVS (Raissi & Karniadakis, 2018), Lorenz systems (Brunton et al., 2016), chemical reaction kinetics networks (Hoffmann et al., 2019) and the Burgers equation (Rudy et al., 2019), and has also been used for extracting reduced kinetic equations that can capture whole dynamics (Harirchi et al.,0000). Different classes of ML algorithms that have been used include symbolic regression (Bongard & Lipson, 2007; Schmidt & Lipson, 2009), sparse identification of nonlinear dynamics based on lasso regression (Schaeffer et al., 2013), physics-informed neural networks (Lee et al., 2018a), equation-free modeling (Bindal et al., 2006), and genetic programming (Koza, 1992). Recent studies that have shown promising results in identifying governing equations for the dynamics of physical systems via fast data-driven approaches may also prove beneficial for identifying governing dynamical equations of complex manufacturing units. ...

Dynamical equations form the basis of design for manufacturing processes and control systems; however, identifying governing equations using a mechanistic approach is tedious. Recently, machine learning (ML) has shown promise in identifying the governing dynamical equations for physical systems faster. This possibility of rapid identification of governing equations provides an exciting opportunity for advancing dynamical systems modeling. However, the applicability of the ML approach in identifying governing mechanisms for the dynamics of complex systems relevant to manufacturing has not been tested. We test and compare the efficacy of two white-box ML approaches (SINDy and SymReg) for predicting the dynamics and the structure of the dynamical equations governing overall behaviour in a distillation column. Results demonstrate that a combination of ML approaches should be used to identify a full range of equations. In terms of physical laws, a few terms identified by SINDy were interpretable as related to Fick's law of diffusion and Henry's law, whereas SymReg identified an energy balance as driving the dynamics.

Low-frequency oscillations (LFOs) are usually considered a slow-poisoning issue for electric power networks, as they can cause system blackouts if not resolved in time. This LFO issue has recently become a significant concern to utilities owing to the integration of renewable energy (RE) resources into power networks. Because of the intermittent nature of RE sources, LFOs are frequently introduced into the power networks and can ultimately become a serious threat. Therefore, this chapter addresses an efficient solution: implementing different artificial intelligence (AI) techniques in electric power networks to overcome the undesired LFOs and improve the overall stability of the networks by tuning the power system stabilizer (PSS) parameters. In this case, four machine learning (ML) tools, namely the group method of data handling (GMDH), the extreme learning machine (ELM), neuro-genetic (NG) modelling, and multi-gene genetic programming (MGGP), were employed in two different electric networks to investigate the applicability of AI techniques in enhancing system stability. Stability-measuring indices of the power networks, such as the minimum damping ratio (MDR), eigenvalues, and time-domain simulations, are evaluated for different operating situations with newly conjectured key PSS parameters, tuned in real time. Furthermore, the results of the developed ML models were compared with the conventional approach to exhibit the applicability and superiority of AI techniques over similar approaches.

Against the background of smart manufacturing and Industry 4.0, how to achieve real-time scheduling has become a problem to be solved. In this regard, automatic design for shop scheduling based on hyper-heuristics has been widely studied, and a number of reviews and scheduling algorithms have been presented. Few studies, however, have specifically discussed the technical points involved in algorithm development. This study, therefore, constructs a general framework for automatic design for shop scheduling strategies based on hyper-heuristics, and various state-of-the-art technical points in the development process are summarized. First, we summarize the existing types of shop scheduling strategies and classify them using a new classification method. Second, we summarize an automatic design algorithm for shop scheduling. Then, we investigate surrogate-assisted methods that are popular in the current algorithm field. Finally, current problems and challenges are discussed, and potential directions for future research are proposed.

Digital circuits are one of the most important enabling technologies in the world today. Powerful tools, such as Hardware Description Languages (HDLs), have evolved over the past several decades to allow designers to operate at high levels of abstraction and expressiveness, rather than at the gate level from which circuits are actually constructed. Similarly, highly accurate digital circuit simulators permit designers to test their circuits before committing them to silicon. This remains a highly complex and generally manual task, however, with complex circuits taking months or even years to go from planning to silicon. We show how Grammatical Evolution (GE) can harness the standard tools of silicon design and be used to create a fully automatic circuit design system. Specifically, we use an HDL known as SystemVerilog and Icarus, a free but powerful simulator, to generate circuits from high-level descriptions. We apply our system to several well-known digital circuit literature benchmarks and demonstrate that GE can successfully evolve functional circuits, including several which have subsequently been rendered on Field Programmable Gate Arrays (FPGAs).

Recently it was shown, using the typical mutation mechanism employed in evolutionary algorithms, that monotone conjunctions are provably evolvable under a specific set of Bernoulli \((p)^n\) distributions. A natural question is whether this mutation mechanism allows convergence under other distributions as well. Our experiments indicate that the answer to this question is affirmative and that, at the very least, this mechanism converges under Bernoulli \((p)^n\) distributions outside of the known proved regime.

Keywords: Evolvability, Genetic programming, Monotone conjunctions, Distribution-specific learning, Bernoulli \((p)^n\) distributions
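A toy version of the mutation mechanism in question can be sketched as follows. The hypothesis is a monotone conjunction represented as a set of variable indices, a mutation flips one variable in or out, and a candidate survives if its empirical agreement with the target under a Bernoulli \((p)^n\) distribution is no worse than the incumbent's. The sample sizes and the simple accept-if-no-worse rule are simplifications of the actual evolvability framework, not the proved mechanism.

```python
import random

def conj(indices, x):
    """Monotone conjunction over the variables in `indices`."""
    return all(x[i] for i in indices)

def agreement(h, target, samples):
    """Empirical probability that h and target agree on a sample."""
    return sum(conj(h, x) == conj(target, x) for x in samples) / len(samples)

def evolve(target, n=10, p=0.7, gens=300, m=400, seed=1):
    random.seed(seed)
    h = set()                                     # start from the empty conjunction
    for _ in range(gens):
        # fresh Bernoulli(p)^n examples each generation
        samples = [[random.random() < p for _ in range(n)] for _ in range(m)]
        i = random.randrange(n)
        cand = h ^ {i}                            # flip one variable in or out
        if agreement(cand, target, samples) >= agreement(h, target, samples):
            h = cand                              # keep mutations that are no worse
    return h

target = {0, 3, 7}
h = evolve(target)
```

Varying `p` in such a loop is exactly the kind of experiment the abstract describes: the proofs cover particular Bernoulli \((p)^n\) regimes, and empirically the walk still converges toward the target outside them.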

In the Semantic Web era, Linked Open Data (LOD) is its most successful implementation, currently containing billions of RDF (Resource Description Framework) triples derived from multiple, distributed, heterogeneous sources. The role of a general semantic schema, represented as an ontology, is essential to ensure correctness and consistency in LOD and to make it possible to infer implicit knowledge by reasoning. The growth of LOD creates an opportunity to discover ontological knowledge from the raw RDF data itself to enrich relevant knowledge bases. In this work, we aim at discovering schema-level knowledge in the form of axioms encoded in OWL (Web Ontology Language) from RDF data. Approaches to the automated generation of axioms from recorded RDF facts on the Web may be regarded as a case of inductive reasoning and ontology learning. The instances, represented by RDF triples, play the role of specific observations, from which axioms can be extracted by generalization. Based on the insight that discovering new knowledge is essentially an evolutionary process, whereby hypotheses are generated by some heuristic mechanism and then tested against the available evidence so that only the best hypotheses survive, we propose a model applying Grammatical Evolution, one type of evolutionary algorithm, to mine OWL axioms from an RDF data repository. In addition, we specialize the model for the specific problem of learning OWL class disjointness axioms, along with experiments performed on DBpedia, one of the prominent examples of LOD. Furthermore, we use different axiom scoring functions based on possibility theory, which are well suited to the open-world assumption scenario of LOD, to evaluate the quality of discovered axioms. Specifically, we propose a set of measures to build objective functions based on single-objective and multi-objective models, respectively.
Finally, to validate the approach, its performance is evaluated against subjective and objective benchmarks, and is also compared to the main state-of-the-art systems.

Genetic learning methods following the ceaseless rule learning (CRL) approach are characterised by solving the training problem in several stages. As a result, they consist of at least two stages: a generation process, which builds a basic set of fuzzy rules representing the knowledge existing within the data set, and a post-processing stage, which refines the previous rule set in order to discard the redundant rules that emerged during the generation stage and to select those fuzzy rules that cooperate in an optimal way.
Genetic Fuzzy Rule-Based Systems (GFRBSs) following the CRL approach are normally called multi-stage GFRBSs. The multi-stage structure is a direct consequence of the way in which GFRBSs based on the CRL approach address the Chance Constrained Programming (CCP). These kinds of systems attempt to solve the CCP in a way that blends the advantages of the Pittsburgh and Michigan approaches [14]. The goal of the CRL approach is to reduce the dimension of the search space by encoding individual rules in a chromosome, as in the Michigan approach, while the evaluation scheme takes the cooperation of rules into account, as in the Pittsburgh approach.
The generation process forces competition between fuzzy rules, as in genetic learning processes grounded in the Michigan approach, to obtain a fuzzy rule set composed of the simplest possible fuzzy rules. To do so, a fuzzy rule generating method is run several times by an iterative covering method that wraps it and analyses the coverage that the successively learnt rules achieve on the training data set. Hence, the cooperation among the fuzzy rules generated in the different runs is only briefly addressed by means of a rule penalty criterion. The later post-processing stage forces cooperation between the fuzzy rules generated in the generation process by refining or eliminating the previously generated redundant or excessive fuzzy rules, so as to obtain a final fuzzy rule set with efficient performance.
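The two CRL stages can be sketched schematically. A hedge: for brevity this toy uses crisp single-condition rules and a weighted-coverage maximiser in place of a genetic fuzzy rule generator, so it only illustrates the covering-and-penalise structure of the generation process and the redundancy-removal post-processing, not an actual GFRBS.

```python
def covers(rule, x):
    """A rule is a tuple of (feature, value) conditions; all must hold."""
    return all(x[f] == v for f, v in rule)

def best_rule(examples, weights, features):
    # One "generation process" run: pick the single-condition rule with the
    # highest weighted coverage of positive examples (stand-in for a GA run).
    candidates = [((f, v),) for f in features for v in (0, 1)]
    return max(candidates, key=lambda r: sum(
        w for (x, y), w in zip(examples, weights) if y and covers(r, x)))

def crl(examples, features, n_rules=2):
    weights = [1.0] * len(examples)
    rules = []
    for _ in range(n_rules):
        r = best_rule(examples, weights, features)
        rules.append(r)
        for i, (x, y) in enumerate(examples):     # penalise covered positives
            if y and covers(r, x):
                weights[i] *= 0.1
    # post-processing: keep a rule only if it uniquely covers some example,
    # i.e. drop rules made redundant by the rest of the set
    kept = [r for r in rules
            if any(covers(r, x) and not any(covers(o, x) for o in rules if o != r)
                   for x, _ in examples)]
    return kept

# toy data: positive iff a = 1 or b = 1
examples = [({'a': 1, 'b': 0}, 1), ({'a': 0, 'b': 1}, 1),
            ({'a': 1, 'b': 1}, 1), ({'a': 0, 'b': 0}, 0)]
rules = crl(examples, ['a', 'b'])
```

The weight decay plays the role of the rule penalty criterion: each learnt rule devalues the examples it covers, steering subsequent runs toward the still-uncovered region.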

Political optimizer (PO) is a relatively recent state-of-the-art meta-heuristic optimization technique for global optimization problems, as well as real-world engineering optimization, which mimics the multi-staged process of politics in human society. However, due to a greedy strategy during the election phase, and an inappropriate balance of global exploration and local exploitation during the party switching stage, it suffers from stagnation in local optima with low convergence accuracy. To overcome these drawbacks, a sequence of novel PO variants is proposed by integrating PO with Quadratic Interpolation, Advance Quadratic Interpolation, Cubic Interpolation, Lagrange Interpolation, Newton Interpolation, and Refraction Learning (RL). The main contributions of this work are as follows. (1) The interpolation strategy is adopted to help the current global optimum jump out of local optima. (2) RL is integrated into PO to improve the diversity of the population. (3) To better balance exploration and exploitation during the party switching stage, a logistic model is proposed. To the best of our knowledge, this is the first time PO has been combined with the interpolation strategy and RL. The performance of the best PO variant is evaluated on 19 widely used benchmark functions and 30 test functions from the IEEE CEC 2014. Experimental results reveal the superior performance of the proposed algorithm in terms of exploration capacity.
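The quadratic interpolation ingredient can be shown in one dimension: fit a parabola through three sampled points and jump to its vertex, which can relocate the current best solution out of a local basin. This is a generic sketch of the standard successive-parabolic-interpolation formula, not the paper's exact per-dimension update.

```python
def quadratic_vertex(a, fa, b, fb, c, fc):
    """Vertex of the parabola through (a, fa), (b, fb), (c, fc).

    x* = b - 1/2 * [(b-a)^2 (fb-fc) - (b-c)^2 (fb-fa)]
               / [(b-a)   (fb-fc) - (b-c)   (fb-fa)]
    """
    num = (b - a) ** 2 * (fb - fc) - (b - c) ** 2 * (fb - fa)
    den = (b - a) * (fb - fc) - (b - c) * (fb - fa)
    return b - 0.5 * num / den

# On an exactly quadratic objective the vertex is the true minimiser.
f = lambda x: (x - 3.0) ** 2 + 1.0
x_star = quadratic_vertex(0.0, f(0.0), 1.0, f(1.0), 4.0, f(4.0))
```

On non-quadratic landscapes the vertex is only a local model's prediction, but as a candidate move it gives the incumbent a chance to leave its current basin, which is the role the interpolation strategy plays in the PO variants.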

The goal of this chapter is to propose an integrated application of an Internet of Things-based Wireless Sensor Network (IoT-based WSN) in the Indian agriculture system, using a hybrid optimization technique and machine learning. The IoT-based WSN can play a great role in collecting various useful data at the ground level. In this chapter, we perform area coverage optimization, which helps cover more area for surveillance. Coverage area optimization for target-area surveillance is always a major concern in agricultural research. A new hybrid algorithm, GA-MWPSO, has been used for solving the non-linear constrained optimization problems. To test the competence of the proposed algorithm, a set of test problems has been taken, solved, and compared with the existing literature. The obtained dataset has been expanded to a larger range to form the training set, which develops the concept of machine learning (ML) and makes it a useful decision-making tool in this research field. The data collected from the target area year after year can be fed to the system to build a supervised machine learning system in this field.
