Chapter · PDF Available

Accelerating Genetic Programming through Graphics Processing Units.

Authors:
  • Machine Intelligence Ltd.

Abstract and Figures

Graphics Processing Units (GPUs) are in the process of becoming a major source of computational power for numerical applications. Originally designed for the application of time-consuming graphics operations, GPUs are stream processors that implement the SIMD paradigm. The true degree of parallelism of GPUs is often hidden from the user, making programming even more flexible and convenient. In this chapter we survey Genetic Programming methods currently ported to GPUs.

Keywords: graphics processing units, parallel processing
... One general tendency for improving GP has been to increase the computational efficiency of training procedures [12,14,19,22,23,89,94]. To improve training efficiency, focus is often placed on the program evaluation phase, since this phase generally involves evaluating hundreds or thousands of computer programs on hundreds or thousands of fitness cases (i.e., data points), for each of hundreds or thousands of generations [12,14,19,22,23,89,94]. Notably, the evolutionary phases of GP also typically operate on the same number of programs across the same number of generations, but these procedures are often not affected by the number of fitness cases, which generally leads to a significant workload imbalance between evaluation and evolution. ...
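As a rough illustration of this imbalance, consider the shape of a generational GP training loop: evaluation does work proportional to the number of programs times the number of fitness cases, while variation touches each program only once per generation. Below is a minimal sketch in plain C++ (valid CUDA host code); the stubs and all names are our own illustration, not taken from any of the cited systems.

```cuda
// Host-side sketch of the workload imbalance: per generation, evaluation
// visits every (program, fitness case) pair, while variation visits each
// program only once. All names are illustrative.
#include <cstdio>
#include <vector>

struct Program { float coeff = 1.0f; };                  // stand-in for a GP tree

static float run(const Program& p, float x) { return p.coeff * x; }       // stub
static void  evolve(std::vector<Program>&, const std::vector<float>&) {}  // stub

int main() {
    const int pop = 1000, cases = 1000, gens = 100;
    std::vector<Program> progs(pop);
    std::vector<float>   xs(cases, 1.0f), fitness(pop, 0.0f);

    for (int g = 0; g < gens; ++g) {
        for (int p = 0; p < pop; ++p)                    // evaluation phase:
            for (int c = 0; c < cases; ++c)              //   pop * cases steps
                fitness[p] += run(progs[p], xs[c]);
        evolve(progs, fitness);                          // evolution phase: pop steps
    }
    std::printf("per generation: %d evaluation steps vs. %d evolution steps\n",
                pop * cases, pop);
    return 0;
}
```

With 1,000 programs and 1,000 fitness cases, evaluation performs a million inner steps per generation against a thousand for evolution, which is why acceleration efforts concentrate on the inner loop.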
Article
Full-text available
This paper establishes the potential of accelerating the evaluation phase of tree-based genetic programming through contemporary field-programmable gate array (FPGA) technology. This exploration stems from the fact that FPGAs can sometimes leverage increased levels of both data and function parallelism, as well as superior power/energy efficiency, when compared to general-purpose CPU/GPU systems. In this investigation, we introduce a fixed-depth, tree-based architecture that can fully parallelize tree evaluation for type-consistent primitives that are unrolled and pipelined. We show that our accelerator on a 14nm FPGA achieves an average speedup of 43× when compared to a recent open-source GPU solution, TensorGP, implemented on 8nm process-node technology, and an average speedup of 4,902× when compared to a popular baseline GP software tool, DEAP, running parallelized across all cores of a 2-socket, 28-core (56-thread), 14nm CPU server. Despite our single-FPGA accelerator being 2.4× slower on average when compared to the recent state-of-the-art Operon tool executing on the same 2-processor, 28-core CPU system, we show that this single-FPGA system is 1.4× better than Operon in terms of performance-per-watt. Importantly, we also describe six future extensions that could provide at least a 64–192× speedup over our current design. Therefore, our initial results provide considerable motivation for the continued exploration of FPGA-based GP systems. Overall, any success in significantly improving runtime and energy efficiency could potentially enable novel research efforts through faster and/or less costly GP runs, similar to how GPUs unlocked the power of deep learning during the past fifteen years.
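To make the fixed-depth, unrolled evaluation scheme more concrete, here is a rough dataflow sketch in plain C++ of a complete binary tree evaluated level by level; the FPGA realizes each level as a hardware pipeline stage, whereas this software loop merely mirrors the dataflow. The node layout, opcode set, and all names are our illustrative assumptions, not the paper's design.

```cuda
// Dataflow sketch of a fixed-depth, type-consistent tree evaluated bottom-up,
// level by level. With DEPTH fixed at compile time, the loop nest can be
// fully unrolled (one pipeline stage per level in the FPGA analogue).
#include <cstdio>

enum Op { ADD, MUL, LEAF };
const int DEPTH = 3;                       // fixed tree depth
const int NODES = (1 << (DEPTH + 1)) - 1;  // complete binary tree: 15 nodes

int main() {
    Op    op [NODES];
    float val[NODES];
    for (int i = 0; i < NODES; ++i) { op[i] = LEAF; val[i] = float(i); }
    for (int i = 0; i < (1 << DEPTH) - 1; ++i)       // interior nodes get
        op[i] = (i % 2) ? MUL : ADD;                 // arbitrary example ops

    for (int level = DEPTH - 1; level >= 0; --level)       // one "stage" per level
        for (int i = (1 << level) - 1; i < (1 << (level + 1)) - 1; ++i) {
            float l = val[2 * i + 1], r = val[2 * i + 2];  // children's results
            val[i] = (op[i] == ADD) ? l + r : l * r;
        }
    std::printf("root value: %f\n", val[0]);
    return 0;
}
```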
... One commonly used method to enhance the speed of GP is by capitalizing on its ability to be parallelized. Several studies have investigated the use of parallel hardware, such as Central Processing Units (CPUs) with Single Instruction Multiple Data (SIMD) capabilities [1,3,7,14,23,28,29], architectures based on Graphics Processing Units (GPUs) [8,21,22,27], and even Field Programmable Gate Arrays (FPGAs) [10,17,39], for executing fitness evaluations in the context of GP. However, GPUs are currently considered the most promising option for achieving significant speed improvements due to their widespread availability and focus on high-throughput computing. ...
... Recently, there have been several attempts to accelerate evolutionary algorithms on the massively parallel GPU architecture. However, many researchers have considered only simple numeric benchmarks without any global data, or with only a very limited data set [7]. This contrasts with real-world problems, where large simulations have to be carried out and fitness evaluation is often the most time-consuming operation. ...
Poster
Full-text available
The aim of this paper is to develop and test an evolutionary algorithm (EA) that generates optimal trajectories for a humanoid robot using a GPU accelerator at multiple levels, taking into consideration effective utilisation of the GPU memory hierarchy and judicious management of parallelism. The model was derived from our serial CPU implementation. This paper presents implementation details of an EA that evolves a recurrent neural network (RNN) on a PC with two GTX 480 GPUs. This combination constructs a controller for the humanoid robot, which is simulated using the Open Dynamics Engine (ODE) library. Since EAs and RNNs are inherently parallel, the GPGPU computing paradigm leads to a promising speedup in the evolutionary phase with respect to the CPU-based version.
... GPU computing has been demonstrated as a powerful approach to achieving high performance in long-running scientific applications in systems biology [37][38][39][40], bioinformatics [41][42][43], data mining [44,45], machine learning [46,47] and microscopy image processing [48,49]. In addition, owing to their ability to apply a given algorithm to multiple data in massively parallel fashion, GPU-based evolutionary computation methods have been proposed [50][51][52]. These parallel methods using GPUs can achieve speedups from 8× for complex bioinformatics data mining problems [53] to 7000× for simpler benchmark functions that can run entirely in the GPU [54]. ...
Article
Full-text available
Reverse engineering mechanistic gene regulatory network (GRN) models with a specific dynamic spatial behavior is an inverse problem without analytical solutions in general. Instead, heuristic machine learning algorithms have been proposed to infer the structure and parameters of a system of equations able to recapitulate a given gene expression pattern. However, these algorithms are computationally intensive as they need to simulate millions of candidate models, which limits their applicability and requires high computational resources. Graphics processing unit (GPU) computing is an affordable alternative for accelerating large-scale scientific computation, yet no method is currently available to exploit GPU technology for the reverse engineering of mechanistic GRNs from spatial phenotypes. Here we present an efficient methodology to parallelize evolutionary algorithms using GPU computing for the inference of mechanistic GRNs that can develop a given gene expression pattern in a multicellular tissue area or cell culture. The proposed approach is based on multi-CPU threads running the lightweight crossover, mutation and selection operators and launching GPU kernels asynchronously. Kernels can run in parallel in a single or multiple GPUs and each kernel simulates and scores the error of a model using the thread parallelism of the GPU. We tested this methodology for the inference of spatiotemporal mechanistic gene regulatory networks (GRNs)—including topology and parameters—that can develop a given 2D gene expression pattern. The results show a 700-fold speedup with respect to a single CPU implementation. This approach can streamline the extraction of knowledge from biological and medical datasets and accelerate the automatic design of GRNs for synthetic biology applications.
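A minimal sketch of the launch pattern described above, assuming CUDA streams: the host enqueues one scoring kernel per candidate model on its own stream so simulations can overlap on the GPU. The kernel body and all names are illustrative stand-ins for the paper's actual simulator.

```cuda
// Each candidate model gets its own CUDA stream; kernels are launched
// asynchronously and may run concurrently on one (or more) GPUs.
#include <cstdio>

__global__ void score_model(const float* params, float* error) {
    // Stand-in for simulating one candidate GRN and scoring its pattern
    // error; each thread would handle one cell or grid point.
    int i = threadIdx.x;
    error[i] = params[i] * params[i];
}

int main() {
    const int MODELS = 4, N = 128;
    float *params, *err;
    cudaMalloc(&params, MODELS * N * sizeof(float));
    cudaMalloc(&err,    MODELS * N * sizeof(float));
    cudaMemset(params, 0, MODELS * N * sizeof(float));   // placeholder values

    cudaStream_t streams[MODELS];
    for (int m = 0; m < MODELS; ++m) cudaStreamCreate(&streams[m]);

    for (int m = 0; m < MODELS; ++m)   // asynchronous, overlapping launches
        score_model<<<1, N, 0, streams[m]>>>(params + m * N, err + m * N);

    cudaDeviceSynchronize();           // wait for all candidate scores
    for (int m = 0; m < MODELS; ++m) cudaStreamDestroy(streams[m]);
    cudaFree(params); cudaFree(err);
    std::printf("scored %d candidate models\n", MODELS);
    return 0;
}
```

Meanwhile, the lightweight crossover, mutation and selection operators can run on host CPU threads, as the paper describes, since they do not depend on the number of simulated data points.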
... Perhaps the first programmable-GPU-based GP was proposed by , where a SIMD interpreter (a GPU programming framework) was developed to evaluate the whole GP population in parallel. Later, Harding, Banzhaf, and Langdon developed a series of parallel GPs (Langdon 2011; Banzhaf et al. 2008; Langdon 2010) based on the Compute Unified Device Architecture (CUDA/C++) and different GP variants. Though these works mainly focused on the parallelization of fitness evaluation, they demonstrated the great potential of GPUs for accelerating the search efficiency of GP. ...
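The SIMD interpreter idea can be sketched as follows: every GPU thread holds one fitness case and a small private stack, and all threads step through the same postfix program in lock-step, so divergence stays low. The opcode set, program encoding, and names below are our illustrative assumptions, not the original system.

```cuda
// Minimal SIMD GP interpreter sketch: one thread per fitness case, all
// threads executing the same postfix instruction at the same time.
#include <cstdio>

enum { PUSH_X, PUSH_1, ADD, MUL };

__global__ void interpret(const int* prog, int len,
                          const float* x, float* out, int n_cases) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n_cases) return;
    float stack[16]; int sp = 0;           // per-thread evaluation stack
    for (int pc = 0; pc < len; ++pc) {     // same instruction for all threads
        switch (prog[pc]) {
            case PUSH_X: stack[sp++] = x[i]; break;
            case PUSH_1: stack[sp++] = 1.0f; break;
            case ADD:    --sp; stack[sp - 1] += stack[sp]; break;
            case MUL:    --sp; stack[sp - 1] *= stack[sp]; break;
        }
    }
    out[i] = stack[0];                     // the program's value on case i
}

int main() {
    const int N = 256;
    int h_prog[] = { PUSH_X, PUSH_X, MUL, PUSH_1, ADD };  // encodes x*x + 1
    int *d_prog; float *d_x, *d_out;
    cudaMalloc(&d_prog, sizeof(h_prog));
    cudaMalloc(&d_x,   N * sizeof(float));
    cudaMalloc(&d_out, N * sizeof(float));
    cudaMemset(d_x, 0, N * sizeof(float));                // all cases: x = 0
    cudaMemcpy(d_prog, h_prog, sizeof(h_prog), cudaMemcpyHostToDevice);
    interpret<<<(N + 127) / 128, 128>>>(d_prog, 5, d_x, d_out, N);
    cudaDeviceSynchronize();
    float h_out0;
    cudaMemcpy(&h_out0, d_out, sizeof(float), cudaMemcpyDeviceToHost);
    std::printf("program value on case 0: %f\n", h_out0);  // expect 1.0
    cudaFree(d_prog); cudaFree(d_x); cudaFree(d_out);
    return 0;
}
```

Because every thread executes the same opcode at the same time, the interpreter avoids divergence entirely; looping over the population's programs, or adding a program index per block, extends the same kernel to whole-population evaluation.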
Article
Full-text available
Genetic programming (GP) is a popular and powerful optimization algorithm that has a wide range of applications, such as time series prediction, classification, data mining, and knowledge discovery. Despite its great success, selecting the proper primitives from a high-dimensional primitive set for GP to construct solutions is still a time-consuming and challenging issue that limits the efficacy of GP in real-world applications. In this paper, we propose a multi-population GP framework with adaptively weighted primitives to address these issues. In the proposed framework, the entire population consists of several sub-populations, and each has a different vector of primitive weights that determines the probability of using the corresponding primitives in that sub-population. By adaptively adjusting the weights of the primitives and periodically sharing information between sub-populations, the proposed framework can efficiently identify important primitives to assist the search. Furthermore, based on the proposed framework and the graphics processing unit computing technique, a high-performance self-learning gene expression programming algorithm (HSL-GEP) is developed. The HSL-GEP is tested on fifteen problems, including four real-world problems. The experimental results demonstrate that the proposed HSL-GEP outperforms several state-of-the-art GPs in terms of both solution quality and search efficiency.
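The weighted-primitive idea can be pictured as roulette-wheel sampling over a sub-population's primitive-weight vector whenever a program node is constructed. The sketch below (plain C++; the weight values and all names are our own illustration, not the HSL-GEP implementation) shows only the sampling step, not the adaptive weight update.

```cuda
// Roulette-wheel sampling of primitives in proportion to a sub-population's
// weight vector. Higher-weighted primitives are chosen more often.
#include <cstdio>
#include <cstdlib>

int sample_primitive(const float* w, int n) {
    float total = 0.0f;
    for (int k = 0; k < n; ++k) total += w[k];
    float r = total * (std::rand() / (float)RAND_MAX);   // spin the wheel
    for (int k = 0; k < n; ++k) {
        r -= w[k];
        if (r <= 0.0f) return k;
    }
    return n - 1;                                        // numeric fallback
}

int main() {
    float weights[4] = { 4.0f, 2.0f, 1.0f, 1.0f };  // one sub-population's vector
    int counts[4] = { 0, 0, 0, 0 };
    for (int t = 0; t < 10000; ++t) counts[sample_primitive(weights, 4)]++;
    for (int k = 0; k < 4; ++k)
        std::printf("primitive %d chosen %d times\n", k, counts[k]);
    return 0;
}
```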
... These two models map naturally onto parallel architectures: MIMD in the case of islands and SIMD in the case of neighbourhoods. Therefore, both perspectives can be combined to develop multiple models of parallel and distributed algorithms (Harding and Banzhaf 2009), which take advantage of the parallel threads in the GPU, the use of multiple GPUs, and the distribution of computation across multiple machines networked with these GPUs. ...
Thesis
Full-text available
This Doctoral Thesis presents new computational models for data classification which address open problems and challenges in data classification by means of evolutionary algorithms. Specifically, we seek to improve the performance, scalability, interpretability and accuracy of classification models on challenging data. The performance and scalability of evolutionary classification models were improved through parallel computation on GPUs, which proved highly efficient at speeding up classification algorithms. The conflicting goals of interpretability and accuracy of classification models were addressed through a highly interpretable classification algorithm which produces very comprehensible classifiers by means of classification rules. Performance on challenging data, such as imbalanced classification, was improved by means of a data gravitation classification algorithm which achieved better classification performance on both balanced and imbalanced data. All the methods proposed in this Thesis were evaluated in a proper experimental framework, using a large number of data sets with diverse dimensionality and comparing their performance against other state-of-the-art and recently published methods of proven quality. The experimental results were verified by applying non-parametric statistical tests which support the better performance of the proposed methods.
... In recent years, parallel computing on GP-GPU (General-Purpose Graphics Processing Unit) hardware has become an affordable and attractive alternative to traditional large and expensive computer clusters. Originally designed for high-performance image processing in computer graphics, movies, games and related applications, GPUs have reached domains as diverse as scientific computing for physics, astronomy, biology, chemistry, geology and other areas, optimization and packet switching in computer networks, and genetic programming and evolutionary computation, among other domains [8,17,32,43]. In this section we summarize the GPU architecture and programming very briefly, just enough for the reader to be able to follow the discussion on the parallelization of the AChem algorithms in Section 5. See Chapter 2 of this book for a more comprehensive overview of GPU hardware, and [17] for a more comprehensive overview of GPU computing applied to the modelling of biochemical systems. ...
Chapter
Full-text available
An Artificial Chemistry is an abstract model of a chemistry that can be used to model real chemical and biological processes, as well as any natural or artificial phenomena involving interactions among objects and their transformations. It can also be used to perform computations inspired by chemistry, including heuristic optimization algorithms akin to evolutionary algorithms, among other usages. Artificial chemistries are conceptually parallel computations, and could greatly benefit from parallel computer architectures for their simulation, especially as GPU hardware becomes widespread and affordable. However, in practice it is difficult to parallelize artificial chemistry algorithms efficiently for GPUs, particularly in the case of stochastic simulation algorithms that model individual molecular collisions and take chemical kinetics into account. This chapter surveys the current state of the art in the techniques for parallelizing artificial chemistries on GPUs, with focus on their stochastic simulation and their applications in the evolutionary computation domain. Since this problem is far from being entirely solved to satisfaction, some suggestions for future research are also outlined.
Chapter
In this paper, we explore the prospect of accelerating tree-based genetic programming (TGP) by way of modern field-programmable gate array (FPGA) devices, which is motivated by the fact that FPGAs can sometimes leverage larger amounts of data/function parallelism, as well as better energy efficiency, when compared to general-purpose CPU/GPU systems. In our preliminary study, we introduce a fixed-depth, tree-based architecture capable of evaluating type-consistent primitives that can be fully unrolled and pipelined. The current primitive constraints preclude arbitrary control structures, but they allow for entire programs to be evaluated every clock cycle. Using a variety of floating-point primitives and random programs, we compare to the recent TensorGP tool executing on a modern 8 nm GPU, and we show that our accelerator implemented on a 14 nm FPGA achieves an average speedup of 43×. When compared to the popular baseline tool DEAP executing across all cores of a 2-socket, 28-core (56-thread), 14 nm CPU server, our accelerator achieves an average speedup of 4,902×. Finally, when compared to the recent state-of-the-art tool Operon executing on the same 2-processor CPU system, our accelerator executes about 2.4× slower on average. Despite not achieving an average speedup over every tool tested, our single-FPGA accelerator is the fastest in several instances, and we describe five future extensions that could allow for a 32–144× speedup over our current design as well as allow for larger program depths/sizes. Overall, we estimate that a future version of our accelerator will constitute a state-of-the-art GP system for many applications.

Keywords: Tree-based genetic programming, Field-programmable gate arrays, Hardware acceleration
Article
Research and development of automatic trading systems are becoming more frequent, as they offer high potential for predicting market movements. The use of these systems makes it possible to manage a huge amount of data related to the factors that affect investment performance (macroeconomic variables, company information, industry indicators, market variables, etc.), while avoiding the psychological reactions of traders when investing in financial markets. Movements in stock markets are continuous throughout each day, which requires that trading systems be supported by more powerful engines, since the amount of data to process grows while the response time required to support operations is shortened. In this chapter we present two parallel implementations of a GA-based trading system. The first uses a Grid Volunteer System based on BOINC, and the second takes advantage of a Graphics Processing Unit implementation.
Conference Paper
Full-text available
As is typical in evolutionary algorithms, fitness evaluation in GP takes the majority of the computational effort. In this paper we demonstrate the use of the Graphics Processing Unit (GPU) to accelerate the evaluation of individuals. We show that for both binary and floating point based data types, it is possible to get speed increases of several hundred times over a typical CPU implementation. This allows for evaluation of many thousands of fitness cases, and hence should enable more ambitious solutions to be evolved using GP.
Article
Full-text available
We demonstrate a SIMD C++ genetic programming system on a single 128 node parallel nVidia GeForce 8800 GTX GPU under RapidMind’s GPGPU Linux software by predicting ten year+ outcome of breast cancer from a dataset containing a million inputs. NCBI GEO GSE3494 contains hundreds of Affymetrix HG-U133A and HG-U133B GeneChip biopsies. Multiple GP runs each with a population of 5 million programs winnow useful variables from the chaff at more than 500 million GPops per second. Sources available via FTP.
Conference Paper
Full-text available
Graphics processor units are fast, inexpensive parallel computing devices. Recently there has been great interest in harnessing this power for various types of scientific computation, including genetic programming. In previous work, we have shown that using the graphics processor provides dramatic speed improvements over a standard CPU in the context of fitness evaluation. In this work, we use Cartesian Genetic Programming to generate shader programs that implement image filter operations. Using the GPU, we can rapidly apply these programs to each pixel in an image and evaluate the performance of a given filter. We show that we can successfully evolve noise removal filters that produce better image quality than a standard median filter.
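The per-pixel evaluation pattern described here maps directly to one GPU thread per output pixel. In the sketch below, a fixed 3×3 averaging body stands in for an evolved CGP filter expression; the kernel, launch configuration, and all names are our illustrative assumptions rather than the authors' shader code.

```cuda
// One thread computes one output pixel; the loop body is a placeholder for
// an evolved filter program applied identically across the image.
#include <cstdio>

__global__ void apply_filter(const float* in, float* out, int w, int h) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < 1 || y < 1 || x >= w - 1 || y >= h - 1) return;  // skip border
    float acc = 0.0f;                     // placeholder for evolved program
    for (int dy = -1; dy <= 1; ++dy)
        for (int dx = -1; dx <= 1; ++dx)
            acc += in[(y + dy) * w + (x + dx)];
    out[y * w + x] = acc / 9.0f;          // 3x3 average as a stand-in
}

int main() {
    const int W = 512, H = 512;
    float *in, *out;
    cudaMalloc(&in,  W * H * sizeof(float));
    cudaMalloc(&out, W * H * sizeof(float));
    cudaMemset(in, 0, W * H * sizeof(float));     // stand-in for a noisy image
    dim3 block(16, 16), grid((W + 15) / 16, (H + 15) / 16);
    apply_filter<<<grid, block>>>(in, out, W, H);
    cudaDeviceSynchronize();
    std::printf("filtered %dx%d image\n", W, H);
    cudaFree(in); cudaFree(out);
    return 0;
}
```

Fitness evaluation would then compare the filtered output against a clean target image, so each evolved filter is scored over every pixel in parallel.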
Conference Paper
Full-text available
We present a new twist on the Beowulf cluster: the Bladed Beowulf. In contrast to traditional Beowulfs, which typically use Intel or AMD processors, our Bladed Beowulf uses Transmeta processors in order to keep thermal power dissipation low and reliability and density high, while still achieving performance comparable to Intel- and AMD-based clusters. Given the ever-increasing complexity of traditional supercomputers and Beowulf clusters, the issues of size, reliability, power consumption, and ease of administration and use will be "the" issues of this decade for high-performance computing. Bigger and faster machines are simply not good enough anymore. To illustrate, we present the results of performance benchmarks on our Bladed Beowulf and introduce two performance metrics that contribute to the total cost of ownership (TCO) of a computing system: performance/power and performance/space.
Article
Recently, graphics processors have emerged as a powerful computational platform. A variety of encouraging results, mostly from researchers using GPUs to accelerate scientific computing and visualization applications, have shown that significant speedups can be achieved by applying GPUs to data-parallel computational problems. However, attaining these speedups requires knowledge of GPU programming and architecture. The preceding chapters have described the architecture of modern GPUs and the trends that govern their performance and design. Continuing from the concepts introduced in those chapters, in this chapter we present intuitive mappings of standard computational concepts onto the special-purpose features of GPUs. After presenting the basics, we introduce a simple GPU programming framework and demonstrate the use of the framework in a short sample program.
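In the same spirit as the chapter's short sample program, here is a minimal complete CUDA program implementing the basic data-parallel map concept, one thread per element; it is our own illustrative example, not the chapter's code.

```cuda
// Minimal complete CUDA program: the standard data-parallel map pattern.
#include <cstdio>

__global__ void saxpy(float a, const float* x, float* y, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];     // one element per thread
}

int main() {
    const int N = 1 << 20;
    float *x, *y;
    cudaMallocManaged(&x, N * sizeof(float));   // unified memory for brevity
    cudaMallocManaged(&y, N * sizeof(float));
    for (int i = 0; i < N; ++i) { x[i] = 1.0f; y[i] = 2.0f; }
    saxpy<<<(N + 255) / 256, 256>>>(3.0f, x, y, N);
    cudaDeviceSynchronize();
    std::printf("y[0] = %f\n", y[0]);           // expect 5.0
    cudaFree(x); cudaFree(y);
    return 0;
}
```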
Article
In machine learning terms DNA (gene) chip data is unusual in having thousands of attributes (the gene expression values) but few (
Conference Paper
A GPU is used to datamine five million correlations between probes within Affymetrix HG-U133A probesets across 6685 human tissue samples from NCBI's GEO database. These concordances are used as machine learning training data for genetic programming running on a Linux PC with a RapidMind OpenGL GLSL backend. GPGPU is used to identify technological factors influencing high-density oligonucleotide array (HDONA) performance. GP suggests that the mismatch (PM/MM) and adenosine/guanine ratios influence microarray quality. Initial results hint that Watson-Crick probe self-hybridisation or folding is not important. Under GPGPU, an nVidia GeForce 8800 GTX interprets 300 million GP primitives/second (300 MGPops, approx. 8 GFLOPS).
Article
This paper describes the successful parallel implementation of genetic programming on a network of processing nodes using the transputer architecture. With this approach, researchers of genetic algorithms and genetic programming can acquire computing power that is intermediate between the power of currently available workstations and that of supercomputers, at intermediate cost. This approach is illustrated by a comparison of the computational effort required to solve a benchmark problem. Because of the decoupled character of genetic programming, our approach achieved a nearly linear speed-up from parallelization. In addition, for the best choice of parameters tested, the use of subpopulations delivered a super-linear speed-up in terms of the ability of the algorithm to solve the problem. Several examples are also presented where the parallel genetic programming system evolved solutions that are competitive with human performance.
Conference Paper
An extension of Cellular Genetic Programming for data classification that induces an ensemble of predictors is presented. Each classifier is trained on a different subset of the overall data, and the classifiers are then combined to classify new tuples by applying a simple majority-voting algorithm, as in bagging. Preliminary results on a large data set show that the ensemble of classifiers trained on samples of the data obtains higher accuracy than a single classifier that uses the entire data set, at a much lower computational cost.
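The combination step is straightforward majority voting: each ensemble member casts a class label per tuple and the most frequent label wins. A minimal sketch in plain C++ follows; the names and the class-count limit are our own illustrative choices, not the paper's code.

```cuda
// Majority-voting combination of ensemble predictions for one tuple.
#include <cstdio>

int majority_vote(const int* votes, int n_voters, int n_classes) {
    int counts[16] = { 0 };               // assumes n_classes <= 16
    for (int v = 0; v < n_voters; ++v) counts[votes[v]]++;
    int best = 0;
    for (int c = 1; c < n_classes; ++c)   // pick the most frequent label
        if (counts[c] > counts[best]) best = c;
    return best;
}

int main() {
    int votes[5] = { 1, 0, 1, 1, 2 };     // labels from 5 ensemble members
    std::printf("predicted class: %d\n", majority_vote(votes, 5, 3));
    return 0;
}
```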