William B. Langdon's research while affiliated with University College London and other places

Publications (331)

Article
We evolve floating point Sextic polynomial populations of genetic programming binary trees for up to a million generations. We observe continued innovation but this is limited by tree depth. We suggest that deep expressions are resilient to learning as they disperse information, impeding evolvability, and the adaptation of highly nested organisms,...
Article
We sample the genetic programming tree search space and show it is smooth since many mutations on many test cases have little or no fitness impact. We generate uniformly at random high order polynomials composed of 12 500 and 750 000 additions and multiplications and follow the impact of small changes to them. From information theory 32 bit floatin...
Preprint
Full-text available
We summarise how a 3.0 GHz 16 core AVX512 computer can interpret the equivalent of up to on average 1103370000000 GPop/s. Citations to existing publications are given. Implementation stress is placed on parallel computing, bandwidth limits and avoiding repeated calculation. Information theory suggests in digital computing, failed disruption pr...
Preprint
Full-text available
We inject a random value into the evaluation of highly evolved deep integer GP trees 9743720 times and find 99.7 percent of test outputs are unchanged, suggesting that the impact of crossover and mutation is dissipated and seldom propagates outside the program. Indeed only errors near the root node have impact and disruption falls exponentially with depth at between exp(-depth/3) and exp(-...
Article
Information-theoretic analysis of large, evolved programs produced by running genetic programming for up to a million generations has shown even functions as smooth and well behaved as floating-point addition and multiplication lose entropy and consequently are robust and fail to propagate disruption to their outputs. This means that, while depende...
Article
Full-text available
We study both genotypic and phenotypic convergence in GP floating point continuous domain symbolic regression over thousands of generations. Subtree fitness variation across the population is measured and shown in many cases to fall. In an expanding region about the root node, both genetic opcodes and function evaluation values are identical or nea...
Preprint
Full-text available
Information theoretic analysis of large evolved programs produced by running genetic programming for up to a million generations has shown even functions as smooth and well behaved as floating point addition and multiplication lose entropy and consequently are robust and fail to propagate disruption to their outputs. This means, while dependent up...
Conference Paper
If a software execution is disrupted, witnessing the execution at a later point may see evidence of the disruption or not. If not, we say the disruption failed to propagate. One name for this phenomenon is software robustness but it appears in different contexts in software engineering with different names. Contexts include testing, security, relia...
Article
We use continuous optimisation and manual code changes to evolve up to 1024 Newton-Raphson numerical values embedded in an open source GNU C library glibc square root sqrt to implement a double precision cube root routine cbrt, binary logarithm log2 and reciprocal square root function for C in seconds. The GI inverted square root x^(-1/2) is far more...
Conference Paper
Limited precision floating point computer implementations of large polynomial arithmetic expressions are nonlinear and dissipative. They are not reversible (irreversible, lack conservation), lose information, and so are robust to perturbations (anti-fragile) and resilient to fluctuations. This gives a largely stable locally flat evolutionary neutra...
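The information loss described in the abstract above can be illustrated with a minimal Python sketch. It is not taken from the paper: the function name `output_changes` and the chosen constants are assumptions for illustration, and it uses Python's 64 bit doubles rather than any particular GP system.

```python
def output_changes(x, delta, big):
    """Return True if perturbing x by delta alters (x + big) - big.

    Hypothetical helper: it evaluates a tiny two-operation expression twice,
    once with x and once with x + delta, and compares the results.
    """
    base = (x + big) - big
    perturbed = ((x + delta) + big) - big
    return perturbed != base

# With a large addend the perturbation is rounded away and never reaches
# the output; with a small addend it survives.
lost = output_changes(0.5, 0.25, 1e16)      # rounding absorbs the change
kept = output_changes(0.5, 0.25, 1.0)       # the change propagates
```

Near 1e16 adjacent doubles are 2 apart, so both 0.5 and 0.75 round away when added, and the two evaluations give identical outputs.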
Chapter
Often GP evolves side effect free trees. These pure functional expressions can be evaluated in any order. In particular they can be interpreted from the genetic modification point outwards. Incremental evaluation exploits the fact that: in highly evolved children the semantic difference between child and parent falls with distance from the syntacti...
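Evaluating outwards from the modification point, as the abstract above describes, might be sketched as follows in Python. This is an assumption-laden illustration rather than the chapter's implementation: the `Node` class, the cached `value` field and `replace_leaf` are hypothetical names, and only side effect free binary operators are assumed.

```python
import operator

class Node:
    """Binary expression tree node with a cached value (hypothetical helper)."""
    def __init__(self, op=None, left=None, right=None, value=None):
        self.op, self.left, self.right = op, left, right
        self.parent = None
        for child in (left, right):
            if child is not None:
                child.parent = self
        # Leaves carry a constant; internal nodes cache their evaluated value.
        self.value = value if op is None else op(left.value, right.value)

def replace_leaf(leaf, new_value):
    """Change a leaf, then re-evaluate from the change point towards the root.

    Returns the number of internal nodes re-evaluated. The walk stops as soon
    as a node's new value equals its cached value, because no node above it
    can then change.
    """
    leaf.value = new_value
    steps = 0
    node = leaf.parent
    while node is not None:
        new = node.op(node.left.value, node.right.value)
        steps += 1
        if new == node.value:
            break
        node.value = new
        node = node.parent
    return steps

# Tree for ((a * zero) + b) + c
a, zero, b, c = (Node(value=v) for v in (3.0, 0.0, 5.0, 1.0))
prod = Node(operator.mul, a, zero)
inner = Node(operator.add, prod, b)
root = Node(operator.add, inner, c)
```

Here changing `a` is absorbed by the multiplication by zero after one re-evaluation, whereas changing `b` has to be carried all the way to the root.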
Article
Following Prof. Mark Harman of Facebook's keynote and formal presentations (which are recorded in the proceedings) there was a wide ranging discussion at the eighth international Genetic Improvement workshop, GI-2020 @ ICSE (held as part of the International Conference on Software Engineering on Friday 3rd July 2020). Topics included industry...
Conference Paper
Following Prof. Mark Harman of Facebook's keynote and formal presentations (which are recorded in the proceedings) there was a wide ranging discussion at the eighth international Genetic Improvement workshop, GI-2020 @ ICSE (held as part of the 42nd ACM/IEEE International Conference on Software Engineering on Friday 3rd July 2020). Topics included...
Preprint
C++ code snippets from a multi-core parallel memory-efficient crossover for genetic programming are given. They may be adapted for separate generation evolutionary algorithms where large chromosomes or small RAM require no more than M + (2 times nthreads) simultaneously active individuals.
Preprint
Full-text available
Following Prof. Mark Harman of Facebook's keynote and formal presentations (which are recorded in the proceedings) there was a wide ranging discussion at the eighth international Genetic Improvement workshop, GI-2020 @ ICSE (held as part of the 42nd ACM/IEEE International Conference on Software Engineering on Friday 3rd July 2020). Topics included...
Article
Full-text available
The journal and in particular the resource reviews have been running for 20 years. We summarise the GP literature, including top papers and authors, as seen by users of the genetic programming bibliography. We then revisit our original goals for GPEM book reviews and compare them with what has been achieved.
Chapter
Many functions, such as square root, are approximated and sped up with lookup tables containing pre-calculated values.
Article
Software is vital to modern life, yet much of it is old and suffers from bit-rot. There are not and never will be enough software experts to keep it all up to date by hand. Instead we suggest combining data driven learning with evolutionary search to maintain computer systems.
Preprint
random_tree() is a linear time and space C++ implementation able to create trees of up to a billion nodes for genetic programming and genetic improvement experiments. A 3.60GHz CPU can generate more than 18 million random nodes for GP program trees per second.
Conference Paper
Many functions, such as square root, are approximated and sped up with lookup tables containing pre-calculated values. We introduce an approach using genetic algorithms to evolve such lookup tables for any smooth function. It provides double precision and calculates most values to the closest bit, and outperforms reference implementations in most c...
Article
Full-text available
We report the discussion session at the sixth international Genetic Improvement workshop, GI-2019 @ ICSE, which was held as part of the 41st ACM/IEEE International Conference on Software Engineering on Tuesday 28th May 2019. Topics included GI representations, the maintainability of evolved code, automated software testing, future areas of GI res...
Conference Paper
We evolve floating point Sextic polynomial populations of genetic programming binary trees for up to a million generations. Programs with almost 400 000 000 instructions are created by crossover. To support unbounded Long-Term Evolution Experiment LTEE GP we use both SIMD parallel AVX 512 bit instructions and 48 threads to yield performance of up t...
Conference Paper
We modified GPQuick to use SIMD parallel floating point AVX 512 bit instructions and 48 threads to give up to 139 billion GP operations per second, 139 giga GPops, on a single Intel Xeon Gold 6126 2.60 GHz server. The multi-threaded single instruction multiple data genetic programming GP interpreter has evolved binary trees of more than 396 million...
Conference Paper
CMA-ES plus manual code changes rapidly transforms 512 Newton-Raphson start points from a GNU C library table driven version of sqrt into a double precision reciprocal square root function. The GI x^(-1/2) is far more accurate than Quake's InvSqrt, Quare root.
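A table driven Newton-Raphson reciprocal square root of the general kind mentioned above can be sketched in Python. This is only an illustration under stated assumptions: it is neither the glibc code nor the evolved function, the start points are computed directly with `math.sqrt` instead of being found by CMA-ES, and `table_rsqrt`, `START` and the binning scheme are hypothetical.

```python
import math

TABLE_SIZE = 512          # number of start points, as in the abstract
LO, HI = 0.5, 2.0         # mantissa range once the exponent is made even
STEP = (HI - LO) / TABLE_SIZE

# Pre-calculated start points: reciprocal square root of each bin's midpoint.
START = [1.0 / math.sqrt(LO + (i + 0.5) * STEP) for i in range(TABLE_SIZE)]

def table_rsqrt(x, iterations=3):
    """Approximate x**-0.5 from a table start point plus Newton-Raphson."""
    if x <= 0.0:
        raise ValueError("x must be positive")
    m, e = math.frexp(x)          # x == m * 2**e with 0.5 <= m < 1
    if e % 2:                     # force an even exponent so it halves exactly
        m, e = m * 2.0, e - 1     # now 0.5 <= m < 2
    i = min(int((m - LO) / STEP), TABLE_SIZE - 1)
    y = START[i]
    for _ in range(iterations):   # Newton-Raphson step for f(y) = 1/y**2 - m
        y = y * (1.5 - 0.5 * m * y * y)
    return math.ldexp(y, -(e // 2))
```

Because each start point is already within a fraction of a percent of the answer, a few quadratically converging iterations are enough in this sketch; a finer table trades memory for fewer iterations.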
Conference Paper
Automated search in the form of the Covariance Matrix Adaptation Evolution Strategy (CMA-ES), plus manual code changes, transforms 512 Newton-Raphson floating point start numbers from an open source GNU C library, glibc, table driven square root function to create a new bespoke custom mathematical implementation of double precision binary logarithm...
Preprint
Full-text available
We report the discussion session at the sixth international Genetic Improvement workshop, GI-2019 @ ICSE, which was held as part of the 41st ACM/IEEE International Conference on Software Engineering on Tuesday 28th May 2019. Topics included GI representations, the maintainability of evolved code, automated software testing, future areas of GI resea...
Chapter
Using 512 bit Advanced Vector Extensions, previous development history and Intel documentation, BNF grammar based genetic improvement automatically ports RNAfold to AVX, giving up to a 1.77 fold speed up. The evolved code pull request is an accepted GI software maintenance update to bioinformatics package ViennaRNA.
Conference Paper
Using 512 bit Advanced Vector Extensions, previous development history and Intel documentation, BNF grammar based genetic improvement automatically ports RNAfold to AVX, giving up to a 1.77 fold speed up. The evolved code pull request is an accepted GI software maintenance update to bioinformatics package ViennaRNA.
Preprint
We evolve floating point Sextic polynomial populations of genetic programming binary trees for up to a million generations. Programs with almost four hundred million instructions are created by crossover. To support unbounded Long-Term Evolution Experiment LTEE GP we use both SIMD parallel AVX 512 bit instructions and 48 threads to yield performanc...
Article
The GECCO 2018 conference in Kyoto, Japan hosted the 15th annual "Humies" Awards. The first annual "Humies" competition was held at the 2004 Genetic and Evolutionary Computation Conference (GECCO-2004) in Seattle (USA). With its generous prize money (provided by John Koza) it has become a staple of the Genetic and Evolutionary Computing calendar. Th...
Preprint
Report on Humies competition at GECCO 2018 in Japan
Conference Paper
We document the program and the immediate outcomes of Dagstuhl Seminar 18052 “Genetic Improvement of Software”. The seminar brought together researchers in Genetic Improvement (GI) and related areas of software engineering to investigate what is achievable with current technology and the current impediments to progress and how GI can affect the sof...
Preprint
Full-text available
We add CUDA GPU C program code to RNAfold both to enable it to be run on nVidia gaming graphics hardware and so that many thousands of RNA secondary structures can be computed in parallel. RNAfold predicts the folding pattern for RNA molecules by using O(n³) dynamic programming matrices to minimise the free energy of treating them as a sequence...
Conference Paper
Genetic improvement might be widely used to adapt existing numerical values within programs. Applying GI to embedded parameters in computer code can create new functionality. For example, CMA-ES can evolve 1024 real numbers in a GNU C library square root to implement a cube root routine for C.
Conference Paper
Grow and graft genetic programming (GGGP) evolves more than 50000 parameters in a state-of-the-art C program to make functional source code changes which give more accurate predictions of how RNA molecules fold up. Genetic improvement updates 29% of the dynamic programming free energy model parameters. In most cases (50.3%) GI gives better results...
Chapter
Trying all simple changes (first order mutations) to executed C, C++ and CUDA source code shows software engineering artefacts are more robust than is often assumed. Of those that compile, up to 89 % run without error. Indeed a few mutants are improvements. Program fitness landscapes are smoother. Analysis of these programs, a parallel nVidia GPGPU...
Preprint
Full-text available
Grow and graft genetic programming (GGGP) can automatically evolve an existing state-of-the-art program to give more accurate predictions of the secondary structures adopted by RNA molecules using their base sequence alone. That is, genetic improvement (GI) can make functional as well as non-functional source code changes.
Article
Full-text available
Background: BarraCUDA is an open source C program which uses the BWA algorithm in parallel with nVidia CUDA to align short next generation DNA sequences against a reference genome. Recently its source code was optimised using “Genetic Improvement”. Results: The genetically improved (GI) code is up to three times faster on short paired end reads from...
Article
Background: BarraCUDA is an open source C program which uses the BWA algorithm in parallel with nVidia CUDA to align short next generation DNA sequences against a reference genome. Recently its source code was optimised using “Genetic Improvement”. Results: The genetically improved (GI) code is up to three times faster on short paired end reads fro...
Conference Paper
RNAfold predicts the secondary structure of RNA molecules from their base sequence. We apply a mixture of manual and automated genetic improvements to its C source. GI gives a 1.6% improvement to parallel SSE4.1 code. The automatic programming evolutionary system has access to Intel library code and previous revisions. On 4 666 curated structures f...
Conference Paper
Evolving binary mux-6 trees for up to 100 000 generations, during which some programs grow to more than a hundred million nodes, suggests the landscape which GP explores contains some very smooth regions. Although the GP population evolves under crossover, our unbounded GP appears not to evolve building blocks. We do see periods of tens even hundre...
Conference Paper
There is a cultural divide between computer scientists and biologists that needs to be addressed. The two disciplines used to be quite unrelated but many new research areas have arisen from their synergy. We selectively review two multi-disciplinary problems: dealing with contamination in sequencing data repositories and improving software using bi...
Article
Genetic improvement uses computational search to improve existing software while retaining its partial functionality. Genetic improvement has previously been concerned with improving a system with respect to all possible usage scenarios. In this paper, we show how genetic improvement can also be used to achieve specialisation to a specific set of u...
Conference Paper
We propose the use of search based learning from existing open source test suites to automatically generate partially correct test oracles. We argue that mutation testing and n-version computing (augmented by deep learning and other soft computing techniques), will be able to predict whether a program’s output is correct sufficiently accurately to b...
Article
Full-text available
Genetic improvement uses automated search to find improved versions of existing software. We present a comprehensive survey of this nascent field of research with a focus on the core papers in the area published between 1995 and 2015. We identified core publications including empirical studies, 96% of which use evolutionary algorithms (genetic prog...
Article
Following UCL spin-out DeepMind's success at beating the world Go champion, there was very much a flavour of artificial intelligence (AI) in the air. For example Deep Learning was the topic of Juergen Schmidhuber's invited plenary talk and some of the competitions, for example, Diego Perez and Simon Lucas' General Video Game AI Competition (winner...
Conference Paper
Full-text available
High order mutation analysis of a software engineering benchmark, including schema and local optima networks, suggests program improvements may not be as hard to find as is often assumed. (1) Bit-wise genetic building blocks are not deceptive and can lead to all global optima. (2) There are many neutral networks, plateaux and local optima, neverthe...
Article
Full-text available
We survey genetic improvement (GI) of general purpose computing on graphics cards. We summarise several experiments which demonstrate four themes. Experiments with the gzip program show that genetic programming can automatically port sequential C code to parallel code. Experiments with the StereoCamera program show that GI can upgrade legacy parall...
Conference Paper
Trying all simple changes (first order mutations) to executed C, C++ and CUDA source code shows software engineering artefacts are more robust than is often assumed. Of those that compile, up to 89 % run without error. Indeed a few mutants are improvements. Program fitness landscapes are smoother. Analysis of these programs, a parallel nVidia GPGPU...
Preprint
Full-text available
Typically BarraCUDA uses CUDA graphics cards to map DNA reads to the human genome. Previously its software source code was genetically improved for short paired end next generation sequences. On longer, 150 base paired end noisy Cambridge Epigenetix’s data, a Pascal GTX 1080 processes about 10000 strings per second, comparable with twin nVidia Tes...
Conference Paper
ACGI respects the Application Programming Interface whilst using genetic programming to optimise the implementation of the API. It reduces the scope for improvement but it may smooth the path to GI acceptance because the programmer’s code remains unaffected; only library code is modified. We applied ACGI to C++ software for the state-of-the-art Ope...
Conference Paper
In steady state Twin GP both children created by sub-tree crossover and point mutation are used. They are born together and die together. Evolution is little changed. Indeed fitness selection using the twin’s co-conceived doppelganger is possible.
Conference Paper
We give a model of parallel distributed genetic improvement. With modern low cost power monitors, high speed Ethernet LAN latency and network jitter have little effect. The model calculates a minimum usable mutation effect based on the analogue to digital converter (ADC)’s resolution and shows the optimal test duration is inversely proportional to...
Conference Paper
BarraCUDA uses CUDA graphics cards to map DNA reads to the human genome. Previously its software source code was genetically improved for short paired end next generation sequences. On longer noisy epigenetics strings using nVidia Titan and twin Tesla K40 the same GI-ed code is more than 3 times faster than bwa-meth on an 8 core CPU.
Conference Paper
Full-text available
Automatic Programming has long been a sub-goal of Artificial Intelligence (AI). It is feasible in limited domains. Genetic Improvement (GI) has expanded these dramatically to more than 100 000 lines of code by building on human written applications. Further scaling may need key advances in both Search Based Software Engineering (SBSE) and Evolution...
Article
Trying all hopeful high order mutations to source code shows none of the first order schema of the triangle software engineering benchmark are deceptive. Indeed these unit building blocks lead to all global optima. This suggests program improvements may not be as hard to find as is often assumed.
Conference Paper
Genetic programming (GP) can increase computer programs’ functional and non-functional performance. It can automatically port or refactor legacy code written by domain experts. Working with programmers it can grow and graft (GGGP) new functionality into legacy systems and parallel Bioinformatics GPGPU code. We review Genetic Improvement (GI) and SB...
Conference Paper
We introduce a ‘grow and serve’ approach to Genetic Improvement (GI) that grows new functionality as a web service running on the Django platform. Using our approach, we successfully grew and released a citation web service. This web service can be invoked by existing applications to introduce a new citation counting feature. We demonstrate that GI...
Conference Paper
Full-text available
We genetically improve BarraCUDA using a BNF grammar incorporating C scoping rules with GP. BarraCUDA maps next generation DNA sequences to the human genome using the Burrows-Wheeler algorithm (BWA) on nVidia Tesla parallel graphics hardware (GPUs). GI using phenotypic tabu search with manually grown code can graft new features giving more than 100...
Conference Paper
Grow and graft genetic programming greatly improves GPGPU dynamic programming software for predicting the minimum binding energy for folding of RNA molecules. The parallel code inserted into the existing CUDA version of pknots was grown using a BNF grammar. On an nVidia Tesla K40 GPU GGGP gives a speed up of up to 10000 times.
Article
Full-text available
Genetic studies are increasingly based on short noisy next generation scanners. Typically complete DNA sequences are assembled by matching short NextGen sequences against reference genomes. Despite considerable algorithmic gains since the turn of the millennium, matching both single ended and paired end strings to a reference remains computationall...
Conference Paper
Genetic programming can optimise software, including: evolving test benchmarks, generating hyper-heuristics by searching meta-heuristics, generating communication protocols, composing telephony systems and web services, generating improved hashing and C++ heap managers, redundant programming and even automatic bug fixing. Particularly in embedded r...
Article
Genetic programming can optimise software, including: evolving test benchmarks, generating hyper-heuristics by searching meta-heuristics, generating communication protocols, composing telephony systems and web services, generating improved hashing and C++ heap managers, redundant programming and even automatic bug fixing. Particularly in embedded r...
Article
We show that the genetic improvement of programs (GIP) can scale by evolving increased performance in a widely-used and highly complex 50000 line system. Genetic improvement of software for multiple objective exploration (GISMOE) found code that is 70 times faster (on average) and yet is at least as good functionally. Indeed, it even gives a small...
Chapter
Genetic programming (GP) can dramatically increase computer programs’ performance. It can automatically port or refactor legacy code written by domain experts and specialist software engineers. After reviewing SBSE research on evolving software we describe an open source parallel StereoCamera image processing application in which GI optimisation ga...
Conference Paper
This paper presents a survey of work on Search Based Software Engineering (SBSE) for Software Product Lines (SPLs). We have attempted to be comprehensive, in the sense that we have sought to include all papers that apply computational search techniques to problems in software product line engineering. Having surveyed the recent explosion in SBSE f...
Conference Paper
Adding new functionality to an existing, large, and perhaps poorly-understood system is a challenge, even for the most competent human programmer. We introduce a ‘grow and graft’ approach to Genetic Improvement (GI) that transplants new functionality into an existing system. We report on the trade offs between varying degrees of human guidance to t...
Article
Genetic Improvement (GI) is shown to optimise, in some cases by more than 35 percent, a critical component of healthcare industry software across a diverse range of six nVidia graphics processing units (GPUs). GP and other search based software engineering techniques can automatically optimise the current rate limiting CUDA parallel function in the...
Article
New features of the genetic programming bibliography include graphical displays of recent Internet based paper download activity, HTML web pages identifying centres of GP expertise, new papers and a blog.
Article
This paper presents a brief outline of an approach to online genetic improvement. We argue that existing progress in genetic improvement can be exploited to support adaptivity. We illustrate our proposed approach with a 'dreaming smart device' example that combines online and offline machine learning and optimisation.
Article
Full-text available
Background: In silico Biology is increasingly important and is often based on public data. While the problem of contamination is well recognised in microbiology labs the corresponding problem of database corruption has received less attention. Results: Mapping 50 billion next generation DNA sequences from The Thousand Genome Project against published...
Conference Paper
Genetic Programming (GP) may dramatically increase the performance of software written by domain experts. GP and autotuning are used to optimise and refactor legacy GPGPU C code for modern parallel graphics hardware and software. Speed ups of more than six times on recent nVidia GPU cards are reported compared to the original kernel on the same har...
Conference Paper
Genetic Improvement (GI) is a form of Genetic Programming that improves an existing program. We use GI to evolve a faster version of a C++ program, a Boolean satisfiability (SAT) solver called MiniSAT, specialising it for a particular problem class, namely Combinatorial Interaction Testing (CIT), using automated code transplantation. Our GI-evolved...
Conference Paper
This paper overviews the application of Search Based Software Engineering (SBSE) to reverse engineering with a particular emphasis on the growing importance of recent developments in genetic programming and genetic improvement for reverse engineering. This includes work on SBSE for remodularisation, refactoring, regression testing, syntax-preservin...
Conference Paper
Genetic Programming (GP) has long been applied to several SBSE problems. Recently there has been much interest in using GP and its variants to solve demanding problems in which the code evolved by GP is intended for deployment. This paper investigates the application of genetic improvement to a challenging problem of improving a well-studied system...
Conference Paper
At least 473 Affymetrix HG-U133 +2 Homo sapiens probes match one or more species of mycoplasma. Analysis of published data from thousands of human GeneChips finds correlations in Homo sapiens studies between different microbiology laboratories in different countries which suggests contamination with mycoplasma is the common factor. This highlights...
Article
We study a generic program to investigate the scope for automatically customising it for a vital current task, which was not considered when it was first written. In detail, we show genetic programming (GP) can evolve models of aspects of BLAST's output when it is used to map Solexa Next-Gen DNA sequences to the human genome.
Article
We have recently used genetic programming to automatically generate an improved version of Langmead's DNA read alignment tool Bowtie2 (Sect. 5.3 of RN/12/09). We find it runs more than four times faster than the Bioinformatics sequencing tool (BWA) currently used with short next generation paired end DNA sequences by the Cancer Institute, takes less memo...
Article
Full-text available
Optimising programs for non-functional properties such as speed, size, throughput, power consumption and bandwidth can be demanding; pity the poor programmer who is asked to cater for them all at once! We set out an alternate vision for a new kind of software development environment inspired by recent results from Search Based Software Engineering...
Article
The Emerald supercomputer contains 1008 x86 CPU cores and 372 nVidia M2090 Tesla. A CUDA GPGPU genetic programming GeneChip datamining application which searches for non-linear gene expression based prediction of long term survival following breast cancer surgery was transferred without change and run on part of the Emerald cluster. An average of 3...
Article
Full-text available
We document a parallel non-recursive beam search GPGPU FCA algorithm written in nVidia CUDA C. We run it on benchmarks and to analyse software module dependency. Despite kernel sort removing repeated calculations, 32 bit packing and optimising GPU data structures and kernels, we do not yet see major speed ups. Instead GeForce 295 GTX and Tesla C205...