Conference Paper

Explainable A.I.: The Promise of Genetic Programming Multi-run Subtree Encapsulation


... Being a white-box technique, GP can deliver on the promise of XAI (Howard & Edwards, 2018). It can not only create interpretable models but also unlock the behaviour of black-box models. ...
Article
Genetic programming (GP), a widely used Evolutionary Computing technique, suffers from bloat: the problem of excessive growth in individuals' sizes. As a result, its ability to explore complex search spaces efficiently is reduced, and the resulting solutions are less robust and generalisable. Moreover, models that contain bloat are difficult to understand and explain. This phenomenon is well researched, primarily from the angle of controlling bloat; instead, our focus in this paper is to review the literature from an explainability point of view, by looking at how simplification can make GP models more explainable by reducing their sizes. Simplification is a code editing technique whose primary purpose is to make GP models more explainable; when implemented and applied with caution, it can also offer bloat control as an additional benefit. Researchers have proposed several simplification techniques and adopted various strategies to implement them. We organise the literature along multiple axes to identify the relative strengths and weaknesses of simplification techniques and to identify emerging trends and areas for future exploration. We highlight design and integration challenges and propose several avenues for research. One of them is to consider simplification as a standalone operator, rather than an extension of the standard crossover or mutation operators. Its role is then more clearly complementary to other GP operators, and it can be integrated as an optional feature into an existing GP setup. Another proposed avenue is to address the underuse of complexity measures in simplification. So far, size is the most discussed measure, with only two pieces of prior work pointing out the benefits of using time as a measure when controlling bloat.
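To make simplification concrete, the following is a minimal sketch of one family of techniques such a review covers: rule-based algebraic simplification applied bottom-up to an expression tree. The tuple encoding `(op, left, right)`, the operator set and the specific rewrite rules (identity elements, multiplication by zero, `e - e`, constant folding) are illustrative assumptions, not a technique taken from any particular surveyed paper.

```python
import operator

# Hypothetical encoding: internal nodes are (op, left, right) tuples,
# leaves are variable names (str) or numeric constants.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def size(tree):
    """Number of nodes in the tree."""
    if isinstance(tree, tuple):
        return 1 + size(tree[1]) + size(tree[2])
    return 1


def is_const(node):
    return isinstance(node, (int, float))


def simplify(tree):
    """Simplify the children first, then apply local rewrite rules at this node."""
    if not isinstance(tree, tuple):
        return tree
    op, left, right = tree[0], simplify(tree[1]), simplify(tree[2])
    if is_const(left) and is_const(right):  # constant folding
        return OPS[op](left, right)
    if op == "+":
        if left == 0:
            return right
        if right == 0:
            return left
    elif op == "-":
        if right == 0:
            return left
        if left == right:  # structurally identical operands: e - e -> 0
            return 0
    elif op == "*":
        if left == 0 or right == 0:
            return 0
        if left == 1:
            return right
        if right == 1:
            return left
    return (op, left, right)


bloated = ("+", ("*", "x", 1), ("-", ("+", "y", 2), ("+", "y", 2)))
print(size(bloated), simplify(bloated), size(simplify(bloated)))
```

Rules such as `e * 0 -> 0` assume finite values; a GP system using protected operators or floating-point special values would need to check that each rule preserves its semantics before applying it.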
... knowledgebase), salience mapping, sensitivity-based analysis, feature importance, fuzzy-based, neural-network-based and genetic-programming-based. These techniques use one of three basic evaluation approaches: application-grounded, human-grounded and functionally grounded [1], [2], [8]-[11]. ...
... Their simple structure allows the end user to easily understand the contribution of individual features to the predictions and to visualise them, together with the shape functions, with bar and line charts. Multi-Run Subtree Encapsulation, which comes from the genetic programming (GP) realm, was proposed in [237] as a way to generate simpler tree-based GP programs. If a tree contains subtrees that differ in structure but evaluate to the same vector of results, they are considered to be the same subtree. ...
Article
Machine and deep learning have proven their utility to generate data-driven models with high accuracy and precision. However, their non-linear, complex structures are often difficult to interpret. Consequently, many scholars have developed a plethora of methods to explain their functioning and the logic of their inferences. This systematic review aimed to organise these methods into a hierarchical classification system that builds upon and extends existing taxonomies by adding a significant dimension: the output formats. The reviewed scientific papers were retrieved by conducting an initial search on Google Scholar with the keywords “explainable artificial intelligence”, “explainable machine learning” and “interpretable machine learning”. A subsequent iterative search was carried out by checking the bibliographies of these articles. The addition of the explanation-format dimension makes the proposed classification system a practical tool for scholars, helping them select the most suitable type of explanation format for the problem at hand. Given the wide variety of challenges faced by researchers, the existing XAI methods provide several solutions to meet requirements that differ considerably between the users, problems and application fields of artificial intelligence (AI). The task of identifying the most appropriate explanation can be daunting, hence the need for a classification system that helps with the selection of methods. This work concludes by critically identifying the limitations of the formats of explanations and by providing recommendations and possible future research directions on how to build a more generally applicable XAI method. Future work should be flexible enough to meet the many requirements posed by the widespread use of AI in several fields, and the new regulations.
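The excerpt above summarises the equivalence test behind Multi-Run Subtree Encapsulation: subtrees with different structure but the same vector of results are treated as one. A minimal sketch of that test, assuming a hypothetical tuple encoding for trees and a small set of integer fitness cases, groups candidate subtrees by the tuple of values they produce. It illustrates only the equivalence check, not the full multi-run method.

```python
import operator
from collections import defaultdict

# Hypothetical tree encoding: (op, left, right) tuples; leaves are variable
# names (str) or numeric constants.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def evaluate(tree, env):
    """Evaluate a tree on one fitness case (a dict of variable values)."""
    if isinstance(tree, tuple):
        op, left, right = tree
        return OPS[op](evaluate(left, env), evaluate(right, env))
    if isinstance(tree, str):
        return env[tree]
    return tree


def group_by_result_vector(subtrees, cases):
    """Map each vector of results over the fitness cases to the subtrees producing it."""
    groups = defaultdict(list)
    for sub in subtrees:
        key = tuple(evaluate(sub, env) for env in cases)
        groups[key].append(sub)
    return dict(groups)


cases = [{"x": v} for v in range(4)]
candidates = [("*", "x", 2), ("+", "x", "x"), ("*", "x", "x")]
for vector, members in group_by_result_vector(candidates, cases).items():
    print(vector, members)
```

Equality here only holds over the sampled fitness cases, so two subtrees grouped together are equivalent for the problem at hand rather than provably identical; with floating-point outputs, the key would typically be rounded before grouping.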
... One such technique is genetic programming (GP). It is not only intrinsically transparent but can also be employed to generate explanations of black-box models [8,10]. Thus, it has the potential to achieve good performance without compromising explainability. ...
Conference Paper
The last decade has seen amazing performance improvements in deep learning. However, the black-box nature of this approach makes it difficult to provide explanations of the generated models. In some fields, such as psychology and neuroscience, this limitation in explainability and interpretability is an important issue. Approaches such as genetic programming are well positioned to take the lead in these fields because of their inherent white-box nature. Genetic programming, inspired by the Darwinian theory of evolution, is a population-based search technique capable of intelligently exploring a high-dimensional search space and discovering multiple solutions. However, it is prone to generating very large solutions, a phenomenon often called “bloat”. Bloated solutions are not easily understandable. In this paper, we propose two techniques for simplifying the generated models. Both techniques are tested by generating models for a well-known psychology experiment. The validity of these techniques is further tested by applying them to a symbolic regression problem. Several population dynamics are studied to make sure that these techniques do not compromise diversity, an important factor in finding better solutions. The results indicate that the two techniques can be applied both independently and simultaneously, and that they find solutions on par with those generated by the standard GP algorithm, but with significantly reduced program size. There was no loss in diversity nor any reduction in overall fitness. In fact, in some experiments, the two techniques even improved fitness.
... Transparent Generalized Additive Model Tree (TGAMT) [313] was proposed as an explainable and transparent method that uses a CART-like greedy recursive search to grow the tree. Multi-Run Subtree Encapsulation, which comes from the genetic programming (GP) realm, was proposed in [314] as a way to generate simpler tree-based GP programs. If a tree contains subtrees that differ in structure but evaluate to the same vector of results, they are considered to be the same subtree. ...
Preprint
Explainable Artificial Intelligence (XAI) has grown significantly over the last few years. This is due to the widespread application of machine learning, particularly deep learning, which has led to highly accurate models that lack explainability and interpretability. A plethora of methods to tackle this problem have been proposed, developed and tested. This systematic review contributes to the body of knowledge by clustering these methods with a hierarchical classification system with four main clusters: review articles, theories and notions, methods and their evaluation. It also summarises the state of the art in XAI and recommends future research directions.
... While the rules evolved by genetic programming for educational data mining do provide some explanation of the reasons for the conclusion or action arrived at, these explanations may not be clear, may be hard to read due to introns, or may lack the required detail. The combination of explainable artificial intelligence [1,11] and genetic programming needs to be examined to enhance explanations and feedback. ...
Article
Since its inception, genetic programming, and later variations such as grammar-based genetic programming and grammatical evolution, have contributed to various domains such as classification, image processing and search-based software engineering, amongst others. This paper examines the role that genetic programming has played in education. The paper first provides an overview of the impact that genetic programming has had on teaching and learning. The use of genetic programming in intelligent tutoring systems, predicting student performance and designing learning environments is examined. A critical analysis of genetic programming in education is provided. The paper then examines future directions of research and challenges in the application of genetic programming in education.
Article
Artificial neural networks (ANNs) are widely used in mission-critical systems that directly affect human life, such as healthcare, self-driving vehicles and the military, and in predicting data related to these systems. However, the black-box nature of ANN algorithms makes their use in mission-critical applications difficult, while raising ethical and forensic concerns that lead to a lack of trust. As AI develops and takes up an ever larger place in our lives, it has become clear that the results obtained from these algorithms should be more explainable and understandable. Explainable Artificial Intelligence (XAI) is a field of AI that supports a set of tools, techniques and algorithms that can create high-quality, interpretable, intuitive, human-understandable explanations of AI decisions. In this study, a new model-agnostic explainability method for the financial sector has been developed using stock market data. This method enables us to understand the relationship between the inputs given to the created model and the outputs obtained from it. All inputs were evaluated both individually and in combination, and the evaluation results are shown in tables and graphics. This model will also help create an explainable layer for different machine learning algorithms and application areas.
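The abstract above does not describe how inputs are "evaluated individually and combined". As an illustration of the general idea only, here is a generic model-agnostic perturbation analysis, a swapped-in technique and not the authors' method: shift one input, or a combination of inputs, by a fixed `delta` and measure the mean absolute change in the model's output. The `delta` and `max_order` parameters and the toy model are assumptions.

```python
from itertools import combinations


def perturbation_sensitivity(model, rows, delta=1.0, max_order=2):
    """Mean absolute change in model output when each input subset is shifted by delta.

    Subsets of size 1 measure inputs individually; larger subsets measure them combined.
    The model is treated as a black box: only its predictions are used.
    """
    n_inputs = len(rows[0])
    baseline = [model(row) for row in rows]
    scores = {}
    for order in range(1, max_order + 1):
        for subset in combinations(range(n_inputs), order):
            total = 0.0
            for row, base in zip(rows, baseline):
                shifted = list(row)
                for i in subset:
                    shifted[i] += delta
                total += abs(model(shifted) - base)
            scores[subset] = total / len(rows)
    return scores


def model(x):
    # Toy black box: strong dependence on x0, weak on x1, none on x2.
    return 3.0 * x[0] + 0.5 * x[1]


rows = [[1.0, 2.0, 5.0], [3.0, 4.0, 6.0]]
print(perturbation_sensitivity(model, rows))
```

For a linear toy model the combined scores are just sums of the individual ones; the combined subsets become informative when the black box contains interactions between inputs.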
Chapter
Recent advances in deep learning methodology have led to artificial intelligence (AI) performance achieving and even surpassing human levels in an increasing number of complex tasks. There are many impressive examples of this development, such as image classification, sensitivity analysis, speech understanding, or strategic gaming. However, because deep learning models lack transparency for visualization, explanation and interpretation, estimates based on these methods do not provide definite information, which can be a major disadvantage in many applications. This chapter discusses studies on the prediction of precious metals in the financial field that need an explanatory model. Traditional AI and machine learning methods are insufficient to realize these predictions. Explainable artificial intelligence (XAI) offers many advantages, as it enables us to make reasonable decisions based on inferences. In this chapter, the authors examine precious metal prediction with XAI by presenting a comprehensive literature review of the related studies.
Conference Paper
Exploiting patterns within a solution or reusing certain functionality is often necessary to solve certain problems. This paper proposes a new method for identifying useful modules. Modules are only considered if they are prevalent in the population and they are seen to have a positive effect on an individual's fitness. This is achieved by finding the covariance of an individual's fitness with the presence of a particular subtree in the overall expression. While there are many successful systems that dynamically add modules during Genetic Programming (GP) runs, doing so is not trivial for Grammatical Evolution (GE), because it employs a mapping process to produce individuals from binary strings, which makes it difficult to dynamically change the mapping process during a run. We adopt a multi-run approach which only has a single stage of module addition to mitigate the problems associated with continuously adding newly found functionality to a grammar. Based on the well-known Price Equation, our system explores the covariance between traits to identify useful modules, which are added to the grammar before the system is restarted. Grammar Augmentation through Module Encapsulation (GAME) was tested on seven problems from three different domains; it was observed to significantly improve performance on 3 problems and never showed harmful effects on any problem. GAME found the best individual in 6 of the 7 experiments.
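The selection criterion described above (a module must be prevalent and its presence must covary positively with fitness) can be sketched in a few lines. This assumes a hypothetical encoding in which each individual is a `(set_of_subtrees, fitness)` pair with subtrees written as strings, and a hypothetical `min_frequency` threshold; it computes the population covariance term that appears in the Price equation, not the full GAME system or its grammar augmentation.

```python
from statistics import fmean


def fitness_covariance(population, subtree):
    """Population covariance between fitness w and presence z (1 or 0) of `subtree`."""
    z = [1.0 if subtree in subs else 0.0 for subs, _ in population]
    w = [fit for _, fit in population]
    mean_z, mean_w = fmean(z), fmean(w)
    return fmean([(zi - mean_z) * (wi - mean_w) for zi, wi in zip(z, w)])


def candidate_modules(population, min_frequency=0.25):
    """Subtrees that are prevalent and whose presence covaries positively with fitness."""
    all_subtrees = set().union(*(subs for subs, _ in population))
    chosen = {}
    for sub in all_subtrees:
        frequency = sum(sub in subs for subs, _ in population) / len(population)
        cov = fitness_covariance(population, sub)
        if frequency >= min_frequency and cov > 0:
            chosen[sub] = cov
    return chosen


population = [
    ({"(* x x)", "(+ x 1)"}, 0.9),
    ({"(* x x)"}, 0.8),
    ({"(+ x 1)"}, 0.3),
    ({"(- x 1)"}, 0.2),
]
print(candidate_modules(population))
```

Here `(* x x)` occurs only in the two fittest individuals and gets the largest covariance, `(+ x 1)` is weakly positive, and `(- x 1)` is rejected because its covariance with fitness is negative.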
Conference Paper
Previous work has shown that genetic programming is capable of creating analog electrical circuits whose output equals common mathematical functions, merely by specifying the desired mathematical function that is to be produced. This paper extends this work by generating computational circuits whose output is an approximation to the error function associated with an existing computational circuit (created by means of genetic programming or some other method). The output of the evolved circuit can then be added to the output of the existing circuit to produce a circuit that computes the desired function with greater accuracy. This process can be performed iteratively. We present a set of results showing the effectiveness of this approach over multiple iterations for generating squaring, square root, and cubing computational circuits. We also perform iterative refinement on a recently patented cubic signal generator circuit, obtaining a refined circuit that is 7.2 times more accurate than the original patented circuit. The iterative refinement process described herein can be viewed as a method for using previous knowledge (i.e. the existing circuit) to obtain an improved result.
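The refinement loop described above (approximate the current error, add the correction to the existing output, repeat) can be illustrated numerically. In this sketch, GP-evolved circuits are replaced by a much simpler assumption: at each iteration, a single scaled basis function chosen from a fixed, hypothetical candidate set is fitted by least squares to the current error. The target (`sqrt`), the crude existing approximation `x`, and the sample points are also illustrative assumptions.

```python
import math


def rms(values):
    return math.sqrt(sum(v * v for v in values) / len(values))


def refine(target, base, candidates, xs, iterations=3):
    """Repeatedly fit the current error with one scaled candidate and add it to the output.

    Returns the refined function and the RMS error before each step and after the last one.
    """
    corrections = []  # list of (coefficient, basis_function)

    def current(x):
        return base(x) + sum(c * phi(x) for c, phi in corrections)

    errors = []
    for _ in range(iterations):
        residual = [target(x) - current(x) for x in xs]
        errors.append(rms(residual))
        best = None
        for phi in candidates:
            values = [phi(x) for x in xs]
            norm2 = sum(v * v for v in values)
            if norm2 == 0:
                continue
            coef = sum(r * v for r, v in zip(residual, values)) / norm2
            gain = coef * coef * norm2  # reduction in the sum of squared errors
            if best is None or gain > best[0]:
                best = (gain, coef, phi)
        corrections.append((best[1], best[2]))
    errors.append(rms([target(x) - current(x) for x in xs]))
    return current, errors


xs = [i / 20 for i in range(21)]
candidates = [lambda x: 1.0, lambda x: x * x, lambda x: math.sin(math.pi * x)]
refined, errors = refine(math.sqrt, lambda x: x, candidates, xs)
print([round(e, 4) for e in errors])
```

Because each correction is a least-squares fit to the current error, the RMS error can never increase from one iteration to the next, which mirrors the iterative improvement the abstract reports for circuits.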
Article
We propose and study new search operators and a novel node representation that can make GP fitness landscapes smoother. Together with a tree evaluation method known as sub-machine code GP and the use of demes, these make up a recipe for solving very large parity problems using GP. We tested this recipe on parity problems with up to 22 input variables, solving them with a very high success probability.
1. INTRODUCTION
The even-n-parity functions have long been recognised as difficult for Genetic Programming (GP) to induce if no bias favourable to their induction is introduced in the function set, the input representation, or in any other part of the algorithm. For this reason they are very interesting and have been widely used as benchmark tests [1, 4, 5, 6, 7, 23, 24, 26]. For an even-parity function of n Boolean inputs, the task is to evolve a function that returns 1 if an even number of the inputs evaluate to 1, and 0 otherwise. The task seems to be difficult for at least two reasons...
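The idea behind sub-machine code GP is to exploit the bit-level parallelism of the processor: each bit of a word holds one fitness case, so a single bitwise operation evaluates a primitive on many cases at once. The sketch below applies this to even-n-parity, assuming Python's arbitrary-precision integers as a stand-in for machine words (a real implementation would use fixed-width words, e.g. 32 or 64 cases per word). The helper names are hypothetical.

```python
def packed_inputs(n):
    """Pack all 2**n fitness cases into one integer per input variable.

    Bit k of variables[i] holds the value of input x_i in fitness case k.
    """
    n_cases = 1 << n
    variables = []
    for i in range(n):
        word = 0
        for k in range(n_cases):
            if (k >> i) & 1:
                word |= 1 << k
        variables.append(word)
    mask = (1 << n_cases) - 1  # keeps only the bits that correspond to real cases
    return variables, mask


def even_parity_target(variables, mask):
    """1 where an even number of inputs are 1, for every fitness case at once."""
    odd = 0
    for v in variables:
        odd ^= v
    return ~odd & mask


def hits(output, target, mask):
    """Number of fitness cases on which the program output matches the target."""
    return bin(~(output ^ target) & mask).count("1")


x, mask = packed_inputs(3)
target = even_parity_target(x, mask)
# Each candidate program is evaluated on all 8 cases with a few bitwise operations.
perfect = ~(x[0] ^ x[1] ^ x[2]) & mask
partial = x[0] ^ x[1]
print(hits(perfect, target, mask), hits(partial, target, mask))
```

The `partial` program ignores `x2` and therefore matches the target on exactly half of the cases, while the `perfect` program scores all 8 hits.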
Article
Several evolutionary simulations allow for dynamic resizing of the genotype. This is an important alternative to constraining the genotype's maximum size and complexity. In this paper, we add a further dynamic to simulated evolution by describing a genetic algorithm that coevolves its representation language with the genotypes. We introduce two mutation operators that permit the acquisition of modules from the genotypes during evolution. These modules form an increasingly high-level representation language specific to the developmental environment. Experimental results illustrating interesting properties of the acquired modules and the evolved languages are provided.
1.0 Introduction
A central theme of artificial life is to construct artifacts that approach the complexity of biological systems. To accomplish this, the benefits and limitations of current methods and tools must be considered. The representation employed in a genetic algorithm, for example, must permit an ...
Article
The problem of evolving an artificial ant to follow the Santa Fe trail is used to study the well-known genetic programming feature of growth in solution length. Known variously as “bloat”, “fluff” and increasing “structural complexity”, this is often described in terms of increasing “redundancy” in the code caused by “introns”. Comparison between runs with and without fitness selection pressure, backed by Price’s Theorem, shows that the tendency for solutions to grow in size is caused by fitness-based selection. We argue that such growth is inherent in using a fixed evaluation function with a discrete but variable-length representation. With a simple static evaluation, search converges to mainly finding trial solutions with the same fitness as existing trial solutions. In general, a variable-length representation allows many more long representations of a given solution than short ones. Thus, in search without a length bias, we expect longer representations to occur more often and so representation length to tend to increase. That is, fitness-based selection leads to bloat.
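The claim that a variable-length representation admits many more long representations of a given solution than short ones can be checked by counting. This sketch uses an assumed toy primitive set (terminals `a` and `b`, binary functions AND and OR) rather than the Santa Fe ant primitives. It counts, for each tree size, how many distinct trees compute exactly the same Boolean function as the single terminal `a`, using dynamic programming over truth tables.

```python
from collections import defaultdict

# Truth tables over the four cases of (a, b), one bit per case.
A, B = 0b1100, 0b1010


def count_trees_by_function(max_size):
    """counts[s][f] = number of distinct trees with s nodes computing truth table f.

    Trees are built from terminals a, b and the binary functions AND, OR.
    """
    counts = [defaultdict(int) for _ in range(max_size + 1)]
    counts[1][A] += 1
    counts[1][B] += 1
    for size in range(3, max_size + 1, 2):
        for left in range(1, size - 1, 2):
            right = size - 1 - left
            for f_left, n_left in counts[left].items():
                for f_right, n_right in counts[right].items():
                    counts[size][f_left & f_right] += n_left * n_right  # AND
                    counts[size][f_left | f_right] += n_left * n_right  # OR
    return counts


counts = count_trees_by_function(9)
print([(s, counts[s][A]) for s in range(1, 10, 2)])
```

Every extra pair of nodes multiplies the number of ways to express the same function (for instance `AND(t, a)` and `OR(a, t)` both preserve the meaning of any tree `t` that computes `a`), so a search that is indifferent to length will mostly encounter long versions of equally fit solutions.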
Article
In tree-based Genetic Programming, subtrees which represent potentially useful sub-solutions can be encapsulated in order to protect them and aid their proliferation throughout the population. This paper investigates implementing this as a multi-run method. A two-stage encapsulation scheme based on subtree survival and frequency is compared against Automatically Defined Functions in fixed and evolved architectures and against standard Genetic Programming for solving a Parity problem.
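The frequency part of such an encapsulation scheme can be sketched as counting every non-terminal subtree across a population and ranking them. This minimal version assumes a hypothetical nested-tuple tree encoding and does not model the subtree-survival criterion or the two-stage, multi-run structure described in the abstract.

```python
from collections import Counter


def subtrees(tree):
    """Yield every subtree of an (op, child, ...) tuple tree, including the tree itself."""
    yield tree
    if isinstance(tree, tuple):
        for child in tree[1:]:
            yield from subtrees(child)


def most_frequent_subtrees(population, k=1):
    """Rank non-terminal subtrees by how often they occur across the population."""
    counts = Counter(
        sub for tree in population for sub in subtrees(tree) if isinstance(sub, tuple)
    )
    return counts.most_common(k)


population = [
    ("+", ("*", "x", "x"), "x"),
    ("-", ("*", "x", "x"), "y"),
    ("*", "x", "y"),
]
print(most_frequent_subtrees(population))
```

Bare terminals are skipped because encapsulating a single leaf gains nothing; in practice a size threshold on candidate subtrees is a common further filter.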
Article
The continuation method is developed with a special emphasis on its suitability for numerical solutions on fast computers. Four problems are treated in detail: finding roots of a polynomial, boundary value problems of nonlinear equations, identification of parameters, and eigenvalue problems of linear ordinary differential operators. Numerical results are given for these problems. Finally, the continuation method is compared to iterative methods and several schemes which combine them are proposed.
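For the first problem listed, finding a root of a polynomial, a minimal continuation sketch embeds the equation f(x) = 0 in a family H(x, t) = t·f(x) + (1 − t)(x − x0) whose root is known at t = 0 (namely x0), then increases t in small steps and uses a few Newton corrections at each step. The specific homotopy, step count and number of Newton iterations are assumptions, and the sketch presumes the solution path has no turning points; it is not the scheme from the paper.

```python
def continuation_root(f, df, x0, steps=20, newton_iters=5):
    """Track the root of H(x, t) = t*f(x) + (1 - t)*(x - x0) from t = 0 to t = 1.

    At t = 0 the root is x0; each step increases t slightly and uses the previous
    root as the starting point for a few Newton corrections on H(., t).
    """
    x = x0
    for k in range(1, steps + 1):
        t = k / steps
        for _ in range(newton_iters):
            h = t * f(x) + (1 - t) * (x - x0)
            dh = t * df(x) + (1 - t)
            x -= h / dh
    return x


def f(x):
    return x**3 - 2 * x - 5


def df(x):
    return 3 * x**2 - 2


root = continuation_root(f, df, x0=2.0)
print(root, f(root))
```

At t = 1 the homotopy reduces to f itself, so the final Newton corrections are ordinary Newton steps started from a point already close to the root, which is where continuation pays off compared with starting Newton's method from an arbitrary guess.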
Article
The automatic detection of ships in low-resolution synthetic aperture radar (SAR) imagery is investigated in this article. The detector design objectives are to maximise detection accuracy across multiple images, to minimise the computational effort during image processing, and to minimise the effort during the design stage. The results of an extensive numerical study show that a novel approach using genetic programming (GP) successfully evolves detectors that satisfy these objectives. Each detector represents an algebraic formula, so the principles of detection can be discovered and reused. This is a major advantage over artificial intelligence techniques that use more complicated representations, e.g. neural networks.
Conference Paper
In tree-based genetic programming (GP), the most frequent subtrees in later generations are likely to constitute useful partial solutions. This paper investigates the effect of encapsulating such subtrees by representing them as atoms in the terminal set, so that the subtree evaluations can be exploited as terminal data. The encapsulation scheme is compared against a second scheme which depends on random subtree selection. Empirical results show that both schemes improve upon standard GP.
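Representing a subtree "as an atom in the terminal set" means its evaluation can be computed once per fitness case and then looked up like input data. A minimal sketch, assuming a hypothetical tuple tree encoding and dictionaries as fitness cases: `encapsulate` stores the subtree's value under a new terminal name, and `replace` substitutes that terminal wherever the subtree occurs, leaving program outputs unchanged.

```python
import operator

# Hypothetical tree encoding: (op, left, right) tuples; leaves are variable
# names (str) or numeric constants.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def evaluate(tree, env):
    """Evaluate a tree on one fitness case; str leaves are looked up in env."""
    if isinstance(tree, tuple):
        op, left, right = tree
        return OPS[op](evaluate(left, env), evaluate(right, env))
    if isinstance(tree, str):
        return env[tree]
    return tree


def encapsulate(subtree, cases, name):
    """Evaluate `subtree` once per fitness case and store the result as terminal `name`."""
    for env in cases:
        env[name] = evaluate(subtree, env)
    return [env[name] for env in cases]


def replace(tree, target, name):
    """Replace every occurrence of `target` in `tree` by the terminal `name`."""
    if tree == target:
        return name
    if isinstance(tree, tuple):
        return (tree[0],) + tuple(replace(child, target, name) for child in tree[1:])
    return tree


cases = [{"x": v} for v in (1, 2, 3)]
frequent = ("*", "x", "x")
print(encapsulate(frequent, cases, "E0"))
program = ("+", ("*", "x", "x"), "x")
compact = replace(program, frequent, "E0")
print(compact, [evaluate(compact, env) for env in cases])
```

The new terminal both shortens the trees that use it and protects the encapsulated code from being broken up by crossover, since it is now an indivisible leaf.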
Article
One of the major challenges in the field of integrated circuit design is coping with difficult combinatorial optimization problems. Simply finding the minimum length of wire needed to connect a block of transistors is NP-hard. When factoring in a handful of other simultaneous optimization dimensions, such as connections to other blocks of components and area minimization, it is easy to see the difficulty facing circuit designers. Since many of the problems encountered do not have polynomial-time solutions, very large scale integration (VLSI) algorithm designers experiment with various optimization techniques such as integer linear programming and simulated annealing. Optimization methods for VLSI computer-aided design (CAD) incorporating evolutionary search began appearing in research articles in the late 1980s. As the first body of work devoted exclusively to evolutionary algorithms (EAs) in VLSI CAD, this monograph attempts to fill a noticeable void in the literature. The book is divided into two parts. Part I, “Basic Principles”, provides an overview of the book and discusses the underlying principles of EAs and algorithm performance issues. The treatment of EAs here is cursory at best and concentrates mainly on the classic genetic algorithm. An overview of some aspects of VLSI CAD is also included; however, only the latter phase of the circuit design process is covered, and analog integrated circuit design is not touched upon. The second part, entitled “Practice”, comprises roughly three-quarters of the book and deals with tools and applications. A software tool called the Genetic Algorithm Managing Environment (GAME) is described in one chapter. It provides a flexible environment in which an algorithm designer can easily interface EAs to VLSI CAD tools to facilitate experimentation and production runs. One of the author’s themes is that problem-specific knowledge is necessary for EAs to be competitive with other optimization approaches.
Support for this argument is provided by the numerous case studies examined in a lengthy chapter concerning applications of EAs to logic synthesis, mapping, and testing. In logic synthesis, one wishes to implement a Boolean function in hardware while satisfying certain constraints (e.g., power, delay, area). A category of logic realization called Fixed Polarity Reed-Muller expressions is described. Such expressions are difficult to minimize, and an approach is described that uses a hybrid EA (one that incorporates problem-specific heuristics). For large Boolean functions, it is shown that the EA presented can achieve smaller expressions than standard tools, although the EA required more computer time. The section on mapping includes EA applications in partitioning, floorplanning, and placement and routing problems. Partitioning consists of mapping blocks of circuit components to two-dimensional
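To show why minimizing Fixed Polarity Reed-Muller (FPRM) expressions is a search problem, this sketch computes the FPRM form of a small Boolean function for every polarity and picks the one with the fewest product terms. It uses the standard GF(2) Reed-Muller transform of the truth table. The exhaustive loop over all 2**n polarities is a brute-force baseline that is only feasible for small n; it is not the hybrid EA described in the book, and the function names are hypothetical.

```python
def reed_muller_coefficients(truth_table, n):
    """Positive-polarity Reed-Muller (GF(2) Moebius) transform of a 2**n truth table.

    Entry m of the result is 1 when the product of the variables in bit set m
    appears in the XOR-of-products expansion.
    """
    a = list(truth_table)
    for i in range(n):
        bit = 1 << i
        for m in range(1 << n):
            if m & bit:
                a[m] ^= a[m ^ bit]
    return a


def fprm_terms(truth_table, n, polarity):
    """Number of product terms in the FPRM form with the given polarity.

    Bit i of `polarity` set means variable x_i appears only in complemented form.
    """
    shifted = [truth_table[m ^ polarity] for m in range(1 << n)]
    return sum(reed_muller_coefficients(shifted, n))


def best_polarity(truth_table, n):
    """Exhaustively try all 2**n polarities; return (terms, polarity) of the smallest."""
    return min((fprm_terms(truth_table, n, p), p) for p in range(1 << n))


# f = x0 OR x1; truth_table[m] is f evaluated with x_i = bit i of m.
or_table = [0, 1, 1, 1]
print([fprm_terms(or_table, 2, p) for p in range(4)], best_polarity(or_table, 2))
```

With both variables positive, OR needs three terms (x0 ⊕ x1 ⊕ x0x1); with both complemented it needs only two (1 ⊕ x̄0x̄1). The number of polarities doubles with every input, which is why heuristic search becomes attractive for large functions.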
Article
The evolutionary computation community has shown increasing interest in arbitrary-length representations, particularly in the field of genetic programming. A serious stumbling block to the scalability of such representations has been bloat: uncontrolled genome growth during an evolutionary run. Bloat appears across the evolutionary computation spectrum, but genetic programming has given it by far the most attention. Most genetic programming models explain this phenomenon as a result of the growth of introns, areas in an individual which serve no functional purpose. This paper presents evidence which directly contradicts intron theories as applied to tree-based genetic programming. The paper then uses data drawn from this evidence to propose a new model of genome growth. In this model, bloat in genetic programming is a function of the mean depth of the modification (crossover or mutation) point. Points far from the root are correspondingly less likely to hurt the child's survivability in the next generation. The modification point is in turn strongly correlated to average parent tree size and to removed subtree size, both of which are directly linked to the size of the resulting child.
Article
This paper develops an evolutionary method that learns inductively to recognize the makeup and the position of very short consensus sequences, cis-acting sites, which are a typical feature of promoters in genomes. The method combines a Finite State Automata (FSA) and Genetic Programming (GP) to discover candidate promoter sequences in primary sequence data. An experiment measures the success of the method for promoter prediction in the human genome. This class of method can take large base pair jumps and this may enable it to process very long genomic sequences to discover gene specific cis-acting sites, and genes which are regulated together.
Conference Paper
This paper presents a new cause of code growth, termed removal bias. We show that growth due to removal bias can be expected to occur whenever operations that remove and replace a variable-sized section of code, e.g. crossover or subtree mutation, are used in an evolutionary paradigm. Two forms of non-destructive crossover are used to examine the causes of code growth. The results support the protective value of inviable code and removal bias as two distinct causes of code growth. Both causes of code growth are shown to exist in at least two different problems.
Article
Introduction
Hierarchical Genetic Programming (HGP) extensions discover, modify, and exploit subroutines to accelerate the evolution of programs [Koza 1992, Rosca and Ballard 1994a]. The use of subroutines biases the search for good programs and offers the possibility to reuse code. While HGP approaches improve the efficiency and scalability of genetic programming (GP) for many applications [Koza, 1994b], several issues remain unresolved. The scalability of HGP techniques could be further improved by solving two such issues. One is the characterization of the value of subroutines. Current methods for HGP do not attempt to decide what is relevant, i.e. which blocks of code or subroutines may be worth giving special attention, but employ genetic operations on subroutines at random points. The other issue is the time-course of the generation of new subroutines. Current HGP techniques do not make informed choices to automatically decide whe
Roberts S. C., Howard D., Koza J. R. Subtree encapsulation versus ADFs in GP for parity problems. In: Spector L., Goodman E. D., Wu A., Langdon W. B., Voigt H.-M., Gen M., Sen S., Dorigo M., Pezeshk S., Garzon M. H., Burke E. (eds.).
De Bono E., 1970. Lateral Thinking: A Textbook of Creativity. Penguin Books Ltd., England.
Howard D., Roberts S. C., 1999. A Staged Genetic Programming Strategy for Image Analysis. Proceedings of the Genetic and Evolutionary Computation Conference (GECCO 1999), Orlando, Florida, 1047-1052, Morgan Kaufmann.
Igel C., Chellapilla K., 1999. Investigating the Influence of Depth and Degree of Genotypic Change on Fitness. Proceedings of the Genetic and Evolutionary Computation Conference (GECCO 1999), Orlando, Florida, 1061-1068, Morgan Kaufmann.
McPhee N. F., Hopper N. J., 1999. Analysis of Genetic Diversity through Population History. Proceedings of the Genetic and Evolutionary Computation Conference (GECCO 1999), Orlando, Florida, 1112-1120, Morgan Kaufmann.
Roberts S. C., Howard D., Koza J. R., 2001. Evolving Modules in GP by Subtree Encapsulation. In Proceedings of the 4th European Conference, EuroGP 2001, Lake Como, Italy, April 2001, Springer LNCS 2038.