The Usability Argument for Refinement Typed Genetic Programming*

Alcides Fonseca [0000-0002-0879-4015], Paulo Santos [0000-0002-0154-8989], and Sara Silva [0000-0001-8223-4799]

Faculdade de Ciências da Universidade de Lisboa, Portugal
Abstract. The performance of Evolutionary Algorithms is frequently hindered by arbitrarily large search spaces. In order to overcome this challenge, domain-specific knowledge is often used to restrict the representation or evaluation of candidate solutions to the problem at hand. Due to the diversity of problems and the unpredictable performance impact, the encoding of domain-specific knowledge is a frequent problem in the implementation of evolutionary algorithms.
We propose the use of Refinement Typed Genetic Programming, an enhanced hybrid of Strongly Typed Genetic Programming (STGP) and Grammar-Guided Genetic Programming (GGGP) that features an advanced type system with polymorphism and dependent and refined types. We argue that this approach is more usable for describing common problems in machine learning, optimisation and program synthesis, due to the familiarity of the language (when compared to GGGP) and the use of a unifying language to express the representation, the phenotype translation, the evaluation function and the context in which programs are executed.

Keywords: Genetic Programming · Refined Types · Search-Based Software Engineering
1 Introduction
Genetic Programming (GP) [28] has been successfully applied in different areas,
including bioinformatics [13], quantum computing [35], and supervised machine
learning [19]. One of the main challenges of applying GP to real-world prob-
lems, such as program synthesis, is the efficient exploration of the vast search
space. Frequently, domain knowledge can be used to restrict the search space,
making the exploration more efficient. Strongly Typed Genetic Programming
(STGP) [24] restricts the search space by ignoring candidates that do not type
check. To improve its expressive power, STGP has been extended with type
* This work was supported by LASIGE (UIDB/00408/2020) and the CMU Portugal project CAMELOT (POCI-01-0247-FEDER-045915).
inheritance [10], polymorphism [37] and a Hindley-Milner-inspired type system [22], the basis for those in Haskell, SML, OCaml or F#.
Grammar-Guided Genetic Programming (GGGP) [21] also restricts the search space, sometimes enforcing the same rules as STGP, by only allowing the generation of individuals that follow a given grammar. Grammar-based approaches have also
been developing towards restricting the search space. The initial proposal [21]
used context-free grammars (CFG) in the Backus Normal Form. The GAUGE
[31] system relies on attribute grammars to restrict the phenotype translation
from a sequence of integers. Christiansen grammars [33,6] can express more restrictions than CFGs, but still fall short of expressing features such as variable scoping, polymorphism or recursive declarations.
We propose Refinement Typed Genetic Programming (RTGP) as a more
robust version of STGP through the use of a type system with refinements
and dependent types. Languages with these features have gained focus in the
Programming Languages (PL) research area: LiquidHaskell [36] is an extension
of Haskell that supports refinements; Agda [2] and Idris [3] are dependently-
typed languages that are frequently used as theorem provers. These languages
support the encoding of specifications within the type system. Previously, special
constructs were required to add specification verification within the source code.
This idea was introduced in Eiffel [23] and applied later to Java with the JML
specification language [18].
In particular, our major contributions are:
- A GP approach that relies on a simple grammar combined with a dependent refined type system, with the argument that this approach is more expressive than existing approaches;
- Concretisation of this approach in the Æon Programming Language.
These contributions advance GP through a new interface in which to define
representations and assess their success. One particular field where our approach
might have a direct impact is general program synthesis. We identify two dif-
ficulties in the literature [15]: a) the large search space that results from the
combination of language operators, grammar and available functions, and b) the
lack of a continuous fitness function. We address both aspects within the same
programming language.
In the remainder of the current paper we present: the Æon language for expressing GP problems (§2); a method for extracting fitness functions from Æon programs (§3); the Refinement Typed Genetic Programming approach (§4); examples of RTGP (§5); a comparison with other approaches from a usability point of view (§6); and concluding remarks (§7).
2 The Æon Programming Language
We introduce the Æon programming language as an example of a language with polymorphism and non-liquid refinements. This language can be used as the basis for RTGP due to its support of static verification of polymorphism and a subset of the refinements. However, RTGP is not restricted to this language and could be applied to other languages that have similar type systems.

type Array<T> { size:Int }  // size is a ghost variable

range : (mi:Int, ma:Int) -> arr:Array<Int>
    where (ma > mi and arr.size == ma - mi) = native;

append : (a:Array<T>, e:T) -> n:Array<T>
    where (a.size + 1 == n.size) = native;

listWith10Elements : (i:Int) -> n:Array<Int> where (n.size == 10) {
    append(range(0, i), 42)  // Type error
}

fib : (n:Int) -> f:Int where (n >= 0 and f >= n) {
    if n < 2 then 1 else fib(n-1) + fib(n-2)
}

incomplete : (n:Int) -> r:Int where (r > n && fib(r) % 100 == 0) { }

Listing 1.1: An example of the Æon language
Listing 1.1 presents a simple example in Æon. To keep Æon a pure language, several low-level details are implemented in a host language, with which Æon can interact using the native construct. The range function is an example of a function whose definition is written in the native language of the interpreter¹.
What distinguishes Æon from strongly typed mainstream languages like C or Java is that types can have refinements that express restrictions over the types. For instance, the refinements on range specify that the second argument must be greater than the first, and that the output array has size equal to their difference.
The range call in the listWith10Elements function throws a compile error because i is an Integer, and there are integers that are not greater than 0 (the first argument). The i argument should have been of type {i:Int where i > 0}. However, there is another refinement being violated: if i == 1, the size of the output will be 2 and not 10 as expected. The correct input type should have been {i:Int where i == 9} for the function to compile.
It should now be clear how a language like Æon can be used to express domain knowledge in GP problems. A traditional STGP solution would accept any integer value as the argument for range, resulting in a runtime error that would be penalized in the fitness function. Individual repair is not trivial to implement without resorting to symbolic execution, which is more computationally intensive than the verification applied here.
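To make the contracts in Listing 1.1 concrete, they can be mimicked dynamically in Python (the language of one Æon interpreter). This is an illustrative sketch with assumed names (`aeon_range`, `aeon_append`): runtime assertions stand in for checks that Æon discharges statically, before any candidate is executed.

```python
# Hypothetical dynamic analogue of Listing 1.1: each Aeon refinement
# becomes a Python predicate checked at the call boundary. Aeon verifies
# these conditions statically; here they are runtime assertions.

def aeon_range(mi, ma):
    # range : (mi:Int, ma:Int) -> arr:Array<Int> where ma > mi and arr.size == ma - mi
    assert ma > mi, "refinement violated: ma > mi"
    arr = list(range(mi, ma))
    assert len(arr) == ma - mi
    return arr

def aeon_append(a, e):
    # append : (a:Array<T>, e:T) -> n:Array<T> where a.size + 1 == n.size
    n = a + [e]
    assert len(n) == len(a) + 1
    return n

def list_with_10_elements(i):
    # listWith10Elements : (i:Int) -> n:Array<Int> where n.size == 10
    n = aeon_append(aeon_range(0, i), 42)
    assert len(n) == 10, "refinement violated: n.size == 10"
    return n

# i == 9 is the only input satisfying every refinement: range(0, 9) has
# size 9 and append makes it 10, exactly as derived in the text.
print(len(list_with_10_elements(9)))  # -> 10
```

Calling `list_with_10_elements(5)` fails the size check at runtime; Æon rejects the equivalent program at compile time instead.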
¹ We have developed a compiler from Æon to Java and an Æon interpreter in Python. In each case, the range function would have to be defined in Java and Python, respectively.

The incomplete function, while very basic, is a simple example of the definition of a search problem. The function receives any integer (as there are no restrictions) and returns an integer greater than the one received and whose Fibonacci number is divisible by 100. A placeholder hole is left as the implementation of this function (inspired by Haskell's and Agda's typed holes [8]). The placeholder allows the program to parse and typecheck, but not execute². Typechecking
is required to describe the search problem: acceptable solutions are those that inhabit the type of the hole, {r:Int where r > n and fib(r) % 100 == 0}. This is an example of a dependent refined type, as the type of r depends on the value of n in the context. This approach of allowing the user to define a structure and letting the search fill in the details has been used with success in sketch- and SMT-based approaches [34].
While the Æon language does not make any distinction, there are two classes of refinements for the purposes of RTGP: liquid and non-liquid refinements. Liquid refinements are those whose satisfiability can be statically verified, usually by means of an SMT solver. One such example is {x:Integer where x % 2 == 0}, since SMT solvers can handle this kind of linear arithmetic problem. Another example is {x:Array<Integer> where x.size > 0}, because x.size is the same as size(x), where size is an uninterpreted function in SMT solving.
Non-liquid refinements are those that SMT solvers are not able to reason about. These are typically not allowed in languages like LiquidHaskell [36]. One example is the second refinement of the incomplete function, fib(r) % 100 == 0, because verifying its correctness requires executing the fib function, which can only be done at runtime [7], typically via runtime verification. Another example of a non-liquid refinement would be the use of any natively defined function, because the SMT solver cannot be sure of its behaviour beyond the liquid refinement expressed in its type. For instance, when considering a native function that makes an HTTP request, an SMT solver cannot statically guess what kind of reply the server would send.
3 Refinements in GP
Now that we have addressed the difference between liquid and non-liquid refined
types, we will see how both are used in the RTGP process. Figure 1 presents an
overview of the data flow of the evolutionary process, starting from the prob-
lem formulation in Æon and ending in the solution found, also in Æon. The
architecture identifies compiler components and the Æon code that is either
generated or manipulated by those components.
3.1 Liquid Refinements for Constraining the Search Space
Liquid Refinements, the ones supported by Liquid Types [30], are conjunctions of statically verifiable logical predicates over data. We define all other refinements as non-liquid refinements. An example of a liquid type is {x:Int where x > 3 and x < 7}, where x > 3 and x < 7 are liquid refinements.
² Replacing the hole with a crash-inducing expression allows the program to compile or be interpreted. While this is out of scope, the reader may find more in [25].
[Figure: components include the Problem Formulation, the Æon Compiler, Liquid Types, Fitness Criteria, Random Inputs, the Refinement Typed Genetic Programming engine, and the resulting Æon Code.]
Figure 1: Architecture of the proposed approach.
In our approach, liquid refinements are used to constrain the generation of candidate programs. Through the use of a type-checker that supports Liquid Types, we prevent candidates that are known to be invalid from being generated in the first place. However, the use of a type-checker alone is not ideal, as several invalid candidates might be generated before one that meets the liquid refinement is found.
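The generate-and-filter inefficiency described above can be sketched as rejection sampling. The literal-only search space and the function names are illustrative assumptions; the point is only that a blind generator discards most candidates before the checker accepts one.

```python
import random

# Naive generate-and-filter: draw random candidates (here just integer
# literals) and keep only those accepted for the liquid type
# {x:Int where x > 3 and x < 7}.

def type_checks(x):
    return x > 3 and x < 7          # the liquid refinement as a predicate

def generate_candidate(rng):
    return rng.randint(-100, 100)   # blind generation over a wide range

rng = random.Random(0)
rejections = 0
candidate = generate_candidate(rng)
while not type_checks(candidate):
    rejections += 1
    candidate = generate_candidate(rng)

# Most draws are rejected, which is exactly the inefficiency the text
# attributes to using the type-checker as a post-hoc filter.
print(candidate, rejections)
```

A synthesis algorithm that consults the refinement while generating (as proposed next) avoids these wasted draws entirely.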
Synquid [29] is a first step towards improving the performance of liquid type synthesis. Synquid uses an enumerative approach and an SMT solver to synthesize programs from a Liquid Type. However, Synquid has several limitations for this purpose: it is unable to synthesize simple refinements related to numerical values (such as the previous example); it is deterministic, since it uses an enumerative approach; and while it is presented as complete with regard to semantic values (it can generate all programs that can be expressed in the subset of supported Liquid Types), it is not complete with regard to syntactic expression. As an example, Synquid is able to synthesize 2 as an integer, but not 1+1, since the semantic values are equivalent. For the purposes of GP, not being able to synthesize more complex representations of the same program prevents recombination and mutation from exploring small alternatives. We are in the process of formalizing an algorithm that is more complete than Synquid for this purpose.
The RTGP algorithm (§4) uses the liquid type synthesis algorithm for:
- Generating random input arguments in fitness evaluation (§3.2);
- Generating a random individual in the initial population;
- Generating a subtree in the mutation operator;
- Generating a subtree in the recombination operator, when the other parent does not have a compatible node.
Boolean          Continuous
true, false      0.0, 1.0
x ≠ y            1 - f(x == y)
a ∧ b            (f(a) + f(b)) / 2
a ∨ b            min(f(a), f(b))
a → b            f(¬a ∨ b)
x < y            norm(relu(x - y + δ))

Table 1: Conversion function f between boolean expressions and continuous values
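The rules of Table 1 can be sketched as a recursive conversion over a small expression AST. The tuple encoding is an illustrative assumption, as is the choice norm(x) = x/(x+1): the paper does not fix a definition of norm beyond requiring values in [0, 1]. The relu step follows the rectified-linear-unit discussion in the text.

```python
DELTA_INT = 1.0   # delta for integer comparisons, as in the text

def norm(x):
    # Assumed normalisation onto [0, 1); the paper leaves norm unspecified.
    return x / (x + 1.0)

def relu(x):
    return max(x, 0.0)

def f(expr):
    """Continuous error of a boolean expression encoded as nested tuples."""
    op = expr[0]
    if op == 'lit':                       # true -> 0.0 (no error), false -> 1.0
        return 0.0 if expr[1] else 1.0
    if op == 'eq':                        # numeric equality: normalised |x - y|
        return norm(abs(expr[1] - expr[2]))
    if op == 'neq':                       # x != y  ->  1 - f(x == y)
        return 1.0 - f(('eq', expr[1], expr[2]))
    if op == 'and':                       # average of both operands
        return (f(expr[1]) + f(expr[2])) / 2.0
    if op == 'or':                        # closest-to-satisfied operand
        return min(f(expr[1]), f(expr[2]))
    if op == 'not':                       # negation flips the error
        return 1.0 - f(expr[1])
    if op == 'implies':                   # a -> b rewritten as (not a) or b
        return f(('or', ('not', expr[1]), expr[2]))
    if op == 'lt':                        # x < y  ->  norm(relu(x - y + delta))
        return norm(relu(expr[1] - expr[2] + DELTA_INT))
    raise ValueError(op)

print(f(('lit', True)))   # -> 0.0: the condition holds
print(f(('lt', 2, 7)))    # -> 0.0: 2 < 7 holds
print(f(('lt', 7, 2)))    # positive error, shrinking as the gap closes
```

Note how errors stay graded rather than boolean: f(('lt', 7, 2)) is smaller than f(('lt', 70, 2)), giving the search a gradient toward satisfying the clause.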
3.2 Non-liquid Refinements to Express Fitness Functions
A good fitness function is ideally continuous and should be able to measure how close to the real solution a potential solution is. However, fulfilling a given specification is a boolean criterion: it either is fulfilled or not. While previous work [15] has used the number of passed tests as the fitness function, we aim to have a more fine-grained measurement of how far each test is from passing. In particular, we consider the overall error as the fitness value, and the search as a minimization problem.
We propose the use of non-liquid refinement types to synthesize a continuous fitness criterion from the specification (depicted in Figure 1) that, together with randomly generated input values, is used to obtain the fitness function.
f : (n:{Int | n > 0}, a:{Array<String> | a.size == 3}) -> r:Int
    where ( r > n && fib(r) % 100 == 0 and (n > 4 → serverCheck(r) == 3) ) { }

Listing 1.2: An example of a specification that corresponds to a bi-objective problem
Listing 1.2 shows an example with a liquid refinement (r > n) and two non-liquid clauses in the refinement. Each of these two clauses is handled individually, as in a multi-objective problem. Each clause is first reduced to conjunctive normal form (CNF) and then converted from a predicate into a continuous function that, given the input and expected output, returns a floating point number between 0.0 and 1.0, where the value represents the error. For instance, the example in Listing 1.2 is converted to two functions:

norm(|fib(r) % 100 - 0|)   and   min(1 - norm(relu(4 - n + δ)), norm(|serverCheck(r) - 3|))
Table 1 shows the conversion rules between boolean expressions and the corresponding continuous values. The function f, which is defined using these rules, is applied recursively until a final continuous expression is generated. This approach is an extension of that presented in [12].
Since the output of f is an error, the value true is converted to 0.0, stating that the condition holds; false is converted to 1.0, the maximum value for not complying with the condition. Variables and function calls are likewise converted to 0.0 or 1.0 depending on whether the condition holds. Equalities of numeric values are converted into the normalized absolute difference between the arguments.
The normalization is required as it allows different clauses to have the same importance in the given specification. Inequalities are converted to equalities whose result is subtracted from 1, negating the fitness result of the equality. Conjunctions are converted to the average of the fitness values of both operands. The value of a disjunction is obtained by taking the minimum fitness value of both clauses; the minimum indicates which clause is the closest to having no error. The fitness of conditional statements is extracted recursively by using the material implication rule. Similarly to inequalities, the negation of a condition inverts the value returned by the truth of the condition. Numeric value comparisons represent a harder challenge, as there are intervals where the condition holds. We use the difference of the values to represent the error. In the < and > rules, the δ constant depends on the type of the numerical value (1.0 for integers and 0.00001 for doubles) and accounts for the extra step required for the condition to hold. A rectified linear unit is used so that if the condition holds, the (negative) difference is clamped to 0; otherwise, the positive value is normalized into a fitness error.
The fitness function is the result of applying each f_i for each non-liquid refinement to a set of randomly generated input values (using the liquid synthesis algorithm in §3.1). The fitness of an individual is the combination of all f_i for all random input values.
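The aggregation just described can be sketched as follows. The per-clause error functions mirror the incomplete example, and the input generator is an illustrative stand-in for the liquid synthesis algorithm; norm(x) = x/(x+1) is again an assumed normalisation.

```python
import random

# Per-clause errors for the `incomplete` example: the candidate is a
# function of n, and each clause yields an error in [0, 1).
def norm(x):
    return x / (x + 1.0)   # assumed normalisation, as in Sec. 3.2

def clause_gt(candidate, n):        # r > n
    r = candidate(n)
    return norm(max(n - r + 1.0, 0.0))

def clause_fib_mod(candidate, n):   # fib(r) % 100 == 0, evaluated at runtime
    def fib(k):
        a, b = 1, 1
        for _ in range(k):
            a, b = b, a + b
        return a
    return norm(abs(fib(candidate(n)) % 100 - 0))

def fitness(candidate, clauses, rng, samples=10):
    # Inputs are redrawn at each evaluation (discouraging overfitting);
    # the overall fitness combines every clause over every sampled input.
    inputs = [rng.randint(0, 10) for _ in range(samples)]
    total = sum(c(candidate, n) for c in clauses for n in inputs)
    return total / (len(clauses) * samples)

rng = random.Random(1)
score = fitness(lambda n: n + 5, [clause_gt, clause_fib_mod], rng)
print(0.0 <= score <= 1.0)  # -> True
```

Averaging is one possible combination; keeping the clause errors separate, as the multi-objective treatment in §4.4 does, is equally compatible with this sketch.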
4 The RTGP Algorithm
The proposed RTGP algorithm follows the classical STGP [24] in its structure
but differs in the details. Just like in all GP approaches, multiple variants can
be obtained by changing or swapping some of the components presented here.
4.1 Representation
RTGP can have either a bitstream representation (e.g., [31]) or a direct representation (e.g., [24]). For the sake of simplicity, let us consider the direct representation in the remainder of the paper.
4.2 Initialization Procedure
To generate random individuals, the algorithm mentioned in §3.1 is used with the context and the type of the hole as arguments. This is repeated until the population has the desired size. Koza proposed the combination of full and grow as ramped-half-and-half [14], which is used in classical STGP. In RTGP, the full method is not always possible, since no valid expression of a given depth may exist in the language. If, for instance, we want an expression of type X and the only function that returns X is a constructor without any parameters, then it is impossible to have any expression of type X with depth d greater than 1. Unlike in the STGP full method, such a tree is still used in the initial population, even if it does not have the predetermined depth.
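Type-directed generation with this depth fallback can be sketched as follows. The signature table is a made-up example chosen to reproduce the situation above: X has only a nullary constructor, so no X-typed expression deeper than 1 exists.

```python
import random

# Sketch of type-directed random expression generation. Each type maps to
# its constructors: (name, argument types). This table is illustrative.
SIGNATURES = {
    'Int': [('lit', []), ('plus', ['Int', 'Int'])],
    'X':   [('mkX', [])],   # only a nullary constructor returns X
}

def grow(typ, depth, rng):
    options = SIGNATURES[typ]
    if depth <= 1:
        # Near the depth limit, prefer nullary constructors; every type in
        # this table has one, so the fallback slice is never recursive here.
        options = [o for o in options if not o[1]] or options[:1]
    name, arg_types = rng.choice(options)
    return (name, [grow(t, depth - 1, rng) for t in arg_types])

def tree_depth(t):
    return 1 + max((tree_depth(c) for c in t[1]), default=0)

rng = random.Random(0)
print(tree_depth(grow('X', 5, rng)))          # -> 1: depth 5 is unreachable for X
print(tree_depth(grow('Int', 5, rng)) <= 5)   # -> True
```

The X-typed tree is accepted into the population at depth 1 even though depth 5 was requested, matching the relaxation of the full method described above.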
8 A. Fonseca et al.
4.3 Evaluation
The goal of the search problem is the minimization of the error between the
observed and the expressed specification. Non-liquid refinements are translated
to multi-objective criteria (following the approach explained in §3.2). The input
values are randomly generated at each generation to prevent overfitting [9]. A
fitness of 0.0 for one clause represents that all sets of inputs have passed that
condition successfully. The overall objective of the candidate is to obtain a 0.0
fitness in all clauses.
4.4 Selection and Genetic Operators
Recent work has provided significant insights on parent selection in program synthesis [11]. A variant of lexicase selection, dynamic ε-lexicase selection [17], has been used to allow near-elite individuals to be chosen in continuous search spaces.
The mutation operator chooses a random node from the candidate tree. A replacement is randomly generated by providing the node type to the expression synthesis algorithm, along with the current node depth, fulfilling the maximum tree depth requirement. The valid subtrees of the replaced node are provided as genetic material to the synthesizer, allowing partial mutations on the candidate.
The crossover operator selects two random parents using the dynamic ε-lexicase selection algorithm. A random node is chosen from the first parent, and nodes with the same type from the second parent are selected for transplantation into the first parent. If no compatible nodes are found, the expression synthesizer is invoked using the second parent's valid subtrees and the remaining first parent subtrees as genetic material. This is similar to how STGP operates, with the distinction that subtyping in Liquid Types refers to the implication of semantic properties. Thus, unsafe recombinations and mutations will never occur.
4.5 Stopping Criteria
The algorithm iterates over generations of the population until one or multiple
of the following criteria are met: a) there is an individual of fitness 0.0; b)
a predefined number of generations have been iterated; c) a predefined time
duration has passed.
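The overall loop, wiring generation, evaluation and the three stopping criteria together, can be sketched as follows. `make_individual` and `evaluate` are illustrative stand-ins for the synthesis and fitness machinery of §3.1 and §3.2, and the variation step is a placeholder.

```python
import random
import time

def make_individual(rng):
    return rng.randint(0, 100)          # stand-in for liquid type synthesis

def evaluate(ind):
    return abs(ind - 42) / 100.0        # toy error: 0.0 only for ind == 42

def evolve(rng, pop_size=20, max_generations=50, max_seconds=5.0):
    start = time.monotonic()
    population = [make_individual(rng) for _ in range(pop_size)]
    for generation in range(max_generations):          # criterion (b)
        best = min(population, key=evaluate)
        if evaluate(best) == 0.0:                      # criterion (a)
            return best, generation
        if time.monotonic() - start > max_seconds:     # criterion (c)
            return best, generation
        # placeholder variation step: keep the best, resample the rest
        population = [best] + [make_individual(rng) for _ in range(pop_size - 1)]
    return min(population, key=evaluate), max_generations

best, gens = evolve(random.Random(0))
print(0.0 <= evaluate(best) <= 1.0)  # -> True
```

In RTGP the resampling line would be replaced by ε-lexicase selection plus the typed mutation and crossover operators of §4.4; the stopping logic is unchanged.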
5 Examples of RTGP
This section introduces three examples from the literature implemented in Æon.
5.1 Santa Fe Ant Trail
The Santa Fe Ant Trail problem is frequently used as a benchmark for GP. In
[26], the authors propose a grammar-based approach to solve this problem. In
RTGP, if-then-else conditions and auxiliary functions (via lambda abstraction)
are embedded in the language, making this a very readable program.
The Usability Argument for Refinement Typed Genetic Programming 9
type Map;

food_present : (m:Map) -> Int = native;
food_ahead : (m:Map) -> Boolean = native;
left : (m:Map) -> Map = native;
right : (m:Map) -> Map = native;
move : (m:Map) -> Map = native;

program : (m:Map) -> m2:Map where ( food_present(m2) == 0 ) { }

Listing 1.3: Santa Fe Ant Trail
5.2 Super Mario Bros Level Design
The second example defines the search for an interesting design for a Super Mario
Bros level that maximizes the engagement, minimizes frustration and maximizes
challenge. These functions are defined according to a model that can easily be
implemented in Æon (Listing 1.4). We present this as a more usable alternative
to the one that uses GGGP [32].
type X as {x:Integer | 5 <= x && x <= 95}
type Y as {x:Integer | 3 <= x && x <= 5}
type Wg as {x:Integer | 2 <= x && x <= 5}
type W as {x:Integer | 2 <= x && x <= 7}
type Wb as {x:Integer | 2 <= x && x <= 6}
type Wa as Wb
type Wc as W

type Level as Pair<List<Chunk>, {enemies:List<Enemy> | 2 <= enemies.size && enemies.size <= 10}>;

type BoxType;
block_coin() -> BoxType = native;
rock_coin() -> BoxType = native;
block_powerup() -> BoxType = native;
rock_empty() -> BoxType = native;

type Chunk;
gap(x:X, y:Y, wg:Wg, wb:Wb, wa:Wa) -> Level = native;
platform(x:X, y:Y, w:W) -> Level = native;
hill(x:X, y:Y, w:W) -> Level = native;
cannon_hill(x:X, y:Y, wg:Wg, wb:Wb, wa:Wa) -> Level = native;
tube_hill(x:X, y:Y, wg:Wg, wb:Wb, wa:Wa) -> Level = native;
coin(x:X, y:Y, w:Wc) -> Level = native;
cannon(x:X, y:Y, wg:Wg, wb:Wb, wa:Wa) -> Level = native;
tube(x:X, y:Y, wg:Wg, wb:Wb, wa:Wa) -> Level = native;
boxes(t:BoxType, b:{List<Pair<X,Y>> | 2 <= b.size && b.size <= 6}) -> Level = native;

type Enemy;
koopa(x:X) -> Enemy = native;
goompa(x:X) -> Enemy = native;

generateLevel() -> l:Level where ( @maximize(engagement(l)) and @minimize(frustration(l)) and @maximize(challenge(l)) ) { }

Listing 1.4: Super Mario Bros Level Design
Compared with the proposed grammar [32], the complexity is similar and the productions in either version correspond directly. The Æon version is arguably more expressive because combinations of repeated objects with a minimum and maximum number of repetitions can be bounded using types (enemies and boxes).
5.3 Logical Gates
The third example is taken from [27], where the goal is to "given any logical function, find a logically equivalent symbolic expression that uses only the operators in one of the three following complete sets: and, or, not, nand, nor". The authors propose a Christiansen grammar, which is context-sensitive, to express this problem. Listing 1.5 presents a simpler implementation of the problem using Æon. It can be argued that the implementation using refinements follows more directly from the problem statement than the complex dynamic grammar used in [27]. Furthermore, the implementation of the operations can be done directly in the same language.
set(x:Boolean) -> y:Boolean = uninterpreted;

andG(x:Boolean, y:Boolean) -> z:Boolean
    where ( set(x) == 1 and set(y) == 1 and set(z) == 1 ) = { x && y }
or(x:Boolean, y:Boolean) -> z:Boolean
    where ( set(x) == 1 and set(y) == 1 and set(z) == 1 ) = { x || y }
not(x:Boolean) -> z:Boolean
    where ( set(x) == 1 and set(z) == 1 ) = { !x }
nand(x:Boolean, y:Boolean) -> z:Boolean
    where ( set(x) == 2 and set(y) == 2 and set(z) == 2 ) = { !(x && y) }
nor(x:Boolean, y:Boolean) -> z:Boolean
    where ( set(x) == 3 and set(y) == 3 and set(z) == 3 ) = { !(x || y) }

target : (x:Boolean, ..., z:Boolean) -> e:Boolean where ( e == f(x,...z) ) { }

Listing 1.5: Equivalent logical gates to a given function f.
6 Discussion
This section compares RTGP with GGGP and presents arguments for why RTGP could be used instead of GGGP. Because Dependent Types can encode grammars [5], the expressive power of both approaches is equivalent.
6.1 A Direct Comparison with GGGP
A survey on GGGP [21] identified the advantages and disadvantages of GGGP. We compare RTGP against each of the advantages:
- Ability to declaratively restrict the search space—A type system is used instead of a grammar to express the restriction.
- Problem Structure—Problem domains that already follow a grammar structure can be easily encoded in RTGP. RTGP can encode several problems more directly than a grammar; two examples are general-purpose programming and the Logical Gates problem (§5.3).
- Homologous Operators—Both GGGP and RTGP restrict the replacement of one component to another of a similar kind.
- Flexible Extension—Extensions to GP can be encoded both in grammars and in dependent types. Both approaches can be used as engines to test other GP concepts.
And against the disadvantages:
- Feasibility Constraints—Both GGGP and RTGP make the design of new operators a more significant challenge than in STGP, given that operators must follow the constraints imposed by the system. However, all RTGP operators are shared among all problems and rely solely on two algorithms: the type-checker and expression synthesis.
- Repair Mechanisms—Implementing repair in GGGP often depends on the grammar. RTGP relies on an expression synthesis algorithm (§3.1) that generates individuals in such a way that constraints are never violated; the same holds for the mutation and crossover operators. Nevertheless, a repair mechanism is straightforward: the type-checker identifies the offending node, and expression synthesis generates a replacement.
- Limited Flexibility—GGGP is flexible when the program can be directly encoded in a context-free grammar. Some GGGP approaches use context-sensitive grammars [27], but readability can become a problem (explained in §6.2).
- Turing Incompleteness—GGGP supports grammars whose semantics allow the encoding of both Turing-complete and Turing-incomplete problems. As such, GGGP does not offer any additional support for computation paradigms such as recursion and iteration, like other GP systems. RTGP supports both recursion and iteration directly, unless otherwise specified.
6.2 Usability
Instead, the main argument for RTGP over GGGP is one of usability. First, RTGP provides an integrated environment for describing the context, the problem, the search space and the solutions. Taking Æon as an example of RTGP, the environment in which the final program will execute can be defined, relying on native functions to use software written in other programming languages. The problem is defined using refined types for the goal of the system, and a hole marker is left as a placeholder for the program we are looking for. The search space is defined by the types used in the problem definition. Finally, the solution is a program in the same language as everything else, so it is ready to execute (and be evaluated).
On the other hand, if one were to use GGGP, one would have to create each of these components individually. The lack of a de facto standard framework for GGGP reinforces this argument, as interfacing with the context can be more complicated than implementing GGGP itself. GGGP concerns only the description of the search space, while RTGP provides an integrated view of using GP.
The strongest point for RTGP is that it does not require the user to define a grammar. Just by placing holes in a program, users can use RTGP without even knowing how to define a grammar. Instead, they need to know how to use a familiar programming language (which implementing GGGP already requires) and how to express desired properties in refined types. While refined types have not yet become mainstream, several languages have supported subsets of their features for a long time. Eiffel [23] has supported pre- and post-conditions since 1986. Ada is another language that supports design by contract [4], and it is very popular for critical embedded development, being used for large projects in air traffic control [16] with more than 1 million lines of code. Advanced type systems have become more popular as a way to prevent bugs in codebases: Mozilla created Rust to avoid concurrency issues in the Firefox browser [20], and Microsoft is using PL and SMT-based techniques to verify low-level critical components of the kernel and drivers [1].
7 Conclusions and Future Work
We have presented Refinement Typed Genetic Programming (RTGP) as an approach to describe search problems in an integrated programming language. We have introduced a language, Æon, capable of expressing the environment, the fitness function, the search space, and the solution. The language features an advanced type system with liquid and non-liquid types. We have provided a methodology to generate the fitness function from non-liquid refined types, and we have introduced an algorithm that generates expressions from any inhabitable type in this language.
In §6 we compared RTGP against GGGP, concluding that they are equivalent in expressiveness. However, we argue that RTGP provides better usability for end-users than GGGP, in which all aspects of the evolution have to be implemented. Furthermore, expressing restrictions in types allows more modular programs and better readability inside an integrated experience for defining and using RTGP.
There are still some aspects to explore with regard to RTGP: identifying the most efficient representation; improving liquid type synthesis; finding the best representation for non-functional properties of programs; integrating this synthesis in an editor; and performing an exhaustive benchmark performance analysis.