# Drill and Join: A Method for Exact Inductive Program Synthesis

## Abstract

In this paper we propose a novel semi-supervised active machine-learning method, based on two recursive higher-order functions, that can inductively synthesize a functional computer program. Based on properties formulated in abstract-algebra terms, the method combines two strategies: reducing the dimensionality of the Boolean algebra where a target function lies, and combining known operations belonging to the algebra, using them as a basis to build a program that emulates the target function. The method queries for data at specific points of the problem input space and builds a program that exactly fits the data. Applications of this method include all sorts of systems based on bitwise operations. Any functional computer program with fixed-length input and output can be emulated using this approach. Combinatorial circuit design, model acquisition from sensor data, and reverse engineering of existing computer programs are all fields where the proposed method can be useful.
Remis Balaniuk
Universidade Católica de Brasília and Tribunal de Contas da União, Brazil
Email: remis@robotics.stanford.edu
1 Introduction
Induction means reasoning from the specific to the general. In inductive learning from examples, general rules are derived from input/output (I/O) examples or from answers to questions. Inductive machine learning has been successfully applied to a variety of classification and prediction problems.

Inductive program synthesis (IPS) builds from examples the computation required to solve a problem. The problem must be formulated as a task of learning a concept from examples, referred to as inductive concept learning. A computer program is automatically created from an incomplete specification of the concept to be implemented, also referred to as the target function.

Research on inductive program synthesis started in the seventies. Since then it has been studied in several different research fields and communities such as artificial intelligence (AI), machine learning, inductive logic programming (ILP), genetic programming, and functional programming.

One basic approach to IPS is to simply enumerate programs of a defined set until one is found that is consistent with the examples. Due to combinatorial explosion, this general enumerative approach is too expensive for practical use. Summers proposed an analytical approach to induce functional Lisp programs without search. However, due to the strong constraints imposed on the forms of I/O examples and inducible programs in order to avoid search, only relatively simple functions can be induced. Several variants and extensions of Summers' method have been proposed. Kitzelmann proposed a combined analytical and search-based approach. Albarghouthi proposed ESCHER, a generic algorithm that interacts with the user via I/O examples and synthesizes recursive programs implementing the intended behavior.

Hybrid methods propose the integration of inductive inference and deductive reasoning. Deductive reasoning usually requires high-level specifications such as a formula in a suitable logic, a background theory for semantic correctness specification, constraints, or a set of existing components as candidate implementations. Some examples of hybrid methods are syntax-guided synthesis, the sciduction methodology and the oracle-guided component-based program synthesis approach.
IPS is usually associated with functional programming. In functional code the output value of a function depends only on the arguments that are input to the function. It is a declarative programming paradigm. Side effects that cause changes in state that do not depend on the function inputs are not allowed. Programming in a functional style can usually be accomplished in languages that aren't specifically designed for functional programming. In early debates around programming paradigms, conventional imperative programming and functional programming were compared and discussed. John Backus, in his work on programs as mathematical objects, supported the functional style of programming as an alternative to the "ever growing, fat and weak" conventional programming languages. Backus identified as inherent defects of imperative programming languages their inability to effectively use powerful combining forms for building new programs from existing ones, and their lack of useful mathematical properties for reasoning about programs. Functional programming, and more particularly function-level programming, is founded on the use of combining forms for creating programs that allow an algebra of programs.
Our work is distinguishable from previous works on IPS in a number of ways:
– Our method is based on function-level programming. Programs are built directly from programs that are given at the outset, by combining them with program-forming operations.
– Our method is based on active learning. Active learning is a special case of machine learning in which the learning algorithm can control the selection of the examples that it generalizes from and can query one or more oracles to obtain examples. The oracles could be implemented by evaluation/execution of a model on a concrete input, or they could be human users. Most existing IPS methods supply the set of examples at the beginning of the learning process, without reference to the learning algorithm. Some hybrid synthesis methods use active learning to generate examples for a deductive procedure. Our method defines a learning protocol that queries the oracle to obtain the desired outputs at new data points.
– Our method generates programs in a very low-level declarative language, compatible with most high-level programming languages. Most existing methods are conceived for, and restricted to, specific high-level source languages. Our whole method is based on Boolean algebra. Inputs and outputs are bit vectors and the generated programs are Boolean expressions. The synthesized program can also be used for combinatorial circuit design, describing the sequences of gates required to emulate the target function. Research on reconfigurable supercomputing is very interested in providing compilers that translate algorithms directly into circuit designs expressed in a hardware description language, in order to avoid the high cost of hand-coding custom circuit designs.
– The use of Boolean algebra and abstract algebra concepts at the basis of the method defines a rich formalism. The space where a program is to be searched, or synthesized, corresponds to a well-defined family of operations that can be reused, combined and ordered.
– Our method can be applied to general-purpose computing. A program generated using our method, computing an output bit vector from an input bit vector, is equivalent to a system of Boolean equations or a set of truth tables. However, bit vectors can also be used to represent any kind of complex data type, like floating point numbers and text strings. A functionally complete set of operations performed on arbitrary bits is enough to compute any computable value. In principle, any Boolean function can be built up from a functionally complete set of logic operators. In logic, a functionally complete set of logical connectives or Boolean operators is one which can be used to express all possible truth tables by combining members of the set into a Boolean expression. Our method synthesizes Boolean expressions based on the logic operator set {XOR, AND}, which is functionally complete.
– If the problem has a total functional behavior and enough data is supplied during the learning process, our method can synthesize the exact solution.
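The functional completeness claim can be checked directly on small cases. The sketch below is a minimal illustration in Python, assuming bits are encoded as the integers 0 and 1; the helper names `xor`, `and_`, `not_` and `or_` are hypothetical and not part of the paper. It rebuilds NOT and OR from XOR and AND, using the constant 1 (T) that the paper's bases also include.

```python
from itertools import product

def xor(a, b):
    """Ring addition in Z_2."""
    return a ^ b

def and_(a, b):
    """Ring multiplication in Z_2."""
    return a & b

def not_(a):
    """NOT a, written as a XOR 1."""
    return xor(a, 1)

def or_(a, b):
    """a OR b, written as a XOR b XOR (a AND b)."""
    return xor(xor(a, b), and_(a, b))

# Compare the derived connectives with Python's own operators on every input.
for a, b in product((0, 1), repeat=2):
    assert not_(a) == 1 - a
    assert or_(a, b) == (a | b)
```

Since NOT and OR are recovered, any truth table that can be written with the usual connectives can also be written with XOR, AND and T.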
This paper is organized as follows. Sections 2 and 3 review some relevant mathematical concepts. Sections 4 and 5 describe the mathematics of the method. Section 6 presents the method itself and how the programs are synthesized. Section 7 presents the main algorithms. Section 8 shows a Common Lisp implementation of the simplest version of the method. Sections 9, 10 and 11 close the document, discussing the method, possible applications and future work.
2 Boolean ring, F2 field, Boolean polynomials and Boolean functions
A Boolean ring is essentially equivalent to a Boolean algebra, with ring multiplication corresponding to conjunction ($\wedge$) and ring addition to exclusive disjunction or symmetric difference ($\oplus$, or XOR). In logic, the combination of the operators $\oplus$ (XOR, or exclusive OR) and $\wedge$ (AND) over the elements true and false produces the Galois field $F_2$, which is extensively used in digital logic and circuitry. This field is functionally complete, can represent any logic obtainable with the system $(\wedge, \vee)$, and can also be used as a standard algebra over the set of the integers modulo 2 (the binary numbers 0 and 1).¹ Addition has an identity element (false) and an inverse for every element. Multiplication has an identity element (true) and an inverse for every element but false.
Let $B$ be a Boolean algebra and consider the associated Boolean ring. A Boolean polynomial in $B$ is a string that results from a finite number of Boolean operations on a finite number of elements in $B$. A multivariate polynomial over a ring has a unique representation as a XOR-sum of monomials. This gives a normal form for Boolean polynomials:

$$\bigoplus_{J \subseteq \{1,2,\ldots,n\}} a_J \prod_{j \in J} x_j \tag{1}$$

where the $a_J \in B$ are uniquely determined. This representation is called the algebraic normal form. A Boolean function of $n$ variables $f: \mathbb{Z}_2^n \to \mathbb{Z}_2$ can be associated with a Boolean polynomial by deriving an algebraic normal form.
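One standard way to derive the coefficients $a_J$ of equation 1 from a truth table is the binary Möbius transform. This is a well-known technique and not the synthesis method proposed in this paper. The Python sketch below assumes the truth table is a list of 0/1 values of length $2^n$ in which bit $j$ of the index holds the value of the $(j+1)$-th variable; `anf_coefficients` and `anf_terms` are hypothetical helper names.

```python
def anf_coefficients(truth_table):
    """Binary Moebius transform: truth table -> algebraic normal form coefficients."""
    coeffs = list(truth_table)
    size = len(coeffs)
    step = 1
    while step < size:
        for mask in range(size):
            if mask & step:
                # Fold in the entry whose index lacks this variable.
                coeffs[mask] ^= coeffs[mask ^ step]
        step <<= 1
    return coeffs

def anf_terms(coeffs, names):
    """Render the non-zero monomials as a readable XOR-sum."""
    terms = []
    for mask, c in enumerate(coeffs):
        if c:
            factors = [n for j, n in enumerate(names) if mask >> j & 1]
            terms.append("*".join(factors) or "1")
    return " XOR ".join(terms) or "0"

# Truth table indexed by x + 2*y: f(0,0)=1, f(1,0)=0, f(0,1)=1, f(1,1)=1.
coeffs = anf_coefficients([1, 0, 1, 1])
print(anf_terms(coeffs, ["x", "y"]))   # 1 XOR x XOR x*y
```

Index 0 holds the constant term, index 1 the coefficient of x, index 2 that of y and index 3 that of the monomial xy.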
3 Abstract algebra and higher order functions
Let us consider a generic functional setting having as domain the set of bit strings of a finite, defined length and as range the set {true, false}, or the binary numbers 0 and 1, represented by one bit. This setting can represent the inputs and output of a logic proposition, a Boolean function, a truth table or a fraction of a functional program corresponding to one of its output bits.²

In abstract algebra terms, this setting defines a finitary Boolean algebra consisting of a finite family of operations on $\{0,1\}$ having the input bit string as their arguments. The length of the bit string is the arity of the operations in the family. An $n$-ary operation can be applied to any of the $2^n$ possible values of its $n$ arguments. For each choice of arguments an operation may return 0 or 1, whence there are $2^{2^n}$ possible $n$-ary operations in the family. In this functional setting, IPS can be seen as the synthesis of an operation that fits a set of I/O examples inside its family. The Boolean algebra defines our program space.
Let $V_n$ be the set of all binary words of length $n$, with $|V_n| = 2^n$. The Boolean algebra $B$ on $V_n$ is a vector space over $\mathbb{Z}_2$. Because it has $2^{2^n}$ elements, it is of dimension $2^n$ over $\mathbb{Z}_2$. This correspondence between an algebra and our program space defines some useful properties. The operations in a family need not all be explicitly stated. A basis is any set of operators from which the remaining operations can be obtained by composition. A Boolean algebra may be defined from any of several different bases. To be a basis is to yield all other operations by composition, whence any two bases must be intertranslatable.

A basis is a linearly independent spanning set. Let $v_1, \ldots, v_m \in B$ be a basis of $B$. Then $\mathrm{span}(v_1, \ldots, v_m) = \{\lambda_1 v_1 \oplus \cdots \oplus \lambda_m v_m \mid \lambda_1, \ldots, \lambda_m \in \mathbb{Z}_2\}$. The dimension $\dim(B)$ of the Boolean algebra is the minimum $m$ such that $B = \mathrm{span}(v_1, \ldots, v_m)$.

¹ Throughout this paper we use 0, F or false interchangeably for the binary number 0, and 1, T or true for the binary number 1.
² Bit strings can represent any complex data type. Consequently, our functional setting includes any functional computer program having fixed-length input and output.
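The dimension count can be verified by brute force for $n = 2$. The following Python sketch assumes functions are represented by their truth-table vectors over the four input points, and uses the monomials x, y, xy and T as the basis; the names `points`, `basis` and `span` are illustrative only.

```python
from itertools import product

points = list(product((0, 1), repeat=2))      # the 4 inputs (x, y)

# Truth-table vectors of the four basis functions.
basis = {
    "x":  tuple(x for x, y in points),
    "y":  tuple(y for x, y in points),
    "xy": tuple(x & y for x, y in points),
    "T":  tuple(1 for _ in points),
}

def span(vectors):
    """All XOR-combinations, with coefficients in Z_2, of the given vectors."""
    vectors = list(vectors)
    reached = set()
    for coeffs in product((0, 1), repeat=len(vectors)):
        combo = tuple(0 for _ in points)
        for c, v in zip(coeffs, vectors):
            if c:
                combo = tuple(a ^ b for a, b in zip(combo, v))
        reached.add(combo)
    return reached

all_functions = span(basis.values())
print(len(all_functions))   # 16
```

Four basis vectors reach all $2^{2^2} = 16$ two-input Boolean functions, so the sixteen coefficient choices give sixteen distinct functions and the set is linearly independent.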
The method proposed in this paper consists of two combined strategies: reducing the dimensionality of the Boolean algebra where a target function lies, and combining known operations in the algebra, using them as a basis to synthesize a program that emulates the target function.

Both strategies are implemented using two recursive higher-order functions that we created and named the drill and the join.

In computer science, higher-order functions are functions that take one or more functions as input and output a function. They correspond to linear mappings in mathematics.
4 The Drill function
We define the set $F_m$ of functions $f: \mathbb{Z}_2^p \times \mathbb{Z}_2^q \to \mathbb{Z}_2$ containing Boolean functions belonging to a Boolean algebra of dimension $m$, described in polynomial form:

$$f(X, Y) = \bigoplus_{i=1}^{m} g_i(X)\, h_i(Y) \tag{2}$$

where $g_i: \mathbb{Z}_2^p \to \mathbb{Z}_2$ and $h_i: \mathbb{Z}_2^q \to \mathbb{Z}_2$ are also Boolean functions. Note that equations 1 and 2 are equivalent and interchangeable. Equation 2 only splits its input space into two disjoint subsets, with $p + q = n$.

Considering a function $f \in F_m$, a chosen $X_0 \in \mathbb{Z}_2^p$ and a chosen $Y_0 \in \mathbb{Z}_2^q$ such that $f(X_0, Y_0) \neq 0$, we define the drill higher-order function:

$$\mathbb{F}_{X_0 Y_0} = \mathbb{F}(f(X, Y), X_0, Y_0) = f(X, Y) \oplus (f(X_0, Y) \wedge f(X, Y_0)) \tag{3}$$

Note that the function $\mathbb{F}$ outputs a new function and has as inputs the function $f$ and instances of $X$ and $Y$, defining a position in the input space of $f$.

Theorem: If $f \in F_m$ and $f(X_0, Y_0) \neq 0$, then $f' = \mathbb{F}_{X_0 Y_0} \in F_r$ and $r \le m - 1$.

Proof: Consider $W = \mathrm{span}(h_1, \ldots, h_m)$. Consequently $\dim(W) \le m$. The linear operator $h \in W \mapsto h(Y_0)$ is not the zero map because the hypothesis forbids $h_i(Y_0) = 0$ for all $i = 1, \ldots, m$. Consequently, the vector subspace $W' = \{h \in W \mid h(Y_0) = 0\}$ has $\dim(W') \le m - 1$. Notice that for all $X \in \mathbb{Z}_2^p$ we have $f'(X, \cdot) \in W'$. In fact:

$$f'(X, Y_0) = f(X, Y_0) \oplus (f(X_0, Y_0) \wedge f(X, Y_0)) = 0 \tag{4}$$

Let $r = \dim(W')$ and $h'_i$, $i = 1, \ldots, r$, be a spanning set such that $W' = \mathrm{span}(h'_1, \ldots, h'_r)$. For all $X \in \mathbb{Z}_2^p$, $f'(X, \cdot)$ can be represented as a linear combination of the $h'_i$, the coefficients depending on $X$. In other words, there exist coefficients $g'_i(X)$ such that:

$$f'(X, \cdot) = \bigoplus_{i=1}^{r} g'_i(X)\, h'_i \tag{5}$$

or, written differently and recalling that $r = \dim(W')$ and consequently $r \le m - 1$:

$$f'(X, Y) = \bigoplus_{i=1}^{r} g'_i(X)\, h'_i(Y) \qquad \square \tag{6}$$
As an illustration let us consider a Boolean function $f: \mathbb{Z}_2 \times \mathbb{Z}_2 \to \mathbb{Z}_2$ whose behavior is described by Table 1.

Table 1. Truth table for f(x, y)

| x | y | f(x, y) |
|---|---|---|
| F | F | T |
| T | F | F |
| F | T | T |
| T | T | T |

One possible representation for this function would be $f(x, y) = y \oplus (\neg x \wedge \neg y)$. $f$ is of dimension 2 (no shorter representation is possible). Note that this representation is given here for the illustration's sake. The method does not require high-level definitions, only I/O examples.

Respecting the stated hypothesis we can pick $x_0 = F$ and $y_0 = F$, since $f(F, F) = T$. The partial functions obtained will be $f(x_0, y) = y \oplus (T \wedge \neg y)$ and $f(x, y_0) = F \oplus (\neg x \wedge T)$. Applying the drill function we obtain:

$$f'(x, y) = \mathbb{F}(f(x, y), x_0, y_0) = (y \oplus (\neg x \wedge \neg y)) \oplus ((y \oplus (T \wedge \neg y)) \wedge (F \oplus (\neg x \wedge T))) = (x \wedge y) \tag{7}$$

We can see that $f'(x, y)$ is of dimension 1, confirming the stated theorem.
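The drill step of equation 3 is short enough to check numerically. The Python sketch below assumes bits are the integers 0 and 1 and that the target is available as an ordinary function; `f` encodes Table 1 and `drill` is a hypothetical helper that returns the new function rather than a program.

```python
from itertools import product

def f(x, y):
    """Target function of Table 1: y XOR (NOT x AND NOT y)."""
    return y ^ ((1 - x) & (1 - y))

def drill(fn, x0, y0):
    """Equation 3: f'(x, y) = f(x, y) XOR (f(x0, y) AND f(x, y0))."""
    assert fn(x0, y0) == 1, "the drill point must satisfy f(x0, y0) != 0"
    return lambda x, y: fn(x, y) ^ (fn(x0, y) & fn(x, y0))

f1 = drill(f, 0, 0)

# The drilled function agrees with x AND y on the whole input space.
for x, y in product((0, 1), repeat=2):
    assert f1(x, y) == (x & y)
```

The drilled function vanishes on the whole row $x = x_0$ and the whole column $y = y_0$, which is what removes one dimension.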
5 The Join function
Consider now the set $F_m$ of Boolean functions $f: \mathbb{Z}_2^n \to \mathbb{Z}_2$ and a basis $v_1, \ldots, v_m \in F_m$. The functions in this set can be described in polynomial form as:

$$f(X) = \bigoplus_{i=1}^{m} \lambda_i v_i(X) \tag{8}$$

where the $\lambda_i \in \mathbb{Z}_2$ are the coefficients.

Considering a function $f \in F_m$, a chosen $X_j \in \mathbb{Z}_2^n$ such that $f(X_j) \neq 0$ and a chosen function $v_j$ belonging to the basis such that $v_j(X_j) \neq 0$, we define the join function:

$$\mathbb{H}_{X_j v_j} = \mathbb{H}(f(X), X_j, v_j) = f(X) \oplus v_j(X) \tag{9}$$

Theorem: If $f \in F_m$, $f(X_j) \neq 0$ and $v_j(X_j) \neq 0$, then $f' = \mathbb{H}_{X_j v_j} \in F_r$ and $r \le m - 1$.

Proof: Consider $W = \mathrm{span}(v_1, \ldots, v_m)$. Consequently $\dim(W) \le m$. The linear operator $v \in W \mapsto v(X_j)$ is not the zero map, otherwise $v_j(X_j) = 0$. Consequently, the vector subspace $W' = \{f \in W \mid f(X_j) = 0\}$ has $\dim(W') \le m - 1$. We can see that $f' \in W'$. In fact:

$$f'(X_j) = f(X_j) \oplus v_j(X_j) = 0 \tag{10}$$

Let $r = \dim(W')$ and $v'_i$, $i = 1, \ldots, r$, be a spanning set such that $W' = \mathrm{span}(v'_1, \ldots, v'_r)$. The function $f'$ can be represented as a linear combination of the $v'_i$. In other words, and recalling that $r = \dim(W')$ so $r \le m - 1$, there exist coefficients $\lambda'_i$ such that:

$$f'(X) = \bigoplus_{i=1}^{r} \lambda'_i v'_i(X) \qquad \square \tag{11}$$
We can use the same function $f: \mathbb{Z}_2 \times \mathbb{Z}_2 \to \mathbb{Z}_2$ described by Table 1 to illustrate the behavior of the join higher-order function. $f$ belongs to a Boolean algebra of dimension $2^2$ which can be defined, for instance, by the following spanning set: $v_1(x, y) = x$, $v_2(x, y) = y$, $v_3(x, y) = x \wedge y$, $v_4(x, y) = T$. Respecting the stated hypothesis we can pick $X_j = (T, T)$ and $v_j = v_1$, since $f(T, T) = T$ and $v_1(T, T) = T$. Applying the join function we obtain:

$$f'(x, y) = \mathbb{H}(f(X), X_j, v_j) = (y \oplus (\neg x \wedge \neg y)) \oplus x = T \oplus (x \wedge y) \tag{12}$$

We can see that $f'(X_j) = F$ and that $f'$ needs one basis function fewer than $f = T \oplus x \oplus (x \wedge y)$, in agreement with the stated theorem.
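The join step of equation 9 can be checked in the same way. The Python sketch below assumes 0/1 bits, takes `f` from Table 1 and `v1(x, y) = x` from the spanning set above; `join` is a hypothetical helper that returns the new function. Evaluating it on all four inputs gives the truth table of T XOR (x AND y), which is zero at the chosen point (T, T).

```python
from itertools import product

def f(x, y):
    """Target function of Table 1: y XOR (NOT x AND NOT y)."""
    return y ^ ((1 - x) & (1 - y))

def v1(x, y):
    """Basis function v1(x, y) = x."""
    return x

def join(fn, xj, vj):
    """Equation 9: f'(X) = f(X) XOR vj(X), for a point xj where both are non-zero."""
    assert fn(*xj) == 1 and vj(*xj) == 1, "hypothesis: f(Xj) != 0 and vj(Xj) != 0"
    return lambda *args: fn(*args) ^ vj(*args)

f1 = join(f, (1, 1), v1)

for x, y in product((0, 1), repeat=2):
    assert f1(x, y) == 1 ^ (x & y)
assert f1(1, 1) == 0
```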
6 The Drill & Join program synthesis method
Drill and join are used to define a program synthesis method. Considering an active learning framework, the input function $f(X, Y)$ of $\mathbb{F}$ and the input function $f(X)$ of $\mathbb{H}$ represent an external unknown concept from which it is possible to obtain data by means of queries (input-output examples).

This unknown concept could be, for instance, some physical phenomenon that a machine with sensors and actuators can actively experiment with, an algorithm to be translated into a hardware description language, a computer program that one would like to emulate or optimize, or a decision process to be implemented on a computer for which one or more experts are able to answer the required questions.
In order to understand the method it is important to notice two properties of both higher-order functions:
– $\mathbb{F}$ and $\mathbb{H}$ can be applied recursively: if $f(X, Y) \in F_m$ then $f_1(X, Y) = \mathbb{F}(f(X, Y), X_0, Y_0) \in F_{m-1}$ and $f_2(X, Y) = \mathbb{F}(f_1(X, Y), X_1, Y_1) \in F_{m-2}$. Similarly, if $f(X) \in F_m$ then $f_1(X) = \mathbb{H}(f(X), X_0, v_0) \in F_{m-1}$ and $f_2(X) = \mathbb{H}(f_1(X), X_1, v_1) \in F_{m-2}$. Each recursion generates a new function belonging to an algebra of a lower dimension.
– The recursion ends when the higher-order functions become the zero map: $\mathbb{F}(f(X, Y), X_i, Y_i) = 0 \Rightarrow f(X, Y) = (f(X_i, Y) \wedge f(X, Y_i))$ and similarly $\mathbb{H}(f(X), X_i, v_i) = 0 \Rightarrow f(X) = v_i(X)$.

The first property enables us to apply the same higher-order function recursively in order to gradually reduce the dimensionality of the initial problem. The second defines a stop condition. As a result, the output program is obtained.

$$\forall X, Y \in \mathbb{Z}_2^p \times \mathbb{Z}_2^q : f_{m+1}(X, Y) = \mathbb{F}(f_m(X, Y), X_m, Y_m) = 0 \Rightarrow f_m(X, Y) = (f_m(X_m, Y) \wedge f_m(X, Y_m))$$

Replacing $f_m(X, Y) = \mathbb{F}(f_{m-1}(X, Y), X_{m-1}, Y_{m-1})$ gives us:

$$f_{m-1}(X, Y) \oplus (f_{m-1}(X_{m-1}, Y) \wedge f_{m-1}(X, Y_{m-1})) = (f_m(X_m, Y) \wedge f_m(X, Y_m)) \tag{13}$$

and consequently:

$$f_{m-1}(X, Y) = (f_{m-1}(X_{m-1}, Y) \wedge f_{m-1}(X, Y_{m-1})) \oplus (f_m(X_m, Y) \wedge f_m(X, Y_m)) \tag{14}$$

Tracking the recursion back to the beginning we obtain:

$$f(X, Y) = \bigoplus_{i=1}^{m} f_i(X_i, Y) \wedge f_i(X, Y_i) \tag{15}$$

Equation 15 tells us that the original target function $f$ can be recreated using the partial functions $f_i$ obtained using the drill function. The partial functions $f_i$ are simpler problems, defined on subspaces of the original target function.
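The reconstruction stated by equation 15 can be traced on the two-bit example. The Python sketch below assumes 0/1 bits and one-bit X and Y; it keeps the partial functions as closures instead of synthesizing programs for them, and `find_nonzero`, `drill_terms` and `rebuilt` are hypothetical names. It requires Python 3.8 or later for the `:=` operator.

```python
from itertools import product

def f(x, y):
    """Target function of Table 1: y XOR (NOT x AND NOT y)."""
    return y ^ ((1 - x) & (1 - y))

def find_nonzero(fn):
    """First point (x, y) where fn is 1, or None if fn is the zero map."""
    for x, y in product((0, 1), repeat=2):
        if fn(x, y):
            return x, y
    return None

def drill_terms(fn):
    """Drill until the zero map; collect the pairs (f_i(X_i, .), f_i(., Y_i))."""
    terms = []
    while (point := find_nonzero(fn)) is not None:
        x0, y0 = point
        row = lambda y, fn=fn, x0=x0: fn(x0, y)      # f_i(X_i, Y)
        col = lambda x, fn=fn, y0=y0: fn(x, y0)      # f_i(X, Y_i)
        terms.append((row, col))
        fn = lambda x, y, fn=fn, row=row, col=col: fn(x, y) ^ (row(y) & col(x))
    return terms

terms = drill_terms(f)

def rebuilt(x, y):
    """XOR-sum of the collected products, as in equation 15."""
    value = 0
    for row, col in terms:
        value ^= row(y) & col(x)
    return value

assert all(rebuilt(x, y) == f(x, y) for x, y in product((0, 1), repeat=2))
```

On this example two drill steps are needed before the zero map is reached, matching the dimension 2 stated for the function of Table 1.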
Similarly:

$$\forall X \in \mathbb{Z}_2^n : f_{m+1}(X) = \mathbb{H}(f_m(X), X_m, v_m) = 0 \Rightarrow f_m(X) = v_m(X) \tag{16}$$

Replacing $f_m(X) = \mathbb{H}(f_{m-1}(X), X_{m-1}, v_{m-1})$ gives us:

$$f_{m-1}(X) \oplus v_{m-1}(X) = v_m(X) \tag{17}$$

and:

$$f_{m-1}(X) = v_{m-1}(X) \oplus v_m(X) \tag{18}$$

Tracking the recursion back to the beginning we obtain:

$$f(X) = \bigoplus_{i=1}^{m} v_i(X) \tag{19}$$

Equation 19 tells us that the original target function $f$ can be recreated using the partial functions $f_i$ obtained using the join function and the basis $v$.
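The join recursion behind equation 19 can be sketched for one-bit functions. The Python code below is an illustration under stated assumptions: bits are 0 and 1, the basis is {x, T} tried in that order, and each chosen basis function is removed from the list once used. `join_decompose` is a hypothetical name, and the sketch raises an error when no remaining basis function is non-zero at the chosen point.

```python
POINTS = [(0,), (1,)]                      # the one-bit input space

def join_decompose(fn, basis):
    """Names of basis functions whose XOR equals fn, found by repeated join steps."""
    remaining = list(basis)                # list of (name, function) pairs
    chosen = []
    while True:
        point = next((p for p in POINTS if fn(*p)), None)
        if point is None:                  # stop condition: fn is the zero map
            return chosen
        k = next((i for i, (_, v) in enumerate(remaining) if v(*point)), None)
        if k is None:
            raise ValueError("no remaining basis function is non-zero here")
        name, v = remaining.pop(k)
        chosen.append(name)
        fn = lambda x, fn=fn, v=v: fn(x) ^ v(x)   # join step: f XOR v

BASIS = [("x", lambda x: x), ("T", lambda x: 1)]

print(join_decompose(lambda x: 1 - x, BASIS))   # ['T', 'x']
```

For NOT x the only basis function that is non-zero at the first queried point is T, so T is selected first and x second; their XOR-sum is the target, as equation 19 states.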
Note that if the drill initial condition cannot be established, i.e., no $(X_0, Y_0)$ with $f(X_0, Y_0) \neq 0$ can be found, the target function is necessarily $f(X, Y) = F$. In the same way, if no $X_0$ with $f(X_0) \neq 0$ can be found to initiate join, the target function is the zero map $f(X) = F$. If there exists an $X_j$ with $f(X_j) \neq 0$ but no $v_j$ with $v_j(X_j) \neq 0$, the basis was not chosen appropriately.

The $\mathbb{F}$ higher-order function defines a double recursion. For each step of the dimensionality reduction recursion two subspace synthesis problems are defined: $f_i(X_i, Y)$ and $f_i(X, Y_i)$. Each of these problems can be treated as a new target function in a Boolean algebra of reduced arity, since part of the arguments is fixed. They can be recursively solved using $\mathbb{F}$ or $\mathbb{H}$ again.

The combination of $\mathbb{F}$ and $\mathbb{H}$ defines a very powerful inductive method. The $\mathbb{F}$ higher-order function alone requires the solution of an exponential number of subspace synthesis problems in order to inductively synthesize a target function. The $\mathbb{H}$ higher-order function requires a basis of the Boolean algebra. Considering the $2^n$ cardinality of a basis, it can be impractical to require its prior existence in large-arity algebras. Nevertheless, both functions combined can drastically reduce the number of queries and the prior definition of bases.

The method, detailed in the next sections, uses the following strategies:
– Bases are predefined for low-arity input spaces, enabling the join function.
– Synthesis on large-arity input spaces begins using the $\mathbb{F}$ function.
– Previously synthesized programs on a subspace can be memorized in order to compose a basis on that subspace.
– At each new synthesis, if a basis exists use $\mathbb{H}$, otherwise use $\mathbb{F}$.
To illustrate the functioning of the whole method let us use again the same function $f: \mathbb{Z}_2 \times \mathbb{Z}_2 \to \mathbb{Z}_2$ described by Tables 1 and 2. We can apply the drill function a first time, $f_1(X, Y) = \mathbb{F}(f(X, Y), X_0, Y_0)$ with $x_0 = F$, $y_0 = F$, defining two subspace problems: $f(x_0, y)$ and $f(x, y_0)$. Both problems can be solved using the join function and the basis $v_0(x) = x$, $v_1(x) = T$. $f_1$ is not the zero map, requiring a second recursion of drill: $f_2 = \mathbb{F}(f_1(X, Y), X_1, Y_1)$ with $x_1 = T$, $y_1 = T$. Two new subspace problems are defined, $f_1(x_1, y)$ and $f_1(x, y_1)$, and they can be solved using join and the same basis again. $f_2(X, Y) = F$ is finally the zero map, stopping the recursion. Table 2 shows the target function and the transformation steps performed using the drill function.

Table 2. Truth table for the drill steps

| x | y | f(x, y) | f(x0, y) | f(x, y0) | f1(x, y) | f1(x1, y) | f1(x, y1) | f2(x, y) |
|---|---|---|---|---|---|---|---|---|
| F | F | T | T | T | F | F | F | F |
| T | F | F | T | F | F | F | T | F |
| F | T | T | T | T | F | T | F | F |
| T | T | T | T | F | T | T | T | F |

To illustrate the use of the join function let us consider the reconstruction of $f(X) = f(x, y_0)$ detailed in Table 3. A first application $f_1(X) = \mathbb{H}(f(X), X_0, v_0)$ with $x_0 = F$ and $v_0(x) = x$ will not result in the zero map, as shown in the fifth column of Table 3, requiring a recursive call $f_2(X) = \mathbb{H}(f_1(X), X_1, v_1)$ with $x_1 = T$ and $v_1(x) = T$, which results in the zero map.

Table 3. Truth table for the join steps

| x | f(x) | v0(x) | v1(x) | f1(x) | f2(x) |
|---|---|---|---|---|---|
| F | T | F | T | T | F |
| T | F | T | T | T | F |

Using equation 19 we can find a representation for the partial target function: $f(X) = f(x, y_0) = v_0(x) \oplus v_1(x) = x \oplus T$. The same process can be used to find representations for all one-dimensional problems: $f(x_0, y) = T$, $f_1(x_1, y) = y$ and $f_1(x, y_1) = x$.

Having solved the partial problems we can use equation 15 to build the full target function: $f(x, y) = (f(x_0, y) \wedge f(x, y_0)) \oplus (f_1(x_1, y) \wedge f_1(x, y_1)) = (T \wedge (x \oplus T)) \oplus (y \wedge x)$, which is equivalent to our initial representation.
Each higher-order function underlies a query protocol. At each recursion of $\mathbb{F}$ one or more queries for data are made in order to find $(X_i, Y_i) \in \mathbb{Z}_2^p \times \mathbb{Z}_2^q$ such that $f_i(X_i, Y_i) \neq 0$. At each recursion of $\mathbb{H}$ a position $X_i \in \mathbb{Z}_2^n$ with $f_i(X_i) \neq 0$ and a function $v_i$ from the basis with $v_i(X_i) \neq 0$ are chosen. The queries require data from the target function and must be correctly answered by some kind of oracle, expert, database or system. Wrong answers make the algorithms diverge. Data is also necessary to verify the recursion stop condition. A full test of $f_i(X, Y) = 0$ or $f_i(X) = 0$, scanning the whole input space, can generate proof that the induced result exactly meets the target function. Partial tests can be enough to define candidate solutions subject to further inspection.
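The cost of this query protocol can be made visible by wrapping the target in an oracle object. The Python sketch below is an assumption-laden illustration: the `Oracle` class, its answer cache and the `find_nonzero` scan are hypothetical and are not described in the paper, which only requires that queries be answered correctly.

```python
from itertools import product

class Oracle:
    """Wraps a target function, caching answers and counting distinct queries."""

    def __init__(self, target):
        self._target = target
        self._answers = {}

    def __call__(self, *bits):
        if bits not in self._answers:          # only new points cost a query
            self._answers[bits] = self._target(*bits)
        return self._answers[bits]

    @property
    def queries(self):
        return len(self._answers)

def find_nonzero(fn, nbits):
    """Scan the input space; return the first non-zero point, or None (zero map)."""
    for bits in product((0, 1), repeat=nbits):
        if fn(*bits):
            return bits
    return None

oracle = Oracle(lambda x, y: y ^ ((1 - x) & (1 - y)))   # function of Table 1
first = find_nonzero(oracle, 2)        # found at the first queried point

zero = Oracle(lambda x, y: 0)
nothing = find_nonzero(zero, 2)        # proving the zero map needs the full scan
```

Finding a non-zero point can stop early, while confirming the stop condition by a full test always costs $2^n$ distinct queries.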
7 The main algorithms of the Drill&Join method
To explain how the drill and join higher-order functions can be used as program-forming functionals we propose the following two algorithms.

Drill takes as initial inputs the target function, fn, and the dimension of its input space, inputs. Deeper inside the recursion fn corresponds to $f_m(X, Y)$, inputs defines the dimension of its subspace and initial indicates its first free dimension. Drill returns a synthesized functional program that emulates fn.

    procedure Drill(fn, inputs, optional: initial = 0)
        if have a basis for this subspace then
            return Join(fn, inputs, initial);
        end if
        pos ← find a position inside this subspace where fn is not null;
        if pos = null then
            return FALSE;    ▷ Stop condition.
        end if
        fa(args) = fn(concatenate(args, secondhalf(pos)));    ▷ f(X, Y0)
        fb(args) = fn(concatenate(firsthalf(pos), args));    ▷ f(X0, Y)
        fc(args) = fn(args) ⊕ (fa(firsthalf(args)) ∧ fb(secondhalf(args)));    ▷ f_{m+1}(X, Y)
        pa = Drill(fa, inputs/2, initial);    ▷ Recursive call to synthesize f(X, Y0)
        pb = Drill(fb, inputs/2, initial + inputs/2);    ▷ Recursive call for f(X0, Y)
        pc = Drill(fc, inputs, initial);    ▷ Recursive call for f_{m+1}(X, Y)
        return '((' pa 'AND' pb ')' 'XOR' pc ')';    ▷ Returns the program.
    end procedure
Note that fa, fb and fc are functions based on fn, and pa, pb and pc are programs obtained by recursively calling Drill. firsthalf and secondhalf split a vector in two halves.

Join takes as input a function fn belonging to a Boolean algebra. The algorithm requires a basis for this Boolean algebra, materialized as an array of functions, basis[], and an array of programs emulating the basis, basisp[]. The algorithm creates a program that emulates fn by combining the programs of the basis.

    procedure Join(fn, inputs, optional: initial = 0)
        pos ← find a position inside this subspace where fn is not null;
        if pos = null then
            return FALSE;    ▷ Stop condition.
        end if
        v ← find a function v from the basis such that v(pos) is not 0;
        vp ← get the program that emulates v;
        fa(args) = fn(args) ⊕ v(args);
        pa = Join(fa, inputs, initial);    ▷ Recursive call for f_{m+1}(X)
        return '(' pa 'XOR' vp(initial) ')';    ▷ Returns the program.
    end procedure
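The two algorithms can be sketched compactly in Python. This is an illustrative sketch under stated assumptions, not the paper's implementation: inputs are tuples of 0/1 bits, a basis is assumed to exist only for one-bit subspaces ({x, T}, tried in that order), programs are nested tuples rather than strings, and the names `find_pos`, `join`, `drill` and `run` are hypothetical. It requires Python 3.8 or later.

```python
from itertools import product

def find_pos(fn, n):
    """Full search: first n-bit tuple on which fn returns 1, or None for the zero map."""
    for bits in product((0, 1), repeat=n):
        if fn(bits):
            return bits
    return None

def join(fn, index):
    """Join for one-bit functions; `index` is the position of the bit in the full input."""
    basis = [(lambda b: b[0], ("arg", index)), (lambda b: 1, True)]
    program = False
    while (pos := find_pos(fn, 1)) is not None:
        k = next(i for i, (v, _) in enumerate(basis) if v(pos))
        v, vp = basis.pop(k)
        fn = (lambda b, fn=fn, v=v: fn(b) ^ v(b))        # join step: f XOR v
        program = vp if program is False else ("xor", program, vp)
    return program

def drill(fn, n, initial=0):
    """Synthesize a program for fn over n bits, splitting the input in two halves."""
    if n == 1:                                           # a basis exists: use join
        return join(fn, initial)
    pos = find_pos(fn, n)
    if pos is None:                                      # stop condition
        return False
    h = n // 2
    x0, y0 = pos[:h], pos[h:]
    fa = lambda x: fn(x + y0)                            # f(X, Y0)
    fb = lambda y: fn(x0 + y)                            # f(X0, Y)
    fc = lambda b: fn(b) ^ (fa(b[:h]) & fb(b[h:]))       # drilled function
    pa = drill(fa, h, initial)
    pb = drill(fb, n - h, initial + h)
    pc = drill(fc, n, initial)
    term = ("and", pa, pb)
    return term if pc is False else ("xor", term, pc)

def run(program, bits):
    """Evaluate a synthesized program on a tuple of input bits."""
    if program is True:
        return 1
    if program is False:
        return 0
    if program[0] == "arg":
        return bits[program[1]]
    a, b = run(program[1], bits), run(program[2], bits)
    return a ^ b if program[0] == "xor" else a & b

majority = lambda b: (b[0] & b[1]) | (b[0] & b[2]) | (b[1] & b[2])
program = drill(majority, 3)
assert all(run(program, bits) == majority(bits) for bits in product((0, 1), repeat=3))
```

The only access to the target is through `find_pos`, so the sketch learns the three-bit majority function purely from queries and then checks the synthesized program on the whole input space.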
8 A Common Lisp version of the Drill&Join method
Common Lisp is a natural choice of programming language to implement the drill&join method. Lisp functions can take other functions as arguments to build and return new functions. Lisp lists can be interpreted and executed as programs.

The following code implements the simplest version of the method. It synthesizes a Lisp program that emulates a function fn by just querying it. The function fn must accept bit strings as inputs and must return one bit as the answer. In this simple illustration fn is another Lisp function, but in real use it would be an external source of data queried through an adequate experimental protocol.

Drill takes the unknown function fn and its number of binary arguments nargs as input. It returns a list composed of logical operators, the logical symbols nil and t, and references to an input list of arguments. The output list can be executed as a Lisp program that emulates fn.
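;; XOR is not part of ANSI Common Lisp (some implementations provide it);
;; this fallback is an assumption added so that the listing runs elsewhere.
(unless (fboundp 'xor)
  (defun xor (a b) (if a (not b) (and b t))))

(defun drill (fn nargs &optional (ipos 0) (slice 0))
  ;; BASE holds the one-bit basis functions {x, T}; BASEP builds the matching
  ;; program fragments. The leading NIL is a sentinel that lets FINDBASE
  ;; delete entries in place.
  (let ((base (list nil #'(lambda (x) (first x)) #'(lambda (x) (declare (ignore x)) t)))
        (basep (list nil #'(lambda (x) (list 'nth x 'args)) #'(lambda (x) (declare (ignore x)) t))))
    (if (= nargs 1)
        (join fn base basep ipos)
        (let ((pos (findpos fn nargs)))   ; FINDPOS takes two arguments; SLICE only counts drill steps
          (if (null pos)
              nil                         ; stop condition: FN is the zero map
              (labels ((fa (args) (funcall fn (append args (cdr pos))))         ; f(X, Y0)
                       (fb (args) (funcall fn (append (list (car pos)) args)))  ; f(X0, Y)
                       (fc (args) (xor (funcall fn args)                        ; drilled function
                                       (and (fa (list (car args))) (fb (cdr args))))))
                (let* ((sa (drill #'fa 1 ipos))
                       (sb (drill #'fb (1- nargs) (1+ ipos)))
                       (sc (drill #'fc nargs ipos (1+ slice)))
                       ;; (and T sb) is just sb, so a constant-T left factor is dropped.
                       (r1 (if (and (atom sa) sa) sb (list 'and sa sb))))
                  (if (null sc)
                      r1
                      (list 'xor r1 sc)))))))))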
In this implementation the split of the input space is done by choosing the first argument to be X and the rest to be Y. Drill will call Join when the recursion is down to a function of just one input (nargs=1). A basis for one-input-bit functions, {f(x) = x, f(x) = T}, is defined directly inside drill as a list of functions, base, and a list of programs, basep, which are passed as arguments to the join function, implemented as follows:

(defun join (fn base basep &optional (ipos 0))
  (let ((pos (findpos fn 1)))
    (if (null pos)
        nil                             ; stop condition: FN is the zero map
        (let ((fb (findbase base basep pos)))
          ;; FA is the join step: FN xor the selected basis function.
          (labels ((fa (args) (xor (funcall fn args) (funcall (nth 0 fb) args))))
            (let ((r (join #'fa base basep ipos)))
              (list 'xor (funcall (nth 1 fb) ipos) r)))))))
The findpos function is used to test whether fn is the zero map by performing a full search on the fn input space. It stops when a non-zero answer is found and returns its position. The full search means that no inductive bias was used.

(defun findpos (fn nargs)
  ;; Return the first argument list on which FN is non-NIL, or NIL if none exists.
  ;; Bit k of the counter I gives element k of the argument list.
  (loop for i from 0 below (expt 2 nargs)
        do (let ((l (loop for k from 0 below nargs collect (logbitp k i))))
             (when (funcall fn l)
               (return-from findpos l)))))

The findbase function is used inside join to find a function from the basis respecting the constraint $v_j(X_j) \neq 0$.

(defun findbase (base basep pos)
  ;; Return (function program-builder) for the first basis entry that is non-zero
  ;; at POS, and remove that entry from both lists so that it is not reused.
  ;; Index 0 holds the NIL sentinel, so the search starts at 1.
  (loop for i from 1 below (list-length base)
        do (when (funcall (nth i base) pos)
             (let ((fb (list (nth i base) (nth i basep))))
               (delete (nth 0 fb) base)
               (delete (nth 1 fb) basep)
               (return-from findbase fb)))))
The method queries the target function (funcall fn) only inside findpos, in order to find a non-zero position.

Using the Lisp code provided above it is possible to check the consistency of the method. Applied to our illustration described by Table 1, the generated program would be:
(XOR (AND (XOR T (NTH 0 ARGS)) T) (AND (NTH 0 ARGS) (NTH 1 ARGS)))
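The generated expression can be checked against Table 1 without a Lisp system. The Python sketch below is a minimal reader and evaluator for this small expression language only (XOR, AND, NTH, T and NIL); `tokenize`, `parse` and `evaluate` are hypothetical helpers and make no attempt to cover Lisp in general.

```python
def tokenize(text):
    """Split an s-expression into parentheses and atoms."""
    return text.replace("(", " ( ").replace(")", " ) ").split()

def parse(tokens):
    """Turn a token list into nested Python lists (consumes the tokens)."""
    token = tokens.pop(0)
    if token == "(":
        items = []
        while tokens[0] != ")":
            items.append(parse(tokens))
        tokens.pop(0)                      # drop the closing parenthesis
        return items
    return token

def evaluate(expr, args):
    """Evaluate a parsed expression on a list of Boolean arguments."""
    if expr == "T":
        return True
    if expr == "NIL":
        return False
    op = expr[0]
    if op == "NTH":                        # (NTH k ARGS) reads argument k
        return args[int(expr[1])]
    a, b = evaluate(expr[1], args), evaluate(expr[2], args)
    return (a != b) if op == "XOR" else (a and b)

PROGRAM = "(XOR (AND (XOR T (NTH 0 ARGS)) T) (AND (NTH 0 ARGS) (NTH 1 ARGS)))"
TABLE_1 = {(False, False): True, (True, False): False,
           (False, True): True, (True, True): True}

tree = parse(tokenize(PROGRAM))
assert all(evaluate(tree, list(xy)) == out for xy, out in TABLE_1.items())
```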
As a more advanced illustration, let us consider the "unknown" target function to be the Fibonacci sequence. To avoid bulky outputs we limit the illustration to an unsigned integer input between 0 and 63. Consequently, the input of the fn function can be a six-bit string. The range of the output (between 1 and 6557470319842) requires a 64-bit unsigned integer. The target function fibonacci(n) computes a long integer corresponding to the nth position of the Fibonacci sequence. To translate integers to the lists of {NIL, T} handled by the drill and join Lisp code we use their binary representation. The translation is done by the routines longint2bitlist, bitlist2longint, 6bitlist2int and 6int2bitlist, not included in this paper, called from the function myfibonacci(n):

(defun myfibonacci (n)
  ;; N is a six-element bit list; the result is the bit list of fibonacci(n).
  (let ((r (6bitlist2int n)))
    (longint2bitlist (fibonacci r))))
In order to synthesize a program able to emulate the whole target function we
need to call drill for each output bit and generate a list of Boolean expressions.
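(defun synthesis (fn nargs nouts filename)
  ;; Synthesize one Boolean expression per output bit and write the list to FILENAME.
  (with-open-file (outfile filename :direction :output :if-exists :supersede)
    (let ((l (make-list nouts)))
      (loop for i from 0 to (1- nouts) do
        ;; fa projects the target function onto its i-th output bit.
        (labels ((fa (args) (nth i (funcall fn args))))
          (let ((x (drill #'(lambda (args) (fa args)) nargs)))
            (setf (nth i l) x))))
      (print l outfile))))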
To run the generated program we need to compute each output bit and
translate the bit string into an integer:
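(defun runprogram (filename v)
  ;; Read the list of expressions, evaluate each one on input V, print the result.
  (with-open-file (infile filename)
    ;; EVAL sees only global bindings, so args is set globally before evaluation.
    (setq args (6int2bitlist v))
    (let* ((s (read infile))
           (r (make-list (list-length s))))
      (loop for i from 0 to (1- (list-length s)) do
        (setf (nth i r) (eval (nth i s))))
      (print (bitlist2longint r)))))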
A full check on the whole target function input space shows that the
synthesized program exactly emulates the target function. To illustrate what the
generated programs look like we show below the expression that computes the
first output bit of the Fibonacci sequence:
(XOR (AND (XOR (AND (XOR (AND (NTH 0 ARGS) (XOR T (NTH 1 ARGS))) (AND (XOR T
(NTH 0 ARGS)) (NTH 1 ARGS))) T) (AND (XOR (AND (XOR T (NTH 0 ARGS)) T) (AND
(NTH 0 ARGS) (NTH 1 ARGS))) (NTH 2 ARGS))) (XOR (AND (XOR (AND (XOR T (NTH 3 ARGS))
T) (AND (NTH 3 ARGS) (NTH 4 ARGS))) (XOR T (NTH 5 ARGS))) (AND (XOR (AND (NTH 3 ARGS)
(XOR T (NTH 4 ARGS))) (AND (XOR T (NTH 3 ARGS)) (NTH 4 ARGS))) (NTH 5 ARGS)))) (AND
(XOR (AND (XOR (AND (XOR T (NTH 0 ARGS)) T) (AND (NTH 0 ARGS) (NTH 1 ARGS))) (XOR T
(NTH 2 ARGS))) (AND (XOR (AND (NTH 0 ARGS) (XOR T (NTH 1 ARGS))) (AND (XOR T
(NTH 0 ARGS)) (NTH 1 ARGS))) (NTH 2 ARGS))) (XOR (AND (XOR (AND (NTH 3 ARGS)
(XOR T (NTH 4 ARGS))) (AND (XOR T (NTH 3 ARGS)) (NTH 4 ARGS))) T) (AND (XOR (AND
(XOR T (NTH 3 ARGS)) T) (AND (NTH 3 ARGS) (NTH 4 ARGS))) (NTH 5 ARGS)))))
The generated program is basically a single Boolean expression that explicitly
references an input list called args. The full program consists of a list of those
Boolean expressions.
9 Discussion
The implementation presented in Section 8 has didactic purposes only. A number
of enhancements are possible. Bases can be built dynamically on subspaces
by memorizing previously created programs, which avoids drilling down to smaller
subspaces and reduces the number of queries to the target function.
An interesting aspect of the generated programs is that the computation
of each output bit is completely independent of the others. There is a
specific program for each output bit, which enables parallel processing.
Empirical comparisons between the proposed method and existing ones are
difficult because of the conceptual differences between them. Existing benchmark
frameworks, such as CHStone [17] and SyGuS [13], tend to be specific to a certain
synthesis approach. CHStone was conceived for C-based high-level synthesis
and SyGuS for syntax-guided hybrid synthesis (inductive and deductive).
Conceptual comparisons can be made, but without objective results. The proposed
method can handle target functions that others probably cannot, but because
it works at the bit level of inputs and outputs, the number of examples required
for learning and testing tends to be larger than in other methods. Most existing
inductive methods use static example databases, while our method is based
on active learning and requires an experimental protocol in order to query the
target concept during the learning process. Our method requires a predefined
input space of fixed length and does not handle dynamic input lists, as in the
Lisp programs generated by most variants and extensions of Summers' [9]
analytical approach. On the other hand, the simplicity of the code generated
by our method, based on only two logical operators, enables its compilation in
almost any conventional programming language, and even in hardware description
languages. The declarative nature of the generated programs brings predictability
at runtime in terms of execution time and use of machine resources. The example
proposed in Section 8 showed how to synthesize a declarative program to emulate
the Fibonacci sequence. The generated program requires, for any six-bit input,
exactly 1804 low-level, bitwise logic operations to compute the corresponding
Fibonacci number. An equivalent imperative program will require local variables,
loop or recursion controls and a number of variable assignments and arithmetic
operations proportional to the input value. The method does not require any
prior knowledge about the target function, such as a background theory, types of
variables or operations on types, as hybrid methods do. Nevertheless, the goal
of this paper is not to prove that we created a better method but to present
a new concept in inductive program synthesis. Future work will be necessary
in order to assess the advantages and disadvantages of the proposed method in
each possible field of application when other methods are also available.
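A fixed operation count of this kind can be obtained by counting the operator nodes of each generated expression. The sketch below is only an illustration under that assumption: count-ops and count-program-ops are hypothetical names, and whether the figure of 1804 was computed in exactly this way is not stated in the text.

```lisp
;; Hypothetical helper: count the AND/XOR nodes of one generated expression.
;; Leaves (T, NIL and (NTH i ARGS) references) contribute no operations.
(defun count-ops (expr)
  (if (and (consp expr) (member (first expr) '(and xor)))
      (+ 1 (count-ops (second expr)) (count-ops (third expr)))
      0))

;; A full program is a list of expressions, one per output bit.
(defun count-program-ops (program)
  (reduce #'+ (mapcar #'count-ops program)))

;; The program generated for the Table 1 illustration uses four operations.
(print (count-ops '(xor (and (xor t (nth 0 args)) t)
                        (and (nth 0 args) (nth 1 args)))))  ; prints 4
```

Because the count depends only on the expression and not on the input value, the cost of running the program is the same for every input.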
For practical use of our method it is important to be able to estimate the effort
required to synthesize a program. In typical active learning, there is usually a cost
element associated with every query. This cost depends on the characteristics of
the target concept and the associated experimental protocol used to query it.
The synthesis effort will depend on the number of queries required and
the cost of each query. If the implementation of the method is based on a full
verification of the zero map, as in our illustration presented in Section 8, a
full scan of the target function input space will be necessary and the number
of queries will depend only on the size of the input space in bits. As future
work, an inductive bias can be proposed in order to avoid the full scan and thus
reduce the number of queries.
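The cost of the full scan can be observed by wrapping the target function in a counter before passing it to findpos. In the sketch below, findpos is repeated from Section 8 so that the example is self-contained, while count-queries is a hypothetical helper that is not part of the original code.

```lisp
;; findpos as given in Section 8.
(defun findpos (fn nargs)
  (loop for i from 0 to (1- (expt 2 nargs)) do
    (let ((l (make-list nargs)) (j i) (k 0))
      (loop do (if (= (mod j 2) 1) (setf (nth k l) t))
               (incf k) (setq j (floor j 2))
            while (> j 0))
      (if (funcall fn l) (return-from findpos l)))))

;; Hypothetical helper: return the number of queries findpos makes to FN,
;; together with the position it found (NIL when FN is the zero map).
(defun count-queries (fn nargs)
  (let* ((count 0)
         (pos (findpos (lambda (args) (incf count) (funcall fn args)) nargs)))
    (values count pos)))

;; Verifying a zero map over six input bits scans the whole input space.
(print (count-queries (lambda (args) (declare (ignore args)) nil) 6))  ; prints 64

;; A function that is non-zero early in the scan is found after few queries.
(print (count-queries (lambda (args) (first args)) 6))                 ; prints 2
```

For an input space of n bits, confirming a zero map therefore takes 2^n queries regardless of the target function.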
10 Applications
Applications of this method include all sorts of systems based on bitwise
operations, given that the learning problem can be described in functional form. We
successfully applied the method to target functions handling different complex
datatypes, such as floating-point numbers and text.
Any functional computer program can be emulated using our approach.
There can be a number of reasons to perform reverse engineering of existing
computer programs: the lack of source code, the need to translate an
executable program to run on a different computational platform, the intent to
optimize an inefficient implementation.
The method can also be used to translate algorithms from a high-level
language directly into combinatorial circuit design expressed in a hardware
description language. The code generated by the proposed method can be easily
mapped into a netlist (sequence of circuit gates).
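As an illustration of how direct such a mapping can be, the sketch below rewrites a generated expression as the text of a Verilog-style bitwise expression. It is not taken from the original implementation: expr->hdl is a hypothetical name, the choice of Verilog operators and the naming of input bit i as x[i] are assumptions, and producing a true gate-level netlist would require a further step.

```lisp
;; Hypothetical translator: generated expression -> Verilog-style expression text.
;; Assumption: input bit i of ARGS is available as wire x[i].
(defun expr->hdl (expr)
  (cond ((eq expr t) "1'b1")
        ((null expr) "1'b0")
        ((eq (first expr) 'nth)
         (format nil "x[~D]" (second expr)))
        ((eq (first expr) 'and)
         (format nil "(~A & ~A)"
                 (expr->hdl (second expr)) (expr->hdl (third expr))))
        ((eq (first expr) 'xor)
         (format nil "(~A ^ ~A)"
                 (expr->hdl (second expr)) (expr->hdl (third expr))))
        (t (error "Unknown operator in ~S" expr))))

;; Translation of the program generated for the Table 1 illustration.
(print (expr->hdl '(xor (and (xor t (nth 0 args)) t)
                        (and (nth 0 args) (nth 1 args)))))
;; prints "(((1'b1 ^ x[0]) & 1'b1) ^ (x[0] & x[1]))"
```

Each AND or XOR node corresponds to one two-input gate, so the nesting of the expression mirrors the structure of the circuit.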
Machine learning is another field of application. Machines having sensors and
actuators, like robots, can acquire direct models using the method. The machine
can actively experiment with a physical phenomenon and synthesize a program to
predict the result of possible actions.
11 Conclusion
We have presented Drill&Join, a generic method that actively interacts with an
external concept (system, function, oracle or database) via I/O examples and
synthesizes programs. Generated programs are based on Boolean expressions.
The method is not restricted to any specific form of functional learning problem
or target function and does not require any background knowledge to be applied.
The only requirement is that the external source of data be consistent. Our
work raises a number of interesting questions for future consideration. The
combination of the drill and join higher-order functions with the dynamic
construction of bases on subspaces, via memorization of generated programs, can
drastically reduce the number of recursive calls and queries. Further investigation
is necessary on how to explore these dynamic bases on large input spaces. The
stop condition of the algorithms, based on a full verification of the zero map,
requires a complete scan of the learning input space. Partial verifications can
compromise the convergence of the algorithms. Investigation of inductive biases
suited to the method is necessary.
References
1. S. B. Kotsiantis, Supervised Machine Learning: A review of classification techniques,
Informatica 31:249-268, 2007.
2. S. Muggleton, L. De Raedt, D. Poole, I. Bratko, P. Flach, K. Inoue and A. Srinivasan,
ILP turns 20: Biography and future challenges, Machine Learning 86:3-23,
DOI 10.1007/s10994-011-5259-2, 2012.
3. E. Kitzelmann, U. Schmid and R. Plasmeijer, Inductive Programming: A Survey of
Program Synthesis Techniques, In: LNCS, Approaches and Applications of Inductive
Programming, 3rd Workshop AAIP, Revised Papers, Springer-Verlag, 2010.
4. M. H. Stone, The theory of representations of Boolean Algebras, Transactions of the
American Mathematical Society 40:37-111, 1936.
5. A. Albarghouthi, S. Gulwani and Z. Kincaid, Recursive Program Synthesis,
Proceedings of the 25th International Conference on Computer Aided Verification (CAV),
Saint Petersburg, Russia, 2013.
6. E. Kitzelmann, A Combined Analytical and Search-Based Approach for the Inductive
Synthesis of Functional Programs, Künstliche Intelligenz 25(2):179-182, 2011.
7. E. Kitzelmann and U. Schmid, Inductive Synthesis of Functional Programs: An
Explanation Based Generalization Approach, Journal of Machine Learning Research
7:429-454, 2006.
8. E. Kitzelmann, Analytical Inductive Functional Programming, In: M. Hanus (Ed.),
LOPSTR 2008, LNCS 5438, pp. 87-102, 2009.
9. P. D. Summers, A methodology for LISP program construction from examples,
Journal of the ACM 24(1):162-175, 1977.
10. D. R. Smith, The synthesis of LISP programs from examples: A survey, In: A. W.
Biermann, G. Guiho and Y. Kodratoff (Eds.), Automatic Program Construction
Techniques, pp. 307-324, Macmillan, 1984.
11. S. Muggleton and L. De Raedt, Inductive logic programming: Theory and methods,
Journal of Logic Programming, Special Issue on 10 Years of Logic Programming,
19-20:629-679, 1994.
12. T. Sasao, Switching theory for logic synthesis, Springer, 1999. ISBN 0-7923-8456-3.
13. R. Alur, R. Bodik, G. Juniwal et al., Syntax-Guided Synthesis, FMCAD, pp. 1-17,
IEEE, 2013.
14. N. Lavrač and S. Džeroski, Inductive Logic Programming: Techniques and
Applications, Ellis Horwood, New York, 1994.
15. J. L. Tripp, M. B. Gokhale and K. D. Peterson, Trident: From High-Level Language
to Hardware Circuitry, IEEE Computer, 0018-9162/07, March 2007.
16. J. Backus, Can Programming Be Liberated from the von Neumann Style? A
Functional Style and Its Algebra of Programs, Communications of the ACM, Volume 21,
Number 8, August 1978.
17. Y. Hara, H. Tomiyama, S. Honda and H. Takada, Proposal and Quantitative
Analysis of the CHStone Benchmark Program Suite for Practical C-based High-level
Synthesis, Journal of Information Processing, Vol. 17, pp. 242-254, 2009.
18. S. Jha, S. Gulwani, S. A. Seshia and A. Tiwari, Oracle-guided Component-based
Program Synthesis, In: ICSE, 2010.
19. S. A. Seshia, Sciduction: combining induction, deduction, and structure for
verification and synthesis, DAC 2012, pp. 356-365.