Drill & Join
A method for exact inductive program synthesis
Remis Balaniuk
Universidade Católica de Brasília and Tribunal de Contas da União, Brazil
Email: remis@robotics.stanford.edu
Abstract. In this paper we propose a novel semi-supervised active machine-learning method, based on two recursive higher-order functions, that can inductively synthesize a functional computer program. Based on properties formulated in abstract algebra terms, the method uses two combined strategies: reducing the dimensionality of the Boolean algebra where a target function lies, and combining known operations belonging to the algebra, using them as a basis to build a program that emulates the target function. The method queries for data at specific points of the problem input space and builds a program that exactly fits the data. Applications of this method include all sorts of systems based on bitwise operations. Any functional computer program can be emulated using this approach. Combinatorial circuit design, model acquisition from sensor data, and reverse engineering of existing computer programs are all fields where the proposed method can be useful.
1 Introduction
Induction means reasoning from the specific to the general. In the case of inductive learning from examples, general rules are derived from input/output (I/O) examples or answers to questions. Inductive machine learning has been successfully applied to a variety of classification and prediction problems [1] [3].
Inductive program synthesis (IPS) builds from examples the computation required to solve a problem. The problem must be formulated as a task of learning a concept from examples, referred to as inductive concept learning [14]. A computer program is automatically created from an incomplete specification of the concept to be implemented, also referred to as the target function [3].
Research on inductive program synthesis started in the seventies. Since then it has been studied in several different research fields and communities such as artificial intelligence (AI), machine learning, inductive logic programming (ILP), genetic programming, and functional programming [3] [8].
One basic approach to IPS is to simply enumerate programs of a defined set
until one is found which is consistent with the examples. Due to combinatorial
explosion, this general enumerative approach is too expensive for practical use.
Summers [9] proposed an analytical approach to induce functional Lisp programs
without search. However, due to strong constraints imposed on the forms of I/O-
examples and inducible programs in order to avoid search, only relatively simple
functions can be induced. Several variants and extensions of Summers’ method
have been proposed, like in [7]. An overview is given in [10]. Kitzelmann [6]
proposed a combined analytical and search-based approach. Albarghouthi [5]
proposed ESCHER, a generic algorithm that interacts with the user via I/O
examples, and synthesizes recursive programs implementing intended behavior.
Hybrid methods propose the integration of inductive inference and deductive reasoning. Deductive reasoning usually requires high-level specifications as a formula in a suitable logic, a background theory for semantic correctness specification, constraints, or a set of existing components as candidate implementations. Some examples of hybrid methods are syntax-guided synthesis [13], the sciduction methodology [19] and the oracle-guided component-based program synthesis approach [18].
IPS is usually associated with functional programming [8]. In functional code the output value of a function depends only on the arguments that are input to the function. It is a declarative programming paradigm. Side effects that cause changes in state that do not depend on the function inputs are not allowed. Programming in a functional style can usually be accomplished in languages that aren't specifically designed for functional programming. In early debates around programming paradigms, conventional imperative programming and functional programming were compared and discussed. John Backus [16], in his work on programs as mathematical objects, supported the functional style of programming as an alternative to the "ever growing, fat and weak" conventional programming languages. Backus identified as inherent defects of imperative programming languages their inability to effectively use powerful combining forms for building new programs from existing ones, and their lack of useful mathematical properties for reasoning about programs. Functional programming, and more particularly function-level programming, is founded on the use of combining forms for creating programs that allow an algebra of programs.
Our work is distinguishable from previous work on IPS in a number of ways:
– Our method is based on function-level programming. Programs are built directly from programs that are given at the outset, by combining them with program-forming operations.
– Our method is based on active learning. Active learning is a special case of machine learning in which the learning algorithm can control the selection of the examples that it generalizes from and can query one or more oracles to obtain examples. The oracles could be implemented by evaluation/execution of a model on a concrete input, or they could be human users. Most existing IPS methods supply the set of examples at the beginning of the learning process, without reference to the learning algorithm. Some hybrid synthesis methods use active learning, as in [13] [19] and [18], to generate examples for a deductive procedure. Our method defines a learning protocol that queries the oracle to obtain the desired outputs at new data points.
– Our method generates programs in a very low-level declarative language, compatible with most high-level programming languages. Most existing methods are conceived for, and restricted to, specific high-level source languages. Our whole method is based on Boolean algebra.
Inputs and outputs are bit vectors and the generated programs are Boolean expressions. The synthesized program can also be used for a combinatorial circuit design describing the sequence of gates required to emulate the target function. Research on reconfigurable supercomputing is very interested in providing compilers that translate algorithms directly into circuit designs expressed in a hardware description language, to avoid the high cost of having to hand-code custom circuit designs [15].
– The use of Boolean algebra and abstract algebra concepts at the basis of the method defines a rich formalism. The space where a program is to be searched, or synthesized, corresponds to a well-defined family of operations that can be reused, combined and ordered.
– Our method can be applied to general-purpose computing. A program generated using our method, computing an output bit vector from an input bit vector, is equivalent to a system of Boolean equations or a set of truth tables. However, bit vectors can also be used to represent any kind of complex data type, like floating-point numbers and text strings. A functionally complete set of operations performed on arbitrary bits is enough to compute any computable value. In principle, any Boolean function can be built up from a functionally complete set of logic operators. In logic, a functionally complete set of logical connectives or Boolean operators is one which can be used to express all possible truth tables by combining members of the set into a Boolean expression. Our method synthesizes Boolean expressions based on the logic operator set {XOR, AND}, which is functionally complete (see the identities right after this list).
If the problem has a total functional behavior and enough data is supplied during the learning process, our method can synthesize the exact solution.
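For instance (a standard identity, not specific to this paper), negation and disjunction can be expressed with XOR and AND alone:

¬x = x ⊕ T,    x ∨ y = (x ∧ y) ⊕ x ⊕ y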
This paper is organized as follows. Sections 2 and 3 review some relevant mathematical concepts. Sections 4 and 5 describe the mathematics of the method. Section 6 presents the method itself and how the programs are synthesized. Section 7 presents the main algorithms. Section 8 shows a Common Lisp implementation of the simplest version of the method. Sections 9, 10 and 11 close the document discussing the method, possible applications and future work.
2 Boolean ring, F2 field, Boolean polynomials and Boolean functions
A Boolean ring is essentially equivalent to a Boolean algebra, with ring multiplication corresponding to conjunction (∧) and ring addition to exclusive disjunction or symmetric difference (⊕, or XOR). In logic, the combination of the operators ⊕ (XOR, or exclusive OR) and ∧ (AND) over the elements {true, false} produces the Galois field F2, which is extensively used in digital logic and circuitry [12]. This field is functionally complete: it can represent any logic obtainable with the system (∧, ∨) and can also be used as a standard algebra over the set of the integers modulo 2 (the binary numbers 0 and 1)¹. Addition has an identity element (false) and an inverse for every element. Multiplication has an identity element (true) and an inverse for every element but false.
Let B be a Boolean algebra and consider the associated Boolean ring. A Boolean polynomial in B is a string that results from a finite number of Boolean operations on a finite number of elements in B. A multivariate polynomial over a ring has a unique representation as a xor-sum of monomials. This gives a normal form for Boolean polynomials:

⊕_{J⊆{1,2,...,n}} a_J ∏_{j∈J} x_j    (1)

where the a_J ∈ B are uniquely determined. This representation is called the algebraic normal form. A Boolean function of n variables f: Z_2^n → Z_2 can be associated with a Boolean polynomial by deriving an algebraic normal form.
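As a small worked example of our own, the disjunction x_1 ∨ x_2 has the algebraic normal form

x_1 ∨ x_2 = x_1 ⊕ x_2 ⊕ (x_1 ∧ x_2)

i.e. a_{1} = a_{2} = a_{1,2} = 1 and a_J = 0 otherwise.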
3 Abstract algebra and higher order functions
Let us consider a generic functional setting having as domain the set of bit strings of a finite, defined length and as range the set {true, false}, or the binary numbers 0 and 1 represented by one bit. This setting can represent the inputs and output of a logic proposition, a Boolean function, a truth table or a fraction of a functional program corresponding to one of its output bits².
In abstract algebra terms, this setting will define a finitary Boolean algebra consisting of a finite family of operations on {0, 1} having the input bit string as their arguments. The length of the bit string will be the arity of the operations in the family. An n-ary operation can be applied to any of the 2^n possible values of its n arguments. For each choice of arguments an operation may return 0 or 1, whence there are 2^(2^n) possible n-ary operations in the family. In this functional setting, IPS can be seen as the synthesis of an operation that fits a set of I/O examples inside its family. The Boolean algebra defines our program space.
Let V_n be the set of all binary words of length n, |V_n| = 2^n. The Boolean algebra B on V_n is a vector space over Z_2. Because it has 2^(2^n) elements, it is of dimension 2^n over Z_2. This correspondence between an algebra and our program space defines some useful properties. The operations in a family need not all be explicitly stated. A basis is any set of operators from which the remaining operations can be obtained by composition. A Boolean algebra may be defined from any of several different bases. To be a basis is to yield all other operations by composition, whence any two bases must be intertranslatable.
A basis is a linearly independent spanning set. Let v_1, ..., v_m ∈ B be a basis of B. Span(v_1, ..., v_m) = {λ_1 v_1 ⊕ ... ⊕ λ_m v_m | λ_1, ..., λ_m ∈ Z_2}.
¹ Throughout this paper we will use interchangeably 0, F or false for the binary number 0, and 1, T or true for the binary number 1.
² Bit strings can represent any complex data type. Consequently, our functional setting includes any functional computer program having fixed-length input and output.
The dimension dim(B) of the Boolean algebra is the minimum m such that B = span(v_1, ..., v_m).
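As a small example of our own, for n = 1 the algebra contains the four unary operations and is spanned by {x, T} (the one-input basis later used by the Lisp code in section 8):

B = {0, T, x, ¬x},  with ¬x = x ⊕ T and 0 = x ⊕ x,  so B = span(x, T) and dim(B) = 2.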
The method proposed in this paper consists of two combined strategies: reducing the dimensionality of the Boolean algebra where a target function lies, and combining known operations in the algebra, using them as a basis to synthesize a program that emulates the target function.
Both strategies are implemented using two recursive higher-order functions that we created and that we named the drill and the join.
In computer science, higher-order functions are functions that take one or more functions as input and output a function. They correspond to linear mappings in mathematics.
4 The Drill function
We define the set F_m of functions f: Z_2^p × Z_2^q → Z_2 containing Boolean functions belonging to a Boolean algebra of dimension m, described in polynomial form:

f(X, Y) = ⊕_{i=1}^{m} g_i(X) ∧ h_i(Y)    (2)

where g_i: Z_2^p → Z_2 and h_i: Z_2^q → Z_2 are also Boolean functions. Note that equations 1 and 2 are equivalent and interchangeable. Equation 2 only splits the input space into two disjoint subsets: p + q = n.
Considering a function f ∈ F_m, a chosen X_0 ∈ Z_2^p and a chosen Y_0 ∈ Z_2^q such that f(X_0, Y_0) ≠ 0, we define the drill higher-order function:

IF_{X_0,Y_0} = IF(f(X, Y), X_0, Y_0) = f(X, Y) ⊕ (f(X_0, Y) ∧ f(X, Y_0))    (3)

Note that the function IF outputs a new function and has as inputs the function f and instances of X and Y, defining a position in the input space of f.
Theorem: If f ∈ F_m and f(X_0, Y_0) ≠ 0, then f' = IF_{X_0,Y_0} ∈ F_r with r ≤ m − 1.
Proof: Consider W = span(h_1, ..., h_m). Consequently dim(W) ≤ m. The linear operator h ∈ W ↦ h(Y_0) is not the zero map, because the hypothesis forbids h_i(Y_0) = 0 for all i = 1, ..., m. Consequently, the vector subspace W' = {h ∈ W | h(Y_0) = 0} has dim(W') ≤ m − 1. Notice that for all X ∈ Z_2^p we have f'(X, ·) ∈ W'. In fact:

f'(X, Y_0) = f(X, Y_0) ⊕ (f(X_0, Y_0) ∧ f(X, Y_0)) = 0    (4)

Let r = dim(W') and h'_i, i = 1, ..., r be a spanning set such that W' = span(h'_1, ..., h'_r). For all X ∈ Z_2^p, f'(X, ·) can be represented as a linear combination of the h'_i, the coefficients depending on X. In other words, there exist coefficients g'_i(X) such that:

f'(X, ·) = ⊕_{i=1}^{r} g'_i(X) h'_i    (5)

or, written differently and remembering that r = dim(W') and consequently r ≤ m − 1:

f'(X, Y) = ⊕_{i=1}^{r} g'_i(X) ∧ h'_i(Y)    □    (6)
As an illustration let us consider a Boolean function f: Z_2 × Z_2 → Z_2 whose behavior can be described by table 1.

Table 1. Truth table for f(x, y)

x  y  f(x, y)
F  F  T
T  F  F
F  T  T
T  T  T

One possible representation for this function would be f(x, y) = y ∨ (¬x ∧ ¬y). f is of dimension 2 (no shorter representation is possible). Note that this representation is given here for the illustration's sake. The method does not require high-level definitions, only I/O examples.
Respecting the stated hypothesis we can pick x_0 = F and y_0 = F, since f(F, F) = T. The partial functions obtained will be f(x_0, y) = y ∨ (T ∧ ¬y) and f(x, y_0) = F ∨ (¬x ∧ T). Applying the drill function we obtain:

f'(x, y) = IF(f(x, y), x_0, y_0) = (y ∨ (¬x ∧ ¬y)) ⊕ ((y ∨ (T ∧ ¬y)) ∧ (F ∨ (¬x ∧ T))) = (x ∧ y)    (7)

We can see that f'(x, y) is of dimension 1, confirming the stated theorem.
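A minimal Common Lisp sketch of the drill higher-order function of equation 3, applied to the function of table 1. The names bxor, band, drill-hof and f-table1 are ours, introduced only for this illustration; the real implementation is given in section 8.

(defun bxor (a b) (if a (not b) b))   ; exclusive or on T/NIL
(defun band (a b) (and a b))          ; conjunction on T/NIL

;; drill-hof returns the new function IF(f, X0, Y0) of equation 3
(defun drill-hof (f x0 y0)
  #'(lambda (x y)
      (bxor (funcall f x y)
            (band (funcall f x0 y) (funcall f x y0)))))

;; Target function of table 1: f(x, y) = y OR (NOT x AND NOT y)
(defun f-table1 (x y) (or y (and (not x) (not y))))

;; Drilling at (x0, y0) = (F, F), where f(F, F) = T:
(defparameter *f1* (drill-hof #'f-table1 nil nil))
;; (funcall *f1* t t)   => T     ; f1 = (x AND y), as in equation 7
;; (funcall *f1* nil t) => NIL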
5 The Join function
Consider now the set F_m of Boolean functions f: Z_2^n → Z_2 and a basis v_1, ..., v_m ∈ F_m. The functions in this set can be described in polynomial form as:

f(X) = ⊕_{i=1}^{m} λ_i v_i(X)    (8)

where the λ_i ∈ Z_2 are the coefficients.
Considering a function f ∈ F_m, a chosen X_j ∈ Z_2^n such that f(X_j) ≠ 0 and a chosen function v_j belonging to the basis such that v_j(X_j) ≠ 0, we define the join higher-order function:

IH_{X_j,v_j} = IH(f(X), X_j, v_j) = f(X) ⊕ v_j(X)    (9)

Theorem: If f ∈ F_m, f(X_j) ≠ 0 and v_j(X_j) ≠ 0, then f' = IH_{X_j,v_j} ∈ F_r with r ≤ m − 1.
Proof: Consider W = span(v_1, ..., v_m). Consequently dim(W) ≤ m. The linear operator v ∈ W ↦ v(X_j) is not the zero map, otherwise v_j(X_j) = 0. Consequently, the vector subspace W' = {f ∈ W | f(X_j) = 0} has dim(W') ≤ m − 1. We can see that f' ∈ W'. In fact:

f'(X_j) = f(X_j) ⊕ v_j(X_j) = 0    (10)

Let r = dim(W') and v'_i, i = 1, ..., r be a spanning set such that W' = span(v'_1, ..., v'_r). The function f' can be represented as a linear combination of the v'_i. In other words, and remembering that r = dim(W') so r ≤ m − 1, there exist coefficients λ'_i such that:

f'(X) = ⊕_{i=1}^{r} λ'_i v'_i(X)    □    (11)
We can use the same function f: Z_2 × Z_2 → Z_2 described by table 1 to illustrate the behavior of the join higher-order function. f belongs to a Boolean algebra of dimension 2^2 which can be defined, for instance, by the following spanning set: v_1(x, y) = x, v_2(x, y) = y, v_3(x, y) = x ∧ y, v_4(x, y) = T. Respecting the stated hypothesis we can pick X_j = (T, T) and v_j = v_1, since f(T, T) = T and v_1(T, T) = T. Applying the join function we obtain:

f'(x, y) = IH(f(X), X_j, v_j) = (y ∨ (¬x ∧ ¬y)) ⊕ x = ¬(x ∧ y) = T ⊕ (x ∧ y)    (12)

With respect to the chosen spanning set, f'(x, y) = T ⊕ (x ∧ y) is of dimension 2, consistent with the stated theorem (r ≤ m − 1 = 3).
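A matching sketch of the join higher-order function of equation 9, reusing the bxor helper above; again, the names join-hof, f-partial, v0 and v1 are ours. It is applied to the one-variable function ¬x (the partial function f(x, y0) used later in table 3) with the one-bit basis of section 8; following the hypothesis v_j(X_j) ≠ 0, v1 is selected first here (table 3 applies v0 first), and either order yields f = x ⊕ T.

;; join-hof returns the new function IH(f, Xj, vj) = f XOR vj of equation 9
(defun join-hof (f vj)
  #'(lambda (x) (bxor (funcall f x) (funcall vj x))))

(defun f-partial (x) (not x))                     ; f(x) = NOT x
(defun v0 (x) x)                                  ; v0(x) = x
(defun v1 (x) (declare (ignore x)) t)             ; v1(x) = T

;; f(F) = T and v1(F) = T, so join with v1 first:
(defparameter *g1* (join-hof #'f-partial #'v1))   ; g1(x) = x
;; g1(T) = T and v0(T) = T, so join again with v0:
(defparameter *g2* (join-hof *g1* #'v0))          ; the zero map: recursion stops
;; Reconstruction (equation 19): f(x) = v1(x) XOR v0(x) = T XOR x = NOT x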
6 The Drill & Join program synthesis method
Drill and join are used to define a program synthesis method. Considering an active learning framework, the input function f(X, Y) of IF and the input function f(X) of IH represent an external unknown concept from which it is possible to obtain data by means of queries (input-output examples).
This unknown concept could be, for instance, some physical phenomenon that a machine with sensors and actuators can actively experiment with, an algorithm to be translated into a hardware description language, a computer program that one would like to emulate or optimize, or a decision process to be implemented on a computer for which one or more experts are able to answer the required questions.
In order to understand the method it is important to notice two important properties of both higher-order functions:
– IF and IH can be applied recursively: if f(X, Y) ∈ F_m then f_1(X, Y) = IF(f(X, Y), X_0, Y_0) ∈ F_{m−1} and f_2(X, Y) = IF(f_1(X, Y), X_1, Y_1) ∈ F_{m−2}. Similarly, if f(X) ∈ F_m then f_1(X) = IH(f(X), X_0, v_0) ∈ F_{m−1} and f_2(X) = IH(f_1(X), X_1, v_1) ∈ F_{m−2}. Each recursion generates a new function belonging to an algebra of a lower dimension.
– The recursion ends when the higher-order functions become the zero map: IF(f(X, Y), X_i, Y_i) = 0 ⇒ f(X, Y) = f(X_i, Y) ∧ f(X, Y_i) and, similarly, IH(f(X), X_i, v_i) = 0 ⇒ f(X) = v_i(X).
The first property enables us to apply the same higher-order function recursively in order to gradually reduce the dimensionality of the initial problem. The second defines a stop condition. As a result, the output program is obtained:

∀ X, Y ∈ Z_2^p × Z_2^q: f_{m+1}(X, Y) = IF(f_m(X, Y), X_m, Y_m) = 0 ⇒ f_m(X, Y) = f_m(X_m, Y) ∧ f_m(X, Y_m)

Replacing f_m(X, Y) = IF(f_{m−1}(X, Y), X_{m−1}, Y_{m−1}) gives us:

f_{m−1}(X, Y) ⊕ (f_{m−1}(X_{m−1}, Y) ∧ f_{m−1}(X, Y_{m−1})) = f_m(X_m, Y) ∧ f_m(X, Y_m)    (13)

and consequently:

f_{m−1}(X, Y) = (f_{m−1}(X_{m−1}, Y) ∧ f_{m−1}(X, Y_{m−1})) ⊕ (f_m(X_m, Y) ∧ f_m(X, Y_m))    (14)

Tracking the recursion back to the beginning we obtain:

f(X, Y) = ⊕_{i=1}^{m} f_i(X_i, Y) ∧ f_i(X, Y_i)    (15)
Equation 15 tells us that the original target function f can be recreated using the partial functions f_i obtained using the drill function. The partial functions f_i are simpler problems, defined on subspaces of the original target function.
Similarly:

∀ X ∈ Z_2^n: f_{m+1}(X) = IH(f_m(X), X_m, v_m) = 0 ⇒ f_m(X) = v_m(X)    (16)

Replacing f_m(X) = IH(f_{m−1}(X), X_{m−1}, v_{m−1}) gives us:

f_{m−1}(X) ⊕ v_{m−1}(X) = v_m(X)    (17)

and:

f_{m−1}(X) = v_{m−1}(X) ⊕ v_m(X)    (18)

Tracking the recursion back to the beginning we obtain:

f(X) = ⊕_{i=1}^{m} v_i(X)    (19)

Equation 19 tells us that the original target function f can be recreated from the basis functions v_i selected during the join recursion.
Note that if the drill initial condition cannot be established, i.e., no (X_0, Y_0): f(X_0, Y_0) ≠ 0 can be found, the target function is necessarily f(X, Y) = F. In the same way, if no X_0: f(X_0) ≠ 0 can be found to initiate join, the target function is the zero map f(X) = F. If there exists an X_j: f(X_j) ≠ 0 but no v_j: v_j(X_j) ≠ 0, the basis was not chosen appropriately.
The IF higher-order function defines a double recursion. For each step of the dimensionality reduction recursion two subspace synthesis problems are defined: f_i(X_i, Y) and f_i(X, Y_i). Each of these problems can be treated as a new target function in a Boolean algebra of reduced arity, since part of the arguments is fixed. They can be recursively solved using IF or IH again.
The combination of IF and IH defines a very powerful inductive method. The IF higher-order function alone requires the solution of an exponential number of subspace synthesis problems in order to inductively synthesize a target function. The IH higher-order function requires a basis of the Boolean algebra. Considering the 2^n cardinality of a basis, it can be impractical to require its prior existence in large arity algebras. Nevertheless, both functions combined can drastically reduce the number of queries and the prior bases definition.
The method, detailed in the next sections, uses the following strategies:
– Bases are predefined for low arity input spaces, enabling the join function.
– Synthesis on large arity input spaces begins using the IF function.
– Previously synthesized programs on a subspace can be memorized in order to compose a basis on that subspace.
– At each new synthesis, if a basis exists use IH, otherwise use IF.
To illustrate the functioning of the whole method let us use again the function f: Z_2 × Z_2 → Z_2 described by tables 1 and 2. We can apply the drill function a first time, f_1(X, Y) = IF(f(X, Y), X_0, Y_0) with x_0 = F, y_0 = F, defining two subspace problems: f(x_0, y) and f(x, y_0). Both problems can be solved using the join function and the basis v_0(x) = x, v_1(x) = T. f_1 is not the zero map, requiring a second recursion of drill: f_2 = IF(f_1(X, Y), X_1, Y_1) with x_1 = T, y_1 = T. Two new subspace problems are defined, f_1(x_1, y) and f_1(x, y_1), and they can be solved using join and the same basis again. f_2(X, Y) = F is finally the zero map, stopping the recursion. Table 2 shows the target function and the transformation steps performed using the drill function.

Table 2. Truth table for the drill steps

x  y  f(x,y)  f(x0,y)  f(x,y0)  f1(x,y)  f1(x1,y)  f1(x,y1)  f2(x,y)
F  F  T       T        T        F        F         F         F
T  F  F       T        F        F        F         T         F
F  T  T       T        T        F        T         F         F
T  T  T       T        F        T        T         T         F
To illustrate the use of the join function let us consider the reconstruction of f(X) = f(x, y_0), detailed in table 3. A first application, f_1(X) = IH(f(X), X_0, v_0) with x_0 = F and v_0(x) = x, will not result in the zero map, as shown in the fifth column of table 3, requiring a recursive call f_2(X) = IH(f_1(X), X_1, v_1) with x_1 = T and v_1(x) = T, which will result in the zero map.
Table 3. Truth table for the join steps

x  f(x)  v0(x)  v1(x)  f1(x)  f2(x)
F  T     F      T      T      F
T  F     T      T      T      F
Using equation 19 we can find a representation for the partial target function: f(X) = f(x, y_0) = v_0(x) ⊕ v_1(x) = x ⊕ T. The same process can be used to find representations for all the one-dimensional problems: f(x_0, y) = T, f_1(x_1, y) = y and f_1(x, y_1) = x.
Having solved the partial problems we can use equation 15 to build the full target function: f(x, y) = (f(x_0, y) ∧ f(x, y_0)) ⊕ (f_1(x_1, y) ∧ f_1(x, y_1)) = (T ∧ (x ⊕ T)) ⊕ (y ∧ x), which is equivalent to our initial representation.
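As a quick sanity check of our own, the reconstructed expression can be evaluated against table 1, reusing the bxor, band and f-table1 helpers from the sketch in section 4:

(defun f-reconstructed (x y)
  (bxor (band t (bxor x t))   ; f(x0, y) AND f(x, y0) = T AND (x XOR T)
        (band y x)))          ; f1(x1, y) AND f1(x, y1) = y AND x

;; (loop for x in '(nil t) append
;;       (loop for y in '(nil t)
;;             collect (eq (f-reconstructed x y) (f-table1 x y))))
;; => (T T T T)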
Each higher-order function underlies a query protocol. At each recursion of IF, one or more queries for data are made in order to find X_i, Y_i ∈ Z_2^p × Z_2^q: f_i(X_i, Y_i) ≠ 0. At each recursion of IH, a position X_i ∈ Z_2^n: f(X_i) ≠ 0 and a function v_i: v_i(X_i) ≠ 0 from the basis are chosen. The queries require data from the target function and must be correctly answered by some kind of oracle, expert, database or system. Wrong answers make the algorithms diverge. Data is also necessary to verify the recursion stop condition. A full test of f_i(X, Y) = 0 or f_i(X) = 0, scanning the whole input space, can generate proof that the induced result exactly meets the target function. Partial tests can be enough to define candidate solutions subject to further inspection.
7 The main algorithms of the Drill&Join method
To explain how the drill and join higher-order functions can be used as program-forming functionals we propose the following two algorithms.
Drill takes as initial inputs the target function fn and the dimension of its input space, inputs. Deeper inside the recursion, fn corresponds to f_m(X, Y), inputs defines the dimension of its subspace and initial indicates its first free dimension. Drill returns a synthesized functional program that emulates fn.
procedure Drill(fn, inputs, optional: initial = 0)
    if we have a basis for this subspace then
        return Join(fn, inputs, initial)
    end if
    pos ← find a position inside this subspace where fn is not null
    if pos = null then
        return FALSE                                        ▷ Stop condition
    end if
    fa(args) = fn(concatenate(args, secondhalf(pos)))       ▷ f(X, Y0)
    fb(args) = fn(concatenate(firsthalf(pos), args))        ▷ f(X0, Y)
    fc(args) = fn(args) ⊕ (fa(firsthalf(args)) ∧ fb(secondhalf(args)))    ▷ f_{m+1}(X, Y)
    pa ← Drill(fa, inputs/2, initial)                       ▷ Recursive call to synthesize f(X, Y0)
    pb ← Drill(fb, inputs/2, initial + inputs/2)            ▷ Recursive call for f(X0, Y)
    pc ← Drill(fc, inputs, initial)                         ▷ Recursive call for f_{m+1}(X, Y)
    return '((' pa 'AND' pb ')' 'XOR' pc ')'                ▷ Returns the program
end procedure
Note that fa, fb and fc are functions based on fn, and pa, pb and pc are programs obtained by recursively calling Drill. firsthalf and secondhalf split a vector into two halves (one possible realization is sketched below).
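The firsthalf and secondhalf helpers are not specified in the paper; one possible realization, an assumption of ours that splits a bit list at its midpoint (consistent with the inputs/2 recursion above), is:

(defun firsthalf (v)
  ;; first half of the bit list v (the X part)
  (subseq v 0 (floor (length v) 2)))

(defun secondhalf (v)
  ;; second half of the bit list v (the Y part)
  (subseq v (floor (length v) 2)))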
Join takes as input a function fn belonging to a Boolean algebra. The algorithm requires a basis for this Boolean algebra, materialized as an array of functions, basis[], and an array of programs emulating the basis, basisp[]. The algorithm creates a program to emulate fn by combining the basis programs.
procedure Join(fn, inputs, optional: initial = 0)
    pos ← find a position inside this subspace where fn is not null
    if pos = null then
        return FALSE                          ▷ Stop condition
    end if
    v ← find a function v from the basis such that v(pos) is not 0
    vp ← get the program that emulates v
    fa(args) = fn(args) ⊕ v(args)
    pa ← Join(fa, inputs, initial)            ▷ Recursive call for f_{m+1}(X)
    return '(' pa 'XOR' vp(initial) ')'       ▷ Returns the program
end procedure
8 A Common Lisp version of the Drill&Join method
Common Lisp is a natural choice of programming language to implement the drill&join method. Lisp functions can take other functions as arguments to build and return new functions. Lisp lists can be interpreted and executed as programs.
The following code implements the simplest version of the method. It synthesizes a Lisp program that emulates a function fn by just querying it. The function fn must accept bit strings as inputs and must return one bit as the answer. In this simple illustration fn is another Lisp function, but in real use it would be an external source of data queried through an adequate experimental protocol.
Drill takes the unknown function fn and its number of binary arguments nargs as input. It returns a list composed of logical operators, the logical symbols nil and true, and references to an input list of arguments. The output list can be executed as a Lisp program that emulates fn.
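The code below relies on an exclusive-or function xor over Boolean values (T/NIL). ANSI Common Lisp defines logxor on integers but no boolean xor, so a minimal definition such as the following is assumed; the author's own definition is not shown in the paper.

(defun xor (a b)
  ;; exclusive or on T/NIL: true iff exactly one of a and b is true
  (if a (not b) b))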
;; Drill: synthesize a Lisp expression that emulates fn over nargs input bits.
;; ipos indexes the first free input bit; slice is passed through to findpos.
(defun drill (fn nargs &optional (ipos 0) (slice 0))
  (let ((base (list nil #'(lambda(x) (first x)) #'(lambda(x) t)))               ; one-bit basis {f(x)=x, f(x)=T}; nil is a placeholder skipped by findbase
        (basep (list nil #'(lambda(x) (list 'nth x 'args)) #'(lambda(x) t))))   ; programs emulating the basis
    (if (= nargs 1)
        (join fn base basep ipos)
        (let ((pos (findpos fn nargs slice)))
          (if (null pos)
              nil                                                               ; zero map: stop condition
              (labels ((fa (args) (funcall fn (append args (cdr pos))))         ; f(X, Y0)
                       (fb (args) (funcall fn (append (list (car pos)) args)))  ; f(X0, Y)
                       (fc (args) (xor (funcall fn args)
                                       (and (fa (list (car args))) (fb (cdr args)))))) ; f_{m+1}(X, Y)
                (let* ((sa (drill #'(lambda(args) (fa args)) 1 ipos))
                       (sb (drill #'(lambda(args) (fb args)) (1- nargs) (1+ ipos)))
                       (sc (drill #'(lambda(args) (fc args)) nargs ipos (1+ slice)))
                       (r1 (if (and (atom sa) sa) sb (list 'and sa sb))))
                  (if (null sc)
                      r1
                      (list 'xor r1 sc)))))))))
In this implementation the split of the input space is done by choosing the first argument to be X and the rest to be Y. Drill calls Join when the recursion is down to a single-input function (nargs = 1). A basis for one-input-bit functions, {f(x) = x, f(x) = T}, is defined directly inside drill as a function list base and a program list basep, which are passed as arguments to the join function, implemented as follows:
(defun join (fn base basep &optional (ipos 0))
(let ((pos (findpos fn 1)))
(if (null pos)
nil
(let ((fb (findbase base basep pos)))
(labels ((fa (args) (xor (funcall fn args) (funcall (nth 0 fb) args))))
(let ((r (join #’(lambda(args) (fa args)) base basep ipos)))
(return-from join (list ’xor (funcall (nth 1 fb) ipos) r))))))))
The findpos function is used to test whether fn is the zero map by performing a full search of the fn input space. It stops when a non-zero answer is found and returns its position. The full search means that no inductive bias is used.
(defun findpos (fn nargs &optional (slice 0))
  ;; slice is accepted for compatibility with the call made in drill; this
  ;; simple version ignores it and always scans from position 0.
  (declare (ignorable slice))
  (loop for i from 0 to (1- (expt 2 nargs)) do
    (let ((l (make-list nargs)) (j i) (k 0))
      (loop do (if (= (mod j 2) 1) (setf (nth k l) t))
               (incf k) (setq j (floor j 2))
            while (> j 0))
      (if (funcall fn l) (return-from findpos l)))))
The findbase function is used inside join to find a function from the basis respecting the constraint v_j(X_j) ≠ 0.
(defun findbase (base basep pos)
  ;; Find a basis function v with v(pos) not nil, remove it and its program
  ;; from the lists so that they are not reused, and return the pair (v vp).
  (loop for i from 1 to (1- (list-length base)) do
    (if (funcall (nth i base) pos)
        (let ((fb (list (nth i base) (nth i basep))))
          (delete (nth 0 fb) base)
          (delete (nth 1 fb) basep)
          (return-from findbase fb)))))
The method queries the target function (funcall fn) only inside findpos, in
order to find a non-zero position.
Using the Lisp code provided above it is possible to check the consistency
of the method. Applied to our illustration described by table 1 the generated
program would be:
(XOR (AND (XOR T (NTH 0 ARGS)) T) (AND (NTH 0 ARGS) (NTH 1 ARGS)))
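As a small usage sketch of our own, the expression above references a global list called args, so binding args and calling eval runs the synthesized program (the xor helper assumed earlier must be defined):

(defparameter args (list nil t))   ; x = F, y = T; defparameter makes ARGS special, so EVAL sees it
(eval '(XOR (AND (XOR T (NTH 0 ARGS)) T) (AND (NTH 0 ARGS) (NTH 1 ARGS))))
;; => T, matching f(F, T) = T in table 1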
As a more advanced illustration, let us consider the "unknown" target function to be the Fibonacci sequence. To avoid bulky outputs we will limit the illustration to having as input an unsigned integer between 0 and 63. Consequently, the input of the fn function can be a six-bit-long bit string. The range of the output (between 1 and 6557470319842) requires a 64-bit unsigned integer. The target function fibonacci(n) computes a long integer corresponding to the nth position of the Fibonacci sequence. To translate integers to the lists of {NIL, T} handled by the drill and join Lisp code we use their binary representation. The translation is done by the routines longint2bitlist, bitlist2longint, 6bitlist2int and 6int2bitlist, not included in this paper, called from the function myfibonacci(n):
(defun myfibonacci(n) (let ((r (6bitlist2int n))) (longint2bitlist (fibonacci r))))
In order to synthesize a program able to emulate the whole target function we
need to call Drill for each output bit and generate a list of boolean expressions.
(defun synthesis (fn nargs nouts filename)
(with-open-file (outfile filename :direction :output)
(let ((l (make-list nouts)))
(loop for i from 0 to (1- nouts) do
(labels ((fa (args) (nth i (funcall fn args))))
(let ((x (drill #’(lambda(args) (fa args)) nargs))) (setf (nth i l) x))))
(print l outfile))))
To run the generated program we need to compute each output bit and
translate the bit string into an integer:
(defun runprogram(filename v)
  (with-open-file (infile filename)
    ;; args is deliberately set as a global variable: the expressions read
    ;; from the file reference ARGS and are evaluated with EVAL, which only
    ;; sees global (special) bindings.
    (setq s (read infile)) (setq args (6int2bitlist v))
    (let ((r (make-list (list-length s))))
      (loop for i from 0 to (1- (list-length s)) do
        (setf (nth i r) (eval (nth i s))))
      (print (bitlist2longint r)))))
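A usage sketch of our own, assuming fibonacci and the translation routines mentioned above (longint2bitlist, bitlist2longint, 6bitlist2int and 6int2bitlist) are defined; the file name is arbitrary:

;; Synthesize one Boolean expression per output bit (6 input bits, 64 output bits)
;; and write the resulting list of expressions to a file:
(synthesis #'myfibonacci 6 64 "fibonacci-program.lisp")

;; Evaluate the synthesized program on the input 10 and print the decoded integer:
(runprogram "fibonacci-program.lisp" 10)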
A full check on the whole target function input space shows that the synthesized program exactly emulates the target function. To illustrate what the generated programs look like, we show below the expression that computes the first output bit of the Fibonacci sequence:
(XOR (AND (XOR (AND (XOR (AND (NTH 0 ARGS) (XOR T (NTH 1 ARGS))) (AND (XOR T
(NTH 0 ARGS)) (NTH 1 ARGS))) T) (AND (XOR (AND (XOR T (NTH 0 ARGS)) T) (AND
(NTH 0 ARGS) (NTH 1 ARGS))) (NTH 2 ARGS))) (XOR (AND (XOR (AND (XOR T (NTH 3 ARGS))
T) (AND (NTH 3 ARGS) (NTH 4 ARGS))) (XOR T (NTH 5 ARGS))) (AND (XOR (AND (NTH 3 ARGS)
(XOR T (NTH 4 ARGS))) (AND (XOR T (NTH 3 ARGS)) (NTH 4 ARGS))) (NTH 5 ARGS)))) (AND
(XOR (AND (XOR (AND (XOR T (NTH 0 ARGS)) T) (AND (NTH 0 ARGS) (NTH 1 ARGS))) (XOR T
(NTH 2 ARGS))) (AND (XOR (AND (NTH 0 ARGS) (XOR T (NTH 1 ARGS))) (AND (XOR T
(NTH 0 ARGS)) (NTH 1 ARGS))) (NTH 2 ARGS))) (XOR (AND (XOR (AND (NTH 3 ARGS)
(XOR T (NTH 4 ARGS))) (AND (XOR T (NTH 3 ARGS)) (NTH 4 ARGS))) T) (AND (XOR (AND
(XOR T (NTH 3 ARGS)) T) (AND (NTH 3 ARGS) (NTH 4 ARGS))) (NTH 5 ARGS)))))
The generated program is basically a single Boolean expression that explicitly
references an input list called args. The full program consists of a list of those
Boolean expressions.
9 Discussion
The implementation presented in section 8 has didactic purposes only. A number of enhancements are possible. Bases can be dynamically built on subspaces by memorizing previously created programs, avoiding drilling down to smaller subspaces and reducing the number of queries to the target function.
An interesting aspect of the generated programs is that the computation of each output bit is completely independent of the others. There is a specific program for each output bit, enabling parallel processing.
Empirical comparisons between the proposed method and existing ones are difficult because of the conceptual differences between them. Existing benchmark frameworks, such as CHStone [17] and SyGuS [13], tend to be specific to a certain synthesis approach. CHStone was conceived for C-based high-level synthesis and SyGuS for syntax-guided hybrid synthesis (inductive and deductive). Conceptual comparisons can be done, but without objective results. The proposed method can handle target functions that others probably cannot, but because it works at the bit level of inputs and outputs, the number of examples required for learning and testing tends to be larger than in other methods. Most existing inductive methods use static example databases, while our method is based on active learning and requires an experimental protocol in order to query the target concept during the learning process. Our method requires a predefined input space, with fixed length, and does not handle dynamic input lists like the Lisp programs generated by most variants and extensions of Summers' [9] analytical approach. On the other hand, the simplicity of the code generated by our method, based on only two logical operators, enables its compilation in almost any conventional programming language, even in hardware description languages. The declarative nature of the generated programs brings predictability at runtime in terms of execution time and use of machine resources. The example proposed in section 8 showed how to synthesize a declarative program to emulate the Fibonacci sequence. The generated program requires, for any six-bit input, exactly 1804 low-level, bitwise logic operations to compute the corresponding Fibonacci number. An equivalent imperative program would require local variables, loop or recursion controls and a number of variable assignments and arithmetic operations proportional to the input value. The method does not require any prior knowledge about the target function, such as a background theory, types of variables or operations on types, like in hybrid methods. Nevertheless, the goal of this paper is not to prove that we created a better method but to present a new concept in inductive program synthesis. Future work will be necessary in order to assess the advantages and disadvantages of the proposed method in each possible field of application when other methods are also available.
For practical use of our method it is important to be able to estimate the effort required to synthesize a program. In typical active learning, there is usually a cost associated with every query. This cost depends on the characteristics of the target concept and the associated experimental protocol used to query it. The synthesis effort will depend on the number of queries required and the cost of each query. If the implementation of the method is based on a full verification of the zero map, like in our illustration presented in section 8, a full scan of the target function input space will be necessary and the number of queries will depend only on the size of the input space in bits. As future work, an inductive bias can be proposed in order to avoid the full scan and thus reduce the number of queries.
10 Applications
Applications of this method include all sorts of systems based on bitwise operations, given that the learning problem can be described in functional form. We successfully applied the method to target functions handling different complex data types, such as floating-point numbers and texts.
Any functional computer program can be emulated using our approach. There can be a number of reasons to perform reverse engineering of existing computer programs: the lack of source code, the need to translate an executable program to run on a different computational platform, or the intent to optimize an inefficient implementation.
The method can also be used to translate algorithms from a high-level language directly into a combinatorial circuit design expressed in a hardware description language. The code generated by the proposed method can be easily mapped into a netlist (a sequence of circuit gates).
Machine learning is another field of application. Machines having sensors and actuators, like robots, can acquire direct models using the method. The machine can actively experiment with a physical phenomenon and synthesize a program to predict the result of possible actions.
11 Conclusion
We have presented Drill&Join, a generic method that actively interacts with an external concept (system, function, oracle or database) via I/O examples and synthesizes programs. Generated programs are based on Boolean expressions. The method is not restricted to any specific form of functional learning problem or target function and does not require any background knowledge to be applied. The only requirement is that the external source of data is consistent. Our work raises a number of interesting questions for future consideration. The combination of the drill and the join higher-order functions and the dynamic construction of bases on subspaces via memorization of generated programs can drastically reduce the number of recursive calls and queries. Further investigation is necessary on how to explore these dynamic bases on large input spaces. The stop condition of the algorithms, based on a full verification of the zero map, requires a complete scan of the learning input space. Partial verifications can compromise the convergence of the algorithms. Investigations on inductive biases adequate to the method are necessary.
References
1. S. B. Kotsiantis, Supervised Machine Learning: A Review of Classification Techniques, Informatica 31:249-268, 2007.
2. S. Muggleton, L. De Raedt, D. Poole, I. Bratko, P. Flach, K. Inoue and A. Srinivasan, ILP turns 20. Biography and future challenges, Machine Learning 86:3-23, DOI 10.1007/s10994-011-5259-2, 2012.
3. E. Kitzelmann, U. Schmid, and R. Plasmeijer, Inductive Programming: A Survey of Program Synthesis Techniques, In: LNCS, Approaches and Applications of Inductive Programming, 3rd Workshop AAIP, Revised Papers, Springer-Verlag, 2010.
4. M. H. Stone, The Theory of Representations of Boolean Algebras, Transactions of the American Mathematical Society 40:37-111, 1936.
5. A. Albarghouthi, S. Gulwani and Z. Kincaid, Recursive Program Synthesis, Proceedings of the 25th International Conference on Computer Aided Verification (CAV), Saint Petersburg, Russia, 2013.
6. E. Kitzelmann, A Combined Analytical and Search-Based Approach for the Inductive Synthesis of Functional Programs, Künstliche Intelligenz, 25(2):179-182, 2011.
7. E. Kitzelmann and U. Schmid, Inductive Synthesis of Functional Programs: An Explanation Based Generalization Approach, Journal of Machine Learning Research, 7:429-454, 2006.
8. E. Kitzelmann, Analytical Inductive Functional Programming, M. Hanus (Ed.): LOPSTR 2008, LNCS 5438, pp. 87-102, 2009.
9. P. D. Summers, A Methodology for LISP Program Construction from Examples, Journal of the ACM, 24(1):162-175, 1977.
10. D. R. Smith, The Synthesis of LISP Programs from Examples: A Survey, In A. W. Biermann, G. Guiho, and Y. Kodratoff, editors, Automatic Program Construction Techniques, pages 307-324, Macmillan, 1984.
11. S. Muggleton and L. De Raedt, Inductive Logic Programming: Theory and Methods, Journal of Logic Programming, Special Issue on 10 Years of Logic Programming, 19-20:629-679, 1994.
12. T. Sasao, Switching Theory for Logic Synthesis, Springer, 1999. ISBN 0-7923-8456-3.
13. R. Alur, R. Bodik, G. Juniwal et al., Syntax-Guided Synthesis, FMCAD, pages 1-17, IEEE, 2013.
14. N. Lavrac and S. Džeroski, Inductive Logic Programming: Techniques and Applications, Ellis Horwood, New York, 1994.
15. J. L. Tripp, M. B. Gokhale and K. D. Peterson, Trident: From High-Level Language to Hardware Circuitry, IEEE Computer, 0018-9162/07, March 2007.
16. J. Backus, Can Programming Be Liberated from the von Neumann Style? A Functional Style and Its Algebra of Programs, Communications of the ACM, Volume 21, Number 8, August 1978.
17. Y. Hara, H. Tomiyama, S. Honda and H. Takada, Proposal and Quantitative Analysis of the CHStone Benchmark Program Suite for Practical C-based High-level Synthesis, Journal of Information Processing, Vol. 17, pp. 242-254, 2009.
18. S. Jha, S. Gulwani, S. A. Seshia and A. Tiwari, Oracle-guided Component-based Program Synthesis, In: ICSE, 2010.
19. S. A. Seshia, Sciduction: Combining Induction, Deduction, and Structure for Verification and Synthesis, DAC 2012: 356-365.