Making Universal Induction Efficient by Specialization
Alexey Potapov1,2, Sergey Rodionov1,3
1AIDEUS, Russia
2National Research University of Information Technology, Mechanics and Optics,
St. Petersburg, Russia
3Aix Marseille Université, CNRS, LAM (Laboratoire d'Astrophysique de Marseille) UMR
7326, 13388, Marseille, France
{potapov,rodionov}@aideus.com
Abstract. Efficient pragmatic methods in artificial intelligence can be treated as results of specialization of models of universal intelligence with respect to a certain task or class of environments. Thus, specialization can help to create efficient AGI while preserving its universality. This idea is promising, but has not yet been applied to concrete models. Here, we consider the task of mass induction, whose general solution can be based on Kolmogorov complexity parameterized by the reference machine. Futamura-Turchin projections of this solution were derived and implemented in combinatory logic. Experiments with search for common regularities in strings show that the efficiency of universal induction can be considerably increased for mass induction using the proposed approach.
1 Introduction
Knowledge about the appropriate degree of universality of intelligence is essential for artificial general intelligence, since it determines the strategy of research and development in this field. A relatively widely accepted point of view holds that intelligence consists of a set of specialized modules whose cooperation yields a synergetic effect [1]. On the other hand, there exist models of universal intelligence [2, 3], which in theory possess capabilities unachievable by compositions of narrow methods, but possibly indispensable for general intelligence.
However, a strong objection against these models is their computational infeasibility, which makes it doubtful that the desirable capabilities can really be achieved in practice. For induction, the most direct way to improve their efficiency is to select some appropriate reference machine, which specifies an inductive bias in agreement with reality [4]. This operation is useful, but insufficient, because it can help to find only a limited number of simple regularities in reasonable time, while more complex regularities inevitably encountered in nature will remain unrecoverable. More efficient practical approximations of universal intelligence models lose universality (e.g. [5]).
One might conclude that efficient universality cannot be achieved, and that universality is mostly of theoretical interest, or that inefficient universal methods can work in parallel with efficient pragmatic intelligence as a last resort.
However, universal intelligence can probably be much more useful in a practical sense. The principal possibility of automatically constructing efficient narrow methods from inefficient general methods via program specialization (partial evaluation) was noticed 30 years ago [6]. Recently, this idea was put into the context of AGI research [7]. In particular, it was indicated that specializing a universal intelligence w.r.t. some problem and then solving this problem can be computationally more efficient than solving the problem directly. This result implies that models of universal intelligence can be made more efficient without violating universality. Indirect utilization of universality in the form of construction of specialized methods is attractive since it also bridges the gap between the two approaches, and shows that systems composed of a fixed set of specialized modules are insufficient for AGI, because such a set is not automatically extended.
However, no analysis of possible specialization of concrete models of universal intelligence has been given yet. In this paper, we make the first such attempt, focusing on Solomonoff's universal induction (of course, specialization of decision-making is also of interest). We consider this problem on the example of mass induction tasks, for which the benefits of specialization should be most evident.
2 Background
Universal Induction
For the sake of simplicity, we will consider the method of universal induction (instead of universal prediction or universal intelligence), which consists in searching for the shortest program that reproduces given data [4]:

$$p^* = \arg\min_p \{\, l(p) : U(p) = x \,\}, \qquad (1)$$
where each program p that reproduces the data x when executed on the universal machine U is treated as a (generative) model of this data, and p* is the best model.
This is a universal solution to the problem of induction, because it possesses two main properties: it relies on the universal (Turing-complete) space of models, in which any computable regularity can be found, and it uses a universal complexity-based criterion for model selection (whose universality also relies on Turing-completeness, since any two universal machines can emulate each other using interpreters whose length is independent of the programs being executed).
This criterion is incomputable, but can be replaced with a computable counterpart (e.g. Levin search, LSearch [8], based on Levin complexity instead of Kolmogorov complexity); however, the number of operations required to identify p* will be proportional to $2^{l(p^*)} T(p^*)$, where $T(p^*)$ is the time required for p* to terminate.
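To make the phase structure behind this bound concrete, here is a minimal sketch of an LSearch-style enumeration in Python. The bitstring machine `run` is our own toy stand-in, not the reference machine used later in the paper.

```python
from itertools import product

def lsearch(target, run, max_phase=20):
    """Toy Levin search: in phase k, each program of length l gets
    2**(k - l) steps, so the total work until a program p* of length
    l(p*) and runtime T(p*) is found is O(2**l(p*) * T(p*))."""
    for phase in range(1, max_phase + 1):
        for length in range(1, phase + 1):
            steps = 2 ** (phase - length)  # time share for this length
            for bits in product('01', repeat=length):
                p = ''.join(bits)
                if run(p, steps) == target:
                    return p
    return None

def run(p, steps):
    """Hypothetical machine: n leading 1s followed by data produce the
    data repeated n+1 times; returns None if the step budget is short."""
    if steps < len(p):
        return None
    n = len(p) - len(p.lstrip('1'))
    data = p[n:]
    return data * (n + 1) if data else None

print(lsearch('010101', run))  # finds '1101', i.e. '01' repeated 3 times
```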
This estimate cannot be reduced without violating universality or optimality, or without imposing some restrictions. Thus, the question is how to do this in a reasonable way. We cannot build a method that will work better in any environment, but we can build a method that will do much better in environments possessing some specific properties (while preserving its universality in other environments). These are the properties exploited by narrow methods.
Mass induction and data representations
In order to bridge the gap between efficient narrow methods and inefficient universal methods, one should understand the difference between them. Of course, practical non-universal methods usually work with Turing-incomplete model spaces. That is why they can be made computationally efficient. Their success, in turn, is conditioned on the correspondence between the class of data to be processed and the model space. Moreover, each practical method not only fits a specific class of input data, but is also applied to different data instances from this class independently, meaning that each such method is designed to solve some mass induction problem (a set of individual induction tasks to be solved independently).
Indeed, computer vision systems, for example, are usually applied to different images (or video sequences) independently, and they rely on specific image representations (e.g. based on Fourier or wavelet transforms, contours, feature points, etc.), which define the corresponding model spaces.
Mass induction with introduced representations has already been considered as a possible connection between universal and narrow methods [9]. One can state the following mass induction task. Let the set $\{x_i\}_{i=1}^{n}$ of input data strings be given. Instead of searching for the shortest program reproducing the concatenation $x_1 \dots x_n$, one can simplify the problem and search for a program S and input strings $y_1, \dots, y_n$ for it such that $U(Sy_i) = x_i$. The program S is called a representation for the data set $\{x_i\}$, while the $y_i$ are the models obtained within this representation. For the sake of simplicity, we will write S(y) instead of U(Sy) when U is fixed. The criterion for selecting the best representation and models can be derived from Kolmogorov complexity:
$$K_U(x_1 x_2 \dots x_n) \le \min_S \Big( l(S) + \sum_{i=1}^{n} K_U(x_i \mid S) \Big),$$
$$S^* = \arg\min_S \Big( l(S) + \sum_{i=1}^{n} l(y_i^*) \Big), \qquad y_i^* = \arg\min_{y:\, S(y) = x_i} l(y), \qquad (2)$$
which is a version of the representational minimum description length principle that extends the usual minimum description length principle [9].
Such a consideration gives only the criterion for selecting models for data of specific types, and reduces computational complexity by decomposing the high-dimensional task with data $x_1 \dots x_n$ into lower-dimensional subtasks with data $x_i$. However, each such subtask remains computationally intractable, requiring enumeration of $2^{l(y_i^*)}$ models for each $x_i$. At the same time, this might not be necessary if S is not a universal machine. Actually, in most practical methods the observational data $x_i$ are directly mapped to their descriptions $y_i$ by some program S', which can be referred to as a descriptive representation (in contrast to the generative representation S). For example, if the $x_i$ are images, then S' can be a filter that outputs image edges, feature points, or something else as $y_i$.
However, descriptive representations cannot be directly utilized in universal induction. The problem is in the criterion. The generative framework ensures that data compression is lossless, so Kolmogorov complexity is the correct measure of the amount of information. In the descriptive framework, we can calculate the lengths of models, but we do not know how much information is lost. Indeed, descriptive models impose constraints on data content instead of telling how to reconstruct the data. This can be seen on the example of (image) features. The value of each feature contains additional information about the data, but in an indirect form. Given enough features, the data can be reconstructed, but this task is rather complex if the features are nonlinear. Thus, it is difficult to tell from a nonlinear feature transform itself whether it is lossless or not. This situation is similar to the difference between declarative and imperative knowledge. For example, programs in Prolog impose constraints on a solution, but do not describe how to build it.
In the field of weak AI, both representations and methods for searching for models within them are constructed manually (or chosen automatically from narrow classes). Artificial general intelligence should have the same capabilities as humans, namely, it should be able to construct new (efficient) methods on its own. Here, we consider this process as specialization of AGI with respect to the specific tasks to be solved.
Program Specialization
The idea of partial evaluation comes from the observation that if one has a program with several parameters, and the value of one of these parameters is fixed and known, the program can be transformed in such a way that it will work only with this value, but will do it more efficiently. One simple common example is the procedure of raising x to the power n using a loop; when specialized w.r.t. n=2, it can directly compute x * x without loops or iterations. In general, if there is a specializer spec for programs in some programming language R, then the result spec(p, x0) of specialization of a program $p(x, y) \in R$ w.r.t. its first parameter x=x0 is the program with one parameter for which

$$(\forall y)\; spec(p, x_0)(y) = p(x_0, y).$$
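As an illustration of this definition, the following sketch (ours; the function names are invented) specializes the power-by-loop program w.r.t. a fixed n by generating a residual function with the loop unrolled:

```python
def power(x, n):
    # General two-parameter program: loops over the dynamic parameter n.
    r = 1
    for _ in range(n):
        r *= x
    return r

def spec_power(n):
    # Plays the role of spec(power, n): the loop is executed at
    # specialization time, emitting straight-line residual code.
    body = " * ".join(["x"] * n) if n > 0 else "1"
    namespace = {}
    exec(f"def residual(x):\n    return {body}\n", namespace)
    return namespace["residual"]

square = spec_power(2)               # residual program: return x * x
assert square(7) == power(7, 2) == 49
```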
The most frequent application of partial evaluation is to interpreters and compilers. The well-known Futamura-Turchin projections [10] show that if there is an interpreter $intL \in R$ for programs in a language L, then the result of its specialization w.r.t. some program $p_L(x)$, $p_L \in L$, is the program $p_L$ compiled into the language R, since spec(intL, $p_L$) is the program in R such that

$$(\forall x)\; spec(intL, p_L)(x) = intL(p_L, x),$$

meaning that the result of execution of this program is the same as the result of interpretation of the program $p_L$.
Further, since the specializer takes two arguments, it can in turn be specialized w.r.t. the interpreter intL, yielding a compiler spec(spec, intL) from L to R, because

$$(\forall p_L)(\forall x)\; spec(spec, intL)(p_L)(x) = intL(p_L, x).$$

One can further specialize spec w.r.t. itself; the result spec(spec, spec) can be used in particular as a generator of compilers, since

$$(\forall intL)\; spec(spec, spec)(intL) = spec(spec, intL)$$

is the compiler from L to R.
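A toy rendering of the first projection, under invented mini-language assumptions: `interp` plays the role of intL for programs that are lists of the ops 'inc' and 'double', and `spec_interp` stands in for spec(intL, ·), returning a residual Python closure that no longer dispatches on op names:

```python
def interp(prog, x):
    # intL: interpret a mini-language program on input x.
    for op in prog:
        if op == 'inc':
            x += 1
        elif op == 'double':
            x *= 2
    return x

def spec_interp(prog):
    # "Compile" prog by resolving the op dispatch at specialization
    # time; the residual closure just applies the prepared functions.
    ops = [(lambda v: v + 1) if op == 'inc' else (lambda v: v * 2)
           for op in prog]
    def residual(x):
        for f in ops:
            x = f(x)
        return x
    return residual

p = ['inc', 'double', 'inc']
compiled = spec_interp(p)            # p "compiled" into Python
assert compiled(3) == interp(p, 3) == 9
```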
The main interesting property of specialization consists not simply in the condition $(\forall y)\; spec(p, x_0)(y) = p(x_0, y)$, but in the fact that evaluation of a program specialized w.r.t. some parameter can be much faster than evaluation of the original program. However, traditional techniques of partial evaluation can guarantee only linear speedup [11], which is enough for compilers, but inappropriate for our tasks. Some metacomputation techniques allow for polynomial and even exponential speedup, but unreliably [11]. Of course, in general the problem of optimal specialization is algorithmically unsolvable, but humans somehow manage to use specialization (of course, the principal question is whether human intelligence is efficiently general due to powerful specialization, or vice versa).
Specialization of Universal Induction
The most direct application of partial evaluation in mass induction consists in considering the generative representation S as an interpreter of some language. The models $y_i$ will be programs in this language. Then, one can directly apply the Futamura-Turchin projections. For example, specialization of S w.r.t. $y_i$ will yield compiled programs spec(S, $y_i$) in the language of the reference machine U. Such "compilation" of models, or construction of a compiler spec(spec, S), might make sense, but its effect on computational efficiency would be insignificant. Instead, one should consider specialization of the procedures of universal induction themselves.
Consider the following extension of the Levin search procedure to the task of mass induction (a universal mass induction method):
Given strings $x_1, \dots, x_n$, enumerate all programs S in parallel (allocating to each program resources proportional to $2^{-l(S)}$); for each program and for each $x_i$, enumerate all possible strings $y_i$, until the first set $\{S^*, y_1^*, \dots, y_n^*\}$ is found such that $S^*(y_i^*) = x_i$.
We will refer to this algorithm as RSearch (representation search). RSearch has a subroutine (let us refer to it as MSearch, model search), which searches for the best y for given S and x: y = MSearch(S, x) such that S(y) = x, implying

$$(\forall x)\; S(MSearch(S, x)) = x. \qquad (3)$$
MSearch uses exhaustive search to find the best model within the utilized representation. However, the task of searching for the shortest generative model in a Turing-incomplete space (one corresponding to the set of possible input strings of some program S) can have a simplified or even explicit solution in the form of an algorithm that directly maps a given x to its best model y.
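A minimal sketch of the exhaustive variant under toy assumptions: models are bitstrings, and the Python function `S` stands in for the reference machine with a fixed representation; since candidates are enumerated in order of length, the first y with S(y) = x is also a shortest one.

```python
from itertools import product

def msearch(S, x, max_len=16):
    # Enumerate candidate models in order of increasing length, so the
    # first model reproducing x is also a shortest model of x.
    for length in range(max_len + 1):
        for bits in product('01', repeat=length):
            y = ''.join(bits)
            if S(y) == x:
                return y
    return None

# Toy representation: prepend a fixed header to the model.
S = lambda y: '111' + y
assert msearch(S, '11101') == '01'
```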
Imagine that we have some specializer spec, which can accept the algorithm MSearch(S, x) and any fixed representation S and produce its computationally efficient projection such that

$$(\forall x)\; spec(MSearch, S)(x) = MSearch(S, x)$$

by the definition of spec. Let us denote S' = spec(MSearch, S); then $(\forall x)\; S'(x) = MSearch(S, x)$. Substituting this result into (3), one obtains

$$(\forall x)\; S(S'(x)) = x.$$

Among all possible programs possessing this property, the program S' should be chosen for which (2) holds. We have thus proven the following theorem.
Theorem 1. The result of specialization of the model search in the universal mass induction method (RSearch) w.r.t. the fixed representation S optimal for the set $\{x_i\}$ of input data is the program S' such that $(\forall x)\; S(S'(x)) = x$ and $\sum_i l(S'(x_i))$ is minimal.
This theorem shows that S' is a right inverse of S, but not an arbitrary one, since it should satisfy the information-theoretic optimality criterion.
It should be noted that the equality S'(S(y)) = y may fail for some y, since for some representations (e.g. interpreters of universal machines) many models producing the same x can exist, implying that if y is not an optimal model of x = S(y), then $S'(S(y)) = y^* \neq y$. Thus, S' is not necessarily a left inverse.
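A toy illustration of this asymmetry (our own, not from the paper): let S expand a binary numeral y into a unary string x; then numerals with leading zeros produce the same x, and the optimal S' picks the shortest one.

```python
def S(y):
    # Generative: a binary numeral y produces a unary string x.
    return '1' * int(y, 2)

def S_prime(x):
    # Descriptive: the shortest binary numeral encoding x.
    return bin(len(x))[2:]

x = '11111'
assert S(S_prime(x)) == x                  # right inverse: S(S'(x)) = x
y = '00101'                                # a non-optimal model of x
assert S(y) == x and S_prime(S(y)) != y    # ... but not a left inverse
```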
Theorem 1 shows that a descriptive representation is a result of partial evaluation of (extended) universal induction with some generative representation. One can go further and apply the second Futamura-Turchin projection in these settings, i.e., specialize the spec program itself w.r.t. the MSearch algorithm. A program with the following property is immediately obtained.
Theorem 2. Let spec(spec, MSearch) = Inv'; then

$$(\forall S)(\forall x)\; S(Inv'(S)(x)) = x.$$
This theorem might seem trivial, but it shows a connection between inversion and specialization, which are usually considered as different tasks of metacomputation. Again, it is not true that $(\forall S)(\forall y)\; Inv'(S)(S(y)) = y$, because S is not an injection. Also, Inv' is not an arbitrary inverter; it constructs S' = Inv'(S) that is optimal in terms of induction (as indicated in Theorem 1).
It is interesting to note that specialization is usually considered as a particular case of metacomputation, which also includes program inversion (and inversion is usually assumed to be more difficult than specialization). However, Theorem 2 shows that inversion can also be obtained as the result of specialization in the case of search procedures.
Of course, one can also consider the third Futamura-Turchin projection, which will again be spec(spec, spec). In this context, it appears to be not only a generator of compilers, but also a generator of procedures for solving inverse problems. This is quite an interesting abstract construction, but it is way too general in terms of what it should be able to do. Indeed, the self-application spec(spec, spec) is supposed to be performed without knowing which programs or data will later be passed to spec. The third projection should probably be put in an online setting (self-optimization of spec at runtime while receiving concrete data) to become relevant to AGI.
3 Search for Representations
The theoretical results inferred above reveal only general properties of specialized induction procedures, but do not give constructive means for building them. A straightforward way to further this approach is to try applying partial evaluation and metacomputation techniques directly to MSearch. Such an attempt can be very valuable, since it can help to understand the usefulness of program specialization as a possible automated transition from general inefficient to narrow efficient intelligence.
However, in our particular case, the result of specialization spec(MSearch, S) can be unknown together with S. Here, we do not solve the problem of efficient automatic construction of representations, but limit ourselves to the problem of efficient model construction within representations created in some other way. For this reason, we use the RSearch procedure modified in such a way that instead of searching for all $y_i$ for each S, it searches for pairs of S and S' with S' satisfying Theorem 1. We will refer to this procedure as SS'-Search.
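At a high level, the procedure can be sketched as follows (our paraphrase; the actual implementation in Sect. 4 uses genetic programming rather than plain enumeration). Here `pairs` is assumed to yield candidate pairs in order of increasing l(S) + l(S'), and `run(p, x)` applies a program to a string.

```python
def ss_search(xs, run, pairs):
    # Return the first pair (S, S') with S(S'(x)) == x for every data
    # string; with length-ordered enumeration this is also a shortest
    # pair, approximating the criterion of Theorem 1.
    for S, Sp in pairs:
        ys = [run(Sp, x) for x in xs]            # y_i = S'(x_i)
        if all(run(S, y) == x for x, y in zip(xs, ys)):
            return S, Sp
    return None
```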
Let us estimate the computational complexity of the different search procedures. The number of operations in RSearch will be proportional to $2^{l(S)} \sum_i 2^{l(y_i)}$. It should be pointed out that in the worst case the LSearch time will be proportional to $2^{l(S) + \sum_i l(y_i)} \gg 2^{l(S)} \sum_i 2^{l(y_i)}$, since no decomposition is done.
The SS'-Search time will be proportional to $2^{l(S)} 2^{l(S')}$, meaning that SS'-Search can be even much more efficient than RSearch in cases when the $y_i$ are longer than S' (and this should be quite common in sensory data analysis, especially when the data contain incompressible noise). However, the opposite can also be the case. The main advantage of SS'-Search should be the construction of S', which can help to search for models of new data pieces more efficiently.
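For a feel of the magnitudes, a small calculation with invented lengths (l(S) = 6, l(S') = 8, and n = 10 strings with models of length 12 each):

```python
l_S, l_Sp, n, l_y = 6, 8, 10, 12                 # invented lengths
lsearch_ops = 2 ** (l_S + n * l_y)               # no decomposition
rsearch_ops = 2 ** l_S * n * 2 ** l_y            # one model search per string
ss_ops      = 2 ** l_S * 2 ** l_Sp               # search for (S, S') only
print(f"{lsearch_ops:.2e}  {rsearch_ops:.2e}  {ss_ops:.2e}")
# -> about 8.51e+37 vs 2.62e+06 vs 1.64e+04
```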
4 Implementation in Combinatory Logic
We have previously built a resource-bounded implementation of universal induction using combinatory logic (CL) as the reference machine and genetic programming as the search engine [12]. Because it is not biased toward any specific task, it is not practically useful; however, it is appropriate for validating theoretical results. Here, we extend this implementation to the case of mass induction problems using RSearch and SS'-Search.
We used the combinators K, S, B, b, W, M, J, C, T with the following reduction rules:

K x y = x                  S x y z = x z (y z)    B f g x = f (g x)
b f g x = g (f x)          W x y = x y y          M x = x x
J a b c d = a b (a d c)    C f x y = f y x        T x y = y x

where x, y, etc. are arbitrary CL-expressions.
We supplement the CL alphabet with the non-combinatory symbols 0 and 1 (and in some experiments with other digits, which are treated simply as distinct symbols). Reduction of a CL-expression can yield a non-reducible expression containing combinators or only non-combinatory symbols.
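For concreteness, here is a toy normal-order reducer for these rules (our sketch, not the cl-lab implementation), with CL-expressions as nested application pairs and digits as literal symbols:

```python
RULES = {  # combinator -> (arity, rewrite of the argument list)
    'K': (2, lambda a: a[0]),
    'S': (3, lambda a: ((a[0], a[2]), (a[1], a[2]))),
    'B': (3, lambda a: (a[0], (a[1], a[2]))),
    'b': (3, lambda a: (a[1], (a[0], a[2]))),
    'W': (2, lambda a: ((a[0], a[1]), a[1])),
    'M': (1, lambda a: (a[0], a[0])),
    'J': (4, lambda a: ((a[0], a[1]), ((a[0], a[3]), a[2]))),
    'C': (3, lambda a: ((a[0], a[2]), a[1])),
    'T': (2, lambda a: (a[1], a[0])),
}

def spine(expr):
    # Unwind left-nested applications: (((C, K), x), y) -> 'C', [K, x, y].
    args = []
    while isinstance(expr, tuple):
        expr, arg = expr
        args.append(arg)
    return expr, args[::-1]

def step(expr):
    head, args = spine(expr)
    if head in RULES:
        arity, rule = RULES[head]
        if len(args) >= arity:
            new = rule(args[:arity])
            for extra in args[arity:]:       # reattach leftover arguments
                new = (new, extra)
            return new, True
    if isinstance(expr, tuple):              # otherwise reduce subterms
        f, a = expr
        f2, changed = step(f)
        if changed:
            return (f2, a), True
        a2, changed = step(a)
        if changed:
            return (f, a2), True
    return expr, False

def reduce_cl(expr, max_steps=1000):
    # Reduce until normal form or until the step budget runs out.
    for _ in range(max_steps):
        expr, changed = step(expr)
        if not changed:
            break
    return expr

# Example: C K x y -> K y x -> y, so the expression CK drops the first
# symbol of a string; "0101" is the application chain ((((C K) 0) 1) 0) 1.
expr = ((((('C', 'K'), '0'), '1'), '0'), '1')
assert reduce_cl(expr) == (('1', '0'), '1')   # i.e. the string "101"
```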
Representations can easily be introduced in CL. Indeed, one can construct a CL-expression as the concatenation of two expressions S and y, treating S as a representation and y as a model of data x within this representation, if the concatenation Sy reduces to x. Then, given a set of data pieces $\{x_i\}$ in the mass induction setting, one should search for one common S and different $y_i$ such that the CL-expressions $Sy_i$ reduce (possibly imprecisely) to $x_i$.
In order to select the best S and $y_i$, the criterion (2) can be made more concrete:

$$S^* = \arg\min_{S,\, y_i} \Big( H_S\, l(S) + H_y \sum_i l(y_i^*) + H_x \sum_i d(x_i, S(y_i^*)) \Big), \qquad (4)$$

where the $y_i^*$ are models obtained by MSearch, or by applying S' to $x_i$ in SS'-Search; l(S) and $l(y_i^*)$ are the lengths (numbers of symbols) of the corresponding strings; $d(x_i, S(y_i^*))$ is the edit distance between the two strings (the number of symbols to be encoded in order to transform $S(y_i^*)$ into $x_i$); and $H_S$, $H_y$, $H_x$ are the numbers of bits per symbol to be encoded in the corresponding domains.
The application of genetic programming here is similar to our previous implementation, in which CL-expressions were represented as trees with typical crossover in the form of subtree exchange. The main difference is that solutions in mass induction have a more complex structure, and crossover should be performed independently for each part, as sketched below. It also appeared to be useful to allow modifications of only one part of the solution per iteration (population).
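A minimal sketch of this per-part recombination (our reading of the scheme; the names and tree encoding are invented): individuals are (S, S') pairs of binary application trees, and subtree exchange happens within one part at a time.

```python
import random

def subtrees(t, path=()):
    # Enumerate (path, subtree) pairs of a binary tuple tree.
    yield path, t
    if isinstance(t, tuple):
        yield from subtrees(t[0], path + (0,))
        yield from subtrees(t[1], path + (1,))

def graft(t, path, new):
    # Return t with the subtree at `path` replaced by `new`.
    if not path:
        return new
    if path[0] == 0:
        return (graft(t[0], path[1:], new), t[1])
    return (t[0], graft(t[1], path[1:], new))

def crossover(ind_a, ind_b, rng=random):
    # Individuals are dicts {'S': tree, 'Sp': tree}; recombine only one
    # randomly chosen part, leaving the other part of each parent intact.
    part = rng.choice(['S', 'Sp'])
    pa, sa = rng.choice(list(subtrees(ind_a[part])))
    pb, sb = rng.choice(list(subtrees(ind_b[part])))
    child_a = dict(ind_a); child_a[part] = graft(ind_a[part], pa, sb)
    child_b = dict(ind_b); child_b[part] = graft(ind_b[part], pb, sa)
    return child_a, child_b
```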
Our implementation can be found at https://github.com/aideus/cl-lab
5 Experimental Results
We conducted experiments with several sets of binary strings. The seemingly simplest set was composed of the string 11100101 repeated 10 times as $x_1 \dots x_{10}$. The obvious optimal generative representation here coincides with the same string 11100101 (if empty strings are acceptable as models), since this CL-expression will always be reduced to itself. However, RSearch failed to find this solution. The best solution found was 1110 as the representation (S) and 0101 as the model ($y_i$) for each string. Surely, reduction of $Sy_i$ will yield 11100101, but this solution is not optimal. Why is subsequent optimization of this solution difficult for genetic programming? The string 11100 as the representation can be obtained from 1110 by a single mutation, but this would necessitate rebuilding all models (since $Sy_i$ would become equal to 111000101).
The impracticality of general methods applied to mass induction tasks is a trivial conclusion. At the same time, SS'-Search successfully solved this problem and found S'=J(bMJK)T and S=W110010. Here, S'$x_i$ reduces to $y_i$=1, and $Sy_i$ reduces back to $x_i$=11100101. In contrast to RSearch, incremental improvement of solutions is easier to achieve here, since a modification of S requires a consistent modification of only S' instead of many $y_i$.
Consider other sets of strings. The next set was composed of 16 strings: 0101101101010, 0001101001011, 0111111110011, etc. These are random strings starting with 0. RSearch failed to find any precise solution. The representations found had forms such as B(BW(BWM)(01)) with corresponding models 10111, 10011, 11010, etc. At the same time, SS'-Search found the optimal solution S=0, S'=CK, in which S' removes the first bit of the string, producing models of the data strings (which are difficult to find blindly in RSearch), and S adds 0 back as the first bit.
The next set contained the strings 00000000, 00010001, 00100010, …, 11111111. Both methods managed to find good solutions (although only in 25% of runs). RSearch found S=SSbBBM and $y_i$=0000, 0001, …, 1111. SS'-Search found S=BBB(BM) and S'=B(SJCK), such that S' transforms $x_i$ into the appropriate $y_i$ by removing the duplicated half of a string, which is then transformed back by S.
We also conducted some tests with an extended alphabet of non-combinatory symbols including 0..9 (which were not interpreted as digits, though). One set included such strings as 159951, 248842, 678876, 589985, 179971, etc. (i.e. strings with mirror symmetry). RSearch completely failed on this set, while SS'-Search found an optimal solution S=B(S(BST))M, S'=JKK. Of course, a solution with the same representation and models is also valid for RSearch, but the search problem in the purely generative setting appeared to be too difficult.
Another set contained such strings as 307718, 012232, 689956, 782214, etc. The common regularity in these strings is the coincidence of the 3rd and 4th symbols. Again, RSearch was unsuccessful, while SS'-Search found S=KBbW and S'=BK, which add and remove the redundant symbol, respectively.
Our main intention in considering specialization of universal induction was to avoid expensive search for individual models for every new data string. Once S' is constructed, it can be applied directly to construct models in all the considered cases. This is the main benefit of using specialization of universal induction. We did not try to solve the problem of automatic construction of arbitrary representations, but increasing the performance of universal methods in solving this problem as well is important.
Of course, the capabilities of SS'-Search based on uninformed search are quite limited. Indeed, in our experiments it failed to discover many seemingly simple regularities, especially in strings of varying length (partially because their representation in combinatory logic can be rather complex). Examples of unsuccessful tests include {1221333, 3331221333, 2233313331333, 22122122122, …}, {00, 11, 000, 111, 0000, …}, {491234, 568485, 278412, 307183, 098710, …}, and others. Thus, this solution is far from efficient universal induction. Nevertheless, the comparison of RSearch and SS'-Search shows that the efficiency of universal induction can be considerably increased, and that there is a principled way to bridge the gap between efficient and universal methods.
6 Conclusion
We considered universal induction as applied to mass problems. Solutions of such problems include representations that capture regularities common to a set of strings, and individual models of these strings. Methods such as LSearch can be directly extended to solve mass problems. However, this leads to direct enumeration of both representations and models. At the same time, model search can be made much more efficient for particular representations, as is done in efficient narrow methods of machine perception and learning.
We studied the possibility of specializing universal induction w.r.t. some representation (reference machine). The result of such specialization should correspond to a descriptive representation that maps inputs into models as efficiently as possible. However, the most difficult problem, the construction of representations themselves, remains.
We proposed the SS'-Search method, which can be treated as a generalization of autoencoders [13] to the Turing-complete space of representations. The method consists in searching for descriptive and generative representations simultaneously. It was implemented using combinatory logic as the reference machine. SS'-Search appeared to be much more efficient for mass induction tasks than the direct search for generative models for each given string, but it still only allows solving induction tasks of rather low complexity. Further research is needed to increase the efficiency of universal methods. Also, analysis of the specialization of concrete universal intelligence models (in addition to universal induction) is of interest.
Acknowledgements
This work was supported by the Russian Federation President’s grant Council (MD-
1072.2013.9) and the Ministry of Education and Science of the Russian Federation.
References
1. Hart, D., Goertzel, B.: OpenCog: A Software Framework for Integrative Artificial General Intelligence. In: Frontiers in Artificial Intelligence and Applications (Proc. 1st AGI Conference), vol. 171, pp. 468–472 (2008)
2. Hutter, M.: Universal Artificial Intelligence: Sequential Decisions Based on Algorithmic Probability. Springer (2005)
3. Schmidhuber, J.: Gödel Machines: Fully Self-Referential Optimal Universal Self-Improvers. In: Goertzel, B., Pennachin, C. (eds.) Artificial General Intelligence. Cognitive Technologies, pp. 199–226. Springer (2007)
4. Solomonoff, R.: Algorithmic Probability, Heuristic Programming and AGI. In: Baum, E., Hutter, M., Kitzelmann, E. (eds.) Advances in Intelligent Systems Research, vol. 10 (Proc. 3rd Conf. on Artificial General Intelligence), pp. 151–157 (2010)
5. Veness, J., Ng, K.S., Hutter, M., Uther, W., Silver, D.: A Monte-Carlo AIXI Approximation. J. Artificial Intelligence Research 40(1), 95–142 (2011)
6. Kahn, K.: Partial Evaluation, Programming Methodology, and Artificial Intelligence. AI Magazine 5(1), 53–57 (1984)
7. Khudobakhshov, V.: Metacomputations and Program-based Knowledge Representation. In: Kühnberger, K.-U., Rudolph, S., Wang, P. (eds.) AGI'13, LNAI 7999, pp. 70–77 (2013)
8. Levin, L.A.: Universal Sequential Search Problems. Problems of Information Transmission 9(3), 265–266 (1973)
9. Potapov, A., Rodionov, S.: Extending Universal Intelligence Models with Formal Notion of Representation. In: Bach, J., Goertzel, B., Iklé, M. (eds.) AGI'12, LNAI 7716, pp. 242–251 (2012)
10. Futamura, Y.: Partial Evaluation of Computation Process – an Approach to a Compiler-Compiler. Systems, Computers, Controls 2(5), 45–50 (1971)
11. Jones, N.D., Gomard, C.K., Sestoft, P.: Partial Evaluation and Automatic Program Generation. Prentice-Hall (1993)
12. Potapov, A., Rodionov, S.: Universal Induction with Varying Sets of Combinators. In: Kühnberger, K.-U., Rudolph, S., Wang, P. (eds.) AGI'13, LNAI 7999, pp. 88–97 (2013)
13. Hochreiter, S., Schmidhuber, J.: Nonlinear ICA through Low-Complexity Autoencoders. In: Proc. IEEE Int'l Symp. on Circuits and Systems, vol. 5, pp. 53–56 (1999)