Article · PDF Available

Artificial general intelligence through recursive data compression and grounded reasoning: a position paper

Authors:
  • Odessa Competence Center for Artificial intelligence and Machine learning (OCCAM)

Abstract and Figures

This paper presents a tentative outline for the construction of an artificial, generally intelligent system (AGI). It is argued that building a general data compression algorithm that solves all problems up to a complexity threshold should be the main thrust of research. A measure of partial progress toward AGI is suggested. Although the details are far from clear, some general properties of such a compression algorithm are fleshed out. Its inductive bias should be flexible and adapt to the input data while constantly searching for a simple, orthogonal and complete set of hypotheses explaining the data. It should recursively reduce the size of its representations, thereby compressing the data further at every iteration. Based on that fundamental ability, a grounded reasoning system is proposed. It is argued how grounding and flexible feature bases made of hypotheses allow for resourceful thinking. While the simulation of representation contents on the mental stage accounts for much of the power of propositional logic, compression leads to simple sets of hypotheses that allow the detection and verification of universally quantified statements. Together, it is highlighted how general compression and grounded reasoning could account for the birth and growth of first concepts about the world and for commonsense reasoning about them.
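To make the recursive step concrete, here is a minimal Python sketch of the compression loop the abstract describes; find_feature is a hypothetical placeholder for whatever hypothesis search the system employs, not something specified in the paper.

    def incremental_compress(data, find_feature, max_iters=100):
        """Repeatedly replace the current description by a shorter
        (feature, residual) pair, as long as progress is made."""
        description = data
        layers = []                  # stack of extracted features (hypotheses)
        for _ in range(max_iters):
            feature, residual = find_feature(description)
            if feature is None or len(feature) + len(residual) >= len(description):
                break                # no further compression found
            layers.append(feature)
            description = residual   # recurse on the residual description
        return layers, description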
Formulas recovered from the full-text preview (the surrounding prose is not preserved here):

Solomonoff's universal prior over strings $x$, with $U$ a universal Turing machine and $p$ ranging over programs that print $x$:

$$M(x) = \sum_{p:\,U(p)=x} 2^{-|p|}$$

Its conditional predictions converge, $M(x_t \mid x_1, \ldots, x_{t-1}) \to 1$ as $t \to \infty$, when the sequence $x_i$ is drawn from a computable distribution.

A back-of-the-envelope estimate of the genome's information content: $3 \cdot 10^9$ base pairs at 2 bits each amount to $3 \cdot 10^9 \cdot 2 / 8 / 1024^2 \approx 715$ MB, which the successive reductions quoted in the text ($10\%$, $1.5\%$, $1\%$) shrink to about $1.0$ MB of brain-relevant information.

Levin's time-bounded complexity, trading program length $|p|$ against running time $t$:

$$Kt(x) = \min_p \{\, |p| + \log t : U(p) = x \text{ in } t \text{ steps} \,\}$$

The proposed measure of partial progress toward AGI: fix a complexity level $L$ and a benchmark set of pairs $(x_i, p_i^o)$, where

$$p_i^o \in \{\, p : Kt(x_i) = |p| + \log t,\ |p| \le L \,\}$$

is an optimal program for $x_i$. If a candidate algorithm finds a program $p$ for $x_i$, its score on that problem is the normalized compression ratio

$$r_i(L) = \frac{|x_i| - |p|}{|x_i| - |p_i^o|} \in [0, 1]$$

and its overall performance at level $L$ is the average $R(L) = \langle r_i(L) \rangle$. An algorithm reaching $R(L) = 1$ solves all compression problems up to the complexity threshold $L$. Brute force is no shortcut here: running all programs up to length $n$ for $m$ steps each takes time $\Theta(2^n m)$.

Example compression targets include the digits of $\pi$ ($3, 1, 4, 1, 5, 9, 2, 6, 5, 3, \ldots$), the odd numbers $1, 3, 5, 7, 9, \ldots$, and interleaved progressions such as

$$1,3,1,3,2,4,2,4,2,3,5,3,5,3,5,4,6,4,6,4,6,4,\ldots$$
$$1,4,1,4,2,5,2,5,2,3,6,3,6,3,6,4,7,4,7,4,7,4,\ldots$$

A worked entropy example: the 16-bit string 0001111100000000 costs $H_0 = 16$ bits when each bit is coded independently with $p = 0.5$, but its single run of ones can be described by a start position $n$ and a length $l$ at $H = \log_2 16 = 4$ bits each, i.e. 8 bits in total. Scaling up, a 1024-bit string containing 10 runs costs $H^{(1)} = 2 \cdot 10 \cdot \log_2(1024) = 200$ bits, a compression of $1 - 200/1024 = 80.4\%$; if the runs are regular, $n_i = 100 \cdot i$ and $l_i = 4$, the description shrinks to $H^{(2)} = 20$ bits, i.e. $1 - 20/1024 = 98\%$.

A Bayesian example with binary outcome $X$ and observed length $l$: under hypothesis $H_1$, $l_0 = l_r$ and the likelihood is a step function, $p(X=1 \mid l, H_1) = \Theta(l - l_r)$, with $\Theta$ the Heaviside function; under $H_2$, $l_0$ is uniform on $[a, b]$ and $p(X=1 \mid l, H_2) = \frac{l-a}{b-a}$. Setting the mixture

$$p(X=1 \mid l) = \Theta(l - l_r)\,\beta + \frac{l-a}{b-a}\,(1-\beta) = \frac{1}{2}, \qquad \beta = p(H_1) = 1 - p(H_2),$$

and solving for $l$ yields a value close to $l_r$ whenever $\beta \lesssim 1$; in the limit the decision boundary is $l = l_r$.

Encoding images: specifying $m$ points in an $n \times n$ binary image costs $1 + 2m \log_2 n$ bits (each coordinate pair takes $2 \log_2 n$ bits), versus $n^2$ bits for the raw bitmap. A line of $l$ points costs $1 + 2l \log_2 n$ bits; for $l = 12$ and $n = 128$ this gives $H = 1 + 2l \log_2 n = 169$ bits instead of $H = n^2 = 16384$.
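The arithmetic in these examples can be checked directly; the following Python lines reproduce the quoted numbers (the bit costs follow the simple position/length encoding used in the text, not an actual coder):

    from math import log2

    s = "0001111100000000"         # 16 bits with a single run of ones
    H0 = len(s)                     # independent bits at p = 0.5 -> 16 bits
    H_run = 2 * log2(len(s))        # start position + run length: 4 + 4 = 8 bits

    H1 = 2 * 10 * log2(1024)        # 10 runs in a 1024-bit string -> 200 bits
    print(1 - H1 / 1024)            # 0.8047, the ~80.4% quoted above

    H2 = 20                         # regular runs: n_i = 100*i, l_i = 4
    print(1 - H2 / 1024)            # 0.9805, the ~98% quoted above

    l, n = 12, 128                  # a line of 12 points in a 128x128 image
    print(1 + 2 * l * log2(n))      # 169.0 bits, versus n**2 = 16384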
... The strength of those algorithms appears to entail their weakness: the inability to represent a wide range of simple data. In order to make progress toward AGI, we suggest filling the "cup of complexity" from the bottom up, as exemplified in Fig. 1.1 in [3]. Instead of building algorithms for complex but narrow data, we suggest heading for simple but general ones. ...
... The feature f_1 and the residual description r_1 form the program ([0, 3, 6, 9, 12, 15, 18], [0, 1, 2, 3, 4, 5, 6], 21, 9), ...
... which inserts the numbers 0 to 6 at the indicated indices and fills the rest with 9's, and which is shorter than the initial target x = [0, 9, 9, 1, 9, 9, 2, 9, 9, 3, 9, 9, 4, 9, 9, 5, 9, 9, 6, 9, 9] by 28%. In the current version of WILLIAM, the new target is obtained ... (Table 2, "Example of an incrementally induced composition of functions (also called alleys in [5])", lists denotation, feature, residual description and compression for each step.) This concatenation is done in order to account for possible mutual information between leaves. For example, the two lists in eq. ...
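For illustration, a minimal decoder for the quoted program could look as follows, assuming the program is the 4-tuple (indices, values, length, filler); the function name and signature are hypothetical, not WILLIAM's actual interface.

    def decode(indices, values, length, filler):
        """Place `values` at `indices` and fill the rest with `filler`."""
        out = [filler] * length
        for i, v in zip(indices, values):
            out[i] = v
        return out

    x = decode([0, 3, 6, 9, 12, 15, 18], list(range(7)), 21, 9)
    assert x == [0, 9, 9, 1, 9, 9, 2, 9, 9, 3, 9, 9, 4, 9, 9, 5, 9, 9, 6, 9, 9]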
Chapter
Full-text available
We present WILLIAM – an inductive programming system based on the theory of incremental compression. It builds representations by incrementally stacking autoencoders made up of trees of general Python functions, thereby compressing data stepwise. It is able to solve a diverse set of tasks, including compressing and predicting simple sequences, recognizing geometric shapes, writing code from test cases, self-improving by solving some of its own problems, and playing tic-tac-toe when attached to AIXI, without being specifically programmed for it.
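As a rough illustration of the stacking idea, here is a toy Python sketch with two hand-written encoder/decoder layers; the real system induces such layers automatically as trees of Python functions, so everything below is schematic.

    def run_stack(layers, code):
        """Reconstruct the data by composing the decoders top-down."""
        for _, decode in reversed(layers):
            code = decode(code)
        return code

    # Two layers for the sequence 3, 6, ..., 36: de-scaling, then a run code.
    layer1 = (lambda xs: [x // 3 for x in xs], lambda cs: [3 * c for c in cs])
    layer2 = (lambda xs: (xs[0], len(xs)),
              lambda c: list(range(c[0], c[0] + c[1])))

    data = [3 * i for i in range(1, 13)]
    code = layer2[0](layer1[0](data))     # the compact code (1, 12)
    assert run_stack([layer1, layer2], code) == data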
... Another class of algorithms encompasses systems that sit somewhere on the edge between cognitive architectures and adaptive general problem-solving systems. Examples of such systems are: the Non-Axiomatic Reasoning System [96], Growing Recursive Self-Improvers [2], the recursive data compression architecture [24], OpenCog [32], Never-Ending Language Learning [14], Ikon Flux [67], MicroPsi [3], Lida [23] and many others [48]. These systems usually have a fixed structure with adaptive parts and are in some cases able to learn from real-world data. ...
... , x(t + T_f). Since the sequence of observations O_j^l might not have the Markovian property, and it might have been further compromised by the Spatial Pooler, the problem is not solvable in general. So we limit the learning and inference in one Expert to Markov chains of small order and learn the probabilities: ...
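A minimal sketch of what restricting an Expert to low-order Markov chains could look like, assuming plain maximum-likelihood counts (the helper below is hypothetical, not the cited system's code):

    from collections import Counter, defaultdict

    def learn_markov(observations, order=2):
        """Estimate P(x_t | previous `order` symbols) by counting."""
        counts = defaultdict(Counter)
        for t in range(order, len(observations)):
            context = tuple(observations[t - order:t])
            counts[context][observations[t]] += 1
        return {ctx: {sym: k / sum(c.values()) for sym, k in c.items()}
                for ctx, c in counts.items()}

    probs = learn_markov([0, 1, 0, 1, 0, 1, 0], order=1)
    print(probs[(0,)])   # {1: 1.0}: in this toy stream, a 0 is always followed by a 1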
Preprint
Full-text available
Research in Artificial Intelligence (AI) has focused mostly on two extremes: either on small improvements in narrow AI domains, or on universal theoretical frameworks which are usually uncomputable, incompatible with theories of biological intelligence, or lack practical implementations. The goal of this work is to combine the main advantages of the two: to follow a big picture view, while providing a particular theory and its implementation. In contrast with purely theoretical approaches, the resulting architecture should be usable in realistic settings, but also form the core of a framework containing all the basic mechanisms, into which it should be easier to integrate additional required functionality. In this paper, we present a novel, purposely simple, and interpretable hierarchical architecture which combines multiple different mechanisms into one system: unsupervised learning of a model of the world, learning the influence of one's own actions on the world, model-based reinforcement learning, hierarchical planning and plan execution, and symbolic/sub-symbolic integration in general. The learned model is stored in the form of hierarchical representations with the following properties: 1) they are increasingly more abstract, but can retain details when needed, and 2) they are easy to manipulate in their local and symbolic-like form, thus also allowing one to observe the learning process at each level of abstraction. On all levels of the system, the representation of the data can be interpreted in both a symbolic and a sub-symbolic manner. This enables the architecture to learn efficiently using sub-symbolic methods and to employ symbolic inference.
... For many authors there is a very strong relationship between Data Compression and Artificial Intelligence [8], [9]. Data Compression is about making good predictions, which is also the goal of Machine Learning, a field of Artificial Intelligence. ...
Article
Full-text available
In recent studies [1], [13], [12], Recurrent Neural Networks were used for generative processes, and their surprising performance can be explained by their ability to make good predictions. In addition, data compression is also based on predictions. What the problem comes down to is whether a data compressor could be used to perform as well as recurrent neural networks in natural language processing tasks. If this is possible, then the question becomes whether a compression algorithm is even more intelligent than a neural network in specific tasks related to human language. In our journey we discovered what we think is the fundamental difference between a Data Compression Algorithm and a Recurrent Neural Network.
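The compression-prediction link can be illustrated with an off-the-shelf compressor: predict the symbol whose addition compresses best. In this sketch zlib is only a weak stand-in for the strong context-mixing compressors such comparisons normally use.

    import zlib

    def predict_next(history: bytes, alphabet=b"abcdefghijklmnopqrstuvwxyz "):
        """Return the symbol that yields the smallest compressed size."""
        def cost(sym):
            return len(zlib.compress(history + bytes([sym]), 9))
        return min(alphabet, key=cost)

    text = b"the quick brown fox jumps over the lazy dog. the quick brown f"
    print(chr(predict_next(text)))   # often 'o': byte-level LZ is a coarse predictor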
... The present work is a continuation of my general approach to artificial intelligence [6]. In fact, I have already demonstrated the practical feasibility and efficiency of incremental compression in a general setting. ...
Conference Paper
The ability to induce short descriptions of, i.e. to compress, a wide class of data is essential for any system exhibiting general intelligence. In full generality, it is proven that incremental compression – extracting features of data strings and continuing to compress the residual data variance – leads to a time complexity superior to that of universal search if the strings are incrementally compressible. It is further shown that such a procedure breaks up the shortest description into a set of pairwise orthogonal features in terms of algorithmic information.
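The flavor of the complexity argument can be conveyed with a toy calculation: exhaustively searching all programs of length n is exponential in n, whereas finding k features of length about n/k one at a time is exponential only in n/k. The numbers below are purely illustrative; the paper's exact bounds differ in their details.

    n, k = 40, 8
    universal = 2 ** n                # ~1.1e12 candidate programs
    incremental = k * 2 ** (n // k)   # 8 * 2**5 = 256 candidates
    print(universal // incremental)   # speedup of 2**32, about 4.3e9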
Article
Full-text available
Research in Artificial Intelligence (AI) has focused mostly on two extremes: either on small improvements in narrow AI domains, or on universal theoretical frameworks which are often uncomputable, or lack practical implementations. In this paper we attempt to follow a big picture view while also providing a particular theory and its implementation to present a novel, purposely simple, and interpretable hierarchical architecture. This architecture incorporates the unsupervised learning of a model of the environment, learning the influence of one’s own actions, model-based reinforcement learning, hierarchical planning, and symbolic/sub-symbolic integration in general. The learned model is stored in the form of hierarchical representations which are increasingly more abstract, but can retain details when needed. We demonstrate the universality of the architecture by testing it on a series of diverse environments ranging from audio/visual compression to discrete and continuous action spaces, to learning disentangled representations.
Article
Full-text available
The topic of deep learning systems has received significant attention during the past few years, particularly as a biologically-inspired approach to processing high-dimensional signals. The latter often involve spatiotemporal information that may span large scales, rendering its representation in the general case highly challenging. Deep learning networks attempt to overcome this challenge by means of a hierarchical architecture that is comprised of common circuits with similar (and often cortically influenced) functionality. The goal of such systems is to represent sensory observations in a manner that will later facilitate robust pattern classification, mimicking a key attribute of the mammal brain. This stands in contrast with the mainstream approach of pre-processing the data so as to reduce its dimensionality - a paradigm that often results in sub-optimal performance. This paper presents a Deep SpatioTemporal Inference Network (DeSTIN) - a scalable deep learning architecture that relies on a combination of unsupervised learning and Bayesian inference. Dynamic pattern learning forms an inherent way of capturing complex spatiotemporal dependencies. Simulation results demonstrate the core capabilities of the proposed framework, particularly in the context of high-dimensional signal classification.
Article
Full-text available
Personal motivation. The dream of creating artificial devices which reach or outperform human intelligence is an old one. It is also one of the two dreams of my youth, which have never let me go (the other is finding a physical theory of everything). What makes this challenge so interesting? A solution would have enormous implications on our society, and there are reasons to believe that the AI problem can be solved in my expected lifetime. So it’s worth sticking to it for a lifetime, even if it will take 30 years or so to reap the benefits. The AI problem. The science of Artificial Intelligence (AI) may be defined as the construction of intelligent systems and their analysis. A natural definition of a system is anything which has an input and an output stream. Intelligence is more complicated. It can have many faces like creativity, solving problems, pattern recognition, classification, learning, induction, deduction, building analogies, optimization, surviving in an environment, language processing, knowledge, and many more. A formal definition incorporating every aspect of intelligence, however, seems difficult. Most, if not all known facets of intelligence can be formulated as goal
Article
Although computational models of cognition have become very popular, these models are relatively limited in their coverage of cognition - they usually only emphasize problem solving and reasoning, or treat perception and motivation as isolated modules. The first architecture to cover cognition more broadly is the Psi theory, developed by Dietrich Dörner. By integrating motivation and emotion with perception and reasoning, and including grounded neuro-symbolic representations, the Psi theory contributes significantly to an integrated understanding of the mind. It provides a conceptual framework that highlights the relationships between perception and memory, language and mental representation, reasoning and motivation, emotion and cognition, autonomy and social behavior. So far, the Psi theory's origin in psychology, its methodology, and its lack of documentation have limited its impact. This book adapts the Psi theory to cognitive science and artificial intelligence, by elucidating both its theoretical and technical frameworks, and clarifying its contribution to how we have come to understand cognition.
Article
This thesis describes EM-ONE, an architecture for commonsense thinking capable of reflective reasoning about situations involving physical, social, and mental dimensions. EM-ONE uses as its knowledge base a library of commonsense narratives, each describing the physical, social, and mental activity that occurs during an interaction between several actors. EM-ONE reasons with these narratives by applying "mental critics," procedures that debug problems that exist in the outside world or within EM-ONE itself. Mental critics draw upon commonsense narratives to suggest courses of action, methods for deliberating about the circumstances and consequences of those actions, and, when things go wrong, ways to reflect upon and debug the activity of previously invoked mental critics. Mental critics are arranged into six layers: the reactive, deliberative, reflective, self-reflective, self-conscious, and self-ideals layers. The selection of mental critics within these six layers is itself guided by a separate collection of meta-level critics that recognize what overall problem type presently confronts the system. EM-ONE was developed and tested within an artificial life domain where simulated robotic actors face concrete physical and social problems.
Article
A system built on a layered reflective cognitive architecture presents many novel and difficult software engineering problems. Some of these problems can be ameliorated by erecting the system on a substrate that implicitly supports tracing the behavior of the system to the data and through the procedures that produced that behavior. Good traces make the system accountable; they enable the analysis of success and failure, and thus enhance the ability to learn from mistakes. This constructed substrate provides for general parallelism and concurrency, while supporting the automatic collection of audit trails for all processes, including the processes that analyze audit trails. My system natively supports a Lisp-like language. In such a language, as in machine language, a program is data that can be easily manipulated by a program, making it easier for a user or an automatic procedure to read, edit, and write programs as they are debugged. Constructed within this substrate is an implementation of the bottom four layers of an Emotion Machine cognitive architecture, including built-in reactive, learned reactive, deliberative, and reflective layers. A simple natural language planning language is presented for the deliberative control of a problem domain. Also, a number of deliberative planning algorithms are implemented in this natural planning language, allowing a recursive application of reflectively planned control. This recursion is demonstrated in a fifth super-reflective layer of planned control of the reflective planning layer, implying N reflective layers of planned control. Here, I build and demonstrate an example of reflective problem solving through the use of English plans in a block building problem domain. In my demonstration an AI model can learn from experience of success or failure. The AI not only learns about physical activities but also reflectively learns about thinking activities, refining and learning the utility of built-in knowledge. Procedurally traced memory can be used to assign credit to those thinking processes that are responsible for the failure, facilitating learning how to better plan for these types of problems in the future.
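As an illustration of the audit-trail idea, here is a small Python sketch using a decorator; the thesis's substrate collects such traces natively in its Lisp-like language rather than through explicit instrumentation like this.

    import functools, time

    AUDIT_TRAIL = []   # (timestamp, procedure, args, kwargs, result) records

    def traced(fn):
        """Record every call so later analysis can assign credit or blame."""
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            result = fn(*args, **kwargs)
            AUDIT_TRAIL.append((time.time(), fn.__name__, args, kwargs, result))
            return result
        return wrapper

    @traced
    def put_on(block, support):
        return f"{block} on {support}"

    put_on("A", "B")
    print(AUDIT_TRAIL[-1][1:])   # ('put_on', ('A', 'B'), {}, 'A on B')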
Article
How can the semantic interpretation of a formal symbol system be made intrinsic to the system, rather than just parasitic on the meanings in our heads? How can the meanings of the meaningless symbol tokens, manipulated solely on the basis of their (arbitrary) shapes, be grounded in anything but other meaningless symbols? The problem is analogous to trying to learn Chinese from a Chinese/Chinese dictionary alone. A candidate solution is sketched: Symbolic representations must be grounded bottom-up in nonsymbolic representations of two kinds: (1) "iconic representations," which are analogs of the proximal sensory projections of distal objects and events, and (2) "categorical representations," which are learned and innate feature-detectors that pick out the invariant features of object and event categories from their sensory projections. Elementary symbols are the names of these object and event categories, assigned on the basis of their (nonsymbolic) categorical representations. Higher-order (3) "symbolic representations," grounded in these elementary symbols, consist of symbol strings describing category membership relations (e.g., "An X is a Y that is Z").