Learning of an Anticipatory World-Model
and the quest for General versus Reinforced Knowledge
Andreas Birk
Artificial Intelligence Laboratory, Vrije Universiteit Brussel
Abstract
We describe in this paper learning experiments with robotic systems, with a special focus on
the arbitration between general and reinforced knowledge. The robotic systems include a
combination of a camera and a robot-arm for eye-hand coordination, and autonomous mobile
robots in an ecosystem-like setting. In the case of learning eye-hand coordination we
present results of evolving an anticipatory world-model that supports controlled movements
and the gripping of building blocks. The model is stored in a graph of anticipatory rules.
Operations on the graph are conducted locally, though global planning is possible. Two
simple heuristics are used to decide if knowledge is too general or worth keeping. In the
case of the autonomous mobile robots we address the problem of encountering dangerous
situations due to the urge to explore the world. We present learning experiments where the
robots can, in the worst case, actually destroy themselves, and discuss possibilities to avoid
this. We report results where robots learn to stay operational over the long term, including
autonomous recharging.
Keywords: learning, eye-hand coordination, J.P. Piaget, G.L. Drescher, autonomous mobile robots
Final Version:
@inproceedings{BehaviorsAndWorldModel_CASYS97,
  author    = {Birk, Andreas},
  title     = {Learning of an Anticipatory World-Model and the Quest for
               General versus Reinforced Knowledge},
  booktitle = {First International Conference on Computing Anticipatory Systems},
  editor    = {Dubois, Daniel M.},
  publisher = {The American Institute of Physics (AIP)},
  doi       = {10.1063/1.56312},
  year      = {1998}
}
1 Introduction
The ultimate aim of AI is to understand intelligence in a constructive way, i.e., to build systems
exhibiting intelligent behavior. An anticipatory system, i.e., a system having a model of itself
and/or its environment (Rosen 1985), obviously has a “higher cognitive level” than a purely
reactive system, as it is capable of planning, imitation, and so on. But from a “designer viewpoint”,
i.e., as an AI researcher who actually has to build systems, the modeling is a hard and labor-
intensive task. For this reason, among several others, e.g., the ability to cope with changing
environments, it is tempting to propose learning as a panacea, i.e., to build up and update the model
while interacting with the world. When the world-model is built up, a crucial question is the
balance between general and reinforced knowledge, i.e., between information that could be useful
and information that has already proven profitable. A naive solution is to make a
complete model, meaning to store as much information about the world as possible. But this is
usually infeasible for reasons of computational complexity, especially with respect to memory and
speed. Another problem is that some information about the world can only be found in an active
way, i.e., through interaction with the world, including trial and error. But this can be
dangerous; harmful situations can be encountered that otherwise would not emerge.
There are several interesting ways to arbitrate between the amounts of general and
reinforced knowledge. First, some sensor-effector combinations may not be learnable at all, thus
reducing computational complexity. For example, Gallistel et al. (1991) report that rats rapidly
associate poisoned water with smell, but they cannot learn to associate it with visual or auditory
information. This restriction of the search-space leaves more room for storing general knowledge.
Second, the arbitration can be done with respect to the computational resources available. Starting
with a more or less tabula rasa, a learning system can gather as much information as possible in the
beginning. But as the model grows, information that is too general has to be deleted to make room
for more useful information. The world-model, so to speak, evolves towards being more and more
“greedy”, i.e., focused on usefulness. Starting from storing any experience without reinforcement or
information from a teacher, proofs of usefulness are increasingly needed to include or keep
something in the world-model as time passes. A third mechanism has to deal with dangerous
situations, because searching for general knowledge can provoke harmful events. This kind of
undesired strong reinforcement has to override the drives to generate general knowledge, as
otherwise “curiosity kills the cat”.
The rest of the paper is structured as follows. Section 2 gives a conceptual overview of Stimulus
Response Learning. This learning approach makes it possible to build an anticipatory world-
model from scratch. In section 3, concrete experimental results of learning eye-hand coordination
with Stimulus Response Learning are presented. In doing so, we discuss in more depth how
the arbitration between storing general and useful knowledge is done. Section 4 features an
example of how the “search for knowledge” can lead into dangerous situations. More precisely, we
describe experiments where mobile robots can actively destroy themselves when exploring their
“complete” range of possible behaviors. In section 5 we conclude the paper and give a brief outlook
on future work.
2 Stimulus Response Learning
Stimulus response learning combines an improved version of the schema mechanism of Gary L.
Drescher (1991) with the concepts of evolutionary algorithms. It is related to the common
classes of evolutionary algorithms, namely genetic algorithms (Holland 1975, Goldberg 1989), genetic
programming (Koza 1992, 1994), evolution strategies (Rechenberg 1973, Schwefel 1977), and
evolutionary programming (Fogel et al. 1966), only with respect to the basic inspiration: the principle
of evolution in nature. Stimulus response learning permits systems with sensors and effectors, so-
called animats (Wilson 1991), to explore and internally model an environment.
The system starts learning with an empty model of the environment (tabula rasa). It constructs the
model from scratch using primitives which are in no way adapted to the environment and effectors,
and only in a very mild way adapted to the sensors of the system. In what follows we will call the
model of the environment simply the world-model. The central data structure of the world-model is
a dynamic directed graph. Its nodes are simple rules for behavior-control, the so-called stimulus
response rules (SRRs), which are composed of predicates and actions. In its simplest form, such a
rule has the form c/a, which says that action a can be performed if condition c holds. Performing a
when c holds is called execution of the rule. In a more elaborate form, the rules have the form
c/a/r, which includes a result r. A result r describes the state of the world in terms of sensor
perception after execution of c/a.
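
To make the c/a/r form concrete, the following is a minimal sketch of such a rule as a Python data structure. All names and types are our illustrative assumptions; the paper does not prescribe an implementation.

from dataclasses import dataclass
from typing import Callable, Dict, Tuple

SensorState = Dict[str, Tuple[int, int]]  # e.g. {"red_spot": (4, 5)}

# A minimal sketch of a stimulus-response rule (SRR) in its c/a/r form.
# eq=False keeps identity-based hashing so rules can serve as graph nodes.
@dataclass(eq=False)
class SRR:
    condition: Callable[[SensorState], bool]  # predicate c on sensor input
    action: str                               # effector command a, e.g. "motor 1"
    result: SensorState                       # predicted perception r after a

    def applicable_in(self, state: SensorState) -> bool:
        # the rule may be executed whenever its condition c holds
        return self.condition(state)

    def prediction_holds(self, observed: SensorState) -> bool:
        # check whether the world ended up in the predicted state r
        return all(observed.get(k) == v for k, v in self.result.items())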
An edge between two rules stands for the possible consecutive execution of the rules. The graph is
constructed by an evolutionary algorithm using sets of SRRs and sets of edges as populations. The
fitness of rules and edges is measured by two simple, universal, and purely statistical quality
measures, namely reliability and applicability. Reliability counts how often the prediction of a
single rule is right; applicability counts how often a rule is actually used. Reliability is, in a sense,
the most general drive in the learning process. In the beginning, all information about constant
relations between sensor-input and effector-output is stored. But the mechanism of applicability
prunes the model: if some knowledge in the model has not been used for quite some time, it is
removed.
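
A sketch of the two quality measures and the pruning they drive might look as follows. The concrete thresholds and the lifetime-relative reading of applicability are our assumptions for illustration.

from dataclasses import dataclass

@dataclass
class RuleStats:
    hits: int = 0        # executions whose prediction r came true
    executions: int = 0  # times the rule was actually executed
    age: int = 0         # learning steps since the rule was created

    @property
    def reliability(self) -> float:
        # how often the prediction of the rule is right
        return self.hits / self.executions if self.executions else 0.0

    @property
    def applicability(self) -> float:
        # how often the rule is actually used, relative to its lifetime
        return self.executions / self.age if self.age else 0.0

def prune(rules, stats, min_reliability=0.9, min_applicability=0.01):
    # keep only rules that predict well and have proven useful often enough;
    # everything else is "forgotten" to keep the model small
    return [r for r in rules
            if stats[r].reliability >= min_reliability
            and stats[r].applicability >= min_applicability]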
By building up a directed graph, the system constructs a spatial, but in general not Euclidean,
model of the world. In the graph we keep track of the SRR executed last. This SRR is called the
standpoint, and it models the system's current position in the world. It is introduced for two reasons.
First, planning reduces to the search for paths from the standpoint to an SRR with a fitting result.
Second, we have the possibility to restrict the population of SRRs that participate in a learning step
to the neighborhood of the standpoint in the graph-theoretic sense. This measure greatly reduces the
complexity of learning steps and captures the following intuition: if we want to explain a new
phenomenon at some place in the world, modeled by the standpoint in the world-model, then we
expect existing knowledge about that part of the world, the neighborhood of the standpoint, to be
more useful than existing knowledge about remote parts of the world.
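
Under these definitions, both planning and the locality restriction reduce to standard graph operations. The sketch below assumes `graph` is a dictionary mapping each rule to its successors; a breadth-first search from the standpoint then yields a shortest rule sequence to any rule whose result fits the goal.

from collections import deque

def plan(graph, standpoint, goal_matches):
    # breadth-first search from the standpoint to any rule whose predicted
    # result satisfies the goal; returns the rule sequence, or None
    parents = {standpoint: None}
    frontier = deque([standpoint])
    while frontier:
        rule = frontier.popleft()
        if goal_matches(rule.result):
            path = []
            while rule is not None:      # walk back to the standpoint
                path.append(rule)
                rule = parents[rule]
            return list(reversed(path))
        for successor in graph.get(rule, ()):
            if successor not in parents:
                parents[successor] = rule
                frontier.append(successor)
    return None

def neighborhood(graph, standpoint, radius):
    # rules within `radius` edges of the standpoint; a learning step can be
    # restricted to this set to keep its complexity low
    seen, frontier = {standpoint}, {standpoint}
    for _ in range(radius):
        frontier = {s for r in frontier for s in graph.get(r, ())} - seen
        seen |= frontier
    return seen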
3 Learning of Eye-Hand Coordination
We achieved promising results with Stimulus Response Learning in experiments to control a robot-
arm with a camera. In one class of experiments, the camera picture is sectioned as a grid (figure 1),
and the most frequent color in each field of the grid is determined. With this set-up the system
learned to move its hand, a red-colored gripper, and to grasp and move building blocks. Gary L.
Drescher presents his own approach in (Drescher 1991), using a similar, but to some extent richer,
environment. His best run on a Thinking Machines CM2 (16K processors, 512 Mbyte main
memory) ended after two days with memory overflow. His system learned approximately 70% of
the desired world-model. A corresponding world-model is found completely with Stimulus Response
Learning in 25 seconds on a SUN Sparc 10. The total amount of memory used is less than 250
Kbyte.
Figure 1: schematic layout of the set-up for learning eye-hand coordination

Figure 2: part of a high-resolution camera picture showing the robot gripper and a building block. Picture B
features the raw input. Picture A shows what the system learns to recognize: the triangle of the robot gripper and
the rectangle of the building block.

Furthermore, Drescher's experiments were done in simulation only. All experiments with Stimulus
Response Learning were done in real-world set-ups as well. In doing so, the system was very
successful in dealing with noise and errors. In the most challenging class of experiments,
unprocessed real-world images were used (figure 2). The system learned an unexpected solution
for classifying the gripper and the building blocks by inventing a kind of edge detection. In a real-
world set-up, hand movements and grasping were learned successfully, in every run, in
approximately 50 hours on average. In these experiments the run-time was dominated by the speed
of the robot arm.
In this paper our main interest is in the question of how the arbitration between general and
reinforced knowledge can be done, i.e., how to decide what should be included in a world-model.
With the experimental results from the learning of eye-hand coordination we demonstrate here how
the mechanisms of reliability and applicability can be used for this purpose.
When learning eye-hand coordination, the system starts with the “discovery” of rules like

red spot at (x,y) / activate motor k / red spot at (x’,y’)
These rules contain the knowledge of where the hand (the red spot) moves when certain motors are
activated at a certain hand position. The whole model for moving the hand around consists of a
grid-like graph where the rules for moving to position (x,y) are connected to rules which tell where
the hand can move to from position (x,y). At first glance this seems simple to achieve, but in
the real world there are many problems involved. First, there are “errors” due to reflections, shadows,
and so on. What happens, for example, is that the system perceives a change in its environment in
the form of a “popped up” white spot (a reflection) which it “believes” it can influence (as the
reflection might disappear after a motor activation). As a consequence, the system stores these
“experiences” in rules similar to the one above.
Fortunately, the system can find out that these rules are “nonsense” by actively trying to validate
the knowledge in the model. This is realized by alternating so-called creation and training
phases during the learning process. In a creation phase, the actual “discoveries” are made, i.e., rules
and edges are formed depending on sensor changes due to random effector activation. In a
training phase, the system “applies its knowledge” by executing rules. In doing so, it does not need
a teacher. It can simply train itself by making a kind of random walk through the knowledge learned
so far: it consecutively executes rules which are connected by edges and whose conditions are
fulfilled. In doing so it counts how often the prediction made by a rule holds, i.e., it computes the
reliability. Only rules with high reliability (close to one) are kept in the model. Therefore, rules
describing “perceptual errors” (reflections, shadows, etc.) are eliminated.
Some “nonsense” rules have conditions corresponding to “perceptual errors” which occur very
rarely. Therefore, they are not tested during a training phase (as their conditions are not fulfilled).
But these rules are recognized as being “bad” due to their low applicability, i.e., the number of
executions since the creation of that particular rule. In addition to its role in the elimination of
“nonsense” knowledge, applicability is useful to generalize knowledge. Take for example the
learning of rules for moving the hand (red spot) in the presence of a building block (blue spot).
The following two rules are both very reliable:

red at (4,5) + blue at (2,3) / motor 1 / red at (4,5) + blue at (2,3)
red at (4,5) / motor 1 / red at (4,5)

But the second rule is much more general than the first one. The first rule can only be used to move
the hand if a building block is present at the right position, which severely restricts hand
movements. Fortunately, the second rule has a much higher applicability than the first one, as it
can be executed much more often. Therefore, it is kept in the world-model, whereas rules like the
first one disappear.
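
With the RuleStats sketch from section 2, the effect can be illustrated with invented counts; only the relative magnitudes matter here.

# Both rules are perfectly reliable, but the general one (no blue spot in
# its condition) fires far more often, so applicability keeps it alive.
specific = RuleStats(hits=12, executions=12, age=1000)   # needs blue at (2,3)
general = RuleStats(hits=480, executions=480, age=1000)  # only red at (4,5)

assert specific.reliability == general.reliability == 1.0
print(specific.applicability, general.applicability)  # 0.012 versus 0.48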
An important point is that knowledge is not generalized too much by the mechanism we are
using in our system. Note that if the presence of a “blue spot” at a certain fixed position is actually
important¹ for hand movements, then the second rule will have a low reliability, as it fails whenever
the blue spot is not present but the rule is executed.
One of the most important features of using the combination of reliability and applicability to guide
the process of learning a world-model is the “greediness” involved in these concepts, i.e., starting
from including very general knowledge, there is a tendency to keep only knowledge in the model
that gets reinforcement in the form of usage. As mentioned in the introduction, the learning of a
world-model is constrained by computational limits in processing power and memory space.
Storing general knowledge without revision easily leads to an explosion of the model in size. In
Stimulus Response Learning there is a kind of compromise: general knowledge is stored, but after
a while it is forgotten again if there was no “reason” to use it.
Let us illustrate this with an example. When eye-hand coordination is learned, the system builds a
world-model for hand movements and for the gripping and moving of building blocks. Hand
movements are always possible, whereas gripping a block is restricted to situations where the hand
happens to be above the block (given “random walks” through the knowledge). Therefore, rules for
hand movements are much more applicable than rules for gripping. Though the system learns to
grip in our experiments, it “forgets” this again after a while. But if the system is “told”² to grip and
move building blocks occasionally, then this “reinforcement” causes this part of the model to be
kept.
4 When Fast Feedback is Needed
The disadvantage of the mechanisms presented above is that they rely on repeated “experiences”
and active “exploration” of the world. This means that they involve some trial-and-error
components. Therefore, they can drive the system into harmful situations, possibly even several
times.

¹ We did experiments where a “magic lamp” indicated whether the hand could be moved in a certain direction.

² Remember that our system is capable of planning. When we confront it with a goal in the form of “bring the blue spot
to position (x,y)”, it can search for a rule with a corresponding result and do a search for the shortest path from the
standpoint to this rule in the world-model graph. If the system has already learned enough, this results in moving the
gripper to the building block, gripping, moving to position (x,y), and releasing the block.
Figure 3: one of the VUB AI-lab mobile robots. The metal antenna on top of it is used for the recharging process.
In the VUB AI-lab, we are working with autonomous mobile robots (figure 3) in a so-called
ecosystem setting. The ecosystem (figure 4) includes a charging station where the robots can refill
their batteries, and so-called competitors. The competitors are boxes housing lamps connected to the
same global energy source as the charging station. They are therefore “eating up” some of the
robots’ resources. But when a robot pushes against a box, the lights inside dim for some time and
there is more energy in the charging station.

Figure 4: the VUB ecosystem with a mobile robot (right), the charging station (middle), and a competitor (left).
From the viewpoint of AI, the learning of the crucial behaviors and eventually of world-models is an
important aim. But there are serious dangers involved when using a system that tries to learn
general anticipatory knowledge. For example, the drive motors burn out if the robot gets stuck at an
obstacle, and the batteries explode when they are overcharged (figure 5). It follows that a complete
world-model must include this knowledge. But obviously we do not want our robots to learn this
from experience.
In experiments on learning basic behaviors in the ecosystem, we therefore used a “pain-like”
criterion that gives fast feedback. This criterion is based on short-term monitoring of the internal
current of the robot. We claim that this kind of monitoring of the essential variables (Ashby 1952)
is necessary for any system that learns world-models. In case one (or more) of the variables moves
towards the border of the viability space (McFarland et al. 1981), fast and effective measures are
needed. Otherwise, “curiosity kills the cat”.
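
A minimal sketch of such a monitor, assuming a threshold on the motor current and a polling interface, might look as follows; the threshold, interval, and method names are invented for illustration.

import time

MAX_MOTOR_CURRENT = 2.5  # amperes; assumed border of the viability space

def pain_monitor(robot):
    # fast feedback loop: short-term monitoring of an essential variable
    while robot.running:
        if robot.read_motor_current() > MAX_MOTOR_CURRENT:
            robot.stop_motors()               # override any exploratory drive
            robot.mark_last_action_harmful()  # strong negative reinforcement
        time.sleep(0.01)  # poll every 10 ms, faster than the learning cycle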
5 Future Work
The obvious next step is to learn an anticipatory world-model on the mobile robots in the ecosystem,
using Stimulus Response Learning and mechanisms for handling “emergencies”. A very useful
model, and in our opinion a feasible one to learn, is a kind of map featuring rules whose tests are
capable of “identifying” locations. This can be achieved, for example, by using information based on
touch (e.g., corners), vision, the sensing of active beacons, and so on. It is of course an open question
whether the accuracy of the model can be made sufficiently high that “hardwired” reactive robots can
be outperformed. Nevertheless, the complexity of the environment, plus the fact that it is embedded
in the real world, makes this task a fascinating goal.

Figure 5: an exploded robot due to an overcharged battery-pack
Acknowledgments
Many thanks to all members of the VUB AI-lab who work jointly on maintenance and
improvements of the robots and the ecosystem, as well as on the concepts behind it. The robotic
agents group of the VUB AI-lab is partially financed by the Belgian Federal government FKFO
project on emergent functionality (NFWO contract nr. G.0014.95) and the IUAP project (nr. 20)
CONSTRUCT.
References
W. Ross Ashby (1952). Design for a Brain. Chapman and Hall, London.
Gary L. Drescher (1991). Made-Up Minds: A Constructivist Approach to Artificial Intelligence. The
MIT Press, Cambridge.
L.J. Fogel, A.J. Owens, and M.J. Walsh (1966). Artificial Intelligence through Simulated Evolution.
Wiley, New York.
C.R. Gallistel, A.L. Brown, S. Carey, R. Gelman, and F.C. Keil (1991). Lessons from Animal Learning
for the Study of Cognitive Development. In The Epigenesis of Mind, S. Carey and R. Gelman, eds.,
Lawrence Erlbaum, Hillsdale, NJ.
David Goldberg (1989). Genetic Algorithms in Search, Optimization, and Machine Learning.
Addison-Wesley, Reading.
John H. Holland (1975). Adaptation in Natural and Artificial Systems. The University of Michigan
Press, Ann Arbor.
John R. Koza (1992). Genetic Programming. The MIT Press, Cambridge.
John R. Koza (1994). Genetic Programming II. The MIT Press, Cambridge.
D. McFarland and A. Houston (1981). Quantitative Ethology: The State-Space Approach. Pitman
Books, London.
George Miller, Eugene Galanter, and Karl Pribram (1960). Plans and the Structure of Behavior. Holt,
Rinehart & Winston, New York.
Jean Piaget (1991). Gesammelte Werke. Klett-Cotta, Stuttgart.
Ingo Rechenberg (1973). Evolutionsstrategie: Optimierung technischer Systeme nach Prinzipien
der biologischen Evolution. Frommann-Holzboog, Stuttgart.
Robert Rosen (1985). Anticipatory Systems: Philosophical, Mathematical and Methodological
Foundations. Pergamon Press.
Hans-Paul Schwefel (1977). Numerische Optimierung von Computer-Modellen mittels der
Evolutionsstrategie. Birkhäuser, Basel.
S.W. Wilson (1991). The animat path to AI. In From Animals to Animats: Proc. of the First
International Conference on Simulation of Adaptive Behavior. The MIT Press/Bradford Books,
Cambridge.