Print ISSN: 1059-7123
Publications
The Iterated Classification Game (ICG) combines the Classification Game with the Iterated Learning Model (ILM) to create a more realistic model of the cultural transmission of language through generations. It includes both learning from parents and learning from peers. Further, it eliminates some of the chief criticisms of the ILM: that it does not study grounded languages, that it does not include peer learning, and that it builds in a bias for compositional languages. We show that, over the span of a few generations, a stable linguistic system emerges that can be acquired very quickly by each generation, is compositional, and helps the agents to solve the classification problem with which they are faced. The ICG also leads to a different interpretation of the language acquisition process. It suggests that the role of parents is to initialize the linguistic system of the child in such a way that subsequent interaction with peers results in rapid convergence to the correct language.

The Dyna class of reinforcement learning architectures enables the creation of integrated learning, planning, and reacting systems. A class of strategies designed to enhance the learning and planning power of Dyna systems by increasing their computational efficiency is examined. The benefit of using these strategies is demonstrated on some simple abstract learning tasks. It is proposed that the backups to be performed in Dyna be prioritized in order to improve its efficiency. It is demonstrated on simple tasks that the use of specific prioritizing schemes can lead to significant reductions in computational effort and corresponding improvements in learning performance.
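One concrete prioritizing scheme in this spirit is prioritized sweeping: queue state-action pairs by the magnitude of their Bellman error and perform the largest backups first, re-queueing predecessors whose values may have changed. A minimal sketch on a hypothetical deterministic chain task (the task and constants are my toy construction, not those used in the paper):

```python
import heapq
import random
from collections import defaultdict

random.seed(0)

# Hypothetical chain task: states 0..N, moving right from state N-1 pays 1.
N = 10
ACTIONS = ['left', 'right']
GAMMA, ALPHA, THETA, PLAN_STEPS = 0.95, 0.5, 1e-4, 20

def step(s, a):
    s2 = min(s + 1, N) if a == 'right' else max(s - 1, 0)
    return s2, (1.0 if s2 == N else 0.0)

Q = defaultdict(float)            # action values Q[(s, a)]
model = {}                        # learned deterministic model: (s, a) -> (s2, r)
predecessors = defaultdict(set)   # predecessors[s2] = pairs observed to lead to s2
pq = []                           # heap of (-priority, (s, a)): largest error first

def priority(s, a):
    s2, r = model[(s, a)]
    return abs(r + GAMMA * max(Q[(s2, b)] for b in ACTIONS) - Q[(s, a)])

s = 0
for _ in range(2000):
    a = random.choice(ACTIONS)                      # random exploration
    s2, r = step(s, a)
    model[(s, a)] = (s2, r)
    predecessors[s2].add((s, a))
    if priority(s, a) > THETA:
        heapq.heappush(pq, (-priority(s, a), (s, a)))
    for _ in range(PLAN_STEPS):                     # planning: biggest backups first
        if not pq:
            break
        _, (ps, pa) = heapq.heappop(pq)
        ps2, pr = model[(ps, pa)]
        Q[(ps, pa)] += ALPHA * (pr + GAMMA * max(Q[(ps2, b)] for b in ACTIONS)
                                - Q[(ps, pa)])
        for (qs, qa) in predecessors[ps]:           # re-queue affected predecessors
            if priority(qs, qa) > THETA:
                heapq.heappush(pq, (-priority(qs, qa), (qs, qa)))
    s = 0 if s2 == N else s2                        # episode restarts at the goal
```

Without prioritization, Dyna backs up uniformly sampled pairs; the queue concentrates computation where value estimates are most in error, which is the source of the efficiency gains the abstract describes.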

This article describes a biomimetic control architecture affording an animat both action selection and navigation functionalities. It satisfies the survival constraint of an artificial metabolism and supports several complementary navigation strategies. It builds upon an action selection model based on the basal ganglia of the vertebrate brain, using two interconnected cortico-basal ganglia-thalamo-cortical loops: a ventral one concerned with appetitive actions and a dorsal one dedicated to consummatory actions. The performance of the resulting model is evaluated in simulation. The experiments assess the prolonged survival permitted by the use of high-level navigation strategies and the complementarity of navigation strategies in dynamic environments. The correctness of the behavioral choices in situations of antagonistic or synergetic internal states is also tested. Finally, the modelling choices are discussed with regard to their biomimetic plausibility, while the experimental results are assessed in terms of animat adaptivity.

This paper develops generalizations of empowerment to continuous states. Empowerment is a recently introduced information-theoretic quantity motivated by hypotheses about the efficiency of the sensorimotor loop in biological organisms, but also from considerations stemming from curiosity-driven learning. Empowerment measures, for agent-environment systems with stochastic transitions, how much influence an agent has on its environment, but only that influence that can be sensed by the agent's sensors. It is an information-theoretic generalization of joint controllability (influence on environment) and observability (measurement by sensors) of the environment by the agent, both controllability and observability being usually defined in control theory as the dimensionality of the control/observation spaces. Earlier work has shown that empowerment has various interesting and relevant properties, e.g., it allows us to identify salient states using only the dynamics, and it can act as intrinsic reward without requiring an external reward. However, in this previous work empowerment was limited to the case of small-scale and discrete domains, and furthermore state transition probabilities were assumed to be known. The goal of this paper is to extend empowerment to the significantly more important and relevant case of continuous vector-valued state spaces and initially unknown state transition probabilities. The continuous state space is addressed by Monte-Carlo approximation; the unknown transitions are addressed by model learning and prediction, for which we apply Gaussian process regression with iterated forecasting. In a number of well-known continuous control tasks we examine the dynamics induced by empowerment and include an application to exploration and online model learning.
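In the discrete, known-transition setting that the paper generalizes, single-step empowerment is the Shannon capacity of the channel from actions to successor states, computable with the Blahut-Arimoto algorithm. A sketch of that baseline case (the paper's actual contribution, continuous states with learned Gaussian process models, is beyond this snippet):

```python
import numpy as np

def empowerment_bits(p_next, tol=1e-12, max_iter=500):
    """Single-step empowerment for a discrete system: the capacity, in bits,
    of the channel from actions to successor states, via Blahut-Arimoto.
    p_next[a, s] = p(successor s | action a); rows must sum to 1."""
    n_actions = p_next.shape[0]
    p_a = np.full(n_actions, 1.0 / n_actions)       # distribution over actions
    for _ in range(max_iter):
        p_s = p_a @ p_next                          # marginal over successors
        ratio = np.where(p_next > 0, p_next / (p_s + 1e-300), 1.0)
        kl = (p_next * np.log2(ratio)).sum(axis=1)  # D(p(.|a) || p(.))
        new_p = p_a * np.exp2(kl)                   # Blahut-Arimoto update
        new_p /= new_p.sum()
        if np.max(np.abs(new_p - p_a)) < tol:
            p_a = new_p
            break
        p_a = new_p
    # mutual information I(action; successor) under the optimized p(a)
    p_s = p_a @ p_next
    ratio = np.where(p_next > 0, p_next / (p_s + 1e-300), 1.0)
    return float((p_a[:, None] * p_next * np.log2(ratio)).sum())
```

Four actions leading deterministically to four distinct states give 2 bits of empowerment; a channel whose rows are identical gives 0, since the agent's actions then have no sensable influence.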

This work presents a novel learning method in the context of embodied artificial intelligence and self-organization, which has as few assumptions and restrictions as possible about the world and the underlying model. The learning rule is derived from the principle of maximizing the predictive information in the sensorimotor loop. It is evaluated on robot chains of varying length with individually controlled, non-communicating segments. The comparison of the results shows that maximizing the predictive information per wheel leads to more highly coordinated behavior of the physically connected robots than maximization per robot. Another focus of this paper is the analysis of the effect of the robot chain length on the overall behavior of the robots. It will be shown that longer chains with less capable controllers outperform shorter chains with more complex controllers. The reason is found and discussed in the information-geometric interpretation of the learning process.

In this study, we investigate the adaptation and robustness of a packet switching network (PSN), the fundamental architecture of the Internet. We claim that the adaptation introduced by the transmission control protocol (TCP) congestion control mechanism is interpretable as the self-organization of multiple attractors and the ability to switch from one attractor to another. To discuss this argument quantitatively, we study the adaptation of the Internet by simulating a PSN using ns-2. Our hypothesis is that the robustness and fragility of the Internet can be attributed to the inherent dynamics of the PSN feedback mechanism called the congestion window size, or cwnd. By varying the data input into the PSN system, we investigate the possible self-organization of attractors in cwnd temporal dynamics and discuss the adaptability and robustness of PSNs. The present study provides an example of Ashby's Law of Requisite Variety in action.
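The cwnd feedback at the heart of this argument is TCP's additive-increase/multiplicative-decrease (AIMD) rule, whose sawtooth limit cycle is the kind of attractor referred to above. A minimal sketch (illustrative only: real TCP congestion control also includes slow start, timeouts, and fast recovery, and `capacity` here is a hypothetical loss threshold):

```python
def aimd(capacity, rtts, cwnd=1.0, increase=1.0, decrease=0.5):
    """Additive-increase/multiplicative-decrease window dynamics: grow the
    congestion window linearly each round-trip time until a loss (window
    above capacity), then halve it."""
    trace = []
    for _ in range(rtts):
        if cwnd > capacity:
            cwnd *= decrease          # loss detected: multiplicative decrease
        else:
            cwnd += increase          # no loss: additive increase per RTT
        trace.append(cwnd)
    return trace

trace = aimd(capacity=20, rtts=200)   # settles into a sawtooth oscillation
```

After a short transient, the window oscillates between roughly half the capacity and the capacity, a self-organized cycle rather than a fixed operating point.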

The term "nexting" has been used by psychologists to refer to the propensity of people and many other animals to continually predict what will happen next in an immediate, local, and personal sense. The ability to "next" constitutes a basic kind of awareness and knowledge of one's environment. In this paper we present results with a robot that learns to next in real time, predicting thousands of features of the world's state, including all sensory inputs, at timescales from 0.1 to 8 seconds. This was achieved by treating each state feature as a reward-like target and applying temporal-difference methods to learn a corresponding value function with a discount rate corresponding to the timescale. We show that two thousand predictions, each dependent on six thousand state features, can be learned and updated online at better than 10 Hz on a laptop computer, using the standard TD(lambda) algorithm with linear function approximation. We show that this approach is efficient enough to be practical, with most of the learning complete within 30 minutes. We also show that a single tile-coded feature representation suffices to accurately predict many different signals at a significant range of timescales. Finally, we show that the accuracy of our learned predictions compares favorably with the optimal off-line solution.
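The core mechanism is ordinary TD(lambda) with linear function approximation, one weight vector per predicted signal, with the discount setting the timescale. A minimal single-prediction sketch (my illustration on a toy cycle, not the robot implementation; the 4-phase task and constants are assumptions):

```python
import numpy as np

def td_lambda_predict(signal, features, gamma, lam=0.9, alpha=0.1):
    """Online prediction of the discounted future sum of `signal`
    ("nexting"): TD(lambda) with linear function approximation and
    accumulating eligibility traces. features[t] is the state vector."""
    w = np.zeros(features.shape[1])   # weights, one per state feature
    e = np.zeros_like(w)              # eligibility trace
    preds = []
    for t in range(len(signal) - 1):
        x, x_next = features[t], features[t + 1]
        preds.append(w @ x)
        delta = signal[t + 1] + gamma * (w @ x_next) - w @ x   # TD error
        e = gamma * lam * e + x
        w += alpha * delta * e
    return w, np.array(preds)

# Toy usage: a 4-phase cycle where the signal fires at phase 0; the learned
# value of phase 3 approaches 1 / (1 - gamma^4), the true discounted sum.
states = np.arange(4000) % 4
features = np.eye(4)[states]
signal = (states == 0).astype(float)
w, preds = td_lambda_predict(signal, features, gamma=0.5)
```

Running thousands of such predictors over a shared feature vector is cheap because each update is a few vector operations, which is what makes the reported 10 Hz online rate plausible.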

Language is often regarded as the hallmark of human intelligence. So, what design features make humans' linguistic behavior so special? First, human language is largely symbolic, which means that the communicative signals have either an arbitrary relationship to their meaning or reference, or this relationship is conventionalized (Peirce, 1931-58). As a result, the relationship between signal and reference must be learnt. Second, the number of words that make up a typical language is extremely large. There have been estimates that humans by the age of 18 have acquired approximately 60,000 words (Anglin, 1993). Third, the human vocal apparatus and auditory system allow us to produce and distinguish many different sounds, which we can combine in a controlled fashion to make even more distinctive vocalizations. Fourth, human language is an open system, so we can easily invent new words and communicate about previously unseen objects or events, thus allowing for language to grow and change. Finally, language has a complex grammar, which allows us to combine words in different orders and inflect words to give utterances different meanings. In effect, this allows humans to produce an infinite number of utterances given a finite number of means (Chomsky, 1956). This special issue includes computational studies which have provided major insight into the underlying principles of adaptive systems--biological evolution, individual learning, and cultural evolution--that interact with each other to account for language evolution.

In order to study learning as an adaptive process it is necessary to take into consideration the role of evolution, which is the primary adaptive process. In addition, learning should be studied in (artificial) organisms that live in an independent physical environment in such a way that the input from the environment can be at least partially controlled by the organisms' behavior. To explore these issues we used a genetic algorithm to simulate the evolution of a population of neural networks, each controlling the behavior of a small mobile robot that must efficiently explore an environment surrounded by walls. Since the environment changes from one generation to the next, each network must learn during its life to adapt to the particular environment it happens to be born in. We found that evolved networks incorporate a genetically inherited predisposition to learn that can be described as: (a) the presence of initial conditions that tend to canalize learning in the right direc...

This paper concerns the relationship between the detectable and useful structure in an environment and the degree to which a population can adapt to that environment. We explore the hypothesis that adaptability will depend unimodally on environmental variety, and we measure this component of environmental structure using the information-theoretic uncertainty (Shannon entropy) of detectable environmental conditions. We define adaptability as the degree to which a certain kind of population successfully adapts to a certain kind of environment, and we measure adaptability by comparing a population's size to the size of a non-adapting, but otherwise comparable, population in the same environment. We study the relationship between adaptability and environmental structure in an evolving artificial population of sensorimotor agents that live, reproduce, and die in a variety of environments. We find that adaptability does not show a unimodal dependence on environmental variety alone...
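The variety measure referred to above is the Shannon entropy of the distribution of detectable environmental conditions. A minimal sketch of that computation from an observed sequence of conditions (the function name is mine, for illustration):

```python
from collections import Counter
from math import log2

def environmental_variety(conditions):
    """Shannon entropy (bits) of the empirical distribution of detectable
    environmental conditions: H = -sum_i p_i * log2(p_i)."""
    counts = Counter(conditions)
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())
```

A constant environment scores 0 bits; one with four equally likely conditions scores 2 bits, so the measure grows with the number and evenness of conditions an agent can detect.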

We present an approach to support effective learning and adaptation of behaviors for autonomous agents with reinforcement learning algorithms. These methods can identify control systems that optimize a reinforcement program, which is usually a straightforward representation of the designer's goals. Reinforcement learning algorithms are usually too slow to be applied in real time on embodied agents, although they provide a suitable way to represent the desired behavior. We have tackled three aspects of this problem: the speed of the algorithm, the learning procedure, and the control system architecture. The learning algorithm we have developed includes features to speed up learning, such as niche-based learning, and a representation of the control modules in terms of fuzzy rules that reduces the search space and improves robustness to noisy data. Our learning procedure exploits methodologies such as learning from easy missions and transfer of policy from simpler environments to more complex ones. The architecture of our control system is layered and modular, so that each module has a low complexity and can be learned in a short time. The composition of the actions proposed by the modules is either learned or predefined. Finally, we adopt an anytime learning approach to improve the quality of the control system on-line and to adapt it to dynamic environments. The experiments we present in this article concern learning to reach another moving agent in a real, dynamic environment that includes nontrivial situations such as that in which the moving target is faster than the agent and that in which the target is hidden by obstacles.

The study of adaptive behavior, including learning, usually centers on the effects of natural selection for individual survival. But because reproduction is evolutionarily more important than survival, sexual selection through mate choice (Darwin, 1871) can also have profound consequences on the evolution of creatures' bodies and behaviors. This paper shows through simulation models how one type of learning, parental imprinting, can evolve purely through sexual selection, to help in selecting appropriate mates and in tracking changes in the phenotypic makeup of the population across generations. At moderate mutation rates, when population tracking becomes an important but still soluble problem, imprinting proves more useful and evolves more quickly than at low or high mutation rates. We also show that parental imprinting can facilitate the formation of new species. In reviewing the biological literature on imprinting, we note that these results confirm some previous s...

The paper presents a neural network architecture (MAXSON) based on second-order connections that can learn a multiple goal approach/avoid task using reinforcement from the environment. It also enables an agent to learn vicariously, from the successes and failures of other agents. The paper shows that MAXSON can learn certain spatial navigation tasks much faster than traditional Q-learning, as well as learn goal-directed behavior, increasing the agent's chances of long-term survival. The paper shows that an extension of MAXSON (V-MAXSON) enables agents to learn vicariously, and this improves the overall survivability of the agent population.

This paper proposes the concept of basis behaviors as ubiquitous general building blocks for synthesizing artificial group behavior in multi-agent systems, and for analyzing group behavior in nature. We demonstrate the concept through examples implemented both in simulation and on a group of physical mobile robots. The basis behavior set we propose, consisting of avoidance, safe-wandering, following, aggregation, dispersion, and homing, is constructed from behaviors commonly observed in a variety of species in nature. The proposed behaviors are manifested spatially, but have an effect on more abstract modes of interaction, including the exchange of information and cooperation. We demonstrate how basis behaviors can be combined into higher-level group behaviors commonly observed across species. The combination mechanisms we propose are useful for synthesizing a variety of new group behaviors, as well as for analyzing naturally occurring ones. Key words: group behavior, robotics, eth...

Introduction Psychologists have long paid lip service to Darwin, conceding that the human brain did arise through the course of evolution (for whatever, often unspecified, reason). But the full power of Darwinian theory is almost never used in day-to-day psychology research. This is peculiar, given the successful, integrated nature of evolutionary biology, and the typically fragmented and incomplete visage of modern psychology: one would think that a theory that explains the origin and maintenance of complex behavioral adaptations across all species (evolution) could inform and unify the study of human behavior (psychology) just as productively as it does the study of animal behavior (ethology and comparative cognition). But the emergence of a genuinely evolutionary psychology of humans (HEP) has been a slow, painful, and quite recent occurrence, marked at last by the publication of a flagship volume, The Adapted Mind. This work is of great importance not only for researchers in all br

This paper demonstrates the scaling up of this movement imitative strategy for transmitting a vocabulary across a group of robotic agents, i.e., from a teacher agent to several learner agents. In particular, it shows that imitative behaviour is necessary for the grounding of the agents' proprioceptions and speeds up the grounding of exteroceptions. These studies stress the importance of behavioural social mechanisms in addition to general cognitive abilities of associativity for grounding communication in embodied agents. In particular, it shows that a simple movement imitation strategy is an interesting scenario for the transmission of a language, as it is an easy means of getting the agents to share a common context of perceptions, which is a prerequisite for a common understanding of the language to develop. It is thus suggested that a behaviour-oriented approach might be more appropriate than the pure cognitivist one which dominates related studies of the mechanisms involved in grounding communication.

Monitoring is an important activity for any embedded agent. To operate effectively, agents must gather information about their environment. The policy by which they do this is called a monitoring strategy. Our work has focussed on classifying different types of monitoring strategies, and understanding how strategies depend on features of the task and environment. We have discovered only a few general monitoring strategies, in particular periodic and interval reduction, and speculate that there are no more.

Situated and embodied reactive agents solve relatively complex tasks by coordinating action and perception. Reactive agents are generally believed to be incapable of coping with identical sensory states that require different responses (i.e., perceptual ambiguity). In contrast to reactive agents, non-reactive agents can cope with perceptual ambiguity by storing and integrating sensory information over time. This paper investigates how and to what extent reactive agents can cope with perceptual ambiguity. An active categorical perception model is introduced in which agents categorise objects by coordinating action and perception. The agents' neurocontrollers are evolutionarily optimised for the task. Our results show that reactive agents can cope with perceptual ambiguity despite their inability to store sensory information over time. An analysis of behaviour reveals that reactive agents use the environment as an external memory.

While computational models are playing an increasingly important role in developmental psychology, at least one lesson from robotics is still being learned: modeling epigenetic processes often requires simulating an embodied, autonomous organism. This paper first contrasts prevailing models of infant cognition with an agent-based approach. A series of infant studies by Baillargeon (1986; Baillargeon & DeVos, 1991) is described, and an eye-movement model is then used to simulate infants' visual activity in these studies. I conclude by describing three behavioral predictions of the eye-movement model, and discussing the implications of this work for infant cognition research.

This paper describes the recently developed genetic programming paradigm which genetically breeds a population of computer programs to solve problems. The paper then shows, step by step, how to apply genetic programming to a problem of behavioral ecology in biology -- specifically, two versions of the problem of finding an optimal food foraging strategy for the Caribbean Anolis lizard. A simulation of the adaptive behavior of the lizard is required to evaluate each possible adaptive control strategy considered for the lizard. The foraging strategy produced by genetic programming is close to the mathematical solution for the one version for which the solution is known and appears to be a reasonable approximation to the solution for the second version of the problem. 1 Introduction and Overview Organisms in nature often possess an optimally designed anatomical trait. For example, a bird's wing may be shaped to maximize lift or a leaf may be shaped to maximize interception of light. ...

This paper describes a novel method of achieving load balancing in telecommunications networks. A simulated network models a typical distribution of calls between nodes; nodes carrying an excess of traffic can become congested, causing calls to be lost. In addition to calls, the network also supports a population of simple mobile agents with behaviours modelled on the trail laying abilities of ants. The ants move across the network between randomly chosen pairs of nodes; as they move they deposit simulated pheromones as a function of their distance from their source node, and the congestion encountered on their journey. They select their path at each intermediate node according to the distribution of simulated pheromones at that node. Calls between nodes are routed as a function of the pheromone distributions at each intermediate node. The performance of the network is measured by the proportion of calls which are lost. The results of using the ant-based control (ABC) are compa...
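The two mechanisms described, probabilistic next-hop selection from per-node pheromone tables and deposits discounted by distance travelled and congestion met, can be sketched as follows. The function names and the exact deposit formula are my illustration, not the paper's:

```python
import random

def select_next_hop(pheromone_row, exploration=0.05):
    """Pick a neighbour with probability proportional to its pheromone
    level, with a small chance of a uniformly random exploratory hop."""
    neighbours = list(pheromone_row)
    if random.random() < exploration:
        return random.choice(neighbours)
    weights = [pheromone_row[n] for n in neighbours]
    return random.choices(neighbours, weights=weights, k=1)[0]

def deposit(pheromone_row, came_from, age, delay, amount=0.1):
    """An arriving ant reinforces the table entry for the neighbour it came
    from; ants that have travelled far (age) or met congestion (delay)
    deposit less, so good routes accumulate more pheromone. Levels are
    renormalised so each row stays a probability distribution."""
    pheromone_row[came_from] += amount / (1.0 + age + delay)
    total = sum(pheromone_row.values())
    for n in pheromone_row:
        pheromone_row[n] /= total
```

Because calls follow the same tables the ants maintain, congestion weakens the reinforcement along overloaded routes and traffic gradually shifts to less loaded paths.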

This work proposes a connectionist architecture, DRAMA, for dynamic control and learning of autonomous robots. DRAMA stands for dynamical recurrent associative memory architecture. It is a time-delay recurrent neural network, using Hebbian update rules. It allows learning of spatio-temporal regularities and time series in discrete sequences of inputs, in the face of an important amount of noise. The first part of this paper gives the mathematical description of the architecture and analyses its performance theoretically and through numerical simulations. The second part of this paper reports on the implementation of DRAMA in simulated and physical robotic experiments. Training and rehearsal of the DRAMA architecture is computationally fast and inexpensive, which makes the model particularly suitable for controlling 'computationally-challenged' robots. In the experiments, we use a basic hardware system with very limited computational capability and show that our robot can carr...

We review recent research in robotics, neuroscience, evolutionary neurobiology, and ethology with the aim of highlighting some points of agreement and convergence. Specifically, we compare Brooks' (1986) subsumption architecture for robot control with research in neuroscience demonstrating layered control systems in vertebrate brains, and with research in ethology that emphasizes the decomposition of control into multiple, intertwined behavior systems. From this perspective we then describe interesting parallels between the subsumption architecture and the natural layered behavior system that determines defense reactions in the rat. We then consider the action selection problem for robots and vertebrates and argue that, in addition to subsumption-like conflict resolution mechanisms, the vertebrate nervous system employs specialized selection mechanisms located in a group of central brain structures termed the basal ganglia. We suggest that similar specialized switching mechanisms might...

The traditional explanation of delayed maturation age, as part of an evolved life history, focuses on the increased costs of juvenile mortality due to early maturation. Prior quantitative models of these trade-offs, however, have addressed only morphological phenotypic traits, such as body size. We argue that the development of behavioral skills prior to reproductive maturity also constitutes an advantage of delayed maturation and thus should be included among the factors determining the trade-off for optimal age at maturity. Empirical support for this hypothesis from animal field studies is abundant. This paper provides further evidence drawn from simulation experiments. "Latent Energy Environments" (LEE) are a class of tightly controlled environments in which learning organisms are modeled by neural networks and evolved according to a type of genetic algorithm. An advantage of this artificial world is that it becomes possible to discount all non-behavioral costs of early maturity in ...

Human language is a unique ability. It sits apart from other systems of communication in two striking ways: it is syntactic, and it is learned. While most approaches to the evolution of language have focused on the evolution of syntax, this paper explores the computational issues that arise in shifting from a simple innate communication system to an equally simple one that is learned. Associative network learning within an observational learning paradigm is used to explore the computational difficulties involved in establishing and maintaining a simple learned communication system. Because Hebbian learning is found to be sufficient for this task, it is proposed that the basic computational demands of learning are unlikely to account for the rarity of even simple learned communication systems. Instead, it is the problem of observing that is likely to be central -- in particular the problem of determining what meaning a signal is intended to convey. 1 The learning barrier There is a lon...
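The sufficiency of Hebbian learning for this task can be sketched with a simple associative matrix between meanings and signals, strengthened whenever the two are observed together (a toy construction of mine, not the paper's exact network; it assumes the observer already knows which meaning each signal was intended to convey, which is precisely the hard "problem of observing" the abstract identifies):

```python
import numpy as np

def hebbian_lexicon(observed_pairs, n_meanings, n_signals, lr=0.1):
    """Hebbian association: each observed (meaning, signal) co-activation
    strengthens the corresponding connection weight."""
    W = np.zeros((n_meanings, n_signals))
    for meaning, sig in observed_pairs:
        W[meaning, sig] += lr
    return W

def produce(W, meaning):          # strongest signal for a meaning
    return int(np.argmax(W[meaning]))

def comprehend(W, sig):           # strongest meaning for a signal
    return int(np.argmax(W[:, sig]))

# Toy usage: mostly consistent pairs with a little observational noise.
pairs = [(0, 0)] * 10 + [(1, 1)] * 10 + [(0, 1)] * 2
W = hebbian_lexicon(pairs, n_meanings=2, n_signals=2)
```

Production and comprehension both recover the dominant mapping despite the noisy observations, which is why the computational demands of the learning step itself are unlikely to be the barrier.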

Adaptation of ecological systems to their environments is commonly viewed through some explicit fitness function defined a priori by the experimenter, or measured a posteriori by estimations based on population size and/or reproductive rates. These methods do not capture the role of environmental complexity in shaping the selective pressures that control the adaptive process. Ecological simulations enabled by computational tools such as the Latent Energy Environments (LEE) model allow us to characterize more closely the effects of environmental complexity on the evolution of adaptive behaviors. LEE is described in this paper. Its motivation arises from the need to vary complexity in controlled and predictable ways, without assuming the relationship of these changes to the adaptive behaviors they engender. This goal is achieved through a careful characterization of environments in which different forms of "energy" are well-defined. A genetic algorithm using endogenous fitness and local ...

The structure of an environment affects the behaviors of the organisms that have evolved in it. How is that structure to be described, and how can its behavioral consequences be explained and predicted? We aim to establish initial answers to these questions by simulating the evolution of very simple organisms in simple environments with different structures. Our artificial creatures, called "minimats," have neither sensors nor memory and behave solely by picking amongst the actions of moving, eating, reproducing, and sitting, according to an inherited probability distribution. Our simulated environments contain only food (and multiple minimats) and are structured in terms of their spatial and temporal food density and the patchiness with which the food appears. Changes in these environmental parameters affect the evolved behaviors of minimats in different ways, and all three parameters are of importance in describing the minimat world. One of the most useful behavioral strategies that ...
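A minimat's entire behavioral repertoire, as described, reduces to sampling from an inherited probability distribution over four actions. A hedged sketch (the Gaussian perturbation in `inherit` is my assumption about how such a distribution might be transmitted with variation; the paper does not specify this formula):

```python
import random

ACTIONS = ['move', 'eat', 'reproduce', 'sit']

def act(probabilities):
    """One behavioral step: sample an action from the minimat's inherited
    distribution. No sensors, no memory, no internal state."""
    return random.choices(ACTIONS, weights=probabilities, k=1)[0]

def inherit(probabilities, sigma=0.05):
    """Offspring receive a perturbed, renormalised copy of the parent's
    action distribution, giving selection something to act on."""
    perturbed = [max(1e-6, p + random.gauss(0.0, sigma))
                 for p in probabilities]
    total = sum(perturbed)
    return [p / total for p in perturbed]
```

Because behavior is just this fixed lottery, any adaptation to spatial or temporal food density must show up as shifts in the evolved probabilities themselves.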

Complete physically embodied agents present a powerful medium for the investigation of cognitive models for spatial navigation. This paper presents a maze solving robot, called a micromouse, that parallels many of the behaviours found in its biological counterpart, the rat. A cognitive model of the robot is presented and its limits investigated. Limits are found to exist with respect to biological plausibility and robot applicability. It is proposed that the fundamental representations used to store and process information are the limiting factor. A review of the literature of current cognitive models finds a lack of models suitable for implementation in real agents, and proposes that these models fail as they have not been developed with real agents in mind. A solution to this conundrum is proposed in a list of guidelines for the development of future spatial models. 1 Introduction This paper presents a complete physically embodied agent for a complex spatial navigation task; namely ...

This paper describes how an animat endowed with the MonaLysa control architecture can build a cognitive map that merges into a hierarchical framework not only topological links between landmarks, but also higher-level structures, control information, and metric distances and orientations. The paper also describes how the animat can use such a map to locate itself, even if it is endowed with noisy dead-reckoning capacities. MonaLysa's mapping and self-positioning capacities are illustrated by results obtained in three different environments and four noise-level conditions. These capacities appear to be gracefully degraded when the environment grows more challenging and when the noise level increases. In the discussion, the current approach is compared to others with similar objectives, and directions for future work are outlined. Keywords Hierarchical map. Topological information. Metric information. Landmarks. Self-positioning. Dead-reckoning. Robustness to noise. 1 Introdu...

A constructivist approach is applied to characterising social embeddedness. Social embeddedness is intended as a strong type of social situatedness. It is defined as the extent to which modelling the behaviour of an agent requires the inclusion of other agents as individuals rather than as an undifferentiated whole. Possible consequences of the presence of social embedding and ways to check for it are discussed. A model of co-developing agents is exhibited which demonstrates the possibility of social embedding. This is an extension of Brian Arthur's 'El Farol Bar' model, with added learning and communication. Some indicators of social embedding are analysed and some possible causes of social embedding are discussed. It is suggested that social embeddedness may be an explanation of the causal link between the social situatedness of the agent and its employing a constructivist strategy in its modelling.

This article examines the relationship between environmental and cognitive structure. One of the key tasks for any agent interacting in the real world is the management of uncertainty; because of this the cognitive structures which interact with real environments, such as would be used in navigation, must effectively cope with the uncertainty inherent in a constantly changing world. Despite this uncertainty, however, real environments usually afford structure that can be effectively exploited by organisms. The article examines environmental characteristics and structures that enable humans to survive and thrive in a wide range of real environments. The relationship between these characteristics and structures, uncertainty, and cognitive structure is explored in the context of PLAN, a proposed model of human cognitive mapping, and R-PLAN, a version of PLAN that has been instantiated on an actual mobile robot. An examination of these models helps to provide insight into environmental characteristics which impact human performance on tasks which require interaction with the world.

Achieving tasks with a multiple robot system will require a control system that is both simple and scalable as the number of robots increases. Collective behavior as demonstrated by social insects is a form of decentralized control that may prove useful in controlling multiple robots. Nature's several examples of collective behavior have motivated our approach to controlling a multiple robot system using a group behavior. Our mechanisms, used to invoke the group behavior, allow the system of robots to perform tasks without centralized control or explicit communication. We have constructed a system of five mobile robots capable of achieving simple collective tasks to verify the results obtained in simulation. The results suggest that decentralized control without explicit communication can be used in performing cooperative tasks requiring a collective behavior. 1 Introduction Can useful tasks be accomplished by a homogeneous team of mobile robots without communication using decentral...

This paper presents a general model that covers signalling with and without conflicts of interest between signallers and receivers. Krebs and Dawkins (1984) argued that a conflict of interests will lead to an evolutionary arms race between manipulative signallers and sceptical receivers, resulting in ever more costly signals; whereas common interests will lead to cheap signals or "conspiratorial whispers". Previous simulation models of the evolution of communication have usually assumed either cooperative or competitive contexts. Simple game-theoretic and evolutionary simulation models are presented; they suggest that signalling will evolve only if it is in the interests of both parties. In a model where signallers may inform receivers as to the value of a binary random variable, if signalling is favoured at all, then signallers will always use the cheapest and the second cheapest signal available. Costly signalling arms races do not get started. A more complex evolutionary s...
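The central result above can be illustrated with a minimal numeric sketch. The payoffs, signal costs, and the 50/50 state distribution below are all hypothetical choices, not values from the paper; the sketch only shows why, under common interest, a separating strategy settles on the cheapest and second-cheapest signals rather than escalating to costlier ones.

```python
# Hypothetical binary signalling game: the sender observes a binary state and
# picks a signal; with common interest, payoff is a shared benefit when the
# receiver infers the state correctly, minus the cost of the signal used.
signal_costs = [0.0, 0.1, 0.5, 1.0]  # assumed costs, sorted cheapest first

def sender_payoff(signal_idx, receiver_correct, benefit=1.0):
    return (benefit if receiver_correct else 0.0) - signal_costs[signal_idx]

# Separating strategy: state 0 -> cheapest signal, state 1 -> second cheapest.
# Both states are assumed equally likely.
payoff_separating = 0.5 * sender_payoff(0, True) + 0.5 * sender_payoff(1, True)

# Deviating to a costlier signal for state 1 conveys no extra information,
# so it only subtracts cost: costly arms races never get started.
payoff_deviation = 0.5 * sender_payoff(0, True) + 0.5 * sender_payoff(2, True)

assert payoff_separating > payoff_deviation
```

Any strategy that distinguishes the two states equally well but uses a dearer signal is strictly dominated, which is the game-theoretic core of the "conspiratorial whispers" outcome.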

It has been postulated that aspects of human language are both genetically and culturally transmitted. How might these processes interact to determine the structure of language? An agent-based model designed to study gene-culture interactions in the evolution of communication is introduced. This model shows that cultural selection resulting from learner biases can be crucial in determining the structure of communication systems transmitted through both genetic and cultural processes. Furthermore, the learning bias which leads to the emergence of optimal communication in the model resembles the learning bias brought to the task of communication by human infants. This suggests that the iterated application of such human learning biases may explain much of the structure of human language.

Research on exploratory and searching behavior of animals and robots has attracted an increasing amount of interest recently. Existing work has focused mostly on exploratory behavior guided by vision and audition. Research on smell-guided exploration has been lacking, even though animals may use the sense of smell more widely than sight or hearing to search for food and to evade danger. This article contributes to the study of smell-guided exploration. It describes a series of increasingly complex neural networks, each of which allows a simulated creature to search for food and to evade danger by using smell. Other behaviors such as obstacle negotiation and risk taking emerge naturally from the creature's interaction with the environment. Comparative studies of these networks show that there is no significant performance advantage for a creature to have more than two sensors. This result may help to explain why real animals have only one or two smell-sensing organs.
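The two-sensor result has an intuitive basis: two spatially separated smell sensors already suffice to estimate the local odor gradient. The sketch below is not the paper's neural networks; it substitutes a simple hand-written gradient-comparison (Braitenberg-style) rule with made-up parameters, purely to show that two sensors are enough to steer a simulated creature to an odor source.

```python
import math

SOURCE = (5.0, 5.0)  # assumed food/odor source location

def odor(pos):
    # Odor intensity falls off smoothly with distance from the source
    return 1.0 / (1.0 + math.dist(pos, SOURCE))

def step(x, y, heading, offset=0.3, gain=10.0, speed=0.2):
    # Two smell sensors mounted left and right of the heading direction
    left = (x + offset * math.cos(heading + 0.5),
            y + offset * math.sin(heading + 0.5))
    right = (x + offset * math.cos(heading - 0.5),
             y + offset * math.sin(heading - 0.5))
    heading += gain * (odor(left) - odor(right))  # turn toward the stronger smell
    return x + speed * math.cos(heading), y + speed * math.sin(heading), heading

x, y, h = 0.0, 0.0, 0.0
best = math.dist((x, y), SOURCE)  # closest approach to the source so far
for _ in range(400):
    x, y, h = step(x, y, h)
    best = min(best, math.dist((x, y), SOURCE))
```

A third or fourth sensor would add redundant gradient information, which is consistent with the comparative finding that more than two sensors yield no significant advantage.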

Instrumental (or operant) conditioning, a form of animal learning, is similar to reinforcement learning (Watkins, 1989) in that it allows an agent to adapt its actions to gain maximally from the environment while only being rewarded for correct performance. But animals learn much more complicated behaviors through instrumental conditioning than robots presently acquire through reinforcement learning. We describe a new computational model of the conditioning process that attempts to capture some of the aspects that are missing from simple reinforcement learning: conditioned reinforcers, shifting reinforcement contingencies, explicit action sequencing, and state space refinement. We apply our model to a task commonly used to study working memory in rats and monkeys: the DMTS (Delayed Match to Sample) task. Animals learn this task in stages. In simulation, our model also acquires the task in stages, in a similar manner. We have used the model to train an RWI B21 robot.

In this narrow sense, it is not sufficient that the system's faculties determine what constitutes its environment; more than this, the organic system must actually intervene causally in the external world. This narrow conception of constructivism allows Godfrey-Smith to make a sharp contrast between, on the one hand, an organism constructing its environment and, on the other hand, an organism changing itself rather than its environment and so merely accommodating its environment (p. 147). (Hereafter I will always use "construct" and its cognates in Godfrey-Smith's preferred narrow sense.) Classifying explanations into these categories involves some subtleties. For one thing, although the definitions might suggest that the distinction between externalist and internalist explanations is dichotomous, Godfrey-Smith is clear that the distinction defines the poles of a continuous range of positions. The explanations of most organic systems invoke both internal and external factors (p. 51), so the degree

This paper addresses the relation between memory, representation and adaptive behavior. More specifically, it demonstrates and discusses the use of synaptic plasticity, realized through neuromodulation of sensorimotor mappings, as a short-term memory mechanism in delayed response tasks. A number of experiments with extended sequential cascaded networks, i.e. higher-order recurrent neural nets, controlling simple robotic agents in six different delayed response tasks are presented. The focus of the analysis is on how short-term memory is realized in such control networks through the dynamic modulation of sensorimotor mappings (rather than through feedback of neuronal activation, as in conventional recurrent nets), and how these internal dynamics interact with environmental/behavioral dynamics. In particular, it is demonstrated in the analysis of the last experimental scenario how this type of network can make very selective use of feedback/memory, while as far as possible limiting itself to the use of reactive sensorimotor mechanisms and occasional switches between them.
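The mechanism is easier to see in a drastically reduced form. The sketch below is not an extended sequential cascaded network; it is a hypothetical stand-in in which a transient cue rescales the weights of a sensor-to-motor mapping, and that rescaling, rather than recirculating activation, carries the memory across the delay until a "go" signal arrives.

```python
import numpy as np

class ModulatedMapping:
    """Hypothetical sketch: short-term memory via modulation of a
    sensorimotor mapping rather than via feedback of activation."""
    def __init__(self):
        self.gain = np.array([1.0, 1.0])   # per-motor weight modulation (the memory)
        self.W = np.array([[1.0], [1.0]])  # base mapping from go-signal to two motors

    def observe_cue(self, cue):
        # A brief cue (0 = "respond left", 1 = "respond right") does not drive
        # the motors; it reconfigures the mapping itself.
        self.gain = np.array([1.0, 0.0]) if cue == 0 else np.array([0.0, 1.0])

    def act(self, go):
        # The modulated mapping: motor output = gain * (W @ sensor input)
        return self.gain * (self.W @ np.array([go]))

agent = ModulatedMapping()
agent.observe_cue(1)        # brief cue, then the cue disappears
for _ in range(10):
    agent.act(0.0)          # delay period: no input, no recurrent activity
out = agent.act(1.0)        # go signal: the modulated mapping recalls the cue
assert out[1] > out[0]      # agent responds "right", as cued
```

The point of the contrast with conventional recurrent nets is that nothing circulates during the delay; the memory sits entirely in the (modulated) sensorimotor mapping.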

It has been reported recently that learning has a beneficial effect on evolution even if the learning involved the acquisition of an ability which is different from the ability for which individuals were selected (Nolfi, Elman & Parisi, 1994). This effect was explained as the result of the interaction between learning and evolution. In a subsequent paper, however, the effect was explained as a form of recovery from weight perturbation caused by mutations (Harvey, 1996, 1997). In this paper I provide additional data that show how the effect, at least in the case considered in the paper, can only be explained as a result of the interaction between learning and evolution as originally hypothesized. In a recent article Jeffrey Elman, Domenico Parisi, and I reported the results of a set of simulations in which neural networks that evolve (to become fitter at one task) at the population level may also learn (a different task) at the individual level (Nolfi, Elman & Parisi, 1994). In ...

The adaptive value of emotions in nature indicates that they might also be useful in artificial creatures. Experiments were carried out to investigate this hypothesis in a simulated learning robot. For this purpose, a non-symbolic emotion model was developed that takes the form of a recurrent artificial neural network where emotions both depend on and influence the perception of the state of the world. This emotion model was integrated in a reinforcement-learning architecture with three different roles: influencing perception, providing reinforcement value, and determining when to reevaluate decisions. Experiments to test and compare this emotion-dependent architecture with a more conventional architecture were done in the context of a solitary learning robot performing a survival task. This research led to the conclusion that artificial emotions are a useful construct to have in the domain of behavior-based autonomous agents with multiple goals and faced with an unstructured environment, because they provide a unifying way to tackle different issues of control, analogous to natural systems' emotions.

Sutton's Dyna framework provides a novel and computationally appealing way to integrate learning, planning, and reacting in autonomous agents. Examined here is a class of strategies designed to enhance the learning and planning power of Dyna systems by increasing their computational efficiency. The benefit of using these strategies is demonstrated on some simple abstract learning tasks. 1 Introduction Many problems faced by an autonomous agent in an unknown environment can be cast in the form of reinforcement learning tasks. Recent work in this area has led to a clearer understanding of the relationship between algorithms found useful for such tasks and asynchronous approaches to dynamic programming (Bertsekas & Tsitsiklis, 1989), and this understanding has led in turn to both new results relevant to the theory of dynamic programming (Barto, Bradtke, & Singh, 1991; Watkins & Dayan, 1991; Williams & Baird, 1990) and the creation of new reinforcement learning algorithms, such as Qlearn...
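One strategy of this kind, prioritizing Dyna's simulated backups by the size of their value error so that changes propagate backward from where they matter most, can be sketched as follows. The task (an 8-state chain with a goal at one end) and all parameters are illustrative, not from the paper; the deterministic learned model is an assumption of the sketch.

```python
import heapq, random

N, GOAL, GAMMA, ALPHA, THETA = 8, 7, 0.95, 0.5, 1e-4
Q = {(s, a): 0.0 for s in range(N) for a in (-1, 1)}
model = {}                                   # learned deterministic model
predecessors = {s: set() for s in range(N)}  # who can reach each state

def env_step(s, a):
    s2 = max(0, min(N - 1, s + a))
    return s2, (1.0 if s2 == GOAL else 0.0)

def target_of(s2, r):                        # goal state is terminal
    return r if s2 == GOAL else r + GAMMA * max(Q[(s2, b)] for b in (-1, 1))

pq = []                                      # max-heap of (-priority, s, a)
random.seed(0)
s = 0
for _ in range(400):
    a = random.choice((-1, 1))               # random exploratory policy
    s2, r = env_step(s, a)
    model[(s, a)] = (s2, r)
    predecessors[s2].add((s, a))
    err = abs(target_of(s2, r) - Q[(s, a)])
    if err > THETA:
        heapq.heappush(pq, (-err, s, a))
    for _ in range(5):                       # planning: highest-priority backups first
        if not pq:
            break
        _, ps, pa = heapq.heappop(pq)
        ns, nr = model[(ps, pa)]
        Q[(ps, pa)] += ALPHA * (target_of(ns, nr) - Q[(ps, pa)])
        for qs, qa in predecessors[ps]:      # a change at ps makes its predecessors stale
            ms, mr = model[(qs, qa)]
            e = abs(target_of(ms, mr) - Q[(qs, qa)])
            if e > THETA:
                heapq.heappush(pq, (-e, qs, qa))
    s = 0 if s2 == GOAL else s2              # episodic reset at the goal

assert Q[(6, 1)] > Q[(6, -1)]                # value propagated back from the goal
```

Compared with replaying uniformly random state-action pairs, the priority queue concentrates computation on the frontier where value estimates are changing, which is the source of the efficiency gains the paper examines.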

This paper explores the use of a real-valued modular genetic algorithm to evolve continuous-time recurrent neural networks capable of sequential behavior and learning. We evolve networks that can generate a fixed sequence of outputs in response to an external trigger occurring at varying intervals of time. We also evolve networks that can learn to generate one of a set of possible sequences based upon reinforcement from the environment. Finally, we utilize concepts from dynamical systems theory to understand the operation of some of these evolved networks. A novel feature of our approach is that we assume neither an a priori discretization of states or time nor an a priori learning algorithm that explicitly modifies network parameters during learning. Rather, we merely expose dynamical neural networks to tasks that require sequential behavior and learning and allow the genetic algorithm to evolve network dynamics capable of accomplishing these tasks. 1. Introduction Much of the rec...
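The substrate being evolved here is the standard continuous-time recurrent neural network (CTRNN), whose dynamics are tau_i * dy_i/dt = -y_i + sum_j w_ji * sigma(y_j + theta_j) + I_i. The sketch below shows an Euler integration of these dynamics; the parameters are hand-picked for illustration rather than evolved, and the demonstration (relaxation to an input-determined fixed point with zero weights) is the simplest check of the equation, not one of the paper's tasks.

```python
import math

def sigma(x):
    # Standard logistic activation
    return 1.0 / (1.0 + math.exp(-x))

def ctrnn_step(y, W, theta, tau, I, dt=0.01):
    # One Euler step of tau_i * dy_i/dt = -y_i + sum_j W[j][i]*sigma(y_j+theta_j) + I_i
    out = [sigma(y[j] + theta[j]) for j in range(len(y))]
    return [y[i] + (dt / tau[i]) * (-y[i]
                                    + sum(W[j][i] * out[j] for j in range(len(y)))
                                    + I[i])
            for i in range(len(y))]

# With zero recurrent weights, each neuron relaxes to its external input
# at a rate set by its time constant: y_i -> I_i.
y = [0.0, 0.0]
W = [[0.0, 0.0], [0.0, 0.0]]
for _ in range(3000):  # 30 time units at dt = 0.01
    y = ctrnn_step(y, W, theta=[0.0, 0.0], tau=[1.0, 0.5], I=[0.5, -0.25])

assert abs(y[0] - 0.5) < 1e-6 and abs(y[1] + 0.25) < 1e-6
```

A genetic algorithm of the kind described would search over W, theta, and tau; sequential behavior and learning then have to emerge from these continuous dynamics alone, with no built-in discrete states or weight-update rule.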

If behavior networks, which use spreading activation to select actions, are analogous to connectionist methods of pattern recognition, then we suggest that recurrent behavior networks, which use energy minimization, are analogous to Hopfield networks. Hopfield networks memorize patterns by making them attractors. We argue that, similarly, each behavior of a recurrent behavior network should be an attractor of the network, to inhibit fruitless, repeated switching between different behaviors in response to small changes in the environment and in motivations. We demonstrate that the performance in a test domain of the Do the Right Thing recurrent behavior network is improved by redesigning it to create desirable attractors and basins of attraction. We further show that this performance increase is correlated with an increase in persistence and a decrease in undesirable behavior-switching.
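The Hopfield half of the analogy can be shown concretely: a pattern stored by Hebbian weights becomes an attractor, and asynchronous updates only ever lower the network energy, which is exactly the persistence property the redesigned behavior network is meant to inherit. The pattern and network size below are arbitrary illustrative choices.

```python
import numpy as np

# Store one pattern with Hebbian (outer-product) weights, no self-connections
pattern = np.array([1, -1, 1, -1, 1, 1, -1, -1])
W = np.outer(pattern, pattern).astype(float)
np.fill_diagonal(W, 0.0)

def energy(s):
    # Hopfield energy; asynchronous updates never increase it
    return -0.5 * s @ W @ s

state = pattern.copy()
state[0] *= -1                # corrupt one unit; the dynamics should repair it
e0 = energy(state)
for i in range(len(state)):   # one asynchronous update sweep
    state[i] = 1 if W[i] @ state >= 0 else -1

assert np.array_equal(state, pattern)  # the stored pattern is an attractor
assert energy(state) <= e0             # energy decreased along the way
```

In the behavior-network analogue, each behavior plays the role of a stored pattern: small perturbations of the environment or motivations leave the system in the same basin of attraction instead of triggering a switch.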

We explore the use of behavior-based architectures within the context of reinforcement learning and examine the effects of using different behavior-based architectures on the ability to learn the task at hand correctly and efficiently. In particular, we study the task of learning to push boxes in a simulated 2D environment originally proposed by Mahadevan and Connell [Mahadevan and Connell, 1992]. We examine issues such as effectiveness of learning, flexibility of the learning method to adapt to new environments, effect of the behavior architecture on the ability to learn, and we report results obtained on a large number of simulation runs. Keywords: Reinforcement learning, behavior-based architectures, robot learning. 1 Introduction Behavior-based architectures [Brooks, 1986] are extremely popular for robotics. In this paper we examine the use of behavior-based architectures within the context of reinforcement learning and examine the effects of using different behavior-base...
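The core idea, decomposing the learning problem so that each behavior owns its own Q-table over its own reduced state space, with a fixed arbiter deciding which behavior is in command, can be sketched on a toy task. The 1-D world, rewards, and arbiter below are hypothetical and much simpler than the box-pushing domain of Mahadevan and Connell; the sketch only shows the decomposition, with each Q-table trained only from experience gathered while its behavior is in command.

```python
import random

random.seed(1)
ALPHA, GAMMA, EPS = 0.3, 0.9, 0.2
ACTIONS = (-1, 1)
# Two behaviors with separate Q-tables over a 5-cell corridor:
q_seek = {(s, a): 0.0 for s in range(5) for a in ACTIONS}   # reach the box at cell 4
q_avoid = {(s, a): 0.0 for s in range(5) for a in ACTIONS}  # back off the wall at cell 0

def arbiter(s):
    # Fixed priority scheme: avoidance preempts seeking at the wall
    return "avoid" if s == 0 else "seek"

for _ in range(300):
    s = random.randrange(4)
    for _ in range(20):
        table = q_avoid if arbiter(s) == "avoid" else q_seek
        a = (random.choice(ACTIONS) if random.random() < EPS
             else max(ACTIONS, key=lambda b: table[(s, b)]))
        s2 = max(0, min(4, s + a))
        # Each behavior has its own reward signal
        r = 1.0 if ((table is q_seek and s2 == 4) or
                    (table is q_avoid and s2 == 1)) else 0.0
        table[(s, a)] += ALPHA * (r + GAMMA * max(table[(s2, b)] for b in ACTIONS)
                                  - table[(s, a)])
        if s2 == 4:
            break  # box reached: end of episode
        s = s2

assert q_seek[(3, 1)] > q_seek[(3, -1)]    # seeking learned to approach the box
assert q_avoid[(0, 1)] > q_avoid[(0, -1)]  # avoidance learned to back off the wall
```

Because each behavior sees a small state space and a reward tailored to its subtask, credit assignment is far easier than learning a single monolithic policy, which is one of the effects the simulation runs in the paper quantify.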

A new way of building control systems, known as behavior based robotics, has recently been proposed to overcome the difficulties of the traditional AI approach to robotics. This new approach is based upon the idea of providing the robot with a range of simple behaviors and letting the environment determine which behavior should have control at any given time. We will present a set of experiments in which neural networks with different architectures have been trained to control a mobile robot designed to keep an arena clear by picking up trash objects and releasing them outside the arena. Controller weights are selected using a form of genetic algorithm and do not change during the lifetime (i.e. no learning occurs). We will compare, in simulation and on a real robot, five different network architectures and will show that a network which allows for fine-grained modularity achieves significantly better performance. By comparing the functionality of each network module and its interactio...

Several researchers have demonstrated how complex behavior can be learned through neuro-evolution (i.e. evolving neural networks with genetic algorithms). However, complex general behavior such as evading predators or avoiding obstacles, which is not tied to specific environments, turns out to be very difficult to evolve. Often the system discovers mechanical strategies (such as moving back and forth) that help the agent cope, but are not very effective, do not appear believable and would not generalize to new environments. The problem is that a general strategy is too difficult for the evolution system to discover directly. This paper proposes an approach where such complex general behavior is learned incrementally, by starting with simpler behavior and gradually making the task more challenging and general. The task transitions are implemented through successive stages of delta-coding (i.e. evolving modifications), which allows even converged populations to adapt to the new task. The...
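The staging mechanism, delta-coding, can be illustrated on a toy problem. The numeric-optimization task below is a hypothetical stand-in for the behavioral domains in the paper, and the genetic algorithm is a bare-bones truncation-selection scheme rather than the authors' system; the sketch only shows the key move, evolving modifications (deltas) around the previous champion when the task is made harder, so that even a converged population can adapt.

```python
import random

random.seed(3)
DIM, POP, GENS = 8, 30, 60

def fitness(x, target):
    # Toy fitness: negative squared distance to a target vector
    return -sum((xi - ti) ** 2 for xi, ti in zip(x, target))

def evolve(pop, target, sigma, decode=lambda g: list(g)):
    # Minimal truncation-selection GA with Gaussian mutation
    for _ in range(GENS):
        pop.sort(key=lambda g: fitness(decode(g), target), reverse=True)
        parents = pop[:POP // 3]
        pop = parents + [[x + random.gauss(0, sigma)
                          for x in random.choice(parents)]
                         for _ in range(POP - len(parents))]
    return max(pop, key=lambda g: fitness(decode(g), target))

easy = [1.0] * DIM
hard = [1.3] * DIM  # the "more challenging" version of the task

# Stage 1: evolve directly on the easy task from random genotypes
champ = evolve([[random.uniform(-2, 2) for _ in range(DIM)] for _ in range(POP)],
               easy, sigma=0.3)

# Stage 2 (delta-coding): genotypes are offsets applied to the champion
decode = lambda d: [c + x for c, x in zip(champ, d)]
best_delta = evolve([[0.0] * DIM for _ in range(POP)], hard, sigma=0.1,
                    decode=decode)
best = decode(best_delta)

assert fitness(best, hard) > fitness(champ, hard)  # the staged run adapted
```

Because the second stage starts from zero deltas, its initial population already performs exactly as well as the old champion on the new task, so the transition never throws away previously evolved competence.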

The paper describes simulations on populations of neural networks that both evolve at the population level and learn at the individual level. Unlike other simulations, the evolutionary task (finding food in the environment) and the learning task (predicting the next position of food on the basis of the present position and the network's planned movement) are different tasks. In these conditions both learning influences evolution (without Lamarckian inheritance of learned weight changes) and evolution influences learning. Average but not peak fitness shows better evolutionary growth with learning than without learning. After the initial generations, individuals that learn to predict during life also improve their food finding ability during life. Furthermore, individuals which inherit an innate capacity to find food also inherit an innate predisposition to learn to predict the sensory consequences of their movements. They do not predict better at birth but they do learn to predict bett...