Adaptive Behavior

Published by SAGE Publications
Print ISSN: 1059-7123
The Iterated Classification Game (ICG) combines the Classification Game with the Iterated Learning Model (ILM) to create a more realistic model of the cultural transmission of language through generations. It includes both learning from parents and learning from peers. Further, it eliminates some of the chief criticisms of the ILM: that it does not study grounded languages, that it does not include peer learning, and that it builds in a bias for compositional languages. We show that, over the span of a few generations, a stable linguistic system emerges that can be acquired very quickly by each generation, is compositional, and helps the agents to solve the classification problem with which they are faced. The ICG also leads to a different interpretation of the language acquisition process. It suggests that the role of parents is to initialize the linguistic system of the child in such a way that subsequent interaction with peers results in rapid convergence to the correct language.
The Dyna class of reinforcement learning architectures enables the creation of integrated learning, planning and reacting systems. A class of strategies designed to enhance the learning and planning power of Dyna systems by increasing their computational efficiency is examined. The benefit of using these strategies is demonstrated on some simple abstract learning tasks. It is proposed that the backups to be performed in Dyna be prioritized in order to improve its efficiency. It is demonstrated with simple tasks that the use of some specific prioritizing schemes can lead to significant reductions in computational effort and corresponding improvements in learning performance.
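To make the idea of prioritized backups concrete, here is a minimal Python sketch, assuming a deterministic learned model stored as a dictionary and a priority equal to the absolute size of the pending value change. The names `prioritized_backups`, `GAMMA`, and `THETA`, and the specific priority rule, are illustrative assumptions and not necessarily the schemes examined in the article.

```python
import heapq
from collections import defaultdict

GAMMA = 0.9    # discount factor (assumed value)
THETA = 1e-4   # smallest change worth queueing (assumed value)

def backup_error(Q, model, actions, s, a):
    """Difference between the one-step backed-up value and the current Q(s, a)."""
    reward, s2 = model[(s, a)]
    # States with no modelled outgoing action are treated as terminal (value 0).
    best_next = max((Q[(s2, b)] for b in actions if (s2, b) in model), default=0.0)
    return reward + GAMMA * best_next - Q[(s, a)]

def prioritized_backups(model, actions, seeds, max_backups=100):
    """Perform planning backups in order of priority, largest pending change first."""
    Q = defaultdict(float)
    predecessors = defaultdict(set)
    for (s, a), (_, s2) in model.items():
        predecessors[s2].add((s, a))
    queue = []

    def push(s, a):
        priority = abs(backup_error(Q, model, actions, s, a))
        if priority > THETA:
            heapq.heappush(queue, (-priority, s, a))  # min-heap, so negate

    for s, a in seeds:
        push(s, a)
    backups = 0
    while queue and backups < max_backups:
        _, s, a = heapq.heappop(queue)
        Q[(s, a)] += backup_error(Q, model, actions, s, a)  # full backup
        backups += 1
        for ps, pa in predecessors[s]:  # values leading into s may now be stale
            push(ps, pa)
    return Q, backups
```

On a three-step corridor with a single reward at the end, seeding the queue with the rewarded transition propagates the value backwards in exactly three backups, whereas backups chosen at random would waste effort on state-action pairs whose values cannot yet change.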
This article describes a biomimetic control architecture affording an animat both action selection and navigation functionalities. It satisfies the survival constraint of an artificial metabolism and supports several complementary navigation strategies. It builds upon an action selection model based on the basal ganglia of the vertebrate brain, using two interconnected cortico-basal ganglia-thalamo-cortical loops: a ventral one concerned with appetitive actions and a dorsal one dedicated to consummatory actions. The performances of the resulting model are evaluated in simulation. The experiments assess the prolonged survival permitted by the use of high level navigation strategies and the complementarity of navigation strategies in dynamic environments. The correctness of the behavioral choices in situations of antagonistic or synergetic internal states is also tested. Finally, the modelling choices are discussed with regard to their biomimetic plausibility, while the experimental results are estimated in terms of animat adaptivity.
This paper develops generalizations of empowerment to continuous states. Empowerment is a recently introduced information-theoretic quantity motivated by hypotheses about the efficiency of the sensorimotor loop in biological organisms, but also from considerations stemming from curiosity-driven learning. Empowerment measures, for agent-environment systems with stochastic transitions, how much influence an agent has on its environment, but only that influence that can be sensed by the agent's sensors. It is an information-theoretic generalization of joint controllability (influence on environment) and observability (measurement by sensors) of the environment by the agent, both controllability and observability being usually defined in control theory as the dimensionality of the control/observation spaces. Earlier work has shown that empowerment has various interesting and relevant properties, e.g., it allows us to identify salient states using only the dynamics, and it can act as intrinsic reward without requiring an external reward. However, in this previous work empowerment was limited to the case of small-scale and discrete domains and furthermore state transition probabilities were assumed to be known. The goal of this paper is to extend empowerment to the significantly more important and relevant case of continuous vector-valued state spaces and initially unknown state transition probabilities. The continuous state space is addressed by Monte-Carlo approximation; the unknown transitions are addressed by model learning and prediction for which we apply Gaussian process regression with iterated forecasting. In a number of well-known continuous control tasks we examine the dynamics induced by empowerment and include an application to exploration and online model learning.
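For orientation, the small-scale discrete case with known transition probabilities, which the abstract attributes to earlier work, can be read as computing the capacity of the channel from actions to sensed next states. The Python sketch below uses the standard Blahut-Arimoto iteration for that purpose. The function name `empowerment_bits` and the table layout `p_next[a][s]` are assumptions made for illustration; the sketch does not reproduce the paper's continuous-state method, which relies on Monte-Carlo approximation and Gaussian process models.

```python
from math import exp, log, log2

def empowerment_bits(p_next, iterations=200):
    """Estimate the channel capacity, in bits, from actions to next states.

    p_next[a][s] is the assumed-known probability of sensing state s after action a.
    """
    n_actions = len(p_next)
    n_states = len(p_next[0])
    q = [1.0 / n_actions] * n_actions  # distribution over actions, initially uniform
    z = 1.0
    for _ in range(iterations):
        # Marginal distribution over next states under the current action distribution.
        m = [sum(q[a] * p_next[a][s] for a in range(n_actions)) for s in range(n_states)]
        # d[a] = exp(KL(p(.|a) || m)): how distinguishable action a's outcome is.
        d = [exp(sum(p * log(p / m[s]) for s, p in enumerate(row) if p > 0.0))
             for row in p_next]
        z = sum(qa * da for qa, da in zip(q, d))
        q = [qa * da / z for qa, da in zip(q, d)]  # shift weight to distinctive actions
    # log2(z) is a lower bound on capacity that tightens as q converges.
    return log2(z)
```

Four actions that lead to four distinct, reliably sensed states give two bits, while actions whose consequences cannot be told apart by the sensors give zero, which matches the abstract's point that only influence the agent can sense is counted.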
This work presents a novel learning method in the context of embodied artificial intelligence and self-organization, which has as few assumptions and restrictions as possible about the world and the underlying model. The learning rule is derived from the principle of maximizing the predictive information in the sensorimotor loop. It is evaluated on robot chains of varying length with individually controlled, non-communicating segments. The comparison of the results shows that maximizing the predictive information per wheel leads to more highly coordinated behavior of the physically connected robots than maximization per robot. Another focus of this paper is the analysis of the effect of the robot chain length on the overall behavior of the robots. It will be shown that longer chains with less capable controllers outperform those of shorter length and more complex controllers. The reason is found and discussed in the information-geometric interpretation of the learning process.
Temporal development in packet transmissions for all 30 flows in the network. The source node of each flow is plotted on the y-axis, and when the packets of each flow are transmitted from one of the relay nodes, the packet is marked at the corresponding time step (x-axis). Thus, each dot corresponds to the transmission event of a packet in the flow, and dropped packets are shown in red. When the duty ratio is low (top), the plot is a V-shaped curve, with the shortest paths at the middle nodes representing the minimum packet transmission time and the longest paths at the peripheral nodes representing the maximum packet transmission time. When the duty ratio is higher, the shape has two more peaks (middle). This corresponds to the origin of congestion. For higher duty ratios, a moth-eaten shape with many drop events is observed (bottom).
cwnd bifurcation diagrams drawn by taking local peaks of cwnd time series for flows 0, 5, 15, and 20 with respect to duty ratio. Changes in the cwnd dynamics from periodic to chaotic are shown.  
Examples of state transition diagrams for flows 0, 5, 15, and 20 with the duty ratio of x = 0.20 (top) and x = 0.30 (bottom). The nodes colored in blue depict the state newly created by a perturbation, while the edges colored in blue depict transitions by perturbation. The value on the edge depicts the number of transition occurrences.  
In this study, we investigate the adaptation and robustness of a packet switching network (PSN), the fundamental architecture of the Internet. We claim that the adaptation introduced by a transmission control protocol (TCP) congestion control mechanism is interpretable as the self-organization of multiple attractors and stability to switch from one attractor to another. To discuss this argument quantitatively, we study the adaptation of the Internet by simulating a PSN using ns-2. Our hypothesis is that the robustness and fragility of the Internet can be attributed to the inherent dynamics of the PSN feedback mechanism called the congestion window size, or cwnd. By varying the data input into the PSN system, we investigate the possible self-organization of attractors in cwnd temporal dynamics and discuss the adaptability and robustness of PSNs. The present study provides an example of Ashby's Law of Requisite Variety in action.
An average of 100 cycles like the three shown in Figure 3 (right panel), aligned on the onset of sensor saturation. Error bars are slightly wider than the lines themselves and overlap substantially, so they are dropped for clarity.
Nexting can be extended, for example, to consider a time-varying gamma to predict the amount of power that the robot will expend before a probabilistic pseudo-termination with a 2-second time horizon or a saturation event on Light3.
The term "nexting" has been used by psychologists to refer to the propensity of people and many other animals to continually predict what will happen next in an immediate, local, and personal sense. The ability to "next" constitutes a basic kind of awareness and knowledge of one's environment. In this paper we present results with a robot that learns to next in real time, predicting thousands of features of the world's state, including all sensory inputs, at timescales from 0.1 to 8 seconds. This was achieved by treating each state feature as a reward-like target and applying temporal-difference methods to learn a corresponding value function with a discount rate corresponding to the timescale. We show that two thousand predictions, each dependent on six thousand state features, can be learned and updated online at better than 10Hz on a laptop computer, using the standard TD(lambda) algorithm with linear function approximation. We show that this approach is efficient enough to be practical, with most of the learning complete within 30 minutes. We also show that a single tile-coded feature representation suffices to accurately predict many different signals at a significant range of timescales. Finally, we show that the accuracy of our learned predictions compares favorably with the optimal off-line solution.
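The learning rule named in the abstract, TD(lambda) with linear function approximation, can be sketched in a few lines of Python. This is a minimal illustration with accumulating eligibility traces; the function names, the step size, and the trace parameter are assumptions, and the sketch makes no claim about how the authors map a timescale in seconds to a discount rate.

```python
def td_lambda_step(w, z, x, x_next, target, gamma, lam, alpha):
    """Apply one online TD(lambda) update to the linear prediction w . x, in place."""
    v = sum(wi * xi for wi, xi in zip(w, x))
    v_next = sum(wi * xi for wi, xi in zip(w, x_next))
    delta = target + gamma * v_next - v       # temporal-difference error
    for i in range(len(w)):
        z[i] = gamma * lam * z[i] + x[i]      # accumulating eligibility trace
        w[i] += alpha * delta * z[i]
    return delta

def learn_prediction(features, signal, gamma, lam=0.9, alpha=0.1):
    """Learn weights predicting the discounted sum of future values of `signal`.

    features[t] is the feature vector at step t; signal[t + 1] plays the role
    of the reward-like target observed on the transition from t to t + 1.
    """
    n = len(features[0])
    w = [0.0] * n
    z = [0.0] * n
    for t in range(len(features) - 1):
        td_lambda_step(w, z, features[t], features[t + 1], signal[t + 1],
                       gamma, lam, alpha)
    return w
```

With a single always-active feature and a constant signal of 1.0, the learned weight approaches 1 / (1 - gamma), the discounted sum of the signal. Each prediction costs time linear in the number of features per step, which is consistent with the abstract's claim that thousands of predictions can be updated online.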
Language is often regarded as the hallmark of human intelligence. So, what design features make humans' linguistic behavior so special? First, human language is largely symbolic, which means that the communicative signals have either an arbitrary relationship to their meaning or reference, or this relationship is conventionalized (Peirce, 1931-58). As a result, the relationship between signal and reference must be learnt. Second, the number of words that make up a typical language is extremely large. There have been estimates that humans by the age of 18 have acquired approximately 60,000 words (Anglin, 1993). Third, the human vocal apparatus and auditory system allow us to produce and distinguish many different sounds, which we can combine in a controlled fashion to make even more distinctive vocalizations. Fourth, human language is an open system, so we can easily invent new words and communicate about previously unseen objects or events, thus allowing for language to grow and change. Finally, language has a complex grammar, which allows us to combine words in a different order and inflect words to give utterances different meanings. In effect, this allows humans to produce an infinite number of utterances given a finite number of means (Chomsky, 1956). This special issue includes computational studies which have provided major insight into the underlying principles of adaptive systems (biological evolution, individual learning, and cultural evolution) that interact with each other to account for language evolution.
Increase in the fitness of the single best individual of 1,000 successive generations for the population with learning during life (black curve) and for the population without learning (grey curve). Each curve represents the average of 10 replications.  
Performance of learning (thick curve) and nonlearning (thin curve) individuals at birth across 1000 generations. The performance of learning individuals has been assessed by letting these individuals live for 10 epochs without any learning. Average of 10 replications.  
Performance (fitness) of the learning individuals in the 10 successive epochs of their life. Each curve represents the average result of 200 successive generations in 10 replications of the simulation. Performance at epoch 0 was calculated by measuring the fitness of the individual at the end of an entire epoch (500 cycles) without learning.  
In order to study learning as an adaptive process it is necessary to take into consideration the role of evolution which is the primary adaptive process. In addition, learning should be studied in (artificial) organisms that live in an independent physical environment in such a way that the input from the environment can be at least partially controlled by the organisms' behavior. To explore these issues we used a genetic algorithm to simulate the evolution of a population of neural networks each controlling the behavior of a small mobile robot that must explore efficiently an environment surrounded by walls. Since the environment changes from one generation to the next each network must learn during its life to adapt to the particular environment it happens to be born in. We found that evolved networks incorporate a genetically inherited predisposition to learn that can be described as: (a) the presence of initial conditions that tend to canalize learning in the right direc...
This paper concerns the relationship between the detectable and useful structure in an environment and the degree to which a population can adapt to that environment. We explore the hypothesis that adaptability will depend unimodally on environmental variety, and we measure this component of environmental structure using the information-theoretic uncertainty (Shannon entropy) of detectable environmental conditions. We define adaptability as the degree to which a certain kind of population successfully adapts to a certain kind of environment, and we measure adaptability by comparing a population's size to the size of a non-adapting, but otherwise comparable, population in the same environment. We study the relationship between adaptability and environmental structure in an evolving artificial population of sensorimotor agents that live, reproduce, and die in a variety of environments. We find that adaptability does not show a unimodal dependence on environmental variety alone...
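The measure of environmental variety mentioned above, Shannon entropy over detectable conditions, is straightforward to compute from observed frequencies. The following Python sketch assumes the conditions are available as a sequence of hashable labels; the function name and that input format are illustrative assumptions rather than details from the paper.

```python
from collections import Counter
from math import log2

def shannon_entropy(observations):
    """Return the entropy, in bits, of the empirical distribution of the observations."""
    counts = Counter(observations)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # no observations, so no measurable uncertainty
    return -sum((c / total) * log2(c / total) for c in counts.values())
```

An environment in which four conditions occur equally often scores two bits, while one that always presents the same condition scores zero, so the measure increases with the variety an agent could in principle detect.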
Comparison among s-r and ∆ behaviors
We present an approach to support effective learning and adaptation of behaviors for autonomous agents with reinforcement learning algorithms. These methods can identify control systems that optimize a reinforcement program, which is, usually, a straightforward representation of the designer's goals. Reinforcement learning algorithms usually are too slow to be applied in real time on embodied agents, although they provide a suitable way to represent the desired behavior. We have tackled three aspects of this problem: the speed of the algorithm, the learning procedure, and the control system architecture. The learning algorithm we have developed includes features to speed up learning, such as niche-based learning, and a representation of the control modules in terms of fuzzy rules that reduces the search space and improves robustness to noisy data. Our learning procedure exploits methodologies such as learning from easy missions and transfer of policy from simpler environments to the more complex. The architecture of our control system is layered and modular, so that each module has a low complexity and can be learned in a short time. The composition of the actions proposed by the modules is either learned or predefined. Finally, we adopt an anytime learning approach to improve the quality of the control system on-line and to adapt it to dynamic environments. The experiments we present in this article concern learning to reach another moving agent in a real, dynamic environment that includes nontrivial situations such as that in which the moving target is faster than the agent and that in which the target is hidden by obstacles.
Graphic representation of the initial generation.  
Number of generations to evolve sexual imprinting versus mutation rate. Dots are data points from 20 runs; mean values are indicated by the solid line; and standard deviations around the mean are indicated by the dashed lines. The mutation rate varies from 0.0 to 0.1, in steps of 0.001 from 0.0 to 0.01, and in steps of 0.01 from 0.01 to 0.1. The x-axis is transformed logarithmically (but with mutation rate 0.0 added at the far left) to allow the small mutation rates to be seen more distinctly. The values for the random-walk case (no selection on the learning gene) are shown at the far right.
The study of adaptive behavior, including learning, usually centers on the effects of natural selection for individual survival. But because reproduction is evolutionarily more important than survival, sexual selection through mate choice (Darwin, 1871) can also have profound consequences on the evolution of creatures' bodies and behaviors. This paper shows through simulation models how one type of learning, parental imprinting, can evolve purely through sexual selection, to help in selecting appropriate mates and in tracking changes in the phenotypic makeup of the population across generations. At moderate mutation rates, when population tracking becomes an important but still soluble problem, imprinting proves more useful and evolves more quickly than at low or high mutation rates. We also show that parental imprinting can facilitate the formation of new species. In reviewing the biological literature on imprinting, we note that these results confirm some previous s...
The paper presents a neural network architecture (MAXSON) based on second-order connections that can learn a multiple goal approach/avoid task using reinforcement from the environment. It also enables an agent to learn vicariously, from the successes and failures of other agents. The paper shows that MAXSON can learn certain spatial navigation tasks much faster than traditional Q-learning, as well as learn goal directed behavior, increasing the agent's chances of long-term survival. The paper shows that an extension of MAXSON (V-MAXSON) enables agents to learn vicariously, and this improves the overall survivability of the agent population.
This paper proposes the concept of basis behaviors as ubiquitous general building blocks for synthesizing artificial group behavior in multi-agent systems, and for analyzing group behavior in nature. We demonstrate the concept through examples implemented both in simulation and on a group of physical mobile robots. The basis behavior set we propose, consisting of avoidance, safe-wandering, following, aggregation, dispersion, and homing, is constructed from behaviors commonly observed in a variety of species in nature. The proposed behaviors are manifested spatially, but have an effect on more abstract modes of interaction, including the exchange of information and cooperation. We demonstrate how basis behaviors can be combined into higher-level group behaviors commonly observed across species. The combination mechanisms we propose are useful for synthesizing a variety of new group behaviors, as well as for analyzing naturally occurring ones. Key words: group behavior, robotics, eth...
Introduction Psychologists have long paid lip service to Darwin, conceding that the human brain did arise through the course of evolution (for whatever, often unspecified, reason). But the full power of Darwinian theory is almost never used in day-to-day psychology research. This is peculiar, given the successful, integrated nature of evolutionary biology, and the typically fragmented and incomplete visage of modern psychology: one would think that a theory that explains the origin and maintenance of complex behavioral adaptations across all species (evolution) could inform and unify the study of human behavior (psychology) just as productively as it does the study of animal behavior (ethology and comparative cognition). But the emergence of a genuinely evolutionary psychology of humans (HEP) has been a slow, painful, and quite recent occurrence, marked at last by the publication of a flagship volume, The Adapted Mind. This work is of great importance not only for researchers in all br
this paper demonstrates scaling up of this movement imitative strategy for transmitting a vocabulary across a group of robotic agents, i.e. from a teacher agent to several learner agents. In particular, it shows that imitative behaviour is necessary for the grounding of the agents' proprioceptions and speeds up the grounding of exteroceptions. These studies stress the importance of behavioural social mechanisms in addition to general cognitive abilities of associativity for grounding communication in embodied agents. In particular, it shows that a simple movement imitation strategy is an interesting scenario for the transmission of a language, as it is an easy means of getting the agents to share a common context of perceptions, which is a prerequisite for a common understanding of the language to develop. It is thus suggested that a behaviour-oriented approach might be more appropriate than the purely cognitivist one that dominates related studies of the mechanisms involved in grounding communication.
A plot of cell means for a two-way ANOVA, monitoring error m by distance. The dependent measure is the fitness of the best individual.
Monitoring is an important activity for any embedded agent. To operate effectively, agents must gather information about their environment. The policy by which they do this is called a monitoring strategy. Our work has focussed on classifying different types of monitoring strategies, and understanding how strategies depend on features of the task and environment. We have discovered only a few general monitoring strategies, in particular periodic and interval reduction, and speculate that there are no more.
Situated and embodied reactive agents solve relatively complex tasks by coordinating action and perception. Reactive agents are generally believed to be incapable of coping with identical sensory states that require different responses (i.e., perceptual ambiguity). In contrast to reactive agents, non-reactive agents can cope with perceptual ambiguity by storing and integrating sensory information over time. This paper investigates how and to what extent reactive agents can cope with perceptual ambiguity. An active categorical perception model is introduced in which agents categorise objects by coordinating action and perception. The agents' neurocontrollers are evolutionarily optimised for the task. Our results show that reactive agents can cope with perceptual ambiguity despite their inability to store sensory information over time. An analysis of behaviour reveals that reactive agents use the environment as an external memory.
While computational models are playing an increasingly important role in developmental psychology, at least one lesson from robotics is still being learned: modeling epigenetic processes often requires simulating an embodied, autonomous organism. This paper first contrasts prevailing models of infant cognition with an agent-based approach. A series of infant studies by Baillargeon (1986; Baillargeon & DeVos, 1991) is described, and an eye-movement model is then used to simulate infants' visual activity in this study. I conclude by describing three behavioral predictions of the eye-movement model, and discussing the implications of this work for infant cognition research.
This paper describes the recently developed genetic programming paradigm which genetically breeds a population of computer programs to solve problems. The paper then shows, step by step, how to apply genetic programming to a problem of behavioral ecology in biology, specifically two versions of the problem of finding an optimal food foraging strategy for the Caribbean Anolis lizard. A simulation of the adaptive behavior of the lizard is required to evaluate each possible adaptive control strategy considered for the lizard. The foraging strategy produced by genetic programming is close to the mathematical solution for the one version for which the solution is known and appears to be a reasonable approximation to the solution for the second version of the problem. 1 Introduction and Overview Organisms in nature often possess an optimally designed anatomical trait. For example, a bird's wing may be shaped to maximize lift or a leaf may be shaped to maximize interception of light. ...
This paper describes a novel method of achieving load balancing in telecommunications networks. A simulated network models a typical distribution of calls between nodes; nodes carrying an excess of traffic can become congested, causing calls to be lost. In addition to calls, the network also supports a population of simple mobile agents with behaviours modelled on the trail laying abilities of ants. The ants move across the network between randomly chosen pairs of nodes; as they move they deposit simulated pheromones as a function of their distance from their source node, and the congestion encountered on their journey. They select their path at each intermediate node according to the distribution of simulated pheromones at each node. Calls between nodes are routed as a function of the pheromone distributions at each intermediate node. The performance of the network is measured by the proportion of calls which are lost. The results of using the ant-based control (ABC) are compa...
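The routing mechanism described above can be illustrated with a small Python sketch of a single node's pheromone row for one destination. The probabilistic next-hop choice and the renormalised reinforcement follow the abstract's description in outline only. The `deposit` rule, its `scale` parameter, and all function names are hypothetical stand-ins, since the abstract does not give the actual deposit function.

```python
import random

def choose_next_hop(pheromone, rng):
    """Sample a neighbour with probability proportional to its pheromone level."""
    neighbours = list(pheromone)
    weights = [pheromone[n] for n in neighbours]
    return rng.choices(neighbours, weights=weights)[0]

def deposit(age, congestion_delay, scale=0.1):
    """Hypothetical deposit rule: ants that travelled further or were delayed leave less."""
    return scale / (1.0 + age + congestion_delay)

def reinforce(pheromone, neighbour, amount):
    """Raise one entry by `amount`, then renormalise so the row sums to 1."""
    pheromone[neighbour] += amount
    total = sum(pheromone.values())
    for n in pheromone:
        pheromone[n] /= total

# Usage: an ant arriving via neighbour "B" strengthens "B" in this node's row.
row = {"B": 0.5, "C": 0.5}
reinforce(row, "B", deposit(age=0, congestion_delay=0))
next_hop = choose_next_hop(row, random.Random(0))
```

Renormalising after each deposit means that strengthening one route implicitly weakens the alternatives, so routes through congested nodes, which receive smaller deposits, lose their share over time.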
This work proposes a connectionist architecture, DRAMA, for dynamic control and learning of autonomous robots. DRAMA stands for dynamical recurrent associative memory architecture. It is a time-delay recurrent neural network, using Hebbian update rules. It allows learning of spatio-temporal regularities and time series in discrete sequences of inputs, in the face of a substantial amount of noise. The first part of this paper gives the mathematical description of the architecture and analyses its performance theoretically and through numerical simulations. The second part of this paper reports on the implementation of DRAMA in simulated and physical robotic experiments. Training and rehearsal of the DRAMA architecture is computationally fast and inexpensive, which makes the model particularly suitable for controlling `computationally-challenged' robots. In the experiments, we use a basic hardware system with very limited computational capability and show that our robot can carr...
We review recent research in robotics, neuroscience, evolutionary neurobiology, and ethology with the aim of highlighting some points of agreement and convergence. Specifically, we compare Brooks' (1986) subsumption architecture for robot control with research in neuroscience demonstrating layered control systems in vertebrate brains, and with research in ethology that emphasizes the decomposition of control into multiple, intertwined behavior systems. From this perspective we then describe interesting parallels between the subsumption architecture and the natural layered behavior system that determines defense reactions in the rat. We then consider the action selection problem for robots and vertebrates and argue that, in addition to subsumption-like conflict resolution mechanisms, the vertebrate nervous system employs specialized selection mechanisms located in a group of central brain structures termed the basal ganglia. We suggest that similar specialized switching mechanisms might...
The traditional explanation of delayed maturation age, as part of an evolved life history, focuses on the increased costs of juvenile mortality due to early maturation. Prior quantitative models of these trade-offs, however, have addressed only morphological phenotypic traits, such as body size. We argue that the development of behavioral skills prior to reproductive maturity also constitutes an advantage of delayed maturation and thus should be included among the factors determining the trade-off for optimal age at maturity. Empirical support for this hypothesis from animal field studies is abundant. This paper provides further evidence drawn from simulation experiments. "Latent Energy Environments" (LEE) are a class of tightly controlled environments in which learning organisms are modeled by neural networks and evolved according to a type of genetic algorithm. An advantage of this artificial world is that it becomes possible to discount all non-behavioral costs of early maturity in ...
Human language is a unique ability. It sits apart from other systems of communication in two striking ways: it is syntactic, and it is learned. While most approaches to the evolution of language have focused on the evolution of syntax, this paper explores the computational issues that arise in shifting from a simple innate communication system to an equally simple one that is learned. Associative network learning within an observational learning paradigm is used to explore the computational difficulties involved in establishing and maintaining a simple learned communication system. Because Hebbian learning is found to be sufficient for this task, it is proposed that the basic computational demands of learning are unlikely to account for the rarity of even simple learned communication systems. Instead, it is the problem of observing that is likely to be central -- in particular the problem of determining what meaning a signal is intended to convey. 1 The learning barrier There is a lon...
Adaptation of ecological systems to their environments is commonly viewed through some explicit fitness function defined a priori by the experimenter, or measured a posteriori by estimations based on population size and/or reproductive rates. These methods do not capture the role of environmental complexity in shaping the selective pressures that control the adaptive process. Ecological simulations enabled by computational tools such as the Latent Energy Environments (LEE) model allow us to characterize more closely the effects of environmental complexity on the evolution of adaptive behaviors. LEE is described in this paper. Its motivation arises from the need to vary complexity in controlled and predictable ways, without assuming the relationship of these changes to the adaptive behaviors they engender. This goal is achieved through a careful characterization of environments in which different forms of "energy" are well-defined. A genetic algorithm using endogenous fitness and local ...
The structure of an environment affects the behaviors of the organisms that have evolved in it. How is that structure to be described, and how can its behavioral consequences be explained and predicted? We aim to establish initial answers to these questions by simulating the evolution of very simple organisms in simple environments with different structures. Our artificial creatures, called "minimats," have neither sensors nor memory and behave solely by picking amongst the actions of moving, eating, reproducing, and sitting, according to an inherited probability distribution. Our simulated environments contain only food (and multiple minimats) and are structured in terms of their spatial and temporal food density and the patchiness with which the food appears. Changes in these environmental parameters affect the evolved behaviors of minimats in different ways, and all three parameters are of importance in describing the minimat world. One of the most useful behavioral strategies that ...
The maze used in the 1996 Australian micromouse championships.
The mechanical layout of CUQEE.
The cognitive architecture of CUQEE has three levels. The lowest level is implemented as schemas which interface in a reactive manner with the world. The cognitive level instantiates schemas to perform the spatial navigation task. The cognitive level operates virtually with a cognitive map of the maze. A motivational level generates goals, and determines the contest winning strategy.
A section of the maze from the 1996 Australian micromouse championships that features a staircase pattern.
The flow of information between different cognitive processes. The key resource is the map which stores the maze. This map is constructed by a map building process, and can be recalled by a map recall process. The recall module plans paths using the solutions to the maze calculated by the maze solver. Action is generated by an action instantiation module, with action integration during recall. Any physical action of the robot is accompanied by virtual movement in the location maintenance module, providing that the low level schemas indicate a satisfactory execution of action.
Complete physically embodied agents present a powerful medium for the investigation of cognitive models for spatial navigation. This paper presents a maze solving robot, called a micromouse, that parallels many of the behaviours found in its biological counterpart, the rat. A cognitive model of the robot is presented and its limits investigated. Limits are found to exist with respect to biological plausibility and robot applicability. It is proposed that the fundamental representations used to store and process information are the limiting factor. A review of the literature of current cognitive models finds a lack of models suitable for implementation in real agents, and proposes that these models fail as they have not been developed with real agents in mind. A solution to this conundrum is proposed in a list of guidelines for the development of future spatial models. 1 Introduction This paper presents a complete physically embodied agent for a complex spatial navigation task; namely ...
This paper describes how an animat endowed with the MonaLysa control architecture can build a cognitive map that merges into a hierarchical framework not only topological links between landmarks, but also higher-level structures, control information, and metric distances and orientations. The paper also describes how the animat can use such a map to locate itself, even if it is endowed with noisy dead-reckoning capacities. MonaLysa's mapping and self-positioning capacities are illustrated by results obtained in three different environments and four noise-level conditions. These capacities appear to be gracefully degraded when the environment grows more challenging and when the noise level increases. In the discussion, the current approach is compared to others with similar objectives, and directions for future work are outlined. Keywords Hierarchical map. Topological information. Metric information. Landmarks. Self-positioning. Dead-reckoning. Robustness to noise. 1 Introdu...
A constructivist approach is applied to characterising social embeddedness. Social embeddedness is intended as a strong type of social situatedness. It is defined as the extent to which modelling the behaviour of an agent requires the inclusion of other agents as individuals rather than as an undifferentiated whole. Possible consequences of the presence of social embedding and ways to check for it are discussed. A model of co-developing agents is exhibited which demonstrates the possibility of social embedding. This is an extension of Brian Arthur's 'El Farol Bar' model, with added learning and communication. Some indicators of social embedding are analysed and some possible causes of social embedding are discussed. It is suggested that social embeddedness may be an explanation of the causal link between the social situatedness of the agent and its employing a constructivist strategy in its modelling.
This article examines the relationship between environmental and cognitive structure. One of the key tasks for any agent interacting in the real world is the management of uncertainty; because of this the cognitive structures which interact with real environments, such as would be used in navigation, must effectively cope with the uncertainty inherent in a constantly changing world. Despite this uncertainty, however, real environments usually afford structure that can be effectively exploited by organisms. The article examines environmental characteristics and structures that enable humans to survive and thrive in a wide range of real environments. The relationship between these characteristics and structures, uncertainty, and cognitive structure is explored in the context of PLAN, a proposed model of human cognitive mapping, and R-PLAN, a version of PLAN that has been instantiated on an actual mobile robot. An examination of these models helps to provide insight into environmental characteristics which impact human performance on tasks which require interaction with the world.
The box-pushing robot model. Each robot is equipped with a goal sensor, an obstacle sensor, a robot sensor, and two actuators: a left and right wheel motor.
The box-pushing robot's behavior architecture. A behavior's actuator commands may be suppressed (the circles with an 'S') and replaced by those of a higher priority.
The initial configuration of the cooperative box-pushing task (top) and after 404 simulation steps in which the box has been moved 130 units upwards. The robots (circles) must locate and push the large box, which is too heavy to be moved by a single robot, therefore requiring the cooperative effort of at least 2 robots pushing on the same side.
The box-pushing robot's control architecture. Behavior arbitration is handled using a fixed-priority subsumption network.
The box-pushing robot's arbitration circuit using simple combinational logic.
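The captions above describe fixed-priority arbitration in which a higher-priority behavior suppresses the actuator commands of lower ones. A minimal sketch of that idea follows. The behavior names, their priority order, and the wheel command values are hypothetical; only the three sensors (goal, obstacle, robot) and the two wheel motors come from the captions.

```python
# Each behavior maps sensor readings to a (left wheel, right wheel) command,
# or returns None when it is not triggered.

def avoid(sensors):
    """Hypothetical behavior: turn away when the obstacle sensor fires."""
    return (-0.5, 0.5) if sensors["obstacle"] else None


def seek_goal(sensors):
    """Hypothetical behavior: drive forward when the goal sensor fires."""
    return (1.0, 1.0) if sensors["goal"] else None


def follow(sensors):
    """Hypothetical behavior: veer toward a detected neighboring robot."""
    return (0.8, 0.6) if sensors["robot"] else None


def wander(sensors):
    """Hypothetical default behavior: always active, lowest priority."""
    return (0.5, 0.5)


# Assumed fixed priority order, highest first.
BEHAVIORS = [avoid, seek_goal, follow, wander]


def arbitrate(sensors):
    """Return the name and command of the highest-priority active behavior.

    Every lower-priority command is suppressed, which mirrors what a
    fixed-priority combinational circuit would compute.
    """
    for behavior in BEHAVIORS:
        command = behavior(sensors)
        if command is not None:
            return behavior.__name__, command
    raise RuntimeError("no behavior produced a command")


winner, command = arbitrate({"obstacle": False, "goal": True, "robot": True})
```

Since the priorities never change at run time, the same selection could be expressed with a handful of logic gates, which is consistent with the combinational arbitration circuit mentioned in the last caption.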
Achieving tasks with a multiple robot system will require a control system that is both simple and scalable as the number of robots increases. Collective behavior as demonstrated by social insects is a form of decentralized control that may prove useful in controlling multiple robots. Nature's several examples of collective behavior have motivated our approach to controlling a multiple robot system using a group behavior. Our mechanisms, used to invoke the group behavior, allow the system of robots to perform tasks without centralized control or explicit communication. We have constructed a system of five mobile robots capable of achieving simple collective tasks to verify the results obtained in simulation. The results suggest that decentralized control without explicit communication can be used in performing cooperative tasks requiring a collective behavior. 1 Introduction Can useful tasks be accomplished by a homogeneous team of mobile robots without communication using decentral...
This paper presents a general model that covers signalling with and without conflicts of interest between signallers and receivers. Krebs and Dawkins (1984) argued that a conflict of interests will lead to an evolutionary arms race between manipulative signallers and sceptical receivers, resulting in ever more costly signals; whereas common interests will lead to cheap signals or "conspiratorial whispers". Previous simulation models of the evolution of communication have usually assumed either cooperative or competitive contexts. Simple game-theoretic and evolutionary simulation models are presented; they suggest that signalling will evolve only if it is in the interests of both parties. In a model where signallers may inform receivers as to the value of a binary random variable, if signalling is favoured at all, then signallers will always use the cheapest and the second cheapest signal available. Costly signalling arms races do not get started. A more complex evolutionary s...
It has been postulated that aspects of human language are both genetically and culturally transmitted. How might these processes interact to determine the structure of language? An agent-based model designed to study gene-culture interactions in the evolution of communication is introduced. This model shows that cultural selection resulting from learner biases can be crucial in determining the structure of communication systems transmitted through both genetic and cultural processes. Furthermore, the learning bias which leads to the emergence of optimal communication in the model resembles the learning bias brought to the task of communication by human infants. This suggests that the iterated application of such human learning biases may explain much of the structure of human language.
Research on exploratory and searching behavior of animals and robots has attracted an increasing amount of interest recently. Existing works have focused mostly on exploratory behavior guided by vision and audition. Research on smell-guided exploration has been lacking, even though animals may use the sense of smell more widely than sight or hearing to search for food and to evade danger. This article contributes to the study of smell-guided exploration. It describes a series of increasingly complex neural networks, each of which allows a simulated creature to search for food and to evade danger by using smell. Other behaviors such as obstacle negotiation and risk taking emerge naturally from the creature's interaction with the environment. Comparative studies of these networks show that there is no significant performance advantage for a creature to have more than two sensors. This result may help to explain why real animals have only one or two smell-sensing organs.
Flow of information in the learning program.
Instrumental (or operant) conditioning, a form of animal learning, is similar to reinforcement learning (Watkins, 1989) in that it allows an agent to adapt its actions to gain maximally from the environment while only being rewarded for correct performance. But animals learn much more complicated behaviors through instrumental conditioning than robots presently acquire through reinforcement learning. We describe a new computational model of the conditioning process that attempts to capture some of the aspects that are missing from simple reinforcement learning: conditioned reinforcers, shifting reinforcement contingencies, explicit action sequencing, and state space refinement. We apply our model to a task commonly used to study working memory in rats and monkeys: the DMTS (Delayed Match to Sample) task. Animals learn this task in stages. In simulation, our model also acquires the task in stages, in a similar manner. We have used the model to train an RWI B21 robot.
w sense, it is not sufficient that the system's faculties determine what constitutes its environment; more than this, the organic system must actually intervene causally in the external world. This narrow conception of constructivism allows Godfrey-Smith to make a sharp contrast between, on the one hand, an organism constructing its environment and, on the other hand, an organism changing itself rather than its environment and so merely accommodating its environment (p. 147). (Hereafter I will always use "construct" and its cognates in Godfrey-Smith's preferred narrow sense.) Classifying explanations into these categories involves some subtleties. For one thing, although the definitions might suggest that the distinction between externalist and internalist explanations is dichotomous, Godfrey-Smith is clear that the distinction defines the poles of a continuous range of positions. The explanations of most organic systems invoke both internal and external factors (p. 51), so the degree
This paper addresses the relation between memory, representation and adaptive behavior. More specifically, it demonstrates and discusses the use of synaptic plasticity, realized through neuromodulation of sensorimotor mappings, as a short-term memory mechanism in delayed response tasks. A number of experiments with extended sequential cascaded networks, i.e. higher-order recurrent neural nets, controlling simple robotic agents in six different delayed response tasks are presented. The focus of the analysis is on how short-term memory is realized in such control networks through the dynamic modulation of sensorimotor mappings (rather than through feedback of neuronal activation, as in conventional recurrent nets), and how these internal dynamics interact with environmental/behavioral dynamics. In particular, it is demonstrated in the analysis of the last experimental scenario how this type of network can make very selective use of feedback/memory, while as far as possible limiting itself to the use of reactive sensorimotor mechanisms and occasional switches between them.
It has been reported recently that learning has a beneficial effect on evolution even if the learning involved the acquisition of an ability which is different from the ability for which individuals were selected (Nolfi, Elman & Parisi, 1994). This effect was explained as the result of the interaction between learning and evolution. In a successive paper, however, the effect was explained as a form of recovery from weight perturbation caused by mutations (Harvey, 1996, 1997). In this paper I provide additional data that show how the effect, at least in the case considered in the paper, can only be explained as a result of the interaction between learning and evolution as originally hypothesized. In a recent article Jeffrey Elman, Domenico Parisi, and I reported the results of a set of simulations in which neural networks that evolve (to become fitter at one task) at the population level may also learn (a different task) at the individual level (Nolfi, Elman & Parisi, 1994). In ...
The adaptive value of emotions in nature indicates that they might also be useful in artificial creatures. Experiments were carried out to investigate this hypothesis in a simulated learning robot. For this purpose, a non-symbolic emotion model was developed that takes the form of a recurrent artificial neural network where emotions both depend on and influence the perception of the state of the world. This emotion model was integrated in a reinforcement-learning architecture with three different roles: influencing perception, providing reinforcement value, and determining when to reevaluate decisions. Experiments to test and compare this emotion-dependent architecture with a more conventional architecture were done in the context of a solitary learning robot performing a survival task. This research led to the conclusion that artificial emotions are a useful construct to have in the domain of behavior-based autonomous agents with multiple goals and faced with an unstructured environment, because they provide a unifying way to tackle different issues of control, analogous to natural systems' emotions.
A maze navigation task.
Overview of the queue-Dyna architecture. The primitive reinforcement learner represents an algorithm like Q-learning. Not shown is the data path allowing the world model to learn to mimic the world.
Sutton's Dyna framework provides a novel and computationally appealing way to integrate learning, planning, and reacting in autonomous agents. Examined here is a class of strategies designed to enhance the learning and planning power of Dyna systems by increasing their computational efficiency. The benefit of using these strategies is demonstrated on some simple abstract learning tasks. 1 Introduction Many problems faced by an autonomous agent in an unknown environment can be cast in the form of reinforcement learning tasks. Recent work in this area has led to a clearer understanding of the relationship between algorithms found useful for such tasks and asynchronous approaches to dynamic programming (Bertsekas & Tsitsiklis, 1989), and this understanding has led in turn to both new results relevant to the theory of dynamic programming (Barto, Bradtke, & Singh, 1991; Watkins & Dayan, 1991; Williams & Baird, 1990) and the creation of new reinforcement learning algorithms, such as Qlearn...
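The idea of prioritizing backups in a Dyna-style planner can be sketched as follows. This is an illustrative toy, not the paper's algorithm: the chain world, the deterministic model, the priority measure (the size of the change a backup would make), and the propagation to predecessor state-action pairs are all assumptions chosen to keep the example small.

```python
import heapq
from collections import defaultdict

GAMMA = 0.9
ACTIONS = (-1, +1)      # step left or right along a chain
N_STATES = 6            # state N_STATES - 1 is the terminal goal


def step(state, action):
    """Toy deterministic world: reward 1.0 on entering the goal state."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    return nxt, (1.0 if nxt == N_STATES - 1 else 0.0)


def value(Q, state):
    """Greedy state value; the terminal state is worth 0."""
    if state == N_STATES - 1:
        return 0.0
    return max(Q[(state, a)] for a in ACTIONS)


def prioritized_planning(model, predecessors, Q, n_backups, theta=1e-6):
    """Perform backups in order of how much each would change Q."""
    queue = []

    def push(s, a):
        nxt, r = model[(s, a)]
        priority = abs(r + GAMMA * value(Q, nxt) - Q[(s, a)])
        if priority > theta:
            heapq.heappush(queue, (-priority, s, a))  # max-heap via negation

    for s, a in model:
        push(s, a)

    done = 0
    while queue and done < n_backups:
        _, s, a = heapq.heappop(queue)
        nxt, r = model[(s, a)]
        Q[(s, a)] = r + GAMMA * value(Q, nxt)  # full backup (deterministic model)
        done += 1
        for ps, pa in predecessors[s]:         # their targets may have changed
            push(ps, pa)
    return done


# Build the world model as if every non-terminal pair had been experienced once.
model, predecessors = {}, defaultdict(set)
for s in range(N_STATES - 1):
    for a in ACTIONS:
        model[(s, a)] = step(s, a)
        predecessors[model[(s, a)][0]].add((s, a))

Q = defaultdict(float)
backups = prioritized_planning(model, predecessors, Q, n_backups=100)
```

In this toy the queue empties long before the budget of 100 backups is spent, because work is only scheduled where a backup would actually change a value. An unprioritized planner that samples state-action pairs at random would spend most of its early backups on pairs whose values cannot yet change.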
This paper explores the use of a real-valued modular genetic algorithm to evolve continuous-time recurrent neural networks capable of sequential behavior and learning. We evolve networks that can generate a fixed sequence of outputs in response to an external trigger occurring at varying intervals of time. We also evolve networks that can learn to generate one of a set of possible sequences based upon reinforcement from the environment. Finally, we utilize concepts from dynamical systems theory to understand the operation of some of these evolved networks. A novel feature of our approach is that we assume neither an a priori discretization of states or time nor an a priori learning algorithm that explicitly modifies network parameters during learning. Rather, we merely expose dynamical neural networks to tasks that require sequential behavior and learning and allow the genetic algorithm to evolve network dynamics capable of accomplishing these tasks. 1. Introduction Much of the rec...
If behavior networks, which use spreading activation to select actions, are analogous to connectionist methods of pattern recognition, then we suggest that recurrent behavior networks, which use energy minimization, are analogous to Hopfield networks. Hopfield networks memorize patterns by making them attractors. We argue that, similarly, each behavior of a recurrent behavior network should be an attractor of the network, to inhibit fruitless, repeated switching between different behaviors in response to small changes in the environment and in motivations. We demonstrate that the performance in a test domain of the Do the Right Thing recurrent behavior network is improved by redesigning it to create desirable attractors and basins of attraction. We further show that this performance increase is correlated with an increase in persistence and a decrease in undesirable behavior-switching.
We explore the use of behavior-based architectures within the context of reinforcement learning and examine the effects of using different behavior-based architectures on the ability to learn the task at hand correctly and efficiently. In particular, we study the task of learning to push boxes in a simulated 2D environment originally proposed by Mahadevan and Connell [Mahadevan and Connell, 1992]. We examine issues such as effectiveness of learning, flexibility of the learning method to adapt to new environments, effect of the behavior architecture on the ability to learn, and we report results obtained on a large number of simulation runs. Keywords: Reinforcement learning, behavior-based architectures, robot learning. 1 Introduction Behavior-based architectures [Brooks, 1986] are extremely popular for robotics. In this paper we examine the use of behavior-based architectures within the context of reinforcement learning and examine the effects of using different behavior-base...
The top part of the figure represents the behavior of a typical evolved individual in its environment. Lines represent walls, empty and full circles represent the original and the final position of the target objects respectively, the trace on the terrain represents the trajectory of the robot. The bottom part of the figure represents the type of object currently perceived, the state of the motor, and the state of the sensors throughout time for 500 cycles respectively. The 'W/T' graph shows whether the robot is currently perceiving a wall (top line), a target (bottom line), or nothing (no line). The 'LM', 'RM,' 'PU', and 'RL' graphs show the state of the motors (left and right motors, pick-up and release procedures, respectively). For each motor, in the top part of the graph the activation state is indicated (after the arbitration between component modules has been performed by the selector neurons) and in the bottom part which of the two competing neural modules has control is indicated (the thickness of the line at the bottom indicates whether the first or the second module has control: thin line segment = module 1; thick line segment = module 2). The graphs 'I0' to 'I5' show the state of the 6 infrared sensors. Finally, the 'LB' graph shows the state of the light-barrier sensor. The activation state of sensor and motor neurons is represented by the height with respect to the baseline (in the case of motor neurons the activation state of the output neurons of the module that currently have the control is shown).
A new way of building control systems, known as behavior-based robotics, has recently been proposed to overcome the difficulties of the traditional AI approach to robotics. This new approach is based upon the idea of providing the robot with a range of simple behaviors and letting the environment determine which behavior should have control at any given time. We will present a set of experiments in which neural networks with different architectures have been trained to control a mobile robot designed to keep an arena clear by picking up trash objects and releasing them outside the arena. Controller weights are selected using a form of genetic algorithm and do not change during the lifetime (i.e. no learning occurs). We will compare, in simulation and on a real robot, five different network architectures and will show that a network which allows for fine-grained modularity achieves significantly better performance. By comparing the functionality of each network module and its interactio...
Symbiotic, Adaptive Neuro-Evolution (SANE). The population consists of hidden neurons,
The Enforced Sub-Populations Method (ESP). The population of neurons is segregated into sub-populations shown here as clusters of circles. The network is formed by randomly selecting one neuron from each subpopulation.
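The network-formation step described in this caption, one neuron drawn at random from each subpopulation, can be sketched as below. The representation of a neuron as a flat list of weights, the population sizes, and the function names are assumptions made for illustration.

```python
import random


def make_subpopulations(n_hidden, pop_size, n_weights, rng):
    """One subpopulation per hidden-unit position; each neuron is a weight list."""
    return [
        [[rng.uniform(-1.0, 1.0) for _ in range(n_weights)] for _ in range(pop_size)]
        for _ in range(n_hidden)
    ]


def form_network(subpops, rng):
    """Randomly select one neuron from each subpopulation.

    Returns the chosen indices (needed later to credit fitness back to the
    participating neurons) together with the assembled hidden layer.
    """
    picks = [rng.randrange(len(subpop)) for subpop in subpops]
    network = [subpop[i] for subpop, i in zip(subpops, picks)]
    return picks, network


rng = random.Random(42)
subpops = make_subpopulations(n_hidden=4, pop_size=10, n_weights=6, rng=rng)
picks, network = form_network(subpops, rng)
```

Keeping the index of each selected neuron is one simple way to pass a network's evaluation score back to the neurons that took part in it.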
The Cauchy distribution for a parameter value of 0.5. Most of the Δ-values represent small modifications to the best solution, but large values are also possible. Δ-chromosomes are added to the best solution to form the new best solution for the next iteration of the Delta phase. Delta-Coding was developed by Whitley et al. (1991) to enhance the fine local tuning capability of Genetic Algorithms for numerical optimization. However, its potential for adaptive behavior lies in the facilitation of task transfer. Delta-Coding provides a mechanism for transitioning the evolution into each progressively more demanding task:
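To make the role of the Cauchy distribution concrete, here is a simplified sketch of one Delta phase. It is a deliberate reduction: a full delta-coding run would evolve the population of difference vectors with a genetic algorithm, whereas this sketch only samples fresh ones each iteration. The quadratic fitness function, the target vector, and all names are hypothetical, and treating 0.5 as the scale of a zero-centered Cauchy is an assumption.

```python
import math
import random


def cauchy_sample(scale, rng):
    """Inverse-CDF sample from a zero-centered Cauchy distribution."""
    return scale * math.tan(math.pi * (rng.random() - 0.5))


def fitness(chromosome, target):
    """Toy fitness: negative squared distance to a target vector."""
    return -sum((c - t) ** 2 for c, t in zip(chromosome, target))


def delta_phase(best, target, rng, scale=0.5, pop_size=50, iterations=20):
    """Search around the best solution with Cauchy-distributed differences."""
    for _ in range(iterations):
        deltas = [[cauchy_sample(scale, rng) for _ in best] for _ in range(pop_size)]
        # Each candidate is the current best plus one difference vector.
        candidates = [[b + d for b, d in zip(best, delta)] for delta in deltas]
        challenger = max(candidates, key=lambda c: fitness(c, target))
        if fitness(challenger, target) > fitness(best, target):
            best = challenger          # becomes the base for the next iteration
    return best


rng = random.Random(0)
old_best = [0.0, 0.0, 0.0]             # solution converged on the previous task
new_target = [1.0, -2.0, 0.5]          # stand-in for a harder task
adapted = delta_phase(old_best, new_target, rng)
```

The heavy tails are the point of the design: most samples stay close to the current best and refine it, while the occasional large jump lets a converged solution move toward a new task's optimum.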
Performance of direct and incremental evolution in the Prey Capture task. The maximum fitness per generation is plotted for each of the two approaches. The direct evolution (bottom plot) makes slight progress at first but stalls after about 20 generations. The plot is an average of 10 simulations. Incremental evolution, however, proceeds through several task transitions (seen as abrupt dropoffs in the plot), and eventually solves the goal-task. The incremental plot is an average of 5 simulations. Each of these included a different number of generations for each evaluation-task, so time was stretched or shrunk for each so that the transitions could be lined up.
Several researchers have demonstrated how complex behavior can be learned through neuro-evolution (i.e. evolving neural networks with genetic algorithms). However, complex general behavior such as evading predators or avoiding obstacles, which is not tied to specific environments, turns out to be very difficult to evolve. Often the system discovers mechanical strategies (such as moving back and forth) that help the agent cope, but are not very effective, do not appear believable and would not generalize to new environments. The problem is that a general strategy is too difficult for the evolution system to discover directly. This paper proposes an approach where such complex general behavior is learned incrementally, by starting with simpler behavior and gradually making the task more challenging and general. The task transitions are implemented through successive stages of delta-coding (i.e. evolving modifications), which allows even converged populations to adapt to the new task. The...
The paper describes simulations on populations of neural networks that both evolve at the population level and learn at the individual level. Unlike other simulations, the evolutionary task (finding food in the environment) and the learning task (predicting the next position of food on the basis of present position and the network's planned movement) are different tasks. In these conditions both learning influences evolution (without Lamarckian inheritance of learned weight changes) and evolution influences learning. Average but not peak fitness has a better evolutionary growth with learning than without learning. After the initial generations individuals that learn to predict during life also improve their food finding ability during life. Furthermore, individuals which inherit an innate capacity to find food also inherit an innate predisposition to learn to predict the sensory consequences of their movements. They do not predict better at birth but they do learn to predict bett...
this paper (Parisi, Nolfi, & Cecconi, 1992). The performance of the elite did not improve when lifetime learning of the second task was introduced, whereas average performance did improve. It seems clear that the effect of lifetime learning was merely to go some way towards restoring performance of networks which had had their weights perturbed (by mutation) away from trained (through evolution) values --- a form of relearning. The extreme convergence of the population around the clustered elite members of the previous generation should be borne in mind when reading from (Nolfi et al., 1994), p. 22: The offspring of a reproducing individual occupy initial positions in weight space that are deviations (due to mutations) from the position occupied by their parent at birth (i.e., prior to learning). One form of relearning in networks was analysed in (Hinton & Plaut, 1987). In that case a network is first trained by some learning algorithm on a set of input/output pairs; the weights are then perturbed. After retraining on a subset of the original training set, it is found that performance improves also on the balance of the original training set. The present case differs from this, in that the lifetime learning is on a fresh task, rather than on a subset of the original task. Recently just such an effect was predicted and observed in networks (Harvey & Stone, 1995). When good performance on one task is degraded by random perturbations of the weights, then in general training on any unrelated second task can be expected to improve, at least initially, the performance on the first task. [Figure 1: A two-dimensional sketch of weight space.] To briefly summarise the reasons for this, consider the diagram, which represents the weight space of a network in just 2 ...
We present a novel evolutionary approach to robotic control of a real robot based on genetic programming (GP). Our approach uses genetic programming techniques that manipulate machine code to evolve control programs for robots. This variant of GP has several advantages over a conventional GP system, such as higher speed, lower memory requirements and better real time properties. Previous attempts to apply GP in robotics use simulations to evaluate control programs and have difficulties with learning tasks involving a real robot. We present an on-line control method that is evaluated in two different physical environments and applied to two tasks using the Khepera robot platform: obstacle avoidance and object following. The results show fast learning and good generalization. 1 Introduction Autonomous robots or agents have a large potential for the future. There are many situations where they could relieve humans from dangerous, difficult or monotonous tasks. We are convinced that many...
Comparison of evidence grids built using (a) raw sonar and (b) laser-limited sonar
Exploration and localization are two of the capabilities necessary for mobile robots to navigate robustly in unknown environments. A robot needs to explore in order to learn the structure of the world, and a robot needs to know its own location in order to make use of its acquired spatial information. However, a problem arises with the integration of exploration and localization. A robot needs to know its own location in order to add new information to its map, but a robot may also need a map to determine its own location. We have addressed this problem with ARIEL, a mobile robot system that combines frontier-based exploration with continuous localization. ARIEL is capable of exploring and mapping an unknown environment while maintaining an accurate estimate of its position at all times. In this paper, we describe frontier-based exploration and continuous localization, and we explain how ARIEL integrates these techniques. Then we show results from experiments performed in the explorati...
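The exploration half of this approach can be illustrated with a small grid example. The abstract does not define a frontier, so the sketch assumes the usual reading: an open cell that borders unexplored space. The three-valued grid encoding, the 4-neighborhood, and the straight-line Manhattan choice of the nearest frontier (which ignores obstacles) are simplifying assumptions, and the localization component is not modeled at all.

```python
# Assumed cell encoding for a tiny evidence grid.
UNKNOWN, OPEN, OCCUPIED = -1, 0, 1

NEIGHBORS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def find_frontiers(grid):
    """Return open cells that touch at least one unknown cell."""
    rows, cols = len(grid), len(grid[0])
    frontiers = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != OPEN:
                continue
            for dr, dc in NEIGHBORS:
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == UNKNOWN:
                    frontiers.append((r, c))
                    break
    return frontiers


def nearest_frontier(grid, position):
    """Pick the frontier cell closest to the robot by Manhattan distance."""
    frontiers = find_frontiers(grid)
    if not frontiers:
        return None  # nothing left to explore
    return min(
        frontiers,
        key=lambda cell: abs(cell[0] - position[0]) + abs(cell[1] - position[1]),
    )


grid = [
    [OPEN, OPEN, UNKNOWN],
    [OPEN, OCCUPIED, UNKNOWN],
    [OPEN, OPEN, OPEN],
]
frontiers = find_frontiers(grid)
goal = nearest_frontier(grid, (2, 0))
```

A robot that repeatedly drives to the selected frontier, senses, and updates the grid pushes the boundary of known space outward until no frontier cells remain. A real system would plan a path around obstacles instead of using straight-line distance.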
Top-cited authors
Stefano Nolfi
  • Italian National Research Council
Owen Holland
  • University of Sussex
Domenico Parisi
  • Italian National Research Council
Ezequiel Di Paolo
  • Ikerbasque - Basque Foundation for Science
Peter Stone
  • University of Texas at Austin