Conference Paper

Multisensory Object Discovery via Self-detection and Artificial Attention


Abstract

We address self-perception and object discovery by integrating multimodal tactile, proprioceptive and visual cues. Considering sensory signals as the only way to obtain relevant information about the environment, we enable a humanoid robot to infer potentially usable objects by relating visual self-detection with tactile cues. Hierarchical Bayesian models are combined with signal processing and proto-object artificial attention to tackle the problem. Results show that the robot is able to: (1) discern between inbody and outbody sources without using markers or simplified segmentation; (2) accurately discover objects in the reaching space; and (3) discriminate real objects from visual artefacts, aiding scene understanding. Furthermore, this approach reveals that several layers of abstraction are needed to achieve agency and causality, due to the inherent ambiguity of the sensory cues.
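To make the inbody/outbody inference concrete, the sketch below shows one plausible Bayesian update in the spirit of the abstract: the belief that a tracked visual region is part of the robot's body rises when its observed motion matches the motion predicted from proprioception. This is a minimal sketch, not the authors' model; the function names, noise scales and two-hypothesis likelihoods are all assumptions.

```python
# Minimal sketch (assumed, not the paper's implementation): Bayesian update
# of P(inbody) for a tracked visual region, from the mismatch between
# proprioceptively predicted motion and observed visual motion.
import numpy as np

def gauss_pdf(err, sigma):
    return np.exp(-0.5 * (err / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def update_inbody_belief(prior, predicted_motion, observed_motion,
                         sigma_self=0.5, sigma_other=3.0):
    """One Bayesian update step for P(inbody | motion evidence)."""
    err = np.linalg.norm(np.asarray(observed_motion) - np.asarray(predicted_motion))
    # Small prediction error favours "inbody" (self-generated motion),
    # large error favours "outbody" (external source).
    lik_self = gauss_pdf(err, sigma_self)
    lik_other = gauss_pdf(err, sigma_other)
    return lik_self * prior / (lik_self * prior + lik_other * (1.0 - prior))

belief = 0.5  # uninformative prior
for predicted, observed in [([1.0, 0.2], [1.1, 0.1]), ([0.4, 0.9], [0.5, 1.0])]:
    belief = update_inbody_belief(belief, predicted, observed)
print(f"P(inbody) after evidence: {belief:.3f}")
```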


... This approach has shown generalization capabilities for multiple tool models [31]. Finally, Dynamical Bayesian Networks (DBN) have been proposed for inferring the parts of the body that belong to the robot by exploiting multimodal correlations [32], i.e., self-detection [33]. In [34], an additional simplified DBN model was proposed to differentiate one's own body from others, and we recently presented a hierarchical Bayesian model that relates the spatio-temporal sensory signals using a more plausible visual attention system [32]. ...
... Where to store the information depends on the representational approach followed, as does how to retrieve the already learned information. Evaluating the works presented in Table I, an enactive robot should incorporate: the multivalued learning provided by [31] or [19], the efficiency and adaptability of [27], inter-modal learning as in [20], [7], incremental refinement of the model using new knowledge [36], [50], multisensory self-detection and causality inference as in [32], the ability to progressively switch from chaotic generators to goal-babbling [25], and self-perception learning with social cues [46], [51]. The design of the algorithm should also enforce knowledge reuse [52], although in some cases this can be computationally expensive. ...
Conference Paper
Full-text available
In this paper we discuss the enactive self from a computational point of view and study the suitability of current methods for instantiating it on robots. As an assumption, we consider any cognitive agent as an autonomous system that constructs its identity by continuous interaction with the environment. We start by examining algorithms for learning the body schema and enabling tool-extension, and we finish by studying their viability for generalizing the enactive-self computational model. This paper points out promising techniques for bodily self-modelling and exploration, and formally links sensorimotor models with differential kinematics. Although the study is restricted to the basic sensorimotor construction of the self, some of the analysed works also traverse into more complex self constructions with a social component. Furthermore, we discuss the main gaps in current engineering approaches to modelling enactive robots and describe the main characteristics that a synthetic sensorimotor self-model should present.
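The link between sensorimotor models and differential kinematics mentioned above can be illustrated with the standard relation x_dot = J(q) q_dot, where the Jacobian J(q) maps joint velocities to end-effector velocities. The sketch below uses a two-link planar arm; link lengths and configuration are illustrative assumptions, not values from the paper.

```python
# Sketch of differential kinematics for a 2-link planar arm (assumed
# geometry): the Jacobian J(q) maps joint velocities to end-effector
# velocities, x_dot = J(q) @ q_dot.
import numpy as np

def jacobian_2link(q, l1=0.3, l2=0.25):
    s1, c1 = np.sin(q[0]), np.cos(q[0])
    s12, c12 = np.sin(q[0] + q[1]), np.cos(q[0] + q[1])
    return np.array([[-l1 * s1 - l2 * s12, -l2 * s12],
                     [ l1 * c1 + l2 * c12,  l2 * c12]])

q = np.array([0.4, 0.8])        # joint angles (rad)
q_dot = np.array([0.1, -0.2])   # joint velocities (rad/s)
x_dot = jacobian_2link(q) @ q_dot
print("end-effector velocity:", x_dot)
```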
... During the last few years, roboticists have sought to build machines that, when turned on for the first time, learn how to interact with the environment by means of their sensorimotor experience [1], [2], [3], [4]. We envisage that, as in humans, this mechanism is the key to adaptability, since such machines will be able to relearn with the same machinery when unexpected changes appear [5]. ...
... It is still unknown, and even controversial, how to get from sensor information to self-awareness, causality, semantic interpretation and agency attribution. In this sense, the ability of self-perception and the capacity to learn the body schema seem to be among the core processes involved [4], [6]. ...
... Here we assume that we can coherently track the object O over time. The probability of usability only rises when there is a sensory link (touching) and causality (arm moves → object moves) [4]. Furthermore, the probability of being usable decreases when the object belongs to the robot. ...
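The usability rule quoted in this snippet can be sketched directly: the probability rises only under the conjunction of touch and causality, and is suppressed for regions believed to belong to the robot's own body. The update gains below are illustrative assumptions, not the paper's values.

```python
# Hedged sketch of the usability update described above: P(usable) rises
# only with a sensory link (touch) plus causality (arm moves -> object
# moves), and falls if the region is believed to be part of the robot.
def update_usability(p_usable, touching, arm_moved, object_moved, p_inbody,
                     gain=0.3):
    causal = arm_moved and object_moved
    if touching and causal:
        p_usable += gain * (1.0 - p_usable)   # evidence for a usable object
    p_usable *= (1.0 - p_inbody)              # own-body regions are not objects
    return min(max(p_usable, 0.0), 1.0)

p = 0.1
p = update_usability(p, touching=True, arm_moved=True, object_moved=True,
                     p_inbody=0.05)
print(f"P(usable): {p:.3f}")
```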
Article
Full-text available
We address self-perception in robots as the key to world understanding and causality interpretation. We present a self-perception mechanism that enables a humanoid robot to understand certain sensory changes caused by naive actions during interaction with objects. Visual, proprioceptive and tactile cues are combined via artificial attention and probabilistic reasoning to permit the robot to discern between inbody and outbody sources in the scene. With that support, and exploiting inter-modal sensory contingencies, the robot can infer simple concepts such as discovering potentially "usable" objects. Theoretically and through experimentation with a real humanoid robot, we show how self-perception is a backdrop ability for higher-order cognitive skills. Moreover, we present a novel model for self-detection which does not need to track the body parts. Furthermore, results show that the proposed approach successfully discovers objects in the reaching space, improving scene understanding by discriminating real objects from visual artefacts.
... Gold and Scassellati [31] employ probabilistic reasoning about possible causes of the movement, calculating the likelihoods of dynamic Bayesian models. A similar approach was proposed in [41,42], where the notion of body control was extended to sensorimotor contingencies: "this is my arm not only because I am sending the command to move it but also because I sense the consequences of moving it". All these exploited the spatio-temporal contingency, related to the sense of agency. ...
Article
Self-recognition or self-awareness is a capacity attributed typically only to humans and a few other species. The definitions of these concepts vary and little is known about the mechanisms behind them. However, there is a Turing-test-like benchmark: mirror self-recognition, which consists in covertly putting a mark on the face of the tested subject, placing her in front of a mirror, and observing the reactions. In this work, first, we provide a mechanistic decomposition, or process model, of the components required to pass this test. Based on these, we provide suggestions for empirical research. In particular, in our view, the way infants or animals reach for the mark should be studied in detail. Second, we develop a model to enable the humanoid robot Nao to pass the test. The core of our technical contribution is learning the appearance representation and visual novelty detection by means of learning a generative model of the face with deep auto-encoders and exploiting the prediction error. The mark is identified as a salient region on the face and a reaching action is triggered, relying on a previously learned mapping to arm joint angles. The architecture is tested on two robots with completely different faces.
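The core mechanism here, flagging the mark via the prediction error of a learned generative model, can be illustrated with a much simpler stand-in. The sketch below substitutes a PCA appearance model for the deep auto-encoder (an explicit simplification); the synthetic data and threshold are assumptions for illustration only.

```python
# Minimal stand-in for the novelty detection described above: fit a PCA
# "generative model" of face patches and flag inputs whose reconstruction
# error is unusually high (e.g. a covert mark). PCA replaces the paper's
# deep auto-encoder purely for brevity.
import numpy as np

rng = np.random.default_rng(0)
faces = rng.normal(size=(200, 64))            # stand-in for face patch vectors
mean = faces.mean(axis=0)
_, _, Vt = np.linalg.svd(faces - mean, full_matrices=False)
basis = Vt[:10]                               # low-dimensional appearance model

def reconstruction_error(x):
    code = (x - mean) @ basis.T
    recon = mean + code @ basis
    return np.linalg.norm(x - recon)

normal = faces[0]
marked = faces[0] + 5.0 * rng.normal(size=64)  # "mark": deviates from the model
threshold = np.mean([reconstruction_error(f) for f in faces]) * 2.0
for name, x in [("normal", normal), ("marked", marked)]:
    print(name, "novel" if reconstruction_error(x) > threshold else "familiar")
```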
... Dynamic Hebbian learning has also been proposed for obtaining intermodal forward models in [23]. Body-model-free visual detection [24] has been approached as an intermodal inference problem, but it is restricted to the camera view of the robot. ...
Book
Full-text available
Humanoid robots are highly sophisticated machines equipped with human-like sensory and motor capabilities. Today we are on the verge of a new era of rapid transformations in both science and engineering, one that brings together technological advancements in a way that will accelerate both neuroscience and robotics. Humanoid Robotics and Neuroscience: Science, Engineering and Society presents the contributions of prominent scientists who explore key aspects of the further potential of these systems. Topics include:
  • Neuroscientific research findings on dexterous robotic hand control
  • Humanoid vision and how understanding the structure of the human eye can lead to improvements in artificial vision
  • Humanoid locomotion, motor control, and the learning of motor skills
  • Cognitive elements of humanoid robots, including the neuroscientific aspects of imitation and development
  • The impact of robots on society and the potential for developing new systems and devices to benefit humans
The use of humanoid robotics can help us develop a greater scientific understanding of humans, leading to the design of better engineered systems and machines for society. This book assembles the work of scientists on the cutting edge of robotic research who demonstrate the vast possibilities in this field of research.
Article
Full-text available
The network between the parietal cortex and premotor cortex has a pivotal role in sensory-motor control. Grasping-related neurons in the anterior intraparietal area (AIP) and the ventral premotor cortex (F5) show complementary properties to each other. Object information for grasping is sent from the parietal cortex to the premotor cortex for sensory-motor transformation, and the backward signal from the premotor cortex to the parietal cortex can be considered an efference copy/corollary discharge that is used to predict the sensory outcome during motor behavior. Mirror neurons, which represent both one's own action and another's action, are involved in this system. This system also fits very well with the body schema, which reflects the online state of the body during motor execution. We speculate that the parieto-premotor network, which includes the mirror neuron system, is key for mapping one's own body and the bodies of others. This means that the neuronal substrates that control one's own action and the mirror neuron system are shared with the "who" system, which is related to the recognition of action contribution, i.e., the sense of agency. Representation of one's own and others' bodies in the parieto-premotor network is key to linking sensory-motor control with higher-order cognitive functions.
Conference Paper
Full-text available
In this paper we present a proof-of-concept for a novel solution consisting of a short-term 3D memory for artificial attention systems, loosely inspired by perceptual processes believed to be implemented in the human brain. Our solution supports the implementation of multisensory perception and stimulus-driven processes of attention. For this purpose, it provides (1) knowledge persistence with temporal coherence, tackling potential salient regions outside the field of view via a panoramic, log-spherical inference grid; (2) prediction, by using estimates of local 3D velocity to anticipate the effect of scene dynamics; (3) spatial correspondence between volumetric cells potentially occupied by proto-objects and their corresponding multisensory saliency scores. Visual and auditory signals are processed to extract features that are then filtered by a proto-object segmentation module that employs colour and depth as discriminatory traits. As features we consider, apart from the commonly used colour and intensity contrast, colour bias, the presence of faces, scene dynamics and also loud auditory sources. Combining conspicuity maps derived from these features, we obtain a 2D saliency map, which is then processed using the probability of occupancy in the scene to construct the final 3D saliency map as an additional layer of the Bayesian Volumetric Map (BVM) inference grid.
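The final fusion step of this pipeline can be sketched compactly: conspicuity maps are linearly combined into a 2D saliency map, which is then weighted by occupancy probability to yield a 3D saliency layer. The map sizes, weights and occupancy values below are toy assumptions; the paper's log-spherical BVM grid is not reproduced here.

```python
# Sketch (with assumed toy data) of fusing conspicuity maps into a 2D
# saliency map and weighting it by occupancy probability, as in the
# final step described above.
import numpy as np

rng = np.random.default_rng(1)
conspicuity = {"colour": rng.random((48, 64)),
               "intensity": rng.random((48, 64)),
               "motion": rng.random((48, 64))}
weights = {"colour": 0.4, "intensity": 0.3, "motion": 0.3}  # assumed weights

saliency_2d = sum(w * conspicuity[k] for k, w in weights.items())
saliency_2d /= saliency_2d.max()

# One depth slice of an occupancy grid aligned with the image plane.
p_occupied = rng.random((48, 64))
saliency_3d_slice = saliency_2d * p_occupied   # salient AND likely occupied
print("peak 3D saliency:", saliency_3d_slice.max())
```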
Article
Full-text available
Artificial vision systems cannot process all the information they receive from the world in real time, because doing so is highly expensive and inefficient in terms of computational cost. Inspired by biological perception systems, artificial attention models aim to select only the relevant parts of the scene. In human vision, it is also well established that these units of attention are not merely spatial but closely related to perceptual objects (proto-objects). This implies a strong bidirectional relationship between segmentation and attention processes: while the segmentation process is responsible for extracting the proto-objects from the scene, attention can guide segmentation, giving rise to the concept of foveal attention. When the focus of attention is deployed from one visual unit to another, the rest of the scene is still perceived, but at a lower resolution than the focused object. The result is a multi-resolution visual perception in which the fovea, a dimple on the central retina, provides the highest-resolution vision. In this paper, a bottom-up foveal attention model is presented. In this model the input image is a foveal image represented using a Cartesian Foveal Geometry (CFG), which encodes the field of view of the sensor as a fovea (placed at the focus of attention) surrounded by a set of concentric rings of decreasing resolution. Multi-resolution perceptual segmentation is then performed by building a foveal polygon using the Bounded Irregular Pyramid (BIP). Bottom-up attention is enclosed in the same structure, allowing the fovea to be set over the most salient proto-object in the image. Saliency is computed as a linear combination of multiple low-level features such as colour and intensity contrast, symmetry, orientation and roundness. Results on natural images show that the combination of hierarchical foveal segmentation and saliency estimation performs well in terms of accuracy and speed.
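The multi-resolution idea behind the Cartesian Foveal Geometry can be illustrated with a toy foveation routine: keep the central window at full resolution and subsample the surrounding rings by increasing factors. Ring sizes and factors below are assumptions; the paper's Bounded Irregular Pyramid is not reproduced.

```python
# Toy Cartesian foveal representation (assumed parameters): full-resolution
# fovea plus concentric rings downsampled by increasing factors.
import numpy as np

def foveate(image, cx, cy, fovea=16, ring_factors=(2, 4, 8)):
    """Return the full-res fovea plus progressively downsampled rings."""
    levels = []
    half = fovea // 2
    levels.append(image[cy - half:cy + half, cx - half:cx + half])  # fovea
    for i, f in enumerate(ring_factors, start=1):
        r = half * (i + 1)
        patch = image[max(cy - r, 0):cy + r, max(cx - r, 0):cx + r]
        levels.append(patch[::f, ::f])  # coarse periphery via subsampling
    return levels

img = np.random.default_rng(2).random((128, 128))
for lvl, patch in enumerate(foveate(img, 64, 64)):
    print(f"level {lvl}: shape {patch.shape}")
```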
Conference Paper
Full-text available
In order for robots to interact with other agents, it is important that they are able to recognize their own actions. The research reported here relates to the use of internal models for self-other distinction. We demonstrate how a humanoid robot, which acquires a sensorimotor scheme through self-exploration, can produce and predict simple trajectories that have particular characteristics. Comparing these predictions to incoming sensory information provides the robot with a basic tool for distinguishing between self and other.
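The comparison step described above, prediction against incoming sensation, reduces to a simple test: if the observation departs from the forward-model prediction by more than a tolerance, attribute the motion to another agent. The linear forward model and tolerance below are illustrative assumptions.

```python
# Sketch of forward-model-based self/other attribution (assumed model):
# predict the sensory consequence of the robot's own command and compare
# it against the incoming observation.
import numpy as np

A = np.array([[0.9, 0.0], [0.0, 1.1]])  # assumed learned forward model

def attribute(sensor_prev, command, sensor_now, tol=0.2):
    predicted = A @ sensor_prev + command
    return "self" if np.linalg.norm(sensor_now - predicted) < tol else "other"

s0 = np.array([0.5, 0.2])
u = np.array([0.1, 0.0])
print(attribute(s0, u, A @ s0 + u + 0.01))      # matches prediction -> "self"
print(attribute(s0, u, np.array([2.0, 1.0])))   # mismatch -> "other"
```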
Article
Full-text available
Agency is the sense that I am the cause or author of a movement. Babies develop this feeling early by perceiving the contingency between afferent (sensory) and efferent (motor) information. A comparator model, hypothesized to be associated with many brain regions, monitors and simulates the concordance between self-produced actions and their consequences. In this paper, we propose that the biological mechanism of spike-timing-dependent plasticity, which synchronizes the neural dynamics almost everywhere in the central nervous system, constitutes the perfect algorithm for detecting contingency in sensorimotor networks. The coherence or dissonance in the sensorimotor information flow then determines the level of agency. In a head-neck-eyes robot, we replicate three developmental experiments illustrating how particular perceptual experiences can modulate the overall level of agency inside the system: (1) adding a delay between proprioceptive and visual feedback information, (2) facing a mirror, and (3) facing a person. We show that the system learns to discriminate animated objects (self-image and other persons) from other types of stimuli. This suggests a basic stage of representing the self in relation to others arising from low-level sensorimotor processes. We then discuss the relevance of our findings to neurobiological evidence and observations from developmental psychology for developmental robots.
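The timing-based contingency at the heart of this proposal can be sketched with an STDP-like kernel: sensory events that closely follow motor events contribute strongly to an agency score, while delayed feedback contributes little, mirroring the paper's delay experiment. The time constant and injected delay are assumptions, and this is not the paper's neural implementation.

```python
# Hedged sketch of timing-based contingency in the spirit of STDP: pairs
# where a sensory event quickly follows a motor event raise an "agency"
# score via an exponentially decaying kernel (assumed tau).
import numpy as np

def agency_score(motor_times, sensory_times, tau=0.15):
    score = 0.0
    for tm in motor_times:
        dts = np.asarray(sensory_times) - tm
        dts = dts[dts > 0]                      # sensory must follow motor
        if dts.size:
            score += np.exp(-dts.min() / tau)   # earlier feedback -> stronger
    return score / len(motor_times)

motor = np.arange(0.0, 5.0, 0.5)
print("intact feedback:", round(agency_score(motor, motor + 0.05), 3))
print("delayed feedback:", round(agency_score(motor, motor + 0.60), 3))
```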
Conference Paper
Full-text available
One of the most fundamental issues for physical agents (humans, primates, and robots) in performing various kinds of tasks is body representation. Especially during tool-use by monkeys, neurophysiological evidence shows that the representation can be dynamically reconstructed by spatio-temporal integration of different sensor modalities so that it can adapt to environmental changes. However, to construct such a representation, an issue to be resolved is which pieces of information to associate among the various sensory data. This paper presents a method that constructs a cross-modal body representation from vision, touch, and proprioception. When the robot touches something, the tactile sensation triggers the construction process of the visual receptive field for the body parts, which can be found by visual attention based on a saliency map and consequently regarded as the end effector. Simultaneously, proprioceptive information is associated with this visual receptive field to achieve the cross-modal body representation. The proposed model is applied to a real robot, and results comparable to the activities of parietal neurons observed in monkeys are shown.
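The touch-triggered association can be sketched as a simple event handler: on a touch event, take the most salient image location as the presumed end effector and store it together with the current arm posture. The data structures and saliency stub below are assumptions for illustration, not the paper's architecture.

```python
# Sketch (assumed data structures) of touch-triggered cross-modal
# association: pair the current joint angles with the most salient image
# location, building a proprioception -> visual receptive field lookup.
import numpy as np

rng = np.random.default_rng(3)
receptive_field = []  # list of (joint_angles, image_location) pairs

def most_salient_location(saliency_map):
    return np.unravel_index(np.argmax(saliency_map), saliency_map.shape)

def on_touch(joint_angles, saliency_map):
    loc = most_salient_location(saliency_map)      # presumed end effector
    receptive_field.append((np.array(joint_angles), loc))

on_touch([0.3, 1.1, -0.4], rng.random((48, 64)))
print("stored associations:", receptive_field)
```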
Conference Paper
Full-text available
The question of how the mirror neuron system (MNS) develops has attracted increasing attention from researchers. Among various hypotheses, a widely accepted model is associative sequence learning, which acquires the MNS as a by-product of sensorimotor learning. The model, however, cannot discriminate self from others, since it adopts oversimplified sensory representations. We propose a computational model for early development of the MNS which originates in immature vision. The model gradually increases the spatiotemporal resolution of a robot's vision while the robot learns sensorimotor mapping through early interactions with others. In the early stage of development, the robot interprets all observed actions as equivalent due to the low resolution, and thus associates the non-differentiated observations with motor commands. As vision develops, the robot starts discriminating actions generated by itself from those generated by others. The initially acquired association is, however, maintained through development, which results in two types of associations: one between motor commands and self-observation, and the other between motor commands and other-observation (i.e., what the MNS does). Our experiments demonstrate that the model achieves early development of the MNS, which enables a robot to imitate others' actions.
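The "immature vision" ingredient can be illustrated mechanically: blur each observation with a Gaussian whose width shrinks as development proceeds, so that early on all observed actions look alike. The decay schedule below is an assumption, not the paper's schedule.

```python
# Illustrative sketch of developing visual resolution (assumed schedule):
# observations are blurred with a Gaussian whose sigma decays over
# developmental time, so detail emerges only gradually.
import numpy as np
from scipy.ndimage import gaussian_filter

def developing_view(image, stage, sigma0=8.0, decay=0.5):
    """Blur an observation; `stage` is the developmental time step."""
    sigma = sigma0 * np.exp(-decay * stage)
    return gaussian_filter(image, sigma=sigma)

obs = np.random.default_rng(4).random((64, 64))
for stage in (0, 3, 6):
    blurred = developing_view(obs, stage)
    print(f"stage {stage}: residual detail {np.std(blurred):.4f}")
```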
Article
Full-text available
This paper proposes cognitive developmental robotics (CDR) as a new principle for the design of humanoid robots. This principle may provide ways of understanding human beings that go beyond the current level of explanation found in the natural and social sciences. Furthermore, a methodological emphasis on humanoid robots in the design of artificial creatures holds promise because they have many degrees of freedom and sense modalities and, thus, must face the challenges of scalability that are often side-stepped in simpler domains. We examine the potential of this new principle as well as issues that are likely to be important to CDR in the future.
Book
Since the term robot (from the Czech or Polish words robota, meaning “labour”, and robotnik, meaning “workman”) was introduced in 1923 and the first steps towards real robotic systems were taken by the early-to-mid-1940s, expectations regarding Robotics have shifted from the development of automatic tools to aid or even replace humans in highly repetitive, simple, but physically demanding tasks, to the emergence of autonomous robots and vehicles, and finally to the development of service and social robots.
Article
Contents: Foreword; Preface.
Part I. Principles and Elementary Applications: 1. Plausible reasoning; 2. The quantitative rules; 3. Elementary sampling theory; 4. Elementary hypothesis testing; 5. Queer uses for probability theory; 6. Elementary parameter estimation; 7. The central, Gaussian or normal distribution; 8. Sufficiency, ancillarity, and all that; 9. Repetitive experiments, probability and frequency; 10. Physics of 'random experiments'.
Part II. Advanced Applications: 11. Discrete prior probabilities, the entropy principle; 12. Ignorance priors and transformation groups; 13. Decision theory: historical background; 14. Simple applications of decision theory; 15. Paradoxes of probability theory; 16. Orthodox methods: historical background; 17. Principles and pathology of orthodox statistics; 18. The Ap distribution and rule of succession; 19. Physical measurements; 20. Model comparison; 21. Outliers and robustness; 22. Introduction to communication theory.
References. Appendix A. Other approaches to probability theory. Appendix B. Mathematical formalities and style. Appendix C. Convolutions and cumulants.
Book
Probabilistic Reasoning and Decision Making in Sensory-Motor Systems, by Pierre Bessiere, Christian Laugier and Roland Siegwart, collects contributions from a sizable segment of the cognitive systems research community in Europe. It reports on work from leading academic institutions brought together within the European projects Bayesian Inspired Brain and Artifact (BIBA) and Bayesian Approach to Cognitive Systems (BACS). This fourteen-chapter volume covers important research along two main lines: new probabilistic models and algorithms for perception and action, and new probabilistic methodology and techniques for artefact conception and development. The work addresses key issues in Bayesian programming, navigation, filtering, modelling and mapping, with applications in a number of different contexts.
Article
The comparative analysis of when and how animals become aware of themselves has at least two levels of focus. One, the categorical self, has to do with a conceptual awareness: it implies some representational memory and is reflected in self-referencing behavior that discriminates unique features of the individual. The other, the existential self, has to do with a perceptual sensitivity: it implies some detection mechanism and is reflected in self-referencing behavior that discriminates self from nonself, at least momentarily. The chapter considers how infants detect their existential selves, presenting two time-based algorithms for self-detection and four options for detecting contingency, drawing a distinction between detecting contingency and seeking or avoiding it, and speculating about deviant self-seeking in infantile autism and Rett syndrome.
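A loose sketch of time-based contingency detection in the spirit of this chapter: estimate how often a stimulus follows an action within a short window, and how often a stimulus was preceded by an action, calling the contingency "perfect" when both indices approach 1. The window size and event streams are assumptions, and this is not Watson's exact formulation.

```python
# Hedged sketch of two contingency indices (assumed formulation): the
# fraction of actions followed by a stimulus, and the fraction of
# stimuli preceded by an action, within a short window.
import numpy as np

def contingency_indices(action_times, stimulus_times, window=0.3):
    a = np.asarray(action_times); s = np.asarray(stimulus_times)
    followed = [np.any((s > t) & (s <= t + window)) for t in a]
    preceded = [np.any((a < t) & (a >= t - window)) for t in s]
    return np.mean(followed), np.mean(preceded)

acts = [0.0, 1.0, 2.0, 3.0]
stims = [0.1, 1.1, 2.1, 3.1]             # each action reliably followed
print(contingency_indices(acts, stims))  # (1.0, 1.0): perfect contingency
```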
Article
This paper formulates five basic principles of developmental robotics. These principles are formulated based on some of the recurring themes in the developmental learning literature and in the author's own research. The five principles follow logically from the verification principle (postulated by Richard Sutton), which is assumed to be self-evident. This paper also gives an example of how these principles can be applied to the problem of autonomous tool use in robots.
Article
In this paper, we present a new generation of active tactile modules (HEX-O-SKIN), which are developed in order to approach multimodal whole-body touch sensation for humanoid robots. To perform more like humans, humanoid robots need a variety of different sensory modalities to interact with their environment. This calls for a certain robustness and fault tolerance, as well as an intelligent solution for connecting the different sensory modalities to the robot. Each HEX-O-SKIN is a small hexagonal printed circuit board equipped with multiple discrete sensors for temperature, acceleration, and proximity. With these sensors, we emulate the human senses of temperature, vibration, and light touch. Off-the-shelf sensors were utilized to speed up our development cycle; however, in general, we can easily extend our design with new discrete sensors, making it flexible for further exploration. A local controller on each HEX-O-SKIN preprocesses the sensor signals and actively routes data through a network of modules toward the closest PC connection. Local processing decreases the necessary network and high-level processing bandwidth, while local analog-to-digital conversion and digital data transfers are less sensitive to electromagnetic interference. With an active data-routing scheme, it is also possible to reroute data around broken connections, yielding robustness throughout the global structure while minimizing wiring. To support our approach, multiple HEX-O-SKIN modules are embedded into a rapid-prototyped elastomer skin material and redundantly connected to neighboring modules by just four ports. The wiring complexity is shifted to each HEX-O-SKIN such that the power and data connection between two modules is reduced to four non-crossing wires. Thus, only a very simple robot-specific base frame is needed to support and wire the HEX-O-SKIN modules to a robot. The potential of our multimodal sensor modules is demonstrated experimentally on a robot platform.
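The rerouting-around-broken-connections idea can be illustrated (this is not the HEX-O-SKIN firmware) as a breadth-first search over the module graph toward the PC port, rerun whenever a link fails. The topology below is an assumption.

```python
# Illustrative sketch of routing skin-module data to the closest PC
# connection over a redundant module network, with rerouting when a
# link breaks (assumed topology; node 5 is the PC port).
from collections import deque

links = {1: {2, 3}, 2: {1, 4}, 3: {1, 4}, 4: {2, 3, 5}, 5: {4}}

def route(src, dst, broken=frozenset()):
    """Shortest module-hop path avoiding broken (undirected) links."""
    queue, seen = deque([[src]]), {src}
    while queue:
        path = queue.popleft()
        if path[-1] == dst:
            return path
        for nxt in links[path[-1]]:
            edge = frozenset((path[-1], nxt))
            if nxt not in seen and edge not in broken:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

print(route(1, 5))                              # e.g. [1, 2, 4, 5]
print(route(1, 5, broken={frozenset((2, 4))}))  # rerouted via module 3
```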
Article
Using the probabilistic methods outlined in this paper, a robot can learn to recognize its own motor-controlled body parts, or their mirror reflections, without prior knowledge of their appearance. For each item in its visual field, the robot calculates the likelihoods of each of three dynamic Bayesian models, corresponding to the categories of "self", "animate other", and "inanimate". Each model fully incorporates the object's entire motion history and the robot's whole motor history in constant update time, via the forward algorithm. The parameters for each model are learned in an unsupervised fashion as the robot experiments with its arm over a period of four minutes. The robot demonstrated robust recognition of its mirror image, while classifying the nearby experimenter as "animate other", across 20 experiments. Adversarial experiments, in which a subject mirrored the robot's motion, showed that as long as the robot had seen the subject move for as little as 5 s before mirroring, the evidence was "remembered" across a full minute of mimicry.
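The constant-time, three-model comparison can be condensed into a recursive log-likelihood update in the style of the forward algorithm: each category accumulates the log-probability of the observed motion given the robot's motor state. The Gaussian observation models below are simple assumptions, not the paper's learned parameters.

```python
# Condensed sketch of the three-model likelihood race described above,
# with assumed Gaussian observation models per category.
import numpy as np

def gaussian(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def step_loglik(logliks, obj_motion, motor_active):
    """Constant-time update of each model's cumulative log-likelihood."""
    mu_self = 1.0 if motor_active else 0.0    # "self" moves iff motors do
    models = {"self": gaussian(obj_motion, mu_self, 0.3),
              "animate": gaussian(obj_motion, 0.5, 1.0),   # moves on its own
              "inanimate": gaussian(obj_motion, 0.0, 0.2)} # rarely moves
    return {k: logliks[k] + np.log(models[k] + 1e-12) for k in logliks}

ll = {"self": 0.0, "animate": 0.0, "inanimate": 0.0}
for motion, motor in [(1.0, True), (0.9, True), (0.0, False), (1.1, True)]:
    ll = step_loglik(ll, motion, motor)
print(max(ll, key=ll.get))  # -> "self"
```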
Article
This paper addresses the problem of self-detection by a robot. The paper describes a methodology for autonomous learning of the characteristic delay between motor commands (efferent signals) and observed movements of visual stimuli (afferent signals). The robot estimates its own efferent-afferent delay from self-observation data gathered while performing motor babbling, i.e., random rhythmic movements similar to the primary circular reactions described by Piaget. After the efferent-afferent delay is estimated, the robot imprints on that delay and can later use it to successfully classify visual stimuli as either “self” or “other.” Results from robot experiments performed in environments with increasing degrees of difficulty are reported.
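One simple way to realize the delay estimation described in this abstract is to cross-correlate the motor command signal with the observed visual motion and take the lag that maximizes the correlation. The synthetic signals and true delay below are assumptions for illustration.

```python
# Sketch of efferent-afferent delay estimation from motor babbling:
# pick the lag maximizing the motor/visual cross-correlation
# (synthetic signals, assumed true delay of 7 samples).
import numpy as np

rng = np.random.default_rng(5)
motor = rng.normal(size=500)                    # efferent signal
true_delay = 7
visual = np.roll(motor, true_delay) + 0.1 * rng.normal(size=500)  # afferent

lags = range(0, 30)
corrs = [np.corrcoef(motor[:-lag or None], visual[lag:])[0, 1] for lag in lags]
print("estimated delay:", int(np.argmax(corrs)))  # should recover 7
```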
Article
Many current neurophysiological, psychophysical, and psychological approaches to vision rest on the idea that when we see, the brain produces an internal representation of the world. The activation of this internal representation is assumed to give rise to the experience of seeing. The problem with this kind of approach is that it leaves unexplained how the existence of such a detailed internal representation might produce visual consciousness. An alternative proposal is made here. We propose that seeing is a way of acting. It is a particular way of exploring the environment. Activity in internal representations does not generate the experience of seeing. The outside world serves as its own, external, representation. The experience of seeing occurs when the organism masters what we call the governing laws of sensorimotor contingency. The advantage of this approach is that it provides a natural and principled way of accounting for visual consciousness, and for the differences in the perceived quality of sensory experience in the different sensory modalities. Several lines of empirical evidence are brought forward in support of the theory, in particular: evidence from experiments in sensorimotor adaptation, visual "filling in," visual stability despite eye movements, change blindness, sensory substitution, and color perception.
Article
In this review we discuss how we are aware that actions are self-generated. We review behavioural data that suggest that a prediction of the sensory consequences of movement might be used to label actions and their consequences as self-generated. We also describe recent functional neuroimaging experiments and studies of neurological and psychiatric patients, which suggest that the parietal cortex plays a crucial role in the awareness of action.
Article
Experimentation is crucial to human progress at all scales, from society as a whole to a young infant in its cradle. It allows us to elicit learning episodes suited to our own needs and limitations. This paper develops active strategies for a robot to acquire visual experience through simple experimental manipulation. The experiments are oriented towards determining which parts of the environment are physically coherent, that is, which parts will move together and which are more or less independent. We argue that following causal chains of events out from the robot's body into the environment allows for a very natural developmental progression of visual competence, and we relate this idea to results in neuroscience.
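The "what moves together" test motivating this work can be sketched by tracking a few region trajectories while the arm pokes the scene and grouping regions whose velocities correlate strongly. The trajectories and correlation threshold below are synthetic assumptions.

```python
# Sketch of grouping physically coherent regions by motion correlation
# (synthetic trajectories; assumed threshold of 0.9).
import numpy as np

rng = np.random.default_rng(6)
base = rng.normal(size=100)
tracks = {"cup_handle": base + 0.05 * rng.normal(size=100),
          "cup_body":   base + 0.05 * rng.normal(size=100),
          "background": rng.normal(size=100)}

names = list(tracks)
coherent = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]
            if np.corrcoef(tracks[a], tracks[b])[0, 1] > 0.9]
print("physically coherent pairs:", coherent)  # handle + body move together
```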
Self-awareness in animals and humans: Developmental perspectives
  • J. S. Watson
J. S. Watson, "Detection of self: The perfect algorithm," in Self-awareness in animals and humans: Developmental perspectives, pp. 131-148, 1994.
Designing an artificial attention system for social robots
  • P. Lanillos
  • J. F. Ferreira
  • J. Dias
P. Lanillos, J. F. Ferreira, and J. Dias, "Designing an artificial attention system for social robots," in IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS), 2015, pp. 4171-4178.
Visual attention by saliency leads cross-modal body representation
  • M. Hikita
  • S. Fuke
  • M. Ogino
  • T. Minato
  • M. Asada
M. Hikita, S. Fuke, M. Ogino, T. Minato, and M. Asada, "Visual attention by saliency leads cross-modal body representation," in IEEE Int. Conf. on Development and Learning (ICDL), 2008, pp. 157-162.