Conference Paper

An Integrated, Modular Framework for Computer Vision and Cognitive Robotics Research (icVision)


Abstract

We present an easy-to-use, modular framework for performing computer vision tasks in support of cognitive robotics research on the iCub humanoid robot. The aim of this biologically inspired, bottom-up architecture is to facilitate research into visual perception and cognition processes, especially their influence on robotic object manipulation and environment interaction. The icVision framework described provides capabilities for detecting objects in the 2D image plane and locating them in 3D space to facilitate the creation of a world model.
Jürgen Leitner, Simon Harding, Mikhail Frank, Alexander Förster and Jürgen Schmidhuber
1 Introduction
Vision and the visual system are the focus of much research in psychology, cognitive science, neuroscience and biology. A major issue in visual perception is that what individuals ‘see’ is not just a simple translation of input stimuli (consider optical illusions). Marr's research in the 1970s led to a theory of vision using different levels of abstraction [11]. He described human vision as processing inputs stemming from a two-dimensional visual array (on the retina) to build a three-dimensional description of the world as output. For this he defined three levels: a 2D (or primal) sketch of the scene (using feature extraction), a sketch of the scene using textures to provide more information (the so-called 2½-D sketch), and finally a 3D model.
Visual perception is of critical importance, as sensory feedback is the basis for making decisions, triggering certain behaviours, and adapting these to the current situation. This is true not just for humans but also for autonomous robots. Visual feedback enables robots to build up a cognitive mapping between sensory inputs and action outputs, thereby closing the sensorimotor loop and allowing them to perform actions and adapt to dynamic environments. We aim to build a visual perception system for robots, based on human vision, that provides this feedback and thus leads to more autonomous and adaptive behaviours.
Jürgen Leitner, Mikhail Frank, Alexander Förster and Jürgen Schmidhuber
Dalle Molle Institute for Artificial Intelligence (IDSIA), USI/SUPSI, Lugano, Switzerland
Simon Harding
Machine Intelligence, Ltd UK, e-mail:
Our research platform is the open-systems humanoid robot iCub [17], developed within the EU-funded ‘RobotCub’ project. In our setup, shown in Figure 1 (left), it consists of two anthropomorphic arms, a head and a torso, and is roughly the size of a human child. The iCub was designed for object manipulation research. It is also an excellent experimental, high degree-of-freedom (DOF) platform for artificial (and human) cognition research and embodied artificial intelligence (AI) development [13]. Similarly to human perception, the iCub has to rely solely on a visual system based on stereo vision to localise objects in the environment. The two cameras are mounted in the head. Their pan and tilt can be controlled jointly, with vergence providing a third DOF. The neck provides three more DOF for gazing.
We describe a framework, named icVision, that supports the learning of hand-eye coordination and object manipulation by solving visual perception issues in a biologically-inspired way.
2 The icVision Framework
Research on perception has been an active component of developing artificial vision (or computer vision) systems, both in industry and in robotics. Like the human mind, our humanoid robot should be able to learn to perceive objects and to develop a representation that allows it to detect them again. The goal is to enable adaptive, autonomous behaviours based on visual feedback by combining robot learning approaches (AI and machine learning (ML) techniques) with computer vision.
This framework was developed as a biologically-inspired architecture, in line with Marr's description. It processes the visual inputs received from the cameras and builds (internal) representations of objects. It facilitates the 3D localisation of objects detected in the 2D image plane and provides this information to other systems (e.g. a motion planner). Figure 1 (right) sketches the icVision architecture. The system consists of distributed YARP modules¹ interacting with the iCub hardware and with each other. The specialised modules can be connected to form pathways that perform, for example, object detection, similarly to the hierarchies of human perception in the visual cortex (V1, V2, V3, ...) [6].
Fig. 1 Left: The iCub humanoid robot. Right: Architecture of the icVision framework.
The main module, the icVision Core, handles the connection with the hardware and provides housekeeping functionality (e.g. GUI, module start/stop). Implemented modules include object detection (referred to as filters), 3D localisation and a gaze controller interface based on the position of the object in the image plane (as provided by the filters). These modules are accessed through standardised interfaces, which make it easy to swap, reuse and extend them. For example, two further brain-inspired modules, a saliency map and a disparity map, have recently been added.
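To make the idea of interchangeable modules concrete, the following minimal sketch mimics a pathway of stages that share one interface, so that any stage can be swapped without touching the others. It is an in-process Python analogy only: the real icVision modules are separate YARP processes, and the class names `VisionModule`, `Invert`, `Threshold` and `Pathway` are hypothetical, not part of the framework's API.

```python
from abc import ABC, abstractmethod
from typing import List

Image = List[List[float]]  # row-major grayscale image (assumed representation)


class VisionModule(ABC):
    """Common interface: every stage maps an image to an image."""

    @abstractmethod
    def process(self, image: Image) -> Image:
        ...


class Invert(VisionModule):
    """Example stage: invert 8-bit intensities."""

    def process(self, image: Image) -> Image:
        return [[255.0 - p for p in row] for row in image]


class Threshold(VisionModule):
    """Example stage: binarise at a fixed level."""

    def __init__(self, level: float) -> None:
        self.level = level

    def process(self, image: Image) -> Image:
        return [[1.0 if p > self.level else 0.0 for p in row] for row in image]


class Pathway:
    """Runs stages in order; stages are interchangeable because they share one interface."""

    def __init__(self, modules: List[VisionModule]) -> None:
        self.modules = list(modules)

    def replace(self, index: int, module: VisionModule) -> None:
        self.modules[index] = module

    def run(self, image: Image) -> Image:
        for module in self.modules:
            image = module.process(image)
        return image


img = [[10.0, 200.0], [120.0, 250.0]]
pathway = Pathway([Invert(), Threshold(100.0)])
print(pathway.run(img))  # [[1.0, 0.0], [1.0, 0.0]]
pathway.replace(1, Threshold(50.0))
print(pathway.run(img))  # [[1.0, 1.0], [1.0, 0.0]]
```

Because `Pathway.run` depends only on `process`, replacing the threshold stage changes the output without any change to the rest of the chain.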
2.1 Detecting Objects (icVision Filter)
The first step performed by the human visual system, and the first we investigate, is segmentation (detection) in visual space, i.e. in the 2D images. There is a vast body of work on all aspects of image processing [3], using both classical and machine learning approaches. The icVision filter modules, which relate to Marr's first and second levels, provide object detection in the images. As input, each filter module receives the camera image in grayscale and split into its RGB and HSV channels. The output of a filter is a binary segmentation of the camera image for a specific object. Figure 2 shows a tea box being tracked by the iCub in real time using a learned filter; the binary output can also be seen in Figure 3.
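A minimal sketch of the filter input and output described above, assuming images are stored as nested lists of 8-bit (R, G, B) tuples. `split_channels` builds the grayscale, RGB and HSV planes using the standard-library `colorsys.rgb_to_hsv`, with grayscale computed from the BT.601 luma weights that OpenCV's RGB-to-gray conversion also uses. `red_object_filter` is a hypothetical hand-written filter that returns the binary segmentation; learned icVision filters consume the same inputs but are generated rather than written by hand.

```python
import colorsys
from typing import Dict, List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each 0..255
Plane = List[List[float]]

PLANES = ("gray", "r", "g", "b", "h", "s", "v")


def split_channels(image: List[List[Pixel]]) -> Dict[str, Plane]:
    """Build the grayscale, R, G, B, H, S and V planes handed to every filter."""
    planes: Dict[str, Plane] = {name: [] for name in PLANES}
    for row in image:
        out: Dict[str, List[float]] = {name: [] for name in PLANES}
        for r, g, b in row:
            # BT.601 luma weights for the grayscale plane
            out["gray"].append(0.299 * r + 0.587 * g + 0.114 * b)
            out["r"].append(float(r))
            out["g"].append(float(g))
            out["b"].append(float(b))
            # colorsys works on 0..1 floats; rescale H, S, V to 0..255
            h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
            out["h"].append(h * 255.0)
            out["s"].append(s * 255.0)
            out["v"].append(v * 255.0)
        for name in PLANES:
            planes[name].append(out[name])
    return planes


def red_object_filter(planes: Dict[str, Plane], margin: float = 60.0) -> List[List[int]]:
    """Hypothetical hand-written filter: 1 where red exceeds green by `margin`, else 0."""
    return [
        [1 if r - g > margin else 0 for r, g in zip(r_row, g_row)]
        for r_row, g_row in zip(planes["r"], planes["g"])
    ]


image = [[(200, 30, 30), (40, 40, 40)], [(180, 90, 60), (255, 255, 255)]]
print(red_object_filter(split_channels(image)))  # [[1, 0], [1, 0]]
```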
Using machine learning, more complex filters can be generated automatically instead of being engineered by hand. We apply Cartesian Genetic Programming (CGP) [14, 15] to automatically generate computer programs that use the functionality of the OpenCV image processing library [1], thereby incorporating domain knowledge. This provides an effective method for learning new object detection algorithms that are robust, provided the training set is chosen appropriately [4].
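The sketch below illustrates the core of Cartesian Genetic Programming under simplifying assumptions: a single row of nodes, each encoded as a function gene plus two connection genes that may only point to inputs or earlier nodes; an output gene selecting the result node; and the usual (1+4) evolutionary strategy with point mutation, in which offspring that are no worse than the parent replace it (neutral drift). Scalar arithmetic stands in for the OpenCV image operations used by the authors' CGP variant, and all names here are hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple

# Function set: scalar stand-ins for the image operations a CGP node would apply.
FUNCS: List[Callable[[float, float], float]] = [
    lambda a, b: a + b,
    lambda a, b: a - b,
    lambda a, b: a * b,
    lambda a, b: max(a, b),
]
LIMIT = 1e6  # clamp node outputs so repeated products stay finite

Node = Tuple[int, int, int]  # (function gene, connection gene A, connection gene B)


def active_nodes(genome: List[Node], output: int, n_inputs: int) -> List[int]:
    """Indices of nodes the output depends on; all other genes are inactive."""
    needed = {output}
    active = []
    for idx in range(n_inputs + len(genome) - 1, n_inputs - 1, -1):
        if idx in needed:
            active.append(idx)
            _, a, b = genome[idx - n_inputs]
            needed.update((a, b))
    return sorted(active)


def evaluate(genome: List[Node], output: int, inputs: Sequence[float]) -> float:
    values = {i: float(x) for i, x in enumerate(inputs)}
    for idx in active_nodes(genome, output, len(inputs)):  # feed-forward order
        f, a, b = genome[idx - len(inputs)]
        values[idx] = max(-LIMIT, min(LIMIT, FUNCS[f](values[a], values[b])))
    return values[output]


def mutate(genome: List[Node], output: int, n_inputs: int,
           rng: random.Random, rate: float = 0.1) -> Tuple[List[Node], int]:
    """Point mutation; connection genes may only point to inputs or earlier nodes."""
    child = []
    for i, (f, a, b) in enumerate(genome):
        idx = n_inputs + i
        if rng.random() < rate:
            f = rng.randrange(len(FUNCS))
        if rng.random() < rate:
            a = rng.randrange(idx)
        if rng.random() < rate:
            b = rng.randrange(idx)
        child.append((f, a, b))
    if rng.random() < rate:
        output = rng.randrange(n_inputs + len(genome))
    return child, output


def evolve(cases, n_inputs: int, n_nodes: int = 10, generations: int = 500,
           offspring: int = 4, seed: int = 0):
    """(1 + offspring) strategy; returns (genome, output, best_error, start_error)."""
    rng = random.Random(seed)
    genome = [(rng.randrange(len(FUNCS)), rng.randrange(n_inputs + i),
               rng.randrange(n_inputs + i)) for i in range(n_nodes)]
    output = rng.randrange(n_inputs + n_nodes)

    def error(g: List[Node], o: int) -> float:
        return sum(abs(evaluate(g, o, x) - y) for x, y in cases)

    best = start = error(genome, output)
    for _ in range(generations):
        children = [mutate(genome, output, n_inputs, rng) for _ in range(offspring)]
        child_err, child, child_out = min(
            ((error(g, o), g, o) for g, o in children), key=lambda t: t[0])
        if child_err <= best:  # accepting ties allows neutral drift
            genome, output, best = child, child_out, child_err
    return genome, output, best, start


# Hypothetical target: f(x, y) = x*y + x on a small grid of integer inputs
cases = [((x, y), x * y + x) for x in range(-2, 3) for y in range(-2, 3)]
genome, out, best, start = evolve(cases, n_inputs=2)
print("error:", start, "->", best)
```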
Fig. 2 The detection of a tea box under changing lighting conditions, performed by a learned filter. The binary output of the filter is shown as a red overlay.
¹ YARP [12] is a middleware that allows easy, distributed access to the sensors and actuators of the iCub humanoid robot, as well as to other software modules.
2.2 Locating Objects (icVision 3D)
To enable the robot to interact with the environment, it must first localise the objects in it. A robust localisation approach that can be deployed on a real humanoid robot is needed to provide the inputs for on-line motion planning, reaching and object manipulation.
The icVision 3D localisation module is one of the modules provided by the core framework. It converts between camera image coordinates and 3D coordinates in the robot reference frame. Using the object's location in the camera images (provided by an icVision filter module) and pose information from the hardware, this module calculates where the object is in the world. This information is then used to update the world model. Figure 3 illustrates the full 3D location estimation process, starting with the camera images received from the robot and ending with the localised object being placed in our MoBeE world model [2].
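Converting a point measured in a camera frame into the robot reference frame amounts to applying the camera's pose, which on a real robot is obtained from the joint encoders through forward kinematics. The following is a minimal sketch with 4x4 homogeneous transforms, assuming the pose is already available as a chain of rotations and translations; the link values in the usage example are made up, and the function names are hypothetical rather than icVision's API.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def identity() -> Matrix:
    return [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]


def rot_z(theta: float) -> Matrix:
    """Rotation about the z axis by theta radians, as a 4x4 homogeneous matrix."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0, 0.0], [s, c, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]


def rot_y(theta: float) -> Matrix:
    """Rotation about the y axis by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, 0.0, s, 0.0], [0.0, 1.0, 0.0, 0.0], [-s, 0.0, c, 0.0], [0.0, 0.0, 0.0, 1.0]]


def translation(x: float, y: float, z: float) -> Matrix:
    m = identity()
    m[0][3], m[1][3], m[2][3] = x, y, z
    return m


def compose(*mats: Matrix) -> Matrix:
    """Multiply transforms left to right: compose(A, B) == A @ B."""
    result = identity()
    for m in mats:
        result = [[sum(result[i][k] * m[k][j] for k in range(4)) for j in range(4)]
                  for i in range(4)]
    return result


def transform_point(t: Matrix, p: Sequence[float]) -> Tuple[float, float, float]:
    """Map a 3D point from the child (camera) frame into the parent (robot) frame."""
    hom = (p[0], p[1], p[2], 1.0)
    x, y, z = (sum(t[i][k] * hom[k] for k in range(4)) for i in range(3))
    return x, y, z


# Hypothetical chain: torso yaw 90 deg, 0.5 m offset up to the head, neck pitch 0.
camera_in_robot = compose(rot_z(math.pi / 2), translation(0.0, 0.0, 0.5), rot_y(0.0))
print(transform_point(camera_in_robot, (1.0, 0.0, 0.0)))  # approximately (0.0, 1.0, 0.5)
```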
Stereo vision describes the extraction of 3D information from digital images and is similar to the biological process of stereopsis in humans. Its basic principle is the comparison of images of the same scene taken from different viewpoints. To obtain a distance measure, the relative displacement (disparity) of a pixel between the two images is used [5]. While these approaches, based on projective geometry, have proven effective under carefully controlled experimental conditions, they are not easily transferred to robotics applications. Several approaches have previously been developed for the iCub platform. The ‘Cartesian controller module’ [16], for example, provides basic 3D position estimation and gaze control. This module works well on the simulated iCub; however, it is not fully supported on the hardware platform and therefore does not perform well there. The most accurate localisation module currently available for the iCub is the ‘stereoVision’ module, which provides an accuracy in the range of a few centimetres.
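In the idealised case that the projective-geometry approaches start from, two rectified cameras with parallel optical axes, depth follows directly from the disparity d = u_L - u_R as Z = f·B/d, where f is the focal length in pixels and B the baseline. The minimal sketch below assumes exactly that setup; the focal length, baseline and principal point in the example are illustrative values, not iCub calibration data. The iCub's moving, verging eyes violate this assumption, which is one reason such methods do not transfer easily to the robot.

```python
from typing import Tuple


def triangulate_rectified(u_left: float, v_left: float, u_right: float,
                          focal_px: float, baseline_m: float,
                          cx: float, cy: float) -> Tuple[float, float, float]:
    """3D point in the left-camera frame (metres) from a rectified stereo match.

    Assumes both cameras share focal length `focal_px` and principal point (cx, cy),
    with parallel optical axes separated horizontally by `baseline_m`.
    """
    disparity = u_left - u_right
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of both cameras")
    z = focal_px * baseline_m / disparity  # depth from similar triangles
    x = (u_left - cx) * z / focal_px       # back-project the left pixel
    y = (v_left - cy) * z / focal_px
    return x, y, z


# Illustrative numbers: f = 200 px, B = 0.068 m, 20 px disparity, so Z = 0.68 m
print(triangulate_rectified(170, 120, 150, 200, 0.068, 160, 120))  # roughly (0.034, 0.0, 0.68)
```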
The icVision 3D localisation module provides an easy way of swapping between
various localisation implementations, including the two mentioned. We also provide
an implementation estimating the location using machine learning [8, 10].
3 Use Cases and Current Application of the Framework
This framework has already been used successfully in our research. Here we give a short list of its use cases.
The full system has been used together with a reactive controller to enable the iCub to autonomously re-plan a motion to avoid an object it sees [2]. The object is placed into the world model purely from vision, and its position is updated in real time, even while the robot is moving.
Specific filters for certain objects were learned using CGP, as mentioned above [4, 9]. To allow for a more autonomous acquisition of object representations, icVision filters are learned from objects perceived by the cameras. By using a saliency map and standard feature detectors, we were able to provide the inputs needed by our CGP learner to build robust filters [7]. We are currently learning filters for the robot's fingers, to investigate how the humanoid can develop sensorimotor control.
Fig. 3 The 3D location estimation works as follows. First, the camera images are acquired from the hardware via YARP. The images are converted into grayscale, split into RGB/HSV channels and distributed to all active icVision filters. Each filter then processes the received images using OpenCV functions, ending with a thresholding operation. The output is a binary image segmenting the object to be localised. A blob detection algorithm is run on these binary images to find the (centre) location of the detected object in the image frame. The position of the object in both the right and left camera images is sent to the 3D localisation module, where, together with the robot's pose (i.e. the joint encoder readings), a 3D location estimate is generated. As the last step, the localised object is placed in the existing world model.
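The blob detection step in the Figure 3 pipeline can be sketched as connected-component labelling of a filter's binary output, followed by taking the centroid of the largest component. The paper does not specify which blob detector icVision uses, so the choices here (4-connectivity, breadth-first search, largest blob wins) are assumptions, and `largest_blob_centroid` is a hypothetical name.

```python
from collections import deque
from typing import List, Optional, Tuple


def largest_blob_centroid(mask: List[List[int]]) -> Optional[Tuple[float, float]]:
    """Centroid (x, y) in pixels of the largest 4-connected blob of non-zero pixels."""
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    best: List[Tuple[int, int]] = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # breadth-first flood fill collects one connected component
            blob = []
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                blob.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if 0 <= ny < rows and 0 <= nx < cols and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(blob) > len(best):
                best = blob
    if not best:
        return None  # the filter found nothing in this image
    cx = sum(x for _, x in best) / len(best)
    cy = sum(y for y, _ in best) / len(best)
    return cx, cy


mask = [[0, 1, 1, 0, 0],
        [0, 1, 1, 0, 1],
        [0, 0, 0, 0, 1]]
print(largest_blob_centroid(mask))  # (1.5, 0.5)
```

Running it on the left and right masks would yield the pair of image positions that the pipeline passes on to the 3D localisation module.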
To add our 3D localisation approach to the framework, we used a Katana robotic arm to teach the iCub to perceive the location of the objects it sees. The Katana positions an object within the shared workspace and informs the iCub of its location. The iCub then moves to observe the object from various angles and poses. Its pose and the 2D position outputs provided by an icVision filter are used to train artificial neural networks (ANN) to estimate the object's Cartesian location. We show that satisfactory localisation results can be obtained [10]. Furthermore, we demonstrate that this task can be accomplished safely, using collision avoidance software to prevent collisions between multiple robots in the same workspace [2].
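The learned localisation described above maps the robot's pose and the 2D filter outputs to a Cartesian position. The sketch below shows the kind of small regression network involved: one hidden tanh layer trained by full-batch gradient descent on the mean squared error. It is not the authors' network; the input layout, layer size, learning rate and the synthetic target function are all assumptions chosen for illustration, whereas a real setup would predict all three coordinates, with the positions reported by the Katana as training targets.

```python
import math
import random
from typing import List, Sequence, Tuple


class TinyMLP:
    """One tanh hidden layer and a linear output, trained on mean squared error."""

    def __init__(self, n_in: int, n_hidden: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def _hidden(self, x: Sequence[float]) -> List[float]:
        return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                for row, b in zip(self.w1, self.b1)]

    def predict(self, x: Sequence[float]) -> float:
        return sum(w * h for w, h in zip(self.w2, self._hidden(x))) + self.b2

    def mse(self, xs: List[List[float]], ys: List[float]) -> float:
        return sum((self.predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

    def train(self, xs: List[List[float]], ys: List[float],
              lr: float = 0.05, epochs: int = 500) -> None:
        """Full-batch gradient descent with backpropagation."""
        n, n_in, n_hid = len(xs), len(self.w1[0]), len(self.w2)
        for _ in range(epochs):
            gw1 = [[0.0] * n_in for _ in range(n_hid)]
            gb1 = [0.0] * n_hid
            gw2 = [0.0] * n_hid
            gb2 = 0.0
            for x, y in zip(xs, ys):
                h = self._hidden(x)
                out = sum(w * hj for w, hj in zip(self.w2, h)) + self.b2
                d_out = 2.0 * (out - y) / n  # derivative of the mean squared error
                gb2 += d_out
                for j in range(n_hid):
                    gw2[j] += d_out * h[j]
                    d_h = d_out * self.w2[j] * (1.0 - h[j] ** 2)  # tanh' = 1 - tanh^2
                    gb1[j] += d_h
                    for i in range(n_in):
                        gw1[j][i] += d_h * x[i]
            # apply the accumulated gradients only after the full pass
            self.b2 -= lr * gb2
            for j in range(n_hid):
                self.w2[j] -= lr * gw2[j]
                self.b1[j] -= lr * gb1[j]
                for i in range(n_in):
                    self.w1[j][i] -= lr * gw1[j][i]


def make_synthetic_data(n: int, seed: int = 1) -> Tuple[List[List[float]], List[float]]:
    """Made-up smooth mapping from (left u, right u, head yaw) to one coordinate; not iCub data."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        u_left, u_right, yaw = (rng.uniform(-1.0, 1.0) for _ in range(3))
        xs.append([u_left, u_right, yaw])
        ys.append(0.5 * (u_left + u_right) + 0.3 * math.sin(yaw))
    return xs, ys


xs, ys = make_synthetic_data(40)
net = TinyMLP(n_in=3, n_hidden=8)
print("MSE before:", net.mse(xs, ys))
net.train(xs, ys)
print("MSE after: ", net.mse(xs, ys))
```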
4 Conclusions
We combine current machine learning and computer vision research to build a biologically-inspired, cognitive framework for the iCub humanoid robot. The icVision framework facilitates the autonomous development of new robot controllers. Cognition and perception are seen as the foundation for developmental mechanisms, such as sensorimotor coordination, intrinsic motivation and hierarchical learning, which are investigated on the robotic platform.
The focus on vision has two reasons: firstly, the limited sensing capabilities of the robotic platform and, secondly, the fact that vision is the most important sense for humans. As we use a humanoid robot, investigating how humans perform the tasks of perception, detection and tracking of objects is of interest. These capabilities facilitate the building of a world model, which is used for tasks such as motion planning and grasping. Real-time, incremental learning is applied to further improve perception and the model of the environment and of the robot itself. Learning to grasp and basic hand-eye coordination are the research areas to which this framework is currently applied.
References
1. Bradski, G.: The OpenCV Library. Dr. Dobb's Journal of Software Tools (2000)
2. Frank, M., et al.: The modular behavioral environment for humanoids and other robots (MoBeE). In: Int'l. Conference on Informatics in Control, Automation and Robotics (2012)
3. Gonzalez, R.C., Woods, R.E.: Digital Image Processing. Prentice Hall (2002)
4. Harding, S., Leitner, J., Schmidhuber, J.: Cartesian genetic programming for image processing. In: Genetic Programming Theory and Practice X (to appear). Springer (2012)
5. Hartley, R., Zisserman, A.: Multiple View Geometry in Computer Vision. Cambridge University Press (2000)
6. Hubel, D., Wensveen, J., Wick, B.: Eye, Brain, and Vision. Scientific American Library (1995)
7. Leitner, J., et al.: Autonomous learning of robust visual object detection on a humanoid (2012). (submitted to IEEE Int'l. Conference on Developmental Learning and Epigenetic Robotics)
8. Leitner, J., et al.: Learning spatial object localisation from vision on a humanoid robot. (submitted to International Journal of Advanced Robotic Systems) (2012)
9. Leitner, J., et al.: Mars terrain image classification using Cartesian genetic programming. In: International Symposium on Artificial Intelligence, Robotics & Automation in Space (2012)
10. Leitner, J., et al.: Transferring spatial perception between robots operating in a shared workspace. (submitted to IROS) (2012)
11. Marr, D.: Vision: A Computational Investigation into the Human Representation and Processing of Visual Information. Freeman & Co., San Francisco (1982)
12. Metta, G., Fitzpatrick, P., Natale, L.: YARP: Yet Another Robot Platform. International Journal of Advanced Robotic Systems 3(1) (2006)
13. Metta, G., et al.: The iCub humanoid robot: An open-systems platform for research in cognitive development. Neural Networks 23(8-9), 1125–1134 (2010)
14. Miller, J.: An empirical study of the efficiency of learning boolean functions using a Cartesian genetic programming approach. In: Genetic and Evolutionary Computation Conference
15. Miller, J.: Cartesian genetic programming. In: Cartesian Genetic Programming, pp. 17–34. Springer (2011)
16. Pattacini, U.: Modular Cartesian Controllers for Humanoid Robots: Design and Implementation on the iCub. Ph.D. thesis, RBCS, Italian Institute of Technology, Genova (2011)
17. Tsagarakis, N.G., et al.: iCub: the design and realization of an open humanoid platform for cognitive and neuroscience research. Advanced Robotics 21, 1151–1175 (2007)
... This makes it both efficient to evolve solutions and run the final programs. Our existing computer vision framework enables this code to be used directly with our robots [32]. ...
... Although it detects the blue cup in all test images it has also a few false positives, due to its simplicity. icImage is a wrapper class for the OpenCV functionality and memory management within our framework [32]. ...
Conference Paper
Full-text available
We propose a method for learning specific object representations that can be applied (and reused) in visual detection and identification tasks. A machine learning technique called Cartesian Genetic Programming (CGP) is used to create these models based on a series of images. Our research investigates how manipulation actions might allow for the development of better visual models and therefore better robot vision. This paper describes how visual object representations can be learned and improved by performing object manipulation actions, such as, poke, push and pick-up with a humanoid robot. The improvement can be measured and allows for the robot to select and perform the ‘right’ action, i.e. the action with the best possible improvement of the detector.
... Our implementation of CGP-IP generates human readable C# or C++ code based on OpenCV functions. The code can be compiled and directly be used with our robots within our computer vision framework [33]. It is typically found that this process reduces the number of used instructions, and hence reduces the execution time of the evolved program. ...
... (19); return node49; } Listing 1. Generated C# code from CGP-IP for detecting the Nao's fingers. icImage is a wrapper class to allow portability within our framework [33]. ...
Conference Paper
Full-text available
Robust object manipulation is still a hard problem in robotics, even more so in high degree-of-freedom (DOF) humanoid robots. To improve performance a closer integration of visual and motor systems is needed. We herein present a novel method for a robot to learn robust detection of its own hands and fingers enabling sensorimotor coordination. It does so solely using its own camera images and does not require any external systems or markers. Our system based on Cartesian Genetic Programming (CGP) allows to evolve programs to perform this image segmentation task in real-time on the real hardware. We show results for a Nao and an iCub humanoid each detecting its own hands and fingers.
... The A* path planning algorithm is an example of a gridbased algorithm [20], [21]. The algorithm starts from the starting point and then gives a cost value for each neighbor grid-point. ...
Full-text available
Path quality and computational time have formed together a well-known trade-off problem for path planning techniques. Due to this trade-off, contributions were usually considering improving only one of the two aspects, either increasing the swiftness as in real-time robotic path planning algorithms or enhancing the path quality as in shortest path query algorithms. Producing a path planning technique that targets both aspects is a challenging problem for robotic systems. However, this paper proposes a novel path planning framework that controls the motion of robotic systems and aims to overcome this traditional trade-off challenge, by targeting both, decreasing the computational time, and improving the path quality represented by the path length and smoothness. The shortest path is obtained by minimizing a novel objective function inspired by the artificial potential field methodology. To accelerate the execution, the Particle Swarm Optimization (PSO) technique is adopted to obtain the optimal solution in a real-time hop-by-hop manner imitating the procedures performed by computer network routing protocols. Testbed experimental results have proven the effectiveness of the proposed technique and showed superior performance over other meta-heuristic optimization techniques and over classical path planning approaches such as A*, D*, and PRM.
... The A* path planning technique is an example of a gridbased scheme [14], [15]. The methodology in finding the shortest path is to assign a cost value to each grid point which reflects the distance from each grid point to the destination. ...
Full-text available
The traditional trade-off between execution speed and path quality has forced real-time robotic path planning algorithms to sacrifice path quality in order to execute in real-time. Producing a path planning algorithm that targets enhancing both, the path quality and swiftness is a challenging problem. However, this paper proposes a novel path planning strategy that aims to break this traditional trade-off, by targeting both, increasing the swiftness, and enhancing the path quality represented by the path length and smoothness. The proposed strategy is based on the observation that most path planning algorithms waste the processing efforts in less critical areas of the map. Therefore, the proposed path planning strategy tends to focus on critical areas such as the areas around obstacles and areas around the goal point, and exhausts the processing power on these critical areas. This is done by neglecting all static obstacles that do not lie between the robot and the destination. For obstacles that intersect with the linear line from the robot to the destination, a basis traditional path planning algorithm such as A*, D* or the Probabilistic RoadMap (PRM) technique is only implemented around the obstacles in order to find a feasible path around each selected obstacle. This procedure would minimize the computational efforts compared to applying the basis algorithm on the whole map. Finally, the path quality is enhanced by finding any linear shortcuts between any two points in the path and fix these shortcuts as the final path from the starting point to the goal point. The proposed path planning strategy was tested on a P3-DX Pioneer mobile robot using a kinematic controller. 
The experimental results have proven that the path planning strategy was able to show a superior advantage over other path planning techniques in both aspects, computational time (reached up to 97.05% improvement) and path quality (reached up to 16.21% improvement for path length and 98.50% for smoothness).
... One such example is saliency, which models the ability to prioritize visual stimuli based on their features or relevance to the task. As such, saliency is often used to find regions of interest in the scene [Kismet (Breazeal et al. 2001), ARCA-DIA (Bridewell and Bello 2015), DIARC , iCub (Leitner et al. 2013), STAR (Kotseruba 2016)]. Another variation on the saliency map, found in robotic architectures, is ego-sphere. ...
Full-text available
In this paper we present a broad overview of the last 40 years of research on cognitive architectures. To date, the number of existing architectures has reached several hundred, but most of the existing surveys do not reflect this growth and instead focus on a handful of well-established architectures. In this survey we aim to provide a more inclusive and high-level overview of the research on cognitive architectures. Our final set of 84 architectures includes 49 that are still actively developed, and borrow from a diverse set of disciplines, spanning areas from psychoanalysis to neuroscience. To keep the length of this paper within reasonable limits we discuss only the core cognitive abilities, such as perception, attention mechanisms, action selection, memory, learning, reasoning and metareasoning. In order to assess the breadth of practical applications of cognitive architectures we present information on over 900 practical projects implemented using the cognitive architectures in our list. We use various visualization techniques to highlight the overall trends in the development of the field. In addition to summarizing the current state-of-the-art in the cognitive architecture research, this survey describes a variety of methods and ideas that have been tried and their relative success in modeling human cognitive abilities, as well as which aspects of cognitive behavior need more research with respect to their mechanistic counterparts and thus can further inform how cognitive science might progress.
... At IDSIA over the last years I developed the icVision framework [Leitner et al., 2012c[Leitner et al., , 2013b. It aims to be a tool (or more a suite of tools) for an easier development, testing and integration of the on-going computer vision research into the real hardware. ...
Full-text available
Although robotics research has seen advances over the last decades robots are still not in widespread use outside industrial applications. Yet a range of proposed scenarios have robots working together, helping and coexisting with humans in daily life. In all these a clear need to deal with a more unstructured, changing environment arises. I herein present a system that aims to overcome the limitations of highly complex robotic systems, in terms of autonomy and adaptation. The main focus of research is to investigate the use of visual feedback for improving reaching and grasping capabilities of complex robots. To facilitate this a combined integration of computer vision and machine learning techniques is employed. From a robot vision point of view the combination of domain knowledge from both imaging processing and machine learning techniques, can expand the capabilities of robots. I present a novel framework called Cartesian Genetic Programming for Image Processing (CGP-IP). CGP-IP can be trained to detect objects in the incoming camera streams and successfully demonstrated on many different problem domains. The approach requires only a few training images (it was tested with 5 to 10 images per experiment) is fast, scalable and robust yet requires very small training sets. Additionally, it can generate human readable programs that can be further customized and tuned. While CGP-IP is a supervised-learning technique, I show an integration on the iCub, that allows for the autonomous learning of object detection and identification. Finally this dissertation includes two proof-of-concepts that integrate the motion and action sides. First, reactive reaching and grasping is shown. It allows the robot to avoid obstacles detected in the visual stream, while reaching for the intended target object. Furthermore the integration enables us to use the robot in non-static environments, i.e. the reaching is adapted on-the- fly from the visual feedback received, e.g. 
when an obstacle is moved into the trajectory. The second integration highlights the capabilities of these frameworks, by improving the visual detection by performing object manipulation actions.
... Humanoid robot James [15,16] was equipped with two artificial eyes, which can pan and tilt independently (totally 4 DOFs.). Thus, the iCub [17,18] also had two artificial eyes with 3 DOFs, offering viewing and tracking motions. Wang et al. [19] devised a novel humanoid robot eye, which is driven by six pneumatic artificial muscles (PAMs) and rotates with 3 DOFs. ...
Full-text available
A symmetric Kullback-Leibler metric based tracking system, capable of tracking moving targets, is presented for a bionic spherical parallel mechanism to minimize a tracking error function to simulate smooth pursuit of human eyes. More specifically, we propose a real-time moving target tracking algorithm which utilizes spatial histograms taking into account symmetric Kullback-Leibler metric. In the proposed algorithm, the key spatial histograms are extracted and taken into particle filtering framework. Once the target is identified, an image-based control scheme is implemented to drive bionic spherical parallel mechanism such that the identified target is to be tracked at the center of the captured images. Meanwhile, the robot motion information is fed forward to develop an adaptive smooth tracking controller inspired by the Vestibuloocular Reflex mechanism. The proposed tracking system is designed to make the robot track dynamic objects when the robot travels through transmittable terrains, especially bumpy environment. To perform bumpy-resist capability under the condition of violent attitude variation when the robot works in the bumpy environment mentioned, experimental results demonstrate the effectiveness and robustness of our bioinspired tracking system using bionic spherical parallel mechanism inspired by head-eye coordination.
... We are not the first to use a deep network representation for GP-based image processing. GP-based network systems have already generated successful image segmentation methods [33,23], which are partial inspiration for this work. There are several important differences with our model, however: firstly, we aim for feature extraction for classification; secondly, our system co-evolves both network structure and the component sub-representations simultaneously; and thirdly, our system uses a classifier as a wrapper, meaning that the extracted features need to support a classifier rather than solve the problem directly. ...
Full-text available
We propose an evolutionary feature creator (EFC) to explore a non-linear and offline method for generating features in image recognition tasks. Our model aims at extracting low-level features automatically when provided with an arbitrary image database. In this work, we are concerned with the addition of algorithmic depth to a genetic programming (GP) system, hypothesizing that it will improve the capacity for solving problems that require high-level, hierarchical reasoning. For this we introduce a network superstructure that co-evolves with our low-level GP representations. Two approaches are described: the first uses our previously used "shallow" GP system, the second presents a new "deep" GP system that involves this network superstructure. We evaluate these models against a benchmark object recognition database. Results show that the deep structure outperforms the shallow one in generating features that support classification, and does so without requiring significant additional computational time. Further, high accuracy is achieved on the standard ETH-80 classification task, also outperforming many existing specialized techniques. We conclude that our EFC is capable of data-driven extraction of useful features from an object recognition database.
Full-text available
We describe our software system enabling a tight integration between vision and control modules on complex, high-DOF humanoid robots. This is demonstrated with the iCub humanoid robot performing visual object detection, reaching and grasping actions. A key capability of this system is reactive avoidance of obstacle objects detected from the video stream while carrying out reach-and-grasp tasks. The subsystems of our architecture can independently be improved and updated, for example, we show that by using machine learning techniques we can improve visual perception by collecting images during the robot’s interaction with the environment. We describe the task and software design constraints that led to the layered modular system architecture.
An Image-based visual servo system is presented for a bionic spherical parallel mechanism to minimize a tracking error function to analog smooth pursuit of human eye, which is also called eye-in-hand visual servoing, capable of tracking moving target. More specially, we propose a real-time moving target tracking algorithms which utilizes perceptive image hash based on Discrete Cousin Transform, collaborative in particle filtering framework to achieve automatic moving target detection. In the proposed algorithm, the key geometry position features of the target are extracted to detect and identify the target. Once the target is identified, an image-based control scheme is implemented to drive bionic spherical parallel mechanism such that the identified target is to be tracked at the center of the captured images. Experimental results demonstrate the effectiveness and robustness of our visual servo system for bionic spherical parallel mechanism.
We present a combined machine learning and computer vision approach for robots to localize objects. It allows our iCub humanoid to quickly learn to provide accurate 3D position estimates (in the centimetre range) of objects seen. Biologically inspired approaches, such as Artificial Neural Networks (ANN) and Genetic Programming (GP), are trained to provide these position estimates using the two cameras and the joint encoder readings. No camera calibration or explicit knowledge of the robot’s kinematic model is needed. We find that ANN and GP are not only faster and less complex than traditional techniques but also learn without extensive calibration procedures. In addition, the approach localizes objects robustly when they are placed at arbitrary positions in the robot’s workspace, even while the robot is moving its torso, head and eyes.
In this work we introduce a technique for a humanoid robot to autonomously learn the representations of objects within its visual environment. Our approach involves an attention mechanism in association with feature-based segmentation that explores the environment and provides object samples for training. These samples are learned for further object identification using Cartesian Genetic Programming (CGP). The learned identification is able to provide robust and fast segmentation of the objects, without using features. We showcase our system and its performance on the iCub humanoid robot.
Combining domain knowledge about both image processing and machine learning techniques can expand the abilities of Genetic Programming when used for image processing. We successfully demonstrate our new approach on several problem domains. We show that the approach is fast, scalable and robust. In addition, by virtue of using off-the-shelf image processing libraries, we can generate human-readable programs that incorporate sophisticated domain knowledge.
We use a Katana robotic arm to teach an iCub humanoid robot how to perceive the location of the objects it sees. To do this, the Katana positions an object within the shared workspace and tells the iCub where it has placed it. While the iCub moves, it observes the object, and a neural network then learns how to relate its pose and visual inputs to the object location. We show that satisfactory results can be obtained for localisation even in scenarios where the kinematic model is imprecise or not available. Furthermore, we demonstrate that this task can be accomplished safely. For this task we extend our collision avoidance software for the iCub to prevent collisions between multiple, independently controlled, heterogeneous robots in the same workspace.
To produce even the simplest human-like behaviors, a humanoid robot must be able to see, act, and react within a tightly integrated behavioral control system. Although there exists a rich body of literature in Computer Vision, Path Planning, and Feedback Control, wherein many critical subproblems are addressed individually, most demonstrable behaviors for humanoid robots do not effectively integrate elements from all three disciplines. Consequently, tasks that seem trivial to us humans, such as pick-and-place in an unstructured environment, remain far beyond the state of the art in experimental robotics. We view this primarily as a software engineering problem, and have therefore developed MoBeE, a novel behavioral framework for humanoids and other complex robots, which integrates elements from vision, planning, and control, facilitating the synthesis of autonomous, adaptive behaviors. We communicate the efficacy of MoBeE through several demonstrative experiments. We first develop Adaptive Roadmap Planning by integrating a reactive feedback controller into a roadmap planner. Then, an industrial manipulator teaches a humanoid to localize objects as the two robots operate autonomously in a shared workspace. Finally, an integrated vision, planning and control system is applied to a real-world reaching task using the humanoid robot.
We are interested in engineering smart machines that enable backtracking of emergent behaviors. Our SSNNS simulator consists of hand-picked tools for exploring spiking neural networks flexibly and in greater depth. SSNNS is based on the Spike Response ...
In this chapter, we describe the original and most widely known form of Cartesian genetic programming (CGP). CGP encodes computational structures, which we call ‘programs’, in the form of directed acyclic graphs. We refer to this as ‘classic’ CGP. However, these programs may be computer programs, circuits, rules, or other specialized computational entities.
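To make the graph encoding concrete, here is a minimal sketch of decoding and evaluating a classic-CGP-style genotype. It rests on our own assumptions: a single row of two-input nodes, each encoded as integer genes `(function, input_a, input_b)`, program inputs occupying addresses `0..n_inputs-1`, no levels-back limit, and an arbitrary three-function set. Only nodes reachable from the output genes are active; the remaining genes are inactive and never evaluated. The evolutionary search itself is not shown.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical function set; classic CGP leaves the choice of primitives to the user.
FUNCTIONS: List[Callable[[float, float], float]] = [
    lambda a, b: a + b,  # gene value 0
    lambda a, b: a - b,  # gene value 1
    lambda a, b: a * b,  # gene value 2
]

# (function index, first input address, second input address)
Node = Tuple[int, int, int]


def active_nodes(nodes: Sequence[Node], outputs: Sequence[int], n_inputs: int) -> List[int]:
    """Return the indices of nodes reachable from the output genes."""
    active = set()
    stack = [o for o in outputs if o >= n_inputs]  # outputs may also point at raw inputs
    while stack:
        addr = stack.pop()
        idx = addr - n_inputs
        if idx in active:
            continue
        active.add(idx)
        _, a, b = nodes[idx]
        # Connections must point to earlier addresses, which keeps the graph acyclic.
        if a >= addr or b >= addr:
            raise ValueError("connection genes must point to earlier addresses")
        stack.extend(x for x in (a, b) if x >= n_inputs)
    return sorted(active)


def evaluate(nodes: Sequence[Node], outputs: Sequence[int], inputs: Sequence[float]) -> List[float]:
    """Evaluate only the active nodes in address order (a valid topological order)."""
    n_inputs = len(inputs)
    values = dict(enumerate(inputs))  # address -> computed value
    for idx in active_nodes(nodes, outputs, n_inputs):
        f, a, b = nodes[idx]
        values[n_inputs + idx] = FUNCTIONS[f](values[a], values[b])
    return [values[o] for o in outputs]
```

For example, with two inputs the genotype `[(0, 0, 1), (2, 2, 2), (1, 0, 1)]` and output gene `[3]` computes `(x0 + x1) * (x0 + x1)`; the third node is inactive, so `evaluate(genome, [3], [2.0, 3.0])` returns `[25.0]`.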