Towards Spatial Perception: Learning to Locate Objects From Vision
Jürgen Leitner, Simon Harding, Mikhail Frank, Alexander Förster, Jürgen Schmidhuber
Dalle Molle Institute for Artificial Intelligence (IDSIA),
Scuola universitaria professionale della Svizzera Italiana (SUPSI), and
Università della Svizzera Italiana (USI), Lugano, Switzerland
juxi@idsia.ch
Abstract
Our humanoid robot learns to provide position estimates of objects placed on a table, with centimetre-range accuracy, even while the robot is moving its torso, head and eyes. These estimates are provided by trained artificial neural networks (ANN) and a genetic programming (GP) method, based solely on the inputs from the two cameras and the joint encoder positions. No prior camera calibration or kinematic model is used. We find that ANN and GP are both able to localise objects robustly regardless of the robot's pose and without an explicit kinematic model or camera calibration. These approaches yield an accuracy comparable to current techniques used on the iCub.
Index Terms: spatial understanding, object localisation, humanoid robot, neural network, genetic programming
1. Introduction
The majority of robotic systems in use today still perform mainly pre-programmed automation tasks. In recent years, progress has been made in enabling these systems to behave more autonomously. Increasing these capabilities is necessary for the future use of robots in settings of daily living, such as household tasks, grocery shopping and elderly care. An important step towards autonomous decisions and actions is perceiving the state of the environment. Perception, however, is still a hard problem in robotics.
We are interested in robust approaches to visual perception, with applications to object localisation while the robot is controlling its torso, head and gaze. The localisation will be used in combination with on-line motion planning for object manipulation tasks on a real humanoid robot. In this work, we focus on a machine learning setup that provides the robot with a method to estimate the location of objects relative to itself in 3D Cartesian space. Our research platform is the iCub humanoid robot [1], an open robotic system providing a 41 degree-of-freedom (DOF) upper body, comprising two arms, a head and a torso. Its visual system is a pair of cameras mounted in the head in a human-like fashion (see Fig. 1), providing passive, binocular images.
The problem of localising objects in 3D Cartesian space given two images from cameras in different locations is widely known in the computer vision literature as 'Stereo Vision'. In the following discussion, CSL and CSR refer to the local reference frames of the left and right cameras respectively, and CSBody is the reference frame of the body; as the body is mounted at a fixed point, this is also the reference frame chosen for the environment. Therefore CSWorld denotes the common environmental reference frame, in which we seek to express object locations. Cameras that photograph the same scene from two different locations provide different 2D projections of the 3D scene. If the 'intrinsic parameters' that specify each camera's projection from 3D to 2D, as well as the 'fundamental matrix' that encodes the rigid-body transformation between CSL and CSR, are known, and if there are some features of the scene that can be identified in both images, then the 3D locations of those features can be triangulated. For a thorough review of approaches based on this principle, we refer the interested reader to [2].
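For context, the sketch below illustrates this classical triangulation step with OpenCV; it is not the method used in this paper, and the intrinsic matrix, baseline and pixel coordinates are hypothetical placeholders that would normally come from a calibration procedure.

```python
# A minimal sketch of classical two-view triangulation (the pipeline this paper
# avoids): assumes known camera intrinsics and extrinsics.
import numpy as np
import cv2

# Hypothetical intrinsic matrix and a purely translational baseline.
K = np.array([[320.0,   0.0, 320.0],
              [  0.0, 320.0, 240.0],
              [  0.0,   0.0,   1.0]])
P_L = K @ np.hstack([np.eye(3), np.zeros((3, 1))])                   # left camera, CSL as reference
P_R = K @ np.hstack([np.eye(3), np.array([[-0.07], [0.0], [0.0]])])  # right camera, ~7 cm baseline

# Pixel coordinates of the same scene feature in both images, shape (2, N).
pts_L = np.array([[479.0], [411.0]])
pts_R = np.array([[503.0], [437.0]])

X_h = cv2.triangulatePoints(P_L, P_R, pts_L, pts_R)  # homogeneous 4xN result
X = (X_h[:3] / X_h[3]).ravel()                       # 3D point expressed in CSL
print("triangulated point in CSL:", X)
```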
While traditional stereo vision approaches based on projective geometry have proven effective under carefully controlled experimental circumstances, they are not ideally suited to most robotics applications. Intrinsic camera parameters and the fundamental matrix may be unknown or time varying, which requires the frequent repetition of lengthy calibration procedures, wherein known, structured objects are viewed by the stereo vision system and the required parameters are estimated by numerical algorithms. Even assuming a solution to the standard stereo vision problem, applying it to a real physical robot to facilitate object manipulation remains a challenge. In many robotics applications, it is inconvenient to express the environment with respect to a camera. For example, from a planning and control standpoint, the most logical choice of coordinate system is CSWorld, the reference frame at the base of the manipulator, which does not move with respect to the environment. In order to transform coordinates from CSL or CSR to CSWorld, such that we can model objects and control the robot in the same frame of reference, an accurate kinematic model of the robot is required. If such a model is available, it must be carefully calibrated against the actual hardware, and even then its accuracy may be limited by un-modelled nonlinearities.
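To make the model-based alternative concrete, the following sketch shows the frame transform that such a kinematic model would have to supply; the pose matrix here is a hypothetical placeholder, whereas on the real robot it is the product of the transforms along the torso-neck-eye chain, which is exactly where model errors accumulate.

```python
# Sketch of mapping a camera-frame point into CSWorld via a (hypothetical)
# forward-kinematics result; not part of the learning approach proposed here.
import numpy as np

def homogeneous(R, t):
    """Assemble a 4x4 rigid-body transform from a rotation matrix and a translation vector."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

# Hypothetical pose of the left camera in CSWorld for one robot configuration.
T_world_cam = homogeneous(np.eye(3), np.array([0.0, 0.34, 0.52]))

p_cam = np.array([0.05, -0.02, 0.40, 1.0])  # a point expressed in CSL (homogeneous)
p_world = T_world_cam @ p_cam               # the same point expressed in CSWorld
print("point in CSWorld:", p_world[:3])
```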
We show that localisation can be learned without explicit knowledge of the camera parameters or the kinematic model.
Figure 1: The coordinate frames relevant for object localisation on the iCub: the 3-DOF torso, the 3-DOF neck and the 3-DOF eyes (pan/tilt and vergence). Cameras located at the origin of CSL/CSR are used to express the position of objects with respect to CSWorld.
Table 1: A typical entry from the dataset and the limits used to scale the features and locations for the neural network.

          ImageL        ImageR        Neck                   Eyes                   Torso                Location
          X      Y      X      Y      0      1      2        3      4      5        0      1      2     X      Y      Z
Vector    v0     v1     v2     v3     v4     v5     v6       v7     v8     v9       v10    v11    v12   p0     p1     p2
Example   479    411    503    437    -10.0  0.0    0.0      -19.9  -19.9  0.0      -0.1   -9.9   10.1  0.42   0.27   -0.12
max       640    480    640    480    25     25     10       20     15     5        20     20     50    0.66   0.5    0.55
min       0      0      0      0      -25    -25    -10      -20    -15    0        -20    -20    0     0.00   -0.5   -0.15
2. Previous Work
Several different localisation systems have previously been developed for the iCub. A popular representation for (stereo) vision research is based on log-polar transformed images. This biologically inspired approach first applies a transformation to the camera images before typical stereo vision algorithms are used. The available module currently supports only a static head, i.e. it puts the object position in the CSL/R coordinate frame. The 'Cartesian controller module' provides another basic 3D position estimation functionality [3]. This module works well on the simulated robot; however, its performance on the hardware platform is weak because of inaccuracies in the robot model and camera parameters. The most accurate localisation module currently available for the iCub is the 'stereoVision' module, providing centimetre accuracy. Unlike the log-polar approach, this module employs the entire iCub kinematic model, providing a position estimate in the CSWorld coordinate frame. The module requires the previously mentioned 'Cartesian controller' and uses tracking of features to improve the kinematic model of the camera pair by continuously estimating a new fundamental matrix. The precision of all of these approaches depends upon an accurate kinematic model of the iCub; a very accurate model, or estimation of the model, is therefore necessary.
There currently exists no module estimating the kinematics of the iCub; for other robotic systems this has been done: Gloye et al. used visual feedback to learn the model of a holonomic wheeled robot [4], and Bongard et al. used sensory feedback to learn the model of a legged robot [5], but their method uses no high-dimensional sensory information (such as images). In robot learning, especially imitation learning, various approaches have been investigated to tackle these problems. Sauser & Billard have investigated the problem of reference frame transformations from a neuroscience perspective [6]. They were able to imitate gestures from a teacher on a Hoap-2 humanoid robot with external fixed cameras. Though promising, their approach has so far not been extended to systems with non-stationary cameras.
3. Machine Learning Approach
In this paper we investigate two biologically inspired machine learning approaches: a feed-forward artificial neural network (ANN) and a genetic programming (GP) approach. These techniques use supervised learning, requiring a dataset including both inputs and outputs (ground truth). More formally, the task is to estimate the position of an object p ∈ R^3 in the robot's reference frame (CSWorld) given an input, also called feature vector, v. Here we define v ∈ R^13, containing the state of the robot as described by 9 joint encoder values (i.e. the 9 controlled DOF) and the observed position of the object in both camera images.
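The layout of one sample follows Table 1; the sketch below assembles such a sample for illustration, with the function and variable names being ours rather than taken from the authors' code.

```python
# Sketch of how one sample (v, p) is assembled, following the layout of Table 1:
# four pixel coordinates of the detected object plus nine joint encoder values
# form v in R^13, and the hand-measured Cartesian position forms p in R^3.
import numpy as np

def make_sample(uv_left, uv_right, head_joints, torso_joints, object_position):
    """uv_*: (x, y) pixel position of the object in each camera image;
    head_joints: 6 encoder values (neck + eyes); torso_joints: 3 encoder values;
    object_position: hand-measured (X, Y, Z) in CSWorld."""
    v = np.array([*uv_left, *uv_right, *head_joints, *torso_joints], dtype=float)
    p = np.array(object_position, dtype=float)
    assert v.shape == (13,) and p.shape == (3,)
    return v, p

# The example entry from Table 1.
v, p = make_sample((479, 411), (503, 437),
                   (-10.0, 0.0, 0.0, -19.9, -19.9, 0.0),
                   (-0.1, -9.9, 10.1),
                   (0.42, 0.27, -0.12))
```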
A dataset of reference points (RPs) was collected on the real hardware. A YARP [7] module registering the robot state and storing the camera images was implemented. To obtain the position of an object in the images, an object detection algorithm [8] was used to filter the raw stream from the cameras. The hand-measured position of the object in 3D space was then added as the corresponding output. The dataset contains 32 RPs on the table, with more than 30 robot poses per point. The points lie in a region that the iCub is able to reach with its arms and were distributed in a grid with a spacing of 6 cm.
3.1. Artificial Neural Network (ANN)
An ANN, more precisely a multi-layer perceptron [9], was trained by applying a standard error back-propagation method [9] on the collected dataset. The neural network approach requires a pre-processing step, in which the dataset (input vector) is scaled using the limits given in Table 1 to obtain values in the range [−1, +1]. The limits are based on the maximum image size for the first 4 values, and on the joint limits (range of motion in the stochastic controller) of the robot for the encoder values. The output of the neural network is in the same limited range and needs to be un-scaled.
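This pre-processing is a simple per-dimension min/max mapping; a minimal sketch using the limits from Table 1 is given below (the function names are ours).

```python
import numpy as np

# Per-dimension limits taken from the 'min' and 'max' rows of Table 1
# (image coordinates in pixels, encoder values in degrees).
V_MIN = np.array([0, 0, 0, 0, -25, -25, -10, -20, -15, 0, -20, -20, 0], dtype=float)
V_MAX = np.array([640, 480, 640, 480, 25, 25, 10, 20, 15, 5, 20, 20, 50], dtype=float)

def scale_features(v):
    """Map each component of the 13-dimensional feature vector to [-1, +1]."""
    return 2.0 * (v - V_MIN) / (V_MAX - V_MIN) - 1.0

def unscale_output(y_scaled, p_min, p_max):
    """Map a network output from [-1, +1] back to metres (p_min/p_max from Table 1)."""
    return (y_scaled + 1.0) / 2.0 * (p_max - p_min) + p_min
```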
For training the network, the (scaled) dataset was first randomly split into a training set (80% of the data) and a test set (20%). The test set allows us to verify that the results obtained via learning are not over-fitting. Separate networks were trained for the estimation in the X and Y directions. Each network consists of one input layer with dimension 13, a hidden layer, and an output layer. The network uses bias terms and is fully connected. The hidden layer consists of 10 neurons, which use a sigmoidal activation function of the form σ(u) = 1/(1 + e^{−u}). Finally, the output layer is a single neuron representing the estimated position along one axis. The ANNs were trained using PyBrain [10] with a learning rate of 0.35 and a momentum of 0.1. The errors reported are the average of 10 runs.
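A minimal PyBrain sketch of this setup is shown below; it is our reconstruction from the description above (one network per axis, 13 inputs, 10 sigmoid hidden units, single sigmoid output, learning rate 0.35, momentum 0.1), and `samples` is a hypothetical list of already-scaled (feature vector, coordinate) pairs.

```python
import random

from pybrain.datasets import SupervisedDataSet
from pybrain.structure import SigmoidLayer
from pybrain.supervised.trainers import BackpropTrainer
from pybrain.tools.shortcuts import buildNetwork

def train_axis_network(samples, epochs=1700):
    """samples: list of (v_scaled, coord_scaled) pairs for one axis, scaled to [-1, +1]."""
    random.shuffle(samples)
    split = int(0.8 * len(samples))            # 80% training, 20% test
    train_ds, test_ds = SupervisedDataSet(13, 1), SupervisedDataSet(13, 1)
    for i, (v_scaled, coord_scaled) in enumerate(samples):
        (train_ds if i < split else test_ds).addSample(v_scaled, [coord_scaled])

    # 13 inputs, 10 sigmoid hidden units, 1 sigmoid output, bias terms, fully connected.
    net = buildNetwork(13, 10, 1, bias=True,
                       hiddenclass=SigmoidLayer, outclass=SigmoidLayer)
    trainer = BackpropTrainer(net, dataset=train_ds, learningrate=0.35, momentum=0.1)
    for _ in range(epochs):                    # roughly the epoch count reported above
        trainer.train()                        # one back-propagation pass over the training set
    return net, test_ds

# A trained network is queried with net.activate(v_scaled); the single output is
# the scaled estimate for one axis and still has to be un-scaled to metres.
```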
3.2. Genetic Programming
Genetic Programming (GP) is a search technique most commonly used for symbolic regression and classification tasks. It is inspired by concepts from Darwinian evolution [11]. Herein we use GP to find expressions mapping the inputs to the outputs (3D coordinates). The basic algorithm works as follows: a population is initialised randomly. Each individual represents a tree encoding a mathematical expression; the nodes encode functions, with the leaf nodes being either an available input or a constant value. For a given set of input values, the output of the expression can be found by recursing from the root node through to the terminal nodes. The individuals are then tested to calculate their 'fitness' (in our case the sum of the mean error); the lower this error, the better the individual is at performing the mapping. In the next step a new population is generated out of the old one, by taking pairs of the best performing individuals and applying operations analogous to recombination and mutation. This test-and-generate process is repeated until a solution is found or a certain number of individuals have been evaluated. A comprehensive introduction to genetic programming and its applications can be found in [12].
Table 2: The mathematical functions available for the genetic programming (GP) method to select from.

add      subtract   multiply   divide
power    sqrt       exp        log
sin      sinh       cos        cosh
tan      tanh       asin       acos
atan2    min        max        abs
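The following is a deliberately small sketch of this generate-and-test loop (the experiments reported here use Eureqa instead, see below); it evolves expression trees over a reduced function set compared to Table 2, and `X` and `y` stand for hypothetical training inputs and target coordinates.

```python
import random

FUNCS = {'add': lambda a, b: a + b,
         'sub': lambda a, b: a - b,
         'mul': lambda a, b: a * b,
         'div': lambda a, b: a / b if abs(b) > 1e-9 else 1.0}  # protected division

def random_tree(n_inputs, depth=3):
    """Grow a random expression tree: inner nodes hold functions, leaves an input index or constant."""
    if depth == 0 or random.random() < 0.3:
        if random.random() < 0.5:
            return ('x', random.randrange(n_inputs))
        return ('const', random.uniform(-10.0, 10.0))
    name = random.choice(list(FUNCS))
    return (name, random_tree(n_inputs, depth - 1), random_tree(n_inputs, depth - 1))

def evaluate(tree, inputs):
    """Recurse from the root node through to the terminal nodes."""
    tag = tree[0]
    if tag == 'x':
        return inputs[tree[1]]
    if tag == 'const':
        return tree[1]
    return FUNCS[tag](evaluate(tree[1], inputs), evaluate(tree[2], inputs))

def fitness(tree, X, y):
    """Mean absolute prediction error over the dataset (lower is better)."""
    return sum(abs(evaluate(tree, row) - target) for row, target in zip(X, y)) / len(y)

def crossover(a, b):
    """Very simple recombination: graft tree b in place of one of a's children."""
    if a[0] in ('x', 'const'):
        return b
    i = random.randrange(1, len(a))
    return a[:i] + (b,) + a[i + 1:]

def mutate(tree, n_inputs):
    """With small probability replace the subtree rooted here by a freshly grown one."""
    if random.random() < 0.1:
        return random_tree(n_inputs, depth=2)
    if tree[0] in ('x', 'const'):
        return tree
    return (tree[0],) + tuple(mutate(child, n_inputs) for child in tree[1:])

def evolve(X, y, pop_size=64, generations=200):
    """Generate-and-test loop: rank by fitness, then rebuild the population from the best individuals."""
    n_inputs = len(X[0])
    pop = [random_tree(n_inputs) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda t: fitness(t, X, y))
        parents = pop[:pop_size // 4]
        pop = parents + [mutate(crossover(random.choice(parents), random.choice(parents)), n_inputs)
                         for _ in range(pop_size - len(parents))]
    return min(pop, key=lambda t: fitness(t, X, y))
```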
Herein we use the freely available software 'Eureqa' [13]. It produces compact, human-readable expressions from datasets employing the above-mentioned techniques. The input values do not have to be scaled in this approach and can remain in their original form. As with the neural network regression, the data was shuffled and then split into training and validation sets. The standard settings were used. These include a population of 64 individuals, a crossover rate of 0.5 and a mutation rate of 1.5%; the mean square error of the prediction was used as the fitness metric. The generated solution can contain any of the mathematical functions listed in Table 2.
4. Experiments and Results
To learn the ability to generalise, the techniques need a dataset representing the robot in various configurations and object locations on the table. Our first approach was to place a single object at different known positions on the table and collect data. To simplify the image processing, a red LED was used. The LED was placed at a known position in the grid to mark the reference point, while the iCub moved into different poses. For each pose the joint angles and camera images were collected. After collecting data for a number of poses, the LED was moved to another position and the process repeated.
For the table case the problem is simplified, as we can assume a constant height (Z axis). Table 3 compares the position prediction errors of the ANN and GP techniques. It shows that the neural network performs better during learning, which can also be seen in Fig. 2. Both approaches perform similarly when generalising to unseen data (test set). The ANN training necessitates a longer runtime, as the back-propagation algorithm repeatedly updates the neural network until its performance is satisfactory.
Table 3: Estimation accuracy on the dataset for both techniques.

                              ANN      GP
Average Error 2D (cm)         0.846    3.325
Standard Deviation 2D (cm)    0.504    2.210
Average Error X (cm)          0.540    2.028
Standard Deviation X (cm)     0.445    1.760
Average Error Y (cm)          0.5433   2.210
Standard Deviation Y (cm)     0.4304   1.716
As described above, two separate networks were trained to predict the coordinates on the X and Y axes independently. This approach was chosen as it allowed for faster learning (i.e. fewer epochs needed to yield the results) and the ability to run the learning in parallel. On average, about 1700 epochs were needed per network for its prediction error to converge. After training, the network produces estimates with an average accuracy of 0.8 cm, with lower separate errors on the axes (see Table 3). This makes the ANN the best performing approach on the dataset.
The GP method, while converging faster than the neural network, performs with a lower average accuracy of 3.3 cm. Although this performance is worse than the ANN's, it is still sufficiently accurate to allow for simple reaching and grasping tasks on the iCub. However, there are a number of advantages to be considered. The output is in a human-readable form, which can easily and quickly be transferred and tested on the robot. Table 4 shows the evolved equations. An interesting observation is that only one of the camera images is used (features v0 and v1). This reduces the (complete) runtime of the estimation, as only one image needs to be processed with object detection algorithms before the expression can be evaluated.
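To illustrate how directly such expressions transfer, the sketch below simply re-types the equations as reproduced in Table 4 (inputs v0..v12 as defined in Table 1); it is our transcription, not the authors' implementation.

```python
# The GP-evolved mappings, transcribed from Table 4. The direct transcription is
# what makes the GP output easy to transfer and test on the robot.
def gp_estimate_x(v):
    return (17.81 - 0.01906 * v[1] + 0.1527 * v[4] + 0.1378 * v[7]
            + 0.01108 * v[10] - 0.0296 * v[11] - 0.1207 * v[12])

def gp_estimate_y(v):
    return (1.124224045 + 0.1295920897 * v[10] + 0.1156011386 * v[8]
            + 0.01695234993 * v[0])
```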
During off-line training it appeared that both the ANN and GP approaches provide sufficient accuracy for object manipulation. Both approaches were implemented on the iCub to perform real-time distance estimation of objects and to allow for further verification. The object position in the images (provided by an object detection filter from a separate iCub vision system [8]) and the joint encoder values were fed to both the trained neural network and the GP-evolved formulae, to allow easy comparison of the position predictions.
Figure 2: The estimated object position (blue dots) vs. the measured object position (red blocks) for the machine learning approaches: on the left, the results obtained from artificial neural networks (ANN); on the right, the results using genetic programming (GP).
Table 4: The equations generated using Genetic Programming.

x = 17.81 − 0.01906·v1 + 0.1527·v4 + 0.1378·v7 + 0.01108·v10 − 0.0296·v11 − 0.1207·v12
y = 1.124224045 + 0.1295920897·v10 + 0.1156011386·v8 + 0.01695234993·v0
Table 5: The relative estimation errors (in cm) when estimating the position using fixed poses of the robot and object locations not in the training or test set.

            ANN             GP              current iCub
 dX   dY    estX    estY    estX    estY    estX    estY
  0   +2    0.10    1.93    0.51    2.28    0.0     2.17
  0   +1    0.10    0.78    0.30    0.91    0.05    1.0
  0    0    0       0       0       0       0       0
  0   -1    0.03    1.14    0.31    1.35    0.03    1.07
  0   -2    0.11    2.08    0.61    2.40    0.03    2.07
 +2    0    1.70    0.01    1.93    0.57    2.01    0.17
 +1    0    0.71    0.10    0.81    0.34    0.92    0.11
  0    0    0       0       0       0       0       0
 -1    0    0.99    0.21    1.12    0.11    1.17    0.06
 -2    0    1.98    0.30    2.24    0.34    2.33    0.1
Figure 3: The relative localisation errors on the real hardware. The ground truth is shown in black; the circles represent the learning approaches, ANN (empty circles) and GP (filled). Results from the iCub 'stereoVision' module are plotted in green.
The validation results were obtained using locations on the table and poses that were in neither the original training set nor the test set. It was found that the GP (average error of 2.7 cm) out-performed the ANN (average error of 3.5 cm) on localisation. Both techniques performed slightly worse than the fully calibrated iCub 'stereoVision' module (1.8 cm accuracy). The performance on the relative error (where the target object was moved by small increments away from a central point) was very high for both implementations, with the ANN yielding better results, as can be seen from the values in Table 5 and Fig. 3. The results of the current iCub localisation module are added for comparison.
To test these approaches under moving conditions, we scripted the robot to move along a given trajectory and recorded the position estimates for an object at a fixed location. The errors were measured when using only the head/neck joints, only the torso, and a combination of both; they all ranged between 2 and 4 centimetres. The faster the movement, the higher the error, which leads us to believe that the main issue is keeping the images from the two cameras as closely synchronised as possible.

We also performed this test with a moving test object; the error, though, is harder to measure when both the robot and the object are moving. In visual verification no large errors were found.¹
5. Conclusions
To estimate the positions of objects placed on a table in front of an iCub robot, we compared artificial neural networks (ANN) and genetic programming (GP). Neither an explicit robot model nor a time-consuming stereo camera calibration procedure is needed for learning. The results of locating objects on the table (2D) are sufficient for real-world reaching scenarios, with the GP approach performing worse than the ANN method on the training set but generalising better when used on the hardware. The results on the first 3D dataset show that the method can be scaled to perform full 3D estimation; that said, more thorough experimental testing on the iCub will need to be conducted.

The results show that the iCub can learn simpler ways to perceive the location of objects than the human-engineered methods. Both approaches provide simple and fast methods that can be used in real time on the robot. As the learnt models are lightweight, they could easily be incorporated into embedded systems and other robotic platforms.
6. References
[1] N. G. Tsagarakis et al., "iCub: the design and realization of an open humanoid platform for cognitive and neuroscience research," Advanced Robotics, vol. 21, pp. 1151–1175, Jan. 2007.
[2] R. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision, 2nd ed. Cambridge University Press, 2000.
[3] U. Pattacini, "Modular Cartesian Controllers for Humanoid Robots: Design and Implementation on the iCub," Ph.D. dissertation, RBCS, Italian Institute of Technology, Genova, 2011.
[4] A. Gloye, F. Wiesel, O. Tenchio, and M. Simon, "Reinforcing the Driving Quality of Soccer Playing Robots by Anticipation," IT - Information Technology, vol. 47, no. 5, 2005.
[5] J. Bongard and V. Zykov, "Resilient machines through continuous self-modeling," Science, vol. 314, no. 5802, pp. 1118–1121, 2006.
[6] E. Sauser and A. Billard, "View sensitive cells as a neural basis for the representation of others in a self-centered frame of reference," in Int'l. Symposium on Imitation in Animals and Artifacts, 2005.
[7] G. Metta, P. Fitzpatrick, and L. Natale, "YARP: Yet Another Robot Platform," Advanced Robotic Systems, vol. 3, 2006.
[8] J. Leitner, S. Harding, M. Frank, A. Förster, and J. Schmidhuber, "icVision: A Modular Vision System for Cognitive Robotics Research," in International Conference on Cognitive Systems, 2012.
[9] S. J. Russell and P. Norvig, Artificial Intelligence: A Modern Approach, 3rd ed. Prentice Hall, 2010.
[10] T. Schaul et al., "PyBrain," Journal of Machine Learning Research, 2010.
[11] J. Koza, Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press, 1992.
[12] R. Poli, W. B. Langdon, and N. F. McPhee, A Field Guide to Genetic Programming. Published at http://lulu.com; freely available at http://www.gp-field-guide.org.uk, 2008.
[13] M. Schmidt and H. Lipson, "Distilling Free-Form Natural Laws from Experimental Data," Science, pp. 1–5, Apr. 2009.
¹ A video showing localisation while the iCub and the object are moving can be found at http://Juxi.net/projects/icVision/.