Article

Learning Visual Object Detection and Localisation Using icVision

Authors:
  • LYRO Robotics
  • Machine Intelligence Ltd.
  • Denso ADAS Automotive GmbH

Abstract

Building artificial agents and robots that can act in an intelligent way is one of the main research goals in artificial intelligence and robotics. Yet it is still hard to integrate functional cognitive processes into these systems. We present a framework combining computer vision and machine learning for learning object recognition in humanoid robots. A biologically inspired, bottom-up architecture is introduced to facilitate visual perception and cognitive robotics research. It aims to mimic the processes in the human brain that perform visual cognition tasks. A number of experiments with this icVision framework are described. We showcase both detection and identification in the image plane (2D) using machine learning. In addition, we show how a biologically inspired attention mechanism allows for fully autonomous learning of visual object representations. Furthermore, we present how the detected objects are localised in 3D space, which in turn can be used to build a model of the environment.
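The abstract's pipeline (per-object 2D detection in the camera images, a learned mapping to 3D, and a simple world model) can be pictured with a minimal Python sketch. The function names and the assumption that detection yields a binary mask are illustrative only; they are not the framework's actual API.

```python
# Illustrative sketch (not the framework's actual API) of the processing flow
# the abstract describes: per-object 2D detection in both camera images,
# followed by 3D localisation from the stereo detections and the robot's pose.
import numpy as np

def detect(image, object_filter):
    """2D detection: apply a learned per-object filter and return the
    pixel centroid of the resulting binary mask (or None if not found)."""
    mask = object_filter(image)
    vs, us = np.nonzero(mask)
    return (float(us.mean()), float(vs.mean())) if us.size else None

def localise(uv_left, uv_right, joint_encoders, localiser):
    """3D localisation: a learned mapping from the two image-plane detections
    plus the head/torso joint readings to a Cartesian position."""
    x = np.hstack([uv_left, uv_right, joint_encoders])
    return localiser(x)                      # -> (x, y, z)

def update_world_model(world_model, name, position):
    """Keep a simple name -> position model of the environment."""
    world_model[name] = position
    return world_model
```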


... In this work, it is explained how the biologically inspired architecture emulates human visual cognition processes, which allow objects to be detected, identified, and localised in 3D. The presented icVision framework enables autonomous learning of visual representations and environment modeling and thus promotes intelligent artificial agents and robots [28]. ...
Research
Full-text available
In only the last few years, deep learning has become an enabler for robots to see and interpret their surroundings with unprecedented accuracy and sophistication. This paper provides a survey of the deep learning techniques applied in robotic perception, covering their use in object detection, recognition, segmentation, and scene understanding. We detail the progress on CNNs, RNNs, and other deep architectures for enhancing the ability of robots to interpret more complex visual data. We also consider how integrating multimodal data (visual, auditory, tactile) can provide a better understanding of the environment. Key highlights of this paper are autonomous navigation, human-robot interaction, and industrial automation, demonstrating deep learning's transformative capabilities across these domains. We address the challenges faced when deploying these systems in real-world scenarios, particularly the need for large datasets, computational resources, and robust training methodologies. Finally, we provide insights into future trends and possible research directions that would further push the envelope of robotic perception research, making it ever more perceptive and autonomous in operation.
... A simple illustration of the iCub vision kinematics (image from Leitner et al. 2017). ...
Article
Full-text available
Biological vision incorporates intelligent cooperation between the sensory and the motor systems, which is facilitated by the development of motor skills that help to shape visual information that is relevant to a specific vision task. In this article, we explore an approach to active vision inspired by biological systems, which uses limited constraints for motor strategies through progressive adaptation via an evolutionary method. This type of approach gives artificial systems freedom to discover eye-movement strategies that may be useful for solving a given vision task but are not known to us. In the experimental sections of this article, we use this type of evolutionary active vision system on more complex natural images in both two-dimensional (2D) and three-dimensional (3D) environments. To further improve the results, we experiment with pre-processing the visual input with both uniform local binary patterns (ULBP) and the histogram of oriented gradients (HOG) for classification tasks in the 2D and 3D environments. The 3D experiments include application of the active vision system to object categorisation and indoor versus outdoor environment classification. Our experiments are conducted on the iCub humanoid robot simulator platform.
... While the pipeline currently uses mainly single feature detectors, in the future it should be extended to connect and stack various filters together. A good overview of some of the experiments done using the framework can be found in Leitner et al. [2013a]. ...
Thesis
Full-text available
Although robotics research has seen advances over the last decades, robots are still not in widespread use outside industrial applications. Yet a range of proposed scenarios have robots working together, helping and coexisting with humans in daily life. In all of these, a clear need arises to deal with a more unstructured, changing environment. I herein present a system that aims to overcome the limitations of highly complex robotic systems, in terms of autonomy and adaptation. The main focus of research is to investigate the use of visual feedback for improving the reaching and grasping capabilities of complex robots. To facilitate this, a combined integration of computer vision and machine learning techniques is employed. From a robot vision point of view, the combination of domain knowledge from both image processing and machine learning can expand the capabilities of robots. I present a novel framework called Cartesian Genetic Programming for Image Processing (CGP-IP). CGP-IP can be trained to detect objects in the incoming camera streams and has been successfully demonstrated on many different problem domains. The approach requires only a few training images (it was tested with 5 to 10 images per experiment) and is fast, scalable and robust. Additionally, it can generate human-readable programs that can be further customized and tuned. While CGP-IP is a supervised-learning technique, I show an integration on the iCub that allows for the autonomous learning of object detection and identification. Finally, this dissertation includes two proof-of-concepts that integrate the motion and action sides. First, reactive reaching and grasping is shown. It allows the robot to avoid obstacles detected in the visual stream while reaching for the intended target object. Furthermore, the integration enables us to use the robot in non-static environments, i.e. the reaching is adapted on-the-fly from the visual feedback received, e.g. when an obstacle is moved into the trajectory. The second integration highlights the capabilities of these frameworks by improving visual detection through object manipulation actions.
... In particular, the visual input could be sent to the controllers: the possibility to learn to reach where the eye is foveating [32], [75] would allow the robot to reuse the skills learnt with certain objects to interact with other objects located in different positions. Second, vision and attention could support the discovery of new goals: using visual processing techniques such as object recognition [44], [86] and active vision [9], [59] strategies, the system could find where objects are located in space, and also identify novel items among them, so as to focus its exploration on those parts of the environment and speed up the discovery of interesting events. ...
Article
Full-text available
In this work we present GRAIL (Goal-discovering Robotic Architecture for Intrinsically-motivated Learning), a 4-level architecture that is able to autonomously (1) discover changes in the environment, (2) form representations of the goals corresponding to those changes, (3) select the goal to pursue on the basis of intrinsic motivations, (4) select suitable computational resources to achieve the selected goal, (5) monitor the achievement of the selected goal, and (6) self-generate a learning signal when the selected goal is successfully achieved. Building on previous research, GRAIL exploits the power of goals and competence-based intrinsic motivations to autonomously explore the world and learn different skills that allow the robot to modify the environment. To highlight the features of GRAIL, we implement it in a simulated iCub robot and test the system in 4 different experimental scenarios where the agent has to perform reaching tasks within a 3D environment.
... These are general-purpose architectures, and thus they can be used to build several modules of a larger system (e.g., object recognition, key point detectors and object detection modules of a robot vision system). Examples include trainable COSFIRE filters (Petkov, 2013, 2014) and Cartesian Genetic Programming (CGP) (Leitner et al., 2013). ...
Article
Full-text available
Object detection is a key ability required by most computer and robot vision systems. The latest research in this area has been making great progress in many directions. In the current manuscript, we give an overview of past research on object detection, outline the current main research directions, and discuss open problems and possible future directions.
... Still, despite impressive successes and growing interest in BICA, wide gaps separate different approaches from each other and from solutions found in biology. Most efforts either focus on specific problems to be solved such as vision [13], decision-making [14], speech activity [15], emotion [16], memory [17], or take a more general approach that disregards all the other research efforts. ...
Conference Paper
RoboBrain proposes a biologically inspired 'blue-print' software architecture for humanoid robots that maps the main functions of the human brain. Given the extreme complexity of designing such an architecture, our intention is to provide a basic IT command and control foundation for robots, with the minimal components required to address a humanoid robot's functionality needs. Our approach integrates the 'must have' components that emulate the human brain's corresponding functionality, as well as addressing the command, control, perception and task execution requirements of a humanoid robot.
... icVision provides modules to estimate the 3D position based on the robot's pose and the location of the object in the camera images [22]. It has been used extensively at IDSIA for various robot vision experiments [23]. ...
Article
Full-text available
Salient Object Detection (SOD) in natural images is an active research area with burgeoning applications across diverse disciplines such as object recognition, image compression, video summarization, object discovery and image retargeting. Most salient object detection methods model this problem as a binary segmentation problem where, firstly, a saliency map is found which highlights the salient pixels and suppresses the background pixels in an image. Secondly, some threshold is applied to obtain the binary segmentation from the saliency map. Thus, thresholding is an important ingredient of salient object detection methods and affects SOD performance. In this paper, we provide a comprehensive review of various thresholding methods in the literature employed for SOD. We have developed a taxonomy of thresholding methods which shall be useful to researchers and practitioners working in this fascinating research field. Further, we also discuss unexplored thresholding approaches which can be employed in SOD. Various existing and proposed performance measures to analyze SOD methods that depend on thresholding are also presented. Experiments on popular thresholding methods have also been carried out to show the dependence of qualitative and quantitative performance on thresholding.
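As a rough illustration of the role thresholding plays, the hedged OpenCV sketch below converts a precomputed saliency map into a binary mask with a fixed threshold and with Otsu's method, one of the adaptive strategies such a review typically covers; the input file name is a placeholder.

```python
# Minimal sketch: turning a real-valued saliency map into a binary object mask.
# Otsu's method is just one of the thresholding strategies such a review covers.
import cv2

saliency = cv2.imread("saliency_map.png", cv2.IMREAD_GRAYSCALE)  # assumed input

# Fixed threshold (here: twice the mean saliency, capped at 255) ...
t_fixed = min(255, 2.0 * saliency.mean())
_, mask_fixed = cv2.threshold(saliency, t_fixed, 255, cv2.THRESH_BINARY)

# ... versus an adaptive, image-dependent threshold (Otsu).
_, mask_otsu = cv2.threshold(saliency, 0, 255,
                             cv2.THRESH_BINARY + cv2.THRESH_OTSU)
```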
Article
Full-text available
We describe our software system enabling a tight integration between vision and control modules on complex, high-DOF humanoid robots. This is demonstrated with the iCub humanoid robot performing visual object detection, reaching and grasping actions. A key capability of this system is reactive avoidance of obstacle objects detected from the video stream while carrying out reach-and-grasp tasks. The subsystems of our architecture can independently be improved and updated, for example, we show that by using machine learning techniques we can improve visual perception by collecting images during the robot’s interaction with the environment. We describe the task and software design constraints that led to the layered modular system architecture.
Conference Paper
Full-text available
Robust object manipulation is still a hard problem in robotics, even more so in high degree-of-freedom (DOF) humanoid robots. To improve performance, a closer integration of visual and motor systems is needed. We herein present a novel method for a robot to learn robust detection of its own hands and fingers, enabling sensorimotor coordination. It does so solely using its own camera images and does not require any external systems or markers. Our system, based on Cartesian Genetic Programming (CGP), evolves programs that perform this image segmentation task in real time on the real hardware. We show results for a Nao and an iCub humanoid, each detecting its own hands and fingers.
Conference Paper
Full-text available
We present a “curious” active vision system for a humanoid robot that autonomously explores its environment and learns object representations without any human assistance. Similar to an infant, who is intrinsically motivated to seek out new information, our system is endowed with an attention and learning mechanism designed to search for new information that has not been learned yet. Our method can deal with dynamic changes of object appearance which are incorporated into the object models. Our experiments demonstrate improved learning speed and accuracy through curiosity-driven learning.
Article
Full-text available
We present a combined machine learning and computer vision approach for robots to localize objects. It allows our iCub humanoid to quickly learn to provide accurate 3D position estimates (in the centimetre range) of objects seen. Biologically inspired approaches, such as Artificial Neural Networks (ANN) and Genetic Programming (GP), are trained to provide these position estimates using the two cameras and the joint encoder readings. No camera calibration or explicit knowledge of the robot's kinematic model is needed. We find that ANN and GP are not just faster and of lower complexity than traditional techniques, but also learn without the need for extensive calibration procedures. In addition, the approach localizes objects robustly when they are placed at arbitrary positions in the robot's workspace, even while the robot is moving its torso, head and eyes.
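A hedged sketch of this kind of learned localiser, using scikit-learn's MLPRegressor as a generic stand-in for the ANN/GP models trained in the paper; the data files, input layout and network size are placeholders.

```python
# Hedged sketch of the learning setup described above: a generic feed-forward
# network (scikit-learn's MLPRegressor as a stand-in for the paper's ANN/GP)
# maps stereo pixel coordinates plus joint encoder readings to a 3D position.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

# X: [u_left, v_left, u_right, v_right, head/eye/torso encoder angles ...]
# y: [x, y, z] object position in the robot's reference frame (metres).
X = np.load("samples_inputs.npy")      # assumed pre-recorded training data
y = np.load("samples_positions.npy")

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
net = MLPRegressor(hidden_layer_sizes=(30, 30), max_iter=10000).fit(X_tr, y_tr)

err_cm = 100.0 * np.linalg.norm(net.predict(X_te) - y_te, axis=1)
print("mean localisation error: %.1f cm" % err_cm.mean())
```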
Conference Paper
Full-text available
In this work we introduce a technique for a humanoid robot to autonomously learn the representations of objects within its visual environment. Our approach involves an attention mechanism in association with feature-based segmentation that explores the environment and provides object samples for training. These samples are learned for further object identification using Cartesian Genetic Programming (CGP). The learned identification is able to provide robust and fast segmentation of the objects, without using features. We showcase our system and its performance on the iCub humanoid robot.
Article
Full-text available
Examined the ways that infants acquire information about the haptic and visual properties of objects. The 1st study was a cross-sectional investigation of exploratory behavior in 60 6-, 9-, and 12-mo-old infants. Each S was presented with 2 series of objects having some common characteristic. Several general behaviors—looking, handling, mouthing, and banging—were considered along with more specific measures—turning the object while looking, alternating between looking and mouthing, transferring the object from hand to hand, and fingering. Duration of mouthing and particular types of mouthing decreased with age, whereas fingering and other more precise forms of manipulation increased. There were significant stimulus effects showing that the Ss adjusted their behavior to the particular characteristics of the objects. Decrements with increasing familiarization were also observed in most behaviors. The 2nd study addressed the issue of whether the different behaviors are actually used to pick up information about object characteristics. 48 9- and 12-mo-old infants were presented with 3 problems that involved a period of familiarization followed by a trial in which the object was changed along 1 dimension: shape, texture, or weight. Ss' behavior in the change trials suggests that different types of manipulation are used to explore the different changes.
Chapter
Full-text available
Combining domain knowledge about both image processing and machine learning techniques can expand the abilities of Genetic Programming when used for image processing. We successfully demonstrate our new approach on several different problem domains. We show that the approach is fast, scalable and robust. In addition, by virtue of using off-the-shelf image processing libraries, we can generate human-readable programs that incorporate sophisticated domain knowledge.
Conference Paper
Full-text available
We use a Katana robotic arm to teach an iCub humanoid robot how to perceive the location of the objects it sees. To do this, the Katana positions an object within the shared workspace and tells the iCub where it has placed it. While the iCub moves, it observes the object, and a neural network then learns how to relate its pose and visual inputs to the object location. We show that satisfactory results can be obtained for localisation even in scenarios where the kinematic model is imprecise or not available. Furthermore, we demonstrate that this task can be accomplished safely. For this task we extend our collision avoidance software for the iCub to prevent collisions between multiple, independently controlled, heterogeneous robots in the same workspace.
Chapter
Full-text available
In this chapter, we will present three applications in which CGP can automatically generate novel image processing algorithms that compare to or exceed the best known conventional solutions. The applications fall into the areas of image preprocessing and classification.
Conference Paper
Full-text available
In this paper, we present a novel scale- and rotation-invariant interest point detector and descriptor, coined SURF (Speeded Up Robust Features). It approximates or even outperforms previously proposed schemes with respect to repeatability, distinctiveness, and robustness, yet can be computed and compared much faster. This is achieved by relying on integral images for image convolutions; by building on the strengths of the leading existing detectors and descriptors (in casu, using a Hessian matrix-based measure for the detector, and a distribution-based descriptor); and by simplifying these methods to the essential. This leads to a combination of novel detection, description, and matching steps. The paper presents experimental results on a standard evaluation set, as well as on imagery obtained in the context of a real-life object recognition application. Both show SURF’s strong performance.
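A usage sketch of SURF as exposed by OpenCV's contrib module; note that SURF lives in "xfeatures2d" and requires an opencv-contrib build with the non-free algorithms enabled, so availability depends on the installation. File names and the Hessian threshold are placeholders.

```python
# Hedged usage sketch: SURF keypoint detection and descriptor matching with
# OpenCV. Requires opencv-contrib built with non-free algorithms enabled.
import cv2

img1 = cv2.imread("object.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)

surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400)
kp1, des1 = surf.detectAndCompute(img1, None)
kp2, des2 = surf.detectAndCompute(img2, None)

matcher = cv2.BFMatcher(cv2.NORM_L2)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)
print("best match distance:", matches[0].distance if matches else "no matches")
```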
Conference Paper
Full-text available
To produce even the simplest human-like behaviors, a humanoid robot must be able to see, act, and react, within a tightly integrated behavioral control system. Although there exists a rich body of literature in Computer Vision, Path Planning, and Feedback Control, wherein many critical subproblems are addressed individually, most demonstrable behaviors for humanoid robots do not effectively integrate elements from all three disciplines. Consequently, tasks that seem trivial to us humans, such as pick-and-place in an unstructured environment, remain far beyond the state-of-the-art in experimental robotics. We view this primarily as a software engineering problem, and have therefore developed MoBeE, a novel behavioral framework for humanoids and other complex robots, which integrates elements from vision, planning, and control, facilitating the synthesis of autonomous, adaptive behaviors. We communicate the efficacy of MoBeE through several demonstrative experiments. We first develop Adaptive Roadmap Planning by integrating a reactive feedback controller into a roadmap planner. Then, an industrial manipulator teaches a humanoid to localize objects as the two robots operate autonomously in a shared workspace. Finally, an integrated vision, planning, control system is applied to a real-world reaching task using the humanoid robot.
Article
Full-text available
We describe YARP, Yet Another Robot Platform, an open-source project that encapsulates lessons from our experience in building humanoid robots. The goal of YARP is to minimize the effort devoted to infrastructure-level software development by facilitating code reuse and modularity, and so to maximize research-level development and collaboration. Humanoid robotics is a "bleeding edge" field of research, with constant flux in sensors, actuators, and processors. Code reuse and maintenance is therefore a significant challenge. We describe the main problems we faced and the solutions we adopted. In short, the main features of YARP include support for inter-process communication and image processing, as well as a class hierarchy to ease code reuse across different hardware platforms. YARP is currently used and tested on Windows, Linux and QNX6, which are common operating systems used in robotics.
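A minimal sketch of YARP-style inter-process communication from Python, assuming the YARP 3.x Python bindings are installed and a name server is running; the port name and payload are illustrative.

```python
# Minimal sketch of YARP inter-process communication from Python (YARP 3.x
# bindings assumed). A single buffered port sends one Bottle message.
import yarp

yarp.Network.init()

out_port = yarp.BufferedPortBottle()
out_port.open("/demo/out")                 # port name here is illustrative

bottle = out_port.prepare()
bottle.clear()
bottle.addString("detected")
bottle.addFloat64(0.42)                    # e.g. an object coordinate
out_port.write()

out_port.close()
yarp.Network.fini()
```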
Chapter
Full-text available
We present a system for recognizing human faces from single images out of a large database with one image per person. The task is difficult because of image variation in terms of position, size, expression, and pose. The system collapses most of this variance by extracting concise face descriptions in the form of image graphs. In these, fiducial points on the face (eyes, mouth etc.) are described by sets of wavelet components (jets). Image graph extraction is based on a novel approach, the bunch graph, which is constructed from a small set of sample image graphs. Recognition is based on a straight-forward comparison of image graphs. We report recognition experiments on the FERET database and the Bochum database, including recognition across pose.
Article
Full-text available
The repeatability and efficiency of a corner detector determines how likely it is to be useful in a real-world application. The repeatability is important because the same scene viewed from different positions should yield features which correspond to the same real-world 3D locations. The efficiency is important because this determines whether the detector combined with further processing can operate at frame rate. Three advances are described in this paper. First, we present a new heuristic for feature detection and, using machine learning, we derive a feature detector from this which can fully process live PAL video using less than 5 percent of the available processing time. By comparison, most other detectors cannot even operate at frame rate (Harris detector 115 percent, SIFT 195 percent). Second, we generalize the detector, allowing it to be optimized for repeatability, with little loss of efficiency. Third, we carry out a rigorous comparison of corner detectors based on the above repeatability criterion applied to 3D scenes. We show that, despite being principally constructed for speed, on these stringent tests, our heuristic detector significantly outperforms existing feature detectors. Finally, the comparison demonstrates that using machine learning produces significant improvements in repeatability, yielding a detector that is both very fast and of very high quality.
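For reference, FAST is available directly in OpenCV; the short sketch below detects corners with and without non-maximum suppression (the threshold value and file name are placeholders).

```python
# Usage sketch: the FAST detector as exposed by OpenCV, with and without
# non-maximum suppression, to illustrate the speed/repeatability trade-off.
import cv2

img = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)

fast = cv2.FastFeatureDetector_create(threshold=25, nonmaxSuppression=True)
keypoints = fast.detect(img, None)
print("FAST corners (with NMS):", len(keypoints))

fast.setNonmaxSuppression(False)
print("FAST corners (without NMS):", len(fast.detect(img, None)))
```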
Article
Full-text available
We used functional magnetic resonance imaging (fMRI) to measure activity in human early visual cortex (areas V1, V2 and V3) during a challenging contrast-detection task. Subjects attempted to detect the presence of slight contrast increments added to two kinds of background patterns. Behavioral responses were recorded so that the corresponding cortical activity could be grouped into the usual signal detection categories: hits, false alarms, misses and correct rejects. For both kinds of background patterns, the measured cortical activity was retinotopically specific. Hits and false alarms were associated with significantly more cortical activity than were correct rejects and misses. That false alarms evoked more activity than misses indicates that activity in early visual cortex corresponded to the subjects' percepts, rather than to the physically presented stimulus.
Article
Full-text available
In this paper, we compare the performance of descriptors computed for local interest regions, as, for example, extracted by the Harris-Affine detector. Many different descriptors have been proposed in the literature. It is unclear which descriptors are more appropriate and how their performance depends on the interest region detector. The descriptors should be distinctive and at the same time robust to changes in viewing conditions as well as to errors of the detector. Our evaluation uses as criterion recall with respect to precision and is carried out for different image transformations. We compare shape context, steerable filters, PCA-SIFT, differential invariants, spin images, SIFT, complex filters, moment invariants, and cross-correlation for different types of interest regions. We also propose an extension of the SIFT descriptor and show that it outperforms the original method. Furthermore, we observe that the ranking of the descriptors is mostly independent of the interest region detector and that the SIFT-based descriptors perform best. Moments and steerable filters show the best performance among the low dimensional descriptors.
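Written out as code, the evaluation criterion amounts to counting correct and false matches; the following is a generic formulation of recall versus 1-precision, not the paper's evaluation code, and the numbers are placeholders.

```python
# Generic recall vs. 1-precision computation from match counts, as used when
# evaluating local descriptors. Numbers below are placeholder counts.
def recall(correct_matches, total_correspondences):
    return correct_matches / float(total_correspondences)

def one_minus_precision(false_matches, correct_matches):
    total = correct_matches + false_matches
    return false_matches / float(total) if total else 0.0

print(recall(320, 500), one_minus_precision(80, 320))
```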
Article
Full-text available
We introduce a new general framework for the recognition of complex visual scenes, which is motivated by biology: We describe a hierarchical system that closely follows the organization of visual cortex and builds an increasingly complex and invariant feature representation by alternating between a template matching and a maximum pooling operation. We demonstrate the strength of the approach on a range of recognition tasks: From invariant single object recognition in clutter to multiclass categorization problems and complex scene understanding tasks that rely on the recognition of both shape-based as well as texture-based objects. Given the biological constraints that the system had to satisfy, the approach performs surprisingly well: It has the capability of learning from only a few training examples and competes with state-of-the-art systems. We also discuss the existence of a universal, redundant dictionary of features that could handle the recognition of most object categories. In addition to its relevance for computer vision, the success of this approach suggests a plausibility proof for a class of feedforward models of object recognition in cortex.
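The alternation of template matching and max pooling can be caricatured in a few lines of NumPy/SciPy; the random filters and pooling size below are illustrative stand-ins, not the published model's parameters.

```python
# Toy sketch of the alternation the abstract describes: a template-matching
# ("S") stage via 2D convolution followed by a local max-pooling ("C") stage.
import numpy as np
from scipy.signal import convolve2d
from scipy.ndimage import maximum_filter

def s_layer(image, templates):
    """Template matching: one response map per template."""
    return [convolve2d(image, t, mode="same") for t in templates]

def c_layer(response_maps, pool_size=8):
    """Invariance by max pooling over a local neighbourhood."""
    return [maximum_filter(r, size=pool_size) for r in response_maps]

image = np.random.rand(64, 64)
templates = [np.random.rand(7, 7) - 0.5 for _ in range(4)]   # stand-in filters
features = c_layer(s_layer(image, templates))
print(len(features), features[0].shape)
```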
Conference Paper
Full-text available
This paper describes the design of a cognitive architecture for the iCub: an open-systems 53 degree-of-freedom cognitive humanoid robot. At 94 cm tall, the iCub is the same size as a three-year-old child and is designed to be able to crawl on all fours and sit up. Its hands will allow dexterous manipulation, its head and eyes are fully articulated, and it has visual, vestibular, auditory, and haptic sensory capabilities. We begin by reviewing briefly the enactive approach to cognition, highlighting the requirements for phylogenetic configuration, the necessity for ontogenetic development, and the importance of humanoid embodiment. After a short look at the iCub's mechanical and electronic specifications, we detail the iCub cognitive architecture, addressing the iCub phylogeny, i.e. the robot's intended innate abilities, and the modulation of these skills by circuits inspired by the functionality of the hippocampus, basal ganglia, and amygdala. The architecture also includes a prospective ability whereby sensorimotor behaviours can be simulated and then used to influence the action selection in the basal ganglia. We conclude by outlining our scenario for ontogenesis based on human neo-natal development.
Conference Paper
Full-text available
This paper presents a general trainable framework for object detection in static images of cluttered scenes. The detection technique we develop is based on a wavelet representation of an object class derived from a statistical analysis of the class instances. By learning an object class in terms of a subset of an overcomplete dictionary of wavelet basis functions, we derive a compact representation of an object class which is used as an input to a support vector machine classifier. This representation overcomes the problem of in-class variability and provides a low false detection rate in unconstrained environments. We demonstrate the capabilities of the technique in two domains whose inherent information content differs significantly. The first system is face detection and the second is the domain of people which, in contrast to faces, vary greatly in color, texture, and patterns. Unlike previous approaches, this system learns from examples and does not rely on any a priori (hand-crafted) models or motion-based segmentation. The paper also presents a motion-based extension to enhance the performance of the detection algorithm over video sequences. The results presented here suggest that this architecture may well be quite general.
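The general recipe (wavelet coefficients of an image window fed to a support vector machine) can be sketched as follows; the Haar basis, window size and placeholder data are assumptions for illustration, not the paper's overcomplete dictionary.

```python
# Hedged sketch of the general idea: represent image windows by Haar wavelet
# coefficients and feed them to an SVM classifier. Data here is random
# placeholder material; in practice windows would be labelled image patches.
import numpy as np
import pywt
from sklearn.svm import SVC

def wavelet_features(window):
    """Flatten one level of a 2D Haar wavelet decomposition into a vector."""
    cA, (cH, cV, cD) = pywt.dwt2(window, "haar")
    return np.hstack([c.ravel() for c in (cA, cH, cV, cD)])

# windows: equally sized grayscale patches; labels: 1 = object, 0 = background
windows = [np.random.rand(32, 32) for _ in range(100)]        # placeholder data
labels = np.random.randint(0, 2, size=100)

X = np.array([wavelet_features(w) for w in windows])
clf = SVC(kernel="rbf").fit(X, labels)
print("training accuracy:", clf.score(X, labels))
```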
Article
Full-text available
The graph-based Cartesian genetic programming system has an unusual genotype representation with a number of advantageous properties. It has a form of redundancy whose role has received little attention in the published literature. The representation has genes that can be activated or deactivated by mutation operators during evolution. It has been demonstrated that this "junk" has a useful role and is very beneficial in evolutionary search. The results presented demonstrate the role of mutation and genotype length in the evolvability of the representation. It is found that the most evolvable representations occur when the genotype is extremely large and over 95% of the genes are inactive.
Article
Full-text available
A visual attention system, inspired by the behavior and the neuronal architecture of the early primate visual system, is presented. Multiscale image features are combined into a single topographical saliency map. A dynamical neural network then selects attended locations in order of decreasing saliency. The system breaks down the complex problem of scene understanding by rapidly selecting, in a computationally efficient manner, conspicuous locations to be analyzed in detail.
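A much-reduced sketch of the centre-surround idea behind such saliency maps, using only the intensity channel; the pyramid depth and scale pairs are illustrative, and the colour and orientation channels of the full model are omitted.

```python
# Very reduced sketch of centre-surround saliency: build a Gaussian pyramid of
# the intensity channel, take centre-surround differences across scales, and
# sum the normalised maps. (Colour and orientation channels are omitted.)
import cv2
import numpy as np

img = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE).astype(np.float32) / 255.0
h, w = img.shape

pyramid = [img]
for _ in range(5):
    pyramid.append(cv2.pyrDown(pyramid[-1]))

saliency = np.zeros((h, w), np.float32)
for centre, surround in [(1, 3), (1, 4), (2, 4), (2, 5)]:
    c = cv2.resize(pyramid[centre], (w, h))
    s = cv2.resize(pyramid[surround], (w, h))
    saliency += np.abs(c - s)

saliency = cv2.normalize(saliency, None, 0.0, 1.0, cv2.NORM_MINMAX)
```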
Article
We present a system for recognizing human faces from single images out of a large database containing one image per person. Faces are represented by labeled graphs, based on a Gabor wavelet transform. Image graphs of new faces are extracted by an elastic graph matching process and can be compared by a simple similarity function. The system differs from the preceding one in three respects. Phase information is used for accurate node positioning. Object-adapted graphs are used to handle large rotations in depth. Image graph extraction is based on a novel data structure, the bunch graph, which is constructed from a small set of sample image graphs.
Article
We present a "curious" active vision system for a humanoid robot that autonomously explores its environment and learns object representations without any human assistance. Similar to an infant, who is intrinsically motivated to seek out new information, our system is endowed with an attention and learning mechanism designed to search for new information that has not been learned yet. Our method can deal with dynamic changes of object appearance which are incorporated into the object models. Our experiments demonstrate improved learning speed and accuracy through curiosity-driven learning.
Conference Paper
We introduce a fully autonomous active vision system that explores its environment and learns visual representations of objects in the scene. The system design is motivated by the fact that infants learn internal representations of the world without much human assistance. Inspired by this, we build a curiosity driven system that is drawn towards locations in the scene that provide the highest potential for learning. In particular, the attention on a stimulus in the scene is related to the improvement in its internal model. This makes the system learn dynamic changes of object appearance in a cumulative fashion. We also introduce a self-correction mechanism in the system that rectifies situations where several distinct models have been learned for the same object or a single model has been learned for adjacent objects. We demonstrate through experiments that the curiosity-driven learning leads to a higher learning speed and improved accuracy.
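A toy sketch of the underlying idea, attention allocated to the stimulus whose internal model is improving the most; the learning-progress measure below is a generic illustration, not the system's actual mechanism.

```python
# Toy illustration of curiosity-driven attention: attend to the object whose
# internal model shows the largest recent improvement (learning progress).
import numpy as np

class ObjectModel:
    def __init__(self):
        self.errors = [1.0]            # history of prediction errors

    def update(self, error):
        self.errors.append(error)

    def learning_progress(self, window=3):
        e = self.errors[-window:]
        return max(0.0, e[0] - e[-1])  # drop in error over the recent window

def select_stimulus(models):
    """Pick the index of the object model with the highest learning progress."""
    return int(np.argmax([m.learning_progress() for m in models]))
```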
Conference Paper
This paper gives an overview of ROS, an open-source robot operating system. ROS is not an operating system in the traditional sense of process management and scheduling; rather, it provides a structured communications layer above the host operating systems of a heterogeneous compute cluster. In this paper, we discuss how ROS relates to existing robot software frameworks, and briefly overview some of the available application software which uses ROS.
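A minimal rospy (ROS 1) sketch of the publish/subscribe communication layer the paper describes; node and topic names are illustrative.

```python
# Minimal rospy sketch (ROS 1): one node publishing detections as strings and
# a callback subscribing to them. Topic and node names are illustrative.
import rospy
from std_msgs.msg import String

def callback(msg):
    rospy.loginfo("received: %s", msg.data)

rospy.init_node("detection_demo")
pub = rospy.Publisher("/detections", String, queue_size=10)
sub = rospy.Subscriber("/detections", String, callback)

rate = rospy.Rate(1)  # 1 Hz
while not rospy.is_shutdown():
    pub.publish(String(data="cup at (0.3, 0.1, 0.2)"))
    rate.sleep()
```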
Chapter
In this chapter, we describe the original and most widely known form of Cartesian genetic programming (CGP). CGP encodes computational structures, which we call 'programs', in the form of directed acyclic graphs. We refer to this as 'classic' CGP. However, these programs may be computer programs, circuits, rules, or other specialized computational entities.
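A compact sketch of how a classic CGP genotype can be decoded and evaluated; the function set and genes below are illustrative, but the feed-forward indexing and the distinction between active and inactive nodes follow the standard formulation.

```python
# Compact sketch of a 'classic' CGP genotype and its decoding: each node gene
# is (function index, input index, input index); only nodes reachable from the
# output are active, which is where the representation's redundancy comes from.
import operator

FUNCTIONS = [operator.add, operator.sub, operator.mul, lambda a, b: max(a, b)]

def evaluate(genotype, output_gene, inputs):
    """genotype: list of (f, a, b) node genes; indices address inputs first,
    then earlier nodes (a feed-forward, acyclic wiring)."""
    values = list(inputs)
    for f, a, b in genotype:
        values.append(FUNCTIONS[f](values[a], values[b]))
    return values[len(inputs) + output_gene]

# Two inputs, three nodes; only the nodes feeding the output are 'active'.
genotype = [(0, 0, 1),   # node 2: x0 + x1
            (2, 2, 1),   # node 3: node2 * x1
            (1, 0, 0)]   # node 4: x0 - x0  (inactive, never referenced)
print(evaluate(genotype, output_gene=1, inputs=[3.0, 4.0]))  # (3+4)*4 = 28.0
```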
Article
Predictions of the secondary structure of T4 phage lysozyme, made by a number of investigators on the basis of the amino acid sequence, are compared with the structure of the protein determined experimentally by X-ray crystallography. Within the amino terminal half of the molecule the locations of helices predicted by a number of methods agree moderately well with the observed structure, however within the carboxyl half of the molecule the overall agreement is poor. For eleven different helix predictions, the coefficients giving the correlation between prediction and observation range from 0.14 to 0.42. The accuracy of the predictions for both beta-sheet regions and for turns are generally lower than for the helices, and in a number of instances the agreement between prediction and observation is no better than would be expected for a random selection of residues. The structural predictions for T4 phage lysozyme are much less successful than was the case for adenylate kinase (Schulz et al. (1974) Nature 250, 140-142). No one method of prediction is clearly superior to all others, and although empirical predictions based on larger numbers of known protein structure tend to be more accurate than those based on a limited sample, the improvement in accuracy is not dramatic, suggesting that the accuracy of current empirical predictive methods will not be substantially increased simply by the inclusion of more data from additional protein structure determinations.
Conference Paper
We present an approach for learning to detect objects in still gray images, that is based on a sparse, part-based representation of objects. A vocabulary of information-rich object parts is automatically constructed from a set of sample images of the object class of interest. Images are then represented using parts from this vocabulary, along with spatial relations observed among them. Based on this representation, a feature-efficient learning algorithm is used to learn to detect instances of the object class. The framework developed can be applied to any object with distinguishable parts in a relatively fixed spatial configuration. We report experiments on images of side views of cars. Our experiments show that the method achieves high detection accuracy on a difficult test set of real-world images, and is highly robust to partial occlusion and background variation. In addition, we discuss and offer solutions to several methodological issues that are significant for the research community to be able to evaluate object detection approaches.
Article
The development of robotic cognition and the advancement of understanding of human cognition form two of the current greatest challenges in robotics and neuroscience, respectively. The RobotCub project aims to develop an embodied robotic child (iCub) with the physical (height 90 cm and mass less than 23 kg) and, ultimately, cognitive abilities of a 2.5-year-old human child. The iCub will be a freely available open system which can be used by scientists in all cognate disciplines from developmental psychology to epigenetic robotics to enhance understanding of cognitive systems through the study of cognitive development. The iCub will be open in software and, more importantly, in all aspects of the hardware and mechanical design. In this paper the design of the mechanisms and structures forming the basic 'body' of the iCub is described. The paper considers kinematic structures, dynamic design criteria, actuator specification and selection, and detailed mechanical and electronic design. The paper concludes with tests of the performance of sample joints, and a comparison of these results with the design requirements and simulations.
Article
We describe a humanoid robot platform--the iCub--which was designed to support collaborative research in cognitive development through autonomous exploration and social interaction. The motivation for this effort is the conviction that significantly greater impact can be leveraged by adopting an open systems policy for software and hardware development. This creates the need for a robust humanoid robot that offers rich perceptuo-motor capabilities with many degrees of freedom, a cognitive capacity for learning and development, a software architecture that encourages reuse & easy integration, and a support infrastructure that fosters collaboration and sharing of resources. The iCub satisfies all of these needs in the guise of an open-system platform which is freely available and which has attracted a growing community of users and developers. To date, twenty iCubs each comprising approximately 5000 mechanical and electrical parts have been delivered to several research labs in Europe and to one in the USA.
Article
1. Using the two-dimensional (2D) spatial and spectral response profiles described in the previous two reports, we test Daugman's generalization of Marcelja's hypothesis that simple receptive fields belong to a class of linear spatial filters analogous to those described by Gabor and referred to here as 2D Gabor filters. 2. In the space domain, we found 2D Gabor filters that fit the 2D spatial response profile of each simple cell in the least-squared error sense (with a simplex algorithm), and we show that the residual error is devoid of spatial structure and statistically indistinguishable from random error. 3. Although a rigorous statistical approach was not possible with our spectral data, we also found a Gabor function that fit the 2D spectral response profile of each simple cell and observed that the residual errors are everywhere small and unstructured. 4. As an assay of spatial linearity in two dimensions, on which the applicability of Gabor theory is dependent, we compare the filter parameters estimated from the independent 2D spatial and spectral measurements described above. Estimates of most parameters from the two domains are highly correlated, indicating that assumptions about spatial linearity are valid. 5. Finally, we show that the functional form of the 2D Gabor filter provides a concise mathematical expression, which incorporates the important spatial characteristics of simple receptive fields demonstrated in the previous two reports. Prominent here are 1) Cartesian separable spatial response profiles, 2) spatial receptive fields with staggered subregion placement, 3) Cartesian separable spectral response profiles, 4) spectral response profiles with axes of symmetry not including the origin, and 5) the uniform distribution of spatial phase angles. 6. We conclude that the Gabor function provides a useful and reasonably accurate description of most spatial aspects of simple receptive fields. Thus it seems that an optimal strategy has evolved for sampling images simultaneously in the 2D spatial and spatial frequency domains.
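A 2D Gabor filter of the kind fitted to these receptive fields can be generated directly with OpenCV; the parameter values below are illustrative, not fitted values, and the input file name is a placeholder.

```python
# The 2D Gabor model of a simple-cell receptive field, here generated with
# OpenCV and applied to an image; parameter values are illustrative.
import cv2
import numpy as np

kernel = cv2.getGaborKernel(ksize=(31, 31), sigma=5.0, theta=np.pi / 4,
                            lambd=10.0, gamma=0.5, psi=0.0)

img = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)
response = cv2.filter2D(img, ddepth=cv2.CV_32F, kernel=kernel)
print("peak response:", float(response.max()))
```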
Article
Adjacent simple cells recorded and "isolated" simultaneously from the same microelectrode placement were usually tuned to the same orientation and spatial frequency. The responses of the members of these "spatial frequency pairs" to drifting sine-wave gratings were cross-correlated. Within the middle range of the spatial frequency selectivity curves, the responses of the paired cells differed in phase by approximately 90 degrees. This phase relationship suggests that adjacent simple cells tuned to the same spatial frequency and orientation represent paired sine and cosine filters in terms of their processing of afferent spatial inputs and truncated sine and cosine filters in terms of the output of simple cells.
Conference Paper
In this paper, a new recursive neural network model, able to process directed acyclic graphs with labeled edges, is introduced, in order to address the problem of object detection in images. In fact, detection is a preliminary step in any object recognition system. The proposed method assumes a graph-based representation of images that combines both spatial and visual features. In particular, after segmentation, an edge between two nodes stands for the adjacency relationship of two homogeneous regions, the edge label collects information on their relative positions, whereas node labels contain visual and geometric information on each region (area, color, texture, etc.). Such graphs are then processed by the recursive model in order to determine the possible presence and the position of objects inside the image. Some experiments on face detection, carried out on scenes acquired by an indoor camera, are reported, showing very promising results. The proposed technique is general and can be applied in different object detection systems, since it does not include any a priori knowledge on the particular problem.
Conference Paper
An object recognition system has been developed that uses a new class of local image features. The features are invariant to image scaling, translation, and rotation, and partially invariant to illumination changes and affine or 3D projection. These features share similar properties with neurons in inferior temporal cortex that are used for object recognition in primate vision. Features are efficiently detected through a staged filtering approach that identifies stable points in scale space. Image keys are created that allow for local geometric deformations by representing blurred image gradients in multiple orientation planes and at multiple scales. The keys are used as input to a nearest-neighbor indexing method that identifies candidate object matches. Final verification of each match is achieved by finding a low-residual least-squares solution for the unknown model parameters. Experimental results show that robust object recognition can be achieved in cluttered, partially occluded images with a computation time of under 2 seconds.
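In current OpenCV the descendant of this detector is available as SIFT; the sketch below detects keypoints and applies a nearest-neighbour match with Lowe's ratio test (the file names and the 0.75 ratio are conventional placeholders).

```python
# Sketch of the scale-invariant keypoint idea in practice: SIFT features
# (available in recent opencv-python releases) matched with a nearest-
# neighbour search and a ratio test to reject ambiguous matches.
import cv2

obj = cv2.imread("object.png", cv2.IMREAD_GRAYSCALE)
scene = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp_o, des_o = sift.detectAndCompute(obj, None)
kp_s, des_s = sift.detectAndCompute(scene, None)

matcher = cv2.BFMatcher(cv2.NORM_L2)
good = [m for m, n in matcher.knnMatch(des_o, des_s, k=2)
        if m.distance < 0.75 * n.distance]
print("good matches:", len(good))
```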
Article
This survey presents an overview of the autonomous development of mental capabilities in computational agents. It does so based on a characterization of cognitive systems as systems which exhibit adaptive, anticipatory, and purposive goal-directed behavior. We present a broad survey of the various paradigms of cognition, addressing cognitivist (physical symbol systems) approaches, emergent systems approaches encompassing connectionist, dynamical, and enactive systems, and also efforts to combine the two in hybrid systems. We then review several cognitive architectures drawn from these paradigms. In each of these areas, we highlight the implications and attendant problems of adopting a developmental approach, both from phylogenetic and ontogenetic points of view. We conclude with a summary of the key architectural features that systems capable of autonomous development of mental capabilities should exhibit.