Giorgio Metta

Italian Institute of Technology (IIT), Genova, Liguria, Italy

Publications (279) · 110 Total impact

  • ABSTRACT: Human expertise in face perception grows over development, but even within minutes of birth, infants exhibit an extraordinary sensitivity to face-like stimuli. The dominant theory accounts for innate face detection by proposing that the neonate brain contains an innate face detection device, dubbed 'Conspec'. Newborn face preference has been promoted as some of the strongest evidence for innate knowledge, and forms a canonical stage for the modern form of the nature-nurture debate in psychology. Interpretation of newborn face preference results has concentrated on monocular stimulus properties, with little mention or focused investigation of potential binocular involvement. However, the question of whether and how newborns integrate the binocular visual streams bears directly on the generation of observable visual preferences. In this theoretical paper, we employ a synthetic approach utilizing robotic and computational models to draw together the threads of binocular integration and face preference in newborns, and demonstrate cases where the former may explain the latter. We suggest that a system-level view considering the binocular embodiment of newborn vision may offer a mutually satisfying resolution to some long-running arguments in the polarizing debate surrounding the existence and causal structure of newborns' 'innate knowledge' of faces.
    Developmental Science 06/2014
  • ABSTRACT: Action perception and recognition are core abilities fundamental for human social interaction. A parieto-frontal network (the mirror neuron system) matches visually presented biological motion information onto observers' motor representations. This process of matching the actions of others onto our own sensorimotor repertoire is thought to be important for action recognition, providing a non-mediated "motor perception" based on a bidirectional flow of information along the mirror parieto-frontal circuits. State-of-the-art machine learning strategies for hand action identification have shown better performance when sensorimotor data, as opposed to visual information only, are available during learning. As speech is a particular type of action (with acoustic targets), it is expected to activate a mirror neuron mechanism. Indeed, in speech perception, motor centers have been shown to be causally involved in the discrimination of speech sounds. In this paper, we review recent neurophysiological and machine learning-based studies showing (a) the specific contribution of the motor system to speech perception and (b) that automatic phone recognition is significantly improved when motor data are used during training of classifiers (as opposed to learning from purely auditory data).
    Topics in Cognitive Science 06/2014; · 2.88 Impact Factor
  • ABSTRACT: In this paper we propose a weighted supervised pooling method for visual recognition systems. We combine a standard Spatial Pyramid Representation, which is commonly adopted to encode spatial information, with a Feature Space Representation favoring semantic information in an appropriate feature space. For the latter, we propose a weighted pooling strategy exploiting data supervision to weigh each local descriptor coherently with its likelihood of belonging to a given object class. The two representations are then combined adaptively with Multiple Kernel Learning. Experiments on common benchmarks (Caltech-256 and PASCAL VOC-2007) show that our image representation improves the current visual recognition pipeline and is competitive with similar state-of-the-art pooling methods. We also evaluate our method in a real Human-Robot Interaction setting, where the pure Spatial Pyramid Representation does not provide sufficient discriminative power, obtaining a remarkable improvement.
    IEEE Conference on Computer Vision and Pattern Recognition (CVPR); 06/2014
  • ABSTRACT: Calibration continues to receive significant attention in robotics because of its key impact on performance and cost associated with the operation of complex robots. Calibration of kinematic parameters is typically the first mandatory step. To this end, a variety of metrology systems and corresponding algorithms have been described in the literature, relying on measurements of the pose of the end-effector using a camera or laser tracking system, or exploiting constraints arising from contacts of the end-effector with the environment. In this work, we take inspiration from the behavior of infants and certain animals, who are believed to use self-stimulation or self-touch to "calibrate" their body representations, and present a new solution to this problem by letting the robot close the kinematic chain by touching its own body. The robot considered in this paper is sensorized with tactile arrays for a total of about 4200 sensing points. The correspondence between the predicted contact point from existing forward kinematics and the actual position on the robot's 'skin' provides sample data that allows refining the kinematic representation (DH parameters). The data collection procedure is automated (self-touch is autonomously executed by the robot) and can be repeated at any time, providing a compact self-calibration system that does not require an external measurement apparatus.
    Proc. IEEE Int. Conf. Robotics and Automation (ICRA), Hong Kong, China; 06/2014
  • ABSTRACT: This paper presents a new technique to control highly redundant mechanical systems, such as humanoid robots. We take inspiration from two approaches. Prioritized control is a widespread multi-task technique in robotics and animation: tasks have strict priorities and they are satisfied only as long as they do not conflict with any higher-priority task. Optimal control instead formulates an optimization problem whose solution is either a feedback control policy or a feedforward trajectory of control inputs. We introduce strict priorities in multi-task optimal control problems, as an alternative to weighting task errors proportionally to their importance. This ensures that the specified priorities are respected, while avoiding numerical conditioning issues. We compared our approach with both prioritized control and optimal control with tests on a simulated robot with 11 degrees of freedom.
    IEEE International Conference on Robotics and Automation (ICRA), Hong Kong, China; 05/2014
  • ABSTRACT: In this paper we propose an autoencoder-based method for the unsupervised identification of subword units. We experiment with different types and architectures of autoencoders to assess which autoencoder properties are most important for this task. We first show that the encoded representation of speech produced by standard autoencoders is more effective than Gaussian posteriorgrams in a spoken query classification task. Finally, we evaluate the subword inventories produced by the proposed method both in terms of classification accuracy in a word classification task (with lexicon size up to 263 words) and in terms of consistency between subword transcriptions of different examples of the same word type. The evaluation is carried out on Italian and American English datasets.
    IEEE International Conference on Acoustics, Speech and Language Processing; 05/2014
  • ABSTRACT: It is well known that image representations learned through ad-hoc dictionaries improve the overall results in object categorization problems. Following the widely accepted coding-pooling visual recognition pipeline, these representations are often tightly coupled with a coding stage. In this paper we show how to exploit ad-hoc representations both within the coding and the pooling phases. We learn a dictionary for each object class and then use local descriptors encoded with the learned atoms to guide the pooling operator. We exhaustively evaluate the proposed approach in both single instance object recognition and object categorization problems. From the applications standpoint we consider a classical image retrieval scenario with the Caltech 101 dataset, as well as a typical robot vision task with data acquired by the iCub humanoid robot.
    VISAPP; 01/2014
  • ABSTRACT: We present a new method for three-finger precision grasp and its implementation in a complete grasping toolchain. We start from binocular vision to recover the partial 3D structure of unknown objects. We then process the incomplete 3D point clouds searching for good triplets according to a function that weighs both the feasibility and the stability of the solution. In particular, while stability is determined in a classical way (i.e. via force-closure), feasibility is evaluated according to a new measure that includes information about the possible configuration shapes of the hand as well as the hand’s inverse kinematics. Finally, we extensively assess the proposed method using the stereo vision and the kinematics of the iCub robot.
    Proceedings - IEEE International Conference on Robotics and Automation; 01/2014
  • ABSTRACT: This paper describes a developmental framework for action-driven perception in anthropomorphic robots. The key idea of the framework is that action generation develops the agent's perception of its own body and actions. Action-driven development is critical for identifying changing body parts and understanding the effects of actions in unknown or nonstationary environments. We embedded minimal knowledge into the robot's cognitive system in the form of motor synergies and actions to allow motor exploration. The robot voluntarily generates actions and develops the ability to perceive its own body and the effect that it generates on the environment. In addition, the robot can compose these learned primitives to perform complex actions and characterize them in terms of their sensory effects. After learning, the robot can recognize manipulative human behaviors with cross-modal anticipation for recovery of an unavailable sensory modality, and reproduce the recognized actions afterward. We evaluated the proposed framework in experiments with a real robot. In the experiments, we achieved autonomous body identification, learning of fixation, reaching and grasping actions, and developmental recognition of human actions as well as their reproduction.
    IEEE Transactions on Neural Networks and Learning Systems 01/2014; 25(1):183-202. · 3.77 Impact Factor
  • Nicholas M Wilkinson, Giorgio Metta
    ABSTRACT: Visual scan paths exhibit complex, stochastic dynamics. Even during visual fixation, the eye is in constant motion. Fixational drift and tremor are thought to reflect fluctuations in the persistent neural activity of neural integrators in the oculomotor brainstem, which integrate sequences of transient saccadic velocity signals into a short term memory of eye position. Despite intensive research and much progress, the precise mechanisms by which oculomotor posture is maintained remain elusive. Drift exhibits a stochastic statistical profile which has been modeled using random walk formalisms. Tremor is widely dismissed as noise. Here we focus on the dynamical profile of fixational tremor, and argue that tremor may be a signal which usefully reflects the workings of oculomotor postural control. We identify signatures reminiscent of a certain flavor of transient neurodynamics; toric traveling waves which rotate around a central phase singularity. Spiral waves play an organizational role in dynamical systems at many scales throughout nature, though their potential functional role in brain activity remains a matter of educated speculation. Spiral waves have a repertoire of functionally interesting dynamical properties, including persistence, which suggest that they could in theory contribute to persistent neural activity in the oculomotor postural control system. Whilst speculative, the singularity hypothesis of oculomotor postural control implies testable predictions, and could provide the beginnings of an integrated dynamical framework for eye movements across scales.
    Frontiers in Systems Neuroscience 01/2014; 8:29.
  • 13th International Conference on Intelligent Autonomous Systems; 01/2014
  • Nicholas Wilkinson, Giorgio Metta
    ABSTRACT: Empirical studies have revealed remarkable perceptual organization in neonates. Newborn behavioral distinctions have often been interpreted as implying functionally specific modular adaptations, and are widely cited as evidence supporting the nativist agenda. In this theoretical paper, we approach newborn perception and attention from an embodied, developmental perspective. At the mechanistic level, we argue that a generative mechanism based on mutual gain control between bilaterally corresponding points may underlie a number of functionally defined "innate predispositions" related to spatial-configural perception. At the computational level, bilateral gain control implements beamforming, which enables spatial-configural tuning at the front end sampling stage. At the psychophysical level, we predict that selective attention in newborns will favor contrast energy which projects to bilaterally corresponding points on the neonate subject's sensor array. The current work extends and generalizes previous work to formalize the bilateral correlation model of newborn attention at a high level, and demonstrates in minimal agent-based simulations how bilateral gain control can enable a simple, robust and "social" attentional bias.
    Frontiers in Neurorobotics 01/2014; 8:9.
  • ABSTRACT: Recent developments in learning sophisticated, hierarchical image representations have led to remarkable progress in the context of visual recognition. While these methods are becoming standard in modern computer vision systems, they are rarely adopted in robotics. The question arises of whether solutions that have been primarily developed for image retrieval can perform well in more dynamic and unstructured scenarios. In this paper we tackle this question by performing an extensive evaluation of state-of-the-art methods for visual recognition on an iCub robot. We consider the problem of classifying 15 different objects shown by a human demonstrator in a challenging Human-Robot Interaction scenario. Hierarchical learning approaches are shown to outperform benchmark solutions based on local descriptors and template matching in classification performance. Our results show that hierarchical learning systems are computationally efficient and can be used for real-time training and recognition of objects.
    IEEE International Conference on Intelligent Robots and Systems (IROS); 10/2013
  • ABSTRACT: In this paper, we propose a novel technological approach for the implementation of large-area flexible artificial skin based on arrays of piezoelectric polymer transducers. Polyvinylidene fluoride (PVDF) transducers are chosen for the high electromechanical transduction frequency bandwidth (up to 1 kHz). A low-cost and scalable technique for extracting PVDF signals is used to directly provide the piezoelectric film with patterned electrodes. If the skin is meant to cover large areas of a robot body, specific requirements have to be fulfilled from the point of view of the overall system and of the technology. Experimental tests on the prototype skin modules demonstrate the feasibility of the proposed approach and reveal the potential to build large-area flexible skin.
    IEEE Sensors Journal 10/2013; 13(10):4022-4029. · 1.48 Impact Factor

Publication Stats

3k Citations
110.00 Total Impact Points


  • 2006–2013
    • Italian Institute of Technology (IIT)
      • iCub Facility
      • Department of Robotics, Brain and Cognitive Sciences
      Genova, Liguria, Italy
  • 2012
    • Idiap Research Institute
      Martigny, Valais, Switzerland
  • 2007–2012
    • University of Ferrara
      • Sezione di Fisiologia Umana
      Ferrara, Emilia-Romagna, Italy
    • University of Sharjah
      Ash Shāriqah, Ash Shāriqah, United Arab Emirates
    • Università degli Studi di Trento
      Trento, Trentino-Alto Adige, Italy
  • 2011
    • Osaka University
      • Department of Systems Innovation
      Ōsaka-shi, Osaka-fu, Japan
    • University of Plymouth
      • Adaptive Behaviour and Cognition Laboratory
      Plymouth, England, United Kingdom
  • 1995–2011
    • Università degli Studi di Genova
      • Dipartimento di Matematica (DIMA)
      Genova, Liguria, Italy
  • 2009
    • Delft University of Technology
      Delft, South Holland, Netherlands
  • 2008
    • Khalifa University
      Abū Z̧aby, Abu Dhabi, United Arab Emirates
    • Università del Salento
      • Interdisciplinary Center for Research on Language (CRIL)
      Lecce, Apulia, Italy
  • 2006–2007
    • University of Salford
      • School of Computing, Science and Engineering
      Salford, England, United Kingdom
  • 2004
    • Democritus University of Thrace
      Komotini, East Macedonia and Thrace, Greece
  • 2002–2003
    • Massachusetts Institute of Technology
      Cambridge, Massachusetts, United States
  • 2000
    • Collège de France
      Paris, Île-de-France, France