Giorgio Metta

Italian Institute of Technology (IIT), Genova, Liguria, Italy

Publications (296) · 134.79 Total impact

  • ABSTRACT: This paper proposes a learning from demonstration system based on a motion feature, called phase transfer sequence. The system aims to synthesize the knowledge on humanoid whole body motions learned during teacher-supported interactions, and apply this knowledge during different physical interactions between a robot and its surroundings. The phase transfer sequence represents the temporal order of the changing points in multiple time sequences. It encodes the dynamical aspects of the sequences so as to absorb the gaps in timing and amplitude derived from interaction changes. The phase transfer sequence was evaluated in reinforcement learning of sitting-up and walking motions conducted by a real humanoid robot and compatible simulator. In both tasks, the robotic motions were less dependent on physical interactions when learned by the proposed feature than by conventional similarity measurements. Phase transfer sequence also enhanced the convergence speed of motion learning. Our proposed feature is original primarily because it absorbs the gaps caused by changes of the originally acquired physical interactions, thereby enhancing the learning speed in subsequent interactions.
    IEEE Transactions on Neural Networks and Learning Systems 07/2014; · 4.37 Impact Factor
  • ABSTRACT: Human expertise in face perception grows over development, but even within minutes of birth, infants exhibit an extraordinary sensitivity to face-like stimuli. The dominant theory accounts for innate face detection by proposing that the neonate brain contains an innate face detection device, dubbed 'Conspec'. Newborn face preference has been promoted as some of the strongest evidence for innate knowledge, and forms a canonical stage for the modern form of the nature-nurture debate in psychology. Interpretation of newborn face preference results has concentrated on monocular stimulus properties, with little mention or focused investigation of potential binocular involvement. However, the question of whether and how newborns integrate the binocular visual streams bears directly on the generation of observable visual preferences. In this theoretical paper, we employ a synthetic approach utilizing robotic and computational models to draw together the threads of binocular integration and face preference in newborns, and demonstrate cases where the former may explain the latter. We suggest that a system-level view considering the binocular embodiment of newborn vision may offer a mutually satisfying resolution to some long-running arguments in the polarizing debate surrounding the existence and causal structure of newborns' 'innate knowledge' of faces.
    Developmental Science 06/2014; · 3.89 Impact Factor
  • Developmental Science 06/2014; · 3.89 Impact Factor
  • ABSTRACT: Action perception and recognition are core abilities fundamental for human social interaction. A parieto-frontal network (the mirror neuron system) matches visually presented biological motion information onto observers' motor representations. This process of matching the actions of others onto our own sensorimotor repertoire is thought to be important for action recognition, providing a non-mediated "motor perception" based on a bidirectional flow of information along the mirror parieto-frontal circuits. State-of-the-art machine learning strategies for hand action identification have shown better performances when sensorimotor data, as opposed to visual information only, are available during learning. As speech is a particular type of action (with acoustic targets), it is expected to activate a mirror neuron mechanism. Indeed, in speech perception, motor centers have been shown to be causally involved in the discrimination of speech sounds. In this paper, we review recent neurophysiological and machine learning-based studies showing (a) the specific contribution of the motor system to speech perception and (b) that automatic phone recognition is significantly improved when motor data are used during training of classifiers (as opposed to learning from purely auditory data).
    Topics in Cognitive Science 06/2014; 6(3):461-475. · 2.88 Impact Factor
  • ABSTRACT: This article presents results from a multidisciplinary research project on the integration and transfer of language knowledge into robots as an empirical paradigm for the study of language development in both humans and humanoid robots. Within the framework of human linguistic and cognitive development, we focus on how three central types of learning interact and co-develop: individual learning about one's own embodiment and the environment, social learning (learning from others), and learning of linguistic capability. Our primary concern is how these capabilities can scaffold each other's development in a continuous feedback cycle as their interactions yield increasingly sophisticated competencies in the agent's capacity to interact with others and manipulate its world. Experimental results are summarized in relation to milestones in human linguistic and cognitive development and show that the mutual scaffolding of social learning, individual learning, and linguistic capabilities creates the context, conditions, and requisites for learning in each domain. Challenges and insights identified as a result of this research program are discussed with regard to possible and actual contributions to cognitive science and language ontogeny. In conclusion, directions for future work are suggested that continue to develop this approach toward an integrated framework for understanding these mutually scaffolding processes as a basis for language development in humans and robots.
    Topics in Cognitive Science 06/2014; · 2.88 Impact Factor
  • ABSTRACT: Calibration continues to receive significant attention in robotics because of its key impact on performance and cost associated with the operation of complex robots. Calibration of kinematic parameters is typically the first mandatory step. To this end, a variety of metrology systems and corresponding algorithms have been described in the literature, relying on measurements of the pose of the end-effector using a camera or laser tracking system, or exploiting constraints arising from contacts of the end-effector with the environment. In this work, we take inspiration from the behavior of infants and certain animals, who are believed to use self-stimulation or self-touch to "calibrate" their body representations, and present a new solution to this problem by letting the robot close the kinematic chain by touching its own body. The robot considered in this paper is sensorized with tactile arrays for a total of about 4200 sensing points. The correspondence between the predicted contact point from existing forward kinematics and the actual position on the robot's 'skin' provides sample data that allows refining the kinematic representation (DH parameters). The data collection procedure is automated (self-touch is autonomously executed by the robot) and can be repeated at any time, providing a compact self-calibration system that does not require an external measurement apparatus.
    Proc. IEEE Int. Conf. Robotics and Automation (ICRA), Hong Kong, China; 06/2014
  • ABSTRACT: In this paper we propose a weighted supervised pooling method for visual recognition systems. We combine a standard Spatial Pyramid Representation, which is commonly adopted to encode spatial information, with an appropriate Feature Space Representation favoring semantic information in an appropriate feature space. For the latter, we propose a weighted pooling strategy exploiting data supervision to weigh each local descriptor coherently with its likelihood to belong to a given object class. The two representations are then combined adaptively with Multiple Kernel Learning. Experiments on common benchmarks (Caltech-256 and PASCAL VOC-2007) show that our image representation improves the current visual recognition pipeline and is competitive with similar state-of-the-art pooling methods. We also evaluate our method in a real Human-Robot Interaction setting, where the pure Spatial Pyramid Representation does not provide sufficient discriminative power, obtaining a remarkable improvement.
    IEEE Conference on Computer Vision and Pattern Recognition (CVPR); 06/2014
  • ABSTRACT: This paper presents a new technique to control highly redundant mechanical systems, such as humanoid robots. We take inspiration from two approaches. Prioritized control is a widespread multi-task technique in robotics and animation: tasks have strict priorities and they are satisfied only as long as they do not conflict with any higher-priority task. Optimal control instead formulates an optimization problem whose solution is either a feedback control policy or a feedforward trajectory of control inputs. We introduce strict priorities in multi-task optimal control problems, as an alternative to weighting task errors proportionally to their importance. This ensures the respect of the specified priorities, while avoiding numerical conditioning issues. We compared our approach with both prioritized control and optimal control with tests on a simulated robot with 11 degrees of freedom.
    Robotics and Automation, IEEE International Conference on (ICRA), Hong Kong, China; 05/2014
  • ABSTRACT: In this paper we propose an autoencoder-based method for the unsupervised identification of subword units. We experiment with different types and architectures of autoencoders to assess what autoencoder properties are most important for this task. We first show that the encoded representation of speech produced by standard autoencoders is more effective than Gaussian posteriorgrams in a spoken query classification task. Finally, we evaluate the subword inventories produced by the proposed method both in terms of classification accuracy in a word classification task (with lexicon size up to 263 words) and in terms of consistency between subword transcriptions of different word examples of the same word type. The evaluation is carried out on Italian and American English datasets.
    IEEE International Conference on Acoustics, Speech and Language Processing; 05/2014
  • ABSTRACT: It is well known that image representations learned through ad-hoc dictionaries improve the overall results in object categorization problems. Following the widely accepted coding-pooling visual recognition pipeline, these representations are often tightly coupled with a coding stage. In this paper we show how to exploit ad-hoc representations both within the coding and the pooling phases. We learn a dictionary for each object class and then use local descriptors encoded with the learned atoms to guide the pooling operator. We exhaustively evaluate the proposed approach in both single instance object recognition and object categorization problems. From the applications standpoint we consider a classical image retrieval scenario with the Caltech 101, as well as a typical robot vision task with data acquired by the iCub humanoid robot.
    VISAPP; 01/2014
  • ABSTRACT: We present a new method for three-finger precision grasp and its implementation in a complete grasping toolchain. We start from binocular vision to recover the partial 3D structure of unknown objects. We then process the incomplete 3D point clouds searching for good triplets according to a function that weighs both the feasibility and the stability of the solution. In particular, while stability is determined in a classical way (i.e. via force-closure), feasibility is evaluated according to a new measure that includes information about the possible configuration shapes of the hand as well as the hand’s inverse kinematics. We finally extensively assess the proposed method using the stereo vision and the kinematics of the iCub robot.
    Proceedings - IEEE International Conference on Robotics and Automation; 01/2014
  • ABSTRACT: Motor resonance mechanisms are known to affect humans' ability to interact with others, yielding the kind of "mutual understanding" that is the basis of social interaction. However, it remains unclear how the partner's action features combine or compete to promote or prevent motor resonance during interaction. To clarify this point, the present study tested whether and how the nature of the visual stimulus and the properties of the observed actions influence the observer's motor response, motor contagion being one of the behavioral manifestations of motor resonance. Participants observed a humanoid robot and a human agent move their hands into a pre-specified final position or put an object into a container at various velocities. Their movements, in both the object-directed and non-object-directed conditions, were characterized by either a smooth/curvilinear or a jerky/segmented trajectory. These trajectories were covered with biological or non-biological kinematics (the latter only by the humanoid robot). After action observation, participants were requested to either reach the indicated final position or to transport a similar object into another container. Results showed that motor contagion appeared for both interactive partners except when the humanoid robot violated the biological laws of motion. These findings suggest that the observer may transiently match his/her own motor repertoire to that of the observed agent. This matching might mediate the activation of motor resonance, and modulate the spontaneity and the pleasantness of the interaction, whatever the nature of the communication partner.
    PLoS ONE 01/2014; 9(8):e106172. · 3.53 Impact Factor
  • 01/2014;
  • ABSTRACT: This paper describes a developmental framework for action-driven perception in anthropomorphic robots. The key idea of the framework is that action generation develops the agent's perception of its own body and actions. Action-driven development is critical for identifying changing body parts and understanding the effects of actions in unknown or nonstationary environments. We embedded minimal knowledge into the robot's cognitive system in the form of motor synergies and actions to allow motor exploration. The robot voluntarily generates actions and develops the ability to perceive its own body and the effect that it generates on the environment. The robot, in addition, can compose these learned primitives to perform complex actions and characterize them in terms of their sensory effects. After learning, the robot can recognize manipulative human behaviors with cross-modal anticipation for recovery of unavailable sensory modality, and reproduce the recognized actions afterward. We evaluated the proposed framework in experiments with a real robot. In the experiments, we achieved autonomous body identification, learning of fixation, reaching and grasping actions, and developmental recognition of human actions as well as their reproduction.
    IEEE Transactions on Neural Networks and Learning Systems 01/2014; 25(1):183-202. · 4.37 Impact Factor
  • Nicholas M Wilkinson, Giorgio Metta
    ABSTRACT: Visual scan paths exhibit complex, stochastic dynamics. Even during visual fixation, the eye is in constant motion. Fixational drift and tremor are thought to reflect fluctuations in the persistent neural activity of neural integrators in the oculomotor brainstem, which integrate sequences of transient saccadic velocity signals into a short term memory of eye position. Despite intensive research and much progress, the precise mechanisms by which oculomotor posture is maintained remain elusive. Drift exhibits a stochastic statistical profile which has been modeled using random walk formalisms. Tremor is widely dismissed as noise. Here we focus on the dynamical profile of fixational tremor, and argue that tremor may be a signal which usefully reflects the workings of oculomotor postural control. We identify signatures reminiscent of a certain flavor of transient neurodynamics; toric traveling waves which rotate around a central phase singularity. Spiral waves play an organizational role in dynamical systems at many scales throughout nature, though their potential functional role in brain activity remains a matter of educated speculation. Spiral waves have a repertoire of functionally interesting dynamical properties, including persistence, which suggest that they could in theory contribute to persistent neural activity in the oculomotor postural control system. Whilst speculative, the singularity hypothesis of oculomotor postural control implies testable predictions, and could provide the beginnings of an integrated dynamical framework for eye movements across scales.
    Frontiers in Systems Neuroscience 01/2014; 8:29.
  • 13th International Conference on Intelligent Autonomous Systems; 01/2014

Publication Stats

3k Citations
134.79 Total Impact Points


  • 2006–2014
    • Italian Institute of Technology (IIT)
      • iCub Facility
      • Department of Robotics, Brain and Cognitive Sciences
      Genova, Liguria, Italy
  • 2013
    • University Pompeu Fabra
      Barcelona, Catalonia, Spain
  • 2012
    • Idiap Research Institute
      Martigny, Valais, Switzerland
  • 2007–2012
    • University of Ferrara
      • Sezione di Fisiologia Umana
      Ferrara, Emilia-Romagna, Italy
    • Università degli Studi di Trento
      Trento, Trentino-Alto Adige, Italy
    • University of Sharjah
      Sharjah, Sharjah, United Arab Emirates
  • 2011
    • Osaka University
      • Department of Systems Innovation
      Osaka, Osaka Prefecture, Japan
  • 2008–2011
    • University of Plymouth
      • Adaptive Behaviour and Cognition Laboratory
      Plymouth, England, United Kingdom
    • Università del Salento
      • Interdisciplinary Center for Research on Language CRIL
      Lecce, Apulia, Italy
    • Khalifa University
      Abu Dhabi, Abu Dhabi, United Arab Emirates
  • 1995–2011
    • Università degli Studi di Genova
      • Dipartimento di Matematica (DIMA)
      Genova, Liguria, Italy
  • 2009
    • Delft University of Technology
      Delft, South Holland, Netherlands
  • 2006–2007
    • University of Salford
      • School of Computing, Science and Engineering
      Salford, England, United Kingdom
  • 2004
    • Democritus University of Thrace
      Komotini, East Macedonia and Thrace, Greece
  • 2002–2003
    • Massachusetts Institute of Technology
      Cambridge, Massachusetts, United States
  • 2000
    • Collège de France
      Paris, Île-de-France, France