Conference Paper

Detecting robot-directed speech by situated understanding in object manipulation tasks

Advanced Telecommunication Research Laboratories, Kyoto Institute of Technology, Kyoto, Japan
DOI: 10.1109/ROMAN.2010.5598729
Conference: IEEE RO-MAN 2010
Source: IEEE Xplore


In this paper, we propose a novel method for detecting robot-directed speech, that is, for distinguishing speech that users address to a robot from speech that they address to other people or to themselves. The originality of this work is the introduction of a multimodal semantic confidence (MSC) measure, which is used for domain classification of input speech based on whether the speech can be interpreted as a feasible action under the current physical situation in an object manipulation task. The measure is calculated by integrating speech, object, and motion confidence scores with weightings optimized by logistic regression. We then combine this measure with gaze tracking and conduct experiments under conditions of natural human-robot interaction. Experimental results show that the proposed method achieves average recall and precision rates of 94% and 96%, respectively, for robot-directed speech detection.
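The fusion step described above lends itself to a compact illustration. The sketch below is not the authors' implementation; the feature values, toy training data, decision threshold, and the AND-style gaze gating rule are assumptions made only to show how three per-modality confidence scores can be fused with logistic regression and combined with a gaze cue.

```python
# Illustrative sketch of logistic-regression fusion of multimodal confidences.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy training data: rows are [speech_conf, object_conf, motion_conf] for past
# utterances; labels mark whether each utterance was robot-directed (1) or not (0).
X_train = np.array([
    [0.9, 0.8, 0.7],   # interpretable as a feasible action in the current scene
    [0.8, 0.9, 0.6],
    [0.3, 0.2, 0.4],   # e.g., chatter between users
    [0.2, 0.1, 0.3],
])
y_train = np.array([1, 1, 0, 0])

# Logistic regression learns the weighting of the three confidence scores.
msc_model = LogisticRegression().fit(X_train, y_train)

def is_robot_directed(speech_conf, object_conf, motion_conf,
                      user_gazing_at_robot, threshold=0.5):
    """Combine the fused confidence score with a simple gaze cue (assumed rule)."""
    msc = msc_model.predict_proba([[speech_conf, object_conf, motion_conf]])[0, 1]
    return msc >= threshold and user_gazing_at_robot

print(is_robot_directed(0.85, 0.9, 0.75, user_gazing_at_robot=True))
```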

  • ABSTRACT: Speech technologies now available on mobile devices show increased performance, both in terms of the language they are able to capture and in terms of reliability. The availability of performant speech recognition engines suggests deploying vocal interfaces in consumer robots as well. In this paper, we report on our current work, focusing specifically on the difficulties that arise in grounding the user's utterances in the environment where the robot is operating.
    Artificial General Intelligence, Beijing, China; July 2013
  • ABSTRACT: Humans can learn the use of language through physical interaction with their environment and semiotic communication with other people. It is very important to obtain a computational understanding of how humans can form a symbol system and acquire semiotic skills through their autonomous mental development. Recently, many studies have been conducted on the construction of robotic systems and machine-learning methods that can learn the use of language through embodied multimodal interaction with their environment and with other systems. Understanding human social interactions, and developing a robot that can smoothly communicate with human users in the long term, requires an understanding of the dynamics of symbol systems and is therefore crucially important. The embodied cognition and social interaction of participants gradually change a symbol system in a constructive manner. In this paper, we introduce a field of research called symbol emergence in robotics (SER). SER is a constructive approach towards an emergent symbol system: a symbol system that is socially self-organized through both semiotic communication and physical interaction among autonomous cognitive developmental agents, i.e., humans and developmental robots. Specifically, we describe some state-of-the-art research topics concerning SER, e.g., multimodal categorization, word discovery, and double articulation analysis, which enable a robot to obtain words and their embodied meanings from raw sensory-motor information, including visual information, haptic information, auditory information, and acoustic speech signals, in a totally unsupervised manner. Finally, we suggest future directions for research in SER.
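The multimodal categorization mentioned in the abstract above can be given a toy illustration. The sketch below is not taken from the paper (work in this area typically uses richer probabilistic models, e.g., multimodal latent Dirichlet allocation); the feature dimensions, synthetic data, and the choice of a Gaussian mixture model are assumptions made only to show how object categories can be formed from concatenated visual, haptic, and auditory features without supervision.

```python
# Toy illustration: unsupervised multimodal categorization by clustering
# concatenated visual, haptic, and auditory feature vectors.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

def observe_object():
    """Stand-in for a robot's multimodal sensing of one object (synthetic data)."""
    visual = rng.normal(size=8)    # e.g., colour/shape descriptor
    haptic = rng.normal(size=4)    # e.g., hardness, weight
    audio  = rng.normal(size=6)    # e.g., sound when the object is shaken
    return np.concatenate([visual, haptic, audio])

# Gather unlabeled multimodal observations and form categories without supervision.
observations = np.stack([observe_object() for _ in range(200)])
categories = GaussianMixture(n_components=5, random_state=0).fit(observations)
print(categories.predict(observations[:10]))   # inferred category index per object
```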