Conference Paper

Detecting robot-directed speech by situated understanding in object manipulation tasks

Adv. Telecommun. Res. Labs., Kyoto Inst. of Technol., Kyoto, Japan
DOI: 10.1109/ROMAN.2010.5598729 Conference: RO-MAN, 2010 IEEE
Source: IEEE Xplore


In this paper, we propose a novel method for a robot to detect robot-directed speech, that is, to distinguish speech that users address to a robot from speech that users address to other people or to themselves. The originality of this work is the introduction of a multimodal semantic confidence (MSC) measure, which is used for domain classification of input speech based on whether the speech can be interpreted as a feasible action under the current physical situation in an object manipulation task. This measure is calculated by integrating speech, object, and motion confidences with weights optimized by logistic regression. We then integrate this measure with gaze tracking and conduct experiments under conditions of natural human-robot interaction. Experimental results show that the proposed method achieves average recall and precision of 94% and 96%, respectively, for robot-directed speech detection.
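The MSC measure described above can be sketched as a logistic model over the three modality confidences. The weights, bias, and confidence values below are illustrative assumptions, not figures from the paper; in the actual method the weights are fit by logistic regression on labeled utterances.

```python
import math

def msc_score(speech_conf, object_conf, motion_conf, weights, bias):
    """Combine speech, object, and motion confidences into a single
    multimodal semantic confidence via a logistic (sigmoid) model."""
    z = (weights[0] * speech_conf
         + weights[1] * object_conf
         + weights[2] * motion_conf
         + bias)
    return 1.0 / (1.0 + math.exp(-z))  # probability in (0, 1)

def is_robot_directed(confs, weights, bias, threshold=0.5):
    """Classify an utterance as robot-directed if its MSC exceeds a threshold."""
    return msc_score(*confs, weights, bias) >= threshold

# Hypothetical weights; the paper optimizes these by logistic regression.
w, b = (2.0, 1.5, 1.5), -3.0

# An utterance interpretable as a feasible action in the scene scores high,
# while one with no feasible interpretation scores low.
print(is_robot_directed((0.9, 0.8, 0.85), w, b))  # True
print(is_robot_directed((0.3, 0.2, 0.1), w, b))   # False
```

The key design point is that classification does not rely on the speech signal alone: low object or motion confidence (an infeasible action in the current scene) pulls the combined score down even when speech recognition confidence is high.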

Available from: Shigeki Matsuda, Apr 08, 2014
  • Source
    ABSTRACT: Speech technologies now available on mobile devices show increased performance both in terms of the language that they are able to capture and in terms of reliability. The availability of performant speech recognition engines suggests the deployment of vocal interfaces also in consumer robots. In this paper, we report on our current work, focusing specifically on the difficulties that arise in grounding the user's utterances in the environment where the robot is operating.
    Full-text · Conference Paper · Jul 2013
  •
    ABSTRACT: The currently available speech technologies on mobile devices achieve effective performance in terms of both reliability and the language they are able to capture. The availability of performant speech recognition engines may also support the deployment of vocal interfaces in consumer robots. However, the design and implementation of such interfaces still requires significant work. The language processing chain and the domain knowledge must be built for the specific features of the robotic platform, the deployment environment and the tasks to be performed. Hence, such interfaces are currently built in a completely ad hoc way. In this paper, we present a design methodology together with a support tool aiming to streamline and improve the implementation of dedicated vocal interfaces for robots. This work was developed within an experimental project called Speaky for Robots. We extend the existing vocal interface development framework to target robotic applications. The proposed solution is built using a bottom-up approach by refining the language processing chain through the development of vocal interfaces for different robotic platforms and domains. The proposed approach is validated both in experiments involving several research prototypes and in tests involving end-users.
    No preview · Article · Jul 2015 · Applied Intelligence
  • Source
    ABSTRACT: Humans can learn the use of language through physical interaction with their environment and semiotic communication with other people. It is very important to obtain a computational understanding of how humans form a symbol system and acquire semiotic skills through autonomous mental development. Recently, many studies have been conducted on the construction of robotic systems and machine-learning methods that can learn the use of language through embodied multimodal interaction with their environment and other systems. Understanding human social interactions and developing robots that can communicate smoothly with human users over the long term crucially requires an understanding of the dynamics of symbol systems. The embodied cognition and social interaction of participants gradually change a symbol system in a constructive manner. In this paper, we introduce a field of research called symbol emergence in robotics (SER). SER is a constructive approach towards an emergent symbol system, one that is socially self-organized through both semiotic communication and physical interaction among autonomous cognitive developmental agents, i.e., humans and developmental robots. Specifically, we describe some state-of-the-art research topics in SER, e.g., multimodal categorization, word discovery, and double articulation analysis, that enable a robot to obtain words and their embodied meanings from raw sensory-motor information, including visual, haptic, and auditory information and acoustic speech signals, in a totally unsupervised manner. Finally, we suggest future directions of research in SER.
    Full-text · Article · Sep 2015