Conference Paper

Bag of multimodal LDA models for concept formation

Dept. of Electron. Eng., Univ. of Electro-Commun., Chofu, Japan
DOI: 10.1109/ICRA.2011.5980324 Conference: Robotics and Automation (ICRA), 2011 IEEE International Conference on
Source: IEEE Xplore

ABSTRACT In this paper, a novel framework for multimodal categorization using a bag of multimodal LDA models is proposed. The main issue tackled in this paper is the granularity of categories: categories are not fixed but vary according to context. Selective attention is the key to modeling this granularity, which motivates us to introduce various sets of weights on the perceptual information. Obviously, as the weights change, the categories vary. In the proposed model, various sets of weights and model structures are assumed, and multimodal LDA-based categorization is carried out many times, resulting in a variety of models. In order to make the categories (concepts) useful for inference, significant models must be selected. The selection process is carried out through interaction between the robot and the user. The selected models enable the robot to infer unobserved properties of an object; for example, the robot can infer audio information from appearance alone. Furthermore, thanks to the connection between words and perceptual information, the robot can describe the appearance of an object using suitable words. The proposed algorithm is implemented on a robot platform, and a preliminary experiment is carried out to validate it.
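The weighting step described above, in which different attention-weight sets over the perceptual modalities yield different categorizations, can be sketched in a few lines. The modality names, feature vocabularies, and the rounding of weighted counts below are illustrative assumptions, not the paper's exact scheme:

```python
import random

# Hypothetical modality names; the paper's feature extraction is not reproduced.
MODALITIES = ["visual", "audio", "haptic"]

def weight_observation(obs, weights):
    """Scale each modality's bag-of-features counts by its attention weight
    and concatenate them into one multimodal document (a sketch of the
    weighting idea; the paper's exact scheme may differ)."""
    doc = {}
    for m in MODALITIES:
        w = weights[m]
        for feat, count in obs[m].items():
            # Prefix features so identical indices in different modalities
            # stay distinct after concatenation.
            doc[f"{m}:{feat}"] = round(count * w)
    return doc

def sample_weight_sets(n, rng=random):
    """Draw n random attention-weight sets, one per LDA model in the bag."""
    sets = []
    for _ in range(n):
        raw = {m: rng.random() for m in MODALITIES}
        total = sum(raw.values())
        sets.append({m: v / total for m, v in raw.items()})
    return sets

# Toy observation: setting haptic weight to zero ignores that modality,
# so a different category structure would emerge from the same data.
obs = {"visual": {"v0": 4, "v1": 2}, "audio": {"a0": 6}, "haptic": {"h0": 3}}
weights = {"visual": 1.0, "audio": 0.5, "haptic": 0.0}
doc = weight_observation(obs, weights)
```

Each sampled weight set would then feed one LDA training run, producing the "bag" of models from which significant ones are selected through user interaction.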

  • ABSTRACT: This paper proposes a robot that acquires multimodal information, i.e. auditory, visual, and haptic information, in a fully autonomous way using its embodiment. We also propose an online algorithm for multimodal categorization based on the acquired multimodal information and words, which are partially given by human users. The proposed framework makes it possible for the robot to learn object concepts naturally in everyday operation, in conjunction with a small amount of linguistic information from human users. In order to obtain multimodal information, the robot detects an object on a flat surface, then grasps and shakes it to gain haptic and auditory information. For visual information, the robot uses a small handheld observation table, so that it can control the viewpoints for observing the object. As for multimodal concept formation, multimodal LDA using Gibbs sampling is extended to an online version in this paper. The proposed algorithms are implemented on a real robot and tested with real everyday objects in order to show the validity of the proposed system.
    2011 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2011, San Francisco, CA, USA, September 25-30, 2011; 09/2011
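As background for the Gibbs-sampled LDA that the abstract above extends to an online version, a minimal batch collapsed Gibbs sampler for plain (unimodal) LDA looks like the following sketch. Hyperparameters and data are made up, and neither the online update nor the multimodal extension is reproduced here:

```python
import random

def gibbs_lda(docs, K, vocab_size, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Minimal collapsed Gibbs sampler for plain LDA over bag-of-words docs
    (each doc is a list of word ids). Illustrative only; the paper's online
    multimodal variant updates these counts incrementally instead."""
    rng = random.Random(seed)
    z = []                                     # topic assignment per token
    ndk = [[0] * K for _ in docs]              # doc-topic counts
    nkw = [[0] * vocab_size for _ in range(K)] # topic-word counts
    nk = [0] * K                               # tokens per topic
    for d, doc in enumerate(docs):             # random initialization
        zs = []
        for w in doc:
            k = rng.randrange(K)
            zs.append(k)
            ndk[d][k] += 1; nkw[k][w] += 1; nk[k] += 1
        z.append(zs)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]                    # remove token's current topic
                ndk[d][k] -= 1; nkw[k][w] -= 1; nk[k] -= 1
                # Resample a topic proportional to the collapsed posterior.
                probs = [(ndk[d][j] + alpha) * (nkw[j][w] + beta) /
                         (nk[j] + vocab_size * beta) for j in range(K)]
                r = rng.random() * sum(probs)
                for j, p in enumerate(probs):
                    r -= p
                    if r <= 0:
                        k = j
                        break
                z[d][i] = k
                ndk[d][k] += 1; nkw[k][w] += 1; nk[k] += 1
    return ndk, nkw

docs = [[0, 0, 1, 1], [2, 2, 3, 3]]
ndk, nkw = gibbs_lda(docs, K=2, vocab_size=4, iters=50)
```

The multimodal case treats each modality's features as words drawn from the same per-object topic distribution, so the same counting machinery applies.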
  • ABSTRACT: In this paper, we propose a nonparametric Bayesian framework for categorizing multimodal sensory signals such as audio, visual, and haptic information by robots. The robot uses its physical embodiment to grasp and observe an object from various viewpoints, as well as listen to the sound made during the observation. The multimodal information enables the robot to form human-like object categories that are the basis of intelligence. The proposed method extends the Hierarchical Dirichlet Process (HDP), a kind of nonparametric Bayesian model, to a multimodal HDP (MHDP). MHDP can estimate the number of categories, whereas a parametric model, e.g. LDA-based categorization, requires the number to be specified in advance. As this is an unsupervised learning method, a human user does not need to give any correct labels to the robot, and it can classify objects autonomously. At the same time, the proposed method provides a probabilistic framework for inferring object properties from limited observations. The validity of the proposed method is shown through experimental results.
    2011 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2011, San Francisco, CA, USA, September 25-30, 2011; 09/2011
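The key property claimed above, that MHDP need not fix the number of categories in advance, comes from the Dirichlet process prior. A minimal sketch of the underlying Chinese Restaurant Process construction follows; the concentration parameter and data size are arbitrary, and this is not the paper's MHDP inference:

```python
import random

def crp_partition(n, alpha, seed=0):
    """Sample a partition of n items from a Chinese Restaurant Process, the
    construction underlying the (H)DP: item i joins an existing cluster with
    probability proportional to its size, or opens a new one with probability
    proportional to alpha, so the number of clusters is data-driven."""
    rng = random.Random(seed)
    tables = []  # sizes of existing clusters
    for i in range(n):
        r = rng.random() * (i + alpha)
        for t, size in enumerate(tables):
            r -= size
            if r < 0:
                tables[t] += 1  # join an existing cluster
                break
        else:
            tables.append(1)    # open a new cluster
    return tables

tables = crp_partition(100, alpha=1.0)
```

In an HDP, each modality (or document) draws its mixing proportions from a shared base measure built the same way, which is what lets MHDP share categories across audio, visual, and haptic channels while still growing new ones.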
  • ABSTRACT: Language is a symbolic system unique to human beings. The acquisition of language, which has its meanings in the real world, is important for robots to understand the environment and communicate with us in daily life. This paper proposes a novel approach to establishing a fundamental framework for robots that can understand language through their whole-body motions. The proposed framework is composed of three modules: “motion symbol”, “motion language model”, and “natural language model”. In the motion symbol module, motion data are symbolized by Hidden Markov Models (HMMs); each HMM represents an abstract motion pattern, and the HMMs are defined as motion symbols. The motion language model is stochastically designed to link motion symbols and words. This model consists of three layers: motion symbols, latent states, and words. The connections between a motion symbol and a latent state, and between a latent state and a word, are denoted by two kinds of probabilities: the probability that the motion symbol generates the latent state, and the probability that the latent state generates the word. The motion language model can therefore connect motion symbols to words through the latent states. The natural language model stochastically represents sequences of words; in this paper, a bigram, a special case of the N-gram model, is adopted. This model has words as nodes and transitions between two words as edges, so sentence structure is expressed as transitions among words. The integration of the motion language model and the natural language model is implemented by searching for sentences corresponding to motions and for motions corresponding to sentences. In particular, using the bigram as the natural language model keeps the search computation simple, so that appropriate and fast bidirectional computation between motions and language can be achieved. Our approach makes it possible for humanoid robots not only to interpret motions as sentences but also to generate motions from sentences. Tests using various motions and words validate our framework for language acquisition by humanoid robots.
    Proceedings - IEEE International Conference on Robotics and Automation 01/2012; DOI:10.1109/ICRA.2012.6225331
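The bigram natural language model described above is simple to sketch. The vocabulary and training sentences below are made up, and scoring a sentence as a product of transition probabilities is one straightforward reading of the search computation, not the paper's exact formulation:

```python
from collections import defaultdict

def train_bigram(sentences):
    """Estimate bigram transition probabilities P(w2 | w1) from word
    sequences; <s> and </s> mark sentence start and end, so sentence
    structure is captured as transitions between word nodes."""
    counts = defaultdict(lambda: defaultdict(int))
    for words in sentences:
        seq = ["<s>"] + words + ["</s>"]
        for w1, w2 in zip(seq, seq[1:]):
            counts[w1][w2] += 1
    # Normalize counts into conditional probabilities per preceding word.
    return {w1: {w2: c / sum(nxt.values()) for w2, c in nxt.items()}
            for w1, nxt in counts.items()}

def sentence_prob(model, words):
    """Score a candidate sentence as a product of bigram probabilities
    (zero if an unseen transition appears); such scores could rank
    candidate sentences for a given motion symbol."""
    p = 1.0
    seq = ["<s>"] + words + ["</s>"]
    for w1, w2 in zip(seq, seq[1:]):
        p *= model.get(w1, {}).get(w2, 0.0)
    return p

# Toy corpus: two sentences sharing a first word.
model = train_bigram([["robot", "walks"], ["robot", "waves"]])
```

Because each step only consults the current word's outgoing edges, bidirectional search between motion symbols and word sequences stays cheap, which is the efficiency argument the abstract makes for choosing a bigram over higher-order N-grams.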