Conference Paper

Bag of multimodal LDA models for concept formation

Dept. of Electron. Eng., Univ. of Electro-Commun., Chofu, Japan
DOI: 10.1109/ICRA.2011.5980324 Conference: Robotics and Automation (ICRA), 2011 IEEE International Conference on
Source: IEEE Xplore

ABSTRACT In this paper, a novel framework for multimodal categorization using the Bag of multimodal LDA models is proposed. The main issue tackled in this paper is the granularity of categories: categories are not fixed but vary according to context. Selective attention is the key to modeling this granularity, which motivates us to introduce various sets of weights on the perceptual information. As the weights change, the categories vary. In the proposed model, various sets of weights and model structures are assumed, and multimodal LDA-based categorization is carried out many times, resulting in a variety of models. To make the categories (concepts) useful for inference, significant models must be selected. This selection is carried out through interaction between the robot and the user. The selected models enable the robot to infer unobserved properties of an object. For example, the robot can infer audio information from appearance alone. Furthermore, the robot can describe the appearance of objects using suitable words, thanks to the connection between words and perceptual information. The proposed algorithm is implemented on a robot platform, and a preliminary experiment is carried out to validate it.
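
The "various sets of weights and model structures" step can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the weights are applied, so scaling each modality's bag-of-features counts by an integer weight is an assumption here, as are the modality names, the example histograms, and the helper names `candidate_settings` and `apply_weights`. The sketch only enumerates the settings that would each be handed to a multimodal LDA trainer; the training and the user-driven model selection are left out.

```python
from itertools import product

# Hypothetical bag-of-features histograms for one object, one per modality.
observation = {
    "visual": [4, 0, 2],
    "audio": [1, 3],
    "haptic": [0, 5],
}


def candidate_settings(weight_levels, category_counts, modalities):
    """Enumerate every (weights, number of categories) pair to be trained."""
    settings = []
    for weights in product(weight_levels, repeat=len(modalities)):
        if not any(weights):
            continue  # all-zero weights leave nothing to categorize
        for k in category_counts:
            settings.append((dict(zip(modalities, weights)), k))
    return settings


def apply_weights(histograms, weights):
    """Scale each modality's counts by its weight (assumed weighting scheme)."""
    return {
        modality: [weights[modality] * count for count in counts]
        for modality, counts in histograms.items()
    }


modalities = list(observation)
# Assumed grid: weights 0..2 per modality, and 2 or 3 categories per model.
settings = candidate_settings([0, 1, 2], [2, 3], modalities)

# A weight of 0 on audio makes the resulting model ignore sound entirely,
# which is one simple way to mimic selective attention.
attended = apply_weights(observation, {"visual": 2, "audio": 0, "haptic": 1})
```

Each entry of `settings` corresponds to one model in the bag; a model trained on `attended` would categorize mainly by appearance, with some influence from touch.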

  • ABSTRACT: In this paper, we propose a nonparametric Bayesian framework for categorizing multimodal sensory signals such as audio, visual, and haptic information by robots. The robot uses its physical embodiment to grasp and observe an object from various viewpoints, as well as to listen to the sound it makes during the observation. The multimodal information enables the robot to form human-like object categories, which are the basis of intelligence. The proposed method extends the Hierarchical Dirichlet Process (HDP), a nonparametric Bayesian model, to a multimodal HDP (MHDP). MHDP can estimate the number of categories, whereas a parametric model such as LDA-based categorization requires the number to be specified in advance. Because this is an unsupervised learning method, a human user does not need to give the robot any correct labels, and the robot can classify objects autonomously. At the same time, the proposed method provides a probabilistic framework for inferring object properties from limited observations. The validity of the proposed method is shown through experimental results.
    2011 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2011, San Francisco, CA, USA, September 25-30, 2011; 09/2011
  • ABSTRACT: This paper proposes a robot that acquires multimodal information, i.e., auditory, visual, and haptic information, in a fully autonomous way using its embodiment. We also propose an online algorithm for multimodal categorization based on the acquired multimodal information and on words, which are partially given by human users. The proposed framework makes it possible for the robot to learn object concepts naturally in everyday operation, in conjunction with a small amount of linguistic information from human users. To obtain multimodal information, the robot detects an object on a flat surface. The robot then grasps and shakes it to gain haptic and auditory information. To obtain visual information, the robot uses a small handheld observation table, so that it can control the viewpoints from which it observes the object. For multimodal concept formation, the multimodal LDA using Gibbs sampling is extended to an online version in this paper. The proposed algorithms are implemented on a real robot and tested using real everyday objects in order to show the validity of the proposed system.
    2011 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2011, San Francisco, CA, USA, September 25-30, 2011; 09/2011
  • ABSTRACT: The formation of categories, which constitutes the basis of developing concepts, requires multimodal information with a complex structure. We propose a model called the bag of multimodal hierarchical Dirichlet processes (BoMHDP), which enables robots to form a variety of multimodal categories. The BoMHDP model is a collection of a large number of MHDP models, each of which has a different set of weights for sensory information. The weights work to realize selective attention and enable the formation of various types of categories (e.g., object, haptic, and color). The BoMHDP model is an extension of the HDP, and categorization is unsupervised. However, categories that are not natural for humans are also formed. Therefore, only the significant categories are selected through interaction between the user and the robot. At the same time, words obtained during the interaction are connected to the categories. Finally, categories, which are represented by words, are selected. The BoMHDP model was implemented on a robot platform and a preliminary experiment was conducted to validate it. The results revealed that various categories can be formed with the BoMHDP model. We also analyzed the formed conceptual structure by using multidimensional scaling. The results indicate that the complex conceptual structure was represented reasonably well with the BoMHDP model.
    Intelligent Robots and Systems (IROS), 2012 IEEE/RSJ International Conference on; 01/2012