Conference Paper

Bag of multimodal LDA models for concept formation

Dept. of Electron. Eng., Univ. of Electro-Commun., Chofu, Japan
DOI: 10.1109/ICRA.2011.5980324
Conference: 2011 IEEE International Conference on Robotics and Automation (ICRA)
Source: IEEE Xplore

ABSTRACT This paper proposes a novel framework for multimodal categorization using a bag of multimodal LDA models. The main issue addressed is the granularity of categories: categories are not fixed but vary with context. Selective attention is the key to modeling this granularity, which motivates us to apply various sets of weights to the perceptual information. As the weights change, the categories vary. The proposed model assumes various sets of weights and model structures, and multimodal LDA-based categorization is carried out many times, resulting in a variety of models. To make the categories (concepts) useful for inference, significant models must be selected; this selection is carried out through interaction between the robot and the user. The selected models enable the robot to infer unobserved properties of an object. For example, the robot can infer audio information from an object's appearance alone. Furthermore, the robot can describe the appearance of an object using suitable words, thanks to the connection between words and perceptual information. The proposed algorithm is implemented on a robot platform, and a preliminary experiment is carried out to validate it.
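The idea of running the categorization once per weight set can be illustrated with a minimal sketch. It is not the authors' implementation: the modality names, the weight levels, and the `categorize` callback (a stand-in for a multimodal LDA trainer) are all assumptions introduced here for illustration.

```python
from itertools import product

# Hypothetical modality names; the abstract only mentions appearance, audio, and words.
MODALITIES = ("visual", "audio", "word")


def weight_sets(levels=(0, 1, 2)):
    """Enumerate every assignment of a weight level to each modality.

    The all-zero assignment is skipped because it would ignore all input.
    The levels themselves are an assumption, not taken from the paper.
    """
    return [
        dict(zip(MODALITIES, combo))
        for combo in product(levels, repeat=len(MODALITIES))
        if any(combo)
    ]


def apply_weights(histograms, weights):
    """Scale each modality's bag-of-features counts by that modality's weight."""
    return {
        modality: [weights[modality] * count for count in counts]
        for modality, counts in histograms.items()
    }


def build_bag(objects, categorize, levels=(0, 1, 2)):
    """Run `categorize` once per weight set and collect (weights, model) pairs.

    `categorize` is a placeholder for a multimodal LDA trainer; any callable
    that accepts the list of weighted histograms can be plugged in.
    """
    bag = []
    for weights in weight_sets(levels):
        weighted = [apply_weights(h, weights) for h in objects]
        bag.append((weights, categorize(weighted)))
    return bag
```

Each entry of the resulting bag pairs one weight set with the model trained under it, so a later selection step (in the paper, interaction with the user) can keep only the significant models.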

  • ABSTRACT: In this paper, we propose a robot that acquires multimodal information, i.e. visual, auditory, and haptic information, fully autonomously using its embodiment. We also propose batch and online algorithms for multimodal categorization based on the acquired multimodal information and partial words given by human users. To obtain multimodal information, the robot detects an object on a flat surface. Then, the robot grasps and shakes it to obtain haptic and auditory information. For obtaining visual information, the robot uses a small hand-held observation table with an XBee wireless controller to control the viewpoints for observing the object. In this paper, for multimodal concept formation, multimodal latent Dirichlet allocation using Gibbs sampling is extended to an online version. This framework makes it possible for the robot to learn object concepts naturally in everyday operation in conjunction with a small amount of linguistic information from human users. The proposed algorithms are implemented on a real robot and tested using real everyday objects to show the validity of the proposed system.
    Advanced Robotics 12/2012; 26(17). DOI:10.1080/01691864.2012.728693 · 0.56 Impact Factor
  • ABSTRACT: We propose a fully unsupervised algorithm for perception. The algorithm processes high-dimensional, multimodal input. The output is a symbolic representation along with continuous traits. We apply it to a robotic task involving vision, proprioception, and speech.
    Robotics and Autonomous Systems 11/2014; DOI:10.1016/j.robot.2014.11.005 · 1.11 Impact Factor
  • ABSTRACT: The formation of categories, which constitutes the basis of developing concepts, requires multimodal information with a complex structure. We propose a model called the bag of multimodal hierarchical Dirichlet processes (BoMHDP), which enables robots to form a variety of multimodal categories. The BoMHDP model is a collection of a large number of MHDP models, each of which has a different set of weights for sensory information. The weights work to realize selective attention and enable the formation of various types of categories (e.g., object, haptic, and color). The BoMHDP model is an extension of the HDP, and categorization is unsupervised. However, categories that are not natural for humans are also formed. Therefore, only the significant categories are selected through interaction between the user and the robot. At the same time, words obtained during the interaction are connected to the categories. Finally, categories, which are represented by words, are selected. The BoMHDP model was implemented on a robot platform and a preliminary experiment was conducted to validate it. The results revealed that various categories can be formed with the BoMHDP model. We also analyzed the formed conceptual structure by using multidimensional scaling. The results indicate that the complex conceptual structure was represented reasonably well with the BoMHDP model.
    Intelligent Robots and Systems (IROS), 2012 IEEE/RSJ International Conference on; 01/2012