Conference Paper

Bag of multimodal LDA models for concept formation

Dept. of Electron. Eng., Univ. of Electro-Commun., Chofu, Japan
DOI: 10.1109/ICRA.2011.5980324 Conference: 2011 IEEE International Conference on Robotics and Automation (ICRA)
Source: IEEE Xplore


In this paper, a novel framework for multimodal categorization using a bag of multimodal LDA models is proposed. The main issue tackled in this paper is the granularity of categories: categories are not fixed but vary according to context. Selective attention is the key to modeling this granularity, which motivates us to introduce various sets of weights on the perceptual information. Naturally, as the weights change, the categories vary. In the proposed model, various sets of weights and model structures are assumed, and multimodal LDA-based categorization is carried out many times, resulting in a variety of models. To make the categories (concepts) useful for inference, significant models must be selected; this selection is carried out through interaction between the robot and the user. The selected models enable the robot to infer unobserved properties of an object, for example its audio properties from its appearance alone. Furthermore, the robot can describe the appearance of any object using suitable words, thanks to the connection between words and perceptual information. The proposed algorithm is implemented on a robot platform, and a preliminary experiment is carried out to validate it.
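
As a rough illustration of the idea (not the authors' implementation), the sketch below builds a bag of LDA models by varying modality weights and topic counts over hypothetical per-modality bag-of-features histograms. The modality names, dimensions, weight sets, and the use of scikit-learn's LatentDirichletAllocation are assumptions made only for this example.

import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

rng = np.random.default_rng(0)
n_objects = 50
# Hypothetical feature dimensions per modality (visual/audio/haptic/word histograms).
modalities = {"visual": 200, "audio": 50, "haptic": 30, "word": 100}
data = {m: rng.integers(0, 5, size=(n_objects, d)) for m, d in modalities.items()}

def fit_weighted_lda(weights, n_topics):
    """Fit one LDA model on modality histograms scaled by a given set of weights."""
    X = np.hstack([np.round(w * data[m]) for m, w in weights.items()])
    return LatentDirichletAllocation(n_components=n_topics, random_state=0).fit(X)

# Build the "bag": many models trained with different weight sets and granularities.
weight_sets = [
    {"visual": 1.0, "audio": 1.0, "haptic": 1.0, "word": 1.0},
    {"visual": 2.0, "audio": 0.5, "haptic": 0.5, "word": 1.0},  # attend to appearance
    {"visual": 0.5, "audio": 2.0, "haptic": 1.0, "word": 1.0},  # attend to sound
]
bag_of_models = [fit_weighted_lda(w, k) for w in weight_sets for k in (5, 10, 20)]
# Selecting which models to keep would then be driven by interaction with the user.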

  • "This problem was addressed in many previous works, that mainly focused on the search of the optimal model granularity; some of them proposed multi-granularity approaches, by hierarchical LDA modeling [6], bag of multimodal models [7] or multi-grain models [8]. All these approaches consisted in choosing the best granularity level, to obtain a partitioning of the semantic space that matches the task requirements; our proposal is to consider LDA models of different granularities as complementary views."
    ABSTRACT: We present a method to detect social events in a set of pictures from an image hosting service (Flickr). The method relies on the analysis of user-generated tags, using statistical models trained on both a small set of manually annotated data and a large data set collected from the Internet. Social event modeling relies on a multi-span topic model based on LDA (Latent Dirichlet Allocation). Experiments are conducted in the experimental setup of the MediaEval 2011 evaluation campaign. The proposed system significantly outperforms the best system of this benchmark, reaching an F-measure of about 71%.
    Content-Based Multimedia Indexing (CBMI), 2013 11th International Workshop on; 01/2013
  • ABSTRACT: In this paper, we propose a nonparametric Bayesian framework for categorizing multimodal sensory signals such as audio, visual, and haptic information by robots. The robot uses its physical embodiment to grasp and observe an object from various viewpoints, as well as to listen to the sound made during the observation. The multimodal information enables the robot to form human-like object categories that are bases of intelligence. The proposed method is an extension of the Hierarchical Dirichlet Process (HDP), a kind of nonparametric Bayesian model, to a multimodal HDP (MHDP). MHDP can estimate the number of categories, whereas a parametric model, e.g. LDA-based categorization, requires the number to be specified in advance. As this is an unsupervised learning method, a human user does not need to give any correct labels to the robot, and it can classify objects autonomously. At the same time, the proposed method provides a probabilistic framework for inferring object properties from limited observations. The validity of the proposed method is shown through experimental results.
    2011 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2011, San Francisco, CA, USA, September 25-30, 2011; 09/2011
  • ABSTRACT: This paper proposes a robot that acquires multimodal information, i.e. auditory, visual, and haptic information, in a fully autonomous way using its embodiment. We also propose an online algorithm for multimodal categorization based on the acquired multimodal information and words, which are partially given by human users. The proposed framework makes it possible for the robot to learn object concepts naturally in everyday operation, in conjunction with a small amount of linguistic information from human users. To obtain multimodal information, the robot detects an object on a flat surface, then grasps and shakes it to gain haptic and auditory information. For visual information, the robot uses a small hand-held observation table, so that it can control the viewpoints for observing the object. As for multimodal concept formation, the multimodal LDA using Gibbs sampling is extended to an online version in this paper. The proposed algorithms are implemented on a real robot and tested on real everyday objects to show the validity of the proposed system.
    2011 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2011, San Francisco, CA, USA, September 25-30, 2011; 09/2011
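
As a rough sketch of the online Gibbs-sampling idea described in the last abstract above: the snippet below keeps global topic-feature counts across objects and resamples topic assignments only for each newly observed object. The pooled vocabulary over all modalities, the hyperparameters, and the function names are illustrative assumptions, not details taken from that paper.

import numpy as np

V, K = 300, 10          # vocabulary size (all modalities pooled) and number of topics
alpha, beta = 1.0, 0.1  # symmetric Dirichlet hyperparameters
n_kw = np.zeros((K, V)) # global topic-feature counts, kept across objects
n_k = np.zeros(K)       # global topic counts
rng = np.random.default_rng(0)

def observe_object(tokens, sweeps=20):
    """Assign topics to one new object's multimodal tokens by collapsed Gibbs sampling."""
    z = rng.integers(K, size=len(tokens))          # random initial assignments
    n_dk = np.bincount(z, minlength=K).astype(float)
    for i, w in enumerate(tokens):                 # add this object's initial counts
        n_kw[z[i], w] += 1; n_k[z[i]] += 1
    for _ in range(sweeps):
        for i, w in enumerate(tokens):
            k = z[i]                               # remove token i from the counts
            n_dk[k] -= 1; n_kw[k, w] -= 1; n_k[k] -= 1
            p = (n_dk + alpha) * (n_kw[:, w] + beta) / (n_k + V * beta)
            k = rng.choice(K, p=p / p.sum())       # resample its topic
            z[i] = k
            n_dk[k] += 1; n_kw[k, w] += 1; n_k[k] += 1
    return n_dk / n_dk.sum()                       # topic proportions for this object

# Objects are streamed in one at a time; category statistics accumulate online.
theta = observe_object(rng.integers(V, size=120))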