Bag of multimodal LDA models for concept formation
ABSTRACT In this paper a novel framework for multimodal categorization using a bag of multimodal LDA models is proposed. The main issue tackled in this paper is the granularity of categories: categories are not fixed but vary according to context. Selective attention is the key to modeling this granularity, which motivates us to introduce various sets of weights on the perceptual information; as the weights change, the categories vary. In the proposed model, various sets of weights and model structures are assumed, and multimodal LDA-based categorization is carried out many times, resulting in a variety of models. To make the resulting categories (concepts) useful for inference, significant models must be selected; this selection is carried out through interaction between the robot and the user. The selected models enable the robot to infer unobserved properties of an object — for example, to infer audio information from appearance alone. Furthermore, the robot can describe the appearance of any object using suitable words, thanks to the learned connection between words and perceptual information. The proposed algorithm is implemented on a robot platform, and a preliminary experiment is carried out to validate it.
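The role of the modality weights can be illustrated with a toy sketch. All object names, histograms, and weight values below are invented for illustration, and the cosine-similarity grouping is only a stand-in for the paper's multimodal LDA; the point is just that changing the weights changes which objects end up in the same category:

```python
import math

# Toy per-modality histograms for three objects (values are invented).
objects = {
    "ball":  {"visual": [9, 1], "audio": [1, 9]},
    "bell":  {"visual": [1, 9], "audio": [1, 9]},
    "block": {"visual": [9, 1], "audio": [9, 1]},
}

def weighted_vector(obj, weights):
    """Concatenate modality histograms, each scaled by its attention weight."""
    vec = []
    for modality, hist in sorted(obj.items()):
        vec.extend(weights[modality] * v for v in hist)
    return vec

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def nearest(name, weights):
    """Most similar other object under a given modality weighting."""
    vec = weighted_vector(objects[name], weights)
    others = [(o, cosine(vec, weighted_vector(h, weights)))
              for o, h in objects.items() if o != name]
    return max(others, key=lambda t: t[1])[0]

# Attending to vision groups the ball with the block ...
print(nearest("ball", {"visual": 1.0, "audio": 0.1}))  # block
# ... while attending to audio groups it with the bell instead.
print(nearest("ball", {"visual": 0.1, "audio": 1.0}))  # bell
```

Each weight set induces a different categorization, which is why the framework trains many weighted models and then selects the significant ones through user interaction.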
ABSTRACT: Mixture models, in which a probability distribution is represented as a linear superposition of component distributions, are widely used in statistical modelling and pattern recognition. One of the key tasks in the application of mixture models is the determination of a suitable number of components. Conventional approaches based on cross-validation are computationally expensive, are wasteful of data, and give noisy estimates for the optimal number of components. A fully Bayesian treatment, based on Markov chain Monte Carlo methods for instance, will return a posterior distribution over the number of components. However, in practical applications it is generally convenient, or even computationally essential, to select a single, most appropriate model. Recently it has been shown, in the context of linear latent variable models, that the use of hierarchical priors governed by continuous hyper-parameters, whose values are set by type-II maximum likelihood, can be used to optimize model complexity. In this paper we extend this framework to mixture distributions by considering the classical task of density estimation using mixtures of Gaussians. We show that, by setting the mixing coefficients to maximize the marginal log-likelihood, unwanted components can be suppressed, and the appropriate number of components for the mixture can be determined in a single training run without recourse to cross-validation. Our approach uses a variational treatment based on a factorized approximation to the posterior distribution. Artificial Intelligence and Statistics, 01/2001
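The component-suppression idea can be imitated with a deliberately crude sketch: plain EM for a one-dimensional Gaussian mixture in which any component whose mixing coefficient collapses below a threshold is pruned. This thresholded EM is only a stand-in for the paper's actual mechanism (a variational treatment with type-II maximum likelihood on the mixing coefficients), and the data and initialization below are invented:

```python
import math

def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def em_prune(data, means, n_iter=20, min_weight=1e-3):
    """EM for a 1-D Gaussian mixture; components whose mixing coefficient
    collapses below min_weight are pruned, so the surviving number of
    components is decided in a single training run."""
    comps = [{"w": 1.0 / len(means), "mu": m, "sigma": 1.0} for m in means]
    for _ in range(n_iter):
        # E-step: responsibilities of each component for each point.
        gammas = []
        for x in data:
            p = [c["w"] * normal_pdf(x, c["mu"], c["sigma"]) for c in comps]
            s = sum(p)
            gammas.append([pi / s for pi in p])
        # M-step: re-estimate weights, means, variances; drop dead components.
        survivors = []
        for k in range(len(comps)):
            nk = sum(g[k] for g in gammas)
            w = nk / len(data)
            if w < min_weight:          # suppress the unwanted component
                continue
            mu = sum(g[k] * x for g, x in zip(gammas, data)) / nk
            var = sum(g[k] * (x - mu) ** 2 for g, x in zip(gammas, data)) / nk
            survivors.append({"w": w, "mu": mu, "sigma": math.sqrt(var) + 1e-6})
        total = sum(c["w"] for c in survivors)  # renormalize after pruning
        for c in survivors:
            c["w"] /= total
        comps = survivors
    return comps

# Two well-separated clusters, but three initial components:
data = [-5.2, -5.1, -5.0, -4.9, -4.8, 4.8, 4.9, 5.0, 5.1, 5.2]
fit = em_prune(data, means=[-5.0, 0.0, 5.0])
print(len(fit))                                # 2 (the middle one is pruned)
print(sorted(round(c["mu"], 1) for c in fit))  # [-5.0, 5.0]
```

The middle component at 0 receives vanishing responsibility for every data point, so its mixing coefficient collapses and it is removed without any cross-validation loop.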
Conference Proceeding: Grounding of word meanings in multimodal concepts using LDA
ABSTRACT: In this paper we propose an LDA-based framework for multimodal categorization and word grounding for robots. The robot uses its physical embodiment to grasp and observe an object from various viewpoints, as well as to listen to its sound during the observation period. This multimodal information is used for categorization and for forming multimodal concepts. At the same time, the words acquired during the observation period are connected to the related concepts using multimodal LDA. We also provide a relevance measure that encodes the degree of connection between words and modalities. The proposed algorithm is implemented on a robot platform and some experiments are carried out to evaluate it. We also demonstrate a simple conversation between a user and the robot based on the learned model. Intelligent Robots and Systems (IROS 2009), IEEE/RSJ International Conference on, 11/2009
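The coupling between words and other modalities rests on a shared per-object topic distribution. A minimal sketch of that mechanism is a collapsed Gibbs sampler for vanilla LDA in which visual, audio, and linguistic "words" share one id space, so one topic mixture per object ties all modalities together. This is plain LDA with invented toy data, not the paper's multimodal LDA:

```python
import random

def lda_gibbs(docs, n_topics, vocab_size, n_iter=200, alpha=0.1, beta=0.1, seed=0):
    """Collapsed Gibbs sampling for LDA on a joint multimodal vocabulary.
    Returns doc-topic counts and topic-word counts."""
    rng = random.Random(seed)
    z = [[rng.randrange(n_topics) for _ in doc] for doc in docs]
    ndk = [[0] * n_topics for _ in docs]                # doc-topic counts
    nkw = [[0] * vocab_size for _ in range(n_topics)]   # topic-word counts
    nk = [0] * n_topics                                 # tokens per topic
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            ndk[d][k] += 1; nkw[k][w] += 1; nk[k] += 1
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]                              # remove current assignment
                ndk[d][k] -= 1; nkw[k][w] -= 1; nk[k] -= 1
                weights = [(ndk[d][t] + alpha) * (nkw[t][w] + beta)
                           / (nk[t] + vocab_size * beta)
                           for t in range(n_topics)]     # conditional posterior
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1; nkw[k][w] += 1; nk[k] += 1
    return ndk, nkw

# Four "objects"; word ids 0-2 and 3-5 stand for co-occurring multimodal features.
docs = [[0, 1, 2, 0, 1, 2], [3, 4, 5, 3, 4, 5],
        [0, 1, 2, 0, 2, 1], [3, 4, 5, 5, 4, 3]]
ndk, nkw = lda_gibbs(docs, n_topics=2, vocab_size=6)
top = [max(range(2), key=lambda t: row[t]) for row in ndk]
# Objects built from the same co-occurring features typically share a topic.
print(top)
```

Because word ids from all modalities compete for the same topics, observing a topic from one modality (e.g. vision) lets the model predict likely words in another, which is the basis of the relevance measure and the inference of unobserved properties.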
ABSTRACT: Given a set of images containing multiple object categories, we seek to discover those categories and their image locations without supervision. We achieve this using generative models from the statistical text literature: probabilistic Latent Semantic Analysis (pLSA) and Latent Dirichlet Allocation (LDA). In text analysis these are used to discover topics in a corpus using the bag-of-words document representation. Here we discover topics as object categories, so that an image containing instances of several categories is modelled as a mixture of topics. The models are applied to images by using a visual analogue of a word, formed by vector-quantizing SIFT-like region descriptors. We investigate a set of increasingly demanding scenarios, starting with image sets containing only two object categories through to sets containing multiple categories (including airplanes, cars, faces, motorbikes, spotted cats) and background clutter. The object categories sample both intra-class and scale variation, and both the categories and their approximate spatial layout are found without supervision. We also demonstrate classification of unseen images and of images containing multiple objects. Performance of the proposed unsupervised method is compared to the semi-supervised approach of Fergus et al. 01/2005
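The "visual word" step can be sketched with a tiny k-means quantizer. The 2-D descriptors below are invented stand-ins for real SIFT-like descriptors (which are roughly 128-D), and the deterministic farthest-first initialization is a simplification; the resulting per-image histograms are exactly the bag-of-words documents that pLSA or LDA would then consume:

```python
def dist2(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def init_centers(points, k):
    """Deterministic farthest-first initialization."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))
    return centers

def kmeans(points, k, n_iter=20):
    """Tiny k-means: quantizes descriptors into a visual vocabulary."""
    centers = init_centers(points, k)
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[min(range(k), key=lambda j: dist2(p, centers[j]))].append(p)
        for j, cl in enumerate(clusters):
            if cl:
                centers[j] = tuple(sum(x) / len(cl) for x in zip(*cl))
    return centers

def bow_histogram(descriptors, centers):
    """Count how often each 'visual word' (nearest center) occurs in an image."""
    hist = [0] * len(centers)
    for d in descriptors:
        hist[min(range(len(centers)), key=lambda j: dist2(d, centers[j]))] += 1
    return hist

# Toy 2-D "descriptors" for two images.
image_a = [(0.0, 0.1), (0.1, 0.0), (0.1, 0.1), (5.0, 5.1)]
image_b = [(5.1, 5.0), (5.0, 5.0), (4.9, 5.1), (0.0, 0.0)]
vocab = kmeans(image_a + image_b, k=2)
print(bow_histogram(image_a, vocab))   # [3, 1]
print(bow_histogram(image_b, vocab))   # [1, 3]
```

Each image becomes a histogram over the shared visual vocabulary, so topic models developed for text apply unchanged, with topics playing the role of object categories.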