Conference Paper

Experiments with Non-parametric Topic Models

DOI: 10.1145/2623330.2623691 Conference: 20th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, At New York, USA

ABSTRACT In topic modelling, various alternative priors have been de-veloped, for instance asymmetric and symmetric priors for the document-topic and topic-word matrices respectively, the hierarchical Dirichlet process prior for the document-topic matrix and the hierarchical Pitman-Yor process prior for the topic-word matrix. For information retrieval, lan-guage models exhibiting word burstiness are important. In-deed, this burstiness effect has been show to help topic mod-els as well, and this requires additional word probability vectors for each document. Here we show how to combine these ideas to develop high-performing non-parametric topic models exhibiting burstiness based on standard Gibbs sam-pling. Experiments are done to explore the behavior of the models under different conditions and to compare the algo-rithms with previously published. The full non-parametric topic models with burstiness are only a small factor slower than standard Gibbs sampling for LDA and require double the memory, making them very competitive. We look at the comparative behaviour of different models and present some experimental insights.

Download full-text


Available from: Wray Buntine, Jun 24, 2014
  • Source
    EMNLP-CoNLL 2007, Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, June 28-30, 2007, Prague, Czech Republic; 01/2007
  • Source
    [Show abstract] [Hide abstract]
    ABSTRACT: The two parameter Poisson-Dirichlet Process (PDP), a generalisation of the Dirichlet Process, is increasingly being used for probabilistic modelling in discrete areas such as language technology, bioinformatics, and image analysis. There is a rich literature about the PDP and its derivative distributions such as the Chinese Restaurant Process (CRP). This article reviews some of the basic theory and then the major results needed for Bayesian modelling of discrete problems including details of priors, posteriors and computation. The PDP allows one to build distributions over countable partitions. The PDP has two other remarkable properties: first it is partially conjugate to itself, which allows one to build hierarchies of PDPs, and second using a marginalised relative the CRP, one gets fragmentation and clustering properties that lets one layer partitions to build trees. This article presents the basic theory for understanding the notion of partitions and distributions over them, the PDP and the CRP, and the important properties of conjugacy, fragmentation and clustering, as well as some key related properties such as consistency and convergence. This article also presents a Bayesian interpretation of the Poisson-Dirichlet process based on an improper and infinite dimensional Dirichlet distribution. This means we can understand the process as just another Dirichlet and thus all its sampling properties emerge naturally. The theory of PDPs is usually presented for continuous distributions (more generally referred to as non-atomic distributions), however, when applied to discrete distributions its remarkable conjugacy property emerges. This context and basic results are also presented, as well as techniques for computing the second order Stirling numbers that occur in the posteriors for discrete distributions.
  • Source
    [Show abstract] [Hide abstract]
    ABSTRACT: We present a new hierarchical Bayesian model for unsupervised topic segmentation. This new model integrates a point-wise boundary sampling algorithm used in Bayesian segmentation into a structured topic model that can capture a simple hierarchical topic structure latent in documents. We develop an MCMC inference algorithm to split/merge segment(s). Experimental results show that our model outperforms previous unsupervised segmentation methods using only lexical information on Choi’s datasets and two meeting transcripts and has performance comparable to those previous methods on two written datasets.
    Proceedings of NAACL-HLT; 01/2013
Show more