Recently, there have been considerable advances in fast inference for latent Dirichlet allocation (LDA). In particular, stochastic
optimization of the variational Bayes (VB) objective function with a natural gradient step has been shown to converge and to
scale to massive document collections. To reduce noise in the gradient estimate, each update is computed from a mini-batch of
documents chosen uniformly at random.
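For concreteness, the scheme described here is presumably online VB in the style of Hoffman et al. (2010). Below is a minimal NumPy sketch of one stochastic natural-gradient step under that reading; the function and parameter names (`stochastic_vb_step`, `alpha`, `eta`, `tau0`, `kappa`, `inner_iters`) are illustrative, not the paper's.

```python
import numpy as np
from scipy.special import psi  # digamma function

def dir_exp(x):
    """E[log theta] for theta ~ Dirichlet(x); row-wise for 2-D input."""
    if x.ndim == 1:
        return psi(x) - psi(x.sum())
    return psi(x) - psi(x.sum(axis=1))[:, None]

def stochastic_vb_step(lam, docs, batch_ids, num_docs, t,
                       alpha=0.1, eta=0.01, tau0=1.0, kappa=0.7,
                       inner_iters=50):
    """One stochastic natural-gradient VB update for LDA.

    lam     -- K x V variational topic-word matrix
    docs[d] -- pair (word_ids, counts) for document d
    """
    K, _ = lam.shape
    exp_elog_beta = np.exp(dir_exp(lam))  # exp E_q[log beta]
    sstats = np.zeros_like(lam)
    for d in batch_ids:
        ids, cts = docs[d]
        gamma = np.ones(K)                # per-document variational params
        exp_elog_theta = np.exp(dir_exp(gamma))
        for _ in range(inner_iters):      # local E-step: coordinate ascent
            phinorm = exp_elog_theta @ exp_elog_beta[:, ids] + 1e-100
            gamma = alpha + exp_elog_theta * (exp_elog_beta[:, ids] @ (cts / phinorm))
            exp_elog_theta = np.exp(dir_exp(gamma))
        sstats[:, ids] += np.outer(exp_elog_theta, cts / phinorm)
    sstats *= exp_elog_beta
    # Rescale the mini-batch statistics into a noisy whole-corpus estimate.
    lam_hat = eta + (num_docs / len(batch_ids)) * sstats
    rho = (tau0 + t) ** (-kappa)          # decaying step size
    return (1.0 - rho) * lam + rho * lam_hat  # natural-gradient step
```

Under this reading, the uniform schedule amounts to drawing `batch_ids = rng.choice(num_docs, size=batch_size)` before each call.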
While it is widely recognized that the scheduling of documents in stochastic optimization may have significant
consequences, the issue remains largely unexplored. In this work, we address it by proposing residual
LDA, a novel, easy-to-implement LDA approach that schedules documents in an informed way. Intuitively, in each iteration,
residual LDA actively selects documents that exert a disproportionately large influence on the current residual to compute
the next update. On several real-world datasets, including 3M articles from Wikipedia, we demonstrate that residual LDA can
handily analyze massive document collections and find topic models as good as or better than those found with batch VB and
randomly scheduled VB, while running significantly faster.
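The abstract does not spell out how the per-document residuals are computed or maintained, so the following only sketches the scheduling idea itself: replace the uniform draw with sampling proportional to a residual score, so that high-influence documents are selected more often. Both `residual_schedule` and the `update_residuals` placeholder are hypothetical names, not the paper's.

```python
import numpy as np

def residual_schedule(residuals, batch_size, rng):
    """Informed scheduling sketch: draw the next mini-batch with
    probability proportional to each document's current residual score.
    How residuals are initialized, updated, and decayed is the substance
    of the paper and is not reproduced here."""
    p = residuals / residuals.sum()
    return rng.choice(len(residuals), size=batch_size, replace=False, p=p)

# Illustrative driver replacing the uniform draw in the earlier sketch;
# update_residuals is a hypothetical placeholder for the paper's
# residual bookkeeping.
#
# rng = np.random.default_rng(0)
# for t in range(num_iters):
#     batch = residual_schedule(residuals, batch_size, rng)
#     lam = stochastic_vb_step(lam, docs, batch, num_docs, t)
#     residuals = update_residuals(residuals, batch, lam)
```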