Conference Paper

Automatic Evaluation of Local Topic Quality

... Topic model interpretability is a nebulous concept (Lipton, 2018) related to other topic model qualities, but it lacks an agreed-upon definition. Measures of semantic coherence capture how easily understood a topic's top-N words are (Morstatter and Liu, 2017; Lund et al., 2019; Newman et al., 2010a; Lau et al., 2014b). This is also referred to as topic understandability (Röder et al., 2015; Aletras et al., 2015). ...

... Morstatter and Liu (2017) presented interpretability from the perspective of both coherence and consensus, where consensus measures annotator agreement about a topic's representation in its associated documents. Alignment, another understanding of interpretability, is how representative a topic is of the documents it is assigned to (Ando and Lee, 2001; Chang et al., 2009; Mimno et al., 2011; Bhatia et al., 2017; Alokaili et al., 2019; Morstatter and Liu, 2017; Lund et al., 2019). However, the probabilistic nature of topic models impedes this measure. ...

... Following the seminal work of Chang et al. (2009), the development of coherence measures and the human evaluation tasks that guide their design has been actively pursued (Newman et al., 2010a; Bhatia et al., 2017, 2018; Morstatter and Liu, 2017; Lau and Baldwin, 2016; Lund et al., 2019; Alokaili et al., 2019). Newman et al. (2010a) showed that human ratings of topic coherence (observed coherence) correlated with their coherence measure, which aggregates pairwise Pointwise Mutual Information (PMI) scores over a topic's top-N words. ...
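As a rough illustration of the PMI-based coherence just described, the following sketch aggregates pairwise PMI over a topic's top-N words using document co-occurrence counts. This is not the cited authors' implementation: the reference corpus, the document-level counting scheme (Newman et al. used Wikipedia and sliding windows), and the toy word lists are all assumptions.

    import math
    from itertools import combinations

    def pmi_coherence(top_words, documents, eps=1e-12):
        """Average pairwise PMI of a topic's top-N words, with probabilities
        estimated from document co-occurrence in a reference corpus."""
        docs = [set(doc) for doc in documents]
        n_docs = len(docs)

        def p(words):
            # Fraction of documents containing all of the given words.
            return sum(all(w in d for w in words) for d in docs) / n_docs

        scores = []
        for w1, w2 in combinations(top_words, 2):
            joint, p1, p2 = p([w1, w2]), p([w1]), p([w2])
            scores.append(math.log((joint + eps) / (p1 * p2 + eps)))
        return sum(scores) / len(scores)

    # Toy usage: coherence of a topic's top words over tokenized documents.
    docs = [["game", "team", "season", "coach"], ["team", "player", "game"],
            ["market", "stock", "price"]]
    print(pmi_coherence(["game", "team", "player"], docs))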
... NVI has allowed the above recurrent topic modeling approaches to be studied, but with two primary modifications: the discrete variables can be reparametrized and then sampled, or each word's topic assignment can be analytically marginalized out, prior to performing any learning. However, previous work has shown that topic models that preserve explicit topics yield higher quality topics than similar models that do not [25], and recent work has shown that topic models that have relatively consistent word-level topic assignments are preferred by end-users [24]. Together, these suggest that there are benefits to preserving these assignments that are absent from standard RNN-based language models. ...
... While "coherence" [30] has been a popular automated metric, it can have peculiar failure points especially regarding very common words [36]. To counter this, Lund et al. [24] recently introduced switch percent (SwitchP). SwitchP makes the very intuitive yet simple assumption that "good" topics will exhibit a type of inertia: one would not expect adjacent words to use many different topics. ...
... Consistent with Wang et al. [49], we report the maximum of three VRTM runs, alongside SwitchP [24] results and an entropy analysis of the model. These results support the idea that if topic models capture semantic dependencies, then they should capture the topics well, explain the topic assignment for each word, and provide an overall level of thematic consistency across the document (lower θ entropy). ...
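The "lower θ entropy" criterion can be computed directly from an inferred document-topic distribution. A small sketch, assuming θ is (or can be normalized to) a probability vector per document:

    import numpy as np

    def theta_entropy(theta, eps=1e-12):
        """Shannon entropy (in nats) of a document-topic distribution theta.
        A peaked theta (few dominant topics) gives low entropy, which the
        analysis above reads as stronger thematic consistency."""
        theta = np.asarray(theta, dtype=float)
        theta = theta / theta.sum()
        return float(-np.sum(theta * np.log(theta + eps)))

    print(theta_entropy([0.85, 0.10, 0.05]))        # low entropy: focused document
    print(theta_entropy([0.25, 0.25, 0.25, 0.25]))  # high entropy: diffuse document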
Conference Paper
Full-text available
We show how to learn a neural topic model with discrete random variables, one that explicitly models each word's assigned topic, using neural variational inference that does not rely on stochastic backpropagation to handle the discrete variables. The model we utilize combines the expressive power of neural methods for representing sequences of text with the topic model's ability to capture global, thematic coherence. Using neural variational inference, we show improved perplexity and document understanding across multiple corpora. We examine the effect of prior parameters both on the model and variational parameters, and demonstrate how our approach can compete and surpass a popular topic model implementation on an automatic measure of topic quality.
... Simply relying on human judgment to determine whether a topic is good or not does not encompass all of a topic's properties: the documents that a topic should represent, and topic diversity, are seldom accounted for (Newman et al. 2010; Lund et al. 2019; Clark et al. 2021). Clark et al. (2021) even question human judgment altogether; however, their questionnaire design not only lacks a midpoint but can also strongly bias the stated preferences through a leading follow-up question (Clark et al. 2021) (see, e.g., Lehman et al. 1992). ...

... Adapting questions and tasks to the complicated nature of topics can result in promising questionnaire designs. Lund et al. (2019), for example, introduced a topic-word matching task, weighting and selecting answers from participants who reported high confidence and performed well on test questions. That approach reduces ambiguity in the answers, but it also biases results towards highly confident participants and neglects the subtler differences in quality as perceived by humans. ...
Article
Full-text available
Extracting and identifying latent topics in large text corpora have gained increasing importance in Natural Language Processing (NLP). Most models, whether probabilistic models similar to Latent Dirichlet Allocation (LDA) or neural topic models, follow the same underlying approach of topic interpretability and topic extraction. We propose a method that incorporates a deeper understanding of both sentence and document themes, and goes beyond simply analyzing word frequencies in the data. Through simple corpus expansion, our model can detect latent topics that may include uncommon words or neologisms, as well as words not present in the documents themselves. Additionally, we propose several new evaluation metrics based on intruder words and similarity measures in the semantic space. We present correlation coefficients with human identification of intruder words and achieve near-human level results at the word-intrusion task. We demonstrate the competitive performance of our method with a large benchmark study, and achieve superior results compared with state-of-the-art topic modeling and document clustering models. The code is available at the following link: https://github.com/AnFreTh/STREAM.
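The abstract above does not spell out its intruder-word metric, but an embedding-based intruder test of the kind it describes can be sketched as follows. This is a simplified illustration, not the STREAM implementation; the embedding lookup, the toy vectors, and the lowest-mean-similarity rule are all assumptions.

    import numpy as np

    def predict_intruder(words, embed):
        """Guess the intruder in a word set as the word with the lowest mean
        cosine similarity to the other words in the embedding space."""
        vecs = {w: embed[w] / np.linalg.norm(embed[w]) for w in words}

        def mean_sim(w):
            others = [v for u, v in vecs.items() if u != w]
            return float(np.mean([vecs[w] @ v for v in others]))

        return min(words, key=mean_sim)

    # Toy usage with made-up 2-d "embeddings": "banana" should be flagged.
    embed = {"game": np.array([1.0, 0.1]), "team": np.array([0.9, 0.2]),
             "coach": np.array([0.95, 0.15]), "banana": np.array([0.1, 1.0])}
    print(predict_intruder(["game", "team", "coach", "banana"], embed))

Agreement between such automatic intruder predictions and human intruder choices is what correlation-with-human results of this kind are typically computed over.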
... The metric considers the entire topic-word distribution, unlike the coherence measure. SwitchP (Lund et al., 2019) estimates local topic quality, that is, the quality of a topic within a specific document. SwitchP demonstrates a stronger positive correlation with human judgments than coherence (Lund et al., 2019; Rezaee and Ferraro, 2020). ...
Conference Paper
Full-text available
Topic modeling is an essential instrument for exploring and uncovering latent patterns in unstructured textual data, allowing researchers and analysts to extract valuable understanding of a particular domain. Nonetheless, topic modeling lacks consensus on the matter of its evaluation. Estimating the quality of the obtained topics is complicated by several obstacles, chiefly the absence of a unified system of metrics, the one-sidedness of evaluation, and the lack of generalization. Despite the various approaches proposed in the literature, there is still no consensus on how to examine topic quality effectively. In this research paper, we address this problem and propose a novel framework for evaluating topic modeling results based on the attention mechanism and Layer-wise Relevance Propagation as tools for discovering dependencies between text tokens. One of our proposed metrics achieved a 0.71 Pearson correlation and a 0.74 correlation with human assessment. Additionally, our score variant outperforms other metrics on the challenging Amazon Fine Food Reviews dataset, suggesting its ability to capture contextual information in shorter texts.
... Apart from the coherence-based metric from [12], which has a high correlation with human judgement, and implementations of the basic automatic quality metrics (such as NPMI [16] and switchP [17]), there is also a variant of an LLM-based metric inspired by [18]. ...
Preprint
Full-text available
In this work, we present AutoTM 2.0, a framework for optimizing additively regularized topic models. Compared to the previous version, it includes such valuable improvements as a novel optimization pipeline, LLM-based quality metrics, and a distributed mode. AutoTM 2.0 is a convenient tool for specialists and non-specialists alike who work with text documents, whether to conduct exploratory data analysis or to perform clustering on an interpretable set of features. Quality evaluation is based on specially developed metrics such as coherence and GPT-4-based approaches. Researchers and practitioners can easily integrate new optimization algorithms and adapt novel metrics to enhance modeling quality and extend their experiments. We show that AutoTM 2.0 achieves better performance than the previous AutoTM, with results on 5 datasets with different features and in two different languages.
... It is also essential to pay attention to the obtained text representations. In [27], a new metric, switchP, is proposed to measure local quality, that is, how well the model describes the topics of a document. ...
Conference Paper
Full-text available
Topic modeling is a popular unsupervised method for processing text corpora to obtain interpretable knowledge from the data. However, there is a quality measurement gap between existing automatic metrics, human evaluation, and performance on target tasks. This is a big challenge for automatic hyperparameter tuning methods, as they rely heavily on the output signal to define the optimisation direction. Currently, evaluating the effectiveness of a topic model faces a number of difficulties and remains a labor-intensive routine performed manually, due to the absence of a universal metric that shows strong correspondence with human assessment. Developing a quality metric that satisfies this condition is essential to provide valuable feedback for the optimization algorithm when working with flexible and complex models, such as models based on additive regularisation or neural networks. To address the quality measurement gap, we performed an experimental study of existing scores on a specially created dataset containing topic models for several different text corpora, accompanied by evaluations of existing metrics and by scores obtained from human assessment. The results of the study show how automatic quality estimation may be improved and pave the way towards metric learning with ensembles of machine learning algorithms.
... He has been teaching using a flipped classroom approach since 2013. He and his collaborators helped end the use of perplexity for topic models (Chang et al., 2009), first developed interactive topic models (Hu et al., 2011), and improved word-level analysis of topic model explanations (Lund et al., 2019). Additional information at: http://boydgraber.org. ...
Chapter
The paper addresses the problem of tuning topic models with additive regularization by introducing a novel hybrid evolutionary approach that combines the Genetic and Nelder-Mead algorithms to generate domain-specific topic models of better quality. Introducing Nelder-Mead into the Genetic Algorithm pursues the goal of enhancing the exploitation capabilities of the resulting hybrid algorithm through improved local search. The experimental study, conducted on several datasets in Russian and English, shows a noticeable increase in the quality of the obtained topic models. Moreover, the experiments demonstrate that the proposed modification also improves the convergence dynamics of the tuning procedure, leading to a stable increase in quality from generation to generation. Keywords: Topic modeling, Evolutionary algorithms, Genetic algorithm, Nelder-Mead optimization, Hyperparameter optimization, ARTM
Article
Topic modeling is an unsupervised learning task that discovers the hidden topics in a collection of documents. In turn, the discovered topics can be used for summarizing, organizing, and understanding the documents in the collection. Most of the existing techniques for topic modeling are derivatives of Latent Dirichlet Allocation, which uses a bag-of-words assumption for the documents. However, bag-of-words models completely dismiss the relationships between the words. For this reason, this article presents a two-stage algorithm for topic modeling that leverages word embeddings and word co-occurrence. In the first stage, we determine the topic-word distributions by soft-clustering a random set of embedded n-grams from the documents. In the second stage, we determine the document-topic distributions by sampling the topics of each document from the topic-word distributions. This approach leverages the distributional properties of word embeddings instead of using the bag-of-words assumption. Experimental results on various data sets from an Australian compensation organization show the remarkable comparative effectiveness of the proposed algorithm in a document classification task.
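A rough sketch of the two-stage idea described above, not the authors' exact algorithm: the embedding model, the use of scikit-learn's GaussianMixture for soft clustering, and averaging responsibilities (rather than sampling) in the second stage are all assumptions made for illustration.

    import numpy as np
    from sklearn.mixture import GaussianMixture

    def two_stage_topics(ngram_vectors, doc_ngram_ids, n_topics=10, seed=0):
        """Stage 1: soft-cluster embedded n-grams into topics.
        Stage 2: read off document-topic distributions from the soft
        assignments of each document's n-grams."""
        gmm = GaussianMixture(n_components=n_topics, random_state=seed)
        gmm.fit(ngram_vectors)
        resp = gmm.predict_proba(ngram_vectors)   # n-gram x topic responsibilities

        doc_topic = []
        for ids in doc_ngram_ids:                 # indices of one document's n-grams
            dist = resp[ids].mean(axis=0)
            doc_topic.append(dist / dist.sum())
        return resp, np.array(doc_topic)

    # Toy usage: 100 random 8-d "embeddings", two documents of 5 n-grams each.
    rng = np.random.default_rng(0)
    vectors = rng.normal(size=(100, 8))
    _, doc_topic = two_stage_topics(vectors, [list(range(5)), list(range(5, 10))], n_topics=3)
    print(doc_topic)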
Article
Topic models, as developed in computer science, are effective tools for exploring and summarizing large document collections. When applied in social science research, however, they are commonly used for measurement, a task that requires careful validation to ensure that the model outputs actually capture the desired concept of interest. In this paper, we review current practices for topic validation in the field and show that extensive model validation is increasingly rare, or at least not systematically reported in papers and appendices. To supplement current practices, we refine an existing crowd-sourcing method by Chang and coauthors for validating topic quality and go on to create new procedures for validating conceptual labels provided by the researcher. We illustrate our method with an analysis of Facebook posts by U.S. Senators and provide software and guidance for researchers wishing to validate their own topic models. While tailored, case-specific validation exercises will always be best, we aim to improve standard practices by providing a general-purpose tool to validate topics as measures.
Preprint
Full-text available
Various cultural and behavioral preferences delineate human creativity. One such set of preferences, literary inclination, impacts not only the books people choose to read but also the way they perceive the narrative and human relationships contained within. Understanding the overall development of a plot and the evolution of relationships among different characters has significant implications for holding the reader's attention. In this paper, we establish a computational means to derive progressions and associations among characters in a given narrative. We use two books from different cultural traditions to validate this technique. To measure the progression of relationships between different characters, we propose the Graphical Association Method (GAM). Further analysis of changes in these imaginary social relationships in relation to a reader's literary inclinations demonstrates that this method holds promise for a more general analysis of narrative structure.
Conference Paper
Full-text available
The exchangeability assumption in topic models like Latent Dirichlet Allocation (LDA) often results in inferring inconsistent topics for the words of text spans like noun phrases, which are usually expected to be topically coherent. We propose copulaLDA, which extends LDA by integrating part of the text structure into the model and relaxes the conditional independence assumption between the word-specific latent topics given the per-document topic distributions. To this end, we assume that the words of text spans like noun phrases are topically bound, and we model this dependence with copulas. We demonstrate empirically the effectiveness of copulaLDA on both intrinsic and extrinsic evaluation tasks on several publicly available corpora.
Conference Paper
Full-text available
This paper presents an analysis of three techniques used for similar tasks, especially related to semantics, in Natural Language Processing (NLP): Latent Semantic Analysis (LSA), Latent Dirichlet Allocation (LDA), and lexical chains. These techniques were evaluated and compared on two different corpora in order to highlight the similarities and differences between them from a semantic analysis viewpoint. The first corpus consisted of four Wikipedia articles on different topics, while the second consisted of 35 online chat conversations between 4-12 participants debating four imposed topics (forum, chat, blog and wikis). The study focuses on finding similarities and differences between the outcomes of the three methods from a semantic analysis point of view, by computing quantitative, task-independent factors such as correlations and the degree of coverage of the resulting topics. Using corpora from different types of discourse and such task-independent factors allows us to show that although LSA and LDA provide similar results, the results of lexical chaining are not strongly correlated with those of either LSA or LDA; lexical chains might therefore be used as a complement to LSA or LDA when performing semantic analysis for various NLP applications.
Conference Paper
Full-text available
Statistical approaches to language learning typically focus on either short-range syntactic dependencies or long-range semantic dependencies between words. We present a generative model that uses both kinds of dependencies, and can be used to simultaneously find syntactic classes and semantic topics despite having no representation of syntax or semantics beyond statistical dependency. This model is competitive on tasks like part-of-speech tagging and document classification with models that exclusively use short- and long-range dependencies respectively.
Conference Paper
Full-text available
Probabilistic topic models are a popular tool for the unsupervised analysis of text, providing both a predictive model of future text and a latent topic representation of the corpus. Practitioners typically assume that the latent space is semantically meaningful. It is used to check models, summarize the corpus, and guide exploration of its contents. However, whether the latent space is interpretable is in need of quantitative evaluation. In this paper, we present new quantitative methods for measuring semantic meaning in inferred topics. We back these measures with large-scale user studies, showing that they capture aspects of the model that are undetected by previous measures of model quality based on held-out likelihood. Surprisingly, topic models which perform better on held-out likelihood may infer less semantically meaningful topics.
Conference Paper
Full-text available
A significant portion of the world's text is tagged by readers on social bookmarking websites. Credit attribution is an inherent problem in these corpora because most pages have multiple tags, but the tags do not always apply with equal specificity across the whole document. Solving the credit attribution problem requires associating each word in a document with the most appropriate tags and vice versa. This paper introduces Labeled LDA, a topic model that constrains Latent Dirichlet Allocation by defining a one-to-one correspondence between LDA's latent topics and user tags. This allows Labeled LDA to directly learn word-tag correspondences. We demonstrate Labeled LDA's improved expressiveness over traditional LDA with visualizations of a corpus of tagged web pages from del.icio.us. Labeled LDA outperforms SVMs by more than 3 to 1 when extracting tag-specific document snippets. As a multi-label text classifier, our model is competitive with a discriminative baseline on a variety of datasets.
Conference Paper
Full-text available
This work concerns automatic topic segmentation of email conversations. We present a corpus of email threads manually annotated with topics, and evaluate annotator reliability. To our knowledge, this is the first such email corpus. We show how the existing topic segmentation models (i.e., Lexical Chain Segmenter (LCSeg) and Latent Dirichlet Allocation (LDA)) which are solely based on lexical information, can be applied to emails. By pointing out where these methods fail and what any desired model should consider, we propose two novel extensions of the models that not only use lexical information but also exploit finer level conversation structure in a principled way. Empirical evaluation shows that LCSeg is a better model than LDA for segmenting an email thread into topical clusters and incorporating conversation structure into these models improves the performance significantly.
Conference Paper
Full-text available
Latent variable models have the potential to add value to large document collections by discovering interpretable, low-dimensional subspaces. In order for people to use such models, however, they must trust them. Unfortunately, typical dimensionality reduction methods for text, such as latent Dirichlet allocation, often produce low-dimensional subspaces (topics) that are obviously flawed to human domain experts. The contributions of this paper are threefold: (1) An analysis of the ways in which topics can be flawed; (2) an automated evaluation metric for identifying such topics that does not rely on human annotators or reference collections outside the training data; (3) a novel statistical topic model based on this metric that significantly improves topic quality in a large-scale document collection from the National Institutes of Health (NIH).
Conference Paper
Full-text available
This paper introduces the novel task of topic coherence evaluation, whereby a set of words, as generated by a topic model, is rated for coherence or interpretability. We apply a range of topic scoring models to the evaluation task, drawing on WordNet, Wikipedia and the Google search engine, and existing research on lexical similarity/relatedness. In comparison with human scores for a set of learned topics over two distinct datasets, we show a simple co-occurrence measure based on pointwise mutual information over Wikipedia data is able to achieve results for the task at or nearing the level of inter-annotator correlation, and that other Wikipedia-based lexical relatedness methods also achieve strong results. Google produces strong, if less consistent, results, while our results over WordNet are patchy at best.
Conference Paper
Full-text available
Topic models, like Latent Dirichlet Allocation (LDA), have been recently used to automatically generate text corpora topics, and to subdivide the corpus words among those topics. However, not all the estimated topics are of equal importance or correspond to genuine themes of the domain. Some of the topics can be a collection of irrelevant or background words, or represent insignificant themes. Current approaches to topic modeling perform manual examination of their output to find meaningful and important topics. This paper presents the first automated unsupervised analysis of LDA models to identify and distinguish junk topics from legitimate ones, and to rank the topic significance. The basic idea consists of measuring the distance between a topic distribution and a "junk distribution". In particular, three definitions of "junk distribution" are introduced, and a variety of metrics are used to compute the distances, from which an expressive figure of topic significance is implemented using a 4-phase Weighted Combination approach. Our experiments on synthetic and benchmark datasets show the effectiveness of the proposed approach in expressively ranking the significance of topics.
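The core idea above, measuring the distance between a topic's word distribution and a "junk distribution", can be illustrated with a minimal sketch. The uniform junk definition and the KL divergence used here are just one of the variants the paper considers, not its full 4-phase weighted combination.

    import numpy as np

    def topic_significance(topic_word_dist, eps=1e-12):
        """KL divergence from a topic's word distribution to a uniform
        'junk' distribution over the vocabulary. Larger distances suggest
        a more focused (less junk-like) topic."""
        p = np.asarray(topic_word_dist, dtype=float)
        p = p / p.sum()
        junk = np.full_like(p, 1.0 / len(p))
        return float(np.sum(p * np.log((p + eps) / junk)))

    # Toy usage: a peaked topic scores higher than a near-uniform one.
    print(topic_significance([0.6, 0.3, 0.05, 0.05]))
    print(topic_significance([0.26, 0.25, 0.25, 0.24]))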
Article
Full-text available
Algorithms such as Latent Dirichlet Allocation (LDA) have achieved significant progress in modeling word document relationships. These algorithms assume each word in the document was generated by a hidden topic and explicitly model the word distribution of each topic as well as the prior distribution over topics in the document. Given these parameters, the topics of all words in the same document are assumed to be independent. In this paper, we propose modeling the topics of words in the document as a Markov chain. Specifically, we assume that all words in the same sentence have the same topic, and successive sentences are more likely to have the same topics. Since the topics are hidden, this leads to using the well-known tools of Hidden Markov Models for learning and inference. We show that incorporating this dependency allows us to learn better topics and to disambiguate words that can belong to different topics. Quantitatively, we show that we obtain better perplexity in modeling documents with only a modest increase in learning and inference complexity.
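The sentence-level Markov assumption in that abstract can be illustrated with a tiny generative sketch. This is not the paper's model or its inference procedure; the "stickiness" parameter and the uniform resampling rule are assumptions made purely for illustration.

    import numpy as np

    def sample_sentence_topics(n_sentences, n_topics, stay_prob=0.8, seed=0):
        """Sample one topic per sentence from a 'sticky' Markov chain:
        with probability stay_prob the next sentence keeps the previous
        sentence's topic, otherwise a topic is drawn uniformly at random.
        All words in a sentence then share that sentence's topic."""
        rng = np.random.default_rng(seed)
        topics = [int(rng.integers(n_topics))]
        for _ in range(n_sentences - 1):
            if rng.random() < stay_prob:
                topics.append(topics[-1])
            else:
                topics.append(int(rng.integers(n_topics)))
        return topics

    print(sample_sentence_topics(10, n_topics=5))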
Article
Full-text available
In this paper, we examine the idea of lexical chains as such a representation. We show how they can be constructed by means of WordNet, and how they can be applied in one particular linguistic task: the detection and correction of malapropisms.
Conference Paper
Probabilistic topic models describe the content of documents at word level in large document collections. However, the structure of the textual input, and for instance the grouping of words in coherent text spans such as sentences, contains much information which is generally lost with these models. In this paper, we propose sentenceLDA, an extension of LDA whose goal is to overcome this limitation by incorporating the structure of the text in the generative and inference processes. We illustrate the advantages of sentenceLDA by comparing it with LDA using both intrinsic (perplexity) and extrinsic (text classification) evaluation tasks on different text collections.
Article
Latent Dirichlet Allocation models a document by a mixture of topics, where each topic itself is typically modeled by a unigram word distribution. Documents however often have known structures, and the same topic can exhibit different word distributions under different parts of the structure. We extend the latent Dirichlet allocation model by replacing the unigram word distributions with a factored representation conditioned on both the topic and the structure. In the resultant model each topic is equivalent to a set of unigrams, reflecting the structure a word is in. The proposed model is more flexible in modeling the corpus. The factored representation prevents combinatorial explosion and leads to efficient parameterization. We derive the variational optimization algorithm for the new model. The model shows improved perplexity on text and image data, but not significant accuracy improvement when used for classification.
Chapter
This paper proposes an information theoretic criterion for comparing two partitions, or clusterings, of the same data set. The criterion, called variation of information (VI), measures the amount of information lost and gained in changing from clustering C to clustering C'. The criterion makes no assumptions about how the clusterings were generated and applies to both soft and hard clusterings. The basic properties of VI are presented and discussed from the point of view of comparing clusterings. In particular, the VI is positive, symmetric and obeys the triangle inequality. Thus, surprisingly enough, it is a true metric on the space of clusterings. Keywords: Clustering, Comparing partitions, Measures of agreement, Information theory, Mutual information
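For reference, the criterion can be stated compactly in terms of the entropies of the two clusterings and the mutual information between them:

    VI(C, C') = H(C) + H(C') - 2 I(C, C')
              = H(C | C') + H(C' | C)

where H(.) is the entropy of a clustering (treating cluster membership as a random variable) and I(C, C') is the mutual information between the two clusterings; the second form makes the interpretation as "information lost plus information gained" explicit.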
Conference Paper
A natural evaluation metric for statistical topic models is the probability of held-out documents given a trained model. While exact computation of this probability is intractable, several estimators for this probability have been used in the topic modeling literature, including the harmonic mean method and the empirical likelihood method. In this paper, we demonstrate experimentally that commonly-used methods are unlikely to accurately estimate the probability of held-out documents, and propose two alternative methods that are both accurate and efficient. In this paper we consider only the simplest topic model, latent Dirichlet allocation (LDA), and compare a number of methods for estimating the probability of held-out documents given a trained model. Most of the methods presented, however, are applicable to more complicated topic models. In addition to comparing evaluation methods that are currently used in the topic modeling literature, we propose several alternative methods. We present empirical results on synthetic and real-world data sets showing that the currently-used estimators are less accurate and have higher variance than the proposed new estimators.
Article
We describe latent Dirichlet allocation (LDA), a generative probabilistic model for collections of discrete data such as text corpora. LDA is a three-level hierarchical Bayesian model, in which each item of a collection is modeled as a finite mixture over an underlying set of topics. Each topic is, in turn, modeled as an infinite mixture over an underlying set of topic probabilities. In the context of text modeling, the topic probabilities provide an explicit representation of a document. We present efficient approximate inference techniques based on variational methods and an EM algorithm for empirical Bayes parameter estimation. We report results in document modeling, text classification, and collaborative filtering, comparing to a mixture of unigrams model and the probabilistic LSI model.
Article
This acclaimed book is a master teacher's tested program for turning clumsy prose into clear, powerful, and effective writing. A logical, expert, easy-to-use plan for achieving excellence in expression, Style offers neither simplistic rules nor endless lists of dos and don'ts. Rather, Joseph Williams explains how to be concise, how to be focused, how to be organized. Filled with realistic examples of good, bad, and better writing, and step-by-step strategies for crafting a sentence or organizing a paragraph, Style does much more than teach mechanics: it helps anyone who must write clearly and persuasively transform even the roughest of drafts into a polished work of clarity, coherence, impact, and personality. "Buy Williams's book. And dig out from storage your dog-eared old copy of The Elements of Style. Set them side by side on your reference shelf."—Barbara Walraff, Atlantic "Let newcoming writers discover this, and let their teachers and readers rejoice. It is a practical, disciplined text that is also a pleasure to read."—Christian Century "An excellent book....It provides a sensible, well-balanced approach, featuring prescriptions that work."—Donald Karzenski, Journal of Business Communication "Intensive fitness training for the expressive mind."—Booklist (The college textbook version, Style: Ten Lessons in Clarity and Grace, 9th edition, is available from Longman. ISBN 9780321479358.)
Sanjeev Arora, Rong Ge, Yonatan Halpern, David Mimno, Ankur Moitra, David Sontag, Yichen Wu, and Michael Zhu. 2013. A practical algorithm for topic modeling with provable guarantees. In Proceedings of the International Conference on Machine Learning.
Sanjeev Arora, Rong Ge, and Ankur Moitra. 2012. Learning topic models: going beyond SVD. In Proceedings of Foundations of Computer Science.
Jordan L Boyd-Graber and David M Blei. 2009. Syntactic topic models. In Proceedings of Advances in Neural Information Processing Systems.
Jonathan Chang. 2010. Not-so-latent Dirichlet allocation: Collapsed Gibbs sampling using human judgments. In NAACL Workshop: Creating Speech and Language Data With Amazon's Mechanical Turk.
Jey Han Lau, David Newman, and Timothy Baldwin. 2014. Machine reading tea leaves: Automatically evaluating topic coherence and topic model quality. In Proceedings of the Conference of the European Chapter of the Association for Computational Linguistics, pages 530-539.
Stefanie Nowak and Stefan Rüger. 2010. How reliable are annotations via crowdsourcing: a study about inter-annotator agreement for multi-label image annotation. In Proceedings of the international conference on Multimedia information retrieval, pages 557-566. ACM.
Michael Röder, Andreas Both, and Alexander Hinneburg. 2015. Exploring the space of topic coherence measures. In Proceedings of the eighth ACM international conference on Web search and data mining, pages 399-408. ACM.
Evan Sandhaus. 2008. The New York Times annotated corpus.