Conference Paper

Aspect and sentiment unification model for online review analysis.

DOI: 10.1145/1935826.1935932 Conference: Proceedings of the Fourth International Conference on Web Search and Web Data Mining, WSDM 2011, Hong Kong, China, February 9-12, 2011
Source: DBLP

ABSTRACT User-generated reviews on the Web contain sentiments about detailed aspects of products and services. However, most of the reviews are plain text and thus require much effort to obtain information about relevant details. In this paper, we tackle the problem of automatically discovering what aspects are evaluated in reviews and how sentiments for different aspects are expressed. We first propose Sentence-LDA (SLDA), a probabilistic generative model that assumes all words in a single sentence are generated from one aspect. We then extend SLDA to Aspect and Sentiment Unification Model (ASUM), which incorporates aspect and sentiment together to model sentiments toward different aspects. ASUM discovers pairs of {aspect, sentiment} which we call senti-aspects. We applied SLDA and ASUM to reviews of electronic devices and restaurants. The results show that the aspects discovered by SLDA match evaluative details of the reviews, and the senti-aspects found by ASUM capture important aspects that are closely coupled with a sentiment. The results of sentiment classification show that ASUM outperforms other generative models and comes close to supervised classification methods. One important advantage of ASUM is that it does not require any sentiment labels of the reviews, which are often expensive to obtain.
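The one-aspect-per-sentence assumption behind SLDA can be illustrated with a minimal sketch of a generative process of that kind. This is not the authors' implementation: the function names, the symmetric Dirichlet priors, the hyperparameter values (alpha, beta), and the fixed sentence length are all assumptions made for illustration. The abstract states only that every word in a sentence is generated from a single aspect.

```python
import random

def sample_dirichlet(concentration, size, rng):
    """Draw from a symmetric Dirichlet by normalizing Gamma draws."""
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(size)]
    total = sum(draws)
    if total == 0.0:  # guard against numerical underflow
        return [1.0 / size] * size
    return [d / total for d in draws]

def generate_review(vocab, num_aspects, num_sentences, words_per_sentence,
                    alpha=1.0, beta=0.5, seed=0):
    """Generate one synthetic review under a sentence-level aspect assumption.

    alpha, beta, and the fixed sentence length are illustrative assumptions,
    not values taken from the paper.
    """
    rng = random.Random(seed)
    # One word distribution per aspect (hypothetical prior: Dirichlet(beta)).
    phi = [sample_dirichlet(beta, len(vocab), rng) for _ in range(num_aspects)]
    # One aspect distribution for this review (hypothetical prior: Dirichlet(alpha)).
    theta = sample_dirichlet(alpha, num_aspects, rng)
    review = []
    for _ in range(num_sentences):
        # A single aspect is drawn for the whole sentence ...
        aspect = rng.choices(range(num_aspects), weights=theta, k=1)[0]
        # ... and every word in that sentence comes from that aspect.
        words = rng.choices(vocab, weights=phi[aspect], k=words_per_sentence)
        review.append((aspect, words))
    return review

if __name__ == "__main__":
    toy_vocab = ["battery", "screen", "price", "service", "food", "good", "bad"]
    for aspect, words in generate_review(toy_vocab, 3, 4, 5):
        print(aspect, " ".join(words))
```

In contrast to a word-level topic model, the aspect is sampled once per sentence rather than once per word. According to the abstract, ASUM further pairs each aspect with a sentiment; the sketch above does not model that extension.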

  • ABSTRACT: We propose a domain-dependent/independent topic switching model based on Bayesian probabilistic modeling for modeling online product reviews that are accompanied by numerical ratings provided by users. In this model, each word is allocated to a domain-dependent topic or a domain-independent topic, and the distribution of topics in an online review is connected to an observed numerical rating via a linear regression model. Domain-dependent topics utilize domain information observed with a corpus, and domain-independent topics utilize the framework of Bayesian Nonparametrics, which can estimate the number of topics in posterior distributions. The posterior distribution is estimated via collapsed Gibbs sampling. Using real data, our proposed model had smaller mean square error and smaller average mean error with a small model size and achieved convergence in fewer iterations for a regression task involving online review ratings, outperforming a baseline model that did not consider domains. Moreover, the proposed model can also tell us whether the words are positive or negative in the form of continuous values. This feature allows us to extract domain-dependent and -independent sentiment words.
    Proceedings of the 22nd ACM International Conference on Information & Knowledge Management; 10/2013
  •
    ABSTRACT: The traditional collaborative filtering algorithm is a successful recommendation technology. The core idea of this algorithm is to calculate user or item similarity based on user ratings and then to predict ratings and recommend items based on similar users' or similar items' ratings. However, real applications face a problem of data sparsity because most users provide only a few ratings, such that the traditional collaborative filtering algorithm cannot produce satisfactory results. This paper proposes a new topic model-based similarity and two recommendation algorithms: user-based collaborative filtering with topic model algorithm (UCFTM, in this paper) and item-based collaborative filtering with topic model algorithm (ICFTM, in this paper). Each review is processed using the topic model to generate review topic allocations representing a user's preference for a product's different features. The UCFTM algorithm aggregates all topic allocations of reviews by the same user and calculates the user most valued features representing product features that the user most values. User similarity is calculated based on user most valued features, whereas ratings are predicted from similar users' ratings. The ICFTM algorithm aggregates all topic allocations of reviews for the same product, and item most valued features representing the most valued features of the product are calculated. Item similarity is calculated based on item most valued features, whereas ratings are predicted from similar items' ratings. Experiments on six data sets from Amazon indicate that when most users give only one review and one rating, our algorithms exhibit better prediction accuracy than other traditional collaborative filtering and state-of-the-art topic model-based recommendation algorithms.
    Service Oriented Computing and Applications 03/2014; 8(1):15-31.
  • ABSTRACT: In sentiment analysis, aspect-level review analysis has been an important task because it can catalogue, aggregate, or summarize various opinions according to a product's properties. In this paper, we explore a new concept for aspect-level review analysis, latent sentiment explanations, which are defined as a set of informative aspect-specific sentences whose polarities are consistent with that of the review. In other words, sentiment explanations best represent a review in terms of both aspect and polarity. We formulate the problem as a structure learning problem, and sentiment explanations are modeled with latent variables. Training samples are automatically identified through a set of pre-defined aspect signature terms (i.e., without manual annotation of samples), an approach we term weakly supervised. Our major contributions are twofold: first, we formalize the use of aspect signature terms as weak supervision in a structural learning framework, which remarkably promotes aspect-level analysis; second, the performance of aspect analysis and that of document-level sentiment classification are mutually enhanced through joint modeling. The proposed method is evaluated on restaurant and hotel reviews respectively, and experimental results demonstrate promising performance in both document-level and aspect-level sentiment analysis.
    Proceedings of the 22nd ACM International Conference on Information & Knowledge Management; 10/2013

