Table 1 - uploaded by Richard Everson
Dataset and sentiment lexicon statistics. (Note: †denotes before preprocessing and * denotes after preprocessing.) 

Source publication
Article
Full-text available
This paper presents a comparative study of three closely related Bayesian models for unsupervised document-level sentiment classification, namely, the latent sentiment model (LSM), the joint sentiment-topic (JST) model, and the Reverse-JST model. Extensive experiments have been conducted on two corpora, the movie review dataset and the multi-domain...

Contexts in source publication

Context 1
... standard stemming was performed in order to reduce the vocabulary size and address the issue of data sparseness. Summary statistics of the datasets before and after preprocessing are shown in Table 1. ...
Context 2
... the prior information was produced by retaining all words in the MPQA and appraisal lexicons that occurred in the experimental datasets. The prior information statistics for each dataset are listed in the last row of Table 1. ...
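The lexicon-filtering step described in Context 2 can be sketched in a few lines. This is an illustrative reconstruction, not the authors' code; the lexicon entries and vocabulary below are invented placeholders.

```python
# Keep only those MPQA/appraisal-style lexicon entries that actually
# occur in the corpus vocabulary; everything else carries no usable
# prior information for the experimental datasets.

def filter_lexicon(lexicon, corpus_vocabulary):
    """Retain lexicon words that appear in the experimental dataset."""
    return {word: polarity for word, polarity in lexicon.items()
            if word in corpus_vocabulary}

# Toy stand-ins for a real sentiment lexicon and corpus vocabulary.
mpqa_like = {"excellent": "positive", "awful": "negative", "sublime": "positive"}
vocab = {"excellent", "awful", "movie", "plot"}

prior_info = filter_lexicon(mpqa_like, vocab)
# "sublime" is dropped because it never occurs in the corpus
```

The surviving entries are exactly what the paper's last table row would count as prior information for a given dataset.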

Citations

... Lin et al. constructed three unsupervised emotional analysis systems using the LSM model, the JST model and the reverse-JST model. However, because deep emotional analysis inevitably involves semantic analysis, and the phenomenon of emotional transfer often occurs in text, deep semantic-based emotional analysis methods are not ideal [24]. Therefore, in order to improve the effectiveness of deep semantic analysis, a dual LSTM model is introduced in this paper. ...
Article
Full-text available
The hybrid neural network model proposed in this paper consists of two main parts: extracting local features of text vectors by a convolutional neural network, extracting global features related to text context by BiLSTM, and fusing the features extracted by the two complementary models. In this paper, the pre-processed sentences are put into the hybrid neural network for training. The trained hybrid neural network can automatically classify the sentences. When testing the proposed algorithm, the training corpus is represented with Word2vec. The test results show that the accuracy rate of text categorization reaches 94.2%, with 10 iterations. The results show that the proposed algorithm has high accuracy and good robustness even when the sample size is seriously unbalanced.
... Lin and He (2009) proposed a four-layer probabilistic modelling framework for extracting sentiment polarity from online reviews, such that the topics are generated dependent on sentiment, while the words are generated by both sentiment and topic pairs. Lin et al. (2010) compared unsupervised document-level sentiment classification methods with machine learning approaches to sentiment classification that often require labeled corpora for classifier training, indicating that unsupervised classification is more appropriate for sentiment topic detection. Jo and Oh (2011) developed an unsupervised probabilistic generative model to identify and evaluate different aspects of sentiment polarities from online reviews. ...
Article
Full-text available
A significant body of knowledge exists on inverse problems and extensive research has been conducted on data-driven design in the past decade. This paper provides a comprehensive review of the state-of-the-art methods and practice reported in the literature dealing with many different aspects of data-informed inverse design. By reviewing the origins and common practice of inverse problems in engineering design, the paper presents a closed-loop decision framework of product usage data-informed inverse design. Specifically reviewed areas of focus include data-informed inverse requirement analysis by user generated content, data-informed inverse conceptual design for product innovation, data-informed inverse embodiment design for product families and product platforming, data-informed inverse analysis and optimization in detailed design, along with prevailing techniques for product usage data collection and analytics. The paper also discusses the challenges of data-informed inverse design and the prospects for future research.
... Topics are generated dependent on sentiment, and words are generated on sentiment as well as topic pairs. Later, a reverse modelling framework was presented where sentiments are generated dependent on topic distributions (Lin, He, and Everson 2010). An unsupervised model was proposed to identify aspects and evaluate sentiment polarities from online reviews (Jo and Oh 2011). ...
Article
Big consumer data provide new opportunities for business administrators to explore the value to fulfil customer requirements (CRs). Generally, they are presented as purchase records, online behaviour, etc. However, distinctive characteristics of big data, Volume, Variety, Velocity and Value or ‘4Vs’, mean that many conventional methods for customer understanding potentially fail to handle such data. A visible research gap with practical significance is to develop a framework to deal with big consumer data for CRs understanding. Accordingly, a research study is conducted to exploit the value of these data from the perspective of product designers. It starts with the identification of product features and sentiment polarities from big consumer opinion data. A Kalman filter method is then employed to forecast the trends of CRs and a Bayesian method is proposed to compare products. The objective is to help designers to understand the changes of CRs and their competitive advantages. Finally, using opinion data in Amazon.com, a case study is presented to illustrate how the proposed techniques are applied. This research is argued to incorporate an interdisciplinary collaboration between computer science and engineering design. It aims to facilitate designers by exploiting valuable information from big consumer data for market-driven product design.
... Although the data was annotated by three graduate students, a more rigorous crowd-sourcing approach should be employed to avoid any bias. The paper lacks comparison to other important techniques and algorithms based on SVM, Maximum Entropy, SenticNet, etc. Lin et al. [50] studied Bayesian models for unsupervised sentiment classification where the domain adaptation approach, named joint sentiment topic model, performed best overall. ...
... Verma and Bhattacharyya [38] used SentiWordNet with SVM and Information Gain based feature pruning to attain an accuracy of 82.10% for Cornell movie review dataset. Lin et al. [50] presented a study of unsupervised sentiment classification using Bayesian models. Joint sentiment topic model was observed to perform best with 70.20% accuracy. ...
... In this study, automated analysis techniques were used to determine the sentiment of a Tweet using Bayesian analysis techniques. Using this Bayesian approach to sentiment analysis, a sentence can be broken into words and a sentiment probability value assigned to each word; these values are then summed to provide an overall sentence probability (Lin, He, & Everson, 2010). This probability can then be used to assign a sentiment category to the sentence. ...
... For the purposes of this study, the Naïve Bayesian Classification technique was selected for its simplicity, high accuracy ratings when used with good training datasets, wide adoption, ease of implementation and visibility into the classification process (Durant & Smith, 2006;Frank & Bouckaert, 2006;Lin et al., 2010). Additionally, previous research using sentiment analysis techniques applied to the financial markets have used the Naïve Bayesian Classification method (Antweiler & Frank, 2004;Sprenger & Welpe, 2010). ...
... Naïve Bayesian Classification performs text and sentiment classification by assigning probabilities to text based on the conditional probability of the words in that text occurring in a document that is classified as a member of a particular class (Lewis, 1998;Lin et al., 2010). ...
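The Naïve Bayesian scoring idea outlined in the passages above, per-word conditional probabilities combined into a sentence-level score, can be sketched as follows. The class counts, priors, and vocabulary are toy values for illustration, not taken from any cited study.

```python
import math

# Minimal Naive Bayes sentiment scorer: each in-vocabulary word
# contributes a log conditional probability per class, and the per-word
# scores are summed into an overall sentence score per class.

counts = {
    "pos": {"good": 9, "great": 7, "bad": 1},
    "neg": {"good": 1, "great": 1, "bad": 8},
}
priors = {"pos": 0.5, "neg": 0.5}
vocab = {w for c in counts.values() for w in c}

def log_prob(sentence, label):
    total = sum(counts[label].values())
    score = math.log(priors[label])
    for word in sentence.split():
        if word in vocab:  # ignore out-of-vocabulary words
            # Laplace smoothing avoids zero probabilities
            score += math.log((counts[label].get(word, 0) + 1)
                              / (total + len(vocab)))
    return score

def classify(sentence):
    return max(priors, key=lambda label: log_prob(sentence, label))

print(classify("good great plot"))  # prints: pos
```

Summing log probabilities is equivalent to multiplying the per-word probabilities, which is the usual way to avoid numeric underflow on longer sentences.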
... Accuracy (Acc.) comparison:
Eigen Vector Clustering (Dasgupta and Ng, 2009): 70.9
Semi Supervised, 40% doc. label (Li et al., 2009): 73.5
LSM Unsupervised with prior info (Lin et al., 2010): 74.1
SO-CAL Full Lexicon (Taboada et al., 2011): 76.37
RAE Semi Supervised Recursive Auto Encoders with random word initialization (Socher et al., 2011): 76.8 ...
Conference Paper
Full-text available
In this work, we propose an author-specific sentiment aggregation model for polarity prediction of reviews using an ontology. We propose an approach to construct a Phrase Annotated Author Specific Sentiment Ontology Tree (PASOT), where the facet nodes are annotated with opinion phrases of the author, used to describe the facets, as well as the author's preference for the facets. We show that an author-specific aggregation of sentiment over an ontology fares better than a flat classification model, which does not take the domain-specific facet importance or author-specific facet preference into account. We compare our approach to supervised classification using Support Vector Machines, as well as other baselines from previous works, where we achieve an accuracy improvement of 7.55% over the SVM baseline. Furthermore, we also show the effectiveness of our approach in capturing thwarting in reviews, achieving an accuracy improvement of 11.53% over the SVM baseline.
... Given that semantic class is chosen, the author gets to choose a ... [Table residue omitted: top words listed per topic (T) and sentiment label (L) pair, e.g. T=actor/L=pos, T=food/L=neg, for movie and restaurant reviews.] Accuracy comparison:
Eigen Vector Clustering [2]: 70.9
Semi Supervised, 40% doc. label [8]: 73.5
LSM Unsupervised with prior info [10]: 74.1
SO-CAL Full Lexicon [21]: 76.37
RAE Semi Supervised Recursive Auto Encoders with random word initialization [20]: 76.8 ...
Conference Paper
Full-text available
Traditional works in sentiment analysis and aspect rating prediction do not take author preferences and writing style into account during rating prediction of reviews. In this work, we introduce the Joint Author Sentiment Topic Model (JAST), a generative process of writing a review by an author. Authors have different topic preferences, 'emotional' attachment to topics, writing style based on the distribution of semantic (topic) and syntactic (background) words, and their tendency to switch topics. JAST uses Latent Dirichlet Allocation to learn the distribution of author-specific topic preferences and emotional attachment to topics. It uses a Hidden Markov Model to capture short-range syntactic and long-range semantic dependencies in reviews to capture coherence in author writing style. JAST jointly discovers the topics in a review, author preferences for the topics, topic ratings as well as the overall review rating from the point of view of an author. To the best of our knowledge, this is the first work in Natural Language Processing to bring all these dimensions together to have an author-specific generative model of a review.
... In addition, it is simple and computationally efficient, rendering it more suitable for online and real-time sentiment classification from the Web. Incorporating sentiment prior knowledge into the LDA model for sentiment analysis has been previously studied in [8, 7], where the LDA model has been modified to jointly model sentiment and topic. However, their approach uses the sentiment prior information in the Gibbs sampling inference step, so that a sentiment label is only sampled if the current word token has no prior sentiment as defined in a sentiment lexicon. ...
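The lexicon constraint described above, where a sentiment label is only sampled for words without prior sentiment, can be sketched as follows. This is a simplified illustration, not the cited authors' code: the full Gibbs conditional distribution is replaced by a uniform draw for brevity, and the lexicon and label set are toy placeholders.

```python
import random

# During label assignment, a word listed in the sentiment lexicon keeps
# its lexicon-defined label; a sentiment label is only sampled for words
# with no prior sentiment.

LEXICON = {"excellent": "pos", "terrible": "neg"}
LABELS = ["pos", "neg", "neu"]

def assign_sentiment_label(word, rng=random):
    if word in LEXICON:
        return LEXICON[word]      # prior sentiment: label is fixed
    return rng.choice(LABELS)     # no prior: sample a label
```

In a real Gibbs sampler the `rng.choice` draw would instead use the full conditional over sentiment labels given the current topic and word assignments.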
Conference Paper
In this paper, we present a novel weakly-supervised method for cross-lingual sentiment analysis. Specifically, we propose a latent sentiment model (LSM) based on latent Dirichlet allocation where sentiment labels are considered as topics. Prior information extracted from English sentiment lexicons through machine translation is incorporated into LSM model learning, where preferences on expectations of sentiment labels of those lexicon words are expressed using generalized expectation criteria. An efficient parameter estimation procedure using variational Bayes is presented. Experimental results on Chinese product reviews show that the weakly-supervised LSM model performs comparably to supervised classifiers such as Support Vector Machines, with an average of 81% accuracy achieved over a total of 5484 review documents. Moreover, starting with a generic sentiment lexicon, the LSM model is able to extract highly domain-specific polarity words from text.
Keywords: latent sentiment model (LSM), cross-lingual sentiment analysis, generalized expectation, latent Dirichlet allocation
... This has motivated much research on sentiment transfer learning, which transfers knowledge from a source task or domain to a different but related task or domain (Aue and Gamon, 2005; Blitzer et al., 2007; Wu et al., 2009; Pan et al., 2010). The joint sentiment-topic (JST) model (Lin and He, 2009; Lin et al., 2010) was extended from the latent Dirichlet allocation (LDA) model (Blei et al., 2003) to detect sentiment and topic simultaneously from text. The only supervision required by JST learning is domain-independent polarity word prior information. ...
Conference Paper
Full-text available
Joint sentiment-topic (JST) model was previously proposed to detect sentiment and topic simultaneously from text. The only supervision required by JST model learning is domain-independent polarity word priors. In this paper, we modify the JST model by incorporating word polarity priors through modifying the topic-word Dirichlet priors. We study the polarity-bearing topics extracted by JST and show that by augmenting the original feature space with polarity-bearing topics, the in-domain supervised classifiers learned from augmented feature representation achieve the state-of-the-art performance of 95% on the movie review data and an average of 90% on the multi-domain sentiment dataset. Furthermore, using feature augmentation and selection according to the information gain criteria for cross-domain sentiment classification, our proposed approach performs either better or comparably compared to previous approaches. Nevertheless, our approach is much simpler and does not require difficult parameter tuning.
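The prior-incorporation mechanism described in the abstract above, modifying the topic-word Dirichlet priors with word polarity priors, might be sketched as follows. The hyperparameter values and the lexicon here are illustrative assumptions, not taken from the paper.

```python
# A word's Dirichlet prior entry for a sentiment-topic is kept at the
# base value beta when the word's lexicon polarity matches that topic's
# sentiment label (or the word carries no prior), and shrunk toward zero
# otherwise, suppressing the word under the opposite polarity.

BETA = 0.01          # symmetric base Dirichlet prior (illustrative)
EPSILON = 1e-7       # near-zero prior for mismatched polarity
LEXICON = {"excellent": "pos", "terrible": "neg"}

def word_prior(word, sentiment_label):
    """Per-word Dirichlet prior entry for one sentiment-topic."""
    polarity = LEXICON.get(word)
    if polarity is None or polarity == sentiment_label:
        return BETA      # no prior knowledge, or matching polarity
    return EPSILON       # suppress word under the opposite polarity
```

Because the prior is modified once before inference rather than checked at every sampling step, this variant avoids the per-token lexicon lookup used in the earlier Gibbs-sampling approach.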
... The more recently proposed joint sentiment-topic (JST) model (Lin and He, 2009; Lin et al., 2010) holds the closest paradigm to the proposed subjLDA model. It targeted document-level sentiment detection with weakly-supervised generative model learning, where the only knowledge incorporated was from generic sentiment lexicons. ...
... The improvement over this baseline will reflect how much subjLDA can learn from data. The LDA model (Blei et al., 2003), as shown in Figure 1(a), has been used as a baseline in document-level sentiment classification in previous research (Lin et al., 2010). Thus, we also evaluated LDA on the sentence-level subjectivity detection task by modelling a mixture of three sentiment topics, i.e., positive, negative and neutral. ...
Article
Full-text available
This paper presents a hierarchical Bayesian model based on latent Dirichlet allocation (LDA), called subjLDA, for sentence-level subjectivity detection, which automatically identifies whether a given sentence expresses opinion or states facts. In contrast to most of the existing methods relying on either labelled corpora for classifier training or linguistic pattern extraction for subjectivity classification, we view the problem as weakly-supervised generative model learning, where the only input to the model is a small set of domain-independent subjectivity lexical clues. A mechanism is introduced to incorporate the prior information about the subjectivity lexical clues into model learning by modifying the Dirichlet priors of topic-word distributions. The subjLDA model has been evaluated on the Multi-Perspective Question Answering (MPQA) dataset and promising results have been observed in the preliminary experiments. We have also explored adding neutral words as prior information for model learning. It was found that while incorporating subjectivity clues bearing positive or negative polarity can achieve a significant performance gain, the prior lexical information from neutral words is less effective.