A Comparative Study of Bayesian Models for Unsupervised Sentiment Detection


Abstract

This paper presents a comparative study of three closely related Bayesian models for unsupervised document-level sentiment classification, namely the latent sentiment model (LSM), the joint sentiment-topic (JST) model, and the Reverse-JST model. Extensive experiments have been conducted on two corpora, the movie review dataset and the multi-domain sentiment dataset. All three models achieve either better or comparable performance on these two corpora when compared to existing unsupervised sentiment classification approaches, and both JST and Reverse-JST are additionally able to extract sentiment-oriented topics. Moreover, Reverse-JST always performs worse than JST, suggesting that the JST model is more appropriate for joint sentiment topic detection.
How to cite:
Lin, Chenghua; He, Yulan and Everson, Richard (2010). A comparative study of Bayesian models for
unsupervised sentiment detection. In: The 14th Conference on Computational Natural Language Learning
(CoNLL-2010), 15-16 Jul 2010, Uppsala, Sweden.
2010 Association for Computational Linguistics
Version: Version of Record
Proceedings of the Fourteenth Conference on Computational Natural Language Learning, pages 144–152,
Uppsala, Sweden, 15-16 July 2010. © 2010 Association for Computational Linguistics
A Comparative Study of Bayesian Models for Unsupervised Sentiment Detection
Chenghua Lin
School of Engineering,
Computing and Mathematics
University of Exeter
Exeter, EX4 4QF, UK.
Yulan He
Knowledge Media Institute
The Open University
Milton Keynes
Richard Everson
School of Engineering,
Computing and Mathematics
University of Exeter
Exeter, EX4 4QF, UK.
1 Introduction
With the explosion of web 2.0, various types of
social media such as blogs, discussion forums and
peer-to-peer networks present a wealth of infor-
mation that can be very helpful in assessing the
general public’s sentiments and opinions towards
products and services. Recent surveys have re-
vealed that opinion-rich resources like online re-
views are having greater economic impact on both
consumers and companies compared to the traditional
media (Pang and Lee, 2008). Driven by the demand
for gleaning insights from such large amounts of
user-generated data, work on new methodologies
for automated sentiment analysis has grown rapidly.
Compared to the traditional topic-based text
classification, sentiment classification is deemed
to be more challenging as sentiment is often em-
bodied in subtle linguistic mechanisms such as
the use of sarcasm or incorporated with highly
domain-specific information. Although the task
of identifying the overall sentiment polarity of
a document has been well studied, most of the
work is highly domain dependent and relies on
supervised learning (Pang et al., 2002; Pang
and Lee, 2004; Whitelaw et al., 2005; Kennedy
and Inkpen, 2006; McDonald et al., 2007), requiring
annotated corpora for every possible domain
of interest, which is impractical for real
applications. Also, it is well-known that senti-
ment classifiers trained on one domain often fail
to produce satisfactory results when shifted to an-
other domain, since sentiment expression can be
quite different in different domains (Aue and Ga-
mon, 2005). Moreover, aside from the diversity
of genres and large-scale size of Web corpora,
user-generated contents evolve rapidly over time,
which demands much more efficient algorithms
for sentiment analysis than the current approaches
can offer. These observations have thus motivated
the problem of using unsupervised approaches for
domain-independent joint sentiment topic detection.
Some recent research efforts have been made to
adapt sentiment classifiers trained on one domain
to another domain (Aue and Gamon, 2005; Blitzer
et al., 2007; Li and Zong, 2008; Andreevskaia
and Bergler, 2008). However, the adaptation
performance of these lines of work depends heavily
on the distribution similarity between the source
and target domains, and considerable effort is still
required to obtain labelled data for training.
Intuitively, sentiment polarities are dependent
on contextual information, such as topics or do-
mains. In this regard, some recent work (Mei et
al., 2007; Titov and McDonald, 2008a) has tried to
model both sentiment and topics. However, these
two models either require postprocessing to calcu-
late the positive/negative coverage in a document
for polarity identification (Mei et al., 2007) or re-
quire some kind of supervised setting in which
review text should contain ratings for aspects of
interest (Titov and McDonald, 2008a). More recently,
Dasgupta and Ng (2009) proposed an unsupervised
sentiment classification algorithm by integrating
user feedback into a spectral clustering
algorithm. Features induced for each dimension of
spectral clustering can be considered as sentiment-
oriented topics. Nevertheless, human judgement
of identifying the most important dimensions dur-
ing spectral clustering is required.
Lin and He (2009) proposed a joint sentiment-
topic (JST) model for unsupervised joint senti-
ment topic detection. They assumed that top-
ics are generated dependent on sentiment distri-
butions and then words are generated conditioned
on sentiment-topic pairs. While this is a reasonable design choice, one may argue that the reverse is also true, i.e. that sentiments may vary according to topics. Thus in this paper, we study the reverse dependence of the JST model, called Reverse-JST, in which sentiments are generated dependent on topic distributions in the modelling process. We also note that, when the topic number is set to 1, both JST and Reverse-JST essentially become a simple latent Dirichlet allocation (LDA) model with only S (the number of sentiment labels) topics, each of which corresponds to a sentiment label. We call this the latent sentiment model (LSM) in this paper. Extensive experiments have been conducted on the movie review (MR) (Pang et al., 2002) and multi-domain sentiment (MDS) (Blitzer et al., 2007) datasets to compare the performance of LSM, JST and Reverse-JST. Results show that all three models are able to give either better or comparable performance compared to the existing unsupervised sentiment classification approaches. In addition, both JST and Reverse-JST are able to extract sentiment-oriented topics. Furthermore, the fact that Reverse-JST always performs worse than JST suggests that the JST model is more appropriate for joint sentiment topic detection.
The rest of the paper is organized as follows.
Section 2 presents related work. Section 3 describes
the LSM, JST and Reverse-JST models.
The experimental setup and the results on the MR and
MDS datasets are discussed in Sections 4 and 5
respectively. Finally, Section 6 concludes the paper
and outlines the future work.
2 Related Work
As opposed to the work (Pang et al., 2002; Pang
and Lee, 2004; Whitelaw et al., 2005; Kennedy
and Inkpen, 2006) that only focused on senti-
ment classification in one particular domain, re-
cent research attempts have been made to address
the problem of sentiment classification across do-
mains. Aue and Gamon (2005) explored vari-
ous strategies for customizing sentiment classifiers
to new domains, where the training is based on
a small number of labelled examples and large
amounts of unlabelled in-domain data. However,
their experiments achieved only limited success,
with most of the classification accuracy below
80%. In the same vein, some more recent work focused on domain adaptation for sentiment classifiers. Blitzer et al. (2007) used the structural correspondence learning (SCL) algorithm with mutual information. Li and Zong (2008) combined multiple single classifiers trained on individual domains using SVMs. However, the adaptation performance in (Blitzer et al., 2007) depends on the selection of pivot features that are used to link the source and target domains, whereas the approach of Li and Zong (2008) relies heavily on labelled data from all the domains to train the integrated classifier and thus lacks the flexibility to adapt the trained classifier to domains where label information is not available.
Recent years have also seen increasing interest
in modelling both sentiment and topics simultane-
ously. The topic-sentiment mixture (TSM) model
(Mei et al., 2007) can jointly model sentiment and
topics by constructing an extra background com-
ponent and two additional sentiment subtopics on
top of the probabilistic latent semantic indexing
(pLSI) (Hofmann, 1999). However, TSM may
suffer from the problem of overfitting the data
which is known as a deficiency of pLSI, and post-
processing is also required in order to calculate
the sentiment prediction for a document. The
multi-aspect sentiment (MAS) model (Titov and
McDonald, 2008a), which is extended from the
multi-grain latent Dirichlet allocation (MG-LDA)
model (Titov and McDonald, 2008b), allows sen-
timent text aggregation for sentiment summary of
each rating aspect extracted from MG-LDA. One
drawback of MAS is that it requires that every as-
pect is rated at least in some documents, which
is practically infeasible. More recently, Dasgupta
and Ng (2009) proposed an unsupervised sentiment
classification algorithm where user feedback
is provided on the spectral clustering process
in an interactive manner to ensure that texts are
clustered along the sentiment dimension. Features
induced for each dimension of spectral cluster-
ing can be considered as sentiment-oriented top-
ics. Nevertheless, human judgement of identify-
ing the most important dimensions during spectral
clustering is required.
Among various efforts for improving senti-
ment detection accuracy, one direction is to in-
corporate prior information or subjectivity lexi-
con (i.e., words bearing positive or negative sen-
timent) into the sentiment model. Such sen-
timent lexicons can be acquired from domain-
independent sources in many different ways, from
manually built appraisal groups (Whitelaw et
al., 2005), to semi-automatically (Abbasi et al.,
2008) and fully automatically (Kaji and Kitsure-
gawa, 2006) constructed lexicons. When incor-
porating lexical knowledge as prior information
into a sentiment-topic model, Andreevskaia and
Bergler (2008) integrated the lexicon-based and
corpus-based approaches for sentence-level sen-
timent annotation across different domains; Li et
al. (2009) employed lexical prior knowledge for
semi-supervised sentiment classification based on
non-negative matrix tri-factorization, where the
domain-independent prior knowledge was incor-
porated in conjunction with domain-dependent un-
labelled data and a few labelled documents. How-
ever, this approach performed worse than the JST
model on the movie review data even with 40%
labelled documents as will be shown in Section 5.
3 Latent Sentiment-Topic Models
This section describes three closely related
Bayesian models for unsupervised sentiment clas-
sification, the latent sentiment model (LSM), the
joint sentiment-topic (JST) model, and the joint
topic sentiment model by reversing the generative
process of sentiment and topics in the JST model,
called Reverse-JST.
3.1 Latent Sentiment Model (LSM)
The LSM model, as shown in Figure 1(a), can be
treated as a special case of LDA where a mixture
of only three sentiment labels is modelled, i.e.
positive, negative and neutral.
Assume that we have a total of $S$ sentiment labels³; a corpus with a collection of $D$ documents is denoted by $C = \{d_1, d_2, \ldots, d_D\}$; each document in the corpus is a sequence of $N_d$ words denoted by $d = (w_1, w_2, \ldots, w_{N_d})$, and each word in the document is an item from a vocabulary index with $V$ distinct terms denoted by $\{1, 2, \ldots, V\}$. The procedure of generating a word in LSM starts by first choosing a distribution over the three sentiment labels for a document. Following that, one picks a sentiment label from the sentiment label distribution and finally draws a word according to the sentiment label-word distribution.
The joint probability of words and sentiment label assignments in LSM can be factored into two terms:
$$P(\mathbf{w}, \mathbf{l}) = P(\mathbf{w} \mid \mathbf{l}) \, P(\mathbf{l} \mid d). \tag{1}$$
Letting the superscript $-t$ denote a quantity that excludes data from the $t$th position, the conditional posterior for $l_t$, obtained by marginalizing out the random variables $\varphi$ and $\pi$, is
$$P(l_t = k \mid \mathbf{w}, \mathbf{l}^{-t}, \beta, \gamma) \propto \frac{N^{-t}_{w_t,k} + \beta}{N^{-t}_{k} + V\beta} \cdot \frac{N^{-t}_{k,d} + \gamma_k}{N^{-t}_{d} + \sum_k \gamma_k}, \tag{2}$$
where $N_{w_t,k}$ is the number of times word $w_t$ has been associated with sentiment label $k$; $N_k$ is the number of times words in the corpus have been assigned to sentiment label $k$; $N_{k,d}$ is the number of times sentiment label $k$ has been assigned to some word tokens in document $d$; and $N_d$ is the total number of words in document $d$.
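As a rough illustration of how the conditional posterior above drives Gibbs sampling, the sketch below performs one sweep over all tokens in plain Python. The function name `lsm_gibbs_sweep`, the layout of the count tables and the toy corpus are assumptions made for illustration only and are not the authors' implementation. The document-level denominator is omitted because it is the same for every label k and cancels on normalisation.

```python
import random


def lsm_gibbs_sweep(docs, labels, V, S, beta, gamma, rng):
    """One collapsed Gibbs sweep for an LSM-style model (illustrative sketch).

    docs:   list of documents, each a list of word ids in range(V)
    labels: current sentiment label of every token, same shape as docs
    gamma:  list of S prior values, one per sentiment label
    """
    # Build the count tables from the current label assignment.
    n_wk = [[0] * S for _ in range(V)]   # N_{w,k}: word w under label k
    n_k = [0] * S                        # N_k: tokens under label k
    n_kd = [[0] * S for _ in docs]       # N_{k,d}: label k in document d
    for d, doc in enumerate(docs):
        for w, k in zip(doc, labels[d]):
            n_wk[w][k] += 1
            n_k[k] += 1
            n_kd[d][k] += 1

    for d, doc in enumerate(docs):
        for t, w in enumerate(doc):
            k_old = labels[d][t]
            # Remove token t from the counts (the "-t" superscript).
            n_wk[w][k_old] -= 1
            n_k[k_old] -= 1
            n_kd[d][k_old] -= 1
            # Unnormalised conditional for each candidate label k.
            weights = [
                (n_wk[w][k] + beta) / (n_k[k] + V * beta)
                * (n_kd[d][k] + gamma[k])
                for k in range(S)
            ]
            k_new = rng.choices(range(S), weights=weights)[0]
            # Add the token back under its newly sampled label.
            labels[d][t] = k_new
            n_wk[w][k_new] += 1
            n_k[k_new] += 1
            n_kd[d][k_new] += 1
    return labels


# Toy corpus: two documents over a five-word vocabulary (made-up data).
rng = random.Random(0)
docs = [[0, 1, 2, 1], [3, 4, 3, 0]]
labels = [[rng.randrange(3) for _ in doc] for doc in docs]
labels = lsm_gibbs_sweep(docs, labels, V=5, S=3, beta=0.01,
                         gamma=[0.3, 0.3, 0.3], rng=rng)
```

In practice the count tables would be kept between sweeps rather than rebuilt each time; they are rebuilt here only to keep the sketch self-contained.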
Gibbs sampling is used to estimate the poste-
rior distribution of LSM, as well as the JST and
Reverse-JST models that will be discussed in the
following two sections.
3.2 Joint Sentiment-Topic Model (JST)
Figure 1: (a) LSM model; (b) JST model; (c) Reverse-JST model.
³ For all the three models, i.e., LSM, JST and Reverse-JST, we set the sentiment label number $S$ to 3 representing the positive, negative and neutral polarities, respectively.
In contrast to LSM that only models document sentiment, the JST model (Lin and He, 2009) can detect sentiment and topic simultaneously, by modelling each document with $S$ (number of sentiment labels) topic-document distributions. It should be noted that when the topic number is set to 1, JST effectively becomes the LSM model with only three topics corresponding to each of the three sentiment labels. Let $T$ be the total number of topics; the procedure of generating a word $w_i$ according to the graphical model shown in Figure 1(b) is:
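• For each document $d$, choose a distribution $\pi_d \sim \mathrm{Dir}(\gamma)$.
• For each sentiment label $l$ of document $d$, choose a distribution $\theta_{d,l} \sim \mathrm{Dir}(\alpha)$.
• For each word $w_i$ in document $d$:
  – choose a sentiment label $l_i \sim \mathrm{Multinomial}(\pi_d)$,
  – choose a topic $z_i \sim \mathrm{Multinomial}(\theta_{d,l_i})$,
  – choose a word $w_i$ from $\varphi^{l_i}_{z_i}$, a multinomial distribution over words conditioned on topic $z_i$ and sentiment label $l_i$.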
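The generative procedure for JST can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the document-level sentiment distribution is drawn from a symmetric Dir(γ) and the word distributions from a symmetric Dir(β), following the usual LDA-style convention; the helper names `dirichlet` and `generate_jst_document` and all parameter values are hypothetical.

```python
import random


def dirichlet(concentration, size, rng):
    """Symmetric Dirichlet draw via normalised Gamma variates.

    Very small concentrations can underflow to zero; the toy values
    used below avoid that.
    """
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(size)]
    total = sum(draws)
    return [x / total for x in draws]


def generate_jst_document(n_words, S, T, V, alpha, beta, gamma, rng):
    """Sample one document as (word, topic, sentiment) triples."""
    # Word distributions for every (sentiment, topic) pair. In the model
    # these are shared by the whole corpus; they are drawn here only to
    # keep the sketch self-contained.
    phi = [[dirichlet(beta, V, rng) for _ in range(T)] for _ in range(S)]
    # Document-level sentiment distribution pi_d.
    pi_d = dirichlet(gamma, S, rng)
    # One topic distribution theta_{d,l} per sentiment label.
    theta_d = [dirichlet(alpha, T, rng) for _ in range(S)]

    tokens = []
    for _ in range(n_words):
        label = rng.choices(range(S), weights=pi_d)[0]
        topic = rng.choices(range(T), weights=theta_d[label])[0]
        word = rng.choices(range(V), weights=phi[label][topic])[0]
        tokens.append((word, topic, label))
    return tokens


rng = random.Random(1)
doc = generate_jst_document(n_words=20, S=3, T=4, V=50,
                            alpha=1.0, beta=1.0, gamma=1.0, rng=rng)
```

The key design point is the order of the draws: the sentiment label is chosen first and then selects which topic distribution is used, which is exactly the dependence that Reverse-JST inverts.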
In JST, the joint probability of words and topic-
sentiment label assignments can be factored into
three terms:
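$$P(\mathbf{w}, \mathbf{z}, \mathbf{l}) = P(\mathbf{w} \mid \mathbf{z}, \mathbf{l}) \, P(\mathbf{z} \mid \mathbf{l}, d) \, P(\mathbf{l} \mid d). \tag{3}$$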
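The conditional posterior for $z_t$ and $l_t$ can be obtained by marginalizing out the random variables $\varphi$, $\theta$, and $\pi$:
$$P(z_t = j, l_t = k \mid \mathbf{w}, \mathbf{z}^{-t}, \mathbf{l}^{-t}, \alpha, \beta, \gamma) \propto \frac{N^{-t}_{w_t,j,k} + \beta}{N^{-t}_{j,k} + V\beta} \cdot \frac{N^{-t}_{k,j,d} + \alpha}{N^{-t}_{k,d} + T\alpha} \cdot \frac{N^{-t}_{k,d} + \gamma_k}{N^{-t}_{d} + \sum_k \gamma_k}, \tag{4}$$
where $N_{w_t,j,k}$ is the number of times word $w_t$ appeared in topic $j$ with sentiment label $k$; $N_{j,k}$ is the number of times words have been assigned to topic $j$ and sentiment label $k$; $N_{k,j,d}$ is the number of times a word from document $d$ has been associated with topic $j$ and sentiment label $k$; $N_{k,d}$ is the number of times sentiment label $k$ has been assigned to some word tokens in document $d$; and $N_d$ is the total number of words in document $d$.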
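To make the three factors of the JST posterior concrete, the following sketch evaluates the unnormalised conditional for every (topic, sentiment) pair from a set of count tables. The function name `jst_conditional`, the dictionary layout of `counts` and the toy hyperparameters are assumptions for illustration; the counts are taken to already exclude the current token, and the final denominator is dropped because it does not depend on j or k.

```python
def jst_conditional(w, d, counts, T, S, V, alpha, beta, gamma):
    """Unnormalised P(z_t=j, l_t=k | ...) as a T x S table (sketch).

    counts["wjk"][w][j][k]: word w under topic j and sentiment k
    counts["jk"][j][k]:     tokens under topic j and sentiment k
    counts["kjd"][d][k][j]: tokens of document d under sentiment k, topic j
    counts["kd"][d][k]:     tokens of document d under sentiment k
    """
    table = []
    for j in range(T):
        row = []
        for k in range(S):
            word_term = ((counts["wjk"][w][j][k] + beta)
                         / (counts["jk"][j][k] + V * beta))
            topic_term = ((counts["kjd"][d][k][j] + alpha)
                          / (counts["kd"][d][k] + T * alpha))
            # Shared denominator omitted: it is constant in j and k.
            label_term = counts["kd"][d][k] + gamma[k]
            row.append(word_term * topic_term * label_term)
        table.append(row)
    return table


# With all counts at zero the result is driven purely by the priors.
T, S, V = 2, 3, 4
counts = {
    "wjk": [[[0] * S for _ in range(T)] for _ in range(V)],
    "jk": [[0] * S for _ in range(T)],
    "kjd": [[[0] * T for _ in range(S)]],   # a single document
    "kd": [[0] * S],
}
table = jst_conditional(w=0, d=0, counts=counts, T=T, S=S, V=V,
                        alpha=0.5, beta=0.01, gamma=[0.1, 0.2, 0.3])
total = sum(sum(row) for row in table)
probs = [[value / total for value in row] for row in table]
```

With empty counts each pair's probability is proportional to its sentiment prior, which is one way to see how an asymmetric γ biases the sampler before any data is observed.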
3.3 Reverse Joint Sentiment-Topic Model (Reverse-JST)
We also studied a variant of the JST model,
called Reverse-JST. As opposed to JST in which
topic generation is conditioned on sentiment la-
bels, sentiment label generation in Reverse-JST is
dependent on topics. As shown in Figure 1(c),
Reverse-JST is effectively a four-layer hierarchi-
cal Bayesian model, where topics are associated
with documents, under which sentiment labels are
associated with topics and words are associated
with both topics and sentiment labels.
The procedure of generating a word $w_i$ in Reverse-JST is shown below:
• For each document $d$, choose a distribution $\theta_d \sim \mathrm{Dir}(\alpha)$.
• For each topic $z$ of document $d$, choose a distribution $\pi_{d,z} \sim \mathrm{Dir}(\gamma)$.
• For each word $w_i$ in document $d$:
  – choose a topic $z_i \sim \mathrm{Multinomial}(\theta_d)$,
  – choose a sentiment label $l_i \sim \mathrm{Multinomial}(\pi_{d,z_i})$,
  – choose a word $w_i$ from $\varphi^{l_i}_{z_i}$, a multinomial distribution over words conditioned on the topic $z_i$ and sentiment label $l_i$.
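Analogously to JST, in Reverse-JST the joint probability of words and the topic-sentiment label assignments can be factored into the following three terms:
$$P(\mathbf{w}, \mathbf{l}, \mathbf{z}) = P(\mathbf{w} \mid \mathbf{l}, \mathbf{z}) \, P(\mathbf{l} \mid \mathbf{z}, d) \, P(\mathbf{z} \mid d), \tag{5}$$
and the conditional posterior for $z_t$ and $l_t$ can be derived by integrating out the random variables $\varphi$, $\theta$, and $\pi$, yielding
$$P(z_t = j, l_t = k \mid \mathbf{w}, \mathbf{z}^{-t}, \mathbf{l}^{-t}, \alpha, \beta, \gamma) \propto \frac{N^{-t}_{w_t,j,k} + \beta}{N^{-t}_{j,k} + V\beta} \cdot \frac{N^{-t}_{k,j,d} + \gamma_k}{N^{-t}_{j,d} + \sum_k \gamma_k} \cdot \frac{N^{-t}_{j,d} + \alpha}{N^{-t}_{d} + T\alpha}. \tag{6}$$
It is noted that most of the terms in the Reverse-JST posterior are identical to those in the JST posterior in Equation 4, except that $N_{j,d}$ is the number of times topic $j$ has been assigned to some word tokens in document $d$.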
As we do not have a direct sentiment label-document distribution in Reverse-JST, the distribution over sentiment labels for a document, $P(l \mid d)$, is calculated as $P(l \mid d) = \sum_z P(l \mid z, d) \, P(z \mid d)$. For all three models, the probability $P(l \mid d)$ is used to determine document sentiment polarity. A document $d$ is classified as a positive-sentiment document if its probability of the positive sentiment label, $P(l_{pos} \mid d)$, is greater than its probability of the negative sentiment label, $P(l_{neg} \mid d)$, and vice versa.
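The marginalisation over topics and the decision rule for Reverse-JST can be written down directly. The sketch below is illustrative only: the function name `document_sentiment`, the index convention (0 = positive, 1 = negative, 2 = neutral) and the toy probabilities are assumptions, and ties are resolved as negative because the text does not specify tie handling.

```python
def document_sentiment(p_l_given_zd, p_z_given_d, pos=0, neg=1):
    """Compute P(l|d) = sum_z P(l|z,d) P(z|d) and classify the document.

    p_l_given_zd[z][l]: sentiment distribution of topic z in the document
    p_z_given_d[z]:     topic distribution of the document
    """
    n_labels = len(p_l_given_zd[0])
    p_l = [
        sum(p_z * p_l_given_zd[z][l] for z, p_z in enumerate(p_z_given_d))
        for l in range(n_labels)
    ]
    # Assumed tie rule: anything not strictly positive counts as negative.
    polarity = "positive" if p_l[pos] > p_l[neg] else "negative"
    return p_l, polarity


# Two topics, three sentiment labels (made-up probabilities).
p_z_given_d = [0.5, 0.5]
p_l_given_zd = [
    [0.7, 0.2, 0.1],
    [0.2, 0.5, 0.3],
]
p_l, polarity = document_sentiment(p_l_given_zd, p_z_given_d)
```

For LSM and JST the same comparison applies, except that P(l|d) is available directly from the document-level sentiment distribution and no marginalisation is needed.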
4 Experimental Setup
4.1 Dataset Description
Two publicly available datasets, the MR and MDS
datasets, were used in our experiments. The MR
dataset (also known as the polarity dataset) has
become a benchmark for many studies since the
work of Pang et al. (2002). The version 2.0 used in
our experiment consists of 1000 positive and 1000
negative movie reviews drawn from the IMDB
movie archive, with an average of 30 sentences in
each document. We also experimented with an-
other dataset, namely subjective MR, by removing
the sentences that do not bear opinion information
from the MR dataset, following the approach of
Pang and Lee (2004). The resulting dataset still
contains 2000 documents with a total of 334,336
words and 18,013 distinct terms, about half the
size of the original MR dataset without perform-
ing subjectivity detection.
First used by Blitzer et al. (2007), the MDS dataset contains 4 different types of product reviews, covering books, DVDs, electronics and kitchen appliances, with 1000 positive and 1000 negative examples for each domain⁴.
⁴ We did not perform subjectivity detection on the MDS dataset since its average document length is much shorter than that of the MR dataset, with some documents even having one sentence only.
Preprocessing was performed on both of the datasets. Firstly, punctuation, numbers, non-alphabet characters and stop words were removed. Secondly, standard stemming was performed in order to reduce the vocabulary size and address the issue of data sparseness. Summary statistics of the datasets before and after preprocessing are shown in Table 1.
4.2 Defining Model Priors
In the experiments, two subjectivity lexicons, namely the MPQA and the appraisal lexicon, were combined and incorporated as prior information into the model learning. These two lexicons contain lexical words whose polarity orientation has been fully specified. We extracted the words with strong positive and negative orientation and performed stemming in the preprocessing. In addition, words whose polarity changed after stemming were removed automatically, resulting in 1584 positive and 2612 negative words, respectively. It is worth noting that the lexicons used here are fully domain-independent and do not bear any supervised information specific to the MR, subjMR and MDS datasets. Finally, the prior information was produced by retaining all words in the MPQA and appraisal lexicons that occurred in the experimental datasets. The prior information statistics for each dataset are listed in the last row of Table 1.
In contrast to Lin and He (2009), who only utilized prior information during the initialization of the posterior distributions, we use the prior information in the Gibbs sampling inference step and argue that this is a more appropriate experimental setting. For the Gibbs sampling step of JST and Reverse-JST, if the currently observed word token matches a word in the sentiment lexicon, the corresponding sentiment label is assigned and only a new topic is sampled. Otherwise, a new sentiment-topic pair is sampled for that word token. For LSM, if the current word token matches a word in the sentiment lexicon, the corresponding sentiment label is assigned and the Gibbs sampling procedure is skipped for that token. Otherwise, a new sentiment label is sampled.
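The lexicon-constrained sampling rule described for JST and Reverse-JST can be sketched as follows. This is a hedged illustration rather than the authors' code: the function name `sample_assignment`, the representation of the lexicon as a dictionary from stemmed word to sentiment index, and the toy weights are all assumptions.

```python
import random


def sample_assignment(word, lexicon, weights, rng):
    """Sample a (topic, sentiment) pair for one token (sketch).

    weights[j][k] is the unnormalised conditional for topic j and
    sentiment label k. If the word is in the lexicon, its sentiment
    label is fixed and only the topic is sampled; otherwise the pair
    is sampled jointly.
    """
    n_topics, n_labels = len(weights), len(weights[0])
    if word in lexicon:
        k = lexicon[word]
        topic_weights = [weights[j][k] for j in range(n_topics)]
        j = rng.choices(range(n_topics), weights=topic_weights)[0]
        return j, k
    pairs = [(j, k) for j in range(n_topics) for k in range(n_labels)]
    pair_weights = [weights[j][k] for j, k in pairs]
    return rng.choices(pairs, weights=pair_weights)[0]


rng = random.Random(2)
lexicon = {"good": 0, "poorli": 1}      # 0 = positive, 1 = negative (assumed)
weights = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]   # 2 topics x 3 labels
fixed = [sample_assignment("good", lexicon, weights, rng) for _ in range(50)]
free = sample_assignment("plot", lexicon, weights, rng)
```

For LSM the same idea reduces to returning the lexicon label directly and skipping the sampling step for matched tokens.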
Table 1: Dataset and sentiment lexicon statistics, before and after preprocessing.
# of words                           MR          subjMR      MDS Book    MDS DVD     MDS Electronic  MDS Kitchen
Corpus size (before preprocessing)   1,331,252   812,250     352,020     341,234     221,331         186,122
Corpus size (after preprocessing)    627,317     334,336     157,441     153,422     95,441          79,654
Vocabulary (before preprocessing)    38,906      34,559      22,028      21,424      10,669          9,525
Vocabulary (after preprocessing)     25,166      18,013      14,459      14,806      7,063           6,252
# of lexicon (pos./neg.)             1248/1877   1150/1667   1000/1352   979/1307    574/552         582/504
Table 2: LSM sentiment classification results, accuracy (%).
                                       MR     subjMR  Book   DVD    Electronic  Kitchen  MDS overall
LSM (without prior info.)              61.7   57.9    51.6   53.5   58.4        56.8     55.1
LSM (with prior info.)                 74.1   76.1    64.2   66.3   72.5        74.1     69.3
Dasgupta and Ng (2009)                 70.9   N/A     69.5   70.8   65.8        69.7     68.9
Li et al. (2009) with 10% doc. label   60     N/A     N/A    N/A    N/A         N/A      62
Li et al. (2009) with 40% doc. label   73.5   N/A     N/A    N/A    N/A         N/A      73
5 Experimental Results
5.1 LSM Sentiment Classification Results
In this section, we discuss the sentiment classifica-
tion results of LSM at document level by incorpo-
rating prior information extracted from the MPQA
and appraisal lexicons. The symmetric Dirichlet
prior $\beta$ was set to 0.01, and the asymmetric
Dirichlet sentiment prior $\gamma$ was set to 0.01 and 0.9 for
the positive and negative sentiment labels,
respectively. Classification accuracies were averaged
over 5 runs for each dataset with 2000 Gibbs
sampling iterations.
As can be observed from Table 2, the perfor-
mance of LSM is only mediocre for all the 6
datasets when no prior information was incorpo-
rated. A significant improvement, with an aver-
age of more than 13%, is observed after incor-
porating prior information, especially notable for
subjMR and kitchen with 18.2% and 17.3% im-
provement, respectively. It is also noted that LSM
with subjMR dataset achieved 2% improvement
over the original MR dataset, implying that the
subjMR dataset has better representation of sub-
jective information than the original dataset by fil-
tering out the objective contents. For the MDS
dataset, LSM achieved 72.5% and 74.1% accuracy
on the electronic and kitchen domains respectively,
which is much better than the book and
DVD domains with only around 65% accuracy.
Manually analysing the MDS dataset reveals that
the book and DVD reviews often contain a lot
of descriptions of book contents or movie plots,
which make the reviews from these two domains
difficult to classify; whereas in the electronic and
kitchen domain, comments on the product are of-
ten expressed in a straightforward manner.
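The improvement figures quoted above can be checked against Table 2 with a few lines of arithmetic. The sketch below simply subtracts the two LSM rows for the six individual datasets; the variable names are illustrative.

```python
# Accuracy (%) from Table 2 for the six individual datasets.
without_prior = {"MR": 61.7, "subjMR": 57.9, "Book": 51.6,
                 "DVD": 53.5, "Electronic": 58.4, "Kitchen": 56.8}
with_prior = {"MR": 74.1, "subjMR": 76.1, "Book": 64.2,
              "DVD": 66.3, "Electronic": 72.5, "Kitchen": 74.1}

# Per-dataset gain from incorporating prior information.
gains = {name: round(with_prior[name] - without_prior[name], 1)
         for name in without_prior}
mean_gain = sum(gains.values()) / len(gains)

for name, gain in gains.items():
    print(f"{name:<11} +{gain:.1f}")
print(f"mean gain   +{mean_gain:.1f}")
```

The per-dataset gains range from 12.4 (MR) to 18.2 (subjMR), and their mean is about 14.6 points, consistent with the stated average of more than 13%.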
When compared to the recently proposed un-
supervised approach based on a spectral cluster-
ing algorithm (Dasgupta and Ng, 2009), except
for the book and DVD domain, LSM achieved
better performance in all the other domains with
more than 5% overall improvement. Neverthe-
less, the approach proposed by Dasgupta and Ng
(2009) requires users to specify which dimensions
(defined by the eigenvectors in spectral cluster-
ing) are most closely related to sentiment by in-
specting a set of features derived from the re-
views for each dimension, and clustering is per-
formed again on the data to derive the final re-
sults. In all the Bayesian models studied here, no
human judgement is required. Another recently
proposed non-negative matrix tri-factorization ap-
proach (Li et al., 2009) also employed lexical prior
knowledge for semi-supervised sentiment classi-
fication. However, when incorporating 10% of
labelled documents for training, the non-negative
matrix tri-factorization approach performed much
worse than LSM, with only around 60% accu-
racy achieved for all the datasets. Even with 40%
labelled documents, it still performs worse than
LSM on the MR dataset and slightly outperforms
LSM on the MDS dataset. It is worth noting that
no labelled documents were used in the LSM re-
sults reported here.
Figure 2: JST and Reverse-JST sentiment classification results with multiple topics.
5.2 JST and Reverse-JST Results with Multiple Topics
As both JST and Reverse-JST model document-level sentiment and a mixture of topics simultaneously, it is worth exploring how the sentiment classification and topic extraction tasks affect/benefit each other. With this in mind, we conducted a set of experiments on both JST and Reverse-JST, with the topic number set to 30, 50 and 100. The symmetric Dirichlet priors $\alpha$ and $\beta$ were set to 50/T and 0.01 respectively for both models. The asymmetric sentiment prior $\gamma$ was empirically set to (0.01, 1.8) for JST and (0.01, 0.012) for Reverse-JST, corresponding to the positive and negative sentiment priors, respectively. Results were averaged over 5 runs with 2000 Gibbs sampling iterations.
As can be seen from Figure 2, for both models, the sentiment classification accuracy based on the subjMR dataset still outperformed the results based on the original MR dataset, where an overall improvement of 3% is observed for JST and about 2% for Reverse-JST. When comparing JST and Reverse-JST, it can be observed that Reverse-JST performed slightly worse than JST for all sets of experiments, with about a 1% to 2% drop in accuracy. By closely examining the posteriors of JST and Reverse-JST (cf. Equations 4 and 6), we noticed that the count $N_{j,d}$ (number of times topic $j$ is associated with some word tokens in document $d$) in the Reverse-JST posterior would be relatively small because of the large number of topics. On the contrary, the count $N_{k,d}$ (number of times sentiment label $k$ is assigned to some word tokens in document $d$) in the JST posterior would be relatively large as $k$ is only defined over 3 different sentiment labels. This essentially makes JST less sensitive to the data sparseness problem and to perturbations of the hyperparameter setting. In addition, JST encodes an assumption that there is approximately a single sentiment for the entire document, i.e. documents are usually either mostly positive or mostly negative. This assumption is important as it allows the model to cluster different terms which share similar sentiment. In Reverse-JST, this assumption is not enforced unless only one topic for each sentiment is defined. Therefore, JST appears to be a more appropriate model design for joint sentiment topic detection.
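A back-of-the-envelope calculation gives a feel for the sparseness argument. The sketch below takes the MR corpus size after preprocessing from Table 1 and the 2000 MR documents, and assumes, purely for illustration, that a document's tokens are spread evenly over sentiment labels or over topics. Real assignments are far from uniform, so the numbers indicate orders of magnitude only.

```python
tokens_after_preprocessing = 627_317   # MR corpus size after preprocessing (Table 1)
n_documents = 2000                     # 1000 positive + 1000 negative reviews
S = 3                                  # sentiment labels

avg_tokens_per_doc = tokens_after_preprocessing / n_documents

# Rough size of N_{k,d} (JST) under an even split over sentiment labels.
per_label = avg_tokens_per_doc / S

# Rough size of N_{j,d} (Reverse-JST) under an even split over T topics.
per_topic = {T: avg_tokens_per_doc / T for T in (30, 50, 100)}

print(f"average tokens per document: {avg_tokens_per_doc:.1f}")
print(f"tokens per sentiment label:  {per_label:.1f}")
for T, value in per_topic.items():
    print(f"tokens per topic (T={T}):   {value:.1f}")
```

Under this simplification a document contributes roughly 314 tokens, which is about 105 per sentiment label but only about 3 per topic when T = 100, so the document-level counts conditioned on in Reverse-JST rest on far less evidence.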
In addition, it is observed that the sentiment
classification accuracy of both JST and Reverse-
JST drops slightly when the topic number in-
creases from 30 to 100, with the changes of 2%
(MR) and 1.5% (subjMR and MDS overall result) being observed for both models. This is likely due to the fact that when the topic number increases, the probability mass allocated to each sentiment-topic pair becomes smaller, which essentially creates a data sparseness problem. When comparing with LSM, we notice that the difference in sentiment classification accuracy is only marginal when additionally modelling a mixture of topics. However, both JST and Reverse-JST are able to extract sentiment-oriented topics in addition to document-level sentiment detection.
Table 3: Topic examples extracted by JST under different sentiment labels.
Book                    DVD                     Electronic              Kitchen
pos.        neg.        pos.        neg.        pos.        neg.        pos.        neg.
recip       war         action      murder      mous        drive       color       fan
food        militari    good        killer      hand        fail        beauti      room
cook        armi        fight       crime       logitech    data        plate       cool
cookbook    soldier     right       cop         comfort     complet     durabl      air
beauti      govern      scene       crime       scroll      manufactur  qualiti     loud
simpl       thing       chase       case        wheel       failur      fiestawar   nois
eat         evid        hit         prison      smooth      lose        blue        live
famili      led         art         detect      feel        backup      finger      annoi
ic          iraq        martial     investig    accur       poorli      white       blow
kitchen     polici      stunt       mysteri     track       error       dinnerwar   vornado
varieti     destruct    chan        commit      touch       storag      bright      bedroom
good        critic      brilliant   thriller    click       gb          purpl       inferior
pictur      inspect     hero        attornei    conveni     flash       scarlet     window
tast        invas       style       suspect     month       disast      dark        vibrat
cream       court       chines      shock       mice        recogn      eleg        power
5.3 Topic Extraction
We also evaluated how effectively topic sentiment
is captured. In contrast to LDA, in which a
word is drawn from the topic-word distribution,
in JST or Reverse-JST, a word is drawn from the
distribution over words conditioned on both topic
and sentiment label. As an illustration, Table 3
shows eight topic examples extracted from the
MDS dataset by JST, where each topic was drawn
from a particular product domain under positive or
negative sentiment label.
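Topic listings of this kind are typically produced by ranking the vocabulary under each estimated word distribution. The sketch below shows that ranking step for a single sentiment-topic pair; the function name `top_words` and the toy probabilities are made up for illustration and are not values from the experiments.

```python
def top_words(word_probs, vocabulary, n=3):
    """Return the n most probable words of one sentiment-topic pair."""
    ranked = sorted(range(len(word_probs)),
                    key=lambda w: word_probs[w], reverse=True)
    return [vocabulary[w] for w in ranked[:n]]


# Toy vocabulary of stemmed words with an invented distribution.
vocabulary = ["recip", "food", "war", "cook", "armi"]
word_probs = [0.40, 0.30, 0.05, 0.20, 0.05]
best = top_words(word_probs, vocabulary)
```

Here `best` holds the three highest-probability stems in descending order, which is the form in which each column of the topic table is read.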
As can be seen from Table 3, the eight extracted
topics are quite informative and coherent, and each
of the topics represents a certain product review
from the corresponding domain. For example,
the positive book topic probably discusses a good
cookbook; the positive DVD topic is apparently
about a popular action movie by Jackie Chan; the
negative electronic topic is likely to consist of
complaints regarding data loss due to flash drive failure;
and the negative kitchen topic probably expresses
dissatisfaction with the high noise level of the Vornado
brand fan. In terms of topic sentiment, by examining
the topics in the table, it is evident that
topics under the positive and negative sentiment
label indeed bear positive and negative sentiment
respectively. The above analysis reveals the effec-
tiveness of JST in extracting topics and capturing
topic sentiment from text.
6 Conclusions and Future Work
In this paper, we studied three closely related Bayesian models for unsupervised sentiment detection, namely LSM, JST and Reverse-JST. As opposed to most of the existing approaches to sentiment classification, which favour supervised learning, these three models detect sentiment in a fully unsupervised manner. While all three models give either better or comparable performance compared to the existing approaches to unsupervised sentiment classification on the MR and MDS datasets, JST and Reverse-JST can also model a mixture of topics and the sentiment associated with each topic. Moreover, extensive experiments conducted on datasets from different domains reveal that JST always outperformed Reverse-JST, suggesting that JST is a more appropriate model design for joint sentiment topic detection.
There are several directions we plan to investigate in the future. One is incremental learning of the JST parameters when faced with new data. Another is semi-supervised learning of the JST model, with some supervised information being incorporated into the model parameter estimation procedure, such as known topic knowledge for certain product reviews or document labels derived automatically from user-supplied review ratings.
References
Ahmed Abbasi, Hsinchun Chen, and Arab Salem.
2008. Sentiment analysis in multiple languages:
Feature selection for opinion classification in web
forums. ACM Trans. Inf. Syst., 26(3):1–34.
Alina Andreevskaia and Sabine Bergler. 2008. When
specialists and generalists work together: Overcom-
ing domain dependence in sentiment tagging. In
Proceedings of (ACL-HLT), pages 290–298.
A. Aue and M. Gamon. 2005. Customizing sentiment
classifiers to new domains: a case study. In Pro-
ceedings of Recent Advances in Natural Language
Processing (RANLP).
John Blitzer, Mark Dredze, and Fernando Pereira.
2007. Biographies, bollywood, boom-boxes and
blenders: Domain adaptation for sentiment classi-
fication. In Proceedings of the Association for Com-
putational Linguistics (ACL), pages 440–447.
S. Dasgupta and V. Ng. 2009. Topic-wise, Sentiment-
wise, or Otherwise? Identifying the Hidden Dimen-
sion for Unsupervised Text Classification. In Pro-
ceedings of the 2009 Conference on Empirical Meth-
ods in Natural Language Processing, pages 580–
Thomas Hofmann. 1999. Probabilistic latent semantic
indexing. In Proceedings of the ACM Special Inter-
est Group on Information Retrieval (SIGIR), pages
Nobuhiro Kaji and Masaru Kitsuregawa. 2006. Automatic
construction of polarity-tagged corpus from
HTML documents. In Proceedings of the COLING/ACL
on Main conference poster sessions, pages
A. Kennedy and D. Inkpen. 2006. Sentiment classi-
fication of movie reviews using contextual valence
shifters. Computational Intelligence, 22(2):110–
Shoushan Li and Chengqing Zong. 2008. Multi-
domain sentiment classification. In Proceedings of
the Association for Computational Linguistics and
the Human Language Technology Conference (ACL-
HLT), Short Papers, pages 257–260.
Tao Li, Yi Zhang, and Vikas Sindhwani. 2009. A non-
negative matrix tri-factorization approach to senti-
ment classification with lexical prior knowledge. In
Proceedings of (ACL-IJCNLP), pages 244–252.
Chenghua Lin and Yulan He. 2009. Joint senti-
ment/topic model for sentiment analysis. In Pro-
ceedings of the ACM international conference on In-
formation and knowledge management (CIKM).
Ryan McDonald, Kerry Hannan, Tyler Neylon, Mike
Wells, and Jeff Reynar. 2007. Structured models for
fine-to-coarse sentiment analysis. In Proceedings of
the Annual Meeting of the Association of Computa-
tional Linguistics (ACL), pages 432–439.
Qiaozhu Mei, Xu Ling, Matthew Wondra, Hang Su,
and ChengXiang Zhai. 2007. Topic sentiment mix-
ture: modeling facets and opinions in weblogs. In
Proceedings of the conference on World Wide Web
(WWW), pages 171–180.
Bo Pang and Lillian Lee. 2004. A sentimental ed-
ucation: sentiment analysis using subjectivity sum-
marization based on minimum cuts. In Proceedings
of the Annual Meeting on Association for Computa-
tional Linguistics (ACL), page 271.
Bo Pang and Lillian Lee. 2008. Opinion mining and
sentiment analysis. Found. Trends Inf. Retr., 2(1-
Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan.
2002. Thumbs up?: sentiment classification using
machine learning techniques. In Proceedings of the
Conference on Empirical Methods in Natural Lan-
guage Processing (EMNLP), pages 79–86.
Ivan Titov and Ryan McDonald. 2008a. A joint model
of text and aspect ratings for sentiment summarization.
In Proceedings of the Annual Meeting on Association
for Computational Linguistics and the Human
Language Technology Conference (ACL-HLT),
pages 308–316.
Ivan Titov and Ryan McDonald. 2008b. Modeling online
reviews with multi-grain topic models. In Proceedings
of the International Conference on World
Wide Web (WWW '08), pages 111–120.
Casey Whitelaw, Navendu Garg, and Shlomo Arga-
mon. 2005. Using appraisal groups for sentiment
analysis. In Proceedings of the ACM international
conference on Information and Knowledge Manage-
ment (CIKM), pages 625–631.