Automatically Extracting Polarity-Bearing Topics for Cross-Domain Sentiment Classification

Yulan He [a], Chenghua Lin [b], Harith Alani [a]
[a] Knowledge Media Institute, The Open University, Milton Keynes MK7 6AA, UK; {y.he,h.alani}@open.ac.uk
[b] School of Engineering, Computing and Mathematics, University of Exeter, Exeter EX4 4QF, UK; cl322@exeter.ac.uk
Abstract

The joint sentiment-topic (JST) model was previously proposed to detect sentiment and topic simultaneously from text. The only supervision required by JST model learning is domain-independent polarity word priors. In this paper, we modify the JST model by incorporating word polarity priors through modifying the topic-word Dirichlet priors. We study the polarity-bearing topics extracted by JST and show that by augmenting the original feature space with polarity-bearing topics, in-domain supervised classifiers learned from the augmented feature representation achieve state-of-the-art performance of 95% on the movie review data and an average of 90% on the multi-domain sentiment dataset. Furthermore, using feature augmentation and selection according to the information gain criterion for cross-domain sentiment classification, our proposed approach performs either better than or comparably to previous approaches. Moreover, our approach is much simpler and does not require difficult parameter tuning.
1 Introduction

Given a piece of text, sentiment classification aims to determine whether the semantic orientation of the text is positive, negative, or neutral. Machine learning approaches to this problem (Pang et al., 2002; Pang and Lee, 2004; Whitelaw et al., 2005; Kennedy and Inkpen, 2006; McDonald et al., 2007; Zhao et al., 2008) typically assume that classification models are trained and tested using data drawn from some fixed distribution. However, in many practical cases, we may have plentiful labeled examples in the source domain, but very few or no labeled examples in a target domain with a different distribution. For example, we may have many labeled book reviews, but we are interested in detecting the polarity of electronics reviews. Reviews for different products might have widely different vocabularies, so classifiers trained on one domain often fail to produce satisfactory results when shifting to another domain. This has motivated much research on sentiment transfer learning, which transfers knowledge from a source task or domain to a different but related task or domain (Aue and Gamon, 2005; Blitzer et al., 2007; Wu et al., 2009; Pan et al., 2010).
The joint sentiment-topic (JST) model (Lin and He, 2009; Lin et al., 2010) extends the latent Dirichlet allocation (LDA) model (Blei et al., 2003) to detect sentiment and topic simultaneously from text. The only supervision required by JST learning is domain-independent polarity word prior information. With prior polarity words extracted from both the MPQA subjectivity lexicon [1] and the appraisal lexicon [2], the JST model achieves a sentiment classification accuracy of 74% on the movie review data [3] and 71% on the multi-domain sentiment dataset [4]. Moreover, it is also able to extract coherent and informative topics grouped under different sentiments.

[1] http://www.cs.pitt.edu/mpqa/
[2] http://lingcog.iit.edu/arc/appraisal_lexicon_2007b.tar.gz
[3] http://www.cs.cornell.edu/people/pabo/movie-review-data
[4] http://www.cs.jhu.edu/~mdredze/datasets/sentiment/index2.html

The fact that the JST model does not require any labeled documents for training makes it desirable for domain adaptation in sentiment classification. Many existing approaches solve the sentiment transfer problem by associating words from different domains which indicate the same sentiment (Blitzer et al., 2007; Pan et al., 2010). Such an association mapping problem can be naturally solved by posterior inference in the JST model. Indeed, the polarity-bearing topics extracted by JST essentially capture sentiment associations among words from different domains, which effectively overcomes the data distribution difference between source and target domains.
The previously proposed JST model uses the sentiment prior information in the Gibbs sampling inference step: a sentiment label will only be sampled if the current word token has no prior sentiment as defined in a sentiment lexicon. This in fact implies a different generative process where many of the word prior sentiment labels are observed; the model is no longer "latent". We propose an alternative approach that incorporates word prior polarity information by modifying the topic-word Dirichlet priors. This essentially creates an informed prior distribution over the sentiment labels, allows the model to remain truly latent, and is consistent with the generative story.
We study the polarity-bearing topics extracted by the JST model and show that by augmenting the original feature space with polarity-bearing topics, the performance of in-domain supervised classifiers learned from the augmented feature representation improves substantially, reaching state-of-the-art results of 95% on the movie review data and an average of 90% on the multi-domain sentiment dataset. Furthermore, using simple feature augmentation, our proposed approach outperforms the structural correspondence learning (SCL) algorithm (Blitzer et al., 2007) and achieves comparable results to the recently proposed spectral feature alignment (SFA) method (Pan et al., 2010). Moreover, our approach is much simpler and does not require difficult parameter tuning.
We proceed with a review of related work on sentiment domain adaptation. We then briefly describe the JST model and present another approach to incorporating word prior polarity information into JST learning. We subsequently show that words from different domains can indeed be grouped under the same polarity-bearing topic through an illustration of example topic words extracted by JST, before proposing a domain adaptation approach based on JST. We verify our proposed approach by conducting experiments on both the movie review data and the multi-domain sentiment dataset. Finally, we conclude our work and outline future directions.
2 Related Work
There has been a significant amount of work on algorithms for domain adaptation in NLP. Earlier work treats the source domain data as "prior knowledge" and uses maximum a posteriori (MAP) estimation to learn a model for the target domain data under this prior distribution (Roark and Bacchiani, 2003). Chelba and Acero (2004) also use the source domain data to estimate a prior distribution, but in the context of a maximum entropy (ME) model. The ME model was later studied in (Daumé III and Marcu, 2006) for domain adaptation, where a mixture model is defined to learn differences between domains.
Other approaches rely on unlabeled data in the target domain to overcome feature distribution differences between domains. Motivated by the alternating structural optimization (ASO) algorithm (Ando and Zhang, 2005) for multi-task learning, Blitzer et al. (2007) proposed structural correspondence learning (SCL) for domain adaptation in sentiment classification. Given labeled data from a source domain and unlabeled data from a target domain, SCL selects a set of pivot features to link the source and target domains, where pivots are selected based on their common frequency in both domains and also their mutual information with the source labels.
There has also been research on careful structuring of features for domain adaptation. Daumé (2007) proposed a kernel-mapping function which maps data from both source and target domains to a high-dimensional feature space so that data points from the same domain are twice as similar as those from different domains. Dai et al. (2008) proposed translated learning, which uses a language model to link the class labels to the features in the source spaces, which in turn are translated to the features in the target spaces. Dai et al. (2009) further proposed using spectral learning theory to learn an eigen feature representation from a task graph representing features, instances, and class labels. In a similar vein, Pan et al. (2010) proposed the spectral feature alignment (SFA) algorithm, where some domain-independent words are used as a bridge to construct a bipartite graph modeling the co-occurrence relationship between domain-specific and domain-independent words. Feature clusters are generated by co-aligning domain-specific and domain-independent words.
A graph-based approach has also been studied in (Wu et al., 2009), where a graph is built with nodes denoting documents and edges denoting content similarity between documents. The sentiment score of each unlabeled document is recursively calculated until convergence from its neighbors, using the actual labels of source domain documents and the pseudo-labels of target domain documents. This approach was later extended by simultaneously considering relations between documents and words from both source and target domains (Wu et al., 2010).
More recently, Seah et al. (2010) addressed the case where the predictive distribution of the class label given the input data differs across domains, and proposed the Predictive Distribution Matching SVM, which learns a robust classifier in the target domain by leveraging labeled data from only the relevant regions of multiple sources.
3 Joint Sentiment-Topic (JST) Model
Assume that we have a corpus with a collection of $D$ documents denoted by $C = \{d_1, d_2, ..., d_D\}$; each document in the corpus is a sequence of $N_d$ words denoted by $d = (w_1, w_2, ..., w_{N_d})$, and each word in the document is an item from a vocabulary index with $V$ distinct terms denoted by $\{1, 2, ..., V\}$. Also, let $S$ be the number of distinct sentiment labels, and $T$ be the total number of topics. The generative process in JST, which corresponds to the graphical model shown in Figure 1(a), is as follows:

- For each document $d$, choose a distribution $\pi_d \sim \text{Dir}(\gamma)$.
- For each sentiment label $l$ under document $d$, choose a distribution $\theta_{d,l} \sim \text{Dir}(\alpha)$.
- For each word $w_i$ in document $d$: choose a sentiment label $l_i \sim \text{Mult}(\pi_d)$, choose a topic $z_i \sim \text{Mult}(\theta_{d,l_i})$, and choose a word $w_i$ from $\varphi_{l_i z_i}$, a multinomial distribution over words conditioned on topic $z_i$ and sentiment label $l_i$.
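To make the generative story concrete, the following Python sketch simulates it on a toy corpus. All dimensions, hyperparameter values, and variable names here are illustrative assumptions rather than settings taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (hypothetical values, not from the paper).
D, V, S, T, N_d = 4, 10, 3, 2, 20   # docs, vocab size, sentiment labels, topics, words per doc
gamma, alpha, beta = 1.0, 1.0, 0.01  # symmetric Dirichlet hyperparameters

# Per-(sentiment, topic) word distributions phi_{l,z} ~ Dir(beta).
phi = rng.dirichlet([beta] * V, size=(S, T))

corpus = []
for d in range(D):
    pi_d = rng.dirichlet([gamma] * S)             # sentiment proportions for document d
    theta_d = rng.dirichlet([alpha] * T, size=S)  # topic proportions, one row per sentiment
    doc = []
    for _ in range(N_d):
        l = rng.choice(S, p=pi_d)                 # sample a sentiment label
        z = rng.choice(T, p=theta_d[l])           # sample a topic given the sentiment
        w = rng.choice(V, p=phi[l, z])            # sample a word given (l, z)
        doc.append((w, l, z))
    corpus.append(doc)

print(corpus[0][:5])  # first few (word, sentiment, topic) triples of document 0
```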
Figure 1: JST model and its modified version: (a) the JST model; (b) the modified JST model.
Gibbs sampling was used to estimate the posterior distribution by sequentially sampling each variable of interest, here $z_t$ and $l_t$, from the distribution over that variable given the current values of all other variables and the data. Letting the superscript $\neg t$ denote a quantity that excludes data from the $t$-th position, the conditional posterior for $z_t$ and $l_t$, obtained by marginalizing out the random variables $\varphi$, $\theta$, and $\pi$, is

$$P(z_t = j, l_t = k \mid \mathbf{w}, \mathbf{z}^{\neg t}, \mathbf{l}^{\neg t}, \alpha, \beta, \gamma) \propto
\frac{N^{\neg t}_{w_t,j,k} + \beta}{N^{\neg t}_{j,k} + V\beta} \cdot
\frac{N^{\neg t}_{j,k,d} + \alpha_{j,k}}{N^{\neg t}_{k,d} + \sum_j \alpha_{j,k}} \cdot
\frac{N^{\neg t}_{k,d} + \gamma}{N^{\neg t}_{d} + S\gamma}, \quad (1)$$
where $N_{w_t,j,k}$ is the number of times word $w_t$ appeared in topic $j$ with sentiment label $k$, $N_{j,k}$ is the number of times words are assigned to topic $j$ and sentiment label $k$, $N_{j,k,d}$ is the number of times a word from document $d$ has been associated with topic $j$ and sentiment label $k$, $N_{k,d}$ is the number of times sentiment label $k$ has been assigned to some word token in document $d$, and $N_d$ is the total number of words in document $d$.
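A sampler built on Equation (1) can be sketched as follows. The count-tensor layout and field names are assumptions for illustration, and the counts are assumed to already exclude the token being resampled (a full sampler would decrement them before this step and increment them afterwards).

```python
import numpy as np

def sample_label_topic(w_t, d, counts, alpha, beta, gamma, rng):
    """One collapsed Gibbs step for token w_t in document d, following Eq. (1).

    `counts` holds the count tensors with the current token already removed
    (the "excluding position t" quantities). Field names are hypothetical.
    alpha is a (T x S) matrix of asymmetric priors alpha_{j,k}.
    """
    N_wjk, N_jk, N_jkd, N_kd, N_d = (counts["wjk"], counts["jk"],
                                     counts["jkd"], counts["kd"], counts["d"])
    V, T, S = N_wjk.shape  # word x topic x sentiment counts

    # Unnormalized joint probability over all (topic j, sentiment k) pairs.
    p = np.empty((T, S))
    for j in range(T):
        for k in range(S):
            p[j, k] = ((N_wjk[w_t, j, k] + beta) / (N_jk[j, k] + V * beta)
                       * (N_jkd[j, k, d] + alpha[j, k])
                       / (N_kd[k, d] + alpha[:, k].sum())
                       * (N_kd[k, d] + gamma) / (N_d[d] + S * gamma))
    p /= p.sum()
    idx = rng.choice(T * S, p=p.ravel())  # sample a flattened (j, k) index
    return divmod(idx, S)                  # -> (topic j, sentiment k)
```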
In the modified JST model shown in Figure 1(b), we add an additional dependency link of $\varphi$ on the matrix $\lambda$ of size $S \times V$, which we use to encode word prior sentiment information into the JST model. For each word $w \in \{1, ..., V\}$, if $w$ is found in the sentiment lexicon, then for each $l \in \{1, ..., S\}$ the element $\lambda_{lw}$ is updated as follows:

$$\lambda_{lw} = \begin{cases} 1 & \text{if } S(w) = l \\ 0 & \text{otherwise} \end{cases}, \quad (2)$$

where the function $S(w)$ returns the prior sentiment label of $w$ in a sentiment lexicon, i.e., neutral, positive, or negative.

The matrix $\lambda$ can be considered as a transformation matrix which modifies the Dirichlet priors $\beta$ of size $S \times T \times V$, so that the word prior polarity can be captured. For example, the word "excellent" with index $i$ in the vocabulary has a positive polarity. The corresponding row vector in $\lambda$ is $[0, 1, 0]$, with its elements representing neutral, positive, and negative. For each topic $j$, multiplying $\lambda_{li}$ with $\beta_{lji}$, only the value of $\beta_{l_{pos},j,i}$ is retained, while $\beta_{l_{neu},j,i}$ and $\beta_{l_{neg},j,i}$ are set to 0. Thus, the word "excellent" can only be drawn from the positive topic-word distributions generated from a Dirichlet distribution with parameter $\beta_{l_{pos}}$.
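The following minimal sketch illustrates this prior transformation, assuming $S = 3$ labels ordered (neutral, positive, negative) and a toy vocabulary and lexicon of our own choosing.

```python
import numpy as np

V, S, T = 5, 3, 2
vocab = {"excellent": 0, "bad": 1, "movie": 2, "sound": 3, "book": 4}
lexicon = {"excellent": 1, "bad": 2}     # word -> prior sentiment index (1=pos, 2=neg)

beta = np.full((S, T, V), 0.01)          # symmetric Dirichlet priors of size S x T x V

# Build lambda (Eq. 2): rows are sentiment labels, columns are words.
lam = np.ones((S, V))                    # words absent from the lexicon keep all labels
for w, l in lexicon.items():
    lam[:, vocab[w]] = 0.0
    lam[l, vocab[w]] = 1.0

beta = beta * lam[:, None, :]            # element-wise transformation of the priors

# "excellent" now has non-zero prior mass only under the positive label:
print(beta[:, 0, vocab["excellent"]])    # -> [0.   0.01 0.  ]
```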
4 Polarity Words Extracted by JST
The JST model allows clustering of different terms which share similar sentiment. In this section, we study the polarity-bearing topics extracted by JST. We combined reviews from the source and target domains and discarded document labels in both domains; there are a total of six different combinations. We then ran JST on the combined data sets and list some of the extracted topic words in Table 1. Words in each cell are grouped under one topic; the upper half of the table shows topic words under the positive sentiment label, while the lower half shows topic words under the negative sentiment label.

We can see that JST appears to better capture the sentiment association distribution in the source and target domains. For example, in the DVD+Elec. set, words from the DVD domain describe a rock concert DVD while words from the Electronics domain are likely relevant to stereo amplifiers and receivers, and yet they are grouped under the same topic by the JST model. Checking the word coverage in each domain reveals, for example, that "bass" seldom appears in the DVD domain but appears more often in the Electronics domain. Likewise, in the Book+Kitch. set, "stainless" rarely appears in the Book domain and "interest" does not occur often in the Kitchen domain, yet they are grouped under the same topic. These observations motivate us to explore polarity-bearing topics extracted by JST for cross-domain sentiment classification, since grouping words from different domains that bear similar sentiment has the effect of overcoming the data distribution difference between the two domains.
5 Domain Adaptation using JST
Given input data $x$ and a class label $y$, labeled patterns of one domain can be drawn from the joint distribution $P(x, y) = P(y|x)P(x)$. Domain adaptation usually assumes that the data distributions are different in the source and target domains, i.e., $P_s(x) \neq P_t(x)$. The task of domain adaptation is to predict the label $y^t_i$ corresponding to $x^t_i$ in the target domain. We assume that we are given two sets of training data, $D_s$ and $D_t$, the source domain and target domain data sets, respectively. In the multiclass classification problem, the source domain data consist of labeled instances, $D_s = \{(x^s_n, y^s_n) \in X \times Y : 1 \le n \le N_s\}$, where $X$ is the input space and $Y$ is a finite set of class labels. No class label is given in the target domain, $D_t = \{x^t_n \in X : 1 \le n \le N_t, N_t \gg N_s\}$.

Algorithm 1 shows how to perform domain adaptation using the JST model. The source and target domain data are first merged with document labels discarded. A JST model is then learned from the merged corpus to generate polarity-bearing topics for each document. The original documents in the source domain are augmented with those polarity-bearing topics, as shown in Step 4 of Algorithm 1, where $l_i z_i$ denotes a combination of sentiment label $l_i$ and topic $z_i$ for word $w_i$. Finally, feature selection is performed according to the information gain criterion, and a classifier is then trained from the source domain using the new document representations. The target domain documents are also encoded in a similar way, with polarity-bearing topics added into their feature representations.
As discussed in Section 3, the JST model directly models $P(l|d)$, the probability of a sentiment label given a document, and hence document polarity can be classified accordingly.
Positive:
  Book+DVD:     recommend, funni, highli, cool, easi, entertain, depth, awesom, strong, worth
  Book+Elec.:   interest, pictur, topic, clear, knowledg, paper, follow, color, easi, accur
  Book+Kitch.:  interest, qualiti, success, easili, polit, servic, clearli, stainless, popular, safe
  DVD+Elec.:    concert, sound, rock, listen, favorit, bass, sing, amaz, talent, acoust
  DVD+Kitch.:   movi, recommend, stori, highli, classic, perfect, fun, great, charact, qualiti
  Elec.+Kitch.: sound, pleas, excel, look, satisfi, worth, perform, materi, comfort, profession

Negative:
  Book+DVD:     mysteri, cop, fbi, shock, investig, prison, death, escap, report, dirti
  Book+Elec.:   abus, problem, question, poor, mislead, design, point, case, disagre, flaw
  Book+Kitch.:  bore, return, tediou, heavi, cheat, stick, crazi, defect, hell, mess
  DVD+Elec.:    bore, poorli, plot, low, stupid, replac, stori, avoid, terribl, crap
  DVD+Kitch.:   horror, cabinet, alien, break, scari, install, evil, drop, dead, gap
  Elec.+Kitch.: tomtom, elimin, region, regardless, error, cheapli, code, plain, dumb, incorrect

Table 1: Extracted polarity words by JST on the combined data sets. The (stemmed) words under each combined data set are grouped under one topic.
Algorithm 1: Domain adaptation using JST.
Input: the source domain data $D_s = \{(x^s_n, y^s_n) \in X \times Y : 1 \le n \le N_s\}$; the target domain data $D_t = \{x^t_n \in X : 1 \le n \le N_t, N_t \gg N_s\}$
Output: a sentiment classifier for the target domain $D_t$
1: Merge $D_s$ and $D_t$ with document labels discarded: $D = \{x^s_n, 1 \le n \le N_s; \ x^t_n, 1 \le n \le N_t\}$
2: Train a JST model on $D$
3: for each document $x^s_n = (w_1, w_2, ..., w_m) \in D_s$ do
4:   Augment the document with polarity-bearing topics generated from JST: $x^{s\prime}_n = (w_1, w_2, ..., w_m, l_1 z_1, l_2 z_2, ..., l_m z_m)$
5:   Add $(x^{s\prime}_n, y^s_n)$ into a document pool $B$
6: end for
7: Perform feature selection using IG on $B$
8: Return a classifier trained on $B$
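A rough Python sketch of Steps 3-7 follows. The toy corpus, the per-token JST assignments, and the use of scikit-learn's mutual information scorer as the IG criterion are all illustrative assumptions (information gain equals the mutual information between a feature and the class).

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.linear_model import LogisticRegression

def augment(words, sent_labels, topics):
    """Step 4 of Algorithm 1: append one pseudo-word 'l_i z_i' per token,
    where l_i/z_i are the sentiment label and topic the trained JST model
    assigned to word w_i (supplied here as plain lists)."""
    return words + [f"s{l}_t{z}" for l, z in zip(sent_labels, topics)]

# Tiny hypothetical source corpus: (tokens, per-token labels, per-token topics, class).
source = [
    (["great", "plot"],  [1, 0], [0, 0], 1),
    (["boring", "plot"], [2, 0], [0, 0], 0),
    (["great", "sound"], [1, 0], [1, 1], 1),
    (["poor", "sound"],  [2, 0], [1, 1], 0),
]
docs = [" ".join(augment(w, l, z)) for w, l, z, _ in source]
y = [c for *_, c in source]

X = CountVectorizer(token_pattern=r"\S+").fit_transform(docs)
# The paper keeps the top 2000 features; k is capped here for the toy data.
selector = SelectKBest(mutual_info_classif, k=min(6, X.shape[1]))
X_sel = selector.fit_transform(X, y)

clf = LogisticRegression().fit(X_sel, y)   # stand-in for the ME classifier
```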
Since JST model learning does not require the availability of document labels, it is also possible to augment the source domain data by adding the most confident pseudo-labeled documents from the target domain, as labeled by the JST model and shown in Algorithm 2.
6 Experiments
We evaluate our proposed approach on two datasets, the movie review (MR) data and the multi-domain sentiment (MDS) dataset. The movie review data consist of 1000 positive and 1000 negative movie reviews drawn from the IMDB movie archive, while the multi-domain sentiment dataset contains four different types of product reviews extracted from Amazon.com: Book, DVD, Electronics, and Kitchen appliances. Each category of product reviews comprises 1000 positive and 1000 negative reviews and is considered a domain.
Algorithm 2: Adding pseudo-labeled documents.
Input: the target domain data $D_t = \{x^t_n \in X : 1 \le n \le N_t, N_t \gg N_s\}$; document sentiment classification threshold $\tau$
Output: a labeled document pool $B$
1: Train a JST model parameterized by $\Lambda$ on $D_t$
2: for each document $x^t_n \in D_t$ do
3:   Infer its sentiment class label from JST as $l_n = \arg\max_l P(l \mid x^t_n; \Lambda)$
4:   if $P(l_n \mid x^t_n; \Lambda) > \tau$ then
5:     Add the labeled sample $(x^t_n, l_n)$ into a document pool $B$
6:   end if
7: end for
Preprocessing was performed on both datasets by removing punctuation, numbers, non-alphabet characters, and stopwords. The MPQA subjectivity lexicon is used as the sentiment lexicon in our experiments.
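A minimal sketch of this preprocessing step follows, with a deliberately tiny illustrative stopword list (the paper does not specify which stopword list was used).

```python
import re

STOPWORDS = {"the", "a", "an", "and", "of", "to", "is", "it"}  # tiny illustrative list

def preprocess(text):
    """Keep alphabetic tokens only and drop stopwords, mirroring the
    preprocessing described above (punctuation, numbers, and
    non-alphabet characters removed)."""
    tokens = re.findall(r"[a-zA-Z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]

print(preprocess("It's a GREAT movie -- 10/10, would watch again!"))
# -> ['s', 'great', 'movie', 'would', 'watch', 'again']
```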
6.1 Experimental Setup
While the original JST model can produce reasonable results with a simple symmetric Dirichlet prior, here we use an asymmetric prior $\alpha$ over the topic proportions, which is learned directly from data using a fixed-point iteration method (Minka, 2003). In our experiments, $\alpha$ was updated every 25 iterations during the Gibbs sampling procedure. For the other priors, we set a symmetric prior $\beta = 0.01$ and $\gamma = (0.05 \times L)/S$, where $L$ is the average document length; the value of 0.05 on average allocates 5% of the probability mass for mixing.
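For reference, a sketch of Minka's generic fixed-point estimator for an asymmetric Dirichlet prior is given below; it is not the paper's exact implementation, and the count-matrix layout is an assumption.

```python
import numpy as np
from scipy.special import digamma

def fixed_point_alpha(N, alpha, iters=100):
    """Minka's (2003) fixed-point update for a Dirichlet prior under a
    Dirichlet-multinomial model.

    N: (D x K) matrix of observation counts per document and component
       (here, topic counts per document-sentiment pair from Gibbs sampling);
    alpha: initial K-dimensional prior.
    """
    n_d = N.sum(axis=1)
    for _ in range(iters):
        a0 = alpha.sum()
        num = (digamma(N + alpha) - digamma(alpha)).sum(axis=0)
        den = (digamma(n_d + a0) - digamma(a0)).sum()
        alpha = alpha * num / den
    return alpha

# Example: topic-assignment counts in 3 documents over 2 topics.
N = np.array([[5., 1.], [4., 2.], [6., 0.]])
print(fixed_point_alpha(N, np.ones(2)))
```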
6.2 Supervised Sentiment Classification
We performed 5-fold cross-validation for the performance evaluation of supervised sentiment classification. Results reported in this section are averaged over 10 such runs. We tested several classifiers, including Naïve Bayes (NB) and support vector machines (SVMs) from WEKA [5], and maximum entropy (ME) from MALLET [6]. All parameters were set to their default values, except that the Gaussian prior variance was set to 0.1 for ME model training. The results show that ME consistently outperforms NB and SVM on average; we therefore only report results from ME trained on document vectors with each term weighted according to its frequency.

[5] http://www.cs.waikato.ac.nz/ml/weka/
[6] http://mallet.cs.umass.edu/
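As an illustration of this setup, the sketch below runs cross-validated sentiment classification on toy data, substituting scikit-learn's logistic regression for the MALLET ME classifier (the two are equivalent models, though the mapping from Gaussian prior variance to the regularization strength C is only roughly analogous).

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Term-frequency features feeding an L2-regularized logistic regression;
# the L2 penalty plays the role of MALLET's Gaussian prior on the weights.
texts = ["great movie", "boring plot", "excellent sound", "poor quality"]
labels = [1, 0, 1, 0]
model = make_pipeline(CountVectorizer(), LogisticRegression(C=0.1))
scores = cross_val_score(model, texts, labels, cv=2)  # the paper uses 5-fold CV
print(scores.mean())
```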
Figure 2: Classification accuracy vs. no. of topics.
The only parameter we need to set is the number of topics $T$. Note that the actual number of feature clusters is $3 \times T$: for example, when $T$ is set to 5, there are 5 topic groups under each of the positive, negative, and neutral sentiment labels, and hence 15 feature clusters altogether. The topics generated for each document by the JST model were simply added into its bag-of-words (BOW) feature representation prior to model training. Figure 2 shows the classification results on the five different domains when varying the number of topics from 1 to 200. It can be observed that the best classification accuracy is obtained when the number of topics is set to 1 (i.e., 3 feature clusters). Increasing the number of topics decreases accuracy, though it stabilizes after 15 topics. Nevertheless, even with the number of topics set to 15, JST feature augmentation still outperforms ME without feature augmentation (the baseline model) in all of the domains. It is worth pointing out that the JST model with a single topic becomes the standard LDA model with only three sentiment topics. Nevertheless, we have proposed an effective way to incorporate domain-independent word polarity prior information into model learning. As will be shown later in Table 2, the JST model with word polarity priors incorporated performs significantly better than the LDA model without such prior information.
Method     MR      Book    DVD     Elec.   Kitch.
Baseline   82.53   79.96   81.32   83.61   85.82
LDA        83.76   84.32   85.62   85.4    87.68
JST        94.98   89.95   91.7    88.25   89.85
[YE10]     91.78   82.75   82.85   84.55   87.9
[LI10]     -       79.49   81.65   83.64   85.65

Table 2: Supervised sentiment classification accuracy (%). The Book, DVD, Elec., and Kitch. columns are the four MDS domains.
For comparison purposes, we also ran the LDA model and augmented the BOW features with the generated topics in a similar way. The best accuracy was obtained with the number of topics set to 15 in the LDA model. Table 2 shows the classification accuracy results with and without feature augmentation. We performed significance tests and found that LDA performs statistically significantly better than the baseline according to a paired t-test, with p < 0.005 for the Kitchen domain and p < 0.001 for all the other domains. JST performs statistically significantly better than both the baseline and LDA, with p < 0.001.
We also compare our method with other recently proposed approaches. Yessenalina et al. (2010a) explored different methods to automatically generate annotator rationales to improve sentiment classification accuracy. Our method using JST feature augmentation consistently performs better than their approach (denoted as [YE10] in Table 2). They further proposed a two-level structured model (Yessenalina et al., 2010b) for document-level sentiment classification; the best accuracy obtained on the MR data is 93.22%, with the model initialized with sentence-level human annotations, which is still worse than ours. Li et al. (2010) adopted a two-stage process, first classifying sentences as personal or impersonal views and then using an ensemble method to perform sentiment classification. Their method (denoted as [LI10] in Table 2) performs worse than either LDA or JST feature augmentation. To the best of our knowledge, the results achieved using JST feature augmentation are the state-of-the-art for both the MR and the MDS datasets.
6.3 Domain Adaptation
We conducted domain adaptation experiments on the MDS dataset, comprising four different domains: Book (B), DVD (D), Electronics (E), and Kitchen appliances (K). We randomly split each domain's data into a training set of 1,600 instances and a test set of 400 instances. A classifier trained on the training set of one domain is tested on the test set of a different domain. We performed 5 random splits and report the results averaged over 5 such runs.
Comparison with Baseline Models
We compare our proposed approaches with two baseline models. The first (denoted as "Base" in Table 3) is an ME classifier trained without adaptation. LDA results were generated from an ME classifier trained on document vectors augmented with topics generated from the LDA model, with the number of topics set to 15. JST results were obtained in a similar way, except that we used the polarity-bearing topics generated from the JST model. We also tested adding pseudo-labeled examples from the JST model into the source domain for ME classifier training (following Algorithm 2), denoted as "JST-PL" in Table 3; the document sentiment classification probability threshold $\tau$ was set to 0.8. Finally, we performed feature selection by selecting the top 2000 features according to the information gain criterion ("JST-IG") [7].

[7] Both values of 0.8 and 2000 were set arbitrarily after an initial run on some held-out data; they were not tuned to optimize test performance.
There are altogether 12 cross-domain sentiment classification tasks. We show the adaptation loss results in Table 3, where the result for each domain and each method is averaged over all three possible adaptation tasks obtained by varying the source domain. The adaptation loss is calculated with respect to the in-domain gold standard classification result. For example, the in-domain gold standard for the Book domain is 79.96%; for adapting from DVD to Book, the baseline achieves 72.25% and JST gives 76.45%, so the adaptation loss is 7.71 for the baseline and 3.51 for JST.
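The loss computation in this example can be reproduced directly; the snippet below is a worked check of these numbers and of the relative error reductions quoted in the next paragraph.

```python
# Adaptation loss = in-domain gold standard accuracy - cross-domain accuracy.
in_domain = 79.96          # Book in-domain gold standard
base, jst = 72.25, 76.45   # DVD -> Book cross-domain accuracies
print(in_domain - base, in_domain - jst)  # -> 7.71 3.51

# Relative error reductions from the average losses in Table 3.
avg = {"Base": 8.6, "LDA": 7.7, "JST": 6.3, "JST-PL": 5.5, "JST-IG": 4.1}
for method, loss in avg.items():
    print(method, round(100 * (avg["Base"] - loss) / avg["Base"]))
# -> roughly 10, 27, 36, 52; the text quotes 11%, 27%, 36%, 53%, the small
# gaps plausibly coming from rounding of the averaged losses in Table 3.
```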
It can be observed from Table 3 that LDA improves only slightly over the baseline, with a relative error reduction of 11%. JST further reduces the error due to transfer by 27%. Adding pseudo-labeled examples gives a slightly better performance than JST, with an error reduction of 36%. With feature selection, JST-IG outperforms all the other approaches, with a relative error reduction of 53%.
Domain    Base   LDA   JST   JST-PL   JST-IG
Book      10.8   9.4   7.2   6.3      5.2
DVD       8.3    6.1   4.8   4.4      2.9
Electr.   7.9    7.7   6.3   5.4      3.9
Kitch.    7.6    7.6   6.9   6.1      4.4
Average   8.6    7.7   6.3   5.5      4.1

Table 3: Adaptation loss with respect to the in-domain gold standard. The last row shows the average loss over all four domains.
Parameter Sensitivity
There is only one parameter to be set in the JST-IG approach: the number of topics. We plot classification accuracy versus the number of topics in Figure 3, with the number of topics varying between 1 and 200, corresponding to between 3 and 600 feature clusters. It can be observed that for the relatively larger Book and DVD data sets, the accuracies peak at a topic number of 10, whereas for the relatively smaller Electronics and Kitchen data sets, the best performance is obtained at a topic number of 50. Increasing the topic number beyond these values decreases classification accuracy. Manually examining the extracted polarity topics from JST reveals that when the topic number is small, each topic cluster contains well-mixed words from different domains; when the topic number is large, however, words under each topic cluster tend to be dominated by a single domain.
Figure 3: Classification accuracy vs. no. of topics: (a) adapted to Book and DVD data sets; (b) adapted to Electronics and Kitchen data sets.

Comparison with Existing Approaches

We compare in Figure 4 our proposed approach with two other domain adaptation algorithms for sentiment classification, SCL and SFA. Each set of bars represents a cross-domain sentiment classification task, and the thick horizontal lines are the in-domain sentiment classification accuracies. It is worth noting that our in-domain results differ slightly from those reported in (Blitzer et al., 2007; Pan et al., 2010) due to different random splits. Our proposed JST-IG approach outperforms SCL on average and achieves comparable results to SFA. While SCL requires the construction of a reasonable number of auxiliary tasks that are useful to model "pivots" and "non-pivots", SFA relies on a good selection of domain-independent features for the construction of a bipartite feature graph before running spectral clustering to derive feature clusters. In contrast, our proposed approach based on the JST model is much simpler and yet still achieves comparable results.

Figure 4: Comparison with existing approaches: (a) adapted to Book and DVD data sets; (b) adapted to Electronics and Kitchen data sets.
7 Conclusions

In this paper, we have studied polarity-bearing topics generated from the JST model and shown that by augmenting the original feature space with polarity-bearing topics, in-domain supervised classifiers learned from the augmented feature representation achieve state-of-the-art performance on both the movie review data and the multi-domain sentiment dataset. Furthermore, using feature augmentation and selection according to the information gain criterion for cross-domain sentiment classification, our proposed approach outperforms SCL and gives similar results to SFA. Moreover, our approach is much simpler and does not require difficult parameter tuning.
There are several directions we would like to explore in the future. First, polarity-bearing topics generated by the JST model were simply added into the original feature space of documents; it is worth investigating attaching a different weight to each topic, perhaps in proportion to the posterior probability of the sentiment label and topic given a word as estimated by the JST model. Second, it might be interesting to study the effect of introducing a tradeoff parameter to balance the effect of original and new features. Finally, our experimental results show that adding pseudo-labeled examples by the JST model does not appear to be effective. We could explore instance weighting strategies (Jiang and Zhai, 2007) on both pseudo-labeled examples and source domain training examples in order to improve adaptation performance.
Acknowledgements

This work was supported in part by the EC-FP7 project ROBUST (grant number 257859).
References
R.K. Ando and T. Zhang. 2005. A framework for learning predictive structures from multiple tasks and unlabeled data. The Journal of Machine Learning Research, 6:1817-1853.

A. Aue and M. Gamon. 2005. Customizing sentiment classifiers to new domains: a case study. In Proceedings of Recent Advances in Natural Language Processing (RANLP).

David M. Blei, Andrew Y. Ng, and Michael I. Jordan. 2003. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993-1022.

J. Blitzer, M. Dredze, and F. Pereira. 2007. Biographies, bollywood, boom-boxes and blenders: Domain adaptation for sentiment classification. In ACL, pages 440-447.

C. Chelba and A. Acero. 2004. Adaptation of maximum entropy classifier: Little data can help a lot. In EMNLP.

W. Dai, Y. Chen, G.R. Xue, Q. Yang, and Y. Yu. 2008. Translated learning: Transfer learning across different feature spaces. In NIPS, pages 353-360.

W. Dai, O. Jin, G.R. Xue, Q. Yang, and Y. Yu. 2009. EigenTransfer: a unified framework for transfer learning. In ICML, pages 193-200.

H. Daumé III and D. Marcu. 2006. Domain adaptation for statistical classifiers. Journal of Artificial Intelligence Research, 26(1):101-126.

H. Daumé III. 2007. Frustratingly easy domain adaptation. In ACL, pages 256-263.

J. Jiang and C.X. Zhai. 2007. Instance weighting for domain adaptation in NLP. In ACL, pages 264-271.

A. Kennedy and D. Inkpen. 2006. Sentiment classification of movie reviews using contextual valence shifters. Computational Intelligence, 22(2):110-125.

S. Li, C.R. Huang, G. Zhou, and S.Y.M. Lee. 2010. Employing personal/impersonal views in supervised and semi-supervised sentiment classification. In ACL, pages 414-423.

C. Lin and Y. He. 2009. Joint sentiment/topic model for sentiment analysis. In Proceedings of the 18th ACM International Conference on Information and Knowledge Management (CIKM), pages 375-384.

C. Lin, Y. He, and R. Everson. 2010. A comparative study of Bayesian models for unsupervised sentiment detection. In Proceedings of the 14th Conference on Computational Natural Language Learning (CoNLL), pages 144-152.

Ryan McDonald, Kerry Hannan, Tyler Neylon, Mike Wells, and Jeff Reynar. 2007. Structured models for fine-to-coarse sentiment analysis. In ACL, pages 432-439.

T. Minka. 2003. Estimating a Dirichlet distribution. Technical report.

S.J. Pan, X. Ni, J.T. Sun, Q. Yang, and Z. Chen. 2010. Cross-domain sentiment classification via spectral feature alignment. In Proceedings of the 19th International Conference on World Wide Web (WWW), pages 751-760.

Bo Pang and Lillian Lee. 2004. A sentimental education: sentiment analysis using subjectivity summarization based on minimum cuts. In ACL, pages 271-278.

Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan. 2002. Thumbs up?: sentiment classification using machine learning techniques. In EMNLP, pages 79-86.

B. Roark and M. Bacchiani. 2003. Supervised and unsupervised PCFG adaptation to novel domains. In NAACL-HLT, pages 126-133.

C.W. Seah, I. Tsang, Y.S. Ong, and K.K. Lee. 2010. Predictive distribution matching SVM for multi-domain learning. In ECML-PKDD, pages 231-247.

Casey Whitelaw, Navendu Garg, and Shlomo Argamon. 2005. Using appraisal groups for sentiment analysis. In Proceedings of the ACM International Conference on Information and Knowledge Management (CIKM), pages 625-631.

Q. Wu, S. Tan, and X. Cheng. 2009. Graph ranking for sentiment transfer. In ACL-IJCNLP, pages 317-320.

Q. Wu, S. Tan, X. Cheng, and M. Duan. 2010. MIEA: a mutual iterative enhancement approach for cross-domain sentiment classification. In COLING, pages 1327-1335.

A. Yessenalina, Y. Choi, and C. Cardie. 2010a. Automatically generating annotator rationales to improve sentiment classification. In ACL, pages 336-341.

A. Yessenalina, Y. Yue, and C. Cardie. 2010b. Multi-level structured models for document-level sentiment classification. In EMNLP, pages 1046-1056.

Jun Zhao, Kang Liu, and Gen Wang. 2008. Adding redundant features for CRFs-based sentence sentiment classification. In EMNLP, pages 117-126.