In this paper, we present a novel weakly-supervised method for cross-lingual sentiment analysis. Specifically, we propose a
latent sentiment model (LSM) based on latent Dirichlet allocation where sentiment labels are considered as topics. Prior information
extracted from English sentiment lexicons through machine translation is incorporated into LSM model learning, where preferences
on expectations of sentiment labels of those lexicon words are expressed using generalized expectation criteria. An efficient
parameter estimation procedure using variational Bayes is presented. Experimental results on Chinese product reviews show
that the weakly-supervised LSM model performs comparably to supervised classifiers such as support vector machines, achieving an
average accuracy of 81% over a total of 5484 review documents. Moreover, starting with a generic sentiment lexicon,
the LSM model is able to extract highly domain-specific polarity words from text.
Keywords: Latent sentiment model (LSM) – Cross-lingual sentiment analysis – Generalized expectation – Latent Dirichlet allocation