
Aspect-level Sentiment Analysis using AS-Capsules


Abstract

Aspect-level sentiment analysis aims to provide a complete and detailed view of sentiment from different aspects. Existing solutions usually adopt a two-stage approach: first detecting the aspect category in a document, then categorizing the polarity of opinion expressions for the detected aspect(s). Inevitably, such methods lead to error accumulation. Moreover, aspect detection and aspect-level sentiment classification are highly correlated with each other. The key issue is how to perform aspect detection and aspect-level sentiment classification jointly and effectively. In this paper, we propose the aspect-level sentiment capsules model (AS-Capsules), which performs aspect detection and sentiment classification simultaneously, in a joint manner. AS-Capsules utilizes the correlation between aspect and sentiment through shared components including capsule embedding, shared encoders, and shared attentions. Capsules in AS-Capsules also communicate with each other through a shared Recurrent Neural Network (RNN). More importantly, the AS-Capsules model does not require any linguistic knowledge as additional input. Instead, through the attention mechanism, the model is able to attend to aspect-related words and sentiment words corresponding to different aspect(s). Experiments show that the AS-Capsules model achieves state-of-the-art performance on a benchmark dataset for aspect-level sentiment analysis.
Yequan Wang1,*, Aixin Sun2, Minlie Huang1, Xiaoyan Zhu1
1 Institute for Artificial Intelligence, State Key Lab of Intelligent Technology and Systems
1 Beijing National Research Center for Information Science and Technology
1 Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
2 School of Computer Science and Engineering, Nanyang Technological University, Singapore
ACM Reference Format:
Yequan Wang, Aixin Sun, Minlie Huang, and Xiaoyan Zhu. 2019. Aspect-level Sentiment Analysis using AS-Capsules. In Proceedings of the 2019 World Wide Web Conference (WWW ’19), May 13–17, 2019, San Francisco, CA, USA. ACM, New York, NY, USA, 11 pages.
* This work was done when Yequan was a visiting Ph.D. student at the School of Computer Science and Engineering, Nanyang Technological University, Singapore.
This paper is published under the Creative Commons Attribution 4.0 International (CC-BY 4.0) license. Authors reserve their rights to disseminate the work on their personal and corporate Web sites with the appropriate attribution.
WWW ’19, May 13–17, 2019, San Francisco, CA, USA
© 2019 IW3C2 (International World Wide Web Conference Committee), published under Creative Commons CC-BY 4.0 License.
ACM ISBN 978-1-4503-6674-8/19/05.

Sentiment analysis aims to analyze people’s sentiments, opinions, evaluations, attitudes, and emotions from human languages [ ]. Current research focuses on document-level analysis (e.g., document, paragraph, and sentence) or on in-depth aspect-level analysis. As a fine-grained task, aspect-level sentiment analysis provides a complete and detailed view of sentiments from different aspects. In general, this task requires aspect detection and aspect-level sentiment classification. Most existing solutions first detect the aspect category in a sentence, and then categorize the polarity of opinion expressions with respect to the detected aspect(s). In other words, the two subtasks are tackled separately. As a result, errors made in aspect detection affect aspect-level sentiment classification. On the other hand, the two subtasks, i.e., aspect detection and aspect-level sentiment classification, are highly correlated with each other.

In this paper, we propose the aspect-level sentiment capsules (AS-Capsules) model. This model utilizes the correlation between aspects and corresponding sentiments. Hence, we jointly perform the two subtasks: aspect detection and aspect-level sentiment classification.
The Research Problem. In aspect-level sentiment analysis, we have a predefined set of aspects $A = \{a_1, a_2, \cdots, a_M\}$, and a predefined set of sentiment polarities $P = \{o_1, o_2, \cdots, o_P\}$. Given a piece of text (e.g., a sentence or a paragraph), denoted by $[w_1, w_2, \ldots, w_N]$, the task is to predict the aspect(s) and the corresponding sentiment(s), i.e., aspect-sentiment pairs $\{\langle a, o \rangle\}$, expressed in the text.

Consider the task conducted on restaurant reviews. The set of aspects $A$ can be {food, service, price, ambience, anecdote}, and the set of sentiment polarities $P$ may include {positive, neutral, negative}. Each restaurant review is considered a piece of text. Given an example review “Staffs are not that friendly, but the taste covers all.”, the expected output will be the aspect-sentiment pairs {⟨food, positive⟩, ⟨service, negative⟩}. Accordingly, such pairs are available for sample inputs as training data, from which a supervised sentiment detection algorithm can learn.
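As an illustration of the task's input and output, the sketch below encodes the example review and its gold aspect-sentiment pairs. The names `ASPECTS`, `POLARITIES`, `validate_pairs` and `to_targets` are hypothetical and chosen only for this sketch; they are not taken from the authors' implementation, and whitespace tokenization is a simplifying assumption.

```python
# Predefined label sets from the restaurant example in the text.
ASPECTS = ["food", "service", "price", "ambience", "anecdote"]
POLARITIES = ["positive", "neutral", "negative"]


def validate_pairs(pairs):
    """Check that every (aspect, polarity) pair uses predefined labels."""
    for aspect, polarity in pairs:
        if aspect not in ASPECTS:
            raise ValueError(f"unknown aspect: {aspect}")
        if polarity not in POLARITIES:
            raise ValueError(f"unknown polarity: {polarity}")
    return pairs


def to_targets(pairs):
    """Turn pairs into one (is_present, polarity_index_or_None) per aspect."""
    by_aspect = dict(pairs)
    targets = []
    for aspect in ASPECTS:
        if aspect in by_aspect:
            targets.append((1, POLARITIES.index(by_aspect[aspect])))
        else:
            targets.append((0, None))
    return targets


review = "Staffs are not that friendly, but the taste covers all."
tokens = review.split()  # naive whitespace tokenization (assumption)
gold = validate_pairs({("food", "positive"), ("service", "negative")})
targets = to_targets(gold)
```

Each aspect gets a presence flag for aspect detection and, only when present, a polarity index for sentiment classification, which mirrors the two subtasks described above.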
Recently, sentiment analysis has attracted wide attention. Aspect detection has been studied as a subtask of aspect-level sentiment analysis. The goal of aspect detection is to identify the aspect categories in a sentence instead of extracting aspect terms. The aspect category set is predefined in advance. Most existing research focuses on classical machine learning models using classifiers with rich features, or on deep learning models. LR and SVM are among the popular and effective classifiers. Unigram, bigram and lexicon features are the most important features for aspect detection. Many neural network models have been proposed for various tasks including sentiment analysis, such as Recursive Auto Encoder (RAE) [ ], Recurrent Neural Network (RNN) [ ], Convolutional Neural Network (CNN) [ ], and more. The common solutions typically train deep learning models by using the basic models and fine-tuning the word representations pre-trained by word2vec [ ] or GloVe [ ]. Attention-based LSTM with aspect embedding (ATAE-LSTM) [ ] is shown to be effective in enforcing the neural model to attend to the related part of a sentence, in response to a specific aspect. Some variants of RNN and attention have been proposed to improve the performance of aspect-level sentiment classification. However, ATAE-LSTM and its variants need the aspect category as input, which limits their application. As mentioned before, the key is how to take advantage of the relationship between aspect detection and sentiment classification.
In spite of the great success of neural network models, linguistic knowledge [ ] is often required to achieve the best performance for sentiment analysis. However, linguistic knowledge is domain specific and costly to obtain. For example, the words used to express positive and negative opinions in restaurant reviews will be very different from the words used in movie reviews. Further, many neural network models cannot provide explanations of how and why the predictions are made. Very recently, RNN-Capsule [ ] demonstrated state-of-the-art accuracy on sentence-level sentiment classification. It does not require linguistic knowledge and is capable of outputting meaningful words with sentiment tendency. Inspired by RNN-Capsule, we propose the AS-Capsules model for aspect-level sentiment analysis.
The AS-Capsules Model. The concept of “capsule” was proposed by Hinton et al. in 2011 [ ]. A capsule is a group of neurons that “perform some quite complicated internal computations on their inputs and then encapsulate the results of these computations into a small vector of highly informative outputs” [ ]. Following this high-level concept, each capsule in RNN-Capsule was designed to predict one sentiment polarity (e.g., positive, negative, and neutral) [ ]. In this work, we follow the same high-level concept of capsule as in RNN-Capsule, to design the AS-Capsules model for aspect-level sentiment analysis.1
Specifically, each individual capsule in the AS-Capsules model contains an attribute, a state, a capsule embedding, and four modules. The four modules are the ‘aspect representation module’, ‘aspect probability module’, ‘sentiment representation module’, and ‘sentiment distribution module’. For each predefined aspect, we build a capsule whose attribute is the same as the aspect category (e.g., food or price). Given a piece of text, we represent it by the low-level hidden vectors encoded by an encoder RNN. All capsules take the low-level hidden representations as their input, and each capsule outputs: (i) the aspect probability computed by its aspect probability module, and (ii) the sentiment distribution computed by its sentiment distribution module. All capsules utilize a shared RNN to communicate with each other, which prevents capsules from attending to conflicting parts. The hidden representation of the shared RNN is high-level because its input is the output of the low-level encoder RNN. All attentions in a capsule rely on the capsule embedding, which enforces the model to focus on the parts correlated with the aspect. The aspect representation module and the sentiment representation module share one representation generated by a component known as shared attention.
1 Our AS-Capsules model is different from the idea of Capsule Network (CapsNet) [ ].

Compared with most existing neural network models for sentiment analysis, the AS-Capsules model does not heavily rely on the quality of input instance representation. The RNN layer to encode the given text input can be realized through the widely used LSTM models, GRU models or their variants. The model does not require any linguistic knowledge. Instead, each capsule is capable of outputting two kinds of attended words, one kind reflecting its assigned aspect category, and the other reflecting the sentiment tendency. Both sets of words are learned through the attention mechanisms. Experiments show that the words attended by each capsule reflect the capsule’s aspect category and sentiment tendency well. We observe that the attended words cover a wide range of words, from high frequency words to low frequency words. As low frequency words are not usually covered in sentiment lexicons, the domain-dependent aspect and sentiment words could be extremely useful for making sense of the feedback on services or products. To summarize, the main contributions of this work are as follows:
• We propose the AS-Capsules model to simultaneously perform aspect detection and aspect-level sentiment classification. A capsule is easy to build with input representations encoded by RNN. Each capsule contains an attribute, a state, a capsule embedding, and four modules known as aspect representation module, aspect probability module, sentiment representation module, and sentiment distribution module.
• The proposed AS-Capsules model does not require any linguistic knowledge to achieve state-of-the-art performance. Instead, the model is able to attend to both sentiment words and aspect words reflecting the aspect knowledge of the dataset.
• We conduct experiments on a benchmark dataset (SemEval 2014 Task 4 dataset) to compare our model with strong baselines. Results show that our model is competitive and robust. We further show that our model trained on the SemEval dataset can be directly applied to Yelp reviews and output meaningful results.
Early approaches for sentiment analysis are mostly based on feature engineering and manually defined rules [ ]. Recently, neural networks have become the mainstream for sentiment analysis. Most current studies focus on improving the quality of the vector representation of the input instance using different models, e.g., RNN, RAE, CNN. We briefly review the related work on aspect detection and aspect-level sentiment classification.
Aspect Detection. Aspect detection aims at identifying aspects about which users express their sentiments. A popular approach for aspect detection features a frequency-based method [ ], where single nouns and compound nouns are considered possible aspects. Hence, only explicit aspects are detected. The authors in [ ] then employ association rule mining to find implicit aspects. Instead of focusing on frequencies, syntax-based methods have also been used to detect aspects by means of syntactic relations [ ]. In general, this kind of model operates in an unsupervised manner. In [ ], the authors propose a hybrid model where pointwise mutual information is used to find possible aspects, which are then fed into a Naive Bayes classifier to output a set of explicit aspects. There are also works like [ ] that formulate aspect detection as a labeling problem, and a linear chain Conditional Random Field (CRF) is used.
Recently, relevant aspects are identified by employing word embedding techniques [ ]. The method utilizes semantic and syntactic relationships in word embedding vectors in order to improve the extraction of multi-word aspects and distinguish conflicting aspects. The effectiveness of word embeddings is investigated for aspect-level sentiment analysis in [ ], in which both semantic and sentiment information are encoded. A semi-supervised word embedding algorithm is proposed in [ ] to obtain continuous word embeddings on a large set of reviews. The word representations can then be used to generate deeper and hybrid features to predict the aspect category.
To improve the performance of aspect detection, additional convolutional neural network features are introduced in [ ], besides extracting many features including lexicon, syntactic and cluster features. Nevertheless, expensive human effort and CNN operations limit its applicability.
Aspect-level Sentiment Analysis. Aspect-level sentiment classification deals with fine-grained classification with respect to specific aspect(s). Traditional approaches design a set of features manually. There are lexicon-based features built for sentiment analysis [ ] with the abundance of sentiment lexicons [ ]. Many studies focus on building sentiment classifiers with bag-of-words, sentiment lexicons, and other features, using SVM [ ] or other classifiers. However, the results highly depend on the quality of features, and feature engineering is expensive.
Recently, many neural network models have been developed to tackle sentiment analysis. Transferring knowledge from existing public datasets [ ] or pre-annotated information [ ] improves the performance of aspect-term level sentiment classification. There are methods using memory networks or linguistic knowledge to improve the performance of aspect-level sentiment classification. An innovative model named CEA is proposed in [ ] using context memory, entity memory, and aspect memory.
Attention mechanism has been shown to be effective in many applications including machine translation [ ], sentiment analysis [ ], summarization [ ], and more. Given a sentence and the corresponding aspect(s), Attention-based LSTM with Aspect Embedding (ATAE-LSTM) [ ] is able to predict the sentiment polarity at the aspect level. ATAE-LSTM is the first model to predict different sentiment tendencies with respect to different aspects in the same text. Aspect embedding and an aspect-based attention mechanism are designed in ATAE-LSTM to utilize aspect information effectively. The attention mechanism is designed to attend to different parts of a sentence when considering different aspects. To improve the performance of the attention mechanism, [ ] proposes a method named Aspect Fusion LSTM for incorporating aspect information when learning attentions. Motivated by a similar reason, [ ] proposes content attention with two enhancing attention mechanisms. Multi-head attention [ ] jointly attends to information from different representation subspaces at different positions. In the proposed AS-Capsules model, we design attentions for every representation module in each capsule, which benefit from both low-level and high-level representations through different levels of encoder. In addition, a shared attention generates a shared representation as a part of both aspect and sentiment representations.
The aspect-level sentiment capsules (AS-Capsules) model has its root in RNN-Capsule. Next we briefly describe RNN-Capsule, then detail the design of the AS-Capsules model and its optimization method.
3.1 Preliminary: RNN-Capsule
Recurrent Neural Network. As the name suggests, RNN-Capsule is based on RNN. A recurrent neural network (RNN) is able to exhibit dynamic temporal behavior for a time sequence through connections between units. A unit can be realized by an LSTM model, a GRU model, or their variants. RNNs can be bi-directional, using a finite sequence to predict or label each element in the sequence based on the element’s past and future contexts. This is achieved by concatenating the outputs of two RNNs, one processing the sequence from left to right, and the other from right to left.

Briefly speaking, in an RNN realized by LSTM, the hidden state $h_t$ and memory cell $c_t$ in LSTM are a function of the previous $h_{t-1}$ and $c_{t-1}$, and the input vector $x_t$, or formally as follows:

$$h_t, c_t = \mathrm{LSTM}(h_{t-1}, c_{t-1}, x_t)$$

The hidden state $h_t$ denotes the representation of position $t$ while encoding the preceding contexts of the position. More details about LSTM are given in [9].
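The bidirectional concatenation described above can be sketched as follows. This is a minimal illustration, assuming a toy scalar tanh recurrence (the hypothetical `rnn_step`, with made-up weights `w_h` and `w_x`) as a stand-in for a real LSTM or GRU unit; only the left-to-right and right-to-left passes and their concatenation are the point here.

```python
import math


def rnn_step(h_prev, x, w_h=0.5, w_x=1.0):
    """Toy scalar recurrence standing in for an LSTM/GRU unit (assumption)."""
    return math.tanh(w_h * h_prev + w_x * x)


def run(xs):
    """Run the recurrence over a sequence and return the state at each position."""
    h, states = 0.0, []
    for x in xs:
        h = rnn_step(h, x)
        states.append(h)
    return states


def bidirectional(xs):
    """Concatenate left-to-right and right-to-left states per position."""
    forward = run(xs)
    backward = run(xs[::-1])[::-1]  # re-align backward states to positions
    return [[f, b] for f, b in zip(forward, backward)]


states = bidirectional([0.1, -0.2, 0.3])
```

Each position ends up with a two-part state: the first part summarizes the preceding context and the second part summarizes the following context.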
RNN-Capsule is designed to predict the sentiment category (e.g., positive, negative, and neutral) of a given piece of text. The input text is encoded by RNN and the hidden vector representations are input to all capsules. One capsule is built for each sentiment category, and each capsule contains an attribute, a state, and three modules. The attribute of a capsule reflects its dedicated sentiment category (e.g., positive). The three modules are: (i) a representation module for building the capsule representation using an attention mechanism, (ii) a probability module for predicting the capsule’s state probability based on its representation, and (iii) a reconstruction module for rebuilding the representation of the input instance. A capsule’s state is ‘active’ if the output of its probability module is the largest among all capsules, and ‘inactive’ otherwise.
There are two learning objectives in the RNN-Capsule network. The first is to maximize the state probability of the capsule corresponding to the ground-truth sentiment, and to minimize the state probabilities of the other capsule(s). The second is to minimize the distance between the input representation and the reconstruction representation of the capsule corresponding to the ground truth, and to maximize such distances for the other capsule(s).
RNN-Capsule is not designed for aspect-level sentiment analysis, and each capsule in RNN-Capsule corresponds to one sentiment category. The sentiment category predicted by RNN-Capsule therefore does not reflect the sentiment on any particular aspect. Aspect-level sentiment classification relies heavily on the aspect category, so well-designed shared components benefit both subtasks. The high-level shared RNN between capsules is important: it allows capsules to cooperate and prevents them from attending to conflicting parts.
3.2 The AS-Capsules Model
The architecture of the proposed AS-Capsules model is depicted in Figure 1. The number of capsules $M$ equals the number of predefined aspect categories. For example, we need 5 capsules to model a set of 5 categories such as {food, service, price, ambience, anecdote}, and a capsule is built for each aspect category.

Figure 1: Architecture of AS-Capsules Model. Number of capsules equals the number of aspect categories. The hidden vectors $H_1$ are encoded by $\mathrm{RNN}_e$, which encodes the input text. All capsules take $H_1$ as input and each capsule outputs its aspect probability $p$ and sentiment distribution $P$.
As in RNN-Capsule [ ], we use an encoder RNN named $\mathrm{RNN}_e$ to encode the input text. Given a piece of text (e.g., a sentence, a paragraph, a document), $\mathrm{RNN}_e$ encodes the given instance and outputs the hidden representations $H_1$. Briefly speaking, in the encoder RNN, the hidden representations $H_1$ are a function of the input word representations $W$, or formally:

$$H_1 = \mathrm{RNN}_e(W)$$

The word representations $W = [w_1, w_2, \ldots, w_N]$ are obtained from GloVe, and $N$ is the length of the input text. The base unit of the RNN is an LSTM, GRU, or their variants, e.g., bi-directional LSTM.
All the capsules take the same hidden vectors $H_1$ as their input, and they share an RNN named $\mathrm{RNN}_s$. Obviously, different aspects ought to attend to different parts of the input text, which is not well handled in RNN-Capsule [ ]. $\mathrm{RNN}_s$ allows capsules to communicate with each other, which prevents capsules from attending to conflicting parts. The high-level hidden representations are the hidden vectors of $\mathrm{RNN}_s$. High-level here means that the input of $\mathrm{RNN}_s$ is the hidden representation of $\mathrm{RNN}_e$. There are specially designed attentions in each capsule, which we explain in Section 3.3. Each capsule outputs an aspect probability and a sentiment distribution, through its aspect probability module and sentiment distribution module, respectively. Analyzer refers to the training objective strategy, which utilizes the correlation between aspect detection and aspect-level sentiment classification to improve the performance.
3.3 Structure of A Single Capsule
The structure of a single capsule is shown in Figure 2. A capsule contains an attribute, a state, a capsule embedding, and four modules (aspect representation module, aspect probability module, sentiment representation module, and sentiment distribution module).

Figure 2: The architecture of a single capsule. The input to a capsule is the hidden vectors $H_1$ from $\mathrm{RNN}_e$. $e_c$ is the capsule embedding. The output is the aspect probability $p$ and sentiment distribution $P$ of this capsule.
• The attribute of a capsule reflects its dedicated aspect category, which is pre-assigned when we build the capsule. Depending on the number of aspect categories in a given problem, the same number of capsules are built, one for each aspect category.
• The state of a capsule, i.e., ‘active’ or ‘inactive’, is determined by the aspect probability. During training, a capsule’s state is active if the current aspect appears in the input text; AS-Capsules then maximizes the aspect probability of the active capsule. In testing, a capsule’s state is active if its aspect probability is above a predefined threshold, e.g., 0.5. Note that a given piece of text may contain opinions on multiple aspects, in which case multiple capsules will be active.
• Capsule embedding $e_c$ is a vector representation of the current capsule learned during training as in [ ]. Recall that each capsule is assigned to learn one particular embedding, and different aspects often demonstrate different word features for expressing the aspect and sentiment tendency. For example, “the pizza is delicious” demonstrates the food aspect with positive sentiment. Here, the words “pizza” and “delicious” will not be applicable to other aspects like service or price. Therefore, for each capsule, we learn its capsule embedding.
• Aspect representation module learns the aspect representation $r_a$, which includes three parts, $v_a^h$, $v_a^l$ and $v_s$, as shown in Figure 2. Given the hidden representations $H_1$ as input, we compute the high-level representation $H_2$ through $\mathrm{RNN}_s$. The first part $v_a^h$ is the weighted representation using the aspect-based attention $AT_{asp}$ with capsule embedding $e_c$ as input. Utilizing the attention scores $\alpha_a$, we can get the second representation $v_a^l$ for the aspect. The shared representation $v_s$, which is the output of the shared attention, is the last part of $r_a$; $v_s$ is shared with the sentiment representation module. The aspect probability module then predicts the capsule’s aspect probability $p$ based on $r_a$. Similarly, the sentiment representation module computes the sentiment representation $r_o$ of this capsule through a similar method. Based on $r_o$, the sentiment distribution module generates the sentiment distribution $P$ on the predefined sentiment categories, e.g., positive, negative, and neutral.
The essence of the AS-Capsules model is the four modules briefed
above. Next, we detail the four modules.
Aspect Representation Module. Given the hidden vectors $H_1$ encoded by $\mathrm{RNN}_e$, we are able to compute the high-level hidden representations $H_2$ through $\mathrm{RNN}_s$, which is shared by all capsules. Then, we use the capsule embedding $e_c$ and two attention mechanisms, known as aspect-based attention $AT_{asp}$ and shared attention, to construct the aspect representation $r_a$ inside each capsule. The aspect-based attention attends to the indicative words based on the aspect detection task. The shared attention benefits both aspect detection and sentiment classification. Our formulation is inspired by [ ].

Specifically, given $H_1$ as the low-level representations of the input text, we get the high-level hidden representations $H_2$ through $\mathrm{RNN}_s$, where $\mathrm{RNN}_s$ is the shared encoder. The input of $\mathrm{RNN}_s$ is the concatenation of $H_1$ and $e_c$, to enforce the RNN to attend to the aspect-correlated parts, inspired by AE-LSTM [ ]. After getting $H_2$, we compute the high-level aspect representation $v_a^h$ and attention weights $\alpha_a$ through aspect-based attention $AT_{asp}$, or formally:

$$v_a^h, \alpha_a = AT_{asp}(H_2, e_c)$$

Aspect-based attention, sentiment-based attention and shared attention have the same structure, so here we only detail the attention mechanism in $AT_{asp}$. The operator $\otimes$ repeatedly concatenates the capsule embedding $e_c$, and the weight matrices of the layer are the parameters of the current capsule for the aspect-based attention layer. The attention importance score $\alpha_a$ is obtained by multiplying the representations with the weight matrix, and then normalizing to a probability distribution over the words. Lastly, the high-level aspect representation vector $v_a^h$ is a weighted summation over all the positions using the attention importance scores as weights.
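The attention steps described above (score each position, normalize over the words, take the weighted sum) can be sketched as follows. This is only an illustration under stated assumptions: the exact parameterization of the paper's attention is not reproduced here, an additive tanh scoring form is assumed, and the names `attention`, `w_h`, `w_e` and `w_score` are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def softmax(scores):
    """Normalize scores into a probability distribution (numerically stable)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(hidden, embedding, w_h, w_e, w_score):
    """Return (weighted representation, attention weights).

    hidden:    list of N hidden vectors (one per position)
    embedding: capsule embedding, paired with every position
    w_h, w_e:  projection matrices given as lists of rows (assumed form)
    w_score:   vector mapping each projected position to a scalar score
    """
    scores = []
    for h in hidden:
        projected = [math.tanh(dot(row_h, h) + dot(row_e, embedding))
                     for row_h, row_e in zip(w_h, w_e)]
        scores.append(dot(w_score, projected))
    alpha = softmax(scores)
    dim = len(hidden[0])
    v = [sum(a * h[d] for a, h in zip(alpha, hidden)) for d in range(dim)]
    return v, alpha


# Toy inputs: three positions with 2-dimensional hidden vectors.
H2 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
e_c = [0.5]
w_h = [[1.0, 0.0], [0.0, 1.0]]
w_e = [[0.1], [0.2]]
v_h, alpha_a = attention(H2, e_c, w_h, w_e, w_score=[1.0, -1.0])
```

The returned weights sum to one over the positions, so they can be reused to weight other representations of the same positions.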
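The shared attention is computed in a similar manner as the aspect-based attention.

To obtain the original, or low-level, aspect representation $v_a^l$ from the input of the capsule, we use the attention weights from aspect-based attention to weight the low-level input $H_1$:

$$v_a^l = \sum_{t=1}^{N} \alpha_{a,t}\, H_{1,t}$$

where $H_{1,t}$ is the low-level hidden vector at position $t$ and $\alpha_{a,t}$ is its attention weight. Lastly, we get the aspect representation by concatenating $v_a^h$, $v_a^l$ and $v_s$,

$$r_a = [v_a^h, v_a^l, v_s]$$

where square brackets refer to the concatenation of vectors.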
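A small numeric sketch of the last two steps: reusing the aspect-based attention weights over the low-level vectors, then concatenating the three parts. All numbers are made up for illustration, and `weighted_sum` is a hypothetical helper; the high-level and shared parts are given as fixed stand-in vectors rather than computed.

```python
def weighted_sum(weights, vectors):
    """Sum the vectors position by position, scaled by their weights."""
    dim = len(vectors[0])
    return [sum(w * vec[d] for w, vec in zip(weights, vectors))
            for d in range(dim)]


alpha_a = [0.2, 0.5, 0.3]                   # attention weights over 3 positions
H1 = [[1.0, 0.0], [0.0, 2.0], [4.0, 4.0]]   # low-level hidden vectors
v_h = [0.7, -0.1]                           # stand-in high-level part
v_s = [0.05, 0.9]                           # stand-in shared part

v_l = weighted_sum(alpha_a, H1)             # low-level part
r_a = v_h + v_l + v_s                       # list concatenation
```

With 2-dimensional parts, the aspect representation has 6 dimensions; the sentiment representation would be assembled the same way from its own attention weights.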
Aspect Probability Module. The aspect probability $p$ is computed by a sigmoid function after getting the aspect representation $r_a$. The parameters of this sigmoid layer belong to the aspect probability module of the current capsule.
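As a sketch of this module together with the test-time rule that a capsule is active when its aspect probability exceeds a threshold such as 0.5, consider the following. A single linear layer followed by a sigmoid is assumed; the names `aspect_probability`, `active_aspects`, `weights` and `bias` are hypothetical, and the numbers are toy values.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def aspect_probability(r_a, weights, bias):
    """Assumed form: sigmoid of a linear function of the aspect representation."""
    return sigmoid(sum(w * x for w, x in zip(weights, r_a)) + bias)


def active_aspects(probabilities, threshold=0.5):
    """Capsules whose aspect probability is above the threshold are active."""
    return [name for name, p in probabilities.items() if p > threshold]


# One (toy) capsule per aspect, each with its own parameters.
r_a = [1.0, 2.0]
probs = {
    "food": aspect_probability(r_a, [0.5, 0.5], 0.0),      # sigmoid(1.5)
    "service": aspect_probability(r_a, [1.0, 0.0], 0.0),   # sigmoid(1.0)
    "price": aspect_probability(r_a, [-1.0, -1.0], 0.0),   # sigmoid(-3.0)
}
active = active_aspects(probs)
```

Because each capsule is thresholded independently, several capsules can be active for the same text, which matches the multi-aspect setting.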
Sentiment Representation Module. The sentiment representation module is computed in a similar manner as the aspect representation module. The sentiment representation $r_o$ contains three parts, known as high-level sentiment representation $v_o^h$, low-level sentiment representation $v_o^l$ and shared representation $v_s$. The high-level sentiment representation $v_o^h$ is computed in sentiment-based attention:

$$v_o^h, \alpha_o = AT_{sen}(H_2, e_c)$$

After getting the attention importance scores $\alpha_o$ in the sentiment representation module, we utilize the low-level information through

$$v_o^l = \sum_{t=1}^{N} \alpha_{o,t}\, H_{1,t}$$

In the two formulas above, $AT_{sen}$ is the sentiment-based attention, $e_c$ is the capsule embedding, and $H_1$ and $H_2$ are the hidden representations of $\mathrm{RNN}_e$ and $\mathrm{RNN}_s$. The attention importance score for each position for sentiment is $\alpha_o$. The sentiment representation vector $r_o$ is obtained by concatenating $v_o^h$, $v_o^l$ and $v_s$.

Note that both the aspect representation and the sentiment representation obtained from the attention layer are high-level encodings of the entire input text. The attention mechanism designed in the model is for improving the model’s capability and robustness.
Sentiment Distribution Module. The sentiment distribution $P$ is computed by a softmax function after getting the sentiment representation vector $r_o$. The parameters of this softmax layer belong to the sentiment distribution module of the current capsule.
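A minimal sketch of this module, assuming a single linear layer followed by a softmax over the three polarities. The names `sentiment_distribution`, `weight_rows` and `biases` are hypothetical, and the numbers are toy values.

```python
import math

POLARITIES = ["positive", "neutral", "negative"]


def sentiment_distribution(r_o, weight_rows, biases):
    """Assumed form: softmax over one linear score per polarity."""
    logits = [sum(w * x for w, x in zip(row, r_o)) + b
              for row, b in zip(weight_rows, biases)]
    m = max(logits)                       # subtract max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


r_o = [1.0, -1.0]
P = sentiment_distribution(
    r_o,
    weight_rows=[[2.0, 0.0], [0.0, 0.0], [0.0, 2.0]],  # logits: 2, 0, -2
    biases=[0.0, 0.0, 0.0],
)
predicted = POLARITIES[P.index(max(P))]
```

The distribution is only meaningful for the aspect of the capsule that produced it, and it is only used when that capsule is active.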
The above four modules complement each other in the AS-Capsules model. From a macro perspective, i.e., the full dataset, the words attended by different capsules match the capsules’ attributes. From a micro perspective, i.e., a piece of input text, the state ‘active’ or ‘inactive’ of a capsule is determined by its aspect probability $p$. The sentiment distribution is with respect to the aspect category of the current capsule.
There are two different ways to utilize the correlation between aspect and sentiment. One way is to draw support from the capsule structure. In our first attempt, we designed the capsule structure with two independent attentions based on their own embeddings, named aspect embedding and sentiment embedding. However, the two independent embeddings have the same effect, and cannot benefit from the correlation between aspect and sentiment effectively. Motivated by shared embedding [ ] and multi-head attention [ ], in our current design, the aspect representation module and the sentiment representation module share one embedding, known as capsule embedding. Specifically, we design aspect-based attention, sentiment-based attention and shared attention. Aspect-based attention and sentiment-based attention are shared by the low-level and high-level RNN encoders to get hierarchical representations of the input text. Shared attention is used to generate a shared representation for both aspect detection and sentiment classification. The other way to utilize the correlation considers the co-occurrence of aspect and sentiment pairs. We use the idea of a mask. That is, we only keep the cross entropy of aspect(s) that appear in the input text and ignore the irrelevant aspect(s) in the learning objective.
3.4 Training Objective
The training of the proposed AS-Capsules model has two objec-
tives. One is to maximize the aspect probability of active capsule(s)
matching the ground truth and minimize aspect probability of the
inactive capsule(s). The other is to minimize the cross-entropy of
sentiment distribution of active capsules.
Aspect Probability Objective. A given text may express sentiments on multiple aspects. Hence, we have both positive sample(s) (i.e., the active capsule(s)) and negative sample(s) (i.e., the inactive capsule(s)). Recall that our objective is to maximize the aspect probability of active capsules and to minimize the aspect probability of inactive capsules. The classification objective $J$ can be formulated by cross entropy loss over the $M$ capsules. Here $p_i$ is the aspect probability of capsule $i$. For a given training instance, the target is 1 for an active capsule (i.e., the corresponding aspect occurs in the given text), and 0 for an inactive capsule. $M$ is the number of aspect categories, which is 5 here.
Aspect-level Sentiment Classification Objective. The other objective is to ensure the accuracy of aspect-level sentiment classification of the active capsule(s). Similarly, the unregularized objective $U$ can be formulated as cross entropy loss. We only utilize the cross-entropy loss of the ‘active’ capsule(s): the target is the ground-truth sentiment of the ‘active’ capsule $i$, and the prediction is the sentiment distribution of capsule $i$.

Considering both objectives, our final objective function is obtained by adding $J$ and $U$.
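The two objectives and the masking idea can be sketched for a single training instance as follows. This is an illustration under assumptions: standard binary cross entropy is assumed for the aspect objective, the losses are summed rather than averaged, and the function name `joint_loss`, its argument names and the small constant `eps` are hypothetical.

```python
import math


def joint_loss(aspect_probs, aspect_labels, sentiment_dists, sentiment_labels,
               eps=1e-12):
    """Return (J, U, J + U) for one training instance.

    aspect_probs:     aspect probability of each capsule
    aspect_labels:    1 for an active capsule, 0 for an inactive one
    sentiment_dists:  predicted sentiment distribution of each capsule
    sentiment_labels: gold polarity index for active capsules, None otherwise
    """
    j = 0.0  # aspect probability objective over all capsules
    u = 0.0  # sentiment objective, masked to active capsules only
    for p, y, dist, gold in zip(aspect_probs, aspect_labels,
                                sentiment_dists, sentiment_labels):
        j -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
        if y == 1:
            u -= math.log(dist[gold] + eps)
    return j, u, j + u


# Two capsules: the first is active with gold polarity index 0.
j, u, total = joint_loss(
    aspect_probs=[0.9, 0.2],
    aspect_labels=[1, 0],
    sentiment_dists=[[0.7, 0.2, 0.1], [0.3, 0.3, 0.4]],
    sentiment_labels=[0, None],
)
```

Because of the mask, the sentiment distribution of an inactive capsule has no influence on the loss, so the model is not penalized for sentiment predictions on aspects that do not occur in the text.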
We now evaluate the proposed AS-Capsules for aspect detection and aspect-level sentiment classification, against baselines.
4.1 Dataset and Model Implementation Details
We conduct experiments on the SemEval 2014 Task 4 dataset [ ].
The dataset consists of customer reviews for laptops and restaurants,
but only restaurant reviews are annotated with aspect-specific
polarity. Hence we conduct experiments on restaurant reviews.
The restaurant reviews consist of about 3K English sentences
from [ ]. We randomly cut out a portion as the validation dataset,
and the rest is the training dataset. Additional restaurant reviews,
not in the original dataset of [ ], are used as test data. The set of
aspects is {food, service, price, ambience, anecdote}. There are four
sentiment categories: positive, neutral, negative and conflict. Each
sentence is assigned one or more aspects together with a polarity
label for each aspect, e.g., “Staffs are not that friendly, but the
taste covers all.” would be assigned the aspect-sentiment pairs
{⟨food, positive⟩, ⟨service, negative⟩}. In our experiments, we use
the first three sentiment categories, as in most other studies.
Table 1 presents the statistics of the SemEval 2014 Task 4 dataset.
The two aspects food and anecdote have the largest number of
instances. However, it is hard to think of words that are indicative
of the aspect anecdote.
Table 1: The statistics of restaurant reviews on SemEval 2014
Task 4 dataset. The sentiment category ‘conflict’ is not used.
Aspect     Positive (Train / Test)   Negative (Train / Test)   Neutral (Train / Test)
Food       867 / 302                 209 / 69                  90 / 31
Price      179 / 51                  115 / 28                  10 / 1
Service    324 / 101                 218 / 63                  20 / 3
Ambience   263 / 76                  98 / 21                   23 / 8
Anecdote   546 / 127                 199 / 41                  357 / 51
Total      2179 / 657                839 / 222                 500 / 94
Implementation Details. In our experiments, all word vectors are
initialized by GloVe [ ]. The pre-trained word embeddings and
capsule embedding have dimensions of 300 and 256, respectively.
The dimension of hidden vectors encoded by the RNN is 256 (hence 512
if the RNN is bidirectional). We use a single-layer bidirectional
LSTM in AS-Capsules. The model is trained with a batch size of 16
examples, and there is a checkpoint every 8 mini-batches due to the
small size of the dataset. Dropout is 0.5 for the word embedding and
for the linear layers in the aspect probability module and the
sentiment distribution module.
We implement our model in PyTorch (version 0.4). Model parameters
are randomly initialized. Adam [ ] is utilized to optimize our model,
and we use 1e-3 as the learning rate for model parameters except word
vectors, and 1e-4 for word vectors.
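The settings above can be collected in a small configuration object,
sketched below. The class and field names are hypothetical, and the
two learning rates are read as 1e-3 and 1e-4, which is an assumption
about the exponent notation in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ASCapsulesConfig:
    """Hyperparameters as described in the text (field names are illustrative)."""
    word_dim: int = 300            # pre-trained word embedding size
    capsule_dim: int = 256         # capsule embedding size
    hidden_dim: int = 256          # LSTM hidden size per direction
    bidirectional: bool = True     # single-layer bidirectional LSTM
    num_layers: int = 1
    batch_size: int = 16
    checkpoint_every: int = 8      # mini-batches between checkpoints
    dropout: float = 0.5           # word embedding and linear layers
    lr_model: float = 1e-3         # assumed reading of the garbled value
    lr_word_vectors: float = 1e-4  # assumed reading of the garbled value

    @property
    def encoder_output_dim(self) -> int:
        # Forward and backward states are concatenated when bidirectional.
        return self.hidden_dim * (2 if self.bidirectional else 1)
```

With the defaults, `ASCapsulesConfig().encoder_output_dim` is 512, in
line with the hidden size stated for the bidirectional case.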
4.2 Evaluation on Three Subtasks
The SemEval 2014 Task 4 dataset is widely used for aspect-level
sentiment analysis. Because of the detailed annotation, multiple
evaluations can be conducted. We report our experiments on three
subtasks: aspect detection, sentiment classification on given
aspects, and aspect-level sentiment analysis.
4.2.1 Subtask 1: Aspect Detection. Given a piece of text, the task
of aspect detection is to predict the existence of predefined
aspects, which is a typical multi-label classification task. We
compare AS-Capsules with state-of-the-art baselines designed for
aspect detection. Specifically, we also compare the detailed F1 of
all categories with effective joint deep learning baselines including
Bi-LSTM, AE-LSTM, AT-LSTM, and RNN-Capsule.
Table 2: Micro-F1 of methods on SemEval 2014 Task 4 dataset.
Best results are in bold face and second best underlined.
Model             micro-F1
KNN               63.9
LR                66.0
SVM               80.8
SemEval-Average   73.8
NRC-Lexicon       84.1
NRC               88.6
HLBL              69.7
C&W               72.5
word2vec          83.3
Hybrid-WRL-300    88.6
Hybrid-WRL-Best   90.1
AS-Capsules       89.6
KNN is the baseline provided by the SemEval organizers [ ]. First,
the Dice coefficient is used to calculate the similarity to find the
k most similar sentences in the training dataset for a given test
sentence. Then, the m most frequent aspect categories of the k
retrieved sentences are assigned to the test sentence, where m is the
most frequent number of aspect categories per sentence among the k
sentences. Logistic Regression (LR) and Support Vector Machine (SVM)
are used as classifiers using unigram and bigram features.
SemEval-Average is the average result of all the systems in SemEval
2014. NRC, the best system in SemEval 2014, adopts SVM as the
classifier with some well-designed features including n-grams,
stemmed n-grams, character n-grams, non-contiguous n-grams, word
cluster n-grams and lexicons. NRC-Lexicon is the result without the
lexicon feature.
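The KNN baseline described above can be sketched in a few lines. This
is a minimal illustration, assuming that sentences are compared as
sets of tokens and that k and m have the meaning given in the text;
the function names and the default value of k are hypothetical.

```python
from collections import Counter


def dice(a, b):
    """Dice coefficient between two token collections, treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def knn_aspects(test_tokens, train, k=3):
    """Assign aspect categories to a test sentence.

    train: list of (tokens, aspects) pairs from the training data.
    """
    # Retrieve the k training sentences most similar to the test sentence.
    neighbours = sorted(
        train, key=lambda ex: dice(test_tokens, ex[0]), reverse=True
    )[:k]
    # m: the most frequent number of aspect labels per retrieved sentence.
    m = Counter(len(aspects) for _, aspects in neighbours).most_common(1)[0][0]
    # Assign the m most frequent aspect labels among the neighbours.
    label_counts = Counter(a for _, aspects in neighbours for a in aspects)
    return [label for label, _ in label_counts.most_common(m)]
```

Ties between equally frequent labels are resolved by first
occurrence here, which is a choice of this sketch.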
Some word representation methods are also compared with our
AS-Capsules. C&W [ ] and word2vec [ ] are powerful and generally
accepted methods for word representation learning. HLBL [ ] is a
hierarchical model, which performs well using a carefully constructed
hierarchy over words. Hybrid-WRL [ ] is a word representation
learning method using hybrid features including shared features and
aspect-specific features.
Table 2 lists the micro-F1 of aspect detection on restaurant
reviews. Our proposed AS-Capsules model achieves the second best
result. Notice that there are two kinds of Hybrid-WRL methods. The
difference between them is the size of the word representation:
Hybrid-WRL-300 uses 300, and Hybrid-WRL-Best uses 600. Our
AS-Capsules adopts 300 as the dimension of word representation, and
improves on Hybrid-WRL by 1 percentage point at the same word
representation size (89.6 vs. 88.6). A bigger word representation
brings more complexity and more parameters. For classical machine
learning methods, SVM performs better than KNN and LR. NRC achieves
the best SemEval result with textual features and lexicon features,
using SVM as the classifier. Without the lexicon features,
NRC-Lexicon is about 4.5 percentage points lower than NRC. Word
representation methods perform better than KNN and LR, and word2vec
and Hybrid-WRL also outperform SVM; however, apart from Hybrid-WRL,
they perform worse than NRC with its well-designed features.
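For reference, micro-F1 for a multi-label task such as aspect
detection pools true positives, predictions and gold labels over all
sentences before computing precision and recall. The sketch below
shows this standard computation; the function name and the
set-per-sentence input format are assumptions made for illustration.

```python
def micro_f1(gold, pred):
    """Micro-averaged F1 for multi-label predictions.

    gold, pred: lists of sets of aspect labels, one set per sentence.
    """
    tp = sum(len(g & p) for g, p in zip(gold, pred))  # correctly predicted labels
    n_pred = sum(len(p) for p in pred)                # all predicted labels
    n_gold = sum(len(g) for g in gold)                # all gold labels
    if tp == 0:
        return 0.0
    precision = tp / n_pred
    recall = tp / n_gold
    return 2 * precision * recall / (precision + recall)
```

Because counts are pooled, frequent aspects such as food weigh more
in micro-F1 than rare ones such as price.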
Although classical machine learning methods have shown their
effectiveness, it is better to compare with neural network methods.
Bidirectional-LSTM (Bi-LSTM) is a variant of LSTM which is
introduced in Section 3.1. It is capable of utilizing the content
information through the bidirectional structure. Aspect Embedding
LSTM (AE-LSTM) is proposed in [ ], where aspect embeddings are
concatenated with word vectors. AE-LSTM considers aspect information,
so it is expected to perform better than Bi-LSTM. Different parts of
the input sentence have different importance, so attention is a
powerful way to address this problem. Attention-based LSTM (AT-LSTM)
[ ] further uses an attention mechanism, which performs better than
other baselines. RNN-Capsule uses each capsule to detect one
category: a sentiment category in its original paper [ ], and an
aspect in this experiment.
Table 3: The average F1 and F1 of different aspects for Subtask 1:
aspect detection.
Model Average Food Price Service Ambience Anecdote
Bi-LSTM 83.0 92.6 79.2 88.8 74.8 79.6
AE-LSTM 84.5 93.8 83.3 88.8 78.2 78.5
AT-LSTM 84.5 91.6 83.0 85.6 83.0 79.2
RNN-Capsule 85.5 93.8 85.4 89.4 80.6 78.3
AS-Capsules 87.2 93.4 85.9 91.0 83.3 82.4
Table 4: Accuracy and F1 of sentiment categories for Subtask 2:
sentiment classification on given aspects.
Model Accuracy F1-Positive F1-Neutral F1-Negative
Bi-LSTM 82.1 89.2 49.7 73.2
AE-LSTM 82.8 89.8 54.2 72.6
AT-LSTM 83.0 89.8 47.3 75.8
ATAE-LSTM 84.3 90.1 61.9 77.5
AS-Capsules 85.0 91.2 50.7 78.7
As reported in Table 3, our proposed AS-Capsules is the best
performing method, followed by RNN-Capsule. Specifically, AS-Capsules
achieves the best average F1 and the best results on four aspects,
all except food. The gap between AS-Capsules and the best model on
aspect food is very small (93.4 vs. 93.8). AE-LSTM and AT-LSTM
deliver similar results, and Bi-LSTM performs the poorest.
4.2.2 Subtask 2: Sentiment Classification on Given Aspects. Given
a piece of text and the annotated aspect, the task is to predict
the sentiment expressed in the text on the given aspect. For this
subtask, we compare AS-Capsules with four baselines: Bi-LSTM,
AE-LSTM, AT-LSTM, and Attention-based LSTM with Aspect Embedding
(ATAE-LSTM) [ ]. ATAE-LSTM utilizes aspect information to attend to
the important words in a sentence with respect to a specific aspect.
Note that RNN-Capsule cannot be applied to this subtask because
RNN-Capsule is self-attentive and cannot take the aspect as an
additional input.
From the results reported in Table 4, we observe that AS-Capsules
achieves the best accuracy of 85.0. It outperforms all baselines with
respect to the F1 results on the positive and negative sentiment
categories. On the neutral category, however, AS-Capsules (50.7)
falls behind ATAE-LSTM (61.9) and AE-LSTM (54.2), which we attribute
to the small amount of neutral data in the dataset. Among the
baseline methods, ATAE-LSTM outperforms the rest. AE-LSTM and AT-LSTM
perform better than Bi-LSTM. Among the three sentiment categories,
positive is much easier to predict and neutral is the most difficult
category because it has the smallest amount of data.
4.2.3 Subtask 3: Aspect-level Sentiment Analysis. Given a piece
of text, subtask 3 requires a method to detect ⟨aspect, sentiment⟩
pair(s) from the input text. A detected pair is considered correct if
both components in the pair are correctly identified. We compare
AS-Capsules with Bi-LSTM, AE-LSTM and AT-LSTM. Again, RNN-Capsule
is not designed to classify sentiment with given aspect(s)
and ATAE-LSTM needs the aspect as additional input; hence these two
models are not applicable to this task.
Table 5: Accuracy and F1 of different sentiment categories
for Subtask 3: aspect-level sentiment analysis.
Model Accuracy F1-Positive F1-Neutral F1-Negative
Bi-LSTM 62.3 80.2 43.6 56.6
AE-LSTM 64.7 82.4 50.3 55.7
AT-LSTM 65.6 82.4 46.6 57.7
AS-Capsules 68.1 83.3 53.6 61.6
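The pair-level correctness criterion can be made concrete with a
short sketch: a predicted pair counts only if both the aspect and the
sentiment match a gold pair of the same sentence. The function name
and the precision/recall summary are illustrative assumptions; the
text does not specify how the reported accuracy is aggregated.

```python
def pair_precision_recall(gold, pred):
    """Score predicted (aspect, sentiment) pairs against gold pairs.

    gold, pred: lists of sets of (aspect, sentiment) tuples, one set
    per sentence. A pair is correct only if both components match.
    """
    correct = sum(len(g & p) for g, p in zip(gold, pred))
    n_pred = sum(len(p) for p in pred)
    n_gold = sum(len(g) for g in gold)
    precision = correct / n_pred if n_pred else 0.0
    recall = correct / n_gold if n_gold else 0.0
    return correct, precision, recall
```

In this scheme a pair with the right aspect but the wrong sentiment
earns no credit, which is why errors from both stages show up in the
subtask 3 scores.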
As reported in Table 5, the AS-Capsules model delivers the best
results on accuracy. It also outperforms all baselines with respect
to the F1 results on the three sentiment categories. AE-LSTM and
AT-LSTM perform similarly and both outperform Bi-LSTM. Similar to
earlier observations, the positive sentiment is relatively easier to
detect and neutral is the hardest category.
We have shown that the AS-Capsules model outperforms all baseline
models for aspect-level sentiment analysis. Because of the attention
mechanism, the AS-Capsules model is able to attend to meaningful
words in the aspect representation module and the sentiment
representation module. Specifically, each word is assigned an
attention weight in the aspect representation module and in the
sentiment representation module by multiplying the aspect probability
with the aspect attention weight and the sentiment attention weight,
respectively.
As a case study, we show the words attended by AS-Capsules
during test. That is, after all test instances are evaluated, we
obtain two lists of attended words from each capsule with their
attention weights, for aspect and sentiment respectively. Due to the
page limit, we can only display a small number of words selected by
some ranking criteria. A straightforward ranking is by the averaged
attention weight of a word. By this ranking, most top-ranked words
are of low frequency. That is, some words have significant attention
weight (or strong aspect or sentiment tendencies) but do not appear
very often. Another way of ranking is by the product of averaged
attention weight and the logarithm of word frequency.
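The two ranking criteria can be sketched as follows. The input format
(one (word, weight) pair per occurrence), the function name and the
use of the natural logarithm are assumptions of this sketch; the text
does not state the base of the logarithm.

```python
import math
from collections import defaultdict


def rank_attended_words(observations, top_n=20, use_frequency=True):
    """Rank words by attention.

    observations: iterable of (word, attention_weight), one per occurrence.
    use_frequency=False ranks by average attention weight;
    use_frequency=True ranks by average weight * log(word frequency).
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for word, weight in observations:
        totals[word] += weight
        counts[word] += 1

    def score(word):
        avg = totals[word] / counts[word]
        return avg * math.log(counts[word]) if use_frequency else avg

    return sorted(totals, key=score, reverse=True)[:top_n]
```

Note that with the product criterion a word seen only once scores
zero, since log(1) = 0, so this ranking favours words that are both
strongly attended and reasonably frequent.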
Words Attended for Aspect Detection. Table 7a lists the top-ranked
20 words by the product of average attention weight and logarithm of
word frequency, for the five aspects. Table 7b lists the top-ranked
20 words by average attention weight.
From the two tables, we observe that almost all attended words
are self-explanatory for the assigned aspect category. For instance,
the words attended by capsule food are mostly food categories or
ingredients. Capsule service attends to words for the personnel
involved and their attitude. Without the need of any linguistic
knowledge, the AS-Capsules model is able to identify words that
reflect the aspect categories. This provides an easy way to build
domain-specific lexicons on domain-specific data.
Table 6: Words attended by the sentiment representation module
in capsule food. Significant words and low frequency words
are ranked by average attention weight × log(word frequency)
and average attention weight, respectively.
No. Significant words Freq Low frequency words Freq
1 delicious 22 divine 1
2 great 67 satisfying 2
3 tasty 8 favourites 1
4 good 55 scrumptious 2
5 best 24 terrific 1
6 fresh 23 yummy 5
7 yummy 5 greatest 1
8 excellent 23 frosty 1
9 amazing 7 delicious 22
10 outstanding 5 fave 1
11 fantastic 5 luscious 1
12 wonderful 8 tasty 8
13 mouth 5 unexpected 1
14 superb 3 winner 1
15 delectable 3 refreshing 2
16 recommend 8 recomend 1
17 satisfying 2 flavor 2
18 scrumptious 2 highlight 1
19 perfect 6 lemons 1
20 sweet 5 delicate 1
Words Attended for Aspect-level Sentiment Classification.
We show the sentiment words attended by AS-Capsules for capsule
food with positive sentiment tendency. Other aspects are omitted
due to limited space. Similarly, Table 6 lists the top-ranked 20
significant words and low frequency words, ranked by average
attention weight × logarithm of word frequency and by average
attention weight, respectively. These words are consistent with
sentiment lexicons identified in related studies [ ]. All of those
words reflect the positive sentiment tendency significantly. The
attention weights of all significant words are above 0.35. More
importantly, these words are commonly used. For low frequency words,
the attention weights are over 0.77. Even though these words are not
used very often, they form a small group of strongly positive words.
‘delicious’ appears in both columns because it has a very positive
sentiment tendency and is also in common use. The neutral sentiment
attends to many punctuation marks and meaningless words, e.g.,
‘and’, ‘a’ and ‘is’, so they are not shown. Negative sentiment
attends to significant words including ‘terrible’, ‘worst’ and
‘disappointed’, whose attention weights are over 0.1.
5.1 Applying AS-Capsules on Yelp Reviews
To the best of our knowledge, the SemEval 2014 Task 4 dataset is the
only dataset that comes with aspect-level sentiment annotations.
This limits the evaluation of our model on aspect-level sentiment
analysis. On the other hand, many restaurant reviews are available
from other sources without aspect-level manual annotation. The
Yelp dataset is an example. As most reviews on Yelp are for
restaurants, we can directly apply the trained AS-Capsules to conduct
a qualitative evaluation. Because there are no aspect-level
annotations on the Yelp dataset, we are unable to report quantitative
measures like accuracy or F1. However, it is interesting to observe
whether the AS-Capsules model learned on the SemEval 2014 Task 4
dataset can be used to identify meaningful words on the Yelp dataset
that reflect its aspects and sentiment categories.
Table 7: Words attended by aspect representation module in all capsules on SemEval dataset.
(a) Top ranked words by average attention weight × log(word frequency).
No. Food Freq Price Freq Service Freq Ambience Freq Anecdote Freq
1 food 150 prices 18 service 76 atmosphere 21 ! 23
2 sushi 22 cheap 10 staff 28 decor 5 meal 3
3 pizza 14 price 11 waiter 10 ambiance 5 . 172
4 meal 12 priced 7 waiters 7 space 4 restaurant 20
5 menu 23 value 5 bartender 6 music 5 sushi 3
6 desserts 8 inexpensive 3 attentive 8 cozy 6 food 7
7 portions 7 expensive 4 waitress 4 ambience 3 menu 6
8 pasta 5 bill 4 owner 5 interior 2 experience 11
9 wine 14 overpriced 2 servers 3 room 4 place 28
10 sauce 11 money 3 hostess 3 intimate 4 italian 4
11 shrimp 7 affordable 2 friendly 22 quiet 3 in 43
12 soup 9 pay 3 courteous 3 clean 2 the 98
13 dessert 5 over 3 greeted 3 chic 2 dining 4
14 dishes 9 reasonable 5 politely 2 crowded 2 at 26
15 cheese 10 the 81 accomodating 2 scene 2 money 3
16 seafood 6 for 23 rude 9 downstairs 2 pizza 2
17 chicken 17 your 4 prompt 5 , 78 for 36
18 cuisine 4 reasonably 4 bartenders 2 place 21 night 7
19 crab 5 worth 4 manager 4 laid-back 4 here 17
20 crust 4 at 14 asked 3 like 8 this 53
(b) Words ranked by average attention weight.
No. Food Freq Price Freq Service Freq Ambience Freq Anecdote Freq
1 truffles 1 overpriced 2 service 76 claustrophobic 1 thai 1
2 hotdogs 1 pricey 1 politely 2 chairs 1 cost 1
3 meat 1 inexpensive 3 accomodating 2 bathroom 1 meal 3
4 meats 1 cheap 10 lady 1 patio 1 agave 1
5 wines 2 deal 1 servers 3 decor 5 service 1
6 sangria 2 prices 18 hostess 3 chill 1 desserts 1
7 pancakes 1 price 11 solicitous 1 decoration 1 delicacy 1
8 mojitos 1 priced 7 brusquely 1 landscaping 1 sum 1
9 codfish 1 value 5 bartenders 2 pretentious 1 pizza 2
10 crepes 1 wallet 1 courteous 3 singer 1 tacos 1
11 pizzas 1 expensive 4 port 1 setting 1 sushi 3
12 breads 1 cost 1 waitress 4 lighting 1 martinis 1
13 guacamole 1 bill 4 greeted 3 atmosphere 21 delivary 1
14 tequila 1 spend 1 polite 2 space 4 crepes 1
15 cookies 1 14 1 staff 28 bumping 1 steak 1
16 pudding 1 cheaper 1 awful 1 rooftop 1 astoria 1
17 marscapone 1 investment 1 waiters 7 uncomfortably 1 taste 1
18 meatball 1 50 1 gracious 1 air 1 brunch 2
19 bbq 2 affordable 2 bartender 6 ambiance 5 diamond 1
20 chili 2 6.25 1 owner 5 interior 2 stock 1
In this case study, we take the first 1,000 reviews from the Yelp
dataset. As Yelp reviews are relatively long and each review has
several paragraphs, we split the reviews by paragraph and consider
each paragraph a test input to the AS-Capsules model. As a result,
we have 3,776 test instances from the Yelp dataset.
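The preprocessing step described here can be sketched as below. It
assumes that paragraphs within a review are separated by blank lines,
which is an assumption about the raw text format; the function names
are hypothetical.

```python
import re


def split_into_paragraphs(review):
    """Split one review into non-empty paragraphs (blank-line separated)."""
    parts = re.split(r"\n\s*\n", review)
    return [p.strip() for p in parts if p.strip()]


def build_test_instances(reviews, limit=1000):
    """Take the first `limit` reviews and treat each paragraph as one input."""
    instances = []
    for review in reviews[:limit]:
        instances.extend(split_into_paragraphs(review))
    return instances
```

Each returned paragraph would then be fed to the trained model as a
separate test input.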
Words Attended for Aspect Detection on Yelp. Table 8 lists the
top ranked words identified by AS-Capsules for the five aspects.
We provide two rankings following our earlier ranking criteria.
Observe that, regardless of word frequency, most of them reflect
the corresponding aspect category well. For example, ‘beers’, ‘sauce’
and ‘pizza’ are identified as important words for food. Interestingly,
most low frequency words are meaningful to the related aspects. For
example, ‘porridge’, ‘terrine’ and ‘fillings’ are attended by capsule
food. More interestingly, some numbers, e.g., 45, 50 and 20, are
attended in capsule price. This phenomenon reflects that our
AS-Capsules model performs well for low frequency words.
Table 8: Words attended by aspect representation module on Yelp dataset.
No. Ranked by average attention weight × log(word frequency) Ranked by average attention weight
Food Price Service Ambience Anecdote Food Price Service Ambience Anecdote
1 food prices service atmosphere ! chilaquiles pricy managers jazz gelato
2 beers cheap staff decor craving porridge affordable station scene pies
3 sauce price friendly patio again terrine overpriced staffs smoke greeted
4 beer overpriced polite cramped breakfast fillings overcharged unattentive pretension catering
5 menu priced server ambiance sushi oatmeal prices waitstaff cramped panini
6 foods expensive waitress music eaten miso inexpensive politely noisy intriguing
7 pizza pricey employees vibe pasta sauces cheap greeting venues eww
8 meal deal courteous crowded burger milkshakes priced apologetic cozy ahhh...
9 cheese cost greeted bathroom buying wintermelon expensive answering rowdy cakes
10 meat buy attentive rooms dinner concepts pricey polite atmosphere sandwiches
11 chicken cheaper bartender seating eat beers price courteous decor burgers
12 sushi dollar servers cozy grocery unagi cheaper marketing nondescript treated
13 beef pay manager walls menu shakes deal service crowded soeur
14 dessert affordable waiter relaxing sandwiches cheeses 45 greet pretentious tidy
15 burger bucks desk floor service foods costs handled pop salty
16 sauces million patient noisy dining appetite cost staff interior pasta
17 pork pricing waitstaff space lunch concoctions 600 servers jukebox donut
18 ingredients buck apologetic loud heaven maya pricing manager claustrophobic digress
19 burgers charged answering air brunch tartare 50 employees ornate nooo...
20 lemonade inexpensive prompt interior steakhouse pig 25 greeted mellow bagel
Table 9: Words attended by sentiment representation module
in capsule food on Yelp dataset.
No. Significant words Freq Low frequency words Freq
1 delicious 108 dynamic 1
2 tasty 51 delicious 108
3 yummy 31 yummy 31
4 excellent 28 adventurous 3
5 fresh 51 friendliest 1
6 fantastic 37 satisfying 6
7 great 203 superb 2
8 good 390 tasty 51
9 amazing 71 meaty 1
10 flavorful 13 palates 1
11 flavor 79 heavenly 6
12 satisfying 6 scrumptious 4
13 flavour 13 chai 1
14 incredible 14 efficient 1
15 heavenly 6 loves 3
16 awesome 34 delectable 1
17 enjoy 44 excellent 28
18 flavors 20 palate 3
19 enjoying 9 flavorful 13
20 goodness 9 bite 6
Words Attended for Aspect-level Sentiment Classification on Yelp.
We now present the words attended by the sentiment representation
module. Table 9 lists the top ranked significant words
and low frequency words identified by capsule food for positive
sentiments. Most of the words attended by neutral are not very
meaningful, so they are not shown. Most of the attended words
fit their sentiment tendency in sentiment lexicons. In particular, we
observe that most of them also reflect the food aspect. As we
mentioned before, ‘delicious’ is a very common word for describing
food; it also appears among the low frequency words because that
column ranks the attended words by attention weight alone.
In this paper, we study aspect-level sentiment analysis and propose
the aspect-level sentiment capsules (AS-Capsules) model. The key idea
of the AS-Capsules model is to use a capsule structure to focus on
each aspect category. Each capsule outputs its aspect probability and
sentiment distribution on the targeted aspect. The objective of
learning is to maximize the aspect probability of the capsule(s)
matching the ground truth and to minimize its (their) sentiment
cross-entropy loss. Through shared components including capsule
embedding, shared encoders and shared attentions, our model utilizes
the correlation between aspects and corresponding sentiments
effectively. Experiments show that the proposed AS-Capsules model
achieves state-of-the-art performance without the need of linguistic
knowledge. We show that the capsules are able to identify the words
that best reflect the aspect category and sentiment tendency. We also
show that the model can be directly applied to restaurant reviews on
Yelp, demonstrating its effectiveness and robustness.
This work was partially supported by Singapore Ministry of Educa-
tion Research Fund MOE2014-T2-2-066, the National Key R&D Pro-
gram of China (Grant No. 2018YFC0830200), the National Science
Foundation of China (Grant No. 61876096/61332007), and China
Scholarship Council.
[1] Abdulaziz Alghunaim, Mitra Mohtarami, Scott Cyphers, and Jim Glass. 2015. A
Vector Space Approach for Aspect Based Sentiment Analysis. In Proc. NAACL
HLT. 116–122.
[2] Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2014. Neural Machine
Translation by Jointly Learning to Align and Translate. CoRR abs/1409.0473
[3] Ronan Collobert and Jason Weston. 2008. A unified architecture for natural
language processing: deep neural networks with multitask learning. In Proc.
ICML. 160–167.
[4] Bjarke Felbo, Alan Mislove, Anders Søgaard, Iyad Rahwan, and Sune Lehmann.
2017. Using millions of emoji occurrences to learn any-domain representations
for detecting sentiment, emotion and sarcasm. In Proc. EMNLP. 1615–1625.
[5] Gayatree Ganu, Noemie Elhadad, and Amélie Marian. 2009. Beyond the Stars:
Improving Rating Predictions using Review Text Content. In Proc. WebDB.
[6] Zhen Hai, Kuiyu Chang, and Jung-jae Kim. 2011. Implicit Feature Identification
via Co-occurrence Association Rule Mining. In Proc. Computational Linguistics
and Intelligent Text Processing - International Conference, CICLing, Part I. 393–404.
[7] Ruidan He, Wee Sun Lee, Hwee Tou Ng, and Daniel Dahlmeier. 2018. Exploiting
Document Knowledge for Aspect-level Sentiment Classification. In Proc. ACL,
Volume 2: Short Papers. 579–585.
[8] Geoffrey E. Hinton, Alex Krizhevsky, and Sida D. Wang. 2011. Transforming
Auto-Encoders. In Proc. ICANN, Part I. 44–51.
[9] Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long Short-Term Memory.
Neural Computation 9, 8 (1997), 1735–1780.
[10] Minqing Hu and Bing Liu. 2004. Mining Opinion Features in Customer Reviews.
In Proc. National Conference on Artificial Intelligence, Conference on Innovative
Applications of Artificial Intelligence. 755–760.
[11] Niklas Jakob and Iryna Gurevych. 2010. Extracting Opinion Targets in a Single
and Cross-Domain Setting with Conditional Random Fields. In Proc. EMNLP.
[12] Nobuhiro Kaji and Masaru Kitsuregawa. 2007. Building Lexicon for Sentiment
Analysis from Massive Collection of HTML Documents. In Proc. EMNLP-CoNLL.
[13] Nal Kalchbrenner, Edward Grefenstette, and Phil Blunsom. 2014. A Convolutional
Neural Network for Modelling Sentences. In Proc. ACL, Volume 1: Long Papers.
[14] Yoon Kim. 2014. Convolutional Neural Networks for Sentence Classification. In
Proc. EMNLP. 1746–1751.
[15] Diederik P. Kingma and Jimmy Ba. 2014. Adam: A Method for Stochastic
Optimization. CoRR abs/1412.6980 (2014).
[16] Tao Lei, Regina Barzilay, and Tommi S. Jaakkola. 2015. Molding CNNs for text:
non-linear, non-consecutive convolutions. In Proc. EMNLP. 1565–1575.
[17] Bing Liu. 2012. Sentiment Analysis and Opinion Mining. Morgan & Claypool
[18] Qiao Liu, Haibin Zhang, Yifu Zeng, Ziqi Huang, and Zufeng Wu. 2018. Content
Attention Model for Aspect Based Sentiment Analysis. In Proc. WWW. 1023–
[19] Tomáš Mikolov. 2012. Statistical language models based on neural networks.
Presentation at Google, Mountain View, 2nd April (2012).
[20] Tomas Mikolov, Ilya Sutskever, Kai Chen, Gregory S. Corrado, and Jeffrey Dean.
2013. Distributed Representations of Words and Phrases and their
Compositionality. In Proc. NIPS. 3111–3119.
[21] Andriy Mnih and Geoffrey E. Hinton. 2008. A Scalable Hierarchical Distributed
Language Model. In Proc. NIPS. 1081–1088.
[22] Saif Mohammad, Svetlana Kiritchenko, and Xiaodan Zhu. 2013. NRC-Canada:
Building the State-of-the-Art in Sentiment Analysis of Tweets. In Proc. Workshop
on Semantic Evaluation, SemEval@NAACL-HLT. 321–327.
[23] Tony Mullen and Nigel Collier. 2004. Sentiment Analysis using Support Vector
Machines with Diverse Information Sources. In Proc. EMNLP. 412–418.
[24] Bo Pang and Lillian Lee. 2007. Opinion Mining and Sentiment Analysis.
Foundations and Trends in Information Retrieval 2, 1-2 (2007), 1–135.
[25] Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. Glove:
Global Vectors for Word Representation. In Proc. EMNLP. 1532–1543.
[26] Verónica Pérez-Rosas, Carmen Banea, and Rada Mihalcea. 2012. Learning
Sentiment Lexicons in Spanish. In Proc. LREC. 3077–3081.
[27] Maria Pontiki, Dimitris Galanis, John Pavlopoulos, Harris Papageorgiou, Ion
Androutsopoulos, and Suresh Manandhar. 2014. SemEval-2014 Task 4: Aspect Based
Sentiment Analysis. In Proc. Workshop on Semantic Evaluation, SemEval@COLING.
[28] Maria Pontiki, Dimitris Galanis, John Pavlopoulos, Harris Papageorgiou, Ion
Androutsopoulos, and Suresh Manandhar. 2014. SemEval-2014 Task 4: Aspect
Based Sentiment Analysis. In Proc. SemEval@COLING. 27–35.
[29] Ana-Maria Popescu and Oren Etzioni. 2005. Extracting Product Features and
Opinions from Reviews. In Proc. HLT/EMNLP. 339–346.
[30] Qiao Qian, Minlie Huang, JinHao Lei, and Xiaoyan Zhu. 2017. Linguistically
Regularized LSTMs for Sentiment Classification. In Proc. ACL, Vol. 1. 1679–1689.
[31] Delip Rao and Deepak Ravichandran. 2009. Semi-Supervised Polarity Lexicon
Induction. In Proc. EACL. 675–682.
[32] Seyyed Aref Razavi and Masoud Asadpour. 2017. Word embedding-based
approach to aspect detection for aspect-based summarization of persian customer
reviews. In Proc. IML. 33:1–33:10.
[33] Alexander M. Rush, Sumit Chopra, and Jason Weston. 2015. A Neural Attention
Model for Abstractive Sentence Summarization. In Proc. EMNLP. 379–389.
[34] Sara Sabour, Nicholas Frosst, and Geoffrey E. Hinton. 2017. Dynamic Routing
Between Capsules. In Proc. NIPS. 3859–3869.
[35] Kim Schouten and Flavius Frasincar. 2016. Survey on Aspect-Level Sentiment
Analysis. IEEE Trans. Knowl. Data Eng. 28, 3 (2016), 813–830.
[36] Richard Socher, Jeffrey Pennington, Eric H. Huang, Andrew Y. Ng, and
Christopher D. Manning. 2011. Semi-Supervised Recursive Autoencoders for Predicting
Sentiment Distributions. In Proc. EMNLP. 151–161.
[37] Richard Socher, Alex Perelygin, Jean Y Wu, Jason Chuang, Christopher D
Manning, Andrew Y Ng, and Christopher Potts. 2013. Recursive deep models for
semantic compositionality over a sentiment treebank. In Proc. EMNLP, Vol. 1631.
Citeseer, 1642.
[38] Kai Sheng Tai, Richard Socher, and Christopher D. Manning. 2015. Improved
Semantic Representations From Tree-Structured Long Short-Term Memory
Networks. In Proc. ACL, Volume 1: Long Papers. 1556–1566.
[39] Duyu Tang, Bing Qin, Xiaocheng Feng, and Ting Liu. 2015. Target-Dependent
Sentiment Classification with Long Short Term Memory. CoRR abs/1512.01100
[40] Duyu Tang, Bing Qin, and Ting Liu. 2015. Document Modeling with Gated
Recurrent Neural Network for Sentiment Classification. In Proc. EMNLP. 1422–
[41] Yi Tay, Luu Anh Tuan, and Siu Cheung Hui. 2018. Learning to Attend via
Word-Aspect Associative Fusion for Aspect-Based Sentiment Analysis. In Proc. AAAI.
[42] Zhiqiang Toh and Jian Su. 2016. NLANGP at SemEval-2016 Task 5: Improving
Aspect Based Sentiment Analysis using Neural Network Features. In Proc. NAACL
HLT. 282–288.
[43] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones,
Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is All
you Need. In Proc. NIPS. 6000–6010.
[44] Yequan Wang, Minlie Huang, Li Zhao, and Xiaoyan Zhu. 2016. Attention-based
LSTM for Aspect-level Sentiment Classification. In Proc. EMNLP. 606–615.
[45] Yequan Wang, Aixin Sun, Jialong Han, Ying Liu, and Xiaoyan Zhu. 2018.
Sentiment Analysis by Capsules. In Proc. WWW. 1165–1174.
[46] Jun Yang, Runqi Yang, Chongjun Wang, and Junyuan Xie. 2018. Multi-Entity
Aspect-Based Sentiment Analysis With Context, Entity and Aspect Memory. In
Proc. AAAI. 6029–6036.
[47] Zichao Yang, Diyi Yang, Chris Dyer, Xiaodong He, Alexander J. Smola, and
Eduard H. Hovy. 2016. Hierarchical Attention Networks for Document Classification.
In Proc. NAACL HLT. 1480–1489.
[48] Lei Zhang, Bing Liu, Suk Hwan Lim, and Eamonn O’Brien-Strain. 2010. Extracting
and Ranking Product Features in Opinion Documents. In Proc. COLING, Posters
Volume. 1462–1470.
[49] Yanyan Zhao, Bing Qin, Shen Hu, and Ting Liu. 2010. Generalizing Syntactic
Structures for Product Attribute Candidate Extraction. In Proc. HLT NAACL.
[50] Xinjie Zhou, Xiaojun Wan, and Jianguo Xiao. 2015. Representation Learning for
Aspect Category Detection in Online Reviews. In Proc. AAAI. 417–424.
... We follow the definition of aspect category sentiment analysis (ACSA) in previous studies [27,14,8]. There are k predefined aspect categories A = {a 1 , . . . ...
... ATAE-CAN-2R o [5] uses aspect detection as an auxiliary task. AS-Capsule [27] is a capsule alike network. Each capsule encloses a set of computations for one aspect. ...
Full-text available
A sentence may express sentiments on multiple aspects. When these aspects are associated with different sentiment polarities, a model's accuracy is often adversely affected. We observe that multiple aspects in such hard sentences are mostly expressed through multiple clauses, or formally known as elementary discourse units (EDUs), and one EDU tends to express a single aspect with unitary sentiment towards that aspect. In this paper, we propose to consider EDU boundaries in sentence modeling, with attentions at both word and EDU levels. Specifically, we highlight sentiment-bearing words in EDU through word-level sparse attention. Then at EDU level, we force the model to attend to the right EDU for the right aspect, by using EDU-level sparse attention and orthogonal regularization. Experiments on three benchmark datasets show that our simple EDU-Attention model outperforms state-of-the-art baselines. Because EDU can be automatically segmented with high accuracy, our model can be applied to sentences directly without the need of manual EDU boundary annotation.
... The AS-Capsules model is capable of performing criteria detection and criteria-level sentiment classification simultaneously, it achieves this through exploring the relevance between criteria and corresponding sentiments. Experiments have proven that the model has state-of-the-art performance [58]. ...
... But the cleanliness was poor." , the sentiment orientations of criteria can be identified by the AS-Capsules model as three criterion-sentiment pairs-⟨location, positive⟩ , ⟨service, positive⟩ , ⟨cleanliness, negative⟩ . A detailed description of the model could be found in the literature [58]. ...
Many efforts have been dedicated to the research on online reviews-based hotel selection. However, the existing methods often fail to take into account the types of potential travellers and, moreover, they rarely consider the impact of the interdependencies among hotel criteria when modelling the hotel selection process. To counter these defects, this paper proposes a multi-criteria decision-making model for hotel selection that considers the types of potential travellers and the interdependencies among criteria. To achieve this, the proposed model first converts online reviews into picture fuzzy numbers, and introduces 2-order additive fuzzy measures to capture the interdependencies among criteria by proposing a novel aggregation function based on Choquet average integral. Furthermore, it uses the similarity between each traveller type and the potential traveller type to calculate the weights of different traveller types so as to account for the impact of traveller type. To validate the effectiveness of the proposed model, a case study is conducted on the reviews collected from TripAdvisor. The study shows that the proposed decision support model can effectively help potential travellers of different traveller types to make their desirable hotel choices.
... One example is [97], where an attention-based capsule network is proposed that also includes a multi-hop attention mechanism for the purpose of visual question answering. Another example is [98], where capsule-based attention is used for aspect-level sentiment analysis of restaurant reviews. ...
Attention is an important mechanism that can be employed for a variety of deep learning models across many different domains and tasks. This survey provides an overview of the most important attention mechanisms proposed in the literature. The various attention mechanisms are explained by means of a framework consisting of a general attention model, uniform notation, and a comprehensive taxonomy of attention mechanisms. Furthermore, the various measures for evaluating attention models are reviewed, and methods to characterize the structure of attention models based on the proposed framework are discussed. Last, future work in the field of attention models is considered.
... The proposed model jointly learns the two tasks and produces improved performance for both tasks. In [211], a capsule attention model is proposed that also jointly learns the detection and classification tasks in an end-to-end manner. This model produces state-of-the-art results for both tasks. ...
With the constantly growing number of reviews and other sentiment-bearing texts on the Web, the demand for automatic sentiment analysis algorithms continues to expand. Aspect-based sentiment classification (ABSC) allows for the automatic extraction of highly fine-grained sentiment information from text documents or sentences. In this survey, the rapidly evolving state of the research on ABSC is reviewed. A novel taxonomy is proposed that categorizes the ABSC models into three major categories: knowledge-based, machine learning, and hybrid models. This taxonomy is accompanied with summarizing overviews of the reported model performances, and both technical and intuitive explanations of the various ABSC models. State-of-the-art ABSC models are discussed, such as models based on the transformer model, and hybrid deep learning models that incorporate knowledge bases. Additionally, various techniques for representing the model inputs and evaluating the model outputs are reviewed. Furthermore, trends in the research on ABSC are identified and a discussion is provided on the ways in which the field of ABSC can be advanced in the future.
With the rapid rise of e-commerce, a large number of products are being sold online, and a growing number of people are making purchases online. Users can get valuable information from online reviews before purchasing a product. We investigate the peculiarities of users' behaviour based on their early reviews. In this study, customer feedback linked with various products is collected from several online shopping websites in order to forecast product ratings based on user feedback utilising opinion mining. We first classify the product's lifespan into three segments (early, majority, and laggards). A person who posts a review at an early stage is considered to be an early responder to the product. The product reviews are analysed using machine learning techniques. They give comments, and products are subsequently recommended for purchase and sale based on that factor. Users can provide product reviews on popular e-commerce platforms like Flipkart, Myntra, Amazon, and many others. To purchase a product, the consumer will investigate to gain a deeper grasp of the product and how it works. The interpretation will be a very straightforward product with inferior, superior, and neutral product checks. This experiment is carried out using machine learning techniques. Sentiment Analysis is a type of market research in which customers are aware of their reaction to a product. Individual decision-makers, businesses, and governments can all benefit from the awareness of feeling.
Named Entity Recognition (NER) or the extraction of concepts from clinical text is the task of identifying entities in text and slotting them into categories such as problems, treatments, tests, clinical departments, occurrences (such as admission and discharge) and others. NER forms a critical component of processing and leveraging unstructured data from Electronic Health Records (EHR). While identifying the spans and categories of concepts is itself a challenging task, these entities could also have attributes such as negation that pivot their meanings implied to the consumers of the named entities. There has been little research dedicated to identifying the entities and their qualifying attributes together. This research hopes to contribute to the area of detecting entities and their corresponding attributes by modelling the NER task as a supervised, multi-label tagging problem with each of the attributes assigned tagging sequence labels. In this paper, we propose 3 architectures to achieve this multi-label entity tagging: BiLSTM n-CRF, BiLSTM-CRF-Smax-TF and BiLSTM n-CRF-TF. We evaluate these methods on the 2010 i2b2/VA and the i2b2 2012 shared task datasets. Our different models obtain best NER scores of 0.903 and 0.808 on the i2b2 2010/VA and i2b2 2012 respectively. The highest span based micro-averaged F1 polarity scores obtained were 0.832 and 0.836 on the i2b2 2010/VA and i2b2 2012 datasets respectively, and the highest macro-averaged F1 polarity scores obtained were 0.924 and 0.888 respectively. The modality studies conducted on i2b2 2012 dataset revealed high scores of 0.818 and 0.501 for span based micro-averaged F1 and macro-averaged F1 respectively.
Existing aspect-based/category sentiment analysis methods have shown great success in detecting sentiment polarity towards a given aspect in a sentence with supervised learning, where the training and inference stages share the same pre-defined set of aspects. However, in practice, the aspect categories are changing rather than keeping fixed over time. Dealing with unseen aspect categories is under-explored in existing methods. In this paper, we formulate a new few-shot aspect category sentiment analysis (FSACSA) task, which aims to effectively predict the sentiment polarity of previously unseen aspect categories. To this end, we propose a novel Aspect-Focused Meta-Learning (AFML) framework, which constructs aspect-aware and aspect-contrastive representations from external knowledge to match the target aspect with aspects in the training set. Concretely, we first construct two auxiliary contrastive sentences for a given sentence with the incorporation of external knowledge, enabling the learning of sentence representations with a better generalization. Then, we devise an aspect-focused induction network to leverage the contextual sentiment towards a given aspect to refine the label vectors. Furthermore, we employ the episode-based meta-learning algorithm to train the whole network, so as to learn to generalize to novel aspects. Extensive experiments on multiple real-life datasets show that our proposed AFML framework achieves the state-of-the-art results for the FSACSA task.
The pair-wise aspect and opinion term extraction (PAOTE) task aims to extract aspect terms and opinion terms from reviews in the form of opinion pairs, which provides a global profile for reviews of goods or users. Up-to-date studies ignore the interaction between term detection and term pairing, which may be crucial for the PAOTE task. Other studies use syntactic dependency structures to enhance their models, which cannot better provide task-specific structural information. In this work, we design an aspect-to-opinion graph and transform PAOTE into a graph parsing task. To exploit the interaction between term detection and pairing, we propose a novel mutually-aware interaction network (MAIN), which interactively updates the representations for term detection and pairing via graph sampling and convolution. Further, the word-word graph learned during training can be iteratively refined and gradually approaches the aspect-to-opinion graph. Experimental results on four benchmark datasets show that our proposed method significantly outperforms strong baselines with state-of-the-art performance and achieves a maximum increase of 2.01 points on the F1 metric. Further analysis demonstrates the advance of the aspect-to-opinion graph and the effectiveness of the mutually-aware interaction mechanism.
A large number of deep learning classification methods for emotion recognition tasks based on electroencephalogram (EEG) have achieved excellent performance, and it is implicitly assumed that all labels are correct. However, humans have natural bias, subjectiveness and inconsistencies in their judgement, which would lead to noisy labels for the EEG emotion state. To this end, we propose a framework for multi-channel EEG-based emotion recognition in the presence of noisy labels. The proposed noisy label classification method is based on the capsule network using a joint optimization strategy (JO-CapsNet) until convergence. Specifically, the network parameters are updated based on the loss function of capsule network, and the pseudo label is updated by predicting the existence possibility of the class label based on the output of capsule network. In this way, the two alternating updates can promote each other to correct the noisy labels. Experimental results demonstrate the advantage of our method.
Conference Paper
In this paper, we propose RNN-Capsule, a capsule model based on Recurrent Neural Network (RNN) for sentiment analysis. For a given problem, one capsule is built for each sentiment category e.g., 'positive' and 'negative'. Each capsule has an attribute, a state, and three modules: representation module, probability module, and reconstruction module. The attribute of a capsule is the assigned sentiment category. Given an instance encoded in hidden vectors by a typical RNN, the representation module builds capsule representation by the attention mechanism. Based on capsule representation, the probability module computes the capsule's state probability. A capsule's state is active if its state probability is the largest among all capsules for the given instance, and inactive otherwise. On two benchmark datasets (i.e., Movie Review and Stanford Sentiment Treebank) and one proprietary dataset (i.e., Hospital Feedback), we show that RNN-Capsule achieves state-of-the-art performance on sentiment classification. More importantly, without using any linguistic knowledge, RNN-Capsule is capable of outputting words with sentiment tendencies reflecting capsules' attributes. The words well reflect the domain specificity of the dataset.
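The capsule mechanics described in this abstract (attention-based representation, a state probability, and an active state for the most probable capsule) can be illustrated with a minimal sketch. The abstract does not give the formulas, so everything here is an assumption made for illustration: dot-product attention, a sigmoid probability module, and the names `capsule_representation`, `state_probability`, `active_capsule`, `attn`, and `prob` are all hypothetical. The reconstruction module is omitted.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def capsule_representation(hidden, attn_vector):
    # Assumed form: dot-product attention over the RNN hidden vectors,
    # followed by an attention-weighted sum.
    weights = softmax([dot(h, attn_vector) for h in hidden])
    dim = len(hidden[0])
    return [sum(w * h[d] for w, h in zip(weights, hidden)) for d in range(dim)]


def state_probability(representation, prob_vector, bias=0.0):
    # Assumed form: a sigmoid over a linear function of the representation.
    return 1.0 / (1.0 + math.exp(-(dot(representation, prob_vector) + bias)))


def active_capsule(hidden, capsules):
    # A capsule is active if its state probability is the largest.
    probs = {
        name: state_probability(
            capsule_representation(hidden, params["attn"]), params["prob"]
        )
        for name, params in capsules.items()
    }
    return max(probs, key=probs.get), probs


# Toy hidden vectors and hand-picked (not learned) capsule parameters.
hidden = [[1.0, 0.0], [0.0, 1.0]]
capsules = {
    "positive": {"attn": [1.0, 0.0], "prob": [1.0, 0.0]},
    "negative": {"attn": [0.0, 1.0], "prob": [-1.0, 0.0]},
}
active, probs = active_capsule(hidden, capsules)
print(active, probs)
```

In a trained model the attention and probability parameters would be learned per capsule; the hand-picked values above only show how the active capsule is selected.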
Aspect-based sentiment analysis (ABSA) tries to predict the polarity of a given document with respect to a given aspect entity. While neural network architectures have been successful in predicting the overall polarity of sentences, aspect-specific sentiment analysis still remains as an open problem. In this paper, we propose a novel method for integrating aspect information into the neural model. More specifically, we incorporate aspect information into the neural model by modeling word-aspect relationships. Our novel model, Aspect Fusion LSTM (AF-LSTM), learns to attend based on associative relationships between sentence words and aspect, which allows our model to adaptively focus on the correct words given an aspect term. This ameliorates the flaws of other state-of-the-art models that utilize naive concatenations to model word-aspect similarity. Instead, our model adopts circular convolution and circular correlation to model the similarity between aspect and words and elegantly incorporates this within a differentiable neural attention framework. Finally, our model is end-to-end differentiable and highly related to convolution-correlation (holographic like) memories. Our proposed neural model achieves state-of-the-art performance on benchmark datasets, outperforming ATAE-LSTM by 4%-5% on average across multiple datasets.
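The two operators named in this abstract, circular convolution and circular correlation, have standard definitions that can be written directly. The sketch below uses those textbook definitions on plain Python lists; it is not the AF-LSTM implementation, and how the paper feeds the result into attention is not shown here. The function names are illustrative.

```python
def circular_convolution(a, b):
    # (a * b)[k] = sum_j a[j] * b[(k - j) mod n]
    n = len(a)
    return [sum(a[j] * b[(k - j) % n] for j in range(n)) for k in range(n)]


def circular_correlation(a, b):
    # (a ⋆ b)[k] = sum_j a[j] * b[(k + j) mod n]
    n = len(a)
    return [sum(a[j] * b[(k + j) % n] for j in range(n)) for k in range(n)]


word = [1, 2, 3]      # toy word vector
aspect = [4, 5, 6]    # toy aspect vector
print(circular_convolution(word, aspect))  # [31, 31, 28]
print(circular_correlation(word, aspect))  # [32, 29, 29]
```

Both operators map two n-dimensional vectors to one n-dimensional vector, which is why they can stand in for concatenation without growing the representation size.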
An important part of our information-gathering behavior has always been to find out what other people think. With the growing availability and popularity of opinion-rich resources such as online review sites and personal blogs, new opportunities and challenges arise as people can, and do, actively use information technologies to seek out and understand the opinions of others. The sudden eruption of activity in the area of opinion mining and sentiment analysis, which deals with the computational treatment of opinion, sentiment, and subjectivity in text, has thus occurred at least in part as a direct response to the surge of interest in new systems that deal directly with opinions as a first-class object. Opinion Mining and Sentiment Analysis covers techniques and approaches that promise to directly enable opinion-oriented information-seeking systems. The focus is on methods that seek to address the new challenges raised by sentiment-aware applications, as compared to those that are already present in more traditional fact-based analysis. The survey includes an enumeration of the various applications, a look at general challenges and discusses categorization, extraction and summarization. Finally, it moves beyond just the technical issues, devoting significant attention to the broader implications that the development of opinion-oriented information-access services have: questions of privacy, vulnerability to manipulation, and whether or not reviews can have measurable economic impact. To facilitate future work, a discussion of available resources, benchmark datasets, and evaluation campaigns is also provided. Opinion Mining and Sentiment Analysis is the first such comprehensive survey of this vibrant and important research area and will be of interest to anyone with an interest in opinion-oriented information-seeking systems.
Fine-grained sentiment analysis is a useful tool for producers to understand consumers’ needs as well as complaints about products and related aspects from online platforms. In this article, we define a novel task named “Multi-Entity Aspect-Based Sentiment Analysis (ME-ABSA)”. It investigates the sentiment towards entities and their related aspects. It makes the well-studied aspect-based sentiment analysis a special case of this type, where the number of entities is limited to one. We contribute a new dataset for this task, with multi-entity Chinese posts in it. We propose to model context, entity, and aspect memory to address the task and incorporate dependency information for further improvement. Experiments show that our methods perform significantly better than baseline methods on datasets for both ME-ABSA task and ABSA task. The in-depth analysis further validates the effectiveness of our methods and shows that our methods are capable of generalizing to new (entity, aspect) combinations with little loss of accuracy. This observation indicates that data annotation in real applications can be largely simplified.
Conference Paper
Many e-commerce websites provide the customers with the ability to share their opinions about the products. These opinions can assist other customers to purchase wisely and manufacturers to improve their products and services. Due to a huge volume of product reviews in the online websites and hence difficulty of perusing all of them, it is essential to produce a concise summary of the reviews about different aspects of the products. Aspect detection is a vital step of aspect-based summarization, aiming at identifying the most important product aspects about which users express their opinions. In this paper, we propose a novel unsupervised approach to aspect detection employing word embedding techniques to identify relevant aspects and their semantically related words, called aspect keywords and categorize aspects into semantic categories. The main purpose of our method is to use semantic and syntactic relationships in word embedding vectors in order to improve extraction of multiword aspects and distinguishing explicit and implicit aspects from their keywords. Our experimental results indicate the improvements.
Conference Paper
Aspect based sentiment classification is a crucial task for sentiment analysis. Recent advances in neural attention models demonstrate that they can be helpful in aspect based sentiment classification tasks, which can help identify the focus words in human. However, according to our empirical study, prevalent content attention mechanisms proposed for aspect based sentiment classification mostly focus on identifying the sentiment words or shifters, without considering the relevance of such words with respect to the given aspects in the sentence. Therefore, they are usually insufficient for dealing with multi-aspect sentences and the syntactically complex sentence structures. To solve this problem, we propose a novel content attention based aspect based sentiment classification model, with two attention enhancing mechanisms: a sentence-level content attention mechanism is capable of capturing the important information about given aspects from a global perspective, while the context attention mechanism is responsible for simultaneously taking the order of the words and their correlations into account, by embedding them into a series of customized memories. Experimental results demonstrate that our model outperforms the state-of-the-art, in which the proposed mechanisms play a key role.
A capsule is a group of neurons whose activity vector represents the instantiation parameters of a specific type of entity such as an object or object part. We use the length of the activity vector to represent the probability that the entity exists and its orientation to represent the instantiation parameters. Active capsules at one level make predictions, via transformation matrices, for the instantiation parameters of higher-level capsules. When multiple predictions agree, a higher level capsule becomes active. We show that a discriminatively trained, multi-layer capsule system achieves state-of-the-art performance on MNIST and is considerably better than a convolutional net at recognizing highly overlapping digits. To achieve these results we use an iterative routing-by-agreement mechanism: A lower-level capsule prefers to send its output to higher level capsules whose activity vectors have a big scalar product with the prediction coming from the lower-level capsule.
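The routing-by-agreement idea in this abstract can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions: the prediction vectors `u_hat` are given directly (the transformation matrices that produce them are left out), a squashing function keeps each output length below 1 so it can be read as a probability, and the names `squash` and `route` and the iteration count of 3 are illustrative choices.

```python
import math


def squash(v):
    # Shrink v so its length lies in [0, 1) while keeping its orientation.
    sq = sum(x * x for x in v)
    if sq == 0.0:
        return [0.0] * len(v)
    scale = sq / (1.0 + sq) / math.sqrt(sq)
    return [scale * x for x in v]


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route(u_hat, iterations=3):
    # u_hat[i][j] is the prediction of lower capsule i for higher capsule j.
    n_low, n_high, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    logits = [[0.0] * n_high for _ in range(n_low)]
    for _ in range(iterations):
        # Coupling coefficients: each lower capsule distributes its output.
        coupling = [softmax(row) for row in logits]
        outputs = []
        for j in range(n_high):
            s = [
                sum(coupling[i][j] * u_hat[i][j][d] for i in range(n_low))
                for d in range(dim)
            ]
            outputs.append(squash(s))
        # Agreement: scalar product between prediction and capsule output.
        for i in range(n_low):
            for j in range(n_high):
                logits[i][j] += sum(
                    u_hat[i][j][d] * outputs[j][d] for d in range(dim)
                )
    return outputs, coupling


# Two lower capsules agree about higher capsule 0 and disagree about capsule 1.
u_hat = [
    [[1.0, 0.0], [1.0, 0.0]],
    [[1.0, 0.0], [-1.0, 0.0]],
]
outputs, coupling = route(u_hat)
print(outputs, coupling)
```

In this toy input the agreeing predictions reinforce higher capsule 0, so its output is longer and both lower capsules shift their coupling toward it, while the conflicting predictions for capsule 1 cancel out.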
NLP tasks are often limited by scarcity of manually annotated data. In social media sentiment analysis and related tasks, researchers have therefore used binarized emoticons and specific hashtags as forms of distant supervision. Our paper shows that by extending the distant supervision to a more diverse set of noisy labels, the models can learn richer representations. Through emoji prediction on a dataset of 1246 million tweets containing one of 64 common emojis we obtain state-of-the-art performance on 8 benchmark datasets within sentiment, emotion and sarcasm detection using a single pretrained model. Our analyses confirm that the diversity of our emotional labels yields a performance improvement over previous distant supervision approaches.