Aspect based Sentiment Analysis with Feature
Enhanced Attention CNN-BiLSTM
WEI MENG1, YONGQING WEI2, PEIYU LIU1, ZHENFANG ZHU3 AND HONGXIA YIN1
1School of Information Science and Engineering, Shandong Normal University, Jinan 250014, China
2Basic Education Department, Shandong Police College, Jinan 250014, China
3School of Information Science and Electrical Engineering, Shandong Jiaotong University, Jinan 250357, China
Corresponding author: Yongqing Wei (e-mail: weiyongqing@sdpc.edu.cn).
This work was supported in part by the National Social Science Fund under Grant 19BYY076, in part by the Science Foundation of the Ministry of Education of China under Grant 14YJC860042, and in part by the Shandong Provincial Social Science Planning Project under Grants 19BJCJ51, 18CXWJ01, and 18BJYJ04.
ABSTRACT Previous work has recognized the importance of using the attention mechanism to capture the interaction between aspect words and contexts for sentiment analysis. However, most attention mechanisms use the average vector of the aspect words to calculate the context attention, which is not rigorous. In addition, the feature extraction ability of the model is essential for effective analysis; combining CNN and LSTM can enhance a model's feature extraction and semantic expression abilities, and is a popular research direction. This paper introduces an aspect-level neural network for sentiment analysis named Feature Enhanced Attention CNN-BiLSTM (FEA-NN). Our method extracts a higher-level phrase representation sequence from the embedding layer using a CNN, which provides effective support for subsequent encoding tasks. To improve the quality of context encoding and preserve semantic information, we use a BiLSTM to capture both local features of phrases and global, temporal sentence semantics. We further add an attention mechanism that models the interaction between aspect words and sentences, focusing on the keywords of targets to learn more effective context representations. We evaluate the proposed model on three datasets: Restaurant, Laptop, and Twitter. Extensive experiments show the effectiveness of FEA-NN.
INDEX TERMS Aspect-based sentiment analysis, BiLSTM, CNN, Attention mechanism.
I. INTRODUCTION
As a significant subtask in the field of natural language processing (NLP), sentiment analysis has attracted more and more researchers' attention in recent years [1]. It is also widely used in data mining, question answering, recommendation systems, and information retrieval. The rise of this field has benefited from the rapid development of social media and e-commerce. Especially in e-commerce, a massive number of reviews reflecting consumers' sentiments are published on different aspects of products and services, such as quality and price. The purpose of sentiment analysis is to capture the opinions and emotions in user reviews, which can be used to predict customers' demands and to support company decision-making. Such research is necessary to help users concentrate on vital information and reduce the interference of spam [2]. According to the granularity of the processed text, traditional sentiment analysis can be roughly divided into two levels: sentence-level and document-level. However, it is difficult for traditional sentiment analysis to find the sentiment toward the aspects of an entity. For product reviews, the product itself is usually the entity, and all things related to the product (e.g., price, quality, etc.) are aspects of it [3].
Aspect-based sentiment analysis is a fine-grained sentiment analysis task. Its purpose is to identify the sentiment polarity (e.g., positive, negative, or neutral) of a sentence towards an aspect word (also called the target word). For example, in the sentence "the computer screen looks great, but the battery life is too short," the aspects "screen" and "battery" are positive and negative, respectively. The complete ABSA (Aspect-based Sentiment Analysis) task includes two subtasks: aspect detection and sentiment classification. In this article, we use aspect identification to define aspect embeddings, because the aspects we use are predefined. ABSA is a fundamental task
in natural language processing and has caught many researchers' attention [4]. Early research on the ABSA task typically used machine learning algorithms and built sentiment classifiers in a supervised manner. Although machine learning has achieved much success in aspect-based sentiment analysis tasks, it requires large-scale text preprocessing and complex feature engineering. Representative approaches in the literature include feature-based Support Vector Machines (SVM) [5] [6].
Recently, neural networks have made breakthroughs in the field of text sentiment analysis with their superior performance. Bahdanau et al. [7] constructed a language model using a recurrent neural network and expressed word vectors in a low-dimensional space, which could better measure the correlation between words. The Convolutional Neural Network (CNN) has been successfully applied to text classification, using multiple convolution kernels of different sizes to extract local features [8] [9]. Long Short-Term Memory (LSTM) captures the temporal relationships and interactions between words in sentences, which effectively improves the accuracy of sentiment analysis [10] [11] [12]. Graves et al. [13] proposed the use of recurrent convolutional neural networks for text classification, which use a bidirectional recurrent structure to model text. Tai et al. [14] introduced a tree structure on the basis of the LSTM network to improve the semantic expression of sentences. Liu et al. [15] proposed the AC-BiLSTM model for text classification, which links CNN and LSTM to enhance the model's feature learning ability and improve its classification performance. Although these efforts have yielded good results, these models cannot distinguish the different contributions of each word to the entire sentence, which limits their performance.
To distinguish the different contributions of different words in a sentence to the classification, the attention model has also been widely introduced into neural network models. Yin et al. [16] added an attention mechanism to the CNN model, which produced better results than the model without it. Researchers have combined the LSTM network and the attention mechanism to conduct aspect-specific sentiment classification and achieved better results than previous models [4] [17]. However, unlike the general sentiment analysis task, if a sentence contains multiple aspects, each modified by its own related words, the words associated with other targets become noise, and it is difficult to distinguish the sentiment polarity of each aspect. Therefore, we need to consider the relevance between aspect words and context words when designing models. Prior work also demonstrates the validity of the attention module in learning the relevance between aspect words and context words [18] [19] [20].
Despite these advances in the attention mechanism, some problems remain unresolved. When an aspect contains multiple words, most existing studies only compute the average attention vector of the aspect words. This oversimplified approach introduces noise into the attention module and reduces classification accuracy. For example, for "bottle of milk," the averaging operation would introduce the noise word "of." Besides, the individual words in the aspect target affect the classification results differently: "milk" is more important than "bottle." If we can use accurate aspect-target information to calculate aspect-context interaction attention scores, the quality of the classification will be higher. Existing work does not solve this problem well. IAN uses an interactive approach to learn the attention relationship between the sentence and the aspect target, and it can extract features related to the aspect target through the attention mechanism. However, its pooling operation ignores the interaction between word pairs of the sentence and the aspect target. Inspired by IAN, we believe that obtaining aspect-sentence interaction representations is useful. Therefore, we propose a Feature Enhanced Attention mechanism (FEA) to extract interactive information from the target representation and the sentence representation generated by the CNN-BiLSTM. FEA automatically generates attention not only from aspect to sentence but also from sentence to aspect. This is inspired by the observation that only a few words in a sentence contribute to the sentiment toward one aspect [21]. For example, there are two aspects, "food" and "service," in the sentence "the food was delicious, but the service was supercilious." Based on our knowledge, the negative word "supercilious" is more likely to describe "service" than "food." Similarly, for an aspect phrase, we also need to concentrate on its most critical part.
Our work makes the following contributions:
1: A high-quality word embedding model is an essential basis for sentiment analysis. We chose a new method, the Improved Word Vector (IWV) [22], which improves the accuracy of pre-trained word embeddings in sentiment analysis.
2: We propose a novel Feature Enhanced Attention model that learns the interaction between the aspect and the sentence, distinguishes the contributions of different aspect words to sentences, reduces the effect of interference words unrelated to the aspect word, and makes full use of the keywords associated with the target to learn semantic representations.
3: To confirm the effectiveness of the feature enhanced module, we propose the Feature Enhanced Attention CNN-BiLSTM (FEA-NN) to extract local information and long-distance dependency information. It learns the interaction between context and aspect, and obtains more vital sentiment features through FEA.
We evaluate the proposed model on three datasets: the Restaurant and Laptop reviews from SemEval 2014 [23] and the Twitter dataset [24]. The experimental results verify the effectiveness of our model in solving the above problems.
The organizational structure of the article is as follows. Section II introduces the related work. The proposed approach is presented in detail in Section III. We evaluate the results of the experiments in Section IV. Finally, Section V provides conclusions and possible directions for future research.
II. RELATED WORK
Since the task was introduced, aspect-based sentiment analysis (ABSA) has attracted many researchers in the fields of machine learning and natural language processing. Most research works are concentrated in two directions: one is the neural network, and the other is the attention mechanism.
A. ABSA BASED ON NEURAL NETWORK
ABSA aims at identifying the sentiment polarity a sentence expresses towards a target, which is one given aspect of a specific entity. Early works on the ABSA task relied on manually labeled lexicons and syntactic feature engineering, then used machine learning for classification, such as SVM [5] [6]. Although these methods achieved good results, feature engineering requires a lot of manual labeling, which is time-consuming and laborious. Due to the success of deep learning models in the field of natural language processing (NLP), recent studies use deep neural networks to generate sentence embeddings and then feed the vector representations of sentences to classifiers as low-dimensional feature vectors. Dong et al. [25] first introduced recursive neural networks into this field; they proposed an adaptive RNN, which can adaptively propagate the sentiment of context words to the aspect words. The results showed that this sentence representation was effective [26], but such models may fail due to common grammar analysis errors in practice [25] [27]. In contrast, recurrent neural networks (RNNs) have been proven effective in many (language) sequence learning tasks, and most advanced solutions are based on them [28] [27].
In deep learning, LSTM is mainly used for processing sequence data, and in recent years its range of applications has expanded rapidly. To further improve the performance of LSTM for various tasks and to process variable-length sequence information, researchers have proposed many improvements to LSTM. Recently, LSTM and its variants have been applied to various tasks with satisfactory results. Tang et al. [28] proposed the target-dependent long short-term memory network (TD-LSTM), which learns representations directly from the left and right contexts of a given aspect using two LSTM networks, respectively. Zhang et al. [27] used gated neural network structures to model the grammar and semantics of tweets and the interaction between context and aspect. IAN uses an interactive approach to learn the attention relationship between the sentence and the aspect target, and it can extract features related to the aspect target through the attention mechanism. The combination of LSTM with other network structures is also a popular research direction.
Researchers have combined LSTM with other network structures as an essential direction of exploration. Kolawole [29] used a CNN-LSTM model to solve challenging problems in sentiment analysis; the method is straightforward and performs well without excessive parameter optimization. Wang et al. [30] proposed a regional CNN-LSTM model consisting of two parts, a regional CNN and an LSTM, used to predict the valence-arousal (VA) ratings of texts. The regional CNN treats a single sentence as a region and divides the input text into several regions so that significant sentiment information in each region can be extracted and weighted according to its contribution to the VA prediction. The LSTM then sequentially integrates this information across regions to perform the VA prediction. Combining the regional CNN and LSTM allows the model to fully consider both the local (regional) information within sentences and the long-distance dependency information across sentences during prediction. Zhou et al. [31] proposed the C-LSTM model, which combines the advantages of CNN and LSTM: the CNN extracts a sequence of higher-level phrase representations, which are fed into a long short-term memory network (LSTM) to obtain the sentence representation. C-LSTM is able to capture both local features of phrases and global, temporal sentence semantics. The above models demonstrate the validity of combining CNN with LSTM for extracting local information and long-distance dependency information. Besides, with the widespread use of attention mechanisms in image processing, researchers have attempted to introduce them into natural language processing.
B. ABSA BASED ON ATTENTION MODEL
Since Bahdanau et al. [7] successfully applied attention mechanisms to machine translation tasks, many research efforts have used attention mechanisms to extract the relationship between targets and contexts. Tang et al. [32] proposed a deep memory network with multiple computational layers, each of which is a neural attention model over an external memory, to extract the significance of each context word in deducing the sentiment polarity of an aspect. Wang et al. [33] proposed two attention-based LSTM networks, namely the AE-LSTM and ATAE-LSTM neural networks. The AE-LSTM models the context through an LSTM, then combines the hidden states with the aspect embedding to generate the attention vector, and finally obtains the sentiment category of the aspect through the classifier. Based on AE-LSTM, the ATAE-LSTM network further enhances the effect of aspect embedding. Yang et al. [34] proposed an attention-based BiLSTM to improve sentiment classification accuracy. Tay et al. [20] proposed AF-LSTM, based on the association between context words and targets; thus, given a target, it allows the model to
focus on the correct context words adaptively. While the use of attention mechanisms has improved the performance of ABSA tasks, all of this work handles targets only by average pooling when calculating contextual attention scores; if the aspect involves multiple words, the performance of the attention module is reduced. Ma et al. [35] proposed a hierarchical attention model for ABSA tasks, including target-level attention and sentence-level attention. Nevertheless, the attention at the aspect level is a self-attention network, with nothing but the hidden layer output itself as input; without the guidance of context information, aspect-level attention is challenging to learn. Ma et al. [4] proposed an interactive attention neural network that takes into account both the aspect and the contextual content, using two attention networks to interactively detect important information between them; but calculating the interaction between aspect and context by means of an average vector is too simple, and the aspect information is not effectively utilized. Similar to IAN, we believe that contextual information helps in learning aspect-level attention.
III. METHOD
In this section, we first define the research task, introduce the FEA-NN, and then describe the proposed model architecture, which is shown in Figure 1.
A. TASK DEFINITION
In the ABSA task, we suppose the input sentence is x = [x1, x2, …, xn], and t = [t1, t2, …, tm] is the given aspect-target word sequence; y, y^p ∈ {Positive, Negative, Neutral} denote the real sentiment and the predicted sentiment, respectively. The aspect target may be a single word or multiple words. Our goal is to predict the sentiment polarity of sentence x towards the aspect-target t. For example, the sentiment polarity of the sentence "The food was great, but the service was dreadful." towards "food" is positive, while the polarity towards "service" is negative.
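For concreteness, a minimal sketch of how a single training instance could be represented (the field names are our own illustration, not from the paper):

```python
# One ABSA instance; the same sentence appears once per aspect target.
instance = {
    "sentence": ["The", "food", "was", "great", ",", "but", "the",
                 "service", "was", "dreadful", "."],   # x = [x1, ..., xn]
    "aspect":   ["food"],                              # t = [t1, ..., tm]
    "label":    "Positive",                            # y in {Positive, Negative, Neutral}
}
```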
B.
OVERVIEW
We propose a new attention mechanism model to solve the problems in the ABSA task. Figure 1 illustrates an overview of the Feature Enhanced Attention CNN-BiLSTM (FEA-NN), which includes a word embedding layer, a CNN layer, a BiLSTM layer, a feature enhanced attention layer, and a sentiment classifier layer. The word embedding layer encodes the context and aspect into matrices. We use CNN to extract local information, and then the BiLSTM learns long-distance dependency information and the hidden states of the sentence and the target. Given the hidden semantic representations of the text and the aspect target generated by the BiLSTM, we use the feature enhanced attention layer to calculate the attention weights of context and target. Finally, the hidden states of the BiLSTM output are combined with the attention weights, and the softmax function is used to predict the probability of the different sentiments.
[Figure 1: context and aspect word representations ($s \in \mathbb{R}^{n \times d_w}$, $t \in \mathbb{R}^{m \times d_w}$) pass through the embedding layer, CNN layers, and BiLSTM layers producing hidden states $h_s \in \mathbb{R}^{n \times 2d_h}$ and $h_t \in \mathbb{R}^{m \times 2d_h}$; their interaction matrix $I \in \mathbb{R}^{n \times m}$ is normalized by column-wise and row-wise softmax, a column-wise average yields the aspect-level attention, and the resulting sentence vector $r \in \mathbb{R}^{2d_h}$ feeds a linear + softmax output layer for $P(y=c)$.]
FIGURE 1. The overall architecture of FEA-NN. The proposed model mainly includes four modules: the first is the word embedding layer, which converts the input into word vectors; the second is the neural network layer, which obtains hidden representations from the word embeddings; the third is the feature enhanced attention layer, which obtains more vital sentiment features; the last is the output layer, which is responsible for outputting the polarity judgment.
C. WORD EMBEDDING
Choosing a suitable pre-trained word vector model has an important impact on improving the accuracy of sentiment analysis, as proven in the work of Syeda Rida-e-Fatima et al. [36]. Existing context-based word embeddings, such as Word2vec and GloVe, typically do not capture enough sentiment information, which may result in words with similar vector representations having opposite sentiment polarities (e.g., good and bad), thereby reducing sentiment analysis performance [37]. In this paper, we chose a new word embedding model, the Improved Word Vector (IWV), which improves the accuracy of pre-trained word embeddings in sentiment analysis; all the baseline models in this paper use the IWV. This method is based on Part-of-Speech (POS) tagging, dictionary-based methods, a word-position algorithm, and the Word2Vec/GloVe methods. Following the method in Seyed's paper, we used Google's pre-trained Word2vec and Stanford's pre-trained GloVe vectors, selected the six dictionaries mentioned in that paper, and introduced Part-of-Speech (POS) tagging and Word-position2Vec to obtain the word vector model used in this paper. The main architecture of the IWV is shown in Figure 2 [22].
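As a rough sketch of the IWV idea, the pre-trained vector of each word could be extended with POS, lexicon, and position features; the dimensions and feature encodings below are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def improved_word_vector(pretrained, pos_onehot, lexicon_scores, position):
    """Concatenate a pre-trained Word2Vec/GloVe vector with POS-tag,
    lexicon-based, and word-position features, in the spirit of IWV [22]."""
    return np.concatenate([pretrained, pos_onehot, lexicon_scores, position])

vec = improved_word_vector(
    np.random.randn(300),                        # pre-trained 300-d vector
    np.eye(6)[2],                                # POS one-hot (e.g., adjective)
    np.array([0.8, 0.6, 0.9, 0.7, 0.5, 0.8]),    # scores from six sentiment dictionaries
    np.array([0.4]),                             # normalized position in the sentence
)
print(vec.shape)                                 # (313,)
```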
Given a sentence x = [x1, x2, …, xn] with length n and an aspect t = [t1, t2, …, tm] with length m, we first map each word into a low-dimensional real-valued vector; each word is converted into a 300-dimensional word vector. The output of the embedding layer (where n depends on the length of the longest input sentence) is a two-dimensional matrix of size n×300: each sentence forms a matrix M = [w1, w2, …, wn] of size n×k (k = 300), where wi = [wi1, wi2, …, wik] is the word vector of the word xi. Similarly, we can get the embedding matrix A of the aspect words.
FIGURE 2. The main architecture of the IWV [22].
D. CNN
CNN is a feed-forward neural network. The basic structure of a convolutional neural network includes a convolution layer, a pooling layer, and a fully connected layer.
1) CONVOLUTIONAL LAYER
The purpose of the convolutional layer is to extract the
semantic features of the sentence. Each convolution kernel
corresponds to extract a certain part of the feature. The
number of convolution kernels was set to 150. Convolution
operation for each sentence matrix M of the embedded layer
output.
()Z f MW b
(1)
Where Z represents the extracted feature matrix after
convolution operation; the weight matrix W and the bias
vector b are the learning parameters of the network.
For convenience of calculation, a nonlinear mapping is applied to the convolution result of each convolution kernel:

$f = relu(x) = \max(0, x)$  (2)

where the relu function is one of the activation functions commonly used in neural network models.
In order to extract features more comprehensively, this paper uses convolution windows of sizes 2 and 3 to extract the binary (bigram) and ternary (trigram) features of the sentence.
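A minimal PyTorch sketch of this convolutional layer, assuming 300-d word embeddings and the paper's 150 kernels per window size (the class name and batch handling are ours):

```python
import torch
import torch.nn as nn

class PhraseCNN(nn.Module):
    """Convolutions with window sizes 2 and 3 over word embeddings,
    150 kernels each, followed by the relu nonlinearity of Eq. (2)."""
    def __init__(self, emb_dim=300, n_kernels=150):
        super().__init__()
        self.conv2 = nn.Conv1d(emb_dim, n_kernels, kernel_size=2)  # bigram features
        self.conv3 = nn.Conv1d(emb_dim, n_kernels, kernel_size=3)  # trigram features

    def forward(self, M):                # M: (batch, seq_len, emb_dim)
        x = M.transpose(1, 2)            # Conv1d expects (batch, channels, seq_len)
        z2 = torch.relu(self.conv2(x))   # (batch, 150, seq_len - 1)
        z3 = torch.relu(self.conv3(x))   # (batch, 150, seq_len - 2)
        return z2, z3

M = torch.randn(4, 20, 300)              # a batch of 4 sentences, 20 words each
z2, z3 = PhraseCNN()(M)
print(z2.shape, z3.shape)                # torch.Size([4, 150, 19]) torch.Size([4, 150, 18])
```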
2) K-MAX POOLING LAYER
After the convolution operation, the extracted features are passed to the pooling layer, which further aggregates them to simplify their expression. In this paper, we use the K-max pooling operation, which chooses the top-K maxima of each filter to represent the semantic information captured by that filter. The K value is

$K = [\frac{l - f_s + 1}{2}]$  (3)

where l is the length of the sentence vector and $f_s$ is the convolution window size.
After the pooling operation, the feature vector extracted by each convolution kernel is significantly reduced while the core semantic information of the sentence is retained. Since the number of convolution kernels is set to 150, the sentence representation matrix generated after pooling is $W \in \mathbb{R}^{150 \times k}$.
The CNN convolution layer and pooling layer extract local features of short text sentences through the convolution and pooling operations, respectively, yielding generalized binary and ternary feature vectors. After the fusion layer, the two kinds of feature vectors are joined together as the input matrix of the LSTM model.
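The K-max pooling step can be sketched as follows; whether the top-K activations keep their original order is not specified in the paper, so preserving order here is an assumption:

```python
import torch

def k_max_pooling(z, k):
    """Keep the k largest activations of each filter along the time axis,
    preserving their original order (an assumption, see above)."""
    # z: (batch, n_filters, seq_len)
    idx = z.topk(k, dim=2).indices.sort(dim=2).values
    return z.gather(2, idx)                # (batch, n_filters, k)

z = torch.randn(4, 150, 19)                # output of a window-2 convolution
l, f_s = 20, 2
k = (l - f_s + 1) // 2                     # K from Eq. (3), rounded down
pooled = k_max_pooling(z, k)
print(pooled.shape)                        # torch.Size([4, 150, 9])
```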
E. BILSTM LAYER
After the convolution operation, we use a BiLSTM to extract the hidden semantics of the words in the sentence and the target; it can also capture long-distance dependency information in sentences. Its core is to use memory cells to remember long-term historical information and to manage it with a gate mechanism. The gate structure does not provide information but limits the amount of information; in fact, the gating mechanism is a multi-level feature selection method. The expressions of the gates and memory cells in the gate mechanism are as follows:
$f_i = \sigma(W_f \cdot [x_i; h_{i-1}] + b_f)$  (4)

$I_i = \sigma(W_I \cdot [x_i; h_{i-1}] + b_I)$  (5)

$\tilde{C}_i = \tanh(W_C \cdot [x_i; h_{i-1}] + b_C)$  (6)

$C_i = f_i \odot C_{i-1} + I_i \odot \tilde{C}_i$  (7)

$o_i = \sigma(W_o \cdot [x_i; h_{i-1}] + b_o)$  (8)

$h_i = o_i \odot \tanh(C_i)$  (9)
where $f_i$, $I_i$, and $o_i$ are the forget gate, input gate, and output gate, respectively; $W_f$, $W_I$, $W_o$, $W_C$ and $b_f$, $b_I$, $b_o$, $b_C$ are the weight matrices and biases of each gate; $C_i$ is the cell state, and $h_i$ is the hidden output. A single LSTM usually encodes a sentence from just one direction. However, two LSTMs can be used as a bidirectional encoder, called a bidirectional LSTM (BiLSTM). For a sentence x = [x1, x2, …, xn], a forward LSTM produces a sequence of hidden states $\overrightarrow{h}_s \in \mathbb{R}^{n \times d_h}$ from the CNN feature representation vectors, and a backward LSTM generates another sequence $\overleftarrow{h}_s \in \mathbb{R}^{n \times d_h}$. In the BiLSTM network, the final output hidden states $h_s \in \mathbb{R}^{n \times 2d_h}$ are generated by concatenating $\overrightarrow{h}_s$ and $\overleftarrow{h}_s$:

$\overrightarrow{h}_s = \overrightarrow{LSTM}([v_1; v_2; ...; v_n])$  (10)

$\overleftarrow{h}_s = \overleftarrow{LSTM}([v_1; v_2; ...; v_n])$  (11)

$h_s = [\overrightarrow{h}_s, \overleftarrow{h}_s]$  (12)

Similarly, we obtain the hidden states $h_t \in \mathbb{R}^{m \times 2d_h}$ of the aspect words.
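A minimal sketch of the bidirectional encoder with the paper's hidden size of 150 per direction; the input dimension (the fused CNN features) is an assumption:

```python
import torch
import torch.nn as nn

# bidirectional=True runs a forward and a backward LSTM and concatenates
# their hidden states, giving h_s in R^{n x 2*150} as in Eqs. (10)-(12).
bilstm = nn.LSTM(input_size=300, hidden_size=150,
                 batch_first=True, bidirectional=True)

v = torch.randn(4, 20, 300)   # CNN feature representation vectors (batch, n, d)
h_s, _ = bilstm(v)
print(h_s.shape)              # torch.Size([4, 20, 300]) = (batch, n, 2*d_h)
```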
F. FEA MECHANISM
To improve the accuracy of sentiment analysis, both the coattention model [2] and IAN [4] noted that it is significant to capture the connection between aspects and context. Besides, the contributions of a sentence's words to the sentiment polarity of its different aspects should be different [32]. Inspired by this, we use a feature enhanced attention mechanism to track the information in the context that is most related to the sentiment polarity of a given aspect.
1) CONTEXT-TARGET INTERACTIVE ATTENTION
Combined with the hidden representations of the context and aspect words generated by the BiLSTM, we calculate the attention weights of the sentences through the FEA module.
Given the target representation $h_t \in \mathbb{R}^{m \times 2d_h}$ and the sentence representation $h_s \in \mathbb{R}^{n \times 2d_h}$, we first compute a pairwise interaction matrix:

$I = h_s \cdot h_t^T$  (13)

where the value of each entry represents the association between a sentence word and an aspect-target word. Through column-wise softmax and row-wise softmax, we get the attention matrix $\alpha$ of the target to the sentence and the attention matrix $\beta$ of the sentence to the aspect-target:

$\alpha_{ij} = \frac{\exp(I_{ij})}{\sum_i \exp(I_{ij})}$  (14)

$\beta_{ij} = \frac{\exp(I_{ij})}{\sum_j \exp(I_{ij})}$  (15)

After column-wise averaging of $\beta$, we get the aspect-level attention $\bar{\beta} \in \mathbb{R}^m$ that represents the important parts of the aspect-target:

$\bar{\beta}_j = \frac{1}{n}\sum_i \beta_{ij}$  (16)

The final sentence-level attention $\gamma \in \mathbb{R}^n$ calculates each aspect-to-sentence attention by weighting. By explicitly considering the contribution of each word to the aspect, we get the weight of each word in the sentence:

$\gamma = \alpha \cdot \bar{\beta}^T$  (17)
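A sketch of Eqs. (13)-(17) for a single sentence-aspect pair (the dimensions follow the text; the function name is ours):

```python
import torch

def fea_attention(h_s, h_t):
    """Pairwise interaction, column-/row-wise softmax, column-wise average,
    and the final sentence-level attention, Eqs. (13)-(17)."""
    I = h_s @ h_t.T                      # Eq. (13): (n, m) interaction matrix
    alpha = torch.softmax(I, dim=0)      # Eq. (14): column-wise softmax (target-to-sentence)
    beta = torch.softmax(I, dim=1)       # Eq. (15): row-wise softmax (sentence-to-target)
    beta_bar = beta.mean(dim=0)          # Eq. (16): column-wise average, shape (m,)
    gamma = alpha @ beta_bar             # Eq. (17): sentence-level attention, shape (n,)
    return gamma

h_s = torch.randn(20, 300)               # sentence hidden states, n x 2*d_h
h_t = torch.randn(3, 300)                # aspect hidden states,  m x 2*d_h
gamma = fea_attention(h_s, h_t)
print(gamma.shape, float(gamma.sum()))   # torch.Size([20]) 1.0 (gamma sums to one)
```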
2) LOCATION-ENHANCED ATTENTION
Although our method works well for calculating aspect-level and sentence-level attention, one drawback remains: the aspect-level attention weights are computed with an averaging operation, which introduces interference information and reduces classification accuracy.
To solve this problem, we introduce position information into the attention calculation, with the intuition that the closer a word is to the aspect-target word, the more significant its contribution to the prediction. Besides, adding position information to the sentence-level attention makes the model rely more on the context words around the aspect-target word to calculate the critical sentiment features of the aspect.
This method is inspired by the work of Liu et al. [38]. First, we define $D_i$ to measure the relative distance between the current word and the aspect word. For instance, in "the food was great, but the service was terrible," for the aspect "food," the relative distance of "great" to "food" is 2 and the distance of "terrible" is 8.
We also define $E_i$ to measure the location significance of words relative to the aspect:

$E_i = 1 - \frac{|D_i|}{n}$  (18)

where n is the length of the sentence.
$E_i$ is used to initialize the context representation before incorporating the location information into the attention mechanism. We first normalize $E_i$:

$p_i = \frac{E_i}{\sum_{i=1}^{n} E_i}$  (19)

Then we get the context representation through the weighting function:

$h_s^{normal} = \sum_{i=1}^{n} p_i h_{si}$  (20)

where $h_{si}$ is the hidden representation of the context word $x_i$ encoded by the BiLSTM.
We add the location information to the sentence-level attention, using it to modify the attention weights to concentrate on the words around the aspect:

$R_i = \gamma_i \cdot E_i$  (21)

After normalization, we get the new sentence-level attention:

$\tilde{\gamma}_i = \frac{R_i}{\sum_{i=1}^{n} R_i}$  (22)
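A sketch of the location-enhanced step, Eqs. (18), (21), and (22); reading $D_i$ as the distance to the nearest aspect word is our interpretation of the example above:

```python
import torch

def location_enhanced(gamma, aspect_positions, n):
    """Weight the sentence-level attention by closeness to the aspect
    words and renormalize, per Eqs. (18), (21)-(22)."""
    pos = torch.arange(n, dtype=torch.float)
    # |D_i|: distance from word i to the nearest aspect word (our reading)
    D = torch.stack([(pos - p).abs() for p in aspect_positions]).min(dim=0).values
    E = 1.0 - D / n                      # Eq. (18)
    R = gamma * E                        # Eq. (21)
    return R / R.sum()                   # Eq. (22)

gamma = torch.softmax(torch.randn(10), dim=0)        # sentence-level attention
new_gamma = location_enhanced(gamma, aspect_positions=[1], n=10)
print(new_gamma.sum())                                # tensor(1.)
```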
G. OUTPUT AND TRAINING
The final sentence representation is obtained by using the sentence attention of the FEA module to weight the hidden semantic states of the sentence:

$r = \tilde{\gamma}^T \cdot h_s$  (23)

We treat this sentence representation as the final classification feature and feed it into a linear layer, projecting r into the space of the target classes C:

$x = W_l \cdot r + b_l$  (24)

where $W_l$ and $b_l$ are the weight matrix and bias. After the linear layer, we use the softmax layer to calculate the probability that the sentence s has sentiment polarity $c \in C$ for the aspect a:

$P(y = c) = \frac{\exp(x_c)}{\sum_{i \in C} \exp(x_i)}$  (25)

The final predicted sentiment polarity of an aspect target is simply the label with the highest probability. We minimize the cross-entropy loss with L2 regularization to train our model:

$loss = -\sum_i \sum_{c \in C} I(y_i = c) \cdot \log P(y_i = c) + \lambda \|\theta\|^2$  (26)

where $I(\cdot)$ is the indicator function, $\lambda$ is the L2 regularization parameter, and $\theta$ is the set of weight matrices of the LSTM networks and linear layers. We further apply dropout to avoid overfitting, randomly dropping part of the input of the BiLSTM units.
Using the Adam [39] update rule, mini-batch stochastic gradient descent is used to minimize the loss function over the weight matrices and bias terms of the model.
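A compact sketch of the output layer and objective, Eqs. (23)-(26); implementing the L2 term via Adam's weight_decay is our shortcut rather than the paper's explicit loss term:

```python
import torch
import torch.nn as nn

linear = nn.Linear(300, 3)                       # C = {Positive, Negative, Neutral}
optimizer = torch.optim.Adam(linear.parameters(),
                             lr=0.01, weight_decay=1e-4)  # stands in for lambda*||theta||^2
criterion = nn.CrossEntropyLoss()                # softmax + cross-entropy, Eqs. (25)-(26)

h_s = torch.randn(20, 300)                       # BiLSTM hidden states, n x 2*d_h
gamma = torch.softmax(torch.randn(20), dim=0)    # final attention weights
r = gamma @ h_s                                  # Eq. (23): sentence representation
x = linear(r.unsqueeze(0))                       # Eq. (24): logits over classes
loss = criterion(x, torch.tensor([0]))           # true label index
loss.backward()
optimizer.step()
```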
IV. EXPERIMENT
In this section, we describe the experimental settings and demonstrate the validity of the proposed model through a series of comparative experiments.
A. DATASET
We evaluate the proposed model on three public datasets: the restaurant and laptop reviews from SemEval 2014 [23], which have been widely used in previous studies, and the Twitter dataset collected by Dong et al. [25]. The statistics of these datasets are presented in Table I. For the sake of data balance, we removed the examples with the "conflict" sentiment polarity, for example, "Certainly not the best sushi in New York, however, it is always fresh" [2]. The evaluation metrics are classification accuracy and F1 score.
TABLE I
STATISTICS OF THE RESTAURANT, LAPTOP, AND TWITTER DATASETS.

Dataset      Positive       Negative       Neutral
             Train   Test   Train   Test   Train   Test
Restaurant   2164    728    898     196    637     196
Laptop       987     340    866     128    460     169
Tweet        1411    173    1411    173    2826    346
B. EXPERIMENT SETTINGS
In the experiment, we first randomly select 20% of the
training data as the verification set to adjust the
hyperparameters. All weights are randomly initialize by a
uniform distribution of U (−104, 104) with all bias terms set
to zero. The L2 regularization rate is set to 104 and, the
dropout is set to 0.2 [15]. The word embedding is initialized
with a 300-dimensional pre-trained word vector and fixed
during training. For words outside the vocabulary, we use
random distribution U (−0.25, 0.25) to initialize randomly, as
done in [9]. The dimension of the LSTM hidden state is set to
150. The initial learning rate of Adam optimizer is 0.01 [39].
If the training loss does not decrease after every three stages,
our learning rate will be reduced by half. The batch size is set
to 25.
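For reference, the stated hyperparameters collected in one place (the dictionary is our own convenience structure, not the authors' code):

```python
# Hyperparameters as stated in Section IV-B.
HPARAMS = {
    "val_split": 0.20,                        # 20% of training data for validation
    "weight_init": ("uniform", -1e-4, 1e-4),  # biases set to zero
    "l2_rate": 1e-4,
    "dropout": 0.2,
    "emb_dim": 300,                           # pre-trained, fixed during training
    "oov_init": ("uniform", -0.25, 0.25),
    "lstm_hidden": 150,
    "lr": 0.01,                               # Adam; halved if loss plateaus 3 epochs
    "batch_size": 25,
}
```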
C. MODEL COMPARISON
Our method and previous ABSA methods are trained and evaluated on these three datasets. We use accuracy and macro-F1 to evaluate the models. To further validate the performance of the model, we compare it with several baseline approaches. To exclude the influence of word vectors on the experiments, all the baseline models in this paper use the same word vector model, the pre-trained IWV.
SVM [5]: the classic SVM model using a series of manual features, which achieved the best results in SemEval-2014 Task 4.
CNN: the convolutional neural network model proposed by Kim [9].
LSTM [28]: uses a single LSTM to model the sentence; the last hidden state is used as the sentence representation for the final classification.
TD-LSTM [28]: uses BiLSTM to encode the sequential structure of the sentence and represents the given target using a vector that averages the hidden outputs over the target instance.
ATAE-LSTM [33]: an attention-based LSTM network proposed by Wang et al. It uses pre-trained word2vec vectors as input and combines aspect information with attention mechanisms. The model focuses on specific aspects during training and effectively identifies sentiment polarity.
BILSTM-ATT-G [19]: models left and right contexts using two attention-based LSTMs and introduces gates to measure the importance of the left context, right context, and the entire sentence for the prediction.
RAM [18]: a multilayer architecture where each layer consists of attention-based aggregation of word features and a GRU cell to learn the sentence representation.
IAN [4]: an LSTM-based network that uses attention modules to obtain significant information and generate representations of aspect and context separately in an interactive way.
AF-LSTM [20]: learns to attend according to the associations between sentences and aspects, which allows the model to concentrate on the correct words given an aspect term adaptively.
TNET [40]: uses CNN as a feature extractor for the classification problem and relies on context-preserving and location-correlation mechanisms to maintain the advantages of LSTM.
IAN-NN: inspired by the TNET model, we added a CNN layer to IAN to extract significant features.
FEA-NN*: the model proposed in this paper without the location attention.
D. MAIN RESULT
As can be seen from Table II, our model FEA-NN maintains the best performance on all datasets, which verifies the effectiveness of the entire model. Meanwhile, most methods perform poorly, mainly because they cannot obtain the target information accurately, especially the models that use average pooling, which ignores the target information, such as TD-LSTM, ATAE-LSTM, AF-LSTM, BILSTM-ATT-G, IAN, and RAM. TD-LSTM improves performance by approximately 2% compared to LSTM; it builds on LSTM by using a forward LSTM and a backward LSTM to model the left and right contexts of an aspect, but it may be incapable of capturing the interactions between aspects and contexts, resulting in poor performance. ATAE-LSTM improves performance by approximately 2-3% compared with TD-LSTM because it uses attention mechanisms to model the relationship between aspect and context. AF-LSTM adopts circular convolution and circular correlation to model the similarity between the aspect and the words, significantly improving performance. BILSTM-ATT-G improves performance by approximately 2% compared with AF-LSTM, close to IAN, because it uses two attention-based LSTMs and introduces gates to measure the importance of the left and right contexts and the entire sentence. IAN achieves better results because it models aspect representations with attention modules while modeling the context, which makes better use of aspect information than AF-LSTM. The performance of RAM is
2% higher than that of IAN, close to TNET, because it uses multiple attention mechanisms to capture distant sentiment features, which makes it more robust to interference from uncorrelated information. The performance of TNET is 2% higher than that of IAN; instead of using an attention mechanism, TNET uses CNN to extract significant features from BiLSTM's context representation and adds a component between the two layers to maintain the original context information, effectively avoiding the natural defects of earlier attention mechanisms.
Among our proposed models, FEA-NN obtains the best classification accuracies on the three datasets. IAN-NN combines the advantages of the CNN feature extraction in TNET and the interactive attention mechanism in IAN and achieves better results, which proves the effectiveness of introducing the CNN layer; but the average pooling in IAN limits further improvement. FEA-NN* uses FEA without the location attention mechanism and also achieves better results, but its improvement over IAN-NN is limited because, lacking the location attention mechanism, its ability to distinguish the importance of the context words related to the aspect still needs to be improved. As expected, FEA-NN can better predict the sentiment polarity of a given aspect in a sentence by considering the attention mechanism with location information between aspect words and context words. The above results show that adding an attention mechanism on top of the interaction between aspect and context words can better distinguish the importance of different words to the aspect and helps improve the accuracy of sentiment prediction.
TABLE II
COMPARISON WITH OTHER MODELS.

Model           Restaurant      Laptop          Twitter
                Acc     F1      Acc     F1      Acc     F1
CNN             62.33   57.17   57.14   53.43   63.20   64.12
LSTM            71.62   68.54   63.25   59.89   59.31   55.52
SVM             72.23   67.65   64.34   61.43   62.23   61.72
TD-LSTM         73.56   65.90   66.90   61.73   -       -
ATAE-LSTM       76.82   72.56   67.93   60.81   -       -
AF-LSTM         77.13   72.85   72.32   68.21   66.60   60.82
BILSTM-ATT-G    77.82   72.33   72.61   67.42   67.52   62.47
IAN             78.33   73.12   73.64   67.31   -       -
RAM             78.95   74.21   74.10   68.42   67.94   63.02
TNET            80.79   74.32   75.21   69.73   68.63   64.31
IAN-NN          81.99   75.21   76.21   70.13   70.71   65.85
FEA-NN*         82.21   76.05   76.43   70.62   71.10   67.31
FEA-NN          83.21   77.57   78.55   72.65   73.31   68.67
E. EFFECT OF CNN-BILSTM NETWORK LAYERS ON THE ACCURACY
Choosing a suitable number of CNN-BiLSTM network layers is important for improving the accuracy of the model, so we determined the number of layers through experiments. The more layers there are, the more accurately the mapping can be represented, but the more difficult it is to reach the global optimum. We experimented with 1 to 10 layers and selected 2. Figure 3 shows the details.
FIGURE 3. Effect of the number of CNN-BiLSTM network layers on the accuracy of sentiment analysis.
F. COMPARISON OF RESULTS USING DIFFERENT WORD EMBEDDING MODELS
For sentiment analysis tasks, it is crucial to choose a high-quality word embedding model. We combine the neural network model proposed in this paper with different word vector models to select the most suitable vectors, so that our model can achieve its best performance. The experimental results are shown in Table III.
Word2vec: an open-source tool proposed by Google in 2013 to represent words as real-valued vectors; it is a typical representative of methods that predict word vectors from context information.
GloVe: uses statistical methods to model the co-occurrence frequencies of words and their context words.
fastText: assumes that words are composed of letters and that words with similar letter structures should share features; it builds on Word2vec and adds subword (letter) information.
word2gm: drawing on Word2vec, it assumes that words may have different semantics in different contexts (polysemy), so a single vector per word is not enough; it considers multiple embeddings per word and learns each sub-vector with a Gaussian mixture model.
Prob-fasttext: fastText considers letter information but not polysemy; Prob-fasttext mixes the ideas of word2gm and fastText, representing each word by two embeddings, one for its letters and one for the word itself.
TABLE III
COMPARISON OF RESULTS USING DIFFERENT WORD EMBEDDING MODELS.

Model                  Restaurant   Laptop   Twitter
FEA-NN+fasttext        70.99        65.41    60.41
FEA-NN+glove           73.14        68.93    63.14
FEA-NN+word2gm         74.51        69.38    64.71
FEA-NN+prob-fasttext   76.23        71.34    66.23
FEA-NN+word2Vec        80.13        76.93    69.24
FEA-NN+IWV             83.21        78.55    73.31
It can be seen from the experimental results that FEA-NN using the IWV word vector model achieves the best sentiment classification accuracy.
G. CASE STUDY
In Figure 4, we list five examples of test sets that analyze the
essential aspects of emotional polarity words, and we see the
last sentence note vector in Figure 4. Color depth indicates
the importance of a word in a sentence. The darker the color,
the more critical it is. In the first two examples, there are two
aspects "screen" and "battery." In the sentence "The screen
looks great, but the battery life is too short." We can observe
that when there are two aspects in a sentence, our model
automatically points to the correct sentiment for the word
representing each aspect, as in the third and fourth examples.
In the last example, the aspect is the phrase "boot time." Boot
time is super fast, around anywhere from 35 seconds to 1
minute. The color depth denotes the importance degree of the
weight in attention vector γ.
[Figure 4: attention heatmaps (weights from 0.0 to 1.0) for aspect-sentence pairs: "screen" and "battery" in "The screen looks great, but the battery life is too short"; "food" and "service" in "Great food but service was dreadful"; "boot time" in "Boot time is super fast, around anywhere from 35 seconds to 1 minute."]
FIGURE 4. Examples of final attention weights for sentences.
V. CONCLUSION
In this paper, a novel neural network structure for the ABSA task is proposed. The model uses CNN to provide the BiLSTM with a higher-quality text representation and uses the FEA attention module to encode the target and the complete sentence. Target-level attention learns to focus on the salient emotional part of the target expression and generates a more accurate representation of the target. Sentence-level attention searches for the relevance between the target and the aspect across the whole sentence, and location-enhanced attention helps to better distinguish the importance of words for sentiment prediction based on the distance between context words and aspect words. We evaluated the model on three datasets. The experimental results verify the effectiveness of the proposed neural network and show that the model achieves state-of-the-art performance.
REFERENCES
[1] Lei Zhang and Bing Liu. "Sentiment Analysis and Opinion Mining." Synthesis Lectures on Human Language Technologies 30.1 (2011): 167.
[2] Chao Yang, Hefeng Zhang, Bin Jiang and Keqin Li.
"Aspect-based sentiment analysis with alternating
coattention networks." Information Processing &
Management 56.3(2019):463-478.
[3] Schouten, Kim, and F. Frasincar. "Survey on Aspect-
Level Sentiment Analysis." IEEE Educational Activities
Department, 2016.
[4] Dehong Ma, Sujian Li, Xiaodong Zhang and Houfeng Wang. "Interactive Attention Networks for Aspect-Level Sentiment Classification." Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence (IJCAI 2017), 4069-4074.
[5] Svetlana Kiritchenko, Xiaodan Zhu, Colin Cherry and Saif M. Mohammad. "Detecting Aspects and Sentiment in Customer Reviews." International Workshop on Semantic Evaluation, NRC, Canada, 2014.
[6] Joachim Wagner, Piyush Arora, Santiago Cortes, Utsab Barman, Dasha Bogdanova, Jennifer Foster and Lamia Tounsi. "Aspect-based polarity classification for SemEval task 4." Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014), 223-229.
[7] Bahdanau, Dzmitry, Kyunghyun Cho and Yoshua
Bengio. "Neural Machine Translation by Jointly Learning
to Align and Translate." Computer Science (2014).
[8] Nal Kalchbrenner, Edward Grefenstette and Phil
Blunsom. "A Convolutional Neural Network for
Modelling Sentences." ArXiv: 1404.2188.
[9] Kim, Y. "Convolutional neural networks for sentence classification." Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP 2014), 1746-1751.
[10] Qiao Qian, Minlie Huang and Xiaoyan Zhu. "Linguistically regularized LSTMs for sentiment classification." Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (ACL 2017), 1679-1689.
[11] Sebastian Ruder, Parsa Ghaffari and John G. Breslin. "A hierarchical model of reviews for aspect-based sentiment analysis." Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (EMNLP 2016), 999-1005.
[12] Peng Zhou, Zhenyu Qi, Suncong Zheng, Jiaming Xu, Hongyun Bao and Bo Xu. "Text classification improved by integrating bidirectional LSTM with two-dimensional max pooling." Proceedings of the 26th International Conference on Computational Linguistics (COLING 2016), 3485-3495.
[13] Alex Graves, Abdel-rahman Mohamed and Geoffrey
Hinton. "Speech Recognition with Deep Recurrent
Neural Networks." 2013 IEEE International Conference
on Acoustics, Speech and Signal Processing IEEE, 2013.
[14] Kai Sheng Tai, Richard Socher and Christopher D.
Manning. "Improved Semantic Representations from
Tree-Structured Long Short-Term Memory Networks."
Computer Science 5.1(2015): 36.
[15] Gang Liu and Jiabao Guo. "Bidirectional LSTM with attention mechanism and convolutional layer for text classification." Neurocomputing 337 (2019): 325-338.
[16] Zhouhan Lin, Minwei Feng, Cicero Nogueira dos Santos,
Mo Yu, Bing Xiang, Bowen Zhou and Yoshua Bengio.
"A structured self-attentive sentence embedding."
International conference on learning representations
(ICLR 2017).
[17] Zichao Yang, Diyi Yang, Chris Dyer, Xiaodong He, Alex Smola and Eduard Hovy. "Hierarchical attention networks for document classification." Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL 2016), 1480-1489.
[18] Peng Chen, Zhongqian Sun, Lidong Bing and Wei Yang. "Recurrent attention network on memory for aspect sentiment analysis." Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (EMNLP 2017), 452-461.
[19] Jiangming Liu and Yue Zhang. "Attention modeling for targeted sentiment." Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2017), 572-577.
[20] Yi Tay, Anh Tuan Luu and Siu Cheung Hui. "Learning to attend via word-aspect associative fusion for aspect-based sentiment analysis." Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI 2018), 5956-5963.
[21] Yiming Cui, Zhipeng Chen, Si Wei, Shijin Wang, Ting
Liu and Guoping Hu. "Attention-over-attention neural
networks for reading comprehension." In Proceedings of
the 55th Annual Meeting of the Association for
Computational Linguistics. pp. 593-602 (2017).
[22] Seyed Mahdi Rezaeinia, Rouhollah Rahmani, Ali
Ghodsi, Hadi Veisi. "Sentiment analysis based on
improved pretrained word embeddings." Expert Systems
with Applications 117 (2019) 139-147.
[23] Maria Pontiki, Dimitris Galanis, John Pavlopoulos, Harris Papageorgiou, Ion Androutsopoulos and Suresh Manandhar. "SemEval-2014 Task 4: Aspect Based Sentiment Analysis." Proceedings of the International Workshop on Semantic Evaluation (SemEval 2014), 27-35.
[24] Xin Li, Lidong Bing, Piji Li, Wai Lam. "A Unified
Model for Opinion Target Extraction and Target
Sentiment Prediction." AAAI 2019.
[25] Li Dong, Furu Wei, Chuanqi Tan, Duyu Tang, Ming Zhou and Ke Xu. "Adaptive recursive neural network for target-dependent Twitter sentiment classification." Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Short Papers). ACL, Baltimore, Maryland, 49-54.
[26] Thien Hai Nguyen and Kiyoaki Shirai. "PhraseRNN:
Phrase recursive neural network for aspect-based
sentiment analysis." In Proc. of the 2015 Conference on
Empirical Methods in Natural Language Processing.
ACL, Lisbon, Portugal, 2509-2514.
[27] Meishan Zhang, Yue Zhang and Duyu Tang. "Gated neural networks for targeted sentiment analysis." Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence. AAAI Press, Phoenix, Arizona, USA, 3087-3093.
[28] Duyu Tang, Bing Qin, Xiaocheng Feng and Ting Liu. "Effective LSTMs for target-dependent sentiment classification." Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers. Osaka, Japan, 3298-3307.
[29] Søren Kaae Sønderby, Casper Kaae Sønderby, Henrik Nielsen and Ole Winther. "Convolutional LSTM networks for subcellular localization of proteins." Proceedings of the 2nd International Conference on Algorithms for Computational Biology, Springer Verlag, Mexico City, Mexico, 2015, pp. 68-80.
[30] Jin Wang, Liang-Chih Yu, K. Robert Lai, Xue-Jie Zhang.
"Dimensional sentiment analysis using a regional CNN-
LSTM model." Association for Computational
Linguistics (ACL), Berlin, Germany, 2016, pp. 225-230.
[31] Chunting Zhou, Chonglin Sun, Zhiyuan Liu and Francis
C.M. Lau. "A C-LSTM Neural Network for Text
Classification" [J]. Computer Science, 2015, 1(4):39-44.
[32] Duyu Tang, Bing Qin and Ting Liu. "Aspect level sentiment classification with deep memory network." Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (EMNLP 2016), 214-224.
[33] Yequan Wang, Minlie Huang, Xiaoyan Zhu and Li Zhao. "Attention-based LSTM for aspect-level sentiment classification." In EMNLP 2016, 606-615 (2016).
[34] Min Yang, Wenting Tu, Jingxuan Wang, Fei Xu and
Xiaojun Chen. "Attention-based LSTM for target
dependent sentiment classification." Proceedings of
AAAI, 2017: 5013-5014.
[35] Yukun Ma, Haiyun Peng and Erik Cambria. "Targeted aspect-based sentiment analysis via embedding commonsense knowledge into an attentive LSTM." Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI 2018), 5876-5883.
[36] Syeda Rida-e-Fatima, Ali Javed, Ameen Banjar, Aun
Irtaza, Hassan Dawood, Hussain Dawood and Abdullah
Alamri. "A Multi-Layer Dual Attention Deep Learning
Model with Refined Word Embeddings for Aspect-Based
Sentiment Analysis." IEEE Access, 2019.
[37] Liang-Chih Yu, Jin Wang, K. Robert Lai and Xuejie Zhang. "Refining Word Embeddings Using Intensity Score for Sentiment Analysis." IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2017, PP(99):1-1.
[38] Qiao Liu, Haibin Zhang, Yifu Zeng and Ziqi Huang.
"Content Attention Model for Aspect Based Sentiment
Analysis." The 2018 World Wide Web Conference, 2018.
[39] Adam Kosiorek. "Attention mechanism in neural
networks." Robot Industry, (6), 12-17 (2017).
[40] Xin Li, Lidong Bing, Wai Lam and Bei Shi. "Transformation Networks for Target-Oriented Sentiment Classification." ACL 2018.
WEI MENG was born in 1995. He is currently
pursuing the master’s degree in the College of
Computer Science and Engineering, Shandong
Normal University, China. His research interests
include natural language processing, data mining
and sentiment analysis.
YONGQING WEI received the bachelor's degree
from Shandong Normal University in 1983. She is
currently a second-level professor and master's
supervisor with the Basic Education Department,
Shandong Police College, China. She has been
recognized as a middle-aged and young expert with
outstanding contributions in Shandong Province, an
outstanding policeman of Shandong Province, a
top-notch scientist in Shandong public security, and an
advanced individual in Shandong public security
science and technology work. Her research interests include network
information security, information retrieval, natural language processing,
and artificial intelligence.
PEIYU LIU received the master's degree from East
China Normal University in 1986. He is currently a
second-level professor and doctoral supervisor with
the School of Information Science and Engineering,
Shandong Normal University, China. He has been
recognized as a national outstanding science and
technology worker, a middle-aged and young expert
with outstanding contributions in Shandong Province,
and a famous teacher of Shandong Province. His
research interests include network information security, information
retrieval, natural language processing, and artificial intelligence.
ZHENFANG ZHU received the Ph.D. degree from
Shandong Normal University in 2012 and was a
postdoctoral fellow at Shandong University
between 2012 and 2015. He is currently an
associate professor and master's supervisor with
the School of Information Science and Electrical
Engineering, Shandong Jiaotong University, China.
His research interests include network information
security, natural language processing, and applied
linguistics.
HONGXIA YIN was born in 1993. She is currently
pursuing the master’s degree in the College of
Computer Science and Engineering, Shandong
Normal University, China. Her research interests
include natural language processing, data mining
and textual sentiment analysis.