Conference Paper

ECNU at SemEval-2017 Task 4: Evaluating Effective Features on Machine Learning Methods for Twitter Message Polarity Classification

... In comparison to baseline techniques, an increase of 3.4% in accuracy and 2.4% in F1 score was achieved. Yunxiao Zhou et al. [60] used a set of features, viz. NLP (natural language processing), domain-specific, and word embedding features, for sentiment classification with a supervised method. ...
... Complete background of user [16,18,33,49,60] 4. ...
... Linguistic features like n-grams and part-of-speech tags [60] 12. ...
Article
People's opinion-seeking behavior in support of good decision making has greatly enhanced the importance of social media as a platform for exchanging information. This trend has led to a sudden spurt of information overflow on the Web, and the huge volume of such information has to be processed to segregate the relevant knowledge. Sentiment analysis is a popular method used extensively for this purpose. It is defined as the computational study of mining opinions about an entity of interest from the available content. Existing sentiment analysis techniques capture opinions quite efficiently from text written in syntactically correct and explicit language. However, their performance is limited on informal data. To deal with the imperfect and indirect language used by netizens, it has become necessary to improve existing sentiment analysis techniques. In this regard, conventional sentiment analysis techniques have shown some improvement when appropriate context information is applied. However, there is still ample scope for further research on finding the relevant “context” and applying it to a given scenario. This systematic literature review explores and analyzes the existing work on context-based sentiment analysis and reports gaps and future directions in this research area.
... The proposed system ranked 10th in the competition and achieved an F1-score of 0.624. Yunxiao Zhou et al. [28] reported a system for the sentiment analysis (SA) in Twitter task of the SemEval-2017 competition. They investigated various traditional Natural Language Processing (NLP) features (Word RF n-grams, POS tag, Negation), domain-specific features (All-caps, Bag-of-Hashtags, Elongated, Emoticon, Punctuation) and word embedding features (GoogleW2V, GloVe, sentiment word vector (SWV), sentiment-specific word embedding (SSWE)) along with supervised machine learning techniques (SVM, AdaBoost, Logistic Regression (LR) and SGD) to address this task. ...
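The domain-specific features named in the excerpt above (All-caps, Elongated, Emoticon, Punctuation, Bag-of-Hashtags) can be illustrated with a minimal Python sketch. The function name, the emoticon sets, and the exact counting rules are assumptions made for illustration; the excerpt does not give the cited system's definitions.

```python
import re

# Hypothetical emoticon sets for illustration; not the cited system's lexicon.
POSITIVE_EMOTICONS = {":)", ":-)", ":D", ";)"}
NEGATIVE_EMOTICONS = {":(", ":-(", ":'("}


def domain_features(tweet: str) -> dict:
    """Count simple tweet-specific cues on whitespace-separated tokens."""
    tokens = tweet.split()
    return {
        # Fully upper-case alphabetic tokens longer than one character.
        "all_caps": sum(
            1 for t in tokens if len(t) > 1 and t.isalpha() and t.isupper()
        ),
        # Tokens containing a letter repeated three or more times ("soooo").
        "elongated": sum(
            1 for t in tokens if re.search(r"([A-Za-z])\1{2,}", t)
        ),
        "pos_emoticons": sum(1 for t in tokens if t in POSITIVE_EMOTICONS),
        "neg_emoticons": sum(1 for t in tokens if t in NEGATIVE_EMOTICONS),
        # Runs of two or more exclamation or question marks.
        "punct_runs": len(re.findall(r"[!?]{2,}", tweet)),
        # Lower-cased hashtags, usable as a bag-of-hashtags vocabulary.
        "hashtags": [
            t.lower() for t in tokens if t.startswith("#") and len(t) > 1
        ],
    }


features = domain_features("I LOVE this phone soooo much!!! :) #happy")
```

Counts like these would typically be concatenated with n-gram and embedding features before being passed to a classifier such as an SVM.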
Article
A Bayesian belief network is an effective and practical approach that is widely accepted for real-time series prediction and decision making. However, its computational effort and complexity increase exponentially with the number of states. Hence, this research paper designs an approach inspired by context-based persuasion analysis of sentiment and its impact on the propagation of false information. Social media text contains unwanted information that needs to be addressed, and the polarity of a sentiment-wise ambiguous word in generic contexts must be predicted effectively. Therefore, the proposed approach considers a persuasion-based strategy, based on the social media crowd, for analyzing the impact of contextual sentiment polarity in social media, including pre-processing. A Bayesian belief network is used to analyze sentiment polarity, whereas Turbo Parser is used for visual representation of the diverse feature classes and to capture the relationships between features. Furthermore, to analyze each word's lexical dependencies in context, a tree-based dependency parser representation is used to compute the dependency score. Features associated with sentiment words are extracted using the Penn Treebank for sentiment polarity disambiguation. Therefore, a graphical model, Bayesian network learning, is adopted in the proposed approach to account for the dependencies among the various lexicons. Three predictors are introduced: (1) pre-processing and subjectivity normalization, (2) computation of the threshold and persuasion factor, and (3) extraction of sentiments from dependency parsing of the retrieved text. The findings of this study indicate that computing the local and global context of the various sentiment words is most important for analyzing the polarity of text.
Furthermore, we tested the proposed method on a standard data set and carried out a real case study based on COVID-19, Olympics-2020, and the Russia–Ukraine war to analyze the feasibility of the proposed approach. The findings imply a complex and context-dependent mechanism behind sentiment analysis, which sheds light on efforts to resolve contextual polarity disambiguation in social media.
Preprint
This paper describes the fifth year of the Sentiment Analysis in Twitter task. SemEval-2017 Task 4 continues with a rerun of the subtasks of SemEval-2016 Task 4, which include identifying the overall sentiment of the tweet, sentiment towards a topic with classification on a two-point and on a five-point ordinal scale, and quantification of the distribution of sentiment towards a topic across a number of tweets: again on a two-point and on a five-point ordinal scale. Compared to 2016, we made two changes: (i) we introduced a new language, Arabic, for all subtasks, and (ii) we made available information from the profiles of the Twitter users who posted the target tweets. The task continues to be very popular, with a total of 48 teams participating this year.
Conference Paper
We describe the design and use of the Stanford CoreNLP toolkit, an extensible pipeline that provides core natural language analysis. This toolkit is quite widely used, both in the research NLP community and also among commercial and government users of open source NLP technology. We suggest that this follows from a simple, approachable design, straight-forward interfaces, the inclusion of robust and good quality analysis components, and not requiring use of a large amount of associated baggage.
Conference Paper
In this paper, we describe our system for the Sentiment Analysis of Twitter shared task in SemEval 2014. Our system uses an SVM classifier along with a rich set of lexical features to detect the sentiment of a phrase within a tweet (Task-A) and also the sentiment of the whole tweet (Task-B). Starting from the lexical features that were used in the 2013 shared tasks, we enhance the underlying lexicon and also introduce new features. We focus our feature engineering effort mainly on Task-A. Moreover, we adapt our initial framework and introduce new features for Task-B. Our system reaches a weighted score of 87.11% in Task-A and 64.52% in Task-B. This places us 4th in Task-A and 15th in Task-B.
Conference Paper
We present a method that learns word embedding for Twitter sentiment classification in this paper. Most existing algorithms for learning continuous word representations typically only model the syntactic context of words but ignore the sentiment of text. This is problematic for sentiment analysis as they usually map words with similar syntactic context but opposite sentiment polarity, such as good and bad, to neighboring word vectors. We address this issue by learning sentiment-specific word embedding (SSWE), which encodes sentiment information in the continuous representation of words. Specifically, we develop three neural networks to effectively incorporate the supervision from sentiment polarity of text (e.g. sentences or tweets) in their loss functions. To obtain large-scale training corpora, we learn the sentiment-specific word embedding from massive distant-supervised tweets collected by positive and negative emoticons. Experiments on applying SSWE to a benchmark Twitter sentiment classification dataset in SemEval 2013 show that (1) the SSWE feature performs comparably with hand-crafted features in the top-performing system; (2) the performance is further improved by concatenating SSWE with the existing feature set.
Article
We introduce a novel approach for automatically classifying the sentiment of Twitter messages. These messages are classified as either positive or negative with respect to a query term. This is useful for consumers who want to research the sentiment of products before purchase, or companies that want to monitor the public sentiment of their brands. There is no previous research on classifying sentiment of messages on microblogging services like Twitter. We present the results of machine learning algorithms for classifying the sentiment of Twitter messages using distant supervision. Our training data consists of Twitter messages with emoticons, which are used as noisy labels. This type of training data is abundantly available and can be obtained through automated means. We show that machine learning algorithms (Naive Bayes, Maximum Entropy, and SVM) have accuracy above 80% when trained with emoticon data. This paper also describes the preprocessing steps needed in order to achieve high accuracy. The main contribution of this paper is the idea of using tweets with emoticons for distant supervised learning.
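The idea of using emoticons as noisy labels can be sketched in a few lines of Python. The emoticon tuples and the function name below are hypothetical, and two choices are assumptions not stated in the abstract: tweets with mixed or missing emoticons are discarded, and the emoticon is stripped from the text so that a classifier cannot simply memorise the label.

```python
# Hypothetical emoticon lists; the abstract does not enumerate them.
POSITIVE = (":)", ":-)", ":D")
NEGATIVE = (":(", ":-(")


def noisy_label(tweet: str):
    """Return (cleaned_text, label), or None if the tweet is unusable."""
    has_pos = any(e in tweet for e in POSITIVE)
    has_neg = any(e in tweet for e in NEGATIVE)
    # Skip tweets with no emoticon or with conflicting emoticons.
    if has_pos == has_neg:
        return None
    text = tweet
    for emoticon in POSITIVE + NEGATIVE:
        text = text.replace(emoticon, "")
    label = "positive" if has_pos else "negative"
    # Collapse the whitespace left behind by the removed emoticon.
    return " ".join(text.split()), label


raw_tweets = ["great day :)", "so sad :(", "hello", "mixed :) :("]
training_pairs = [p for p in map(noisy_label, raw_tweets) if p is not None]
```

The resulting pairs could then feed any supervised classifier, such as the Naive Bayes, Maximum Entropy, or SVM models the abstract mentions.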
Article
In the vector space model (VSM), text representation is the task of transforming the content of a textual document into a vector in the term space so that the document can be recognized and classified by a computer or a classifier. Different terms (i.e. words, phrases, or any other indexing units used to identify the contents of a text) have different importance in a text. Term weighting methods assign appropriate weights to the terms to improve the performance of text categorization. In this study, we investigate several widely used unsupervised (traditional) and supervised term weighting methods on benchmark data collections in combination with SVM and kNN algorithms. In consideration of the distribution of relevant documents in the collection, we propose a new simple supervised term weighting method, i.e. tf.rf, to improve the terms' discriminating power for the text categorization task. In the controlled experiments, the supervised term weighting methods show mixed performance. Specifically, our proposed supervised term weighting method, tf.rf, performs consistently better than the other term weighting methods, while the other supervised term weighting methods, based on information theory or statistical metrics, perform the worst in all experiments. On the other hand, the popularly used tf.idf method does not show uniformly good performance across data sets.
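As a rough illustration of tf.rf, the sketch below uses the relevance frequency as it is commonly stated for this method, rf = log2(2 + a / max(1, c)), where a and c are the numbers of positive-category and negative-category documents that contain the term. The abstract above does not give the formula, so treat it as an assumption to check against the full paper; the function names are hypothetical.

```python
import math


def relevance_frequency(a: int, c: int) -> float:
    """rf = log2(2 + a / max(1, c)).

    a: positive-category documents containing the term
    c: negative-category documents containing the term
    (formula assumed from the commonly cited definition of tf.rf)
    """
    return math.log2(2 + a / max(1, c))


def tf_rf(tf: float, a: int, c: int) -> float:
    """Weight of a term in one document: term frequency times rf."""
    return tf * relevance_frequency(a, c)


# A term seen in 6 positive and 1 negative documents, occurring twice here.
weight = tf_rf(2, a=6, c=1)
```

Unlike idf, the weight grows only when the term is concentrated in the positive category; a term that never appears in positive documents gets rf = log2(2) = 1, leaving its raw term frequency unchanged.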
Saif Mohammad, Svetlana Kiritchenko, and Xiaodan Zhu. 2013. NRC-Canada: Building the state-of-the-art in sentiment analysis of tweets. In Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013). Atlanta, Georgia, USA, pages 321-327.