Article

... This section analyzes and compares the performance of the most efficient sentiment analysis methods, selected from the methodologies analyzed in the previous sections, to determine their comparative performance efficiency. The most efficient methods considered for comparative performance analysis in this paper are: JMTS [2], ConSent [12], SWIMS [13], SentiMI [14], Politwi [17], SACI [18], Topic Sentence-based Instance Transfer (TSIT) [23], SentiView [24], AEPs [29], and Bilingual Sentiment Analysis (Bilingual SA) [30]. The comparison is based on the experimental results of the methods in terms of accuracy, precision, recall and F-measure. ...
... The F-measure is an accuracy score that considers both precision and recall, and is calculated as follows: [30] Bilingual sentiment analysis, dictionary based segmentation, statistics and machine learning based segmentation, Modified Chi-square feature selection, Modified N-Gram method, SVM Figure 1 compares the selected efficient sentiment analysis methods in terms of accuracy. The graph clearly shows that the ConSent method [12] provides a higher level of accuracy when compared to other methods. ...
... Figure 2 presents a comparison of the selected efficient sentiment analysis methods in terms of precision. The graph clearly shows that the Bilingual sentiment analysis method considered in [30] provides a higher degree of precision when compared to other methods. Figure 3 compares the selected efficient sentiment analysis methods in terms of recall. ...
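The formula itself is cut off in the excerpt above. For orientation, here is a minimal sketch of the standard balanced F-measure (the harmonic mean of precision and recall), which may differ from the exact variant used in the cited comparison. The function names and the confusion counts are illustrative assumptions, not values from that paper.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of predicted positives that are actually positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp: int, fn: int) -> float:
    """Fraction of actual positives that were predicted positive."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f_measure(p: float, r: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Hypothetical confusion counts for a binary sentiment classifier.
p = precision(tp=80, fp=20)
r = recall(tp=80, fn=40)
print(round(f_measure(p, r), 4))
```

Because the harmonic mean is dominated by the smaller of the two values, a method cannot reach a high F-measure by maximizing precision alone or recall alone.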
Research
Full-text available
This paper also highlights the advantages and disadvantages of the analyzed methodologies with the objective of determining the efficiency of the sentiment analysis schemes. Finally, the sentiment analysis schemes have been compared in terms of performance evaluation metrics with respect to social media content. Thus, this paper provides a detailed analysis of recent sentiment analysis schemes and throws light on new avenues for future research work in this domain.
... This type of classification is called multiclass classification. In general, there are various levels of sentiment classification: document-level, sentence-level and aspect-level sentiment classification [12], [26], [29], [30], [45], [48], [55], [61], [65]. ...
... The fifth step is to test the dataset using training data. Finally, we can classify it as positive, negative or neutral [16], [20], [29], [30], [36], [44], [62], [72], [74]. For example, consider the Nikon COOLPIX S3300 review from Amazon. ...
... Opinion polling generation is the final task, which gives quantitative indications of users' positive and negative opinions about a product or business. The HowNet lexicon is used to find the polarity of each aspect expressed in each review. Farhan Hassan Khan et al. [29] developed a framework called SWIMS, which performs subjectivity detection by determining the feature weights; a lexical approach is used for polarity selection on different parts of speech simultaneously. To achieve high accuracy, a model called 'Intelligent model selection' is also proposed for cross-validation. A semi-supervised framework for feature selection called Multi-Objective Model Selection (MOMS) is proposed by Farhan Hassan Khan et al. ...
... This research compares our method with SWIMS [29], Diego Terrana [27] and manually annotated data. For the manually labelled dataset, we label the review ratings regarding how appropriate the aspects are to the products using the abovementioned rating scale. ...
... The baseline methods use dataset-based general sentiments; when a complex sentence structuring is present, they produce majority-based results or could consider them neutral [27,29]. ...
... 7 shows a comparison of the aspect term extraction after sentiment term detection is applied to calculate the polarity estimation. Tab. 8 presents an analysis of the polarity estimation, which is discussed in detail in Section 3. Fig. 8 visualizes a comparison of ABSA-PER aspect extraction with existing state-of-the-art approaches [29]. ...
Article
Full-text available
Most consumers read online reviews written by different users before making purchase decisions, where each opinion expresses some sentiment. Therefore, sentiment analysis is currently a hot topic of research. In particular, aspect-based sentiment analysis concerns the exploration of emotions, opinions and facts that are expressed by people, usually in the form of polarity. It is crucial to consider polarity calculations and not simply categorize reviews as positive, negative, or neutral. Currently, the available lexicon-based method accuracy is affected by limited coverage. Several of the available polarity estimation techniques are too general and may not reflect the aspect/topic in question if reviews contain a wide range of information about different topics. This paper presents a model for the polarity estimation of customer reviews using aspect-based sentiment analysis (ABSA-PER). ABSA-PER has three major phases: data preprocessing, aspect co-occurrence calculation (CAC) and polarity estimation. A multi-domain sentiment dataset, Twitter dataset, and trust pilot forum dataset (developed by us by de
... In the operation, the input features fed to intermediate layer are obtained by concatenating the outputs of the previous layer of intermediate layer and input layer. Moreover, Khan et al. [41,42] took sentiment lexicon into consideration and created two lexicon-based machine learning SA approaches. In this case, sentiment lexicons are incorporated into machine learning classifier. ...
... In recent years, sentiment lexicon was introduced into machine learning based SA approaches. Khan et al. [41] came up with a hybrid approach named SWIMS incorporating machine learning with a lexicon based approach, where SentiWordNet determined the feature weight and SVM was utilized to learn the feature weights. Poria et al. [54] proposed a SA system based on dependency rules and machine learning classifier. ...
... Sentiment analysis refers to a discipline that deals with the analysis and classification of subjective opinions, sentiments and emotions of consumers towards products or services expressed in texts [19]. Current research on sentiment analysis techniques can be roughly classified into three categories, namely, sentiment analysis techniques based on machine learning [20], lexicon-based sentiment analysis techniques [12,21,22] and hybrid models [4,23]. Machine learning methods excel in optimising system parameters with a large training data. ...
... Lexicon-based methods use positive and negative term sets to classify sentiments [4], whereas hybrid models combine the merits of both. To date, many works studied machine learning methods involving supervised learning models [2,24,25], semi-supervised learning models [23] and unsupervised learning models [26]. Extant research on lexicon-based sentiment analysis techniques mainly consists of dictionary- and corpus-based [19] categories. ...
Article
Online product reviews significantly impact the online purchase decisions of consumers. However, extant decision support models have neglected the randomness and fuzziness of online reviews and the interrelationships among product features. This study presents an integrated decision support model that can help customers discover desirable products online. This proposed model encompasses three modules: information acquisition, information transformation, and integration model. We use the information acquisition module to gather linguistic intuitionistic fuzzy information in each review through sentiment analysis. We also apply the information transformation module to convert the linguistic intuitionistic fuzzy information into linguistic intuitionistic normal clouds (LINCs). The integration module is employed to obtain the overall LINCs for each product. A ranked list of alternative products is determined. A case study on Taobao.com is then provided to illustrate the effectiveness and feasibility of the proposal, along with sensitivity and comparison analyses, to verify its stability and superiority. Finally, conclusions and future research directions are suggested.
... HR sentiment analysis, also called opinion mining, aims to analyse employee sentiments, opinions and attitudes towards different elements like the organisation's services and products (Khan, Qamar, & Bashir, 2016). The opportunity to identify the divergence of opinions about the corresponding entity (product, service, decision, event) enables the stabilisation, revision, improvement or withdrawal of that entity to maintain receiver satisfaction (Murali Krishna & Lavanya Devi, 2019). ...
... Text mining can also analyse sentiments by combining natural language processing (NLP) and data mining techniques. It concentrates on identifying an opinion or sentiment by using the ML algorithm (Khan, Qamar, & Bashir, 2016). ...
Article
Full-text available
This paper deals with the role of Artificial Intelligence (AI) in Human Resource Management (HRM). Although AI emerged in the mid-twentieth century, current literature still offers an inconsistent view of AI in HRM. This research provides an overview of the academic literature published in this field. AI and HRM, two separate research streams so far, have been analysed to aggregate knowledge and to identify common patterns in the interaction between them. The aim of this paper is to analyse how AI can influence HRM and derive a specific definition of AI in HRM. Moreover, the authors discuss AI applications in HRM and the current academic framework for AI adoption in HRM. The findings show a comprehensive review of the relationship between AI and HRM, identifying research gaps regarding this knowledge area, and the implications of AI concerning.
... Liu et al. (2017) combined feature selection algorithm and machine learning method to propose a framework for multi-class sentiment classification. Khan et al. (2016) introduced a new framework called SWIMS to determine the feature weight based on sentiment lexicon, SentiWordNet. Liu et al. (2017) proposed a framework which combines feature selection algorithm and machine learning method for multi-class sentiment classification. ...
... Sentiment classification has become very important as the amount of digital text resources has increased remarkably (Gokalp et al. 2020; Chouchani and Abed 2020). The purpose of sentiment analysis is to analyze the public's sentiments, opinions, attitudes, emotions, and so on, towards different elements such as topics, products or services, individuals, or organizations (Liu et al. 2005; Khan et al. 2016; Singh et al. 2020). According to available works, machine learning has been reported as one of the effective solutions. ...
Article
Full-text available
Text-based social media has become one of the important communication tools between customers and enterprises. In social media, users can easily express their opinions and evaluations regarding products or services. These online user experiences, especially negative evaluations, indeed affect other consumers’ behaviors. Consequently, effectively identifying customers’ sentiments and preventing negative comments from causing great damage to enterprises has become a critical issue. In recent years, machine learning algorithms have been viewed as one of the effective solutions for sentiment classification. However, as the number of online reviews grows, the dimensionality of the text data rises remarkably, and the performance of machine learning methods degrades due to this dimensionality problem. Moreover, conventional feature selection methods tend to select attributes from the majority sentiments, which usually cannot improve classification performance. Therefore, this study presents two feature selection methods to improve sentiment classification performance: the modified categorical proportional difference (MCPD) approach, which improves the conventional CPD method, and the balance category feature (BCF) strategy, which equally selects attributes from both positive and negative examples. Finally, several real sentiment cases of text reviews are provided to demonstrate the effectiveness of the proposed methods. Results showed that the combination of the proposed BCF strategy and MCPD method can not only remarkably reduce the feature space, but also improve sentiment classification performance.
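The idea of selecting attributes equally from both sentiment classes can be sketched as follows. This is only an illustration under stated assumptions: it ranks terms with the conventional categorical proportional difference as it is commonly defined, (A - B) / (A + B) over document frequencies, and does not reproduce the MCPD modification, whose details the abstract does not give. The function names and the toy reviews are hypothetical.

```python
from collections import Counter


def doc_freq(docs):
    """Number of documents in which each term occurs at least once."""
    return Counter(term for doc in docs for term in set(doc))


def balanced_select(pos_docs, neg_docs, k_per_class):
    """Pick the k most class-indicative terms from EACH class.

    Assumption: terms are scored with the conventional CPD,
    (A - B) / (A + B), where A and B are the positive and negative
    document frequencies. This is not the paper's MCPD.
    """
    df_pos, df_neg = doc_freq(pos_docs), doc_freq(neg_docs)
    vocab = set(df_pos) | set(df_neg)

    def cpd(term):
        a, b = df_pos[term], df_neg[term]
        return (a - b) / (a + b)  # > 0 leans positive, < 0 leans negative

    # Ties are broken by document frequency, then alphabetically.
    pos_ranked = sorted((t for t in vocab if cpd(t) > 0),
                        key=lambda t: (-cpd(t), -df_pos[t], t))
    neg_ranked = sorted((t for t in vocab if cpd(t) < 0),
                        key=lambda t: (cpd(t), -df_neg[t], t))
    return pos_ranked[:k_per_class], neg_ranked[:k_per_class]


# Hypothetical, imbalanced toy corpus (three positive, two negative reviews).
pos_docs = [["great", "phone"], ["great", "battery"], ["good", "phone"]]
neg_docs = [["bad", "phone"], ["poor", "battery"]]
print(balanced_select(pos_docs, neg_docs, k_per_class=2))
```

Taking the top `k_per_class` terms from each side separately guarantees that the minority class contributes as many features as the majority class, which a single global ranking would not ensure.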
... Sentiment analysis has attracted considerable attention in recent years due to its applicability for various purposes (Abdi et al. 2018a;Khan et al. 2016;O'Connor et al. 2010). A document may include facts, opinions or objective/subjective information. ...
... A synset is "objective" if the following equation is equal to 1: $\mathrm{ObjScore} = 1 - (\mathrm{PosScore} + \mathrm{NegScore})$. Equation (1) (Khan et al. 2016) is used to calculate the score of each word within the range of [-1, 1]: ...
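The objectivity relation quoted above is simple enough to state in code. The sketch below covers only that relation; Equation (1) is truncated in the excerpt and is deliberately not reconstructed. The function names are illustrative, and the range checks assume SentiWordNet-style scores in [0, 1] whose sum does not exceed 1.

```python
def obj_score(pos_score: float, neg_score: float) -> float:
    """ObjScore = 1 - (PosScore + NegScore) for one synset."""
    if not (0.0 <= pos_score <= 1.0 and 0.0 <= neg_score <= 1.0):
        raise ValueError("scores must lie in [0, 1]")
    if pos_score + neg_score > 1.0:
        raise ValueError("PosScore + NegScore must not exceed 1")
    return 1.0 - (pos_score + neg_score)


def is_objective(pos_score: float, neg_score: float) -> bool:
    """A synset counts as objective when its ObjScore equals 1."""
    return obj_score(pos_score, neg_score) == 1.0


# Hypothetical synset scores.
print(obj_score(0.125, 0.0))     # 0.875
print(is_objective(0.0, 0.0))    # True
```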
Article
Full-text available
This paper presents an automatic sentiment-oriented summarization of multi-documents using soft computing (called ASMUS). It integrates two main phases: sentiment analysis and sentiment summarization. Sentiment analysis phase includes multiple strategies to tackle the following drawbacks: (1) word coverage limit of an individual lexicon; (2) contextual polarity; (3) sentence types, while the sentiment summarization phase is a graph-based ranking model that integrates the sentiment information, statistical and linguistic methods to improve the sentence ranking result. We found that the current methods are suffering from the following problems: (1) they do not consider the semantic and syntactic information in comparison between two sentences when they share the similar bag-of-words (capturing meaning); (2) vocabulary mismatch problem (lexical gaps). Furthermore, ASMUS also considers content coverage and redundancy. We conduct the experiments on the Document Understanding Conference datasets. The results present the excellent outcomes of the ASMUS in sentiment-oriented summarization.
... Supervised learning algorithm is very efficient for classifying the sentiments but due to the lack of labeled training data it becomes difficult to adopt this algorithm for classification while general purpose lexicons do not require any training data [6,16]. This research makes following contribution towards opinion making: ...
Chapter
Opinion mining and its applications in product recommendation, business intelligence, targeted marketing, etc. have attracted a lot of research attention in the area of data mining. Much research has been conducted to improve opinion mining, and various frameworks have been proposed. Improvement is still in progress, and more efficient frameworks continue to be proposed. For opinion analysis from reviews, classifying them as positive or negative and then making future decisions about the product is a very important and fascinating aspect of text mining. Many techniques fail to provide coverage of the features of the product and are not very effective at accurately classifying and ranking the reviews. In this paper, we propose a framework for opinion mining to process public reviews in Facebook comments. The features are ranked and clustered according to the similarity between them. Many methodologies fail at finding which features are relatively similar and can be easily grouped together. This framework also retrieves reviews and summarizes them efficiently, providing coverage of the features.
... The sentiment classification of online reviews is the most fundamental and important work in natural language processing [1,2]. At present, the review sentiment classification (positive and negative) has been widely studied, but the review sentiment classification cannot meet the demand for fine-grained review sentiment analysis [3][4][5][6][7]. For example, how do consumers choose the most appropriate product from several products that are being reviewed? ...
Article
Full-text available
With the explosion of online user reviews, review rating prediction has become a research focus in natural language processing. Existing review rating prediction methods only use a single model to capture the sentiments of review texts, ignoring users who express the sentiment and products that are evaluated, both of which have great influences on review rating prediction. In order to solve the issue, we propose a review rating prediction method based on user context and product context by incorporating user information and product information into review texts. Our method firstly models the user context information of reviews, and then models the product context information of reviews. Finally, a review rating prediction method that is based on user context and product context is proposed. Our method consists of three main parts. The first part is a global review rating prediction model, which is shared by all users and all products, and it can be learned from training datasets of all users and all products. The second part is a user-specific review rating prediction model, which represents the user’s personalized sentiment information, and can be learned from training data of an individual user. The third part is a product-specific review rating prediction model, which uses training datasets of an individual product to learn parameter of the model. Experimental results on four datasets show that our proposed methods can significantly outperform the state-of-the-art baselines in review rating prediction.
... A widely used weighting scheme borrowed from information retrieval systems is the TF-IDF weighting scheme. To reduce dimensionality, feature selection algorithms can be used, selecting the most class label informative features by using for example χ 2 , mutual information or lexicon based selection [19]. Another popular approach for document representation uses word embeddings [24] [28], a lower dimensional projection of a high dimensional feature space. ...
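As a rough illustration of the TF-IDF weighting mentioned above, the following sketch uses one common variant (relative term frequency multiplied by the natural log of the inverse document frequency). Other variants add smoothing or normalization, and the excerpt does not say which one the authors used. The function name and the toy documents are assumptions.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    weight = (count of term in doc / doc length) * ln(N / df(term)),
    where N is the number of documents and df the document frequency.
    """
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical short text fragments, already tokenized.
docs = [["good", "phone"], ["bad", "phone"]]
weights = tf_idf(docs)
print(weights[0])
```

In this toy corpus `phone` occurs in every document, so its weight is zero, while `good` and `bad` keep a positive weight. That is the behavior that makes the scheme useful before a feature selection step.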
Preprint
Automated sentiment classification (SC) on short text fragments has received increasing attention in recent years. Performing SC on unseen domains with few or no labeled samples can significantly affect the classification performance due to different expression of sentiment in source and target domain. In this study, we aim to mitigate this undesired impact by proposing a methodology based on a predictive measure, which allows us to select an optimal source domain from a set of candidates. The proposed measure is a linear combination of well-known distance functions between probability distributions supported on the source and target domains (e.g. Earth Mover's distance and Kullback-Leibler divergence). The performance of the proposed methodology is validated through an SC case study in which our numerical experiments suggest a significant improvement in the cross domain classification error in comparison with a randomly selected source domain for both a naive and adaptive learning setting. In the case of more heterogeneous datasets, the predictability feature of the proposed model can be utilized to further select a subset of candidate domains, where the corresponding classifier outperforms the one trained on all available source domains. This observation reinforces a hypothesis that our proposed model may also be deployed as a means to filter out redundant information during a training phase of SC.
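A minimal sketch of source-domain selection with a linear combination of distribution distances might look like the following. It is not the authors' measure: the Kullback-Leibler divergence is one of the distances the abstract names, but total variation distance is swapped in here as a simple stand-in for the Earth Mover's distance, and the coefficients, the add-one smoothing and the toy token lists are all assumptions.

```python
import math
from collections import Counter


def word_distribution(tokens, vocab, alpha=1.0):
    """Additively smoothed unigram distribution over a shared vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q); smoothing keeps q[w] > 0."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)


def total_variation(p, q):
    """Total variation distance (a stand-in, not the Earth Mover's distance)."""
    return 0.5 * sum(abs(p[w] - q[w]) for w in p)


def combined_distance(p, q, coeffs):
    """Linear combination of the distances named in `coeffs`."""
    measures = {"kl": kl_divergence, "tv": total_variation}
    return sum(c * measures[name](p, q) for name, c in coeffs.items())


# Hypothetical target domain and two candidate source domains.
target = "great screen weak battery".split()
sources = {
    "a": "great battery great screen".split(),
    "b": "boring plot weak acting".split(),
}
vocab = sorted(set(target).union(*sources.values()))
coeffs = {"kl": 0.5, "tv": 0.5}  # illustrative weights, not fitted values

t = word_distribution(target, vocab)
scores = {name: combined_distance(t, word_distribution(toks, vocab), coeffs)
          for name, toks in sources.items()}
best = min(scores, key=scores.get)
print(best, scores)
```

In the setting the abstract describes, the coefficients would presumably be fitted so that the combined distance predicts cross-domain classification error. Here they are fixed by hand purely for illustration.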
... Sentiment analysis has become an important research field since the early 2000s (Habernal, Ptáček, & Steinberger, 2014). It has attracted considerable attention in recent years due to its applicability in various purposes (Khan, Qamar, & Bashir, 2016b). A document may include facts, opinions, or objective/ subjective information. ...
Article
Sentiment summarization is the process of automatically creating a compressed version of the opinionated information expressed in a text. This paper presents a machine learning-based approach to summarize user’s opinion expressed in reviews using: 1) Sentiment knowledge to calculate a sentence sentiment score as one of the features for sentence-level classification. It integrates multiple strategies to tackle the following problems: sentiment shifter, the types of sentences and word coverage limit. 2) Word embedding model, a deep-learning-inspired method to understand meaning and semantic relationships among words and to extract a vector representation for each word. 3) Statistical and linguistic knowledge to determine salient sentences. The proposed method combines several types of features into a unified feature set to design a more accurate classification system (“True”: the extractive reference summary; “False”: otherwise). Thus, to achieve better performance scores, we carried out a performance study of four well-known feature selection techniques and seven of the most famous classifiers to select the most relevant set of features and find an efficient machine learning classifier, respectively. The proposed method is applied to three different datasets and the results show the integration of support vector machine-based classification method and Information Gain (IG) as a feature selection technique can significantly improve the performance and make the method comparable to other existing methods. Furthermore, our method that learns from this unified feature set can obtain better performance than one that learns from a feature subset.
... Recently, Loog [50] proposed a contrastive pessimistic likelihood estimation for semi-supervised classification. SSL algorithms have been successfully applied to text classification [51] and sentiment analysis [52,53]. ...
Article
Transductive Support Vector Machine (TSVM) is one of the most successful classification methods for semi-supervised learning (SSL). One challenge of TSVMs is performance degeneration caused by unlabeled examples that are obscure or misleading for the discovery of the underlying distribution. To address this problem, we disclose the underlying data distribution and describe the margin distribution of TSVMs as the first-order (margin mean) and second-order (margin variance) statistics of examples. Since the optimization problems of TSVMs are not convex, we utilize the concave-convex procedure and a variation of stochastic variance reduced gradient methods to solve them. In particular, we propose two specific algorithms to optimize the margin distribution of TSVM by maximizing the margin mean and minimizing the margin variance simultaneously, which improves the generalization ability and the robustness to outliers and noise. In addition, we derive a bound on the expectation of error according to the leave-one-out cross-validation estimate, which is an unbiased estimate of the probability of test error. Finally, to validate the effectiveness of the proposed method, extensive experiments are conducted on diverse datasets. The experimental results demonstrate that the proposed algorithms outperform the existing TSVMs and other semi-supervised learning methods.
... They calculated the frequency of features for the mentioned classes, and then applied PMI and CHI feature selection techniques to determine feature weights. The experimental results showed that the best accuracy was achieved when using hybrid features and the SVM learning algorithm (Khan et al., 2016). ...
Article
Purpose: This paper aims to propose a statistical and context-aware feature reduction algorithm that improves sentiment classification accuracy. Classification of reviews with different granularities in two classes of reviews with negative and positive polarities is among the objectives of sentiment analysis. One of the major issues in sentiment analysis is feature engineering, as it severely affects the time complexity and accuracy of sentiment classification. Design/methodology/approach: In this paper, a feature reduction method is proposed that uses context-based knowledge as well as synset statistical knowledge. To do so, the one-dimensional presentation proposed for SentiWordNet calculates statistical knowledge that involves polarity concentration and variation tendency for each synset. Feature reduction involves two phases. In the first phase, features that combine semantic and statistical similarity conditions are put in the same cluster. In the second phase, features are ranked and then the features which are given lower ranks are eliminated. The experiments are conducted by support vector machine (SVM), naive Bayes (NB), decision tree (DT) and k-nearest neighbors (KNN) algorithms to classify the vectors of the unigram and bigram features in two classes of positive or negative sentiments. Findings: The results showed that the applied clustering algorithm reduces the SentiWordNet synsets to less than half, which reduced the size of the feature vector by less than half. In addition, the accuracy of sentiment classification is improved by at least 1.5 per cent. Originality/value: The presented feature reduction method is the first use of synset clustering for feature reduction. The feature reduction algorithm first aggregates similar features into clusters and then eliminates unsatisfactory clusters.
... The semi-supervised approach combines the features of the lexicon-based system with machine learning-based techniques to optimize sentiment classification performance [33]. The incorporation of machine learning into the lexicon approach increases the domain independence in lexicons [34]. ...
Article
Social media has emerged as an influential platform that allows people to share their views on global and local issues. Sentiment analysis can handle these massive amounts of unstructured reviews and convert them into meaningful opinions. Undoubtedly, COVID-19 emerged as an enormous worldwide challenge that harmed humankind physically and financially. Meanwhile, farmers' protests against three pieces of legislation passed by the Indian government shook the world. Hence, an artificial intelligence-based sentiment model is needed for suggesting the right direction toward outbreaks. Although Deep Neural Networks (DNNs) have gained popularity in sentiment analysis applications, they are still limited by sequential training, high-dimensional feature spaces, and equal feature importance distribution. In addition, inaccurate polarity scoring and utility-based topic modeling are other challenging aspects of sentiment analysis. This motivates us to propose a Knowledge-Enriched Attention-based Hybrid Transformer (KEAHT) model by enriching the explicit knowledge of Latent Dirichlet Allocation (LDA) topic modeling and lexicalized domain ontology. A pre-trained Bidirectional Encoder Representation from Transformer (BERT) is employed to train within a minimum training corpus. It provides the facility of attention mechanism and can solve complex text problems accurately. A comparative study with existing baselines and recent hybrid models affirms the credibility of the proposed KEAHT in the field of Natural Language Processing (NLP). This model emphasizes artificial intelligence's role in handling the situation of the global pandemic and democratic dispute in a country. Furthermore, two benchmark datasets, namely “COVID-19-Vaccine-Labelled-Tweets” and “Indian-Farmer-Protest-Labelled-Tweets”, are also constructed to help future researchers outline the essential facts associated with the outbreaks.
... Besides basic machine learning approaches, some lexicon-based machine learning approaches have made great progress in improving the performance of SA. Khan et al. [24] presented a hybrid approach named SWIMS that incorporated machine learning with a lexicon-based approach, where SentiWordNet was used to determine the feature weight and a support vector machine was utilized to learn the feature weights. Moreover, Khan et al. [25] built a general purpose sentiment lexicon in a semi-supervised manner. ...
Article
Online travel has developed dramatically during the past three years in China. This results in a large amount of unstructured data like tourism reviews from which it is hard to extract useful knowledge. In this paper, a DWWP system consisting of domain-specific new words detection (DW) and word propagation (WP) is presented. DW deals with the negligence of user-invented new words and converted sentiment words by means of AMI (Assembled Mutual Information). Inspired by social networks, the new method WP incorporates manually calibrated sentiment scores, semantic and statistical similarity information, which improves the quality of sentiment lexicon in comparison with existing data-driven methods. Experimental results show that DWWP improves seventeen percentage points compared with graph propagation and four percentage points compared with label propagation in terms of accuracy on Dataset I and Dataset II, respectively.
... Nowadays, opinion mining, sentiment analysis, and subjectivity attracted significant attention from both the research community and industry (Khan, Qamar, & Bashir, 2016b). The main goal of sentiment analysis is to use an automated approach to identify positive, negative, or neutral sentiment from documents (Hung & Chen, 2016). ...
Article
Sentiment analysis concerns the study of opinions expressed in a text. This paper presents the QMOS method, which employs a combination of sentiment analysis and summarization approaches. It is a lexicon-based method for query-based multi-document summarization of opinions expressed in reviews. QMOS combines multiple sentiment dictionaries to improve the word coverage limit of an individual lexicon. A major problem for a dictionary-based approach is the semantic gap between the prior polarity of a word presented by a lexicon and the word polarity in a specific context. This is because the polarity of a word depends on the context in which it is used. Furthermore, the type of a sentence can also affect the performance of a sentiment analysis approach. Therefore, to tackle the aforementioned challenges, QMOS integrates multiple strategies to adjust a word's prior sentiment orientation while also considering the type of sentence. QMOS also employs the Semantic Sentiment Approach to determine the sentiment score of a word if it is not included in a sentiment lexicon. On the other hand, most existing methods fail to distinguish the meaning of a review sentence from that of a user's query when both share a similar bag-of-words; hence there is often a conflict between the extracted opinionated sentences and users’ needs. However, the summarization phase of QMOS is able to avoid extracting a review sentence whose similarity with the user's query is high but whose meaning is different. The method also employs a greedy algorithm and a query expansion approach to reduce redundancy and to bridge the lexical gaps for similar contexts that are expressed using different wording, respectively. Our experiment shows that the QMOS method can significantly improve performance and is comparable to other existing methods.
... Therefore, information about what the opinion on a topic is lies hidden in the data found on social media. Sentiment analysis uncovers this hidden semantic information, classifies the opinions on the topic, and reveals what the prevailing opinion on the topic is [14]. ...
... This technique proposes three methods for word sense disambiguation (WSD), on the basis of the context in which the text is implied. Khan et al. [18] presented a feature weighting technique, using semi-supervised feature weighting method. This technique uses SVM to create a model that performs intelligent sentiment analysis on the provided text. ...
... The model described the detailed subjectivity relations that exist between different actors in a sentence expressing separate attitudes for each actor. Farhan Hassan Khan et al. (2016) proposed a semi-supervised subjective feature weighting and intelligent model selection approach called SWIMS for sentiment analysis. SWIMS determines the feature weights based on SentiWordNet. ...
Chapter
Sentiment analysis is one of the most important applications in the field of text mining. It computes people's opinions, comments, posts, reviews, evaluations, and emotions which are expressed on products, sales, services, individuals, organizations, etc. Nowadays, large amounts of structured and unstructured data are being produced on the web. The categorizing and grouping of these data become a real-world problem. In this chapter, the authors address the current research in this field, issues and the problem of sentiment analysis on Big Data for classification and clustering. It suggests new methods, applications, algorithm extensions of classification and clustering and software tools in the field of sentiment analysis.
... As a lexicon-based approach, WordNet is an external resource that is almost always available. WordNet has been utilized as a query expansion technique [13] for information retrieval and feature selection for sentiment analysis [14]. As an example of an internal resource approach, a method for corpus expansion for SMT using semantic role label (SRL) substitution rules that rely on an SRL labeler was proposed in [15]. ...
Article
This study focuses on a method for sequential data augmentation in order to alleviate data sparseness problems. Specifically, we present corpus expansion techniques for enhancing the coverage of a language model. Recent recurrent neural network studies show that a seq2seq model can be applied for addressing language generation issues; it has the ability to generate new sentences from given input sentences. We present a method of corpus expansion using a sentence-chain based seq2seq model. For training the seq2seq model, sentence chains are used as triples. The first two sentences in a triple are used for the encoder of the seq2seq model, while the last sentence becomes a target sequence for the decoder. Using only internal resources, evaluation results show an improvement of approximately 7.6% relative perplexity over a baseline language model of Korean text. Additionally, from a comparison with a previous study, the sentence chain approach reduces the size of the training data by 38.4% while generating 1.4-times the number of n-grams with superior performance for English text.
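The triple construction described above can be sketched as follows. This is a minimal illustration that assumes a chain is simply three consecutive sentences taken with a sliding window of stride one; the abstract does not say how chains are actually selected, and the function name is hypothetical.

```python
def sentence_chain_triples(sentences):
    """Slide a window of three consecutive sentences over the corpus.

    The first two sentences of each window form the encoder input and the
    third is the decoder target (assumed stride of one sentence).
    """
    triples = []
    for i in range(len(sentences) - 2):
        encoder_input = (sentences[i], sentences[i + 1])
        decoder_target = sentences[i + 2]
        triples.append((encoder_input, decoder_target))
    return triples
```

A corpus of n sentences yields n - 2 training triples under this assumption, and fewer than three sentences yield none.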
... There are currently three main categories of word segmentation algorithms: those based on string matching, those based on understanding, and statistics-based word segmentation. The Chinese lexical analysis software ICTCLAS [9] is the most widely used system for Chinese word segmentation, part-of-speech tagging, named entity recognition and new word recognition. We found that there is a lot of noisy data in the original review corpus. ...
... Feature selection problems arise in a variety of applications, reflecting their importance. Instances can be found in: microarray analysis (see Xing et al., 2001; Saeys et al., 2007; Bolón-Canedo et al., 2013; Li et al., 2004; Liu et al., 2002), clinical prediction (see Bagherzadeh-Khiabani et al., 2015; Li et al., 2004; Liu et al., 2002), text categorization (see Yang and Pedersen, 1997; Rogati and Yang, 2002; Varela et al., 2013; Khan et al., 2016), image classification and face recognition (see Bolón-Canedo et al., 2015), multi-label learning (see Schapire and Singer, 2000; Crammer and Singer, 2002), and classification of internet traffic (see Pascoal et al., 2012). ...
Article
Feature selection problems arise in a variety of applications, such as microarray analysis, clinical prediction, text categorization, image classification and face recognition, multi-label learning, and classification of internet traffic. Among the various classes of methods, forward feature selection methods based on mutual information have become very popular and are widely used in practice. However, comparative evaluations of these methods have been limited by being based on specific datasets and classifiers. In this paper, we develop a theoretical framework that allows evaluating the methods based on their theoretical properties. Our framework is grounded on the properties of the target objective function that the methods try to approximate, and on a novel categorization of features, according to their contribution to the explanation of the class; we derive upper and lower bounds for the target objective function and relate these bounds with the feature types. Then, we characterize the types of approximations taken by the methods, and analyze how these approximations cope with the good properties of the target objective function. Additionally, we develop a distributional setting designed to illustrate the various deficiencies of the methods, and provide several examples of wrong feature selections. Based on our work, we identify clearly the methods that should be avoided, and the methods that currently have the best performance.
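To make the notion of forward feature selection based on mutual information concrete, here is a minimal sketch for small discrete data. It greedily maximises the empirical mutual information between the jointly selected features and the class, which is feasible only for toy problems; it is not any of the specific approximating methods the paper evaluates, and the function names are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log2( p(x,y) / (p(x) p(y)) ), written with raw counts
        mi += (c / n) * log2(c * n / (px[x] * py[y]))
    return mi


def forward_select(features, labels, k):
    """Greedily add the feature that most increases MI(selected set; labels).

    `features` maps a feature name to its column of discrete values.
    """
    selected = []
    remaining = list(features)
    while remaining and len(selected) < k:
        def joint_mi(name):
            cols = [features[f] for f in selected + [name]]
            # Treat each row of the candidate subset as one joint symbol.
            return mutual_information(list(zip(*cols)), labels)

        best = max(remaining, key=joint_mi)
        selected.append(best)
        remaining.remove(best)
    return selected
```

Estimating the joint distribution directly becomes unreliable as the selected set grows, which is why practical methods replace this objective with lower-dimensional approximations.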
... Sentiment analysis or opinion mining is a discipline which deals with analyzing and classifying subjective opinions, sentiments and emotions of people towards products, organizations, individuals and other topics [1,3,4], expressed in text, such as tweets [5], reviews [6], forums [7], blogs [8] and news [9]. Sentiment analysis makes it possible to capture the trends of people [10]. ...
Article
Sentiment analysis is about classifying opinions expressed in text. The aim of this study is to improve polarity classification of sentiments in microblogs by building adaptive sentiment lexicons. In the proposed method, corpora-based and lexicon-based approaches are combined and lexicons are generated from text. The sentiment classification is formulated as an optimization problem, in which the goal is to find optimum sentiment lexicons. A novel genetic algorithm is then proposed to solve this optimization problem and find lexicons to classify text. The algorithm generates adaptive sentiment lexicons, and then a meta-level feature is extracted based on it, which is then used alongside Bing Liu's lexicon and n-gram features. The experiments are conducted on six datasets. In terms of accuracy, the results outperform the state-of-the-art methods proposed in the literature in two of the datasets. Also, in four of the datasets, the proposed approach outperforms in terms of F-measure. Applying the proposed method on six datasets, the accuracy is higher than 80% in all six datasets and the F-measure is higher than 80% in four of these datasets. Using the sentiment lexicons created by the proposed algorithm, one can get a better understanding of the specific language and culture of Twitter users and sentiment orientation of words in different contexts. It is also shown that it is useful not to omit the conventional stop-words, as each word can have its sentimental implications.
... Support vector machine is utilized for the feature weights learning and an intelligent selection approach was exploited to improve the classification accuracy. Considerably, the subjectivity was used to select the features and the effects of POS on feature selection were presented [58]. The metaheuristic method (CSK) was proposed based on K-means and cuckoo search. ...
Article
In the traditional Web, users are considered information consumers. In the social Web, users play a much more active role, since they are now not only information consumers but also data providers. Users like posting reviews online, which has become an increasingly popular way to express opinions and sentiments toward the products bought or services received. Analyzing these reviews can help collect people's opinions about products, social events and problems, and would produce useful, actionable knowledge that could be of economic value to vendors and other interested parties. Because of the huge number of reviews and their unstructured nature, efficient computational methods are needed for mining and summarizing these reviews, since regular analysis of reviews does not indicate user likes and dislikes. In a review, a user typically writes about both the positive and negative aspects of the object, although the general sentiment toward that object may be positive or negative. That is why sentiment analysis, together with opinion mining, tries to extract and study users' opinions, sentiments and the subjectivity of text. However, this analysis must come with careful consideration of users' anonymity and the privacy of their sensitive data, as privacy is today an important concern for both users and enterprises. In this research, automatic analysis of opinions (opinion mining) is performed to obtain such detailed aspects based on an ontology. Opinion mining identifies the features in an opinion and classifies the sentiment of the opinion for each of these features. Opinion mining is a difficult task, owing to the high semantic variability of the opinions expressed, the diversity of the characteristics and sub-characteristics that describe the products, and the multitude of opinion words used to depict them. In the proposed approach, the opinion polarity and polarity strength are measured using fuzzy sets.
As fuzzy set theory is quite effective at measuring vagueness when processing natural language, it will also be effective in analyzing review articles, which are generally written in natural language. Additionally, the proposed system takes privacy into consideration by anonymizing data before final publishing. Methods of generalization and micro-aggregation are utilized for anonymizing quasi-identifiers to maintain the balance between data utility and user privacy.
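Micro-aggregation of a numeric quasi-identifier can be sketched as below. This is a simple univariate, fixed-group-size variant chosen for illustration; the abstract does not specify which micro-aggregation algorithm the system uses, and the function name and the age example are hypothetical.

```python
def microaggregate(values, k):
    """Replace each value by the mean of its group of at least k sorted neighbours."""
    if k < 1:
        raise ValueError("k must be at least 1")
    # Indices of the records, ordered by the value of the quasi-identifier.
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start + k
        # Merge a short tail into the last group so every group has >= k members.
        if len(order) - end < k:
            end = len(order)
        group = order[start:end]
        centroid = sum(values[i] for i in group) / len(group)
        for i in group:
            result[i] = centroid
        start = end
    return result
```

For example, calling `microaggregate([30, 22, 41, 25, 39], 2)` groups the ages 22 and 25 together and the ages 30, 39 and 41 together, so each published age is shared by at least two records.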
... How to mine the information of reviews on sentiment and opinions has become a fundamental problem in natural language processing (NLP) and Web mining fields [1,2]. Sentiment polarity classification of online reviews has been widely studied in NLP, but it gradually fails to meet the requirement for mining fine-grained sentiment [3][4][5][6][7]. For example, a consumer doesn't know how to choose the optimum product from all kinds of products when they all belong to the positive sentiment polarity. ...
Article
With the explosive growth of product reviews, review rating prediction has become an important research topic which has a wide range of applications. The existing review rating prediction methods use a unified model to perform rating prediction on reviews published by different users, ignoring the differences of users within these reviews. Constructing a separate personalized model for each user to capture the user’s personalized sentiment expression is an effective attempt to improve the performance of the review rating prediction. The user-personalized sentiment information can be obtained not only by the review text but also by the user-item rating matrix. Therefore, we propose a user-personalized review rating prediction method by integrating the review text and user-item rating matrix information. In our approach, each user has a personalized review rating prediction model, which is decomposed into two components, one part is based on review text and the other is based on user-item rating matrix. Through extensive experiments on Yelp and Douban datasets, we validate that our methods can significantly outperform the state-of-the-art methods.
... The semi-supervised feature weighting and intelligent model selection (SWIMS) approach proposed by Khan et al. obtains feature weights from SentiWordNet [11]. Their approach works on lexical information obtained from comments; classification is then performed with an SVM based on the weights obtained from the features. ...
Article
In the present situation, when the state and central governments ask people to stay at home unless going out is necessary, customers prefer the online market for buying products. Before buying any product, buyers want to collect feedback about it and, based on the comments of previous buyers, decide whether to buy it or not. The reviews are mostly textual in nature and may or may not be labelled. However, when new buyers analyse the reviews, they prefer labelled reviews, which help them in decision-making. In review datasets, the shorter comments are mostly labelled and the longer reviews are unlabelled. In the proposed approach, an inductive semi-supervised approach is adopted to assign a proper polarity to unlabelled reviews. The concept of inductive learning is quite similar to the induction technique used in mathematics, where a statement is proved for a particular value `k' and then for the next element, and is thus valid for all elements in the dataset. The transformation process is inductive, i.e., knowledge is obtained from the reviews that are already labelled and, based on this knowledge, labels are provided to unlabelled reviews. The inductive approach is then applied again: reviews with no label assigned are taken from the dataset and, if they satisfy the required condition, they are considered. This process is repeated until labels are assigned to all unlabelled ones. In order to test the proposed approach, the aclIMDb movie reviews are taken into consideration, as they contain reviews in both labelled and unlabelled form. The unlabelled reviews are labelled based on the information obtained from the labelled reviews and finally, all reviews are labelled. A Naive Bayes classifier is used to test the proposed approach.
Finally, the accuracy value is obtained to check the performance of the proposed approach.
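The iterative labelling loop described above resembles self-training, which can be sketched as follows. This is an illustration under stated assumptions: the "required condition" is taken to be a posterior-probability threshold, the classifier is a small hand-written multinomial Naive Bayes with add-one smoothing, and all function names are hypothetical. The abstract does not confirm any of these details.

```python
from collections import Counter, defaultdict
from math import exp, log


def train_nb(docs, labels):
    """Fit a multinomial Naive Bayes model (counts only) on tokenised reviews."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict_proba(model, tokens):
    """Return {label: posterior probability} using add-one smoothing."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    log_scores = {}
    for label, count in class_counts.items():
        denom = sum(word_counts[label].values()) + len(vocab)
        score = log(count / total_docs)
        for tok in tokens:
            if tok in vocab:  # ignore words never seen in training
                score += log((word_counts[label][tok] + 1) / denom)
        log_scores[label] = score
    top = max(log_scores.values())
    weights = {lab: exp(s - top) for lab, s in log_scores.items()}
    norm = sum(weights.values())
    return {lab: w / norm for lab, w in weights.items()}


def self_train(labelled, unlabelled, threshold=0.8):
    """Label unlabelled reviews round by round, retraining after each round."""
    docs = [d for d, _ in labelled]
    labels = [lab for _, lab in labelled]
    pending = list(unlabelled)
    while pending:
        model = train_nb(docs, labels)
        scored = [
            (max(predict_proba(model, d).items(), key=lambda kv: kv[1]), d)
            for d in pending
        ]
        confident = [(lab, d) for (lab, p), d in scored if p >= threshold]
        if not confident:
            # Assumed fallback: accept the single most confident review
            # so that the loop always terminates.
            (lab, _), d = max(scored, key=lambda item: item[0][1])
            confident = [(lab, d)]
        for lab, d in confident:
            docs.append(d)
            labels.append(lab)
            pending.remove(d)
    return list(zip(docs, labels))
```

Each round retrains on the enlarged labelled set, so labels assigned early influence later ones; the threshold controls how cautiously that happens.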
... The practice of opinion mining examines and classifies the subjective sentiments, emotions and opinions of individuals towards products, organizations, individuals and other kinds of topics [2], as expressed in text such as tweets [3], forums [4], reviews [5], news [6] and blogs [7]. The work in [8] notes that credible information about end users can be estimated through SA. ...
... Two major techniques commonly employed for SA tasks are the lexicon-based technique and the machine learning-based technique. To reveal the sentiment of text data, the lexicon-based technique relies on a predefined dictionary containing a collection of sentiment terms, called a sentiment lexicon, such as the NRC emotion lexicon [25], SenticNet [26], Harvard General Inquirer, the MPQA subjectivity lexicon [27], SO-CAL [28], and Bing Liu's Opinion Lexicon. The lexicon-based technique first extracts the parts of speech (POS) of a sentence to pick out sentiment terms, which commonly have the POS of verb or adjective. ...
Article
Computational intelligence-based techniques have lately become popular for many applications, including revealing trends in healthy food consumption. Healthy alternative food that ensures the basic physical needs of mankind is becoming more popular among people worldwide. Organic food is believed to be an alternative food providing sustainable benefits for mankind, especially under the pandemic situation, in which the body urgently needs to maintain an optimal immune system. Organic food helps to supply sufficient nutrients, which are important for the body to cope with virus infection. Previously, many studies have been conducted worldwide to exhibit organic food consumption patterns. The approaches can be categorized into two types. The first approach relies on pencil surveys and focus group discussions involving a certain number of respondents. The analysis commonly applies statistical techniques. This approach has been considered time-consuming and costly. A more sophisticated and time-saving technique commonly makes use of social media platforms as the primary tool for revealing the pattern. This study is an initial study to provide a model of the Indonesian organic food consumer, considering that Indonesia has potential as both a producer and a consumer of organic food. The analysis is based on a Twitter dataset and applies a computational technique, lexicon-based sentiment analysis using VADER. Beforehand, we perform text analysis using ForceAtlas2 to reveal a spatial representation of both the attraction and repulsion forces of words. To extend VADER, we employ an Indonesian sentiment lexicon, namely INSET. The sentiment analysis result confirms that 64% of users accept organic food positively as a healthy dietary food, highlighting the importance of organic food for people to maintain an optimal immune system in the Covid-19 pandemic circumstances. Most of the users who post positively about organic food associate the food with “kesehatan”, “praktis”, and “diet”.
Meanwhile, the rest post negatively and regard organic food as expensive compared with other kinds of food.
... Opinion mining practices analyse and classify the subjective sentiments, opinions and emotions of individuals towards organizations, products, individuals and other kinds of topics, as pointed out by [2], that are presented in text such as tweets [3], forums [4], reviews [5], news [6] and blogs [7]. It is also worth pointing out that the credible opinion of end users (the target audience) can be predicted by sentiment analysis [8]. ...
Article
Sentiment analysis can be described as an effective system for extracting a vivid range of emotions and expressions from users. Gaining insight into emotions across various aspects of personal development is one of the critical elements of holistic development, and sentiment analysis can be very resourceful in such a process. SA is an integral development in AI and plays a vital role in polarity detection. It offers a significant opportunity to capture the sentiments of the general public, customers, users, etc., pertaining to varied aspects such as product choices, stock market factors, brand perceptions, political movements and social events. It is one of the contemporary solutions in natural language processing. The emergence of ICT and social media networks has provided a better platform for the rapid exchange of viewpoints and expressions. There has been phenomenal development in the domain of affective computing and sentiment analysis, which offers leverage in system-human interaction, multimodal signal processing, and information retrieval over an ever-growing amount of varied social data. In this paper, the present state of various sentiment analysis techniques is discussed. These techniques are analysed to perform an evaluation study and check the efficacy and resourcefulness of earlier contributions in the domain. Our work will also help future researchers to understand present gaps in the sentiment analysis literature.
... On a regular basis, billions of people share their experiences, knowledge and views on the latest trends in politics, economics and other globally critical issues. At present, sentiment analysis, subjectivity analysis and opinion mining attract significant interest from both the research community and marketing agencies [2]. The main purpose of sentiment analysis is to rank an opinion according to its level of positive, negative or neutral polarity [3]. ...
... Lexicon-based sentiment analysis is suitable for sentence-level sentiment analysis, and can be further divided into dictionary-based and corpus-based approaches (Liu et al., 2017b). Machine learning-based emotion analysis is suitable for document-level sentiment analysis, and can be further divided into supervised machine learning (Fan et al., 2017; Liu et al., 2017a) and unsupervised machine learning (Khan et al., 2016). ...
Article
With the development of e-commerce, an increasing number of online reviews can serve as a promising data source for enterprises to improve online products. This paper proposes a method for modelling consumer satisfaction based on online reviews using an improved Kano model from the perspective of risk attitude and aspiration. Firstly, the attributes of concern to consumers are extracted from online reviews, and sentiment analysis of the extracted attributes is carried out using Stanford CoreNLP. Secondly, to identify the types of product attributes, an improved Kano model is proposed based on the effects of product attributes on consumers' total utility. On this basis, different attribute types are illustrated from the perspective of risk attitude. Then, consumer aspirations are mined based on the risk attitudes of different attributes and the impact of each attribute on consumer satisfaction. According to the risk attitudes and aspirations of different attributes, quantified satisfaction functions are constructed to provide more objective and accurate improvement suggestions. Finally, the proposed method is applied to hotel service improvement to illustrate its effectiveness.
Article
In recent years, China has increased its investment in science and technology, and digital technologies such as mobile Internet, big data, and cloud computing have continuously made breakthroughs. Their integration with the modern financial industry has stimulated new financial forms such as online lending, third-party payment, digital insurance, and digital wealth management, which are booming. With the rapid development of digital financial inclusion, the traditional financial industry has broken through time and geographical constraints, allowing more groups excluded from the traditional financial system to participate in financial activities, enjoy more convenient and faster personalized financial products and services, meet their financial needs, and benefit from the wider reach of financial services. While the development of digital financial inclusion has benefited more groups, it has not changed the original risks of the financial industry. It has also brought about some negative external effects of financial technology, which pose greater challenges to the protection of financial consumers’ rights and interests. Therefore, this research addresses digital inclusive risk prediction for financial institutions and personal risk prediction respectively, and proposes a financial risk prediction method based on the adaptive fusion of multi-source heterogeneous data, which can improve the effect of financial risk prediction through the effective use of multi-source data.
Chapter
The objective of this study was to explore the potential of the opinion mining methodology in sociopolitical research, its techniques, and applications from the field of reflexivity. It is a non-experimental and exploratory study. It concludes that advances in the field of artificial intelligence have provided the sociopolitical sciences with tools that make it possible to approach the dominant trends of opinion in society during specific junctures where decision-making and/or the positioning of an idea, public policy, or social project can be measured in real-time, with broad demographic scopes that can be segmented, bringing the researcher closer to the subject of study with minimum levels of bias. Opinion mining research continues to be dominated by electoral and marketing topics; however, there are potentialities in the research of public policies, social programs, democracy, and governance that are still waiting for the application of opinion mining as a sociopolitical research methodology.
Article
In recent years, the growth of social network has increased the interest of people in analyzing reviews and opinions for products before they buy them. Consequently, this has given rise to the domain adaptation as a prominent area of research in sentiment analysis. A classifier trained from one domain often gives poor results on data from another domain. Expression of sentiment is different in every domain. The labeling cost of each domain separately is very high as well as time consuming. Therefore, this study has proposed an approach that extracts and classifies opinion words from one domain called source domain and predicts opinion words of another domain called target domain using a semi-supervised approach, which combines modified maximum entropy and bipartite graph clustering. A comparison of opinion classification on reviews on four different product domains is presented. The results demonstrate that the proposed method performs relatively well in comparison to the other methods. Comparison of SentiWordNet of domain-specific and domain-independent words reveals that on an average 72.6% and 88.4% words, respectively, are correctly classified.
Article
Syntactic parsing, the process of obtaining the internal structure of sentences in natural languages, is a crucial task for artificial intelligence applications that need to extract meaning from natural language text or speech. Sentiment analysis is one example of an application for which parsing has recently proven useful. In recent years, there have been significant advances in the accuracy of parsing algorithms. In this article, we perform an empirical, task-oriented evaluation to determine how parsing accuracy influences the performance of a state-of-the-art sentiment analysis system that determines the polarity of sentences from their parse trees. In particular, we evaluate the system using four well-known dependency parsers, including both current models with state-of-the-art accuracy and more inaccurate models which, however, require fewer computational resources. The experiments show that all of the parsers produce similarly good results in the sentiment analysis task, without their accuracy having any relevant influence on the results. Since parsing is currently a task with a relatively high computational cost that varies strongly between algorithms, this suggests that sentiment analysis researchers and users should prioritize speed over accuracy when choosing a parser; and parsing researchers should investigate models that improve speed further, even at some cost to accuracy.
Book
This book features best selected research papers presented at the International Conference on Machine Learning, Internet of Things and Big Data (ICMIB 2020) held at Indira Gandhi Institute of Technology, Sarang, India, during September 2020. It comprises high-quality research work by academicians and industrial experts in the field of machine learning, mobile computing, natural language processing, fuzzy computing, green computing, human–computer interaction, information retrieval, intelligent control, data mining and knowledge discovery, evolutionary computing, IoT and applications in smart environments, smart health, smart city, wireless networks, big data, cloud computing, business intelligence, internet security, pattern recognition, predictive analytics applications in healthcare, sensor networks and social sensing and statistical analysis of search techniques.
Article
Subjectivity analysis determines the existence of subjectivity in text using subjective clues. It is the first task in the opinion mining process. The difference between subjectivity analysis and polarity determination is that the latter processes subjective text to determine its orientation as positive or negative. Many techniques have been used to solve the problem of separating subjective and objective text. This paper uses a systematic literature review (SLR) to compile the studies undertaken in subjectivity analysis. An SLR is a literature review that collects and critically analyses multiple studies to answer research questions. Eight research questions were drawn up for this purpose. Information such as technique, corpus, subjective clue representation and performance was extracted from 97 articles, known as primary studies. This information was analysed to identify the strengths and weaknesses of the techniques, the elements affecting performance, and the elements missing from subjectivity analysis. The SLR found that the majority of the studies use a machine learning approach to identify and learn subjective text, because subjectivity analysis is viewed as a classification problem. This approach outperformed other approaches, though its performance is currently only at a satisfactory level. Therefore, more studies are needed to improve the performance of subjectivity analysis.
Article
Document sentiment classification is an area of study that has been developed for decades. However, sentiment classification of Email data is rather a specialised field that has not yet been thoroughly studied. Compared to typical social media and review data, Email data has characteristics of length variance, duplication caused by reply and forward messages, and implicitness in sentiment indicators. Due to these characteristics, existing techniques are incapable of fully capturing the complex syntactic and relational structure among words and phrases in Email documents. In this study, we introduce a dependency graph-based position encoding technique enhanced with weighted sentiment features, and incorporate it into the feature representation process. We combine encoded sentiment sequence features with traditional word embedding features as input for a revised deep CNN model for Email sentiment classification. Experiments are conducted on three sets of real Email data with adequate label conversion processes. Empirical results indicate that our proposed SSE-CNN model obtained the highest accuracy rate of 88.6%, 74.3% and 82.1% for three experimental Email datasets over other comparative state-of-the-art algorithms. Furthermore, our performance evaluations on the preprocessing and sentiment sequence encoding justify the effectiveness of Email preprocessing and sentiment sequence encoding with dependency-graph based position and SWN features on the improvement of Email document sentiment classification.
Article
Online reviews are becoming increasingly important for decision-making. Consumers often refer to online reviews for opinions before making a purchase. Marketers also acknowledge the importance of online reviews and use them to improve product success. However, the massive amount of online review data, as well as its unstructured nature, is a challenge for anyone wanting to derive a conclusion quickly. In this paper, we propose a novel framework for gauging the ratings of online reviews using machine learning techniques. This framework uses a combination of text pre-processing and feature extraction methods. Here, we investigate four different aspects of the new framework. First, we assess the performance of single and ensemble classifiers in predicting sentiment—positive or negative—initially on a specific dataset (Yelp), but subsequently also on two other datasets (Amazon's product reviews and a movie review dataset). Second, using the best identified classifiers, we improve the accuracy with which neutral polarity can be predicted, an ability largely overlooked in the literature. Third, we further improve the performance of these classifiers by testing different pre-processing and feature extraction methods. Finally, we measure how well our deep learning approach performs on the same task compared to the best previously identified classifiers. Our extensive testing shows that the linear-kernel support vector machine, logistic regression and multilayer perceptron are the three best single classifiers in terms of accuracy, precision, recall, and F-measure. Their performance could be further improved if they were used as base classifiers for ensemble models. We also observe that several text pre-processing techniques—negation word identification, word elongation correction, and part of speech lemmatisation (combined with Terms Frequency and N-gram words)—can increase accuracy. 
In addition, we demonstrate that the general sentiment of lexicons such as SentiWordNet 3.0 and SenticNet 4 can be used to generate features with good results, although deep learning models can perform equally well. Experiments with different datasets confirm that our framework provides consistent outcomes. In particular, we have focused on improving the accuracy of neutral sentiment, and we conclude by showing how this can be achieved without sacrificing the accuracy of positive or negative ratings.
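The four measures named in this abstract (accuracy, precision, recall and F-measure) can be computed from a confusion count. The sketch below is a minimal illustration and not the authors' evaluation code; the function name `binary_metrics` and the toy labels are assumptions introduced here.

```python
def binary_metrics(y_true, y_pred, positive="pos"):
    """Compute accuracy, precision, recall and F-measure for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F-measure is the harmonic mean of precision and recall.
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Toy gold labels and predictions (hypothetical data, for illustration only).
y_true = ["pos", "pos", "pos", "neg", "neg", "neg"]
y_pred = ["pos", "pos", "neg", "pos", "neg", "neg"]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

With three classes (positive, negative, neutral), the same function can be called once per class and the results averaged, which is one common way to report per-class behaviour such as neutral-class accuracy.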
Thesis
Extracting public opinion by analyzing Big Social Data has grown considerably because of the interactive, real-time nature of these data. Data from social networks are closely tied to people's personal lives and can be used to follow major events by tracking people's behavior. In this context, we are particularly interested in Big Data analysis methods. The problem is that these data are so voluminous and heterogeneous that they become difficult to manage with conventional tools. To meet the challenges of Big Data, new tools have emerged. However, it is often difficult to choose the right solution, because the long list of available tools changes continually. We therefore provide an up-to-date comparative study of the tools used to extract strategic information from Big Data and map them to the different processing needs. The main contribution of the doctoral thesis is a generic analysis approach for automatically detecting opinion trends on given topics from social networks. Given a very small set of manually annotated hashtags, the proposed approach transfers the known sentiment information from the hashtags to individual words. The resulting lexical resource is a large-scale polarity lexicon whose effectiveness is measured on different sentiment analysis tasks. Comparing our method with different paradigms in the literature confirms its beneficial impact on the design of highly accurate sentiment analysis systems. Our model achieves an overall accuracy of 90.21%, substantially exceeding current baseline models for sentiment analysis of social networks.
Article
In the process of online shopping, consumers usually compare the reviews of the same product on different e-commerce platforms. The sentiment orientations of online reviews from different platforms interactively influence consumers' purchase decisions. However, because the ability to process information manually is limited, it is difficult for a consumer to accurately identify the sentiment orientation of every review one by one and to describe how the reviews interact. To this end, we propose an online shopping support model using deep-learning-based opinion mining and q-rung orthopair fuzzy interaction weighted Heronian mean (q-ROFIWHM) operators. First, a deep-learning model is used to automatically extract product attribute words and opinion words from online reviews and to match the corresponding attribute-opinion pairs; meanwhile, a sentiment dictionary is used to calculate sentiment orientation, covering positive, negative, and neutral sentiments. Second, the proportions of the three kinds of sentiment for each attribute of the same product are calculated. According to the proportion values of attribute sentiment from different platforms, the sentiment information is converted into multiple cross-decision matrices, which are represented by q-rung orthopair fuzzy sets. Third, considering the interactive characteristics of the decision matrices, the q-ROFIWHM operators are proposed to aggregate this cross-decision information, and the ranking result is then determined by a score function to support consumers' purchase decisions. Finally, a real example of a mobile phone purchase is given to verify the rationality of the proposed method, and sensitivity and comparison analyses are used to show its effectiveness and superiority.
Article
Over the past few years, more and more consumers have come to read online reviews when they shop online. To support consumers' purchase decisions, many scholars focus on ranking products based on online reviews and propose various methods and techniques. Generally, the process of information fusion for ranking products based on online reviews consists of three stages: product feature extraction, sentiment analysis, and ranking products. In this paper, we review the existing studies on processes and methods of information fusion for each stage. Furthermore, we briefly review the existing research on information fusion based on online reviews in other fields. Finally, we summarize the main conclusions of this paper and point out the future research direction.
Chapter
Sentiment analysis, also referred to as opinion mining, attracts continuous and increasing interest from both the academic and the business domains. Countless text messages are exchanged on a daily basis within social media, capturing the interest of researchers, journalists, companies, and governments. In these messages people usually declare their opinions or express their feelings, beliefs and speculations, i.e., their sentiments. The massive use of online social networks and the large amount of data collected through them have drawn attention to analyzing the rich information they contain. In this chapter we present a comprehensive overview of the various methods used for sentiment analysis and how they have evolved in the age of big data.
Article
Sentiment analysis using part-of-speech (POS) tags together with joint sentiment topic features is a novel idea, since sentiment analysis requires effective selection of the features used to determine sentiment. In this paper, POS tagging is performed using a hidden Markov model, from which unigram, bigram and bi-tagged features are extracted. Similarly, the nonparametric hierarchical Dirichlet process is employed to extract the joint sentiment topic features. The extracted features are combined linearly in order to select the best feature subset effectively. The best features are selected based on the maximum relevance and minimum redundancy of the feature subset, both measured with mutual information. Mutual information between each feature and the sentiment decision measures relevance, while mutual information between features identifies redundant features to be removed. Feature selection is carried out with these fitness conditions using the firefly optimization algorithm. The chosen feature subset is then used for classification, which is performed with a support vector machine and artificial neural networks. Experimental results show that the proposed sentiment analysis method improves accuracy and reduces training time.
Article
Sentiment analysis (SA) is an effective method for extracting large volumes of expressions and emotions from consumers. Gaining insight into emotions in order to change the factors of personal improvement is a crucial part of holistic improvement, and SA may be very useful for that process. The substantial progress of SA within AI plays an important role in polarity detection. It offers an important opportunity to capture the sentiments of customers and the general public on diverse matters such as political movements, product choices, social events and many more, and it is one of the existing approaches in natural language processing. The emergence of social media and ICT networks has created an ideal platform for the fast exchange of expressions and viewpoints. There has also been remarkable progress in SA and affective computing, which provides leverage for human-system interaction, multimodal signal processing and information retrieval over diverse social data. In this paper, the nomenclature of learning dimensions in SA is explored. The diverse methods used for SA are examined in order to carry out an assessment study and evaluate the resourcefulness and effectiveness of earlier work in this field. Our contribution will also help future researchers understand the current gaps in the SA literature.
Article
Objectives: Sentiment analysis of online web and social media content is an important field of research and application for organizations, businesses, and political and social issues; in the business world, sentiment analysis provides a clear picture of both the quality of, and user satisfaction with, products, services or events. Methods/Statistical Analysis: Extracting information from the web and classifying and predicting sentiment polarity is a complex process that is performed through various approaches such as Part-Of-Speech Tagging (POST), Support Vector Machine (SVM), and so on. In this paper, the efficient sentiment analysis schemes that were introduced in recent years are discussed and analyzed in order to understand the novel ideas behind these methodologies. Findings: This paper also highlights the advantages and disadvantages of the analyzed methodologies with the objective of determining the efficiency of the sentiment analysis schemes. Finally, the sentiment analysis schemes are compared in terms of performance evaluation metrics with respect to social media content. This paper thus provides a detailed analysis of recent sentiment analysis schemes and sheds light on new avenues for future research in this domain.
Article
Sentiment analysis, or opinion mining, has become an open research domain since the proliferation of the Internet and Web 2.0 social media. People express their attitudes and opinions on social media, including blogs, discussion forums, tweets, etc., and sentiment analysis is concerned with detecting and extracting sentiment or opinion from online text. Sentiment-based text classification differs from topical text classification because it involves discrimination based on the opinion expressed about a topic. Feature selection is significant for sentiment analysis, as opinionated text may have high dimensionality, which can adversely affect the performance of a sentiment analysis classifier. This paper explores the applicability of feature selection methods for sentiment analysis and investigates their classification performance in terms of recall, precision and accuracy. Five feature selection methods (Document Frequency, Information Gain, Gain Ratio, Chi Squared, and Relief-F) and three popular sentiment feature lexicons (HM, GI and Opinion Lexicon) are investigated on a movie review corpus of 2000 documents. The experimental results show that Information Gain gave consistent results and Gain Ratio performed best overall for sentiment feature selection, while the sentiment lexicons performed poorly. Furthermore, we found that the performance of the classifier depends on selecting an appropriate number of representative features from the text.
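To make one of the named feature selection methods concrete, the following is a minimal sketch of Information Gain for a binary "term present / term absent" feature. It is an illustration under assumptions, not the paper's implementation: the function names, the set-of-tokens document representation and the four toy documents are all introduced here.

```python
import math


def entropy(counts):
    """Shannon entropy (in bits) of a class distribution given as raw counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(docs, labels, term):
    """Information gain of the feature 'term occurs in the document'.

    docs   -- list of token sets, one per document
    labels -- list of class labels aligned with docs
    """
    classes = sorted(set(labels))

    def class_counts(indices):
        return [sum(1 for i in indices if labels[i] == c) for c in classes]

    all_idx = list(range(len(docs)))
    with_term = [i for i in all_idx if term in docs[i]]
    without_term = [i for i in all_idx if term not in docs[i]]

    h_before = entropy(class_counts(all_idx))
    # Entropy after the split, weighted by the size of each partition.
    h_after = sum(
        len(part) / len(docs) * entropy(class_counts(part))
        for part in (with_term, without_term)
        if part
    )
    return h_before - h_after


# Toy corpus (hypothetical): "great" separates the classes, "movie" does not.
docs = [{"great", "movie"}, {"great", "plot"}, {"boring", "movie"}, {"boring", "plot"}]
labels = ["pos", "pos", "neg", "neg"]
ranked = sorted({t for d in docs for t in d},
                key=lambda t: information_gain(docs, labels, t),
                reverse=True)
print(ranked)
```

Ranking terms by this score and keeping only the top k is the usual way such a measure is turned into a feature selector; the choice of k corresponds to the "appropriate number of representative features" mentioned in the abstract.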
Article
Web Opinion Mining (WOM) is a new concept in Web Intelligence. It embraces the problem of extracting, analyzing and aggregating web data about opinions. Studying users’ opinions is relevant because through them it is possible to determine how people feel about a product or service and know how it was received by the market. In this chapter, we show an overview about what Opinion Mining is and give some approaches about how to do it. Also, we distinguish and discuss four resources from where opinions can be extracted from, analyzing in each case the main issues that could alter the mining process. One last interesting topic related to WOM and discussed in this chapter is the summarization and visualization of the WOM results. We consider these techniques to be important because they offer a real chance to understand and find a real value for a huge set of heterogeneous opinions collected. Finally, having given enough conceptual background, a practical example is presented using Twitter as a platform for Web Opinion Mining. Results show how an opinion is spread through the network and describes how users influence each other.
Article
With the advent of Web 2.0, people have become more eager to express and share their opinions on the web about day-to-day activities as well as global issues. The evolution of social media has also contributed immensely to these activities, providing a transparent platform for sharing views across the world. These electronic Word of Mouth (eWOM) statements expressed on the web are prevalent in the business and service industries, where they enable customers to share their points of view. Over the last decade and a half, research communities, academia, and the public and service industries have been working rigorously on sentiment analysis, also known as opinion mining, to extract and analyze public mood and views. In this regard, this paper presents a rigorous survey on sentiment analysis, which portrays views presented by over one hundred articles published in the last decade regarding the necessary tasks, approaches, and applications of sentiment analysis. Several sub-tasks need to be performed for sentiment analysis, which in turn can be accomplished using various approaches and techniques. This survey, covering literature published during 2002-2015, is organized on the basis of the sub-tasks to be performed, the machine learning and natural language processing techniques used, and the applications of sentiment analysis. The paper also presents open issues, along with a summary table of one hundred and sixty-one articles.
Article
With the rapid growth of websites and web forms, a large number of product reviews is now available online. An opinion mining system is needed to help people evaluate the emotions, opinions, attitudes, and behavior of others, so that decisions can be made based on user preference. In this paper, we propose an optimized feature reduction approach that incorporates an ensemble of machine learning methods and uses information gain and a genetic algorithm as feature reduction techniques. We conducted comparative experiments in opinion mining on a multidomain review dataset and a movie review dataset. The effectiveness of the single classifiers (Naïve Bayes, logistic regression, and support vector machine) and of the ensemble technique is compared on five datasets. The proposed hybrid method is evaluated, and the experimental results show that information gain and the genetic algorithm combined with the ensemble technique perform better in terms of various measures on the multidomain and movie reviews. The classification algorithms are evaluated using McNemar’s test to compare the level of significance of the classifiers.
Article
One of the greatest challenges in speech technology is estimating the speaker's emotion. Most of the existing approaches concentrate either on audio or text features. In this work, we propose a novel approach for emotion classification of audio conversation based on both speech and text. The novelty in this approach is in the choice of features and the generation of a single feature vector for classification. Our main intention is to increase the accuracy of emotion classification of speech by considering both audio and text features. In this work we use standard methods such as Natural Language Processing, Support Vector Machines, WordNet Affect and SentiWordNet. The datasets for this work were taken from SemEval-2007 and the eNTERFACE’05 EMOTION Database.
Article
Sentiment analysis research has been increasing tremendously in recent times due to the wide range of business and social applications. Sentiment analysis from unstructured natural language text has recently received considerable attention from the research community. In this paper, we propose a novel sentiment analysis model based on common-sense knowledge extracted from ConceptNet based ontology and context information. ConceptNet based ontology is used to determine the domain specific concepts which in turn produced the domain specific important features. Further, the polarities of the extracted concepts are determined using the contextual polarity lexicon which we developed by considering the context information of a word. Finally, semantic orientations of domain specific features of the review document are aggregated based on the importance of a feature with respect to the domain. The importance of the feature is determined by the depth of the feature in the ontology. Experimental results show the effectiveness of the proposed methods.
Article
Sentiment analysis involves classifying opinions in text into categories like "positive" or "negative". One of the approaches used for sentiment classification is to rely on a sentiment lexicon. This paper aims to build a domain-independent sentiment lexicon. We propose a Machine Learning Based Senti-word Lexicon (MLBSL) based on the Amazon dataset, which contains reviews from different domains. Our proposed MLBSL yields an improvement over previously published manually and automatically built lexicons such as SentiWordNet. We also improve the calculation method used in review sentiment analysis.
Article
Sentiment analysis on Twitter has attracted much attention recently due to its wide applications in both the commercial and public sectors. In this paper we present SentiCircles, a lexicon-based approach for sentiment analysis on Twitter. Unlike typical lexicon-based approaches, which assign fixed and static prior sentiment polarities to words regardless of their context, SentiCircles takes into account the co-occurrence patterns of words in different contexts in tweets to capture their semantics and update their pre-assigned strength and polarity in sentiment lexicons accordingly. Our approach allows for the detection of sentiment at both entity level and tweet level. We evaluate our proposed approach on three Twitter datasets using three different sentiment lexicons to derive word prior sentiments. Results show that our approach significantly outperforms the baselines in accuracy and F-measure for entity-level subjectivity (neutral vs. polar) and polarity (positive vs. negative) detection. For tweet-level sentiment detection, our approach performs better than the state-of-the-art SentiStrength by 4–5% in accuracy on two datasets, but falls marginally behind by 1% in F-measure on the third dataset.
Article
Decision making at both the individual and the organizational level is always accompanied by a search for others' opinions on the same matter. Opinion-rich resources such as reviews, forum discussions, blogs, micro-blogs and Twitter provide a rich anthology of sentiments. This user-generated content can benefit the market if its semantic orientations are taken into account. Opinion mining and sentiment analysis formalize the study and interpretation of opinions and sentiments. The digital ecosystem has itself paved the way for the use of the huge volume of opinionated data recorded. This paper is an attempt to review and evaluate the various techniques used for opinion and sentiment analysis.
Conference Paper
Sentiment classification concerns the use of automatic methods for predicting the orientation of subjective content on text documents, with applications on a number of areas including recommender and advertising systems, customer intelligence and information retrieval. SentiWordNet is an opinion lexicon derived from the WordNet database where each term is associated with numerical scores indicating positive and negative sentiment information. This research presents the results of applying the SentiWordNet lexical resource to the problem of automatic sentiment classification of film reviews. Our approach comprises counting positive and negative term scores to determine sentiment orientation, and an improvement is presented by building a data set of relevant features using SentiWordNet as source, and applied to a machine learning classifier. We find that results obtained with SentiWordNet are in line with similar approaches using manual lexicons seen in the literature. In addition, our feature set approach yielded improvements over the baseline term counting method. The results indicate SentiWordNet could be used as an important resource for sentiment classification tasks. Additional considerations are made on possible further improvements to the method and its use in conjunction with other techniques.
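The baseline term-counting idea described in this abstract (sum the positive and negative scores of the terms in a review and compare the totals) can be sketched as follows. This is a minimal illustration under assumptions: the lexicon entries are made-up numbers and are not actual SentiWordNet scores, the function name is hypothetical, and the "neutral" result on a tie is a choice made here rather than something the abstract specifies.

```python
# Hypothetical (positive, negative) scores per term; NOT real SentiWordNet values.
TOY_LEXICON = {
    "good": (0.75, 0.0),
    "brilliant": (0.875, 0.0),
    "dull": (0.0, 0.625),
    "awful": (0.0, 0.875),
}


def term_counting_polarity(text, lexicon=TOY_LEXICON):
    """Classify a text by comparing summed positive and negative term scores."""
    pos_total = 0.0
    neg_total = 0.0
    for raw in text.lower().split():
        token = raw.strip(".,;:!?\"'()")  # crude punctuation stripping
        pos, neg = lexicon.get(token, (0.0, 0.0))  # unknown terms contribute nothing
        pos_total += pos
        neg_total += neg
    if pos_total > neg_total:
        return "positive"
    if neg_total > pos_total:
        return "negative"
    return "neutral"  # assumption: ties and texts without lexicon terms


print(term_counting_polarity("A brilliant film with a dull ending."))
```

The feature-set variant mentioned in the abstract would instead pass quantities such as `pos_total` and `neg_total` to a trained classifier rather than comparing them directly.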
Article
People react to events, topics and entities by expressing their personal opinions and emotions. These reactions can correspond to a wide range of intensities, from very mild to strong. An adequate processing and understanding of these expressions has been the subject of research in several fields, such as business and politics. In this context, Twitter sentiment analysis, which is the task of automatically identifying and extracting subjective information from tweets, has received increasing attention from the Web mining community. Twitter provides an extremely valuable insight into human opinions, as well as new challenging Big Data problems. These problems include the processing of massive volumes of streaming data, as well as the automatic identification of human expressiveness within short text messages. In that area, several methods and lexical resources have been proposed in order to extract sentiment indicators from natural language texts at both syntactic and semantic levels. These approaches address different dimensions of opinions, such as subjectivity, polarity, intensity and emotion. This article is the first study of how these resources, which are focused on different sentiment scopes, complement each other. With this purpose we identify scenarios in which some of these resources are more useful than others. Furthermore, we propose a novel approach for sentiment classification based on meta-level features. This supervised approach boosts existing sentiment classification of subjectivity and polarity detection on Twitter. Our results show that the combination of meta-level features provides significant improvements in performance. However, we observe that there are important differences that rely on the type of lexical resource, the dataset used to build the model, and the learning strategy. Experimental results indicate that manually generated lexicons are focused on emotional words, being very useful for polarity prediction. 
On the other hand, lexicons generated with automatic methods include neutral words, introducing noise in the detection of subjectivity. Our findings indicate that polarity and subjectivity prediction are different dimensions of the same problem, but they need to be addressed using different subspace features. Lexicon-based approaches are recommendable for polarity, and stylistic part-of-speech based approaches are meaningful for subjectivity. With this research we offer a more global insight of the resource components for the complex task of classifying human emotion and opinion.
Article
With the rapid growth of user-generated content on the internet, automatic sentiment analysis of online customer reviews has become a hot research topic recently, but due to variety and wide range of products and services being reviewed on the internet, the supervised and domain-specific models are often not practical. As the number of reviews expands, it is essential to develop an efficient sentiment analysis model that is capable of extracting product aspects and determining the sentiments for these aspects. In this paper, we propose a novel unsupervised and domain-independent model for detecting explicit and implicit aspects in reviews for sentiment analysis. In the model, first a generalized method is proposed to learn multi-word aspects and then a set of heuristic rules is employed to take into account the influence of an opinion word on detecting the aspect. Second a new metric based on mutual information and aspect frequency is proposed to score aspects with a new bootstrapping iterative algorithm. The presented bootstrapping algorithm works with an unsupervised seed set. Third, two pruning methods based on the relations between aspects in reviews are presented to remove incorrect aspects. Finally the model employs an approach which uses explicit aspects and opinion words to identify implicit aspects. Utilizing extracted polarity lexicon, the approach maps each opinion word in the lexicon to the set of pre-extracted explicit aspects with a co-occurrence metric. The proposed model was evaluated on a collection of English product review datasets. The model does not require any labeled training data and it can be easily applied to other languages or other domains such as movie reviews. Experimental results show considerable improvements of our model over conventional techniques including unsupervised and supervised approaches.
Article
This paper presents a novel method for contextualizing and enriching large semantic knowledge bases for opinion mining with a focus on Web intelligence platforms and other high-throughput big data applications. The method is not only applicable to traditional sentiment lexicons, but also to more comprehensive, multi-dimensional affective resources such as SenticNet. It comprises the following steps: (i) identify ambiguous sentiment terms, (ii) provide context information extracted from a domain-specific training corpus, and (iii) ground this contextual information to structured background knowledge sources such as ConceptNet and WordNet. A quantitative evaluation shows a significant improvement when using an enriched version of SenticNet for polarity classification. Crowdsourced gold standard data in conjunction with a qualitative evaluation sheds light on the strengths and weaknesses of the concept grounding, and on the quality of the enrichment process.
Article
Sentiment Analysis (SA) is an ongoing field of research in text mining. SA is the computational treatment of opinions, sentiments and subjectivity of text. This survey paper provides a comprehensive overview of the latest developments in this field. Many recently proposed algorithm enhancements and various SA applications are investigated and presented briefly in this survey. These articles are categorized according to their contributions to the various SA techniques. The fields related to SA (transfer learning, emotion detection, and building resources) that have recently attracted researchers are discussed. The main aim of this survey is to give a nearly complete picture of SA techniques and the related fields, with brief details. The main contributions of this paper include the sophisticated categorization of a large number of recent articles and the illustration of recent research trends in sentiment analysis and its related areas.
Conference Paper
A typical method to obtain valuable information is to extract the sentiment or opinion from a message. Machine learning technologies are widely used in sentiment classification because of their ability to “learn” from the training dataset to predict or support decision making with relatively high accuracy. However, when the dataset is large, some algorithms might not scale up well. In this paper, we aim to evaluate the scalability of the Naïve Bayes classifier (NBC) on large datasets. Instead of using a standard library (e.g., Mahout), we implemented NBC ourselves to achieve fine-grained control of the analysis procedure. A Big Data analysis system is also designed for this study. The result is encouraging in that the accuracy of NBC improves and approaches 82% as the dataset size increases. We have demonstrated that NBC is able to scale up to analyze the sentiment of millions of movie reviews with increasing throughput.
Conference Paper
Sentiment analysis is a text classification task where the goal is to determine the polarity (positive or negative) of the opinion expressed in a document. This task is typically addressed using machine learning tools, based on the standard bag-of-words description of the documents; the high dimensionality of these features makes feature selection an important step in this class of problems. This paper reports an extensive comparative study of feature selection (FS) methods in sentiment analysis, using two standard classifiers: naive Bayes (NB) and support vector machines (SVM). Furthermore, a new weighted SVM (WSVM) is proposed, where the features are weighted using the scores of a feature selection method. The proposed WSVM is shown to achieve better performance in the sentiment analysis task than the standard SVM, especially when the weighting is done using the mutual information feature scores.
Article
Twitter has recently become one of the most popular micro-blogging platforms. Millions of users share their thoughts and opinions about different aspects and events on the platform. Twitter is therefore considered a rich source of information for decision making and sentiment analysis. Sentiment analysis refers to a classification problem where the main focus is to predict the polarity of words and then classify them into positive and negative feelings, with the aim of identifying attitudes and opinions expressed in any form or language. Sentiment analysis over Twitter offers organisations a fast and effective way to monitor the public's feelings towards their brand, business, directors, etc. A wide range of features and methods for training sentiment classifiers on Twitter datasets have been researched in recent years, with varying results. The primary issues in previous techniques are classification accuracy, data sparsity and sarcasm, as they misclassify most tweets, with a very high percentage incorrectly labelled neutral. This research paper focuses on these problems and presents an algorithm for Twitter feed classification based on a hybrid approach. The proposed method includes various pre-processing steps before feeding the text to the classifier. Experimental results show that the proposed technique overcomes the previous limitations and achieves higher accuracy than similar techniques.
Conference Paper
Full-text available
Sentiment analysis aims to automatically estimate the sentiment in a given text as positive or negative and has been an active area of research in recent years. Polarity lexicons, often used in sentiment analysis, indicate how positive or negative each term in the lexicon is. However, since creating domain-specific polarity lexicons is expensive and time-consuming, researchers often use a general purpose or domain-independent lexicon. In this work, we address the problem of adapting a general purpose polarity lexicon to a specific domain to better estimate the polarity of a set of reviews in that domain. We experimented with two sets of reviews from the hotel and movie domains and observed that while our adaptation techniques changed the polarity values for only a small set of words, the overall accuracy increases were significant: 77% to 83% in the hotel dataset with 3000 reviews and 61% to 66% in the movie dataset, with 1000 reviews.
Article
Full-text available
In this paper, we present the anatomy of pSenti, a concept-level sentiment analysis system that seamlessly integrates lexicon-based and learning-based approaches to opinion mining. Compared with pure lexicon-based systems, it achieves significantly higher accuracy in sentiment polarity classification as well as sentiment strength detection. Compared with pure learning-based systems, it offers more structured and readable results with aspect-oriented explanation and justification, while being less sensitive to the writing style of text. Our extensive experiments on two real-world datasets (CNET software reviews and IMDB movie reviews) confirm the superiority of the proposed hybrid approach over state-of-the-art systems like SentiStrength.
Conference Paper
Full-text available
Traditionally, text classifiers are built from labeled training examples. Labeling is usually done manually by human experts (or the users), which is a labor-intensive and time-consuming process. In the past few years, researchers have investigated various forms of semi-supervised learning to reduce the burden of manual labeling. In this paper, we propose a different approach. Instead of labeling a set of documents, the proposed method labels a set of representative words for each class. It then uses these words to extract a set of documents for each class from a set of unlabeled documents to form the initial training set. The EM algorithm is then applied to build the classifier. The key issue of the approach is how to obtain a set of representative words for each class. One way is to ask the user to provide them, which is difficult because the user usually can only give a few words (which are insufficient for accurate learning). We propose a method to solve the problem. It combines clustering and feature selection. The technique can effectively rank the words in the unlabeled set according to their importance. The user then selects/labels some words from the ranked list for each class. This process requires less effort than providing words with no help or manual labeling of documents. Our results show that the new method is highly effective and promising.
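The pipeline described above (representative words, then an initial training set, then EM) might be sketched as below. This is an assumption-laden illustration, not the authors' implementation: it uses a multinomial Naive Bayes model with Laplace smoothing, lets EM re-estimate every document including the seeded ones, and omits the clustering and feature-selection step that ranks candidate words. All function names are hypothetical.

```python
import math
from collections import Counter


def seed_label(docs, seed_words):
    """Label a document when exactly one class has the most seed-word hits."""
    labels = {}
    for i, doc in enumerate(docs):
        tokens = set(doc.split())
        hits = {c: len(tokens & words) for c, words in seed_words.items()}
        best = max(hits, key=hits.get)
        if hits[best] > 0 and list(hits.values()).count(hits[best]) == 1:
            labels[i] = best
    return labels


def train_nb(docs, soft, classes, vocab):
    """M-step: priors and word probabilities from soft labels (Laplace smoothing)."""
    mass = {c: sum(s[c] for s in soft) for c in classes}
    total_mass = sum(mass.values())
    prior, word_prob = {}, {}
    for c in classes:
        prior[c] = (mass[c] + 1.0) / (total_mass + len(classes))
        counts = Counter()
        for doc, s in zip(docs, soft):
            for w in doc.split():
                counts[w] += s[c]
        total = sum(counts.values())
        word_prob[c] = {w: (counts[w] + 1.0) / (total + len(vocab)) for w in vocab}
    return prior, word_prob


def posterior(doc, prior, word_prob, classes):
    """E-step for one document: class posterior under the current model."""
    log_p = {c: math.log(prior[c]) + sum(math.log(word_prob[c][w]) for w in doc.split())
             for c in classes}
    top = max(log_p.values())
    exp_p = {c: math.exp(v - top) for c, v in log_p.items()}
    norm = sum(exp_p.values())
    return {c: v / norm for c, v in exp_p.items()}


def em_classify(docs, seed_words, iterations=5):
    """Build the initial training set from seed words, then run EM over all documents."""
    classes = sorted(seed_words)
    vocab = {w for d in docs for w in d.split()}
    seeds = seed_label(docs, seed_words)
    # Seeded documents start with a hard label; the rest carry no weight at first.
    soft = [{c: 1.0 if seeds.get(i) == c else 0.0 for c in classes}
            for i in range(len(docs))]
    for _ in range(iterations):
        prior, word_prob = train_nb(docs, soft, classes, vocab)
        soft = [posterior(d, prior, word_prob, classes) for d in docs]
    return [max(s, key=s.get) for s in soft]
```

For example, with the seed words `{"sports": {"goal"}, "politics": {"vote"}}`, documents that contain neither seed word are still assigned a class through the word statistics learned from the seeded documents.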
Article
Transfer learning focuses on leveraging the knowledge in source domains to complete the learning tasks in target domains, where the data distributions of the source and target domains are related but different with respect to the original features. To tackle the challenge of different data distributions, previous methods mine high-level concepts (e.g., feature clusters) from the original features, which have proved suitable for classification. The general strategy of the previous approaches is to utilize the identical concepts, the synonymous concepts or both of them as shared concepts to establish a bridge between the source and target domains. Besides the shared concepts, some methods use the different concepts for training the model. Specifically, these methods assume that the identical concepts (e.g., feature clusters) in different domains can be mapped to the same example classes. However, some ambiguous concepts may exist in different domains and result in misleading classification in the target domains. Therefore, we need a general transfer learning framework that can simultaneously exploit four kinds of concepts for cross-domain classification: the identical concepts, the synonymous concepts, the different concepts and the ambiguous concepts. In this paper, we present a novel method, Quadruple Transfer Learning (QTL), which models these four kinds of concepts together to fit different situations in the data distributions. In addition, an iterative algorithm with a convergence guarantee, based on non-negative matrix tri-factorization techniques, is presented to solve the optimization problem. Finally, systematic experiments demonstrate that QTL is more effective than all the compared baselines.
Article
Collective opinions observed in social media represent valuable information for a range of applications. In pursuit of such information, current methods require prior knowledge of each individual opinion to determine the collective one in a post collection. In contrast, we assume that collective analysis can be better performed by exploiting overlaps among distinct posts of the collection. Thus, we propose SACI (Sentiment Analysis by Collective Inspection), a lexicon-based unsupervised method that extracts collective sentiments without relying on individual classifications. SACI is based on a directed transition graph among terms of a post set and on a prior classification of these terms regarding their roles in consolidating opinions. Paths on this graph represent subsets of posts, and the collective opinion is defined by traversing all paths. Besides demonstrating that collective analysis outperforms individual analysis at approximating collection opinions, assessments of SACI show that good individual classifications do not guarantee good collective analysis and vice versa. Further, SACI simultaneously fulfills the requirements of efficacy, efficiency and handling of dynamicity posed by highly demanding scenarios. Indeed, the consolidation of a SACI-based Web tool for real-time analysis of tweets demonstrates the usefulness of this work.
Article
In recent years, research in sentiment classification has received considerable attention from natural language processing researchers. Annotated sentiment corpora are the most important resources used in sentiment classification. However, since most recent research in this field has focused on the English language, there are not enough annotated sentiment resources in other languages. Manual construction of reliable annotated sentiment corpora for a new language is a labour-intensive and time-consuming task. Projection of a sentiment corpus from one language into another is a natural solution used in cross-lingual sentiment classification. Automatic machine translation services are the most commonly used tools for directly projecting information from one language into another. However, since term distribution across languages may differ due to variations in linguistic terms and writing styles, cross-lingual methods cannot reach the performance of monolingual methods. In this paper, a novel learning model is proposed based on the combination of uncertainty-based active learning and semi-supervised self-training approaches to incorporate unlabelled sentiment documents from the target language in order to improve the performance of cross-lingual methods. Further, in this model, the density measures of unlabelled examples are considered in the active learning part in order to avoid outlier selection. The empirical evaluation on book review datasets in three different languages shows that the proposed model can significantly improve the performance of cross-lingual sentiment classification in comparison with other existing and baseline methods.
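To illustrate why density helps avoid outlier selection in uncertainty-based active learning, the sketch below scores each unlabelled example by its prediction entropy multiplied by its average cosine similarity to the other unlabelled examples. The abstract does not specify the density measure, so this weighting and the names `entropy`, `cosine` and `select_query` are assumptions made for illustration only.

```python
import math


def entropy(p):
    """Binary entropy (in bits) of a predicted positive-class probability."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def select_query(vectors, probs):
    """Return the index of the example with the highest uncertainty x density score."""
    best_index, best_score = None, -1.0
    for i, (vec, p) in enumerate(zip(vectors, probs)):
        similarities = [cosine(vec, v) for j, v in enumerate(vectors) if j != i]
        density = sum(similarities) / len(similarities) if similarities else 0.0
        score = entropy(p) * density  # uncertain AND representative examples score high
        if score > best_score:
            best_index, best_score = i, score
    return best_index
```

With pure uncertainty sampling, an isolated example whose predicted probability is exactly 0.5 would always be queried first; the density factor shifts the choice toward uncertain examples that resemble many others in the pool.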
Article
Recent research indicates that a sentiment lexicon focusing on a specific domain leads to better sentiment analyses compared to a general-purpose sentiment lexicon, such as SentiWordNet. In spite of this potential improvement, the cost of building a domain-specific sentiment lexicon hinders its wider and more practical application. To compensate for this difficulty, we propose extracting a sentiment lexicon from a domain-specific corpus by annotating an intelligently selected subset of documents in the corpus. Specifically, the subset is selected by an active learner with initializations from diverse text analytics, i.e. latent Dirichlet allocation and our proposed lexicon coverage algorithm. This active learning produces a better domain-specific sentiment lexicon, which results in higher sentiment classification accuracy. Subsequently, we evaluate the extracted sentiment lexicons by observing 1) the increased F1 measure in sentiment classification and 2) the increased similarity to the sentiment lexicon obtained with full annotation. We expect that this contribution will enable more accurate sentiment classification by domain-specific sentiment lexicons with less sentiment tagging effort.
Article
Existing senti-lexicons do not sufficiently cover the sentiment words used in restaurant reviews. Therefore, this thesis proposes a new senti-lexicon for the sentiment analysis of restaurant reviews. When a supervised learning algorithm classifies review documents as positive or negative, the positive classification accuracy tends to be up to approximately 10% higher than the negative classification accuracy. This lowers the average accuracy when the accuracies of the two classes are averaged. To mitigate this problem, an improved Naïve Bayes algorithm is proposed. The experimental results showed that when this algorithm was used with unigrams + bigrams as features, the gap between the positive accuracy and the negative accuracy narrowed to 3.6% compared with the original Naïve Bayes, and the 28.5% gap was narrowed compared with SVM. Additionally, using this algorithm with the senti-lexicon improved recall by a maximum of 10.2% and precision by a maximum of 26.2% compared with SVM, and improved recall by a maximum of 5.6% and precision by a maximum of 1.9% compared with Naïve Bayes.
Article
Studying the relationship between public sentiment and stock prices has been the focus of several studies. This paper analyzes whether the sentiment expressed in Twitter feeds, which discuss selected companies and their products, can indicate their stock price changes. To address this problem, an active learning approach was developed and applied to sentiment analysis of tweet streams in the stock market domain. The paper first presents a static Twitter data analysis problem, explored in order to determine the best Twitter-specific text preprocessing setting for training the Support Vector Machine (SVM) sentiment classifier. In the static setting, the Granger causality test shows that sentiments in stock-related tweets can be used as indicators of stock price movements a few days in advance, where improved results were achieved by adapting the SVM classifier to categorize Twitter posts into three sentiment categories of positive, negative and neutral (instead of positive and negative only). These findings were adopted in the development of a new stream-based active learning approach to sentiment analysis, applicable in incremental learning from continuously changing financial tweet streams. To this end, a series of experiments was conducted to determine the best querying strategy for active learning of the SVM classifier adapted to sentiment analysis of financial tweet streams. The experiments in analyzing stock market sentiments of a particular company show that changes in positive sentiment probability can be used as indicators of the changes in stock closing prices.
Conference Paper
We determine the subjectivity of word senses. To avoid costly annotation, we evaluate how useful existing resources established in opinion mining are for this task. We show that results achieved with existing resources that are not tailored towards word sense subjectivity classification can rival results achieved with supervision on a manually annotated training set. However, results with different resources vary substantially and are dependent on the different definitions of subjectivity used in the establishment of the resources.
Article
Emotions play a key role in natural language understanding and sensemaking. Pure machine learning usually fails to recognize and interpret emotions in text. The need for knowledge bases that give access to semantics and sentics (the conceptual and affective information) associated with natural language is growing exponentially in the context of big social data analysis. To this end, this paper proposes EmoSenticSpace, a new framework for affective common-sense reasoning that extends WordNet-Affect and SenticNet by providing both emotion labels and polarity scores for a large set of natural language concepts. The framework is built by means of fuzzy c-means clustering and support-vector-machine classification, and takes into account different similarity measures, such as point-wise mutual information and emotional affinity. EmoSenticSpace was tested on three emotion-related natural language processing tasks, namely sentiment analysis, emotion recognition, and personality detection. In all cases, the proposed framework outperforms the state of the art. In particular, the direct evaluation of EmoSenticSpace against the psychological features provided in the ISEAR dataset shows a 92.15% agreement.
Article
This article describes in-depth research on machine learning methods for sentiment analysis of Czech social media. Whereas in English, Chinese, or Spanish this field has a long history and evaluation datasets for various domains are widely available, in the case of the Czech language no systematic research has yet been conducted. We tackle this issue and establish a common ground for further research by providing a large human-annotated Czech social media corpus. Furthermore, we evaluate state-of-the-art supervised machine learning methods for sentiment analysis. We explore different pre-processing techniques and employ various features and classifiers. We also experiment with five different feature selection algorithms and investigate the influence of named entity recognition and preprocessing on sentiment classification performance. Moreover, in addition to our newly created social media dataset, we also report results for other popular domains, such as movie and product reviews. We believe that this article will not only extend the current sentiment analysis research to another family of languages, but will also encourage competition, potentially leading to the production of high-end commercial solutions.
Article
The Web is evolving through an era where the opinions of users are becoming increasingly important and valuable. The distillation of knowledge from the huge amount of unstructured information on the Web can be a key factor for tasks such as social media marketing, branding, product positioning, and corporate reputation management. These online social data, however, remain hardly accessible to computers, as they are specifically meant for human consumption. The automatic analysis of online opinions involves a deep understanding of natural language text by machines, from which we are still very far. To this end, concept-level sentiment analysis aims to go beyond a mere word-level analysis of text and provide novel approaches to opinion mining and sentiment analysis that enable a more efficient passage from (unstructured) textual information to (structured) machine-processable data. A recent knowledge-based technology in this context is sentic computing, which relies on the ensemble application of common-sense computing and the psychology of emotions to infer the conceptual and affective information associated with natural language. Sentic computing, however, is limited by the richness of the knowledge base and by the fact that the bag-of-concepts model, despite being more sophisticated than bag-of-words, misses important discourse structure information that is key for properly detecting the polarity conveyed by natural language opinions. In this work, we introduce a novel paradigm for concept-level sentiment analysis that merges linguistics, common-sense computing, and machine learning to improve the accuracy of tasks such as polarity detection. In particular, by allowing sentiments to flow from concept to concept based on the dependency relations of the input sentence, we achieve a better understanding of the contextual role of each concept within the sentence and, hence, obtain a polarity detection engine that outperforms state-of-the-art statistical methods.
Article
With the rapid growth of data generated by social web applications, new paradigms for generating knowledge are opening up. This paper introduces Crowd Explicit Sentiment Analysis (CESA) as an approach for sentiment analysis in social media environments. Similar to Explicit Semantic Analysis, microblog posts are indexed by a predefined collection of documents. In CESA, these documents are built up from common emotional expressions in social streams. In this way, texts are projected to feelings or emotions. This process is performed within a Latent Semantic Analysis. A few simple regular expressions (e.g. “I feel X”, where X is a term representing an emotion or feeling) are used to scrape the enormous flow of micro-blog posts and generate a textual representation of an emotional state with a clear polarity value (e.g. angry, happy, sad, confident…). In this way, new posts can be indexed by these feelings according to the distance to their textual representation. The approach is suitable for many scenarios dealing with social media publications and can be implemented in other languages with little effort. In particular, we have evaluated the system on Polarity Classification with both English and Spanish data sets. The results show that CESA is a valid solution for sentiment analysis and that similar approaches for model building from the continuous flow of posts could be exploited in other scenarios.
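The regular-expression step mentioned above, where posts matching a pattern such as "I feel X" are collected into one textual representation per feeling, could look roughly like this. The pattern, the optional intensifiers, and the names `EMOTION_TERMS`, `FEEL_PATTERN` and `group_posts_by_feeling` are hypothetical; only the four emotion terms come from the abstract, and the Latent Semantic Analysis indexing step is not shown.

```python
import re
from collections import defaultdict

# Emotion terms taken from the examples in the abstract; a real list would be larger.
EMOTION_TERMS = ("angry", "happy", "sad", "confident")

# Hypothetical pattern: "I feel X", allowing a few optional intensifiers before X.
FEEL_PATTERN = re.compile(
    r"\bi feel (?:so |very |really )?(" + "|".join(EMOTION_TERMS) + r")\b",
    re.IGNORECASE,
)


def group_posts_by_feeling(posts):
    """Collect posts that match the pattern, keyed by the emotion term they mention."""
    grouped = defaultdict(list)
    for post in posts:
        match = FEEL_PATTERN.search(post)
        if match:
            grouped[match.group(1).lower()].append(post)
    return dict(grouped)
```

Each resulting list of posts would then serve as the document that represents one feeling, against which new posts are compared.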
Article
To improve the performance of word-of-mouth sentiment classification, this article reevaluates objective sentiment words in the SentiWordNet sentiment lexicon.