Table 1. Notations used in this paper

Source publication
Article
Abstract This study presents an efficient method for extracting product aspects from customer reviews and gives solutions for inferring aspect ratings and aspect weights. Aspect ratings often reflect the user’s satisfaction with aspects of a product, and aspect weights reflect the degree of importance of the aspects posed by the user. These tasks the...
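For readers new to the task, the sketch below shows one common frequency-based baseline for pulling candidate aspects out of reviews: terms that recur across several reviews are kept as aspect candidates. The toy reviews, the hand-picked stopword list, and the document-frequency threshold are illustrative assumptions; this is not the extraction method of the study above.

from collections import Counter
import re

reviews = [
    "The battery life is great but the screen is dim.",
    "Screen quality is poor, although the battery lasts long.",
    "I love the camera and the battery.",
    "The camera struggles in low light.",
]

STOPWORDS = {"the", "is", "but", "and", "i", "in", "a", "although"}

def candidate_aspects(texts, min_count=2):
    """Return terms that appear in at least `min_count` different reviews."""
    doc_freq = Counter()
    for text in texts:
        tokens = set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS
        doc_freq.update(tokens)       # document frequency, not raw term frequency
    return sorted(term for term, count in doc_freq.items() if count >= min_count)

print(candidate_aspects(reviews))     # ['battery', 'camera', 'screen']

In practice such candidates are usually filtered further (for example by part of speech or by semantic similarity to the product domain) before ratings and weights are inferred for them.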

Context in source publication

Context 1
... notations used throughout the paper are given in Table 1. ...

Citations

... In the same way that word of mouth is heard offline, online reviews can help consumers get more accurate information about their products [3]. Specification ratings often reflect user satisfaction with aspects of the product [4], and buyer reviews generally serve as a measure for other customers [5] to consider when purchasing on the platform. These reviews contain both positive and negative sentiments [6]. ...
Article
Text mining is a valuable technique that empowers users to gain a deeper understanding of existing textual data, ultimately allowing them to make more informed decisions. One important application of text mining is sentiment analysis, which has gained significant traction among companies aiming to understand how customers perceive their products and services. In response to this growing need, various research efforts have been made to improve the accuracy of sentiment analysis classification models. The purpose of this article is to discuss an approach that uses the Support Vector Machine (SVM) algorithm, often applied in machine learning for text classification tasks, combined with Particle Swarm Optimization (PSO), which tunes the SVM model parameters to achieve the best classification results. This combination not only improves accuracy but also enhances the model's ability to efficiently handle large amounts of text data. The research findings highlight the effectiveness of this approach: applying the SVM algorithm with PSO resulted in an accuracy of 94.92%. The substantial increase in accuracy compared to previous studies shows the promising potential of this methodology and demonstrates that the SVM model combined with Particle Swarm Optimization provides good performance.
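The sketch below illustrates the SVM-plus-PSO pairing in its simplest form: a hand-rolled particle swarm searches over the SVM's C and gamma in log space, and each particle is scored by cross-validated accuracy. The toy review set, the swarm settings, and the search ranges are assumptions made for illustration; they do not reproduce the cited study's setup or its 94.92% figure.

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

texts = [
    "great product, works perfectly", "terrible quality, broke fast",
    "love it, highly recommend", "waste of money, very disappointed",
    "excellent value and fast delivery", "awful support and slow shipping",
    "amazing battery life", "screen cracked on day one",
    "best purchase this year", "never buying this brand again",
    "solid build and easy setup", "arrived damaged and late",
]
labels = np.array([1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0])
X = TfidfVectorizer().fit_transform(texts)

def fitness(particle):
    # Particles encode log10(C) and log10(gamma); the score is 3-fold CV accuracy.
    C, gamma = 10.0 ** particle[0], 10.0 ** particle[1]
    return cross_val_score(SVC(C=C, gamma=gamma, kernel="rbf"), X, labels, cv=3).mean()

rng = np.random.default_rng(0)
n_particles, n_iters = 6, 15
pos = rng.uniform(-3, 3, size=(n_particles, 2))   # search positions in log space
vel = np.zeros_like(pos)
pbest, pbest_fit = pos.copy(), np.array([fitness(p) for p in pos])
gbest = pbest[pbest_fit.argmax()].copy()

for _ in range(n_iters):
    r1, r2 = rng.random((n_particles, 1)), rng.random((n_particles, 1))
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    pos = np.clip(pos + vel, -3, 3)
    fit = np.array([fitness(p) for p in pos])
    better = fit > pbest_fit
    pbest[better], pbest_fit[better] = pos[better], fit[better]
    gbest = pbest[pbest_fit.argmax()].copy()

print(f"best C={10.0 ** gbest[0]:.3g}, gamma={10.0 ** gbest[1]:.3g}, CV accuracy={pbest_fit.max():.3f}")

Searching over log10(C) and log10(gamma) rather than the raw values is a common choice, because both parameters typically matter on an order-of-magnitude scale.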
... The topic model-based method has two main limitations (Ngoc et al. 2019; Schouten & Frasincar 2016). First, its use in real-life SA applications is restricted because the method will not achieve reasonable and efficient results if the data size is small. ...
Preprint
Recommender systems have been proven to be significantly crucial in many fields and are widely used across various domains. Most conventional recommender systems rely on the numeric rating given by a user to reflect his or her opinion about a consumed item; however, these ratings are not available in many domains. As a result, a new source of information, represented by user-generated reviews, is incorporated in the recommendation process to compensate for the lack of these ratings. The reviews contain rich and abundant information related to the whole item or a specific feature that can be extracted using the field of sentiment analysis. This paper gives a comprehensive overview to help researchers who aim to work with recommender systems and sentiment analysis. It includes a background of the recommender system concept, including phases, approaches, and performance metrics used in recommender systems. Then, it discusses the sentiment analysis concept and highlights the main points in sentiment analysis, including levels and approaches, with a focus on aspect-based sentiment analysis.
... For example, a petition might be unreasonable to begin with, and this is something the algorithms cannot account for [38]. The unsupervised tools often focus on term frequencies and might miss less frequent terms, meaning that the result is not representative of the whole content but a picture of what is popular [41,62]. This was solved in [41] by categorizing content according to its similarities and then sampling it. ...
... Another study [49] managed to narrow the 1.25 million results for one keyword down to 131,759 blog posts for all keywords by using a specialized search engine. When key terms do not occur simultaneously, arranging terms under topics can help to find relevant documents [56,62]. Although a lot of research on online health communities (OHC) has used a manual content analysis approach, that approach quickly becomes unfeasible when the volume increases. ...
... Arrange terms under topics and use the topics to find relevant documents [56,59,62]. ...
Conference Paper
Digitalization of everyday lives has tremendously increased the amount of digital (trace) data of people’s behaviour available for researchers. However, traditional qualitative research methods struggle with the width and breadth of the data. This paper reviewed 61 recent studies that had utilized qualitative big data for the practical challenges they had encountered and how they were addressed. While quantitative and qualitative big data share many common issues, the review points at that lack of qualitative methods and dataset reduction required by algorithms in big data research decreases the richness of the qualitative data. Locating relevant data and reducing noise are further challenges. Currently, these challenges can be only partially addressed with a combination of human and computer pattern recognition and crowdsourcing. The review describes many “tricks of the trade” but abduction research and pragmatist philosophy seem promising starting places for a more pervasive framework.
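The "arrange terms under topics, then use the topics to find relevant documents" tactic mentioned in the excerpts above can be sketched as follows. The toy corpus, the choice of two topics, and the seed keyword are assumptions made for illustration and are not taken from the cited studies.

from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "patients discuss medication side effects in the forum",
    "new treatment options and therapy outcomes for patients",
    "pricing plans and subscription discounts announced",
    "discount codes and holiday sale on subscriptions",
    "clinical trial results on the new medication",
    "customer service delays for subscription refunds",
]

vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

# Find the topic that puts the most weight on the seed keyword.
vocab = list(vectorizer.get_feature_names_out())
topic = lda.components_[:, vocab.index("medication")].argmax()

# Rank documents by how strongly they load on that topic, then read from the top.
doc_topic = lda.transform(X)
ranked = sorted(enumerate(doc_topic[:, topic]), key=lambda pair: pair[1], reverse=True)
for doc_id, weight in ranked:
    print(f"{weight:.2f}  {docs[doc_id]}")

Ranking by topic weight rather than by keyword matches is what lets documents that never mention the seed term surface, which is the point made in the excerpts for cases where key terms do not occur simultaneously.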
... • Estimating aspects' weight For the task of estimating an aspect's weight, three weighting methods are explored, and the one that gives the best performance for the ABSA process will be chosen. The methods are the conventional Term Frequency-Inverse Document Frequency (TF-IDF) and two modified TF-IDF weighting schemes proposed by Zhu et al. [23] and Ngoc et al. [2]. ...
... • Calculating total review score For the task of calculating the total review score, an algorithm is proposed to calculate the total review sentiment score based on the work of [2], [4], [25]. The algorithm takes the results of the previous three tasks as inputs (i.e., the extracted aspects with the core terms, the aspects' weights, and the domain-specific lexicon for calculating the aspects' ratings). ...
... The topic model-based method has two main limitations [2], [21]. First, its use in real-life sentiment analysis applications is restricted, because the method will not achieve reasonable and efficient results if the size of the data is small. ...
Article
Aspect-based sentiment analysis (ABSA) has recently attracted increasing attention due to its extensive applications. Most of the existing ABSA methods have been applied on small-sized labeled datasets. However, real datasets such as Amazon and TripAdvisor contain a massive number of reviews. Thus, applying these methods on large-scale datasets may produce inefficient results. Furthermore, these existing methods extract a huge number of aspects, most of which are not relevant to the domain of interest, while, on the other hand, some of the infrequent relevant aspects are excluded during the extraction process. These limitations negatively affect the performance of the ABSA process. This article, therefore, aims to overcome such limitations by proposing an efficient approach that is suitable for real large-scale unlabeled datasets. The proposed approach hybridizes a frequency-based approach (word level) and a syntactic-relation-based approach (sentence level). It is enhanced further with a semantic similarity-based approach to extract aspects that are relevant to the domain, even when terms related to the aspects are not frequently mentioned in the reviews. The aspects extracted by the proposed approach are used to generate a total review sentiment score after estimating the weight and the rating of each extracted aspect mentioned in the review. The weight of each extracted aspect is calculated based on a modified TF-IDF weighting scheme, and the aspect rating is calculated based on a domain-specific lexicon. The effectiveness of the extracted aspects is evaluated against two baselines available from the existing literature: fixed aspects and extracted aspects. Evaluation was also performed using a general lexicon and a domain-specific lexicon. Results in terms of F-measure and accuracy on the Amazon and Yelp datasets show that the aspects extracted using the proposed approach with the domain-specific lexicon outperformed all the baselines. INDEX TERMS: Aspect, core terms, aspect extraction, aspect weight, aspect rating, domain-specific lexicon, total review score, real large-scale dataset.
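In simplified form, the scoring pipeline described in the excerpts and abstract above is: total review score = sum over the aspects mentioned in the review of weight(aspect) x rating(aspect), normalised by the total weight. The sketch below uses plain TF-IDF as a stand-in for the modified weighting schemes of Zhu et al. and Ngoc et al., and a toy lexicon with a fixed opinion window as a stand-in for the domain-specific lexicon, so it illustrates the idea rather than reproducing the published algorithm.

from sklearn.feature_extraction.text import TfidfVectorizer

corpus = [
    "battery life is excellent but the screen is dull",
    "screen resolution is sharp and the camera is decent",
    "battery drains quickly and the camera is blurry",
]
aspects = ["battery", "screen", "camera"]
lexicon = {"excellent": 2, "sharp": 1, "decent": 1, "dull": -1, "blurry": -2, "drains": -1}

vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(corpus).toarray()
vocab = {term: i for i, term in enumerate(vectorizer.get_feature_names_out())}

def review_score(review_idx, window=4):
    """Weight-normalised sum of aspect ratings for one review."""
    tokens = corpus[review_idx].split()
    weighted_sum, total_weight = 0.0, 0.0
    for aspect in aspects:
        if aspect not in tokens:
            continue
        weight = tfidf[review_idx, vocab[aspect]]            # aspect weight in this review
        position = tokens.index(aspect)                      # rate via nearby opinion words
        nearby = tokens[max(0, position - window): position + window + 1]
        rating = sum(lexicon.get(token, 0) for token in nearby)
        weighted_sum += weight * rating
        total_weight += weight
    return weighted_sum / total_weight if total_weight else 0.0

for i in range(len(corpus)):
    print(f"review {i}: total score = {review_score(i):+.2f}")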
... Due to the nuances of human conversation, certain entries could not be accurately processed by artificial intelligence; thus, human insight was required in order to ensure accuracy (Ghiassi et al., 2013; Turney, 2002; Pang et al., 2002). Entries were tagged with their sentiment, determining whether positive or negative emotions were evoked (Bifet & Frank, 2010; Agarwal et al., 2011; Nguyen Thi Ngoc et al., 2019). This process of including human involvement is manual validation, whereby a sub-sample of the data was analysed by verified human contributors (micro-sampling for manual validation), which will be further discussed in the next section. ...
Article
The measurement of online sentiment is a developing field in social science and big data research. The methodology from this study provides an analysis of online sentiment using a unique combination of NLP and human validation techniques in order to create net sentiment scores and categorise topics of online conversation. The study focused on measuring the online sentiment of South Africa's major banks (covering almost the entire retail banking industry) over a 12-month period. Through this methodology, firms are able to track shifts in online sentiment (including extreme firestorms) as well as to monitor relevant conversation topics. To date, no published methodology combines the use of big data NLP and human validation in such a structured way.
• Microsampling for manual validation of sentiment analysis (both qualitative and quantitative approaches in order to obtain the most accurate results)
• Sentiment measurement
• Sentiment map
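The study's own scoring formula is not reproduced on this page; the sketch below assumes one common definition of a net sentiment score per conversation topic, namely positive mentions minus negative mentions divided by total mentions, applied to synthetic labels standing in for the output of the NLP-plus-human-validation pipeline.

from collections import defaultdict

# (topic, sentiment) pairs as they might come out of the pipeline after a
# sub-sample has been manually validated; entirely synthetic example data.
labelled_mentions = [
    ("fees", "negative"), ("fees", "negative"), ("fees", "positive"),
    ("mobile app", "positive"), ("mobile app", "positive"), ("mobile app", "negative"),
    ("branch service", "negative"), ("branch service", "neutral"),
]

counts = defaultdict(lambda: {"positive": 0, "negative": 0, "neutral": 0})
for topic, sentiment in labelled_mentions:
    counts[topic][sentiment] += 1

for topic, c in counts.items():
    total = sum(c.values())
    net = (c["positive"] - c["negative"]) / total   # ranges from -1 to +1
    print(f"{topic:15s} net sentiment = {net:+.2f} over {total} mentions")

Tracking such a ratio per topic over time is what allows sentiment shifts, including firestorms around a single topic, to show up rather than being averaged away.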
Article
Collaborative filtering (CF) approaches generate user recommendations based on user similarities. These similarities are calculated based on the overall (explicit) user ratings. However, in some domains, such ratings may be sparse or unavailable. User reviews can play a significant role in such cases, as implicit ratings can be derived from the reviews using sentiment analysis, a natural language processing technique. However, most current studies calculate the implicit ratings by simply aggregating the scores of all sentiment words appearing in reviews and, thus, ignoring the elements of sentiment degrees and aspects of user reviews. This study addresses this issue by calculating the implicit rating differently, leveraging the rich information in user reviews by using both sentiment words and aspect–sentiment word pairs to enhance the CF performance. It proposes four methods to calculate the implicit ratings on large-scale datasets: the first considers the degree of sentiment words, while the second exploits the aspects by extracting aspect-sentiment word pairs to calculate the implicit ratings. The remaining two methods combine explicit ratings with the implicit ratings generated by the first two methods. The generated ratings are then incorporated into different CF rating prediction algorithms to evaluate their effectiveness in enhancing the CF performance. Evaluative experiments of the proposed methods are conducted on two large-scale datasets: Amazon and Yelp. Results of the experiments show that the proposed ratings improved the accuracy of CF rating prediction algorithms and outperformed the explicit ratings in terms of three predictive accuracy metrics.
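A compact sketch of the general idea, not of the paper's four specific methods: sentiment scores derived from reviews are mapped onto a 1-5 implicit rating scale and then fed to a collaborative-filtering predictor. The toy sentiment matrix, the linear mapping, and the cosine-similarity user-based CF step are all assumptions made for illustration.

import numpy as np

# Sentiment score per (user, item) review in [-1, 1]; np.nan marks items the
# user did not review. Rows are users, columns are items. Synthetic values.
sentiment = np.array([
    [0.8,    np.nan, -0.5,   0.2],
    [0.6,    0.9,    np.nan, -0.7],
    [np.nan, 0.7,    -0.8,   0.1],
])

# Map sentiment in [-1, 1] linearly to an implicit rating in [1, 5]; nan stays nan.
ratings = 1 + 2 * (sentiment + 1)

def predict(user, item, k=2):
    """Predict a missing rating with user-based CF over the implicit ratings."""
    filled = np.nan_to_num(ratings)                       # zeros stand in for missing entries
    sims = filled @ filled[user] / (
        np.linalg.norm(filled, axis=1) * np.linalg.norm(filled[user]) + 1e-9)
    sims[user] = -np.inf                                  # never use the user as their own neighbour
    neighbours = [u for u in np.argsort(sims)[::-1] if not np.isnan(ratings[u, item])][:k]
    if not neighbours:
        return float(np.nanmean(ratings))                 # fall back to the global mean
    weights = np.array([sims[u] for u in neighbours])
    return float(weights @ ratings[neighbours, item] / (weights.sum() + 1e-9))

print(f"predicted implicit rating for user 0, item 1: {predict(0, 1):.2f}")

The same predictor can be run on explicit ratings, on implicit ratings, or on a combination of the two, which mirrors how the paper evaluates its rating-generation methods.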
Article
The majority of customers and manufacturers who purchase and trade via e-commerce websites primarily rely on reviews before making purchasing decisions and product improvements. Deceptive reviewers use this opportunity to write fake reviews that mislead customers and manufacturers. This calls for the necessity of identifying fake reviews before making them available for decision making. Accordingly, this research focuses on a fake review detection method that incorporates review-related features, including linguistic features, Part-of-Speech (POS) features, and sentiment analysis features. A domain feature ontology is used in the feature-level sentiment analysis, and all the review-related features are extracted and integrated into the ontology. The fake review detection is enhanced through a rule-based classifier by inferencing over the ontology. Due to the lack of a labeled dataset for model training, the Mahalanobis distance method was used to detect outliers in an unlabeled dataset, and the outliers were selected as fake reviews for model training. The performance measures of the rule-based classifier were improved by integrating the linguistic, POS, and sentiment analysis features rather than considering them separately.
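The Mahalanobis-distance pseudo-labelling step can be sketched as follows, assuming a handful of numeric review features with synthetic values and a chi-square quantile as the cutoff (a common heuristic that presumes roughly normal inliers). The feature columns here are illustrative and do not reproduce the linguistic, POS, and sentiment features of the actual method.

import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(42)
ordinary = np.column_stack([
    rng.normal(140, 15, 30),      # review length in words
    rng.normal(0.05, 0.01, 30),   # ratio of first-person pronouns
    rng.normal(0.35, 0.08, 30),   # sentiment extremity
])
suspects = np.array([[20.0, 0.30, 1.0], [500.0, 0.00, 1.0]])   # two atypical reviews
features = np.vstack([ordinary, suspects])

mean = features.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(features, rowvar=False))
diff = features - mean
d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)    # squared Mahalanobis distances

threshold = chi2.ppf(0.975, df=features.shape[1])     # heuristic chi-square cutoff
flagged = np.where(d2 > threshold)[0]
print("reviews flagged as likely fake (pseudo-labels):", flagged)

The flagged outliers would then serve as the fake-review class when training the classifier, which is the role the Mahalanobis step plays in the absence of a labeled dataset.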
Article
We study the spread and adoption of libraries within Python projects hosted in public software repositories on GitHub. By modelling the use of Git pull, merge, commit, and other actions as deliberate cognitive activities, we are able to better understand the dynamics of what happens when users adopt new and cognitively demanding information. For this task we introduce a large corpus containing all commits, diffs, messages, and source code from 259,690 Python repositories (about 13% of all Python projects on Github), including all Git activity data from 89,311 contributing users. In this initial work we ask two primary questions: (1) What kind of behavior change occurs near an adoption event? (2) Can we model future adoption activity of a user? Using a fine-grained analysis of user behavior, we show that library adoptions are followed by higher than normal activity within the first 6 h, implying that a higher than normal cognitive effort is involved with an adoption. Further study is needed to understand the specific types of events that surround the adoption of new information, and the cause of these dynamics. We also show that a simple linear model is capable of classifying future commits as being an adoption or not, based on the commit contents and the preceding history of the user and repository. Additional work in this vein may be able to predict the content of future commits, or suggest new libraries to users.
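As a rough illustration of the "simple linear model" mentioned above, the sketch below fits a logistic regression that classifies commits as adoption events from a few numeric features. The features (new imports added, the author's commits in the prior week, files touched) and the synthetic labels are assumptions for illustration only; they are not the corpus, features, or results of the study.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(7)
n = 400
new_imports = rng.poisson(0.3, n) + (rng.random(n) < 0.2) * rng.poisson(2, n)
prior_commits = rng.poisson(5, n)        # author's commits in the preceding week
files_touched = rng.poisson(3, n) + 1

# Synthetic ground truth: adoptions tend to coincide with commits that add new imports.
is_adoption = (new_imports + rng.normal(0, 0.5, n) > 1).astype(int)

X = np.column_stack([new_imports, prior_commits, files_touched])
X_train, X_test, y_train, y_test = train_test_split(X, is_adoption, random_state=0)

model = LogisticRegression().fit(X_train, y_train)
print(f"held-out accuracy: {model.score(X_test, y_test):.2f}")
print("coefficients (new_imports, prior_commits, files_touched):", model.coef_[0].round(2))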