Table 1. Notations used in this paper

Source publication
Article
Abstract This study presents an efficient method for extracting product aspects from customer reviews and gives solutions for inferring aspect ratings and aspect weights. Aspect ratings often reflect the user’s satisfaction with aspects of a product, and aspect weights reflect the degree of importance of the aspects posed by the user. These tasks the...
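For readers new to the task, the sketch below shows one common frequency-based baseline for pulling candidate aspects out of reviews: terms that recur across several reviews are kept as aspect candidates. The toy reviews, the hand-picked stopword list, and the document-frequency threshold are illustrative assumptions; this is not the extraction method of the study above.

from collections import Counter
import re

reviews = [
    "The battery life is great but the screen is dim.",
    "Screen quality is poor, although the battery lasts long.",
    "I love the camera and the battery.",
    "The camera struggles in low light.",
]

STOPWORDS = {"the", "is", "but", "and", "i", "in", "a", "although"}

def candidate_aspects(texts, min_count=2):
    """Return terms that appear in at least `min_count` different reviews."""
    doc_freq = Counter()
    for text in texts:
        tokens = set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS
        doc_freq.update(tokens)       # document frequency, not raw term frequency
    return sorted(term for term, count in doc_freq.items() if count >= min_count)

print(candidate_aspects(reviews))     # ['battery', 'camera', 'screen']

In practice such candidates are usually filtered further (for example by part of speech or by semantic similarity to the product domain) before ratings and weights are inferred for them.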

Context in source publication

Context 1
... notations used throughout the paper are given in Table 1. ...

Citations

... In the same way that word of mouth is heard offline, online reviews can help consumers get more accurate information about their products [3]. Specification ratings often reflect user satisfaction with aspects of the product [4], and buyer reviews generally serve as a measure for other customers [5] to consider when purchasing on the platform. These reviews contain both positive and negative sentiments [6]. ...
Article
Text mining is a valuable technique that empowers users to gain a deeper understanding of existing textual data, ultimately allowing them to make more informed decisions. One important application of text mining is sentiment analysis, which has gained significant traction among companies aiming to understand how customers perceive their products and services. In response to this growing need, various research efforts have been made to improve the accuracy of sentiment analysis classification models. The purpose of this article is to discuss an approach that uses the Support Vector Machine (SVM) algorithm, often applied in machine learning for text classification tasks, combined with Particle Swarm Optimization (PSO), which tunes the SVM model parameters to achieve the best classification results. This combination not only improves accuracy but also enhances the model's ability to efficiently handle large amounts of text data. The research findings highlight the effectiveness of this approach: applying the SVM algorithm with PSO resulted in an accuracy of 94.92%. The substantial increase in accuracy compared to previous studies shows the promising potential of this methodology and demonstrates that the SVM model combined with Particle Swarm Optimization provides good performance.
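The sketch below illustrates the SVM-plus-PSO pairing in its simplest form: a hand-rolled particle swarm searches over the SVM's C and gamma in log space, and each particle is scored by cross-validated accuracy. The toy review set, the swarm settings, and the search ranges are assumptions made for illustration; they do not reproduce the cited study's setup or its 94.92% figure.

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

texts = [
    "great product, works perfectly", "terrible quality, broke fast",
    "love it, highly recommend", "waste of money, very disappointed",
    "excellent value and fast delivery", "awful support and slow shipping",
    "amazing battery life", "screen cracked on day one",
    "best purchase this year", "never buying this brand again",
    "solid build and easy setup", "arrived damaged and late",
]
labels = np.array([1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0])
X = TfidfVectorizer().fit_transform(texts)

def fitness(particle):
    # Particles encode log10(C) and log10(gamma); the score is 3-fold CV accuracy.
    C, gamma = 10.0 ** particle[0], 10.0 ** particle[1]
    return cross_val_score(SVC(C=C, gamma=gamma, kernel="rbf"), X, labels, cv=3).mean()

rng = np.random.default_rng(0)
n_particles, n_iters = 6, 15
pos = rng.uniform(-3, 3, size=(n_particles, 2))   # search positions in log space
vel = np.zeros_like(pos)
pbest, pbest_fit = pos.copy(), np.array([fitness(p) for p in pos])
gbest = pbest[pbest_fit.argmax()].copy()

for _ in range(n_iters):
    r1, r2 = rng.random((n_particles, 1)), rng.random((n_particles, 1))
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    pos = np.clip(pos + vel, -3, 3)
    fit = np.array([fitness(p) for p in pos])
    better = fit > pbest_fit
    pbest[better], pbest_fit[better] = pos[better], fit[better]
    gbest = pbest[pbest_fit.argmax()].copy()

print(f"best C={10.0 ** gbest[0]:.3g}, gamma={10.0 ** gbest[1]:.3g}, CV accuracy={pbest_fit.max():.3f}")

Searching over log10(C) and log10(gamma) rather than the raw values is a common choice, because both parameters typically matter on an order-of-magnitude scale.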
... The topic model-based method has two main limitations (Ngoc et al. 2019; Schouten & Frasincar 2016). First, its use in real-life SA applications is restricted because the method will not achieve reasonable and efficient results if the data size is small. ...
Preprint
Recommender systems have been proven to be significantly crucial in many fields and are widely used across various domains. Most conventional recommender systems rely on the numeric rating given by a user to reflect his or her opinion about a consumed item; however, these ratings are not available in many domains. As a result, a new source of information, represented by user-generated reviews, is incorporated in the recommendation process to compensate for the lack of these ratings. The reviews contain rich and abundant information related to the whole item or a specific feature that can be extracted using the field of sentiment analysis. This paper gives a comprehensive overview to help researchers who aim to work with recommender systems and sentiment analysis. It includes a background of the recommender system concept, including phases, approaches, and performance metrics used in recommender systems. Then, it discusses the sentiment analysis concept and highlights the main points in sentiment analysis, including levels and approaches, with a focus on aspect-based sentiment analysis.
... For example, a petition might be unreasonable to begin with, and this is something the algorithms cannot account for [38]. The unsupervised tools often focus on term frequencies and might miss less frequent terms, meaning that the result is not representative of the whole content but a picture of what is popular [41,62]. This was solved in [41] by categorizing content according to its similarities and then sampling it. ...
... Another study [49] managed to narrow the 1.25 million results for one keyword down to 131,759 blog posts for all keywords by using a specialized search engine. When key terms do not occur simultaneously, arranging terms under topics can help to find relevant documents [56,62]. Although a lot of research on online health communities (OHC) has used a manual content analysis approach, that approach quickly becomes unfeasible when the volume increases. ...
... Arrange terms under topics and use the topics to find relevant documents [56,59,62]. ...
Conference Paper
Digitalization of everyday lives has tremendously increased the amount of digital (trace) data of people’s behaviour available for researchers. However, traditional qualitative research methods struggle with the width and breadth of the data. This paper reviewed 61 recent studies that had utilized qualitative big data for the practical challenges they had encountered and how they were addressed. While quantitative and qualitative big data share many common issues, the review points at that lack of qualitative methods and dataset reduction required by algorithms in big data research decreases the richness of the qualitative data. Locating relevant data and reducing noise are further challenges. Currently, these challenges can be only partially addressed with a combination of human and computer pattern recognition and crowdsourcing. The review describes many “tricks of the trade” but abduction research and pragmatist philosophy seem promising starting places for a more pervasive framework.
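The "arrange terms under topics, then use the topics to find relevant documents" tactic mentioned in the excerpts above can be sketched as follows. The toy corpus, the choice of two topics, and the seed keyword are assumptions made for illustration and are not taken from the cited studies.

from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "patients discuss medication side effects in the forum",
    "new treatment options and therapy outcomes for patients",
    "pricing plans and subscription discounts announced",
    "discount codes and holiday sale on subscriptions",
    "clinical trial results on the new medication",
    "customer service delays for subscription refunds",
]

vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

# Find the topic that puts the most weight on the seed keyword.
vocab = list(vectorizer.get_feature_names_out())
topic = lda.components_[:, vocab.index("medication")].argmax()

# Rank documents by how strongly they load on that topic, then read from the top.
doc_topic = lda.transform(X)
ranked = sorted(enumerate(doc_topic[:, topic]), key=lambda pair: pair[1], reverse=True)
for doc_id, weight in ranked:
    print(f"{weight:.2f}  {docs[doc_id]}")

Ranking by topic weight rather than by keyword matches is what lets documents that never mention the seed term surface, which is the point made in the excerpts for cases where key terms do not occur simultaneously.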
... • Estimating aspects' weight For the task of estimating an aspect's weight, three weighting methods are explored, and the one that gives the best performance for the ABSA process will be chosen. The methods are the conventional Term Frequency-Inverse Document Frequency (TF-IDF) and two modified TF-IDF weighting schemes proposed by Zhu et al. [23] and Ngoc et al. [2]. ...
... • Calculating total review score For the task of calculating the total review score, an algorithm is proposed to calculate the total review sentiment score based on the work of [2], [4], [25]. The algorithm takes the results of the previous three tasks as inputs (i.e., the extracted aspects with the core terms, the aspects' weights, and the domain-specific lexicon for calculating the aspects' ratings). ...
... The topic model-based method has two main limitations [2], [21]. First, its use in real-life sentiment analysis applications is restricted, because the method will not achieve reasonable and efficient results if the size of the data is small. ...
Article
Aspect-based sentiment analysis (ABSA) has recently attracted increasing attention due to its extensive applications. Most of the existing ABSA methods have been applied on small-sized labeled datasets. However, real datasets such as Amazon and TripAdvisor contain a massive number of reviews. Thus, applying these methods on large-scale datasets may produce inefficient results. Furthermore, these existing methods extract a huge number of aspects, most of which are not relevant to the domain of interest, while, on the other hand, some of the infrequent relevant aspects are excluded during the extraction process. These limitations negatively affect the performance of the ABSA process. This article, therefore, aims to overcome such limitations by proposing an efficient approach that is suitable for real large-scale unlabeled datasets. The proposed approach hybridizes a frequency-based approach (word level) and a syntactic-relation-based approach (sentence level). It is enhanced further with a semantic similarity-based approach to extract aspects that are relevant to the domain, even when terms related to the aspects are not frequently mentioned in the reviews. The aspects extracted by the proposed approach are used to generate a total review sentiment score after estimating the weight and the rating of each extracted aspect mentioned in the review. The weight of each extracted aspect is calculated based on a modified TF-IDF weighting scheme, and the aspect rating is calculated based on a domain-specific lexicon. The effectiveness of the extracted aspects is evaluated against two baselines available from the existing literature: fixed aspects and extracted aspects. Evaluation was also performed using a general lexicon and a domain-specific lexicon. Results in terms of F-measure and accuracy on the Amazon and Yelp datasets show that the aspects extracted using the proposed approach with the domain-specific lexicon outperformed all the baselines. INDEX TERMS: Aspect, core terms, aspect extraction, aspect weight, aspect rating, domain-specific lexicon, total review score, real large-scale dataset.
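In simplified form, the scoring pipeline described in the excerpts and abstract above is: total review score = sum over the aspects mentioned in the review of weight(aspect) x rating(aspect), normalised by the total weight. The sketch below uses plain TF-IDF as a stand-in for the modified weighting schemes of Zhu et al. and Ngoc et al., and a toy lexicon with a fixed opinion window as a stand-in for the domain-specific lexicon, so it illustrates the idea rather than reproducing the published algorithm.

from sklearn.feature_extraction.text import TfidfVectorizer

corpus = [
    "battery life is excellent but the screen is dull",
    "screen resolution is sharp and the camera is decent",
    "battery drains quickly and the camera is blurry",
]
aspects = ["battery", "screen", "camera"]
lexicon = {"excellent": 2, "sharp": 1, "decent": 1, "dull": -1, "blurry": -2, "drains": -1}

vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(corpus).toarray()
vocab = {term: i for i, term in enumerate(vectorizer.get_feature_names_out())}

def review_score(review_idx, window=4):
    """Weight-normalised sum of aspect ratings for one review."""
    tokens = corpus[review_idx].split()
    weighted_sum, total_weight = 0.0, 0.0
    for aspect in aspects:
        if aspect not in tokens:
            continue
        weight = tfidf[review_idx, vocab[aspect]]            # aspect weight in this review
        position = tokens.index(aspect)                      # rate via nearby opinion words
        nearby = tokens[max(0, position - window): position + window + 1]
        rating = sum(lexicon.get(token, 0) for token in nearby)
        weighted_sum += weight * rating
        total_weight += weight
    return weighted_sum / total_weight if total_weight else 0.0

for i in range(len(corpus)):
    print(f"review {i}: total score = {review_score(i):+.2f}")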
... Due to the nuances of human conversation, certain entries could not be accurately processed by artificial intelligence; thus, human insight was required in order to ensure accuracy (Ghiassi et al., 2013; Turney, 2002; Pang et al., 2002). Entries were tagged with their sentiment, determining whether positive or negative emotions were evoked (Bifet & Frank, 2010; Agarwal et al., 2011; Nguyen Thi Ngoc et al., 2019). This process of including human involvement is manual validation, whereby a sub-sample of the data was analysed by verified human contributors (micro-sampling for manual validation), which will be further discussed in the next section. ...
Article
The measurement of online sentiment is a developing field in social science and big data research. The methodology from this study provides an analysis of online sentiment using a unique combination of NLP and human validation techniques in order to create net sentiment scores and categorise topics of online conversation. The study focused on measuring the online sentiment of South Africa's major banks (covering almost the entire retail banking industry) over a 12-month period. Through this methodology, firms are able to track shifts in online sentiment (including extreme firestorms) as well as to monitor relevant conversation topics. To date, no published methodology combines the use of big data NLP and human validation in such a structured way.
• Microsampling for manual validation of sentiment analysis (both qualitative and quantitative approaches in order to obtain the most accurate results)
• Sentiment measurement
• Sentiment map
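The study's own scoring formula is not reproduced on this page; the sketch below assumes one common definition of a net sentiment score per conversation topic, namely positive mentions minus negative mentions divided by total mentions, applied to synthetic labels standing in for the output of the NLP-plus-human-validation pipeline.

from collections import defaultdict

# (topic, sentiment) pairs as they might come out of the pipeline after a
# sub-sample has been manually validated; entirely synthetic example data.
labelled_mentions = [
    ("fees", "negative"), ("fees", "negative"), ("fees", "positive"),
    ("mobile app", "positive"), ("mobile app", "positive"), ("mobile app", "negative"),
    ("branch service", "negative"), ("branch service", "neutral"),
]

counts = defaultdict(lambda: {"positive": 0, "negative": 0, "neutral": 0})
for topic, sentiment in labelled_mentions:
    counts[topic][sentiment] += 1

for topic, c in counts.items():
    total = sum(c.values())
    net = (c["positive"] - c["negative"]) / total   # ranges from -1 to +1
    print(f"{topic:15s} net sentiment = {net:+.2f} over {total} mentions")

Tracking such a ratio per topic over time is what allows sentiment shifts, including firestorms around a single topic, to show up rather than being averaged away.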
Article
Collaborative filtering (CF) approaches generate user recommendations based on user similarities. These similarities are calculated based on the overall (explicit) user ratings. However, in some domains, such ratings may be sparse or unavailable. User reviews can play a significant role in such cases, as implicit ratings can be derived from the reviews using sentiment analysis, a natural language processing technique. However, most current studies calculate the implicit ratings by simply aggregating the scores of all sentiment words appearing in reviews and, thus, ignoring the elements of sentiment degrees and aspects of user reviews. This study addresses this issue by calculating the implicit rating differently, leveraging the rich information in user reviews by using both sentiment words and aspect–sentiment word pairs to enhance the CF performance. It proposes four methods to calculate the implicit ratings on large-scale datasets: the first considers the degree of sentiment words, while the second exploits the aspects by extracting aspect-sentiment word pairs to calculate the implicit ratings. The remaining two methods combine explicit ratings with the implicit ratings generated by the first two methods. The generated ratings are then incorporated into different CF rating prediction algorithms to evaluate their effectiveness in enhancing the CF performance. Evaluative experiments of the proposed methods are conducted on two large-scale datasets: Amazon and Yelp. Results of the experiments show that the proposed ratings improved the accuracy of CF rating prediction algorithms and outperformed the explicit ratings in terms of three predictive accuracy metrics.
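A compact sketch of the general idea, not of the paper's four specific methods: sentiment scores derived from reviews are mapped onto a 1-5 implicit rating scale and then fed to a collaborative-filtering predictor. The toy sentiment matrix, the linear mapping, and the cosine-similarity user-based CF step are all assumptions made for illustration.

import numpy as np

# Sentiment score per (user, item) review in [-1, 1]; np.nan marks items the
# user did not review. Rows are users, columns are items. Synthetic values.
sentiment = np.array([
    [0.8,    np.nan, -0.5,   0.2],
    [0.6,    0.9,    np.nan, -0.7],
    [np.nan, 0.7,    -0.8,   0.1],
])

# Map sentiment in [-1, 1] linearly to an implicit rating in [1, 5]; nan stays nan.
ratings = 1 + 2 * (sentiment + 1)

def predict(user, item, k=2):
    """Predict a missing rating with user-based CF over the implicit ratings."""
    filled = np.nan_to_num(ratings)                       # zeros stand in for missing entries
    sims = filled @ filled[user] / (
        np.linalg.norm(filled, axis=1) * np.linalg.norm(filled[user]) + 1e-9)
    sims[user] = -np.inf                                  # never use the user as their own neighbour
    neighbours = [u for u in np.argsort(sims)[::-1] if not np.isnan(ratings[u, item])][:k]
    if not neighbours:
        return float(np.nanmean(ratings))                 # fall back to the global mean
    weights = np.array([sims[u] for u in neighbours])
    return float(weights @ ratings[neighbours, item] / (weights.sum() + 1e-9))

print(f"predicted implicit rating for user 0, item 1: {predict(0, 1):.2f}")

The same predictor can be run on explicit ratings, on implicit ratings, or on a combination of the two, which mirrors how the paper evaluates its rating-generation methods.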
Article
The majority of customers and manufacturers who purchase and trade via e-commerce websites primarily rely on reviews before making purchasing decisions and product improvements. Deceptive reviewers use this opportunity to write fake reviews that mislead customers and manufacturers. This calls for the necessity of identifying fake reviews before making them available for decision making. Accordingly, this research focuses on a fake review detection method that incorporates review-related features, including linguistic features, Part-of-Speech (POS) features, and sentiment analysis features. A domain feature ontology is used in the feature-level sentiment analysis, and all the review-related features are extracted and integrated into the ontology. The fake review detection is enhanced through a rule-based classifier by inferencing over the ontology. Due to the lack of a labeled dataset for model training, the Mahalanobis distance method was used to detect outliers in an unlabeled dataset, and the outliers were selected as fake reviews for model training. The performance measures of the rule-based classifier were improved by integrating the linguistic, POS, and sentiment analysis features rather than considering them separately.
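The Mahalanobis-distance pseudo-labelling step can be sketched as follows, assuming a handful of numeric review features with synthetic values and a chi-square quantile as the cutoff (a common heuristic that presumes roughly normal inliers). The feature columns here are illustrative and do not reproduce the linguistic, POS, and sentiment features of the actual method.

import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(42)
ordinary = np.column_stack([
    rng.normal(140, 15, 30),      # review length in words
    rng.normal(0.05, 0.01, 30),   # ratio of first-person pronouns
    rng.normal(0.35, 0.08, 30),   # sentiment extremity
])
suspects = np.array([[20.0, 0.30, 1.0], [500.0, 0.00, 1.0]])   # two atypical reviews
features = np.vstack([ordinary, suspects])

mean = features.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(features, rowvar=False))
diff = features - mean
d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)    # squared Mahalanobis distances

threshold = chi2.ppf(0.975, df=features.shape[1])     # heuristic chi-square cutoff
flagged = np.where(d2 > threshold)[0]
print("reviews flagged as likely fake (pseudo-labels):", flagged)

The flagged outliers would then serve as the fake-review class when training the classifier, which is the role the Mahalanobis step plays in the absence of a labeled dataset.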
Article
We study the spread and adoption of libraries within Python projects hosted in public software repositories on GitHub. By modelling the use of Git pull, merge, commit, and other actions as deliberate cognitive activities, we are able to better understand the dynamics of what happens when users adopt new and cognitively demanding information. For this task we introduce a large corpus containing all commits, diffs, messages, and source code from 259,690 Python repositories (about 13% of all Python projects on Github), including all Git activity data from 89,311 contributing users. In this initial work we ask two primary questions: (1) What kind of behavior change occurs near an adoption event? (2) Can we model future adoption activity of a user? Using a fine-grained analysis of user behavior, we show that library adoptions are followed by higher than normal activity within the first 6 h, implying that a higher than normal cognitive effort is involved with an adoption. Further study is needed to understand the specific types of events that surround the adoption of new information, and the cause of these dynamics. We also show that a simple linear model is capable of classifying future commits as being an adoption or not, based on the commit contents and the preceding history of the user and repository. Additional work in this vein may be able to predict the content of future commits, or suggest new libraries to users.
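As a rough illustration of the "simple linear model" mentioned above, the sketch below fits a logistic regression that classifies commits as adoption events from a few numeric features. The features (new imports added, the author's commits in the prior week, files touched) and the synthetic labels are assumptions for illustration only; they are not the corpus, features, or results of the study.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(7)
n = 400
new_imports = rng.poisson(0.3, n) + (rng.random(n) < 0.2) * rng.poisson(2, n)
prior_commits = rng.poisson(5, n)        # author's commits in the preceding week
files_touched = rng.poisson(3, n) + 1

# Synthetic ground truth: adoptions tend to coincide with commits that add new imports.
is_adoption = (new_imports + rng.normal(0, 0.5, n) > 1).astype(int)

X = np.column_stack([new_imports, prior_commits, files_touched])
X_train, X_test, y_train, y_test = train_test_split(X, is_adoption, random_state=0)

model = LogisticRegression().fit(X_train, y_train)
print(f"held-out accuracy: {model.score(X_test, y_test):.2f}")
print("coefficients (new_imports, prior_commits, files_touched):", model.coef_[0].round(2))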