The study of recommendation methods has long been a fundamental area of personalized marketing science. Rating data sparsity is the biggest challenge facing recommendation methods. In addition, existing recommendation methods can identify only user preferences, not customer needs. To address these two bottlenecks, we propose a novel implicit feedback recommendation method that uses user-generated content (UGC). We identify product features and customer needs from UGC using a convolutional neural network (CNN) model and textual semantic analysis techniques, measure the user-product fit degree by introducing an attention mechanism and an antonym mechanism, and predict user ratings based on the user-product fit degree and users' historical rating data. Using data from a large-scale review site, we demonstrate the effectiveness of the proposed method. Our study makes several research contributions. First, we propose a novel recommendation method that is highly robust to sparse rating data. Second, we propose a novel recommendation method based on the fit between customer needs and product features. Third, we propose a novel approach to measuring the fit degree between customer needs and product features, which effectively improves the performance of the recommendation method. Our study also yields the following findings: (1) UGC can be used to predict user ratings when no user rating records are available. This finding has important implications for thoroughly solving the sparsity problem in recommendation. (2) The customer need-based recommendation method performs better than existing user preference-based recommendation methods. This finding highlights the necessity of mining customer needs for recommendation methods. (3) UGC can be used to mine customer needs and product features. This finding indicates that UGC can also be used in other studies that require information about customer needs and product features. (4) Opinions in user reviews should not be compared solely on the basis of semantic similarity. This finding sheds light on a limitation of existing opinion mining studies.
1. Introduction
In the past decade, with the rapid development of online retailing, recommender systems have come to deeply affect people's daily lives. When people search for a particular product, they are recommended several products according to their preferences. When they read books or watch movies, related products are recommended to them. Much of our daily life is thus invisibly guided by recommender systems. Recommender systems also bring huge benefits to online retailers. For instance, the application of a recommender system is reported to have increased Amazon's sales by 30% [1]. Researchers find that even a minor improvement in the quality of a recommender system can bring an online retailer millions of dollars in revenue every year [2].
Given its enormous potential for promoting product sales, the study of recommendation methods that match products with target users has long been a fundamental area of personalized marketing science. Nevertheless, recommender system technology still faces great challenges. According to a survey conducted by Tencent, 86% of users have used recommender systems, but more than half of them believe that only a small part of the recommended products meet their own needs [3]. This reveals that existing recommendation methods fail to satisfy customer needs, leaving considerable room for improvement.
Among these challenges, rating data sparsity is the biggest one faced by all existing recommendation methods. The existing mainstream recommendation methods include content-based recommendation methods, collaborative filtering methods, hybrid recommendation algorithms, and rule-based recommendation methods [2, 4–6]. They are all overreliant on user rating records. As rating records become scarcer, the accuracy of these methods drops sharply; this is the rating data sparsity problem. In recent years, major advances have been made in overcoming the sparsity problem. For example, to improve the performance of the matrix factorization recommendation method, which is one of the most popular modern recommendation methods, R. Du et al. [7] add user attribute information, Liu et al. [8] add product content information, Yulong Gu [9] adds contextual information, He et al. [10] and Rong-Ping Shen et al. [11] add user feedback information, and Li and Guo [12] add user local characteristics. These studies have alleviated the sparsity problem to a certain extent, but they are still unable to predict user ratings without user rating records.
As a way to solve the sparsity problem completely, implicit feedback recommendation has gradually become one of the most attractive areas of recommendation research. Existing implicit feedback recommendation methods recommend products mainly using user purchase history [13]. For example, some of them utilize users' video browsing history or purchase history to recommend videos or products [14]. However, both user ratings and user purchase history can be used only to identify user preferences; they contain no detailed information about customer needs. Users buy products because the products satisfy their needs. Existing recommendation methods can therefore identify only user preferences rather than customer needs, which inevitably limits their recommendation performance.
To solve the problems mentioned above, we propose a novel implicit feedback recommendation method using user-generated content (UGC). UGC is the content generated by users to express their views on people, events, and things. It conveys not only users' real ideas about people, events, and things but also their subjective feelings [15]. UGC has become one of the most important data sources for big data business analysis [16]. Our proposed method predicts user ratings based on customer needs identified from UGC and can effectively predict user ratings without any user rating records or user purchase history. To demonstrate the superiority of our proposed method, we compare it with several benchmark methods, including Convolutional Matrix Factorization (ConvMF) [17], Neural Graph Collaborative Filtering (NGCF) [18], Deep Factorization-Machine based Neural Network (DeepFM) [19], Probabilistic Matrix Factorization (PMF) [20], and User-based Collaborative Filtering (CF) [21].
The remainder of the paper is organized as follows. In Section 2, we review relevant previous research and discuss the differences between our proposed method and existing methods. In Section 3, we describe in detail a novel personalized implicit feedback recommendation method using user-generated content. In Section 4, we evaluate its effectiveness on real data, using representative existing methods as benchmarks. Finally, in Section 6, we summarize and discuss the findings of this study and conclude with directions for future work.
2. Literature Review
2.1. Research on Recommendation Algorithm
The core technology of a recommender system is the recommendation algorithm. Existing recommendation algorithms mainly include content-based recommendation algorithms, collaborative filtering recommendation algorithms, hybrid recommendation algorithms, and recommendation algorithms based on association rules.
The content-based recommendation algorithm analyzes product content to establish similarity relationships between products and then recommends products similar to those the user has rated highly [22]. It recommends items based on the product features extracted from product content information. For example, Koren et al. [23] use information extracted from movie descriptions, such as movie category, actor, and director, to compare the similarity of movies. Deldjoo et al. [24] extract video features from video content using analysis techniques. Shu et al. [5] learn implicit features in product description text using a convolutional neural network. Yu et al. [25] extract image features from image content using image analysis technology. Yong Wu et al. [26] extract labels that describe the product from the product's text information. In summary, existing content-based recommendation research extracts product features from product content information (such as the merchant's description of the product). Because the content information provided by merchants does not fully reflect the product, this inevitably affects the accuracy with which products are matched to target users.
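To make the content-based idea concrete, the following is a minimal sketch, not the method of any cited work. It assumes each product is described by a set of merchant-provided attributes and uses Jaccard overlap as the similarity measure; the catalogue, the attribute strings, and the function names are all hypothetical.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity between two attribute sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical catalogue: each movie is described by merchant-provided attributes.
movies = {
    "m1": {"genre:sci-fi", "actor:A", "director:X"},
    "m2": {"genre:sci-fi", "actor:B", "director:X"},
    "m3": {"genre:romance", "actor:C", "director:Y"},
}


def recommend_similar(liked: str, catalogue: dict, top_n: int = 1) -> list:
    """Rank the other items by attribute overlap with an item the user rated highly."""
    scores = {
        item: jaccard(catalogue[liked], feats)
        for item, feats in catalogue.items()
        if item != liked
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```

Under these assumptions, `recommend_similar("m1", movies)` ranks `m2` first because it shares the genre and director of `m1`. The result depends entirely on how complete the merchant-provided attributes are, which is the limitation noted above.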
The collaborative filtering recommendation algorithm analyzes user preferences through user rating records to match products with target users. Because this kind of algorithm needs only user rating records to achieve matching, it has become the most widely used recommendation algorithm. Collaborative filtering recommendation algorithms can be divided into two categories: memory-based and model-based. The memory-based collaborative filtering recommendation algorithm uses rating records to analyze the similarity of user preferences or the similarity of products and then recommends either highly rated products purchased by users with similar preferences or highly rated products similar to those the user has purchased [4, 5, 27–30]. This type of algorithm is very sensitive to rating data. Once the rating data become sparse, its performance drops sharply, and it can neither capture the user's preference for specific product features nor match recommendations to target customers. The model-based collaborative filtering recommendation algorithm, also known as the matrix factorization (MF) recommendation algorithm, learns the relationships between products and users and between users (or between products) from users' historical rating data, and can still match products and target customers accurately when the rating data are sparse [23, 31–33]. Note, however, that MF recommendation is still based on users' historical rating records; without rating records, it cannot work at all.
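The memory-based (user-based) variant described above can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the implementation of any cited work: the rating data are hypothetical, cosine similarity over co-rated products is one common choice among several, and the prediction is a plain similarity-weighted average.

```python
from math import sqrt

# Hypothetical rating records: user -> {product: rating}.
ratings = {
    "u1": {"p1": 5, "p2": 3, "p3": 4},
    "u2": {"p1": 4, "p2": 2, "p3": 5, "p4": 4},
    "u3": {"p1": 1, "p2": 5, "p4": 2},
}


def cosine(u: dict, v: dict) -> float:
    """Cosine similarity computed over the products both users have rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[p] * v[p] for p in common)
    norm_u = sqrt(sum(u[p] ** 2 for p in common))
    norm_v = sqrt(sum(v[p] ** 2 for p in common))
    return dot / (norm_u * norm_v)


def predict(user: str, product: str, data: dict):
    """Similarity-weighted average of neighbours' ratings; None if no evidence exists."""
    num = den = 0.0
    for other, recs in data.items():
        if other == user or product not in recs:
            continue
        sim = cosine(data[user], recs)
        num += sim * recs[product]
        den += abs(sim)
    return num / den if den else None
```

With these data, `predict("u1", "p4", ratings)` falls between the two neighbours' ratings of `p4`, weighted toward the more similar user `u2`. For a user with no rating records, every similarity is zero and the function returns `None`, which illustrates why this family of methods breaks down under sparsity.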
Hybrid recommendation algorithms avoid or compensate for the weaknesses of the individual algorithms by combining content-based recommendation algorithms with collaborative filtering recommendation algorithms. They recommend items based on both the product features extracted from product content information and users' historical rating records. For example, Toon De Pessemier et al. [6] propose a hybrid algorithm based on content recommendation, collaborative filtering recommendation, and knowledge-based recommendation. Cai Biao et al. [34] propose an improved dual-parameter hybrid recommendation algorithm, which applies particle swarm optimization (PSO) to the parameter optimization of the hybrid recommendation algorithm. Li et al. [35] propose a hybrid recommendation algorithm based on content and user collaborative filtering to solve the problems of data sparsity and cold start. Although hybrid recommendation can alleviate the problems of collaborative filtering recommendation to a certain extent, especially sparsity and cold start, it still cannot work without users' historical rating records.
In addition, there is another kind of recommendation method: the implicit feedback recommendation method. Such methods are mainly based on association rules, which they learn between products and users from users' purchase history [2, 36–38]. Association rules are widely used pattern recognition algorithms, applied in shopping analysis and network analysis. Implicit feedback recommendation does not require rating records, but it does need users' purchase history, which also gives rise to a sparsity problem.
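As an illustration of how association rules are learned from purchase history, the sketch below computes support and confidence for single-item rules of the form "bought x, so recommend y". The baskets, thresholds, and function names are hypothetical, and restricting rules to item pairs is a simplification made for brevity.

```python
from itertools import combinations

# Hypothetical purchase histories (one basket per user).
baskets = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"bread", "butter"},
    {"milk", "butter"},
]


def support(itemset: frozenset, data: list) -> float:
    """Fraction of baskets that contain every item in `itemset`."""
    return sum(itemset <= basket for basket in data) / len(data)


def mine_rules(data: list, min_support: float = 0.5, min_confidence: float = 0.6) -> list:
    """Return (antecedent, consequent, support, confidence) for single-item rules."""
    items = sorted(set().union(*data))
    rules = []
    for a, b in combinations(items, 2):
        pair_support = support(frozenset({a, b}), data)
        if pair_support < min_support:
            continue  # the pair is too rare to yield a reliable rule
        for x, y in ((a, b), (b, a)):
            confidence = pair_support / support(frozenset({x}), data)
            if confidence >= min_confidence:
                rules.append((x, y, pair_support, confidence))
    return rules
```

Raising `min_support` slightly above 0.5 leaves no rules at all for these four baskets. This mirrors the point above: when purchase histories are sparse, few item pairs co-occur often enough to support a rule.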
2.2. Research on Customer Needs Mining Based on User-Generated Content
Customers' personalized needs fall within the category of personalized user behavior. Using user-generated content (UGC) to analyze user behavior has become a very active research topic in recent years and is widely applied in online marketing [39], public opinion analysis [40], and social media operations [41]. UGC can help companies understand customer needs more fully and deeply, so as to (1) improve product design [42]; (2) manage and innovate products [43, 44]; (3) analyze user preferences for product features [45, 46]; and (4) analyze product competitiveness [47]. These studies have shown that UGC is an important source for extracting customer needs. However, they focus mainly on mining the needs of user groups for product characteristics and rarely address the individual needs of customers. To achieve accurate personalized recommendation, it is necessary to study further how to mine individual personalized needs from UGC.
There is also research that uses UGC to improve the performance of recommendation methods. Using text mining techniques, these studies propose hybrid recommendation methods that combine user opinions mined from UGC with traditional recommendation methods. They are based on both UGC and rating records. For example, researchers use text sentiment classification techniques to mine user opinion information from UGC in order to improve matrix factorization recommendations [48–50], collaborative filtering recommendations [51], hybrid recommendations [52], sequential recommendations [53], and cross-domain recommendations [54]. These studies have demonstrated that UGC can be used to improve the performance of current recommendation methods. However, they still rely on users' historical rating records or purchase history. In addition, they focus only on the sentiment analysis of UGC and do not mine customer needs from UGC in depth.
In summary, current recommendation methods are mainly divided into content-based recommendations, collaborative filtering recommendations, hybrid recommendations, and rule-based recommendations (implicit feedback recommendations). These four categories each have their own merits, but they all face the same challenge: when the data are sparse, the performance of the recommendation algorithm drops sharply. Researchers have alleviated the sparsity problem by considering information mined from UGC, but their methods still rely on users' historical rating records and cannot work at all without them. Moreover, existing recommendation methods can identify only user preferences rather than customer needs, which inevitably limits their recommendation performance. To address these limitations of the related research, we propose a novel implicit feedback recommendation method using UGC.
3. A Proposed Personalized Recommendation Method Using User-Generated Content
As shown in Figure 1, the proposed method consists of six stages:
(i) Stage 1. Text preprocessing: We perform text preprocessing such as word segmentation, depunctuation, and stop word removal on the extracted raw UGC.
(ii) Stage 2. Identifying informative sentences: We train word embedding with a CBOW model to map words onto a numerical vector space that can be calculated and then train the convolutional neural network model to identify the informative sentences that can present customer needs.
(iii) Stage 3. Identifying topic of informative sentences: We extract key words from informative sentences with K-means clustering algorithm to construct a key word vocabulary and mark the topic of each informative sentence based on the constructed key word vocabulary.
(iv) Stage 4. Constructing product feature vector and customer needs vector: We calculate the sentence vector of the informative sentences, group all the informative sentences with the same topic and calculate central vector of each sentence group, use text sentiment analysis technology to identify customer need sentences from the informative sentences, and construct product feature vector and customer needs vector.
(v) Stage 5. Measuring user-product fit degree: We measure user-product fit degree according to the following three steps: (1) We measure the extent of need the user has for product feature. (2) We measure customer need-product feature fit degree by introducing attention mechanism and antonym mechanism. (3) We measure user-product fit degree.
(vi) Stage 6. Predicting user ratings: We predict users’ rating of each product based on user-product fit degree and user history rating data.
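Two of the simpler steps listed above can be sketched in a few lines. The following is only an illustrative sketch under stated assumptions, not the implementation used in this paper. For Stage 1, it assumes whitespace-delimited text (a language without word boundaries would need a dedicated segmenter) and a small hypothetical stop-word list. For the grouping step of Stage 4, it assumes that the "central vector" of a sentence group is the element-wise mean of its sentence vectors. The function names and sample data are hypothetical.

```python
import string
from collections import defaultdict

# Hypothetical stop-word list; a real pipeline would use a fuller, language-specific one.
STOP_WORDS = {"the", "a", "an", "is", "and", "but", "it", "of", "to"}


def preprocess(review: str) -> list:
    """Stage 1 sketch: lowercase, strip punctuation, split on whitespace, drop stop words."""
    no_punct = review.lower().translate(str.maketrans("", "", string.punctuation))
    return [word for word in no_punct.split() if word not in STOP_WORDS]


def topic_centroids(sentences: list) -> dict:
    """Stage 4 sketch: average the sentence vectors that share a topic label.

    `sentences` is a list of (topic, vector) pairs. Taking the mean as the
    central vector is an assumption made for this sketch.
    """
    groups = defaultdict(list)
    for topic, vector in sentences:
        groups[topic].append(vector)
    return {
        topic: [sum(column) / len(vectors) for column in zip(*vectors)]
        for topic, vectors in groups.items()
    }
```

For example, `preprocess("The battery is great, but the screen is dim!")` yields `["battery", "great", "screen", "dim"]`, and two "battery" sentence vectors `[1.0, 0.0]` and `[0.0, 1.0]` average to the centroid `[0.5, 0.5]`.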