Conference Paper

Feature weighting and instance selection for collaborative filtering

Siemens, München, Bavaria, Germany
DOI: 10.1109/DEXA.2001.953076 Conference: 12th International Workshop on Database and Expert Systems Applications (DEXA 2001), Proceedings
Source: IEEE Xplore


Collaborative filtering uses a database of consumers' preferences to make personalized product recommendations and is achieving widespread success in e-commerce. In this paper we present several feature-weighting methods that improve the accuracy of collaborative filtering algorithms. Furthermore, we propose a method that reduces the training data set by selecting only highly relevant instances. We evaluate the various methods on the well-known EachMovie data set. Our experimental results show that mutual information achieves the largest accuracy gain among the feature-weighting methods. Most interestingly, our data reduction method even improves accuracy by about 6% while speeding up the collaborative filtering algorithm by a factor of 15.
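The paper itself includes no code, but the core idea of weighting items by mutual information inside a memory-based similarity can be sketched as follows. The function names, the base-2 MI estimate, and the weighted-cosine form are illustrative assumptions, not the authors' exact formulation:

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Estimate I(X; Y) in bits from two aligned discrete rating sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x,y) / (p(x) p(y)) simplifies to c*n / (count_x * count_y)
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi

def weighted_similarity(u, v, weights):
    """Weighted cosine similarity over the items two users co-rated.

    u, v: dicts mapping item id -> rating; weights: item id -> weight.
    Items with no entry in `weights` default to weight 1.0.
    """
    common = u.keys() & v.keys()
    if not common:
        return 0.0
    num = sum(weights.get(i, 1.0) * u[i] * v[i] for i in common)
    du = math.sqrt(sum(weights.get(i, 1.0) * u[i] ** 2 for i in common))
    dv = math.sqrt(sum(weights.get(i, 1.0) * v[i] ** 2 for i in common))
    return num / (du * dv) if du and dv else 0.0
```

In such a scheme an item whose ratings carry more information about the target receives a larger weight, so it contributes more to the user-user similarity used for prediction.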

    • "Sarwar et al. [9] use Latent Semantic Indexing (LSI) to capture the similarity among users and items in a reduced-dimensional space. Yu et al. [10] use a feature-weighting method to improve the accuracy of collaborative filtering algorithms. Lei Shen and Yiming Zhou [11] apply a basic fractional function and an exponential function to calculate the similarity between users, taking both common and different features into consideration."
    ABSTRACT: Similarity measurement is key to user-based collaborative filtering recommendation algorithms. The traditional similarity measures, which include cosine similarity, adjusted cosine similarity, and Pearson correlation, are simple and fast to compute, but on sparse datasets they may lead to poor recommendation quality. In this article, we first study how the recommendation quality of each of the three similarity methods changes with dataset sparsity, and then propose a combinative similarity measure that takes into account the number of items users have co-rated. Compared with the three baseline algorithms, our method shows satisfactory performance at the same computational complexity.
    03/2013; 347-350. DOI:10.2991/iccsee.2013.482
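The combinative measure's exact formula is not reproduced in this excerpt. A common way to fold the number of co-rated items into Pearson correlation is a significance weight such as min(n/T, 1); the sketch below uses that form, with the threshold T and the function names as illustrative assumptions:

```python
import math

def pearson(u, v):
    """Pearson correlation over the items two users co-rated (dict: item -> rating)."""
    common = u.keys() & v.keys()
    n = len(common)
    if n < 2:
        return 0.0
    mu = sum(u[i] for i in common) / n
    mv = sum(v[i] for i in common) / n
    num = sum((u[i] - mu) * (v[i] - mv) for i in common)
    du = math.sqrt(sum((u[i] - mu) ** 2 for i in common))
    dv = math.sqrt(sum((v[i] - mv) ** 2 for i in common))
    return num / (du * dv) if du and dv else 0.0

def combinative_similarity(u, v, threshold=50):
    """Discount correlations that were computed on only a few co-rated items."""
    n = len(u.keys() & v.keys())
    return min(n / threshold, 1.0) * pearson(u, v)
```

The discount matters most on sparse data: two users who co-rated only two or three items can show a perfect correlation by chance, and the weight keeps such neighbors from dominating the prediction.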
    • "In addition to variance, other weights such as inverse user frequency [1], entropy, and mutual information [13] have been studied in the previous literature. The results in [13] indicate that few weighting schemes for items are able to improve the performance of collaborative filtering. One of the reasons, in our opinion, is that most of the current weighting schemes are usually hand crafted and computed by predefined functions. "
    ABSTRACT: Collaborative filtering identifies the information interests of a particular user based on the information provided by other, similar users. The memory-based approaches to collaborative filtering (e.g., the Pearson correlation coefficient approach) identify the similarity between two users by comparing their ratings on a set of items. In these approaches, different items are weighted either equally or by some predefined function. The impact of rating discrepancies among different users has not been taken into consideration. For example, an item that is highly favored by most users should have a smaller impact on user similarity than an item for which different types of users tend to give different ratings. Even though simple weighting methods such as variance weighting try to address this problem, empirical studies have shown that they are ineffective in improving the performance of collaborative filtering. In this paper, we present an optimization algorithm to automatically compute the weights for different items based on their ratings from training users. More specifically, the new weighting scheme creates a clustered distribution of user vectors in the item space by bringing users of similar interests closer together and pushing users of different interests further apart. Empirical studies over two datasets have shown that our new weighting scheme substantially improves the performance of the Pearson correlation coefficient method for collaborative filtering.
    SIGIR 2004: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Sheffield, UK, July 25-29, 2004; 01/2004
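The optimization in that paper is more involved than can be shown here; as a simplified, assumed illustration of the idea (shrink the weight of items where like-minded users disagree, grow it where unlike users differ), one might run projected gradient descent on a linear separation objective:

```python
def learn_item_weights(ratings, similar_pairs, dissimilar_pairs,
                       lr=0.01, steps=100):
    """Toy item-weight learner; `ratings` maps user -> list of item ratings.

    Descends on (sum over similar pairs of weighted squared distance) minus
    (the same sum over dissimilar pairs), clamping weights at zero.  This is
    an illustrative sketch, not the paper's actual algorithm.
    """
    n_items = len(next(iter(ratings.values())))
    w = [1.0] * n_items
    for _ in range(steps):
        for i in range(n_items):
            # Gradient of the objective w.r.t. w[i] is a constant:
            # similar-pair disagreement minus dissimilar-pair disagreement.
            grad = sum((ratings[a][i] - ratings[b][i]) ** 2
                       for a, b in similar_pairs)
            grad -= sum((ratings[a][i] - ratings[b][i]) ** 2
                        for a, b in dissimilar_pairs)
            w[i] = max(0.0, w[i] - lr * grad)
    return w
```

On a toy case base with users a=[1,1], b=[1,5] (similar to a) and c=[5,1] (dissimilar to a), the weight of item 0, which separates a from c, grows, while the weight of item 1, on which the similar pair disagrees, is driven to zero.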
    • "PCF assigns high weights to features that have high correlations with the given class. Most recently, Yu et al. [10] assigned feature weights depending on inverse user frequency for collaborative filtering. In this study, a method is presented that assigns weights using frequency information for metric and symbolic feature-value pairs for lifetime maintenance of weights. "
    ABSTRACT: In active case-based reasoning (ActiveCBR), feature-value pairs together with their weights are used to compute the ranking scores of the cases in a case base. A key factor in the success of an ActiveCBR system is how to actively maintain the weights in a changing environment. In this study we develop a domain-specific information method that uses feature-frequency data to assign weights to metric and symbolic feature-value pairs. The effectiveness of the method is evaluated through experiments on a travel agency data set. Our experimental results show that the frequency method achieves on average a 2% to 3% accuracy gain in the 10 nearest-neighbor cases presented to the user. In addition, it improves cross-selling by suggesting other good trip packages the user might be interested in, while lowering the impact of bad ones. Most interestingly, the frequency method enables the system to cater to a user's changing preferences through dynamic maintenance of the feature weights in an ActiveCBR system.
    Intelligent Data Engineering and Automated Learning, 4th International Conference, IDEAL 2003, Hong Kong, China; 03/2003
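As a rough, assumed sketch of the dynamic weight maintenance described above (the class name and the relative-frequency normalization are illustrative; the paper's separate handling of metric versus symbolic features is not modeled):

```python
from collections import Counter

class FrequencyWeights:
    """Maintain feature-value weights from how often each pair appears in queries."""

    def __init__(self):
        self.counts = Counter()
        self.total = 0

    def observe(self, feature, value):
        """Record one occurrence of a feature-value pair (e.g. from a user query)."""
        self.counts[(feature, value)] += 1
        self.total += 1

    def weight(self, feature, value):
        """Relative frequency of the pair; 0.0 before any observation."""
        return self.counts[(feature, value)] / self.total if self.total else 0.0
```

Because the weights are recomputed from a running tally rather than fixed at design time, a shift in what users ask about immediately shifts which feature-value pairs dominate the case ranking.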

