Conference Paper

Feature weighting and instance selection for collaborative filtering

Inst. of Comput. Sci., Univ. of Munich
DOI: 10.1109/DEXA.2001.953076 Conference: Database and Expert Systems Applications, 2001. Proceedings. 12th International Workshop on
Source: IEEE Xplore

ABSTRACT Collaborative filtering uses a database of consumers' preferences to make personalized product recommendations and has achieved widespread success in e-commerce. In this paper we present several feature-weighting methods to improve the accuracy of collaborative filtering algorithms. Furthermore, we propose a method to reduce the training data set by selecting only highly relevant instances. We evaluate the methods on the well-known EachMovie data set. Our experimental results show that mutual information achieves the largest accuracy gain among all feature-weighting methods. Most notably, our data reduction method improves accuracy by about 6% while speeding up the collaborative filtering algorithm by a factor of 15.
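
To make the idea of feature weighting concrete, here is a minimal Python sketch of memory-based collaborative filtering in which each item (feature) carries a weight in the user-user Pearson correlation. It is not the paper's implementation: the function names, the dictionary-based rating format, and the `weights` values are assumptions made for illustration. In the paper's setting the weights would come from a measure such as mutual information, which this sketch does not compute.

```python
import math


def weighted_pearson(a, b, weights):
    """Weighted Pearson correlation over the items both users rated.

    a, b    : dicts mapping item -> rating (hypothetical data format)
    weights : dict mapping item -> non-negative feature weight (assumed given)
    """
    common = [i for i in a if i in b and weights.get(i, 0.0) > 0]
    if len(common) < 2:
        return 0.0
    w_sum = sum(weights[i] for i in common)
    mean_a = sum(weights[i] * a[i] for i in common) / w_sum
    mean_b = sum(weights[i] * b[i] for i in common) / w_sum
    cov = sum(weights[i] * (a[i] - mean_a) * (b[i] - mean_b) for i in common)
    var_a = sum(weights[i] * (a[i] - mean_a) ** 2 for i in common)
    var_b = sum(weights[i] * (b[i] - mean_b) ** 2 for i in common)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def predict(active, target_item, others, weights):
    """Predict a rating as the active user's mean plus the
    similarity-weighted deviations of other users from their own means."""
    mean_active = sum(active.values()) / len(active)
    num = 0.0
    den = 0.0
    for other in others:
        if target_item not in other:
            continue  # this user cannot vote on the target item
        sim = weighted_pearson(active, other, weights)
        mean_other = sum(other.values()) / len(other)
        num += sim * (other[target_item] - mean_other)
        den += abs(sim)
    return mean_active if den == 0 else mean_active + num / den
```

Setting every weight to 1 recovers the ordinary Pearson-based predictor. Instance selection, in this picture, would amount to passing a smaller `others` list containing only the users judged most relevant.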

  • ABSTRACT: Collaborative filtering systems have achieved great success in both research and business applications. One of the key technologies in collaborative filtering is the similarity measure. Cosine-based and Pearson correlation-based methods are popular similarity measures, but have low accuracy. In this paper, we propose a novel similarity measure, referred to as hierarchical pair-wise sequence (HPWS). In HPWS, we take into account both the sequence property of user behaviors and the hierarchical property of item categories. We design a collaborative filtering recommendation system to evaluate the performance of HPWS based on empirical data collected from a real P2P application, i.e. "byrBT" in CERNET. Experimental results show that HPWS outperforms traditional Cosine similarity and Pearson similarity measures under all scenarios.
  • ABSTRACT: Research on the use of social trust relationships for collaborative filtering has shown that trust-based recommendations can outperform traditional methods in certain cases. This, in turn, led to insights that tie trust to certain more subtle types of similarity between users which are not captured in the overall similarity measures normally used for making recommendations. In this study, we investigate the use of these trust-inspired nuanced similarity measures directly for making recommendations. After describing previous research that identified these similarity statistics, we present an experiment run on two data sets: FilmTrust and MovieLens. Our results show that using a simple measure - the single largest difference between users - as a weight
  • ABSTRACT: The properties of a training data set, such as size, distribution and number of attributes, significantly contribute to the generalization error of a learning machine. A not-well-distributed data set is prone to lead to a partially overfitted model. The two approaches proposed in this paper for binary classification enhance the useful data information by mining negative data. First, the error-driven compensating hypothesis approach is based on Support Vector Machines with 1+k times learning, where the base learning hypothesis is iteratively compensated k times. This approach produces a new hypothesis on the new data set, in which each label is a transformation of the label from the negative data set, and further produces the child positive and negative data subsets in subsequent iterations. This procedure refines the model created by the base learning algorithm, creating k hypotheses over k iterations. A prediction method is also proposed to trace the relationships between the negative subsets and the testing data set by a vector similarity technique. Second, a statistical negative examples learning approach based on theoretical analysis improves the performance of the base learning algorithm (the learner) by creating one or two additional hypotheses, audit and booster, to mine the negative examples output from the learner. The learner employs a regular support vector

