International Conference on Computer Science and Artificial Intelligence (ICCSAI 2013)
ISBN: 978-1-60595-132-4
Random Walks on the Bipartite-Graph for Personalized Recommendation
Zhong-you Pei, Chun-heng Chiang, Wen-bin Lin
School of Mathematics, Southwest JiaoTong University
Chengdu, 610031, China
pzy20062141@my.swjtu.edu.cn; chiangchunheng@gmail.com; wl@swjtu.edu.cn
Abstract—With the link relations between users and items, we
develop a new graph-based top-N recommendation method
following the idea of topic-sensitive PageRank. An iterative
propagation procedure over the bipartite graph is adopted to
simulate the spreading of users' preference information. We prove that the random walks on the bipartite graph converge to a stationary probability distribution. By exploiting a critical point of the random walks, we design a novel similarity metric to measure the similarity between nodes. Experimental results on MovieLens show that our method outperforms two other node-similarity methods in terms of both Precision and Recall for recommendation lists of different lengths.
Keywords-random walk; bipartite graph; topic-sensitive
PageRank; collaborative filtering
I. INTRODUCTION
Generally, traditional recommendation approaches are categorized into two main groups [1], namely Content-based Filtering (CBF) and Collaborative Filtering (CF). CBF recommends items that are similar to ones the user liked previously. It relies on a similarity measure for items: the higher the similarity score, the more likely an item is recommended to a given user. CF, on the other hand, can be described in the following steps: (1) identify the given user's neighborhood of users with similar preferences; (2) aggregate the neighbors' opinions to estimate the user's preference for unknown items; and (3) sort the items by estimated preference and make top-N recommendations.
CF techniques have been acknowledged as among the most successful recommendation techniques and have been used in different applications such as recommending news (e.g., Yahoo!), products (e.g., Amazon), friends (e.g., Facebook, Twitter) and movies (e.g., Netflix, MovieLens, YouTube).
Many variations of CF techniques have also been proposed, including nearest-neighbor prediction algorithms [2, 3], latent factor models [4, 5] and graph-based models [6, 7, 8, 9].
In graph-based approaches, the rating database is represented as a graph, and the edge weights encode the relatedness between users and items. To date, various graph-based recommendation methods have been proposed. To search and rank folksonomies, Hotho et al. [7] developed FolkRank. Following the idea of PageRank, they assume that when resources are annotated with important tags by influential users, the resources should also be relevant and important. Because a direct application of PageRank is not possible in practice, they proposed an alternative weight-spreading approach. Fouss et al. [9] compared several random-walk-based quantities for measuring the relatedness between users and movies on a bipartite graph; one of them requires computing the pseudo-inverse of the Laplacian matrix of the graph. Assuming that user tastes are transitive, Huang et al. [6] proposed a graph-based model on the augmented rating matrix to deal with the cold-start and data sparsity problems [11]. Similarly, Zhang and Pu [8] exploit neighborhood relatedness in a recursive way.
In this work, we introduce the Topic-Sensitive PageRank
into the graph-based approaches with random walks on the
user-item bipartite graph.
The rest of this paper is organized as follows. Section II introduces the notation. Our method is presented in Section III. Experimental results are provided in Section IV, and conclusions are given in Section V.
II. NOTATIONS
Let $U$ be the set of users and $I$ the set of items, with sizes $m$ and $n$, respectively. A bipartite graph $G = (V, E)$ can be constructed, where the vertex set $V$ is the union of the two sets, $V = U \cup I$, and $E$ is the set of edges linking the user nodes in $U$ to the item nodes in $I$. Without loss of generality, for any user $u \in U$ and any item $i \in I$, if some activity exists between them (referred to as an event $A_{ui}$), e.g., $u$ has viewed and/or rated item $i$, then an edge links them. Otherwise, they are not directly linked and the corresponding edge does not exist. For ease of reference, we denote

$$ e_{ui} = \begin{cases} 1, & \text{if } A_{ui} \text{ is true}, \\ 0, & \text{otherwise}, \end{cases} $$

and since the bipartite graph $G$ is undirected, $e_{ui} = e_{iu}$.
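The construction above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: it assumes the events $A_{ui}$ arrive as (user, item) pairs, and the function names `build_bipartite_graph` and `edge_indicator` are hypothetical. Nodes are tagged with their side so that a user and an item sharing an identifier never collide.

```python
from collections import defaultdict


def build_bipartite_graph(events):
    """Build the undirected user-item graph G = (V, E) as an adjacency map.

    `events` is an iterable of (user, item) pairs, each meaning that the event
    A_ui occurred (u viewed and/or rated i). Nodes are tagged ("user", id) or
    ("item", id), so V = U ∪ I stays a disjoint union.
    """
    adj = defaultdict(set)
    for u, i in events:
        user, item = ("user", u), ("item", i)
        adj[user].add(item)
        adj[item].add(user)  # undirected graph: e_ui = e_iu
    return dict(adj)


def edge_indicator(adj, a, b):
    """Return e_ab: 1 if nodes a and b are directly linked, otherwise 0."""
    return 1 if b in adj.get(a, set()) else 0


# The degree O_v of a node v is simply len(adj[v]).
```

Repeated events for the same pair (for instance, a view followed by a rating) collapse into a single edge because neighbours are stored in sets.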
III. THE PROPOSED METHOD
Inspired by the topic-sensitive PageRank [10], we
develop a bipartite-graph-based top-N recommendation
approach.
The relatedness between users and items is formulated as the bipartite graph $G$. Each user has her/his own unique preference, which plays the role of a topic in Haveliwala's topic-sensitive PageRank. Specifically, we design a novel type of random walk on the user-item bipartite graph to simulate the propagation of users' preferences.
Consider a user $u \in U$ from whom we start the random walk, and denote the initial state as $X_0 = u$. Strictly speaking, the random walk is not purely random; rather, it is probability-constrained. At each step, the surfer has two choices: continue the walk or go back to the start point (namely the node $u$).

Suppose the walk has advanced to the $k$-th step and the surfer is now located at node $v$, which may be either a user or an item node. The surfer acts as follows:
- With probability $\alpha$ (the damping factor), the surfer continues the random walk. Specifically, she/he selects one of the directly linked neighbours uniformly at random and moves to it.
- With probability $1 - \alpha$, the surfer returns to the start node $u$.
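To make the two moves concrete, the sketch below simulates many independent surfers and counts where they stand after $k$ steps; the empirical frequencies approximate the probabilities $p(u, v, k)$ defined in the next paragraph. It is a Monte Carlo illustration under our own assumptions (adjacency given as a dict of neighbour lists, every node having at least one neighbour, hypothetical names `simulate_surfer` and `estimate_visit_probabilities`), not part of the original method, which computes these probabilities exactly.

```python
import random
from collections import Counter


def simulate_surfer(adj, source, alpha, steps, rng):
    """Run one surfer for `steps` steps from `source`; return the final node.

    At each step the surfer continues to a uniformly chosen neighbour with
    probability alpha, and otherwise jumps back to the source user.
    Assumes every node in `adj` has at least one neighbour.
    """
    node = source
    for _ in range(steps):
        if rng.random() < alpha:
            node = rng.choice(adj[node])  # continue the walk
        else:
            node = source  # return to the start node u
    return node


def estimate_visit_probabilities(adj, source, alpha, steps, n_walks=20000, seed=0):
    """Monte Carlo estimate of p(u, v, k) for every node v that was reached."""
    rng = random.Random(seed)
    counts = Counter(
        simulate_surfer(adj, source, alpha, steps, rng) for _ in range(n_walks)
    )
    return {v: c / n_walks for v, c in counts.items()}
```

With $\alpha = 0$ every surfer stays at the source, so the estimate is exactly 1 for $u$; with one step and $\alpha = 0.8$, a source linked to two items keeps about 0.2 of the mass and sends about 0.4 to each item.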
Let $p(u, v, k)$ be the probability that the surfer arrives at node $v$ after $k$ steps starting from the source user $u$. The source user's initial probability is therefore 1, that is, $p(u, u, 0) = 1$. Using the state chain $X_0 \to X_1 \to \cdots \to X_{k-1} \to X_k$, the probability $p(u, v, k)$ can be equivalently written as a conditional probability given the initial state $X_0 = u$:

$$ p(u, v, k) = p(X_k = v \mid X_0 = u). \tag{1} $$
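By the law of total probability, conditioning on the previous state $X_{k-1}$, we have

$$ \begin{aligned} p(X_k = v \mid X_0 = u) &= \sum_{v' \in V} p(X_k = v \mid X_{k-1} = v', X_0 = u)\, p(X_{k-1} = v' \mid X_0 = u) \\ &= \sum_{v' \in V} p(X_k = v \mid X_{k-1} = v', X_0 = u)\, p(u, v', k-1). \end{aligned} \tag{2} $$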
Suppose the node $v$ under consideration is different from $u$. If there is no direct link between $v$ and $v'$, then $e_{vv'} = 0$ and $p(X_k = v \mid X_{k-1} = v', X_0 = u) = 0$. In other words, apart from the edges linking $v$ to its neighbours, no path can lead the surfer to the destination node $v$ at the next step. If there is a direct link between $v$ and $v'$, then $e_{vv'} = 1$ and the following equation holds:

$$ p(X_k = v \mid X_{k-1} = v', X_0 = u) = \frac{\alpha}{O_{v'}}, \tag{3} $$

where $O_{v'}$ denotes the out-degree of node $v'$ (in the undirected graph $G$, simply the number of neighbours of $v'$).
Substituting Equation (3) into Equation (2), we obtain

$$ p(X_k = v \mid X_0 = u) = \sum_{v':\, e_{vv'} = 1} p(X_k = v \mid X_{k-1} = v', X_0 = u)\, p(u, v', k-1) = \alpha \sum_{v':\, e_{vv'} = 1} \frac{p(u, v', k-1)}{O_{v'}}. \tag{4} $$
On the other hand, suppose $v$ is the source user, i.e., $v = u$. There are two ways to bring the surfer to $v$ from the previous node $v'$: one is teleportation with probability $1 - \alpha$, and the other is moving along the direct edges, analogous to the spreading process above. We formulate the whole procedure as follows:

$$ p(u, v, k) = \begin{cases} \alpha \displaystyle\sum_{v':\, e_{vv'} = 1} \frac{p(u, v', k-1)}{O_{v'}} + 1 - \alpha, & v = u, \\ \alpha \displaystyle\sum_{v':\, e_{vv'} = 1} \frac{p(u, v', k-1)}{O_{v'}}, & v \neq u. \end{cases} \tag{5} $$
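Equation (5) can be evaluated exactly with a few lines of Python. The sketch below is our own illustration (adjacency as a dict of neighbour lists, hypothetical function name `propagate`). It pushes each node's mass $\alpha\,p(u, v', k-1)/O_{v'}$ along its edges instead of pulling it as written in Equation (5); the two forms agree because $G$ is undirected, so $e_{vv'} = e_{v'v}$.

```python
def propagate(adj, source, alpha, steps):
    """Return {v: p(u, v, k)} after k = `steps` applications of Eq. (5).

    Starts from p(u, u, 0) = 1. Each node v' sends alpha * p / O_{v'} to every
    neighbour, and the teleportation term 1 - alpha is added to `source`.
    """
    p = {v: 0.0 for v in adj}
    p[source] = 1.0
    for _ in range(steps):
        nxt = {v: 0.0 for v in adj}
        for v_prev, mass in p.items():
            if mass == 0.0:
                continue
            share = alpha * mass / len(adj[v_prev])  # alpha / O_{v'} per edge
            for v in adj[v_prev]:
                nxt[v] += share
        nxt[source] += 1.0 - alpha  # teleportation term of Eq. (5)
        p = nxt
    return p
```

For the toy graph in which u0 rated i0 and i1, u1 rated i1 and i2, and u2 rated i0 and i3, `propagate(adj, "u0", 0.5, 3)` assigns probability $0.5^3/8 = 0.015625$ to each of the items i2 and i3 that u0 has not rated.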
A. Theoretical Analysis
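Let $M \in \mathbb{R}^{|V| \times |V|}$ be the transition probability matrix, whose transpose has the block form

$$ M^T = \begin{pmatrix} 0 & A \\ B & 0 \end{pmatrix}, \tag{6} $$

where each entry of $M$ is $m_{ij} = e_{ij} / O_{v_i}$ with $v_i$ the $i$-th node of $V$, the blocks are $A \in \mathbb{R}^{m \times n}$ and $B \in \mathbb{R}^{n \times m}$, and $|V| = |U \cup I| = m + n$. The diagonal blocks are zero because, in a bipartite graph, users link only to items and items only to users. Let $P_k \in \mathbb{R}^{|V|}$ be the $k$-step probability distribution over the bipartite graph, and denote the initial distribution as

$$ P_0 = \begin{pmatrix} Q \\ 0 \end{pmatrix}, \tag{7} $$

where $Q$ is an $m$-dimensional block vector (the user part of $P_0$).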
As in PageRank, at any step $k$ the random walk either moves via the transition probability matrix $M$ to an adjacent node with probability $\alpha$, or teleports to the active user node with probability $1 - \alpha$. The probability distribution vector $P_k$, whose entries are $p(u, v, k)$ for the source user $u$ after $k$ steps, can be expressed recursively in matrix form as

$$ P_k = (1 - \alpha) P_0 + \alpha M^T P_{k-1}. \tag{8} $$
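Equation (8) is the same recursion as Equation (5), written with the matrix $M^T$. The following sketch builds $M^T$ as a dense list of lists with users ordered before items, so that the zero diagonal blocks and the blocks $A$ and $B$ of Equation (6) appear in place, and then iterates Equation (8). Dense matrices, the node ordering and the names `transition_transpose` and `iterate_eq8` are illustrative assumptions; a sparse representation would be preferable at MovieLens scale.

```python
def transition_transpose(nodes, adj):
    """Build M^T, where column j spreads node j's mass: (M^T)[i][j] = e_ji / O_j.

    With users listed before items in `nodes`, M^T has the block form
    [[0, A], [B, 0]] of Eq. (6).
    """
    index = {v: k for k, v in enumerate(nodes)}
    size = len(nodes)
    mt = [[0.0] * size for _ in range(size)]
    for v in nodes:
        for w in adj[v]:
            mt[index[w]][index[v]] = 1.0 / len(adj[v])
    return mt


def iterate_eq8(mt, p0, alpha, steps):
    """Apply P_k = (1 - alpha) P_0 + alpha M^T P_{k-1} for `steps` iterations."""
    size = len(p0)
    p = list(p0)
    for _ in range(steps):
        p = [
            (1 - alpha) * p0[r] + alpha * sum(mt[r][c] * p[c] for c in range(size))
            for r in range(size)
        ]
    return p
```

Every column of the resulting matrix sums to one, which is the property the convergence proof below relies on.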
Theorem 1. If $0 \le \alpha < 1$, the random walks on the bipartite graph $G$ converge to a stationary probability distribution after sufficiently many steps.
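Proof. By Equation (8), $P_k - P_{k-1} = \alpha M^T (P_{k-1} - P_{k-2})$ for $k \ge 2$, and $P_1 - P_0 = \alpha (M^T P_0 - P_0)$. Using the block form (6), the distribution difference over the graph at two successive steps can be formulated as follows. When $k$ is odd,

$$ P_k - P_{k-1} = \alpha^k \begin{pmatrix} -(AB)^{\frac{k-1}{2}} \\ (BA)^{\frac{k-1}{2}} B \end{pmatrix} Q, \tag{9} $$

and when $k$ is even,

$$ P_k - P_{k-1} = \alpha^k \begin{pmatrix} (AB)^{\frac{k}{2}} \\ -(BA)^{\frac{k-2}{2}} B \end{pmatrix} Q. \tag{10} $$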
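All the matrices on the right-hand sides of Equations (9) and (10) are independent of $\alpha$ and bounded in norm: assuming every node has at least one edge, $A$ and $B$ are column-stochastic, so each block maps $Q$ to a vector of unit 1-norm. Since $\lim_{k \to \infty} \alpha^k = 0$, we have

$$ \lim_{k \to \infty} \| P_k - P_{k-1} \| = 0. $$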
Now we show that for every $\varepsilon > 0$ there exist a natural number $K$ and a stationary probability distribution $P^*$ such that $\| P_k - P^* \| < \varepsilon$ holds for every $k \ge K$.
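Let $Z_k = P_k - P^*$, where $P^*$ satisfies the fixed-point equation

$$ P^* = (1 - \alpha) P_0 + \alpha M^T P^*. $$

Subtracting it from Equation (8), we obtain

$$ Z_k = \alpha M^T (P_{k-1} - P^*) = \alpha M^T Z_{k-1}, $$

and thus $Z_k = (\alpha M^T)^k Z_0$.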
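In view of the fact that $M$ is a (row-)stochastic matrix, every column of $M^T$ sums to one, so

$$ \| \alpha M^T \|_1 = \alpha \| M^T \|_1 = \alpha < 1. $$

Hence the spectral radius of $\alpha M^T$ is less than one, and by a standard result of matrix theory $\alpha M^T$ is a convergent matrix, i.e., $(\alpha M^T)^k \to 0$. In particular, $\mathrm{Id} - \alpha M^T$ (with $\mathrm{Id}$ the identity matrix) is invertible, so $P^* = (1 - \alpha)(\mathrm{Id} - \alpha M^T)^{-1} P_0$ exists and is unique, and $Z_k \to 0$. Therefore, the random walks on the bipartite graph $G$ converge to a stationary probability distribution after sufficiently many steps.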
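The proof can be checked numerically. The sketch below (our illustration; the function name `run_until_converged`, the tolerance and the 1-norm stopping rule are assumptions) iterates Equation (8) until two successive distributions differ by less than a tolerance in the 1-norm, which is also one way to choose the number of iterations adaptively. Because $A$ and $B$ are column-stochastic and $Q$ is a nonnegative unit vector, Equations (9) and (10) give $\|P_k - P_{k-1}\|_1 = 2\alpha^k$ when every node has at least one edge, so the loop stops at the first $k$ with $2\alpha^k$ below the tolerance.

```python
def l1_distance(x, y):
    """1-norm of x - y."""
    return sum(abs(a - b) for a, b in zip(x, y))


def run_until_converged(mt, p0, alpha, tol=1e-6, max_steps=10_000):
    """Iterate Eq. (8) until ||P_k - P_{k-1}||_1 < tol.

    Returns (P_K, K, diffs), where diffs[k - 1] = ||P_k - P_{k-1}||_1.
    """
    size = len(p0)
    p, diffs = list(p0), []
    for k in range(1, max_steps + 1):
        nxt = [
            (1 - alpha) * p0[r] + alpha * sum(mt[r][c] * p[c] for c in range(size))
            for r in range(size)
        ]
        diffs.append(l1_distance(nxt, p))
        p = nxt
        if diffs[-1] < tol:
            return p, k, diffs
    raise RuntimeError("no convergence within max_steps")
```

For the four-node graph in which u0 rated i0 and i1 and u1 rated i1, with $\alpha = 0.5$ and a tolerance of $10^{-3}$, the loop stops at $K = 11$, the first $k$ with $2 \cdot 0.5^k < 10^{-3}$.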
B. Critical Point
There is an interesting property of the random walks that can be used to simplify our proposed method: when the walk length is three or four, the recommendation accuracy is independent of the damping factor $\alpha$. We now prove it briefly.
Proof. The basic function of a recommender system is to recommend items that a user has not yet rated. Let $J_u$ be the set of items that have not been rated by the source user $u$, let $P_k(J_u)$ be the $k$-step probability distribution over the set of unrated items, and let $P_k(I)$ be the $k$-step probability distribution over all item nodes.
According to Equations (9) and (10), by summing the differences of the probability distribution at successive steps, we obtain the 3-step probability distribution over the item nodes

$$ P_3(I) = P_0(I) + \sum_{j=1}^{3} \left[ P_j(I) - P_{j-1}(I) \right] = (\alpha - \alpha^2) BQ + \alpha^3 BABQ, \tag{11} $$

and the 4-step probability distribution

$$ P_4(I) = (\alpha - \alpha^2) BQ + (\alpha^3 - \alpha^4) BABQ. \tag{12} $$
For convenience, for a probability distribution vector $x$ we use $x(J_u)$ to denote the vector comprising the elements of $x$ corresponding to $J_u$. The spreading mechanism implies that the nodes in $J_u$ are not activated until the walk proceeds to the third step, because $BQ$ is nonzero only on the items already rated by $u$. As a result, the term $BQ$ does not affect the distribution over $J_u$, i.e.,

$$ P_1(J_u) = P_2(J_u) = (BQ)(J_u) = 0. $$

At the third step, the term $BABQ$ initializes the probability distribution over $J_u$, that is,

$$ P_3(J_u) = \alpha^3 (BABQ)(J_u). $$

At the fourth step, according to Equation (12), we have

$$ P_4(J_u) = (\alpha^3 - \alpha^4)(BABQ)(J_u) = \alpha^3 (1 - \alpha)(BABQ)(J_u). $$
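For $0 < \alpha < 1$, both $P_3(J_u)$ and $P_4(J_u)$ are positive multiples of the same vector $(BABQ)(J_u)$. Hence the ranking of the elements in $P_4(J_u)$ is the same as that in $P_3(J_u)$, and neither depends on $\alpha$, which indicates that the recommendation accuracy is independent of the factor $\alpha$.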
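The critical-point argument can be checked on a small example. The sketch below (illustrative names `walk_distribution` and `rank_unrated`; the toy data are ours) computes $P_3$ and $P_4$ with the push form of Equation (5), restricts them to the unrated items $J_u$, and lets one confirm that both steps produce the same ranking for every tested $\alpha$ and that $P_4(J_u) = (1 - \alpha) P_3(J_u)$.

```python
def walk_distribution(adj, source, alpha, steps):
    """p(u, ., k) via Eq. (5), written in push form (valid for undirected graphs)."""
    p = dict.fromkeys(adj, 0.0)
    p[source] = 1.0
    for _ in range(steps):
        nxt = dict.fromkeys(adj, 0.0)
        for v, mass in p.items():
            for w in adj[v]:
                nxt[w] += alpha * mass / len(adj[v])
        nxt[source] += 1.0 - alpha  # teleport back to the source user
        p = nxt
    return p


def rank_unrated(adj, items, source, alpha, steps):
    """Return (ranking of J_u by P_k, P_k restricted to J_u)."""
    unrated = [i for i in items if i not in adj[source]]  # the set J_u
    p = walk_distribution(adj, source, alpha, steps)
    scores = {i: p[i] for i in unrated}
    return sorted(unrated, key=lambda i: (-scores[i], i)), scores
```

In a toy graph where u0 rated i0 and i1, u1 rated i0, i1 and i2, and u2 rated i1 and i3, the unrated items are ranked i2 before i3 at both steps 3 and 4 for every $\alpha$ tried, with $P_3(\text{i2}) = \alpha^3 \cdot 5/36$.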
C. Another Similarity Measure
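Based on the property of the critical point, we propose a new similarity measure. Suppose the random walks have proceeded to the third step, and consider the matrix $AB$ involved in the current transition probability matrix $(M^T)^3$. We use $AB \in \mathbb{R}^{m \times m}$ as the similarity matrix for user nodes. The blocks $A \in \mathbb{R}^{m \times n}$ and $B \in \mathbb{R}^{n \times m}$ of $M^T$ represent the relatedness between users and items, and their product $AB$ bridges pairs of user nodes through the items they share. Specifically, the similarity between two users $u$ and $v$ can be formulated as

$$ s_{uv} = \sum_{i \in I_u \cap I_v} \frac{1}{O_i\, O_u}, $$

where $I_u$ denotes the set of items rated by user $u$, and $O_i$ and $O_u$ are the degrees of nodes $i$ and $u$.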
Compared with the Jaccard similarity index, this measure has an important property: each shared item $i$ contributes $1/O_i$, so items rated by many users contribute less, which reduces the impact of item popularity on the similarity score.
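A short sketch of the proposed measure follows, together with the Jaccard index for comparison. Users are represented by the sets of items they rated (our assumption), and all function names are hypothetical. Under our reading of Equation (6), with $A_{ui} = e_{iu}/O_i$ and $B_{iv} = e_{vi}/O_v$, the score $s_{uv}$ equals the $(v, u)$ entry of $AB$; the helper `ab_entry` computes that entry directly from the blocks so the two can be compared.

```python
from collections import Counter


def item_degrees(user_items):
    """O_i for every item: the number of users linked to item i."""
    return Counter(i for items in user_items.values() for i in items)


def walk_similarity(user_items, u, v, deg=None):
    """s_uv = sum over shared items i of 1 / (O_i * O_u)."""
    if deg is None:
        deg = item_degrees(user_items)
    shared = user_items[u] & user_items[v]
    return sum(1.0 / (deg[i] * len(user_items[u])) for i in shared)


def ab_entry(user_items, row, col):
    """(AB)[row][col] = sum_i A[row][i] * B[i][col], with A_ui = e_iu / O_i, B_iv = e_vi / O_v."""
    deg = item_degrees(user_items)
    return sum(
        (1.0 / deg[i]) * (1.0 / len(user_items[col]))
        for i in user_items[row] & user_items[col]
    )


def jaccard(a, b):
    """Jaccard index |a ∩ b| / |a ∪ b| (0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0
```

In a toy example where users a and b share only a popular item (rated by four users) and users c and d share only a niche item (rated by two users), both pairs have the same Jaccard index of 1/3, but `walk_similarity` gives 0.125 for (a, b) and 0.25 for (c, d). Note also that $s_{uv}$ is not symmetric when $O_u \neq O_v$.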
IV. EXPERIMENTATION
A. Accuracy Evaluation Metrics
Precision and Recall are two standard evaluation metrics
in information retrieval. Both have also been widely used in
the evaluation of recommendation accuracy.
Let $R_u$ be the list of items recommended to user $u$ by the recommender system, and $T_u$ be her/his rating history in the test data set. Precision is defined as the percentage of relevant items in the recommended lists,

$$ \mathrm{Precision} = \frac{\sum_{u \in U} |R_u \cap T_u|}{\sum_{u \in U} |R_u|}, \tag{13} $$

which represents the probability that a recommended item is relevant. Recall is defined as the ratio of the recommended relevant items to the number of available relevant items,

$$ \mathrm{Recall} = \frac{\sum_{u \in U} |R_u \cap T_u|}{\sum_{u \in U} |T_u|}. \tag{14} $$
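Equations (13) and (14) pool the counts over all users before dividing (micro-averaging). A minimal sketch, assuming recommendations are given as a dict of lists and test histories as a dict of sets (the function name `precision_recall` is hypothetical):

```python
def precision_recall(recommended, test):
    """Micro-averaged Precision (Eq. 13) and Recall (Eq. 14).

    recommended: {user: list of recommended items R_u}
    test:        {user: set of items T_u in the test data}
    """
    hits = sum(len(set(items) & test.get(u, set())) for u, items in recommended.items())
    n_recommended = sum(len(items) for items in recommended.values())
    n_relevant = sum(len(items) for items in test.values())
    precision = hits / n_recommended if n_recommended else 0.0
    recall = hits / n_relevant if n_relevant else 0.0
    return precision, recall
```

For example, with 3 hits among 5 recommended items and 6 relevant test items, Precision is 0.6 and Recall is 0.5.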
B. Data Set
To evaluate the performance of our method, we conducted experiments on MovieLens, provided by the GroupLens Research Center (http://www.grouplens.org/). MovieLens is both a recommender system and a virtual community website, where users can share movies using their favored tags. The website has over 50,000 users who have provided ratings on more than 3,000 movies. For greater reliability, only users who have rated 20 or more movies are included, resulting in a data set with 100,000 ratings from 943 users on 1,682 movies. Each rating is represented by a tuple $t_{ui} = (u, i, r_{ui})$, where $u \in U$ denotes a user, $i \in I$ is a movie, and $r_{ui}$ is the rating of movie $i$ by user $u$, an integer score on a five-level scale from 1 (the movie is very bad) to 5 (the movie is very good). In addition, the data set provides user profile information, such as age and gender, and movie features, e.g., the genre of the movie.
C. Baselines
In the experiments, the standard user-based and item-based collaborative filtering techniques are selected as the baselines for top-N recommendation.
The user-based approach estimates the interest of a user $u$ in an item $i$ from the ratings of that item by her/his neighborhood users, i.e., users who gave similar ratings to the same items. The item-based method [2], on the other hand, predicts the rating of user $u$ for an item $i$ from her/his historical ratings of items similar to $i$.
Let $I_u$ be the set of items rated by user $u \in U$, and $U_i$ be the set of users who have rated a given item $i \in I$. Given any two users $u$ and $v$, their similarity $s_{uv}$ is measured by the Jaccard index

$$ s_{uv} = \frac{|I_u \cap I_v|}{|I_u \cup I_v|}. $$

Similarly, given two items $i$ and $j$, we define the similarity $s_{ij}$ between them as

$$ s_{ij} = \frac{|U_i \cap U_j|}{|U_i \cup U_j|}. $$

The $k$ nearest neighbours of user $u$ and of item $i$ are then determined from the computed similarities and written $N(u, k)$ and $N(i, k)$, respectively.
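Both Jaccard indices and the neighbourhoods $N(u, k)$ and $N(i, k)$ can be computed with the same routine, since users are described by their item sets $I_u$ and items by their user sets $U_i$. The sketch below assumes profiles are given as a dict of sets; breaking ties by identifier is our choice, not something the text specifies, and `nearest_neighbours` is a hypothetical name.

```python
def jaccard(a, b):
    """Jaccard index |a ∩ b| / |a ∪ b| (0 when both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def nearest_neighbours(profiles, x, k):
    """N(x, k): the k entities most similar to x, as (id, similarity) pairs.

    `profiles` maps users to I_u (for N(u, k)) or items to U_i (for N(i, k)).
    Ties are broken by identifier so the result is deterministic.
    """
    scored = [(y, jaccard(profiles[x], profiles[y])) for y in profiles if y != x]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:k]
```

For a user who rated {a, b, c}, a user who rated {a, b} scores 2/3 and a user who rated {c, d} scores 1/4.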
Let $r_{ui}$ be the rating of item $i$ by user $u$ and $p_{ui}$ be this user's (predicted) satisfaction with item $i$. To recommend items to user $u$, the user-based and item-based approaches are formulated, respectively, as

$$ p_{ui} = \sum_{v \in N(u, k) \cap U_i} s_{uv}\, r_{vi}, \qquad p_{ui} = \sum_{j \in N(i, k) \cap I_u} s_{ij}\, r_{uj}. $$

Using the satisfaction scores, the system sorts the items and recommends those with the top scores to the given user $u$.
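The two baseline scoring rules and the final top-N step can be sketched as below. This is an illustration under our own assumptions: ratings are stored as a dict mapping each user to a dict of item ratings, neighbourhoods use the Jaccard index defined above with ties broken by identifier, only items the user has not rated are scored, and all function names are hypothetical.

```python
def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def top_k(profiles, x, k):
    """N(x, k) as {neighbour: similarity}, ranked by Jaccard index."""
    ranked = sorted(
        ((y, jaccard(profiles[x], profiles[y])) for y in profiles if y != x),
        key=lambda pair: (-pair[1], pair[0]),
    )
    return dict(ranked[:k])


def user_based_scores(ratings, u, k):
    """p_ui = sum over v in N(u, k) ∩ U_i of s_uv * r_vi, for items u has not rated."""
    profiles = {v: set(r) for v, r in ratings.items()}  # I_v for every user
    scores = {}
    for v, s_uv in top_k(profiles, u, k).items():
        for i, r_vi in ratings[v].items():  # v ∈ U_i holds exactly for these items
            if i not in ratings[u]:
                scores[i] = scores.get(i, 0.0) + s_uv * r_vi
    return scores


def item_based_scores(ratings, u, k):
    """p_ui = sum over j in N(i, k) ∩ I_u of s_ij * r_uj, for items u has not rated."""
    item_users = {}  # U_i for every item
    for v, r in ratings.items():
        for i in r:
            item_users.setdefault(i, set()).add(v)
    scores = {}
    for i in item_users:
        if i in ratings[u]:
            continue
        neighbours = top_k(item_users, i, k)
        scores[i] = sum(s * ratings[u][j] for j, s in neighbours.items() if j in ratings[u])
    return scores


def recommend(scores, n):
    """Top-N list: items sorted by satisfaction score, highest first."""
    return [i for i, _ in sorted(scores.items(), key=lambda t: (-t[1], t[0]))[:n]]
```

Restricting the scores to unrated items mirrors the top-N setting, where only items outside the user's history are candidates.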
D. Experimental Results
To reduce the variability in prediction, we conduct five-fold cross-validation experiments on the MovieLens data set. The data set is randomly divided into five subsets; in each run, one subset is used for testing and the remaining four for training. Throughout the experiments, Precision and Recall are averaged over the five folds.
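A five-fold split of the rating tuples $t_{ui}$ can be sketched as follows. Splitting at the level of individual ratings, the fixed seed and the name `five_fold_splits` are our assumptions; the text does not state how the folds were drawn.

```python
import random


def five_fold_splits(ratings, n_folds=5, seed=0):
    """Yield (train, test) lists; every rating tuple lands in exactly one test fold."""
    shuffled = list(ratings)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[f::n_folds] for f in range(n_folds)]
    for f in range(n_folds):
        train = [t for g in range(n_folds) if g != f for t in folds[g]]
        yield train, folds[f]
```

Each of the five runs trains on four folds and tests on the remaining one; Precision and Recall from Equations (13) and (14) are then averaged over the runs.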
There are three parameters in our method that need to be suitably selected: the damping factor $\alpha$, the maximum number of iterations (i.e., the length of the random walks) $K$, and the personalized initial probability distribution vector $P_0$ over the bipartite graph.
We select the simplest personalization distribution, in which the only nonzero element corresponds to the source user, as described in Section III.
To evaluate the impact of $\alpha$, we measure the Precision and Recall of our method for nine values of $\alpha$ from 0.1 to 0.9. The results are plotted in Fig. 1. It can be observed that, for a given $P_0$, the smaller the value of $\alpha$, the better our method performs, a trend similar to that reported in [14].
Given a damping factor $\alpha$, the number of iterations $K$ can be determined adaptively by iterating until the probability distribution over the graph converges. On the other hand, Zhang et al. [14] experimented on MovieLens and pointed out that a longer propagation path, i.e., a larger $K$, brings more redundant information into the prediction. To avoid the noise induced by overly long random walks, we choose $K = 6$, which is slightly larger than the critical point.
As proved in Section III-B, the damping factor $\alpha$ does not affect the recommendation quality at the critical point, where we have designed a novel similarity measure for user nodes. To show the advantage of this measure, we compare the user-based collaborative filtering approach using it with the user-based approach using the classical Jaccard similarity, as well as with the traditional item-based approach. The comparison results on MovieLens are presented in Fig. 2. The experimental results show that the new similarity measure outperforms both baselines in Precision and Recall for recommendation lists of different lengths, is more consistent with the recommendation goal, and has good potential for future work.
V. CONCLUSIONS AND FUTURE WORK
In this paper, we develop a graph-based recommendation method following the idea of topic-sensitive PageRank. The main contributions of this study are: (1) constructing a user-item bipartite graph from the binary rating database while taking users' unique preferences into consideration; (2) designing a special type of random walk on the bipartite graph with a critical point; and (3) proposing a novel metric to measure the vicinity/similarity between user nodes. The experimental results on MovieLens show that the new metric derived from the critical point provides an alternative similarity measure for user-based collaborative filtering. On the other hand, the performance of the top-N recommendation method under different values of $\alpha$ and maximum numbers of iterations suggests that too many propagation steps may deteriorate the overall recommendation accuracy, since the procedure introduces more noise.
In future work, we will focus on designing diverse similarity metrics to measure the relations between heterogeneous nodes, and on developing effective ensemble methods to boost the capacity of these measures.
ACKNOWLEDGMENT
This work was supported in part by the Program for New
Century Excellent Talents in University (Grant No. NCET-
10-0702).
REFERENCES
[1] Gediminas Adomavicius and Alexander Tuzhilin. Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734-749, 2005.
[2] Badrul Sarwar, George Karypis, Joseph Konstan, and John Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, pages 285-295. ACM, 2001.
[3] Jonathan L. Herlocker, Joseph A. Konstan, Loren G. Terveen, and John T. Riedl. Evaluating collaborative filtering recommender systems. ACM Transactions on Information Systems (TOIS), 22(1):5-53, 2004.
[4] Yehuda Koren. Factorization meets the neighborhood: a multifaceted collaborative filtering model. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 426-434. ACM, 2008.
[5] Linas Baltrunas and Xavier Amatriain. Towards time-dependant recommendation based on implicit feedback. In Workshop on Context-Aware Recommender Systems (CARS'09), 2009.
[6] Zan Huang, Hsinchun Chen, and Daniel Zeng. Applying associative retrieval techniques to alleviate the sparsity problem in collaborative filtering. ACM Transactions on Information Systems (TOIS), 22(1):116-142, 2004.
[7] Andreas Hotho, Robert Jäschke, Christoph Schmitz, and Gerd Stumme. Information retrieval in folksonomies: Search and ranking. In The Semantic Web: Research and Applications, pages 411-426. Springer, 2006.
[8] Jiyong Zhang and Pearl Pu. A recursive prediction algorithm for collaborative filtering recommender systems. In Proceedings of the 2007 ACM Conference on Recommender Systems, pages 57-64. ACM, 2007.
[9] Francois Fouss, Alain Pirotte, Jean-Michel Renders, and Marco Saerens. Random-walk computation of similarities between nodes of a graph with application to collaborative recommendation. IEEE Transactions on Knowledge and Data Engineering, 19(3):355-369, 2007.
[10] Taher H. Haveliwala. Topic-sensitive PageRank. In Proceedings of the 11th International Conference on World Wide Web, pages 517-526. ACM, 2002.
[11] Dietmar Jannach, Markus Zanker, Alexander Felfernig, and Gerhard Friedrich. Recommender Systems: An Introduction. Cambridge University Press, 2010.
[12] Sergey Brin and Lawrence Page. The anatomy of a large-scale hypertextual web search engine. Computer Networks and ISDN Systems, 30(1):107-117, 1998.
[13] Geoffrey Grimmett and David Stirzaker. Probability and Random Processes. Oxford University Press, 2001.
[14] Yin Zhang, Jiang-qin Wu, and Yue-ting Zhuang. Random walk models for top-N recommendation task. Journal of Zhejiang University SCIENCE A, 10(7):927-936, 2009.
Figure 1. (a) Precision and (b) Recall of our method for various choices of $\alpha$ on the MovieLens data set.
Figure 2. Comparison results for the models in terms of (a) Precision and (b) Recall on the MovieLens data set.