From Social Networks to Behavioral Networks in Recommender Systems.
Ilham Esslimani, Armelle Brun, Anne Boyer
KIWI Team, Université Nancy 2, LORIA
615 rue du Jardin Botanique, 54600 Villers-lès-Nancy, France
{ilham.esslimani, armelle.brun, anne.boyer}@loria.fr
Abstract—Recommender systems are widely used to personalize
information on the web and in information retrieval systems.
Collaborative Filtering (CF) is the most popular recommendation
technique. However, classical CF systems use only direct links
and common features to model relationships between users. This
paper presents a new Collaborative Filtering approach (BNCF)
based on a behavioral network that uses navigational patterns to
model relationships between users and exploits social network
techniques, such as transitivity, to explore additional links
throughout the behavioral network. The aim is to involve these
new links in prediction generation in order to improve
recommendation quality. BNCF is evaluated in terms of accuracy on
a real usage dataset. The experimentation shows the benefit of
exploiting new links to compute predictions. Indeed, BNCF
substantially improves the accuracy of predictions, especially in
terms of HMAE.
I. INTRODUCTION
Social networks represent a social structure between actors,
mostly individuals or organizations. They indicate the ways in
which these actors are connected through various social
relationships such as friendship, co-working or information
exchange [1]. With the evolution of the web, social network
analysis is becoming increasingly relevant, since it aims at
understanding the evolution of interactions between users and
social flows.
The development of the web has also led to a proliferation of
information resources, which heightens the need for automatic
personalization of information. Recommender systems are widely
used for this purpose thanks to their ability to analyze users'
behaviors and guide them towards relevant resources that suit
their preferences.
Recommender systems use different input data to construct
user models in order to generate recommendations. The input
data can include content information [2], explicit data like
votes [3], demographic data [4], etc.
Collaborative Filtering (CF) is one recommendation tech-
nique that identifies relationships (similarities) between users,
based on their ratings in order to select neighbors and compute
predictions for the active users.
Despite the success of recommender systems and collaborative
filtering in many application areas, some research questions
remain open. Some of these questions concern the requirement of
explicit rating data to compute similarities between users. As
explicit rating data is not always available, one challenge for
recommender systems is to take into consideration another type
of data that efficiently represents users' behaviors. In this
context, usage traces can be a relevant source of data.
Another challenge for recommender systems is to model
relationships between users, not by using rating information or
social links, but navigational behaviors instead. Additionally,
in terms of modeling, one research problem consists in
identifying new links between users that are not directly
connected. Indeed, classical predictive systems exploit only
direct links and common preferences to compute recommendations.
In this context, to improve the quality of predictions, we
propose to use social network techniques, especially transitive
associations, to explore new links. New “neighbors” can then be
used to compute predictions.
Thus, the research problem we are interested in relates to two
main points:
• By using usage traces, how can we evaluate correlations
between users and how can we construct behavioral
networks?
• How can we introduce social network techniques in
order to model relationships between users and identify
new links in these behavioral networks?
In this paper, we explore these issues and propose a
promising model that we evaluate on a real usage dataset.
We suggest a new Behavioral Network based Collaborative
Filtering system (BNCF), that exploits navigational patterns
and transitive links to model users. To analyze behavior
similarities, we employ navigational patterns by taking into
account the longest common sub-sequences between pairs of
users. Then, users are modeled through a behavioral network
based on these navigational similarities.
In social networks, transitivity means “the friends of my
friends are my friends”. The transposition of this property
in behavioral networks implies “the users looking like those
who look like me, look like me”. The application of social
networks based techniques such as transitivity aims at refining
the behavioral network by additional links that can enhance
the performance of the CF system.
This paper is organized as follows. We describe in the
second part some research studies related to the analysis of
usage traces and graph based recommender systems. In the
third part of the paper, we present the BNCF approach that
exploits social networks techniques. The fourth part describes
the experimentation. Then, the results of the experimentation
are put forward in the fifth part and finally we present a
conclusion.
II. RELATED WORK
A. Analysis of Usage traces
Several studies describe the impact of usage traces on
the recommendation process in predictive and recommender
systems. These studies demonstrate how the analysis of
navigational activities can be a relevant method for modeling
users' behaviors and identifying their potential needs [6].
Note that the analysis of usage traces is mainly related
to the area of Web Usage Mining (WUM), which aims at
observing users' behaviors while they interact with a system.
This observation refers to direct traces such as explicit
ratings and annotations, or non direct traces such as
bookmarking, frequencies of visits, visited links, etc., from
which users' preferences can be inferred. The relevance of
considering usage traces in order to learn implicit votes is
presented in [7], where a metric is suggested to compute the
“page interest estimator” from non direct traces.
Frequent pattern mining, the Longest Common Subsequence
(LCS) technique and Markov models are some of the WUM
approaches that harness navigational activities in order to
analyze users' behaviors. Frequent pattern mining aims at
discovering time-ordered sequences that have been followed by
past users in order to predict future resources [8]. To discover
patterns from these traces, the process first pre-processes the
usage data [9], then runs the pattern discovery mechanism, which
finally allows the generation of recommendations. The
recommended resources are pages or resources that are frequently
accessed by related users (with common preferences). In [10] the
concept of “active session window” is used to generate
recommendations. The recommendation system relies on a hybrid
personalization model that switches between different
recommendation models according to the degree of connectivity
and the depth of the active user’s session.
Discovering Longest Common Subsequences (LCS) is another
technique that has been applied in the WUM domain in order to
analyze the potential links between navigational paths and user
profiles. This technique is a dynamic programming method that
identifies the longest common subsequence of two given
sequences. An LCS-based architecture is suggested in [11] for
classifying navigational patterns and generating predictions for
users. In [12] an algorithm based on the LCS technique is
proposed for clustering users by using their navigational data.
This clustering approach uses the similarities between two
navigational paths based on the LCS and the time spent on
resources contained in this LCS.
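As a minimal sketch of the dynamic programming idea behind LCS, the
following computes the length of the longest common subsequence of
two navigation paths. The function name `lcs_length` and the page
identifiers are illustrative assumptions; they are not taken from
[11] or [12].

```python
from typing import Hashable, Sequence


def lcs_length(a: Sequence[Hashable], b: Sequence[Hashable]) -> int:
    """Length of the longest common subsequence of a and b (classic DP)."""
    # prev[j] is the LCS length of the already processed prefix of a and b[:j].
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                # Matching elements extend the LCS of both shorter prefixes.
                curr.append(prev[j - 1] + 1)
            else:
                # Otherwise drop the last element of one of the sequences.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


# Hypothetical navigation paths (page identifiers are made up).
path_a = ["home", "news", "faq", "report"]
path_b = ["home", "faq", "search", "report"]
print(lcs_length(path_a, path_b))
```

The pages need not be visited consecutively: only their relative
order has to agree in both paths.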
Another approach that uses sequential links between navigational
activities is the Markov chain model. In [13], the sequential
dependencies of users' navigational behaviors are modeled by
Markov models: the conditional probability of a resource, given
the user's navigational trace, is computed.
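To illustrate the kind of conditional probability involved, here is
a small sketch that estimates first-order transition probabilities
P(next page | current page) from sessions. Restricting the model to
first order, the function name `transition_probabilities` and the
sample sessions are assumptions made for this example; [13] may use
a different model order or estimator.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List


def transition_probabilities(
    sessions: Iterable[List[str]],
) -> Dict[str, Dict[str, float]]:
    """Estimate P(next | current) from consecutive page pairs."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for session in sessions:
        # Count every observed transition current -> next.
        for current, nxt in zip(session, session[1:]):
            counts[current][nxt] += 1
    # Normalize the counts of each page into a probability distribution.
    return {
        page: {nxt: n / sum(followers.values()) for nxt, n in followers.items()}
        for page, followers in counts.items()
    }


# Hypothetical sessions (page identifiers are made up).
sessions = [["home", "news", "faq"], ["home", "news", "report"], ["home", "faq"]]
probs = transition_probabilities(sessions)
print(probs["home"])
```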
The common feature of all these WUM approaches is the
exploitation of usage traces to compute links, distances and
relationships between users based on commonly visited resources
in order to generate predictions.
B. Graph based Recommender Systems
With the rapid development of the web, social networks have
been the subject of several research studies. Most of them study
and analyze social network structures to represent interactions,
collaborations or influences between entities. [14] present a
measurement analysis of various online social networks by
studying the corresponding topological properties. [15] combine
usage traces and social networks: interactions between social
influence and similarity of activities are modeled in order to
predict future behaviors.
Considerable attention has also been devoted to the combination
of social networks and recommender systems. Some studies like
[16] incorporate social network information into Collaborative
Filtering. Nodes are consumers and links are the social
relationships among them. In order to identify neighbors,
distances between users in the social network are used. These
distances are computed by breadth-first search and are used to
calculate predictions.
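The distance computation mentioned above can be sketched as a plain
breadth-first search over an unweighted graph. The adjacency-list
representation, the function name `bfs_distances` and the toy graph
are assumptions for illustration; how [16] turns these distances
into predictions is not shown here.

```python
from collections import deque
from typing import Dict, Hashable, List


def bfs_distances(
    graph: Dict[Hashable, List[Hashable]], source: Hashable
) -> Dict[Hashable, int]:
    """Number of hops from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, []):
            if neighbor not in dist:
                # First visit is along a shortest path in an unweighted graph.
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


# Hypothetical social graph (undirected, stored in both directions).
social = {
    "a": ["b", "d"],
    "b": ["a", "c"],
    "c": ["b"],
    "d": ["a"],
    "e": [],
}
print(bfs_distances(social, "a"))
```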
However, most studies in the context of recommender systems do
not use social information directly, but rather apply
graph-based techniques to model users in order to make
predictions. A navigation graph based recommender system is
presented in [17]. A distance metric based on the maximum common
sub-graph is used to compute distances between nodes that
represent navigational sequences. These nodes are then clustered
by using the computed distances, and the recommender system
matches the active user to the nearest cluster and suggests
recommendations according to his navigational sequence. A
two-layer graph based recommender system that combines content
and collaborative approaches in the context of digital libraries
is proposed in [18]. Books and users represent layer nodes and
the edges represent transactions. Low-degree association (based
on content and collaborative similarity weights) and high-degree
association (based on the Hopfield spreading activation
algorithm) have been applied in order to improve recommendation
quality. Transitive relationships related to associative
retrieval approaches have been explored in the context of
Collaborative Filtering in order to alleviate the sparsity
problem [19]. The transitive associations among users are
identified based on spreading activation techniques.
III. BEHAVIORAL NETWORK BASED CF (BNCF)
In standard recommender systems, similarities between
users are evaluated based on common features such as ratings,
visited resources, etc. However, if two users do not share any
of these features, no link can be established between them.
To overcome this problem, we propose to use social network
techniques such as transitivity in order to identify new links
between users. The goal is to involve these new links in the
recommendation process in order to improve prediction accuracy.
We propose the construction of a behavioral network by
modeling links between users, based on their behavioral simi-
larities that are computed by using navigational patterns. Un-
like classical predictive systems based on navigational patterns
(presented in section II-A), we attempt to analyze behavioral
similarities between pairs of users by using usage traces. Then,
by applying transitivity, we aim at identifying new neighbors
that are strongly connected to the active user throughout
the network. Strong connections are then deduced from high
similarity values between the intermediate neighbors.
The following sections present in detail the different
mechanisms used by the BNCF system to generate predictions.
A. Construction of the Behavioral Network
In social network approaches, graph based models are
employed to model network structures based on social
information. In this paper, we introduce another type of
network, based on behavioral information, where users are not
necessarily connected in the real world as in social networks,
but are linked because they share similar navigational patterns.
The following section presents the technique that we employ
to assess navigational similarities between users in order to
construct the behavioral network.
1) Computing navigational similarities: As presented in
[20], we consider that two users u_a and u_b who share
common sequential patterns are highly similar. The longer
a common pattern is, the more similar the users are.
Therefore, our goal is to identify, for every pair of
users (u_a, u_b), the maximum length LKmax(u_a, u_b) of a
pattern among their common patterns. Then, the similarity of
navigation between two users is computed by using Equation
(1), which takes into account the following parameters:
• Common patterns between the active user u_a and the
neighbor user u_b.
• The maximum length of a common pattern between the
active user u_a and the neighbor user u_b.
• The maximum length of sessions.
This formula computes, for each pair of users u_a and
u_b, the similarity of navigation SimNav(u_a, u_b) as the
ratio of the maximum length of a common frequent pattern
LKmax(u_a, u_b) to the minimum of the maximum session sizes
of u_a and u_b, denoted SessMax(u_a) and SessMax(u_b). We
note that the common frequent pattern is intra-session.
$$\mathrm{SimNav}(u_a, u_b) = \frac{LK_{max}(u_a, u_b)}{\min(\mathrm{SessMax}(u_a), \mathrm{SessMax}(u_b))} \qquad (1)$$
We use the minimum of the maximum session sizes in the
denominator to avoid penalizing a new user who has only a few
short sessions. We note that the correlation value is normalized
between 0 and 1. This metric emphasizes the importance of the
longest frequent patterns when evaluating similarities between
users. The longer a sequential pattern is, the more similar the
users are.
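A rough sketch of Equation (1) follows. It assumes that a common
pattern is a common subsequence of one session of each user and it
ignores the frequency filtering implied by "frequent pattern"; both
are simplifications made for this example, and the names `sim_nav`
and `lcs_length` are hypothetical.

```python
from typing import List, Sequence


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence of two sessions."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def sim_nav(sessions_a: List[List[str]], sessions_b: List[List[str]]) -> float:
    """SimNav = LKmax / min(SessMax(a), SessMax(b)), as in Equation (1)."""
    if not sessions_a or not sessions_b:
        return 0.0
    # LKmax: longest common pattern over all pairs of sessions (intra-session).
    lk_max = max(lcs_length(sa, sb) for sa in sessions_a for sb in sessions_b)
    # SessMax: size of the longest session of each user.
    sess_max_a = max(len(s) for s in sessions_a)
    sess_max_b = max(len(s) for s in sessions_b)
    denom = min(sess_max_a, sess_max_b)
    return lk_max / denom if denom else 0.0


# Hypothetical users (page identifiers are made up).
user_a = [["p1", "p2", "p3", "p4"], ["p5", "p6"]]
user_b = [["p1", "p9", "p3"], ["p7"]]
print(sim_nav(user_a, user_b))
```

Here the longest common pattern is (p1, p3), the longest sessions
have 4 and 3 pages, and the similarity is therefore 2/3.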
Fig. 1. Identification of new neighbors in the Behavioral Network
2) Modeling the Behavioral Network: In order to model
links between users, we use a directed graph G = (V, E)
where the vertices V represent users, the edges E represent the
links between users and the navigational similarities (computed
in the previous step) represent the weights of the edges. We
employ the Floyd-Warshall algorithm in view of its efficiency
and its simple implementation. This algorithm computes the
shortest path between every pair of nodes u_a and u_b in the
graph by taking the weights into account. It checks whether a
shorter path from node u_a to node u_b exists via an
intermediate node u_r. At the end of the process, the matrix
contains the length of the shortest path from u_a to u_b. To use
this algorithm, we transform the navigational similarities into
distance-like values.
The computation of the shortest paths leads to the
identification of new links between users through transitive
relationships, as described in Figure 1. Initially, the active
user u_a has two direct neighbors, u_e and u_b. With the
transitive links, he can be connected to additional neighbors
such as user u_d through two possible paths, and the algorithm
selects the shortest one. This means that even if two users are
not similar in terms of navigation (they have not viewed the
same resources in the past) and are not directly connected in
the behavioral network, we can find strong links between them
thanks to the strong similarities of intermediate neighbors. The
smaller the distance is, the more similar these two users are.
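The shortest path step can be sketched as follows. The text does not
specify how similarities become distance-like values, so the sketch
assumes distance = 1 - similarity; the edge weights only loosely
mimic Figure 1 and are invented, as is the function name
`shortest_distances`.

```python
import math
from typing import Dict, List, Tuple


def shortest_distances(
    users: List[str], sims: Dict[Tuple[str, str], float]
) -> Dict[Tuple[str, str], float]:
    """All-pairs shortest distances with the Floyd-Warshall algorithm."""
    dist = {(u, v): (0.0 if u == v else math.inf) for u in users for v in users}
    for (u, v), s in sims.items():
        # Assumed transform: a high similarity gives a small distance.
        dist[(u, v)] = min(dist[(u, v)], 1.0 - s)
    for r in users:          # candidate intermediate node u_r
        for a in users:
            for b in users:
                via = dist[(a, r)] + dist[(r, b)]
                if via < dist[(a, b)]:
                    dist[(a, b)] = via
    return dist


# Directed edges with made-up similarities: ua reaches ud via ub or ue.
users = ["ua", "ub", "ue", "ud"]
sims = {
    ("ua", "ub"): 0.9,
    ("ub", "ud"): 0.8,
    ("ua", "ue"): 0.6,
    ("ue", "ud"): 0.7,
}
dist = shortest_distances(users, sims)
print(dist[("ua", "ud")])
```

With these weights the path through ub (about 0.1 + 0.2) is
preferred to the path through ue (about 0.4 + 0.3), so ud becomes a
new, non direct neighbor of ua.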
This step allows the discovery of new potential neighbors
of active users that are not direct neighbors. These neighbors
are then involved in the prediction stage, with the objective of
improving the performance of the recommender system.
B. Prediction generation
1) Estimating ratings from usage traces: As ratings are
required in the prediction step, we employ usage traces to
estimate them, as mentioned in section II-A. We choose two
implicit parameters: the frequency of visiting a resource and
the duration of visiting a resource. Considering an active user
u_a, the frequency of visiting an item i_k is the ratio of the
number of visits of i_k, N(u_a, i_k), to the average number of
visits on all items I, N(u_a, I), as described in Equation (2).
$$\mathrm{Frequency}(u_a, i_k) = \frac{N(u_a, i_k)}{N(u_a, I)} \qquad (2)$$
As regards duration, it is computed as the ratio of the duration
of visiting an item ik (Drt(ua,ik)) and the total duration of
visiting all items I (Drt(ua,I)) as presented in Equation (3).
$$\mathrm{Duration}(u_a, i_k) = \frac{Drt(u_a, i_k)}{Drt(u_a, I)} \qquad (3)$$
Once frequencies and durations are calculated, we use
Equation (4), suggested by [21], in order to compute and
normalize our ratings according to the rating scale.
$$f_{Transf}(u_a, i_k) = V_{min} + \left( \frac{\sum_{c} p(c) \, c(u_a, i_k)}{\sum_{c} p(c)} \times \frac{V_{max} - V_{min}}{c_{max}} \right) \qquad (4)$$
fTransf(u_a, i_k) represents the transformation function of the
rating of u_a on item i_k. Vmin and Vmax are respectively the
minimum and maximum possible ratings according to the rating
scale. p(c) denotes the weight assigned to the criterion c
(Frequency and Duration in our case), c(u_a, i_k) is the value
of the criterion and cmax denotes the maximum value of the
criterion.
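Equations (2) to (4) can be sketched as below. Several choices are
assumptions made for the example: the average number of visits is
taken over the items the user has visited, the weights p(c) are set
to 1, and cmax is passed in by the caller because the text does not
say how it is obtained. All function names and values are
hypothetical.

```python
from typing import Dict


def frequency(visits: Dict[str, int], item: str) -> float:
    """Eq. (2): visits of the item over the user's average visits per item."""
    average = sum(visits.values()) / len(visits)
    return visits[item] / average


def duration(durations: Dict[str, float], item: str) -> float:
    """Eq. (3): time spent on the item over the time spent on all items."""
    return durations[item] / sum(durations.values())


def f_transf(
    criteria: Dict[str, float],
    weights: Dict[str, float],
    c_max: float,
    v_min: float = 1.0,
    v_max: float = 5.0,
) -> float:
    """Eq. (4): weighted mean of the criteria rescaled by (v_max - v_min) / c_max."""
    weighted = sum(weights[c] * criteria[c] for c in criteria)
    mean = weighted / sum(weights[c] for c in criteria)
    return v_min + mean * (v_max - v_min) / c_max


# Made-up usage trace of one user on three items.
visits = {"i1": 4, "i2": 1, "i3": 1}
seconds = {"i1": 60.0, "i2": 30.0, "i3": 30.0}
criteria = {"frequency": frequency(visits, "i1"), "duration": duration(seconds, "i1")}
weights = {"frequency": 1.0, "duration": 1.0}
rating = f_transf(criteria, weights, c_max=2.0)
print(rating)
```

In this example the frequency is 2.0, the duration is 0.5, their
weighted mean is 1.25 and, with cmax = 2.0 on a [1, 5] scale, the
estimated rating is 3.5.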
2) Computing predictions: Once the links (shortest
paths) between the active user and other users are computed,
the similarities are deduced from normalized distance-like
values. Then, predictions are calculated by using the weighted
average prediction formula used by CF [22]. We select nearest
neighbors u_b (direct and non direct) in the behavioral network
that have already rated the active item i_k.
$$\mathrm{Pred}(u_a, i_k) = V(u_a) + \frac{\sum_{u_b} \mathrm{Sim}(u_a, u_b) \, (V(u_b, i_k) - V(u_b))}{\sum_{u_b} \mathrm{Sim}(u_a, u_b)} \qquad (5)$$
Here, Pred(u_a, i_k) represents the prediction of the rating of
the active user on i_k and Sim(u_a, u_b) the similarity value
between u_a and u_b based on the shortest path between them.
Items that will be recommended to the active user are the ones
with high predicted rating values.
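The weighted average of Equation (5) can be sketched as follows. The
sketch assumes that V(u) denotes the mean rating of user u, as in
the usual CF formulation; the text does not define it explicitly.
The function name `predict` and the ratings are made up.

```python
from typing import Dict, Optional


def predict(
    active_mean: float,
    neighbors: Dict[str, float],
    ratings: Dict[str, Dict[str, float]],
    item: str,
) -> Optional[float]:
    """Eq. (5): active user's mean plus similarity-weighted deviations."""
    numerator = 0.0
    denominator = 0.0
    for neighbor, sim in neighbors.items():
        user_ratings = ratings.get(neighbor, {})
        if item not in user_ratings:
            continue  # only neighbors who rated the active item contribute
        # Assumption: V(u_b) is the neighbor's mean rating.
        neighbor_mean = sum(user_ratings.values()) / len(user_ratings)
        numerator += sim * (user_ratings[item] - neighbor_mean)
        denominator += sim
    if denominator == 0:
        return None  # no usable neighbor, so no prediction
    return active_mean + numerator / denominator


# Similarities of the active user to a direct (ub) and a non direct (ud) neighbor.
neighbors = {"ub": 0.8, "ud": 0.4}
ratings = {"ub": {"i1": 5.0, "i2": 3.0}, "ud": {"i1": 2.0, "i2": 4.0}}
pred = predict(3.0, neighbors, ratings, "i1")
print(pred)
```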
IV. EXPERIMENTATION
A. Datasets
In order to evaluate the performance of our CF system, we
use real usage datasets extracted from the intranet of Credit
Agricole Banking Group, in particular the usage data relating
to the Department of Strategies and Technology Watch. All
the users are members of the Group and can access various
kinds of information such as news, articles, FAQs, special
reports, etc. This intranet contains numerous resources and web
pages. Therefore, the purpose of integrating a recommender
system is to guide users towards relevant resources
corresponding to their profiles.
Thus, to train our model, we use the usage data that
reflects the navigational activities of users. This data was
collected over 24 months and stored in server log files.
It contains mainly information about anonymous user-ids,
session-ids and session start times. The selected dataset
covers 748 users and 3856 resources. It has been split
into 80% and 20%, corresponding respectively to the training and
test datasets, by taking into account the temporal dimension.
We fixed the rating scale to [1 − 5] for estimated ratings.
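One possible way to split usage data along the temporal dimension is
sketched below: events are sorted by time and the earliest 80% form
the training set. The exact procedure used for the dataset is not
described, so this is only an assumed reading; the event layout
(timestamp, user id, resource id) and the name `temporal_split` are
hypothetical.

```python
from typing import List, Tuple

Event = Tuple[float, str, str]  # (timestamp, user_id, resource_id), assumed layout


def temporal_split(
    events: List[Event], train_ratio: float = 0.8
) -> Tuple[List[Event], List[Event]]:
    """Oldest events go to training, most recent ones to testing."""
    ordered = sorted(events, key=lambda e: e[0])
    cut = int(len(ordered) * train_ratio)
    return ordered[:cut], ordered[cut:]


# Ten made-up log events with shuffled timestamps.
events = [(float(t), f"u{t % 3}", f"r{t}") for t in [5, 1, 9, 3, 7, 2, 10, 4, 8, 6]]
train, test = temporal_split(events)
print(len(train), len(test))
```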
B. Evaluation
Different evaluation metrics can be used in the
experimentation of recommender systems. The most important
criterion in recommender systems is precision, which measures
the accuracy of recommendations compared to real votes. As a
measure of precision, we used the Mean Absolute Error (MAE).
This metric computes the mean of the absolute error between
predicted ratings and the real ratings that are actually
assigned by users.
$$\mathrm{MAE} = \frac{\sum_{i=1}^{n} |v(i) - \mathrm{Pred}(u_a, i)|}{n} \qquad (6)$$
where |v(i) − Pred(u_a, i)| denotes the absolute error between
a real vote v(i) and a predicted vote Pred(u_a, i) concerning an
active item i, and n represents the number of items in the test
dataset.
Since items that have high prediction values are the ones
that are recommended to users, we also use the HMAE (High
MAE) metric [23] to evaluate the performance of the model.
The HMAE is similar to MAE but it considers only items that
are predicted with a value of 4 or 5. In our experimentation,
we choose the HMAE metric to measure how well our system is
able to recommend relevant items to active users.
For both metrics, the lower the MAE and HMAE values
are, the more accurate the generated recommendations are.
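Both metrics can be sketched in a few lines. The sketch treats
"predicted with a value of 4 or 5" as "predicted value of at least
4", which is an assumption since predictions are generally not
integers; the function names and the sample votes are made up.

```python
from typing import List, Optional, Tuple

Pair = Tuple[float, float]  # (real vote, predicted vote)


def mae(pairs: List[Pair]) -> Optional[float]:
    """Mean absolute error between real and predicted votes."""
    if not pairs:
        return None
    return sum(abs(v - p) for v, p in pairs) / len(pairs)


def hmae(pairs: List[Pair], high: float = 4.0) -> Optional[float]:
    """MAE restricted to items whose predicted value is high."""
    return mae([(v, p) for v, p in pairs if p >= high])


# Made-up (real, predicted) votes for four test items.
pairs = [(5.0, 4.5), (4.0, 4.0), (2.0, 3.0), (1.0, 2.5)]
print(mae(pairs), hmae(pairs))
```

On this toy data the MAE is 0.75 while the HMAE, computed on the two
items predicted at 4 or above, is 0.25.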
V. RESULTS
In order to analyze the performance of BNCF, we evaluate
the precision of generated predictions in terms of MAE and
HMAE. The BNCF accuracy is compared to Classical CF used
by standard recommender systems and to the Navigational
based CF presented in [24], where only direct neighbors (that
share similar navigational patterns with the active user) are
integrated to generate predictions.
The goal of this evaluation is to study the impact of using
both neighborhoods generated by the navigational technique
and the transitive associations.
We note that, before the computation of predictions, in order
to select reliable similarity values, we used at the same time
(for all the models) two criteria to select the nearest neighbors
of the active user:
• The threshold (related to the similarity value) has been
set to 0.2.
• The minimum number of co-rated items between the
active user and other users has been set to 20 [25].
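The two selection criteria can be applied together as in the sketch
below. Whether the thresholds are inclusive is not stated, so the
use of ">=" is an assumption, and the function name
`select_neighbors` and the sample values are hypothetical.

```python
from typing import Dict


def select_neighbors(
    sims: Dict[str, float],
    co_rated: Dict[str, int],
    sim_threshold: float = 0.2,
    min_co_rated: int = 20,
) -> Dict[str, float]:
    """Keep users that pass both the similarity and the co-rating criteria."""
    return {
        user: sim
        for user, sim in sims.items()
        if sim >= sim_threshold and co_rated.get(user, 0) >= min_co_rated
    }


# Made-up similarities and co-rated item counts for three candidates.
sims = {"ub": 0.9, "uc": 0.15, "ud": 0.5}
co_rated = {"ub": 35, "uc": 40, "ud": 12}
print(select_neighbors(sims, co_rated))
```

Only ub is kept: uc fails the similarity threshold and ud has too
few co-rated items.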
Note that applying transitivity to the studied dataset
increases the number of neighbors identified by BNCF by about
4% compared to the Navigational based CF.
TABLE I
MAE VALUES OF COMPARED MODELS
CF Models                MAE
Classical CF             0.763
Navigational based CF    0.789
BNCF                     0.782
TABLE II
HMAE VALUES OF COMPARED MODELS
CF Models                HMAE
Classical CF             0.541
Navigational based CF    0.501
BNCF                     0.468
A. MAE
Table I presents the MAE values related to the Classical CF,
the Navigational based CF and BNCF.
We can first notice that accuracy decreases only slightly (MAE
of 0.763 versus 0.789) when we compare Classical CF (which
exploits ratings to evaluate similarity values) to the
Navigational based CF (which uses navigational patterns to
evaluate similarity values). This supports the idea that
navigational patterns are almost as informative as rating data
and may contain complementary information to ratings in order to
evaluate correlations between users.
Besides, the performance of BNCF is approximately similar to
that of the Navigational based CF. The use of additional links
leads to a slight improvement (0.782 versus 0.789) compared to
the use of only direct links.
B. HMAE
Let us recall that only items with high prediction values
are suggested by recommender systems to the active user.
Thus, we are interested in evaluating the performance of
studied CF models while generating high predictions. The
results of this evaluation are presented in Table II.
We can first notice that the Navigational based CF reaches
a better HMAE than the Classical CF. This means that the use of
navigational patterns leads to an improvement of HMAE.
Moreover, the application of transitivity in order to discover
new neighbors contributes to an important improvement of
accuracy. Indeed, we observe that BNCF outperforms the Classical
CF by 13% and the Navigational based CF by 7% in terms of HMAE,
contrary to MAE.
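As a quick arithmetic check, the reported percentages are consistent
with relative reductions of the HMAE values in Table II. Reading
them as relative reductions is an assumption, since the text does
not state how they were computed; the function name
`relative_improvement` is hypothetical.

```python
def relative_improvement(baseline: float, value: float) -> float:
    """Relative reduction of an error metric with respect to a baseline."""
    return (baseline - value) / baseline


# HMAE values from Table II.
classical, navigational, bncf = 0.541, 0.501, 0.468

vs_classical = relative_improvement(classical, bncf)        # about 0.135
vs_navigational = relative_improvement(navigational, bncf)  # about 0.066
print(f"{vs_classical:.1%} {vs_navigational:.1%}")
```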
This improvement can be explained by the fact that when
transitivity is applied, users are not connected only according
to the way they have similarly rated commonly seen items, as
in Classical CF. Users are also linked when they share common
neighbors. Moreover, this approach has the advantage of
considering not only commonly rated items, but all the items
users have rated.
VI. CONCLUSION
In this paper, we presented a new CF approach based on
a behavioral network that exploits navigational patterns and
transitive links to explore associations between users.
Contrary to Classical CF systems, no explicit preferences
need to be provided by users. Preferences are inferred from
users' navigational activities. Besides, unlike classical usage
predictive systems, BNCF is user-based and attempts to harness
usage traces in order to identify behavioral similarities
between users. The longer the navigational patterns two users
share, the more they are correlated.
Then, these behavioral similarities are employed to model
relationships between users throughout a behavioral network.
The objective is to discover new neighbors thanks to the use of
social network techniques such as transitive links. The newly
identified neighbors are then involved in the recommendation
process with the objective of improving prediction accuracy.
BNCF has been evaluated both in terms of MAE and HMAE
and has been compared to other CF models in order to study
the impact of the navigational patterns and the transitive links
on accuracy.
The experimentation shows the relevance of integrating both
direct and non direct neighbors identified in the behavioral
network, on the accuracy of recommendations in terms of
HMAE.
As future work, we intend to exploit additional techniques
used in social networks and other algorithms developed in
associative information retrieval, such as spreading activation
techniques, and evaluate the impact of their combination with
the navigational based CF. Besides, we plan to study the
combination of social networks and behavioral networks and
examine its effect on recommendation precision.
VII. ACKNOWLEDGMENT
We would like to thank Mr. Jean Philippe Blanchard and
acknowledge the financial support to this project provided by
the Credit Agricole Banking Group.
REFERENCES
[1] M. Jamali and H. Abolhassani, “Different aspects of social network
analysis,” in Proceedings of the 2006 IEEE/WIC/ACM International
Conference on Web Intelligence, 2006.
[2] R. Burke, “Hybrid recommender systems: Survey and experiments,”
User Modeling and User-Adapted Interaction, vol. 12, no. 4, pp. 331–
370, 2002.
[3] M. Claypool, P. Le, M. Waseda, and D. Brown, “Implicit interest indi-
cators,” in Proceedings of ACM Intelligent User Interfaces Conference,
2001.
[4] M. Vozalis and K. Margaritis, “On the enhancement of collaborative
filtering by demographic data,” Web Intelligence and Agent Systems: An
International Journal (WIAS), vol. 4, no. 2, pp. 117–138, 2006.
[5] G. Adomavicius and A. Tuzhilin, “Toward the next generation of
recommender systems: A survey of the state-of-the-art,” IEEE Transactions
on Knowledge and Data Engineering, vol. 17, no. 6, pp. 734–749, 2005.
[6] S. Anand and B. Mobasher, “Intelligent techniques for web personal-
ization,” Lecture Notes in Artificial Intelligence, vol. 3169, pp. 1–36,
2005.
[7] P. Chan, “A non-invasive learning approach to building user profiles,”
Web Usage Analysis and User Profiling, 1999.
[8] M. Gery and H. Haddad, “Evaluation of web usage mining approaches
for user’s next request prediction,” in Proceedings of the 5th ACM
international workshop on Web information and data management.
ACM Press, 2003.