Interweaving Public User Profiles on the Web
Fabian Abel, Nicola Henze, Eelco Herder, Daniel Krause
IVS – Semantic Web Group & L3S Research Center, Leibniz University Hannover,
Germany
{abel,henze,herder,krause}@l3s.de
Abstract. While browsing the Web, providing profile information in social networking services, or tagging pictures, users leave a plethora of traces. In this paper, we analyze the nature of these traces. We investigate how user data is distributed across different Web systems, and examine ways to aggregate user profile information. Our analyses focus on both explicitly provided profile information (name, homepage, etc.) and activity data (tags assigned to bookmarks or images). The experiments reveal significant benefits of interweaving profile information: more complete profiles, advanced FOAF/vCard profile generation, disclosure of new facets about users, a higher level of self-information induced by the profiles, and higher precision for predicting tag-based profiles to solve the cold start problem.
1 Introduction
In order to adapt functionality to individual users, systems need information about their users [1]. The Web provides opportunities to gather such information: users leave a plethora of traces on the Web, varying from profile data to tags. In this paper we analyze the nature of these distributed user data traces and investigate the advantages of interweaving publicly available profile data originating from different sources: social networking services (Facebook, LinkedIn), social media services (Flickr, Delicious, StumbleUpon, Twitter), and others (Google).
The main research question that we answer in this paper is the following: what are the benefits of aggregating these public user profile traces?
In our experiments we analyze the characteristics of both traditional profiles, which end-users fill in explicitly with information such as their names, skills, or homepages (see Section 3), and rather implicitly generated tag-based profiles (see Section 4). We show that the aggregation of profile data reveals new facets about the users, and we present approaches to leverage the additional information gained by profile aggregation. We have made all approaches and findings presented in this paper publicly available via the Mypes¹ service: it enables users to inspect their distributed profiles and provides access to the aggregated and semantically enriched profiles via a RESTful API.
¹ http://mypes.groupme.org/
2 Related Work
Connecting data from different sources and services is in line with today's Web 2.0 trend of creating mashups of various applications [2]. Support for the development of interoperable services is provided by initiatives such as the DataPortability project², by the standardization of APIs (e.g., OpenSocial) and of authentication and authorization protocols (e.g., OpenID, OAuth), as well as by (Semantic) Web standards such as RDF, RSS, and specific Microformats. Further, connecting distributed user profiles, including social connections, becomes easier due to the increasing adoption of standards like FOAF [3], SIOC³, or GUMO [4]. Conversion approaches allow for flexible user modeling [5]. Solutions for user identification form the basis for personalization across application boundaries [6]. Google's Social Graph API⁴ enables application developers to obtain the social connections of an individual user across different services. Generic user modeling servers such as CUMULATE [7] or Personis [8], as well as frameworks for mashing up profile information [9], facilitate the handling of aggregated user data. Given these developments, it becomes increasingly important to investigate the benefits of user profile aggregation in the context of today's Web.
In [10], Szomszor et al. present an approach to combine profiles generated in two different tagging platforms to obtain richer interest profiles; Stewart et al. demonstrate the benefits of combining blogging data and tag assignments from Last.fm to improve the quality of music recommendations [11]. In this paper we not only analyze the benefits of aggregating tag-based user profiles [12, 13], which we enrich with WordNet⁵ facets, but also consider explicitly provided profiles coming from five different social networking and social media services.
3 Traditional Profile Data on the Web
Currently, users need to enter their profile attributes manually in each separate Web system. These attributes, such as the user's full name, current affiliations, or place of residence, are particularly important for social networking services such as LinkedIn or Facebook, but may be considered less important in services such as Twitter. In our analysis, we measure the degree to which users fill in their profile attributes in different services. To investigate the benefits of profile aggregation, we address the following questions.
1. How completely do users fill in their public profiles at social networking and social media services?
2. Does the aggregated user profile reveal more information about a particular user than the profile created in a specific service?
3. Can the aggregated profile data be used to enrich an incomplete profile in an individual service?
4. To what extent can the service-specific profiles and the aggregated profile be applied to fill in standardized profiles such as FOAF [3] and vCard [14]?

² http://www.dataportability.org/
³ http://rdfs.org/sioc/spec/
⁴ http://socialgraph.apis.google.com
⁵ http://wordnet.princeton.edu/
3.1 Dataset
To answer the questions above, we crawled the public profiles of 116032 distinct users via the Social Graph API. People who have a Google account can explicitly link their different accounts and Web sites; the Social Graph API allows developers to look up the different accounts of a particular user. On average, the 116032 users linked 1.26 accounts, while 70963 did not link any account.
For our analysis of traditional profiles, we were interested in popular services where users can have public profiles. We therefore focused on the social networking services Facebook and LinkedIn, as well as on Twitter, Flickr, and Google. Figure 1(a) lists the number of public profiles and the concrete profile attributes we obtained from each service. We did not consider private information, but only crawled attributes that were publicly available. Among the users whose Facebook, LinkedIn, Twitter, Flickr, and Google profiles we crawled, 338 users had an account at all five services.
3.2 Individual Profiles and Profile Aggregation
The completeness of the profiles varies from service to service. The public profiles in the social networking sites Facebook and LinkedIn are filled in more completely than the Twitter, Flickr, or Google profiles (see Figure 1(b)). Although Twitter does not ask for many profile attributes, users filled in only 48.9% of their Twitter profile on average. In particular, the location and the homepage (which can also be a URL to another profile page, such as MySpace) are omitted most often. By contrast, the average Facebook and LinkedIn profiles are filled in to 85.4% and 82.6%, respectively. Some user data is replicated at multiple services: name and profile picture are specified at nearly all services, and location was provided at 2.9 out of five services on average. However, the data contains inconsistencies: for example, 37.3% of the users' full names in Facebook are not exactly the same as the ones specified at Twitter.
For each user, we aggregated the public profile information from Facebook, LinkedIn, Twitter, Flickr, and Google; that is, we gathered attribute-value pairs and mapped them to a uniform user model. Aggregated profiles reveal more facets (17 distinct attributes) about the users than the public profiles available in each separate service. On average, the completeness of the aggregated profile is 83.3%: more than 14 attributes are filled with meaningful values. In comparison, the corresponding numbers are 7.6 for Facebook, 8.2 for LinkedIn, and 3.3 for Flickr. Aggregated profiles therefore reveal significantly more information about the users than the public profiles of the single services.
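The aggregation step can be illustrated with a minimal sketch. It assumes that each crawled profile arrives as a flat dictionary of raw field names; the FIELD_MAP below, the uniform attribute names, and the "first non-empty value wins" rule are hypothetical choices for illustration, not the mapping used by the authors or by Mypes.

```python
from typing import Dict, List

# Hypothetical mapping from raw, service-specific field names to attributes of a
# uniform user model. Field names are illustrative, not real API export formats.
FIELD_MAP: Dict[str, Dict[str, str]] = {
    "twitter": {"screen_name": "nickname", "name": "full_name",
                "url": "homepage", "location": "location"},
    "linkedin": {"firstName": "first_name", "lastName": "last_name",
                 "industry": "industry", "location": "location"},
}


def aggregate(profiles: Dict[str, Dict[str, str]]) -> Dict[str, str]:
    """Map every service profile onto the uniform model.

    Services are visited in dictionary order; for each uniform attribute the
    first non-empty value wins (an assumed conflict-resolution policy).
    """
    merged: Dict[str, str] = {}
    for service, fields in profiles.items():
        mapping = FIELD_MAP.get(service, {})
        for raw_key, value in fields.items():
            attr = mapping.get(raw_key)
            if attr and value.strip() and attr not in merged:
                merged[attr] = value.strip()
    return merged


def completeness(profile: Dict[str, str], attributes: List[str]) -> float:
    """Fraction of the considered attributes that carry a non-empty value."""
    if not attributes:
        return 0.0
    filled = sum(1 for a in attributes if profile.get(a, "").strip())
    return filled / len(attributes)


profiles = {
    "twitter": {"screen_name": "jdoe", "name": "Jane Doe", "url": "", "location": ""},
    "linkedin": {"firstName": "Jane", "lastName": "Doe",
                 "industry": "Research", "location": "Hannover"},
}
merged = aggregate(profiles)
print(merged)
print(completeness(merged, ["nickname", "full_name", "homepage", "location"]))  # 0.75
```

Because the crawled data contains inconsistent values (e.g., differing full names on Facebook and Twitter), a real aggregator would likely need a more deliberate conflict policy than first-wins, for example preferring the most frequent value across services.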
Further, profile aggregation enables completion of the profiles available at the specific services. For example, by enriching the incomplete Twitter profiles with information gathered from the other services, the completeness increases to more than 98% (see Figure 1(b)): profile fields that are often left blank, such as location and homepage, can be obtained from the social networking sites. Moreover, even the rather complete Facebook and LinkedIn profiles benefit from profile aggregation: LinkedIn profiles can, on average, be improved by 7%, even though LinkedIn provides three attributes (interests, education, and industry) that are not in the public profiles of the other services (cf. Figure 2(a)).
In summary, profile aggregation results in an extensive user profile that reveals more information than the profiles at the individual services. Moreover, aggregation can be used to fill in missing attributes at the individual services.

(a) Profile attributes
Service    # crawled profiles   Crawled profile attributes
Facebook   3080                 nickname, first/last/full name, photo, email (hash), homepage, locale settings, affiliations
LinkedIn   3606                 nickname, first/last/full name, about, homepage, location, interests, education, affiliations, industry
Twitter    1538                 nickname, full name, photo, homepage, blog, location
Flickr     2490                 nickname, full name, photo, email, location
Google     15947                nickname, full name, photo, about, homepage, blog, location

(b) Completing service profiles [bar chart not reproduced]: completeness of each service profile using only the information available in the individual service versus after enrichment with the aggregated profile; number of considered profile attributes: Twitter (6), Google (7), Flickr (5), LinkedIn (10), Facebook (9).

Fig. 1. Service profiles: (a) number of public profiles as well as the profile attributes that were crawled from the different services and (b) completing service profiles with aggregated profile data. Only the 338 users who have an account at each of the listed services are considered.
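Completing a single service profile from the aggregated one can be sketched as follows. The sketch assumes that the service profile and the aggregated profile share attribute names; the helper names and the rule that user-entered values are never overwritten are illustrative assumptions. The Twitter attribute list is taken from Figure 1(a).

```python
from typing import Dict, List


def completeness(profile: Dict[str, str], attributes: List[str]) -> float:
    """Fraction of the considered attributes that carry a non-empty value."""
    if not attributes:
        return 0.0
    return sum(1 for a in attributes if profile.get(a, "").strip()) / len(attributes)


def enrich(service_profile: Dict[str, str], aggregated: Dict[str, str],
           attributes: List[str]) -> Dict[str, str]:
    """Fill empty attributes of a service profile from the aggregated profile.

    Only attributes the service supports are touched, and values the user
    already entered are never overwritten.
    """
    enriched = dict(service_profile)
    for attr in attributes:
        if not enriched.get(attr, "").strip() and aggregated.get(attr, "").strip():
            enriched[attr] = aggregated[attr]
    return enriched


# Twitter's six considered attributes, as listed in Figure 1(a).
TWITTER_ATTRS = ["nickname", "full_name", "photo", "homepage", "blog", "location"]

twitter = {"nickname": "jdoe", "full_name": "Jane Doe", "photo": "http://example.org/jd.png"}
aggregated = {"homepage": "http://example.org/~jane", "location": "Hannover",
              "industry": "Research"}

before = completeness(twitter, TWITTER_ATTRS)
after_profile = enrich(twitter, aggregated, TWITTER_ATTRS)
after = completeness(after_profile, TWITTER_ATTRS)
print(before, after)  # 0.5 0.8333333333333334 (blog is still missing)
```

Restricting enrichment to attributes the service supports keeps attributes such as industry out of the Twitter profile; note that a field a user deliberately left blank would still be filled, which is exactly the privacy risk discussed in Section 3.4.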
3.3 FOAF and vCard Generation
In most Web 2.0 services, user profiles are primarily intended to be presented to other end-users. However, it is also possible to use the profile data to generate FOAF [3] profiles or vCard [14] entries that can be fed into applications such as Outlook, Thunderbird, or FOAF Explorer.
Figure 2(a) lists the attributes each service can contribute to fill in a FOAF or vCard profile, if the corresponding fields are filled out by the user. Figure 2(b) shows to which degree the real service profiles of the 338 considered users can actually be applied to fill in the corresponding attributes with adequate values.
Using the aggregated profile data of the users, it is possible to generate FOAF profiles and vCard entries with an average completeness of more than 84% and 88%, respectively; the corresponding attributes are listed in Figure 2(a). Google, Flickr, and Twitter profiles provide much less information applicable to filling in the FOAF and vCard details. Although Facebook and LinkedIn both provide seven attributes that can potentially be applied to generate the vCard profile, the actual LinkedIn user profiles turn out to be more valuable and produce vCard entries with an average completeness of 45%; using Facebook as a data source, this is only 34%. In summary, the aggregated profiles are a far better source of information for generating FOAF/vCard entries than the service-specific profiles.

(a) Services and available attributes [matrix not reproduced]: for each attribute (nickname, first name, last name, full name, profile photo, about, email, homepage, blog, location, locale settings, interests, education, affiliations, industry), the matrix marks whether it can be applied to vCard and FOAF and which services (Fa, L, T, Fl, G) provide it.

(b) Completing FOAF/vCard profiles [bar chart not reproduced]: completeness of FOAF and vCard profiles per service; number of attributes applicable to FOAF/vCard: Twitter (4/5), Flickr (4/5), Google (4/5), Facebook (6/7), LinkedIn (8/6), Aggregated (11/11).

Fig. 2. FOAF/vCard profile generation: (a) services and attributes available in the public profiles of Facebook (Fa), LinkedIn (L), Twitter (T), Flickr (Fl), and Google (G) that can be applied to fill in a FOAF profile or a vCard entry and (b) completing FOAF and vCard profiles with the actual user profiles.
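The generation step could look like the following minimal sketch, which turns a uniform profile (a dictionary with hypothetical attribute names) into a vCard 3.0 entry in the format of RFC 2426 and reports the share of considered fields it could fill, analogous to the completeness values in Figure 2(b). The attribute-to-vCard mapping is an illustrative assumption, not the exact mapping of Figure 2(a).

```python
from typing import Dict, List, Tuple

# Assumed mapping from uniform profile attributes to vCard 3.0 (RFC 2426) types:
# (attribute, vCard type, value is free text that needs escaping).
# Illustrative subset only; not the exact attribute set of Figure 2(a).
VCARD_FIELDS: List[Tuple[str, str, bool]] = [
    ("nickname", "NICKNAME", True),
    ("photo", "PHOTO;VALUE=uri", False),
    ("email", "EMAIL;TYPE=internet", True),
    ("homepage", "URL", False),
    ("affiliations", "ORG", True),
]


def _escape(value: str) -> str:
    """Escape characters that are special in vCard text values."""
    return (value.replace("\\", "\\\\").replace(",", "\\,")
            .replace(";", "\\;").replace("\n", "\\n"))


def to_vcard(profile: Dict[str, str]) -> Tuple[str, float]:
    """Build a vCard 3.0 entry and report the share of considered fields filled."""
    first = profile.get("first_name", "").strip()
    last = profile.get("last_name", "").strip()
    full = profile.get("full_name", "").strip() or " ".join(p for p in (first, last) if p)
    lines = ["BEGIN:VCARD", "VERSION:3.0",
             f"N:{_escape(last)};{_escape(first)};;;",  # N and FN are mandatory in 3.0
             f"FN:{_escape(full)}"]
    filled = 1 if full else 0
    for attr, vtype, is_text in VCARD_FIELDS:
        value = profile.get(attr, "").strip()
        if value:
            filled += 1
            lines.append(f"{vtype}:{_escape(value) if is_text else value}")
    lines.append("END:VCARD")
    # Lines are CRLF-separated; folding of long lines is omitted in this sketch.
    return "\r\n".join(lines) + "\r\n", filled / (1 + len(VCARD_FIELDS))


card, share = to_vcard({"first_name": "Jane", "last_name": "Doe", "nickname": "jdoe",
                        "homepage": "http://example.org/~jane",
                        "affiliations": "L3S, Hannover"})
print(card)
print(share)
```

The name counts as one filled field, so this example fills four of six considered fields; a FOAF export would follow the same pattern with FOAF properties instead of vCard types.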
3.4 Synopsis
Our analysis of the user profiles distributed across the different services points out several advantages of profile aggregation and motivates the interweaving of profiles on the Web. With respect to the key questions raised at the beginning of the section, the main outcomes can be summarized as follows.
1. Users fill in their public profiles at social networking services (Facebook, LinkedIn) more extensively than their profiles at social media services (Flickr, Twitter), which may be explained by the different purposes of these systems.
2. Profile aggregation provides multi-faceted profiles that reveal significantly more information about the users than individual service profiles can provide.
3. The aggregated user profile can be used to enrich incomplete profiles in individual services.
4. Service-specific profiles as well as the aggregated profiles can be applied to generate FOAF profiles and vCard entries. The aggregated profile is the most useful one, as it completes the FOAF profiles and vCard entries to 84% and 88%, respectively.
As user profiles distributed on the Web describe different facets of the user, profile aggregation brings some advantages: users do not have to fill in their profiles over and over again, and applications can make use of more and richer facets or attributes of the user (e.g., for personalization purposes). However, our analysis also shows the risk of interweaving user profiles. For example, users who deliberately leave out some fields when filling in their Twitter profile might not be aware that the corresponding information can be gathered from other sources.
                          Flickr   StumbleUpon   Delicious   Overall
tag assignments           3781     12747         61884       78412
distinct tags             691      2345          11760       13212
tag assignments per user  27.2     91.71         445.21      564.12
distinct tags per user    5.22     44.42         165.83      71.82
Table 1. Tagging statistics of the 139 users who have an account at Flickr, StumbleUpon, and Delicious.
4 User Activity Data on the Web
Most social media systems enable users to organize content with tags (freely chosen keywords). The tagging activities of a user form a valuable source of information for determining the interests of that user [12, 13]. In our analysis we examine the nature of the tag-based profiles in different systems. Again, we investigate the benefits of aggregating profile data and answer the following questions.
1. What kind of tag-based profiles do individual users have in the different systems?
2. Does the aggregation of tag-based user profiles reveal more information about the users than the profiles available in a specific service?
3. Is it possible to predict tag-based profiles in a system, based on profile data gathered from another system?
4.1 Individual Tagging Behavior across Different Systems
From the 116032 users, we randomly selected 139 users who had linked their Flickr, StumbleUpon, and Delicious accounts. Table 1 lists the corresponding tagging statistics. For these users, we crawled 78412 tag assignments that were performed on the 200 latest images (Flickr) or bookmarks (Delicious and StumbleUpon). Overall, users tagged more actively in Delicious than in the other systems: more than 75% of the tagging activities originate from Delicious, 16.3% from StumbleUpon, and 5% from Flickr. The usage frequency of the distinct tags shows a typical power-law distribution in all three systems, as well as in the aggregated set of tag assignments: while some tags are used very often, the majority of tags are used rarely or even just once.
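The per-service statistics in Table 1 and the activity shares quoted above can be derived from raw tag assignments roughly as in the sketch below. The (user, service, tag) tuple format and the function names are assumptions; the usage lines only recompute shares from the Table 1 totals.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def service_stats(assignments: Iterable[Tuple[str, str, str]]) -> Dict[str, dict]:
    """Summarize (user, service, tag) tag assignments per service."""
    freq: Dict[str, Counter] = defaultdict(Counter)
    for _user, service, tag in assignments:
        freq[service][tag] += 1
    total = sum(sum(c.values()) for c in freq.values())
    return {
        service: {
            "tag_assignments": sum(c.values()),
            "distinct_tags": len(c),
            "share": sum(c.values()) / total,
            # Tags used exactly once: the long tail of a power-law distribution.
            "used_once": sum(1 for n in c.values() if n == 1),
        }
        for service, c in freq.items()
    }


def shares(counts: Dict[str, int]) -> Dict[str, float]:
    """Share of all tag assignments contributed by each service."""
    total = sum(counts.values())
    return {service: n / total for service, n in counts.items()}


# Tag assignment totals from Table 1 (139 users).
table1 = {"Flickr": 3781, "StumbleUpon": 12747, "Delicious": 61884}
for service, s in shares(table1).items():
    print(f"{service}: {s:.1%}")  # Flickr: 4.8%, StumbleUpon: 16.3%, Delicious: 78.9%
print(round(table1["Delicious"] / 139, 2))  # 445.21 tag assignments per user
```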
On average, each user provided 564.12 tag assignments across the different systems. The user activity distribution roughly resembles a Gaussian distribution: 26.6% of the users have fewer than 200 tag assignments, 10.1% have more than 1000, and 63.3% have between 200 and 1000 tag assignments. Interestingly, people who tag actively in one system do not necessarily perform many tag assignments in another system. For example, none of the top 5% taggers in Flickr or StumbleUpon is also among the top 10% taggers in Delicious. This observation of unbalanced tagging behavior across different systems again reveals possible advantages of profile aggregation for current tagging systems: given a sparse tag-based user profile, profiles produced in other systems might be used to tackle sparsity problems.

(a) Type of tags in the systems [bar chart not reproduced]: share of tags (0% to 40%) per WordNet category (cognition, location, group, person, artifact, action, communication, other) for Flickr, Delicious, and StumbleUpon.
(b) Type of overlapping tags [bar chart not reproduced]: the same categories for the system pairs Flickr & Delicious, Flickr & StumbleUpon, and StumbleUpon & Delicious.

Fig. 3. Tag usage characterized with WordNet categories: (a) type of tags users apply in the different systems and (b) type of tags individual users apply in two different systems.
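The activity buckets and the top-tagger comparison above can be reproduced from per-user tag counts with a sketch like the following. The rounding of the top-k% cut-off and the tie handling are assumptions, since the text does not specify them, and the data in the usage lines is made up.

```python
import math
from typing import Dict, Set


def activity_buckets(totals: Dict[str, int]) -> Dict[str, float]:
    """Share of users with <200, 200-1000, and >1000 tag assignments."""
    n = len(totals)
    if n == 0:
        return {"<200": 0.0, "200-1000": 0.0, ">1000": 0.0}
    low = sum(1 for c in totals.values() if c < 200)
    high = sum(1 for c in totals.values() if c > 1000)
    return {"<200": low / n, "200-1000": (n - low - high) / n, ">1000": high / n}


def top_taggers(counts: Dict[str, int], fraction: float) -> Set[str]:
    """Users in the top `fraction` of a system by number of tag assignments.

    The cut-off is rounded up (at least one user); ties are broken arbitrarily.
    """
    k = max(1, math.ceil(round(len(counts) * fraction, 9)))
    ranked = sorted(counts, key=counts.get, reverse=True)
    return set(ranked[:k])


# Hypothetical per-user tag counts for 20 users in two systems.
flickr = {f"u{i}": i for i in range(20)}           # u19 tags most in Flickr
delicious = {f"u{i}": 100 - i for i in range(20)}  # u0 and u1 tag most in Delicious
print(top_taggers(flickr, 0.05) & top_taggers(delicious, 0.10))  # set()
print(activity_buckets({"a": 100, "b": 500, "c": 1500, "d": 200}))
```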
4.2 Commonalities and Differences in Tagging Activities
In order to analyze commonalities and differences of the users' tag-based profiles in the different systems, we mapped tags to WordNet categories and considered only the 65% of the tags for which such a mapping exists. Figure 3(a) shows that the types of tags in StumbleUpon and Delicious are quite similar, except for cognition tags (e.g., research, thinking), which are used more often in StumbleUpon than in Delicious. In both systems, the largest share of tags (21.9% in StumbleUpon and 18.3% in Delicious) belongs to the category communication (e.g., hypertext, web). By contrast, only 4.4% of the Flickr tags refer to the field of communication; the largest share of Flickr tags (25.2%) denotes locations (e.g., Hamburg, tuscany). Action (e.g., walking), person (e.g., me), and group tags (e.g., community), as well as words referring to some artifact (e.g., bike), occur in all three systems with similar frequency. However, the concrete tags seem to differ. For example, while artifacts in Delicious refer to things like "tool" or "mobile device", the artifact tags in Flickr describe things like "church" or "painting". This observation is supported by Figure 3(b), which shows the average overlap of the individual category-specific tag profiles. On average, each user applied only 0.9% of her Flickr artifact tags also in Delicious. For Flickr and Delicious, action tags account for the largest fraction of overlapping tags. Interestingly, the overlap of location tags between Flickr and StumbleUpon is 31.1%, even though location tags are used very seldom in StumbleUpon (3.3%, as depicted in Figure 3(a)). This means that if someone uses a location tag in StumbleUpon, she is likely to use the same tag in Flickr as well.
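The category distribution of Figure 3(a) and the category-specific overlap of Figure 3(b) can be sketched as below. The tag-to-category lookup is hypothetical; one possible source is WordNet's lexicographer files (in NLTK, for instance, a synset's lexicographer file is returned by `Synset.lexname()`, e.g. `noun.location`), but choosing the right synset for an ambiguous tag is a separate problem. The overlap is computed in the "A in B" direction, i.e. the share of a user's category tags in system A that also occur in system B.

```python
from collections import Counter
from typing import Dict, Iterable


def category_distribution(tags: Iterable[str], category_of: Dict[str, str]) -> Dict[str, float]:
    """Share of mappable tag assignments per WordNet category (unmapped tags skipped)."""
    counts = Counter(category_of[t] for t in tags if t in category_of)
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()} if total else {}


def category_overlap(tags_a: Iterable[str], tags_b: Iterable[str],
                     category_of: Dict[str, str], category: str) -> float:
    """Share of a user's distinct `category` tags in system A also used in system B."""
    a = {t for t in tags_a if category_of.get(t) == category}
    b = {t for t in tags_b if category_of.get(t) == category}
    return len(a & b) / len(a) if a else 0.0


# Hypothetical tag-to-category lookup (e.g. derived from WordNet lexicographer files).
category_of = {"hamburg": "location", "tuscany": "location", "bike": "artifact",
               "tool": "artifact", "web": "communication"}
flickr = ["hamburg", "hamburg", "tuscany", "bike", "sunset"]  # "sunset" is not in the lookup
delicious = ["web", "tool", "tool", "hamburg"]
print(category_distribution(flickr, category_of))  # {'location': 0.75, 'artifact': 0.25}
print(category_overlap(flickr, delicious, category_of, "location"))  # 0.5
```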
Knowledge of the different (aggregated) tagging facets of a user opens the door for interesting applications. For example, a system could exploit StumbleUpon tags referring to locations to recommend Flickr pictures even if the user's Flickr profile is empty. In Section 4.4 we present an approach that takes advantage of the faceted tag-based profiles for predicting tagging behavior.

(a) Overlap of tag-based profiles [bar chart not reproduced]: average overlap (0% to 30%) for the service comparisons (A) StumbleUpon and (B) Delicious, (A) StumbleUpon and (B) Flickr, and (A) Delicious and (B) Flickr, using three measures: overlap divided by the size of the smaller tag cloud, overlap A in B (divided by the size of the tag cloud in service A), and overlap B in A (divided by the size of the tag cloud in service B).
(b) Entropy and self-information [bar chart not reproduced]: entropy and self-information (in bits, 0 to 9) of the tag-based profiles in Flickr, StumbleUpon, and Delicious versus the aggregated profiles (Flickr & StumbleUpon & Delicious).

Fig. 4. Aggregation of tag-based profiles: (a) average overlap and (b) entropy and self-information of service-specific profiles in comparison to the aggregated profiles.
4.3 Aggregation of Tagging Activities
To analyze the benefits of aggregating tag-based profiles in more detail, we measure the information gain, entropy, and overlap of the individual profiles. Figure 4(a) describes the average overlap with respect to three different metrics. Given two tag-based profiles A and B, the overlap is

(1) $\mathrm{overlap} = \dfrac{|A \cap B|}{\min(|A|, |B|)}$, (2) $\mathrm{overlap}_{A\,\mathrm{in}\,B} = \dfrac{|A \cap B|}{|A|}$, or (3) $\mathrm{overlap}_{B\,\mathrm{in}\,A} = \dfrac{|A \cap B|}{|B|}$.

For example, $\mathrm{overlap}_{A\,\mathrm{in}\,B}$ denotes the percentage of tags in A that also occur in B.
The overlap of the tag-based profiles produced in Delicious and StumbleUpon is significantly higher than the overlap of service combinations that include Flickr. However, on average, a user still applies only 6.8% of her Delicious tags also in StumbleUpon, which is approximately as high as the percentage of tags a StumbleUpon user also applies in Flickr. Overall, the tag-based user profiles do not overlap strongly. Hence, users reveal different facets of their profiles in the different services.
Figure 4(b) compares the average entropy and self-information of the tag-based profiles obtained from the different services with those of the aggregated profile. The entropy of a tag-based profile T, which consists of a set of tags t, is computed as follows:

$$\mathrm{entropy}(T) = \sum_{t \in T} p(t) \cdot \text{self-information}(t) \qquad (1)$$
In Equation 1, p(t) denotes the probability that the tag t was used by the corresponding user, and $\text{self-information}(t) = -\log_2 p(t)$. In Figure 4(b), we summarize self-information by averaging, over all users, the mean self-information of each user's tag-based profile. Among the service-specific profiles, the tag-based profiles in Delicious, which also have the largest size, bear the highest entropy and average self-information. By aggregating the tag-based profiles, self-information increases clearly, by 19.5% and 17.7% with respect to the Flickr and StumbleUpon profiles, respectively. Further, the tag-based profiles in Delicious also benefit from profile aggregation, as the self-information increases by 2.7% (from 8.53 bits to 8.76 bits). This gain is considerable, given that self-information is measured in bits on a logarithmic scale: 8.53 bits suffice to distinguish about 370 states, whereas 8.76 bits distinguish about 434 states.
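Equation 1 and the averaged self-information can be computed as in the following sketch. Two details are assumptions: p(t) is estimated as the relative frequency of t among a user's tag assignments, and the mean self-information of a profile is taken as the unweighted mean over its distinct tags (weighting by p(t) would simply give the entropy). The last line checks the state counts quoted above, since 2^8.53 ≈ 370 and 2^8.76 ≈ 434.

```python
import math
from collections import Counter
from typing import Dict, Iterable


def tag_probabilities(assignments: Iterable[str]) -> Dict[str, float]:
    """Estimate p(t) as the relative frequency of tag t in a user's assignments."""
    counts = Counter(assignments)
    total = sum(counts.values())
    return {t: n / total for t, n in counts.items()}


def self_information(p: float) -> float:
    """Self-information in bits: -log2 p(t)."""
    return -math.log2(p)


def entropy(assignments: Iterable[str]) -> float:
    """Equation 1: sum over the tags of p(t) * self-information(t)."""
    return sum(p * self_information(p) for p in tag_probabilities(assignments).values())


def mean_self_information(assignments: Iterable[str]) -> float:
    """Unweighted mean self-information over the distinct tags of a profile."""
    probs = tag_probabilities(assignments)
    if not probs:
        return 0.0
    return sum(self_information(p) for p in probs.values()) / len(probs)


print(entropy(["web", "tool", "web", "tool"]))  # 1.0
print(mean_self_information(["web", "web", "web", "tool"]))
print(round(2 ** 8.53), round(2 ** 8.76))  # 370 434
```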
Aggregation of tag-based profiles thus reveals more valuable new information about individual users than focusing on information from single services alone. However, a fraction of the profiles also overlaps between different systems, as depicted in Figure 4(a). In the next section we analyze whether it is possible to predict those overlapping tags.
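The three overlap measures defined at the beginning of this section translate directly into set operations. The sketch below assumes that each profile is the set of distinct tags a user applied in a system, returns 0 for empty profiles (a convention the text does not specify), and averages per user as in Figure 4(a); the data in the usage lines is made up.

```python
from typing import Callable, Dict, Set


def overlap(a: Set[str], b: Set[str]) -> float:
    """Metric (1): |A intersect B| / min(|A|, |B|)."""
    smaller = min(len(a), len(b))
    return len(a & b) / smaller if smaller else 0.0


def overlap_a_in_b(a: Set[str], b: Set[str]) -> float:
    """Metric (2): share of the tags in A that also occur in B."""
    return len(a & b) / len(a) if a else 0.0


def overlap_b_in_a(a: Set[str], b: Set[str]) -> float:
    """Metric (3): share of the tags in B that also occur in A."""
    return overlap_a_in_b(b, a)


def average_overlap(profiles_a: Dict[str, Set[str]], profiles_b: Dict[str, Set[str]],
                    metric: Callable[[Set[str], Set[str]], float]) -> float:
    """Average a metric over the users who have a profile in both systems."""
    users = profiles_a.keys() & profiles_b.keys()
    if not users:
        return 0.0
    return sum(metric(profiles_a[u], profiles_b[u]) for u in users) / len(users)


delicious = {"u1": {"web", "tool", "java", "mobile"}, "u2": {"python"}}
stumbleupon = {"u1": {"web", "java", "paris"}, "u2": {"music"}}
print(average_overlap(delicious, stumbleupon, overlap_a_in_b))  # 0.25
```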
4.4 Prediction of Tagging Behavior
Systems that rely on user data usually struggle with the cold start problem; especially systems that are used infrequently or do not have a large user base require solutions to that problem. In this section we investigate the applicability of profile aggregation to this problem. To this end, we evaluate different approaches with respect to the following task.
Tag prediction task. Given a set of tags that occur in the tag-based profile of user u in system A, the task of the tag prediction strategy is to predict those tags that will also occur in u's profile in system B.
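The task, together with the precision, recall, and F-measure used to evaluate it in the next paragraph, can be expressed on plain tag sets, as in this minimal sketch. The function names are hypothetical, and a prediction strategy is represented only by the tag set it outputs.

```python
from typing import Set, Tuple


def overlapping_tags(profile_a: Set[str], profile_b: Set[str]) -> Set[str]:
    """Ground truth: tags of user u in system A that also occur in u's profile in B."""
    return profile_a & profile_b


def evaluate(predicted: Set[str], actual: Set[str]) -> Tuple[float, float, float]:
    """Precision, recall, and F-measure (harmonic mean) of a predicted tag set."""
    hits = len(predicted & actual)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(actual) if actual else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f


profile_a = {"web", "java", "search", "paris"}   # u's tags in system A
profile_b = {"web", "search", "music"}           # u's tags in system B (unknown when predicting)
actual = overlapping_tags(profile_a, profile_b)  # {"web", "search"}
predicted = {"web", "java"}                      # output of some prediction strategy
print(evaluate(predicted, actual))  # (0.5, 0.5, 0.5)
```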
We measure the performance by means of precision (= correctly classified as overlapping tags / classified as overlapping tags), recall (= correctly classified as overlapping tags / overlapping tags), and F-measure (= harmonic mean of precision and recall). Our intention is not to find the best prediction algorithm, but to examine the impact of features extracted from profile aggregation. Hence, we apply a Naive Bayes classifier, which we feed with different features. The benchmark tag prediction strategy (without profile aggregation) bases its decision on a single feature: (F1) the overall usage frequency of t in system B. In contrast, the strategy that makes use of profile aggregation also applies (F2) u's usage frequency of t in system A and (F3) the size of u's profile in system A.
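The feature setup can be sketched as follows. The text does not say which Naive Bayes variant was used; this sketch assumes a Gaussian Naive Bayes on log-scaled counts, implemented from scratch to stay self-contained, and reads F3 as the number of distinct tags in u's profile in system A. All names are hypothetical. In this setting, the benchmark would be trained on F1 only and the aggregation strategy on F1 to F3.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence


def features(tag: str, user_tags_a: Dict[str, int], usage_b: Dict[str, int]) -> List[float]:
    """F1: overall usage of tag in B; F2: u's usage of tag in A; F3: size of u's profile in A."""
    f1 = usage_b.get(tag, 0)
    f2 = user_tags_a.get(tag, 0)
    f3 = len(user_tags_a)
    # log1p dampens the heavy-tailed (power-law) tag counts.
    return [math.log1p(f1), math.log1p(f2), math.log1p(f3)]


class GaussianNaiveBayes:
    """Minimal Gaussian Naive Bayes for labels 0/1 (not overlapping / overlapping)."""

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[int]) -> "GaussianNaiveBayes":
        rows_by_label: Dict[int, List[Sequence[float]]] = defaultdict(list)
        for row, label in zip(X, y):
            rows_by_label[label].append(row)
        self.params = {}
        for label, rows in rows_by_label.items():
            columns = list(zip(*rows))
            means = [sum(c) / len(rows) for c in columns]
            # A small constant keeps zero-variance features from dividing by zero.
            variances = [sum((v - m) ** 2 for v in c) / len(rows) + 1e-6
                         for c, m in zip(columns, means)]
            self.params[label] = (math.log(len(rows) / len(y)), means, variances)
        return self

    def _log_posterior(self, row: Sequence[float], label: int) -> float:
        log_prior, means, variances = self.params[label]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
            for x, m, v in zip(row, means, variances))

    def predict(self, X: Sequence[Sequence[float]]) -> List[int]:
        return [max(self.params, key=lambda lbl: self._log_posterior(row, lbl)) for row in X]


# Toy one-feature training data (e.g. F1 only, as in the benchmark strategy).
X = [[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]]
y = [0, 0, 0, 1, 1, 1]
model = GaussianNaiveBayes().fit(X, y)
print(model.predict([[0.05], [5.05]]))  # [0, 1]
print(features("web", {"web": 3, "java": 1}, {"web": 10}))
```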
Figure 5(a) compares the average performance of both tag prediction strategies. For each of the 139 users and each service combination (Flickr → Delicious, Delicious → Flickr, StumbleUpon → Delicious, etc.), the strategies had to tackle the prediction task specified above. The benefits of the profile aggregation features are significant: with respect to the F-measure, the profile aggregation strategy performs 96.1% better than the strategy that does not benefit from profile aggregation, and precision and recall improve correspondingly. Further, it is important to note that the average percentage of overlapping tags is less than 4%. Thus, a random strategy, which simply guesses whether tag t will overlap or not (probability of 0.5), would fail with a precision lower than 2%.
On average, the profile aggregation strategy can thus detect 57.4% of the tags in system A that will also be part of the tag-based profile in system B. The performance can be further improved by clustering the tag-based profiles according to WordNet categories. Figure 5(b) shows that considering the WordNet features (F4) WordNet category of t and (F5) relative size of the corresponding WordNet category cluster in u's profile leads to a small improvement of the F-measure from 0.25 to 0.26. However, if tag predictions are made for each WordNet cluster of the profiles separately, the improvement is considerably larger: the F-measure increases from 0.25 to 0.28.

(a) Average performance of tag prediction [bar chart not reproduced]: F-measure, precision, and recall (0 to 0.6) with and without profile aggregation.
(b) Impact of WordNet categorization [bar chart not reproduced]: F-measure (0 to 0.3) when categorizing by WordNet before prediction, when using the WordNet category as a feature for prediction, and without WordNet categories.

Fig. 5. Performance of tag prediction: (a) with and without aggregation of tag-based profiles and (b) improving prediction performance (with profile aggregation) by means of WordNet categorization.

(a) Performance for the different services [bar chart not reproduced]: F-measure (0 to 0.4) for StumbleUpon & Delicious, Flickr & StumbleUpon, and Flickr & Delicious.
(b) Delicious → StumbleUpon [bar chart not reproduced]: F-measure, precision, and recall (0 to 0.6) per WordNet category of the tags (cognition, communication, action, location, artifact, none).

Fig. 6. Tag prediction performance for specific services.
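The two WordNet features and the per-cluster variant can be sketched as below. The category lookup is hypothetical, F5 is read as the share of u's distinct tags in system A that fall into the same category, and `predict_cluster` stands for any per-cluster predictor (for instance a classifier trained separately on each category's tags). A categorical feature such as F4 would need, e.g., one-hot encoding before being fed to a Gaussian model.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Set, Union


def wordnet_features(tag: str, user_tags_a: Dict[str, int],
                     category_of: Dict[str, str]) -> List[Union[str, float]]:
    """F4: WordNet category of the tag; F5: relative size of that category cluster in u's profile."""
    category = category_of.get(tag, "none")
    cluster = [t for t in user_tags_a if category_of.get(t, "none") == category]
    return [category, len(cluster) / len(user_tags_a) if user_tags_a else 0.0]


def predict_per_cluster(user_tags_a: Dict[str, int], category_of: Dict[str, str],
                        predict_cluster: Callable[[str, Set[str]], Set[str]]) -> Set[str]:
    """Split u's profile into WordNet clusters and run a predictor on each one separately."""
    clusters: Dict[str, Set[str]] = defaultdict(set)
    for tag in user_tags_a:
        clusters[category_of.get(tag, "none")].add(tag)
    predicted: Set[str] = set()
    for category, tags in clusters.items():
        predicted |= predict_cluster(category, tags)
    return predicted


def only_locations(category: str, tags: Set[str]) -> Set[str]:
    """Toy per-cluster predictor (a stand-in for a trained model): keep location tags."""
    return set(tags) if category == "location" else set()


category_of = {"hamburg": "location", "tuscany": "location", "bike": "artifact"}
user_tags_a = {"hamburg": 2, "tuscany": 1, "bike": 1, "foo": 1}
print(wordnet_features("hamburg", user_tags_a, category_of))  # ['location', 0.5]
print(sorted(predict_per_cluster(user_tags_a, category_of, only_locations)))  # ['hamburg', 'tuscany']
```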
Figure 6 shows the tag prediction performance (using features F1–F5) for specific service combinations. While tag predictions for Flickr based on Delicious profiles, and vice versa, perform rather poorly, the predictions between Flickr and StumbleUpon show a much better performance (F-measure: 0.23). For the two bookmarking services, StumbleUpon and Delicious, which also have the highest average overlap (cf. Figure 4(a)), tag prediction works best, with an F-measure of 0.39 and a precision of 0.36. Figure 6(b) illustrates for which kinds of tags prediction works best between Delicious and StumbleUpon. For tags that cannot be assigned to a WordNet category (none), the precision is just 16%, while the recall of 40% might still be acceptable. However, for tags that can be mapped to WordNet categories, the F-measure reaches up to 0.57. Given the cognition tags (e.g., search, ranking) of a particular user u in Delicious, the profile aggregation strategy, which applies the features F1–F5, can predict the cognition tags u will use in StumbleUpon with a precision of nearly 60%: even if a user has not performed any tagging activity in StumbleUpon, one could recommend 10 cognition tags, of which 6 are relevant for u.
4.5 Synopsis
The results of our analyses and experiments indicate several benefits of aggregating and interweaving tag-based user profiles. We showed that users reveal different types of facets (illustrated by means of WordNet categories) in the different systems. By combining tag-based profiles from Flickr, StumbleUpon, and Delicious, the average self-information of the profiles increases significantly. Although the tag-based service-specific profiles overlap only to a small degree, we showed that profile data from other sources can be used to tackle cold start problems. In particular, the profile aggregation strategy for predicting tag-based profiles significantly outperforms the benchmark that does not incorporate profile features from other sources.
5 Conclusions and Future Work
In this paper we analyzed the benefits of interweaving public profile data on the Web. For both explicitly provided profile information (e.g., name, hometown) and rather implicitly provided tag-based profiles (e.g., tags assigned to bookmarks), the aggregation of profile data from different services (e.g., LinkedIn, Facebook, Flickr) reveals significantly more facets about the individual users than one can deduce from the separate profiles. Our experiments show the advantages of interweaving distributed user data for various applications, such as completing service-specific profiles, generating FOAF or vCard profiles, producing multi-faceted tag-based profiles, and predicting tag-based profiles to solve cold start problems. End-users and application developers can immediately benefit from our research by using the Mypes service (http://mypes.groupme.org/).
In our future work we will focus on possible correlations between traditional and tag-based profiles. For example, in initial experiments we analyzed whether tag-based profiles conform to the skills users specified at LinkedIn. Given the dataset described in Section 3, 76.2% of the users also used at least one of their LinkedIn skills (8.56 skills per user on average) as a tag in Delicious. Further, we found first evidence that for users who belong to the same group based on their social networking profile (in particular location and industry), the similarity between the tag-based profiles is higher than for users belonging to different groups.
In the future, we will continue these experiments and investigate how explicitly
provided profile data can be exploited in social media systems, and how tag-based
profiles can be semantically enhanced to enrich traditional social networking
profiles.
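The two measurements described above can be sketched as follows: the share of users who apply at least one of their LinkedIn skills as a Delicious tag, and the mean pairwise cosine similarity of tag-based profiles within versus across groups defined by (location, industry). Exact, case-insensitive matching of skills to tags, cosine similarity as the profile similarity measure, and the toy users are assumptions for illustration; the actual experiments may have matched and compared profiles differently.

```python
import math
from itertools import combinations

Profile = dict[str, float]  # tag -> weight


def uses_skill_as_tag(skills: set[str], profile: Profile) -> bool:
    """True if at least one skill (case-insensitive, exact match) is also a tag."""
    tags = {t.lower() for t in profile}
    return any(s.lower() in tags for s in skills)


def share_with_skill_tags(users: list[dict]) -> float:
    """Fraction of users who apply at least one of their skills as a tag."""
    return sum(uses_skill_as_tag(u["skills"], u["profile"]) for u in users) / len(users)


def cosine(a: Profile, b: Profile) -> float:
    """Cosine similarity of two sparse tag-weight vectors (0.0 if either is empty)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mean_similarity(users: list[dict], same_group: bool) -> float:
    """Mean pairwise cosine similarity over user pairs inside (or across) groups."""
    sims = [cosine(u["profile"], v["profile"])
            for u, v in combinations(users, 2)
            if (u["group"] == v["group"]) == same_group]
    return sum(sims) / len(sims) if sims else 0.0


# Hypothetical toy users; group = (location, industry) from the social networking profile.
users = [
    {"skills": {"Java", "SQL"}, "group": ("Hannover", "IT"), "profile": {"java": 3, "web": 1}},
    {"skills": {"Marketing"}, "group": ("Hannover", "IT"), "profile": {"java": 1, "web": 1}},
    {"skills": {"Design"}, "group": ("Berlin", "Media"), "profile": {"design": 2, "art": 2}},
]
print(share_with_skill_tags(users))
print(mean_similarity(users, same_group=True))
print(mean_similarity(users, same_group=False))
```

In this toy data, two of the three users apply a skill as a tag, and the within-group similarity exceeds the across-group similarity, which mirrors the direction of the preliminary finding without reproducing its numbers.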
Acknowledgments. This work is partially sponsored by the EU FP7 project
GRAPPLE (http://www.grapple-project.org/).
References
1. Jameson, A.: Adaptive interfaces and agents. The HCI handbook: fundamentals,
evolving technologies and emerging applications (2003) 305–330
2. Zang, N., Rosson, M.B., Nasser, V.: Mashups: Who? What? Why? In Czerwinski, M., Lund, A., Tan, D., eds.: Extended Abstracts of the Conference on Human Factors in Computing Systems (CHI ’08), New York, NY, USA, ACM (2008) 3171–3176
3. Brickley, D., Miller, L.: FOAF Vocabulary Specification 0.91. Namespace document, FOAF Project, http://xmlns.com/foaf/0.1/ (November 2007)
4. Heckmann, D., Schwartz, T., Brandherm, B., Schmitz, M., von Wilamowitz-Moellendorff, M.: Gumo - the general user model ontology. In Ardissono, L., Brna, P., Mitrovic, A., eds.: User Modeling. Volume 3538 of Lecture Notes in Computer Science, Springer (2005) 428–432
5. Aroyo, L., Dolog, P., Houben, G., Kravcik, M., Naeve, A., Nilsson, M., Wild, F.: Interoperability in personalized adaptive learning. Journal of Educational Technology & Society 9(2) (2006) 4–18
6. Carmagnola, F., Cena, F.: User identification for cross-system personalisation.
Information Sciences: an International Journal 179(1-2) (2009) 16–32
7. Yudelson, M., Brusilovsky, P., Zadorozhny, V.: A user modeling server for contemporary adaptive hypermedia: An evaluation of the push approach to evidence propagation. In Conati, C., McCoy, K.F., Paliouras, G., eds.: User Modeling. Volume 4511 of Lecture Notes in Computer Science, Springer (2007) 27–36
8. Assad, M., Carmichael, D., Kay, J., Kummerfeld, B.: PersonisAD: Distributed, active, scrutable model framework for context-aware services. In LaMarca, A., Langheinrich, M., Truong, K.N., eds.: Pervasive Computing. Volume 4480 of Lecture Notes in Computer Science, Springer (2007) 55–72
9. Abel, F., Heckmann, D., Herder, E., Hidders, J., Houben, G.J., Krause, D.,
Leonardi, E., van der Sluijs, K.: A framework for flexible user profile mashups.
In Dattolo, A., Tasso, C., Farzan, R., Kleanthous, S., Vallejo, D.B., Vassileva, J.,
eds.: Int. Workshop on Adaptation and Personalization for Web 2.0 at UMAP ’09,
CEUR Workshop Proceedings (2009) 1–10
10. Szomszor, M., Alani, H., Cantador, I., O’Hara, K., Shadbolt, N.: Semantic modelling of user interests based on cross-folksonomy analysis. In Sheth, A.P., Staab, S., Dean, M., Paolucci, M., Maynard, D., Finin, T.W., Thirunarayan, K., eds.: International Semantic Web Conference. Volume 5318 of Lecture Notes in Computer Science, Springer (2008) 632–648
11. Stewart, A., Diaz-Aviles, E., Nejdl, W., Marinho, L.B., Nanopoulos, A., Schmidt-
Thieme, L.: Cross-tagging for personalized open social networking. In Cattuto,
C., Ruffo, G., Menczer, F., eds.: Hypertext, ACM (2009) 271–278
12. Firan, C.S., Nejdl, W., Paiu, R.: The benefit of using tag-based profiles. In Almeida,
V.A.F., Baeza-Yates, R.A., eds.: LA-WEB, IEEE Computer Society (2007) 32–41
13. Michlmayr, E., Cayzer, S.: Learning User Profiles from Tagging Data and Leveraging them for Personal(ized) Information Access. In Golder, S., Smadja, F., eds.: Proceedings of the Workshop on Tagging and Metadata for Social Information Organization at WWW ’07. (May 2007)
14. Dawson, F., Howes, T.: vCard MIME Directory Profile. Request for comments,
IETF, Network Working Group (September 1998)