Klout Score: Measuring Influence Across Multiple Social Networks
Adithya Rao, Nemanja Spasojevic, Zhisheng Li and Trevor DSouza
Lithium Technologies | Klout
San Francisco, CA
Email: {adithya, nemanja, zhisheng.li, trevor.dsouza}@klout.com
Abstract: In this work, we present the Klout
Score, an influence scoring system that assigns scores
to 750 million users across 9 different social networks
on a daily basis. We propose a hierarchical framework
for generating an influence score for each user, by
incorporating information for the user from multiple
networks and communities. Over 3600 features that
capture signals of influential interactions are
aggregated across multiple dimensions for each user. The
features are scalably generated by processing over
45 billion interactions from social networks every
day, as well as by incorporating factors that indicate
real world influence. Supervised models trained from
labeled data determine the weights for features, and
the final Klout Score is obtained by hierarchically
combining communities and networks. We validate
the correctness of the score by showing that users
with higher scores are able to spread information
more effectively in a network. Finally, we use
several comparisons to other ranking systems to show
that highly influential and recognizable users across
different domains have high Klout scores.
Keywords: influence scoring; online social networks;
large scale
I. Introduction
It is estimated that there are now over a billion users
on online social networks, exceeding even the number
of websites on the internet. In the past decade, ranking
webpages based on importance of linked content, clicks
and impressions led to ubiquitous internet applications
such as search. Applying effective ranking techniques to
determine influential users on the internet has a similar
potential to lead to many new and useful applications as
well.
When a user posts a message on social media, other
users in the network who see the content may perform
certain actions in reaction to the original message. The
fact that the original message prompted certain reactions
from other users is an indication that the user influenced
them in some manner. For example, a user may post a
message on Facebook about her experience in a
restaurant, with a link to the restaurant’s webpage. A
user who sees the original message may react to it in
several ways: read the message, click on the link to
get more information, reshare the link with other users
in his own network, or actually visit the restaurant
for dinner. The type of reaction gives an indication of
the strength of influence the message had on the user.
There are, of course, many variables pertaining to
offline actions that cannot be directly measured, such
as the effect of seeing a billboard on a freeway. But in
the context of social media, a large set of user reactions
such as impressions, clicks, likes, comments, reshares
and purchase behavior is measurable. By observing the
quantity and quality of reactions that a user generates
among other users in the network, it is therefore possible
to get a measure of how influential he or she is.
Here we introduce the Klout Score as a metric for
measuring influence of users on online platforms such
as social networks and community forums. While the
Klout Score has been available since 2008, early versions
of the score included fewer signals of influence, and were
therefore less effective as a metric of influence. However,
because the system was built to be extensible and
flexible, the Klout Score has evolved to incorporate many
new sources of information, growing more accurate over
time. Today, the Klout Score is widely used for identifying
influential users for applications such as targeted search
and influencer marketing [1].
It is this extensible and flexible framework that we
present in this study, along with results that validate
the effectiveness of the score in measuring influence. Our
contributions in this paper are as follows:
  • Scalable Production System: We describe a full
    production system that assigns Klout Scores to 750
    million public and registered user profiles from 9
    different networks, by processing 45 billion
    interactions every day.
  • Feature Generation: We outline how features that
    capture different aspects of influential actions are
    generated. In our models we use over 3600 such
    features.
  • Hierarchical Scoring: We explain how networks are
    scored individually, and are then combined into a
    single score using a hierarchical approach.
  • Validation: We present experiments and comparisons
    that show that the Klout Score effectively measures
    influence in a variety of contexts.
II. Problem Setting
A. Related Work
A great deal of recent research has explored the
measurement and applications of social influence.
Tang et al. [2] analyzed topic-based social influence
on academic collaboration networks. The authors of [3],
[4] identified influencers on blogs and the Twitter social
network respectively. In comparison, here we consider
influence simultaneously on multiple networks.
Influence maximization [5], [6], [7] is the problem of
finding a subset of nodes that would maximize the spread
of information in a given graph. This problem differs
from that of assigning influence scores to every node
in the graph, since the former is a targeting or subset
selection problem, while the latter is a measurement or
ranking problem.
Several metrics have been used in previous work to
measure influence. Alexy et al. [8] used a PageRank-like
social interaction score and number of mentions over
time to measure user influence on Twitter. Behnam et
al. [9] modeled influence using metrics such as number of
followers and ratio of affection. Influence measurement
on social networks also has a temporal aspect to it,
since messages typically have short lifespans in terms of
receiving reactions [10]. This problem of understanding
time-sensitive influence has remained relatively
under-explored in previous work. Here we present
a framework for feature engineering that allows granular
measurement across various dimensions and scales to
thousands of features.
The Klout score has been applied to various marketing
applications [1], and has been used in studies about
social behavior [11]. The field of studying influence is
still in its early stages, and this work aims at advancing
such influence measurement systems.
B. Problem Statement
While influence is a broad and subjective concept,
we can quantitatively describe it in terms of observable
reactions to stimuli. If an entity performed an
observable reaction in response to a stimulus originating from
another, then we can say that the latter influenced
the former in some manner. Thus we can consider the
influence of an entity to be the ability to induce reactions
in other entities.
More specifically, in the context of social networks,
we can define the influence of a user to be the ability
to induce reactions in other users. Thus an influence
score determines how effectively a user may be able to
influence other users via his or her actions.
Let $G$ represent a network or community of users,
who interact with each other via a set of actions $A$. An
influence score can then be defined as:
Definition 1: Influence Score: For each user $u$ in a
network $G$, let $G_u$ be the subset of the network
containing the users who may directly or indirectly interact
with $u$, via a set of reactions $R \subseteq A$. Then an
influence score $I(u, T)$ is a measure of the degree and
quantity of reactions that $u$ can induce in $G_u$ over a
specified time period $T$.
The score thus determined can be used to compare the
relative influence of users in the network. Below we
describe a few of the factors that an influence scoring
system needs to take into account.
C. Considerations
There are several aspects that an influence scoring
system must incorporate in order to be effective:
User Scalability: An effective influence scoring
system must be able to process information from the
complete network graph, which may include hundreds of
millions of users. Some previously suggested approaches
rely on loading the entire graph in memory to perform
such computation, but these approaches have limitations
when dealing with web-scale datasets. We solve this
problem by leveraging batch processing frameworks that
aggregate and generate features for each user separately
in multiple passes, before combining them hierarchically.
Network Scalability: A user’s online persona typ-
ically spans multiple social and professional networks.
An influence scoring system must therefore be able
to scale across these different networks, to unify the
available information for a user. It should also be able
to handle the distinct sets of interactions and user
behavior patterns associated with each network. Further,
the influence scoring system must also determine the
relative importance of networks when they are combined
together. We discuss these aspects of influence scoring in
the following sections.
Interaction Graph: As shown in [5], influence
measurement strategies that rely solely on structural
properties of the graph, such as degree and centrality
heuristics, do not perform well, and it is essential to
consider information dynamics in the network. Thus in
addition to properties such as in-degree and centrality
of nodes in a graph, an influence measurement strategy
must identify and capture variables that indicate
dynamic information flows. Furthermore, the manner of
interaction indicates the strength of influence, and some
interactions may indicate a greater degree of influence
than others. This relative importance of interactions may
be determined by constructing granular features that can
be weighted individually.
Temporal factors: The variables that capture
influence may be broadly categorized as those that capture
long-lasting influence, versus those that capture dynamic
and changing influence. Since the importance of dynamic
variables fades over time [10], an influence measurement
system must be sensitive to time decay of influential
interactions. In our system, we choose a time window
of 90 days to consider dynamic interaction behavior, in
addition to signals that capture long-lasting influence
outside this time window.
Offline factors: It is plain that signals on social
networks are only a partial representation of a user’s
overall influence, and can only provide limited
accuracy for influence measurement. It is therefore crucial
to incorporate proxy sources that signify a user’s real
world influence. Here we use Wikipedia and news articles
to extract signals that may indicate the user’s offline
influence.
Reach and Strength of Influence: The size of $G_u$ may
vary widely for different users in the network, and an
influence scoring system must determine the importance
of the user’s reach with respect to the number of
reactions. Further, the manner and frequency of reactions
may determine how strongly a user influences another.
Thus a user who induces a total of 100 reactions among
10 other users may or may not have the same influence
score as a user who induces 100 reactions among 50
users, depending on the strategy chosen for scoring. An
influence scoring system may choose to score the latter
higher, since she reaches a larger set of people; while
another may score the former higher, since he is able
to more strongly influence a smaller set of users. The
chosen strategy may depend on the application.
III. System Overview
A. Methodology
Here we propose a hierarchical approach to compute
influence scores. We build an interaction graph by
capturing reactions generated in response to social media
posts. The reaction types chosen are those that are
strong indicators of influential information flows between
nodes. We also derive information from the relatively
slow-changing graph structure. To factor
in temporal effects and time decay, a trailing window
of activity over 90 days is used. More recent actions
have a greater significance compared to older actions,
all other variables remaining the same. Features such as
PageRank derived from Wikipedia and number of news
article mentions provide indicators of real world or offline
influence.
Features derived from such information are used to
create feature vectors for each user, for each network
or community. Supervised machine learning models are
built using ground truth labels generated for each
network. The model weights applied to the network feature
vector for a user give a network score. The overall
score for a user is computed by combining the scores
from all networks and communities where the user has
a presence, in a hierarchical manner.
[Figure 1: Scoring Pipeline Overview. Labeled components
include: social network data collected via GNIP and REST
APIs, and klout.com logs; data normalization into
activities, profiles, and graph; an interaction graph
(who, where, when, what); dynamic and long-lasting
normalized features; hierarchical ground truth from user
voting; hierarchical models; scoring; scores and
features stored in HBase.]
While no measurement system can claim to com-
prehensively capture all signals of influence, we design
our system such that it is flexible enough to easily
incorporate new information, as and when it becomes
available.
B. Pipeline
Klout scores are computed for 750 million users from
9 major networks including Twitter (TW), Facebook
(FB), LinkedIn (LI), Google+ (GP), Foursquare (FS),
Instagram (IG), YouTube (YT) and Lithium Communities
(LT), in addition to Wikipedia (WK). When a user
registers on Klout.com he associates his identities on
different social networks with his Klout profile. For Twitter,
public data is collected via the Mention Stream¹, data
for Lithium Communities² comes from in-house
datastores, and data for other social networks is collected via
REST APIs on the user’s behalf, based on the granted
permissions. All collected data is parsed and normalized
to protocol buffers that encode user interactions, graph,
and profile information. Data is collected continuously
from interactions in a trailing window of 90 days using
the Play Framework. The collected data is written out
to a distributed file system, where batched parsing and
processing is done using Hadoop MapReduce and Hive.
The batch processing pipeline derives features for each
user, normalized against the global population. Feature
weights from offline models built using the ground truth
data are then applied to generate Klout scores. The
pipeline overview is shown in Figure 1. Over 45 billion
interactions are processed in each pipeline run, with 0.5
billion new interactions added each day. The daily
footprint of the pipeline is 196.14 CPU days, with 18.46 TB
of reads and 9.53 TB of writes.
¹ https://gnip.com/sources/twitter/
² http://www.lithium.com/
C. Features
In order to build a supervised model for influence
scores, we generate a set of quantitative features for each
user who is represented by a node in the graph. In some
cases such as LinkedIn Job Titles or Community Badges,
the categorical variables are converted to quantitative
values based on the ordered list of the categories. Thus
all features are designed to be directly proportional to
influence.
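The conversion of ordered categories to quantitative values can be sketched as below. This is a minimal illustration only: the category list, its ordering, and the function name `ordinal_value` are hypothetical, since the text does not give the orderings actually used.

```python
# Hypothetical ordered category list (lowest to highest). The real ordering
# used by the system is not given in the text.
EDUCATION_LEVELS = ["high school", "bachelor", "master", "doctorate"]


def ordinal_value(category, ordered_categories):
    """Map a category to a value in (0, 1] by its position in the ordered list.

    Unknown categories map to 0.0 so that they contribute no influence signal.
    """
    if category not in ordered_categories:
        return 0.0
    rank = ordered_categories.index(category) + 1
    return rank / len(ordered_categories)
```

With this encoding a higher category always yields a larger value, which keeps the feature directly proportional to influence.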
The features may be broadly divided into two types:
long-lasting and dynamic. Long-lasting features include
those that change gradually or infrequently. Education
history and Wikipedia PageRank are examples of such
features. The types of long-lasting variables are
summarized in Table I, although this is not an exhaustive list.
Over 60 such long-lasting features are considered as part
of the Klout score.
Table I: Types of Long-lasting features
Feature Type     | Features                                    | Networks
-----------------+---------------------------------------------+-----------------------
Node Degree      | Followers, Friends, Fans, Subscribers,      | TW, FB, IG, GP, WK, YT
                 | In-links                                    |
Graph Properties | PageRank, Inlink to Outlink ratio           | WK
Categories       | Job Title, Education Level, Endorsements,   | LI, LT
                 | Recommendations, Awards, Community Badges   |
Table II: Common Dimensions for Dynamic Features
Dimension       | Common values
----------------+----------------------------------------------------
Audience (Who)  | All, Higher, Peers
Time (When)     | 3, 7, 14, 21, 30, 60, 90 (days)
Network (Where) | TW, FB, LI, GP, FS, IG, LT, YT
Action (What)   | Message, Photo, Video
Action (How)    | Comment, Reply, Like (Upvote), Mention (Tag),
                | Reshare (Retweet, via), View (Impression)
The dynamic features, on the other hand, capture
information flowing through edges in the graph between
users. As described in previous sections, the primary
signal of an influential interaction is when an action
from a user leads to reactions among other users. Each
of these reactions indicates a unit of information flow.
A reaction can be represented by a tuple of dimensions
(Who, When, Where, What, How) as below:
  • Who: The characteristics of the audience who
    reacted to the original post from the user.
  • When: The difference between the current time
    and the time at which the reaction occurred.
  • Where: The social network on which the reaction
    was performed.
  • What: The unit of original content or action on
    which the reaction was performed.
  • How: The type of reaction.
The first step while generating features is to normalize
all the reactions based on the above dimensions as
(actor, timestamp, network, original content type, action).
Features are generated by aggregating all reactions that
are represented by the same tuple. For example, all the
reactions of the kind "comments from a user’s peers
received on Facebook Photo posts in the last 7 days" are
aggregated into a single feature represented by the tuple
{Peers, 7 days, Facebook, Photo, Comment}.
This aggregation is achieved in a single pass through
the dataset, by employing User Defined Functions
(UDFs) such as conditional_emit and multiday_sketch,
applied within Hive queries that are executed as
MapReduce jobs. We have open sourced these UDFs in a project
named Brickhouse³.
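The tuple-based aggregation described above can be illustrated with a small in-memory sketch. This is plain Python rather than the Hive UDFs used in production, and it does not model conditional_emit or multiday_sketch. The record layout, the function name `aggregate_features`, and the choice of cumulative time windows are assumptions made for illustration.

```python
from collections import Counter

TIME_WINDOWS = (3, 7, 14, 21, 30, 60, 90)  # days, as listed in Table II


def aggregate_features(reactions, windows=TIME_WINDOWS):
    """Count reactions per (who, window, where, what, how) tuple in one pass.

    Each reaction is a dict with keys 'who', 'age_days', 'where', 'what', 'how'.
    Assumption: windows are cumulative, so a reaction that is 5 days old
    counts toward the 7, 14, ..., 90 day features but not the 3 day one.
    """
    features = Counter()
    for r in reactions:
        for window in windows:
            if r["age_days"] <= window:
                key = (r["who"], window, r["where"], r["what"], r["how"])
                features[key] += 1
    return features


# Hypothetical normalized reactions received by one user.
reactions = [
    {"who": "Peers", "age_days": 2, "where": "FB", "what": "Photo", "how": "Comment"},
    {"who": "Peers", "age_days": 5, "where": "FB", "what": "Photo", "how": "Comment"},
    {"who": "Higher", "age_days": 40, "where": "TW", "what": "Message", "how": "Reshare"},
]
features = aggregate_features(reactions)
```

Here `features[("Peers", 7, "FB", "Photo", "Comment")]` corresponds to the example feature "comments from a user's peers received on Facebook Photo posts in the last 7 days".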
This feature generation framework has several
advantages. Firstly, it allows a large set of dynamic features to
be generated for training. Table II provides the list of the
most commonly used dimensions for generating dynamic
features: 3 cohorts of users, 7 time windows, 8 networks⁴,
3 common content types, and 6 common types of
reactions on content. Note that not all of the listed content
types and actions are present for every network,
nor are the dimensional values restricted to those in
Table II. Each network may have its own unique content
types and actions, leading to additional features. Overall
around 3550 dynamic features are generated by the
system, using various combinations of the dimensional
values.
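As a rough check on the scale, the common dimension values listed in Table II alone would allow at most the following number of tuple combinations:

$$3 \times 7 \times 8 \times 3 \times 6 = 3024$$

This product is only an upper bound for the common dimensions, not an exact feature count: some combinations do not exist on every network, and network-specific content types and actions add further features, which is consistent with the reported total of around 3550.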
Secondly, the framework allows easy extensibility in
any of the dimensions. Thus adding features for a new
action or a network becomes only a matter of identifying
the specific dimensional values needed.
Thirdly, this approach also provides granularity while
learning a supervised model that assigns weights to
features. By allowing weights to be assigned to specific
tuple combinations, the models can be made sensitive
to changes in each of the dimensions. For example,
features that represent a reshare action may carry a
higher weight than those for a like action, all other
dimensions being the same. Similarly, a reshare action by
a user who is more influential than the user himself may
have a higher weight than the same action from one of
the user’s peers.
Finally, aggregating interactions into such dimensions
can be done using multiple passes through the dataset,
which has a computational complexity of O(n). This
means that the entire graph need not be loaded in
memory to perform the computation, which can instead be
performed efficiently on a distributed batch processing
framework such as MapReduce or Hive.
³ https://github.com/klout/brickhouse
⁴ Since Wikipedia is not a conventional social network, we
exclude it when generating dynamic features.
[Figure 2: Klout Score Structure Overview. The root score
(klout) combines the network scores klout:tw (Twitter),
klout:fb (Facebook), klout:gp (Google+), klout:li
(Lithium) and klout:wiki (Wiki); klout:li in turn
combines klout:li:community_1, klout:li:community_2 and
klout:li:community_3, each computed from features.]
D. Hierarchical Scoring
The hierarchical architecture (Figure 2) allows for
extensibility both in terms of depth (granularity of features)
and breadth (number of sources). This enables the scoring
mechanism to be sensitive to signals as well as scalable
in terms of networks.
Ground truth data is generated for individual
networks and communities through tools designed to collect
labeled data. For generating this data, evaluators were
shown pairs of users who were known to them on the
network, and were asked to identify the more influential
one in each pair. Each pair of users was evaluated by
multiple evaluators to reduce bias. Close to 1 million such
evaluations were collected for all networks combined,
with hundreds of thousands of labels for larger
networks such as Twitter and Facebook, and tens of
thousands for smaller networks. To handle ambiguity in
the labels, they were then pre-processed to only pick
pairs with a clear winner by a difference of at least 2
votes.
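The label pre-processing step can be sketched as follows. This is a minimal sketch assuming each evaluation is recorded as a (user_a, user_b, winner) tuple; the record layout and the function name `clear_winner_pairs` are hypothetical.

```python
from collections import Counter


def clear_winner_pairs(votes, min_margin=2):
    """Keep only pairs whose winner leads by at least `min_margin` votes.

    `votes` is a list of (user_a, user_b, winner) tuples, one per evaluation.
    Returns a dict mapping each retained (sorted) pair to the winning user.
    """
    tallies = {}
    for user_a, user_b, winner in votes:
        pair = tuple(sorted((user_a, user_b)))
        tallies.setdefault(pair, Counter())[winner] += 1

    labeled = {}
    for pair, counts in tallies.items():
        first, second = pair
        margin = counts[first] - counts[second]
        if abs(margin) >= min_margin:
            labeled[pair] = first if margin > 0 else second
    return labeled


# Hypothetical evaluations: alice beats bob 3-0, carol and dave tie 1-1.
votes = [
    ("alice", "bob", "alice"), ("bob", "alice", "alice"), ("alice", "bob", "alice"),
    ("carol", "dave", "carol"), ("carol", "dave", "dave"),
]
labels = clear_winner_pairs(votes)
```

Only the alice/bob pair survives the filter, since the carol/dave pair has no clear winner.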
The features generated typically have a power law
distribution with a long tail. In order to make the
feature values comparable, they are log normalized by the
global maximum value per feature. A feature vector with
these normalized values as elements is then generated for
each user. Models are then trained for ranked users in
the ground truth training set using supervised learning
methods to generate a weight associated with each
feature. Specifically, Non-Negative Least Squares (NNLS)
regression is used for model building, since the features
are designed such that an increase in a feature value
corresponds to higher influence. The trained models have
an F1 score between 0.70 and 0.75 for most networks,
which is relatively high given that human evaluators who
provide the ground truth labels do not always agree on
the ordering.
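The log normalization step can be sketched as below. The exact formula is not stated in the text, so the form log(1 + x) / log(1 + max) is an assumption chosen to map values into [0, 1]; the function name `log_normalize` is hypothetical.

```python
import math


def log_normalize(values):
    """Scale raw feature values to [0, 1] with log(1 + x) / log(1 + max).

    Assumption: the exact formula is not stated in the text; this is one
    common way to log normalize by the global maximum of a feature.
    """
    max_value = max(values)
    if max_value <= 0:
        return [0.0 for _ in values]
    denom = math.log1p(max_value)
    return [math.log1p(v) / denom for v in values]


# Hypothetical long-tailed follower counts across the population.
followers = [0, 9, 99, 999_999]
normalized = log_normalize(followers)
```

The log transform compresses the long tail, so a user with 99 followers lands at one third of the scale rather than at roughly 0.0001 as a linear scaling would give.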
[Figure 3: Network Score Combiner (simplified to 3
network dimensions). Axes: Network 1, Network 2,
Network 3. Shown: raw network score vector, weighted
network score vector, combined score vector.]
Let $n_{i,j}$ represent the $j$th network or community at
the $i$th level in the hierarchy, with $i = 0$ representing
the topmost level. For a user $u$ on a given network $n_{i,j}$,
we denote the normalized feature vector as $f(u, n_{i,j})$.
The weights associated with the learnt model for the
network $n_{i,j}$ are given by the weight vector $w(n_{i,j})$.
Applying these weights to a feature vector
yields a score in the range [0, 1]. Thus, for
the graph $G_u$ corresponding to the network $n_{i,j}$, and a
time period $T$ over which the features are computed, the
influence score for the user in that network is given by:
$$I_{i,j}(u, T) = s(u, n_{i,j}) = f(u, n_{i,j}) \cdot w(n_{i,j})$$
The scores obtained for a user as a result of applying
the learnt weights to the network or community feature
vectors are further combined into a new vector for the
next level in the hierarchy. Let the network $n_{i-1,K}$ on
level $i-1$ have $k$ different child networks
in the $i$th level. Then the feature vector for $n_{i-1,K}$ is
given by:
$$f(u, n_{i-1,K}) = [s(u, n_{i,1}), s(u, n_{i,2}), \ldots, s(u, n_{i,k})] \quad (1)$$
The weight vector for this level can now be applied to
get a score for $n_{i-1,K}$.
Thus, for networks with child communities, such as
Lithium, the community scores for a user form a feature
vector for the network level, which can be combined to
get a network score. The network level scores are further
combined at the root level to give a score that represents
the influence of the user combined across the different
networks where he is present.
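The bottom-up combination described above can be sketched with plain dot products. All feature values and weights below are hypothetical, and choosing weights that sum to 1 is an assumption made here so that scores stay in [0, 1].

```python
def dot(features, weights):
    """Dot product of a feature vector and a weight vector of equal length."""
    if len(features) != len(weights):
        raise ValueError("feature and weight vectors must have the same length")
    return sum(f * w for f, w in zip(features, weights))


# Hypothetical normalized feature vectors for one user in three communities,
# with hypothetical learnt weights (assumed to sum to 1).
community_features = {
    "community_1": [0.8, 0.4],
    "community_2": [0.2, 0.6],
    "community_3": [0.0, 0.0],
}
community_weights = [0.5, 0.5]

# Level i: one score per child community, s(u, n_{i,j}) = f . w
child_scores = [dot(f, community_weights) for f in community_features.values()]

# Level i-1: the child scores become the parent's feature vector (Eq. 1),
# and the parent's own weight vector is applied to it.
parent_weights = [0.5, 0.3, 0.2]
network_score = dot(child_scores, parent_weights)
```

The same pattern repeats at each level: scores computed at level i are reused as features at level i-1.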
[Figure 4: Analysis of average perk related content
reaction count as a function of authors’ average Klout
score. x-axis: Klout Score (10 to 80); y-axis: Reactions
Average (log scale, 0.01 to 10).]
At higher levels in the hierarchy, it is challenging to
get ground truth data that represent how networks or
communities may be combined together. In the absence
of such labeled data, it may not be possible to generate
weight vectors using supervised models. Instead, we
extract weight vectors for the higher levels based on
network or community graph properties. For instance,
weights that represent the potential audience that a user
on the network could influence may be derived from
heuristics such as overall graph size or average node
degree.
Further, unlike the features at the lower levels, the
features at these higher levels may be fairly uncorrelated.
This allows us to approximate the child networks as
different orthogonal axes to generate a vector space.
For such levels, the combined score may be computed
as the Euclidean or L2 norm of the vector obtained by
the component-wise product of the weight and network
feature vectors.
$$s(u, n_{i,j}) = \lVert f(u, n_{i,j}) \odot w(n_{i,j}) \rVert \quad (2)$$
where $\odot$ represents the operator for element-wise
multiplication.
In particular, the raw Klout Score $KS_r(u)$, denoted
by the network notation $n_{0,1}$, is given by the L2 norm of
the network scores in level $i = 1$:
$$KS_r(u) = I_{0,1}(u, T) = s(u, n_{0,1}) = \lVert f(u, n_{0,1}) \odot w(n_{0,1}) \rVert \quad (3)$$
This root level score is finally scaled to [0, 100], giving
the Klout score. Since the original features are log
normalized, the final score is also interpreted to be on
a logarithmic scale. Thus a user with a score of 60 may
be $\alpha$ times as influential as a user with a score of 50,
where $\alpha$ is the constant associated with the power law
distribution.
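The root level combination in Eq. 2 can be sketched as below. The network scores and weights are hypothetical. The text does not say how the raw score is scaled to [0, 100], so `scale_to_100` is an assumed scheme that divides by the largest raw score attainable with the given weights.

```python
import math


def combine_networks(scores, weights):
    """L2 norm of the element-wise product of scores and weights (Eq. 2)."""
    if len(scores) != len(weights):
        raise ValueError("score and weight vectors must have the same length")
    return math.sqrt(sum((s * w) ** 2 for s, w in zip(scores, weights)))


def scale_to_100(raw_score, weights):
    """Scale a raw root score to [0, 100].

    Assumption: the scaling method is not given in the text. Here the raw
    score is divided by its maximum possible value, reached when every
    network score equals 1, which is the L2 norm of the weight vector.
    """
    max_raw = math.sqrt(sum(w ** 2 for w in weights))
    return 100.0 * raw_score / max_raw if max_raw > 0 else 0.0


network_scores = [0.6, 0.8, 0.0]   # hypothetical per-network scores
network_weights = [1.0, 1.0, 1.0]  # hypothetical heuristic weights
raw = combine_networks(network_scores, network_weights)
klout_like = scale_to_100(raw, network_weights)
```

Because the networks are treated as orthogonal axes, a user who is strong on a single network can still reach a high combined score, while presence on additional networks only adds to it.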
In the next section, we perform validation on the
Klout score via several comparisons.
IV. Validation
This section examines the Klout Score from four
different aspects to illustrate its correctness and usefulness.
A. Spread of Information
To validate the effectiveness of the Klout Score, we
ran a year-long experiment to measure the spread of
information with respect to the user scores. Users with
Klout Scores in the range [10, 80] were
targeted with perks, which could be claimed by the users.
The users were encouraged, but not mandated, to post
messages about their experience with the claimed perk.
The users’ audiences then reacted to these messages,
with a higher number of reactions indicating a greater
spread of information. 87,675 users posted messages
after claiming a perk, out of which 18,308 posts received
a total of 394,083 reactions.
The average number of reactions is plotted on a log
scale against the targeted users’ Klout scores in Figure
4. The curve is monotonically increasing:
users with higher scores receive a higher number
of reactions. We also observe an order of magnitude
difference in reactions received by users with a score of
60 compared to a score of 30, and similarly for users with
a score of 80 compared to 60. This validates that users
with higher Klout scores are able to spread information
more effectively in a network.
B. Comparisons with Other Systems
Table III: Comparison with ATP Tennis Player Ranking
and Forbes Most Powerful Women Ranking
Rank | ATP            | Klout | Forbes            | Klout
-----+----------------+-------+-------------------+------
1    | Novak Djokovic | 89.54 | Hillary Clinton   | 93.23
2    | Roger Federer  | 90.26 | Melinda Gates     | 83.57
3    | Andy Murray    | 89.50 | Mary Barra        | 77.53
4    | Stan Wawrinka  | 86.86 | Christine Lagarde | 83.89
5    | Kei Nishikori  | 83.50 | Dilma Rousseff    | 86.84
6    | Tomas Berdych  | 66.69 | Sheryl Sandberg   | 83.18
7    | David Ferrer   | 65.98 | Susan Wojcicki    | 80.04
8    | Milos Raonic   | 82.28 | Michelle Obama    | 87.30
9    | Marin Cilic    | 58.93 | Park Geun-hye     | 81.80
10   | Rafael Nadal   | 82.37 | Oprah Winfrey     | 91.08
1) Real-World Rankings: We also compare the Klout
Score with other real world rankings that indicate
influence. Table III shows the Klout scores compared with
ATP rankings for tennis players⁵, and Forbes’ list of
most powerful women⁶, as of June 2015.
To measure the ranking quality of the Klout Score,
we adopt the normalized Discounted Cumulative Gain
(nDCG) metric, defined in Eq. 4. The Discounted
Cumulative Gain up to position $p$ ($DCG_p$) is calculated as
in Eq. 5, and the ideal DCG for $p$ is denoted by $IDCG_p$.
$$nDCG_p = \frac{DCG_p}{IDCG_p} \quad (4)$$
$$DCG_p = \sum_{i=1}^{p} \frac{2^{rel_i} - 1}{\log_2(i + 1)} \quad (5)$$
⁵ http://www.atpworldtour.com/en/rankings/singles
⁶ http://www.forbes.com/power-women/
[Figure 5: Comparison of Klout Score and Google Trends]
We calculate the $IDCG_p$ by using the ATP or Forbes
rankings as the ideal ordering of users. We set the
relevance $rel$ of a person as $p / rank_{ideal}$, where $rank_{ideal}$
is her/his position in the ideal ranking. For example, for
$p = 10$ the relevance of Novak Djokovic is 10 because his
position is 1 in the ATP ranking. Thus his contribution
to the $IDCG_p$ measure is $\frac{2^{10} - 1}{\log_2 2}$, whereas his contribution
to the $DCG_p$ measure for the Klout Score ranking is
$\frac{2^{10} - 1}{\log_2 3}$, since he appears in the 2nd position there. Setting
the relevance in this manner places stronger emphasis on
correctly retrieving the higher ranked entries.
With this setting, the $nDCG_{10}$ measure for the Klout
score with respect to the ATP and Forbes ranking is
computed as 0.878 and 0.874, respectively. This
demonstrates that the Klout score is able to capture real world
influence to a high degree for these examples.
2) Google Trends: To observe temporal sensitivity,
we plot the Klout Score for a few entities along with
their Google Trends for the last three years in Figure 5.
For both Starbucks and Airbnb, the Klout scores show
similar fluctuations compared to their Google Trends,
indicating a strong correlation between online influence
and search popularity. For Fitbit the Klout score catches
a few spikes that are not seen in Google Trends. This also
reveals that the Klout score is sensitive to short-term
variations, and tracks such changes very closely.
For the music artist Psy, we see that the Google Trend
drops significantly after 2013-06-01 while his Klout score
decreases more gradually. This is because while the
Google Trend reflects only the immediate short-term
popularity, the Klout score incorporates long-lasting
features as well.
C. Influencers By Topic
Since influence is typically contextual, we explore the
effectiveness of the Klout score across different topical
domains. Users in topical domains were identified using
the methodology described in [12], and ranked by their
Klout scores within their respective domains. Figure 6
shows the top 5 ranked influencers in selected topics. For
a topic such as politics, we see that the highest scored
users include prominent politicians such as Barack
Obama and David Cameron, as well as news and media
entities such as The Washington Post and Fox News,
all of whom are clearly very influential. These examples
illustrate that the Klout score can correctly identify the
influencers in a variety of domains.
V. Conclusion
In this work, we propose a hierarchical scoring system
called the Klout Score, that assigns influence scores to
750 million users across 9 different social networks, by
analyzing 45 billion interactions daily. The framework
scales to hundreds of millions of users by leveraging
distributed batch processing frameworks that aggregate
signals in linear time with respect to the nodes in the
graph.
Figure 6: Top-5 Influencers by Topic
To create the score, a feature generation framework
aggregates signals across several dimensions for each
user, creating a large feature set containing over two
thousand features. In addition to incorporating signals
from social network interactions, the features also
incorporate factors such as Wikipedia that provide a proxy
for real world influence. Weights obtained from supervised
models are applied to these features to generate network
or community scores. These scores are then combined
hierarchically to get the final Klout Score.
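The flow summarized above can be sketched in a few lines. The sketch assumes a plain weighted sum of features for each network score and a weighted average for the combination step; both are simplifying assumptions made for illustration, and the feature names, model weights, and source importances are hypothetical rather than values from the Klout system.

```python
def weighted_feature_score(features, weights):
    """Score for one network: weighted sum of its features (assumed linear model)."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())

def combine_scores(component_scores, importance):
    """Combine component scores by a weighted average (assumed combination rule)."""
    total = sum(importance.get(name, 0.0) for name in component_scores)
    if total == 0:
        return 0.0
    weighted = sum(
        importance.get(name, 0.0) * score
        for name, score in component_scores.items()
    )
    return weighted / total

# Hypothetical normalized features and model weights for two sources.
features = {
    "twitter": {"retweets": 0.8, "mentions": 0.5},
    "wikipedia": {"inlinks": 0.9},
}
weights = {
    "twitter": {"retweets": 60.0, "mentions": 40.0},
    "wikipedia": {"inlinks": 50.0},
}
importance = {"twitter": 0.75, "wikipedia": 0.25}

# Level 1: one score per network. Level 2: combine networks into a final score.
network_scores = {
    name: weighted_feature_score(features[name], weights[name])
    for name in features
}
final_score = combine_scores(network_scores, importance)
print(network_scores, final_score)
```

Because each level only consumes the scores of the level below it, a new source can be added by supplying its features, weights, and importance without changing the combination code.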
Our experiments also validate that users with higher
Klout Scores are able to spread information more widely in a
network. We further compare the performance of the
score against other ranking systems and also analyze
the dynamic nature of the score. We examine different
topical domains and find that highly influential users are
correctly identified within these domains by their high
scores.
The Klout Score presented here is, of course, only a
partial representation of the influence of a user. Nev-
ertheless, by building an extensible feature generation
framework and a hierarchical scoring structure, the sys-
tem is able to easily incorporate new sources of informa-
tion, and therefore grow more accurate over time. Several
applications may be potentially built using an influence
scoring system such as the Klout Score, and we hope this
work enables future work in this area.
References
[1] M. Schaefer, “Return on influence: The revolutionary
power of klout, social scoring, and influence marketing.”
McGraw-Hill Professional: London, 2012.
[2] J. Tang, J. Sun, C. Wang, and Z. Yang, “Social influence
analysis in large-scale networks,” in SIGKDD’09, 2009.
[3] N. Agarwal, H. Liu, L. Tang, and P. S. Yu, “Identifying
the influential bloggers in a community,” in WSDM’08,
2008.
[4] J. Weng, E.-P. Lim, J. Jiang, and Q. He, “Twitter-
rank: finding topic-sensitive influential twitterers,” in
WSDM’10, 2010.
[5] D. Kempe, J. Kleinberg, and E. Tardos, “Maximizing
the spread of influence through a social network,” in
SIGKDD’03, 2003.
[6] W. Chen, Y. Wang, and S. Yang, “Efficient influence
maximization in social networks,” in SIGKDD’09, 2009.
[7] S. Chen, J. Fan, G. Li, J. Feng, K.-l. Tan, and J. Tang,
“Online topic-aware influence maximization,” Proceed-
ings of the VLDB Endowment, vol. 8, no. 6, pp. 666–677,
2015.
[8] A. Khrabrov and G. Cybenko, “Discovering influence
in communication networks using dynamic graph analy-
sis,” in IEEE Second International Conference on Social
Computing (SocialCom), 2010.
[9] B. Hajian and T. White, “Modelling influence in a social
network: Metrics and evaluation,” in IEEE Third
International Conference on Social Computing (SocialCom),
2011.
[10] N. Spasojevic, Z. Li, A. Rao, and P. Bhattacharyya,
“When-to-post on social networks,” in Proc. of ACM
Conference on Knowledge Discovery and Data Mining
(KDD), ser. KDD ’15, 2015.
[11] C. Edwards, P. R. Spence, C. J. Gentile, A. Edwards,
and A. Edwards, “How much klout do you have . . . a test
of system generated cues on source credibility,” Computers
in Human Behavior, vol. 29, pp. A12–A16, 2013.
[12] N. Spasojevic, J. Yan, A. Rao, and P. Bhattacharyya,
“Lasta: Large scale topic assignment on multiple social
networks,” in Proc. of ACM Conference on Knowledge
Discovery and Data Mining (KDD), ser. KDD ’14, 2014.

Supplementary resource (1)

... Multiple platforms have been analyzed in [45,63]. In [45], a score is assigned to each user (on a daily basis) through a supervised model that assigns weights to 3,550 features related to the network structure (i.e., graph properties), to the profile of a user (e.g., education), and to the interactions among users (e.g., comment or post). ...
... Multiple platforms have been analyzed in [45,63]. In [45], a score is assigned to each user (on a daily basis) through a supervised model that assigns weights to 3,550 features related to the network structure (i.e., graph properties), to the profile of a user (e.g., education), and to the interactions among users (e.g., comment or post). ...
... For this reason, such actions are outside the scope of this work. Instead, we will focus on active actions that can be observed by all the members of the group (such as publication of contents, comments, reply, and reactions) [45]. ...
Article
Full-text available
The widespread adoption of Online Social Networks (OSNs), the ever-increasing amount of information produced by their users, and the corresponding capacity to influence markets, politics, and society, have led both industrial and academic researchers to focus on how such systems could be influenced . While previous work has mainly focused on measuring current influential users, contents, or pages on the overall OSNs, the problem of predicting influencers in OSNs has remained relatively unexplored from a research perspective. Indeed, one of the main characteristics of OSNs is the ability of users to create different groups types, as well as to join groups defined by other users, in order to share information and opinions. In this article, we formulate the Influencers Prediction problem in the context of groups created in OSNs, and we define a general framework and an effective methodology to predict which users will be able to influence the behavior of the other ones in a future time period, based on historical interactions that occurred within the group. Our contribution, while rooted in solid rationale and established analytical tools, is also supported by an extensive experimental campaign. We investigate the accuracy of the predictions collecting data concerning the interactions among about 800,000 users from 18 Facebook groups belonging to different categories (i.e., News, Education, Sport, Entertainment, and Work). The achieved results show the quality and viability of our approach. For instance, we are able to predict, on average, for each group, around a third of what an ex-post analysis will show being the 10 most influential members of that group. While our contribution is interesting on its own and—to the best of our knowledge—unique, it is worth noticing that it also paves the way for further research in this field.
... De nombreux travaux ont identifiés des indicateurs pour mesurer quantitativement l'influence numérique d'un utilisateur. Dans les premières années de ces plateformes, le nombre d'abonnés d'un utilisateur sur le réseau définissait son score d'influence numérique mais cette mesure a été rapidement révolue [103,95,88,143,113,134]. En réalité, la grande popularité définit l'audience numérique qui peut ne pas prêter d'attention à ce que l'utilisateur publie. ...
... Au délà de la recherche scientifique et académique, certaines de ces approches sont implémentées dans des systèmes opérationnels et professionnels. Les propriétaires de certaines de ces systèmes tels que Klout 3 [134,113], Kred 4 [114,97], PerIndex 5 [114] indiquent avoir dans leurs algorithmes des coefficients de qualité qui permettent de considérer le score d'influence numérique calculé comme révélateur de celle de l'utilisateur dans la vie réelle. ...
Thesis
La recherche des conditions et des moyens permettant de représenter, de mieux contrôler et de communiquer efficacement avec son environnement a toujours été au centre des préoccupations de l'humain. Si les avancées technologiques telles que des téléphones intelligents (smartphones), le Web et les réseaux sociaux numériques nous permettent aujourd'hui de répondre à cette préoccupation, tel n'a pas toujours été le cas. Nos travaux dans cette thèse traitent des documents médiévaux considérés comme un moyen de communication et comme un modèle de réseau social du Moyen Âge. Ces documents, dénommés enluminures médiévales, sont des peintures luxueuses qui auraient été utilisées à cette époque pour représenter l'environnement réel et des mondes idéaux pour les élites tels que les princes, les ducs, etc. Ces peintures contiennent beaucoup d'entités symboliques et métaphoriques comme des ornements, les animaux fabuleux, etc. L'ambiguïté sémantique de telles entités, certaines de leurs combinaisons et les contextes de réalisation des enluminures elles-mêmes font que leurs interprétations et la compréhension des messages véhiculés sont actuellement réservées exclusivement aux médiévistes. Dans nos travaux, nous proposons une approche qui réduit l'hétérogénéité sémantique des entités de notre domaine d'intérêt (les enluminures, les réseaux sociaux, l'influence sociale). Elle permet également aux médiévistes de transmettre leur savoir-faire sur la compréhension de ces peintures médiévales à travers une formalisation des connaissances qu'elles contiennent. L'ambition est de permettre une compréhension plus générale des enluminures et des interprétations de leurs entités. Pour ce faire, nous exploitons des techniques et des méthodes de l'ingénierie des connaissances qui seront appliquées aux enluminures. Les contenus de ces dernières sont extraits et représentés dans un modèle formel. 
Ce modèle sert dans une application Web permettant à tout utilisateur de décrire des enluminures et de détecter automatiquement certains de leurs contenus. Cette détection automatique est effectuée grâce à des méthodes d'apprentissage machine (machine learning) implémentées dans l'application Web. Parallèlement, une étude comparative entre les principes de propagation de l'influence sociale à travers les enluminures et ceux à travers les réseaux sociaux numériques est faite. Cette comparaison donne des éléments de réflexion sur les mesures par lesquelles les questions relatives à cette notion d'influence sociale, telles que son évaluation, sa qualité, sa pérennité, sa maximisation, pourraient être traitées au mieux.
... O tipo de reação indica a força de influência que a mensagem teve no usuário. (Rao, Spasojevic, Li & Souza, 2015, p. 2282. ...
Chapter
Full-text available
Este capítulo tem dois objetivos gerais: criticar as métricas contemporâneas das mídias sociais, sobretudo aquelas descritas como métricas de vaidade, e desenvolver um conjunto alternativo de métricas (análise crítica), que desloquem o foco da mensuração do self online e da vaidade em mídias sociais para a rede de questões problemáticas (issue networks) e para o engajamento. A justificativa para essa mudança de foco diz respeito ao fato de as mídias sociais não serem apenas espaços para a performance de si e para uma produtiva rede social de contatos (networking), mas um local para a mobilização de públicos em torno de questões e causas sociais.
... Even though it was discontinued in May 2018, research suggests that it is a good proxy for credibility[55]. Klout score the ranges from zero to 100 and is based on three components across nine different social media platforms: (i) true reach, i.e. how many people a user influences; (ii) amplification, i.e. how much the user influences them; and (iii) network impact, i.e. the influence of the user's network[56,57].2 All applied filters were not case sensitive.2 Blockchain: A Technology in Search of Legitimacy ...
Chapter
Blockchain can potentially disrupt business processes across many industries. Despite substantial hype, blockchain adoption still remains quite low. Research suggests that the success and adoption of a new technology depends on whether it is deemed legitimate by key stakeholders. This is extremely important in the context of blockchain given the controversial reputation that some derivative innovations, such as Bitcoin, have gained over time. This chapter explores and compares how four types of business actors which play a prominent role in the blockchain ecosystem—the media, IT, financial services and consulting firms—attempted to build legitimacy around blockchain and Bitcoin on Twitter. Findings suggest that the messaging strategies employed by firms for blockchain and Bitcoin differ. While no dominant firm-level messaging strategies to build legitimacy around Bitcoin emerge, firms primarily employ three micro-level legitimation strategies to build legitimacy around blockchain. More specifically, communication strategies around blockchain focus on publishing the involvement of influential actors (pragmatic legitimacy), highlighting blockchain ongoing developments and associated market responses (cognitive legitimacy), and presenting technological advancements directly or indirectly related to blockchain (cognitive legitimacy). The findings illustrate how the key actors engage in the blockchain and Bitcoin legitimation discourse to help interpret this complex innovation and mobilise the market to adopt.
Chapter
Full-text available
Cyber World has become accessible, public and commonly used to distribute and exchange messages between malicious actors, terrorists, and illegally motivated persons. Electronic mail is one of the most frequently used transfers of information on internet media. E-mails are the most important digital proof that courts in various countries and communities use to condemn and that enables researchers to work continually to improve e-mail analysis using state-of-the-art technology to find digital evidence from e-mails. This work introduces a distinctive technology to analyze emails. It is based on consecutive phases, starting with data processing, extraction, compilation, then implementing the SWARM algorithm to adjust the output and to transfer these electronic mails for realistic and precise results by adjusting the support algorithm of vector machines. For email forensic analysis this system includes all the sentiment terms plus positives and negatives cases. It can deal with the machine learning algorithm (Sent WordNet 3.0). Enron Data set is used to test the proposed framework. In the best case, a high accuracy rate is 92%.
Chapter
Recently, by the explosion of information technology, the valuable and available data exponentially increases in various social media platforms which allow us to exploit and attain convenient information and transform it into knowledge. This means that prominent topics are extracted on time in the social media community by leveraging the proper techniques and methods. Although there are various novel approaches in this area, they almost ignore the factors of the user interactions. Besides, since the enormous size of the textual dataset is distributed to any languages and the requirements for trending detection in a specific language, most of the proposed methods concentrating on English. In this paper, we proposed a graph-based method for the Vietnamese dataset in which graph nodes represent posts. More specifically, the approach combines the user interactions with a word graph representation which then is extracted to the topic trends by the RankClus Algorithm. By applying our proposal in several Facebook and Twitter datasets, we introduce dominantly dependable and coherent results.
Article
The recent privacy incidents reported in major media about global social networks raised real public concerns about centralized architectures. P2P social networks constitute an interesting paradigm to give back users control over their data and relations. While basic social network functionalities such as commenting, following, sharing, and publishing content are widely available, more advanced features related to information retrieval and recommendation are still challenging. This is due to the absence of a central server that has a complete view of the network. In this paper, we propose a new recommender system called P2PCF. We use collaborative filtering approach to recommend content in P2P social networks. P2PCF enables privacy preserving and tackles the cold start problem for both users and content. Our proposed approach assumes that the rating matrix is distributed within peers, in such a way that each peer only sees interactions made by her friends on her timeline. Recommendations are then computed locally within each peer before they are sent back to the requester. Our evaluations prove the effectiveness of our proposal compared to a centralized scheme in terms of recall and coverage.
Article
Full-text available
Traditionally, banks follow a risk assessment model in sanctioning loans. Risk assessment is performed by computing a credit score considering certain financial factors. This work proposes a behavioral score that can be computed from social media data. Social media covers almost all aspects of a person’s life. Integrating the credit score with the behavioral score of a person lowers the risk that comes with traditional assessment models. The behavioral score is measured by the profile score, financial attitude, and twit score. A general profile score is computed for the data fetched from Twitter. The twit score of a person is calculated by considering multiple parameters like relevance, usage, and authenticity. Additionally, to strengthen the model, a novel multi-level voting ensemble is implemented with 84% accuracy to scrutinize the financial attitude of the individuals. Pair wise comparison is used to reveal the importance of the various criteria analyzed. The behavioral score is derived by aggregating the three scores accordingly. This research work proposes fusing social media details as an added risk evaluation feature in granting loans.
Article
Full-text available
Purpose A major part of knowledge management for knowledge-intensive firms such as professional service firms is the increasing focus on thought leadership. Despite being a well-known term, it is poorly defined and analysed in the academic and practitioner literature. The aim of this article is to answer three questions. First, what is thought leadership? Second, what tensions exist when seeking to create thought leadership in knowledge-based organisations? Third, what further research is needed about thought leadership? The authors call for cross-disciplinary and academic–practitioner approaches to understanding the field of thought leadership. Design/methodology/approach The authors review the academic and practitioner literature on thought leadership to provide a rich oversight of how it is defined and can be understood by separating inputs, creation processes and outcomes. The authors also draw on qualitative data from 12 in-depth interviews with senior leaders of professional service firms. Findings Through analysing and building on previous understandings of the concept, the authors redefine thought leadership as follows: “Knowledge from a trusted, eminent and authoritative source that is actionable and provides valuable solutions for stakeholders”. The authors find and explore nine tensions that developing thought leadership creates and propose a framework for understanding how to engage with thought leadership at the industry/macro, organisational/meso and individual/micro levels. The authors propose a research agenda based on testing propositions derived from new theories to explain thought leadership, including leadership, reducing risk, signalling quality and managing social networks, as well as examining the suggested ways to resolve different tensions. Originality/value To the best of the authors’ knowledge, they are the first to separate out thought leadership from its inputs, creation processes and outcomes. 
The authors show new organisational paradoxes within thought leadership and show how they can play out at different levels of analysis when implementing a thought leadership strategy. This work on thought leadership is set in a relatively under-explored context for knowledge management researchers, namely, knowledge-intensive professional service firms.
Chapter
In a world flooded with information, often irrelevant, lucidity is power. Never as in this historical period can anyone, thanks to new technologies, participate as a protagonist in the debates raised about events and issues that affect our society. In this flood of information, remaining lucid and knowing how to discriminate between real and false becomes fundamental. In this scenario, a leading role is played by Fake News, information that is partly or entirely untrue, divulged through the Web, the media, or digital communication technologies. Fake news is characterized by an apparent plausibility, the latter fed by a distorted system of public opinion expectations, and by an amplification of the prejudices based on it, which facilitates its sharing and diffusion even in the absence of verification of the sources. Fake News is becoming a severe problem that affects various sectors of society: medicine, politics, culture, history are some of the areas that suffer most from the phenomenon of fake news, which can often generate significant social problems. This paper will introduce a probabilistic approach to determining the degree of truthfulness of the information. The system is based on the definition of some features, identified after an analysis of fake news in the literature through NLP-based approaches and statistical methods. The specified features will highlight the syntactic, semantic, and social features of the information. These features are combined in a Bayesian Network, previously trained on a dataset composed of fake news, to provide a probabilistic level of the truthfulness of the information analyzed. The proposed method has been tested in some real cases with very satisfactory results.
Article
Full-text available
For many users on social networks, one of the goals when broadcasting content is to reach a large audience. The probability of receiving reactions to a message differs for each user and depends on various factors, such as location, daily and weekly behavior patterns and the visibility of the message. While previous work has focused on overall network dynamics and message flow cascades, the problem of recommending personalized posting times has remained an under-explored topic of research. In this study, we formulate a when-to-post problem, where the objective is to find the best times for a user to post on social networks in order to maximize the probability of audience responses. To understand the complexity of the problem, we examine user behavior in terms of post-to-reaction times, and compare cross-network and cross-city weekly reaction behavior for users in different cities, on both Twitter and Facebook. We perform this analysis on over a billion posted messages and observed reactions, and propose multiple approaches for generating personalized posting schedules. We empirically assess these schedules on a sampled user set of 0.5 million active users and more than 25 million messages observed over a 56 day period. We show that users see a reaction gain of up to 17% on Facebook and 4% on Twitter when the recommended posting times are used. We open the dataset used in this study, which includes timestamps for over 144 million posts and over 1.1 billion reactions. The personalized schedules derived here are used in a fully deployed production system to recommend posting times for millions of users every day.
Article
Full-text available
Millions of people use social networks everyday to talk about a variety of subjects, publish opinions and share information. Understanding this data to infer user's topical interests is a challenging problem with applications in various data-powered products. In this paper, we present 'LASTA' (Large Scale Topic Assignment), a full production system used at Klout, Inc., which mines topical interests from five social networks and assigns over 10,000 topics to hundreds of millions of users on a daily basis. The system continuously collects streams of user data and is reactive to fresh information, updating topics for users as interests shift. LASTA generates over 50 distinct features derived from signals such as user generated posts and profiles, user reactions such as comments and retweets, user attributions such as lists, tags and endorsements, as well as signals based on social graph connections. We show that using this diverse set of features leads to a better representation of a user's topical interests as compared to using only generated text or only graph based features. We also show that using cross-network information for a user leads to a more complete and accurate understanding of the user's topics, as compared to using any single network. We evaluate LASTA's topic assignment system on an internal labeled corpus of 32,264 user-topic labels generated from real users.
Conference Paper
Full-text available
Blogging becomes a popular way for a Web user to publish information on the Web. Bloggers write blog posts, share their likes and dislikes, voice their opinions, provide sugges- tions, report news, and form groups in Blogosphere. Blog- gers form their virtual communities of similar interests. Ac- tivities happened in Blogosphere affect the external world. One way to understand the development on Blogosphere is to find influential blog sites. There are many non-influential blog sites which form the "the long tail". Regardless of a blog site being influential or not, there are influential blog- gers. Inspired by the high impact of the influentials in a physical community, we study a novel problem of identify- ing influential bloggers at a blog site. Active bloggers are not necessarily influential. Influential bloggers can impact fellow bloggers in various ways. In this paper, we discuss the challenges of identifying influential bloggers, investigate what constitutes influential bloggers, present a preliminary model attempting to quantify an influential blogger, and pave the way for building a robust model that allows for finding various types of the influentials. To illustrate these issues, we conduct experiments with data from a real-world blog site, evaluate multi-facets of the problem of identify- ing influential bloggers, and discuss unique challenges. We conclude with interesting findings and future work.
Conference Paper
Full-text available
This paper focuses on the problem of identifying influential users of micro-blogging services. Twitter, one of the most notable micro-blogging services, employs a social-networking model called "following", in which each user can choose who she wants to "follow" to receive tweets from without requiring the latter to give permission first. In a dataset prepared for this study, it is observed that (1) 72.4% of the users in Twitter follow more than 80% of their followers, and (2) 80.5% of the users have 80% of users they are following follow them back. Our study reveals that the presence of "reciprocity" can be explained by phenomenon of homophily. Based on this finding, TwitterRank, an extension of PageRank algorithm, is proposed to measure the influence of users in Twitter. TwitterRank measures the influence taking both the topical similarity between users and the link structure into account. Experimental results show that TwitterRank outperforms the one Twitter currently uses and other related algorithms, including the original PageRank and Topic-sensitive PageRank.
Article
Influence maximization, whose objective is to select k users (called seeds) from a social network such that the number of users influenced by the seeds (called influence spread) is maximized, has attracted significant attention due to its widespread applications, such as viral marketing and rumor control. However, in real-world social networks, users have their own interests (which can be represented as topics) and are more likely to be influenced by their friends (or friends' friends) with similar topics. We can increase the influence spread by taking into consideration topics. To address this problem, we study topic-aware influence maximization, which, given a topic-aware influence maximization (TIM) query, finds k seeds from a social network such that the topic-aware influence spread of the k seeds is maximized. Our goal is to enable online TIM queries. Since the topic-aware influence maximization problem is NP-hard, we focus on devising efficient algorithms to achieve instant performance while keeping a high influence spread. We utilize a maximum influence arborescence (MIA) model to approximate the computation of influence spread. To efficiently find k seeds under the MIA model, we first propose a best-effort algorithm with 1 − 1/e approximation ratio, which estimates an upper bound of the topic-aware influence of each user and utilizes the bound to prune large numbers of users with small influence. We devise effective techniques to estimate tighter upper bounds. We then propose a faster topic-sample-based algorithm with ε · (1 − 1/e) approximation ratio for any ε ∈ (0, 1], which materializes the influence spread of some topic-distribution samples and utilizes the materialized information to avoid computing the actual influence of users with small influences. Experimental results show that our methods significantly outperform baseline approaches.
Conference Paper
In large social networks, nodes (users, entities) are influenced by others for various reasons. For example, the colleagues have strong influence on one's work, while the friends have strong influence on one's daily life. How to differentiate the social influences from dif- ferent angles(topics)? How to quantify the strength of those social influences? How to estimate the model on real large networks? To address these fundamental questions, we propose Topical Affin- ity Propagation (TAP) to model the topic-level social influence on large networks. In particular, TAP can take results of any topic modeling and the existing network structure to perform topic-level influence propagation. With the help of the influence analysis, we present several important applications on real data sets such as 1) what are the representative nodes on a given topic? 2) how to iden- tify the social influences of neighboring nodes on a particular node? To scale to real large networks, TAP is designed with efficient distributed learning algorithms that is implemented and tested un- der the Map-Reduce framework. We further present the common characteristics of distributed learning algorithms for Map-Reduce. Finally, we demonstrate the effectiveness and efficiency of TAP on real large data sets.
Conference Paper
Influence maximization is the problem of finding a small subset of nodes (seed nodes) in a social network that could maximize the spread of influence. In this paper, we study the influence maxi- mization problem from two angles in order to significantly reduce the running time of existing algorithms. One is to improve the orig- inal greedy algorithm of (6) and its improvement (9), and the sec- ond is to propose new degree discount heuristics for the problem. We evaluate our algorithms by experiments on two large academic collaboration graphs obtained from the online archival database arXiv.org. Our experimental results show that (a) our improved greedy algorithm achieves better running time comparing with the improvement of (9) with matching influence spread, (b) our degree discount heuristics achieve much better influence spread than clas- sic degree and centrality-based heuristics, and when tuned for a specific influence cascade model, it achieve almost matching influ- ence thread with the greedy algorithm, and more importantly (c) the degree discount heuristics run only in milliseconds while even the improved greedy algorithms run in hours in our experiment graph with a few tens of thousands of nodes. Base on our results, we believe that fine-tuned heuristics may provide very promising solutions to the influence maximization problem with satisfying influence spread and blazingly fast running time. This is a counter argument to the conclusion of (6) that tra- ditional heuristics cannot compete with the greedy approximation algorithm. All of our experimental data and source code will be made available soon on the first author's web site (http://research.microsoft.com/en-us/people/weic/).
Conference Paper
Social recommender systems are a recently introduced type of decision support system. One of the issues to be resolved in social recommender systems is the identification of opinion leaders in a network. The focus of this paper is the analysis of a network based on the interactions between users called behavioral analysis. The hypothesis explored in this paper is that Influence Rank can be quantified based on the interaction between users and their behavior. The Influence Rank for a node is defined as the average Influence Rank of its neighborhoods combined with another index called Magnitude of Influence. The correlation between the proposed indices is analyzed in this paper. This combined measure is calculated by a recursive algorithm whose calculation complexity is non-polynomial. However, this measure can be estimated by using the Page Rank algorithm. Results supporting the utility of the measure and the accuracy of its estimation using the Page Rank approximation are presented.
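The abstract above states that its recursive measure can be estimated with the PageRank algorithm. As a rough illustration of that estimation step only, here is a standard PageRank power iteration on a small interaction graph. This is textbook PageRank, not the paper's Influence Rank; the damping factor of 0.85, the iteration count, and the example graph are assumptions made for the sketch.

```python
def pagerank(out_links, damping=0.85, iterations=50):
    """Standard PageRank by power iteration.

    out_links: dict mapping each node to a list of nodes it points to.
    Every target must also appear as a key. Nodes with no outgoing
    links spread their rank evenly over all nodes.
    """
    nodes = list(out_links)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node receives the same teleportation share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node, targets in out_links.items():
            # A dangling node is treated as linking to every node.
            receivers = targets if targets else nodes
            share = damping * rank[node] / len(receivers)
            for target in receivers:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical interaction graph: an edge u -> v means u interacted with v.
interactions = {"a": ["c"], "b": ["c"], "c": ["a"]}
scores = pagerank(interactions)
```

Here "c" receives interactions from both other users and ends up with the highest score, while "b", which nobody interacts with, keeps only the teleportation share.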
Conference Paper
The rise of Internet-based social networks has shifted many decision-impacting discussions online. Increasingly, people weigh new ideas, choose products, pick technologies, find entertainment and socialize virtually by engaging in online discourse. The participants depend on who people find online, who they get to know and trust, and who they consider as authorities on subjects of interest. This paper presents techniques to track who has influence in such a network and how they got there. Many definitions of influence are possible; here we focus specifically on the social interaction and its dynamics, using Twitter as the reference network and data source. We build a replier graph from each user A's messages mentioning another user B (which may be either "for" or "about" B), and study how this graph evolves. (In a tweet from A mentioning @B, A is the replier mentioning B.) For every day in the study, we compute a pagerank-type score and a drank, a dynamic function of the pagerank, for all users, together with a series of features such as the number of mentions a user gives or receives. The daily-versioned features enable exploratory data analysis of the conversational dynamics by looking at the relative decline or growth in specific features for every user every day, separately or relative to others. For instance, we find the longest periods of growth in the number of times a user A is mentioned by other users on a day d, m=|M(A,d)|, over a contiguous period of days, and also compute its acceleration over that period, dm/dt. Those accelerating the most, or sustaining the longest growth, or both, are worth closer modeling. Our metrics are applicable to any evolving directed graphs and allow us to find people of growing influence in social networks based purely on the structure and dynamics of their conversations. These are the first dynamic metrics for social networks which take into account both global and local influence (pagerank and repliers), and can be applied to other communication networks as well. Most interestingly, using them, we uncover a high-intensity ecosystem with its own "mind economy," adapting to maximize the participants' rankings and promote their shared message.
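The growth analysis described above can be sketched roughly as follows: given a user's daily mention counts m, find the longest contiguous run of days on which the count keeps rising, then compute dm/dt over that run. The function names, the strict-increase criterion, and the use of the average daily change as dm/dt are assumptions for illustration; the abstract does not give the exact definitions.

```python
def longest_growth_run(counts):
    """Return (start, end) indices of the longest strictly increasing
    contiguous run in a list of daily counts. Ties go to the earliest run.
    """
    best_start, best_end = 0, 0
    start = 0
    for i in range(1, len(counts)):
        # A day without growth ends the current run.
        if counts[i] <= counts[i - 1]:
            start = i
        if i - start > best_end - best_start:
            best_start, best_end = start, i
    return best_start, best_end


def mean_growth_rate(counts, start, end):
    """Average change in the count per day between two day indices,
    used here as a simple stand-in for dm/dt over the period.
    """
    if end == start:
        return 0.0
    return (counts[end] - counts[start]) / (end - start)


# Hypothetical daily mention counts for one user.
daily_mentions = [3, 5, 4, 6, 9, 15, 2]
run_start, run_end = longest_growth_run(daily_mentions)
rate = mean_growth_rate(daily_mentions, run_start, run_end)
```

For these counts the longest growth period spans day indices 2 through 5, where mentions rise from 4 to 15. Ranking users by run length, by rate, or by both gives a shortlist of the kind the abstract says is "worth closer modeling".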