Trends in Social Media : Persistence and Decay

Abstract

Social media generates a prodigious wealth of real-time content at an incessant rate. From all the content that people create and share, only a few topics manage to attract enough attention to rise to the top and become temporal trends which are displayed to users. The question of what factors cause the formation and persistence of trends is an important one that has not been answered yet. In this paper, we conduct an intensive study of trending topics on Twitter and provide a theoretical basis for the formation, persistence and decay of trends. We also demonstrate empirically how factors such as user activity and number of followers do not contribute strongly to trend creation and its propagation. In fact, we find that the resonance of the content with the users of the social network plays a major role in causing trends.
Electronic copy available at: http://ssrn.com/abstract=1755748
Sitaram Asur
Social Computing Lab
HP Labs
Palo Alto, California, USA
sitaram.asur@hp.com
Bernardo A. Huberman
Social Computing Lab
HP Labs
Palo Alto, California, USA
bernardo.huberman@hp.com
Gabor Szabo
Social Computing Lab
HP Labs
Palo Alto, California, USA
gabors@hp.com
Chunyan Wang
Dept. of Applied Physics
Stanford University
Palo Alto, California, USA
chunyan@stanford.edu
1. INTRODUCTION
Social media is growing at an explosive rate, with millions of peo-
ple all over the world generating and sharing content on a scale
barely imaginable a few years ago. This has resulted in massive
participation with countless number of updates, opinions, news,
comments and product reviews being constantly posted and dis-
cussed in social web sites such as Facebook, Digg and Twitter, to
name a few.
This widespread generation and consumption of content has cre-
ated an extremely competitive online environment where different
types of content vie with each other for the scarce attention of the
user community. In spite of the seemingly chaotic fashion with
which all these interactions take place, certain topics manage to
attract an inordinate amount of attention, thus bubbling to the top
in terms of popularity. Through their visibility, these popular topics
contribute to the collective awareness of what is trending and at
times can also affect the public agenda of the community.
At present there is no clear picture of what causes these topics to
become extremely popular, nor how some persist in the public eye
longer than others. There is considerable evidence that one aspect
that causes topics to decay over time is their novelty [11]. Another
factor responsible for their decay is the competitive nature of the
medium. As content starts propagating through a social network
it can usurp the positions of earlier topics of interest, and due to
the limited attention of users it is soon rendered invisible by newer
content. Yet another aspect responsible for the popularity of certain
topics is the influence of members of the network on the propaga-
tion of content. Some users generate content that resonates very
strongly with their followers thus causing the content to propagate
and gain popularity [9].
The source of that content can originate in standard media outlets
or from users who generate topics that eventually become part of
the trends and capture the attention of large communities. In either
case the fact that a small set of topics become part of the trending
set means that they will capture the attention of a large audience
for a short time, thus contributing in some measure to the public
agenda. When topics originate in media outlets, the social medium
acts as a filter and amplifier of what the standard media produces and
thus contributes to the agenda setting mechanisms that have been
thoroughly studied for more than three decades [7].
In this paper, we study trending topics on Twitter, an immensely
popular microblogging network on which millions of users create
and propagate an enormous amount of content in a steady stream every day.
The trending topics, which are shown on the main website, represent
those pieces of content that bubble to the surface on Twitter
owing to frequent mentions by the community. Thus they can be
equated to crowdsourced popularity. We then determine the fac-
tors that contribute to the creation and evolution of these trends, as
they provide insight into the complex interactions that lead to the
popularity and persistence of certain topics on Twitter, while most
others fail to catch on and are lost in the flow.
We first analyze the distribution of the number of tweets across
trending topics. We observe that they are characterized by a strong
log-normal distribution, similar to that found in other networks
such as Digg and which is generated by a stochastic multiplicative
process [11]. We also find that the decay function for the tweets is
mostly linear. Subsequently we study the persistence of the trends
to determine which topics last long at the top. Our analysis reveals
that there are few topics that last for long times, while most topics
break fairly quickly, on the order of 20-40 minutes. Finally, we look
at the impact of users on trend persistence times within Twitter. We
find that traditional notions of user influence such as the frequency
of posting and the number of followers are not the main drivers
of trends, as previously thought. Rather, long trends are charac-
terized by the resonating nature of the content, which is found to
arise mainly from traditional media sources. We observe that social
media behaves as a selective amplifier for the content generated by
traditional media, with chains of retweets by many users leading to
the observed trends.
2. RELATED WORK
There has been some prior work on analyzing connections on Twit-
ter. Huberman et al. [5] studied social interactions on Twitter to
reveal that the driving process for usage is a sparse hidden network
underlying the friends and followers, while most of the links rep-
resent meaningless interactions. Jansen et al. [6] have examined
Twitter as a mechanism for word-of-mouth advertising. They con-
sidered particular brands and products and examined the structure
of the postings and the change in sentiments. Galuba et al. [4]
proposed a propagation model that predicts which users will tweet
about which URL based on the history of past user activity.
Yang and Leskovec [12] examined patterns of temporal behavior
for hashtags in Twitter. They presented a stable time series clustering
algorithm and demonstrated the common temporal patterns that
tweets containing hashtags follow. There have also been earlier
studies focused on social influence and propagation. Agarwal et
al. [1] studied the problem of identifying influential bloggers in the
blogosphere. They discovered that the most influential bloggers
were not necessarily the most active. Aral et al. [2] have distinguished
the effects of homophily from influence as motivators for
propagation. As to the study of influence within Twitter, Cha et
al. [3] performed a comparison of three different measures of influ-
ence - indegree, retweets, and user mentions. They discovered that
while retweets and mentions correlated well with each other, the in-
degree of users did not correlate well with the other two measures.
Based on this, they hypothesized that the number of followers may
not be a good measure of influence. Recently, Romero et al. [9]
introduced a novel influence measure that takes into account the
passivity of the audience in the social network. They developed an
iterative algorithm to compute influence in the style of the HITS al-
gorithm and empirically demonstrated that the number of followers
is a poor measure of influence.
3. TWITTER
Twitter is an extremely popular online microblogging service that
has gained a very large user following, consisting of close to 200
million users. The Twitter graph is a directed social network, where
each user chooses to follow certain other users. Each user submits
periodic status updates, known as tweets, that consist of short mes-
sages limited in size to 140 characters. These updates typically
consist of personal information about the users, news or links to
content such as images, video and articles. The posts made by a
user are automatically displayed on the user’s profile page, as well
as shown to his followers. A retweet is a post originally made by
one user that is forwarded by another user. Retweets are useful for
propagating interesting posts and links through the Twitter commu-
nity.
Twitter has attracted lots of attention from corporations due to the
immense potential it provides for viral marketing. Due to its huge
reach, Twitter is increasingly used by news organizations to dis-
seminate news updates, which are then filtered and commented on
by the Twitter community. A number of businesses and organiza-
tions are using Twitter or similar micro-blogging services to adver-
tise products and disseminate information to stockholders.
4. TWITTER TRENDS DATA
Trending topics are presented as a list by Twitter on their main Twit-
ter.com site, and are selected by an algorithm proprietary to the
service. They mostly consist of two to three word expressions, and
we can assume with a high confidence that they are snippets that
appear more frequently in the most recent stream of tweets than
one would expect from a document term frequency analysis such
as TFIDF. The list of trending topics is updated every few minutes
as new topics become popular.
Twitter provides a Search API for extracting tweets containing par-
ticular keywords. To obtain the dataset of trends for this study,
we repeatedly used the API in two stages. First, we collected the
trending topics by doing an API query every 20 minutes. Second,
for each trending topic, we used the Search API to collect all the
tweets mentioning this topic over the past 20 minutes. For each
tweet, we collected the author, the text of the tweet and the time it
was posted. Using this procedure for data collection, we obtained
16.32 million tweets on 3361 different topics over a course of 40
days in Sep-Oct 2010.
We picked 20 minutes as the duration of a timestamp after evaluat-
ing different time lengths, to optimize the discovery of new trends
while still capturing all trends. This is due to the fact that Twit-
ter only allows 1500 tweets per search query. We found that with
20 minute intervals, we were able to capture all the tweets for the
trending topics efficiently.
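The two-stage collection procedure described above can be sketched as follows. This is only an illustration under stated assumptions: `fetch_trends` and `search_tweets` are hypothetical callables supplied by the caller, not real Twitter API functions, and only the 20-minute interval and the 1500-tweet cap are taken from the text.

```python
from typing import Callable, Dict, List

INTERVAL_MINUTES = 20        # polling interval chosen in the text
MAX_TWEETS_PER_QUERY = 1500  # per-query cap mentioned in the text


def collect_round(
    fetch_trends: Callable[[], List[str]],
    search_tweets: Callable[[str, int], List[dict]],
) -> Dict[str, List[dict]]:
    """Run one polling round: list the trends, then pull recent tweets per trend.

    Both callables are hypothetical placeholders for the API access layer.
    """
    collected: Dict[str, List[dict]] = {}
    for topic in fetch_trends():
        tweets = search_tweets(topic, INTERVAL_MINUTES)
        # Reaching the cap suggests the interval was too long to see every tweet.
        if len(tweets) >= MAX_TWEETS_PER_QUERY:
            print(f"warning: '{topic}' reached the query cap")
        collected[topic] = tweets
    return collected
```

Passing the two access functions in as arguments keeps the sketch independent of any particular client library; a scheduler would call `collect_round` once per interval.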
We noticed that many topics become trends again after they stop
trending according to the Twitter trend algorithm. We therefore
considered these trends as separate sequences: it is very likely that
the spreading mechanism of trends has a strong time component
with an initial increase and a trailing decline, and once a topic stops
trending, it should be considered as new when it reappears among
the users that become aware of it later. This procedure split the
3468 originally collected trend titles into 6084 individual trend se-
quences.
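One way to picture the splitting of a topic into separate trend sequences is the sketch below. It assumes that each topic is represented by the indices of the 20-minute timestamps at which it appeared in the trend list, and that any gap of at least one timestamp starts a new sequence; the function name and the example data are illustrative only.

```python
from typing import List


def split_into_sequences(trending_steps: List[int]) -> List[List[int]]:
    """Group timestamp indices into runs of consecutive values."""
    sequences: List[List[int]] = []
    for step in sorted(set(trending_steps)):
        if sequences and step == sequences[-1][-1] + 1:
            sequences[-1].append(step)   # topic kept trending
        else:
            sequences.append([step])     # topic (re)appeared: new sequence
    return sequences


# A made-up topic seen at steps 0, 1, 2 and again at steps 9, 10.
runs = split_into_sequences([0, 1, 2, 9, 10])
```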
5. DISTRIBUTION OF TWEETS
We measured the number of tweets that each topic gets in 20 minute
intervals, from the time the topic starts trending until it stops, as
described earlier. From this we can sum up the tweet counts over
time to obtain the cumulative number of tweets $N_q(t_i)$ of topic $q$
for any time frame $t_i$,

$$N_q(t_i) = \sum_{\tau=1}^{i} n_q(t_\tau), \qquad (1)$$

where $n_q(t)$ is the number of tweets on topic $q$ in time interval
$t$. Since it is plausible to assume that initially popular topics will
stay popular later on in time as well, we can calculate the ratios
$C_q(t_i, t_j) = N_q(t_i)/N_q(t_j)$ for topic $q$ for time frames $t_i$ and $t_j$.
Figure 1(a) shows the distribution of the $C_q(t_i, t_j)$ over all topics for
four arbitrarily chosen pairs of time frames (chosen nevertheless such that
$t_i > t_j$, with $t_i$ relatively large and $t_j$ small).
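As a small worked example of Eq. (1) and the ratio of cumulative counts, the sketch below uses made-up per-interval counts; the function names are ours and frame indices are taken to be 1-based, matching the sum in Eq. (1).

```python
from itertools import accumulate
from typing import List


def cumulative_counts(per_interval: List[int]) -> List[int]:
    """N_q(t_i) for i = 1..len: running sum of per-interval tweet counts."""
    return list(accumulate(per_interval))


def count_ratio(per_interval: List[int], i: int, j: int) -> float:
    """C_q(t_i, t_j) = N_q(t_i) / N_q(t_j), with 1-based frame indices."""
    totals = cumulative_counts(per_interval)
    return totals[i - 1] / totals[j - 1]


example = [10, 30, 20, 40]           # made-up counts n_q(t_1), ..., n_q(t_4)
ratio = count_ratio(example, 4, 2)   # N_q(t_4) / N_q(t_2) = 100 / 40
```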
[Figure 1, panel (a): four density plots; horizontal axis: relative tweet count (logarithmic scale), vertical axis: density. Panel (b): four Q-Q plots; horizontal axis: normal theoretical quantiles, vertical axis: data quantiles.]
Figure 1: (a) The densities of the ratios between cumulative tweet counts measured in two respective time frames. From left to right
in the figure, the indices of the time frames between which the ratios were taken are: (2, 10), (2, 14), (4, 10), and (4, 14), respectively.
The horizontal axis has been rescaled logarithmically, and the solid line in the plots shows the density estimates using a kernel
smoother. (b) The Q-Q plots of the cumulative tweet distributions with respect to normal distributions. If the random variables of
the data were a linear transformation of normal variates, the points would line up on the straight lines shown in the plots. The tails
of the empirical distributions are apparently heavier than in the normal case.
These figures immediately suggest that the ratios $C_q(t_i, t_j)$ are distributed
according to log-normal distributions, since the horizontal
axes are logarithmically rescaled, and the histograms appear to be
Gaussian functions. To check if this assumption holds, consider
Fig. 1(b), where we show the Q-Q plots of the distributions of
Fig. 1(a) in comparison to normal distributions. We can observe
that the (logarithmically rescaled) empirical distributions exhibit
normality to a high degree for later time frames, with the excep-
tion of the high end of the distributions. These 10-15 outliers occur
more frequently than could be expected for a normal distribution.
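The Q-Q comparison can be reproduced in outline as below. This is a sketch rather than the procedure used for Fig. 1(b): the plotting positions $(k - 0.5)/n$ are one common convention that we assume here, and the input ratios are made up.

```python
import math
from statistics import NormalDist
from typing import List, Tuple


def qq_points(ratios: List[float]) -> List[Tuple[float, float]]:
    """Pair standard-normal quantiles with the sorted log10 ratios."""
    logs = sorted(math.log10(r) for r in ratios)
    n = len(logs)
    std_normal = NormalDist()
    # Plotting positions (k - 0.5) / n keep the probabilities inside (0, 1).
    return [(std_normal.inv_cdf((k - 0.5) / n), value)
            for k, value in enumerate(logs, start=1)]


points = qq_points([2.0, 3.5, 5.0, 8.0, 20.0])  # made-up ratios
```

If the logarithms of the ratios were normally distributed, the resulting points would fall close to a straight line; a heavy upper tail shows up as points bending away from the line at the right-hand end.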
Log-normals arise as a result of multiplicative growth processes
with noise [8]. In our case, if $N_q(t)$ is the number of tweets for
a given topic $q$ at time $t$, then the dynamics that leads to a
log-normally distributed $N_q(t)$ over $q$ can be written as:

$$N_q(t) = [1 + \gamma(t)\,\xi(t)]\, N_q(t-1), \qquad (2)$$

where the random variables $\xi(t)$ are positive, independent and identically
distributed as a function of $t$ with mean 1 and variance $\sigma^2$.
Note that time here is measured in discrete steps ($t-1$ expresses
the previous time step with respect to $t$), in accordance with our
measurement setup. $\gamma(t)$ is introduced to account for the novelty
decay [11]. We would expect topics to initially increase in popularity
but to slow down their activity as they become obsolete or
known to most users. Since $\gamma(t)$ is made up of decreasing positive
numbers, the growth of $N_q(t)$ slows with time.
To see that Eq. (2) leads to a log-normal distribution of $N_q(t)$, we
first expand the recursion relation:

$$N_q(t) = \prod_{s=1}^{t} [1 + \gamma(s)\,\xi(s)]\, N_q(0). \qquad (3)$$

Here $N_q(0)$ is the initial number of tweets in the earliest time step.
Taking the logarithm of both sides of Eq. (3),

$$\ln N_q(t) - \ln N_q(0) = \sum_{s=1}^{t} \ln [1 + \gamma(s)\,\xi(s)]. \qquad (4)$$
The RHS of Eq. (4) is the sum of a large number of random variables.
The central limit theorem thus states that if the random
variables are independent and identically distributed, then the sum
asymptotically approximates a normal distribution. The i.i.d. condition
would hold exactly for the $\xi(s)$ term, and it can be shown that
in the presence of the discounting factors (if the rate of decline is
not too fast), the resulting distribution is still normal [11].
In other words, we expect from this model that $\ln [N_q(t)/N_q(0)]$
will be distributed normally over $q$ when fixing $t$. These quantities
were shown in Fig. 1 above. Essentially, if the difference between
the two times where we take the ratio is big enough, the log-normal
property is observed.
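The multiplicative process of Eq. (2) is easy to simulate. The sketch below makes two assumptions that are ours and serve only as illustration: the decay factor is set to $\gamma(t) = 1/t$ (the form reported from the measurements later in this section), and the noise $\xi(t)$ is drawn from a log-normal distribution parameterized to have mean 1.

```python
import random
from typing import List


def simulate_topic(n0: float, steps: int, sigma: float,
                   rng: random.Random) -> List[float]:
    """Iterate N(t) = [1 + gamma(t) * xi(t)] * N(t-1) with gamma(t) = 1/t."""
    counts = [n0]
    for t in range(1, steps + 1):
        if sigma > 0:
            # Log-normal noise with mean exactly 1 (an illustrative choice).
            xi = rng.lognormvariate(-0.5 * sigma ** 2, sigma)
        else:
            xi = 1.0  # noiseless case
        counts.append((1.0 + xi / t) * counts[-1])
    return counts


noiseless = simulate_topic(100.0, 10, 0.0, random.Random(0))
noisy = simulate_topic(100.0, 10, 0.3, random.Random(0))
```

Running many noisy realizations and taking the logarithm of `counts[t] / counts[0]` at a fixed `t` gives the sample whose approximate normality the model predicts.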
The intuitive explanation for the multiplicative model of Eq. (2)
is that at each time step the number of new tweets on a topic is a
multiple of the tweets that we already have. The number of past
tweets, in turn, is a proxy for the number of users that are aware of
the topic up to that point. These users discuss the topic on different
forums, including Twitter, essentially creating an effective network
through which the topic spreads. As more users talk about a par-
ticular topic, many others are likely to learn about it, thus giving
the multiplicative nature of the spreading. The noise term is necessary
to account for the stochasticity of this process. On the other
hand, the monotonically decreasing $\gamma(t)$ characterizes the decay in
timeliness and novelty of the topic as it slowly becomes obsolete
and known to most users, and guarantees that $N_q(t)$ does not grow
unbounded [11].
[Figure 2: log-log plot; horizontal axis: time (hours), vertical axis: $\gamma$. Inset: the same data on linear scales.]
Figure 2: The decay factor $\gamma(t)$ in time as measured using
Eq. (5). The log-log plot shows that it decreases in a power-law
fashion, with an exponent that is measured to be exactly -1
(the linear regression on the logarithmically transformed data
fits with $R^2 = 0.98$). The fit to determine the exponent was
performed in the range of the solid line next to the function,
which also shows the result of the fit while being shifted lower
for easy comparison. The inset displays the same $\gamma(t)$ function
on standard linear scales.
To measure the functional form of $\gamma(t)$, we observe that the expected
value of the noise term $\xi(t)$ in Eq. (2) is 1. Thus averaging
over the fractions between consecutive tweet counts yields $\gamma(t)$:

$$\gamma(t) = \left\langle \frac{N_q(t)}{N_q(t-1)} \right\rangle_q - 1. \qquad (5)$$

The experimental values of $\gamma(t)$ in time are shown in Fig. 2. It
is interesting to notice that $\gamma(t)$ follows a power-law decay very
precisely with an exponent of $-1$, which means that $\gamma(t) \sim 1/t$.
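The estimate of Eq. (5) and the log-log fit of the exponent can be sketched as follows. The helper names and the synthetic input are ours; the fit is an ordinary least-squares line through the logarithms, which we assume matches the "linear regression on the logarithmically transformed data" mentioned in the caption of Fig. 2.

```python
import math
from typing import List, Sequence


def estimate_gamma(series: Sequence[Sequence[float]]) -> List[float]:
    """gamma(t) = <N_q(t) / N_q(t-1)>_q - 1 for t = 1..T-1 (Eq. 5)."""
    length = min(len(s) for s in series)
    gammas = []
    for t in range(1, length):
        ratios = [s[t] / s[t - 1] for s in series]
        gammas.append(sum(ratios) / len(ratios) - 1.0)
    return gammas


def loglog_slope(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Least-squares slope of log(y) against log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den


# Made-up cumulative counts that grow linearly: N(t) = c * (t + 1).
topics = [[c * (t + 1) for t in range(20)] for c in (50.0, 120.0)]
gamma = estimate_gamma(topics)                       # equals 1/t for this input
slope = loglog_slope(range(1, len(gamma) + 1), gamma)
```

For this synthetic input the recovered slope is -1 up to rounding, since linearly growing counts give ratios of $(t+1)/t$ between consecutive steps.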
6. THE GROWTH OF TWEETS OVER TIME
The interesting fact about the decay function $\gamma(t) = 1/t$ is that
it results in a linear increase in the total number of tweets for a
topic over time. To see this, we can again consider Eq. (4), and
approximate the discrete sum of random variables with an integral
of the operand of the sum, and substitute the noise term with its
expectation value, $\langle \xi(t) \rangle = 1$, as defined earlier (this is valid if $\gamma(t)$
is changing slowly). These approximations yield the following:

$$\ln \frac{N_q(t)}{N_q(0)} \approx \int_{\tau=0}^{t} \ln [1 + \gamma(\tau)]\, d\tau \approx \int_{\tau=0}^{t} \frac{1}{\tau}\, d\tau = \ln t. \qquad (6)$$

In simplifying the logarithm above, we used the Taylor expansion
$\ln(1 + x) \approx x$ for small $x$, and also used the fact that $\gamma(\tau) = 1/\tau$,
as we found experimentally earlier.
It can be immediately seen then that $N_q(t) \approx N_q(0)\, t$ for the range
of $t$ where $\gamma(t)$ is inversely proportional to $t$. In fact, it can be
easily proven that no functional form for $\gamma(t)$ would yield a linear
increase in $N_q(t)$ other than $\gamma(t) \sim 1/t$ (assuming that the
above approximations are valid for the stochastic discrete case).
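As a quick consistency check on the linear growth, consider the noiseless discrete case, in which $\xi(s) = 1$ and $\gamma(s) = 1/s$ hold exactly (an idealization we assume only for this illustration). The product in Eq. (3) then telescopes:

```latex
\frac{N_q(t)}{N_q(0)}
  = \prod_{s=1}^{t} \left(1 + \frac{1}{s}\right)
  = \prod_{s=1}^{t} \frac{s+1}{s}
  = \frac{2}{1} \cdot \frac{3}{2} \cdots \frac{t+1}{t}
  = t + 1 .
```

The cumulative count is therefore exactly linear in $t$ in this idealized case, in line with the continuum estimate $N_q(t) \approx N_q(0)\, t$ for large $t$.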
[Figure 3: horizontal axis: time (hours), vertical axis: normalized tweet counts.]
Figure 3: The number of total tweets on topics in the first
48 hours, normalized to 1 so that they can be shown on the
same plot. The randomly selected topics were (from left to
right): “Earnings”, “#pulpopaul”, “Sheen”, “Deuces Remix”,
“Isaacs”, “#gmp24”, and “Mac App”.
This suggests that the trending topics featured on Twitter increase
their tweet counts linearly in time, and their dynamics is captured
by the multiplicative noise model we discussed above.
To check this, we first plotted a few representative examples of the
cumulative number of tweets for a few topics in Fig. 3. It is apparent
that all the topics (selected randomly) show an approximate
initial linear growth in the number of tweets. We also checked if this
is true in general. Figure 4 shows the second discrete derivative of
the total number of tweets, which we expect to be 0 if the trend
lines are linear on average. A positive second derivative would
mean that the growth is superlinear, while a negative one suggests
that it is sublinear. We point out that before taking the average of
all second derivatives over the different topics in time, we divided
the derivatives by the average of the total number of tweets of the
given topics. We did this so as to account for the large difference
between the ranges of the number of tweets across topics, since a
simple averaging without prior normalization would likely bias the
results towards topics with large tweet counts and their fluctuations.
The averages are shown in Fig. 4.
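The normalized curvature can be computed along the following lines. This is a sketch under one interpretation of the normalization, namely that each topic's second differences are divided by the mean of that topic's cumulative counts; the function names and the synthetic series are ours.

```python
from typing import List, Sequence


def relative_curvature(totals: Sequence[float]) -> List[float]:
    """Second discrete difference of a cumulative series, divided by its mean."""
    mean_total = sum(totals) / len(totals)
    return [(totals[i + 1] - 2 * totals[i] + totals[i - 1]) / mean_total
            for i in range(1, len(totals) - 1)]


def average_curvature(all_totals: Sequence[Sequence[float]]) -> List[float]:
    """Average the normalized curvature over topics, time step by time step."""
    curves = [relative_curvature(t) for t in all_totals]
    steps = min(len(c) for c in curves)
    return [sum(c[i] for c in curves) / len(curves) for i in range(steps)]


linear = [10.0 * t for t in range(1, 8)]             # made-up linear growth
concave = [100.0 * (t ** 0.5) for t in range(1, 8)]  # made-up sublinear growth
```

A linearly growing series gives zero curvature at every step, while a sublinear (concave) one gives negative values, which is the distinction Fig. 4 is meant to expose.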
We observe from the figure that when we consider all topics there
is a very slight sublinear growth regime right after the topic starts
trending, which then becomes mostly linear, as the derivatives data
is distributed around 0. If we consider only very popular topics
(that were on the trends site for more than 4 hours), we observe an
even better linear trend. One reason for this may be that topics that
trend only for short periods exhibit a concave curvature, since they
lose popularity quickly, and are removed from among the Twitter
trends by the system early on.
These results suggest that once a topic is highlighted as a trend on a
very visible website, its growth becomes linear in time. The reason
for this may be that as more and more visitors come to the site
and see the trending topics there is a constant probability that they
will also talk and tweet about it. This is in contrast to scenarios
where the primary channel of information flow is more informal.
[Figure 4: horizontal axis: time (hours), vertical axis: relative curvature of the tweet count.]
Figure 4: The average of the second derivative of the total num-
ber of tweets over all topics. For one topic, we first divided the
derivative values by the mean of the tweet counts so as to min-
imize the differences between the wide range of topic popular-
ities. The open circles show the derivatives obtained with this
procedure for all topics, while the smaller red dots represent
only topics that trended for longer than 4 hours.
In that case we expect that the growth will exhibit first a phase
with accelerated growth and then slow down to a point when no
one talks about the topic any more. Content that spreads through
a social network or without external “driving” will follow such a
course, as has been shown elsewhere [10, 12].
7. PERSISTENCE OF TRENDS
An important reason to study trending topics on Twitter is to un-
derstand why some of them remain at the top while others dissi-
pate quickly. To see the general pattern of behavior on Twitter,
we examined the lifetimes of the topics that trended in our study.
From Fig 5(a) we can see that while most topics occur continu-
ously, around 34% of topics appear in more than one sequence.
This means that they stop trending for a certain period of time be-
fore beginning to trend again.
A reason for this behavior may be the time zones that are involved.
For instance, if a topic is a piece of news relevant to North Ameri-
can readers, a trend may first appear in the Eastern time zone, and
3 hours later in the Pacific time zone. Likewise, a trend may re-
turn the next morning if it was trending the previous evening, when
more users check their accounts again after the night.
Given that many topics do not occur continuously, we examined the
distribution of the sequence lengths for all topics. In Fig 5(b) we
show the length of the topic sequences. It can be observed that this
is a power-law which means that most topic sequences are short
and a few topics last for a very long time. This could be due to the
fact that there are many topics competing for attention. Thus, the
topics that make it to the top (the trend list) last for a short time.
However, in many cases, the topics return to trend for more time,
which is captured by the number of sequences shown in Fig 5(a),
as mentioned.
7.1 Relation to authors and activity
[Figure 5, panel (a): horizontal axis: number of topic recurrences, vertical axis: number of topics. Panel (b): horizontal axis: trend duration (hours), vertical axis: frequency; inset: the same histogram on linear scales.]
Figure 5: (a) The distribution of the number of sequences that a
trending topic comprises. (b) The distribution of the lengths
of each sequence. Both graphs are shown in the log-log scale
with the inset giving the actual histograms in the linear scale.
We first examine the authors who tweet about given trending topics
to see if the authors change over time or if it is the same people
who keep tweeting to cause trends. When we computed the correlation
of the number of unique authors for a topic with the duration
(number of timestamps) that the topic trends, we noticed that the
correlation is very strong (0.80). This indicates that as the number of
authors increases so does the lifetime, suggesting that the propaga-
tion through the network causes the topic to trend.
To measure the impact of authors we compute for each topic the
active-ratio $a_q$ as:

$$a_q = \frac{\text{Number of Tweets}}{\text{Number of Unique Authors}} \qquad (7)$$
The correlation of active-ratio with trending duration is as shown in
Fig 6. We observe that the active-ratio quickly saturates and varies
little with time for any given topic. Since the authors change over
time with the topic propagation, the correlation between number of
tweets and authors is high (0.83).
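The active-ratio of Eq. (7) and a correlation coefficient can be computed as in the sketch below. The text does not say which correlation measure was used; the Pearson coefficient here is our assumption, and the author list is made up.

```python
import math
from typing import Sequence


def active_ratio(tweet_authors: Sequence[str]) -> float:
    """a_q = number of tweets / number of unique authors (Eq. 7)."""
    return len(tweet_authors) / len(set(tweet_authors))


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up topic: one entry per tweet, naming its author.
ratio = active_ratio(["ann", "bob", "ann", "cat", "ann", "bob"])  # 6 tweets / 3 authors
```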
7.2 Persistence of long trending topics
On Twitter each topic competes with the others to survive on the
trending page. As we now show, for the long trending ones we can
derive an expression for the distribution of their average length.
[Figure 6: scatter plot of Active Ratio against Trending Duration (mins).]
Figure 6: Relation between the active-ratio and the length of
the trend across all topics, showing that the active-ratio does
not vary significantly with time.
Figure 7: Distribution of trending times. The black dots represent
actual trending data pulled from Twitter, and the red dots
are the predictions from a geometric distribution with p=0.12.
We assume that, if the relative growth rate of tweets, denoted by
$\phi_t = N_t / N_{t-1}$, falls below a certain threshold $\theta$, the topic would stop
trending. When we consider long-trending topics, as they grow
in time, they overcome the initial novelty decay, and the $\gamma$ term
in equation (3) becomes fairly constant. So we can measure the
change over time using only the random variable $\xi$ as:

$$\log \phi_t = \log \frac{N_t}{N_{t-1}} = \log \frac{N_t}{N_0} - \log \frac{N_{t-1}}{N_0} \simeq \xi_t \qquad (8)$$
Since the $\xi_s$ are independent and identically distributed random variables,
$\phi_1, \phi_2, \cdots, \phi_t$ would be independent of each other. Thus the
probability that a topic stops trending in a time interval $s$, where $s$ is
large, is equal to the probability that $\phi_s$ is lower than the threshold
$\theta$, which can be written as:

$$p = \Pr(\phi_s < \theta) = \Pr(\log \phi_s < \log \theta) = \Pr(\xi_s < \log \theta) = F(\log \theta) \qquad (9)$$
Figure 8: Fit of trending duration to density in log scale. The
straight line suggests an exponential family of the trending time
distribution. The red line gives a fit with an $R^2$ of 0.9112.
$F(x)$ is the cumulative distribution function of the random variable
$\xi$. Given that distribution we can actually determine the threshold
for survival as:

$$\theta = e^{F^{-1}(p)} \qquad (10)$$
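To make Eq. (10) concrete, the sketch below evaluates the threshold for an assumed noise distribution. The normal distribution and its parameters are purely illustrative assumptions, since the text does not specify the form of $F$; only the value p = 0.12 is taken from the caption of Fig. 7.

```python
import math
from statistics import NormalDist


def survival_threshold(p: float, noise: NormalDist) -> float:
    """theta = exp(F^{-1}(p)), where F is the CDF of the noise term (Eq. 10)."""
    return math.exp(noise.inv_cdf(p))


# Assumed for illustration only: the noise term is normal with these values.
assumed_noise = NormalDist(mu=0.0, sigma=0.5)
theta = survival_threshold(0.12, assumed_noise)
```

Feeding the result back through Eq. (9), `assumed_noise.cdf(math.log(theta))` recovers p, which is a convenient check that the inversion is consistent.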
From the independence property of the $\phi$, the duration or lifetime
of a trending topic, denoted by $L$, follows a geometric distribution,
which in the continuum case becomes the exponential distribution.
Thus, the probability that a topic survives in the first $k$ time intervals
and fails in the $k+1$ time interval, given that $k$ is large, can be
written as:

$$\Pr(L = k) = (1 - p)^k\, p \qquad (11)$$
The expected length of trending duration $L$ would thus be:

$$\langle L \rangle = \sum_{k=0}^{\infty} (1 - p)^k\, p \cdot k = \frac{1}{p} - 1 = \frac{1}{F(\log \theta)} - 1 \qquad (12)$$
We considered trending durations for topics that trended for more
than 10 timestamps on Twitter. The comparison between the ge-
ometric distribution and the trending duration is shown in Fig 7.
In Fig 8 the fit of the trending duration to density in a logarithmic
scale suggests an exponential function for the trending time. The
R-square of the fitting is 0.9112.
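Equations (11) and (12) can be checked numerically with a few lines of code. The sketch below uses p = 0.12 from the caption of Fig. 7; the maximum-likelihood estimator in `estimate_p` is the standard one for this parameterization of the geometric distribution and is our addition, not a description of how the fit in the figures was obtained.

```python
from typing import Sequence


def geometric_pmf(k: int, p: float) -> float:
    """Pr(L = k) = (1 - p)^k * p: survive k intervals, stop in the next (Eq. 11)."""
    return (1.0 - p) ** k * p


def expected_length(p: float) -> float:
    """<L> = 1/p - 1 (Eq. 12)."""
    return 1.0 / p - 1.0


def estimate_p(lengths: Sequence[int]) -> float:
    """Maximum-likelihood p for this parameterization: 1 / (1 + mean length)."""
    return 1.0 / (1.0 + sum(lengths) / len(lengths))


p = 0.12                                   # value quoted in the Fig. 7 caption
mean_intervals = expected_length(p)        # about 7.33 intervals
truncated_sum = sum(k * geometric_pmf(k, p) for k in range(2000))
```

The truncated sum agrees with the closed form to within rounding, which confirms the step from the series to $1/p - 1$ in Eq. (12).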
8. TREND-SETTERS
We consider two types of people who contribute to trending topics -
the sources who begin trends, and the propagators who are respon-
sible for those trends propagating through the network due to the
nature of the content they share.
8.1 Sources
We examined the users who initiate the most trending topics. First,
for each topic we extracted the first 100 users who tweeted about it
prior to its trending. The distribution of these authors and the topics
is a power-law, as shown in Fig 9. This shows that there are few
authors who contribute to the creation of many different topics. To
focus on these multi-tasking users, we considered only the authors
who contributed to at least five trending topics.
[Figure 9: log-log plot; horizontal axis: number of trending topics initiated, vertical axis: number of authors. Inset: the same histogram on linear scales.]
Figure 9: Distribution of the first 100 authors for each trending
topic. The log-log plot shows a power-law distribution. The
inset graph gives the actual histogram in the linear scale.
When we consider people who are influential in starting trends on
Twitter, we can hypothesize two attributes - a high frequency of ac-
tivity for these users, as well as a large follower network. To eval-
uate these hypotheses we measured these two attributes for these
authors over these months.
Frequency: The tweet-rate can effectively measure the frequency
of participation of a Twitter user. The mean tweet-rate for these
users was 26.38 tweets per day, indicating that these authors tweeted
fairly regularly. However, when we computed the correlation of the
tweet-rate with the number of trending topics that they contributed
to, the result was a weak positive correlation of 0.22. This indicates
that although people who tweet a lot do tend to contribute to the
trending topics, the rate by itself does not strongly determine the
popularity of the topic. In fact, they happen to tweet on a variety of
topics, many of which do not become trends. We found that a large
number of them tended to tweet frequently about sporting events
and the players and teams involved. When some sports-related topics
begin to trend, these users are among their early initiators, by
virtue of their high tweet-rate. This suggests that the nature of the
content, rather than the users who initiate it, plays a strong role
in determining whether a topic trends.
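The text reports correlation coefficients without naming the estimator. The sketch below assumes the Pearson coefficient and shows how such a value could be computed from two per-author lists, for example tweet-rates and numbers of trending topics. The function name and the toy inputs are illustrative only.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)
```

A value near 0.22 would indicate the weak positive association described above; the same function applies to any other pair of per-author or per-topic measurements.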
Audience: When we looked at the number of followers for these
authors, we were surprised to find that it was almost completely
uncorrelated (correlation of 0.01) with the number of trending
topics, although the mean is fairly high (2481; see footnote 1). The
absence of correlation indicates that the number of followers is not
an indication of influence, similar to observations in earlier
work [9].
1 This is due to the fact that one of these authors has more than a
million followers.
8.2 Propagators
We have observed previously that topics trend on Twitter mainly
due to propagation through the network. The main way to propagate
information on Twitter is by retweeting: 31% of the tweets on
trending topics are retweets. This reflects a high volume of
propagation that garners popularity for these topics. Further, the
number of retweets for a topic correlates very strongly (0.96) with
the trend duration, indicating that a topic remains of interest as
long as there are people retweeting it.
Each retweet credits the original poster of the tweet. Hence, to
identify the authors who are retweeted the most in the trending
topics, we counted the number of retweets for each author on each
topic.
Author            Retweets  Topics  Retweet-Ratio
vovo_panico          11688      65         179.81
cnnbrk                8444      84         100.52
keshasuja             5110      51         100.19
LadyGonga             4580      54          84.81
BreakingNews          8406     100          84.06
MLB                   3866      62          62.35
nytimes               2960      59          50.17
HerbertFromFG         2693      58          46.43
espn                  2371      66          35.92
globovision           2668      75          35.57
huffingtonpost        2135      63          33.88
skynewsbreak          1664      52          32
el_pais               1623      52          31.21
stcom                 1255      51          24.60
la_patilla            1273      65          19.58
reuters                957      57          16.78
WashingtonPost         929      60          15.48
bbcworld               832      59          14.10
CBSnews                547      56           9.76
TelegraphNews          464      79           5.87
tweetmeme              342      97           3.52
nydailynews            173      51           3.39
Table 1: Top 22 Retweeted Users in at least 50 trending topics
each
Domination: We found that in some cases, almost all the retweets
for a topic are credited to one single user. These are topics that are
entirely based on the comments by that user, who can thus be said
to dominate the topic. The domination-ratio for a topic can be
defined as the fraction of the retweets of that topic that can be
attributed to the largest contributing user for that topic. However,
we observed a negative correlation (-0.19) between the
domination-ratio of a topic and its trending duration. This means
that topics revolving around a particular author's tweets do not
typically last long. This is consistent with the earlier observed
strong correlation between number of authors and the trend duration.
Hence, for a topic to trend for a long time, it requires many people
to contribute actively to it.
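The domination-ratio defined above can be computed directly from the list of authors credited by each retweet of a topic. The following is a minimal sketch under that assumed input format; the function name is illustrative.

```python
from collections import Counter


def domination_ratio(credited_authors):
    """Fraction of a topic's retweets credited to its single largest
    contributing author.

    `credited_authors` holds one entry per retweet of the topic: the
    original poster that the retweet credits.
    """
    if not credited_authors:
        raise ValueError("topic has no retweets")
    counts = Counter(credited_authors)
    return max(counts.values()) / len(credited_authors)
```

A ratio close to 1 marks a topic driven almost entirely by one user, while a low ratio indicates that retweets are spread over many authors.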
Influence: On the other hand, we observed that there were authors
who contributed actively to many topics and were retweeted
significantly in many of them. For each author, we computed the
ratio of retweets to topics, which we call the retweet-ratio. The
list of influential authors who are retweeted in at least 50 trending
topics is shown in Table 1. We find that a large portion of these
authors are popular news sources such as CNN, the New York Times and
ESPN. This illustrates that social media, far from being an
alternate source of news, functions more as a filter and an amplifier
for interesting news from traditional media.
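As a small worked example of the retweet-ratio, the sketch below divides retweets by topics for three rows copied from Table 1 and ranks them. The helper name and the choice of rows are illustrative.

```python
def retweet_ratio(retweets, topics):
    """Average number of retweets per trending topic for one author."""
    if topics <= 0:
        raise ValueError("author must appear in at least one topic")
    return retweets / topics


# (author, retweets, topics) rows taken from Table 1
rows = [
    ("espn", 2371, 66),
    ("cnnbrk", 8444, 84),
    ("BreakingNews", 8406, 100),
]

# Rank authors by retweet-ratio, highest first
ranked = sorted(rows, key=lambda r: retweet_ratio(r[1], r[2]), reverse=True)
```

Note that BreakingNews has nearly as many retweets as cnnbrk but a lower ratio, because those retweets are spread over more topics.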
9. CONCLUSIONS
To study the dynamics of trends in social media, we have conducted
a comprehensive study of trending topics on Twitter. We first
derived a stochastic model to explain the growth of trending topics
and showed that it leads to a lognormal distribution, which is
validated by our empirical results. We have also found that most
topics do not trend for long, and for those that are long-trending,
their persistence obeys a geometric distribution.
When we considered the impact of the users of the network, we
discovered that the number of followers and tweet-rate of users are
not the attributes that cause trends. What proves to be more
important in determining trends is the retweets by other users, which
are more related to the content that is being shared than to the
attributes of the users. Furthermore, we found that the content that
trended was largely news from traditional media sources, which is
then amplified by repeated retweets on Twitter to generate trends.
10. REFERENCES
[1] N. Agarwal, H. Liu, L. Tang, and P. S. Yu. Identifying the
Influential Bloggers in a Community. WSDM’08, 2008.
[2] S. Aral, L. Muchnik, and A. Sundararajan. Distinguishing
influence-based contagion from homophily-driven diffusion
in dynamic networks. Proceedings of the National Academy
of Sciences, 106(51):21544–21549, December 2009.
[3] M. Cha, H. Haddadi, F. Benevenuto, and K. P. Gummadi.
Measuring User Influence in Twitter: The Million Follower
Fallacy. In Fourth International AAAI Conference on
Weblogs and Social Media, May 2010.
[4] W. Galuba, D. Chakraborty, K. Aberer, Z. Despotovic, and
W. Kellerer. Outtweeting the Twitterers - Predicting
Information Cascades in Microblogs. In 3rd Workshop on
Online Social Networks (WOSN 2010), 2010.
[5] B. A. Huberman, D. M. Romero, and F. Wu. Social networks
that matter: Twitter under the microscope. ArXiv e-prints,
December 2008, 0812.1045.
[6] B. J. Jansen, M. Zhang, K. Sobel, and A. Chowdury. Twitter
power: Tweets as electronic word of mouth. J. Am. Soc. Inf.
Sci., 60(11):2169–2188, 2009.
[7] M. E. McCombs and D. L. Shaw. The Evolution of
Agenda-Setting Research: Twenty Five Years in the
Marketplace of Ideas. Journal of Communication,
43(2):68–84, 1993.
[8] M. Mitzenmacher. A brief history of generative models for
power law and lognormal distributions. Internet
Mathematics, 1:226–251, 2004.
[9] D. M. Romero, W. Galuba, S. Asur, and B. A. Huberman.
Influence and passivity in social media. In 20th International
World Wide Web Conference (WWW’11), 2011.
[10] G. Szabo and B. A. Huberman. Predicting the popularity of
online content. Commun. ACM, 53(8):80–88, 2010.
[11] F. Wu and B. A. Huberman. Novelty and collective attention.
Proceedings of the National Academy of Sciences of the
United States of America, 104(45):17599–17601, November
2007.
[12] J. Yang and J. Leskovec. Patterns of temporal variation in
online media. In Proceedings of the fourth ACM
international conference on Web search and data mining,
WSDM ’11, pages 177–186, 2011.