Strong Regularities in Growth and Decline
of Popularity of Social Media Services
Christian Bauckhage
University of Bonn,
Fraunhofer IAIS
Bonn, Germany
Kristian Kersting
TU Dortmund University,
Fraunhofer IAIS
Dortmund, Germany
ABSTRACT
We analyze general trends and patterns in time series that
characterize the dynamics of collective attention to social
media services and Web-based businesses. Our study is
based on search frequency data available from Google Trends
and considers 175 different services. For each service, we
collect data from 45 different countries as well as global
averages. This way, we obtain more than 8,000 time series
which we analyze using diffusion models from the economic
sciences. We find that these models accurately characterize
the empirical data and our analysis reveals that collective
attention to social media grows and subsides in a highly
regular and predictable manner. Regularities persist across
regions, cultures, and topics and thus hint at general
mechanisms that govern the adoption of Web-based services. We
discuss several cases in detail to highlight interesting
findings. Our methods are of economic interest as they may
inform investment decisions and can help assess which stage
of the general life cycle a Web service has reached.
Categories and Subject Descriptors
G.3 [Probability and Statistics]: Time series analysis;
H.3.5 [Online Information Services]: Web-based services
General Terms
Economics, Human Factors, Measurement
Keywords
social media services, collective attention, trend prediction
1. INTRODUCTION
The problem of understanding the dynamics of collective
human attention has been called a key scientific challenge for
the information age [39]. In this paper, we address a
specific aspect of this problem and mine search frequency data
for common trends and shared characteristics. Our focus is
on query logs which summarize the evolution of global and
regional interests in social media services and we explore
to what extent the general dynamics of collective attention
apparent from these data can be modeled mathematically.
Search frequency analysis is an emerging topic and a growing
body of work shows that patterns found in aggregated
search data of large populations of Web users can provide
insights into collective concerns, interests, or habits. Results
on temporal dynamics of search engine queries are reported
from various fields and include data-driven models of the
spread of diseases [19], accounts of the propagation of news
items [4, 5, 14], characterizations of the formation of
political opinions [22], or predictions of tourism flows [2].
Figure 1: Examples of Google Trends time series
which summarize how worldwide searches for different
social media services evolve over time. Even
though individual curves differ considerably, an
appropriately parameterized diffusion model accounts
well for the apparent general trends of initial growth
and subsequent decline of interest. Results obtained
from more than 8,000 temporal signatures of collective
attention on the Web indicate that these findings
are universal and that interests of large crowds
of users follow these patterns regardless of regional,
cultural, or linguistic backgrounds.
Panels: (a) buzznet, (b) failblog, (c) flickr, (d) librarything,
(e) studiVZ, (f) wikipedia.
[Each panel shows the Google Trends series (vertical axis 20 to
100) for 2004 to 2013 together with a fitted shifted Gompertz
curve.]
Search frequencies are of particular interest in nowcast-
ing which aims at real time monitoring of economic trends
and developments [12]. Aggregated search behaviors of mil-
lions of users yield reliable predictions for sales or general
economic indicators [13,33]. Temporal changes in search
volumes were found to correlate with changes in the behav-
ior of investors [9,16] and to allow for predicting abnormal
stock returns [18,26]. Accordingly, analysts in the social
sciences, public health, or economics are beginning to em-
brace query log analysis as an alternative to more traditional
methods.
The work reported here originates from a project on Web
intelligence where we ask for socio-economic motivations for
individuals to participate in collective endeavors on the Web.
Regarding services, products, and campaigns we investigate
approaches that would allow companies or marketeers to
recognize whether they need to adjust their strategies in
order to remain competitive in the modern Web environment.
In particular, we ask to what extent it is possible to
predict the future success or adoption of services, products, or
marketing messages using collective Web intelligence.
Our paradigm is to mine Web data for possible indicators
of trends in collective attention. In this paper, we consider
time series obtained from Google Trends which summarize
search interests of millions of users worldwide and we focus
on temporal signatures that characterize evolving interests
in social media. Extending previously published work [6],
our contributions are as follows:
1) We briefly review recent results which underline that
Google Trends data provide meaningful and reliable proxies
for research on how opinions and interests of large crowds
and populations evolve over time.
2) We analyze search frequency data from 45 countries
related to 175 social media services and Web businesses.
Given this comprehensive empirical basis, we perform trend
analysis using economic diffusion models and find them to
be in excellent agreement with the data. In particular, we
find that collective attention to social media as evident from
search frequencies evolves according to notably regular pat-
terns. Although microscopic behaviors may be chaotic, gen-
eral trends apparent in these data typically show simple and
highly regular dynamics of growth and decline.
3) We present evidence that this phenomenon persists
across regions, cultures, and linguistic backgrounds and we
elaborate on particular examples to highlight several
interesting findings. We investigate the potential of our
models for forecasting and present qualitative results which
indicate that they indeed allow for reasonable predictions of
future developments of collective attention.
Next, we discuss the empirical basis of our study.
Section 3 reviews models and methods applied for analysis;
results are discussed in section 4. Section 5 contrasts our work
to the related literature and section 6 concludes this paper.
2. SEARCH FREQUENCY DATA: A PROXY
OF COLLECTIVE ATTENTION
Our overall goal is to proceed towards a better under-
standing of the dynamics of collective interests and concerns
of large populations of Web users. The empirical basis for
the work reported here consists of time series obtained from
Google Trends which indicate how search volumes related to
specific topics evolve over time.
2.1 Background
Google Trends is a publicly accessible service that
provides statistics on queries users submitted to Google’s search
engine. It allows for retrieving weekly summaries of how
frequently a query has been used since January 1st, 2004.
Aggregated statistics are available in the form of global
averages but can be narrowed down to regional statistics, for
instance on the level of individual countries.
Analyzing topic specific search dynamics is an increasingly
popular approach in studies on collective preferences [2, 4,
5, 9, 12, 13, 16, 18, 22, 26, 33] and important questions
pertaining to its validity and the significance of search data
have been addressed in two recent contributions.
Table 1: 45 countries considered in this study
Africa      MA, NG, ZA
Asia        CN, ID, IN, JP, KR, MY, PH, TH, TW
Australia   AU, NZ
Europe      AT, BE, CH, CZ, DE, DK, ES, FI, FR, GR, IE,
            IL, IT, NL, NO, PL, PT, RU, SE, TR, UA, UK
N-America   CA, MX, US
S-America   AR, BR, CL, CO, PE, VE
Mellon [34] correlated results from traditional Gallup
surveys with Google Trends data and found that, w.r.t.
political and economic issues covered in traditional opinion polls,
search frequencies provide accurate proxies of the dynamics
of salient public opinions. Teevan et al. [38] studied how
people navigate the Web and found that over 25% of all queries
to search engines are navigational queries, i.e. searches for
company names such as facebook, youtube, or myspace that
are intended to find and then access particular Web sites.
In other words, a large percentage of Web users consistently
relies on Google searches rather than on bookmarks or on
entering URLs in order to navigate to Web sites. Together
these findings thus suggest that data from Google Trends
which aggregate information about the search activities of
millions of users are indeed indicative of collective interests
in Internet services, technical products, or novelties.
2.2 Data Collection and Preprocessing
In this paper, we analyze global and regional temporal
search statistics related to query terms such as ebay,
facebook, or youtube that indicate a population’s interest in
social media services or Web-based businesses. For potentially
ambiguous queries, we retrieve data for different spellings
(e.g. google plus, googleplus, google+, google +) and compute
their average. In total, we consider data from 45 different
countries related to 175 services. As we also retrieve
corresponding global search activities, our empirical basis
consists of more than 8,000 data sets.
The 45 countries considered in this study are listed in
Tab. 1. They were selected according to population size,
Internet penetration, and availability of query logs. Note
that this sample covers various regions, cultures, and official
languages and is deliberately not restricted to countries that
are technologically far advanced.
The 175 social media sites and Web businesses we consider
are listed in Tab. 2. These, too, were chosen according
to penetration and profile. Among others, they include
general and specialized social networking sites, photo- and
video sharing sites, music streaming services, virtual
hangouts, (micro-)blogging services, and online retailers, trading
platforms, as well as social games providers and thus cover
a wide spectrum of social media.
Table 2: 175 social media services and businesses
43things         flixter          mocospace        studivz
adify            fotoki           myheritage       stumbleupon
airbnb           fotolog          mylife           svpply
aisanavenue      foursquare       myspace          sysomos
amazon           friendsreunited  nasza-klasa      taringa
amirite          friendster       netlog           techcrunch
anobii           gaiaonline       netvibes         technorati
asmallworld      getglue          nexopia          tencent-qq
badoo            github           odnoklassniki    tripadvisor
bebo             gogoyoko         openbc           tripit
betfair          goodreads        openid           tuenti
bigadda          google+          orkut            tumblr
biip.no          grono            owly             twango
bitly            grooveshark      paypal           twitpic
blackplanet      groupon          photobucket      twitter
bliptv           habbo            pinterest        viadeo
boxcryptor       hi5              plaxo            vimeo
busuu            hulu             playdom          virb
buzznet          ibibo            posterous        vkontakte
cafemom          imgur            qapacity         wakoopa
cloob            instagram        quechup          wattpad
cotweet          italki           qzone            weeworld
cozycot          itsmy            ravelry          weibo
craiglist        iwiw             reddit           weread
cyworld          janrain          renren           wesabe
dailybooth       jiepang          revver           wikia
dailymotion      joost            scribd           wikipedia
deezer           justin-tv        scvngr           winpalace
delicious        kdice            secondlife       wordpress
deviantart       kickstarter      seedrs           xanga
digg             kiwibox          sevenload        xing
disaboom         knitty           shelfari         yelp
disqus           lagbook          shopify          youku
dontstayin       last.fm          skyblog          youtube
dropbox          librarything     skype            zaarly
dwolla           linkedin         skyrock          zappos
ebay             livejournal      slashdot         zoho
elftown          livemocha        slide.com        zoomr
elixio           living-social    songza           zooppa
epinions         logoworks        sonico           zotero
facebook         meebo            soonr            zynga
faceparty        meetin           soundcloud
failblog         mendely          sourceforge
fetlife          metacafe         spotify
flickr           mixi             stackoverflow
For each combination of country and service, we collect
a discrete time series $z = [z_1, z_2, \ldots, z_{483}]$ of weekly
search counts $z_t$ from January 2004 to March 2013.
As many services in our sample made their first
appearance later than January 2004 (e.g. youtube) and were thus
not actively searched for during the whole observation
period, we determine individual onset times $t_o$ using CUSUM
statistics [35]. This leaves us with shortened time series
$y = [y_{t_o}, \ldots, y_{483}]$ which we shift to $y_{t'}$ where
$t' = t - t_o$ in order to facilitate statistical analysis.
For query terms related to services that were launched
prior to January 2004 (e.g. amazon), we manually determine
the number of weeks $T$ between their first public occurrence
$t_o$ and January 1st 2004 and consider shifted time series
where $t' = t - t_o + T$.
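The onset detection and shifting described above could be sketched as follows. The text does not say which CUSUM variant was used, so the one-sided scheme below, its baseline window, slack, and threshold, and the names `cusum_onset` and `shift_series` are assumptions made purely for illustration.

```python
def cusum_onset(z, baseline_weeks=26, slack=0.5, threshold=5.0):
    """Return the 0-based week at which an upward shift starts, or None.

    One-sided CUSUM on standardized values. The first `baseline_weeks`
    samples are assumed (hypothetically) to represent 'no interest'.
    """
    base = z[:baseline_weeks]
    mu = sum(base) / len(base)
    var = sum((v - mu) ** 2 for v in base) / len(base)
    sigma = max(var ** 0.5, 1.0)  # floor avoids division by zero on flat baselines
    s, start = 0.0, None
    for t, v in enumerate(z):
        s = max(0.0, s + (v - mu) / sigma - slack)
        if s == 0.0:
            start = None          # statistic fell back to zero: forget candidate
        elif start is None:
            start = t             # statistic just left zero: candidate onset
        if s > threshold:
            return start
    return None


def shift_series(z, onset, T=0):
    """Pair each retained sample with its shifted time t' = t - onset + T."""
    return [(t - onset + T, z[t]) for t in range(onset, len(z))]
```

For a service that predates the observation window, one would call `shift_series(z, 0, T=weeks_before_2004)`, where `weeks_before_2004` is the manually determined offset.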
Given these data, we resort to descriptive data mining
techniques in order to identify commonalities or significant
differences between time series. In particular, we consider
three diffusion models which we review in the next section.
3. DIFFUSION MODELS
Visual inspection of search frequency time series related to
social media reveals noticeably common patterns: although,
on a microscopic level, collective interest in individual ser-
vices varies chaotically, macroscopic trends typically show
an initial phase of accelerated growth followed by periods
of saturation and prolonged decline (see, for instance, the
examples in Fig. 1).
Skewed temporal distributions like these frequently occur
in economics where they indicate buying behaviors or rates
of adoption and are studied using diffusion models. We adhere
to this methodology and investigate to what extent simple
diffusion models can characterize general trends in our data.
Note that more elaborate approaches such as Gaussian
mixtures or kernel techniques might provide more accurate
fits. Alas, they typically lack interpretability since they yield
abstract descriptions in terms of (numerous) latent variables
without physical meaning. Diffusion models, on the other hand,
are deliberately designed to explain time series in terms of
intuitive concepts that represent knowledge about everyday life
and the real world.
Since we are interested in macroscopic trends, we restrict
our analysis to two-parameter models which are unlikely to
over-fit the data but will capture its gist. Moreover, they
facilitate data exploration and simplify comparisons of sets
of time series. In order for this paper to be self-contained,
this section briefly reviews the three diffusion models we
consider.
3.1 The Bass Model
In an influential paper, Bass [3] proposed a diffusion model
to describe how rates of adoption of novel products vary
over time. Introducing a parameter $p$ to model a propensity
for innovation and a parameter $q$ to model a propensity for
imitation, he cast the hazard rate of product adoption as
\[ h(t) = \frac{f(t)}{1 - F(t)} = p + q F(t) \tag{1} \]
where $f(t)$ is a probability density and
$F(t) = \int_0^t f(\tau)\, d\tau$ is the corresponding cumulative
distribution function. Solving the differential equation in (1)
leads to the Bass distribution
\[ f_{BA}(t \mid p, q) = \frac{(p+q)^2}{p} \,
   \frac{e^{-(p+q)t}}{\left(1 + \frac{q}{p}\, e^{-(p+q)t}\right)^2}. \tag{2} \]
Depending on the choice of $p$ and $q$, this distribution can
assume a variety of shapes. In particular, for $q > p$, it
will increase to a maximum before decreasing to zero. This
becomes explicit by writing (1) as
$f(t) = (p + qF(t)) - (p + qF(t))F(t) = p + (q - p)F(t) - qF^2(t)$
which exposes the adoption rate to result from composing
two antagonistic processes: a propensity $p + qF(t)$ to grow
countered by a propensity $(p + qF(t))F(t)$ to decline.
We include the Bass model in our analysis because it often
accurately models sales and thus may also be able to explain
collective attention dynamics on the Web.
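To make the Bass model concrete, the following sketch evaluates the density in (2) together with the standard closed forms of its cumulative distribution and peak time. The function names and the parameter values in the usage note are illustrative assumptions, not values estimated in this study.

```python
import math


def bass_pdf(t, p, q):
    """Bass density, Eq. (2)."""
    e = math.exp(-(p + q) * t)
    return (p + q) ** 2 / p * e / (1.0 + q / p * e) ** 2


def bass_cdf(t, p, q):
    """Closed-form cumulative distribution belonging to the density above."""
    e = math.exp(-(p + q) * t)
    return (1.0 - e) / (1.0 + q / p * e)


def bass_peak_time(p, q):
    """Time of maximum adoption rate; only meaningful for q > p."""
    return math.log(q / p) / (p + q)
```

With, say, `p = 0.03` and `q = 0.38`, one can check numerically that `bass_pdf(t, p, q)` equals `(p + q * F) * (1 - F)` for `F = bass_cdf(t, p, q)`, which is the hazard form (1) multiplied out, and that the density starts at `p` for `t = 0`.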
3.2 The Shifted Gompertz Model
As our second model, we consider the shifted Gompertz
distribution
\[ f_{SG}(t \mid \beta, \eta) = \beta e^{-\beta t}\, e^{-\eta e^{-\beta t}}
   \left[ 1 + \eta \left( 1 - e^{-\beta t} \right) \right] \tag{3} \]
where $t, \beta, \eta \geq 0$. It was introduced by Bemmaor [8] who
showed that the Bass model results from compounding the
shifted Gompertz with an Exponential distribution, i.e.
\[ f_{BA}(t \mid p, q) = \int_0^\infty f_{SG}(t \mid \beta, \eta)\,
   \frac{e^{-\eta/\sigma}}{\sigma}\, d\eta \tag{4} \]
such that $p = \beta/(1 + \sigma)$ and $q = \sigma p$. This reveals a latent
coupling of the Bass parameters $p$ and $q$ due to taking the
average over the shape parameter $\eta$ of the shifted Gompertz.
Bemmaor’s shifted Gompertz therefore provides a more
flexible characterization of adoption dynamics and we explore
its merits in our experiments below.
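The compounding relation (4) can be checked numerically. The sketch below integrates the shifted Gompertz density against an Exponential mixing density with mean $\sigma$ using a plain trapezoidal rule and compares the result with the Bass density for $p = \beta/(1+\sigma)$ and $q = \sigma p$. The value of $q$ follows from matching terms after carrying out the integral; the function names, the integration range, and the parameter values in the usage note are assumptions of this sketch.

```python
import math


def shifted_gompertz_pdf(t, beta, eta):
    """Shifted Gompertz density, Eq. (3)."""
    x = math.exp(-beta * t)
    return beta * x * math.exp(-eta * x) * (1.0 + eta * (1.0 - x))


def bass_pdf(t, p, q):
    """Bass density, Eq. (2)."""
    e = math.exp(-(p + q) * t)
    return (p + q) ** 2 / p * e / (1.0 + q / p * e) ** 2


def compounded_pdf(t, beta, sigma, steps=20000, upper=60.0):
    """Trapezoidal approximation of Eq. (4) over eta in [0, upper * sigma]."""
    h = upper * sigma / steps
    total = 0.0
    for k in range(steps + 1):
        eta = k * h
        weight = 0.5 if k in (0, steps) else 1.0   # trapezoid end points
        mix = math.exp(-eta / sigma) / sigma       # Exponential mixing density
        total += weight * shifted_gompertz_pdf(t, beta, eta) * mix
    return total * h
```

For example, with `beta = 0.1` and `sigma = 4.0`, `compounded_pdf(t, beta, sigma)` and `bass_pdf(t, beta / 5, 4 * beta / 5)` should agree up to the integration error.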
3.3 The Weibull Model
The Weibull distribution is the type III extreme value
distribution and is often applied as a life-time model [37]. Its
probability density function is defined for $t \in [0, \infty)$ and
given by
\[ f_{WB}(t \mid \kappa, \lambda) = \frac{\kappa}{\lambda}
   \left( \frac{t}{\lambda} \right)^{\kappa - 1} e^{-(t/\lambda)^\kappa} \tag{5} \]
where $\kappa$ and $\lambda$ determine shape and scale. For $\kappa = 1$, the
Weibull coincides with the Exponential and, for $\kappa \approx 3.5$, its
shape approximates that of a Normal distribution.
Studying the dynamics of Internet memes, Bauckhage et
al. [5] pointed out that the Weibull, too, implicitly couples
two antagonistic growth dynamics. This can be seen from
considering its cumulative distribution function
\[ F_{WB}(t \mid \kappa, \lambda) = 1 - e^{-(t/\lambda)^\kappa}. \tag{6} \]
Setting $\alpha = (1/\lambda)^\kappa$ for brevity, rearranging the terms in (6),
and substituting into (5) yields
$f(t) = \alpha \kappa t^{\kappa - 1} - \alpha \kappa t^{\kappa - 1} F(t)$.
Considered as a diffusion model, the Weibull distribution
thus combines a propensity $\alpha \kappa t^{\kappa - 1}$ for collective attention
to a service or product to grow with a propensity
$\alpha \kappa t^{\kappa - 1} F(t)$ for attention to subside. In passing, we note
that by letting $\alpha = \alpha(t) = qF(t)$ and setting $\kappa = 1$, the Weibull
and the Bass model are related as
$f_{BA}(t) - p\,(1 - F(t)) = f_{WB}(t)$.
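The growth and decline terms of the Weibull model can be evaluated directly. The sketch below, with hypothetical function names, computes the density (5), the cumulative distribution (6), and the two propensities, so that their difference can be compared against the density.

```python
import math


def weibull_pdf(t, kappa, lam):
    """Weibull density, Eq. (5)."""
    return kappa / lam * (t / lam) ** (kappa - 1.0) * math.exp(-((t / lam) ** kappa))


def weibull_cdf(t, kappa, lam):
    """Weibull cumulative distribution, Eq. (6)."""
    return 1.0 - math.exp(-((t / lam) ** kappa))


def growth_and_decline(t, kappa, lam):
    """Return (growth, decline) with growth - decline equal to the density."""
    alpha = (1.0 / lam) ** kappa
    growth = alpha * kappa * t ** (kappa - 1.0)
    decline = growth * weibull_cdf(t, kappa, lam)
    return growth, decline
```

Note that for `kappa > 1` the growth term keeps increasing with `t`; the density nevertheless decays because the decline term catches up as `F(t)` approaches one.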
3.4 Model Fitting
When applying the above diffusion models to analyze
temporal signatures of collective attention on the Web, we must
cope with the fact that none of the three models admits a
closed form solution for the maximum likelihood estimates of
its parameters. Addressing this issue and aiming at high
efficiency for large scale processing, we propose the use of
multinomial maximum likelihood techniques.
Throughout, we fit continuous distributions $f(t \mid \theta_1, \theta_2)$ to
discrete series of frequency counts $y_1, \ldots, y_m$ grouped into $m$
distinct intervals $(t_0, t_1], (t_1, t_2], \ldots, (t_{m-1}, t_m]$. To devise an
efficient algorithm for estimating optimal model parameters
$\theta_1^*$ and $\theta_2^*$, we note that a histogram $h(y_1, \ldots, y_m)$ of counts
can be thought of as a multinomial distribution
\[ h(y_1, \ldots, y_m) = n! \prod_i \frac{p_i^{y_i}}{y_i!} \tag{7} \]
where $n = \sum_i y_i$. Since the cumulative distribution function of
the model distribution is
\[ F(t) = F(t \mid \theta_1, \theta_2) = \int_0^t f(\tau \mid \theta_1, \theta_2)\, d\tau, \tag{8} \]
the probabilities $p_i$ of the multinomial can be expressed as
$p_i(\theta_1, \theta_2) = F(t_i) - F(t_{i-1})$ so that $\sum_i p_i = F(t_m) - F(t_0)$.
Accordingly, the likelihood for a discrete, truncated time
series $y_1, \ldots, y_m$ is given by
\[ L(\theta_1, \theta_2) = \frac{n!}{F(t_m) - F(t_0)} \prod_i
   \frac{p_i(\theta_1, \theta_2)^{y_i}}{y_i!} \]
and maximum likelihood estimates of $\theta_1$ and $\theta_2$ result from
computing the roots of $\nabla_\theta \log L$. Again, this may not lead to
closed form solutions but may require numerical optimization.
To this end, we apply an efficient, iteratively reweighted
least squares scheme that minimizes
\[ \sum_i w_i \left( y_i - n\, p_i(\theta_1, \theta_2) \right)^2, \]
which regresses the $y_i$ onto their expectations $n p_i$ and
requires updating the weights $w_i = (n p_i)^{-1}$ in each iteration.
In addition to computational convenience, this approach
is robust and has the property that, for
$p_i = p_i(\theta_1^*, \theta_2^*)$, the final residual sum of squares
follows a $\chi^2$ statistic [25]. We
thus resort to the $\chi^2$-test for goodness of fit (GoF) testing.
Yet, we note that the $\chi^2$-test may underestimate the quality
of fits to time series [20] so that the results reported below
may improve even further if more elaborate tests were used.
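A minimal sketch of such a reweighted least squares fit for the shifted Gompertz model is given below. It is not the implementation used in this study: the Gauss-Newton update in log-parameters, the forward-difference Jacobian, the step halving, the renormalisation of bin probabilities to the observed window, and all function names and default values are assumptions chosen to keep the example short and dependency free.

```python
import math


def sg_cdf(t, beta, eta):
    """Shifted Gompertz CDF, F(t) = (1 - e^{-beta t}) * exp(-eta e^{-beta t})."""
    x = math.exp(-beta * t)
    return (1.0 - x) * math.exp(-eta * x)


def bin_probs(edges, beta, eta):
    """Bin probabilities p_i, renormalised to the observed window (assumption)."""
    F = [sg_cdf(t, beta, eta) for t in edges]
    mass = F[-1] - F[0]
    return [max((b - a) / mass, 1e-12) for a, b in zip(F, F[1:])]


def weighted_sse(y, n, probs, w):
    return sum(wi * (yi - n * pi) ** 2 for wi, yi, pi in zip(w, y, probs))


def fit_irls(y, edges, beta=0.05, eta=5.0, iters=100, eps=1e-6):
    """Return (beta, eta, chi2) for grouped counts y over the given bin edges."""
    n = float(sum(y))
    u = [math.log(beta), math.log(eta)]          # log-parameters stay positive
    for _ in range(iters):
        p0 = bin_probs(edges, math.exp(u[0]), math.exp(u[1]))
        w = [1.0 / (n * pi) for pi in p0]        # weights refreshed every pass
        r = [yi - n * pi for yi, pi in zip(y, p0)]
        J = []                                   # forward-difference Jacobian
        for j in range(2):
            v = list(u)
            v[j] += eps
            pj = bin_probs(edges, math.exp(v[0]), math.exp(v[1]))
            J.append([n * (a - b) / eps for a, b in zip(pj, p0)])
        a11 = sum(wi * g * g for wi, g in zip(w, J[0]))
        a12 = sum(wi * g * h for wi, g, h in zip(w, J[0], J[1]))
        a22 = sum(wi * h * h for wi, h in zip(w, J[1]))
        b1 = sum(wi * g * ri for wi, g, ri in zip(w, J[0], r))
        b2 = sum(wi * h * ri for wi, h, ri in zip(w, J[1], r))
        det = a11 * a22 - a12 * a12
        if det <= 0.0:
            break
        d = [(a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det]
        scale = max(abs(d[0]), abs(d[1]))
        if scale > 2.0:                          # cap the step in log space
            d = [2.0 * di / scale for di in d]
        base, step, cand = weighted_sse(y, n, p0, w), 1.0, None
        while step > 1e-8:                       # halve until the fit improves
            trial = [u[0] + step * d[0], u[1] + step * d[1]]
            pt = bin_probs(edges, math.exp(trial[0]), math.exp(trial[1]))
            if weighted_sse(y, n, pt, w) < base:
                cand = trial
                break
            step *= 0.5
        if cand is None:                         # no improving step: stop
            break
        u = cand
    beta, eta = math.exp(u[0]), math.exp(u[1])
    probs = bin_probs(edges, beta, eta)
    chi2 = sum((yi - n * pi) ** 2 / (n * pi) for yi, pi in zip(y, probs))
    return beta, eta, chi2
```

The returned `chi2` is the Pearson statistic of the final fit. Assuming SciPy is available, a p-value could then be obtained with `scipy.stats.chi2.sf(chi2, df)`, where taking `df = m - 3` for `m` bins and two fitted parameters is again an assumption of this sketch.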
4. EMPIRICAL RESULTS
This section presents and discusses trend analysis results
for our data set of about 8,000 social media related search
frequency signatures. In order to illustrate several arguably
important findings, we compare results obtained for distinct
countries, regions of the world, linguistic backgrounds, and
types of service in form of small case studies.
4.1 Time to Adoption
In a preparatory analysis, we gather statistics as to times
to adoption of social media in different countries. For each
service in our data set, we determine its global onset, i.e. the
point in time at which it first became visible in Google’s
search frequency data. Then, for every country in our data,
we determine the delay ∆t(in days) between the service’s
global onset and its onset in the country. Finally, we com-
pute the mean (µ) and median (m) delay per country in
order to perform comparisons.
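The bookkeeping behind these statistics is simple and could look as follows. The data layout (dictionaries keyed by country and service), the function names, and the values in the usage note are hypothetical and serve only to illustrate the computation of per-country mean and median delays.

```python
from statistics import mean, median


def adoption_delays(onsets, global_onsets):
    """Map each country to (mean delay, median delay) in days.

    onsets:        {country: {service: onset_day}}
    global_onsets: {service: onset_day}
    Services without a known global onset are skipped.
    """
    summary = {}
    for country, per_service in onsets.items():
        delays = [day - global_onsets[s]
                  for s, day in per_service.items() if s in global_onsets]
        if delays:
            summary[country] = (mean(delays), median(delays))
    return summary


def rank(summary, index):
    """Countries sorted by mean (index 0) or median (index 1) delay."""
    return sorted(summary, key=lambda c: summary[c][index])
```

Given toy inputs such as `{"US": {"svc_a": 10, "svc_b": 130}}` with global onsets `{"svc_a": 0, "svc_b": 100}`, the delays are 10 and 30 days, so both statistics equal 20.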
Table 3 ranks the 45 countries considered according to
their mean and median times to adoption; the world map
at the bottom of the table shows a heat map visualization of
median times to adoption. Together with Japan, countries
from the western world lead both rankings. With respect
to both metrics, the US is the country where social media
most quickly achieve noticeable rates of adoption. This is
hardly surprising since many popular social media services such
as facebook are based in the US and thus may gather an
American audience faster than a global one.
A less anticipated finding comes from looking at Fig. 2
which plots median times to adoption along the time axis.
The delay between the US and the next fastest adopting
country, the UK, amounts to more than 200 days. For the
majority of countries in our study, we find that Web-based
social media achieve noticeable rates of adoption between
400 and 600 days after their launch or first observable onset.
At first sight, it thus appears surprising to find South Korea,
a technologically highly advanced nation, to lag behind in
this statistic. Yet, this can be attributed to peculiar aspects
of South Korean Web culture which features many social
media such as cyworld or me2day that are very popular
within the country but rather unknown elsewhere.
Table 3: Rankings of countries w.r.t. mean (µ) and
median (m) time to adoption of a novel service
rank  µ   m      rank  µ   m      rank  µ   m
 1.   US  US     16.   NL  IE     31.   NZ  AR
 2.   UK  UK     17.   AT  MX     32.   TW  DK
 3.   FR  CA     18.   IE  AT     33.   DK  NO
 4.   CA  FR     19.   MX  PL     34.   ZA  ZA
 5.   JP  DE     20.   PL  MY     35.   CN  TW
 6.   DE  JP     21.   PH  IL     36.   CO  CO
 7.   AU  ES     22.   IN  NZ     37.   TH  VE
 8.   IT  IT     23.   PT  BR     38.   NO  GR
 9.   ES  NL     24.   IL  PH     39.   GR  NG
10.   BE  BE     25.   MA  CL     40.   CZ  CZ
11.   SE  AU     26.   VE  MA     41.   ID  UA
12.   MY  SE     27.   CL  PE     42.   UA  ID
13.   PE  FI     28.   AR  IN     43.   KR  TH
14.   FI  PT     29.   TR  TR     44.   RU  RU
15.   CH  CH     30.   BR  CN     45.   NG  KR
Figure 2: Time line showing median times to adoption
(in days) of social media for different countries.
[Horizontal axis: delay from 0 to 1000 days; countries appear
in the order of the median column of Tab. 3, from US to KR.]
Findings like these further underline that search frequency
signatures indeed provide plausible proxies for the study of
collective attention on the Web. Next, we therefore address
aspects of attention dynamics expressed in query log data.
4.2 Attention Dynamics
In our main analysis, we apply the economic diffusion
models from section 3 in order to mine our data for shared
characteristics or noteworthy exceptions.
Table 4 presents goodness-of-fit (GoF) results for all
three models in terms of p-value statistics obtained from
$\chi^2$-tests. To produce these statistics, data from different
countries were grouped into clusters representing continents
and the models were evaluated for each cluster. For the
shifted Gompertz, average p-values (the higher the better)
exceed 0.5 throughout. This holds for fits to data which
reflect worldwide interests as well as for fits to continent
specific data. Moreover, at a significance level of 5%, we
find the shifted Gompertz to provide accurate fits for the
majority of our data. In terms of overall GoF, the Bass and
the Weibull perform slightly worse, yet both models, too,
pass the test (p > 0.05) for most of the data.
Table 4: Goodness of fit w.r.t. regions of the world
region       f_SG             f_BA             f_WB
             ⟨p⟩   p > 0.05   ⟨p⟩   p > 0.05   ⟨p⟩   p > 0.05
Africa       0.61  68%        0.55  62%        0.50  57%
Asia         0.57  63%        0.49  54%        0.48  53%
Australia    0.66  70%        0.53  59%        0.50  58%
Europe       0.59  65%        0.48  51%        0.56  54%
N-America    0.54  57%        0.44  50%        0.39  44%
S-America    0.65  71%        0.54  59%        0.55  62%
worldwide    0.59  64%        0.50  55%        0.47  53%
Table 5: Goodness of fit w.r.t. languages of the world
language     f_SG             f_BA             f_WB
             ⟨p⟩   p > 0.05   ⟨p⟩   p > 0.05   ⟨p⟩   p > 0.05
English      0.55  58%        0.44  49%        0.39  45%
Spanish      0.63  68%        0.52  56%        0.54  60%
Portuguese   0.60  67%        0.50  56%        0.47  51%
Russian      0.68  76%        0.58  66%        0.69  76%
French       0.55  60%        0.46  51%        0.39  45%
German       0.58  64%        0.47  52%        0.47  54%
Chinese      0.50  52%        0.42  46%        0.43  47%
Japanese     0.42  52%        0.38  44%        0.31  38%
Hindi        0.57  64%        0.47  54%        0.48  52%
average      0.57  62%        0.47  52%        0.45  51%
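The two summary columns reported per group, the average p-value and the share of time series whose fit is not rejected at the 5% level, can be computed as sketched below. The input layout and the name `summarize_gof` are assumptions for illustration; the p-values themselves would come from the $\chi^2$-tests described in section 3.4.

```python
from collections import defaultdict


def summarize_gof(results, alpha=0.05):
    """Aggregate goodness-of-fit p-values per group.

    results: iterable of (group, p_value) pairs, e.g. one pair per
             fitted time series with its continent or language.
    Returns {group: (mean p-value, share of series with p > alpha)}.
    """
    groups = defaultdict(list)
    for group, p in results:
        groups[group].append(p)
    return {
        g: (sum(ps) / len(ps), sum(p > alpha for p in ps) / len(ps))
        for g, ps in groups.items()
    }
```

In a $\chi^2$ goodness-of-fit test a large p-value means that the model is not rejected, which is why higher averages and higher shares indicate better agreement with the data.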
Table 5 provides an alternative view on our data. While
Tab. 4 shows results w.r.t. geographic regions, Tab. 5 lists
GoF results w.r.t. major languages spoken across the world.
Data from different countries were grouped into clusters
representing official languages and the three diffusion models
were evaluated for each cluster. Apparently, the results in
Tab. 5 mimic those in Tab. 4. Quality and significance of
fits are comparable and the shifted Gompertz again provides
the most accurate explanation.
These results are interesting and important for they
suggest that the dynamics of collective attention apparent from
search frequency data can be accurately described in terms of
diffusion models. Moreover, they indicate that, around the
world, collective attention to social media evolves similarly
and independently of regions of origin or cultural backgrounds
of crowds of Web users.
Figure 3 shows how our diffusion models fit general trends
for several well known social media platforms and Web-based
businesses. Gray curves show evolving global search volumes
available from Google Trends; colored curves represent
fitted models where the best fitting one (in terms of GoF)
is emphasized. These plots are in line with the results in
Tabs. 4 and 5 and illustrate that all three models are able to
capture general dynamics even if data for different services
show seemingly distinct patterns of growing and declining
collective attention.
A considerable advantage of descriptive data mining for
attention analysis and, in particular, of using two-parameter
diffusion models $f(t \mid \theta_1, \theta_2)$ is that they facilitate visual
analytics. Once a diffusion model has been fit to a temporal
signature of search activities, its parameters $[\theta_1, \theta_2]$ provide
a two-dimensional feature vector that characterizes the
time series and may be used in further analysis. Specifically,
our approach immediately allows for non-linear,
two-dimensional embeddings of the data which can be plotted to
visualize whole data sets of time series.
Figure 3: Exemplary visualizations of how the three diffusion models (Bass, shifted Gompertz, and Weibull)
fit general trends in temporal signatures of worldwide query logs related to several popular and well known
social media services and Web-based businesses; the respective best fitting model is emphasized.
Panels: (a) amazon, (b) craiglist, (c) ebay, (d) facebook, (e) google+, (f) myspace, (g) youtube, (h) twitter.
[Each panel shows the Google Trends series (vertical axis 20 to 100) for 2004 to 2013 together with the
Weibull, Bass, and shifted Gompertz fits.]
Figure 4: Non-linear, two-dimensional embeddings of more than 8,000 search frequency time series into the
parameter spaces of the Bass, the shifted Gompertz, and the Weibull diffusion model. In each case, the 2D
embedding coordinates of the eight examples in Fig. 3 are highlighted in color.
Panels: (a) Bass model, (b) shifted Gompertz model, (c) Weibull model.
Figure 4 displays two-dimensional embeddings of all our
data according to the different diffusion models. To facilitate
interpretation, the coordinates of the eight time series in
Fig. 3 are highlighted in color.
In each case, the embedding coordinates of amazon, a
business that continues to attract increasing user interest,
mark an extreme location in the embedding space.
Similarly extreme locations are occupied by craiglist and ebay,
two Web platforms that were launched in the 1990s and
reached global peak popularity around 2008. The embedding
coordinates of google+, a service whose search frequency
time series indicates a spike of global attention after its launch
in 2011, reside at opposite extreme locations. All other time
series from Fig. 3 are found more or less close together in
respective giant clusters of embedded search frequency data.
The existence of these giant clusters which contain almost
90% of all time series tested is arguably the most important
result of our analysis. Irrespective of the diffusion model
used to characterize general collective attention dynamics
and regardless of which region in the world is considered, it
appears that most time series in our collection show similar
behavior: individual social media services seem to be able to
attract increasing collective attention for a period of 4 to 6
years before user interest inevitably begins to subside. This
is visible in many of the time series shown throughout this
paper, well accounted for by the shape and scale parameters
of economic diffusion models, and thus strikingly apparent
in Fig. 4.
4.2.1 Case Study: Countries, Continents, Languages
Figure 5 compares examples of attention dynamics for
different countries, continents, and linguistic backgrounds.
In Fig. 5(a), we embed data from the US and South Korea
in the parameter space of the shifted Gompertz model.
Above, both countries were found to be most different
regarding median times to adoption of the services considered
in this study. Figure 5(a), however, indicates that attention
dynamics in both countries are rather similar.
Israel and Malaysia, two countries from different parts
of the world, occupy middle ranks in Tab. 3. Yet, their
embeddings in Fig. 5(b) overlap with those of the US and
South Korea and do not indicate noteworthy differences as
to collective attention dynamics. Similar conclusions apply
to the comparison of Asian and South American countries in
Fig. 5(c) and the comparison of English and Russian
speaking countries in Fig. 5(d).
Figure 5: Exemplary comparisons of search frequency time series from different countries, continents, and
languages plotted in the two-dimensional parameter space of the shifted Gompertz model.
Panels: (a) US and South Korea, (b) Israel and Malaysia, (c) Asia and South America, (d) English and Russian.
Figure 6: Exemplary comparisons of temporal
query log data related to different social media services.
Panels: (a) facebook and myspace, (b) flickr and imgur.
4.2.2 Case Study: Social Networks, Photo Sharing
While diffusion models seem not to allow for a distinction
of attention dynamics in different countries or regions, we
find that data related to individual services tend to form
compact, separable clusters in the parameter spaces of the
models we consider.
As an example, Fig. 6 compares two social networks and
two photo sharing sites. Country-specific time series related
to myspace and facebook form distinct clusters in the
embedding space of the shifted Gompertz model. Whereas
myspace is a social networking site that came and went,
facebook seems to just have reached global peak popularity.
This difference is expressed in the scale parameter of the
shifted Gompertz. Likewise, attention dynamics for flickr
and imgur are well explicable in terms of the general cycle
of growth and decline; the apparent difference is that, in
most countries, interest in flickr seems to decline while for
imgur it is still on the rise.
4.2.3 Case Study: amazon
Among all Web-based services considered in this study,
amazon, an online retailer, is found to cause the most diverse
patterns of collective attention dynamics. Figure 7 shows
that, while in most countries interest in amazon rises steadily
over the whole observation period, it remains rather constant
Figure 7: Query log data related to amazon. Panels: (a) amazon, (b) UK, DE, VE, FI, JP, ID (Google Trends
data and shifted Gompertz fits, 2004 to 2013).
Figure 8: Query log data related to twitter. Panels: (a) twitter, (b) NL, PH, RU, FR, MY, TR (Google Trends
data and shifted Gompertz fits, 2009 to 2013).
in others, and actually declines in some, albeit few, cases.
4.2.4 Case Study: twitter
Twitter, a popular microblogging service, is another example
of a service where attention dynamics vary significantly
between countries. While in most countries in our study
interest in twitter seems to have just reached its peak (see the
distinct cluster within the giant component in Fig. 8), there
are a few countries in which interest in this service continues
to rise, notably France, Malaysia, and Turkey.
4.3 Predictions
Prompted by the overall high statistical significance of
fits provided by the three diffusion models, we apply them
to predict the future evolution of global collective interest in
existing social media. Next, we present qualitative results of
predictions over the next five years for exemplary services.
In addition, we consider services launched prior to 2004 and
demonstrate that the technique discussed in Section 3 also
allows us to reasonably "predict the past", i.e. to
reconstruct unobserved past developments.
Figure 9: Predictions of future collective interest in exemplary social media services: (a) facebook,
(b) youtube, (c) twitter. Gray curves show data obtained from Google Trends; solid colored curves indicate
fits to these data (Weibull, Bass, shifted Gompertz), and dashed colored curves show corresponding 5 year
predictions. Note that these predictions do not indicate absolute user interest but predict the evolution of
relative search frequencies w.r.t. the maximum interest so far which is scaled to 100.
Figure 10: Predictions of past and future collective interest in Web-based businesses launched prior to 2004:
(a) amazon, (b) paypal, (c) ebay. Curves show Google Trends data and fits of the Weibull, Bass, and shifted
Gompertz models.
Figure 9 shows predictions of the development of collective
attention to three of today's prominent social media platforms.
To create these plots, we scale the range of the best
fitting instance of each model in our tests to match the range
of values used by Google Trends. While our predictions do
not allow for an estimation of the development of absolute
numbers of users interested in a service, they indicate how
interest may evolve relative to the present.
As the available data for the three services are truncated
from above, i.e. each service has either just reached or not yet
reached peak popularity, traditional maximum likelihood
estimates may not be reliable. However, visual inspection
suggests that, when using multinomial maximum likelihood, all
three diffusion models provide reasonable predictions. While
predictions according to the Weibull seem overly optimistic
and those due to the Bass model seem rather pessimistic,
the shifted Gompertz model marks a middle ground. For
instance, in the case of facebook it predicts that by 2017
collective interest in this service will reduce to 50% of its
current intensity. We remark that, while at first sight such
a development may seem improbable from today’s point of
view, the vast majority of the 175 social media considered in
this paper show characteristic cycles of growth and decline.
Given the data available as of this writing, collective atten-
tion to facebook so far seems to follow the same pattern.
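To illustrate how such a relative prediction can be read off a fitted curve, the following minimal Python sketch rescales a shifted Gompertz density so that its maximum equals 100 and then looks for the first time after the peak at which the curve has dropped to half of that maximum. The density is written in the standard form of the shifted Gompertz distribution; the parameter values b and eta, the time grid, and all function names are hypothetical and chosen for illustration only. They are not the fitted values behind Fig. 9.

```python
import math

def shifted_gompertz_pdf(t, b, eta):
    """Density of the shifted Gompertz distribution for t >= 0 (b > 0, eta >= 0)."""
    e = math.exp(-b * t)
    return b * e * math.exp(-eta * e) * (1.0 + eta * (1.0 - e))

def scale_to_trends(values, top=100.0):
    """Rescale a curve so that its maximum equals `top` (relative interest)."""
    peak = max(values)
    return [top * v / peak for v in values]

def time_to_fraction_of_peak(times, values, fraction=0.5):
    """First time at or after the peak where the curve is <= fraction * peak."""
    peak_index = max(range(len(values)), key=values.__getitem__)
    threshold = fraction * values[peak_index]
    for t, v in zip(times[peak_index:], values[peak_index:]):
        if v <= threshold:
            return t
    return None  # the curve does not fall that far within the grid

# Hypothetical parameters for illustration (not fitted values from the paper).
b, eta = 0.6, 20.0
times = [i / 100.0 for i in range(0, 2001)]  # 0 to 20 time units, step 0.01
curve = scale_to_trends([shifted_gompertz_pdf(t, b, eta) for t in times])
peak_time = times[curve.index(max(curve))]
half_time = time_to_fraction_of_peak(times, curve)
```

Because the curve is expressed relative to its own maximum, a statement such as "interest falls to 50%" refers to relative search frequency only and says nothing about absolute numbers of users.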
Figure 10 shows examples of fits to severely truncated
search frequency data. Each Web-based business in this fig-
ure was launched prior to 2004 so that data from Google
Trends is incomplete regarding the past. Nevertheless, based
on multinomial maximum likelihood, the characterizations
of general trends according to each diffusion model are again
reasonable; in particular, onset times predicted by the shifted
Gompertz match the dates these businesses were launched.
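The idea of fitting a diffusion model to a window that covers only part of a life cycle can be sketched as follows. This is not the estimation algorithm of Section 3, which is not reproduced here; it is a generic illustration under the assumption that binned search frequencies are treated as multinomial counts and that bin probabilities are conditioned on the observed window. The Weibull form F(t) = 1 - exp(-alpha t^kappa), the synthetic counts, the parameter grid, and all function names are assumptions made for this example.

```python
import math

def weibull_survival(t, alpha, kappa):
    """Survival function 1 - F(t), assuming F(t) = 1 - exp(-alpha * t**kappa)."""
    return math.exp(-alpha * t ** kappa)

def bin_probabilities(edges, alpha, kappa):
    """Probability of each bin, conditioned on the window [edges[0], edges[-1]]."""
    surv = [weibull_survival(t, alpha, kappa) for t in edges]
    window_mass = surv[0] - surv[-1]  # probability mass inside the window
    if window_mass <= 0.0:
        return None
    return [(surv[i] - surv[i + 1]) / window_mass for i in range(len(edges) - 1)]

def log_likelihood(counts, edges, alpha, kappa):
    """Multinomial log-likelihood of the binned counts, up to a constant."""
    probs = bin_probabilities(edges, alpha, kappa)
    if probs is None or any(p <= 0.0 for p in probs):
        return float("-inf")
    return sum(n * math.log(p) for n, p in zip(counts, probs))

def fit_by_grid_search(counts, edges, alphas, kappas):
    """Return the (alpha, kappa) pair on the grid with the highest likelihood."""
    candidates = [(a, k) for a in alphas for k in kappas]
    return max(candidates, key=lambda ak: log_likelihood(counts, edges, ak[0], ak[1]))

# Synthetic example: the process starts at t = 0 but is observed only on [4, 10].
edges = [4.0 + 0.5 * i for i in range(13)]
true_alpha, true_kappa = 0.02, 2.5  # hypothetical ground truth
counts = [1000.0 * p for p in bin_probabilities(edges, true_alpha, true_kappa)]

alphas = [0.005, 0.01, 0.02, 0.05, 0.1]
kappas = [1.5, 2.0, 2.5, 3.0]
fitted = fit_by_grid_search(counts, edges, alphas, kappas)
```

Conditioning on the window means the fit only uses the shape of the curve inside the observed interval; the unobserved part before t = 4 is then implied by the fitted parameters. A grid search is used here purely to keep the sketch dependency-free; a numerical optimizer would normally replace it.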
5. RELATED WORK
Understanding the collective behavior of crowds of Web
users is a research topic of growing popularity and model-
based approaches have been used in this context before. We
divide our discussion of related work into two major parts:
first, we review previous contributions to attention dynamics
on the Web in general and then we discuss two recent, highly
related publications on the evolution of popularity of social
media services which themselves have stirred considerable
attention in early 2014.
5.1 Attention Dynamics on the Web
Statistical distributions similar to the ones considered in
this paper have previously been applied to characterize the
dynamics of the behavior of crowds of Web users. In an
early contribution, Huberman et al. [24] analyzed browsing
behaviors and found that the number of links a user is
likely to follow on a Web site is distributed according to an
inverse Gaussian. In [39], Wu and Huberman studied life-cycles
of news items on a social bookmarking site and found
that the amount of attention novel content receives is
distributed log-normally. The log-normal distribution was also
found to model sizes of cascades of messages passed through
a peer-to-peer recommendation network [28] and the number
of messages exchanged in instant messaging services [27].
The Weibull distribution in (5) was recently reported to
account well for statistics of dwell times on Web sites [31],
times people spend playing online games [7], or the dynamics
of Internet memes [5]. The Bass diffusion model in (1) has
recently been considered in order to reason about structures
of online social networks [32] or twitter information cascades
[23]. The shifted Gompertz distribution, on the other hand,
has apparently not yet been considered in the context of social
media or Web usage dynamics.
While attention dynamics on shorter time scales have been
modeled using random fields [30], structured models [21],
or differential equations [29], long term temporal dynam-
ics of collective attention have previously been modeled us-
ing mixtures of power-law and Poisson distributions [15]
or systems of differential equations [1,28] which were in-
spired by techniques from the area of epidemic modeling
[10,17]. In this context, we note that the diffusion mod-
els considered in this paper also allow for interpretations in
terms of the dynamics of elementary differential equations.
For instance, the Weibull model in (5) can be expressed as
$f(t) = \frac{d}{dt} F(t) = \alpha \kappa t^{\kappa-1} - \alpha \kappa t^{\kappa-1} F(t)$,
which hints at a similarity in spirit between economic diffusion
and established epidemic models that seems to merit further research.
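The differential form of the Weibull model can be checked numerically. The sketch below assumes the parameterization F(t) = 1 - exp(-alpha t^kappa) with density f(t) = alpha kappa t^(kappa-1) exp(-alpha t^kappa), which is consistent with the relation f(t) = alpha kappa t^(kappa-1) (1 - F(t)); equation (5) itself is not reproduced in this section, so the exact form and the parameter values used here are assumptions for illustration.

```python
import math

def weibull_cdf(t, alpha, kappa):
    """Cumulative distribution F(t) = 1 - exp(-alpha * t**kappa)."""
    return 1.0 - math.exp(-alpha * t ** kappa)

def weibull_pdf(t, alpha, kappa):
    """Density f(t) = alpha * kappa * t**(kappa - 1) * exp(-alpha * t**kappa)."""
    return alpha * kappa * t ** (kappa - 1.0) * math.exp(-alpha * t ** kappa)

def ode_rhs(t, alpha, kappa):
    """Right-hand side alpha*kappa*t**(kappa-1) - alpha*kappa*t**(kappa-1) * F(t)."""
    rate = alpha * kappa * t ** (kappa - 1.0)
    return rate - rate * weibull_cdf(t, alpha, kappa)

# Hypothetical parameter values, chosen only for this check.
alpha, kappa = 0.05, 2.0
grid = [0.5 * i for i in range(1, 41)]  # t = 0.5, 1.0, ..., 20.0
max_gap = max(abs(weibull_pdf(t, alpha, kappa) - ode_rhs(t, alpha, kappa)) for t in grid)
```

Read this way, the growth term alpha kappa t^(kappa-1) acts on the fraction 1 - F(t) that has not yet been reached, which is the structural parallel to epidemic models mentioned above.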
With respect to time series retrieved from Google Trends,
epidemic models based on differential equations involving
exogenous and endogenous influences have been discussed in
[15]. There, they were used as a means of classifying, i.e.
distinguishing, different types of attention dynamics. Trend
analysis based on data from Google Trends was also performed
in [14], yet there the focus was on developing clustering
algorithms to characterize different phases in search
frequency data. The approaches in [14,15] are thus related
to what is reported here; however, in contrast to these
contributions, we do not explicitly devise new models but consider
simpler representations that implicitly account for different
kinds of dynamics. Due to the simplicity of the diffusion
models considered here and because of their apparent empir-
ical validity and theoretical plausibility, the results reported
in this paper therefore provide a new baseline for research
on the mechanisms and long-term dynamics of collective at-
tention on the Web.
5.2 The Princeton / Facebook Controversy and
a Contribution from CMU
In a delightful synchronicity, Cannarella and Spechler [11],
Ribeiro [36], and we ourselves [6] all published analyses on
how attention to social media evolves over time in early 2014.
While [11] was uploaded to arXiv, [36] and [6] were both
presented at the International World Wide Web Conference
in Seoul.
The work by Cannarella and Spechler from Princeton is
noteworthy for triggering a brief but fierce media frenzy.
Just as in the work presented here, the results in [11] were
obtained from analyzing Google Trends time series. Differ-
ing from our approach, Cannarella and Spechler considered
epidemic models to analyze search frequency time series that
indicate interest in services such as myspace or facebook.
While this methodology had earlier been applied to analyze
the temporal evolution of interest in Internet memes [4],
Cannarella and Spechler caused a controversy because they
used their models to predict that facebook would lose 80% of
its users by 2017. Media interest was particularly stirred
by the fact that facebook data scientist Mike Develin was
quick to humorously “debunk” the Princeton “findings” (see footnote 1).
Interestingly, our “qualitative” results in Fig. 9 seem to
corroborate Cannarella’s and Spechler’s predictions and we
note that they were obtained from the same data but different
models. In any case, we certainly agree with Develin’s
objection that predictions based on search frequency data
have to be taken with a grain of salt. Yet, we disagree with
his argument that social media related search interests of
millions of Web users are not indicative of user engagement
[Footnote 1: see www.facebook.com/notes/mike-develin/debunking-princeton/10151947421191849]
(see again our discussion in Section 2) and note the curious
absence of any direct engagement data in his reply.
However, data that directly reflects engagement played an
important role in Ribeiro’s analysis performed at CMU [36].
He considered statistics available from alexa, a subsidiary of
amazon which provides Web traffic data that are gathered
using the alexa toolbar, a plugin that volunteers install in
their browsers so that alexa can track which Web pages they
access.
Regarding Ribeiro’s approach, we note that he extended
established epidemic models by new parameters and found
these new models to be in good agreement with his data.
His findings, too, attracted considerable media attention since
he predicted collective user interest in facebook to remain
constant for years to come. Yet, this result as well should
be taken with a grain of salt. While it was derived from
direct engagement data, we point out that alexa data are
likely biased towards technology-savvy users who installed
the toolbar and will hardly reflect the surfing behavior of
average Web users.
Given this discussion, the approach and results presented
here mark a middle ground. On the one hand, we consider
simple diffusion models rather than (intricate) models for
the epidemic spread of novelties. On the other hand, the
statistical basis for our analysis far exceeds those in [11,36].
Neither Cannarella and Spechler nor Ribeiro consider
country-specific data and neither of them considers as large a
number of different services as we do in this paper. Moreover,
we see the main contribution of this paper not in the
predictions in Fig. 9 but rather in the empirical observation
that collective attention to social media shows highly regular
patterns of growth and decline regardless of region of
origin or cultural background of crowds of Web users.
6. CONCLUSION
In this paper, we performed search frequency analysis in
order to gain insights into the dynamics of collective atten-
tion to social media and Web-based businesses. Search fre-
quency analysis is an emerging topic and a quickly growing
literature shows that data available from Google Trends can
lead to novel insights into collective concerns, interests, or
habits [2,4,5,9,12,13,16,18,22,26,33,34,38].
Interested in collective attention to social media, we
collected Google Trends data from 45 different countries that
show how user interests in 175 social media services evolved
over time. Focusing on general trends, we considered
descriptive data mining techniques and applied economic
diffusion models to search our data set of more than 8,000 time
series for common patterns or distinctive differences.
Diffusion models are well established in economics and we
considered their use due to their conceptual simplicity. This
is in contrast to more elaborate approaches such as, say,
Gaussian mixtures or kernel techniques, which yield results
in terms of parameters for which there usually is no physically
plausible counterpart. Diffusion models, on the other
hand, are designed to characterize time series in terms of
everyday concepts such as propensities for attention to grow
and to decline, and we note that Occam’s razor suggests
preferring simple explanations whenever available.
Using an efficient algorithm for robust maximum likelihood
parameter estimation even under incomplete data, we
fitted the Bass, shifted Gompertz, and Weibull diffusion
models and evaluated their performance. Our most
important results can be summarized as follows:
• economic diffusion models provide accurate and statistically
significant explanations of general trends in aggregated
search frequency data which summarize how
collective attention to social media evolves over time.
This capability of diffusion models to characterize the data
considered in this study thus suggests that:
• collective attention to social media evolves according
to simple and highly regular dynamics of growth and
decline.
In a comparative analysis w.r.t. individual countries, different
continents, or linguistic backgrounds, we found these
patterns to be persistent and conclude that:
• collective attention to social media evolves globally
similarly and independent of regions of origin or cultural
backgrounds of crowds of Web users.
Regarding individual services, however, rates of adoption
may vary between countries. Nevertheless, for almost 90%
of the time series in our data set, we found strikingly similar
attention dynamics and it seems that:
• most social media services are able to attract growing
collective attention for a period of 4 to 6 years before
user interest inevitably begins to subside.
Finally, because of the way growth dynamics are encoded
in the diffusion models studied here, it appears that public
attention to social media hinges on perceived novelty. In
other words, the more a crowd of users gets used to a service
or the less novel it appears, the faster it loses its appeal.
These are the characteristics of hype cycles. The temporal
behavior exposed in our analysis is therefore well in line with
everyday experience and aptly summarized by the statement
that what goes up, must come down.
Our results are of interest to professionals in marketing
and public relations. According to findings in [34,38] per-
taining to the saliency of query logs for behavioral studies,
data which aggregate the Web search behavior of millions
of people worldwide provide reasonable proxies for public
interests and preferences. The strongly regular patterns we
identified in time series that served as proxies for the pop-
ularity of social media therefore indicate that interests of
crowds of Web users are surprisingly predictable.
In summary, the models of attention dynamics considered
in this paper provide simple yet reliable and theoretically
well-founded tools for Web trend analysis. They thus constitute
new baselines for Web intelligence research that targets
socio-economic questions. In particular, they provide baseline
tools that help estimate the future success or customer
adoption of particular services or Web-based businesses.
7. ACKNOWLEDGMENTS
The work reported in this paper was carried out within the
Fraunhofer / University of Southampton research project
SoFWIReD and funded by the Fraunhofer ICON initiative.
Kristian Kersting was additionally supported by the Fraun-
hofer ATTRACT fellowship “Statistical Relational Activity
Mining”.
8. REFERENCES
[1] A. Acerbi, S. Ghirlanda, and M. Enquist. The Logic of
Fashion Cycles. PLoS ONE, 7(3):e32541, 2012.
[2] C. Artola and E. Galan. Tracking the Future on the
Web: Construction of Leading Indicators using
Internet Searches. Documentos Ocasionales 1203,
Banco de Espana, 2012.
[3] F. Bass. A New Product Growth Model for Consumer
Durables. Management Science, 15(5):215–227, 1969.
[4] C. Bauckhage. Insights into Internet Memes. In Proc.
ICWSM. AAAI, 2011.
[5] C. Bauckhage, K. Kersting, and F. Hadiji.
Mathematical Models of Fads Explain the Temporal
Dynamics of Internet Memes. In Proc. ICWSM.
AAAI, 2013.
[6] C. Bauckhage, K. Kersting, and B. Rastegarpanah.
Collective Attention to Social Media Evolves
According to Diffusion Models. In Proc. WWW. ACM,
2014.
[7] C. Bauckhage, K. Kersting, R. Sifa, C. Thurau,
A. Drachen, and A. Canossa. How Players Lose
Interest in Playing a Game: An Empirical Study
Based on Distributions of Total Playing Times. In
Proc. CIG. IEEE, 2012.
[8] A. Bemmaor. Modeling the Diffusion of New Durable
Goods: Word-of-mouth Effect Versus Consumer
Heterogeneity. In G. Laurent, G. Lilien, and B. Pras,
editors, Research Traditions in Marketing, pages
201–229. Springer, 1994.
[9] I. Bordino, S. Battiston, G. Caldarelli, M. Cristelli,
A. Ukkonen, and I. Weber. Web Search Queries can
Predict Stock Market Volumes. PLoS ONE,
7(7):e40014, 2012.
[10] T. Britton. Stochastic Epidemic Models: A Survey.
Mathematical Biosciences, 225(1):24–35, 2010.
[11] J. Cannarella and J. Spechler. Epidemiological
Modeling of Online Social Network Dynamics.
arXiv:1401.4208 [cs.SI], 2014.
[12] J. Castle, N. Fawcett, and D. Hendry. Nowcasting Is
Not Just Contemporaneous Forecasting. National
Institute Economic Review, 210(1):71–89, 2009.
[13] H. Choi and H. Varian. Predicting the Present with
Google Trends. Economic Record, 88(S1):2–9, 2012.
[14] L. Christiansen, T. Schimoler, R. Burke, and
B. Mobasher. Modeling Topic Trends on the Social
Web Using Temporal Signatures. In Proc. WIDM.
ACM, 2012.
[15] R. Crane and D. Sornette. Robust Dynamic Classes
Revealed by Measuring the Response Function of a
Social System. PNAS, 105(41):15649–15653, 2008.
[16] Z. Da, J. Engelberg, and P. Gao. In Search of
Attention. J. of Finance, 66(5):1461–1499, 2011.
[17] K. Dietz. Epidemics and Rumors: A Survey. J. of the
Royal Statistical Society A, 130(4):505–528, 1967.
[18] A. Gerow and M. Keane. Mining the Web for the
Voice of the Herd to Track Stock Market Bubbles. In
Proc. IJCAI. AAAI, 2011.
[19] J. Ginsberg, M. Mohebbi, R. Patel, L. Brammer,
M. Smolinski, and L. Brilliant. Detecting Influenza
Epidemics Using Search Engine Query Data. Nature,
457(7232):1012–1014, 2009.
[20] L. Gleser and D. Moore. The Effect of Dependence on
Chi-Square and Empiric Distribution Tests of Fit. The
Annals of Statistics, 11(4):1100–1108, 1983.
[21] S. Goel, D. Watts, and D. Goldstein. The Structure of
Online Diffusion Networks. In Proc. EC. ACM, 2012.
[22] L. Granka. Inferring the Public Agenda from Implicit
Query Data. In Proc. SIGIR. ACM, 2009.
[23] J. Herrmann, W. Rand, B. Schein, and N. Vodopivec.
An Agent-Based Model of Urgent Diffusion in Social
Media. Technical report, Social Science Research
Network, 2013.
http://dx.doi.org/10.2139/ssrn.2297167.
[24] B. Huberman, P. Pirolli, J. Pitkow, and R. Lukose.
Strong Regularities in World Wide Web Surfing.
Science, 280(5360):95–97, 1998.
[25] R. Jennrich and R. Moore. Maximum Likelihood
Estimation by Means of Nonlinear Least Squares. In
Proc. of the Statistical Computing Section. American
Statistical Association, 1975.
[26] K. Joseph, J. Wintoki, and Z. Zhang. Forecasting
Abnormal Stock Returns and Trading Volume Using
Investor Sentiment: Evidence from Online Search.
Int. J. of Forecasting, 27(4):1116–1127, 2011.
[27] J. Leskovec and E. Horvitz. Planetary-Scale Views on
a Large Instant-Messaging Network. In Proc. WWW.
ACM, 2008.
[28] J. Leskovec, L. Adamic, and B. Huberman. The
Dynamics of Viral Marketing. ACM Trans. Web,
1(1):5, 2007.
[29] J. Leskovec, L. Backstrom, and J. Kleinberg.
Meme-tracking and the Dynamics of the News Cycle.
In Proc. KDD. ACM, 2009.
[30] C. Lin, B. Zhao, Q. Mei, and J. Han. PET: A
Statistical Model for Popular Events Tracking in
Social Communities. In Proc. KDD. ACM, 2010.
[31] C. Liu, R. White, and S. Dumais. Understanding Web
Browsing Behavior through Weibull Analysis of Dwell
Times. In Proc. SIGIR. ACM, 2010.
[32] D. Luu, E.-P. Lim, T.-A. Hoang, and F. Chua.
Modeling Diffusion in Social Networks Using Network
Properties. In Proc. ICWSM. AAAI, 2012.
[33] N. McLaren and R. Shanbhogue. Using Internet
Search Data as Economic Indicators. Bank of England
Quarterly Bulletin, 51(2):134–140, 2011.
[34] J. Mellon. Search Indices and Issue Salience: the
Properties of Google Trends as a Measure of Issue
Salience. Sociology Working Papers 2011-01,
University of Oxford, 2011.
[35] E. Page. Continuous Inspection Scheme. Biometrika,
41(1–2):100–115, 1954.
[36] B. Ribeiro. Modeling and Predicting the Growth and
Death of Membership-Based Websites. In Proc.
WWW. ACM, 2014.
[37] H. Rinne. The Weibull Distribution. Chapman & Hall
/ CRC, 2008.
[38] J. Teevan, D. Liebling, and G. Geetha. Understanding
and Predicting Personal Navigation. In Proc. WSDM.
ACM, 2011.
[39] F. Wu and B. Huberman. Novelty and Collective
Attention. PNAS, 104(45):17599–17601, 2007.
... Research has shown that Google Trends can outperform surveys in predicting consumer behavior (Vosen & Schmidt, 2011). In this section, we will critically examine a study by Bauckhage, et al. (2014aBauckhage, et al. ( , 2014b) that used Google Trends data to gauge the public interest in 175 social media products (e.g. Facebook, YouTube, Twitter), including some social virtual worlds. ...
... In studying the diffusion of social media usage, Bauckhage, et al. (2014aBauckhage, et al. ( , 2014b assumed that Google Trends popularity was a proxy for "collective attention." To model and predict changes in collective attention, they tested the effectiveness of three different diffusion models in fitting Google Trends data: the Bass model (Bass, 1969), the shifted Gompertz model (Bemmaor, 1992), and a third function also used in diffusion studies, the Weibull model (Rinne, 2008). ...
... As an example of both the power and limitations of the Bauckhage, et al. method for understanding the popularity of a social platform, we briefly look at Google Trends data for Facebook. In their papers, Bauckhage, et al. (2014aBauckhage, et al. ( , 2014b used the shifted Gompertz model to correctly predict that in 2017 the Google Trends popularity of Facebook would fall to 50% of its 2013 peak value. The Weibull model seriously under-predicted the 2017 popularity (half of the real value) and the Bass model over-predicted the popularity (three times higher than the real value). ...
Article
Virtual worlds rose and fell in popularity a decade ago, and today's nascent commercially-available virtual reality could repeat this pattern. With sparse data available for gauging interest in technology products, such as virtual worlds or virtual reality, Google Trends search popularity has been used in prior studies as a proxy for global interest. We explore the problems with this approach using data from three virtual worlds: Second Life, Minecraft, and World of Warcraft. We find that Google Trends search volume does not correlate with user purchases or subscriptions, and the single shifted Gompertz function used in prior studies may not be sufficient to model both product user searches and searches driven by media attention.
... Numerous studies have stressed that network externalities appear to be important in the development of UGCSNets [13]- [19]. Other studies have adopted the BDM to capture and analyze the dynamics of the user acceptance of online UGCSNets [13], [34]. In addition, several studies have adopted epidemiological models to mathematically capture and explain the member adoption and abandonment of online social networks [30]- [33]. ...
... To capture the dynamics of the user acceptance of online UGCSNets, many studies have adopted the BDM [13], [34]. This model, developed by Frank Bass [29], describes the process of how new products become adopted as a function of the level of product innovation and imitation between adopters and potential adopters using difference equation (1a), as shown below, where P(t) is the number of adopters at time t and m is the total number of potential adopters in the market. ...
... In particular, for β > α, the user adoption will increase to a maximum before decreasing to zero [34], which becomes explicit by writing (1a) as ...
Article
User-generated content sharing networks (UGCSNets), in which members are content contributors as well as users, have had a significant impact on the sharing economy and on society via the sharing and reuse of contents. In a UGCSNet, managing for growth requires a quantitative grasp of how individual members’ participation and sharing affect and are affected by the membership and content volume; these interactions form a dynamic loop. In this paper, a quantitative modeling approach for the loop dynamics of UGCSNet growth is developed by exploiting limited empirical data. A teaching material sharing network (TMSN) serves as a baseline case study, and Wikipedia serves as a validation case for the modeling approach design. The novel modeling approach consists of i) a set of generalized Bass diffusion model-embedded stochastic difference equations (GBDSDEs) of the loop dynamics and ii) a quasibootstrap- based nonlinear least square (QBNLS) method to extract from the limited empirical data and periodically update the model parameters as the UGCSNet evolves. In GBDSDEs, two difference equations describe the number of members and content volume evolution. The stochastic drives consist of measures of individual participation and content uploading. The drive models are an innovative generalization of the Bass diffusion model (BDM) as probabilistic models of known qualitative descriptions regarding how the individual willingness to participate and share is affected by the total membership and content volume. Analyses of the coefficients of determination show good fits between model predictions and actual outcomes for both SCTNet and Wikipedia growths. Applications of the modeling approach to what-if analyses demonstrate its value to predict and assess the effects of specific managerial strategies—such as the initial content volume and the number of founding altruistic members—on the growth of a UGCSNet.
... From a perceiver attention perspective "novelty attracts human attention… When information is novel, it is not only surprising, but also more valuable" (Vosoughi et al. 2018(Vosoughi et al. , p. 1149. However, novelty is likely to be curvilinear to attention, as too little fails to gain attention whereas too much arousal risks information overload or sensory threat (Andersen et al. 1998;Bauckhage and Kersting 2014;Fisher et al. 2018;Kidd et al. 2012). ...
Chapter
The capabilities of researching social dynamics as big data have significantly outpaced the formulation of theory to account for the processes being discovered. This essay extends a formative conceptualization of social media communication as meme diffusion into a propositional model, animated largely by evolutionary and attention economy explanatory metaphors. The result is an integrative model formalized in 18 propositions, indicating that multiple system factors influence the generation and attrition of social media messages. The system levels include features of the meme itself, its medium, its source, its social network and societal context, the interference or facilitation of geospatial, technical and significant societal events. As such, memes diffuse sometimes because of the information value of events (evememic), viral meme cycles (entymemic), or some combination of these processes (polymemic). The model integrates extensive cross-disciplinary research and manifold theoretical influences in the interest of demonstrating a process of theory construction in the context of social media and new media.
... However, the best known and most frequently quoted model is the one proposed by Bass [35]. This model is often used alongside the so-called logistic and Gompertz projections, while ordinary predictions based on the Bass model are the most pessimistic [36]. This is an additional argument in favour of this model. ...
Article
Full-text available
Personal light electric vehicles (PLEVs) are a phenomenon that can currently be observed in cities, intended to be an ecological form of transport. The authors of the paper make an attempt to determine electricity consumption by PLEVs in the context of managing a large city in accordance with the concept of sustainable development. The article is of a cognitive nature. Research questions posed against the background of the goal formulated are as follows: how strong will the demand for PLEVs be (in the example of e-motor scooters, taking into consideration the number of vehicles) and for the electricity consumed by PLEVs. The method used is a simulation model. The conducted analyses demonstrate that a dynamic growth of PLEVs will result in an increased energy demand, which must be taken into account by the cities, developing according to the sustainable development conception.
... General trends and patterns in the dynamics of collective attention to social media and other web-services using the Bass diffusion equation and statistical methods are found in [27]. The main conclusion is that diffusion models can reproduce general trends quite accurately; that is, the same conclusion as in our paper, though using different methods. ...
... The shifted Gompertz distribution has mostly been used in market research, diffusion theory, social networks, and forecasting. It has also been used to predict the growth and decline of social networks and online services and shown to be superior to the Bass model and Weibull distribution [4]. It is interesting to study the statistical phenomena in high energy physics in terms of this distribution. ...
Article
Charged particles' production in e⁺e⁻, p̄p, and pp collisions in full phase space as well as in restricted phase space slices, at high energies, is described with predictions from the shifted Gompertz distribution, a model of adoption of innovations. The distribution has been extensively used in diffusion theory, social networks, and forecasting. A two-component model, in which the PDF is obtained from the superposition of two shifted Gompertz distributions, is introduced to improve the fitting of the experimental distributions by several orders. The two components correspond to the two subgroups of a data set, one representing the soft interactions and the other semihard interactions. Mixing is done by appropriately assigning weights to each subgroup. Our first attempt to analyse the data with the shifted Gompertz distribution has produced extremely good results. It is suggested that the distribution may be included in the host of distributions more often used for multiplicity analyses.
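The two-component construction described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the standard shifted Gompertz density f(x) = b e^(-bx) exp(-η e^(-bx)) [1 + η (1 - e^(-bx))] for x ≥ 0, and the function names, the weight `w`, and all parameter values are hypothetical.

```python
import numpy as np

def shifted_gompertz_pdf(x, b, eta):
    """Shifted Gompertz density with scale b > 0 and shape eta >= 0 (zero for x < 0)."""
    x = np.asarray(x, dtype=float)
    xc = np.maximum(x, 0.0)            # avoid overflow for negative inputs
    e = np.exp(-b * xc)
    pdf = b * e * np.exp(-eta * e) * (1.0 + eta * (1.0 - e))
    return np.where(x >= 0.0, pdf, 0.0)

def two_component_pdf(x, w, b1, eta1, b2, eta2):
    """Weighted superposition of two shifted Gompertz densities, 0 <= w <= 1."""
    return (w * shifted_gompertz_pdf(x, b1, eta1)
            + (1.0 - w) * shifted_gompertz_pdf(x, b2, eta2))

# Hypothetical parameters, chosen only to show the call pattern.
grid = np.linspace(0.0, 200.0, 200001)
mix = two_component_pdf(grid, w=0.6, b1=0.3, eta1=2.0, b2=0.1, eta2=5.0)
```

Because each component integrates to one and the weights sum to one, the mixture is itself a valid density; in a fit, `w` and the four component parameters would be estimated from the data.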
... His model can explain why some online communities decline and others do not. Bauckhage and Kersting (2014) insisted that the rise and fall of various SNSs can be explained by some statistical models. There are some other notable models (Ribeiro and Faloutsos 2015; Liu et al. 2013). ...
Article
Friendster is a social networking service which used to be popular at the beginning of the twenty-first century. Some analysis implies that the user network on Friendster collapsed from the “outside” of the layered structure of the cores. However, it is still not clear if the network really collapsed from the outside. We analyze the time evolution of the network structure more exactly to check whether that is true. It is shown that the collapse of the Friendster network actually started from the “center” of the core structure. Following this result, we attempt to explain its mechanism by a propagation model. We conclude that the time evolution of core structure can be explained by the two rules: (a) non-users who have many friends on Friendster are likely to register for Friendster, and (b) users who have many friends that have already left Friendster are also likely to leave. The users who have few friends on Friendster tend to leave soon and that may have also played a key role in the time evolution of the core structure. Moreover, under the assumption that our model is valid, we discuss what to do to prevent the decline of online communities. First, it is not effective to promote registration in maintaining the number of active users. Second, it is effective to promote non-active users to become active again. Third, it is effective to persuade influential users preferentially when we assume that the chain reaction of coming back may occur.
... For this setting, it is advantageous to use multinomial likelihood estimation based on reweighted least squares in order to determine optimal model parameters θ˚ [19]. For a recent detailed exposition of this robust technique, we refer to [6]. For the fitted models, we report quantitative goodness-of-fit results in terms of the Hellinger distance ...
Conference Paper
Network statistics such as node degree distributions, average path lengths, diameters, or clustering coefficients are widely used to characterize networks. One statistic that has received considerable attention is the distance distribution - the number of pairs of nodes for each shortest-path distance - in undirected networks. It captures important properties of the network, reflecting on the dynamics of network spreading processes, and incorporates parameters such as node centrality and (effective) diameter. So far, however, no parameterization of the distance distribution is known that applies to a large class of networks. Here we develop such a closed-form distribution by applying maximum entropy arguments to derive a general, physically plausible model of path length histograms. Based on the model, we then establish the generalized Gamma as a three-parameter distribution for shortest-path distance in strongly connected, undirected networks. Extensive experiments corroborate our theoretical results, which thus provide new approaches to network analysis.
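The citing excerpt for this paper reports goodness of fit in terms of the Hellinger distance. As a rough sketch of what such a comparison between an empirical histogram and a fitted model might look like, the snippet below uses one common definition for discrete distributions, H(P, Q) = sqrt(0.5 * Σ (√p_i - √q_i)²), which lies in [0, 1]. The exact normalization used in the cited work is an assumption here, and the function name and example counts are hypothetical.

```python
import numpy as np

def hellinger_distance(p, q):
    """Hellinger distance between two non-negative histograms on the same bins."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()                    # normalize counts to probabilities
    q = q / q.sum()
    return float(np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)))

# Hypothetical path-length histogram and model prediction on the same bins.
empirical = [5, 40, 120, 60, 10]
model = [6, 38, 118, 63, 10]
print(hellinger_distance(empirical, model))
```

Values near 0 indicate that the fitted distribution closely matches the observed histogram; a value of 1 means the two have disjoint support.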
Article
Nearly two-thirds of the emissions that cause smog come from road transport. In April 2019, the European Parliament adopted new regulations on public procurement to encourage investment in clean buses (electric, hydrogen, or gas). Directive 2009/33/EC is to apply from the second half of 2021. The aim of this article is to simulate the number of zero-emission buses (ZEBs) in European Union (EU) member countries in two time horizons, 2025 and 2030, and to forecast the number of clean vehicles in specific time horizons, including before and after 2050. The research questions are as follows: (1) what will be the number of ZEBs in individual EU countries over the next few years; (2) which EU countries will reach, by 2030, a 95% share of ZEBs in their public transport bus fleets; and (3) in which year each EU country will reach a 95% share of zero-emission buses. The method used is a Bass model. The analyses demonstrate that, by 2050, only four EU members will be able to reach a 95% share of clean buses in their city bus transport fleets. It is likely that other countries may not achieve this even by 2050.
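To illustrate how a Bass model can answer a question such as "in which year is a 95% share reached", here is a minimal sketch based on the standard Bass cumulative adoption curve F(t) = (1 - e^(-(p+q)t)) / (1 + (q/p) e^(-(p+q)t)), with innovation coefficient p and imitation coefficient q. The function names and the coefficient values are hypothetical and are not taken from the article.

```python
import math

def bass_cumulative_share(t, p, q):
    """Fraction of the eventual market that has adopted by time t (t >= 0)."""
    e = math.exp(-(p + q) * t)
    return (1.0 - e) / (1.0 + (q / p) * e)

def time_to_share(target, p, q):
    """Time at which the Bass curve reaches `target`, obtained by inverting F(t)."""
    if not 0.0 < target < 1.0:
        raise ValueError("target must lie strictly between 0 and 1")
    return -math.log((1.0 - target) / (1.0 + (q / p) * target)) / (p + q)

# Hypothetical coefficients, for illustration only.
p, q = 0.03, 0.38
years = time_to_share(0.95, p, q)
print(f"95% share reached after about {years:.1f} years")
```

The inversion follows from solving F(t) = target for e^(-(p+q)t), so no numerical root finding is needed.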
Article
The subject of collective attention is central to an information age where millions of people are inundated with daily messages. It is thus of interest to understand how attention to novel items propagates and eventually fades among large populations. We have analyzed the dynamics of collective attention among one million users of an interactive website devoted to thousands of novel news stories. The observations can be described by a dynamical model characterized by a single novelty factor. Our measurements indicate that novelty within groups decays with a stretched-exponential law, suggesting the existence of a natural time scale over which attention fades.
Conference Paper
We investigate patterns of adoption of 175 social media services and Web businesses using data from Google Trends. For each service, we collect aggregated search frequencies from 45 countries as well as global averages. This results in more than 8,000 time series which we analyze using economic diffusion models. The models are found to provide accurate and statistically significant fits to the data and show that collective attention to social media grows and subsides in a highly regular manner. Regularities persist across regions, cultures, and topics and thus hint at general mechanisms that govern the adoption of Web-based services.
Chapter
The spread of new products in a population has been the subject of renewed interest over the past 15 years, stimulated in part by Robinson and Lakhani’s [1975] study on the pricing implications of the Bass diffusion model [1969]. Recent reviews have summarized the main articles in the area (for example, de Palma, Droesbeke, and Lefèvre [1991], Lilien, Kotler, and Moorthy [1992, pp. 461-80], and Mahajan, Muller, and Bass [1990]). However, as Mahajan, Muller, and Bass (p. 11) observed, most reported work has consisted of “refinements and extensions of the Bass diffusion model” without alteration of the basic premise of the diffusion curve, that is, sales as the result of the combination of both independent and imitative buying over time. Essentially, most work has considered adoption time as a deterministic event based upon the traits of consumers, the amount of information available to them, and their utility functions. Consequently, knowledge of those determinants implies a perfect prediction of adoption times.
Article
During a crisis, understanding the diffusion of information throughout a population will provide insights into how quickly the population will react to the information, which can help those who need to respond to the event. The advent of social media has resulted in this information spreading more quickly than ever before, and in qualitatively different ways, since people no longer need to be in face-to-face contact or even know each other to pass on information in a crisis situation. Social media also provides a wealth of data about this information diffusion since much of the communication happening within this platform is publicly viewable. This data trove provides researchers with unique information that can be examined and modeled in order to understand urgent diffusion. A robust model of urgent diffusion on social media would be useful to any stakeholders who are interested in responding to a crisis situation. In this paper, we present two models, grounded in social theory, that provide insight into urgent diffusion dynamics on social networks using agent-based modeling. We then explore data collected from Twitter during four major urgent diffusion events: (1) the capture of Osama Bin Laden, (2) Hurricane Irene, (3) Hurricane Sandy, and (4) Election Night 2012. We illustrate the diffusion of information during these events using network visualization techniques, showing that there appear to be differences. After that, we fit the agent-based models to the observed empirical data. The results show that the models fit qualitatively similarly, but the diffusion patterns of these events are indeed quite different from each other.
Article
Methods are given for using readily available nonlinear regression programs to produce maximum likelihood estimates in a rather natural way. Used as suggested, the common Gauss-Newton algorithm for nonlinear least squares becomes the Fisher scoring algorithm for maximum likelihood estimation. In some cases it is also the Newton-Raphson algorithm. The standard errors produced are the information theory standard errors up to a possible common multiple. This means that much of the auxiliary output produced by a nonlinear least squares analysis is directly applicable to a maximum likelihood analysis. Illustrative applications to Poisson, quantal response, multinomial, and log-linear models are given.
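The equivalence stated in this abstract can be made concrete for the Poisson case. The sketch below assumes a log-linear mean μ = exp(Xβ); a Gauss-Newton step on the residuals y - μ with weights 1/μ then coincides with the Fisher scoring update, and for this canonical link also with Newton-Raphson. The function name, starting point, and synthetic data are hypothetical choices for illustration, not the paper's implementation.

```python
import numpy as np

def poisson_fisher_scoring(X, y, n_iter=50, tol=1e-10):
    """Maximum likelihood fit of a log-linear Poisson model via Fisher scoring."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = np.exp(X @ beta)                # model mean at the current estimate
        score = X.T @ (y - mu)               # gradient of the log-likelihood
        info = X.T @ (X * mu[:, None])       # Fisher information, X^T diag(mu) X
        step = np.linalg.solve(info, score)  # same as the weighted Gauss-Newton step
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Synthetic, noise-free example: the fit should recover the generating coefficients.
x = np.linspace(-1.0, 1.0, 21)
X = np.column_stack([np.ones_like(x), x])
beta_true = np.array([0.5, 0.3])
y = np.exp(X @ beta_true)
beta_hat = poisson_fisher_scoring(X, y)
```

The Jacobian of μ with respect to β is diag(μ)X, so the weighted normal equations reduce to the `info` and `score` terms above, which is why a nonlinear least squares routine with these weights would take the same steps.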
Conference Paper
The Social Web makes visible the ebb and flow of popular interest in topics both newsworthy ("GulfSpill") and trivial ("Lolcat"). Understanding this emergent behavior is a fundamental goal for Social Web research. Key problems include discovering emergent topics from online text sources, modeling burst activity, and predicting the future trajectory of a given topic. Past work has addressed such problems individually for specific applications, but has lacked a generalizable framework for performing both classification and prediction of topic usage. Our approach is to model a topic as a temporally ordered sequence of derived feature states and capture characteristic changes in the topic trend. These sequences are drawn from a dynamic segmentation of frequency data based on change point analysis. We employ Partitioning Around Medoids clustering on these segments to produce signatures which highlight characteristic patterns of usage growth and decay. We demonstrate how this signature model can be used to define distinctive classes of topics in multiple online contexts, including tagging systems and web-based information retrieval. Additionally, we show how the model can predict the general trajectory of interest in a particular topic.
Conference Paper
Driven by outstanding success stories of Internet startups such as Facebook and The Huffington Post, recent studies have thoroughly described their growth. These highly visible online success stories, however, overshadow an untold number of similar ventures that fail. The study of website popularity is ultimately incomplete without general mechanisms that can describe both successes and failures. In this work we present six years of the daily number of users (DAU) of twenty-two membership-based websites - encompassing online social networks, grassroots movements, online forums, and membership-only Internet stores - well balanced between successes and failures. We then propose a combination of reaction-diffusion-decay processes whose resulting equations seem not only to describe the observed DAU time series well but also to provide a means to roughly predict their evolution. This model allows an approximate automatic DAU-based classification of websites into self-sustainable vs. unsustainable, and of whether startup growth is mostly driven by marketing and media campaigns or by word-of-mouth adoptions.