Characterizing and modeling the dynamics of online popularity.
ABSTRACT Online popularity has an enormous impact on opinions, culture, policy, and profits. We provide a quantitative, large scale, temporal analysis of the dynamics of online content popularity in two massive model systems: the Wikipedia and an entire country's Web space. We find that the dynamics of popularity are characterized by bursts, displaying characteristic features of critical systems such as fat-tailed distributions of magnitude and interevent time. We propose a minimal model combining the classic preferential popularity increase mechanism with the occurrence of random popularity shifts due to exogenous factors. The model recovers the critical features observed in the empirical analysis of the systems analyzed here, highlighting the key factors needed in the description of popularity dynamics.
Jacob Ratkiewicz,1 Santo Fortunato,2 Alessandro Flammini,1 Filippo Menczer,1,2 and Alessandro Vespignani1,2
1 School of Informatics and Computing, Indiana University, Bloomington, IN, USA
2 Complex Networks and Systems Lagrange Lab, Institute for Scientific Interchange, Torino, Italy
PACS numbers: 89.75.Hc, 89.20.-a
The dynamics of information and opinions have been
deeply affected by the existence of Web-mediated bro-
kers such as blogs, wikis, folksonomies, and search en-
gines, through which anyone can easily publish and pro-
mote content online. This “second age of information”
is driven by the economy of attention, first theorized by
Simon [1]. Sources receiving a lot of attention become
popular and have formidable power to impact opinions,
culture, and policy, as well as advertising profit. The
Web 2.0 and social media [2] not only modify traditional
communication processes with new types of phenomena,
but also generate a huge amount of time-stamped data,
making it possible for the first time to study the dynam-
ics of online popularity at the global system scale.
In this letter we focus on the dynamics of popularity
of Wikipedia topics and Web pages. As popularity prox-
ies we have chosen the traffic of a document, expressed
by the number of clicks to that page generated by a spe-
cific population of users, and the number of hyperlinks
pointing to a document. It is well documented that the
statistical properties of these variables in the Web are
very heterogeneous, with distributions characterized by
fat tails roughly following power-law behavior [3–6]. Such
distributions have been explained with models based on
the rich-get-richer mechanism [7–9], but their validation
from the point of view of dynamical behavior is problematic,
mainly because of the difficulty of gathering relevant
data. The data sets utilized here, however, contain temporal
information that makes it possible to observe the
growth in popularity of individual topics or pages, and
allows us to statistically characterize the microdynamics
by which online documents gather popularity.
Prior work on popularity dynamics has focused on news [10, 11], videos [12, 13] and music [14]. Here, we analyze three large scale data sets that we assembled about two information networks: the entire Wikipedia and the Chilean Web. Wikipedia is a large collaborative online encyclopedia with millions of articles and hundreds of thousands of registered contributors (en.wikipedia.org). By mining the full edit history of every article, we were able to reconstruct the entire Wikipedia structure at any past point in time. The raw data was available until March 2007 (download.wikimedia.org). Traffic data with hourly temporal resolution was obtained by cross-referencing with a separate data set originating from Wikipedia proxy server logs (dammit.lt/wikistats). Our third data source is a yearly sequence of crawls of the Chilean Web, made available by courtesy of the TodoCL search engine (www.todocl.com). This data consists of one complete crawl of the .cl top-level domain for each of the years 2002–2006. Basic statistics on each data set are shown in Table I. The representative graphs of these data sets have an approximately power-law distribution of indegree [15–17], like the Web graph at large.

Table I: Descriptions of the data sets constructed for our study. The two Wiki collections refer to indegree (1) and traffic (2) of Wikipedia topics, while the Chile collection refers to indegree of Chilean Web pages.

Wiki1    3,293,102    Jan 2001 – Mar 2007
Wiki2    3,490,740    Feb 2008 – Current
Chile    3,252,779    2001 – 2006
In order to gauge quantitatively the popularity of documents we consider the number of hyperlinks pointing to a page (indegree $k$ in the graph representation of the Web), and the traffic $s$ of the page, expressed by the number of clicks to it. Given either of these two popularity proxies $x_t$ at time $t$, we study its logarithmic derivative $[\Delta x/x]_t = (x_t - x_{t-1})/x_{t-1}$, which represents the relative variation of the measure in the time unit.
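As a minimal sketch of the definition above, the relative variation can be computed from any time series of a popularity proxy. The function name, the handling of zero denominators, and the sample counts are assumptions made for illustration; they are not taken from the data sets of this study.

```python
def relative_variation(x):
    """Return [dx/x]_t = (x_t - x_{t-1}) / x_{t-1} for t = 1 .. len(x) - 1.

    Steps where x_{t-1} is zero are returned as None, because the ratio
    is undefined there (an assumption of this sketch).
    """
    out = []
    for prev, curr in zip(x, x[1:]):
        out.append((curr - prev) / prev if prev != 0 else None)
    return out


# Hypothetical indegree counts for one page, one value per time unit.
indegree = [10, 12, 12, 30, 33]
print(relative_variation(indegree))
```

In this made-up series the third step more than doubles the indegree, so its relative variation exceeds 1.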
Fig. 1 shows the logarithmic derivative of the indegree vs time for an example page in the English Wikipedia. Despite a roughly exponential growth, the logarithmic derivative provides a signature by which different topics can be compared on the same scale. Almost all pages experience a burst in $\Delta x/x$ near the beginning of their life. Many pages receive little attention thereafter. While some pages maintain a nearly constant positive logarithmic derivative indicating an exponential growth, a number of pages continue to experience intermittent bursts in $\Delta x/x$ later in their life as in the example.

Figure 1: Time series of indegree $k$ and its logarithmic derivative $\Delta k/k$ for the Wikipedia topic page about the artist Jennifer Hudson (x-axis: months since inception). Topics typically experience a burst in their early life. Here we observe later fluctuations as well. Jennifer Hudson became popular through a television show leading to her first burst. Another occurred when she won an Academy Award; degree popularity doubled as many other pages linked to the article (inset). The size of each circle shows another popularity measure; it is proportional to the log-derivative of the number of times the article is revised. The article receives more edits when it attracts more links.

arXiv:1005.2704v2 [physics.soc-ph] 10 Oct 2010
The distribution of magnitude $\Delta x/x$ for the two popularity measures at representative time resolutions is illustrated in Figs. 2a–c. In all cases and at all granularities we observe heavy-tailed behavior. Such heavy-tailed burst magnitude distributions suggest dynamics lacking a characteristic scale. This is typical in a wide range of “critical” physical, economic, and social systems, such as avalanches, earthquakes, stock market crashes and human communication [19–23].

Further evidence comes from the study of the distribution of the length of inter-event intervals. For each document we record the time stamp of each event for which $\Delta x/x > 1$ and measure the inter-event times $\Delta t$. The probability distributions of $\Delta t$ in the different data sets (Fig. 2d) do not follow a Poissonian, as expected by queueing theory in traditional systems, but a power law with a finite size cutoff, as in Omori’s law of earthquakes [24] and other self-organized criticality phenomena [25].

Figure 2: (a, b, c) Distributions of popularity burst size. The gray areas highlight the events for which $\Delta k > k$ (hence $\Delta k/k > 1$). Maximum likelihood methods [18] in conjunction with the Kolmogorov-Smirnov (KS) statistic rule out lognormal fits. In each case the KS statistic suggests that the power-law curve is the better fit for the tail. For the distribution of $\Delta k/k$ in Wikipedia (a) the parameters are $\alpha = 2.6$ for the exponent of the power law, with a lower cutoff of 12 and a KS statistic of 0.005. For the Web (b) we find $\alpha = 1.9$ for the exponent of the power law, with a lower cutoff of 42 and a KS statistic of 0.007. For the distribution of $\Delta s/s$ (c) the parameters are $\alpha = 2.1$ with lower cutoff 90 and KS statistic 0.007. The slopes of the best fit power laws are shown as a guide to the eye. These behaviors are consistent across a wide range of temporal resolutions, as observed using time units from a day to a year. (d) Distribution of the time interval $\Delta t$ between consecutive indegree bursts of Wikipedia articles. We consider bursts such that $\Delta k/k > 1$ after January 1st, 2003. The three curves correspond to different time resolutions of months, weeks, and days, aligned on the x-axis for ease of visualization. As we increase the resolution the tail of the distribution extends further, an indication that the cutoff is a finite size effect. As a guide to the eye we show a power law $P(\Delta t) \sim (\Delta t)^{-\beta}$ with $\beta \approx 0.8$.

The clear evidence for the bursty behavior of online popularity dynamics calls for a stylized model able to explain the observed features in terms of the already acquired popularity of each page and the shifts in collective attention triggered by exogenous events.
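The burst criterion used for the inter-event analysis (events with a relative variation larger than 1) can be sketched as follows. This is an illustrative reading of the procedure, not the authors' code: the function names, the skipping of steps with a zero previous value, and the sample series are all assumptions.

```python
def burst_times(x, threshold=1.0):
    """Return the time indices t at which (x_t - x_{t-1}) / x_{t-1} > threshold.

    Steps with x_{t-1} == 0 are skipped (an assumption of this sketch).
    """
    return [
        t
        for t in range(1, len(x))
        if x[t - 1] > 0 and (x[t] - x[t - 1]) / x[t - 1] > threshold
    ]


def inter_event_times(times):
    """Return the gaps between consecutive burst times."""
    return [later - earlier for earlier, later in zip(times, times[1:])]


# Hypothetical popularity series sampled once per time unit.
series = [1, 3, 4, 4, 9, 10, 10, 25]
bursts = burst_times(series)
print(bursts, inter_event_times(bursts))
```

Pooling the gaps over all documents would give the empirical sample whose distribution is compared with a Poissonian or a power law.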
The rich-get-richer mechanism can be simulated with the classic linear preferential attachment model, in its directed version [26], or with the ranking model by Fortunato et al. [27]. In the latter, items are ranked according to their popularity $x$, and the probability that an existing item $i$ receives a unit (e.g., a click) is $P(i) \sim r_i^{-\delta}$, where $r_i$ is the rank of $i$ and $\delta > 0$ is a free parameter that tunes the power-law popularity distribution $P(x) \sim x^{-\gamma}$, such that $\gamma = 1 + 1/\delta$. Both preferential attachment and ranking models, however, fail to reproduce the long tails observed in the distributions of both $\Delta x/x$ and $\Delta t$ (Figs. 3a-b). Neither model accounts for the occurrence of exogenous factors that shift the attention of users and suddenly increase the popularity of specific topics because of events such as an actor winning a prize, political elections, etc. The minimal assumption in modeling exogenous perturbation consists in considering external stochastic events interfering with the basic rich-get-richer mechanism by suddenly changing the popularity of a topic. The simplest way to implement this mechanism consists in introducing in the ranking model a reranking probability $\rho$, such that at each iteration every item is moved, with probability $\rho$, to a new position toward the front of the list, chosen randomly with equal probability between 1 (the top position) and the item’s current rank $j$. We call this the rank-shift model.

Figure 3: (a) Comparison of the empirical burst size distributions with what would be expected from a preferential attachment (PA) process. Extensive numerical tests and maximum likelihood fitting [18] show that PA generates an approximately lognormal distribution (defined inside the gray area) inconsistent with the long tail observed in the empirical data. (b) The empirical inter-burst time distributions overlap when time is expressed in terms of the same unit (in the figure, the common time unit is one day; x-axis: $\Delta t$, days between bursts). The distribution generated by PA is much narrower and fits an exponential $P(\Delta t) \sim e^{-\Delta t/\tau}$ with $\tau = 0.8$. (c,d) The rank-shift model, despite its simplicity, reproduces quite well the distributions of both event size (c) and inter-event time (d).
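A minimal sketch of the rank-shift mechanism described above is given here. Several simplifications are assumptions of this sketch and are not specified in the text: the number of items is fixed (no growth), the ranking is stored as an explicit list that is not re-sorted by popularity, and one unit is assigned per iteration. The function name and the parameter values in the example are hypothetical.

```python
import random
from itertools import accumulate


def simulate_rank_shift(n_items, n_steps, delta, rho, seed=0):
    """Toy rank-shift simulation; returns (order, counts).

    order[r - 1] is the item currently holding rank r, and counts[i] is
    the number of units (e.g., clicks or links) received by item i.
    """
    rng = random.Random(seed)
    order = list(range(n_items))
    counts = [0] * n_items
    # Cumulative weights for choosing a rank r with probability ~ r^(-delta).
    cum_weights = list(accumulate(r ** -delta for r in range(1, n_items + 1)))
    positions = range(n_items)

    for _ in range(n_steps):
        # Rich-get-richer step: the item at the chosen rank receives one unit.
        pos = rng.choices(positions, cum_weights=cum_weights)[0]
        counts[order[pos]] += 1

        # Rank-shift step: each item is promoted with probability rho to a
        # position drawn uniformly between the top and its current rank.
        promoted = [item for item in order if rng.random() < rho]
        for item in promoted:
            j = order.index(item)
            new_j = rng.randint(0, j)  # inclusive bounds
            order.insert(new_j, order.pop(j))

    return order, counts


order, counts = simulate_rank_shift(n_items=100, n_steps=2000, delta=1.0, rho=1e-3)
print(order[:5], sum(counts))
```

Setting `rho=0` leaves the ranking untouched, which corresponds to the plain ranking step without shifts in this simplified setting.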
In Fig. 4a and 4b we show the indegree distribution of the rank-shift model for several values of $\rho$, with $\delta = 1$ (a) and $\delta = 1.5$ (b). The ranking model ($\rho = 0$) yields the slope $1 + 1/\delta$ indicated by the dashed line. The reranking probability introduces an exponential cutoff in the distribution, which becomes relevant for $\rho \approx 10^{-2}$ and larger (but we used $10^{-5} < \rho < 10^{-3}$ in our simulations).

Figure 4: Rank-shift model. (a), (b) Indegree distribution: $\delta = 1$ (a), $\delta = 1.5$ (b). (c) Comparison of the distribution of popularity bursts for the ranking model [27] (circles) and a stylized model built upon the simple assumptions of growth described in the text. (d) Comparison of the distribution of popularity bursts with the expected slope derived by assuming that nodes are reranked at most once. (Panel labels: $P(k) \sim k^{-(1+1/\delta)}$ in (a) and (b); "Stylized model without shifts" in (c); "Expected slope $1+1/\delta$" in (d).)

The distribution of $\Delta k/k$ shows two distinctive features, which, remarkably, are also found in the empirical distributions: a maximum located in the range 0.01–0.1 and a fat tail. Since the reranking probability is low, to understand the existence and the location of the maximum it is convenient to consider the model in the absence of the reranking mechanism. At a large time $T$, the expected value of the degree of the node with rank $r$ is proportional to $L r^{-\delta}$, where $L$ is the number of links present in the network at time $T$. Let $\Delta L$ be the number of links added during the interval $\Delta T$ at whose extremes the ratio $\Delta k/k$ is computed. Let $\Delta L \ll L$, an assumption verified in our calculations. Therefore, one can safely assume that in the period $\Delta T$ the addition of new links does not significantly affect the degree of nodes and their relative ranking. So one can regard the growth process as a multinomial process with probabilities $p(r) \propto r^{-\delta}$. The expected number $\Delta k$ of new links acquired by a node of rank $r$ is therefore $p(r)\Delta L$. The assumption of (near) stationarity also implies that $k(r) \sim p(r)L$. We therefore expect $\Delta k/k$ for a node to be distributed around $\Delta L/L$, regardless of the node. In Fig. 4c we compare the simulation of the ranking model with that of the multinomial process with $p(r) \propto r^{-\delta}$, using the parameters of the Wikipedia data set of January 2003, which represents an ideal tradeoff between the needs of having a sufficient number of bursts and a system size not too large for the model to run. The number of nodes/pages was $N \approx 1.3 \cdot 10^5$, the number of hyperlinks $L \approx 1.3 \cdot 10^6$ and $\Delta L \approx 8 \cdot 10^4$. Based on the above discussion we expect to observe a maximum in the distribution of $\Delta k/k$ located at $\Delta L/L \approx 0.06$. This is exactly where the maxima of the empirical distributions of popularity bursts are located (see Fig. 2a).
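The argument above says that, without reranking, the expected ratio $\Delta k/k$ does not depend on the rank. A short numerical check is sketched below using the values of $L$ and $\Delta L$ quoted in the text; the system size `n_nodes`, the value of `delta`, and the function names are assumptions chosen only to keep the example small.

```python
# Values quoted in the text for the Wikipedia data set of January 2003.
L_total = 1.3e6  # links present at time T
dL = 8e4         # links added during the interval Delta T

delta = 1.0      # ranking exponent (illustrative value)
n_nodes = 1000   # hypothetical small system, not the size used in the paper

norm = sum(r ** -delta for r in range(1, n_nodes + 1))


def p(r):
    """Probability that a new link goes to the node of rank r."""
    return r ** -delta / norm


def expected_ratio(r):
    """Expected Delta k / k for the node of rank r under stationarity."""
    k = p(r) * L_total  # expected degree at time T
    dk = p(r) * dL      # expected links gained during Delta T
    return dk / k


ratios = [expected_ratio(r) for r in (1, 10, 100, 1000)]
print(ratios)  # every entry is close to dL / L_total, about 0.06
```

The factor $p(r)$ cancels in the ratio, which is why the location of the maximum depends only on $\Delta L/L$.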
The ranking model cannot reproduce the fat tail observed in the real data. This is the reason why we introduced the reranking mechanism in our model. Here, it is the nodes that are suddenly promoted to a higher rank that are responsible for the high values of $\Delta k/k$ in the simulations. We consider a node that at time $T$ (the reference time at which we start measuring $\Delta k$) has rank $r_1$, and is immediately promoted to rank $r_2$, with $r_2$ chosen uniformly in $1 \le r_2 \le r_1$. Under the same assumption of stationarity that we made above, the expected degree of the node before promotion is $k(r_1) \approx L p(r_1) \propto r_1^{-\delta}$. Let us further assume that $\rho \ll 1$ and that $\Delta L \ll L$, which hold for the parameters used in our model. Since the reranking probability is small, we can safely assume that no node is reranked more than once during the observation time $\Delta T$. The expected number of links collected during the period $\Delta T$ is then $\Delta k = \Delta L\, p(r_2) \propto r_2^{-\delta}$. We expect therefore $\Delta k/k \propto (r_2/r_1)^{-\delta}$. It is straightforward to derive the distribution $P(\Delta k/k)$ for a generic node that is promoted at the beginning of $\Delta T$ by considering all pairs of values $r_1$, $r_2$ uniformly distributed in $1 \le r_2 \le r_1 \le N$. We find $P(\Delta k/k) \propto (\Delta k/k)^{-(1+1/\delta)}$. In Fig. 4d we highlight the tail of the distribution $P(\Delta k/k)$ as produced by the rank-shift model and our expectation for its slope: the match is surprisingly good.
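The predicted tail exponent $1 + 1/\delta$ can be checked numerically with a small Monte Carlo sketch. The assumptions here are ours: ranks are treated as continuous (a large-$N$ approximation), pairs are drawn uniformly over the region $r_2 \le r_1$, and the exponent is estimated with the standard continuous maximum-likelihood estimator for a power law with lower cutoff 1. The function name and sample size are hypothetical.

```python
import math
import random


def tail_exponent_after_promotion(delta, n_samples=100_000, seed=1):
    """Estimate the power-law exponent of y = (r2 / r1) ** (-delta).

    Pairs (r1, r2) are drawn uniformly over r2 <= r1, with ranks rescaled
    to (0, 1] and treated as continuous (large-N approximation).
    """
    rng = random.Random(seed)
    log_sum = 0.0
    for _ in range(n_samples):
        a = 1.0 - rng.random()  # uniform in (0, 1], avoids a zero rank
        b = 1.0 - rng.random()
        r1, r2 = max(a, b), min(a, b)
        # log y = -delta * log(r2 / r1), which is >= 0 because r2 <= r1.
        log_sum += -delta * math.log(r2 / r1)
    # Continuous maximum-likelihood estimate with lower cutoff y_min = 1.
    return 1.0 + n_samples / log_sum


for delta in (1.0, 1.5):
    estimate = tail_exponent_after_promotion(delta)
    print(delta, round(estimate, 3), round(1 + 1 / delta, 3))
```

Under these assumptions the ratio $r_2/r_1$ is uniform on $(0, 1]$, so the estimate should land near $1 + 1/\delta$ for each value of $\delta$.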
Simulations of the rank-shift model were performed us-
ing parameters matching those from the empirical data
(e.g., $N = 2.8 \times 10^5$ nodes for the Wikipedia in 2003);
the free model parameters were set to fit the empirical
distributions: $1 \le \delta \le 1.2$ and $10^{-5} \le \rho \le 10^{-3}$. For
ρ = 0 we recover the original ranking model, which yields
a lognormal distribution of ∆x/x, like the preferential
attachment (Fig. 3a). For ρ > 0 numerical simulations
show that the tail of the popularity burst magnitude dis-
tribution shifts from a lognormal to a power law. The
popularity distribution itself remains a power law; its ex-
ponent remains γ = 1 + 1/δ, but with an exponential
cutoff depending on ρ.
Such a parsimonious model is able to reproduce the
most relevant features observed in the empirical data.
Not only does rank-shift predict the distributions of both
popularity measures in our data sets, but also the long
tails of the distributions of indegree and traffic burst
size (Fig. 3c). Furthermore, it naturally accounts for
the maxima of the empirical distributions. Remarkably
the model captures the long-range distribution of inter-
burst intervals as well (Fig. 3d). The random rank-shift
mechanism is therefore able to capture the way in which
Web sites and pages gain and accumulate popularity: not
by a gradual proportional process, but by a sequence of
bursts that move them to the forefront of people’s attention.
Such bursts are different from those observed
in news-driven events, where attention fades rapidly
and overall popularity is lognormal-distributed. We also
found that smaller rank shifts are unable to capture the
critical burst behavior observed in the data [28].
At the present stage our model is mostly descriptive
and simply aims at reproducing at the coarsest level the
distributions that characterize popularity changes. Pos-
sible refinements may include the effect of search engines,
external events, news, word of mouth, social media, mar-
keting campaigns, or any combination of them.
The study of traffic patterns and models [6, 29, 30] may help
shed empirical light on this question.
We thank R. Baeza-Yates, C. Cattuto, B. Dravid,
V. Griffith, V. Loreto, M. Marchiori, M. Meiss. This
work was supported in part by a Lagrange Senior Fel-
lowship from the CRT Foundation to F.M., NSF grant
IIS-0513650 to A.V., and the Lilly Endowment Founda-
tion. S.F. gratefully acknowledges ICTeCollective, grant
238597 of the European Commission.
[1] H. A. Simon, in Computers, Communication, and the Public Interest, edited by M. Greenberger (The Johns Hopkins Press, Baltimore, 1971), pp. 37–72.
[2] D. Tapscott and A. D. Williams, Wikinomics: How Mass Collaboration Changes Everything (Portfolio Hardcover,
[3] R. Albert, H. Jeong, and A.-L. Barabási, Nature 401,
[4] A. Broder et al., Computer Networks 33, 309 (2000).
[5] M. Meiss, F. Menczer, and A. Vespignani, in Proc. 14th Intl. World Wide Web Conf. (2005), pp. 510–518.
[6] M. Meiss et al., in Proc. 1st Intl. Conf. on Web Search and Data Mining (WSDM) (2008), pp. 65–76.
[7] H. A. Simon, Biometrika 42, 425 (1955).
[8] D. de Solla Price, J. Amer. Soc. Inform. Sci. 27, 292
[9] A.-L. Barabási and R. Albert, Science 286, 509 (1999).
[10] F. Wu and B. A. Huberman, Proc. Natl. Acad. Sci. USA 104, 17599 (2007).
[11] Z. Dezso et al., Phys. Rev. E 73, 066132 (2006).
[12] G. Szabo and B. A. Huberman, arXiv:0811.0405v1 [cs.CY] (2008).
[13] R. Crane and D. Sornette, Proc. Natl. Acad. Sci. USA 105, 15649 (2008).
[14] M. J. Salganik, P. S. Dodds, and D. J. Watts, Science 311, 854 (2006).
[15] R. Baeza-Yates and B. Poblete, Comput. Networks 50,
[16] A. Capocci et al., Phys. Rev. E 74, 036116 (2006).
[17] V. Zlatic et al., Phys. Rev. E 74, 016115 (2006).
[18] A. Clauset, C. R. Shalizi, and M. E. J. Newman, SIAM Review 51, 661 (2009).
[19] A.-L. Barabási, Nature 435, 207 (2005).
[20] B. B. Mandelbrot, Fractals and Scaling in Finance: Discontinuity, Concentration, Risk, vol. E of Selecta
[21] M. H. R. Stanley et al., Nature 379, 804 (1996).
[22] B. Gutenberg and C. Richter, Bull. Seismol. Soc. Am. 34, 185 (1944).
[23] D. Rybski et al., Proc. Natl. Acad. Sci. USA 106, 12640
[24] F. Omori, J. Coll. Sci. Imp. Univ. Japan 7, 111 (1894).
[25] P. Bak, C. Tang and K. Wiesenfeld, Phys. Rev. Lett. 59,
[26] S. Dorogovtsev, J. Mendes, and A. Samukhin, Phys. Rev. Lett. 85, 4633 (2000).
[27] S. Fortunato, A. Flammini, and F. Menczer, Phys. Rev. Lett. 96, 218701 (2006).
[28] See EPAPS Document No. ... for alternative reranking
[29] B. Goncalves et al., in Late-breaking results at 2nd Intl. Conf. on Web Search and Data Mining (WSDM) (2009).
[30] M. Meiss et al., Proc. 21st ACM Conf. on Hypertext and Hypermedia (HT) (2010).