
Scale-free networks are rare

Springer Nature
Nature Communications

A central claim in modern network science is that real-world networks are typically "scale free," meaning that the fraction of nodes with degree $k$ follows a power law, decaying like $k^{-\alpha}$, often with $2 < \alpha < 3$. However, empirical evidence for this belief derives from a relatively small number of real-world networks. We test the universality of scale-free structure by applying state-of-the-art statistical tools to a large corpus of nearly 1000 network data sets drawn from social, biological, technological, and informational sources. We fit the power-law model to each degree distribution, test its statistical plausibility, and compare it via a likelihood ratio test to alternative, non-scale-free models, e.g., the log-normal. Across domains, we find that scale-free networks are rare, with only 4% exhibiting the strongest-possible evidence of scale-free structure and 52% exhibiting the weakest-possible evidence. Furthermore, evidence of scale-free structure is not uniformly distributed across sources: social networks are at best weakly scale free, while a handful of technological and biological networks can be called strongly scale free. These results undermine the universality of scale-free networks and reveal that real-world networks exhibit a rich structural diversity that will likely require new ideas and mechanisms to explain.
Anna D. Broido¹ & Aaron Clauset²,³,⁴
Real-world networks are often claimed to be scale free, meaning that the fraction of nodes with degree $k$ follows a power law $k^{-\alpha}$, a pattern with broad implications for the structure and dynamics of complex systems. However, the universality of scale-free networks remains controversial. Here, we organize different definitions of scale-free networks and construct a severe test of their empirical prevalence using state-of-the-art statistical tools applied to nearly 1000 social, biological, technological, transportation, and information networks. Across these networks, we find robust evidence that strongly scale-free structure is empirically rare, while for most networks, log-normal distributions fit the data as well or better than power laws. Furthermore, social networks are at best weakly scale free, while a handful of technological and biological networks appear strongly scale free. These findings highlight the structural diversity of real-world networks and the need for new theoretical explanations of these non-scale-free patterns.
https://doi.org/10.1038/s41467-019-08746-5 OPEN
¹Department of Applied Mathematics, University of Colorado, 526 UCB, Boulder, CO 80309, USA. ²Department of Computer Science, University of Colorado, 430 UCB, Boulder, CO 80309, USA. ³BioFrontiers Institute, University of Colorado, 596 UCB, Boulder, CO 80309, USA. ⁴Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM 87501, USA. Correspondence and requests for materials should be addressed to A.D.B. (email: anna.broido@colorado.edu) or to A.C. (email: aaron.clauset@colorado.edu)
NATURE COMMUNICATIONS | (2019) 10:1017 | https://doi.org/10.1038/s41467-019-08746-5 | www.nature.com/naturecommunications 1
Networks are a powerful way to represent and study the structure of complex systems. Examples today are plentiful and include social interactions among individuals, protein or gene interactions in biological organisms, communication between digital computers, and various transportation systems. Across scientific domains and classes of networks, it is common to encounter the claim that most or all real-world networks are scale free. The precise details of this claim vary [1–7], but generally a network is deemed scale free if the fraction of nodes with degree $k$ follows a power-law distribution $k^{-\alpha}$, where $\alpha > 1$. Some versions of this "scale-free hypothesis" have stronger requirements, e.g., requiring $2 < \alpha < 3$ or that node degrees evolve by the preferential attachment mechanism [8,9]. Other versions make them weaker, e.g., the power law need only hold in the upper tail [10], it can exhibit an exponential cutoff [11], or it is merely more plausible than a thin-tailed distribution like an exponential or normal [12].
The study and use of scale-free networks is widespread throughout network science [1,9,13–15]. Many studies investigate how the presence of scale-free structure shapes dynamics running over a network [6,7,14,16–22]. For example, under the Kuramoto oscillator model, a transition to global synchronization is well known to occur at a precise threshold $K_c$, whose value depends on the power-law parameter $\alpha$ of the degree distribution [23–27]. Scale-free networks are also widely used as a substrate for network-based numerical simulations and experiments, and the study of specific generating mechanisms for scale-free networks has been framed as providing a common basis for understanding all network assembly [3,8,9,28–32].
The universality of scale-free networks, however, remains controversial. Many studies find support for their ubiquity [4,5,16,17,33–35], while others challenge it on statistical or theoretical grounds [2,3,10,36–44]. This conflict in perspective has persisted because past work has typically relied upon small, often domain-specific data sets, less rigorous statistical methods, differing definitions of "scale-free" structure, and unclear standards of what counts as evidence for or against the scale-free hypothesis [4–7,16,17,45–48]. Additionally, few studies have performed statistically rigorous comparisons of fitted power-law distributions to alternative, non-scale-free distributions, e.g., the log-normal or stretched exponential, which can imitate a power-law form in realistic sample sizes [49]. These issues raise a natural question of the pervasiveness of strong empirical evidence for scale-free structures in real-world networks.
Central to this debate are the ambiguities induced by the diversity of uses of the term "scale-free network." The classic definition [1,21,35,37] states that a network is scale free if its degree distribution $\Pr(k)$ has a power law $k^{-\alpha}$ form. A power law is the only normalizable density function $f(k)$ for node degrees in a network that is invariant under rescaling, i.e., $f(ck) = g(c) f(k)$ for any constant $c$ [14], and thus "free" of a natural scale. For a network's degree distribution, being scale free implies a power-law pattern, and vice versa. Scale invariance can also refer to non-degree-based aspects of network structure, e.g., its subgraphs may be structurally self-similar [50,51], and sometimes these networks are also called scale free.
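The rescaling property can be checked numerically. The following minimal Python sketch compares the ratio $f(ck)/f(k)$ for a power law and for an exponential. The function names and the parameter values ($\alpha = 2.5$, $\lambda = 0.1$, $c = 2$) are illustrative assumptions, not quantities taken from the analysis.

```python
import math

def power_law(k, alpha=2.5):
    """Unnormalized power law f(k) = k^(-alpha); alpha is an illustrative value."""
    return k ** (-alpha)

def exponential(k, lam=0.1):
    """Unnormalized exponential f(k) = exp(-lam * k); lam is an illustrative value."""
    return math.exp(-lam * k)

def rescaling_ratios(f, c, ks):
    """Return f(c*k) / f(k) for each k; scale invariance means these do not depend on k."""
    return [f(c * k) / f(k) for k in ks]

ks = [1.0, 10.0, 100.0]
pl_ratios = rescaling_ratios(power_law, 2.0, ks)     # constant, equal to 2^(-alpha)
exp_ratios = rescaling_ratios(exponential, 2.0, ks)  # changes with k
print(pl_ratios)
print(exp_ratios)
```

For the power law every ratio equals $c^{-\alpha}$, which plays the role of $g(c)$, whereas the exponential's ratio shrinks as $k$ grows, revealing a characteristic scale $1/\lambda$.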
Scale-free networks are commonly discussed in the literature on network assembly mechanisms, particularly in the context of preferential attachment [1,28,29], in which the probability that a node gains a connection is proportional to its current degree $k$. Although preferential attachment is the most famous mechanism that produces scale-free networks, there exist other mechanisms that can also produce them [13–15]. And, some variations of preferential attachment do not produce power-law degree distributions [35], although those networks are still sometimes, confusingly, called scale free. Because the shape of a degree distribution imposes only modest constraints on overall network structure [52], it represents relatively weak evidence when trying to distinguish generating mechanisms [53–56], even when the distribution's functional form is clear. However, identifying that form from empirical data can be non-trivial, e.g., because log-normals often fit degree distributions as well or better than power laws [49,56,57].
Across this broad literature, the term "scale-free network" may mean a precise or approximate statistical pattern in the degree distribution, an emergent behavior in an asymptotic limit, or a property of all networks assembled in part or in whole by a particular family of mechanisms. This imprecision has contributed to the controversy around the scale-free hypothesis.
Here, we focus narrowly on the traditional degree-based definition of a scale-free network, which has the advantage of being directly testable using empirical data. Even within this scope, the definition is often modified by introducing auxiliary hypotheses [58]. For instance, the scale-free pattern may only hold for the largest degrees, implying $\Pr(k) \propto k^{-\alpha}$ for $k \geq k_{\min} > 1$, so that the power law governs the distribution's upper tail, while the lower tail or "body" follows some non-power-law pattern. In other settings, finite-size effects may suppress the frequency of nodes with degrees close to the underlying system's size, implying $\Pr(k) \propto k^{-\alpha} e^{-\lambda k}$, where $\lambda$ governs the transition between a power law and an exponential cutoff in the extreme upper tail. Or, extreme heterogeneity among degrees may be of primary interest, implying a restriction like $2 < \alpha < 3$, where the distribution's mean is finite while its variance is infinite, asymptotically. Finally, the power law may not even be meant to be a good model of the data itself, but rather simply a better model than some alternatives, e.g., an exponential or log-normal distribution, or just a generic stand-in for a "heavy-tailed" distribution, i.e., one that decays more slowly than an exponential.
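The statement that the mean is finite while the variance is infinite for $2 < \alpha < 3$ can be illustrated with truncated sums. The sketch below is a minimal illustration under assumed values ($\alpha = 2.5$, $k_{\min} = 1$, and three arbitrary truncation points); the helper name `truncated_moment` is hypothetical.

```python
def truncated_moment(alpha, order, k_max, k_min=1):
    """Moment of the given order for Pr(k) proportional to k^(-alpha) on k_min..k_max."""
    ks = range(k_min, k_max + 1)
    norm = sum(k ** -alpha for k in ks)
    return sum(k ** (order - alpha) for k in ks) / norm

alpha = 2.5                       # illustrative value inside 2 < alpha < 3
cutoffs = [10**3, 10**4, 10**5]   # arbitrary truncation points
means = [truncated_moment(alpha, 1, K) for K in cutoffs]
second = [truncated_moment(alpha, 2, K) for K in cutoffs]

for K, m1, m2 in zip(cutoffs, means, second):
    print(f"k_max={K:>6}  mean={m1:.4f}  second moment={m2:.1f}")
```

As the truncation point grows, the mean settles toward a fixed value while the second moment keeps growing roughly like $\sqrt{k_{\max}}$, which is the finite-sample signature of an asymptotically infinite variance.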
A consequence of these varied uses of the term scale-free network is that different researchers can use the same term to refer to slightly different concepts, and this ambiguity complicates efforts to empirically evaluate the basic hypothesis. Here, we construct a severe test [58] of the ubiquity of scale-free networks by applying state-of-the-art statistical methods to a large and diverse corpus of real-world networks. To explicitly cover the variations in how scale-free networks have been defined in the literature, we formalize a set of quantitative criteria that represent differing strengths and types of evidence for scale-free structure in a particular network. This set of criteria unifies the common variations, and their combinations, and allows us to assess different types and degrees of evidence of scale-free degree distributions. For each network data set in the corpus, we estimate the best-fitting power-law model, test its statistical plausibility, and compare it to alternative non-scale-free distributions. We analyze these results collectively, consider how the evidence for scale-free structure varies across domains, and quantitatively evaluate their robustness under several alternative criteria. We conclude with a forward-looking discussion of the empirical relevance of the scale-free hypothesis and offer suggestions for future research on the structure of networks.
Results
Preliminaries. A key component of our evaluation of the scale-free hypothesis is the use of a large and diverse corpus of real-world networks. This corpus is composed of 928 network data sets drawn from the Index of Complex Networks (ICON), a comprehensive online index of research-quality network data, spanning all fields of science [59]. It includes networks from biological, information, social, technological, and transportation domains that range in size from hundreds to millions of nodes (Fig. 1). These networks also exhibit a wide variety of graph properties, such as being simple, directed, weighted, multiplex, temporal, or bipartite.
The scale-free hypothesis is defined most clearly for simple graphs, which have only one degree distribution. More complicated networks, e.g., a directed, weighted, multiplex network, can have multiple degree distributions, which complicates testing whether it is scale free; we must determine which degree distributions count as evidence and which do not. We address this problem in two ways. First, we apply a sequence of graph transformations that convert a given network data set, defined as a network with multiple graph properties, into a set of simple graphs, each of which can be tested unambiguously for scale-free structure (Supplementary Figs. 1 and 2). In this process, we discard any resulting simple graph that is either too dense or too sparse, under pre-specified thresholds, to be plausibly scale free. (See Supplementary Note 1 for complete details.)
Then, for each simple graph associated with a network data set, we apply standard statistical methods [49] to identify the best-fitting power law in the degree distribution's upper tail, evaluate its statistical plausibility using a goodness-of-fit test, and compare it to four alternative distributions fitted to the same part of the upper tail using a likelihood-ratio test. The outputs of these fitting, testing, and comparison procedures for a given simple graph encode in a vector the statistical evidence for its scale-free structure. We then evaluate the set of these vectors for a given network data set under criteria that formalize the different definitions of a scale-free network.
For a given degree distribution, a key step in this process is the selection of a value $k_{\min}$, above which the degrees are most closely modeled by a scale-free distribution (see Methods). Hence, the fitting procedure truncates non-power-law behavior among low-degree nodes, enabling a clearer evaluation of potentially scale-free patterns in the upper tail. For technical reasons, all model tests and comparisons must then be made only on the degrees $k \geq k_{\min}$ in the upper tail [49]. Although our primary evaluation uses a normalized likelihood ratio test [60] that has been specifically shown valid for comparing the distributions considered here [49], we also present results based on using standard information criteria to compare distributional models [61].
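To make the tail-fitting step concrete, the following Python sketch estimates the scaling parameter for a fixed $k_{\min}$ using a commonly used approximation to the discrete maximum-likelihood estimate, $\hat{\alpha} \approx 1 + n \left[ \sum_i \ln \frac{k_i}{k_{\min} - 1/2} \right]^{-1}$, which is reasonable when $k_{\min}$ is not too small. It is only a sketch: it does not select $k_{\min}$, run the goodness-of-fit test, or reproduce the procedure described in Methods. The function names, the synthetic-data generator, and all parameter values are assumptions made for illustration.

```python
import math
import random

def fit_alpha(degrees, k_min):
    """Approximate discrete MLE of alpha on the tail k >= k_min.

    Returns (alpha_hat, n_tail). Uses alpha ~ 1 + n / sum(ln(k / (k_min - 1/2))).
    """
    tail = [k for k in degrees if k >= k_min]
    n_tail = len(tail)
    log_sum = sum(math.log(k / (k_min - 0.5)) for k in tail)
    return 1.0 + n_tail / log_sum, n_tail

def sample_degrees(n, alpha, k_min, seed=0):
    """Synthetic, approximately power-law integers (illustrative generator only).

    Draws from a continuous power law starting at k_min - 1/2 and rounds to
    the nearest integer, so every sampled value is >= k_min.
    """
    rng = random.Random(seed)
    x_min = k_min - 0.5
    return [int(x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) + 0.5)
            for _ in range(n)]

degrees = sample_degrees(20000, alpha=2.5, k_min=10)
alpha_hat, n_tail = fit_alpha(degrees, k_min=10)
print(f"alpha_hat = {alpha_hat:.3f} on n_tail = {n_tail} degrees")
```

In a full analysis, $k_{\min}$ itself would be chosen from the data rather than fixed in advance, and the returned tail size would feed into criteria such as the minimum number of nodes in the power-law region.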
This approach for evaluating evidence for scale-free structure has several advantages. It provides a systematic procedure applicable to any network data set, and treats every data set equivalently. It provides an evaluation of the scale-free hypothesis over a maximally broad variety of networks, which facilitates the characterization of their empirical ubiquity. And, it provides a means to assess different kinds of evidence for scale-free structure, by combining results from multiple degree distributions, if available in a network data set. The graph-simplification process or the particular evidence criteria used may also introduce biases into the results. We control for these possibilities by considering alternative criteria under multiple robustness analyses.
Definitions of a scale-free network. The different notions of evidence for scale-free structure found in the literature can be organized into a nearly nested set of categories (Fig. 2) and assessed by applying standard statistical tools to each graph associated with a network data set. Evidence for scale-free structure typically comes in two types: (i) a power-law distribution is not necessarily a good model of the degrees, but it is a relatively better model than alternatives, or (ii) a power law is itself a good model of the degrees.
The first type represents indirect evidence of scale-free structure, because the observed degree distribution is not itself required to be plausibly scale free, only that a scale-free pattern is more believable than some non-scale-free patterns. A network data set that exhibits this kind of evidence is placed into a category called
Super-Weak: For at least 50% of graphs, no alternative distribution is favored over the power law.
The second type represents direct evidence of scale-free structure, and the various modifications of a purely scale-free pattern can be organized in a set of nested categories that represent increasing levels of evidence:
Weakest: For at least 50% of graphs, a power-law distribution cannot be rejected ($p \geq 0.1$).
Weak: Requirements of Weakest, and the power-law region contains at least 50 nodes ($n_{\text{tail}} \geq 50$).
Fig. 1 Mean degree $\langle k \rangle$ as a function of the number of nodes $n$. The 928 network data sets in the corpus studied here vary broadly in size and density. For data sets with more than one degree sequence (see text), we plot the median of the corresponding set of mean degrees. (Axis labels: number of nodes $n$; mean degree $\langle k \rangle$; number of networks.)
Fig. 2 Taxonomy of scale-free network definitions. Super-Weak, meaning that a power law is not necessarily a statistically plausible model of a network's degree distribution but it is less implausible than alternatives; Weakest, meaning a degree distribution that is plausibly power-law distributed; Weak, adds a requirement that the distribution's scale-free portion cover at least 50 nodes; Strong, adds a requirement that $2 < \hat{\alpha} < 3$ and the Super-Weak constraints; and, Strongest, meaning that almost every associated simple graph can meet the Strong constraints. The Super-Weak overlaps with the Weak definitions and contains the Strong definitions as special cases. Networks that fail to meet any of these criteria are deemed Not Scale Free.
Strong: Requirements of Weak and Super-Weak, and $2 < \hat{\alpha} < 3$ for at least 50% of graphs.
Strongest: Requirements of Strong for at least 90% of graphs, and requirements of Super-Weak for at least 95% of graphs.
The progression from Weakest to Strongest categories represents the addition of more specific properties of the power-law degree distribution, all found in the literature on scale-free networks or distributions. We define a sixth category of networks that includes all networks that do not fall into any of the above categories:
Not Scale Free: Networks that are neither Super-Weak nor Weakest.
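The category definitions can be read as a small decision procedure over the simple graphs of one network data set. The sketch below is one possible reading, not the authors' implementation. It assumes that the per-graph requirements of a category must hold jointly on the same graph, that all alternatives are treated alike, and it uses a hypothetical dictionary layout for the per-graph results.

```python
def classify(graphs, p_min=0.1, tail_min=50):
    """Return the set of evidence categories met by one network data set.

    Each entry of `graphs` is a dict (hypothetical layout) with keys:
      "p"      goodness-of-fit p-value of the fitted power law
      "n_tail" number of nodes in the power-law region
      "alpha"  estimated scaling parameter
      "lrt"    dict mapping alternative name -> "PL", "ALT", or "NONE"
    """
    def no_alt_favored(g):
        return all(outcome != "ALT" for outcome in g["lrt"].values())

    def plausible(g):
        return g["p"] >= p_min

    def weak(g):
        return plausible(g) and g["n_tail"] >= tail_min

    def strong(g):  # assumption: conditions must hold jointly per graph
        return weak(g) and no_alt_favored(g) and 2 < g["alpha"] < 3

    def share(pred):
        return sum(1 for g in graphs if pred(g)) / len(graphs)

    labels = set()
    if share(no_alt_favored) >= 0.5:
        labels.add("Super-Weak")
    if share(plausible) >= 0.5:
        labels.add("Weakest")
    if share(weak) >= 0.5:
        labels.add("Weak")
    if share(strong) >= 0.5:
        labels.add("Strong")
    if share(strong) >= 0.9 and share(no_alt_favored) >= 0.95:
        labels.add("Strongest")
    if not labels & {"Super-Weak", "Weakest"}:
        labels.add("Not Scale Free")
    return labels

# Made-up per-graph results, for illustration only.
good = {"p": 0.5, "n_tail": 200, "alpha": 2.4,
        "lrt": {"exponential": "PL", "log-normal": "NONE"}}
bad = {"p": 0.01, "n_tail": 30, "alpha": 3.5,
       "lrt": {"exponential": "ALT", "log-normal": "ALT"}}
print(sorted(classify([good, bad])))
```

Returning a set rather than a single label reflects the fact that the categories are nearly nested: a data set that is Strong is also Weak, Weakest, and Super-Weak.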
This evaluation scheme is parameterized by the different fractions of simple graphs required by each evidence category. The particular thresholds given above are statistically motivated in order to control for false positives and overfitting, and to provide a consistent treatment across all networks (see Methods). A more permissive parameterization of the scheme is also considered as a robustness check. The above scheme favors finding evidence for scale-free structure in three ways: (i) graphs identified as being too dense or too sparse to be plausibly scale free are excluded from all analyses, (ii) the estimation procedure selects, by choosing $k_{\min}$, the subset of data in the upper tail that best fits a power law, and (iii) the comparisons to alternatives are performed only on the data selected by the power law.
Scaling parameters. Across the corpus, the distribution of median estimated scaling parameters $\hat{\alpha}$ is concentrated around a value of $\hat{\alpha} = 2$, but with a long right tail such that 32% of data sets exhibit $\hat{\alpha} \geq 3$ (Fig. 3). The range $\alpha \in (2, 3)$ is sometimes identified as including the most emblematic of scale-free networks [8,9], and we find that 39% of network data sets have median estimated parameters in this range. We also find that 34% of network data sets exhibit a median parameter $\hat{\alpha} < 2$, which is a relatively unusual value in the scale-free network literature.
Because every network produces some $\hat{\alpha}$, regardless of the statistical plausibility of the network being scale free, the shape of the distribution of $\hat{\alpha}$ is not necessarily evidence for or against the ubiquity of scale-free networks. It does, however, enable a check of whether the estimation methods are biased by network size $n$. Comparing $\hat{\alpha}$ and $n$, we find little evidence of strong systematic bias ($r^2 = 0.24$, $p = 1.82 \times 10^{-13}$; Supplementary Fig. 3).
Across the five categories of evidence for scale-free structure, the distribution of median $\hat{\alpha}$ parameters varies considerably (Fig. 3, insets). For networks that fall into the Super-Weak category, the distribution has a similar breadth as the overall distribution, with a long right tail and many networks with $\hat{\alpha} > 3$. Most of the networks with $\hat{\alpha} < 2$ are spatial networks, representing mycelial fungal or slime mold growth patterns [62]. However, few of these exhibit even Super-Weak or Weakest evidence of scale-free structure, indicating that they are not particularly plausible scale-free networks. Among the Weakest and Weak categories, the distribution of median $\hat{\alpha}$ remains broad, with a substantial fraction exhibiting $\hat{\alpha} > 3$. The Strong and Strongest categories require that $\hat{\alpha} \in (2, 3)$, and the few network data sets in these categories are somewhat concentrated near $\hat{\alpha} = 2$.
Alternative distributions. Independent of whether the power-law model is a statistically good model of a network's degree sequence, it may nevertheless be a better model than non-power-law alternatives.
Across the corpus, likelihood ratio tests find only modest support for the power-law distribution over four alternatives (Table 1). In fact, the exponential distribution, which exhibits a thin tail and relatively low variance, is favored over the power law (41%) more often than vice versa (33%). This outcome accords with the broad distribution of scaling parameters, as when $\alpha > 3$ (32% of data sets; Fig. 3), the degree distribution must have a relatively thin tail.
The log-normal is a broad distribution that can exhibit heavy tails, but which is nevertheless not scale free. Empirically, the log-normal is favored over the power law (48%) more than three times as often as the reverse (12%), and the comparison is inconclusive in a large number of cases (40%). In other words, the log-normal is at least as good a fit as the power law for the vast majority of degree distributions (88%), suggesting that many previously identified scale-free networks may in fact be log-normal networks.
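The three possible outcomes of a model comparison (power law favored, alternative favored, inconclusive) can be sketched with a generic normalized log-likelihood ratio test for non-nested models. The code below is an illustration under stated assumptions, not the authors' code: the inputs are per-observation log-likelihoods that are assumed to be already computed on the same tail data, the 0.1 significance threshold is an assumed default, and the nested comparison against the power law with cutoff would need a different treatment.

```python
import math

def normalized_lrt(loglik_pl, loglik_alt, threshold=0.1):
    """Normalized log-likelihood ratio test between two non-nested models.

    loglik_pl, loglik_alt: per-observation log-likelihoods under the power
    law and the alternative, for the same observations.
    Returns (R, p, outcome), where R > 0 points toward the power law.
    """
    diffs = [a - b for a, b in zip(loglik_pl, loglik_alt)]
    n = len(diffs)
    R = sum(diffs)
    mean = R / n
    var = sum((d - mean) ** 2 for d in diffs) / n
    if var == 0.0:
        # Degenerate case: every observation gives the same difference.
        p = 1.0 if R == 0.0 else 0.0
    else:
        # Two-sided p-value for the sign of R under a normal approximation.
        p = math.erfc(abs(R) / math.sqrt(2.0 * n * var))
    if p >= threshold:
        outcome = "inconclusive"
    elif R > 0:
        outcome = "power law"
    else:
        outcome = "alternative"
    return R, p, outcome

# Made-up log-likelihood values, for illustration only.
pl = [1.0, 1.2, 0.8, 1.1, 0.9] * 20
alt = [0.0] * 100
print(normalized_lrt(pl, alt)[2])
```

The sign of the summed ratio alone is not enough to favor a model; the normalization by the sample variance is what allows a comparison to be declared inconclusive, as happens for 40% of the log-normal comparisons above.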
Fig. 3 Distribution of $\hat{\alpha}$ by scale-free evidence category. For networks with more than one degree sequence, the median estimate is used, and for visual clarity the 8% of networks with a median $\hat{\alpha} \geq 7$ are omitted. (Main panel: all data sets; insets: Super-Weak, Weakest, Weak, Strong, and Strongest scale-free. Horizontal axis: power-law parameter $\hat{\alpha}$; vertical axis: number of data sets.)
The Weibull or stretched exponential distribution can produce thin or heavy tails, and is a generalization of the exponential distribution. Compared to the power law, the Weibull is more often the better statistical model (47%) than vice versa (33%). Finally, the power-law distribution with an exponential cutoff requires special consideration, as it contains the pure power-law model as a special case. As a result, the likelihood of the power law can never exceed that of the cutoff model, and the interesting outcome is the degree to which the test is inconclusive between the two. In this case, a majority of networks (56%) favor the power law with cutoff model, indicating that finite-size effects may be common.
The above findings are corroborated by replacing the likelihood ratio test with information criteria to perform the model comparisons, which yield qualitatively similar conclusions (Supplementary Table II).
Assessing the scale-free hypothesis. Given the results of fitting, testing, and comparing the power-law distribution across networks, we now classify each according to the six categories described above.
Across the corpus, fully 49% of networks fall into the Not Scale Free category (Fig. 4). Slightly less than half (46%) fall into the Super-Weak category, in which a scale-free pattern among the degrees is not necessarily statistically plausible itself, but remains no less plausible than alternative distributions. The Weakest and Weak categories represent networks in which the power-law distribution is at least a statistically plausible model of the networks' degree distributions. In the Weak case, this power-law scaling covers at least 50 nodes, a relatively modest requirement. These two categories account for only 29 and 19% of networks, respectively, indicating that it is uncommon for a network to exhibit direct statistical evidence of scale-free degree distributions. Finally, only 10 and 4% of network data sets can be classified as belonging to the Strong or Strongest categories, respectively, in which the power-law distribution is not only statistically plausible, but the exponent falls within the special $\alpha \in (2, 3)$ range and the power law is a better model of the degrees than alternatives. Taken together, these results indicate that genuinely scale-free networks are far less common than suggested by the literature, and that scale-free structure is not an empirically universal pattern.
The balance of evidence for or against scale-free structure does vary by network domain (Fig. 5). These variations provide a means to check the robustness of our results, and can inform future efforts to develop new structural mechanisms. We focus our domain-specific analysis on networks from biological, social, and technological sources (91% of the corpus).
Among biological networks, a majority lack any direct or indirect evidence of scale-free structure (63% Not Scale Free; Fig. 5a), in agreement with past work on smaller corpora of biological networks [42]. The aforementioned fungal networks represent a large share of these Not Scale Free networks, but this group also includes some protein interaction networks and some food webs. Among the remaining networks, one third exhibit only indirect evidence (33% Super-Weak), and a modest fraction exhibit the weakest form of direct evidence (19% Weakest). This latter group includes cat and rat brain connectomes. Compared to the corpus as a whole, biological networks are slightly more likely to exhibit the strongest level of direct evidence of scale-free structure (6% Strongest), and these are primarily metabolic networks.
We note that the fungal networks comprise 28% of the corpus and our analysis places 100% of them in the Not Scale Free category. Given their spatially embedded nature, it could be argued that these networks were unlikely to be scale free in the first place. Because we know a posteriori that these networks are Not Scale Free, omitting them will necessarily increase the fraction of networks in at least some of the other categories. We find that these increases occur primarily in the weaker evidence categories: 5% of non-fungal networks fall into the Strongest category (up from 4%), 13% in Strong (from 10%), 27% in Weak (from 19%), 40% in Weakest (from 29%), and 65% Super-Weak (from 46%). Hence, the qualitative conclusions from our primary analysis are robust to the inclusion of this particular subset of networks.
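The renormalized percentages can be sanity-checked from the category counts reported in Fig. 4. Because the exact number of fungal networks is not stated here (only that they make up 28% of the 928 data sets and all fall in Not Scale Free), the sketch below searches for fungal counts that are consistent with every reported figure. It assumes round-to-nearest percentages; the variable names are illustrative.

```python
# Category counts for all 928 data sets, as reported in Fig. 4.
counts = {"Strongest": 36, "Strong": 89, "Weak": 177,
          "Weakest": 268, "Super-Weak": 431}
# Percentages reported in the text once fungal networks are omitted.
reported = {"Strongest": 5, "Strong": 13, "Weak": 27,
            "Weakest": 40, "Super-Weak": 65}
total = 928

def pct(part, whole):
    """Percentage rounded to the nearest integer (an assumed rounding convention)."""
    return round(100 * part / whole)

# Fungal counts that round to 28% of the corpus and reproduce every
# reported non-fungal percentage. Fungal networks are all Not Scale Free,
# so the other category counts are unchanged and only the denominator shrinks.
consistent = [
    fungal for fungal in range(total)
    if pct(fungal, total) == 28
    and all(pct(counts[c], total - fungal) == reported[c] for c in counts)
]
print(consistent)
```

Under these assumptions the search returns a narrow band of fungal counts (261 to 264), so the renormalized percentages are mutually consistent with the Fig. 4 counts and the stated 28% share.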
In contrast, social networks present a different picture. Like the corpus overall, half of social networks lack any direct or indirect evidence of scale-free structure (50% Not Scale Free; Fig. 5b), while indirect evidence is slightly less prevalent (41% Super-Weak). The former group includes the Facebook100 online social networks, and the latter includes many Norwegian board of director networks.
However, among the categories representing direct evidence of scale-free structure, more networks fall into the Weakest (48%) and Weak (31%) categories, but not a single network falls into the Strong or Strongest categories. Hence social networks are at best only weakly scale free, and even in cases where the power-law distribution is plausible, non-scale-free distributions are often a better description of the data. The social networks exhibiting weak evidence include many scientific collaboration networks and roughly half of the Norwegian board of director networks.
Technological networks exhibit the smallest share of networks for which there is no evidence, direct or indirect, of scale-free structure (8% Not Scale Free; Fig. 5c), and the largest share exhibiting indirect evidence (90% Super-Weak). The former group includes some digital circuit networks and various water distribution networks. Among the categories representing direct evidence, less than half exhibit the weakest form of direct evidence (42% Weakest). This group includes roughly half of CAIDA's networks of autonomous systems, several digital circuit networks, and several peer-to-peer networks. In contrast to biological or social networks, however, technological networks exhibit a modest fraction with strong direct evidence of scale-free structure (28% Strong). Networks in this category include the other half of the CAIDA graphs. But, almost none of the technological networks exhibit the strongest level of direct evidence (1% Strongest).
Fig. 4 Proportion of networks by scale-free evidence category. Bars separate the Super-Weak category from the nested definitions, and from the Not Scale Free category, defined as networks that are neither Weakest nor Super-Weak. Values shown for all data sets, as count (proportion): Not Scale Free 456 (0.49); Super-Weak 431 (0.46); Weakest 268 (0.29); Weak 177 (0.19); Strong 89 (0.10); Strongest 36 (0.04).
Table 1 Comparison of scale-free and alternative distributions
(The last three columns give the test outcome.)
Alternative           | $p(x) \propto f(x)$                                | $M_{\text{PL}}$ | Inconclusive | $M_{\text{Alt}}$
Exponential           | $e^{-\lambda x}$                                   | 33%             | 26%          | 41%
Log-normal            | $\frac{1}{x} e^{-(\log x - \mu)^2 / (2\sigma^2)}$  | 12%             | 40%          | 48%
Weibull               | $e^{-(x/b)^a}$                                     | 33%             | 20%          | 47%
Power law with cutoff | $x^{-\alpha} e^{-\lambda x}$                       | -               | 44%          | 56%
The percentage of network data sets that favor the power-law model $M_{\text{PL}}$, alternative model $M_{\text{Alt}}$, or neither, under a likelihood-ratio test, along with the form of the alternative distribution $f(x)$
Transportation networks do not represent a large enough fraction of the corpus for a similar statistical analysis, but do offer some useful insights for future work. Most of these networks exhibit little evidence of scale-free structure. For example, all three airport networks and 46 of 49 road networks fall into the Not Scale Free category, while two of the remaining three road networks fall into the Weak category and one into Super-Weak. All of the subway networks fall into the Super-Weak category, and nearly all fall into the Weakest category. These results suggest that scale-free networks may represent poor models of many transportation systems.
Robustness analysis. In order to assess the dependence of these results on the evaluation scheme itself, we conduct a series of robustness tests.
Specifically, we test whether the above results hold qualitatively when (i) we consider only network data sets that are naturally simple (unweighted, undirected, monoplex, and no multi-edges); (ii) we remove the power-law with cutoff from the set of alternative distributions; (iii) we lower the percentage thresholds for all categories to allow admission if any one constituent simple graph satisfies the requirements; and (iv) we analyze the scaling behavior of the degree distribution's first and second moment ratio. Details for each of these tests, and two others, are given in Supplementary Note 5. We also test whether the evaluation scheme correctly classifies four different types of synthetic networks with known structure, both scale free and non-scale free. Details and results for these tests are given in Supplementary Note 6.
The first test evaluates whether the extension of the scale-free
hypothesis to non-simple networks and the corresponding
graph-simplification procedure bias the results. The second evaluates
whether the presence of finite-size effects drives the lack of
evidence for scale-free distributions. Applied to the corpus, each
test produces results qualitatively similar to those of the primary
evaluation scheme (see Supplementary Note 5 and Supplementary
Fig. 4), indicating that the lack of empirical evidence for
scale-free networks is not driven by these particular choices in the
evaluation scheme itself.
The third considers a "most permissive" parameterization,
which evaluates the impact of our requirements that a minimum
percentage of degree sequences satisfy the constraints of a
category. Under this test, we specifically examine how the
evidence changes if we instead require that only one degree
sequence satisfies the given requirements. That is, this test lowers
the threshold for each category to be maximally permissive: if
scale-free structure exists in any associated degree sequence, the
network data set is counted as falling into the corresponding
category.
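The difference between a percentage threshold and the most permissive rule can be sketched in a few lines of Python. This is only an illustration: the function name `meets_requirement` and its arguments are hypothetical, and the 50% default stands in for whichever category threshold applies.

```python
def meets_requirement(passes, min_fraction=0.5, most_permissive=False):
    """Decide whether a network data set satisfies one category requirement.

    passes          -- one boolean per associated simple graph (hypothetical input)
    min_fraction    -- assumed minimum share of graphs that must pass
    most_permissive -- if True, a single passing graph is enough
    """
    if not passes:
        return False  # no analyzable graphs, so no evidence either way
    if most_permissive:
        return any(passes)
    return sum(passes) / len(passes) >= min_fraction


# One passing graph out of four fails a 50% threshold but passes the permissive rule.
flags = [True, False, False, False]
print(meets_requirement(flags), meets_requirement(flags, most_permissive=True))
```

Under the permissive rule a data set can only move into a category, never out of one, which is why the category percentages cannot decrease under this test.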
Under this modification, the Strong and Strongest categories
become equivalent, and 18% of network data sets fall into this
combined category (Fig. 6). We note that under this modified
evaluation, synthetic directed networks assembled by preferential
attachment should and do fall into the Strongest category of
evidence. The most permissive category, Super-Weak, changes only
slightly, from 46 to 49%. And finally, performing this
test on only the directed networks within the corpus produces
similar results (see Supplementary Note 5 and Supplementary
Fig. 5). These tests demonstrate that the percentage requirements
used in the category definitions of the primary evaluation scheme
are not overly restrictive, and our qualitative conclusions are
robust to variations in the precise thresholds the evaluation uses.
The fourth test provides a model-independent evaluation of a
key prediction of the scale-free hypothesis. Scale-free distributions
are mathematically unusual because only the moments $\langle k^m \rangle$
for $m < \alpha - 1$ are finite, and all higher moments diverge [14],
asymptotically. Hence, in the most widely analyzed range of
$\alpha \in (2,3)$ for scale-free networks, the moment ratio
$\langle k^2 \rangle / \langle k \rangle^2$
diverges as the network size $n$ increases. This behavior underpins
the practical relevance of many theoretical analyses of scale-free
networks. Of course, diverging moments cannot be identified
from finite-sized networks, and no real-world network can
validate this prediction of the scale-free hypothesis. However, if
most networks are scale free in this way, the scaling behavior of
their moment ratios should exhibit a strongly diverging trend.
Across the corpus as a whole, we find little evidence of a general
pattern of diverging moment ratios (Fig. 7). Instead, we find
enormous variation in ratios across networks, domains, and
scales, such that networks with $10^2 \le n \le 10^3$ often have larger
ratios than networks several orders of magnitude larger, and even
those moments that do appear to increase with $n$ do not increase
fast enough to be consistent with scale-free behavior (Supplementary
Fig. 8). We leave a more detailed investigation of these
variations for future work.
Fig. 5 Proportion of networks by scale-free evidence category and by
domain. a Biological networks, b social networks, and c technological
networks. Tickers show change in percent from the pattern in all of the data
sets. [Bar-chart panels omitted: each panel reports the count and proportion
of networks in the Not Scale Free, Super-Weak, Weakest, Weak, Strong, and
Strongest categories on a horizontal axis from 0.0 to 1.0.]
ARTICLE NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-019-08746-5
6 NATURE COMMUNICATIONS | (2019) 10:1017 | https://doi.org/10.1038/s41467-019-08746-5 | www.nature.com/naturecommunications
Content courtesy of Springer Nature, terms of use apply. Rights reserved
Overall, the results of these tests corroborate our primary
findings of relatively little empirical evidence for the ubiquity of
scale-free networks, and suggest that empirical degree distributions
exhibit a richer variety of patterns, many of which are lower
variance, than predicted by the scale-free hypothesis.
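The quantity examined in the fourth robustness test can be computed directly from a degree sequence. The following is a minimal Python sketch, with the function name `moment_ratio` chosen here for illustration; it is not taken from the authors' code.

```python
def moment_ratio(degrees):
    """Return <k^2> / <k>^2 for a degree sequence (a list of non-negative ints)."""
    n = len(degrees)
    if n == 0:
        raise ValueError("empty degree sequence")
    mean = sum(degrees) / n
    if mean == 0:
        raise ValueError("mean degree is zero, so the ratio is undefined")
    second_moment = sum(k * k for k in degrees) / n
    return second_moment / mean ** 2


# A regular graph has ratio 1; a single hub raises the ratio.
print(moment_ratio([2, 2, 2, 2]))  # 1.0
print(moment_ratio([1, 1, 1, 5]))  # 1.75
```

Plotting this ratio against network size $n$ on logarithmic axes, as in Fig. 7, is one way to look for the diverging trend that the scale-free hypothesis predicts.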
Discussion
By evaluating the degree distributions of nearly 1000 real-world
networks from a wide range of scientific domains, we find that
scale-free networks are not ubiquitous. Fewer than 36 networks
(4%) exhibit the strongest level of evidence for scale-free structure,
in which every degree distribution associated with a network
is convincingly scale free. Only 29% of networks exhibit the
weakest form, in which a power law is simply a statistically
plausible model of some portion of the degree distribution's
upper tail. And, for 46% of networks, the power-law form is not
necessarily itself a good model of the degree distribution, but is
simply a statistically better model than alternatives. Nearly half
(49%) of networks show no evidence, direct or indirect, of
scale-free structure, and in 88% of networks, a log-normal fits the
degree distribution as well as or better than a power law. These
results demonstrate that scale-free networks are not a ubiquitous
phenomenon, and suggest that their use as a starting point for
modeling and analyzing the structure of real networks is not
empirically well grounded.
Across different scientific domains, the evidence for scale-free
structure is generally weak, but varies somewhat in interesting
ways. These differences provide hints as to where scale-free
structure may genuinely occur. For instance, our evidence indicates
that scale-free patterns are more likely to be found in certain
kinds of biological and technological networks. These findings
corroborate theoretical work on domain-specific mechanisms for
generating scale-free structure, e.g., in biological networks via the
well-established duplication-mutation model for molecular
networks [3,30,54] or in certain kinds of technological networks via
highly optimized tolerance [13,63].
In contrast, we find that social networks are at best weakly scale
free, and although a power-law distribution can be a statistically
plausible model for these networks, it is often not a better model
than a non-scale-free distribution. Class imbalance in the corpus
precludes broad conclusions about the prevalence of scale-free
structure in information or transportation networks. However,
the few of these in the corpus provide little indication that they
would exhibit strongly different structural patterns than the better
represented domains.
The variation of evidence across social, biological, and technological
domains (Fig. 5) is consistent with a general conclusion
that no single universal mechanism explains the wide diversity of
degree structures found in real-world networks. The failure to
find broad evidence for scale-free patterns in the degree distributions
of networks indicates that much remains unknown
about how network structure varies across different domains [64]
and what kinds of structural patterns are common across them.
We look forward to new investigations of statistical differences
and commonalities, which seem likely to generate new insights
about the structure of complex systems.
The statistical evaluation here considers only the degree distributions
of networks, and hence says relatively little about other
structural patterns or the underlying processes that govern the
form of any particular network. However, the finding that
scale-free networks are empirically uncommon does imply a generally
limited role for any mechanism that necessarily produces power-law
degree distributions [9,15,32,56], especially in domains where the
evidence for strongly scale-free networks is weak, e.g., social
networks. The mechanisms that govern the shape of a particular
network generally cannot be determined from a static network's
degree distribution alone, as it is both a weak constraint on
network structure [52] and a weak discriminator between
mechanisms [54]. For some networks, there is strong evidence that
mechanisms like preferential attachment apply, e.g., scientific
citation networks [28,29,55,56]. However, the results described here
imply that if such mechanisms apply more broadly, they are
heavily modified or even dominated by other, perhaps
domain-specific mechanisms. A claim that some network is scale free
should thus be established using a severe statistical test [58] that goes
beyond static degree distributions.
Fig. 7 Moment ratio scaling. For 3662 degree sequences, the empirical ratio
of the second to first moments $\langle k^2 \rangle / \langle k \rangle^2$ as a function of network size $n$,
showing substantial variation across networks and domains, little evidence
of the divergence pattern expected for scale-free distributions, and perhaps
a roughly sublinear scaling relationship (smoothed mean via exponential
kernel, with smoothed standard deviations). [Scatter plot omitted: horizontal
axis, number of nodes $n$ from $10^1$ to $10^6$; vertical axis,
$\langle k^2 \rangle / \langle k \rangle^2$ from $10^{-1}$ to $10^3$; points are marked as biological,
informational, social, technological, or transportation data sets, with a
smoothed mean curve.]
Fig. 6 Proportions of networks in each scale-free evidence category with
removed degree percentage requirements. [Bar chart omitted: a single panel
for all data sets, with bars for the Not Scale Free, Super-Weak, Weakest,
Weak, Strong, and Strongest categories on a horizontal axis from 0.0 to 1.0.]
In theoretical network science, assuming a power law for a
random graph's degree distribution can simplify mathematical
analyses, and a power law can be a useful conceptual model for
building intuition about the impact of extreme degree heterogeneity.
And, for some types of calculations, e.g., the location of
the epidemic threshold, scale-free networks can be useful models,
even when real-world degree distributions are simply heavy
tailed, rather than scale free [65-67]. On the other hand, if a
mathematical result depends strongly on the asymptotic behavior of a
scale-free degree distribution, the result's practical relevance will
necessarily depend on the empirical prevalence of scale-free
structures, which we show to be uncommon or rare, depending
on the kind of scale-free structure of interest. Mathematical
results based on extreme degree heterogeneity may, in fact, have
more narrow applicability than previously believed, given the lack
of evidence that empirical moment ratios diverge as quickly as
those results typically assume (Fig. 7 and Supplementary Fig. 8).
The structural diversity of real-world networks uncovered here
presents both a puzzle and an opportunity. The strong focus in
the scientific literature on explaining and exploiting scale-free
patterns has meant relatively less is known about mechanisms
that produce non-scale-free structural patterns, e.g., those with
degree distributions better fitted by a log-normal. Two important
directions of future work will be the development and validation
of novel mechanisms for generating more realistic degree structure
in networks, and novel statistical techniques for identifying
or untangling them given empirical data. Similarly, theoretical
results concerning the behavior of dynamical processes running
on top of networks, including spreading processes like
epidemiological models, social influence models, or models of
synchronization, may need to be reassessed in light of the genuine
structural diversity of real-world networks.
The statistical methods and evidence categories developed and
used in our evaluation of the scale-free hypothesis provide a
quantitatively rigorous means by which to assess the degree to
which some network exhibits scale-free structure. Their application
to a novel network data set should enable future researchers
to determine whether assuming scale-free structure is empirically
justified.
Furthermore, large corpora of real-world networks, like the one
used here, represent a powerful, data-driven resource by which to
investigate the structural variability of real-world networks [64].
Such corpora could be used to evaluate the empirical status of
many other broad claims in the networks literature, including the
tendency of social networks to exhibit high clustering coefficients
and positive degree assortativity [68], the prevalence of the
small-world phenomenon [69], the prevalence of "rich clubs" in networks [70],
the ubiquity of community [71] or hierarchical structure [72], and the
existence of "super-families" of networks [73]. We look forward to
these investigations and the new insights they will bring to our
understanding of the structure and function of networks.
Methods
Network data sets. Network data sets were obtained through ICON [59], an
online index of real-world network data sets from all domains of science. The
composition of the corpus is roughly half biological networks, a third social or
technological networks, and a sixth information or transportation networks
(Supplementary Table 1). The 928 networks included span five orders of magnitude
in size, are generally sparse with a mean degree of $\langle k \rangle \approx 3$ (Fig. 1), and possess
a range of graph properties, e.g., simple, directed, weighted, multiplex, temporal, or
bipartite.
Prior to analysis, each network data set is transformed into one or more graphs,
whose degree sequences can be unambiguously tested for a scale-free pattern (for
example, Supplementary Fig. 1). For each non-simple graph property of a network,
a specific transformation is applied that increases the number of graphs in the data
set while removing the given graph property. Full details of this process are given in
Supplementary Note 1 and Supplementary Fig. 2. Complicated network data sets
can produce a combinatoric number of simple graphs under this process. Treating
every simplified degree sequence independently could lead to skewed results, e.g., if
a few non-scale-free data sets account for a large fraction of the total extracted
simple graphs. To avoid this bias, results are reported at the level of network data
sets. Additionally, we require that simplified graphs are neither too sparse nor too
dense to be potentially scale free and thus retain for analysis only simplified graphs
with mean degree $2 < \langle k \rangle < \sqrt{n}$.
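The density requirement above amounts to a simple filter on each simplified graph's degree sequence. A minimal Python sketch follows; the function name `density_class` and its string labels are hypothetical and serve only to illustrate the rule $2 < \langle k \rangle < \sqrt{n}$.

```python
import math


def density_class(degrees):
    """Classify a simple graph's degree sequence by the mean-degree filter.

    Returns "too sparse" if <k> <= 2, "too dense" if <k> >= sqrt(n),
    and "ok" otherwise (labels are illustrative, not from the source).
    """
    n = len(degrees)
    if n == 0:
        return "too sparse"
    mean_degree = sum(degrees) / n
    if mean_degree <= 2:
        return "too sparse"
    if mean_degree >= math.sqrt(n):
        return "too dense"
    return "ok"


# With n = 100 nodes the admissible range is 2 < <k> < 10.
print(density_class([3] * 100), density_class([1] * 100), density_class([50] * 100))
```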
Simplifying the 928 network data sets produced 18,448 simple graphs, of which
14,415 were excluded for being too sparse and 371 excluded for being too dense
(about 80% of derived simple graphs). Results in the main text are reported only
in terms of the remaining 3662 simple graphs (about 3.9 per network data set). Of
the 928 network data sets, 735 (79%) produced no graphs that were excluded for
being too sparse. More than 90% of graphs excluded for being too sparse were
produced by simplifying three network data sets (<1% of the corpus). Similarly, 874
(94%) of the network data sets produced no graphs that were excluded for being
too dense. More than 70% of graphs excluded for being too dense were produced
by simplifying three network data sets. Finally, 782 (84%) of the data sets generated
at most three degree sequences prior to applying the too-sparse and too-dense
filters. Hence, the vast majority of data sets were uninvolved in the production of
many excluded graphs.
Modeling degree distributions. For the degree sequence $\{k_i\} = k_1, k_2, \ldots, k_n$ of a
given network data set, we estimate the best-fitting power-law distribution of the
form
\[ \Pr(k) = C\,k^{-\alpha}, \qquad \alpha > 1, \quad k \ge k_{\min} \ge 1, \tag{1} \]
where $\alpha$ is the scaling exponent, $C$ is the normalization constant, and $k$ is integer
valued. This specification models only the distribution's upper tail, i.e., degree
values $k \ge k_{\min}$, and discards data from any non-power-law portion in the lower
distribution.
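For integer $k$, the normalization constant in Eq. (1) is the reciprocal of the sum $\sum_{k \ge k_{\min}} k^{-\alpha}$ (a Hurwitz zeta function). The Python sketch below evaluates that sum numerically and uses it to compute $\Pr(k)$. The function names and the truncation parameter `terms` are assumptions made for illustration; they are not part of the authors' code.

```python
import math


def hurwitz_zeta(alpha, k_min, terms=10_000):
    """Approximate sum_{k >= k_min} k^(-alpha) for alpha > 1.

    The first `terms` terms are summed directly; the remainder is
    approximated by an integral plus a half-term correction.
    """
    if alpha <= 1:
        raise ValueError("alpha must exceed 1 for the sum to converge")
    upper = k_min + terms
    head = sum(k ** -alpha for k in range(k_min, upper))
    tail = upper ** (1 - alpha) / (alpha - 1) + 0.5 * upper ** -alpha
    return head + tail


def power_law_pmf(k, alpha, k_min=1):
    """Pr(k) = C k^(-alpha) for k >= k_min, and 0 below the tail."""
    if k < k_min:
        return 0.0
    return k ** -alpha / hurwitz_zeta(alpha, k_min)


# For alpha = 2 and k_min = 1 the normalizing sum is pi^2 / 6.
print(hurwitz_zeta(2.0, 1), math.pi ** 2 / 6)
```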
Fitting this model to an empirical degree sequence requires first choosing the
location $\hat{k}_{\min}$ at which the upper tail begins, and then estimating the scaling
exponent $\hat{\alpha}$ on the truncated data $k \ge \hat{k}_{\min}$. Because the choice of $k_{\min}$ changes the
sample size, it cannot be directly estimated using likelihood or Bayesian techniques.
Here, the standard KS-minimization approach is used to choose $\hat{k}_{\min}$ and the
discrete maximum likelihood estimator is used to choose $\hat{\alpha}$ [49]. Technical details of
the estimation procedure are given in Supplementary Note 2.
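The two-step procedure described above can be sketched as follows: for each candidate $k_{\min}$, fit $\alpha$ by maximizing the discrete log-likelihood, compute the Kolmogorov-Smirnov (KS) distance between the fitted and empirical tail distributions, and keep the candidate with the smallest distance. This is a simplified illustration rather than the authors' implementation (their details are in Supplementary Note 2). The golden-section search, the bounds on $\alpha$, and the `min_tail` safeguard are assumptions introduced here.

```python
import math
from collections import Counter


def hurwitz_zeta(alpha, k_min, terms=1000):
    """Approximate sum_{k >= k_min} k^(-alpha): direct sum plus an integral tail."""
    upper = k_min + terms
    head = sum(k ** -alpha for k in range(k_min, upper))
    return head + upper ** (1 - alpha) / (alpha - 1) + 0.5 * upper ** -alpha


def fit_alpha(tail, k_min, lo=1.05, hi=6.0, tol=1e-4):
    """Discrete MLE of alpha by golden-section search (search bounds are assumed)."""
    n = len(tail)
    sum_log = sum(math.log(k) for k in tail)

    def neg_loglik(alpha):
        return n * math.log(hurwitz_zeta(alpha, k_min)) + alpha * sum_log

    phi = (math.sqrt(5) - 1) / 2
    while hi - lo > tol:
        c = hi - phi * (hi - lo)
        d = lo + phi * (hi - lo)
        if neg_loglik(c) < neg_loglik(d):
            hi = d  # minimum lies in [lo, d]
        else:
            lo = c  # minimum lies in [c, hi]
    return (lo + hi) / 2


def ks_distance(tail, alpha, k_min):
    """Largest gap between the empirical and fitted CDFs over k >= k_min."""
    n = len(tail)
    counts = Counter(tail)
    z = hurwitz_zeta(alpha, k_min)
    empirical = model = worst = 0.0
    for k in range(k_min, max(tail) + 1):
        empirical += counts[k] / n
        model += k ** -alpha / z
        worst = max(worst, abs(empirical - model))
    return worst


def fit_power_law(degrees, min_tail=10):
    """Return (k_min, alpha, ks) minimizing the KS distance, or None if no fit."""
    positive = [k for k in degrees if k >= 1]
    best = None
    for k_min in sorted(set(positive)):
        tail = [k for k in positive if k >= k_min]
        # min_tail is an assumed safeguard against fitting very small tails.
        if len(tail) < min_tail or len(set(tail)) < 2:
            break
        alpha = fit_alpha(tail, k_min)
        ks = ks_distance(tail, alpha, k_min)
        if best is None or ks < best[2]:
            best = (k_min, alpha, ks)
    return best
```

A call such as `fit_power_law(degree_sequence)` returns the selected $\hat{k}_{\min}$, the fitted $\hat{\alpha}$, and the KS distance at that choice. Assessing whether the fit is statistically plausible requires the separate goodness-of-fit test described next, which this sketch does not implement.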
Fitting the power-law distribution always returns some parameters
$\hat{\theta} = (\hat{k}_{\min}, \hat{\alpha})$. However, parameters alone give no indication of the quality of the
fitted model. A standard goodness-of-fit test is used to assess the statistical
plausibility of the fitted model, which returns a standard p-value (see
Supplementary Note 2). Following standard practice in this setting [49], if $p \ge 0.1$,
then the degree sequence is deemed plausibly scale free, while if $p < 0.1$, the
scale-free hypothesis is rejected. Hence, if the underlying data generating process is
indeed scale free, this test has a false negative rate of 0.1. The results of this test
provide direct evidence for or against a network exhibiting scale-free structure.
Each power-law model $\hat{\theta}$ is compared to four non-scale-free alternative models,
estimated via maximum likelihood on the same degrees $k \ge \hat{k}_{\min}$, using a standard
Vuong normalized likelihood ratio test (LRT) [49,60] (see Supplementary Notes 3, 4).
The restriction to $k \ge \hat{k}_{\min}$ is necessary to make the model likelihoods directly
comparable, and slightly biases the test in favor of the power law, as the best choice
of $\hat{k}_{\min}$ for an alternative may not be the same as the best choice for the power
law [49]. The results of this test provide indirect evidence about the scale-free
hypothesis, as a power-law model can be favored over some alternative even if the
power law itself is not a statistically plausible model of the data. The non-scale-free
alternatives used here are the (i) exponential, (ii) log-normal, (iii) power-law with
exponential cutoff, and (iv) stretched exponential or Weibull distributions
(Table 1), all of which have been used previously as models of degree
distributions [74-78], and for which the validity of the LRT used here has specifically
been previously established [49]. Results from an alternative comparison based on
information criteria [61] are given in Supplementary Table II and in Supplementary
Figs. 6 and 7.
The fitted power law and each alternative are compared using a likelihood ratio
test (see Supplementary Note 4), with the test statistic $\mathcal{R} = \mathcal{L}_{\mathrm{PL}} - \mathcal{L}_{\mathrm{Alt}}$, where $\mathcal{L}_{\mathrm{PL}}$
is the log-likelihood of the power-law model and $\mathcal{L}_{\mathrm{Alt}}$ is the log-likelihood of a
particular alternative model. The sign of $\mathcal{R}$ indicates which model is a better fit to
the data: the power law ($\mathcal{R} > 0$), the alternative ($\mathcal{R} < 0$), or neither ($\mathcal{R} = 0$).
The test statistic $\mathcal{R}$ is derived from data, meaning that it is itself a random
variable subject to statistical fluctuations [49,60]. As a result, the sign of $\mathcal{R}$ is
meaningful only if its magnitude $|\mathcal{R}|$ is statistically distinguishable from 0. This
determination is made by a standard two-tailed test against a null hypothesis of
$\mathcal{R} = 0$, which yields a standard p-value. If $p \ge 0.1$, then $|\mathcal{R}|$ is statistically
indistinguishable from 0 and neither model is a better explanation of the data than
the other. If $p < 0.1$, then the data provide a clear conclusion in favor of one model
or the other, depending on the sign of $\mathcal{R}$. This threshold sets the false positive rate
for the alternative distribution at 0.05. Corrections for multiple tests, e.g., a
family-wise error rate method like Bonferroni or a false discovery correction like
Benjamini-Hochberg, are not employed. Such corrections would simply lower the
obtained p-values without changing the overall conclusions, while introducing
additional assumptions into the analysis.
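The likelihood ratio comparison can be illustrated with a short Python sketch. It assumes that per-observation log-likelihoods under both models are already available for the same tail data, and it uses the normal approximation for the sum of their differences to obtain a two-tailed p-value. The function names `vuong_test` and `verdict` are hypothetical, and the handling of zero variance is a choice made here.

```python
import math


def vuong_test(loglik_pl, loglik_alt):
    """Return (R, p) from per-observation log-likelihoods of two models.

    R is the summed log-likelihood difference (power law minus alternative).
    p is the two-tailed probability of a value at least as extreme as |R|
    if the true expected difference were zero (normal approximation).
    """
    n = len(loglik_pl)
    if n == 0 or n != len(loglik_alt):
        raise ValueError("need two equal-length, non-empty sequences")
    diffs = [a - b for a, b in zip(loglik_pl, loglik_alt)]
    R = sum(diffs)
    mean = R / n
    variance = sum((d - mean) ** 2 for d in diffs) / n
    if variance == 0.0:
        # Degenerate case: every observation gives the same difference.
        return R, (1.0 if R == 0 else 0.0)
    p = math.erfc(abs(R) / math.sqrt(2 * n * variance))
    return R, p


def verdict(R, p, threshold=0.1):
    """Map (R, p) to the outcome used in the text; 0.1 is the stated threshold."""
    if p >= threshold:
        return "inconclusive"
    return "power law" if R > 0 else "alternative"
```

Because the test is two-tailed at $p < 0.1$, each direction (favoring the power law or favoring the alternative) carries half of that probability under the null, which is the source of the 0.05 false positive rate stated above.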
To report results at the level of a network data set, we apply the LRTs to all the
associated simple graphs and then aggregate the results. For each alternative
distribution, we count the number of simple graphs associated with a particular
network data set in which the outcome favored the alternative, favored the power
law, or had an inconclusive result. Normalizing these counts across outcome
categories provides a continuous measure of the relative evidence that the data set
falls into each category.
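The aggregation step is a normalized count over the simple graphs of one data set. A minimal Python sketch, with the outcome labels and the function name `aggregate_outcomes` chosen here for illustration:

```python
from collections import Counter

# Illustrative labels for the three possible LRT outcomes per simple graph.
OUTCOMES = ("power law", "alternative", "inconclusive")


def aggregate_outcomes(outcomes):
    """Return the fraction of simple graphs with each LRT outcome."""
    if not outcomes:
        raise ValueError("no simple graphs to aggregate")
    counts = Counter(outcomes)
    total = len(outcomes)
    return {name: counts[name] / total for name in OUTCOMES}


# Four simple graphs from one data set, compared against one alternative.
print(aggregate_outcomes(["alternative", "alternative", "inconclusive", "power law"]))
```

Reporting fractions per data set, rather than pooling all simple graphs, keeps a data set that generates many graphs from dominating the totals.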
Parameters for defining scale-free network. Threshold parameters for the primary
evaluation criteria were selected to balance false positive and false negative
rates, and to provide a consistent evaluation of evidence independent of the
associated graph properties or source of data. For the Super-Weak and Weakest
categories, a threshold of 50% ensures that the given property is present in a
majority of simple graphs associated with a network data set. For the Weak
category, a threshold of at least 50 nodes covered by the best-fitting power law in
the upper tail follows standard practices [49] to reduce the likelihood of false positive
errors due to low statistical power. For the Strong category, $\alpha \in (2,3)$ covers the
full parameter range for which scale-free distributions have an infinite second
moment but a finite first moment. For the Strongest category, the thresholds of
90% for the goodness-of-fit test and 95% for likelihood ratio tests against
alternatives match the expected error rates for both tests under the null hypothesis. If
every graph associated with a network data set is scale free, the goodness-of-fit test
is expected to incorrectly reject the power-law model 0.1 of the time, and the
likelihood ratio test will falsely favor the alternative 0.05 of the time. In the "most
permissive" parameterization of the scheme (see Supplementary Note 5), we relax
the threshold requirements so that if at least one graph meets the given criteria, the
network is placed in this category. In this permissive parameterization, a directed
network with a power-law distribution in the in-degrees should be and is classified
as Strongest.
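The choice of $\alpha \in (2,3)$ for the Strong category follows from a short calculation. Treating $k$ as continuous for simplicity (an approximation that does not change which moments are finite), the $m$-th moment of a power-law tail is:

```latex
\begin{align}
\langle k^m \rangle
  &\approx \int_{k_{\min}}^{\infty} k^{m}\, C\, k^{-\alpha}\, dk
   = C \left[ \frac{k^{\,m-\alpha+1}}{m-\alpha+1} \right]_{k_{\min}}^{\infty},
   \qquad m \neq \alpha - 1, \\
\langle k^m \rangle < \infty
  &\iff m - \alpha + 1 < 0
  \iff m < \alpha - 1 .
\end{align}
```

For $2 < \alpha < 3$, the first moment ($m = 1$) satisfies $1 < \alpha - 1$ and is finite, while the second moment ($m = 2$) would require $\alpha > 3$ and therefore diverges. In a finite network with largest degree $k_{\max}$, the same integral gives $\langle k^2 \rangle \approx C \left( k_{\max}^{3-\alpha} - k_{\min}^{3-\alpha} \right) / (3 - \alpha)$, which grows as $k_{\max}$ grows.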
For specific networks, domain knowledge may suggest that some degree
sequences are potentially scale free while others are likely not. A non-uniform
weighting scheme on the set of associated degree sequences would allow such prior
knowledge to be incorporated in a Bayesian fashion. However, no fixed
non-uniform scheme can apply universally correctly to networks as different as, for
example, directed trade networks, directed social networks, and directed biological
networks. To provide a consistent treatment across all networks, regardless of their
properties or source, we employ an uninformative (uniform) prior, which assigns
equal weight to each associated degree sequence. In future work on specific
subgroups of networks, a domain-specific weight scheme could be used with the
evaluation criteria described here.
Results for synthetic networks. The accuracy of the fitting, comparing, and
testing methods, and the overall evaluation scheme itself, were evaluated using four
classes of synthetic data with known structure. Three of these generated networks
that contain power-law degree distributions: a directed version of preferential
attachment [79], a directed vertex copy model [21], and a simple temporal power-law
random graph. One generated networks that do not: simple Erdős-Rényi random
graphs. Applied to synthetic networks generated by these models, our evaluation
scheme correctly classified each of the synthetic network data sets according to the
scale-free categories suitable for their generating parameters (see Supplementary
Note 6).
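As an illustration of the non-scale-free control case, the sketch below draws the degree sequence of a simple Erdős-Rényi random graph in plain Python. The function name, its parameters, and the example values are hypothetical; the generators actually used are described in Supplementary Note 6.

```python
import random
from itertools import combinations


def erdos_renyi_degrees(n, p, seed=None):
    """Degree sequence of a simple G(n, p) graph.

    Each of the n(n-1)/2 possible undirected edges is included
    independently with probability p.
    """
    rng = random.Random(seed)
    degrees = [0] * n
    for u, v in combinations(range(n), 2):
        if rng.random() < p:
            degrees[u] += 1
            degrees[v] += 1
    return degrees


# Example: 200 nodes with expected mean degree (n - 1) * p of about 5.
degrees = erdos_renyi_degrees(200, 0.025, seed=1)
print(sum(degrees) / len(degrees))
```

Degrees in such a graph are binomially distributed and tightly concentrated around the mean, so a sequence like this would be expected to fall outside the scale-free categories.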
Data availability
The network data sets used are available via https://icon.colorado.edu. Code for
graph-simplification functions and power-law evaluations, and data for replication are available
at https://github.com/adbroido/SFAnalysis.
Received: 23 January 2018 Accepted: 23 January 2019
References
1. Albert, R., Jeong, H. & Barabási, A. L. Diameter of the World-Wide Web. Nature 401, 130-131 (1999).
2. Pržulj, N. Biological network comparison using graphlet degree distribution. Bioinformatics 23, 177-183 (2007).
3. Lima-Mendez, G. & van Helden, J. The powerful law of the power law and other myths in network biology. Mol. Biosyst. 5, 1482-1493 (2009).
4. Mislove, A., Marcon, M., Gummadi, K. P., Druschel, P. & Bhattacharjee, B. Measurement and analysis of online social networks. Proc. 7th ACM SIGCOMM Conference on Internet Measurement (IMC). 29-42 (San Diego, CA, USA, 2007).
5. Agler, M. T. et al. Microbial Hub Taxa Link Host and abiotic factors to plant microbiome variation. PLoS Biol. 14, 1-31 (2016).
6. Ichinose, G. & Sayama, H. Invasion of cooperation in scale-free networks: accumulated versus average payoffs. Artif. Life 23, 25-33 (2017).
7. Zhang, L., Small, M. & Judd, K. Exactly scale-free scale-free networks. Phys. A 433, 182-197 (2015).
8. Dorogovtsev, S. N. & Mendes, J. F. F. Evolution of networks. Adv. Phys. 51, 1079-1187 (2002).
9. Barabási, A. L. & Albert, R. Emergence of scaling in random networks. Science 286, 509-512 (1999).
10. Willinger, W., Alderson, D. & Doyle, J. C. Mathematics and the internet: a source of enormous confusion and great potential. Not. AMS 56, 586-599 (2009).
11. Pastor-Satorras, R. & Vespignani, A. Epidemic dynamics in finite size scale-free networks. Phys. Rev. E 65, 035108 (2002).
12. Albert, R., Jeong, H. & Barabási, A.-L. Error and attack tolerance of complex networks. Nature 406, 378-382 (2000).
13. Carlson, J. M. & Doyle, J. Highly optimized tolerance: a mechanism for power laws in designed systems. Phys. Rev. E 60, 1412-1427 (1999).
14. Newman, M. E. J. Power laws, Pareto distributions and Zipf's law. Contemp. Phys. 46, 323-351 (2005).
15. Mitzenmacher, M. A brief history of generative models for power law and lognormal distributions. Internet Math. 1, 226-251 (2003).
16. Goh, K.-I., Oh, E., Jeong, H., Kahng, B. & Kim, D. Classification of scale-free networks. Proc. Natl Acad. Sci. USA 99, 12583-12588 (2002).
17. Pastor-Satorras, R., Castellano, C., Van Mieghem, P. & Vespignani, A. Epidemic processes in complex networks. Rev. Mod. Phys. 87, 925-979 (2015).
18. Pastor-Satorras, R. & Vespignani, A. Epidemic spreading in scale-free networks. Phys. Rev. Lett. 86, 3200-3203 (2001).
19. Aiello, W., Chung, F. R. K. & Lu, L. A random graph model for massive graphs. Proc. 32nd Annual ACM Symposium on Theory of Computing. 171-180 (Portland, OR, USA, 2000).
20. Aiello, W., Chung, F. & Lu, L. A random graph model for power law graphs. Exp. Math. 10, 53-66 (2001).
21. Newman, M. Networks: An Introduction (Oxford University Press, Oxford, 2010).
22. Newman, M. E. J. Spread of epidemic disease on networks. Phys. Rev. E 66, 016128 (2002).
23. Lee, D. S. Synchronization transition in scale-free networks: clusters of synchrony. Phys. Rev. E 72, 1-6 (2005).
24. Restrepo, J. G., Ott, E. & Hunt, B. R. Onset of synchronization in large networks of coupled oscillators. Phys. Rev. E 71, 1-12 (2005).
25. Ichinomiya, T. Frequency synchronization in a random oscillator network. Phys. Rev. E 70, 5 (2004).
26. Restrepo, J. G., Ott, E. & Hunt, B. R. Synchronization in large directed networks of coupled phase oscillators. Chaos 16, 015107 (2006).
27. Restrepo, J. G., Ott, E. & Hunt, B. R. Emergence of synchronization in complex networks of interacting dynamical systems. Phys. D 224, 114-122 (2006).
28. Price, D. Jd. S. Networks of scientific papers. Science 149, 510-515 (1965).
29. Simon, H. A. On a class of skew distribution functions. Biometrika 42, 425-440 (1955).
30. Pastor-Satorras, R., Smith, E. & Solé, R. V. Evolving protein interaction networks through gene duplication. J. Theor. Biol. 222, 199-210 (2003).
31. Berger, N., Borgs, C., Chayes, J. T., D'Souza, R. M. & Kleinberg, R. D. Proc. 31st International Colloquium on Automata, Languages and Programming (ICALP). 208-221 (Turku, Finland, 2004).
32. Leskovec, J., Kleinberg, J. & Faloutsos, C. Graph evolution. ACM Trans. Knowl. Discov. Data 1, 1-41 (2007).
33. Gamermann, D., Triana, J. & Jaime, R. A comprehensive statistical study of metabolic and protein-protein interaction network properties. https://arxiv.org/abs/1712.07683 (2017).
34. House, T., Read, J. M., Danon, L. & Keeling, M. J. Testing the hypothesis of preferential attachment in social network formation. EPJ Data Science, https://doi.org/10.1140/epjds/s13688-015-0052-2 (2015).
35. Barabási, A.-L. Network Science (Cambridge University Press, Cambridge, UK, 2016).
36. Tanaka, R. Scale-rich metabolic networks. Phys. Rev. Lett. 94, 1-4 (2005).
37. Li, L., Alderson, D., Tanaka, R., Doyle, J. C. & Willinger, W. Towards a theory of scale-free graphs: definition, properties, and implications (extended version). Internet Math. 2, 431-523 (2005).
38. Stumpf, M. P. H. & Porter, M. A. Critical truths about power laws. Science 335, 665-666 (2012).
39. Golosovsky, M. Power-law citation distributions are not scale-free. Phys. Rev. E 032306, 1-12 (2017).
40. Stumpf, M. P. H., Wiuf, C. & May, R. M. Subnets of scale-free networks are not scale-free: Sampling properties of networks. Proc. Natl Acad. Sci. USA 102, 4221-4224 (2005).
41. Jackson, M. O. & Rogers, B. W. Meeting strangers and friends of friends: how random are social networks? Am. Econ. Rev. 97, 890-915 (2007).
42. Khanin, R. & Wit, E. How scale-free are biological networks. J. Comp. Biol.13,
810-818 (2006).
43. Adamic, L. A. & Huberman, B. A. Technical comment on power-law
distribution of the world wide web by A.-L. Barabási and R. Albert and H.
Jeong and G. Bianconi. Science 287, 2115 (2000).
44. Dorogovtsev, S. N., Mendes, J. F. F. & Samukhin, A. N. Generic scale of the
scale-freegrowing networks. https://arxiv.org/abs/cond-mat/0011115
(2000).
45. Redner, S. How popular is your paper? An empirical study of the citation
distribution. Eur. Phys. J. B 134, 131134 (1998).
46. Pachon, A., Sacerdote, L. & Yang, S. Scale-free behavior of networks with the
copresence of preferential and uniform attachment rules. Phys. D: Nonlinear
Phenom. 371,112 (2018).
47. Seshadhri, C., Pinar, A. & Kolda, T. G. An in-depth analysis of stochastic
Kronecker graphs. J. ACM 60,130 (2011).
48. Eikmeier, N. & Gleich, D. F. Proc. 23rd ACM SIGKDD Internat. Conference on
Knowledge Discovery and Data Mining (KDD). 817826 (Halifax, NS, Canada,
2017).
49. Clauset, A., Shalizi, C. R. & Newman, M. E. J. Power-law distributions in
empirical data. SIAM Rev. 51, 661703 (2009).
50. Song, C., Havlin, S. & Makse, H. Self-similarity of complex networks. Nature
433, 392395 (2005).
51. Dmitri Krioukov, M. Á. S. & Boguñá, M. Self-similarity of complex networks
and hidden metric spaces. Phys. Rev. Lett. 100, 078701 (2008).
52. Alderson, D. L. & Li, L. Diversity of graphs with highly variable connectivity.
Phys. Rev. E 75, 046102 (2007).
53. Mitzenmacher, M. Editorial: the future of power law research. Internet Math.
2, 525–534 (2004).
54. Middendorf, M., Ziv, E. & Wiggins, C. H. Inferring network mechanisms: The
Drosophila melanogaster protein interaction network. Proc. Natl Acad. Sci.
USA 102, 3192–3197 (2005).
55. Newman, M. E. J. The first-mover advantage in scientific publication. EPL 86,
68001 (2009).
56. Redner, S. Citation statistics from 110 years of physical review. Phys. Today
58, 49–54 (2005).
57. Radicchi, F., Fortunato, S. & Castellano, C. Universality of citation
distributions: toward an objective measure of scientific impact. Proc. Natl
Acad. Sci. USA 105, 17268–17272 (2008).
58. Mayo, D. G. Error and the Growth of Experimental Knowledge (Science and Its
Conceptual Foundations series) (University of Chicago Press, Chicago,
IL, 1996).
59. Clauset, A., Tucker, E. & Sainz, M. The Colorado Index of Complex Networks,
icon.colorado.edu (2016).
60. Vuong, Q. H. Likelihood ratio tests for model selection and non-nested
hypotheses. Econometrica 57, 307–333 (1989).
61. Claeskens, G. & Hjort, N. L. Model Selection and Model Averaging.
(Cambridge University Press, Cambridge, England, 2008).
62. Lee, S. H., Fricker, M. D. & Porter, M. A. Mesoscale analyses of fungal
networks as an approach for quantifying phenotypic traits. J. Complex Netw. 5,
145–159 (2017).
63. Newman, M. E. J., Girvan, M. & Farmer, J. D. Optimal design, robustness, and
risk aversion. Phys. Rev. Lett. 89, 028301 (2002).
64. Ikehara, K. & Clauset, A. Characterizing the structural diversity of complex
networks across domains. https://arxiv.org/abs/1710.11304 (2017).
65. Barrat, A., Barthélemy, M. & Vespignani, A. Dynamical Processes on Complex
Networks (Cambridge University Press, Cambridge, UK, 2008).
66. Vespignani, A. Modelling dynamical processes in complex socio-technical
systems. Nat. Phys. 8, 32–39 (2012).
67. Pastor-Satorras, R. & Vespignani, A. Epidemic dynamics in finite size scale-
free networks. Phys. Rev. E 65, 1–4 (2002).
68. Newman, M. E. J. & Park, J. Why social networks are different from other
types of networks. Phys. Rev. E 68, 036122 (2003).
69. Watts, D. J. & Strogatz, S. H. Collective dynamics of “small-world” networks.
Nature 393, 440–442 (1998).
70. Colizza, V., Flammini, A., Serrano, M. A. & Vespignani, A. Detecting rich-club
ordering in complex networks. Nat. Phys. 2, 110–115 (2006).
71. Girvan, M. & Newman, M. E. J. Community structure in social and biological
networks. Proc. Natl Acad. Sci. USA 99, 7821–7826 (2002).
72. Clauset, A., Moore, C. & Newman, M. E. J. Hierarchical structure and the
prediction of missing links in networks. Nature 453, 98–101 (2008).
73. Milo, R. et al. Superfamilies of evolved and designed networks. Science 303,
1538–1542 (2004).
74. Amaral, L. A. N., Scala, A., Barthelemy, M. & Stanley, H. E. Classes of small-
world networks. Proc. Natl Acad. Sci. USA 97, 11149–11152 (2000).
75. Buzsáki, G. & Mizuseki, K. The log-dynamic brain: how skewed distributions
affect network operations. Nat. Rev. Neurosci. 15, 264–278 (2014).
76. Jeong, H., Mason, S. P., Barabási, A. L. & Oltvai, Z. N. Lethality and centrality
in protein networks. Nature 411, 41–42 (2001).
77. Malevergne, Y., Pisarenko, V. F. & Sornette, D. Empirical distributions of log-
returns: between the stretched exponential and the power law? Quant. Financ.
5, 379–401 (2005).
78. DuBois, T., Eubank, S. & Srinivasan, A. The effect of random edge removal
on network degree sequence. Electron. J. Comb. 19, 1–20 (2012).
79. Easley, D. & Kleinberg, J. Networks, Crowds, and Markets: Reasoning about a
Highly Connected World (Cambridge University Press, Cambridge, UK, 2010).
Acknowledgements
The authors thank Eric Kightley, Johan Ugander, Cristopher Moore, Mark Newman,
Cosma Shalizi, Alessandro Vespignani, Marc Barthelemy, Juan Restrepo, Petter Holme,
and Albert-László Barabási for helpful conversations, and acknowledge the BioFrontiers
Computing Core at the University of Colorado Boulder for providing high performance
computing resources (NIH 1S10OD012300) supported by BioFrontiers IT. This work
was supported in part by Grant No. IIS-1452718 (A.C.) from the National Science
Foundation. Publication of this article was funded by the University of Colorado Boulder
Libraries Open Access Fund.
Author contributions
A.D.B. and A.C. conceived the research, designed the analyses, and wrote the
manuscript. A.D.B. conducted the analyses.
Additional information
Supplementary Information accompanies this paper at https://doi.org/10.1038/s41467-019-08746-5.
Competing interests: The authors declare no competing interests.
Reprints and permission information is available online at http://npg.nature.com/reprintsandpermissions/
Journal peer review information: Nature Communications thanks the anonymous
reviewers for their contribution to the peer review of this work.
Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in
published maps and institutional affiliations.
Open Access This article is licensed under a Creative Commons
Attribution 4.0 International License, which permits use, sharing,
adaptation, distribution and reproduction in any medium or format, as long as you give
appropriate credit to the original author(s) and the source, provide a link to the Creative
Commons license, and indicate if changes were made. The images or other third party
material in this article are included in the article's Creative Commons license, unless
indicated otherwise in a credit line to the material. If material is not included in the
article's Creative Commons license and your intended use is not permitted by statutory
regulation or exceeds the permitted use, you will need to obtain permission directly from
the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
© The Author(s) 2019
... A major aspect of data analysis involves inferring the distribution which most accurately describes a given dataset. In particular, there has been much debate regarding the presence of power-law (PL) behaviour in datasets [15][16][17][18][19][20][21]. Whilst recently many advances have been made in developing techniques for identifying PL behaviour [22][23][24], there are still occasions when it is difficult to establish whether data are best described by a PL. ...
... Whilst recently many advances have been made in developing techniques for identifying PL behaviour [22][23][24], there are still occasions when it is difficult to establish whether data are best described by a PL. Specifically, determining between data which are drawn from a PL and that from a log-normal distribution is often intractable without a large number of data points [18,21,22,25,26]. ...
... As discussed in the introduction, determining the distribution which best describes observed data is often important, as differing underlying distributions can have profound qualitative and quantitative differences. Specifically, accurately determining PL behaviour is important, as its properties, such as being scale-free, can be used or assumed in subsequent analyses and predictions [21]. Whilst there are many methods to ascertain the likely presence of PL behaviour [21,22,33], there are still occasions when it is difficult to decide between PL and other distributions. ...
Article
Determining the ‘best-fitting’ distribution for data is an important problem in data analysis. Specifically, observing how the distribution of data changes as values below (or above) a threshold are omitted from analyses can be of use in various applications, from animal movement to the modelling of natural phenomena. Such truncated distributions, known as hazard functions, are widely studied and well understood in survival analysis, although rarely widely used in data analysis. Here, by considering the hazard and reverse-hazard functions, we demonstrate a qualitative assessment of the ‘best-fit’ distribution of data. Specifically, we highlight the potential advantages of this method when determining whether power-law behaviour may or may not be present in data. Finally, we demonstrate this approach using some real-world datasets.
... Complex stochastic processes driving evolution of many different real-world phenomena can hardly produce perfect power-law dependencies without any deviation from a pure power law [16]. Searches for pure power law dependencies in real-world data revealed that this distribution is exceedingly rare [17]. Therefore, it is important to consider the full class of regularly varying distributions instead of the pure power laws. ...
... The program 'PLFit Algorithm' [17,21], was used as implemented in R package 'poweRlaw' [22], to test for the possibility of a non-power law distribution to be well approximated by another heavy 6 tailed distribution. There are two tests implemented in the R package 'poweRlaw': Vuong's test [21] and a Bootstrap test [22]. ...
Preprint
Ancient human viruses have been detected in ancient DNA (aDNA) samples ranging from Anatomically Modern Humans to Neanderthals. Reconstructing genomes from aDNA using reference mapping presents numerous problems due to the unique nature of ancient samples, their degraded state, smaller read sizes and limitations of current methodologies. Spurious alignments of reads to reference sequences (mapping) are a main source of false positives in aDNA assemblies and the assessment of signal-to-noise ratios is essential to differentiate bona fide reconstructions from random, noisy, assemblies. Here we analyzed the statistical distributions of viral genome assemblies, ancient and modern, and their respective random “mock” controls used to evaluate the signal-to-noise ratio. We tested if differences between real and random assemblies could be detected from their statistical distributions. Our analysis shows that the coverage distributions of: (1) real viral aDNA assemblies of adenovirus (ADV), herpesvirus (HSV) and papillomavirus (HPV) do not follow power laws nor log-normal laws, (ADV) and control aDNA assemblies are well approximated by log-normal laws, (3) negative control parvovirus B19 (real and random) follow a power law with infinite variance and (4) the mapDamage negative control with non-ancient DNA (modern ADV) and the mapDamage positive control (human mtDNA) are well approximated by the negative binomial distribution, consistent with the Lander-Waterman model. Our results show that the tails of the distributions of aDNA and their controls reveal the weight of random effects and can differentiate spurious assemblies, or false positives, from bona fide assemblies.
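The power-law fits referred to in the excerpts above rest on a maximum-likelihood estimate of the tail exponent. As a minimal sketch, assuming continuous data and a lower cutoff `xmin` that is already known, the estimator from Clauset, Shalizi and Newman (2009) can be written in a few lines of Python. The function name `fit_power_law_tail` and the synthetic sample are illustrative assumptions, not part of any package named here; full procedures also select `xmin` from the data and handle discrete values such as node degrees, which this sketch does not attempt.

```python
import math
import random


def fit_power_law_tail(data, xmin):
    """Continuous power-law MLE for the tail x >= xmin.

    Returns (alpha_hat, standard_error, n_tail), where
    alpha_hat = 1 + n / sum(ln(x_i / xmin)) and the standard error
    is the large-sample approximation (alpha_hat - 1) / sqrt(n).
    """
    tail = [x for x in data if x >= xmin]
    n = len(tail)
    if n == 0:
        raise ValueError("no observations at or above xmin")
    log_sum = sum(math.log(x / xmin) for x in tail)
    if log_sum == 0.0:
        raise ValueError("all tail observations equal xmin")
    alpha_hat = 1.0 + n / log_sum
    return alpha_hat, (alpha_hat - 1.0) / math.sqrt(n), n


# Synthetic check (hypothetical data): draw from a pure power law with
# exponent 2.5 by inverse-transform sampling, then recover the exponent.
rng = random.Random(0)
true_alpha, xmin = 2.5, 1.0
sample = [xmin * (1.0 - rng.random()) ** (-1.0 / (true_alpha - 1.0))
          for _ in range(20000)]
alpha_hat, stderr, n_tail = fit_power_law_tail(sample, xmin)
print(f"alpha_hat = {alpha_hat:.3f} +/- {stderr:.3f} (n = {n_tail})")
```

An exponent below 3, as in this synthetic sample, corresponds to the infinite-variance regime mentioned in the abstract above. A good estimate of the exponent does not by itself show that a power law is a plausible model, so goodness-of-fit and likelihood-ratio comparisons are still needed.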
... (Q1) The conditions of observability [12] and the mechanisms of competition and self-organization in systems [4,6]. A key issue is the limited understanding of microscopic rules and the need for large, old networks to support hypotheses on asymptotic distributions and observe expected behaviors over a sufficiently long period [8,[27][28][29]. ...
Preprint
We introduce a method to study evolution rules and the scale-free hypothesis of real-world growing networks using natural partitions of nodes and edges based on temporal and topological attributes, and analyzing degree distributions. We apply this method to the Software Heritage dataset, which collects software releases and revisions from open-source communities. Nodes with native temporal information do not fully capture the overall network dynamics, and degree distributions show greater regularity with fewer outliers, suggesting a more likely scale-free regime when examining networks derived from temporal and topological partitions. However, underlying aging, fitness, and inheritance mechanisms, along with chosen partitioning, hinder definitive conclusions and suggest that the very common “pure parametric power-law” hypothesis for the tail of degree distributions is too strong. Node types derived from partitions and changes in evolution rules, shown by variations in the average number of new edges per node over time, highlight the need for tools better suited for studying transient regimes and ease comparison of real-world networks with minimal models.
... citation graphs, networking) are power-law [17], [40], [41] where a few nodes have a very high degree, while most nodes have a low degree. According to [42] 49% of graphs have other distribution than power-law. Therefore, for the second type of graph, we used the Erdős- ... Table 1: Absolute mean correlation with standard deviation between node's degree and the output of randomly initialized networks on graphs with 20, 50, 80, 100, 200, and 500 nodes. ...
Article
Graph Neural Networks (GNNs) have demonstrated remarkable performance in tasks involving graph-structured data, but they also exhibit biases linked to node degrees. This paper explores a specific manifestation of such bias, termed Exploration Degree Bias (EDB), in the context of Reinforcement Learning (RL). We show that EDB arises from the inherent design of GNNs, where nodes with high or low degrees disproportionately influence output logits used for decision-making. This phenomenon impacts exploration in RL, skewing it away from mid-degree nodes, potentially hindering the discovery of optimal policies. We provide a systematic investigation of EDB across widely used GNN architectures—GCN, GraphSAGE, GAT, and GIN—by quantifying correlations between node degrees and logits. Our findings reveal that EDB varies by architecture and graph configuration, with GCN and GIN exhibiting the strongest biases. Moreover, analysis of DQN and PPO RL agents illustrates how EDB can distort exploration patterns, with DQN exhibiting EDB under low exploration rates and PPO showing a partial ability to counteract these effects through its probabilistic sampling mechanism. Our contributions include defining and quantifying EDB, providing experimental insights into its existence and variability, and analyzing its implications for RL. These findings underscore the need to address degree-related biases in GNNs to enhance RL performance on graph-based tasks.
Article
Degree distributions in protein-protein interaction (PPI) networks are believed to follow a power law (PL). However, technical and study biases affect the experimental procedures for detecting PPIs. For instance, cancer-associated proteins have received disproportional attention. Moreover, bait proteins in large-scale experiments tend to have many false-positive interaction partners. Studying the degree distributions of thousands of PPI networks of controlled provenance, we address the question if PL distributions in observed PPI networks could be explained by these biases alone. Our findings are supported by mathematical models and extensive simulations, and indicate that study bias and technical bias suffice to produce the observed PL distribution. It is, hence, problematic to derive hypotheses about the topology of the true biological interactome from the PL distributions in observed PPI networks. Our study casts doubt on the use of the PL property of biological networks as a modeling assumption or quality criterion in network biology.
Article
We study cooperation and group pressure on social networks by introducing a new concept termed norm-enforcing ties. By combining network characteristics and agents’ actions, direct and indirect norm-enforcing ties extend and refine the concept of social ties as well as the role of the tightness of a group as drivers of group pressure and cooperation. The results show that a strong commitment by agents with collective interests, or a high degree of confrontation between agents minimizes the effect of indirect norm-enforcing ties on cooperation. The analysis in terms of the agent’s utility reveals that an increase in indirect norm-enforcing ties does not necessarily lead to a decrease in the critical mass of compliers supporting cooperation. We demonstrate that network-oriented policies are more efficient in promoting cooperation than are standard economic policy instruments when the expected value of direct norm-enforcing ties is sufficiently large compared to the tightness of the group. Otherwise, standard economic policy instruments are more efficient.
Article
The description of human mobility is at the core of many fundamental applications ranging from urbanism and transportation to epidemics containment. Data about human movements, once scarce, is now widely available thanks to new sources such as phone call detail records, GPS devices, or Smartphone apps. Nevertheless, it is still common to rely on a single dataset by implicitly assuming that the statistical properties observed are robust regardless of data gathering and processing techniques. Here, we test this assumption on a broad scale by comparing human mobility datasets obtained from 7 different data-sources, tracing 500+ millions individuals in 145 countries. We report wide quantifiable differences in the resulting mobility networks and in the displacement distribution. These variations impact processes taking place on these networks like epidemic spreading. Our results point to the need for disclosing the data processing and, overall, to follow good practices to ensure robust and reproducible results.
Article
Background: At present, the complexity that governs the associations between different biological entities is understood better than ever before, owing to high-throughput techniques and systems biology. Networks of interactions are necessary not only for the visualization of these complex relationships but also because their analysis tends to be valuable for the extraction of novel biological knowledge. Methods: For this reason, we constructed a disease–protein–drug network, focusing on a category of rare protein-misfolding diseases, known as amyloidoses, and on other pathological conditions also associated with amyloid deposition. Apart from the amyloidogenic proteins that self-assemble into fibrils, we also included other co-deposited proteins found in amyloid deposits. Results: In this work, protein–protein, protein–drug, and disease–drug associations were collected to create a heterogenous network. Through disease-based and drug-based analyses, we highlighted commonalities between diseases and proposed an approved drug with prospects of repurposing. Conclusions: The identified disease associations and drug candidates are proposed for further study that will potentially help treat diseases associated with amyloid deposition.
Article
The properties of aggregate skeleton profoundly impact the macro-performance of asphalt mixture, especially the rutting performance. Currently, the empowerment of network theory has greatly advanced the comprehension and quantitative description of aggregate skeleton. However, further refinement is still required in terms of the application depth of network theory and the investigation of the correlation between meso-structural topology and rutting performance. This work comprehensively quantified the topological features, spatial features, and their combined features of aggregate skeletons and investigated their correlation with rutting performance, aiming to provide a reference for assessing the feasibility and worthwhileness of employing topological analysis for the meso-structural characterization and macro-performance interpretation of asphalt mixtures. The results show that, in terms of the realistic affinity with rutting performance, the combined feature performs the best, with an average correlation of 0.55; the topological feature is in the middle, with an average correlation of 0.38; the spatial feature is the least effective, with an average correlation of 0.25. The combined features demonstrate a more direct correlation with rutting performance compared to the topological features. Tree features generally outperform the other topological features from the perspective of the relative affinity with rutting performance, as do the metrics possessing a holistic perspective in combined features. There is an intrinsic relation between the shape of rutting performance curves and the overall closeness of nodes in aggregate networks. The meso-structure of asphalt mixtures exhibits a malfunctioning state characterized by a critical degree of DGCV(TA) = 0.5 from the perspective of the total rut.
Article
Understanding the mathematical properties of graphs underlying biological systems could give hints on the evolutionary mechanisms behind these structures. In this article we perform a complete statistical analysis over thousands of graphs representing metabolic and protein-protein interaction (PPI) networks. The focus of the analysis is, apart from the description of the main properties of the graphs, to identify those properties that deviate from the expected values had the networks been built by randomly linking nodes with the same degree distribution. This survey identifies the properties of biological networks which are not solely the result of the degree distribution of the networks, but emerge from the evolutionary pressures under which the network evolves. The findings suggest that, while PPI networks have properties that differ from their expected values in their randomized versions with great statistical significance, the differences for metabolic networks have a smaller statistical significance, though it is possible to identify some drift. We also investigate the quality of fits obtained for the nodes degree distributions to power-law functions. The fits for the metabolic networks do describe the distributions if one disregards nodes with degree equal to one, but in the case of PPI networks the power-law distribution poorly describes the data except for the far right tail covering around half or less of the total distribution.
Article
We analyze time evolution of statistical distributions of citations to scientific papers published in one year. While these distributions can be fitted by a power-law dependence we find that they are nonstationary and the exponent of the power law fit decreases with time and does not come to saturation. We attribute the nonstationarity of citation distributions to different longevity of the low-cited and highly-cited papers. By measuring citation trajectories of papers we found that citation careers of the low-cited papers come to saturation after 10-15 years while those of the highly-cited papers continue to increase indefinitely: the papers that exceed some citation threshold become runaways. Thus, we show that although citation distribution can look as a power-law, it is not scale-free and there is a hidden dynamic scale associated with the onset of runaways. We compare our measurements to our recently developed model of citation dynamics based on copying/redirection/triadic closure and find explanations to our empirical observations.
Article
Complex networks in different areas exhibit degree distributions with heavy upper tail. A preferential attachment mechanism in a growth process produces a graph with this feature. We herein investigate a variant of the simple preferential attachment model, whose modifications are interesting for two main reasons: to analyze more realistic models and to study the robustness of the scale free behavior of the degree distribution. We introduce and study a model which takes into account two different attachment rules: a preferential attachment mechanism (with probability 1-p) that stresses the rich get richer system, and a uniform choice (with probability p) for the most recent nodes. The latter highlights a trend to select one of the last added nodes when no information is available. The recent nodes can be either a given fixed number or a proportion (\alpha n) of the total number of existing nodes. In the first case, we prove that this model exhibits an asymptotically power-law degree distribution. The same result is then illustrated through simulations in the second case. When the window of recent nodes has constant size, we herein prove that the presence of the uniform rule delays the starting time from which the asymptotic regime starts to hold. The mean number of nodes of degree k and the asymptotic degree distribution are also determined analytically. Finally, a sensitivity analysis on the parameters of the model is performed.
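The two attachment rules described in this abstract can be illustrated with a short simulation. The sketch below is an assumption-laden toy, not the authors' implementation: it attaches each new node with a single edge, starts from two connected nodes, and uses a fixed-size window of recent nodes for the uniform rule. The function name `grow_mixed_attachment` and all parameter values are hypothetical.

```python
import random
from collections import Counter


def grow_mixed_attachment(n, p, window, seed=None):
    """Grow a tree on n nodes and return the list of node degrees.

    With probability p the new node links to a node chosen uniformly
    among the `window` most recently added nodes; otherwise it links
    to an existing node chosen with probability proportional to degree.
    """
    rng = random.Random(seed)
    degree = [1, 1]      # seed graph: nodes 0 and 1 joined by one edge
    endpoints = [0, 1]   # every node appears once per incident edge
    for new in range(2, n):
        if rng.random() < p:
            # Uniform rule over the most recent nodes.
            oldest_allowed = max(0, new - window)
            target = rng.randrange(oldest_allowed, new)
        else:
            # Preferential rule: a uniform draw from the endpoint list
            # selects a node with probability proportional to its degree.
            target = rng.choice(endpoints)
        degree.append(1)
        degree[target] += 1
        endpoints.extend((new, target))
    return degree


degrees = grow_mixed_attachment(n=10_000, p=0.2, window=50, seed=1)
histogram = Counter(degrees)
for k in sorted(histogram)[:5]:
    print(k, histogram[k])
```

Varying `p` and `window` in such a toy gives a rough feel for how the uniform rule changes the low-degree part of the distribution, although the asymptotic results quoted in the abstract require the analysis in the paper itself.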
Article
Plant-associated microorganisms have been shown to critically affect host physiology and performance, suggesting that evolution and ecology of plants and animals can only be understood in a holobiont (host and its associated organisms) context. Host-associated microbial community structures are affected by abiotic and host factors, and increased attention is given to the role of the microbiome in interactions such as pathogen inhibition. However, little is known about how these factors act on the microbial community, and especially what role microbe-microbe interaction dynamics play. We have begun to address this knowledge gap for phyllosphere microbiomes of plants by simultaneously studying three major groups of Arabidopsis thaliana symbionts (bacteria, fungi and oomycetes) using a systems biology approach. We evaluated multiple potential factors of microbial community control: we sampled various wild A. thaliana populations at different times, performed field plantings with different host genotypes, and implemented successive host colonization experiments under lab conditions where abiotic factors, host genotype, and pathogen colonization was manipulated. Our results indicate that both abiotic factors and host genotype interact to affect plant colonization by all three groups of microbes. Considering microbe-microbe interactions, however, uncovered a network of interkingdom interactions with significant contributions to community structure. As in other scale-free networks, a small number of taxa, which we call microbial "hubs," are strongly interconnected and have a severe effect on communities. By documenting these microbe-microbe interactions, we uncover an important mechanism explaining how abiotic factors and host genotypic signatures control microbial communities. In short, they act directly on "hub" microbes, which, via microbe-microbe interactions, transmit the effects to the microbial community. 
We analyzed two "hub" microbes (the obligate biotrophic oomycete pathogen Albugo and the basidiomycete yeast fungus Dioszegia) more closely. Albugo had strong effects on epiphytic and endophytic bacterial colonization. Specifically, alpha diversity decreased and beta diversity stabilized in the presence of Albugo infection, whereas they otherwise varied between plants. Dioszegia, on the other hand, provided evidence for direct hub interaction with phyllosphere bacteria. The identification of microbial "hubs" and their importance in phyllosphere microbiome structuring has crucial implications for plant-pathogen and microbe-microbe research and opens new entry points for ecosystem management and future targeted biocontrol. The revelation that effects can cascade through communities via "hub" microbes is important to understand community structure perturbations in parallel fields including human microbiomes and bioprocesses. In particular, parallels to human microbiome "keystone" pathogens and microbes open new avenues of interdisciplinary research that promise to better our understanding of functions of host-associated microbiomes.
Chapter
We analyze time evolution of statistical distributions of citations to scientific papers published in the same year. While these distributions seem to follow the power-law dependence, we find that they are nonstationary and the exponent of the power-law fit decreases with time and does not come to saturation. We attribute the nonstationarity of citation distributions to different longevity of the low-cited and highly-cited papers. By measuring citation trajectories of papers, we found that citation careers of the low-cited papers come to saturation after 10–15 years while those of the highly-cited papers continue to increase indefinitely. When the number of citations of a paper exceeds some citation threshold, it becomes a runaway. Thus, we show that although citation distribution can look as a power-law dependence, it is not scale-free and there is a hidden dynamic scale associated with the onset of runaways. We show that our model of citation dynamics based on copying/redirection/triadic closure accounts for these issues fairly well.
Article
The structure of complex networks has been of interest in many scientific and engineering disciplines for decades. A number of studies have focused on finding properties common to different kinds of networks, such as heavy-tailed degree distributions, small-worldness, and modular structure, and have tried to establish a theory of structural universality in complex networks. However, there is no comprehensive study of network structure across a diverse set of domains that explains the structural diversity observed in real-world networks. In this paper, we study 986 real-world networks from diverse domains, ranging from ecological food webs to online social networks, along with 575 networks generated from four popular network models. Our study uses machine learning techniques, such as random forests and confusion matrices, to show the relationships among network domains in terms of network structure. Our results indicate that there are some partitions of network categories in which networks are hard to distinguish based purely on network structure. We find that the categories within such a partition tend to share similar underlying functions, constraints, and/or generative mechanisms, even though the networks have different origins, e.g., biological processes or human engineering. This suggests that the origin of a network, whether biological, technological, or social, is not necessarily a decisive factor in the formation of similar network structure. Our findings shed light on a possible direction along which we could uncover the hidden principles behind the structural diversity of complex networks.
Conference Paper
By studying a large number of real world graphs, we find empirical evidence that most real world graphs have a statistically significant power-law distribution with a cutoff in the singular values of the adjacency matrix and eigenvalues of the Laplacian matrix in addition to the commonly conjectured power-law in the degrees. Among these results, power-laws in the singular values appear more consistently than in the degree distribution. The exponents of the power-law distributions are much larger than previously observed. We find a surprising direct relationship between the power-law in the degree distribution and the power-law in the eigenvalues of the Laplacian that was theorized in simple models but is extremely accurate in practice. We investigate these findings in large networks by studying the cutoff value itself, which shows a scaling law for the number of elements involved in these power-laws. Using the scaling law enables us to compute only a subset of eigenvalues of large networks, up to tens of millions of vertices and billions of edges, where we find that those too show evidence of statistically significant power-laws.
Article
It is well known that cooperation cannot be an evolutionarily stable strategy for a non-iterative game in a well-mixed population. In contrast, structured populations favor cooperation, since cooperators can benefit each other by forming local clusters. Previous studies have shown that scale-free networks strongly promote cooperation. However, little is known about the invasion mechanism of cooperation in scale-free networks. To study microscopic and macroscopic behaviors of cooperators' invasion, we conducted computational experiments on the evolution of cooperation in scale-free networks where, starting from all defectors, cooperators can spontaneously emerge by mutation. Since the evolutionary dynamics are influenced by the definition of fitness, we tested two commonly adopted fitness functions: accumulated payoff and average payoff. Simulation results show that cooperation is strongly enhanced with the accumulated payoff fitness compared to the average payoff fitness. However, the difference between the two functions decreases as the average degree increases. As the average degree increases, cooperation decreases for the accumulated payoff fitness, while it increases for the average payoff fitness. Moreover, for the average payoff fitness, low-degree nodes play a more important role in spreading cooperative strategies than for the accumulated payoff fitness.
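The two fitness definitions named in the abstract above can be illustrated with a minimal sketch. The payoff values, the function names (`pair_payoff`, `fitness`), and the four-node star graph are all hypothetical choices made for illustration; none of them come from the cited study, whose actual parameters and update rules are not given here.

```python
# Hypothetical payoffs for a two-player cooperation game (assumed values,
# not taken from the cited study): reward, sucker, temptation, punishment.
R, S, T, P = 1.0, 0.0, 1.5, 0.0


def pair_payoff(me_cooperates, other_cooperates):
    """Payoff received by one player from a single pairwise interaction."""
    if me_cooperates and other_cooperates:
        return R
    if me_cooperates and not other_cooperates:
        return S
    if not me_cooperates and other_cooperates:
        return T
    return P


def fitness(node, adjacency, strategy, kind="accumulated"):
    """Fitness of `node` under one of the two definitions.

    accumulated: sum of payoffs over all neighbors.
    average:     that sum divided by the node's degree.
    """
    neighbors = adjacency[node]
    total = sum(pair_payoff(strategy[node], strategy[n]) for n in neighbors)
    if kind == "accumulated":
        return total
    if kind == "average":
        return total / len(neighbors) if neighbors else 0.0
    raise ValueError(f"unknown fitness kind: {kind}")


# Toy star graph: node 0 is a hub connected to three leaves.
adjacency = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
# True = cooperator, False = defector (hypothetical assignment).
strategy = {0: True, 1: True, 2: True, 3: False}

for node in adjacency:
    print(
        node,
        fitness(node, adjacency, strategy, "accumulated"),
        fitness(node, adjacency, strategy, "average"),
    )
```

In this toy setup the cooperating hub scores higher than a cooperating leaf under accumulated payoff but lower under average payoff, since dividing by degree removes the advantage of having many neighbors. This is only meant to show how the two definitions differ, not to reproduce the study's simulations.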
Article
We investigate the application of mesoscopic response functions (MRFs) to characterize a large set of networks of fungi and slime moulds grown under a wide variety of different experimental treatments, including inter-species competition and attack by fungivores. We construct 'structural networks' by estimating cord conductances (which yield edge weights) from the experimental data, and we construct 'functional networks' by calculating edge weights based on how much nutrient traffic is predicted to occur along each edge. Both types of networks have the same topology, and we compute MRFs for both families of networks to illustrate two different ways of constructing taxonomies to group the networks into clusters of related fungi and slime moulds. Although both network taxonomies generate intuitively sensible groupings of networks across species, treatments and laboratories, we find that clustering using the functional-network measure appears to give groups with lower intra-group variation in species or treatments. We argue that MRFs provide a useful quantitative analysis of network behaviour that can (1) help summarize an expanding set of increasingly complex biological networks and (2) help extract information that captures subtle changes in intra- and inter-specific phenotypic traits that are integral to a mechanistic understanding of fungal behaviour and ecology. As an accompaniment to our paper, we also make a large data set of fungal networks available in the public domain.