Clustering drives assortativity and community structure in ensembles of networks.
ABSTRACT Clustering, assortativity, and communities are key features of complex networks. We probe dependencies between these features and find that ensembles of networks with high clustering display both high assortativity by degree and prominent community structure, while ensembles with high assortativity show much less enhancement of the clustering or community structure. Further, clustering can amplify a small homophilic bias for trait assortativity in network ensembles. This marked asymmetry suggests that transitivity could play a larger role than homophily in determining the structure of many complex networks.
Clustering Drives Assortativity and Community Structure in Ensembles of Networks
David V. Foster,1,∗ Jacob G. Foster,2 Peter Grassberger,1,3 and Maya Paczuski1
1Complexity Science Group, University of Calgary, Calgary T2N 1N4, Canada
2Department of Sociology, University of Chicago, Chicago 60615, USA
3NIC, Forschungszentrum Jülich, D-52425 Jülich, Germany
(Dated: January 6, 2011)
Clustering, assortativity, and communities are key features of complex networks. We probe depen-
dencies between these attributes and find that ensembles with strong clustering display both high
assortativity by degree and prominent community structure, while ensembles with high assortativ-
ity are much less biased towards clustering or community structure. Further, clustered networks
can amplify small homophilic bias for trait assortativity. This marked asymmetry suggests that
transitivity, rather than homophily, drives the standard nonsocial/social network dichotomy.
PACS numbers: 89.75.Hc, 05.10.Ln, 89.75.Fb, 64.60.aq
Networks provide convenient representations for di-
verse phenomena spanning physical, technological, so-
cial, biological and informational domains [1–4]. They
are often complicated, historically contingent assemblies
created by nonlinear processes. Just as it is meaningful
to “explain” features of real networks with simple gen-
erative mechanisms, it is also informative to ask what
properties to expect given no other information about a
network save that it has a certain set of properties.
In fact, network properties can be markedly interde-
pendent [5, 6]. We focus on three key features of undi-
rected networks: (1) the clustering coefficient, C, which
reflects the tendency of the network to form triangles
(transitivity) [7, 8]; (2) the assortativity, r, which
reflects the tendency of similar nodes to connect to one
another (homophily); and (3) the modularity, Q, which
reflects the tendency of nodes to form tightly
interconnected communities.
We show that ensembles of networks constrained by a
transitive bias to be strongly clustered also become highly
degree-assortative and modular. In contrast, ensembles
constrained by a homophilic bias to be highly assortative
show only weak clustering or modularity. Hence, at the
ensemble level a fundamental asymmetry exists between
transitivity and homophily. This asymmetry holds un-
less the distribution of the number of links attached to
each node (the node’s degree) is extremely broad. Fur-
thermore, a transitive bias can amplify the effect of a
homophilic bias towards trait (e.g. race, age, education,
etc.) assortativity in network ensembles.
High values for the clustering, assortativity, and mod-
ularity are often observed in real-world social networks,
while nonsocial networks may have low values.
Although extensive social science literature posits ho-
mophily to be a dominant force in social network forma-
tion [11, 13] (since social networks are highly assortative),
our results show that a bias for transitive relationships
(also called “triadic closure” in the sociology literature)
is sufficient to obtain this effect in network ensembles.
Our work is complementary to that of Newman and Park,
who produce assortativity and clustering characteristic of
social networks by introducing modularity.
FIG. 1: The relationship between the clustering coefficient,
C, and the assortativity, r (correlation = .79). Gray points
represent social networks, black points represent other types
of networks. Social networks: astro phys (scientific
collaboration); condensed matter (scientific collaboration);
Cyworld (online social); dolphins (friendship); email
(communication); HEP (scientific collaboration); jazz (musical
collaboration); MySpace (online social); network science
(scientific collaboration); nioki (online social); orkut
(online social); PGP (communication network); pussokram
(online dating). Non-social networks: C. elegans (neural);
E. coli (metabolic); internet (router level); power
(connections between power stations); TAP (yeast
protein-protein binding); word adjacency (in English text);
Y2H (yeast protein-protein binding).
To begin, we note a distinct empirical correlation between
C and r in real networks illustrated in Fig. 1, with
social networks (generally) in the high C, high r corner,
and non-social networks (generally) in the low C, low r
one. The pattern suggests an interdependence between the
two features that transcends a simple nonsocial/social
dichotomy. For instance, consider two networks in Fig. 1:
TAP is a high C, high r protein-protein interaction network,
generated by tandem affinity purification experiments; Y2H
is a weakly clustered, disassortative protein-protein
interaction network, generated using yeast two
hybridization. The methodology, by itself, can explain the
difference, since TAP pulls out bound complexes and assigns
links to every pair of proteins in the complex while Y2H
tests each pair of proteins individually for direct binding.
Since transitivity has a natural origin in the construction
of the TAP network, it is likely that the observed
assortativity arises solely as a byproduct of the
interrelationship between transitivity and assortativity
rather than any direct homophilic tendency between proteins.
arXiv:1012.2384v2 [physics.soc-ph] 5 Jan 2011
TABLE I: Important values for the empirical networks
Network   N       L       r
ER        19680   41000   -1.3e-5
HEP       7610    15751   .29
NetSci    1461    2742    .46
PGP       10680   24316   .24
Since network properties often depend conspicuously on the
degree sequence (the number of links attached to each node),
we consider ensembles of networks constrained to have the
same fixed degree sequence (FDS). Three real world networks
are studied in detail: a collaboration network of high
energy physicists (HEP); a collaboration network of network
scientists (NetSci); and an encrypted communication network
(PGP). We also examine a randomly generated Erdős-Rényi
network (ER). Basic network parameters are given in Table I.
We use a rewiring procedure [32, 33] to sample from
each ensemble. At each step of the procedure two links
are chosen at random and their endpoints are exchanged,
unless this would create a double link, in which case the
step is skipped. This move set preserves the degree of
each node but otherwise randomizes connections.
To sample ensembles with specific features, we use a network
Hamiltonian H(G) [34–37] to define an exponential ensemble
by assigning a sampling weight $P(G) \propto e^{-H(G)}$ to
each graph G. Here we consider ensembles where H(G)
depends on C, r and/or trait assortativity defined below.
Denoting the number of triangles in G by $n_\Delta$, the
degree of node i by $k_i$, and the number of nodes by N,
the clustering coefficient is defined as
\[ C = \frac{3 n_\Delta}{\frac{1}{2} \sum_{i=1}^{N} k_i (k_i - 1)}. \tag{1} \]
Assortativity by degree is defined as the Pearson correlation
coefficient between the degrees of nodes joined by a link:
\[ r = \frac{L^{-1} \sum_i j_i k_i - \left[ L^{-1} \sum_i \frac{1}{2} (j_i + k_i) \right]^2}{L^{-1} \sum_i \frac{1}{2} (j_i^2 + k_i^2) - \left[ L^{-1} \sum_i \frac{1}{2} (j_i + k_i) \right]^2}, \tag{2} \]
where L is the number of links in the network and $j_i$ and
$k_i$ are the degrees of nodes at each end of link i.
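As a rough illustration of the two quantities just defined, the sketch below computes C and r from an undirected edge list. It assumes the standard forms: C as three times the number of triangles divided by the number of connected triples, and r as the Pearson correlation of the degrees found at the two ends of each link. The function names and the edge-list representation are illustrative choices, not taken from the paper, and the edge list is assumed to contain no self-loops or repeated links.

```python
from itertools import combinations

def adjacency(edges):
    """Build a dict of neighbour sets from an undirected edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def clustering(edges):
    """Global clustering: 3 * (number of triangles) / (number of connected triples)."""
    adj = adjacency(edges)
    # Each triangle is found once at each of its three corners,
    # so this count already equals 3 * n_triangles.
    closed = sum(1 for u in adj for a, b in combinations(adj[u], 2) if b in adj[a])
    triples = sum(len(nb) * (len(nb) - 1) // 2 for nb in adj.values())
    return closed / triples if triples else 0.0

def degree_assortativity(edges):
    """Pearson correlation between the degrees at the two ends of each link."""
    adj = adjacency(edges)
    deg = {u: len(nb) for u, nb in adj.items()}
    L = len(edges)
    mean_jk = sum(deg[u] * deg[v] for u, v in edges) / L
    mean_half = sum(0.5 * (deg[u] + deg[v]) for u, v in edges) / L
    mean_sq = sum(0.5 * (deg[u] ** 2 + deg[v] ** 2) for u, v in edges) / L
    denom = mean_sq - mean_half ** 2
    # A regular graph has zero degree variance, so r is undefined there.
    return (mean_jk - mean_half ** 2) / denom if denom else float("nan")

# A triangle (0, 1, 2) with one pendant node 3 attached to node 2.
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
print(clustering(edges), degree_assortativity(edges))
```

For this toy graph there is one triangle and five connected triples, so C = 3/5, and working through the sums gives r = -5/7: the pendant link joins the highest-degree node to the lowest-degree one, which makes the graph disassortative.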
To get ensembles with specific values of C or r we use
the following Hamiltonians:
\[ H_{C'} = \beta |C' - C_t|, \qquad H_{r'} = \beta |r' - r_t|, \tag{3} \]
where $C'$ is the current clustering coefficient and $C_t$ is
the target value, and similarly for $r'$. The parameter β
controls the strength of bias towards the target. It is a
transitive bias in $H_{C'}$ and a homophilic bias in $H_{r'}$.
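The degree-preserving rewiring move that underlies these ensembles can be sketched as follows. This is a minimal illustration, not the authors' code: the name `try_swap`, the edge-list plus set-of-frozensets representation, and the random choice between the two possible endpoint exchanges are assumptions. The text only states that moves creating a double link are skipped; rejecting self-loops as well is an added assumption.

```python
import random

def try_swap(edges, edge_set, rng):
    """Attempt one endpoint exchange; return True if the graph changed."""
    i, j = rng.sample(range(len(edges)), 2)      # two distinct links
    (a, b), (c, d) = edges[i], edges[j]
    if rng.random() < 0.5:                       # choose one of the two exchanges
        c, d = d, c
    new1, new2 = (a, d), (c, b)
    if a == d or c == b:                         # would create a self-loop (assumed rule)
        return False
    if frozenset(new1) in edge_set or frozenset(new2) in edge_set:
        return False                             # would create a double link: skip
    edge_set.difference_update({frozenset((a, b)), frozenset((c, d))})
    edge_set.update({frozenset(new1), frozenset(new2)})
    edges[i], edges[j] = new1, new2
    return True

def degrees(edges):
    """Degree of every node that appears in the edge list."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg

# Toy graph: a ring of 10 nodes with nearest and next-nearest neighbour links.
rng = random.Random(0)
edges = [(i, (i + 1) % 10) for i in range(10)] + [(i, (i + 2) % 10) for i in range(10)]
edge_set = {frozenset(e) for e in edges}
before = degrees(edges)
for _ in range(1000):
    try_swap(edges, edge_set, rng)
```

Each accepted swap removes one link from each of the four nodes involved and gives one back, so every node keeps its degree while its neighbours change.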
We employ simulated annealing based on a standard
Metropolis-Hastings procedure with a rewiring move
set [38, 39]. One pair of links in the network G is switched
to produce a new candidate network $G'$. A valid move is
accepted with probability
\[ p = e^{H(G) - H(G')}, \qquad p \le 1, \tag{4} \]
and rejected with probability 1 − p. If p > 1 the move is
accepted. Initially, the network is rewired $2 \times 10^5$
times at β = 0 to randomize links and avoid strong
hysteresis. Then β is increased slowly, rewiring
$5 \times 10^4$ times after each increase until C (or r)
hits $C_t$ (or $r_t$). The first network with $C = C_t$
($r = r_t$) is a single sample from the ensemble of
networks with a fixed degree sequence and $C = C_t$
($r = r_t$). The whole process then repeats, starting with
the β = 0 quench.
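The annealing loop described above might look roughly like the sketch below for the clustering Hamiltonian. It is an illustration under stated assumptions rather than the authors' implementation: the step counts, the β increment, and the tolerance `tol` used to decide that C has "hit" the target are all hypothetical and far smaller than the values quoted in the text, C is recomputed from scratch after every move for simplicity, and self-loops are rejected along with double links.

```python
import math
import random
from itertools import combinations

def clustering(adj):
    """Global clustering coefficient: 3 * triangles / connected triples."""
    closed = sum(1 for u in adj for a, b in combinations(adj[u], 2) if b in adj[a])
    triples = sum(len(nb) * (len(nb) - 1) // 2 for nb in adj.values())
    return closed / triples if triples else 0.0

def metropolis_steps(edges, adj, c, c_target, beta, steps, rng, tol=None):
    """Rewire at fixed beta; stop early once |C - c_target| <= tol."""
    for _ in range(steps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (x, y) = edges[i], edges[j]
        if rng.random() < 0.5:
            x, y = y, x
        # Candidate G' replaces links a-b and x-y with a-y and x-b.
        if a == y or x == b or y in adj[a] or b in adj[x]:
            continue  # self-loop or double link: skip the step
        adj[a].remove(b); adj[b].remove(a); adj[x].remove(y); adj[y].remove(x)
        adj[a].add(y); adj[y].add(a); adj[x].add(b); adj[b].add(x)
        c_new = clustering(adj)
        d_h = beta * (abs(c_new - c_target) - abs(c - c_target))  # H(G') - H(G)
        if d_h <= 0 or rng.random() < math.exp(-d_h):
            edges[i], edges[j], c = (a, y), (x, b), c_new          # accept
        else:                                                      # reject: undo
            adj[a].remove(y); adj[y].remove(a); adj[x].remove(b); adj[b].remove(x)
            adj[a].add(b); adj[b].add(a); adj[x].add(y); adj[y].add(x)
        if tol is not None and abs(c - c_target) <= tol:
            break
    return c

def sample_with_clustering(edges, c_target, rng, burn_in=2000, steps=300,
                           beta_step=20.0, tol=0.01, max_rounds=40):
    """Randomize at beta = 0, then raise beta until C is within tol of c_target."""
    edges = list(edges)
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    c = metropolis_steps(edges, adj, clustering(adj), c_target, 0.0, burn_in, rng)
    beta = 0.0
    for _ in range(max_rounds):
        if abs(c - c_target) <= tol:
            break
        beta += beta_step
        c = metropolis_steps(edges, adj, c, c_target, beta, steps, rng, tol)
    return edges, c, beta
```

A call such as `sample_with_clustering(ring_edges, 0.3, random.Random(1))`, with `ring_edges` any simple edge list, returns the rewired edge list, its clustering coefficient, and the last β used. The loop gives up after `max_rounds` increases, so the returned C should be checked against the target before the network is treated as a sample.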
We also study the influence of transitivity on trait
assortativity, $r_d$, which measures the tendency for nodes
to connect to others with the same discrete trait (e.g. race,
gender, etc.). For this we add a homophilic bias $\beta_d$
for links between nodes with the same trait. Defining
the sum $\sum_\delta e_{\delta\delta}$, where
$e_{\delta\delta}$ is the fraction of links in the network
from a node of type δ to another node of type δ,
the Hamiltonian becomes
\[ H_d = \beta |C - C_t| + \beta_d \dots \]
Choosing different values of $C_t$ and $\beta_d$ allows one
to explore how transitivity impacts trait assortativity at
the ensemble level.
We examine ensembles constrained to have a particular
value of r (resp. C) and measure the value for the other
feature C (resp. r) averaged over 100 samples from the
ensemble. Results are shown in Fig. 2. The grey (resp.
black) symbols show the values for ensembles with con-
strained r (resp. C). Increasing transitivity to increase C
has a strong influence on r in all cases, whereas increas-
ing homophily to increase r has relatively little impact
on C. The asymmetry is strongest for narrow degree
distributions (e.g. the ER network), and becomes less
pronounced, but still apparent, as the degree distribution
broadens.
FIG. 2: Controlling assortativity (grey symbols) vs.
controlling clustering coefficient (black symbols) for various
network degree sequences. C is on the x-axis, r on the y-axis.
Each point represents average values from 100 samples from an
ensemble with specified r or C values. The dashed lines show
the values of r and C for the original network. Note the
asymmetry between the effect of C on r compared to r on C.
The asymmetric relationship between r and C can
be understood as follows: For nodes to participate in
as many transitive relationships as possible, their
neighbours must be of similar degree. Hence increasing
clustering also increases r. Increasing r leads to links
between nodes of similar degree, but these relationships
need not be transitive. For narrow degree distributions, one
could divide all nodes of degree k into two groups and only
permit links between the two groups. Assortativity would
be maximum, in the absence of any clustering.
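The two-group construction can be checked on a toy example. The sketch below is illustrative only: it builds a graph with two degree classes (a 6-cycle for the degree-2 nodes and a complete bipartite graph K_{3,3} for the degree-3 nodes), each split into two groups with links running only between the groups, and evaluates C and r using the standard transitivity ratio and Newman's Pearson form. The graph, node labels, and function name are hypothetical choices made for this example.

```python
from itertools import combinations

def clustering_and_assortativity(edges):
    """Return (C, r) for a simple undirected edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # Closed pairs of neighbours, summed over nodes, equal 3 * triangles.
    closed = sum(1 for u in adj for a, b in combinations(adj[u], 2) if b in adj[a])
    triples = sum(len(nb) * (len(nb) - 1) // 2 for nb in adj.values())
    C = closed / triples if triples else 0.0
    deg = {u: len(nb) for u, nb in adj.items()}
    L = len(edges)
    mean_jk = sum(deg[u] * deg[v] for u, v in edges) / L
    mean_half = sum(0.5 * (deg[u] + deg[v]) for u, v in edges) / L
    mean_sq = sum(0.5 * (deg[u] ** 2 + deg[v] ** 2) for u, v in edges) / L
    denom = mean_sq - mean_half ** 2
    r = (mean_jk - mean_half ** 2) / denom if denom else float("nan")
    return C, r

# Degree-2 nodes: a 6-cycle; even and odd positions form the two groups.
cycle = [(f"c{i}", f"c{(i + 1) % 6}") for i in range(6)]
# Degree-3 nodes: K_{3,3}; the "a" and "b" nodes form the two groups.
k33 = [(f"a{i}", f"b{j}") for i in range(3) for j in range(3)]

C, r = clustering_and_assortativity(cycle + k33)
print(C, r)
```

Every link joins two nodes of equal degree, so r reaches its maximum of 1, while both pieces are bipartite and therefore contain no triangles, so C = 0.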
On the other hand, for broad degree distributions (like
PGP) only a few nodes of high degree exist, but they
have a large effect on r. Hence for large r, the high-
est degree nodes are under strong pressure to link, thus
creating many transitive relationships. Many social net-
works do not have broad degree distributions. In such
cases homophily has only a weak influence on C at the ensemble level.
Fig. 2 also indicates the C and r values for the real-
world networks (dashed lines). Ensembles of networks
constrained to have the same C as the real network ex-
hibit far greater r. Hence, social networks are actually
disassortative relative to the ensemble of networks with
the same clustering coefficient and degree sequence .
Indeed, the most likely way to create many triangles is
to densely interconnect the higher degree nodes so tri-
angles clump together (as discussed in Ref. ). Real
social networks seem to spread clustering more evenly
across the network, thus lowering r. For example in sci-
entific collaboration networks, supervisory relationships
may decrease the assortativity by creating links between
lower degree students and higher degree professors.
We next consider the influence of r and C on modular-
ity. Many methods for extracting community structure
exist [41, 42]. For definiteness, we use the one proposed
by Newman and Girvan: Given a partition of the
network, $e_{ij}$ is the fraction of all edges connecting a
node in community i to one in community j, and
$a_i = \sum_j e_{ij}$ is the fraction of all link ends
attached to nodes in community i. The modularity of the
network given partition P is defined as:
\[ Q_P = \sum_i \left( e_{ii} - a_i^2 \right). \]
We use an agglomerative method to approximate the
best partition and largest $Q_P$, which we denote Q.
FIG. 3: (Color Online) Modularity Q for various ensembles
of networks with different target values for C (top row) or
r (bottom row). Clustering has a much larger impact on
modularity than assortativity does.
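For a given partition, the modularity can be evaluated directly from the edge list. The sketch below assumes the Newman-Girvan form, Q_P = sum over communities i of (e_ii − a_i²), with a_i taken as the fraction of link ends attached to community i. It only scores a partition supplied by the caller; it does not implement the agglomerative search for the best partition. The function name and the example graph are illustrative.

```python
def modularity(edges, community):
    """Q_P = sum_i (e_ii - a_i^2) for the partition given by `community`."""
    L = len(edges)
    e_in = {}  # e_ii: fraction of links with both ends in community i
    a = {}     # a_i: fraction of link ends attached to community i
    for u, v in edges:
        cu, cv = community[u], community[v]
        if cu == cv:
            e_in[cu] = e_in.get(cu, 0.0) + 1.0 / L
        a[cu] = a.get(cu, 0.0) + 0.5 / L
        a[cv] = a.get(cv, 0.0) + 0.5 / L
    return sum(e_in.get(c, 0.0) - a[c] ** 2 for c in a)

# Two triangles joined by a single link, split into the two obvious communities.
two_triangles = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
partition = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
print(modularity(two_triangles, partition))
```

Here each community holds 3 of the 7 links and half of all link ends, so Q_P = 6/7 − 1/2 = 5/14. Putting every node in a single community gives Q_P = 0, which is why Q is obtained by maximizing over partitions.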
The top (resp. bottom) panel in Fig. 3 shows the aver-
age Q in ensembles with constrained C (resp. r). Tran-
sitivity has a more pronounced effect on modularity than
does homophily. The modularity achieved for the highly
clustered ensembles approximates the actual modularity
for the real networks (HEP, NetSci, and PGP; see Table I),
unlike assortative ensembles without a transitive bias.
Finally, we consider the effect of transitivity on trait
assortativity, $r_d$. For each of the degree sequences, we
create ensembles of networks with different target C values
and varying homophilic biases $\beta_d$. Since the actual
data sets do not contain trait values, we assign each node
one of three possible traits at random with equal
probability. For ER, HEP, and NetSci we observe that
ensembles with larger C enhance $r_d$ relative to ensembles
with the same homophilic bias but no clustering (C = 0).
This is especially clear for the narrowest (ER) degree
sequence. For the PGP network, which has a broad degree
distribution, clustering appears to compete with the
homophilic bias (e.g. the curves cross), leading to a more
complicated scenario. The interdependence between
clustering and trait assortativity thus appears to depend on
the degree sequence, but for narrow degree sequences the
positive relationship holds and transitivity enhances the
effect of homophilic bias. We also note that increasing the
trait assortativity of an ensemble had no impact on C, r, or
Q (data not shown).
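The trait bookkeeping can be sketched as follows. The first function computes the fraction of links joining nodes with the same trait, which is the sum of same-type link fractions used in the text. The second normalizes it into a trait assortativity coefficient, assuming Newman's standard form for discrete traits; whether the paper uses exactly this normalization is an assumption. The trait labels and function names are hypothetical.

```python
import random

def same_trait_fraction(edges, trait):
    """Fraction of links whose two end nodes carry the same trait."""
    return sum(1 for u, v in edges if trait[u] == trait[v]) / len(edges)

def trait_assortativity(edges, trait):
    """(sum_d e_dd - sum_d a_d^2) / (1 - sum_d a_d^2); Newman's form, assumed here."""
    L = len(edges)
    ends = {}  # a_d: fraction of link ends attached to nodes with trait d
    for u, v in edges:
        for n in (u, v):
            ends[trait[n]] = ends.get(trait[n], 0.0) + 0.5 / L
    expected = sum(x * x for x in ends.values())
    if expected >= 1.0:  # every node has the same trait: coefficient undefined
        return float("nan")
    return (same_trait_fraction(edges, trait) - expected) / (1.0 - expected)

def random_traits(nodes, rng, labels=("A", "B", "C")):
    """Assign each node one of three traits at random with equal probability."""
    return {n: rng.choice(labels) for n in nodes}

# A 4-cycle with traits alternating in pairs: half of the links are same-trait.
square = [(0, 1), (1, 2), (2, 3), (3, 0)]
pairs = {0: "A", 1: "A", 2: "B", 3: "B"}
print(same_trait_fraction(square, pairs), trait_assortativity(square, pairs))
```

In the 4-cycle, half of the links join equal traits, which is exactly what random mixing would give with two equally represented traits, so the coefficient is 0. Two disjoint links that each join a same-trait pair give the maximum value of 1.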
FIG. 4: (Color Online) Trait assortativity $r_d$ (y-axis) for
ensembles of networks with varying C (indicated in the legend)
and homophilic bias $\beta_d$ (x-axis). For narrow degree
distributions clustering amplifies the response of trait
assortativity on homophilic bias. For broad degree
distribution the opposite occurs for small $\beta_d$.
We conjecture that the standard nonsocial/social
(disassortative/assortative) dichotomy is driven by transitive
relationships in many social networks, such as in scientific
collaborations. As shown here, transitivity typically
leads to assortativity. This explains the anomalous position
of TAP located within social networks, and is consistent
with another anomaly in Fig. 1: several online social
networks show low clustering and low assortativity.
If assortative mixing by degree is the result of homophily
by degree in social networks, this anomaly is hard to
explain: why should popular people stop seeking each other
out simply because the social network moved online? But
if assortativity is a side-effect of transitivity, this effect
is easier to understand: it is plausible that online social
relationships are less transitive, since in the absence
of spatially mediated interactions there is a smaller
tendency to introduce mutual friends. We have not ruled
out the scenario in reference . Indeed, the causal
factors driving network evolution are likely to be complex,
multifaceted, and idiosyncratic. Our results on the
asymmetric dependencies between clustering, assortativity,
and modularity provide a warning about inferring
causality from naive observations of network structure.
[1] A. Broder et al., Comp. Netw. 33, 309 (2000).
[2] S. Boccaletti et al., Phys. Rep. 424, 175 (2006).
[3] A. Barabási and Z. Oltvai, Nat. Genet. 5, 101 (2004).
[4] M. E. J. Newman, SIAM Review 45, 167 (2003).
[5] S. N. Soffer and A. Vazquez, Phys. Rev. E 71, 057101
[6] P. Holme and J. Zhao, Phys. Rev. E 75, 046111 (2007).
[7] D. J. Watts and S. H. Strogatz, Nature 393, 440 (1998).
[8] M. E. J. Newman, Phys. Rev. E 68, 026121 (2003).
[9] M. E. J. Newman, Phys. Rev. Lett. 89, 208701 (2002).
[10] M. E. J. Newman and M. Girvan, Phys. Rev. E 69,
[11] G. Kossinets and D. Watts, AJS 115, 405 (2009).
[12] M. E. J. Newman and J. Park, Phys. Rev. E 68, 036122
[13] M. McPherson, L. Smith-Lovin, and J. Cook, Annu. Rev. Sociol. 27, 415 (2001).
[14] A. Rapoport, Bull. Math. Biol. 15, 523 (1953).
[15] M. E. J. Newman, PNAS 98, 404 (2001).
[16] Y.-Y. Ahn, S. Han, H. Kwak, S. Moon, and H. Jeong, in Proceedings of the 16th international conference on World Wide Web (ACM, 2007).
[17] D. Lusseau et al., Behav. Ecol. Sociobiol. 54, 396 (2003).
[18] R. Guimerá, L. Danon, A. Díaz-Guilera, F. Giralt, and A. Arenas, Phys. Rev. E 68, 065103 (2003).
[19] P. M. Gleiser and L. Danon, Adv. Complex Sys. 6, 565573
[20] M. E. J. Newman, Phys. Rev. E 74, 036104 (2006).
[21] P. Holme, C. R. Edling, and F. Liljeros, Soc. Networks 26, 155 (2004).
[22] M. Boguñá, R. Pastor-Satorras, A. Díaz-Guilera, and A. Arenas, Phys. Rev. E 70, 056122 (2004).
[23] J. Duch and A. Arenas, Phys. Rev. E 72, 027104 (2005).
[24] H. Jeong, B. Tombor, R. Albert, Z. N. Oltvai, and A. Barabási, Nature 407, 651 (2000).
[25] M. E. J. Newman,
[26] A. C. Gavin et al., Nature 415, 141 (2002).
[27] H. Jeong, S. P. Mason, A. Barabási, and Z. N. Oltvai, Nature 411, 41 (2001).
[28] O. Puig et al., Methods 24, 218 (2001).
[29] S. Fields and O. Song, Nature 340, 245 (1989).
[30] M. E. J. Newman, S. H. Strogatz, and D. J. Watts, Phys. Rev. E 64, 026118 (2001).
[31] P. Erdős and A. Rényi, Publ. Math. Debrecen 6, 156
[32] S. Maslov and K. Sneppen, Science 296, 910 (2002).
[33] J. G. Foster, D. V. Foster, M. Paczuski, Phys. Rev. E 76, 46112 (2007).
[34] R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, and U. Alon, Science 298, 824 (2002).
[35] J. Berg and M. Lässig, Phys. Rev. Lett. 89, 228701
[36] J. Park and M. E. J. Newman, Phys. Rev. E 70, 066117
[37] D. V. Foster, J. G. Foster, M. Paczuski, and P. Grassberger, Phys. Rev. E 81, 046115 (2010).
[38] W. K. Hastings, Biometrika 57, 97 (1970).
[39] M. E. J. Newman and G. T. Barkema, Monte Carlo methods in statistical physics (Oxford Univ. Press, 1999).
[40] J. G. Foster, D. V. Foster, M. Paczuski, and P. Grassberger, P. Natl. Acad. Sci. USA 107, 10815 (2010).
[41] C. Porter, J. Computer-Mediated Comm. 10 (2004).
[42] S. Fortunato, Phys. Rep. 486, 75 (2010).
[43] A. Clauset, M. E. J. Newman, and C. Moore, Phys. Rev. E 70, 066111 (2004).
[44] H. Hu and X. Wang, Europhys. Lett. 86, 18003 (2009).