Clustering drives assortativity and community structure in ensembles of networks.
ABSTRACT Clustering, assortativity, and communities are key features of complex networks. We probe dependencies between these features and find that ensembles of networks with high clustering display both high assortativity by degree and prominent community structure, while ensembles with high assortativity show much less enhancement of the clustering or community structure. Further, clustering can amplify a small homophilic bias for trait assortativity in network ensembles. This marked asymmetry suggests that transitivity could play a larger role than homophily in determining the structure of many complex networks.
David V. Foster,1,∗ Jacob G. Foster,2 Peter Grassberger,1,3 and Maya Paczuski1
1 Complexity Science Group, University of Calgary, Calgary T2N 1N4, Canada
2 Department of Sociology, University of Chicago, Chicago 60615, USA
3 NIC, Forschungszentrum Jülich, D-52425 Jülich, Germany
(Dated: January 6, 2011)
Clustering, assortativity, and communities are key features of complex networks. We probe depen-
dencies between these attributes and find that ensembles with strong clustering display both high
assortativity by degree and prominent community structure, while ensembles with high assortativ-
ity are much less biased towards clustering or community structure. Further, clustered networks
can amplify small homophilic bias for trait assortativity. This marked asymmetry suggests that
transitivity, rather than homophily, drives the standard nonsocial/social network dichotomy.
PACS numbers: 89.75.Hc, 05.10.Ln, 89.75.Fb, 64.60.aq
Networks provide convenient representations for di-
verse phenomena spanning physical, technological, so-
cial, biological and informational domains [1–4]. They
are often complicated, historically contingent assemblies
created by nonlinear processes. Just as it is meaningful
to “explain” features of real networks with simple gen-
erative mechanisms, it is also informative to ask what
properties to expect given no other information about a
network save that it has a certain set of properties.
In fact, network properties can be markedly interde-
pendent [5, 6]. We focus on three key features of undi-
rected networks: (1) the clustering coefficient, C, which
reflects the tendency of the network to form triangles
(transitivity) [7, 8]; (2) the assortativity, r, which reflects
the tendency of similar nodes to connect to one another
(homophily); and (3) the modularity, Q, which reflects the
tendency of nodes to form tightly interconnected communities.
We show that ensembles of networks constrained by a
transitive bias to be strongly clustered also become highly
degree-assortative and modular. In contrast, ensembles
constrained by a homophilic bias to be highly assortative
show only weak clustering or modularity. Hence, at the
ensemble level a fundamental asymmetry exists between
transitivity and homophily. This asymmetry holds un-
less the distribution of the number of links attached to
each node (the node’s degree) is extremely broad. Furthermore,
a transitive bias can amplify the effect of a homophilic bias
towards trait (e.g. race, age, or education) assortativity
in network ensembles.
High values for the clustering, assortativity, and modularity
are often observed in real-world social networks, while
nonsocial networks may have low values. Although extensive
social science literature posits homophily to be a dominant
force in social network formation [11, 13] (since social
networks are highly assortative), our results show that a bias
for transitive relationships (also called “triadic closure” in
the sociology literature) is sufficient to obtain this effect in
network ensembles. Our work is complementary to that of Newman
and Park, who produce the assortativity and clustering
characteristic of social networks by introducing modularity.
FIG. 1: The relationship between the clustering coefficient,
C, and the assortativity, r (in-figure label: correlation = .79).
Gray points represent social networks, black points represent
other types of networks. Social networks: astro phys (scientific
collaboration); condensed matter (scientific collaboration);
Cyworld (online social); dolphins (friendship); email
(communication); HEP (scientific collaboration); jazz (musical
collaboration); MySpace (online social); network science
(scientific collaboration); nioki (online social); orkut (online
social); PGP (communication network); pussokram (online dating).
Non-social networks: C. elegans (neural); E. coli (metabolic);
internet (router level); power (connections between power
stations); TAP (yeast protein-protein binding); word adjacency
(in English text); Y2H (yeast protein-protein binding).

To begin, we note a distinct empirical correlation be-
tween C and r in real networks illustrated in Fig. 1, with
social networks (generally) in the high C, high r cor-
ner, and non-social networks (generally) in the low C,
low r one. The pattern suggests an interdependence between
the two features that transcends a simple nonsocial/social
dichotomy. For instance, consider two networks in Fig. 1: TAP
is a high C, high r protein-protein interaction network,
generated by tandem affinity purification experiments; Y2H is
a weakly clustered, disassortative protein-protein interaction
network, generated using yeast two hybridization. The
experimental methodology, by itself, can explain the
difference, since TAP pulls out bound complexes and assigns
links to every pair of proteins in the complex while Y2H tests
each pair of proteins individually for direct binding. Since
transitivity has a natural origin in the construction of the
TAP network, it is likely that the observed assortativity
arises solely as a byproduct of the interrelationship between
transitivity and assortativity rather than any direct
homophilic tendency between proteins.

arXiv:1012.2384v2 [physics.soc-ph] 5 Jan 2011

TABLE I: Important values for the empirical networks
Network   N       L       r
ER        19680   41000   -1.3e-5
HEP       7610    15751   .29
NetSci    1461    2742    .46
PGP       10680   24316   .24

Since network properties often depend conspicuously on the
degree sequence (the number of links attached to each node),
we consider ensembles of networks constrained to have the same
fixed degree sequence (FDS). Three real world networks are
studied in detail: a collaboration network of high energy
physicists (HEP); a collaboration network of network scientists
(NetSci); and an encrypted communication network (PGP). We also
examine a randomly generated Erdős-Rényi network (ER). Basic
network parameters are given in Table I.
We use a rewiring procedure [32, 33] to sample from
each ensemble. At each step of the procedure two links
are chosen at random and their endpoints are exchanged,
unless this would create a double link, in which case the
step is skipped. This move set preserves the degree of
each node but otherwise randomizes connections. To sample
ensembles with specific features, we use a network Hamiltonian
H(G) [34–37] to define an exponential ensemble by assigning a
sampling weight P(G) ∝ e^{−H(G)} to each graph G. Here we
consider ensembles where H(G) depends on C, r and/or trait
assortativity defined below. Denoting the number of triangles
in G by n_Δ, the degree of node i by k_i, and the number of
nodes by N, the clustering coefficient is defined as

$$C = \frac{3 n_\Delta}{\frac{1}{2}\sum_{i=1}^{N} k_i (k_i - 1)}. \qquad (1)$$

Assortativity by degree is defined as the Pearson correlation
coefficient between the degrees of nodes joined by a link:

$$r = \frac{L^{-1}\sum_i j_i k_i - \left[L^{-1}\sum_i \tfrac{1}{2}(j_i + k_i)\right]^2}{L^{-1}\sum_i \tfrac{1}{2}(j_i^2 + k_i^2) - \left[L^{-1}\sum_i \tfrac{1}{2}(j_i + k_i)\right]^2}, \qquad (2)$$

where L is the number of links in the network and j_i and k_i
are the degrees of the nodes at each end of link i.
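
As a concrete illustration of these two definitions, the following minimal Python sketch computes C and r from an undirected edge list. It assumes the standard transitivity form of C (three times the number of triangles divided by the number of connected triples) and the Pearson form of r taken over the two ends of every link. The function names and the toy graph are illustrative and are not taken from our code.

```python
from collections import defaultdict
from itertools import combinations


def build_adjacency(edges):
    """Return {node: set(neighbours)} for an undirected simple graph."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)


def clustering_coefficient(edges):
    """C = 3 * n_triangles / sum_i k_i (k_i - 1) / 2."""
    adj = build_adjacency(edges)
    # Each triangle is found once from each of its three corners,
    # so this sum already equals 3 * n_triangles.
    closed = sum(
        1
        for node in adj
        for a, b in combinations(adj[node], 2)
        if b in adj[a]
    )
    triples = sum(len(nb) * (len(nb) - 1) // 2 for nb in adj.values())
    return closed / triples if triples else 0.0


def degree_assortativity(edges):
    """Pearson correlation between the degrees at the two ends of each link."""
    adj = build_adjacency(edges)
    L = len(edges)
    ends = [(len(adj[u]), len(adj[v])) for u, v in edges]
    mean_prod = sum(j * k for j, k in ends) / L
    mean_half = sum(0.5 * (j + k) for j, k in ends) / L
    mean_sq = sum(0.5 * (j * j + k * k) for j, k in ends) / L
    variance = mean_sq - mean_half ** 2
    if variance == 0:
        raise ValueError("r is undefined when all link ends have equal degree")
    return (mean_prod - mean_half ** 2) / variance


# Toy graph (hypothetical): a triangle 0-1-2 with a pendant node 3 attached to 2.
toy_edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
print(clustering_coefficient(toy_edges), degree_assortativity(toy_edges))
```

For this toy graph there is one triangle and five connected triples, so C = 3/5, while the pendant link between a degree-3 and a degree-1 node makes the graph disassortative (r = −5/7).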
To get ensembles with specific values of C or r we use
the following Hamiltonians:
$$H_{C^\star} = \beta\,|C^\star - C_t|, \qquad H_{r^\star} = \beta\,|r^\star - r_t|, \qquad (3)$$
where C⋆ is the current clustering coefficient and C_t is
the target value, and similarly for r⋆. The parameter β
controls the strength of bias towards the target. It is a
transitive bias in H_{C⋆} and a homophilic bias in H_{r⋆}.
We employ simulated annealing based on a standard
Metropolis-Hastings procedure with a rewiring move
set [38, 39]. One pair of links in the network G is switched
to produce a new candidate network G⋆. A valid move is
accepted with probability

$$p = e^{H(G) - H(G^\star)}, \qquad p \le 1, \qquad (4)$$

and rejected with probability 1 − p. If p > 1 the move is
accepted. Initially, the network is rewired 2 × 10^5 times at
β = 0 to randomize links and avoid strong hysteresis.
Then β is increased slowly, rewiring 5 × 10^4 times after
each increase until C (or r) hits C_t (or r_t). The first
network with C = C_t (r = r_t) is a single sample from
the ensemble of networks with a fixed degree sequence
and C = C_t (r = r_t). The whole process then repeats,
starting with the β = 0 quench.
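
The annealing loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not our production code: the clustering coefficient is recomputed from scratch after every move (slow but simple), the target is considered reached within a tolerance `tol` rather than exactly, moves creating self-loops are skipped along with double links, and all function names and default parameters are hypothetical.

```python
import math
import random
from itertools import combinations


def clustering(adj):
    """C = 3 * triangles / connected triples, recomputed from scratch."""
    closed = sum(1 for n in adj for a, b in combinations(adj[n], 2) if b in adj[a])
    triples = sum(len(nb) * (len(nb) - 1) // 2 for nb in adj.values())
    return closed / triples if triples else 0.0


def rewire(edges, adj, i, j, new_i, new_j):
    """Replace links i and j by new_i and new_j, keeping adj in sync."""
    for u, v in (edges[i], edges[j]):
        adj[u].discard(v)
        adj[v].discard(u)
    for u, v in (new_i, new_j):
        adj[u].add(v)
        adj[v].add(u)
    edges[i], edges[j] = new_i, new_j


def metropolis_step(edges, adj, beta, c_target, rng):
    """One attempted endpoint exchange with H = beta * |C - c_target|."""
    i, j = rng.sample(range(len(edges)), 2)
    (a, b), (c, d) = edges[i], edges[j]
    if rng.random() < 0.5:  # pick one of the two possible endpoint exchanges
        c, d = d, c
    if a == d or c == b or d in adj[a] or b in adj[c]:
        return  # would create a self-loop or a double link: skip the step
    old = (edges[i], edges[j])
    h_old = beta * abs(clustering(adj) - c_target)
    rewire(edges, adj, i, j, (a, d), (c, b))
    h_new = beta * abs(clustering(adj) - c_target)
    # Accept with probability min(1, exp(H_old - H_new)); otherwise undo the move.
    if h_new > h_old and rng.random() >= math.exp(h_old - h_new):
        rewire(edges, adj, i, j, *old)


def sample_with_clustering(edges, c_target, rng, beta_step=5.0,
                           sweeps=200, tol=0.01, max_rounds=50):
    """Return (edge list, reached) after annealing C towards c_target."""
    edges = list(edges)
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    beta = 0.0  # the first block runs at beta = 0 and only randomizes links
    for _ in range(max_rounds + 1):
        for _ in range(sweeps):
            metropolis_step(edges, adj, beta, c_target, rng)
            if beta > 0 and abs(clustering(adj) - c_target) <= tol:
                return edges, True
        beta += beta_step  # raise the bias only after a block of moves
    return edges, False


# Toy usage: a ring lattice with 30 nodes of degree 4, target C = 0.3.
n = 30
ring = [(i, (i + 1) % n) for i in range(n)] + [(i, (i + 2) % n) for i in range(n)]
sample, reached = sample_with_clustering(ring, 0.3, random.Random(0),
                                         sweeps=100, max_rounds=20)
```

Every accepted or rejected move leaves the degree of each node unchanged, which is the defining property of the fixed degree sequence ensemble. The block sizes used here are far smaller than the 2 × 10^5 and 5 × 10^4 rewirings quoted in the text and are chosen only so that the toy example runs quickly.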
We also study the influence of transitivity on trait
assortativity, r_d, which measures the tendency for nodes to
connect to others with the same discrete trait (e.g. race,
gender, etc.). For this we add a homophilic bias β_d
for links between nodes with the same trait. Defining
the sum Σ_δ e_δδ, where e_δδ is the fraction of links in the
network from a node of type δ to another node of type δ,
the Hamiltonian becomes

$$H_d = \beta\,|C - C_t| + \beta_d \ldots \qquad (5)$$

Choosing different values of C_t and β_d allows one to
explore how transitivity impacts trait assortativity at the
ensemble level.
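
For illustration, the same-trait link fraction and a trait assortativity coefficient can be computed as in the sketch below. It assumes Newman's discrete assortativity coefficient for undirected networks, r_d = (Σ_δ e_δδ − Σ_δ a_δ²) / (1 − Σ_δ a_δ²), where a_δ is the fraction of link ends attached to nodes of type δ; this explicit form, the function names and the toy data are assumptions made for the example.

```python
from collections import Counter


def same_trait_fraction(edges, trait):
    """Sum over traits of e_dd: fraction of links joining two nodes of equal trait."""
    return sum(1 for u, v in edges if trait[u] == trait[v]) / len(edges)


def trait_assortativity(edges, trait):
    """Discrete assortativity (assumed form): (sum e_dd - sum a_d^2) / (1 - sum a_d^2)."""
    ends = Counter()
    for u, v in edges:
        ends[trait[u]] += 1
        ends[trait[v]] += 1
    total_ends = 2 * len(edges)
    expected = sum((count / total_ends) ** 2 for count in ends.values())
    if expected == 1.0:
        raise ValueError("r_d is undefined when every node has the same trait")
    return (same_trait_fraction(edges, trait) - expected) / (1.0 - expected)


# Toy data (hypothetical): two triangles with different traits joined by one link.
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
toy_trait = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
print(same_trait_fraction(toy_edges, toy_trait), trait_assortativity(toy_edges, toy_trait))
```

Here six of the seven links join nodes of the same trait, and each trait holds half of the link ends, so the coefficient is (6/7 − 1/2) / (1 − 1/2) = 5/7.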
We examine ensembles constrained to have a particular
value of r (resp. C) and measure the value for the other
feature C (resp. r) averaged over 100 samples from the
ensemble. Results are shown in Fig. 2. The grey (resp.
black) symbols show the values for ensembles with con-
strained r (resp. C). Increasing transitivity to increase C
has a strong influence on r in all cases, whereas increas-
ing homophily to increase r has relatively little impact
on C. The asymmetry is strongest for narrow degree
distributions (e.g. the ER network), and becomes less
pronounced, but still apparent, as the degree distribution
broadens.
FIG. 2: Controlling assortativity (grey symbols) vs. controlling
clustering coefficient (black symbols) for various network
degree sequences. C is on the x-axis, r on the y-axis. Each
point represents average values from 100 samples from an
ensemble with specified r or C values. The dashed lines show
the values of r and C for the original network. Note the
asymmetry between the effect of C on r compared to r on C.

The asymmetric relationship between r and C can
be understood as follows: For nodes to participate in
as many transitive relationships as possible, their
neighbours must be of similar degree. Hence increasing
clustering also increases r. Increasing r leads to links between
nodes of similar degree, but these relationships need not
be transitive. For narrow degree distributions, one could
divide all nodes of degree k into two groups and only permit
links between the two groups. Assortativity would
be maximum, in the absence of any clustering.
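
The two-group construction can be checked directly on a toy graph. The sketch below is an illustrative example with hypothetical helper names: it builds two complete bipartite components, one for degree 2 and one for degree 3, so that every link joins nodes of equal degree while no triangle can form.

```python
from itertools import combinations, product


def adjacency(edges):
    """Return {node: set(neighbours)} for an undirected simple graph."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def count_triangles(adj):
    # Every triangle is found once from each of its three corners.
    corners = sum(1 for n in adj for a, b in combinations(adj[n], 2) if b in adj[a])
    return corners // 3


def degree_assortativity(edges, adj):
    """Pearson correlation between the degrees at the two ends of each link."""
    L = len(edges)
    ends = [(len(adj[u]), len(adj[v])) for u, v in edges]
    mean_prod = sum(j * k for j, k in ends) / L
    mean_half = sum(0.5 * (j + k) for j, k in ends) / L
    mean_sq = sum(0.5 * (j * j + k * k) for j, k in ends) / L
    return (mean_prod - mean_half ** 2) / (mean_sq - mean_half ** 2)


# Degree-2 nodes: groups {0, 1} and {2, 3}; degree-3 nodes: groups {4, 5, 6} and {7, 8, 9}.
# Links are permitted only between the two groups of the same degree class.
edges = list(product([0, 1], [2, 3])) + list(product([4, 5, 6], [7, 8, 9]))
adj = adjacency(edges)
triangles = count_triangles(adj)
r = degree_assortativity(edges, adj)
print(triangles, r)
```

The graph has no triangles, so C = 0, while r takes its maximum value of 1. At least two distinct degree classes are needed in such an example, since r is undefined when all degrees are equal.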
On the other hand, for broad degree distributions (like
PGP) only a few nodes of high degree exist, but they
have a large effect on r. Hence for large r, the high-
est degree nodes are under strong pressure to link, thus
creating many transitive relationships. Many social networks
do not have broad degree distributions. In such
cases homophily has only a weak influence on C at the
ensemble level.
Fig. 2 also indicates the C and r values for the real-
world networks (dashed lines). Ensembles of networks
constrained to have the same C as the real network ex-
hibit far greater r. Hence, social networks are actually
disassortative relative to the ensemble of networks with
the same clustering coefficient and degree sequence.
Indeed, the most likely way to create many triangles is
to densely interconnect the higher degree nodes so tri-
angles clump together (as discussed in Ref. ). Real
social networks seem to spread clustering more evenly
across the network, thus lowering r. For example in sci-
entific collaboration networks, supervisory relationships
may decrease the assortativity by creating links between
lower degree students and higher degree professors.
FIG. 3: (Color Online) Modularity Q for various ensembles
of networks with different target values for C (top row) or
r (bottom row). Clustering has a much larger impact on
modularity than assortativity does.

We next consider the influence of r and C on modularity.
Many methods for extracting community structure
exist [41, 42]. For definiteness, we use the one proposed
by Newman and Girvan: Given a partition of the
network, e_ij is the fraction of all edges connecting a node
in community i to one in community j, and a_i = Σ_j e_ij,
so that e_ii is the fraction of all links within community i.
The modularity of the network given partition P is defined as:

$$Q_P = \sum_i \left(e_{ii} - a_i^2\right). \qquad (6)$$

We use an agglomerative method to approximate the
best partition and largest Q_P, which we denote Q.
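
To make the definition concrete, the sketch below evaluates the modularity of a given partition, assuming the usual Newman-Girvan form in which Q_P sums e_ii − a_i² over communities, with a_i the fraction of link ends attached to community i. It only scores a supplied partition; it does not implement the agglomerative search for the best one. The function name and toy data are illustrative.

```python
from collections import Counter


def modularity(edges, community):
    """Q_P = sum_i (e_ii - a_i^2) for an undirected graph and a node -> community map."""
    L = len(edges)
    inside = Counter()  # number of links with both ends in community i
    ends = Counter()    # number of link ends attached to community i
    for u, v in edges:
        cu, cv = community[u], community[v]
        ends[cu] += 1
        ends[cv] += 1
        if cu == cv:
            inside[cu] += 1
    return sum(inside[c] / L - (ends[c] / (2 * L)) ** 2 for c in ends)


# Toy data (hypothetical): two triangles joined by a single link.
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
by_triangle = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
all_in_one = {node: 0 for node in range(6)}
print(modularity(toy_edges, by_triangle), modularity(toy_edges, all_in_one))
```

Splitting the toy graph into its two triangles gives Q_P = 2 (3/7 − 1/4) = 5/14, whereas placing all nodes in one community gives Q_P = 0.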
The top (resp. bottom) panel in Fig. 3 shows the aver-
age Q in ensembles with constrained C (resp. r). Tran-
sitivity has a more pronounced effect on modularity than
does homophily. The modularity achieved for the highly
clustered ensembles approximates the actual modularity
for the real networks (HEP, NetSci, and PGP; see Table I),
unlike assortative ensembles without a transitive bias.
Finally, we consider the effect of transitivity on trait
assortativity, r_d. For each of the degree sequences, we
create ensembles of networks with different target C values
and varying homophilic biases β_d. Since the actual data
sets do not contain trait values, we assign each node one
of three possible traits at random with equal probability.
For ER, HEP, and NetSci we observe that ensembles with
larger C enhance r_d relative to ensembles with the same
homophilic bias but no clustering (C = 0). This is
especially clear for the narrowest (ER) degree sequence. For
the PGP network, which has a broad degree distribution,
clustering appears to compete with the homophilic bias
(e.g. the curves cross), leading to a more complicated
scenario. The interdependence between clustering and
trait assortativity thus appears to depend on the degree
sequence, but for narrow degree sequences the positive
relationship holds and transitivity enhances the effect of
homophilic bias. We also note that increasing the trait
assortativity of an ensemble had no impact on C, r, or
Q (data not shown).
FIG. 4: (Color Online) Trait assortativity r_d (y-axis) for
ensembles of networks with varying C (indicated in the legend)
and homophilic bias β_d (x-axis). For narrow degree
distributions clustering amplifies the response of trait
assortativity to homophilic bias. For broad degree distributions
the opposite occurs for small β_d.

We conjecture that the standard nonsocial/social
(disassortative/assortative) dichotomy is driven by transitive
relationships in many social networks, such as in scientific
collaborations. As shown here, transitivity typically
leads to assortativity. This explains the anomalous position
of TAP located within social networks, and is consistent
with another anomaly in Fig. 1: several online social
networks show low clustering and low assortativity.
If assortative mixing by degree is the result of homophily
by degree in social networks, this anomaly is hard to ex-
plain: why should popular people stop seeking each other
out simply because the social network moved online? But
if assortativity is a side-effect of transitivity, this effect
is easier to understand: it is plausible that online so-
cial relationships are less transitive, since in the absence
of spatially mediated interactions there is a smaller ten-
dency to introduce mutual friends. We have not ruled
out the scenario in reference . Indeed, the causal
factors driving network evolution are likely to be complex,
multifaceted, and idiosyncratic. Our results on the
asymmetric dependencies between clustering, assortativity,
and modularity provide a warning about inferring
causality from naive observations of network structure.

[1] A. Broder et al., Comp. Netw. 33, 309 (2000).
[2] S. Boccaletti et al., Phys. Rep. 424, 175 (2006).
[3] A. Barabási and Z. Oltvai, Nat. Genet. 5, 101 (2004).
[4] M. E. J. Newman, SIAM Review 45, 167 (2003).
[5] S. N. Soffer and A. Vazquez, Phys. Rev. E 71, 057101 (2005).
[6] P. Holme and J. Zhao, Phys. Rev. E 75, 046111 (2007).
[7] D. J. Watts and S. H. Strogatz, Nature 393, 440 (1998).
[8] M. E. J. Newman, Phys. Rev. E 68, 026121 (2003).
[9] M. E. J. Newman, Phys. Rev. Lett. 89, 208701 (2002).
[10] M. E. J. Newman and M. Girvan, Phys. Rev. E 69, 026113 (2004).
[11] G. Kossinets and D. Watts, AJS 115, 405 (2009).
[12] M. E. J. Newman and J. Park, Phys. Rev. E 68, 036122 (2003).
[13] M. McPherson, L. Smith-Lovin, and J. Cook, Annu. Rev. Sociol. 27, 415 (2001).
[14] A. Rapoport, Bull. Math. Biol. 15, 523 (1953).
[15] M. E. J. Newman, PNAS 98, 404 (2001).
[16] Y.-Y. Ahn, S. Han, H. Kwak, S. Moon, and H. Jeong, in Proceedings of the 16th international conference on World Wide Web (ACM, 2007).
[17] D. Lusseau et al., Behav. Ecol. Sociobiol. 54, 396 (2003).
[18] R. Guimerà, L. Danon, A. Díaz-Guilera, F. Giralt, and A. Arenas, Phys. Rev. E 68, 065103 (2003).
[19] P. M. Gleiser and L. Danon, Adv. Complex Sys. 6, 565 (2003).
[20] M. E. J. Newman, Phys. Rev. E 74, 036104 (2006).
[21] P. Holme, C. R. Edling, and F. Liljeros, Soc. Networks 26, 155 (2004).
[22] M. Boguñá, R. Pastor-Satorras, A. Díaz-Guilera, and A. Arenas, Phys. Rev. E 70, 056122 (2004).
[23] J. Duch and A. Arenas, Phys. Rev. E 72, 027104 (2005).
[24] H. Jeong, B. Tombor, R. Albert, Z. N. Oltvai, and A. Barabási, Nature 407, 651 (2000).
[25] M. E. J. Newman,
[26] A. C. Gavin et al., Nature 415, 141 (2002).
[27] H. Jeong, S. P. Mason, A. Barabási, and Z. N. Oltvai, Nature 411, 41 (2001).
[28] O. Puig et al., Methods 24, 218 (2001).
[29] S. Fields and O. Song, Nature 340, 245 (1989).
[30] M. E. J. Newman, S. H. Strogatz, and D. J. Watts, Phys. Rev. E 64, 026118 (2001).
[31] P. Erdős and A. Rényi, Publ. Math. Debrecen 6, 156
[32] S. Maslov and K. Sneppen, Science 296, 910 (2002).
[33] J. G. Foster, D. V. Foster, P. Grassberger, and M. Paczuski, Phys. Rev. E 76, 046112 (2007).
[34] R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, and U. Alon, Science 298, 824 (2002).
[35] J. Berg and M. Lässig, Phys. Rev. Lett. 89, 228701 (2002).
[36] J. Park and M. E. J. Newman, Phys. Rev. E 70, 066117 (2004).
[37] D. V. Foster, J. G. Foster, M. Paczuski, and P. Grassberger, Phys. Rev. E 81, 046115 (2010).
[38] W. K. Hastings, Biometrika 57, 97 (1970).
[39] M. E. J. Newman and G. T. Barkema, Monte Carlo methods in statistical physics (Oxford Univ. Press, 1999).
[40] J. G. Foster, D. V. Foster, M. Paczuski, and P. Grassberger, P. Natl. Acad. Sci. USA 107, 10815 (2010).
[41] C. Porter, J. Computer-Mediated Comm. 10 (2004).
[42] S. Fortunato, Phys. Rep. 486, 75 (2010).
[43] A. Clauset, M. E. J. Newman, and C. Moore, Phys. Rev. E 70, 066111 (2004).
[44] H. Hu and X. Wang, Europhys. Lett. 86, 18003 (2009).