
Clustering Drives Assortativity and Community Structure in Ensembles of Networks

David V. Foster,1,∗ Jacob G. Foster,2 Peter Grassberger,1,3 and Maya Paczuski1

1Complexity Science Group, University of Calgary, Calgary T2N 1N4, Canada

2Department of Sociology, University of Chicago, Chicago 60615, USA

3NIC, Forschungszentrum Jülich, D-52425 Jülich, Germany

(Dated: January 6, 2011)

Clustering, assortativity, and communities are key features of complex networks. We probe depen-

dencies between these attributes and find that ensembles with strong clustering display both high

assortativity by degree and prominent community structure, while ensembles with high assortativ-

ity are much less biased towards clustering or community structure. Further, clustered networks

can amplify small homophilic bias for trait assortativity. This marked asymmetry suggests that

transitivity, rather than homophily, drives the standard nonsocial/social network dichotomy.

PACS numbers: 89.75.Hc, 05.10.Ln, 89.75.Fb, 64.60.aq

Networks provide convenient representations for di-

verse phenomena spanning physical, technological, so-

cial, biological and informational domains [1–4]. They

are often complicated, historically contingent assemblies

created by nonlinear processes. Just as it is meaningful

to “explain” features of real networks with simple gen-

erative mechanisms, it is also informative to ask what

properties to expect given no other information about a

network save that it has a certain set of properties.

In fact, network properties can be markedly interde-

pendent [5, 6]. We focus on three key features of undi-

rected networks: (1) the clustering coefficient, C, which

reflects the tendency of the network to form triangles

(transitivity) [7, 8]; (2) the assortativity, r, which re-

flects the tendency of similar nodes to connect to one an-

other (homophily) [9]; and (3) the modularity, Q, which

reflects the tendency of nodes to form tightly intercon-

nected communities [10].

We show that ensembles of networks constrained by a

transitive bias to be strongly clustered also become highly

degree-assortative and modular. In contrast, ensembles

constrained by a homophilic bias to be highly assortative

show only weak clustering or modularity. Hence, at the

ensemble level a fundamental asymmetry exists between

transitivity and homophily. This asymmetry holds un-

less the distribution of the number of links attached to

each node (the node’s degree) is extremely broad. Furthermore, a transitive bias can amplify the effect of a homophilic bias towards trait (e.g. race, age, or education) assortativity [11] in network ensembles.

High values for the clustering, assortativity, and mod-

ularity are often observed in real-world social networks,

while nonsocial networks may have low values [12]. Although extensive social science literature posits homophily to be a dominant force in social network formation [11, 13] (since social networks are highly assortative), our results show that a bias for transitive relationships (also called “triadic closure” in sociology literature [14]) is sufficient to obtain this effect in network ensembles. Our work is complementary to that of Newman and Park, who produce assortativity and clustering characteristic of social networks by introducing modularity [12].

∗ventres@gmail.com

[Figure 1: scatter plot of the clustering coefficient C and the assortativity r, one labelled point per network; plot annotation: correlation = .79. Point labels: the networks named in the caption, plus "krapivsky".]

FIG. 1: The relationship between the clustering coefficient, C, and the assortativity, r. Gray points represent social networks, black points represent other types of networks. Social networks: astro phys (scientific collaboration) [15]; condensed matter (scientific collaboration) [15]; Cyworld (online social) [16]; dolphins (friendship) [17]; email (communication) [18]; HEP (scientific collaboration) [15]; jazz (musical collaboration) [19]; MySpace (online social) [16]; network science (scientific collaboration) [20]; nioki (online social) [21]; orkut (online social) [16]; PGP (communication network) [22]; pussokram (online dating) [21]. Non-social networks: c. elegans (neural) [23]; e. coli (metabolic) [24]; internet (router level) [25]; power (connections between power stations) [7]; TAP (yeast protein-protein binding) [26]; word adjacency (in English text) [20]; Y2H (yeast protein-protein binding) [27].

To begin, we note a distinct empirical correlation between C and r in real networks, illustrated in Fig. 1, with social networks (generally) in the high C, high r corner, and non-social networks (generally) in the low C, low r one. The pattern suggests an interdependence between the two features that transcends a simple nonsocial/social dichotomy. For instance, consider two networks in Fig. 1: TAP is a high C, high r protein-protein interaction network, generated by tandem affinity purification experiments [28]; Y2H is a weakly clustered, disassortative protein-protein interaction network, generated using yeast two hybridization [29]. The experimental methodology, by itself, can explain the difference, since TAP pulls out bound complexes and assigns links to every pair of proteins in the complex while Y2H tests each pair of proteins individually for direct binding. Since transitivity has a natural origin in the construction of the TAP network, it is likely that the observed assortativity arises solely as a byproduct of the interrelationship between transitivity and assortativity rather than any direct homophilic tendency between proteins.

arXiv:1012.2384v2 [physics.soc-ph] 5 Jan 2011

TABLE I: Important values for the empirical networks

Name    N      L      r        C       Q     Ref
ER      19680  41000  -1.3e-5  .00021  .246  [31]
HEP     7610   15751  .29      .33     .40   [15]
NetSci  1461   2742   .46      .70     .47   [20]
PGP     10680  24316  .24      .38     .41   [22]

Since network properties often depend conspicuously on the degree sequence, or the number of links attached to each node [30], we consider ensembles of networks constrained to have the same fixed degree sequence (FDS). Three real world networks are studied in detail: a collaboration network of high energy physicists (HEP) [15]; a collaboration network of network scientists (NetSci) [20]; and an encrypted communication network (PGP) [22]. We also examine a randomly generated Erdős-Rényi network (ER) [31]. Basic network parameters are given in Table I.

We use a rewiring procedure [32, 33] to sample from each ensemble. At each step of the procedure two links are chosen at random and their endpoints are exchanged, unless this would create a double link, in which case the step is skipped. This move set preserves the degree of each node but otherwise randomizes connections. To sample ensembles with specific features, we use a network Hamiltonian H(G) [34–37] to define an exponential ensemble by assigning a sampling weight P(G) ∝ e^{−H(G)} to each graph G. Here we consider ensembles where H(G) depends on C, r and/or trait assortativity defined below. Denoting the number of triangles in G by n_Δ, the degree of node i by k_i, and the number of nodes by N, the clustering coefficient is defined as

\[ C = \frac{3 n_\Delta}{\frac{1}{2} \sum_{i=1}^{N} (k_i - 1)\, k_i} . \tag{1} \]
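As a concrete reading of Eq. (1), the following minimal sketch evaluates C for a small undirected simple graph. The adjacency-dictionary representation and the function name `clustering_coefficient` are illustrative assumptions, not part of the original work.

```python
from itertools import combinations


def clustering_coefficient(adj):
    """Eq. (1): C = 3 * n_triangles / (0.5 * sum_i k_i * (k_i - 1)).

    `adj` maps each node to the set of its neighbours (undirected, no
    self-loops, no double links). This representation is an assumption.
    """
    # Count pairs of neighbours that are themselves linked. Every triangle
    # is found once from each of its three corners, so this count already
    # equals 3 * n_triangles.
    closed = sum(
        1
        for nbrs in adj.values()
        for u, v in combinations(nbrs, 2)
        if v in adj[u]
    )
    # Connected triples centred on each node: k_i * (k_i - 1) / 2.
    triples = sum(len(nbrs) * (len(nbrs) - 1) // 2 for nbrs in adj.values())
    return closed / triples if triples else 0.0


# A triangle is fully clustered; a three-node path has no triangles.
triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
path = {0: {1}, 1: {0, 2}, 2: {1}}
# A triangle with one pendant node attached to node 0.
pendant = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
```

For the pendant example, node 0 contributes three connected triples of which one is closed, and nodes 1 and 2 contribute one closed triple each, giving C = 3/5.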

Assortativity by degree is defined as the Pearson corre-

lation coefficient between the degrees of nodes joined by

a link [9]:

\[ r = \frac{L \sum_{i=1}^{L} j_i k_i - \left[ \sum_{i=1}^{L} j_i \right]^2}{L \sum_{i=1}^{L} j_i^2 - \left[ \sum_{i=1}^{L} j_i \right]^2} , \tag{2} \]

where L is the number of links in the network and j_i and k_i are the degrees of nodes at each end of link i.
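Eq. (2) can be evaluated directly from an edge list. The sketch below is one possible implementation, under the assumption that each undirected link is counted in both orientations so that the sums over j and over k coincide and the expression is a Pearson correlation; the function name and the edge-list representation are hypothetical.

```python
def degree_assortativity(edges):
    """Degree assortativity r in the form of Eq. (2).

    `edges` is a list of (u, v) pairs of an undirected simple graph.
    Assumption: every link enters the sums twice, once per orientation,
    so the "L" of Eq. (2) is the number of oriented link ends used here.
    """
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1

    # Degrees (j, k) at the two ends of every link, in both orientations.
    ends = [(degree[u], degree[v]) for u, v in edges]
    ends += [(k, j) for j, k in ends]

    m = len(ends)
    sum_jk = sum(j * k for j, k in ends)
    sum_j = sum(j for j, _ in ends)
    sum_jj = sum(j * j for j, _ in ends)

    denominator = m * sum_jj - sum_j ** 2
    if denominator == 0:
        # All link ends have the same degree (e.g. a regular graph).
        raise ValueError("r is undefined when every link end has the same degree")
    return (m * sum_jk - sum_j ** 2) / denominator


star = [(0, 1), (0, 2), (0, 3)]      # hub linked to three leaves
path4 = [(0, 1), (1, 2), (2, 3)]     # four nodes in a line
```

The star is maximally disassortative because every link joins the degree-3 hub to a degree-1 leaf.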

To get ensembles with specific values of C or r we use

the following Hamiltonians:

\[ H_{C'} = \beta\,|C' - C_t|, \qquad H_{r'} = \beta\,|r' - r_t| \]

where C′ is the current clustering coefficient and C_t is the target value, and similarly for r′. The parameter β controls the strength of bias towards the target. It is a transitive bias in H_{C′} and a homophilic bias in H_{r′}.

We employ simulated annealing based on a standard

Metropolis-Hastings procedure with a rewiring move

set [38, 39]. One pair of links in the network G is switched

to produce a new candidate network G′. A valid move is accepted with probability

\[ p = e^{H(G) - H(G')}, \tag{3} \]

\[ p \le 1, \tag{4} \]

and rejected with probability 1 − p. If p > 1 the move is accepted. Initially, the network is rewired 2 × 10^5 times at β = 0 to randomize links and avoid strong hysteresis [37]. Then β is increased slowly, rewiring 5 × 10^4 times after each increase until C (or r) hits C_t (or r_t). The first network with C = C_t (r = r_t) is a single sample from the ensemble of networks with a fixed degree sequence and C = C_t (r = r_t). The whole process then repeats, starting with the β = 0 quench.
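The procedure above can be sketched in a few dozen lines. This is an illustrative toy, not the authors' implementation: all function names, the step counts, the β schedule (`beta_step`, `beta_max`) and the stopping rule (stop once C ≥ C_t after a batch of moves, rather than at exact equality) are assumptions chosen for brevity. The sketch also rejects self-loops, which the text does not mention explicitly.

```python
import math
import random
from itertools import combinations


def clustering(adj):
    """Clustering coefficient of Eq. (1) for a dict of neighbour sets."""
    closed = sum(1 for nb in adj.values()
                 for u, v in combinations(nb, 2) if v in adj[u])
    triples = sum(len(nb) * (len(nb) - 1) // 2 for nb in adj.values())
    return closed / triples if triples else 0.0


def propose_swap(edges, adj, rng):
    """Propose exchanging endpoints of two random links; None if invalid."""
    i, j = rng.sample(range(len(edges)), 2)
    (a, b), (c, d) = edges[i], edges[j]
    if rng.random() < 0.5:  # choose the orientation of the second link
        c, d = d, c
    # Skip moves that would create a self-loop or a double link.
    if a == d or c == b or d in adj[a] or b in adj[c]:
        return None
    return i, j, (a, d), (c, b)


def apply_swap(edges, adj, i, j, new_i, new_j):
    """Replace links i and j by new_i and new_j, keeping adj in sync."""
    for u, v in (edges[i], edges[j]):
        adj[u].discard(v)
        adj[v].discard(u)
    for u, v in (new_i, new_j):
        adj[u].add(v)
        adj[v].add(u)
    edges[i], edges[j] = new_i, new_j


def metropolis_step(edges, adj, beta, c_target, rng):
    """One rewiring attempt, accepted with probability min(1, e^{H(G)-H(G')})."""
    move = propose_swap(edges, adj, rng)
    if move is None:
        return
    i, j, new_i, new_j = move
    old_i, old_j = edges[i], edges[j]
    h_old = beta * abs(clustering(adj) - c_target)
    apply_swap(edges, adj, i, j, new_i, new_j)
    h_new = beta * abs(clustering(adj) - c_target)
    if h_new > h_old and rng.random() >= math.exp(h_old - h_new):
        apply_swap(edges, adj, i, j, old_i, old_j)  # rejected: undo the move


def sample_with_clustering(edge_list, c_target, rng, n_random=2000,
                           n_per_beta=500, beta_step=10.0, beta_max=2000.0):
    """Randomise at beta = 0, then raise beta until C >= c_target or beta_max."""
    edges = list(edge_list)
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    for _ in range(n_random):
        metropolis_step(edges, adj, 0.0, c_target, rng)
    beta = 0.0
    while clustering(adj) < c_target and beta < beta_max:
        beta += beta_step
        for _ in range(n_per_beta):
            metropolis_step(edges, adj, beta, c_target, rng)
    return edges, adj
```

Recomputing C from scratch at every step keeps the sketch short but is far too slow for networks of the size in Table I; a practical implementation would update the triangle count incrementally for the four nodes touched by each move.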

We also study the influence of transitivity on trait as-

sortativity, rd, which measures the tendency for nodes to

connect to others with the same discrete trait (e.g. race,

gender, etc.) [9]. For this we add a homophilic bias β_d for links between nodes with the same trait. Defining r_d ∝ Σ_δ e_δδ, where e_δδ is the fraction of links in the network from a node of type δ to another node of type δ, the Hamiltonian becomes

\[ H_d = \beta\,|C - C_t| + \beta_d \sum_{\delta} e_{\delta\delta} . \tag{5} \]

Choosing different values of C_t and β_d allows one to explore how transitivity impacts trait assortativity at the ensemble level.
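The quantity Σ_δ e_δδ that enters Eq. (5) is simply the fraction of links whose two endpoints carry the same trait. A minimal sketch follows, with hypothetical helper names and with traits encoded as arbitrary hashable labels; the random assignment helper mirrors a uniform draw over a fixed number of traits and is likewise an assumption about representation only.

```python
import random


def same_trait_fraction(edges, trait):
    """Sum over traits of e_dd: fraction of links joining equal-trait nodes.

    `edges` is a list of (u, v) pairs; `trait` maps node -> trait label.
    """
    if not edges:
        return 0.0
    same = sum(1 for u, v in edges if trait[u] == trait[v])
    return same / len(edges)


def assign_traits(nodes, n_traits, rng):
    """Give every node one of `n_traits` labels uniformly at random."""
    return {node: rng.randrange(n_traits) for node in nodes}


# Two of the three links below join nodes with the same trait.
example_edges = [(0, 1), (1, 2), (2, 3)]
example_trait = {0: "a", 1: "a", 2: "b", 3: "b"}
fraction = same_trait_fraction(example_edges, example_trait)
```

With n equally likely traits assigned independently of the links, the expected value of this fraction is 1/n, which gives a baseline against which a homophilic bias can be judged.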

We examine ensembles constrained to have a particular

value of r (resp. C) and measure the value for the other

feature C (resp. r) averaged over 100 samples from the

ensemble. Results are shown in Fig. 2. The grey (resp.

black) symbols show the values for ensembles with con-

strained r (resp. C). Increasing transitivity to increase C

has a strong influence on r in all cases, whereas increas-

ing homophily to increase r has relatively little impact

on C. The asymmetry is strongest for narrow degree

distributions (e.g. the ER network), and becomes less

pronounced, but still apparent, as the degree distribu-

tion broadens.

[Figure 2: four panels (ER, HEP, NetSci, PGP), each plotting r (y-axis) against C (x-axis), with both axes running from 0.0 to 0.8.]

FIG. 2: Controlling assortativity (grey symbols) vs. controlling clustering coefficient (black symbols) for various network degree sequences. C is on the x-axis, r on the y-axis. Each point represents average values from 100 samples from an ensemble with specified r or C values. The dashed lines show the values of r and C for the original network. Note the asymmetry between the effect of C on r compared to r on C.

The asymmetric relationship between r and C can be understood as follows: For nodes to participate in as many transitive relationships as possible, their neighbours must be of similar degree. Hence increasing clustering also increases r. Increasing r leads to links between nodes of similar degree, but these relationships need not be transitive. For narrow degree distributions, one could divide all nodes of degree k into two groups and only permit links between the two groups. Assortativity would be maximum, in the absence of any clustering.

On the other hand, for broad degree distributions (like

PGP) only a few nodes of high degree exist, but they

have a large effect on r. Hence for large r, the high-

est degree nodes are under strong pressure to link, thus

creating many transitive relationships. Many social net-

works do not have broad degree distributions. In such

cases homophily has only a weak influence on C at the

ensemble level.

Fig. 2 also indicates the C and r values for the real-

world networks (dashed lines). Ensembles of networks

constrained to have the same C as the real network ex-

hibit far greater r. Hence, social networks are actually

disassortative relative to the ensemble of networks with

the same clustering coefficient and degree sequence [40].

Indeed, the most likely way to create many triangles is

to densely interconnect the higher degree nodes so tri-

angles clump together (as discussed in Ref. [37]). Real

social networks seem to spread clustering more evenly

across the network, thus lowering r. For example in sci-

entific collaboration networks, supervisory relationships

may decrease the assortativity by creating links between

lower degree students and higher degree professors.

[Figure 3: two rows of panels showing modularity Q; top row plots Q against C, bottom row plots Q against r, with curves labelled ER, HEP, NetSci and PGP.]

FIG. 3: (Color Online) Modularity Q for various ensembles of networks with different target values for C (top row) or r (bottom row). Clustering has a much larger impact on modularity than assortativity does.

We next consider the influence of r and C on modularity. Many methods for extracting community structure exist [41, 42]. For definiteness, we use the one proposed by Newman and Girvan [10]: Given a partition of the network, e_ij is the fraction of all edges connecting a node in community i to one in community j, and a_i = Σ_j e_ij

is the fraction of all link ends attached to nodes in community i. The modularity of the network given partition P is defined as:

\[ Q_P = \sum_i \left( e_{ii} - a_i^2 \right) . \tag{6} \]

We use an agglomerative method [43] to approximate the best partition and largest Q_P, which we denote Q.

The top (resp. bottom) panel in Fig. 3 shows the average Q in ensembles with constrained C (resp. r). Transitivity has a more pronounced effect on modularity than does homophily. The modularity achieved for the highly clustered ensembles approximates the actual modularity for the real networks (HEP, NetSci, and PGP; see Table I), unlike assortative ensembles without a transitive bias.

Finally, we consider the effect of transitivity on trait assortativity, r_d. For each of the degree sequences, we create ensembles of networks with different target C values and varying homophilic biases β_d. Since the actual data sets do not contain trait values, we assign each node one of three possible traits at random with equal probability. For ER, HEP, and NetSci we observe that ensembles with larger C enhance r_d relative to ensembles with the same homophilic bias but no clustering (C = 0). This is especially clear for the narrowest (ER) degree sequence. For the PGP network, which has a broad degree distribution, clustering appears to compete with the homophilic bias (e.g. the curves cross), leading to a more complicated scenario. The interdependence between clustering and trait assortativity thus appears to depend on the degree sequence, but for narrow degree sequences the positive relationship holds and transitivity enhances the effect of homophilic bias. We also note that increasing the trait assortativity of an ensemble had no impact on C, r, or Q (data not shown).

[Figure 4: four panels (ER, HEP, NetSci, PGP), each plotting r_d (y-axis, 0.0 to 0.8) against β_d (x-axis, 0.5 to 2.5), with one curve per target value of C in the legend (C = 0, .1, .2, .3, .4, .5).]

FIG. 4: (Color Online) Trait assortativity r_d (y-axis) for ensembles of networks with varying C (indicated in the legend) and homophilic bias β_d (x-axis). For narrow degree distributions clustering amplifies the response of trait assortativity to homophilic bias. For broad degree distributions the opposite occurs for small β_d.

We conjecture that the standard nonsocial/social (disassortative/assortative) dichotomy is driven by transitive relationships in many social networks, such as in scientific collaborations. As shown here, transitivity typically leads to assortativity. This explains the anomalous position of TAP located within social networks, and is consistent with another anomaly in Fig. 1: several online social networks show low clustering and low assortativity [44]. If assortative mixing by degree is the result of homophily by degree in social networks, this anomaly is hard to explain: why should popular people stop seeking each other out simply because the social network moved online? But if assortativity is a side-effect of transitivity, this effect is easier to understand: it is plausible that online social relationships are less transitive, since in the absence of spatially mediated interactions there is a smaller tendency to introduce mutual friends. We have not ruled out the scenario in reference [12]. Indeed, the causal factors driving network evolution are likely to be complex, multifaceted, and idiosyncratic. Our results on the asymmetric dependencies between clustering, assortativity, and modularity provide a warning about inferring causality from naive observations of network structure.
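For readers who wish to experiment with the modularity of Eq. (6), the sketch below evaluates Q_P for a given partition from an edge list. It covers only the evaluation of Q_P, not the agglomerative search for the best partition [43]; the function name and data layout are assumptions made for illustration.

```python
def modularity(edges, community):
    """Q_P = sum_i (e_ii - a_i^2) for a fixed partition (Eq. (6)).

    `edges` is a list of (u, v) pairs of an undirected graph and
    `community` maps node -> community label.
    """
    m = len(edges)
    if m == 0:
        return 0.0
    internal = {}   # links with both ends in community i
    ends = {}       # link ends attached to community i
    for u, v in edges:
        cu, cv = community[u], community[v]
        ends[cu] = ends.get(cu, 0) + 1
        ends[cv] = ends.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    # e_ii = internal_i / m ; a_i = ends_i / (2 m)
    return sum(internal.get(c, 0) / m - (ends[c] / (2 * m)) ** 2 for c in ends)


# Two triangles joined by a single bridging link.
two_triangles = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
merged = {n: "A" for n in range(6)}
```

For the split partition each community holds 3 of the 7 links internally and half of all link ends, so Q_P = 2 (3/7 − 1/4) = 5/14; placing every node in one community gives Q_P = 0.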

[1] A. Broder et al., Comp. Netw. 33, 309 (2000).
[2] S. Boccaletti et al., Phys. Rep. 424, 175 (2006).
[3] A. Barabási and Z. Oltvai, Nat. Genet. 5, 101 (2004).
[4] M. E. J. Newman, SIAM Review 45, 167 (2003).
[5] S. N. Soffer and A. Vazquez, Phys. Rev. E 71, 057101 (2005).
[6] P. Holme and J. Zhao, Phys. Rev. E 75, 046111 (2007).
[7] D. J. Watts and S. H. Strogatz, Nature 393, 440 (1998).
[8] M. E. J. Newman, Phys. Rev. E 68, 026121 (2003).
[9] M. E. J. Newman, Phys. Rev. Lett. 89, 208701 (2002).
[10] M. E. J. Newman and M. Girvan, Phys. Rev. E 69, 026113 (2004).
[11] G. Kossinets and D. Watts, AJS 115, 405 (2009).
[12] M. E. J. Newman and J. Park, Phys. Rev. E 68, 036122 (2003).
[13] M. McPherson, L. Smith-Lovin, and J. Cook, Annu. Rev. Sociol. 27, 415 (2001).
[14] A. Rapoport, Bull. Math. Biol. 15, 523 (1953).
[15] M. E. J. Newman, PNAS 98, 404 (2001).
[16] Y.-Y. Ahn, S. Han, H. Kwak, S. Moon, and H. Jeong, in Proceedings of the 16th international conference on World Wide Web (ACM, 2007).
[17] D. Lusseau et al., Behav. Ecol. Sociobiol. 54, 396 (2003).
[18] R. Guimerà, L. Danon, A. Díaz-Guilera, F. Giralt, and A. Arenas, Phys. Rev. E 68, 065103 (2003).
[19] P. M. Gleiser and L. Danon, Adv. Complex Sys. 6, 565–573 (2003).
[20] M. E. J. Newman, Phys. Rev. E 74, 036104 (2006).
[21] P. Holme, C. R. Edling, and F. Liljeros, Soc. Networks 26, 155 (2004).
[22] M. Boguñá, R. Pastor-Satorras, A. Díaz-Guilera, and A. Arenas, Phys. Rev. E 70, 056122 (2004).
[23] J. Duch and A. Arenas, Phys. Rev. E 72, 027104 (2005).
[24] H. Jeong, B. Tombor, R. Albert, Z. N. Oltvai, and A. Barabási, Nature 407, 651 (2000).
[25] M. E. J. Newman, Network data, http://www-personal.umich.edu/~mejn/netdata/.
[26] A. C. Gavin et al., Nature 415, 141 (2002).
[27] H. Jeong, S. P. Mason, A. Barabási, and Z. N. Oltvai, Nature 411, 41 (2001).
[28] O. Puig et al., Methods 24, 218 (2001).
[29] S. Fields and O. Song, Nature 340, 245 (1989).
[30] M. E. J. Newman, S. H. Strogatz, and D. J. Watts, Phys. Rev. E 64, 026118 (2001).
[31] P. Erdős and A. Rényi, Publ. Math. Debrecen 6, 156 (1959).
[32] S. Maslov and K. Sneppen, Science 296, 910 (2002).
[33] J. G. Foster, D. V. Foster, P. Grassberger, and M. Paczuski, Phys. Rev. E 76, 046112 (2007).
[34] R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, and U. Alon, Science 298, 824 (2002).
[35] J. Berg and M. Lässig, Phys. Rev. Lett. 89, 228701 (2002).
[36] J. Park and M. E. J. Newman, Phys. Rev. E 70, 066117 (2004).
[37] D. V. Foster, J. G. Foster, M. Paczuski, and P. Grassberger, Phys. Rev. E 81, 046115 (2010).
[38] W. K. Hastings, Biometrika 57, 97 (1970).
[39] M. E. J. Newman and G. T. Barkema, Monte Carlo methods in statistical physics (Oxford Univ. Press, 1999).
[40] J. G. Foster, D. V. Foster, M. Paczuski, and P. Grassberger, P. Natl. Acad. Sci. USA 107, 10815 (2010).
[41] C. Porter, J. Computer-Mediated Comm. 10 (2004).
[42] S. Fortunato, Phys. Rep. 486, 75 (2010).
[43] A. Clauset, M. E. J. Newman, and C. Moore, Phys. Rev. E 70, 066111 (2004).
[44] H. Hu and X. Wang, Europhys. Lett. 86, 18003 (2009).