
On Modularity Clustering

Ulrik Brandes1, Daniel Delling2, Marco Gaertler2, Robert Görke2,

Martin Hoefer1, Zoran Nikoloski3, Dorothea Wagner2

Abstract—Modularity is a recently introduced quality measure

for graph clusterings. It has immediately received considerable

attention in several disciplines, and in particular in the complex

systems literature, although its properties are not well under-

stood. We study the problem of finding clusterings with maximum

modularity, thus providing theoretical foundations for past and

present work based on this measure. More precisely, we prove

the conjectured hardness of maximizing modularity both in the

general case and with the restriction to cuts, and give an Integer

Linear Programming formulation. This is complemented by first

insights into the behavior and performance of the commonly

applied greedy agglomerative approach.

Index Terms—Graph Clustering, Graph Partitioning, Modu-

larity, Community Structure, Greedy Algorithm

I. INTRODUCTION

Graph clustering is a fundamental problem in the analysis of

relational data. Studied for decades and applied to many settings,

it is now popularly referred to as the problem of partitioning

networks into communities. In this line of research, a novel graph

clustering index called modularity has been proposed recently [1].

The rapidly growing interest in this measure prompted a series

of follow-up studies on various applications and possible adjust-

ments (see, e.g., [2], [3], [4], [5], [6]). Moreover, an array of

heuristic algorithms has been proposed to optimize modularity.

These are based on a greedy agglomeration [7], [8], on spectral

division [9], [10], simulated annealing [11], [12], or extremal

optimization [13] to name but a few prominent examples. While

these studies often provide plausibility arguments in favor of the

resulting partitions, we know of only one attempt to characterize

properties of clusterings with maximum modularity [2]. In partic-

ular, none of the proposed algorithms has been shown to produce

optimal partitions with respect to modularity.

In this paper we study the problem of finding clusterings with

maximum modularity, thus providing theoretical foundations for

past and present work based on this measure. More precisely,

we prove the conjectured hardness of maximizing modularity

both in the general case and with the restriction to cuts, and give an

integer linear programming formulation to facilitate optimization

without enumeration of all clusterings. Since the most commonly

employed heuristic to optimize modularity is based on greedy

agglomeration, we investigate its worst-case behavior. In fact, we

give a graph family for which the greedy approach yields an approximation factor no better than two. In addition, our examples indicate that the quality of greedy clusterings may heavily depend on the tie-breaking strategy utilized. In fact, in the worst case, no approximation factor can be provided. These performance studies are concluded by partitioning some previously considered networks optimally, which yields further insight.

This work was partially supported by the DFG under grants BR 2158/2-3, WA 654/14-3, Research Training Group 1042 "Explorative Analysis and Visualization of Large Information Spaces" and by EU under grant DELIS (contract no. 001907).

1 Department of Computer & Information Science, University of Konstanz, {brandes,hoefer}@inf.uni-konstanz.de

2 Faculty of Informatics, Universität Karlsruhe (TH), {delling,gaertler,rgoerke,wagner}@ira.uka.de

3 Max-Planck Institute for Molecular Plant Physiology, Bioinformatics Group, nikoloski@mpimp-golm.mpg.de

This paper is organized as follows. Section II briefly introduces

preliminaries, formulations of modularity, and an ILP formulation of

the problem. Basic and counterintuitive properties of modularity

are observed in Section III. Our NP-completeness proofs are given

in Section IV, followed by an analysis of the greedy approach

in Section V. The theoretical investigation is extended by char-

acterizations of the optimum clusterings for cliques and cycles

in Section VI. Our work is concluded by revisiting examples

from previous work in Section VII and a brief discussion in

Section VIII.

II. PRELIMINARIES

Throughout this paper, we will use the notation of [14]. More

precisely, we assume that G = (V,E) is an undirected connected

graph with n := |V | vertices, m := |E| edges. Denote by C =

{C1,...,Ck} a partition of V . We call C a clustering of G and

the Ci, which are required to be non-empty, clusters; C is called

trivial if either k = 1 or k = n. We denote the set of all possible

clusterings of a graph G with A(G). In the following, we often

identify a cluster $C_i$ with the induced subgraph of G, i.e., the graph $G[C_i] := (C_i, E(C_i))$, where $E(C_i) := \{\{v,w\} \in E : v,w \in C_i\}$. Then $E(\mathcal{C}) := \bigcup_{i=1}^{k} E(C_i)$ is the set of intra-cluster edges and $E \setminus E(\mathcal{C})$ the set of inter-cluster edges. The number of intra-cluster edges is denoted by $m(\mathcal{C})$ and the number of inter-cluster edges by $\overline{m}(\mathcal{C})$. The set of edges that have one end-node in $C_i$ and the other end-node in $C_j$ is denoted by $E(C_i, C_j)$.

A. Definition of Modularity

Modularity is a quality index for clusterings. Given a simple

graph G = (V,E), we follow [1] and define the modularity q(C)

of a clustering C as

\[
q(\mathcal{C}) := \sum_{C \in \mathcal{C}} \left[ \frac{|E(C)|}{m} - \left( \frac{|E(C)| + \sum_{C' \in \mathcal{C}} |E(C,C')|}{2m} \right)^{2} \right]. \tag{1}
\]

Note that $C'$ ranges over all clusters, so that edges in $E(C)$ are counted twice in the squared expression. This is to adjust proportions, since edges in $E(C,C')$, $C \neq C'$, are counted twice as well, once for each ordering of the arguments. Note that we can rewrite Equation (1) into the more convenient form

\[
q(\mathcal{C}) = \sum_{C \in \mathcal{C}} \left[ \frac{|E(C)|}{m} - \left( \frac{\sum_{v \in C} \deg(v)}{2m} \right)^{2} \right]. \tag{2}
\]


This reveals an inherent trade-off: To maximize the first term,

many edges should be contained in clusters, whereas the mini-

mization of the second term is achieved by splitting the graph

into many clusters with small total degrees each. Note that the

first term |E(C)|/m is also known as coverage [14].
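As an illustration of Equation (2), the following minimal Python sketch computes modularity from an edge list and a clustering. The function name `modularity`, the edge-list representation, and the example graph (two triangles joined by one edge) are assumptions made for this example and are not part of the original text; exact rational arithmetic is used so that the trade-off between coverage and squared degree sums can be inspected without rounding.

```python
from fractions import Fraction


def modularity(edges, clustering):
    """Modularity q(C) of a clustering via Equation (2).

    edges      -- list of pairs (u, v) of a simple undirected graph
    clustering -- list of disjoint node collections covering all nodes
    """
    m = len(edges)
    cluster_of = {v: idx for idx, cluster in enumerate(clustering) for v in cluster}
    intra = [0] * len(clustering)        # |E(C)| for each cluster
    degree_sum = [0] * len(clustering)   # sum of deg(v) over v in C
    for u, v in edges:
        degree_sum[cluster_of[u]] += 1
        degree_sum[cluster_of[v]] += 1
        if cluster_of[u] == cluster_of[v]:
            intra[cluster_of[u]] += 1
    return sum(
        Fraction(intra[c], m) - Fraction(degree_sum[c], 2 * m) ** 2
        for c in range(len(clustering))
    )


# Hypothetical example: two triangles joined by a single edge (m = 7).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
print(modularity(edges, [{0, 1, 2}, {3, 4, 5}]))   # 5/14
print(modularity(edges, [{0, 1, 2, 3, 4, 5}]))     # 0
```

In the example, the single-cluster clustering has full coverage but also the largest possible squared degree sum, so its modularity is 0, while separating the two triangles gives up one edge of coverage and gains 5/14 overall.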

B. Maximizing Modularity via Integer Linear Programming

The problem of maximizing modularity can be cast into a

very simple and intuitive integer linear program (ILP). Given a

graph G = (V,E) with n := |V| nodes, we define $n^2$ decision variables $X_{uv} \in \{0,1\}$, one for every pair of nodes $u,v \in V$. The key idea is that these variables can be interpreted as an equivalence relation (over V) and thus form a clustering. In order to ensure consistency, we need the following constraints, which guarantee

reflexivity: $\forall u:\ X_{uu} = 1$,

symmetry: $\forall u,v:\ X_{uv} = X_{vu}$, and

transitivity: $\forall u,v,w$:
\[
\begin{aligned}
X_{uv} + X_{vw} - 2 \cdot X_{uw} &\le 1 \\
X_{uw} + X_{uv} - 2 \cdot X_{vw} &\le 1 \\
X_{vw} + X_{uw} - 2 \cdot X_{uv} &\le 1 .
\end{aligned}
\]

The objective function of modularity then becomes

\[
\frac{1}{2m} \sum_{(u,v) \in V^2} \left( E_{uv} - \frac{\deg(u)\deg(v)}{2m} \right) X_{uv} ,
\]

with

\[
E_{uv} = \begin{cases} 1, & \text{if } (u,v) \in E \\ 0, & \text{otherwise.} \end{cases}
\]

Note that this ILP can be simplified by pruning redundant variables and constraints, leaving only $\binom{n}{2}$ variables and $\binom{n}{3}$ constraints.
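To make the ILP formulation concrete, the sketch below evaluates the objective for a given 0/1 assignment and checks the reflexivity, symmetry and transitivity constraints. It does not solve the ILP; no solver is assumed. The helper names (`relation_from_clustering`, `is_equivalence`, `ilp_objective`) and the example graph are hypothetical and only serve to show that an assignment induced by a clustering is feasible and that its objective value agrees with Equation (2).

```python
from fractions import Fraction
from itertools import product


def relation_from_clustering(nodes, clustering):
    """X[u, v] = 1 iff u and v lie in the same cluster."""
    cluster_of = {v: idx for idx, cluster in enumerate(clustering) for v in cluster}
    return {(u, v): int(cluster_of[u] == cluster_of[v])
            for u, v in product(nodes, repeat=2)}


def is_equivalence(nodes, X):
    """Check the reflexivity, symmetry and transitivity constraints."""
    reflexive = all(X[u, u] == 1 for u in nodes)
    symmetric = all(X[u, v] == X[v, u] for u, v in product(nodes, repeat=2))
    # Iterating over all ordered triples covers the three inequalities
    # stated per triple, since they are permutations of one another.
    transitive = all(X[u, v] + X[v, w] - 2 * X[u, w] <= 1
                     for u, v, w in product(nodes, repeat=3))
    return reflexive and symmetric and transitive


def ilp_objective(nodes, edges, X):
    """(1/2m) * sum over (u,v) in V^2 of (E_uv - deg(u)deg(v)/2m) * X_uv."""
    m = len(edges)
    deg = {v: 0 for v in nodes}
    adjacent = set()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
        adjacent.add((u, v))
        adjacent.add((v, u))
    total = Fraction(0)
    for u, v in product(nodes, repeat=2):
        e_uv = 1 if (u, v) in adjacent else 0
        total += (e_uv - Fraction(deg[u] * deg[v], 2 * m)) * X[u, v]
    return total / (2 * m)


# Hypothetical example: two triangles joined by a single edge.
nodes = list(range(6))
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
X = relation_from_clustering(nodes, [{0, 1, 2}, {3, 4, 5}])
print(is_equivalence(nodes, X), ilp_objective(nodes, edges, X))  # True 5/14
```

The value 5/14 coincides with what Equation (2) yields for the same clustering, because each intra-cluster edge is counted once per ordering in the sum over $V^2$.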

III. FUNDAMENTAL OBSERVATIONS

In the following, we identify basic structural properties that

clusterings with maximum modularity fulfill. We first focus on

the range of modularity, for which Lemma 3.1 gives the lower

and upper bound.

Lemma 3.1: Let G be an undirected and unweighted graph

and C ∈ A(G). Then −1/2 ≤ q(C) ≤ 1 holds.

Proof: Let $m_i = |E(C)|$ be the number of edges inside cluster $C$ and $m_e = \sum_{C \neq C' \in \mathcal{C}} |E(C,C')|$ be the number of edges having exactly one end-node in $C$. Then the contribution of $C$ to $q(\mathcal{C})$ is:

\[
\frac{m_i}{m} - \left( \frac{m_i}{m} + \frac{m_e}{2m} \right)^{2} .
\]

This expression is strictly decreasing in $m_e$ and, when varying $m_i$, the only maximum point is at $m_i = (m - m_e)/2$. The contribution of a cluster is minimized when $m_i$ is zero and $m_e$ is as large as possible. Suppose now $m_i = 0$. Using the inequality $(a + b)^2 \ge a^2 + b^2$ for all non-negative numbers a and b, modularity has a minimum score for two clusters where all edges are inter-cluster edges. The upper bound is obvious from our reformulation in Equation (2), and has been observed previously [2], [3], [15]. It can only be actually attained in the specific case of a graph with no edges, where coverage is defined to be 1.

As a result, any bipartite graph $K_{a,b}$ with the canonic clustering $\mathcal{C} = \{C_a, C_b\}$ yields the minimum modularity of −1/2. The following four results characterize the structure of a clustering with maximum modularity.

Corollary 3.2: Isolated nodes have no impact on modularity.

Corollary 3.2 directly follows from the fact that modularity

depends on edges and degrees, thus, an isolated node does not

contribute, regardless of its association to a cluster. Therefore, we

exclude isolated nodes from further consideration in this work,

i.e., all nodes are assumed to be of degree greater than zero.

Lemma 3.3: A clustering with maximum modularity has no

cluster that consists of a single node with degree 1.

Proof: Suppose for contradiction that there is a clustering $\mathcal{C}$ with a cluster $C_v = \{v\}$ and $\deg(v) = 1$. Consider a cluster $C_u$ that contains the neighbor node u. Suppose there are $m_i$ intra-cluster edges in $C_u$ and $m_e$ inter-cluster edges connecting $C_u$ to other clusters. Together these clusters add

\[
\frac{m_i}{m} - \frac{(2m_i + m_e)^2 + 1}{4m^2}
\]

to $q(\mathcal{C})$. Merging $C_v$ with $C_u$ results in a new contribution of

\[
\frac{m_i + 1}{m} - \frac{(2m_i + m_e + 1)^2}{4m^2} .
\]

The merge yields an increase of

\[
\frac{1}{m} - \frac{2m_i + m_e}{2m^2} > 0
\]

in modularity, because $m_i + m_e \le m$ and $m_e \ge 1$. This proves the lemma.
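The arithmetic behind Lemma 3.3 can be checked with a few lines of exact rational arithmetic. The sketch below is only an illustration: the function name `merge_gain` and the parameter ranges in the loop are assumptions for this example. It compares the contributions before and after the merge and confirms that their difference equals $1/m - (2m_i + m_e)/(2m^2)$.

```python
from fractions import Fraction


def merge_gain(m, m_i, m_e):
    """Modularity gain of merging a degree-1 singleton {v} into the
    cluster C_u of its neighbor (m_i intra-, m_e inter-cluster edges)."""
    before = Fraction(m_i, m) - Fraction((2 * m_i + m_e) ** 2 + 1, 4 * m * m)
    after = Fraction(m_i + 1, m) - Fraction((2 * m_i + m_e + 1) ** 2, 4 * m * m)
    closed_form = Fraction(1, m) - Fraction(2 * m_i + m_e, 2 * m * m)
    # The difference of the two contributions matches the closed form.
    assert after - before == closed_form
    return after - before


# The gain is positive whenever m_i + m_e <= m and m_e >= 1.
for m in range(1, 8):
    for m_i in range(0, m):
        for m_e in range(1, m - m_i + 1):
            assert merge_gain(m, m_i, m_e) > 0

print(merge_gain(1, 0, 1))  # 1/2
```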

Lemma 3.4: There is always a clustering with maximum mod-

ularity, in which each cluster consists of a connected subgraph.

Proof: Consider for contradiction a clustering $\mathcal{C}$ with a cluster $C$ of $m_i$ intra- and $m_e$ inter-cluster edges that consists of a set of more than one connected subgraph. The subgraphs in $C$ do not have to be disconnected in G, they are only disconnected when we consider the edges $E(C)$. Cluster $C$ adds

\[
\frac{m_i}{m} - \frac{(2m_i + m_e)^2}{4m^2}
\]

to $q(\mathcal{C})$. Now suppose we create a new clustering $\mathcal{C}'$ by splitting $C$ into two new clusters. Let one cluster $C_v$ consist of the component including node v, i.e. all nodes which can be reached from a node v with a path running only through nodes of $C$, i.e. $C_v = \bigcup_{i=1}^{\infty} C_v^i$, where $C_v^i = \{w \mid \exists (w,w_i) \in E(C) \text{ with } w_i \in C_v^{i-1}\}$ and $C_v^0 = \{v\}$. The other nonempty cluster is given by $C - C_v$. Let $C_v$ have $m_i^v$ intra- and $m_e^v$ inter-cluster edges. Together the new clusters add

\[
\frac{m_i}{m} - \frac{(2m_i^v + m_e^v)^2 + \bigl(2(m_i - m_i^v) + m_e - m_e^v\bigr)^2}{4m^2}
\]

to $q(\mathcal{C}')$. For $a,b \ge 0$ obviously $a^2 + b^2 \le (a + b)^2$, and hence $q(\mathcal{C}') \ge q(\mathcal{C})$.

Corollary 3.5: A clustering of maximum modularity does not include disconnected clusters.

Corollary 3.5 directly follows from Lemma 3.4 and from the exclusion of isolated nodes. Thus, the search for an optimum can be restricted to clusterings, in which clusters are connected subgraphs and there are no clusters consisting of nodes with degree 1.

A. Counterintuitive Behavior

In the last section, we listed some intuitive properties like

connectivity within clusters for clusterings of maximum modular-

ity. However, due to the enforced balance between coverage and

the sums of squared cluster degrees, counter-intuitive situations arise. These are non-locality, scaling behavior, and sensitivity to satellites.

Fig. 1. (a,b) Non-local behavior; (c) a clique K3 with leaves; (d) scaling behavior. Clusters are represented by colours.

a) Non-Locality: At first glance, modularity seems to be

a local quality measure. Recalling Equation (2), each cluster

contributes separately. However, the example presented in Fig-

ures 1(a) and 1(b) exhibits typical non-local behavior. In these

figures, clusters are represented by color. By adding an additional

node connected to the leftmost node, the optimal clustering is

altered completely. According to Lemma 3.3 the additional node

has to be clustered together with the leftmost node. This leads to

a shift of the rightmost black node from the black cluster to the

white cluster, although locally its neighborhood structure has not

changed.

b) Sensitivity to Satellites: A clique with leaves is a graph

of 2n nodes that consists of a clique Kn and n leaf nodes of

degree one, such that each node of the clique is connected to

exactly one leaf node. For a clique we show in Section VI that

the trivial clustering with k = 1 has maximum modularity. For

a clique with leaves, however, the optimal clustering changes to

k = n clusters, in which each cluster consists of a connected pair

of leaf and clique nodes. Figure 1(c) shows an example.
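For very small graphs the sensitivity to satellites can be reproduced by exhaustive search. The sketch below enumerates all set partitions of the node set and picks one of maximum modularity; this brute-force enumeration is an assumption of the example (it is neither the ILP nor any heuristic discussed in the text) and is only feasible for a handful of nodes. All function names and the node numbering (clique nodes 0..n-1, leaf n+i attached to clique node i) are hypothetical.

```python
from fractions import Fraction


def modularity(edges, clustering):
    """Modularity via Equation (2); clustering is a list of node lists."""
    m = len(edges)
    cluster_of = {v: idx for idx, cluster in enumerate(clustering) for v in cluster}
    intra = [0] * len(clustering)
    degree_sum = [0] * len(clustering)
    for u, v in edges:
        degree_sum[cluster_of[u]] += 1
        degree_sum[cluster_of[v]] += 1
        if cluster_of[u] == cluster_of[v]:
            intra[cluster_of[u]] += 1
    return sum(Fraction(intra[c], m) - Fraction(degree_sum[c], 2 * m) ** 2
               for c in range(len(clustering)))


def partitions(items):
    """Yield every partition of the list `items` as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for smaller in partitions(rest):
        # Put `first` into each existing block in turn ...
        for i in range(len(smaller)):
            yield smaller[:i] + [[first] + smaller[i]] + smaller[i + 1:]
        # ... or into a new block of its own.
        yield [[first]] + smaller


def clique_edges(n):
    return [(i, j) for i in range(n) for j in range(i + 1, n)]


def clique_with_leaves(n):
    """K_n on nodes 0..n-1 plus one leaf n+i attached to clique node i."""
    return clique_edges(n) + [(i, n + i) for i in range(n)]


def best_clustering(nodes, edges):
    return max(partitions(nodes), key=lambda p: modularity(edges, p))


# Plain clique K_3: the trivial clustering with k = 1 is optimal.
best = best_clustering(list(range(3)), clique_edges(3))
print(len(best), modularity(clique_edges(3), best))          # 1 0

# K_3 with leaves: three clusters, each a clique node with its leaf.
edges = clique_with_leaves(3)
best = best_clustering(list(range(6)), edges)
print(sorted(sorted(c) for c in best), modularity(edges, best))
# [[0, 3], [1, 4], [2, 5]] 1/6
```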

c) Scaling Behavior: Figures 1(c) and 1(d) display the

scaling behavior of modularity. By simply doubling the graph pre-

sented in Figure 1(c), the optimal clustering is altered completely.

While in Figure 1(c) we obtain three clusters each consisting of

the minor K2, the clustering with maximum modularity of the

graph in Figure 1(d) consists of two clusters, each being a graph

equal to the one in Figure 1(c).

This behavior is in line with the previous observations in [2],

[4] that size and structure of clusters in the optimum clustering

depend on the total number of links in the network. Hence,

clusters that are identified in smaller graphs might be combined

to a larger cluster in an optimum clustering of a larger graph.

The formulation of Equation (2) mathematically explains this

observation as modularity optimization strives to optimize the

trade-off between coverage and degree sums. This provides a

rigorous understanding of the observations made in [2], [4].

IV. NP-COMPLETENESS

It has been conjectured that maximizing modularity is hard [8],

but no formal proof was provided to date. We next show that

the decision version of modularity maximization is indeed NP-

complete.

Fig. 2. An example graph G(A) for the instance A = {2,2,2,2,3,3} of 3-PARTITION. Node labels indicate the corresponding numbers $a_i \in A$.

Problem 1 (MODULARITY): Given a graph G and a number

K, is there a clustering C of G, for which q(C) ≥ K?

Note that we may ignore the fact that, in principle, K could

be a real number in the range [−1/2,1], because $4m^2 \cdot q(\mathcal{C})$ is

integer for every partition C of G and polynomially bounded in

the size of G. Our hardness result for MODULARITY is based on

a transformation from the following decision problem.

Problem 2 (3-PARTITION): Given 3k positive integer numbers $a_1,\dots,a_{3k}$ such that the sum $\sum_{i=1}^{3k} a_i = kb$ and $b/4 < a_i < b/2$ for an integer b and for all $i = 1,\dots,3k$, is there a partition of these numbers into k sets, such that the numbers in each set sum up to b?

We show that an instance $A = \{a_1,\dots,a_{3k}\}$ of 3-PARTITION can be transformed into an instance (G(A),K(A)) of MODULARITY, such that G(A) has a clustering with modularity at least K(A), if and only if $a_1,\dots,a_{3k}$ can be partitioned into k sets of sum $b = 1/k \cdot \sum_{i=1}^{3k} a_i$ each.

It is crucial that 3-PARTITION is strongly NP-complete [16], i.e. the problem remains NP-complete even if the input is represented in unary coding. This implies that no algorithm can decide the problem in time polynomial even in the sum of the input values, unless P = NP. More importantly, it implies that our transformation need only be pseudo-polynomial.

The reduction is defined as follows. Given an instance A of 3-PARTITION, construct a graph G(A) with k cliques (completely connected subgraphs) $H_1,\dots,H_k$ of size $a = \sum_{i=1}^{3k} a_i$ each. For each element $a_i \in A$ we introduce a single element node, and connect it to $a_i$ nodes in each of the k cliques in such a way that each clique member is connected to exactly one element node. It is easy to see that each clique node then has degree a and the element node corresponding to element $a_i \in A$ has degree $k a_i$. The number of edges in G(A) is $m = k/2 \cdot a(a + 1)$. See Figure 2 for an example. Note that the size of G(A) is polynomial in the unary coding size of A, so that our transformation is indeed pseudo-polynomial.

Before specifying bound K(A) for the instance of MODULARITY, we will show three properties of maximum modularity clusterings of G(A). Together these properties establish the desired characterization of solutions for 3-PARTITION by solutions for MODULARITY.
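The construction of G(A) used in the reduction can be sketched in a few lines. The function name `build_reduction_graph`, the node labels `("H", h, p)` for clique members and `("e", i)` for element nodes, and the particular order in which clique members are assigned to element nodes are assumptions of this example; the text only requires that every clique member is attached to exactly one element node. The sketch uses the instance A = {2,2,2,2,3,3} of Figure 2 and checks the stated degrees and the edge count $m = k/2 \cdot a(a+1)$.

```python
def build_reduction_graph(A):
    """Build the edge list of G(A) for a 3-PARTITION instance A.

    Returns (k, a, edges), where k is the number of cliques and a the
    clique size (the sum of all numbers in A).
    """
    if len(A) % 3 != 0:
        raise ValueError("a 3-PARTITION instance has 3k numbers")
    k = len(A) // 3
    a = sum(A)
    edges = []
    for h in range(k):
        members = [("H", h, p) for p in range(a)]
        # Clique H_h: all pairs of its a members are adjacent.
        edges += [(members[p], members[q])
                  for p in range(a) for q in range(p + 1, a)]
        # Element node i is attached to a_i members of every clique, and
        # each clique member receives exactly one element neighbor.
        pos = 0
        for i, a_i in enumerate(A):
            for _ in range(a_i):
                edges.append((("e", i), members[pos]))
                pos += 1
    return k, a, edges


def degrees(edges):
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg


A = [2, 2, 2, 2, 3, 3]
k, a, edges = build_reduction_graph(A)
deg = degrees(edges)
print(k, a, len(edges))                      # 2 14 210
print(len(edges) == k * a * (a + 1) // 2)    # True
print(all(d == a for node, d in deg.items() if node[0] == "H"))   # True
print(all(deg[("e", i)] == k * a_i for i, a_i in enumerate(A)))   # True
```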

Lemma 4.1: In a maximum modularity clustering of G(A),

none of the cliques $H_1,\dots,H_k$ is split.

We prove the lemma by showing that every clustering that violates

the above condition can be modified in order to strictly improve

modularity.

Proof: We consider a clustering $\mathcal{C}$ that splits a clique $H \in \{H_1,\dots,H_k\}$ into different clusters and then show how to obtain a clustering with strictly higher modularity. Suppose that $C_1,\dots,C_r \in \mathcal{C}$, $r > 1$, are the clusters that contain nodes of $H$. For $i = 1,\dots,r$ we denote by $n_i$ the number of nodes of $H$ contained in cluster $C_i$, by $m_i = |E(C_i)|$ the number of edges between nodes in $C_i$, by $f_i$ the number of edges between nodes of $H$ in $C_i$ and element nodes in $C_i$, and by $d_i$ the sum of degrees of all nodes in $C_i$. The contribution of $C_1,\dots,C_r$ to $q(\mathcal{C})$ is

\[
\frac{1}{m} \sum_{i=1}^{r} m_i - \frac{1}{4m^2} \sum_{i=1}^{r} d_i^2 .
\]

Now suppose we create a clustering $\mathcal{C}'$ by rearranging the nodes in $C_1,\dots,C_r$ into clusters $C', C'_1,\dots,C'_r$, such that $C'$ contains exactly the nodes of clique $H$, and each $C'_i$, $1 \le i \le r$, the remaining elements of $C_i$ (if any). In this new clustering the number of covered edges reduces by $\sum_{i=1}^{r} f_i$, because all nodes from $H$ are removed from the clusters $C'_i$. This labels the edges connecting the clique nodes to other non-clique nodes of $C_i$ as inter-cluster edges. For $H$ itself there are $\sum_{i=1}^{r} \sum_{j=i+1}^{r} n_i n_j$ edges that are now additionally covered due to the creation of cluster $C'$. In terms of degrees the new cluster $C'$ contains a nodes of degree a. The sums for the remaining clusters $C'_i$ are reduced by the degrees of the clique nodes, as these nodes are now in $C'$. So the contribution of these clusters to $q(\mathcal{C}')$ is given by

\[
\frac{1}{m} \sum_{i=1}^{r} \Bigl( m_i + \sum_{j=i+1}^{r} n_i n_j - f_i \Bigr) - \frac{1}{4m^2} \Bigl( a^4 + \sum_{i=1}^{r} (d_i - n_i a)^2 \Bigr) .
\]

Setting $\Delta := q(\mathcal{C}') - q(\mathcal{C})$, we obtain

\begin{align*}
\Delta &= \frac{1}{m} \sum_{i=1}^{r} \Bigl( \sum_{j=i+1}^{r} n_i n_j - f_i \Bigr) + \frac{1}{4m^2} \Bigl( \sum_{i=1}^{r} \bigl( 2 d_i n_i a - n_i^2 a^2 \bigr) - a^4 \Bigr) \\
&= \frac{1}{4m^2} \Bigl( 4m \sum_{i=1}^{r} \sum_{j=i+1}^{r} n_i n_j - 4m \sum_{i=1}^{r} f_i + \sum_{i=1}^{r} n_i \bigl( 2 d_i a - n_i a^2 \bigr) - a^4 \Bigr) .
\end{align*}

Using the equation that $2 \sum_{i=1}^{r} \sum_{j=i+1}^{r} n_i n_j = \sum_{i=1}^{r} \sum_{j \neq i} n_i n_j$, substituting $m = \frac{k}{2} a(a + 1)$ and rearranging terms we get

\begin{align*}
\Delta &= \frac{a}{4m^2} \Bigl( -a^3 - 2k(a + 1) \sum_{i=1}^{r} f_i + \sum_{i=1}^{r} n_i \Bigl( 2 d_i - n_i a + k(a + 1) \sum_{j \neq i} n_j \Bigr) \Bigr) \\
&\ge \frac{a}{4m^2} \Bigl( -a^3 - 2k(a + 1) \sum_{i=1}^{r} f_i + \sum_{i=1}^{r} n_i \Bigl( n_i a + 2k f_i + k(a + 1) \sum_{j \neq i} n_j \Bigr) \Bigr) .
\end{align*}

For the last inequality we use the fact that $d_i \ge n_i a + k f_i$. This inequality holds because $C_i$ contains at least the $n_i$ nodes of degree a from the clique $H$. In addition, it contains both the clique and element nodes for each edge counted in $f_i$. For each such edge there are k − 1 other edges connecting the element node to the k − 1 other cliques. Hence, we get a contribution of $k f_i$ in the degrees of the element nodes. Combining the terms $n_i$ and one of the terms $\sum_{j \neq i} n_j$ we obtain

\begin{align*}
\Delta &\ge \frac{a}{4m^2} \Bigl( -a^3 - 2k(a + 1) \sum_{i=1}^{r} f_i + \sum_{i=1}^{r} n_i \Bigl( a \sum_{j=1}^{r} n_j + 2k f_i + \bigl( (k - 1)a + k \bigr) \sum_{j \neq i} n_j \Bigr) \Bigr) \\
&= \frac{a}{4m^2} \Bigl( -2k(a + 1) \sum_{i=1}^{r} f_i + \sum_{i=1}^{r} n_i \Bigl( 2k f_i + \bigl( (k - 1)a + k \bigr) \sum_{j \neq i} n_j \Bigr) \Bigr) \\
&= \frac{a}{4m^2} \Bigl( \sum_{i=1}^{r} 2k f_i (n_i - a - 1) + \bigl( (k - 1)a + k \bigr) \sum_{i=1}^{r} \sum_{j \neq i} n_i n_j \Bigr) \\
&\ge \frac{a}{4m^2} \Bigl( \sum_{i=1}^{r} 2k n_i (n_i - a - 1) + \bigl( (k - 1)a + k \bigr) \sum_{i=1}^{r} \sum_{j \neq i} n_i n_j \Bigr) .
\end{align*}

For the last step we note that $n_i \le a - 1$ and $n_i - a - 1 < 0$ for all $i = 1,\dots,r$. So increasing $f_i$ decreases the modularity difference. For each node of $H$ there is at most one edge to a node not in $H$, and thus $f_i \le n_i$.

By rearranging terms and using the inequality $a \ge 3k$ we get

\begin{align*}
\Delta &\ge \frac{a}{4m^2} \sum_{i=1}^{r} n_i \Bigl( 2k(n_i - a - 1) + \bigl( (k - 1)a + k \bigr) \sum_{j \neq i} n_j \Bigr) \\
&= \frac{a}{4m^2} \sum_{i=1}^{r} n_i \Bigl( -2k + \bigl( (k - 1)a - k \bigr) \sum_{j \neq i} n_j \Bigr) \\
&\ge \frac{a}{4m^2} \bigl( (k - 1)a - 3k \bigr) \sum_{i=1}^{r} \sum_{j \neq i} n_i n_j \\
&\ge \frac{3k^2}{4m^2} (3k - 6) \sum_{i=1}^{r} \sum_{j \neq i} n_i n_j .
\end{align*}

As we can assume k > 2 for all relevant instances of 3-PARTITION, we obtain $\Delta > 0$. This shows that any clustering can be improved by merging each clique completely into a cluster.

Next, we observe that the optimum clustering places at most one

clique completely into a single cluster.

Lemma 4.2: In a maximum modularity clustering of G(A),

every cluster contains at most one of the cliques H1,...,Hk.

Proof: Consider a maximum modularity clustering. Lemma 4.1 shows that each of the k cliques $H_1,\dots,H_k$ is entirely contained in one cluster. Assume that there is a cluster $C$ which contains at least two of the cliques. If $C$ does not contain any element nodes, then the cliques form disconnected components in the cluster. In this case it is easy to see that the clustering can be improved by splitting $C$ into distinct clusters, one for each clique. In this way we keep the number of edges within clusters the same; however, we reduce the squared degree sums of clusters.

Otherwise, we assume $C$ contains $l > 1$ cliques completely and in addition some element nodes of elements $a_j$ with $j \in J \subseteq \{1,\dots,3k\}$. Note that inside the l cliques $l a(a - 1)/2$ edges are covered. In addition, for every element node corresponding to an element $a_j$ there are $l a_j$ edges included. The degree sum of the cluster is given by the $la$ clique nodes of degree a and some number of element nodes of degree $k a_j$. The contribution of $C$ to $q(\mathcal{C})$ is thus given by

\[
\frac{1}{m} \Bigl( \frac{l}{2} a(a - 1) + l \sum_{j \in J} a_j \Bigr) - \frac{1}{4m^2} \Bigl( l a^2 + k \sum_{j \in J} a_j \Bigr)^{2} .
\]

Now suppose we create $\mathcal{C}'$ by splitting $C$ into $C'_1$ and $C'_2$ such that $C'_1$ completely contains a single clique $H$. This leaves the number of edges covered within the cliques the same; however, all edges from $H$ to the included element nodes drop out. The degree sum of $C'_1$ is exactly $a^2$, and so the contribution of $C'_1$ and $C'_2$ to $q(\mathcal{C}')$ is given by

\[
\frac{1}{m} \Bigl( \frac{l}{2} a(a - 1) + (l - 1) \sum_{j \in J} a_j \Bigr) - \frac{1}{4m^2} \Bigl( \Bigl( (l - 1) a^2 + k \sum_{j \in J} a_j \Bigr)^{2} + a^4 \Bigr) .
\]

Considering the difference we note that

\begin{align*}
q(\mathcal{C}') - q(\mathcal{C}) &= -\frac{1}{m} \sum_{j \in J} a_j + \frac{1}{4m^2} \Bigl( (2l - 1) a^4 + 2k a^2 \sum_{j \in J} a_j - a^4 \Bigr) \\
&= \frac{2(l - 1) a^4 + 2k a^2 \sum_{j \in J} a_j - 4m \sum_{j \in J} a_j}{4m^2} \\
&= \frac{2(l - 1) a^4 - 2k a \sum_{j \in J} a_j}{4m^2} \\
&\ge \frac{9k^3}{2m^2} (9k - 1) \\
&> 0,
\end{align*}

as k > 0 for all instances of 3-PARTITION.

Since the clustering is improved in every case, it is not optimal. This is a contradiction.
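The closed form of the difference in the proof of Lemma 4.2 can be verified numerically. In the sketch below, `split_gain` is a hypothetical helper, `S` stands for the sum $\sum_{j \in J} a_j$, and the parameter ranges in the loop are chosen for illustration only. The sketch recomputes both contributions with exact rationals, confirms that their difference equals $(2(l-1)a^4 - 2kaS)/(4m^2)$, and checks the lower bound $\frac{9k^3}{2m^2}(9k-1)$ under the assumptions $a \ge 3k$, $l \ge 2$ and $S \le a$.

```python
from fractions import Fraction


def split_gain(k, a, l, S):
    """q(C') - q(C) when one clique is split off a cluster that holds
    l cliques of size a plus element nodes whose values sum to S."""
    m = Fraction(k * a * (a + 1), 2)
    before = ((Fraction(l * a * (a - 1), 2) + l * S) / m
              - Fraction(l * a * a + k * S) ** 2 / (4 * m * m))
    after = ((Fraction(l * a * (a - 1), 2) + (l - 1) * S) / m
             - (Fraction((l - 1) * a * a + k * S) ** 2 + a ** 4) / (4 * m * m))
    closed_form = (2 * (l - 1) * a ** 4 - 2 * k * a * S) / (4 * m * m)
    # The difference of the contributions matches the closed form.
    assert after - before == closed_form
    return after - before


# Instance of Figure 2: k = 2 cliques of size a = 14, both in one cluster
# together with all element nodes (S = 14).
print(split_gain(2, 14, 2, 14) > 0)   # True

# Lower bound from the proof, for a >= 3k, l >= 2 and S <= a.
for k in range(2, 5):
    for a in range(3 * k, 3 * k + 4):
        m = Fraction(k * a * (a + 1), 2)
        bound = Fraction(9 * k ** 3 * (9 * k - 1)) / (2 * m * m)
        for l in range(2, k + 1):
            for S in range(0, a + 1):
                assert split_gain(k, a, l, S) >= bound > 0
```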

The previous two lemmas show that any clustering can be

strictly improved to a clustering that contains k clique clusters,

such that each one completely contains one of the cliques

$H_1,\dots,H_k$ (possibly plus some additional element nodes). In

particular, this must hold for the optimum clustering as well. Now

that we know how the cliques are clustered we turn to the element

nodes.

As they are not directly connected, it is never optimal to create a

cluster consisting only of element nodes. Splitting such a cluster

into singleton clusters, one for each element node, reduces the

squared degree sums but keeps the edge coverage at the same

value. Hence, such a split yields a clustering with strictly higher

modularity. The next lemma shows that we can further strictly

improve the modularity of a clustering with a singleton cluster of

an element node by joining it with one of the clique clusters.

Lemma 4.3: In a maximum modularity clustering of G(A),

there is no cluster composed of element nodes only.

Proof: Consider a clustering $\mathcal{C}$ of maximum modularity and suppose that there is an element node $v_i$ corresponding to the element $a_i$, which is not part of any clique cluster. As argued above we can improve such a clustering by creating a singleton cluster $C = \{v_i\}$. Suppose $C_{\min}$ is the clique cluster for which the sum of degrees is minimal. We know that $C_{\min}$ contains all nodes from a clique $H$ and possibly some other element nodes for elements $a_j$ with $j \in J$ for some index set $J$. The cluster $C_{\min}$ covers all $a(a - 1)/2$ edges within $H$ and $\sum_{j \in J} a_j$ edges to element nodes. The degree sum is $a^2$ for clique nodes and $k \sum_{j \in J} a_j$ for element nodes. As $C$ is a singleton cluster, it covers no edges and the degree sum is $k a_i$. This yields a contribution of $C$ and $C_{\min}$ to $q(\mathcal{C})$ of

\[
\frac{1}{m} \Bigl( \frac{a(a - 1)}{2} + \sum_{j \in J} a_j \Bigr) - \frac{1}{4m^2} \Bigl( \Bigl( a^2 + k \sum_{j \in J} a_j \Bigr)^{2} + k^2 a_i^2 \Bigr) .
\]

Again, we create a different clustering $\mathcal{C}'$ by joining $C$ and $C_{\min}$ to a new cluster $C'$. This increases the edge coverage by $a_i$. The new cluster $C'$ has the sum of degrees of both previous clusters. The contribution of $C'$ to $q(\mathcal{C}')$ is given by

\[
\frac{1}{m} \Bigl( \frac{a(a - 1)}{2} + a_i + \sum_{j \in J} a_j \Bigr) - \frac{1}{4m^2} \Bigl( a^2 + k a_i + k \sum_{j \in J} a_j \Bigr)^{2} ,
\]