All content in this area was uploaded by Ilias Gialampoukidis on Nov 29, 2020

Community Detection in Complex Networks Based on DBSCAN* and a Martingale Process

Ilias Gialampoukidis, Theodora Tsikrika, Stefanos Vrochidis and Yiannis Kompatsiaris

Information Technologies Institute

Centre for Research and Technology Hellas

Email: {heliasgj, theodora.tsikrika, stefanos, ikom}@iti.gr

Abstract: Community detection is a valuable tool for analyzing complex networks. This work investigates the community detection problem based on the density-based algorithm DBSCAN*. This algorithm, however, requires a lower bound for the community size to be determined a priori, which is a challenging task. To this end, this work proposes the application of a Martingale process to DBSCAN* that progressively detects communities at various levels of granularity. The proposed DBSCAN*-Martingale community detection algorithm corresponds to an iterative process that progressively lowers the threshold of the size of the acceptable communities, while maintaining the communities detected for higher thresholds. Evaluation experiments are performed on four realistic benchmark networks, and the results indicate that the proposed DBSCAN*-Martingale community detection algorithm is more effective than several state-of-the-art community detection approaches in terms of the Normalized Mutual Information and RAND metrics.

I. INTRODUCTION

Community detection in complex networks aims to identify groups of nodes that are more densely connected to each other than to the rest of the network [1] and thus probably share common properties and/or play similar roles within the network [2]. The detection of the community structure of networks is of great importance in many fields, including sociology and biology [3], as well as computer science [4], i.e. disciplines where systems are often represented as networks. More recently, there has been increasing interest in detecting communities on the Web [5] and social media [1] so as to both gain valuable insights into the particular characteristics and latent phenomena in such networks, and also to exploit the detected communities in various applications, such as the detection of events in social media streams.

Detecting communities in complex networks is also known as a graph partition problem, given that networks are usually modelled as graphs. A graph can be split into communities in numerous ways, i.e. for each graph there are many possible community structures. In the simple case, a community structure is defined as a partition of the graph into a set of node sets.

Several community detection algorithms have been proposed (e.g. [2], [6], [7], [3], [8], [9], [10], [11]). The quality of their results is often evaluated by the use of modularity [4], particularly in the absence of appropriate ground truth. Hence, several approaches use modularity optimization itself as a method for the detection of communities in complex networks [2]. As an alternative to the maximization of modularity, the minimization of the so-called codelength description, i.e. the minimum Shannon information needed to describe a random walk on the network, has also played a key role in revealing community structure [11]. However, none of these approaches is able to identify noise, i.e. nodes that are not members of any community. To address this issue, density-based community detection approaches are more appropriate, since they support leaving spuriously connected nodes (i.e. noise) out of the detected community structure.

DBSCAN* [12], the graph analogue of the well-established DBSCAN [13] algorithm, is such a density-based approach that could be applied to community detection. Similarly to DBSCAN, it relies on two parameters, the density level $\epsilon$ and a lower bound $MinPts$ for the number of nodes that may form a community. Both these parameters greatly affect the output of the algorithm, but their estimation is far from trivial. To address this issue, and in particular the estimation of the $MinPts$ parameter, this work proposes an extension to DBSCAN* based on Doob's Martingale [14], which involves the construction of a Martingale that progressively gains knowledge about the communities in the network based on an iterative application of DBSCAN* for several values of $MinPts$.

The main contributions of this work are three-fold: (i) the application of DBSCAN* to the community detection problem, (ii) the proposal of a Martingale process for community detection based on DBSCAN*, and (iii) the experimental evaluation of the proposed DBSCAN*-Martingale community detection algorithm against several state-of-the-art community detection approaches by using four realistic benchmark networks [15].

The proposed DBSCAN*-Martingale community detection algorithm is presented in Section III and its experimental evaluation is reported in Section IV. First, though, the state of the art in community detection is discussed next.

II. RELATED WORK

A large number of community detection algorithms have appeared in the literature (e.g. [2], [6], [7]), but only a few of them are large-scale algorithms that are directly applicable to large social media graphs, as reviewed in [1].

The Girvan-Newman community detection algorithm [3], [4] is a divisive hierarchical process based on the edge betweenness centrality measure, which can be computed efficiently [16]. The edge betweenness of an edge is the number of shortest paths that pass through it, and it identifies the edges that are most likely to connect different communities. The edge with the highest edge betweenness is removed and the edge betweenness scores of the remaining edges are recomputed. The process generates a dendrogram whose root is the whole graph and whose leaves are the graph vertices. In order to extract the detected communities, the modularity score is computed at each dendrogram cut and the cut that maximizes it is selected; the Girvan-Newman algorithm thus requires the maximization of a modularity function as a stopping criterion. An alternative hierarchical approach for community detection has been proposed [17], using the modularity function as the objective function to optimize. Initially, all vertices are separate communities and any two communities are merged if the modularity increases. The algorithm stops when the modularity no longer increases.

This is a draft version of the paper. The final version will appear at IEEEXplore: 978-1-5090-5246-2/16/$31.00 ©2016 IEEE
In: Proceedings of the 11th International Workshop on Semantic and Social Media Adaptation and Personalization

In the Label Propagation method [8], every node is initialized with a unique label and at every step each node adopts the label that most of its neighbors currently have. Hence, an iterative process is defined, in which the densely connected groups of nodes form a consensus on a unique label and communities are extracted.

The Louvain method [9] is based on the maximization of the modularity and involves two phases that are repeated iteratively. In the first phase, each vertex forms a community and for each vertex $i$ the gain of modularity is calculated for removing vertex $i$ from its own community and placing it into the community of each neighbor $j$ of $i$. The vertex $i$ is moved to the community for which the gain in modularity becomes maximal. In case the modularity decreases or remains the same, vertex $i$ does not change community. The first phase is completed when the modularity cannot be further increased. In the second phase, the detected communities form a new network, in which the weight of the link between two new nodes is the sum of the weights of the links between nodes in the corresponding two communities. In this new network, self-loops are allowed, representing links between vertices of the same community. At the end of the second phase, the first phase is re-applied to the new network, until no more communities are merged and the modularity attains its maximum.
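Since modularity drives both the fast greedy and the Louvain methods, a small sketch of how it can be computed may be useful. The function below evaluates the standard modularity of a partition, $Q = \sum_c [e_c/m - (d_c/2m)^2]$, for an undirected, unweighted graph, where $e_c$ is the number of edges inside community $c$, $d_c$ the total degree of its nodes and $m$ the number of edges. The function name, the edge-list input format and the toy graph are illustrative assumptions and are not taken from the implementations used in this paper.

```python
from collections import defaultdict


def modularity(edges, community):
    """Modularity of a partition of an undirected, unweighted graph.

    edges:     list of (u, v) pairs, each undirected edge listed once
               (assumption: no self-loops, no duplicate edges).
    community: dict mapping every node to its community ID.
    """
    m = len(edges)
    if m == 0:
        return 0.0
    internal = defaultdict(int)  # e_c: edges with both endpoints in c
    degree = defaultdict(int)    # d_c: sum of the degrees of the nodes in c
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree[cu] += 1
        degree[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Toy graph (hypothetical): two triangles joined by the single edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
two_communities = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_community = {node: "a" for node in range(6)}

q_two = modularity(edges, two_communities)
q_one = modularity(edges, one_community)
```

On this toy graph, splitting the two triangles yields a positive modularity, whereas placing all nodes in one community yields zero, which is why modularity-based methods prefer the split.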

The Walktrap method [10] generates random short walks

on the graph by simulating transitions from one node to

another. Since short random walks tend to stay within the

same community, it is possible to detect communities using

such random walks.

The Infomap method [11], [18], [19] is an information-theoretic approach for community detection. The inventors of the Infomap method showed that the problem of finding a community structure in networks is equivalent to solving a coding problem. In general, the goal of a coding problem is to minimize the information required for the transmission of a message. Initially, Infomap employs the Huffman code [20] in order to give a unique name (codeword) to every node in the network. In contrast to the Louvain method, which maximizes modularity, Infomap minimizes the Shannon information [20] required to describe the trajectory of a random walk on the network. The objective function, i.e. the description length of a random walk on the network (described by the corresponding sequence of codewords of the visited nodes), is called the map equation [11], [18], [19], and is minimized over all possible network partitions.

DBSCAN [13] is a density-based clustering algorithm, which is able to extract clusters without knowing the number of clusters, even in the case where there is noise in the spatial collection of points. The clustering is based on two parameters, $\epsilon$ and $MinPts$, which are determined by the desired density level and a lower bound for the number of points in a cluster, respectively. The estimation of the density level, however, is not a trivial task and several approaches have been proposed to extract clusters, using DBSCAN, without determining the parameter $\epsilon$, such as the DBSCAN-Martingale [21]. The graph analogue of DBSCAN is called DBSCAN* [12] and defines core objects on a graph, in a way similar to the core points of DBSCAN. The transition from density-based clustering of spatial databases to community detection in graphs through DBSCAN* does not involve border points, due to the “updated” definition of reachability.

III. DBSCAN*-MARTINGALE COMMUNITY DETECTION

A. Notation and Preliminaries on DBSCAN* and Martingales

Given a network $G(N, E)$ with $N$ nodes and $E$ edges, density-based community detection algorithms partition the network into $k$ communities, where $N_c \subseteq N$ of the nodes belong to the detected communities, while the $N \setminus N_c$ nodes that were not assigned to any of the communities are labeled as “noise”. The output of such algorithms corresponds to an $N$-dimensional vector $C$. For each node $n_j$, $j = 1, 2, \ldots, N$, the $j$-th element of $C$, denoted as $C[j]$, is assigned the ID in $\{1, 2, \ldots, k\}$ of the community to which the node $n_j$ belongs; if a node does not belong to any of the communities, the value $0$ is assigned instead. As a result, the communities vector $C$ is an $N$-dimensional vector with values in $\{0, 1, 2, \ldots, k\}$.

DBSCAN* relies on two parameters, the density level $\epsilon$ and the minimum number $MinPts$ of nodes that can form a community. We denote the communities vector provided by DBSCAN* as $C_{DBSCAN^*}(\epsilon, MinPts)$. As this work considers that the parameter $\epsilon$ is fixed, the communities vector is denoted as $C_{DBSCAN^*}(MinPts)$. High values of $MinPts$ typically result in a $C_{DBSCAN^*}(MinPts)$ vector of zeros, i.e. all nodes are marked as noise, since the algorithm fails to detect communities required to have at least $MinPts$ nodes. On the other hand, low values of $MinPts$ result in a single community and thus the partitioning is trivial.
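The effect of $MinPts$ described above can be reproduced with a minimal sketch of a DBSCAN*-style procedure on a graph. The details are assumptions made for illustration, since they are not spelled out here: a node is treated as a core node when its closed $\epsilon$-hop neighbourhood (the node itself plus all nodes within $\epsilon$ hops) contains at least $MinPts$ nodes, communities are the connected groups of core nodes that lie within $\epsilon$ hops of each other, and all non-core nodes are noise (no border points). The function names and the toy graph are hypothetical.

```python
from collections import deque


def eps_neighborhood(adj, node, eps=1):
    """Nodes within `eps` hops of `node`, including `node` itself."""
    seen = {node}
    frontier = deque([(node, 0)])
    while frontier:
        current, dist = frontier.popleft()
        if dist == eps:
            continue  # do not expand beyond eps hops
        for neighbor in adj[current]:
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, dist + 1))
    return seen


def dbscan_star(adj, min_pts, eps=1):
    """Return the communities vector C (0 = noise) under the stated assumptions.

    adj: list of neighbor lists, node IDs are 0..N-1.
    """
    n = len(adj)
    hoods = [eps_neighborhood(adj, v, eps) for v in range(n)]
    core = [len(hood) >= min_pts for hood in hoods]  # assumed core condition
    C = [0] * n
    next_id = 0
    for start in range(n):
        if not core[start] or C[start] != 0:
            continue
        next_id += 1
        C[start] = next_id
        queue = deque([start])
        while queue:  # grow the community over mutually reachable core nodes
            v = queue.popleft()
            for w in hoods[v]:
                if core[w] and C[w] == 0:
                    C[w] = next_id
                    queue.append(w)
    return C


# Toy graph (hypothetical): two 4-cliques {0..3} and {4..7} bridged by node 8.
adj = [[1, 2, 3], [0, 2, 3], [0, 1, 3], [0, 1, 2, 8],
       [5, 6, 7, 8], [4, 6, 7], [4, 5, 7], [4, 5, 6], [3, 4]]

high = dbscan_star(adj, min_pts=6)  # threshold too high
mid = dbscan_star(adj, min_pts=4)   # separates the two cliques
low = dbscan_star(adj, min_pts=3)   # threshold too low
```

In this toy case the high threshold marks every node as noise, the intermediate one separates the two cliques and leaves the bridge node as noise, and the low threshold merges everything into a single community.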

The output of DBSCAN* strongly depends on the parameter $MinPts$. This is illustrated by the example depicted in Figure 1. Figure 1a shows the ground-truth communities as disconnected components for illustrative purposes. A high value of $MinPts$ ($MinPts > 13$) results in no communities being detected (Figure 1b). For $MinPts = 13$, two communities are detected (Figure 1c), while for $MinPts = 11$, two additional communities are detected (Figure 1d). Lower values of $MinPts$ result in the detection of further communities, but at the same time they merge communities that would have been detected as separate by higher values of $MinPts$.

Fig. 1. Community detection in a social network consisting of 650 nodes using DBSCAN* with $\epsilon = 1$ and various values of $MinPts$. Panels: (a) ground truth, (b) $MinPts > 13$, (c) $MinPts = 13$, (d) $MinPts = 11$.

This indicates that a single value of $MinPts$ may not suffice to detect all communities and motivates us to consider that an iterative process would be more appropriate for detecting communities in an effective manner. In particular, starting from high values of $MinPts$, so that the larger communities are detected, and progressively decreasing $MinPts$, so that further, smaller, communities are detected, would result in a set of communities that are detected based on different values of $MinPts$; this process would continue until a minimum acceptable threshold of community size is reached. To this end, we propose an extension of DBSCAN* based on Doob's Martingale, which allows for introducing a random variable $MinPts$ and involves the construction of a Martingale process, which progressively approaches the $C_{DBSCAN^*}(MinPts)$ vector that contains all communities.

A martingale is a stochastic process, i.e. a sequence of random variables $X_1, X_2, \ldots$, for which the expected future value of $X_{s+1}$, given all prior values $X_1, X_2, \ldots, X_s$, is equal to the present observed value $X_s$. A well-known martingale is Doob's Martingale, in which our knowledge about a random variable is progressively obtained and which is defined as follows:

Definition 1: (Doob's Martingale) [14] Let $X, Y_1, Y_2, \ldots$ be any random variables with finite expectation $E[|X|] < \infty$. Then, if $X_s$ is defined by the conditional expectation $X_s = E[X \mid Y_1, Y_2, \ldots, Y_s]$, the sequence of random variables $X_1, X_2, \ldots$ is a martingale.
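As a reminder of why the sequence in Definition 1 has the martingale property, the following standard derivation uses the tower property of conditional expectation; it is a textbook argument added for convenience and is not specific to this paper.

```latex
\begin{align*}
E[X_{s+1} \mid Y_1, \ldots, Y_s]
  &= E\big[\, E[X \mid Y_1, \ldots, Y_{s+1}] \,\big|\, Y_1, \ldots, Y_s \big] \\
  &= E[X \mid Y_1, \ldots, Y_s] \\
  &= X_s .
\end{align*}
```

Because each of $X_1, \ldots, X_s$ is a function of $Y_1, \ldots, Y_s$, applying the tower property once more gives $E[X_{s+1} \mid X_1, \ldots, X_s] = X_s$ as well.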

We shall introduce a probabilistic method that constructs a Martingale stochastic process for progressively detecting all communities based on DBSCAN* and a given density level $\epsilon$. The martingale construction is based on Doob's martingale (Definition 1), where knowledge is progressively gained about the result of a random variable.

B. Progressive Community Detection Based on a Martingale

In the context of a community detection problem, the random variable that needs to be known is the vector of communities' IDs, which is a combination of $S$ communities' vectors $C_{DBSCAN^*}(MinPts_s)$, each generated for a different value $MinPts_s$, $s = 1, 2, \ldots, S$. For each application of DBSCAN*, the parameter $\epsilon$ is set to $1$ so that only the immediate neighborhood of each node is considered. Neighborhoods of order greater than 2 tend to merge different communities, because all communities are mutually reachable through intermediate nodes much more easily than in the case where neighborhoods are considered to be of order 1.

First, we generate $S$ random numbers $MinPts_s$, $s = 1, 2, \ldots, S$, uniformly in $[MinPts_{min}, MinPts_{max}]$, a range of thresholds for the minimum community size. The sample of $MinPts_s$, $s = 1, 2, \ldots, S$, is sorted in decreasing order. Initially, there are no communities detected in the network. In the first iteration ($s = 1$), all communities detected by $C_{DBSCAN^*}(MinPts_1)$ are kept, corresponding to the community size threshold $MinPts_1$, i.e. the largest sampled value in the range. In the second iteration ($s = 2$), some of the communities detected by $C_{DBSCAN^*}(MinPts_2)$ are new and some of them were previously detected at iteration $s = 1$. In order to keep only the newly detected communities of the second iteration ($s = 2$), we keep only the groups of entries with the same cluster ID whose size is greater than or equal to $MinPts_2$ but lower than $MinPts_1$, and set the rest to $0$.

Formally, we define the sequence of communities $C^{(s)}$, $s = 1, 2, \ldots, S$, where $C^{(1)} = C_{DBSCAN^*}(MinPts_1)$ and:

$$C^{(s)}[j] := \begin{cases} 0, & \text{if } n_j \in \text{a previously detected community} \\ C_{DBSCAN^*}(MinPts_s)[j], & \text{otherwise} \end{cases} \tag{1}$$

Finally, we relabel the IDs of the detected communities. Assuming that $r$ new communities are detected at iteration $s$, we update the labels of $C^{(s)}$ starting from $1 + \max_j C^{(s-1)}[j]$ to $r + \max_j C^{(s-1)}[j]$. The sum of all vectors $C^{(s)}$ up to stage $S$ is the final communities vector of our algorithm:

$$C = C^{(1)} + C^{(2)} + \cdots + C^{(S)} \tag{2}$$

The sequence of vectors $X_s = C^{(1)} + C^{(2)} + \cdots + C^{(s)}$, $s = 1, 2, \ldots, S$, is Doob's martingale for the sequence of random variables $Y_s = C_{DBSCAN^*}(MinPts_s)$, $s = 1, 2, \ldots, S$. Each random selection of $MinPts_s$, $s = 1, 2, \ldots, S$, provides one vector $C_{DBSCAN^*}(MinPts_s)$ of community IDs for all $s = 1, 2, \ldots, S$. As $s$ increases (i.e. as $MinPts_s$ decreases), more vectors are combined and we gain knowledge about the final vector $C$ of community IDs. The vector $C^{(1)} + C^{(2)} + \cdots + C^{(s)}$ is our “best prediction” for the final vector $C$ of community IDs at stage $s$. The expected final vector of community IDs at stage $s = S$ has extracted all available communities of various sizes.

Fig. 2. DBSCAN*-Martingale for $S = 2$ iterations. The two communities detected at the first iteration are re-discovered in the second iteration as a single community, but the update keeps them as separate, together with the newly discovered community of the second iteration. [Figure annotations: “Update the labels of the communities”, “Update the vector”, “New community detected for”.]

This DBSCAN*-Martingale process that detects communities in a progressive manner and combines them in a single communities vector is presented as pseudo-code in Algorithm 1 and it is also illustrated in Figure 2 for two iterations and values $MinPts_1 = 5$ and $MinPts_2 = 4$, where $X^T$ denotes the transpose of vector $X$.
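The progressive combination of Eqs. (1) and (2) can be sketched in a few lines of Python. This is an illustrative sketch and not the authors' R implementation: the detector is passed in as a callable `detect(min_pts)` returning a communities vector, the thresholds are drawn as integers, Eq. (1) is applied literally node by node, and the names `dbscan_star_martingale`, `detect`, `sample` and the toy vectors are assumptions introduced here.

```python
import random


def dbscan_star_martingale(detect, n_nodes, min_pts_min, min_pts_max, S,
                           seed=None, sample=None):
    """Combine S detector outputs into one communities vector C (0 = noise).

    detect: callable mapping a MinPts value to a list of n_nodes labels.
    sample: optional explicit list of MinPts values (overrides random draws).
    """
    if sample is None:
        rng = random.Random(seed)
        # Assumption: integer thresholds drawn uniformly from the range.
        sample = [rng.randint(min_pts_min, min_pts_max) for _ in range(S)]
    thresholds = sorted(sample, reverse=True)  # decreasing MinPts

    C = [0] * n_nodes
    for min_pts in thresholds:
        labels = detect(min_pts)
        # Eq. (1): zero out nodes that already belong to a community.
        C_s = [0 if C[j] != 0 else labels[j] for j in range(n_nodes)]
        # Relabel the new communities so that their IDs continue after max(C).
        offset = max(C, default=0)
        remap = {}
        for j, label in enumerate(C_s):
            if label != 0:
                remap.setdefault(label, offset + len(remap) + 1)
                C_s[j] = remap[label]
        # Eq. (2): accumulate; C and C_s never overlap on non-zero entries.
        C = [c + c_s for c, c_s in zip(C, C_s)]
    return C


# Hypothetical detector outputs mimicking the situation of Figure 2:
# at MinPts = 5 two communities are found; at MinPts = 4 they are merged
# into one and a new community appears on the last three nodes.
toy_outputs = {
    5: [1, 1, 1, 2, 2, 2, 0, 0, 0, 0],
    4: [1, 1, 1, 1, 1, 1, 0, 2, 2, 2],
}
C = dbscan_star_martingale(lambda m: toy_outputs[m], n_nodes=10,
                           min_pts_min=4, min_pts_max=5, S=2, sample=[4, 5])
```

In this toy run the two communities of the first iteration stay separate even though the second iteration merges them, and the newly found community receives the next free ID.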

Algorithm 1: DBSCAN*-Martingale($\epsilon$, $MinPts$) return $C$
1: Generate a random sample of $S$ values in $[MinPts_{min}, MinPts_{max}]$
2: Sort the generated sample $MinPts_s$, $s = 1, 2, \ldots, S$, in decreasing order
3: for $s = 1$ to $S$
4:    find $C_{DBSCAN^*}(\epsilon, MinPts_s)$
5:    compute $C^{(s)}$ as in Eq. (1)
6:    update the community IDs
7:    update the vector $C$ as in Eq. (2)
8: end for
9: return $C$

The DBSCAN*-Martingale may not assign all nodes to a community. To address this issue, an optional propagation step is applied where each unassigned node is assigned to the community that is present in its $\epsilon$-neighborhood. This propagation process is iteratively repeated until there are no unassigned nodes in the connected components of the detected communities. Figure 3 illustrates this process for the case of two communities detected by DBSCAN*-Martingale with $\epsilon = 1$; $t_0$ signifies the start of the propagation process following the end of the community detection algorithm. At each iteration $t_i$, $i > 0$, these two communities are expanded with their immediate neighbours (since $\epsilon = 1$) and after five iterations, both communities consist of all nodes in their connected component.
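The propagation step can be sketched as follows for $\epsilon = 1$. The text does not state how a node is handled when several communities are present in its neighbourhood, so this sketch assumes that the most frequent community among the already assigned neighbours wins and that ties go to the smallest ID; updates are applied synchronously, one round per iteration $t_i$. The function name and the toy graph are hypothetical.

```python
from collections import Counter


def propagate(adj, C):
    """Spread community IDs to unassigned nodes (label 0) until no change.

    adj: list of neighbor lists, node IDs are 0..N-1.
    C:   communities vector with 0 for unassigned nodes.
    """
    C = list(C)  # work on a copy
    while True:
        updates = {}
        for node, label in enumerate(C):
            if label != 0:
                continue
            counts = Counter(C[nb] for nb in adj[node] if C[nb] != 0)
            if counts:
                best = max(counts.values())
                # Assumed tie-break: smallest community ID.
                updates[node] = min(l for l, c in counts.items() if c == best)
        if not updates:
            return C
        for node, label in updates.items():
            C[node] = label


# Toy graph (hypothetical): path 0-1-2-3-4-5 plus the isolated node 6.
adj = [[1], [0, 2], [1, 3], [2, 4], [3, 5], [4], []]
C_after = propagate(adj, [1, 0, 0, 0, 0, 2, 0])
```

The isolated node is not in the connected component of any detected community and therefore remains unassigned, which matches the stopping condition described above.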

The DBSCAN*-Martingale requires $S$ iterations of the DBSCAN* algorithm, which runs in $O(N \log N)$ if a tree-based spatial index is used and in $O(N^2)$ without tree-based spatial indexing [22]. Therefore, the DBSCAN*-Martingale runs in $O(S N \log N)$ for tree-based indexed datasets and in $O(S N^2)$ without tree-based indexing. The optional propagation step has worst-case complexity $O(N)$, since in the worst-case scenario the algorithm will examine all nodes for deciding whether to update their community ID or not. Our code is written in R¹ and uses the DBSCAN-Martingale implementation available on GitHub² for implementing the proposed DBSCAN*-Martingale.

IV. EVALUATION

A. Experimental Set-Up

Evaluation is performed using the community detection benchmark networks developed by Lancichinetti, Fortunato, and Radicchi (LFR) [15]. These LFR networks were developed with the goal of reflecting the structure of real networks and in particular of accounting for the heterogeneity in the distributions of node degrees and of community sizes. This work employs four such networks, namely LFR1, LFR2, LFR3 and LFR4, constructed under the realistic assumptions that (i) the network is scale-free and its degree distribution has a power-law behavior with power-law exponent $\tau_1$, (ii) the community sizes also obey a power-law distribution with exponent $\tau_2$ and (iii) the communities are mixed, i.e. links appear from a node in a community $i$ to a node in a community $j$, where $i \neq j$. The ratio of links between different communities to the number of links within a community determines the mixing parameter $\mu$. When $\mu = 0$ there is no mixing, thus all communities are also disconnected components, and when $\mu = 1$ there is no community structure.

We used four datasets of sizes 650, 3,182, 21,226 and 41,791 nodes with 10, 50, 200 and 50 communities, respectively. Their characteristics are as follows:

• LFR benchmark dataset 1 (LFR1): 10 communities, 650 vertices, minimum community size 20, community size power-law fit beta = 1.89 (p-value = 0.16 > 0.05), degree distribution power-law fit gamma = 3.54 (p-value = 0.29 > 0.05) and maximum degree = 13.

• LFR benchmark dataset 2 (LFR2): 50 communities, 3,182 vertices, minimum community size 15, community size power-law fit beta = 1.98 (p-value = 0.93 > 0.05), degree distribution power-law fit gamma = 3.63 (p-value = 0.99 > 0.05) and maximum degree = 28.

• LFR benchmark dataset 3 (LFR3): 200 communities, 21,226 vertices, minimum community size 10, community size power-law fit beta = 2.00 (p-value = 0.70 > 0.05), degree distribution power-law fit gamma = 3.33 (p-value = 0.13 > 0.05) and maximum degree = 52.

• LFR benchmark dataset 4 (LFR4): 50 communities, 41,791 vertices, minimum community size 10, community size power-law fit beta = 1.69 (p-value = 0.87 > 0.05), degree distribution power-law fit gamma = 3.49 (p-value = 0.98 > 0.05) and maximum degree = 124.

¹ https://www.r-project.org/
² https://github.com/MKLab-ITI/topic-detection

Fig. 3. Iterative propagation of community membership to unassigned nodes until all nodes in the connected components of the communities detected by DBSCAN*-Martingale are assigned to a community. Panels: (a) $t_0$, (b) $t_1$, (c) $t_2$, (d) $t_3$, (e) $t_4$, (f) $t_5$.

TABLE I. COMMUNITY DETECTION EVALUATION RESULTS.

                          Size: 650         Size: 3,182       Size: 21,226      Size: 41,791
Method                    NMI      RAND     NMI      RAND     NMI      RAND     NMI      RAND
Edge Betweenness [4]      0.7018   0.8793   0.8567   0.9601   NA       NA       NA       NA
Fast Greedy [17]          0.7038   0.8808   0.8543   0.9598   0.7046   0.8196   0.4177   0.6303
Label Propagation [8]     0.5930   0.8553   0.7116   0.9490   0.5458   0.8144   0.2882   0.6255
Louvain [9]               0.6947   0.8792   0.8589   0.9606   0.7077   0.8198   0.4200   0.6305
Walktrap [10]             0.6904   0.8808   0.8653   0.9621   0.7081   0.8336   0.3842   0.6529
Infomap [18], [19]        0.5852   0.8551   0.7180   0.9488   0.5569   0.8144   0.2954   0.6255
DBSCAN*-Martingale        0.7898   0.9303   0.8665   0.9626   0.7234   0.8437   0.4526   0.6627

All these datasets have ground truth community structure, i.e. they provide annotated graph nodes based on the community they belong to.

The proposed DBSCAN*-Martingale is evaluated against the well-established and parameter-free community detection algorithms presented in Section II and listed in Table I; to this end, their respective implementations in igraph (version 1.0.1, date: 2015-06-26) are used. Based on preliminary experiments, the range of $MinPts$ values was set to $[5, 30]$ and the number of iterations $S$ to 5. The parameter $\epsilon$ was set to $1$, as many community detection approaches consider only the immediate neighborhood of each node. In addition, the propagation process was applied for determining the community membership of some of the unassigned nodes, given that the LFR datasets provide ground truth for all nodes, i.e. no nodes are left unassigned. Finally, the most prominent evaluation measures in community detection were employed, namely Normalized Mutual Information [23] and RAND [24].
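For readers unfamiliar with the two measures, the sketch below computes them directly from their definitions: the Rand index as the fraction of node pairs on which two labelings agree (same community in both, or different communities in both), and the Normalized Mutual Information in the form $2 I(A;B) / (H(A) + H(B))$. It is a didactic sketch with hypothetical function names, not the implementation used for the experiments, and the pairwise Rand computation is quadratic in the number of nodes, so it only suits small examples.

```python
from collections import Counter
from itertools import combinations
from math import log


def rand_index(a, b):
    """Fraction of node pairs treated consistently by labelings a and b."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def nmi(a, b):
    """Normalized mutual information 2*I(A;B) / (H(A) + H(B))."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    count_ab = Counter(zip(a, b))
    h_a = -sum(c / n * log(c / n) for c in count_a.values())
    h_b = -sum(c / n * log(c / n) for c in count_b.values())
    mutual = sum(c / n * log(c * n / (count_a[x] * count_b[y]))
                 for (x, y), c in count_ab.items())
    if h_a + h_b == 0:
        return 1.0  # both labelings put every node in one group
    return 2 * mutual / (h_a + h_b)


truth = [0, 0, 1, 1]
same_up_to_names = [5, 5, 9, 9]   # identical partition, different IDs
unrelated = [0, 1, 0, 1]          # cuts across the true communities

rand_same = rand_index(truth, same_up_to_names)
rand_unrelated = rand_index(truth, unrelated)
nmi_same = nmi(truth, same_up_to_names)
nmi_unrelated = nmi(truth, unrelated)
```

Both measures depend only on the grouping of the nodes and not on the particular community IDs, which is why the relabeling step of the algorithm does not affect the scores.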

B. Results

Table I presents the results of the evaluation experiments on each of the four datasets. All community detection approaches were applied to all datasets, apart from the Girvan-Newman (Edge Betweenness) approach [4], which is applicable only to small-scale datasets and thus was not applied to LFR3 and LFR4.

The proposed DBSCAN*-Martingale is the best performing community detection approach for both evaluation metrics across all datasets, indicating its quality and robustness across heterogeneous networks of different sizes. The most significant differences to the other approaches for both evaluation metrics are observed for the smallest LFR dataset. For LFR1, DBSCAN*-Martingale achieves relative improvements over the other community detection approaches ranging from 12% to 35% in terms of NMI and from 5.6% to 8.8% in terms of RAND. In the larger datasets, the DBSCAN*-Martingale still performs better than all the other approaches, but the differences in effectiveness are smaller, particularly for the RAND evaluation metric.
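The ranges quoted for LFR1 can be re-derived from the 650-node columns of Table I. The short check below assumes that "improvement" means the relative gain (ours minus baseline, divided by baseline); the variable and function names are illustrative.

```python
# (NMI, RAND) of each baseline on the 650-node network, copied from Table I.
baselines_lfr1 = {
    "Edge Betweenness": (0.7018, 0.8793),
    "Fast Greedy": (0.7038, 0.8808),
    "Label Propagation": (0.5930, 0.8553),
    "Louvain": (0.6947, 0.8792),
    "Walktrap": (0.6904, 0.8808),
    "Infomap": (0.5852, 0.8551),
}
dbscan_star_martingale_lfr1 = (0.7898, 0.9303)


def relative_gain_range(metric_index):
    """Smallest and largest relative gain (in %) over the baselines."""
    ours = dbscan_star_martingale_lfr1[metric_index]
    gains = [100 * (ours - scores[metric_index]) / scores[metric_index]
             for scores in baselines_lfr1.values()]
    return min(gains), max(gains)


nmi_low, nmi_high = relative_gain_range(0)    # roughly 12% and 35%
rand_low, rand_high = relative_gain_range(1)  # roughly 5.6% and 8.8%
```

Under this reading, the smallest gains are measured against the strongest baselines (Fast Greedy for NMI; Fast Greedy and Walktrap for RAND) and the largest against Infomap.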

Interestingly, the second best performing community detection approach is Walktrap [10], with the exception of NMI for LFR1 and LFR2, where the Fast Greedy [17] and the Louvain [9] methods perform second best, respectively.

V. CONCLUSIONS

This work proposed a novel community detection approach based on the DBSCAN* density-based algorithm and a Martingale process that aims to progressively detect communities in complex networks at various levels of granularity. To this end, it applies an iterative process that progressively lowers the threshold of the size of the acceptable communities, while maintaining the communities detected for higher thresholds. The output of our proposed community detection approach is usually a .json file, which is then imported by other applications. Evaluation experiments over four benchmark datasets with diverse characteristics and sizes against several state-of-the-art community detection methods indicate the effectiveness and robustness of the proposed approach. Further work includes its application in large-scale social media networks where communities can be defined along various dimensions given the multitude of relationships that exist between users (i.e. the nodes in the network) and further optimizations for automatically determining the range of lower bound values to explore in the Martingale process based on the network characteristics. We expect that our method will achieve high performance especially in covert networks where communities are sparsely connected and not very mixed.

ACKNOWLEDGMENT

This work was partially supported by the European Commission through the projects MULTISENSOR (FP7-610411) and HOMER (FP7-312883).

REFERENCES

[1] S. Papadopoulos, Y. Kompatsiaris, A. Vakali, and P. Spyridonos, “Community detection in social media,” Data Mining and Knowledge Discovery, vol. 24, no. 3, pp. 515–554, 2012.

[2] S. Fortunato, “Community detection in graphs,” Physics Reports, vol. 486, no. 3, pp. 75–174, 2010.

[3] M. Girvan and M. E. Newman, “Community structure in social and biological networks,” Proceedings of the National Academy of Sciences, vol. 99, no. 12, pp. 7821–7826, 2002.

[4] M. Newman and M. Girvan, “Finding and evaluating community structure in networks,” Physical Review E, vol. 69, no. 2, p. 026113.

[5] Y. Dourisboure, F. Geraci, and M. Pellegrini, “Extraction and classification of dense communities in the web,” in Proceedings of the 16th International Conference on World Wide Web. ACM, 2007, pp. 461–470.

[6] F. D. Malliaros and M. Vazirgiannis, “Clustering and community detection in directed networks: A survey,” Physics Reports, vol. 533, no. 4, pp. 95–142, 2013.

[7] S. Harenberg, G. Bello, L. Gjeltema, S. Ranshous, J. Harlalka, R. Seay, K. Padmanabhan, and N. Samatova, “Community detection in large-scale networks: a survey and empirical evaluation,” Wiley Interdisciplinary Reviews: Computational Statistics, vol. 6, no. 6, pp. 426–439, 2014.

[8] U. N. Raghavan, R. Albert, and S. Kumara, “Near linear time algorithm to detect community structures in large-scale networks,” Physical Review E, vol. 76, no. 3, p. 036106, 2007.

[9] V. D. Blondel, J.-L. Guillaume, R. Lambiotte, and E. Lefebvre, “Fast unfolding of communities in large networks,” Journal of Statistical Mechanics: Theory and Experiment, vol. 2008, no. 10, p. P10008, 2008.

[10] P. Pons and M. Latapy, “Computing communities in large networks using random walks,” J. Graph Algorithms Appl., vol. 10, no. 2, pp. 191–218, 2006.

[11] M. Rosvall and C. T. Bergstrom, “Maps of random walks on complex networks reveal community structure,” Proceedings of the National Academy of Sciences, vol. 105, no. 4, pp. 1118–1123, 2008.

[12] R. J. Campello, D. Moulavi, and J. Sander, “Density-based clustering based on hierarchical density estimates,” in Advances in Knowledge Discovery and Data Mining. Springer, 2013, pp. 160–172.

[13] M. Ester, H.-P. Kriegel, J. Sander, X. Xu et al., “A density-based algorithm for discovering clusters in large spatial databases with noise,” in KDD, vol. 96, no. 34, 1996, pp. 226–231.

[14] J. L. Doob, “Stochastic processes,” Wiley, New York, vol. 101, 1953.

[15] A. Lancichinetti, S. Fortunato, and F. Radicchi, “Benchmark graphs for testing community detection algorithms,” Physical Review E, vol. 78, no. 4, p. 046110, 2008.

[16] U. Brandes, “A faster algorithm for betweenness centrality*,” Journal of Mathematical Sociology, vol. 25, no. 2, pp. 163–177, 2001.

[17] A. Clauset, M. E. Newman, and C. Moore, “Finding community structure in very large networks,” Physical Review E, vol. 70, no. 6, p. 066111, 2004.

[18] M. Rosvall, D. Axelsson, and C. T. Bergstrom, “The map equation,” The European Physical Journal Special Topics, vol. 178, no. 1, pp. 13–23, 2009.

[19] L. Bohlin, D. Edler, A. Lancichinetti, and M. Rosvall, “Community detection and visualization of networks with the map equation framework,” in Measuring Scholarly Impact. Springer, 2014, pp. 3–34.

[20] T. M. Cover and J. A. Thomas, Elements of Information Theory. John Wiley & Sons, 2012.

[21] I. Gialampoukidis, S. Vrochidis, and I. Kompatsiaris, “A hybrid framework for news clustering based on the DBSCAN-Martingale and LDA,” in Machine Learning and Data Mining in Pattern Recognition. Springer, 2016, pp. 170–184.

[22] M. Ankerst, M. M. Breunig, H.-P. Kriegel, and J. Sander, “OPTICS: ordering points to identify the clustering structure,” in ACM SIGMOD Record, vol. 28, no. 2. ACM, 1999, pp. 49–60.

[23] L. Danon, A. Diaz-Guilera, J. Duch, and A. Arenas, “Comparing community structure identification,” Journal of Statistical Mechanics: Theory and Experiment, vol. 2005, no. 09, p. P09008, 2005.

[24] W. M. Rand, “Objective criteria for the evaluation of clustering methods,” Journal of the American Statistical Association, vol. 66, no. 336, pp. 846–850, 1971.