Distributed Community Detection over Blockchain Networks
Based on Structural Entropy
Yang Chen
School of Computer Science
The University of Auckland
Auckland, New Zealand
yang.chen@auckland.ac.nz
Jiamou Liu
School of Computer Science
The University of Auckland
Auckland, New Zealand
jiamou.liu@auckland.ac.nz
ABSTRACT
Blockchain technology provides a groundbreaking computing paradigm that tackles problems in a completely decentralised manner. As the underlying infrastructure and protocol of blockchain, blockchain networks convey communication and coordination among all participants. In a wide range of application scenarios, community detection over blockchain networks can help both to discover hidden information and to improve communication efficiency. However, the decentralised nature of these networks restricts how community detection can be carried out. To cope with this restriction, we propose a distributed community detection method based on the Propose-Select-Adjust (PSA) framework that runs asynchronously. We extend the PSA framework using the concept of structural entropy and aim to detect a community structure with low entropy. We test our entropy-based distributed community detection algorithm on both benchmark networks and bitcoin trust networks. Experimental results reveal that our algorithm successfully detects communities with low structural entropy.
CCS CONCEPTS
• Information systems → Social networks; • Security and privacy → Distributed systems security; • Theory of computation → Distributed algorithms; • Mathematics of computing → Information theory.
KEYWORDS
blockchain; community detection; structural entropy; distributed
computing
ACM Reference Format:
Yang Chen and Jiamou Liu. 2019. Distributed Community Detection over Blockchain Networks Based on Structural Entropy. In 2019 ACM Int'l Symposium on Blockchain and Secure Critical Infrastructure (BSCI '19), July 8, 2019, Auckland, New Zealand. ACM, New York, NY, USA, 10 pages. https://doi.org/10.1145/3327960.3332381
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
BSCI ’19, July 8, 2019, Auckland, New Zealand
©2019 Association for Computing Machinery.
ACM ISBN 978-1-4503-6786-8/19/07...$15.00
https://doi.org/10.1145/3327960.3332381
1 INTRODUCTION
Cryptocurrency has attracted massive attention from industry, government institutions, and academia in recent years. As the first and most representative cryptocurrency, bitcoin has gained huge success in the worldwide capital market. The blockchain, first proposed in 2008 and implemented in 2009 [16], is the core technology and underlying mechanism of all kinds of cryptocurrency. A blockchain can be regarded as a public ledger of all cryptocurrency transactions that have ever been executed. It grows constantly as miners append new blocks (typically every 10 minutes) to record the most recent transactions. The key characteristics of blockchain include decentralisation, persistency, anonymity, and auditability, of which decentralisation is the most prominent. By using cryptographic hashes, digital signatures (asymmetric cryptography), and distributed consensus mechanisms, blockchain works in a completely decentralised environment, which allows transactions to be recorded and validated without central trusted devices. Beyond cryptocurrency and finance-related applications, the generalised blockchain has demonstrated immense potential in the fields of governance, health, science, literacy, culture, and art [25].
Blockchain uses the peer-to-peer (P2P) network as its underlying communication architecture and protocol, where participants are equally privileged in terms of communication. Based on the P2P network architecture, the generalised blockchain can be regarded as a complex network-based software connector, which provides communication, coordination (through transactions, smart contracts, and validation oracles), and facilitation services [27]. The two-layer blockchain model is depicted in Fig. 1, where each node has two layers, namely, the blockchain layer and the application layer. In the application layer, interactions among participants of the blockchain form various types of virtual relations such as trust [24] and transaction [3]. Each of these relations forms a network.
Since there is no control by central servers, networks in the application layer are largely self-organising and exhibit some significant features [14, 15, 28]. One of the most prevalent features is community structure, which means that the network can be viewed as consisting of a number of subnetworks, called communities; links within the same community are dense, whereas links between different communities are sparse. Naturally, the application-layer networks of blockchain exhibit community structure. The community detection problem demands an automated approach to reveal the communities given the network structure.
Detecting communities over application-layer networks can solve some significant problems from a structural perspective. For example, one of the most straightforward applications in the context of blockchain is to track bitcoin users' activity on bitcoin transaction networks [22]. Each node in a bitcoin transaction network represents an address, and the network is partitioned into several communities. In this scenario, the detected communities are used to re-identify multiple addresses that belong to the same user. Another application is to facilitate anomaly detection in cryptocurrency transaction networks [21]. The transaction network is divided into groups to locate anomalies as soon as possible and so prevent them from harming the network's community and integrity. A third possible application scenario is to execute group validation over blockchain trust networks. A trust network records the trust relationships of participants involved in blockchain applications. The trust network is not static; rather, it undergoes continuous updates as users join and leave the network. This calls for a dynamic algorithm that performs community detection while the trust relationships change over time. Such a dynamic algorithm would be useful when validating events: when an event proposed by a user is broadcast across the communication network, the groups that capture the message can diffuse the event within the respective group and provide group validations accordingly.
Session 1: Blockchain Consensus, Framework, and Architecture
Existing methods for community detection pose some severe challenges for the applications above on blockchain networks. Most established methods are centralised approaches, where the algorithm needs to know the entire network topology in advance [5, 18, 20]. These methods produce highly accurate results, but they prove difficult to scale to large decentralised networks where global information is inaccessible. Furthermore, blockchain networks are dynamic in the sense that nodes and links frequently come and go, and communities evolve continuously. In coping with such dynamic networks, classical approaches such as snapshot analysis may be ineffective, as snapshots may not smoothly track community evolution across different time stamps [19]. These challenges call for a novel approach to community detection. To design a community detection method for blockchain, the following issues need to be settled:
• (Decentralisation) The method is able to work on a decentralised network without requiring massive information exchange. Instead of detecting communities using the entire network data, one needs to delegate the problem to many individual processors, each of which has access only to local data.
• (Dynamisation) The method adapts to changes in the network. Namely, when the network undergoes continuous changes, the results of the algorithm can be updated efficiently [6].
Contribution. In light of these requirements, we propose a distributed community detection method for blockchain networks based on the Propose-Select-Adjust (PSA) framework, which is a distributed community detection framework with asynchronous runs [12]. In the PSA framework, each node requires only the local information provided by its neighbourhood and functions without synchronisation from a central controller. The contributions of this paper are as follows:
• We propose a distributed community detection method for blockchain networks based on the PSA framework. This framework addresses the decentralisation and dynamisation issues above.
• We extend the original formulation of the PSA framework [11] by using the concept of local structural entropy [9]. This metric evaluates the amount of information that lies within a subnetwork and is an appropriate guiding factor for community detection.
• To test our algorithm, we conduct experiments on both benchmark networks for community detection and bitcoin trust networks. Results show that our algorithm produces reasonable community structures.
The rest of this paper is organised as follows: In Sec. 2, we give some preliminaries on the community detection problem, structural entropy, and the PSA framework, which provide the necessary background for the introduction of our distributed algorithm in Sec. 3. Next, in Sec. 4 we show the detailed mechanism of how the entropy-based PSA-system functions to detect communities in a distributed asynchronous manner. Sec. 5 shows the experimental results of our algorithm. Finally, the paper ends with conclusions and an outlook in Sec. 6.
Related Work. Community detection is one of the most important questions in the study of complex networks. Conventional methods for community detection include the clique percolation algorithm, which is agglomerative [20], Girvan and Newman's betweenness-based algorithm, which is divisive [5], and approaches that aim to maximize modularity [18]. All of these classical methods are centralised approaches that require the entire network structure. This has to be handled by a central controller, which makes these algorithms insufficient for the blockchain networks described above.
Using techniques from community detection to solve problems in blockchain, especially in the context of cryptocurrency, has attracted a lot of attention recently. For example, the authors of [22] propose several algorithms to detect communities on bitcoin transaction networks in order to predict which addresses belong to the same user. Community detection has also shown its potential in facilitating anomaly detection. For example, clustering methods such as k-means are used in [21] to detect and locate anomalies in cryptocurrency transaction networks. Moreover, in industrial applications, a blockchain-based dynamic community detection method is proposed in [23] for detecting P2P botnets in the Internet of Things (IoT). Our work differs from these works in that we propose a distributed asynchronous community detection algorithm, which provides a solution framework for a series of problem domains in the blockchain-based applications above.
This work is also related to structural entropy, a notion that aims to extend information theory to networked data. The classical information entropy pioneered by Shannon fails to support analysis of communication networks since network topology is not taken into consideration. To better analyse communication networks from the perspective of information theory, Li et al. [9] propose the notion of structural entropy and further define multi-dimensional structural entropy in [10]. Their goal is to capture the information that is inherent to a network structure. Their series of work on structural entropy points out how structure impacts information resistance and information uncertainty on communication networks. Inspired by their work, we employ structural entropy in our proposed distributed community detection method, which is expected to find community structures with low structural entropy.
Figure 1: Overview of the two-layer blockchain model.
2 PRELIMINARIES
2.1 Community Detection
We regard a complex network as an undirected unweighted graph $G=(V,E)$, where $V$ and $E$ are the sets of nodes and edges, respectively. We assume that $V$ contains $n$ elements $\{1,2,\ldots,n\}$ and abuse notation by writing an undirected edge as $(u,v)\in V^2$. For any node $u\in V$, the neighbourhood of $u$, $N(u)$, is the set $\{u\}\cup\{v\in V \mid (u,v)\in E\}$ that contains $u$ and all neighbours of $u$. For a subset $C\subseteq V$, $N(C)$ denotes $\bigcup_{u\in C}N(u)$.
Very often, complex network structures exhibit a special property [26]. More precisely, nodes in the network can be partitioned into clusters, with a high density of edges within each cluster but a low density of edges between clusters [5]. Any graph with this property is said to have a community structure, where each cluster mentioned above is called a community. A well-known example is the karate club network, constructed from the social relationships between members of a karate club in an American university in the 1970s. As can be seen in Figure 2, the network contains two clearly divided communities, which facilitates the prediction of real events [29].
Formally, the community structure of a graph $G=(V,E)$ refers to a partition $\mathcal{C}$ of the node set $V$, i.e.,
$$\mathcal{C}=\{C_1,C_2,\ldots,C_k\}$$
such that each $C_i$ for $1\le i\le k$ induces a connected subgraph, called a community. The requirement for these $C_1,\ldots,C_k$ is that they have high intra-cluster density and low inter-cluster density, as defined below.
Figure 2: The community structure of a university karate club analyzed by Zachary. The two communities are shown in two different colours.
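Definition 1. Let $\mathcal{C}$ be a clustering of a graph $G=(V,E)$, and let $C\in\mathcal{C}$ be a cluster. The intra-cluster density of $C$ represents the edge connectivity within $C$; it is defined as
$$\delta_{\mathrm{int}}(C)=\frac{|E\upharpoonright C|}{|C|(|C|-1)/2}\quad\text{if } |C|>1,$$
and $\delta_{\mathrm{int}}(C)=1$ if $|C|=1$. Let $D_1,D_2,\ldots,D_m$ be all communities in $\mathcal{C}$ such that $C\cap D_i=\emptyset$ and $D_i\cap N(C)\neq\emptyset$ for all $1\le i\le m$. The inter-cluster density of $C$ represents the edge connectivity between $C$ and its neighbouring clusters; it is computed as
$$\delta_{\mathrm{ext}}(C)=\frac{|E\cap(C\times(N(C)\setminus C))|}{|C|\times(|D_1|+\cdots+|D_m|)}\quad\text{if } m\ge 1,$$
and $\delta_{\mathrm{ext}}(C)=0$ if $m=0$.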
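The two densities of Definition 1 can be sketched in a few lines of Python. This is a minimal illustration, assuming the graph is stored as a dictionary `adj` of neighbour sets and that the numerator of the inter-cluster density counts the edges leaving $C$; the function names and the example graph (two triangles joined by one edge) are hypothetical and not taken from the paper.

```python
from itertools import combinations


def closed_neighbourhood(adj, nodes):
    """N(C): the nodes of C together with all of their neighbours."""
    result = set(nodes)
    for u in nodes:
        result |= adj[u]
    return result


def intra_density(adj, cluster):
    """delta_int(C): edges inside C divided by |C|(|C|-1)/2, or 1 if |C| = 1."""
    size = len(cluster)
    if size == 1:
        return 1.0
    inside = sum(1 for u, v in combinations(cluster, 2) if v in adj[u])
    return inside / (size * (size - 1) / 2)


def inter_density(adj, cluster, clustering):
    """delta_ext(C): edges leaving C divided by |C| * (|D_1| + ... + |D_m|)."""
    frontier = closed_neighbourhood(adj, cluster) - cluster
    # D_1..D_m: clusters disjoint from C that contain a neighbour of C.
    neighbours = [d for d in clustering if not (d & cluster) and (d & frontier)]
    if not neighbours:  # m = 0
        return 0.0
    crossing = sum(1 for u in cluster for v in adj[u] if v not in cluster)
    return crossing / (len(cluster) * sum(len(d) for d in neighbours))


# Illustrative graph (an assumption): two triangles joined by the edge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
left, right = frozenset({0, 1, 2}), frozenset({3, 4, 5})
print(intra_density(adj, left))                  # all three possible edges exist
print(inter_density(adj, left, [left, right]))   # one crossing edge over 3 * 3
```

In this example the left triangle is as dense as possible internally while only one of the nine possible edges to its neighbouring cluster is present, which is the pattern the definition is meant to reward.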
As networks vary greatly in the real world, there is no universally accepted formal definition of communities in networks. Instead, most established definitions resort to the outcome of certain concrete algorithms. In general, an overarching view of the community structure is that it reveals important global information about the network structure. We will exploit this intuition to provide an information-theoretic notion of community structure in subsequent sections.
The community detection problem aims to compute the community structure $\mathcal{C}$ of a given graph $G=(V,E)$ where each community in $\mathcal{C}$ has a high intra-cluster density but a low inter-cluster density.
2.2 Structural Entropy
Shannon's information entropy measures the inherent uncertainty of a probability distribution:
$$H(p_1,\ldots,p_n)=-\sum_{i=1}^{n}p_i\log_2 p_i.$$
This metric is well-known and widely used. Shannon's theory of entropy, and the associated concept of noise, have provided rich insights for information science and technology. It is certainly desirable that this entropy measure can be applied to relational structures such as complex networks. However, this famous metric does not provide the appropriate tools for this purpose. The obstacle mainly arises from the fact that Shannon's information entropy relies on a predefined probability distribution, and a single probability distribution would normally not be sufficient to capture the information content of a complex network. For example, consider measuring the information entropy using the degree distribution of nodes in a network as follows:
Definition 2. Let $G=(V,E)$ be a connected graph where $|V|=n$ and $|E|=m$. For each vertex $i$, let $d_i$ be the degree of $i$ in $G$, and let $p_i=d_i/2m$. Then the vector $\vec{p}=(p_1,p_2,\ldots,p_n)$ is the stationary distribution of a random walk in $G$. Using this, we define the Shannon structural entropy of $G$, also called the one-dimensional structural entropy or positioning entropy of $G$, by
$$H(G)=H(\vec{p})=H\!\left(\frac{d_1}{2m},\ldots,\frac{d_n}{2m}\right)=-\sum_{i=1}^{n}\frac{d_i}{2m}\cdot\log_2\frac{d_i}{2m}.$$
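As a small illustration of Definition 2, the following Python sketch computes the positioning entropy from an adjacency dictionary. The two example graphs are assumptions chosen for illustration: both are connected and have the same degree sequence, but only the first has an obvious two-community structure.

```python
import math


def shannon_entropy(probs):
    """H(p_1, ..., p_n) = -sum p_i log2 p_i (zero terms are skipped)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def positioning_entropy(adj):
    """One-dimensional structural entropy: Shannon entropy of d_i / 2m."""
    two_m = sum(len(nbrs) for nbrs in adj.values())  # sum of degrees = 2m
    return shannon_entropy(len(nbrs) / two_m for nbrs in adj.values())


# Two triangles joined by the edge 2-3 (clear community structure).
barbell = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
# A 6-cycle with the chord 0-3 (same degrees, no such structure).
chorded = {0: {1, 3, 5}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2, 4}, 4: {3, 5}, 5: {0, 4}}

print(positioning_entropy(barbell))
print(positioning_entropy(chorded))
```

Both calls print the same value up to floating-point rounding, because the measure depends only on the degrees of the nodes.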
Two networks have the same Shannon structural entropy if they have the same degree distribution. However, this notion is not able to reflect the existence of community structure information, i.e., graphs with or without a clear community structure may have the same degree distribution. Therefore, for the purpose of more refined network analysis, one needs a new framework that expresses structural entropy.
The authors of [10] argue that, from the perspective of data communication, the structural entropy of a network should in principle reflect a sense of resistance to information passing. A network structure with low structural entropy should mean lower uncertainty in delivering a piece of information between two nodes, and thus such a structure would be preferred for facilitating communication. On the other hand, a higher structural entropy should denote the fact that the network structure contains “holes” [1], which forbid the passing of information. From social network theory, it is well-known that such holes correspond to certain closed clusters with few outside contacts. In this sense, a network with a higher structural entropy should denote the existence of a clear community structure.
Generally, given a clustering $\mathcal{C}=\{C_1,C_2,\ldots,C_k\}$ over $G$, we need a notion of local entropy of a community $C_j$ where $1\le j\le k$. For any community $C_j$, let $V_j$ denote the volume of the community, i.e., the sum of the degrees of the nodes in $C_j$, and let $g_j$ denote the number of edges with exactly one endpoint lying in community $C_j$.
Definition 3. Let $G=(V,E)$ be a graph where $|V|=n$ and $|E|=m$, and assume that $\mathcal{C}=\{C_1,C_2,\ldots,C_k\}$ is a community structure of $V$. Define the local entropy of $C_j$ as
$$H^{C_j}(G)=\frac{V_j}{2m}\,H\!\left(\frac{d^j_1}{V_j},\ldots,\frac{d^j_{n_j}}{V_j}\right)-\frac{g_j}{2m}\log_2\frac{V_j}{2m},$$
where $n_j$ is the number of nodes in community $C_j$, $d^j_i$ is the degree of the $i$-th node in $C_j$, $V_j$ is the volume of $C_j$, i.e., the sum of the degrees of its nodes, and $g_j$ is the number of edges with exactly one endpoint in community $C_j$.
Based on local entropy, we define the structural entropy as follows.
Definition 4 ([10]). The structural entropy of a community structure $\mathcal{C}$ of a graph $G$ is defined by summing the local entropy of all communities:
$$H^{\mathcal{C}}(G)=\sum_{j=1}^{k}H^{C_j}(G)=-\sum_{j=1}^{k}\frac{V_j}{2m}\sum_{i=1}^{n_j}\frac{d^j_i}{V_j}\log_2\frac{d^j_i}{V_j}-\sum_{j=1}^{k}\frac{g_j}{2m}\log_2\frac{V_j}{2m}.$$
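Definitions 3 and 4 translate directly into code. The sketch below is a minimal illustration under stated assumptions: the graph is an adjacency dictionary without isolated nodes, $V_j$ is taken as the sum of degrees in $C_j$, and the function names and example graph are hypothetical.

```python
import math


def local_entropy(adj, community, two_m):
    """H^{C_j}(G) = (V_j/2m) H(d_i/V_j, ...) - (g_j/2m) log2(V_j/2m)."""
    volume = sum(len(adj[u]) for u in community)  # V_j: sum of degrees in C_j
    # g_j: edges with exactly one endpoint in C_j.
    cut = sum(1 for u in community for v in adj[u] if v not in community)
    inner = -sum(
        (len(adj[u]) / volume) * math.log2(len(adj[u]) / volume)
        for u in community
    )
    return (volume / two_m) * inner - (cut / two_m) * math.log2(volume / two_m)


def structural_entropy(adj, clustering):
    """H^C(G): sum of the local entropies of all communities."""
    two_m = sum(len(nbrs) for nbrs in adj.values())
    return sum(local_entropy(adj, c, two_m) for c in clustering)


# Illustrative graph (an assumption): two triangles joined by the edge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
one_block = [frozenset(adj)]
two_blocks = [frozenset({0, 1, 2}), frozenset({3, 4, 5})]

print(structural_entropy(adj, one_block))
print(structural_entropy(adj, two_blocks))
```

On this graph the partition into the two triangles yields a lower structural entropy than the trivial one-community partition, whose value coincides with the one-dimensional entropy of Definition 2.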
According to the definition of structural entropy, a community structure $\mathcal{C}=\{C_1,C_2,\ldots,C_k\}$ leads to a low entropy if every $C_i\in\mathcal{C}$ has a high intra-cluster density and a low inter-cluster density, which is consistent with the intuition of a community structure. We therefore present the following formal definition of an optimal community structure.
Definition 5 ([10]). Given a network $G=(V,E)$, the inherent community structure of $G$ is a community structure $\mathcal{C}^*$ with the minimum structural entropy, i.e.,
$$\mathcal{C}^*=\arg\min_{\mathcal{C}}H^{\mathcal{C}}(G).$$
The inherent community structure is thus the goal of our community detection problem. In our entropy-based distributed community detection framework to be introduced below, we iteratively refine the clustering of a network in order to continuously reduce the structural entropy.
3 THE PROPOSE-SELECT-ADJUST FRAMEWORK
In [11], the authors put forward a framework that is appropriate for decentralised processing of graphs. Namely, the Propose-Select-Adjust (PSA) framework describes a general scheme that enables a network to self-organise into a community structure.
The scheme treats each node in the network as an independent processing unit, called an agent. The framework is based on the consensus formation process in a multi-agent system. Imagine a group of networked agents whose collective goal is to divide the group into several communities, where every agent belongs to one and only one community. Imagine further a setup where each agent sees only local information about her own connections. An agent may communicate with others through these connections, but no knowledge is shared among all agents. In this way, an individual can only make egocentric judgements. The PSA framework aims to ensure that the agents in this context still arrive at a consensus through repeated interactions. Each agent performs the following three tasks:
(1) Propose: Each agent individually computes a list of agents and sends an invitation to every agent on the list in the hope of forming a community.
(2) Select: An agent may also receive invitations from other agents. After the invitations are received, the agent evaluates the quality of each proposed community and accepts the most favourable invitation.
(3) Adjust: Once an agent accepts an invitation, she adjusts her own community according to the accepted proposal. After every individual finishes this step, the whole group will have been divided into a number of communities, and thus a community structure is formed.
The resulting community structure may not be optimal. In this case, the agents start another round of the three steps above, as illustrated in Fig. 3.
Figure 3: The Propose-Select-Adjust framework.
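We now formally define a cell in the PSA framework. The set $\Sigma$ in the definition is the symbol set $\{\mathrm{p},\mathrm{s},\mathrm{a}\}$.
Definition 6 ([11]). Let $G=(V,E)$ be a network and $v$ be a node in $V$. A PSA-cell defined on the node $v$ is a tuple
$$M_v=(\mathcal{P},\mathcal{S},\mathcal{A},Q,(\delta_{\sigma,v})_{\sigma\in\Sigma},F_v)$$
where
• $\mathcal{P},\mathcal{S},\mathcal{A}$ are finite sets of proposals, selections and solutions, respectively; $Q$ is a finite set of control states;
• $\delta_{\mathrm{p},v}:(\mathcal{A}\times\mathcal{P}\times Q)^{|N(v)|}\to\mathcal{P}\times Q$ is the propose function;
• $\delta_{\mathrm{s},v}:(\mathcal{P}\times\mathcal{S}\times Q)^{|N(v)|}\to\mathcal{S}\times Q$ is the select function;
• $\delta_{\mathrm{a},v}:(\mathcal{S}\times\mathcal{A}\times Q)^{|N(v)|}\to\mathcal{A}\times Q$ is the adjust function;
• $F_v:\Sigma\times Q^{|N(v)|}\to Q$ is the change-step function.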
As a general algorithmic framework, the definition above does not pinpoint the exact meaning of proposals, selections, and solutions but rather introduces them generically as sets. A PSA-cell runs by iteratively switching its states. The state of a PSA-cell is denoted as
$$(\sigma,P(v),S(v),A(v),q)\in\Sigma\times\mathcal{P}\times\mathcal{S}\times\mathcal{A}\times Q$$
where $P(v),S(v),A(v)$ are the current proposal, selection and solution of the cell $v$, respectively, $q\in Q$ is its control state, and $\sigma\in\Sigma$ is called the step of $v$, which denotes one of propose, select and adjust. The cell $v$ applies the three transition functions $\delta_{\mathrm{p},v}$, $\delta_{\mathrm{s},v}$ and $\delta_{\mathrm{a},v}$ repeatedly in turn to the appropriate components of its current state:
(1) The cell $v$ starts in step p and applies the propose function $\delta_{\mathrm{p},v}$ to compute a proposal $P(v)\in\mathcal{P}$, according to the current solutions and proposals (and control states) of the cells in its neighbourhood $N(v)$.
(2) Then $v$ moves to step s and applies the select function $\delta_{\mathrm{s},v}$ to compute a selection $S(v)\in\mathcal{S}$, according to the proposals and selections (and control states) of the cells in $N(v)$.
(3) Then $v$ moves to step a and applies the adjust function $\delta_{\mathrm{a},v}$ to update its candidate solution $A(v)$, according to the selections and solutions (and control states) of the cells in $N(v)$. The cell then repeats the above cycle.
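The cycle above can be simulated without a global clock. The following Python sketch is only one possible reading of the asynchronous behaviour, under the assumption that a cell may perform its next step once every neighbour has completed at least as many steps as the cell itself; the scheduler, the function name `run_async` and the example path graph are illustrative and not part of the framework in [11].

```python
import random

STEPS = ("p", "s", "a")  # propose, select, adjust


def run_async(adj, total_steps, seed=0):
    """Let cells advance through p -> s -> a -> p ... in a random order.

    A cell is allowed to move only when no neighbour is lagging behind it,
    so a select step always sees the neighbours' proposals, and so on.
    """
    rng = random.Random(seed)
    done = {v: 0 for v in adj}  # number of steps each cell has completed
    trace = []
    while any(done[v] < total_steps for v in adj):
        ready = [
            v for v in adj
            if done[v] < total_steps and all(done[u] >= done[v] for u in adj[v])
        ]
        v = rng.choice(ready)  # an arbitrary ready cell acts next
        trace.append((v, STEPS[done[v] % 3]))
        done[v] += 1
        # Neighbouring cells never drift more than one step apart.
        assert all(abs(done[v] - done[u]) <= 1 for u in adj[v])
    return trace


path = {0: {1}, 1: {0, 2}, 2: {1}}  # illustrative three-node path
for cell, step in run_async(path, total_steps=6):
    print(cell, step)
```

The order in which cells act varies with the seed, but each cell always sees its own steps in the order p, s, a, p, s, a, which is the behaviour the change-step function is meant to enforce locally.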
Formally, a PSA-cell $v$ in step $\sigma$ of the computation may change to the next step $\sigma'$, where $(\sigma,\sigma')\in\{(\mathrm{p},\mathrm{s}),(\mathrm{s},\mathrm{a}),(\mathrm{a},\mathrm{p})\}$, by performing a transition
$$(\sigma,P(v),S(v),A(v),q)\mapsto(\sigma',P(v),S(v),A(v),q')$$
only if $F_v(\sigma,q_1,\ldots,q_k)=q'$, where $q_1,\ldots,q_k$ are the current control states of the cells in $N(v)$. We now formally introduce a PSA-system.
Definition 7.
• A PSA-system $\Gamma$ is defined by
$$\Gamma=(V,E,(M_v)_{v\in V},\mathcal{P},\mathcal{S},\mathcal{A},Q,p_0,s_0,a_0,q_0)$$
where $(V,E)$ is a network, for each node $v\in V$, $M_v$ is a PSA-cell with proposal set, selection set, solution set and control states $\mathcal{P},\mathcal{S},\mathcal{A},Q$, respectively, and $p_0\in\mathcal{P}$, $s_0\in\mathcal{S}$, $a_0\in\mathcal{A}$ and $q_0\in Q$.
• The initial state of any cell in the PSA-system is $(\mathrm{p},p_0,s_0,a_0,q_0)$.
• A configuration of the PSA-system is defined as a function $c:V\to\Sigma\times\mathcal{P}\times\mathcal{S}\times\mathcal{A}\times Q$.
• A run of the PSA-system is a sequence $c_0,c_1,\ldots$ of configurations in which each cell cycles through the processes of propose, select and adjust as described above.
Let $A(c,v)$ denote the solution in the state $c(v)$ for a node $v\in V$. A run $\rho=c_0,c_1,\ldots$ is stabilising if there is some $i\ge 0$ such that $\forall j\ge i\ \forall v\in V: A(c_i,v)=A(c_j,v)$. Stabilising runs are important for a PSA-system, as the limit of such a run defines its outcome. Furthermore, it is desirable to have a PSA-system all of whose runs are stabilising and have the same limit, which then defines the outcome of the system. In the subsequent section, we present an instantiation of the PSA-system all of whose runs are stabilising and have the same limit.
4 COMMUNITY DETECTION WITH AN ENTROPY-BASED PSA-SYSTEM
Our algorithm combines the PSA-system introduced above with structural entropy. To instantiate the PSA-system, we need to specify the different ingredients in the definitions above.
We first describe the proposal set $\mathcal{P}$ and the solution set $\mathcal{A}$ for $\Gamma$. The proposal set relies on the notion of a tendency tree. A tree is a tuple $(T,f)$ where $T$ is a set of nodes and $f:T\to T$ is an edge function such that $f(r)=r$ for exactly one node (the root) $r\in T$ and $\exists i>0: f^i(u)=r$ for all other nodes $u\neq r$.
Definition 8. A tendency tree of a graph $(V,E)$ is a tree $(T,f)$ where $T\subseteq V$ and for any distinct $u,v\in T$, $f(u)=v$ implies $(u,v)\in E$.
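A tendency tree can be represented by a parent map, and Definition 8 then becomes a simple check. The sketch below is illustrative: the representation (a dictionary `parent` for $f$ and a set of `frozenset` pairs for $E$) and the function name are assumptions, and the empty tree is accepted because the text uses it as the initial proposal.

```python
def is_tendency_tree(edges, parent):
    """Check that `parent` (node -> f(node)) is a tendency tree over `edges`.

    `edges` is a set of frozenset({u, v}) pairs for the undirected graph.
    """
    if not parent:
        return True  # the empty tree serves as the initial proposal/solution
    nodes = set(parent)
    if any(p not in nodes for p in parent.values()):
        return False  # f must map T into T
    roots = [u for u in nodes if parent[u] == u]
    if len(roots) != 1:
        return False  # exactly one node with f(r) = r
    root = roots[0]
    for u in nodes - {root}:
        if frozenset((u, parent[u])) not in edges:
            return False  # f(u) = v must follow an edge of the graph
        current, hops = u, 0
        while current != root and hops < len(nodes):
            current, hops = parent[current], hops + 1
        if current != root:
            return False  # repeated application of f never reaches the root
    return True


# Illustrative path graph 0 - 1 - 2.
edges = {frozenset((0, 1)), frozenset((1, 2))}
print(is_tendency_tree(edges, {0: 0, 1: 0, 2: 1}))  # chain rooted at 0
print(is_tendency_tree(edges, {0: 0, 2: 0}))        # 0-2 is not an edge
```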
The proposal set $\mathcal{P}$ and the solution set $\mathcal{A}$ of the PSA-system $\Gamma$ are both the set of all tendency trees of $(V,E)$, where the initial proposal and solution are both the empty tree. The selection set $\mathcal{S}$ is $V\cup\{\mathrm{null}\}$, where the initial selection is null. To present our distributed algorithm, we illustrate it on a classical benchmark network, namely, the karate club network.
In the design of [11], a PSA-cell runs a number of rounds of the propose-select-adjust cycle, which are indexed by $1,2,3,\ldots$ and $\omega$. In particular, there are three different stages of rounds in which a cell performs different operations. These are round 1, rounds $i>1$, and round $\omega$, as illustrated in Fig. 5. We use the control states in $Q$ to flag the current stage of a cell. We now describe each of these stages in detail.
Figure 4: The initial configuration of the karate club network.
Figure 5: The general computation flow of a cell in the PSA-system $\Gamma$.
4.1 Stage 1
Propose. The propose step is for an individual cell to initiate the formation of a community. Here we use the notion of core centrality of nodes [2], which has been a classical tool in social network analysis. The intuition of a core resembles that of degree, i.e., the number of edges adjacent to a node. Unlike degree, this notion considers not only the local information of a node, but rather a set of nodes. More precisely, a $k$-core is a maximal induced subgraph of $G$ in which all nodes have degree at least $k$. The core centrality $\kappa(v)$ of a node $v$ is the largest $k$ such that a $k$-core contains $v$.
Definition 9. For $v\in V$, the local core of $v$ is the set
$$K(v)=\{u\in N(v)\mid |N(u)\cap N(v)|\ge\kappa(v)+1\}.$$
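The local core can be computed from core centralities and closed neighbourhoods. The sketch below is a hedged illustration: it assumes $\kappa(v)$ is the usual core number, obtained here by a simple quadratic-time peeling procedure, and that $N(\cdot)$ includes the node itself as in Sec. 2.1; the function names and the example graph are hypothetical.

```python
def core_numbers(adj):
    """Core centrality of every node via repeated removal of a min-degree node."""
    degree = {v: len(adj[v]) for v in adj}
    remaining = set(adj)
    core, k = {}, 0
    while remaining:
        v = min(remaining, key=degree.get)
        k = max(k, degree[v])  # the core level never decreases while peeling
        core[v] = k
        remaining.remove(v)
        for u in adj[v]:
            if u in remaining:
                degree[u] -= 1
    return core


def local_core(adj, v, core):
    """K(v) = {u in N(v) : |N(u) & N(v)| >= kappa(v) + 1}, with closed N."""
    def closed(x):
        return adj[x] | {x}

    return {u for u in closed(v) if len(closed(u) & closed(v)) >= core[v] + 1}


# Illustrative graph (an assumption): triangle 0-1-2 with a pendant node 3 on 0.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
core = core_numbers(adj)
print(core)
print(local_core(adj, 0, core))
print(local_core(adj, 3, core))
```

Here the pendant node is excluded from the local core of node 0 because it shares too few neighbours with it, while node 3 itself, having core centrality 1, keeps node 0 in its own local core.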
In the propose step, a cell $M_v$ sets a tendency tree defined on the local core $K(v)$ as the proposal $P(v)$. This proposal is sent to all nodes in $K(v)$. The proposals for the nodes in the karate club network are shown in Figure 6.
Select. The cell $v$ waits for all its neighbouring cells in $N(v)$ to make a proposal and collects all received proposals, i.e., those from the cells in
$$I(v)=\{u\in N(v)\mid v\in K(u)\}.$$
Note that every cell in $I(v)$ has proposed that $v$ join it. To make a selection, $v$ picks the most “favourable” proposal among all proposals $P(u)$ where $u\in I(v)$. For the next definition, we say that a $\mathcal{C}$-consistent set for a community structure $\mathcal{C}$ is a union of communities in $\mathcal{C}$. In the original PSA-system, the authors define the preference relation directly in terms of intra-cluster density and inter-cluster density [11]. In this paper, on the other hand, we adopt a different preference relation using the concept of local structural entropy introduced in Sec. 2.
Figure 6: The proposal $P(v)$ made by a node $v$ in the karate club network in Stage 1 consists of the nodes in the local core $K(v)$. Self-loops in the tendency trees are omitted. E.g., the tendency tree of 0 contains nodes 0, 1, 2, 3, 7.
Definition 10. Let $\mathcal{C}$ be a community structure of the graph $G=(V,E)$. We define the preference relation $\preceq$ on all $\mathcal{C}$-consistent sets such that $C\prec C'$ if
(1) $H^{C}(G)>H^{C'}(G)$; or
(2) $H^{C}(G)=H^{C'}(G)$ and $|C|<|C'|$.
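The preference relation is a lexicographic comparison: lower entropy first, larger size as a tie-breaker. A minimal sketch, assuming each candidate set is summarised by a hypothetical `(entropy, size)` pair whose entropy value has already been computed:

```python
def strictly_less_preferred(a, b):
    """Return True if a ≺ b, where a and b are (entropy, size) pairs."""
    entropy_a, size_a = a
    entropy_b, size_b = b
    # (1) b has strictly lower entropy, or (2) equal entropy and b is larger.
    return entropy_a > entropy_b or (entropy_a == entropy_b and size_a < size_b)


def most_preferred(candidates):
    """Pick the candidate that no other candidate is strictly preferred to."""
    return max(candidates, key=lambda c: (-c[0], c[1]))


# Illustrative values only; they are not results from the paper.
candidates = [(0.9, 3), (0.5, 2), (0.5, 4)]
print(most_preferred(candidates))
```

Exact equality of floating-point entropies is fragile in practice, so an implementation would likely compare entropies up to a small tolerance; that choice is not specified in the text.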
The node $v$ then selects the node $S(v)=u\in I(v)$ whose proposal is the most preferred. For example, Figure 7 displays the selections of each node in the karate club network.
Figure 7: The selection of each node is indicated by an arrow. Self-selections are omitted.
Adjust. After all cells in $N(v)$ have made a selection, the cell $v$ updates its solution $A(v)$ to include all cells that have selected $v$, i.e., it sets $A(v)=\{u\in N(v)\mid S(u)=v\}$. After this step, all cells have declared a community $A(v)$; together these communities form a community structure $\mathcal{C}_0$ of the graph. Fig. 8 shows the resulting community structure in the karate club network.
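The Stage 1 adjust step only needs the selections of a cell's neighbours. The following sketch is illustrative: it assumes selections are stored in a dictionary `selection` mapping each node to $S(v)$, treats $N(v)$ as the closed neighbourhood, and collects the non-empty sets $A(v)$ as the communities of $\mathcal{C}_0$; the helper names and example are hypothetical.

```python
def adjust(adj, selection):
    """A(v) = {u in N(v) : S(u) = v} for every cell v."""
    solutions = {}
    for v in adj:
        closed = adj[v] | {v}  # N(v) includes v itself
        solutions[v] = {u for u in closed if selection[u] == v}
    return solutions


def communities(solutions):
    """The non-empty solutions, viewed as the community structure C_0."""
    return [frozenset(members) for members in solutions.values() if members]


# Illustrative path 0 - 1 - 2 - 3 where nodes 1 and 3 are selected.
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
selection = {0: 1, 1: 1, 2: 3, 3: 3}
solutions = adjust(adj, selection)
print(solutions)
print(communities(solutions))
```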
4.2 Stage 2
After the communities are formed in Stage 1, cells in the same community bind into a single “meta”-cell and start to make collective decisions. To further reduce the structural entropy of the resulting community structure, the propose-select-adjust procedure is repeated several times. Here the operations differ from Stage 1, as the cells are no longer individual nodes but communities formed in the previous round. In this stage, each cell aims to lower the local structural entropy of its community until no improvement can be achieved. We present the intuitive ideas:
Propose. Recall that the current solution $A(v)$ of $v$ is a tendency tree $(T,f)$. The cell $v$ examines all $u\in K(v)$ and compares the current solution $A(v)$ with the union of $A(v)$ and $A(u)$. The new proposal $P(v)$ contains the union of $A(v)$ and $A(u)$ that has the highest preference. This proposal is then propagated to the root of $T$, which makes a proposal based on the proposals of its children and passes it down to all cells in $T$. In this way, all $v\in T$ produce the same proposal.
Select. We say that a community $C\in\mathcal{C}_0$ receives a proposal $P(u)\in\mathcal{P}$ if $P(u)$ contains $C$. Through propagation of information, the community of $v$ examines all proposals it receives and chooses the proposal with the highest preference.
Adjust. For every community $C\in\mathcal{C}_0$, we define a tendency cluster $\tau(C)\in\mathcal{C}$. There are two cases:
• Case 1: Every cell $x\in C$ selects its parent $f(x)$ in its current tendency tree; in this case the community sets $\tau(C)=C$.
• Case 2: Some cell $x\in C$ selects a node $u\notin C$; in this case the community $C$ has decided to join the community $C'$ that contains $u$, and we set $\tau(C)=C'$.
Note that whenever $\tau(C)\neq C$, $\tau(C)$ always ranks higher than $C$ in the preference relation. Hence for every $C\in\mathcal{C}_0$, there is some $j\in\mathbb{N}$ and $D\in\mathcal{C}_{i-1}$ such that $D=\tau^j(C)=\tau^{j+1}(C)$. We call $D$ the sink of $C$.
We define the community structure $\mathcal{C}_1$ as follows: $C, C' \in \mathcal{C}_0$ bind into the same community in $\mathcal{C}_1$ whenever they have the same sink. Each node $v$ then adjusts its solution $A(v)$ to its new community in $\mathcal{C}_1$. For this to happen, the tendency tree of $A(v)$ may need to be changed so that it is linked with another community in $\mathcal{C}_0$. The root of the resulting tendency tree is the root of its sink.
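The binding rule above can be sketched in a few lines. The following Python fragment is an illustration only, not the authors' distributed implementation: it assumes that the map $\tau$ is already known in one place as a dictionary `tau` from community labels to community labels (the names `tau`, `find_sink`, and `bind_by_sink` are hypothetical), follows $\tau$ until a fixed point is reached, and groups communities by the sink they reach.

```python
from collections import defaultdict


def find_sink(tau, c):
    """Follow tau from community c until a fixed point tau[d] == d is reached."""
    seen = set()
    while tau[c] != c:
        if c in seen:
            # Cannot happen if utility strictly increases along tau,
            # but we guard against malformed input anyway.
            raise ValueError("tau contains a cycle; no sink exists")
        seen.add(c)
        c = tau[c]
    return c


def bind_by_sink(tau):
    """Group community labels that share the same sink."""
    groups = defaultdict(list)
    for c in tau:
        groups[find_sink(tau, c)].append(c)
    return dict(groups)


# Hypothetical example: five communities, two of which are sinks.
tau = {"A": "B", "B": "B", "C": "A", "D": "D", "E": "D"}
merged = bind_by_sink(tau)
print(merged)
```

In this toy input, communities A and C reach the sink B, and E reaches the sink D, so the new community structure has two communities. In the actual system each cell would learn its sink through message passing rather than from a shared dictionary.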
After all cells in $N(v)$ update their solutions, $v$ moves back to step p and starts another round. The node $v$ moves on to Stage 3 when no change occurs to $A(v)$ after step a. In the karate club example, the community structures $\mathcal{C}_0$ and $\mathcal{C}_1$ are the same, so every node moves directly on to Stage 3 after one round in Stage 2.
4.3 Stage 3
The community structure obtained after Stage 2 has reached local optimality in the sense that its communities cannot achieve a lower structural entropy by combining with any neighbouring community. However, at this point the resulting communities may still be quite distinct from the desired outcome: they are normally too small to reveal any global structure of the network. In real-world networks, communities tend to combine several such communities (e.g. the two communities in Fig. 2 are formed by combining several communities in Fig. 8). Hence we use Stage 3 to find such tendencies and obtain the final clustering of the network.
Figure 8: The resulting clustering in Zachary’s karate club after the first round. Nodes belonging to the same community are circled with a dashed line. An isolated node is a community consisting of itself alone.
Figure 9: The final outcome of the PSA system on the karate club network. The $\tau(C)$ of each community $C$ is shown with an arrow. Self-loops are omitted from the diagram. The two sinks are the community of node 0 and the community of node 33. Hence the resulting community structure $\mathcal{C}_\omega$ coincides with the ground truth community structure (Figure 2) as in Zachary’s original work [29].
Stage 3 is performed similarly to Stage 2, except that when a community $C \in \mathcal{C}^{*}$ generates proposals, it sends a proposal to every node in its neighbourhood. The clustering $\mathcal{C}_\omega$ is the outcome of the PSA system and contains all the communities identified in this network. Figure 9 shows the result on the karate club network, together with the tendency of each community $C \in \mathcal{C}^{*}$. The result matches the ground-truth community structure in Fig. 2, which shows that the PSA system correctly detected the communities in the karate club dataset.
5 EXPERIMENTAL RESULTS
5.1 Benchmark Networks
Besides the karate club network, we also run our PSA system on two networks with ground truth community structure, the dolphin social network and the American college football league network, which are commonly used as benchmarks for community detection. The
dolphin network was generated from observations of a group of 62 bottlenose dolphins. Nodes represent the dolphins, and edges represent associations between dolphin pairs occurring more often than expected by chance [13]. The American college football league network contains the network of American football games between Division I-A colleges during the regular season of Fall 2000 [17]. Nodes and edges represent teams and matches, respectively. There were 10 conferences and 8 independent teams in 2000.
Fig. 10 shows the resulting communities in the Doubtful Sound bottlenose dolphin network, where edges are social interactions between dolphins. Fig. 11 shows the resulting communities in the American college football league network, where our algorithm accurately revealed the conferences in the league.
Figure 10: Our algorithm reveals four communities in the dolphin social network, which match the ground truth with high precision. (Top: The ground truth community structure [13]. Bottom: The community structure detected by our algorithm.)
5.2 Bitcoin Trust Networks
A network is dynamic when it undergoes continuous changes such as the addition or deletion of edges. We next conduct experiments on two dynamic bitcoin trust networks, which we call BITCOIN1 and BITCOIN2 [7, 8], respectively. A bitcoin trust network records who-trusts-whom relationships among people who trade using bitcoin on a platform called Bitcoin OTC, where users are anonymous. Nodes in bitcoin trust networks represent users, and if user $i$ trusts user $j$, there is an edge between $i$ and $j$. Each edge is assigned a timestamp, which represents the exact time when the corresponding edge occurs. The statistics of the two bitcoin trust networks at the last timestamp are summarised in Table 1. To conduct simulations, we speed up the evolution of the networks by a factor of 100; namely, every second in the experiment we append the new edges occurring during an interval of 100 s in reality.
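A minimal sketch of this replay scheme is given below. It assumes that edges arrive as (u, v, timestamp) triples with timestamps in seconds; the function name `batch_edges`, the alignment of windows to the earliest timestamp, and the skipping of empty windows are assumptions made for illustration and are not details taken from the experiments.

```python
from itertools import groupby


def batch_edges(timed_edges, window=100):
    """Yield (tick, edges) pairs in chronological order.

    Tick k collects the edges whose timestamps fall in
    [t0 + k * window, t0 + (k + 1) * window), where t0 is the earliest
    timestamp. Windows without any edge are skipped.
    """
    ordered = sorted(timed_edges, key=lambda e: e[2])
    if not ordered:
        return
    t0 = ordered[0][2]
    for tick, group in groupby(ordered, key=lambda e: (e[2] - t0) // window):
        yield tick, [(u, v) for u, v, _ in group]


# Hypothetical toy stream: timestamps are in seconds.
stream = [(1, 2, 0), (2, 3, 50), (3, 4, 100), (1, 4, 350)]
batches = list(batch_edges(stream))
print(batches)
```

A simulation loop would then append each batch to the network once per simulated second, letting the detection algorithm react between batches.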
Figure 11: Our algorithm reveals several communities in the American college football league, which match the actual conferences with high precision. (Top: The ground truth community structure, where grey nodes indicate independent teams. Bottom: The community structure detected by our algorithm.)
We apply both the original PSA algorithm and our structural entropy-based PSA algorithm to the two bitcoin networks. To demonstrate the outcome of our entropy-based algorithm, we select four timestamps and show the evolution of the community structure in BITCOIN2 (see Fig. 12).
We measure the quality of the resulting clusterings using two quality functions. Let $\mathcal{C} = \{C_1, \ldots, C_\ell\}$ be the clustering obtained by the algorithm on the bitcoin network at the last timestamp.
1) Modularity: This widely used quality function measures the proportion of in-cluster edges, taking into account the expected proportion; it is defined as
$$\mathrm{Mod}(\mathcal{C}) = \sum_{i=1}^{\ell} \left[ \frac{|E \upharpoonright C_i|}{|E|} - \frac{d_i^2}{4|E|^2} \right]$$
Figure 12: The evolution of the community structure of BITCOIN2 output by our algorithm. Category 15 refers to the isolated nodes at the corresponding timestamp. Our algorithm detects 15 stable communities at the last timestamp, which represent 15 trust groups over bitcoin trading on the OTC platform.
where $E \upharpoonright C_i$ is the set of edges with both endpoints in $C_i$ and $d_i$ is the sum of degrees of nodes in $C_i$ [4].
2) Performance: This function counts the “correctly interpreted pairs”, that is, edges within communities together with pairs of nodes in different communities that are not linked by an edge [4]; it is defined as
$$\frac{|X \cup Y|}{(|V| - 1)\,|V|},$$
where $X = \{(u, v) \in E \mid C(u) = C(v)\}$, $Y = \{(u, v) \notin E \mid C(u) \neq C(v)\}$, and $C(x)$ is the community of $x$.
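The two quality functions can be illustrated with a short Python sketch for a simple undirected graph given as an edge list. This is not the evaluation code used in the experiments; the function names and the toy graph are hypothetical. For performance, the sketch counts unordered node pairs and divides by $|V|(|V|-1)/2$, which gives the same ratio as the formula above under the assumption that the pairs there are ordered.

```python
from itertools import combinations


def modularity(edges, community):
    """Sum over communities of |E restricted to C_i| / |E| - d_i^2 / (4 |E|^2)."""
    m = len(edges)
    inside = {}      # number of edges with both endpoints in the community
    degree_sum = {}  # sum of degrees of the nodes in the community
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            inside[cu] = inside.get(cu, 0) + 1
    return sum(inside.get(c, 0) / m - d * d / (4 * m * m)
               for c, d in degree_sum.items())


def performance(nodes, edges, community):
    """Fraction of node pairs that are linked and together, or unlinked and apart."""
    edge_set = {frozenset(e) for e in edges}
    correct = 0
    total = 0
    for u, v in combinations(nodes, 2):  # quadratic; fine for small graphs
        total += 1
        linked = frozenset((u, v)) in edge_set
        same = community[u] == community[v]
        if linked == same:
            correct += 1
    return correct / total


# Toy graph: two triangles joined by the single edge (2, 3).
nodes = [0, 1, 2, 3, 4, 5]
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
community = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
mod_value = modularity(edges, community)
perf_value = performance(nodes, edges, community)
print(mod_value, perf_value)
```

For this toy graph each triangle contributes $3/7 - 49/196 = 5/28$ to the modularity, giving $5/14$ in total, and 14 of the 15 node pairs are correctly interpreted, giving a performance of $14/15$.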
Fig. 13 shows the quality of the resulting clusterings under the two metrics introduced above. We also compute the structural entropy of each bitcoin network under the clustering obtained at the last timestamp. The outcomes of the experiments on performance and structural entropy are illustrated in Fig. 14.
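As an illustration of how the structural entropy of a network under a given clustering might be computed, the sketch below assumes the two-dimensional structural entropy of Li and Pan [9], $-\sum_j \sum_{x \in C_j} \frac{d_x}{2m} \log_2 \frac{d_x}{V_j} - \sum_j \frac{g_j}{2m} \log_2 \frac{V_j}{2m}$, where $V_j$ is the sum of degrees in $C_j$ and $g_j$ is the number of edges leaving $C_j$. The definition used earlier in this paper may differ in detail, and the function name and toy graph are hypothetical.

```python
import math
from collections import defaultdict


def structural_entropy(edges, community):
    """Structural entropy of a simple undirected graph under a partition.

    Assumes the two-dimensional structural entropy of Li and Pan;
    isolated nodes contribute nothing.
    """
    two_m = 2 * len(edges)
    degree = defaultdict(int)
    cut = defaultdict(int)  # g_j: edges with exactly one endpoint in community j
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if community[u] != community[v]:
            cut[community[u]] += 1
            cut[community[v]] += 1
    volume = defaultdict(int)  # V_j: sum of degrees in community j
    for node, d in degree.items():
        volume[community[node]] += d
    h = 0.0
    for node, d in degree.items():
        h -= d / two_m * math.log2(d / volume[community[node]])
    for c, g in cut.items():
        h -= g / two_m * math.log2(volume[c] / two_m)
    return h


# Toy graph: two triangles joined by the single edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
merged = {n: "all" for n in range(6)}
h_split = structural_entropy(edges, split)
h_merged = structural_entropy(edges, merged)
print(h_split, h_merged)
```

On this toy graph, placing each triangle in its own community yields a lower value than placing all nodes in one community, which is the kind of comparison a cell would make when deciding whether a merge lowers the entropy.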
Table 1: Statistics of the two bitcoin trust networks (last timestamp)
            Nodes    Edges     Largest Component
BITCOIN1    5,881    35,592    5,876
BITCOIN2    3,783    24,186    3,775
Figure 13: Modularity and performance of the PSA algorithm in [11] and of the entropy-based PSA on the two bitcoin trust networks.
Figure 14: Structural entropy of the PSA algorithm in [11] and of the entropy-based PSA on the two bitcoin trust networks.
Experimental results reveal that, compared to the original PSA algorithm, the entropy-based PSA detects a community structure with slightly lower modularity and comparable performance, but reduces the structural entropy of the output community structure. This implies that blockchain networks logically partitioned by the entropy-based PSA tend to have lower resistance and uncertainty in information diffusion. As a result, the efficiency of the blockchain system is enhanced.
6 CONCLUSIONS AND OUTLOOK
In this paper, we propose a distributed asynchronous community detection algorithm based on the PSA framework for blockchain networks. Using structural entropy as the selection criterion, the algorithm can detect community structures with low structural entropy. Experimental results show that the algorithm works well on both benchmark networks and bitcoin networks. Our work provides a solution framework for categories of problems in blockchain-based applications.
Future research directions that have yet to be explored include: (a) extending the current framework to detect overlapping communities in blockchain networks; (b) exploring more application scenarios where our proposed algorithm can be applied; (c) embedding our distributed community detection algorithm into the fundamental layer of blockchain technology to support applications in higher layers.
REFERENCES
[1] Ronald S Burt. 2004. Structural Holes and Good Ideas. Amer. J. Sociology 110, 2 (2004), 349–399.
[2] Kousik Das, Sovan Samanta, and Madhumangal Pal. 2018. Study on centrality measures in social networks: a survey. Social Network Analysis and Mining 8, 1 (2018), 13.
[3] Michael Fleder, Michael S Kester, and Sudeep Pillai. 2015. Bitcoin transaction graph analysis. arXiv preprint arXiv:1502.01657 (2015).
[4] Santo Fortunato. 2010. Community detection in graphs. Physics reports 486, 3–5 (2010), 75–174.
[5] Michelle Girvan and Mark EJ Newman. 2002. Community structure in social and biological networks. Proceedings of the national academy of sciences 99, 12 (2002), 7821–7826.
[6] Bakhadyr Khoussainov, Jiamou Liu, and Imran Khaliq. 2009. A dynamic algorithm for reachability games played on trees. In International Symposium on Mathematical Foundations of Computer Science. Springer, 477–488.
[7] Srijan Kumar, Bryan Hooi, Disha Makhija, Mohit Kumar, Christos Faloutsos, and VS Subrahmanian. 2018. Rev2: Fraudulent user prediction in rating platforms. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining. ACM, 333–341.
[8] Srijan Kumar, Francesca Spezzano, VS Subrahmanian, and Christos Faloutsos. 2016. Edge weight prediction in weighted signed networks. In Data Mining (ICDM), 2016 IEEE 16th International Conference on. IEEE, 221–230.
[9] Angsheng Li and Yicheng Pan. 2016. Structural information and dynamical complexity of networks. IEEE Transactions on Information Theory 62, 6 (2016), 3290–3339.
[10] Angsheng Li and Yicheng Pan. 2018. Structure Entropy and Resistor Graphs. arXiv preprint arXiv:1801.03404 (2018).
[11] Jiamou Liu and Ziheng Wei. 2014. Community detection based on graph dynamical systems with asynchronous runs. In Computing and Networking (CANDAR), 2014 Second International Symposium on. IEEE, 463–469.
[12] Jiamou Liu and Ziheng Wei. 2014. From a local to a global perspective of community detection in networks. In Pacific Rim International Conference on Artificial Intelligence. Springer, 1036–1049.
[13] David Lusseau and Mark EJ Newman. 2004. Identifying the role that animals play in their social networks. Proceedings of the Royal Society of London B: Biological Sciences 271, Suppl 6 (2004), S477–S481.
[14] Anastasia Moskvina and Jiamou Liu. 2016. Integrating networks of equipotent nodes. In International Conference on Computational Social Networks. Springer, 39–50.
[15] Anastasia Moskvina and Jiamou Liu. 2016. Togetherness: an algorithmic approach to network integration. In 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM). IEEE, 223–230.
[16] Satoshi Nakamoto. 2008. Bitcoin: A peer-to-peer electronic cash system. (2008).
[17] Mark EJ Newman. 2004. Fast algorithm for detecting community structure in networks. Physical review E 69, 6 (2004), 066133.
[18] Mark EJ Newman and Michelle Girvan. 2004. Finding and evaluating community structure in networks. Physical review E 69, 2 (2004), 026113.
[19] Gergely Palla, Albert-László Barabási, and Tamás Vicsek. 2007. Quantifying social group evolution. Nature 446, 7136 (2007), 664.
[20] Gergely Palla, Imre Derényi, Illés Farkas, and Tamás Vicsek. 2005. Uncovering the overlapping community structure of complex networks in nature and society. Nature 435, 7043 (2005), 814.
[21] Thai Pham and Steven Lee. 2016. Anomaly detection in bitcoin network using unsupervised learning methods. arXiv preprint arXiv:1611.03941 (2016).
[22] Cazabet Remy, Baccour Rym, and Latapy Matthieu. 2017. Tracking bitcoin users activity using community detection on a network of weak signals. In International Workshop on Complex Networks and their Applications. Springer, 166–177.
[23] Gokhan Sagirlar, Barbara Carminati, and Elena Ferrari. 2018. AutoBotCatcher: Blockchain-based P2P Botnet Detection for the Internet of Things. In 2018 IEEE 4th International Conference on Collaboration and Internet Computing (CIC). IEEE, 1–8.
[24] David Shrier, Weige Wu, and Alex Pentland. 2016. Blockchain & infrastructure (identity, data security). Massachusetts Institute of Technology-Connection Science 1, 3 (2016).
[25] Melanie Swan. 2015. Blockchain: Blueprint for a new economy. O’Reilly Media, Inc.
[26] Duncan J Watts. 2004. Small worlds: the dynamics of networks between order and randomness. Vol. 9. Princeton university press.
[27] Xiwei Xu, Cesare Pautasso, Liming Zhu, Vincent Gramoli, Alexander Ponomarev, An Binh Tran, and Shiping Chen. 2016. The blockchain as a software connector. In 2016 13th Working IEEE/IFIP Conference on Software Architecture (WICSA). IEEE, 182–191.
[28] Bo Yan, Yang Chen, and Jiamou Liu. 2017. Dynamic relationship building: exploitation versus exploration on a social network. In International Conference on Web Information Systems Engineering. Springer, 75–90.
[29] Wayne W Zachary. 1977. An information flow model for conflict and fission in small groups. Journal of anthropological research 33, 4 (1977), 452–473.