Finding the Most Prominent Group in
Complex Networks
Rami Puzis a,**, Yuval Elovici b and
Shlomi Dolev c
a Deutsche Telekom Laboratories at
Ben-Gurion University of the Negev
e-mail: puzis@bgu.ac.il
b Deutsche Telekom Laboratories at
Ben-Gurion University of the Negev
Tel: 972-8-647-7551
e-mail: elovici@inter.net.il
c Department of Computer Science
Ben-Gurion University of the Negev
Tel: 972-8-647-2715
e-mail: dolev@cs.bgu.ac.il
In many applications we are required to locate the
most prominent group of vertices in a complex net-
work. Group Betweenness Centrality can be used to
evaluate the prominence of a group of vertices. Eval-
uating the Betweenness of every possible group in or-
der to find the most prominent is not computationally
feasible for large networks. In this paper we present
two algorithms for finding the most prominent group.
The first algorithm is based on heuristic search and
the second is based on iterative greedy choice of ver-
tices. The algorithms were evaluated on random and
scale-free networks. Empirical evaluation suggests that
the greedy algorithm results were negligibly below
the optimal result. In addition, both algorithms per-
formed better on scale-free networks: heuristic search
was faster and the greedy algorithm produced more
accurate results. The greedy algorithm was applied for
optimizing deployment of intrusion detection devices
on network service provider infrastructure.
Keywords: Complex Networks, Group Betweenness
Centrality, Heuristic Search
1. Introduction
Complex networks are used to study the structure and dynamics of complex systems in various disciplines [1]. For example, social networks [2,3], protein interaction networks [4], and computer networks such as the Internet [5,6] are all classified as complex networks. In social networks vertices are usually individuals and edges characterize the relations between them; in computer networks, vertices might be routers connected to each other through communication lines.
*Supported by Deutsche Telekom.
**Corresponding author: Rami Puzis, puzis@bgu.ac.il
Many naturally evolved complex networks are
characterized by a Scale-Free (SF) structure [7,8]
and in particular the power-law distribution [5] of
Connectivity Degree. Scale-free networks have a
few vertices with a very high degree of connec-
tivity [9]. While a power-law degree distribution
is the main characteristic of scale-free networks,
many networks are also characterized by a power-
law Betweenness distribution [10].
Identification of the most central group of ver-
tices in a network is an important issue from both
theoretical and practical points of view. For exam-
ple, Ballester et al. state in [11] the importance of
finding the key group in a criminal network. Bor-
gatti elaborates in [12] on a Key Player Problem
(KPP) that is strongly related to the cohesion of
a network. The author defines two problems KPP-
Pos and KPP-Neg. The solution of the first prob-
lem is a group maximally connected to all other
vertices in a graph and the solution of the second is
a group maximally disrupting the network. In this
paper we define a similar optimization problem:
find the group that has maximal potential to con-
trol traffic in communication networks. Analogous
to [12] we will call this problem KPP-Com.
The potential to control traffic is typically attributed in the literature to vertices with high Shortest Path Betweenness Centrality (BC) [13,14,15]. In fact, the BC of vertices was used in [16] to define a congestion-free routing strategy. Everett and Borgatti [17] defined Group Betweenness Centrality (GBC) as a natural extension of the Betweenness measure. GBC is used to compute the prominence of groups of vertices in complex networks. Freeman [13] has defined the Group Betweenness Centralization index as a measure of homogeneity of the members' Betweenness. In this paper we use the GBC definition of Everett and Borgatti, and our main goal is to find the group with the highest GBC.
AI Communications
ISSN 0921-7126, IOS Press. All rights reserved
R. Puzis et al. / Finding the most prominent group in SF networks
In many applications we are required to lo-
cate the most prominent group. One possible ap-
plication is optimizing deployment of intrusion
detection devices in a wide area network. Self-
propagating malicious code (computer viruses and
worms) poses a significant threat to Internet users.
Only some of the users connected to the Inter-
net are protected by updated anti virus tools. One
way to prevent users from being infected by such
threats is to clean the traffic at the level of Net-
work Service Provider (NSP). The NSP traffic can
be monitored and cleaned by a Distributed Net-
work Intrusion Detection System (DNIDS) that
may be deployed on the NSP routers/links [18].
It is not realistic to inspect the traffic of all the
routers/links of the NSP. Theoretical models sug-
gest protecting the most significant routers/links
in order to slow down the threat propagation in the
entire network [19]. Previous research has shown
that NSP network infrastructure can be regarded
as a complex network [6] and it has scale-free prop-
erties [5,10]. Thus, the group of routers/links that
together have the highest influence on communi-
cation in the NSP infrastructure can be located by
finding the group with the highest GBC.
Park [20] exploited the scale-free structure of
computer networks to administer scalable deploy-
ment of systems for protection against Distributed
Denial of Service (DDoS) attacks and worm con-
tainment. The author suggests that deployment
based on Vertex Cover can be small enough and
provide a high level of protection for the entire net-
work. Small Vertex Cover can be found iteratively
by choosing vertices that are connected to the most
unprotected peers. In this way a network can be
protected from DDoS attacks with only 15% de-
ployment and efficient worm containment can be
provided by filters deployed on 4% of the vertices.
In contrast to scale-free computer networks, the
same deployment sizes provide much less protec-
tion when applied to random networks.
In this study we propose two algorithms to solve the KPP-Com problem and we illustrate how the solution of KPP-Com can be used to optimize the deployment of DNIDS in a large network. In the first algorithm we use heuristic search based on Depth First Branch and Bound (DFBnB) with four different heuristic functions. In the second algorithm we use a greedy approach in order to speed up the search. The algorithms were evaluated on random and scale-free networks for various group sizes. Our evaluation indicates that the greedy algorithm's results are very close to the optimal solution. We also discovered that the heuristic search algorithm performed much faster when applied to scale-free networks compared to random networks. In addition, we found that the quality of solutions produced by the iterative greedy algorithm is higher when the algorithm is applied to scale-free networks compared to random networks. We concluded our evaluation by using the greedy algorithm to optimize the deployment of DNIDS on the infrastructure of an NSP.
The rest of the paper is structured as follows: In
Section 2 we elaborate on the Group Betweenness
Centrality measure; in Section 3 we present our al-
gorithms. In Section 4 we present the evaluations
of the algorithms and in Section 5 their applica-
tion to network security; we conclude the paper
in Section 6. Notations and abbreviations used in
this paper are summarized in Appendix A.
2. Group Betweenness Centrality
In this study we want to find a group of vertices that has maximal control over the traffic in a communication network $G = (V, E)$ with $|V| = n$ vertices and $|E| = m$ edges. To simplify the discussion we assume that $G$ is a connected, unweighted and undirected graph. However, all proposed methods are also applicable to networks that have no such constraints.
Shortest Path Betweenness is considered to be
an accurate estimation of the information flow con-
trolled by a vertex in communication networks
that use shortest path routing [15,16]. However, in
computer networks traffic may diverge from short-
est paths due to load balancing. There are cen-
trality measures that are not designed for shortest
path routing. In particular, Flow Betweenness [21]
equally respects all paths while Random Walk Be-
tweenness [22] favors shorter paths over the longer
ones.
In this study we assume that every two vertices in the network communicate with each other mainly through shortest paths. Therefore, we are primarily interested in locating the group with maximal Shortest Path Group Betweenness Centrality (GBC). We also assume that information sent by $s \in V$ to $t \in V$ can be inspected or modified by $s$, $t$, or any intermediate vertex that participates in forwarding this information. The KPP-Com problem focuses on finding a group of vertices $S \subseteq V$ of size $k$ that can inspect or modify most of the information flow in the network. Next we will briefly present the definitions of Betweenness Centrality (BC) and Group Betweenness Centrality (GBC).
All shortest path Betweenness measures are defined as a summation over all pairs of vertices in a graph. Let $s$ and $t$ be two vertices in a graph. Let $\sigma_{s,t}$ be the total number of different shortest paths that connect $s$ and $t$. Let $v$ be a vertex that lies on a shortest path between $s$ and $t$. We denote the number of shortest paths from $s$ to $t$ that pass through $v$ by $\sigma_{s,t}(v)$. In order to determine how many different paths from $s$ to $t$ traverse $v$ we multiply the number of shortest paths from $s$ to $v$ ($\sigma_{s,v}$) by the number of shortest paths from $v$ to $t$ ($\sigma_{v,t}$).
\[
\sigma_{s,t}(v) =
\begin{cases}
\sigma_{s,v} \cdot \sigma_{v,t} & \text{if } d(s,t) = d(s,v) + d(v,t) \\
0 & \text{otherwise}
\end{cases}
\]
where $d(x, y)$ is the distance between vertices $x$ and $y$.
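The case distinction above can be sketched in a few lines of Python. This is only an illustration, assuming a connected, unweighted, undirected graph stored as an adjacency dictionary; the helper names `bfs_counts` and `sigma_through` are hypothetical and do not come from the paper.

```python
from collections import deque


def bfs_counts(adj, s):
    """Return (dist, sigma): hop distances from s and shortest-path counts."""
    dist = {s: 0}
    sigma = {s: 1}  # one trivial path from s to itself
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:  # w is reached for the first time
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:  # u precedes w on a shortest path
                sigma[w] += sigma[u]
    return dist, sigma


def sigma_through(adj, s, t, v):
    """Number of shortest s-t paths that pass through v."""
    dist_s, sigma_s = bfs_counts(adj, s)
    dist_v, sigma_v = bfs_counts(adj, v)
    if dist_s[t] == dist_s[v] + dist_v[t]:  # v lies on a shortest s-t path
        return sigma_s[v] * sigma_v[t]
    return 0


# Hypothetical example: the 4-cycle a-b-c-d-a has two shortest a-c paths.
cycle = {"a": ["b", "d"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "a"]}
print(sigma_through(cycle, "a", "c", "b"))  # paths from a to c via b
print(sigma_through(cycle, "a", "b", "c"))  # c is not on a shortest a-b path
```

Running one breadth-first search per call keeps the sketch short; a practical implementation would cache the per-source results.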
Betweenness Centrality (BC) of vertex $v \in V$ represents the total influence that $v$ has on communications between all possible pairs of vertices in the graph. The Betweenness Centrality of vertex $v$ is
\[
BC(v) = \sum_{s,t \in V \mid s \neq v \neq t} \left( \frac{\sigma_{s,t}(v)}{\sigma_{s,t}} \right)
\]
where the fraction $\sigma_{s,t}(v) / \sigma_{s,t}$ represents the influence of $v$ on the communication between $s$ and $t$.
In an equivalent definition of BC, the distinction constraint ($s \neq v \neq t$) in the above equation can be replaced with ($s \neq t$) if we define $\sigma_{x,x} = 0$. This definition results in $\sigma_{s,t}(t) = \sigma_{s,t}(s) = 0$.
Note that in the original definition of BC, shortest paths that start or end at $v$ are not included in the computation of $BC(v)$. It is more convenient to think that information originating at some vertex is seen by this vertex and hence should be accounted for. We want to include shortest paths that start or end at $v$ in the computation of $v$'s Betweenness. For this reason, in this study we define $\sigma_{x,x} = 1$, which results in $\sigma_{s,t}(t) = \sigma_{s,t}(s) = \sigma_{s,t}$. A similar variation of BC was previously mentioned [10]. Defining $\sigma_{x,x} = 1$ results in the addition of a constant term ($2 \cdot (n - 1)$) to the BC of a single vertex in a connected network.
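The effect of the two conventions can be checked numerically. The following brute-force Python sketch sums over ordered pairs of distinct vertices and optionally counts paths that start or end at the vertex; it assumes a connected, unweighted, undirected graph given as an adjacency dictionary, and the names `bfs_counts` and `betweenness` are hypothetical. It is not one of the O(mn) algorithms cited in the paper.

```python
from collections import deque


def bfs_counts(adj, s):
    """Return (dist, sigma): hop distances from s and shortest-path counts."""
    dist = {s: 0}
    sigma = {s: 1}  # corresponds to the convention sigma_{x,x} = 1
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:
                sigma[w] += sigma[u]
    return dist, sigma


def betweenness(adj, include_endpoints=True):
    """Brute-force BC over all ordered pairs (s, t) with s != t."""
    info = {s: bfs_counts(adj, s) for s in adj}
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        dist_s, sigma_s = info[s]
        for t in adj:
            if s == t:
                continue
            for v in adj:
                if not include_endpoints and v in (s, t):
                    continue  # original definition: s != v != t
                dist_v, sigma_v = info[v]
                if dist_s[t] == dist_s[v] + dist_v[t]:
                    bc[v] += sigma_s[v] * sigma_v[t] / sigma_s[t]
    return bc


# Hypothetical example: the path a-b-c.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
n = len(path)
with_ends = betweenness(path, include_endpoints=True)
without_ends = betweenness(path, include_endpoints=False)
# Every vertex gains the same constant, 2 * (n - 1).
print({v: with_ends[v] - without_ends[v] for v in path})
```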
BC of individual vertices can be naturally extended to the Betweenness Centrality of groups of vertices [17]. Let $S \subseteq V$ be a group of vertices. $GBC(S)$ stands for the total fraction of shortest paths between all pairs of vertices that pass through at least one member of the group $S$. Let $\ddot{\sigma}_{s,t}(S)$ be the number of shortest paths between $s$ and $t$ that traverse at least one member of the group $S$. The Group Betweenness Centrality of group $S$ is
\[
GBC(S) = \sum_{s,t \in V \mid s \neq t} \left( \frac{\ddot{\sigma}_{s,t}(S)}{\sigma_{s,t}} \right)
\]
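For small graphs the definition can be evaluated directly. The Python sketch below counts, for every source, the shortest paths that avoid the group and subtracts them from the total. It is a brute-force illustration under stated assumptions (connected, unweighted, undirected graph as an adjacency dictionary; endpoints count as traversed, in line with $\sigma_{x,x} = 1$; values are unnormalized sums over ordered pairs). The names `bfs_counts` and `group_betweenness` are hypothetical, and this is not the algorithm of [24].

```python
from collections import deque


def bfs_counts(adj, s):
    """Return (dist, sigma): hop distances from s and shortest-path counts."""
    dist = {s: 0}
    sigma = {s: 1}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:
                sigma[w] += sigma[u]
    return dist, sigma


def group_betweenness(adj, group):
    """Brute-force GBC: sum over ordered pairs of the covered path fraction."""
    group = set(group)
    total = 0.0
    for s in adj:
        dist, sigma = bfs_counts(adj, s)
        avoid = {}  # avoid[w]: shortest s-w paths containing no group member
        for w in sorted(adj, key=dist.get):  # predecessors are handled first
            if w in group:
                avoid[w] = 0
            elif w == s:
                avoid[w] = 1
            else:
                avoid[w] = sum(avoid[u] for u in adj[w]
                               if dist[u] + 1 == dist[w])
        for t in adj:
            if t != s:
                total += (sigma[t] - avoid[t]) / sigma[t]
    return total


# Hypothetical example: the 4-cycle a-b-c-d-a.
cycle = {"a": ["b", "d"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "a"]}
print(group_betweenness(cycle, {"b"}))
print(group_betweenness(cycle, {"b", "d"}))  # covers all 12 ordered pairs
```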
It can be proved by a straightforward reduction from the Minimal Vertex Cover problem [23] that finding a group with maximal GBC is NP-hard. Any group that controls all the information flows in the network (in particular, communications of adjacent vertices) must have at least one vertex on every edge in the network, and vice versa. One way to cope with the complexity of the KPP-Com problem is to employ heuristic search. Alternatively, a suboptimal polynomial time algorithm can be used when the network and group become relatively large.
During the search for the group with maximal GBC, many groups are evaluated. The algorithm presented in [24] computes the GBC of a given group of vertices. The complexity of this algorithm is $O(k^3)$, where $k$ is the size of the group. The algorithm requires preprocessing, the complexity of which is $O(n^3)$, where $n$ is the size of the network. Good heuristic search decreases the number of groups that have to be evaluated. Nevertheless, the number of groups evaluated during the search is proportional to $n^k$. Thus, when searching for groups with size three or more, the time spent on preprocessing is negligible compared to the time spent on the search.
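To give a sense of scale, the short Python snippet below counts the distinct groups of size $k$ for the setting reported in Table 1 (a network of 100 vertices and a group of size 6) and compares the count with the $n^3$ order of the preprocessing step. It is plain arithmetic, not part of the paper's method.

```python
from math import comb

n, k = 100, 6            # network and group size used in Table 1
groups = comb(n, k)      # number of distinct groups of size k
preprocessing = n ** 3   # order of magnitude of the preprocessing step

print(groups)
print(groups // preprocessing)  # how many times larger the group count is
```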
3. Solving the KPP-Com
3.1. The search space
In this section we will present a new algorithm for finding the most influential group of vertices in a network using heuristic search. In order to avoid confusing the search space with the original input graph, we will use the terms “nodes” and “transitions” to refer to parts of the search space, and the terms “vertices” and “edges” to refer to parts of the input graph.
Every node (denoted by $C$) maintains one group of vertices (denoted by $GM(C)$) and a list of candidate vertices (denoted by $CL(C)$) that may join the group. We will refer to $CL(C) = (c_1, c_2, \ldots)$ as an ordered list of vertices. The order of vertices in $CL(C)$ may vary between the nodes. The methods of ordering vertices in $CL(C)$ will be described in Section 3.3. We will refer to the first vertex in the ordered list $CL(C)$ by the notation $v_{best}$.
We will refer to the search space as a decision tree. At every node we decide whether $v_{best}$ is included in the most influential group. According to this decision we branch to two sub-trees, one containing all the groups that include $v_{best}$ and the other containing all the groups that do not include it. Thus we ensure that all possible solutions can be found during the search.
Every node has two children: the left child is denoted by $C^-$ and the right child is denoted by $C^+$. $GM(C^+)$ contains all the vertices of $GM(C)$, additionally including $v_{best}$, and $GM(C^-)$ is equal to $GM(C)$.
\[ GM(C^+) = GM(C) \cup \{v_{best}\} \]
\[ GM(C^-) = GM(C) \]
The candidates lists of both children contain all the vertices from the candidates list of their parent excluding $v_{best}$.
\[ CL(C^+) = CL(C^-) = CL(C) \setminus \{v_{best}\} \]
The root node of the tree represents an empty
group of vertices. Its candidates list contains all
the vertices in the input graph. Every child of the
root holds a candidates list that is missing one ver-
tex and a group that includes one vertex or none.
All nodes of the second level hold a candidates list
that is missing two vertices and a group that in-
cludes zero, one, or two vertices, etc.
Since we are looking for a group of size $k$, we will refer to nodes with $|GM(C)| = k$ as possible solutions to our problem. We exclude from the search space all sub-trees that do not contain a possible solution. These can be either sub-trees rooted at nodes with $|GM(C)| \geq k$ or sub-trees rooted at nodes with $|CL(C)| < k - |GM(C)|$. The minimal depth of a leaf in the tree is $k$ while the maximal depth of a leaf equals the size of the input graph.
Fig. 1. Tree representing the KPP-Com search space for $n = 3$ and $k = 2$. Dashed lines represent sub-trees that do not contain possible solutions. The set in curly braces {...} represents GM(C) and the list in parentheses (...) represents CL(C).
For example, let $a$, $b$, and $c$ be vertices of the input graph ($n = 3$). Assume we are searching for a group of size two ($k = 2$) that has maximal GBC. Figure 1 presents an example of the search space. The search should start at the root node $R$ whose candidates list is $(a, b, c)$ and $a = v_{best}$. We consider all groups where $a$ is a member by moving to the right child (following the “+” transition). $R^+$ maintains the group $\{a\}$ and a candidates list that includes all vertices of the original graph, excluding the vertex $a$. It is possible that the order of vertices in $CL(R^+)$ is different from the order of vertices in $CL(R)$. We consider all groups where $a$ is not a member by following the “−” transition from the root node. $R^-$ maintains an empty group and a candidates list that is identical to the candidates list of $R^+$.
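The decision tree for this example can be enumerated with a small Python sketch. It assumes a fixed candidate order and ignores GBC entirely, so it only illustrates the branching and the exclusion of sub-trees without possible solutions; the names `Node`, `children`, `feasible`, and `solutions` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    members: frozenset  # GM(C)
    candidates: tuple   # CL(C), ordered; candidates[0] plays the role of v_best


def children(node):
    """Return the "+" child (v_best joins the group) and the "-" child."""
    v_best, rest = node.candidates[0], node.candidates[1:]
    plus = Node(node.members | {v_best}, rest)
    minus = Node(node.members, rest)
    return plus, minus


def feasible(node, k):
    """True if the sub-tree rooted at node can still contain a group of size k."""
    missing = k - len(node.members)
    return missing >= 0 and len(node.candidates) >= missing


def solutions(node, k):
    """Yield every group of size k, visiting the "+" child first."""
    if len(node.members) == k:
        yield node.members
        return
    if not feasible(node, k):
        return  # corresponds to a dashed sub-tree in Fig. 1
    for child in children(node):
        yield from solutions(child, k)


root = Node(frozenset(), ("a", "b", "c"))
found = list(solutions(root, 2))
print([sorted(group) for group in found])
```

A frozen dataclass keeps nodes immutable, so the two children can share the same `rest` tuple safely.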
3.2. The search algorithm
We use the Depth First Branch and Bound (DFBnB) algorithm [25] to search the tree for the group with maximal GBC. DFBnB is known to be effective when the depth of the search tree is known, as in our case. DFBnB is similar to DFS but it prunes nodes according to a global bound. We begin the search with a bound equal to zero. During the search the bound is equal to the maximal GBC found so far.
In contrast to the traditional search that is aimed at finding the node with minimal cost, we are using a utility based approach. We define the utility of the node $C$ as $g(C) = GBC(GM(C))$. A heuristic function $h(C)$ estimates the maximal utility that can be gained by exploring the sub-tree rooted at $C$. $f(C) = g(C) + h(C)$ is a function that estimates the maximal utility of nodes in $C$'s sub-tree.
The utility of the root node $R$ is equal to zero because $GM(R) = \emptyset$. $h(R)$ is the upper bound on the optimal solution. While searching down the tree, $g(C)$ will grow and $h(C)$ will decrease. When the algorithm finds a possible solution, $h(C)$ is equal to zero and $f(C)$ is equal to $GBC(GM(C))$ where $|GM(C)| = k$.
Pruning decisions made during the search are
based on the value of f(C) and the maximal GBC
found so far. We can guarantee that the heuris-
tic search will find the optimal solution only if the
function f(C) is an upper bound on the maximal
GBC that can be found within the sub-tree rooted
at C. If this upper bound is below the maximal
GBC found so far, the sub-tree is pruned, other-
wise it is explored in hope of finding a group with a
higher GBC. Heuristic functions used for pruning
nodes in the search tree are described in Section
3.4.
When visiting a node, besides deciding whether to prune the current sub-tree, the algorithm should also determine $v_{best}$. The following section describes two methods of ordering vertices in CL(C) and determining $v_{best}$.
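The search loop described in this subsection can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' implementation: GBC is evaluated by brute force instead of the algorithm of [24], candidates are ordered once by individual Betweenness, and the bound is the sum of the largest individual Betweenness values among the remaining candidates (one of the heuristics the paper defines in Section 3.4). The function names are hypothetical.

```python
from collections import deque


def bfs_counts(adj, s):
    """Return (dist, sigma): hop distances from s and shortest-path counts."""
    dist, sigma, queue = {s: 0}, {s: 1}, deque([s])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w], sigma[w] = dist[u] + 1, 0
                queue.append(w)
            if dist[w] == dist[u] + 1:
                sigma[w] += sigma[u]
    return dist, sigma


def group_betweenness(adj, group):
    """Brute-force GBC; endpoints count as traversed (sigma_{x,x} = 1)."""
    group, total = set(group), 0.0
    for s in adj:
        dist, sigma = bfs_counts(adj, s)
        avoid = {}  # shortest paths from s that contain no group member
        for w in sorted(adj, key=dist.get):
            if w in group:
                avoid[w] = 0
            elif w == s:
                avoid[w] = 1
            else:
                avoid[w] = sum(avoid[u] for u in adj[w]
                               if dist[u] + 1 == dist[w])
        total += sum((sigma[t] - avoid[t]) / sigma[t] for t in adj if t != s)
    return total


def dfbnb(adj, k):
    """Depth-first branch and bound for a group of size k with maximal GBC."""
    bc = {v: group_betweenness(adj, [v]) for v in adj}
    ordered = tuple(sorted(adj, key=bc.get, reverse=True))
    best = {"value": 0.0, "group": None}  # global bound starts at zero

    def visit(members, candidates):
        g = group_betweenness(adj, members)          # utility g(C)
        missing = k - len(members)
        if missing == 0:                             # possible solution
            if g > best["value"]:
                best["value"], best["group"] = g, members
            return
        if len(candidates) < missing:                # no solution below
            return
        h = sum(bc[v] for v in candidates[:missing])  # upper bound on gain
        if g + h < best["value"]:                    # prune the sub-tree
            return
        v_best, rest = candidates[0], candidates[1:]
        visit(members | {v_best}, rest)              # "+" transition
        visit(members, rest)                         # "-" transition

    visit(frozenset(), ordered)
    return best["group"], best["value"]


# Hypothetical example: a path on seven vertices a-b-c-d-e-f-g.
names = "abcdefg"
path = {v: [] for v in names}
for u, w in zip(names, names[1:]):
    path[u].append(w)
    path[w].append(u)
group, value = dfbnb(path, 2)
print(sorted(group), value)
```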
3.3. Order of vertices in the candidates list
Admissible heuristic functions guarantee that
during the search we will find the optimal solu-
tion (note that the search may take an exponential
time). The order of vertices in CL(C) is important
for choosing vbest and for computing admissible
pruning heuristics. The order of vertices in CL(C)
can be determined by either their individual Be-
tweenness or their contribution to Betweenness of
GM(C).
Algorithms presented in [26,27,28] compute in-
dividual BC for all vertices in the graph with time
complexity O(mn). We can sort vertices in CL(C)
by their individual BC where the first vertex is
the vertex with the highest BC. The computation
of BC and the sorting can be done once for the
entire search. Sorting vertices by their individual
BC will impose the same order on all candidates
lists in the tree.
Using individual BC is not the best way to order the vertices in CL(C). Assume that there is a vertex in CL(C) with very high BC. Assume also that most of the traffic covered by this vertex is already covered by GM(C). In such a case it is misleading to use BC as a measure for the potential of this vertex to help the group in covering more traffic. For this reason we prefer to use the contribution of vertices to the GBC of GM(C) rather than their individual Betweenness.
Let $v \in V$ be some vertex in the graph. Let $S \subseteq V$ be a group of vertices. We denote the contribution of the vertex $v$ to the GBC of $S$ by $BC^{S}(v)$.
\[ BC^{S}(v) = GBC(S \cup \{v\}) - GBC(S) \]
The algorithm presented in [24] is able to compute the contribution of all vertices in the graph with time complexity $O(kn^2)$. The time complexity of updating the contribution of all vertices in the graph when one vertex is added to the group is $O(n^2)$. The actual contribution of vertices in $CL(C)$ depends on the members of $GM(C)$. Therefore, the contribution of vertices should be recalculated each time the search algorithm moves to the right child (by adding $v_{best}$ to $GM(C)$). Recalculating the contribution of vertices in $CL(C)$ may change their order.
3.4. Heuristic evaluation functions
We propose four different heuristic functions for
estimating the maximal gain in GBC that can be
acquired during exploration of the sub-tree rooted
at C. The functions differ in their computation
time and accuracy.
The first two functions, $h_1$ and $h_2$, assume that the candidates list $CL(C)$ is ordered according to the individual Betweenness of its members.
\[ h_1(C) = BC(c_1) \cdot (k - |GM(C)|) \]
In $h_1(C)$ we multiply the BC of the first vertex in the candidates list by the number of vertices that need to be added to $GM(C)$ in order to construct a group of size $k$. This function is a very rough upper bound. $h_1$ can be computed in $O(1)$ since the individual Betweenness of all vertices is calculated once prior to the search.
\[ h_2(C) = \sum_{i=1}^{k - |GM(C)|} BC(c_i) \]
In $h_2(C)$, we sum the Betweenness of the $k - |GM(C)|$ vertices with maximal individual Betweenness in $CL(C)$. The function $h_2(C)$ is a refinement of $h_1(C)$ because $BC(c_1)$ is higher than or equal to $BC(c_i)$ for any $i$. The computational complexity of $h_2$ is $O(k)$.
The next two functions, $h_3$ and $h_4$, assume that the candidates list $CL(C)$ is ordered according to the contribution of its vertices to the GBC of $GM(C)$.
\[ h_3(C) = BC^{GM(C)}(c_1) \cdot (k - |GM(C)|) \]
The function $h_3(C)$ is similar to the first function, but in this case we locate the vertex in $CL(C)$ with the highest contribution to the GBC of the group $GM(C)$ and not with the highest individual Betweenness. Since the contribution of a vertex never exceeds its individual Betweenness, $h_3$ is a refinement of $h_1$. We should keep in mind that $h_3$ includes the computation of the contributions. The computational complexity of $h_3$ is $O(1)$ when performing the “−” transitions. When performing the “+” transitions there are two options: if $|GM(C)| + 1 < \sqrt{n}$ then the computational complexity of $h_3$ is $O(k^2 n)$, otherwise it is $O(n^2)$.
\[ h_4(C) = \sum_{i=1}^{k - |GM(C)|} BC^{GM(C)}(c_i) \]
The function $h_4(C)$ is similar to $h_2$, but in this case we order the vertices in $CL(C)$ by their contribution to the GBC of $GM(C)$. This function is more accurate than all the other functions, since it is a refinement of both $h_2$ and $h_3$. The computational complexity of $h_4$ is $O(k)$ when performing the “−” transitions. When performing the “+” transitions there are two options: if $|GM(C)| + 1 < \sqrt{n}$ then the computational complexity of $h_4$ is $O(k^2 n)$, otherwise it is $O(n^2)$.
The above four heuristic functions can be used for pruning sub-trees. If $f(C)$ is below the maximal GBC found so far, the sub-tree rooted at $C$ will be pruned. Otherwise the algorithm will visit the two children of $C$ by first following the “+” transition and afterwards the “−” transition. $v_{best}$ is chosen from the candidates list $CL(C)$ according to the heuristic evaluation function we use. For $h_1$ and $h_2$, $v_{best}$ is a candidate with the highest Betweenness centrality. For $h_3$ and $h_4$, $v_{best}$ is a candidate with the highest contribution to the GBC of the group represented by the current node.
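The four bounds share two shapes: "largest value times the number of missing vertices" and "sum of the largest values". The Python sketch below applies both shapes to individual Betweenness values and to contributions. The numbers are a hand-worked illustration, not data from the paper: they correspond to a path on seven vertices a-b-c-d-e-f-g with GM(C) = {d} and k = 4, using unnormalized sums over ordered pairs with endpoints counted. The function names are hypothetical.

```python
def bound_top_times(values, missing):
    """Shape of h1 and h3: the largest value times the missing group size."""
    values = list(values)
    return max(values) * missing if values and missing > 0 else 0


def bound_top_sum(values, missing):
    """Shape of h2 and h4: the sum of the `missing` largest values."""
    return sum(sorted(values, reverse=True)[:missing])


# Assumed illustrative values for the candidates of node C (GM(C) = {d}).
individual = {"a": 12, "b": 22, "c": 28, "e": 28, "f": 22, "g": 12}
contribution = {"a": 4, "b": 6, "c": 4, "e": 4, "f": 6, "g": 4}
missing = 3  # k - |GM(C)| with k = 4

h1 = bound_top_times(individual.values(), missing)
h2 = bound_top_sum(individual.values(), missing)
h3 = bound_top_times(contribution.values(), missing)
h4 = bound_top_sum(contribution.values(), missing)
print(h1, h2, h3, h4)
```

In this example GBC({d}) is 30 out of 42 ordered pairs, so no extension can gain more than 12; all four values stay above that gain, and the contribution-based bounds are far closer to it.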
3.5. Admissibility of pruning heuristics
We will prove that all of the heuristic functions described in the previous subsection are admissible. Since we are working with utility, we will show that the function $f$ overestimates the GBC of any group in the sub-tree. We will show that $h_2(C)$ is always lower than or equal to $h_1(C)$ and that $h_4(C)$ is lower than or equal to $h_2(C)$ and $h_3(C)$. In light of this partial order, it is enough to show that $f(C) = g(C) + h_4(C)$ overestimates the maximal GBC that can be found within the sub-tree rooted at $C$.
Let $\{BC(v)\}_{v \in CL(C)}$ be a sequence of Betweenness values. Let $\{BC^{GM(C)}(v)\}_{v \in CL(C)}$ be a sequence of contributions to the Group Betweenness of $GM(C)$. Let $l = k - |GM(C)|$ be the number of vertices that are still missing in the group represented by the node $C$. The contribution of a vertex to the GBC of some group is always lower than or equal to its individual Betweenness [24]. The summation of the $l$ highest contributions is lower than or equal to the summation of the $l$ highest Betweenness values of distinct vertices in $CL(C)$, thus $h_2(C) \geq h_4(C)$. $h_2$ and $h_4$ are summations of the $l$ highest elements in $\{BC(v)\}_{v \in CL(C)}$ and $\{BC^{GM(C)}(v)\}_{v \in CL(C)}$, respectively. $h_1$ and $h_3$ are $l$ times the highest element in the respective lists, therefore $h_1(C) \geq h_2(C)$ and $h_3(C) \geq h_4(C)$.
The following lemma is based on the concepts described in [24].
Lemma 3.1 $h_4$ is an admissible heuristic function.
Proof: To prove that $h_4$ is admissible, we need to prove that for any descendant $C'$ of $C$:
\[ g(C') - g(C) \leq h_4(C) \]
Let $q = |GM(C')| - |GM(C)|$, let $S = \{s_1, \ldots, s_q\}$ be a subset of $CL(C)$ such that $GM(C') = GM(C) \cup S$, and let $c_1, \ldots, c_{|CL(C)|}$ be the elements of $CL(C)$ in order of decreasing $BC^{GM(C)}(c_i)$. The following five steps (explained below) complete the proof:
\begin{align}
g(C') - g(C) &= \sum_{i=1}^{q} BC^{GM(C) \cup \{s_1, \ldots, s_{i-1}\}}(s_i) \tag{1} \\
&\leq \sum_{i=1}^{q} BC^{GM(C)}(s_i) \tag{2} \\
&\leq \sum_{i=1}^{q} BC^{GM(C)}(c_i) \tag{3} \\
&\leq \sum_{i=1}^{k - |GM(C)|} BC^{GM(C)}(c_i) \tag{4} \\
&= h_4(C) \tag{5}
\end{align}
Step (1) follows from the definition of $g$: adding the vertices of $S$ to $GM(C)$ one at a time, each vertex adds exactly its contribution to the group built so far. Step (2) follows from the fact that the joint contribution of all vertices in $S$ is smaller than or equal to the sum of their individual contributions. For any two vertices $s_i, s_j \in S$ the contribution of $s_i$ to the GBC of $GM(C)$ is greater than or equal to its contribution to the GBC of $GM(C) \cup \{s_j\}$ because some of the shortest paths contributed by $s_i$ to $GM(C)$ may also pass through $s_j$. This implies that for any superset $S'$ of $GM(C)$ (in particular for $S' = GM(C) \cup \{s_1, \ldots, s_{i-1}\}$) it holds that:
\[ BC^{GM(C)}(s_i) \geq BC^{S'}(s_i) \]
Step (3) follows from the fact that the $c_i$ are ordered from high to low $BC^{GM(C)}$. Hence, for any set $\{s_1, \ldots, s_l\}$ it holds that:
\[ \sum_{i=1}^{l} BC^{GM(C)}(s_i) \leq \sum_{i=1}^{l} BC^{GM(C)}(c_i) \]
Step (4) follows from $q \leq k - |GM(C)|$ and the non-negativity of BC values. Step (5) follows from the definition of $h_4$.
3.6. Suboptimal polynomial time greedy algorithm
for KPP-Com
A naïve greedy algorithm for locating an influential group of vertices can choose $k$ vertices with the highest individual Betweenness. We will denote this algorithm as TopK. The $k$ vertices with the highest individual Betweenness can be selected after a single execution of the algorithm that computes BC for all vertices in the graph [28,26]. Therefore, the complexity of the TopK algorithm is $O(nm)$.
We hypothesize that the first solution encountered during DFBnB is close to the optimal solution. Thus, we define a simple iterative algorithm that at every step during the search chooses the next best candidate and stops after choosing $k$ vertices. We will denote this algorithm as ItrK. The algorithm chooses the next best candidate $v \in CL(C)$ according to its contribution to the GBC of the group represented by the current node $C$ ($BC^{GM(C)}(v)$). The complexity of the ItrK algorithm is $O(n^3)$ and therefore it is significantly faster than the exponential time heuristic search (for all heuristic functions).
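Both greedy strategies can be sketched in Python on a toy graph. The sketch evaluates GBC by brute force, so it does not reach the complexities stated above, and the names `top_k` and `itr_k` are hypothetical; it assumes a connected, unweighted, undirected graph given as an adjacency dictionary, with endpoints counted as traversed and ties broken by vertex order.

```python
from collections import deque


def bfs_counts(adj, s):
    """Return (dist, sigma): hop distances from s and shortest-path counts."""
    dist, sigma, queue = {s: 0}, {s: 1}, deque([s])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w], sigma[w] = dist[u] + 1, 0
                queue.append(w)
            if dist[w] == dist[u] + 1:
                sigma[w] += sigma[u]
    return dist, sigma


def group_betweenness(adj, group):
    """Brute-force GBC; endpoints count as traversed (sigma_{x,x} = 1)."""
    group, total = set(group), 0.0
    for s in adj:
        dist, sigma = bfs_counts(adj, s)
        avoid = {}  # shortest paths from s that contain no group member
        for w in sorted(adj, key=dist.get):
            if w in group:
                avoid[w] = 0
            elif w == s:
                avoid[w] = 1
            else:
                avoid[w] = sum(avoid[u] for u in adj[w]
                               if dist[u] + 1 == dist[w])
        total += sum((sigma[t] - avoid[t]) / sigma[t] for t in adj if t != s)
    return total


def top_k(adj, k):
    """TopK: the k vertices with the highest individual Betweenness."""
    bc = {v: group_betweenness(adj, [v]) for v in adj}
    return sorted(adj, key=bc.get, reverse=True)[:k]


def itr_k(adj, k):
    """ItrK: repeatedly add the vertex with the highest contribution."""
    group = []
    for _ in range(k):
        best = max((v for v in adj if v not in group),
                   key=lambda v: group_betweenness(adj, group + [v]))
        group.append(best)
    return group


# Hypothetical example: a path on seven vertices a-b-c-d-e-f-g.
names = "abcdefg"
path = {v: [] for v in names}
for u, w in zip(names, names[1:]):
    path[u].append(w)
    path[w].append(u)
for name, chosen in (("TopK", top_k(path, 2)), ("ItrK", itr_k(path, 2))):
    print(name, chosen, group_betweenness(path, chosen))
```

On this toy graph TopK picks two adjacent central vertices whose coverage overlaps, ItrK picks a more complementary pair, and neither reaches the best pair (for example {c, e}), which matches the paper's description of both algorithms as suboptimal.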
4. Evaluating the algorithms’ performance
In order to evaluate the algorithms we implemented them in Python and executed the implementation on a PC P-Duo 3GHz machine with 2GB memory. Our first goal was to compare the accuracy of the greedy algorithms (TopK and ItrK) to DFBnB based on four different heuristic functions. We conducted the comparison on random networks (with equal probability to connect every two vertices) and preferential-attachment [7] networks. All algorithms were used to find the most prominent groups of sizes 1, 2, ..., 10 on networks of sizes 10, 20, ..., 500. The networks we considered were sparse, with the number of edges equal to 1.3 times the number of vertices. We have evaluated five random and five preferential-attachment networks for each network size. Every execution of the algorithms was bounded by ten minutes; therefore, larger networks were evaluated for smaller groups only. The experiments have shown similar behavior of the evaluated algorithms for all network sizes. Here we choose to present the results of the evaluations performed on networks of size 100.
Figures 2 and 3 compare the GBC of groups found by the TopK, ItrK, and Heuristic Search algorithms for random and preferential-attachment networks, respectively. Similarly, the running time of the various GBC maximization strategies is presented in Figures 4 and 5. The results of the evaluation experiments suggest that the results of the ItrK algorithm are very close to the optimal solution (see Figures 2 and 3) while its computation time (see Figures 4 and 5) is low. The evaluation results show that heuristic search works faster on scale-free networks (compare the plots in Figures 4 and 5).
We compared the computation time of the ItrK algorithm, TopK, and the heuristic search for four heuristic functions. TopK and ItrK are much faster than heuristic search. It is easy to see that as the heuristic function becomes more accurate ($h_1$ least accurate, $h_4$ most accurate), the computation time decreases even though the complexity of the heuristic function increases. Table 1 compares the results and the execution time of the various algorithms.
Fig. 2. GBC for random network.
Fig. 3. GBC for scale-free network.
Fig. 4. GBC computation time for random network.
Fig. 5. GBC computation time for scale-free network.
5. Applying the greedy algorithms to optimize
the deployment of DNIDS
One way to prevent users from being infected by
computer viruses, worms, and Trojan horses is to
clean the traffic at the NSP level (edge routers).
The NSP traffic can be monitored and cleaned
by Distributed Network Intrusion Detection Sys-
tem (DNIDS) that may be deployed on the NSP
routers/links [18,20]. It is not realistic to inspect
the traffic of all the routers/links of the NSP. The-
oretical models suggest protecting the most signif-
icant routers/links in order to slow down threat
propagation in the entire network [19]. We used
the greedy algorithm to locate routers/links that
together have the highest influence on communi-
cation in the NSP infrastructure. The NSP net-
work included 26 core routers, 70 edge routers, 204
customer routers, and 451 links.
We assumed that five local area networks are
connected to each customer router. Then, we an-
alyzed the threat propagation in the NSP infras-
tructure assuming that a DNIDS is monitoring the
traffic of the influential group of routers/links. The
analysis was performed using a network simula-
tor that was developed for this purpose and whose
main goal is to assist in finding the appropriate
size and the location of a DNIDS deployment. We
repeated the analysis for various group sizes while
measuring the number of infected local area net-
works. We repeated the simulation for deployment
based on ItrK, TopK, and random deployment.
The simulation results (see Figure 6) illustrate the
superiority of deployment located by the iterative
greedy algorithm (ItrK).
Table 1
Statistics of a typical execution of various algorithms when
searching for a group of size 6 in a network of size 100.
Network Model            Algorithm  Result (GBC)  Total time (sec.)  Visited Nodes
Preferential-attachment  TopK       0.904         0.015              1
Preferential-attachment  ItrK       0.941         1.937              6
Preferential-attachment  HS h1      0.941         96.058             1520
Preferential-attachment  HS h2      0.941         84.84              1338
Preferential-attachment  HS h3      0.941         18.608             90
Preferential-attachment  HS h4      0.941         8.891              42
Random                   TopK       0.67          0.016              1
Random                   ItrK       0.721         1.922              6
Random                   HS h1      0.733         166.195            3108
Random                   HS h2      0.733         139.822            2606
Random                   HS h3      0.733         38.967             204
Random                   HS h4      0.733         20.092             104
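As a reading aid for Table 1, the short Python sketch below expresses each greedy result as a fraction of the heuristic search result, which is treated here as the optimum. The GBC values are copied from the table. The dictionary layout and the function name ratio_to_optimum are illustrative assumptions.

```python
# GBC values copied from Table 1 (group of size 6, network of size 100).
# "HS" is the value found by heuristic search, identical for h1 to h4.
table1 = {
    "Preferential-attachment": {"TopK": 0.904, "ItrK": 0.941, "HS": 0.941},
    "Random": {"TopK": 0.67, "ItrK": 0.721, "HS": 0.733},
}


def ratio_to_optimum(model, algorithm):
    """GBC of a greedy algorithm divided by the heuristic search result."""
    return table1[model][algorithm] / table1[model]["HS"]


for model in table1:
    for algorithm in ("TopK", "ItrK"):
        print(f"{model:24s} {algorithm}: {ratio_to_optimum(model, algorithm):.3f}")
```

For this single execution, ItrK matches the heuristic search result on the preferential-attachment network and reaches about 98.4% of it on the random network, while TopK reaches about 96.1% and 91.4%, respectively.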
Fig. 6. Infected or crashed computers as a function of de-
ployment size.
6. Conclusions
In this paper we proposed two algorithms for
finding the group with the highest GBC. The first
algorithm is based on heuristic search and the sec-
ond is based on a greedy choice. For the heuristic
search algorithm we proposed four pruning heuris-
tics and two heuristics for guiding the search. The
algorithms were evaluated on random and scale-
free networks. Empirical evaluation suggests that
results of the ItrK algorithm were negligibly be-
low the optimal results that can be found by ex-
ponential heuristic search. In addition, both al-
gorithms performed better on scale-free networks:
heuristic search was faster and the greedy algorithm
produced more accurate results. The greedy
algorithm, ItrK, was applied to optimize the
deployment of intrusion detection devices on a network
that resembles a network service provider
infrastructure.
References
[1] S. H. Strogatz. Exploring complex networks. Nature,
410:268–276, March 2001.
[2] S. Wasserman and K. Faust. Social network analy-
sis: Methods and applications. Cambridge University
Press, 1994.
[3] J. Scott. Social Network Analysis: A Handbook. Sage
Publications, London, 2000.
[4] P. Bork, L. J. Jensen, C. von Mering, A. K. Ramani,
I. Lee, and E. M. Marcotte. Protein interaction net-
works from yeast to human. Curr. Opin. Struct. Biol.,
14(3):292–299, Jun. 2004.
[5] M. Faloutsos, P. Faloutsos, and C. Faloutsos. On
power-law relationships of the internet topology.
SIGCOMM Comput. Comm. Rev., 29(4):251–262, 1999.
[6] S.-H. Yook, H. Jeong, and A.-L. Barabasi. Modeling
the internet’s large-scale topology. PNAS,
99(21):13382–13386, Oct. 2002.
[7] A.-L. Barabasi and R. Albert. Emergence of scaling in
random networks. Science, 286:509–512, 1999.
[8] A.-L. Barabasi, R. Albert, and H. Jeong. Scale-free
characteristics of random networks: the topology of the
world-wide web. Physica A, 281:69–77, 2000.
[9] B. Bollobas and O. Riordan. Robustness and
vulnerability of scale-free random graphs. Internet
Mathematics, 1(1):1–35, 2003.
[10] M. Barthélemy. Betweenness centrality in large
complex networks. The European Physical Journal B –
Condensed Matter, 38(2):163–168, March 2004.
[11] C. Ballester, A. Calvó-Armengol, and Y. Zenou.
Who’s who in networks. Wanted: The key player.
Econometrica, 74(5):1403–1417, Sep. 2006.
[12] Stephen P. Borgatti. Identifying sets of key players
in a social network. Comput. Math. Organ. Theory,
12(1):21–34, 2006.
[13] L. C. Freeman. A set of measures of centrality based
on betweenness. Sociometry, 40(1):35–41, 1977.
[14] J. M. Anthonisse. The rush in a directed graph. Tech-
nical Report BN 9/71, Stichting Mathematisch Cen-
trum, Amsterdam, 1971.
[15] P. Holme. Congestion and centrality in traffic flow on
complex networks. Advances in Complex Systems,
6(2):163–176, 2003.
[16] G. Yan, T. Zhou, B. Hu, Z.-Q. Fu, and B.-H. Wang.
Efficient routing on complex networks. Phys. Rev. E,
73:046108, 2006.
[17] M. G. Everett and S. P. Borgatti. The central-
ity of groups and classes. Mathematical Sociology,
23(3):181–201, 1999.
[18] V. Yegneswaran, P. Barford, and S. Jha. Global
intrusion detection in the domino overlay system. In:
NDSS, 2004.
[19] D. H. Zanette and M. Kuperman. Effects of
immunization in small-world epidemics. Physica A,
309:445–452, Jun. 2002.
[20] K. Park. Scalable protection against ddos and worm
attacks. DARPA ATO FTN project AFRL contract
F30602-01-2-0530, Purdue University, West Lafayette,
2004.
[21] L. C. Freeman, S. P. Borgatti, and D. R. White.
Centrality in valued graphs: A measure of betweenness
based on network flow. Social Networks, 13(2):141–154,
Jun. 1991.
[22] M. E. J. Newman. A measure of betweenness centrality
based on random walks. Social Networks, 27(1):39–54,
Jan. 2005.
[23] R. G. Downey and M. R. Fellows. Parametrized
computational feasibility. Feasible Mathematics II, pages
219–244. Boston: Birkhäuser, 1995.
[24] R. Puzis, Y. Elovici, and S. Dolev. Fast algorithm for
successive group betweenness centrality computation.
Technical Report #03-07, Dept. of Computer Science,
Ben Gurion University of the Negev, Jan. 2007.
[25] R. E. Korf and W. Zhang. Performance of linear-space
search algorithms. Artificial Intelligence, 79(2):241–292,
1995.
[26] M. E. J. Newman. Scientific collaboration networks.
II. Shortest paths, weighted networks, and centrality.
Phys. Rev. E, 64:016132, 2001.
[27] M. E. J. Newman. Erratum: Scientific collaboration
networks. II. Shortest paths, weighted networks, and
centrality. Phys. Rev. E, 73:039906(E), 2006.
[28] U. Brandes. A faster algorithm for betweenness
centrality. Mathematical Sociology, 25(2):163–177, 2001.
Appendix
A. Glossary
BC – Shortest Path Betweenness Centrality
GBC – Shortest Path Group Betweenness Cen-
trality
KPP-Com – The problem of finding a group of vertices
of a given size that has the maximal potential
to control the traffic flow in communication
networks that are based on shortest
path routing
TopK – naïve algorithm for GBC maximization
that takes the k vertices with maximal BC
ItrK – greedy algorithm for GBC maximization
that at every step chooses the vertex with
maximal contribution to the group of already
chosen vertices
HS – Heuristic Search
DFBnB – Depth First Branch and Bound
NSP – Network Service Provider
DNIDS – Distributed Network Intrusion Detec-
tion System
G – unweighted and undirected graph
vertex, edge – part of the input graph
V – the set of all vertices in G
E – the set of all edges in G
n – the number of vertices
m – the number of edges
a, b, c, s, t, u, v, x, y – vertices
S, S′ – group of vertices
k – size of the group of vertices
d(x, y) – the distance between vertices x and y
σ_{s,t} – the total number of different shortest paths
that connect s and t
σ_{s,t}(v) – the number of shortest paths from s to t
that pass through v
σ̈_{s,t}(S) – the number of shortest paths between s
and t that traverse at least one member of the
group S
R, C, C′ – nodes in the search tree where R is the
root
node, transition – part of the search tree
GM(C) – group of vertices maintained by C
CL(C) – candidate list maintained by C
v_best – the first vertex in the ordered list CL(C)
C− – left child of C obtained by removing v_best
from CL(C)
C+ – right child of C obtained by removing v_best
from CL(C) and adding it to GM(C)
g(C) – utility of C
h(C) – heuristic function
f(C) – g(C) + h(C)
BC^S(v) – the contribution of the vertex v to the
GBC of S
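The search-tree notation above (GM(C), CL(C), v_best, the children C+ and C−, and f(C) = g(C) + h(C)) can be tied together in a short Python sketch of depth-first branch and bound. It is an illustration under stated assumptions, not the algorithm evaluated in the paper. GBC is assumed to count end vertices as part of a path and to be normalized by the number of ordered vertex pairs. The bound h used here, the sum of the largest individual contributions of the remaining candidates, is a hypothetical choice that is not claimed to coincide with any of h1 to h4. It never underestimates the achievable gain because, under the end-vertex assumption, GBC is a weighted coverage function over shortest paths. The names gbc and dfbnb are also assumptions.

```python
from collections import deque


def gbc(adj, group):
    """Normalized GBC; assumption: end vertices count as part of a path."""
    group = set(group)
    n = len(adj)
    total = 0.0
    for s in adj:
        dist, sigma = {s: 0}, {s: 1}
        avoid = {s: 0 if s in group else 1}  # shortest paths missing the group
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w], sigma[w], avoid[w] = dist[u] + 1, 0, 0
                    queue.append(w)
                if dist[w] == dist[u] + 1:
                    sigma[w] += sigma[u]
                    if w not in group:
                        avoid[w] += avoid[u]
        total += sum(1 - avoid[t] / sigma[t] for t in dist if t != s)
    return total / (n * (n - 1))


def dfbnb(adj, k):
    """Depth-first branch and bound for a group of size k with maximal GBC."""
    best = {"value": -1.0, "group": [], "visited": 0}

    def expand(gm, cl):                      # gm = GM(C), cl = CL(C)
        best["visited"] += 1
        g = gbc(adj, gm)                     # g(C): utility of the node
        if len(gm) == k:
            if g > best["value"]:
                best["value"], best["group"] = g, list(gm)
            return
        if len(gm) + len(cl) < k:            # too few candidates left
            return
        # Contribution of each candidate v to the GBC of GM(C).
        gains = sorted(((gbc(adj, gm + [v]) - g, v) for v in cl), reverse=True)
        h = sum(gain for gain, _ in gains[:k - len(gm)])   # upper bound h(C)
        if g + h <= best["value"] + 1e-12:   # f(C) cannot beat the incumbent
            return
        ordered = [v for _, v in gains]
        v_best, rest = ordered[0], ordered[1:]
        expand(gm + [v_best], rest)          # right child C+
        expand(gm, rest)                     # left child C-

    expand([], list(adj))
    return best


# Hypothetical test graph: a path 0-1-2-3-4-5-6.
path = {i: [j for j in (i - 1, i + 1) if 0 <= j <= 6] for i in range(7)}
result = dfbnb(path, 2)
print(result["group"], result["value"], result["visited"])
```

Visiting the right child first means the first leaf reached is the ItrK solution, so the greedy result serves as the initial incumbent against which later nodes are pruned.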