Measuring Competence: Improvements to Determine the Degree of Opinion Leadership
in Social Networks
Michael Spranger∗†, Kai-Jannis Hanke†, Florian Heinke† and Dirk Labudde†‡
†University of Applied Sciences Mittweida
Forensic Science Investigation Lab (FoSIL), Germany
Email: name.surname@hs-mittweida.de
‡Fraunhofer
Cyber Security
Darmstadt, Germany
Email: labudde@hs-mittweida.de
Abstract—In recent years, the automated, efficient and sensitive
monitoring of social networks has become increasingly important
for the criminal investigation process and crime prevention.
Previously, we have shown that the detection of opinion leaders
is of great interest in forensic applications to gather important
information. In the current work, it is argued that state-of-the-art
methods for determining the relative degree to which an
opinion leader exerts influence over the network have weaknesses
if networks exhibit a star-like social graph topology. Such
topologies result from the interaction of users with similar
interests, which is typically the case in networks of political
organizations. In these cases, the underlying topologies are highly
focused on one (or only a few) central actor(s) and lead to
less meaningful results by classic measures of node centrality
commonly used to ascertain the degree of leadership. With the
help of data collected from the Facebook and Twitter networks
of a German political party, these aspects are examined and a
quantitative indicator for describing star-like network topologies
is introduced and discussed. This measure can be of great value in
assessing the applicability of established leader detection methods.
Finally, two variations of a new measure, the CompetenceRank,
are proposed. It is based on the LeaderRank score and aims to
address the discussed problems in cases with and without
additional network data such as likes and shares.
Keywords–Forensic; Opinion Leader; Graph Theory.
I. INTRODUCTION
The detection of opinion leaders in online social networks
has been discussed extensively over the past few years. While
the term “detection” is generally associated with a binary
decision, here – in accordance with other papers in this domain
– it is used to refer to the determination of the degree of
leadership. The scope of application is manifold and reaches
from determining influencers and brand ambassadors up to
finding those who influence the political opinion of a group
of people. Especially the last application can be of interest to
law enforcement and intelligence agencies. In [1] it was shown
that in some situations previous approaches based on the work
by Katz [2], who focused on networks in the offline world, do
not capture the core of the problem and as a result lead to an
inaccurate assessment of opinion leadership.
Measures for opinion leadership on social networks tend
to focus on a single aspect: network contribution. However, it
becomes clear that only evaluating network contribution such
as posting content, commenting on it or replying to it does not
capture the full range of interactions social media platforms
have to offer. Besides network contribution or content gener-
ation in the ordinary sense we also find a secondary form of
participation, which solely relies on existing content. Virtually
nodding in agreement by clicking like, or extending the reach
of a given post by sharing it, does not create new content in
a given network. However, measures reflecting such activities
exist on most social media platforms and play a substantial
role in determining one's reach and authority. These secondary
measures do not only shape how people interact but also
influence who rises to the position of an opinion leader.
This section gives a brief introduction to the field and to
situations in which the LeaderRank leads to inappropriate
results. Furthermore, it gives an overview of topology-based
approaches and closes with the scope and structure of the
paper.
A. General Motivation
Analyzing social networks has become an important tool
for investigators, intelligence services and decision makers
of police services. The information gained this way can be
used to solve crimes by searching for digital evidence that
relates to the crime in the real world. Additionally, methods of
predictive policing can help to organize police missions as was
shown in [3]–[5]. The detection of opinion leaders in social
networks is an important task for different reasons. On the one
hand, owners of influential profiles are often also influential in
the offline world. Knowing these people helps to determine
the direction of an investigation or more concretely to target
persons of interest. On the other hand, as was suggested in
previous work [5], it might be of interest to contact these
profiles by means of chatbots to gain access into closed groups
in an effort to gather important information for intelligence
services. Intuitively, opinion leaders, when considered as nodes
with high structural importance, can be detected with the help
of centrality measures. However, different kinds of influence
in a network have to be distinguished. Nodes can have a
great influence as corresponding actors are able to spread
information fast and widely in a network, or they can have
a great influence because they write something of importance
that attracts many other users in the network to respond.
International Journal on Advances in Internet Technology, vol 13 no 3 & 4, year 2020, http://www.iariajournals.org/internet_technology/
2020, © Copyright by authors, Published under agreement with IARIA - www.iaria.org
B. Leader Detection by means of Network Centrality Measures
In the literature, one can mainly find centrality measures
for the former type of influence. For example, highly active
profiles can be recognized using degree centrality, meaning, the
relative number of outgoing edges of a node. These profiles
are represented by nodes with a high degree centrality and
are especially useful to spread information in a network due
to their high interconnectedness. In this context, the closeness
centrality – the inverse of the mean of the shortest path of a
node to any other node in the network – is even more effective.
It describes the efficiency of the dissemination of information
of a certain node.
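As a minimal illustration of the two measures described above, the following Python sketch computes the relative out-degree and the closeness centrality on a small directed graph. The adjacency-dictionary representation, the function names and the normalization by N − 1 are assumptions made for this example; closeness is computed only over the nodes reachable from the source.

```python
from collections import deque


def out_degree_centrality(adj):
    """Relative number of outgoing edges per node.

    adj maps every node to a list of its successors (all nodes must be keys).
    """
    n = len(adj)
    return {u: len(set(vs)) / (n - 1) for u, vs in adj.items()}


def closeness_centrality(adj, source):
    """Inverse of the mean shortest-path length from source to reachable nodes."""
    dist = {source: 0}
    queue = deque([source])
    while queue:  # breadth-first search yields shortest hop distances
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    reached = len(dist) - 1
    if reached == 0:
        return 0.0  # isolated source: no path to any other node
    return reached / sum(dist.values())


# Hypothetical toy network: a -> b, a -> c, b -> c, c -> a
toy = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
degree = out_degree_centrality(toy)
closeness = {u: closeness_centrality(toy, u) for u in toy}
```

In this toy graph, node a reaches both other nodes in one step and therefore obtains the highest value for both measures.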
Furthermore, the betweenness centrality of a certain node,
which is defined as the number of shortest paths between
two nodes that cross this node, describes the importance of
this node for the dissemination of information in a network.
Therefore, the higher the betweenness centrality of a node,
the greater its importance for the exchange of information in
a network.
Moreover, the eigenvector centrality of a node is defined as
the corresponding entry of the principal eigenvector of the
adjacency matrix of a network.
In contrast to the measures discussed beforehand, PageRank
[6], as one of the best measures of node centrality, does not
only consider the centrality of the node itself, but also that of its
neighboring nodes.
As part of the opinion leader detection research, Leader-
Rank [7] was introduced as a further development of PageRank
in order to find nodes that spread information further and
faster. However, all of these centrality measures consider nodes
that are involved in the dissemination of information mainly
based on their activity. For the purpose of the intended usage,
users who achieve high impact through what they have written
are of much greater interest. Thus, similar to the citation of
papers and books and its impact on the author’s reputation, the
importance of a node has to be higher when it reaches a high
number of references and citations with low activity.
Especially social media platforms provide comparable met-
rics, such as likes and shares that partially reflect the author’s
reputation and credibility. Hence, it is imperative to consider
respective measures of acceptance, expertise and authority
when determining opinion leaders in any digital social net-
work.
Interestingly, Li et al. considered the so-called node spreadability
as the ground truth for quantifying node importance
in a subsequent study [8]. Node spreadability
is based on a straightforward Susceptible-Infected-Removed
(SIR) infection model from which the expected number of
infected nodes upon initially infecting the node in question
is estimated. However, this expected number can only be
estimated from simulation, which, furthermore, is dependent
on the parameterization of the SIR model. In this respect, all
centrality measures can be considered as heuristic approxima-
tions of node spreadability.
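To make the simulation-based nature of node spreadability concrete, the following Python sketch estimates it by Monte Carlo simulation. It is not the model of [8]: it assumes a simplified discrete-time SIR process in which every infected node tries once to infect each susceptible successor with probability beta and is removed after exactly one step. The function name, the adjacency-dictionary format and the parameters beta and runs are assumptions of this example.

```python
import random


def sir_spreadability(adj, seed_node, beta, runs=1000, rng=None):
    """Estimate the expected outbreak size (seed included) when starting at seed_node.

    adj maps a node to an iterable of its successors; beta is the per-edge
    infection probability. Assumption: infected nodes recover after one step.
    """
    rng = rng or random.Random(0)  # fixed seed keeps the estimate reproducible
    total = 0
    for _ in range(runs):
        infected = [seed_node]
        ever_infected = {seed_node}  # infected or already removed nodes
        while infected:
            newly_infected = []
            for u in infected:
                for v in adj.get(u, ()):
                    if v not in ever_infected and rng.random() < beta:
                        ever_infected.add(v)
                        newly_infected.append(v)
            infected = newly_infected  # the previous generation is removed
        total += len(ever_infected)
    return total / runs


# Hypothetical chain 0 -> 1 -> 2
chain = {0: [1], 1: [2], 2: []}
estimate = sir_spreadability(chain, seed_node=0, beta=0.5)
```

The estimate depends on beta and on the number of runs, which mirrors the dependence on the SIR parameterization noted above.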
C. Scope and Structure of the Paper
In this work, we discuss problems that can arise when aiming
to detect opinion leaders in social networks exhibiting highly
centralized topologies similar to star graphs. Examples for such
networks are especially group pages on Facebook or vk.com
where user interactions and activities are mostly triggered by
and focused on posts made by the page owner. In such cases,
the page owner – a trivial leader in the sense of centrality
measures discussed above – acts as a score aggregator and can
thus lead to distorted scoring, which can eventually be adverse
in the context of opinion leader detection. In this case, classic
centrality measures can be considered inappropriate. Based on
interactions of users of the Facebook page of the German
political party “DIE LINKE” tracked for five consecutive
months (January - May 2017), this problem is illustrated. We
further introduce the LeaderRank skewness as a quantitative
measure of aggregator-induced distorted LeaderRank scoring,
which in experiments is shown to be superior to network entropy
with respect to expressiveness. Additionally, a simple modified
LeaderRank score, which we refer to as CompetenceRank,
is introduced. It is proposed to be more suitable for opinion
leader detection in such networks, especially, if additional data
for likes and shares are not available.
For cases in which these data are available, an improved
version of the CompetenceRank is proposed and evaluated
using the Twitter network of “DIE LINKE”. The corresponding
data set contains not only tweets, comments and replies from
the entire year 2018, it also incorporates the accompanying
like and retweet counts for each tweet, comment and reply.
In politically motivated networks, such as the one analyzed in
this paper, the improved CompetenceRank shows a substantial
increase in performance compared to the LeaderRank and the
simple CompetenceRank.
The paper is structured as follows: in Section II, a brief
literature overview on the topic of opinion leader detection is
given, followed by a summary of the LeaderRank algorithm. In
Section III two shortcomings of the LeaderRank are discussed:
firstly, the skewness of the rank distribution in star-shaped
network topologies and, secondly, that not all available data of
social media platforms are taken into account. Subsequently, in
the same section the deduction and definition of the normalized
LeaderRank skewness as a metric for an approximation of
a star-shaped topology is discussed and compared with the
normalized graph entropy. In Section IV three datasets are
introduced, which were used to evaluate these metrics, two
of which were also used to develop solutions for the afore-
mentioned problems as proposed in Section V by introducing
the CompetenceRank for taking authority into account as
well as an improvement for cases in which additional data
is available. Subsequently, Section VI contains an evaluation
of both CompetenceRank versions using the Twitter network.
Finally, a conclusion as well as an overview of future work is
given in Section VII.
II. DETECTION OF OPINION LEADERS
Opinion leaders in the context of the intended analysis
of social networks are individuals, who exert a significant
amount of influence on the opinion and sentiment of other
users of the network through their actions or by what they are
communicating. In social sciences the term “opinion leader”
was introduced before 1957 by Katz and Lazarsfeld’s research
on diffusion theory [2]. Their proposed two-step flow model
retains validity in the digital age, especially in the context of
social media.
Katz et al. assume that information disseminated in a social
network is received, strengthened and enriched by opinion
leaders in their social environment. Each individual is influ-
enced in his opinion by a variety of heterogeneous opinion
leaders. This signifies that the opinion of an individual is
mostly formed by its social environment. In 1962, Rogers
referenced these ideas and defined opinion leadership as follows:
“Opinion leadership is the degree to which an
individual is able to influence informally other indi-
viduals’ attitudes or overt behavior in a desired way
with relative frequency.” [9, p. 331]
For the present study, one important question to answer is
what influence means, or rather how to identify an opinion
leader or how the influencer can be distinguished from those
being influenced. Katz defined the following features [2]:
1) personification of certain values,
2) competence,
3) strategic social location.
One approach to identify opinion leaders is to extract and
analyze the content of nodes and edges of networks to mine
leadership features. For instance, the sentiment of communi-
cation pieces can be analyzed to detect the influence of their
authors, as shown by Huang et al., who aim to detect the
most influential comments in a network this way [10]. Another
strategy is to perform topic mining to categorize content and
detect opinion leaders for each topic individually, as opinion
leadership is context-dependent [2] [11]. For this purpose,
Latent Dirichlet Allocation (LDA) [12] can be used, as seen
in the work of [13]. Furthermore, Aleahmad et al. achieved
good results with OLFinder by utilizing both topic mining
methods and centrality measures [14]. Additionally, Chen et
al. proposed D OLMiner, which derives opinion leaders from
dynamic social networks [15].
Another novel approach, the firefly algorithm, a meta-
heuristic optimization algorithm that can deal with especially
large networks, is based on the behavior of fireflies and is used
by Jain et al. to determine local and global opinion leaders
[16].
For this study, we considered the implementation of
content-based methods problematic, as texts in social networks
mostly lack correct spelling and formal structure which im-
pairs such methods’ performance. Additionally, leaders can be
identified by analyzing the flow of information in a network.
By monitoring how the interaction of actors evolves over time,
one can identify patterns and individuals of significance within
them. To achieve this, some model of information propagation
is required, such as Markov processes employed by [17] and
the probabilistic models proposed by [18]. These interaction-
based methods consider both topological features and their
dynamics over time. DDOL is a recent, dynamic approach
by Queslati et al. that focuses on social signals (shares,
comments, likes) and terms that are frequently encountered in
the expression of opinions. DDOL does not include centrality
measures and has a slightly lower precision than PageRank but,
contrary to PageRank, it works on dynamic networks and has a
lower computational complexity [19].
Parts of this study use methods that are based solely on
a network's topology and therefore consider features such as
node degree, neighborhood distances and clusters to identify
opinion leaders. One such method is the
calculation of node centrality. The underlying assumption is
that the more influence an individual gains, the more central it
is in the network. Which centrality measure is most suitable is
dependent on the application domain. We judged eigenvector
centrality to be most adequate. One of the most popular al-
gorithms is Google’s PageRank algorithm [6]. The application
of PageRank for the purposes of opinion leader detection has
seen merely moderate success [20] [21].
With LeaderRank scores, Lü et al. advocate further development
and optimization of this algorithm for social networks,
and have achieved surprisingly good results [7]. Herein, users
are considered as vertices and directed edges as relation-
ships between opinion leaders and users. All users are also
bidirectionally connected to a ground vertex, which ensures
connectivity as well as score convergence. In short, the algorithm
is an iterative multiplication of a vector composed of
per-vertex scores $s_i(t)$ at iteration step $t$ with a weighted
adjacency matrix until convergence is achieved according to
some convergence criterion. Initially, at iteration step $t_0$, all
vertex scores are set to $s_i(0) = 1$, except for the ground
vertex score, which is initialized as $s_g(0) = 0$. Equation (1)
describes the LeaderRank algorithm as a model of probability
flow through the network, where $s_i(t)$ indicates the score of a
vertex $i$ at iteration step $t$.
$$s_i(t+1) = \sum_{j=1}^{N+1} \frac{a_{ji}}{e^{out}_{v_j}} \, s_j(t) \tag{1}$$
Depending on whether or not there exists a directed edge from
vertex $j$ to vertex $i$, the value 1 or 0, respectively, is assigned
to $a_{ji}$. $e^{out}_{v_j}$ describes the number of outgoing edges of a vertex
$j$. The update rule given in Equation (1) can be rewritten as a
matrix-vector product:
$$\mathbf{s}(t+1) = \tilde{A}\,\mathbf{s}(t), \tag{2}$$
where $\mathbf{s}(t)$ corresponds to the vector of the $N+1$ vertex scores
at iteration step $t$, and $\tilde{A}$ is the weighted adjacency matrix of
size $(N+1) \times (N+1)$ with
$$\tilde{A}_{ij} = \frac{a_{ji}}{e^{out}_{v_j}}. \tag{3}$$
The final score is obtained as the score of the respective vertex
at the convergence step $t_c$ plus an equal share of the obtained
ground vertex score, as shown in (4). At $t_c$, equilibration of
LeaderRank scores towards a steady state is observed.
$$S_i = s_i(t_c) + \frac{s_g(t_c)}{N} \tag{4}$$
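Furthermore, note that
$$\sum_{i=1}^{N} S_i = \sum_{i=1}^{N+1} s_i(t) = N, \tag{5}$$
where the second sum runs over all $N+1$ vertex scores, including
the ground vertex score $s_{N+1}(t) = s_g(t)$.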
The advantage of this algorithm compared to PageRank is
that the convergence is faster and, above all, that vertices
that spread information faster and further can be found. In
later work, for example, by introducing a weighting factor, as
in [8] or [22], susceptibility to noisy data has been further
reduced and the ability to find influential distributors (hubs)
of information has been added.
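The update rule in Equations (1) to (4) can be sketched in a few lines of plain Python. The following is an illustrative implementation, not the reference code of [7]: the edge-list input format, the function name, the tolerance and the iteration cap are assumptions of this example. An edge (j, i) means that score flows from vertex j to vertex i.

```python
def leaderrank(n, edges, tol=1e-10, max_iter=10_000):
    """Return the final scores S_1..S_N for vertices 0..n-1.

    edges: iterable of (j, i) pairs, a directed edge from j to i.
    Assumption: at least one edge exists between ordinary vertices;
    otherwise the iteration oscillates between vertices and ground vertex.
    """
    ground = n  # index of the additional ground vertex
    successors = [set() for _ in range(n + 1)]
    for j, i in edges:
        if j != i:
            successors[j].add(i)
    for v in range(n):  # bidirectional links between every vertex and ground
        successors[v].add(ground)
        successors[ground].add(v)

    scores = [1.0] * n + [0.0]  # s_i(0) = 1, s_g(0) = 0
    for _ in range(max_iter):
        updated = [0.0] * (n + 1)
        for j in range(n + 1):
            share = scores[j] / len(successors[j])  # s_j(t) / e_out(j)
            for i in successors[j]:
                updated[i] += share
        change = max(abs(a - b) for a, b in zip(updated, scores))
        scores = updated
        if change < tol:
            break
    # Equation (4): distribute the ground score evenly over all vertices
    return [scores[i] + scores[ground] / n for i in range(n)]


# Hypothetical star graph: leaves 1..4 all point to the central vertex 0
star_scores = leaderrank(5, [(leaf, 0) for leaf in range(1, 5)])
```

Because every vertex is linked to the ground vertex in both directions, the total score stays at N in every iteration, in line with Equation (5).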
III. ISSUES WITH LEADERRANK
The LeaderRank algorithm can be understood as a re-
version of a discrete model of diffusion. In that sense, the
initialization $s_i(0) = 1$ at $t_0$ can be interpreted as assigning a
uniform concentration distribution of some virtual compound
that, in the process, is re-distributed according to the model.
In that respect, central actors showing the highest activity in
star-like networks can induce score aggregation and migration
towards their central nodes as well as their adjacent nodes,
whereas nodes in the ’peripheral region’ of the network be-
come inadequately represented by their scores. Therefore, one
can hypothesize that ranked lists obtained from LeaderRank
scores cannot be considered meaningful if a given network in
question exhibits a star-like topology.
Another problem of LeaderRank arises
when considering means of communication that differ from
traditional in-person dialogues. Most social media platforms
utilize likes, shares, dislikes and the concept of building
a follower base. The amount of, for example, likes that a post
receives or the frequency with which it is shared indicate its
importance within a network and at least partially reflect the
influence of the respective author. In turn, such data should be
included when determining opinion leadership. Theoretically,
LeaderRank has the capacity to incorporate aforementioned
additional data. However, if these data were to be included in
a network graph, then each like, share or anything similar
would be seen as a unique edge from one node to another,
just like regular forms of communication. This introduces two
major problems, a theoretical one and a practical one. Firstly,
is a like on a post as valuable as an actual reply,
and how influential is a share? Evidently, there is a
difference between the interaction activities, such as liking,
sharing, writing or replying to a post, but this discrepancy is
difficult to capture with the LeaderRank. Either one accepts
that likes and shares have similar value to a written reply or
one needs to additionally implement weights for different types
of edges within a network.
Secondly, including likes as edges between nodes poses
a practical problem: partial networks. When considering an
individual post, ideally the name of every individual who
has liked this post is available in our data set, but in a
real-world example this is usually not the case. For example, when
analyzing a Twitter network one can discover how many people
liked an individual post quite easily, but recovering the names
of those individuals is highly restricted as Twitter only provides
a shortened list of names. It might be possible to recover all the
names for a tweet with only 15 likes, but the list of names for
a tweet with 100 likes can have the same length as the list for a
tweet with 1,000 likes. Clearly, we lose a significant amount of
information with exactly those tweets that are of great interest
for opinion leadership, that is, tweets with seemingly the most
influence over other users. When faced with similar restrictions
on different platforms the total count of likes or shares might
be more useful than a drastically reduced and limited list of
names. In a similar manner it makes more sense to determine
the popularity of politicians by counting the attendees of a
political event compared to getting the names of only the
first hundred attendees. Hence, it is more appropriate to regard
people posting on social media as “politicians” speaking on a
stage whereas users liking or sharing their content can be seen
as attendees nodding in agreement or sending pictures of the
stage to their friends.
On social media we have many attendees, virtually nodding
their heads by clicking like or retweeting or sharing interesting
content but they do not contribute by producing new posts.
Incomplete data sets may not include the name for every
person that likes a contribution, but these users can still be
influenced and may even shape the network, since likes and
shares present a measure for authority, credibility and approval
in a given network. As a result, accounts partaking in the
network through likes and shares should receive recognition
as they silently enable cognitive biases, like the bandwagon
effect [23] or herding mentality [24], that in turn alter how
well-liked content appears to be, consequently, making it more
or less influential. Ideally, LeaderRank does not only find
opinion leaders in complete networks, but also discovers them
in incomplete data sets. As a result, accounts that cannot be
represented in the graph due to the absence of a name should
still be considered when determining opinion leadership. A
multitude of nameless accounts cannot be included in a graph
and thus they will not receive LeaderRank scores themselves,
but seen as a collective they may help in shaping a network
and identifying truly influential opinion leaders.
In this case study, two different networks are examined,
namely the network around the Facebook page as well
as the Twitter network of the German left-wing political
party “DIE LINKE”. Firstly, the star topology of the Facebook
network is evaluated and secondly a novel approach to
include likes and retweets is tested on the Twitter network.
In the first case study, the Facebook network under inves-
tigation shows an extreme case of a star topology in which the
owner of the political Facebook page “DIE LINKE” acts solely
as the central actor (for more information see Section IV).
Since the LeaderRank emphasizes the strategic social location
of a user, their competence seems to be improperly valued.
In star-shaped network topologies, high centralities of only a
fraction of nodes lead to a heavily skewed LeaderRank score
distribution.
In contrast, one could argue that someone is more important
if any activity generates a high number of responses. Such
a case is regularly given by political networks which are
dominated by the central node of the page owner. Conse-
quently, a straightforward modification of the LeaderRank
score is proposed in Section V-A addressing the imbalance
the LeaderRank algorithm yields in such networks.
In the following paragraph a quantitative measure of LeaderRank
distribution skewness is proposed that could help to
ensure proper applicability of the LeaderRank algorithm for
any given network. This measure is further compared to the
classic measure of network entropy. Tests on simulated data
show the LeaderRank skewness to be superior to network
entropy with respect to topological changes.
A. Definition of LeaderRank Distribution Skewness
Let $LR = \{S_1, ..., S_i, ..., S_N\}$ be the LeaderRank scores of
all nodes. Further, let $\bar{S}$ and $sd_{LR}$ denote the arithmetic mean and
standard deviation of $LR$. Based on the z-scaled LeaderRank
scores (6), the skewness $\nu$ of the LeaderRank distribution is
calculated as shown in (7).
$$z(S_i) = \frac{S_i - \bar{S}}{sd_{LR}} \tag{6}$$
$$\nu_{LR} = \frac{1}{N} \sum_{i} z(S_i)^3 \tag{7}$$
As discussed above, score distribution skewness is correlated
with network topology. Yet, normalization of the computed skewness
is required in order to make a statement about the
topology and whether a star-like topology is present. Hence,
lower and upper bounds, $\nu_{min}$ and $\nu_{max}$, are needed. In this
paragraph, derivations of both bounds are given.
Trivially, $\nu$ converges to the lower bound, the theoretical
minimum ($\nu = 0$), in almost-regular graphs. Such graphs are
regular graphs with one edge removed. With $N$ being
sufficiently large, the supposition that $S_i \approx S_j$ for any pair of
randomly selected vertices $v_i, v_j \in V$ of a social network graph
holds true and a limit of $\lim_{sd_{LR} \to 0} \nu = 0$ can be assumed.
In regular graphs, however, all LeaderRank scores are equal by
definition, resulting in $sd_{LR} = 0$ and $\nu$ being undefined in this
case.
In contrast, $\nu$ is equal to the theoretical maximum if the
network graph exhibits a strictly star-shaped topology. Directed
star graphs are graphs with a central vertex $v_c$ and $N-1$ leaf
vertices connected to $v_c$. One can re-write the set of star graph
vertices as $V = \{v_c, v_2, ..., v_N\}$ and denote the LeaderRank
score set as $LR = \{S_c, S_2, ..., S_N\}$. The LeaderRank scores
of any randomly selected pair of vertices $v_i$ and $v_j$ with
$v_i, v_j \neq v_c$, with $v_c$ being the central vertex, are then not
distinguishable, i.e., $S_i = S_j$, according to the LeaderRank's
definition. Furthermore, the sum of LeaderRank scores equals
$N$, leading to $\bar{S} = 1$ for any given graph. Given the central
node's score $S_c$, each $S_i$ can thus be calculated as shown in
(8).
$$S_i = \frac{N - S_c}{N - 1} \tag{8}$$
Thus, if $S_c$ is known, the set of LeaderRank values
$\{S_c, S_2, ..., S_i, ..., S_N\}$ and the resulting $\nu_{max}$ can be derived.
In the following text we give an explicit relationship
between the number of nodes $N$ in a directed star graph and
the corresponding score set $LR$. For this, let $\mathbf{s}$ be the score
vector at the steady state to which $\mathbf{s}(t)$ converges according to
the update rule (see Equation (2)). Then the identity given in
Equation (9) holds, since $\mathbf{s} = \mathbf{s}(t+1) = \mathbf{s}(t)$.
$$\mathbf{s} = \tilde{A}\,\mathbf{s} \tag{9}$$
Thus Equation (9), in conjunction with the relation given in
Equation (5), yields a set of $N+2$ equations from which $\mathbf{s}$ can
be (theoretically) obtained for any given graph, if a sufficiently
efficient solver algorithm exists. However, for directed star
graphs solving these equations is straightforward, and leads
to an explicit formalism for $\mathbf{s}$ and the LeaderRank scores $LR$
accordingly. Solving this set of equations requires that $\tilde{A}$ be
explicitly written as
$$\tilde{A} = \begin{pmatrix}
0 & 1/2 & 1/2 & \cdots & 1/2 & 1/N \\
0 & 0 & 0 & \cdots & 0 & 1/N \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 0 & 1/N \\
1 & 1/2 & 1/2 & \cdots & 1/2 & 0
\end{pmatrix} \tag{10}$$
for any given directed, extended star graph with vertices
$V = \{v_c, v_2, ..., v_N, v_g\}$. One henceforth obtains the steady-state
score vector $\mathbf{s} = (s_c, s_2, ..., s_N, s_g)^\top$ from the resulting
set of equations, which can be derived by simply re-arranging
Equations (9) and (5):
$$s_c = \frac{N^2}{5N-1} + \frac{N}{5N-1} \tag{11}$$
$$s_i = \frac{2N}{5N-1}, \quad \forall i = 2, ..., N \tag{12}$$
$$s_g = \frac{2N^2}{5N-1}. \tag{13}$$
This explicit formalism of $\tilde{A}$ also highlights that the leaf vertices
(denoted as $v_i$ for textual cleanness in the following text)
are indistinguishable with respect to the weighted adjacency
matrix values $\tilde{A}_{i\cdot}$. Thus, the obtained LeaderRank scores $S_i$
are identical as well. Plugging the computed values of $\mathbf{s}$ into
the final update rule (see Equation (4)) yields the LeaderRank
score for the central vertex $v_c$:
$$S_c = \frac{N^2}{5N-1} + \frac{3N}{5N-1} \tag{14}$$
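The closed-form expressions (11) to (14) can be checked with exact rational arithmetic. The sketch below is only a sanity check under stated assumptions: leaves pass half of their score to the central vertex and half to the ground vertex, the central vertex passes its whole score to the ground vertex, and the ground vertex passes an equal share 1/N to each of the N ordinary vertices and nothing to itself. The function names are hypothetical.

```python
from fractions import Fraction


def star_steady_state(n):
    """Closed-form steady-state scores (s_c, s_leaf, s_g) from Equations (11)-(13)."""
    denom = 5 * n - 1
    return (Fraction(n * n + n, denom),
            Fraction(2 * n, denom),
            Fraction(2 * n * n, denom))


def star_update(n, s_c, s_leaf, s_g):
    """One LeaderRank update step on the extended star graph (assumptions above)."""
    new_c = (n - 1) * s_leaf / 2 + s_g / n   # half of every leaf plus ground share
    new_leaf = s_g / n                       # leaves only receive from the ground
    new_g = s_c + (n - 1) * s_leaf / 2       # all of the center plus half of every leaf
    return new_c, new_leaf, new_g


def final_scores(n):
    """Final LeaderRank scores (S_c, S_leaf) via Equation (4)."""
    s_c, s_leaf, s_g = star_steady_state(n)
    return s_c + s_g / n, s_leaf + s_g / n


# The steady state is a fixed point of the update and the scores sum to N
for n in range(2, 20):
    state = star_steady_state(n)
    assert star_update(n, *state) == state
    assert state[0] + (n - 1) * state[1] + state[2] == n
```

Using `Fraction` avoids rounding issues, so the fixed-point property is verified exactly rather than up to a tolerance.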
Then the equal LeaderRank score $S_i$ of the leaf nodes can be
calculated according to Equation (8), from which the upper
skewness bound $\nu_{max}$ is readily computed. Subsequently,
for any irregular network graph the LeaderRank skewness
can be calculated and normalized using a min-max
normalization as denoted in (16), where $\nu_{min}$ can be
assumed to be 0 as discussed above.
$$\hat{\nu} = \frac{\nu - \nu_{min}}{\nu_{max} - \nu_{min}} = \frac{\nu}{\nu_{max}} \tag{16}$$
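A minimal Python sketch of Equations (6), (7), (8), (14) and (16) is given below. It assumes that the standard deviation is the population standard deviation (division by N), which the text does not specify, and that the graph has at least three vertices so that the upper bound is non-zero. All function names are hypothetical.

```python
import math


def skewness(scores):
    """Skewness of a score list: mean of cubed z-scores, Equations (6) and (7)."""
    n = len(scores)
    mean = sum(scores) / n
    # Assumption: population standard deviation (divide by n)
    sd = math.sqrt(sum((x - mean) ** 2 for x in scores) / n)
    if sd == 0:
        raise ValueError("skewness is undefined when all scores are equal")
    return sum(((x - mean) / sd) ** 3 for x in scores) / n


def star_scores(n):
    """LeaderRank scores of a directed star graph, Equations (14) and (8)."""
    s_center = (n * n + 3 * n) / (5 * n - 1)
    s_leaf = (n - s_center) / (n - 1)
    return [s_center] + [s_leaf] * (n - 1)


def normalized_skewness(scores):
    """Min-max normalized skewness with nu_min = 0, Equation (16); needs n >= 3."""
    nu_max = skewness(star_scores(len(scores)))
    return skewness(scores) / nu_max


# A star graph reaches the upper bound, a symmetric distribution the lower one
star_value = normalized_skewness(star_scores(16))
symmetric_value = normalized_skewness([0.5, 1.0, 1.5])
```

Scores obtained from any LeaderRank implementation can be passed to `normalized_skewness` directly, since the upper bound only depends on the number of vertices.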
B. Detection of Star Topology
LeaderRank skewness $\hat{\nu}$ can be utilized to indicate adverse
leader ranking by means of LeaderRank scores. In this section,
we compare $\nu$ to the classic measure of network entropy
(denoted as $H$ in the following text). In order to allow direct
comparison to $\hat{\nu}$ as well as to entropies computed from other
graphs, $H$ is required to be normalized analogously to $\hat{\nu}$. In
this subsection, we give a brief overview of how normalization
can be conducted.
Let $A$ be the adjacency matrix of a network with $N$
vertices, where each element $a_{ij} := 1$ if there exists a directed
edge $e_{ij}$ between adjacent vertices $v_i$ and $v_j$. Each element
of the principal diagonal $a_{ii}$ is defined as $a_{ii} := \deg(v_i)$ and
thus corresponds to the degree, i.e., the sum of the incoming
and outgoing edges, of vertex $v_i$. The trace of $A$ is defined
as the sum of all elements of the principal diagonal:
$\mathrm{tr}(A) = \sum_{i=1}^{N} a_{ii}$. The formalism for graph entropy used by
Passerini and Severini, $H(\rho) = -\mathrm{tr}(\rho \log_2 \rho)$ [25], is based on
the von Neumann entropy and can be adapted as shown in
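(17).
$$\begin{aligned}
H(\rho) &= -\mathrm{tr}(\rho \log_2 \rho) \\
&= -\sum_{i=1}^{N} \rho_i \log_2 \rho_i \\
&= -\sum_{i=1}^{N} \frac{a_{ii}}{\mathrm{tr}(A)} \log_2 \frac{a_{ii}}{\mathrm{tr}(A)} \\
&= -\sum_{i=1}^{N} \frac{\deg(v_i)}{\sum_{j=1}^{N} \deg(v_j)} \log_2 \frac{\deg(v_i)}{\sum_{j=1}^{N} \deg(v_j)}.
\end{aligned} \tag{17}$$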
This formalism, which is the entropy of the density matrix
of a graph, describes the distribution of incoming and
outgoing edges. In a randomly generated graph one expects
$\deg(v_i) \approx \deg(v_j)$. In this case, the graph entropy $H$ is close
to the theoretical maximum entropy $H_{max}$. The
graph entropy only reaches its maximum if $G$ is a regular graph
where $\deg(v_i) = \deg(v_j) = D$. Because $\rho_i = D/(DN) = 1/N$
in a regular graph, one obtains $H$ as shown in (18).
$$H = H_{max} = -\sum \rho_i \log_2 \rho_i = \log_2 N \tag{18}$$
In contrast, the minimum graph entropy $H_{min}$ is observable
in networks showing star topology. The trace $\operatorname{tr}(A)$ of such
a graph is $2N - 2$ and the degree of its central
vertex is $\deg(v_c) = N - 1$. Consequently, the contribution $H_c$
of the central vertex to the entropy is calculated as shown in (19).
$$
H_c = -\frac{N-1}{2N-2} \log_2 \frac{N-1}{2N-2} = -\frac{1}{2} \log_2 \frac{1}{2} = 0.5. \tag{19}
$$
The degree of any other vertex is $\deg(v_i) = 1$. Hence, the
entropy of a star graph is calculated as follows:

$$
\begin{aligned}
H = H_{min} &= 0.5 + \sum_{V \setminus v_c} -\frac{1}{2N-2} \log_2 \frac{1}{2N-2} \\
            &= 0.5 + \frac{1}{2} \log_2 (2N-2) \\
            &= 1 + \frac{1}{2} \log_2 (N-1).
\end{aligned}
\tag{20}
$$
The normalized network entropy can finally be computed
according to (21):

$$
\hat{H} = \frac{H - H_{min}}{H_{max} - H_{min}}, \qquad \hat{H} \in [0, 1] \tag{21}
$$
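The normalization in (18), (20) and (21) can be sketched in a few lines of Python. This is a minimal example under stated assumptions: the helper names `degree_entropy` and `normalized_entropy` are hypothetical, the input is a list of vertex degrees, and no clipping to [0, 1] is applied. The sketch requires at least three vertices, because $H_{max}$ and $H_{min}$ coincide for $N = 2$.

```python
import math


def degree_entropy(degrees):
    """Degree-based graph entropy H as in (17)."""
    total = sum(degrees)
    return -sum((d / total) * math.log2(d / total) for d in degrees if d > 0)


def normalized_entropy(degrees):
    """Normalised entropy as in (21), using H_max (18) and H_min (20)."""
    n = len(degrees)
    if n < 3:
        raise ValueError("need at least 3 vertices (H_max equals H_min for N = 2)")
    h_max = math.log2(n)                  # regular graph, eq. (18)
    h_min = 1.0 + 0.5 * math.log2(n - 1)  # star graph, eq. (20)
    return (degree_entropy(degrees) - h_min) / (h_max - h_min)


n = 16
star = [n - 1] + [1] * (n - 1)  # central vertex plus N - 1 leaves
regular = [4] * n               # every vertex has the same degree

print(round(normalized_entropy(star), 6))     # 0.0 for the star topology
print(round(normalized_entropy(regular), 6))  # 1.0 for a regular graph
```

For $N = 16$ the two bounds are $H_{max} = 4$ and $H_{min} = 1 + \frac{1}{2}\log_2 15 \approx 2.95$, so the whole normalized range corresponds to roughly one bit of raw entropy.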
To illustrate the expressiveness of $\hat{H}$ and $\hat{\nu}$ with respect
to the underlying network topology, a straightforward experiment
was carried out in which synthetic networks exhibiting
star topologies were continuously mutated over time, resulting
in almost regular graphs after numerous generations.
This simulated process yields a continuous
change of network topology for each graph. $\hat{H}$ and $\hat{\nu}$ were
computed for every generation and tracked. The time
series of both measures are shown in Figure 1. More precisely,
simulations of topological change were conducted by starting
with star graphs of fixed sizes ($N = 16, 32, 64, 128, 256$ and
$512$ vertices). In every generation, for every pair
of vertices, an edge was randomly added or, respectively, removed.
[Figure 1: six panels, one per graph size (N = 16, 32, 64, 128, 256, 512); x-axis: Generation (0 to 100); y-axis: 0.0 to 1.0]
Figure 1. Simulation results of networks with various sizes $N$. The
red line represents $\hat{H}$, the blue line $\hat{\nu}$, and vertical bars indicate standard
deviations.
For each graph size, six runs were conducted in an effort to
estimate variance.
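The experiment can be approximated with the following Python sketch, which tracks only $\hat{H}$. Several details are assumptions made for illustration, since the text does not specify them: the graph is treated as undirected, each vertex pair is toggled (edge added if absent, removed if present) with a fixed probability `flip_prob` per generation, and the names `simulate`, `flip_prob` and `seed` are hypothetical. The LeaderRank skewness $\hat{\nu}$ is not computed here.

```python
import math
import random
from itertools import combinations


def normalized_entropy(degrees):
    """Normalised degree entropy as in (21); NaN if the graph has no edges."""
    n = len(degrees)
    total = sum(degrees)
    if total == 0:
        return float("nan")
    h = -sum((d / total) * math.log2(d / total) for d in degrees if d > 0)
    h_min = 1.0 + 0.5 * math.log2(n - 1)  # star graph, eq. (20)
    h_max = math.log2(n)                  # regular graph, eq. (18)
    return (h - h_min) / (h_max - h_min)


def degrees_of(edges, n):
    """Degree of every vertex in an undirected edge set."""
    degrees = [0] * n
    for u, v in edges:
        degrees[u] += 1
        degrees[v] += 1
    return degrees


def simulate(n, generations, flip_prob, seed):
    """Mutate a star graph and record the normalised entropy per generation."""
    rng = random.Random(seed)
    edges = {(0, j) for j in range(1, n)}    # star: vertex 0 is the centre
    pairs = list(combinations(range(n), 2))  # all unordered vertex pairs
    trajectory = [normalized_entropy(degrees_of(edges, n))]
    for _ in range(generations):
        for pair in pairs:
            # Assumed mutation rule: toggle the edge with probability flip_prob.
            if rng.random() < flip_prob:
                edges ^= {pair}
        trajectory.append(normalized_entropy(degrees_of(edges, n)))
    return trajectory


# Six runs for one graph size, mirroring the repeated runs described above.
runs = [simulate(n=16, generations=100, flip_prob=0.05, seed=s) for s in range(6)]
print([round(run[0], 3) for run in runs])   # generation 0: star topology
print([round(run[-1], 3) for run in runs])  # generation 100: close to regular
```

Under this toggling rule the edge set drifts toward a random graph in which every pair is connected with probability 0.5, which is one plausible reading of "almost regular"; other mutation schemes would give different convergence speeds.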
As shown in Figure 1, both measures converged after 100
generations. All entropy trajectories converge quickly
compared to the $\hat{\nu}$ trajectories, with the convergence time
decreasing with increasing $N$. Although $\hat{\nu}$ yields larger variances
(especially for $N \le 32$), its slower convergence and
qualitatively similar trajectories for all graph sizes $N$ illustrate
greater sensitivity to topological changes. In that respect,
matrix entropy loses significance with increasing graph size.
IV. DATASETS
In this study, two different networks, namely Facebook and
Twitter, of the German party “DIE LINKE” were analyzed,
because both exhibit a star-like topology, yet to a different
degree. As a comparison, a part of the Epinions social network,
as an example for a nearly regular graph, was also included.
A. Facebook Dataset
Figure 2 depicts the network of the Facebook page “DIE
LINKE” from January 2017 as a graph in which the size of
each node corresponds to the out-degree (number of out-links).
As can be seen, the network is dominated by the central node
of the page owner and, therefore, closely resembles a star-
shaped topology.
[Figure 2: network graph with the central node "DIE LINKE" surrounded by numbered user nodes]