Int. J. Intelligent Systems Technologies and Applications Vol. 3, Nos. 3/4, 2007
Copyright © 2007 Inderscience Enterprises Ltd.
On graph modelling, node ranking and visualisation
Xiaodi Huang*
Department of Mathematics, Statistics and Computer Science,
University of New England, Australia
Email: xhuang@turing.une.edu.au
*Corresponding author
Dianhui Wang
Department of Computer Science and Computer Engineering,
La Trobe University, Australia
Email: dh.wang@latrobe.edu.au
Kazuo Misue and Jiro Tanaka
Department of Computer Science,
Graduate School of Systems and Information Engineering,
University of Tsukuba, Japan
Email: misue@cs.tsukuba.ac.jp
Email: jiro@cs.tsukuba.ac.jp
A.S.M. Sajeev
Department of Mathematics, Statistics and Computer Science,
The University of New England, Australia
Email: sajeev@turing.une.edu.au
Abstract: Graphs have traditionally had many applications in various areas of computer science. Research in graph-based data mining has recently attracted considerable attention owing to its broad range of applications, examples of which include XML documents, web logs, web searches and molecular biology. Most of the approaches used in these applications focus on deriving interesting, frequent patterns from given datasets. Two fundamental questions, however, are ignored: how to derive a graph from a set of objects, and how to order nodes according to their relations with others in the graph. In this paper, we provide approaches to building a graph from a given set of objects accompanied by their feature vectors, as well as to ranking nodes in the graph. The basic idea of our ranking approach is to quantify the importance of the role a node plays as the degree to which it has direct and indirect relationships with other nodes in a graph. A method for visualising graphs with ranked nodes is also presented. Visual examples and applications are provided to demonstrate the effectiveness of our approaches.
Keywords: graph mining; graph visualisation; web graph.
Reference to this paper should be made as follows: Huang, X., Wang, D.,
Misue, K., Tanaka, J. and Sajeev, A.S.M. (2007) ‘On graph modelling, node
ranking and visualisation’, Int. J. Intelligent Systems Technologies and
Applications, Vol. 3, Nos. 3/4, pp.188–210.
Biographical notes: Xiaodi Huang received his PhD in 2004 from Swinburne
University of Technology, Australia. Currently, he is a Lecturer in the
Department of Mathematics, Statistics and Computer Science at the University
of New England. He worked as an Associate Professor at South China Normal
University. His research interests include visual information processing, data
mining and software engineering.
Dianhui Wang received his PhD in 1995 from Northeastern University, China.
He is a Senior Lecturer in the Department of Computer Science and Computer
Engineering at La Trobe University, Australia. He worked as a Postdoctoral
Fellow and Research Fellow at Nanyang Technological University, Singapore
and The Hong Kong Polytechnic University, Hong Kong, respectively.
During his sabbatical leave, he was employed as a Professor in the College of
Software at Kyungwon University, Korea. His current research interests
include computational intelligence and data mining for bioinformatics and
multimedia information processing. He is a Senior Member of the IEEE
Computer Society.
Kazuo Misue is an Associate Professor in the Department of Computer
Science, Graduate School of Systems and Information Engineering, University
of Tsukuba, Japan. His research interests include human–computer interaction,
information visualisation, automatic graph drawing and supporting human
creative activities. He received his BSc and MSc from Tokyo University of
Science in 1984 and 1986, respectively. He received a PhD in Engineering
from the University of Tokyo in 1997. He is a member of the ACM.
Jiro Tanaka is a Professor in the Department of Computer Science, Graduate
School of Systems and Information Engineering, University of Tsukuba. His
research interests include visual programming, interactive programming,
computer–human interaction and software engineering. He received his BSc
and MSc from the University of Tokyo in 1975 and 1977, respectively.
He received a PhD in Computer Science from the University of Utah in 1984.
He is a member of the ACM and the IEEE Computer Society.
Professor A.S.M. Sajeev is the Chair of IT/Computer Science at the University
of New England. He received a PhD in Computer Science from Monash
University, Australia. He is also a Fellow of the Institution of Engineers,
Australia. His research interests are in software engineering, mainly in
software processes, testing and software metrics, including metrics
visualisation.
1 Introduction
Graphs have been widely used in many fields of computer science. Graphical
representations are a useful tool for illustrating complex structures in a direct and
intuitive way. Examples of well-known graph structures include control and data-flow
tables, entity relationship diagrams, Petri nets, hardware and software architectures,
evolution diagrams of non-deterministic processes, SADT diagrams, state charts and
many more. The underlying structures of these representations are graphs where the
nodes and edges represent structural elements and the relations between them.
In addition, the attributes of nodes and edges describe the contents, layout or other
annotations of these elements and relations.
Graph-based data mining has become popular in the last few years. It has a broad
range of applications, such as the analysis of XML documents, citation networks, CAD
circuits, web logs, web searches and molecular biology (Artymiuk et al., 1994; Broder
et al., 2000; Polyzotis and Garofalakis, 2002).
Many graph-mining approaches derive interesting frequent patterns and then use them to build predictive models. These approaches, however, ignore two fundamental questions: how to model structural data as a graph, and how to rank the nodes according to the importance of their roles in representing the relationships among the data in the graph. Given a set of interrelated transactions, for example, suppose we wish to examine the relations between them. A graph is built in which each node represents a transaction and each edge indicates the relation between two transactions. We may then ask which transactions are important in terms of their relations with others. Ranking all the nodes in this graph identifies the nodes that play important roles in representing these relations. The issues involved are therefore how to build a graph and how to rank its nodes.
The major contributions of this paper are:
1 developing an approach to building a graph from a set of given objects
2 providing two algorithms for ranking nodes within a graph and
3 presenting two filtering models for visualising the ranked graph.
The rest of this paper is organised as follows. In Section 2, we give a general approach to
graph modelling from a set of objects together with their feature vectors. In Section 3, we
present approaches to ranking nodes with respect to the importance of the roles they play in
a graph. Section 4 gives a filtering method for visualising large graphs. Examples and
applications are provided in Section 5, followed by conclusions in Section 6.
2 Graph modelling
In this section, we present an approach that builds a graph from a given set of objects
together with their feature vectors.
In the area of graph mining, an important issue is how to model relational data as a graph. This problem can be further divided into two subproblems, namely how to model an object as a node and how to model the relationship between two objects as an edge. The former simply maps an object to a node, while the latter is not trivial. In a non-attributed graph, an edge is built directly between two objects if they share a common attribute. Two web pages, for instance, can be represented as two corresponding nodes, and an edge is created in the web graph if there is a hyperlink between the two pages. The common attribute used to build an edge in this example is the hyperlink. In the real world, however, two objects (nodes) may be related in multiple ways: two web pages may be hyperlinked, share several common keywords and cite the same references. In an attributed graph, the edges are associated with a set of attributes, each member of which represents one aspect of the relationships between the two corresponding nodes. In other words, each attribute of an edge is determined by a
relation between one or more attributes of the two nodes connected by that edge. Thus, the
problem addressed here is how to find mappings from the attributes of the two nodes
involved to edge attributes.
Attributed graphs were introduced by Tsai and Fu for pattern analysis (Tsai and Fu,
1979). They give a straightforward representation of structural patterns. An attributed
graph has attributes attached to each node and edge, and is formally defined as follows.
Definition 1 Attributed graph: A 4-tuple G = (V, E, µ, ν), where V = {1, 2, …, n} is the
set of nodes, E ⊆ V × V is the set of edges, µ: V → L maps the nodes to node attributes
and ν: E → R maps the edges to edge attributes. The sets L and R are the node-attribute
set and the edge-attribute set, respectively.
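As a concrete illustration of Definition 1, the following Python sketch shows one possible in-memory layout for an attributed graph. The class name `AttributedGraph`, the dictionary layout and the choice to store each edge once as an unordered pair are our own assumptions (Definition 1 itself allows any E ⊆ V × V); this is not the authors' implementation.

```python
from dataclasses import dataclass, field


@dataclass
class AttributedGraph:
    """Hypothetical container for G = (V, E, mu, nu) from Definition 1.

    mu maps each node to a dict {attribute: value}; nu maps each edge,
    stored once as (i, j) with i < j, to its edge-attribute vector.
    """
    n: int                                   # nodes are 1, 2, ..., n
    mu: dict = field(default_factory=dict)   # node -> {attribute: value}
    nu: dict = field(default_factory=dict)   # (i, j) -> edge-attribute tuple

    @property
    def nodes(self):
        return list(range(1, self.n + 1))

    @property
    def edges(self):
        return list(self.nu.keys())

    def add_edge(self, i, j, attrs):
        """Attach an edge-attribute vector to the (undirected) edge {i, j}."""
        if i == j or not (1 <= i <= self.n and 1 <= j <= self.n):
            raise ValueError("edge endpoints must be two distinct nodes of V")
        self.nu[(min(i, j), max(i, j))] = tuple(attrs)


# Usage: two web-page nodes joined by one attributed edge.
g = AttributedGraph(n=2, mu={1: {"Hyperlink": 1}, 2: {"Reference": 1}})
g.add_edge(2, 1, (1, 0, 0.607))
```

Storing edges under a canonical `(min, max)` key keeps each undirected relation in one place, which matches the undirected graphs used later in Section 3.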
In what follows, we present a general approach to determining such mappings.
2.1 Node attribute
As mentioned above, the nodes in an attributed graph are associated with a set of domain-specific attributes, each member of which characterises one aspect of the nodes. The relation between two nodes is based on the corresponding attribute values. The set of all possible attribute-value pairs associated with node i is denoted by {(Li, Ci)}, where $L_i = \{l_i^k \mid k = 1, 2, \ldots, n_i\}$ is the attribute set and Ci is the set of the value parts of the attribute-value pairs. The latter set can be treated as a node-attribute vector $\mathbf{c}_i = (c_i^1, c_i^2, \ldots, c_i^{n_i})$ with $\mathbf{c}_i \in \mathbb{R}^{1 \times n_i}$ and $c_i^k \in \mathbb{R}$. Note that we use subscripts to indicate the index of a node, and superscripts to indicate the index of attributes. In fact, a set of attribute-value pairs characterises a node. For example, a node of an attributed web graph is associated with two sets: an attribute set such as L1 = {Hyperlink, Keyword1, Keyword2}, and an attribute value set such as C1 = {1, 1, 2}, where the values correspond to the attributes. The attribute value set can be viewed as a node-attribute vector, namely c1 = (1, 1, 2). However, different nodes may have different attributes and corresponding attribute values. Continuing the above example, suppose another web page, represented as a node, has the attribute set L2 = {Reference, Keyword1} and the attribute value set C2 = {1, 2}. In this case, the first page has one Hyperlink, one Keyword1 and two Keyword2s, whereas the second page has no Hyperlink, one Reference page and two Keyword1s. These two node-attribute vectors, c1 = (1, 1, 2) and c2 = (1, 2), clearly have different dimensions, so we cannot directly build the relations between them from their attribute values. In fact, the following observations hold in most cases. Firstly, node-attribute sets may differ in their number of members, because different nodes have different attributes. Secondly, different node-attribute sets may contain common attributes.

Observation: A node-attribute set is a group of attributes describing the properties of nodes. We denote the node-attribute sets of nodes V = {1, 2, …, n} as L1, L2, …, Ln, respectively, and L as a unified global node-attribute set. For all 1 ≤ i, j ≤ n and i ≠ j, we have
1 $|L_i - L_j| \ge 0$
2 $|L_i \cap L_j| \ge 0$
3 $L = \bigcup_{i=1}^{n} L_i$.
To calculate the degree of a relation between two nodes, we have to use attribute
vectors of the same dimensionality. In the following, we present a way to achieve this.
Suppose that two nodes i and j have node-attribute vectors $\mathbf{c}_i = (c_i^1, c_i^2, \ldots, c_i^{n_i})$ and $\mathbf{c}_j = (c_j^1, c_j^2, \ldots, c_j^{n_j})$, which are associated with their respective attribute sets $L_i = \{l_i^k \mid k = 1, 2, \ldots, n_i\}$ and $L_j = \{l_j^k \mid k = 1, 2, \ldots, n_j\}$, where $n_i \neq n_j$. We have a unified attribute set $L_{ij} = L_i \cup L_j$ with the number of elements $n_{ij} = |L_i \cup L_j|$. Now, we need to find a mapping that extends $\mathbf{c}_i$ and $\mathbf{c}_j$ to new vectors $\mathbf{c}'_i$ and $\mathbf{c}'_j$ of dimension $n_{ij}$, namely

$$\varphi: (c_i^k)_{k=1}^{n_i} \mapsto (c_i'^k)_{k=1}^{n_{ij}}.$$

One straightforward mapping compares each node-attribute set with the unified node-attribute set: a component of the new vector keeps its value if the node has the corresponding attribute, and is 0 otherwise. That is, writing $l_{ij}^k$ for the kth attribute of $L_{ij}$,

$$c_i'^k = \begin{cases} c_i^l & \text{if } l_{ij}^k = l_i^l \in L_i,\\ 0 & \text{otherwise,} \end{cases} \qquad c_j'^k = \begin{cases} c_j^m & \text{if } l_{ij}^k = l_j^m \in L_j,\\ 0 & \text{otherwise.} \end{cases}$$
As an example, the two web pages are given by:
1 L1 = {Hyperlink, Reference, Keyword1, Keyword2, Keyword5}, C1 = {1, 1, 2, 3, 5} and
2 L2 = {Hyperlink, Keyword1, Keyword3, Keyword4, Keyword5}, C2 = {1, 1, 6, 5, 8}.
We have L12 = L1 ∪ L2 = {Hyperlink, Reference, Keyword1, Keyword2, Keyword3, Keyword4, Keyword5} and n12 = |L12| = 7. The new node-attribute vectors of the two web pages are obtained by comparing L12 with L1 and with L2, respectively:

$$\mathbf{c}'_1 = (1, 1, 2, 3, 0, 0, 5), \qquad \mathbf{c}'_2 = (1, 0, 1, 0, 6, 5, 8),$$

where the 0s result from the fact that L1 has no Keyword3 or Keyword4, and L2 has no Reference or Keyword2.
Although different nodes in a graph have different node-attribute sets, they are now
associated with global node-attribute vectors of equal dimension. This provides a
uniform basis for building the relationships between nodes with different numbers of
attributes.
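The extension mapping φ can be sketched in a few lines of Python. This is a minimal sketch under our own assumptions: attribute sets are given as ordered lists, the helper names `unified_attribute_set` and `extend` are hypothetical, and the order of L12 is passed in explicitly so that the result matches the two vectors above.

```python
def unified_attribute_set(attrs_i, attrs_j):
    """Return L_ij = L_i ∪ L_j as a list, keeping first-seen order."""
    unified = []
    for a in list(attrs_i) + list(attrs_j):
        if a not in unified:
            unified.append(a)
    return unified


def extend(attrs, values, unified):
    """Mapping phi: copy a node's value for each attribute it has, else 0."""
    if len(attrs) != len(values):
        raise ValueError("each attribute needs exactly one value")
    lookup = dict(zip(attrs, values))
    missing = set(lookup) - set(unified)
    if missing:
        raise ValueError(f"attributes not in the unified set: {missing}")
    return tuple(lookup.get(a, 0) for a in unified)


# The two web pages of the running example.
L1 = ["Hyperlink", "Reference", "Keyword1", "Keyword2", "Keyword5"]
C1 = [1, 1, 2, 3, 5]
L2 = ["Hyperlink", "Keyword1", "Keyword3", "Keyword4", "Keyword5"]
C2 = [1, 1, 6, 5, 8]

# Same members as unified_attribute_set(L1, L2), in the order used in the text.
L12 = ["Hyperlink", "Reference", "Keyword1", "Keyword2",
       "Keyword3", "Keyword4", "Keyword5"]

c1_ext = extend(L1, C1, L12)   # (1, 1, 2, 3, 0, 0, 5)
c2_ext = extend(L2, C2, L12)   # (1, 0, 1, 0, 6, 5, 8)
```

Because a set has no intrinsic order, any fixed ordering of L12 works as long as every node uses the same one.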
Note that all attributes fall into two categories: nominal attributes, which indicate a
binary relation (1 or 0), and cardinal attributes, which are measured by a real number. For
illustration, we treat Hyperlink as a nominal attribute, which tells whether or not a page has
the Hyperlink attribute; the corresponding value is 1 if it does and 0 otherwise.
In contrast, Keyword is a cardinal attribute, meaning that a particular keyword can occur
one or more times. The node-attribute set Li can thus be written as
Li = Ni ∪ Oi; that is, a node-attribute set Li consists of a nominal-attribute set Ni and a
cardinal-attribute set Oi.
In the preceding example, we therefore have:
N1 = {Hyperlink, Reference}
O1 = {Keyword1, Keyword2, Keyword5}
N2 = {Hyperlink}
O2 = {Keyword1, Keyword3, Keyword4, Keyword5}.
Different types of attributes call for different similarity measures when computing the
degree of relations between nodes.
2.2 Edge attribute
We have so far described the attribute set and its node-attribute vector. In what follows, we present how to build edges that are able to represent more than one relation defined over pairs of nodes.

The edges are treated similarly to the nodes. Each edge takes on attributes from the set $R = \{r_k \mid k = 1, 2, \ldots, n_R\}$. The edge connecting two nodes i and j is associated with an edge-attribute vector $\mathbf{w}_{ij} = (w_{ij}^1, w_{ij}^2, \ldots, w_{ij}^{n_R})$, where $n_R$ is the number of distinct edge attributes that describe the $n_R$ kinds of possible relations with respect to all pairs of node attributes. This means that two nodes have at most $n_R$ kinds of relations. Each kind of relation can be modelled separately as a graph in which all the edges are associated with that particular relation. In this way, we can construct $n_R$ graphs in total to illustrate the $n_R$ kinds of relations. Examining these graphs separately, however, is not easy and sacrifices the intuitive power of visualisation. An alternative is to combine the $n_R$ graphs into a single graph, in which points represent nodes, and lines and their visual attributes indicate the $n_R$ relational attributes. In the previous example, suppose R = {Link, Reference, Similarity}, where Link refers to whether or not a web page has a hyperlink to another, Reference indicates whether or not two pages refer to the same references, and Similarity is the degree of similarity between the two pages. Within one graph, we use a line, line arrows and the width of a line to visualise the Link, Reference and Similarity relations between nodes, respectively. Where two web pages have a hyperlink between them but share no common references or keywords, we draw a line of normal width without arrows between the two nodes. In the drawing of an attributed graph, an edge and its visual attributes, such as line type, width and colour, depend on the relations between the attributes of its two corresponding nodes. Using an edge and its visual attributes makes it possible to illustrate multiple relations between two nodes. It follows that we need to determine which node attributes, or combinations of them, to map onto which kinds of edge attributes. We address this question in the following.

In line with Definition 1, we can formalise the relations between the node-attribute set and the edge-attribute set as L × L → R. This means that node attributes determine edge attributes in terms of their types and magnitudes. In particular, let $L_k$ be a subset of the set L, that is, $L_k \subseteq L$, where $L_k$ is a derived set of an attribute or a combination of attributes from L. We then have $L_k \times L_k \to \{r_k\}$, where $L_k \subseteq L$ and $1 \le k \le n_R$, such that

$$\bigcup_{k=1}^{n_R} L_k = L \quad \text{and} \quad L_k \cap L_{k'} = \emptyset \ \text{ for } k \neq k'.$$

A graph is built to represent $n_R$ relations between nodes with respect to the chosen attribute(s). Each node i uses a subset of attributes from its attribute set Li to build $n_R$ relations with others. For instance, given L = {Hyperlink, Reference, Keyword1, Keyword2} and R = {Link, Reference, Similarity}, with the subsets $L_1$ = {Hyperlink}, $L_2$ = {Reference} and $L_3$ = {Keyword1, Keyword2} and $r_1$ = {Link}, $r_2$ = {Reference} and $r_3$ = {Similarity}, we have {Hyperlink} × {Hyperlink} → {Link}, {Reference} × {Reference} ($L_2 \times L_2$) → {Reference} ($r_2$) and {Keyword1, Keyword2} × {Keyword1, Keyword2} → {Similarity}. The choice of the set $L_k$ depends on knowledge of the domain that the graph is to model, as well as on the type of attributes. Given the subvectors corresponding to the subsets of node attributes, we then compute the node-node similarities by means of suitable similarity measures. The purpose of this computation is to tell whether the kth relationship between two nodes is strong enough to be represented as an edge connecting them.
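The attribute-subset mapping $L_k \times L_k \to \{r_k\}$ can be checked and applied mechanically. The Python sketch below tests that the chosen subsets cover L and are pairwise disjoint, then splits a node-attribute vector into one subvector per relation. The function names, the list-based representation and the node vector (1, 0, 2, 3) are hypothetical illustrations, not part of the original method.

```python
def check_partition(L, subsets):
    """True if the subsets L_1, ..., L_nR cover L and are pairwise disjoint."""
    seen = set()
    for Lk in subsets:
        if seen & set(Lk):        # overlaps an earlier subset
            return False
        seen |= set(Lk)
    return seen == set(L)


def subvectors(unified, c_ext, subsets):
    """Split an extended node-attribute vector into one subvector per L_k."""
    position = {a: p for p, a in enumerate(unified)}
    return [tuple(c_ext[position[a]] for a in Lk) for Lk in subsets]


# The example from the text: three relations built from four attributes.
L = ["Hyperlink", "Reference", "Keyword1", "Keyword2"]
R = ["Link", "Reference", "Similarity"]
subsets = [["Hyperlink"], ["Reference"], ["Keyword1", "Keyword2"]]
relation_of = dict(zip(R, subsets))              # r_k -> L_k

ok = check_partition(L, subsets)                 # True
parts = subvectors(L, (1, 0, 2, 3), subsets)     # [(1,), (0,), (2, 3)]
```

Each subvector in `parts` is what a relation-specific function would then compare between two nodes.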
In general, the degree of a relation between two nodes with respect to individual or combined node attributes may be measured in a more complex way. Let $w_{ij}^k$ denote the degree of relation $r_k$ (the kth edge attribute) between nodes i and j with respect to the attributes in $L_i^k$ and $L_j^k$. Following the method presented in Section 2.1, we first extend the vectors $\mathbf{c}_i = (c_i^1, c_i^2, \ldots, c_i^{n_i})$ and $\mathbf{c}_j = (c_j^1, c_j^2, \ldots, c_j^{n_j})$ to $\mathbf{c}'_i = (\mathbf{c}'^1_i, \mathbf{c}'^2_i, \ldots, \mathbf{c}'^{n_R}_i)$ and $\mathbf{c}'_j = (\mathbf{c}'^1_j, \mathbf{c}'^2_j, \ldots, \mathbf{c}'^{n_R}_j)$ so that they have the same dimension, where $\mathbf{c}'^k_i$ denotes the subvector of $\mathbf{c}'_i$ that corresponds to the attributes in $L_k$. A component of the corresponding edge-attribute vector is then computed by

$$w_{ij}^k = f_k(\mathbf{c}'^k_i, \mathbf{c}'^k_j), \qquad k = 1, 2, \ldots, n_R,$$

which quantifies the degree of the $r_k$ relation between nodes i and j with respect to the chosen attribute(s). Therefore, the overall relations between two nodes are represented as the edge-attribute vector

$$\mathbf{w}_{ij} = \left(f_1(\mathbf{c}'^1_i, \mathbf{c}'^1_j),\ f_2(\mathbf{c}'^2_i, \mathbf{c}'^2_j),\ \ldots,\ f_{n_R}(\mathbf{c}'^{n_R}_i, \mathbf{c}'^{n_R}_j)\right).$$
We may assume that the range of $f_k(\cdot)$ is [0, 1]. The above model allows us to choose a different function for each edge attribute; such choices depend on the nature of the attributes. For nominal attributes, for example, we use Boolean operations. For cardinal attributes, the widely used cosine measure can be employed instead. Within the cosine similarity measure, the elements of the vectors $\mathbf{c}'^k_i$ and $\mathbf{c}'^k_j$ are viewed as coordinates in an l-dimensional space. In particular, with l = 2, they indicate two points in a two-dimensional Cartesian plane.

In general, the cosine similarity between two vectors $\mathbf{c}'^k_i$ and $\mathbf{c}'^k_j$, denoted by $\cos\theta_{ij}^k$, is defined as

$$\cos\theta_{ij}^k = \frac{\mathbf{c}'^k_i \cdot \mathbf{c}'^k_j}{\|\mathbf{c}'^k_i\|_2\,\|\mathbf{c}'^k_j\|_2},$$

where the norm of an l-dimensional vector $P = (p_1, p_2, \ldots, p_l)$ is $\|P\|_2 = \sqrt{p_1^2 + p_2^2 + \cdots + p_l^2}$. We use this value as the degree of one kind of relationship between two nodes i and j. If $\cos\theta_{ij}^k$ exceeds a user-specified threshold, this type of relation is illustrated in the graph as an edge, or as an attribute of the edge, connecting the two nodes.

We provide an example to illustrate the above approach. Referring to the preceding example in Section 2.1, we have two node-attribute vectors

$$\mathbf{c}'_1 = (1, 1, 2, 3, 0, 0, 5) \quad \text{and} \quad \mathbf{c}'_2 = (1, 0, 1, 0, 6, 5, 8),$$

and the mappings $L_k \times L_k \to \{r_k\}$, where k = 1, 2, 3. By simple Boolean operations, the first two components of the edge-attribute vector $\mathbf{w}_{12}$ are clearly $w_{12}^1 = 1$ and $w_{12}^2 = 0$. The two cardinal subvectors of the node-attribute vectors are $\mathbf{c}'^3_1 = (2, 3, 0, 0, 5)$ and $\mathbf{c}'^3_2 = (1, 0, 6, 5, 8)$. According to the cosine similarity measure, we have

$$w_{12}^3 = \cos\theta_{12}^3 = \frac{\mathbf{c}'^3_1 \cdot \mathbf{c}'^3_2}{\|\mathbf{c}'^3_1\|_2\,\|\mathbf{c}'^3_2\|_2} = \frac{2\times1 + 3\times0 + 0\times6 + 0\times5 + 5\times8}{\sqrt{2^2+3^2+0^2+0^2+5^2}\,\sqrt{1^2+0^2+6^2+5^2+8^2}} = \frac{42}{\sqrt{38}\,\sqrt{126}} = 0.607 > 0.50,$$

where 0.50 is the threshold. As a result, we have the edge-attribute vector $\mathbf{w}_{12} = (w_{12}^1, w_{12}^2, w_{12}^3) = (1, 0, 0.607)$.
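The worked example can be reproduced with the Python sketch below. We assume that the "simple Boolean operation" for the nominal components is a logical AND of the two nodes' values, which yields w12^1 = 1 and w12^2 = 0 here; the text does not name the operation, so this choice, like the function names, is an assumption. The cosine measure and the 0.50 threshold follow the text, and the vectors are indexed in the order of L12 used above.

```python
import math


def boolean_relation(a, b):
    """Nominal attributes: assumed to be a logical AND of the two values."""
    return int(bool(a) and bool(b))


def cosine(u, v):
    """Cosine similarity of two equal-length cardinal subvectors."""
    if len(u) != len(v):
        raise ValueError("subvectors must have the same dimension")
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0   # assumption: an all-zero subvector gives no similarity
    return dot / (norm_u * norm_v)


# Extended node-attribute vectors of the two web pages (order of L12).
c1 = (1, 1, 2, 3, 0, 0, 5)
c2 = (1, 0, 1, 0, 6, 5, 8)
THRESHOLD = 0.50

w1 = boolean_relation(c1[0], c2[0])   # Link, from Hyperlink
w2 = boolean_relation(c1[1], c2[1])   # Reference
w3 = cosine(c1[2:], c2[2:])           # Similarity, from Keyword1..Keyword5
w12 = (w1, w2, round(w3, 3))          # (1, 0, 0.607)
draw_similarity = w3 > THRESHOLD      # True: 0.607 exceeds 0.50
```

Keeping one function per relation mirrors the model's freedom to choose a different $f_k$ for each edge attribute.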
3 NodeRank: node importance rank
In this section, we address the second question: how to rank nodes.
In particular, we present two approaches to ranking nodes according to the importance of
the roles they play within a graph.
The purpose of using a graph is to represent complex relationships between objects. Nodes are employed to represent objects, and edges represent relationships between these objects. However, the various nodes and edges in a graph play different roles in revealing such relationships: some are prominent, whereas others are trivial. Prominent nodes are those that are extensively involved in relationships with other nodes, and this involvement makes them more visible to others. For this reason, it is necessary to develop a method that accurately identifies the important nodes within a graph. A function is needed to measure how important the role of each node is in depicting the relationships. A node has many attributes, such as its role in the graph, the content of what it represents and its size. The central issue is to determine which attribute(s) qualify for distinguishing nodes and edges in a graph. Fundamental to our approach is the notion of NodeRank, which is defined as follows:
Definition 2 Node Importance Score (NodeRank): A real number indicating the importance of
the role that a node plays in a graph, or the degree of the node's involvement in
relationships with other nodes. Let G = (V, E) be an undirected and connected graph, and let s be a
function that assigns a real value between 0 and 1 to each node i in G, namely s:
V → [0, 1]. s(i), with 0 ≤ s(i) ≤ 1, is called the Node Importance Score of node i and is
employed to rank nodes.
The next question is how to calculate the NodeRank. The measure of ‘centrality’ used in
social network analysis (Faust and Stanley, 1995) provides the basis for this calculation.
The concept of centrality quantifies the significance of each node within a graph, by
summarising structural relationships among the nodes.
Similarly, we develop the method for ranking the nodes by measuring their important
roles within a graph. Two approaches to calculating the NodeRank of nodes are presented
in Sections 3.1 and 3.2.
3.1 Degree, closeness and betweenness centrality indices
As mentioned in Definition 2, a function s is to be constructed to map each node into a
real value that indicates its important role within a graph.
Centrality refers to the importance of a particular node in a social network. Measures
of centrality ‘attempt to describe and measure properties of ‘actor location’ in a social
network’ (Faust and Stanley, 1995). More precisely, the importance of a node can be
measured by both how often it is directly connected to other nodes and how easily it
reaches other nodes. In other words, the NodeRank of a node is determined by how it
connects to other nodes directly as well as indirectly. In this view, the role of a node is
treated as a function of its position in a given graph.
A variety of centrality measures has been reported in the literature (Faust and
Stanley, 1995). They are roughly classified into three fundamental types: degree,
closeness and betweenness centralities (Faust and Stanley, 1994; Freeman, 1978;
Friedkin, 1991; Granovetter, 1973; Katz, 1953; Scott, 2000). Their definitions are given
in the following. Note that these measures are relative ones in that all of them are divided
by the corresponding possible maximum values.
Definition 3 (Freeman, 1978) Degree centrality: The number of edges attached to a
node i, normalised as

$$c_D(i) = \frac{\deg(i)}{n-1},$$

where $\deg(i) = |\{\, j \mid (i, j) \in E \wedge j \in V \,\}|$.
With normalisation, the degree centrality ranges from 0 to 1, where 0 is the
smallest possible value and 1 the highest possible centrality.
The degree centrality reflects the direct relationships of a particular node with others
in a graph.
Definition 4 (Freeman, 1978) Closeness centrality: Based on the sum of geodesic distances
(lengths of the shortest paths connecting two nodes) between a node i and all other nodes,
normalised as

$$c_C(i) = \frac{n-1}{\sum_{j \in V} d(i, j)},$$

where d(i, j) is the length of the shortest path between nodes i and j, that is, the number of
edges on it. The closeness centrality reflects the freedom of a node from
control by others, and its capacity for independent relationships within a graph. This
measure simply tells how far a particular node is from all others. Because $c_C(i)$ is the
reciprocal of the normalised distance sum, a node with a higher closeness score is more
central than a node with a lower closeness score. The most
central nodes are able to interact quickly with all other nodes because they are close to
them.
Definition 5 (Freeman, 1978) Betweenness centrality: The ratio of the number of
shortest paths between two nodes that pass through a node i to the number of all such
shortest paths, summed over all pairs of nodes in a graph:

$$c'_B(i) = \sum_{j<k;\ j,k \neq i} \frac{g_{jk}(i)}{g_{jk}} \qquad \text{and} \qquad c_B(i) = \frac{c'_B(i)}{(n-1)(n-2)/2},$$

where $g_{jk}$ is the number of shortest paths between nodes j and k, and $g_{jk}(i)$ is the
number of these paths that pass through node i.
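Definitions 3 to 5 can be computed for a small unweighted, connected graph with breadth-first search. The sketch below is written for clarity rather than speed: it counts shortest paths from every node and then applies the normalised formulas directly, using the fact that i lies on a shortest j-k path exactly when d(j, i) + d(i, k) = d(j, k). The function names and the adjacency-dictionary input format are our own assumptions.

```python
from collections import deque
from itertools import combinations


def bfs(adj, source):
    """Shortest-path distances and shortest-path counts from `source`."""
    dist = {source: 0}
    sigma = {source: 1}          # number of shortest paths from source
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:    # first time v is reached: fix its distance
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:   # u precedes v on a shortest path
                sigma[v] += sigma[u]
    return dist, sigma


def centralities(adj):
    """Normalised degree, closeness and betweenness (Definitions 3 to 5).

    `adj` maps each node to the set of its neighbours; the graph is assumed
    to be undirected, unweighted and connected, with at least three nodes.
    """
    nodes = list(adj)
    n = len(nodes)
    if n < 3:
        raise ValueError("betweenness normalisation needs n >= 3")
    paths = {s: bfs(adj, s) for s in nodes}
    dist = {s: paths[s][0] for s in nodes}
    sigma = {s: paths[s][1] for s in nodes}

    c_deg = {i: len(adj[i]) / (n - 1) for i in nodes}
    c_clo = {i: (n - 1) / sum(dist[i][j] for j in nodes) for i in nodes}
    c_bet = {}
    for i in nodes:
        total = 0.0
        others = [v for v in nodes if v != i]
        for j, k in combinations(others, 2):
            # g_jk(i) = sigma_ji * sigma_ik when i is on a shortest j-k path.
            if dist[j][i] + dist[i][k] == dist[j][k]:
                total += sigma[j][i] * sigma[i][k] / sigma[j][k]
        c_bet[i] = total / ((n - 1) * (n - 2) / 2)
    return c_deg, c_clo, c_bet


# Usage: a star graph with centre 0 and leaves 1, 2, 3.
star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
c_deg, c_clo, c_bet = centralities(star)
```

For the star, the centre scores 1 on all three measures, while each leaf has degree centrality 1/3, closeness 0.6 and betweenness 0.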
The betweenness centrality reflects the intermediate location of a node by capturing
indirect relationships with other nodes. It “measures the extent to which a particular point
lies between the various other points in a graph: a point of relatively low degree may
play an important ‘intermediary’ role and so be very central to the network” (Scott,
2000). A node with high betweenness is able to facilitate or limit the interaction
between other nodes.
The degree centrality is a local centrality in the sense that it counts the number of
links connecting adjacent nodes to it. Unlike the degree centrality, both closeness and
betweenness centralities are global ones.
We observe that these three centrality measures may produce contradictory results for the
same graph; a node may, for instance, have a low degree centrality but a high
betweenness centrality. Freeman (1987) demonstrated that the betweenness centrality
best captures the essence of important nodes in a graph and generates the largest node
variances, while the degree centrality appears to produce the smallest node variances.
To overcome these drawbacks of any single centrality measure, we combine the degree, closeness and
betweenness centralities to yield the following measure as the NodeRank score:
$$s(i) = g\big(c_D(i), c_C(i), c_B(i)\big)$$
An appropriate function g should be chosen to rectify these shortcomings by incorporating the interactions between the three centralities. One simple scheme uses a linear combination of them:
$$s(i) = g(i) = w_1 c_D(i) + w_2 c_C(i) + w_3 c_B(i) = w_1 \frac{\deg(i)}{n-1} + w_2 \frac{n-1}{\sum_{j \in V} d(i, j)} + w_3 \frac{\sum_{j=1}^{n-1} \sum_{k=j+1}^{n} g_{jk}(i)/g_{jk}}{(n-1)(n-2)/2} \qquad (1)$$
where the weights add up to 1, that is, w1 + w2 + w3 = 1. For simplicity, we treat the three measures equally by assigning equal weights (w1 = w2 = w3 = 1/3) in our experiments. Adjusting these weights changes the emphasis placed on the different measures. In general, a node will have a high NodeRank score if it has a high degree, is close to all other nodes and lies on many shortest paths between other nodes.
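Formula (1) with equal weights can be sketched in Python as follows. This is an illustrative sketch, assuming an unweighted, connected, undirected graph with at least three nodes held as adjacency lists; the function names are hypothetical, and closeness is taken in the form (n − 1)/Σ_j d(i, j) that appears in formula (1).

```python
from collections import deque
from itertools import combinations


def bfs_counts(adj, source):
    """Distances and numbers of shortest paths from source (unweighted graph)."""
    dist, sigma = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v], sigma[v] = dist[u] + 1, 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                sigma[v] += sigma[u]
    return dist, sigma


def node_rank(adj, w=(1 / 3, 1 / 3, 1 / 3)):
    """Formula (1): s(i) = w1*c_D(i) + w2*c_C(i) + w3*c_B(i), with w1 + w2 + w3 = 1."""
    nodes = list(adj)
    n = len(nodes)
    info = {u: bfs_counts(adj, u) for u in nodes}
    scores = {}
    for i in nodes:
        dist_i, sigma_i = info[i]
        c_d = len(adj[i]) / (n - 1)                 # degree centrality
        c_c = (n - 1) / sum(dist_i.values())        # closeness centrality
        raw = 0.0                                   # unnormalised betweenness
        for j, k in combinations(nodes, 2):
            if i in (j, k):
                continue
            dist_j, sigma_j = info[j]
            if dist_j[i] + dist_i[k] == dist_j[k]:
                raw += sigma_j[i] * sigma_i[k] / sigma_j[k]
        c_b = raw / ((n - 1) * (n - 2) / 2)         # betweenness centrality
        scores[i] = w[0] * c_d + w[1] * c_c + w[2] * c_b
    return scores


star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}      # hub 0 with three leaves
print({i: round(s, 4) for i, s in node_rank(star).items()})
# {0: 1.0, 1: 0.3111, 2: 0.3111, 3: 0.3111}
```

The hub scores 1 on all three measures, while each leaf gets (1/3 + 3/5 + 0)/3 = 14/45.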
The time complexity of these measures is at least O(n^2), since the shortest paths between all pairs of nodes are required, implying that the NodeRanks for large graphs should be computed offline.
3.2 Eigenvector centrality
Degree, closeness and betweenness centralities are derived from graph distances: a node's importance is quantified by how close other nodes are to it. We should, however, also consider the importance of the nodes that are close to the node under study. In reality, being chosen by a popular individual should add more to one's popularity than being chosen by an unpopular one. If, however, the domain of a node contains only peripheral or marginally important nodes, then the NodeRank of this node should be low.
For these reasons, we should weight the distances used in the proximity indices by
incorporating the importance of nodes in the influence domain. More precisely, the
198
X. Huang et al.
NodeRank of each node is proportional to the sum of the NodeRanks of the nodes to
which it is directly connected. We thus have:
$$s(i) = w \sum_{j=1}^{n} a_{ij}\, s(j)$$
where s(i) is the NodeRank of node i, w is a constant weight, and a_ij = 1 if nodes i and j are adjacent, or 0 otherwise. Hence, a node connected to many nodes that are themselves well connected is assigned a high score. A node connected only to near isolates, however, is not assigned a high score, even if it has a high degree.
On the other hand, some characteristics of nodes do not depend entirely on their relations to others. For example, each student in a class has some popularity that depends on his or her own characteristics. In a communication network, each individual has sources of information that are independent of other group members. Similarly, each node has its own characteristics, which are independent of other nodes.
Overall, the importance of a node consists of two factors: external and internal. The
above equation can thus be modified as:
$$s(i) = w \sum_{j=1}^{n} a_{ij}\, s(j) + e(i)$$
where e(i) is the internal importance score of node i.
These n equations, one for each node in the graph, can be rewritten in matrix form. Let $\mathbf{s} = (s_1, s_2, \ldots, s_n)^T$ be the vector of NodeRanks, $\mathbf{e} = (e_1, e_2, \ldots, e_n)^T$ be the vector of the internal importance of nodes, and A = (a_ij) be the adjacency matrix, where a_ij = 1 if nodes i and j are adjacent, or 0 otherwise. An equation combining both external and internal factors is then given by:

$$\mathbf{s} = wA^T\mathbf{s} + \mathbf{e}$$
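The parameter w reflects the relative importance of external versus internal factors in determining the importance scores. This equation has the matrix solution:

$$\mathbf{s} = (I - wA^T)^{-1}\mathbf{e}$$

where I is the identity matrix and e is the vector of internal importance scores. The inverse exists, and s is positive whenever e is, provided that w is smaller than the reciprocal of the largest eigenvalue of A.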
We normalise s to a length of 1, using

$$\lVert \mathbf{s} \rVert = \sqrt{\sum_{i=1}^{n} s(i)^2}$$
The eigenvector importance score of node i is finally obtained as:

$$s_i = \frac{s(i)}{\lVert \mathbf{s} \rVert} \qquad (2)$$
That is, the components of vector s are the NodeRanks of the corresponding nodes in the graph. A high score means that a node is connected either to a few other nodes with high scores or to many nodes with low to moderate scores.
Eigenvector centrality is best understood as a refinement of simple degree centrality: instead of counting all neighbours equally, it weights each neighbour by its own importance.
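The closed-form solution and the normalisation in formula (2) can be sketched with NumPy as below. This is a hedged illustration rather than the authors' code: it assumes a dense adjacency matrix and a value of w below the reciprocal of the largest eigenvalue of A, so that I − wA^T is invertible. The function name is hypothetical, and the example values e(i) = 1 and w = 0.3 are chosen arbitrarily.

```python
import numpy as np


def eigen_noderank(A, e, w):
    """Solve s = w A^T s + e, i.e. s = (I - w A^T)^{-1} e, then normalise s to length 1."""
    A = np.asarray(A, dtype=float)
    e = np.asarray(e, dtype=float)
    n = A.shape[0]
    # Solving the linear system is cheaper and more stable than forming the inverse.
    s = np.linalg.solve(np.eye(n) - w * A.T, e)
    return s / np.linalg.norm(s)      # formula (2): s_i = s(i) / ||s||


# Star graph: hub 0 with three leaves; its largest eigenvalue is sqrt(3), about 1.732.
A = np.array([[0, 1, 1, 1],
              [1, 0, 0, 0],
              [1, 0, 0, 0],
              [1, 0, 0, 0]])
s = eigen_noderank(A, e=np.ones(4), w=0.3)
print(np.round(s, 3))   # [0.645 0.441 0.441 0.441]
```

For an undirected graph A^T = A, so the transpose only matters for directed graphs.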
The idea of eigenvector centrality originated with Katz (1953), was further developed by Hubbell (1965) and many others, and culminated in Bonacich (1972), who defined centrality as the principal eigenvector of an adjacency matrix. The concept of NodeRank is similar to the PageRank used in the Google search engine (Page et al., 1998). PageRank models a user's web browsing as a random surfer, who either follows a link from the current page or jumps to a random page in the web graph. Unlike PageRank, which relies on this stochastic analysis of a random surfer, NodeRank is founded on the linkage pattern of either directed or undirected graphs.
4 Filtering graphs for visualisation
We have so far constructed a graph from a given set of objects with their attribute
vectors, and also ranked those objects with respect to their roles in the graph. The next
related issue is how to visually represent these objects and their ranking relations.
We provide two definitions to describe this problem.
Definition 6 Visual vocabulary: A set of marks and their graphical properties used to
visually encode information. Let M be the set of marks and P be the set of graphical
properties. We have M = {Point, Line, Surface, Area, Volume} and P = {Position,
Colour, Grey Scale, Size, Shape, Orientation, Texture}. The visual vocabulary can
therefore be defined as Q = M × P.
Definition 7 Attributed graph visualisation: A graph visualisation containing two mappings. One mapping is from nodes to their positions in a space, f: V → ℜ^k, where k = 2 or 3, and f is a layout algorithm; the other mapping is formed by choosing a visual structure in accordance with the visualisation design guidelines, d: G → Q, where Q is the visual vocabulary and d represents the design rules.
For illustration, we assume that the visual vocabulary for the example of two web pages in Section 2.2 is Q = {line, yellow, width}, which illustrates the edge-attribute set R = {Link, Reference, Similarity}. Because the first and third components of the vector w12 = (1, 0, 0.607) are non-zero (see Section 2.2), we draw the edge between the two nodes representing the web pages as a line whose width is proportional to 0.607. The number 0.607 indicates the degree to which the pages are similar in terms of shared keywords. The colour of this line, however, should not be yellow, as the two web pages do not share a reference.
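The mapping from an edge-attribute vector to the visual vocabulary Q = {line, yellow, width} can be sketched as a small Python function. Everything beyond the rules stated above is an assumption made for illustration: the function name, the grey default colour and the scale factor that converts the similarity component into a line width are all hypothetical.

```python
def edge_style(w, width_scale=4.0):
    """Map an edge-attribute vector (Link, Reference, Similarity) to line properties.

    Returns None when no attribute is present, i.e. no edge is drawn.
    """
    link, reference, similarity = w
    if link == 0 and reference == 0 and similarity == 0:
        return None
    return {
        "mark": "line",
        # yellow encodes a shared reference; grey is a hypothetical default colour
        "colour": "yellow" if reference > 0 else "grey",
        # width proportional to the similarity component
        "width": width_scale * similarity,
    }


print(edge_style((1, 0, 0.607)))   # {'mark': 'line', 'colour': 'grey', 'width': 2.428}
```

With a strictly proportional rule, an edge that has a link but zero similarity would get zero width, so a practical design would probably add a minimum width.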
The challenge of graph visualisation in practice is how to display a large graph with many nodes and edges (Henry, 1992; Huang, 2004; Huang and Lai, 2006). Laid out as a whole, such a graph is cluttered on a limited screen, and the advantages of visualisation are lost. We must therefore reduce the visual complexity while still providing an intuitive picture of the relations among a large number of objects.
The idea of our approach is to display only the important nodes and their associated edges. In particular, we rank all the nodes in order of their NodeRanks, and then remove or hide the nodes whose NodeRanks are below a threshold, together with their incident edges.
As stated before, filtering a graph aims to reduce the visual complexity. A filtered graph, however, should still represent the main relationships between nodes. At the very least, a filtered graph is required to be connected. The following definitions from graph theory (Harary, 1972) detail what connectivity implies.
Definition 8 A connected graph: A graph is connected if, for every pair of nodes i and j, there is a path starting at i and ending at j.
The graph is 2-connected if it remains connected after the deletion of any one node. It is 3-connected if it remains connected after the removal of any two nodes, and so on. A k-connected graph is also required to have at least k + 1 nodes.
Definition 9 Cutpoint: A node is a cutpoint if its removal disconnects a graph, that is, increases the number of separated subgraphs.
In other words, the removal of a cutpoint makes some nodes unreachable from others. The concept of a cutpoint can be extended from a single node to a set of nodes whose removal disconnects a graph. This set is referred to as a cutset. That is, a node cutset is a subset of the nodes of a graph whose removal (together with all edges adjacent to those nodes) leaves the graph disconnected. If the set is of size k, it is called a k-node cut, denoted by k(G). The minimum number of nodes whose removal disconnects G is called the node connectivity of G.
Definition 10 Bridge: An edge is a bridge if its removal disconnects the graph.
That is, a bridge is an edge whose removal increases the number of components of the graph. We denote the set of bridges of graph G by λ(G). The edge connectivity of G is the minimum number of edges that must be removed to disconnect G.
4.1 Algorithm for filtering graphs
In the following we present an algorithm for filtering graphs based on the NodeRank. The main algorithm is as follows:
Input: a connected graph and a threshold.
Output: a connected, filtered graph.
− compute the NodeRank of each node in the graph;
− rank the nodes in order of their NodeRanks; and
− remove the nodes whose NodeRanks are less than the threshold, together with their associated edges, provided that these nodes are not cutpoints and these edges are not bridges.
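A minimal Python sketch of this filtering loop follows. It rests on stated assumptions: the NodeRanks have already been computed (by formula (1) or (2)), the graph is an undirected adjacency-list dictionary, and the cutpoint and bridge condition is interpreted as "remove a node only if the remaining graph stays connected", tested by breadth-first search on the partially filtered graph. The function names are hypothetical.

```python
from collections import deque


def stays_connected(adj, removed):
    """BFS test: is the graph still connected after deleting the nodes in `removed`?"""
    alive = [u for u in adj if u not in removed]
    if not alive:
        return True
    seen = {alive[0]}
    queue = deque([alive[0]])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in removed and v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen) == len(alive)


def filter_graph(adj, rank, threshold):
    """Drop low-ranked nodes (and their edges) unless that would disconnect the graph."""
    removed = set()
    for u in sorted(adj, key=lambda x: rank[x]):        # lowest NodeRank first
        if rank[u] >= threshold:
            break                                       # all remaining nodes are kept
        if stays_connected(adj, removed | {u}):          # u is not a cutpoint here
            removed.add(u)
    return {u: [v for v in adj[u] if v not in removed]
            for u in adj if u not in removed}


path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(filter_graph(path, {0: 0.1, 1: 0.2, 2: 0.9, 3: 0.8}, threshold=0.5))
# {2: [3], 3: [2]}
```

Each connectivity test is linear in the size of the graph, so this straightforward loop costs O(n(n + |E|)) in the worst case.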
To calculate the NodeRank of each node in the first step of the above algorithm, we can use either the centrality index (formula (1)) or the eigenvector NodeRank (formula (2)). The former mainly involves finding shortest paths between pairs of nodes, which is easily achieved with the well-known algorithms of Dijkstra (from a single source) or Floyd (between all pairs). The algorithm for the latter (Hotelling, 1936; Golub and Van Loan, 1996) is as follows:
1 set s(i) = 1 and assign initial values to e(i) and w, for all nodes i;
2 compute $s^{*}(i) = w \sum_{j} a_{ij}\, s(j) + e(i)$ for all nodes i;
3 set γ equal to the square root of the sum of squares of s* (the vector), that is, $\gamma = \lVert \mathbf{s}^{*} \rVert$;
4 set $s(i) = s^{*}(i)/\gamma$ for all nodes i; and
5 repeat Steps 2 to 4 until γ stops changing.
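The listed steps translate almost line by line into the NumPy sketch below, which is an illustration under assumptions rather than a reference implementation: the graph is given as a dense adjacency matrix, γ is the Euclidean norm of s* as in Step 3, and the tolerance `tol` and the cap `max_iter` are hypothetical parameters added so that the loop always terminates.

```python
import numpy as np


def iterative_noderank(A, e, w, tol=1e-14, max_iter=10_000):
    """Iterate s*(i) = w * sum_j a_ij s(j) + e(i), s = s*/gamma, until gamma stops changing."""
    A = np.asarray(A, dtype=float)
    e = np.asarray(e, dtype=float)
    s = np.ones(A.shape[0])                 # Step 1: s(i) = 1 for all nodes
    gamma_old = None
    for _ in range(max_iter):
        s_star = w * (A @ s) + e            # Step 2
        gamma = np.linalg.norm(s_star)      # Step 3: sqrt of the sum of squares of s*
        s = s_star / gamma                  # Step 4
        if gamma_old is not None and abs(gamma - gamma_old) < tol:
            break                           # Step 5: gamma has stopped changing
        gamma_old = gamma
    return s


# A triangle {0, 1, 2} with a pendant node 3 attached to node 0 (not bipartite).
A = np.array([[0, 1, 1, 1],
              [1, 0, 1, 0],
              [1, 1, 0, 0],
              [1, 0, 0, 0]])
s = iterative_noderank(A, e=np.zeros(4), w=1.0)
vals, vecs = np.linalg.eigh(A)
print(np.allclose(s, np.abs(vecs[:, -1]), atol=1e-5))   # True
```

With e = 0 and w = 1 the loop is plain power iteration, which converges to the principal eigenvector here because the graph is connected and not bipartite. On a bipartite graph such as a star, the iterates can instead alternate between two vectors while γ stays constant, so the γ-based stopping rule may end on a vector that is not the principal eigenvector; comparing successive vectors s is a more robust check.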
Note that after Step 2 is executed for the first time, s*(i) = w·deg(i) + e(i); with w = 1 and e(i) = 0, it is simply the degree of node i.
During the removal of nodes and edges, we must make sure that the resulting graph remains connected. A simple way to identify cutpoints (or bridges) is to delete nodes (or edges) one at a time and use depth-first or breadth-first search to test whether the resulting graph is still connected. Each such connectivity test takes linear time, O(|V| + |E|).
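Rather than deleting nodes one at a time, all cutpoints and bridges of an undirected simple graph can be found in a single depth-first search with the classical low-link technique, in O(|V| + |E|) time overall. This standard textbook method is substituted here for illustration and is not a procedure given in the paper; the function name is hypothetical, and the recursive form suits small to medium graphs.

```python
import sys


def cutpoints_and_bridges(adj):
    """Return (set of cutpoints, list of bridges) of an undirected simple graph."""
    sys.setrecursionlimit(max(sys.getrecursionlimit(), 2 * len(adj) + 100))
    disc, low = {}, {}          # discovery time and low-link value of each node
    cut, bridges = set(), []
    timer = [0]

    def dfs(u, parent):
        disc[u] = low[u] = timer[0]
        timer[0] += 1
        children = 0
        for v in adj[u]:
            if v not in disc:                      # tree edge u -> v
                children += 1
                dfs(v, u)
                low[u] = min(low[u], low[v])
                if parent is not None and low[v] >= disc[u]:
                    cut.add(u)                     # v's subtree cannot bypass u
                if low[v] > disc[u]:
                    bridges.append((u, v))         # no back edge spans (u, v)
            elif v != parent:                      # back edge
                low[u] = min(low[u], disc[v])
        if parent is None and children > 1:
            cut.add(u)                             # a DFS root with several children

    for u in adj:
        if u not in disc:
            dfs(u, None)
    return cut, bridges


barbell = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4],
           4: [3, 5, 6], 5: [4, 6], 6: [4, 5]}
cut, bridges = cutpoints_and_bridges(barbell)
print(sorted(cut), sorted(tuple(sorted(b)) for b in bridges))
# [2, 3, 4] [(2, 3), (3, 4)]
```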
The running time of the algorithm comes mainly from its first step, the computation of the NodeRanks. For a large graph, we can find its linkage pattern offline, or gradually obtain parts of the graph on the fly and then filter them.
Having presented the approaches for ranking nodes and the basic requirement for filtering, the remaining question is how a graph with ranked nodes should be filtered. In the following we present two models for filtering.
4.2 Global filtering
With the ranked nodes, a filtered graph is generated by suppressing all the nodes whose NodeRanks are below a threshold specified and adjusted by users.
Denoting the threshold as t, the set of nodes remaining after filtering is

$$V' = \{\, i \mid i \in V \wedge s(i) \ge t \,\}$$

where s(i) is the NodeRank of node i.
An alternative way of obtaining the threshold is to specify what proportion of nodes will be retained after filtering:

$$\frac{|V'|}{n} \le t$$

With a given threshold, the procedure for global filtering begins by ranking the nodes in decreasing order of their NodeRanks, and then removes (1 − t)n nodes one by one, starting from the node with the smallest score.
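Both forms of global filtering reduce to a selection over the NodeRank scores, as in the short Python sketch below. It is illustrative only: the scores are assumed to be precomputed, rounding (1 − t)n up to a whole number of removed nodes is our assumption, and the connectivity safeguard of Section 4.1 is omitted for brevity. The function names are hypothetical.

```python
import math


def filter_by_score(rank, t):
    """Score threshold: V' = {i | i in V and s(i) >= t}."""
    return {i for i, s in rank.items() if s >= t}


def filter_by_proportion(rank, t):
    """Proportion threshold: remove ceil((1 - t) n) lowest-ranked nodes, so |V'| / n <= t."""
    n = len(rank)
    ordered = sorted(rank, key=rank.get, reverse=True)   # decreasing NodeRank
    n_removed = math.ceil((1 - t) * n)
    return set(ordered[: n - n_removed])


rank = {"a": 0.9, "b": 0.1, "c": 0.5, "d": 0.7}
print(sorted(filter_by_score(rank, 0.6)), sorted(filter_by_proportion(rank, 0.5)))
# ['a', 'd'] ['a', 'd']
```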
The change in a graph caused by filtering is measured by the density of the graph (Faust and Stanley, 1995):

$$\nabla(G) = \frac{2|E|}{n(n-1)} = \frac{\mathrm{Deg}(G)}{n-1}$$

where $\mathrm{Deg}(G) = 2|E|/n$ is the average degree of the nodes in the graph. This equation quantifies the proportion of possible edges that are actually present in a graph. In other words, the density of a graph is the average, over all nodes, of the proportion of the other n − 1 nodes to which each node is adjacent.
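The density is straightforward to compute from adjacency lists. The sketch below assumes an undirected simple graph with at least two nodes; the function name is illustrative. The example shows how removing a peripheral node can increase the density.

```python
def density(adj):
    """Density of an undirected simple graph: 2|E| / (n(n - 1)) = Deg(G) / (n - 1)."""
    n = len(adj)
    n_edges = sum(len(nbrs) for nbrs in adj.values()) // 2   # each edge is listed twice
    avg_degree = 2 * n_edges / n                             # Deg(G)
    return avg_degree / (n - 1)


path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
trimmed = {1: [2], 2: [1, 3], 3: [2]}          # the same path with end node 0 filtered out
print(density(path), round(density(trimmed), 4))   # 0.5 0.6667
```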
To summarise, we formalise the requirements and properties of filtering graphs. Let F be the set of filtered nodes and their associated edges, t be a threshold on the NodeRank, and G′ = (V′, E′) be the filtered graph. The following properties should hold:
• $F = \big(\{\, i \mid i \in V - k(G) \wedge s(i) < t \,\},\ \{\, (i, j) \mid (i, j) \in E - \lambda(G) \wedge (i \in V - V' \vee j \in V - V') \,\}\big)$
• $G' = G - F$
• $|G'| \le |G|$, where $|V'| \le n$ and $|E'| \le |E|$
• G′ is connected; and
• $\nabla(G') \ge \nabla(G)$.
A filtered graph not only reduces the visual complexity, but also captures the main
relationships between objects in the original graph.
4.3 Fisheye view
Furnas (1981, 1986) devised the fisheye lens model as an efficient way of showing large graphs. This model uses a ‘Degree Of Interest’ (DOI) function to filter (called filtering fisheye views) or distort the display of information so as to emphasise relevant or interesting elements according to their importance. The DOI of a node i is a function of the A Priori Importance (API) of i and the Distance (D) between i and the focus node f (Furnas, 1981, 1986):
$$\mathrm{DOI}(i) = \mathrm{API}(i) - D(i, f)$$
It is clear that the DOI of a node increases with API and decreases with the distance.
In general, the importance and distance factors are related to the size and level in a
hierarchy, or some other characteristics of objects in the collection. In the case of
filtering a graph, API(i) is naturally given by the importance of node i, which is exactly what the NodeRank measures. As a result, we have:

$$\mathrm{API}(i) = s(i)$$
As for the distance, D(i, f) is the graph-theoretic distance between a node i and the focus node v_f, that is, D(i, f) = d(i, v_f). Let $\bar{d}$ denote the maximum distance over all pairs of nodes, which is the diameter of the graph:

$$\bar{d} = \max_{i,\, v_f \in V} d(i, v_f)$$

Normalising the distance by the diameter, the DOI can therefore be rewritten as:

$$\mathrm{DOI}(i) = \mathrm{API}(i) - D(i, f) = s(i) - \frac{d(i, v_f)}{\bar{d}}$$
With this normalisation, the value of DOI(i) ranges within (−1, 1]. A node i is visible in the filtered graph if DOI(i) ≥ t and invisible otherwise, that is,

$$V' = \{\, i \mid i \in V \wedge \mathrm{DOI}(i) \ge t \,\}$$

where t is a threshold. The effect of the filtering fisheye view is that nodes in the neighbourhood of the focus node are shown in detail, while only the most important nodes in more distant areas are displayed.
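A short Python sketch of the filtering fisheye view is given below. It assumes an unweighted, connected graph, precomputed NodeRanks scaled to [0, 1] and a focus node chosen by the user; the names are hypothetical. The diameter is computed by breadth-first search from every node, which is adequate for small graphs.

```python
from collections import deque


def bfs_distances(adj, source):
    """Graph-theoretic distances d(source, v) in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def fisheye_filter(adj, rank, focus, t):
    """Keep nodes with DOI(i) = s(i) - d(i, v_f) / diameter >= t."""
    all_dist = {u: bfs_distances(adj, u) for u in adj}
    diameter = max(max(d.values()) for d in all_dist.values())
    doi = {i: rank[i] - all_dist[focus][i] / diameter for i in adj}
    return {i for i in adj if doi[i] >= t}, doi


path5 = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
rank = {0: 0.5, 1: 0.5, 2: 0.5, 3: 0.5, 4: 0.9}    # node 4 is far away but important
visible, doi = fisheye_filter(path5, rank, focus=0, t=-0.2)
print(sorted(visible))   # [0, 1, 2, 4]
```

With t = −0.2, node 4 stays visible although it is at the maximum distance from the focus, because its NodeRank is high, whereas the less important node 3 is hidden.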