
A Space Efficient Clustered Visualization of Large Graphs

Mao Lin Huang and Quang Vinh Nguyen

Faculty of Information Technology, University of Technology, Sydney, Australia

maolin@it.uts.edu.au, quvnguye@it.uts.edu.au

Abstract

This paper proposes a new technique for visualizing large graphs with
tens of thousands of vertices and edges. To achieve graph abstraction,
a hierarchical clustered graph is extracted from a general large graph
based on the community structures discovered in the graph. An enclosure
geometrical partitioning algorithm is then applied to optimize the use
of space. For graph drawing, we combine a spring-embedder algorithm
with circular drawings to achieve both efficient use of display space
and aesthetic quality. We also discuss an interaction mechanism that
accompanies the layout solution. It not only allows users to navigate
hierarchically up and down through the entire clustered graph, but also
provides a way to navigate multiple clusters concurrently. Animation is
also implemented to preserve users' mental maps during the interaction.

1 Introduction

Graph visualization has been widely used in human-

computer interaction. A graph commonly includes a node

set and an edge set to represent entities and relationships

between entities respectively. In real-world applications,

graphs could be very large with thousands or perhaps mil-

lions of nodes, such as citation and collaboration networks

and the World Wide Web (WWW). As networks grow rapidly in size,
large-scale visualization has become one of the most active topics in
Information Visualization. How to display large graphs comprehensibly
on the screen becomes the key issue in graph visualization. However,
displaying large graphs can significantly degrade the performance of a
visualization technique that normally performs well on small or
medium-sized datasets. Large graph visualization usually suffers from
poor running time and limited display space. In addition, the issues of
"view-ability" and usability also arise because it is almost impossible
to discern between nodes and edges when a dataset of thousands of items
is displayed [13].

It seems that classical graph models with a simple node-

link diagram tend to be inadequate for large scale visual-

ization with several thousands of items. The lack of formal

hierarchical structures in real world applications could limit

the conveying and perception of the complicated informa-

tion. Figure 1 shows an example of the graph visualization

of a WWW site which illustrates two typical major prob-

lems:

1. There are too many nodes (pages) to display, and the layout of such
a large geometrical area cannot fit on a single screen.

2. The layout of the graph uses the display space inefficiently, with
many unused areas in the display.

Figure 1. An example of a large graph visu-

alization using the classic virtual-page tech-

nique.

To solve the first problem, a new, well-established graph model that
accommodates the visualization of large graphs is required. We believe
that one way to deal with the display of large graphs is to provide
users with a certain degree of graph abstraction, that is, to filter
out some details

Fourth International Conference on Image and Graphics

© 2007 IEEE

DOI 10.1109/ICIG.2007.10


of the graph drawing in which the users are not interested at a given
time, while the overall structure of the graph drawing is maintained
for navigation.

Among the available graph visualization approaches, we believe that
clustered graphs are a better option for graph abstraction. It is
believed that a good visualization system for very large graphs should
combine three components: graph drawing, graph clustering and
interaction [15] [4]. Therefore, visualization of clustered graphs,
such as [7] [9], is an excellent approach to dealing with large graphs
through graph abstraction. A clustered graph can be extracted from a
general graph by recursively partitioning the graph into a hierarchy of
sub-graphs for clustered visualization, which simplifies the complex
structures of large graphs through global abstraction for easy
interpretation, perception and navigation of large information spaces.

To solve the second problem mentioned above, we need

to optimize the layout algorithms so that the utilization of

display screen could be maximized. As a result it allows

more nodes to be displayed. Research by Ware [21] shows that more
information can be displayed on a large, very high-resolution screen,
but this does not necessarily deliver much more information to the
brain. This is because a conventional monitor covers only 5-10% of the
visual field under normal conditions, but it uses as much as 50% of
brain pixels [22]. The study also shows that the uniquely

stimulated brain pixels peak at the width of a normal moni-

tor view, and it is effective (but not critical) to increase the

number of pixels for the normal desktop to reach the limit of

the brain pixels. Therefore, the investigation of optimized

visual abstraction (clustering) techniques that could provide

viewers with more comprehensive views of the large graphs

is important.

2 Related Work

Large graph visualization has recently received a lot of attention from
researchers in both the information visualization and graph drawing
communities. Although some recently available techniques are quite
capable of visualizing large graphs of thousands to hundreds of
thousands of nodes and edges, to name a few [3] [12] [20] [1] [11]
[10], the visualization of large graphs is still one of the hot topics
in information visualization.

Harel and Koren [12] described a technique that draws a graph using
high-dimensional embedding and then projects it onto a 2D plane.
Although this technique is very fast and capable of exhibiting graphs
in various dimensions with good navigational ability, it is more
suitable for visualizing mesh-like graphs than tree-like graphs or
clustered graphs. One good approach to handling large graphs is
multi-scale visualization [11] [3] [5]. These techniques typically
apply a force-directed algorithm to draw large graphs using a
multi-scale scheme, in which they try to beautify the coarsest-scale
representation. These techniques aim to improve processing speed while
maintaining the aesthetic quality of the graph. Good visualization of
large graphs can also be achieved by using multilevel techniques [20]
[2]. In short, techniques in this approach aim to improve the visual
appearance of the visualization by defining different levels for a
structure so that they can present the graphs using an optimal
algorithm at each level.

Although the above techniques are quite capable of visualizing large
graphs, they do not consider space efficiency, which could limit the
amount of information that can be visualized on the screen at a time.
Fekete et al. [8] presented a space-efficient visualization of graphs
using a modification of the well-known Tree-maps [14]. Technically, the
authors used Tree-maps to display the tree structure of a graph and
used explicit link curves to present the other links. This technique is
optimal in terms of display space usage and it is quite useful for
visualizing structures whose underlying trees have some meaning.
However, the technique does not perform well on general graphs and
clustered graphs because the link curves may give the graphs an
unnatural look. Some preliminary work has been carried out, from which
two tree visualization techniques, Space-Optimized tree (or SO-tree)
[17] and EncCon tree [18], have been developed that can quickly display
large trees with maximized utilization of display space. However, these
techniques are only suitable for trees (hierarchical structures).

This paper proposes a new technique for visualizing

very large general graphs. Our technique is similar to the

framework of Tulip [3] which consists of three components:

graph clustering, graph layout and interaction. We first use a new
clustering algorithm to partition the complete graph into abstract
clusters to achieve view abstraction; this greatly reduces the visual
complexity of the graph layout and enhances the comprehension and
understanding of the graph.

The clustered graph is then visualized by using a new

space-efficient layout technique which is a combination of

layout algorithms. This geometrical optimization of graph

layout allows more data items (and clusters) to be displayed

within a limited screen resolution. Our visualization pro-

vides viewers with not only an abstract view of the en-

tire graph but also an interaction technique for navigating

through the large graph. The navigation method allows

users to navigate hierarchically through the clustered graph

and navigate across a number of selected clusters. All the interactions
are accompanied by animation to preserve users' mental maps during the
navigation.


3 The Architecture of Our Visualization

Our model for visualizing large graphs includes several processes,
which are illustrated in Figure 2. The model involves two major phases:
clustering analysis and the user interface. These two phases operate
independently.

Figure 2. An architecture model for visualiz-

ing large graphs.

The clustering analysis phase is responsible for analyz-

ing a large graph dataset and partitioning it into an opti-

mized clustered graph based on discovered internal commu-

nities. In short, the clustering algorithm recursively divides

the graphs into smaller and smaller sub-graphs based on the

density of connection within and between subgroups. Although this
process does not require a very fast algorithm and it can run
independently on a high-speed workstation, the computational cost of
the clustering algorithm should be kept to a worst-case running time of
O(n^2) on a sparse graph, or faster, to ensure that it can handle
hundreds of thousands of items within a few hours on an ordinary
personal computer. The clustering process also extracts the attributed
properties for nodes, edges and relations between sub-graphs.

The user interface phase is responsible for visualization

and navigation of the clustered graph, including the pro-

cesses of layout optimization, interactive viewing and dis-

play optimization. A combination of a number of layout

algorithms is employed which aims to optimize the geomet-

rical space so that the large graph can be drawn at a normal

screen size.

During the navigation of clustered graphs, we allow

users to interactively adjust the views to reach an optimized

representation of the graph; from which users can obtain

the best understanding of the data and structures they are

currently interested in. This visualization involves real-time
human-computer interaction. Therefore, very fast graph layout and
navigation algorithms are required for handling hundreds of thousands
of items within minutes or seconds using a personal computer with
limited display space and computational power.

The final display is created through the view navigation

and graphical attributing. We use rich graphic attributes to help
viewers quickly identify the domain-specific properties associated with
data items. We next briefly describe the clustering and then the
technical detail of our visualization technique.

4 Graph Clustering

We use a graph clustering method which can quickly discover the
community structure embedded in large graphs and divide the graph into
densely connected sub-graphs. Our clustering algorithm is a
modification of Newman's algorithm [16] which considers the equality of
size among clusters during the partitioning. The proposed algorithm not
only runs in time O((m+n)n), similarly to [16], but also achieves a
consistent partitioning result in which a graph is divided into a set
of clusters of similar size. The balanced size of clusters could
provide users with a clearer view of the clustered graph and thus makes
it easier for users to visualize and navigate large graphs. Our
balanced clustering technique also achieves layout optimization at both
global and local levels of the display through the use of the
enclosure+connection layout technique. This allows more visual items to
be displayed within limited screen resolutions with comprehensive
views. The combination of our clustering method and a space-efficient
layout technique would enable the visualization of very large general
graphs with several thousands of elements.
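As a rough illustration of what "densely connected sub-graphs" means in practice, the sketch below scores a given partition with the standard modularity measure. It assumes that [16] refers to Newman's greedy, modularity-based community detection, which is consistent with the quoted O((m+n)n) running time but is not stated in the text. The function name `modularity` and its arguments are hypothetical, and the size-balancing modification described above is not specified in enough detail to reproduce, so it is not shown.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Modularity Q of a partition of an undirected, unweighted graph.

    edges        -- list of (u, v) pairs, each undirected edge listed once
    community_of -- dict mapping every vertex to a community label
    """
    m = len(edges)
    if m == 0:
        return 0.0
    internal = defaultdict(int)    # edges with both ends in the community
    degree_sum = defaultdict(int)  # sum of vertex degrees per community
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    # Q = sum over communities of (fraction of internal edges)
    #     minus (expected fraction under random wiring).
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)


# Two triangles joined by a single bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
print(modularity(edges, split))
```

A partition that keeps each triangle together scores higher than one that puts all six vertices in a single cluster, which is the kind of signal a community-based clustering step relies on.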

5 Graph visualization

We use a new space-efficient visualization technique, similar to EncCon
[18], to optimize the geometrical space for visualizing large clustered
graphs with several thousands of nodes and edges. This technique
consists of two components: the space-efficient layout and the
interactive navigation. The layout of the clustered graph is generated
by using


a combination of an extended fast enclosure partitioning algorithm,
called Clenccon, and a number of traditional graph drawing algorithms,
including a spring-embedder algorithm [6], a circular drawing, and a
simple layout algorithm, to achieve the objectives of space efficiency,
aesthetic quality and fast computation. Although some available
techniques can use any layout algorithm at each cluster, such as a
spring-force algorithm [20] [11], we believe that choosing a
space-efficient enclosure-partitioning algorithm at high levels will be
more efficient because it provides more space for displaying
information on the limited display.

The Clenccon layout algorithm is applied only to non-leaf sub-graphs,
for which space utilization and computational cost are crucial. In
other cases, the other layout algorithms, such as spring-embedder and
circular drawing, are applied to calculate the positions for leaf
sub-graphs, which contain a small number of nodes; there the space
utilization issue becomes less important and, therefore, aesthetic
quality and flexibility need more consideration. The choice of a
particular layout algorithm depends on the nature of the leaf
sub-graphs. Our system also displays a high-level node-link diagram to
present the overall clustering structure explicitly (see examples in
Figure 4 and Figure 5).

Technically, the algorithm inherits the essential advantage of
space-filling techniques [14] [18], which maximize the utilization of
display space by using area division for the partitioning of sub-trees
and nodes. Note that space utilization becomes significantly more
important when visualizing large graphs with thousands or even hundreds
of thousands of nodes and edges because of the limited number of screen
pixels. Like EncCon [18], it uses a rectangular division method for
recursively positioning the nodes hierarchically. This property aims to
provide users with a more straightforward way to perceive the
visualization and it ensures the efficient use of display space. Our
new technique is applied to clustered graphs rather than simple tree
structures; therefore, the algorithm takes the connectivity between
sibling nodes into account in its partitioning process. We now describe
the technical detail of our layout and navigation algorithms.

5.1 Layout Algorithm

Our layout algorithm is responsible for positioning all nodes of a
given clustered graph in a two-dimensional geometrical space, including
a vertex subset {v_1, v_2, ..., v_n} in V and a cluster subset
{v'_1, v'_2, ..., v'_n} in V'. A clustered graph C = {G, T} is derived
from a general graph G = {V, E} and a cluster tree T whose leaves are
in V. Each cluster v'_i = C' is a sub cluster graph that contains a
subset of V given by the leaves of the sub-tree T' rooted at r'. The
root r' of the sub-tree T' is also called a super-node. The super-nodes
are not displayed in our visualization, but they are used in the
partitioning process to calculate the local region for each sub cluster
graph. For the partitioning of clustered graph C, we define a virtual
tree consisting of a set of super-nodes for area division. We define a
super-node r(v'_i) for each cluster v'_i. Further description of the
clustered graph can be found in [9].

The layout algorithm is a combination of two algorithms: 1) Clenccon, a
fast area division algorithm, and 2) graph drawing algorithms including
a spring-embedder algorithm, a circular drawing and a simple algorithm
to lay out a very small number of nodes.

Each cluster v'_i is bounded by a rectangular local region R(v'_i)
centered at super-node r(v'_i). The drawing of the corresponding
sub-clustered-graph G(v'_i) is restricted to be inside the geometrical
area of R(v'_i). Therefore, the local region R(v'_i) of cluster v'_i is
the sum of the rectangular areas assigned to its children. The position
of the super-node r(v'_i) of v'_i is at the centre of the rectangle
defined by R(v'_i). The position of leaf nodes is defined by either the
spring embedder, the circular drawing or the simple layout algorithms.

We first assign the entire rectangular display area as the local region
to the clustered graph C. We then recursively partition the local
regions for every sub-cluster until all the clusters are reached.

We assign a weight w(v') to each vertex v' for the calculation of the
local region R(v') of the vertex. Although the weight of each vertex
can be associated with its property, all the leaf vertices in our
experiments have the same weight. Suppose the rectangular local region
R(v') for cluster v' is drawn; we then need to calculate the local
regions {R(v'_{l+1}), R(v'_{l+2}), ..., R(v'_{l+k})} for its
sub-clusters {v'_{l+1}, v'_{l+2}, ..., v'_{l+k}}. The partitioning
ensures that the area of each rectangle R(v'_{l+i}) is proportional to
the weight w(v'_{l+i}) of the cluster v'_{l+i}. The calculation of
w(v') of a cluster v' is done recursively from the leaves of the
cluster tree to the root of the cluster tree. The calculation is done
by the following formula:

    w(v') = w_0 + S \sum_{i=1}^{k} w(v'_{l+i})        (1)

where w_0 is the internal weight of cluster v'. Although the internal
weight of a cluster can be defined by the cluster's attributed
property, we define the internal weight of all clusters to be 1 for all
experiments. S is a constant (0 < S < 1), and w(v'_{l+i}) is the weight
assigned to the ith child of cluster v'. The constant S determines the
size difference of local regions of all clusters based on the number of
descendants of those vertices.

The process of recursively partitioning R(v') into sub-regions
{R(v'_{l+1}), R(v'_{l+2}), ..., R(v'_{l+k})} for all its children
clusters {v'_{l+1}, v'_{l+2}, ..., v'_{l+k}} is illustrated by the
procedure below:

procedure partitioning (Node N) {
    if all child-nodes of N are leaf-nodes then
    {
        lay out the child-nodes using a graph algorithm;
        scale the layout to fit with rectangular local region;
    }
    else
    {
        lay out the child-nodes using Clenccon algorithm;
        for each non-leaf child-node of N
        {
            partitioning(child-node);
        }
    }
}

procedure Clenccon-layout (Node [] Nodes) {
    group linked-nodes into subgroups;
    sort the subgroups based on connection and size;
    lay out subgroups using EncCon algorithm;
    for each subgroup
    {
        lay out nodes in subgroup using EncCon algorithm;
    }
}

procedure childnode-layout (Node [] Nodes) {
    if number of child-nodes < K1 then
    {
        lay out the child-nodes using simple algorithm;
    }
    else if number of child-nodes > K1 and
            no. edges / no. child-nodes > K2 then
    {
        lay out the child-nodes using circular drawing;
    }
    else
    {
        lay out the child-nodes using Spring algorithm;
    }
}

where K1 = 6 indicates the number of nodes that is suitable for each
algorithm. If there are only a few nodes, i.e. fewer than 10 nodes, a
simple node location algorithm can perform well. K2 = 5 indicates the
ratio of the number of edges to the number of nodes. Because
force-directed algorithms do not usually perform well for a graph whose
number of edges is much larger than its number of nodes, we use the
circular drawing in this situation.
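The selection rule in the childnode-layout procedure can be written as a small function. The following is a minimal sketch that mirrors the pseudocode's conditions literally; the function name `choose_leaf_layout` and the string labels it returns are hypothetical, while the default thresholds are the K1 and K2 values quoted in the text.

```python
K1 = 6  # node-count threshold quoted in the text
K2 = 5  # edges-per-node threshold quoted in the text


def choose_leaf_layout(num_nodes, num_edges, k1=K1, k2=K2):
    """Pick a layout for a leaf sub-graph, following childnode-layout."""
    if num_nodes < k1:
        return "simple"
    if num_nodes > k1 and num_edges / num_nodes > k2:
        return "circular"
    return "spring"


print(choose_leaf_layout(4, 3))     # few nodes
print(choose_leaf_layout(20, 150))  # dense: 7.5 edges per node
print(choose_leaf_layout(20, 30))   # sparse: 1.5 edges per node
```

Read literally, the conditions send a sub-graph with exactly K1 nodes to the spring algorithm regardless of its edge density, since it satisfies neither the first nor the second test.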

The detailed description of the EncCon area partitioning can be found
in [18]. Although there are several improved force-directed layout
algorithms, the traditional Spring-Embedder [6] layout algorithm is
chosen in our implementation. This is because the algorithm is simple,
easy to implement and flexible, and it performs well in general for a
small number of nodes, for which a more complicated algorithm is not
necessary. The circular drawing is a simple technique but is very
effective at showing the pattern of relationships for a graph whose
number of edges is much larger than its number of nodes. Technically,
we place all nodes equally on a circle where related nodes are located
close together so that the pattern of connections can be displayed more
clearly (see Figure 5 and Figure 6).
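Placing nodes equally on a circle amounts to spacing them at equal angles. The sketch below shows only that geometric step; the function name `circular_positions` and its parameters are hypothetical, and it assumes the caller has already ordered the nodes so that related nodes are adjacent in the list, since the text does not say how that ordering is computed.

```python
import math


def circular_positions(nodes, center=(0.0, 0.0), radius=1.0):
    """Return {node: (x, y)} with nodes spaced at equal angles on a circle.

    The input order is preserved around the circle, so neighbours in the
    list end up next to each other in the drawing.
    """
    n = len(nodes)
    positions = {}
    for i, node in enumerate(nodes):
        angle = 2.0 * math.pi * i / n
        positions[node] = (center[0] + radius * math.cos(angle),
                           center[1] + radius * math.sin(angle))
    return positions


for name, (x, y) in circular_positions(["a", "b", "c", "d"]).items():
    print(name, round(x, 3), round(y, 3))
```

In a clustered layout, the centre and radius would be chosen so that the circle fits inside the rectangular local region assigned to the leaf sub-graph.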

Figure 3(a) shows an example of partitioning and drawing a small
clustered graph using our algorithm. We can see that the algorithm uses
the Clenccon algorithm to lay out three cluster nodes and their
inter-relationships to ensure the efficient utilization of space and
uses the spring-embedder algorithm to draw the sub-graphs within each
cluster to achieve aesthetic quality and flexibility. Note that the
inter-relationships among clusters here are represented by abstract
links. Figure 3(b) shows the same example of the partitioning, but the
inter-relationships among clusters are represented by the original
structure of relationships. Figure 4, Figure 5 and Figure 6 illustrate
the application of our layout algorithm to various very large datasets.
These pictures show clearly the structure of clustered graphs, in which
sub-graphs are efficiently partitioned and drawn inside their local
regions. All of these pictures use abstract links to represent the
inter-relational structures among clusters.
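The weight recursion of formula (1) and the rule that each child's rectangle area is proportional to its weight can be illustrated with a short sketch. It is not the Clenccon or EncCon partitioning itself, which also decides where each rectangle goes. The nested-list tree representation, the function names, and the value S = 0.5 are assumptions made for illustration; the internal weight of 1 and the equal leaf weights follow the text.

```python
def weight(node, w0=1.0, s=0.5, leaf_weight=1.0):
    """Weight of a cluster-tree node following formula (1).

    A cluster is a Python list of children; anything else is a leaf vertex.
    """
    if not isinstance(node, list):
        return leaf_weight
    return w0 + s * sum(weight(child, w0, s, leaf_weight) for child in node)


def child_areas(cluster, parent_area, w0=1.0, s=0.5, leaf_weight=1.0):
    """Split parent_area among the children in proportion to their weights."""
    weights = [weight(child, w0, s, leaf_weight) for child in cluster]
    total = sum(weights)
    return [parent_area * w / total for w in weights]


# A root cluster holding one cluster of 2 leaves and one cluster of 4 leaves.
tree = [["a", "b"], ["c", "d", "e", "f"]]
print(weight(tree))              # 1 + 0.5 * (2.0 + 3.0)
print(child_areas(tree, 100.0))  # shares of a 100-unit display area
```

With S below 1, a cluster with twice as many leaves receives less than twice the area, which is how the constant damps the size difference between local regions.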

Figure 3. An example of partitioning and drawing a small clustered
graph using our algorithm: (a) using abstract links among clusters;
(b) using the original structure of relationships.

5.2 Navigational Views

No pure visualization technique can assist data retrieval without
providing users with an associated navigation mechanism in the
graphical user interface design. In our user interface component,
during navigation, we enable users to interactively adjust the views to
reach an optimized representation of the graph, from which users can
obtain the best understanding of the data and the relational structures
they are currently interested in.

In our prototype, we use a multiple-views technique [19] to achieve
focus+context view navigation of large clustered graphs. This technique
allows the exploration of data hierarchically as well as across
multiple selected clusters to quickly focus on the parts of the data
that are of interest. Therefore, users can semantically zoom in on one
or many areas of interest or sub-clusters (see Figure 9) while
retaining simplified context views. It effectively uses both focus and
context views
