
On Cross-Correlation Evaluation Model of

Internet Macroscopic Topology by Genetic

Algorithm

Ye XU

College of Information Science and Engineering, Shenyang Ligong University, Shenyang, China

Email: xuy.mail@gmail.com

Zhuo WANG

College of Information Science and Engineering, Shenyang Ligong University, Shenyang, China

Email: {zhuowang}@163.com

Abstract—Cross-correlation evaluation model, CCEM, was

studied mainly as a quantitative measure of how similar two different topologies are, and was further used to evaluate whether a topology produced by an Internet topology model is close to the real Internet. SLS (Signless Laplacian Spectra) is used to quantify the topological properties of both the Internet generated by the model and the Internet obtained from real measurement. This procedure yields SLS eigenvectors, and a cross-correlation calculation on these eigenvectors then quantifies the difference between the two topologies. Finally, a recommended way of using the CCEM within a Genetic Algorithm is given.

Index Terms—Cross-correlation, Internet topology,

topology evaluation, SLS

I. INTRODUCTION

Internet topology modeling has recently become a hot topic in Internet-related research [1]. Research on Internet topology modeling has passed through roughly three phases from the 1980s until now [4]. The most recent work has helped greatly in uncovering the characteristics of Internet topology since Faloutsos found a power-law distribution in the Internet topology structure in 1999 [6]. Building on these power-law findings, researchers have developed many kinds of Internet topology models. All of these models give a mathematical way to construct an Internet topology; however, they can only be called qualitative or quasi-quantitative models, because the way they are constructed is not completely quantitative.

To construct a completely quantitative model, a quantitative evaluation algorithm is necessary. The Cross-correlation Evaluation Model, CCEM, which combines methods from graph theory, spectral density [5] and correlation algorithms [15], is studied in this paper.

A. Spectral density introduction

An undirected graph G can be represented by its symmetric adjacency matrix A. If there is a link between node i and node j in G, then Aij=Aji=1; otherwise Aij=Aji=0. The eigenvalues of G are the eigenvalues of its matrix A, denoted λ1, λ2, …, λn. Research in graph theory shows that the eigenvalues of a graph are closely related to the structural properties of the graph topology, so studying a graph's eigenvalues is useful in topology research.

The spectrum of a graph G is the set of eigenvalues of its adjacency matrix A together with their multiplicities [2], and it is denoted as in Eq. (1).

$$\mathrm{Spec}(G) = \begin{pmatrix} \lambda_1 & \cdots & \lambda_n \\ m_1 & \cdots & m_n \end{pmatrix}. \qquad (1)$$

where $m_i$ is the multiplicity of the eigenvalue $\lambda_i$.

Spectral density, $\rho(\lambda)$, is the eigenvalue density of the adjacency matrix A, and it can be written as [2] [3] [4]:

$$\rho(\lambda) = \frac{1}{N} \sum_{i=1}^{N} \delta(\lambda - \lambda_i). \qquad (2)$$

where $\lambda_i$ is the ith eigenvalue of the adjacency matrix A, and N is the number of eigenvalues. $\rho(\lambda)$ approaches a continuous function when $N \to \infty$.
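As a minimal illustration of Eq. (1) and Eq. (2), the sketch below computes the eigenvalues of a small adjacency matrix with NumPy and reports, for each distinct eigenvalue, its share of the spectrum. The function name `spectral_density`, the rounding to four decimals (used only to group numerically equal eigenvalues) and the four-node star graph are assumptions made for this example; they are not taken from the paper.

```python
from collections import Counter

import numpy as np


def spectral_density(adjacency, decimals=4):
    """Map each distinct eigenvalue of a symmetric 0/1 matrix to its share of the spectrum."""
    a = np.asarray(adjacency, dtype=float)
    eigenvalues = np.linalg.eigvalsh(a)              # real eigenvalues of a symmetric matrix
    rounded = np.round(eigenvalues, decimals) + 0.0  # adding 0.0 turns -0.0 into 0.0
    counts = Counter(rounded.tolist())               # eigenvalue -> multiplicity m_i
    n = len(rounded)                                 # N in Eq. (2)
    return {lam: m / n for lam, m in sorted(counts.items())}


# Hypothetical example graph: a star with node 0 linked to nodes 1, 2 and 3.
star = [[0, 1, 1, 1],
        [1, 0, 0, 0],
        [1, 0, 0, 0],
        [1, 0, 0, 0]]
density = spectral_density(star)
for lam, rho in density.items():
    print(f"lambda = {lam:8.4f}   rho = {rho:.4f}")
```

For this star graph the eigenvalue 0 has multiplicity 2 out of 4, so it carries half of the density, while the two non-zero eigenvalues are symmetric about zero.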

B. Experiment samples

Experiment samples in this paper are router-level Internet samples measured by CAIDA¹. The raw measurements are router-level Internet topology data collected on 30 Jan. 2006² from as many as twenty-one CAIDA monitors³. After IP alias resolution, we obtain twenty-one sets of measurement samples.

¹ CAIDA, the Cooperative Association for Internet Data Analysis, is a worldwide research center for Internet-related research. CAIDA has more than thirty monitor nodes distributed throughout the world, measuring and monitoring the Internet.

This work is supported by the National Natural Science Foundation of China (60802031).

JOURNAL OF NETWORKS, VOL. 6, NO. 2, FEBRUARY 2011

© 2011 ACADEMY PUBLISHER

doi:10.4304/jnw.6.2.230-237

Next we handle sampling bias. First, we merge the measurements from all twenty-one monitors into one complete test sample in order to reduce the impact of sampling bias as far as possible. This most complete sample is the key sample in the experiments of this paper.

For comparison, we also build several less complete test samples: sample(1) comprises data from only one monitor (arin), sample(2) comprises data from two monitors (arin, b-root), and so on up to sample(20), which comprises data from twenty monitors.

Finally, we obtain an experimental Internet sample with 1,145,841 routers (nodes) and 2,907,638 links from the twenty-one monitors. After IP alias resolution [16] [17], the size of the sample is reduced to 29,367 routers and 190,280 links [12], which is still too large to be handled easily by computer.

To simplify the computation, we perform a second-order sampling (re-sampling) operation on the experiment samples. The re-sampling rules are:

1) The re-sampling operation is completely random; it can start from any effective node in the target graph;

2) The re-sampled result must be a connected graph;

3) The re-sampled result should cover as many nodes as possible, i.e., node selection takes priority over link selection.

Finally, the re-sampled Internet topology graph is converted into an adjacency matrix for further calculation.
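The three re-sampling rules can be sketched as a random frontier expansion. The paper does not spell out its exact procedure, so everything below is an assumption for illustration: the function name `resample_connected`, the choice to grow the node set one random neighbouring node at a time, and the choice to keep every original link between selected nodes.

```python
import random


def resample_connected(edges, target_nodes, seed=None):
    """Draw a random connected sub-graph and return (node list, adjacency matrix)."""
    rng = random.Random(seed)
    neighbours = {}
    for a, b in edges:
        if a == b:
            continue                                  # ignore self-loops
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    start = rng.choice(sorted(neighbours))            # rule 1: random start node
    chosen = {start}
    frontier = set(neighbours[start])
    while frontier and len(chosen) < target_nodes:
        node = rng.choice(sorted(frontier))           # rule 2: always adjacent to the sample
        chosen.add(node)                              # rule 3: grow by nodes, not by links
        frontier.discard(node)
        frontier.update(neighbours[node] - chosen)

    order = sorted(chosen)
    index = {node: i for i, node in enumerate(order)}
    matrix = [[0] * len(order) for _ in order]
    for a, b in edges:                                # keep links between selected nodes
        if a != b and a in index and b in index:
            matrix[index[a]][index[b]] = 1
            matrix[index[b]][index[a]] = 1
    return order, matrix


# Hypothetical toy topology with five routers.
toy_edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (1, 3)]
nodes, adjacency = resample_connected(toy_edges, target_nodes=3, seed=1)
```

Because every added node is a neighbour of an already selected node, the result is connected by construction, and the returned matrix is symmetric with a zero diagonal.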

II. POSSIBILITY OF USING SPECTRAL DENSITY IN

DISTINGUISHING TOPOLOGY GRAPHS

Before using spectral density to construct CCEM, we first test whether it can distinguish topology graphs (including the Internet topology).

Three representative graphs are selected for the test in this paper: the ER random graph, the scale-free graph and the Internet topology graph.

A. Distinguishing ER random graph

According to [3], the spectral density of an ER random graph converges to a semicircle, and the lower part of the semicircle follows an exponential distribution, as shown in Fig. 1 from Ref. [3].

² Topology data measured on 30 Jan. 2006 were chosen because as many as twenty-one monitors provided effective measurement data that day. On other days around that time, fewer monitors were effective.

³ The twenty-one monitors are arin, b-root, cam, cdg-rssac, champagne, d-root, e-root, h-root, i-root, iad, ihug, k-root, lhr, m-root, mwest, neu1, nrt, riesling, sjc, uoregon and yto. The monitors are spread across different continents so that the Internet is measured throughout the whole world.

Figure 1. Diagram of ER Random Graph’s spectral density. The axes

are calibrated by (Np(1-p))^0.5

B. Distinguishing scale-free graph

The spectral density of a scale-free graph generated by the BA model [3][8][9][10][11] is a symmetric, continuous curve with a triangular center and two power-law distributed sides, as shown in Fig. 2 from Ref. [8].

Figure 2. Diagram of Scale-free Graph’s spectral density. The axes are

calibrated by (Np(1-p))^0.5

C. Distinguishing Internet topology graph

Fig. 1 and Fig. 2 show that different graphs have quite different spectral diagrams. Spectral density can therefore be used as a tool to distinguish graphs.

The Internet topology graph is known to differ from both the ER graph and the scale-free graph, although it is somewhat similar to the scale-free one [1][6][12]. We therefore examine whether the Internet topology graph can be distinguished from the scale-free one.

For simplicity and easier comparison, we draw three Internet graphs with the re-sampling tool mentioned above. After re-sampling, the three samples have 30 nodes and 29 links, 300 nodes and 536 links, and 500 nodes and 753 links, respectively. Their eigenvalues and spectral densities are listed in Table I.


TABLE I
EIGEN VALUES AND SPECTRAL DENSITY OF THREE RE-SAMPLED INTERNET GRAPHS

       30 ips                 300 ips                 500 ips
λ (13)      ρ(λ)        λ (104)     ρ(λ)        λ (112)     ρ(λ)
-3.2196     0.0333      -8.7818     0.0033      -10.7058    0.0020
-2.6318     0.0333      -8.0004     0.0033      -10.2681    0.0020
…           …           …           …           …           …
-0.5663     0.0333      -0.1767     0.0033      -0.2635     0.0020
-0.0000     0.5333      -0.0000     0.5567      -0.0000     0.7320
0.5663      0.0333      0.1479      0.0033      0.1113      0.0020
…           …           …           …           …           …
2.6318      0.0333      8.8174      0.0033      10.9470     0.0020
3.2196      0.0333      14.1650     0.0033      12.3570     0.0020

Note: The value in the bracket is the total number of the eigen values.

Table I shows that the spectral density is symmetric, which is consistent with the spectral symmetry of scale-free graphs found in [3], [8]. This match shows, at a coarse granularity, that the Internet graph bears some similarity to the scale-free graph, as mentioned previously.

However, the graphs also differ, and the spectral diagrams of the Internet graphs are plotted in Fig. 3 for comparison.

Figure 3. Spectral density diagrams of three Internet graphs. The

sub-graph in the top-right is a plot zoomed in to [-5, 5] in axis x and [0,

0.2] in axis y for a better view

From Fig. 3, we first find that the three re-sampled graphs (30 ips, 300 ips and 500 ips) conform completely in several respects: two small peaks at λ = ±1.0000, one distinct peak at λ = 0, and ρ(λ) < 0.005 for all λ < -1.0000 and λ > 1.0000.

The three graphs differ considerably in size and content (specific routers and links) because of the re-sampling rules, yet the conformity in Fig. 3 shows that spectral density gives similar results even when it is computed on different parts of the Internet. We therefore conclude that spectral density is suitable for representing the characteristics of the real Internet graph.

Next, we compare Fig. 3 with the scale-free graph (Fig. 2) and find that the center of the three spectral density curves in Fig. 3 is triangular, which is similar to the scale-free graph. The two side parts, however, differ from the scale-free graph, since they follow neither an exponential distribution nor a power-law distribution. Spectral density is therefore suitable for distinguishing the Internet graph from the scale-free graph.

We then distinguish the Internet graph from the ER graph by comparing Fig. 3 with Fig. 1; the differences between the two figures are easy to see. We therefore conclude that spectral density is suitable for distinguishing the Internet graph from the ER graph.

Because spectral density also gives a quantitative description of Internet topology characteristics, we use it in CCEM for Internet topology modeling.

III. INTERNET TOPOLOGY CHARACTERS DISCOVERED BY

SPECTRAL DENSITY

A. General spectral density

For a better view of the spectral distribution, we calibrate the coordinate system by a factor of $\sqrt{Np(1-p)}$ to make a new one with axis X as $\lambda/\sqrt{Np(1-p)}$ and axis Y as $\rho(\lambda)\sqrt{Np(1-p)}$ [1] [3].

What's more, we enlarge the re-sampled Internet topology graphs from 30 ips, 300 ips and 500 ips (Fig. 3) to 300, 800, 2000, 3000 and 4000 ips (Fig. 4) so as to obtain graphs closer to the real Internet.

The more nodes a graph has, the closer it is to the real Internet. However, the graph with 4000 ips is the largest one in this paper, for two reasons: 1) computing power is limited, and the efficiency of the spectral density calculation decreases sharply once the graph grows beyond 4000 nodes; 2) Internet characteristics are expressed well by spectral density no matter how many nodes an Internet graph has. The second point was shown in Fig. 3 (graphs of different sizes conform in spectral density structure) and is shown again in Fig. 4.
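The axis calibration can be written out in a few lines. The paper does not define p at this point, so the sketch assumes p is the link density of the sample, p = 2E / (N(N-1)) for N nodes and E links; the function name `rescale_spectrum` is likewise hypothetical.

```python
import math


def rescale_spectrum(points, num_nodes, num_links):
    """Rescale (lambda, rho) pairs by sqrt(N p (1 - p)) as in the calibrated axes."""
    # Assumption: p is the fraction of possible links that are present.
    p = 2.0 * num_links / (num_nodes * (num_nodes - 1))
    scale = math.sqrt(num_nodes * p * (1.0 - p))
    return [(lam / scale, rho * scale) for lam, rho in points]


# Largest eigenvalue of the 30-ip sample in Table I (30 nodes, 29 links).
rescaled = rescale_spectrum([(3.2196, 0.0333)], num_nodes=30, num_links=29)
```

Dividing λ and multiplying ρ(λ) by the same factor leaves the product λ·ρ(λ) unchanged, so the rescaling changes only the axes and not the area under the density.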

Figure 4. Spectral density of five re-sampled graphs. The sub-graph in

the top-right is a plot zoomed in to [-3, 3] in axis x and [0, 0.15] in axis

y for a better view

From Fig. 4, we find that the spectral densities of all five graphs conform very well despite their


different sizes. All five plots have their maximum at λ = 0 and their second maximum around λ = 0.5.

As with Fig. 3, the conformity among the five Internet graphs shows that a small Internet graph obtained with the re-sampling tool is enough to represent the key properties of the real Internet topology through spectral density. Experiments on the complete Internet topology graph are therefore no longer necessary for studying its properties; a much smaller re-sampled graph with an appropriate algorithm is also effective.

We now return to the basic idea of this paper: distinguishing topology graphs by comparing their spectral densities. The spectral density, however, is rather coarse-grained. Another especially valuable kind of spectral density, named Signless Laplacian Spectra (SLS), gives further and finer information on a graph's properties [14].

B. SLS

The SLS matrix |L| of a graph G is defined as |L|=D+A, where D is the diagonal matrix of G's node degrees and A is G's adjacency matrix [14]. The SLS is the set of eigenvalues of |L|. Some research in graph theory indicates that SLS is the best spectrum for distinguishing different graphs [14]. In this paper, SLS is applied to four re-sampled Internet topology graphs (3000 ips each). The result is illustrated in Fig. 5.
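The definition |L| = D + A translates directly into code. The sketch below builds |L| from an adjacency matrix and returns its eigenvalues in descending order; the function name `signless_laplacian_spectrum` and the four-node star used as input are assumptions for illustration only.

```python
import numpy as np


def signless_laplacian_spectrum(adjacency):
    """Return the eigenvalues of |L| = D + A, sorted in descending order."""
    a = np.asarray(adjacency, dtype=float)
    degree = np.diag(a.sum(axis=1))             # D: node degrees on the diagonal
    eigenvalues = np.linalg.eigvalsh(degree + a)  # ascending, real (|L| is symmetric)
    return eigenvalues[::-1]                    # descending order


# Hypothetical example graph: a star with node 0 linked to nodes 1, 2 and 3.
star = [[0, 1, 1, 1],
        [1, 0, 0, 0],
        [1, 0, 0, 0],
        [1, 0, 0, 0]]
sls = signless_laplacian_spectrum(star)
print(np.round(sls, 4))
```

For this star the spectrum is 4, 1, 1, 0, so even a tiny graph with several degree-one nodes on a common hub produces a repeated eigenvalue at 1.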

Figure 5. SLS analysis results on four 3000-ip graphs, where axis y is

in logarithm, and axis x is sorted by eigen values’ descending order

From Fig. 5, we first see that all four curves are highly similar although the four samples are completely random and different from each other. This is further evidence that the re-sampled samples can effectively represent the properties of the real Internet graph.

There are two evident horizontal lines where the SLS equals 1 (10^0) and 2, which means that the Internet topology graph has the most nodes at SLS=1 and the second-most nodes at SLS=2. All four samples clearly exhibit the same properties in Fig. 5.

For the other parts of Fig. 5, i.e., the parts with SLS>2 and SLS<1, we study further by fitting power-law distributions [1]. The fit results are illustrated in Fig. 6 and Fig. 7.

Fig. 6 shows an obvious power-law relationship between the SLS and its corresponding descending order, and the fitting ACC (absolute value of the correlation coefficient) is greater than 0.9, meaning that the fit is highly acceptable. The power-law relationship found here is consistent with the one found in the spectral density research on China CERNET in [1].

In Fig. 7, however, there is no clear power-law relationship, since the ACC is rather small. This can also be regarded as a criterion for identifying an Internet graph.

Figure 6. Power law distribution fitting results with descending eigen

value when SLS>2 of four re-sampled graphs.

Figure 7. Power law distribution fitting results with descending eigen

value when SLS<1.
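A power-law fit with an ACC value can be sketched as a straight-line fit on log-log axes. The paper does not state its fitting procedure, so ordinary least squares on (log rank, log eigenvalue) is an assumption here, as are the function name `power_law_fit` and the synthetic input sequence.

```python
import math


def power_law_fit(values):
    """Fit value ~ c * rank**slope; values must be positive and in descending order."""
    xs = [math.log(rank) for rank in range(1, len(values) + 1)]
    ys = [math.log(v) for v in values]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                          # power-law exponent
    coefficient = math.exp(mean_y - slope * mean_x)
    acc = abs(sxy) / math.sqrt(sxx * syy)      # absolute correlation coefficient
    return slope, coefficient, acc


# Synthetic sequence that follows an exact power law with exponent -0.5.
synthetic = [10.0 * rank ** -0.5 for rank in range(1, 51)]
slope, coefficient, acc = power_law_fit(synthetic)
```

On an exact power law the ACC is 1 up to rounding; an ACC above 0.9, as reported for SLS>2, indicates a nearly straight line on log-log axes, while a small ACC indicates that a power law describes the data poorly.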

C. Selection for CCEM

Compared with the general spectral density, SLS is better because 1) SLS is recommended as the best spectrum in Ref. [14]; 2) SLS is as good as the general spectral density at quantitatively identifying an Internet graph by its eigenvalue sequence, and better at revealing further characteristics of the Internet, such as the two horizontal phases at SLS=1 and SLS=2, one power-law


distribution part for SLS>2, and a non-power-law distribution for SLS<1.

So, SLS is selected for studying CCEM.

IV. CROSS-CORRELATION EVALUATION MODEL

A. Transformation from SLS to data sequence

To evaluate an Internet model is to determine the differences between the generated Internet topology and the real Internet topology. SLS eigenvalue sequences are introduced to determine these differences quantitatively.

The SLS eigenvalues form a numerical sequence that represents the primary characteristics of the target graph, i.e., the Internet topology graph. Given the two eigenvalue sequences, one for the real topology and one for the generated topology, the remaining problem is to find an effective algorithm that yields the evaluation result between them.

CCEM is then used to evaluate whether a given or generated topology is similar or identical to the real Internet topology. The first requirement of CCEM is to transform the SLS into a data sequence.

After the SLS eigenvalues are sorted in descending order, the data sequence is obtained and ready for the next evaluation step, as shown in Eq. (3) and Eq. (4).

$$u[m] = SLS[m]. \qquad (3)$$

$$v[n] = SLS[n]. \qquad (4)$$

where u[m] is the sequence of the real Internet topology, v[n] is the sequence of a given topology, and m and n denote the descending order of the SLS eigenvalues of the real Internet topology and the given topology, respectively.

B. Cross-correlation algorithm

The cross-correlation algorithm is capable of distinguishing and identifying the differences between numerical sequences in an absolutely quantitative way [15], and it is defined in Eq. (5).

$$\rho_{uv}(n) = \frac{r_{uv}(n)}{\sqrt{r_{uu}(0)\, r_{vv}(0)}}. \qquad (5)$$

where n is the disalignment lag between u[m] and v[n], $r_{uv}(n)$ is the cross-variance, and $r_{uu}(0)$ and $r_{vv}(0)$ are the autocorrelations of u[m] and v[n] with the disalignment lag set to 0, respectively. They are:

$$r_{uv}(n) = \frac{1}{N} \sum_{k=0}^{N-n-1} u(k)\, v(k+n). \qquad (6)$$

$$r_{uu}(0) = \frac{1}{N} \sum_{k=0}^{N-1} u^2(k). \qquad (7)$$

$$r_{vv}(0) = \frac{1}{N} \sum_{k=0}^{N-1} v^2(k). \qquad (8)$$

where N is the length of u[m] and v[n]. Let Nu=Length(u[m]), Nv=Length(v[n]), then:

$$N = \begin{cases} N_u, & \text{if } N_u = N_v \\ N_u + N_v - 1, & \text{if } N_u \neq N_v \end{cases}. \qquad (9)$$
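A direct transcription of the cross-correlation coefficient may help make the definitions concrete. The sketch below normalises the lagged cross term by the square root of the product of the two zero-lag autocorrelations, which is the usual form of the coefficient. Two choices are assumptions rather than statements from the paper: sequences of unequal length are zero-padded to the common length N, and negative lags are handled by shifting u instead of v. The function name `cross_correlation` and the sample sequences are hypothetical.

```python
import math


def cross_correlation(u, v, lag=0):
    """Normalised cross-correlation of two sequences at the given lag."""
    nu, nv = len(u), len(v)
    n = nu if nu == nv else nu + nv - 1        # common length N
    u = list(u) + [0.0] * (n - nu)             # assumption: zero-pad to length N
    v = list(v) + [0.0] * (n - nv)
    if lag >= 0:
        r_uv = sum(u[k] * v[k + lag] for k in range(n - lag)) / n
    else:                                      # assumption: negative lag shifts u
        r_uv = sum(u[k - lag] * v[k] for k in range(n + lag)) / n
    r_uu0 = sum(x * x for x in u) / n          # zero-lag autocorrelation of u
    r_vv0 = sum(x * x for x in v) / n          # zero-lag autocorrelation of v
    return r_uv / math.sqrt(r_uu0 * r_vv0)


# Hypothetical descending eigenvalue sequences of two small graphs.
real = [4.0, 1.0, 1.0, 0.0]
model = [3.0, 2.0, 1.0, 0.0]
same = cross_correlation(real, real)           # identical sequences, lag 0
shifted = cross_correlation(real, real, lag=1)
different = cross_correlation(real, model)
```

With these inputs the coefficient equals 1 for identical sequences at lag 0 and is smaller both for the shifted copy and for the different sequence, which is the behaviour CCEM relies on.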

[Proof 1]: The cross-correlation maximum occurs if and only if two given topologies are completely identical and the disalignment lag is 0.

Proof:

If two given topologies are completely identical, then:

$$u[m] = v[n]. \qquad (10)$$

And if the disalignment lag is 0, with Eq. (6), we get:

$$r_{uv}(0) = \frac{1}{N} \sum_{k=0}^{N-1} u(k)\, v(k). \qquad (11)$$

According to Eq. (10), we get:

$$r_{uv}(0) = r_{uu}(0) = r_{vv}(0) = \frac{1}{N} \sum_{k=0}^{N-1} u(k)\, u(k) = \frac{1}{N} \sum_{k=0}^{N-1} u^2(k) = E[u^2(k)]. \qquad (12)$$

First, we are going to prove:

$$r_{uu}(0) \geq |r_{uu}(j)| = |E[u(k)\, u(k+j)]|, \quad j \neq 0. \qquad (13)$$

Consider a non-negative variable,

$$E[(u(k) \pm u(k+j))^2] \geq 0. \qquad (14)$$

Extending Eq. (14), we get:

$$E[u^2(k)] \pm 2E[u(k)\, u(k+j)] + E[u^2(k+j)] \geq 0. \qquad (15)$$

With Eq. (13), we simplify Eq. (15) to:

$$2 r_{uu}(0) \pm 2 r_{uu}(j) \geq 0.$$

Then:

$$-r_{uu}(0) \leq r_{uu}(j) \leq r_{uu}(0).$$

And

$$r_{uu}(0) \geq |r_{uu}(j)|, \quad j \neq 0. \qquad (16)$$

Now, we have proved that the cross-correlation value reaches its maximum when $u[m] = v[n]$ and the disalignment lag is set to 0.

Next, we are going to prove that when $u[m] \neq v[n]$, the maximum is still $r_{uu}(0)$ or $r_{vv}(0)$.

When the disalignment lag $j \neq 0$, to simplify the proof procedure, we can set $r_{uu}(j)$ to be $r_{uv}(j)$, since $u[m] \neq v[n]$ and $j \neq 0$. So, according to Eq. (16), we get:

$$r_{uu}(0) \geq |r_{uv}(j)|, \quad j \neq 0.$$

And for $r_{vv}(0)$, similar to $r_{uu}(0)$, we still get:

$$r_{vv}(0) \geq |r_{uv}(j)|, \quad j \neq 0.$$

End proof.
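The bound in Eq. (16) can be checked numerically on a short sequence. The sketch below evaluates the autocorrelation at every lag, using the finite-sum form with the 1/N factor, and confirms that the zero-lag value is the largest in magnitude. The sequence is a hypothetical descending eigenvalue list chosen for illustration, not data from the paper.

```python
def autocorrelation(u, lag):
    """Autocorrelation of u at the given lag, with the 1/N normalisation."""
    n = len(u)
    lag = abs(lag)                              # r_uu(-j) equals r_uu(j)
    return sum(u[k] * u[k + lag] for k in range(n - lag)) / n


sequence = [4.0, 1.0, 1.0, 0.0]                 # hypothetical descending sequence
values = {j: autocorrelation(sequence, j) for j in range(-3, 4)}
peak_lag = max(values, key=lambda j: abs(values[j]))
for j in sorted(values):
    print(f"lag {j:2d}: r_uu = {values[j]:.4f}")
```

For this sequence the values fall from 4.5 at lag 0 to 1.25, 1.0 and 0 at lags 1, 2 and 3, so the peak sits at lag 0 as the inequality requires.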

We then use the SLS eigenvalues from Fig. 5, i.e., the four SLS sequences from four real Internet topologies, to verify whether Proof (1) is correct.
