arXiv:cond-mat/0209619v1 26 Sep 2002

Scale-free behavior of the Internet global performance

Roberto Percacci∗ and Alessandro Vespignani†

∗International School for Advanced Studies SISSA/ISAS, via Beirut 4, 34014 Trieste, Italy; and †The Abdus Salam International Centre for Theoretical Physics (ICTP), P.O. Box 586, 34100 Trieste, Italy

June 11, 2002.

Measurements and data analysis have proved very effective in the study of the Internet’s physical fabric and have shown heterogeneities and statistical fluctuations extending over several orders of magnitude. Here we analyze performance measurements obtained by the PingER monitoring infrastructure. We focus on the relationship between the Round-Trip-Time (RTT) and the geographical distance. We define dimensionless variables that contain information on the quality of Internet connections, finding that their probability distributions are characterized by a slow power-law decay signalling the presence of scale-free features. These results point out the extreme heterogeneity of the Internet, since the transmission speed between different points of the network exhibits very large fluctuations. The associated scaling exponents appear to have fairly stable values in different data sets and thus define an invariant characteristic of the Internet that might be used in the future as a benchmark of the overall state of “health” of the Internet. The observed scale-free character should be incorporated in models and analysis of Internet performance.

The Internet is a self-organizing system whose size has already scaled by five orders of magnitude since its inception. Given the extremely complex and interwoven structure of the Internet, several research groups started to deploy technologies and infrastructures aiming to obtain a more global picture of the Internet. This has led to very interesting findings concerning the topology of Internet maps. Connectivity and other metrics are characterized by algebraic statistical distributions that signal fluctuations extending over many length scales [1, 2, 3, 4, 5]. These scale-free properties and the associated heterogeneity of the Internet fabric define a large scale object whose properties cannot be inferred from local ones, and are in sharp contrast with standard graph models. The importance of a correct topological characterization of the Internet in routing protocols and the parallel advancement in the understanding of scale-free networks [6] have triggered a renewed interest in Internet measurements and modeling. Considerable efforts have also been devoted to the collection of end-to-end performance data by means of active measurement techniques. This activity has stimulated several studies that, however, focus mainly on individual properties of hosts, routers or routes. Only recently has an increasing body of work focused on the performance of the Internet as a whole, especially to forecast future performance trends [7, 8]. These measurements pointed out the presence of highly heterogeneous performance, and it is our interest to inspect the possibility of a cooperative “emergent phenomenon” with associated scale-free behavior.

The basic testing package for Internet performance is the original PING (Packet InterNet Groper) program. Based on the Internet Control Message Protocol (ICMP), Ping works much like sonar echo-location, sending packets that elicit a reply from the targeted host. The program then measures the round-trip-time (RTT), i.e. how long it takes each packet to make the round trip. Organizations such as the National Laboratory for Applied Network Research (http://moat.nlanr.net/) and the Cooperative Association for Internet Data Analysis (http://www.caida.org/) use PING-like probes from geographically diverse monitors to collect RTT data to hundreds or thousands of Internet destinations. Our Internetwork Performance Measurement (IPM) project currently participates in the PingER monitoring infrastructure (http://www-iepm.slac.stanford.edu/). PingER was developed by the Internet End-to-end Performance Measurement (IEPM) group to monitor the end-to-end performance of Internet links. It consists of a number of beacon sites that regularly send ICMP probes to hundreds of targets and store all data centrally. Most beacons and targets are hosts belonging to universities or research centers; they are connected to many different networks and backbones and have a very wide geographical distribution, so they likely represent a statistically significant sample of the Internet as a whole.

We have analyzed two years' worth of PingER data, going from April 2000 to March 2002. We have selected 3353 different beacon-target pairs, taken out of 36 beacons and 196 targets. For each pair we have considered the following metrics: the geographic distance d of the hosts (measured on a great circle), the monthly average packet loss rate r (the percentage of ICMP packets that do not reach the target point), and the monthly minimum and average round-trip-times $\mathrm{RTT}_{\min}$ and $\mathrm{RTT}_{\mathrm{av}}$, respectively. These data offer the opportunity to test various hypotheses on the statistical behavior of Internet performance. Each data point is the monthly summary of approximately 1450 single measurements. The geographic position of hosts is known with great accuracy for some sites, but in most cases it may be wrong by 10-20 km. Consequently, we have discarded pairs of sites that are less than this distance apart.

The end-to-end delay is governed by several factors. First, digital information travels along fiber optic cables at almost exactly 2/3 the speed of light in vacuum. This gives the mnemonically very convenient value of 1 ms RTT per 100 km of cable. Using this speed one can express the geographic distance d in light-milliseconds, obtaining an absolute physical lower bound on the RTT between sites. The actual measured RTT is (usually) larger than this value because of several factors. First, data packets often follow rather circuitous paths leading them through a number of nodes that are far from the geodesic line between the endpoints. Furthermore, each link in a given path is itself far from being straight, often following highways, railways or power lines [11]. The combination of these factors produces a purely geometrical enhancement factor of the RTT. In addition, there is a minimum processing delay δ introduced by each router along the way, of the order of 50-250 µs per hop on average, summing up to a few ms for a typical path [11]. This can be significant for very close site pairs, but is negligible for most of the paths in the PingER sample. On top of this, the presence of cross traffic along the route can cause data packets to be queued in the routers. Let $t_R$ be the sum of all processing and queueing delays due to the routers on a path. When the traffic reaches congestion, $t_R$ becomes a very significant part of the RTT and packet loss also sets in. We have considered minimum and average values of the RTT over one-month periods. It is plausible that even on rather congested links there will be a moment in the course of a month when $t_R$ is negligible, so $\mathrm{RTT}_{\min}$ can be taken as an estimate of the best possible communication performance on the given data path, subject only to the intrinsic geometrical enhancement factor and the minimum processing delay. On the other hand, $\mathrm{RTT}_{\mathrm{av}}$ for a given site pair is obtained by considering the average RTT over one-month periods. This also takes into account the average queueing delay and gives an estimate of the overall communication performance on the given data path.

FIG. 1: $\mathrm{RTT}_{\min}$ (ms, vertical axis, 0 to 400) between 2114 host pairs (PingER data set of February 2002) as a function of their distance d (×100 km, horizontal axis, 0 to 200). Each point corresponds to a different host pair. The line indicates the physical lower bound provided by the speed of light in transmission cables. It is possible to observe the very large fluctuations in the $\mathrm{RTT}_{\min}$ of different host pairs separated by the same distance. For graphical reasons the picture frame is limited to 400 ms; however, several outliers up to 900 ms are present in the data set.
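The physical lower bound drawn as the line in Fig. 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the haversine formula on a spherical Earth of radius 6371 km stands in for whatever great-circle computation was actually used, the function names are ours, and the two coordinate pairs are approximate values chosen for the example rather than entries of the PingER data set.

```python
import math

EARTH_RADIUS_KM = 6371.0        # assumed mean radius of a spherical Earth
FIBER_SPEED_KM_PER_MS = 200.0   # about 2/3 of c, i.e. roughly 200 km per millisecond


def great_circle_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees (haversine)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Clamp against rounding errors before taking the arcsine.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def rtt_lower_bound_ms(distance_km):
    """Round-trip lower bound: the signal covers the distance twice at fiber speed."""
    return 2.0 * distance_km / FIBER_SPEED_KM_PER_MS


# Approximate, illustrative coordinates (latitude, longitude) in degrees.
trieste = (45.65, 13.78)
stanford = (37.43, -122.17)

d_km = great_circle_km(*trieste, *stanford)
print(f"distance = {d_km:.0f} km, RTT lower bound = {rtt_lower_bound_ms(d_km):.1f} ms")
```

With these constants a distance of 100 km gives a bound of exactly 1 ms, which is the rule of thumb quoted in the text.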

We studied the level of correlation between geographic distance and the $\mathrm{RTT}_{\min}$ and $\mathrm{RTT}_{\mathrm{av}}$ of source-destination pairs. In Fig. 1 we report the relationship obtained for $\mathrm{RTT}_{\min}$, compared with the solid line representing the speed of light in optic fibers at each distance. While it is possible to observe a linear correlation of $\mathrm{RTT}_{\min}$ with the physical distance of hosts, the data are extremely scattered. $\mathrm{RTT}_{\mathrm{av}}$ presents a qualitatively very similar behavior, and it is worth remarking that both plots are in good agreement with similar analyses obtained for different data sets [8, 9, 10]. While several qualitative features of this plot provide insight into the geographical distribution of hosts and their connectivity, it misses a quantitative characterization of the intrinsic fluctuations of performances and their statistical properties.

FIG. 2: Cumulative distributions of the round-trip-times normalized with the actual distance d between host pairs, shown as (a) $\ln P_{\mathrm{cum}}(\tau_{\min})$ versus $\ln \tau_{\min}$ and (b) $\ln P_{\mathrm{cum}}(\tau_{\mathrm{av}})$ versus $\ln \tau_{\mathrm{av}}$ for the data sets labelled apr01 and feb02. The linear behavior in the double logarithmic scale indicates a broad distribution with power-law behavior. (a) In the case of the normalized minimum round-trip-times $\tau_{\min}$, the slope of the reference line is −2.0. (b) In the case of the normalized average round-trip-times $\tau_{\mathrm{av}}$, the reference line has a slope −1.5. The insets of (a) and (b) report the distributions obtained for the Gloperf dataset. In both cases we obtain power-law behaviors in good agreement with those obtained for the PingER data sets (see Tab. I).
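The slopes quoted in the caption of Fig. 2 refer to cumulative distributions, whereas scaling exponents are usually quoted for the probability density. The short derivation below sketches how the two are related. It assumes a pure power-law density with exponent $\alpha > 1$ above a lower cutoff $\tau_0$; the cutoff is a symbol introduced here only to make the normalization explicit.

```latex
\begin{align}
P(\tau) &= \frac{\alpha-1}{\tau_0}\left(\frac{\tau}{\tau_0}\right)^{-\alpha},
  \qquad \tau \ge \tau_0, \\
P_{\mathrm{cum}}(\tau) &= \int_{\tau}^{\infty} P(\tau')\,d\tau'
  = \left(\frac{\tau}{\tau_0}\right)^{-(\alpha-1)}, \\
\ln P_{\mathrm{cum}}(\tau) &= -(\alpha-1)\,\ln\tau + (\alpha-1)\,\ln\tau_0 .
\end{align}
```

Under this assumption a straight line of slope $-(\alpha-1)$ in the double logarithmic plot corresponds to a density exponent $\alpha$, so a reference slope of −2.0 matches $\alpha = 3.0$ and a slope of −1.5 matches $\alpha = 2.5$.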

A more significant characterization of the end-to-end performance is obtained by normalizing the latency time by the geographical distance between hosts. This defines the absolute performance metrics $\tau_{\min} = \mathrm{RTT}_{\min}/d$ and $\tau_{\mathrm{av}} = \mathrm{RTT}_{\mathrm{av}}/d$, which represent the minimum and average latency time per unit distance, i.e. the inverse of the overall communication velocity (note that if we measure d in light-milliseconds, $\tau_{\min}$ and $\tau_{\mathrm{av}}$ are actually dimensionless).
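As a minimal sketch of this normalization, the Python fragment below converts a distance in kilometres into light-milliseconds using the 1 ms per 100 km rule quoted earlier and divides the RTT by it. The function name and the three sample records are hypothetical and serve only to show the arithmetic.

```python
KM_PER_LIGHT_MS = 100.0  # 1 ms of RTT per 100 km of cable, as quoted in the text


def normalized_latency(rtt_ms, distance_km):
    """Return tau = RTT / d, with d expressed in light-milliseconds (dimensionless)."""
    if distance_km <= 0:
        raise ValueError("distance must be positive")
    d_light_ms = distance_km / KM_PER_LIGHT_MS
    return rtt_ms / d_light_ms


# Hypothetical monthly summaries: (distance in km, RTT_min in ms, RTT_av in ms).
pairs = [
    (500.0, 12.0, 20.0),
    (6000.0, 95.0, 130.0),
    (9800.0, 180.0, 400.0),
]

for distance_km, rtt_min, rtt_av in pairs:
    tau_min = normalized_latency(rtt_min, distance_km)
    tau_av = normalized_latency(rtt_av, distance_km)
    print(f"d = {distance_km:6.0f} km  tau_min = {tau_min:.2f}  tau_av = {tau_av:.2f}")
```

In these units a value of 1 corresponds to a packet travelling along the geodesic at the fiber speed, so a value below 1 would point to an inconsistency in the inputs, for example in the assumed host positions.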

FIG. 3: Probability density P(r) for the occurrence of packet loss rate r on beacon-target pair transmissions, plotted as ln P(r) versus ln r. The zero on the x axis corresponds to a 1% rate in packet loss. Note that the distribution has a linear behavior in the double logarithmic scale, indicating a power-law behavior. The reference line has a slope −1.2.

These metrics allow us to meaningfully compare the performance between pairs of hosts with different geographical distances. The highly scattered plot of Fig. 1 indicates that end-to-end performance fluctuates conspicuously in the whole range of geographic distances. In particular, looking at collections of host pairs at approximately the same geographical distance, we find latency times varying up to two orders of magnitude. The best way to characterize the level of fluctuations in latency times is represented by the probabilities $P(\tau_{\min})$ and $P(\tau_{\mathrm{av}})$ that a pair of hosts presents a given $\tau_{\min}$ and $\tau_{\mathrm{av}}$, respectively. In contrast with usual exponential or gaussian distributions, for which there is a well defined scale, we find that data closely follow a straight line in a double logarithmic plot for at least one or two orders of magnitude, defining a power-law behavior $P(\tau_{\min}) \sim \tau_{\min}^{-\alpha_{\min}}$ and $P(\tau_{\mathrm{av}}) \sim \tau_{\mathrm{av}}^{-\alpha_{\mathrm{av}}}$. In Fig. 2 we show the cumulative distributions $P_{\mathrm{cum}}(\tau) = \int_{\tau}^{\infty} P(\tau')\,d\tau'$ obtained from the PingER data. If the probability density distribution is a power law $P(\tau) \sim \tau^{-\alpha}$, the cumulative distribution preserves the algebraic behavior and scales as $P_{\mathrm{cum}}(\tau) \sim \tau^{-(\alpha-1)}$. In addition, it has the advantage of being considerably less noisy than the original distribution. From the behavior of Fig. 2, a best fit of the linear region in the double logarithmic representation yields the scaling exponents $\alpha_{\min} \simeq 3.0$ and $\alpha_{\mathrm{av}} \simeq 2.5$. It is worth remarking that the presence of a truncation of the power-law behavior for large values is a natural effect implicitly present in every real-world data set and it is likely due to an incomplete statistical sampling of the distribution. Power-law distributions are characterized by scale-free properties, i.e. unbounded fluctuations and the absence of a meaningful characteristic length usually associated with the probability distribution peak. In such a case, the mean distribution value and the corresponding averages are poorly significant, since fluctuations are gigantic and there is a non-negligible probability of finding very large $\tau_{\min}$ and $\tau_{\mathrm{av}}$ compared to the average values in the whole system. In other words, Internet performances are extremely heterogeneous and it is impossible to infer local properties from average quantities.
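The fitting procedure described above (cumulative distribution, then a straight-line fit in the double logarithmic representation) can be sketched as follows. The data here are synthetic: a pure power law with exponent 3.0 is sampled as a stand-in for $\tau_{\min}$, all function names are ours, and the ordinary least-squares fit uses every point, whereas on real data the fit would have to be restricted to the linear region.

```python
import math
import random


def sample_power_law(n, alpha, x_min=1.0, seed=0):
    """Draw n values with density ~ x**(-alpha) for x >= x_min (inverse transform)."""
    rng = random.Random(seed)
    # 1 - rng.random() lies in (0, 1], so the power below is always finite.
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


def empirical_ccdf(values):
    """Return the sorted values and, for each, the fraction of the sample >= it."""
    xs = sorted(values)
    n = len(xs)
    return xs, [(n - i) / n for i in range(n)]


def loglog_slope(xs, ps):
    """Ordinary least-squares slope of ln(p) against ln(x)."""
    lx = [math.log(x) for x in xs]
    lp = [math.log(p) for p in ps]
    mx = sum(lx) / len(lx)
    mp = sum(lp) / len(lp)
    sxy = sum((a - mx) * (b - mp) for a, b in zip(lx, lp))
    sxx = sum((a - mx) ** 2 for a in lx)
    return sxy / sxx


taus = sample_power_law(20000, alpha=3.0)  # synthetic stand-in for tau_min
xs, ps = empirical_ccdf(taus)
slope = loglog_slope(xs, ps)
alpha_est = 1.0 - slope                    # the cumulative slope is -(alpha - 1)
print(f"cumulative slope = {slope:.2f}, estimated alpha = {alpha_est:.2f}")
```

Working with the cumulative distribution avoids choosing a bin width, which is one reason it is less noisy than a histogram of the density.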

The origin of scale-free behavior is usually associated with critical cooperative dynamical effects. Critical and scale-free behavior has been observed and characterized in queueing properties at router interfaces, probably affecting conspicuously the distribution of $\tau_{\mathrm{av}}$. It is, however, unclear why scale-free properties are observed also in the distribution of $\tau_{\min}$. In this case traffic effects should be negligible, and it is well known that the distribution of hop counts between hosts has a well defined peak and no fat tails [10]. On the contrary, we find that minimum latency times are distributed over more than two orders of magnitude. Potentially, cable wiggliness, Internet connectivity and hardware heterogeneities might be playing a role in the observed performance distribution.

Data set     $\alpha_{\min}$   $\alpha_{\mathrm{av}}$   $\langle\tau_{\min}\rangle$   $\langle\tau_{\mathrm{av}}\rangle$
April ’00    2.7 ± 0.2         2.2 ± 0.2                3.7                           6.6
Feb. ’01     2.9 ± 0.2         2.4 ± 0.2                3.6                           6.6
Feb. ’02     3.0 ± 0.2         2.5 ± 0.2                3.1                           5.3
Gloperf      2.7 ± 0.2         2.4 ± 0.2                5.4                           7.8

TABLE I: The table shows the improving performances along the years of the PingER data sample. As an independent check, we report the values obtained from the analysis of the data sample of the Gloperf project.

It is worth remarking that a tendency to improved performance is observed over the two-year period of data collection. Table I shows that the averages over all the site pairs, $\langle\tau_{\min}\rangle$ and $\langle\tau_{\mathrm{av}}\rangle$, decrease over the period, whereas the exponents $\alpha_{\min}$ and $\alpha_{\mathrm{av}}$ increase, signalling a faster decay of the distribution tails. We can consider the improvement of performance as the byproduct of the technological drift to better lines and routers. On the other hand, the large fluctuations present in the Internet performance appear to be a stable and general feature of the statistical analysis. In order to have an independent check of the PingER results, we have also considered the Gloperf data set that was used in [8]. We have extracted a set of parameter values for each of 650 unique site pairs in the sample and analyzed the statistics. These results are also reported in Table I. Although the averages depend on the specific characteristics of the sample (size, world region etc.) and differ significantly from the PingER case, the existence of power-law tails and the values of the exponents seem to be confirmed. These exponents can thus be considered as one of the few sought-after reliable and invariant properties of the Internet [12].

Finally, further evidence of large fluctuations in Internet performance is provided by the analysis of the packet loss data. Also in this case we are interested in the probability P(r) that a certain rate r of packet loss occurs on any given pair. We have analyzed the monthly average packet loss between PingER beacon-target pairs. In Fig. 3 we report the probability P(r) as a function of r. The plot shows an algebraically decaying distribution that can be well approximated by a power-law behavior $P(r) \sim r^{-\gamma}$ with $\gamma = 1.2 \pm 0.2$. The slowly decaying probability of large packet loss rate is another signature of the very heterogeneous performance of the Internet.

The results presented here have implications for the evaluation of performance trends. Models for primary performance factors must include the high heterogeneities observed in real data. Time and scale extrapolation for Internet performances can be seriously flawed by considering just the average properties. It is likely that we will observe in the future an improvement of the average end-to-end performance due to increased bandwidth and router speed, but the real improvement of the Internet as a whole would correspond to reducing the huge statistical fluctuations observed nowadays. On a more theoretical side, the explanation and formulation of microscopic models at the origin of the scale-free behavior of Internet performance appear challenging, to say the least.
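Returning to the packet-loss analysis, a density such as P(r) that spans several decades is commonly estimated on logarithmically spaced bins. The sketch below illustrates that approach on synthetic data. It is an assumption on our part: the text does not state how P(r) was binned, the loss rates are drawn from a truncated power law with exponent 1.2 between 0.1% and 100% purely for illustration, and all function names are ours.

```python
import math
import random


def sample_truncated_power_law(n, gamma, lo, hi, seed=0):
    """Draw n values with density ~ r**(-gamma) on [lo, hi), for gamma != 1."""
    rng = random.Random(seed)
    e = 1.0 - gamma
    a, b = lo ** e, hi ** e
    return [(a + rng.random() * (b - a)) ** (1.0 / e) for _ in range(n)]


def log_binned_density(values, bins_per_decade=4):
    """Return (geometric bin centre, density) pairs for the non-empty log bins."""
    lo, hi = min(values), max(values)
    ratio = 10.0 ** (1.0 / bins_per_decade)
    n_bins = int(math.log(hi / lo) / math.log(ratio)) + 1
    edges = [lo * ratio ** k for k in range(n_bins + 1)]
    counts = [0] * n_bins
    for v in values:
        k = min(int(math.log(v / lo) / math.log(ratio)), n_bins - 1)
        counts[k] += 1
    total = len(values)
    points = []
    for k, c in enumerate(counts):
        if c:
            width = edges[k + 1] - edges[k]
            centre = math.sqrt(edges[k] * edges[k + 1])
            points.append((centre, c / (total * width)))  # count per unit of r
    return points


def loglog_slope(points):
    """Ordinary least-squares slope of ln(density) against ln(centre)."""
    lx = [math.log(x) for x, _ in points]
    ly = [math.log(y) for _, y in points]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    sxx = sum((a - mx) ** 2 for a in lx)
    return sxy / sxx


rates = sample_truncated_power_law(20000, gamma=1.2, lo=0.1, hi=100.0)  # synthetic, in %
density = log_binned_density(rates)
gamma_est = -loglog_slope(density)
print(f"{len(density)} bins, estimated gamma = {gamma_est:.2f}")
```

Dividing each count by the bin width is what turns the histogram into a density; plotting raw counts on logarithmic bins would flatten the slope by one unit.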

We thank C. Lee for sending us the Gloperf raw data. We are grateful to L. Carbone, F. Coccetti, L. Cottrell, P. Dini, Y. Moreno, R. Pastor-Satorras and A. Vázquez for helpful comments and discussions. This work has been supported by the Internetwork Performance Measurement (IPM) project of Istituto Nazionale di Fisica Nucleare, and the European Commission - Fet Open Project COSIN IST-2001-33555.

[1] Faloutsos, M., Faloutsos, P. & Faloutsos, C. ACM SIGCOMM ’99, Comput. Commun. Rev. 29, 251-262 (1999).

[2] Govindan, R. & Tangmunarunkit, H., in Proceedings of IEEE INFOCOM 2000, Tel Aviv, Israel (IEEE, Piscataway, N.J. 2000).

[3] Broido, A. & Claffy, K. C., in SPIE International Symposium on Convergence of IT and Communication (Denver, CO, 2001).

[4] Pastor-Satorras, R., Vázquez, A. & Vespignani, A. Phys. Rev. Lett. 87, 2587011-2587014 (2001); Vázquez, A., Pastor-Satorras, R. & Vespignani, A. Phys. Rev. E 65, 066130 (2002).

[5] Willinger, W., Govindan, R., Jamin, S., Paxson, V. & Shenker, S. Proc. Nat. Acad. Sci. 99, 2573-2580 (2002).

[6] Barabási, A.-L. & Albert, R. Rev. Mod. Phys. 74, 47-97 (2002).

[7] Paxson, V. IEEE ACM T. Network 5, 601-615 (1997).

[8] Lee, C. & Stepanek, J. On future global grid communication performance. 10th IEEE Heterogeneous Computing Workshop, May 2001.

[9] Huffaker, B., Fomenkov, M., Moore, D., Nemeth, E. & Claffy, K. http://www.caida.org/outreach/papers/2000/asia paper/

[10] Huffaker, B., Fomenkov, M., Moore, D. & Claffy, K. Proceedings of the PAM 2001 Conference, Amsterdam, 23-24 April 2001.

[11] Bovy, C., Mertodimedjo, H.T., Hooghiemstra, G., Uijterwaal, H. & Van Mieghem, P. Proceedings of the PAM 2002 Conference, Fort Collins, Colorado, 25-26 March 2002.

[12] Floyd, S. & Paxson, V. IEEE/ACM Transactions on Networking 9, 392-403 (2001).