arXiv:cond-mat/0209619v1 26 Sep 2002
Scale-free behavior of the Internet global performance
Roberto Percacci∗ and Alessandro Vespignani†
∗International School for Advanced Studies SISSA/ISAS, via Beirut 4, 34014 Trieste, Italy; and
†The Abdus Salam International Centre for Theoretical Physics (ICTP), P.O. Box 586, 34100 Trieste, Italy
June 11, 2002.
Measurements and data analysis have proved very effective in the study of the Internet’s physical fabric and have shown heterogeneities and statistical fluctuations extending over several orders of magnitude. Here we analyze performance measurements obtained by the PingER monitoring infrastructure. We focus on the relationship between the Round-Trip-Time (RTT) and the geographical distance. We define dimensionless variables that contain information on the quality of Internet connections, finding that their probability distributions are characterized by a slow power-law decay signalling the presence of scale-free features. These results point out the extreme heterogeneity of the Internet, since the transmission speed between different points of the network exhibits very large fluctuations. The associated scaling exponents appear to have fairly stable values in different data sets and thus define an invariant characteristic of the Internet that might be used in the future as a benchmark of the overall state of “health” of the Internet. The observed scale-free character should be incorporated in models and analysis of Internet performance.
The Internet is a self-organizing system whose size has already scaled five orders of magnitude since its inception. Given the extremely complex and interwoven structure of the Internet, several research groups started to deploy technologies and infrastructures aiming to obtain a more global picture of the Internet. This has led to very interesting findings concerning the topology of Internet maps. Connectivity and other metrics are characterized by algebraic statistical distributions that signal fluctuations extending over many length scales [1, 2, 3, 4, 5]. These scale-free properties and the associated heterogeneity of the Internet fabric define a large-scale object whose properties cannot be inferred from local ones, and are in sharp contrast with standard graph models. The importance of a correct topological characterization of the Internet in routing protocols and the parallel advancement in the understanding of scale-free networks [6] have triggered a renewed interest in Internet measurements and modeling. Considerable efforts have also been devoted to the collection of end-to-end performance data by means of active measurement techniques. This activity has stimulated several studies that, however, focus mainly on individual properties of hosts, routers or routes. Only recently has an increasing body of work focused on the performance of the Internet as a whole, especially to forecast future performance trends [7, 8]. These measurements pointed out the presence of highly heterogeneous performance, and it is our interest to inspect the possibility of a cooperative “emergent phenomenon” with associated scale-free behavior.
The basic testing package for Internet performance is the original PING (Packet InterNet Groper) program. Based on the Internet Control Message Protocol (ICMP), Ping works much like a sonar echo-location, sending packets that elicit a reply from the targeted host. The program then measures the round-trip-time (RTT), i.e. how long it takes each packet to make the round trip. Organizations such as the National Laboratory for Applied Network Research (http://moat.nlanr.net/) and the Cooperative Association for Internet Data Analysis (http://www.caida.org/) use PING-like probes from geographically diverse monitors to collect RTT data to hundreds or thousands of Internet destinations. Our Internetwork Performance Measurement (IPM) project currently participates in the PingER monitoring infrastructure (http://www-iepm.slac.stanford.edu/). PingER was developed by the Internet End-to-end Performance Measurement (IEPM) group to monitor the end-to-end performance of Internet links. It consists of a number of beacon sites regularly sending ICMP probes to hundreds of targets and storing all data centrally. Most beacons and targets are hosts belonging to universities or research centers; they are connected to many different networks and backbones and have a very wide geographical distribution, so they likely represent a statistically significant sample of the Internet as a whole.
We have analyzed two years' worth of PingER data, going from April 2000 to March 2002. We have selected 3353 different beacon-target pairs, taken out of 36 beacons and 196 targets. For each pair we have considered the following metrics: the geographic distance of the hosts $d$ (measured on a great circle), the monthly average packet loss rate $r$ (the percentage of ICMP packets that do not reach the target point), and the monthly minimum and average round-trip-times $\mathrm{RTT}_{\min}$ and $\mathrm{RTT}_{\mathrm{av}}$, respectively. These data offer the opportunity to test various hypotheses on the statistical behavior of Internet performance. Each data point is the monthly summary of approximately 1450 single measurements. The geographic position of hosts is known with great accuracy for some sites, but in most cases it may be wrong by 10-20 km. Consequently, we have discarded pairs of sites that are less than this distance apart. The end-to-end delay is governed by several factors.
First, digital information travels along fiber optic cables at almost exactly 2/3 the speed of light in vacuum. This gives the mnemonically very convenient value of 1 ms RTT per 100 km of cable: light in fiber covers about 200 km per millisecond, and a round trip traverses the cable twice. Using this speed one can express the geographic distance $d$ in light-milliseconds, obtaining an absolute physical lower bound on the RTT between sites. The actual measured RTT is (usually) larger than this value because of several factors. First, data packets often follow rather circuitous paths leading them through a number of nodes that are far from the geodesic line between the endpoints. Furthermore, each link in a given path is itself far from being straight, often following highways, railways or power lines [11]. The combination of these factors produces a purely geometrical enhancement factor of the RTT. In addition, there is a minimum processing delay $\delta$ introduced by each router along the way, of the order of 50-250 µs per hop on average, summing up to a few ms for a typical path [11]. This can be significant for very close site pairs, but is negligible for most of the paths in the PingER
sample.

FIG. 1: $\mathrm{RTT}_{\min}$ (in ms) between 2114 host pairs (PingER data set of February 2002) as a function of their distance $d$ (in units of 100 km). Each point corresponds to a different host pair. The line indicates the physical lower bound provided by the speed of light in transmission cables. It is possible to observe the very large fluctuations in the $\mathrm{RTT}_{\min}$ of different host pairs separated by the same distance. For graphical reasons the picture frame is limited to 400 ms; however, several outliers up to 900 ms are present in the data set.

On top of this, the presence of cross traffic along the route can cause data packets to be queued in the routers. Let
$t_R$ be the sum of all processing and queueing delays due to the routers on a path. When the traffic reaches congestion, $t_R$ becomes a very significant part of the RTT and packet loss also sets in. We have considered minimum and average values of the RTT over one-month periods. It is plausible that even on rather congested links there will be a moment in the course of a month when $t_R$ is negligible, so $\mathrm{RTT}_{\min}$ can be taken as an estimate of the best possible communication performance on the given data path, subject only to the intrinsic geometrical enhancement factor and the minimum processing delay. On the other hand, $\mathrm{RTT}_{\mathrm{av}}$ for a given site pair is obtained by considering the average RTT over one-month periods. This takes into account also the average queueing delay and gives an estimate of the overall communication performance on the given data path.
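
As an illustration of the quantities defined above, the following Python sketch computes a great-circle distance, the corresponding speed-of-light lower bound on the RTT, and the monthly summary of a list of probes. It is a minimal sketch and not the PingER processing code: the function names, the spherical-Earth radius, the rounded value of 200 km/ms for light in fiber, and the convention that a lost probe is recorded as `None` are all assumptions made here.

```python
import math

FIBER_KM_PER_MS = 200.0   # about 2/3 of c, with c rounded to 300,000 km/s (assumption)
EARTH_RADIUS_KM = 6371.0  # mean radius of a spherical Earth (assumption)


def great_circle_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees (haversine)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Clamp to 1 so that rounding errors cannot push asin out of its domain.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def rtt_lower_bound_ms(distance_km):
    """Round-trip light time in fiber: 1 ms of RTT per 100 km of separation."""
    return 2.0 * distance_km / FIBER_KM_PER_MS


def summarize_month(samples_ms):
    """Summarize one month of probes; None marks a lost packet.

    Returns (rtt_min, rtt_av, loss_percent). The two RTT values are None
    when every probe of the month was lost.
    """
    if not samples_ms:
        raise ValueError("no probes to summarize")
    received = [s for s in samples_ms if s is not None]
    loss_percent = 100.0 * (len(samples_ms) - len(received)) / len(samples_ms)
    if not received:
        return None, None, loss_percent
    return min(received), sum(received) / len(received), loss_percent
```

For two sites 1000 km apart, `rtt_lower_bound_ms(1000.0)` gives 10 ms; a measured $\mathrm{RTT}_{\min}$ above this value reflects the geometrical enhancement factor and the router processing delays discussed above.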
We studied the level of correlation between geographic distance and the $\mathrm{RTT}_{\min}$ and $\mathrm{RTT}_{\mathrm{av}}$ of source-destination pairs. In Fig. 1 we report the relationship obtained for $\mathrm{RTT}_{\min}$, compared with the solid line representing the bound set by the speed of light in optic fibers at each distance. Although a linear correlation of $\mathrm{RTT}_{\min}$ with the physical distance between hosts can be observed, the data are extremely scattered. $\mathrm{RTT}_{\mathrm{av}}$ presents a qualitatively very similar behavior, and it is worth remarking that both plots are in good agreement with similar analyses obtained for different data sets [8, 9, 10]. While several qualitative features of this plot provide insight into the geographical distribution of hosts and their connectivity, it lacks a quantitative characterization of the intrinsic fluctuations of performance and their statistical properties.

[Figure 2, two panels: (a) $\ln P_{\mathrm{cum}}(\tau_{\min})$ versus $\ln \tau_{\min}$; (b) $\ln P_{\mathrm{cum}}(\tau_{\mathrm{av}})$ versus $\ln \tau_{\mathrm{av}}$. Data series: apr01 and feb02. Insets: Gloperf.]
FIG. 2: Cumulative distributions of the round-trip-times normalized with the actual distance $d$ between host pairs. The linear behavior in the double logarithmic scale indicates a broad distribution with power-law behavior. (a) In the case of the normalized minimum round-trip-times $\tau_{\min}$, the slope of the reference line is −2.0. (b) In the case of the normalized average round-trip-times $\tau_{\mathrm{av}}$, the reference line has a slope −1.5. The insets of (a) and (b) report the distributions obtained for the Gloperf data set. In both cases we obtain power-law behaviors in good agreement with those obtained for the PingER data sets (see Table I).

A more significant characterization of the end-to-end performance is obtained by normalizing the latency time by the geographical distance between hosts. This defines the absolute performance metrics $\tau_{\min} = \mathrm{RTT}_{\min}/d$ and $\tau_{\mathrm{av}} = \mathrm{RTT}_{\mathrm{av}}/d$, which represent the minimum and average latency time per unit distance, i.e. the inverse of the overall communication velocity (note that if we measure $d$ in light-milliseconds, $\tau_{\min}$ and $\tau_{\mathrm{av}}$ are actually dimensionless).
These metrics allow us to meaningfully compare the performance between pairs of hosts with different geographical distances.

FIG. 3: Probability density $P(r)$ for the occurrence of packet loss rate $r$ on beacon-target pair transmissions (axes: $\ln P(r)$ versus $\ln r$). The zero on the x axis corresponds to a 1% rate in packet loss. Note that the distribution has a linear behavior in the double logarithmic scale, indicating a power-law behavior. The reference line has a slope −1.2.

The highly scattered plot of Fig. 1 indicates that end-to-end performance fluctuates conspicuously in the whole
range of geographic distances. In particular, looking at collections of host pairs at approximately the same geographical distance, we find latency times varying up to two orders of magnitude. The best way to characterize the level of fluctuations in latency times is represented by the probabilities $P(\tau_{\min})$ and $P(\tau_{\mathrm{av}})$ that a pair of hosts presents a given $\tau_{\min}$ and $\tau_{\mathrm{av}}$, respectively. In contrast with the usual exponential or Gaussian distributions, for which there is a well-defined scale, we find that the data closely follow a straight line in a double logarithmic plot for at least one or two orders of magnitude, defining a power-law behavior $P(\tau_{\min}) \sim \tau_{\min}^{-\alpha_{\min}}$ and $P(\tau_{\mathrm{av}}) \sim \tau_{\mathrm{av}}^{-\alpha_{\mathrm{av}}}$. In Fig. 2 we show the cumulative distributions $P_{\mathrm{cum}}(\tau) = \int_{\tau}^{\infty} P(\tau')\,d\tau'$ obtained from the PingER data. If the probability density distribution is a power law $P(\tau) \sim \tau^{-\alpha}$ with $\alpha > 1$, the cumulative distribution preserves the algebraic behavior and scales as $P_{\mathrm{cum}}(\tau) \sim \tau^{-(\alpha-1)}$, so the slope measured in the cumulative plot is $-(\alpha-1)$. In addition, it has the advantage of being considerably less noisy than the original distribution. From the behavior of Fig. 2, a best fit of the linear region in the double logarithmic representation yields the scaling exponents $\alpha_{\min} \simeq 3.0$ and $\alpha_{\mathrm{av}} \simeq 2.5$. It is worth remarking that the presence of a truncation of the power-law behavior for large values is a natural effect implicitly present in every real-world data set and it is likely due to an incomplete statistical sampling of the distribution. Power-law distributions are characterized by scale-free properties, i.e. unbounded fluctuations and the absence of a meaningful characteristic length usually associated with the peak of the probability distribution. In such a case, the mean value of the distribution and the corresponding averages are of little significance, since fluctuations are gigantic and there are non-negligible probabilities of having very large $\tau_{\min}$ and $\tau_{\mathrm{av}}$ compared to the average values in the whole system. In other words, Internet performance is extremely heterogeneous and it is impossible to infer local properties from average quantities.
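
The procedure described above (normalize the RTT by the distance, build the cumulative distribution, and fit the linear region in the double logarithmic plot) can be sketched in Python as follows. This is only an illustrative sketch: the text does not specify the fitting method, so the ordinary least-squares fit, the function names, the fitting window and the synthetic test sample are assumptions introduced here, not the procedure actually used for Fig. 2.

```python
import math
import random


def normalized_latency(rtt_ms, distance_km):
    """tau = RTT / d, with d in light-milliseconds (100 km of separation = 1 ms)."""
    return rtt_ms / (distance_km / 100.0)


def empirical_ccdf(values):
    """Return the sorted values x_i and the fraction of samples >= x_i.

    Ties are not merged, which is harmless for continuous data.
    """
    xs = sorted(values)
    n = len(xs)
    return xs, [(n - i) / n for i in range(n)]


def loglog_slope(xs, ps, x_lo, x_hi):
    """Least-squares slope of ln(p) against ln(x), restricted to x_lo <= x <= x_hi."""
    pts = [(math.log(x), math.log(p)) for x, p in zip(xs, ps) if x_lo <= x <= x_hi]
    mean_u = sum(u for u, _ in pts) / len(pts)
    mean_v = sum(v for _, v in pts) / len(pts)
    sxx = sum((u - mean_u) ** 2 for u, _ in pts)
    sxy = sum((u - mean_u) * (v - mean_v) for u, v in pts)
    return sxy / sxx


# Synthetic check (not PingER data): draw tau from a pure power law whose
# cumulative distribution is tau**-2 for tau >= 1, i.e. density exponent 3.
rng = random.Random(0)
taus = [(1.0 - rng.random()) ** -0.5 for _ in range(20000)]
xs, ps = empirical_ccdf(taus)
# The cumulative slope is -(alpha - 1), so alpha = 1 - slope.
alpha_estimate = 1.0 - loglog_slope(xs, ps, 1.0, 10.0)
print(f"estimated alpha = {alpha_estimate:.2f}")
```

On the synthetic sample the estimate should come out close to the input value of 3. On real data the fitting window has to be restricted by hand to the linear region, since the tail is truncated by finite sampling.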
The origin of scale-free behavior is usually associated with critical cooperative dynamical effects. Critical and scale-free behavior has been observed and characterized in queueing properties at router interfaces, probably affecting conspicuously the distribution of $\tau_{\mathrm{av}}$. It is, however, unclear why scale-free properties are observed also in the distribution of $\tau_{\min}$. In this case traffic effects should be negligible, and it is well known that the distribution of hop counts between hosts has a well-defined peak and no fat tails [10]. On the contrary, we find that minimum latency times are distributed over more than two orders of magnitude. Potentially, cable wiggliness, Internet connectivity and hardware heterogeneities might be playing a role in the observed performance distribution.

Data set    $\alpha_{\min}$   $\alpha_{\mathrm{av}}$   $\langle\tau_{\min}\rangle$   $\langle\tau_{\mathrm{av}}\rangle$
April ’00   2.7 ± 0.2         2.2 ± 0.2                3.7                           6.6
Feb. ’01    2.9 ± 0.2         2.4 ± 0.2                3.6                           6.6
Feb. ’02    3.0 ± 0.2         2.5 ± 0.2                3.1                           5.3
Gloperf     2.7 ± 0.2         2.4 ± 0.2                5.4                           7.8

TABLE I: Scaling exponents and average normalized latencies for each data set. The table shows the improving performance over the years of the PingER data sample. As an independent check, we report the values obtained from the analysis of the data sample of the Gloperf project.

It is worth remarking that a tendency to improved performance is observed over the two-year period of data collection. Table I shows that the averages over all the site pairs, $\langle\tau_{\min}\rangle$ and $\langle\tau_{\mathrm{av}}\rangle$, decrease steadily, whereas the exponents $\alpha_{\min}$ and $\alpha_{\mathrm{av}}$ increase, signalling a faster decay of the distribution tails. We can consider the improvement of performance as the byproduct of the technological drift to better lines and routers. On the other hand, the large fluctuations present in the Internet performance appear to be a stable and general feature of the statistical analysis. In order to have an independent check of the PingER results, we have also considered the Gloperf data set that was used in [8]. We have extracted a set of parameter values for each of 650 unique site pairs in the sample and analyzed the statistics. These results are also reported in Table I. Although the averages depend on the specific characteristics of the sample (size, world region, etc.) and differ significantly from the PingER case, the existence of power-law tails and the values of the exponents seem to be confirmed. These exponents can thus be considered as one of the few sought-after reliable and invariant properties of the Internet [12].
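
The two statements made about Table I (a trend within the PingER rows, and agreement of the Gloperf exponents with the PingER ones) can be checked mechanically. The Python sketch below transcribes the table and tests both. The agreement criterion used here, a difference no larger than the sum of the two quoted ±0.2 uncertainties, is an assumption chosen for illustration and not a criterion stated in the text.

```python
# Values transcribed from Table I:
# (alpha_min, alpha_av, mean tau_min, mean tau_av)
TABLE_I = {
    "April '00": (2.7, 2.2, 3.7, 6.6),
    "Feb. '01": (2.9, 2.4, 3.6, 6.6),
    "Feb. '02": (3.0, 2.5, 3.1, 5.3),
    "Gloperf": (2.7, 2.4, 5.4, 7.8),
}
ALPHA_ERR = 0.2  # uncertainty quoted for every exponent in Table I
PINGER_ROWS = ["April '00", "Feb. '01", "Feb. '02"]  # chronological order


def column(rows, index):
    """Extract one column of Table I for the given row names."""
    return [TABLE_I[name][index] for name in rows]


def is_monotone(values, increasing):
    """True if the sequence never moves against the requested direction."""
    pairs = list(zip(values, values[1:]))
    if increasing:
        return all(b >= a for a, b in pairs)
    return all(b <= a for a, b in pairs)


def compatible(a, b, err=ALPHA_ERR):
    """True if two exponents differ by at most the sum of their error bars
    (assumed criterion); the small epsilon absorbs floating-point rounding."""
    return abs(a - b) <= 2 * err + 1e-12


exponents_grow = all(is_monotone(column(PINGER_ROWS, i), True) for i in (0, 1))
averages_shrink = all(is_monotone(column(PINGER_ROWS, i), False) for i in (2, 3))
gloperf_agrees = all(
    compatible(TABLE_I["Gloperf"][i], TABLE_I[name][i])
    for name in PINGER_ROWS
    for i in (0, 1)
)
print(exponents_grow, averages_shrink, gloperf_agrees)
```

Note that the check on the averages is deliberately non-strict, since $\langle\tau_{\mathrm{av}}\rangle$ takes the same value, 6.6, in the first two PingER rows.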
Finally, further evidence of large fluctuations in Internet performance is provided by the analysis of the packet loss data. Also in this case we are interested in the probability $P(r)$ that a certain rate $r$ of packet loss occurs on any given pair. We have analyzed the monthly average packet loss between PingER beacon-target pairs. In Fig. 3 we report the probability $P(r)$ as a function of $r$. The plot shows an algebraically decaying distribution that can be well approximated by a power-law behavior $P(r) \sim r^{-\gamma}$ with $\gamma = 1.2 \pm 0.2$. The slowly decaying probability of large packet loss rates is another signature of the very heterogeneous performance of the Internet.
The results presented here have implications for the evaluation of performance trends. Models for primary performance factors must include the high heterogeneities observed in real data. Time and scale extrapolations for Internet performance can be seriously flawed by considering just the average properties. It is likely that we will observe in the future an improvement of the average end-to-end performance due to increased bandwidth and router speed, but the real improvement of the Internet as a whole would correspond to reducing the huge statistical fluctuations observed nowadays. On a more theoretical side, the explanation and formulation of microscopic models at the origin of the scale-free behavior of Internet performance appear challenging, to say the least.
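
For the packet-loss analysis reported in Fig. 3, where a probability density (not a cumulative distribution) is plotted on a double logarithmic scale, one common way to estimate $P(r)$ is to histogram the data on logarithmically spaced bins. The text does not say how the density was obtained, so the Python sketch below is an assumption-laden illustration: the logarithmic binning, the function names, the number of bins per decade and the synthetic sample are all choices made here.

```python
import math
import random


def log_binned_density(values, bins_per_decade=4):
    """Estimate a probability density on logarithmically spaced bins.

    Returns three lists for the non-empty bins: geometric bin centers,
    density estimates (count / (sample size * bin width)) and bin widths.
    """
    positive = [v for v in values if v > 0]  # zero loss cannot sit on a log axis
    lo = math.floor(math.log10(min(positive)) * bins_per_decade)
    hi = math.floor(math.log10(max(positive)) * bins_per_decade) + 1
    counts = [0] * (hi - lo)
    for v in positive:
        counts[math.floor(math.log10(v) * bins_per_decade) - lo] += 1
    centers, densities, widths = [], [], []
    for k, c in enumerate(counts):
        if c == 0:
            continue
        left = 10 ** ((lo + k) / bins_per_decade)
        right = 10 ** ((lo + k + 1) / bins_per_decade)
        centers.append(math.sqrt(left * right))
        densities.append(c / (len(positive) * (right - left)))
        widths.append(right - left)
    return centers, densities, widths


def loglog_slope(xs, ys):
    """Least-squares slope of ln(y) against ln(x)."""
    us = [math.log(x) for x in xs]
    vs = [math.log(y) for y in ys]
    mean_u, mean_v = sum(us) / len(us), sum(vs) / len(vs)
    sxx = sum((u - mean_u) ** 2 for u in us)
    sxy = sum((u - mean_u) * (v - mean_v) for u, v in zip(us, vs))
    return sxy / sxx


# Synthetic check (not PingER data): loss rates in percent drawn from a
# density proportional to r**-1.2 on [0.1, 100], by inverse-transform sampling.
GAMMA, R_MIN, R_MAX = 1.2, 0.1, 100.0
e = 1.0 - GAMMA
rng = random.Random(1)
rates = [
    (R_MIN ** e - rng.random() * (R_MIN ** e - R_MAX ** e)) ** (1.0 / e)
    for _ in range(20000)
]
centers, densities, widths = log_binned_density(rates)
# Drop the first and last bins, which may be only partially covered by the data.
gamma_estimate = -loglog_slope(centers[1:-1], densities[1:-1])
print(f"estimated gamma = {gamma_estimate:.2f}")
```

Logarithmic bins keep a comparable number of points per bin across the whole range, which reduces the noise in the tail compared with bins of constant width.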
We thank C. Lee for sending us the Gloperf raw data. We are grateful to L. Carbone, F. Coccetti, L. Cottrell, P. Dini, Y. Moreno, R. Pastor-Satorras and A. Vázquez for helpful comments and discussions. This work has been supported by the Internetwork Performance Measurement (IPM) project of Istituto Nazionale di Fisica Nucleare, and the European Commission - Fet Open Project COSIN IST-2001-33555.
[1] Faloutsos, M., Faloutsos, P. & Faloutsos, C. ACM SIGCOMM ’99, Comput. Commun. Rev. 29, 251-262 (1999).
[2] Govindan, R. & Tangmunarunkit, H., in Proceedings of IEEE INFOCOM 2000, Tel Aviv, Israel (IEEE, Piscataway, N.J. 2000).
[3] Broido, A. & Claffy, K. C., in SPIE International Symposium on Convergence of IT and Communication (Denver, CO, 2001).
[4] Pastor-Satorras, R., Vázquez, A. & Vespignani, A. Phys. Rev. Lett. 87, 2587011-2587014 (2001); Vázquez, A., Pastor-Satorras, R. & Vespignani, A., Phys. Rev. E 65, 066130 (2002).
[5] Willinger, W., Govindan, R., Jamin, S., Paxson, V. & Shenker, S. Proc. Nat. Acad. Sci. 99, 2573-2580 (2002).
[6] Barabási, A.-L. & Albert, R. Rev. Mod. Phys. 74, 47-97 (2002).
[7] Paxson, V. IEEE ACM T. Network 5, 601-615 (1997).
[8] Lee, C. & Stepanek, J. On future global grid communication performance. 10th IEEE Heterogeneous Computing Workshop, May 2001.
[9] Huffaker, B., Fomenkov, M., Moore, D., Nemeth, E. & Claffy, K.
http://www.caida.org/outreach/papers/2000/asia
paper/
[10] Huffaker, B., Fomenkov, M., Moore, D., & Claffy, K.
Proceedings of the PAM 2001 Conference, Amsterdam,
23-24 April 2001.
[11] Bovy, C., Mertodimedjo, H.T., Hooghiemstra, G., Uijterwaal, H. & Van Mieghem, P. Proceedings of the PAM 2002 Conference, Fort Collins, Colorado, 25-26 March 2002.
[12] Floyd, S. & Paxson, V. IEEE/ACM Transactions on Networking, 9, 392-403 (2001).