arXiv:cond-mat/0209619v1 26 Sep 2002
Scale-free behavior of the Internet global performance
Roberto Percacci and Alessandro Vespignani
International School for Advanced Studies SISSA/ISAS, via Beirut 4, 34014 Trieste, Italy; and
The Abdus Salam International Centre for Theoretical Physics (ICTP), P.O. Box 586, 34100 Trieste, Italy
June 11, 2002
Measurements and data analysis have proved very effective in the study of the Internet's physical fabric and have shown heterogeneities and statistical fluctuations extending over several orders of magnitude. Here we analyze performance measurements obtained by the PingER monitoring infrastructure. We focus on the relationship between the Round-Trip-Time (RTT) and the geographical distance. We define dimensionless variables that contain information on the quality of Internet connections, finding that their probability distributions are characterized by a slow power-law decay signalling the presence of scale-free features. These results point out the extreme heterogeneity of the Internet, since the transmission speed between different points of the network exhibits very large fluctuations. The associated scaling exponents appear to have fairly stable values in different data sets and thus define an invariant characteristic of the Internet that might be used in the future as a benchmark of the overall state of “health” of the Internet. The observed scale-free character should be incorporated in models and analysis of Internet performance.
The Internet is a self-organizing system whose size has already scaled five orders of magnitude since its inception. Given the extremely complex and interwoven structure of the Internet, several research groups have started to deploy technologies and infrastructures aiming to obtain a more global picture of the Internet. This has led to very interesting findings concerning the topology of Internet maps. Connectivity and other metrics are characterized by algebraic statistical distributions that signal fluctuations extending over many length scales [1, 2, 3, 4, 5]. These scale-free properties and the associated heterogeneity of the Internet fabric define a large-scale object whose properties cannot be inferred from local ones, and are in sharp contrast with standard graph models. The importance of a correct topological characterization of the Internet in routing protocols and the parallel advancement in the understanding of scale-free networks [6] have triggered a renewed interest in Internet measurements and modeling. Considerable efforts have also been devoted to the collection of end-to-end performance data by means of active measurement techniques. This activity has stimulated several studies that, however, focus mainly on individual properties of hosts, routers or routes. Only recently has an increasing body of work focused on the performance of the Internet as a whole, especially to forecast future performance trends [7, 8]. These measurements pointed out the presence of highly heterogeneous performance, and it is our interest to inspect the possibility of a cooperative “emergent phenomenon” with associated scale-free behavior.
The basic testing package for Internet performance is the original PING (Packet InterNet Groper) program. Based on the Internet Control Message Protocol (ICMP), Ping works much like sonar echo-location, sending packets that elicit a reply from the targeted host. The program then measures the round-trip-time (RTT), i.e. how long it takes each packet to make the round trip. Organizations such as the National Laboratory for Applied Network Research (http://moat.nlanr.net/) and the Cooperative Association for Internet Data Analysis (http://www.caida.org/) use PING-like probes from geographically diverse monitors to collect RTT data to hundreds or thousands of Internet destinations. Our Internetwork Performance Measurement (IPM) project currently participates in the PingER monitoring infrastructure (http://www-iepm.slac.stanford.edu/). PingER was developed by the Internet End-to-end Performance Measurement (IEPM) group to monitor the end-to-end performance of Internet links. It consists of a number of beacon sites regularly sending ICMP probes to hundreds of targets and storing all data centrally. Most beacons and targets are hosts belonging to universities or research centers; they are connected to many different networks and backbones and have a very wide geographical distribution, so they likely represent a statistically significant sample of the Internet as a whole.
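As a concrete illustration of the kind of probe that PingER automates, the following minimal sketch (not the PingER code itself) measures RTTs by invoking the system ping utility and parsing its output; the target hostname is a placeholder, and a real implementation crafting raw ICMP packets directly would require elevated privileges.

    # Minimal sketch of a PING-style RTT probe (not the PingER code itself).
    # It invokes the system `ping` utility rather than crafting raw ICMP
    # packets, which would require administrative privileges.
    import re
    import subprocess

    def probe_rtt(host, count=10):
        """Send `count` ICMP echo requests to `host`; return the RTTs in ms."""
        out = subprocess.run(["ping", "-c", str(count), host],
                             capture_output=True, text=True, check=False).stdout
        # Typical reply line: "64 bytes from ...: icmp_seq=1 ttl=53 time=23.4 ms"
        return [float(t) for t in re.findall(r"time=([\d.]+) ms", out)]

    if __name__ == "__main__":
        rtts = probe_rtt("example.org")   # placeholder target host
        if rtts:
            print("RTT_min = %.1f ms, RTT_av = %.1f ms" % (min(rtts), sum(rtts) / len(rtts)))
        else:
            print("no replies (host unreachable or ICMP filtered)")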
We have analyzed two years' worth of PingER data, going from April 2000 to March 2002. We have selected 3353 different beacon-target pairs, taken out of 36 beacons and 196 targets. For each pair we have considered the following metrics: the geographic distance d of the hosts (measured on a great circle), the monthly average packet loss rate r (the percentage of ICMP packets that do not reach the target point), and the monthly minimum and average round-trip-times, RTT_min and RTT_av, respectively. These data offer the opportunity to test various hypotheses on the statistical behavior of Internet performance. Each data point is the monthly summary of approximately 1450 single measurements. The geographic position of hosts is known with great accuracy for some sites, but in most cases it may be wrong by 10-20 km. Consequently, we have discarded pairs of sites that are less than this distance apart. The end-to-end delay is governed by several factors. First, digital information travels along fiber optic cables at almost exactly 2/3 the speed of light in vacuum. This gives the mnemonically very convenient value of 1 ms of RTT per 100 km of cable. Using this speed one can express the geographic distance d in light-milliseconds, obtaining an absolute physical lower bound on the RTT between sites. The actual measured RTT is (usually) larger than this value because of several factors. First, data packets often follow rather circuitous paths leading them through a number of nodes that are far from the geodesic line between the endpoints. Furthermore, each link in a given path is itself far from being straight, often following highways, railways or power lines [11]. The combination of these factors produces a purely geometrical enhancement factor of the RTT. In addition, there is a minimum processing delay δ introduced by each router along the way, of the order of 50-250 µs per hop on average, summing up to a few ms for a typical path [11]. This can be significant for very close site pairs, but is negligible for most of the paths in the PingER sample.
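The following minimal sketch computes this lower bound under the assumptions stated above: signals propagate in fiber at 2/3 of the speed of light in vacuum, i.e. roughly 200 km per millisecond one way, or 1 ms of RTT per 100 km of great-circle distance; the coordinates in the example are purely illustrative.

    # Sketch: great-circle distance between two hosts and the corresponding
    # physical lower bound on the RTT, assuming propagation at 2/3 the speed
    # of light in vacuum (1 ms of RTT per 100 km of great-circle distance).
    from math import radians, sin, cos, asin, sqrt

    EARTH_RADIUS_KM = 6371.0
    KM_PER_MS_OF_RTT = 100.0   # 2/3 c, counted over the round trip

    def great_circle_km(lat1, lon1, lat2, lon2):
        """Haversine distance in km between two (lat, lon) points in degrees."""
        p1, l1, p2, l2 = map(radians, (lat1, lon1, lat2, lon2))
        a = sin((p2 - p1) / 2) ** 2 + cos(p1) * cos(p2) * sin((l2 - l1) / 2) ** 2
        return 2.0 * EARTH_RADIUS_KM * asin(sqrt(a))

    def rtt_lower_bound_ms(lat1, lon1, lat2, lon2):
        """Absolute physical lower bound on the RTT, in ms."""
        return great_circle_km(lat1, lon1, lat2, lon2) / KM_PER_MS_OF_RTT

    # Purely illustrative coordinates for two hosts on different continents.
    d = great_circle_km(45.65, 13.77, 37.43, -122.17)
    print("d = %.0f km, RTT lower bound = %.0f ms" % (d, d / KM_PER_MS_OF_RTT))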
FIG. 1: RTT_min (ms) between 2114 host pairs (PingER data set of February 2002) as a function of their distance d (in units of 100 km). Each point corresponds to a different host pair. The line indicates the physical lower bound provided by the speed of light in transmission cables. Very large fluctuations in the RTT_min of host pairs separated by the same distance can be observed. For graphical reasons the picture frame is limited to 400 ms; however, several outliers up to 900 ms are present in the data set.
On top of this, the presence of cross traffic along the route can cause data packets to be queued in the routers. Let t_R be the sum of all processing and queueing delays due to the routers on a path. When the traffic reaches congestion, t_R becomes a very significant part of the RTT and packet loss also sets in. We have considered minimum and average values of the RTT over one-month periods. It is plausible that even on rather congested links there will be a moment in the course of a month when t_R is negligible, so RTT_min can be taken as an estimate of the best possible communication performance on the given data path, subject only to the intrinsic geometrical enhancement factor and the minimum processing delay. On the other hand, RTT_av for a given site pair is obtained by considering the average RTT over one-month periods. This takes into account also the average queueing delay and gives an estimate of the overall communication performance on the given data path.
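A minimal sketch of the monthly summary just described, assuming the raw input for one beacon-target pair is simply a list of probe RTTs in milliseconds with None marking probes that received no reply; the field names are illustrative, not the PingER data format.

    # Condense one month of probes (roughly 1450 per beacon-target pair in the
    # PingER sample) into the three metrics used in the text: RTT_min, RTT_av
    # and the packet loss rate r.  The input format is an assumption.
    def monthly_summary(probes):
        replies = [t for t in probes if t is not None]
        lost = len(probes) - len(replies)
        return {
            "rtt_min_ms": min(replies) if replies else None,
            "rtt_av_ms": sum(replies) / len(replies) if replies else None,
            "loss_rate_pct": 100.0 * lost / len(probes) if probes else None,
        }

    # Example: three replies and one lost packet give RTT_min = 20 ms,
    # RTT_av = 25 ms and r = 25%.
    print(monthly_summary([20.0, 25.0, 30.0, None]))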
We studied the level of correlation between the geographic distance and the RTT_min and RTT_av of source-destination pairs. In Fig. 1 we report the obtained relationship for RTT_min, compared with the solid line representing, at each distance, the RTT corresponding to the speed of light in optic fibers. While it is possible to observe a linear correlation of RTT_min with the physical distance of hosts, the data are extremely scattered. RTT_av presents a qualitatively very similar behavior, and it is worth remarking that both plots are in good agreement with similar analyses obtained for different data sets [8, 9, 10]. While several qualitative features of this plot provide insight into the geographical distribution of hosts and their connectivity, it misses a quantitative characterization of the intrinsic fluctuations of performance and their statistical properties.
FIG. 2: Cumulative distributions of the round-trip-times normalized by the actual distance d between host pairs, shown in double logarithmic scale (ln P_cum versus ln τ) for the April 2001 and February 2002 data sets. The linear behavior in the double logarithmic scale indicates a broad distribution with power-law behavior. (a) In the case of the normalized minimum round-trip-times τ_min, the slope of the reference line is 2.0. (b) In the case of the normalized average round-trip-times τ_av, the reference line has a slope of 1.5. The insets of (a) and (b) report the distributions obtained for the Gloperf data set. In both cases we obtain power-law behaviors in good agreement with those obtained for the PingER data sets (see Tab. I).
A more significant characterization of the end-to-end performance is obtained by normalizing the latency time by the geographical distance between hosts. This defines the absolute performance metrics τ_min = RTT_min/d and τ_av = RTT_av/d, which represent the minimum and average latency time per unit distance, i.e. the inverse of the overall communication velocity (note that if we measure d in light-milliseconds, τ_min and τ_av are actually dimensionless). These metrics allow us to meaningfully compare the performance between pairs of hosts at different geographical distances.
FIG. 3: Probability density P(r) for the occurrence of packet loss rate r on beacon-target pair transmissions, shown in double logarithmic scale (ln P(r) versus ln r). The zero on the x axis corresponds to a 1% packet loss rate. Note that the distribution has a linear behavior in the double logarithmic scale, indicating a power-law behavior. The reference line has a slope of 1.2.
The highly scattered plot of Fig. 1 indicates that end-to-end performance fluctuates conspicuously over the whole range of geographic distances. In particular, looking at collections of host pairs at approximately the same geographical distance, we find latency times varying by up to two orders of magnitude. The best way to characterize the level of fluctuations in latency times is the probability P(τ_min) and P(τ_av) that a pair of hosts exhibits a given τ_min and τ_av, respectively. In contrast with the usual exponential or Gaussian distributions, for which there is a well-defined scale, we find that the data closely follow a straight line in a double logarithmic plot for at least one or two orders of magnitude, defining a power-law behavior P(τ_min) ∼ τ_min^(-α_min) and P(τ_av) ∼ τ_av^(-α_av). In Fig. 2 we show the cumulative distributions P_cum(τ) = ∫_τ^∞ P(τ') dτ' obtained from the PingER data. If the probability density distribution is a power law P(τ) ∼ τ^(-α), the cumulative distribution preserves the algebraic behavior and scales as P_cum(τ) ∼ τ^(-(α-1)). In addition, it has the advantage of being considerably less noisy than the original distribution. From the behavior of Fig. 2, a best fit of the linear region in the double logarithmic representation yields the scaling exponents α_min ≃ 3.0 and α_av ≃ 2.5. It is worth remarking that the truncation of the power-law behavior at large values is a natural effect implicitly present in every real-world data set and is likely due to incomplete statistical sampling of the distribution. Power-law distributions are characterized by scale-free properties, i.e. unbounded fluctuations and the absence of a meaningful characteristic length usually associated with the peak of the probability distribution. In such a case, the mean of the distribution and the corresponding averages are poorly significant, since fluctuations are gigantic and there are non-negligible probabilities of having very large τ_min and τ_av compared to the average values in the whole system. In other words, Internet performance is extremely heterogeneous and it is impossible to infer local properties from average quantities.
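The following sketch reproduces, under stated assumptions, the analysis outlined above: each pair's RTT_min is normalized by its distance (using the 1 ms of RTT per 100 km conversion quoted earlier, so that τ_min is dimensionless), the empirical cumulative distribution P_cum(τ) is built, and the exponent α_min is estimated from the slope of the tail in log-log coordinates. The data arrays are placeholders, not the PingER data.

    # Sketch of the distribution analysis: normalize RTT_min by distance,
    # build the empirical cumulative (survival) distribution P_cum(tau) and
    # estimate the power-law exponent from the slope of its tail in log-log
    # coordinates.  Arrays below are placeholders; the normalization assumes
    # 1 ms of RTT per 100 km, consistent with the lower bound quoted above.
    import numpy as np

    def tail_exponent(tau, tail_fraction=0.5):
        """Fit P_cum(tau) ~ tau^-(alpha-1) on the upper tail; return alpha."""
        tau = np.sort(np.asarray(tau))
        p_cum = 1.0 - np.arange(len(tau)) / len(tau)   # fraction of samples >= tau_i
        cut = int(len(tau) * (1.0 - tail_fraction))    # keep only the largest values
        slope, _ = np.polyfit(np.log(tau[cut:]), np.log(p_cum[cut:]), 1)
        return 1.0 - slope                             # slope = -(alpha - 1)

    # Placeholder data: per-pair minimum RTTs (ms) and great-circle distances (km).
    rtt_min_ms = np.array([15.0, 40.0, 120.0, 300.0, 25.0, 80.0])
    dist_km = np.array([900.0, 2000.0, 4000.0, 6000.0, 1500.0, 1000.0])
    tau_min = rtt_min_ms / (dist_km / 100.0)           # dimensionless normalized latency
    print("estimated alpha_min:", tail_exponent(tau_min))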
The origin of scale-free behavior is usually associated with critical cooperative dynamical effects. Critical and scale-free behavior has been observed and characterized in queueing properties at router interfaces, probably affecting conspicuously the distribution of τ_av. It is, however, unclear why scale-free properties are observed also in the distribution of τ_min. In this case traffic effects should be negligible, and it is well known that the distribution of hop counts between hosts has a well-defined peak and no fat tails [10]. On the contrary, we find that minimum latency times are distributed over more than two orders of magnitude. Potentially, the wiggliness of cables, Internet connectivity and hardware heterogeneities might be playing a role in the observed performance distribution.

Data set    α_min       α_av        ⟨τ_min⟩   ⟨τ_av⟩
April '00   2.7 ± 0.2   2.2 ± 0.2   3.7       6.6
Feb. '01    2.9 ± 0.2   2.4 ± 0.2   3.6       6.6
Feb. '02    3.0 ± 0.2   2.5 ± 0.2   3.1       5.3
Gloperf     2.7 ± 0.2   2.4 ± 0.2   5.4       7.8

TABLE I: The table shows the improving performance of the PingER data sample along the years. As an independent check, we report the values obtained from the analysis of the data sample of the Gloperf project.
It is worth remarking that a tendency toward improved performance is observed over the two-year period of data collection. Table I shows that the averages over all site pairs, ⟨τ_min⟩ and ⟨τ_av⟩, decrease steadily, whereas the exponents α_min and α_av increase, signalling a faster decay of the distribution tails. We can consider the improvement of performance as a byproduct of the technological drift to better lines and routers. On the other hand, the large fluctuations present in Internet performance appear to be a stable and general feature of the statistical analysis. In order to have an independent check of the PingER results, we have also considered the Gloperf data set that was used in [8]. We have extracted a set of parameter values for each of 650 unique site pairs in the sample and analyzed the statistics. These results are also reported in Table I. Although the averages depend on the specific characteristics of the sample (size, world region etc.) and differ significantly from the PingER case, the existence of power-law tails and the values of the exponents seem to be confirmed. These exponents can thus be considered as one of the few, and sought after, reliable and invariant properties of the Internet [12].
Finally, further evidence of large fluctuations in Internet performance is provided by the analysis of the packet loss data. Also in this case we are interested in the probability P(r) that a certain rate r of packet loss occurs on any given pair. We have analyzed the monthly average packet loss between PingER beacon-target pairs. In Fig. 3 we report the probability P(r) as a function of r. The plot shows an algebraically decaying distribution that can be well approximated by a power-law behavior P(r) ∼ r^(-γ) with γ = 1.2 ± 0.2. The slowly decaying probability of large packet loss rates is another signature of the very heterogeneous performance of the Internet. The results presented here have implications for the evaluation of performance trends. Models for primary performance factors must include the high heterogeneity observed in real data. Time and scale extrapolations of Internet performance can be seriously flawed if only average properties are considered. It is likely that in the future we will observe an improvement of the average end-to-end performance due to increased bandwidth and router speed, but the real improvement of the Internet as a whole would correspond to reducing the huge statistical fluctuations observed nowadays. On a more theoretical side, the explanation and formulation of microscopic models at the origin of the scale-free behavior of Internet performance appear challenging, to say the least.
We thank C. Lee for sending us the Gloperf raw data. We are grateful to L. Carbone, F. Coccetti, L. Cottrell, P. Dini, Y. Moreno, R. Pastor-Satorras and A. Vázquez for helpful comments and discussions. This work has been supported by the Internetwork Performance Measurement (IPM) project of Istituto Nazionale di Fisica Nucleare, and by the European Commission - FET Open Project COSIN IST-2001-33555.
[1] Faloutsos, M., Faloutsos, P. & Faloutsos, C. ACM SIGCOMM '99, Comput. Commun. Rev. 29, 251-262 (1999).
[2] Govindan, R. & Tangmunarunkit, H., in Proceedings of IEEE INFOCOM 2000, Tel Aviv, Israel (IEEE, Piscataway, N.J., 2000).
[3] Broido, A. & Claffy, K. C., in SPIE International Symposium on Convergence of IT and Communication (Denver, CO, 2001).
[4] Pastor-Satorras, R., Vázquez, A. & Vespignani, A. Phys. Rev. Lett. 87, 258701 (2001); Vázquez, A., Pastor-Satorras, R. & Vespignani, A. Phys. Rev. E 65, 066130 (2002).
[5] Willinger, W., Govindan, R., Jamin, S., Paxson, V. & Shenker, S. Proc. Natl. Acad. Sci. 99, 2573-2580 (2002).
[6] Barabási, A.-L. & Albert, R. Rev. Mod. Phys. 74, 47-97 (2002).
[7] Paxson, V. IEEE/ACM Trans. Networking 5, 601-615 (1997).
[8] Lee, C. & Stepanek, J. On future global grid communication performance. 10th IEEE Heterogeneous Computing Workshop, May 2001.
[9] Huffaker, B., Fomenkov, M., Moore, D., Nemeth, E. & Claffy, K. http://www.caida.org/outreach/papers/2000/asia paper/
[10] Huffaker, B., Fomenkov, M., Moore, D. & Claffy, K. Proceedings of the PAM 2001 Conference, Amsterdam, 23-24 April 2001.
[11] Bovy, C., Mertodimedjo, H.T., Hooghiemstra, G., Uijterwaal, H. & Van Mieghem, P. Proceedings of the PAM 2002 Conference, Fort Collins, Colorado, 25-26 March 2002.
[12] Floyd, S. & Paxson, V. IEEE/ACM Transactions on Networking 9, 392-403 (2001).