Constraint-Based Geolocation of Internet Hosts
Bamba Gueye, Artur Ziviani, Member, IEEE, Mark Crovella, Member, IEEE, and Serge Fdida, Member, IEEE
Abstract—Geolocation of Internet hosts enables a new class
of location-aware applications. Previous measurement-based ap-
proaches use reference hosts, called landmarks, with a well-known
geographic location to provide the location estimation of a target
host. This leads to a discrete space of answers, limiting the number
of possible location estimates to the number of adopted land-
marks. In contrast, we propose Constraint-Based Geolocation
(CBG), which infers the geographic location of Internet hosts
using multilateration with distance constraints to establish a
continuous space of answers instead of a discrete one. However,
to use multilateration in the Internet, the geographic distances
from the landmarks to the target host have to be estimated based
on delay measurements between these hosts. This is a challenging
problem because the relationship between network delay and
geographic distance in the Internet is perturbed by many factors,
including queueing delays and the absence of great-circle paths
between hosts. CBG accurately transforms delay measurements to
geographic distance constraints, and then uses multilateration to
infer the geolocation of the target host. Our experimental results
show that CBG outperforms previous geolocation techniques.
Moreover, in contrast to previous approaches, our method is able
to assign a confidence region to each given location estimate. This
allows a location-aware application to assess whether the location
estimate is sufficiently accurate for its needs.
Index Terms—Delay measurement, geolocation, internet, multi-
lateration, position measurement.
Novel location-aware applications could be enabled by an
efficient means of inferring the geographic location of In-
ternet hosts. Examples of such location-aware applications in-
clude targeted advertising on web pages, automatic selection of
a language to display content, restricted content delivery fol-
lowing regional policies, and authorization of transactions only
when performed from pre-established locations. Inferring the
location of Internet hosts from their IP addresses is a challenging
problem because there is no direct relationship between the IP
address of a host and its geographic location [1].
Previous work on the measurement-based geolocation of Internet hosts [2], [3] uses the positions of reference hosts, called landmarks, with well-known geographic location as the possible location estimates for the target host. This leads to a discrete space of answers; the number of answers is equal to the number of reference hosts, which can limit the accuracy of the resulting location estimation. This is because the closest reference host may still be far from the target.

Manuscript received September 30, 2004; revised July 11, 2005 and October 27, 2005; approved by IEEE/ACM TRANSACTIONS ON NETWORKING Editor R. Caceres. This work was supported in part by FAPERJ, CNPq, Euronetlab, the National Science Foundation under Grants CCR-0325701 and ANI-0322990, and by a grant from Intel. A short version of this paper was presented at the ACM/SIGCOMM Internet Measurement Conference (IMC’04), October 2004.

B. Gueye and S. Fdida are with the Laboratoire d’Informatique de Paris 6 (LIP6/CNRS), Université Pierre et Marie Curie (Paris 6), Paris, France.

A. Ziviani is with the National Laboratory for Scientific Computing (LNCC), Petrópolis, Brazil.

M. Crovella is with the Department of Computer Science, Boston University, Boston, MA 02215 USA.

Digital Object Identifier 10.1109/TNET.2006.886332
To overcome this limitation, we propose the Constraint-Based
Geolocation (CBG) approach, which infers the geographic lo-
cation of Internet hosts using multilateration. Multilateration
refers to the process of estimating a position using a sufficient
number of distances to some fixed points. As a result, multilat-
eration establishes a continuous space of answers instead of a
discrete one. We use a set of landmarks to estimate the location
of other hosts. The fundamental idea is that given geographic
distances to a given target host from the landmarks, an estima-
tion of the location of the target host would be feasible using
multilateration, just as the Global Positioning System (GPS) [4]
does. However, to use multilateration in the Internet, geographic
distances from the landmarks to the target host have to be esti-
mated based on delay measurements between these hosts. This
is a challenging task because delay measurements cannot always
be transformed accurately to geographic distances, since net-
work delay is not necessarily well correlated with geographic
distance. This happens because the relationship between net-
work delay and geographic distance in the Internet is perturbed
by many factors, including queueing delays, violations of tri-
angle inequality [5], and the absence of great-circle paths be-
tween hosts [6].
In recent years, several proposals, such as GNP [7], Virtual
Landmarks [8], and Vivaldi [9], have addressed the evaluation
of network proximity between Internet hosts using coordinate
systems. Nevertheless, distance in the context of network prox-
imity problems refers to the network delay between a pair of
Internet hosts. In contrast, for geolocating hosts, distances refer
to actual geographic distances between hosts. Therefore, to the
best of our knowledge, CBG is the first effort to use multilater-
ation for geolocating Internet hosts.
A key element of CBG is its ability to accurately transform
delay measurements into distance constraints. The starting point
is the fact that digital information travels along fiber optic ca-
bles at almost exactly 2/3 the speed of light in a vacuum [10].
This means that any particular delay measurement immediately
provides an upper bound on the great-circle distance between
the endpoints. The upper bound is the delay measurement di-
vided by the speed of light in fiber. Thus, from the standpoint
of a particular pair of endpoints, there is some theoretical min-
imum delay for packet transmission, dictated by the great-circle
distance between them. Therefore, no matter the reason (e.g.
queueing delays, violations of the triangle inequality, absence
of great-circle paths between hosts, and so on), the actual mea-
sured delay between them involves only an additive distortion.
1063-6692/$20.00 © 2006 IEEE
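This upper-bound computation can be stated directly. A minimal sketch (the constant and function names are ours; the 200 km/ms one-way figure follows from the 2/3-of-c propagation speed in fiber quoted above):

```python
# Upper bound on great-circle distance implied by a single RTT sample.
# Signals in fiber travel at ~2/3 the speed of light in vacuum,
# i.e. roughly 200 km per millisecond of one-way delay.

SPEED_IN_FIBER_KM_PER_MS = 200.0  # ~ (2/3) * 299792.458 km/s, in km per ms

def max_distance_km(rtt_ms: float) -> float:
    """Farthest the endpoints could be apart given a measured RTT."""
    one_way_ms = rtt_ms / 2.0
    return one_way_ms * SPEED_IN_FIBER_KM_PER_MS
```

For example, a 30 ms RTT caps the great-circle distance between the endpoints at 3000 km; queueing and circuitous routing can only shrink the true distance relative to this bound, never grow it.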
However, if CBG were to use simple delay measurements
directly to infer distance constraints, it would not be very ac-
curate. For accurate results, it is important to estimate and re-
move as much of the additive distortion as possible. CBG does
this by self-calibrating the delay measurements taken from each
measurement point. This is done in a distributed manner as ex-
plained in Section III. After self-calibration, CBG can more
accurately transform a set of measured delays to a target into
distance constraints. Self-calibration deals with the several reasons that cause the measured delay to deviate from the theoretical minimum delay corresponding to the great-circle distance between hosts. Some of these reasons, such as circuitous routing, localized delay, and shared paths, are further discussed
in Section IV-F. CBG then uses multilateration with these distance constraints to establish a geographic region that contains the target host. In our experimental results, this region always contains the target host; identifying this region is CBG's principal output. Given the target region, a reasonable guess as to the host's location is the region's centroid, which is what CBG uses as a point estimate of the target's position.
In contrast to previous approaches, CBG is able to assign a confidence region to the location estimate. This allows a location-aware application to assess whether the estimate is sufficiently accurate for its needs. A location server that uses CBG may be queried by a host that wants to learn its own location as well as by a web server that desires to locate its clients, for instance. Thus, using a CBG-based geolocation service, both server- and host-driven protocols are possible.
We evaluate CBG using real-life datasets and a PlanetLab
[11] deployment with hosts that are geographically distributed
through the continental U.S. and Western Europe. These
datasets comprise 95 landmarks in the U.S. and 42 landmarks
in Western Europe. Results for both datasets suggest that a certain number of landmarks, typically about 30, is needed to level off the mean error distance. Our experimental results are promising and show that CBG outperforms previous geolocation techniques. The median error distance is below 25 km for the Western Europe dataset and below 100 km for the U.S. dataset. For the majority of evaluated target hosts, the obtained confidence regions allow a resolution at the regional level. Furthermore, from the obtained results, we are also able to indicate some reasons that lead to inaccurate location estimates, including fixed delay and the sharing of paths by the measurements. Concerning the PlanetLab deployment, the median error distance is below 50 km for target hosts located in Europe and below 130 km for U.S. target hosts.
This paper is organized as follows. Section II discusses the main motivations for geolocating Internet hosts, reviews the related work in this field, and points out the contributions of CBG in contrast to previous approaches. In Section III, we introduce CBG and its methodology, which uses multilateration with geographic distance constraints based on delay measurements to infer the location of Internet hosts. Following that, we present results for datasets in Section IV and for PlanetLab experiments in Section V. We discuss some issues related to geolocation techniques in Section VI. Finally, we conclude and present some research perspectives in Section VII.
A. Motivation
We expect the wide availability of location information to
enable the development of location-aware applications that can
be useful to both private and corporate users. For example:
Targeted advertising on web pages: Online consumers may
have different regional preferences based on where they
live. Being able to locally tailor products, marketing strate-
gies, and content confers a business advantage;
Restricted content delivery: Following regional policies,
a geographic location service can determine which client
has access to content. Similarly, enforcement of localized
regulation is enabled;
Location-based security check: If authorized locations are
known, an e-commerce transaction that is requested from
elsewhere might generate warnings on atypical or unautho-
rized behavior of a customer.
A large range of location-aware applications may be envisaged based on an IP address to location mapping service, benefiting end users as well as network management. Furthermore, different location-aware applications may have different requirements for the accuracy of the location information. Our goal is thus to provide a methodology that is able to geolocate Internet hosts with reasonable accuracy while associating a confidence region with the given answer.
B. Related Work
An approach based on using additional DNS records to pro-
vide a geographic location service of Internet hosts is proposed
by Davis et al. in RFC 1876 [12]. Nevertheless, the adoption of
this approach has been limited since it requires changes in the
DNS records and administrators have little motivation to register
new location records. Tools such as [13] and [14] query Whois
databases in order to obtain the location information recorded
therein to infer the geographic location of a host. This informa-
tion, however, may be inaccurate or stale. Moreover, if a large
and geographically dispersed block of IP addresses is allocated
to a single entity, the Whois databases may contain just a single
entry for the entire block.
There are also some geolocation services based on an exhaustive tabulation between IP address ranges and their locations. This is the case for some projects [15], [16] and commercial services [17], [18]. It is hard to compare this approach with our work because the algorithms are proprietary. In any case, exhaustive tabulation is difficult to manage and to keep up to date.
Padmanabhan and Subramanian [3] investigate three different techniques to infer the geographic location of an Internet host. The first technique infers the location of a host based on the DNS name of the host or another nearby node. This DNS-based technique is the basis of GeoTrack [3], GTrace [19], and the SarangWorld Traceroute project [20]. Quite often network operators assign geographically meaningful names to routers, presumably for administrative convenience. For example, a router name may indicate that the router is located in Paris, France. Nevertheless, not all names contain an indication of location. Since there is no standard, operators commonly develop their own rules for naming their routers even when the names are geographically meaningful. Thus, parsing rules to recognize a location from a node name must be specific to each operator, imposing great challenges in the creation and management of such rules. Further, since the position of the last recognizable router in the path toward the host to be located is used to estimate the position of this host, a lack of accuracy is also expected.
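A hedged sketch of what such a per-operator parsing rule might look like (the hostnames, city codes, and table below are invented for illustration; real DNS-based tools maintain far larger, operator-specific rule sets):

```python
import re
from typing import Optional

# Hypothetical table mapping city codes embedded in router names to cities.
CITY_HINTS = {"par": "Paris", "nyc": "New York", "lon": "London"}

def guess_city(router_name: str) -> Optional[str]:
    """Scan dot- and hyphen-separated tokens of a router name for a
    known city code; return None when no token matches."""
    for token in re.split(r"[.\-]", router_name.lower()):
        if token in CITY_HINTS:
            return CITY_HINTS[token]
    return None
```

The fragility discussed above is visible even in this toy: a name like "core7.example.net" carries no recognizable token, and a code used differently by another operator would silently yield a wrong city.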
The second technique splits the IP address space into clusters
such that all hosts with an IP address within a cluster are likely to
be co-located. Knowing the location of some hosts in the cluster
and assuming they are in agreement, the technique infers the
location of the entire cluster. An example of such a technique is
GeoCluster [3]. This technique, however, relies on information
that is partial and possibly inaccurate. The information is partial
because it comprises location information for a relatively small
subset of the IP address space. Moreover, such information may
be inaccurate because the databases rely on data provided by
users, which may be unreliable.
The third technique (GeoPing) is the closest to ours, as it is
based on exploiting a possible correlation between geographic
distance and network delay [3]. The location estimation of a host
is based on the assumption that hosts with similar network de-
lays to some fixed probe machines tend to be located near each
other. This assumption is similar to the one exploited by wireless
positioning systems such as RADAR [21] concerning the rela-
tionship between signal strength and distance. Therefore, given
a set of landmarks with a well-known geographic location, the
location estimate for a target host is the location of the landmark
presenting the most similar delay pattern to the one observed for
the target host.
In GeoPing-like methods, the number of possible location es-
timates is limited to the number of adopted landmarks, resulting
in a discrete space of answers. As a consequence, the accuracy
of this discrete space system is directly related to the number
and placement of the adopted landmarks [2]. Thus, in order to
increase the accuracy of techniques like GeoPing, it is neces-
sary to add additional landmarks. In [22], a measurement-based
geolocation technique with a discrete space of answers is evalu-
ated with respect to methods for assessing the similarity among
the gathered delay patterns. In Section IV-C, we compare CBG
with DNS-based and GeoPing-like methods and show that CBG
outperforms them.
C. Contributions
In this section, we summarize the contributions of CBG with
respect to related work in geolocation of Internet hosts:
CBG establishes a dynamic relationship between IP ad-
dresses and geographic location. This dynamic relation-
ship results from a measurement-based approach in which
landmarks cooperate in a distributed and self-calibrating
manner, allowing CBG to adapt itself to time-varying net-
work conditions. This contrasts with previous work that
relies on a static relationship by using queries on Whois
databases, exhaustive tabulation, or unreliable information
provided by users;
A major contribution of CBG is to point out that delay
measurements can be transformed to geographic distance
constraints to be used in multilateration, potentially leading
to more accurate location estimates of Internet hosts;
By using multilateration with distance constraints, CBG
offers a continuous space of answers instead of a discrete
one as do previous measurement-based approaches;
CBG assigns a confidence region to each location estimate, allowing location-aware applications to assess whether the location estimate has enough resolution with respect to their needs.
In this section, we describe how the CBG methodology is able to transform delay measurements to distance constraints and apply these in a multilateration process to geolocate Internet hosts. The use of distance constraints is the key CBG insight for dealing with the difficulty of accurately transforming delay measurements to geographic distances in the Internet.
A. Multilateration With Geographic Distance Constraints
The position of a point can be estimated using a sufficient
number of distances or angle measurements to some fixed points
whose positions are known. When dealing with distances, this
process is called multilateration. Likewise, when dealing with
angles, it is called multiangulation. Strictly speaking, triangu-
lation refers to an angle-based position estimation process with
three reference points. However, quite often the same term is
adopted for any distance or angle-based position estimation. In
spite of the popularity of the term triangulation, we adopt the
more precise term multilateration in this paper.
The main problem that stems from using multilateration is the
accurate measurement of the distances between the target point
to be located and the reference points. For example, the Global
Positioning System (GPS) [4] uses multilateration to some satel-
lites to estimate the position of a given GPS receiver. In the case
of GPS, the distance between the GPS receiver and a satellite
is measured by timing how long it takes for a signal sent from
the satellite to arrive at the GPS receiver. Precise measurement
of time and time interval is at the heart of GPS accuracy. Each
satellite typically has atomic clocks on board and receivers use
inexpensive quartz oscillators. Therefore, in the case of GPS,
multilateration is performed with "perfect" distances (i.e., with
negligible errors) from time measurements and hence very ac-
curate position estimations are feasible. In contrast to GPS, it
is a challenging problem to transform Internet delay measure-
ments to geographic distances accurately. This is likely to be
the reason why direct multilateration has remained so far unex-
ploited for the purposes of geolocating Internet hosts. Hereafter,
we explain the CBG design principles that enable the multilat-
eration with geographic distance constraints.
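With error-free distances, as in the GPS case above, multilateration reduces to solving a small linear system. A minimal planar sketch of this idea (the anchor coordinates and distances are illustrative assumptions; geolocation on the Earth works on the sphere, and CBG works with inequality constraints rather than exact distances):

```python
def trilaterate(anchors, dists):
    """Recover (x, y) in the plane from three anchor points and exact
    distances, by subtracting the first circle equation from the others,
    which linearizes the system, and solving 2x2 by Cramer's rule."""
    (x1, y1), (x2, y2), (x3, y3) = anchors
    r1, r2, r3 = dists
    # 2*(xi - x1)*x + 2*(yi - y1)*y = r1^2 - ri^2 + xi^2 - x1^2 + yi^2 - y1^2
    a1, b1 = 2 * (x2 - x1), 2 * (y2 - y1)
    c1 = r1**2 - r2**2 + x2**2 - x1**2 + y2**2 - y1**2
    a2, b2 = 2 * (x3 - x1), 2 * (y3 - y1)
    c2 = r1**2 - r3**2 + x3**2 - x1**2 + y3**2 - y1**2
    det = a1 * b2 - a2 * b1
    return (c1 * b2 - c2 * b1) / det, (a1 * c2 - a2 * c1) / det
```

With perfect distances the solution is exact, which is precisely why GPS can be so accurate; the rest of this section deals with the fact that Internet delays yield only overestimated distances.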
Consider a set of K landmarks L_1, L_2, ..., L_K (reference hosts with a well-known geographic location). For the location of Internet hosts using multilateration, we tackle the problem of estimating the geographic distance from the target host to be located to these landmarks, given the delay measurements to the landmarks. From a measurement viewpoint, the end-to-end delay over a fixed path can be split into two components: a deterministic (or fixed) delay and a stochastic delay [23]. The deterministic delay is composed of the minimum processing time at each router, the transmission delay, and the propagation delay. This deterministic delay is fixed for any path. The
stochastic delay comprises the queueing delay at the interme-
diate routers and the variable processing time at each router that
exceeds the minimum processing time. Besides the stochastic
delay, the conversion from delay measurements to geographic
distance is also distorted by other sources. The effects
of different sources of distortion on the relationship between
network delay and geographic distance are further discussed in
Section IV-F.
The fundamental insight for the CBG methodology is that, no matter the reason, delay is only distorted additively with respect to the time for light in fiber to pass over the great-circle path. Therefore, we are interested in benefiting from this invariant by developing a method to estimate geographic distance constraints from these additively distorted delay measurements.
How CBG uses this insight to infer the geographic distance
constraints between the landmarks and the target host from
delay measurements is detailed in Section III-B. It is also
shown that as a consequence of the additive delay distortion,
the resulting geographic distance constraints are generally
overestimated with respect to the real distances.
Fig. 1 illustrates the multilateration in CBG using a set of three landmarks L_1, L_2, L_3 in the presence of some additive distance distortion due to imperfect measurements. Each landmark L_i intends to infer its geographic distance constraint to a target host τ with unknown geographic location. Nevertheless, the inferred geographic distance constraint is actually g_{iτ} + γ_{iτ}, i.e., the real geographic distance g_{iτ} plus an additive geographic distance distortion represented by γ_{iτ}. This purely additive distance distortion results from the eventual presence of some additive delay distortion. As a consequence of having additive distance distortion, the location estimate of the target host should lie somewhere within the gray area (cf. Fig. 1) that corresponds to the intersection of the overestimated geographic distance constraints from the landmarks to the target host.
B. From Delay Measurements to Distance Constraints
Before we introduce how CBG converts delay measurements to geographic distance constraints, let us first observe a sample scatter plot relating geographic distance and network delay. This sample, shown in Fig. 2, is taken from the experiments described in Section IV. The x axis is the geographic distance and the y axis is the network delay between a given landmark L_i and the remaining landmarks. Therefore, dots represent an observed relationship between geographic distance and network delay within the network as seen by landmark L_i with respect to each other landmark in the set. The meanings of "baseline" and "bestline" in Fig. 2 are explained in this section.
Recent work [2], [3] investigates the correlation coefficient found within this kind of scatter plot, deriving a least-squares fitting line to characterize the relationship between geographic distance and network delay. In contrast, we consider the reasons why points are scattered in the plot, and argue that what is important is not the least-squares fit, but the tightest lower linear bound.

Fig. 1. Multilateration with geographic distance constraints.

Fig. 2. Sample scatter plot of geographic distance and network delay.
Based on these considerations, we propose a novel approach to establish a dynamic relationship between network delay and geographic distance. To illustrate this approach, suppose the existence of great-circle paths between the landmark L_i and each one of the remaining landmarks. Further, consider also that, when traveling on these great-circle paths, data are only subject to the propagation delay of the communication medium. In this perfect case, the relationship would be a straight line given by the slope-intercept form y = mx + b, where b = 0, since there are no fixed delays, and m is determined only by the speed at which bits travel in the communication medium. As already noted, digital information travels along fiber optic cables at almost exactly 2/3 the speed of light in vacuum [10]. This gives a very convenient rule of 1 ms round-trip time (RTT) per 100 km of cable. Such a relationship may be used to obtain an absolute physical lower bound on the RTT (or one-way delay) between sites whose geolocations are well known. This lower bound is shown as the "baseline" in Fig. 2. In this idealized case, we could simply use this convenient rule to extract the accurate geographic distance between sites from delay measurements in a straightforward manner. Nevertheless, in practice, these great-circle paths rarely exist. Therefore, we have to deal with paths that deviate from this idealized model for several reasons, including queueing delay and the lack of great-circle paths between hosts.
As stated in Section III-A, the main insight behind CBG is that the combination of different sources of delay distortion with respect to the perfect great-circle case only produces an additive delay on top of the theoretical minimum delay associated with the great-circle distance. We thus model the relationship between network delay and geographic distance using delay measurements in the following way. We define the "bestline" for a given landmark L_i as the line that is closest to, but below, all data points and has a non-negative intercept, since it makes no sense to consider negative delays. A positive intercept in the bestline reflects the presence of some fixed delay. Note that each landmark computes its own bestline with respect to all other landmarks. Therefore, the bestline can be seen as the line that captures the least distorted relationship between geographic distance and network delay from the viewpoint of each landmark. The distance of each data point to the bestline corresponds to the presence of some source of extra additive distortion with respect to the best-observed case, i.e., the bestline. The region separating the bestline and the baseline (cf. Fig. 2) represents the observed gap between the current relationship of geographic distances and network delays within the network and the idealized case.
The finding of the bestline is formulated as a linear programming problem. For a given landmark L_i, we have the network delay d_{ij} and the geographic distance g_{ij} toward each other landmark L_j, where j ≠ i. We need to find for each landmark L_i the slope m_i and the intercept b_i that determine the bestline, given by the slope-intercept form y = m_i x + b_i. The condition that the bestline for each landmark L_i should lie below all data points (g_{ij}, d_{ij}) defines the feasible region where a solution should lie:

    m_i g_{ij} + b_i ≤ d_{ij},  for all j ≠ i,  with b_i ≥ 0.    (1)

The objective function to minimize the distance between the line with non-negative intercept and all the delay measurements is stated as

    min Σ_{j≠i} [d_{ij} − (m_i g_{ij} + b_i)] / m̃,    (2)

where m̃ is the slope of the baseline. We use (2) to find the solution m_i and b_i from (1) that determines the bestline for each landmark L_i. Each landmark L_i then uses its own bestline to convert the delay measurement d_{iτ} to the target host τ into a geographic distance. Thus, the estimated geographic distance constraint ĝ_{iτ} between a landmark L_i and the target host τ is derived from the measured delay d_{iτ} using the bestline of the landmark L_i as follows:

    ĝ_{iτ} = (d_{iτ} − b_i) / m_i.    (3)
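This linear program has only two variables per landmark (a slope and an intercept), so for illustration it can be solved exactly by enumerating candidate lines through pairs of calibration points, plus the steepest feasible line through the origin; one of these must attain the optimum of a two-variable LP. A self-contained sketch (the calibration points below are invented; a production system would use an LP solver over live inter-landmark measurements):

```python
def bestline(points, eps=1e-9):
    """Find (m, b) for the line y = m*x + b that lies below all
    (distance, delay) points with b >= 0 and minimizes the total slack
    sum_j [d_j - (m*g_j + b)].  Dividing the slack by the baseline slope,
    as the paper's objective does, would not change the minimizer."""
    n = len(points)
    candidates = []
    for i in range(n):
        gi, di = points[i]
        if gi > 0:
            candidates.append((di / gi, 0.0))        # line through the origin
        for j in range(i + 1, n):
            gj, dj = points[j]
            if gi != gj:
                m = (dj - di) / (gj - gi)
                candidates.append((m, di - m * gi))  # line through both points
    feasible = [
        (m, b) for m, b in candidates
        if m > 0 and b >= -eps
        and all(m * g + b <= d + eps for g, d in points)
    ]
    return min(feasible,
               key=lambda mb: sum(d - (mb[0] * g + mb[1]) for g, d in points))

# Illustrative calibration points: (geographic distance km, delay ms).
points = [(100, 7.0), (500, 15.0), (1000, 30.0)]
m, b = bestline(points)

# Converting a 6 ms delay to a target into a distance constraint, as in (3):
est_km = (6.0 - b) / m
```

On this toy input the bestline passes through the two rightmost points with zero intercept, and the 6 ms measurement converts to a 200 km constraint.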
If delays between landmarks are periodically gathered, this
leads to a self-calibrating algorithm that determines how each
landmark currently observes the dynamic relationship between
network delay and geographic distance within the network.
C. Using Distributed Distance Constraints to Geolocate Hosts
CBG uses a geometric approach based on multilateration to estimate the location of a given target host τ. Each landmark L_i infers its geographic distance constraint to the target host τ, which is actually the additively distorted distance ĝ_{iτ} = g_{iτ} + γ_{iτ}, using (3). Therefore, each landmark L_i estimates that the target host τ is somewhere within the circumference of a circle C_{iτ} centered at the landmark L_i with a radius equal to the estimated geographic distance constraint ĝ_{iτ} (similar to the example of Fig. 1). Given K landmarks, the target host τ has a collection of K closed curves C_{1τ}, C_{2τ}, ..., C_{Kτ} that can be seen as an order-K Venn diagram. Out of the possible regions defined by this order-K Venn diagram for the target host τ, we are interested in the unique region R that forms the intersection of all the closed curves, given by

    R = C_{1τ} ∩ C_{2τ} ∩ ... ∩ C_{Kτ}.

The region R corresponds to the gray area of Fig. 1 that hopefully comprises the real position of the target host τ. Note that R is convex, since the regions C_{iτ} are convex, and the intersection of convex sets is itself convex. The conversion from the additively distorted delay measurements to geographic distance constraints will overestimate these distance constraints. The goal is
to assure that, since each landmark overestimates its geographic distance constraint toward the target host, there will be a region determined by the intersection of all the curves with an overestimated radius. If the baseline were used for this conversion, the geographic distances would be strongly overestimated based on the delay measurements, because these measurements are taken in a non-idealized case. This would potentially create a very large intersection region for a given target host that would provide an inaccurate location estimate for this target host. In contrast, the bestline captures the best relationship between network delay and geographic distance as currently observed within the network. Therefore, the idea behind using the bestline is to minimize the overestimation of the geographic distances by taking into account the current network conditions as constraints. Using a certain number of landmarks is intended to introduce some diversity into the bestline computation so that the bestline represents the best observed case for a set of different reference points given the network conditions.
D. Effects of Over and Underestimation of Distance
When establishing the set of closed curves for a given target host τ, there are three possible resulting situations: 1) the geographic distance constraints from all landmarks are overestimated; 2) the geographic distance constraints from all landmarks are underestimated; 3) the geographic distance constraints are overestimated for some landmarks and underestimated for the remaining landmarks, leading to a mismatch among landmarks. Fig. 3 depicts these three situations.

Fig. 3. Effects of the over and underestimation of the geographic distance constraints. (a) Overestimated distance constraints. (b) Underestimated distance constraints. (c) Mismatch.
In Fig. 3(a), geographic distance constraints are overestimated. As a consequence, CBG can determine an intersection region and use it to infer the location of the target host τ. We expect that this is the only likely situation to occur if a sufficient number of landmarks is used. Experimental results presented in Sections IV-B and V indeed confirm that distance constraints are overestimated for all considered target hosts.
If the geographic distance constraints to the target host from
all landmarks are underestimated, as shown in Fig. 3(b), the re-
gion is empty, i.e., there is no intersection region at all. This
situation happens only if the target host presents, from the view-
point of the landmarks, a smaller ratio of geographic distance
to network delay than the one represented by the bestline, i.e.,
smaller than the one observed from all landmarks. This is clearly unlikely. In this case, based on the bestline approach, CBG does not find sufficient information to infer a location estimate. As a consequence, CBG declares that a location estimation is not possible for this specific target host. This is an important property of CBG, because for several applications no location estimation at all may be better than blindly providing a geolocation estimate of the target host, as other techniques would do.
In Fig. 3(c), we illustrate a situation where two landmarks overestimate their geographic distance constraints to the target host while the third landmark underestimates its distance constraint. The mismatch in the distance constraints among the landmarks results in an intersection region that does not include the target host. This would defeat our methodology because the location estimation would be inferred as being inside the intersection region, away from the real position of the target host. We currently do not handle mismatches and leave this for future work, although we expect the mismatch situation to be unlikely. First, consider two groups of landmarks: one whose
members overestimate their geographic distance constraints to
the target host and another group wherein this distance con-
straint is underestimated. The mismatch situation happens when
the observed relationship between geographic distance and net-
work delay from these two groups toward the target host is very
unbalanced. Although we know that routing asymmetry (and, as
a consequence, capacity asymmetry) is somewhat usual in the
Internet, we believe that the differences in capacity are unlikely
to be enough to result in the mismatch situation. Moreover, the
self-calibrating nature of the CBG method incorporates in the
construction of each bestline the current network condition as
seen by the whole set of landmarks. Therefore, each landmark has a unilateral viewpoint toward the remaining landmarks, thus incorporating possible asymmetries in the network conditions.
In summary, CBG's method of transforming delay measurements into distance constraints is a constrained distance overestimation. This constrained overestimation results in an intersection region, from which CBG estimates the location of the target host. In the case that a target host presents underestimated geographic distance constraints to the landmarks, CBG is able to detect this situation and then decline to provide a location estimation. The self-calibrating nature of CBG elegantly avoids a mismatch situation in which the system would be defeated. We indeed confirm that the geographic distance constraints are overestimated in all our experiments (see Sections IV and V) and that a consistent location estimation has always been feasible in these experiments.
IV. EXPERIMENTAL EVALUATION
A. Datasets
To validate our methods, we need datasets with hosts whose
geographic locations are well known. Unfortunately, datasets
that provide the geolocation of the involved hosts are un-
common. For our experiments, we then use two datasets:
RIPE: data collected in the Test Traffic Measurements (TTM) project of the RIPE network [24]. Each RIPE host generates nearly 300 kB per day toward every other RIPE host, with an average of two packets sent per minute. The dataset we consider is composed of the 2.5 percentile of the one-way delay observed from each RIPE host to each other host in the set during a period of 10 weeks from early December 2002 until February 2003. Most RIPE hosts are located in Europe and they are all equipped with GPS cards, thus allowing their exact geographic position to be known. We then use the 42 RIPE hosts located in Western Europe (W.E.) to compose our W.E. landmark dataset. Fig. 4(a) shows the geographic distribution of the W.E. dataset.
Fig. 4. Geographic location of landmarks (not to the same scale). (a) 42 landmarks in Western Europe (RIPE dataset). (b) 95 landmarks in the continental U.S. (AMP dataset).
NLANR AMP: data collected in the NLANR Active Mea-
surement Project (AMP) [25]. Delay is sampled on average
once a minute. This leads to an average measurement load
of about 144 kB per day sent by each AMP host toward
each other AMP host. The dataset we consider is composed of the 2.5 percentile of the RTT delay between all
the participating nodes located in the continental United
States (U.S.), in a total of 95 hosts. This data was collected
on January 30, 2003 and is symmetric. The exact location
of each participating node (in pairs of latitude and longi-
tude) is also available. These 95 AMP hosts compose our
U.S. landmark dataset. Their geographic distribution is il-
lustrated in Fig. 4(b).
The minimum RTT taken on the measurements is used since it is more likely to be reflective of actual propagation delay and may filter out some of the effects of queueing and local delay. We note that no correctly-measured RTT can be an underestimate of the minimum RTT. Nevertheless, we indeed consider that some RTTs may be erroneously measured. We thus use the 2.5 percentile to avoid erroneous under-measurements of the minimum RTT.
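The percentile filtering described above can be sketched as follows. This is a generic linear-interpolation percentile between order statistics, not necessarily the exact estimator used by the TTM or AMP tooling; the function name is ours:

```python
import math

def robust_min_rtt(samples, pct=2.5):
    """Approximate the minimum RTT by the pct-th percentile of the RTT
    samples, discarding erroneously low measurements.  Uses linear
    interpolation between adjacent order statistics."""
    s = sorted(samples)
    if not s:
        raise ValueError("no RTT samples")
    k = (len(s) - 1) * pct / 100.0
    lo, hi = math.floor(k), math.ceil(k)
    if lo == hi:
        return s[lo]
    return s[lo] + (s[hi] - s[lo]) * (k - lo)
```

The point of taking a low percentile rather than the strict minimum is that a single erroneous under-measurement cannot drag the estimate below the bulk of the samples.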
The experimental datasets comprise hosts in the United States and Western Europe. The main reason for this restriction is that the datasets available to us correspond to hosts located in these regions. Nevertheless, we believe that the results we report in this paper are interesting and promising in spite of being limited to the U.S. and Western Europe.
Using the gathered delays in each dataset, we construct two delay matrices with dimensions (42 × 42) and (95 × 95), respectively. We consider all hosts in each dataset as landmarks, leading to two sets of landmarks, one per dataset.
We then find the set of bestlines, as described in Section III-B, for each element belonging to each landmark dataset. The bestline computation for each landmark is done considering only landmarks of the same dataset. The set of bestlines is determined by a slope vector and an intercept vector for each landmark dataset. After computing the bestline for each landmark in the landmark dataset, delays in each dataset are converted to geographic distance constraints by applying (3). This results in two geographic distance constraint matrices. These matrices comprise the additively distorted geographic distances between the landmarks that we use in our experiments for performance evaluation.
In our experiments, we geolocate each host one at a time using
CBG. The remaining hosts in the same dataset are then consid-
ered as landmarks to perform the location estimation of a target
host. The bestline of each landmark is computed using the set of
landmarks of each scenario, thus excluding the target host. We
stress that when we calculate the bestline for a particular exper-
iment in which we are geolocating a given target host, we do not
include this target host in the bestline calculation. We repeat this
procedure to evaluate the resulting location estimation of each
host in both U.S. and W.E. landmark datasets.
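The per-landmark conversion applied when building the distance-constraint matrices can be sketched as follows, assuming (3) takes the standard bestline form g = (d - b)/m, with (m, b) the slope and intercept of the landmark's bestline; the clamping at zero is our own guard, not from the paper:

```python
def distance_constraint(delay_ms, m, b):
    """Convert a measured delay (ms) into a geographic distance constraint
    (km), assuming the conversion in (3) has the form g = (delay - b) / m.
    Delays below the intercept b are clamped to a zero-distance constraint."""
    return max(0.0, (delay_ms - b) / m)
```

Applying this function entry-by-entry to a delay matrix, with each row using its own landmark's (m, b), yields the distance-constraint matrix used in the evaluation.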
B. Location Estimation of a Target Host
From the geographic distance constraints in the two matrices, CBG determines for each target host a set of closed curves (see Section III-C). Each curve in the set is centered at its respective landmark and has as radius the corresponding estimated geographic distance constraint. To illustrate the CBG methodology, Fig. 5 shows two example
sets of closed curves extracted from our experimental study.
Fig. 5(a) refers to the location estimation of a RIPE host in Brus-
sels, Belgium. There are 41 curves corresponding to the view-
points of the remaining landmarks in the W.E. landmark dataset.
Similarly, Fig. 5(b) presents the set of 94 closed curves used
to estimate the location of an AMP host located in Lawrence,
Kansas, USA.
The gray areas in Fig. 5(a) and (b) represent the respective regions, i.e., the intersection of all closed curves in each case.
Fig. 5. Two location estimation examples (not to the same scale). (a) RIPE host
in Brussels, Belgium. (b) AMP host in Kansas, U.S.
In our experiments, we take all hosts in the datasets and use
them one at a time to be target hosts. It is important to point out
that for all the target hosts in both landmark datasets, there is
always a region that contains the target host. This means that
CBG successfully overestimates the geographic distance constraints for all target hosts. Such a result verifies that the situation of Fig. 3(a) is indeed prevalent, at least in our experimental datasets, as postulated in Section III-D.
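The containment check underlying this result can be sketched as follows. The function names are ours, and a spherical Earth of radius 6371 km is assumed for the great-circle distance:

```python
import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (latitude, longitude)
    points given in degrees, on a spherical Earth."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def in_region(point, landmarks, constraints):
    """True if `point` (lat, lon) lies inside every closed curve, i.e.
    within each landmark's distance constraint -- membership in the
    intersection region."""
    return all(
        haversine_km(point[0], point[1], lm[0], lm[1]) <= r
        for lm, r in zip(landmarks, constraints)
    )
```

A target host is covered by the intersection region exactly when `in_region` returns True for its real coordinates; the observation above is that this held for every target host in both datasets.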
The area of the intersection region, i.e., the gray areas in Fig. 5(a) and (b), indicates the confidence region that CBG associates with each location estimate. Note that in most cases confidence regions have a relatively small area, not visible in similar plots with all closed curves (Sections IV-D and V present results on the sizes of confidence regions in our experiments). These two examples have larger confidence regions than are typical, but are chosen so that the region is sufficiently visible so as to illustrate the CBG methodology.
C. Geolocating Internet Hosts
The intersection region itself is the location estimate of CBG. Given this region, a reasonable "guess" as to the target host's location is the region's centroid. Therefore, CBG uses the centroid of the region as a point estimate of the target's position.
Fig. 6. Sample result from the polygon heuristic (not to the same scale). (a)
Locating the RIPE host in Brussels, Belgium. (b) Locating the AMP host in
Kansas, U.S.
We adopt the following heuristic to approximate the intersection region by a polygon. The resulting polygon is used to approximately measure the area of the region and to provide an estimate of the point location of the target host. To form the polygon, we consider as vertices the crossing points of the circles that belong to all circles. Since the region is convex, the polygon is an underestimate of the area of the region. For example, in Fig. 1, the vertices would be the crossing points of the dashed lines that touch the gray area, thus determining a polygon that approximates this area. Thus, we approximate the region by a polygon made up of line segments between vertices $(x_i, y_i)$, $i = 0, 1, \ldots, N-1$. The last vertex $(x_N, y_N)$ is assumed to be the same as the first, i.e., the polygon is closed. The area $A$ of a non-self-intersecting polygon with vertices $(x_0, y_0), \ldots, (x_{N-1}, y_{N-1})$ is given by

$$A = \frac{1}{2} \sum_{i=0}^{N-1} \begin{vmatrix} x_i & x_{i+1} \\ y_i & y_{i+1} \end{vmatrix}$$

where $|M|$ denotes the determinant of matrix $M$. The centroid of the polygon, i.e., the position estimate of the target host, is positioned at $(\hat{x}, \hat{y})$ given by

$$\hat{x} = \frac{1}{6A} \sum_{i=0}^{N-1} (x_i + x_{i+1}) \begin{vmatrix} x_i & x_{i+1} \\ y_i & y_{i+1} \end{vmatrix}, \qquad \hat{y} = \frac{1}{6A} \sum_{i=0}^{N-1} (y_i + y_{i+1}) \begin{vmatrix} x_i & x_{i+1} \\ y_i & y_{i+1} \end{vmatrix}.$$
Fig. 7. Error distance for CBG, DNS-based, and GeoPing-like methods. (a) U.S. dataset. (b) Western Europe dataset.
Fig. 8. Error distance for CBG in the U.S. and W.E. datasets.
The point estimate of the target host and the estimate of the confidence region are the centroid and the area of the approximated polygon, respectively. Fig. 6 shows two sample
polygons provided by this heuristic. The gray areas presented in
Fig. 6 are the resulting polygon approximations of intersection
regions shown in Fig. 5. Solid circles indicate the real location
of each target host while crosses indicate the point estimate pro-
vided by the centroid of the polygon.
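The shoelace area and centroid computations translate directly into code. A minimal sketch (the function name is ours); vertices are assumed ordered along the polygon boundary:

```python
def polygon_area_centroid(vertices):
    """Area and centroid of a closed, non-self-intersecting polygon via the
    shoelace formula.  `vertices` is a list of (x, y) pairs; the last vertex
    is implicitly joined back to the first."""
    n = len(vertices)
    a = cx = cy = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        cross = x1 * y2 - x2 * y1  # 2x2 determinant for this edge
        a += cross
        cx += (x1 + x2) * cross
        cy += (y1 + y2) * cross
    a *= 0.5
    return abs(a), (cx / (6 * a), cy / (6 * a))
```

The signed area `a` and the centroid sums share the same orientation-dependent sign, so the centroid is correct for either vertex ordering, while the returned area is made unconditionally positive.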
After inferring the point estimate for each considered target host, we compute the error distance, which is the difference between the estimated position and the real location of the target host. We compare our performance with the results obtained by a DNS-based method and by a GeoPing-like measurement-based geolocation system. The DNS-based method (i.e., the SarangWorld Traceroute project [20]) performs traceroutes toward the target host and infers the geolocation of intermediate routers from their DNS names. The inferred geolocation of the closest recognizable router with respect to the target host is used as a location estimate. The GeoPing-like method uses a measurement-based approach with a discrete space of answers [2], [3], i.e., the locations of the landmarks are used as location estimates.
Fig. 9. Confidence regions provided by CBG in km².
Fig. 7 shows the cumulative distribution function (CDF) of the observed error distance using CBG, the DNS-based method, and the GeoPing-like approach with a discrete set of answers. CBG outperforms the DNS-based approach as well as the GeoPing-like method. The performance gap between the two measurement-based approaches is more significant in the Western Europe dataset. This is probably because this dataset presents fewer landmarks than the U.S. dataset. In the discrete space approach, since the number of possible answers is limited to the locations of the landmarks, the number and placement of landmarks is a key point for the performance [2]. In Section IV-E, we investigate the impact of the number of adopted landmarks on the performance of CBG.
Fig. 10. Error distance as a function of the number of landmarks. (a) U.S. dataset. (b) Western Europe dataset.
In Fig. 8, we compare further the CBG results in error dis-
tance for the U.S. and W.E. datasets. The mean error distance
in the U.S. dataset is 182 km, whereas for the W.E. dataset
the mean error distance is 78 km. Most hosts in both landmark datasets have quite good location estimates. The median error distance and the 80th percentile for the U.S. dataset are 95 km and 277 km, respectively. In the W.E. dataset, the median error distance is 22 km and the 80th percentile is 134 km. We identify and discuss reasons for inaccurate estimations in further detail in Section IV-F.
D. Confidence Region of a Location Estimation
The total area of the intersection region is somewhat related to the confidence that CBG assigns to the resulting location estimate. Intuitively, this area quantifies the geographic extent or spread of each location estimate in km². The smaller the area of the region, the more confident CBG is in this location estimate. Therefore, in contrast to previous measurement-based geolocation techniques, CBG assigns a confidence region in km² to each location estimate. We believe this is important because this confidence region may be used by location-aware applications to evaluate to which extent they can rely on the given location estimate. Furthermore, we envisage location-aware applications with different requirements on accuracy. By using the confidence region, these location-aware applications may decide if the provided location estimate has sufficient resolution with respect to their particular needs.
Fig. 9 presents the CDF of confidence regions in km² for location estimates in both the U.S. and W.E. landmark datasets. Results show that, for the U.S. dataset, CBG assigns a confidence region with a total area less than km² for around 80% of location estimates. This area is slightly larger than Portugal or the U.S. state of Indiana. For the W.E. dataset, 80% of location estimates have a confidence region of up to km², thus enabling regional host location. A confidence region of less than km², which is equivalent to a large metropolitan area, is achieved by 25% of target hosts for the U.S. dataset and by 65% of target hosts for the W.E. dataset.
E. Impact of the Number of Landmarks
In this section, we evaluate the impact of the number of adopted landmarks on the performance of CBG. For each dataset, we compute the mean error distance as the average of all error distances corresponding to several random sets of landmarks chosen out of the total number of available landmarks (42 for the W.E. dataset and 95 for the U.S. dataset). Because the number of possible placement combinations becomes very large as this number increases, we do not consider all the possible choices of landmarks out of each dataset.
Fig. 10 shows different percentile levels of the error distance
of CBG location estimates as a function of the number of
adopted landmarks. For example, the 90th percentile curve
represents the error distance at which the CDF plot of mean
error distance meets the 0.90 probability mark. Error bars indicate the 99% confidence interval. These results suggest that a certain number of landmarks, typically about 30, is needed to level off the mean error distance for both datasets.
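The subset-sampling procedure behind Fig. 10 can be sketched as follows, with the CBG estimator and the great-circle error computation abstracted behind hypothetical callables `geolocate` and `error_fn` (both names and the trial count are ours):

```python
import random

def mean_error_for_k(landmarks, geolocate, error_fn, k, trials=50, seed=0):
    """Mean error distance when only k randomly chosen landmarks are used.

    For each trial, draw a random subset of k landmarks, geolocate every
    remaining host against that subset, and accumulate the error distances.
    `geolocate(target, subset)` and `error_fn(estimate, target)` wrap the
    CBG estimation and the great-circle error computation, respectively.
    """
    rng = random.Random(seed)
    errors = []
    for _ in range(trials):
        subset = rng.sample(landmarks, k)
        for target in landmarks:
            if target in subset:
                continue
            errors.append(error_fn(geolocate(target, subset), target))
    if not errors:
        raise ValueError("k leaves no target hosts to evaluate")
    return sum(errors) / len(errors)
```

Sweeping k and recording the percentiles of the accumulated errors, rather than only the mean, reproduces the kind of curves shown in Fig. 10.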
F. On the Reasons of Inaccurate Estimations
Two aspects add basic robustness to the location inference performed by CBG against factors that may weaken the relationship between network delay and geographic distance. First, delay is measured from multiple geographically distributed landmarks rather than from three locations, as would be sufficient for a triangulation with perfectly accurate measurements, as in GPS. Second, the minimum RTT among several RTT samples is considered, rather than an individual delay sample, to avoid including queueing delay. Even so, the conversion from delay measurements to geographic distance constraints may be distorted by other sources as well, and these are discussed in the following.
1) Circuitous Routing: Route circuitousness indicates the degree to which the network path deviates from the great-circle path between two nodes. Subramanian et al. [6] examine how circuitous Internet paths are and show that the level of network connectivity and the interconnection policies between autonomous systems directly impact the circuitousness of a path. Further, at the network level, Internet paths are not necessarily optimal, since end-to-end paths can be significantly longer than needed. This phenomenon has been recently analyzed under different names, such as path inflation [26] or routing stretch [27], and also contributes to path circuitousness.
Fig. 11. Confidence region as a function of the intercept (localized delay). (a) U.S. dataset. (b) Western Europe dataset.
CBG deals with these deviations from the idealized great-circle paths between hosts. This is done as each landmark self-calibrates its vision of the relationship between network delay and geographic distance when computing the bestline. The bestline at each landmark reflects the known path that is the closest to the great-circle path (represented by the baseline). Therefore, the bestline incorporates the deviations from the great-circle path as they are seen with respect to all other landmarks.
2) Localized Delay: Localized delay refers to the situation
in which there is a constant amount of delay that appears to
be added to all delay measurements to a given host. Localized
delays may emerge from low-speed access links, local congestion, or both. In CBG, localized delay is represented by the intercept of the computed bestlines. In other words, the target sees landmarks as having a nonzero minimum delay, even for landmarks that are collocated with the target. The presence of excessive localized delays is misleading because the geographic distance constraints tend to be largely overestimated, leading to large confidence regions.
Fig. 11 compares the intercept found in the bestline of each landmark and the resulting confidence region when this landmark is used as a target host. It should be noted that Fig. 11(a) and (b) are not to the same scale. The U.S. dataset presents some landmarks with very large intercepts in their bestlines as compared to the European landmarks, leading to large confidence regions for some U.S. target hosts. However, regardless of the dataset, all landmarks that have large intercepts also have a large confidence region when used as target hosts. This clearly indicates that excessively large localized delays lead to large confidence regions. Nevertheless, the converse is not necessarily true. From Fig. 11, small intercepts do not directly result in small confidence regions. A large confidence region may be the result of an overestimation of the distance constraints by the remaining landmarks due to how they currently observe the network conditions, and not necessarily related to local conditions of the target host. If shared paths hide the target host behind a single point, all landmarks overestimate the distance constraints, even if the target host presents no localized delay, as is further discussed in the next section.
3) Shared Paths: Measurements from different landmarks
that share some paths toward the target host provide redundant
information. If all measurements travel past a single point and
share the remaining paths toward the target host, the location
estimate is limited to a region around that single point. This potentially leads to inaccurate estimates, i.e., large confidence regions. We observe some inaccurate location estimates due to shared paths in our experiments, such as the cases shown in Fig. 11 that have large confidence regions although the host presents small or no localized delay.
An interesting example of shared paths is the case of the RIPE hosts located in Lisbon and Porto, both cities in Portugal. When the Porto landmark is used as a target host, this leads to an inaccurate location estimation with a confidence region of about 57 000 km², which is about 2/3 of the size of Portugal. Fig. 12 shows the bestlines that reflect how the Lisbon and Porto landmarks best observe the relationship between network delay and geographic distance within the network. It should be noted that the Porto landmark determines the bestline of the Lisbon landmark in Fig. 12(a), and vice versa in Fig. 12(b). We observe that without the Lisbon landmark in Fig. 12(b), the bestline of the Porto landmark would be shifted toward the remaining landmarks. The resulting figure would be virtually the same as that of the bestline of the Lisbon landmark in Fig. 12(a), except that an intercept of about 5 ms would be present in the new bestline of the Porto landmark. The measured delay between the Porto landmark and the Lisbon landmark is indeed about 5 ms. In other words, the network perception that all landmarks have of the Porto host is the same that they have of the Lisbon host with an additional delay of 5 ms. Clearly, from the viewpoint of the remaining landmarks, the Porto landmark is to some extent hidden behind the Lisbon landmark. We suggest that this is an indication that all traffic from Porto toward the remaining landmarks, and vice versa, travels through the Lisbon urban area. As a consequence, when the Porto landmark is used as the target host, the confidence region is inferred as a relatively large circle around Lisbon, i.e., an inaccurate location estimate.
Fig. 12. Example of inaccurate location estimation caused by shared paths. (a) Bestline of the Lisbon landmark. (b) Bestline of the Porto landmark.
In the U.S. dataset, we observe a similar typical case of shared paths that leads to inaccurate location estimations. The AMP hosts located in Pullman (Washington, WA) and in Bozeman (Montana, MT) seem to be hidden by the AMP host in Seattle (WA). All the remaining landmarks in the U.S. dataset see the Pullman and Bozeman hosts with a constant extra delay of 10 ms and 15 ms, respectively, added to their view of the Seattle host. This leads to inaccurate confidence regions. Measurements from all other landmarks share paths to the Pullman and Bozeman hosts after traveling through the Seattle area, as indicated by the respective traceroutes available at AMP [25]. It is reasonable to suppose that the traffic to these hosts passes through somewhere in the Seattle area. We believe that these results on shared paths obtained using CBG are an indication that similar methods may be used for topology inference, but this still needs further investigation.
V. PLANETLAB EXPERIMENTS
We also present experimental results for a CBG deployment on PlanetLab [11]. These results were taken in early May 2005. We adopt 57 landmarks, i.e., PlanetLab nodes, distributed in the following way: 24 in the U.S., 24 in Europe, five in Asia, three in South America, and one in Oceania. These landmarks are used to geolocate, using the CBG methodology, 42 target hosts in the U.S. and 43 target hosts in Europe. Among these target hosts, there are three behind modems, three connected through wireless links, and six through ADSL, while the remaining have Internet access through broadband links. The response time for all target hosts is within a 2-3 minute range.
Fig. 13 shows the CDF of the observed error distance using CBG in our PlanetLab experiment. The mean error distance for the target hosts located in the U.S. is 209 km, whereas for the target hosts located in Europe the mean error distance is 106 km. The median error distance and the 80th percentile for the U.S. hosts are 130 km and 411 km, respectively. For the target hosts located in Europe, the median error distance is 42 km and the 80th percentile is 218 km. Fig. 14 presents the CDF of the confidence regions in km² for the location estimates of the target hosts located in both the U.S. and Europe. Although confidence regions in the PlanetLab experiments are in general larger than those found in the dataset evaluation, there is a larger number of highly confident estimates, i.e., with a confidence region of less than km².
Fig. 13. Error distance for CBG using PlanetLab.
The network diversity issues associated with PlanetLab ex-
periments are well-known [5], [28] and as such, our results must
be evaluated in that light. However, there is no widely available
alternative system to PlanetLab for these sorts of experiments at
the current time.
VI. DISCUSSION
In this section, we address topics related to Internet geolocation technology in general. We emphasize that the issues raised here do not necessarily affect CBG more than they do any other geolocation technique.
Fig. 14. Confidence regions provided by CBG in km² using PlanetLab.
The development and use of geolocation technology can give rise to privacy and security concerns. The Geographic Location/Privacy (geopriv) IETF working group [29] focuses on establishing policies to control the exchange of geolocation information with privacy in mind, whereas the development of geolocation technology is out of its scope of work. Thus, our research is actually complementary to their work. We believe that any geolocation technology, including CBG, has to consider privacy and security issues in the use of the provided location information. Further, the approach proposed in the geopriv community is to provide less location information, i.e., with reduced resolution, to unprivileged users. The confidence region assigned by CBG to each location estimate may be directly used for this purpose.
Proxies and firewalls impose a fundamental limitation on
measurement-based geolocation techniques that depend on the
client IP address. Since the IP address seen by the external
network may actually correspond to the address of a proxy,
the geolocation techniques infer the geographic location of the proxy, which may be inaccurate in case the client and the proxy are not in relatively close proximity. As a practical countermeasure, commercial geolocation services that rely on exhaustive tabulation (Section II-B) keep an extensive database of known proxy servers from large ISPs in order to refrain from inferring a geolocation in these cases. Denying a location answer is a first step, but not exactly a solution to the problem. This is an area for further research.
Measurement-based geolocation techniques assume that the target host is able to answer measurements (a request, for instance). Nevertheless, even if the target host does not directly echo requests, a measurement-based geolocation may still be possible. A possible countermeasure that we have considered is to look for secondary targets to be measured that are relatively close in hop count to the originally intended target host. By limiting the distance in hop count and inferring the location of these secondary targets, a location estimate may be feasible, at a lower accuracy.
VII. CONCLUSION
In this paper, we have proposed Constraint-Based Geolocation (CBG), a measurement-based method to estimate the geographic location of Internet hosts. CBG establishes a dynamic relationship between network delay and geographic distance. This is done in a distributed and self-calibrating fashion among the adopted landmarks using the bestline method. CBG shows that accurate transformation from delay measurements to geographic distance constraints is indeed feasible and that in practice these constraints are often tight enough to allow an accurate location estimation using multilateration.
Our experimental results show that CBG outperforms previous geolocation techniques. The median error distance obtained in our experiments for the U.S. dataset is below 100 km, while for the Western Europe dataset this value is below 25 km. These results contrast with median error distances of about 150 km for the U.S. dataset and 100 km for the Western Europe dataset when GeoPing-like methods are used. Moreover, in contrast to previous approaches, CBG assigns a confidence region to each location estimate. This is important to allow a location-aware application to assess whether the location estimate is sufficiently accurate for its needs. Our findings indicate that an accurate location estimate, i.e., with a relatively small confidence region, is provided in most cases in both datasets, thus enabling location information at a regional-level granularity. Similar results have been found in a PlanetLab deployment of CBG. It might be possible, once the confidence region has been determined, to use other methods if necessary to geolocate the target host more precisely using regional landmarks. This is left for future work.
Our results are based on measurements taken in well-con-
nected, geographically contiguous networks. To some extent our
work takes advantage of the fact that network connectivity has
improved dramatically in the last decade, and that the relation-
ship between network delay and geographic distance is strong
in these regions [2], [30]. Thus, one must be cautious before extrapolating our results to arbitrary network regions. Generalized geolocation to or from typical end-systems, and the investigation of methods to address other sources of distortion in the relationship between delay and distance that result in inaccurate estimations, are part of our future work.
REFERENCES
[1] M. J. Freedman, M. Vutukuru, N. Feamster, and H. Balakrishnan, "Geographic locality of IP prefixes," in Proc. ACM Internet Measurement Conf. (IMC 2005), Berkeley, CA, Oct. 2005, pp. 153-158.
[2] A. Ziviani, S. Fdida, J. F. de Rezende, and O. C. M. B. Duarte, "Improving the accuracy of measurement-based geographic location of Internet hosts," Comput. Netw., vol. 47, no. 4, pp. 503-523, Mar. 2005.
[3] V. N. Padmanabhan and L. Subramanian, "An investigation of geographic mapping techniques for Internet hosts," in Proc. ACM SIGCOMM, San Diego, CA, Aug. 2001, pp. 173-185.
[4] P. Enge and P. Misra, "Special issue on global positioning system," Proc. IEEE, vol. 87, no. 1, pp. 3-15, Jan. 1999.
[5] S. Banerjee, T. G. Griffin, and M. Pias, "The interdomain connectivity of PlanetLab nodes," in Proc. Passive and Active Measurement Workshop (PAM 2004), Antibes Juan-les-Pins, France, Apr. 2004.
[6] L. Subramanian, V. N. Padmanabhan, and R. Katz, "Geographic properties of Internet routing," in Proc. USENIX 2002, Monterey, CA, Jun. 2002, pp. 243-259.
[7] T. S. E. Ng and H. Zhang, "Predicting Internet network distance with coordinates-based approaches," in Proc. IEEE INFOCOM, New York, Jun. 2002, pp. 170-179.
[8] L. Tang and M. Crovella, "Virtual landmarks for the Internet," in Proc. ACM Internet Measurement Conf. 2003, Miami, FL, Oct. 2003, pp. 143-152.
[9] F. Dabek, R. Cox, F. Kaashoek, and R. Morris, "Vivaldi: A decentralized network coordinate system," in Proc. ACM SIGCOMM 2004, Portland, OR, Aug. 2004, pp. 15-26.
[10] R. Percacci and A. Vespignani, "Scale-free behavior of the Internet global performance," Eur. Phys. J. B - Condensed Matter, vol. 32, no. 4, pp. 411-414, Apr. 2003.
[11] PlanetLab: An Open Platform for Developing, Deploying, and Accessing Planetary-Scale Services. 2002 [Online]. Available:
[12] C. Davis, P. Vixie, T. Goodwin, and I. Dickinson, "A means for expressing location information in the domain name system," Internet RFC 1876, Jan. 1996.
[13] IP Address to Latitude/Longitude. Univ. Illinois, Urbana-Champaign [Online]. Available:
[14] D. Moore, R. Periakaruppan, J. Donohoe, and K. Claffy, "Where in the world is," presented at the INET 2000 Conf., Yokohama, Japan, Jul. 2000.
[15] GeoURL. [Online]. Available:
[16] Net World Map. [Online]. Available:
[17] GeoNetMap. Geobytes, Inc. [Online]. Available: http://www.geobytes.
[18] GeoPoint. Quova Inc. [Online]. Available:
[19] GTrace. CAIDA [Online]. Available:
[20] Sarangworld Traceroute Project. 2003 [Online]. Available: http://www.
[21] P. Bahl and V. N. Padmanabhan, "RADAR: An in-building RF-based user location and tracking system," in Proc. IEEE INFOCOM 2000, Tel Aviv, Israel, Mar. 2000, pp. 775-784.
[22] A. Ziviani, S. Fdida, J. F. de Rezende, and O. C. M. B. Duarte, "Toward a measurement-based geographic location service," in Proc. Passive and Active Measurement Workshop (PAM 2004), Antibes Juan-les-Pins, France, Apr. 2004, pp. 43-52.
[23] C. J. Bovy, H. T. Mertodimedjo, G. Hooghiemstra, H. Uijterwaal, and P. van Mieghem, "Analysis of end-to-end delay measurements in Internet," in Proc. Passive and Active Measurement Workshop (PAM 2002), Fort Collins, CO, Mar. 2002.
[24] RIPE Test Traffic Measurements. 2000 [Online]. Available:
[25] NLANR Active Measurement Project. 1998 [Online]. Available: http://
[26] N. Spring, R. Mahajan, and T. Anderson, "Quantifying the causes of path inflation," in Proc. ACM SIGCOMM 2003, Karlsruhe, Germany, Aug. 2003, pp. 113-124.
[27] D. Krioukov, K. Fall, and X. Yang, "Compact routing on Internet-like graphs," in Proc. IEEE INFOCOM 2004, Hong Kong, Mar. 2004, pp.
[28] H. Zheng, E. K. Lua, M. Pias, and T. G. Griffin, "Internet routing policies and round-trip-times," in Proc. Passive and Active Measurement Workshop (PAM 2005), Boston, MA, Mar. 2005, pp. 236-250.
[29] Geographic location/privacy (geopriv). IETF Working Group, 2003 [Online]. Available:
[30] S.-H. Yook, H. Jeong, and A.-L. Barabási, "Modeling the Internet's large-scale topology," Proc. National Academy of Sciences (PNAS), vol. 99, pp. 13382-13386, Oct. 2002.
Bamba Gueye received the B.Sc. degree in computer science from the University Cheikh Anta Diop, Dakar, Senegal, and the M.Sc. degree in networking from the University of Paris 6, France, in 2003. Currently, he is working toward the Ph.D. degree in computer networking, also at the University of Paris 6.
He is developing the GeoLIM project, which aims to provide measurement-based geolocation of Internet hosts. His research interests are in Internet measurements, focusing on measurement-based geolocation and bandwidth estimation.
Artur Ziviani (S'99-M'04) received the B.Sc. degree in electronics engineering in 1998 and the M.Sc. degree in electrical engineering (emphasis in computer networking) in 1999, both from the Federal University of Rio de Janeiro (UFRJ), Brazil. In 2003, he received the Ph.D. degree in computer science from the University of Paris 6, France, where he was also a Lecturer during 2003-2004.
Since 2004, he has been with the National Laboratory for Scientific Computing (LNCC), Brazil. His
research interests include QoS, wireless computing,
Internet measurements, and the application of networking technologies in
Dr. Ziviani has been a member of the ACM since 2004.
Mark Crovella (M'94) is Professor of computer science at Boston University, Boston, MA. During 2003-2004, he was a Visiting Associate Professor at the Laboratoire d'Informatique de Paris VI (LIP6). His research interests are in performance evaluation, focusing on parallel and networked computer systems. In the networking arena, he has worked on characterizing the Internet and the World Wide Web; on analysis of Internet measurements, including traffic and topology measurements; and on the implications of measured Internet properties for the design of protocols and systems. He is coauthor of Internet Measurement: Infrastructure, Traffic and Applications (Wiley, 2006).
Dr. Crovella has been a member of the ACM since 1994.
Serge Fdida (M'88-SM'98) has been a Full Professor at the Université Pierre et Marie Curie (Paris 6) since 1991. He received the Doctorat de 3ème Cycle in 1984, and the Habilitation à Diriger des Recherches, specializing in the modeling of computer networks, in 1989, both from the University of Paris 6. From 1989 to 1995, he was a Full Professor at the Université René Descartes (Paris). His research interests are in the area of high-speed networking, pervasive communication, resource management, and performance analysis. He is heading the Network and Performance group of the LIP6 Laboratory (CNRS-University of Paris 6). He was a Visiting Scientist at IBM Research during the 1990-1991 academic year. He has been the editor of the proceedings of several networking conferences and is the author of a book on performance evaluation and a book on networking. He is involved in many research projects in high-performance networking in France and Europe. He is also the Co-Director of EURONETLAB, a joint laboratory established in 2001 between University Paris 6, CNRS, THALES, and 6WIND.
Prof. Fdida has been a member of the ACM since 1988.