Using the geographic scopes of web documents for contextual advertising.
Using the Geographic Scopes of Web Documents for
Contextual Advertising
Ivo Anastácio, Bruno Martins, Pável Calado
INESC-ID Lisboa / Instituto Superior Técnico, PT
{ivo.anastacio, bruno.g.martins, pavel.calado}@ist.utl.pt
ABSTRACT
Geotargeting is a specialization of contextual advertising
where the objective is to target ads to Website visitors con-
centrated in well-defined areas. Current approaches involve
targeting ads based on the physical location of the visitors,
estimated through their IP addresses. However, there are
many situations where it would be more interesting to target
ads based on the geographic scope of the target pages, i.e.,
on the general area implied by the locations mentioned in
the textual contents of the pages. Our proposal applies tech-
niques from the area of geographic information retrieval to
the problem of geotargeting. We address the task through a
pipeline of processing stages, which involves (i) determining
the geographic scope of target pages, (ii) classifying target
pages according to locational relevance, and (iii) retrieving
ads relevant to the target page, using both textual contents
and geographic scopes. Experimental results attest to the
adequacy of the proposed methods in each of the individual
processing stages.
Categories and Subject Descriptors
H.3.1 [Content Analysis and Indexing]: Abstracting
methods; H.4.m [Information Systems]: [Miscellaneous]
General Terms
Algorithms, experimentation
Keywords
Contextual Advertisement, Geotargeting, Geographic Text
Mining, Geographic Information Retrieval
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, to
republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee.
GIR’10, 18-19th Feb. 2010, Zurich, Switzerland
Copyright © 2010 ACM ISBN 978-1-60558-826-1/10/02... $10.00
1. INTRODUCTION
Online advertising platforms such as Google AdSense¹ or
Yahoo! Content Match² are nowadays the financial backbone
of the Web. The primary business model behind most
non-transactional Web sites is currently based on contextual
advertisement, where contextually relevant textual ads are
displayed alongside the regular content of Web pages.
¹ http://www.google.com/adsense
² http://publisher.yahoo.com/sell/ContentMatch.php
From a research standpoint, contextual advertising is nor-
mally seen as an Information Retrieval (IR) problem, where
the objective is to retrieve relevant ads given a target Web
page. Previous studies have shown that, in the contextual
advertisement domain, relevance increases the probability of
reaction and is therefore strongly tied with profitability (i.e.,
more relevant ads lead to improved user satisfaction and
higher response rates) [19]. One problem that has been get-
ting increasing attention is therefore the design of IR ranking
functions to select advertisements that are highly relevant.
A particularly interesting specialization of contextual adver-
tising is localized advertisement, also known as geotargeting,
where the objective is to target ads to audiences concen-
trated in well-defined areas. This is particularly interesting
to advertisers that have local businesses and are looking to
generate shop traffic or calls for professional services. For
example, a takeaway restaurant serving a particular region
would like to target its advertisements to that region.
Nowadays, the most common geotargeting approach involves
targeting ads based on the physical location of the visitors,
estimated using their IP addresses [7]. While this would
work on the example of takeaway restaurants (i.e., potential
clients are often interested in knowing what is near to their
current location), the IP-targeting approach has several limitations.
Besides the inaccuracies involved in IP geocoding,
there are also many situations where it would be more inter-
esting to target ads based on the geographic scope described
in the content of the target pages.
Consider a user who is traveling to Lisbon and browsing
Web pages describing tourist attractions and events in the
city. Here it would be more interesting to place ads that are
relevant to the geographic location that is described in the
content of the pages. To handle these cases, advertisers of-
ten include region-specific keywords on the text of the ads,
hoping that they match the placenames mentioned in the
Web pages. However, this is by no means an optimal solu-
tion, since it cannot account for geographical proximity or
containment (e.g., the name Lisbon would not match places
that correspond to sub-regions, like Chiado).
A recent trend in IR applications relates to extracting geo-
graphic context information from textual documents, in or-
der to explore it for purposes of document retrieval [10, 14,
1]. This is usually referred to as Geographic Information
Retrieval (GIR). In this paper, we explore the usage of GIR
techniques for geotargeting advertisements. Similarly to tra-
ditional contextual advertisement, we model the problem as
a task of retrieving the most locationally relevant ads, given
a target Web page. By locationally relevant we mean adver-
tisements whose target population matches the geographic
scope of the target Web page. While techniques for geo-
graphic IR have been getting increased attention, their ap-
plication to the contextual advertisement domain is, to the
best of our knowledge, a novel contribution of this paper.
We propose to address the task through a pipeline of opera-
tions, in which we (i) extract place references from the target
pages and assign them to geographic scopes, (ii) classify the
target pages as either local or global, using features like the
text or the extracted place references, and (iii) find ads rel-
evant to the target pages by using GIR ranking techniques
that combine thematic and geographic similarities.
The main contributions of this paper are as follows:
1. We compare several strategies for assigning geographic
scopes to target Web pages, including well-known al-
gorithms and baseline methods.
2. We propose and evaluate a supervised machine-learning
approach for the task of classifying the target Web
pages according to their implicit locational relevance,
i.e., classifying them as either local or global.
3. We propose and evaluate different retrieval strategies
for the task of displaying relevant advertisements in
target Web pages, leveraging the results obtained
from the previous two tasks.
The rest of this paper is organized as follows. Section 2
presents related work. Section 3 describes the approaches
for assigning geographic scopes to Web pages. Section 4
describes the classification of target pages according to locational
relevance. Section 5 describes GIR approaches for
finding the most relevant ads. Section 6 presents the experimental
validation of the proposed approaches. Finally,
Section 7 presents conclusions and directions for future work.
2. RELATED WORK
This section describes previous research on contextual advertisement,
which is often formulated as a retrieval problem.
We also present previous research on Geographic Information
Retrieval (GIR), a specialization of IR that addresses
issues related to the geographic relevance of documents.
2.1 Contextual Advertisement
Contextual advertisement can be framed as a document retrieval
problem, where the ads are the documents to be retrieved
given a query composed of a target page. Thus, one
way of approaching it is to represent the target page as a set
of keywords, in order to retrieve the ads that match those
same keywords. From this perspective, Yih et al. proposed
a system for keyword extraction from target pages [23], arguing
that this is already a critical task in well-known contextual
advertising systems. The authors use a variety of
features (e.g. TF/IDF, HTML metadata, query logs) to determine
the importance of phrases (i.e., sequences of up to
5 words) extracted from the target pages.
Complementary approaches have been reported by Lacerda
et al. [9] and by Ribeiro-Neto et al. [16]. Lacerda et al. fo-
cused on the selection of good ranking functions for match-
ing ads to pages, using genetic programming for generating
a non-linear combination of term weighting heuristics that
maximizes the average precision on retrieving ads. Ribeiro-
Neto et al. examined different strategies for matching pages
to ads, based on keywords. The winning strategy required
that at least one of the keywords declared by an advertiser
appeared on the target page, and ranked ads by the cosine
of the angle between the ad and page vectors. These authors
also concluded that one of the main problems in contextual
advertisement is that ads contain little text and often use a
different vocabulary than that of the pages.
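As a rough illustration of the winning strategy reported by Ribeiro-Neto et al., the sketch below keeps only ads with at least one declared keyword on the page and ranks the survivors by cosine similarity. It is a minimal Python sketch, not the authors' implementation: the function names, the whitespace tokenizer, the single-word keywords, and the use of raw term counts instead of TF-IDF weights are all assumptions made for brevity.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine of the angle between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_ads(page_text, ads):
    """ads: list of (ad_id, ad_text, keywords). Returns ad ids, best first."""
    page_vec = Counter(page_text.lower().split())
    scored = []
    for ad_id, ad_text, keywords in ads:
        # Assumption: keywords are single words, matched after lowercasing.
        if not any(k.lower() in page_vec for k in keywords):
            continue  # no declared keyword appears on the page
        ad_vec = Counter(ad_text.lower().split())
        scored.append((cosine(page_vec, ad_vec), ad_id))
    scored.sort(key=lambda s: s[0], reverse=True)
    return [ad_id for _, ad_id in scored]
```

With so little text per ad, the cosine score is dominated by a handful of shared terms, which is one way to see the vocabulary mismatch problem noted above.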
In our work, we perform the same keyword extraction and
ad ranking operations. However, since these are not our
main focus, we used a state-of-the-art commercial keyword
extraction service, namely the Yahoo! Term Extractor³, and
performed the ad ranking using a linear combination of tex-
tual and geographic similarity measures, using heuristics to
fine-tune the combination weights.
Noting that, when no ads are relevant to the visitor’s in-
terests, showing ads would produce no economic benefit,
Broder et al. approached the decision problem of whether to
swing, i.e., whether or not to show any ads for an incoming
request [3]. The authors experimented with a simple thresh-
olding approach and with machine learning. In both cases,
the idea was to classify target pages as either relevant for
advertisement or not. Here we use a similar approach, but
in our case to decide whether or not a target Web page is
appropriate for placing local advertisements.
2.2 Geographic Information Retrieval
Geographic information is pervasive on the Web and exploring
this information is a research problem that is getting
increasing attention [8, 12]. Previous works in the area of
Geographic Information Retrieval (GIR) have addressed issues
such as the recognition and disambiguation of place
references in text [13], the assignment of geographic scopes
to documents [1], or the retrieval of documents taking geographic
relevance into account [14, 12].
Leidner presented a variety of approaches for handling place
references in text [10]. The problem is usually seen as an ex-
tension of the named entity recognition task (NER), as pro-
posed by the natural language processing community. More
than recognizing mentions of places over text, which is the
subject of NER, the task also requires the place references
to be disambiguated into the corresponding locations
on the surface of the Earth (i.e., assigning them to unambiguous
geospatial coordinates or gazetteer identifiers). This
disambiguation often uses heuristics such as default senses
(i.e., disambiguation should be made to the most important
referent, estimated using population counts) or spatial minimality
(i.e., disambiguation should minimize the bounding
polygon that contains all candidate referents) [10].
³ http://developer.yahoo.com/search/content/V1/termExtraction.html
The automatic assignment of geographic scopes to Web doc-
uments, based on the place references that are present in the
text, is an example of a complex GIR problem that has also
been addressed. Given a set of diverse geographic regions,
corresponding to the placenames mentioned in a given Web
page, the problem concerns finding the geographic region
that best summarizes and describes them all. While several
different strategies have been proposed in the past, there
is still no clear information about the trade-offs involved in
choosing a particular algorithm. Each different algorithm
makes specific assumptions, therefore resulting in different
approximations for the geographic scope of the documents.
In this paper, we experimented with the methods proposed
by Amitay et al. [1], Woodruff and Plaunt [21], and Martins
and Silva [15], as well as with some simple baseline methods.
As for the existing approaches for retrieving documents ac-
cording to geographic relevance, they are mostly based on
combinations of the standard IR metrics used in text re-
trieval (e.g. cosine similarity of TF/IDF vectors) with sim-
ilarity metrics for geographic scopes, based on distance or
containment [14]. Larson and Frontiera compared the per-
formance of different methods for computing spatial similar-
ity scores for query-document pairs, using area overlaps [5].
Martins et al. proposed a similarity function that, instead
of area overlaps, uses a non-linear normalization of the dis-
tance between the document and query scopes [14].
Cai proposed the GeoVSM framework for geographic document
retrieval, combining traditional IR heuristics with geographic
similarity scores [4]. The author argues that thematic
and geographic similarities should be computed independently,
and afterwards combined into a single retrieval
score. Markowetz et al. proposed an efficient strategy also
based on a linear combination of scores [12]. This paper
explores similar ideas for the task of retrieving ads that are
both thematically and geographically relevant.
3. ASSIGNING GEOGRAPHIC SCOPES
The first stage in the proposed processing pipeline involves
assigning geographic scopes to target pages. In turn, this
stage involves two separate sub-tasks, namely (i) handling
place references in the text, and (ii) assigning geographic
scopes to documents based on the disambiguated place references.
This section details both sub-tasks.
3.1 Handling Place References in Text
For handling place references in text, we relied on the Placemaker⁴
text mining Web service provided by Yahoo!. This
service provides functionalities for recognizing and disambiguating
place references over text, returning a unique
identifier and a confidence score for each reference recognized
in a document. Using this identifier it is possible to
query the Yahoo! GeoPlanet⁵ gazetteer service, and obtain
further information on the location. This way, each of the
resolved place references is associated with the corresponding
city, state, country and continent, as well as with the
bounding rectangle that covers its area.
⁴ http://developer.yahoo.com/geo/placemaker
⁵ http://developer.yahoo.com/geo/geoplanet
Since Yahoo! Placemaker uses natural language contextual
clues, the service can often disambiguate a word like Read-
ing between the location in England or the verb sense of
to read. It also covers many colloquial location names (e.g.
nyc for New York City), as well as interest points (e.g. Eiffel
Tower) that may appear in the text of the target pages. In
a separate publication we measured the performance of
this geotagger for the tasks of recognizing and disambiguating
place references [13]. Over the English dataset from
the CoNLL-03 experiment on named entity recognition, we
obtained F1 measures of approximately 59% and 57% for
recognition and disambiguation, respectively.
3.2 Assigning Geographic Scopes to Pages
We tested seven different approaches for assigning geographic
scopes to target pages, namely the method from the Placemaker
service, the methods proposed in the context of the
Web-a-Where [1], GIPSY [21] and GREASE [15] projects,
and three simple baseline methods. In all cases, we used the
results of the Yahoo! Placemaker Web service as the source
of disambiguated place references.
Yahoo! Placemaker, besides recognizing and disambiguating
place references, also assigns scopes to documents. We use
this Web service as a black box, in order to gauge the
current performance of commercial applications.
The Web-a-Where technique leverages part-of relations
among the recognized place references, provided by a hier-
archical gazetteer [1]. The basic idea is that, for instance,
if several cities from the same country are mentioned, this
might mean that this country is the scope, i.e. the algorithm
tries to generalize from the disambiguated place references.
More specific places are scored higher if they are the only
places mentioned, or if they are mentioned many times. The
algorithm starts by building a geographic hierarchy from the
disambiguated place references. By looping over these refer-
ences, it aggregates the confidence scores from lower levels
in the hierarchy. The references are then sorted by score and
the highest is chosen as the scope.
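The aggregation idea can be sketched as follows. This is a simplified Python illustration, not the published Web-a-Where algorithm: the `decay` factor of 0.7, the tuple-based hierarchy paths, the tie-breaking rule, and the function name are assumptions chosen for the example.

```python
from collections import defaultdict


def web_a_where_scope(references, decay=0.7):
    """Pick a scope from disambiguated references.

    references: non-empty list of (path, confidence), where path runs from
    broad to narrow, e.g. ("Europe", "Portugal", "Lisbon").
    Returns the path of the highest-scoring node.
    """
    scores = defaultdict(float)
    for path, confidence in references:
        # The place itself receives the full confidence; each enclosing
        # level receives a share damped once more per step up.
        for levels_up in range(len(path)):
            node = path[: len(path) - levels_up]
            scores[node] += confidence * decay ** levels_up
    # Highest score wins; ties go to the more specific (longer) path.
    return max(scores, key=lambda node: (scores[node], len(node)))
```

With a damping factor below one, a place mentioned alone outscores its ancestors, while several sibling places together push their common parent to the top, which matches the generalization behaviour described above.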
The GIPSY algorithm uses the bounding boxes that corre-
spond to the place references in the text, after they have
been disambiguated [21]. The geographic scope is computed
using the overlapping area for all the boxes, thus trying to
find the most specific place that is related to all the place
references made in the document. In the GIPSY algorithm,
the bounding boxes are seen as thick polygons, with a base
positioned at an (x,y) plane, but extending upwards a dis-
tance of z to a higher parallel plane. One by one, in de-
creasing order of size, the bounding boxes corresponding to
the place references are analysed, in order to build a skyline
of bounding boxes. Finally, the bounding boxes are sorted
according to their z order and the highest ranking bounding
box is selected as the scope. In our implementation, each
bounding box has a thickness z equal to the number of times
its respective place name occurs, weighted according to the
confidence score from the disambiguation task.
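A heavily simplified sketch of the skyline idea is given below. It is only an approximation written for illustration: each box is lifted as a whole onto the tallest box it touches, boxes are plain (min_x, min_y, max_x, max_y) tuples, and the function names are hypothetical. The thickness passed in is assumed to be the confidence-weighted occurrence count mentioned above.

```python
def intersects(a, b):
    """True if two (min_x, min_y, max_x, max_y) boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def area(box):
    return (box[2] - box[0]) * (box[3] - box[1])


def gipsy_scope(places):
    """places: non-empty list of (name, box, thickness).

    Returns the name of the box that ends up highest in the skyline.
    """
    placed = []  # (name, box, top elevation)
    # Largest boxes go down first so that narrower ones stack on top.
    for name, box, thickness in sorted(
        places, key=lambda p: area(p[1]), reverse=True
    ):
        base = max(
            (top for _, other, top in placed if intersects(box, other)),
            default=0.0,
        )
        placed.append((name, box, base + thickness))
    return max(placed, key=lambda p: p[2])[0]
```

Because narrow boxes sit on top of the wider boxes that contain them, the selected scope tends to be a specific place, in line with the behaviour described above.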
The approach from the GREASE project is based on graph-
ranking [15]. The idea is to represent the gazetteer used
for place reference disambiguation as a graph, where the
nodes correspond to different places and the edges corre-
spond to semantic relationships (part-of, containment or ad-
jacency) between places. Nodes on this graph are weighted
according to the occurrence frequency of place references in
a document, and edges are weighted according to the rel-
ative importance of the different types of relationships. A
graph-ranking algorithm, namely PageRank, is applied to
this graph and the highest ranked node is selected as the
scope. In case of ties, the node connected to the highest
number of edges is selected. By propagating scores across
the graph, this algorithm tries, at the same time, to gener-
alize and to specify from the available information.
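To make the graph-ranking step concrete, the sketch below runs a personalized PageRank by power iteration over a small weighted graph. It is an illustrative reading rather than the GREASE implementation: treating occurrence counts as the random-jump distribution, the damping factor of 0.85, the fixed iteration count, and all names are assumptions.

```python
def graph_rank_scope(edges, mentions, damping=0.85, iterations=50):
    """Select a scope by PageRank over a gazetteer graph.

    edges: list of (src, dst, weight) with positive weights, one entry per
    semantic relationship. mentions: non-empty {place: occurrence count}.
    """
    nodes = sorted({n for s, d, _ in edges for n in (s, d)} | set(mentions))
    out_weight = {n: 0.0 for n in nodes}
    degree = {n: 0 for n in nodes}
    for src, dst, weight in edges:
        out_weight[src] += weight
        degree[src] += 1
        degree[dst] += 1
    total = sum(mentions.values())
    # Random jumps land on places in proportion to their mention counts.
    jump = {n: mentions.get(n, 0) / total for n in nodes}
    rank = dict(jump)
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread via the jumps.
        dangling = sum(rank[n] for n in nodes if out_weight[n] == 0)
        new_rank = {
            n: (1 - damping + damping * dangling) * jump[n] for n in nodes
        }
        for src, dst, weight in edges:
            new_rank[dst] += damping * rank[src] * weight / out_weight[src]
        rank = new_rank
    # Ties are broken by the number of incident edges.
    return max(nodes, key=lambda n: (rank[n], degree[n]))
```

In this reading, two cities that both point to the same country through part-of edges pass most of their score upwards, so the country is selected, whereas a single city mentioned on its own keeps the top score.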
The three previously described methods make non-trivial as-
sumptions about how place references should be combined in
discovering the geographic scope of a document. To
assess the gains introduced by these assumptions,
we also implemented three simple baseline methods:
1. The number of times a place is referenced in a docu-
ment reflects the importance of that place to the doc-
ument’s subject. We therefore experimented with a
simple scope assignment method that chooses the most
frequently occurring place reference as the scope. In
case of ties, the place reference corresponding to the
largest area is chosen as the scope.
2. The different place references made in the document
should all contribute to the document’s scope. We
therefore experimented with a simple scope assignment
method that computes the bounding box that covers
all the place references made in the document.
3. Only the place references that are somewhat interre-
lated should be considered, this way filtering the errors
made while recognizing and disambiguating place ref-
erences, as well as filtering the place references that
are only tangential to the content of the document.
We first compute the average centroid point for all the
place references made in the document, as well as the
average distance between the place references and this
centroid. Then, we filter out those place references
whose centroid is at a distance that is greater than
twice the average distance value. Finally, we assign a
scope corresponding to the bounding box that covers
all the remaining place references. If none remain,
we choose the one closest to the centroid. This
baseline is inspired by a technique proposed by Smith
and Crane for place reference disambiguation [18].
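The three baselines are simple enough to sketch in a few lines of Python. The code below is an illustration under stated assumptions: references are (name, box) pairs with boxes given as (min_x, min_y, max_x, max_y), distances are planar Euclidean distances between box centroids rather than geodesic ones, and all function names are hypothetical.

```python
import math
from collections import Counter


def area(box):
    return (box[2] - box[0]) * (box[3] - box[1])


def centroid(box):
    return ((box[0] + box[2]) / 2, (box[1] + box[3]) / 2)


def covering_box(boxes):
    """Smallest rectangle enclosing all the given boxes."""
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))


def most_frequent_scope(refs):
    """Baseline 1: most frequent place; ties go to the largest area."""
    counts = Counter(name for name, _ in refs)
    boxes = dict(refs)
    return max(counts, key=lambda name: (counts[name], area(boxes[name])))


def covering_scope(refs):
    """Baseline 2: the box covering every reference."""
    return covering_box([box for _, box in refs])


def filtered_scope(refs):
    """Baseline 3: drop references far from the average centroid."""
    points = [centroid(box) for _, box in refs]
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    dists = [math.hypot(x - cx, y - cy) for x, y in points]
    avg = sum(dists) / len(dists)
    kept = [box for (_, box), d in zip(refs, dists) if d <= 2 * avg]
    if not kept:
        # Fallback from the description; with this threshold the closest
        # reference always passes, so it should not normally be reached.
        kept = [refs[dists.index(min(dists))][1]]
    return covering_box(kept)
```

A distant stray mention only gets filtered by the third baseline when the remaining references are numerous enough to keep the average distance small.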
4. CLASSIFYING TARGET PAGES
The second stage of the proposed processing pipeline involves
classifying target pages according to their locational
relevance, i.e. classifying them as either local or global, so
that locally relevant ads are placed on the target pages that
are more interesting for them. For example, a target page
on the subject of computer programming can be considered
global, as it is likely to be of interest to a geographically
broad audience. In contrast, a document listing events in
a specific city could be regarded as local, likely to be more
interesting to a regional audience.
In the context of this work, locational relevance is therefore a
score that reflects the probability of a given document being
either global, meaning that users interested in the document
are likely to have broad geographic interests, or local, mean-
ing that users interested in the document are likely to have
a single narrow geographic interest.
Assigning documents to global and local classes, according
to their implicit locational relevance, can be naturally for-
mulated as a binary classification problem. However, instead
of applying the standard classification approach, based on
a bag-of-words representation of the documents, we argue
for the use of specific features better suited to reflect the
locational characteristics. A work by Gravano et al. [6] al-
ready described similar ideas, although for classifying search
engine queries instead of documents.
Our features attempt to capture two different but comple-
mentary aspects of locational relevance, namely (i) the ge-
ographic information inferred from the distribution of place
references in the text and geographic scopes, since we can
assume that local documents are more likely to contain a
cohesive set of place references, and (ii) the thematic relat-
edness to subjects that are typically regarded as local, since
words like restaurant or hotel are more often associated to
local pages than words like tutorial or mp3.
We group the considered features into three sets, namely (i)
textual features, e.g. TF-IDF weights for term stems, (ii)
simple locative features, e.g. counts for the different types of
geographical references made in the document (e.g., separate
counts for cities or states), and (iii) high level locative
features, e.g., the spatial area of the geographic scopes ob-
tained through different algorithms. Due to space restric-
tions we omit the complete description of the considered
features here, but the reader can refer to a separate publi-
cation [2].
We chose to use a classifier based on Support Vector Ma-
chines (SVMs) since SVMs represent the state-of-the-art
text classification technology [17]. Moreover, they offer the
possibility to assign a value in the interval [0,1] that estimates
the likelihood of the document being either in the local
or global class. In particular, we used a Gaussian-kernel
SVM with parameters C and γ optimized through the grid-search
functionality offered by Weka [20]. The final classifier
scores correspond to probabilities for the documents being
either local or not, according to the normalization procedure
described by Lin et al. [11].
5. GEOGRAPHIC RETRIEVAL OF ADS
The final stage of the proposed pipeline of processing operations
involves retrieving and ranking ads based on a combination
of thematic and geographic relevance.
As previously stated, contextual advertising can be inter-
preted as a search problem over the corpus of ads. Ads are,
in our case, represented as a bag of words, where the words
come from ad fields like a title, a small textual description,
and a set of descriptive keywords. Additionally, advertisers
also associate their ads with the intended geographic scope.
The query triggering the search is derived from the context
of the target page (the text and the geographic scope) where
the ads are to be displayed. Since users are unlikely to click
on irrelevant ads, the retrieval systems should attempt to
maximize ad relevance.
For combining multiple sources of relevance into a single
ranking, we follow the GeoVSM framework proposed by
Cai [4]. As shown in Equation 1, GeoVSM independently
computes a geographic similarity gs and a thematic similar-
ity ts, later combining them through some function f.
$$Rel(doc, ad) = f(ts_{doc,ad},\, gs_{doc,ad}) \qquad (1)$$
We argue that, for the contextual advertising problem, a
linear combination of relevance scores that uses the proposed
locational relevance as the weight, as shown in Equation 2,
is an adequate function f. In the context of multimedia
information retrieval, Wu et al. [22] demonstrated that a
linear combination might be sufficient when fusing a small
number of relevance rankings from different domains, as in
this case. Also, contrary to the usage of static weights, the
locational relevance score provides a weighting scheme that
dynamically adapts itself to each document.
$$f(ts, gs) = (1 - w) \cdot ts + w \cdot gs \qquad (2)$$
We experimented with two different approaches for measuring
the thematic relevance ts, namely the similarity between
document key terms (extracted with the Yahoo! service) and
terms from the ads, and the similarity between the full text
of the document and the terms from the ads. Our implementation
relies on the full-text search capabilities provided
by the PostgreSQL database management system⁶. However,
since the PostgreSQL full-text engine does not provide
a thematic similarity value in the range [0,1], we applied
the min-max normalization procedure shown in Equation 3
to the raw scores ts′.
$$ts = \frac{ts' - \min ts'}{\max ts' - \min ts'} \qquad (3)$$
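Equations 2 and 3 translate almost directly into code. The following Python sketch is illustrative only: it assumes the weight w is the locational relevance score of the page expressed as a value in [0,1], that normalization is applied over the ads retrieved for one page, and that a constant list of raw scores maps to zeros. The function names are hypothetical.

```python
def min_max(scores):
    """Equation 3: rescale raw scores to [0, 1]."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # Assumption: with no spread, no ad is thematically preferred.
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def combine(ts, gs, w):
    """Equation 2: linear combination weighted by locational relevance w."""
    return (1 - w) * ts + w * gs


def rank_ads(ads, w):
    """ads: list of (ad_id, raw_text_score, geo_similarity).

    Returns ad ids ordered by combined relevance, best first.
    """
    ts_values = min_max([raw for _, raw, _ in ads])
    scored = [
        (combine(ts, gs, w), ad_id)
        for ts, (ad_id, _, gs) in zip(ts_values, ads)
    ]
    scored.sort(key=lambda s: s[0], reverse=True)
    return [ad_id for _, ad_id in scored]
```

For a page judged strongly local (w close to 1) the geographic similarity dominates the ranking, while for a global page (w close to 0) the ordering falls back to thematic similarity.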
For measuring geographic relevance gs we also experimented
with two different strategies, namely the normalized distance
metric by Martins et al. [14] and the relative area of overlap
metric proposed by Greg Janée⁷.
The method by Martins et al. uses a double sigmoid function
with the center corresponding to the diagonal distance of the
rectangular region for the query scope (i.e., the geographic
scope of the target page). The similarity is maximum when
the distance is zero, and smoothly decays to zero as the
distance increases. The method is presented in Equation 4,
where d refers to the diagonal distance of the rectangular
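region corresponding to the geographic scope of the ad S_ad
and D = centroidDistance(S_page, S_ad) − d.
$$sim(S_{page}, S_{ad}) = \begin{cases} 1 & \text{if } S_{page} \text{ is contained in } S_{ad} \\ 1 - \dfrac{1 + \mathrm{sign}(D) \times \left(1 - e^{-\left(\frac{D}{d \times 0.5}\right)^{2}}\right)}{2} & \text{otherwise} \end{cases} \qquad (4)$$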
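A small numeric sketch helps to check the shape of this function. It assumes Equation 4 reads as 1 − (1 + sign(D) × (1 − e^(−(D / (d × 0.5))²))) / 2 in the non-containment case, and it takes the centroid distance and the diagonal d as plain numbers so that no geometry library is needed. The function names are hypothetical.

```python
import math


def sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def double_sigmoid(centroid_distance, d, page_in_ad=False):
    """Similarity for a given centroid distance and diagonal d (d > 0)."""
    if page_in_ad:
        return 1.0  # containment case of the equation
    big_d = centroid_distance - d
    bump = 1 - math.exp(-((big_d / (d * 0.5)) ** 2))
    return 1 - (1 + sign(big_d) * bump) / 2
```

Under this reading the similarity is close to 1 at distance zero, equals 0.5 when the centroid distance matches d, and approaches 0 for distant scopes, which agrees with the behaviour described in the text.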
⁶ http://www.postgresql.org
⁷ http://www.alexandria.ucsb.edu/~gjanee/archive/2003/similarity.html
Frontiera et al. [5] showed that the method proposed by
Janée performs almost as well as a highly-tuned method
based on logistic regression. In Janée’s approach, the similarity
between two regions S_page and S_ad is given as follows:
$$sim(S_{page}, S_{ad}) = \frac{area(S_{page} \cap S_{ad})}{area(S_{page} \cup S_{ad})} \qquad (5)$$
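For axis-aligned bounding rectangles this ratio can be computed without a spatial database. The sketch below is a plain-Python illustration and not the PostGIS-based implementation used in the experiments; the (min_x, min_y, max_x, max_y) box layout and the function names are assumptions.

```python
def area(box):
    """Area of a (min_x, min_y, max_x, max_y) box; zero if it is empty."""
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def overlap_similarity(s_page, s_ad):
    """Area of the intersection divided by the area of the union."""
    inter = (max(s_page[0], s_ad[0]), max(s_page[1], s_ad[1]),
             min(s_page[2], s_ad[2]), min(s_page[3], s_ad[3]))
    inter_area = area(inter)
    union_area = area(s_page) + area(s_ad) - inter_area
    return inter_area / union_area if union_area > 0 else 0.0
```

Identical scopes score 1 and disjoint scopes score 0, so the measure penalizes both an ad scope that is much larger than the page scope and one that is much smaller.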
In our implementations of the geographic similarity functions,
PostGIS⁸ was used to compute distances and overlaps.
6. EXPERIMENTAL EVALUATION
In this section, we describe the details of our empirical evaluation.
This includes our experimental design for evaluating
the effectiveness of the proposed approaches, as well as the
obtained results in the different experiments.
6.1 Assigning Scopes to Target Pages
We evaluated the algorithms for assigning geographic scopes
to the target pages by comparing the produced assignments
against those of the human editors from the Open Directory
Project (ODP). Specifically, we took a sample of 6,000
Web pages written in English, with more than 2 KBytes
and at least one place reference, and classified under the
Regional/North America/United States section of the directory.
The human-assigned scopes were equally distributed
across scopes for the entire country, states, and cities. The
collection had a total of 1,100 unique scopes.
We ran all seven algorithms over the test collection, mea-
suring the distance and the relative overlap between the
scopes that were assigned by the algorithms and the scopes
that were assigned by the human editors of the ODP. This
test jointly evaluates place reference resolution and scope as-
signment, since any errors made by the geotagger influence
the scope assignment. The overlap was measured using the
scheme proposed by Janée, which we described in Section 5.
We also measured the accuracy for both exact matches and
approximate matches. Table 1 summarizes our results.
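As an illustration of how an approximate-match accuracy could be computed, the sketch below counts the pages whose assigned scope lies within a distance threshold of the human-assigned one. How distances between scopes were actually measured is not spelled out here, so the sketch assumes great-circle (haversine) distance between scope centroids given as (lat, lon) pairs in degrees and a mean Earth radius of 6371 km. The function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2)
         * math.sin((lon2 - lon1) / 2) ** 2)
    # Clamp to guard against rounding slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def accuracy_within(predicted, expected, threshold_km):
    """Share of pages whose predicted point is within threshold_km."""
    hits = sum(
        haversine_km(p, e) <= threshold_km
        for p, e in zip(predicted, expected)
    )
    return hits / len(expected)
```

Raising the threshold trades precision of the match for tolerance to scopes that are close but not identical.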
The Web-a-Where and GraphRank algorithms obtained the
best overall performances, with errors equally distributed
across countries, states, and cities. Both approaches were
particularly good in pages with country scopes, which was
already expected due to their generalization behaviour (i.e.,
propagate scores towards encompassing regions). The GIPSY
method performed well in both average distance and accuracy
for approximations below 100 km, although it had a
weak performance in terms of exact matches. The algorithm
privileges narrow regions and often fails in generalizing from
the available place references.
Regarding the baselines, the results show that using the cov-
ering area produces the worst results on most metrics. Re-
moving the outliers substantially improves the results, but
this is also a very weak baseline in terms of overall performance,
especially when dealing with narrow scopes. The
baseline that simply assigns the scope as the most frequent
location proved to be a very competitive approach, regularly
outperforming other methods on pages with scopes corre-
sponding to states or cities.
⁸ http://www.postgis.org