The Evolution of Natural Cities from the Perspective of Location-Based Social Media

Bin Jiang and Yufan Miao
Department of Technology and Built Environment, Division of Geomatics
University of Gävle, SE-801 76 Gävle, Sweden
Email: bin.jiang@hig.se, yufanmiao@gmail.com
(Draft: August 2013, Revision: September 2013, January 2014)
Abstract
This paper examines the former location-based social medium Brightkite, over its three-year life span,
based on the concept of natural cities. The term ‘natural cities’ refers to spatially clustered geographic
events, such as the agglomerated patches aggregated from individual social media users’ locations. We
applied the head/tail division rule to derive natural cities. More specifically, we generated a
triangulated irregular network, made up of individual unique user locations, and then categorized
small triangles (smaller than an average size) as natural cities for the United States (mainland) on a
monthly basis. The concept of natural cities provides a powerful means to develop new insights into
the evolution of real cities, because there are virtually no data available to track the history of a city
across its entire life span and at very fine spatial and temporal scales. Therefore, natural cities can act
as a good proxy of real cities, in the sense of understanding underlying interactions, at a global level,
rather than of predicting cities, at an individual level. Apart from the data produced and the contributed
methods, we established new insights into the structure and dynamics of natural cities, e.g., the idea
that natural cities evolve in nonlinear manners at both spatial and temporal dimensions.
Keywords: Big data, head/tail breaks, ht-index, power laws, fractal, and nonlinearity
1. Introduction
Once upon a time, there were no cities, only scattered villages. Over time, cities gradually emerged
through the interaction of people or residents; similarly, large or mega cities evolve through the
interaction of cities or people. This is a conjecture mentioned in Jiang (2013b), in which he argued that
geographic phenomena such as urban growth are essentially unpredictable. Many models in the
literature that claim to predict urban growth are in effect limited to short-term prediction, much like
weather forecasting; weather beyond five days is essentially unpredictable (Bak 1996). A typical
city may have hundreds of years of history, making it nearly impossible to track its growth
quantitatively because of a lack of related data. More important, a city grows within a system of cities;
one cannot understand a city’s growth without considering other related cities. In this paper, we
illustrate that emerging social media provide an unprecedented data source for studying the evolution
of natural cities (see Section 2 for the definition), and subsequently for better understanding the structure
and dynamics of real cities. Location-based social media, sometimes termed location-based social
networks, such as Flickr, Twitter, and Foursquare (Traynor and Curran 2012, Zheng and Zhou 2011)
refer to a set of Internet-based applications founded on Web 2.0 technologies and ideologies that allow
users to create and exchange user-generated content. Location-based social media can act as a proxy of
real cities (or human settlements in general) and provide a better understanding of the underlying structure
and dynamics of human settlements.
Not long ago, there were no social media, only scattered home pages and bulletin board systems
created and maintained by individuals and institutions (Boyd and Ellison 2008, Kaplan and Haenlein
2010). In the era of Web 1.0, geographic locations were not an issue. However, with Web 2.0,
geographic locations have become an important feature of social media. Almost all social
media allow users to tag their geographic locations, often at the level of meters, when sharing and
exchanging user-generated content. Location-based social media enable users to track individual
historical trajectories, their friends, and even the growth of social media. Unlike with conventional
cities, the trajectories of social media are well documented by the hosting companies; and unlike
conventional census data, social media data is defined at individual level, often at very fine spatial and
temporal scales. Data can be obtained using crawling techniques or through the social media’s
officially released application programming interfaces (APIs). This study aims to showcase how
social media’s time-stamped location data can be used to study the evolution of natural cities,
thus providing new insights into the underlying structure and dynamics of real cities.
The contribution of this paper can be seen from three aspects: data, methods, and new insights.
This study produced a large amount of data regarding natural cities from the former social medium
Brightkite during its entire 31-month life span. The resulting data has significant value for further
study of city growth and the allometric relationship between populations and physical extents (data, as
well as related source code, from the study will be released upon acceptance of this paper). We drew
upon a set of fractal or scaling oriented methods to characterize natural cities. These unique methods
help create new insights into the evolution of natural cities as well as that of real cities. For example,
natural cities demonstrate a striking nonlinear property, spatially and temporally (see Section 5).
Moreover, the evolution of natural cities can provide better understanding of social media from a
unique geospatial perspective.
This study provides new perspectives, as well as different ways of thinking, to the study of cities and
city growth in the era of big data (Mayer-Schonberger and Cukier 2013). We did not adopt
conventional census data, but rather the emerging georeferenced social media data; we did not adopt
conventional geographic units or boundaries that are imposed from the top down by authorities, but
rather the naturally defined concept of natural cities, to avoid the statistical bias arising from the modifiable areal
unit problem (Openshaw 1984); and we did not rely on standard and spatial statistics with a
well-defined mean to characterize spatial heterogeneity, but rather power-law-based statistics, driven
by fractal and scaling thinking. Therefore, the underlying ways of thinking adopted in this study are
bottom up rather than top down, in terms of data and methods, nonlinear rather than linear, and fractal
rather than Euclidean in terms of the power-law statistics. In short, this study argues that
geospatial analysis requires a different way of thinking when dealing with the problem of spatial
heterogeneity.
The remainder of this paper is structured as follows. Section 2 presents the methods in which we
define the concept of natural cities and discuss ways of characterizing them. Section 3
presents the data on a monthly basis and shows basic statistics of the data. Section 4 discusses the
results and major findings, and Section 5 discusses the implications of the study. Finally, Section 6 draws a
conclusion and points to future work.
2. Methods
In this section, we illustrate and define the concept of natural cities and present various ways of
characterizing natural cities. We also discuss how natural cities differ from conventional cities and
why they represent a new way of thinking for geospatial analysis.
2.1 Defining natural cities
To approach the difficult task of defining and describing natural cities, we start with definitions of
conventional cities and try to clarify why the conventional definitions are not natural. A city is a
relatively large and permanent human settlement. But how large a settlement must be to qualify as a
city is unclear. For example, a city in Sweden may not qualify as a city in China. Also, many cities
have a particular administrative, legal, and historical status according to their local laws. In the United
States, for example, cities can refer to incorporated places, urban areas, or metropolitan areas with
sufficient population of, say, at least 10,000. This population threshold can be very subjective and is
dependent on the country. This subjectivity is also demonstrated in the physical boundaries of cities,
which are legally and administratively determined. Remotely sensed imagery provides new means to
delineate city boundaries, but how does one choose an appropriate pixel value as a cutoff for the
delineation? Because of these subjectivities, conventional definitions of cities are unnatural. How, then,
can we define a city in more natural ways?
We present three examples of natural cities before formally defining the concept. In the first example,
natural cities are derived from massive street nodes, including both junctions and street ends. Given all
street nodes of an entire country, we can run an iterative clustering algorithm to determine whether a
node is within the neighborhood of another node. For example, set a radius of 700 meters and continuously
draw a circle around each node to determine whether any other node is within its circle. This
progressive and exhaustive process results in many natural cities; see Figure 1a for an illustrative
example. In their study, Jiang and Jia (2011) found that millions of natural cities could be derived
from tens of millions of street nodes in the United States using OpenStreetMap (OSM) data
(Bennett 2010). Instead of massive street nodes, the second example relies on a massive number of
street blocks to extract natural cities. Jiang and Liu (2012) adopted the three largest European
countries: France, Germany, and the UK for their case studies, again using OSM data. The idea is
illustrated in Figure 1b in which small blocks (smaller than an average city block) constitute a natural
city. Although this method sounds very simple, the computation is very intensive for each country, and
involves millions of street blocks. The third example comes from Jiang and Yin (2014), in which the
authors relied on nighttime imagery to derive natural cities. They took all pixel values (millions
of pixels each valued between 0 and 63) of an image in the United States and computed an average
value or mean. The mean split all the pixels into two: those above the mean, and those below the mean.
For the pixels above the mean, a second mean was obtained, and it can be a meaningful cutoff for
delineating natural cities.
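The first example, clustering street nodes within a fixed radius, can be sketched as follows. This is a minimal illustration and not the algorithm used by Jiang and Jia (2011): the function name `cluster_nodes`, the union-find grouping, the brute-force pairwise comparison, and the assumption that coordinates are planar and expressed in meters are all hypothetical choices made for illustration.

```python
import math


def cluster_nodes(points, radius=700.0):
    """Group points so that any two points within `radius` share a cluster.

    Clusters are transitive: A and C end up together if both are near B.
    Assumes planar (x, y) coordinates in meters (an illustrative assumption).
    """
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the cluster root, shortening the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Brute-force pairwise check; fine for a toy example only.
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= radius:
                parent[find(i)] = find(j)

    clusters = {}
    for i, point in enumerate(points):
        clusters.setdefault(find(i), []).append(point)
    return list(clusters.values())


# Three nodes chained 500 m apart and one isolated node.
nodes = [(0, 0), (500, 0), (1000, 0), (5000, 5000)]
clusters = cluster_nodes(nodes)
```

The pairwise loop grows quadratically with the number of nodes, so a country-scale dataset would need a spatial index; the sketch only shows how chained neighbors merge into one cluster.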
Figure 1: (Color online) Natural cities based on (a) street nodes and (b) street blocks
(Note: Blue rectangles are the boundaries of the natural cities, which are composed of high-density
nodes or small street blocks based on the head/tail division rule (Jia and Jiang 2010).)
These examples of deriving natural cities point out the importance of the mean’s effect, which is based
on the head/tail division rule: Given a variable X, if its values x follow a heavy tailed distribution, then
the mean (m) of the values can divide all the values into two parts: a high percentage in the tail, and a
low percentage in the head (Jiang and Liu 2012). Heavy tailed distributions are statistical
distributions that are right-skewed, for example, power law, lognormal, and exponential distributions. Obviously,
the density of street nodes, the size of street blocks, and the nighttime imagery pixel values all exhibit
a heavy tailed distribution, which implies that there are far more small things than large ones. In this
paper, we introduce an additional way of deriving natural cities: from individual users’ geographical
data of location-based social media. From unique users who check in from locations across an entire
country, we can build up a huge triangulated irregular network (TIN), and then categorize the small
triangles (smaller than the mean) as natural cities (Figure 2); refer to the Appendix for a short tutorial.
Section 5 includes a discussion of why the head/tail division rule works so well in delineating natural
cities.
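The selection of small triangles can be sketched as below, assuming the triangulation already exists as a list of vertex triples (for instance, produced by a Delaunay routine). Measuring triangle size by area is an assumption made here; Table 3 reports the mean length of TIN edges, so an edge-based measure may be closer to the actual procedure. The names `triangle_area` and `small_triangles` and the toy triangles are hypothetical, and merging adjacent small triangles into patches is omitted.

```python
def triangle_area(a, b, c):
    """Area of a triangle from its three (x, y) vertices (shoelace formula)."""
    return abs((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])) / 2.0


def small_triangles(triangles):
    """Return the mean area and the triangles whose area is below that mean."""
    areas = [triangle_area(*t) for t in triangles]
    mean_area = sum(areas) / len(areas)
    small = [t for t, area in zip(triangles, areas) if area < mean_area]
    return mean_area, small


# Toy triangles (not a real triangulation): three tiny, one medium, one large.
toy_triangles = [
    ((0, 0), (1, 0), (0, 1)),
    ((1, 0), (1, 1), (0, 1)),
    ((1, 0), (2, 0), (1, 1)),
    ((2, 0), (10, 0), (1, 1)),
    ((1, 1), (10, 0), (10, 10)),
]
mean_area, small = small_triangles(toy_triangles)
```

In this toy case the single large triangle pulls the mean upward, so four of the five triangles fall below it, which mirrors the unbalanced head/tail split described above.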
Figure 2: (Color online) Procedure of generating natural cities (red patches) from points through TIN
Based on these examples, a formal definition of natural cities can be derived. Natural cities refer to
human settlements or human activities in general on Earth’s surface that are objectively or naturally
defined and delineated from massive geographic information of various kinds, and based on the
head/tail division rule. Unlike conventional cities, natural cities do not need to meet a minimum
population requirement. A one-person settlement may constitute a natural city, or even zero people, if
natural cities are defined not according to human population, but something else. For example, when
natural cities are defined according to street nodes, a natural city derived from one street node may
have no people there at all. The reader may question whether this definition makes sense, but the
definition makes good sense because it provides a new perspective for geospatial analysis, and helps
us develop new insights into geographic forms and processes (see Sections 4 and 5). That is also the
reason that we use the term natural cities to refer to human settlements or human activities in general
on the Earth’s surface. With the concept of natural cities, we abandon the top-down imposed unnatural
geographic units or boundaries such as states, counties, and cities, in order to study geographic forms
and processes more scientifically.
2.2 Characterizing natural cities
The rank-size distribution of cities in a region can be well characterized by Zipf’s law, i.e., an inverse
power relationship between city rank (r) and city size (N), N = r ^ -1 (Zipf 1949). Simply put, when
ranking all cities in a decreasing order for a given country, the largest city is twice as big as the second
largest, three times as big as the third largest, and so on. In other words, a city’s size by population is
inversely proportional to its rank. Such a simple and neat law is found to hold remarkably well for
almost all countries or regions (e.g., Berry and Okulicz-Kozaryn 2011), although some researchers
have challenged its universality (e.g., Benguigui and Blumenfeld-Leiberthal 2011). Essentially, Zipf’s
law indicates two aspects: (1) a power-law relationship between rank and size, and (2) a Zipf’s
exponent of one. Most previous studies have confirmed the first aspect, but not the second; the Zipf’s
exponent was found to deviate from one. In other words, the first aspect is less controversial
than the second. Some researchers argued that Zipf’s law was primarily used for characterizing
large cities rather than all cities. In this study, we chose large natural cities (larger than a mean) to
examine whether they followed Zipf’s law. The scaling pattern of far more small cities than large
ones underlies Zipf’s law: a majority of small cities and a minority of large cities. More important,
the scaling pattern recurs not just once, but multiple times for those large cities, again and again. This
is the basis of head/tail breaks (Jiang 2013), a novel classification scheme for data with a heavy tailed
distribution. In what follows, we illustrate head/tail breaks with a working example.
Table 1: Head/tail breaking statistics for the TIN edges
Edges  Mean  #Head  %Head  #Tail  %Tail
504    2.2   135    27%    369    73%
135    6.2   35     26%    100    74%
35     13.4  13     37%    22     63%
13     20.7  3      23%    10     77%
3      33.2  1      33%    2      67%
The triangulated irregular network shown in Figure 2 appears to contain far more short edges
than long ones, and indeed, this is true. There are 504 edges, ranging from the shortest 0.001 to the
longest 46.752. The wide range 46.751 = 46.752 – 0.001 and the large ratio 46,752 = 46.752/0.001
clearly indicate far more short edges than long ones. The average length of the 504 edges is 2.2, which
splits all the edges into two unbalanced parts: 135 in the head (27 percent) and 369 in the tail (73
percent). This head/tail breaking process can be continued for the head again and again, as shown in
Table 1. Eventually, the scaling pattern of far more short edges than long ones recurs five times, three
of which are plotted in Figure 3 as so-called nested rank-size plots. Given that the scaling pattern
recurs five times, the ht-index is six, i.e., one more than the number of recurrences. Note that the ht-index (Jiang and Yin 2014) is an alternative index to
fractal dimension (Mandelbrot 1983) used to capture the complexity of geographical features.
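The recursive splitting described above can be sketched in a few lines. The stopping rule is an assumption: the sketch stops when the head is no longer a clear minority, using a 40 percent limit (the parameter `head_limit`), or when fewer than two values remain. The function names and the synthetic data are hypothetical and do not reproduce the 504 edges of Table 1.

```python
def head_tail_breaks(values, head_limit=0.4):
    """Return the successive means at which the data are split.

    At each step the mean divides the current values into a head (above the
    mean) and a tail; the process repeats on the head while the head remains
    a minority (at most `head_limit` of the current values).
    """
    breaks = []
    current = list(values)
    while len(current) > 1:
        mean = sum(current) / len(current)
        head = [v for v in current if v > mean]
        if not head or len(head) / len(current) > head_limit:
            break
        breaks.append(mean)
        current = head
    return breaks


def ht_index(values, head_limit=0.4):
    """One more than the number of times the head/tail split recurs."""
    return len(head_tail_breaks(values, head_limit)) + 1


# Synthetic heavy tailed data: 27 ones, 9 twos, 3 fours, and a single eight.
data = [1] * 27 + [2] * 9 + [4] * 3 + [8]
breaks = head_tail_breaks(data)
index = ht_index(data)
```

For this synthetic series the split recurs three times (heads of 13, 4, and 1 values), so the ht-index is four; applied to the edge lengths behind Table 1, the same logic would be expected to recur five times.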
Figure 3: (Color online) Nested rank-size plots for the first three hierarchical levels with respect to the
first three rows in Table 1
(Note: The x axis and y axis represent rank and size respectively. The largest plot contains the 504
edges, the red being the first head (135 edges) and the blue being the first tail (369 edges). The 135
edges are plotted again with the red representing 35 in the second head and the blue 100 in the second
tail. The smallest plot is for the 35 edges in the second head.)
Head/tail breaks or ht-index provides a simple yet effective means to characterize natural cities, or
data in general with a heavy tailed distribution for mapping purposes. The derived ht-index captures
the hierarchy or scaling hierarchy of the data. For mapping purposes, head/tail breaks is superior to
conventional classification methods for capturing the underlying scaling pattern (Jiang 2013).
The ht-index complements fractal dimension for characterizing the complexity of geographic features or
fractals in general.
3. Data and Data Processing
As stated above, the data for this study came from the former location-based social medium Brightkite,
during its three-year (31 months to be more precise) life span, from April 2008 to October 2010 (Cho,
Myers, and Leskovec 2011). The case included 2,837,256 locations in the mainland United States.
After removing duplicate locations, we obtained 412,961 unique locations for
generating a TIN, and then derived 8,307 natural cities as of October 2010, by following the procedure shown
in Figure 2, as well as the short tutorial in the Appendix. The location data was time stamped (Table 2),
so we were able to slice all these locations monthly in an accumulated manner, i.e., locations at month
$m_{i+1}$ contain all locations between months $m_1$ and $m_i$, where $1 \le i \le 31$. For each time interval or
snapshot, we generated a set of natural cities ranging from dozens to thousands. For some snapshots,
we had to split the data into small pieces, and put them back together in ArcGIS for visualization and
analysis. For example, Figure 4 illustrates the 8,307 natural cities as of October 2010, showing their
boundaries and populations. Note that this is just one of the 31 snapshots or datasets in the study.
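The accumulated monthly slicing can be sketched as follows, using the sample rows of Table 2. This is an illustration under stated assumptions: records are taken to be whitespace-separated, a unique location is taken to be a distinct latitude/longitude pair, and each snapshot is taken to include all check-ins up to and including that month. The function name `monthly_cumulative_unique` is hypothetical.

```python
def monthly_cumulative_unique(lines):
    """Count unique (lat, lon) pairs accumulated up to each month."""
    by_month = {}
    for line in lines:
        user, stamp, lat, lon, location_id = line.split()
        month = stamp[:7]  # "YYYY-MM" prefix of the ISO time stamp
        by_month.setdefault(month, set()).add((float(lat), float(lon)))

    seen = set()
    counts = {}
    for month in sorted(by_month):
        seen |= by_month[month]  # accumulate earlier months into this snapshot
        counts[month] = len(seen)
    return counts


sample = [
    "58186 2008-12-03T21:09:14Z 39.633321 -105.317215 ee8b88dea22411",
    "58186 2008-11-30T22:30:12Z 39.633321 -105.317215 ee8b88dea22411",
    "58186 2008-11-28T17:55:04Z -13.158333 -72.531389 e6e86be2a22411",
    "58186 2008-11-26T17:08:25Z 39.633321 -105.317215 ee8b88dea22411",
    "58187 2008-08-14T21:23:55Z 41.257924 -95.938081 4c2af967eb5df8",
]
counts = monthly_cumulative_unique(sample)
```

Months without any check-in in the input simply do not appear in the result; a full pipeline would also filter points to the mainland United States before building each TIN.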
Table 2: Initial check-in data format
User Chec
k
‐intime Latitude Longitude Locationid
58186 2008‐12‐03T21:09:14Z 39.633321 ‐105.317215 ee8b88dea22411
58186 2008‐11‐30T22:30:12Z 39.633321 ‐105.317215 ee8b88dea22411
58186 2008‐11‐28T17:55:04Z ‐13.158333 ‐72.531389 e6e86be2a22411
58186 2008‐11‐26T17:08:25Z 39.633321 ‐105.317215 ee8b88dea22411
58187 2008‐08‐14T21:23:55Z 41.257924 ‐95.938081 4c2af967eb5df8
Figure 4: (Color online) The largest set of natural cities as of October 2010 (red patches for boundaries
and red dots for populations) on the background of TIN (gray lines) generated from 412,961 unique
location points (2,837,256 points including duplicates)
Table 3: Measurements and statistics from location points to natural cities for the different time
intervals
(Note: Pnt = # of points, PntUniq = # of unique points, TINEdge = # of TIN edges, Mean = Average
length of TIN edges, NaturalCity = # of natural cities)
Time    Pnt      PntUniq  TINEdge  Mean   NaturalCity
200804  3784     1199     3580     54043  44
200805  35431    7401     22178    18900  192
200806  69670    12489    37443    13971  274
200807  105381   17119    51331    11713  345
200808  139248   21718    65126    10229  458
200809  170134   26716    80123    8741   532
200810  211612   33438    100289   7537   660
200811  263919   42216    126621   6570   854
200812  314339   50706    152089   5995   1029
200901  364584   58699    176069   5479   1158
200902  411771   65562    196662   5127   1295
200903  467715   73408    220200   4809   1467
200904  2375827  353176   1059508  2139   7088
200905  2430842  359243   1077709  2119   7217
200906  2480720  364760   1094257  2099   7306
200907  2531199  370992   1112953  2079   7460
200908  2577445  376811   1130410  2061   7579
200909  2618017  381447   1144321  2047   7669
200910  2646890  386286   1158835  2030   7761
200911  2674335  390462   1171363  2016   7830
200912  2697482  393907   1181698  2005   7923
201001  2717638  396763   1190269  1995   7988
201002  2734260  398964   1196872  1987   8040
201003  2752153  401562   1204666  1978   8082
201004  2766773  403795   1211362  1970   8117
201005  2779889  405814   1217419  1965   8149
201006  2793472  407711   1223110  1961   8206
201007  2806640  409487   1228437  1956   8259
201008  2820373  411194   1233561  1951   8303
201009  2831894  412407   1237197  1947   8301
201010  2837256  412961   1238859  1945   8307
Table 3 lists some basic measurements and statistics from the location points to the natural cities. For
example, for the first month, April 2008, only 44 natural cities were generated from 3,784 locations, of
which 1,199 unique locations were used for generating a TIN with 3,580 edges, and a mean of 54,043
as the cutoff to derive the 44 natural cities. The number of natural cities increased to 8,307 as of
October 2010. During the 31 months, the number of natural cities increased rapidly at some points, e.g., more than
fourfold from April to May 2008 and from March to April 2009. We do not know why there
were such rapid increases, but they could relate to advertising effects. In addition, there was a slight drop
in the number of natural cities from August to September 2010. In the following section, we use
seven time intervals selected from Table 3 (April, May, October, and December 2008; March and April 2009; and October 2010) for a detailed discussion of our findings.
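The growth figures quoted above can be checked against the NaturalCity column of Table 3. The snippet below is only an arithmetic check; the dictionary holds the six relevant values copied from the table, and the helper name `growth_factor` is hypothetical.

```python
# Number of natural cities for selected months, copied from Table 3.
natural_cities = {
    "2008-04": 44,
    "2008-05": 192,
    "2009-03": 1467,
    "2009-04": 7088,
    "2010-08": 8303,
    "2010-09": 8301,
}


def growth_factor(counts, earlier, later):
    """Ratio of the later count to the earlier count."""
    return counts[later] / counts[earlier]


spring_2008 = growth_factor(natural_cities, "2008-04", "2008-05")
spring_2009 = growth_factor(natural_cities, "2009-03", "2009-04")
autumn_2010 = growth_factor(natural_cities, "2010-08", "2010-09")
```

Both spring ratios exceed four, and the last ratio falls just below one, reflecting the loss of two natural cities between August and September 2010.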
4. Results and Discussions
Before discussing the findings, we map the natural cities at the seven time intervals (or snapshots) for
the four largest natural cities surrounding Chicago, New York, San Francisco, and Los Angeles. These are
shown in Figure 5, which illustrates clearly how the four cities or regions grew or expanded during the
31-month period. All parts of the country can be assessed for similar patterns of growth and evolution.
We know little about why the procedure shown in Figure 2, as well as in the Appendix, works so well,
but the resulting patterns suggest that the natural cities effectively capture the evolution of real cities.
On the one hand, the natural cities expanded towards more fragmented pieces, far more small pieces
than large ones. On the other hand, the physical boundaries of the natural cities tended to become
more irregular over time. These two aspects suggest that the natural cities are fractal, and become
more and more fractal, closely resembling real cities (Batty and Longley 1994). These two aspects
are further discussed below.
Figure 5: (Color online) Evolution of the natural cities near the four largest city regions with TIN as
a background
These results can be assessed from both global and local perspectives. Globally, all the natural cities in
the United States exhibit a power-law distribution. This is shown in the rank-size plots (Figure 6), in which
the distribution lines are very straight for all the natural cities at the different time intervals in the
log-log plots. The natural cities as of April 2008, except the smallest with fewer than 12 people, exhibited
a clear power law, probably the straightest distribution among all. This result is the same in May
2008. However, the distribution lines from October 2008 to October 2010 are less straight, indicating
that a few of the largest natural cities did not fit the power-law distribution well. This is particularly
obvious for the last two snapshots in April 2009 and October 2010. A possible reason for this
difference, moving from a striking to a less striking power law, is described below.
Figure 6: (Color online) Rank-size plot for the natural cities
In further examinations, we looked at the large cities (larger than the mean) in each snapshot and
found that Zipf’s exponent was indeed around one for the first two months (0.98, and 1.08), and then
greater than one by about 0.25 (Table 4). Considering the duality of Zipf’s law, this result suggests that
Zipf’s law held remarkably well for the first two months, but less so for the remaining months. We
postulate a possible reason: The social medium users in the first two months increased proportionally
with the populations of real cities, thus leading to a striking Zipf’s law effect among the natural cities
because the populations of real cities are power-law distributed. Over time, large cities, particularly
a few of the largest cities such as New York, did not keep pace with the other cities in attracting more users.
In other words, beyond the first two months, the increase in social medium users became less
proportional to the real cities’ populations. As a result, Zipf’s law is less striking. We assess this point
further in the discussion of our findings from a local perspective below. In contrast to small deviations
of Zipf’s exponent, the ht-index increased from four to seven (Table 4). Note that the ht-index is a
measure for characterizing the complexity of fractals or of geographic features (Jiang and Yin 2014). The
increase in the ht-index implies that more hierarchical levels were added, reflecting well the
evolution of the natural cities and of the social medium.
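A Zipf's exponent for the large cities of a snapshot can be estimated as sketched below. The estimation method is not stated in the text, so ordinary least squares on the log-log rank-size plot is an assumption here, and it is known to be a rough estimator for power laws. The function names and the synthetic sizes are hypothetical.

```python
import math


def large_cities(sizes):
    """Keep only the cities larger than the mean size (the head)."""
    mean = sum(sizes) / len(sizes)
    return [s for s in sizes if s > mean]


def zipf_exponent(sizes):
    """Negated least-squares slope of log(size) against log(rank)."""
    ranked = sorted(sizes, reverse=True)
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(size) for size in ranked]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return -cov / var


# Synthetic sizes that follow size = 1000 / rank exactly.
sizes = [1000 / rank for rank in range(1, 101)]
head = large_cities(sizes)
exponent = zipf_exponent(head)
```

Because the synthetic sizes follow the inverse rank relationship exactly, the estimate is one up to floating-point error; real snapshots would give values such as those reported in Table 4.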
Table 4: Zipf’s exponent and ht-index for the natural cities
                 2008-04  2008-05  2008-10  2008-12  2009-03  2009-04  2010-10
Zipf's exponent  0.98     1.08     1.26     1.24     1.26     1.27     1.25
Ht-index         4        4        4        5        6        6        7
Locally, there are two points to discuss. First, the boundaries of the natural cities became more
irregular over time, very much like the Koch curve as the number of iterations increases. For example, the
boundaries of the natural cities as of April 2008 were simple enough to be described by Euclidean
geometry. However, over time, the boundaries must be characterized by fractal geometry — more
fragmented with more fine scales added. Second, large natural cities tended to become larger and
larger, while small ones continuously emerged at local levels. Figure 5 illustrates this finding in a less
striking manner, as the city sizes are measured by the physical extents. But if the city sizes are
measured by population as in Figure 7, we notice rapid increases for the four largest cities.
Overall, the four cities tended to become larger and larger, but there was a major difference among the
four. To illustrate the difference, we must clarify that Figure 7 uses graduated dots to represent
the city sizes, which are classified according to head/tail breaks. This is because the city sizes
exhibited a heavy tailed distribution, or there were far more small cities than large ones. Therefore, the
dot sizes in Figure 7 do not represent city sizes, strictly speaking, but rather, the corresponding classes
to which the cities belong. Notice that the largest natural city in the New York region in October 2010
appears smaller than in April 2009, which indicates that the natural city belonged to a higher class in
April 2009 than in October 2010. This is indeed true! Table 5 clearly indicates that the New York
natural city in April 2009 belonged to the sixth among the six classes, while its position in October
2010 dropped to the fifth among the seven classes. This finding also supports what we stated above: A
few of the largest cities did not keep pace with the others in attracting more users.
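The class of a city within a head/tail breaks classification can be looked up as sketched below. The break values here are hypothetical (they are not the breaks behind Table 5), and the function name `head_tail_class` is illustrative; the only assumption carried over from the method is that a value must be strictly greater than a mean to move into the next class.

```python
from bisect import bisect_left


def head_tail_class(value, breaks):
    """Return the 1-based class of `value` given ascending break values.

    The class is one plus the number of breaks that `value` strictly exceeds,
    so a value equal to a break stays in the lower class.
    """
    return bisect_left(breaks, value) + 1


# Hypothetical breaks (successive means) giving four classes.
example_breaks = [1.625, 2.9, 5.0]
classes = [head_tail_class(v, example_breaks) for v in (1, 2, 4, 5.0, 8)]
total_classes = len(example_breaks) + 1
```

Read this way, a city's class depends on the whole system of cities: when new breaks appear at the top of the hierarchy, a city can drop in class even while its population keeps growing, which is the kind of shift seen for New York between April 2009 and October 2010.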
Figure 7: (Color online) Evolution of the natural cities in terms of populations (or points)
near the four largest city regions
Table 5: Evolution of the four cities within the system of the natural cities
(Note: a/b where a and b respectively denote the class the particular city belongs to, and the total
number of classes or the ht-index)
               2008-04  2008-05  2008-10  2008-12  2009-03  2009-04  2010-10
Chicago        1/4      2/4      3/4      3/5      3/6      4/6      4/7
New York       2/4      3/4      3/4      4/5      4/6      6/6      5/7
San Francisco  3/4      4/4      4/4      5/5      5/6      6/6      6/7
Los Angeles    3/4      3/4      4/4      4/5      5/6      6/6      7/7
The above results or findings can be summarized by nonlinearity, which is reflected in both spatial and
temporal dimensions. Spatially, the natural cities were distributed heterogeneously or unevenly, i.e.,
there were far more small cities than large ones. This uneven distribution was also seen in the temporal
dimension. For example, within the first 10 months of 2008, the natural cities already had taken the
shapes of individual cities (Figure 5), with populations continuously growing, and small natural cities
being added persistently for the remaining time. In other words, it took just one third of the social
medium’s lifetime to determine the shapes of individual cities. That is also the reason that we chose
the seven unequal time intervals to examine the evolution.
5. Implications of the Study
Location-based social media provide large amounts of location data of significant value for
studying human activities in the virtual world, as well as on the Earth’s surface. Nowadays, the social
sciences — human geography in particular — benefit considerably from emerging social media data
that are time-stamped and location-based. The ways of doing geography and social sciences are
changing! The emerging big data harvested from social media, as well as from positioning and
geospatial technologies, coupled with data-intensive computing (Hey, Tansley, and Tolle 2009) are
transforming conventional social sciences into computational social sciences (Lazer et al. 2009). In
this section, we discuss some deep implications of this study for geography and social sciences in
general.
The notion of natural cities implies a sort of bottom-up thinking in terms of data collection and
geographic units or boundaries. Conventional geographic data collected and maintained from the top
down by authorities are usually sampled and aggregated, and therefore, are small-sized. On the other
hand, new data harvested from social media are massive and individual, so they are called ‘big data.’
Time-stamped and location-based social media data, supported by Web 2.0 technologies and
contributed by individuals through humans as sensors (Goodchild 2007), constitute a brilliant new data
source for geographic research. Conventional geographic units or boundaries are often imposed from
the top down by authorities or centralized committees, while natural cities are defined and delineated
objectively in some natural manner, based on the head/tail division rule. This natural manner
guarantees that we can see a true picture of urban structure and dynamics, and suggests the
universality of Zipf’s law. This true picture is fractal and can be illustrated with an example: Throw
a wine glass forcefully onto a cement floor, and it will very likely break into a large number of pieces.
Like the natural cities, these glass pieces are fractal or follow Zipf’s law: On the one hand, there are
far more small pieces than large ones, and on the other hand, each piece has an irregular shape.
The evolution of natural cities demonstrates nonlinearity at both spatial and temporal dimensions, or
equivalently from both static and dynamic points of view. Many phenomena in human geography, as
well as in physical geography, bear this nonlinearity (Batty and Longley 1994, Frankhauser 1994,
Chen 2009, and Phillips 2003). However, we are still very much constrained by linear thinking,
explicitly or implicitly, consciously or unconsciously. For example, we rely on Euclidean geometry to
describe Earth’s surface, and on a well-defined mean to characterize spatial heterogeneity. Our
mindsets apparently lag behind the advances of data and technologies. Conventional linear thinking is
not suitable for describing the Earth’s surface (the geographic forms), not to mention uncovering the
underlying geographic processes. Instead, we should adopt nonlinear thinking, or nonlinear
mathematics such as fractal geometry, chaos theories, and complexity for geographic research. The
tools adopted in this study, such as the head/tail division rule, head/tail breaks, and the ht-index, are grounded in
nonlinear mathematics and power-law-based statistics. These nonlinear mathematical tools help to
elicit new insights into the evolution of natural cities. Nonlinearity also implies that geographic forms
and processes are unpredictable like long-term weather or climate in general. To better predict and
understand geographical phenomena, we must seek to uncover the underlying mechanisms through
simulations rather than simple correlations.
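As a rough illustration of how head/tail breaks and the ht-index operate, the recursion can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the code used in this study: the function name head_tail_breaks, the 40 percent limit on the size of the head, and the toy values are all chosen here for illustration only.

```python
def head_tail_breaks(values, head_limit=0.4):
    """Return the successive mean values at which the data are split.

    Assumption: splitting continues only while the head (values above
    the mean) remains a minority, here at most `head_limit` of the data.
    """
    breaks = []
    current = list(values)
    while len(current) > 1:
        mean = sum(current) / len(current)
        head = [v for v in current if v > mean]
        # Stop when nothing lies above the mean or the head is no longer a minority.
        if not head or len(head) / len(current) > head_limit:
            break
        breaks.append(mean)
        current = head  # recurse into the head only
    return breaks


# Hypothetical heavy-tailed values: many small ones, very few large ones.
values = [1] * 8 + [2] * 4 + [4, 4, 8, 16]
breaks = head_tail_breaks(values)
ht_index = len(breaks) + 1  # one more than the number of splits

print(breaks)    # [3.0, 8.0]
print(ht_index)  # 3
```

On these toy values the mean divides the data twice, so the sketch reports an ht-index of 3; a data set in which all values are equal is never split.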
The head/tail division rule is intellectually exciting because it appears to be both powerful and
mysterious. The reason why the head/tail division rule is an effective tool to derive natural cities, in
particular at the different time stages, remains an open question. However, we tend to believe it is the
effect of the wisdom of crowds — the diverse and heterogeneous many are often smarter than the few,
even a few experts (Surowiecki 2004). The massive number of edges (up to 1,238,859) of the
generated TIN from the massive location points constituted the ‘crowds,’ and they collectively decided
an average cutoff for delineating the natural cities. Every single edge had ‘its voice heard’ in the
democratic decision. From the effectively derived natural cities, we can see an advantage of working
with big data. If we had not worked with the entire US data set, but only an area surrounding New
York for example, we would not have been able to determine a sensible cutoff for delineating the New
York natural city. Only with the big data that includes all location points or all edges can a meaningful
cutoff be determined and applied to all. In this sense, the approach to delineating natural cities is
holistic and bottom up, with participation of all diverse and heterogeneous individuals.
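The dependence of the cutoff on the full data set can be made concrete with a small numerical sketch. The edge lengths below are invented for illustration, and the helper names mean_cutoff and below are hypothetical; the only element taken from the text is that the mean of all edge lengths serves as the cutoff.

```python
def mean_cutoff(lengths):
    """Head/tail division rule: the arithmetic mean acts as the cutoff."""
    return sum(lengths) / len(lengths)


def below(lengths, cutoff):
    """Edges shorter than the cutoff, i.e., those in high-density locations."""
    return [x for x in lengths if x < cutoff]


# Invented edge lengths (arbitrary units), for illustration only.
dense_edges = [1, 1, 2, 2, 3]    # edges inside one densely visited area
sparse_edges = [50, 80, 120]     # edges spanning sparsely visited land
all_edges = dense_edges + sparse_edges

global_cutoff = mean_cutoff(all_edges)    # every edge takes part
local_cutoff = mean_cutoff(dense_edges)   # only the dense area takes part

print(global_cutoff, len(below(dense_edges, global_cutoff)))  # 32.375 5
print(local_cutoff, len(below(dense_edges, local_cutoff)))    # 1.8 2
```

With the cutoff computed from all edges, every edge of the dense area is retained and the area holds together as a single patch. With the cutoff computed from the dense area alone, only two of its five edges remain, and the patch is fragmented.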
It is important to note that the check-in users are biased towards certain types of people. Thus the
derived natural cities are not exactly the same as the corresponding real cities. However, no one can
deny that the boundaries shown in Figure 5 are those of Chicago, New York, San Francisco, and
Los Angeles, in particular with respect to the last time interval 2010-10. One can simply cross check
Google Maps to see what the cities or regions look like. On the other hand, this paper does not aim to study
real cities, at an individual level, or how they can be captured or predicted by the natural cities, but to
understand, at a collective level, underlying mechanisms of agglomerations, formed either by people
in physical space (real cities), or by the check-in users in virtual space (natural cities). In other words,
we consider cities (either real or natural cities) as an emergence (Johnson 2002) developed from
interactions of individual people from the bottom up. We believe that the insights developed from social
media data, e.g., fractal structure and nonlinear dynamics, can be applied to real cities. The fact that not
all people are check-in users should not be considered a biased sampling issue. Sampling is an
inevitable technique in times of information scarcity, the so-called small data era, but it is not a
legitimate concept in the big data era. Large social media data imply N=all (Mayer-Schonberger
and Cukier 2013). This N=all is the essence of big data. Given the 2.8 million check-in locations,
the social media can be a good proxy for studying the evolution of real cities in the country.
We face an unprecedented golden era for geography, or social sciences in general, with the wave of
social media and, in particular, the increasing convergence of social media and geographic information
science (Sui and Goodchild 2011). For the first time in history, human activities can be documented at
very fine spatial and temporal scales. In this study, we sliced the data monthly, but we certainly could
have done so weekly, daily, and even hourly. We believe that the observed nonlinearity at the temporal
dimension would be even more striking. This, of course, warrants further study. Geographers should
ride the wave of social media and develop a more computationally minded geography or
computational geography (Openshaw 1998). If we do not seize this unique opportunity, we may risk
being purged from the sciences. The rise of computational social science is a timely response to the
rapid advances of data and technologies. In fact, physicists and computer scientists already have been
working on this exciting and rapidly changing domain (see Brockmann, Hufnagel, and Geisel 2006;
and Zheng and Zhou 2011). We geographers should do more rather than less.
6. Conclusion
Driven by the lack of data for tracking the evolution of cities, this study demonstrated that emerging
location-based social media such as Flickr, Twitter, and Foursquare can act as a proxy for studying and
understanding underlying evolving mechanisms of cities. Compared with conventional census data
that are usually sampled, aggregated, and small, the time-stamped and location-based social media
data can be characterized as all, individual, and big. In this paper, we abandoned conventional
definitions of cities, and adopted objectively or naturally defined natural cities, using massive
geographic information of various kinds, and based on the head/tail division rule. Built on the notion
of the wisdom of crowds, the head/tail division rule works very well to establish a meaningful cutoff
for delineating natural cities. Natural cities provide an effective means or unique perspective to study
human activities for better understanding of geographic forms and processes.
We examined the evolution of natural cities, derived from massive location points of the social
medium Brightkite, during its 31-month life span. We found nonlinearity during the evolution of
natural cities in both spatial and temporal dimensions, and the universality of Zipf’s law. We archived
all the data that could be of further use for developing and verifying urban theories. This study has
deep implications for geography and social sciences in light of the increasing amounts of data that can
be harvested from location-based social media. Therefore, we call for the application of nonlinear
mathematics, such as fractal geometry, chaos theories and complexity to geographic and social science
research. A limitation of this study lies in the data that shows only the social medium’s continuous rise
and not its decline. Brightkite seemed to disappear overnight. Future research should concentrate on
development of power-law-based statistics, and underlying nonlinear mathematics, to manage the
increasing social media data and on agent-based simulations to reveal the mechanisms for the
evolution of natural cities.
Acknowledgement
An early version of this paper was presented as a keynote address entitled “The evolution of natural cities:
a new way of looking at human mobility”, at Mobile Ghent '13, 23-25 October 2012, University of
Ghent, Belgium. XXXX
References:
Bak P. (1996), How Nature Works: The science of self-organized criticality, Springer-Verlag: New
York.
Batty M. and Longley P. (1994), Fractal Cities: A geometry of form and function, Academic Press:
London.
Benguigui L. and Blumenfeld-Lieberthal E. (2011), The end of a paradigm: Is Zipf’s law universal?
Journal of Geographical Systems, 13, 87–100.
Bennett J. (2010), OpenStreetMap: Be your own cartographer, Packt Publishing: Birmingham.
Berry B. J. L. and Okulicz-Kozaryn A. (2011), The city size distribution debate: Resolution for US
urban regions and megalopolitan areas, Cities, 29, S17-S23.
Boyd D. M. and Ellison N. B. (2008), Social network sites: Definition, history, and scholarship,
Journal of Computer-Mediated Communication, 13, 210 – 230.
Brockmann D., Hufnagel L., and Geisel T. (2006), The scaling laws of human travel, Nature, 439, 462
– 465.
Chen Y. (2009), Spatial interaction creates period-doubling bifurcation and chaos of urbanization,
Chaos, Solitons & Fractals, 42(3), 1316-1325.
Cho E., Myers S. A., and Leskovec J. (2011), Friendship and mobility: user movement in
location-based social networks, Proceedings of the 17th ACM SIGKDD international conference
on Knowledge discovery and data mining, ACM: New York, 1082-1090.
Frankhauser P. (1994), La Fractalité des Structures Urbaines, Economica: Paris.
Goodchild M. F. (2007), Citizens as sensors: The world of volunteered geography, GeoJournal, 69(4),
211-221.
Hey T., Tansley S., and Tolle K. (2009), The Fourth Paradigm: Data intensive scientific discovery,
Microsoft Research: Redmond, Washington.
Jia T. and Jiang B. (2010), Measuring urban sprawl based on massive street nodes and the novel
concept of natural cities, Preprint: http://arxiv.org/abs/1010.0541.
Jiang B. (2013), Head/tail breaks: A new classification scheme for data with a heavy-tailed distribution,
The Professional Geographer, 65 (3), 482 – 494.
Jiang B. and Jia T. (2011), Zipf's law for all the natural cities in the United States: a geospatial
perspective, International Journal of Geographical Information Science, 25(8), 1269-1281.
Jiang B. and Liu X. (2012), Scaling of geographic space from the perspective of city and field blocks
and using volunteered geographic information, International Journal of Geographical
Information Science, 26(2), 215-229.
Jiang B. and Yin J. (2014), Ht-index for quantifying the fractal or scaling structure of geographic
features, Annals of the Association of American Geographers, xx, xx-xx, preprint:
http://arxiv.org/abs/1305.0883
Johnson S. (2002), Emergence: The Connected Lives of Ants, Brains, Cities, and Software, Scribner:
New York.
Kaplan A. M. and Haenlein M. (2010), Users of the world, unite! The challenges and opportunities of
social media, Business Horizons, 53, 59-68.
Lazer, D., Pentland, A., Adamic, L., Aral, S., Barabási, A.-L., Brewer, D., Christakis, N., Contractor,
N., Fowler, J., Gutmann, M., Jebara, T., King, G., Macy, M., Roy, D., and Van Alstyne, M.
(2009), Computational social science, Science, 323, 721-724.
Mandelbrot B. (1982), The Fractal Geometry of Nature, W. H. Freeman and Co.: New York.
Mayer-Schonberger V. and Cukier K. (2013), Big Data: A revolution that will transform how we live,
work, and think, Eamon Dolan/Houghton Mifflin Harcourt: New York.
Openshaw S. (1984), The Modifiable Areal Unit Problem, Geo Books: Norwich, Norfolk.
Openshaw S. (1998), Towards a more computationally minded scientific human geography,
Environment and Planning A, 30, 317-332.
Phillips J. D. (2003), Sources of nonlinearity and complexity in geomorphic systems, Progress in
Physical Geography, 27(1), 1–23.
Sui D. and Goodchild M. (2011), The convergence of GIS and social media: challenges for GIScience,
International Journal of Geographical Information Science, 25(11), 1737–1748.
Surowiecki J. (2004), The Wisdom of Crowds: Why the Many Are Smarter than the Few, ABACUS:
London.
Traynor D. and Curran K. (2012), Location-based social networks, In: Lee I. (editor, 2012), Mobile
Services Industries, Technologies, and Applications in the Global Economy, IGI Global:
Hershey, PA, 243 - 253.
Zheng Y. and Zhou X. (editors, 2011), Computing with Spatial Trajectories, Springer: Berlin.
Zipf G. K. (1949), Human Behavior and the Principle of Least Effort, Addison-Wesley: Cambridge,
MA.
Appendix: Tutorial on How to Derive Natural Cities based on ArcGIS
This tutorial aims to show, in a step by step fashion with ArcGIS, how to derive natural cities using the
first month data (2008-04) as an example. Once you have got the check-in data of the first month,
transfer them into an Excel sheet with two columns namely x, y, respectively representing longitude
and latitude. Add a third column z, and set all column values as one (or any arbitrary value since
ArcGIS relies on 3D points for creating a TIN). Insert the Excel sheet as a shape file data layer in
ArcGIS (Figure A1(a)). Create a TIN from the point layer, using ArcToolbox > 3D analyst tools > TIN
management > Create TIN (Figure A1(b)).
Figure A1: Screen snapshots including (a) the 1199 unique points, (b) the TIN from the 1199 points, (c)
the selected edges shorter than the mean 54042.8, and (d) the 44 natural cities created
Convert the TIN into TIN edge, using ArcToolbox > 3D analyst tools > Conversion > From TIN > TIN
Edge. The converted TIN edge is a polyline layer. Right click the polyline layer to Open Attribute
Table in order to get statistics about the length of the edges. Figure A2 shows that the frequency
distribution is apparently L-shaped, indicating that there are far more short edges than long ones.
Note that the mean is 54042.8.
Figure A2: Statistics of TIN edges
Select the edges shorter than this mean of 54042.8, following the menu Selection > Select by Attributes... .
The selected edges are highlighted in Figure A1(c). The selected shorter edges refer to high density
locations. Dissolve all the shorter edges into polygons to be individual natural cities (Figure A1(d)),
following ArcToolbox > Data Management Tools > Generalization > Dissolve, or alternatively
following menu Geoprocessing > Dissolve (the option ‘Create multipart features’ should be
unchecked).
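For readers without ArcGIS, the select-and-group logic of the steps above can be approximated in plain Python. The sketch below rests on several assumptions: the TIN edges are already available as pairs of point indices, the coordinates are planar (projected), and connected short edges are grouped with a union-find structure. That grouping is only a simplified stand-in for the Dissolve step, since it returns clusters of point indices rather than polygons. The function name short_edge_clusters and the toy data are hypothetical.

```python
from math import hypot


def short_edge_clusters(points, edges):
    """Group points joined by edges shorter than the mean edge length.

    points: list of (x, y) planar coordinates (assumed projected).
    edges:  list of (i, j) index pairs, e.g., exported from a TIN.
    Returns the mean cutoff and a sorted list of clusters of point indices.
    """
    lengths = [hypot(points[a][0] - points[b][0], points[a][1] - points[b][1])
               for a, b in edges]
    cutoff = sum(lengths) / len(lengths)  # head/tail division rule

    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    touched = set()
    for (a, b), length in zip(edges, lengths):
        if length < cutoff:            # keep only the shorter edges
            parent[find(a)] = find(b)  # merge the two groups
            touched.update((a, b))

    clusters = {}
    for i in sorted(touched):
        clusters.setdefault(find(i), []).append(i)
    return cutoff, sorted(clusters.values())


# Toy data: two tight groups of points far apart, with hand-written edges.
points = [(0, 0), (1, 0), (0, 1), (100, 0), (101, 0), (100, 1)]
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (1, 3), (2, 3)]

cutoff, clusters = short_edge_clusters(points, edges)
print(clusters)  # [[0, 1, 2], [3, 4, 5]]
```

In this toy case the two long edges linking the groups exceed the mean and are discarded, so each tight group of points emerges as its own cluster.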
The above steps all can be done with the existing ArcGIS functions, since the first month data is small
enough. However, we cannot fulfill all the processes simply using existing ArcGIS functions for some
later months. This is because more check-in points are added cumulatively. In this case, we must split
the data into small pieces and put them back again into ArcGIS with some simple code. All the
related code, together with the natural cities data, is archived at
https://sites.google.com/site/naturalcitiesdata/.

Supplementary resource (1)

... The NC is based on fractal geometry, a complex structure and mathematical set that can be reflected at multiple scales [29]. The NC involves geographical events that are concentrated in space, such as patches formed by the aggregated location information of social media users [30]. Therefore, the NC can provide a new perspective for geospatial data analysis and help understand the forming and processing of geographic events [31]. ...
... The derivation of NCs is the basis for delineating BUAs. On the one hand, the data used to derive NCs should follow a certain distribution condition [30]. On the other hand, the basis of deriving NCs is a segmentation method [30]. ...
... On the one hand, the data used to derive NCs should follow a certain distribution condition [30]. On the other hand, the basis of deriving NCs is a segmentation method [30]. More specifically, when the value of a variable X follows the heavy-tailed distribution (the statistical distribution of the right deviation), a segmentation method based on the head/tail division rule can divide these values into two parts [28]. ...
Article
Full-text available
Understanding and quantifying urban expansion is critical to urban management and urban planning. The accurate delineation of built-up areas (BUAs) is the foundation for quantifying urban expansion. To quantify urban expansion simply and efficiently, we proposed a method for delineating BUAs using geographic data, taking Guangzhou as the study area. First, Guangzhou’s natural cities (NCs) in 2014 and 2020 were derived from the point of interest (POI) data. Second, multiple grid maps were combined with NCs to delineate BUAs. Third, the optimal grid map for delineating BUA was determined based on the real BUA data and applying accuracy evaluation indexes. Finally, by comparing the 2014 and 2020 BUAs delineated by the optimal grid maps, we quantified the urban expansion occurring in Guangzhou. The results demonstrated the following. (1) The accuracy score of the BUAs delineated by the 200 m × 200 m grid map reaches a maximum. (2) The BUAs in the central urban area of Guangzhou had a smaller area of expansion, while the northern and southern areas of Guangzhou experienced considerable urban expansion. (3) The BUA expansion was smaller in all spatial orientations in the developed district, while the BUA expansion was larger in all spatial orientations in the developing district. This study provides a new method for delineating BUAs and a new perspective for mapping the spatial distribution of urban BUAs, which helps to better understand and quantify urban expansion.
... Tendo subjacente a Teoria de Redes de Dredge (1999) e de Baggio & Cooper (2010), dentro de uma área urbana um conjunto de locais turísticos interligados podem influenciar a construção geográfica da cidade como destino e assemelhar-se a uma cidade turística, embora a própria visão de cidade (como espaço turístico) deva ser, na nossa opinião, objeto de reflexão ao considerar-se as áreas turísticas como locais agregados de origem e/ou destino. Neste contexto, releva-se o uso da expressão "cidades naturais", que mais não são do que áreas onde se usam informações individuais e que de forma integrada determinam uma representação inteligível do espaço geográfico (Jiang & Liu, 2012;Jiang & Miao, 2015). As "cidades naturais" assumem-se como um bom proxy para a realidade das cidades (e.g., cidades estatísticas) e incluem novos insights sobre a evolução dos sistemas urbanos Jiang & Miao, 2015;Long et al., 2018). ...
... Neste contexto, releva-se o uso da expressão "cidades naturais", que mais não são do que áreas onde se usam informações individuais e que de forma integrada determinam uma representação inteligível do espaço geográfico (Jiang & Liu, 2012;Jiang & Miao, 2015). As "cidades naturais" assumem-se como um bom proxy para a realidade das cidades (e.g., cidades estatísticas) e incluem novos insights sobre a evolução dos sistemas urbanos Jiang & Miao, 2015;Long et al., 2018). ...
Thesis
Full-text available
In recent years, those who create strategies and policies for urban tourist destinations have been increasingly concerned with the greater or lesser capacity to enjoy public space. Furthermore, the growth of urban areas on a global scale has caused significant changes in the (micro)climate, due to the increase in impermeable surfaces, the anthropogenic heat generated by human activities and the change in air circulation. Taking into account the increasing demands of tourists and residents and the need to improve cities in the face of climate change, the option is to design new measures and action solutions. However, the lack of quality of the input data or their (total) absence, as well as their low spatial resolution, are common. The inadequacy of structures for sharing information is also noted, which significantly limits planning and adaptation actions. This investigation aims to identify the main methods of analysis to monitor the current ability to enjoy tourism based on the integration of objective and subjective domains; and contribute to the definition of action plans which seek to mitigate and adapt the tourism sector to climate change, in the medium and long-term. To assess the validity of these assumptions, the Porto Metropolitan Area, in general, and the municipality of Porto, in particular, were used as case studies. In this investigation, different methods of information and units of analysis were combined, based on a meso approach and local scale for: (i) the identification of critical areas, in an office analysis based essentially on Big Data (i.e., Flickr photographs, AirBnB accommodation and MODIS and LANDSAT satellite imagery); (ii) the assessment of the comfort level for enjoyment in critical areas with high tourist potential through field data collection; and (iii) the identification of prioritization actions and measures to maintain tourism attractiveness in view of climate change, in the medium and long-term. 
This research highlights the need for more detailed information, the weak interaction between stakeholders and the limitation of resources. Thus, considering that Porto is a destination with a good climate for tourism, and committed to mitigating the effects of climate change, the proposed methodological triangulation allows to outline some measures with predictable action in the short, medium and long-term. Finally, this study aims to make some contributions at national and international level, with the likelihood of the methodological approach adopted to be replicated in other geographical areas, taking into account the particularities of each territory under analysis.
... However, many scholars have recently studied how to alleviate the MAUP. Jiang and Miao focused on the hierarchical agglomeration and heterogeneity of social media data to determine the corresponding urban structure, thus mitigating the statistical bias of the MAUP [7]. However, they did not evaluate the effect of the MAUP or provide a strategy for optimizing the selection of spatial analysis units. ...
... The traditional spatiotemporal analysis unit is subjectively established according to the analyst's point of view and application field [46][47][48]. There are often no criteria or reasons to ensure the representativeness of the analytical units, which may give rise to MAUP effects [7,8]. This paper argues that a multicriteria-based data-driven approach can be used to provide analysis units that fit the real world. ...
Article
Full-text available
Spatiotemporal scale is a basic component of geographical problems because the size of spatiotemporal units may have a significant impact on the aggregation of spatial data and the corresponding analysis results. However, there is no clear standard for measuring the representativeness of conclusions when geographical data with different temporal and spatial units are used in geographical calculations. Therefore, a spatiotemporal analysis unit optimization framework is proposed to evaluate candidate analysis units using the distribution patterns of spatiotemporal data. The framework relies on Pareto optimality to select the spatiotemporal analysis unit, thereby overcoming the subjectivity and randomness of traditional unit setting methods and mitigating the influence of the modifiable areal unit problem (MAUP) to a certain extent. The framework is used to analyze floating car trajectory data, and the spatiotemporal analysis unit is optimized by using a combination of global spatial autocorrelation coefficients and the coefficients of variation of local spatial autocorrelation. Moreover, based on urban hotspot calculations, the effectiveness of the framework is further verified. The proposed optimization framework for spatiotemporal analysis units based on multiple criteria can provide suitable spatiotemporal analysis scales for studies of geographical phenomena.
... Jiang and Miao [73] define "natural cities" as human settlements or human activities in general on Earth's surface that are naturally or objectively defined and delineated from massive geographic information based on head/tail division rule, a non-recursive form of head/tail breaks [74]. To that end, a massive collection of geo-referenced tweets (as available in TBCOV) can be used to delineate natural cities using tweet densities as a proxy to population densities, and eventually, lead to more meaningful delineation of city centers and borders rather than arbitrarily defined administrative units [75]. ...
... Furthermore, longitudinal analysis of the geo-referenced tweets can help track the evolution of natural cities in terms of changes in the spatial distribution and density of the COVID-19-related chatter across time, and provide new insights into the underlying structure and dynamics of the natural cities occurred during the COVID-19 pandemic. For this purpose, we analyzed the 1,674,265 tweets with accurate geo-coordinates collected across the mainland US and investigated evolution of the natural cities during different phases of the pandemic following the methodology introduced by Jiang and Miao [73]. Figure 14 shows the results of mapping natural cities at four different time intervals. ...
Article
Full-text available
As the world struggles with several compounded challenges caused by the COVID-19 pandemic in the health, economic, and social domains, timely access to disaggregated national and sub-national data are important to understand the emergent situation but it is difficult to obtain. The widespread usage of social networking sites, especially during mass convergence events, such as health emergencies, provides instant access to citizen-generated data offering rich information about public opinions, sentiments, and situational updates useful for authorities to gain insights. We offer a large-scale social sensing dataset comprising two billion multilingual tweets posted from 218 countries by 87 million users in 67 languages. We used state-of-the-art machine learning models to enrich the data with sentiment labels and named-entities. Additionally, a gender identification approach is proposed to segregate user gender. Furthermore, a geolocalization approach is devised to geotag tweets at country, state, county, and city granularities, enabling a myriad of data analysis tasks to understand real-world issues at national and sub-national levels. We believe this multilingual data with broader geographical and longer temporal coverage will be a cornerstone for researchers to study impacts of the ongoing global health catastrophe and to manage adverse consequences related to people’s health, livelihood, and social well-being.
... Research questions that can be answered with geospatial analysis are multidisciplinary in (Hyvärinen and Saltikoff, 2010), studying structure, dynamics, and rhythms of natural cities (Jiang and Miao, 2015;Morales et al., 2017), making observations about street networks in cities (Boeing, 2017), tracking infectious diseases (Padmanabhan et al., 2013), managing crisis situations (MacEachren et al., 2011a), capturing human movement patterns across political borders (Blanford et al., 2015), discovering significant events and patterns (Andrienko et al., 2010), understanding protest movements (Gleason, 2013), finding geographic patterns (Conover et al., 2013) and correlations in communication networks and languages (Mocanu et al., 2013), fine-tuning communication or marketing strategies (Bhattacharya et al., 2019), and answering many other questions related to human movements, dynamics, and communication. Researchers use maps to 1) report their findings, 2) verify whether social media is more reliable than other techniques for finding statistical relationships, 3) discover new patterns and insights about phenomena, 4) generate hypotheses about phenomena, and 5) understand laws that make generalizations about movements. ...
Chapter
This chapter presents major issues with retrieving, sampling, geocoding and analyzing geospatial and temporal patterns in social media data. The chapter takes an interdisciplinary approach that includes perspectives from different knowledge domains, including information science, geographic information science, geovisualization, information visualization, visual analytics, complex systems, and data science, presenting rich illustrative examples and case studies. It also discusses the benefits and shortcomings of geospatial methods, gives numerous suggestions on how to: collect geospatial data, avoid biases, aggregate data for protecting the privacy of social media contributors during the investigation, and what research questions to ask about people's locations in space or social phenomena. We complete with an overview of the advantages geospatial methods add to the analysis of social media. We carry readers to a conclusion that such techniques allow researchers to perceive the behaviors of social media contributors from a different perspective and discover static and dynamic patterns of users' spatial collective behaviors that are hard to detect to the unaided senses.
... Instead of administrative boundaries, we utilize natural cities as a benchmark to depict the precise distribution of such spatial cities at a global scale. Natural city is a product of the bottom-up thinking in terms of data collection and geographic units or boundaries proposed by Bin Jiang (Jiang and Jia, 2011;Jiang and Miao, 2015). To accomplish this, we first evaluated historical urban shrinkage to establish that the proportion of identified global shrinking cities increased from 9% to 16 and 25% during 199216 and 25% during -200016 and 25% during , 200016 and 25% during -201216 and 25% during , and 201316 and 25% during -201816 and 25% during , respectively. ...
Article
Full-text available
Shrinking cities are often neglected in the context of global urbanization, the tip of the iceberg which was driven by underlying complex sets of causes. It is therefore urgent and crucial to investigate the invisible aspects of global urbanization propelling specific challenges to attain Sustainable Development Goal 11 (SDG 11) related to sustainable cities and communities. Here we identify shrinking cities in 1992–2000, 2000–2012 and 2013–2018, and predict them in 2018–2050, using night-time light images and redefined natural city boundaries. The proportion of shrinking cities increased from 9% to 16% and 25%. Looking ahead, there will be 7,166 predicted shrinking cities in 2050, accounting for 37% of all cities. In this context, synergistic efforts like regreening vacant lands and constructing compact cities would help achieve SDG 11 in consideration of the new urban shrinking landscape with multi-source data like CO2 emissions and points of interests (POIs).
... This method is closely related to the mean value of the data set and is based on the head/tail division rule. The head/ tail division rule is used in variables that follow the heavy-tailed distribution representing the statistical distribution of the right deviation (e.g. the power law and exponential distributions) (Jiang and Miao 2015). More specifically, for a variable X that follows the heavy-tailed distribution, its mean value can divide all X values into two parts. ...
Article
Studying the structure of polycentric cities can promote a better understanding of urban development and contribute to urban planning. In this study, we identified polycentric cities in China and evaluated the urban centre development level of polycentric cities from new data and method. We used Luojia-1A night-time light (NTL) data, combined with the concept of natural cities (NCs), to identify urban centres and thus identify polycentric cities in China. In addition, we used the urban centre development index (UCDI) to quantify the urban centre development level (UCDL) that represents the overall urban centre development level within a polycentric city. The polycentric cities in China are characterized by the spatial distribution pattern of a larger number in the east and fewer in the west. There are a large number of polycentric cities in eastern China, and the closer to the coastal areas, the more polycentric cities there are. The distribution of UCDL in China’s polycentric cities is characterized by significant spatial heterogeneity. UCDLs are generally smaller in polycentric cities in western China. In addition, polycentric cities in northeastern China have smaller UCDL. Polycentric cities with high UCDL are concentrated in the central and coastal regions of China.
Article
Cities with integrated university campuses can become dependent on their student population to function properly. Restrictions caused by the COVID-19 pandemic put a temporary halt to the presence of the student population in some cities. The current study explores the effect of this short-term paradigm shift on the relationship between three higher education institutes and their host cities in the northern part of Cyprus. The analysis uses the spatial distribution of Twitter feeds in the academic semester before the pandemic as the baseline and compares it with the following semesters, when teaching was mostly delivered via remote online platforms. The findings indicate a rapid decline in diversity and granulation of urban activities among students during the pandemic. This, in turn, is shown to impact the commercial zones of the host cities, shifting many leisure activities farther from the city. Furthermore, the degree of spatial integration between the urban fabrics and the campuses is shown to be influential in shaping the emerging equilibrium when facing a crisis that restricts mobility.
Article
The pervasive adoption of GPS-enabled sensors has led to an explosion in the amount of geolocated data that captures a wide range of social interactions. Part of this data can be conceptualized as event data, characterized by a single point signal at a given location and time. Event data has been used for several purposes such as anomaly detection and land use extraction, among others. To unlock the potential offered by the granularity of these new sources of data, it is necessary to develop new analytical tools stemming from the intersection of computational science and geographical analysis. Our approach is to link the geographical concept of hierarchical scale structures with density-based clustering in databases with noise to establish a common framework for the detection of hierarchical structures of crowd activity in geographic point data. Our contribution is threefold: first, we develop a tool to generate synthetic data according to a distribution commonly found in geographic event data sets; second, we propose an improvement of the available methods for automatic parameter selection in the density-based spatial clustering of applications with noise (DBSCAN) algorithm that allows its iterative application to uncover hierarchical scale structures in event databases; and, lastly, we propose a framework for the evaluation of different algorithms to extract hierarchical scale structures. Our results show that our approach is successful both as a general framework for the comparison of crowd activity detection algorithms and, in the case of our automatic DBSCAN parameter selection algorithm, as a novel approach to uncover hierarchical structures in geographic point data sets.
Chapter
Noting the growing interest in smart cities, this chapter considers the linkages between urbanization and sustainable development to evaluate the concept of sustainable urban development, along with a brief appraisal of prevalent notions such as livable cities and eco-cities and the related components that make city life worth living. Thereafter, the study examines the prospects of sustainable smart cities, with a specific focus on constituents such as smart mobility, smart economy, smart living, smart people, smart governance and smart environment. While assessing the options available for cities to tackle the vagaries of climate change, the chapter presents a case for ecosystem-based adaptation as a cost-effective, viable and durable option for dealing with the adverse impacts of climate change. Lastly, it suggests implementing the New Urban Agenda of UN-Habitat in tandem with Sustainable Development Goal 11 as a way forward.
Article
We are now seeing governments and funding agencies looking at ways to increase the value and pace of scientific research through increased or open access to both data and publications. In this point of view article, we wish to look at another aspect of these twin revolutions, namely, how to enable developers, designers and researchers to build intuitive, multimodal, user-centric, scientific applications that can aid and enable scientific research.
Article
Geospatial analysis is very much dominated by a Gaussian way of thinking, which assumes that things in the world can be characterized by a well-defined mean, i.e., things are more or less similar in size. However, this assumption is not always valid. In fact, many things in the world lack a well-defined mean, and therefore there are far more small things than large ones. This paper attempts to argue that geospatial analysis requires a different way of thinking - a Paretian way of thinking that underlies skewed distributions such as power laws, Pareto and lognormal distributions. I review two properties of spatial dependence and spatial heterogeneity, and point out that the notion of spatial heterogeneity in current spatial statistics is only used to characterize local variance of spatial dependence. I subsequently argue for a broad perspective on spatial heterogeneity, and suggest it be formulated as a scaling law. I further discuss the implications of Paretian thinking and the scaling law for better understanding of geographic forms and processes, in particular while facing massive amounts of social media data. In the spirit of Paretian thinking, geospatial analysis should seek to simulate geographic events and phenomena from the bottom up rather than seek correlations as guided by Gaussian thinking. Keywords: Big data, scaling of geographic space, head/tail breaks, power laws, heavy-tailed distributions
Book
Spatial trajectories have brought unprecedented wealth to a variety of research communities. A spatial trajectory records the path of a moving object, such as a person who logs travel routes with GPS. Research on moving objects has become extremely active within the last few years, especially at all major database and data mining conferences and journals. Computing with Spatial Trajectories introduces the algorithms, technologies, and systems used to process, manage and understand existing spatial trajectories for different applications. This book also presents an overview of both the fundamentals and the state-of-the-art research inspired by spatial trajectory data, as well as a special focus on trajectory pattern mining, spatio-temporal data mining and location-based social networks. Each chapter provides readers with a tutorial-style introduction to one important aspect of location trajectory computing, case studies and many valuable references to other relevant research work. Computing with Spatial Trajectories is designed as a reference or secondary textbook for advanced-level students and researchers mainly focused on computer science and geography. Professionals working on spatial trajectory computing will also find this book very useful.
Chapter
The ability to gather and manipulate real-world contextual data, such as user location, in modern software systems presents opportunities for new and exciting application areas. A key focus among those working in the area of location-based services today has been the creation of social networks which allow mobile device users to exchange details of their personal location as a key point of interaction. While the initial interest in these services has been exceptionally high, they are plagued by the same challenges as all location-based services regarding the privacy and security of users and their data. This chapter aims to investigate the area of Location-Based Social Networks (LBSNs), with a view to documenting how they contribute to a new form of expertise due to the now accurate knowledge of where people are actually located at a moment in time.
Conference Paper
This presentation will set out the eScience agenda by explaining the current scientific data deluge and the case for a “Fourth Paradigm” for scientific exploration. Examples of data intensive science will be used to illustrate the explosion of data and the associated new challenges for data capture, curation, analysis, and sharing. The role of cloud computing, collaboration services, and research repositories will be discussed.
Article
History tells us that when you want something done you turn to a leader: right? Wrong. If you want to make a correct decision or solve a problem, large groups of people are smarter than a few experts. This brilliant and insightful book shows why the conventional wisdom is so wrong and why the theory of the wisdom of crowds has huge implications for how we run our businesses, structure our political systems and organise our society. Shrewd, meticulous and profound, The Wisdom of Crowds will change for ever the way you think about human behaviour.
Article
Four phases of interest in the distribution of city sizes are identified and current conflict in the literature is shown to be a consequence of poorly-selected units of observation. When urban regions are properly defined, US urban growth obeys Gibrat’s Law and the city size distribution is strictly Zipfian rank-size with coefficient q = 1.0. Care has to be taken with definition of the largest urban-economic regions, however; the fit in the upper tail of the distribution is best when they are recognized to be megalopolitan in scale.
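As a rough illustration of the rank-size relationship mentioned in this abstract, the sketch below estimates the coefficient q by ordinary least squares on log(size) against log(rank), assuming the form size ≈ C / rank^q. The function name `rank_size_exponent` and the synthetic city sizes are hypothetical; this is not the estimation procedure used in the cited work.

```python
import math


def rank_size_exponent(sizes):
    """Estimate q in size ~ C / rank**q by least squares on log-log values.

    Assumption: all sizes are positive and there are at least two of them.
    """
    ranked = sorted(sizes, reverse=True)  # rank 1 is the largest city
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(size) for size in ranked]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return -slope  # the fitted slope is negative; q is its magnitude


if __name__ == "__main__":
    # Synthetic sizes that follow an exact rank-size rule with q = 1.
    synthetic = [1_000_000 / rank for rank in range(1, 101)]
    print("estimated q:", rank_size_exponent(synthetic))
```

A simple log-log regression like this is sensitive to how the units of observation are defined, which is the point the abstract makes about properly delimited urban regions.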