
The Evolution of Natural Cities from the Perspective of Location-Based Social Media

Bin Jiang and Yufan Miao
Department of Technology and Built Environment, Division of Geomatics
University of Gävle, SE-801 76 Gävle, Sweden
Email: bin.jiang@hig.se, yufanmiao@gmail.com
(Draft: August 2013, Revision: September 2013, January 2014)
Abstract
This paper examines the former location-based social medium Brightkite, over its three-year life span,
based on the concept of natural cities. The term ‘natural cities’ refers to spatially clustered geographic
events, such as the agglomerated patches aggregated from individual social media users’ locations. We
applied the head/tail division rule to derive natural cities. More specifically, we generated a
triangulated irregular network, made up of individual unique user locations, and then categorized
small triangles (smaller than the average size) as natural cities for the mainland United States on a
monthly basis. The concept of natural cities provides a powerful means of developing new insights into
the evolution of real cities, because there are virtually no data available to track the history of a city
across its entire life span at very fine spatial and temporal scales. Natural cities can therefore act
as a good proxy for real cities for understanding underlying interactions at a global level, rather than
for predicting individual cities. Apart from the data produced and the methods contributed, we
established new insights into the structure and dynamics of natural cities, e.g., that natural cities
evolve nonlinearly in both the spatial and temporal dimensions.
Keywords: Big data, head/tail breaks, ht-index, power laws, fractal, and nonlinearity
1. Introduction
Once upon a time, there were no cities, only scattered villages. Over time, cities gradually emerged
through the interaction of people or residents; similarly, large or mega cities evolve through the
interaction of cities or people. This is a conjecture mentioned in Jiang (2013b), in which he argued that
geographic phenomena such as urban growth are essentially unpredictable. Many models in the
literature that claim to predict urban growth are in effect suited only to short-term prediction, like the
weather forecast; weather beyond five days is essentially unforecastable (Bak 1996). A typical
city may have hundreds of years of history, making it nearly impossible to track its growth
quantitatively because of a lack of related data. More important, a city grows within a system of cities;
one cannot understand a city’s growth without considering other related cities. In this paper, we
illustrate that emerging social media provide an unprecedented data source for studying the evolution
of natural cities (cf. Section 2 for the definition), and subsequently for better understanding the structure
and dynamics of real cities. Location-based social media, sometimes termed location-based social
networks, such as Flickr, Twitter, and Foursquare (Traynor and Curran 2012, Zheng and Zhou 2011),
are Internet-based applications founded on Web 2.0 technologies and ideologies that allow
users to create and exchange user-generated content. Location-based social media can act as a proxy for
real cities (or human settlements in general) and provide a better understanding of the underlying structure
and dynamics of human settlements.
Not long ago, there were no social media, only scattered home pages and bulletin board systems
created and maintained by individuals and institutions (Boyd and Ellison 2008, Kaplan and Haenlein
2010). In the era of Web 1.0, geographic locations were not an issue. With Web 2.0, however,
geographic location has become an important feature of social media. Almost all social
media allow users to tag their geographic locations, often to within meters, when sharing and
exchanging user-generated content. Location-based social media enable users to track their own
historical trajectories, their friends, and even the growth of the social media themselves. Unlike the
histories of conventional cities, the trajectories of social media are well documented by the hosting
companies; and unlike conventional census data, social media data are defined at the individual level,
often at very fine spatial and temporal scales. Data can be obtained using crawling techniques or through
the social media's officially released application programming interfaces (APIs). This study aimed to show
how social media's time-stamped location data can be used to study the evolution of natural cities, and
thus to provide new insights into the underlying structure and dynamics of real cities.
The contribution of this paper lies in three aspects: data, methods, and new insights.
This study produced a large amount of data on natural cities from the former social medium
Brightkite during its entire 31-month life span. The resulting data have significant value for further
study of city growth and of the allometric relationship between populations and physical extents (data, as
well as related source code, from the study will be released upon acceptance of this paper). We drew
upon a set of fractal- or scaling-oriented methods to characterize natural cities. These methods
help create new insights into the evolution of natural cities as well as that of real cities. For example,
natural cities demonstrate a striking nonlinear property, spatially and temporally (see Section 5).
Moreover, the evolution of natural cities can provide better understanding of social media from a
unique geospatial perspective.
This study provides new perspectives, as well as different ways of thinking, to the study of cities and
city growth in the era of big data (Mayer-Schonberger and Cukier 2013). We did not adopt
conventional census data, but rather the emerging georeferenced social media data; we did not adopt
conventional geographic units or boundaries that are imposed from the top down by authorities, but
rather the naturally defined concept of natural cities, to avoid the statistical bias arising from the
modifiable areal unit problem (Openshaw 1984); and we did not rely on standard and spatial statistics
with a well-defined mean to characterize spatial heterogeneity, but rather on power-law-based statistics,
driven by fractal and scaling thinking. The underlying ways of thinking adopted in this study are thus
bottom up rather than top down in terms of data and methods, nonlinear rather than linear, and fractal
rather than Euclidean in terms of the power-law statistics. In this sense, this study argues that
geospatial analysis requires a different way of thinking when dealing with the problem of spatial
heterogeneity.
The remainder of this paper is structured as follows. Section 2 presents the methods, in which we
define the concept of natural cities and discuss ways of characterizing them. Section 3
presents the data on a monthly basis and shows basic statistics of the data. Section 4 discusses the
results and major findings, and Section 5 the implications of the study. Finally, Section 6 draws a
conclusion and points to future work.
2. Methods
In this section, we illustrate and define the concept of natural cities and present various ways of
characterizing natural cities. We also discuss how natural cities differ from conventional cities and
why they represent a new way of thinking for geospatial analysis.
2.1 Defining natural cities
To approach the difficult task of defining and describing natural cities, we start with definitions of
conventional cities and try to clarify why the conventional definitions are not natural. A city is a
relatively large and permanent human settlement. But how large a settlement must be to qualify as a
city is unclear. For example, a city in Sweden may not qualify as a city in China. Also, many cities
have a particular administrative, legal, and historical status according to their local laws. In the United
States, for example, cities can refer to incorporated places, urban areas, or metropolitan areas with a
sufficient population of, say, at least 10,000. Such a population threshold is largely subjective and
varies from country to country. The same subjectivity applies to the physical boundaries of cities,
which are legally and administratively determined. Remotely sensed imagery provides new means to
delineate city boundaries, but how does one choose an appropriate pixel value as a cutoff for the
delineation? Because of these subjectivities, conventional definitions of cities are unnatural. How, then,
can we define a city in more natural ways?
We present three examples of natural cities before formally defining the concept. In the first example,
natural cities are derived from massive street nodes, including both junctions and street ends. Given all
street nodes of an entire country, we can run an iterative clustering algorithm to determine whether a
node is within the neighborhood of another node. For example, set a radius of 700 meters and repeatedly
draw a circle around each node to determine whether any other node falls within the circle. This
progressive and exhaustive process results in many natural cities; see Figure 1a for an illustrative
example. Jiang and Jia (2011) found that millions of natural cities could be derived
from dozens of millions of street nodes in the United States using OpenStreetMap (OSM) data
(Bennett 2010). Instead of massive street nodes, the second example relies on a massive number of
street blocks to extract natural cities. Jiang and Liu (2012) adopted the three largest European
countries, France, Germany, and the UK, for their case studies, again using OSM data. The idea is
illustrated in Figure 1b, in which small blocks (smaller than an average city block) constitute a natural
city. Although this method sounds very simple, the computation is very intensive, since each country
involves millions of street blocks. The third example comes from Jiang and Yin (2014), in which the
authors relied on nighttime imagery to derive natural cities. The authors took all pixel values (millions
of pixels, each valued between 0 and 63) of an image covering the United States and computed their
mean. The mean split all the pixels into two groups: those above the mean and those below it.
For the pixels above the mean, a second mean was computed, and it can serve as a meaningful cutoff for
delineating natural cities.
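The clustering step of the first example (Figure 1a) can be sketched as a union-find over pairs of street nodes that lie within a fixed radius of each other. This is a minimal sketch, not the implementation used by Jiang and Jia (2011): it assumes projected coordinates in meters and the 700 m radius mentioned above, the function name `cluster_street_nodes` is hypothetical, and the pairwise loop is O(n^2), so it only suits small inputs.

```python
import math


def cluster_street_nodes(nodes, radius=700.0):
    """Group street nodes so that any two nodes within `radius` end up in the same cluster.

    nodes  -- list of (x, y) tuples in a projected, metric coordinate system (assumed)
    radius -- neighborhood radius in meters (700 m as in the text)
    Returns clusters as lists of node indices, largest first.
    """
    parent = list(range(len(nodes)))

    def find(i):
        # Follow parent links to the cluster root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge every pair of nodes lying within the radius of each other (O(n^2) pairs).
    for i in range(len(nodes)):
        for j in range(i + 1, len(nodes)):
            if math.dist(nodes[i], nodes[j]) <= radius:
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(nodes)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values(), key=len, reverse=True)


# Three nodes chained within 700 m form one natural city; the distant node stays alone.
print(cluster_street_nodes([(0, 0), (500, 0), (1100, 0), (5000, 0)]))  # [[0, 1, 2], [3]]
```

Because clusters are merged transitively, nodes 0 and 2 end up together even though they are 1,100 m apart. A country-scale run over millions of nodes would need a spatial index (e.g., a grid or k-d tree) instead of the pairwise loop.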
Figure 1: (Color online) Natural cities based on (a) street nodes and (b) street blocks
(Note: Blue rectangles are the boundaries of the natural cities, which are composed of high-density
nodes or small street blocks based on the head/tail division rule (Jia and Jiang 2010).)
These examples of deriving natural cities point to the importance of the mean, which underlies
the head/tail division rule: given a variable X whose values x follow a heavy-tailed distribution,
the mean (m) of the values divides all the values into two parts: a high percentage in the tail and a
low percentage in the head (Jiang and Liu 2012). Heavy-tailed distributions here refer to statistical
distributions that are right-skewed, for example, power law, lognormal, and exponential distributions.
Clearly, the density of street nodes, the size of street blocks, and the nighttime imagery pixel values all
exhibit a heavy-tailed distribution, which implies that there are far more small things than large ones.
In this paper, we introduce an additional way of deriving natural cities: from individual users'
geographical data of location-based social media. From the unique locations of users who check in
across an entire country, we can build a huge triangulated irregular network (TIN), and then categorize
the small triangles (smaller than the mean) as natural cities (Figure 2); refer to the Appendix for a short
tutorial. Section 5 includes a discussion of why the head/tail division rule works so well in delineating
natural cities.
Figure 2: (Color online) Procedure of generating natural cities (red patches) from points through TIN
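The TIN procedure of Figure 2 can be approximated outside a GIS as follows. This is a sketch under stated assumptions, not the authors' workflow: the triangles are supplied as vertex-index triples from a Delaunay triangulation computed elsewhere; a triangle counts as "small" when all three of its edges are shorter than the mean TIN edge length (our reading of the head/tail division rule, since Table 3 reports the mean edge length as the cutoff); and small triangles sharing a vertex are merged into one natural city. The exact ArcGIS steps are in the Appendix and may differ in detail. The function name is hypothetical.

```python
import math


def natural_cities_from_tin(points, triangles):
    """Merge small TIN triangles into natural cities.

    points    -- list of (x, y) unique user locations
    triangles -- list of (i, j, k) vertex-index triples, e.g. from a Delaunay triangulation
    Returns natural cities as sorted lists of point indices (their 'population'), largest first.
    """
    def length(a, b):
        return math.dist(points[a], points[b])

    # Count each undirected TIN edge once and take the mean length as the head/tail cutoff.
    edges = {tuple(sorted(e)) for i, j, k in triangles for e in ((i, j), (j, k), (k, i))}
    mean = sum(length(a, b) for a, b in edges) / len(edges)

    parent = list(range(len(points)))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    members = set()
    for i, j, k in triangles:
        # A triangle is 'small' (in the tail) if even its longest edge is below the mean.
        if max(length(i, j), length(j, k), length(k, i)) < mean:
            members.update((i, j, k))
            parent[find(i)] = find(j)
            parent[find(j)] = find(k)

    cities = {}
    for v in sorted(members):
        cities.setdefault(find(v), []).append(v)
    return sorted(cities.values(), key=len, reverse=True)


# Two tight clusters joined by long, thin triangles yield two natural cities.
pts = [(0, 0), (1, 0), (0, 1), (10, 0), (11, 0), (10, 1)]
tris = [(0, 1, 2), (3, 4, 5), (1, 3, 5), (1, 5, 2)]
print(natural_cities_from_tin(pts, tris))  # [[0, 1, 2], [3, 4, 5]]
```

The length of each returned list is the city's population in points, which is how the red dots in the later figures can be read.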
Based on these examples, a formal definition of natural cities can be derived. Natural cities refer to
human settlements or human activities in general on Earth’s surface that are objectively or naturally
defined and delineated from massive geographic information of various kinds, and based on the
head/tail division rule. Unlike conventional cities, natural cities do not need to meet a minimum
population requirement. A one-person settlement may constitute a natural city, or even zero people, if
natural cities are defined not according to human population, but something else. For example, when
natural cities are defined according to street nodes, a natural city derived from one street node may
have no people there at all. The reader may question whether such a definition makes sense; we argue
that it does, because it provides a new perspective for geospatial analysis and helps
us develop new insights into geographic forms and processes (see Sections 4 and 5). That is also the
reason that we use the term natural cities to refer to human settlements or human activities in general
on the Earth’s surface. With the concept of natural cities, we abandon the top-down imposed unnatural
geographic units or boundaries such as states, counties, and cities, in order to study geographic forms
and processes more scientifically.
2.2 Characterizing natural cities
The rank-size distribution of cities in a region can be well characterized by Zipf's law, i.e., an inverse
power relationship between city rank (r) and city size (N), N ∝ r^-1 (Zipf 1949). Simply put, when
all cities in a given country are ranked in decreasing order of size, the largest city is twice as big as the
second largest, three times as big as the third largest, and so on. In other words, a city's population is
inversely proportional to its rank. This simple and neat law has been found to hold remarkably well for
almost all countries or regions (e.g., Berry and Okulicz-Kozaryn 2011), although some researchers
have challenged its universality (e.g., Benguigui and Blumenfeld-Leiberthal 2011). Essentially, Zipf's
law has two aspects: (1) a power-law relationship between rank and size, and (2) a Zipf's
exponent of one. Most previous studies have confirmed the first aspect but not the second, finding the
Zipf's exponent to deviate from one. In other words, the first aspect is much less controversial
than the second. Some researchers argued that Zipf's law primarily characterizes
large cities rather than all cities. In this study, we chose large natural cities (larger than the mean) to
examine whether they followed Zipf's law. Zipf's law rests on the scaling pattern of far more small cities
than large ones: a majority of small cities and a minority of large ones. More important,
the scaling pattern recurs not just once but multiple times among the large cities, again and again. This
is the basis of head/tail breaks (Jiang 2013), a novel classification scheme for data with a heavy-tailed
distribution. In what follows, we illustrate head/tail breaks with a working example.
Table 1: Head/tail breaking statistics for the TIN edges
Edges   Mean   #Head   %Head   #Tail   %Tail
504     2.2    135     27%     369     73%
135     6.2    35      26%     100     74%
35      13.4   13      37%     22      63%
13      20.7   3       23%     10      77%
3       33.2   1       33%     2       67%
The triangulated irregular network shown in Figure 2 appears to contain far more short edges
than long ones, and indeed this is true. There are 504 edges, ranging from the shortest, 0.001, to the
longest, 46.752. The wide range (46.751 = 46.752 - 0.001) and the large ratio (46,752 = 46.752/0.001)
clearly indicate far more short edges than long ones. The average length of the 504 edges is 2.2, which
splits all the edges into two unbalanced parts: 135 in the head (27 percent) and 369 in the tail (73
percent). This head/tail breaking process can be continued for the head again and again, as shown in
Table 1. Eventually, the scaling pattern of far more short edges than long ones recurs five times, three
of which are plotted in Figure 3 as so-called nested rank-size plots. Given that the scaling pattern
recurs five times, the ht-index is six. Note that the ht-index (Jiang and Yin 2014) is an alternative to
fractal dimension (Mandelbrot 1983) for capturing the complexity of geographical features.
Figure 3: (Color online) Nested rank-size plots for the first three hierarchical levels with respect to the
first three rows in Table 1
(Note: The x axis and y axis represent rank and size respectively. The largest plot contains the 504
edges, the red being the first head (135 edges) and the blue being the first tail (369 edges). The 135
edges are plotted again with the red representing 35 in the second head and the blue 100 in the second
tail. The smallest plot is for the 35 edges in the second head.)
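The recursive procedure behind Table 1 can be sketched in a few lines. This is a minimal sketch, not the authors' code: the function name is hypothetical, and the stopping rule (stop when the head exceeds 40 percent of the current values, i.e., is no longer a clear minority, or when fewer than two values remain) is an assumption, since the text only requires far more small values than large ones. Following the convention used above, the ht-index is the number of recurrences plus one.

```python
def head_tail_breaks(values, head_limit=0.4):
    """Apply head/tail breaks and return (rows, ht_index).

    Each row is (n, mean, n_head, n_tail) for one head/tail split, as in Table 1.
    head_limit is an assumed threshold: splitting stops once the head is no longer
    a clear minority of the current values.
    """
    rows = []
    data = list(values)
    while len(data) > 1:
        mean = sum(data) / len(data)
        head = [v for v in data if v > mean]  # values above the mean form the head
        if not head or len(head) / len(data) > head_limit:
            break
        rows.append((len(data), mean, len(head), len(data) - len(head)))
        data = head  # recurse on the head only
    # ht-index = number of times the scaling pattern recurs, plus one
    return rows, len(rows) + 1


sizes = [1] * 20 + [10] * 4 + [100]  # toy data, not the study's edges
rows, ht = head_tail_breaks(sizes)
for n, mean, n_head, n_tail in rows:
    print(n, round(mean, 1), n_head, n_tail)
print("ht-index:", ht)
# 25 6.4 5 20
# 5 28.0 1 4
# ht-index: 3
```

Applied to the 504 edge lengths of Figure 2, the same loop would be expected to reproduce the five rows of Table 1, provided the assumed 40 percent threshold matches the one actually used.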
Head/tail breaks, together with the ht-index, provide a simple yet effective means of characterizing
natural cities, or data with a heavy-tailed distribution in general. The derived ht-index captures
the hierarchy, or scaling hierarchy, of the data. For mapping purposes, head/tail breaks is superior to
conventional classification methods in capturing the underlying scaling pattern (Jiang 2013).
The ht-index complements fractal dimension in characterizing the complexity of geographic features,
or of fractals in general.
3. Data and Data Processing
As stated above, the data for this study came from the former location-based social medium Brightkite
during its three-year (31 months, to be more precise) life span, from April 2008 to October 2010 (Cho,
Myers, and Leskovec 2011). The data set included 2,837,256 locations in the mainland United States.
After removing duplicate locations, we obtained 412,961 unique locations for
generating a TIN, and then 8,307 natural cities as of October 2010, following the procedure shown
in Figure 2 and the short tutorial in the Appendix. The location data were time-stamped (Table 2),
so we were able to slice all these locations monthly in an accumulated manner, i.e., the snapshot for
month m_i contains all locations from months m_1 through m_i, where 1 ≤ i ≤ 31. For each time interval
or snapshot, we generated a set of natural cities, ranging in number from dozens to thousands. For some
snapshots, we had to split the data into small pieces and reassemble them in ArcGIS for visualization and
analysis. For example, Figure 4 illustrates the 8,307 natural cities as of October 2010, showing their
boundaries and populations. Note that this is just one of the 31 snapshots or datasets in the study.
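The accumulated monthly slicing can be sketched as follows. This sketch assumes tab-separated records in the column order of Table 2, treats a "unique location" as a distinct latitude/longitude pair, and clips to a rough latitude/longitude box for the mainland United States. The box, the function names, and the record layout are illustrative assumptions, not the study's actual clipping boundary or code.

```python
from collections import defaultdict
from datetime import datetime

# Rough mainland-US bounding box (an illustrative assumption, not the study's boundary).
LAT_MIN, LAT_MAX = 24.5, 49.5
LON_MIN, LON_MAX = -125.0, -66.9


def parse_checkin(line):
    """Split one tab-separated record: user, check-in time, latitude, longitude, location id."""
    user, stamp, lat, lon, location_id = line.strip().split("\t")
    when = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ")
    return user, when, float(lat), float(lon), location_id


def monthly_snapshots(lines):
    """Return {'YYYYMM': (points, unique_points)}, each month accumulating all earlier ones."""
    by_month = defaultdict(list)
    for line in lines:
        _, when, lat, lon, _ = parse_checkin(line)
        if LAT_MIN <= lat <= LAT_MAX and LON_MIN <= lon <= LON_MAX:
            by_month[when.strftime("%Y%m")].append((lat, lon))

    snapshots, seen, total = {}, set(), 0
    for month in sorted(by_month):
        total += len(by_month[month])  # all check-ins up to and including this month
        seen.update(by_month[month])   # distinct coordinate pairs feed the TIN
        snapshots[month] = (total, len(seen))
    return snapshots


records = [
    "58186\t2008-12-03T21:09:14Z\t39.633321\t-105.317215\tee8b88dea22411",
    "58186\t2008-11-30T22:30:12Z\t39.633321\t-105.317215\tee8b88dea22411",
    "58186\t2008-11-28T17:55:04Z\t-13.158333\t-72.531389\te6e86be2a22411",
    "58186\t2008-11-26T17:08:25Z\t39.633321\t-105.317215\tee8b88dea22411",
    "58187\t2008-08-14T21:23:55Z\t41.257924\t-95.938081\t4c2af967eb5df8",
]
print(monthly_snapshots(records))
# {'200808': (1, 1), '200811': (3, 2), '200812': (4, 2)}
```

With the five rows of Table 2, the check-in in Peru falls outside the box and is dropped, and the repeated Colorado location counts once toward the unique points, mirroring the Pnt and PntUniq columns of Table 3. Months with no in-box check-ins simply do not appear in the result.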
Table 2: Initial check-in data format
User Chec
k
‐intime Latitude Longitude Locationid
58186 2008‐12‐03T21:09:14Z 39.633321 ‐105.317215 ee8b88dea22411
58186 2008‐11‐30T22:30:12Z 39.633321 ‐105.317215 ee8b88dea22411
58186 2008‐11‐28T17:55:04Z ‐13.158333 ‐72.531389 e6e86be2a22411
58186 2008‐11‐26T17:08:25Z 39.633321 ‐105.317215 ee8b88dea22411
58187 2008‐08‐14T21:23:55Z 41.257924 ‐95.938081 4c2af967eb5df8
Figure 4: (Color online) The largest set of natural cities, as of October 2010 (red patches for boundaries
and red dots for populations), on the background of the TIN (gray lines) generated from 412,961 unique
location points (2,837,256 points before duplicates were removed)
Table 3: Measurements and statistics from location points to natural cities for the different time
intervals
(Note: Pnt = # of points, PntUniq = # of unique points, TINEdge = # of TIN edges, Mean = Average
length of TIN edges, NaturalCity = # of natural cities; * marks the seven snapshots examined in Section 4)
Time      Pnt       PntUniq   TINEdge    Mean    NaturalCity
200804*   3784      1199      3580       54043   44
200805*   35431     7401      22178      18900   192
200806    69670     12489     37443      13971   274
200807    105381    17119     51331      11713   345
200808    139248    21718     65126      10229   458
200809    170134    26716     80123      8741    532
200810*   211612    33438     100289     7537    660
200811    263919    42216     126621     6570    854
200812*   314339    50706     152089     5995    1029
200901    364584    58699     176069     5479    1158
200902    411771    65562     196662     5127    1295
200903*   467715    73408     220200     4809    1467
200904*   2375827   353176    1059508    2139    7088
200905    2430842   359243    1077709    2119    7217
200906    2480720   364760    1094257    2099    7306
200907    2531199   370992    1112953    2079    7460
200908    2577445   376811    1130410    2061    7579
200909    2618017   381447    1144321    2047    7669
200910    2646890   386286    1158835    2030    7761
200911    2674335   390462    1171363    2016    7830
200912    2697482   393907    1181698    2005    7923
201001    2717638   396763    1190269    1995    7988
201002    2734260   398964    1196872    1987    8040
201003    2752153   401562    1204666    1978    8082
201004    2766773   403795    1211362    1970    8117
201005    2779889   405814    1217419    1965    8149
201006    2793472   407711    1223110    1961    8206
201007    2806640   409487    1228437    1956    8259
201008    2820373   411194    1233561    1951    8303
201009    2831894   412407    1237197    1947    8301
201010*   2837256   412961    1238859    1945    8307
Table 3 lists some basic measurements and statistics from the location points to the natural cities. For
example, for the first month, April 2008, only 44 natural cities were generated from 3,784 locations, of
which 1,199 unique locations were used for generating a TIN with 3,580 edges, and a mean of 54,043
as the cutoff to derive the 44 natural cities. The number of natural cities increased to 8,307 as of
October 2010. During the 31 months, the number of natural cities increased rapidly at certain times, e.g.,
more than fourfold from April to May 2008 and from March to April 2009. We do not know why these
increases were so rapid, but they could relate to advertising effects. In addition, there was a slight drop
in the number of natural cities from August to September 2010. In the following section, we utilize the
seven time intervals highlighted in Table 3 for a detailed discussion of our findings.
4. Results and Discussions
Before discussing the findings, we map the natural cities at the seven time intervals (or snapshots) for
the four largest natural cities, surrounding Chicago, New York, San Francisco, and Los Angeles. These are
shown in Figure 5, which clearly illustrates how the four cities or regions grew or expanded during the
31-month period. All parts of the country can be assessed for similar patterns of growth and evolution.
We know little about why the procedure shown in Figure 2, as well as in the Appendix, works so well,
but the resulting patterns suggest that the natural cities effectively capture the evolution of real cities.
On the one hand, the natural cities expanded into more fragmented pieces, with far more small pieces
than large ones. On the other hand, the physical boundaries of the natural cities tended to become
more irregular over time. These two aspects suggest that the natural cities are fractal, and become
increasingly fractal over time, closely resembling real cities (Batty and Longley 1994). These two
aspects are discussed further below.
Figure 5: (Color online) Evolution of the natural cities near the four largest city regions, with the TIN as
a background
These results can be assessed from both global and local perspectives. Globally, all the natural cities in
the United States exhibit a power-law distribution. This is shown in the rank-size plots (Figure 6), in
which the distribution lines are largely straight for all the natural cities at the different time intervals in
the log-log plots. The natural cities as of April 2008, except the smallest with fewer than 12 people,
exhibited a clear power law, probably the straightest of all the distributions. The same holds for May
2008. However, the distribution lines from October 2008 to October 2010 are less straight, indicating
that a few of the largest natural cities did not fit the power-law distribution well. This is particularly
obvious for the last two snapshots, April 2009 and October 2010. A possible reason for this
shift from a striking to a less striking power law is described below.
Figure 6: (Color online) Rank-size plot for the natural cities
On further examination, we looked at the large cities (larger than the mean) in each snapshot and
found that Zipf's exponent was indeed around one for the first two months (0.98 and 1.08), and
thereafter exceeded one by about 0.25 (Table 4). Considering the duality of Zipf's law, this result
suggests that Zipf's law held remarkably well for the first two months, but less so for the remaining
months. We postulate a possible reason: in the first two months, the number of social medium users
increased in proportion to the populations of real cities, leading to a striking Zipf's law effect among the
natural cities, because the populations of real cities are power-law distributed. Over time, large cities,
particularly a few of the largest such as New York, did not keep pace with the other cities in attracting
more users. In other words, beyond the first two months, the increase in social medium users became
less proportional to the real cities' populations. As a result, Zipf's law is less striking. We assess this
point further from a local perspective below. In contrast to the small deviations
of Zipf's exponent, the ht-index increased from four to seven (Table 4). Note that the ht-index is a
measure of the complexity of fractals or of geographic features (Jiang and Yin 2014). The
increase in the ht-index implies that more hierarchical levels were added, reflecting well the
evolution of the natural cities and of the social medium.
Table 4: Zipf’s exponent and ht-index for the natural cities
2008‐04 2008‐05 2008‐10 2008‐12 2009‐03 2009‐04 2010‐10
Zipf's
exponent 0.98 1.08 1.26 1.24 1.26 1.27 1.25
Ht‐index 4 4 4 5 6 6 7
Locally, there are two points to discuss. First, the boundaries of the natural cities became more
irregular over time, very much like the Koch curve as the number of iterations increases. For example,
the boundaries of the natural cities as of April 2008 were simple enough to be described by Euclidean
geometry. Over time, however, the boundaries required fractal geometry to describe them, becoming
more fragmented as finer scales were added. Second, large natural cities tended to become larger and
larger, while small ones continuously emerged at local levels. Figure 5 illustrates this finding in a less
striking manner, as the city sizes there are measured by physical extent. But if the city sizes are
measured by population, as in Figure 7, we notice rapid increases for the four largest cities.
Overall, the four cities tended to become larger and larger, but there was a major difference among the
four. To illustrate the difference, we must clarify that Figure 7 uses graduated dots to represent
the city sizes, which are classified according to head/tail breaks. This is because the city sizes
exhibited a heavy-tailed distribution, with far more small cities than large ones. Therefore, the
dot sizes in Figure 7 do not, strictly speaking, represent city sizes, but rather the classes
to which the cities belong. Notice that the largest natural city in the New York region in October 2010
appears smaller than in April 2009, which indicates that the natural city belonged to a higher class in
April 2009 than in October 2010. Table 5 confirms this: the New York
natural city in April 2009 belonged to the sixth of six classes, while in October
2010 it dropped to the fifth of seven classes. This finding is consistent with what we stated above: a
few of the largest cities did not keep pace with the others in attracting more users.
Figure 7: (Color online) Evolution of the natural cities in terms of populations (or points)
near the four largest city regions
Table 5: Evolution of the four cities within the system of the natural cities
(Note: a/b where a and b respectively denote the class the particular city belongs to, and the total
number of classes or the ht-index)
2008‐04 2008‐05 2008‐10 2008‐12 2009‐03 2009‐04 2010‐10
Chicago 1/4 2/4 3/4 3/5 3/6 4/6 4/7
NewYor
k
2/4 3/4 3/4 4/5 4/6 6/6 5/7
SanFrancisco 3/4 4/4 4/4 5/5 5/6 6/6 6/7
LosAngeles 3/4 3/4 4/4 4/5 5/6 6/6 7/7
The above findings can be summarized in terms of nonlinearity, which is reflected in both the spatial and
temporal dimensions. Spatially, the natural cities were distributed heterogeneously or unevenly, i.e.,
there were far more small cities than large ones. This unevenness was also seen in the temporal
dimension. For example, within the first 10 months of 2008, the natural cities had already taken the
shapes of individual cities (Figure 5), with populations continuously growing and small natural cities
being added persistently for the remaining time. In other words, it took just one third of the social
medium's lifetime to determine the shapes of individual cities. This is also why we chose
the seven unequal time intervals to examine the evolution.
5. Implications of the Study
Location-based social media provide large amounts of location data of significant value for
studying human activities in the virtual world, as well as on the Earth's surface. Nowadays, the social
sciences, human geography in particular, benefit considerably from emerging social media data
that are time-stamped and location-based. The ways of doing geography and social sciences are
changing. The emerging big data harvested from social media, as well as from positioning and
geospatial technologies, coupled with data-intensive computing (Hey, Tansley, and Tolle 2009), are
transforming conventional social sciences into computational social sciences (Lazer et al. 2009). In
this section, we discuss some broader implications of this study for geography and the social sciences
in general.
The notion of natural cities implies a kind of bottom-up thinking in terms of data collection and
geographic units or boundaries. Conventional geographic data, collected and maintained from the top
down by authorities, are usually sampled and aggregated, and are therefore small. By contrast, new
data harvested from social media are massive and individual, which is why they are called ‘big data.’
Time-stamped and location-based social media data, supported by Web 2.0 technologies and
contributed by individuals acting as sensors (Goodchild 2007), constitute a brilliant new data
source for geographic research. Conventional geographic units or boundaries are often imposed from
the top down by authorities or centralized committees, whereas natural cities are defined and delineated
objectively, in a natural manner, based on the head/tail division rule. This natural manner
guarantees that we can see a true picture of urban structure and dynamics, and suggests the
universality of Zipf’s law. This true picture is fractal, as the following example illustrates: throw
a wine glass forcefully onto a cement floor, and it will very likely break into a large number of pieces.
Like natural cities, these glass pieces are fractal, or follow Zipf’s law: there are far more small
pieces than large ones, and each piece has an irregular shape.
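To make "follow Zipf’s law" concrete: if the items (glass pieces or natural cities) are ranked by size, Zipf’s law says the size of the item at rank r is roughly proportional to 1/r, so log(size) plotted against log(rank) falls on a line with slope close to -1. The sketch below, with a hypothetical helper `zipf_exponent`, estimates that slope by ordinary least squares on the log-log values. This is only a rough diagnostic assumed for illustration, not the procedure used in this paper; more rigorous power-law fitting relies on maximum-likelihood methods.

```python
import math


def zipf_exponent(sizes):
    """Ordinary least-squares slope of log(size) on log(rank).

    Under Zipf's law, size is proportional to 1/rank, so the slope is close to -1.
    """
    ranked = sorted(sizes, reverse=True)  # rank 1 = the largest item
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(size) for size in ranked]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


# Hypothetical, perfectly Zipfian sizes 1, 1/2, ..., 1/36.
sizes = [1 / r for r in range(1, 37)]
print(round(zipf_exponent(sizes), 6))  # -1.0
```

Real size data (for example, natural-city areas or point counts) would scatter around the line, and the fitted slope would only approximate -1.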
The evolution of natural cities demonstrates nonlinearity in both the spatial and temporal dimensions,
or equivalently from both static and dynamic points of view. Many phenomena in human geography, as
well as in physical geography, exhibit this nonlinearity (Batty and Longley 1994; Frankhauser 1994;
Phillips 2003; Chen 2009). However, we are still very much constrained by linear thinking,
explicitly or implicitly, consciously or unconsciously. For example, we rely on Euclidean geometry to
describe the Earth’s surface, and on a well-defined mean to characterize spatial heterogeneity. Our
mindsets apparently lag behind the advances in data and technologies. Conventional linear thinking is
not suitable for describing the Earth’s surface (geographic forms), let alone for uncovering the
underlying geographic processes. Instead, we should adopt nonlinear thinking, or nonlinear
mathematics such as fractal geometry, chaos theory, and complexity science, for geographic research.
The tools adopted in this study, such as the head/tail division rule, head/tail breaks, and the ht-index,
are grounded in nonlinear mathematics and power-law-based statistics. These nonlinear mathematical
tools help to elicit new insights into the evolution of natural cities. Nonlinearity also implies that
geographic forms and processes are unpredictable, much like long-term weather or climate. To better
understand and predict geographic phenomena, we must seek to uncover the underlying mechanisms
through simulations rather than simple correlations.
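Head/tail breaks and the ht-index can be sketched in a few lines. The version below follows a common reading of head/tail breaks (Jiang 2013): split the data at the arithmetic mean, keep the values above the mean (the head), and repeat on the head for as long as it remains a minority. The ht-index is then the number of hierarchical levels, i.e., the number of mean cut-offs plus one. The 40% head threshold is an assumption (a rule of thumb rather than something stated in this section), and the function names are hypothetical.

```python
def head_tail_breaks(values, head_limit=0.4):
    """Recursively split data at the mean; return the mean cut-offs.

    Splitting continues while the head (values above the mean) is less than
    head_limit of the current data, i.e. while small things still far outnumber
    large ones.
    """
    data = list(values)
    cutoffs = []
    while len(data) > 1:
        mean = sum(data) / len(data)
        head = [v for v in data if v > mean]
        if not head or len(head) / len(data) >= head_limit:
            break  # the head is no longer a minority: stop splitting
        cutoffs.append(mean)
        data = head  # continue with the head only
    return cutoffs


def ht_index(values, head_limit=0.4):
    """Number of hierarchical levels: mean cut-offs found plus one."""
    return len(head_tail_breaks(values, head_limit)) + 1


# Hypothetical Zipfian sizes 1, 1/2, ..., 1/36.
sizes = [1 / r for r in range(1, 37)]
print(ht_index(sizes))  # 3
```

Here the first split keeps 8 of 36 values, the second keeps 2 of those 8, and the third would keep 1 of 2 (50%), so splitting stops with two cut-offs and three levels.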
The head/tail division rule is intellectually exciting because it appears to be both powerful and
mysterious. Why the head/tail division rule is so effective for deriving natural cities, in
particular at the different time stages, remains an open question. However, we tend to believe that it
reflects the wisdom of crowds: the diverse and heterogeneous many are often smarter than the few,
even a few experts (Surowiecki 2004). The massive number of edges (up to 1,238,859) of the TIN
generated from the massive location points constituted the ‘crowds,’ and they collectively determined
the mean cutoff for delineating the natural cities. Every single edge had ‘its voice heard’ in the
democratic decision. The effectively derived natural cities show an advantage of working with big
data. If we had worked not with the entire US data set, but only with an area surrounding New
York, for example, we would not have been able to determine a sensible cutoff for delineating the New
York natural city. Only with big data that include all location points, or all edges, can a meaningful
cutoff be determined and applied to all. In this sense, the approach to delineating natural cities is
holistic and bottom-up, with the participation of all diverse and heterogeneous individuals.
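The argument that a sensible cutoff needs all edges can be illustrated with a toy calculation. The edge lengths below are invented for illustration only: a dense urban core of short edges plus a few long rural links. When every edge contributes to the mean, the whole core falls below the cutoff; when only the core area is used, the mean lands inside the core and cuts it in half.

```python
from statistics import fmean

# Hypothetical edge lengths (arbitrary units): a dense urban core plus long rural links.
core_edges = [1.0, 1.0, 2.0, 2.0]
rural_edges = [50.0, 80.0, 120.0]

# Global cutoff: every edge in the whole data set contributes to the mean.
global_cutoff = fmean(core_edges + rural_edges)  # 256 / 7, about 36.57
# Local cutoff: only edges from the area around one city are considered.
local_cutoff = fmean(core_edges)  # 1.5

kept_global = [e for e in core_edges if e < global_cutoff]
kept_local = [e for e in core_edges if e < local_cutoff]

print(len(kept_global), len(kept_local))  # 4 2
```

With the global cutoff the core stays intact as one short-edge cluster, whereas the local cutoff would discard half of the city’s own edges.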
It is important to note that check-in users are biased towards certain types of people. Thus the
derived natural cities are not exactly the same as the corresponding real cities. However, no one can
deny that the boundaries shown in Figure 5 correspond to those of Chicago, New York, San Francisco,
and Los Angeles, in particular for the last time interval, 2010-10. One can simply cross-check with
Google Maps to see what the cities or regions look like. Moreover, the aim of this paper is not to study,
at an individual level, how well real cities can be captured or predicted by natural cities, but to
understand, at a collective level, the underlying mechanisms of agglomerations formed either by people
in physical space (real cities) or by check-in users in virtual space (natural cities). In other words,
we consider cities (either real or natural) as emergent phenomena (Johnson 2002) arising from the
interactions of individual people from the bottom up. We believe that the insights developed from social
media data, e.g., fractal structure and nonlinear dynamics, can be applied to real cities. The fact that
not all people are check-in users should not be considered a biased sampling issue. Sampling is an
inevitable technique in times of information scarcity, the so-called small data era, but it is not a
legitimate concept in the big data era. Large social media data sets imply N = all (Mayer-Schonberger
and Cukier 2013), and this N = all is the essence of big data. Given the 2.8 million check-in locations,
the social medium can be a good proxy for studying the evolution of real cities in the country.
We face an unprecedented golden era for geography, or social sciences in general, with the wave of
social media and, in particular, the increasing convergence of social media and geographic information
science (Sui and Goodchild 2011). For the first time in history, human activities can be documented at
very fine spatial and temporal scales. In this study, we sliced the data monthly, but we certainly could
have done so weekly, daily, or even hourly. We expect that the nonlinearity observed in the temporal
dimension would be even more striking at these finer resolutions. This, of course, warrants further
study. Geographers should ride the wave of social media and develop a more computationally minded
geography, or computational geography (Openshaw 1998). If we do not seize this unique opportunity,
we risk being purged from the sciences. The rise of computational social science is a timely response
to the rapid advances in data and technologies. In fact, physicists and computer scientists have already
been working in this exciting and rapidly changing domain (see Brockmann, Hufnagel, and Geisel 2006;
and Zheng and Zhou 2011). We geographers should do more rather than less.
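Slicing the data at a different temporal resolution only changes how timestamps are grouped. The sketch below is a hypothetical helper, not the authors’ code: it groups check-ins into monthly, ISO-weekly, daily, or hourly periods and builds cumulative sets of unique locations, under the assumption that each snapshot contains every unique location seen up to the end of that period (consistent with check-in points being added cumulatively).

```python
from collections import defaultdict
from datetime import datetime


def period_key(ts, freq):
    """Map a timestamp to a sortable period key (hypothetical helper)."""
    if freq == "month":
        return (ts.year, ts.month)
    if freq == "week":
        year, week, _ = ts.isocalendar()  # ISO year and week number
        return (year, week)
    if freq == "day":
        return (ts.year, ts.month, ts.day)
    if freq == "hour":
        return (ts.year, ts.month, ts.day, ts.hour)
    raise ValueError(f"unsupported frequency: {freq}")


def cumulative_snapshots(checkins, freq="month"):
    """Cumulative sets of unique (x, y) locations at the end of each period."""
    new_by_period = defaultdict(set)
    for ts, x, y in checkins:
        new_by_period[period_key(ts, freq)].add((x, y))
    seen, snapshots = set(), {}
    for key in sorted(new_by_period):
        seen |= new_by_period[key]  # add this period's new locations
        snapshots[key] = frozenset(seen)
    return snapshots


# Hypothetical check-ins: (timestamp, x, y).
checkins = [
    (datetime(2008, 4, 5, 9), 1.0, 2.0),
    (datetime(2008, 4, 20, 18), 1.0, 2.0),  # repeat visit to the same location
    (datetime(2008, 5, 1, 12), 3.0, 4.0),
]
for key, locs in cumulative_snapshots(checkins, "month").items():
    print(key, len(locs))
# (2008, 4) 1
# (2008, 5) 2
```

Each snapshot could then be triangulated and processed separately to derive that period’s natural cities.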
6. Conclusion
Driven by the lack of data for tracking the evolution of cities, this study demonstrated that emerging
location-based social media such as Flickr, Twitter, and Foursquare can act as a proxy for studying and
understanding the underlying mechanisms by which cities evolve. Compared with conventional census data
that are usually sampled, aggregated, and small, the time-stamped and location-based social media
data can be characterized as all, individual, and big. In this paper, we abandoned conventional
definitions of cities, and adopted objectively or naturally defined natural cities, using massive
geographic information of various kinds, and based on the head/tail division rule. Built on the notion
of the wisdom of crowds, the head/tail division rule works very well to establish a meaningful cutoff
for delineating natural cities. Natural cities provide an effective means or unique perspective to study
human activities for better understanding of geographic forms and processes.
We examined the evolution of natural cities derived from massive location points of the social
medium Brightkite during its 31-month life span. We found nonlinearity in the evolution of natural
cities in both the spatial and temporal dimensions, as well as the universality of Zipf’s law. We have
archived all the data, which could be of further use for developing and verifying urban theories. This
study has deep implications for geography and the social sciences in light of the increasing amounts of
data that can be harvested from location-based social media. Therefore, we call for the application of
nonlinear mathematics, such as fractal geometry, chaos theory, and complexity science, to geographic
and social science research. A limitation of this study is that the data show only the social medium’s
continuous rise and not its decline; Brightkite seemed to disappear overnight. Future research should
concentrate on the development of power-law-based statistics, and the underlying nonlinear mathematics, to manage the
increasing social media data and on agent-based simulations to reveal the mechanisms for the
evolution of natural cities.
Acknowledgement
An early version of this paper was presented as a keynote address entitled “The evolution of natural cities:
a new way of looking at human mobility”, at Mobile Ghent '13, 23-25 October 2012, University of
Ghent, Belgium.
References:
Bak P. (1996), How Nature Works: The science of self-organized criticality, Springer-Verlag: New York.
Batty M. and Longley P. (1994), Fractal Cities: A geometry of form and function, Academic Press:
London.
Benguigui L. and Blumenfeld-Lieberthal E. (2011), The end of a paradigm: Is Zipf’s law universal?
Journal of Geographical Systems, 13, 87–100.
Bennett J. (2010), OpenStreetMap: Be your own cartographer, Packt Publishing: Birmingham.
Berry B. J. L. and Okulicz-Kozaryn A. (2011), The city size distribution debate: Resolution for US
urban regions and megalopolitan areas, Cities, 29, S17–S23.
Boyd D. M. and Ellison N. B. (2008), Social network sites: Definition, history, and scholarship,
Journal of Computer-Mediated Communication, 13, 210–230.
Brockmann D., Hufnagel L., and Geisel T. (2006), The scaling laws of human travel, Nature, 439,
462–465.
Chen Y. (2009), Spatial interaction creates period-doubling bifurcation and chaos of urbanization,
Chaos, Solitons & Fractals, 42(3), 1316–1325.
Cho E., Myers S. A., and Leskovec J. (2011), Friendship and mobility: user movement in
location-based social networks, Proceedings of the 17th ACM SIGKDD international conference
on Knowledge discovery and data mining, ACM: New York, 1082–1090.
Frankhauser P. (1994), La Fractalité des Structures Urbaines, Economica: Paris.
Goodchild M. F. (2007), Citizens as sensors: The world of volunteered geography, GeoJournal, 69(4),
211–221.
Hey T., Tansley S., and Tolle K. (2009), The Fourth Paradigm: Data intensive scientific discovery,
Microsoft Research: Redmond, Washington.
Jia T. and Jiang B. (2010), Measuring urban sprawl based on massive street nodes and the novel
concept of natural cities, Preprint: http://arxiv.org/abs/1010.0541.
Jiang B. (2013), Head/tail breaks: A new classification scheme for data with a heavy-tailed distribution,
The Professional Geographer, 65(3), 482–494.
Jiang B. and Jia T. (2011), Zipf's law for all the natural cities in the United States: a geospatial
perspective, International Journal of Geographical Information Science, 25(8), 1269–1281.
Jiang B. and Liu X. (2012), Scaling of geographic space from the perspective of city and field blocks
and using volunteered geographic information, International Journal of Geographical
Information Science, 26(2), 215–229.
Jiang B. and Yin J. (2014), Ht-index for quantifying the fractal or scaling structure of geographic
features, Annals of the Association of American Geographers, xx, xx-xx, preprint:
http://arxiv.org/abs/1305.0883
Johnson S. (2002), Emergence: The Connected Lives of Ants, Brains, Cities, and Software, Scribner:
New York.
Kaplan A. M. and Haenlein M. (2010), Users of the world, unite! The challenges and opportunities of
social media, Business Horizons, 53, 59–68.
Lazer D., Pentland A., Adamic L., Aral S., Barabási A.-L., Brewer D., Christakis N., Contractor N.,
Fowler J., Gutmann M., Jebara T., King G., Macy M., Roy D., and Van Alstyne M. (2009),
Computational social science, Science, 323, 721–724.
Mandelbrot B. (1982), The Fractal Geometry of Nature, W. H. Freeman and Co.: New York.
Mayer-Schonberger V. and Cukier K. (2013), Big Data: A revolution that will transform how we live,
work, and think, Eamon Dolan/Houghton Mifflin Harcourt: New York.
Openshaw S. (1984), The Modifiable Areal Unit Problem, Geo Books: Norwich, Norfolk.
Openshaw S. (1998), Towards a more computationally minded scientific human geography,
Environment and Planning A, 30, 317–332.
Phillips J. D. (2003), Sources of nonlinearity and complexity in geomorphic systems, Progress in
Physical Geography, 27(1), 1–23.
Sui D. and Goodchild M. (2011), The convergence of GIS and social media: challenges for GIScience,
International Journal of Geographical Information Science, 25(11), 1737–1748.
Surowiecki J. (2004), The Wisdom of Crowds: Why the Many Are Smarter than the Few, ABACUS:
London.
Traynor D. and Curran K. (2012), Location-based social networks, In: Lee I. (editor, 2012), Mobile
Services Industries, Technologies, and Applications in the Global Economy, IGI Global:
Hershey, PA, 243–253.
Zheng Y. and Zhou X. (editors, 2011), Computing with Spatial Trajectories, Springer: Berlin.
Zipf G. K. (1949), Human Behavior and the Principles of Least Effort, Addison Wesley: Cambridge,
MA.
Appendix: Tutorial on How to Derive Natural Cities based on ArcGIS
This tutorial shows, step by step, how to derive natural cities with ArcGIS, using the first month’s
data (2008-04) as an example. Once you have the check-in data for the first month, transfer them into
an Excel sheet with two columns, x and y, representing longitude and latitude, respectively. Add a
third column, z, and set all its values to one (or any arbitrary constant, since ArcGIS relies on 3D
points to create a TIN). Insert the Excel sheet into ArcGIS as a shapefile data layer (Figure A1(a)).
Create a TIN from the point layer, using ArcToolbox > 3D analyst tools > TIN management > Create TIN
(Figure A1(b)).
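Outside ArcGIS, the TIN step can be approximated with a Delaunay triangulation. The sketch below uses SciPy’s `scipy.spatial.Delaunay` as a stand-in for the ArcGIS Create TIN tool; it is not the authors’ workflow. No z column is needed, because a 2D Delaunay triangulation works directly on (x, y) pairs. It assumes the coordinates are in a projected (planar) system, since triangulating raw longitude/latitude would distort edge lengths, and the sample coordinates and the `build_tin` name are hypothetical.

```python
import numpy as np
from scipy.spatial import Delaunay


def build_tin(xy):
    """Delaunay-triangulate unique 2D locations.

    Returns the unique points and an (n_triangles, 3) array of vertex indices.
    """
    points = np.unique(np.asarray(xy, dtype=float), axis=0)  # drop repeated locations
    triangles = Delaunay(points).simplices
    return points, triangles


# Hypothetical planar coordinates; (2, 1) appears twice, as a repeated check-in would.
xy = [(0, 0), (5, 0), (6, 4), (0, 3), (2, 1), (2, 1)]
points, triangles = build_tin(xy)
print(len(points), len(triangles))  # 5 4
```

With 5 unique points, 4 of them on the convex hull, any triangulation has 2 × 5 − 2 − 4 = 4 triangles.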
Figure A1: Screen snapshots showing (a) the 1199 unique points, (b) the TIN created from the 1199 points,
(c) the selected edges shorter than the mean of 54042.8, and (d) the 44 resulting natural cities
Convert the TIN into TIN edges, using ArcToolbox > 3D analyst tools > Conversion > From TIN > TIN
Edge. The converted TIN edges form a polyline layer. Right-click the polyline layer and choose Open
Attribute Table to obtain statistics on the edge lengths. Figure A2 shows that the frequency
distribution is clearly L-shaped, indicating that there are far more short edges than long ones.
Note that the mean is 54042.8.
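The edge-extraction step amounts to listing every unique side of every triangle and measuring it. The sketch below is a hypothetical helper, not an ArcGIS function: it takes the points as (x, y) pairs and the triangles as triples of point indices, stores each shared edge only once, and lets the mean of the resulting lengths play the role of the 54042.8 cutoff reported above. The tiny rectangle example is invented for illustration.

```python
import math
from statistics import fmean


def tin_edges(points, triangles):
    """Return {(i, j): length} for each unique undirected TIN edge."""
    edges = {}
    for a, b, c in triangles:
        for i, j in ((a, b), (b, c), (c, a)):
            key = (min(i, j), max(i, j))  # an edge shared by two triangles is stored once
            if key not in edges:
                (x1, y1), (x2, y2) = points[i], points[j]
                edges[key] = math.hypot(x2 - x1, y2 - y1)
    return edges


# Hypothetical TIN: a 3-by-4 rectangle split into two triangles along a diagonal.
points = [(0, 0), (3, 0), (0, 4), (3, 4)]
triangles = [(0, 1, 2), (1, 3, 2)]
edges = tin_edges(points, triangles)
print(len(edges), fmean(edges.values()))  # 5 3.8
```

The same function accepts NumPy arrays of points and index triples, such as the output of a Delaunay triangulation.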
Figure A2: Statistics of TIN edges
Select the edges shorter than this mean of 54042.8 via the menu Selection > Select by Attributes.
The selected edges are highlighted in Figure A1(c). These shorter edges correspond to high-density
locations. Dissolve all the shorter edges into polygons, each an individual natural city (Figure A1(d)),
using ArcToolbox > Data Management Tools > Generalization > Dissolve, or alternatively the menu
Geoprocessing > Dissolve (with the option ‘Create multipart features’
unchecked).
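The selection and dissolve steps can be approximated by grouping points that are connected through short edges. The sketch below uses a union-find structure as a stand-in for ArcGIS Dissolve: it keeps only edges shorter than the cutoff (the mean edge length by default) and returns clusters of point indices, one per candidate natural city. Building the actual polygon outlines is omitted, and the function name and sample edges are hypothetical.

```python
def natural_city_clusters(edges, cutoff=None):
    """Group vertices joined by edges shorter than the cutoff.

    edges maps (i, j) vertex pairs to lengths; cutoff defaults to the mean
    edge length. Vertices reached only by long edges belong to no cluster.
    """
    if cutoff is None:
        cutoff = sum(edges.values()) / len(edges)
    parent = {}

    def find(v):
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving keeps trees shallow
            v = parent[v]
        return v

    for (i, j), length in edges.items():
        if length < cutoff:
            root_i, root_j = find(i), find(j)
            if root_i != root_j:
                parent[root_i] = root_j  # merge the two clusters

    clusters = {}
    for v in list(parent):
        clusters.setdefault(find(v), set()).add(v)
    return list(clusters.values())


# Hypothetical edges: two tight groups joined only by long edges.
edges = {(0, 1): 1.0, (1, 2): 1.0, (0, 2): 1.5, (3, 4): 1.0,
         (2, 3): 20.0, (1, 4): 25.0}
print(sorted(sorted(c) for c in natural_city_clusters(edges)))  # [[0, 1, 2], [3, 4]]
```

Here the mean edge length is 8.25, so only the two long edges are dropped and two separate clusters remain.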
All the above steps can be done with existing ArcGIS functions, since the first month’s data set is
small enough. For some later months, however, the processes cannot be completed with existing ArcGIS
functions alone, because check-in points are added cumulatively. In this case, we must split the data
into small pieces and put them back together in ArcGIS with some simple code. All the related code,
together with the natural cities data, is archived at
https://sites.google.com/site/naturalcitiesdata/.
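The archived code is not reproduced here. As a hypothetical illustration of why splitting the data need not change the result, the sketch below reads edge lengths in fixed-size chunks and keeps only a running sum and count, so the cutoff it returns equals the mean over all edges even though no chunk holds the full data set. The chunk size and function names are assumptions.

```python
from itertools import islice


def chunked(iterable, size):
    """Yield successive lists of at most `size` items."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def global_mean(lengths, chunk_size=100_000):
    """Mean edge length computed piece by piece from a running sum and count."""
    total, count = 0.0, 0
    for batch in chunked(lengths, chunk_size):
        total += sum(batch)
        count += len(batch)
    return total / count


lengths = (float(i) for i in range(1, 1001))  # stand-in for a large edge table
print(global_mean(lengths, chunk_size=64))  # 500.5
```

Because the global cutoff is preserved, each piece can then be filtered against the same mean before the results are put back together.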

Supplementary resource (1)

... These interactions can take the form of trade, commerce, social connections, and political activities across borders. Urban boundaries that respect human interaction space are important to city planning, traffic management and resource allocation , Jiang and Miao 2015, Long et al. 2015. Many studies adopt a "bottom-up" approach to urban boundary delineation, where the geographical space is partitioned into small units and each unit is represented as a node within a network structure. ...
... Urban regions are discrete components in a greater set of regions, with or without physical boundaries separating them (Jiang and Miao 2015). For political and administrative purposes, government agencies define various boundaries to partition urban space into spatial units at different scales, for instance: counties, census tracts and electoral districts. ...
... As connections are made between these units via various human activities, such as social-economic relations and commute patterns of citizens, certain units become more strongly connected than others. The boundaries of the agglomeration of these units are argued to reflect how people naturally interact with the geographical space, which is important for city planning (Hollenstein and Purves 2010), urban growth evaluations (Jiang andMiao 2015, Long et al. 2015), and traffic management . ...
Preprint
Existing urban boundaries are usually defined by government agencies for administrative, economic, and political purposes. Defining urban boundaries that consider socio-economic relationships and citizen commute patterns is important for many aspects of urban and regional planning. In this paper, we describe a method to delineate urban boundaries based upon human interactions with physical space inferred from social media. Specifically, we depicted the urban boundaries of Great Britain using a mobility network of Twitter user spatial interactions, which was inferred from over 69 million geo-located tweets. We define the non-administrative anthropographic boundaries in a hierarchical fashion based on different physical movement ranges of users derived from the collective mobility patterns of Twitter users in Great Britain. The results of strongly connected urban regions in the form of communities in the network space yield geographically cohesive, non-overlapping urban areas, which provide a clear delineation of the non-administrative anthropographic urban boundaries of Great Britain. The method was applied to both national (Great Britain) and municipal scales (the London metropolis). While our results corresponded well with the administrative boundaries, many unexpected and interesting boundaries were identified. Importantly, as the depicted urban boundaries exhibited a strong instance of spatial proximity, we employed a gravity model to understand the distance decay effects in shaping the delineated urban boundaries. The model explains how geographical distances found in the mobility patterns affect the interaction intensity among different non-administrative anthropographic urban areas, which provides new insights into human spatial interactions with urban space.
... This makes big data unique and powerful for developing new insights into geographic forms and processes (e.g. Jiang and Miao 2015). On the other hand, big data poses enormous challenges in terms of data representation, structuring, and analytics. ...
... They are street nodes and tweet locations between June 1-8, 2014 (Table 2). The street nodes were extracted from OpenStreetMap for building up natural cities (Jiang and Miao 2015) as a living structure, while tweet locations are used to verify if they can be predicted by the living structure. The street nodes refer to both street junctions and ending nodes. ...
Preprint
Inspired by Christopher Alexanders conception of the world - space is not lifeless or neutral but a living structure involving far more small things than large ones a topological representation has been previously developed to characterize the living structure or the wholeness of geographic space. This paper further develops the topological representation and living structure for predicting human activities in geographic space. Based on millions of street nodes of the United Kingdom extracted from OpenStreetMap, we established living structures at different levels of scale in a nested manner. We found that tweet locations at different levels of scale, such as country and city, can be well predicted by the underlying living structure. The high predictability demonstrates that the living structure and the topological representation are efficient and effective for better understanding geographic forms. Based on this major finding, we argue that the topological representation is a truly multi-scale representation, and point out that existing geographic representations are essentially single scale, so they bear many scale problems such as modifiable areal unit problem, the conundrum of length, and the ecological fallacy. We further discuss on why the living structure is an efficient and effective instrument for structuring geospatial big data, and why Alexanders organic worldview constitutes the third view of space. Keywords: Organic worldview, topological representation, tweet locations, natural cities, scaling of geographic space
... The natural cities used in this paper are naturally and automatically delineated from a large amount of street blocks, usually all street blocks of a country, although natural cities could be defined with other big data such as social media location data and nighttime images Miao 2015, Jiang et al. 2015). In general, natural cities are defined as human settlements or human activities in general on Earth's surface that are objectively or naturally delineated from massive geographic information of various kinds, and based on the head/tail breaks -a relatively new classification for data with a heavy tailed distribution (Jiang 2013). ...
... Large amounts of social media data enable us to study human activities on very fine spatial and temporal scales (e.g., Jiang and Miao 2015). Instead of aggregating the data into existing geographic units such as census tracts, we assigned individual locations into auto-detected city blocks to study the spatial distribution of tweets in cities. ...
Preprint
Social media outlets such as Twitter constitute valuable data sources for understanding human activities in the virtual world from a geographic perspective. This paper examines spatial distribution of tweets and densities within cities. The cities refer to natural cities that are automatically aggregated from a country's small street blocks, so called city blocks. We adopted street blocks (rather than census tracts) as the basic geographic units and topological center (rather than geometric center) in order to assess how tweets and densities vary from the center to the peripheral border. We found that, within a city from the center to the periphery, the tweets first increase and then decrease, while the densities decrease in general. These increases and decreases fluctuate dramatically, and differ significantly from those if census tracts are used as the basic geographic units. We also found that the decrease of densities from the center to the periphery is less significant, and even disappears, if an arbitrarily defined city border is adopted. These findings prove that natural cities and their topological centers are better than their counterparts (conventionally defined cities and city centers) for geographic research. Based on this study, we believe that tweet densities can be a good surrogate of population densities. If this belief is proved to be true, social media data could help solve the dispute surrounding exponential or power function of urban population density. Keywords: Big data, natural cities, street blocks, urban density, topological distance
... The administrative division is determined by government agencies to serve politics and administration. However, there may be several subjective operations in the process of determining the boundary [20], which may result in some improper divisions. For urban citizens, their perception of urban space depends on their activities in the urban [21], which weakens the impression for official boundaries. ...
... There have been studies exploiting human activities across the urban space to assess the effectiveness of urban growth boundaries [24]. Following these, Jiang et al. [20] propose the 'natural city' which refers to the environment formed by human activities. Under these circumstances, existing studies have sought to employ a good proxy to characterize the human activities across urban regions, and then resort to the network-based approach to detect objective urban boundaries. ...
Article
Full-text available
Administrative divisions are regional divisions of the state for the purpose of hierarchical administration. In recent years, the process of urbanization has greatly promoted the urban development. This development is not only reflected in the expansion of urban areas but also in economic and social patterns. All these changes affect the way the urban operates. Then, a concern arising from the changing urban dynamics is that whether current administrative division accords with urban development? Existing studies conceptualize the urban space as the environment created by human activities, and elaborate the importance of urban boundaries respecting to human activities in urban management. Following this concept, we delineate the urban interior boundaries formed by human activities. Specifically, taking Xi’an in Shaanxi Province of China as an example, this study first explores the region-based human crowd mobility patterns to verify that human mobility can establish a stable correlation between regions, or capture the objective correlations between regions. Then, the above human crowd patterns have been found to be applicable for mining unusual urban regions from the perspective of anomaly detection, and empirical evidence has found that these regions are of great significance for understanding the urban spatial structure. Finally, we employ the community detection technology to naturally delimit the urban interior boundaries formed by human mobility, and make a comparison with the official urban boundaries. Some unexpected communities that are closely linked due to human activities appear from the results, and these findings help the urban planners re-examine the administrative division.
... To further explore the fht-index, we applied it to two case studies. The first case study involves 36 city sizes that follow Zipf's law (Zipf 1949) exactly: 1, 1/2, 1/3,…, and 1/36 (panel (a) of Figure 2) with an ht-index of 3. The second case study involves 8,106 natural cities with an ht-index of 7, derived from the social media Brightkite in the United States (panel (c) of Figure 2, Jiang and Miao 2015). For the first case study, appending the smallest values is pre-determined by the rank sizes, while for the second case study the smallest values are determined by a power law function of 5,03 . of the 8,016 city sizes. ...
Preprint
A fractal bears a complex structure that is reflected in a scaling hierarchy, indicating that there are far more small things than large ones. This scaling hierarchy can be effectively derived using head/tail breaks - a clustering and visualization tool for data with a heavy-tailed distribution - and quantified by an ht-index, indicating the number of clusters or hierarchical levels, a head/tail breaks-induced integer. However, this integral ht-index has been found to be less precise for many fractals at their different phrases of development. This paper refines the ht-index as a fraction to measure the scaling hierarchy of a fractal more precisely within a coherent whole, and further assigns a fractional ht-index - the fht-index - to an individual data value of a data series that represents the fractal. We developed two case studies to demonstrate the advantages of the fht-index, in comparison with the ht-index. We found that the fractional ht-index or fractional hierarchy in general can help characterize a fractal set or pattern in a much more precise manner. The index may help create intermediate map scales between two consecutive map scales. Keywords: Ht-index, fractal, scaling, complexity, fht-index
... The link is with a weight 1 since a location may represent more than one user (some users may have the same first or most frequent check-in), thus the weight equals to the number of pairs of socially connected users. For a city-city network, a city refers to a natural city (Jiang and Miao 2015). The natural city is formed by the clustered check-in location points with short edges (shorter than the arithmetic mean of all edge lengths) under a big triangulated irregular network (TIN) which is composed of all locations in a country. ...
Preprint
Location-based social media make it possible to understand social and geographic aspects of human activities. However, previous studies have mostly examined these two aspects separately without looking at how they are linked. The study aims to connect two aspects by investigating whether there is any correlation between social connections and users' check-in locations from a socio-geographic perspective. We constructed three types of networks: a people-people network, a location-location network, and a city-city network from former location-based social media Brightkite and Gowalla in the U.S., based on users' check-in locations and their friendships. We adopted some complexity science methods such as power-law detection and head/tail breaks classification method for analysis and visualization. Head/tail breaks recursively partitions data into a few large things in the head and many small things in the tail. By analyzing check-in locations, we found that users' check-in patterns are heterogeneous at both the individual and collective levels. We also discovered that users' first or most frequent chec-in locations can be the representatives of users' spatial information. The constructed networks based on these locations are very heterogeneous, as indicated by the high ht-index. Most importantly, the node degree of the networks correlates highly with the population at locations (mostly with R-square being 0.7) or cities (above 0.9). This correlation indicates that the geographic distributions of the social media users relate highly to their online social connections. Keywords: social networks, check-in locations, natural cities, power law, head/tail breaks, ht-index
... Modeling dynamic supply-demand network at a finer scale is a crucial aspect of enhancing supply chain resilience and management. In accordance with the Law of Spatial Heterogeneity (Grove & Burch, 1997), the heterogeneity of geographical spaces results in differentiated distributions of population and resources (Jiang and Miao, 2015;Ren et al., 2024;Shi et al., 2023), consequently leading to uneven development of public facilities across different areas (Luo et al., 2023). Under this context, evaluating the accuracy of spatial supply-demand matches is helpful for identifying hidden supply issues. ...
Article
Full-text available
Confronting the escalating challenge of emergencies, the urban supply network of daily necessity is an important defense line for human well-being. This study introduces a groundbreaking approach that leverages mobile signal data, departing from conventional static regional data, to model urban supply-demand network. Moreover, a significant stride in assessing network invulnerability is presented by incorporating cascade failure and emphasizing demand-side factors in attack strategy simulations. This approach marks a paradigm shift in network in-vulnerability simulation: moving from network topology characteristics to a human-centric approach, which helps better identify vulnerable zones. The model's robustness is corroborated through simulations informed by major disaster scenarios. The results found that: 1) Human spatial mobility promises large-scale, high-precision urban supply network modeling. 2) Supply nodes in secondary centers play a pivotal role in the overall network efficiency, some even surpassing the importance of central area nodes. 3) During various stages of cascade failure , the leading factors contributing to community supply shortages vary, with population density being the predominant factor. This research propels the methodology forward, incorporating multi-scenario simulations to augment practicality, and offers valuable insights for urban supply system enhancement.
Preprint
A city is a whole, as are all cities in a country. Within a whole, individual cities possess different degrees of wholeness, defined by Christopher Alexander as a life-giving order or simply a living structure. To characterize the wholeness and in particular to advocate for wholeness as an effective design principle, this paper develops a geographic representation that views cities as a whole. This geographic representation is topology-oriented, so fundamentally differs from existing geometry-based geographic representations. With the topological representation, all cities are abstracted as individual points and put into different hierarchical levels, according to their sizes and based on head/tail breaks - a classification scheme and visualization tool for data with a heavy tailed distribution. These points of different hierarchical levels are respectively used to create Thiessen polygons. Based on polygon-polygon relationships, we set up a complex network. In this network, small polygons point to adjacent large polygons at the same hierarchical level and contained polygons point to containing polygons across two consecutive hierarchical levels. We computed the degrees of wholeness for individual cities, and subsequently found that the degrees of wholeness possess both properties of differentiation and adaptation. To demonstrate, we developed four case studies of all China and UK natural cities, as well as Beijing and London natural cities, using massive amounts of street nodes and Tweet locations. The topological representation and the kind of topological analysis in general can be applied to any design or pattern, such as carpets, Baroque architecture and artifacts, and fractals in order to assess their beauty, echoing the introductory quote from Christopher Alexander. Keywords: Wholeness, natural cities, head/tail breaks, complex networks, scaling hierarchy, urban design
Preprint
In light of the emergence of big data, I have advocated and argued for a paradigm shift from Tobler's law to scaling law, from Euclidean geometry to fractal geometry, from Gaussian statistics to Paretian statistics, and, more importantly, from Descartes' mechanistic thinking to Alexander's organic thinking. The fractal geometry meant here falls under the third definition of fractal, that is, a set or pattern is fractal if the scaling of far more small things than large ones recurs multiple times (Jiang and Yin 2014), rather than under the second definition of fractal, which requires a power law between scales and details (Mandelbrot 1982). This new fractal geometry leans toward a living geometry that "follows the rules, constraints, and contingent conditions that are, inevitably, encountered in the real world" (Alexander et al. 2012, p. 395), serving not only to understand complexity but also to create complex or living structure (Alexander 2002-2005). This editorial attempts to clarify why the paradigm shift is essential and to elaborate on several concepts, including spatial heterogeneity (scaling law), scale (or the fourth meaning of scale), data character (in contrast to data quality), and sustainable transport in the big data era.
Article
We are now seeing governments and funding agencies looking at ways to increase the value and pace of scientific research through increased or open access to both data and publications. In this point-of-view article, we look at another aspect of these twin revolutions, namely, how to enable developers, designers and researchers to build intuitive, multimodal, user-centric scientific applications that can aid and enable scientific research.
Article
Geospatial analysis is very much dominated by a Gaussian way of thinking, which assumes that things in the world can be characterized by a well-defined mean, i.e., that things are more or less similar in size. However, this assumption is not always valid. In fact, many things in the world lack a well-defined mean; instead, there are far more small things than large ones. This paper argues that geospatial analysis requires a different way of thinking: a Paretian way of thinking grounded in skewed distributions such as power laws, Pareto and lognormal distributions. I review two properties, spatial dependence and spatial heterogeneity, and point out that the notion of spatial heterogeneity in current spatial statistics is only used to characterize the local variance of spatial dependence. I subsequently argue for a broad perspective on spatial heterogeneity, and suggest that it be formulated as a scaling law. I further discuss the implications of Paretian thinking and the scaling law for a better understanding of geographic forms and processes, in particular when facing massive amounts of social media data. In the spirit of Paretian thinking, geospatial analysis should seek to simulate geographic events and phenomena from the bottom up, rather than to seek correlations as guided by Gaussian thinking.
KEYWORDS: Big data, scaling of geographic space, head/tail breaks, power laws, heavy-tailed distributions
Book
Spatial trajectories have brought an unprecedented wealth of information to a variety of research communities. A spatial trajectory records the path of a moving object, such as a person who logs their travel routes as GPS trajectories. Research on moving objects has become extremely active within the last few years, especially in all major database and data mining conferences and journals. Computing with Spatial Trajectories introduces the algorithms, technologies, and systems used to process, manage and understand existing spatial trajectories for different applications. This book also presents an overview of both the fundamentals and the state-of-the-art research inspired by spatial trajectory data, with a special focus on trajectory pattern mining, spatio-temporal data mining and location-based social networks. Each chapter provides readers with a tutorial-style introduction to one important aspect of location trajectory computing, case studies and many valuable references to other relevant research work. Computing with Spatial Trajectories is designed as a reference or secondary textbook for advanced-level students and researchers mainly focused on computer science and geography. Professionals working on spatial trajectory computing will also find this book very useful.
Chapter
The ability to gather and manipulate real-world contextual data, such as user location, in modern software systems presents opportunities for new and exciting application areas. A key focus among those working on Location-Based Services today has been the creation of social networks that allow mobile device users to exchange details of their personal location as a key point of interaction. While initial interest in these services has been exceptionally high, they are plagued by the same challenges as all Location-Based Services regarding the privacy and security of users and their data. This chapter investigates Location-Based Social Networks (LBSNs), with a view to documenting how they contribute to a new form of expertise owing to now-accurate knowledge of where people are actually located at a given moment.
Conference Paper
This presentation will set out the eScience agenda by explaining the current scientific data deluge and the case for a “Fourth Paradigm” for scientific exploration. Examples of data intensive science will be used to illustrate the explosion of data and the associated new challenges for data capture, curation, analysis, and sharing. The role of cloud computing, collaboration services, and research repositories will be discussed.
Article
History tells us that when you want something done you turn to a leader: right? Wrong. If you want to make a correct decision or solve a problem, large groups of people are smarter than a few experts. This brilliant and insightful book shows why the conventional wisdom is so wrong and why the theory of the wisdom of crowds has huge implications for how we run our businesses, structure our political systems and organise our society. Shrewd, meticulous and profound, The Wisdom of Crowds will change for ever the way you think about human behaviour.
Article
Four phases of interest in the distribution of city sizes are identified and current conflict in the literature is shown to be a consequence of poorly-selected units of observation. When urban regions are properly defined, US urban growth obeys Gibrat’s Law and the city size distribution is strictly Zipfian rank-size with coefficient q = 1.0. Care has to be taken with definition of the largest urban-economic regions, however; the fit in the upper tail of the distribution is best when they are recognized to be megalopolitan in scale.
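Under a strict rank-size rule, the city of rank r has a size proportional to r^(-q), so q = 1.0 is the Zipfian case described above. A simple and widely used way to estimate q, assumed here and not necessarily the procedure behind the reported result, is an ordinary least-squares fit of log(size) on log(rank). The sketch below takes already-delineated urban regions as input (defining those units properly is the abstract's main point) and ignores the known small-sample bias of log-log regression. The function name `rank_size_exponent` is hypothetical.

```python
import math


def rank_size_exponent(sizes):
    """Estimate q in the rank-size rule size ~ C * rank**(-q).

    Fits log(size) = log(C) - q * log(rank) by ordinary least squares.
    """
    ranked = sorted(sizes, reverse=True)
    if len(ranked) < 2:
        raise ValueError("need at least two sizes")
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(size) for size in ranked]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return -sxy / sxx  # the fitted slope is -q


# Synthetic, strictly Zipfian sizes: size = 1e6 / rank, so q should be 1.
sizes = [1e6 / rank for rank in range(1, 201)]
print(round(rank_size_exponent(sizes), 6))  # 1.0
```

On real city data the estimate depends strongly on how the units are drawn and where the tail is cut, which is why the choice of urban regions matters so much in this literature.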