Understanding Human Mobility from Twitter
Raja Jurdak, Kun Zhao, Jiajun Liu, Maurice Abou Jaoude, Mark Cameron, David Newth
PLoS ONE 10(7): e0131469. doi:10.1371/journal.pone.0131469
http://127.0.0.1:8081/plosone/article?id=info:doi/10.1371/journal.pone.0131469
Understanding Human Mobility
Significance
Urban planning
Transport
Disease spread
Issues with current data sources
Census: coarse-grained
GPS/Call data records/Wifi: proprietary/private
Geo-tagged tweets as a proxy
500 Million users
340 Million tweets/day
Up to 10m accuracy
Open Issues with Geo-Tagged Tweets as
Mobility Proxy
Potential sampling bias
Demographic
Geographic
Content limit on tweets
140 characters/tweet
Unknown effect of this limit on tweet locations
Tweet location preference
Unclear how it can affect observed movement patterns
Overview of this work
Determine how representative Twitter-based
mobility patterns are of population-level and
individual-level movement
Analyse a large dataset with 7,811,004 tweets
from 156,607 Twitter users
Compare the mobility patterns observed
through Twitter with the patterns observed
through other technologies, such as call data
records
Displacement Distribution
Displacement distribution, namely spatial dispersal kernel P(d), where d is the distance
between a user’s two consecutive reported locations.
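A minimal sketch of how the displacements d could be extracted from one user's time-ordered geo-tags. The haversine great-circle distance and the sample coordinates are assumptions for illustration; the slides do not state which distance measure was used.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def displacements(locations):
    """Distance d between each pair of consecutive reported locations."""
    return [haversine_km(a, b) for a, b in zip(locations, locations[1:])]


# Hypothetical time-ordered geo-tags for one user: Sydney, Sydney, Melbourne
user_locations = [(-33.8688, 151.2093), (-33.8688, 151.2093), (-37.8136, 144.9631)]
d = displacements(user_locations)
```

Pooling these values over all users and binning them logarithmically would give an empirical estimate of P(d).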
Previously observed
distributions:
Power-law (banknotes)
Truncated power-law
(mobile phones & travel
surveys)
Exponential and log-
normal (GPS from
cars/taxis)
Mixed exponential-stretched
exponential fitting
May stem from multiplicative processes, i.e. the displacement d is determined by the product
of k random variables
These random variables can be transportation cost, lifestyle aspects such as the preferred
commute distance, or socio-economic status such as personal income.
The number of these variables k, namely the number of levels in the multiplicative cascade,
is indicated by the stretch exponent β of the stretched-exponential term in the fit.
When k is small, P(d) converges to a stretched-exponential asymptotically, and k → +∞ leads
to the classic log-normal distribution. In particular, if these random variables are Gaussian
distributed, we have k ≈ 2/β, and the value of k is around 3 or 4 for our data (β ≈ 0.55).
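The relation k ≈ 2/β and the shape of a mixed exponential plus stretched-exponential kernel can be sketched as follows. The functional form is an assumption: an exponential density plus a Weibull-type stretched-exponential density, which matches the legend values 0.073, 0.011 and β = 0.55 only under the assumed signs. The mixing weight `w` is hypothetical, and the units of d are not stated in the slides.

```python
import math


def mixed_kernel(d, w, lam1, lam2, beta):
    """Weighted sum of an exponential and a stretched-exponential density (assumed form)."""
    exponential = lam1 * math.exp(-lam1 * d)
    stretched = beta * lam2 * d ** (beta - 1) * math.exp(-lam2 * d ** beta)
    return w * exponential + (1 - w) * stretched


def cascade_levels(beta):
    """Number of cascade levels k for Gaussian-distributed variables, k ≈ 2/β."""
    return 2.0 / beta


k = cascade_levels(0.55)  # between 3 and 4, as stated in the text

# Hypothetical weight w = 0.5; rates taken from the fit legend
p_near = mixed_kernel(1.0, w=0.5, lam1=0.073, lam2=0.011, beta=0.55)
p_far = mixed_kernel(1000.0, w=0.5, lam1=0.073, lam2=0.011, beta=0.55)
```

At large d the exponential term is negligible and the stretched-exponential term dominates the tail.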
What could be different here?
2 separate power
laws
Differences
between short and
long distance
travel patterns
Another fitting possibility
Radius of Gyration
Quantifies the spatial stretch of an individual
trajectory or the traveling scale of an individual
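$r_g = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \lVert \mathbf{r}_i - \mathbf{r}_{cm} \rVert^2}$
where $\mathbf{r}_i$ is the individual’s i-th location,
$\mathbf{r}_{cm} = \frac{1}{n}\sum_{i=1}^{n}\mathbf{r}_i$ is the geometric center of the
trajectory and n is the number of locations in the
trajectory.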
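The radius of gyration is commonly computed as the root-mean-square distance of the locations from their geometric center. A minimal sketch under that assumption, using hypothetical planar (x, y) coordinates in km rather than raw latitude and longitude:

```python
import math


def radius_of_gyration(points):
    """Root-mean-square distance of (x, y) points from their geometric center."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    return math.sqrt(sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points) / n)


# Hypothetical trajectory: four locations at the corners of a 6 km x 8 km rectangle
trajectory = [(0.0, 0.0), (6.0, 0.0), (0.0, 8.0), (6.0, 8.0)]
rg = radius_of_gyration(trajectory)  # center is (3, 4), every corner is 5 km away
```

For real geo-tags the coordinates would first need projecting to a planar system, or the distances replaced by great-circle distances.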
Rg Distribution
[Figure, log-log axes, four panels]
(a) P(d) versus d: data with mixed fit y = y1 + y2 and power law y3; legend values 0.073 (y1); 0.45, 0.011, 0.55 (y2); 1.32 (y3)
(b) P(d) versus d: data with two power-law fits y1 and y2; exponents 0.77 and 2.07
(c) P(rg) versus rg: data with mixed fit y = y1 + y2 and power law y3; legend values 0.12 (y1); 0.23, 0.0015, 0.77 (y2); 1.11 (y3)
(d) P(rg) versus rg: data with two power-law fits y1 and y2; exponents 0.40 and 1.60
First passage time
Fpt(t), i.e. the probability of finding a user
at the same location after a period of t
Similar to CDR trends
Periodic behavior with
daily cycles
Content limit does not
appear to affect Fpt
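A minimal sketch of estimating Fpt(t) as defined above, that is, the fraction of observation pairs separated by t that share a location. The record format (integer hours plus a location label) and the sample data are hypothetical, and this follows the definition on this slide rather than a strict first-return time.

```python
from collections import Counter


def return_probability(records, max_lag_hours):
    """records: time-ordered list of (hour, location_id) for one user.

    Returns {lag: fraction of observation pairs 'lag' hours apart at the same location}.
    """
    same = Counter()
    total = Counter()
    for i, (t_i, loc_i) in enumerate(records):
        for t_j, loc_j in records[i + 1:]:
            lag = t_j - t_i
            if lag > max_lag_hours:
                break  # records are time-ordered, so later lags only grow
            total[lag] += 1
            same[lag] += loc_i == loc_j
    return {lag: same[lag] / total[lag] for lag in sorted(total)}


# Hypothetical user alternating between home and work every 12 hours
records = [(0, "home"), (12, "work"), (24, "home"), (36, "work"), (48, "home")]
fpt = return_probability(records, max_lag_hours=48)
```

In this toy sequence the probability peaks at lags of 24 and 48 hours, which is the kind of daily periodicity described on the slide.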
Preferential return to visited locations
The probability function P(L) of
finding an individual at his/her L-th
most visited location.
Sort visited locations and perform
spatial clustering (250m)
Generally follows Zipf's law of
preferential return
People are 50% likely to tweet
from their most visited location, higher
than in other data sources
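A minimal sketch of computing P(L) for one user. Snapping points to a 250 m grid is used here as a simple stand-in for the spatial clustering step, which the slides do not specify; the planar coordinates in metres and the sample points are hypothetical.

```python
from collections import Counter


def visit_rank_distribution(points_m, cell_m=250.0):
    """Visit probabilities ordered from the most visited to the least visited cell.

    points_m: (x, y) positions in metres; each is snapped to a cell_m grid cell.
    """
    cells = Counter((int(x // cell_m), int(y // cell_m)) for x, y in points_m)
    total = sum(cells.values())
    return [count / total for _, count in cells.most_common()]


# Hypothetical tweet positions: five near one place, two near a second, one elsewhere
points = [(10.0, 20.0), (40.0, 60.0), (100.0, 200.0), (30.0, 10.0),
          (900.0, 900.0), (950.0, 910.0), (5000.0, 5000.0), (120.0, 30.0)]
p_of_l = visit_rank_distribution(points)  # p_of_l[0] is P(L=1)
```

Plotting these probabilities against the rank L on log-log axes is the usual way to check for Zipf-like decay.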
Predictability of Tweet Locations
Study the randomness (entropy) and predictability of the sequence
of tweeting locations for each user
Bi-modal distribution for users with >20 locations suggests 2 types
of users
Entropy Predictability
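The slides do not state which entropy estimator was used. As an illustration only, the sketch below computes the time-uncorrelated Shannon entropy of a location sequence and bounds the predictability by solving Fano's inequality numerically. Both choices are assumptions, and the function names are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(sequence):
    """Shannon entropy (bits) of the visit-frequency distribution."""
    n = len(sequence)
    return -sum(c / n * math.log2(c / n) for c in Counter(sequence).values())


def max_predictability(entropy, n_locations, tol=1e-9):
    """Solve entropy = H(p) + (1 - p) * log2(N - 1) for p by bisection."""
    if n_locations < 2:
        return 1.0

    def fano(p):
        h = 0.0 if p >= 1.0 else -p * math.log2(p) - (1 - p) * math.log2(1 - p)
        return h + (1 - p) * math.log2(n_locations - 1)

    lo, hi = 1.0 / n_locations, 1.0  # fano() decreases from log2(N) to 0 on this range
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if fano(mid) > entropy:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Hypothetical sequence: six tweets from home, two from work
seq = ["home"] * 6 + ["work"] * 2
s = shannon_entropy(seq)
pi_max = max_predictability(s, n_locations=len(set(seq)))
```

A user who tweets mostly from one place has low entropy and a high predictability bound, while a user spread evenly over many places has a bound close to 1/N.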
Probabilistic Modeling of Human
Movement
Rg: 1-10 km, 10-100 km, 100-500 km, 500-1000 km
Isotropy of Motion Patterns
Isotropy ratio σ = δy/δx, where δy is
the standard deviation of P(x, y) along
the y-axis and δx is the standard
deviation of P(x, y) along the x-axis, to
characterise the orbit of each rg group
Second peak at ~1000 km
differentiates results from previous
studies
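A minimal sketch of the isotropy ratio σ = δy/δx, with a set of sample positions standing in for the density P(x, y). It assumes the positions are already expressed in the reference frame used for each rg group; the sample points are hypothetical.

```python
import statistics


def isotropy_ratio(points):
    """Ratio of the standard deviation along y to the standard deviation along x."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return statistics.pstdev(ys) / statistics.pstdev(xs)


# Hypothetical positions spread four times wider along x than along y
points = [(-4.0, -1.0), (4.0, 1.0), (-4.0, 1.0), (4.0, -1.0)]
sigma = isotropy_ratio(points)
```

A value near 1 indicates roughly isotropic movement, while a value well below 1 indicates movement stretched along the x-axis.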
Preferential return decreases with larger
orbits
Likelihood to tweet
from home location
drops with
increasing rg
Exponent α also
decreases with
increasing rg
[Figure: four map panels (a-d), each labelled with Melbourne, Sydney and Brisbane]
Twitter-based Mobility Patterns
Rg: 1-10 km, 10-100 km, 100-500 km, 500-1000 km
Discussion
Three observed modes of mobility
Intrasite
Metropolitan
Intercity
Two apparent groups of tweeters
Highly predictable group where geo-tags are not
highly useful for mobility prediction
Less predictable group where geo-tagged tweets can
be representative of movement patterns
…Discussion
Long distance movers more diffusive in their movement than
intermediate distance movers, most likely as a reflection of a
switch in transportation mode towards air travel and local
circulation around destination cities.
Preferential return strongly dependent on a person’s orbit of
movement, with long distance movers less likely to return to
previously visited locations.
Population-level mobility patterns are well-represented by geo-
tagged tweets, while individual-level patterns are more sensitive
to contextual factors
Implications and Future Work
Develop agent-based model for disease
spread based on rg group features
Use tweets to better understand drivers
for movement
Use location inference algorithms on tweet
content to enlarge the data sample, at the cost of
higher uncertainty per location
