All content in this area was uploaded by Raja Jurdak on Aug 04, 2015
Understanding Human Mobility
•Significance
–Urban planning
–Transport
–Disease spread
•Issues with current data sources
–Census: coarse-grained
–GPS/Call data records/Wifi: proprietary/private
•Geo-tagged tweets as a proxy
–500 Million users
–340 Million tweets/day
–Up to 10m accuracy
Open Issues with Geo-Tagged Tweets as a Mobility Proxy
•Potential sampling bias
–Demographic
–Geographic
•Content limit on tweets
–140 characters/tweet
–Unknown effect of this limit on tweet locations
•Tweet location preference
–Unclear how it can affect observed movement patterns
Overview of this work
•Determine how representative Twitter-based mobility patterns are of population-level and individual-level movement
•Analyse a large dataset with 7,811,004 tweets from 156,607 Twitter users
•Compare the mobility patterns observed through Twitter with the patterns observed through other technologies, such as call data records
Displacement Distribution
The displacement distribution, or spatial dispersal kernel, P(d), where d is the distance between two consecutive reported locations of a user.
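The displacement d can be sketched as the great-circle distance between consecutive geo-tags. The following is a minimal illustration, not the authors' code: the haversine formula, the mean Earth radius of 6371 km, and the function names `haversine_km` and `displacements_km` are all assumptions made for the example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def displacements_km(points):
    """Displacements d between consecutive points of one user's trajectory.

    `points` is a time-ordered list of (lat, lon) tuples (hypothetical input format).
    """
    return [haversine_km(*p, *q) for p, q in zip(points, points[1:])]
```

Pooling the output of `displacements_km` over all users and binning it logarithmically would give an empirical estimate of P(d).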
Previously observed distributions:
•Power-law (banknotes)
•Truncated power-law (mobile phones & travel surveys)
•Exponential and log-normal (GPS from cars/taxis)
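Mixed exponential-stretched exponential fitting
$P(d) = y_1 + y_2$, with $y_1 \sim e^{-0.073 d}$ and $y_2 \sim d^{\beta - 1} e^{-0.011 d^{\beta}}$, $\beta \approx 0.55$ (fitted values as shown in the legend of the displacement plot)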
•May stem from multiplicative processes, i.e. the displacement d is determined by the product of k random variables
•These random variables can be transportation cost, lifestyle aspects such as the preferred commute distance, or socio-economic status such as personal income.
•The number of these variables k, i.e. the number of levels in the multiplicative cascade, is indicated by the exponent β in the above equation.
•When k is small, P(d) converges asymptotically to a stretched exponential, while k → +∞ leads to the classic log-normal distribution. In particular, if these random variables are Gaussian distributed, we have k ≈ 2/β, so k is around 3 or 4 for our data (β ≈ 0.55).
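As a quick check of the arithmetic in the last bullet, substituting the fitted exponent into the Gaussian-case relation gives:

```latex
k \approx \frac{2}{\beta} = \frac{2}{0.55} \approx 3.6
```

which lies between 3 and 4, consistent with the range quoted above.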
What could be different here?
•2 separate power laws
•Differences between short and long distance travel patterns
Another fitting possibility
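Radius of Gyration
•Quantifies the spatial stretch of an individual trajectory or the traveling scale of an individual:
$r_g = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (\vec{r}_i - \vec{r}_{cm})^2}$
where $\vec{r}_i$ is the individual's i-th location, $\vec{r}_{cm} = \frac{1}{n}\sum_{i=1}^{n} \vec{r}_i$ is the geometric center of the trajectory and n is the number of locations in the trajectory.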
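The definition above can be sketched in a few lines. This is an illustrative sketch only: it assumes the locations have already been projected to planar (x, y) coordinates in km, which the slides do not specify, and the name `radius_of_gyration` is hypothetical.

```python
import math


def radius_of_gyration(points):
    """Root-mean-square distance of the points from their geometric center.

    `points` is a non-empty list of planar (x, y) tuples in km
    (assumed to be already projected from latitude/longitude).
    """
    n = len(points)
    # Geometric center of the trajectory
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    # Mean squared distance to the center, then square root
    mean_sq = sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points) / n
    return math.sqrt(mean_sq)
```

For example, a user who alternates between two places 2 km apart has a center midway between them and a radius of gyration of 1 km.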
Rg Distribution
[Figure: four log-log panels]
a: P(d) vs d, data with fit $y = y_1 + y_2$, $y_1 \sim e^{-0.073x}$, $y_2 \sim x^{-0.45} e^{-0.011 x^{0.55}}$; $y_3 \sim x^{-1.32}$
b: P(d) vs d, data with $y_1 \sim x^{-0.77}$, $y_2 \sim x^{-2.07}$
c: P(r_g) vs r_g, data with fit $y = y_1 + y_2$, $y_1 \sim e^{-0.12x}$, $y_2 \sim x^{-0.23} e^{-0.0015 x^{0.77}}$; $y_3 \sim x^{-1.11}$
d: P(r_g) vs r_g, data with $y_1 \sim x^{-0.40}$, $y_2 \sim x^{-1.60}$
Axes: d and r_g from 10^1 to 10^7; P(d) from 10^-9 to 10^0; P(r_g) from 10^-7 to 10^0
First passage time
•Fpt(t), i.e. the probability of finding a user at the same location after a period of t
•Similar to CDR trends
•Periodic behavior with daily cycles
•Content limit does not appear to affect Fpt
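One way to estimate such a return-time distribution for a single user is sketched below. The slides do not give the estimation procedure, so this is an assumption-laden illustration: it counts, for each observation, the time until the next observation at the same location, bins the lags by hour, and normalizes. The input format and the name `return_time_histogram` are hypothetical.

```python
from collections import Counter


def return_time_histogram(visits, bin_hours=1.0):
    """Empirical distribution of the time until the next visit to the same place.

    `visits` is a time-ordered list of (t_hours, location_id) tuples for one user.
    Returns {lag_bin: probability}, where lag_bin counts bins of `bin_hours`.
    """
    counts = Counter()
    for i, (t0, loc0) in enumerate(visits):
        for t1, loc1 in visits[i + 1:]:
            if loc1 == loc0:
                counts[int((t1 - t0) // bin_hours)] += 1
                break  # keep only the first return after this observation
    total = sum(counts.values())
    if total == 0:
        return {}
    return {lag: c / total for lag, c in sorted(counts.items())}
```

A user who tweets from the same place at the same hour every day would put all probability mass at a lag of 24 hours, which is the kind of daily periodicity noted in the bullets.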
Preferential return to visited locations
•The probability function P(L) of finding an individual at his/her L-th most visited location.
•Perform spatial clustering (250m) of visited locations and sort them by visit frequency
•Generally follows Zipf's law of preferential return
•People are 50% likely to tweet from their most visited location, higher than in other data sources
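The rank-frequency function P(L) can be sketched as follows. This is a minimal illustration that assumes the 250m spatial clustering has already been applied, so each tweet carries a cluster identifier; the function name `visit_rank_probabilities` and the input format are hypothetical.

```python
from collections import Counter


def visit_rank_probabilities(location_ids):
    """Return [P(1), P(2), ...]: the share of visits at the L-th most visited location.

    `location_ids` is a non-empty list of cluster identifiers, one per tweet
    (assumed to come from a prior spatial clustering step).
    """
    counts = Counter(location_ids)
    total = len(location_ids)
    ranked = sorted(counts.values(), reverse=True)
    return [c / total for c in ranked]
```

The first element of the result is the probability of tweeting from the most visited location, the quantity reported as about 50% above; a Zipf-like pattern would show up as a roughly straight line when the list is plotted against rank on log-log axes.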
Predictability of Tweet Locations
•Study the randomness (entropy) and predictability of the sequence of tweeting locations for each user
•Bi-modal distribution for users with >20 locations suggests 2 types of users
[Figure panels: Entropy; Predictability]
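Two simple entropy measures of a user's location sequence are sketched below. The slides do not state which entropy estimator was used, so these are assumptions: the random entropy (all visited locations equally likely) and the Shannon entropy of visit frequencies, which ignores the order of visits. Deriving a predictability bound from an entropy value is not shown. The function names are hypothetical.

```python
import math
from collections import Counter


def random_entropy(location_ids):
    """log2 of the number of distinct locations (every location equally likely)."""
    return math.log2(len(set(location_ids)))


def shannon_entropy(location_ids):
    """Shannon entropy in bits of the visit-frequency distribution (order ignored)."""
    n = len(location_ids)
    return -sum((c / n) * math.log2(c / n)
                for c in Counter(location_ids).values())
```

A user who always tweets from one place has zero entropy under both measures, while a user who splits tweets evenly between two places has one bit.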
Probabilistic Modeling of Human Movement
[Figure panels: Rg: 1-10 km; Rg: 10-100 km; Rg: 100-500 km; Rg: 500-1000 km]
Isotropy of Motion Patterns
•Isotropy ratio σ = δy/δx, where δy is the standard deviation of P(x, y) along the y-axis and δx is the standard deviation of P(x, y) along the x-axis, to characterise the orbit of each rg group
•Second peak at ~1000 km differentiates results from previous studies
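The isotropy ratio can be sketched directly from its definition. This example assumes the positions of an rg group are already expressed in the (x, y) reference frame used for P(x, y); how that frame is chosen is not described in the slides. The function name `isotropy_ratio` is hypothetical.

```python
import statistics


def isotropy_ratio(xs, ys):
    """sigma = std(y) / std(x) for positions given in an assumed common frame.

    `xs` and `ys` are equal-length sequences of coordinates; the spread
    along x must be non-zero.
    """
    return statistics.pstdev(ys) / statistics.pstdev(xs)
```

A ratio near 1 indicates roughly isotropic movement, while a ratio well below 1 indicates movement stretched along the x-axis.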
Preferential return decreases with larger orbits
•Likelihood to tweet from home location drops with increasing rg
•Exponent α also decreases with increasing rg
Twitter-based Mobility Patterns
[Figure: four map panels (a-d), each labelled with Melbourne, Sydney and Brisbane; panel titles Rg: 1-10 km; Rg: 10-100 km; Rg: 100-500 km; Rg: 500-1000 km]
Discussion
•Three observed modes of mobility
–Intrasite
–Metropolitan
–Intercity
•Two apparent groups of tweeters
–Highly predictable group where geo-tags are not highly useful for mobility prediction
–Less predictable group where geo-tagged tweets can be representative of movement patterns
Discussion (continued)
•Long-distance movers are more diffusive in their movement than intermediate-distance movers, most likely reflecting a switch in transportation mode towards air travel and local circulation around destination cities.
•Preferential return depends strongly on a person's orbit of movement, with long-distance movers less likely to return to previously visited locations.
•Population-level mobility patterns are well represented by geo-tagged tweets, while individual-level patterns are more sensitive to contextual factors
Implications and Future Work
•Develop an agent-based model for disease spread based on rg group features
•Use tweets to better understand drivers for movement
•Use location inference algorithms on tweet content to enlarge the data sample, at the cost of higher uncertainty per location