All content in this area was uploaded by Giancarlos Parady on Aug 06, 2021
The Third Bridging Transportation Researchers (BTR) Online Free Conference
5-6 August 2021
http://bridgingtransport.org/
The effectiveness of using Google Maps Location History data to detect joint activities in social networks
Giancarlos Parady*
The University of Tokyo
Yuki Oyama
Shibaura Institute of Technology
Makoto Chikaraishi
Hiroshima University
Keita Suzuki
IBM Japan
The context (1): Passive survey methods
▪Google Maps Location History data (hereinafter GLH data) is appealing to researchers on human mobility and
transportation
▪Key findings from the literature:
Against GPS data:
✓GLH data for Android users and GPS systems had an 85% agreement when spatially aggregating the data to a 100 m x 100 m mesh (Ruktanonchai et al., 2018)
✓Macarulla Rodriguez et al. (2018) evaluated GLH location accuracy on different networks (2G, 3G, Wi-Fi) and for mobile devices with GPS. GPS yielded the best performance, followed by 3G and 2G, respectively, with Wi-Fi having the worst performance.
Against observed ground truth data (Cools et al., 2021):
✓GLH overall detection rate of 51% and overall trip detection rate of 32%
✓Shorter dwell times were more likely to be missed
✓iPhones underperformed against Androids (28% vs 57%)
The context (2): Joint activities
▪Social interactions (or social activities) account for a significant share of trips and are one of the fastest growing
segments of travel (Axhausen, 2005)
▪Joint trips account for 40% to 60% of all out-of-home activities in Japan (Qian et al., 2019)
▪Collecting data on joint activities remains a difficult task
✓The aim is the group, not the individual
✓Response burden
✓High spatiotemporal variability of social activities (in particular, of leisure) requires long periods of observation
Yearly average interaction frequency by geographical distance of up to 10,000 km for six different cities (Parady et al., 2021)
Research objective
Evaluate the potential of GLH data to identify joint activities in networks
Experiment protocol
▪On each experiment day, 4 participants were asked to execute a schedule designed by the research team
▪Schedules were on average 8 hours long and were designed considering three factors
| Variable | Lv. | Definition | Additional explanation | Allocation rule |
|---|---|---|---|---|
| Duration | 1 | 15 to 29 minutes | | Random allocation |
| | 2 | 30 to 44 minutes | | |
| | 3 | Over 45 minutes | | |
| Floor area ratio | 1 | Open space | | Assigned by rule |
| | 2 | Indoors - low density | FAR < 300% | |
| | 3 | Indoors - mid density | 300% < FAR < 700% | |
| | 4 | Indoors - high density | FAR > 700% | |
| Group size | 1 | 1 person | | Assigned by rule |
| | 2 | 2 persons | | |
| | 3 | 3 persons | | |
| | 4 | 4 persons | | |
| Device | 1 | Android | | Available to all |
| | 2 | iPhone | | |
| Wi-fi setting | 1 | On | | Available to all |
| | 2 | Off | | |
Factors controlled for during the schedule design

| Equipment | Model/OS (confirmation pending) |
|---|---|
| Android phone 1 | Sharp Aquos sense basic 702SH, Android™ 8.0, Google Maps latest version as of Dec. 5, 2020 |
| Android phone 2 | Kyocera Digno® J, Android™ 8.1, Google Maps latest version as of Dec. 5, 2020 |
| iPhone 1 | Apple iPhone XR, iOS 13.1.2~14.0.1, Google Maps latest version as of Dec. 5, 2020 |
| iPhone 2 | Apple iPhone 6s, iOS 12.4.1~14.1, Google Maps latest version as of Dec. 5, 2020 |
| GPS logger | GNS 3000 |
Specifications of equipment provided to each participant
Measuring accuracy: Spatial accuracy
Ground truth data plotted against Google Maps Location History data.
▪Spatial accuracy $s$ was measured as the Euclidean distance between the true location and the estimated location.
▪The range of $s$ is $[0, \infty)$, where 0 indicates perfect accuracy.
▪Indoor locations: the coordinates of the facility centroid were used as the true location.
▪Open spaces: a polygon of the perimeter was drawn, and distance was measured from the estimated location coordinates to the nearest point of the perimeter polygon.
▪Alternative measure: Google Place Id match
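The two distance rules above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the mean Earth radius, and the local equirectangular projection used to turn latitude/longitude into planar metres are all assumptions of this sketch. The open-space rule is implemented literally as distance to the perimeter; whether an estimate falling inside the polygon should instead count as zero is not stated on the slide.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; an assumption of this sketch


def to_local_xy(lat, lon, ref_lat, ref_lon):
    """Project (lat, lon) in degrees to planar metres around a reference point."""
    x = math.radians(lon - ref_lon) * EARTH_RADIUS_M * math.cos(math.radians(ref_lat))
    y = math.radians(lat - ref_lat) * EARTH_RADIUS_M
    return x, y


def point_distance(p, q):
    """Euclidean distance between two planar points (indoor rule: q is the centroid)."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def point_segment_distance(p, a, b):
    """Distance from point p to the closest point on segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:  # degenerate edge: a and b coincide
        return point_distance(p, a)
    u = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg_len_sq
    u = max(0.0, min(1.0, u))  # clamp the projection onto the segment
    return point_distance(p, (a[0] + u * dx, a[1] + u * dy))


def perimeter_distance(p, polygon):
    """Open-space rule: distance from p to the nearest point of the perimeter."""
    edges = zip(polygon, polygon[1:] + polygon[:1])
    return min(point_segment_distance(p, a, b) for a, b in edges)


# Hypothetical coordinates, already in planar metres.
s_indoor = point_distance((30.0, 40.0), (0.0, 0.0))  # 50.0
park = [(0.0, 0.0), (100.0, 0.0), (100.0, 100.0), (0.0, 100.0)]
s_open = perimeter_distance((130.0, 50.0), park)  # 30.0
```

A local planar projection is adequate here only because the distances of interest are at most a few hundred metres; over longer ranges a proper map projection would be the safer choice.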
Measuring accuracy: Temporal accuracy
[Figure: Ground truth data plotted against Google Maps Location History data. Along a time axis t, four cases compare a ground-truth stay with the corresponding GLH stay: Missing observation (no GLH data), Intersect, Divergence (arrival), and Divergence (departure).]
Measuring accuracy: Activity detection rate
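▪Define an activity $a = (\mathbf{x}_a, O_a)$
where
$\mathbf{x}_a$ is a vector of ground truth data activity attributes
$O_a$ is the set of observations of individuals who participated in activity $a$
▪An individual observation is defined as $o_{ia} = (w_{ia}, d_{ia}, g_{ia}, s_{ia}, t_{ia})$, with $o_{ia} \in O_a$
where
$w_{ia}$ and $d_{ia}$ are respectively the wi-fi setting and device of individual $i$,
$g_{ia}$ is the identified Google Place Id for $o_{ia}$,
$s_{ia}$ and $t_{ia}$ are measures of spatial and temporal accuracy
▪An individual activity detection of $o_{ia}$ within a spatial accuracy threshold $S$ and a temporal accuracy threshold $T$ is defined as:
$$\delta_{ia}(S,T) = \begin{cases} 1 & \text{if } s_{ia} \le S \text{ and } t_{ia} \text{ meets threshold } T \\ 0 & \text{otherwise} \end{cases}$$
Measuring accuracy: Activity detection rate
▪When Google Place Id match is used as a spatial accuracy measure instead of Euclidean distance, it is replaced by:
$$\delta_{ia}(T) = \begin{cases} 1 & \text{if } g_{ia} = g_a \text{ and } t_{ia} \text{ meets threshold } T \\ 0 & \text{otherwise} \end{cases}$$
where $g_a$ is the Google Place Id of the ground truth location of activity $a$
▪A group activity detection is then defined as the product of individual detections:
$$\Delta_a(S,T) = \prod_{o_{ia} \in O_a} \delta_{ia}(S,T)$$
▪Finally, we obtain the activity detection rate for $n$-group-sized activities within spatial accuracy threshold $S$ and temporal accuracy threshold $T$, which is defined as:
$$R_n(S,T) = \frac{1}{N_n} \sum_{a:\,|O_a| = n} \Delta_a(S,T)$$
where $N_n$ indicates the total number of $n$-group-sized activity observations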
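The chain from individual detection to group detection to detection rate can be sketched in a few lines. This is an illustrative sketch only: the `Observation` class and all function names are hypothetical, and the direction of the temporal criterion (a score where higher is better, so detection requires the score to be at least T) is an assumption, since the slides do not state how temporal accuracy is scaled.

```python
from dataclasses import dataclass
from math import prod


@dataclass
class Observation:
    s: float           # spatial accuracy: distance in metres (0 is perfect)
    t: float           # temporal accuracy: ASSUMED score where higher is better
    place_match: bool  # GLH Place Id equals the ground truth Place Id


def detected(obs, T, S=None):
    """Individual detection indicator (1 or 0).

    If S is None, the Google Place Id match replaces the distance criterion.
    """
    spatial_ok = obs.place_match if S is None else obs.s <= S
    temporal_ok = obs.t >= T  # ASSUMPTION: direction of the temporal criterion
    return int(spatial_ok and temporal_ok)


def group_detected(observations, T, S=None):
    """Group detection: product of the members' individual detections."""
    return prod(detected(o, T, S) for o in observations)


def detection_rate(activities, n, T, S=None):
    """Share of n-person activities in which every member was detected."""
    groups = [obs for obs in activities if len(obs) == n]
    if not groups:
        return float("nan")
    return sum(group_detected(g, T, S) for g in groups) / len(groups)


# Hypothetical data: two 2-person activities and one 1-person activity.
activities = [
    [Observation(20.0, 0.9, True), Observation(150.0, 0.7, False)],
    [Observation(40.0, 0.8, True), Observation(60.0, 0.85, True)],
    [Observation(10.0, 0.95, True)],
]
rate_100m = detection_rate(activities, n=2, T=0.6, S=100.0)  # 0.5
rate_200m = detection_rate(activities, n=2, T=0.6, S=200.0)  # 1.0
rate_place = detection_rate(activities, n=2, T=0.6)          # 0.5
```

Because the group indicator is a product, a single missed member is enough to miss the whole joint activity, which is why group size matters for the rate.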
Aggregate results
▪Clear differences between Android devices and iPhones
▪These differences in performance are much more marked than those reported by Cools et al. (2021)
▪Activity detection rates for Android devices:
✓Google Place Id match: 22% to 25.7%, depending on the threshold setting
✓Spatial and temporal accuracy thresholds: 38% to 46.4%, depending on the threshold setting
▪Larger groups result in lower detection rates, but these
reductions are smaller than we expected.
Aggregate spatiotemporal accuracy given different spatiotemporal
threshold levels, devices, and group size
Modelling factors affecting detection probability
▪To evaluate the extent to which several factors affect detection probability, we estimated two sets of binary logit
models given different spatiotemporal accuracy thresholds:
✓Activity detection for individual devices: the dependent variable takes value 1 if the ground truth activity was detected given thresholds S, T, and 0 otherwise.
✓Joint activity detection: the dependent variable takes value 1 if and only if all members of the group were detected given thresholds S, T, and 0 otherwise.
•Composite 2-person, 3-person, and 4-person groups were generated by resampling from all possible permutations of the observed 2-person, 3-person, and 4-person groups
•This was done to mitigate potential biases that might remain after the scheduling process
•The sampling process was repeated 30 times, and estimated effect magnitudes were averaged over all iterations
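One possible reading of the resampling step is sketched below, under the assumption that each group member carried several devices, so that a composite group is formed by picking one device-level detection outcome per member. The data layout, the function names, and the use of the joint detection rate (rather than logit effect magnitudes) as the averaged statistic are illustrative choices, not the authors' procedure.

```python
import random
from statistics import mean


def composite_group(members, rng):
    """Pick one device-level detection indicator (0/1) per group member."""
    return [rng.choice(device_outcomes) for device_outcomes in members]


def joint_detection(indicators):
    """1 if and only if every member of the group was detected."""
    return int(all(indicators))


def averaged_joint_rate(groups, iterations=30, seed=0):
    """Repeat the composite sampling and average the joint detection rate."""
    rng = random.Random(seed)
    rates = []
    for _ in range(iterations):
        sample = [joint_detection(composite_group(g, rng)) for g in groups]
        rates.append(mean(sample))
    return mean(rates)


# Hypothetical outcomes: each group is a list of members, and each member
# is a list of 0/1 detection indicators, one per device carried.
groups = [
    [[1, 1], [1, 1]],  # every device detected: always a joint detection
    [[1, 0], [0, 0]],  # second member never detected: never a joint detection
]
rate = averaged_joint_rate(groups)  # 0.5
```

In the study the quantity averaged over the 30 iterations is the estimated effect magnitude from each binary logit model; the same loop structure applies with a model fit in place of the simple rate.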
Modelling factors affecting detection probability: All devices
Effect magnitude of variables affecting detection at the individual level given different spatiotemporal accuracy thresholds. Values in parentheses show the respective 95% confidence intervals.
Effect magnitude of variables affecting joint detection given different spatiotemporal accuracy thresholds. Values in parentheses show the respective 95% confidence intervals.
Modelling factors affecting detection probability: Android only
Effect magnitude of variables affecting detection at the individual level given different spatiotemporal accuracy thresholds. Values in parentheses show the respective 95% confidence intervals. Android devices only.
Effect magnitude of variables affecting joint detection given different spatiotemporal accuracy thresholds. Values in parentheses show the respective 95% confidence intervals. Android devices only.
Predictive accuracy of estimated models
| Measure | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 |
|---|---|---|---|---|---|---|
| Threshold settings | | | | | | |
| Num. observations | 912 | 912 | 912 | 912 | 912 | 912 |
| GLH activity detection rate (observed) | 21.8% | 23.7% | 28.6% | 31.0% | 36.2% | 38.8% |
| Model accuracy | 79.7% | 78.5% | 76.3% | 76.4% | 75.6% | 74.8% |
| Model balanced accuracy (TPR+TNR)/2 | 64.2% | 64.4% | 65.8% | 68.7% | 73.1% | 73.7% |
| TPR (True positive rate) | 36.8% | 37.8% | 41.5% | 48.6% | 64.4% | 69.0% |
| TNR (True negative rate) | 91.6% | 91.0% | 90.1% | 88.7% | 81.8% | 78.3% |
| PPV (Positive prediction value) | 10.2% | 11.6% | 15.7% | 19.9% | 30.9% | 35.9% |
| FNR (False negative rate) | 63.2% | 62.2% | 58.5% | 51.4% | 35.6% | 31.0% |
| FPR (False positive rate) | 8.4% | 9.0% | 9.9% | 11.3% | 18.2% | 21.7% |
| Rho-square | 0.40 | 0.38 | 0.29 | 0.29 | 0.26 | 0.25 |
| Adjusted rho square | 0.39 | 0.37 | 0.28 | 0.27 | 0.24 | 0.24 |
10-fold cross-validation results for activity detection models at the individual level (all devices)
| Measure | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 |
|---|---|---|---|---|---|---|
| Threshold settings | | | | | | |
| Num. observations | 456 | 456 | 456 | 456 | 456 | 456 |
| GLH activity detection rate (observed) | 36.8% | 40.3% | 45.6% | 49.6% | 58.3% | 62.3% |
| Model accuracy | 65.3% | 61.8% | 64.2% | 63.1% | 65.8% | 67.7% |
| Model balanced accuracy (TPR+TNR)/2 | 61.8% | 59.5% | 63.4% | 63.3% | 62.9% | 62.3% |
| TPR (True positive rate) | 46.8% | 46.7% | 53.4% | 60.0% | 79.7% | 84.5% |
| TNR (True negative rate) | 76.8% | 72.4% | 73.4% | 66.6% | 46.1% | 40.0% |
| PPV (Positive prediction value) | 26.1% | 30.3% | 37.9% | 47.0% | 70.7% | 77.6% |
| FNR (False negative rate) | 53.2% | 53.3% | 46.6% | 40.0% | 20.3% | 15.5% |
| FPR (False positive rate) | 23.2% | 27.6% | 26.6% | 33.4% | 53.9% | 60.0% |
| Rho-square | 0.16 | 0.12 | 0.09 | 0.09 | 0.13 | 0.14 |
| Adjusted rho square | 0.13 | 0.09 | 0.06 | 0.06 | 0.10 | 0.11 |
10-fold cross-validation results for activity detection models at the individual level (Android only)
▪Clear difference between true positive rates (TPR) and true negative rates (TNR) across all models (related to the observed (ground truth) detection rates) → Focus on true positive rates.
▪For type 1 models (Model 1 in each table), true positive rates are 36.8% (individual detection, all devices), 46.8% (individual detection, Android only), 9.8% (joint detection, all devices), and 15.3% (joint detection, Android only).
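As a reading aid for the rate-based measures in these tables, the sketch below computes them from a confusion matrix and shows why overall accuracy can look high while the true positive rate stays low. The function names and the confusion-matrix counts are hypothetical; the only figures taken from the slides are the Model 1 values for individual detection with all devices (observed rate 21.8%, TPR 36.8%, TNR 91.6%).

```python
def classification_rates(tp, fp, tn, fn):
    """Rates computed from a binary confusion matrix."""
    tpr = tp / (tp + fn)  # share of true detections predicted as detected
    tnr = tn / (tn + fp)  # share of true non-detections predicted as such
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "balanced_accuracy": (tpr + tnr) / 2,
        "TPR": tpr,
        "TNR": tnr,
        "FNR": 1 - tpr,
        "FPR": 1 - tnr,
    }


def accuracy_from_rates(observed_rate, tpr, tnr):
    """Accuracy as a weighted mix of TPR and TNR, weighted by class shares."""
    return observed_rate * tpr + (1 - observed_rate) * tnr


# Hypothetical counts: 100 detected and 200 undetected activities.
rates = classification_rates(tp=40, fp=20, tn=180, fn=60)

# Model 1, individual detection, all devices (values from the table).
acc = accuracy_from_rates(0.218, 0.368, 0.916)  # about 0.797
bal = (0.368 + 0.916) / 2                        # 0.642
```

With only 21.8% of activities detected, accuracy is dominated by the true negative rate, which is the reason the bullets above focus on the true positive rate instead.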
Predictive accuracy of estimated models
| Measure | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 |
|---|---|---|---|---|---|---|
| Threshold settings | | | | | | |
| Num. observations | 3296 | 3296 | 3296 | 3296 | 3296 | 3296 |
| GLH activity detection rate (observed) | 6.3% | 8.9% | 9.3% | 11.7% | 11.7% | 14.5% |
| Model accuracy | 94.0% | 91.5% | 91.0% | 89.1% | 89.6% | 87.4% |
| Model balanced accuracy (TPR+TNR)/2 | 54.7% | 58.5% | 56.6% | 61.0% | 61.0% | 65.2% |
| TPR (True positive rate) | 9.8% | 18.3% | 14.3% | 24.3% | 23.6% | 33.8% |
| TNR (True negative rate) | 99.5% | 98.7% | 98.9% | 97.7% | 98.4% | 96.5% |
| PPV (Positive prediction value) | 0.7% | 1.7% | 1.4% | 3.2% | 3.0% | 5.6% |
| FNR (False negative rate) | 90.2% | 81.7% | 85.7% | 75.7% | 76.4% | 66.2% |
| FPR (False positive rate) | 0.5% | 1.3% | 1.1% | 2.3% | 1.6% | 3.5% |
| Rho-square | 0.76 | 0.70 | 0.67 | 0.63 | 0.62 | 0.58 |
| Adjusted rho square | 0.75 | 0.69 | 0.67 | 0.63 | 0.61 | 0.57 |
10-fold cross-validation results for joint activity detection models (all devices)

| Measure | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 |
|---|---|---|---|---|---|---|
| Threshold settings | | | | | | |
| Num. observations | 608 | 608 | 608 | 608 | 608 | 608 |
| GLH activity detection rate (observed) | 27.0% | 35.2% | 32.9% | 42.0% | 42.0% | 51.2% |
| Model accuracy | 71.5% | 66.3% | 70.9% | 65.1% | 69.3% | 63.7% |
| Model balanced accuracy (TPR+TNR)/2 | 53.8% | 59.5% | 62.4% | 62.3% | 68.5% | 63.5% |
| TPR (True positive rate) | 15.3% | 37.4% | 37.0% | 46.9% | 63.1% | 63.0% |
| TNR (True negative rate) | 92.3% | 81.5% | 87.8% | 77.6% | 74.0% | 64.1% |
| PPV (Positive prediction value) | 5.9% | 20.2% | 17.1% | 30.7% | 38.0% | 50.7% |
| FNR (False negative rate) | 84.7% | 62.6% | 63.0% | 53.1% | 36.9% | 37.0% |
| FPR (False positive rate) | 7.7% | 18.5% | 12.2% | 22.4% | 26.0% | 35.9% |
| Rho-square | 0.27 | 0.16 | 0.21 | 0.13 | 0.18 | 0.13 |
| Adjusted rho square | 0.25 | 0.13 | 0.18 | 0.10 | 0.15 | 0.11 |
10-fold cross-validation results for joint activity detection models (Android only)
▪While we have identified several factors associated with detection and quantified their effect magnitudes, there is certainly room for improvement in terms of identifying other factors associated with detection, both at the individual and at the joint level.
Discussion and conclusion
▪While to some extent we agree with the conclusion reached by Cools et al. (2021) that current detection rates might limit the usefulness of GLH data in travel behavior studies, we argue that these detection rates, if not ideal, must be weighed against the potential of observing travel behavior over long periods of time.
▪GLH data could potentially be used in conjunction with other data-gathering methodologies to compensate for some of its limitations. This is particularly the case for joint activities, which are the main target of this study.
✓For example, we could use the estimated binary logit model (or a machine learning classifier) to obtain a propensity score of being detected, and inverse probability weighting (IPW) (Wooldridge, 2007) could then be used to:
1. Obtain an unbiased frequency of joint activities for a particular group from GLH data,
2. Merge joint activities identified from GLH data with those identified from other data sources, and/or
3. Develop a sampling scheme for other data-gathering methodologies, under a set of assumptions that need to be met to use the IPW.
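The IPW idea in item 1 can be illustrated with a short sketch: each detected joint activity is weighted by the inverse of its estimated detection probability, so that activities of a kind that is rarely detected count for more. Everything here is hypothetical, including the function names, the feature coding, the coefficient values, and the `min_p` floor on the probability (a common practical safeguard against extreme weights that is not mentioned on the slide). The corrected count is only unbiased under the IPW assumptions noted above, in particular a correctly specified detection model.

```python
import math


def detection_probability(features, coefficients, intercept):
    """Binary logit probability of an activity being detected."""
    utility = intercept + sum(b * x for b, x in zip(coefficients, features))
    return 1.0 / (1.0 + math.exp(-utility))


def ipw_frequency(detected_activities, coefficients, intercept, min_p=0.05):
    """Inverse-probability-weighted count of joint activities.

    detected_activities holds one feature vector per activity that GLH
    actually detected; min_p is an ASSUMED floor that caps the weights.
    """
    total = 0.0
    for features in detected_activities:
        p = detection_probability(features, coefficients, intercept)
        total += 1.0 / max(p, min_p)
    return total


# Hypothetical model with one feature and coefficients chosen so that
# every activity has a detection probability of exactly 0.5.
detected = [[0.0], [0.0], [0.0]]
estimate = ipw_frequency(detected, coefficients=[1.0], intercept=0.0)  # 6.0
```

In this toy case three detected activities, each with a 50% chance of detection, stand in for an estimated six joint activities.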
Discussion and conclusion
▪Another key finding was the large gap in detection rates between iPhone and Android, which imposes a serious limitation on the usability of GLH data to study joint travel behavior, given the high market share that the iPhone enjoys in most countries.
✓This gap could be attributed to the difference in privacy policies between iPhone and Android (Greene and Shilton, 2018)
✓In order to collect data at the required quality level we may have to:
1. Identify the required level of privacy encroachment to obtain the desired accuracy levels and corresponding detection rates
2. Engage in better science communication to help people clearly understand how the data would be utilized to improve urban and transport systems
3. Confirm whether the required level of privacy encroachment can be accepted or not, and
4. Have a privacy policy agreement between public bodies and citizens, separately from the ones made with OS firms
Thank you!