An Early Event Detection Technique with Bus GPS Data
Shunsuke Aoki
shunsuka@andrew.cmu.edu
Carnegie Mellon University
Pittsburgh, Pennsylvania
Kaoru Sezaki
sezaki@iis.u-tokyo.ac.jp
The University of Tokyo
Tokyo, Japan
Nicholas Jing Yuan
Xing Xie
nicholas.yuan@microsoft.com
xing.xie@microsoft.com
Microsoft Research Asia
Beijing, China
ABSTRACT
The analysis of the relationship between geo-spatial events and human
mobility in urban areas is significant for improving productivity,
mobility, and safety. In particular, in order to alleviate serious road
congestion, traffic jams, and stampedes, it is essential to predict and
be informed about the occurrence of an event as early as possible. When
an event is known in advance, some of those who are not interested in
it might change their plans and/or take a detour to avoid getting
caught in heavy congestion. In this context, this paper presents an
early event detection technique using GPS trajectories collected from
periodic-cars, which are vehicles periodically traveling on a
pre-scheduled route with a pre-determined departure time, such as a
transit bus, shuttle, garbage truck, or municipal patrol car. Using
these trajectories, which provide real-time and continuous traffic flow
and speed, our technique detects large-scale events in advance, without
incurring any privacy invasion. The behavior of periodic-cars shows a
certain sign of a large-scale event before attendees gather around a
venue because traffic can be slowed around the venue before the event
occurs. We evaluated our method using data from over 7,000 buses
collected from January to May 2015 in Beijing, which we compared with
the check-in data collected from a social network service.
CCS CONCEPTS
• Information systems → Spatial-temporal systems; Information integration;
KEYWORDS
Urban computing, event detection, GPS trajectory, location knowledge
ACM Reference format:
Shunsuke Aoki, Kaoru Sezaki, Nicholas Jing Yuan, and Xing Xie. 2017. An
Early Event Detection Technique with Bus GPS Data. In Proceedings of
SIGSPATIAL’17, Los Angeles Area, CA, USA, November 7–10, 2017, 4 pages.
https://doi.org/10.1145/3139958.3139959
Permission to make digital or hard copies of part or all of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed
for profit or commercial advantage and that copies bear this notice and the full citation
on the first page. Copyrights for third-party components of this work must be honored.
For all other uses, contact the owner/author(s).
SIGSPATIAL’17, November 7–10, 2017, Los Angeles Area, CA, USA
©2017 Copyright held by the owner/author(s).
ACM ISBN 978-1-4503-5490-5/17/11.
https://doi.org/10.1145/3139958.3139959
Figure 1: Example of an anomalous traffic signal. (a) Usual traffic flow. (b) Anomalous traffic flow. (c) Temporal changes.
1 INTRODUCTION
Large-scale events attracting many participants in urban areas have a
strong negative impact on productivity, mobility, comfort, and safety
[2, 3]. For example, within the last few years, a large number of
serious accidents caused by congestion resulting from football games,
religious ceremonies, festivals, and so on, have occurred worldwide. To
alleviate traffic jams and congestion, people need to know in advance
that an event will occur. In fact, if an event is known beforehand,
some of those who are not interested in the event might change their
plans or take a detour to avoid getting caught in heavy congestion.
Many research studies have focused on event detection and extraction
based mainly on data generated in the cyber world, that is, on social
media platforms such as Twitter, Instagram, and Foursquare [1, 4].
However, it can be hard to detect a geo-spatial event from these data
before it occurs, because users tend to share the location information
of the event on social media after arriving at the venue. On the other
hand, the traffic flow around the venue might indicate a large-scale
event before the attendees gather around the venue.
In this context, we present an early event detection technique using
GPS trajectories collected from periodic-cars, such as a transit bus,
shuttle, garbage truck, or municipal patrol car, that periodically
travel on a pre-scheduled route with a pre-determined departure time.
Since periodic-cars are not allowed to use an alternative route even
when an anomalous traffic jam occurs, their trips are adversely
affected by traffic jams and congestion. An example is described in the
following.
Example: In Figure 1 (a), one of the periodic-cars, a transit bus, runs
on a pre-scheduled route, which comprises a round trip between point
$A$ and point $B$. When a large-scale event occurs at location $V_1$ as
shown in Figure 1 (b), it affects the traffic speeds of the
surroundings before the event starts, because people are gathering at
location $V_1$ by their own vehicles, bicycles, or taxicabs. Therefore,
the traffic flow heading toward $V_1$ from its surrounding area
increases considerably and traffic congestion occurs around location
$V_1$. Even under this congestion, transit buses have to keep traveling
on the fixed routes from point $A$ to point $B$. Figure 1 (c) shows the
temporal behavior of a bus, and depicts that it is caught in a massive
traffic jam caused by the event occurring at $V_1$ and that the vehicle
speed decreases dramatically. A driver may attempt to recover from the
delay, but typically, he/she never intentionally causes a delay.
Figures 1 (a) and 1 (b) show one bus line for simplicity, but in
practice, there are many pre-determined lines where many transit buses
run routinely in a city.
The main benefits of our research are two-fold. First, our event
detection technique uses the GPS trajectories of periodic-cars, which
do not contain any private information. Second, since the periodic-cars
run on pre-defined routes, the traffic speed of a periodic-car is much
more sensitive to congestion than that of other vehicles. Some of the
taxis and private cars with no interest in the event might change their
plans and take a detour [5, 6].
The key idea of our research is to evaluate the road status
continuously by monitoring periodic-cars. The definition of the
periodic-car is as follows.
Periodic-car: A car running on a pre-defined route in a constant
period, such as a transit bus, school bus, garbage truck, or municipal
patrol car.
The contribution of our paper lies in two aspects.
Network-based Event Detection: We design an early event detection
algorithm with a time-dependent network using the features of
periodic-cars.
Real Data Evaluation: We evaluate our method using a series of
large-scale real GPS trajectories generated by over 7,000 buses in
Beijing from January to May 2015.
2 SPATIO-TEMPORAL EVENT DETECTION
In this section, we present a spatio-temporal event detection technique
using a Time-dependent Congestion Network (TCN) and a Spatio-Temporal
Event Likelihood (STEL). Our technique detects anomalous traffic
speeds, connects the affected road segments as the TCN, and estimates
event venues where people are gathering with the STEL.
2.1 Time-dependent Congestion Network (TCN)
The Time-dependent Congestion Network (TCN) is composed of anomalous
road segments where the traffic speed is much slower than usual. We can
detect the occurrence of an event with the TCN by measuring the size of
the TCN and by monitoring its multiple sub-networks. To develop the
TCN, we process the data in three steps: (i) Converting to Traffic
Speed, (ii) Network Mapping, and (iii) Edge Anomaly Detection.
First, since the periodic-cars run on pre-defined routes and do not
take detours, we can estimate the road traffic speed from the GPS
trajectory data. We calculate the trip distance between consecutive
samples by using the map information, and estimate the traffic speed
from the calculated distance and the time taken.
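The speed conversion step can be sketched as follows. This is a minimal illustration rather than our implementation: it assumes that each GPS fix has already been map-matched to an offset along the bus route, and the names `Sample`, `route_offset_m`, and `segment_speeds_kmh` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One GPS fix, already map-matched to the bus route (assumption)."""
    timestamp_s: float     # seconds since an arbitrary epoch
    route_offset_m: float  # distance travelled along the route, in meters


def segment_speeds_kmh(samples):
    """Return the speed (km/h) between each pair of consecutive samples."""
    speeds = []
    for prev, cur in zip(samples, samples[1:]):
        dt = cur.timestamp_s - prev.timestamp_s
        if dt <= 0:
            continue  # skip duplicated or out-of-order fixes
        distance_m = cur.route_offset_m - prev.route_offset_m
        speeds.append(distance_m / dt * 3.6)  # m/s -> km/h
    return speeds


# Three fixes sampled every 15 seconds (hypothetical values).
trace = [Sample(0, 0.0), Sample(15, 100.0), Sample(30, 150.0)]
print(segment_speeds_kmh(trace))
```

Using the distance along the route instead of the straight-line distance between fixes avoids underestimating the speed on curved road segments.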
Algorithm 1: Edge Anomaly Detection Technique
Input: Real-time data $r^{\tau(D,T)}_{(n_i,n_j)}$
Output: TRUE (congested) or FALSE (not congested)
Extract $\Theta_{(n_i,n_j)} = \{\theta_{1(n_i,n_j)}, \theta_{2(n_i,n_j)}, \cdots\}$ from accumulated data;
Select $\Theta^{\tau(D,T)}_{(n_i,n_j)} = \{\theta^{\tau(D,T)}_{1(n_i,n_j)}, \theta^{\tau(D,T)}_{2(n_i,n_j)}, \cdots\}$;
$\Upsilon = \mathrm{Average}(\Theta^{\tau(D,T)}_{(n_i,n_j)})$;
if $r^{\tau(D_k,T_l)}_{(n_i,n_j)} / \Upsilon > \Gamma$ then
    return TRUE;
else
    return FALSE;
end
Second, our technique maps the calculated speed data to the road
network. In the road network, each intersection is regarded as a node
and each road is regarded as an edge. To monitor the traffic speed in
each direction, we treat the network as a directed graph.
Third, we monitor the traffic speed for each edge and compare the
real-time data to the accumulated data by using a threshold $\Gamma$.
In addition, our anomaly detection technique accounts for time
dependency, and therefore compares data collected in the same specific
time duration $\tau$, which is determined by two factors: day types
($D$) and time periods ($T$). For day types, we have weekday and
holiday, which are represented as $D = D_w$ and $D = D_h$,
respectively. For time periods, we use three categories: morning
rush-hour (7-10 o’clock; $T = T_m$), daytime (10-17 o’clock;
$T = T_d$), and evening rush-hour (17-20 o’clock; $T = T_e$). That is,
all of the data are categorized into 6 types, and the real-time data
are compared to the accumulated data stored in the same category.
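The categorization into the six types can be illustrated with a short sketch. The function name `time_category`, the string labels, and the `holidays` argument are hypothetical. Treating weekends as $D_h$ and assigning each boundary hour to the later period are assumptions that the text does not settle.

```python
from datetime import datetime

# Hour ranges taken from the text, read as [start, end) in local time.
TIME_PERIODS = {"Tm": (7, 10), "Td": (10, 17), "Te": (17, 20)}


def time_category(ts, holidays=frozenset()):
    """Map a timestamp to (day type, time period), or None outside 7-20 h."""
    # Assumption: weekends and the dates listed in `holidays` count as D_h.
    is_holiday = ts.weekday() >= 5 or ts.date() in holidays
    day_type = "Dh" if is_holiday else "Dw"
    for period, (start, end) in TIME_PERIODS.items():
        if start <= ts.hour < end:
            return (day_type, period)
    return None  # outside the three monitored periods


print(time_category(datetime(2015, 3, 16, 8, 0)))
print(time_category(datetime(2015, 3, 15, 18, 30)))
```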
The algorithm for edge anomaly detection is presented in Algorithm 1.
The real-time traffic speed from node $n_i$ to node $n_j$ is
represented as $r^{\tau(D,T)}_{(n_i,n_j)}$, which is used as the input.
$\Theta_{(n_i,n_j)}$ represents the collection of the accumulated data
for the edge from node $n_i$ to $n_j$, and $\theta_{b(n_i,n_j)}$
$(b = 1, 2, \cdots)$ represents a single data item for that edge. Once
an edge is detected as anomalous by the algorithm, the edge is added to
the TCN.
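A minimal sketch of the edge test in Algorithm 1 is given below. The function name `is_congested`, the dictionary layout of `history`, and the default `gamma=0.5` are hypothetical. Algorithm 1 states the test as the ratio of the real-time value to the average $\Upsilon$ compared against $\Gamma$. Because the input here is a speed, the sketch assumes that congestion corresponds to the ratio falling below the threshold, and this direction is our reading rather than a confirmed detail.

```python
from statistics import mean


def is_congested(realtime_speed, history, category, gamma=0.5):
    """Return True when the edge looks anomalously slow for its category.

    history maps a category such as ("Dw", "Te") to past speeds on the edge.
    """
    baseline = history.get(category, [])
    if not baseline:
        return False  # no accumulated data to compare against
    upsilon = mean(baseline)  # average historical speed in this category
    if upsilon <= 0:
        return False  # avoid dividing by a degenerate baseline
    # Assumption: slow traffic means a speed ratio below gamma.
    return realtime_speed / upsilon < gamma


# Hypothetical accumulated speeds (km/h) for one edge on holiday evenings.
history = {("Dh", "Te"): [20.0, 21.0, 20.4]}
print(is_congested(7.76, history, ("Dh", "Te")))
print(is_congested(19.0, history, ("Dh", "Te")))
```

Edges for which the function returns True would be added to the TCN for the current time window.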
2.2 Event Venue Estimation
The venue estimation technique is composed of the following three
steps: (i) Calculating a Cascading Level, (ii) Estimating Human
Gathering, and (iii) Calculating a Spatio-Temporal Event Likelihood.
First, we give a Cascading Level $\phi_{n_l}$ to each node in order to
know where the network is congested and how large the congestion is.
The Cascading Level $\phi_{n_l}$ describes how many upstream continuous
links the node $n_l$ has in the TCN. The collection of the Cascading
Levels is denoted $\Phi = \{\phi_{n_1}, \phi_{n_2}, \cdots\}$.
Figure 2: Time-dependent Congestion Network (TCN).

Algorithm 2: Calculating the Cascading Levels $\Phi$
foreach $n_l \in V(G')$ do
    if $deg^-(n_l) = 0$ then
        $\phi_{n_l} = 0$;
    else
        $\phi_{n_l} = 1$;
    end
end
while not reach stable $\Phi$ do
    $k = 1$;
    foreach $n_l \in V(G')$ do
        foreach $n_u \in n_l^{parent}$ do
            if $\phi_{n_u} \geq k$ then
                $\phi_{n_l}$++;
            end
        end
    end
    $k$++;
end

Algorithm 3: Estimating the Human Gathering
foreach $n_l \in V(G')$ do
    if $\phi_{n_l} \geq \Delta$ then
        $w_0 = n_l$;
        for $k = 0 : (\phi_{n_l} - 1)$ do
            foreach $w_{k+1} \in w_k^{parent}$ do
                if $\phi_{w_{k+1}} < \Psi$ then
                    $M_t$.add($w_{k+1}$, $n_l$)
                end
            end
        end
    end
end

The algorithm for calculating the Cascading Level $\phi_{n_l}$ is
presented in Algorithm 2, where the TCN is represented as
$G' = (V(G'), E(G'))$. Here, $deg^-(n_l)$ represents the indegree of
the node $n_l$, which is the number of edges leading to that node.
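The Cascading Level can be illustrated with the following sketch. It is not a line-by-line transcription of Algorithm 2. It follows the prose definition and assumes that the level of a node equals the length of the longest chain of continuous TCN edges arriving at it, so that nodes with indegree 0 receive level 0. The function name `cascading_levels` and the edge-list input format are hypothetical.

```python
def cascading_levels(edges):
    """Return {node: level} for a TCN given as (upstream, downstream) edges."""
    parents = {}
    for up, down in edges:
        parents.setdefault(up, set())
        parents.setdefault(down, set()).add(up)
    level = {node: 0 for node in parents}
    # Relax until stable; the pass cap keeps cyclic networks from looping forever.
    for _ in range(len(parents)):
        changed = False
        for node, ups in parents.items():
            best = max((level[u] + 1 for u in ups), default=0)
            if best > level[node]:
                level[node] = best
                changed = True
        if not changed:
            break
    return level


# Hypothetical TCN: a -> b -> c -> e, with a second branch d -> c.
tcn_edges = [("a", "b"), ("b", "c"), ("d", "c"), ("c", "e")]
print(cascading_levels(tcn_edges))
```

In this example, node e has three continuous upstream links (a to b, b to c, c to e), so it receives the largest level.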
After giving the Cascading Level to each node, our technique estimates
the human mobility by using the TCN with the Cascading Levels. The
algorithm to estimate the human gathering direction is presented in
Algorithm 3, where $M_t$ represents the collection of human mobility
directions at time $t$.
To monitor the human mobility, the algorithm uses two thresholds:
$\Delta$ and $\Psi$. First, the threshold value $\Delta$ is applied to
the Cascading Level to extract events having a considerable impact on
human mobility. Second, the threshold value $\Psi$ is used to identify
the origin point of the human movement.
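The role of the two thresholds can be sketched as below. The function name `estimate_gathering`, the `parents` and `level` dictionaries, and the threshold values in the example are hypothetical. When a node has several parents, the sketch walks all of them at each hop, which is one possible reading of $w_k^{parent}$ in Algorithm 3.

```python
def estimate_gathering(parents, level, delta, psi):
    """Return the set M_t of (origin, destination) node pairs."""
    pairs = set()
    for dest, phi in level.items():
        if phi < delta:
            continue  # congestion chain too short to indicate an event
        frontier = {dest}
        for _ in range(phi):  # walk at most phi hops upstream
            frontier = {up for node in frontier for up in parents.get(node, ())}
            # Upstream nodes with a low level are treated as movement origins.
            pairs.update((up, dest) for up in frontier if level.get(up, 0) < psi)
    return pairs


# Hypothetical TCN: a -> b -> c -> e, with a second branch d -> c.
parents = {"a": set(), "b": {"a"}, "c": {"b", "d"}, "d": set(), "e": {"c"}}
level = {"a": 0, "b": 1, "c": 2, "d": 0, "e": 3}
print(estimate_gathering(parents, level, delta=3, psi=1))
```

With these values, only node e passes the $\Delta$ test, and the two chain heads a and d are reported as origins of movement toward e.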
Finally, our technique calculates a Spatio-Temporal Event Likelihood
(STEL) from the estimated human movements, as shown in Algorithm 4. We
give the STEL value $\rho$ to areas that are likely to host an event,
and extract the areas whose value exceeds $\rho_{event}$, the threshold
for the STEL value.
In the algorithm, we assume that the event venue is close to the end of
the congestion and that the venue is probably ahead of the congested
road segments. When multiple sub-networks of the TCN indicate the
congestion, the STEL value becomes high.
Algorithm 4: Estimating the Event Venue using STEL value
foreach $(q^O, q^D) \in M_t$ do
    Make the circle centered at $q^D$ with the radius of $\alpha$;
    Give STEL value $z$ to the semicircle closer to $q^O$;
    Give STEL value $c \cdot z$ to the semicircle farther from $q^O$ $(c > 1)$;
end
Extract the area having STEL value $\rho > \rho_{event}$;

Figure 3: STEL value and probable event location.

In Figure 3, we present a simple example of the collection $M_t$ and
the STEL value calculation. Here, $M_t$ has four pairs
$\{q^O_1, q^D_1\}$, $\{q^O_2, q^D_2\}$, $\{q^O_3, q^D_3\}$, and
$\{q^O_4, q^D_4\}$, and each pair gives the STEL value to the
corresponding area. The area with the red shadow has the highest STEL
value in the scenario. To estimate the event venue location with high
accuracy, we have to set an appropriate value for $\rho_{event}$.
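One way to picture the STEL accumulation of Algorithm 4 is to rasterize each circle onto a grid. The sketch below is an illustration under several assumptions: coordinates are planar (for example, kilometers), the destination is snapped to the nearest grid cell, and the names `stel_grid` and `probable_venues` as well as the values of `z`, `c`, and `cell` are hypothetical.

```python
import math


def stel_grid(pairs, alpha, z=1.0, c=2.0, cell=0.25):
    """Accumulate STEL values on a grid; pairs are ((ox, oy), (dx, dy))."""
    scores = {}
    steps = int(math.ceil(alpha / cell))
    for (ox, oy), (dx, dy) in pairs:
        hx, hy = dx - ox, dy - oy  # heading from origin to destination
        bx, by = round(dx / cell), round(dy / cell)  # destination grid cell
        for i in range(-steps, steps + 1):
            for j in range(-steps, steps + 1):
                offx, offy = i * cell, j * cell
                if math.hypot(offx, offy) > alpha:
                    continue  # outside the circle of radius alpha
                # A positive projection means the cell lies ahead of the
                # destination, i.e. in the semicircle farther from the origin.
                ahead = offx * hx + offy * hy > 0
                key = (bx + i, by + j)
                scores[key] = scores.get(key, 0.0) + (c * z if ahead else z)
    return scores


def probable_venues(scores, rho_event):
    """Return grid cells whose accumulated STEL value exceeds rho_event."""
    return {key for key, rho in scores.items() if rho > rho_event}


# One hypothetical movement from (0, 0) toward (1, 0).
scores = stel_grid([((0.0, 0.0), (1.0, 0.0))], alpha=0.5)
print(sorted(probable_venues(scores, rho_event=1.5)))
```

Cells covered by several circles accumulate the contributions of all pairs, so areas toward which multiple congested sub-networks point end up with the highest values.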
3 EXPERIMENTAL EVALUATION
In this section, we implement our early event detection technique and
evaluate its detection accuracy for geo-spatial events. We first
collect information on 209 geo-spatial events to use as ground-truth
data. The event information has been collected from different sources,
such as flyers, news reports, posts on social network services, and web
pages. Some of the event information was available before the events
occurred, but some was not.
3.1 Data Set
Route-bus trajectories: We use the GPS trajectories of fixed-route
buses as mobility data, which are summarized in Table 1.¹ There are
over 7,000 sensor-equipped buses. However, they sometimes have a rest
period in the garage, and therefore the number of buses operating at
any one time is smaller than 5,000.
Map data: We have the road networks of Beijing, the statistics of which
are shown in Table 1. Each intersection is regarded as a node in the
road networks. In addition, we have all of the bus-route information in
Beijing. The number of bus lines in the dataset is 466, and different
bus lines can use the same road.
Check-in data: We use the check-in data of SinaWeibo.² SinaWeibo
provides location-based services such as check-ins. The check-in data
in SinaWeibo contain a location ID, a user ID, and a timestamp. We use
the check-in data to understand the similarities or differences between
the traffic flow changes and social media responses.
3.2 Detection Accuracy
This section evaluates the reliability of our road-network-based event
detection algorithm. As a parameter of the detection algorithm, the
time span of the calculation $\Delta$ is set to 15 minutes.
As shown in Table 2, we focus on the events happening in the
surroundings of the Workers’ Stadium and the LeSports Center.
¹ The data was collected by crawling the Beijing Real-time Buses. https://itunes.apple.com/us/app/bei-jing-shi-shi-shi-gong-jiao/id703306506?mt=8
² www.weibo.com
Table 1: Statistics of dataset.
Data duration: Jan-May 2015

Category      Item                          Value
Trajectories  # of route-buses              7,131
              # of bus lines                466
              # of effective days           106
              # of data points              158M
              minimum sampling rate         15 (sec)
Roads         # of road segments            162,246
              # of road nodes               121,771
Social media  avg. # of tweets per day      17,770
              avg. # of check-ins per day   12,088
Table 2: Event detection accuracy.

Venue             Metric                         Value
Workers’ Stadium  # of Target Events             17
                  Precision (w/ interpolation)   0.727
                  Recall (w/ interpolation)      0.941
                  F-measure (w/ interpolation)   0.820
LeSports Center   # of Target Events             33
                  Precision (w/ interpolation)   0.737
                  Recall (w/ interpolation)      0.848
                  F-measure (w/ interpolation)   0.789
The ground-truth events at the Workers’ Stadium are 13 football games
and 4 music concerts, and those at the LeSports Center are 9 basketball
games, 22 music concerts, and 2 international conferences.
As shown in Table 2, the results indicate that Precision, Recall, and
F-measure are sufficiently high, and our algorithm can detect the
geo-spatial events happening in the city area. The events missed by our
algorithm are music concerts that started in the morning, which is
reasonable because the traffic impact of an event is small during the
morning rush-hour.
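As a consistency check on Table 2, the F-measure can be recomputed from the reported Precision and Recall. The sketch assumes that the F-measure is the balanced harmonic mean of the two, which the text does not state explicitly but which reproduces the reported values. The function name `f_measure` is hypothetical.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (the balanced F1 score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in Table 2.
for venue, p, r in [("Workers' Stadium", 0.727, 0.941),
                    ("LeSports Center", 0.737, 0.848)]:
    print(f"{venue}: F = {f_measure(p, r):.3f}")
```

The two printed values, 0.820 and 0.789, agree with the F-measure rows of Table 2.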
3.3 Statistical Analysis
This evaluation assesses the statistical significance of our method
using 209 geo-spatial events in Beijing. The events that we evaluated
include sports games, music concerts, school festivals, exhibitions,
and so on. These events were held at various venues, such as a sports
center, football stadium, university, or exhibition center. In our
evaluation, in order to compare the data generated around the venues on
an event day with those on an ordinary day, we focus on the mobility
data collected in a circle with a radius of 1 km centered at the venue.
The physical impacts of each event are shown in Figure 4. The figure
describes the speed changes of the route-buses running near the venue,
comparing the traffic data collected half an hour before the event
occurrence with the data collected in the same time period on an
ordinary day.
In addition, we test for an average difference using the paired t-test.
In the statistical data presented in Figure 4, the mean and standard
deviation of the differences are 2.231 and 3.267, respectively. With
189 degrees of freedom, the test statistic is 9.315, and this value
indicates that the event occurrence has a statistically significant
impact on the traffic speed of the surroundings, because the p-value in
this case is much smaller than 0.01.
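The paired t-test used here can be sketched with the Python standard library. The speeds below are hypothetical and are not taken from our dataset. The function name `paired_t_statistic` and the choice of taking each difference as ordinary-day speed minus event-day speed are assumptions made for illustration.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(ordinary, event):
    """Return (t, degrees of freedom) for paired speed observations."""
    if len(ordinary) != len(event) or len(ordinary) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    diffs = [o - e for o, e in zip(ordinary, event)]
    n = len(diffs)
    # t = mean(d) / (sd(d) / sqrt(n)), using the sample standard deviation
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


# Hypothetical speeds (km/h) near four venues; not taken from the paper.
ordinary_day = [20.0, 18.0, 22.0, 19.0]
event_day = [17.0, 16.0, 18.0, 18.0]
t, dof = paired_t_statistic(ordinary_day, event_day)
print(f"t = {t:.3f}, degrees of freedom = {dof}")
```

The resulting statistic is compared against the t distribution with the returned degrees of freedom to obtain the p-value.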
3.4 Case Study
We further explore our approach using a case study. In the case study,
we focus on a CBA playoff basketball game, as shown in Figure 5. The
venue for the game is the LeSports Center located in the center of
Beijing, and the game started at 19:35. The mobility data to be handled
are generated in a circle with a radius of 1 km centered at the venue.
Figure 5(b) shows the impact on the physical world and the cyber world,
that is, the traffic impact and the number of check-ins, respectively.
The sign from the transit bus data appears earlier than that in the
microblog check-in data. In addition, the audience tends to check in at
the beginning or at the end of the game.

Figure 4: Physical impacts by each event.

Date                                         Mar 15 (Sun)
Location                                     LeSports Center
Start time                                   19:35
Daily avg. speed (Statistics, Sun)           19.57 (km/h)
Daily avg. speed (Event day)                 18.41 (km/h)
Avg. speed of 18:00-19:00 (Statistics, Sun)  20.47 (km/h)
Avg. speed of 18:00-19:00 (Event day)        7.76 (km/h)
(a) Basic Information. (b) Traffic impacts and check-ins.
Figure 5: Traffic impacts by CBA basketball games.
4 CONCLUSION
In this paper, we presented an early event detection technique that
uses GPS trajectories of periodic-cars, which routinely travel in the
urban area. Since the periodic-cars have pre-scheduled routes and
departure times, their behavior reflects the movement of people around
the event venue. Our method is promising and should be applicable to
many other existing transportation services, such as school buses,
garbage trucks, and municipal patrol cars.
REFERENCES
[1] Hila Becker, Mor Naaman, and Luis Gravano. 2011. Beyond Trending Topics: Real-World Event Identification on Twitter. ICWSM 11 (2011), 438–441.
[2] Liang Hong, Yu Zheng, Duncan Yung, Jingbo Shang, and Lei Zou. 2015. Detecting urban black holes based on human mobility data. In Proceedings of the 23rd SIGSPATIAL International Conference on Advances in Geographic Information Systems. ACM, 35.
[3] Yoshihide Sekimoto, Ryosuke Shibasaki, Hiroshi Kanasugi, Tomotaka Usui, and Yasunobu Shimazaki. 2011. Pflow: Reconstructing people flow recycling large-scale social survey data. IEEE Pervasive Computing 10, 4 (2011), 27–35.
[4] Chaolun Xia, Raz Schwartz, Ke Xie, Adam Krebs, Andrew Langdon, Jeremy Ting, and Mor Naaman. In Proceedings of the companion publication of the 23rd international conference on World wide web companion (WWW).
[5] Jing Yuan, Yu Zheng, Xing Xie, and Guangzhong Sun. 2013. T-drive: enhancing driving directions with taxi drivers’ intelligence. Knowledge and Data Engineering, IEEE Transactions on 25, 1 (2013), 220–232.
[6] Yu Zheng, Yanchi Liu, Jing Yuan, and Xing Xie. In Proceedings of the 13th International Conference on Ubiquitous Computing. New York, NY, USA.