
An Automatic Extraction Method of Coach Operation Information from Historical Trajectory Data

Abstract

The quality of road transport travel services relies heavily on the richness of transport operation data. Currently, most types of data, including coach operation data, are collected by manual investigation. This is time-consuming and labor-intensive, and it significantly hinders the realization of intelligent traffic information services. To address these problems, this paper introduces a method for automatically extracting coach operation information from the historical GPS trajectory data of a large number of coaches. The method first analyzes the trajectory characteristics of coaches within stations and identifies highly dense point clusters as coach stations using the DBSCAN clustering algorithm. The schedule information is then obtained by error adjustment of the actual arrival and departure time series of multiple shifts. The name of each coach station is queried from the point of interest (POI) and geographical name database provided by an online map. Finally, the regular driving route of coaches is extracted by an incremental trajectory merging method. The proposed method is applied to historical trajectory data from the Beijing-Tianjin-Hebei region in China. Experimental results show an extraction accuracy of 84%, which verifies the method's effectiveness and feasibility. By using data mining techniques to extract coach operation information from big trajectory data, the method saves much of the labor, time, and economic cost required by on-site investigation.
Research Article
Jun Li (1, 2), Qingqi Li (1), Yan Zhu (1), Yan Ma (3), Yubin Xu (3), and Chao Xie (2)
1 College of Geoscience and Surveying Engineering, China University of Mining and Technology, Beijing, China
2 National Engineering Laboratory for Transportation Safety and Emergency Informatics, China Transport Telecommunications & Information Center, Beijing, China
3 China Academy of Civil Aviation Science and Technology, Beijing, China
Correspondence should be addressed to Jun Li; junli geo@.com
Received 28 October 2018; Revised 13 January 2019; Accepted 23 January 2019; Published 11 February 2019
Academic Editor: David F. Llorca
Copyright © Jun Li et al. This is an open access article distributed under the Creative Commons Attribution License, which
permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
1. Introduction
Intelligent trac information service is to make use of
advanced technologies (internet of things, cloud computing,
etc.), transport operation data, analysis, and decision algo-
rithms to obtain trac knowledge for providing decision-
making services for government, enterprises, and the public.
Typical examples are real-time trac monitoring [, ],
transit planning [, ], travel time planning [, ], and online
ticket inquiry and booking system []. Since the th century,
intelligent trac information services have been widely used
in various elds and become an important component and
hot research direction in intelligent transportation eld [, ].
Quality of intelligent trac information service is aected not
only by the accuracy and eciency of analysis algorithms, but
also by the completeness and quality of transport operation
data []. Scholars have carried out a lot of research work
on trac analysis and decision algorithms []; however, few
studies are focused on transport operation data collection,
especially improving the automation level, and accuracy
of data collection process. ere are mainly two types of
collection methods for transport operation data: one is
collected by monitoring equipment, such as a camera or an
induction coil to obtain trac ow, while the other is col-
lected by manual investigation, such as the coach operation
information, including station location, name, schedule, and
operation route. In real world, most of transport operation
data need to be collected through manual work. Taking the
coach operation information in China as an example, a large
number of passenger transport enterprises are involved in
this industry, and the mode of operation in road passenger
stations is isolated and low standardized []. It would be a
very time-consuming and labor-intensive task to manually
collect detailed coach operation information in a large area
Hindawi
Journal of Advanced Transportation
Volume 2019, Article ID 3634942, 15 pages
https://doi.org/10.1155/2019/3634942
Journal of Advanced Transportation
or even across the country. erefore, to realize intelligent
trac information service for road passenger transport, there
isanurgentneedforthemethodsofcollectingdetailedcoach
operation information in a quick, ecient, and low cost way.
With the development of sensing and positioning technologies, many vehicles are equipped with GPS (Global Positioning System) receivers and wireless communication equipment. While moving, these vehicles continuously collect real-time information, including location, motion parameters, and positioning time, and transmit it to a data center. This type of data is called floating car data [, ]. The increasing accumulation of massive vehicle positioning data creates a new way to address the challenges above []. It makes it possible to use big data analysis technologies to mine rich knowledge from trajectory data [], including the operation information of passenger vehicles.
The research related to this paper falls mainly into three areas. The first is the extraction of regions of interest from trajectory data. Palma et al. and Bhattacharya et al. proposed methods that analyze movement characteristics (velocity, azimuth, acceleration, etc.) at specific locations to find interesting or important places related to a person or object [, ]. Many scholars have further explored regions of interest for various applications. For example, Zheng et al. used trajectory mining algorithms to analyze user-generated GPS trajectory data in order to recommend both general and personalized tourist attractions []. In addition, Li et al. used the DBSCAN algorithm to extract parking lot locations by analyzing the typical characteristics of floating car data within parking lots [].
The second area is route extraction from trajectory data. Schroedl et al. and Li et al. proposed two methods for extracting high-precision road maps from the trajectories of vehicles equipped with GPS receivers [, ]. Their methods suit high and low sampling frequencies of location data, respectively. In contrast, Cao and Krumm proposed a novel gravitational model that converts raw GPS trajectories into a road network that can guide route selection []. To extract more detailed information, Tang et al. proposed a method for mining lane-level road network information from low-precision vehicle GPS trajectories, based on the number of traffic lanes and steering rules []. Kuntzsch et al. established an explicit crossover model with a fractional function to extract traffic networks from GPS data. They demonstrated the method's feasibility through experiments on GPS datasets of different sizes and data quality [].
In addition to region of interest and route identication,
scholars have also done a lot of research on extraction
of spatiotemporal patterns based on trajectory data. For
example, Kang and Yong proposed a method of nding
spatiotemporal patterns in trajectory data, which rst dis-
covered meaningful spatiotemporal regions and then extracts
frequent spatiotemporal patterns using a prex-projection
approach []. Similarly, Lei et al. [] and Zhao et al. []
proposed space-time analytical model/framework to capture
movement patterns of objects. Dierent from those, Lu et al.
proposed a visual analysis method to study the behavior of
vehicles along a route, focusing on the temporal and spatial
distribution of travel time, that is, the time spent on each
road segment and the travel time change during peak/o-
peak hours []. Temporal and spatial patterns of moving
objects extracted from trajectory data can also be used for
travel planning recommendations; for example, Zheng et
al. [] and Hsieh et al. [] extracted interesting locations
and activities from trajectory data and then recommended
travel routes. Some scholars have also carried out work on
periodic pattern recognition such as a probabilistic period
detection method for moving objects []. Although scholars
have proposed various methods on trajectory data to identify
region of interest, route of interest, and spatiotemporal
patterns, few studies have explored the comprehensive use of
data mining methods to extract coach operation information.
This paper introduces a method for automatically extracting coach operation information from the historical GPS trajectory data of a large number of coaches. First, the typical characteristics of GPS trajectories collected within stations are analyzed, and the DBSCAN spatial clustering algorithm is used to identify station locations. Then, schedule information is obtained by error adjustment of the actual arrival and departure time series of multiple shifts. Coach station names are identified from the point of interest (POI) and geographical name database provided by an online map. Finally, the regular driving route of each coach line is extracted by an incremental trajectory merging algorithm. The method promises to obtain coach operation information over a large area quickly and at low cost.
2. Methodology
The development and popularization of the Internet of Vehicles have caused floating cars to generate large amounts of tracking data. Tracking data record the location and motion parameters of vehicles while they move. They are an indispensable data source for studying vehicle behavior and mining hidden information []. One of many floating car platforms is the National Commercial Vehicle Monitoring Platform (NCVMP), established by the Ministry of Transport of China in  to monitor several special types of commercial vehicles, including coaches. For coaches, the most important information is coach operation information: the location and name of stations, schedules, and operation routes. For various reasons, coach drivers sometimes adjust their driving route according to road conditions or driving habits. As a result, different shifts of the same coach often differ in route or in arrival time at the same station. Therefore, the key to this research is accurately extracting station locations, regular schedules, and driving routes from historical trajectory data.
e overall workow of extracting coach operation infor-
mation from coach trajectory data is shown in Figure . It is
mainly composed of four parts: location extraction of station,
name identication of station, schedule extraction, and route
generation. Firstly, the abnormal positioning points in the
originaldatasetthatareirregularorillogicalareremoved,and
the trajectory points belonging to coach routes are separated
Journal of Advanced Transportation
Coach operation
information
Data preprocessing
Trajectory data splitting
Construct
URL
Online map
API
DBSCAN
Clustering
Trajectory point
classification
Trajectory point
merging
Route
smoothing
POI
database
Coach trajectory
data
Basic coach operation
information
Match
Arrival time series
extraction
Schedule
information
Station
name Coach route
Error
adjustment
Station
location
F : Overall workow of extracting coach operation information based on trajectory data.
fromtheoriginaldatasetbasedonbasiccoachoperation
information. Secondly, this paper uses the DBSCAN algo-
rithm to extract the location of stations where coaches park
for passengers to get on and o and meanwhile calculates the
regular arrival and departure time for each station using error
adjustment theory. irdly, based on the place name and POI
database of online maps, station name is obtained through
the extracted station locations. In the procedure of driving
route extraction, any one trajectory of a coach is rst taken
as a candidate route, and other trajectories are continuously
updated to the candidate route by merging process including
trajectory point classication and merging and condence
ltering, and nally the regular driving route of each coach
is obtained.
For convenience, the historical trajectory point set of all coach routes is denoted as $L = \{L_1, L_2, \dots, L_i, \dots, L_m\}$, where $L_i$ is the trajectory data of the $i$-th coach route. A trajectory is composed of a series of trajectory points and is represented by $L_i = \{p_1, p_2, \dots, p_j, \dots, p_{|L_i|}\}$. The basic coach operation information set is denoted as $B = \{B_1, B_2, \dots, B_i, \dots, B_m\}$, where $B_i = \{S^o_i(x, y), S^d_i(x, y), T_i\}$ is the basic operation data of the $i$-th coach route. Here $S^o_i(x, y)$ is the location of the origin station, $S^d_i(x, y)$ is the location of the destination station, and $T_i$ is the departure time of the $i$-th coach route.
2.1. Data Preprocessing. In-vehicle positioning and communication equipment varies widely, and vehicle driving environments are complicated and diverse. As a result, the quality of coach trajectory data is uneven, with problems such as position drift, abnormal attribute values, and data irregularity. In addition, erroneous and redundant data inevitably exist and affect the knowledge extraction process. Based on research needs, the original trajectory dataset is preprocessed in two respects: data cleaning and coordinate transformation. First, trajectory points with abnormal coordinate values are removed, including cases where the coordinate value is  or missing. Second, trajectory points whose speed is negative or whose direction exceeds  degrees are filtered out. Finally, the original trajectory coordinates are stored as latitude and longitude, which is inconvenient for distance calculation. The coordinates are therefore transformed from the WGS-84 coordinate system to the UTM projected coordinate system to facilitate the extraction of coach stations and routes.
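The two preprocessing steps can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the authors' code. The invalid-coordinate value (0) and the valid heading range ([0, 360] degrees) are assumptions, because the exact thresholds are not legible in this copy. The projection uses the standard forward transverse Mercator series (Snyder) on the WGS-84 ellipsoid and ignores the special Norway/Svalbard zones.

```python
import math
from dataclasses import dataclass
from typing import Optional

# WGS-84 ellipsoid and UTM constants
A_WGS84 = 6378137.0              # semi-major axis (m)
F_WGS84 = 1 / 298.257223563      # flattening
K0 = 0.9996                      # UTM central-meridian scale factor


@dataclass
class GpsPoint:
    lon: Optional[float]   # degrees
    lat: Optional[float]   # degrees
    speed: float           # km/h
    direction: float       # degrees clockwise from north


def is_valid(p: GpsPoint) -> bool:
    """Cleaning rules; the 0-coordinate and 360-degree limits are assumed values."""
    if p.lon is None or p.lat is None:        # missing coordinates
        return False
    if p.lon == 0 or p.lat == 0:              # assumed marker of a failed position fix
        return False
    if p.speed < 0:                           # negative speed
        return False
    return 0 <= p.direction <= 360            # assumed valid heading range


def wgs84_to_utm(lon: float, lat: float, zone: Optional[int] = None):
    """Forward transverse Mercator (Snyder's series); returns (zone, easting, northing)."""
    if zone is None:                          # default: the zone containing the point
        zone = int((lon + 180.0) // 6.0) + 1
    lam0 = math.radians((zone - 1) * 6 - 180 + 3)   # central meridian of the zone
    phi, lam = math.radians(lat), math.radians(lon)
    e2 = F_WGS84 * (2 - F_WGS84)
    ep2 = e2 / (1 - e2)
    e4, e6 = e2 ** 2, e2 ** 3
    n = A_WGS84 / math.sqrt(1 - e2 * math.sin(phi) ** 2)
    t = math.tan(phi) ** 2
    c = ep2 * math.cos(phi) ** 2
    a = math.cos(phi) * (lam - lam0)
    # Meridian arc length from the equator to latitude phi
    m = A_WGS84 * ((1 - e2 / 4 - 3 * e4 / 64 - 5 * e6 / 256) * phi
                   - (3 * e2 / 8 + 3 * e4 / 32 + 45 * e6 / 1024) * math.sin(2 * phi)
                   + (15 * e4 / 256 + 45 * e6 / 1024) * math.sin(4 * phi)
                   - (35 * e6 / 3072) * math.sin(6 * phi))
    easting = 500000.0 + K0 * n * (a + (1 - t + c) * a ** 3 / 6
                                   + (5 - 18 * t + t ** 2 + 72 * c - 58 * ep2) * a ** 5 / 120)
    northing = K0 * (m + n * math.tan(phi) * (a ** 2 / 2
                     + (5 - t + 9 * c + 4 * c ** 2) * a ** 4 / 24
                     + (61 - 58 * t + t ** 2 + 600 * c - 330 * ep2) * a ** 6 / 720))
    if lat < 0:
        northing += 10000000.0                # false northing, southern hemisphere
    return zone, easting, northing
```

The Beijing-Tianjin-Hebei region straddles UTM zones 49 and 50. Passing one fixed `zone` for the whole dataset keeps all distances in a single planar frame, at the cost of slightly larger scale distortion near the region's western edge.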
2.2. Trajectory Data Splitting. The GPS receiver mounted on a coach starts collecting data as soon as it is turned on. The original dataset therefore contains not only tracking points on a coach route but also tracking points collected off any coach route, such as during temporary vehicle scheduling or operation on other routes. Points on nontarget routes increase the data processing workload and are not useful for extracting operation information. The trajectory dataset therefore needs to be split before operation information is extracted. A coach route is determined by three factors: origin station, destination station, and departure time. Trajectory data splitting separates the tracking points belonging to a route from the original vehicle trajectory data according to these three factors. This paper proposes a trajectory data splitting algorithm based on the linear sequence of start and end points. The workflow of the algorithm is as follows (Figure ).

Figure : Trajectory data splitting (trajectory points, split points, on-route segments, and off-route segments).
(1) Split Point Identication.First,foranycoachroute𝑖,
the threshold of matching distance for both origin station
𝑖(,) and destination station 𝑖(,) is set to 𝑠,andthe
threshold for the dierence between the actual and planned
departure time is set to 𝑠. Second, all trajectory points 𝑟
in 𝑖are traversed. If a point 𝑟satises (𝑟,𝑖(,)) ≤
𝑠and (𝑟,𝑖)≤
𝑠,then𝑟is identied as a split
point 𝑖of this coach route; otherwise if a point 𝑟satises
(𝑟,𝑖(,)) ≤ 𝑠and (𝑟,𝑖(,)) = ,then𝑟is
identied as a split point 𝑖of the coach route.
(2) Split Point Sequence Generation. The above identification process is repeated until all split points of either the $b_i$ or the $e_i$ type are found. A coach first passes through the origin station and then arrives at the destination station. Therefore, every two adjacent split points in which a $b_i$-type point is followed by an $e_i$-type point form a point pair $(b_i, e_i)$. Finally, the split point sequence $Q = \{(b_1, e_1), \dots, (b_j, e_j), \dots, (b_n, e_n)\}$ is generated.
(3) On-Route Segment Splitting. The trajectory points between each pair of split points in the sequence are regarded as an on-route trajectory segment. The remaining points are regarded as off-route segments.
2.3. Coach Station Extraction. Throughout operation, coaches are in one of two states: moving or docking. Analyzing the spatial characteristics of trajectory points in these two states is the key to locating coach stations. As shown in Figure , trajectory points collected while coaches are moving (Region A) are generally evenly spaced along the road. Points collected while coaches are docking (Region B) are more aggregated, and a point cluster tends to appear because the coach parks there for a while. Comparing GPS trajectory points collected under different operating conditions shows that points at parking stations differ from points elsewhere in several ways:
- the number of trajectory points per unit area is relatively large;
- the number of vehicles is large;
- the trajectory points tend to form obvious point clusters;
- the speed of most points is zero.

In view of these features, this paper uses the DBSCAN clustering algorithm (density-based spatial clustering of applications with noise) proposed by Ester et al. []. It extracts highly dense point clusters from coach trajectory data and takes the coordinates of each cluster center as the location of a coach station.

Figure : Spatial distribution characteristics of trajectory data under different conditions (GPS points along a road in Region A and at a station in Region B).
2.3.1. DBSCAN Algorithm. DBSCAN is a classic density-based clustering method that defines a point cluster as a maximal set of density-connected points (Han et al.) []. The method identifies regions with sufficiently high density as clusters and can find clusters of arbitrary shape in spatial databases with noise. Consider a dataset $P = \{p_1, p_2, \dots, p_i, \dots, p_n\}$, where $p_i$ is a trajectory point. DBSCAN proceeds as follows.

Step 1. Search for an unprocessed point $p_i$ in the original dataset $P$. If $p_i$ is neither classified into a cluster nor marked as noise, check the points in its neighborhood of radius $Eps$, denoted $N_{Eps}(p_i)$. If $N_{Eps}(p_i)$ contains at least $MinPts$ points, create a new point cluster, add all points in $N_{Eps}(p_i)$ to the candidate cluster $C$, and mark $p_i$ as processed.

Step 2. Traverse the unprocessed points $q$ in the candidate cluster $C$ one by one. Check the neighborhood $N_{Eps}(q)$; if it contains at least $MinPts$ points, add all points in $N_{Eps}(q)$ to the cluster.

Step 3. Repeat Step 2 until all points in the candidate cluster $C$ have been processed.

Step 4. Repeat Step 1 to Step 3 until every point in $P$ is either classified into a cluster or marked as noise.
2.3.2. Stop Point Extraction. For driving safety or traffic rule reasons, a coach stops at specific locations along its route, such as gas stations, service areas, stations, and traffic lights. In this paper, these locations are referred to as stop points. The trajectory data $L_r$ of each route are processed with the DBSCAN algorithm to extract all highly dense point clusters $C_i$. The coordinates of each cluster center $P^c_i(x^c_i, y^c_i)$ are calculated by

$$x^c_i = \frac{1}{\|C_i\|}\sum_{P_r \in C_i} x_{P_r}, \qquad y^c_i = \frac{1}{\|C_i\|}\sum_{P_r \in C_i} y_{P_r},$$

where $(x_{P_r}, y_{P_r})$ are the coordinates of trajectory point $P_r$ belonging to cluster $C_i$, and $\|C_i\|$ is the number of trajectory points within $C_i$. These cluster centers $P^c_i$ are the stop points of that coach route; each route has its own stop points.
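The cluster-center formulas are plain coordinate means over the members of each cluster. Here is a small sketch that turns DBSCAN labels (with -1 assumed to mean noise) into the stop points of one route:

```python
from collections import defaultdict


def cluster_centers(points, labels, noise=-1):
    """Mean x and y of each cluster's members: the stop points P^c_i of one route."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])   # label -> [sum_x, sum_y, count]
    for (x, y), lab in zip(points, labels):
        if lab == noise:                        # noise points are not stops
            continue
        acc = sums[lab]
        acc[0] += x
        acc[1] += y
        acc[2] += 1
    return {lab: (sx / n, sy / n) for lab, (sx, sy, n) in sorted(sums.items())}
```

For the square of points (0, 0), (2, 0), (0, 2), (2, 2), this returns the center (1.0, 1.0).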
2.3.3. Coach Station Extraction. Coach station extraction identifies station locations among stop points that arise for various reasons. Stop points within coach stations differ from those caused by refueling, temporary stops, and similar events: their density is higher, the same coach has multiple stop points at the same place, and relatively many coaches are involved. This paper therefore applies the DBSCAN algorithm again, this time to the stop points in the dataset $P^c_i$ $(i = 1, 2, \dots, k)$, where $k$ is the number of coach trajectories. This second clustering identifies highly dense clusters $O_j$. Similarly, the coordinates of each cluster center $O^c_j(x^c_j, y^c_j)$ are calculated by

$$x^c_j = \frac{1}{\|O_j\|}\sum_{P^c_i \in O_j} x_{P^c_i}, \qquad y^c_j = \frac{1}{\|O_j\|}\sum_{P^c_i \in O_j} y_{P^c_i},$$

where $(x_{P^c_i}, y_{P^c_i})$ are the coordinates of stop point $P^c_i$ belonging to cluster $O_j$, and $\|O_j\|$ is the number of stop points in $O_j$. The cluster centers $O^c_j$ are taken as the locations of coach stations.
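The second clustering stage can be sketched as follows: pool the stop points of all routes, run DBSCAN again with station-scale parameters, and average each cluster. A compact DBSCAN is repeated here so that the sketch stands alone. The parameter values in the usage example are hypothetical, because the values used in the experiments are not legible in this copy.

```python
import math


def dbscan(points, eps, min_pts):
    """Compact DBSCAN: cluster ids 0, 1, ... or -1 for noise."""
    labels = [None] * len(points)
    cid = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]
        if len(nbrs) < min_pts:
            labels[i] = -1
            continue
        cid += 1
        labels[i] = cid
        queue = nbrs
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cid          # border point joins the cluster
            if labels[j] is not None:
                continue
            labels[j] = cid
            more = [k for k, q in enumerate(points) if math.dist(points[j], q) <= eps]
            if len(more) >= min_pts:     # core point: expand further
                queue.extend(more)
    return labels


def extract_stations(stop_points, eps, min_pts):
    """Cluster pooled stop points P^c_i and return the centers O^c_j as station locations."""
    labels = dbscan(stop_points, eps, min_pts)
    groups = {}
    for p, lab in zip(stop_points, labels):
        if lab != -1:
            groups.setdefault(lab, []).append(p)
    return [(sum(x for x, _ in g) / len(g), sum(y for _, y in g) / len(g))
            for _, g in sorted(groups.items())]
```

With `eps=20` and `min_pts=3`, four stop points within a few meters of each other become one station. Isolated stops, such as a single refueling stop or a pair of temporary stops, are discarded as noise.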
2.4. Schedule Information Extraction. Schedule information includes the arrival time, dwelling time, and departure time of a coach at each station on its route. Accurate schedule information is very important for both passengers and drivers: it saves passengers' waiting time and ensures the normal operation of coaches. The previous subsection obtained the trajectory points collected each time a coach enters a station. These points form a point set $U_i = \{p_i \mid \forall j, k,\ p_i \in C_j \ \&\ P^c_j \in O_k\}$. When all the trajectory points in $U_i$ are sorted by time, the location time series $\tau_i$ of $U_i$ is expressed as

$$\tau_i = \{t_1, t_2, \dots, t_{\|U_i\|-1}, t_{\|U_i\|}\},$$

where $t_1$ is the earliest location time in $U_i$ and $t_{\|U_i\|}$ is the latest. Therefore, $t_1$ and $t_{\|U_i\|}$ are the arrival and departure times at the station for one shift. The arrival time of a coach at a station is influenced by many factors, including weather and traffic conditions, and it varies between shifts. According to error theory, the arrival timestamps at the same station can be treated as a normally distributed random variable. The arrival time is thus represented as $t \sim N(\mu, \sigma^2)$. Here $\mu$ is the expected value of the arrival time, in other words the planned arrival time. The standard deviation $\sigma$ captures the spread of the arrival time series caused by these factors.
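For one shift, the arrival and departure times are simply the earliest and latest timestamps among that shift's points in $U_i$. The sketch below assumes each point carries a shift identifier (a hypothetical field; the text only says the series is built per shift) and times in minutes since midnight.

```python
from collections import defaultdict


def arrival_departure_by_shift(station_points):
    """station_points: (shift_id, timestamp) pairs for the points of one station set U_i.

    Returns {shift_id: (arrival, departure)}, i.e. the earliest and latest
    location times t_1 and t_|U_i| of each shift.
    """
    times = defaultdict(list)
    for shift, t in station_points:
        times[shift].append(t)
    return {s: (min(ts), max(ts)) for s, ts in times.items()}
```

The per-shift arrival times produced this way form the series that the outlier test below cleans before averaging.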
Coaches may experience a serious trac jam or accident
on the road, which causes that the arrival times to stations
are much dierent from the planned times, and we call them
abnormal arrival time. e Grubbs’ test is used to eliminate
abnormal arrival timestamps, and the detailed procedures are
as follows:
() e arrival time series of a station is arranged from
small to large (usually the smallest or largest value are
rst doubted whether they are abnormal or not).
() Determine the danger ratio .
() Calculate (is the distribution of the order statistics
()). Suppose (1) is suspicious, let =(−
(1))/,andsuppose(𝑛) is suspicious; let =
((𝑛) − )/,where=∑
𝑛
𝑡=1 𝑡/ and =
𝑛
𝑡=1(𝑡)2/( − 1).
() Check the value of (,) corresponding to and .
() If ≥(,), the suspicious data is abnormal and
should be eliminated and the above procedures are
repeated until there are no abnormal data in the time
series.
Aer all abnormal data are eliminated, the calculated
average arrival time is taken as the planned arrival time
represented by 𝑎,andthesamemethodisusedtocalculate
the planned departure time represented by 𝑏.
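Steps (1) to (5) and the final averaging are sketched below using only the standard `statistics` module. The critical values are one-sided Grubbs values for $\alpha = 0.05$, copied from standard Grubbs tables for $n = 3, \dots, 20$. They should be checked against a published table, or computed from the t distribution, before serious use. Times are assumed to be minutes since midnight.

```python
import statistics

# One-sided Grubbs critical values G(alpha = 0.05, n) from standard tables.
G_CRIT_005 = {3: 1.153, 4: 1.463, 5: 1.672, 6: 1.822, 7: 1.938, 8: 2.032,
              9: 2.110, 10: 2.176, 11: 2.234, 12: 2.285, 13: 2.331, 14: 2.371,
              15: 2.409, 16: 2.443, 17: 2.475, 18: 2.504, 19: 2.532, 20: 2.557}


def grubbs_filter(times, g_crit=G_CRIT_005):
    """Repeatedly remove the most extreme value while G >= G(alpha, n).

    Returns (kept values, removed values, mean of kept values = planned time).
    Raises KeyError if the series is longer than the critical-value table.
    """
    kept = sorted(times)                          # step (1)
    removed = []
    while len(kept) >= 3:
        mean = statistics.mean(kept)
        s = statistics.stdev(kept)                # sample std, n - 1 denominator
        if s == 0:
            break
        low, high = kept[0], kept[-1]
        # Step (3): test whichever end of the sorted series lies farther from the mean.
        if mean - low >= high - mean:
            g, suspect = (mean - low) / s, low
        else:
            g, suspect = (high - mean) / s, high
        if g < g_crit[len(kept)]:                 # steps (4) and (5)
            break
        kept.remove(suspect)
        removed.append(suspect)
    return kept, removed, statistics.mean(kept)
```

For example, shift arrivals at 600, 602, 598, 601, 599, and 660 minutes give $G \approx 2.04 > 1.822$ for the value 660, which is removed. The remaining five values pass, so the planned arrival time is their mean, 600 (10:00).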
2.5. Station Name Extraction. After the geographical locations of coach stations and their schedule information are obtained, the station names must also be identified to complete the information. This paper uses the Uniform Resource Locator (URL) search function of AutoNavi Map to identify station names. The geographic coordinates of each station serve as an input parameter, and the search type is set to traffic facility. If the returned result contains words related to passenger stations, it is taken as the station name. Otherwise, the place name (or the name of the administrative unit) near the input coordinates is taken as the station name. For example, to identify the station name at the coordinates .E and .N, the point type is set to traffic facility, and the following URL is constructed:

http://restapi.amap.com/v/geocode/regeo?output=xml&location=.,.&key=affacbdebabdd-&radius=&poitype=&roadlevel=&extensions=all

The returned result is shown in Algorithm , and Tangshan Passenger Transport West Station is taken as the station name at .E and .N.
<pois type="list">
<poi>
<id>BBANW</id>
<name>Tangshan Passenger Transport West Station</name>
<type>Transportation facilities; long-distance bus station; long-distance bus station</type>
<tel>-</tel>
<direction>East</direction>
<distance>.</distance>
<location>.,.</location>
<address>Station road</address>
<poiweight>.</poiweight>
<businessarea/>
</poi>
</pois>
Algorithm : Station name search result.

2.6. Coach Route Extraction. Another important piece of coach operation information is the planned route of each coach. Our task is to extract this planned route from multishift trajectories with possible variances, as shown in Figure . This paper uses an incremental trajectory merging algorithm composed of four major steps: candidate route generation, trajectory point classification, trajectory point merging, and confidence filtering and smoothing []. Firstly, the moving segment of any one trajectory is taken as the candidate route. Then the tracking points of the remaining trajectories are classified into different types according to their spatial and semantic relationships with the candidate route, and they are merged into it. Finally, the planned route is obtained after confidence filtering and smoothing.

Figure : Coach route extraction (trajectory points, road, and extracted coach route).
(1) Candidate Route Generation. The candidate route is a temporary result of the route extraction process that stores the moving path of coaches in vector form. For clarity, each vector node point is called a route node point. Each route node point has a weight attribute indicating how reliably this node point lies on the planned route of the coach. When a node point is newly added, its weight attribute is given an initial value of .
(2) Trajectory Point Classication. e trajectory points are
classied into three categories based on the relationship
between the trajectory points and the candidate route seg-
ments.Ifthedistancebetweenatrajectorypointandthe
candidate route is less than or equal to 1and the direction
dierence between them is less than or equal to 1,this
trajectory point is classied to 1type. If the distance between
a trajectory point and the candidate route is between 1and
2, and both two adjacent points of the trajectory point meet
1type condition with the same candidate route, this point is
classied to 2type. If a point belongs to neither 1type nor
2type, it is classied as 0type.
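The three-way classification can be sketched against a single candidate route stored as a polyline. With one route, the "same candidate route" requirement for type 2 holds trivially; handling several candidate routes is left out of this illustration. Headings are assumed to be GPS directions in degrees clockwise from north, with x as east and y as north.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to segment ab."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0:
        return math.dist(p, a)
    u = max(0.0, min(1.0, ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg_len2))
    return math.dist(p, (a[0] + u * dx, a[1] + u * dy))


def heading(a, b):
    """Azimuth of a->b in degrees clockwise from north (x = east, y = north)."""
    return math.degrees(math.atan2(b[0] - a[0], b[1] - a[1])) % 360


def angle_diff(h1, h2):
    """Smallest absolute difference between two headings, in degrees."""
    d = abs(h1 - h2) % 360
    return min(d, 360 - d)


def classify(points, directions, route, d1, d2, theta1):
    """Return type 1, 2 or 0 for each trajectory point relative to one candidate route."""
    # (distance, edge index) of the nearest candidate-route edge for each point
    near = [min((point_segment_distance(p, route[k], route[k + 1]), k)
                for k in range(len(route) - 1)) for p in points]
    is_type1 = [dist <= d1 and angle_diff(h, heading(route[k], route[k + 1])) <= theta1
                for (dist, k), h in zip(near, directions)]
    types = []
    for i, (dist, _) in enumerate(near):
        if is_type1[i]:
            types.append(1)
        elif (d1 < dist <= d2 and 0 < i < len(points) - 1
              and is_type1[i - 1] and is_type1[i + 1]):
            types.append(2)
        else:
            types.append(0)
    return types
```

A point close to the route but heading the opposite way is type 0. The direction test keeps the two carriageways of a road apart even when they lie within $d_1$ of each other.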
(3) Trajectory Point Merging. Different types of trajectory points are merged into the candidate route using different methods.
- Type 1: the candidate route edge $l$ closest to the trajectory point is first found. If the distance between the point and either node point of $l$ is not more than $d_3$, the weighted mean of the coordinates of the point and that node point is calculated and taken as the node's new position, and the weight of the node point is increased by . If the point is far from both node points of $l$ but its distance to the edge $l$ is less than or equal to $d_4$, the weights of both node points of $l$ are increased by .
- Type 2: the trajectory point is inserted into $l$.
- Type 0: a new candidate route is created, starting from this trajectory point.
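The type 1 merging rule can be illustrated as follows. This is a minimal sketch with stated assumptions. Node points are stored as mutable `[x, y, weight]` lists. The point enters the weighted mean with weight 1, and node weights grow by `increment=1.0`, because the increment values are not legible in this copy. When both node points are within $d_3$, the nearer one is updated.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to segment ab."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0:
        return math.dist(p, a)
    u = max(0.0, min(1.0, ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg_len2))
    return math.dist(p, (a[0] + u * dx, a[1] + u * dy))


def merge_type1(nodes, p, d3, d4, increment=1.0):
    """Merge a type-1 point p into the candidate route (nodes is mutated in place).

    Returns True if the point updated the route, False if it was left unmerged.
    """
    # Nearest candidate-route edge l = (nodes[k], nodes[k + 1]).
    best_d, k = min((point_segment_distance(p, nodes[i][:2], nodes[i + 1][:2]), i)
                    for i in range(len(nodes) - 1))
    a, b = nodes[k], nodes[k + 1]
    near = min((a, b), key=lambda n: math.dist(p, n[:2]))
    if math.dist(p, near[:2]) <= d3:
        # Weighted mean of node (weight w) and point (assumed weight 1).
        w = near[2]
        near[0] = (w * near[0] + p[0]) / (w + 1)
        near[1] = (w * near[1] + p[1]) / (w + 1)
        near[2] = w + increment
        return True
    if best_d <= d4:
        # Close to the edge but not to a node: reinforce both end nodes.
        a[2] += increment
        b[2] += increment
        return True
    return False
```

Because node positions move toward repeatedly observed points and their weights accumulate, frequently driven parts of the route gain weight. This is what the confidence filtering step relies on.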
(4) Condence Filtering and Smoothing. Condence ltering
is to remove the extracted route with a low accuracy or
probability of being a real planned route. It is achieved by
removing the route node points, edges, or an entire route
with the weight values less than 1.Routesmoothingisto
smooth slight sawteeth existing in the extracted route without
changing the major shape of the route. is paper adopts the
adaptive smoothing algorithm based on bending angle, which
determines the smoothing degree according to the sawtooth
angle[].AsshowninFigure,PointsA,B,andCforma
sawtooth, and D is the foot point of the sawtooth vertex B
projected onto the straight line AC. e sawtooth is polished
by moving its Vertex Point B towards the Foot Point D, and
the moving distance is calculated by 𝑀=,whereis the
distance between Point B and Point D, =(
2)/(1802),
and isthevertexangle.emaximummovingdistanceis
set to  m in order to ensure that the maximum amending
distance does not exceed the location error.
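Both operations of step (4) are sketched below. The filter only drops node points, whereas the text also removes edges and whole routes. The smoothing applies the moving distance $d_M = \lambda d$ with $\lambda = \theta^2/180^2$, capped at `max_move`. Processing vertices in order, with each vertex using its already-smoothed predecessor, is a design assumption of this sketch.

```python
import math


def confidence_filter(nodes, w1):
    """Drop route node points ([x, y, weight]) whose weight is below w1."""
    return [n for n in nodes if n[2] >= w1]


def smooth_vertex(a, b, c, max_move):
    """Move sawtooth vertex b toward its foot point d on line ac by
    min(lambda * |bd|, max_move), with lambda = theta**2 / 180**2."""
    ax, ay = a
    bx, by = b
    cx, cy = c
    dx, dy = cx - ax, cy - ay
    if dx == 0 and dy == 0:
        return b                                   # degenerate: a and c coincide
    u = ((bx - ax) * dx + (by - ay) * dy) / (dx * dx + dy * dy)
    foot = (ax + u * dx, ay + u * dy)              # point D
    dist_bd = math.dist(b, foot)
    if dist_bd == 0:
        return b                                   # b already lies on line ac
    v1, v2 = (ax - bx, ay - by), (cx - bx, cy - by)
    cos_theta = (v1[0] * v2[0] + v1[1] * v2[1]) / (math.hypot(*v1) * math.hypot(*v2))
    theta = math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))  # vertex angle
    move = min((theta / 180.0) ** 2 * dist_bd, max_move)
    f = move / dist_bd
    return (bx + f * (foot[0] - bx), by + f * (foot[1] - by))


def smooth_route(points, max_move):
    """One pass over interior vertices of a polyline."""
    out = list(points)
    for i in range(1, len(out) - 1):
        out[i] = smooth_vertex(out[i - 1], out[i], out[i + 1], max_move)
    return out
```

A shallow sawtooth (vertex angle near 180 degrees) gets $\lambda$ close to 1 and is almost flattened. A genuine sharp turn (small vertex angle) gets a small $\lambda$ and keeps its shape.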
Table : Sample GPS data of passenger vehicles.
License plate number | Administrative unit code | Longitude | Latitude | Speed (km/h) | Direction (degree) | Positioning time
PAE |  | . | . |  |  | -- ::
FAN |  | . | . |  |  | -- ::
PAW |  | . | . |  |  | -- ::

Figure : Schematic diagram of the smoothing method, showing original and amended nodes at points A to E (from Li et al.) [].
3. Result and Analysis
3.1. Study Area and Dataset. The Beijing-Tianjin-Hebei region in China is selected as our study area to demonstrate the performance of the proposed method, as shown in Figure . Two datasets are used in this paper. One is the GPS trajectory data of about  coach routes collected by the NCVMP from August 1st to 20th, , and a sample is shown in Table . The main fields of the trajectory data include license plate number (encrypted for privacy protection), administrative unit code, positioning time, longitude, latitude, speed, and direction. The spatial distribution of the GPS trajectory data is shown in Figure . The other dataset is the basic coach route information, including route number, coordinates of the origin and destination stations, and departure time; a sample is shown in Table .
3.2. Results and Accuracy Analysis. According to Section , the coordinates of the coach trajectory points are first converted from WGS-84 to the UTM projection, and abnormal points are eliminated. Both the trajectory data and the basic coach operation information inevitably contain location errors, and a coach station has a certain spatial extent. For these reasons, the buffer radius 𝑠 used in the trajectory splitting introduced in Section . is set to . km. Investigation and analysis show that the actual departure times of multiple shifts of the same coach route are very close to the planned departure time, so the time difference threshold 𝑠 is set to . h. The GPS sampling interval of the trajectory data is between  seconds and  minutes, so the two clustering parameters (the neighborhood radius and the minimum number of points) are set to  m and , respectively, to extract stop points. In view of the operating characteristics of coaches within stations and the regional features of stations, the same two parameters are set to  m and , respectively, in the second DBSCAN clustering. This means that a place is taken as a coach station only if at least  coach shifts or coaches stop there. In the extraction of coach routes, the parameter values listed in Table  are selected according to the similarity of multiple trajectories of the same coach route.
Figure : Study area and dataset (GPS points and provincial boundaries).
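The trajectory splitting and time-difference filtering described above can be sketched in Python. This is a simplified illustration that rests on several assumptions:
- Fixes are already projected to metres.
- Station locations come from the basic coach route information.
- The departure moment is taken as the last fix inside the origin buffer.
- The planned departure is compared by time of day, and trips crossing midnight are ignored.

All names and sample values are hypothetical.

```python
import math
from datetime import datetime, time, timedelta
from typing import List, Optional, Tuple

Fix = Tuple[float, float, datetime]  # (x metres, y metres, positioning time)
XY = Tuple[float, float]


def _inside(fix: Fix, centre: XY, radius_m: float) -> bool:
    return math.hypot(fix[0] - centre[0], fix[1] - centre[1]) <= radius_m


def split_effective(track: List[Fix], origin: XY, dest: XY, planned_dep: time,
                    buffer_m: float, max_dt: timedelta) -> List[List[Fix]]:
    """Cut one vehicle's time-ordered track into origin-to-destination segments.

    A segment starts at the last fix inside the origin buffer and ends at the
    first fix inside the destination buffer. It is kept only if that departure
    is within max_dt of the planned departure time of day (midnight ignored).
    """
    segments: List[List[Fix]] = []
    dep_idx: Optional[int] = None  # last fix seen inside the origin buffer
    for i, fix in enumerate(track):
        if _inside(fix, origin, buffer_m):
            dep_idx = i
        elif dep_idx is not None and _inside(fix, dest, buffer_m):
            dep_time = track[dep_idx][2]
            planned = datetime.combine(dep_time.date(), planned_dep)
            if abs(dep_time - planned) <= max_dt:
                segments.append(track[dep_idx:i + 1])
            dep_idx = None  # ignore everything until the next origin visit
    return segments
```

Fixes recorded while driving from the destination back to the origin are never returned. Departures far from the planned time are also dropped. This mirrors the removal of the return trip and of the pre-operation segment in the example route below.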
To demonstrate the process of extracting coach operation information more clearly, one coach route (Route ) in the study dataset is taken as an example. It leaves from the Liuliqiao Bus Station in Beijing at :. Figure (a) shows the trajectory data of all shifts for Route . As can be seen from the main figure and the enlarged view, the raw dataset contains the trajectory segment before : (that is, before the operation of the coach route has started). It also contains the return trip, which belongs to a different coach route. Figure (b) shows the result after trajectory splitting. The remaining data are the trajectories operating for Route , which reduces the number of trajectory points to be processed. For coach station extraction, the Qibin Road Station is taken as an example to demonstrate the clustering process, as shown in Figure . Figure (a) shows the trajectory points around the Qibin Road Station, and Figure (b) shows the five stop points extracted by the DBSCAN algorithm according to point density. Figure (c) shows that a coach station is identified from the stop points after another clustering. For Route , quite a few stop points are extracted (Figure (c)), but only  coach stations are further extracted (Figure (d)), since some stop points are generated by traffic jams, refueling, etc. Figures (e) and (f) show the final extracted coach route, which starts from the Liuliqiao Coach Station in Beijing and ends at the Zhengzhou Coach Station. Table  shows the operation schedule of this coach route, giving the arrival and departure times for each station.
Table : Sample of basic coach route information.
Route No. | Origin station longitude | Origin station latitude | Destination station longitude | Destination station latitude | Departure time
 | . | . | . | . | :
 | . | . | . | . | :
Table : Parameter values in trajectory merging.
Parameter Value Parameter Value
1m 1
2m 2
3m 1
4m
Figure : Extraction results at different steps for a coach route (Route ): (a) raw trajectory data, (b) trajectory data after splitting, (c) stop points, (d) coach stations, (e) coach route, and (f) coach stations and route.
Figure : Coach station extraction process: (a) GPS points, (b) GPS points and cluster centers, and (c) GPS points, cluster centers, and the identified station (Qibin Road Station).
Figure  shows the final coach station and route extraction results for the entire study dataset. Table  lists the schedule information for some of the coach routes, including the schedule code, the names and locations of all stations passed, and the arrival and departure times.
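The schedule tables combine the actual arrival and departure times of many shifts into one timetable per route. This section does not spell out the error adjustment method itself. The Python sketch below therefore substitutes a deliberately simple rule: it takes the per-station median over shifts, which is robust to a few delayed shifts. The data layout (minutes after midnight per shift and station), the function names, and the sample times are hypothetical.

```python
from statistics import median
from typing import Dict, List, Optional, Tuple

# One shift's (arrival, departure) at a station, in minutes after midnight;
# None marks a missing value (e.g. no arrival at the origin station).
Obs = Tuple[Optional[float], Optional[float]]


def hhmm(minutes: float) -> str:
    """Format minutes after midnight as HH:MM."""
    m = int(round(minutes)) % (24 * 60)
    return f"{m // 60:02d}:{m % 60:02d}"


def estimate_schedule(shifts: Dict[str, List[Obs]]) -> Dict[str, Tuple[str, str]]:
    """Per-station median of observed times across shifts ("--" if none)."""
    table: Dict[str, Tuple[str, str]] = {}
    for station, obs in shifts.items():
        arr = [a for a, _ in obs if a is not None]
        dep = [d for _, d in obs if d is not None]
        table[station] = (hhmm(median(arr)) if arr else "--",
                          hhmm(median(dep)) if dep else "--")
    return table
```

The "--" entries follow the convention of the extracted schedule table. The origin station has no arrival time, and the terminal station has no departure time.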
A total of , coach stations were extracted from the study data. To verify the location accuracy of the extracted coach stations, we superimposed them on Google Earth satellite imagery. We then randomly selected  coach stations and checked whether each matched the imagery. By comparison,  stations were correctly extracted, while  were wrongly extracted (e.g., highway service areas and road intersections). The extraction accuracy of coach stations is %.

Figure  shows the extracted stations superimposed on remote sensing images for four typical areas. The yellow pins represent raw coach trajectory points, and the green pins represent extracted coach stations. Figures (a), (b), (c), and (d) show a large coach station, a roadside parking lot, a road intersection in a village, and a place on a highway, respectively. The extracted station in (a) is clearly correct. The extracted locations in (b) and (c) are informal coach stations. However, investigation showed that nearby passengers get on and off coaches at these two locations every day, so they actually function as coach stations. The extracted location in (d) is in the middle of a highway. Investigation showed that traffic jams occur very frequently at this location, so the GPS trajectory points collected there are very dense. Because these dense points resemble the trajectory characteristics of coach stations, the location was wrongly extracted.
T : Extracted schedule for Route .
Time Station
Liuliqiao Coach Station Douzhuang County Lijiakou County Qibin Road ZhengzhouCoach Station
Arrival time -- : : : :
Departure time : : : : --
T : Schedules for part of coach routes.
Schedule No. Station Longitude Latitude Stay time(minutes) Arrival time Departure time
Baoding . . -- :
Xiacang . . : :
 Yutian . . : :
Shaliuhe . . : :
Tangshan . . : --
Liuliqiao . . -- :
 Zhuozhou . . : :
Gaobeidian . . : :
Pinggang . . : --
Liuliqiao . . -- :
 Daban . . : :
Lindong . . : --
3.3. Eciency Analysis. In order to test the eciency of the
proposed method in extracting coach operation information,
ve subdatasets of dierent sizes including , , , ,
and  coach routes were selected from the study data, and
the execution times were recorded. e number of extracted
stations was , , , , and , respectively, for the
ve datasets, and the corresponding execution times were
,,,,andsecondsasshowninFigure.
e result indicates that the average execution time to extract
coach operation information for each route is less than 
seconds, and each coach route has an average of four stations.
Ascanbeseenfromthechangetrendoftheblacksolid
curve, the execution time of the method is nearly linearly
related to the number of processed routes. erefore, the
proposed method can be applied in real-world applications
since the execution time would not signicantly increase
as the number of coach routes increases. In addition, the
algorithm of this paper is programmed in a batch-processing
mode in which a large number of datasets can be processed
at the same time.
4. Discussion
In this paper, massive historical coach trajectory data are used to extract coach operation information. The experimental results show that the proposed method can automatically extract accurate coach operation information over a large area. Note that the extracted coach operation information is much richer than the input of the proposed method, namely the basic coach operation information. The transport agency manages the basic coach operation information because enterprises need its approval to operate a coach route. However, the agency does not have the detailed operation information, because the operating enterprises can choose driving routes and set intermediate stations on their own. The extraction result comprises complete coach operation information: station location, station name, coach schedule, and driving route. This provides a data basis for intelligent travel information services. The method of this paper has the following characteristics.
(1) e raw trajectory data is split into eective and
noneective segments according to the location of origin
and destination station of each coach route. e eective
trajectory segment aer splitting represents the data collected
while the vehicle operates for a specic route, and only this
partwouldbeprocessedinfurthersteps.esplittingprocess
not only reduces the amount of data to be processed, but also
prepares for subsequent schedule extraction.
(2) Various clustering algorithms could be considered for extracting coach stations. The K-means algorithm [] and the spectral clustering algorithm [] require the number of clusters to be known in advance and cannot identify noise points, so they are not suitable for our research. By contrast, the DBSCAN [] and HDBSCAN [, ] algorithms do not need the number of clusters to be specified and are able to detect noise points. We select two sample trajectories to compare the results of DBSCAN and HDBSCAN using the same parameter values. The minimum-points parameter shared by DBSCAN and HDBSCAN is set to , and the neighborhood radius of DBSCAN is set to  meters. Figures  and  show the clustering results of the two algorithms. Most clusters generated by HDBSCAN appear in the shape of chains (Figures (a) and (a)), while those generated by DBSCAN are mostly groups of highly dense points (Figures (b) and (b)). The algorithm principles and the clustering results indicate that HDBSCAN recognizes clusters based not on regional point density but on mutual reachability distance. It therefore tends to identify point chains as clusters, which would lead to wrong extraction of coach stations. For this reason, we choose DBSCAN, which identifies highly dense points.
Figure : Coach station and route extraction results for the entire dataset.
(3) Clustering is applied twice to the trajectory data to extract coach stations. The first clustering identifies highly dense points and takes the center of each point cluster as a stop point of the coach. The second clustering is applied to the obtained stop points. It further identifies the locations of coach stations among all the stop points, which arise for various reasons. Theoretically, a single clustering is adequate for coach station extraction. However, the time complexity of the DBSCAN clustering algorithm is $O(n^{2})$, which is quadratic in the number $n$ of processed points. The number of raw trajectory points is usually in the thousands of millions, so clustering them directly would take a very long time. The two-stage clustering strategy is therefore very useful for improving efficiency, since the second clustering operates only on the much smaller set of stop points.
(4) A few parameters are involved in the proposed method, including the distance threshold, the time and angle difference thresholds, and the clustering parameters. Most of these parameters do not change with the trajectory dataset or the study area. For example, the distance, angle, and time difference thresholds used in trajectory merging are determined by considering both vehicle movement characteristics and the geometric features of roads, and they can be kept unchanged. However, the two clustering parameters required in coach station extraction need to be determined according to the sampling interval of the raw trajectory data. If the sampling interval exceeds  s, the minimum number of points is recommended to be not greater than  when extracting stop points. If the sampling interval is less than  s, its value is recommended to be between  and .
(5) The proposed method makes it very convenient to obtain coach operation information over a large area quickly, but some problems remain to be addressed. For example, if there is a frequently congested point on the coach route, the trajectory data of vehicles at this place will satisfy the trajectory characteristics within stations, and the congestion point will be identified as a coach station. In addition, on long-distance trips, coach drivers often park for a rest at specific places such as highway service areas. These areas actually function similarly to coach stations, since both are places where passengers get on and off. It is therefore very hard to distinguish these places from coach stations based on trajectory data alone. In the future, the stop time distribution or other geographic information will be taken into account to help improve the extraction accuracy.
(6)e extraction accuracy of coach stations is inuenced
bythetimespanofcoachtrajectorydata.elargerthedata
volume is, the higher the extraction accuracy is. According
to the experimental results, at least  days of trajectory
data should be used in order to automatically extract coach
operation information.
(7) As introduced in Section ., our method is designed
to extract not only formal coach stations but also “informal
coach stations” where there are no station facilities but
function as stations in fact. It means that the word “coach
station” throughout the entire paper could mean both formal
and informal stations. No distinction is made on this word in
order to make the paper simple to follow.
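Items (2) and (3) can be illustrated with a small, dependency-free Python sketch of DBSCAN applied in two stages. The brute-force neighbourhood search makes each run quadratic, which matches the complexity discussed in item (3). Running the first stage separately on each shift's trajectory is a choice made for this sketch, because it keeps each quadratic run small. The parameter values one would pass are placeholders rather than the paper's settings, and all function names are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Pt = Tuple[float, float]  # projected coordinates in metres


def dbscan(points: Sequence[Pt], eps: float, min_pts: int) -> List[int]:
    """Textbook DBSCAN; returns one cluster id per point, -1 for noise.

    Neighbourhoods are found by brute force, so each run costs O(n^2).
    """
    n = len(points)
    labels: List[Optional[int]] = [None] * n  # None = not yet visited

    def region(i: int) -> List[int]:
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = region(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # noise unless later reached as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            neighbours = region(j)
            if len(neighbours) >= min_pts:  # j is a core point: keep expanding
                queue.extend(neighbours)
    return [lab if lab is not None else -1 for lab in labels]


def cluster_centres(points: Sequence[Pt], labels: List[int]) -> List[Pt]:
    """Mean position of each cluster, ignoring noise."""
    groups: Dict[int, List[Pt]] = {}
    for p, lab in zip(points, labels):
        if lab >= 0:
            groups.setdefault(lab, []).append(p)
    return [(sum(x for x, _ in g) / len(g), sum(y for _, y in g) / len(g))
            for _, g in sorted(groups.items())]


def extract_stations(trajectories: Sequence[Sequence[Pt]], eps1: float, min1: int,
                     eps2: float, min2: int) -> List[Pt]:
    """Stage 1 (per shift): dense fixes -> stop points.
    Stage 2 (all shifts): stop points repeated by many shifts -> stations."""
    stops: List[Pt] = []
    for traj in trajectories:
        stops.extend(cluster_centres(traj, dbscan(traj, eps1, min1)))
    return cluster_centres(stops, dbscan(stops, eps2, min2))
```

A stop caused by a one-off traffic jam produces a stop point in only one shift. With a second-stage minimum above one, it is therefore discarded as noise, while a place where every shift dwells survives as a station. Frequently recurring jams, as noted in item (5), would still pass this test.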
5. Conclusion
Currently, coach operation information is mainly collected by manual investigation, which is time-consuming and labor-intensive. Road transport authorities and enterprises have no efficient way to obtain detailed coach operation information across the country, and this significantly hinders the realization of intelligent traffic information services. To extract coach operation information automatically and efficiently, this paper first analyzes the trajectory characteristics of coaches within stations. It then identifies highly dense point clusters as coach stations using the DBSCAN clustering algorithm. Next, the arrival and departure times for each station are calculated using the error adjustment method. In addition, the name of each coach station is obtained through the API of online maps based on the location of the extracted station. Finally, the coach route is extracted by an incremental trajectory merging method. The proposed method is applied in the Beijing-Tianjin-Hebei region to extract coach operation information. Experimental results show that the extraction accuracy is 84%, which verifies the method's effectiveness and feasibility. The proposed method uses data mining techniques to extract useful information from big trajectory data. In doing so, it saves much of the labor, time, and economic cost required by on-site investigation. It can provide a data source for establishing a nationwide online ticketing and travel information system for various kinds of road passenger transport.
Figure : Superimposed effect of extracted coach stations on satellite images, panels (a), (b), (c), and (d).
Figure : Efficiency analysis of the algorithm. It plots execution time (seconds) and the numbers of extracted stations and routes against the number of coach routes (10, 50, 100, 300, and 500) and the number of trajectory points (54156, 144677, 528098, 2481858, and 2870285). The data labels shown in the figure are 47, 153, 288, 1117, and 1778.
Data Availability
e historical trajectory data used to support the ndings
of this study may be released upon application to the China
Transport Telecommunications & Information Center, which
can be contacted at xiechao@transinfo.com.cn.
Conflicts of Interest
e authors declare that they have no conicts of interest.
Acknowledgments
is research is nancially supported by National Natu-
ral Science Foundation of China (no. ), National
KeyResearchandDevelopmentProgramofChina(no.
YFB), and the Open Project of National Engi-
neering Laboratory for Transportation Safety and Emergency
Informatics(no.YW-).anksareduetoChina
Transport Telecommunication & Information Center for
providing experiment data and related support.
Figure : Clustering results of Sample Trajectory A by two algorithms: (a) HDBSCAN and (b) DBSCAN (cluster points and noise points).
Figure : Clustering results of Sample Trajectory B by two algorithms: (a) HDBSCAN and (b) DBSCAN (cluster points and noise points).
References
[] B. Coifman, D. Beymer, P. McLauchlan, and J. Malik, “A real-time computer vision system for vehicle tracking and traffic surveillance,” Transportation Research Part C: Emerging Technologies, vol. , no. , pp. –, .
[] F. Calabrese, M. Colonna, P. Lovisolo, D. Parata, and C. Ratti, “Real-time urban monitoring using cell phones: a case study in Rome,” IEEE Transactions on Intelligent Transportation Systems, vol. , no. , pp. –, .
[] J. C. Sutton, “GIS applications in transit planning and operations: A review of current practice, effective applications and challenges in the USA,” Transportation Planning and Technology, vol. , no. , pp. –, .
[] K.-H. Chen, C.-R. Dow, and S.-J. Guan, “NimbleTransit: Public transportation transit planning using semantic service composition schemes,” in Proceedings of the 2008 11th International IEEE Conference on Intelligent Transportation Systems, pp. –, Beijing, China, .
[] A. Simroth and H. Zähle, “Travel time prediction using floating car data applied to logistics planning,” IEEE Transactions on Intelligent Transportation Systems, vol. , no. , pp. –, .
[] A. Chepuri, J. Ramakrishnan, S. Arkatkar, G. Joshi, and S. S. Pulugurtha, “Examining travel time reliability-based performance indicators for bus routes using GPS-based bus trajectory data in India,” Journal of Transportation Engineering Part A: Systems, vol. , no. , pp. –, .
[] A. Shingare, A. Pendole, N. Chaudhari, P. Deshpande, and S. Sonavane, “GPS supported city bus tracking & smart ticketing system,” in Proceedings of the 2015 International Conference on Green Computing and Internet of Things (ICGCIoT), pp. –, Greater Noida, Delhi, India, .
[] A. Carter, “Intelligent transport systems,” Journal of Navigation, vol. , no. , pp. –, .
[] A. Sładkowski and W. Pamuła, Eds., Intelligent Transportation Systems: Problems and Perspectives, vol. , Springer International Publishing, .
[] D. Levinson, “The value of advanced traveler information systems for route choice,” Transportation Research Part C: Emerging Technologies, vol. , no. , pp. –, .
[] J. Zhang, F.-Y. Wang, K. Wang, W.-H. Lin, X. Xu, and C. Chen, “Data-driven intelligent transportation systems: a survey,” IEEE Transactions on Intelligent Transportation Systems, vol. , no. , pp. –, .
[] T. W. Cao, “Research on problems and status of road passenger stations in China,” Road Transport, vol. , no. , pp. –, .
[] M. Rahmani and H. N. Koutsopoulos, “Path inference from sparse floating car data for urban networks,” Transportation Research Part C: Emerging Technologies, vol. , pp. –, .
[] B. Y. Chen, H. Yuan, Q. Li, W. H. K. Lam, S.-L. Shaw, and K. Yan, “Map-matching algorithm for large-scale low-frequency floating car data,” International Journal of Geographical Information Science, vol. , no. , pp. –, .
[] G. Draijer, N. Kalfs, and J. Perdok, “Global positioning system as data collection method for travel research,” Transportation Research Record, vol. , pp. –, .
[] Y. Liu, X. Liu, S. Gao et al., “Social sensing: a new approach to understanding our socioeconomic environments,” Annals of the Association of American Geographers, vol. , no. , pp. –, .
[] A. T. Palma, V. Bogorny, Kuijpers, and L. O. Alvares, “A clustering-based approach for discovering interesting places in trajectories,” in Proceedings of the 23rd Annual ACM Symposium on Applied Computing, pp. –, Fortaleza, Ceara, Brazil, .
[] T. Bhattacharya, L. Kulik, and J. Bailey, “Extracting significant places from mobile user GPS trajectories: a bearing change based approach,” in Proceedings of ACM SIGSPATIAL GIS ’12, ACM, Redondo Beach, Calif, USA, .
[] Y. Zheng and X. Xie, “Learning travel recommendations from user-generated GPS traces,” ACM Transactions on Intelligent Systems and Technology (TIST), vol. , no. , article , .
[] J. Li, Q. M. Qin, L. You, and C. Xie, “Parking lot extraction method based on floating car data,” Journal of Wuhan University (Information Science Edition), vol. , no. , pp. –, .
[] S. Schroedl, K. Wagstaff, S. Rogers, P. Langley, and C. Wilson, “Mining GPS traces for map refinement,” Data Mining and Knowledge Discovery, vol. , no. , pp. –, .
[] J. Li, Q. Qin, J. Han, L.-A. Tang, and K. H. Lei, “Mining trajectory data and geotagged data in social media for road map inference,” Transactions in GIS, vol. , no. , pp. –, .
[] L. L. Cao and J. Krumm, “From GPS traces to a routable road map,” in Proceedings of the Workshop on Advances in Geographic Information Systems, pp. –, New York, NY, USA, .
[] L. Tang, X. Yang, Z. Kan, and Q. Li, “Lane-level road information mining from vehicle GPS trajectories based on Naïve Bayesian classification,” ISPRS International Journal of Geo-Information, vol. , no. , pp. –, .
[] C. Kuntzsch, M. Sester, and C. Brenner, “Generative models for road network reconstruction,” International Journal of Geographical Information Science, vol. , no. , pp. –, .
[] J. Y. Kang and H. S. Yong, “Mining spatio-temporal patterns in trajectory data,” Journal of Information Processing Systems, vol. , no. , pp. –, .
[] P.-R. Lei, T.-J. Shen, W.-C. Peng, and I.-J. Su, “Exploring spatial-temporal trajectory model for location prediction,” in Proceedings of the 12th IEEE International Conference on Mobile Data Management (MDM ’11), pp. –, Lulea, Sweden, June .
[] P. Zhao, X. Liu, W. Shi, T. Jia, W. Li, and M. Chen, “An empirical study on the intra-urban goods movement patterns using logistics big data,” International Journal of Geographical Information Science, pp. –, .
[] M. Lu, Z. C. Wang, and X. R. Yuan, “TrajRank: Exploring travel behaviour on a route by trajectory ranking,” in Proceedings of IEEE Pacific Visualization Symposium, pp. –, China, .
[] V. W. Zheng, Y. Zheng, X. Xie, and Q. Yang, “Collaborative location and activity recommendations with GPS history data,” in Proceedings of the 19th International Conference on World Wide Web (WWW ’10), pp. –, .
[] H. P. Hsieh, C. T. Li, and S. D. Lin, “Measuring and Recommending Time-Sensitive Routes from Location-Based Data,” in Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, vol. , Article No. , , Buenos Aires, Argentina.
[] J. Li, J. Wang, J. Zhang, Q. Qin, T. Jindal, and J. Han, “A probabilistic approach to detect mixed periodic patterns from moving object data,” GeoInformatica, vol. , no. , pp. –, .
[] M. Gerla, E.-K. Lee, G. Pau, and U. Lee, “Internet of vehicles: From intelligent grid to autonomous cars and vehicular clouds,” in Proceedings of the IEEE World Forum on Internet of Things (WF-IoT), pp. –, IEEE, .
[] J. Li, Q. Qin, C. Xie, and Y. Zhao, “Integrated use of spatial and semantic relationships for extracting road networks from floating car data,” International Journal of Applied Earth Observation and Geoinformation, vol. , no. , pp. –, .
[] M. Ester, H. P. Kriegel, J. Sander, and X. W. Xu, “A density-based algorithm for discovering clusters in large spatial databases with noise,” in Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, Portland, Ore, USA, .
[] J. W. Han, M. Kamber, and J. Pei, Data Mining: Concepts and Techniques, Chapter , Morgan Kaufmann, San Francisco, Calif, USA, rd edition, .
[] E. W. Forgy, “Cluster analysis of multivariate data: efficiency versus interpretability of classifications,” Biometrics, vol. , no. , pp. –, .
[] J. Demmel, “CS: Notes for Lecture ,” April , , Graph Partitioning, Part .
[] R. J. Campello, D. Moulavi, and J. Sander, “Density-based clustering based on hierarchical density estimates,” in Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data Mining, vol. , pp. –, .
[] L. McInnes, J. Healy, and S. Astels, “hdbscan: Hierarchical density based clustering,” Journal of Open Source Software, The Open Journal, vol. , no. , .
... The work in [20] utilized massive historical coach trajectory data to extract coach operation information such as station location, station name, coach schedule, and driving route. Such data is important as it facilitates the realization of intelligent traffic information services. ...
... The aforementioned literature review showed that the main challenges in this field are related to finding methods to collect accurate bus propagation and passenger demands information, and then to use such data in the development of intelligent bus management applications that aim to increase bus capacity utilization, reduce bus operation cost, and improve passenger satisfaction. Specifically, the studies in [5][6][7][8][9][10][11][12][13][14][15][16][17][18][19][20][21][22][23][24] addressed particular issues related to bus management applications (e.g., reducing bus bunching, finding optimal schedules, predicting bus arrival time, collecting coach operation information, and designing bus tracking hardware), but did not discuss design aspects pertaining to the software development of bus management information systems as a whole. On the other hand, the software engineering aspects to develop bus management information systems that focus on bus trip tracking were introduced in [25][26][27][28][29][30][31][32][33][34][35][36][37], but only the systems in [25][26][27] discussed driver, bus, and route setup steps besides trip tracking. ...
Article
Full-text available
This paper addresses various aspects related to the design, development, and validation of a web-based information system that is intended to facilitate the management of a bus transportation service offered by a Jordanian university to its staff and students. Passengers can use this system to track bus trips to find out how far a desired bus is from a specific location. Also, they can know about arrivals and departures of buses managed using this system. Specifically, this work explores UI design, data structures, database design, system architecture, and development methods to realize the required features (e.g., user roles, bus setup, driver assignment, bus routes, bus schedules, and trip monitoring) in the proposed bus location tracking system. It also suggests using a free open-source API, rather than the proprietary Google Maps API, to develop the interactive maps. The system also records trip information and solicits passenger feedback to allow reviewing and analyzing that data to enhance the quality of service, reduce operation cost, and improve passenger satisfaction. The conducted comparative analysis results illustrate that the open-source API is accurate, fast, and responsive similar to the proprietary API. Furthermore, the user survey output confirms that the deployed system is easy to use, helpful, fast, responsive, and accurate.
... Tang et al. [22] proposed a lane-level road network information mining method based on lane number and turning rules. In addition to the region of interest and path recognition, scholars have also done a lot of research on spatiotemporal pattern extraction based on trajectory data, for example, the method of automatically extracting passenger train operation information from historical track data [23]. Dong et al. [24] proposed a study on the temporal and spatial change of traffic accessibility under public health emergencies based on GPS trajectory. ...
Article
Full-text available
An airport ferry vehicle is a ground service vehicle used to transfer passengers between the far apron and the terminal. The travel time of ferry tasks in the airport ferry network is an important decision-making basis for ferry vehicle scheduling. This paper presents a graph-based method to mine the travel time between nodes in the airport ferry network. Firstly, combined with map and trajectory information, the method takes the terminal boarding gates, parking lots, and remote stands as road network nodes to build a complete airport ferry road network. Then, this paper uses big data processing technology to identify the travel time between regional connection nodes by data fusion through the temporal and spatial relationship between flight schedule and ferry vehicle GPS travel trajectory. Finally, the Floyd shortest path algorithm in graph theory is used to obtain the shortest path and travel time of all OD points. The experimental results show that all the ferry times calculated by the method proposed in this paper can better reflect the actual driving situation. This method saves the manpower, material resources, and time cost of on-site investigation and lays a foundation for the scheduling of ferry vehicles.
... Trajectory data provide an unprecedented and large amount of information that reflects the dynamics of mobile objects and thus are widely applicable to intelligent transportation, urban computing, social network analysis, and other fields [33,34]. As far as this research is concerned, the trajectory data demonstrate the actual moving routes of taxis, while an objective optimal route based on road networks exists for each passenger-carrying trip. ...
Article
Full-text available
Understanding how urban residents process road network information and conduct wayfinding is important for both individual travel and intelligent transportation. However, most existing research is limited to the heterogeneity of individuals’ expression and perception abilities, and the results based on small samples are weakly representative. This paper proposes a quantitative and population-based evaluation method of wayfinding performance on city-scale road networks based on massive trajectory data. It can accurately compute and visualize the magnitude and spatial distribution differences of drivers’ wayfinding performance levels, which is not achieved by conventional methods based on small samples. In addition, a systematic index set of road network features are constructed for correlation analysis. This is an improvement on the current research, which focuses on the influence of single factors. Finally, taking 20,000 taxi drivers in Beijing as a case study, experimental results show the following: (1) Taxi drivers’ wayfinding performances show a spatial pattern of a high level on arterial road networks and a low level on secondary networks, and they are spatially autocorrelated. (2) The correlation factors of taxi drivers’ wayfinding performances mainly include anchor point, road grade, road importance, road complexity, origin-destination length, and complexity, and each factor has a different influence. (3) The path complexity has a higher correlation with the wayfinding performance level than with the path distance. (4) There is a critical point in the taxi drivers’ wayfinding performances in terms of path distance. When the critical value is exceeded, it is difficult for a driver to find a good route based on personal cognition. This research can provide theoretical and technical support for intelligent driving and wayfinding research.
... Even so, to classify the algorithms more clearly, we divide them into three categories according to their main principles. The first is clustering of trajectory points and lines (Davies et al., 2006; Edelkamp and Schrödl, 2003; Kuntzsch et al., 2016; Li et al., 2019; Worrall and Nebot, 2007): road components are obtained by spatially clustering GPS points and are then connected according to different rules. ...
Article
With the gradual opening of floating car trajectory data, it has become possible to extract road network information from it. Currently, most road network extraction algorithms use uniform thresholds that ignore differences in trajectory density, and they consider only trajectory shape, not trajectory direction, which seriously affects the geometric precision and topological accuracy of their results. Therefore, this paper proposes an adaptive radius centroid drift clustering method that automatically adjusts clustering parameters according to track density and road width and uses trajectory direction to complete the topological connection of roads. The algorithm is verified with one day of floating car trajectory data from Futian District, Shenzhen, and its results are compared qualitatively and quantitatively with those of two other methods. The comparison indicates that the road network extracted by this algorithm has significantly better geometric precision and topological accuracy, and that the algorithm is suitable for big data processing.
... These trajectory data are widely used in smart transportation (Li et al., 2012), urban computing (Zheng et al., 2015), social sensing (Liu et al., 2015) and other fields because of their rich spatiotemporal location and semantic information. For example, Li et al. (2019) used vehicle trajectory data to extract coach operation information such as coach stations, routes and timetables, which provided data support for China's national road passenger transportation ticketing platform. Other studies used taxi trajectory data to investigate passenger-finding strategies, analyze public transportation spatiotemporally, and update road networks in order to optimize urban traffic (Wu et al., 2016; Tang et al., 2017; Tu et al., 2018). ...
Article
The trajectory data generated by various position-aware devices are widely used across society, but their conventional vector representation, and the analysis algorithms built on it, have high computational complexity. This makes it difficult to meet the requirements of real-time or near-real-time management and analysis of large-scale trajectory data. To address these challenges, this paper proposes a trajectory data management and analysis technology framework based on the Spatiotemporal Grid Model (STGM). First, trajectory data are represented by spatiotemporal grid codes instead of vector coordinates, which achieves dimensionality reduction and integrated management of high-dimensional heterogeneous trajectory data. Second, trajectory computing and analysis methods based on STGM are introduced that reduce the computational complexity of the algorithms. Furthermore, various types of trajectory mining and applications are realized on the basis of high-performance computing technologies. Finally, a prototype trajectory data management and analysis system based on STGM is developed, and experimental results verify the reliability and effectiveness of the proposed framework.
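As a rough illustration of representing trajectories by grid codes instead of vector coordinates, the sketch below maps each (longitude, latitude, time) point to integer cell indices on a uniform grid and collapses consecutive repeats. The uniform grid, the cell sizes, and the `shared_cells` similarity are assumptions for illustration only; STGM itself may use a different (for example, hierarchical) encoding.

```python
import math
from typing import List, Tuple

# Hypothetical uniform grid: (lon, lat, t) -> integer (ix, iy, it) cell indices.
Point = Tuple[float, float, float]  # (longitude in degrees, latitude in degrees, time in s)
Cell = Tuple[int, int, int]


def encode_trajectory(points: List[Point], dlon: float = 0.01,
                      dlat: float = 0.01, dt: float = 60.0) -> List[Cell]:
    """Map each point to its grid cell and drop consecutive duplicate cells."""
    cells: List[Cell] = []
    for lon, lat, t in points:
        cell = (math.floor(lon / dlon), math.floor(lat / dlat), math.floor(t / dt))
        if not cells or cells[-1] != cell:
            cells.append(cell)
    return cells


def shared_cells(a: List[Cell], b: List[Cell]) -> int:
    """Count cells visited by both trajectories: a cheap set intersection on integers."""
    return len(set(a) & set(b))


traj_a = encode_trajectory([(116.305, 39.905, 10), (116.306, 39.906, 20), (116.325, 39.905, 70)])
traj_b = encode_trajectory([(116.307, 39.904, 30), (116.355, 39.915, 200)])
print(traj_a)                        # [(11630, 3990, 0), (11632, 3990, 1)]
print(shared_cells(traj_a, traj_b))  # 1
```

Once points are integer codes, comparisons between trajectories reduce to hashing and set operations rather than geometric distance computations, which is the kind of complexity reduction the abstract describes.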
Article
This research paper evaluates travel time variability and reliability indices using Global Positioning System (GPS) trajectory data of bus trips collected along a selected bus route in Chennai, in southern India. Travel time reliability indices, such as the Planning Time Index (PTI), Buffer Time Index (BTI), and Buffer Time (BT), along with other statistical measures, are estimated over different time periods. Based on the Kolmogorov-Smirnov (KS) test, the Generalized Extreme Value (GEV) distribution is found to fit best, explaining bus travel time variability reasonably well. BT and the 95th percentile travel time are the most promising reliability measures, and their variation reasonably matches the variation of the k-value (the shape parameter of the GEV distribution) over time. The statistical distribution analysis also indicates that peak-hour travel times are better described by normal distributions. A generic model is developed for predicting volumes from bus journey speeds and is validated with travel time data from the same route during a different time period. The study also demonstrates a methodology for establishing Level-of-Service (LoS) criteria using reliability indicators. Finally, reliability indicators are classified with a clustering technique, considering segment-level travel time data, the Coefficient of Variation (CV) of travel time, and the Volume-to-Capacity (V/C) ratio. The study concludes that the most effective performance indicators for examining travel time variability on a given bus route are the 95th percentile travel time and BT.
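The abstract does not state its formulas, so the sketch below uses the commonly cited FHWA-style definitions as an assumption: BT is the 95th percentile travel time minus the mean, BTI is BT divided by the mean, and PTI is the 95th percentile divided by a free-flow travel time. The linear-interpolation percentile and the free-flow value passed in are also assumptions; the paper may define these differently.

```python
import math
from statistics import mean
from typing import Dict, Sequence


def percentile(values: Sequence[float], q: float) -> float:
    """q-th percentile (0-100) with linear interpolation between order statistics."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def reliability_indices(travel_times: Sequence[float], free_flow_time: float) -> Dict[str, float]:
    """Assumed FHWA-style definitions of Buffer Time, Buffer Time Index and Planning Time Index."""
    avg = mean(travel_times)
    p95 = percentile(travel_times, 95)
    bt = p95 - avg                      # Buffer Time: extra time over the average trip
    return {
        "mean": avg,
        "p95": p95,
        "BT": bt,
        "BTI": bt / avg,                # Buffer Time Index, as a fraction of the mean
        "PTI": p95 / free_flow_time,    # Planning Time Index relative to free-flow time
    }


# Hypothetical trip times in minutes (10, 11, ..., 30) and an assumed 15-minute free-flow time.
trips = list(range(10, 31))
print(reliability_indices(trips, free_flow_time=15.0))
```

For these hypothetical trips the mean is 20 and the 95th percentile is 29, giving BT = 9 minutes, BTI = 0.45, and PTI of about 1.93.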
Article
The prevalence of moving object data (MOD) brings new opportunities for behavior-related research. Periodic behavior is one of the most important behaviors of moving objects. However, existing periodicity detection methods assume that a moving object either has no periodic behavior at all or has only a single periodic behavior at one place. They therefore cannot handle many real-world situations in which a moving object has multiple periodic behaviors mixed together. To address this problem, this paper proposes a probabilistic periodicity detection method called MPDA. MPDA first identifies high-density regions with the kernel density method, then generates revisit time sequences based on those regions, and finally adopts a filter-refine paradigm to detect mixed periodicities. In the filter stage, candidate periods are identified by comparing the observed and reference distributions of revisit time intervals with the chi-square test; in the refine stage, a periodic degree measure is defined to examine the significance of the candidate periods and identify the accurate periods present in MOD. Synthetic datasets with various characteristics and two real-world tracking datasets validate the effectiveness of MPDA under various scenarios. MPDA has the potential to play an important role in analyzing complicated behaviors of moving objects.
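The following sketch is a generic chi-square periodicity check in the spirit of the filter stage above; it is not MPDA itself. It folds visit timestamps modulo one candidate period into bins and computes Pearson's chi-square statistic against a uniform reference, so visits concentrated at one phase give a large value. The uniform reference, the bin count, and the `folded_chi_square` helper are assumptions for illustration.

```python
from typing import Sequence


def folded_chi_square(visit_times: Sequence[float], period: float, bins: int = 24) -> float:
    """Pearson chi-square of phase-folded visit times against a uniform reference.

    Large values mean visits cluster at particular phases of the candidate period.
    """
    counts = [0] * bins
    for t in visit_times:
        phase = (t % period) / period                 # position within the period, in [0, 1)
        counts[min(int(phase * bins), bins - 1)] += 1
    expected = len(visit_times) / bins                # uniform reference count per bin
    return sum((c - expected) ** 2 / expected for c in counts)


DAY = 86400.0
# Hypothetical object that revisits a dense region at 08:30 every day for 10 days.
visits = [8.5 * 3600 + d * DAY for d in range(10)]
print(folded_chi_square(visits, DAY))        # about 230: strong daily periodicity
print(folded_chi_square(visits, 7 * 3600))   # about 28.4: the 7-hour candidate fits poorly
```

In practice the statistic would be compared against a chi-square critical value with `bins - 1` degrees of freedom to decide whether a candidate period survives the filter.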
Article
In this paper, we propose a novel approach, based on naïve Bayesian classification, for mining lane-level road network information (the number of traffic lanes and the turn rules of each lane) from low-precision vehicle GPS trajectories (MLIT). First, MLIT uses an adaptive density optimization method to remove outliers from the raw GPS trajectories based on their space-time distribution and density clustering. Second, MLIT acquires the number of lanes in two steps. The first step builds a naïve Bayesian classifier from the trace features of the road plane and road profiles and the real number of lanes found in the training samples. The second step determines the number of lanes for the test samples by applying this classifier to their known trace features. Third, MLIT infers the turn rules of each lane by tracking GPS trajectories. Experiments were conducted using GPS trajectories of taxis in Wuhan, China. Compared with human-interpreted results, the automatically generated lane-level road network information was of higher quality, displaying detailed road networks with the number of lanes and the turn rules of each lane.
Conference Paper
In this paper, we propose a novel visual analysis method TrajRank to study the travel behaviour of vehicles along one route. We focus on the spatial-temporal distribution of travel time, i.e., the time spent on each road segment and the travel time variation in rush/non-rush hours. TrajRank first allows users to interactively select a route, and segment it into several road segments. Then trajectories passing this route are automatically extracted. These trajectories are ranked on each road segment according to travel time and further clustered according to the rankings on all road segments. Based on the above ranking analysis, we provide a temporal distribution view showing the temporal distribution of travel time and a ranking diagram view showing the spatial variation of travel time. With real taxi GPS data, we present three use cases and an informal user study to show the effectiveness and usability of our method.
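The per-segment ranking step described above can be sketched as follows: for each road segment, trajectories are sorted by travel time and assigned ranks, giving every trajectory a rank vector that a later clustering step could group. The input layout (a dict from trajectory id to per-segment travel times) and the `rank_per_segment` helper are assumptions, not TrajRank's actual interface.

```python
from typing import Dict, List


def rank_per_segment(travel_times: Dict[str, List[float]]) -> Dict[str, List[int]]:
    """Rank trajectories on each road segment by travel time (1 = fastest).

    Ties keep input order because Python's sort is stable.
    """
    ids = list(travel_times)
    n_segments = len(travel_times[ids[0]])
    ranks = {tid: [0] * n_segments for tid in ids}
    for s in range(n_segments):
        # Bind s as a default argument so the key uses the current segment index.
        ordered = sorted(ids, key=lambda tid, s=s: travel_times[tid][s])
        for r, tid in enumerate(ordered, start=1):
            ranks[tid][s] = r
    return ranks


# Hypothetical per-segment travel times (seconds) for three trajectories on a two-segment route.
times = {"a": [10, 30], "b": [20, 10], "c": [15, 20]}
print(rank_per_segment(times))  # {'a': [1, 3], 'b': [3, 1], 'c': [2, 2]}
```

Ranking rather than using raw times makes segments of very different lengths comparable, which is presumably why ranks are what get clustered.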
Article
Modern cities, particularly megacities, have strong mobility in terms of human movement, traffic flow and flow of goods. Movement patterns of intra-urban goods and the ways they differ from human mobility and traffic flow patterns have seldom been explored because of data access and methodological limitations, especially from systemic and long-timescale perspectives. Increasingly available urban logistics big data have created a new opportunity to address this issue at unprecedented spatial and temporal resolutions. This research proposes an analytical framework for exploring intra-urban goods movement patterns by integrating spatial analysis, network analysis and spatial interaction analysis. Using daily urban logistics big data (over 10 million orders) provided by the largest online logistics company in Hong Kong (GoGoVan) from 2014 to 2016, we analysed two spatial characteristics (displacement and direction) of urban goods movement. The results showed that, unlike human mobility, goods displacement followed a bimodal Weibull distribution rather than a power-law or exponential distribution. The origin-destination flows of goods were used to build a spatially embedded network, and analysis of this network revealed that Hong Kong became increasingly connected through intra-urban freight movement. Finally, spatial interaction characteristics were revealed by fitting a gravity model. The distance decay effect was significantly smaller than that of human mobility; in other words, distance had little influence on the spatial interaction of goods movement over the three years studied. These findings could have policy implications for intra-urban logistics and urban transport planning.
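A basic gravity model can be written as $T_{ij} = k\,m_i m_j / d_{ij}^{\beta}$, where $\beta$ measures distance decay. Taking logarithms gives $\ln(T_{ij}/(m_i m_j)) = \ln k - \beta \ln d_{ij}$, a straight line in $\ln d_{ij}$ that ordinary least squares can fit. The sketch below assumes unit exponents on the masses and treats the masses as, for example, order totals per zone; the abstract does not specify the exact model form, so this is only an illustration of how a smaller fitted $\beta$ signals weaker distance decay.

```python
import math
from typing import List, Tuple

Flow = Tuple[float, float, float, float]  # (flow T_ij, mass m_i, mass m_j, distance d_ij)


def fit_gravity(flows: List[Flow]) -> Tuple[float, float]:
    """Fit T_ij = k * m_i * m_j / d_ij**beta by least squares in log space.

    ln(T_ij / (m_i * m_j)) = ln k - beta * ln d_ij is linear in ln d_ij.
    Returns (k, beta); a smaller beta means weaker distance decay.
    """
    xs = [math.log(d) for _, _, _, d in flows]
    ys = [math.log(t / (mi * mj)) for t, mi, mj, _ in flows]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx                  # equals -beta
    intercept = my - slope * mx        # equals ln k
    return math.exp(intercept), -slope


# Synthetic, noise-free flows generated with k = 2 and beta = 0.5, so the fit should recover them.
synthetic = [(2 * mi * mj / d ** 0.5, mi, mj, d)
             for mi, mj, d in [(100, 50, 1.0), (80, 60, 4.0), (30, 90, 9.0), (70, 20, 16.0)]]
print(fit_gravity(synthetic))  # approximately (2.0, 0.5)
```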
Article
Because parking is difficult in large and medium-sized cities, the demand for parking lot information keeps growing, and the level of detail and accuracy of parking lot information in electronic maps directly affects map service quality. In view of the problems with current parking lot surveying methods, this paper proposes an approach that automatically extracts parking lot locations from floating car data. The DBSCAN algorithm is used to detect point clusters located within parking areas, and a special spatial grid index is built to reduce the time complexity of the clustering algorithm. Experimental results validate the proposed method.
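The abstract does not describe its "special spatial grid index", so the sketch below assumes the simplest variant: a uniform grid with cell side `eps`, which guarantees that every neighbour within `eps` lies in the 3x3 block of cells around a point. This avoids comparing each point with all others. Input points are assumed to be projected coordinates (for example, metres), such as stop locations of floating cars; the `grid_dbscan` helper is hypothetical.

```python
import math
from collections import defaultdict, deque
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]  # projected coordinates, e.g. metres
NOISE = -1


def grid_dbscan(points: List[Point], eps: float, min_pts: int) -> List[Optional[int]]:
    """DBSCAN with a uniform grid index; returns a cluster id per point, -1 for noise."""
    # Cell side = eps, so every eps-neighbour lies in the 3x3 block of surrounding cells.
    grid: Dict[Tuple[int, int], List[int]] = defaultdict(list)
    for i, (x, y) in enumerate(points):
        grid[(math.floor(x / eps), math.floor(y / eps))].append(i)

    def neighbours(i: int) -> List[int]:
        x, y = points[i]
        cx, cy = math.floor(x / eps), math.floor(y / eps)
        found = []
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for j in grid.get((cx + dx, cy + dy), ()):
                    if math.hypot(points[j][0] - x, points[j][1] - y) <= eps:
                        found.append(j)
        return found  # includes i itself, as in standard DBSCAN

    labels: List[Optional[int]] = [None] * len(points)  # None = not yet visited
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE
            continue
        labels[i] = cluster                  # i is a core point: start a new cluster
        queue = deque(seeds)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster          # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_seeds = neighbours(j)
            if len(j_seeds) >= min_pts:      # only core points expand the cluster
                queue.extend(j_seeds)
        cluster += 1
    return labels


# Hypothetical stop points: two dense groups (candidate parking lots) and one isolated stop.
pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11), (50, 50)]
print(grid_dbscan(pts, eps=1.5, min_pts=3))  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

With reasonably uniform point density, each neighbourhood query touches only nine cells, so the overall cost is close to linear in the number of points instead of quadratic.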
Article
Location-based services allow users to record their geographic locations, which facilitates mining human movement activities. This article proposes recommending time-sensitive trip routes, consisting of a sequence of locations with associated timestamps, based on knowledge extracted from large-scale timestamped location sequence data (e.g., check-ins and GPS traces). We argue that a good route should consider (a) the popularity of places, (b) the visiting order of places, (c) the proper visiting time of each place, and (d) the proper transit time from one place to another. By devising a statistical model, we integrate these four factors into a route goodness function that measures the quality of a route. Equipped with this route goodness function, we recommend time-sensitive routes for two scenarios. The first is constructing a route from a user-specified source location and starting time. The second is composing a route between a specified source location and destination location given a starting time. To handle these queries, we propose a search method, Guidance Search, which consists of a novel heuristic satisfaction function that guides the search toward the destination location and a backward checking mechanism that improves the effectiveness of the constructed route. Experiments on the Gowalla check-in datasets demonstrate the effectiveness of our model in detecting real routes and performing cloze tests of routes, compared with other baseline methods. We also develop a system, TripRouter, as a real-time demo platform.
Article
This work aims at the inference of traffic networks from GPS trajectories. We perform geometry and topology reconstruction of the network in a multistep process. Our main contributions are the formulation of an explicit intersection model with a score function that accounts for consistency with the raw tracking data, as well as for a topology prior and the search for the best model by maximization of this score function using a Markov chain Monte Carlo sampler. We demonstrate the viability of our model-based approach with experiments on GPS data sets of varying size and data quality, followed by a comparison with results achieved by alternative, heuristic approaches.