A Deep Learning Approach for Vehicle Classification Using
Large-Scale GPS
Sina Dabiri a,b, Nikola Marković c, Kevin Heaslip a, Chandan K. Reddy b
aDepartment of Civil and Environmental Engineering, Virginia Tech, Blacksburg, VA, USA
bDepartment of Computer Science, Virginia Tech, Arlington, VA, USA
cDepartment of Civil and Environmental Engineering, University of Utah, Salt Lake City, UT, USA
Abstract
Transportation agencies are starting to leverage increasingly available GPS trajectory data to support their analyses and decision making. While this type of mobility data adds significant value to various analyses, one challenge that persists is the lack of information about the types of vehicles that performed the recorded trips, which clearly limits the value of trajectory data in transportation system analysis. To overcome this limitation of trajectory data, a deep Convolutional Neural Network for Vehicle Classification (CNN-VC) is proposed to identify a vehicle's class from its trajectory. Since a raw GPS trajectory does not convey meaningful information, this paper proposes a novel representation of GPS trajectories that is not only compatible with deep-learning models but also captures both vehicle-motion characteristics and roadway features. To this end, an open-source navigation system is also exploited to obtain more accurate information on travel time and distance between GPS coordinates. Before training the CNN-VC model, an efficient programmatic strategy is also designed to label large-scale GPS trajectories by means of vehicle information obtained through Virtual Weigh Station records. Our experimental results reveal that the proposed CNN-VC model consistently outperforms both classical machine-learning algorithms and other deep-learning baseline methods. From a practical perspective, the CNN-VC model allows us to label raw GPS trajectories with vehicle classes, thereby enriching the data and enabling more comprehensive transportation studies, such as the derivation of vehicle class-specific origin-destination tables that can be used for planning.
Keywords: Deep Learning, Vehicle Classification, GPS Data, CNN

Preprint submitted to Transportation Research Part C, February 24, 2019

1. Introduction

The advances in the Global Positioning System (GPS) and the ever-increasing market penetration of GPS-equipped devices (e.g., smartphones) have enabled the generation of massive spatiotemporal trajectory data that capture human mobility patterns. The availability of large-scale trajectory data has motivated transportation agencies to leverage such information for measuring transportation network performance and supporting their decision making. However, even though this type of mobility data adds significant value to various transportation analyses, one challenge that persists is the lack of information about the classes of vehicles that performed the recorded trips. Namely, users of different GPS navigation apps (e.g., Google Maps) typically do not specify the type of vehicle they are driving, making it unclear to a transportation analyst whether a recorded trip was performed by a passenger car, a small commercial vehicle, or an eighteen-wheeler truck. Obviously, having this piece of information would be extremely useful for a wide range of applications in Intelligent Transportation Systems (ITS), including emission control, pavement design, traffic flow estimation, and urban planning [16]. Furthermore, unlike fixed-point traffic flow sensors (e.g., loop detectors and video cameras), GPS sensors are not limited by sparse coverage and can be considered an efficient and ubiquitous data source. Accordingly, our main objective in this paper is to enrich the increasingly available trajectory data by developing a model that is capable of identifying the type of vehicle that produced a GPS trajectory.
However, the definition of vehicle categories is highly contingent on the ITS application and the data-collection sensors. For example, vehicles can be classified by the number of axles or by weight for pavement design and Electronic Toll Collection (ETC) systems. Among various categorizations, the vehicle classification proposed by the US Federal Highway Administration (FHWA) in the mid-1980s has been accepted as the standard scheme and has served as the basis for the majority of state counting efforts. The number of axles, the spacing between axles, the first-axle weight, and the gross vehicle weight are the main factors for categorizing vehicles in the FHWA system. Table 1 provides information on the 13 vehicle categories according to the FHWA definitions. The FHWA vehicle classification may vary slightly from state to state depending on the types of vehicles that are allowed to operate in that state. Nonetheless, in real-world situations, vehicles can be regrouped into coarser-grained classes, such as {light, medium, heavy} [19], {passenger cars, trucks} [20], and {sedan, pickup truck, SUV/minivan} [14].
The common approach for classifying vehicles is based on fixed-point traffic sensors, which fall into two groups: 1) in-roadway sensors that are embedded in pavement or attached to road surfaces, including inductive-loop detectors, magnetometers, and Weigh-In-Motion (WIM) systems; and 2) over-roadway sensors that are mounted above the surface, including microwave radar sensors, laser radar sensors, and ultrasonic infrared sensors. When a vehicle crosses them, these sensors are capable of measuring the vehicle's axle information (e.g., the number of axles and the spacing between them). The video image processor (VIP) is another over-roadway sensor that has been extensively deployed for vehicle classification,
Table 1: FHWA Vehicle Classification System
Class 1: Motorcycles
Class 2: Passenger Cars
Class 3: Other Two-Axle, Four-Tire, Single-Unit Vehicles
Class 4: Buses
Class 5: Two-Axle, Six-Tire, Single-Unit Trucks
Class 6: Three-Axle Single-Unit Trucks
Class 7: Four or More Axle Single-Unit Trucks
Class 8: Four or Fewer Axle Single-Trailer Trucks
Class 9: Five-Axle Single-Trailer Trucks
Class 10: Six or More Axle Single-Trailer Trucks
Class 11: Five or Fewer Axle Multi-Trailer Trucks
Class 12: Six-Axle Multi-Trailer Trucks
Class 13: Seven or More Axle Multi-Trailer Trucks
in particular due to advances in vision-based techniques [11, 14]. After a vehicle is detected through VIP, computer-vision algorithms are exploited to first extract robust features from images and/or videos, and then classify the vehicle into pre-defined groups using supervised learning algorithms. However, fixed-point traffic flow sensors suffer from several major shortcomings: 1) the installation and maintenance of such technologies not only require extra costs but also necessitate road closures and physical changes to road infrastructure; 2) their performance is subject to errors in various situations such as inclement weather, high-speed road segments, and traffic congestion; and 3) they can be used only at the location where they are installed, while many ITS applications require dynamic vehicle classification across a wide area.
One cost-effective way to address the above-mentioned issues is to use well-established positioning tools such as the Global Positioning System (GPS), which enables recording the spatiotemporal information of vehicles' trips. Unlike traffic flow sensors, GPS infrastructure does not impose extra costs, as the system has already been designed for purposes not related to traffic management. Furthermore, GPS data are not limited by sparse coverage and can be considered an efficient and ubiquitous data source for several transportation-domain applications, including traffic state estimation, traffic incident detection, mobility pattern extraction, travel demand analysis, and transport mode inference [7]. Accordingly, this study seeks to harness GPS data for addressing the vehicle-classification problem.
Before proceeding to the development of a vehicle-classification model, for the first time, we label a large-scale GPS trajectory dataset according to the fine-grained FHWA vehicle categories. A vehicle's GPS trajectory is constructed by connecting a sequence of GPS points, recorded by means of an on-board GPS-enabled device, where a GPS point contains information about the device's geographic location at a particular moment. The GPS data are recorded from vehicles that traveled in the State of Maryland over a period of four months in 2015. To this end, a separate WIM dataset that contains the FHWA class information of the vehicles that passed Virtual Weigh Stations (VWS) is used. An efficient programmatic labeling strategy is designed to assign the vehicle class information from the WIM dataset to the GPS trajectories that have passed the corresponding VWS. Our programmatic labeling is sufficiently accurate while saving cost and time compared to manual labeling, which is too expensive and time-consuming.
Having a large volume of labeled GPS trajectories, the GPS-based vehicle-classification problem can be examined for several vehicle-categorization scenarios. However, a raw GPS trajectory contains only a sequence of GPS points, without any meaningful correlations or explicit features for feeding into supervised models. Feature engineering is a common approach for extracting hand-crafted features using descriptive statistics of motion characteristics such as maximum speed and acceleration. Nonetheless, feature engineering not only requires expert knowledge but also involves biased engineering justification and vulnerability to traffic and environmental conditions [6]. For example, different types of vehicles may have the same speed in a traffic jam. Automated feature learning methods such as deep-learning architectures are a remedy for the shortcomings of hand-designed features. However, a raw GPS trajectory with a sequence of GPS coordinates and timestamps neither has a structure adaptable for deep-learning algorithms nor conveys meaningful information. Accordingly, before deploying deep-learning algorithms, a novel GPS representation is designed to convert the raw GPS trajectory into a structure that contains both the vehicle-motion characteristics and the roadway-geometric information associated with the GPS trajectory. Afterward, a deep Convolutional Neural Network for Vehicle Classification (CNN-VC) is proposed to identify the vehicle class from its GPS trajectory. The developed classification method can subsequently be deployed to identify the vehicle classes of the remaining GPS trajectories in the original dataset, which adds a very important dimension to a dataset that is widely used to support analyses of the Maryland State Highway Administration (e.g., deriving class-specific trip tables or link-based traffic volumes). Accordingly, this paper makes the following contributions:
• Labeling a large-scale GPS trajectory dataset based on the FHWA classification scheme. An efficient programmatic approach is developed to label GPS trajectories by means of vehicle class information obtained from VWS vehicle records. Information from an open-source routing engine is also exploited to improve the accuracy of our labeling approach.

• Designing a new representation for raw GPS trajectories. First, a GPS trajectory is considered as a sequence of GPS legs, where each leg is the route segment between two consecutive GPS points. Then, a feature vector, combining motion-related and road-related features, is computed for every GPS leg. Concatenating the feature vectors corresponding to all GPS legs generates an efficient representation for each GPS trajectory.

• Developing a deep Convolutional Neural Network for Vehicle Classification (CNN-VC). For the first time in this domain, a CNN-based deep-learning model is developed to identify a vehicle's class from the proposed GPS representation. The CNN-VC comprises a stack of CNN layers for extracting abstract features from the GPS representation, a pooling operation for encapsulating the most important information, and a softmax layer for performing the classification task.

• Conducting an extensive set of experiments for evaluating the proposed model. In addition to classical machine-learning techniques, several state-of-the-art deep-learning algorithms are developed as baselines. Experimental results reveal that our CNN-VC model clearly outperforms both classical supervised algorithms and deep-learning architectures.
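As a minimal sketch of the leg-based representation introduced in the second contribution above (function and variable names here are illustrative, not from the paper), each leg between consecutive GPS points can be summarized by a small motion-feature vector; in the full representation, road-related features obtained from the routing engine would be appended to each leg's vector:

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two GPS coordinates."""
    r = 6371000.0
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = (math.sin(dp / 2) ** 2
         + math.cos(math.radians(lat1)) * math.cos(math.radians(lat2))
         * math.sin(dl / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def trajectory_to_legs(points):
    """points: time-ordered list of (lat, lon, t) tuples.

    Returns one motion-feature vector per leg: [distance (m), travel time (s),
    average speed (m/s)]. Concatenating these vectors yields the per-trajectory
    representation described in the text (road-related features omitted here).
    """
    legs = []
    for (lat1, lon1, t1), (lat2, lon2, t2) in zip(points, points[1:]):
        dist = haversine_m(lat1, lon1, lat2, lon2)
        dt = max(t2 - t1, 1e-6)  # guard against zero-duration legs
        legs.append([dist, dt, dist / dt])
    return legs
```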
The rest of this article is organized as follows. After reviewing related work in Section 2, the datasets and the labeling strategy are described in Section 3. The details of our proposed framework are elaborated in Section 4. In Section 5, numerous experiments are conducted to evaluate the performance of the proposed framework. Finally, conclusions are drawn in Section 6.
2. Related Works

Since vehicle classification is essential for many tasks in ITS, a considerable amount of literature has been published on frameworks for identifying a vehicle's class. The two primary types of sensors that have been used to develop vehicle-classification systems are: 1) fixed-point traffic flow sensors, and 2) floating sensors using GPS data. In this section, after briefly reviewing the literature on the first group, the few studies on vehicle classification using GPS data are examined.
2.1. Fixed-point sensors for vehicle classification

A large and growing body of literature has addressed the vehicle-classification problem using various fixed-point traffic data-collection systems over the past decades. In-roadway sensors (e.g., loop detectors, magnetic sensors, and piezoelectric sensors) and over-roadway sensors (e.g., radar sensors, infrared sensors, and VIP) are the two main types of traffic flow sensors that have been exploited for classifying vehicles. Since in-roadway sensors interrupt traffic flow during installation and maintenance, they are more appropriate for urban areas than for high-speed roads such as freeways. On the other hand, over-roadway sensors are more appropriate for capturing vehicle information on high-speed, high-capacity roads, as they are much less disruptive to traffic flows [20]. The idea behind vehicle classification using fixed-point sensors is to capture vehicles' physical characteristics (e.g., axle configuration and length) and then categorize vehicles according to a pre-defined set of classes. A dual-loop detector system, as an example of an in-roadway sensor, can estimate the length of a vehicle by computing the speed and occupancy time, and thereby establish a length-based classification system [4]. A VIP system, as an example of an over-roadway sensor, models vehicles as rectangular patches in a sequence of images recorded by a camera. Afterward, vehicles are classified into several groups using visual features (e.g., width, length, area, and perimeter) computed by traditional vision-based techniques [14] or abstract features learned automatically by deep-learning algorithms [8]. Sensors can also be integrated to form a more robust classifier for fine-grained classification. A WIM system, which is comprised of piezoelectric sensors, VIP sensors, loop detectors, etc., is capable of classifying vehicles based on the 13-category FHWA classification scheme [12]. Further details on the pros and cons of various vehicle-classification systems using fixed-point technologies are available in [20]. Since the specific objective of this study is to develop a GPS-based vehicle-classification system, we focus on the studies that have harnessed GPS data for classifying vehicles.
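As an illustration of the dual-loop idea mentioned above, vehicle length follows from speed (loop spacing divided by the actuation time offset between the two loops) and occupancy time. The detector geometry and length bins below are assumptions for the sketch, not values from [4]:

```python
def classify_by_length(loop_spacing_m, dt_between_loops_s, occupancy_s,
                       loop_length_m=1.8):
    """Estimate vehicle length from dual-loop measurements and bin it.

    speed  = loop spacing / time offset between the two loop actuations
    length = speed * occupancy - loop length
    (occupancy spans the vehicle length plus the loop's own length)
    """
    speed = loop_spacing_m / dt_between_loops_s
    length = speed * occupancy_s - loop_length_m
    # Illustrative length bins, not the FHWA scheme.
    if length < 6.0:
        return length, "short (passenger car)"
    if length < 12.0:
        return length, "medium (single-unit truck)"
    return length, "long (combination truck)"
```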
2.2. GPS-based vehicle classification

Despite the popularity of fixed-point traffic flow sensors, they incur a high initial cost for installation and a permanent cost for maintenance. Furthermore, they are unable to spatially cover a wide area, and in turn their usage is limited to the installation location. Probe vehicles equipped with positioning tools such as GPS can be a viable alternative to fixed-point sensors, due to the fast-growing market-penetration rate of GPS-enabled devices (e.g., smartphones). GPS trajectories, which summarize a vehicle's spatiotemporal information, can be recorded while vehicles are performing regular trips. Over the past decade, much research has been carried out to harness GPS trajectories for several mobility applications, ranging from vehicle fleet management, human mobility behavior, and inferring significant locations to travel mode detection and travel anomaly detection [7]. A comprehensive and systematic review of various aspects of trajectory data mining, including trajectory data preprocessing, trajectory data management, and several trajectory mining tasks, is available in [23]. Although numerous studies have leveraged GPS data in various mobility applications, only a few have developed vehicle-classification systems based on GPS trajectories.
[20], for the first time, utilized GPS data to distinguish passenger cars from trucks, yet on a very small dataset with 52 samples of passenger cars and 84 samples of trucks. Compared to speed-related features, they found that features related only to acceleration and deceleration (e.g., the proportions of acceleration and deceleration larger than 1 m/s², and the standard deviations of acceleration and deceleration) result in better classification performance. The acceleration-based features are then passed into a Support Vector Machine (SVM) classifier for the classification task. In a recent study by [19], a deep-learning framework based on Recurrent Neural Networks (RNN) was developed to classify vehicles into three categories: light-duty, mid-duty, and heavy-duty. The input to the RNN model is sequential point-wise GPS information. Such a sequence is created by computing a set of point-wise features, including distance, time, speed, acceleration, and road type, for every GPS point in a GPS track. Their proposed model was trained using almost 1 million GPS tracks collected by a fleet intelligence company. Note that a growing body of literature has recently leveraged the power of deep-learning algorithms for solving various applications in the transportation domain [5, 18].
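The acceleration-based features described for [20] can be reproduced from a speed profile roughly as follows; the 1 m/s² threshold comes from the text, while the exact bookkeeping (sampling interval, population vs sample deviation) is an assumption of this sketch:

```python
import statistics

def acceleration_features(speeds, dt=1.0, threshold=1.0):
    """speeds: speed samples (m/s) at a fixed interval dt (s).

    Returns the proportions of acceleration/deceleration events exceeding
    `threshold` (m/s^2) and the standard deviations of the acceleration and
    deceleration magnitudes, mirroring the features fed to the SVM in [20].
    """
    acc = [(v2 - v1) / dt for v1, v2 in zip(speeds, speeds[1:])]
    accel = [a for a in acc if a > 0]          # acceleration events
    decel = [-a for a in acc if a < 0]         # deceleration magnitudes
    n = max(len(acc), 1)
    return {
        "prop_accel_gt_thr": sum(a > threshold for a in accel) / n,
        "prop_decel_gt_thr": sum(d > threshold for d in decel) / n,
        "std_accel": statistics.pstdev(accel) if len(accel) > 1 else 0.0,
        "std_decel": statistics.pstdev(decel) if len(decel) > 1 else 0.0,
    }
```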
A similar research track is travel mode detection using GPS trajectories. There, the objective is to identify the transportation mode(s) used by travelers for making their regular trips based on GPS data. Analogous to the vehicle-classification problem, the travel mode detection problem consists of two main steps: (1) extracting features from GPS logs, either by means of feature engineering [24] or automated feature learning methods [6], and (2) feeding the features to learning algorithms for the classification task. However, distinguishing between the GPS patterns of transportation modes (e.g., walk and car) is clearly less challenging than discriminating between the GPS patterns of vehicle types (e.g., passenger cars and pickups).
To the best of our knowledge, no GPS dataset annotated with the FHWA vehicle categories is available. Thus, we first design an efficient time-based matching process for labeling a large number of GPS trajectories with the help of a separate WIM dataset, which contains the FHWA class information. Furthermore, a novel representation for GPS tracks is proposed that considers both the kinematic motion characteristics of vehicles and the roadway features for all road segments between every two consecutive GPS points in a GPS track. Finally, for the first time, a convolutional deep-learning model is applied to the new GPS representation to discriminate GPS trajectories according to their vehicle classes.
3. Datasets description and labeling strategy

The main purpose of this section is to develop an efficient scheme for labeling a large-scale GPS trajectory dataset based on the FHWA classification scheme. To the best of our knowledge, no GPS dataset has yet been labeled according to a fine-grained vehicle-class system. Such a valuable labeled dataset can be exploited in a variety of trajectory mining and mobility studies besides the vehicle-classification problem. For example, knowing the vehicle classes for all recorded trips would enable the derivation of trip tables for different vehicle classes (e.g., separate tables for passenger and commercial vehicles) and the estimation of not only aggregate link-based volumes but also their vehicle compositions.

However, manual labeling of a large-scale dataset is obviously an expensive and time-consuming task, which calls for a programmatic labeling approach. To this end, vehicle class information obtained from VWS records is deployed as an auxiliary dataset for labeling GPS trajectories. Information from an open-source routing engine is also extracted to maximize the quality of the labeling process. Using these resources, an efficient programmatic approach for labeling GPS trajectories is proposed that not only dramatically saves time and cost but also ensures accurate labeling of the GPS data. In this section, we first describe the GPS trajectory data, the VWS data, and the Open Source Routing Machine (OSRM), and then elaborate on the labeling approach. Lastly, the distribution of labeled GPS data among vehicle classes, obtained by applying our labeling strategy to 20 million GPS trajectories, is provided.
3.1. Data sources description

3.1.1. GPS trajectory

The trajectory data used in this paper come from one of the leading GPS data providers in North America, which collects data from several hundred million vehicles and devices across the globe. As an example, Figure 1(a) presents a single GPS trajectory, which consists of vehicle locations and corresponding timestamps. The considered dataset includes 20 million GPS trajectories over a period of four months in 2015: February, June, July, and October. Every GPS trajectory is identified by a unique trip ID and a device ID. Trajectories have also been categorized into three groups based on gross vehicle weight: (1) below 14,000 lbs (FHWA classes 1-3), (2) between 14,000 lbs and 26,000 lbs (FHWA classes 4-6), and (3) above 26,000 lbs (FHWA classes 7-12). These trajectories span the entire state of Maryland and include a relatively high percentage of vehicles with weights above 14,000 lbs, as shown in Figure 1(b).
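The weight-based grouping above reduces to a small lookup; a sketch of the mapping described in the text (the handling of the exact 14,000 and 26,000 lbs boundaries is an assumption):

```python
def weight_group(gross_weight_lbs):
    """Map gross vehicle weight to the three coarse groups used for the
    20 million trajectories."""
    if gross_weight_lbs < 14000:
        return 1   # below 14,000 lbs: FHWA classes 1-3
    if gross_weight_lbs <= 26000:
        return 2   # 14,000-26,000 lbs: FHWA classes 4-6
    return 3       # above 26,000 lbs: FHWA classes 7-12
```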
3.1.2. Virtual Weigh Station Data

A VWS is a roadside enforcement facility that eliminates the need for continuous staffing by automatically identifying and recording vehicles' characteristics in real time. Each station is comprised of several components, including WIM sensors (for computing vehicles' weights and axle configurations), a camera system (for real-time identification of vehicles), screening software (for integrating data from the WIM and camera systems), and communication infrastructure (for transferring the recorded information to authorized personnel such as mobile enforcement teams). Accordingly, the WIM sensors in a VWS are capable of determining
(a) Sample trajectory (b) Trip info
Figure 1: A sample GPS trajectory and info about 20 million trips (i.e., vehicle weights and data providers).
the approximate gross weight and axle configurations (e.g., weight, number, and spacing) in real time, which are in turn used for vehicle classification [2].

The VWS records considered in this study include information collected at seven VWS in the state of Maryland over a period of four months in 2015: February, June, July, and October. However, due to the programmatic nature of the labeling strategy described in the following sections, only three stations installed on road segments with one lane per direction are utilized for labeling. Figure 2 illustrates the approximate locations and the roadway network around these three VWS. For every vehicle crossing the WIM sensors, various information is recorded, including: crossing time, crossing lane, class, number of axles, speed, gross weight, and length. The class definitions are based on the FHWA vehicle-classification system, as described in Table 1. Classes 1 and 13 are not available in this dataset. Among all attributes, only a vehicle's crossing time, class, and gross weight are used in our labeling process. It should be noted that there is no predefined correspondence between the vehicles in the GPS and VWS datasets. Indeed, the main goal of our labeling strategy is to match vehicles across these two datasets.
3.1.3. Open Source Routing Machine

The Open Source Routing Machine (OSRM) is a web-based navigation system that computes the fastest path between an origin-destination pair using OpenStreetMap (OSM) data. The other functional services that OSRM provides include map matching, nearest-location matching, traveling-salesman-problem solving, and vector tile generation. All of these services are accessible through its Application Programming Interface (API) methods. The main advantage of the OSRM APIs is that they can be used for free and without usage limits, which makes them a valuable resource for research purposes. The following OSRM services
Figure 2: The locations and roadway network associated with the three Virtual Weigh Stations (VWS) in the state of Maryland used for the labeling process. The green map markers depict the locations of the VWS.
are exploited throughout this study for both the labeling process and the vehicle-classification system:

• Map matching: Given a set of GPS points (e.g., a GPS trajectory), the map-matching service snaps the GPS points to the OSM network in the most plausible way. Attaching GPS points to the most likely nodes in the road network alleviates the location errors associated with GPS technology. Erroneous GPS logs that cannot be matched successfully are considered outliers and discarded. When map matching is completed, a variety of fine-grained information between every two consecutive matched points is provided, including travel distance, travel time, number of turn movements, number of intersections, etc.

• Nearest: The nearest service snaps a GPS coordinate to the nearest location in the traffic network. This service is particularly useful when the accurate location of a single GPS log, rather than a GPS trajectory, matters. The name of the street to which the GPS log is snapped is the most useful information provided by the nearest service.

• Route: The route service finds the fastest route between two GPS coordinates. Although the map-matching service can provide similar information, the route service is a more optimized way to compute travel time and distance for only one origin-destination pair.
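These three services map onto OSRM's public HTTP API (v5), where each request has the form `/{service}/v1/{profile}/{coordinates}` with coordinates given in longitude,latitude order. The helper below only constructs request URLs; the demo-server host is an example, not the endpoint used in the study:

```python
def osrm_url(service, coords, host="http://router.project-osrm.org",
             profile="driving"):
    """Build an OSRM v5 request URL for the `route`, `nearest`, or `match`
    service; coords is a list of (lon, lat) pairs."""
    coord_str = ";".join(f"{lon},{lat}" for lon, lat in coords)
    return f"{host}/{service}/v1/{profile}/{coord_str}"

# Fastest route between one origin-destination pair:
route = osrm_url("route", [(-76.61, 39.29), (-76.52, 39.30)])
# Snap a single GPS log to the nearest network location:
nearest = osrm_url("nearest", [(-76.61, 39.29)])
# Map-match a whole trajectory; annotations expose per-leg distance/duration:
match = osrm_url("match", [(-76.61, 39.29), (-76.60, 39.29),
                           (-76.59, 39.30)]) + "?annotations=true"
```

Issuing these requests (e.g., with any HTTP client) returns JSON responses from which the per-leg travel times and distances used later in this paper can be read.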
3.2. Labeling GPS trajectories

First, a vehicle's GPS trajectory is defined as follows; this definition is used throughout this study:

Definition 1 (Vehicle's GPS trajectory). A vehicle's GPS trajectory T is defined as a sequence of time-stamped GPS points p ∈ T, T = [p_1, ..., p_N]. Each GPS point p is a tuple of latitude, longitude, and time, p = [lat, lon, t], which identifies the geographic location of point p at time t.
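Definition 1 translates directly into a minimal data structure (an illustrative sketch, not code from the paper):

```python
from typing import List, NamedTuple

class GPSPoint(NamedTuple):
    lat: float   # latitude (degrees)
    lon: float   # longitude (degrees)
    t: float     # timestamp (e.g., seconds since epoch)

# A trajectory T = [p_1, ..., p_N] is simply a time-ordered list of points.
Trajectory = List[GPSPoint]

def is_time_ordered(T: Trajectory) -> bool:
    """Check the defining property that points are time-stamped in sequence."""
    return all(p1.t <= p2.t for p1, p2 in zip(T, T[1:]))
```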
The key idea behind our proposed labeling process is to compute the crossing time window of a vehicle's GPS trajectory at a VWS and then identify its class among those vehicles that crossed the station during the computed time window. Thus, our proposed labeling strategy consists of the following five major steps, which are taken in sequence:

1. Retrieving initial GPS trajectories. GPS trajectories that have potentially crossed one of the three stations are filtered out from the whole GPS dataset. This eliminates the need to involve a large volume of GPS trajectories that obviously have not crossed any VWS and thus have no chance of being labeled. Temporarily removing these GPS trajectories from our labeling pipeline significantly reduces the computation time of the next steps.

2. Applying filtering criteria. Every GPS trajectory obtained from the first step is re-examined through several filtering rounds to guarantee that it has truly crossed one of the three VWS. The GPS trajectories that have crossed a VWS, along with their corresponding VWS, are determined and used in the next step.

3. Predicting the crossing time window. For every trajectory identified in the previous step, the time window in which the GPS trajectory crossed the VWS is predicted.

4. Labeling GPS trajectories. The class information of all vehicles that crossed the VWS during the predicted crossing time window is retrieved. The GPS trajectory is labeled only if all crossing vehicles have the same class.

5. Augmentation of labeled data. Trajectories with the same device ID were operated by the same vehicle. Hence, after labeling a portion of the trajectories retrieved in the first step, trajectories with the same device ID as the labeled trajectories are fetched from the database containing all 20 million GPS traces and labeled accordingly.

The above five steps are explained in the following sections.
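Steps 4 and 5 reduce to two small operations once the crossing time window is known; the following is a sketch under assumed record layouts (tuples of crossing time and FHWA class for the VWS records, and trip-to-device mappings for the GPS data), not the paper's implementation:

```python
def label_from_window(vws_records, window_start, window_end):
    """Step 4: collect the classes of all vehicles that crossed the VWS
    within the predicted time window; label only on unanimous agreement.

    vws_records: list of (crossing_time, fhwa_class) tuples for one station.
    Returns the unanimous FHWA class, or None (trajectory left unlabeled).
    """
    classes = {c for t, c in vws_records if window_start <= t <= window_end}
    return classes.pop() if len(classes) == 1 else None

def propagate_by_device(labels, trips):
    """Step 5: trips sharing a device ID were driven by the same vehicle,
    so copy each label to every trip recorded by the same device.

    labels: {trip_id: fhwa_class}; trips: {trip_id: device_id}.
    """
    device_class = {trips[tid]: cls for tid, cls in labels.items()}
    return {tid: device_class[dev] for tid, dev in trips.items()
            if dev in device_class}
```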
3.2.1. Step 1: Retrieving initial GPS trajectories

Using the OSM API, only trajectories that include a GPS point within a two-mile radius of the three VWS are queried for subsequent analysis. This reduces the number of trajectories from the initial 20 million to about 300,000. The rationale behind this quick step is to dramatically expedite the processing time of steps 2 to 4, which are the most computationally intensive steps in our labeling pipeline.
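This radius query can be sketched with a plain haversine check; only the two-mile radius comes from the text, while the station coordinates in the test below are made up:

```python
import math

TWO_MILES_M = 3218.69  # two miles in meters

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two GPS coordinates."""
    r = 6371000.0
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = (math.sin(dp / 2) ** 2
         + math.cos(math.radians(lat1)) * math.cos(math.radians(lat2))
         * math.sin(dl / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def near_any_vws(points, stations, radius_m=TWO_MILES_M):
    """True if any GPS point (lat, lon, t) of a trajectory lies within
    `radius_m` of any station (lat, lon); this is the coarse filter that
    shrinks 20 million trajectories to roughly 300,000 candidates."""
    return any(haversine_m(lat, lon, s_lat, s_lon) <= radius_m
               for lat, lon, _t in points
               for s_lat, s_lon in stations)
```

In practice this check would run as a spatial query against the trajectory database rather than a Python loop over all points.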
3.2.2. Step 2: Applying filtering criteria

The trajectories retrieved in step 1 have not necessarily passed one of the three stations, for several reasons. For example, both directions of a roadway in OSM are often represented by one 'way' component, where a 'way' is one of the basic elements in OSM used to represent linear features such as roads. Therefore, a trajectory found in the first step might travel in the reverse direction with respect to the VWS, as shown in Figure 3(a). Also, some of the retrieved trajectories only pass close to a VWS without crossing the station. Accordingly, we implement several strict filtering criteria to ensure that a GPS trajectory has crossed the station. The possibility of a GPS trajectory crossing a station is examined one by one for all three stations. Since the stations are far apart, we assume that a GPS trajectory can cross only one station within a few hours. Thus, if a GPS trajectory passes all criteria for the first station, the remaining stations are not examined for that particular trajectory. Finally, our filtering criteria in this step have been designed to simultaneously obtain the GPS points of a trajectory that are adjacent to the crossed VWS, which is essential information for the next step (i.e., predicting and matching the crossing time window). This is another way to improve the overall efficiency of our labeling scheme. Before proceeding to the filtering criteria, the following notation is introduced for an easier understanding of the filtering approach.
Definition 2 (Before/after GPS point lists). Let a VWS be represented by S. The
before/after GPS point lists of the trajectory T with respect to the station S, denoted as
BL_TS / AL_TS, are defined as all GPS points in T whose timestamp attribute is
less/greater than the approximate time at which the GPS trajectory T might have passed the
station S. This crossing time is denoted as CT_S. Note that the before/after lists are defined
based on time, not location. BL_TS and AL_TS are mathematically defined as follows:

BL_TS = {p | p[t] <= CT_S, p in T, the station is S}
AL_TS = {p | p[t] >= CT_S, p in T, the station is S}
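The before/after split of Definition 2 can be sketched in a few lines of Python (a minimal illustration; the `GPSPoint` type and its field names are our own assumptions, not the paper's code):

```python
from typing import NamedTuple

class GPSPoint(NamedTuple):
    lat: float
    lon: float
    t: float  # Unix timestamp in seconds

def split_before_after(trajectory, crossing_time):
    """Partition a trajectory into the before/after GPS point lists
    (BL_TS / AL_TS) around an approximate crossing time CT_S.
    The split is by timestamp, not by location."""
    before = [p for p in trajectory if p.t <= crossing_time]
    after = [p for p in trajectory if p.t >= crossing_time]
    return before, after
```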
Every GPS trajectory T is looped over the three VWS and, for every VWS, the following
filtering criteria are examined in sequence. If T fails to pass a criterion, T is discarded for
Figure 3: Examples of GPS trajectories that have failed to pass one of the filtering criteria. The green and
red markers indicate the VWS and GPS point locations, respectively. The green and black arrows indicate
the road direction associated with the VWS and GPS trajectory, respectively. (a) The GPS trajectory moves
in the reverse direction with respect to the VWS. (b) The GPS trajectory moves along a road parallel to
the VWS. (c) The GPS trajectory approaches the VWS without crossing it. (d) The GPS trajectory moves
along an alternative route rather than the VWS road.
labeling and the remaining criteria are not examined. Thus, the GPS trajectory T crosses a
VWS S only if it passes all the filtering criteria.
1. The GPS trajectory T needs to have at least one GPS point located before the station
S in order to cross the station. For example, since station 1 is installed on the
southbound direction, the GPS trajectory T can cross station 1 only if a GPS point with
a latitude greater than the latitude of S is found in T. If such a GPS point is found,
the one closest to S is considered the before-GPS-point. Note that, since
the direction of a vehicle might change several times along the trip path, not all
GPS points geographically located before the station are included in the before list
BL_TS. The GPS points recorded in time order before the before-GPS-point
form BL_TS.
2. At least one GPS point, recorded after the before-GPS-point and located after
the station S, needs to exist in the GPS trip T; otherwise, T has not crossed the station S.
If it exists, this GPS point is considered the after-GPS-point. An example of a GPS trajectory
that does not have an after-GPS-point (i.e., has not crossed the station) is depicted in
Figure 3(b).
3. The distances of the before-GPS-point and after-GPS-point to the station S must be less
than a specified threshold; otherwise, T has not crossed the station S. For example, it
might pass along a street parallel to the station S, as shown in Figure 3(c). The threshold
value in our labeling process is set to 2 miles. Note that if a GPS trajectory passes
through a parallel street yet its distance to the station is less than the specified
threshold, the GPS trajectory is detected and discarded by the next filtering criterion.
4. Due to the existence of parallel and similar roadways around a station, there is still a
chance that a GPS trajectory passes all the previous criteria without actually crossing
the station. This situation particularly happens for station 2, where the road
geometry is very complex, with several grade-separated road junctions.
This complex interchange contains road sections that are parallel to the VWS
lane, which causes many GPS trajectories to pass the previous criteria without having
crossed station 2, as shown in Figure 3(d). Accordingly,
a more restrictive criterion is implemented. In this case, we utilize the nearest service
in OSRM to snap the GPS points of T that are close to a VWS onto the nearest
coordinates in the traffic network. The names of the roads associated with each snapped
GPS point are then retrieved. The GPS trajectory T has crossed the VWS only if it
has traversed the road sections that lead to the VWS location.
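For the southbound example in criterion 1, criteria 1-3 can be sketched as a self-contained check (a simplified illustration under our own assumptions; points are (lat, lon, timestamp) tuples, distances use the haversine formula, and criterion 4's OSRM road-name snapping is omitted):

```python
import math

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two coordinates."""
    r = 3958.8  # Earth radius in miles
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def crosses_southbound_station(points, station_lat, station_lon, threshold=2.0):
    """Criteria 1-3 for a southbound station: a before-point north of the
    station, a later after-point south of it, both within `threshold` miles."""
    # Criterion 1: closest point recorded north of the station (before-GPS-point)
    before = [p for p in points if p[0] > station_lat]
    if not before:
        return False
    b = min(before, key=lambda p: haversine_miles(p[0], p[1], station_lat, station_lon))
    # Criterion 2: a point recorded after it and south of the station (after-GPS-point)
    after = [p for p in points if p[2] > b[2] and p[0] < station_lat]
    if not after:
        return False
    a = min(after, key=lambda p: haversine_miles(p[0], p[1], station_lat, station_lon))
    # Criterion 3: both anchor points within the distance threshold
    return (haversine_miles(b[0], b[1], station_lat, station_lon) <= threshold and
            haversine_miles(a[0], a[1], station_lat, station_lon) <= threshold)
```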
3.2.3. Step 3: Predicting crossing time-window
We have thus far obtained a set of GPS trajectories that have crossed one of the three VWS.
The before and after GPS point lists have also been identified for each GPS trajectory with
respect to its corresponding station. Given the GPS trajectory T and the crossed station
S, the following steps are performed to approximate the time-window during which T crossed S.
Figure 4 illustrates the process of predicting the time-window for a sample GPS trajectory.
1. Computing the approximate crossing time. Using the OSRM route service, the fastest
route between the before-GPS-point (or alternatively the after-GPS-point, whichever
Figure 4: Predicting the crossing time-window from a VWS for a GPS trajectory. The markers indicate
the GPS points and the VWS; notation is described in the text.
is closer to the station S) and the station S is computed. The approximate crossing
time of the station S by the GPS trajectory T is then computed as follows:

CT_S = p_b[t] + TR_osrm   or   CT_S = p_a[t] - TR_osrm   (1)

where p_b[t] and p_a[t] are the timestamps of the before-GPS-point and after-GPS-point,
respectively. TR_osrm is the travel time between either p_b or p_a and
the station S, retrieved from the OSRM route service.
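Equation (1), together with the OSRM query behind it, can be sketched as follows (the public demo endpoint `router.project-osrm.org` and the `requests` dependency are assumptions; the route service returns the fastest-route duration in `routes[0].duration`, in seconds):

```python
def osrm_route_duration(point, station, host="http://router.project-osrm.org"):
    """Fastest-route duration (seconds) between a GPS point and the station,
    both given as (lat, lon), via the OSRM route service."""
    import requests  # third-party dependency, assumed available
    url = f"{host}/route/v1/driving/{point[1]},{point[0]};{station[1]},{station[0]}"
    resp = requests.get(url, params={"overview": "false"}, timeout=10)
    return resp.json()["routes"][0]["duration"]

def approx_crossing_time(pb_t=None, pa_t=None, tr_osrm=0.0):
    """Equation (1): CT_S = p_b[t] + TR_osrm, or CT_S = p_a[t] - TR_osrm,
    using whichever anchor point is closer to the station."""
    if pb_t is not None:
        return pb_t + tr_osrm
    return pa_t - tr_osrm
```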
2. Computing the travel-time error. Travel times retrieved from OSRM do not
exactly match the real travel times recorded by GPS-enabled devices. Thus, we
need to estimate the travel-time error introduced by OSRM, which is then used as the
time-window around CT_S. The two immediate before and after GPS points from BL_TS
and AL_TS are selected and concatenated to form a GPS sequence for the time-window
prediction. After snapping the GPS sequence onto the traffic network using the map-matching
service, the travel times computed by OSRM between every two consecutive
GPS points in the sequence are obtained. Comparing the OSRM travel times against
the real travel times computed from the timestamp attributes of the GPS points yields the
average error as follows:
MAE = (1/N) Σ_{i=1}^{N} |TR_i^real - TR_i^osrm|   (2)
where TR_i^real and TR_i^osrm are the real and OSRM-based travel times between every
two consecutive GPS points, respectively. MAE is the mean absolute error in approximating
CT_S. In other words, MAE shows the error imposed by OSRM when computing
CT_S according to Equation (1). In computing the travel-time error, we deliberately
use very few points (i.e., the two immediate GPS points from the before and after lists)
around the VWS so as to capture traffic conditions and driver behavior only around that
specific location, rather than over the whole GPS trajectory.
3. Computing the crossing time-window. Finally, a crossing time-window is constructed
around CT_S using the average travel-time error as follows:

CT_S - MAE <= CT <= CT_S + MAE   (3)

where CT denotes the actual crossing time.
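Equations (2)-(3) reduce to a short helper (a sketch; travel times are in seconds):

```python
def osrm_time_window(real_times, osrm_times, ct_s):
    """Equations (2)-(3): mean absolute error between real (timestamp-based)
    and OSRM travel times over consecutive snapped points, then the crossing
    time-window [CT_S - MAE, CT_S + MAE]."""
    n = len(real_times)
    mae = sum(abs(r - o) for r, o in zip(real_times, osrm_times)) / n
    return ct_s - mae, ct_s + mae
```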
3.2.4. Step 4: Labeling GPS trajectories
After predicting the time-window around CT_S, we assign the class label to T according
to the following steps:
1. As mentioned, the raw GPS data have been categorized into three weight groups by the
GPS provider. The vehicle records whose crossing time and gross weight attributes
fall within the predicted time-window and the original weight category, respectively, are
fetched from the VWS vehicle records associated with the station S. Note that it is
possible that no vehicle records are retrieved from the VWS dataset, as the predicted crossing
time-window is commonly very short owing to the small error in Equation (2).
The low travel-time error is a good sign of our crossing-time prediction accuracy.
2. The GPS trajectory T is assigned a vehicle class only if all vehicle records retrieved
in the last step report the same class. Although this leads to a smaller number of labeled GPS
trajectories, such a conservative approach results in very accurate labeling. The GPS
trajectory is discarded as unlabeled if a unique class is not found among
all retrieved vehicle records. The 'Before Augmentation' column in Table 2 shows the
number of GPS trajectories that have been labeled using our labeling approach. Only
6,906 trajectories out of almost 300,000 initial GPS trajectories retrieved in the first
step have been labeled through our proposed strategy. The
low number of labeled data at this stage, which mainly stems from applying strong
constraints, indicates that our approach labels only GPS trajectories that have
truly crossed a VWS.
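The labeling decision in steps 1-2 can be sketched as follows (the record fields `time`, `weight_group`, and `cls` are our own illustrative names, not the VWS schema):

```python
def label_trajectory(vws_records, window, weight_group):
    """Fetch VWS records whose crossing time falls in the predicted window
    and whose gross weight matches the GPS provider's weight group; label
    the trajectory only if all retrieved records share one vehicle class."""
    lo, hi = window
    hits = [r["cls"] for r in vws_records
            if lo <= r["time"] <= hi and r["weight_group"] == weight_group]
    if hits and len(set(hits)) == 1:
        return hits[0]
    return None  # unlabeled: no matching records, or ambiguous classes
```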
3.2.5. Step 5: Augmentation of labeled data
Finally, in order to augment the labeled data, the GPS trajectories having the same device
ID as the trajectories labeled in the previous step are extracted from the dataset containing
20 million trajectories and annotated with the corresponding vehicle classes. As can be seen in
Table 2, extracting trajectories with the same device ID significantly augments the number
of labeled trajectories, from almost 7,000 to over 500,000. Owing to this augmentation
opportunity, we deliberately implemented strong filtering criteria and labeling constraints
so as to obtain labeled GPS data of the highest possible quality from a programmatic approach.
Moreover, due to the lack of ground-truth data, the proposed strategy was refined through several
iterative visualizations to ensure the accuracy of the labeled data. It should be noted that the
'After Augmentation' column in Table 2 summarizes the output of our labeling process, while
the 'After Processing (Final)' column shows the number of labeled GPS data per class used
for developing CNN-VC after applying the pre-processing steps described in Section 5.
Table 2: Number of labeled GPS trajectories per vehicle class obtained in various stages
Vehicle Class Before Augmentation After Augmentation After Processing (Final)
Class 2 3,306 83,917 42,473
Class 3 46 8,819 5,961
Class 4 86 12,240 8,128
Class 5 1,372 302,960 100,000
Class 6 208 40,361 24,610
Class 7 356 43,717 29,178
Class 8 100 13,006 7,968
Class 9 630 17,516 10,777
Class 10 16 105 85
Class 11 3 20 16
Class 12 1 689 284
Total 6,906 523,350 229,480
3.2.6. Scalability of the labeling approach
As described, our proposed labeling approach is a computationally intensive task, since
labeling every GPS trajectory demands numerous computation steps. This makes labeling
such large-scale GPS data unscalable without parallel computing systems.
Apache Spark, a distributed computing engine that processes data in parallel, is deployed
to make this procedure scalable, expediting the process by a factor of about 13. For instance,
the computation time for labeling one month of GPS data was reduced from 8 hours to 35
minutes by means of Apache Spark.
4. The Proposed Model
The proposed CNN-VC comprises two main components: 1) a new representation for
GPS trajectories, and 2) a Convolutional Neural Network (CNN). In this section, these two
components and the way they interact with each other to identify vehicle classes from
GPS trajectories are elaborated. The process for training our CNN-VC is also described in
this section.
4.1. GPS trajectory representation
As defined, a GPS trajectory contains only a series of chronologically ordered points,
with only coordinate and timestamp information for each point. The raw values in such
an ordered list are incapable of representing the vehicle's moving pattern and roadway
characteristics, because high or low values of coordinates and timestamps do not convey
any meaningful information. Moreover, although this numerical ordered list can be blindly
fed into traditional supervised algorithms (e.g., SVM), its structure is not suitable for
deep-learning models such as CNN and RNN. Therefore, a novel representation of GPS
trajectories suitable for deep-learning algorithms is proposed, which accounts for both the
vehicle's motion features and roadway characteristics.
Unlike other studies [6, 19], we leverage the OSRM APIs to obtain more accurate motion
information and access fine-grained roadway features. First, the GPS coordinates in a raw
GPS trajectory are snapped onto the traffic network using the OSRM map-matching service.
The matching process offers several advantages: 1) every GPS point is snapped
to the nearest node in the traffic network, which alleviates errors attributable
to the GPS technology (e.g., satellite orbits, receiver clocks, and atmospheric
effects) and to the GPS data provider (i.e., rounding the latitude/longitude to four
decimal places); 2) outlier GPS points are removed if they cannot be
appropriately matched; 3) the sequence of matched points represents a real and plausible
route in the network. The matched GPS trajectory can be viewed as a sequence of GPS
legs, denoted by lg ∈ T, T = [lg_1, ..., lg_N], where every lg is the route segment between two
consecutive matched GPS points. Such a structure is analogous to the sentence structure
in Natural Language Processing (NLP) tasks, in which a sentence is comprised of a sequence
of words. Our main objective in this section is to represent every lg with a numerical
feature vector. The feature vector corresponding to a GPS lg possesses two types of features:
motion-related and road-related features.
The following motion features are computed for every lg in a GPS trajectory T:
Distance, denoted by D: Distance is the accurate map-based length traveled in lg,
which is extracted from the OSRM API. Unlike other studies [6, 19], we do not compute
the geodesic distance (i.e., the direct distance between two locations) using well-known
formulas such as Haversine and Vincenty. Instead, the map-based distance is computed
using the OSRM API, to take the traffic-network configuration into account.
Duration, denoted by ∆t: Duration is the travel time for lg. The travel time is
calculated from the original timestamp information of the GPS points, rather than the
travel time computed by OSRM.
Speed, denoted by S: Speed is the rate of change in distance when traveling the road
segment of lg, showing how fast the driver is moving. Speed is calculated from the
map-based distance D and the real duration ∆t.
Acceleration, denoted by A: Acceleration is the rate of change in speed. The acceleration
for the current lg is calculated based on the speed change between the current and
subsequent legs. Accordingly, the last lg in T is discarded.
Bearing rate, denoted by BR: The bearing rate for lg indicates to what extent the vehicle's
direction has changed between the start and end points of lg. BR is the difference
between the bearings of the start and end points. The bearing of the start/end point
is the clockwise angle from true north to the direction of travel immediately
after/before the start/end point of the lg. As with the distance D, the bearing values
are retrieved from OSRM so as to obtain more accurate information. Note that the bearing
rate is an important motion feature that varies among vehicles [24, 6]. For example,
a semi-trailer may not travel on a route segment that requires a sharp change in
heading direction.
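The motion features above can be sketched per leg as follows (an illustration under our own field names: `d` is the OSRM map-based distance in meters, `t0`/`t1` the leg's endpoint timestamps in seconds, and `b0`/`b1` the OSRM bearings in degrees):

```python
def motion_features(legs):
    """Per-leg motion features: distance, duration, speed, acceleration,
    and bearing rate. The last leg is dropped, since its acceleration
    needs the speed of a subsequent leg."""
    feats = []
    for cur, nxt in zip(legs, legs[1:]):
        dt = cur["t1"] - cur["t0"]
        speed = cur["d"] / dt
        next_speed = nxt["d"] / (nxt["t1"] - nxt["t0"])
        accel = (next_speed - speed) / dt
        # bearing rate: heading change over the leg, wrapped into [0, 180]
        br = abs(cur["b1"] - cur["b0"]) % 360
        br = min(br, 360 - br)
        feats.append([cur["d"], dt, speed, accel, br])
    return feats
```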
The overall characteristics of the roadway that a vehicle uses for performing a trip are a
discerning factor for developing a robust vehicle-classification system. It is apparent that various
types of vehicles (e.g., passenger cars versus trucks) have different transport-infrastructure
preferences for making their regular trips. Furthermore, vehicle regulations may ban some
vehicles (e.g., semi-trailer trucks) from traversing dense urban environments. Accordingly,
the following roadway characteristics are extracted from the OSRM API for every lg in a
GPS trajectory T:
Number of intersections, denoted by IN: An intersection is defined as any cross-path the
vehicle passes when traveling along lg. IN is the number of all intersections in lg.
Number of maneuvers, denoted by M: A maneuver is defined as any type of movement
change that a vehicle needs to make when traversing lg. Turning right/left, traversing
a roundabout, taking a ramp to enter a highway, and merging onto a street are
examples of maneuvers.
Road type, denoted by R: Although a wide range of schemes is available for classifying
roadways, a coarse-grained scheme is selected that divides roads into three groups: 1) low-capacity
roads, including alleys, streets, avenues, boulevards, etc.; 2) high-capacity roads,
including arterials, expressways, turnpikes, and parkways; 3) limited-access grade-separated
roads, including freeways, highways, motorways, beltways, thruways, bypasses, etc. The road
type associated with each GPS lg can be extracted using the OSRM nearest service.
Since R is represented as a one-hot encoding, a coarse-grained road-type scheme avoids
a high-dimensional sparse feature vector.
The feature vector corresponding to every GPS lg is then created by concatenating the
above-described motion and road features. Let x_i ∈ R^d represent the feature vector
corresponding to the i-th leg in a trajectory, where d is the feature-vector dimension. Stacking
the feature vectors of all legs in the GPS trajectory results in the new matrix representation,
denoted by X ∈ R^{L×d}, where L is the number of legs in the GPS trajectory. The
structure of X ∈ R^{L×d} is shown in Figure 5 and is used as the input matrix for deep-learning
models. Since deep learning requires X to have a fixed size for all samples in a training
batch, we either truncate long GPS trajectories or pad short ones with zero values to the
fixed size L. Our observations indicate that setting L to a high percentile (i.e., the 75th-85th
percentile) of the number of legs in all GPS trajectories improves the overall model
performance, as trajectories can then be represented by more motion and roadway features.
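Truncation and zero-padding to the fixed length L can be sketched as (plain lists for illustration; d is the feature-vector dimension):

```python
def to_fixed_length(legs, L, d):
    """Truncate long trajectories or zero-pad short ones so that every
    sample is an L x d matrix, as required for batched training."""
    if len(legs) >= L:
        return legs[:L]
    return legs + [[0.0] * d for _ in range(L - len(legs))]
```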
4.2. Convolutional Neural Network
A CNN is a sophisticated network that is capable of capturing local correlations in spatial
structures. In each convolutional layer of a CNN, local correlations between adjacent input
values are perceived by convolving a filter across the whole surface of an input volume.
Considering X as a spatial representation, CNNs can capture the correlation between consecutive
GPS legs by convolving their corresponding feature vectors. The output of the convolution
operation performed by each filter is called a feature map. Concatenating the feature maps
corresponding to multiple filters results in a new volume, which is an abstract representation
of the GPS trajectory. The subsequent convolutional layer extracts an even more abstract
representation from the output volume of the previous layer.
Figure 5 depicts our proposed architecture for identifying the vehicle class associated
with a GPS trajectory. As can be seen in Figure 5, a vehicle's class is detected by passing its
GPS trajectory X through multiple sets of convolutional layers that are stacked together.
Figure 5: The structure of the proposed GPS representation and CNN-VC model. The CNN-VC comprises a
stack of convolutional layers followed by max-pooling and softmax layers. Each color code in the convolutional
layers corresponds to one filter and its feature map.
The convolution operation in each layer involves a filter F ∈ R^{n×d}, where n is the number of
consecutive GPS legs covered by the filter and the filter width d equals the leg's feature-vector
dimension. It is very important to set the width of the filter equal to the feature-vector
dimension, since the principal goal is to extract spatial correlation between consecutive legs.
Sliding the filter F across the matrix X produces a feature map h ∈ R^{(L−n+1)×1}, where each
element h_i of the feature map is computed as

h_i = f(F ⊙ X_{i:i+n−1} + b),   (4)

where ⊙ is the dot product between the parameters of the filter F and the entries of the submatrix
X_{i:i+n−1}, b is a bias term, and f is a non-linear activation function. The Rectified Linear Unit
(ReLU) is utilized as the non-linear activation function f throughout this paper. X_{i:i+n−1}
refers to the concatenation of n consecutive GPS legs at position i in the GPS trajectory. Using
K filters of the same size as F, K feature maps are generated. Concatenating
the created feature maps results in a new GPS representation matrix X_new ∈ R^{(L−n+1)×K},
formally defined as

X_new = [h_1, ..., h_{L−n+1}].   (5)
The i-th row of the matrix X_new is an abstract feature vector for the n consecutive
legs at position i in the GPS trajectory. X_new ∈ R^{(L−n+1)×K} is used as the input volume for
the next convolutional layer, where the same convolution operation is applied. Our network can
comprise several sets of convolutional layers, where every set has two convolutional layers
with the same filter size and number of filters (i.e., n and K). However, the filter size and
the number of filters may vary among layer sets. Finally, a max-pooling layer is applied to
each feature map of the last convolutional layer. The rationale behind applying the max-pooling
layer is to take the one feature with the highest value (presumably the most important
one) from each feature map. This max-pooling operation generates a feature vector that
summarizes the high-level, abstract representation of the GPS trajectory. As the last step,
the final vector representation is passed directly into the softmax function, which performs the
classification task by generating a probability distribution over vehicle classes for each GPS
trajectory T, denoted by P = {p_1, ..., p_C}, where C is the number of vehicle classes. It should
be noted that no fully-connected layers are used between the convolutional layers and the
softmax layer.
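Equations (4)-(5) and the max-pool/softmax head can be illustrated with a small dependency-free sketch (not the paper's implementation; since no fully-connected layer follows the pooling, the last layer is assumed to use one filter per vehicle class):

```python
import math

def conv_layer(X, filters, biases):
    """One convolutional layer, Eq. (4): each filter F (n x d) slides over
    n consecutive leg feature vectors, followed by a ReLU; stacking the K
    feature maps gives the (L - n + 1) x K output of Eq. (5)."""
    n, d = len(filters[0]), len(filters[0][0])
    out = []
    for i in range(len(X) - n + 1):
        row = []
        for F, b in zip(filters, biases):
            s = sum(F[j][k] * X[i + j][k] for j in range(n) for k in range(d))
            row.append(max(0.0, s + b))  # ReLU non-linearity
        out.append(row)
    return out

def max_pool_softmax(H):
    """Global max-pooling over each feature map of the last convolutional
    layer, then softmax over the pooled vector."""
    pooled = [max(col) for col in zip(*H)]  # one maximum per feature map
    m = max(pooled)
    exps = [math.exp(v - m) for v in pooled]
    z = sum(exps)
    return [e / z for e in exps]
```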
4.2.1. Training and Regularization
Our proposed architecture is trained by minimizing the categorical cross-entropy loss
function, which is formulated for every GPS trajectory T as
L(θ) = − Σ_{i=1}^{C} y_{T,i} log(p_{T,i}),   (6)
where θ denotes all learnable parameters in the model. y_{T,i} ∈ Y is a binary indicator, equal
to 1 if class i is the true vehicle class for the GPS trajectory T and 0 otherwise.
Y is the true label for T, represented as a one-hot encoding. p_{T,i} is the predicted probability that
the GPS trajectory T has been performed by the vehicle class i. The cross-entropy loss is
averaged across the training batch in each iteration. The Adam optimizer is used to update
the model parameters in the back-propagation process. Adam is a well-suited optimization
technique for problems with large datasets and many parameters, and it has recently seen broad
adoption in deep-learning applications [15]. We use Adam's default settings as provided
in the aforementioned paper: learning rate = 0.001, β1 = 0.9, β2 = 0.999, and ε = 10^{-8}. The
network weights are initialized following the scheme proposed in [10].
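Equation (6), optionally with per-class weights for cost-sensitive training, reduces to a few lines:

```python
import math

def cross_entropy(y_onehot, probs, class_weights=None):
    """Eq. (6) for one trajectory: -sum_i w_i * y_i * log(p_i); with all
    weights equal to 1 this is the plain categorical cross-entropy."""
    if class_weights is None:
        class_weights = [1.0] * len(probs)
    return -sum(w * y * math.log(p)
                for w, y, p in zip(class_weights, y_onehot, probs) if y)
```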
The vehicle-classification problem often suffers from an imbalanced distribution among
vehicle classes, where the number of samples belonging to each class does not constitute an
equal portion of the training dataset. Training on an imbalanced dataset in the same way as
on a balanced one, without appropriate strategies, results in a poor classifier
that is not robust for non-dominant classes. Therefore, the following strategies are
implemented in our training process in order to achieve a robust classifier:
Random over-sampling. The number of GPS trajectories in every minority class is
increased to the number of GPS trajectories in the majority class by randomly replicating
them.
Balancing the training batch. An equal proportion of GPS trajectories from each class forms
the training batch in each training iteration.
Cost-sensitive learning. Higher weights are assigned to the minority classes in the cross-entropy
loss, Equation (6), where the weights are determined according to the class proportions
in the dataset. This strategy penalizes misclassifications of the minority classes
more heavily than those of the majority classes.
It is worth noting that random over-sampling and balancing the training batch significantly
improve our model performance, while cost-sensitive learning has a low impact.
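Random over-sampling can be sketched as a stand-alone helper (a minimal illustration):

```python
import random
from collections import Counter

def random_oversample(samples, labels, seed=0):
    """Replicate randomly chosen minority-class samples until every class
    matches the majority-class count."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(samples), list(labels)
    for cls, n in counts.items():
        idx = [i for i, y in enumerate(labels) if y == cls]
        for _ in range(target - n):
            i = rng.choice(idx)
            out_x.append(samples[i])
            out_y.append(cls)
    return out_x, out_y
```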
Dropout and early-stopping are two types of regularization applied to our architecture.
The dropout layer is added before the last layer (i.e., the softmax layer), with the
dropout ratio set to 0.5. In the early-stopping method, training is stopped if
the performance metric does not improve on the validation set for two consecutive epochs.
The model with the highest validation score is restored and applied to the test set.
5. Experimental Results
In this section, several variants of the labeled GPS data are first created. Afterward, the
performance of our proposed model is evaluated on the created datasets by comparing it
against several classical and deep-learning models. Robust classification metrics are used
to assess the prediction quality of our proposed model. The quality of our proposed GPS
representation is also investigated. Finally, the overall quality of the proposed model for
predicting different types of vehicle categorization is discussed.
5.1. Dataset definitions and preparations
The large-scale GPS trajectories, labeled according to the strategy proposed in Section 3,
are exploited for model evaluation. Before using the GPS data for training,
the following pre-processing steps are applied to each GPS trajectory so as to
prepare it for generating the proposed GPS representation introduced in Section 4 and to
remove erroneous GPS points caused by error sources in the GPS technology (e.g., satellite
or receiver clocks):
GPS points whose speed and/or acceleration exceeds a realistic value for vehicles moving
on US road networks are identified and discarded. The maximum speed and
acceleration are set to 54 m/s and 10 m/s^2, respectively.
After removing the unrealistic GPS points, GPS trajectories containing fewer than 4
points are disregarded. Similarly, GPS trajectories shorter than 600 meters or 10 minutes
are disregarded as well.
The maximum length (i.e., the number of GPS legs) is set to 70, which is equivalent
to the 80th percentile of the number of GPS legs over all GPS trajectories.
After converting the raw GPS trajectories into the proposed GPS representation, all
features except the binary road-type features are standardized by subtracting
the mean value and dividing by the standard deviation.
The number of labeled GPS trajectories per vehicle class after applying the above
processing steps is shown in the last column of Table 2. The processed
GPS data are used for building our proposed model and the baselines.
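The pre-processing filters above can be sketched as follows (a simplified illustration; each point is reduced to a (distance-from-previous-point, timestamp) pair, and the 54 m/s, 10 m/s^2, 4-point, 600 m, and 10 min thresholds follow the text):

```python
def clean_trajectory(points, max_speed=54.0, max_accel=10.0,
                     min_points=4, min_dist_m=600.0, min_dur_s=600.0):
    """Drop GPS points whose implied speed or acceleration is unrealistic,
    then discard trajectories that are too short in points, length, or
    duration. Returns the cleaned point list, or None if discarded."""
    kept, prev_speed = [], 0.0
    for dist, t in points:
        if kept:
            dt = t - kept[-1][1]
            if dt <= 0:
                continue  # duplicate or out-of-order timestamp
            speed = dist / dt
            accel = abs(speed - prev_speed) / dt
            if speed > max_speed or accel > max_accel:
                continue  # unrealistic point: skip it
            prev_speed = speed
        kept.append((dist, t))
    total_d = sum(d for d, _ in kept[1:])
    duration = kept[-1][1] - kept[0][1] if kept else 0.0
    if len(kept) < min_points or total_d < min_dist_m or duration < min_dur_s:
        return None
    return kept
```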
After pre-processing the GPS trajectories, several variants of the original labeled dataset are
created by re-grouping vehicle classes so as to examine and validate the proposed model.
The vehicle-class definition is the only difference among the following datasets.
2&3 This dataset contains GPS trajectories only from FHWA class 2 and class 3. The
proposed model needs to discriminate passenger cars (e.g., sedans, coupes, and station
wagons) from all other two-axle, four-tire single-unit vehicles (e.g., campers, motor
homes, and ambulances). As mentioned, vehicles in classes 2 and 3 of the FHWA
system are categorized into one group (i.e., vehicles with gross weight under 14,000
lbs) by the GPS provider's classification system. Hence, this type of problem is
particularly useful for subdividing GPS trajectories with gross weight under 14,000
lbs into class 2 and class 3 of the FHWA system.
light&heavy This dataset categorizes all GPS trajectories into two classes: (1) light-duty,
which contains vehicles from class 2 and class 3, and (2) heavy-duty, which
contains vehicles from all other classes. This definition was first introduced in [20]
and also used in [19].
light&medium&heavy This dataset categorizes all GPS trajectories into three classes:
(1) light-duty, which contains vehicles from FHWA class 2; (2) medium-duty, which
contains vehicles from FHWA classes 3-6; and (3) heavy-duty, which contains vehicles from
FHWA class 7 and above. This classification is based on the Gross Vehicle Weight Rating
(GVWR) system reported in the US Department of Energy portal. Note that the
FHWA vehicle classes corresponding to each GVWR category differ slightly
from the weight-category system defined by the GPS provider.
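The three re-groupings can be captured as class maps (a sketch: the FHWA class range 2-13 and the group-name strings are our own assumptions):

```python
# Mapping FHWA vehicle classes to the labels of each dataset variant.
FHWA_CLASSES = range(2, 14)

DATASET_VARIANTS = {
    # passenger cars vs. other two-axle, four-tire single-unit vehicles
    "2&3": {2: "class2", 3: "class3"},
    # light-duty (classes 2-3) vs. heavy-duty (all other classes)
    "light&heavy": {c: "light" if c in (2, 3) else "heavy" for c in FHWA_CLASSES},
    # GVWR-based grouping: light (class 2), medium (3-6), heavy (7 and above)
    "light&medium&heavy": {c: "light" if c == 2 else "medium" if c <= 6 else "heavy"
                           for c in FHWA_CLASSES},
}
```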
5.2. Baselines
Two types of baseline methods are considered for comparison: 1) classical supervised
methods on top of hand-crafted features, and 2) deep-learning models on top of the GPS
representation proposed in Section 4.
5.2.1. Classical supervised baselines
In the first group, supervised algorithms widely used in the literature on GPS-based vehicle
classification and travel-mode detection are deployed, including K-Nearest Neighbors
(KNN), Support Vector Machine (SVM), Decision Tree (DT), Random Forest (RF) as a
representative of ensemble algorithms, and Multilayer Perceptron (MLP) as a regular fully-connected
neural network. Since the proposed GPS representation is not an appropriate
input for these algorithms, a set of robust and efficient hand-designed features is extracted
from each GPS trajectory. These features, introduced in the seminal study by
[24], include the trajectory's length, mean speed, expectation of speed, variance of speed,
top three speeds, top three accelerations, heading change rate, stop rate, and speed change
rate. These are the most robust and widely accepted hand-designed features in this domain
and have been used in various studies. Note that since the MLP technique is fed these
hand-designed features, we group it with the classical baselines in spite of its deep nature.
5.2.2. Deep-learning baselines14
In the second group, several deep-learning models are developed for comparison. The
major building blocks of these baselines are CNNs, RNNs, and the attention mechanism.
Recurrent Neural Network. An RNN is a chain-like neural network architecture that
allows information to be passed from one step to the next. It constitutes a series
of repeating modules of neural networks, where each module maintains a hidden state and
operates on sequential data such as the GPS trajectory representation X [3]. For our proposed GPS
representation, as shown in Figure 6, each module looks at the current input x_i, which is the
feature vector corresponding to the i-th GPS leg in the trajectory, and the previous
hidden state, and updates the hidden state of the current module through a set of non-linear
functions. The Long Short-Term Memory (LSTM) network is the most widely used module in
RNN layers; it utilizes several non-linear functions (gates) to update the hidden state in each RNN
module. The hidden state of the last module is often used as the output of an RNN layer.
Further details on the RNN and LSTM models can be found in [3, 13].
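To make the recurrence concrete, the following numpy sketch implements a single-layer LSTM forward pass over a trajectory representation X, updating the cell and hidden states leg by leg. The weight shapes and gate ordering follow one common convention; this stands in for the TensorFlow implementation rather than reproducing it.

```python
import numpy as np

def lstm_forward(X, Wx, Wh, b):
    """Forward pass of one LSTM layer over X of shape (num_legs, num_features).
    Wx: (F, 4H), Wh: (H, 4H), b: (4H,) pack the four gates; a minimal sketch."""
    T, F = X.shape
    H = Wh.shape[0]
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    h, c = np.zeros(H), np.zeros(H)
    for t in range(T):
        z = X[t] @ Wx + h @ Wh + b           # pre-activations of all four gates
        i = sigmoid(z[:H])                   # input gate
        f = sigmoid(z[H:2*H])                # forget gate
        o = sigmoid(z[2*H:3*H])              # output gate
        g = np.tanh(z[3*H:])                 # candidate cell update
        c = f * c + i * g                    # new cell state
        h = o * np.tanh(c)                   # new hidden state
    return h                                 # last hidden state -> softmax layer

rng = np.random.default_rng(0)
T, F, H = 6, 4, 8                            # 6 legs, 4 features, 8 hidden units
h_last = lstm_forward(rng.normal(size=(T, F)),
                      rng.normal(size=(F, 4*H)) * 0.1,
                      rng.normal(size=(H, 4*H)) * 0.1,
                      np.zeros(4*H))
```

The returned `h_last` plays the role of the "hidden state in the last layer" that the RNN baseline feeds into the softmax classifier.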
Attention mechanism. The attention mechanism, first introduced by [1], is a recent
trend in deep learning that has received much interest. Attention helps
a neural network learn the importance of various regions of an input and focus
on certain parts when performing the task at hand. It seeks to emulate human
visual attention, which often pays attention to only a small amount of
information presented in visual scenes during routine activities. Although a number of
studies have assessed the efficacy of the attention mechanism in several applications,
including neural machine translation [1], text classification [9], text summarization [17],
and caption generation [21], no study has examined the attention mechanism in GPS
trajectory mining tasks.

Figure 6: The structure of an RNN layer with attention mechanism. Analogously, the attention mechanism
can be applied to the convolutional layers.
In particular for our application, the ultimate goal of the attention layer is to compute
an importance weight for each abstract leg. Let u ∈ R^K be the attention vector, where K
is the number of feature maps in the last CNN/RNN layer. Let X_last be the output of the
last convolutional layer, in which x_i ∈ R^K is the feature vector corresponding to the i-th
abstract GPS leg in X_last. The importance of the abstract leg x_i is measured by computing
its similarity with the attention vector u, which is then normalized into an importance weight
α_i through a softmax function. The elements of the attention vector u are trainable weights
that are jointly learned with the other network weights during the training process. Finally,
the weighted representations of the abstract legs are summed to generate the attention-based
representation, which is fed into a softmax layer to perform the classification task. Figure
6 depicts how the attention mechanism works on top of an RNN layer. Further details
on the attention mechanism are available in the seminal studies by [1, 22].
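The attention computation described above amounts to a dot-product score, a softmax, and a weighted sum, which can be sketched in numpy as follows (in the actual model, u would be a trainable parameter learned by backpropagation rather than a fixed vector):

```python
import numpy as np

def attention_pool(X_last, u):
    """Attention pooling over abstract legs: similarity of each leg with the
    attention vector u, softmax-normalized weights alpha, weighted sum."""
    scores = X_last @ u                              # similarity score per leg
    scores -= scores.max()                           # numerical stability
    alpha = np.exp(scores) / np.exp(scores).sum()    # importance weights
    return alpha, alpha @ X_last                     # weighted representation

rng = np.random.default_rng(1)
X_last = rng.normal(size=(10, 16))                   # 10 abstract legs, K = 16
u = rng.normal(size=16)                              # stand-in attention vector
alpha, context = attention_pool(X_last, u)
```

The `context` vector is what gets passed to the softmax layer in place of the max-pooled representation.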
Using the CNN, RNN, and attention building blocks, the following deep-learning baselines
are considered, all trained on top of our proposed GPS representation:
- RNN: A single RNN layer with LSTM units is used. The hidden state of the last
LSTM unit is directly fed into a softmax layer for the classification task.
- RNN-Attention: The model structure is the same as the RNN baseline, with an
attention mechanism inserted between the RNN and softmax layers, as shown in Figure 6.
- Parallel-CNN: Considering a convolutional layer followed by a max-pooling layer as
one convolutional unit, several units operate on the same GPS representation in
parallel. The abstract feature vectors generated by the units are then concatenated
to create the final feature vector, which is fed into the softmax layer for the classification
task. Note that the convolutional and max-pooling operations are similar to those of our
proposed model in Section 4. The primary difference between this baseline and our proposed
architecture is that the convolutional layers in our model are stacked on top of each
other and operate sequentially rather than in parallel.
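A minimal numpy sketch of one such convolutional unit and the parallel combination is given below; the filter counts and sizes are arbitrary illustrative choices, and in the real baseline the filters are learned by training:

```python
import numpy as np

def conv_unit(X, filters):
    """One convolutional unit: valid 1-D convolution over the leg axis with
    ReLU, then global max-pooling. `filters` has shape
    (num_filters, filter_size, num_features)."""
    T, F = X.shape
    K, n, _ = filters.shape
    out = np.empty((T - n + 1, K))
    for t in range(T - n + 1):
        window = X[t:t + n]                          # (n, F) receptive field
        resp = np.tensordot(filters, window, axes=([1, 2], [0, 1]))
        out[t] = np.maximum(resp, 0)                 # ReLU activation
    return out.max(axis=0)                           # max-pool over legs

rng = np.random.default_rng(2)
X = rng.normal(size=(20, 6))                         # 20 legs, 6 features
units = [rng.normal(size=(25, n, 6)) for n in (2, 3, 4)]  # three parallel units
feature_vec = np.concatenate([conv_unit(X, f) for f in units])
```

Each unit yields one 25-dimensional pooled vector, and concatenation produces the final 75-dimensional feature vector fed to the softmax layer.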
5.2.3. Performance Evaluation
As mentioned, vehicle classification is an imbalanced classification problem. In this
situation, a predictive model can achieve high accuracy simply by predicting all
unseen records as the dominant class and treating samples of the minority classes as noise.
A robust, high-quality model needs good prediction capability for both majority
and minority classes. For imbalanced classification problems, recall is a much more reliable
performance metric, as it indicates the model's ability to retrieve the true positives
of each class. Since we seek a model that retrieves a majority of true positives for all
vehicle classes, average recall is selected as the main performance measure. Average recall
is computed for one specific probability threshold (e.g., 0.5); however, the threshold value
might change depending on application objectives. The Area Under the Receiver Operating
Characteristic curve (AUROC) is another reliable metric that, like recall, measures the model
accuracy for each class. For a binary problem, the Receiver Operating Characteristic curve is
created by plotting the true positive rate (i.e., recall for the positive class) against the
false positive rate (i.e., 1 − recall for the negative class) for various threshold settings.
The following performance metrics are used for model evaluation:
- AVE-Recall. AVE-Recall is the average of the recall values over all vehicle classes in a
dataset, defined as:

AVE-Recall = (1/C) Σ_{c=1}^{C} TP_c / (TP_c + FN_c),

where TP_c and FN_c are the numbers of true positives and false negatives in the test set
for class c, and C is the total number of vehicle classes in the dataset.
- AUROC. First, AUROC is computed for each class, and then the average AUROC over
all classes is reported. To compute AUROC for a class in a multi-class problem,
the subject class is considered positive while the remaining classes are assigned as negative.
- Accuracy. Accuracy is computed as the fraction of GPS trajectories in the test set
that are correctly classified. While accuracy is not the main metric in
this study on account of the imbalanced class distribution, we report it because
it is the most common classification metric.
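The metrics above can be computed directly from a confusion matrix; the toy example below also illustrates the imbalance point, showing how a majority-class-only predictor on hypothetical counts scores high accuracy but poor AVE-Recall:

```python
import numpy as np

def ave_recall_and_accuracy(cm):
    """AVE-Recall and accuracy from a confusion matrix `cm`, where
    cm[i, j] counts true-class-i samples predicted as class j."""
    cm = np.asarray(cm, dtype=float)
    recalls = np.diag(cm) / cm.sum(axis=1)    # TP_c / (TP_c + FN_c) per class
    accuracy = np.trace(cm) / cm.sum()
    return recalls.mean(), accuracy

# Hypothetical imbalanced test set (900 majority, 100 minority): predicting
# everything as the majority class scores 90% accuracy but only 0.5 AVE-Recall.
cm_majority_only = [[900, 0],
                    [100, 0]]
ave_recall, acc = ave_recall_and_accuracy(cm_majority_only)
```

This gap between accuracy (0.9) and AVE-Recall (0.5) is exactly why AVE-Recall serves as the main metric here.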
In all our experiments, models are trained and tested using stratified 5-fold cross-validation,
and the averages and standard deviations of the results over all 5 folds are reported.
In other words, we create 5 train/test splits of the whole dataset. In each split, 4 folds are
used for training while the remaining fold is held out as the test set. Before obtaining
the average results on the 5 test folds, the models' hyperparameters are tuned. To this end,
one fold of the 4 training folds is used as the validation set for each split. The hyperparameter
combination that on average achieves the highest performance on the validation sets of all
splits is used for the final training and testing on the 5 train/test splits; the test fold
in each split thus plays no role in tuning hyperparameters. Next, using the obtained
hyperparameters, models are trained on 4 folds and tested on one fold for each train/test
split, and the average results are reported. In this way, every GPS trajectory in the dataset
appears in a test set exactly once. Only for implementing the early-stopping method
in the deep-learning models, 10% of the training data in each split is randomly selected,
with stratification, as the validation set for the early-stopping procedure.
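This protocol can be sketched with scikit-learn's StratifiedKFold on synthetic stand-in data (model fitting and tuning are omitted); the outer loop yields the 5 train/test splits and an inner split carves the validation fold out of the 4 training folds:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 9))                     # stand-in feature matrix
y = np.array([0] * 80 + [1] * 20)                 # imbalanced, like the real data

outer = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
test_counts = np.zeros(len(y), dtype=int)
for train_idx, test_idx in outer.split(X, y):
    # carve one validation fold out of the 4 training folds
    inner = StratifiedKFold(n_splits=4, shuffle=True, random_state=0)
    fit_idx, val_idx = next(inner.split(X[train_idx], y[train_idx]))
    # ... tune hyperparameters on (fit, val), then retrain on train_idx
    # and evaluate on test_idx (omitted in this sketch) ...
    test_counts[test_idx] += 1
```

Stratification preserves the class proportions in every fold, and the bookkeeping confirms that each sample lands in a test fold exactly once.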
All data processing and models are implemented in the Python programming language.
The deep-learning architectures are trained in TensorFlow with GPU support, and the classic
machine-learning algorithms are implemented using the scikit-learn library. The source code
for all data processing and models utilized in this study is available at:
https://github.com/sinadabiri/CATT-DL-Vehicle-Classifcation-GPS.
5.3. Hyperparameter Settings
The number of convolutional sets D, the filter size n, and the number of
filters K in each layer are the primary hyperparameters of our proposed architecture.
Grid search is a common approach for tuning hyperparameters that exhaustively considers
all parameter combinations, yet it may not be efficient for deep-learning models with
many hyperparameters. Instead, a manual search is conducted by designing a variety of
model configurations and selecting the best parameter combination. We tune the
hyperparameters based on the AVE-Recall metric by examining the model performance on
the 2&3 dataset only. Table 3 shows the average AVE-Recall along with the standard deviation
over 5 folds for several CNN-VC configurations. A wide range of configurations has been
designed by varying the number of convolutional sets, the filter size, and the number of
filters over D ∈ {1, 2, 3}, n ∈ {1, 2, 3, 4}, and K ∈ {25, 50, 75, 100, 125}, respectively. As can
be seen from Table 3, setting the filter size to 2 and the number of filters in each
convolutional layer to 100 achieves the optimum performance. Furthermore, increasing
the number of convolutional sets, as in configurations #9-10, does not boost
performance. Thus, a network with one convolutional set is selected to avoid model
complexity and reduce training time. Even using multiple sets of convolutional layers with
different filter sizes and numbers of filters (i.e., configuration #11) does not improve the
model performance. Finally, a comparison between configurations #4 and #12 demonstrates
that using a max-pooling operation on top of the last convolutional layer significantly
improves prediction quality, increasing AVE-Recall by more than 4%.
Table 3: AVE-Recall for several CNN-VC configurations. The convolutional layer hyperparameters are
denoted as conv(filter size)-(number of filters). Each conv set consists of two stacked convolutional layers
with the same hyperparameters. NA: Not Available
#Config Conv-set 1 Conv-set 2 Conv-set 3 Max-Pool AVE-Recall
1 Conv2-25 NA NA Available 0.793(±0.003)
2 Conv2-50 NA NA Available 0.805(±0.007)
3 Conv2-75 NA NA Available 0.807(±0.004)
4 Conv2-100 NA NA Available 0.821(±0.005)
5 Conv2-125 NA NA Available 0.811(±0.007)
6 Conv1-100 NA NA Available 0.809(±0.008)
7 Conv3-100 NA NA Available 0.798(±0.007)
8 Conv4-100 NA NA Available 0.777(±0.012)
9 Conv2-100 Conv2-100 NA Available 0.789(±0.010)
10 Conv2-100 Conv2-100 Conv2-100 Available 0.784(±0.016)
11 Conv2-50 Conv3-75 Conv4-100 Available 0.791(±0.004)
12 Conv2-100 NA NA NA 0.776(±0.007)
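The bookkeeping of such a manual search reduces to recording each configuration's validation AVE-Recall and keeping the argmax. The scores below are copied from Table 3 (configurations #1-#8, one convolutional set with max-pooling); the dictionary keys are illustrative names, not the paper's code:

```python
# (configuration, mean AVE-Recall over 5 folds) pairs from Table 3, #1-#8
configs = [
    ({"filter_size": 2, "num_filters": 25},  0.793),
    ({"filter_size": 2, "num_filters": 50},  0.805),
    ({"filter_size": 2, "num_filters": 75},  0.807),
    ({"filter_size": 2, "num_filters": 100}, 0.821),   # configuration #4
    ({"filter_size": 2, "num_filters": 125}, 0.811),
    ({"filter_size": 1, "num_filters": 100}, 0.809),
    ({"filter_size": 3, "num_filters": 100}, 0.798),
    ({"filter_size": 4, "num_filters": 100}, 0.777),
]
# keep the configuration with the highest validation AVE-Recall
best_config, best_score = max(configs, key=lambda c: c[1])
```

The selected pair (filter size 2, 100 filters) matches configuration #4, the setting carried into the final model.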
5.4. Comparison results
Tables 4, 5, and 6 summarize the average results of the stratified 5-fold cross-validation,
along with the standard deviations, for our CNN-VC model and the baselines in terms of
AVE-Recall, AUROC, and accuracy, respectively. In addition, the impact of the attention
mechanism on our CNN-VC model is investigated by adding the attention layer, as shown
in Figure 6, on top of the convolutional layers. Since the attention layer summarizes the
high-level GPS representation into a feature vector for the softmax layer, the max-pooling
operation is removed from the network in this variant. As with our CNN-VC model, the
hyperparameters associated with every baseline are first tuned on the 2&3 dataset. Then,
using the optimum combination of hyperparameters, every model is trained and tested on
the three datasets: 2&3, light&heavy, and light&medium&heavy.
Table 4: Comparison of AVE-Recall values on created datasets using classical-supervised algorithms, deep-
learning baselines, and our proposed CNN-VC model.
Model \Dataset 2&3 light&heavy light&mid&heavy
KNN 0.566 (±0.005) 0.568 (±0.001) 0.392 (±0.001)
SVM 0.623 (±0.024) 0.582 (±0.010) 0.421 (±0.014)
DT 0.576 (±0.004) 0.616 (±0.001) 0.438 (±0.001)
RF 0.555 (±0.004) 0.637 (±0.002) 0.467 (±0.002)
MLP 0.649 (±0.012) 0.624 (±0.010) 0.437 (±0.006)
RNN 0.766 (±0.048) 0.732 (±0.002) 0.615 (±0.011)
RNN+Attention 0.776 (±0.014) 0.732 (±0.002) 0.619 (±0.002)
CNN-Parallel 0.752 (±0.008) 0.701 (±0.004) 0.573 (±0.003)
CNN-VC (ours with attention) 0.812 (±0.011) 0.735 (±0.003) 0.614 (±0.002)
CNN-VC (ours) 0.819 (±0.005) 0.741 (±0.005) 0.631 (±0.004)
Table 5: Comparison of AUROC values on created datasets using classical-supervised algorithms, deep-
learning baselines, and our proposed CNN-VC model.
Model \Dataset 2&3 light&heavy light&mid&heavy
KNN 0.585 (±0.005) 0.593 (±0.001) 0.557 (±0.001)
SVM 0.692 (±0.022) 0.649 (±0.009) 0.603 (±0.010)
DT 0.576 (±0.004) 0.619 (±0.001) 0.575 (±0.001)
RF 0.737 (±0.004) 0.687 (±0.002) 0.634 (±0.001)
MLP 0.685 (±0.012) 0.655 (±0.011) 0.591 (±0.007)
RNN 0.873 (±0.044) 0.830 (±0.004) 0.797 (±0.007)
RNN+Attention 0.884 (±0.008) 0.831 (±0.004) 0.801 (±0.001)
CNN-Parallel 0.845 (±0.006) 0.789 (±0.004) 0.758 (±0.002)
CNN-VC (ours with attention) 0.909 (±0.003) 0.832 (±0.001) 0.798 (±0.002)
CNN-VC 0.910 (±0.003) 0.840 (±0.002) 0.810 (±0.002)
Considering AVE-Recall and AUROC as the most reliable metrics, Tables 4 and 5 clearly
show the superiority of our CNN-VC model over both the classical-supervised and
deep-learning models on all three datasets. With respect to AVE-Recall, the CNN-VC model
achieves on average 19% and 4% better performance than the classical-supervised
and deep-learning models, respectively. Analogously, the CNN-VC surpasses the baselines
with on average 22% and 3% higher AUROC than the classical-supervised and deep-learning
models, respectively.
What is striking about Tables 4 and 5 is the significant superiority of the deep-learning
models over the classical machine-learning techniques. One of the key differences between
these two groups is the structure of their input: the proposed GPS representation
versus a set of hand-crafted features. Since the ultimate performance of a machine-learning
Table 6: Comparison of Accuracy values on created datasets using classical supervised algorithms, deep-
learning baselines, and our proposed CNN-VC model.
Model \Dataset 2&3 light&heavy light&mid&heavy
KNN 0.672 (±0.002) 0.611 (±0.001) 0.395 (±0.001)
SVM 0.658 (±0.019) 0.534 (±0.027) 0.337 (±0.055)
DT 0.814 (±0.002) 0.737 (±0.001) 0.511 (±0.002)
RF 0.869 (±0.002) 0.790 (±0.001) 0.589 (±0.002)
MLP 0.493 (±0.053) 0.764 (±0.058) 0.602 (±0.023)
RNN 0.763 (±0.165) 0.751 (±0.022) 0.607 (±0.048)
RNN+Attention 0.853 (±0.022) 0.749 (±0.021) 0.587 (±0.025)
CNN-Parallel 0.720 (±0.041) 0.670 (±0.029) 0.562 (±0.036)
CNN-VC (ours with attention) 0.811 (±0.043) 0.771 (±0.011) 0.596 (±0.019)
CNN-VC 0.856 (±0.016) 0.771 (±0.025) 0.610 (±0.004)
algorithm is highly contingent on the quality of its input representation, the superiority of
the deep-learning models strongly demonstrates the effectiveness of our proposed GPS
representation. Another interesting finding is that using the attention mechanism on top
of the RNN and CNN blocks does not improve model performance, which is a compelling
reason to use the pooling operation instead of the attention mechanism in our proposed
network. The comparison between the RNN- and CNN-based models reveals that although the
RNN architecture generates competitive results, our CNN-VC outperforms it, improving
AVE-Recall and AUROC by 3% and 2% on average, respectively. Finally, the clear superiority
of CNN-VC over CNN-Parallel demonstrates that stacking convolutional layers in
sequence improves the overall performance of the model compared to deploying convolutional
layers in parallel.
Although accuracy is not a reliable metric for imbalanced problems, we do report
the accuracy results in Table 6 to evaluate the overall performance of our model in terms of
this widely used metric. As can be seen, except for RF on the 2&3 dataset, our CNN-VC
model outperforms all baselines on the three datasets. Even though the model is forced to
have similar prediction quality for all vehicle classes, it achieves high and reasonable
accuracy as well. The high accuracy values, in contrast to the low AVE-Recall and AUROC
values, of some baselines (e.g., RF and DT) are further proof that the accuracy measure alone
is not sufficient for evaluating models on imbalanced problems.
5.5. Effect of GPS Representation
As noted, the comparison results have already demonstrated the effectiveness of our
proposed GPS representation. In addition, the effects of the motion features and roadway features
are investigated by training our model on top of different variants of the GPS
representation: (1) only motion features, (2) only roadway features, and (3) the combination
of motion and roadway features. Table 7 shows the average results for the variants of the
GPS representation on the 2&3 dataset. From Table 7, it is apparent that the impact of
the motion features on model quality is more significant. However, fusing the motion
features with roadway characteristics improves prediction quality by more than 2% in terms
of AVE-Recall, the main metric. Also, it should be noted that the GPS representation
structure can readily be combined with information from other sources (e.g., environmental
data) when available.
Table 7: Performance comparison of the CNN-VC model with variants of GPS representation on the 2&3
dataset
GPS Representation AVE-Recall AUROC Accuracy
Only motion features 0.794 (±0.011) 0.896 (±0.004) 0.859 (±0.029)
Only roadway features 0.690 (±0.005) 0.771 (±0.005) 0.667 (±0.023)
Motion and roadway features 0.819 (±0.005) 0.910 (±0.003) 0.856 (±0.016)
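In practice, such an ablation can be run by slicing the feature axis of the GPS representation. The column split below (4 motion and 3 roadway features) is a hypothetical illustration, not the paper's exact feature counts:

```python
import numpy as np

# Assume (hypothetically) the first n_motion columns of the representation
# hold motion features and the remaining columns hold roadway features.
rng = np.random.default_rng(4)
n_motion, n_roadway = 4, 3
X = rng.normal(size=(50, n_motion + n_roadway))   # 50 legs, 7 features

variants = {
    "motion_only":        X[:, :n_motion],        # variant (1)
    "roadway_only":       X[:, n_motion:],        # variant (2)
    "motion_and_roadway": X,                      # variant (3)
}
```

Each variant is then fed to the same CNN-VC architecture, so any performance gap is attributable to the feature group rather than the model.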
5.6. Model performance interpretation
An interesting analysis is to investigate the model's prediction quality for every vehicle
class. The confusion matrices, shown in Figure 7 for all three datasets, allow us to visualize
how well our CNN-VC retrieves the true positives of each vehicle class. The diagonal
elements in Figure 7 represent the percentage of GPS trajectories of each class in the test
set that the CNN-VC model correctly predicted (i.e., the recall value for each vehicle
class), while the off-diagonal values in each row show the misclassification rates. The values
in parentheses represent the numbers of true GPS trajectories assigned to the various classes.
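Row-normalizing a confusion matrix of raw counts yields exactly the quantities plotted in Figure 7 (the diagonal gives per-class recall, off-diagonals the misclassification rates). The counts below are illustrative stand-ins chosen to mirror the roughly 81% recall reported for the 2&3 dataset, not the actual test counts:

```python
import numpy as np

def normalize_rows(cm):
    """Row-normalize a confusion matrix so each row sums to 1: the diagonal
    becomes per-class recall, off-diagonals become misclassification rates."""
    cm = np.asarray(cm, dtype=float)
    return cm / cm.sum(axis=1, keepdims=True)

cm = np.array([[810, 190],      # illustrative: true class 2, 81.0% retrieved
               [185, 815]])     # illustrative: true class 3, 81.5% retrieved
rates = normalize_rows(cm)
```

Plotting `rates` with the raw counts in parentheses reproduces the layout of the matrices in Figure 7.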
As shown in Figure 7(a), the CNN-VC achieves almost the same performance in correctly
predicting both the majority vehicle class (i.e., class 2) and the minority vehicle class (i.e.,
class 3), with recall values around 81%. It should be noted that FHWA classes 2 and
3 include vehicles with similar operating functions and in turn similar moving patterns. For
example, pickups and vans, which are categorized as FHWA class 3, may be utilized as
passenger cars, which is the main function of vehicles in FHWA class 2. Accordingly,
discriminating between FHWA classes 2 and 3 with high quality is not a trivial task, yet it
has been attained by the CNN-VC model. Since the raw GPS data collected by the GPS
provider places these two classes in the same weight group (i.e., less than
14,000 lbs), a practical benefit of the classifier on the 2&3 dataset is to distinguish the GPS
trajectories of class 2 from class 3 across the entire dataset of 20 million GPS traces.
In a similar fashion, it can be observed from Figure 7(b) that the proposed model also has
a reasonable performance when vehicles are categorized into light-duty versus heavy-duty.
The model's capabilities for detecting the non-dominant class (i.e., the light class in this
dataset) and the dominant class (i.e., the heavy class) are roughly similar, with
recall values around 74%. It should be emphasized that even the FHWA 13-category rule
sets have some difficulty distinguishing between vehicle categories when information
on the number, weight, and spacing of axles is available by means of traffic flow sensors [12].
For example, a pickup truck with a conventional two-tire rear axle (FHWA class 3) cannot be
distinguished from a similar truck with dual tires on each side of its rear axle (FHWA
class 5) when they are empty. This problem is exacerbated when the information related
to the vehicle is limited to only a sequence of GPS records rather than vehicle shape and
weight characteristics. Considering such issues, a recall around 74% for both light-duty
and heavy-duty vehicle classes can be considered acceptable performance when all
information is limited to the vehicle mobility pattern obtained from GPS records.
As expected and shown in Figure 7(c), the performance quality worsens as the number of
vehicle classes increases in the light&medium&heavy dataset. The CNN-VC achieves good
performance in distinguishing light (i.e., FHWA class 2) and medium (i.e., FHWA
classes 3-6) vehicles. However, the model encounters difficulty discriminating the heavy
vehicles from the medium ones, with 34% of heavy-duty vehicles misclassified as medium-duty.
Although a part of this lower performance stems from the quality of the original
GPS data and the overall prediction ability of a machine-learning algorithm, distinguishing
between medium and heavy vehicles is not a trivial task even when information on vehicles'
axles is available. For instance, a light truck pulling a utility trailer, which is classified as
FHWA class 3 and thus medium-duty, may have a similar axle configuration to a truck pulling
a heavy single-axle trailer, which is classified as FHWA class 8 and thus heavy-duty [12].
Without access to axle weight information, these two trucks cannot be differentiated
from each other even using fixed-point traffic flow sensors. Furthermore, truck size and
weight configurations might vary among states and manufacturers, depending on
state laws and companies' profit strategies, even for trucks designed for exactly the same
purposes. Considering the boundary vehicle types (e.g., FHWA class 6 as medium-duty
versus FHWA class 7 as heavy-duty), which have almost the same operational capabilities,
is sufficient to appreciate the difficulty of distinguishing these boundary classes using only
GPS information.
In summary, the GPS sensor is a reliable alternative that overcomes the shortcomings of
fixed-point sensors by reducing installation and maintenance costs, removing the spatial
coverage limitation, and opening the opportunity for online monitoring of the traffic network
by inferring the vehicle-class distribution. However, GPS trajectories can only represent
the mobility patterns of vehicles moving around the road network. The lack of information
on axle and weight configurations, which cannot be extracted from GPS data, makes
GPS-based vehicle classification harder than alternative methods, in particular for
fine-grained classification schemes with several vehicle categories. While the results of
this study show that building a coarse-grained vehicle-classification scheme using GPS data
is achievable, creating a robust model for a fine-grained classification scheme requires
additional information. A potential solution for improving model performance while
preserving the advantages of GPS data (e.g., wide-area spatial coverage) is to fuse the GPS
data with other information, such as weather conditions, that describes the road network
in more detail. As described in Section 4, our proposed GPS representation has the
flexibility to include more information about the vehicle and road network along the GPS
path.
5.7. Practical Applications
The proposed method can be readily used to label all 20 million trajectories that the
Maryland State Highway Administration is using to support its analysis and decision making.
Such enrichment of the trajectory data would enable more comprehensive studies that require
information about vehicle classes. Examples include, but are not limited to:
- Derivation of origin-destination tables for individual FHWA vehicle classes, which is
especially useful for planning efforts where transportation analysts need to distinguish
between the travel patterns of passenger cars, used mainly for commuting, and commercial
vehicles;
- Examining whether trajectories associated with heavy trucks are observed in downtown
areas or along routes with weight restrictions, which would indicate the need for
additional enforcement in those areas in order to establish the desired level of safety;
- Analyzing whether trajectories attributed to large trucks deviate around locations
with weigh-in-motion systems, which would suggest the need for deploying mobile
patrols in these regions to enforce weight limits;
- Estimation of FHWA-class-specific volumes along different road links in a transportation
network, which would allow transportation analysts to measure performance more
accurately by distinguishing between vehicle-hour and truck-hour delays;
- More accurate estimation of emissions based on trajectory data, where individual tracks
can now be attributed to different vehicle classes (e.g., passenger cars vs. trucks), which
makes a big difference in estimating transportation-related emissions.
6. Conclusion
Vehicle classification, an essential step in a variety of ITS applications, has been
addressed for decades using fixed-point traffic flow sensors. However, such conventional
sensors suffer from major shortcomings, including high maintenance cost and low spatial
coverage. To address these shortcomings, we leveraged the spatiotemporal information of
vehicle trips, retrieved through on-board GPS-enabled devices, to classify vehicles into
various categories. First, we designed an efficient programmatic approach to label large-scale
GPS data with the aid of an auxiliary resource (i.e., VWS vehicle records). Using
the labeled GPS data, we proposed a deep convolutional neural network (CNN-VC) for
identifying a vehicle's class from its GPS trajectory. Since a GPS trajectory contains a
sequence of GPS coordinates without any directly meaningful information, a novel
representation was designed to convert the GPS trajectory into a matrix, where each row
corresponds to motion-related and roadway-related features of the segment between two
consecutive GPS points. Afterward, a stack of convolutional layers, followed by a max-pooling
operation, was applied on top of the proposed GPS representation so as to compute an
abstract representation of the GPS trajectory for the classification task using the softmax
operation. Our extensive experiments clearly demonstrated the superiority of the CNN-VC
model over several state-of-the-art classical-supervised and deep-learning techniques.
Acknowledgement
The authors are grateful to Subrat Mahapatra and the Maryland State Highway
Administration, who provided the GPS trajectory data used in the paper. This support is
gratefully acknowledged, but it implies no endorsement of the findings.
References27
[1] Bahdanau, D., Cho, K. and Bengio, Y. [2014], ‘Neural machine translation by jointly28
learning to align and translate’, arXiv preprint arXiv:1409.0473 .29
[2] Capecci, S., Krupa, C. and Systematics, C. [2009], Concept of operations for virtual30
weigh station, Technical report, United States. Federal Highway Administration.31
35
[3] Cho, K., Van Merri¨enboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H.1
and Bengio, Y. [2014], ‘Learning phrase representations using rnn encoder-decoder for2
statistical machine translation’, arXiv preprint arXiv:1406.1078 .3
[4] Coifman, B. and Kim, S. [2009], ‘Speed estimation and length based vehicle classifi-4
cation from freeway single-loop detectors’, Transportation research part C: emerging5
technologies 17(4), 349–364.6
[5] Dabiri, S. and Heaslip, K. [2018a], ‘Developing a twitter-based traffic event detection7
model using deep learning architectures’, Expert Systems with Applications .8
[6] Dabiri, S. and Heaslip, K. [2018b], ‘Inferring transportation modes from GPS trajec-9
tories using a convolutional neural network’, Transportation research part C: emerging10
technologies 86, 360–371.11
[7] Dabiri, S. and Heaslip, K. [2018c], ‘Transport-domain applications of widely used data12
sources in the smart transportation: A survey’, arXiv preprint arXiv:1803.10902 .13
[8] Dong, Z., Wu, Y., Pei, M. and Jia, Y. [2015], ‘Vehicle type classification using a semisu-14
pervised convolutional neural network’, IEEE transactions on intelligent transportation15
systems 16(4), 2247–2256.16
[9] Du, J., Gui, L., Xu, R. and He, Y. [2017], A convolutional attention model for text clas-17
sification, in ‘National CCF Conference on Natural Language Processing and Chinese18
Computing’, Springer, pp. 183–195.19
[10] Glorot, X. and Bengio, Y. [2010], Understanding the difficulty of training deep feed-20
forward neural networks, in ‘Proceedings of the thirteenth international conference on21
artificial intelligence and statistics’, pp. 249–256.22
[11] Gupte, S., Masoud, O., Martin, R. F. and Papanikolopoulos, N. P. [2002], ‘Detection23
and classification of vehicles’, IEEE Transactions on intelligent transportation systems24
3(1), 37–47.25
[12] Hallenbeck, M. E., Selezneva, O. I. and Quinley, R. [2014], Verification, refinement, and26
applicability of long-term pavement performance vehicle classification rules, Technical27
report.28
[13] Hochreiter, S. and Schmidhuber, J. [1997], ‘Long short-term memory’, Neural compu-29
tation 9(8), 1735–1780.30
36
[14] Kafai, M. and Bhanu, B. [2012], ‘Dynamic Bayesian networks for vehicle classification1
in video’, IEEE Transactions on Industrial Informatics 8(1), 100–109.2
[15] Kingma, D. P. and Ba, J. [2014], ‘Adam: A method for stochastic optimization’, arXiv3
preprint arXiv:1412.6980 .4
[16] Markovi´c, N., Seku la, P., Vander Laan, Z., Andrienko, G. and Andrienko, N. [2018],5
‘Applications of trajectory data from the perspective of a road transportation agency:6
Literature review and maryland case study’, IEEE Transactions on Intelligent Trans-7
portation Systems (99), 1–12.8
[17] Rush, A. M., Chopra, S. and Weston, J. [2015], ‘A neural attention model for abstractive9
sentence summarization’, arXiv preprint arXiv:1509.00685 .10
[18] Seku la, P., Markovi´c, N., Laan, Z. V. and Sadabadi, K. F. [2018], ‘Estimating historical11
hourly traffic volumes via machine learning and vehicle probe data: A Maryland case12
study’, Transportation Research Part C: Emerging Technologies 97, 147–158.13
[19] Simoncini, M., Taccari, L., Sambo, F., Bravi, L., Salti, S. and Lori, A. [2018], ‘Vehicle14
classification from low-frequency GPS data with recurrent neural networks’, Transporta-15
tion Research Part C: Emerging Technologies 91, 176–191.16
[20] Sun, Z. and Ban, X. J. [2013], ‘Vehicle classification using GPS data’, Transportation17
Research Part C: Emerging Technologies 37, 102–117.18
[21] Xu, K., Ba, J., Kiros, R., Cho, K., Courville, A., Salakhudinov, R., Zemel, R. and19
Bengio, Y. [2015], Show, attend and tell: Neural image caption generation with visual20
attention, in ‘International conference on machine learning’, pp. 2048–2057.21
[22] Yang, Z., Yang, D., Dyer, C., He, X., Smola, A. and Hovy, E. [2016], Hierarchical attention networks for document classification, in ‘Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies’, pp. 1480–1489.
[23] Zheng, Y. [2015], ‘Trajectory data mining: An overview’, ACM Trans. Intell. Syst. Technol. 6(3), 29:1–29:41.
URL: http://doi.acm.org/10.1145/2743025
[24] Zheng, Y., Li, Q., Chen, Y., Xie, X. and Ma, W.-Y. [2008], Understanding mobility based on GPS data, in ‘Proceedings of the 10th International Conference on Ubiquitous Computing’, ACM, pp. 312–321.
Figure 7: Confusion matrices of the CNN-VC model for the three datasets: (a) the 2&3 dataset, (b) the light&heavy dataset, and (c) the light&medium&heavy dataset.