2169-3536 (c) 2019 IEEE. Translations and content mining are permitted for academic research only. Personal use is also permitted, but republication/redistribution requires IEEE permission. See
http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI
10.1109/ACCESS.2019.2920489, IEEE Access
Impact of driver behavior on fuel consumption: classification, evaluation and prediction using machine learning
PENG PING¹, WENHU QIN¹, YANG XU¹, CHIYOMI MIYAJIMA² (Member, IEEE), and KAZUYA TAKEDA³ (Senior Member, IEEE)
¹School of Instrument Science and Engineering, Southeast University, Nanjing, 210096, China
²School of Informatics, Daido University, Nagoya, 457-8530, Japan
³The Graduate School of Informatics, Nagoya University, Nagoya, 464-0814, Japan
Corresponding author: Wenhu Qin (e-mail: qinwenhu@seu.edu.cn).
ABSTRACT Driving behavior has a large impact on vehicle fuel consumption. A dedicated study of the relationship between driving behavior and fuel consumption can help reduce the energy cost of transportation and support the development of behavior assessment technology for advanced driver assistance systems (ADAS). It is therefore vital to evaluate this relationship in order to develop more ecological driving assistance systems and improve vehicle fuel economy. However, modeling driving behavior under dynamic driving conditions is complex, which makes quantitative analysis of the relationship between driving behavior and fuel consumption difficult. In this paper, we introduce two machine learning methods for evaluating the fuel efficiency of driving behavior using naturalistic driving data. In the first stage, we use an unsupervised spectral clustering algorithm to study the macroscopic relationship between driving behavior and fuel consumption, using data collected during natural driving. In the second stage, dynamic information from the driving environment and natural driving data are integrated to model the relationship between various driving behaviors and the corresponding fuel consumption features. The dynamic environmental factors are encoded into a processable, digital form using a deep learning-based object detection method, so that the environmental data can be linked with the vehicle's operating signal data to provide training data for the deep learning network. The training data is labeled according to its fuel consumption feature distribution, which is obtained from road segment data and historical driving data. The resulting deep learning-based model can then be used to predict the fuel consumption associated with different driving behaviors. Our results show that the proposed method can effectively identify the relationship between driving behavior and fuel consumption at both the macro and micro levels, enabling end-to-end fuel consumption feature prediction that can be applied in advanced driver assistance systems.
INDEX TERMS Driving behavior modeling, Data mining, Deep learning, Vehicle fuel economy.
I. INTRODUCTION
A combination of emissions from coal combustion and urban vehicle use has become the primary source of air pollution in most of the world's major cities [1,2]. According to the World Health Organization, transportation emissions are a significant and growing contributor to particulate air pollution, accounting for 30% of particulate matter (PM) emissions in European cities and 50% of PM emissions in OECD countries [3]. One study estimated that approximately 1.03 million deaths were associated with ambient PM2.5 air pollution in the 74 largest cities of China in 2013, accounting for 32% of all reported deaths [4]. As a result, much research has focused on reducing vehicle emissions. As various studies have demonstrated [5-7], driving behaviors such as speed control, preferred rate of acceleration and vehicle control stability have a major effect on fuel consumption, regardless of the type of vehicle being driven. By accurately identifying the relationships between driving behavior and fuel consumption, Advanced Driver Assistance Systems (ADAS) can be designed to give more accurate and intelligent eco-driving advice [8,9]. By studying the impact of driving behavior on fuel consumption, we
can learn why some drivers consume more energy than others and help high-consumption drivers adopt a more fuel-efficient driving style. Moreover, as a fundamental technology of ADAS or eco-driving coaching systems, an effective model of driving behavior and energy consumption can be applied to reduce the fuel cost of commercial vehicles [10], optimize the location of charging stations [11], reduce greenhouse gas emissions from transportation [12], and so on. Thus, discovering the precise relationship between driving behavior and fuel consumption, in order to reduce vehicle emissions and increase fuel efficiency, has become an important research area and is the motivation for our study. However, effective models for analyzing the impact of driving behavior on fuel consumption have rarely been studied. In this paper, we aim to design a machine learning-based method that can analyze and predict the relationship between driving behavior and fuel consumption. An eco-driving system or ADAS can obtain the driving state from the proposed model and thus give the driver more appropriate advice for maintaining fuel-efficient driving.
Quantitative analysis of the relationship between driving behavior and fuel consumption is a natural and direct approach. However, traditional fuel consumption models, such as the Vehicle Specific Power (VSP) model [13], the Comprehensive Modal Emission Model (CMEM) [14] and the International Vehicle Emissions (IVE) model [15], are specifically designed to evaluate the fuel economy performance of engines, and calibrating them is very complex [16]. In contrast, most driving behavior modeling studies have focused on specific driving scenarios, such as lane changes [17,18], arterial corridors [19] and signalized intersections [20]. These models focus on identifying safe or comfortable driving and are difficult to link to fuel consumption. As a result, integrating driving behavior parameters or models with traditional fuel consumption models remains an open problem. Many researchers have proposed two-stage methods, in which statistical or machine learning methods are first used to identify a driver's driving style, and the features of that style are then compared with the related fuel consumption features. J. E. Meseguer et al. used a three-layer neural network to classify drivers into quiet, normal and aggressive groups [21].
They then analyzed the fuel consumption features of each group. E. Gilman et al. used 17 driving behavior factors to identify fuel-efficient driving behavior for a driver coaching system [22]; the factors were evaluated according to their distributions, calculated from a historical driving trip. R. Trigui et al. analyzed the impact of various driving behaviors on fuel efficiency using mathematical modeling [23]. The study first divided driving behavior into two levels: maneuvering-level and control-level behavior. Then, by identifying the various parameters of their model, the authors simulated three different behaviors: aggressive driving, eco-driving and normal driving. Their results showed that the proposed model could accurately match measured fuel consumption and real driving behavior. C. Lv et al. proposed an unsupervised machine learning method using Gaussian mixture models to recognize three typical driving styles, and then provided the optimal control strategy for each driving style in order to improve energy efficiency [24]. All of the studies cited here succeeded in identifying fuel-efficient driving behavior. However, they do not consider the impact of varying traffic conditions in detail, which limits the usefulness of their results, since driving behavior is also influenced by various static and dynamic environmental factors [25,26].
Therefore, some researchers have also examined driving environment features, which can be deduced from or directly obtained from naturalistic driving data, in their analyses of driver fuel consumption, resulting in more nuanced assessments. M. Ehsani et al. discussed in detail the effects of external environmental factors on vehicle fuel consumption [27], but did not carefully examine the effect of driving behavior, noting only that speed and acceleration are the two most important parameters. J. Rios-Torres et al. classified driving styles into three categories by analyzing real-world data, and then examined the effect of each driving style on fuel consumption [28]. Their results show that, depending on the driver's driving style and the driving scenario, vehicle fuel consumption can differ widely from that measured over the standard US Environmental Protection Agency (EPA) driving cycles.
The studies mentioned above investigating the relationship between driving behavior and fuel consumption have achieved good results, but questions remain. Most of them employed statistical or rule-based methods to analyze this relationship, and such methods require large amounts of long-term driving data as well as prior knowledge of the data's statistical features. These conventional methods usually require considerable expert skill to extract prior knowledge from the raw data set. Moreover, their results have limited generality because the experiments were mostly conducted under a limited variety of traffic conditions. Although machine learning methods also need a considerable amount of data, learning-based methods can learn the inner features of, and knowledge contained in, the raw data automatically.
Thus, in this paper we propose an approach that employs two machine learning methods in order to push research on fuel-efficient driving behavior one step further. In the first stage, we use an unsupervised machine learning method to analyze the fuel efficiency of driver behavior macroscopically, as shown in the upper section of Fig. 1 (circled in red). Inspired by previous studies [29-31] in which machine learning was used for driving behavior analysis, we employ a parallel spectral clustering algorithm [32] to classify the driving signal data set collected from multiple drivers. Drivers are divided into three groups based on the similarity of their driving styles. We then analyze the data to extract the data points that lie in the same fuel consumption zone. Because spectral clustering requires no prior knowledge about the data, it is suitable for dealing with unique sets of driving data. A parallel computing structure is also used to improve the efficiency of the clustering process.
FIGURE 1: Two-stage architecture of the proposed driving behavior modeling method. In the first stage (outlined in red), unsupervised machine learning is used to obtain the macro-level fuel consumption features of driver behavior. In the second stage (outlined in blue), an LSTM is used to analyze short-term driving behavior and driving environment data to predict real-time fuel consumption.
The other machine learning method used in this study is Long Short-Term Memory (LSTM), which is a powerful method for modeling behavior [33]. In contrast to previous studies that used LSTM to model fuel consumption [34,35], in this paper we include more features of the dynamic traffic environment, in the form of video frames, in our learning model, as shown in the lower part of Fig. 1 (circled in blue), in order to make the network more robust and general across a wider variety of traffic conditions. In addition to analyzing the fuel efficiency of a driver's historical or long-term driving behavior, our learning-based method is designed to examine short-term driving data, making the prediction results adaptive to dynamic traffic conditions. The model takes video frames, GPS and ECU information as input and outputs a real-time prediction of the level of fuel consumption. This structure allows end-to-end evaluation of the fuel efficiency of driving behavior.
This paper is organized as follows. The paper's objectives and related research are described in the Introduction. Section II describes the collection of driving behavior data and provides details about the spectral clustering algorithm we employed to mine it. Section III describes our use of an LSTM to predict short-term fuel consumption features and evaluates the model's performance using representative fuel consumption feature prediction results. Finally, Section IV discusses our findings and conclusions.
II. DATA COLLECTION AND UNSUPERVISED EXTRACTION OF FEATURES OF FUEL-EFFICIENT DRIVING BEHAVIOR
A. DATA COLLECTION PROCESS
1) Experiment design
Research by E. Ericsson [26] suggests that driving behavior is affected by various factors, such as street design, traffic management methods, traffic conditions, weather conditions and the driver's mental and physical condition. In order to evaluate the effect of the driver's condition on vehicle fuel consumption and simplify the verification process, we fixed the vehicle type, trip route and weather conditions in our experiment, so the only variable factors are the drivers (i.e., their driving behavior) and the traffic conditions. If more than one route were used, it would be difficult to determine which factors were primarily responsible for variations in fuel consumption. Therefore, all of the data for our experiment was collected on a fixed route that includes some variation in road types. Examples of the two types of roads used in our study are shown in Fig. 2. The total length of all of the road segments was about 15.2 km, consisting of a 5.3 km expressway loop with two lanes in each direction and 9.9 km of ordinary road with one lane in each direction. The detailed route map and road information are shown in Fig. 3.
FIGURE 2: Two types of roads used for data collection. Left: expressway loop with two lanes in each direction. Right: ordinary road with one lane in each direction.
FIGURE 3: Overall map of the roads used for data collection. The yellow line is the expressway and the white line is the ordinary road.
Our data was collected using 30 ordinary passenger cars, each with a 1.2T (85 kW) gasoline engine and a six-speed automatic transmission (6AT). Fuel consumption increases by 0.38±0.079% for each 1 °C decrease in air temperature [36]. Therefore, to prevent variations in air temperature from obscuring the relationship between driving behavior and fuel consumption, data collection was conducted in autumn, from September to November. 202 drivers were selected to take part in the experiment; their information is shown in Fig. 4. Because both the supervised and unsupervised learning methods need many samples, we recruited as many participants as possible. The 202 drivers were chosen from our university's students and our collaborators' staff. Each participant drove 10 circuits of the experimental route per day, and each driver's part of the experiment lasted one week. We did not impose time limits or special driving tasks on the participants, in order to avoid extra mental pressure; we simply told them the research goal and the experimental route, and asked them to drive as they usually do. Most of the participants were in a normal emotional state. Participants were paid after the experiment.
FIGURE 4: Age and sex distribution of all the experiment participants.
FIGURE 5: Data collection system (for driving data and GPS information).
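To give a sense of scale for the temperature effect cited above, the following rough calculation assumes that the per-degree rate from [36] applies linearly and considers a hypothetical 10 °C drop in air temperature (the actual temperature range during the experiment is not reported here):
$$\Delta F \approx 10 \times (0.38 \pm 0.079)\,\% = (3.8 \pm 0.79)\,\%$$
A shift of this size would be of the same order as the calculation differences reported in Table 1 (0.81% to 5.94%), which illustrates why restricting data collection to a single season helps.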
2) Data collection and redundant data pruning
The data collection system (DCS) in Fig. 5 is divided
into three parts: a vehicle-mounted data collection system
(VMDCS), a wireless transmission system (WTS) and a
data center (DC). The VMDCS uses On-Board Diagnostics
(OBD) to obtain the vehicle’s operating information from
the ECU, and uses GPS to track the vehicle’s position. The
WTS uses a wireless transmission unit (WTU) installed on
the vehicle which communicates with the base station via
4G broadband to upload the collected data. Messages from
the WTS include a receiving module IP address so that the
data can be transmitted to the DC via the internet. The DC
server shows the vehicle’s position and real-time vehicle
information on the Web. The collected data is stored in an
SQL database.
To improve computational efficiency, we selected the vehicle operation data with a strong relationship to driving behavior, using the Pearson correlation coefficient (PCC) [37] to determine the relevance of each parameter to vehicle fuel consumption. We treated positive and negative acceleration as separate parameters because their effects on fuel economy differ. For example, when calculating fuel cost, instantaneous fuel consumption is zero whenever the acceleration is negative. The calculated correlation coefficients for the various features are shown in Fig. 6, where each PCC value ρ is represented by a bar colored according to the following standard guidelines [38]: |ρ| > 0.5 indicates strong correlation, 0.3 < |ρ| < 0.5 moderate correlation and |ρ| < 0.3 weak correlation. In Fig. 6, 'Negative acc' and 'Negative acc variance' are negatively correlated with fuel consumption, so the actual PCC values of these two parameters are negative. Before using an unsupervised clustering method to extract the features of the data distribution, we first pruned the weakly correlated parameters.
FIGURE 6: Correlation coefficients of various vehicle operation parameters with fuel consumption. Red bar: strong correlation; yellow bar: moderate correlation; blue bar: weak correlation.
TABLE 1: Difference between calculated fuel consumption and fuel consumption analyzer results.
Vehicle load \ Road type | Urban road | Expressway | Rural road
No-load                  | 4.85%      | 1.28%      | 2.18%
Full-load                | 5.94%      | 0.81%      | 3.65%
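The pruning rule above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes each candidate parameter is a NumPy array aligned sample-by-sample with the fuel consumption signal, it treats the handling of the exact band edges (|ρ| equal to 0.3 or 0.5) as an assumption because the text leaves them open, and the feature names are hypothetical.

```python
import numpy as np


def pcc_strength(rho: float) -> str:
    """Map a Pearson coefficient to the bands used in Fig. 6.

    Handling of the exact boundaries 0.3 and 0.5 is an assumption.
    """
    r = abs(rho)
    if r > 0.5:
        return "strong"
    if r > 0.3:
        return "moderate"
    return "weak"


def prune_weak_parameters(params: dict, fuel: np.ndarray) -> dict:
    """Return {name: signed PCC} for every parameter not weakly correlated with fuel."""
    kept = {}
    for name, values in params.items():
        rho = float(np.corrcoef(values, fuel)[0, 1])  # signed, so negative links keep their sign
        if pcc_strength(rho) != "weak":
            kept[name] = rho
    return kept


# Hypothetical toy signals, for illustration only.
fuel = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
params = {
    "speed": 2.0 * fuel,                             # perfectly positively correlated
    "negative_acc": -fuel,                           # perfectly negatively correlated
    "noise": np.array([1.0, -1.0, 1.0, -1.0, 1.0]),  # uncorrelated with fuel
}
kept = prune_weak_parameters(params, fuel)
print(kept)
```

Keeping the signed coefficient, rather than |ρ|, preserves the information that parameters such as 'Negative acc' move against fuel consumption.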
3) Fuel consumption calculation
To calculate fuel consumption, we integrated the instantaneous fuel consumption information from the ECU to obtain accumulated fuel consumption. To verify our calculations, we compared the calculated results with those of a fuel consumption analyzer under various traffic conditions. The differences between these two fuel consumption measurement approaches are shown in Table 1.
From the data in Table 1, we can conclude that the difference between our calculation method and actual fuel consumption is less than 6% (the largest difference, 5.94%, occurs on urban roads under full load). As the route used in our experiment is only 15 km long and the goal of the study is to evaluate the effect of driving behavior on fuel consumption, we consider this difference negligible.
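A minimal sketch of the accumulation and comparison steps follows. It assumes the ECU reports an instantaneous fuel rate in mL/s sampled at the 10 Hz rate used in this study, and it uses the trapezoidal rule, which is an assumption since the text does not state the integration scheme. The relative difference takes the analyzer reading as the reference, which is likewise an assumption about how Table 1 was computed; `accumulate_fuel` and `relative_difference_pct` are hypothetical names.

```python
import numpy as np


def accumulate_fuel(rate_ml_per_s: np.ndarray, hz: float = 10.0) -> np.ndarray:
    """Cumulative fuel (mL) from an instantaneous fuel-rate signal via the trapezoidal rule."""
    dt = 1.0 / hz
    # Area of each trapezoid between consecutive samples.
    increments = 0.5 * (rate_ml_per_s[1:] + rate_ml_per_s[:-1]) * dt
    return np.concatenate(([0.0], np.cumsum(increments)))


def relative_difference_pct(calculated: float, analyzer: float) -> float:
    """Absolute difference relative to the analyzer reading, in percent."""
    return abs(calculated - analyzer) / analyzer * 100.0


# 1 s of a constant 2 mL/s rate sampled at 10 Hz (11 samples span 10 intervals).
rate = np.full(11, 2.0)
total = accumulate_fuel(rate)[-1]
print(total, relative_difference_pct(total, 2.1))
```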
B. DATA SEGMENT CONSTRUCTION
Our research goal is to analyze and predict the impact of driving behavior on fuel consumption within a limited time frame (25 to 35 minutes). To do so, we use spectral clustering to compare the inner similarity of the data sets and group data with similar features into the same cluster. However, our spectral clustering method can only handle data sets of the same size. The data collection rate was 10 Hz, and we collected 15,000-21,000 data points per circuit of the driving route, which at 10 Hz corresponds to the 25 to 35 minute time frame above (we treated each circuit of the driving route as an independent data set). Since the amount of data collected in each data set varied, we needed to compress each data set to the same size.
FIGURE 7: Data compression process based on road segments.
As shown in Fig. 7, we first partitioned the raw data set into several subsets. The driving route was divided into 50 road segments according to their location, and every data point, which contains its GPS position, was assigned to the road segment it belongs to. As each road segment contains a different number of data points in each data set, we calculated the minimum data size $S_n$ of each road segment n. For example, $S_1$ is the minimum data size of the first road segment, calculated over all of the data associated with the first road segment. The portion of each data set that falls in road segment 1 is then compressed to size $S_1$. After data compression, every data set has the same size $S_{\mathrm{all}}$, as shown in equation (1):
$$S_{\mathrm{all}} = \sum_{i=1}^{50} S_i \tag{1}$$
In contrast to using maximum information entropy to select
the size limit of the data, as in our previous study [39], the
data compression method adopted in this paper allows us to
retain most of the data points.
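The segment-wise compression can be sketched as below. The text does not say how a segment is shortened to $S_n$ samples, so this sketch assumes uniform index subsampling that keeps the first and last sample of each segment; the function names and the per-segment list layout are hypothetical.

```python
import numpy as np


def min_segment_sizes(circuits: list) -> list:
    """S_n for every road segment n: the smallest sample count over all circuits."""
    n_segments = len(circuits[0])
    return [min(len(circuit[n]) for circuit in circuits) for n in range(n_segments)]


def compress_circuit(segments: list, sizes: list) -> np.ndarray:
    """Compress one circuit so that road segment n keeps exactly sizes[n] samples.

    segments[n] has shape (m_n, l) and holds the samples whose GPS position
    falls in segment n. Returns x_i with shape (l, S_all), S_all = sum(sizes).
    """
    parts = []
    for seg, s_n in zip(segments, sizes):
        # Uniformly spaced indices from the first to the last sample (assumed scheme).
        idx = np.linspace(0, len(seg) - 1, num=s_n).round().astype(int)
        parts.append(seg[idx])
    return np.concatenate(parts, axis=0).T


rng = np.random.default_rng(0)
# Two toy circuits with two road segments each and l = 2 parameters per sample.
circuits = [
    [rng.normal(size=(5, 2)), rng.normal(size=(3, 2))],
    [rng.normal(size=(4, 2)), rng.normal(size=(6, 2))],
]
sizes = min_segment_sizes(circuits)  # [4, 3]
x0 = compress_circuit(circuits[0], sizes)
x1 = compress_circuit(circuits[1], sizes)
print(x0.shape, x1.shape)
```

Because every circuit is cut to the same per-segment sizes, all compressed data sets share the shape $l \times S_{\mathrm{all}}$ required by the clustering step.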
C. UNSUPERVISED DATA FEATURE EXTRACTION
1) Spectral Clustering Algorithm
Unsupervised machine learning is usually used for data distribution analysis or for extracting the inner features of a data set. In this paper, we adopt spectral clustering to study the features of our self-collected data set. As described previously, we collected driving data sets of the same size from multiple drivers during natural driving along a fixed route. Spectral clustering treats data clustering as a graph partitioning problem and makes no assumptions about the form of the data clusters. Because of this, we do not need prior knowledge of the driving behavior data. This is very important for our research because the data sets obtained from the data collection platform vary from driver to driver, and spectral clustering is well suited to these kinds of 'random' data sets. In addition, spectral clustering is reasonably fast, especially for sparse data sets of up to several thousand points, and it does not depend on the dimensionality of the data. The first step of the spectral clustering process is to construct the driving data graph G, an undirected similarity graph whose vertices are the data points, all of whose parameters are scalar. We use $X$ to represent the entire raw driving data set:
$$X = \{x_1, x_2, \ldots, x_N\}, \quad x_i \in \mathbb{R}^{l \times S_{\mathrm{all}}} \tag{2}$$
Each $x_i$ contains the six fuel-efficiency-related parameters selected as described above, so $l = 6$ in this case, and $N$ is the total number of data samples. Each edge of graph G between vertices $x_i$ and $x_j$ is weighted by a non-negative weight $w_{i,j}$ computed from the distance between them. Because there is no definitive result on how the design of the similarity graph influences spectral clustering results [29], here we use a fully connected graph to construct the similarity matrix $W$, and use a Gaussian function to calculate $w_{i,j}$ as follows:
$$w_{i,j} = \exp\left(-\frac{\lVert x_i - x_j \rVert^2}{2\delta^2}\right), \quad \delta = 10 \tag{3}$$
The similarity matrix $W \in \mathbb{R}^{N \times N}$ is constructed from the terms $w_{i,j}$. Since G is an undirected graph and (3) is symmetric in $x_i$ and $x_j$, $W$ is symmetric. We then build the degree matrix $D$, a diagonal matrix with the degrees $(d_1, \ldots, d_N)$ on its diagonal. The degree of vertex $x_i$ is defined as:
$$d_i = \sum_{j=1}^{N} w_{i,j} \tag{4}$$
Two further quantities are defined: the volume of a cluster, $\mathrm{Vol}(C)$, and the cut between two clusters, $\mathrm{Cut}(C_1, C_2)$, which are calculated as follows:
$$\mathrm{Vol}(C) = \sum_{i \in C} d_i \tag{5}$$
$$\mathrm{Cut}(C_1, C_2) = \sum_{i \in C_1} \sum_{j \in C_2} w_{i,j} \tag{6}$$
Next, the similarity graph G is partitioned into disjoint sets. There are several graph cut criteria, such as MinCut [40], RatioCut [41] and NCut [42]. MinCut is simple and effective, but it often fails to give a satisfactory partition because of possible singularity problems (it tends to separate individual vertices from the rest of the graph). RatioCut and NCut make the clusters more balanced by taking the number of vertices or the edge weights, respectively, into account, but RatioCut is relatively slow, so in this study we chose NCut as our border determination method; minimizing NCut is an NP-hard problem [40]. To obtain optimal clustering results, we use the objective function shown in (7), where $(A_1, \ldots, A_k)$ are the final clustering groups and $\bar{A}_i$ is the complement of $A_i$. This objective function is used again in (10):
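$$\min \mathrm{Ncut}(A_1, \ldots, A_k) = \min\left(\frac{1}{2}\sum_{i=1}^{k} \frac{W(A_i, \bar{A}_i)}{\mathrm{Vol}(A_i)}\right) = \min\left(\sum_{i=1}^{k} \frac{\mathrm{Cut}(A_i, \bar{A}_i)}{\mathrm{Vol}(A_i)}\right) \tag{7}$$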
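To make (4) to (7) concrete, the sketch below computes degrees, volumes, cuts and the second form of the Ncut objective for a given partition of a small, hypothetical similarity matrix. With Cut defined as in (6), this sum equals twice the first expression in (7); the constant factor does not change which partition minimizes it. The function names are illustrative, not from the authors' code.

```python
import numpy as np


def volume(W: np.ndarray, members: np.ndarray) -> float:
    """Vol(C), Eq. (5): sum of the degrees d_i (Eq. (4)) of the vertices in C."""
    d = W.sum(axis=1)
    return float(d[members].sum())


def cut(W: np.ndarray, a: np.ndarray, b: np.ndarray) -> float:
    """Cut(C1, C2), Eq. (6): total edge weight between vertex sets a and b."""
    return float(W[np.ix_(a, b)].sum())


def ncut(W: np.ndarray, labels: np.ndarray) -> float:
    """Sum over clusters A_i of Cut(A_i, complement of A_i) / Vol(A_i)."""
    idx = np.arange(W.shape[0])
    total = 0.0
    for c in np.unique(labels):
        inside, outside = idx[labels == c], idx[labels != c]
        total += cut(W, inside, outside) / volume(W, inside)
    return total


# Toy graph: vertices {0, 1} and {2, 3} are strongly linked, with weak cross links.
W = np.array([
    [0.0, 1.0, 0.1, 0.0],
    [1.0, 0.0, 0.0, 0.1],
    [0.1, 0.0, 0.0, 1.0],
    [0.0, 0.1, 1.0, 0.0],
])
good = ncut(W, np.array([0, 0, 1, 1]))
bad = ncut(W, np.array([0, 1, 0, 1]))
print(good, bad)  # the natural split has the much smaller Ncut
```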
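A group of indicator vectors $h_j = (h_{1,j}, \ldots, h_{N,j})^{\top}$ is then defined as follows:
$$h_{i,j} = \begin{cases} \dfrac{1}{\sqrt{\mathrm{Vol}(A_j)}}, & x_i \in A_j \\ 0, & x_i \notin A_j \end{cases} \tag{8}$$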
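The matrix $H \in \mathbb{R}^{N \times k}$, which contains the k indicator vectors $h_j$ as columns, is then constructed. With $L = D - W$ denoting the unnormalized graph Laplacian, the normalized graph Laplacian [44] is introduced as:
$$L_{\mathrm{sym}} = D^{-1/2} L D^{-1/2} = I - D^{-1/2} W D^{-1/2} \tag{9}$$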
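Given the following properties:
$$H' D H = I, \qquad h_i' D h_i = 1, \qquad h_i' L h_i = \mathrm{Cut}(A_i, \bar{A}_i) / \mathrm{Vol}(A_i) \tag{10}$$
the Ncut problem is reformulated as:
$$\operatorname*{argmin}_{A_1, \ldots, A_k} \mathrm{Tr}(H' L H) \quad \text{subject to} \quad H' D H = I \tag{11}$$
By substituting $T = D^{1/2} H$, we can change the Ncut problem into a simpler form:
$$\operatorname*{argmin}_{T \in \mathbb{R}^{N \times k}} \mathrm{Tr}\left(T' D^{-1/2} L D^{-1/2} T\right) \quad \text{subject to} \quad T' T = I \tag{12}$$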
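The properties in (10) and the substitution leading to (12) can be checked numerically. The sketch below builds $H$ from (8) for a given partition of a small, hypothetical similarity matrix and checks that $H'DH = I$, that $\mathrm{Tr}(H'LH)$ equals the Ncut sum, and that $T = D^{1/2}H$ satisfies $T'T = I$ with the same trace under $L_{\mathrm{sym}}$. It illustrates the algebra only and is not part of the authors' pipeline; the function names are hypothetical.

```python
import numpy as np


def indicator_matrix(labels: np.ndarray, d: np.ndarray) -> np.ndarray:
    """H from Eq. (8): h_{i,j} = 1 / sqrt(Vol(A_j)) if x_i is in A_j, else 0."""
    k = int(labels.max()) + 1
    H = np.zeros((len(labels), k))
    for j in range(k):
        members = labels == j
        H[members, j] = 1.0 / np.sqrt(d[members].sum())
    return H


def relaxation_quantities(W: np.ndarray, labels: np.ndarray) -> dict:
    d = W.sum(axis=1)                          # degrees, Eq. (4)
    D = np.diag(d)
    L = D - W                                  # unnormalized graph Laplacian
    D_inv_half = np.diag(1.0 / np.sqrt(d))
    L_sym = D_inv_half @ L @ D_inv_half        # Eq. (9)
    H = indicator_matrix(labels, d)
    T = np.diag(np.sqrt(d)) @ H                # substitution T = D^{1/2} H
    return {
        "HtDH": H.T @ D @ H,                   # identity, first property in Eq. (10)
        "trace_HtLH": float(np.trace(H.T @ L @ H)),
        "TtT": T.T @ T,                        # constraint in Eq. (12)
        "trace_TtLsymT": float(np.trace(T.T @ L_sym @ T)),
    }


W = np.array([
    [0.0, 1.0, 0.1, 0.0],
    [1.0, 0.0, 0.0, 0.1],
    [0.1, 0.0, 0.0, 1.0],
    [0.0, 0.1, 1.0, 0.0],
])
q = relaxation_quantities(W, np.array([0, 0, 1, 1]))
print(q["trace_HtLH"], q["trace_TtLsymT"])  # both equal the Ncut sum 0.4 / 2.2
```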
Since $D^{-1/2} L D^{-1/2} = L_{\mathrm{sym}}$ by (9), (12) is a standard trace minimization problem, and according to the Rayleigh-Ritz theorem [32,45] it is solved by the matrix $U$ whose columns are the k eigenvectors of $L_{\mathrm{sym}}$ corresponding to its k smallest eigenvalues. Finally, we treat each row of $U$ as a new data point and cluster the rows into k groups using a K-means
clustering algorithm. If row i of $U$ is assigned to group $C_j$, the original data $x_i$ in the raw data set $X$ also belongs to group $C_j$.
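A compact, single-machine sketch of the pipeline described in this subsection follows: Gaussian similarity (3), degrees (4), normalized Laplacian (9), the eigenvectors for the k smallest eigenvalues, and K-means on the rows of $U$. It assumes each $x_i$ is flattened into a vector, so $\lVert x_i - x_j \rVert$ becomes the Frobenius norm, and it uses a small hand-written K-means with a deterministic initialization only to stay self-contained; any standard K-means implementation would serve. The dense eigendecomposition costs $O(N^3)$, so this sketch only suits small N.

```python
import numpy as np


def gaussian_similarity(X: np.ndarray, delta: float = 10.0) -> np.ndarray:
    """Fully connected similarity matrix of Eq. (3); row i of X is x_i flattened."""
    sq = np.sum(X**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-d2 / (2.0 * delta**2))


def kmeans(U: np.ndarray, k: int, iters: int = 100) -> np.ndarray:
    """Plain Lloyd k-means with deterministic farthest-point initialization."""
    centers = [U[0]]
    for _ in range(1, k):
        dist = np.min(((U[:, None, :] - np.array(centers)[None]) ** 2).sum(-1), axis=1)
        centers.append(U[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(iters):
        labels = np.argmin(((U[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        new = np.array([U[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
                        for c in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


def spectral_clustering(X: np.ndarray, k: int, delta: float = 10.0) -> np.ndarray:
    W = gaussian_similarity(X, delta)
    d = W.sum(axis=1)                                   # degrees, Eq. (4)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    L_sym = np.eye(len(d)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]  # Eq. (9)
    _, eigvecs = np.linalg.eigh(L_sym)                  # eigenvalues in ascending order
    U = eigvecs[:, :k]                                  # eigenvectors of the k smallest
    return kmeans(U, k)                                 # cluster of row i -> cluster of x_i


base = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]])
X = np.vstack([base, base + 100.0])  # two well-separated toy groups
labels = spectral_clustering(X, k=2)
print(labels)
```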
2) Parallel spectral clustering algorithm
The time complexity of spectral clustering is $O(n^3)$, where n is the number of data points. When n exceeds 5,000, the time cost of spectral clustering with conventional computation becomes excessive, so we introduce a parallel spectral clustering method that employs cloud computing. The cloud computing platform Spark [46] is suitable for parallel calculations on big data. Analyzing the internal computation of our spectral clustering method shows that three processes account for most of the computation time: construction of the similarity matrix, calculation of the eigenvalues of the graph Laplacian and the final K-means clustering.
The process of parallel spectral clustering using the Spark
platform can be described as follows:
Step 1: Calculating the similarity matrix in parallel.
First, we store the entire raw data set in the Hadoop distributed file system (HDFS), since data sets in HDFS can be accessed by the whole computing cluster. We then use the map method of Spark's resilient distributed dataset (RDD) (shown in Fig. 8) to assign the split data set to several parallel calculation tasks. Because the similarity graph is fully connected and undirected, the similarity matrix is symmetric ($w_{i,j} = w_{j,i}$). As a result, we only need to calculate $w_{i,j}$ for $1 \le i \le j \le N$. The data are divided into subsets as follows:
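Raw data set: $X = \{x_1, x_2, \dots, x_N\}$
Fragment sets:
$$
\begin{aligned}
X_1 &= \{x_1, X'_1\}, & X \setminus X'_1 &= \varnothing \\
X_2 &= \{x_2, X'_2\}, & X \setminus X'_2 &= \{x_1\} \\
&\;\;\vdots & &\;\;\vdots \\
X_N &= \{x_N, X'_N\}, & X \setminus X'_N &= \{x_1, x_2, \dots, x_{N-1}\}
\end{aligned} \tag{13}
$$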
Fragment set $X_1$ is assigned to Task 1, as shown in Fig. 8, and Task 1 calculates $(w_{1,1}, \dots, w_{1,N})$. In general, fragment set $X_i$ is given to Task $i$, which calculates $(w_{i,i}, \dots, w_{i,N})$. The remaining entries of row $i$ ($w_{i,j}$ with $j < i$) follow from symmetry. The final step is to merge the results of all tasks to construct the full similarity matrix. An overview of the parallel calculation of the similarity matrix is shown in Fig. 8.
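The task decomposition in (13) can be imitated on a single machine. In the sketch below, Python threads stand in for Spark tasks; this is not the Spark RDD API.

- Task $i$ receives $x_i$ and $X'_i = \{x_i, \dots, x_N\}$ and computes $w_{i,i}, \dots, w_{i,N}$.
- The merge step fills the lower triangle by symmetry.

The Gaussian similarity function and the names `row_task` and `parallel_similarity` are assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def row_task(i, X, sigma):
    """Hypothetical task i: receives x_i and X'_i = {x_i, ..., x_N}, returns w_{i,i}, ..., w_{i,N}."""
    diffs = X[i:] - X[i]
    # Assumed Gaussian similarity function
    return i, np.exp(-np.sum(diffs ** 2, axis=1) / (2.0 * sigma ** 2))

def parallel_similarity(X, sigma=1.0, workers=4):
    N = len(X)
    W = np.zeros((N, N))
    # "Map": each task computes the upper-triangular part of one row (threads stand in for Spark)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for i, row in pool.map(lambda i: row_task(i, X, sigma), range(N)):
            W[i, i:] = row
    # "Merge": fill the lower triangle using w_ij = w_ji
    return np.triu(W) + np.triu(W, 1).T
```

Because each task only needs its own fragment, the tasks are independent. On a cluster they could run on separate executors, with the merge done once at the end.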
Step 2: Simplifying the calculation of the eigenvalues of the graph Laplacian.
The Lanczos algorithm [47] is used to calculate the eigenvalues; the calculation process is shown in Fig. 9. From this process, the following relationships can be derived:
$$V^{\top} L V = T, \qquad V = \{v_1, v_2, \dots, v_n\} \tag{14}$$
FIGURE 8: Method of calculating the similarity matrix in a
parallel manner.
FIGURE 9: Method of calculating the eigenvalues of the graph
Laplacians.
$$T = \operatorname{tridiag}(B, A, B), \qquad B = \{b_1, \dots, b_n\}, \quad A = \{a_1, \dots, a_n\} \tag{15}$$
Examining the Lanczos iteration shows that most of the computation time is spent on the matrix-vector product $L \times v_j$. We therefore split $L$ into its $n$ rows, multiply each row by $v_j$ in parallel, and merge the results to obtain $L \times v_j$. An overview of this process is shown in Fig. 10. The parallel calculation increases memory cost, but the internal memory allocation mechanism keeps this overhead at a tolerable level.
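Below is a minimal NumPy sketch of the Lanczos iteration with a row-partitioned matrix-vector product. Each row block of $L$ is multiplied by $v_j$ in a separate thread, standing in for the distributed tasks in Fig. 10. The resulting tridiagonal matrix satisfies $V^{\top} L V = T$ as in (14) and (15).

Three parts of the sketch are assumptions added for numerical robustness and illustration; the text does not describe them:

- full reorthogonalization;
- the random starting vector;
- the thread-based partitioning.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def row_parallel_matvec(L, v, n_chunks=4):
    """Split L into row blocks, multiply each block by v independently, then merge."""
    blocks = np.array_split(np.arange(L.shape[0]), n_chunks)
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        parts = list(pool.map(lambda rows: L[rows] @ v, blocks))
    return np.concatenate(parts)

def lanczos(L, m, seed=0):
    """Return T (m x m tridiagonal) and V (n x m) with V^T L V = T."""
    n = L.shape[0]
    rng = np.random.default_rng(seed)
    V = np.zeros((n, m))
    a = np.zeros(m)          # diagonal of T (A in (15))
    b = np.zeros(m - 1)      # off-diagonal of T (B in (15))
    v0 = rng.standard_normal(n)   # assumption: random starting vector
    V[:, 0] = v0 / np.linalg.norm(v0)
    k = m                    # number of Lanczos vectors actually built
    for j in range(m):
        w = row_parallel_matvec(L, V[:, j])   # the expensive step: L x v_j
        a[j] = V[:, j] @ w
        if j == m - 1:
            break
        # Assumption: reorthogonalise against all previous vectors (twice) for numerical stability
        for _ in range(2):
            w = w - V[:, : j + 1] @ (V[:, : j + 1].T @ w)
        beta = np.linalg.norm(w)
        if beta < 1e-12:     # invariant subspace found: stop early
            k = j + 1
            break
        b[j] = beta
        V[:, j + 1] = w / beta
    T = np.diag(a[:k]) + np.diag(b[: k - 1], 1) + np.diag(b[: k - 1], -1)
    return T, V[:, :k]
```

With $m$ much smaller than $n$, the extreme eigenvalues of $T$ approximate those of $L$. This is why only the matrix-vector product needs to be distributed.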
Step 3: Because K-means is an iterative process, we split the data into several smaller subsets.
We first choose random center points for the whole data set and send these center points to each data subset. Each subset then calculates the distances between its data points and the current center points. Next, the
FIGURE 10: Parallel calculation of the eigenvalues of the
graph Laplacians.
results from all of the subsets are sent to a task that integrates them to obtain new center points for the whole data set. This process repeats until the center points converge. Compared with the traditional K-means process, parallel K-means converts a global calculation into several regional calculations on smaller subsets, which reduces the time cost. The parallel K-means calculation process is shown in Fig. 11.
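The map/reduce structure of Step 3 can be sketched as follows:

- Each subset assigns its points to the nearest current center and returns per-center sums and counts.
- A single merge step turns these into new centers for the whole data set.

Threads stand in for the distributed tasks. The convergence tolerance `tol` and the function names are assumptions for illustration.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def subset_statistics(subset, centers):
    """Map step on one subset: nearest-center assignment, then per-center sums and counts."""
    dist = np.linalg.norm(subset[:, None, :] - centers[None, :, :], axis=-1)
    labels = np.argmin(dist, axis=1)
    sums = np.zeros_like(centers)
    counts = np.zeros(len(centers))
    np.add.at(sums, labels, subset)
    np.add.at(counts, labels, 1.0)
    return sums, counts

def parallel_kmeans(X, k, n_subsets=4, max_iter=100, tol=1e-8, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)  # random initial centers
    subsets = np.array_split(X, n_subsets)
    with ThreadPoolExecutor(max_workers=n_subsets) as pool:
        for _ in range(max_iter):
            results = list(pool.map(subset_statistics, subsets, [centers] * n_subsets))
            # Merge step: combine the partial statistics from all subsets
            sums = sum(r[0] for r in results)
            counts = sum(r[1] for r in results)
            new_centers = centers.copy()
            nonempty = counts > 0
            new_centers[nonempty] = sums[nonempty] / counts[nonempty][:, None]
            shift = np.linalg.norm(new_centers - centers)
            centers = new_centers
            if shift < tol:   # assumed convergence criterion
                break
    return centers
```

Only the small per-center sums and counts travel to the merge step, not the data points themselves, which is what makes the split worthwhile.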
FIGURE 11: Method of calculating K-means in a parallel
manner.
FIGURE 12: Driving data clustering results.
3) Feature extraction results
A total of 8,984 natural driving data samples (i.e., completed trips) were selected from the data collected as described in Subsection A above. Using the parallel spectral clustering algorithm described above, the samples were clustered into three groups, each containing drivers with similar driving styles or behavior, as shown in Fig. 12. The X and Y axes of Fig. 12 represent velocity and positive acceleration, respectively.

- The points in the blue cluster represent drivers who drove at low velocity with low positive acceleration.
- The points in the yellow cluster represent drivers who preferred to drive at low velocity but used high rates of acceleration.
- The points in the red cluster represent drivers who preferred to drive at high velocity and whose acceleration ranged from high to low.

Fig. 13 breaks down the clustering results statistically for our six selected fuel consumption-related parameters. In Fig. 14, the data points of each cluster are plotted on 2-D and 3-D graphs according to their fuel consumption and their serial number within the data set. Average fuel consumption was 3.68 L/100 km for drivers in the blue cluster, 5.14 L/100 km for the yellow cluster and 7.44 L/100 km for the red cluster.
Several phenomena illustrated in Fig. 13 are worth noting. First, fuel consumption varies within each cluster, yet it increases steadily from the blue to the yellow to the red cluster. In other words, there is a surprising amount of variation within each group, but this variation is constrained by a clear trend. Second, some outlier points exist, representing drivers whose fuel consumption was higher than that of some drivers in the next cluster. A numerical analysis of these outlier points is shown in Table 2. Fig. 14 clearly shows that the height of each cluster, which represents fuel consumption, differs. It also shows that the three clusters overlap, as can be seen in the areas of the 3-D graph containing blended colors. These overlapping areas correspond to the outlier points. Because the
FIGURE 13: Driving data clustering results for all of the selected parameters (Blue, Yellow and Red refer to the data clusters
shown in Fig. 12).
FIGURE 14: Fuel consumption distribution of the three-cluster group.
TABLE 2: Numerical analysis of outlier points

| Group | # of outlier points | Total # of points | Proportion of outlier points |
|---|---|---|---|
| Low fuel consumption | 276 | 3000 | 9.20% |
| Medium fuel consumption | 345 | 3001 | 11.49% |
spectral clustering process is based on a graph partitioning algorithm, cluster assignments of points near the periphery of each cluster tend to be random; that is, points on the boundaries may join either neighboring cluster. Additionally, the six chosen parameters represent only the major factors affecting fuel consumption, not all of the factors related to vehicle operation. As a result, some data points with high fuel consumption may share other attributes with data points in the lower fuel consumption clusters. Moreover, long-term fuel consumption is deduced from instantaneous fuel consumption, as shown in Table 1. The calculated fuel consumption values could therefore have an error of 0.8%-5.9%, which could also affect the final clustering results. Overall, the total proportion of outlier data points (the sum of the two group proportions in Table 2) is about 20.69%.
From the above results, we conclude that drivers who operate their vehicles with relatively low fuel consumption change their driving speed moderately and drive at a relatively low average speed. The proposed parallel spectral clustering algorithm was able to accurately cluster the drivers according to their fuel consumption using vehicle operation data. Its approximate clustering accuracy was 79.31% (i.e., 100% minus the 20.69% outlier proportion).
In order to verify the performance of the clustering method
used in this study, we compared our clustering results with
those of the kernel fuzzy C-means (KFCM) [30] and K-means [48] clustering methods. The performance of the three clustering methods is compared in Table 3.
This comparison shows that the proportion of outlier points with KFCM is about 4 percentage points higher than with the proposed parallel spectral clustering method (24.53% vs. 20.69%). With the K-means method, the data points are less tightly clustered than with the other two methods, and the proportion of outlier points (28.71%) is the highest of the three. Therefore, the proposed parallel spectral clustering method achieved the best clustering performance of the three methods.
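The outlier proportions and the approximate clustering accuracy quoted above follow from the counts in Table 3:

- Each group proportion is the number of outliers divided by the group size.
- The total is the sum of the two group proportions.
- The accuracy figure is 100% minus that total.

The short sketch below recomputes these values. The recomputed values agree with Table 3 to within about 0.01 percentage points; for example, 345/3001 is 11.496%, reported as 11.49%.

```python
# (outliers, group size) for the low and medium fuel-consumption groups, taken from Table 3
TABLE_3 = {
    "Spectral clustering": [(276, 3000), (345, 3001)],
    "KFCM": [(417, 3117), (334, 2995)],
    "K-means": [(535, 2889), (327, 3210)],
}

def outlier_summary(groups):
    """Per-group outlier proportions (%), their sum, and 100% minus that sum."""
    proportions = [100.0 * outliers / total for outliers, total in groups]
    total_proportion = sum(proportions)      # the "total proportion" column of Table 3
    return proportions, total_proportion, 100.0 - total_proportion

for method, groups in TABLE_3.items():
    props, total, accuracy = outlier_summary(groups)
    print(f"{method}: groups={[round(p, 2) for p in props]}, "
          f"total={total:.2f}%, 100%-total={accuracy:.2f}%")
```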
We then compared the calculation efficiency of the proposed parallel calculation structure with that of conventional spectral clustering, using several different sample sizes. The results are shown in Fig. 15. With more than 10,000 data points, conventional spectral clustering in MATLAB takes almost 18 times as long as the parallel spectral clustering method. Furthermore, as the amount of data increases, the time cost of conventional spectral clustering increases sharply.
FIGURE 15: Comparison of calculation efficiency of classical
and parallel spectral clustering methods.
In this section, we described the clustering method used to obtain the macroscopic relationship between driving behavior and fuel consumption. In the next section, we propose an LSTM-based method to analyze this relationship at a finer, microscopic level.
III. PREDICTION OF SHORT-TERM FUEL CONSUMPTION USING LSTM
The clustering-based method proposed in Section II above can only provide a relatively long-term (25 to 35 minutes) assessment of the impact of a driver's behavior on fuel consumption. For relatively short-term prediction (30 seconds to 5 minutes), the clustering-based method does not work well for classifying driving behavior according to fuel efficiency. Moreover, our clustering method is in fact a kind of classifier, so it has no predictive ability.

Therefore, in this section we propose using a time series learning method (an LSTM network) to model the relationship between driving behavior and fuel consumption. This allows us to predict the short-term fuel consumption state of a driver's behavior. A driving behavior pattern represents the driver's interaction with a dynamic driving environment, and fuel consumption can be treated as the cost of this interaction. We therefore add dynamic driving environment information to our learning data.

We first explain how we coded driving environment factors into a digital form. The environmental feature data and the behavior data are then integrated into time-series data using a sliding window, with the fuel consumption state as the label for each time series. The LSTM-based model is trained on these time-series data. The model's classification performance and prediction accuracy are discussed at the end of this section.
A. TIME-SERIES DATA CONSTRUCTION
1) Coding of environmental factors
As explained in our previous study [49], we divide the environmental factors into two categories:

- dynamic environmental features (other vehicles, brake lights of leading vehicles, pedestrians, etc.);
- static environmental features (features that remain unchanged for relatively long periods of time, including road structures such as intersections and curves).

The driving environment factors used to train our model are shown in Table 4. Some of the dynamic features are captured by a camera mounted on the vehicle. As shown in Figs. 2 and 16, two types of roads were used in this study. In Fig. 16:

- the gray car is the experimental vehicle;
- the red vehicle is the leading vehicle (in the same lane or in the right lane);
- the blue vehicle is a parked vehicle;
- the green vehicle is the first oncoming vehicle in the opposite lane;
- the yellow vehicle is the second oncoming vehicle in the opposite lane.

In ordinary-road scenes (one lane in each direction), motorcycles and pedestrians are also considered environmental factors that can affect the driver's behavior. Modern object detection technology makes it straightforward to extract these traffic environment factors. In this study, we used YOLOv3 [50], a deep learning-based, real-time object detection method, to obtain the relative positions of these traffic factors. Using this position information, we can code the traffic factors into a digital form.
Examples of the raw output of the YOLO network are shown in the two images on the left of Fig. 17. YOLO also detects roadside environmental factors that do not affect driving behavior. Because the camera position is fixed, a lane detection program can be used to determine the lane position. Using the lane boundary indicator (blue dotted line
TABLE 3: Comparison of different clustering methods

| Clustering algorithm | Fuel consumption group | # of outlier points | Total # of points | Outlier point proportion | Total proportion of outlier points |
|---|---|---|---|---|---|
| Spectral clustering | Low | 276 | 3000 | 9.20% | 20.69% |
| | Medium | 345 | 3001 | 11.49% | |
| KFCM | Low | 417 | 3117 | 13.38% | 24.53% |
| | Medium | 334 | 2995 | 11.15% | |
| K-means | Low | 535 | 2889 | 18.52% | 28.71% |
| | Medium | 327 | 3210 | 10.19% | |
TABLE 4: Driving environment factors considered in the training data

| Feature type | Category | Factor |
|---|---|---|
| Dynamic features | On-road traffic factors | Leading vehicle's brake lights |
| | | Position of the vehicle in the right lane |
| | | Position of the vehicle in the left lane |
| | | Positions of parked vehicles |
| | | Position of merging vehicle |
| | | Positions of pedestrians & bicycles |
| Static features | Road structure | Curves |
| | | Uncontrolled intersections |
| | | Controlled intersections |
shown in the upper left image of Fig. 17), we can remove detected environmental factors that are not located within the road lane. The other source of noise in YOLO's output is multiple bounding boxes for the same object. We identify these redundant boxes by comparing the center points of the boxes, and then remove the box with the lower confidence score.
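The two cleaning steps can be sketched in plain Python. The text does not specify the representations or thresholds, so the following are all assumptions:

- Detections are dictionaries with a `box` given as `(x1, y1, x2, y2)` in pixels and a confidence score `conf`.
- The lane is simplified to a horizontal pixel band `[lane_x_min, lane_x_max]`.
- Two boxes are treated as duplicates when their centers lie within `center_tol` pixels.

The lane detection program itself is not reproduced.

```python
import math

def box_center(det):
    x1, y1, x2, y2 = det["box"]
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)

def in_lane(det, lane_x_min, lane_x_max):
    """Assumed simplification: keep a detection if its center lies inside the lane band."""
    cx, _ = box_center(det)
    return lane_x_min <= cx <= lane_x_max

def remove_duplicates(dets, center_tol=20.0):
    """Among boxes with nearby centers, keep only the one with the highest confidence."""
    kept = []
    for det in sorted(dets, key=lambda d: d["conf"], reverse=True):
        cx, cy = box_center(det)
        if all(math.hypot(cx - kx, cy - ky) > center_tol for kx, ky in map(box_center, kept)):
            kept.append(det)
    return kept

def clean_detections(dets, lane_x_min, lane_x_max, center_tol=20.0):
    on_road = [d for d in dets if in_lane(d, lane_x_min, lane_x_max)]
    return remove_duplicates(on_road, center_tol)
```

Sorting by confidence first means that whenever two boxes are judged to be duplicates, the lower-confidence one is the one discarded.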
After removing the irrelevant roadside detections and the redundant bounding boxes, we classify the environmental factors into the feature categories listed in Table 4 according to their positions in the camera image, as shown in Fig. 18. In our previous study [49], we found that providing the positions of the detected environmental factors helps the LSTM learn driving behavior more effectively. So, in this study, we use the same method to convert the continuous positions of traffic objects into discrete locations using a mapping grid. As shown in Fig. 19, the positions of traffic factors, such as the vehicles in the photo, are labeled as
FIGURE 16: Road types and dynamic traffic factors consid-
ered in this study.
FIGURE 17: Correction of raw YOLO output.
FIGURE 18: The classified traffic factors label for the object
detected by YOLO.
FIGURE 19: Position zones for identifying the locations of
traffic factors [49].
belonging to an area or zone, in this case areas A2 and B1. The size of each object is encoded by the length of the yellow line under the object.
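Below is a hypothetical sketch of the grid mapping. The text does not specify the actual zone layout of Fig. 19, so the sketch makes these assumptions:

- a uniform grid, with lettered rows counted upward from the bottom of the image and numbered columns from left to right;
- the bottom center of each bounding box as the object's position;
- the box width as a stand-in for the length of the yellow line under the object.

```python
import string

def zone_label(box, img_w, img_h, n_rows=3, n_cols=4):
    """Map a box (x1, y1, x2, y2) to a hypothetical zone label such as 'A2' (uniform grid assumed)."""
    x1, _, x2, y2 = box
    cx, foot_y = (x1 + x2) / 2.0, y2   # bottom-center: where the object meets the road
    col = min(max(int(cx / img_w * n_cols), 0), n_cols - 1)
    row = min(max(int((img_h - foot_y) / img_h * n_rows), 0), n_rows - 1)  # row 'A' = bottom band
    return f"{string.ascii_uppercase[row]}{col + 1}"

def object_size(box):
    """Assumed stand-in for the yellow-line length: the bounding-box width in pixels."""
    x1, _, x2, _ = box
    return x2 - x1
```

Using the bottom edge of the box rather than its center ties the zone to the object's position on the road surface, which is less sensitive to object height.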
2) Fuel consumption feature labeling and time series data construction
In (16), $B_T$ represents the driving behavior data set from one trip along the fixed driving route. $S$ represents the size of the data (the number of behavior data points) collected while completing the route. $S$ is calculated using the method shown in Fig. 7 (compressing all of the data sets to the same size). The only difference in the compression process used in this section is that we divide the experimental road into 150 segments instead of 50, to obtain more detailed data features. $N$ in (16) is the number of driving behavior parameters strongly or moderately correlated with fuel consumption ($N = 6$; these are listed in Fig. 6).
$$B_T = \begin{bmatrix} b_{1,1} & \cdots & b_{1,S} \\ \vdots & \ddots & \vdots \\ b_{N,1} & \cdots & b_{N,S} \end{bmatrix}_{N \times S} \tag{16}$$
In (17), $E_T$ represents the environmental data set from one trip along the driving route. $S$ is the size (number of environmental data points) of the environmental data collected during one circuit of the driving route. $M$ is the number of environmental factors from the list in Table 4 ($M = 13$).
$$E_T = \begin{bmatrix} e_{1,1} & \cdots & e_{1,S} \\ \vdots & \ddots & \vdots \\ e_{M,1} & \cdots & e_{M,S} \end{bmatrix}_{M \times S} \tag{17}$$
When collecting the driving data, in addition to the camera frames, we simultaneously collect the driving behavior data associated with each frame, so that each set of behavior data corresponds to one camera frame. This allows us to integrate the driving behavior data set $B_T$ and the environmental data set $E_T$ into a single data set $X_T$:
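$$X_T = \begin{bmatrix} b_{1,1} & \cdots & b_{1,S} \\ \vdots & \ddots & \vdots \\ b_{N,1} & \cdots & b_{N,S} \\ e_{1,1} & \cdots & e_{1,S} \\ \vdots & \ddots & \vdots \\ e_{M,1} & \cdots & e_{M,S} \end{bmatrix}_{(M+N) \times S} \tag{18}$$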
The fuel consumption $F$ can then be calculated as follows:
$$F(X_T) = \{F_1, F_2, \dots, F_i, \dots, F_I\} \tag{19}$$
Function $f(x)$ in (20) and (21) is the hypothetical function that describes the nonlinear relationship between driving behavior, the driving environment and fuel consumption features. Deriving $f(x)$ analytically would be difficult, so we treat $f(x)$ as a ‘black box’ and apply our LSTM-based method to approximate its computations. Because the LSTM requires input in time-series format, the raw training data must first be converted into time-series data.

As shown in Fig. 20, we use a sliding window to construct each time series, and each data segment is labeled with its fuel consumption $F_i$. The window size is 50 data points and the sliding step is 15 data points, so in (21), step = 15 and $j = 50$. $F_i$ is mapped into the fuel consumption distribution of its data segment to obtain its ranking level. For example, in Fig. 20, $F_1$ belongs to the low fuel consumption level (marked with dotted points), so “time-series data 1” is labeled as “low fuel consumption”. The green, yellow and red labels represent the low, medium and high fuel consumption groups, respectively. The fuel consumption group is determined from other drivers'
FIGURE 20: Time series data composition using sliding win-
dow.
historical records.
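$$F_1 = f\left(\begin{bmatrix} b_{1,1} & \cdots & b_{1,j} \\ \vdots & \ddots & \vdots \\ b_{N,1} & \cdots & b_{N,j} \\ e_{1,1} & \cdots & e_{1,j} \\ \vdots & \ddots & \vdots \\ e_{M,1} & \cdots & e_{M,j} \end{bmatrix}\right) \tag{20}$$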
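$$F_i = f\left(\begin{bmatrix} b_{1,1+(i-1)\times \mathrm{step}} & \cdots & b_{1,(i-1)\times \mathrm{step}+j} \\ \vdots & \ddots & \vdots \\ b_{N,1+(i-1)\times \mathrm{step}} & \cdots & b_{N,(i-1)\times \mathrm{step}+j} \\ e_{1,1+(i-1)\times \mathrm{step}} & \cdots & e_{1,(i-1)\times \mathrm{step}+j} \\ \vdots & \ddots & \vdots \\ e_{M,1+(i-1)\times \mathrm{step}} & \cdots & e_{M,(i-1)\times \mathrm{step}+j} \end{bmatrix}\right) \tag{21}$$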
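The windowing in (20) and (21) can be sketched with NumPy slicing. Window $i$ (counting from 1) covers columns $1 + (i-1)\times \mathrm{step}$ through $(i-1)\times \mathrm{step} + j$ of $X_T$. In 0-based slicing this is `X_T[:, (i-1)*step : (i-1)*step + j]`.

The function name is hypothetical. Dropping windows that would run past the end of the trip is also an assumption.

```python
import numpy as np

def sliding_windows(X_T, j=50, step=15):
    """Split an (M+N) x S matrix into (M+N) x j windows starting every `step` columns."""
    S = X_T.shape[1]
    if S < j:
        return np.empty((0, X_T.shape[0], j), dtype=X_T.dtype)
    n_windows = (S - j) // step + 1   # assumption: incomplete trailing windows are dropped
    return np.stack([X_T[:, i * step : i * step + j] for i in range(n_windows)])
```

For example, with 150 columns, $j = 50$ and step = 15, this gives $\lfloor (150 - 50)/15 \rfloor + 1 = 7$ overlapping windows.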
The boundaries of the fuel consumption levels are defined by the trisection lines, and the equation for calculating the boundaries is shown in (22). The results (0,0,1), (0,1,0) and (1,0,0) represent low, moderate and high fuel consumption, respectively. $F_i$ is the current data segment's fuel consumption, and $F_{\mathrm{avg},i}$ is the expected value of the remaining driving process data in that data segment.
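$$I_i = \begin{cases} (0,0,1), & F_i < 0.6\,F_{\mathrm{avg},i} \\ (0,1,0), & 0.6\,F_{\mathrm{avg},i} < F_i < 1.2\,F_{\mathrm{avg},i} \\ (1,0,0), & F_i > 1.2\,F_{\mathrm{avg},i} \end{cases} \tag{22}$$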
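Rule (22) maps directly to a small function. The encodings follow (22): (0,0,1) is low, (0,1,0) is moderate and (1,0,0) is high.

Equation (22) leaves the boundary cases $F_i = 0.6\,F_{\mathrm{avg},i}$ and $F_i = 1.2\,F_{\mathrm{avg},i}$ undefined. The sketch assigns them to the moderate level, which is an assumption.

```python
LOW, MODERATE, HIGH = (0, 0, 1), (0, 1, 0), (1, 0, 0)   # encodings used in (22)

def label_segment(F_i, F_avg_i, low_ratio=0.6, high_ratio=1.2):
    """Fuel consumption level of one data segment according to (22)."""
    if F_i < low_ratio * F_avg_i:
        return LOW
    if F_i > high_ratio * F_avg_i:
        return HIGH
    return MODERATE   # also covers the boundary values, which (22) leaves unspecified (assumption)
```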
After completing the labeling process, we obtain training data with fuel consumption feature labels. The labels are obtained not only by calculating the detailed fuel consumption, but also by comparing the fuel consumption
FIGURE 21: The unfolded structure of the LSTM network and
the inner composition of an LSTM node.
distribution with all of the other drivers’ fuel consumption
distributions.
B. FUEL CONSUMPTION PREDICTION MODELING BASED ON LSTM
1) LSTM components and their mathematical expressions
LSTM networks are a state-of-the-art technique for information processing and behavior modeling. They are widely used in machine translation [51], speech recognition [52], driving behavior analysis [53] and other applications. An LSTM is a kind of Recurrent Neural Network (RNN) [33,54]. Standard RNNs usually suffer from the vanishing gradient problem. LSTMs, by contrast, include gating mechanisms, notably a ‘forget gate’, that help prevent backpropagated errors from vanishing or exploding. The structure of the LSTM used in this study is shown in Fig. 21.
An LSTM is a recurrent network that produces a state as its output, and the current state of the network is passed on to the next step for further calculation. As shown in Fig. 21, each node of the LSTM network has three main components: a ‘forget gate’, an ‘input gate’ and an ‘output gate’. The ‘forget gate’ determines how strongly the information from the previous step affects the current calculation. This is the key feature of the LSTM, and it helps the network avoid vanishing or exploding gradients. The function of the ‘forget gate’ can be expressed mathematically as follows:
$$F_{\mathrm{forget}} = \sigma\left(W_f \cdot [y_{i-1}, x_i] + b_f\right) \tag{23}$$
$$\sigma(x) = \frac{1}{1 + e^{-x}} \tag{24}$$
Because $\sigma(x)$ is a sigmoid function, $F_{\mathrm{forget}}$ is always greater than 0 and smaller
than 1. Furthermore, $F_{\mathrm{forget}}$ is multiplied by the previous network state $S_{i-1}$ to form part of the new state $S_i$. $F_{\mathrm{forget}}$ therefore determines how much $S_{i-1}$ affects the current network state $S_i$.
The second part of the LSTM is the ‘input gate’, which decides what new information should be added to the current network state. First, we determine which part of the previous state should be updated, using the following equation:
$$F_{\mathrm{in}} = \sigma\left(W_i \cdot [y_{i-1}, x_i] + b_i\right) \tag{25}$$
The candidate update values are then determined as follows:
$$S_{\mathrm{new}} = \tanh\left(W_{\mathrm{new}} \cdot [y_{i-1}, x_i] + b_{\mathrm{new}}\right) \tag{26}$$
The current network state $S_i$ is obtained from the updated state value and the retained part of the previous network state:
$$S_i = F_{\mathrm{in}} \times S_{\mathrm{new}} + F_{\mathrm{forget}} \times S_{i-1} \tag{27}$$
The third part of the LSTM is the ‘output gate’, which uses the current network state $S_i$ to generate the final output. The state $S_i$ is passed through $\tanh$ and multiplied by $F_{\mathrm{out}} = \sigma(W_o \cdot [y_{i-1}, x_i] + b_o)$, which ranges from 0 to 1, to determine which data are output:
$$y_i = \tanh(S_i) \times \sigma\left(W_o \cdot [y_{i-1}, x_i] + b_o\right) \tag{28}$$
In this paper, the input $x_i$ is taken from $X_T$ in (18), one column (time step) at a time. The length of each input sequence, defined by the sliding window in Fig. 20, is 50.
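Equations (23) to (28) can be traced in a single NumPy time step. The sketch makes the following choices:

- Each column of a window from (18) is treated as one input $x_i$.
- The × in (27) and (28) is computed as an element-wise product.
- $b_f$ is initialized to 1, to mirror the initial forget bias in Table 5.

It is a forward-pass illustration only, not the TensorFlow `LSTMCell` used in the paper. The weight initialization and the names `lstm_step`, `init_params` and `run_window` are assumptions. With $M + N = 19$ features, for instance, a window has shape 19 × 50.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))                    # (24)

def lstm_step(x_i, y_prev, S_prev, p):
    z = np.concatenate([y_prev, x_i])                   # [y_{i-1}, x_i]
    F_forget = sigmoid(p["W_f"] @ z + p["b_f"])         # (23)
    F_in = sigmoid(p["W_i"] @ z + p["b_i"])             # (25)
    S_new = np.tanh(p["W_new"] @ z + p["b_new"])        # (26)
    S_i = F_in * S_new + F_forget * S_prev              # (27), element-wise products
    F_out = sigmoid(p["W_o"] @ z + p["b_o"])            # output gate, values in (0, 1)
    y_i = np.tanh(S_i) * F_out                          # (28)
    return y_i, S_i

def init_params(input_dim, hidden, seed=0, forget_bias=1.0):
    """Assumed small random weights; b_f = 1 mimics the initial forget bias in Table 5."""
    rng = np.random.default_rng(seed)
    shape = (hidden, hidden + input_dim)
    p = {name: rng.normal(0.0, 0.1, shape) for name in ("W_f", "W_i", "W_new", "W_o")}
    p.update({name: np.zeros(hidden) for name in ("b_i", "b_new", "b_o")})
    p["b_f"] = np.full(hidden, forget_bias)
    return p

def run_window(X_window, p, hidden):
    """Feed an (M+N) x 50 window column by column; return the last output y."""
    y, S = np.zeros(hidden), np.zeros(hidden)
    for i in range(X_window.shape[1]):
        y, S = lstm_step(X_window[:, i], y, S, p)
    return y
```

Because $F_{\mathrm{forget}}$ multiplies $S_{i-1}$ directly in (27), a forget bias of 1 pushes the gate towards 0.73 at initialization, so early training retains most of the previous state.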
2) LSTM network training process
First, we pre-process the training data. All of the time-series data are normalized to the range 0 to 1. We code each data set's label in one-hot form, following (22): low fuel consumption is $(0,0,1)^T$, medium fuel consumption is $(0,1,0)^T$ and high fuel consumption is $(1,0,0)^T$.
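The text does not state whether normalization is applied per feature or globally. The sketch below assumes per-feature min-max scaling, with the following further assumptions:

- The minimum and maximum are taken from the training windows only.
- Constant features are guarded against division by zero.
- Unseen values are clipped into [0, 1].

```python
import numpy as np

def fit_minmax(train_windows):
    """train_windows: (num_windows, num_features, window_len). Per-feature min/max (assumption)."""
    mins = train_windows.min(axis=(0, 2), keepdims=True)
    maxs = train_windows.max(axis=(0, 2), keepdims=True)
    return mins, maxs

def apply_minmax(windows, mins, maxs):
    span = np.where(maxs > mins, maxs - mins, 1.0)     # guard against constant features
    return np.clip((windows - mins) / span, 0.0, 1.0)  # clip values outside the training range
```

Fitting the scaling on the training windows only keeps validation and test data from leaking into the preprocessing.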
To build the LSTM network, we used TensorFlow [55],
which is an end-to-end open source software platform for
machine learning. The LSTM block is based on the LSTM
node unit “tf.nn.rnn_cell.LSTMCell” [56] which is provided
by the TensorFlow API. The hyper-parameters and the train-
ing strategy of the LSTM network are shown in Table 5.
The output of the LSTM is fed into a Softmax classifier, which calculates the probability of belonging to each class. The Softmax function maps the outputs of the LSTM to values between 0 and 1 that sum to 1. The mathematical expression of the Softmax function is as follows:
$$C_i = \frac{e^{y_i}}{\sum_j e^{y_j}} \tag{29}$$
where $C_i$ is the output confidence rate, i.e., the probability that the data set belongs to a certain fuel consumption group.
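Equation (29) and the loss listed in Table 5 can be sketched in NumPy. Subtracting the maximum logit before exponentiating leaves (29) unchanged but avoids overflow.

A sparse softmax cross-entropy, as in TensorFlow's `tf.nn.sparse_softmax_cross_entropy_with_logits`, takes integer class indices rather than one-hot vectors. A one-hot label would therefore first be converted with `argmax`. The NumPy functions here are a hypothetical stand-in, not the TensorFlow implementation.

```python
import numpy as np

def softmax(y):
    """Equation (29), applied along the last axis."""
    z = y - np.max(y, axis=-1, keepdims=True)   # shift by the max: same result, no overflow
    e = np.exp(z)
    return e / np.sum(e, axis=-1, keepdims=True)

def sparse_softmax_cross_entropy(logits, class_idx):
    """Mean of -log C_c over a batch, where c is each sample's integer class index."""
    C = softmax(logits)
    return float(np.mean(-np.log(C[np.arange(len(class_idx)), class_idx])))

def one_hot_to_index(one_hot):
    """Convert one-hot labels such as (0, 0, 1) into class indices for the sparse loss."""
    return np.argmax(one_hot, axis=-1)
```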
C. RESULTS OF FUEL CONSUMPTION PREDICTION USING LSTM
TABLE 5: Hyper-parameters and the training strategy of the
LSTM network.
| Category | Parameter name | Parameter value |
|---|---|---|
| Hyper-parameters | Number of units in the LSTM's hidden layers | 125, 150, 200 |
| | Number of hidden layers in the LSTM | 2 |
| | Batch size | 64 |
| | Initial forget bias | 1 |
| | Initial learning rate | 0.005 |
| Training strategy | Optimizer | Adam Optimizer [57] |
| | Loss function | Sparse Softmax cross entropy |
1) Training data
The entire data set is randomly divided into six groups, each containing 5,000 time-series data samples. Four groups are used for training, one for validation and one for testing. Because the training process uses cross-validation, the roles rotate so that each group serves as a training group, a validation group and a testing group.
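One way to realize the split and the rotating roles is sketched below. The text does not say how roles are assigned in each round, so the schedule is a hypothetical choice:

- group $k$ is the test set;
- group $k+1$ modulo 6 is the validation set;
- the other four groups are used for training.

Under this schedule every group is used for testing once and for validation once.

```python
import numpy as np

def split_into_groups(n_samples, n_groups=6, seed=0):
    """Randomly partition sample indices into n_groups (near-)equal groups."""
    perm = np.random.default_rng(seed).permutation(n_samples)
    return np.array_split(perm, n_groups)

def rotation_schedule(n_groups=6):
    """Hypothetical rotation: returns (train_groups, val_group, test_group) for each round."""
    schedule = []
    for k in range(n_groups):
        test, val = k, (k + 1) % n_groups
        train = [g for g in range(n_groups) if g not in (test, val)]
        schedule.append((train, val, test))
    return schedule
```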
2) Comparison of LSTM prediction results with those of other machine learning methods
We compared the performance of the proposed LSTM-based method with that of two other machine learning methods: a kernel-based Support Vector Machine (SVM) [58] and a multi-layer neural network. In addition, LSTM networks with different numbers of units were also evaluated.
SVM is a powerful machine learning method that maps the objects to be classified into a high-dimensional feature space. It is widely used for semantic parsing [59], image segmentation [60], facial recognition [61] and other applications. MATLAB's Statistics and Machine Learning Toolbox [62] was used to construct our SVM-based classifier.
The multi-layer neural network we used had two hidden
layers, and each layer contained 150 rectified linear units
(ReLU), as shown in Fig. 22. The output of the network is
passed into a Softmax layer, and the probabilities of the data
belonging to each of the three fuel consumption categories
are calculated.
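For reference, the forward pass of this baseline network can be sketched in NumPy: two hidden layers of 150 ReLU units followed by a three-way Softmax. Flattening each (M+N) × 50 window into a single input vector and the He-style random initialization are assumptions. The text does not state how the time-series windows were fed to this network.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def softmax(z):
    z = z - np.max(z, axis=-1, keepdims=True)   # numerically stable Softmax
    e = np.exp(z)
    return e / np.sum(e, axis=-1, keepdims=True)

def init_mlp(input_dim, hidden=150, n_classes=3, seed=0):
    """Two hidden ReLU layers of `hidden` units plus a 3-way output layer (He-style init assumed)."""
    rng = np.random.default_rng(seed)
    dims = [input_dim, hidden, hidden, n_classes]
    return [(rng.normal(0.0, np.sqrt(2.0 / d_in), (d_in, d_out)), np.zeros(d_out))
            for d_in, d_out in zip(dims[:-1], dims[1:])]

def mlp_forward(x, params):
    """x: (batch, input_dim) flattened windows (assumption); returns class probabilities."""
    h = x
    for W, b in params[:-1]:
        h = relu(h @ W + b)
    W, b = params[-1]
    return softmax(h @ W + b)
```

Unlike the LSTM, this network sees the whole window at once and has no recurrent state, which is the main architectural difference the comparison probes.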
Two criteria were considered in our evaluation: the classifier accuracy rate and the area under the receiver operating characteristic (ROC) curve (AUC) [63]. The classifier accuracy rate is a direct index that can be used to judge
FIGURE 22: Structure of the neural network with two hidden
layers, each of which contains 150 ReLU nodes.
FIGURE 23: AUC values for each modeling method and each
test group.
the performance of the prediction model; however, it does not reflect how well the model separates the classes across different decision thresholds. AUC is a probability value: the probability that the classifier ranks a randomly chosen positive sample above a randomly chosen negative one. It is the general standard for evaluating classifier performance. Fig. 23 shows each classifier's performance for each of the six testing groups. As Fig. 23 shows, the LSTM with 150 units achieved the best overall performance.
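The probabilistic reading of AUC can be computed directly by comparing all positive/negative score pairs, with ties counting one half. For the three-class problem, the sketch averages one-vs-rest AUCs over the classes. This averaging scheme is an assumption, since the text does not state how the per-class AUCs in Fig. 23 were combined. The pairwise computation costs O(P × Q) in the numbers of positives and negatives, which is fine for illustration but not for large test sets.

```python
import numpy as np

def binary_auc(scores, labels):
    """AUC = P(score of a random positive > score of a random negative); ties count 1/2."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    diff = pos[:, None] - neg[None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (len(pos) * len(neg))

def macro_ovr_auc(probs, y):
    """Assumed multi-class summary: mean one-vs-rest AUC over the classes."""
    return np.mean([binary_auc(probs[:, c], y == c) for c in range(probs.shape[1])])
```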
Next, we experimentally evaluated the short-term fuel con-
sumption estimation performance of our proposed LSTM-
based prediction method. Three representative drivers who
belonged to different fuel consumption groups were selected
to test the performance of our deep learning-based predic-
tor. The fuel consumption data distributions for these three
drivers are shown in Fig. 24.
The LSTM-based classifier’s prediction accuracies for
these three drivers are illustrated in Fig. 25.
The red lines represent the predicted fuel consumption
category based on the driver’s fuel consumption features
over time, while the light blue bars represent the actual
distribution of the fuel consumption features corresponding
to the driver’s behavior. The average prediction accuracy for
the three selected drivers was 81%.
FIGURE 24: Fuel consumption data distributions of three
representative drivers.
IV. DISCUSSION AND CONCLUSION
In this paper, we first used the unsupervised machine
learning method of spectral clustering to classify drivers into
three groups using six driving behavior-based fuel consump-
tion features. We then analyzed the macro-behavior of each
group, focusing on power demand (speed and acceleration)
and control stability (variation in speed and acceleration).
Our results showed that the proposed spectral clustering-
based method could accurately identify drivers with different
fuel consumption profiles, and clearly modeled the relation-
ship between the real-world driving data and the correspond-
ing fuel consumption features.
In addition to estimating fuel consumption from vehicle operation data, we also performed a qualitative analysis of driving behavior, as shown in Fig. 13. Speed and acceleration reveal the amount of power demanded by a driver, while variance in speed and acceleration represents the range of dynamic control exercised by drivers [25, 26]. Our analysis showed that high fuel consumption drivers (those in the red cluster) tend to maintain a relatively steady, high demand for power, while their dynamic control of the vehicle is less stable. Their acceleration rates are higher and their pedal control behavior is less stable than those of drivers in the low fuel consumption cluster. Drivers in the medium (yellow) cluster showed the lowest speed distribution, but their gas and brake pedal operation characteristics were similar to those of the low-efficiency drivers in the red cluster. Drivers in the blue cluster had the lowest fuel consumption, since they tended to maintain a consistent speed and their dynamic control of the vehicle was the most stable of the three groups. We also compared the spectral clustering method
VOLUME 4, 2016 15
2169-3536 (c) 2019 IEEE. Translations and content mining are permitted for academic research only. Personal use is also permitted, but republication/redistribution requires IEEE permission. See
http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI
10.1109/ACCESS.2019.2920489, IEEE Access
Author et al.: Preparation of Papers for IEEE TRANSACTIONS and JOURNALS
FIGURE 25: Short-term fuel consumption prediction performance using LSTM-based classifier and fuel consumption features
for three representative drivers.
with other state-of-the-art clustering methods, namely k-means
and KFCM. As shown in Table 3, the spectral clustering method
achieved the best clustering performance of the three methods.
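The power-demand and control-stability indicators discussed above can be summarized from a speed trace roughly as follows. This is a hypothetical sketch. It assumes a uniformly sampled speed signal in km/h and derives acceleration by finite differences. The four summaries are illustrative rather than the exact six fuel consumption features used in this study.

```python
from statistics import mean, pvariance
from typing import Dict, Sequence


def behavior_features(speed_kmh: Sequence[float], dt_s: float) -> Dict[str, float]:
    """Summarize one driving record sampled every dt_s seconds (hypothetical features).

    Means describe power demand; variances describe control stability.
    """
    if len(speed_kmh) < 3 or dt_s <= 0:
        raise ValueError("need at least three samples and a positive sampling interval")
    speed = [v / 3.6 for v in speed_kmh]  # km/h -> m/s
    # Forward finite differences give acceleration in m/s^2.
    accel = [(v1 - v0) / dt_s for v0, v1 in zip(speed, speed[1:])]
    return {
        "mean_speed": mean(speed),
        # Signed acceleration averages toward zero over a trip, so use its magnitude.
        "mean_abs_accel": mean(abs(a) for a in accel),
        "speed_variance": pvariance(speed),
        "accel_variance": pvariance(accel),
    }


features = behavior_features([0.0, 36.0, 72.0, 72.0], dt_s=10.0)
```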
However, our proposed method has drawbacks: the spectral
clustering-based method requires relatively long-term data to
produce accurate classification results, so this unsupervised
method is not appropriate for real-time, short-term fuel
consumption feature prediction. Furthermore, the data-mining
results can only show the impact of a driver's behavior on
fuel consumption at the macro level.
Therefore, in the second stage of our study we used a
supervised LSTM-based machine learning method to link
short-term driving data with fuel consumption features. The
proposed LSTM-based model was able to accurately predict the
fuel consumption features associated with driving behavior,
achieving a maximum AUC of 0.836, which is considered good
performance for human behavior prediction [64]. As shown in
Fig. 23, the LSTM-based method achieved better classification
performance than the SVM- or NN-based methods. LSTM networks
with different numbers of hidden nodes were also evaluated in
this study, revealing that the LSTM with 150 hidden nodes
achieved the best average AUC, compared with LSTMs with 125 or
200 hidden nodes. Three representative drivers were then
selected for a more detailed evaluation of the model's
performance. As shown in Fig. 25, the short-term fuel
consumption performance of these drivers could be accurately
predicted using the proposed method. Although some prediction
errors did occur, the average overall prediction accuracy
exceeded 80%. The whole prediction process is end-to-end: the
model takes driving behavior and dynamic traffic condition
data as input, and after the raw data are reformatted and
processed by the model, the output is a prediction of the
fuel consumption group to which a particular driver belongs.
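AUC, the metric reported above, can be read as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one [63]. The following sketch computes it by direct pairwise comparison. Because the classifier here outputs one of three fuel consumption groups, it then averages one-vs-rest AUCs across classes. This macro one-vs-rest aggregation is an assumption, since this section does not state how the multi-class AUC was formed.

```python
from typing import Sequence


def binary_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """P(random positive outscores random negative); ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(probs: Sequence[Sequence[float]], classes: Sequence[int]) -> float:
    """One-vs-rest AUC for each class, averaged with equal weight (assumed scheme)."""
    n_classes = len(probs[0])
    aucs = []
    for c in range(n_classes):
        scores = [p[c] for p in probs]
        labels = [1 if y == c else 0 for y in classes]
        aucs.append(binary_auc(scores, labels))
    return sum(aucs) / n_classes


# Hypothetical class probabilities for four time windows and their true groups.
probs = [[0.7, 0.2, 0.1], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5], [0.25, 0.45, 0.3]]
truth = [0, 1, 2, 0]
score = macro_ovr_auc(probs, truth)
```

The pairwise loop is quadratic in the number of samples, which is acceptable for illustration. A rank-based (Mann-Whitney) formulation gives the same value in O(n log n).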
In conclusion, this paper makes three contributions. First,
we propose a clustering-based data-mining method that
analyzes driving behavior and its fuel consumption outcome at
the macro level. The method can serve as a group behavior
assessment mechanism that public transportation departments
or commercial transportation companies can use to evaluate
the distribution of energy costs. Second, we propose a
micro-level fuel consumption evaluation model that learns
from driving behavior. The model shows good prediction
ability and can be integrated into ADAS or eco-driving
coaching systems to evaluate the fuel-cost behavior of
individual drivers. The predicted state can help such systems
provide more reasonable and adaptive fuel-efficient driving
strategies or more detailed control guidance. Third, we
extend the application area of deep learning: to our
knowledge, this is the first time a deep learning method has
been used to learn the impact of driving behavior on fuel
consumption features.
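To make the micro-level model concrete, the following is a minimal NumPy sketch of a single-layer LSTM [33]. It reads a sequence of per-time-step feature vectors and outputs probabilities over three fuel consumption groups. The feature layout, the gate ordering, and the random weights are hypothetical. This is a from-scratch illustration rather than the TensorFlow implementation referenced in [56], and the weights would have to be learned (e.g., with Adam [57]) before the outputs mean anything.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def lstm_classify(x_seq: np.ndarray, W: np.ndarray, U: np.ndarray, b: np.ndarray,
                  W_out: np.ndarray, b_out: np.ndarray) -> np.ndarray:
    """Run one LSTM layer over x_seq (T x n_in) and return softmax class probabilities.

    Shapes: W (n_in, 4H), U (H, 4H), b (4H,), W_out (H, n_classes).
    Gate order in the stacked weights (an assumed convention): input, forget, cell, output.
    """
    H = U.shape[0]
    h = np.zeros(H)  # hidden state
    c = np.zeros(H)  # cell state
    for x_t in x_seq:
        z = x_t @ W + h @ U + b
        i = sigmoid(z[:H])             # input gate
        f = sigmoid(z[H:2 * H])        # forget gate
        g = np.tanh(z[2 * H:3 * H])    # candidate cell update
        o = sigmoid(z[3 * H:])         # output gate
        c = f * c + i * g
        h = o * np.tanh(c)
    logits = h @ W_out + b_out         # classify from the final hidden state
    e = np.exp(logits - logits.max())  # numerically stable softmax
    return e / e.sum()


# Hypothetical usage: 20 time steps of 4 features, 150 hidden units, 3 groups.
rng = np.random.default_rng(0)
n_in, H, n_classes, T = 4, 150, 3, 20
W = rng.normal(scale=0.1, size=(n_in, 4 * H))
U = rng.normal(scale=0.1, size=(H, 4 * H))
b = np.zeros(4 * H)
W_out = rng.normal(scale=0.1, size=(H, n_classes))
b_out = np.zeros(n_classes)
probs = lstm_classify(rng.normal(size=(T, n_in)), W, U, b, W_out, b_out)
```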
There are some limitations in our study and in our proposed
method. First, shortcomings of the collected data mainly
affect the deep learning-based method. Because the data were
collected on only two types of road, and not all traffic
environment factors were coded into the time-series data,
the LSTM can only learn limited features from the fixed
traffic conditions and environments it has encountered. When
facing different road types, for example a road with four
lanes, its prediction performance is expected to degrade.
Second, the prediction accuracy of the proposed LSTM-based
method was not extremely high. We suspect this is mainly
because the model input included a limited number of traffic
conditions, and because the form of this input information
was relatively basic. As a result, the LSTM could not
accurately predict fuel consumption in very complex or
unknown situations. In addition, our deep learning-based
model can only predict the fuel consumption level of the
driving process, so it is difficult for it to provide more
detailed fuel cost information. Moreover, compared with other
state-of-the-art behavior prediction methods, LSTMs and other
deep learning networks require large amounts of training data
and long training times. If new behavior factors that affect
fuel consumption need to be added to the network, the
original parameters must be revised and the training process
repeated, which limits the generality of the model. Third, we
only used one type of experimental vehicle, so further
research is needed to determine whether the proposed
LSTM-based model can be adapted to other types of vehicles.
Finally, drivers' personal characteristics, such as age, sex,
and driving experience, were not examined in this study.
In future work, we first aim to use larger-scale naturalistic
driving data to make our prediction model more robust. The
effects of other factors, such as group personality features
and vehicle type, on fuel consumption should then be studied
to make the fuel consumption prediction model more general.
ACKNOWLEDGMENT
This work was partially supported by the National Natural
Science Foundation of China (Grant No. 61300101) and the Key
Research Plan of Jiangsu Province (No. BE2017035). The
authors would like to thank the Vehicle Engineering
Development Division of Mitsubishi Motors and the Functional
Safety Department of UISEE Technologies Beijing Co., Ltd. for
their valuable research assistance.
REFERENCES
[1] C. Sun, Y. Luo, and J. Li, “Urban traffic infrastructure investment and air
pollution: Evidence from the 83 cities in China,” J. Clean. Prod., vol. 172,
pp. 488-496, 2018.
[2] H. Zhang, S. Wang, J. Hao, X. Wang, S. Wang, F. Chai, and M. Li, “Air
pollution and control action in Beijing,” Journal of Cleaner Production,
vol. 112, pp. 1519-1527, 2016.
[3] World Health Organization, “Health in the green economy. Health co-
benefits of climate change mitigation,” 2011.
[4] H. Li, Y. Yu, X. Qian, D. Fang, Q. Wang, and Y. Lu, “Mortality effects
assessment of ambient PM 2.5 pollution in the 74 leading cities of China,”
Sci. Total Environ., vol. 569-570, pp. 1545-1552, 2016.
[5] H. Liimatainen, “Utilization of fuel consumption data in an eco-driving
incentive system for heavy-duty vehicle drivers,” IEEE Trans. Intell.
Transp. Syst., vol. 12, no. 4, pp. 1087-1095, 2011.
[6] J. E. Meseguer, C. T. Calafate, J. C. Cano, and P. Manzoni, “Assessing
the impact of driving behavior on instantaneous fuel consumption,” in
2015 12th Annual IEEE Consumer Communications and Networking
Conference, CCNC 2015, 2015, pp. 443-448.
[7] C. D’Agostino, A. Saidi, G. Scouarnec, and L. Chen, “Rational truck
driving and its correlated driving features in extra-urban areas,” in IEEE
Intelligent Vehicles Symposium, Proceedings, 2014, pp. 1199-1204.
[8] K. Dietmayer, H. Winner, M. Maurer, K. Bengler, C. Stiller, and B.
Farber, “Three Decades of Driver Assistance Systems: Review and Future
Perspectives,” IEEE Intelligent Transportation Systems Magazine, vol. 6,
no. 4, pp. 6-22, 2014.
[9] J. N. Barkenbus, “Eco-driving: An overlooked climate change initiative,”
Energy Policy, vol. 38, no. 2, pp. 762-769, 2010.
[10] M. J. M. Sullman, L. Dorn, and P. Niemi, “Eco-driving training of
professional bus drivers - Does it work?,” Transp. Res. Part C Emerg.
Technol., 2015.
[11] C. H. Lee and C. H. Wu, “An Incremental Learning Technique for
Detecting Driving Behaviors Using Collected EV Big Data,” Proc. ASE
BigData Soc. 2015, 2015.
[12] J. Nègre and P. Delhomme, “Drivers’ self-perceptions about being an
eco-driver according to their concern for the environment, beliefs on eco-
driving, and driving behavior,” Transp. Res. Part A Policy Pract., 2017.
[13] J. Jiménez, “Understanding and quantifying motor vehicle emissions with
vehicle specific power and TILDAS remote sensing,” Massachusetts Inst.
Technol. Cambridge, 1999.
[14] M. Barth, F. An, T. Younglove, G. Scora, C. Levine, M. Ross, and T.
Wenzel, “NCHRP PROJECT 25-11: Development of a Comprehensive
Modal Emissions Model - Final Report,” 2000.
[15] N. Nikkila, M. Osses, J. Lents, N. Davis, and M. Barth, “Development
and Application of an International Vehicle Emissions Model,” in Trans-
portation Research Record: Journal of the Transportation Research Board,
2018, vol. 1939, no. 1, pp. 156-165.
[16] Z. Xu, T. Wei, S. Easa, X. Zhao, and X. Qu, “Modeling Relationship
between Truck Fuel Consumption and Driving Behavior Using Data from
Internet of Vehicles,” Comput. Civ. Infrastruct. Eng., vol. 33, no. 3, pp.
209-219, 2018.
[17] G. Xu, L. Liu, Y. Ou, and Z. Song, “Dynamic Modeling of Driver Control
Strategy of Lane-Change Behavior and Trajectory Planning for Collision
Prediction,” IEEE Trans. Intell. Transp. Syst., vol. 13, no. 3, pp. 1138-
1155, 2012.
[18] Z. Zheng, “Recent developments and research needs in modeling lane
changing,” Transp. Res. Part B Methodol., vol. 60, pp. 16-32, 2014.
[19] H. Xia, K. Boriboonsomsin, and M. Barth, “Dynamic eco-driving for sig-
nalized arterial corridors and its indirect network-wide energy/emissions
benefits,” J. Intell. Transp. Syst. Technol. Planning, Oper., vol. 17, no. 1,
pp. 31-41, 2013.
[20] X. Xiang, K. Zhou, W. Bin Zhang, W. Qin, and Q. Mao, “A Closed-
Loop Speed Advisory Model With Driver’s Behavior Adaptability for Eco-
Driving,” IEEE Trans. Intell. Transp. Syst., vol. 16, no. 6, pp. 3313-3324,
2015.
[21] J. E. Meseguer, C. K. Toh, C. T. Calafate, J. C. Cano, and P. Manzoni,
“Drivingstyles: A mobile platform for driving styles and fuel consumption
characterization,” J. Commun. Networks, 2017.
[22] E. Gilman, A. Keskinarkaus, S. Tamminen, S. Pirttikangas, J. Röning, and
J. Riekki, “Personalised assistance for fuel-efficient driving,” Transp. Res.
Part C Emerg. Technol., vol. 58, no. PD, pp. 681-705, 2015.
[23] R. Trigui, S. Javanmardi, E. N. Bourles, H. Tattegrain, E. Bideaux, and
J. F. Trégouët, “Driving Style Modelling for Eco-driving Applications,”
IFAC-PapersOnLine, vol. 50, no. 1, pp. 13866-13871, 2017.
[24] C. Lv, X. Hu, A. Sangiovanni-Vincentelli, Y. Li, C. M. Martinez, and
D. Cao, “Driving-Style-Based Codesign Optimization of an Automated
Electric Vehicle: A Cyber-Physical System Approach,” IEEE Trans. Ind.
Electron., vol. 66, no. 4, pp. 2965-2975, 2019.
[25] A. E. af Wåhlberg, “Long-term effects of training in economical driving:
Fuel consumption, accidents, driver acceleration behavior and technical
feedback,” Int. J. Ind. Ergon., vol. 37, no. 4, pp. 333-343, 2007.
[26] E. Ericsson, “Variability in urban driving patterns,” Transp. Res. Part D
Transp. Environ., vol. 5, no. 5, pp. 337-354, 2000.
[27] M. Ehsani, A. Ahmadi, and D. Fadai, “Modeling of vehicle fuel con-
sumption and carbon dioxide emission in road transport,” Renewable and
Sustainable Energy Reviews, vol. 53, pp. 1638-1648, 2016.
[28] J. Rios-Torres, J. Liu, and A. Khattak, “Fuel consumption for various
driving styles in conventional and hybrid electric vehicles: Integrating
driving cycle predictions with fuel consumption optimization,” Int. J.
Sustain. Transp., 2018.
[29] C. H. Lee and C. H. Wu, “A Novel Big Data Modeling Method for
Improving Driving Range Estimation of EVs,” IEEE Access, 2015.
[30] J. Wu, Y. Du, G. Qi, and M. Xu, “Leveraging longitudinal driving be-
haviour data with data mining techniques for driving style analysis,” IET
Intell. Transp. Syst., vol. 9, no. 8, pp. 792-801, 2015.
[31] Z. Constantinescu, C. Marinoiu, and M. Vladoiu, “Driving style analysis
using data mining techniques,” Int. J. Comput. Commun. Control, vol. 5,
no. 5, pp. 654-663, 2010.
[32] U. Von Luxburg, “A tutorial on spectral clustering,” Stat. Comput., vol. 17,
no. 4, pp. 395-416, 2007.
[33] S. Hochreiter and J. Schmidhuber, “Long Short-Term Memory,” Neural
Comput., vol. 9, no. 8, pp. 1735-1780, 1997.
[34] Q. Han, X. Hu, S. He, L. Zeng, L. Ye, and X. Yuan, “Evaluate Good Bus
Driving Behavior with LSTM,” in Lecture Notes in Computer Science
(including subseries Lecture Notes in Artificial Intelligence and Lecture
Notes in Bioinformatics), 2018.
[35] S. Kanarachos, J. Mathew, and M. E. Fitzpatrick, “Instantaneous vehicle
fuel consumption estimation using smartphones and recurrent neural net-
works,” Expert Syst. Appl., vol. 120, pp. 436-447, 2019.
[36] B. Degraeuwe, B. Beusen, C. Beckx, T. Denys, L. Govaerts, S. Broekx, M.
Gijsbers, K. Scheepers, L. I. Panis, and R. Torfs, “Using on-board logging
devices to study the longer-term impact of an eco-driving course,” Transp.
Res. Part D Transp. Environ., vol. 14, no. 7, pp. 514-520, 2009.
[37] SPSS Tutorials: Pearson Correlation. [Online]. Available:
https://libguides.library.kent.edu/SPSS/PearsonCorr
[38] J. Cohen, Statistical Power Analysis for the Behavioral Sciences, 2nd ed.,
1988.
[39] P. Ping, W. Qin, Y. Xu, C. Miyajima, and K. Takeda, “Spectral clustering
based approach for evaluating the effect of driving behavior on fuel
economy,” in 2018 IEEE International Instrumentation and Measurement
Technology Conference (I2MTC), Houston, TX, 2018, pp. 1-6, doi:
10.1109/I2MTC.2018.8409675.
[40] M. Stoer and F. Wagner, “A simple min-cut algorithm,” J. ACM, vol. 44,
no. 4, pp. 585-591, 1997.
[41] L. Hagen and A. B. Kahng, “New Spectral Methods for Ratio Cut Parti-
tioning and Clustering,” IEEE Trans. Comput. Des. Integr. Circuits Syst.,
vol. 11, no. 9, pp. 1074-1085, 1992.
[42] J. Shi and J. Malik, “Normalized cuts and image segmentation,” IEEE
Trans. Pattern Anal. Mach. Intell., vol. 22, no. 8, pp. 888-905, 2000.
[43] D. Wagner and F. Wagner, “Between Min Cut and Graph Bisection,” in
International Symposium on Mathematical Foundations of Computer
Science, 1993, pp. 744-750.
[44] A. Y. Ng, M. I. Jordan, and Y. Weiss, “On Spectral Clustering: Analysis
and an Algorithm,” in Advances in Neural Information Processing Sys-
tems, 2001.
[45] G. Schofield, J. R. Chelikowsky, and Y. Saad, “A spectrum slicing method
for the Kohn-Sham problem,” Comput. Phys. Commun., vol. 183, no. 3,
pp. 497-505, 2012.
[46] Apache Spark. [Online]. Available: https://spark.apache.org/
[47] C. Lanczos, “An iteration method for the solution of the eigenvalue
problem of linear differential and integral operators,” J. Res. Natl. Bur.
Stand., vol. 45, no. 4, p. 255, 1950.
[48] A. Likas, N. Vlassis, and J. J. Verbeek, “The global k-means clustering
algorithm,” Pattern Recognit., vol. 36, no. 2, pp. 451-461, 2003.
[49] P. Ping, Y. Sheng, W. Qin, C. Miyajima, and K. Takeda, “Modeling Driver
Risk Perception on City Roads Using Deep Learning,” IEEE Access, vol.
6, pp. 68850-68866, 2018.
[50] J. Redmon and A. Farhadi, “YOLOv3: An incremental improvement,”
2018. [Online]. Available: https://arxiv.org/abs/1804.02767
[51] K. Cho, B. van Merriënboer, C. Gulcehre, D. Bahdanau, F. Bougares,
H. Schwenk, and Y. Bengio, “Learning Phrase Representations using RNN
Encoder-Decoder for Statistical Machine Translation,” 2014. [Online].
Available: https://arxiv.org/abs/1406.1078
[52] S. Han, Y. Wang, H. Yang, W. (Bill) J. Dally, J. Kang, H. Mao, Y. Hu, X.
Li, Y. Li, D. Xie, H. Luo, and S. Yao, “ESE: Efficient Speech Recognition
Engine with Sparse LSTM on FPGA,” in Proc. 2017 ACM/SIGDA Int. Symp.
Field-Programmable Gate Arrays (FPGA ’17), 2017, pp. 75-84.
[53] J. Morton, T. A. Wheeler, and M. J. Kochenderfer, “Analysis of Recurrent
Neural Networks for Probabilistic Modeling of Driver Behavior,” IEEE
Trans. Intell. Transp. Syst., vol. 18, no. 5, pp. 1289-1298, 2017.
[54] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, “Learning representa-
tions by back-propagating errors,” Cognitive modeling, 1988.
[55] TensorFlow Tutorials, TensorFlow, 2019. [Online]. Available:
https://tensorflow.google.cn/tutorials/.
[56] Class LSTMCell, TensorFlow, 2019. [Online]. Available:
https://www.tensorflow.org/versions/r1.13/api_docs/python/tf/nn/rnn_cell/LSTMCell?hl=en#class_lstmcell
[57] D. Kingma and J. Ba, “Adam: A Method for Stochastic Optimization,”
2017. [Online]. Available: https://arxiv.org/abs/1412.6980v8
[58] C. Cortes and V. Vapnik, “Support-vector networks,” Mach. Learn., vol.
20, no. 3, pp. 273-297, 1995.
[59] R. J. Kate and R. J. Mooney, “Semi-supervised learning for semantic
parsing using support vector machines,” in NAACL-Short ’07 Human
Language Technologies 2007: The Conference of the North American
Chapter of the Association for Computational Linguistics, 2007, pp. 81-84.
[60] M. Song and D. Civco, “Road Extraction Using SVM and Image Segmen-
tation,” Photogramm. Eng. Remote Sens., vol. 70, no. 12, pp. 1365-1371,
2013.
[61] G. Guo, S. Z. Li, and K. Chan, “Face recognition by support vector ma-
chines,” in Proceedings - 4th IEEE International Conference on Automatic
Face and Gesture Recognition, FG 2000, 2000, pp. 196-201.
[62] Support Vector Machines for Binary Classification. [Online]. Available:
https://ww2.mathworks.cn/help/stats/support-vector-machines-for-binary-classification.html
[63] C. X. Ling, J. Huang, and H. Zhang, “AUC: A better measure than accuracy
in comparing learning algorithms,” in Lecture Notes in Computer Science
(including subseries Lecture Notes in Artificial Intelligence and Lecture
Notes in Bioinformatics), 2003, vol. 2671, pp. 329-341.
[64] M. E. Rice and G. T. Harris, “Comparing effect sizes in follow-up studies:
ROC area, Cohen’s d, and r,” Law Hum. Behav., vol. 29, no. 5, pp. 615-
620, 2005.
PENG PING received the B.S. degree in automa-
tion from Beijing University of Chemical Technol-
ogy, Beijing, China, in 2010 and the M.S. degree
in automation from Nanjing University of Science
and Technology, Nanjing, China, in 2013. From
2013 to 2015, he was an R&D Engineer with the Cloud Switch
Group at Huawei Technologies Co., Ltd. He is currently working
toward the Ph.D. degree at Southeast University, Nanjing,
China. In 2017, he joined Nagoya University as a joint Ph.D.
student. His research interests include vehicle safety, data
mining, cloud computing, and eco-driving.
WENHU QIN received the Ph.D. degree from
Southeast University, Nanjing, China, in 2005.
He is currently a Professor with the School of
Instrument Science and Engineering, Southeast
University, where he has been on its faculty since
1997. He directs the vehicle safety and virtual
reality laboratory at Southeast University. He has
authored or coauthored over 30 journal papers, ten
conference papers, and a book. He is the holder
of three patents. His research interests include
vehicle safety, virtual reality, crowd simulation, and road traffic accident
reconstruction.
YANG XU received the B.S. degree in Instrument
Science from East China University of Technology,
Nanchang, China, in 2012 and the M.S. degree in
Instrument Science from Hefei University of
Technology, Hefei, China, in 2015. He is currently
working toward the Ph.D. degree at Southeast
University, Nanjing, China. In 2018, he joined
The University of Queensland as a joint Ph.D.
student. His research interests include machine
learning, data science, biomedical signal
processing, and human-computer interaction.
CHIYOMI MIYAJIMA received the B.E., M.E.,
and Ph.D. degrees in computer science from the
Nagoya Institute of Technology, Japan, in 1996,
1998, and 2001, respectively. From 2001 to 2003,
she was a Research Associate with the Department
of Computer Science, Nagoya Institute of Tech-
nology. She was a Designated Associate Professor
with the Graduate School of Information Science,
Nagoya University, Japan, from 2003 to 2016. She
was an Associate Professor with the Institutes of
Innovation for Future Society, Nagoya University, from 2016 to 2018. Since
2018, she has been an Associate Professor with Daido University, Nagoya,
Japan. Her research interests include the analysis and the modeling of driver
behavior.
KAZUYA TAKEDA received his B.E.E., M.E.E.,
and Ph.D. degrees from Nagoya University, Japan,
in 1983, 1985, and 1994, respectively. Since 1985,
he has worked at Advanced Telecommunication
Research Laboratories and at KDD R&D Lab-
oratories, Japan. In 1995, he started a research
group for signal processing applications at Nagoya
University. His main focus is investigating driving
behavior using data centric approaches, utilizing
signal corpora of real driving behavior. He is cur-
rently a professor at the Institutes of Innovation for Future Society, Nagoya
University. He is also a member of the Board of Governors of the IEEE
Intelligent Transportation Systems Society. He is a Senior Member of the
IEEE.
VOLUME 4, 2016 19
... In the following subsections, we review the literature considering two main related topics: (i) classification of the driver's driving profile in terms of fuel consumption [44,41,28]; and (ii) regression models to estimate the fuel consumption, including analyses of related factors (environmental, traffic, driving behavior) [22,24,32]. Compared to humandesigned analyses of engine data, ML algorithms speed up the processing with a greater amount of data [29]. The ML Classifiers section presented below was extracted from [5], which is the article being expanded in this study. ...
... Other studies developed adaptive driving-style-oriented equivalent consumption minimization strategies for hybrid electric vehicles [40] and measured the effect of driving style on fuel consumption and traffic flow emissions based on field data [41]. [29] observed the impact of driver behavior on fuel consumption by classifying, evaluating, and predicting consumption using ML methods, such as neural networks, and visual information beyond the vehicle-gathered data, getting an accuracy of 80% to 83%. Nonetheless, we obtained superior metrics working only with vehicle features, reaching 100% accuracy through algorithms that require less computational power (Logistic Regression and Gradient Boosting). ...
... To evaluate the classifiers, we performed cross-validation, partitioning the datasets into training and testing sets. The chosen partitioning method consisted of taking one experiment of a "Stage" as a test set and the others for training, similar to [29]. This technique allows us to verify the generalization of the models, in addition to applying in results the metrics of accuracy, precision, and recall, as used and described by [28]. ...
Article
Full-text available
Data extracted directly from a vehicle’s electronic control unit (ECU) play a crucial role in the automotive industry because they contain valuable information from the engine and electronic parts. These data have the potential to enable compliance analysis, detect faults and errors, and guarantee driver and car safety as well as product quality. Among the possible uses of the data from the ECUs, driving profile analysis and fuel consumption prediction stand out, which enable analyses for insurers and transportation companies, and help to reduce fuel consumption and greenhouse gases, in addition to providing feedback to the driver. In this work, we apply machine learning algorithms to real data from an engine ECU to examine the driver’s driving behavior and accurately classify their fuel efficiency. Moreover, we develop regression models that predict fuel consumption for vehicles in operation. To ensure the effectiveness of our models, we carefully select variables strongly correlated with fuel consumption using a feature selection process. Compared to related works, both our profile classification results in precision, recall, and accuracy metrics, and our regression models result in the metrics of mean square errors, mean absolute error, and coefficient of determination, which are superior or similar. Notably, our algorithms exhibit lower computational costs and enable real-time analysis by utilizing a cloud server.
... Data extracted by the smartphoneembedded measurement devices were the inputs of the deep neural network-based fuel consumption estimation model. Ping et al. [68] investigated the relationship between driving behavior and vehicle fuel consumption rates as it is a complex phenomenon, particularly under dynamic driving conditions. Researchers of this study proposed two different kinds of machine learning methods to study the correlation between driving behavior and fuel demand. ...
... where • y i is the output representing the ith dependent variable • x i is the input representing the ith independent variable • b j (j 1,2,…,k) is the model parameter symbolizing the jth regression coefficient Integration of the sum of most minor square errors between model inputs and outputs into the optimization objective function is illustrated in the below given following equation, ( 9 4 ) where N is the number of samples in the collected dataset and k is the number of independent parameters in the linear model. The second estimation method employed for attaining better predictions is the conventional deep learning algorithm, which has been previously applied for fuel consumption of electric buses several times in the literature studies [62,68]. Figure 23 depicts the general schematic representation of the deep neural network scheme used for fuel consumption estimation of electric buses. ...
Article
Full-text available
This research study introduces a Q-learning enhanced hyper-heuristic framework for the accurate estimation of energy consumption rates of electric buses. Fundamentals of reinforcement learning concepts are hybridized with the integrated newly emerged metaheuristic methods of Aquila optimizer, Barnacles Mating Optimizer, Gradient-based Optimizer, Harris Hawks Optimization, and Poor and Rich Optimization algorithms to solve high-dimensional optimization problems with higher accuracy. In this context, the Q-learning algorithm is considered a high-level heuristic for administering the selection and move acceptance mechanisms, while search agents of those mentioned above low-level competitive metaheuristic algorithms meticulously explore the search space to find the optimum global point. Q-learning guides the operating hyper-heuristic in selecting the suitable low-level optimizer based on the Q-table score during iterations. An intelligent control mechanism is devised to get a reward or penalty for the actions of the low-level algorithms. The proposed method is evaluated on thirty-two optimization benchmark problems composed of unimodal and multimodal test functions. Then, each constituent algorithm and the hyper-heuristic model are applied to thirty-dimensional benchmark functions of CEC 2017 and twenty-eight test instances of CEC 2013. Four different challenging, complex real-world engineering design cases are also solved to assess the predictability of the proposed method on constrained problems. Finally, the proposed hyper-heuristic is employed to derive the fuel consumption estimates of electric buses. It is seen that the Multiple linear regression model, whose unknown parameters are extracted by the hyper-heuristic framework, gives the best predictions.
... Analysis of driving data can be crucial for understanding the relationship between driving behaviors and vehicle energy consumption. This enables researchers to better quantify the direct effects of driver behavior on energy efficiency and propose targeted strategies to curb emissions, taking into account the human element in vehicular operations (Ping et al., 2019). ...
Preprint
Full-text available
In urban traffic environments, driver behaviors exhibit considerable diversity in vehicle operation, encompassing a range of acceleration and braking maneuvers as well as adherence to traffic regulations, such as speed limits. It is well-established that these intrinsic driving behaviors significantly influence vehicle energy consumption. Therefore, establishing a quantitative relationship between driver behavior and energy usage is essential for identifying energy-efficient driving practices and optimizing routes within urban traffic. This study introduces a data-driven approach to predict the equivalent fuel consumption of a plug-in hybrid electric vehicle (PHEV) based on an integrated model of driver behavior and vehicle energy consumption. Unlike traditional models that provide point predictions of fuel consumption, this approach uses Conformalized Quantile Regression (CQR) to offer prediction intervals that capture the variability and uncertainty in fuel consumption. These intervals reflect changes in fuel consumption, as well as variations in driver behavior, and vehicle and route conditions. To develop this model, driver-specific data were collected through a driver-in-the-loop simulator, which tested different human drivers responses. The CQR model was then trained and validated using the experimental data from the driver-in-the-loop simulator, augmented by the synthetic data generated from Monte Carlo simulations conducted using a calibrated microscopic driver behavior and vehicle energy model. The CQR model was evaluated by comparing its predictions of equivalent fuel consumption intervals with those of baseline prediction interval methods that rely solely on conformal prediction.
... Ultimately, the primary purpose of forecasting fuel sales is to formulate strategies and build competitive advantages (Sołoducho-Pelc & Sulich, 2022) for enterprises within the business ecosystem of the fuel industry (Ensley-Field et al., 2023). It is also worth noting that machine learning methods are now the most common approach to sales forecasting, for example XGBoost (Bentéjac et al., 2021) or neural networks (Gamboa, 2017; Ping et al., 2019; Wu & Liu, 2012), although ARIMA models are also used (Yusuf et al., 2022). ...
Article
T. Zema, A. Sulich, & M. Hernes
Aim/purpose: This paper aims to explore both fuel sales forecasting and the business ecosystem, and then reverses the focus to examine the business ecosystem in the context of fuel sales forecasting. This research objective is accompanied by the following research questions: 1) Does the order in which the topics of "business ecosystems" and "fuel sales forecasting" are searched affect the search results? 2) Which keywords frequently co-occur in publications related to "business ecosystems" and "fuel sales forecasting"? 3) What is the relationship between the terms "fuel sales forecasting" and "business ecosystem"?
Design/methodology/approach: The study employs a hybrid review methodology, using specific queries in the Scopus database to identify research themes and motifs. This hybrid form of literature review integrates the tenets of both bibliometric and structured reviews. The study follows the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) framework. The visual analysis was conducted with the VOSviewer bibliometric software, focusing on keywords relevant to the relationship between fuel sales forecasting and business ecosystem terms.
Findings: Key findings include the identification of co-occurring keywords in the fuel sales forecasting and business ecosystem theory literature. The study reveals research gaps and potential areas for future study in business ecosystems, highlighting the impact of fuel sales forecasting in various economic sectors beyond traditional ones, like forestry, agriculture, and fisheries. Using a hybrid literature study method, the paper analyses data from scientific publications in the Scopus database and employs VOSviewer software to develop bibliometric maps of keyword co-occurrences.
Research implications/limitations: The research underscores the broad implications of fuel sales forecasting within a business ecosystem context and identifies areas that lack in-depth study. This study maps scientific publications, identifying their intellectual structure and current research trends. It contributes to the understanding of fuel sales forecasting within the business ecosystem context as part of the energy sector transition.
Originality/value/contribution: This paper contributes to science and practice by identifying research areas that integrate fuel sales forecasting within the business ecosystem construct. It indicates promising avenues for future research for researchers and industry professionals, aiming to guide ongoing research. The article addresses a significant theme that warrants scholarly attention. This study allows researchers to identify the research gaps in the published articles and to indicate directions for scientific development.
... Then, the negative gradient of the loss function is calculated and used to approximate the loss function of the current round (iteration). Equation (12) gives this approximation. ...
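The negative-gradient step above is the core of gradient boosting. The following minimal sketch assumes a squared-error loss, for which the negative gradient is simply the residual, and uses one-feature decision stumps as weak learners. The names `fit_stump` and `gradient_boost` are hypothetical; LightGBM and the cited model use histogram-based trees and other refinements not shown here.

```python
def fit_stump(xs, residuals):
    """Least-squares decision stump on one feature: (threshold, left_value, right_value)."""
    best = None
    for t in sorted(set(xs))[:-1]:  # candidate split points
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lv, rv)
    if best is None:  # all x identical: no split possible, predict the mean residual
        m = sum(residuals) / len(residuals)
        return xs[0], m, m
    _, t, lv, rv = best
    return t, lv, rv


def gradient_boost(xs, ys, n_rounds=50, lr=0.1):
    """Boosting with squared loss L = 1/2 (y - F)^2; returns a predict(x) function."""
    f0 = sum(ys) / len(ys)  # constant model that minimizes squared loss
    preds = [f0] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        # Negative gradient -dL/dF = y - F: the pseudo-residuals of this round.
        residuals = [y - p for y, p in zip(ys, preds)]
        t, lv, rv = fit_stump(xs, residuals)
        stumps.append((t, lv, rv))
        # Move the ensemble a small step (learning rate) toward the fitted residuals.
        preds = [p + lr * (lv if x <= t else rv) for x, p in zip(xs, preds)]

    def predict(x):
        return f0 + sum(lr * (lv if x <= t else rv) for t, lv, rv in stumps)

    return predict
```

For other losses, only the residual line changes: it becomes the negative derivative of that loss with respect to the current prediction.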
Article
Exploring how data mining technology combined with the K-means clustering algorithm performs in enterprise risk auditing can make an enterprise's financial risk management mechanism more complete. Hence, the K-means clustering algorithm is introduced to study the paperless status of electronic payment in the trading process of e-commerce enterprises. Additionally, a risk audit model for e-commerce enterprises is implemented based on the K-means algorithm combined with Random Forest Light Gradient Boosting Machine (RF-LightGBM). In this model, the full workflow of data preparation, data preprocessing, model construction, model application, and evaluation is implemented to study the payment flow in e-commerce transactions using big data analysis technology. Finally, the performance of the model is evaluated by simulation. The results show that, compared with the models and algorithms proposed by other scholars in related fields, the proposed model reaches a classification accuracy of 95.46%. The model's data message delivery rate is stable at about 81.54%, and its data message leakage rate, packet loss rate, and average delay are lower than those of other models and algorithms. Therefore, while maintaining prediction accuracy, the audit model for e-commerce enterprises also achieves high data transmission security, which can provide an experimental basis for improving the safety and risk control of the audit process in e-commerce enterprises.
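As a reference point for the clustering step, the sketch below implements plain K-means (Lloyd's algorithm) on numeric feature vectors. It covers only the clustering, not the RF-LightGBM classifier, and the random initialization and function name are illustrative assumptions rather than the cited model's configuration.

```python
import random


def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's K-means on a list of numeric tuples; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # k distinct points as initial centroids
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its assigned points.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append([sum(c) / len(members) for c in zip(*members)])
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:  # converged: assignments can no longer change
            break
        centroids = new_centroids
    return labels, centroids
```

In an audit pipeline of this kind, the resulting cluster labels would typically be added as features or used to stratify transactions before a supervised classifier is trained.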
... Pavlovic et al. (2021) analyzed the accuracy, vehicle measurement distance, and specific fuel consumption rate of on-board fuel consumption measurement systems installed in light-duty trucks and heavy-duty trucks. Ping et al. (2019) obtained instantaneous fuel consumption using ECUs. Kanarachos et al. (2019) used smartphones and recurrent neural networks to estimate instantaneous vehicle fuel consumption. ...
Article
With the growing emphasis on building a resource-saving and environmentally friendly society, there is a need to develop low-carbon and sustainable urban transportation. Most pollutants come from motor vehicle exhaust emissions. Therefore, this paper analyzes the relationship between driving behavior and traffic emissions in order to constrain driver behavior and reduce pollutant emissions. First, the GPS data are preprocessed in Navicat through data integration, screening, and sorting, and the speed data are then cleaned in SPSS using a combination of box plots and linear interpolation. Second, principal component analysis (PCA) is used to reduce the dimensionality of 12 indicators, such as average speed, average acceleration, and maximum speed. The K-means and K-medoids methods are then applied to cluster the drivers' behavioral indicators, the better clustering method is selected according to clustering evaluation indexes, and the drivers' driving states are analyzed with the symbolic aggregate approximation method. Finally, based on these results and the MOVES traffic emission model, the relationship between drivers' driving modes, driving states, and traffic emissions is analyzed, and a decision tree is used to predict a driver's unknown driving mode and thereby estimate the associated emission level.
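The speed-cleaning step (box-plot outlier detection followed by linear interpolation) can be sketched as follows. The study performed this in SPSS; this Python version is only an illustrative stand-in that assumes the conventional Tukey fences (1.5 × IQR) and holds the nearest valid value at the ends of the trace. The function names are hypothetical.

```python
def iqr_bounds(values):
    """Tukey box-plot fences: [Q1 - 1.5 * IQR, Q3 + 1.5 * IQR]."""
    s = sorted(values)

    def quantile(q):  # linear interpolation between order statistics
        pos = q * (len(s) - 1)
        lo = int(pos)
        hi = min(lo + 1, len(s) - 1)
        return s[lo] + (s[hi] - s[lo]) * (pos - lo)

    q1, q3 = quantile(0.25), quantile(0.75)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def clean_speed(speeds):
    """Replace box-plot outliers with values linearly interpolated from valid neighbours."""
    low, high = iqr_bounds(speeds)
    valid = [i for i, v in enumerate(speeds) if low <= v <= high]
    cleaned = list(speeds)
    for i, v in enumerate(speeds):
        if low <= v <= high:
            continue
        prev = max((j for j in valid if j < i), default=None)
        nxt = min((j for j in valid if j > i), default=None)
        if prev is not None and nxt is not None:
            w = (i - prev) / (nxt - prev)  # relative position between the two valid samples
            cleaned[i] = speeds[prev] + w * (speeds[nxt] - speeds[prev])
        elif prev is not None:
            cleaned[i] = speeds[prev]  # trailing outlier: hold the last valid value
        elif nxt is not None:
            cleaned[i] = speeds[nxt]  # leading outlier: hold the first valid value
    return cleaned
```

For example, a GPS spike of 200 km/h between readings of 31 and 33 km/h is replaced by 32 km/h.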
Article
In recent years, green, low-carbon, and sustainable development has received extensive attention, prompting considerable research into reducing pollutant emissions in the transportation sector. This paper analyzes the energy consumption patterns of logistics vehicles on Beijing’s Sixth Ring Road. First, driving segments are categorized based on variations in vehicle speed, and the K-means algorithm is applied to cluster the segments, which identifies three distinct driving states and yields a corresponding driving cycle for each. The driving states are observed to correlate strongly with different road grades. Subsequent analysis reveals that speed, torque, and engine speed are the primary factors influencing the energy consumption of logistics vehicles. Furthermore, energy consumption prediction models based on the long short-term memory (LSTM) algorithm are built for the identified driving states on various road types, using historical data, i.e., vehicle speed, motor torque, and engine speed. Finally, the analysis highlights a notable increase in energy consumption per 100 km for logistics trucks on branch roads with complex road conditions. This study contributes to the effective management of energy consumption in medium and large trucks.
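One common way to split a speed trace into driving segments before clustering is to cut it at stops and summarize each moving segment with a few kinematic features. The sketch below assumes 1 Hz sampling, a hypothetical 1 km/h stop threshold, and three illustrative features. The abstract does not specify the study's actual segmentation rule or feature set.

```python
def split_segments(speeds_kmh, stop_kmh=1.0):
    """Split a 1 Hz speed trace into moving segments separated by stops (speed < stop_kmh)."""
    segments, current = [], []
    for v in speeds_kmh:
        if v >= stop_kmh:
            current.append(v)
        elif current:  # a stop ends the current moving segment
            segments.append(current)
            current = []
    if current:
        segments.append(current)
    return segments


def segment_features(seg):
    """Mean speed, max speed (km/h) and mean absolute acceleration (m/s^2) at 1 Hz."""
    accels = [(b - a) / 3.6 for a, b in zip(seg, seg[1:])]  # km/h per second -> m/s^2
    mean_abs_acc = sum(abs(a) for a in accels) / len(accels) if accels else 0.0
    return {
        "mean_speed": sum(seg) / len(seg),
        "max_speed": max(seg),
        "mean_abs_acc": mean_abs_acc,
    }
```

The resulting feature vectors could then be passed to a K-means routine to group segments into driving states.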
Article
Research on how drivers perceive risk is vital to driving behavior research and driving safety. Since risk can be divided into subjective and objective risk, in this paper we focus on modeling drivers' subjective risk perception using a deep learning method. Different drivers often perceive different levels of subjective risk under the same driving conditions. In addition, different driving conditions or driving events affect drivers differently. Based on these two risk perception features, we first design an experiment on a two-lane city road to assess the level of subjective risk perceived by drivers belonging to different groups. We then use a deep learning network-based method to abstract features of the driving environment. These environmental features are integrated with driver risk perception data, and this information is used as training and testing data for the learning network. Finally, a Long Short-Term Memory (LSTM)-based method is adopted to model the subjective risk perception of individual drivers from traffic conditions and vehicle operation data from the driver’s vehicle. Our results show that the proposed method can effectively model the subjective risk perception behavior of drivers, allowing for end-to-end risk perception prediction in future driving assistance systems.
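To show what an LSTM computes at each time step, here is a minimal sketch of a single LSTM cell with a scalar input and a scalar hidden state. Real models like the one described use vector states and learned weight matrices. The weight dictionary layout and function names here are hypothetical and chosen only for readability.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, W):
    """One step of a scalar LSTM cell. W maps gate name -> (w_x, w_h, bias)."""

    def gate(name, act):
        wx, wh, b = W[name]
        return act(wx * x + wh * h_prev + b)

    i = gate("input", sigmoid)        # how much new information to write
    f = gate("forget", sigmoid)       # how much of the old cell state to keep
    o = gate("output", sigmoid)       # how much of the cell state to expose
    g = gate("candidate", math.tanh)  # candidate cell content
    c = f * c_prev + i * g            # updated cell (long-term) state
    h = o * math.tanh(c)              # new hidden state / output
    return h, c


def run_lstm(xs, W):
    """Run the cell over a sequence starting from zero state; return the hidden outputs."""
    h, c = 0.0, 0.0
    outputs = []
    for x in xs:
        h, c = lstm_step(x, h, c, W)
        outputs.append(h)
    return outputs
```

The forget gate is what lets the cell carry information across many time steps, which is why LSTMs suit sequential traffic and vehicle-operation data.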
Article
The high level of air pollution in urban areas, caused in no small part by road transport, requires the implementation of continuous and accurate monitoring techniques if emissions are to be minimized. The primary motivation for this paper is to enable fine spatiotemporal monitoring based on crowd sensing, whereby the instantaneous fuel consumption of a vehicle is estimated using smartphone measurements. To this end, a surrogate method is proposed, based on indirect monitoring with Recurrent Neural Networks (RNNs) that process a smartphone's GPS position, speed, altitude, acceleration, and number of visible satellites. Extensive field trials were conducted to gather smartphone and fuel consumption data under a wide range of driving conditions. Two different RNN types were explored, and a parametric analysis was performed to define a suitable architecture. Various training methods for tuning the RNN were evaluated based on performance and computational burden. The resulting estimator was compared with others found in the literature, and the results confirm its superior performance. The potential impact of the proposed method is noteworthy, as it can facilitate accurate, large-scale monitoring of in-use vehicle fuel consumption and emissions by exploiting available smartphone measurements.
Chapter
Drivers’ behaviors and decisions can affect the probability of traffic accidents, pollutant emissions, and energy efficiency. Good driving behavior can not only reduce fuel consumption but also improve ride comfort and safety. In this paper, a new concept, the evaluation zone, is defined to distinguish special driving areas that have a strong influence on energy consumption and ride comfort. Then, with the aims of reducing fuel consumption and improving ride comfort, an evaluation-zone-based driving behavior model is proposed to obtain a good-driving-behavior dataset for a long short-term memory (LSTM) network, which is applied to driving behavior evaluation and to providing driving suggestions. Using driving data from bus line 687 in Chongqing, China, test results demonstrate that the developed model performs well and that the LSTM can provide reliable driving evaluations and suggestions for drivers.
Article
This paper studies a co-design optimization approach to determine how to optimally adapt the automatic control of an intelligent electric vehicle to driving styles. A cyber-physical system (CPS)-based framework is proposed for co-design optimization of the plant and controller parameters of an automated electric vehicle, considering the vehicle's dynamic performance, drivability, and energy consumption under different driving styles. The system description, requirements, constraints, optimization objectives, and methodology are investigated. A driving style recognition algorithm is developed using unsupervised machine learning and validated via vehicle experiments. Adaptive control algorithms are designed for three driving styles with different protocol selections. A performance exploration method is presented. Parameter optimizations are implemented based on the defined objective functions. Test results show that an automated vehicle with an optimized plant and controller can perform its tasks well under aggressive, moderate, and conservative driving styles, further improving overall performance. The results validate the feasibility and effectiveness of the proposed CPS-based co-design optimization approach.
Article
Improving fuel economy and lowering emissions are key societal goals. Standard driving cycles, pre-designed by the US Environmental Protection Agency (EPA), have long been used to estimate vehicle fuel economy in laboratory-controlled conditions. They have also been used to test and tune different energy management strategies for hybrid electric vehicles (HEVs). This paper aims to estimate fuel consumption for a conventional vehicle and an HEV using personalized driving cycles extracted from real-world data, in order to study the effects of different driving styles and vehicle types on fuel consumption compared with estimates based on standard driving cycles. To do this, we extracted driving cycles for conventional vehicles and HEVs from a large-scale U.S. survey that contains real-world GPS-based driving records. Next, the driving cycles were assigned to one of three categories: volatile, normal, or calm. Then, the driving cycles were used with a driver-vehicle simulation that captures driver decisions (vehicle speed during a trip), powertrain, and vehicle dynamics to estimate fuel consumption for conventional vehicles and for HEVs with a power-split powertrain. To further optimize fuel consumption for HEVs, the Equivalent Consumption Minimization Strategy (ECMS) is applied. The results show that, depending on the driving style and the driving scenario, conventional vehicle fuel consumption can vary widely compared with standard EPA driving cycles. Specifically, conventional vehicle fuel consumption was 13% lower in calm urban driving, but almost 34% higher in volatile highway driving, compared with standard EPA driving cycles. Interestingly, when a driving cycle is predicted using case-based reasoning and used to tune the power distribution in a hybrid electric vehicle, its fuel consumption can be reduced by up to 12% in urban driving. Implications and limitations of the findings are discussed.
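ECMS chooses, at each time step, the engine/battery power split that minimizes an equivalent fuel rate: actual fuel use plus battery power converted to fuel through an equivalence factor `s`. The sketch below uses a hypothetical affine fuel map (30% efficiency plus an idle offset), an assumed gasoline lower heating value of about 42.5 kJ/g, and a coarse power grid. It ignores battery power and state-of-charge limits that a real controller must enforce, and it is not the cited paper's implementation.

```python
Q_LHV_KJ_PER_G = 42.5  # assumed lower heating value of gasoline, kJ/g


def toy_fuel_rate(p_engine_kw):
    """Hypothetical engine map: fuel rate in g/s (idle offset + power / (efficiency * LHV))."""
    if p_engine_kw <= 0:
        return 0.0  # engine off
    return 0.1 + p_engine_kw / (0.3 * Q_LHV_KJ_PER_G)


def ecms_split(p_req_kw, s, engine_grid_kw, fuel_rate, q_lhv=Q_LHV_KJ_PER_G):
    """Return (equivalent_rate_g_s, p_engine_kw, p_battery_kw) minimizing the ECMS cost.

    The battery covers the remainder: p_batt = p_req - p_engine (negative = charging).
    Equivalent rate = fuel_rate(p_engine) + s * p_batt / q_lhv  (kW / (kJ/g) = g/s).
    """
    best = None
    for p_eng in engine_grid_kw:
        p_batt = p_req_kw - p_eng
        cost = fuel_rate(p_eng) + s * p_batt / q_lhv
        if best is None or cost < best[0]:
            best = (cost, p_eng, p_batt)
    return best
```

The equivalence factor sets the balance: a small `s` makes battery energy look cheap, so the vehicle drives electrically, while a large `s` makes the engine carry the load and recharge the battery. Tuning `s` to an expected driving cycle is one way a predicted cycle can feed into the power distribution.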
Article
Rapid vehicle growth in developing nations makes it necessary for these nations to address the transportation and environmental impacts of on-road mobile sources. To estimate the air quality impact of their fleets, many nations have adopted modified versions of U.S. or European emissions models or factors. In most cases, these models can lead to significant errors in emissions estimates. To address this problem, a new on-road mobile source emissions model, called the international vehicle emissions (IVE) model, designed for use in developing countries has been developed. The IVE model was developed jointly by researchers at the International Sustainable Systems Research Center and the University of California at Riverside. The IVE model uses local vehicle technology distributions, power-based driving factors, vehicle soak distributions, and meteorological factors to tailor the model to the local situation. In addition, an intensive 2-week field study was designed to collect the necessary fleet and activity data to populate the model with critical local information. The IVE model, along with the field study process, has proved highly effective in providing an improved estimate of mobile source emissions in an urban area and allows the effective analysis of local policy options. The studies have served to transfer tools and knowledge on the process of creating and improving mobile source inventories in an efficient manner. The rationale behind the development of the model, the development and application of the field studies, an overview of the results obtained to date, and planned next steps are described in this paper.
Article
We present some updates to YOLO! We made a bunch of little design changes to make it better. We also trained this new network that's pretty swell. It's a little bigger than last time but more accurate. It's still fast though, don't worry. At 320x320 YOLOv3 runs in 22 ms at 28.2 mAP, as accurate as SSD but three times faster. When we look at the old .5 IOU mAP detection metric YOLOv3 is quite good. It achieves 57.9 mAP@50 in 51 ms on a Titan X, compared to 57.5 mAP@50 in 198 ms by RetinaNet, similar performance but 3.8x faster. As always, all the code is online at https://pjreddie.com/yolo/
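The ".5 IOU mAP" metric (mAP@50) counts a detection as correct when its intersection over union (IoU) with a matching ground-truth box is at least 0.5. The helper below computes only that IoU test for axis-aligned boxes in an assumed `(x1, y1, x2, y2)` format. Full mAP additionally requires score ranking, one-to-one matching, and averaging precision over recall and classes, which are not shown.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)  # zero if the boxes do not overlap
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred_box, gt_box, threshold=0.5):
    """Match criterion used by mAP@50: IoU with the ground-truth box >= 0.5."""
    return iou(pred_box, gt_box) >= threshold
```

Raising `threshold` (e.g. averaging over 0.5 to 0.95, as in COCO-style mAP) rewards tighter localization, which is why YOLOv3 compares more favorably under the 0.5 criterion.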