VOLUME XX, 20XX 1
Date of publication xxxx 00, 0000, date of current version xxxx 00, 0000.
Digital Object Identifier xxx/ACCESS xxx .Doi Number
Estimating Average Vehicle Mileage for Various
Vehicle Classes using Polynomial Models in
Deep Classifiers
Naghmeh Niroomand1, Christian Bach2
1 School of Management and Law, ZHAW Zurich University of Applied Sciences, Winterthur CH-8400, Switzerland
2 Automotive Powertrain Technologies Laboratory, Swiss Federal Laboratories for Materials Science and Technology, Dübendorf, CH-8600, Switzerland
Corresponding author: Naghmeh Niroomand (email: niro@zhaw.ch).
ABSTRACT Accurately measuring vehicle mileage is pivotal in precise CO2 emission calculations and the
development of reliable emission models. Nonetheless, mileage data gathered from surveys relying on self-
estimation, garage reports, and other estimation-based sources often yield rough approximations that
substantially deviate from the actual mileage. To tackle this issue, we present a comprehensive framework
aimed at improving the accuracy of CO2 emission models. This paper combines two techniques:
deep learning semi-supervised fuzzy C-means (SSFCM) clustering and polynomial classifier models. Using
these techniques, we classify passenger vehicles and thereby obtain more precise estimates of average
mileage. Moreover, our framework supports segment-based analysis, which enables segment-specific
assessment of average mileage. With the proposed techniques, we aim to improve the precision of emission
models, resulting in more dependable calculations and a better understanding of the environmental
implications associated with vehicles.
INDEX TERMS Average vehicle mileage, Mileage model, CO2 emissions, Deep feature learning,
Polynomial deep classifiers, Vehicle classification.
I. INTRODUCTION
The Paris Agreement [1], adopted more than eight years ago
with the aim of holding global warming well below 2°C and
pursuing efforts to limit it to 1.5°C, has not yet yielded the
intended results: global greenhouse gas emissions continue to rise.
The 2016 EU Reference Scenario indicates that without a
determined commitment to decarbonization, carbon dioxide
(CO2) emissions from transportation are forecasted to
experience a modest reduction of only 8% between 2010 and
2050, ultimately peaking by 2050 [2-3]. Various factors
contribute to this slow progress, including the strong growth
in the number of passenger cars, the slow uptake of electric
vehicles, and a limited transition to alternative fuels. These
factors act as barriers and impede substantial emission
reductions.
According to the International Energy Agency [4],
Switzerland's contribution to global anthropogenic CO2
emissions from fossil fuels is less than 0.2%. However, the
transportation sector has a substantial impact on Switzerland's
overall carbon footprint, accounting for around 30.6% of the
nation's CO2 emissions in 2021. Among the various
transportation modes, road transport is responsible for the
majority, accounting for 97.3% of these emissions. Passenger
cars, in turn, contribute approximately 71.2% of Swiss road
transport emissions [5]. Notably, the normative CO2 emissions
of passenger cars in Switzerland have fluctuated. After
declining continuously since 2003 for both gasoline and diesel
vehicles, the normative CO2 emissions increased slightly in
2017 due to the partial introduction of the new WLTP
normative measurement procedure for European type
approval, and rose significantly in 2021 due to its full
introduction. Although the new normative measurement
procedure had a significant impact on the normative CO2
emissions, no impact on real-world (on-road) CO2 emissions
is expected; however, the gap between normative and real
CO2 emissions could be reduced significantly [6]. CO2
emissions are estimated with calculation models that rely
heavily on factors such as the vehicle fleet composition, fuel
parameters, and the average mileage of the vehicles [7-10].
This article has been accepted for publication in IEEE Access. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2024.3359990
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
N. Niroom and et al.: Preparation of Papers for IEEE Access (Novemb er 2023)
Due to the lack of standardization in estimating vehicle
mileage, which varies greatly between periodic technical
inspections (PTI), garage reports, and individual estimations,
accurately determining the true CO2 emissions from road
traffic has become increasingly difficult and unreliable.
Additionally, new CO2 legislation, which includes an EU
fleet-average normative emission target of 95 g CO2/km
under the old measurement procedure, has led to significant
changes in the composition of the newly registered vehicle
fleet, as well as in the technical and dimensional
characteristics of vehicles over time [11]. Despite
technological advancements and measures such as
purchasing new vehicles and scrapping old or damaged ones,
the Swiss passenger car fleet continues to have high CO2
emissions. Therefore, understanding the relationship between
estimated and actual mileage of passenger cars, and the impact
of their differences on CO2 emissions, is essential for
achieving net-zero CO2 emissions by 2050.
Hence, this study aims to develop a mathematical model to
calculate the average vehicle mileage for different vehicle
segments, thereby improving the accuracy of CO2 emission
calculations. Because normative CO2 values have limited
informative value for real-world emissions, this approach is
an important step towards a new CO2 assessment of road
traffic. The study builds upon previous work on a machine
learning methodology for segmenting passenger cars based on
technical and dimensional features [12-14]. Fig. 1 illustrates
the core challenge of vehicle segmentation in this context.
Our primary objective was to enhance the accuracy of CO2
emission calculations and to gain a deeper understanding of
how variations in vehicle class affect the CO2 footprint of
passenger vehicle fleets [9]. To this end, we categorized
passenger vehicles based on their technical and dimensional
characteristics [14]. This segmentation enables analysis of
the variations within each class (intra-class) as well as
comparisons between classes (inter-class), and thus helps to
identify the factors that influence the calculation of accurate
average vehicle mileage across the passenger vehicle fleet. In
our approach, we conducted a comparative analysis of various
semi-supervised clustering algorithms to predict labels
obtained from unsupervised clustering algorithms. We
focused on a feature learning technique that effectively learns
representations in datasets with high dimensionality and
significant uncertainties [15-23]. Our research also aimed to
develop a model for calculating average vehicle mileage in
both inter-class and intra-class scenarios. With more precise
calculations and deeper insights, this work can support
advances toward reducing emissions.
Section II briefly introduces the Swiss motor vehicle fleet and
its CO2 emissions. Section III reviews related research.
Section IV describes the methods. Section V provides concise
details on the datasets, the algorithms, the experiments
performed, and the discussion of the results. Finally, Section
VI presents the major findings of our work and
recommendations for further research.
II. SWISS MOTOR VEHICLES AND CO2 EMISSIONS
Switzerland had over 6.6 million registered motor vehicles in
2023, of which more than 4.7 million were passenger cars. On
average, these vehicles are used for nine years. Although a
large share of the population (59%) accepts public transport,
car travel still accounts for two thirds of total passenger
kilometers [24]. In 2023, these vehicles covered a total of 55
billion kilometers, with an average daily distance of 20.8
kilometers. As reported by the Federal Office for Spatial
Development, this corresponds to 100,000 kilometers per
minute
[25]. Switzerland records vehicle odometer readings during
periodic technical inspections (PTI). New cars undergo their
first PTI after five years, a second PTI three years later, and
subsequent PTIs every two years. The cantonal road traffic
offices manage and standardize PTIs and maintain an
extensive vehicle database with odometer readings. In
addition, the average normative CO2 emissions of newly
registered cars declined steadily, from around 190 g CO2/km
in 2003 to approximately 134 g CO2/km in 2016. However,
the mean CO2 emissions of new registrations then increased,
reaching 137.8 g CO2/km in 2018. By 2022, the average CO2
emissions of all new cars were approximately 120.9 g
CO2/km, a decrease of around 9 g CO2/km compared to
2021. Despite this reduction, the target value of 118 g
CO2/km (measured using the Worldwide Harmonized Light
Vehicles Test Procedure, WLTP) that came into effect in 2022
was not fully achieved, primarily because of the introduction
of the new WLTP measurement method. A real-world factor
of 1.4 was applied to NEDC-based CO2 emissions, while a
factor of 1.2 was used for WLTP-based CO2 emissions;
during the intermediate period, the factor was 1.3.
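Because odometer readings are only taken at PTIs, the distance driven between two readings is known only as an average over the inspection interval. The sketch below is a hypothetical helper, not taken from the authors' code: it lists the vehicle ages at which a PTI falls due under the 5/3/2-year schedule described above and derives average annual mileage from two consecutive readings. The function names and the odometer values in the test are assumptions for illustration.

```python
def pti_ages(max_age_years: int) -> list[int]:
    """Vehicle ages (years) at which a PTI falls due, up to max_age_years.

    Schedule as described in the text: first PTI at 5 years,
    the second 3 years later (8 years), then every 2 years.
    """
    ages = []
    age = 5
    step = 3  # gap between the first and the second inspection
    while age <= max_age_years:
        ages.append(age)
        age += step
        step = 2  # all later inspections are two years apart
    return ages


def annual_mileage_between(odo_prev: float, odo_curr: float,
                           age_prev: int, age_curr: int) -> float:
    """Average km per year driven between two consecutive PTI readings."""
    if age_curr <= age_prev:
        raise ValueError("readings must be in chronological order")
    return (odo_curr - odo_prev) / (age_curr - age_prev)


print(pti_ages(14))  # [5, 8, 10, 12, 14]
```

Note that no reading exists before the first PTI, so mileage in the first five years can only be recovered as an average over that whole period.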
Fig. 2 depicts the monthly progress of CO2 emissions from
newly registered cars between 2012 and 2022. The transition
from the New European Driving Cycle (NEDC) to the more
realistic WLTP measurement method resulted in higher
recorded average CO2 emissions for new vehicles. To prevent
a sudden and drastic tightening of the CO2 target, the CO2
target value was adjusted in line with EU standards [26].
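The role of mileage in this calculation can be made concrete. Annual CO2 per vehicle is the product of a per-kilometre emission value and the distance driven, so a relative error in the assumed mileage passes one-to-one into the CO2 estimate. The following minimal sketch uses the real-world factors quoted in this section (1.4 for NEDC, 1.3 for the intermediate period, 1.2 for WLTP). The function names and the 12,000 km annual mileage in the example are hypothetical.

```python
# Real-world correction factors quoted in the text, keyed by the normative
# measurement procedure ("transition" = intermediate period).
REAL_WORLD_FACTOR = {"NEDC": 1.4, "transition": 1.3, "WLTP": 1.2}


def real_world_co2_g_per_km(normative_g_per_km: float, procedure: str) -> float:
    """Scale a normative (type-approval) CO2 value to an on-road estimate."""
    return normative_g_per_km * REAL_WORLD_FACTOR[procedure]


def annual_co2_tonnes(normative_g_per_km: float, procedure: str,
                      annual_km: float) -> float:
    """Annual CO2 of one vehicle in tonnes (1 t = 1e6 g)."""
    return real_world_co2_g_per_km(normative_g_per_km, procedure) * annual_km / 1e6


# 2022 new-car average (WLTP) with a hypothetical annual mileage of 12,000 km
print(round(real_world_co2_g_per_km(120.9, "WLTP"), 2))     # 145.08
print(round(annual_co2_tonnes(120.9, "WLTP", 12_000), 3))   # 1.741
```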
III. RELATED WORK
Over the last decades, despite partial success in meeting the
normative CO₂ emission targets, real-world CO₂ emissions
have decreased only modestly, by approximately 10% [27].
Meanwhile, a gap of 42% now exists between the estimated
and real-world emissions, corresponding to a discrepancy of
31 g CO₂/km in supposedly saved emissions [28-29]. One
crucial aspect of calculating emissions accurately is
determining the average mileage of vehicles, for which precise
values are difficult to obtain and estimates are often used
instead. To address the inconsistencies that arise from diverse
methods of estimating average mileage and CO₂ emissions,
researchers have made substantial progress by using advanced
simulation programs to construct comprehensive emission
inventories, improving the accuracy and reliability of their
findings [30-36]. These simulation-based approaches help
overcome the limitations of conventional laboratory test
methods, which are conducted under controlled conditions and
may not fully capture the diverse and dynamic factors that
influence real-world emissions. Simulation programs also
bridge the gap between the two primary estimation
techniques. Top-down approaches focus on market dynamics,
such as fuel consumption patterns and economic factors, to
estimate CO₂ emissions on a broad scale. Conversely,
bottom-up approaches focus on detailed technological factors
such as vehicle class, vehicle mileage, and engine efficiency.
Simulation programs allow researchers to integrate these
complex factors and their interactions, particularly vehicle
class and average vehicle mileage, which leads to more precise
estimates of CO₂ emissions. By simulating real-world
scenarios over a wide range of parameters and variables, these
programs enable a comprehensive assessment of the
environmental impact of different activities and technologies,
making emission inventories more reliable and
comprehensive.
Jimenez et al. [37] conducted a review of the influence of
vehicle classification, vehicle characteristics, vehicle brand,
and registration year on real-world CO₂ emissions, using a
database of 650 passenger cars. Their study aimed to explain
how these factors contribute to the gap between real-world
emissions and type-approval emission values. Winslott et al.
[38] suggested that targeting CO₂ emission reductions in the
upper quintiles would have a greater impact than uniform
reductions across all quintiles. However, eliminating
passenger mileage in the sustainable category contributes only
minimally to achieving the required one-third reduction. Pejić
et al. [39] devised a model that uses the age of vehicles and
their population size to determine the average mileage. The
model assumes an annual reduction in mileage of 5% for
passenger cars and small delivery vehicles, 5% for medium
trucks, 9.1% for large trucks, and 9% for buses.
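An age-based mileage model of this kind can be sketched in a few lines. The source only states the annual reduction rates; the compounding form m(a) = m0 (1 - r)^a, the first-year mileage m0, and the age distribution in the test are assumptions for illustration and not necessarily the formulation used by Pejić et al. [39]. The fleet average weights each age group by its population size, as the model description suggests.

```python
# Annual mileage reduction rates per vehicle category, as quoted for [39].
ANNUAL_REDUCTION = {
    "passenger_car": 0.05,
    "small_delivery": 0.05,
    "medium_truck": 0.05,
    "large_truck": 0.091,
    "bus": 0.09,
}


def mileage_at_age(first_year_km: float, age: int, category: str) -> float:
    """Annual mileage at a given age, ASSUMING the reduction compounds
    every year: m(a) = m0 * (1 - r) ** a."""
    r = ANNUAL_REDUCTION[category]
    return first_year_km * (1.0 - r) ** age


def fleet_average_mileage(first_year_km: float, fleet_by_age: dict[int, int],
                          category: str) -> float:
    """Population-weighted average annual mileage of a fleet given as
    {age_in_years: number_of_vehicles}."""
    total_vehicles = sum(fleet_by_age.values())
    total_km = sum(n * mileage_at_age(first_year_km, age, category)
                   for age, n in fleet_by_age.items())
    return total_km / total_vehicles
```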
However, simulation techniques have limitations when it comes
to capturing variations in emissions within vehicle classes and
supporting detailed analyses. Feature learning techniques show
promise in addressing uncertainties and improving
classification, but they have been underused for predicting
vehicle CO₂ emissions from high-dimensional datasets
[40-41]. Saleh et al. [42] combined deep learning and a
support vector machine (SVM) model to forecast CO₂
emissions from energy consumption and mileage monitoring;
the model achieved high predictive accuracy, as shown by its
low root mean square error (RMSE). Pei et al. [43] introduced
a method to estimate emissions and mileage from driving cycle
data. Their approach incorporates temporal features and a
clustering method, which improves accuracy. The proposed
driving cycle construction technique eliminates the need for
manually set parameters and is evaluated with visualizations
and the COPERT emission model; experimental results show
significant improvements in accuracy and robustness. Chrysos
et al. [44] provided a principled approach for studying
state-of-the-art classifiers as polynomial expansions. Their
work highlighted the prevalence of polynomial functions in
various classifiers and explained their underlying design
principles within a unified framework, which can be used to
compress models or to improve their performance.
IV. MATERIALS AND METHODS
A. SEMI-SUPERVISED CLUSTERING
Semi-supervised clustering aims to improve cluster accuracy
by finding better clusters than those obtained with
unsupervised learning algorithms [18, 45-49]. However,
semi-supervised clustering techniques often perform poorly
when applied in the original feature space. To enhance their
effectiveness, it is reasonable to integrate deep feature
learning [50-53]. The framework of the proposed clustering
approach is depicted in Fig. 3.
In contrast to commonly used semi-supervised clustering
methods that rely on feature extraction techniques, our
approach integrates three types of information (diffusion
labels, extracted core data, and extracted feature vectors) to
improve classification accuracy and to address challenges
such as imbalanced class distributions and overlap among
multiple classes.
Our proposed framework consists of four primary layers, the
first three of which were discussed in a prior study [14]. In
the first layer, we partition the labeled data into separate
training and testing sets, which are used for constructing and
evaluating classifiers, respectively. The second layer uses the
training set together with unlabeled data as input for the
feature learning process. This step yields cluster centroids,
which serve as a basis for projecting data from both the
training and
testing sets into a newly learned space. This projection also
allows feature vectors to be extracted in the subsequent
feature extraction step. In the classification step, we construct
AdaBoost [54], Random Forest [55], and semi-supervised
fuzzy C-means clustering (SSFCM) models using the feature
vectors derived from the training set. These models are then
used to predict labels for the corresponding feature vectors in
the testing set. The third layer compares the performance of
the three individual models and a fusion model to evaluate
their effectiveness in classifying data and predicting labels.
Lastly, in the fourth layer, the experimental outcomes of the
third layer are applied to a dataset of used cars. Here, we
apply polynomial regression independently to each vehicle
class to establish a model that calculates the average mileage
of a vehicle belonging to a specific class. To validate the
coefficients obtained from the experimental model, a
representative subset is randomly selected from each class and
compared with a real dataset for the given year.
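The per-class regression step can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes that each used-car record provides the vehicle age in years, an odometer reading in km, and a class label. It fits a cubic polynomial of odometer reading against age separately for each class with `numpy.polyfit` (degree three matches the title of Algorithm 7) and takes the average annual mileage at age a as p(a)/a. The function names are hypothetical.

```python
import numpy as np


def fit_class_polynomials(ages, odometer_km, classes, degree=3):
    """Fit one polynomial odometer(age) per vehicle class.

    ages, odometer_km, classes: equally long 1-D sequences
    (age in years, odometer reading in km, class label).
    Returns {class_label: np.poly1d}.
    """
    ages = np.asarray(ages, dtype=float)
    odometer_km = np.asarray(odometer_km, dtype=float)
    classes = np.asarray(classes)
    models = {}
    for label in np.unique(classes):
        mask = classes == label
        if mask.sum() <= degree:
            raise ValueError(f"class {label!r} has too few samples for degree {degree}")
        # Least-squares polynomial coefficients, highest power first
        coeffs = np.polyfit(ages[mask], odometer_km[mask], deg=degree)
        models[label] = np.poly1d(coeffs)
    return models


def average_annual_mileage(model, age):
    """Average km per year up to the given age (age > 0)."""
    return model(age) / age
```

Fitting each class separately keeps differences in usage patterns between classes (for example, micro versus large cars) from being averaged away. The validation step described above would compare the fitted curves with a random per-class subset of real records.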
B. SEMI-SUPERVISED FUZZY C-MEANS CLUSTERING
Semi-supervised fuzzy C-means clustering incorporates deep
feature learning to further improve its effectiveness and to
eliminate redundant information [56-58]. Let $u_{ki}$ denote
the membership degree of data element $X_k$ in the ith cluster.
It is computed as follows:
$$u_{ki} = \frac{1}{\sum_{j=1}^{C} \left( \frac{\lVert X_k - v_i \rVert_A}{\lVert X_k - v_j \rVert_A} \right)^{2/(m-1)}} \tag{1}$$
where C is the number of clusters; m is a weighting
exponent that determines the degree of fuzziness and was set
to 2 to ensure high membership values for each data point in
its closest cluster; and A is a positive-definite, symmetric
(n × n) weight matrix. The updated cluster center is
calculated as follows:
$$v_i = \frac{\sum_{k=1}^{N} u_{ki}^{m} X_k}{\sum_{k=1}^{N} u_{ki}^{m}} \tag{2}$$
The method minimizes the objective function J:
$$J(X; U, V) = \sum_{k=1}^{N} \sum_{i=1}^{C} u_{ki}^{m} \lVert X_k - v_i \rVert_A^{2}, \quad 1 \le m < \infty \tag{3}$$
$$\text{s.t.} \quad \sum_{i=1}^{C} u_{ki} = 1, \quad 0 \le u_{ki} \le 1 \tag{4}$$
where N is the number of data elements; $X_k$ is the kth
element of $X = \{X_1, X_2, \dots, X_N\}$; U is the fuzzy
partition matrix of the dataset X into C clusters;
$V = \{v_1, \dots, v_C\}$, where $v_i$ is the center vector of the ith
cluster; K denotes the features; and
$\lVert X_k - v_i \rVert_A^{2} = (X_k - v_i)^{\top} A (X_k - v_i)$ is the
squared distance between the kth data element and the ith
cluster center in the A-norm (the Euclidean distance when A
is the identity matrix).
C. STEPS OF DEEP SEMI-SUPERVISED FUZZY C-MEANS
CLUSTERING ALGORITHMS
The SSFCM algorithm comprises the following steps:
Algorithm 1 Membership and Centroid of FCM
Input: N data elements X = {X_1, X_2, …, X_N}, weight
matrix (A), number of clusters (C), degree of fuzziness
(m = 2), max iteration number (T), error threshold (ε)
Output: $u_{ki}$, $v_i$
Set t = 0
1. Initialize the centroid vectors $v_i$
2. Update t = t + 1
3. Calculate the membership degrees $u_{ki}$ using Eq. (1)
4. Calculate the updated centroid vectors $v_i$ using Eq. (2)
5. If $\lVert U^{t} - U^{t-1} \rVert < \varepsilon$ or t = T, then stop
6. Otherwise, repeat from step 2.
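A compact NumPy version of standard fuzzy C-means following Eqs. (1)-(2) and the loop of Algorithm 1 is sketched below. It is an illustration under stated assumptions, not the authors' code. The centroids are initialized with C randomly chosen data points, because Algorithm 1 does not specify the initialization, and A defaults to the identity matrix, which gives the Euclidean distance. The deep and semi-supervised extensions of Algorithms 2-4 are not included.

```python
import numpy as np


def fuzzy_c_means(X, C, m=2.0, A=None, T=100, eps=1e-5, seed=0):
    """Standard fuzzy C-means (Eqs. (1)-(2)).

    X: (N, d) data, C: number of clusters, m: fuzziness exponent,
    A: (d, d) symmetric positive-definite norm matrix (identity = Euclidean),
    T: max iterations, eps: threshold on the change of the membership matrix.
    Returns (U, V) with U of shape (N, C) and V of shape (C, d).
    """
    X = np.asarray(X, dtype=float)
    N, d = X.shape
    A = np.eye(d) if A is None else np.asarray(A, dtype=float)
    rng = np.random.default_rng(seed)
    # Step 1 (assumed initialization): C distinct data points as centroids
    V = X[rng.choice(N, size=C, replace=False)].copy()
    U_prev = np.zeros((N, C))
    for _ in range(T):
        # Squared A-norm distances ||X_k - v_i||_A^2, shape (N, C)
        diff = X[:, None, :] - V[None, :, :]
        d2 = np.einsum("nci,ij,ncj->nc", diff, A, diff)
        d2 = np.fmax(d2, 1e-12)  # avoid division by zero at a centroid
        # Eq. (1): u_ki = 1 / sum_j (d_ki / d_kj)^(2/(m-1)), written with squared distances
        ratio = (d2[:, :, None] / d2[:, None, :]) ** (1.0 / (m - 1.0))
        U = 1.0 / ratio.sum(axis=2)
        # Eq. (2): v_i = sum_k u_ki^m X_k / sum_k u_ki^m
        W = U ** m
        V = (W.T @ X) / W.sum(axis=0)[:, None]
        # Stop when the memberships no longer change
        if np.linalg.norm(U - U_prev) < eps:
            break
        U_prev = U
    return U, V
```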
Subsequently, algorithm 2 is employed to compute the
memberships and centroids of deep FCM.
Algorithm 2 Training Strategies for Deep FCM
Input: N data elements X = {X_1, X_2, …, X_N}, number of
clusters (C), clusters feature (K), labeled dataset (L),
unlabeled dataset (UN), membership degree (U), max
iteration number (T), error threshold (ε)
Output: $u^{L}_{ki}$, $v^{L}_{i}$, $u^{UN}_{ki}$, $v^{UN}_{i}$
Set t = 0
1. Initialize U (random for labeled data)
2. Update t = t + 1
a) Calculate $u^{L}_{ki}$, $v^{L}_{i}$
b) Calculate $u^{UN}_{ki}$, $v^{UN}_{i}$
c) If the stopping criterion $\lVert J^{t} - J^{t-1} \rVert < \varepsilon$ is
fulfilled for both the labeled and the unlabeled objective
functions, or t = T, then stop
3. Otherwise, repeat from step 2
Then, using Algorithm 3, we select the features (s ⊂ K) by
means of the random oversampling (ROS) technique. ROS is
used to keep the feature subsets of the labeled classes and the
unlabeled data elements balanced [14].
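Random oversampling itself is simple: randomly chosen samples of the smaller groups are duplicated until every group is as large as the largest one. The sketch below shows plain class-level ROS in NumPy. How [14] applies it to feature subsets of labeled classes and unlabeled data is not reproduced here, and the function name is hypothetical.

```python
import numpy as np


def random_oversample(X, y, seed=0):
    """Random oversampling (ROS): duplicate randomly chosen samples of each
    minority class until every class has as many samples as the largest one."""
    X = np.asarray(X)
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    labels, counts = np.unique(y, return_counts=True)
    target = counts.max()
    idx = []
    for label, count in zip(labels, counts):
        members = np.flatnonzero(y == label)
        # Draw (target - count) extra indices with replacement
        extra = rng.choice(members, size=target - count, replace=True)
        idx.append(np.concatenate([members, extra]))
    idx = np.concatenate(idx)
    return X[idx], y[idx]
```

Because ROS only duplicates existing samples, it balances the class counts without adding new information. It is therefore usually applied to the training portion only, so that duplicates do not leak into the test set.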
Algorithm 3 Feature Extraction of Deep FCM
Input: N data elements X = {X_1, X_2, …, X_N}, clusters feature
(K), labeled dataset ($X^{L}$), unlabeled dataset ($X^{UNL}$),
$\mu(D)$ mean of the elements of D, set of the centroids
($v^{L}$, $v^{UNL}$)
Output: Set of extracted features of the labeled and unlabeled
datasets
Set Q = ∅
1. Calculate $D^{L}_{ki} = \lVert X^{L}_{k} - v^{L}_{i} \rVert$
2. Calculate $D^{UNL}_{ki} = \lVert X^{UNL}_{k} - v^{UNL}_{i} \rVert$
3. Calculate the means $\mu(D^{L})$ and $\mu(D^{UNL})$ of the elements
4. Feature extraction: $f_{i}(X_k) = \max\left(0, \mu(D) - D_{ki}\right)$
a) for all L and UNL features do
5. Return the set Q
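The feature mapping of step 4 has the form of the "triangle" activation known from centroid-based feature learning: each sample receives one feature per centroid, equal to how much closer the sample is to that centroid than to the centroids on average, clipped at zero. The sketch below reflects our reading, not the authors' code. It assumes that D holds Euclidean distances to the centroids and that μ is the mean of each sample's distances, and it applies the mapping separately to labeled and unlabeled data with their own centroid sets. The function names are hypothetical.

```python
import numpy as np


def triangle_features(X, centroids):
    """f_i(X_k) = max(0, mu_k - D_ki), where D_ki = ||X_k - v_i|| and
    mu_k is the mean distance of sample k to all centroids (assumption)."""
    X = np.asarray(X, dtype=float)
    centroids = np.asarray(centroids, dtype=float)
    # D[k, i] = Euclidean distance between sample k and centroid i
    D = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    mu = D.mean(axis=1, keepdims=True)
    return np.maximum(0.0, mu - D)


def extract_features(X_labeled, V_labeled, X_unlabeled, V_unlabeled):
    """Apply the mapping to labeled and unlabeled data, each with its own
    centroid set, and return both feature matrices."""
    return (triangle_features(X_labeled, V_labeled),
            triangle_features(X_unlabeled, V_unlabeled))
```

The clipping at zero yields sparse features: a sample only activates the centroids it is closer to than average, which suppresses redundant information from distant clusters.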
In the following step, we use the Euclidean distance, a widely
used metric, to measure the similarity between labeled and
unlabeled feature vectors. The result is determined by finding
the maximum average of the maximum relevant and minimum
redundant features between each selected feature of the
unlabeled data and the labeled classes:
$$\max \mathrm{Sim}(X_j, v_i) = \min_{i} \lVert X_j - v_i \rVert = \min_{i} \sqrt{\sum_{l \in s} \left( X_{jl} - v_{il} \right)^{2}}, \quad 1 \le i \le c, \; X_j \in X^{UNL} \tag{5}$$
Finally, Algorithm 4 estimates the maximum average of the
maximum similarity between the selected features, which is
then used in the classifiers.
Algorithm 4 SSFCM Classifier
Input: N data elements X = {X_1, X_2, …, X_N} with minimum
features in any subset (s), set of the centroids ($v^{L}$, $v^{UNL}$)
of selected features
Output: Predicted labeled data (Q = {q_{L+1}, q_{L+2}, …, q_{L+N}})
Set Q = ∅
1. For each centroid index i ∈ {1, …, c} do
2. For each data element index j ∈ {1, …, N}, do the
following steps:
a) Employ Eq. (5) to calculate max Sim_i
b) If the maximum average of max Sim_i ∈ ith labeled class,
then
c) Append X_j to the ith labeled class
d) Update the set Q if a labeled class is achieved
e) For all
do
3. Return the set Q
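In its simplest form, Eq. (5) assigns each unlabeled feature vector to the labeled class whose centroid is closest, since maximum similarity corresponds to minimum Euclidean distance. The NumPy sketch below shows only this nearest-centroid core under the assumption of one centroid per labeled class; the "maximum average" aggregation over feature subsets used in Algorithm 4 is not reproduced, and the function name and class names are hypothetical.

```python
import numpy as np


def assign_to_nearest_class(X_unlabeled, class_centroids, class_labels):
    """Label each unlabeled feature vector with the class whose centroid is
    closest in Euclidean distance (highest similarity in Eq. (5))."""
    X = np.asarray(X_unlabeled, dtype=float)
    V = np.asarray(class_centroids, dtype=float)
    # D[j, i] = ||X_j - v_i|| over the selected features
    D = np.linalg.norm(X[:, None, :] - V[None, :, :], axis=2)
    nearest = D.argmin(axis=1)
    return [class_labels[i] for i in nearest]
```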
D. STATE-OF-THE-ART METHODS
To improve the accuracy and performance of classification,
two ensemble learning methods, Random Forest and
AdaBoost, are used [59-60]. Random Forest is a parallel
learning technique based on bagging: it trains multiple
decision trees on bootstrap samples drawn from the original
data, and these trees are built independently of one another.
Its purpose is to reduce the variance of the model.
Conversely, AdaBoost is a sequential learning approach that
builds decision stumps from the training data. Each decision
stump in this sequence depends on the previous one:
training samples misclassified by one stump receive higher
weights, so that the next stump concentrates on them.
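Algorithm 6 AdaBoost Classifier
Input: Data X whose number of elements N, training set
(S), decision tree in forest (B), subsample size (µ), max
iteration number (T)
1. Initialize the data weights $\{D_n^{(1)}\}$ to $1/N$
2. for t ∈ {1, …, T} do
a) find the best weak classifier $y_t(x)$ by minimizing the
weighted error function
$J_t = \sum_{n=1}^{N} D_n^{(t)}\, \mathbb{1}[y_t(x_n) \neq c_n]$,
where $c_n$ is the true label of $x_n$
b) compute
$\varepsilon_t = \frac{\sum_{n=1}^{N} D_n^{(t)}\, \mathbb{1}[y_t(x_n) \neq c_n]}{\sum_{n=1}^{N} D_n^{(t)}}$
c) assign the weight $\alpha_t = \ln\left(\frac{1-\varepsilon_t}{\varepsilon_t}\right)$ to classifier $y_t(x)$
d) update the data weights:
$D_n^{(t+1)} = D_n^{(t)} \exp\{\alpha_t\, \mathbb{1}[y_t(x_n) \neq c_n]\}$
e) normalize $D_n^{(t+1)}$ to be a proper distribution
Output: Make predictions using the final model:
$Y_T(x) = \operatorname{sign}\left( \sum_{t=1}^{T} \alpha_t\, y_t(x) \right)$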
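The steps of Algorithm 6 map directly onto code. The following is a from-scratch sketch of discrete binary AdaBoost with decision stumps (labels -1/+1) written for illustration, not the implementation used in the experiments. Classifying several vehicle classes would require a multi-class variant such as SAMME, which is not shown. The function names are hypothetical.

```python
import numpy as np


def stump_predict(X, f, thr, polarity):
    """Decision stump: +1 if polarity * (x_f - thr) >= 0, else -1."""
    return np.where(polarity * (X[:, f] - thr) >= 0, 1, -1)


def fit_stump(X, y, w):
    """Stump (feature, threshold, polarity) minimizing the weighted error J_t."""
    best = (np.inf, 0, 0.0, 1)
    for f in range(X.shape[1]):
        for thr in np.unique(X[:, f]):
            for polarity in (1, -1):
                err = w[stump_predict(X, f, thr, polarity) != y].sum()
                if err < best[0]:
                    best = (err, f, thr, polarity)
    return best[1:]


def adaboost(X, y, T=10):
    """Discrete AdaBoost following Algorithm 6 (binary labels -1/+1)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    N = len(y)
    D = np.full(N, 1.0 / N)                    # step 1: uniform weights
    ensemble = []
    for _ in range(T):                         # step 2
        f, thr, pol = fit_stump(X, y, D)       # a) minimize J_t
        miss = stump_predict(X, f, thr, pol) != y
        eps = D[miss].sum() / D.sum()          # b) weighted error
        eps = np.clip(eps, 1e-10, 1 - 1e-10)   # keep the logarithm finite
        alpha = np.log((1 - eps) / eps)        # c) classifier weight
        D = D * np.exp(alpha * miss)           # d) up-weight misclassified samples
        D = D / D.sum()                        # e) normalize
        ensemble.append((alpha, f, thr, pol))
    return ensemble


def adaboost_predict(ensemble, X):
    """Sign of the alpha-weighted sum of stump outputs."""
    X = np.asarray(X, dtype=float)
    score = sum(a * stump_predict(X, f, thr, pol) for a, f, thr, pol in ensemble)
    return np.where(score >= 0, 1, -1)
```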
Algorithm 5 Random Forests Classifier
Input: Training set (S), number of decision trees in the
forest (B), subsample size (µ), maximum iteration
number (T)
Output: Set K = ∅
1. For each iteration t ∈ {1, …, T} do
2. For each decision tree index b ∈ {1, …, B} do the
following steps:
a) Sample µ instances from S with replacement,
creating a subsample set S_t
b) Construct a decision tree K_t using decision tree b on
the subsample set S_t
c) Add the trained decision tree classifier K_t to set K
3. Return the set K
E. PERFORMANCE MEASURES
To evaluate the effectiveness of the algorithms, we derive the
following metrics from the confusion matrix:

| Actual value \ Predicted value (class i) | Positive | Negative | |
|---|---|---|---|
| Positive | True positive (TP) | False negative (FN) | Recall $R_i = \frac{TP}{TP+FN}$ |
| Negative | False positive (FP) | True negative (TN) | Specificity $= \frac{TN}{TN+FP}$ |
| | Precision $P_i = \frac{TP}{TP+FP}$ | Negative predictive value $= \frac{TN}{TN+FN}$ | |

F-measure: $F_i = \frac{2 P_i R_i}{P_i + R_i}$
Rand index: $RI = \frac{TP + TN}{TP + FP + FN + TN}$, $0 \le RI \le 1$, with TP, FP, FN, and TN counted over pairs of samples
Adjusted Rand index: $ARI = \frac{RI - E[RI]}{\max(RI) - E[RI]}$, $-1 \le ARI \le 1$
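These metrics can be computed directly from counts. The sketch below uses only the Python standard library and is not taken from the paper. It derives the per-class metrics from the four confusion-matrix counts, and it computes the Rand index and adjusted Rand index by pair counting over a contingency table, which is the usual way to obtain E[RI] and max(RI) (Hubert-Arabie adjustment). The function names are hypothetical.

```python
from math import comb


def binary_metrics(tp, fp, fn, tn):
    """Per-class metrics from the confusion-matrix counts of one class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "precision": precision,
        "recall": recall,
        "specificity": tn / (tn + fp),
        "npv": tn / (tn + fn),
        "f_measure": 2 * precision * recall / (precision + recall),
    }


def rand_indices(labels_true, labels_pred):
    """Rand index and adjusted Rand index by counting pairs of samples."""
    n = len(labels_true)
    table, row, col = {}, {}, {}
    for t, p in zip(labels_true, labels_pred):
        table[(t, p)] = table.get((t, p), 0) + 1
        row[t] = row.get(t, 0) + 1
        col[p] = col.get(p, 0) + 1
    sum_ij = sum(comb(c, 2) for c in table.values())  # pairs together in both
    sum_a = sum(comb(c, 2) for c in row.values())     # pairs together in truth
    sum_b = sum(comb(c, 2) for c in col.values())     # pairs together in prediction
    total = comb(n, 2)
    # Agreeing pairs: together in both, or apart in both
    ri = (total + 2 * sum_ij - sum_a - sum_b) / total
    expected = sum_a * sum_b / total
    max_index = (sum_a + sum_b) / 2
    ari = (sum_ij - expected) / (max_index - expected)
    return ri, ari
```

Unlike RI, the ARI is corrected for chance: a random labeling scores close to 0, and a permutation of the correct labels still scores 1.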
F. MODEL FUSION
Model fusion is an ensemble technique that combines multiple
predictive classification models, each with its own weight, to
improve the final estimate. It acts as a more robust
meta-classifier by using a majority voting estimator, which
helps overcome the limitations of individual classifiers and
yields higher classification accuracy. The two commonly used
types of voting classifiers are hard voting and soft voting. The
hard voting classifier gives equal weight to each classifier and
selects the mode of all predicted labels, whereas the soft
voting classifier considers the predicted probabilities of all
labels, which can be weighted differently for each classifier,
and selects the label with the highest combined probability.
The predictions of the voting classifiers can be defined as:
$$H_{vote}(x) = \arg\max_{c \in \{1, \dots, k\}} \sum_{j=1}^{n_T} \mathrm{lab}(x, j, c) \tag{6}$$
$$S_{vote}(x) = \arg\max_{c \in \{1, \dots, k\}} \frac{1}{n_T} \sum_{j=1}^{n_T} p(x, j, c) \tag{7}$$
where $H_{vote}(x)$ is the outcome of the hard voting process
and the indicator function $\mathrm{lab}(x, j, c)$ equals 1 if the jth
classifier assigns x to label c and 0 otherwise. $S_{vote}(x)$ is
the outcome of the soft voting process, and $p(x, j, c)$ is the
probability that the jth classifier assigns to label c for x. Here,
$n_T$ denotes the total number of classifiers and k the number
of labels.
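Eqs. (6) and (7) can be sketched with NumPy as follows. This is an illustrative implementation, not the authors' code. It assumes integer class labels 0, …, k-1 and, for soft voting, an array of predicted probabilities with shape (n_T, N, k); optional per-classifier weights turn the plain average of Eq. (7) into a weighted one. In practice, scikit-learn's `VotingClassifier` with `voting="hard"` or `voting="soft"` provides the same two schemes.

```python
import numpy as np


def hard_vote(pred_labels, n_labels):
    """Eq. (6): pred_labels has shape (n_T, N) with integer labels.
    Returns the most-voted label per sample (ties -> smallest label)."""
    pred_labels = np.asarray(pred_labels)
    votes = np.zeros((pred_labels.shape[1], n_labels), dtype=int)
    for row in pred_labels:
        # Each classifier adds one vote per sample: sum_j lab(x, j, c)
        votes[np.arange(len(row)), row] += 1
    return votes.argmax(axis=1)


def soft_vote(probas, weights=None):
    """Eq. (7): probas has shape (n_T, N, k) with p(x, j, c).
    Returns argmax_c of the (optionally weighted) average probability."""
    probas = np.asarray(probas, dtype=float)
    avg = np.average(probas, axis=0, weights=weights)
    return avg.argmax(axis=1)
```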
G. POLYNOMIALS AND DEEP CLASSIFIERS
Polynomials are mathematical expressions that combine
powers of input variables with coefficients. In regression
analysis, polynomial regression is used to model data whose
relationship with the input deviates from the linear
assumption of basic models [62-63]. When combined with
ensemble methods, polynomial regression can improve the
model's generalization performance; this combination can
reduce both bias and variance, resulting in better predictions
for unseen data. We adopt a principled approach and study
advanced classifiers as polynomial expansions. Polynomials
play a recurring role in various classifiers, and their design
choices can be interpreted within a unified framework.
Building on existing methods, we introduce extensions that
enhance classification accuracy. Specifically, we represent
state-of-the-art ensemble learning methods as polynomials,
which provides insight into their inductive bias for each
vehicle class and allows performance to be evaluated under
changes in the training distribution, such as limited samples
per class or a long-tailed distribution.
Algorithm 7 Third-degree polynomials
Input: Data X whose number of elements N, training set
(S), polynomial coefficients (C), degree of polynomial (t)
Output:
1. Set t = 3
2. Update t = t - 1
2. Initialize data weights w[n]
3.
[]()=,
[]()=,
[]()=
4. = (
[]()
[]())
[]() +
5. Y = (
[]
) +
V. EXPERIMENTS
A. DATA PREPARATION
In this study, the primary dataset is the Swiss Motor
Vehicle Information System (MOFIS) [64]. It contains
information about more than 4.7 million passenger vehicles,
including type approval numbers, physical characteristics,
weight properties, ownership information, technical
specifications, and registration dates. In addition, we
incorporated data on vehicle technical specifications from the
Technical Type Approval Information provided by the Federal
Roads Office (ASTRA) [65] and data on periodic technical
inspections from the Vehicles Expert Partner [66].
To match the goal of the paper, we divided the dataset into
two parts: a training set and a testing set. The training set
consisted of 308,824 passenger cars newly registered in 2018.
First, a filtering process removed vehicles that do not fit the
conventional definition of a passenger car, such as small
pickup trucks, standard pickup trucks, vans, special purpose
vehicles (SPVs), sports cars, and multi-purpose vehicles
(MPVs). These cars were then
categorized into types based on their make, model, and
manufacturer code, resulting in 366 unique passenger car
types. These types were further grouped into five classes:
18 in the micro class, 50 in the small class, 110 in the middle
class, 84 in the upper middle class, and 104 in the large and
luxury class. Because of the limitations of the unsupervised
FCM clustering algorithm, only labeled data with true labels
and a membership degree higher than 0.95 were used as the
core dataset. This core dataset was used to extract accurate
classifications and served as the foundation for the
subsequent training steps. Furthermore, 10% of the data from
each class was randomly selected as labeled training samples.
Finally, we used the used cars dataset [67], which consists of
1,880,417 entries. It contains the mileage covered by each car
and the car's estimated age, and it serves as the basis for
predicting the mileage of the different passenger car types.
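The selection of the core dataset and of the 10% labeled training samples can be sketched as follows. This is a hedged NumPy illustration, not the authors' code: the helper name `select_core_and_labeled`, the use of each sample's maximum FCM membership degree, and drawing the 10% sample from all samples of a class (rather than only from the core set) are assumptions. The class sizes in the usage example are the type counts reported above; the membership values are random placeholders.

```python
import numpy as np


def select_core_and_labeled(labels, membership, threshold=0.95, frac=0.10, seed=0):
    """Return (core_idx, labeled_idx) for a semi-supervised training setup.

    labels     : true class label of each sample
    membership : maximum fuzzy-membership degree of each sample (from FCM)
    core_idx   : samples whose membership degree exceeds `threshold`
    labeled_idx: `frac` of each class, drawn at random without replacement
                 (at least one sample per class)
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    membership = np.asarray(membership, dtype=float)

    core_idx = np.flatnonzero(membership > threshold)

    labeled = []
    for cls in np.unique(labels):
        # Assumption: sample from all members of the class, not only the core set.
        cls_idx = np.flatnonzero(labels == cls)
        n_pick = max(1, int(round(frac * cls_idx.size)))
        labeled.extend(rng.choice(cls_idx, size=n_pick, replace=False))
    return core_idx, np.sort(np.asarray(labeled))


# Usage: type counts per class as reported in the text; memberships are random.
counts = {"micro": 18, "small": 50, "middle": 110, "upper_middle": 84, "large_luxury": 104}
labels = np.repeat(list(counts), list(counts.values()))
membership = np.random.default_rng(1).uniform(0.5, 1.0, size=labels.size)
core_idx, labeled_idx = select_core_and_labeled(labels, membership)
print(labels.size, labeled_idx.size)  # 366 36
```

Stratifying per class keeps the small micro class represented in the labeled set even though it contributes only two samples at a 10% rate.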
B. EXPERIMENTAL SETUP AND RESULTS
The initial analysis revealed a strong correlation between
emissions, vehicle segments, sub-segments, and influencing
factors. To process the data, a combination of labeled and
unlabeled data was used together with the core dataset, and
principal component analysis was applied to address
multicollinearity. New features were extracted to reduce the
number of features, and a selection process based on
resampling and Euclidean distance was used to identify the
best features (Algorithms 2-4). Pseudo labels were assigned
to the unlabeled data for pre-training the different
classification algorithms (Algorithms 5-6). Model fusion was
then performed on the labeled data to improve accuracy. The
soft voting fusion model and the SSFCM algorithm achieved
the highest accuracy (Table 1). The final features extracted
from the model fusion were used to re-evaluate the single
algorithms and to select the final classification model. These
results indicate that the SSFCM algorithm extracts more
valuable information from the vehicle dataset and therefore
achieves higher recognition rates than the other classifiers.
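The difference between the hard and soft voting fusion reported in Table 1 can be illustrated with a small NumPy sketch. It assumes each base classifier (for example SSFCM, Random Forest, and AdaBoost) outputs a matrix of class probabilities of shape (samples, classes); the function names and the equal default weights are hypothetical choices, not the authors' fusion code.

```python
import numpy as np


def hard_vote(prob_list):
    """Majority vote over each model's arg-max class (ties go to the lowest class index).

    Assumption: each entry of prob_list has shape (n_samples, n_classes).
    """
    votes = np.stack([p.argmax(axis=1) for p in prob_list], axis=1)  # (n_samples, n_models)
    n_classes = prob_list[0].shape[1]
    counts = np.apply_along_axis(np.bincount, 1, votes, minlength=n_classes)
    return counts.argmax(axis=1)


def soft_vote(prob_list, weights=None):
    """Average the class-probability matrices (optionally weighted) and take the arg-max."""
    avg = np.average(np.stack(prob_list), axis=0, weights=weights)
    return avg.argmax(axis=1)


# One sample, two classes: one confident model against two hesitant ones.
p_model_a = np.array([[0.90, 0.10]])
p_model_b = np.array([[0.40, 0.60]])
p_model_c = np.array([[0.45, 0.55]])
models = [p_model_a, p_model_b, p_model_c]
print(hard_vote(models), soft_vote(models))  # [1] [0]
```

Soft voting lets one confident model outweigh two hesitant ones, which is one plausible reason it can outperform hard voting when the base models' probabilities are reasonably calibrated.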
Feature extraction rests on the assumption that it improves
classification results compared with the initial classifier's
predictions on the original features. In Algorithm 7,
specifically in the polynomial feature selection step, the
inter-class and intra-class classification results obtained
with SSFCM are applied to the used cars dataset. These
results cover five classes, each with its respective
sub-classes, as described in Table 2.
Average mileage data are extracted for used cars up to 20
years old, based on data from 2018. In addition, the 2015
dataset is analyzed in depth to examine the average mileage
of each vehicle class. The dataset is also extended to include
sports cars and MPVs, giving seven distinct car segments for
mileage analysis. Rigorous data quality checks are performed
to eliminate mileage records with unrealistic values, such as
zero mileage or a negative mileage difference between
consecutive years for a given vehicle. Fig. 4 compares the
inter-class differences using boxplots and gives an overview
of the relationship between mileage and age within each
class.
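The two quality rules named above can be sketched as a small filter. The record layout (vehicle identifier, year, odometer reading in km) and the choice to compare each reading with the vehicle's previous retained reading are assumptions made for illustration; `clean_mileage_records` is a hypothetical helper, not part of the authors' pipeline. Where a vehicle has only one reading, only the first rule applies.

```python
from collections import defaultdict


def clean_mileage_records(records):
    """Apply two plausibility rules to (vehicle_id, year, odometer_km) records.

    Hypothetical record layout. Rule 1: drop readings with zero (or negative)
    mileage. Rule 2: drop a reading lower than the vehicle's previous retained
    reading, i.e. a negative mileage difference between years.
    Returns retained records grouped by vehicle (first-seen order), sorted by year.
    """
    readings_by_vehicle = defaultdict(list)
    for vehicle_id, year, odometer_km in records:
        if odometer_km > 0:  # rule 1
            readings_by_vehicle[vehicle_id].append((year, odometer_km))

    cleaned = []
    for vehicle_id, readings in readings_by_vehicle.items():
        last_km = None
        for year, odometer_km in sorted(readings):
            if last_km is not None and odometer_km < last_km:  # rule 2
                continue
            cleaned.append((vehicle_id, year, odometer_km))
            last_km = odometer_km
    return cleaned


# Hypothetical records: vehicle "A" has a zero reading in 2017 and a drop in 2018.
records = [
    ("A", 2016, 30000), ("A", 2017, 0), ("A", 2018, 25000),
    ("A", 2019, 45000), ("B", 2018, 12000),
]
print(clean_mileage_records(records))
# [('A', 2016, 30000), ('A', 2019, 45000), ('B', 2018, 12000)]
```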
After data refinement, a third-degree polynomial is fitted to
the average mileage and age data (Fig. 5). This choice
reflects the life cycle pattern of vehicles: annual mileage is
typically highest in the initial years, then stabilizes, and
finally declines gradually. A third-degree polynomial can
capture this changing curvature and therefore represents
actual vehicle operation more accurately than a lower-degree
fit. To validate the coefficients of the resulting model, a
stratified sampling approach based on the number of unique
vehicles in selected intra-classes is employed. Specifically,
10% of the data from each class is randomly selected from the
SSFCM-labeled samples to represent the respective class
(Fig. 6). Finally, the resulting model is compared with an
existing model from 2015 for evaluation, as presented in
Table 3.
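Table 3 reports an R² value for each class polynomial. A minimal sketch of the coefficient of determination is shown below; `r_squared` is a hypothetical helper name, and since the definition of the separate "Accuracy" column in Table 3 is not stated, it is not reproduced here.

```python
import numpy as np


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (hypothetical helper name)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)             # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)      # total sum of squares
    return 1.0 - ss_res / ss_tot


# Toy check: one prediction off by 1 against observations 1, 2, 3.
print(r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 4.0]))  # 0.5
```

In the validation described above, `y_true` would hold the sampled average mileages of a class and `y_pred` the class polynomial evaluated at the same ages (for example with `np.polyval`).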
C. DISCUSSIONS
The experimental results show a significant decrease in the
overall fleet size of each vehicle class within the age range
of up to three years. This reduction can be attributed to the
ongoing scarcity of used cars aged three years or younger,
which are consistently 17% less available than vehicles in
other age ranges. Despite this decline in fleet size, the
average age of passenger cars in Switzerland continued to
increase throughout the study period, rising from 9 years in
2018 to 9.3 years by the end of 2021. This trend suggests that
older vehicles remain in use for longer periods. It may also
reflect a growing interest in electric vehicles among some
owners, who may be postponing the replacement of their
current cars.
Furthermore, a newly purchased vehicle was observed to
cover an average of 17,935 km per year. After 5 years, this
annual distance decreased by 25%, and after 10 years by
40%. Although cars account for the majority of passenger
kilometers in Switzerland, mileage varies notably between
rural and urban areas, particularly for older vehicles. For
instance, 10-year-old vehicles in cities travel approximately
20% fewer kilometers on average than their rural
counterparts.
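Assuming both reductions are measured relative to the new-vehicle figure, the stated percentages imply the following annual distances at 5 and 10 years of age ($m_{5}$ and $m_{10}$ are introduced here only for this illustration):

$$
\begin{aligned}
m_{5} &= 17{,}935 \times (1 - 0.25) \approx 13{,}451\ \text{km/year},\\
m_{10} &= 17{,}935 \times (1 - 0.40) = 10{,}761\ \text{km/year}.
\end{aligned}
$$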
The mileage distribution in the various segments tends to
shift towards higher values. The range of annual driving
distances is also wide: some vehicles travel only a few
thousand kilometers per year, while others cover several tens
of thousands of kilometers. Moreover, vehicle mileage is not
constant over the lifespan. It generally decreases with age;
the decline is non-linear during the first ten years and
becomes more linear thereafter. Across all segments, annual
mileage is halved over a span of 20 years. To estimate the
average mileage, we considered the entire operational period
and used a polynomial model with vehicle age and population
size as input features for each vehicle class. The
experimental results show discrepancies between the
estimated and the actual vehicle data. To quantify them, we
validated the model against the actual data for 2015, as
shown in Fig. 7. The difference arises mainly in the
accumulated mileage after vehicles reach five years of age,
indicating that used cars generally accumulate more mileage
than predicted. This underscores the need to update the
model coefficients regularly, and we recommend doing so
every three to five years. In addition, the chosen model
coefficients were validated on a randomly selected sample
within each vehicle class, which confirmed their applicability
and reliability. Except for sports cars, we also observed a
strong positive correlation (R² > 0.90) between the proposed
estimated mileage and the data provided by the federal
vehicle control authority for all vehicle classes. Thus,
although the mileage was assessed with two distinct
approaches, the results are highly correlated.
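To make the class models concrete, the sketch below evaluates some of the Table 3 polynomials at a given vehicle age. The coefficients are copied from Table 3; reading x as vehicle age in years and y as average annual mileage in km is an assumption based on the description of the fit, and the dictionary and function names are hypothetical.

```python
import numpy as np

# Cubic coefficients (x^3, x^2, x, constant) copied from Table 3.
CLASS_POLYNOMIALS = {
    "micro": [-1.5608, 44.105, -554.94, 13382.0],
    "small": [-1.0057, 30.805, -466.24, 14674.0],
    "middle": [0.2663, -22.979, -70.834, 17908.0],
    "upper_middle": [1.2903, -58.968, 109.1, 20252.0],
}


def estimated_mileage(vehicle_class, age_years):
    """Evaluate the class polynomial at the given age (assumed unit: km per year)."""
    return float(np.polyval(CLASS_POLYNOMIALS[vehicle_class], age_years))


for cls in CLASS_POLYNOMIALS:
    print(cls, round(estimated_mileage(cls, 10), 1))
```

For the micro class, for instance, the curve gives 13,382 km at age 0 and about 7,439 km at age 20, in line with the observation that annual mileage roughly halves over 20 years.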
Our previous findings indicated significant variations in
average CO2 emissions among different vehicle classes [14].
This underscores the importance of considering average
mileage both within and between vehicle classes to
effectively address emission reductions. Our observations
also revealed that the average mileage of SUVs tends to
increase as the vehicles age. Notably, the SUV fleet in
Switzerland covered 12.6 billion kilometers in 2018,
resulting in unnecessary CO2 emissions with every kilometer
traveled (Fig. 8). The integration of inter-class and
intra-class classification therefore offers crucial insights
for developing strategies to transform the passenger vehicle
fleet and promote decarbonization.
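A fleet-level distance such as the SUV total quoted above can be approximated as the number of vehicles in each age group multiplied by the estimated annual mileage at that age, summed over all ages; this is one way population size can enter the model. The sketch below illustrates that aggregation with the micro-class polynomial from Table 3. The vehicle counts are hypothetical placeholders, the helper name is hypothetical, and this is not the calculation behind Fig. 8.

```python
import numpy as np


def fleet_mileage_km(coeffs, vehicles_by_age):
    """Sum over age groups of (vehicle count x estimated annual mileage).

    coeffs          : cubic coefficients, highest degree first (as in Table 3)
    vehicles_by_age : mapping {age in years: number of vehicles} (hypothetical data)
    """
    ages = np.array(list(vehicles_by_age.keys()), dtype=float)
    counts = np.array(list(vehicles_by_age.values()), dtype=float)
    return float(np.sum(counts * np.polyval(coeffs, ages)))


# Hypothetical example: 1,000 micro-class cars at each age from 1 to 3 years.
micro = [-1.5608, 44.105, -554.94, 13382.0]
total_km = fleet_mileage_km(micro, {1: 1000, 2: 1000, 3: 1000})
print(f"{total_km / 1e6:.2f} million km")
```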
VI. CONCLUSION
Accurate estimation of average annual vehicle mileage is
essential for effective emission analyses and for informed
decisions in sustainable transport planning. Incorrect or
unreliable mileage values can result in misguided incentives
and long-term consequences. This study therefore aimed to
establish a precise model for calculating average vehicle
mileage, enabling a better understanding of the influence of
vehicle segments on real CO2 emissions. To develop the model,
mileage data for vehicles up to 20 years old in 2018 were
analyzed extensively. Vehicles were classified with a
mathematical model based on their technical and dimensional
features, and the mileage model used population size and
vehicle age as inputs for calculating the average mileage
within each vehicle class. The results showed that the actual
mileage covered by vehicles in Switzerland exceeded the
estimated mileage, particularly after five years of vehicle
age. The model's validity was assessed against actual data
from 2015, leading to the recommendation to update the model
coefficients every three to five years. In addition, the
accuracy of the selected model coefficients was confirmed on
a randomly selected sample within each vehicle class,
demonstrating their applicability and reliability.
Overall, this study developed a model for accurately
calculating average vehicle mileage. The proposed approach
offers several advantages, including automated vehicle
classification of large databases, which facilitates fleet
analysis. The clustering-based mathematical segmentation
also allows standardized comparisons of databases across
different regions. Furthermore, since mileage varies with
vehicle age, we observed that the average mileage of SUVs
tends to increase as they age. Combining inter-class and
intra-class classification is therefore essential for gaining
insights to formulate fleet transformation strategies aimed at
decarbonizing the passenger vehicle fleet. A promising area
for future research is the use of CO2 estimates derived from
real-world measurements instead of relying solely on type
approval values. This would enable a more precise evaluation
of fleet CO2 emissions and further improve our understanding
of the environmental impact of vehicles. The comprehensive
analysis and the mileage model developed in this study
contribute to advancing CO2 emission analysis, informing
sustainable transport planning, and paving the way for
effective fleet transformation strategies to reduce CO2
emissions in the passenger vehicle sector.
ACKNOWLEDGMENT
The authors thank the Federal Roads Office (FEDRO) for
providing the Swiss Vehicle Information System (MOFIS)
data and the vehicle technical dataset, and the Vehicle Expert
Partners for providing the expert segmentation data.
REFERENCES
[1] The Paris Agreement | UNFCCC. https://unfccc.int/process-and-meetings/the-paris-agreement/the-paris-agreement, Accessed: Sep. 2021.
[2] EU Reference Scenario 2016, Energy, Transport and GHG Emissions. Trends to 2050, European Commission, 2016. https://ec.europa.eu/energy/sites/ener/files/documents/ref2016_report_final-web.pdf
[3] P. Capros, D. Vita, A. Tasios, N. Siskos, P. Kannavou, M. Petropoulos,
A. Evangelopoulou, S. Zampara, et al., EU Reference Scenario 2016 -
Energy, transport and GHG emissions Trends to 2050. European
Commission Directorate-General for Energy, Directorate-General for
Climate Action and Directorate-General for Mobility and Transport, 2016,
Luxembourg.
[4] World Energy Outlook 2018, Int. Energy Agency, Paris, France, 2018.
[5] Federal Office for the Environment (FOEN). Accessed: Oct. 2023. [Online]. Available: https://www.bafu.admin.ch/bafu/en/home/topics/climate/state/data/greenhouse-gasinventory/transport.html.
[6] International Council on Clean Transportation (ICCT), Nov. 2016. [Online]. Available: https://theicct.org/wp-content/uploads/2022/01/FactSheet_FromLabToRoad_ICCT_2016_EN.pdf, Accessed: Jul. 2023.
[7] R. E. Wilson, J. Anable, S. Cairns, T. Chatterton, S. Notley, and J. D. Lees-Miller, “On the estimation of temporal mileage rates,” Procedia – Social and Behavioral Sciences, vol. 80, pp. 139-156, May 2013.
[8] L. Fridstrøm, V. Østli, and K. W. Johansen, “A stock-flow cohort model of the national car fleet,” Eur. Transp. Res. Rev., vol. 8, no. 22, Aug. 2016, doi: 10.1007/s12544-016-0210-z.
[9] S. Caserini, C. Pastorello, P. Gaifami, and L. Ntziachristos, “Impact of the dropping activity with vehicle age on air pollutant emissions,” Atmospheric Pollution Research, vol. 4, no. 3, pp. 282-289, Jul. 2013, doi: 10.5094/APR.2013.031.
[10] V. Williams, S. McLaughlin, R. McCall, and T. Buche, “Motorcyclists' self-reported riding mileage versus actual riding mileage in the following year,” Journal of Safety Research, vol. 63, pp. 121-126, Oct. 2017, doi: 10.1016/j.jsr.2017.10.004.
[11] European Commission, Reducing CO2 Emissions from Passenger Cars. https://ec.europa.eu/clima/policies/transport/vehicles/cars_en, Accessed: Aug. 2020.
[12] N. Niroomand, C. Bach, M. Elser, “Vehicle dimensions based
passenger car classification using Fuzzy and Non-Fuzzy clustering
methods”, Transportation Research Record, 2021. DOI:
10.1177/03611981211010795
[13] N. Niroomand, C. Bach and M. Elser, "Robust Vehicle Classification
Based on Deep Features Learning," in IEEE Access, vol. 9, pp. 95675-
95685, 2021, doi: 10.1109/ACCESS.2021.3094366.
[14] N. Niroomand, C. Bach and M. Elser, “Segment-based CO2 emission evaluations from passenger cars based on deep learning techniques,” in IEEE Access, vol. 9, pp. 166314-166327, 2021, doi: 10.1109/ACCESS.2021.3135604.
[15] W. Shi, Y. Gong, C. Ding, Z. Ma, X. Tao, and N. Zheng, “Transductive semi-supervised deep learning using min-max features,” in Proc. Eur. Conf. Comput. Vis. (ECCV), vol. 11209. Cham, Switzerland: Springer, 2018, pp. 299-315.
[16] X. Zhu, “Semi-Supervised learning literature survey”, 2008.
http://pages.cs.wisc.edu/~jerryzhu/research/ssl/semireview.html.
[17] L. Zhuo, L. Y. Jiang, et al., “Vehicle Classification for Large-Scale Traffic Surveillance Videos Using Convolutional Neural Networks,” 2016.
[18] G. Forestier and C. Wemmert, “Semi-supervised learning using
multiple clusterings with limited labeled data”, Inf. Sci., vols. 361-362, pp.
48-65, Sep. 2016.
[19] O. Chapelle, B. Schölkopf, and A. Zien, Semi-Supervised Learning, 1st ed. Cambridge, MA, USA: The MIT Press, 2006.
[20] S. Melacci and M. Belkin, “Laplacian support vector machines trained
in the primal”, J. Mach. Learn. Res., vol. 12, pp. 1149-1184, Jul. 2011.
[21] A. Arshad, S. Riaz and L. Jiao, “Semi-Supervised Deep Fuzzy C-Mean
Clustering for Imbalanced Multi-Class Classification”, in IEEE Access, vol.
7, pp. 28100-28112, 2019, doi: 10.1109/ACCESS.2019.2901860.
[22] H. Wu and S. Prasad, “Semi-supervised deep learning using pseudo
labels for hyperspectral image classification”, IEEE Trans. Image Process.,
vol. 27, no. 3, pp. 1259-1270, Mar. 2018.
[23] Y. Ren, G. Zhang, G. Yu, “Random subspace based semi-supervised
feature selection”, in: Proceedings of International Conference on Machine
Learning and Cybernetics, pp. 113–118, 2011.
[24] Swiss Federal Office of Energy (SFOE), CO₂ emission regulations for new cars and light commercial vehicles, https://www.bfe.admin.ch/bfe/en/home/efficiency/mobility/co2-emission-regulations-for-new-cars-and-light-commercial-vehicles.html, Accessed: Sep. 2023.
[25] Swiss Federal Office of Energy (SFOE), Mobility behavior of the population, https://www.bfs.admin.ch/bfs/de/home/statistiken/mobilitaetverkehr/personenverkehr/verkehrsverhalten.html, Accessed: Nov. 2023.
[26] Swiss Federal Office of Energy (SFOE), Vollzug der CO2-Emissionsvorschriften für Personenwagen 2022 [Enforcement of the CO2 emission regulations for passenger cars 2022], https://www.bfe.admin.ch/bfe/de/home/effizienz/mobilitaet/co2-emissionsvorschriften-fuer-neue-personen-und-lieferwagen/personenwagen-pw.exturl.html/aHR0cHM6Ly9wdWJkYi5iZmUuYWRtaW4uY2gvZGUvc3VjaGU_a2/V5d29yZHM9NDcw.html, Accessed: Nov. 2023.
[27] Stanford earth matters magazine, COVID lockdown causes record drop
in carbon emissions for 2020. Stanford University,
https://earth.stanford.edu/news, Accessed: Oct. 2021
[28] F. Grelier, “CO₂ Emissions From Cars: The Facts,” Technical Report, European Federation for Transport and Environment AISBL, Brussels, Belgium, 2018.
[29] G. Fontaras, N.-G. Zacharof, and B. Ciuffo, “Fuel consumption and CO₂ emissions from passenger cars in Europe – Laboratory versus real-world emissions,” Progress in Energy and Combustion Science, vol. 60, pp. 97-131, 2017.
[30] J. Pavlovic, K. Anagnostopoulos, M. Clairotte, V. Arcidiacono, G. Fontaras, I. P. Rujas, V. V. Morales, and B. Ciuffo, “Dealing with the Gap between Type-Approval and In-Use Light Duty Vehicles Fuel Consumption and CO₂ Emissions: Present Situation and Future Perspective,” Transportation Research Record, vol. 2672, pp. 23-32, 2018.
[31] H. Dai, P. Mischke, X. Xie, Y. Xie, and T. Masui, “Closing the gap? Top-down versus bottom-up projections of China’s regional energy use and CO₂ emissions,” Appl. Energy, vol. 162, pp. 1355-1373, 2016.
[32] S. D. Tuladhar, M. Yuan, P. Bernstein, W. D. Montgomery, and A. Smith, “A top-down bottom-up modeling approach to climate change policy analysis,” Energy Econ., vol. 31, pp. S223-S234, 2009.
[33] D. P. Van Vuuren, M. Hoogwijk, T. Barker, K. Riahi, S. Boeters, J. Chateau, S. Scrieciu, J. van Vliet, T. Masui, K. Blok, et al., “Comparison of top-down and bottom-up estimates of sectoral and regional greenhouse gas emission reduction potentials,” Energy Policy, vol. 37, pp. 5125-5139, 2009.
[34] N. Karali, T. Xu, and J. Sathaye, “Reducing energy consumption and CO₂ emissions by energy efficiency measures and international trading: A bottom-up modeling for the U.S. iron and steel sector,” Appl. Energy, vol. 120, pp. 133-146, 2014.
[35] P. Thunis, B. Degraeuwe, K. Cuvelier, M. Guevara, L. Tarrason, and A. Clappier, “A novel approach to screen and compare emission inventories,” Air Qual. Atmos. Health, vol. 9, pp. 325-333, 2016.
[36] Y. Natarajan, G. Wadhwa, K. R. Sri Preethaa, and A. Paul, “Forecasting Carbon Dioxide Emissions of Light-Duty Vehicles with Different Machine Learning Algorithms,” Electronics, vol. 12, no. 10, p. 2288, 2023, doi: 10.3390/electronics12102288.
[37] J. L. Jimenez, J. Valido, and N. Molden, “The drivers behind differences between official and actual vehicle efficiency and CO₂ emissions,” Transp. Res. Part D, vol. 67, pp. 628-641, 2019.
[38] L. Winslott Hiselius and L. Smidfelt Rosqvist, “Segmentation of the current levels of passenger mileage by car in the light of sustainability targets – The Swedish case,” Journal of Cleaner Production, vol. 182, pp. 331-337, 2018, doi: 10.1016/j.jclepro.2018.02.072.
[39] G. Pejić, F. Bijelić, G. Zovak, and Z. Lulić, “Model for Calculating Average Vehicle Mileage for Different Vehicle Classes Based on Real Data: A Case Study of Croatia,” Promet-Traffic Transp., vol. 31, pp. 213-222, 2019.
[40] Z. He, G. Ye, H. Jiang, Y. Fu, "Vehicle Emission Detection in Data-
Driven Methods", Mathematical Problems in Engineering, vol. 2020,
Article ID 4875310, 13 pages, 2020. https://doi.org/10.1155/2020/4875310.
[41] C. Saleh, N. R. Dzakiyullah, and J. B. Nugroho, “Carbon dioxide emission prediction using support vector machine,” IOP Conference Series: Materials Science and Engineering, vol. 114, no. 1, p. 012148, 2016, doi: 10.1088/1757-899X/114/1/012148.
[42] M. Ghahramani and F. Pilla, “Analysis of Carbon Dioxide Emissions from Road Transport Using Taxi Trips,” in IEEE Access, vol. 9, pp. 98573-98580, 2021, doi: 10.1109/ACCESS.2021.3096279.
[43] L. Pei, Y. Cao, Y. Kang, Z. Xu and Z. Zhao, “UJ-FLAC: Unsupervised
Joint Feature Learning and Clustering for Dynamic Driving Cycles
Construction”, in IEEE Transactions on Intelligent Transportation Systems,
vol. 23, no. 8, pp. 10970-10982, Aug. 2022, doi:
10.1109/TITS.2021.3098353.
[44] G. G. Chrysos, M. Georgopoulos, J. Deng, and Y. Panagakis, “Polynomial networks in deep classifiers,” arXiv preprint arXiv:2104.07916, 2021.
[45] Z. Dong, Y. Wu, M. Pei, et al., “Vehicle type classification using a semi-supervised convolutional neural network,” IEEE Trans. Intelligent Transportation Systems, vol. 16, no. 4, pp. 2247-2256, 2015.
[46] Y. Ren, X. Hu, K. Shi, G. Yu, D. Yao, and Z. Xu, “Semi-supervised DenPeak clustering with pairwise constraints,” in Proceedings of the Fifteenth Pacific Rim International Conference on Artificial Intelligence, pp. 837-850, 2018.
[47] Y. Qin, S. Ding, L. Wang, and Y. Wang, “Research Progress on Semi-Supervised Clustering,” Cognitive Computation, vol. 11, pp. 599-612, 2019.
[48] Y. Ren, X. Hu, K. Shi, G. Yu, D. Yao, and Z. Xu, “Semi-supervised DenPeak clustering with pairwise constraints,” in Proceedings of the Fifteenth Pacific Rim International Conference on Artificial Intelligence, pp. 837-850, 2018.
[49] G. Forestier and C. Wemmert, “Semi-supervised learning using multiple clusterings with limited labeled data,” Information Sciences, vol. 361-362, pp. 48-65, 2016.
[50] A. Arshad, S. Riaz, L. Jiao, and A. Murthy, “Semi-supervised deep
fuzzy c-mean clustering for software fault prediction”, IEEE Access, vol. 6,
pp. 25675-25685, 2018.
[51] A. Arshad, S. Riaz, L. Jiao, and A. Murthy, “The empirical study of
semi-supervised deep fuzzy c-mean clustering for software fault
prediction”, IEEE Access, vol. 6, pp. 47047-47061, 2018.
[52] W. Shi, Y. Gong, C. Ding, Z. Ma, X. Tao, and N. Zheng, “Transductive semi-supervised deep learning using min-max features,” in Proceedings of the European Conference on Computer Vision (ECCV), pp. 299-315, 2018.
[53] G. Chen, “Deep transductive semi-supervised maximum margin clustering,” Comput. Res. Repository (CoRR), pp. 1-14, arXiv:1501.06237.
[54] S. Wang and X. Yao, “Multiclass imbalance problems: Analysis and
potential solutions”, IEEE Trans. Syst., Man, Cybern., B (Cybern.), vol. 42,
no. 4, pp. 1119-1130, Aug. 2012.
[55] A. Verikas, A. Gelzinis, and M. Bacauskiene, “Mining data with random forests: A survey and results of new tests,” Pattern Recognit., vol. 44, pp. 330-349, 2011.
[56] A. Arshad, S. Riaz and L. Jiao, “Semi-Supervised Deep Fuzzy C-Mean
Clustering for Imbalanced Multi-Class Classification”, in IEEE Access, vol.
7, pp. 28100-28112, 2019, doi: 10.1109/ACCESS.2019.2901860.
[57] S. Riaz, A. Arshad and L. Jiao, “A Semi-Supervised CNN With Fuzzy
Rough C-Mean for Image Classification”, in IEEE Access, vol. 7, pp.
49641-49652, 2019, doi: 10.1109/ACCESS.2019.2910406.
[58] Y. Ren, X. Hu, K. Shi, G. Yu, D. Yao, Z. Xu, “Semi-supervised
denpeak clustering with pairwise constraints”, in: Proceedings of the
Fifteenth Pacific Rim International Conference on Artificial Intelligence,
pp. 837–850, 2018.
[59] T. Hasanin, T. M. Khoshgoftaar, J. Leevy, N. Seliya, “Investigating
Random Under-sampling and Feature Selection on Bioinformatics Big
Data”. 2019 IEEE Fifth International Conference on Big Data Computing
Service and Applications (BigDataService). pp. 346-356, doi:
10.1109/BigDataService.2019.00063.
[60] R. Kumar and R. Verma, “Classification algorithms for data mining: a
survey,” International Journal of Innovations in Engineering and
Technology (IJIET), vol. 1, no. 2, pp. 7–14, 2012.
[61] C. Macdonald and I. Ounis, “Voting for candidates: adapting data
fusion techniques for an expert search task,” in Proceedings of the 15th
ACM International Conference on Information and Knowledge
Management, pp. 387–396, Arlington, VA, USA, November 2006.
[62] G. G. Chrysos, S. Moschoglou, G. Bouritsas, J. Deng, Y. Panagakis, and S. P. Zafeiriou, “Deep polynomial neural networks,” IEEE Transactions on Pattern Analysis and Machine Intelligence (T-PAMI), pp. 1-1, 2021.
[63] Z. Chen, K. Batselier, J. A. K. Suykens and N. Wong, “Parallelized
Tensor Train Learning of Polynomial Classifiers”, in IEEE Transactions on
Neural Networks and Learning Systems, vol. 29, no. 10, pp. 4621-4632, Oct.
2018, doi: 10.1109/TNNLS.2017.2771264.
[64] Das Motorfahrzeuginformationssystem der Eidgenössischen Fahrzeugkontrolle, MOFIS. https://www.experience-online.ch/de/9-case-study/2023-mofis. Accessed: Mar. 2019.
[65] Bundesamt für Strassen [Federal Roads Office], ASTRA. https://www.astra.admin.ch/astra/de/home.html, Accessed: Mar. 2019.
[66] Schweizer Partner für Fahrzeugdaten [Swiss partner for vehicle data]. https://www.auto-i-dat.ch. Accessed: Mar. 2020.
[67] Autoscout24. https://www.autoscout24.ch/de. Accessed: Mar. 2023.
Naghmeh Niroomand is a techno-energy economist
at the ZHAW Zurich University of Applied
Sciences. Prior to joining ZHAW, she was with the
Automotive Powertrain Technologies Laboratory of
the Swiss Federal Laboratories for Materials Science
and Technology (Empa). She received her MA, first
PhD, and IAPM degrees from the Eastern
Mediterranean University, Cyprus, and Queen's
University, Canada, and her second PhD degree from
SSPH, Switzerland. She has since worked as a research
fellow in the Transport and Mobility Laboratory at
EPFL Lausanne and as a senior scientist at Cambridge Resources International,
USA. Her current research interests include vehicle fleet and operational
analysis, retrospective analysis of vehicle-specific changes as a function of
spatial, technological, and economic framework conditions, and the economics of
synthetic energy carriers.
Christian Bach is head of the Automotive
Powertrain Technologies Laboratory of the Swiss
Federal Laboratories for Materials Science and
Technology (Empa). He received his B.Sc. degree
in automotive engineering from the University of
Applied Sciences in Bern. He completed two
internships at the Haagen-Smit Laboratory of the
California Air Resources Board in El Monte, USA,
to study zero and ultra-low emission technologies
in the transport sector. He is a lecturer at ETH Zurich
and a member of several expert groups in Switzerland.
TABLE 1. Evaluation of model performance on a dataset with a labeled rate of 10% from each class
(Accuracy and precision rates: feature learning techniques; training and test accuracy: feature extraction techniques)
Technique | Method | Accuracy rate | Precision rate | Training accuracy | Test accuracy
Algorithm 4 | SSFCM | 0.954 | 0.953 | 0.952 | 0.904
Algorithm 5 | Random Forest | 0.902 | 0.89 | 0.903 | 0.837
Algorithm 6 | AdaBoost | 0.891 | 0.871 | 0.781 | 0.715
Model fusion | Hard voting | 0.921 | 0.935 | - | -
Model fusion | Soft voting | 0.942 | 0.956 | - | -

TABLE 2. Inter-class and intra-class classification of passenger cars using SSFCM in the year 2018.
Vehicle class | Sub-classes | Power (kW) | Fuel type | Class representatives
Micro class | Not-SUV, SUV | Q1 < 54; 54 ≤ Q2 ≤ 66; Q3 > 66 | Petrol, Diesel | SMART Fortwo 451, TOYOTA Aygo AB1, FIAT 500 312, SUZUKI Jimny FJ
Small class | Not-SUV, SUV | Q1 < 70; 70 ≤ Q2 ≤ 103; Q3 > 103 | Diesel, Petrol | VW Polo AW, AUDI S1 8X, ALFA ROMEO MiTo 955, SUZUKI Vitara LY
Middle class | Not-SUV, SUV | Q1 < 103; 103 ≤ Q2 ≤ 135; Q3 > 135 | Diesel, Petrol | DACIA Duster SR, TOYOTA C-HR AX1, ALFA ROMEO Giulietta 940, VW Tiguan 5N
Upper middle class | Not-SUV, SUV | Q1 < 114; 114 ≤ Q2 ≤ 185; Q3 > 185 | Diesel, Petrol | SKODA Octavia 5E, ALFA ROMEO Giulia 952, AUDI A4 B8, HYUNDAI Santa TM
Large & luxury class | Not-SUV, SUV | Q1 < 135; 135 ≤ Q2 ≤ 240; Q3 > 240 | Diesel, Petrol | AUDI A6 4G, BMW 5er G5L, MERCEDES-BENZ AMG 212, AUDI Q7 4L
TABLE 3. Accuracy of polynomial model coefficients validated on 10% randomly chosen SSFCM labeled samples within the vehicle classes.
Vehicle class / sub-class | Polynomial regression (k = 3) | R² value | Accuracy
Micro class | y = -1.5608x³ + 44.105x² - 554.94x + 13382 | 0.875 | 0.952
Small class | y = -1.0057x³ + 30.805x² - 466.24x + 14674 | 0.739 | 0.903
Middle class | y = 0.2663x³ - 22.979x² - 70.834x + 17908 | 0.910 | 0.964
Middle class: SUV | y = 1.128x³ - 60x² + 533.35x + 14567 | 0.787 | 0.926
Upper middle class | y = 1.2903x³ - 58.968x² + 109.1x + 20252 | 0.850 | 0.937
Upper middle class: SUV | y = 0.9501x³ - 29.58x² - 270.75x + 19330 | 0.841 | 0.931
Large & luxury class | y = 6.0331x³ - 234.5x² + 1931.3x + 18118 | 0.877 | 0.943
Large & luxury class: SUV | y = 3.7257x³ - 140.56x² + 1063.2x + 16550 | 0.836 | 0.904
MPVs | y = -1.9785x³ + 78.987x² - 1290.5x + 20673 | 0.824 | 0.918
Sports cars | y = -2.489x³ + 100.31x² - 1226.2x + 17701 | 0.715 | 0.817
FIGURE 1. Characterizing vehicle fleet composition structure-type data input framework. Internal combustion engine (ICE), mild hybrid electric vehicle
(MHEV), full hybrid electric vehicle (HEV), plug-in hybrid electric vehicle (PHEV), battery electric vehicle (BEV), fuel cell electric vehicle (FCEV), new European
driving cycle (NEDC) and worldwide harmonized light vehicles test procedure (WLTP).
FIGURE 2. Monthly normative CO2 emissions 2012-2022. Data source: ASTRA (IVZ/TARGA), BFE (CO 2 enforcement data).
FIGURE 3. Structure of the proposed semi-supervised deep learning and polynomial regression approach.
FIGURE 4. Overall comparison of inter-class differences (boxplots A and B) and the mileage-age relationship in each segment (boxplot C).
FIGURE 5. SSFCM classifier and polynomial regression performed on each segment.
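Figures 3 and 5 rely on an SSFCM classifier to assign vehicles to segments. Its exact formulation is not reproduced in this part of the paper, so the following is only a simplified stand-in and not the authors' SSFCM: standard fuzzy C-means with fuzzifier `m`, in which samples that already carry a label keep a fixed one-hot membership and therefore anchor the cluster prototypes. The function name `fcm_clamped`, the clamping scheme and all defaults are assumptions; SSFCM variants in the literature usually add a supervision term to the FCM objective instead.

```python
import math
import random


def fcm_clamped(points, labels, c, m=2.0, iters=100, seed=0):
    """Fuzzy C-means where labeled samples keep one-hot memberships (hypothetical sketch).

    points: list of equal-length numeric tuples (feature vectors).
    labels: per-point cluster index in 0..c-1, or None if unlabeled.
    Returns (centers, u) with u[i][k] = membership of point i in cluster k.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Initial memberships: one-hot for labeled points, random rows summing to 1 otherwise.
    u = []
    for lab in labels:
        if lab is None:
            row = [rng.random() + 1e-9 for _ in range(c)]
            total = sum(row)
            u.append([v / total for v in row])
        else:
            u.append([1.0 if k == lab else 0.0 for k in range(c)])
    centers = []
    for _ in range(iters):
        # Prototype update: mean of all points weighted by u**m.
        centers = []
        for k in range(c):
            w = [u[i][k] ** m for i in range(n)]
            total = sum(w)
            centers.append(tuple(
                sum(w[i] * points[i][d] for i in range(n)) / total for d in range(dim)))
        # Membership update (standard FCM rule), applied to unlabeled points only.
        p = 2.0 / (m - 1.0)
        for i in range(n):
            if labels[i] is not None:
                continue  # labeled samples stay clamped to their class
            dist = [math.dist(points[i], ctr) for ctr in centers]
            zero = [k for k in range(c) if dist[k] == 0.0]
            if zero:  # point coincides with one or more prototypes
                u[i] = [1.0 / len(zero) if k in zero else 0.0 for k in range(c)]
                continue
            u[i] = [1.0 / sum((dist[k] / dist[j]) ** p for j in range(c))
                    for k in range(c)]
    return centers, u
```

A hard segment label for an unlabeled vehicle `i` can then be taken as the cluster with the largest membership, for example `max(range(c), key=lambda k: u[i][k])`.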
FIGURE 6. Third-order polynomial regression applied to each vehicle segment, shown together with a 10% sample of average mileage in selected intra-class groups and the average mileage for the year 2015.
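Table 3 reports third-order fits validated on a randomly chosen 10% of the SSFCM-labeled samples. A minimal sketch of that kind of check is given below, assuming ordinary least squares on the monomials x³, x², x, 1 and R² = 1 - SS_res/SS_tot computed on the hold-out samples. The helper names (`fit_cubic`, `r_squared`, `holdout_r2`), the random split and the seed are hypothetical, and the normal-equation solver is used only to keep the sketch dependency-free; a library routine such as `numpy.polyfit` would be the more robust choice in practice.

```python
import random


def fit_cubic(xs, ys):
    """Least-squares cubic fit via the normal equations; returns (a, b, c, d)."""
    # Design-matrix rows [x^3, x^2, x, 1] (highest power first, as in Table 3).
    rows = [[x ** 3, x ** 2, x, 1.0] for x in xs]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(4)] for i in range(4)]
    aty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(4)]
    # Gaussian elimination with partial pivoting on the augmented 4x5 system.
    mat = [ata[i] + [aty[i]] for i in range(4)]
    for col in range(4):
        piv = max(range(col, 4), key=lambda r: abs(mat[r][col]))
        mat[col], mat[piv] = mat[piv], mat[col]
        for r in range(col + 1, 4):
            f = mat[r][col] / mat[col][col]
            for k in range(col, 5):
                mat[r][k] -= f * mat[col][k]
    # Back substitution.
    coef = [0.0] * 4
    for i in range(3, -1, -1):
        acc = sum(mat[i][k] * coef[k] for k in range(i + 1, 4))
        coef[i] = (mat[i][4] - acc) / mat[i][i]
    return tuple(coef)


def r_squared(ys, preds):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def holdout_r2(xs, ys, frac=0.1, seed=0):
    """Fit on ~90% of the samples and report R^2 on a random ~10% hold-out."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    n_test = max(1, round(frac * len(idx)))
    test, train = idx[:n_test], idx[n_test:]
    a, b, c, d = fit_cubic([xs[i] for i in train], [ys[i] for i in train])
    preds = [((a * xs[i] + b) * xs[i] + c) * xs[i] + d for i in test]
    return (a, b, c, d), r_squared([ys[i] for i in test], preds)
```

With noise-free values generated from the Micro-class row of Table 3 for x = 0 to 19, the fit recovers that row's coefficients and the hold-out R² is essentially 1; real mileage data will give lower values, like those reported in Table 3.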
FIGURE 7. Comparison of actual and estimated average mileage.
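Figure 7 compares actual and estimated average mileage. The Accuracy column of Table 3 is not defined in this part of the paper; one common choice, assumed here, is 1 minus the mean absolute percentage error (MAPE). The function `compare_mileage` below is a hypothetical helper that computes the per-item relative errors, the MAPE and that accuracy figure.

```python
def compare_mileage(actual, estimated):
    """Relative errors, MAPE and (assumed) accuracy = 1 - MAPE."""
    if len(actual) != len(estimated) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("actual mileage values must be non-zero")
    # Relative error of each estimate with respect to the actual value.
    rel_err = [abs(e - a) / abs(a) for a, e in zip(actual, estimated)]
    mape = sum(rel_err) / len(rel_err)
    return {"relative_errors": rel_err, "mape": mape, "accuracy": 1.0 - mape}
```

For example, actual values of 10000 and 20000 km with estimates of 11000 and 19000 km give relative errors of 10% and 5%, a MAPE of 7.5% and an accuracy of 0.925.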
FIGURE 8. Distribution of mileage within selected passenger car segments, together with a 10% sample of average mileage in specific intra-class groups. Boxplots show the median, the 25% and 75% quartiles and the mean (×) of the mileage in each passenger car segment.
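The boxplots in Figure 8 summarize each segment by its median, its 25th and 75th percentiles and its mean. A minimal sketch using Python's `statistics` module is shown below. The quartile method is an assumption: `method="inclusive"` interpolates linearly between order statistics, which matches NumPy's default linear percentile interpolation; `boxplot_stats` is a hypothetical helper.

```python
import statistics


def boxplot_stats(mileages):
    """Quartiles and mean of a segment's mileage values, as drawn in a boxplot."""
    # Three cut points splitting the data into four equal-probability groups.
    q1, median, q3 = statistics.quantiles(mileages, n=4, method="inclusive")
    return {"q1": q1, "median": median, "q3": q3, "mean": statistics.fmean(mileages)}
```

At least two values per segment are required, since `statistics.quantiles` cannot interpolate from a single observation.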