Classification of Travel Modes from Cellular Network Data Using Machine Learning Algorithms

October 2021

DOI:10.1109/ELMAR52657.2021.9550817

Conference: 63rd International Symposium ELMAR-2021
At: Zadar, Croatia

Authors:

Leo Tisljaric

University of Zagreb

Dominik Cvetek

University of Zagreb

Valentin Vareškić

iOLAP

Martin Gregurić

University of Zagreb

Classification of Travel Modes from Cellular Network Data Using Machine Learning Algorithms

Leo Tišljarić¹, Dominik Cvetek¹, Valentin Vareškić², Martin Gregurić¹

¹Faculty of Transport and Traffic Sciences, University of Zagreb
Vukelićeva 4, HR-10000 Zagreb, Croatia

²iOLAP d.o.o.
Prolaz Marije Krucifikse Kozulić 1, HR-51000, Rijeka, Croatia

ltisljaric@fpz.unizg.hr

Abstract—Data availability in recent years has grown exponentially, allowing researchers in the transport sector to harness valuable information regarding traffic flows. In that sense, cellular network data represents valuable traffic information when dealing with spatially large areas due to its property of collecting route data using distant mobile base stations. This property enables the automatic collection of origin-destination data, which is traditionally collected using field or online questionnaires. This paper aims to present the possibility of using origin-destination data extracted from a cellular network dataset to classify travel modes. A case study was performed on the dataset collected in the City of Rijeka, Croatia. The dataset is evaluated on five machine learning algorithms, which resulted in Random Forest as the highest performing algorithm with an accuracy score of 99.93%.

Keywords—Travel mode classification; Machine learning; Cellular network data; Origin-destination matrices

I. INTRODUCTION

Deployment of various traffic sensors, combined with the increased development of data processing techniques and computing power, resulted in exponential growth of available traffic datasets in the recent decade. Consequently, many data-driven road traffic-related research topics emerged, such as traffic data fusion models [1], [2], traffic state estimation and prediction [3], [4], and traffic control [5].

In this paper, a methodology for the classification of Origin-Destination (OD) data extracted from cellular network data using machine learning algorithms is presented. The paper's primary goal is to compare the performances of the most used machine learning algorithms to identify and propose the best-performing ones.

Authors in [6] compared classification algorithms for selecting the travel mode. Random Forest (RF) and K-Nearest Neighbors (KNN) were compared. RF showed better accuracy for travel mode classification applied to Global Navigation Satellite System (GNSS) historical data and data streaming. Zhang et al. in [7] analyzed the navigational behavior of users using GNSS traces. The goal of analyzing the data is to enable the construction of a more fine-tuned optimal route. Authors in [8] discuss a new travel mode classification approach using a Convolutional Neural Network (CNN) on accelerometer data. Authors in [9] analyze the potential and limitations of using cellular network data for traffic analysis. Kalatian and Shafahi [10] calculate travel times as the primary attributes for clustering. Trips between the same origin and destination zones are combined in a cluster. KNN is used to calculate clusters representing particular travel modes: walking, public transportation, or driving a private car. Authors in [11] compare three geometry-based mode classification methods and three supervised methods to classify trips extracted from the cellular network.

The contributions of this paper are as follows: (i) a methodology for processing OD-based datasets for machine learning algorithms, (ii) an evaluation of machine learning algorithms for classification of travel modes using an OD dataset, and (iii) an application of the proposed methodology and evaluation to a real-life dataset from the City of Rijeka, Croatia.

The rest of the paper is organized as follows. Section II presents the methodology: it describes the usage of OD matrices, the dataset, the preprocessing steps, and the feature selection. Then, every machine learning algorithm used is described with its advantages and disadvantages. Section III presents the results of the evaluation of five machine learning algorithms. The conclusion and future work directions are given in Section IV.

II. METHODOLOGY

A. Origin-destination matrix

Cells in the OD matrix represent the number of trip records in the observed time period, where each trip is realized as unimodal. Fig. 1 represents the used OD matrix, where the trips are recorded in the City of Rijeka, Croatia. Data was collected and aggregated for an average working day in a year, with data from the holiday season excluded. The data contains records divided into 48 sectors covering the wide city area represented in Fig. 2.

The OD matrix is mostly used as an input for traffic simulations, where the matrix is used for generating the trips that represent the initial traffic model of the observed area [12]. The matrix can also be used as an input for traffic state estimation, anomaly detection, or prediction models because it highlights the areas of increased traffic activity [13].
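As a minimal sketch of how such a matrix can be assembled, the snippet below counts trips per origin-destination pair. The record layout (tuples of 0-based sector ids) and the function name `build_od_matrix` are assumptions made for illustration, not the format of the original dataset.

```python
from collections import Counter

def build_od_matrix(trips, n_sectors):
    """Count trips per (start sector, end sector) pair.

    `trips` is an iterable of (start_sector, end_sector) tuples with
    0-based sector ids; this record layout is assumed for illustration.
    """
    counts = Counter(trips)
    # Rows are origins, columns are destinations.
    return [[counts[(o, d)] for d in range(n_sectors)]
            for o in range(n_sectors)]

# Toy example with 3 sectors instead of the 48 used in the paper.
toy_trips = [(0, 1), (0, 1), (1, 2), (2, 0), (0, 1), (1, 1)]
od = build_od_matrix(toy_trips, n_sectors=3)
for row in od:
    print(row)
```

Each row sum gives the number of trips leaving a sector, and each column sum gives the number of trips arriving in it.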

63rd International Symposium ELMAR-2021, 13-15 September 2021, Zadar, Croatia

Attribute       Description
Interval        Time interval when the trip started.
Duration        Duration of the trip [s].
Start sector    ID of the trip starting sector.
End sector      ID of the trip ending sector.
Air distance    Euclidean distance between start and end of the trip [m].
Air speed       Speed obtained using the Euclidean distance [m/s].
Road distance   Real road distance between start and end of the trip [m].
Road speed      Speed obtained using the road distance [m/s].
Mode            Travel mode used for the trip.

TABLE I. Attributes of the dataset

B. Data

The dataset contains around 500,000 OD records extracted from the cellular network, which represent trip data. Every trip record is described with nine attributes presented in Table I. The attribute 'Interval' represents the time interval in which the trip is recorded, with possible values of 00:00-06:00, 06:00-09:00, 09:00-14:00, 14:00-18:00, or 18:00-00:00. 'Duration' represents the total trip duration in seconds. 'Air distance' and 'Air speed' represent the computed Euclidean distance and the speed calculated using the Euclidean distance and the difference in time from the trip start to end. 'Road distance' and 'Road speed' represent the computed actual distance and speed of vehicles when traveling using road infrastructure. The 'Mode' attribute represents the travel mode used for completing the trip, where the recorded travel modes are 'car', 'public transport', and 'walking'.
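The two speed attributes can be illustrated with a short sketch. It assumes that both speeds are obtained by dividing the respective distance by the trip duration, as the descriptions in Table I suggest; the field names and the numbers are invented for the example.

```python
def speed_m_per_s(distance_m, duration_s):
    """Average speed in m/s from a distance [m] and a trip duration [s]."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    return distance_m / duration_s

# Hypothetical trip record; field names and values are invented.
trip = {"duration": 600.0, "air_distance": 3000.0, "road_distance": 4200.0}
air_speed = speed_m_per_s(trip["air_distance"], trip["duration"])
road_speed = speed_m_per_s(trip["road_distance"], trip["duration"])
print(air_speed, road_speed)  # 5.0 7.0
```

Because the road distance is at least as long as the straight-line distance, the road speed of a trip is never lower than its air speed.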

C. Data preprocessing

Distributions of the observed attributes are shown in the diagonal images in Fig. 3. Due to highly skewed distributions, we used the adjusted box plot method to remove anomalies because it makes no parametric assumptions and uses the medcouple as a robust skewness estimator [14]. The anomaly detection resulted in excluding trips that had a duration longer than 9,000 s (2.5 h).
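The adjusted box plot rule can be sketched as follows. This is an illustrative stdlib-only version, not the code used in the study: the medcouple is computed naively in quadratic time, pairs in which both values equal the median are skipped (a simplification of the full tie handling), and the duration values are invented. The fence formulas follow the exponential adjustment described in [14].

```python
import math
import statistics

def medcouple(values):
    """Naive O(n^2) medcouple, a robust skewness measure in [-1, 1].

    Simplification: pairs in which both values equal the median are
    skipped instead of using the special tie kernel of the full definition.
    """
    xs = sorted(values)
    med = statistics.median(xs)
    lower_half = [x for x in xs if x <= med]
    upper_half = [x for x in xs if x >= med]
    kernel = [((hi - med) - (med - lo)) / (hi - lo)
              for lo in lower_half for hi in upper_half if hi != lo]
    return statistics.median(kernel)

def adjusted_fences(values):
    """Lower and upper outlier fences of the adjusted box plot [14]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    mc = medcouple(values)
    if mc >= 0:
        # Right-skewed data: shrink the lower whisker, stretch the upper one.
        return (q1 - 1.5 * math.exp(-4 * mc) * iqr,
                q3 + 1.5 * math.exp(3 * mc) * iqr)
    return (q1 - 1.5 * math.exp(-3 * mc) * iqr,
            q3 + 1.5 * math.exp(4 * mc) * iqr)

# Invented trip durations in seconds, right-skewed like the real attribute.
durations = [300, 420, 480, 600, 660, 780, 900, 1200, 1800, 3600, 12000]
lower, upper = adjusted_fences(durations)
kept = [d for d in durations if lower <= d <= upper]
print(f"fences: [{lower:.0f}, {upper:.0f}], kept {len(kept)} of {len(durations)}")
```

For right-skewed data the medcouple is positive, so the upper fence moves further out than the classic 1.5 IQR rule would place it, and fewer long trips are flagged as anomalies.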

Figure 1. OD matrix for recorded trips in the City of Rijeka and its surroundings

Figure 2. Sectors (spatial zones) for the City of Rijeka

After the anomaly detection, feature scaling is conducted. To mitigate the influence of different units and large differences in attribute values, all attributes are scaled to the [0, 1] range.
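Scaling to the [0, 1] range is commonly done with min-max normalization. The sketch below assumes that this is the scaling meant here; the helper name and the sample values are illustrative only.

```python
def min_max_scale(column):
    """Linearly rescale a list of numbers to the [0, 1] range."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant column carries no information; map it to 0.
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]

# Invented durations in seconds.
durations_s = [300.0, 1200.0, 9000.0]
scaled = min_max_scale(durations_s)
print(scaled)
```

scikit-learn offers the same transformation as `MinMaxScaler` in `sklearn.preprocessing`.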

After the data preprocessing, 504,419 trip records were used for further analysis. The most dominant travel mode was 'car' with 398,729 records, followed by 'public transport' with 83,060 records and 'walking' with 22,630 records.

D. Feature selection

After examining all dataset attributes presented in Table I, four features were selected for further analysis and the learning process: 'Duration', 'Air speed', 'Road distance', and 'Road speed'.

When considering the correlation plots in Fig. 3, it can be observed that the attributes 'Road distance' and 'Air distance' provide redundant information, so one of them can be removed. We removed the 'Air distance' attribute from the learning because road distance is a more informative attribute for traffic data analysis.
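A redundancy check of this kind can also be expressed numerically. The sketch below uses the Pearson correlation coefficient as an assumed measure (the paper relies on visual inspection of the correlation plots), and the distance values are invented.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)

# Invented distances in meters for four trips.
air_distance = [1000, 2500, 4000, 8000]
road_distance = [1400, 3300, 5600, 10500]
r = pearson(air_distance, road_distance)
print(f"correlation: {r:.4f}")
```

A coefficient close to 1 indicates that one attribute is nearly a linear function of the other, so keeping both adds little information.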

E. Machine learning algorithms

This section presents the five machine learning algorithms used in this research for classification of travel modes, with their corresponding advantages and disadvantages. We analyze the following algorithms: (i) Decision Tree (DT), (ii) K-Nearest Neighbors, (iii) Logistic Regression (LR), (iv) Naive Bayes (NB), and (v) Random Forest.

1) K-Nearest Neighbors: Machine learning algorithm that solves classification and regression problems. KNN is a non-parametric classification method that classifies data points based on a similarity measure. The algorithm calculates the distance between the query point and each training point using the preferred distance metric, sorts the distances from smallest to largest, and picks the first K entries; the query is assigned the most common class among them [15]. The main disadvantage of KNN is that it becomes significantly slower with increasing data volume, which makes it an impractical choice in environments where rapid predictions are required. The main advantages of KNN for classification are its very simple application and its robustness in terms of search space; for example, classes do not have to be linearly separable.
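The KNN procedure described above can be sketched in a few lines. Euclidean distance and K = 3 are assumptions chosen for the example, and the scaled feature values (duration, road speed) are invented.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Majority vote among the k training points closest to `query`."""
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Invented (duration, road speed) pairs, already scaled to [0, 1].
train_X = [(0.10, 0.90), (0.15, 0.80), (0.20, 0.85),
           (0.80, 0.10), (0.85, 0.05),
           (0.50, 0.40), (0.55, 0.45)]
train_y = ["car", "car", "car",
           "walking", "walking",
           "public transport", "public transport"]

print(knn_predict(train_X, train_y, (0.12, 0.88)))  # car
print(knn_predict(train_X, train_y, (0.82, 0.08)))  # walking
```

Every prediction scans the whole training set, which is the source of the slowdown on large datasets mentioned above.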

2) Decision Tree: Machine learning algorithm that solves both classification and regression tasks. A DT can be used to visually represent the decision-making process using a tree-like model of decisions. The main advantages of a DT are that it does not require scaling and normalization, and it is very intuitive and easy to explain. The main disadvantages are that training often takes longer, because the calculations can become far more complex than in other algorithms, and that small changes in the data can lead to a large change in the structure of the optimal decision tree [16].
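To illustrate how a single node of a decision tree can be chosen, the sketch below searches for the threshold on one feature that minimizes the weighted Gini impurity. The Gini criterion is an assumption (it is the scikit-learn default, but the paper does not state which criterion was used), and the speed values are invented.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_threshold(values, labels):
    """Threshold on one feature that minimizes the weighted Gini impurity."""
    best = (float("inf"), None)
    candidates = sorted(set(values))
    for lo, hi in zip(candidates, candidates[1:]):
        t = (lo + hi) / 2  # midpoint between neighboring feature values
        left = [l for v, l in zip(values, labels) if v <= t]
        right = [l for v, l in zip(values, labels) if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        best = min(best, (score, t))
    return best  # (weighted impurity, threshold)

# Invented road speeds in m/s with their travel modes.
road_speed = [1.2, 1.4, 1.5, 9.0, 11.0, 13.5]
modes = ["walking", "walking", "walking", "car", "car", "car"]
print(best_threshold(road_speed, modes))  # (0.0, 5.25)
```

A full tree repeats this search over all features and applies it recursively to the resulting subsets.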

3) Random Forest: Machine learning algorithm that can solve both classification and regression problems. RF builds multiple decision trees and merges them together to get more accurate and stable predictions [17]. One of the benefits of using RF is its ability to handle large datasets with higher dimensionality, and it is an effective method for estimating missing data. The main limitation is that a large number of trees can make the algorithm too slow and ineffective for real-time predictions.
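Two ingredients of this ensemble idea, bootstrap resampling of the training rows and majority voting over the per-tree predictions, can be sketched as follows. Tree training and the random feature subsets of a full random forest are omitted; the threshold rules standing in for trained trees are hypothetical.

```python
import random
from collections import Counter

def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement; each tree trains on one such sample."""
    return [rng.choice(rows) for _ in rows]

def majority_vote(predictions):
    """Merge the per-tree class predictions into a single label."""
    return Counter(predictions).most_common(1)[0][0]

rng = random.Random(0)
rows = [("trip", i) for i in range(10)]
sample = bootstrap_sample(rows, rng)

# Stand-ins for trained trees: hypothetical threshold rules on road speed [m/s].
trees = [
    lambda s: "car" if s > 6.0 else "walking",
    lambda s: "car" if s > 4.0 else "walking",
    lambda s: "car" if s > 9.0 else "walking",
]
print(majority_vote([tree(7.5) for tree in trees]))  # car
```

Since every tree must be evaluated for each prediction, the prediction time grows with the number of trees, which is the limitation noted above.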

4) Logistic Regression: In a classification problem, the output variable can only take discrete values for a given set of inputs, and LR models the data using the sigmoid function. The advantage is that LR is easy to implement and interpret, and very efficient to train. The disadvantage is that it is challenging to capture complex relationships. In high-dimensional datasets, this can lead to the model overfitting the training set. Non-linear problems cannot be solved with LR because it has a linear decision boundary [18].
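The role of the sigmoid function and of the linear decision boundary can be shown with a small sketch for the binary case. The weights and bias are hypothetical, not fitted values, and the extension to three classes is not shown.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function mapping any real z to (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(weights, bias, features):
    """Probability of the positive class for one feature vector."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)

# Hypothetical parameters for two scaled features.
weights, bias = [4.0, -3.0], -0.5
p_fast = predict_proba(weights, bias, [0.9, 0.1])
p_slow = predict_proba(weights, bias, [0.1, 0.9])
print(round(p_fast, 3), round(p_slow, 3))
```

The predicted class changes exactly where the weighted sum `z` crosses zero, which is a hyperplane in feature space; this is why the model cannot separate classes that require a curved boundary.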

Figure 3. Distributions and correlations between attributes

Figure 4. Total accuracy for machine learning algorithms

5) Naive Bayes: NB is a classification technique based on Bayes' theorem with an assumption of independence among predictors. It is used to discriminate between different objects based on specific features. The main advantage of NB is that it requires a small amount of training data. When the assumption of independent predictors holds true, the classifier performs better compared to other models. The main limitation of NB is the assumption of independent predictors. In real-life scenarios, the predictors are dependent, and it is almost impossible to get a set of predictors that are entirely independent [19].
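The independence assumption can be made concrete with a Gaussian variant of Naive Bayes, in which each feature contributes its own likelihood term. The Gaussian likelihood is an assumption for this sketch (the paper does not state which NB variant was used), and the training points are invented.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(X, y):
    """Per-class prior plus per-feature mean and variance."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        cols = list(zip(*rows))
        means = [sum(c) / len(c) for c in cols]
        # Small constant keeps the variance strictly positive.
        variances = [sum((v - m) ** 2 for v in c) / len(c) + 1e-9
                     for c, m in zip(cols, means)]
        model[label] = (len(rows) / len(X), means, variances)
    return model

def predict_gaussian_nb(model, row):
    """Pick the class with the highest log posterior (up to a constant)."""
    def log_posterior(label):
        prior, means, variances = model[label]
        # Independence assumption: per-feature log likelihoods are summed.
        log_lik = sum(
            -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
            for x, m, var in zip(row, means, variances)
        )
        return math.log(prior) + log_lik
    return max(model, key=log_posterior)

# Invented scaled (road speed, duration) pairs.
X = [(0.90, 0.10), (0.80, 0.20), (0.85, 0.15),
     (0.10, 0.90), (0.20, 0.80), (0.15, 0.85)]
y = ["car", "car", "car", "walking", "walking", "walking"]
nb_model = fit_gaussian_nb(X, y)
print(predict_gaussian_nb(nb_model, (0.88, 0.12)))  # car
```

Correlated features, such as a distance and the speed derived from it, are counted as independent evidence by this sum, which is how the assumption can hurt accuracy.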

III. RESULTS

This section presents the evaluation of the cellular network dataset on five machine learning algorithms. For each algorithm, total accuracy is presented, alongside precision, recall, F-1 score, and the corresponding confusion matrices. The input dataset is divided into training and test sets using the standard ratio of 70% for training and 30% for testing. The experiments are done using the Python programming language with the scikit-learn package [20]. The code used for this research is publicly available in a GitHub repository [21].
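A 70/30 split can be sketched with the standard library alone. The shuffling, the fixed seed, and the function name are assumptions for illustration; the paper does not state whether the split was shuffled or stratified.

```python
import random

def split_train_test(records, labels, test_share=0.3, seed=42):
    """Shuffle indices with a fixed seed and cut off the test share."""
    indices = list(range(len(records)))
    random.Random(seed).shuffle(indices)
    n_test = round(len(indices) * test_share)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    X_train = [records[i] for i in train_idx]
    X_test = [records[i] for i in test_idx]
    y_train = [labels[i] for i in train_idx]
    y_test = [labels[i] for i in test_idx]
    return X_train, X_test, y_train, y_test

# Placeholder data: 100 record ids with a synthetic three-class label.
records = list(range(100))
labels = [r % 3 for r in records]
X_train, X_test, y_train, y_test = split_train_test(records, labels)
print(len(X_train), len(X_test))  # 70 30
```

In scikit-learn, `train_test_split` from `sklearn.model_selection` with `test_size=0.3` provides the same kind of split.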

The total accuracy of the algorithms is shown in Fig. 4. It can be observed that RF achieved the best result with a total accuracy of 99.35%. KNN and DT achieved high accuracy scores, while LR and NB failed to achieve sufficient scores.

Figure 5. Precision, recall and F1 score values for machine learning algorithms

Fig. 5 presents precision, recall, and F-1 values for every algorithm. We report the precision calculated as TP/(TP + FP), recall as TP/(TP + FN), and F-1 as the harmonic mean of precision and recall, where TP stands for true positive, FP for false positive, and FN for false negative values.
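The three metrics can be computed directly from the counts of true positives, false positives, and false negatives of one class. The counts below are invented for illustration.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F-1 score from per-class counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # Harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Invented counts for one class.
p, r, f1 = precision_recall_f1(tp=80, fp=20, fn=10)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

For these counts the precision is 80/100, the recall is 80/90, and the F-1 score equals 2TP/(2TP + FP + FN) = 160/190.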

Generally, confusion matrices, also called error matrices, show the number of true versus predicted class labels. Fig. 6 presents confusion matrices with normalized values for all observed algorithms. The confusion matrices also confirm RF as the best performing algorithm with well-separated classes. The KNN and DT algorithms achieved good separation between classes, NB could not separate the 'car' and 'public transport' classes, while LR could not separate 'car' from 'public transport' and 'walking' from 'public transport'.
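A normalized confusion matrix can be built as sketched below. Normalizing each row by the number of true samples of that class is an assumption, since the paper does not state which normalization was used; the labels are invented.

```python
def confusion_matrix(true_labels, predicted_labels, classes):
    """Rows are true classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, predicted_labels):
        matrix[index[t]][index[p]] += 1
    return matrix

def normalize_rows(matrix):
    """Divide each row by its sum so that every row adds up to 1."""
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([v / total if total else 0.0 for v in row])
    return normalized

classes = ["car", "public transport", "walking"]
y_true = ["car", "car", "car", "car",
          "public transport", "public transport", "walking", "walking"]
y_pred = ["car", "car", "car", "public transport",
          "public transport", "car", "walking", "walking"]

cm = confusion_matrix(y_true, y_pred, classes)
for row in normalize_rows(cm):
    print(row)
```

With row normalization, the diagonal entries are the per-class recall values, so a poorly separated class shows up as a diagonal value well below 1.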

IV. CONCLUSION

This paper presents a methodology for processing the OD dataset extracted from cellular network mobile records. The paper's main goal was to evaluate the dataset on the most used machine learning algorithms for classification tasks and report the results by comparing their performances. Based on the results, the following conclusions can be drawn: (i) the best performing algorithm was RF, with a total accuracy of 99.93%, (ii) the KNN and DT algorithms can be used for this purpose because of their high accuracy rates and good separation of the classes, and (iii) LR and NB are not well suited for the classification task on this dataset.

Future work directions based on this dataset include automatic feature extraction from OD matrices. As an OD matrix can be presented as a heatmap, it can be used as an input for training a CNN to estimate the traffic states, and the results of the CNN can be validated using the methodology proposed in this paper.

ACKNOWLEDGMENT

This research has been supported by the University of Zagreb, Student Centre as part of the project “Znanstveno-istraživačke aktivnosti studentske istraživačke skupine SIS-DVA” and the European Regional Development Fund under the grant KK.01.1.1.01.0009 (DATACROSS). Data used for this research is provided by Ericsson Nikola Tesla Ltd. through the collaboration with the Laboratory for Data Science in Traffic and Logistics at the Faculty of Transport and Traffic Sciences, University of Zagreb.

REFERENCES

[1] D. Cvetek, M. Muštra, N. Jelušić, and L. Tišljarić, “A survey of methods and technologies for congestion estimation based on multisource data fusion,” Applied Sciences, vol. 11, no. 5, 2021.

[2] D. Cvetek, I. Horenec, M. Muštra, and N. Jelušić, “Analysis of correlation between dwell time measured using Bluetooth detector and occupancy,” in 2019 International Symposium ELMAR, pp. 31–34, 2019.

[3] L. Tišljarić, T. Carić, B. Abramović, and T. Fratrović, “Traffic state estimation and classification on citywide scale using speed transition matrices,” Sustainability, vol. 12, no. 18, 2020.

Figure 6. Confusion matrices for machine learning algorithms: (a) Decision tree, (b) K-nearest neighbors, (c) Logistic regression, (d) Naive Bayes, and (e) Random forest

[4] T. Erdelić, T. Carić, M. Erdelić, L. Tišljarić, A. Turković, and N. Jelušić, “Estimating congestion zones and travel time indexes based on the floating car data,” Computers, Environment and Urban Systems, vol. 87, p. 101604, 2021.

[5] F. Vrbanić, E. Ivanjko, K. Kušić, and D. Čakija, “Variable speed limit and ramp metering for mixed traffic flows: A review and open questions,” Applied Sciences, vol. 11, no. 6, 2021.

[6] M. Erdelić, T. Carić, E. Ivanjko, and N. Jelušić, “Classification of travel modes using streaming GNSS data,” Transportation Research Procedia, vol. 40, pp. 209–216, 2019. TRANSCOM 2019 13th International Scientific Conference on Sustainable, Modern and Safe Transport.

[7] L. Zhang, S. Dalyot, and M. Sester, Travel-Mode Classification for Optimizing Vehicular Travel Route Planning, pp. 277–295. Berlin, Heidelberg: Springer Berlin Heidelberg, 2013.

[8] H. Wang, G. Liu, J. Duan, and L. Zhang, “Detecting transportation modes using deep neural network,” IEICE Transactions on Information and Systems, vol. E100.D, no. 5, pp. 1132–1135, 2017.

[9] N. Breyer, D. Gundlegård, and C. Rydergren, “Cellpath routing and route traffic flow estimation based on cellular network data,” Journal of Urban Technology, vol. 25, no. 2, pp. 85–104, 2018.

[10] A. Kalatian and Y. Shafahi, “Travel mode detection exploiting cellular network data,” MATEC Web Conf., vol. 81, p. 03008, 2016.

[11] N. Breyer, D. Gundlegård, and C. Rydergren, “Travel mode classification of intercity trips using cellular network data,” Transportation Research Procedia, vol. 52, pp. 211–218, 2021. 23rd EURO Working Group on Transportation Meeting, EWGT 2020, 16-18 September 2020, Paphos, Cyprus.

[12] L. Novačko, L. Šimunović, and D. Krasić, “Estimation of origin-destination trip matrices for small cities,” Promet - Traffic&Transportation, vol. 26, pp. 419–428, Oct. 2014.

[13] H. Fanaee-T and J. Gama, “Event detection from traffic tensors: A hybrid model,” Neurocomputing, vol. 203, pp. 22–33, 2016.

[14] M. Hubert and E. Vandervieren, “An adjusted boxplot for skewed distributions,” Computational Statistics & Data Analysis, vol. 52, no. 12, pp. 5186–5201, 2008.

[15] S. Oh, Y.-J. Byon, and H. Yeo, “Improvement of search strategy with k-nearest neighbors approach for traffic state prediction,” IEEE Transactions on Intelligent Transportation Systems, vol. 17, no. 4, pp. 1146–1156, 2016.

[16] D. Tong, Y. R. Qu, and V. K. Prasanna, “Accelerating decision tree based traffic classification on FPGA and multicore platforms,” IEEE Transactions on Parallel and Distributed Systems, vol. 28, no. 11, pp. 3046–3059, 2017.

[17] N. Dogru and A. Subasi, “Traffic accident detection using random forest classifier,” in 2018 15th Learning and Technology Conference (L&T), pp. 40–45, 2018.

[18] S. Agarwal, P. Kachroo, and E. Regentova, “A hybrid model using logistic regression and wavelet transformation to detect traffic incidents,” IATSS Research, vol. 40, no. 1, pp. 56–63, 2016.

[19] J. Zhang, C. Chen, Y. Xiang, W. Zhou, and Y. Xiang, “Internet traffic classification by aggregating correlated naive Bayes predictions,” IEEE Transactions on Information Forensics and Security, vol. 8, no. 1, pp. 5–15, 2013.

[20] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and É. Duchesnay, “Scikit-learn: Machine learning in Python,” Journal of Machine Learning Research, vol. 12, no. 85, pp. 2825–2830, 2011.

[21] L. Tišljarić, D. Cvetek, and V. Vareškić, “Transport Mode Classification.” https://github.com/tisljaricleo/transport-mode-classification, 2021.

63rd International Symposium ELMAR-2021, 13-15 September 2021, Zadar, Croatia

177

Authorized licensed use limited to: University of Zagreb: Faculty of Electrical Engineering and Computing. Downloaded on October 05,2021 at 11:35:13 UTC from IEEE Xplore. Restrictions apply.

Intercity Traffic Travel Mode Identification Method Based on Mobile Signalling Data

Article

May 2024

With the popularity of mobile devices, the signalling data generated by them provides significant opportunities for studying intercity travel behaviour in terms of data scale and information continuity. However, due to the low quality of the data in spatial accuracy, temporal frequency, and traffic semantics, the accuracy of identifying individual travel modes is low and it is difficult to extend to complex traffic scenarios. In this paper, we propose a new framework for identifying individual intercity travel modes based on mobile signalling data. The framework includes components for data pre-processing, geo-information mapping, feature and attribute extraction, and travel mode recognition. We utilize a comprehensive detection model to identify users’ multimodal intercity transport behaviour. Using two modules, Random Forest Embedding (RFE) and Bidirectional Long Short-Term Memory (Bi-LSTM), the model can capture the spatiotemporal characteristics and complex multi-stage associations in intercity travel chains. A large-scale mobile phone dataset from Jiangsu Province, China, was used for verification. The results showed that, on average, the method was able to detect travel mode with 92% accuracy. This study provides valuable support for further research on individual travel behaviour and the enhancement of transportation planning.

Variable Speed Limit and Ramp Metering for Mixed Traffic Flows: A Review and Open Questions

Article

Full-text available

Mar 2021

The trend of increasing traffic demand is causing congestion on existing urban roads, including urban motorways, resulting in a decrease in Level of Service (LoS) and safety, and an increase in fuel consumption. Lack of space and non-compliance with cities’ sustainable urban plans prevent the expansion of new transport infrastructure in some urban areas. To alleviate the aforementioned problems, appropriate solutions come from the domain of Intelligent Transportation Systems by implementing traffic control services. Those services include Variable Speed Limit (VSL) and Ramp Metering (RM) for urban motorways. VSL reduces the speed of incoming vehicles to a bottleneck area, and RM limits the inflow through on-ramps. In addition, with the increasing development of Autonomous Vehicles (AVs) and Connected AVs (CAVs), new opportunities for traffic control are emerging. VSL and RM can reduce traffic congestion on urban motorways, especially so in the case of mixed traffic flows where AVs and CAVs can fully comply with the control system output. Currently, there is no existing overview of control algorithms and applications for VSL and RM in mixed traffic flows. Therefore, we present a comprehensive survey of VSL and RM control algorithms including the most recent reinforcement learning-based approaches. Best practices for mixed traffic flow control are summarized and new viewpoints and future research directions are presented, including an overview of the currently open research questions.

A Survey of Methods and Technologies for Congestion Estimation Based on Multisource Data Fusion

Article

Full-text available

Mar 2021

Traffic congestion occurs when traffic demand is greater than the available network capacity. It is characterized by lower vehicle speeds, increased travel times, arrival unreliability, and longer vehicular queueing. Congestion can also impose a negative impact on the society by decreasing the quality of life with increased pollution, especially in urban areas. To mitigate the congestion problem, traffic engineers and scientists need quality, comprehensive, and accurate data to estimate the state of traffic flow. Various types of data collection technologies have different advantages and disadvantages as well as data characteristics, such as accuracy, sampling frequency, and geospatial coverage. Multisource data fusion increases the accuracy and provides a comprehensive estimation of the performance of traffic flow on a road network. This paper presents a literature overview related to the estimation of congestion and prediction based on the data collected from multiple sources. An overview of data fusion methods and congestion indicators used in the literature for traffic state and congestion estimation is given. Results of these methods are analyzed, and a disseminative analysis of the advantages and disadvantages of surveyed methods is presented.

Estimating congestion zones and travel time indexes based on the floating car data

Article

Full-text available

Mar 2021
COMPUT ENVIRON URBAN

Efficiently predicting traffic congestion benefits various traffic stakeholders, from regular commuters and logistic operators to urban planners and responsible authorities. This study aims to give a high-quality estimation of traffic conditions from a large historical Floating Car Data (FCD) with two main goals: (i) estimation of congestion zones on a large road network, and (ii) estimation of travel times within congestion zones in the form of the time-varying Travel Time Indexes (TTIs). On the micro level, the traffic conditions, in the form of speed profiles were mapped to links in the road network. On the macro level, the observed area was divided into a fine-grained grid and represented as an image where each pixel indicated congestion intensity. Spatio-temporal characteristics of congestion zones were determined by morphological closing operation and Monte Carlo simulation coupled with temporal clustering. As a case study, the road network in Croatia was selected with spatio-temporal analysis differentiating between the summer season and the rest of the year season. To validate the proposed approach, three comparisons were conducted: (i) comparison to real routes' travel times driven in a controlled manner, (ii) comparison to historical trajectory dataset, and (iii) comparison to the state-of-the-art method. Compared to the real measured travel times, using zone's time-varying TTIs for travel time estimation resulted in the mean relative percentage error of 4.13%, with a minor difference to travel times estimated on the micro level, and a significant improvement compared to the current Croatian industrial navigation. The results support the feasibility of estimating congestion zones and time-varying TTIs on a large road network from FCD, with the application in urban planning and time-dependent routing operations due to: significant reduction in the data volume without notable quality loss, and meaningful reduction in the pre-processing computation time.

Travel mode classification of intercity trips using cellular network data

Article

Full-text available

Jan 2021

Many applications in transport planning require an understanding of travel patterns separated by travel mode. To use cellular network data as observations of human mobility in these applications, classification by travel mode is needed. Existing classification methods for GPS-trajectories are often inefficient for cellular network data, which has lower resolution in space and time than GPS data. In this study, we compare three geometry-based mode classification methods and three supervised methods to classify trips extracted from cellular network data in intercity origin-destination pairs as either road or train. To understand the difficulty of the problem, we use a labeled dataset of 255 trips in two OD-pairs to train the supervised classification methods and to evaluate the classification performance. For an OD-pair where the road and train routes are not separated by more than four kilometers, the geometry-based methods classify 4.5% - 7.1% of the trips wrong, while two of the supervised methods can classify all trips correctly. Using a large-scale dataset of 29037 trips, we find that separation between classes is less evident than in the labeled dataset and show that the choice of classification methods impacts the aggregated modal split estimate.

Traffic State Estimation and Classification on Citywide Scale Using Speed Transition Matrices

Article

Full-text available

Sep 2020

The rising need for mobility, especially in large urban centers, consequently results in congestion, which leads to increased travel times and pollution. Advanced traffic management systems are being developed to take the advantage of increased mobility positive effects and minimize the negative ones. The first step dealing with congestion in urban areas is the detection of congested areas and the estimation of the congestion level. This paper presents a a method for a traffic state estimation on a citywide scale using the novel traffic data representation, named Speed Transition Matrix (STM). The proposed method uses traffic data to extract the STMs and to estimate the traffic state based on the Center Of Mass (COM) computation for every STM. The COM-based approach enables the simplification of the clustering process and provides increased interpretability of the resulting clusters. Using the proposed method, traffic data is analyzed, and the traffic state is estimated for the most relevant road segments in the City of Zagreb, which is the capital and the largest city in Croatia. The traffic state classification results are validated using the cross-validation method and the domain knowledge data with the resulting accuracy of 97% and 91%, respectively. The results indicate the possible application of the proposed method for the traffic state estimation on macro-and micro-locations in the city area. In the end, the application of STMs for traffic state estimation, traffic management, and anomaly detection is discussed.

Analysis of Correlation Between Dwell Time Measured Using Bluetooth Detector and Occupancy

Conference Paper

Sep 2019

Classification of Travel Modes Using Streaming GNSS Data

Article

Jan 2019

Over the last decade, smartphones have become a valuable source of traffic data. GNSS data and other data from smartphone sensors can be successfully used in travel mode classification. Travel mode classification data are a significant source of information for various applications such as travel planning, urban road operations, or understanding user behavior. Today, access to real-time data streams makes fast, real-time classification of travel modes possible. Because data streams differ in their characteristics, the applied classification method has to be adjusted to the particular data stream. In this paper, two classification methods, k Nearest Neighbors and Random Forest, are compared with emphasis on accuracy. First, they are applied to the classification of travel modes using a static GNSS dataset, and afterward using streaming GNSS data. For the purpose of classification, the characteristic distributions of velocity and acceleration for different travel modes are determined. For streaming GNSS data, the influence of the window size on the classification accuracy is analyzed. The obtained results show that both classification methods can be successfully applied to the classification of travel modes.

Traffic accident detection using random forest classifier

Conference Paper

Feb 2018

Cellpath Routing and Route Traffic Flow Estimation Based on Cellular Network Data

Article

Nov 2017

The signaling data in cellular networks provide a means of analyzing the use of transportation systems. We propose methods that aim to reconstruct the used route through a transportation network from call detail records (CDRs), which are spatially and temporally sparse. The route estimation methods are compared based on the individual routes estimated. We also investigate the effect of different route estimation methods when employed in a complete network assignment for a larger city. Using an available CDR dataset for Dakar, Senegal, we show that the choice of the route estimation method can have a significant impact on resulting link flows.

Accelerating Decision Tree Based Traffic Classification on FPGA and Multicore Platforms

Article

Jun 2017

Machine learning (ML) algorithms have been shown to be effective in classifying a broad range of applications in internet traffic. In this paper, we propose algorithms and architectures to realize online traffic classification using flow-level features. First, we develop a traffic classifier based on the C4.5 decision tree algorithm and the Entropy-MDL (Minimum Description Length) discretization algorithm. It achieves an overall accuracy of 97.92% for classifying eight major applications. Next, we propose approaches to accelerate the classifier on FPGA (Field Programmable Gate Array) and multicore platforms. We optimize the original classifier by merging it with discretization. Our implementation of this optimized decision tree achieves 7500+ Million Classifications Per Second (MCPS) on a state-of-the-art FPGA platform and 75-150 MCPS on two state-of-the-art multicore platforms. We also propose a divide-and-conquer approach to handle imbalanced decision trees. Our implementation of the divide-and-conquer approach achieves 10000+ MCPS on a state-of-the-art FPGA platform and 130-340 MCPS on two state-of-the-art multicore platforms. We conduct extensive experiments on both platforms for various application scenarios to compare the two approaches.
