Department of Geography and Planning
School of Environmental Sciences
Spatio-temporal analysis and machine learning for traffic speed forecasting
Leo McCarthy
Dissertation submitted in partial fulfilment of the degree
MSc Geographic Data Science
September 2020
Word count: 10,175
I declare that the material presented for this dissertation is entirely my own work, has not previously been presented for the award of any other degree at any institution, and that all secondary sources used are properly acknowledged.
September 15, 2020
Abstract

Advances in machine learning, computational power and the explosion of traffic data are driving the development and use of Intelligent Transport Systems. Real-time road traffic forecasting is a fundamental element required to make effective use of these Intelligent Transport Systems. Traffic modelling techniques have moved beyond traditional statistical approaches towards data-driven and computational intelligence methods. Often still overlooked is the spatial structure of traffic data and the impact of road network topology on traffic flows. In order to develop effective forecasting methods, understanding the autocorrelation structure among space-time observations is crucial. This paper presents methods for quantifying the complex spatio-temporal relationships present in transportation network data and assesses the impact of upstream and downstream congestion propagation. Additionally, it introduces machine learning techniques for the forecasting of traffic speeds in a supervised learning task where, by leveraging spatial and temporal characteristics of dynamic traffic flows, the LightGBM algorithm is shown to produce state-of-the-art performance on real-world motorway traffic data with predictions up to 30 minutes in advance, outperforming other established forecasting techniques.
Contents

List of Figures
List of Tables
1 Introduction
2 Literature Review
2.1 Traffic data
2.2 Space-time autocorrelations
2.3 Traffic modelling
2.3.1 ARIMA
2.3.2 OLS
2.3.3 Nonparametric machine learning
2.3.4 Recurrent Neural Networks
2.3.5 Gradient Boosting
2.3.6 Developing the current literature
3 Methods and Data
3.1 Data
3.1.1 Preprocessing
3.1.2 Overview
3.2 Space-time autocorrelations
3.2.1 Temporal
3.2.2 Spatio-temporal
3.3 Experimental set up
3.4 Model specification and tuning
3.4.1 OLS
3.4.2 LightGBM
3.4.3 LSTM
3.5 Evaluation metrics
3.6 Implementation
4 Results
4.1 Space-time autocorrelations
4.2 Modelling
5 Discussion
5.1 Literature implications
5.2 Real world applications
5.3 Limitations
5.4 Future work
6 Conclusion
7 Bibliography
List of Figures

1 Neural Network architectures
2 Decision tree example
3 Sensor locations
4 Heat-map of traffic speeds for all sensors on 2019-09-06
5 Average traffic speeds across each day of the week
6 Average traffic speeds by time of day for each sensor
7 Traffic speeds on 2019-09-06 for each sensor
8 Box plot of average daily traffic speeds across all sensors
9 Input windows for time series prediction
10 Train, validation and test procedure for modelling
11 Gradient boosted decision tree learning process
12 Structure of a general LSTM network layer
13 Implemented LSTM network structure
14 Temporal autocorrelations
15 CCF coefficients for all times
16 CCF coefficients for afternoon peak hours
17 MAE for all predictions in each hour of the day
18 Actual and predicted traffic speed values for LightGBM and LSTM models
List of Tables

1 Four model input vectors taken over 6 previous time steps
2 LightGBM hyperparameters
3 MAE and RMSE scores for target sensor M60/9223A at time t+1
4 MAE and RMSE scores for target sensor M60/9223A at time t+2
1 Introduction
The prediction of future traffic characteristics such as flow, travel time and speed has been a key component of Intelligent Transport Systems (ITS) for the past couple of decades. Accurate forecasting of future traffic conditions has benefits for road users and traffic authorities alike. The broad aims of Intelligent Transport Systems are to enable road users and administrators to be better informed and to make safer and more efficient use of transport networks. Specifically, the foremost consideration of almost all traffic management systems is reducing the high levels of congestion that generate economic, social and environmental issues in many cities around the world (Knittel et al.).
The rapid development of technologies that allow for the capture and storage of data describing the prevailing traffic conditions, such as sensors, radars, inductive loops and GPS (Leduc 2008), has resulted in increasingly sophisticated ITS. Real-time traffic information is vital for effective and swift traffic management, allowing traffic authorities to actively respond to prevailing traffic conditions in order to reduce congestion, provide information to road users and deal with collisions; it is a key element in many of the operational components of ITS such as Advanced Traveller Information Systems (ATIS) and Dynamic Traffic Assignment (DTA). In the past, much of the focus from traffic managers was on reactive traffic measures based on current traffic conditions, dealing with congestion once it had developed with no attempt to anticipate it. Now that many authorities possess real-time traffic data feeds, and subsequently large volumes of historical traffic data, the focus has switched towards developing methods for forecasting traffic conditions ahead of time, allowing for proactive traffic measures based on expected traffic conditions rather than decisions based on a traffic state soon to be obsolete.
The past decade has seen a significant uptick in research surrounding the development of predictive models for traffic state predictions ahead of time. Currently there is far from any consensus as to the most effective methods; this is in part due to the variability in problem framing. Given that there is no universal standard in traffic data collection and aggregation techniques, there are huge differences in structure across traffic data sets, with differences in data size, type, frequency and the prediction horizon. Given the increasing variety and huge volume of traffic data being collected, recent years have seen a move away from traffic theory and towards more data-driven and machine-learning methods in an attempt to develop fast, accurate and scalable models. Initial attempts mainly involved the use of parametric univariate time series models such as the Autoregressive Integrated Moving Average (ARIMA) and Seasonal Autoregressive Integrated Moving Average (SARIMA) methods (Williams & Hoel 2003). Many of these models, however, have shown shortcomings in generating traffic predictions, which has led to the wider use of nonparametric machine-learning (ML) or deep-learning (DL) based models, which form the basis for most of the current literature. These are advantageous for a few different reasons: ML and DL models are much better at extracting non-linear relationships within the data, they can handle very high-dimensional data and co-linear features more effectively, and they make no prior assumptions regarding the normality of the data distribution, all of which culminates in more accurate and robust predictions. There are several examples of ML applied to traffic forecasting, such as Support Vector Regression (Castro-Neto et al. 2009) and K-nearest-neighbour (Sun et al. 2018). The most current literature is largely centred around Recurrent Neural Networks (RNN), and specifically Long Short-Term Memory (LSTM) networks; the main reason behind their usage is their high representational power and ability to capture short and long temporal dependencies in traffic data. Gradient Boosting, and in particular efficient implementations of Gradient Boosted Decision Trees (GBDT) such as LightGBM and XGBoost, are hugely popular amongst data science practitioners and in competitive data science events such as those hosted on Kaggle. They have shown state-of-the-art performance on a range of tasks including time-series prediction and applications to spatio-temporal data (Bojer & Meldgaard 2020). Despite this, there has been limited attempt to apply them to the problem of traffic forecasting; this paper will attempt to bridge that gap by implementing LightGBM for traffic prediction and comparing results to both OLS regression and LSTM, the most popular Neural Network architecture for traffic forecasting.
The First Law of Geography (Tobler 1970) states that "everything is related to everything else, but near things are more related than distant things." This implies that traffic speeds at a given location are affected not only by the traffic state at previous points in time but also by traffic states at other nearby locations, given the connected nature of road networks. Shockwave theory has long been used to model the downstream-to-upstream progression of queues (Richards 1956), and several studies have shown traffic networks to exhibit both spatial and temporal patterns (Cheng et al. 2012) which must be carefully considered in any modelling framework. Recent studies have shown the improvement of predictions when including upstream or downstream traffic information (Chandra & Al-Deek 2009). The importance of space and network topology is clear; however, it is often overlooked and is still an area that requires further development in the domain of traffic forecasting (Vlahogianni et al. 2014). More recently, attempts have been made to integrate both spatial and temporal dependence in the modelling process (Luo et al. 2019). Furthermore, studies have shown that the inclusion of exogenous variables such as current weather conditions, incident reports and geo-tagged tweets can improve prediction accuracy (Essien et al. 2020). This paper will attempt to quantify the importance of spatial variables while also exploring the effectiveness of state-of-the-art machine learning models for traffic prediction.
The main aims of this dissertation can be summarised as follows:

- To identify and quantify the relevant spatio-temporal patterns present in traffic speed data.
- To develop traffic speed prediction models integrating both spatial and temporal elements of traffic speed data, comparing statistical and machine learning approaches.
- To assess the importance of exploiting the spatial structure of traffic data on model performance.

The remainder of this thesis is organised as follows. Section 2 presents an overview of the methodologies prevalent in the current traffic forecasting literature. Section 3 provides a description of the data and outlines the experimental setup and model specifications. Section 4 presents the findings. Section 5 discusses the findings and their implications. Section 6 concludes the paper.
2 Literature Review
There is a significant amount of literature relating to short-term traffic forecasting. The continually growing urban population places ever greater emphasis on reliable traffic management and, in turn, traffic prediction methods. Generally, traffic forecasting methods can be divided into two main strands. The first approach is based on explicit modelling of the traffic system itself and carrying out simulations incorporating parameters such as traffic flow, signal controls and driver behaviour. Much of this work is grounded in classical traffic flow theory (Drew 1968); macroscopic traffic models define explicit mathematical equations to describe the relationships between variables such as traffic flow, density, congestion shockwaves and signalling. The main advantage of these traffic simulation approaches is that "they allow inclusion of traffic control measures (ramp metering, routing, traffic lights, and even traffic information) in the prediction, and that they provide full insight into the locations and causes of possible delays on the road network of interest" (Lint 2004). However, there are a number of drawbacks to this approach, such as the computational complexity of parameter calibration and also a lack of flexibility. These models require a priori knowledge of traffic processes and their predictive quality relies on the quality of their inputs, which must be determined "by hand"; they cannot learn the latent traffic patterns from given data.
This paper focuses on the second main approach: statistical and data-driven methods, including machine learning (ML). Machine learning methods take no prior knowledge of the traffic system and aim to define functional approximations between input data and output data (Hastie et al. 2009), learning the relationships in the traffic data to predict variables such as speed and flow. Activity in this research area has been expedited in recent years due to improved data collection methods resulting in large, rich data sets, which has occurred in parallel to the rapid advancement of technology and available computational power, making it possible to implement advanced data-driven methodologies on big data sets. Despite this, there is still considerable variety in applied methods and large areas for development within this research area, as highlighted in recent reviews of the research field (Vlahogianni et al. 2014). One such area is the understanding and effective incorporation of space-time autocorrelations in traffic data. This section will provide an overview of the relevant literature with regards to traffic forecasting methods, with particular emphasis on exploiting the latent spatial structure of the underlying traffic network data to improve forecasting models.
2.1 Traffic data
Most of the recent developments in traffic forecasting have come about as a result of data-intensive, as opposed to analytical, approaches, and thus their effectiveness is highly reliant on the prevalence, quality and resolution of available traffic data (Vlahogianni et al. 2004). Given the enormous undertaking it would be for a researcher to collect task-specific traffic data, the development of traffic forecasting methods almost always relies on data already available, collected by transport authorities or previous large research projects. The two main data characteristics to be considered for traffic forecasting problems are data resolution and the measured parameters. Resolution of traffic data can vary greatly, with traffic state readings being made as often as every few seconds (Ye et al. 2012) or as infrequently as every hour (Hong 2012). The forecasting step is equivalent to the resolution of the data; for example, for data collected every 15 minutes, one forecasting step would be 15 minutes in advance of the current time step. The forecasting horizon is a related concept and refers to the specific time horizon in the future for which we want predictions to be made; it can include several forecasting steps depending on the data resolution. For instance, Ishak & Al-Deek (2002) and Liu et al. (2019) conclude that prediction becomes increasingly difficult as the prediction horizon increases. However, as Vlahogianni et al. (2014) note, the usefulness of a prediction model degrades given too short a prediction horizon, as for the majority of traffic management applications a reasonable amount of notice is required for proactive steps to be put in place or information to be communicated to road users. It is also the case that particularly short time steps make predictions difficult due to large fluctuations in readings from one step to the next (Smith & Demetsky 1997); because of this, high-resolution data is often aggregated for easier modelling (Florio & Mussone 1996).
There are several methods of data collection for traffic data, the most common by some way being traffic sensors (Coleri et al. 2004). These can offer measurements for a range of traffic characteristics such as speed, flow, occupancy and travel times. There is no real consensus in the literature on which variable is most suitable for describing underlying traffic conditions, but Vlahogianni et al. (2014) found that the most common characteristics chosen for prediction were flow and travel time. There have also been attempts to forecast different measures simultaneously, such as Florio & Mussone (1996), who tried to predict the evolution of traffic flow, density and speed over 10 minutes. However, Innamaa (2005) found that one single model predicting two variables gave worse forecasts than two separate models for speed and flow. In addition to this, some researchers have found success in incorporating exogenous variables in the modelling process. These can be weather conditions and traffic incident updates mined from Twitter, as done by Essien et al. (2020), or, in the case of Xia et al. (2019), contextual temporal information such as the day of the week or whether it is a national holiday.

Defining the appropriate data resolution and measurement parameters is of particular importance for data-driven methods as it affects the quality of information about traffic conditions lying in the data (Vlahogianni et al. 2004). Poor or inappropriate data for the task at hand could result in a poor model; the "garbage in, garbage out" principle applies.
2.2 Space-time autocorrelations
Whilst many traditional approaches to traffic forecasting treat it as a stationary, single-point time series problem, the most current literature makes it clear that it is of paramount importance to consider both the spatial and temporal dimensions of the data in order to develop accurate traffic forecasting models. First, an overview is provided of the literature regarding autocorrelations in space-time and their presence across traffic networks. (Temporal) autocorrelation is a term most prevalent in time series analysis and signal processing; it is a mathematical representation of the degree of similarity between a given time series and a lagged version of itself over successive time intervals. However, it can also be extended to the spatial dimension, hence the term "spatial autocorrelation", which has groundings in the First Law of Geography (Tobler 1970) and can be defined as follows: "Spatial autocorrelation is the correlation among values of a single variable strictly attributable to their relatively close locational positions on a two-dimensional surface, introducing a deviation from the independent observations assumption of classical statistics" (Griffith 1987). Until recently the spatial component of autocorrelations in traffic data was overlooked; however, a number of studies, most notably Cheng et al. (2012), have highlighted the extent of spatial autocorrelation in road
network data and its potential importance to traffic forecasting. Various indices exist in the literature to measure autocorrelations in both space and time, most of which are derived from Pearson's product moment correlation coefficient (PMCC) (Soper et al. 1917) which, given two variables X and Y, is defined as their covariance over the product of their standard deviations:

ρ_{X,Y} = E[(X − x̄)(Y − ȳ)] / (σ_X σ_Y)    (1)

where x̄, ȳ and σ_X, σ_Y are the means and standard deviations of variables X and Y respectively. The coefficient ρ_{X,Y} is used as a measure of the strength of linear association between variables and falls in the range −1 to 1, with 1 indicating perfect positive correlation, −1 perfect negative correlation and 0 no correlation. Temporal autocorrelation can be calculated by taking a lagged specification of the PMCC: rather than calculating the correlation between two different variables, it is calculated between a variable at time t and the same variable at time t − k for some lag k.
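As an illustration, this lagged PMCC can be computed directly (a minimal pure-Python sketch; the speed series used here is invented):

```python
def autocorr(series, k):
    """Lag-k temporal autocorrelation: the PMCC between a series
    and a copy of itself shifted back by k time steps."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    cov = sum((series[t] - mean) * (series[t - k] - mean) for t in range(k, n))
    return cov / var

# A speed profile repeating every 4 time steps is strongly self-similar
# at lag 4 and dissimilar at lag 2.
speeds = [62.0, 48.0, 35.0, 55.0] * 10
print(autocorr(speeds, 4))  # high positive
print(autocorr(speeds, 2))  # negative
```

In practice such coefficients are computed over a range of lags to produce the autocorrelation function (ACF) plots used later in the analysis.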
Moran's I is a measure commonly found in the broader geospatial literature and is an extension of the PMCC to the spatial dimension (Moran 1950); it is more complicated than temporal autocorrelation in that spatial dependence can occur in any direction. Moran's I, however, does not consider the temporal dimension, so it cannot capture dynamic spatio-temporal correlation properties. Thus, adaptations are required to capture spatial and temporal correlations simultaneously. A global Space-Time Autocorrelation Function (ST-ACF) is proposed by Pfeifer & Deutsch (1980) and implemented in the domain of network flow data by Griffith (1987); this measures the cross-covariances between all possible pairs of locations lagged in both time and space. The ST-ACF has been used in Space-Time Autoregressive Integrated Moving Average (STARIMA) modelling to determine the range of spatial neighbourhoods that contribute to the current location measurement at a given time lag (Pfeifer & Deutsch 1980).
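For reference, the global Moran's I statistic mentioned above can be sketched as follows (a didactic pure-Python version with a hand-built binary adjacency matrix; the sensor values are invented):

```python
def morans_i(values, weights):
    """Global Moran's I: spatial autocorrelation of `values` under a
    spatial weights matrix `weights` (here a plain list of lists)."""
    n = len(values)
    mean = sum(values) / n
    num = sum(weights[i][j] * (values[i] - mean) * (values[j] - mean)
              for i in range(n) for j in range(n))
    den = sum((v - mean) ** 2 for v in values)
    total_w = sum(sum(row) for row in weights)
    return (n / total_w) * num / den

# Five sensors along a corridor, each weighted to its immediate neighbours.
speeds = [30.0, 32.0, 55.0, 58.0, 60.0]   # congestion at one end
w = [[1 if abs(i - j) == 1 else 0 for j in range(5)] for i in range(5)]
print(morans_i(speeds, w))  # positive: similar speeds cluster in space
```

A strongly alternating pattern of values along the same corridor would instead yield a negative index, the spatial analogue of negative autocorrelation.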
As is the case with Moran's I, the ST-ACF has a local variant. Local spatio-temporal correlations can be quantified using the Cross-Correlation Function (CCF) (see Box et al. (2011)). This treats two time series in separate spatial locations as a bivariate stochastic process and measures the cross-covariance coefficients between each series at specified time lags. Yue & Yeh (2008) show that the CCF can effectively measure spatio-temporal dependencies between a road sensor and its neighbours and can be used for analysing the forecast-ability of a given traffic data set. For example, a significant positive CCF coefficient between sensor A at time t and downstream sensor B at time t − k would suggest a back propagation of congestion from B to A over the time period k.
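That computation can be sketched in a few lines (a pure-Python illustration; the two sensor series are synthetic, with sensor A echoing sensor B three steps later):

```python
def ccf(x, y, k):
    """Cross-correlation between series x at time t and series y at
    time t - k: a large positive value suggests y leads x by k steps."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = sum((v - mx) ** 2 for v in x) ** 0.5
    sy = sum((v - my) ** 2 for v in y) ** 0.5
    cov = sum((x[t] - mx) * (y[t - k] - my) for t in range(k, n))
    return cov / (sx * sy)

# Synthetic speeds: sensor A repeats sensor B's values 3 steps later,
# i.e. a disturbance propagating from B back to A.
full = [float((i * 37) % 11) for i in range(60)]
b = full[3:53]       # downstream sensor B
a = full[:50]        # upstream sensor A: a[t] == b[t - 3]
print(ccf(a, b, 3))  # close to 1
```

Scanning k over a range of lags and over each neighbouring sensor gives the CCF profiles reported in the results section.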
Despite the empirical evidence provided by Yue & Yeh (2008) and Cheng et al. (2012) that the CCF is a sound method of measuring spatio-temporal relationships between neighbouring road links, it has had limited exposure in the field of traffic forecasting, where it could be used for determining appropriate modelling approaches or selecting spatial model-input features. In this study the CCF will be implemented to determine the extent of local spatio-temporal autocorrelations and their dynamic behaviour; the exact specification with respect to this data set will be provided in the methods section.
2.3 Traffic modelling
Traffic forecasting has evolved a great deal over the past couple of decades; however, there is still no unified modelling approach. This section will provide an overview of the most used prediction modelling techniques, both traditionally and in the most current literature. Data-driven forecasting methods can broadly be divided into two distinct categories: statistical parametric approaches and non-parametric machine learning.
2.3.1 ARIMA
Parametric models have a fixed and finite number of parameters, the number being determined prior to model fitting. One of the most popular parametric models for time series prediction, both within traffic forecasting and across other disciplines, is the Autoregressive Integrated Moving Average (ARIMA) model. ARIMA uses a statistical approach to identify recurring patterns of different periodicity from previous observations of a time series. It can be defined by the following differencing equation:

Y_t = c + φ_1 y_{t−1} + φ_2 y_{t−2} + … + φ_p y_{t−p} + θ_1 ε_{t−1} + … + θ_q ε_{t−q} + ε_t    (2)

This is an ARIMA(p, d, q) model where:

- y_{t−1}, …, y_{t−p} represent the previous p values of the time series (the autoregressive part of the model);
- Y_t represents the forecast for y at time t;
- ε_t, ε_{t−1}, …, ε_{t−q} are zero-mean white noise terms, with ε_{t−1}, …, ε_{t−q} forming the q moving-average terms;
- φ and θ are parameters to be determined by model fitting;
- d is the order of differencing applied to the data set, as the time series must be stationary for the application of ARIMA.
The first recorded application of the ARIMA model in the transport forecasting literature was by Ahmed & Cook (1979), in which an ARIMA(0,1,3) model was proposed to predict traffic flow and occupancy on freeways, where it was shown to be more accurate than previous smoothing techniques. A primary assumption made by this model is the stationarity of the mean, variance and autocorrelation. Thus, a major criticism directed toward ARMA models concerns their tendency to perform poorly during abnormal traffic states and their inability to predict extreme values in time series (Vlahogianni et al. 2004). Furthermore, the univariate nature of ARIMA means that the integration of spatial or contextual features is not possible.
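To make the autoregressive component of equation (2) concrete, its coefficients can be estimated by least squares (a sketch assuming numpy; the series is a synthetic noiseless AR(1) process, and a full ARIMA fit would additionally handle differencing and the moving-average terms):

```python
import numpy as np

def fit_ar(series, p):
    """Least-squares estimate of the AR(p) part of an ARIMA model:
    y_t ~ c + phi_1 * y_{t-1} + ... + phi_p * y_{t-p}."""
    y = np.asarray(series, dtype=float)
    # One row per forecastable time step; columns: intercept, then lags 1..p.
    X = np.column_stack([np.ones(len(y) - p)] +
                        [y[p - j:len(y) - j] for j in range(1, p + 1)])
    coef, *_ = np.linalg.lstsq(X, y[p:], rcond=None)
    return coef  # [c, phi_1, ..., phi_p]

# A noiseless AR(1) process y_t = 0.8 * y_{t-1}; the fit recovers phi_1.
series = [1.0]
for _ in range(40):
    series.append(0.8 * series[-1])
print(fit_ar(series, 1))  # approximately [0.0, 0.8]
```

Library implementations (e.g. statsmodels) estimate all ARIMA parameters jointly, but the lag-matrix construction above is the core idea behind the autoregressive terms.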
2.3.2 OLS
Other linear methods include Ordinary Least Squares (OLS) regression and its extension, Geographically Weighted Regression, as applied by Gebresilassie (2017). OLS allows the simple integration of multiple input variables and additional features, such as multiple sensor measurements across a network or contextual features. OLS regression aims to find a linear relationship between input and output variables, and an OLS model is defined as:

y_i = β_0 + β_1 x_1 + β_2 x_2 + … + β_n x_n + ε_i    (3)

where y_i is the response variable, x_i (i = 1, 2, …, n) are the n independent predictor variables and ε_i is a random error term. β_0 is the model constant (intercept) and β_i (i = 1, 2, …, n) are the model parameters to be determined using the Ordinary Least Squares estimator, which is defined by the equation:

β̂ = (XᵀX)⁻¹Xᵀy    (4)

where X is the matrix of predictor values (with a leading column of ones for the intercept) and y is the vector of responses.
Parametric approaches have favourable statistical properties and capture regular variations very well. However, they find prediction difficult given more volatile traffic states, such as when a traffic accident or severe congestion occurs (Smith et al. 2002). Further to this, the aforementioned parametric models make the assumption of linearity between input and response variables, which makes them sub-optimal when considering the complex spatio-temporal structure and non-linear relationships in traffic data.
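For reference, the closed-form OLS estimator can be evaluated directly (a numpy sketch; the feature values and coefficients below are invented):

```python
import numpy as np

def ols_fit(X, y):
    """Closed-form OLS estimate beta_hat = (X^T X)^{-1} X^T y,
    with an intercept column prepended to the design matrix."""
    Xd = np.column_stack([np.ones(len(X)), X])
    # Solving the normal equations is equivalent to (and more stable
    # than) explicitly inverting X^T X.
    return np.linalg.solve(Xd.T @ Xd, Xd.T @ y)

# Speeds generated exactly as 60 - 2*x1 + 0.5*x2; the fit recovers the betas.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 2))
y = 60 - 2 * X[:, 0] + 0.5 * X[:, 1]
print(ols_fit(X, y))  # approximately [60, -2, 0.5]
```

With noisy real-world data the recovered coefficients are of course estimates rather than exact, but the computation is identical.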
2.3.3 Nonparametric machine learning
Nonparametric models function in a fundamentally different way to parametric models. Nonparametric models do not assume that the structure of a model is fixed, and in most cases the model complexity grows to accommodate the complexity of the data; in other words, the model structure is "learned" from the data (Bishop 2006). Compared to parametric methods, nonparametric methods are believed to offer more flexibility in fitting nonlinear characteristics and are better able to process noisy data (Karlaftis & Vlahogianni 2011a). These methods include algorithms such as K-Nearest Neighbour (KNN), Support Vector Regression (SVR) and Artificial Neural Networks (ANN). There have been several attempts to implement machine learning models for traffic forecasting (Sun et al. 2018, Castro-Neto et al. 2009, Essien et al. 2020) and even hybrid combinations of these models (Wang et al. 2015, Luo et al. 2019). These non-linear, flexible and multivariate models have proven to show greater accuracy in traffic forecasting than parametric models and are better able to leverage the spatio-temporal structure of the traffic data (Vlahogianni et al. 2014).
2.3.4 Recurrent Neural Networks
Recurrent Neural Networks (RNNs) are a subset of Artificial Neural Networks (ANNs) that aim to preserve the temporal dimension of time series data using a hidden recurrent state. They are seen as the most appropriate ANN architecture for series prediction problems and have shown success in traffic forecasting, as demonstrated by Cheng et al. (2018) when using an RNN model to predict traffic flow in the case of special events. In conventional ANNs there are only full connections between adjacent layers, but no connections among the nodes of the same layer. As Zhao et al. (2017) explain, the ANN may be sub-optimal when dealing with spatio-temporal problems, because there are always interactions among the nodes in a spatio-temporal network. Unlike conventional networks, the hidden units in an RNN receive feedback from the previous state to the current state (Fig 1), allowing them to learn relationships between states, i.e. time steps.
(a) Simple feed-forward Artificial Neural Network
(b) Recurrent Neural Network (Essien et al. 2020)
Figure 1: Neural Network architectures
RNNs can, however, often struggle with long-term dependencies and are unable to process a large number of time lags due to correlations caused by the periodic or seasonal structures often seen in spatio-temporal time series. For this reason, most traffic forecasting literature now implements the Long Short-Term Memory (LSTM) (Hochreiter & Schmidhuber 1997) adaptation of the RNN. The LSTM network is a special kind of RNN; by treating the hidden layer as a memory unit, LSTM networks can cope with both short- and long-term correlations. LSTM neural networks have found great popularity in traffic forecasting; many researchers have applied the algorithm to spatio-temporal traffic data with great success, where it has been found to outperform other popular techniques including ARIMA, Support Vector Machines and Random Forests (Yang et al. 2019, Zhao et al. 2017). It is probably the most popular single ML algorithm in the current traffic forecasting literature.
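The gating mechanism that lets an LSTM retain both short- and long-term information can be illustrated with a single forward step (a didactic numpy sketch with random weights, not a trained model or the deep-learning-framework implementation used later in this study):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step. W, U and b stack the parameters of the four
    gates in order: input (i), forget (f), candidate (g), output (o)."""
    z = W @ x + U @ h_prev + b
    H = h_prev.size
    i = sigmoid(z[0:H])          # how much new information to write
    f = sigmoid(z[H:2 * H])      # how much of the old cell state to keep
    g = np.tanh(z[2 * H:3 * H])  # candidate values to write
    o = sigmoid(z[3 * H:4 * H])  # how much of the cell to expose
    c = f * c_prev + i * g       # cell state: the long-term memory
    h = o * np.tanh(c)           # hidden state: the short-term output
    return h, c

# One step with hidden size 3 and a 2-feature input (e.g. speed, flow).
rng = np.random.default_rng(1)
W, U, b = rng.normal(size=(12, 2)), rng.normal(size=(12, 3)), np.zeros(12)
h, c = lstm_step(rng.normal(size=2), np.zeros(3), np.zeros(3), W, U, b)
print(h.shape, c.shape)  # (3,) (3,)
```

The additive cell-state update `c = f * c_prev + i * g` is what allows gradients, and hence long-range temporal dependencies, to survive across many time steps.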
2.3.5 Gradient Boosting
Gradient Boosting, and in particular Gradient Boosted Decision Trees (GBDT), is a tree-based machine learning approach that is becoming increasingly popular due to fast implementations such as LightGBM (Ke et al. 2017) and XGBoost (Chen & Guestrin 2016), and also its state-of-the-art performance across a broad range of problems.
Figure 2: Decision tree example
GBDT works by combining many decision trees (Fig 2) with low accuracy into a model which has significantly higher accuracy than its base learners. It is an additive model with the general expression:

ŷ_i = Σ_{k=1}^{K} f_k(x_i),  f_k ∈ F    (5)

where ŷ_i is the response variable given input vector x_i, K is the number of decision trees and each f_k is a function in the function space F, which contains all possible decision trees. A decision tree (Fig 2) is an algorithm for classification and regression that sequentially splits the data based on feature values; the process is repeated in a recursive manner on each successive subset following the split, and the end nodes represent the predicted value for data in that subset. This carries on until splitting no longer adds value to the predictions (Breiman et al. 1984).
GBDT algorithms, despite their popularity in the Data Mining and ML communities, have been implemented only a couple of times in the current literature, namely by Xia et al. (2019), who implement LightGBM to predict traffic characteristics in Shenzhen and find it significantly outperforms Linear Regression, ANN and ARIMA, and by Mei et al. (2018), who combine both XGBoost and LightGBM for accurate flow predictions on spatio-temporal data.
2.3.6 Developing the current literature
From this review, and as also highlighted in the comprehensive review of the research area by Vlahogianni et al. (2014), it can be seen that historically most effort has been put into employing univariate statistical models, predicting traffic volume or travel time and using data from single point sources. Currently there is considerably more effort being applied to incorporate the spatial structure of the data using multivariate methods such as OLS, LSTM and SVM; however, there is often less of an attempt to understand how or why spatial dependencies affect traffic forecasting models. Gradient boosting methods such as LightGBM have largely been overlooked in the literature. Lastly, attempts to predict traffic speed are far less prevalent than for other traffic characteristics such as flow or travel time. With this in mind, this paper will first attempt to quantify the significance and nature of local spatial autocorrelations in traffic data, and then implement three modelling approaches, LSTM, LightGBM and OLS, in order to predict traffic speeds, assessing their performance with respect to both each other and also the spatial richness of various input data.
3 Methods and Data
This section aims to outline the applied methodology and experimental procedure carried out, guided by the key research themes of this paper. As a reminder, these can be briefly summarised as follows:

- Quantify spatio-temporal relationships in traffic data.
- Develop and compare traffic forecasting models using OLS, LSTM and LightGBM.
- Assess the impact of spatial features on model performance.
3.1 Data
The data used is real-world traffic data collected by Highways England and can be downloaded from their WebTris website, a map-based data query service with measurements collected by road sensors along the Strategic Road Network (SRN).
The selected data set contains measurements for 21 site locations along the M60 motorway in England, in the clockwise direction between J7 and J20 (Fig 3). The data contains measurements for both traffic speed and traffic volume, although only speed was considered for this analysis. The total time period for the data used was between 2019-05-01 and 2019-10-31, a 6-month period (184 days) for which measurements are available at 15-minute intervals, with traffic speeds aggregated over each 15-minute period. This results in 17,664 speed readings per site across 21 sites, so 370,944 individual data points.
Figure 3: Sensor locations
3.1.1 Preprocessing
Across the 21 sites there were 1,668 missing data points, equating to 0.44% of the data, a very small proportion. For a given time stamp where there existed missing data points, the missingness was not consistent across all sites; further, discarding entire time steps' worth of data can result in inconsistent time steps across the series and be detrimental to model performance as periodic and seasonal effects are distorted. Because of this, where data was missing it was imputed with values calculated by taking the average speed for that sensor location at that particular time of day across the rest of the data set, in order to maintain the day-of-week and time-of-day characteristics of the traffic.
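This imputation scheme can be sketched in a few lines of pandas; the series, dates and values below are illustrative toy data, not the thesis code or the WebTris data set.

```python
import numpy as np
import pandas as pd

# Toy 15-minute speed series for one sensor, with a few gaps
# (names and values are illustrative, not the original data).
idx = pd.date_range("2019-05-01", periods=14 * 96, freq="15min")
rng = np.random.default_rng(0)
speeds = pd.Series(60 + 10 * rng.standard_normal(len(idx)), index=idx)
speeds.iloc[[10, 200, 500]] = np.nan  # simulate missing readings

# Fill each gap with the mean speed observed at the same day of
# week and time of day across the rest of the series, preserving
# weekly and daily structure.
keys = [speeds.index.dayofweek, speeds.index.time]
fill = speeds.groupby(keys).transform("mean")
imputed = speeds.fillna(fill)
```

Grouping by both day of week and clock time keeps, for example, Monday 08:00 congestion from leaking into Sunday 08:00 fills.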
3.1.2 Overview
Before carrying out any analysis or modelling, some exploratory work was carried out with the data in order to better understand any obvious patterns and spot any potential measurement errors. Figure 4 is a heat-map showing a snapshot of average traffic speeds on 2019-09-06. The time is shown on the x-axis and the spatial component is shown on the y-axis, with sensor locations ordered from downstream to upstream, or from south to north as shown in Fig 3. Both spatial and temporal patterns are apparent, with a drop in speed around peak times and also speed values clustered by sensor location, which can be seen in the vertical strip patterns. The three horizontal strip patterns (M60/919K, M60/9229L, M60/9298K) represent road sensors located on entry or exit slip roads where speed is consistently reduced.
Figure 4: Heat-map of traffic speeds for all sensors on 2019-09-06
Figure 5: Average traffic speeds across each day of the week
Figure 6: Average traffic speeds by time of day for each sensor
Figure 7: Traffic speeds on 2019-09-06 for each sensor
Fig 5 shows time-of-day and day-of-week variations in traffic speeds across the network. During the working week severe congestion peaks can be seen around the "rush hour" period, with speeds dropping by almost half for 1-2 hours every day. There is a clear weekend effect, with speeds higher across the day and no visible congestion at any point; Fridays also show less severe congestion that arrives earlier in the day than the rest of the working week. Clearly traffic speeds are largely influenced by the commuting patterns of the working population. Fig 6 shows the distribution of aggregated speeds across the day for each individual sensor, coloured depending on location from the southernmost sensor progressing downstream. Not all sensors share the same daily patterns, with some exhibiting more severe congestion than others; the spatial dependence is obvious, with closely located pairs of sensors sharing more similarities in traffic speeds than sensor pairs more widely spread. The congestion during the PM period is far more severe than that in the AM period. Meanwhile, Fig 7 shows a snapshot of traffic speeds by hour taken on 2019-09-06, which highlights the volatility in traffic patterns on a day-to-day basis; the patterns on this particular day differ significantly from the average patterns over the whole time series shown in Fig 6. Again shown is the strong spatial dependence, with abnormal congestion centred around a handful of adjacent sensors. Congestion events can also extend beyond an hour-by-hour basis: Fig 8 shows a box-plot of average speeds across the network by day of the week. The large interquartile range and number of outliers suggest there can be large variation in traffic speeds not only within days but between days of aggregated speeds as well. These variations, along with the complex spatial relationships and factors such as unexpected events, traffic accidents, slow-moving vehicles and bad weather, bring many challenges for robust short-term traffic prediction by traditional linear regression models.
Figure 8: Box plot of average daily traffic speeds across all sensors
3.2 Space-time autocorrelations
3.2.1 Temporal
The standard autocorrelation function is used to calculate the level of correlation between the speed value at a sensor and a lagged version of itself at previous time steps. Given a time series y_1, y_2, \ldots, y_t at the target sensor, the autocorrelation at lag k is described by the equation:

\rho(k) = \frac{E[(Y_t - \bar{y})(Y_{t+k} - \bar{y})]}{\sigma_{Y_t}\,\sigma_{Y_{t+k}}}, \qquad k = 0, \pm 1, \pm 2, \ldots \quad (6)

where E[(Y_t - \bar{y})(Y_{t+k} - \bar{y})] is the covariance between Y_t and Y_{t+k}, and \sigma_{Y_t}, \sigma_{Y_{t+k}} represent the standard deviations of Y_t and Y_{t+k} respectively. \rho(k) measures the relationship between y_t and y_{t-k}.
Also measured is the partial autocorrelation. This is similar to the standard autocorrelation but accounts for the influence of intermediate values in the time series between y_t and y_{t-k}. The Partial Autocorrelation Function is given as:

\alpha(k) = \mathrm{Corr}\big(Y_t - E[Y_t \mid Y_{t-1}, \ldots, Y_{t-k+1}],\; Y_{t-k} - E[Y_{t-k} \mid Y_{t-1}, \ldots, Y_{t-k+1}]\big) \quad (7)

This gives a clearer understanding of the relationship between a time series and a version of itself lagged over multiple time steps, as it removes information carried by intermediate time steps. This will help us to determine the feasibility of multi-step forecasts using a single sensor.
3.2.2 Spatio-temporal
The introduction of the spatial element to autocorrelation calculations is brought by the Cross-Correlation Function (CCF). Let X_t denote the time series of measurements collected at sensor X up until time T, and let Y_t denote the time series of measurements collected at a different sensor Y, either upstream or downstream, up until time T. The CCF at lag k is then given by the equation:

\rho_{X,Y}(k) = \frac{E[(X_t - \bar{x})(Y_{t+k} - \bar{y})]}{\sigma_X \, \sigma_Y}, \qquad k = 0, \pm 1, \pm 2, \ldots \quad (8)

where E[(X_t - \bar{x})(Y_{t+k} - \bar{y})] represents the cross-covariance between X and Y at lag k. This equation allows us to calculate the correlation between sensors in different locations where one is temporally lagged; thus the CCF can be used to quantitatively measure the spatio-temporal relationship of the traffic flow and speed.
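Eq. (8) can be estimated directly from two aligned sensor series. The sketch below uses synthetic data in which a "downstream" series echoes an "upstream" one two steps later; all names and data are illustrative.

```python
import numpy as np

def ccf(x, y, k):
    """Sample cross-correlation between x(t) and y(t + k), Eq. (8)."""
    x = np.asarray(x, float)
    y = np.asarray(y, float)
    n = len(x)
    xd, yd = x - x.mean(), y - y.mean()
    if k >= 0:
        cov = np.mean(xd[:n - k] * yd[k:])
    else:
        cov = np.mean(xd[-k:] * yd[:n + k])
    return cov / (x.std() * y.std())

# Sensor y echoes sensor x two time steps later, plus noise.
rng = np.random.default_rng(2)
x = rng.standard_normal(5000)
y = np.roll(x, 2) + 0.1 * rng.standard_normal(5000)

coeffs = {k: ccf(x, y, k) for k in range(-5, 6)}
best_lag = max(coeffs, key=coeffs.get)  # peaks at the 2-step lag
```

Scanning the lag at which the coefficient peaks is exactly how the congestion-propagation delay between sensor pairs can be read off plots such as Fig 15 and Fig 16.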
3.3 Experimental set up
In order to train and validate ML models the data must first be in a form that can be interpreted by the algorithms. In this case we frame the data as a supervised learning problem where each individual input vector is mapped to a unique output value. To do this, input feature vectors must be extracted from the raw sensor data. Starting with the simplest case of using one sensor's historical data to predict the traffic speed at the next time step, fixed-size and non-overlapping historic feature windows are constructed (Fig 9) which are mapped to a future output value. The temporal length of the feature window needs to be determined in order to ensure best model performance but without unneeded complexity or redundant features; in this case the previous 6 time steps were used, i.e. 90 minutes. So, for a prediction at time t at target sensor k, considering only historical information from the target sensor, the input vector \hat{x}_{k,t} would be [x_k(t-1), \ldots, x_k(t-6)] and the ground-truth output is the traffic speed value at the target sensor k at time t.
Figure 9: Input windows for time series prediction
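One way to construct such windows is sketched below. For simplicity the sketch slides the window one step at a time, whereas the text describes non-overlapping windows, so this is illustrative rather than a reproduction of the thesis pipeline.

```python
import numpy as np

def make_windows(series, n_lags=6):
    """Map each window of n_lags past readings to the next value,
    framing the series as a supervised learning problem."""
    series = np.asarray(series, float)
    X = np.lib.stride_tricks.sliding_window_view(series, n_lags)[:-1]
    y = series[n_lags:]
    return X, y

speeds = np.arange(20.0)  # stand-in for one sensor's speed series
X, y = make_windows(speeds, n_lags=6)
# X[0] holds readings 0..5; its target y[0] is the next reading, 6.
```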
Expanding this to include upstream and downstream sensors is fairly simple; the input vector becomes

\hat{x}_{k,t} = [x_k(t-1), \ldots, x_k(t-6),\; x_1(t-1), \ldots, x_1(t-6),\; \ldots,\; x_J(t-1), \ldots, x_J(t-6)]

where x_1, x_2, \ldots, x_J are the upstream and downstream sensors to be included in the input data. This paper focuses only on predictions for a single target sensor at a time. Thus, the problem definition becomes to develop a functional approximation F such that

F = \arg\min_{F} \sum_{t} \big(F(\hat{x}_{k,t}) - x_k(t)\big)^2

i.e. the sum of the squared errors between the function mapping, in this case a model prediction, and the true speed value is minimised. For each modelling technique, four different input vectors are tested in order to evaluate the impact of spatial features; these can be seen in Table 1.
Table 1: Four model input vectors taken over 6 previous time steps

Input    Data source
Input 1  Target sensor
Input 2  Target sensor + downstream sensors
Input 3  Target sensor + upstream sensors
Input 4  All sensors
3.4 Model specification and tuning
Machine learning models can contain hyper-parameters that determine structural features of the model and are very important to model performance. These hyper-parameters can be highly sensitive, with small perturbations resulting in large changes in prediction outcomes. They are also difficult to determine heuristically, and an empirical method must be employed in order to determine the most appropriate values. The process of "tuning" hyper-parameters requires repeatedly testing the model accuracy with different sets of hyper-parameters on a data set that was not used during the training procedure, in order to determine their best values; this is called model validation. It is not to be confused, however, with model testing: the validation data set must be distinct from the testing data set, because hyper-parameter values are chosen based on validation performance, which gives a potentially biased representation of model accuracy. A separate test set, sharing no overlap with the training or validation sets, must be held back for testing model prediction accuracy, and offers an unbiased view of model performance on unseen data. The raw data set is split into train, validation and test sets. This is done in a sequential manner in order to avoid look-ahead bias, so data from 2019-05-01 to 2019-08-31 is used for model training, data from 2019-09-01 to 2019-09-30 for validation, and the data for October 2019 for model testing. The train-validation-test workflow can be seen in Fig 10.
Figure 10: Train validation and test procedure for modeling
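The sequential split can be expressed as simple date slicing; the frame and column names below are illustrative, with synthetic values standing in for the real sensor readings.

```python
import numpy as np
import pandas as pd

# One 15-minute speed column over the full study period
# (synthetic values; a real frame would hold all 21 sensors).
idx = pd.date_range("2019-05-01", "2019-10-31 23:45", freq="15min")
rng = np.random.default_rng(3)
df = pd.DataFrame({"speed": rng.normal(60, 8, len(idx))}, index=idx)

# Sequential split - no shuffling - to avoid look-ahead bias.
train = df.loc[:"2019-08-31"]              # model fitting
valid = df.loc["2019-09-01":"2019-09-30"]  # hyper-parameter tuning
test = df.loc["2019-10-01":]               # final unbiased evaluation
```

The three slices are contiguous and disjoint, so no future observation can leak into an earlier stage of the workflow.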
3.4.1 OLS
The OLS model implemented is described by the following equation in the univariate case:

x_t = \beta_0 + \sum_{i=t-6}^{t-1} \beta_i x_i

where x_t is the prediction for the target sensor at time t, x_i are the traffic speed values at the target sensor at time i, and \beta are regression coefficients to be determined by model fitting. In the multivariate case the equation is

x_{t,k} = \beta_0 + \sum_{i} \beta_{i,k} x_{i,k} + \sum_{j=1}^{J} \sum_{i} \beta_{i,j} x_{i,j}

where x_{t,k} is the prediction for target sensor k at time t, x_{i,k} are the traffic speed values at target sensor k, and x_{i,j} are the traffic speed values at the additional J surrounding sensors.
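A minimal fit of the univariate form can be done with ordinary least squares on a lagged design matrix; the synthetic series below is illustrative, and NumPy's `lstsq` stands in for whatever fitting routine was actually used.

```python
import numpy as np

# Synthetic random-walk series standing in for the target sensor.
rng = np.random.default_rng(4)
series = 60.0 + np.cumsum(rng.standard_normal(500))

n_lags = 6
# Row t of the design matrix holds the 6 readings preceding y[t].
X = np.column_stack(
    [series[i:len(series) - n_lags + i] for i in range(n_lags)]
)
y = series[n_lags:]

# OLS fit with an intercept column, via least squares.
A = np.column_stack([np.ones(len(X)), X])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
preds = A @ beta
```

The multivariate case simply widens the design matrix with the lagged columns of the J surrounding sensors.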
3.4.2 LightGBM
Gradient Boosted Decision Trees (GBDT), as briefly introduced in the literature review, are an ensemble machine learning technique that combines many decision trees to create a strong learner. Decision trees are a popular algorithm but tend to overfit the data and are unable to generalise to unseen data; GBDTs are able to generalise better. They work by sequentially combining many weak decision trees such that each new tree fits the residuals of the previous tree and the model improves, employing the logic that subsequent predictors learn from the mistakes of the previous predictors (Fig 11).
LightGBM (Ke et al. 2017) is an efficient implementation of GBDTs which uses a histogram-based decision tree algorithm and leaf-wise tree growth, rather than the level-wise method that most other GBDT implementations opt for.
LightGBM has numerous hyper-parameters, which are tuned here using a grid search method, iteratively traversing a grid of possible parameter combinations and comparing their performance on the validation data set. The number of boosting iterations was determined by the early stopping criterion with a patience of 100, i.e. if validation performance did not improve after 100 rounds then there were no further boosting iterations. The final hyper-parameters are shown in Table 2.
Figure 11: Gradient boosted decision tree learning process
Table 2: LightGBM hyperparameters

Hyper-parameter   Value
Objective         Regression
Feature fraction  0.9
Bagging fraction  0.5
Num leaves        64
Bagging freq      4
Learning rate     0.007
Boosting rounds   515
3.4.3 LSTM
Long Short-Term Memory (LSTM) networks, introduced by Hochreiter & Schmidhuber (1997), are an adaptation of the Recurrent Neural Network that fix many of the issues of their predecessor, such as vanishing/exploding gradients and the handling of long-term dependencies. Fig 12 shows the structure of an LSTM layer, which is similar to that of an RNN (Fig 1) in that it is a chain of repeating modules of a neural network, but the repeating modules have a different structure. The memory module contains input, output and forget gates, which respectively carry out write, read and reset functions on each cell. The multiplicative gates, ⊕ and ⊗, refer to operations for matrix addition and dot product respectively, and allow the model to store information over long periods, eliminating the aforementioned vanishing/exploding gradient problem.
LSTM networks, like most ANNs, are trained using back-propagation and gradient descent. Consider an error function E(X, \theta) which defines the error between the ground truth y_i and the predicted output \hat{y}_i of the neural network with input vector x_i, for a set of input-output pairs (x_i, y_i) \in X and a set of network weights and biases \theta. Gradient descent requires the calculation of the gradient of the error function with respect to the weights w^k_{ij} and biases b^k_i. Then, according to the learning rate \alpha, each iteration updates the weights and biases as

\theta_{t+1} = \theta_t - \alpha \frac{\partial E(X, \theta_t)}{\partial \theta}

where \theta_t represents the parameters of the network (weights and biases) at iteration t of the gradient descent. The goal is to minimise the error function.
Figure 12: Structure of a general LSTM network layer
The structure of the network used in this paper is shown in Fig 13 and consists of the input layer, a hidden LSTM layer, a dropout layer, which randomly drops 5% of nodes during a run through the network and is an effective technique for preventing overfitting (Srivastava et al. 2014), and finally a fully connected dense layer which outputs the final prediction. The learning rate was 0.001, and early stopping was used to determine 29 epochs as optimal to balance training error and generalisation strength.
Figure 13: Implemented LSTM network structure
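A Keras sketch of this architecture might look as follows. The LSTM layer width (50 units) and the choice of the Adam optimiser are assumptions, since only the learning rate and dropout fraction are stated here; the toy batch is illustrative.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

n_lags, n_features = 6, 1  # 6 previous readings from one sensor

# Input -> LSTM -> 5% dropout -> dense output, as in Fig 13.
# Layer width (50) and Adam are assumptions, not thesis settings.
model = keras.Sequential([
    layers.Input(shape=(n_lags, n_features)),
    layers.LSTM(50),
    layers.Dropout(0.05),
    layers.Dense(1),
])
model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.001),
              loss="mse")

# Tiny toy batch: 32 windows of 6 lagged readings each.
X = np.random.default_rng(6).normal(60, 8, size=(32, n_lags, n_features))
y = X[:, -1, 0]
model.fit(X, y, epochs=1, verbose=0)
preds = model.predict(X, verbose=0)
```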
3.5 Evaluation metrics
In order to assess and compare model prediction performance, two evaluation metrics are employed to evaluate the difference between actual and predicted speed values. These are the Root Mean Squared Error (RMSE), which is the most frequently used metric in previous work (Luo et al. 2019), and the Mean Absolute Error (MAE). RMSE tends to place more weight on particularly large individual errors, and is thus more useful when large errors are particularly undesirable.

RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}

MAE = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|
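The two metrics can be computed directly; the four observed/predicted speed values below are made up purely to illustrate how a single large miss affects RMSE more than MAE.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root Mean Squared Error - penalises large errors heavily."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mae(y_true, y_pred):
    """Mean Absolute Error - the average absolute deviation."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_true - y_pred)))

actual = [60.0, 55.0, 40.0, 62.0]      # observed speeds (toy values)
predicted = [58.0, 54.0, 50.0, 61.0]   # model output (toy values)
# The single 10 mph miss dominates RMSE far more than MAE.
```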
3.6 Implementation
All model training and computation was carried out using Python 3 running on macOS Catalina v10.15.6, on a MacBook Pro (2018) with a 2.3GHz Quad-Core Intel i5 and 8 GB of 2133MHz memory. The LSTM was implemented using the Keras API running on top of TensorFlow. The LightGBM implementation of GBDTs used here is freely available online.
4 Results
4.1 Space-time autocorrelations
First presented here is an evaluation of the spatial and temporal patterns found within the data using the measures of autocorrelation previously outlined. Fig 14 shows the temporal autocorrelations found in the sequence of measurements at the target sensor for up to 10 lagged time steps. Fig 14a shows a steady decrease in temporal autocorrelation as the time lag increases. The partial autocorrelation shown in Fig 14b, which controls for intermediate time lags, shows a steep drop-off once the variable is lagged more than one time step.
(a) Autocorrelation (b) Partial autocorrelation
Figure 14: Temporal autocorrelations
Examining the CCF coefficient between the target sensor and the other network sensors gives a local view of the space-time autocorrelation structure of the network. The CCF coefficient is calculated both across the whole day and also separately for the peak afternoon period (4pm-8pm) to see how space-time autocorrelations vary both spatially and temporally. Fig 15 and Fig 16 are both centred at time order 0, which is the correlation between sensors at the same point in time. Positive time lags indicate that the sensor has been lagged in time with respect to the target sensor, whereas a negative lag value is given when a series has been shifted forwards in time with respect to the target sensor. Each time lag step represents 15 minutes in real terms. The CCF coefficient was calculated between the target sensor and all other surrounding sensors. Fig 15 shows the CCF coefficient values across the whole day, lagged up to 10 time steps, with the plotting distinguishing between upstream and downstream sensors. Almost all sensors, both upstream and downstream, show some level of cross-correlation with the target sensor. Generally, cross-correlation decreases as time lag increases, which is to be expected, with upstream sensors generally showing greater levels of correlation with the target, although, as shown in Fig 3, upstream sensors are in this instance located slightly closer on average than downstream ones. At peak times, as shown in Fig 16, downstream sensors show increased CCF coefficient values due to the backward propagation of traffic through the network at congested times. The plots demonstrate cross-correlations varying both in space (sensor locations) and time of day.
Figure 15: CCF coefficients for all times
Figure 16: CCF coefficients for afternoon peak hours
4.2 Modelling
This section aims to give a comparative overview of the different modelling approaches and their performance when provided with varying input data. The metrics employed to evaluate model performance are RMSE and MAE; the calculations for these are outlined in the methods section. The data used for evaluation is traffic speed data collected between 2019-10-01 and 2019-10-31. This data was not used during model training and thus offers an unbiased estimate of true model prediction accuracy on unseen data. Models are also compared to a "common sense" approach, which in this case means taking the traffic speed at the current time and assuming that it will remain the same for the next time step. This is a consideration that is often overlooked in traffic forecasting but is vital to ensure that any proposed, potentially complex, models are in fact able to provide genuine predictive power when compared to a much simpler, intuitive baseline.
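The "common sense" persistence baseline is trivial to compute; the five speed values below are made-up illustrative data.

```python
import numpy as np

def persistence_forecast(series, horizon=1):
    """'Common sense' baseline: predict that the current speed
    persists `horizon` steps ahead. Returns (predictions, truth)."""
    series = np.asarray(series, float)
    return series[:-horizon], series[horizon:]

speeds = np.array([60.0, 58.0, 30.0, 32.0, 55.0])  # toy readings
preds, truth = persistence_forecast(speeds, horizon=1)
baseline_mae = float(np.mean(np.abs(preds - truth)))
```

Any proposed model should beat this number on the same test window before its extra complexity is justified.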
Table 3: MAE and RMSE scores for target sensor M60/9223A at time t+1
Model Input Data MAE RMSE
OLS Target sensor 1.71923 3.42504
Target sensor + downstream sensors 1.67342 2.97104
Target sensor + upstream sensors 1.72988 3.4096
All sensors 1.69123 2.97821
LightGBM Target sensor 1.66323 3.32318
Target sensor + downstream sensors 1.48142 2.6648
Target sensor + upstream sensors 1.59021 3.18623
All sensors 1.43430 2.62749
LSTM Target sensor 1.64175 3.26481
Target sensor + downstream sensors 1.58658 2.95209
Target sensor + upstream sensors 1.57595 3.32331
All sensors 1.56655 3.17687
Extrapolate 1.59935 3.40976
Table 3 presents a comparison of model accuracy for traffic speed prediction at the target sensor M60/9223A at time horizon t+1. Taking the OLS model, it is clear that prediction improvement can be gained by incorporating spatial features. The MAE falls from 1.71 to 1.69 when the previous time step readings for upstream and downstream sensors are incorporated as model input, as well as previous target sensor readings. In terms of MAE, however, the OLS model fails to outperform the "common sense" approach, which takes the speed value at time t to be a prediction for the speed at time t+1; it can, however, outperform this approach with regard to RMSE, which suggests the OLS is making fewer very large errors than the "common sense" approach. The LightGBM is the overall best performing model by a clear margin. When only the previous 6 time steps for the target sensor are considered as input data it fails to outperform the "common sense" approach in terms of MAE; however, as can be seen from the table, the incorporation of upstream and downstream sensor readings can improve model performance significantly. The LightGBM model appears the most capable of leveraging the spatio-temporal structure of the data, with a 15.96% decrease in MAE for LightGBM when incorporating previous time step data for all sensors compared to only the target sensor. LightGBM is in fact the only model to outperform the "common sense" approach in terms of MAE, and it does so by a good margin, with an MAE of 1.43 compared to 1.60, but only when incorporating surrounding sensor data. The LSTM performs with slightly better accuracy than the OLS model but gains only a slight improvement in error when incorporating upstream and downstream sensor information; it marginally outperforms the "common sense" approach in terms of MAE and RMSE.
Table 4 presents the results for model predictions at time horizon t+2. The results share similarities with those for t+1 predictions, with once again LightGBM being by far the best performing model, with an MAE of 1.885 and RMSE of 3.796 when including all sensors as input. The model's performance advantage relative to the common sense approach is magnified when predicting 2 time steps ahead: there is a 12% decrease in MAE in LightGBM model predictions compared to common sense, while when predicting one time step ahead there was a 10.32% difference in error. Again, OLS and LightGBM model performance improves, with a reduction in MAE and RMSE, when incorporating spatial features by way of upstream and downstream sensor readings. At this time horizon, the LSTM model does not seem to take advantage of the spatial elements of the data, as it fails to improve when incorporating surrounding sensor readings.
Table 4: MAE and RMSE scores for target sensor M60/9223A at time t+2
Model Input Data MAE RMSE
OLS Target sensor 2.29718 4.8005
Target sensor + downstream sensors 2.23167 4.2297
Target sensor + upstream sensors 2.27521 4.65812
All sensors 2.23841 4.19049
LightGBM Target sensor 2.1893 4.62002
Target sensor + downstream sensors 1.91728 3.81149
Target sensor + upstream sensors 2.07082 4.3758
All sensors 1.88533 3.7961
LSTM Target sensor 2.16217 4.68135
Target sensor + downstream sensors 2.20957 4.65229
Target sensor + upstream sensors 2.26725 4.6352
All sensors 2.20115 4.78406
Extrapolate 2.15487 4.65977
The results in terms of MAE for the best performing LightGBM model are shown in Fig 17, where the MAE of predictions is aggregated for each hour of the day and the plot shows the influence of both time of day and the inclusion of spatial features on model performance. During non-peak times the accuracy of models that take input data only from the target sensor is marginally worse than that of models taking inputs from both the target sensor and surrounding sensors upstream and downstream. At peak times, however, the difference is more pronounced: the error is much higher in models that take input data from only the target sensor or the target sensor + upstream sensors. Congested times appear to be when the incorporation of spatial features is most important, especially downstream sensors, due to the back-propagation of traffic through the network at busy times.
For one week, 2019-10-21 to 2019-10-28 (Mon-Sun), the predicted traffic speeds for both the LightGBM and LSTM models are shown in Fig 18, as well as the actual target sensor readings for that period, also known as "ground truth" values. Whilst both models fit the general trend of the data, the LightGBM better matches the ground truth at times when the traffic speeds are particularly unstable or congested, such as the areas circled.
Figure 17: MAE for all predictions in each hour of the day
Figure 18: Actual and predicted traffic speed values for LightGBM and LSTM models
5 Discussion
To reiterate, the main research questions of this study were as follows. To what extent does traffic data exhibit spatio-temporal patterns? Which is the best performing traffic forecasting technique: OLS regression, Gradient Boosted Decision Trees or LSTM neural networks? Does the inclusion of spatial features improve traffic forecasting models? The results from this study provide clear indications as to the answers. The exploration of spatio-temporal patterns using the Cross-Correlation Function indicated that there are clear spatial and temporal dependencies in the data. These spatio-temporal patterns are dynamic in both space and time, with distinctions found between the influence of upstream traffic compared to downstream traffic, for instance downstream traffic patterns becoming more significant during congested periods (Fig 16). The results found the LightGBM model to be the most accurate by a significant margin; it appears to be the most appropriate model for capturing the complex spatio-temporal patterns, and generated accurate predictions under both free-flowing and congested conditions (Fig 18) up to 30 minutes into the future. The LightGBM model also saw considerable improvements when including the effects of upstream and downstream sensors in the input vector (Table 3), which in turn answers our final question.
5.1 Literature implications
The spatio-temporal correlations studied in this paper concur with findings from Yue & Yeh (2008) and Cheng et al. (2012) that traffic networks exhibit strong spatial as well as temporal patterns as traffic propagates through the network. Although this is an aspect that has previously been overlooked in the literature (Vlahogianni et al. 2014), the importance of space in traffic forecasting has been brought more to the fore in recent years. The CCF proved an effective way of quantifying space-time relationships, which could aid in model selection and feature selection for traffic forecast models.
As previously stated, the literature is still divided over which modelling techniques are most appropriate. In relation to this study, both Xia et al. (2019) and Mei et al. (2018) also found success in developing traffic flow forecasting models with GBDTs. On the other hand, Essien et al. (2020) found that their Autoencoder-LSTM outperformed XGBoost when predicting traffic flows across the A56 in Manchester, and Wang & Li (2018) found that their hybrid Convolutional LSTM network outperformed LightGBM in predicting travel speed 15 to 90 minutes into the future. In both of these studies the GBDT models were being used as comparison against significantly more complex models and networks with multiple Convolutional and bidirectional-LSTM hidden layers. There are few, if any, studies that show LightGBM to be outperformed by a pure LSTM model. This study confirms the findings of Xia et al. (2019) and Mei et al. (2018) that LightGBM appears to be a useful method, dealing with spatio-temporal relationships well without the need for supplementary models or network layers to capture the spatial aspect of the data, as is the case with LSTM networks. The importance of integrating upstream and downstream sensors agrees with findings by both Gebresilassie (2017) and Du et al. (2018) and forms part of a growing body of evidence for the importance of explicitly extracting spatial characteristics of traffic data to improve traffic forecasting.
Perhaps unexpectedly, some of the model variations implemented failed to outperform a much simpler "common sense" approach that required almost no computation. This is interesting for a couple of reasons. First of all, it is evidence of just how challenging it can be to accurately model such a dynamic and complex system as an urban traffic network. The intricate spatio-temporal relationships in the data, outlined in this study and previous works (Cheng et al. 2012, Yue & Yeh 2008), mean that traffic prediction is difficult; it requires serious thought about a number of interconnected factors in spatial and temporal dimensions, as well as consideration of the contextual factors that drive traffic patterns, such as weather and accidents. Second of all, it raises questions around the methods of evaluation for traffic forecasting models in the literature. Very few studies in the literature, of a similar nature to this one, test their models against similar heuristic common sense measures, usually opting instead to test developed models against other common modelling techniques. The problem with this is that it tends to lose sight of the underlying goals of traffic forecasting; rather than leveraging computational techniques to enhance or improve existing methods, much research is pitting increasingly complex models against each other (Karlaftis & Vlahogianni 2011b). As Vlahogianni et al. (2014) note, increasing model complexity can also be detrimental to the explanatory power of the model, which in the real world is imperative to make models adaptable and responsive to dynamic traffic changes (Karlaftis & Vlahogianni 2011b). Tree-based methods such as GBDTs can offer ways to explain predictions by extracting the input features that have most influence on model predictions (Chen & Guestrin 2016). This is an area where models such as LSTM networks struggle.
5.2 Real world applications
Given that this is a field of research that developed with real-world applications in
mind, it is sensible to examine the findings of this paper in the context of real
Intelligent Transport Systems. The results regarding the spatio-temporal structure of
the traffic data emphasise the need for comprehensive and widespread traffic
monitoring and data collection systems: these methods rely on high-quality, rich and
multidimensional raw data sources, and the clear spatial dependence exhibited by the
type of network data shown in this study means that performance is only likely to
improve with the addition of more sensors distributed across the network. The promising
performance of the LightGBM model, which provided accurate predictions even
in abnormal conditions (Fig 18) and over multiple time steps (Table 4), suggests that
it is a modelling technique that could be of significant use in real-world applications.
It is important to highlight that this study focused only on generating forecasts
for an individual sensor. Clearly, for traffic forecasts to be an effective tool in a traffic
management system they need to be extended across the entire network. This brings a
number of challenges, however. It is not clear, for instance, how many
upstream and/or downstream sensors are required for accurate predictions; in this study
all sensors exhibited some level of spatio-temporal correlation with the target
sensor, but this is unlikely to remain the case as the distance between sensors increases.
A further challenge regarding real-world integration is the availability of real-time
data streams. While this study used historical data to assess performance, real-world
application requires that data can be captured, processed and fed to any model in a
short window of time for a prediction scheme to be viable. The reverse is also true:
there is a challenge in determining which ITS and traffic management technologies
are capable of integrating and adapting to information taken from forecasting models
(Vlahogianni et al. 2014).
5.3 Limitations
Some of the main limitations have already been touched upon in this discussion.
Firstly, the modelling techniques employed in this paper are suitable for predictions
at a single sensor at a time, and more general methods able to predict traffic speeds
across an entire network might be preferred. One option is Geographically Weighted
Regression (Brunsdon et al. 1998), as implemented by Gebresilassie (2017). This
technique involves a single regression model for the entire network but allows the
relationships between independent and dependent variables to vary by locality. Luo
et al. (2019) developed a method in which models were built for individual sensors,
their predictions combined, and a function based on spatial weights between sensors
applied in order to improve overall accuracy when making simultaneous predictions
across the network.
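The result-level fusion step can be illustrated with a small sketch; the inverse-distance weighting used here is a simplified stand-in for the rank-exponent scheme of Luo et al. (2019).

```python
# Simplified fusion of per-sensor predictions; inverse-distance weights
# are an illustrative substitute for Luo et al.'s rank-exponent weighting.
import numpy as np

def fuse_predictions(preds, distances):
    """Weighted average of neighbouring sensors' predictions, with
    weights inversely proportional to distance from the target sensor."""
    w = 1.0 / np.asarray(distances, dtype=float)
    w /= w.sum()
    return float(np.dot(w, preds))

# Predictions (mph) from three sensor-specific models 1, 2 and 4 km away.
fused = fuse_predictions([62.0, 58.0, 70.0], [1.0, 2.0, 4.0])  # 62.0 mph
```

The nearest sensor dominates the fused value, which matches the intuition that spatial dependence weakens with distance.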
A further limitation of this study is the lack of understanding of the causality behind
model predictions. While a model may accurately map an input vector to an output
value, this offers no information as to the underlying causes of congestion or abnormal
traffic patterns.
Finally, the homogeneity of the data used for model evaluation limits what can be
inferred from the results. While this paper highlighted the potential of the LightGBM
model when compared to other methods, it is not possible to conclude from this one
study that LightGBM is a more appropriate method than LSTM for traffic forecasting;
for that, more tests would need to be carried out on a number of data sets across
differing road networks.
5.4 Future work
Although this study highlighted the importance of spatial dependence in traffic fore-
casting, beyond the inclusion of explicitly spatial features in the form of upstream and
downstream sensors there was no formal effort to encode the spatial structure of the
data in a way that might help models to better learn its latent spatial structure.
There have been attempts to develop models that are "spatially aware": for instance,
Du et al. (2018) and Wang & Li (2018) introduce Convolutional Neural Networks
(Cheng et al. 2018). These were originally developed for image recognition but have
been applied in the traffic forecasting literature because they are designed to extract
spatial features from unstructured data such as images and graphs, which is appropriate
since transport networks can easily be represented as a graph structure.
Given that this study focused on one continuous road, there is room for further explo-
ration of the spatio-temporal patterns of interconnected road networks, to see whether
relationships hold between roads that are adjacent, or even completely disjoint but in
close proximity to each other. Work by Cheng et al. (2012) would suggest that the
influence of road links on their neighbours is local and varies widely in space and time,
making it hard to measure spatio-temporal relationships across complex road topologies.
6 Conclusion
The growth of urban populations globally has resulted in rising levels of traffic and
congestion, creating numerous social, environmental and economic issues (Hymel 2009),
with the pollution caused by increased traffic levels in urban areas even being found
to increase infant mortality rates (Knittel et al. 2016). The need has therefore never
been greater for intelligent traffic management systems guided by effective forecasting
techniques and a comprehensive understanding of the spatial and temporal patterns that
drive traffic flows. This dissertation first examined how traffic forecasting can benefit
from an understanding of how traffic speeds at neighbouring locations are related.
It then compared the effectiveness of machine learning algorithms in providing
accurate and robust forecasts, improved by incorporating the effects of upstream and
downstream sensors.
A Cross-Correlation Function was used to identify and quantify the spatio-temporal
dependencies of traffic speed data, where it was found that significant relationships
exist that are dynamic in both space and time. There were distinct differences
between the influence of upstream and downstream traffic patterns on a given road
segment, which also varied over the course of the day and provided evidence of traffic
"shockwaves" (Richards 1956) during congested periods as congestion propagated
upstream. The Cross-Correlation Function was found to be an effective method of
quantifying space-time relationships, one which could aid in model selection, feature
selection and determining the feasibility of short-term traffic forecasts on a given
road network.
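A minimal sketch of such a cross-correlation analysis, on synthetic series where the target sensor lags an upstream sensor by four time steps (illustrative only, not this study's implementation):

```python
# Pearson cross-correlation between two sensors' speeds at a range of
# time lags; a peak at a non-zero lag indicates propagation between them.
import numpy as np

def cross_correlation(x, y, max_lag):
    """Return {lag: corr(x[t], y[t + lag])} for lag in 0..max_lag."""
    out = {}
    for lag in range(max_lag + 1):
        a = x[: len(x) - lag] if lag else x
        b = y[lag:]
        out[lag] = float(np.corrcoef(a, b)[0, 1])
    return out

rng = np.random.default_rng(2)
upstream = rng.normal(60, 10, 300)
# Target reproduces the upstream signal four steps later, plus noise.
target = np.roll(upstream, 4) + rng.normal(0, 2, 300)

ccf = cross_correlation(upstream, target, max_lag=8)
best_lag = max(ccf, key=ccf.get)  # expected to peak at lag 4
```

The lag at which the correlation peaks estimates the propagation time between the two locations, which in turn suggests how far ahead a forecast exploiting that sensor can usefully reach.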
Three traffic forecasting models were developed and tested on traffic data collected
on the M60 motorway in Manchester, England. The goal was to predict traffic speeds
both 15 minutes and 30 minutes ahead of time. The LightGBM algorithm, an efficient
gradient boosting implementation, was the best performing model by a significant mar-
gin. It outperformed Ordinary Least Squares regression and Long Short-Term Memory
neural networks at both prediction horizons and under varying traffic conditions.
It also outperformed a heuristic approach by a large enough margin to suggest it is
a method with genuine predictive power. This is hypothesised to be due to the ability
of the LightGBM algorithm to more effectively learn the latent spatial structure
of the traffic speed data, as opposed to the hugely popular LSTM neural network, a
sequence prediction algorithm which the evidence suggests does not capture spatial
dependencies well on its own, resulting in sub-optimal performance as a standalone
algorithm; it may require hybrid modelling structures to extract spatial features prior
to its application (Du et al. 2018, Cheng et al. 2012). Empirical evidence was provided
for the benefit of including neighbouring road sensor measurements in traffic
forecasting models, given an appropriate model able to capture the spatial structure.
The results highlighted the importance both of spatial dependencies and of correct
model selection and specification in order to produce acceptable performance in com-
plex traffic forecasting tasks. The role of geo-space in this study was clear, but until
recent years it has been overlooked in the literature, which leaves open the much broader
question of which other related fields may benefit from explicitly incorporating spatial
structure into their predictive models.
7 Bibliography
Ahmed, M. S. & Cook, A. R. (1979), Analysis of freeway traffic time-series data by
using Box-Jenkins techniques, number 722.
Bishop, C. M. (2006), Pattern recognition and machine learning, Springer.
Bojer, C. S. & Meldgaard, J. P. (2020), 'Kaggle forecasting competitions: An overlooked
learning opportunity', International Journal of Forecasting.
Box, G. E., Jenkins, G. M. & Reinsel, G. C. (2011), Time series analysis: forecasting
and control, Vol. 734, John Wiley & Sons.
Breiman, L., Friedman, J., Stone, C. J. & Olshen, R. A. (1984), Classification and
regression trees, CRC Press.
Brunsdon, C., Fotheringham, S. & Charlton, M. (1998), 'Geographically weighted
regression', Journal of the Royal Statistical Society: Series D (The Statistician) 47(3), 431–
Castro-Neto, M., Jeong, Y.-S., Jeong, M.-K. & Han, L. D. (2009), 'Online-SVR for
short-term traffic flow prediction under typical and atypical traffic conditions',
Expert Systems with Applications 36(3, Part 2), 6164–6173.
Chandra, S. R. & Al-Deek, H. (2009), 'Predictions of freeway traffic speeds and volumes
using vector autoregressive models', Journal of Intelligent Transportation Systems
13(2), 53–72.
Chen, T. & Guestrin, C. (2016), XGBoost: A scalable tree boosting system, in 'Proceedings
of the 22nd ACM SIGKDD International Conference on Knowledge Discovery
and Data Mining', KDD '16, Association for Computing Machinery, New York, NY,
USA, pp. 785–794.
Cheng, N., Lyu, F., Chen, J., Xu, W., Zhou, H., Zhang, S. & Shen, X. (2018), 'Big data
driven vehicular networks', IEEE Network 32(6), 160–167.
Cheng, T., Haworth, J. & Wang, J. (2012), 'Spatio-temporal autocorrelation of road
network data', Journal of Geographical Systems 14(4), 389–413.
Coleri, S., Cheung, S. Y. & Varaiya, P. (2004), Sensor networks for monitoring traffic,
in 'Allerton Conference on Communication, Control and Computing'.
Drew, D. R. (1968), Traffic flow theory and control, Technical report.
Du, S., Li, T., Gong, X. & Horng, S.-J. (2018), 'A hybrid method for traffic flow
forecasting using multimodal deep learning', arXiv preprint arXiv:1803.02099.
Essien, A., Petrounias, I., Sampaio, P. & Sampaio, S. (2020), 'A deep-learning model
for urban traffic flow prediction with traffic events mined from twitter', World Wide
Web.
Florio, L. & Mussone, L. (1996), 'Neural-network models for classification and
forecasting of freeway traffic flow stability', Control Engineering Practice 4(2), 153–164.
Gebresilassie, M. A. (2017), Spatio-temporal traffic flow prediction.
Griffith, D. A. (1987), 'Spatial autocorrelation', A Primer. Washington DC: Association
of American Geographers .
Hastie, T., Tibshirani, R. & Friedman, J. (2009), The elements of statistical learning:
data mining, inference, and prediction, Springer Science & Business Media.
Hochreiter, S. & Schmidhuber, J. (1997), ‘Long short-term memory’, Neural computation
9(8), 1735–1780.
Hong, W.-C. (2012), 'Application of seasonal SVR with chaotic immune algorithm in
traffic flow forecasting', Neural Computing and Applications 21(3), 583–593.
Hymel, K. (2009), 'Does traffic congestion reduce employment growth?', Journal of
Urban Economics 65(2), 127–135.
Innamaa, S. (2005), Short-term prediction of traffic situation using MLP neural networks.
Ishak, S. & Al-Deek, H. (2002), 'Performance evaluation of short-term time-series traffic
prediction model', Journal of Transportation Engineering 128(6), 490–498.
Karlaftis, M. G. & Vlahogianni, E. I. (2011a), 'Statistical methods versus neural networks
in transportation research: Differences, similarities and some insights',
Transportation Research Part C: Emerging Technologies 19(3), 387–399.
Karlaftis, M. & Vlahogianni, E. (2011b), 'Statistical methods versus neural networks in
transportation research: Differences, similarities and some insights', Transportation
Research Part C: Emerging Technologies 19(3), 387–399.
Ke, G., Meng, Q., Finley, T., Wang, T., Chen, W., Ma, W., Ye, Q. & Liu, T.-Y.
(2017), LightGBM: A highly efficient gradient boosting decision tree, in I. Guyon,
U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan & R. Garnett,
eds, 'Advances in Neural Information Processing Systems 30', Curran Associates, Inc.,
pp. 3146–3154.
Knittel, C. R., Miller, D. L. & Sanders, N. J. (2016), 'Caution, drivers! children present:
Traffic, pollution, and infant health', Review of Economics and Statistics 98(2), 350–
Leduc, G. (2008), Road traffic data: Collection methods and applications.
Lint, J. (2004), Reliable travel time prediction for freeways, PhD thesis.
Liu, D., Tang, L., Shen, G. & Han, X. (2019), 'Traffic speed prediction: An attention-
based method', Sensors 19(18).
Luo, X., Li, D., Yang, Y. & Zhang, S. (2019), 'Spatiotemporal traffic flow prediction
with KNN and LSTM', Journal of Advanced Transportation 2019, 4145353.
Mei, Z., Xiang, F. & Zhen-hui, L. (2018), Short-term traffic flow prediction based on
combination model of XGBoost-LightGBM, in '2018 International Conference on Sensor
Networks and Signal Processing (SNSP)', pp. 322–327.
Moran, P. A. (1950), ‘Notes on continuous stochastic phenomena’, Biometrika
37(1/2), 17–23.
Pfeifer, P. E. & Deutsch, S. J. (1980), 'A three-stage iterative procedure for space-time
modeling', Technometrics 22(1), 35–47.
Richards, P. I. (1956), 'Shock waves on the highway', Operations Research 4(1), 42–51.
Smith, B. L. & Demetsky, M. J. (1997), 'Traffic flow forecasting: Comparison of modeling
approaches', Journal of Transportation Engineering 123(4), 261–266.
Smith, B. L., Williams, B. M. & Keith Oswald, R. (2002), 'Comparison of parametric
and nonparametric models for traffic flow forecasting', Transportation Research Part
C: Emerging Technologies 10(4), 303–321.
Soper, H. E., Young, A. W., Cave, B. M., Lee, A. & Pearson, K. (1917), 'On the
distribution of the correlation coefficient in small samples. Appendix II to the papers
of "Student" and R. A. Fisher', Biometrika 11(4), 328–413.
Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I. & Salakhutdinov, R. (2014),
‘Dropout: a simple way to prevent neural networks from overfitting’, The journal of
machine learning research 15(1), 1929–1958.
Sun, B., Cheng, W., Goswami, P. & Bai, G. (2018), 'Short-term traffic forecasting using
self-adjusting k-nearest neighbours', IET Intelligent Transport Systems 12(1), 41–48.
Tobler, W. R. (1970), 'A computer movie simulating urban growth in the Detroit region',
Economic Geography 46(sup1), 234–240.
Vlahogianni, E. I., Golias, J. C. & Karlaftis, M. G. (2004), 'Short-term traffic forecasting:
Overview of objectives and methods', Transport Reviews 24(5), 533–557.
Vlahogianni, E. I., Karlaftis, M. G. & Golias, J. C. (2014), 'Short-term traffic forecasting:
Where we are and where we're going', Transportation Research Part C: Emerging
Technologies 43, 3–19. Special Issue on Short-term Traffic Flow Forecasting.
Wang, W. & Li, X. (2018), ‘Travel speed prediction with a hierarchical convolutional
neural network and long short-term memory model framework’.
Wang, X., An, K., Tang, L. & Chen, X. (2015), ‘Short term prediction of freeway exiting
volume based on svm and knn’, International Journal of Transportation Science and
Technology 4(3), 337–352.
Williams, B. M. & Hoel, L. A. (2003), 'Modeling and forecasting vehicular traffic flow as
a seasonal ARIMA process: Theoretical basis and empirical results', Journal of
Transportation Engineering 129(6), 664–672.
Xia, H., Wei, X., Gao, Y. & Lv, H. (2019), Traffic prediction based on ensemble machine
learning strategies with bagging and LightGBM, in '2019 IEEE International Conference
on Communications Workshops (ICC Workshops)', pp. 1–6.
Yang, B., Sun, S., Li, J., Lin, X. & Tian, Y. (2019), 'Traffic flow prediction using LSTM
with feature enhancement', Neurocomputing 332, 320–327.
Ye, Q., Szeto, W. Y. & Wong, S. C. (2012), 'Short-term traffic speed forecasting based on
data recorded at irregular intervals', IEEE Transactions on Intelligent Transportation
Systems 13(4), 1727–1737.
Yue, Y. & Yeh, A. G.-O. (2008), 'Spatiotemporal traffic-flow dependency and short-term
traffic forecasting', Environment and Planning B: Planning and Design 35(5), 762–
Zhao, Z., Chen, W., Wu, X., Chen, P. C. Y. & Liu, J. (2017), 'LSTM network: a deep
learning approach for short-term traffic forecast', IET Intelligent Transport Systems
11(2), 68–75.