
Journal of Ambient Intelligence and Humanized Computing

https://doi.org/10.1007/s12652-020-02507-9

ORIGINAL RESEARCH

Predicting irregularities in arrival times for transit buses with recurrent neural networks using GPS coordinates and weather data

Omar Alam¹ · Anshuman Kush¹ · Ali Emami² · Parisa Pouladzadeh³

¹ Trent University, Peterborough, Canada (omaralam@trentu.ca, anshumankush@trentu.ca); ² Mila/McGill University, Montreal, Canada (ali.emami@mail.mcgill.ca); ³ Fleming College, Peterborough, Canada (Parisa.Pouladzadeh@flemingcollege.ca)

* Corresponding author: Omar Alam (omaralam@trentu.ca)

Received: 26 March 2020 / Accepted: 27 August 2020

© Springer-Verlag GmbH Germany, part of Springer Nature 2020

Abstract

Intelligent transportation systems (ITS) play an important role in the quality of life of citizens in any metropolitan city. Despite the various policies and strategies incorporated to increase the reliability and quality of service, public transportation authorities continue to face criticism from commuters, largely due to irregularities in bus arrival times, most notably manifested in early or late arrivals. Due to these irregularities, commuters may miss important appointments, wait too long at the bus stop, or arrive late for work. Therefore, accurate prediction models are needed to build better customer service solutions for transit systems, e.g. accurate mobile apps for trip planning or bus delay/cancellation notifications. Prediction models will also help in developing better appointment scheduling systems for doctors, dentists, and other businesses that need to take transit bus delays into account for their clients. In this paper, we seek to predict the occurrence of arrival time irregularities by mining GPS coordinates of transit buses provided by the Toronto Transit Commission (TTC), along with hourly weather data, and using this data in machine learning models that we have developed. In our study, we compared the performance of a Long Short-Term Memory recurrent neural network (LSTM) model against four baseline models: an Artificial Neural Network (ANN), Support Vector Regression (SVR), Autoregressive Integrated Moving Average (ARIMA), and historical averages. We found that our LSTM model demonstrates the best prediction accuracy. The improved accuracy achieved by the LSTM model may be attributed to its ability to adjust and update the weights of neurons while accounting for long-term dependencies. In addition, we found that weather conditions play a significant role in improving the accuracy of our models. Therefore, we built a prediction model that combines an LSTM model with a recurrent neural network (RNN) model that focuses on the weather condition. Our findings also reveal that in nearly 37% of scheduled arrival times, buses either arrive early or late by a margin of more than 5 min, suggesting room for improvement in the current strategies employed by transit authorities.

Keywords Intelligent transportation systems · ITS · Traffic flow · Neural networks · GPS locations · Weather conditions

1 Introduction

The importance of modeling and predicting bus arrival times for public transit has long been recognized (Kumar et al. 2014). Throughout the past decade, much work has been done to explore means of achieving faster and more reliable transit systems (Hua et al. 2018). However, public transit authorities continue to face criticism from commuters due to discrepancies between a vehicle's scheduled and actual arrival times. These irregularities naturally have a negative impact on commuters' daily lives. Commuters may miss medical appointments or school events, or arrive late for work. With the availability of large-scale pervasive data, e.g. GPS locations collected from buses, we believe that machine learning algorithms can help in predicting actual arrival times for public transit buses, and assist in strategies to overcome their discrepancies with scheduled times.

This paper aims at modelling the irregularities in arrival times for public transit buses using historical bus arrival times, stop locations, bus schedules, and weather data. Irregularities occur in one of two ways: leads (early arrival at a stop) and delays (late arrival at a stop). We focus on predicting irregularities for transit buses in the City of Toronto, where irregularities in bus arrival times are so common that the Toronto Transit Commission (TTC) issues notes for commuters who arrive late for work due to misleading scheduled times (Star 2020).

To reduce irregularities in arrival times, transit authorities incorporate a variety of strategies to bridge the gap between the actual and scheduled arrival times of buses. Among these strategies, the holding control strategy has been found to be the most effective for regulating bus operations (Fu and Yang 2002). This strategy addresses the phenomenon called bus headway: a large, accumulated arrival lead or delay at a bus stop that results from a succession of leads or delays at previous stops. By holding an early-arriving bus, a bus headway can be mitigated and service reliability improved (Fu and Yang 2002). Another strategy is stop-skipping, which is particularly useful when buses are running late and behind schedule (Liu et al. 2013). Despite applying these strategies, transit services continue to face delays in their daily operations, which can be due to ongoing road construction, bus breakdowns, road accidents, or other day-to-day factors. Therefore, transit authorities seek to increase the quality of service by providing passengers with predicted arrival times at a bus stop using algorithms that exploit transit data (Hua et al. 2018).

With computational power becoming cheaper and more accessible, it is increasingly feasible to use data-driven models to accurately predict arrival times by leveraging large volumes of data. These prediction models can assist in developing intelligent trip planning apps, improved scheduling systems for doctors and other businesses, and better urban planning strategies for city authorities.

In this paper, we propose a regression task to test the ability of machine learning algorithms to predict whether a bus at a given stop and time will be early, on time, or late, based on transit and weather data for the Toronto Transit Commission (TTC). The machine learning models that we experiment with include a traditional feed-forward artificial neural network (ANN) and a recurrent neural network (RNN) using long short-term memory (LSTM).

Our contributions can be summarized as follows:

– To our knowledge, this is the first work that investigates the impact of weather data on prediction accuracy for bus arrival times. We compare the prediction models with and without weather features. Previous work either avoided using weather data altogether, e.g. Kumar et al. (2014), or did not find weather to be a useful feature for the prediction task (Patnaik et al. 2004).

– We used historical arrival times, weather data, and other input features to predict arrival times for transit buses. We found that the LSTM model, a variant of the recurrent neural network that uses long-term dependencies, yields the best predictive performance.

– We found that weather has a strong relationship with arrival time prediction models. In nearly half of our data, including weather improved the prediction accuracy by 48%. We also found that including the weather data significantly improves the accuracy when predicting bus arrival times at multiple future stops in a trip.

– Because of the importance of weather, we built a separate RNN model that focuses on the weather feature and combined its result with the result of the LSTM model. This combined hybrid model improved the prediction by more than 500%.

The rest of the paper is organized as follows. The next section discusses the related work. Section 3 discusses the data collection. Section 4 discusses the machine learning models that we used. Section 5 discusses the results, and Sect. 6 concludes the paper.

2 Related work

This section discusses related work on bus arrival prediction. In general, previous work used linear regression (LR) (Hua et al. 2018), non-parametric regression (NPR) (Chang et al. 2010; Balasubramanian and Rao 2015), or Kalman filters (KFT) (Shalaby and Farhan 2004).

Hua et al. (2018) use linear regression to predict bus locations. Bus location data displays non-linear relationships between its features; therefore, the data has to be converted into a linear space to be used in conventional mathematical models such as linear regression. This requires a significant amount of data pre-processing and can, in turn, be costly and time-consuming. Kormáksson et al. (2014) use additive models (non-linear regression models) to predict bus arrival times using General Transit Feed Specification (GTFS) data. GTFS data is standardized by Google and is used to provide schedules and geographic information to Google Maps and other Google applications that show transit information. Regression models are easy to interpret and fast to train. Shalaby and Farhan (2004) use very limited AVL (Automatic Vehicle Location) and APC (Automatic Passenger Counter) data with Kalman filters (KFT) to predict arrival times for Toronto transit buses. Their data size is small (only 5 days of vehicle locations). In our study, we used 3.5 million data points collected over a period of four months, i.e. we use large datasets for predicting arrival times with machine learning algorithms. Wang et al. (2019) apply a multi-objective optimization technique to reduce the capacity allocation of subway systems based on different factors, e.g. the number of passengers, headway, and the number of available trains. Their objective is to reduce passenger wait time in the subway system. Similarly, passenger travel time was used to predict arrival times at subway stations in a study done for the city of Beijing (Xu et al. 2020). We used several input features in our prediction models, e.g. past arrival time, day of the week, hour of the day, etc. Liu et al. (2019) studied the optimal combination of different input features in mass rapid transit (MRT) systems. However, they did not consider the weather in their study.

Kumar et al. (2014) compared Kalman filters (Kalman 1960) with artificial neural networks for bus arrival prediction in Chennai, India. A key finding of this experiment was that with a large volume of data, artificial neural network models give better accuracy than mathematical models (linear regression and Kalman filters). Wang et al. (2009) use a Support Vector Machine (SVM) to model traffic conditions. They used bus arrival times and bus schedules as inputs to train their model. ANNs and kernelized SVMs have gained popularity for predicting travel time because of their ability to capture complex and non-linear relationships among features (Chien et al. 2002; Kumar et al. 2014; Jeong and Rilett 2004). In Hua et al. (2018), the performance of linear regression, artificial neural network and support vector machine models was compared for predicting bus arrivals at a single stop using data from multiple routes. Linear regression's performance was poor due to the non-linearity in the data; however, the performance of ANN and SVM was quite competitive. These approaches did not use recurrent neural networks in their predictions. Our work uses LSTM recurrent neural networks. Moreover, we use weather data in our prediction, which had not been incorporated in any way in the previous approaches.

To our knowledge, there has not been an abundance of work that uses weather data for predicting arrival times for public transit buses. Yang et al. (2016) use a combination of genetic algorithms and support vector machines, along with weather conditions, to predict bus arrival times. They did not use historical arrival times and did not explore recurrent neural networks in their study. Chen et al. (2004) used weather conditions and automatic passenger counting data with an ANN for bus arrival prediction in a New Jersey county. These two studies only relied on weather conditions (i.e. snow, rain, fog) in their models. We consider other weather attributes, such as visibility and temperature. Patnaik et al. (2004) used weather data as features for a bus arrival prediction model; however, their experiment failed to show improvement with weather data.

Ke et al. (2017) used a combination of CNN and LSTM recurrent neural networks, along with weather data, for forecasting short-term passenger demand for ride services. In contrast, we use weather data for a different problem, i.e. predicting arrival times of transit buses. Fu et al. (2016) compared the performance of a GRU model (gated recurrent unit neural network) and an LSTM model on yet another prediction task, traffic flow prediction.

3 Dataset collection

We used four datasets to build our models: (1) live Automatic Vehicle Location (AVL) data for Toronto Transit Commission (TTC) transit buses, collected every 20 s, (2) bus schedules, (3) bus stop locations retrieved from GTFS (General Transit Feed Specification) data, and (4) hourly weather data collected from a weather station near downtown Toronto. The AVL data comprises GPS locations for TTC buses. This data is publicly available through the NextBus API (Nextbus 2020). We collected more than 700,000 unique live GPS locations for transit buses on two routes, Route 28 and Route 8 (Fig. 1), in the City of Toronto over 3 months, from January 2018 to March 2018.

[Fig. 1 GPS locations mapped to bus stop location data for TTC routes. Markers are the GPS coordinates calculated for the actual arrival time at each stop. The top map depicts Route 28; the bottom map depicts Route 8.]

Figure 2 presents an overview of our study, and Table 1 summarizes the datasets that we used. After collecting the four datasets, we calculated the arrival time of a bus at each bus stop on the studied routes. Then, we calculated the difference between the actual arrival time of a bus at a stop and its scheduled arrival time at that stop. Based on this difference, we determined whether the bus arrived early, on time, or was delayed. Then, we normalized the data from all four datasets and used them as inputs to our models.

[Fig. 2 An overview of bus arrival prediction]

Table 1 Datasets used for our study

Dataset                       | Data points
TTC real time data            | 700,000
GTFS bus stop schedule data   | 18,110
GTFS bus stop location data   | 24
Weather data                  | 3624
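As an illustration of how this kind of AVL data can be collected, the following is a minimal polling sketch against the NextBus public XML feed cited above (Nextbus 2020). The endpoint and parameters follow the public feed documentation; the 20 s interval matches our collection rate, and error handling is omitted for brevity.

```python
import time
import requests
import xml.etree.ElementTree as ET

FEED = "http://webservices.nextbus.com/service/publicXMLFeed"

def poll_vehicle_locations(agency="ttc", route="28", last_time=0):
    """Fetch vehicle locations reported since `last_time` (epoch ms)."""
    params = {"command": "vehicleLocations", "a": agency, "r": route, "t": last_time}
    root = ET.fromstring(requests.get(FEED, params=params).content)
    fixes = [(v.get("id"), float(v.get("lat")), float(v.get("lon")))
             for v in root.iter("vehicle")]
    return fixes, int(root.find("lastTime").get("time"))

last = 0
while True:
    fixes, last = poll_vehicle_locations(route="28", last_time=last)
    # ... append the fixes, with a timestamp, to the AVL dataset ...
    time.sleep(20)  # live locations are collected every 20 s
```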

3.1 Estimating actual arrival time

The TTC data does not, in fact, specify whether a bus has arrived at a stop. The actual arrival time of a bus at a stop is calculated using the distance between the GPS location of the bus and the bus stop location. This distance is calculated using the haversine formula (Veness 2018), a well-known formula for the path distance between two points on the surface of the earth, with a wide range of applications, e.g. (Chopde and Nichat 2013; Basyir et al. 2017; Ingole and Nichat 2013). The formula gives the distance between two points on a sphere using their latitudes and longitudes while ignoring hills:

$$a = \sin^2(\Delta\varphi/2) + \cos\varphi_1 \cos\varphi_2 \sin^2(\Delta\lambda/2) \tag{1}$$

$$c = 2\,\mathrm{atan2}\!\left(\sqrt{a}, \sqrt{1-a}\right) \tag{2}$$

$$d = R \cdot c \tag{3}$$

In Eq. (1), $\varphi$ is latitude and $\lambda$ is longitude. In Eq. (2), $c$ is the angular distance in radians. In Eq. (3), $R$ is the Earth's radius (mean radius = 6371 km) and $d$ is the distance between the two GPS locations in kilometres.

Since the real-time GPS location data is collected every 20 s, we may miss the exact time when the bus actually arrives at the bus stop. Furthermore, during a 20 s window, the bus could arrive at a bus stop and start moving again; in that case, the recorded GPS location of the bus could be further away from the bus stop. To mitigate these issues, we identify the GPS location where the distance between the bus and the bus stop is minimal, and we check whether the bus is close to the bus stop, where closeness corresponds to the bus being within 100 m of the stop.

Algorithm 1: Calculating the actual arrival time of a bus
  Input: GPSTime: reported time for the GPS location of the bus; ScheduledTime: scheduled arrival time of the bus at the stop
  Output: ActualTime: actual arrival time of the bus at the stop
  Let d = distance between the bus and the stop using the haversine formula
  Let min = ∞
  while ScheduledTime − 25 min ≤ GPSTime ≤ ScheduledTime + 25 min do
      calculate d
      if d ≤ min then
          min = d
  if min is within 100 m of the stop then
      ActualTime = GPSTime of the bus location with distance min
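Under the definitions above, the haversine distance and Algorithm 1 can be sketched in a few lines of Python. This is a minimal reconstruction, assuming `gps_fixes` is a list of (time, latitude, longitude) readings for one bus; the function and variable names are illustrative, not the authors' code.

```python
from math import radians, sin, cos, atan2, sqrt
from datetime import timedelta

R = 6371.0  # Earth's mean radius in km (Eq. 3)

def haversine_km(lat1, lon1, lat2, lon2):
    """Distance between two GPS points, Eqs. (1)-(3)."""
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlmb / 2) ** 2
    c = 2 * atan2(sqrt(a), sqrt(1 - a))  # angular distance in radians (Eq. 2)
    return R * c

def actual_arrival_time(gps_fixes, scheduled_time, stop_lat, stop_lon):
    """Algorithm 1: closest fix to the stop inside a +/-25 min window."""
    window = timedelta(minutes=25)
    best_d, actual_time = float("inf"), None
    for gps_time, lat, lon in gps_fixes:
        if not (scheduled_time - window <= gps_time <= scheduled_time + window):
            continue
        d = haversine_km(lat, lon, stop_lat, stop_lon)
        if d < best_d:
            best_d, actual_time = d, gps_time
    # accept the closest fix only if the bus came within 100 m of the stop
    return actual_time if best_d <= 0.1 else None
```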


Figure 3 illustrates how we calculate the difference between the actual and the scheduled arrival times of a bus at a particular bus stop. Let $b_t$ denote the GPS location of the bus, $t$ the time when the bus was at location $b_t$, and $S_t$ the bus stop location. Since we capture GPS locations for the bus every 20 s, we may encounter a large number of GPS locations around a particular bus stop during the scheduled arrival time for the bus.

[Fig. 3 Calculating the actual arrival of a bus at a bus stop]

To determine whether the bus arrives on time, we use the GPS locations of the bus that are reported within a 50 min window around its scheduled arrival at the bus stop $S_t$ (i.e. 25 min before and 25 min after the scheduled arrival time). Then, we choose the bus location closest to the bus stop within that time window, for example $b_{t5}$ in Fig. 3. The next step is to check whether that GPS location is within the vicinity of the bus stop (i.e., we check whether $b_{t5}$ is within 100 m of the bus stop). Algorithm 1 summarizes this process.

After estimating the actual arrival time of a bus at a particular bus stop, we calculate the difference between the actual and scheduled arrival times. Equation (4) calculates the difference between the scheduled and actual arrival times; in a similar way, Eq. (5) calculates the difference between the scheduled and predicted arrival times. If the difference is less than zero, the bus arrived late; if the difference is greater than zero, the bus arrived early.

$$\mathrm{Difference}_{\mathrm{actual}} = \mathrm{ScheduledArrival}_{\mathrm{time}} - \mathrm{ActualArrival}_{\mathrm{time}} \tag{4}$$

$$\mathrm{Difference}_{\mathrm{predicted}} = \mathrm{ScheduledArrival}_{\mathrm{time}} - \mathrm{PredictedArrival}_{\mathrm{time}} \tag{5}$$

After preprocessing the data, we conducted a preliminary analysis of the collected bus arrival data. We found that more than 37% of the time, the buses on these routes were either delayed by more than 5 min or arrived early by more than 5 min (see Fig. 4). In some cases the delay exceeded 20 min. During the period of our study, the scheduled arrival times did not change, i.e., the schedules were not updated by the TTC. Therefore, we can consider that our models predict the arrival times. However, we used the two formulas in Eqs. (4) and (5) for prediction because we were interested in the delays and early arrivals.

[Fig. 4 Distribution of the difference between actual and scheduled arrival times: 20% of the buses are delayed by more than 5 min and 17% of the buses arrive early by more than 5 min]

4 Machine learning models

This section discusses the machine learning models that we used for predicting the arrival times of buses on the selected routes. In particular, we use regression models to estimate the amount of time by which a given bus deviates from its schedule. Given historical arrival times at a stop $s$, our models predict the next arrival time at stop $s+1$.

In our study, we use four baselines against which we compare our model's results: SVR, ANN, ARIMA and the historical average (a code sketch of these baselines follows the list below).

1. Support Vector Regression (SVR): SVR (Drucker et al. 1997) is an extension of the basic support vector machine (SVM) (Boser et al. 1992). In linear regression models, the error rate is minimized, whereas in SVR models the error is fit within a certain threshold. The model that emerges from SVR is the hyperplane that separates the maximum number of data points.

2. Artificial Neural Network (ANN): a network of interconnected neurons, inspired by studies of biological nervous systems (Zhang and Qi 2005; Tan et al. 2005). Neurons are simple information processing units. For time-series analysis, the inputs to an ANN model are observations from previous time steps, and the output corresponds to the predicted observation at the next time step (Zhang and Qi 2005). The information received from the input nodes is processed by hidden layer units, along with appropriate activation functions, to determine the output.

3. ARIMA: Autoregressive Integrated Moving Average. ARIMA is a mature, statistics-based time series prediction model. For time series data, ARIMA predicts future values entirely from the previous data points in the series.

4. Historical average: historical averages are the mean arrival times for bus trips. They are used as a common reference point to compare the performance of different machine learning models.
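The following sketch shows how these four baselines might be set up with common Python libraries. Here `X_train`/`X_test` and `y_train`/`y_test` are the normalized feature matrices and time diff targets described later, and the hyperparameters (e.g. the ARIMA order) are illustrative rather than the values used in our experiments.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.neural_network import MLPRegressor
from statsmodels.tsa.arima.model import ARIMA

# 1. Support Vector Regression: errors are fit within an epsilon threshold
svr_pred = SVR(kernel="rbf", epsilon=0.1).fit(X_train, y_train).predict(X_test)

# 2. Feed-forward ANN
ann_pred = MLPRegressor(hidden_layer_sizes=(100,), max_iter=500).fit(
    X_train, y_train).predict(X_test)

# 3. ARIMA: predicts future values purely from past values of the series
arima_pred = ARIMA(y_train, order=(2, 1, 2)).fit().forecast(steps=len(y_test))

# 4. Historical average: the mean arrival-time deviation as a reference point
hist_pred = np.full(len(y_test), np.mean(y_train))
```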

4.1 Long short-term memory (LSTM) recurrent neural networks

Figure 5 shows, within a single LSTM cell, how an LSTM recurrent neural network maintains long-term dependencies.

[Fig. 5 LSTM cell structure]

The LSTM architecture contains a series of connected cells. Each LSTM cell consists of a special unit called a memory block in the recurrent hidden layer. The memory blocks have connections that provide the information necessary to maintain the temporal state of the network. An LSTM cell has three gates: an input gate, an output gate and a forget gate. The input gate controls the flow of input information provided to the LSTM cell. The output gate controls the output flow of cell activations into the rest of the network. Unlike a conventional RNN, an LSTM recurrent neural network has a separate forget gate, which makes it more suitable for time-series analysis. The forget gate decides which information is relevant for the prediction task and removes irrelevant information. Together, these gates provide the overall memory function of LSTM recurrent neural networks.

Following an iterative process, the LSTM model establishes a mapping between an input sequence and the irregularity in arrival time (output) from the training set. Below are the equations for the LSTM neural network:

$$\text{Input gate:}\quad i_t = \sigma(W_{xi} x_t + W_{hi} h_{t-1} + W_{ci} C_{t-1} + b_i) \tag{6}$$

$$\text{Forget gate:}\quad f_t = \sigma(W_{xf} x_t + W_{hf} h_{t-1} + W_{cf} C_{t-1} + b_f) \tag{7}$$

$$\text{Cell input:}\quad C_t = f_t C_{t-1} + i_t \tanh(W_{xC} x_t + W_{hC} h_{t-1} + b_C) \tag{8}$$

$$\text{Output gate:}\quad o_t = \sigma(W_{xo} x_t + W_{ho} h_{t-1} + W_{co} C_t + b_o) \tag{9}$$

$$\text{Hidden layer output:}\quad h_t = o_t \tanh(C_t) \tag{10}$$

At time interval $t$, $\sigma$ is the element-wise sigmoid function $1/(1+\exp(-x))$ and $\tanh$ is the hyperbolic tangent function $(\exp(x)-\exp(-x))/(\exp(x)+\exp(-x))$. $i_t$, $f_t$ and $o_t$ are the input, forget and output gate states respectively, and $C_t$ is the cell state. $x_t$ is the input, and $b_i$, $b_f$, $b_o$ and $b_C$ are the bias terms. $W_{xi}$, $W_{hi}$ and $W_{ci}$ are the weight matrices for the input gate; $W_{xf}$, $W_{hf}$ and $W_{cf}$ are the weight matrices for the forget gate; and $W_{xo}$, $W_{ho}$ and $W_{co}$ are the weight matrices for the output gate. $W_{hi}$, $W_{hf}$, $W_{hC}$ and $W_{ho}$ are the weight matrices connecting $h_{t-1}$ to the three gates.

The current cell state $C_t$ is generated by calculating the weighted sum of the previous cell state and the current cell input. The LSTM recurrent neural network has the ability to remove or add relevant information to the cell state, because the cell state is adjusted by the input gate and the forget gate. The forget gate layer removes irrelevant information from the cell state. It uses $h_{t-1}$ and $x_t$, and outputs a number between 0 and 1 for each input in the sequence in the previous cell state $C_{t-1}$. If the number is zero, no information passes through the gate; if the number is one, all the information passes through the forget gate. Similarly, the input gate decides what new information will be stored in the cell state. The final output is based on the cell state of the LSTM network. As explained above, the current cell state depends on the previous cell state; therefore, the previous cell state is taken into consideration when updating the weights of the LSTM cell. This is how an LSTM cell is able to maintain long-term dependencies for predictions. LSTM recurrent neural networks have shown promising results in solving complex machine learning tasks (Sutskever et al. 2014).

4.2 Recurrent neural networks for the weather feature

Since the weather condition has a significant impact on the prediction results, we decided to create a Recurrent Neural Network (RNN) model that focuses on the weather feature. The output of this model is combined with the LSTM model discussed in the previous subsection to increase the accuracy of the prediction. The RNN model takes as input the arrival times and weather readings at the current stop and at the previous stop to predict the arrival time at the next stop, for which the weather reading is known.

In this model, a window of three is chosen for the arrival times $T$ and weather readings $W$. The inputs, shown in Eq. (11), are divided into three categories: $X_1$ (previous bus stop), $X_2$ (current bus stop) and $X_3$ (next bus stop). These inputs are illustrated graphically in Fig. 6. $X_1$ comprises $T^1_{t_0}$ and $W^1_{t_0}$ for $n$ samples. Similarly, $X_2$ comprises $n$ arrival times and weather readings. The third input, $X_3$, only takes weather readings. Equation (12) shows the predicted arrival times for the next bus stop.

$$\text{InputData} = \begin{pmatrix} T^1_{t_0} & W^1_{t_0} & T^1_{t_1} & W^1_{t_1} & W^1_{t_2} \\ \vdots & & & & \vdots \\ T^n_{t_0} & W^n_{t_0} & T^n_{t_1} & W^n_{t_1} & W^n_{t_2} \end{pmatrix} \tag{11}$$

$$\text{TargetData} = \begin{pmatrix} T^1_{t_0} \\ \vdots \\ T^n_{t_0} \end{pmatrix} \tag{12}$$

Figure 6 illustrates the architecture of our RNN model. Two hidden layers, $h_1$ and $h_2$, are set in the diagram with different matrix sizes. The inputs go to $h_1$ with a batch size of 32. After processing in $h_1$, the results are transferred to $h_2$ in different formats. In $h_2$, the result of processing $X_1$ is sent to a 1×16 matrix and concatenated with the result of processing $X_2$, which is sent to a 1×32 matrix. Similarly, the output of processing $X_3$ goes to a 1×16 matrix in $h_2$. Then all results of $h_2$ are concatenated together. Finally, the sigmoid function is applied in the last layer, which provides the arrival time prediction.

[Fig. 6 RNN architecture]
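The following is a sketch of this weather-focused RNN under our reading of Fig. 6: three inputs (previous stop, current stop, next-stop weather), a recurrent level $h_1$, 1×16 / 1×32 / 1×16 projections in $h_2$, concatenation, and a sigmoid output. The $h_2$ sizes and the sigmoid output follow the text; the $h_1$ unit count is illustrative (the paper specifies a batch size of 32, not the layer width).

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

x1 = tf.keras.Input(shape=(1, 2), name="prev_stop")     # T and W at previous stop
x2 = tf.keras.Input(shape=(1, 2), name="curr_stop")     # T and W at current stop
x3 = tf.keras.Input(shape=(1, 1), name="next_weather")  # W at next stop

# First hidden level h1: one recurrent block per input (unit count illustrative)
h1_1 = layers.SimpleRNN(32)(x1)
h1_2 = layers.SimpleRNN(32)(x2)
h1_3 = layers.SimpleRNN(32)(x3)

# Second hidden level h2: 1x16, 1x32 and 1x16 projections, then concatenate
h2 = layers.concatenate([
    layers.Dense(16, activation="relu")(h1_1),  # 1x16
    layers.Dense(32, activation="relu")(h1_2),  # 1x32
    layers.Dense(16, activation="relu")(h1_3),  # 1x16
])

out = layers.Dense(1, activation="sigmoid")(h2)  # normalized arrival-time output

rnn_model = Model([x1, x2, x3], out)
rnn_model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
                  loss="mse")
```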

4.3 Data preprocessing and normalization

Table 2 summarizes the list of features used in our models. We have flattened the data, i.e., we augment the bus trip travelling southbound with the next trip of the same bus travelling northbound. The first feature (time diff) is calculated using the live GPS locations and the TTC schedule data, as discussed in the previous section. The last four features are obtained from the weather data. The rest of the features are obtained from the live bus stop locations data.

Table 2 Features used for model building

Feature name       | Description
time diff          | Difference between actual and scheduled arrival time; this is the variable we are trying to predict
Tag                | Specifies the direction in which the bus is heading
Trip ID            | A unique number given to each trip
Stop sequence      | Sequence numbers, starting from 1, assigned to each bus stop on the route
Distance traveled  | Cumulative distance travelled by the bus to reach the bus stop
routeTag           | A unique numeric code identifying the route on which the bus is travelling
Stop ID            | A unique numeric code identifying a particular bus stop
Bus ID             | A unique numeric code identifying a particular bus
Service class      | Weekday, Saturday or Sunday
Day of the week    | Number indicating the day of the week (1 = Sunday, 2 = Monday, etc.)
Hour               | Number indicating the hour of the day
Max temperature    | Maximum temperature in the hour
Min temperature    | Minimum temperature in the hour
Visibility         | Visibility in km, i.e., how far the driver is able to see
Weather condition  | Weather conditions: rain, snow, fog or haze

Before a machine learning model is trained, all features are converted into a vector representation (e.g., the categorical features). There are two ways to convert a categorical feature into a vector representation: one-hot encoding and label encoding (Tan et al. 2005).

1. One-hot encoding: encodes a categorical feature as a one-hot numeric vector, i.e., it creates a binary column for each category and returns a sparse matrix, where only the entry at the row representing the category is assigned a 1, with the remaining entries assigned 0.

2. Label encoding: transforms categorical features into numerical features by assigning each category a unique number, which can be normalized before being used as an input to a machine learning model.

We have two categorical features: tag and weather conditions. Tag was converted using one-hot encoding because it only has two categories (North and South). Weather conditions was converted using label encoding. The other features in our data do not require encoding because they are continuous variables.

After converting all the features into a vector representation, the data was normalized using the following equation:

$$z_i = \frac{x_i - \min(x)}{\max(x) - \min(x)} \tag{13}$$

In Eq. (13), $x_i$ is the $i$th observation of a feature and $z_i$ is the $i$th normalized data point.
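A compact pandas sketch of the two encodings and the min-max normalization of Eq. (13); the DataFrame and column names are illustrative, not the authors' code.

```python
import pandas as pd

# One-hot encode the two-category `tag` feature (North/South)
df = pd.get_dummies(df, columns=["tag"])

# Label-encode the weather condition (rain, snow, fog, haze, ...)
df["weather_condition"] = df["weather_condition"].astype("category").cat.codes

# Min-max normalization, Eq. (13): z_i = (x_i - min(x)) / (max(x) - min(x))
numeric_cols = ["time_diff", "distance_traveled", "max_temp", "min_temp",
                "visibility", "hour", "day_of_week"]
df[numeric_cols] = (df[numeric_cols] - df[numeric_cols].min()) / (
    df[numeric_cols].max() - df[numeric_cols].min())
```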

4.4 Model training in LSTM

The input to each LSTM cell is a 3-dimensional (3D) matrix. The following briefly discusses each dimension:

1. Sample size: the sample size refers to how many rows are given as an input to the model. In this study we used a sample size of 32.

2. Time steps: a time step is one point of observation in the sample. The number of steps determines how many steps ahead in time the model will predict. We used one, two, three, and four time steps in our model.

3. Features: a detailed explanation of each feature is given in Table 2. Our model uses the time diff feature as the dependent feature (the output of the model), which specifies the difference between the scheduled and actual arrival times of the bus from the previous time stamp. We use 11 independent features as inputs to the model: Trip ID, Tag, Stop sequence, distance travelled, maximum temperature, minimum temperature, visibility, hour, day of week, service class, and weather conditions.

A useful property of neural networks is that when the model adjusts its weights, it can reduce the effect of irrelevant features during training by assigning them low weights. These features can still have a small negative influence on the model, which can decrease its overall accuracy. Only the features that gave us the highest accuracy were used in the final model. We performed an ablation study by removing one feature at a time and calculating the error rate of the model. From Table 2, we found Stop ID and Bus ID to be insignificant to our model; therefore, we excluded them. The other features showed a significant impact on the accuracy of the model.

In our LSTM model architecture, we use 12 input neurons, representing the number of features (11 independent features and 1 dependent feature) in the dataset used for modeling. The output layer has 1 neuron, which specifies the difference between the predicted and scheduled arrival times (i.e., delay or early arrival) for a bus at a stop. We tried different numbers of LSTM hidden layers and different numbers of LSTM cells within each layer. For the final model, we chose 1 hidden layer with 100 LSTM cells and the ReLU activation function (Goodfellow et al. 2016).

When the LSTM model starts training, a sequence of 3D samples (a 3D tuple) is given to an LSTM layer. The values of the sequence are (32, 1, 12). This means that in one iteration the model runs 32 samples (the batch size) to predict 1 time step ahead, using 12 input features (the 11 independent input features discussed previously and the previous reading of the dependent feature, time diff). In the next model iteration, each sample carries the cell state (weights) and a forget gate. The forget gate controls how much of the current cell state is passed to the next cell, thus ensuring that the model can learn longer sequences.
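The configuration above can be written down in a few lines of Keras. This is our reconstruction of the reported architecture (one hidden layer, 100 LSTM cells, ReLU, an input of 1 time step × 12 features, one output neuron), not the authors' published code.

```python
from tensorflow.keras import Sequential
from tensorflow.keras.layers import LSTM, Dense

lstm_model = Sequential([
    LSTM(100, activation="relu", input_shape=(1, 12)),  # (time steps, features)
    Dense(1),  # predicted difference between scheduled and actual arrival
])
lstm_model.compile(optimizer="adam", loss="mse")
```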

When training neural networks, several decisions need to be made regarding the choice of hyperparameters used by the model. We chose the following hyperparameters for our model (a training sketch follows the list):

1. Activation functions: non-linear mathematical functions used to combine the output of the neurons at one layer for the next layer. They are important for a neural network model because they introduce non-linear properties into the model. We experimented with different activation functions, such as linear, sigmoid and ReLU; for our final model we used the ReLU activation function.

2. Optimization algorithms: help to minimize (or maximize) an error function, and they are used to compute the output in such a way that it is computationally inexpensive and the model converges to the global minimum rather than a local one. We investigated the RMSprop and ADAM optimizers; for the final model we used the ADAM optimizer.

3. Epochs: specify how many full passes over the data set (epochs) should be used during training. If we use too few epochs, we may underfit the model and not allow it to learn everything it can from the training data. If we use too many epochs, we may overfit the model, which introduces noise into the model.

4. Early stopping: a regularization method used to prevent the model from over-fitting. Early stopping removes the need to manually tune the number of epochs while training a model: when the error rate of the model stops decreasing, training stops automatically. Another regularization method is dropout (Srivastava et al. 2014); we found that early stopping works best for our model.

5. Batch size: the number of samples that are propagated through the network in one iteration. The batch size can be less than or equal to the total number of training samples. An advantage of a small batch size is that it requires less memory for training. A small batch size also reduces the overall training time required by the model, which is important when working with large datasets because it is not possible to fit all of the data into memory at once. However, if the batch size is too small, it can lead to less accurate models because we do not provide a sufficient number of samples to the model, which leads to a less accurate estimate of the output.
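A sketch of the training call implied by these hyperparameter choices (the ADAM optimizer, early stopping in place of a hand-tuned epoch count, and a batch size of 32); the patience value, validation split and epoch cap are illustrative.

```python
import numpy as np
from tensorflow.keras.callbacks import EarlyStopping

early_stop = EarlyStopping(monitor="val_loss",   # stop when error stops decreasing
                           patience=10,          # illustrative patience value
                           restore_best_weights=True)

lstm_model.fit(
    np.reshape(X_train, (-1, 1, 12)), y_train,   # (samples, 1 time step, 12 features)
    validation_split=0.2,
    epochs=500,          # upper bound; early stopping usually halts sooner
    batch_size=32,
    callbacks=[early_stop],
)
```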

Table 3 shows the different configurations of the LSTM model that we tried in our experiments. For our final model, we used one hidden layer with 100 cells and a one-dimensional output representing the next arrival time. This configuration provides the best performance for both Route 28 and Route 8.

Table 3 Model tuning for LSTM (the ReLU, 1 layer, 100 cells, batch size 32 row is the final configuration used in the experiments)

Activation | Layers | Cells    | Batch size | Route 28 RMSE | Route 28 MAPE | Route 8 RMSE | Route 8 MAPE
ReLU       | 1      | 10       | 32         | 433.15        | 0.2           | 284.77       | 0.44
ReLU       | 1      | 50       | 32         | 427.87        | 0.14          | 277.97       | 0.45
ReLU       | 1      | 100      | 32         | 422.22        | 0.13          | 269.49       | 0.36
Linear     | 1      | 50       | 32         | 426.56        | 0.17          | 283.58       | 0.55
Linear     | 1      | 100      | 32         | 425.52        | 0.16          | 276.74       | 0.45
ReLU       | 1      | 100      | 64         | 426.24        | 0.23          | 275.76       | 0.41
Sigmoid    | 1      | 100      | 32         | 433.58        | 0.25          | 279.62       | 0.4
ReLU       | 3      | 40,80,40 | 32         | 427.68        | 0.31          | 283.56       | 0.54
ReLU       | 3      | 40,80,40 | 64         | 427.77        | 0.28          | 283.75       | 0.54
ReLU       | 2      | 40,40    | 64         | 431.50        | 0.2           | 279.32       | 0.5

5 Results/model performance

To measure the performance of our models, we calculate the Mean Absolute Percentage Error (MAPE) and the Root Mean Square Error (RMSE) on the testing data. All models were trained ten times, and the averages of the MAPE and RMSE error rates were taken as the final values for the models. These performance measures are defined as follows:

$$\mathrm{MAPE} = \frac{1}{n}\sum_{t=1}^{n}\left|\frac{y_t - x_t}{y_t}\right|$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n}(y_t - x_t)^2}$$

where $y_t$ is the actual value and $x_t$ is the predicted value. In our case, $y_t$ is the difference between the scheduled and actual arrival times, $x_t$ is the difference between the scheduled and predicted arrival times, and $n$ is the number of samples.
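Both measures are straightforward to compute; a small sketch matching the definitions above:

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean absolute percentage error (as a fraction, e.g. 0.13)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.mean(np.abs((y_true - y_pred) / y_true))

def rmse(y_true, y_pred):
    """Root mean square error, in the units of the target (seconds)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.sqrt(np.mean((y_true - y_pred) ** 2))
```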

Table 4 shows the MAPE and RMSE values of the different models for Route 28. The LSTM model substantially outperformed the other models, showing a seven-fold reduction in MAPE over the historical average. A possible reason the LSTM model performs better than the other models is that it may account more directly for the long-term dependencies between the input and output features. The LSTM model was also the best performing model for Route 8, as shown in Table 5.

Table 4 Comparison of different models for Route 28

       | Historical average | ARIMA  | SVR    | ANN    | LSTM
MAPE   | 0.91               | 0.80   | 0.68   | 0.30   | 0.13
RMSE   | 477.87             | 432.69 | 428.79 | 427.33 | 422.2

Table 5 Comparison of different models for Route 8

       | Historical average | ARIMA  | SVR    | ANN    | LSTM
MAPE   | 0.92               | 0.84   | 0.76   | 0.49   | 0.36
RMSE   | 292.38             | 286.64 | 279.01 | 278.69 | 269.49

We observe that the RMSE value for the LSTM model is not substantially lower than the baseline models. RMSE is sensitive to large outlying errors, which occurred in our data, and performs best when errors follow a normal distribution (Chai and Draxler 2014). Chai and Draxler (2014) suggest removing outliers that are larger than the other errors by several orders of magnitude. However, we did not need to remove outliers (i.e., extreme irregularities) because MAPE clearly showed that the LSTM model outperforms the other models, and the RMSE value of the LSTM model is lower than that of all the baseline models. In addition, we were interested in the impact of weather on extreme irregularities. In the next subsection, we investigate the performance of the LSTM model with and without the weather data.

5.1 Significance of the weather data

We investigated the impact of the weather data on the accuracy of our prediction models. When we ran our models with the weather data features (i.e., when we included the following features: maximum temperature, minimum temperature, visibility and weather conditions), we noticed a significant improvement in the results (see Table 6 for Route 28 and Table 7 for Route 8).

Table 6 Comparison of models with and without weather data for Route 28

       | LSTM without weather data | LSTM with weather data
MAPE   | 0.21                      | 0.13
RMSE   | 427.02                    | 422.2

Table 7 Comparison of models with and without weather data for Route 8

       | LSTM without weather data | LSTM with weather data
MAPE   | 0.43                      | 0.36
RMSE   | 279.11                    | 269.49

Figure 7 compares the actual arrival times versus the predicted arrival times with and without weather data for Route 28. The x-axis shows the ordered observations of bus arrivals at stops. As mentioned previously, we augment the bus trip travelling in one direction with the next trip of the same bus travelling in the opposite direction. This means the x-axis depicts the arrival of the bus at the first stop, followed by its arrival at the next stop; when the bus arrives at the last stop, it returns along the same route, so the next observation after the last stop is the next arrival of the same bus at the stop before the last stop. The y-axis is time in seconds. It can be observed from the plot that the model created with the weather data has better accuracy than the model created without it. In particular, we notice that the model created with the weather data was able to capture extreme delays and early arrivals better than the model created without it. We notice a similar trend for Route 8 (see Fig. 8).

[Fig. 7 Model performance of the LSTM model with and without weather data on Route 28]

[Fig. 8 Model performance of the LSTM model with and without weather data on Route 8]

Table 5 Comparison of diﬀerent models for Route 8

Bold values indicate that the ﬁnal values used in the experiments

Historical average ARIMA SVR ANN LSTM

MAPE 0.92 0.84 0.76 0.49 0.36

RMSE 292.38 286.64 279.01 278.69 269.49

Table 6 Comparison of models with and without weather data for

Route 28

Bold values indicate that the ﬁnal values used in the experiments

LSTM without weather

data

LSTM with weather data

MAPE 0.21 0.13

RMSE 427.02 422.2

Table 7 Comparison of models with and without weather data for

Route 8

Bold values indicate that the ﬁnal values used in the experiments

LSTM without weather

data

LSTM with weather

data

MAPE 0.43 0.36

RMSE 279.11 269.49

Fig. 7 Model performance of LSTM Model with and without weather data on Route 28

Predicting irregularities inarrival timesfortransit buses withrecurrent neural networks…

1 3

for 16% of the data, the model with the weather data has

much higher prediction accuracy when compared to the

model created without the weather data (see Table8). The

model accuracy improves with weather by 310% when we

compare RMSE and by 282% when we compare MAPE.

Table8 clearly demonstrates that weather plays a signiﬁ-

cant impact on the prediction accuracy for nearly half of

the data (49%). We observed similar results for Route 8,

where weather had higher impact (for nearly half of the

data, the model accuracy improved by more than 150%

as shown in Table9). The impact of weather decreases as

we see more data points because additional factors may

also contribute to bus arrival prediction, suggesting that

weather has complex non-linear relationship with bus

arrival times. Examples of these factors are traﬃc condi-

tions, construction zones, emergency vehicles, number of

passengers which we are planning to explore in future.

However, we will mitigate this issue by modelling weather

and arrival times in a separate RNN model as explained

by end of this section.

To investigate further how much impact each individual weather feature has on the model, we created three LSTM models, each removing one feature while keeping the others.

The first model removes visibility, the second model removes the weather conditions (rain, snow, haze, fog), and the third model removes temperature. Table 10 compares the different LSTM models as we remove different features from the model for Route 28. The MAPE value increases from 0.13 to 0.17 when we remove the visibility feature. Similarly, when we keep all the other features except the weather conditions, the MAPE value increases to 0.15. Removing temperature increases the MAPE value to 0.18. Similar observations were found for Route 8 (see Table 11). These results suggest that all the weather features we use in our models are important for achieving better prediction accuracy.

Table 10 Comparison of models with different weather features for Route 28 (each of the first three columns removes the named feature)

       | Visibility | Weather conditions | Temperature | All weather features
MAPE   | 0.17       | 0.15               | 0.18        | 0.13
RMSE   | 424.76     | 423.65             | 425.08      | 422.2

Table 11 Comparison of models with different weather features for Route 8

       | Visibility | Weather conditions | Temperature | All weather features
MAPE   | 0.42       | 0.40               | 0.42        | 0.36
RMSE   | 278.30     | 278.43             | 281.51      | 269.49

5.2 Multi-stop forecasting models

Apart from comparing different machine learning models, we also compared the accuracy of the LSTM model in predicting irregularities for multiple future stops in a trip (i.e., predicting the delays/early arrivals for the future arrivals of the bus after its immediate next scheduled arrival).

We created four different models: $s+1$, $s+2$, $s+3$ and $s+4$. The first model was discussed throughout the paper and predicts one stop ahead in time (i.e., given the historical arrival times and weather data for stop $s$, it predicts the irregularities for the next scheduled bus arrival at the next stop $s+1$). The second model predicts the irregularities for the bus arrival at stop $s+2$. Similarly, the third and fourth models predict irregularities for the bus arrivals at stops $s+3$ and $s+4$, respectively (a sketch of how these targets can be constructed follows below). Figures 9 and 10 show the comparison between the MAPE errors when predicting irregularities for multiple stops with and without the weather data.

It is clear from Figs. 9 and 10 that model performance decreases as we predict for multiple future stops ahead in time. This is similar to the findings of Duan et al. (2016), Hua et al. (2018) and Kormáksson et al. (2014). However, we found that when the weather data was excluded (the dotted lines), the rate of decrease in prediction accuracy grows as we predict for more future stops. This suggests that weather plays a significant role when predicting arrival times, or their irregularities, for multiple future stops.

[Fig. 9 Prediction accuracy with and without weather features for multiple stops for Route 28]

[Fig. 10 Prediction accuracy with and without weather features for multiple stops for Route 8]
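One simple way to construct the four targets is to shift the time diff column within each trip, so that the model for stop $s+k$ is trained against the deviation observed $k$ stops ahead; the DataFrame and column names below are illustrative.

```python
# Build s+1 ... s+4 targets from a DataFrame ordered by stop sequence
# within each trip; each model then trains on its own shifted target.
for k in range(1, 5):
    df[f"time_diff_s{k}"] = df.groupby("trip_id")["time_diff"].shift(-k)
```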

5.3 Modelling the weather feature with an RNN model

Since the previous experiments clearly established that weather has a significant influence on the prediction results, we decided to use this feature in a separate RNN model and combine its result with the LSTM model (which also includes the weather features, as discussed previously). The final prediction is the average of the two models. The architecture of the RNN model was discussed in Sect. 4. Our motivation was to investigate whether we can improve the prediction accuracy by creating a model dedicated to the weather. We trained and tested the RNN model with different hyperparameters, and finally tuned them as follows:

– learning rate = 0.001
– training epochs = 300
– batch size = 32
– display step = 1
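With both models trained, the hybrid prediction is simply the element-wise average of their outputs; the input names below are illustrative.

```python
# Hybrid prediction: average the LSTM (all features) and the
# weather-focused RNN (three inputs as defined in Sect. 4.2).
lstm_pred = lstm_model.predict(lstm_inputs)
rnn_pred = rnn_model.predict([x1_data, x2_data, x3_data])

hybrid_pred = (lstm_pred.ravel() + rnn_pred.ravel()) / 2.0
```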

Table 12 compares the performance of this model with the LSTM model for Route 28. The RMSE of our new hybrid model showed an improvement of 562.38% over the LSTM model for Route 28 for 82% of the data. For Route 8, the improvement was 873.85%, as shown in Table 13. We also noticed that the accuracy does not decrease when we add more data to the model, in contrast to the findings in Sect. 5.1. This could be because the RNN model focuses on the weather features, while the LSTM model includes other features along with the weather. In other words, in a small portion of the data, the weather condition played a significant role in improving the prediction results of the LSTM model; when a separate RNN model is used for the weather, its accuracy improvements extended to larger segments of the data.

Table 12 Difference in RMSE between the LSTM model and our LSTM+RNN (weather) model for Route 28

% Data | LSTM+RNN | LSTM   | %
49%    | 12.97    | 34.27  | 264.23%
66%    | 18.55    | 55.63  | 299.90%
82%    | 20.65    | 116.13 | 562.38%

Table 13 Difference in RMSE between the LSTM model and our LSTM+RNN (weather) model for Route 8

% Data | LSTM+RNN | LSTM   | %
49%    | 7.28     | 14.32  | 196.71%
66%    | 5.84     | 22.98  | 393.50%
82%    | 3.48     | 30.41  | 873.85%

6 Conclusion

Nowadays, complex machine learning algorithms can be applied quickly over large datasets, thanks to advances in the area of big data analytics. This paper investigates different prediction models for irregularities in bus arrival times using machine learning algorithms. In particular, we built Long Short-Term Memory recurrent neural network models to predict the next arrival time of a bus at a particular stop. Our prediction models use historical bus arrival data, i.e. real-time GPS locations for Toronto transit buses, bus schedules obtained from a Google API, and weather condition data obtained from a weather station in Toronto. Our analysis shows that Toronto transit buses experience significant irregularities in arrival times: in nearly 37% of scheduled arrival times, transit buses are either delayed or arrive early by more than 5 min, showing great room for improvement. To our knowledge, this is the first work to investigate the impact of weather on bus arrival prediction. We found that weather plays a significant role in improving prediction accuracy. Therefore, we built a prediction model that combines two machine learning models: an LSTM model that focuses on a range of input features, e.g. arrival times and hour of the day, and an RNN model that focuses on the weather features. We also investigated the prediction accuracy for multiple scheduled arrivals of buses ahead in time using weather data. In the future, we plan to collect more data in order to run our experiments over an entire year: our current study covers the winter season and the beginning of the spring season in Toronto, and we plan to extend it to cover all weather seasons. In addition, we plan to extend our work on bus arrival prediction by using machine learning algorithms with additional datasets, such as passenger counts and traffic conditions. Furthermore, we plan to use different RNN extensions, such as the Gated Recurrent Unit (GRU) (Cho et al. 2014; Che et al. 2016).

References

Balasubramanian P, Rao KR (2015) An adaptive long-term bus arrival time prediction model with cyclic variations. J Public Transport 18:1–18. https://doi.org/10.5038/2375-0901.18.1.6

Basyir M, Nasir M, Suryati S, Mellyssa W (2017) Determination of nearest emergency service office using haversine formula based on android platform. EMITTER Int J Eng Technol 5(2):270–278

Boser BE, Guyon IM, Vapnik VN (1992) A training algorithm for optimal margin classifiers. In: Proceedings of the fifth annual workshop on computational learning theory, ACM, New York, NY, USA, COLT '92, pp 144–152. https://doi.org/10.1145/130385.130401

Chai T, Draxler RR (2014) Root mean square error (RMSE) or mean absolute error (MAE)? Arguments against avoiding RMSE in the literature. Geosci Model Dev 7(3):1247–1250. https://doi.org/10.5194/gmd-7-1247-2014

Chang H, Park D, Lee S, Lee H, Baek S (2010) Dynamic multi-interval bus travel time prediction using bus transit data. Transportmetrica 6(1):19–38

Che Z, Purushotham S, Cho K, Sontag D, Liu Y (2016) Recurrent neural networks for multivariate time series with missing values. Sci Rep. https://doi.org/10.1038/s41598-018-24271-9

Chen M, Liu X, Xia J, Chien SIJ (2004) A dynamic bus-arrival time prediction model based on APC data. Comput Aided Civ Infrastruct Eng 19:364–376. https://doi.org/10.1111/j.1467-8667.2004.00363.x

Chien SIJ, Ding Y, Wei C (2002) Dynamic bus arrival time prediction with artificial neural networks. J Transport Eng 128(5):429–438. https://doi.org/10.1061/(ASCE)0733-947X(2002)128:5(429)

Cho K, van Merriënboer B, Gülçehre Ç, Bahdanau D, Bougares F, Schwenk H, Bengio Y (2014) Learning phrase representations using RNN encoder–decoder for statistical machine translation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Association for Computational Linguistics, Doha, Qatar, pp 1724–1734. http://www.aclweb.org/anthology/D14-1179

Chopde NR, Nichat MK (2013) Landmark based shortest path detection by using A* and haversine formula. Int J Innov Res Comput Commun Eng 1(2):298–302

Drucker H, Burges CJC, Kaufman L, Smola AJ, Vapnik V (1997) Support vector regression machines. In: Mozer MC, Jordan MI, Petsche T (eds) Advances in neural information processing systems 9, MIT Press, Cambridge, pp 155–161. http://papers.nips.cc/paper/1238-support-vector-regression-machines.pdf

Duan Y, Lv Y, Wang FY (2016) Travel time prediction with LSTM neural network. In: 2016 IEEE 19th international conference on intelligent transportation systems (ITSC), pp 1053–1058

Fu L, Yang X (2002) Design and implementation of bus-holding control strategies with real-time information. Transp Res Rec 1791(1):6–12

Fu R, Zhang Z, Li L (2016) Using LSTM and GRU neural network methods for traffic flow prediction. pp 324–328. https://doi.org/10.1109/YAC.2016.7804912

Goodfellow I, Bengio Y, Courville A (2016) Deep learning. The MIT Press, Cambridge

Hua X, Wang W, Wang Y, Ren M (2018) Bus arrival time prediction using mixed multi-route arrival time data at previous stop. Transport 33(2):543–554

Ingole P, Nichat MMK (2013) Landmark based shortest path detection by using Dijkstra algorithm and haversine formula. Int J Eng Res Appl (IJERA) 3(3):162–165

Jeong R, Rilett R (2004) Bus arrival time prediction using artificial neural network model. In: Proceedings of the 7th international IEEE conference on intelligent transportation systems (IEEE Cat. No. 04TH8749), pp 988–993. https://doi.org/10.1109/ITSC.2004.1399041

Kalman RE (1960) A new approach to linear filtering and prediction problems. Trans ASME J Basic Eng 82(Series D):35–45

Ke J, Zheng H, Yang HXC (2017) Short-term forecasting of passenger demand under on-demand ride services: a spatio-temporal deep learning approach. Transport Res Part C Emerg Technol. https://doi.org/10.1016/j.trc.2017.10.016

Kormáksson M, Barbosa L, Vieira MR, Zadrozny B (2014) Bus travel time predictions using additive models. In: 2014 IEEE international conference on data mining, pp 875–880. https://doi.org/10.1109/ICDM.2014.107

Kumar V, Kumar BA, Vanajakshi L, Subramanian SC (2014) Comparison of model based and machine learning approaches for bus arrival time prediction. Transportation Research Board 93rd Annual Meeting. http://docs.trb.org/prp/14-2518.pdf

Liu L, Chen RC, Zhao Q, Zhu S (2019) Applying a multistage of input feature combination to random forest for improving MRT passenger flow prediction. J Ambient Intell Hum Comput 10(11):4515–4532

Liu Z, Yan Y, Qu X, Zhang Y (2013) Bus stop-skipping scheme with random travel time. Transport Res Part C Emerg Technol 35:46–56. https://doi.org/10.1016/j.trc.2013.06.004

Nextbus. NextBus public feed. https://www.nextbus.com/xmlFeedDocs/NextBusXMLFeed.pdf. Accessed 2020

Patnaik J, Chien S, Bladikas A (2004) Estimation of bus arrival times using APC data. J Public Transp 7(1):1

Shalaby A, Farhan A (2004) Prediction model of bus arrival and departure times using AVL and APC data. J Public Transport 7(1):41–61. https://doi.org/10.5038/2375-0901.7.1.3

Srivastava N, Hinton G, Krizhevsky A, Sutskever I, Salakhutdinov R (2014) Dropout: a simple way to prevent neural networks from overfitting. J Mach Learn Res 15(1):1929–1958. http://dl.acm.org/citation.cfm?id=2627435.2670313

Star TT (2020) TTC gives notes for affected customers arriving late for work. https://www.thestar.com/news/gta/2017/12/01/late-for-work-the-ttc-can-give-you-a-note-for-that.html. Accessed 2020

Sutskever I, Vinyals O, Le QV (2014) Sequence to sequence learning with neural networks. In: Ghahramani Z, Welling M, Cortes C, Lawrence ND, Weinberger KQ (eds) Advances in neural information processing systems 27, Curran Associates, Inc., pp 3104–3112. http://papers.nips.cc/paper/5346-sequence-to-sequence-learning-with-neural-networks.pdf

Tan PN, Steinbach M, Kumar V (2005) Introduction to data mining, 1st edn. Addison-Wesley Longman Publishing Co., Inc., Boston

Veness C (2018) Movable type scripts: calculate distance, bearing and more between latitude/longitude points. https://www.movable-type.co.uk/scripts/latlong.html

Wang B, Huang J, Xu J (2019) Capacity optimization and allocation of an urban rail transit network based on multi-source data. J Ambient Intell Hum Comput 10(1):373–383

Wang J, Chen X, Guo S (2009) Bus travel time prediction model with v-support vector regression. In: 2009 12th international IEEE conference on intelligent transportation systems, pp 1–6

Xu J, Wu Y, Jia L, Qin Y (2020) A reckoning algorithm for the prediction of arriving passengers for subway station networks. J Ambient Intell Hum Comput 11(2):845–864

Yang M, Chen C, Wang L, Yan X, Zhou L (2016) Bus arrival time prediction using support vector machine with genetic algorithm. Neural Netw World 26:205–217. https://doi.org/10.14311/NNW.2016.26.011

Zhang P, Qi M (2005) Neural network forecasting for seasonal and trend time series. Eur J Oper Res 160:501–514. https://doi.org/10.1016/j.ejor.2003.08.037

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
