👀Online Video! https://youtu.be/yfq6nr8XK_Y 👩‍🏫 In this paper, we investigate the day-to-day regularity of urban congestion patterns. We first partition link speed data every 10 min into 3D clusters that propose a parsimonious sketch of the congestion pulse. We then gather days with similar patterns and use consensus clustering methods to produce a unique global pattern that fits multiple days, uncovering the day-to-day regularity. We show that the network of Amsterdam over 35 days can be synthesized into only 4 consensual 3D speed maps with 9 clusters. This paves the way for a cutting-edge systematic method for travel time predictions in cities. By matching the current observation to historical consensual 3D speed maps, we design an efficient real-time method that successfully predicts 84% trips travel times with an error margin below 25%. The new concept of consensual 3D speed maps allows us to extract the essence out of large amounts of link speed observations and as a result reveals a global and previously mostly hidden picture of traffic dynamics at the whole city scale, which may be more regular and predictable than expected.
Travel time estimation based on congestion patterns. (a) Map of the probe trips (b) Travel time estimation errors for all probe trips and all training days considering the three estimation methods: M1, link speed is the mean speed in the original cluster; M2, link speed is the mean speed in the consensus cluster; M3, same as M2 but the mean speed is calculated over all days of the same group (c) Estimated vs. experienced travel times for the 10 probe trips, one validation day and a departure time equal to 9 am (d) Estimated vs. experienced travel times for the 10 probe trips, all validation days and all departure times (e) Distribution of the travel time estimation errors for the 10 probe trips, all validation days and all departure times. (b) Shows that travel time errors are in most cases relatively low. Averaging speed within each cluster has the highest contribution to errors. Interestingly, using the consensus cluster shape (M1 → M2) and the average of all days within a group (M2 → M3) have very little impact on errors. (c–e) Show that travel time predictions based on assigning a new day to a historical group and using the consensus cluster shape and the mean cluster speed of the group are very good for most probe trips.
Scientific Reports | 7: 14029 | DOI:10.1038/s41598-017-14237-8
www.nature.com/scientificreports
Revealing the day-to-day regularity
of urban congestion patterns with
3D speed maps
Clélia Lopez1, Ludovic Leclercq1, Panchamy Krishnakumari2, Nicolas Chiabaut1 & Hans van Lint2
In this paper, we investigate the day-to-day regularity of urban congestion patterns. We first partition
link speed data every 10 min into 3D clusters that propose a parsimonious sketch of the congestion
pulse. We then gather days with similar patterns and use consensus clustering methods to produce a
unique global pattern that fits multiple days, uncovering the day-to-day regularity. We show that the
network of Amsterdam over 35 days can be synthesized into only 4 consensual 3D speed maps with 9
clusters. This paves the way for a cutting-edge systematic method for travel time predictions in cities.
By matching the current observation to historical consensual 3D speed maps, we design an efficient
real-time method that successfully predicts 84% of trip travel times with an error margin below 25%.
The new concept of consensual 3D speed maps allows us to extract the essence out of large amounts of
link speed observations and as a result reveals a global and previously mostly hidden picture of traffic
dynamics at the whole city scale, which may be more regular and predictable than expected.
Studying human mobility in large cities is critical for multiple applications from transportation engineering to
urban planning and economic forecasting. In recent years, the availability of new data sources, e.g. mobile-phone
records and global-positioning-system data, has generated new empirically driven insights on this topic. A central
question at large spatial and temporal scales is which (dynamic) components of human mobility can be consid-
ered as predictable and thus suitable for explanatory and predictively valid mathematical models, and which part
is unpredictable. Earlier studies of human trips show that traveled distance can be described by random walks
and more precisely as Lévy flights1. Later studies partly amend this theory by recognizing some regularity features
in people's trips. Individuals obviously frequently move between specific locations, such as home or work2.
Such patterns are also regular in time3,4, meaning that the most frequent locations are likely to be correlated with
daily hours and dates. Regularity can also come from decomposition by transportation modes5. Human mobility
can be studied at the microscopic level, i.e. through person trajectories, but also at the macroscopic level, for
example by estimating commuting flows between different regions (origins to destinations) or on the different
links of a transportation network6,7. Such collective mobility patterns can be explained for example by distances
between regions8,9, trip purposes10 and road attractiveness related to road types, e.g. freeways, or locations, e.g.
in major business districts11. Predicting commuting flows often requires local data for calibration12, meaning that
results cannot easily be transferred to other regions or cities. Recent findings13, however, show that a scale-free
approach corresponding to an extension of the radiation model can successfully be applied to commuting flow
estimation. This means that some regular patterns can also be observed at the macroscopic level.
In this paper, we aim to pursue the investigation of regularity in macroscopic mobility patterns, not by focusing
on the commuting flow distributions but on the resulting level of service of the transportation (road) network,
i.e. on congestion patterns. Along with commuting flows, congestion patterns vary both within days and from
day-to-day at large urban scales. It is common knowledge that some regularity happens as congestion is usually
observed during peak hours on the most critical links of the network. In contrast to commuting flows, congestion
patterns are more easily observed using real data as they only require speed information in the different network
links. Nowadays, such information is easily accessible through different sensing technologies that are massively
deployed in many cities. However, in large networks with speed data on hundreds (or thousands) of links over a
large number of time periods, studying regularity and identifying distinct network congestion patterns is not an
1Univ. Lyon, IFSTTAR, ENTPE, LICIT, Lyon, F-69675, France. 2Delft University of Technology, CITG, Delft, N-2600GA,
The Netherlands. Correspondence and requests for materials should be addressed to L.L. (email: ludovic.leclercq@ifsttar.fr)
Received: 12 July 2017
Accepted: 6 October 2017
Published: xx xx xxxx
Content courtesy of Springer Nature, terms of use apply. Rights reserved
easy task to undertake: the challenge is to see the forest (regular large-scale traffic patterns) for the trees (many
local pockets of queuing and congestion spillback processes). Here, we propose a new concept to address this
challenge. We synthesize within-day link speed data and simplify day-to-day comparisons by means of so-called
spatio-temporal speed cluster maps. Such 3D speed maps consist of a joint partition of space (road network
links) and time (the different observations) into homogeneous clusters characterized by a constant mean speed.
More precisely, such a partitioning should fulfill the following criteria: (i) all clusters should contain a single
connected graph component, meaning that all links are reachable within a cluster, (ii) the internal speed variance for
all clusters should be minimized - the intra-cluster homogeneity criterion and (iii) the difference in speed between
neighboring clusters should be maximized - the inter-cluster dissimilarity criterion.
Clustering is a common problem in different fields of engineering such as data mining14 or image segmentation15.
Two recent and significant contributions in transportation for our work are (i) the application of the
k-means algorithm16 to partition urban networks by considering spatial locations of the road as new features in
the data and (ii) the definition of a similarity matrix between observations and the application of the Ncut
algorithm17. These works result in 2D clusters, covering a spatial portion of transportation networks for a given time
period. To obtain a picture of the traffic dynamics over different time periods, the algorithms are simply iterated
for each time period without connecting the 2D clusters. Note that usual clustering works in transportation also
include compactness as a requirement for clusters. This is because the main application is perimeter control. In
this paper we present an algorithm that directly unravels traffic dynamics over both space and time. We favor
connectivity - requirement (i) - rather than compactness for clusters, which makes more sense in 3D. To this
end, we first determine which clustering method is the most efficient to cluster all time-dependent link speed
observations into 3D speed maps, where we consider the intra-cluster homogeneity and inter-cluster dissimilarity
criteria as well as the computational times to determine the optimal number of clusters. Second, we apply
consensus learning techniques18,19 to summarize multiple 3D speed maps from a training set of days into a single
common pattern. Interestingly, such a meta-partitioning operation can be fulfilled with a very small number of
groups. This means that the day-to-day regularity of daily congestion patterns can be easily revealed based on
such a classification. Finally, we will show that using a single consensus pattern for each class of 3D congestion
maps is sufficient to accurately estimate travel times in the city in real time. This means that addressing congestion
patterns directly at the whole city scale for all time intervals reveals a meaningful and accurate global picture of
the city traffic dynamics that can be used as an efficient alternative to classical methods that process much more
data at local and short-term scales.
Results
Our case study corresponds to most of the major street network of Amsterdam city excluding the freeways, see
Fig. 1(a). Whereas the original mapping of the inner city network contains over 7512 links, it is coarsened in this
paper to 208 links and 214 nodes. Such an operation basically merges all successive links in the same direction
between two intersections into a single one and disregards the internal links in the original mapping for
intersections, see the method section. Mean speed information is available every 10 min between 7 am and 3 pm for all
208 links during 35 days. This information is derived from license plate recognition systems at different critical
points of the network. The methodology to derive link speed data from passing times, coarsen the network, and
reconstruct missing data has already been published20. It should be noted that all the methods elaborated in this
paper can be applied to any set of time-dependent link speed data combined with the related connected graph
(contiguous time intervals for the same network link should be connected by an edge) whatever the initial sensing
method is.
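To make the "related connected graph" concrete, the following Python sketch builds an undirected space-time graph whose nodes are (link, time period) observations and checks whether a set of observations forms a single connected component, which is requirement (i) on clusters. It is only an illustration under stated assumptions: the names `build_space_time_graph` and `is_connected` are hypothetical, and the road adjacency is assumed to be given as a plain dictionary that ignores link directions.

```python
from collections import defaultdict, deque

def build_space_time_graph(road_neighbors, n_periods):
    """Undirected graph whose nodes are (link, period) observations.

    road_neighbors: dict link -> iterable of spatially adjacent links
                    (upstream and downstream, direction ignored).
    """
    graph = defaultdict(set)
    for link, neighbors in road_neighbors.items():
        for t in range(n_periods):
            graph[(link, t)]  # register the node even if it has no neighbor
            for other in neighbors:       # spatial edges within one period
                graph[(link, t)].add((other, t))
                graph[(other, t)].add((link, t))
            if t + 1 < n_periods:         # temporal edge to the next period
                graph[(link, t)].add((link, t + 1))
                graph[(link, t + 1)].add((link, t))
    return graph

def is_connected(graph, nodes):
    """True if `nodes` induce a single connected component of `graph`."""
    nodes = set(nodes)
    if not nodes:
        return True
    start = next(iter(nodes))
    seen, queue = {start}, deque([start])
    while queue:                          # breadth-first search inside `nodes`
        current = queue.popleft()
        for nxt in graph[current]:
            if nxt in nodes and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen == nodes

# Toy network: three links in a chain a - b - c, observed over two periods.
road = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
g = build_space_time_graph(road, n_periods=2)
print(is_connected(g, [("a", 0), ("b", 0), ("b", 1), ("c", 1)]))
print(is_connected(g, [("a", 0), ("c", 1)]))
```

In the toy example the first set is connected through the temporal edge of link b, while the second set is not, because a and c are neither adjacent in space nor the same link at consecutive periods.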
Clustering results for individual days. So, the initial data for a particular day is an undirected graph in
which links are connected in space with their upstream and downstream neighbors following the road network,
and in time by their immediate neighbors, i.e. the previous and the next time intervals for a given link. Each link
is characterized by a spatial (x, y) position, a time and a speed value. Link directions are not considered during
the clustering process because changes in traffic volume propagate forward while congestion propagates backward
and we want to capture both phenomena. To obtain the 3D speed map related to such data, we first benchmark
different clustering algorithms from the literature. We choose to compare the most recent development in
clustering for transportation networks, i.e. the Ncut algorithm with snake similarity, also referred to as S-Ncut17
(see supplementary S1 for more details), with two simpler clustering algorithms, the k-means21 and DBSCAN22
algorithms, see the method section. The main difference between these is that S-Ncut uses network topology
when calculating the similarities between observations, whereas the two other methods simply use normalized
Euclidean distances (regardless of topology) to balance space, time and speed values. Note that we weigh speed
three times more heavily (α = 3) compared to space and time (vicinity) since our objective is to obtain clusters
with a narrow speed distribution, see supplementary S2 for more rationale about the choice of α. The quality
of the clustering results is assessed for a given number of clusters n through two indicators that relate to the
intra-cluster homogeneity and the inter-cluster dissimilarity criteria respectively: the total within cluster variance
(TVn) and the connected cluster dissimilarity (CCDn).
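$$TV_n = \frac{\sum_{i=1}^{n} n_i s_i^2}{s^2 \sum_{i=1}^{n} n_i}; \qquad CCD_n = \frac{\sum_{i=1}^{n}\sum_{k=i+1}^{n} \delta_{ik}\sqrt{n_i n_k}\,|\bar{x}_i - \bar{x}_k|}{\sum_{i=1}^{n}\sum_{k=i+1}^{n} \delta_{ik}\sqrt{n_i n_k}} \qquad (1)$$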
where ni is the number of links in cluster i, x̄i and si are respectively the mean and the standard deviation of link
speeds for cluster i, δik is equal to 1 only if clusters i and k have a common border and s is the standard deviation
of link speeds for the whole network. Since we also impose that each cluster should contain a single connected
graph component, clustering results should be post-processed, see supplementary S3. Note that S-Ncut results,
even though the method includes topological considerations to calculate similarity between observations, also
require post-processing, see supplementary S1. Post-processing has very little impact on TVn and CCDn values
for S-Ncut. It deteriorates TVn values and to a lesser extent also CCDn values for DBSCAN and k-means methods,
see supplementary S3. This is not surprising as these two methods only account for proximity (distance between
links) and not for connectivity within a cluster. In the end, what is important to assess the quality of a method is
to compare TVn and CCDn values after post-processing, when we are sure that connectivity - requirement (i) - is
verified.
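The two quality indicators can be sketched in a few lines of Python. This is an illustration under explicit assumptions, not the authors' implementation: TVn is taken as the size-weighted within-cluster variance divided by the variance of the whole network, CCDn as the mean absolute difference of cluster mean speeds over bordering cluster pairs weighted by sqrt(ni nk), and population (not sample) variances are used. The function names and the `borders` argument (the list of cluster pairs with δik = 1, each pair listed once) are hypothetical.

```python
from math import sqrt
from statistics import mean, pvariance

def _group(speeds, labels):
    """Collect the speed observations of each cluster ID."""
    clusters = {}
    for v, c in zip(speeds, labels):
        clusters.setdefault(c, []).append(v)
    return clusters

def total_variance(speeds, labels):
    """TVn (assumed form): sum_i n_i s_i^2 / (s^2 * sum_i n_i)."""
    clusters = _group(speeds, labels)
    within = sum(len(vs) * pvariance(vs) for vs in clusters.values())
    return within / (pvariance(speeds) * len(speeds))

def connected_cluster_dissimilarity(speeds, labels, borders):
    """CCDn (assumed form): weighted mean of |mean_i - mean_k| over bordering pairs.

    borders: iterable of (i, k) cluster pairs sharing a border, each listed once.
    """
    clusters = _group(speeds, labels)
    num = den = 0.0
    for i, k in borders:
        weight = sqrt(len(clusters[i]) * len(clusters[k]))
        num += weight * abs(mean(clusters[i]) - mean(clusters[k]))
        den += weight
    return num / den

# Toy example: four observations (speeds in m/s) split into two bordering clusters.
speeds = [10.0, 12.0, 2.0, 4.0]
labels = [0, 0, 1, 1]
print(total_variance(speeds, labels))
print(connected_cluster_dissimilarity(speeds, labels, borders=[(0, 1)]))
```

A lower TVn indicates more homogeneous clusters, while a higher CCDn (in the unit of the speeds) indicates neighboring clusters that differ more in mean speed.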
Clustering results after post-processing are presented for a randomly selected day among the 35 available
in Fig. 1(b–f). The evolutions of TVn in Fig. 1(b) and CCDn in Fig. 1(c) are comparable for all three methods,
although k-means can be identified as the best method to minimize TVn, and DBSCAN appears slightly more
efficient in maximizing CCDn. DBSCAN also appears to provide more stable (i.e. monotonically decreasing) results
for increasing cluster numbers than the other two. However, the TVn and CCDn values are not sufficiently
different to provide conclusive evidence that one method is better than the other two. What can be concluded is that the
S-Ncut algorithm has much higher computational times than the other two, which disqualifies the method since
clustering has to be repeated for multiple different days. Both k-means and DBSCAN are over 20 times faster than
S-Ncut on the same computer, see Fig. 1(f). Finally, Fig. 1(b,c) highlight that improvements to TVn and CCDn
values tend to diminish significantly when the number of clusters exceeds 9 to 10. This means that for this particular
day, the optimal number of clusters can be fixed to 9. The resulting 3D speed map is presented in Fig. 1(d). A 3D
video is also available on the data repository website, see additional information. In Fig. 1(e) a slice at time t = 9 am
is shown to illustrate the clustering results in detail. Note that links from the same cluster may look disconnected
because of the slicing but they are of course connected through time links and different time periods.
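As an illustration of how a distance-based method such as k-means can be applied to (x, y, time, speed) observations, here is a minimal pure-Python sketch. Several choices are assumptions made for the example and are not taken from the paper: min-max normalization of each feature, α applied as a multiplicative factor on the normalized speed, deterministic farthest-point seeding, and no connectivity post-processing. The names `build_features` and `kmeans` are hypothetical.

```python
import math

def build_features(observations, alpha=3.0):
    """Min-max normalize (x, y, t, speed) columns and weight speed by alpha."""
    columns = list(zip(*observations))
    scaled = []
    for col in columns:
        lo, hi = min(col), max(col)
        span = (hi - lo) or 1.0           # avoid dividing by zero on flat columns
        scaled.append([(v - lo) / span for v in col])
    scaled[3] = [alpha * v for v in scaled[3]]   # speed is the 4th feature
    return [list(row) for row in zip(*scaled)]

def kmeans(points, k, n_iter=100):
    """Plain Lloyd iterations with deterministic farthest-point seeding."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points,
                           key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = None
    for _ in range(n_iter):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:          # assignments are stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:                   # keep the old center if a cluster empties
                centers[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels

# Toy day: (x, y, time index, speed in m/s) for four observations.
obs = [(0, 0, 0, 10.0), (1, 0, 0, 10.5), (0, 1, 1, 2.0), (1, 1, 1, 2.5)]
labels = kmeans(build_features(obs, alpha=3.0), k=2)
print(labels)
```

Because speed is weighted more heavily than position and time, the two fast observations and the two slow observations end up in separate clusters in this toy case.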
Figure2 now presents the clustering results for all 35 days. Figure2(a) shows that S-Ncut and k-means gen-
erally outperform the DBSCAN method with lower TVn values. e score on CCDn values is much less decisive.
However, when reducing the number of clusters to 9, and testing all methods with this same number of clusters,
k-means clearly outperforms the other methods over all 35 days. Interestingly, when comparing Fig.2(b) to
Fig.2(a), one can observe that for this relatively low number of clusters (9), using k-means results in TVn values
that are very close to the best results obtained with any of the other two methods for larger number of clusters.
Figure2(c) and (d) provide a direct comparison of the three methods with respect to minimizing TVn and maxi-
mizing CCDn for n = 9. e k-means method generates a distribution of TVn values for all days that is signicantly
Figure 1. Link speed 3D clustering for one particular day. (a) Sketch of the investigated network - Amsterdam
city (NL) - MapData ©2017 Google (b,c) Evolution of the total variance (TVn) and the connected cluster
dissimilarity (CCDn) with respect to the number of clusters for different clustering methods (d) Resulting 3D
speed maps for 9 clusters (e) Slice of the 3D speed map for time period t = 9 am (f) Computational times for
different clustering methods and a targeted number of clusters equal to 9. Graphs (b,c,f) show that the clustering
algorithms that do not consider the graph topology, i.e. k-means and DBSCAN, far outperform S-Ncut in terms
of computational times with analogous TVn and CCDn results. DBSCAN appears very stable when the number
of clusters exceeds 6. Selecting 9 clusters looks optimal for this dataset and network configuration.
better (lower) than both other methods. The distribution of CCDn with k-means is not the best (the highest), but
it is very close to what is obtained with the best method for this indicator, i.e. S-Ncut, see Fig. 2(d). Since
k-means is the most economical method in terms of computational cost, we can conclude that it must be favored
to obtain 3D speed maps in this case. Furthermore, the results provide evidence to fix the optimal number of 3D
clusters to 9 for our case study.
Classification of multiple days to identify consensual congestion patterns. Now our objective
is to find commonalities in the 35 daily congestion patterns, and, ideally, summarize these with a smaller number
of “consensual” patterns. To this end, we first have to define a common link network for all the 35 days, see
supplementary S4. This is necessary because some links may have insufficient observations on particular days to be
assigned a significant value. The procedure has 3 main steps as outlined in Fig. 3(a). In step 1 we obtain 3D
speed maps related to each daily pattern, by running the k-means algorithm with 9 targeted clusters over all the
35 days of the dataset. After this, each observation, i.e. a pair composed of a link and a time period, is assigned
a cluster ID i. Each day k can then be synthesized into a single ordered vector of all observations πk, whose values
are the cluster IDs. To compare two different days πk and πl and assess if their 3D speed maps have similar shapes,
we use the normalized mutual information (NMI) indicator. It has been designed to assess the proximity between
two clustering results18,23.
$$NMI(\pi_k, \pi_l) = \frac{I(\pi_k, \pi_l)}{\sqrt{H(\pi_k)H(\pi_l)}} = \frac{H(\pi_k) + H(\pi_l) - H(\pi_k, \pi_l)}{\sqrt{H(\pi_k)H(\pi_l)}} \qquad (2)$$
where I(πk, πl) is the mutual information between πk and πl, which measures the mutual dependence between two
random variables18, H(πk) is the entropy of πk and H(πk, πl) is the joint entropy of πk and πl. Calculating the NMI
for all pairs of days allows us to define a similarity matrix. We can then classify the whole set of days using the Ncut
algorithm15, see step 2 in Fig. 3(a). More specifically, we apply a classical cross-validation approach by randomly
splitting our 35 days into a training set of 28 days and a validation set of 7 days and considering 12 replications in
total. The purpose of the validation set will be explained later. We test a partition of the 28 training days into 2 and
4 groups for all replications of the training set. It appears in all cases that 4 groups lead to better results, see
supplementary S5. All four groups appear homogeneous with high mean NMI values inside the same group (usually
higher than 0.6) and low differences between the maximum and the minimum NMI values (usually below 0.24).
When looking at the day labels (Monday, …) within the four groups, no clear pattern appears. The major
Figure 2. Clustering results for all 35 days. (a) Clustering efficiency with respect to the number of clusters
(b) Clustering efficiency for a number of clusters equal to 9 (c,d) TVn and CCDn values (respectively) for all
days and 9 clusters. (a) Shows that S-Ncut provides the best results compared to the other two methods when
the number of clusters is large (above 15). However, when the number of clusters is reduced to 9 (b), k-means
provides in general the lowest TVn values while leading to CCDn values similar to those of S-Ncut and DBSCAN. This
is confirmed by (c) and (d), which show the TVn and CCDn distributions for all methods when the number of clusters is
9. More interestingly, by comparing (b) and (a), it appears that k-means with only 9 clusters usually leads to results
close to those of S-Ncut with a significantly higher number of clusters. So, we define 9 as the optimal number
of clusters for all days.
conclusion at this stage is that the 28 days can be classified into only 4 groups, which exhibit close 3D speed map
shapes. We are now going to adjust the cluster shapes of the days belonging to the same group to obtain a unique
consensual shape that can be applied within the group.
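The NMI-based comparison of daily label vectors can be illustrated with a short Python sketch. It assumes the geometric-mean normalization, NMI = (H(πk) + H(πl) − H(πk, πl)) / sqrt(H(πk) H(πl)), with natural logarithms; returning 0 when one vector has zero entropy is a convention chosen for this example only. The function names are hypothetical, and the subsequent Ncut classification of the similarity matrix is not shown.

```python
from collections import Counter
from math import log, sqrt

def entropy(labels):
    """Shannon entropy (natural log) of a label vector."""
    n = len(labels)
    return -sum(c / n * log(c / n) for c in Counter(labels).values())

def nmi(pi_k, pi_l):
    """Normalized mutual information between two label vectors of equal length."""
    h_k, h_l = entropy(pi_k), entropy(pi_l)
    h_joint = entropy(list(zip(pi_k, pi_l)))   # entropy of the paired labels
    if h_k == 0.0 or h_l == 0.0:
        return 0.0        # convention for this sketch: single-cluster vector
    return (h_k + h_l - h_joint) / sqrt(h_k * h_l)

def similarity_matrix(days):
    """Pairwise NMI between the label vectors of all days."""
    return [[nmi(a, b) for b in days] for a in days]

# Three toy "days", each a vector of cluster IDs over the same observations.
days = [[0, 0, 1, 1], [1, 1, 0, 0], [0, 1, 0, 1]]
for row in similarity_matrix(days):
    print([round(v, 3) for v in row])
```

Note that NMI is insensitive to how cluster IDs are numbered: the first two toy days describe the same partition with swapped IDs and obtain an NMI of 1, whereas the third partition is statistically independent of the first and obtains an NMI of 0 (up to floating-point rounding).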
The consensus clustering problem consists in identifying the most representative partition from a group of
partitions18,24. The best of K (BOK) algorithm19 can be used to determine the median partition m (the 3D speed
map shape of a single day in our case) that maximizes the total similarity TS with all the other days belonging to
the same group:
$$TS = \sum_{k=1}^{a} NMI(\pi_m, \pi_k) \qquad (3)$$
where a is the number of days in the targeted group, πm is the vector resulting from the initial clustering (3D speed
map) for the median partition, and πk the same vector but for each other day of the same group. The median
partition can be further improved (increasing TS) by moving some of its elements from one cluster to another, i.e.
changing the cluster ID of some elements in the vector. To realize such an optimization, we apply the one element
move (OEM) algorithm19. It consists in randomly changing the label of one element of the vector and assessing if
such a change improves the TS value. The algorithm stops when TS has not been improved for a while.
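The two steps described above, picking the median partition (BOK) and refining it by single label changes (OEM), can be sketched as follows. This is a simplified illustration and not the authors' code: TS is summed over all days of the group including the candidate's own day, the stopping rule "not improved for a while" is modeled by a hypothetical `patience` counter, the random generator is seeded for reproducibility, and connectivity of the resulting clusters is not enforced. A compact NMI helper is repeated so that the block is self-contained.

```python
import random
from collections import Counter
from math import log, sqrt

def _entropy(labels):
    n = len(labels)
    return -sum(c / n * log(c / n) for c in Counter(labels).values())

def nmi(a, b):
    """NMI with geometric-mean normalization (0 if a vector has one cluster)."""
    h_a, h_b = _entropy(a), _entropy(b)
    if h_a == 0.0 or h_b == 0.0:
        return 0.0
    return (h_a + h_b - _entropy(list(zip(a, b)))) / sqrt(h_a * h_b)

def total_similarity(candidate, partitions):
    """TS: sum of the NMI between a candidate and every partition of the group."""
    return sum(nmi(candidate, p) for p in partitions)

def best_of_k(partitions):
    """BOK: the input partition with the highest total similarity."""
    return max(partitions, key=lambda p: total_similarity(p, partitions))

def one_element_move(median, partitions, n_clusters, patience=200, seed=0):
    """OEM: random single-label changes, kept only when TS increases."""
    rng = random.Random(seed)
    best = list(median)
    best_ts = total_similarity(best, partitions)
    stale = 0
    while stale < patience:               # stop after `patience` failed moves
        idx = rng.randrange(len(best))
        old = best[idx]
        best[idx] = rng.randrange(n_clusters)
        ts = total_similarity(best, partitions)
        if ts > best_ts:
            best_ts, stale = ts, 0
        else:
            best[idx] = old               # revert a non-improving move
            stale += 1
    return best, best_ts

group = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 1, 1]]
median = best_of_k(group)
consensus, ts = one_element_move(median, group, n_clusters=2)
print(median, consensus, round(ts, 3))
```

By construction the TS of the returned consensus vector is never lower than the TS of the median partition it starts from.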
Determining the consensus shaping for all 4 groups corresponds to the final step 3 of the data processing, see
Fig. 3(a). Figure 3(b) and (c) illustrate the difference between the original cluster shape of a particular day and the
consensual shape resulting from the processing of all days in the same group. Figure 3(d) and (e) respectively
show the variations of the TVn and CCDn values when comparing the consensus cluster shape with the original
one for all the training days and all replications. It appears that the TVn values significantly deteriorate (increase
by more than 2%) for only 15% of the days while the CCDn values are significantly worse (decrease by more than
0.5 m/s) for only 20.8% of the days. Even for the days that see a significant change in the clustering quality, the
final values related to the consensual shape always remain acceptable. This means that the consensual shape is
relevant to describe in a unique and common manner the congestion patterns of the same group of days. Since
Figure 3. Classification of multiple days and congestion pattern identification for training sets. (a) The three
steps to obtain consensual 3D speed maps (b) Original clustering for a particular day (c) Consensus clustering
for the same day (d) Variation of TVn between the original and the consensual cluster shapes for all days and
all replications of the training set (e) Variation of CCDn between the original and the consensual cluster shapes
for all days and all replications of the training set (f) Distribution of the standard deviation of the mean cluster
speed within a group of days (one value per cluster ID, group and replication). (d) and (e) Show that in most
cases switching from the original to the consensus shapes for a day has minor to acceptable impacts on the TVn
and CCDn values. This means that the consensus shapes can be considered as a good proxy for the clustering of
each day. (f) Shows that the consensus shape is also relevant to identify homogeneous regions in speed within a
group as the standard deviation of the mean cluster speed remains below 0.5 m/s for the vast majority of cases.
classification into 4 groups appears sufficient, the conclusion is that the 3D speed maps of the 28 training days can
be synthesized into only 4 different consensual congestion patterns. For now, the groups and the consensual
shapes have been defined based on the initial cluster shapes without using the link speed information. The
remaining question is to assess whether the consensual shapes are also relevant to define homogeneous regions in
speed for each group of days. Because the consensual shape is the same within a group, it is easy to calculate the
mean speed for each of the 9 cluster IDs and each day. Figure 3(f) shows the distribution of the standard deviation
of such a mean cluster speed among all days belonging to the same group. Such a calculation has been performed
for all replications of the training set. It turns out that 37% of the standard deviation values are below 0.5 m/s and
the vast majority (85%) is below 1 m/s. This means that the mean cluster speeds are very close for the same cluster
ID among the days of the same group.
This is a major result because it implies that the consensual shape is also relevant to summarize the speed
profile observed in the network over time for the same group of days. For a given group, we can associate to each
consensual cluster ID the mean of the mean cluster speeds for each day and so obtain a single 3D speed map that
defines the congestion pattern of this group. In other words, all days of the same group can be synthesized into no
more than 9 cluster shapes and 9 mean speed values. For our case study (the Amsterdam network), 4 consensual
3D speed maps look sufficient to capture the functioning of the entire network over the 35 days and to get a
full overview of the dynamic traffic conditions within the major road network of the city. This is strong evidence
for a high degree of regularity and predictability of macroscopic traffic conditions in this network.
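The construction of a consensual 3D speed map for one group of days can be sketched as below. For each consensual cluster ID the code computes the mean cluster speed of every day, then the mean and the standard deviation of these daily means across the group. The function names are hypothetical, and the use of the population standard deviation is an assumption of this example.

```python
from statistics import mean, pstdev

def cluster_means(speeds, consensus_labels):
    """Mean speed of each consensual cluster for one day."""
    groups = {}
    for v, c in zip(speeds, consensus_labels):
        groups.setdefault(c, []).append(v)
    return {c: mean(vs) for c, vs in groups.items()}

def consensual_speed_map(days_speeds, consensus_labels):
    """Per cluster ID: (mean of the daily mean speeds, their standard deviation)."""
    per_day = [cluster_means(day, consensus_labels) for day in days_speeds]
    summary = {}
    for c in per_day[0]:
        values = [d[c] for d in per_day]
        summary[c] = (mean(values), pstdev(values))
    return summary

# Two toy days sharing the same consensual shape (speeds in m/s).
labels = [0, 0, 1, 1]
days = [[10.0, 12.0, 4.0, 6.0], [12.0, 14.0, 2.0, 4.0]]
print(consensual_speed_map(days, labels))
```

A small standard deviation for a cluster ID indicates that the days of the group agree on the speed level of that cluster, which is the property reported in Fig. 3(f).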
Application to real-time travel time prediction. We are now going to take advantage of the above major
result to propose a fresh look at a classical and popular problem in transportation systems, i.e. travel time
prediction. This problem has been extensively investigated in the transportation literature using both (simulation)
model-based and data-driven approaches as shown by recent review papers25,26. Model-based approaches
use network traffic flow models in conjunction with data assimilation techniques such as recursive Bayesian
estimators to predict the traffic state and the resulting travel times in networks27–29. Data-driven approaches use
general purpose parameterized mathematical models such as (generalized) linear regression30,31; kriging32;
support vector regression33; random forest34; Bayesian networks35; artificial neural networks, e.g. dynamic36,37 and
(increasingly often) deep learning architectures38,39; and many other techniques to capture (learn) from data the
correlations between traffic variables (speed, travel time) over space and time. When reviewing the literature,
there are many more approaches reported for estimation and prediction on freeway corridors than for mixed
or urban networks, which we hypothesize is due to two reasons. First, until recently, insufficient data sources
were available for such large-scale urban prediction models. Additionally, and more tentatively, the urban
prediction problem is a more complex problem to address than the freeway prediction problem because there are
many more degrees of freedom that govern the underlying local traffic dynamics (e.g. intersection control,
crossing flows, high-frequency queuing also under free flowing conditions, many more route alternatives, etc.), and
thereby also the dynamics of speed and travel time. Recently, both model-based29,40 and more unified and
systemic data-driven approaches38,41–43 have been proposed that, at least in principle, can be used to predict traffic
variables in large-scale urban networks. However, when applied to large-scale networks, both model-based and
data-driven approaches are indeed computationally complex, and methodologically cumbersome due to the high
number of inputs and parameters that continuously need to be calibrated and validated from data.
As an alternative, we propose a very simple and systemic approach that uses the consensual congestion
patterns obtained in the previous section. First, let us define a number of probe trips that we will use for investigating
the methods and the validation. Based on the network map, we define 10 trips that cover most of the network
links, see Fig. 4(a). A virtual probe vehicle is launched every 10 min over the time period between 8 am and 2
pm and its travel time is calculated based on the time-dependent link speed information of the studied day. This
defines for each day 370 probe trips (10 trips × 37 departure times) characterized by the travel time that a vehicle
would have experienced for this trip and this departure time. Note that travel time calculations are made on the
directed version of the road network graph while the initial and consensual clustering were made without
considering directions. First, we are going to investigate if the mean speed values related to the 3D congestion maps
can be considered as a good proxy for the travel time calculation. For now, only the days included in the 12 different
training sets are considered because their group label and thus their consensus clustering shapes are known. We
define three methods to estimate the travel time depending on the options we have to define congestion maps:
• M1: initial cluster shape of the day + link speeds equal to the mean speed value of all links in each initial
cluster and the same day
• M2: consensus cluster shape of the group + link speeds equal to the mean speed value of all links in each
consensus cluster and the same day
• M3: consensus cluster shape of the group + link speeds equal to the mean speed value for all links in each
consensus cluster over all days of the group.
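The probe-trip calculation and the cluster-mean substitution used by M1 to M3 can be sketched as follows. This is only a minimal illustration: the function names, the data layout (dictionaries keyed by link id) and the rule that a vehicle adopts the speed of the 10 min slot it is currently in are assumptions made here, not details given in the paper. Speeds are assumed to be strictly positive, and the cluster mean is taken as a plain arithmetic mean over (link, slot) pairs.

```python
def probe_travel_time(path, length_m, speed_ms, depart_s, slot_s=600):
    """Travel time (s) of a virtual vehicle leaving at depart_s along path.

    path     : ordered list of link ids
    length_m : dict link id -> link length in metres
    speed_ms : dict link id -> list of speeds (m/s), one per time slot
    Assumption: the vehicle moves at the speed of the current slot and
    switches speed whenever it crosses a slot boundary.
    """
    t = float(depart_s)
    for link in path:
        remaining = length_m[link]
        last = len(speed_ms[link]) - 1
        while remaining > 1e-9:
            slot = min(int(t // slot_s), last)
            v = speed_ms[link][slot]
            # the last slot is extended indefinitely so the trip always ends
            slot_end = float("inf") if slot == last else (slot + 1) * slot_s
            reachable = v * (slot_end - t)
            if reachable >= remaining:
                t += remaining / v
                remaining = 0.0
            else:
                remaining -= reachable
                t = slot_end
    return t - depart_s


def cluster_mean_speeds(speed_ms, labels):
    """Replace each (link, slot) speed by the mean speed of its cluster.

    labels has the same layout as speed_ms and holds a cluster id per
    (link, slot). Passing speeds pooled over several days of a group
    instead of one day would give the M3 variant.
    """
    sums, counts = {}, {}
    for link, speeds in speed_ms.items():
        for slot, v in enumerate(speeds):
            c = labels[link][slot]
            sums[c] = sums.get(c, 0.0) + v
            counts[c] = counts.get(c, 0) + 1
    return {
        link: [sums[labels[link][s]] / counts[labels[link][s]]
               for s in range(len(speeds))]
        for link, speeds in speed_ms.items()
    }
```

Calling `probe_travel_time` once with the observed speeds and once with the output of `cluster_mean_speeds` gives the two travel times whose relative difference is the estimation error discussed next.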
Figure4(b) shows the distribution (box plot) of the travel time estimation errors for all probe trips, all training
days and the three methods. It appears that averaging the link speeds within each initial cluster (M1) obviously
introduces errors in the travel time estimation: (i) the mean and median errors are respectively equal to 2.0%
and 2.3%, and are thus close to 0 (ii) 50% of the probe trips (25th to 75th percentiles) have errors between
13.7% and 8.6% and (iii) 80% of the probe trips (10th to 90th percentiles) have errors between 22.1% and
17.6%. Interestingly, most of the errors come from the averaging process within the cluster: when switching to
the consensus cluster shape (M2) or replacing mean cluster speeds of the day by the mean cluster speeds of the
group of days (M3) leads to error distributions that are very close to what is observed for (M1). In particular, for
M3, the mean and median error values are respectively 2.7% and 3.6%, 50% of the probe trips exhibit errors
Scientific RepoRts | 7: 14029 | DOI:10.1038/s41598-017-14237-8
between −15.7% and 9.4% and 80% of the probe trips have errors between −23.9% and 19.2%. These results are
fundamental because they first confirm, from another perspective (here the travel time estimation), that
consensus congestion maps with the mean speed in each cluster determined over a similar group of days are very
relevant to synthesize the traffic congestion pulse at the city level. We see no discrepancy when switching from
M2 to M3, meaning that all days of a group have similar speed behavior within each consensus cluster. As no
discrepancy is observed when switching from M1 to M2, the consensus shape appears to be a good proxy to partition
all the days of the same group. Together, these two results demonstrate that the consensus cluster decompositions
are relevant not only in terms of shape but also in terms of mean cluster speed values, and that they provide a
unique and systemic picture of what happens for all days belonging to the same group. Note that the same graph as
Fig. 4(b) but with absolute travel time errors is presented in Supplementary S6.
The previous analysis provides the rationale for a simple, systemic and real-time travel time prediction method
for new days belonging to the validation sets. For a new day, M1 and M2 are no longer relevant because they
require the data of this particular day. However, M3 still holds as long as the new day can be assigned to an
existing group obtained through historical analysis, i.e. over the training set. The only missing component is a
method to allocate in real time the current observations of a new day to an existing group. Knowing the group,
the predetermined consensus cluster shape and the related mean speed values for each cluster can be applied to
predict the future travel times. Here, we propose a simple method with very low computational times to match a
new day with an existing group. This method only requires the link speed information until the actual time t of
the new day. First, we reduce the consensual cluster shape of each historical group (4 in our case) to the period
of time between 7 am and t. Then, we apply all restricted consensual cluster shapes both on the new day data and on the
consensus map of the related group. Mean speed values for the same cluster i in the new day, $x_{i,g}$, and in
the consensus map, $y_{i,g}$, are compared. The optimal group index $g^*$ minimizes the Euclidean speed distance
between the current day and the group:

$$g^* = \arg\min_{g} \frac{1}{n} \sum_{i=1}^{n} \left(x_{i,g} - y_{i,g}\right)^2 \qquad (4)$$
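The matching step of Eq. (4) can be illustrated with a short NumPy sketch. The array layout (links by time slots), the function name and the toy inputs are assumptions introduced here for illustration; the sketch also assumes that the speed maps have already been restricted to the period between 7 am and t and contain no missing values.

```python
import numpy as np


def match_day_to_group(day_speed, group_speed, group_labels):
    """Return (best group index, list of distances) following Eq. (4).

    day_speed    : array (links, slots), speeds of the new day up to time t
    group_speed  : list of arrays (links, slots), consensus speed map per group
    group_labels : list of int arrays (links, slots), consensus cluster ids
    """
    distances = []
    for speeds_g, labels_g in zip(group_speed, group_labels):
        clusters = np.unique(labels_g)  # the n clusters present before time t
        # mean speed per cluster: new day (x) versus consensus map (y)
        x = np.array([day_speed[labels_g == c].mean() for c in clusters])
        y = np.array([speeds_g[labels_g == c].mean() for c in clusters])
        distances.append(float(np.mean((x - y) ** 2)))  # (1/n) * sum_i (x - y)^2
    return int(np.argmin(distances)), distances
```

Because only one mean per cluster and per group is compared, the cost of a refresh grows with the number of groups and clusters rather than with the length of the history, which is consistent with the low computational times reported for this step.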
Figure 4. Travel time estimation based on congestion patterns. (a) Map of the probe trips (b) Travel time
estimation errors for all probe trips and all training days considering the three estimation methods: M1, link
speed is the mean speed in the original cluster; M2, link speed is the mean speed in the consensus cluster; M3,
same as M2 but the mean speed is calculated over all days of the same group (c) Estimated vs. experienced
travel times for the 10 probe trips, one validation day and a departure time equal to 9 am (d) Estimated vs.
experienced travel times for the 10 probe trips, all validation days and all departure times (e) Distribution of
the travel time estimation errors for the 10 probe trips, all validation days and all departure times. (b) Shows
that travel time errors are in most cases relatively low. Averaging speed within each cluster has the highest
contribution to errors. Interestingly, using the consensus cluster shape (M1→M2) and the average of all days
within a group (M2→M3) have very little impact on errors. (c–e) Show that travel time predictions based on
assigning a new day to a historical group and using the consensus cluster shape and the mean cluster speed of
the group are very good for most probe trips.
Note that the number of clusters n within the restricted time period (7 am to t) can be lower than 9, in
particular at the beginning of the day when not all 3D patterns have necessarily appeared yet. In practice, we
can refresh the assignment of the new day to a group every hour starting at 8 am, and assess the travel time
predictions on the probe trips, where a new virtual vehicle starts every 10 min. Figure 4(c) shows the results
for a particular validation day and all trips starting at 9 am. It appears that, even if the reference time
period to assign the day to a group is short (here 7 am–9 am), the predicted travel times are close to the
experienced ones for all trips, i.e. all error values but one fall between −20% and 20%. Note that the reference
travel times are simply calculated using the link speed values of the full day since we are not testing the
application in real time here. This means that we already know all the link speed information for the validation
days, contrary to a real-time implementation where the future is unknown. Figure 4(d) shows the same results but
now for all validation days (7 days and 12 replications, meaning 84 in total) and all departure times between
8 am and 2 pm. Again, a large fraction of the travel time predictions (72.1% of the total probe trips) exhibit
errors between −20% and 20%, and almost all (91.9%) fall within a ±30% error margin. Despite its simplicity and
its very low computational cost, the proposed method leads to accurate travel time predictions for most trips.
This is confirmed by Fig. 4(e), which shows the cumulative distribution of all prediction errors. The mean and
median values are equal to 2.2% and 2.7%, 50% of the probe trips experience errors between −15.5% and 10.0% and
80% of the probe trips have errors between −24.5% and 20.8%. The counterpart of Fig. 4(e) with absolute travel
time errors is provided in Supplementary S6.
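The error figures quoted above can be reproduced from pairs of predicted and reference travel times with a few lines of Python. The sketch below is illustrative only: the function names are hypothetical, and the sign convention (predicted minus reference, divided by reference) is an assumption, since this section does not state it explicitly.

```python
def relative_errors(predicted, reference):
    """Signed relative travel time errors in percent.

    Assumed convention: positive when the prediction overestimates
    the reference travel time.
    """
    return [100.0 * (p - r) / r for p, r in zip(predicted, reference)]


def share_within(errors, margin_pct):
    """Fraction of trips whose absolute error is at most margin_pct."""
    return sum(abs(e) <= margin_pct for e in errors) / len(errors)
```

With the errors of all probe trips in one list, `share_within(errors, 20)` and `share_within(errors, 30)` correspond to the shares reported for the ±20% and ±30% margins.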
Discussion
In this paper, we questioned the regularity of day-to-day mobility patterns at the macroscopic level. The global
analysis of Amsterdam link speed data over 35 days shows a high degree of regularity when comparing the daily
congestion patterns. In our case, four consensual 3D speed maps related to four groups of days are sufficient to
describe the daily traffic dynamics at the city scale. This is remarkable given that these consensual 3D speed
maps are very parsimonious: for our case study, they consist of only 9 clusters (collections of link and time
IDs), each characterized by a single mean speed value. A key contribution here was to use consensus learning
methods to turn the cluster shapes of different days belonging to the same group into a single common pattern.
Note that if more days are available for the learning, it is possible to keep the same level of quality for the
consensual shape by increasing the number of groups. The NMI index makes it possible to monitor the level of
dissimilarity within a group of days and to determine whether a group should be split. This paper has thus
demonstrated that consensual 3D speed maps are a new and very powerful tool to capture the congestion pulse in
one shot at the whole city scale. It should be noted that some factors that were not observed during our sample
of 35 days may influence the regularity of congestion patterns. From our experience, we can mention adverse
weather conditions, exceptional (large cultural) events, or incidents as sources of major disruptions in the
network. Over longer time periods, during which such situations are observed multiple times, the number of groups
will increase to accommodate the resulting broader array of patterns, and most likely some regular patterns with
a low frequency of appearance will emerge. Only the consequences of very rare or specific events are fully
unpredictable.
A second major finding in this paper is that these consensual 3D speed maps allow us to design a simple and
systemic method to predict travel times in an entire city. In this method, prevailing link speed observations
are first matched to an existing group of days. Subsequently, the consensual 3D speed map related to this group
is used to predict the travel time of any trip within the city. This method is real-time and practice-ready as
the matching step is computationally lightweight. It corresponds to the selection of the best consensual 3D speed
map among the existing groups of days, based on the comparison of the mean speed in each cluster. In our data
set, we succeeded in making travel time predictions for more than 84% of the trips with an absolute error lower
than 25%, which is sufficient for most potential practical applications such as traffic information provision,
route guidance, traffic control and management, or optimizing goods deliveries and solving vehicle routing
problems.
The methodology presented in this paper to derive consensual 3D speed maps can be easily implemented in the
field. Link speed data at a granularity of, say, 1–10 minutes are becoming more and more readily available thanks
to advances in estimation methods using classical data (induction loops, cameras) and new data sources based on
crowd-sourcing (mobile-phone records, GPS tracking). One clear direction for (methodological) improvement
relates to decreasing computational costs, particularly when determining the initial 3D speed map for a new day
on much larger networks in terms of number of links. Our aim was to make the case for 3D patterns as a new way
to identify large-scale regularity in traffic networks, and it turned out that these 3D congestion patterns make
possible a new approach to a notoriously difficult problem (predicting travel times in urban networks). Even
though optimizing the clustering and the post-treatment operations is very important for larger networks with
(much) more links and data, it should be noted that (continuously) learning and updating the consensual patterns
with new daily patterns are off-line steps that can be performed overnight (determining the 3D congestion maps
for a new day) and over the weekend (updating the consensual patterns). The critical component for real-time
travel time estimation is the matching between the current observations and the historical data included in the
3D consensual congestion maps. With our method, this operation is so fast that it can already be applied in much
larger networks. In this paper, we already hint at an important avenue to significantly cut computational costs
for the original clustering operations. We constructed the 208-link graph of Amsterdam by coarsening the original
7512-link OSM network, using a constrained version of contraction hierarchy44 as explained in20. Network
coarsening45 thus appears to be an efficient strategy to reduce the network size while preserving both the
network topology and the underlying data patterns. This strategy also deserves further in-depth analysis and
research.
Clearly, there are numerous other directions to further improve the methodologies behind the two contributions
offered here. These relate, for example, to improving the underlying data processing methods, or to more
advanced clustering techniques and matching procedures. Nonetheless, we believe the main results stand and
touch upon a fundamental property of city traffic dynamics, namely that these dynamics may be more regular
and predictable than expected. Consensual 3D speed maps enable us to extract the essence of large sets of detailed
data to reveal the global picture about traffic dynamics in cities. We expect many applications of this concept not
only for traffic monitoring and control but also for policy making and urban planning in general.
Methods
Initial dataset. In this study, link speed data are reconstructed from trip travel time observations. In
Amsterdam, 127 cameras record license plates at the critical points of the major street networks (excluding
freeways). This defines 314 single origin-destination (OD) pairs. For each OD pair, the shortest path in distance
is determined using the OpenStreetMap GIS database46. The final network consists of all the links included in all
the shortest paths, i.e. 7512 links in total. We apply an algorithm that merges successive links in the same
direction between two intersections. Internal links for intersections are also merged into a single node that
only reproduces the available turning movements. In the end, the network has 208 links and 214 nodes20. The final
step is to calculate the link speed information for 10 min time intervals from the individual travel times
between OD pairs. We have a complete database of 35 days, from which we select the time period between 7 am and
3 pm (morning peak hour and lunch time). The mean number of individual travel time records per day is 171,000.
Each individual travel time record provides both the departure and the arrival times. All travel times that
exceed the current moving average for a given OD pair by more than a given threshold are considered outliers and
discarded (7% in total). The remaining records are then matched to links assuming a constant travel speed. We
used a 10 min time window for link speed data, meaning that all observations coming from vehicles that drive
through a link during the same 10 min period are averaged into a single link speed value. A complete description
of the data preparation can be found in20. Note that the data processing in this paper is not restricted to the
data we used for the Amsterdam network but can be applied to any network with link speed information.
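Two of the preparation steps described above, the outlier filter and the 10 min averaging, can be sketched in plain Python. This is a hedged illustration rather than the authors' procedure: the function names, the `window` size and the choice of computing the moving average over previously accepted records only are assumptions, and the threshold value is not given in this section.

```python
from collections import defaultdict


def filter_outliers(travel_times_s, threshold_s, window=20):
    """Keep travel times not exceeding the moving average plus a threshold.

    travel_times_s : chronologically ordered travel times of one OD pair
    window         : number of accepted records in the moving average
                     (hypothetical value)
    """
    kept = []
    for tt in travel_times_s:
        recent = kept[-window:]
        average = sum(recent) / len(recent) if recent else tt
        if tt <= average + threshold_s:
            kept.append(tt)
    return kept


def aggregate_link_speeds(observations, slot_s=600):
    """Average speed observations per (link, 10 min slot).

    observations : iterable of (link_id, time_s, speed) tuples
    returns      : dict (link_id, slot index) -> mean speed
    """
    acc = defaultdict(lambda: [0.0, 0])
    for link, t, v in observations:
        key = (link, int(t // slot_s))
        acc[key][0] += v
        acc[key][1] += 1
    return {key: total / n for key, (total, n) in acc.items()}
```

The output of `aggregate_link_speeds` has the link-by-time structure on which the clustering methods below operate.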
Ncut algorithm. Ncut is a clustering algorithm based on a similarity matrix S(i, j) that defines the level of
similarity between two elements i and j of the dataset15. In this paper, we use two different metrics to define
the similarity: the Snake similarity17 when determining the original clustering for each day and the NMI, eq. 2,
when gathering days with similar patterns. More details about the Snake similarity are provided in
Supplementary S1. The different steps of the Ncut algorithm are:
1. Calculate the diagonal matrix D of the similarity matrix S
2. Calculate the normalized Laplacian matrix $L = D^{-1/2}(D - S)D^{-1/2}$
3. Calculate the eigenvalues of L and sort the eigenvectors by increasing eigenvalue
4. To obtain a partition into $2^m$ clusters, select the 2nd to the $(2+m-1)$th eigenvectors in the ordered list.
The splitting point here is equal to 0, meaning that for each eigenvector we separate the values > 0 from the
values ≤ 0. Each observation is then coded as a set of m binary values (> 0 or ≤ 0) depending on the related
values in the eigenvectors. Observations with the same code fall into the same cluster.
5. When the targeted number of clusters is not a power of 2, take the closest higher value for $2^m$ and then
apply a merge algorithm. Clusters with the closest similarities are iteratively merged two by two17.
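Steps 1 to 4 above can be sketched with NumPy as follows. The function name is hypothetical, D is taken to be the diagonal matrix of the row sums of S (the usual definition for normalized cuts), and S is assumed to be symmetric with strictly positive row sums; the merge procedure of step 5 is not reproduced here.

```python
import numpy as np


def ncut_binary_partition(S, m):
    """Split observations into at most 2**m clusters from a similarity matrix."""
    S = np.asarray(S, dtype=float)
    d = S.sum(axis=1)                           # step 1: diagonal of D (row sums)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    # step 2: L = D^(-1/2) (D - S) D^(-1/2) = I - D^(-1/2) S D^(-1/2)
    L = np.eye(len(S)) - d_inv_sqrt[:, None] * S * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(L)              # step 3: eigenvalues in ascending order
    selected = eigvecs[:, 1:m + 1]              # step 4: 2nd to (m+1)th eigenvectors
    bits = (selected > 0).astype(int)           # split every eigenvector at 0
    codes = bits @ (2 ** np.arange(m))          # m binary values -> one integer code
    _, labels = np.unique(codes, return_inverse=True)
    return labels                               # same code -> same cluster label
```

Some of the 2**m possible codes may be unused, so the function can return fewer clusters than the maximum; the labels are simply renumbered from 0.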
k-means and DBSCAN. Before running k-means or DBSCAN, we first normalized each observation i defined by the
vector $(x_i, y_i, t_i, v_i)$, where $x_i$ and $y_i$ are the geographical coordinates of the middle of a link,
$t_i$ defines the time period and $v_i$ the speed value. Normalization is performed based on the global minimal
and maximal values for all coordinates. Speed values are then overweighted by a factor 3 because this variable
should play a predominant role during the clustering process. For both algorithms, the distance between two
observations is the Euclidean distance. The details of the k-means algorithm can be found in21. The only
parameter is the number of targeted clusters. DBSCAN (Density-based spatial clustering of applications with
noise) was proposed by Ester et al. in 199622. It is a density-based clustering algorithm that groups together
points that are close, i.e. within a circle of radius ε (0.005 in our case). There is no targeted number of
clusters but a minimal number of points to define a cluster (10 in our case). The algorithm stops when all points
have been labeled. To obtain a given number of clusters, clusters are finally merged using the same algorithm as
for the Ncut17. In practice, both k-means and DBSCAN scripts have been retrieved from the MATLAB© File Exchange
website47,48.
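The feature preparation described above can be sketched in NumPy. The paper relied on MATLAB scripts, so this Python version is only an illustration; the function names are hypothetical, and the reading that each coordinate is rescaled with its own global minimum and maximum is an assumption.

```python
import numpy as np


def build_features(x, y, t, v, speed_weight=3.0):
    """Min-max normalise (x, y, t, v) per column and overweight the speed."""
    X = np.column_stack([x, y, t, v]).astype(float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    # guard against constant columns to avoid a division by zero
    X = (X - lo) / np.where(hi > lo, hi - lo, 1.0)
    X[:, 3] *= speed_weight  # speed plays the predominant role
    return X


def neighbours_within(X, i, eps=0.005):
    """Indices of observations within Euclidean distance eps of observation i.

    This is the neighbourhood query that a density-based method such as
    DBSCAN repeats for every point (observation i itself is included).
    """
    dist = np.linalg.norm(X - X[i], axis=1)
    return np.flatnonzero(dist <= eps)
```

The resulting matrix can be passed to any k-means or DBSCAN implementation that uses the Euclidean distance.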
Data availability. All the data related to this study and its documentation are accessible using the following
links: http://dittlab.tudelft.nl:8080/3DPartitioning or https://doi.org/10.6084/m9.figshare.5198566.
References
1. Brocmann, D., Hufnagel, L. & Geisel, T. e scaling laws of human travel. Nature 439, 462–465 (2006).
2. González, M., Hidalgo, C. & Barabási, A. Understanding individual mobility patterns. Nature 453, 779–782 (2008).
3. Song, C., oren, T., Wang, P. & Barabási, A. Modelling the scaling properties of human mobility. Nature Physics 6, 818–823 (2010).
4. Song, C., Qu, Z., Blumm, N. & Barabási, A. Limits of predictability in human mobility. Science 327, 1018–1021 (2010).
5. Zhao, ., Musolesi, M., Hui, P., ao, W. & Taroma, S. Explaining the power-law distribution of human mobility through
transportation modality decomposition. Scientic reports 5, srep09136 (2015).
6. Ortuzar, J. & Willumsen, L. Modelling transport. (Wilsey,Chichester) (1994).
7. Wilson, A. Land-use/transport interaction models:past and future. Journal of Transportation Economic Policy 32, 3–26 (1998).
8. Chouroun, J. A general framewor for the development of gravity-type trip distribution models. egional Science and Urban
Economics 5, 177–202 (1975).
9. Lenormand, M., Bassolas, A. & amasco, J. J. Systematic comparison of trip distribution laws and models. Journal of Transport
Geography 51, 158–169 (2016).
10. Peng, C., Jin, X., Wong, K., Shi, M. & Lio, P. Collective human mobility pattern from taxi trips in urban area. PLOS ONE 7, e34487 (2012).
11. Wang, P., Hunter, T., Bayen, A., Schechtner, K. & González, M. Understanding road usage patterns in urban areas. Scientific Reports 2, srep011001 (2012).
12. Anas, A. Discrete choice theory, information theory and the multinomial logit and gravity models. Transportation Research Part B 17, 13–23 (1983).
13. Yang, Y., Herrera, C., Eagle, N. & González, M. Limits of predictability in commuting flows in the absence of data calibration. Scientific Reports 4, srep05662 (2014).
14. Alpaydin, E. Introduction to Machine Learning, 3rd edition (MIT Press, 2014).
15. Shi, J. & Malik, J. Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence 22(8), 888–905 (2000).
16. Ji, Y. & Geroliminis, N. On the spatial partitioning of urban transportation networks. Transportation Research Part B 46(10), 1639–1656 (2012).
17. Saeedmanesh, M. & Geroliminis, N. Clustering of heterogeneous networks with directional flows based on “snake” similarities. Transportation Research Part B 91, 250–269 (2016).
18. Cover, T. M. & Thomas, J. A. Elements of Information Theory 2nd Edition (Wiley Series in Telecommunications and Signal Processing) (Wiley-Interscience, 2006).
19. Filkov, V. & Skiena, S. Integrating microarray data by consensus clustering. International Journal on Artificial Intelligence Tools 13, 863–880, https://doi.org/10.1142/S0218213004001867 (2004).
20. Lopez, C., Krishnakumari, P., Leclercq, L., Chiabaut, N. & van Lint, H. Spatio-temporal partitioning of the transportation network using travel time data. Transportation Research Records, 14 p. https://doi.org/10.3141/2623-11 (2017).
21. MacQueen, J. B. Some methods for classification and analysis of multivariate observations. In Cam, L. M. L. & Neyman, J. (eds) Proc. of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, 281–297 (University of California Press, 1967).
22. Ester, M., Kriegel, H.-P., Sander, J. & Xu, X. A density-based algorithm for discovering clusters in large spatial databases with noise. In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, KDD'96, 226–231 (AAAI Press, 1996).
23. Yang, F., Li, T., Zhou, Q. & Xiao, H. Cluster ensemble selection with constraints. Neurocomputing 235, 59–70, https://doi.org/10.1016/j.neucom.2017.01.001 (2017).
24. Ozay, M. Semi-supervised segmentation fusion of multi-spectral and aerial images. In 2014 22nd International Conference on Pattern Recognition https://doi.org/10.1109/icpr.2014.659 (IEEE 2014).
25. Vlahogianni, E. I., Karlaftis, M. G. & Golias, J. C. Short-term traffic forecasting: Where we are and where we’re going. Transportation Research Part C: Emerging Technologies 43, 3–19, https://doi.org/10.1016/j.trc.2014.01.005. Special Issue on Short-term Traffic Flow Forecasting (2014).
26. Mori, U., Mendiburu, A., Álvarez, M. & Lozano, J. A. A review of travel time estimation and forecasting for advanced traveller information systems. Transportmetrica A: Transport Science 11, 119–157, https://doi.org/10.1080/23249935.2014.932469 (2015).
27. Wang, Y., Papageorgiou, M. & Messmer, A. Renaissance - a unified macroscopic model-based approach to real-time freeway network traffic surveillance. Transportation Research Part C: Emerging Technologies 14, 190–212 (2006).
28. Kumar, B. A., Vanajakshi, L. & Subramanian, S. C. Bus travel time prediction using a time-space discretization approach. Transportation Research Part C: Emerging Technologies 79, 308–332 (2017).
29. Nantes, A., Ngoduy, D., Bhaskar, A., Miska, M. & Chung, E. Real-time traffic state estimation in urban corridors from heterogeneous data. Transportation Research Part C: Emerging Technologies 66, 99–118, https://doi.org/10.1016/j.trc.2015.07.005 (2016).
30. Zhang, X. & Rice, J. A. Short-term travel time prediction. Transportation Research Part C: Emerging Technologies 11, 187–210 (2003).
31. Rice, J. & Zwet, E. v. A simple and effective method for predicting travel times on freeways. IEEE Transactions on Intelligent Transportation Systems 5, 200–207 (2004).
32. Biswas, S., Chakraborty, S., Chandra, S. & Ghosh, I. Kriging-based approach for estimation of vehicular speed and passenger car units on an urban arterial. Journal of Transportation Engineering 143 https://doi.org/10.1061/JTEPBS.0000031 (2017).
33. Xu, Y., Chen, H., Kong, Q.-J., Zhai, X. & Liu, Y. Urban traffic flow prediction: A spatio-temporal variable selection-based approach. Journal of Advanced Transportation 50, 489–506, https://doi.org/10.1002/atr.1356 (2016).
34. Bahuleyan, H. & Vanajakshi, L. D. Arterial path-level travel-time estimation using machine-learning techniques. Journal of Computing in Civil Engineering 31 (2017).
35. Huang, W., Song, G., Hong, H. & Xie, K. Deep architecture for traffic flow prediction: Deep belief networks with multitask learning. IEEE Transactions on Intelligent Transportation Systems 15, 2191–2201, https://doi.org/10.1109/tits.2014.2311123 (2014).
36. Wang, J., Tsapakis, I. & Zhong, C. A space-time delay neural network model for travel time prediction. Engineering Applications of Artificial Intelligence 52, 145–160 (2016).
37. Van Lint, J. W. C. Online learning solutions for freeway travel time prediction. IEEE Transactions on Intelligent Transportation Systems 9, 38–47 (2008).
38. Lv, Y., Duan, Y., Kang, W., Li, Z. & Wang, F. Y. Traffic flow prediction with big data: A deep learning approach. IEEE Transactions on Intelligent Transportation Systems 16, 865–873 (2015).
39. Yang, H., Dillon, T. S. & Chen, Y. P. Optimized structure of the traffic flow forecasting model with a deep learning approach. IEEE Transactions on Neural Networks and Learning Systems (2016).
40. Bhaskar, A., Tsubota, T., Kieu, L. M. & Chung, E. Urban traffic state estimation: Fusing point and zone based data. Transportation Research Part C: Emerging Technologies 48, 120–142 (2014).
41. Li, L. et al. Robust causal dependence mining in big data network and its application to traffic flow predictions. Transportation Research Part C: Emerging Technologies 58B, 292–307 (2015).
42. Laharotte, P.-A., Billot, R., El-Faouzi, N.-E. & Rakha, H. A. Network-wide traffic state prediction using bluetooth data. In TRB 94th Annual Meeting Compendium of Papers, 15-3022 (Transportation Research Board, 2015).
43. Fusco, G., Colombaroni, C. & Isaenko, N. Short-term speed predictions exploiting big data on large urban road networks. Transportation Research Part C: Emerging Technologies 73, 183–201, https://doi.org/10.1016/j.trc.2016.10.019 (2016).
44. Geisberger, R., Sanders, P., Schultes, D. & Delling, D. Contraction hierarchies: Faster and simpler hierarchical routing in road networks. In McGeoch, C. C. (ed.) Experimental Algorithms: 7th International Workshop, WEA 2008 Provincetown, MA, USA, May 30-June 1, 2008 Proceedings, 319–333 (Springer, Berlin, Heidelberg, 2008).
45. Chevalier, C. & Safro, I. Comparison of coarsening schemes for multilevel graph partitioning. In Lecture Notes in Computer Science, 191–205 (Springer Berlin Heidelberg, 2009).
46. OpenStreetMap contributors. Planet dump retrieved from, https://planet.osm.org. https://www.openstreetmap.org (2017).
47. Mo, C. Kmeans algorithm retrieved on 2016-12-10 from, https://fr.mathworks.com/matlabcentral/fileexchange/24616-kmeans-clustering (2016).
48. Yarpiz. DBSCAN Clustering Algorithm retrieved on 2016-12-10 from, https://fr.mathworks.com/matlabcentral/fileexchange/52905-dbscan-clustering-algorithm (2016).
Acknowledgements
This study has received funding from the European Research Council (ERC) under the European Union’s Horizon
2020 research and innovation program (grant agreement 646592–MAGnUM project); and the Horizon 2020
SETA project (grant agreement No 688082). Map data copyrighted OpenStreetMap contributors and available
from https://www.openstreetmap.org.
Author Contributions
L.L., N.C. and H.V.L. designed the research; C.L. and P.K. performed the research in collaboration with L.L.,
H.V.L. and N.C.; L.L. wrote the paper with support of H.V.L. All authors participated in analyzing the results and
reviewed the manuscript.
Additional Information
Supplementary information accompanies this paper at https://doi.org/10.1038/s41598-017-14237-8.
Competing Interests: The authors declare that they have no competing interests.
Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and
institutional affiliations.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which
permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give
appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and
indicate if changes were made. The images or other third party material in this article are included in the
article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is
not included in the article’s Creative Commons license and your intended use is not permitted by statutory
regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
© The Author(s) 2017
Content courtesy of Springer Nature, terms of use apply. Rights reserved
1.
2.
3.
4.
5.
6.
Terms and Conditions
Springer Nature journal content, brought to you courtesy of Springer Nature Customer Service Center GmbH (“Springer Nature”).
Springer Nature supports a reasonable amount of sharing of research papers by authors, subscribers and authorised users (“Users”), for small-
scale personal, non-commercial use provided that all copyright, trade and service marks and other proprietary notices are maintained. By
accessing, sharing, receiving or otherwise using the Springer Nature journal content you agree to these terms of use (“Terms”). For these
purposes, Springer Nature considers academic use (by researchers and students) to be non-commercial.
These Terms are supplementary and will apply in addition to any applicable website terms and conditions, a relevant site licence or a personal
subscription. These Terms will prevail over any conflict or ambiguity with regards to the relevant terms, a site licence or a personal subscription
(to the extent of the conflict or ambiguity only). For Creative Commons-licensed articles, the terms of the Creative Commons license used will
apply.
... Many different cluster validity indices or evaluation metrics could be used for selecting the clustering which best matches the criteria. Common practice is to use one of the well-known and widely accepted clustering methods such as k-means, hierarchical, affinity propagation, or spectral clustering (Lopez et al., 2017; Yang et al., 2017; Krishnakumari et al., 2020; Ferranti, 2020; Cebecauer et al., 2019; Chiabaut & Faitout, 2021). ...
... The problem is even more complex as clustering methods usually rely on a similarity or distance metric. Among the many metrics that could be used, the most prevalent for day clustering in transport networks are the Euclidean distance (Cebecauer et al., 2019; Lopez et al., 2017; Yang et al., 2017), cosine similarity (Cebecauer et al., 2019; Toqué et al., 2017), and normalized mutual information (Chiabaut & Faitout, 2021; Krishnakumari et al., 2020; Lopez et al., 2017). ...
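The three metrics named in the excerpt above can be illustrated with a minimal sketch. It assumes that each day is either a vector of link speeds (for the Euclidean distance and cosine similarity) or a vector of cluster labels per link (for normalized mutual information). The function names and the arithmetic-mean normalization of NMI are choices made here for illustration and are not taken from the cited papers.

```python
import math
from collections import Counter


def euclidean_distance(a, b):
    """Euclidean distance between two equally long speed vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def entropy(labels):
    """Shannon entropy (in nats) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def normalized_mutual_information(labels_a, labels_b):
    """NMI between two partitions of the same items.

    Assumption: normalization by the arithmetic mean of the two entropies.
    """
    n = len(labels_a)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    mi = 0.0
    for (la, lb), c in joint.items():
        mi += (c / n) * math.log(c * n / (count_a[la] * count_b[lb]))
    h_a, h_b = entropy(labels_a), entropy(labels_b)
    if h_a == 0.0 and h_b == 0.0:
        return 1.0  # both partitions consist of a single cluster
    return 2.0 * mi / (h_a + h_b)
```

NMI compares partitions rather than raw speeds, so two days whose clusters have the same shape but different label numbering are still treated as identical.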
Article
Full-text available
Recognition of spatio-temporal traffic patterns at the network-wide level plays an important role in data-driven intelligent transport systems (ITS) and is a basis for applications such as short-term prediction and scenario-based traffic management. Common practice in the transport literature is to rely on well-known general unsupervised machine-learning methods (e.g., k-means, hierarchical, spectral, DBSCAN) to select the most representative structure and number of day-types based solely on internal evaluation indices. These are easy to calculate but are limited since they only use information in the clustered dataset itself. In addition, the quality of clustering should ideally be demonstrated by external validation criteria, by expert assessment or the performance in its intended application. The main contribution of this paper is to test and compare the common practice of internal validation with external validation criteria represented by the application to short-term prediction, which also serves as a proxy for more general traffic management applications. When compared to external evaluation using short-term prediction, internal evaluation methods have a tendency to underestimate the number of representative day-types needed for the application. Additionally, the paper investigates the impact of using dimensionality reduction. By using just 0.1% of the original dataset dimensions, very similar clustering and prediction performance can be achieved, with up to 20 times lower computational costs, depending on the clustering method. K-means and agglomerative clustering may be the most scalable methods, using up to 60 times fewer computational resources for very similar prediction performance to the p-median clustering.
... To address the dynamics of bottlenecks we present the analysis through two forms of order: scaling characteristics versus local meso dynamics. We found that, while the data correspond to scaling laws at a macro resolution 41,42, the bottlenecks and their trees vary both in their location and in the time they occur when we zoom into their spatio-temporal behavior. ...
... Previous work found universal laws in urban traffic congestion [42][43][44][45][46][47][48]. Some studies even identified a high degree of regularity in the measured speed of the street segments 41,42. Others focused on the time evolution of urban congestion 45,47, but not through the analysis of bottlenecks. ...
... Others focused on the time evolution of urban congestion 45,47, but not through the analysis of bottlenecks. At large scales, traffic dynamics and congestion have been found to be predictable 41,42, and their weights follow power-law distributions 45,47. With this in mind, we analyzed the bottlenecks' dynamics at the macro-scale to find out whether they present such regularities as well. ...
Article
Full-text available
The increasing urbanization of recent decades has resulted in significant growth in urban traffic congestion around the world. This leads to an enormous amount of time spent on roads and thus to significant wasted money and air pollution. Here, we present a novel methodology for the identification, cost evaluation, and thus prioritization of congestion origins, i.e., their bottlenecks. The presented work is based on network analysis of the entire road network from a global point of view. We identify and prioritize traffic bottlenecks based on big data of traffic speed retrieved in near-real-time. Our approach highlights the bottlenecks that have the most significant effect on the global urban traffic flow. We follow the evolution of every traffic congestion event in the entire urban network and rank all of them based on the cost they cause (in vehicle-hours). We show that the macro-stability that represents the seeming regularity of traffic load in both time and space overshadows the existence of meso-dynamics, where the bottlenecks that create these congestions usually do not reappear on different days or hours. Thus, our method makes it possible to identify, in near-real-time, both recurrent and nonrecurrent congestion and its sources.
... This is because the definition of the path for this kind of aggregated model is different from the one of trips in urban networks [28] (see Figure 1). The application of the MFD-based models requires the aggregation of the urban network into a set of regions ([29], [30]), within which vehicles circulate at the same average speed. This enables the definition of a regional graph (or regional network). ...
Article
Full-text available
Macroscopic traffic models represent a promising tool to design strategies for ecological routing. To benefit from this tool, we must first characterize the relationship between path emissions and distance traveled or travel time on aggregated networks, i.e., a regional network. This paper investigates this relationship between two toy networks and a real urban network representing the city of Innsbruck (Austria). We utilize an accumulation-based model based on the Macroscopic Fundamental Diagram to mimic the traffic dynamics in the network and utilize the COPERT IV model to estimate the travel emissions, focusing on the carbon dioxide CO2. We show that there is a linear relationship between the total emissions of CO2 and the average travel time of internal paths, i.e. paths that take place completely within a single region. We also show that in some cases, there is a linear relationship between the total emissions and the average travel distance or travel time of paths that cross multiple regions in the network. However, the latter is not always true as traffic dynamics play an important role in path emissions. In other words, eco-friendly paths on regional networks do not necessarily follow the shortest paths in terms of distance or time.
... Existing literature has demonstrated the importance of such inter-link communication 1,6,[20][21][22][23][24][25]. Pioneering work in the last few years established that traffic networks exhibit a percolating state based on the characteristic speeds in their roads 6,21. ...
Article
Full-text available
The science of cities aims to model urban phenomena as aggregate properties that are functions of a system’s variables. Following this line of research, this study seeks to combine two well-known approaches in network and transportation science: (i) The macroscopic fundamental diagram (MFD), which examines the characteristics of urban traffic flow at the network level, including the relationship between flow, density, and speed. (ii) Percolation theory, which investigates the topological and dynamical aspects of complex networks, including traffic networks. Combining these two approaches, we find that the maximum number of congested clusters and the maximum MFD flow occur at the same moment, precluding network percolation (i.e. traffic collapse). These insights describe the transition of the average network flow from the uncongested phase to the congested phase in parallel with the percolation transition from sporadic congested links to a large, congested cluster of links. These results can help to better understand network resilience and the mechanisms behind the propagation of traffic congestion and the resulting traffic collapse.
... Loder et al. (2019) 18 found how road and bus network topology explains the critical number of vehicles in the network which results in congestion of the urban network. Lopez et al. (2017) 19 investigated the day-to-day regularity of urban congestion patterns. Saberi et al. (2020) 20 described the dynamics of congestion propagation and dissipation of traffic in cities using a simple contagion process. ...
Preprint
Full-text available
The static user equilibrium model and the dynamic user equilibrium model are two commonly used traffic assignment models. Compared with the dynamic model, the static model takes much less time and needs less information, but it cannot assign traffic along the time dimension, so it cannot produce a time series of traffic flow. We can divide the time interval into segments and apply the static model to each segment, so that a time series of traffic flow can be obtained. A challenge of this method is how to determine the scale of the time segment so that the result of the static model is close to that of the dynamic model. This paper proposes a big-data-driven analysis framework and uses the road networks and traffic flow data of 40 Chinese cities as samples. We calculate the gap between the results of the dynamic and static models using different scales of time segment. It is shown that the unevenness of the distribution of travel demand over time, and other indexes that measure the size of the city, such as GDP, are strongly correlated with the optimal scale of time segment for the static model.
... Thus, these patterns act as a support for selecting adequate measures for traffic management, for example, if a specific pattern of traffic flows is linked to a bottleneck activation pattern. This network-level perspective has already been shown, for example, for loop detector data (17) and automated number-plate recognition system data (18), where the complexity of urban traffic dynamics has been reduced to a few clusters. It need not be limited to traffic state estimation, but can also be used to inform about other events such as weather (19), from which further measures for traffic management can be drawn. ...
Article
The market for on-demand mobility services is growing worldwide. These services include, for example, ride-hailing, ride-sharing, and car-sharing. Large-scale fleets of such services collect GPS trajectory (probe vehicle) data constantly everywhere in the network. At a certain penetration rate, this data becomes representative of the entire road network. It can give valuable insights into traffic dynamics and the evolution of congestion. In this paper, we use such GPS trajectory data from Chengdu, China, to investigate the stability and recurrence of macroscopic traffic patterns. Using the two-fluid theory, we find that the two-fluid coefficients are robust to between-day variation, not only supporting the theory itself but also emphasizing that the general evolution of traffic is a robust pattern. We investigate the deviations from the model using time series analysis of the residuals of the two-fluid model. Here, we find evidence for daily and weekly seasonality in the residuals, indicating that congestion patterns are convincingly recurring. These patterns can be used for network-wide traffic state prediction. We conclude that GPS trajectory data from large on-demand mobility fleets is a promising data source for observing traffic patterns in urban road networks once the data becomes representative.
Article
An alternative approach for real-time network-wide traffic control in cities that has recently gained attention is perimeter flow control. Many studies have shown that this method is more efficient than state-of-the-art adaptive signal control strategies for heterogeneously congested urban networks. The basic concept of such an approach is to partition heterogeneous cities into a small number of homogeneous regions (zones) and apply perimeter control to the interregional flows along the boundaries between regions. The transferring flows are controlled at the traffic intersections located at the borders between regions so as to distribute the congestion in an optimal way and minimize the total delay of the system. The focus of current work is the mathematical formulation of the original nonlinear problem in a linear parameter-varying (LPV) form so that optimal control can be applied in a (rolling horizon) model predictive concept. This work presents the mathematical analysis of the optimal control problem as well as the approximations and simplifications that are assumed in order to derive the formulation of a linear optimization problem. Numerical simulation results for the case of a macroscopic environment (plant) are presented in order to demonstrate the efficiency of the proposed approach. Results for the closed-loop model predictive control scheme are presented for the nonlinear case, which is used as “benchmark,” as well as the linear case. Furthermore, the developed scheme is applied to a large-scale microsimulation of a European city with more than 500 signalized intersections in order to better investigate its applicability to real-life conditions. The simulation experiments demonstrate the effectiveness of the scheme compared with fixed-time control because all of the performance indicators are significantly improved. 
Funding: This work was supported by Dit4Tram “Distributed Intelligence & Technology for Traffic & Mobility Management” project from the European Union’s Horizon 2020 research and innovation programme under [Grant agreement 953783].
Article
Online demand prediction plays an important role in transport network services, from operations and control to management and information provision. However, online prediction models are affected by quality issues in streaming data, such as noisy measurements and missing data. To address these, we develop a robust prediction method for online network-level demand prediction in public transport. It consists of a PCA method to extract eigen demand images and an optimization-based pattern recognition model to predict the weights of eigen demand images by making use of the partially observed real-time data up to the prediction time in a day. The prediction model is robust to data quality issues given that the eigen demand images are stable and their predicted weights are optimized using the network-level data (less impacted by local data quality issues). In the case study, we validate the accuracy and transferability of the model by comparing it with benchmark models and evaluate the robustness of the proposed model in tolerating data quality issues. The experimental results demonstrate that the proposed Pattern Recognition Prediction based on PCA (PRP-PCA) consistently outperforms other benchmark models in accuracy and transferability. Moreover, the model shows high robustness in accommodating data quality issues. For example, the PRP-PCA model is robust to missing data up to 50% regardless of the noise level. We also discuss the hidden patterns behind the network-level demand. The visualization analysis shows that eigen demand images are significantly connected to the network structure and station activity variabilities. Though the demand changes dramatically before and after the pandemic, the eigen demand images are consistent over time in Stockholm.
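The idea of predicting a full day from stable eigen images and a partially observed day can be sketched as follows. This is not the PRP-PCA model itself: the abstract describes an optimization-based pattern recognition model, whereas the sketch substitutes a plain least-squares fit on the observed entries. All function names and the data layout (one flattened day per row) are assumptions made for illustration.

```python
import numpy as np


def fit_eigen_images(history, n_components):
    """Return the mean day and the top principal directions of historical days.

    history: array of shape (n_days, n_features), one flattened day per row.
    """
    mean = history.mean(axis=0)
    # Rows of vt are orthonormal principal directions ("eigen images").
    _, _, vt = np.linalg.svd(history - mean, full_matrices=False)
    return mean, vt[:n_components]


def estimate_weights(partial_day, observed, mean, components):
    """Least-squares weights using only the observed entries of the current day.

    observed: boolean mask of shape (n_features,), True where data is available.
    """
    a = components[:, observed].T  # shape (n_observed, n_components)
    b = partial_day[observed] - mean[observed]
    weights, *_ = np.linalg.lstsq(a, b, rcond=None)
    return weights


def reconstruct(mean, components, weights):
    """Full-day estimate: mean day plus the weighted sum of eigen images."""
    return mean + weights @ components
```

Because the weights are fitted to all observed entries at once, a locally missing or noisy measurement has limited influence, which matches the robustness argument made in the abstract. Entries where `observed` is False are never read, so they may hold any placeholder value.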
Article
Full-text available
Understanding human mobility is of great significance for sustainable transportation planning. Long-term travel delay change is a key metric to measure human mobility evolution in cities. However, it is challenging to quantify the long-term travel delay because it happens in different modalities, e.g., subway, taxi, bus, and personal cars, with implicated coupling. More importantly, the data for long-term multi-modal delay modeling is challenging to obtain in practice. As a result, the existing travel delay measurements mainly focus on either a single-modal system or short-term mobility patterns, which cannot reveal the long-term travel dynamics and the interactions among multi-modal systems. In this paper, we perform a travel delay measurement study to quantify and understand long-term multi-modal travel delay. Our measurement study utilizes a 5-year dataset of 8 million residents from 2013 to 2017, including a subway system with 3 million daily passengers, a system of 15 thousand taxis, a system of 10 thousand personal cars, and a system of 13 thousand buses in the Chinese city of Shenzhen. We share new observations as follows: (1) the aboveground system has a higher overall delay increase than the underground system, but this increase is slowing down; (2) upgrades to the underground infrastructure reduce the increase in aboveground travel delay, whereas upgrades to the aboveground infrastructure increase the underground travel delay; (3) the travel delays of the underground system decrease in regions with higher population and during peak hours.
Article
Full-text available
Today, the deployment of sensing technology permits the collection of massive amounts of spatiotemporal data in urban areas. These data can provide comprehensive traffic state conditions for an urban network and for a particular day. However, data are often too numerous and too detailed to be of direct use, particularly for applications such as delivery tour planning, trip advisors, and dynamic route guidance. A rough estimate of travel times and their variability may be sufficient if the information is available at the full city scale. The concept of the spatiotemporal speed cluster map is a promising avenue for these applications. However, the data preparation for creating these maps is challenging and rarely discussed. In this study, that challenge is addressed by introducing generic methodologies for mapping the data to a geographic information system network, coarsening the network to reduce the network complexity at the city scale, and estimating the speed from the travel time data, including missing data. This methodology is demonstrated on the large-scale urban network of Amsterdam, Netherlands, with real travel time data. The preprocessed data are used to build the spatiotemporal speed cluster by using three partitioning techniques: normalized cut, density-based spatial clustering of applications with noise, and growing neural gas (GNG). A new posttreatment methodology is introduced for density-based spatial clustering and GNG, which are based on data point clustering, to generate connected zones. A preliminary cross comparison of the clustering techniques shows that GNG performs best in generating zones with minimum internal variance, the normalized cut computes three-dimensional zones with the best intercluster dissimilarity, and GNG has the fastest computation time.
Conference Paper
Full-text available
Nowadays, the deployment of sensing technology permits the collection of massive spatio-temporal data in cities. These data can provide comprehensive traffic state conditions for an urban network and for a particular day. However, they are often too numerous and too detailed to be of direct use, particularly for applications like delivery tour planning, trip advisors and dynamic route guidance. A rough estimation of travel times and their variability may be sufficient if the information is available at the full city scale. The concept of spatio-temporal speed cluster map is a promising avenue for these applications. However, the data preparation for creating these maps is a challenging and rarely discussed topic. In this paper, we address this challenge by introducing generic methodologies for mapping the data to a Geographic Information System (GIS) network, coarsening the network to reduce its complexity at the city scale, and estimating the speed from the travel time data, including missing data. We demonstrate this on the large-scale urban network of Amsterdam with real travel time data. The preprocessed data is used to build the spatio-temporal speed cluster using three partitioning techniques: Normalized Cut, DBSCAN and Growing Neural Gas (GNG). A new post-treatment methodology is introduced for DBSCAN and GNG, which are based on data point clustering, to generate connected zones. A preliminary cross comparison of the clustering techniques shows that GNG performs best in generating zones with minimum internal variance, Normalized Cut computes 3D zones with the best inter-cluster dissimilarity, and GNG has the fastest computation time.
Article
Gives a unique up-to-date account of mainstream approaches to transport modelling with emphasis on the implementation of a continuous approach to transport planning. The authors discuss modern transport modelling techniques and their use in making reliable forecasts using various data sources. Importance is placed on practical applications, but theoretical aspects are also discussed and mathematical derivations outlined. -from Publisher
Article
The accuracy of travel time information given to passengers plays a key role in the success of any Advanced Public Transportation Systems (APTS) application. In order to improve the accuracy of such applications, one should carefully develop a prediction method. A majority of the available prediction methods considered the variation in travel time either spatially or temporally. The present study developed a prediction method that considers both temporal and spatial variations in travel time. The conservation of vehicles equation in terms of flow and density was first re-written in terms of speed in the form of a partial differential equation using traffic stream models. Then, the developed speed based equation was discretized using the Godunov scheme and used in the prediction scheme that was based on the Kalman filter. From the results, it was found that the proposed method was able to perform better than historical average, regression, and ANN methods and the methods that considered either temporal or spatial variations alone. Finally, a formulation was developed to check the effect of side roads on prediction accuracy and it was found that the additional requirement in terms of location based data did not result in an appreciable change in the prediction accuracy. This clearly demonstrated that the proposed approach based on using vehicle tracking data is good enough for the considered application of bus travel time prediction.
Article
Providing travel time information to travelers on available route alternatives in traffic networks is widely believed to yield positive effects on individual drive behavior and (route/departure time) choice behavior, as well as on collective traffic operations in terms of, for example, overall time savings and, if nothing else, on the reliability of travel times. As such, there is an increasing need for fast and reliable online travel time prediction models. Previous research showed that data-driven approaches such as the state-space neural network (SSNN) are reliable and accurate travel time predictors for freeway routes, which can be used to provide predictive travel time information on, for example, variable message sign panels. In an operational context, the adaptivity of such models is a crucial property. Since travel times are available (and, hence, can be measured) for realized trips only, adapting the parameters (weights) of a data-driven travel time prediction model such as the SSNN is particularly challenging. This paper proposes a new extended Kalman filter (EKF) based online-learning approach, i.e., the online-censored EKF method, which can be applied online and offers improvements over a delayed approach in which learning takes place only as realized travel times are available.
Article
Clustering ensemble has emerged as an important tool for data analysis, by which a more robust and accurate consensus clustering can be generated. On forming the ensembles, empirical studies have suggested that better ensembles can be obtained by simultaneously considering the quality of the ensembles and the diversity among ensemble members. However, little research effort has been devoted to incorporating prior background knowledge. In this paper, we first provide a theoretical analysis of the effect of the diversity and quality of the ensemble members. We then propose a unified framework to solve the constraint-based clustering ensemble selection problem, where some instance-level must-link and cannot-link constraints are given as prior knowledge or background information. We formalize this problem as a combinatorial optimization problem in terms of the consistency under the constraints, the diversity among ensemble members, and the overall quality of ensembles. Our proposed framework brings together two distinct yet interrelated themes from clustering: ensemble clustering and semi-supervised clustering. We study different techniques for searching high-quality solutions. Experiments on benchmark datasets demonstrate the effectiveness of our framework.
Article
This study presents a methodology for travel-time prediction on urban arterial networks using data from global positioning system (GPS) probe vehicles, under Indian traffic conditions. Given any link in the network, the model predicts its travel time on the basis of historic patterns and real-time information. By doing this, it would be possible to find the time required to travel on all possible routes for any given origin-destination pair. The study also emphasizes the need for splitting links into intersection and midlinks. The k-nearest neighbor algorithm was used for prediction at the midlinks and a random forest predictor was developed to predict the travel time on the high-variation intersection links. For intersections with location-based sensors, a methodology for delay and travel-time prediction was proposed on the basis of predicted values of queue length. Overall, it was observed that better prediction accuracy was achieved when links were considered separately as midlink and intersection. The model was validated on a study network using GPS data procured from public transport buses in Chennai, India.
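The k-nearest neighbor step mentioned in the abstract above can be sketched as a simple regression: find the k historical observations closest to the current conditions and average their travel times. The feature choice (hour of day and a recent link speed) and the function name are hypothetical. The abstract does not state which features or which value of k the study used.

```python
import math


def knn_predict(history, query, k=3):
    """Predict a travel time as the mean over the k nearest historical records.

    history: list of (features, travel_time) pairs, features being a tuple of floats.
    query: feature tuple describing the current conditions.
    """
    if not 1 <= k <= len(history):
        raise ValueError("k must be between 1 and the number of historical records")
    # Sort records by Euclidean distance between their features and the query.
    ranked = sorted(history, key=lambda record: math.dist(record[0], query))
    nearest = ranked[:k]
    return sum(travel_time for _, travel_time in nearest) / k


# Hypothetical records: ((hour of day, recent speed in km/h), travel time in s).
records = [
    ((8.0, 30.0), 120.0),
    ((8.5, 28.0), 130.0),
    ((17.0, 15.0), 240.0),
    ((17.5, 14.0), 260.0),
]
estimate = knn_predict(records, (8.2, 29.0), k=2)
```

In practice the features would usually be rescaled first, since an unscaled Euclidean distance lets the feature with the largest numeric range dominate the neighbor search.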
Article
Big data from floating cars supply a frequent, ubiquitous sampling of traffic conditions on the road network and provide great opportunities for enhanced short-term traffic predictions based on real-time information on the whole network. Two network-based machine learning models, a Bayesian network and a neural network, are formulated with a double star framework that reflects time and space correlation among traffic variables and, because of its modular structure, is suitable for an automatic implementation on large road networks. Among different mono-dimensional time-series models, a seasonal autoregressive moving average model (SARMA) is selected for comparison. The time-series model is also used in a hybrid modeling framework to provide the Bayesian network with an a priori estimation of the predicted speed, which is then corrected by exploiting the information collected on other links. A large floating car data set on a sub-area of the road network of Rome is used for validation. To account for the variable accuracy of the speed estimated from floating car data, a new error indicator is introduced that relates accuracy of prediction to accuracy of measure. Validation results highlighted that the spatial architecture of the Bayesian network is advantageous in standard conditions, where a priori knowledge is more significant, while mono-dimensional time series proved to be more valuable in the few cases of non-recurrent congestion conditions observed in the data set. The results obtained suggested introducing a supervisor framework that selects the most suitable prediction depending on the detected traffic regimes.
Article
Use of speed-prediction models sometimes appears as a feasible alternative to laborious field measurements, particularly when field data cannot fulfill designers’ requirements. However, developing speed models is a challenging task, especially in the context of developing countries like India where vehicles with diverse static and dynamic characteristics use the same right of way. Here, the traffic composition plays a significant role in determining vehicular speed. Determination of passenger car units (PCU), which is required to convert heterogeneous traffic flow to its homogeneous equivalent, also requires the speed of different types of vehicles at varying traffic conditions and traffic volumes on the road. In this context, the present research is carried out to examine the effects of traffic volume and its composition on speed and PCU factors for individual types of vehicles under mixed traffic conditions. Classified traffic-volume and speed data were collected at six-lane divided arterial sections in New Delhi, and categorized speed models were developed using a kriging approximation technique, an alternative to commonly used regression. A novel algorithm for selecting the optimal correlation function in kriging has also been proposed in this paper. Developed speed models are validated with the data set of another location kept aside for this purpose, and the predicted speeds show good agreement with the observed ones. Predicted speeds are then used to estimate PCU factors for each vehicle category. Finally, the proposed models are utilized to evaluate the effects of traffic volume and its composition on speed and PCU of different vehicle categories.