Fast Hidden Markov Model Map-Matching for Sparse and Noisy Trajectories

Hannes Koller, Peter Widhalm, Melitta Dragaschnig, Anita Graser
Austrian Institute of Technology
Giefinggasse 2
1210 Vienna
Email: first.last@ait.ac.at
Abstract—The problem of map-matching sparse and noisy
GPS trajectories to road networks has gained increasing importance
in recent years. A common state-of-the-art solution to this
problem relies on a Hidden Markov Model (HMM) to identify the
most plausible road sequence for a given trajectory. While this
approach has been shown to work well on sparse and noisy data,
the algorithm has a high computational complexity and becomes
slow when working with large trajectories and extended search
radii. We propose an optimization to the original approach which
significantly reduces the number of state transitions that need to
be evaluated in order to identify the correct solution. In experiments
with publicly available benchmark data, the proposed
optimization yields nearly identical map-matching results as the
original algorithm, but reduces the algorithm runtime by up to
45%. We demonstrate that the effects of our optimization become
more pronounced when dealing with larger problem spaces and
indicate how our approach can be combined with other recent
optimizations to further reduce the overall algorithm runtime.
Keywords—Data Analysis, GPS Processing, Map-Matching,
Hidden Markov Model, Optimization
I. INTRODUCTION
With the advent of modern vehicles, smart phones and
navigation systems, a rich data pool of GPS trajectories is
gradually becoming available as an exciting new data source
for intelligent transportation systems and related services. As
GPS data is often prone to inaccuracies and measurement
errors, the first step in trajectory analysis usually involves a
map-matching algorithm that finds the most likely route for a
given sequence of GPS measurements. In our work we focus
on off-line analysis, where the entire GPS trajectory is known
beforehand. In this context, map-matching can formally be
defined as the problem of matching an entire GPS trajectory (a
sequence of GPS positions $z_0, z_1, \ldots, z_n$) onto a road network
graph. The output is a sequence of road links $r_0, r_1, \ldots, r_n$
which best corresponds to the input trajectory.
Existing map-matching approaches can be classified into
three categories: Geometric methods such as [1] perform
matching based on the geometrical properties of the GPS tra-
jectory. Topology-based methods utilize additional information
about the road network (such as link connectivity) to improve
map-matching performance [2]. Statistical methods make use
of probability models to identify the most likely road sequence
for a GPS trajectory. Statistical approaches based on Hidden
Markov Models (HMM) and related methods have successfully
been applied to map-matching problems ([6], [7], [8], [9]). We
base our contribution on the work of Newson and Krumm [3],
which has been shown to work well on sparse and noisy data.
Fig. 1. An example problem. Position $z_1$ could be matched to Links 1, 2
or 3. Position $z_2$ could be matched to Links 4 or 5, and position $z_3$
could be matched to Links 6 or 7.
The Hidden Markov Model introduced in [3] attempts
to balance the trade-off between (i) the probability of the
candidate roads suggested by a single GPS measurement and
(ii) the feasibility of the path between different candidate roads.
The algorithm can be summarized as follows:
1) For each GPS position $z_i$, a set of candidate roads
$\{r_i^0, r_i^1, \ldots, r_i^n\}$ is determined, i.e. the set of all roads
within a certain search radius, for example 200 m,
around $z_i$.
2) For each candidate road $r_i^j$, a measurement probability
is calculated which reflects how likely it is for a
vehicle on road $r_i^j$ to emit a GPS measurement having
position $z_i$. The probability depends on GPS error
characteristics and decreases with increasing distance
between road and GPS position.
3) For each candidate road $r_i^j$ of a GPS position
$z_i$, the transition probability to all candidate roads
$\{r_{i+1}^0, r_{i+1}^1, \ldots, r_{i+1}^m\}$ of the next GPS position $z_{i+1}$ is
calculated. The transition probability is an exponential
function of the difference between the route length
and the great circle distance between $z_i$ and $z_{i+1}$. The
transition probability calculation therefore requires a
shortest path routing between each pair of candidate
roads, which is a computationally expensive operation.
4) The Viterbi dynamic programming algorithm [12]
is applied to determine the map-matching solution
by selecting the sequence of candidate roads which
yields the overall best explanation for the observed
GPS trajectory.
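The four steps above can be sketched in a few lines of Python. This is a minimal illustration of log-space Viterbi decoding over candidate layers, not the implementation of [3]; the function names, the toy emission values and the toy transition function are assumptions made for this example. Note that the transition function is called once for every pair of candidates in consecutive layers, which is where the routing cost arises.

```python
import math


def viterbi_log(emission_logp, transition_logp):
    """Most likely candidate index per GPS position (log-space Viterbi).

    emission_logp[i][k]      -- log measurement probability of candidate k
                                at position i
    transition_logp(i, j, k) -- log transition probability from candidate j
                                at position i-1 to candidate k at position i
    """
    score = list(emission_logp[0])
    backpointers = []
    for i in range(1, len(emission_logp)):
        new_score, pointers = [], []
        for k, emit in enumerate(emission_logp[i]):
            best_j, best = 0, -math.inf
            for j, prev in enumerate(score):
                # Every (j, k) pair is evaluated, i.e. one routing per pair.
                cand = prev + transition_logp(i, j, k)
                if cand > best:
                    best_j, best = j, cand
            new_score.append(best + emit)
            pointers.append(best_j)
        score = new_score
        backpointers.append(pointers)
    # Backtrack from the best final candidate.
    k = max(range(len(score)), key=score.__getitem__)
    path = [k]
    for pointers in reversed(backpointers):
        k = pointers[k]
        path.append(k)
    path.reverse()
    return path


# Toy problem with 3, 2 and 2 candidates (hypothetical numbers).
emission = [[-1.0, -0.2, -3.0], [-0.5, -0.4], [-0.1, -2.0]]
calls = 0


def toy_transition(i, j, k):
    """Stand-in for the routing-based transition probability."""
    global calls
    calls += 1
    return -1.0 * abs(j - k)


best_path = viterbi_log(emission, toy_transition)
print(best_path, calls)  # the transition function runs 3*2 + 2*2 = 10 times
```

In a real map-matcher each call of the transition function would trigger a shortest-path query on the road network.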
As noted in [5], the run-time bottleneck of the approach
is the shortest-path calculations that have to be performed between
every combination of candidate roads of two consecutive
GPS positions. Assuming there are $n$ candidate roads for GPS
position $z_i$ and $m$ candidate roads for GPS position $z_{i+1}$, then
$nm$ shortest paths need to be calculated to obtain all transition
probabilities between these two GPS positions. When dealing
with long, noisy GPS trajectories where a large number of
candidate roads must be considered in order to obtain good
map-matching results, the number of shortest path routings
becomes intractable.
Improving the run-time of [3] has already been the subject
of several recent publications. In [4] multi-threading is used
to parallelize the computation of measurement and transition
probabilities. The authors of [5] adapt the transition probability
calculation so that the paths from a candidate road to all
of its successors can be determined with a single execution of
Dijkstra’s algorithm [13], thus reducing the number of required
shortest-path routings from $nm$ to $n$.
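To make the two counts concrete, the following sketch tallies the routing executions for a whole trajectory under both schemes. The function name and the example layer sizes are hypothetical; only the per-step counts ($nm$ versus $n$) come from the text.

```python
def routing_calls(layer_sizes, single_source=False):
    """Number of shortest-path executions for a sequence of candidate layers.

    layer_sizes[i] is the number of candidate roads of GPS position i.
    Pairwise routing needs n*m executions per step; one single-source
    Dijkstra run per candidate road needs only n.
    """
    calls = 0
    for n, m in zip(layer_sizes, layer_sizes[1:]):
        calls += n if single_source else n * m
    return calls


sizes = [3, 2, 2]  # hypothetical candidate counts for three GPS positions
print(routing_calls(sizes))                      # 3*2 + 2*2 = 10
print(routing_calls(sizes, single_source=True))  # 3 + 2 = 5
```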
In our work, we describe a different approach to improve
the run-time performance of HMM-based map-matching algorithms.
Our suggested method replaces the Viterbi algorithm
with a Bidirectional Dijkstra algorithm [10] and employs
lazy evaluation to avoid the costly calculation of transition
probabilities which are not required to determine the optimal
solution. Our contribution is orthogonal to the other optimization
approaches because we employ a different solving
algorithm, which decreases the number of transition
probabilities that need to be evaluated in
order to identify the optimal solution. The presented approach
is therefore compatible with the previously mentioned
optimizations and can help to further improve the algorithm
performance.
The remainder of the paper is structured as follows: In
section II we describe our optimization of the original algo-
rithm. We show how we construct a search graph from the
problem space of the HMM and describe how to transform
the HMM probabilities into cost functions compatible with
Dijkstra’s algorithm. In section III we evaluate the quality
and performance of our optimized algorithm using a publicly
available benchmark dataset and compare our results to the
results reported in [3]. Section IV concludes with a summary of
our findings and indicates directions for further improvements
of the presented algorithm.
II. ALGORITHM
As our algorithm is based on the approach of Newson
and Krumm [3], the performed steps are similar. However,
as opposed to the original method, we do not apply the Viterbi
algorithm to find the most likely sequence of candidates given
the GPS positions $z_0, z_1, \ldots, z_n$. Instead, we find the solution
by minimizing the path costs in a search graph using a bidirectional
version of Dijkstra’s algorithm [10]. This algorithm
follows a greedy approach for finding a minimal cost path and
evaluates the costs of a node and its outgoing edges only when
it arrives at this node during the search. This way the computation
of transition costs, which involves computationally expensive
routing operations, does not have to be performed immediately
but can instead be delayed until the costs are actually needed. In
most cases, the bidirectional Dijkstra algorithm only needs to
visit a fraction of all nodes before the minimum cost path is
found [11]. Thus, a large percentage of costly routing operations
can be avoided. In real-world scenarios, where a trip can
consist of thousands of GPS positions (and therefore millions
of edges might be present in the graph), this optimization can
lead to substantial performance improvements.
Fig. 2. Search graph for the example problem in Figure 1. Nodes in layer
$l_i$ correspond to the candidate roads of position $z_i$. Origin and destination
nodes, and dashed lines complete the graph. Solid lines represent edges, where
a vehicle routing operation is required to obtain the edge cost (transition
probability).
A. Constructing a search graph
The search graph is constructed from an entire GPS
trajectory. First we insert an origin node $o$. Then, for each
position $z_i$, we insert a layer $l_i$ in the search graph and
identify a set of candidate roads (i.e. all roads in a search
radius $d$ around position $z_i$). For each candidate road $r_i^j$ of
position $z_i$, a corresponding node is created in layer $l_i$ and
assigned a cost representing the distance between the
GPS position $z_i$ and candidate $r_i^j$. Next, a set of edges is
created connecting all nodes of a layer $l_i$ to all nodes of the
next layer $l_{i+1}$. These edges are assigned costs for a
transition from candidate $r_i^j$ in layer $l_i$ to $r_{i+1}^k$ in layer $l_{i+1}$,
which are defined based on the route length and great circle
distance between the GPS positions $z_i$ and $z_{i+1}$. Note that
the edge costs are not calculated when the search graph is
constructed; the corresponding cost function is evaluated only
when the costs are actually needed (lazy evaluation). Finally, a
destination node $d$ is added and connected to all nodes of the
last layer.
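The construction and the effect of lazy evaluation can be illustrated with a short sketch. For brevity it uses plain one-directional Dijkstra rather than the bidirectional variant [10] used in this paper, and it keeps the layers implicit instead of materialising origin and destination nodes. All names and the toy costs are assumptions for this example. The edge cost callback stands in for the routing operation and is only invoked when a node is expanded.

```python
import heapq
import math


def lazy_layered_search(node_cost, edge_cost):
    """Cheapest path through a layered search graph with lazy edge costs.

    node_cost[i][k]    -- cost of candidate k in layer i (cheap, precomputed)
    edge_cost(i, j, k) -- cost from candidate j in layer i to candidate k in
                          layer i+1 (expensive, evaluated on demand)
    Returns (path cost, candidate index per layer, number of edge evaluations).
    All costs must be non-negative.
    """
    last = len(node_cost) - 1
    evaluated = 0
    # Origin edges carry only the node cost of the first layer.
    heap = [(c, 0, k, (k,)) for k, c in enumerate(node_cost[0])]
    heapq.heapify(heap)
    settled = set()
    while heap:
        cost, i, j, path = heapq.heappop(heap)
        if (i, j) in settled:
            continue
        settled.add((i, j))
        if i == last:
            # Destination edges cost nothing, so the first settled node of
            # the last layer ends the search.
            return cost, list(path), evaluated
        for k, n_k in enumerate(node_cost[i + 1]):
            if (i + 1, k) in settled:
                continue
            evaluated += 1
            weight = edge_cost(i, j, k) + n_k  # edge cost plus target node cost
            heapq.heappush(heap, (cost + weight, i + 1, k, path + (k,)))
    return math.inf, [], evaluated


# Toy graph with 3, 2 and 2 candidates (hypothetical costs).
node_costs = [[1.0, 0.1, 5.0], [0.2, 4.0], [0.1, 3.0]]


def toy_edge_cost(i, j, k):
    """Stand-in for a routing-based transition cost."""
    return 0.5 * abs(j - k)


cost, path, evaluated = lazy_layered_search(node_costs, toy_edge_cost)
total_edges = sum(len(a) * len(b) for a, b in zip(node_costs, node_costs[1:]))
print(path, evaluated, total_edges)  # 4 of the 10 edges are evaluated
```

In this toy case the search settles one node per layer and never expands the expensive candidates, so six of the ten edge costs are never computed.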
An example is given in Fig. 1, which shows a small trip
with 3 GPS positions $\{z_0, z_1, z_2\}$. The resulting search graph
for this example problem is shown in Fig. 2: layer $l_0$, which
represents position $z_0$, contains three nodes representing the
roads $r_0^1 = 1$, $r_0^2 = 2$ and $r_0^3 = 3$. Similarly, layers $l_1$ and
$l_2$ contain nodes representing the candidate roads for positions
$z_1$ and $z_2$, respectively.
Note that the constructed search graph is analogous to
the HMM decoding performed by the Viterbi Algorithm. The
difference is that the measurement and transition probabilities
have been replaced by costs assigned to nodes and edges,
respectively. In the original algorithm, the Viterbi Algorithm
is used to find the map-matching solution by identifying the
state sequence $R = (r_0, r_1, \ldots, r_n)$ which maximizes the joint
probability $p(R, Z)$. Dijkstra’s algorithm, in contrast, attempts
to minimize the path cost. In the following section we therefore
demonstrate how the measurement and transition probabilities
can be transformed into costs in such a way that the joint
probability $p(R, Z)$ is expressed as a path cost that can be
minimized with the Bidirectional Dijkstra algorithm.
B. Transforming probabilities into costs
An HMM models the joint distribution $p(R, Z)$ of a state
sequence $R = (r_0, r_1, \ldots, r_n)$ and measurements
$Z = (z_0, z_1, \ldots, z_n)$ as

$$p(R, Z) = p(z_0 \mid r_0)\, p(r_0) \prod_{i=1}^{n} p(z_i \mid r_i)\, p(r_i \mid r_{i-1}), \tag{1}$$

where $p(z_i \mid r_i)$ are measurement probabilities (conditional
probability distribution of measurements given states),
$p(r_i \mid r_{i-1})$ are transition probabilities and $p(r_0)$ are initial
state probabilities. In [3] the GPS errors are assumed to
follow a zero-mean Gaussian distribution, and accordingly the
measurement probabilities are defined as

$$p(z_i \mid r_i) = \frac{1}{\sqrt{2\pi}\,\sigma_z}\, e^{-0.5 \left( \frac{\delta(z_i, x_i)}{\sigma_z} \right)^2}, \tag{2}$$

where $x_i$ is the point on the road segment $r_i$ nearest to
$z_i$, and $\delta(z_i, x_i)$ is the great circle distance between GPS
measurement $z_i$ and $x_i$. The transition probabilities were
assumed to follow an exponential distribution

$$p(r_i \mid r_{i-1}) = \beta\, e^{-\beta \left| \delta(z_{i-1}, z_i) - \phi(x_{i-1}, x_i) \right|}. \tag{3}$$
Here $\phi(x_{i-1}, x_i)$ is the driving distance along the
shortest route from $x_{i-1}$ to $x_i$, and $\beta$ is a parameter chosen
to best fit empirical data. The most likely state sequence, i.e.
the sequence of road segments $R$, is calculated by maximizing
$p(R, Z)$, which is equivalent to minimizing

$$-\log p(R, Z) = -\log p(z_0 \mid r_0) - \log p(r_0) + \sum_{i=1}^{n} \left[ -\log p(z_i \mid r_i) - \log p(r_i \mid r_{i-1}) \right], \tag{4}$$

where

$$-\log p(z_i \mid r_i) = \log\left(\sqrt{2\pi}\,\sigma_z\right) + 0.5 \left( \frac{\delta(z_i, x_i)}{\sigma_z} \right)^2 \tag{5}$$

and

$$-\log p(r_i \mid r_{i-1}) = -\log(\beta) + \beta \left| \delta(z_{i-1}, z_i) - \phi(x_{i-1}, x_i) \right|. \tag{6}$$

When minimizing $-\log p(R, Z)$ all additive constants can be
dropped as they are not relevant for the result. The initial state
probabilities $p(r_0)$ are assumed to be uniformly distributed and
can be dropped as well, and we minimize
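
$$C = 0.5 \left( \frac{\delta(z_0, x_0)}{\sigma_z} \right)^2 + \sum_{i=1}^{n} \left[ 0.5 \left( \frac{\delta(z_i, x_i)}{\sigma_z} \right)^2 + \beta \left| \delta(z_{i-1}, z_i) - \phi(x_{i-1}, x_i) \right| \right] \tag{7}$$

By constructing a search graph with node costs

$$n_i^k = 0.5 \left( \frac{\delta(z_i, x_i^k)}{\sigma_z} \right)^2 \tag{8}$$

for node $k$ in layer $i$ and edge costs

$$e_i^{jk} = \beta \left| \delta(z_{i-1}, z_i) - \phi(x_{i-1}^j, x_i^k) \right| \tag{9}$$

for the edge connecting node $j$ in layer $i-1$ to node $k$ in
layer $i$, we can use the Bidirectional Dijkstra algorithm to find
a path (i.e. a sequence $R = (r_0, r_1, \ldots, r_n)$) minimizing the
path cost $C$ and thereby maximizing $p(R, Z)$.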
As Dijkstra’s algorithm generally operates only on
weighted edges, the node costs $n(v)$ have to be merged with
the edge costs $e(u, v)$ when building the search graph. The
edge weight $w(u, v)$ between graph nodes $u$ and $v$ is thus
defined as $w(u, v) = e(u, v) + n(v)$.
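The transformation can be written down directly. The sketch below is an illustration under stated assumptions: the function names and the parameter values are hypothetical, and the Gaussian normalisation constant is taken to be the usual $\sqrt{2\pi}\,\sigma_z$. It shows that the node cost differs from the negative log measurement probability only by an additive constant, and how the node cost of the target is merged into the edge weight.

```python
import math


def neg_log_measurement(dist, sigma_z):
    """Negative log of the Gaussian measurement probability."""
    return math.log(math.sqrt(2.0 * math.pi) * sigma_z) + 0.5 * (dist / sigma_z) ** 2


def node_cost(dist, sigma_z):
    """Measurement term with the additive constant dropped."""
    return 0.5 * (dist / sigma_z) ** 2


def edge_cost(great_circle, route_length, beta):
    """Transition term with the additive constant -log(beta) dropped."""
    return beta * abs(great_circle - route_length)


def edge_weight(great_circle, route_length, dist_v, sigma_z, beta):
    """w(u, v) = e(u, v) + n(v): target node cost merged into the edge."""
    return edge_cost(great_circle, route_length, beta) + node_cost(dist_v, sigma_z)


sigma_z, beta = 5.0, 0.1  # illustrative values, not calibrated
offsets = [neg_log_measurement(d, sigma_z) - node_cost(d, sigma_z)
           for d in (0.0, 10.0, 40.0)]
print(offsets)  # the same constant for every distance, up to rounding
print(edge_weight(100.0, 130.0, 10.0, sigma_z, beta))
```

Because the offset does not depend on the candidate, dropping it changes every path cost by the same amount and leaves the minimizing path unchanged.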
C. Adapted cost functions
The previous subsection has shown that the probabilities
of the original algorithm can be transformed into equivalent
cost functions suitable for use with Dijkstra’s algorithm. It
is however interesting to note that in the original formalism
the parameter $\beta$ of equation 9, which controls the influence
of transition probabilities versus measurement probabilities,
is difficult to calibrate. This is apparent from the fact that
the absolute magnitude of the transition cost (as defined in
equation 9) is influenced by the length of the great circle
distance between two GPS positions. As a result, costs for
equally plausible routes will vary greatly, depending on the
sampling interval and the speed at which the tracked vehicle
was travelling. This makes it hard to define a $\beta$ which consistently
balances transition and observation costs, especially
in real-life situations where the algorithm has to deal with
incomplete trajectories and varying vehicle speeds. In order to
allow for easier calibration, we have made adjustments to the
cost functions. In our implementation we define the node costs
as

$$n_i^k = \delta(z_i, x_i^k) \tag{10}$$

Edge costs for the edge connecting node $j$ in layer $i-1$
to node $k$ in layer $i$ are based on the ratio (instead of the
difference) between great circle distance and route distance

$$c_i^{jk} = \frac{\phi(x_{i-1}^j, x_i^k)}{\delta(z_{i-1}, z_i)} \tag{11}$$

and are defined as

$$e_i^{jk} = \begin{cases} \beta\, c_i^{jk} & \text{if } c_i^{jk} \le t \\ \infty & \text{if } c_i^{jk} > t \end{cases} \tag{12}$$

where $t$ is the maximal ratio of route distance to great circle
distance which can be considered plausible. The
calibration parameter $\beta$ can be calculated intuitively as

$$\beta = \frac{\alpha}{1 - \alpha} \cdot \frac{d}{t} \tag{13}$$

where $\alpha$ is a value between 0 and 1 which controls the
tradeoff between transition and observation weights (higher $\alpha$
values put more weight on route plausibility) and $d$ is the search
radius used to identify candidate roads around the observations
$z_i$. Here, the role of $d$ and $t$ is to normalize the node and edge
costs, respectively, in order to ensure similar value ranges. In
our experiments the following settings have been used: $\alpha = 0.65$,
$t = 5.0$, and $d$ was varied between 20 and 300 meters.

Fig. 3. Top: Comparison of route mismatch fraction between our optimized
version ($d = 50$) and the values reported for the original algorithm in [3]. As
no exact numbers are given in [3], we use our best estimates based on the
bar plot in Figure 7 of the article. Bottom: Percentage of routing operations
which were avoided by our optimized approach.
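A minimal sketch of the adapted cost functions follows. The function names are hypothetical, and the sketch assumes that transitions whose ratio exceeds $t$ receive an infinite cost, so that they can never be part of the minimum cost path. The caller is assumed to guarantee a positive great circle distance.

```python
import math


def beta_from_alpha(alpha, d, t):
    """Calibration parameter: alpha / (1 - alpha) scaled by d / t."""
    return alpha / (1.0 - alpha) * d / t


def adapted_node_cost(dist):
    """Node cost: plain distance between GPS position and candidate road."""
    return dist


def adapted_edge_cost(route_length, great_circle, beta, t):
    """Edge cost from the ratio of route length to great circle distance.

    Assumes great_circle > 0. Ratios above t are treated as implausible.
    """
    ratio = route_length / great_circle
    return beta * ratio if ratio <= t else math.inf


beta = beta_from_alpha(alpha=0.65, d=50.0, t=5.0)
print(beta)                                         # about 18.57
print(adapted_edge_cost(120.0, 100.0, beta, 5.0))   # ratio 1.2, finite cost
print(adapted_edge_cost(600.0, 100.0, beta, 5.0))   # ratio 6.0 > t, inf
```

With this scaling the largest finite edge cost is $\beta t = \frac{\alpha}{1-\alpha} d$, while node costs range up to $d$, so $\alpha$ alone sets the balance between the two.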
III. EXPERIMENTAL RESULTS
To evaluate our improvements, we use the benchmark data
of [3] which was made publicly available. The data consists of
7531 GPS positions obtained during an 80km car trip in the
Seattle area. The evaluation is based on a manually created
ground truth, which was provided in the form of the correct
link sequence travelled by the car. We have applied the same
preprocessing steps to the GPS data as described in [3], where
GPS positions with low confidence were removed from the
data set.
A. Quality
We evaluate the quality of the results using the route
mismatch fraction as proposed in [3]. The reported error
is defined as

$$\frac{d_- + d_+}{d_0}$$

where $d_0$ is the length of the correct route, $d_-$ is the length of
all road links which were missed during map matching and $d_+$
is the length of all road links which were erroneously found
during map matching.

Fig. 4. Percentage of saved routings with different search radii and sampling
intervals. When the search radius grows larger, a higher percentage of
unnecessary routings can be avoided, especially when dealing with sparse
trajectories.
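The error measure can be computed as in the following sketch. It is a simplification that treats a route as a set of link identifiers, so a link that is travelled twice is counted once; the function name, the link identifiers and the link lengths are made up for illustration.

```python
def route_mismatch_fraction(correct_links, matched_links, link_length):
    """(d_minus + d_plus) / d_0 for routes given as collections of link ids."""
    correct, matched = set(correct_links), set(matched_links)
    d_0 = sum(link_length[link] for link in correct)
    d_minus = sum(link_length[link] for link in correct - matched)  # missed
    d_plus = sum(link_length[link] for link in matched - correct)   # erroneous
    return (d_minus + d_plus) / d_0


lengths = {1: 100.0, 2: 200.0, 3: 100.0, 4: 50.0}  # metres, hypothetical
print(route_mismatch_fraction([1, 2, 3], [1, 2, 4], lengths))  # (100 + 50) / 400
print(route_mismatch_fraction([1, 2, 3], [1, 2, 3], lengths))  # perfect match
```

Note that the measure can exceed 1 when the matcher adds many erroneous links.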
Figure 3 (top) shows that for a one-second sampling period
the algorithm is able to reconstruct the trajectory perfectly
(route mismatch fraction is zero). This is identical with the
results reported in [3]. With longer sampling periods the error
of our implementation remains very similar to the results
presented in [3]: Up to about 30 seconds, the route mismatch
fraction is very low. With longer sampling periods, where
more information about the original trajectory is lost, the
performance of both algorithm versions degrades considerably
(with a sampling period of 480 seconds only 17 of the 7531
positions are used as input).
B. Performance
Compared to the original algorithm, our solution maintains
almost the same map-matching quality while at the same time
requiring fewer routing operations to obtain the result. The
map-matcher presented in [3] uses the Viterbi algorithm, which
needs to evaluate all transition probabilities to find a solution
and therefore has to perform 100% of the shortest path routings defined
by the edges in the search graph. Our algorithm, in contrast,
is able to avoid a substantial number of these operations:
Figure 3 (bottom) shows that for a sampling period of one
second, our approach is able to avoid 45% of the routing
operations (only 100703 of the 185333 edge weights needed to
be evaluated). As the routing operation is the most expensive
calculation step in the map-matching algorithm, this amounts
to a substantial runtime improvement. With longer sampling
periods the percentage of avoided routing operations decreases,
because the problem space has to be searched more thoroughly
before the optimal solution can be identified. However, with
a 30 second sampling period (where the quality of the map-matching
result is still very good) our optimization is still able
to avoid 33% of the routings.
The quality of the map-matching algorithm is strongly
dependent on the search radius $d$ which is used to identify
candidate roads around every GPS position. When dealing with
sparse and noisy trajectories, a larger search radius typically
improves the map-matching result because it increases the
likelihood that the optimal road is included in the set of
candidate roads. Larger search radii however also increase the
size of the overall problem space: the number of nodes in
each layer of the search graph increases and as a result the
number of transition probabilities which need to be evaluated
by the Viterbi algorithm grows larger. Figure 4 shows that the
effects of our optimization become more pronounced under
these conditions: with a search radius of 300m and a sampling
interval of 1 second, the Viterbi algorithm would need to
perform about 27.4 million routing operations. By contrast,
our algorithm needed to perform only 14.6 million routing
operations to determine the correct solution, thus avoiding
over 46% of the operations. With increasing sampling interval
the effect of the optimization on different search radii be-
comes more obvious: at a 60 second sampling interval our
optimization can avoid 36% of the routing operations for the
300m search radius, but only 26% of the routing operations
for the 20 m search radius. The reason is that our approach
evaluates more plausible paths through the search graph first
and terminates once the best path has been identified. The
problem space induced by the 300m search radius contains
many state transitions related to possible, but unlikely paths.
The evaluation of the related transition probabilities can often
be avoided by our algorithm. This amounts to greater savings
when the problem space includes a greater number of (possible
but unlikely) candidate roads.
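The reported savings can be checked directly from the operation counts quoted in this section. The helper name is an assumption; the numbers are the ones given in the text.

```python
def avoided_fraction(evaluated, total):
    """Share of routing operations that did not have to be performed."""
    return 1.0 - evaluated / total


# d = 50 m, 1 s sampling: 100703 of 185333 edge weights evaluated.
saving_50m = avoided_fraction(100703, 185333)
# d = 300 m, 1 s sampling: about 14.6 of 27.4 million routings performed.
saving_300m = avoided_fraction(14.6e6, 27.4e6)
print(round(saving_50m, 3), round(saving_300m, 3))
```

Both values lie slightly above the rounded figures of 45% and 46% stated in the text.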
Based on our experiments, we therefore conclude that our
optimization is especially advantageous when analyzing sparse
and noisy trajectories, i.e. where an extended search radius is
most useful.
IV. CONCLUSION AND OUTLOOK
In the last few years, map-matching has become increas-
ingly important for evaluating road traffic and driving be-
haviour. While high quality map-matching algorithms exist,
they can be slow when applied to large data sets. We have
proposed an optimization to reduce the running time of a state-
of-the-art map-matching algorithm which is based on a Hidden
Markov Model [3]. We suggested an improvement which
replaces the Viterbi algorithm with a Bidirectional Dijkstra
and employs lazy evaluation to reduce the number of costly
route calculations which are necessary to determine the optimal
map-matching solution for a trajectory.
We have evaluated the quality and performance of our
proposed method based on a publicly available data set. Our
test results show that our suggested solution can avoid up to
45% of the costly routing operations and has no negative effect
on the quality of the map-matching result. Future work will
include a detailed analysis of the effect of increased GPS
noise and extreme maneuvers (such as sharp turns) on run-time
savings and map-matching quality.
Several other researchers have suggested different methods
for optimizing the run time of the original map-matching
algorithm [4] [5]. Since these suggestions optimize different
aspects of the map-matching algorithm, they should be com-
patible with our approach. Future work will focus on further
decreasing the algorithm run-time by combining our approach
with these techniques. Another promising direction for further
improvement of the optimization proposed in this paper is to
replace the Bidirectional Dijkstra with other search algorithms
such as the A* algorithm. This algorithm employs a heuristic
to estimate the cost from the current node to the target node
in the search graph, and with a suitable heuristic this could
further reduce the average number of routing operations in
many real-world scenarios.
V. ACKNOWLEDGEMENTS
This work was supported by the European Commission
under TEAM, a large-scale integrated project part of the
Seventh Framework Programme for research, technological de-
velopment and demonstration [Grant Agreement NO.318621].
REFERENCES
[1] Sotiris Brakatsoulas, Dieter Pfoser, Randall Salas, and Carola Wenk.
On map-matching vehicle tracking data. In Proceedings of the 31st
International Conference on Very Large Data Bases (VLDB ’05). VLDB
Endowment, pp. 853–864, 2005.
[2] N.R. Velaga, M.A. Quddus, and A.L. Bristow. Developing an enhanced
weight-based topological map-matching algorithm for intelligent transport
systems. Transportation Research Part C: Emerging Technologies,
17(6), pp. 672–683, 2009.
[3] Paul Newson and John Krumm. Hidden Markov map matching through
noise and sparseness. In Proceedings of the 17th ACM SIGSPATIAL
International Conference on Advances in Geographic Information Systems,
pp. 336–343, 2009.
[4] R. Song, W. Lu, and W. Sun. Quick map matching using multi-core CPUs.
ACM SIGSPATIAL GIS, 2012.
[5] Hong Wei, Yin Wang, and George Forman. Fast Viterbi mapmatching with
tunable weight functions. ACM SIGSPATIAL GIS, 2012.
[6] P. Lamb and S. Thiebaux. Avoiding explicit mapmatching in vehicle location.
In Proceedings of the 6th World Conference on Intelligent Transportation
Systems, 1999.
[7] B. Hummel. Map matching for vehicle guidance. In Dynamic and Mobile
GIS: Investigating Space and Time. J. Drummond and R. Billen, Editors.
CRC Press: Florida, 2006.
[8] Yin Lou, Chengyang Zhang, Yu Zheng, Xing Xie, Wei Wang, and
Yan Huang. Map-matching for low-sampling-rate GPS trajectories. In
Proceedings of the 17th ACM SIGSPATIAL International Conference
on Advances in Geographic Information Systems (GIS ’09). ACM, New
York, NY, USA, pp. 352–361. DOI: 10.1145/1653771.1653820, 2009.
[9] C. Y. Goh, J. Dauwels, N. Mitrovic, M. T. Asif, A. Oran, and P. Jaillet.
Online map-matching based on hidden Markov model for real-time
traffic sensing applications. In Proceedings of the 15th International IEEE
Conference on Intelligent Transportation Systems (ITSC), Anchorage,
AK. DOI: 10.1109/ITSC.2012.6338627, 2012.
[10] Ira S. Pohl. Bi-directional search. Machine Intelligence, 6:127–140,
1971.
[11] T.A.J. Nicholson. Finding the shortest route between two points in a
network. The Computer Journal, vol. 9, no. 3, pp. 275–280, 1966.
[12] A. Viterbi. Error bounds for convolutional codes and an asymptotically
optimum decoding algorithm. IEEE Trans. Information Theory, vol. 13,
no. 2, pp. 260–269, Apr. 1967.
[13] E. W. Dijkstra. A note on two problems in connection with graphs.
Numer. Math., 1:269–271, 1959.
... • A crowd-sourced trajectory only contains GNSS points and does not contain the ground truth of the road segments on which the vehicle was travelling. In such scenarios, the methods of trajectory data-based traffic flow analysis require map-matching to determine the most likely segment of the mobility network for each GNSS point in the trajectory data (Bang, Kim, and Yu, 2016;Brakatsoulas et al., 2005;Koller et al., 2015;Newson and Krumm, 2009;Qi, Di, and Li, 2019;Roth, 2016;Schuessler and Axhausen, 2009). Existing map-matching algorithms encounter challenges in predicting correctly map-matched routes when dealing with the peculiar road networks and traffic conditions in LMICs (Dey, Tomko, and Winter, 2022;Yanocha, Mason, and Hagen, 2021). ...
... The discrete GNSS data points from the crowd-sourced trajectory are matched to the most likely route segments by a map-matching algorithm (Koller et al., 2015;Newson and Krumm, 2009;Qi, Di, and Li, 2019). However, map-matching algorithms are prone to produce erroneous outputs and incorrectly map-matched segments, mainly due to the inevitable occurrence of systematic GNSS errors, including blocked signals and multipath effects (measurement error) (Chen and Bierlaire, 2015;Cui, Bian, and Wang, 2021;Schuessler and Axhausen, 2009), low sample rate GNSS trajectories (sampling error) (Hsueh and Chen, 2018), computational limitations of algorithms, incomplete map data (Dey et al., 2021b), matching errors at junctions (Xu et al., 2019), and matching to wrong mobility networks (Chen and Bierlaire, 2015;Hsueh and Chen, 2018;Qi, Di, and Li, 2019;Xu et al., 2019). ...
... The HMM takes into account the probabilities that govern the state measurements and the probabilities that govern the transitions between states (road segments) at each time (Cui, Bian, and Wang, 2021;Newson and Krumm, 2009). The HMM calculates the emission probabilities by modeling the measurement noise and the transition probabilities by modeling the distance between the GNSS measurements and the probable route (Koller et al., 2015;Luo, Chen, and Xv, 2017;Newson and Krumm, 2009;Raymond et al., 2012). The Viterbi algorithm is used to calculate the best route through the HMM lattice (Cui, Bian, and Wang, 2021;Luo, Chen, and Xv, 2017;Mohamed, Aly, and Youssef, 2014;Newson and Krumm, 2009). ...
Thesis
Full-text available
An essential part of regulating and optimising traffic flow in urban and highway networks is Traffic state estimation (TSE). It provides information on road-traffic conditions that may be utilised to improve traffic flow, optimize the performance of the transportation system, support the development of Intelligent Transportation Systems (ITS), revolutionise the transport industry and raise the quality of life for those living in cities and suburbs. TSE needs dedicated expensive infrastructure for transportation data collection, depending on a city's transport infrastructure and the level of economic development. Many Low- and Middle-Income Cities (LMIC) do not have well-established transportation infrastructure. As a result, there is limited availability of expensive sensor data for estimating the traffic state. Therefore, there is a need to develop more cost-effective and efficient methods for estimating traffic states in the absence of dedicated sensor-based data. Consequently, recent studies have shown that trajectory data collected from vehicles equipped with Global Navigation Satellite Systems (GNSS) units can be used for TSE reducing the dependency on dedicated sensor data. Trajectory data are also combined with static sensor data to estimate relevant parameters of traffic states, such as traffic flow, travel times, queue lengths, and shockwave boundaries. Commercial trajectory data are highly privacy-sensitive and not always accessible in a location of interest. So anonymized crowd-sourcing has been used in some studies to collect trajectory data without hindering privacy issues. Crowd-sourced trajectory data at least contain information on location, time, and unique anonymous ID in discrete time steps. However, crowd-sourced trajectory data collection methods bring several challenges due to various systematic and device errors, and a limited number of trajectories collected with an unknown fraction of the population traffic. 
Also, there are inherent difficulties with the existing trajectory-based research to be applied for TSE. Crowd-sourced trajectory data points do not contain essential traffic information, such as what ground truth route was taken on a road network, whether the data was collected from a moving vehicle on road traffic, and how it interacted with traffic conditions. These challenges with crowd-sourced trajectory data emphasise the necessity for novel methods to get around the current limitations of estimating traffic state parameters. This thesis proposes new techniques to utilize crowd-sourced trajectory data in order to estimate traffic flow, density, velocity, and travel time as traffic state parameters of a road network. An overall hypothesis of the research problem, a set of research questions and their corresponding research goals are developed by analyzing research gaps in crowd-sourced trajectory data-based TSE. The challenges with data collection methods and limitations on the applicability of the existing methods are established in the individual research questions. The first two research goals estimate ground truth routes of vehicles on the road by an improved map-matching and travel mode detection of the multi-modal raw trajectory data. The remaining research goals involve developing methodologies for estimating parking information, traffic counts, and vehicle Origin-Destination (OD) flows from vehicle-bound trajectories. Finally, the estimated traffic counts, and OD flow, together infer travel time, traffic flow, density, and velocity at the road network of interest. Thus, this thesis contributes to TSE only using crowd-sourced trajectory data at the road network of interest where dedicated data collection infrastructure does not exist or is not operational. The thesis draws upon empirical data and real-world case studies to validate and evaluate the proposed methods and models. 
It also discusses the limitations and potential applications of the findings, highlighting the implications for traffic management, urban planning, and transportation systems. All the proposed methods proved useful and demonstrate the future applicability of crowd-sourced trajectory data as a low-cost solution for TSE in LMIC. This thesis serves as a valuable contribution to the field of traffic management systems by demonstrating the feasibility and benefits of using crowd-sourced trajectory data in traffic estimation. It contributes to a better understanding of urban traffic conditions in LMIC, where road infrastructure is inadequate and resources are limited. The proposed techniques can help in optimizing traffic operations and planning, which can help limit economic and ecological losses. Overall, this research has the potential to make significant contributions to the field of transportation engineering and improve the quality of life in LMIC.
... Given an observation (c_i, t^s_i, t^e_i) in a trajectory of a user and a set of metro stations, S, the emission probability P(s_j | c_i) represents the likelihood of c_i being observed if the user is located at a metro station s_j ∈ S. A higher emission probability is associated with c_i if s_j is closer to c_i. Following some prior works on HMM [41][42][43], we use a Gaussian distribution to model the emission probability: ...
... Following prior works [41][42][43], we formulate the transition probability based on the exponential probability distribution: ...
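The formulas themselves are elided in the two snippets above, so the following is only a minimal sketch of the general pattern they describe: a Gaussian emission term that decays with the distance between an observation and a candidate, and an exponential transition term. The function names, the parameters `sigma_m` and `beta_m`, their default values, and the choice of "absolute gap between straight-line and route distance" as the transition argument are all assumptions borrowed from a common HMM map-matching formulation, not the cited paper's definitions.

```python
import math


def emission_probability(distance_m, sigma_m=50.0):
    """Gaussian density of an observation lying distance_m from a candidate.

    sigma_m is an assumed noise scale (metres), not a value from the source.
    """
    coeff = 1.0 / (math.sqrt(2.0 * math.pi) * sigma_m)
    return coeff * math.exp(-0.5 * (distance_m / sigma_m) ** 2)


def transition_probability(straight_m, route_m, beta_m=100.0):
    """Exponential density over the gap between two distance measures.

    straight_m: straight-line distance between consecutive observations.
    route_m: network distance between the two candidate states.
    beta_m is an assumed scale parameter (metres).
    """
    gap = abs(straight_m - route_m)
    return math.exp(-gap / beta_m) / beta_m
```

Under these assumptions, closer candidates receive a higher emission probability, and candidate pairs whose network distance agrees with the observed displacement receive a higher transition probability.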
Article
A fine-grained metro trip contains complete information on user mobility, including the original station, destination station, departure time, arrival time, transfer station(s), and corresponding transfer time during the metro journey. Understanding such detailed trip information within a city is crucial for various smart city applications, such as effective urban planning and public transportation system optimization. In this work, we study the problem of detecting fine-grained metro trips from cellular trajectory data. Existing trip-detection approaches designed for GPS trajectories are often not applicable to cellular data due to the issues of location noise and irregular data sampling in cellular data. Moreover, most cellular data-based methods focus on identifying coarse-grained transportation modes, failing to detect fine-grained metro trips accurately. To address the limitations of existing works, we propose a novel and efficient fine-grained metro-trip detection (FGMTD) model in this work. By considering both the local and global spatial–temporal characteristics of a trajectory and the metro network, FGMTD can effectively mitigate the effects of location noise and irregular data sampling, ultimately improving the accuracy and reliability of the detection process. In particular, FGMTD employs a spatial–temporal hidden Markov model with efficient index strategies to capture local spatial–temporal characteristics from individual positions and metro stations, and a weighted trip-route similarity measure to consider global spatial–temporal characteristics from the entire trajectory and metro route. We conduct extensive experiments on two real datasets to evaluate the effectiveness and efficiency of our proposed approaches. The first dataset contains cellular data from 30 volunteers, including their actual trip details, while the second dataset consists of data from 4 million users. 
The experiments illustrate the significant accuracy of our approach (with a precision of 87.80% and a recall of 84.28%). Moreover, we demonstrate that FGMTD is efficient in detecting fine-grained trips from a large amount of cellular data, achieving this task within 90 min of processing a day’s data from 4 million users.
... We add a new hidden state, the open-field state, for which the complexity of our proposed EHMM-P is 1 . Our approach can achieve the computation of approximately 1000 GPS points per second, similar to the reference work, where Viterbi was also employed to search for the optimal path of HMM for map-matching [41]. Algorithm 1 shows the summary of the proposed EHMM-P. ...
Article
Map-matching is a core functionality of pedestrian navigation applications. The localization errors of the Global Positioning System (GPS) receivers in smartphones are among the most critical factors limiting the large-scale deployment of pedestrian navigation applications, especially in dense urban areas where multiple road segments lie within the range of GPS errors, which can be increased by closely spaced tall buildings. In this paper, we address two issues of practical importance for map-matching based on the Hidden Markov Model (HMM) in pedestrian navigation systems: large localization error in the initial phase of map-matching and HMM breaks in open field traversals. A heuristic method to determine the probability of initial states of the HMM based on a small number of GPS data received during the short warm-up period is proposed to improve the accuracy of initial map-matching. A simple but highly practical method based on a heuristic evaluation of near-future locations is proposed to prevent the malfunction of the Viterbi algorithm within the area of open fields. The results of field experiments indicate that the enhanced HMM constructed via the proposed methods achieves significantly higher map-matching accuracy than the state of the art.
Chapter
With the ubiquity of mobile devices that are capable of tracking positions (be it via GPS or Wi-Fi/mobile network localization), there is a continuous stream of location data being generated every second. These location measurements are typically not considered individually but rather as sequences, each of which reflects the movement of one person or vehicle and which we call a trajectory. This chapter presents new algorithmic approaches to process and visualize trajectories both in the network-constrained and the unconstrained case.
Conference Paper
In many Intelligent Transportation System (ITS) applications that crowd-source data from probe vehicles, a crucial step is to accurately map the GPS trajectories to the road network in real time. This process, known as map-matching, often needs to account for noise and sparseness of the data because (1) highly precise GPS traces are rarely available, and (2) dense trajectories are costly for live transmission and storage. We propose an online map-matching algorithm based on the Hidden Markov Model (HMM) that is robust to noise and sparseness. We focused on two improvements over existing HMM-based algorithms: (1) the use of an optimal localizing strategy, the variable sliding window (VSW) method, that guarantees the online solution quality under uncertain future inputs, and (2) the novel combination of spatial, temporal and topological information using machine learning. We evaluated the accuracy of our algorithm using field test data collected on bus routes covering urban and rural areas. Furthermore, we also investigated the relationships between accuracy and output delays in processing live input streams. In our tests on field test data, VSW outperformed the traditional localizing method in terms of both accuracy and output delay. Our results suggest that it is viable for low latency applications such as traffic sensing.
Conference Paper
Map-matching is the process of aligning a sequence of observed user positions with the road network on a digital map. It is a fundamental pre-processing step for many applications, such as moving object management, traffic flow analysis, and driving directions. In practice there exists a huge amount of low-sampling-rate (e.g., one point every 2-5 minutes) GPS trajectories. Unfortunately, most current map-matching approaches only deal with high-sampling-rate (typically one point every 10-30 s) GPS data, and become less effective for low-sampling-rate points as the uncertainty in the data increases. In this paper, we propose a novel global map-matching algorithm called ST-Matching for low-sampling-rate GPS trajectories. ST-Matching considers (1) the spatial geometric and topological structures of the road network and (2) the temporal/speed constraints of the trajectories. Based on spatio-temporal analysis, a candidate graph is constructed from which the best matching path sequence is identified. We compare ST-Matching with the incremental algorithm and the Average-Fréchet-Distance (AFD) based global map-matching algorithm. The experiments are performed on both synthetic and real datasets. The results show that our ST-Matching algorithm significantly outperforms the incremental algorithm in terms of matching accuracy for low-sampling-rate trajectories. Meanwhile, when compared with the AFD-based global algorithm, ST-Matching also improves accuracy as well as running time.
Conference Paper
Vehicle tracking data is an essential "raw" material for a broad range of applications such as traffic management and control, routing, and navigation. An important issue with this data is its accuracy. The method of sampling vehicular movement using GPS is affected by two error sources and consequently produces inaccurate trajectory data. To become useful, the data has to be related to the underlying road network by means of algorithms. We present three such algorithms that consider especially the trajectory nature of the data rather than simply the current position as in the typical map-matching case. An incremental algorithm is proposed that matches consecutive portions of the trajectory to the road network, effectively trading accuracy for speed of computation. In contrast, the two global algorithms compare the entire trajectory to candidate paths in the road network. The algorithms are evaluated in terms of (i) their running time and (ii) the quality of their matching result. Two novel quality measures utilizing the Fréchet distance are introduced and subsequently used in an experimental evaluation to assess the quality of matching real tracking data to a road network.
Article
A new method is proposed for finding the shortest route between two points in an interconnected network. The shortest route is found by investigating a selection of routes from both the starting point and the terminal point. The selection of routes is decided dynamically by extending one by one the routes which have currently covered the least distance. Once a complete through route has been found, it has to be made certain that it is the minimum. The new method appears to be more efficient than alternative approaches to the problem through linear or dynamic programming. Some applications of the technique to scheduling and other problems are briefly described.
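The idea in the abstract above (grow routes from both endpoints, always extending the one that has covered the least distance, then confirm that the first through route found is really minimal) can be sketched as a bidirectional Dijkstra-style search. This is a common textbook formulation and not necessarily the article's exact procedure. The function name, the adjacency-list graph format, and the restriction to undirected graphs with non-negative weights are assumptions made for illustration.

```python
import heapq


def bidirectional_shortest_distance(graph, source, target):
    """Shortest distance between source and target, or inf if unreachable.

    graph: dict mapping node -> list of (neighbour, weight) pairs. The graph
    is assumed undirected (every edge listed in both directions) with
    non-negative weights.
    """
    if source == target:
        return 0.0
    inf = float("inf")
    dist = [{source: 0.0}, {target: 0.0}]        # tentative distances per side
    heaps = [[(0.0, source)], [(0.0, target)]]   # one frontier per side
    settled = [set(), set()]
    best = inf                                   # best through route so far
    while heaps[0] and heaps[1]:
        # No undiscovered route can beat `best` once the two frontier
        # minima together are at least as long: the minimality check.
        if heaps[0][0][0] + heaps[1][0][0] >= best:
            break
        # Extend whichever side has currently covered the least distance.
        side = 0 if heaps[0][0][0] <= heaps[1][0][0] else 1
        d, node = heapq.heappop(heaps[side])
        if node in settled[side]:
            continue                             # stale heap entry
        settled[side].add(node)
        for nbr, weight in graph.get(node, ()):
            nd = d + weight
            if nd < dist[side].get(nbr, inf):
                dist[side][nbr] = nd
                heapq.heappush(heaps[side], (nd, nbr))
            # If the other side has reached nbr, a through route exists.
            if nbr in dist[1 - side]:
                best = min(best, nd + dist[1 - side][nbr])
    return best
```

For a directed network, the backward search would need to run on reversed edges; that variation is omitted here to keep the sketch short.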
Article
Map-matching (MM) algorithms integrate positioning data from a Global Positioning System (or a number of other positioning sensors) with a spatial road map with the aim of identifying the road segment on which a user (or a vehicle) is travelling and the location on that segment. Amongst the family of MM algorithms consisting of geometric, topological, probabilistic and advanced, topological MM (tMM) algorithms are relatively simple, easy and quick, enabling them to be implemented in real-time. Therefore, a tMM algorithm is used in many navigation devices manufactured by industry. However, existing tMM algorithms have a number of limitations which affect their performance relative to advanced MM algorithms. This paper demonstrates that it is possible by addressing these issues to significantly improve the performance of a tMM algorithm. This paper describes the development of an enhanced weight-based tMM algorithm in which the weights are determined from real-world field data using an optimisation technique. Two new weights for turn-restriction at junctions and link connectivity are introduced to improve the performance of matching, especially at junctions. A new procedure is developed for the initial map-matching process. Two consistency checks are introduced to minimise mismatches. The enhanced map-matching algorithm was tested using field data from dense urban areas and suburban areas. The algorithm identified 96.8% and 95.93% of the links correctly for positioning data collected in urban areas of central London and Washington, DC, respectively. In case of suburban area, in the west of London, the algorithm succeeded with 96.71% correct link identification with a horizontal accuracy of 9.81 m (2σ). This is superior to most existing topological MM algorithms and has the potential to support the navigation modules of many Intelligent Transport System (ITS) services.
Conference Paper
The problem of matching measured latitude/longitude points to roads is becoming increasingly important. This paper describes a novel, principled map matching algorithm that uses a Hidden Markov Model (HMM) to find the most likely road route represented by a time-stamped sequence of latitude/longitude pairs. The HMM elegantly accounts for measurement noise and the layout of the road network. We test our algorithm on ground truth data collected from a GPS receiver in a vehicle. Our test shows how the algorithm breaks down as the sampling rate of the GPS is reduced. We also test the effect of increasing amounts of additional measurement noise in order to assess how well our algorithm could deal with the inaccuracies of other location measurement systems, such as those based on WiFi and cell tower multilateration. We provide our GPS data and road network representation as a standard test set for other researchers to use in their map matching work.
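Finding "the most likely road route" in an HMM is typically done with the Viterbi algorithm. The following is a minimal sketch of that decoding step in log space, assuming a fixed set of states and time-independent transitions; a real map-matcher would normally use a different candidate set per GPS point. The function name and the toy probabilities are illustrative assumptions and do not come from the paper.

```python
import math


def viterbi(log_init, log_trans, log_emit):
    """Most likely state sequence for an HMM, computed in log space.

    log_init[s]: log prior of state s.
    log_trans[p][s]: log probability of moving from state p to state s.
    log_emit[t][s]: log likelihood of observation t given state s.
    """
    n_states = len(log_init)
    score = [log_init[s] + log_emit[0][s] for s in range(n_states)]
    back = []  # back[t-1][s] = best predecessor of state s at step t
    for t in range(1, len(log_emit)):
        new_score, pointers = [], []
        for s in range(n_states):
            prev = max(range(n_states), key=lambda p: score[p] + log_trans[p][s])
            pointers.append(prev)
            new_score.append(score[prev] + log_trans[prev][s] + log_emit[t][s])
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Toy example with two "road" states; the middle observation is noisy and
# slightly favours state 1, but switching roads is unlikely.
init = [math.log(0.5), math.log(0.5)]
trans = [[math.log(0.9), math.log(0.1)], [math.log(0.1), math.log(0.9)]]
emit = [[math.log(0.9), math.log(0.1)],
        [math.log(0.4), math.log(0.6)],
        [math.log(0.9), math.log(0.1)]]
print(viterbi(init, trans, emit))  # prints [0, 0, 0]
```

In the toy example the decoder keeps the route on state 0 throughout, which illustrates how the transition model absorbs a single noisy measurement.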
Article
The probability of error in decoding an optimal convolutional code transmitted over a memoryless channel is bounded from above and below as a function of the constraint length of the code. For all but pathological channels the bounds are asymptotically (exponentially) tight for rates above R_0, the computational cutoff rate of sequential decoding. As a function of constraint length the performance of optimal convolutional codes is shown to be superior to that of block codes of the same length, the relative improvement increasing with rate. The upper bound is obtained for a specific probabilistic nonsequential decoding algorithm which is shown to be asymptotically optimum for rates above R_0 and whose performance bears certain similarities to that of sequential decoding algorithms.
P. Lamb, S. Thiebaux. Avoiding explicit mapmatching in vehicle location. In Proceedings of the 6th World Conference on Intelligent Transportation Systems, 1999.