Table 1 - uploaded by Julian Dibbelt
Source publication
We study the journey planning problem in public transit networks. Developing
efficient preprocessing-based speedup techniques for this problem has been
challenging: current approaches either require massive preprocessing effort or
provide limited speedups. Leveraging recent advances in Hub Labeling, the
fastest algorithm for road networks, we revis...
Similar publications
The Bϵ-tree [Brodal and Fagerberg 2003] is a simple I/O-efficient external-memory-model data structure that supports updates orders of magnitude faster than B-tree with a query performance comparable to the B-tree: for any positive constant ϵ < 1 insertions and deletions take O((1/B^(1-ϵ)) log_B N) time (rather than O(log_B N) time for the classic B-tree),...
... lot of problems in different practical fields of computer science, database management systems, networks, data mining and artificial intelligence. Searching is a common fundamental operation, and the searching problem arises in different forms across these fields. This research paper presents the basic types of searching algorithms of data structure li...
Overfitting is the bane of data analysts, even when data are plentiful.
Formal approaches to understanding this problem focus on statistical inference
and generalization of individual analysis procedures. Yet the practice of data
analysis is an inherently interactive and adaptive process: new analyses and
hypotheses are proposed after seeing the re...
Citations
... Many efficient algorithms have been developed in recent years for mono- or bicriteria routing in public transit networks (e.g. [1,4,5,15,3]). However, they mostly consider a fixed transfer speed and sometimes rely on long preprocessing that depends on fixed transfer times. ...
In the context of routing in public transit networks, we consider the issue of the customization of walking transfer times, which is incompatible with the preprocessing required by many state-of-the-art algorithms. We propose to extend one of those, the Trip-Based Public Transit Routing algorithm, to take into account, at query time, a user-defined transfer speed and maximum transfer duration. The obtained algorithm is optimal for the bicriteria problem of optimizing minimum arrival time and number of transfers. It is tested on two large data sets, and the query times are compatible with real-time queries in a production context.
... For the above reasons, and since big graphs are becoming the norm in most modern application contexts that rely on distance mining, many smarter and more scalable techniques, between these two extreme solutions, have been proposed in the recent past to deal with massive graphs. Essentially all of them adopt the common strategy of preprocessing the graph to compute some data structure that is then exploited to accelerate the query algorithm [2,3,7,9,18,19,20,21,22,23]. Some of these methods have been designed to handle special classes of graphs, of interest of specific applications (e.g., speed-up techniques for route planning) and in road/transport networks (e.g., [2,14,20,21,24,25]) while some others are general and provide speedups over baseline strategies regardless of the structural properties of the graph to be managed (e.g., [3,15,19,23]). ...
... Essentially all of them adopt the common strategy of preprocessing the graph to compute some data structure that is then exploited to accelerate the query algorithm [2,3,7,9,18,19,20,21,22,23]. Some of these methods have been designed to handle special classes of graphs, of interest of specific applications (e.g., speed-up techniques for route planning) and in road/transport networks (e.g., [2,14,20,21,24,25]) while some others are general and provide speedups over baseline strategies regardless of the structural properties of the graph to be managed (e.g., [3,15,19,23]). In this latter category we find 2-HOP-COVER-based labeling approaches that currently are considered state-of-the-art methods for distance mining in massive graphs. ...
... In this latter category we find 2-HOP-COVER-based labeling approaches that currently are considered state-of-the-art methods for distance mining in massive graphs. In fact, they have been shown to exhibit superior performance in terms of query times, allowing computation of shortest-path distances in microseconds even for billion-vertex graphs, at the price of a competitive preprocessing time and space overhead [7,8,21,26,27]. ...
Computing shortest-path distances is a fundamental primitive in the context of graph data mining, since this kind of information is essential in a broad range of prominent applications, which include social network analysis, data routing, web search optimization, database design and route planning. Standard algorithms for shortest paths (e.g., Dijkstra’s) do not scale well with the graph size, as they take more than a second, or incur huge memory overheads, to answer a single distance query on large-scale graph datasets. Hence, they are not suited to mine distances from big graphs, which are becoming the norm in most modern application contexts. Therefore, to achieve faster query answering, smarter and more scalable methods have been designed, the most effective of them based on precomputing and querying a compact representation of the transitive closure of the input graph, called the 2-hop-cover labeling. To use such approaches in realistic time-evolving scenarios, when the managed graph undergoes topological modifications over time, specific dynamic algorithms, carefully updating the labeling as the graph evolves, have been introduced. In fact, recomputing the 2-hop-cover structure from scratch every time the graph changes is not an option, as it induces unsustainable time overheads. While the state-of-the-art dynamic algorithm to update a 2-hop-cover labeling against incremental modifications (insertions of arcs/vertices, arc weight decreases) offers very fast update times, the only known solution for decremental modifications (deletions of arcs/vertices, arc weight increases) is still far from being considered practical, as it requires up to tens of seconds of processing per update in several prominent classes of real-world inputs, as experimentation shows. In this paper, we introduce a new dynamic algorithm to update 2-hop-cover labelings against decremental changes.
We prove its correctness, formally analyze its worst-case performance, and assess its effectiveness through an experimental evaluation employing both real-world and synthetic inputs. Our results show that it improves, by up to several orders of magnitude, upon average update times of the only existing decremental algorithm, thus representing a step forward towards real-time distance mining in general, massive time-evolving graphs.
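To make the 2-hop-cover idea concrete, the following is a minimal sketch of how a distance query can be answered from precomputed labels. It assumes each vertex stores a list of `(hub, distance)` pairs sorted by hub id, and that every shortest path passes through at least one hub shared by its two endpoints. The function name `query_distance` and the toy labels for the unit-weight path 0-1-2-3 are illustrative assumptions, not taken from the paper, and the sketch does not cover the dynamic update algorithm itself.

```python
from math import inf

def query_distance(label_s, label_t):
    """Merge two hub-sorted labels and return the best distance via a shared hub.

    Each label is a list of (hub, distance) pairs sorted by hub id.
    Returns inf when the labels share no hub (no path is certified).
    """
    best = inf
    i = j = 0
    while i < len(label_s) and j < len(label_t):
        hub_s, dist_s = label_s[i]
        hub_t, dist_t = label_t[j]
        if hub_s == hub_t:
            best = min(best, dist_s + dist_t)
            i += 1
            j += 1
        elif hub_s < hub_t:
            i += 1
        else:
            j += 1
    return best

# Hand-built labels for the undirected unit-weight path 0 - 1 - 2 - 3
# (hypothetical toy data; a real labeling comes out of preprocessing).
labels = {
    0: [(0, 0), (1, 1)],
    1: [(1, 0)],
    2: [(1, 1), (2, 0)],
    3: [(1, 2), (2, 1), (3, 0)],
}

print(query_distance(labels[0], labels[3]))  # 3
print(query_distance(labels[2], labels[3]))  # 1
```

The query touches only the two labels, never the graph, which is why query time depends on label size rather than graph size. It is also why a change in the graph can leave stale distances in the labels until they are updated.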
... For example, Efentakis et al. [10] formulated the public transportation routing problem as a database query and proposed a pure SQL-based routing framework. Wang et al. [24] and Delling et al. [8] modeled public transportation networks as a timetable graph and proposed a labeling-based index to speed up shortest path queries. However, all the above approaches mainly focus on city-wide public transportation routing, and only consider a single or a few transport modes (bus and metro). ...
Public transportation plays a critical role in people's daily life. It has been proven that public transportation is more environmentally sustainable, efficient, and economical than any other form of travel. However, due to the increasing expansion of transportation networks and more complex travel situations, people are having difficulties in efficiently finding the most preferred route from one place to another through public transportation systems. To this end, in this paper, we present Polestar, a data-driven engine for intelligent and efficient public transportation routing. Specifically, we first propose a novel Public Transportation Graph (PTG) to model the public transportation system in terms of various travel costs, such as time or distance. Then, we introduce a general route search algorithm coupled with an efficient station binding method for efficient route candidate generation. After that, we propose a two-pass route candidate ranking module to capture user preferences under dynamic travel situations. Finally, experiments on two real-world data sets demonstrate the advantages of Polestar in terms of both efficiency and effectiveness. Indeed, in early 2019, Polestar was deployed on Baidu Maps, one of the world's largest map services. To date, Polestar serves over 330 cities, answers over a hundred million queries each day, and achieves a substantial improvement in user click ratio.
... Algorithms that yield fast query times for public transit routing in large metropolitan networks are numerous, with extensions of Dijkstra's algorithm (Disser et al., 2008; Müller-Hannemann et al., 2007; Pyrga et al., 2008), graph labeling algorithms (Delling et al., 2015a; Wang et al., 2015), non-graph-based algorithms like the Connection Scan Algorithm (CSA) (Dibbelt et al., 2018) or RAPTOR (Delling et al., 2015b), preprocessing-heavy approaches with Transfer Patterns (Bast et al., 2010; Bast and Storandt, 2014), or with a lighter preprocessing with Trip-Based Public Transit Routing (Witt, 2015). There are also extensions to these algorithms to allow for shorter response times: Trip-Based Routing is made faster with the use of condensed search trees (Witt, 2016), RAPTOR with the use of hypergraphs (Delling et al., 2017), CSA with the use of overlay graphs (CSAccel) (Strasser and Wagner, 2014), and Transfer Patterns by reducing the space and time consumption of the preprocessing (Bast et al., 2016b). ...
... The graph-based models, instead, store the timetable as a suitable graph and execute known adaptations of Dijkstra's shortest path algorithm to compute optimal routes [2,9,10,11]. Alternatives to both plain array-based and graph-based models have been also recently considered [12,13,14]. Some of them, like the one in Reference [12], directly operate on the timetable. ...
... Some others, like those in References [13,14], combine a graph representation of the timetable with the notion of graph labeling to achieve extremely low query times. In this paper, we focus on this latter category of approaches, since they are the ones that offer the smallest average query times and are hence suited for modern applications of the journey planning problem. ...
... The fastest solutions to the journey planning problem with respect to query time are those in References [13,14]. Of them, the one in Reference [13] relies on an algorithmic framework referred to, in the following, as Public Transit Labeling (PTL). ...
This paper studies the journey planning problem in the context of transit networks. Given
the timetable of a schedule-based transportation system (consisting, e.g., of trains, buses, etc.),
the problem seeks journeys optimizing some criteria. Specifically, it seeks to answer natural queries
such as, for example, “find a journey starting from a source stop and arriving at a target stop as early
as possible”. The fastest approach for answering these queries, yielding the smallest average query
time even on very large networks, is the Public Transit Labeling framework, proposed for the first
time in Delling et al., SEA 2015. This method combines three main ingredients: (i) a graph-based
representation of the schedule of the transit network; (ii) a labeling of such graph encoding its
transitive closure (computed via a time-consuming pre-processing); (iii) an efficient query algorithm
exploiting both (i) and (ii) to answer quickly to queries of interest at runtime. Unfortunately, while
transit networks’ timetables are inherently dynamic (they are often subject to delays or disruptions),
PTL is not natively designed to handle updates in the schedule—even after a single change,
precomputed data may become outdated and queries can return incorrect results. This is a major
limitation, especially when dealing with massively sized inputs (e.g., metropolitan or continental
sized networks), as recomputing the labeling from scratch, after each change, yields unsustainable
time overheads that are not compatible with interactive applications. In this work, we introduce a new
framework that extends PTL to function in delay-prone transit networks. In particular, we provide
a new set of algorithms able to update both the graph and the precomputed labeling whenever
a delay affects the network, without performing any recomputation from scratch. We demonstrate
the effectiveness of our solution through an extensive experimental evaluation conducted on
real-world networks. Our experiments show that: (i) the update time required by the new algorithms
is, on average, orders of magnitude smaller than that required by the recomputation from scratch via
PTL; (ii) the updated graph and labeling induce both query time performance and space overhead that
are equivalent to those that are obtained by the recomputation from scratch via PTL. This suggests that
our new solution is an effective approach to handling the journey planning problem in delay-prone
transit networks.
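As an illustration of ingredient (i) and of why a single delay can invalidate precomputed data, here is a small sketch of a time-expanded graph with an earliest-arrival search. It is a simplification under stated assumptions: events are `(stop, time)` pairs, transfers at a stop take zero time, a delay only shifts the arrival of one connection, and the query uses plain breadth-first reachability rather than the labeling of PTL. The function names and the three-connection timetable (times in minutes after midnight) are hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict, deque

def build_time_expanded(connections):
    """Build event nodes (stop, time) with waiting arcs and travel arcs.

    connections: list of (dep_stop, dep_time, arr_stop, arr_time).
    Returns (timeline, arcs): sorted event times per stop, and adjacency lists.
    """
    events = defaultdict(set)
    for dep_stop, dep_time, arr_stop, arr_time in connections:
        events[dep_stop].add(dep_time)
        events[arr_stop].add(arr_time)
    timeline = {stop: sorted(times) for stop, times in events.items()}
    arcs = defaultdict(list)
    for stop, times in timeline.items():
        for earlier, later in zip(times, times[1:]):
            arcs[(stop, earlier)].append((stop, later))   # waiting at a stop
    for dep_stop, dep_time, arr_stop, arr_time in connections:
        arcs[(dep_stop, dep_time)].append((arr_stop, arr_time))  # riding
    return timeline, arcs

def earliest_arrival(timeline, arcs, source, target, depart_at):
    """Earliest time at which target is reachable, or None if unreachable."""
    if source == target:
        return depart_at
    times = timeline.get(source, [])
    k = bisect_left(times, depart_at)      # first event not before depart_at
    if k == len(times):
        return None
    start = (source, times[k])
    seen = {start}
    queue = deque([start])
    best = None
    while queue:
        stop, time = queue.popleft()
        if stop == target and (best is None or time < best):
            best = time
        for nxt in arcs.get((stop, time), ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return best

connections = [("A", 480, "B", 510), ("B", 515, "C", 540), ("B", 545, "C", 570)]
on_time = earliest_arrival(*build_time_expanded(connections), "A", "C", 475)

# Delay the arrival of the first connection by 10 minutes and rebuild.
delayed = [("A", 480, "B", 520)] + connections[1:]
late = earliest_arrival(*build_time_expanded(delayed), "A", "C", 475)

print(on_time, late)  # 540 570
```

In this toy case the delay makes the 515 departure from B unreachable, so the answer moves from 540 to 570. Any structure precomputed on the original graph would still report 540, which is the kind of stale result the abstract refers to.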
... In this paper, we focus on the realistic scenario. To solve the mentioned variants, a great variety of models and techniques have been proposed in the literature [5,11,13,14,15]. We refer to the very recent survey of Bast et al. [2] for a comprehensive overview. ...
... The state-of-the-art method, achieving the smallest query times on large sets of real-world inputs, is Public Transit Labeling (ptl, for short), a preprocessing-based approach that has been experimentally shown to outperform all other solutions, achieving average query times on the order of milliseconds even in continental-sized networks [4,7,11,17]. This approach essentially consists of three main ingredients: (i) a well-known graph data structure for storing transit networks, i.e. the time-expanded graph; (ii) a compact labeling-based representation of the transitive closure of the said graph, computed via a (time-consuming) preprocessing step; (iii) an efficient query algorithm exploiting both the graph and the precomputed data to quickly answer queries of interest at runtime. ...
... In particular, even after a single change to the network, queries can return incorrect results since the preprocessed data can become easily outdated and do not reflect properly the transitive closure. Recomputing the labeling-based data from scratch, after an update occurs, is not a viable option as it yields unsustainable time overheads, up to tens of hours [11]. Since transit networks are inherently dynamic (delays can be very frequent), the above represents a major limitation of ptl. ...
We study the journey planning problem in transit networks which, given the timetable of a schedule-based transit system, asks to answer queries such as, e.g., “seek a journey that arrives at a given destination as early as possible”. The state-of-the-art solution to this problem, in terms of query time, is Public Transit Labeling (ptl), proposed in [Delling et al., SEA 2015], which consists of three main ingredients: (i) a graph data structure for storing transit networks; (ii) a compact labeling-based representation of the transitive closure of such graph, computed via a time-consuming preprocessing routine; (iii) an efficient query algorithm exploiting both graph and precomputed data to quickly answer queries of interest at runtime.
The major drawback of ptl is not being practical in dynamic scenarios, when the network’s timetable can undergo updates (e.g. delays). In fact, even after a single change, precomputed data become outdated and queries can return incorrect results. Recomputing the labeling-based representation from scratch, after a modification, is not a viable option as it yields unsustainable time overheads. Since transit networks are inherently dynamic, the above represents a major limitation of ptl.
In this paper, we overcome this limitation by introducing a dynamic algorithm, called D−PTL, able to update the preprocessed data whenever a delay affects the network, without recomputing it from scratch. We demonstrate the effectiveness of D−PTL through a rigorous experimental evaluation showing that its update times are orders of magnitude smaller than the time for recomputing the preprocessed data from scratch.
... Research related to this study includes studies on timetable-based public transit routing, studies on multiple routing, and studies on transfer penalties. First, typical recently developed timetable-based public transit routing algorithms include RAPTOR (Round-bAsed Public Transit Optimized Router), CSA (Connection Scan Algorithm), and Trip-Based routing (Delling et al., 2015; Dibbelt et al., 2013; Madkour et al., 2017). Delling et al. (2012) proposed the RAPTOR algorithm, in which they stored each route's vehicle arrival times at each stop, searched for vehicles passing by each stop, and computed the minimum arrival time and path to the final arrival stop. ...
Most of the existing public transit routing algorithms were developed on the basis of graph theory. Recently, algorithms are being developed that can compute for O-D public transit paths by using timetable information only, not using network structure consisting of nodes and links. The timetable-based public transit routing algorithm produces one shortest path to destination, using departure time and arrival time by stop. But it has limitations in reflecting additional factors, such as transfer penalty and alternative path selection, in the process of path calculation. In addition, since public transit passengers tend to choose one among various alternative paths, it is necessary to calculate multiple paths rather than a single path as in the existing methods. Therefore, this study proposes an improved RAPTOR algorithm that can consider transfer penalty and produce multiple paths, while it is based on RAPTOR, the existing timetable-based public transit routing algorithm. The transfer penalty was applied at the point of transfer, and differently according to transfer types. As a result of analyzing computed paths of the algorithms before and after improvement, it was found that computed paths with the improved RAPTOR algorithm proposed by this study were more similar to Seoul public transit passengers' actual travel paths than computed paths by the existing RAPTOR alone.
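The round-based idea behind timetable-only routing can be sketched as follows. This is a simplified illustration in the spirit of RAPTOR, not the improved algorithm proposed in the study: it scans every trip in every round instead of grouping trips into routes, assumes zero-time transfers at a stop, has no footpaths, and ignores transfer penalties and multiple-path output. The function name, the trip format `(stop, arrival, departure)` and the two toy trips are assumptions made for the example.

```python
from math import inf

def round_based_earliest_arrival(trips, source, target, depart_at, max_trips=5):
    """Round k computes earliest arrivals using at most k trips.

    trips: list of trips, each an ordered list of (stop, arrival, departure).
    Returns {k: arrival at target} for every round k that improves the arrival.
    """
    prev = {source: depart_at}          # best arrivals with at most k-1 trips
    improvements = {}
    for k in range(1, max_trips + 1):
        cur = dict(prev)
        for trip in trips:
            boarded = False
            for stop, arrival, departure in trip:
                # Riding the trip: improve arrival at stops after boarding.
                if boarded and arrival < cur.get(stop, inf):
                    cur[stop] = arrival
                # Board at the first stop reached in time in earlier rounds.
                if not boarded and prev.get(stop, inf) <= departure:
                    boarded = True
        if cur == prev:                 # no stop improved: later rounds are idle
            break
        if cur.get(target, inf) < prev.get(target, inf):
            improvements[k] = cur[target]
        prev = cur
    return improvements

# Two hypothetical trips: a slow direct one and a faster one from B.
trips = [
    [("A", 0, 0), ("B", 10, 11), ("C", 40, 40)],
    [("B", 15, 15), ("C", 25, 25)],
]

print(round_based_earliest_arrival(trips, "A", "C", 0))  # {1: 40, 2: 25}
```

The result lists one arrival time per number of trips, so a rider can trade a transfer for an earlier arrival. A transfer penalty of the kind the study describes would have to enter at the boarding step, where one trip hands over to the next.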
... Moreover, Delling et al. (2015) introduced the Public Transit Labelling (PTL) algorithm, a new preprocessing-based algorithm for journey planning in public transit networks, by revisiting the time-expanded model and adapting Hub Labeling. In fact, they adapt 2-hop labeling to public transit networks, improving query performance by orders of magnitude over previous methods, while keeping preprocessing time practical. ...
In the last decades, products and services concerning the transportation of individuals have made the world more interconnected than ever before. Although this fact has enabled people to perform travel-related activities more effectively, neglecting the ecological footprint of those transports has created environmental problems and, as a consequence, problems for our societies. This study introduces a novel approach for the multi-modal journey planning problem (MMJP) and specifically proposes a hybrid solution algorithm that solves the problem of Environmental MMJP, based both on heuristic and exact algorithms. The algorithm delivers as solutions multi-modal paths that a person can follow and that produce the minimum Greenhouse Gas (GHG) emissions from the different modes of transport that he or she will use while travelling. Given a set of public transport operation schedules, emission calculation models and public network data, a mixed-integer linear programming (MILP) model was developed for the problem, which is solved in combination with Dijkstra’s algorithm in order to deliver the optimal journey. The research is still ongoing for the improvement of the algorithm and the goal is to integrate it in an online platform.
... In RAPTOR the timetable is stored as a set of arrays of trips and routes which are used by a dynamic programming algorithm to solve the bi-criteria problem. Recently, some faster approaches in terms of query time with respect to the two aforementioned methods, have been proposed in [18] and in [37]. The graph-based models, instead, store the timetable as a suitable graph and execute known adaptations of Dijkstra's shortest path algorithm to compute optimal routes. ...
... Regarding array-based models, CSA and RAPTOR are suitable to handle dynamic changes of the timetable since they are not based on preprocessing, although there is no experimental evidence of this feature in the literature. Concerning the fast approaches in terms of query time given in [18] and in [37], the first is clearly not applicable to dynamic scenarios due to its huge preprocessing time, while the second looks more suitable, even though it needs around 6 minutes of (single-threaded) preprocessing on a snapshot that models the public transportation system of London (with around 5 million connections) in a fixed time interval. ...
In the last years we have witnessed remarkable progress in providing efficient algorithmic solutions to the problem of computing best journeys (or routes) in schedule-based public transportation systems. We have now models to represent timetables that allow us to answer queries for optimal journeys in a few milliseconds, also at a very large scale. Such models can be classified into two types: those representing the timetable as an array, and those representing it as a graph. Array-based models have been shown to be very effective in terms of query time, while graph-based ones usually answer queries by computing shortest paths, and hence they are suitable to be combined with the speed-up techniques developed for road networks.
In this paper, we study the behavior of graph-based models in the prominent case of dynamic scenarios, i.e., when delays might occur to the original timetable. In particular, we make the following contributions. First, we consider the graph-based reduced time-expanded model and give a simplified and optimized routine for handling delays, and a re-engineered and fine-tuned query algorithm. Second, we propose a new graph-based model, namely the dynamic timetable model, natively tailored to efficiently incorporate dynamic updates, along with a query algorithm and a routine for handling delays. Third, we show how to adapt the ALT algorithm to such graph-based models. We have chosen this speed-up technique since it supports dynamic changes, and a careful implementation of it can significantly boost its performance. Finally, we provide an experimental study to assess the effectiveness of all proposed models and algorithms, and to compare them with the array-based state of the art solution for the dynamic case. We evaluate both new and existing approaches by implementing and testing them on real-world timetables subject to synthetic delays.
Our experimental results show that: (i) the dynamic timetable model is the best model for handling delays; (ii) graph-based models are competitive to array-based models with respect to query time in the dynamic case; (iii) the dynamic timetable model compares favorably with both the original and the reduced time-expanded model regarding space; (iv) combining the graph-based models with speed-up techniques designed for road networks, such as ALT, is a very promising approach.
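For readers unfamiliar with ALT, the sketch below shows the general technique on a plain static graph: A* search guided by lower bounds derived from precomputed landmark distances and the triangle inequality. It is not the adaptation to timetable models developed in the paper. It assumes an undirected, connected graph with non-negative weights, so that `|d(L, t) - d(L, v)|` is a valid lower bound on `d(v, t)` for every landmark `L`; the helper names and the toy graph are hypothetical.

```python
import heapq
from math import inf

def undirected(edges):
    """Adjacency lists for an undirected weighted graph."""
    graph = {}
    for u, v, w in edges:
        graph.setdefault(u, []).append((v, w))
        graph.setdefault(v, []).append((u, w))
    return graph

def dijkstra(graph, source):
    """Plain Dijkstra; used here to precompute landmark distances."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, inf):
            continue
        for v, w in graph.get(u, ()):
            if d + w < dist.get(v, inf):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist

def alt_query(graph, landmark_dists, s, t):
    """A* from s to t with the landmark lower bound as heuristic."""
    def lower_bound(v):
        return max((abs(d[t] - d[v]) for d in landmark_dists), default=0)

    dist = {s: 0}
    heap = [(lower_bound(s), s)]
    settled = set()
    while heap:
        _, u = heapq.heappop(heap)
        if u == t:
            return dist[u]
        if u in settled:
            continue
        settled.add(u)
        for v, w in graph.get(u, ()):
            if dist[u] + w < dist.get(v, inf):
                dist[v] = dist[u] + w
                heapq.heappush(heap, (dist[v] + lower_bound(v), v))
    return inf

graph = undirected([("A", "B", 2), ("B", "C", 2), ("A", "D", 1),
                    ("D", "C", 5), ("C", "E", 1)])
landmark_dists = [dijkstra(graph, landmark) for landmark in ("A", "E")]

print(alt_query(graph, landmark_dists, "A", "E"))  # 5
```

The landmark distances form the only preprocessed data, and lower bounds stay valid when arc weights only increase. This is the usual argument for ALT tolerating delays better than heavier preprocessing; how the paper handles its specific models is not shown here.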
... These labels are then used to answer vertex-to-vertex shortest-path queries very quickly. This technique has been adapted successfully to road networks [2,3,5,19], undirected, unweighted graphs in [6,17,33] and schedule-based, public-transportation networks in [15,22,48]. The HL method has also been applied to one-to-many, many-to-many and k-Nearest Neighbors (kNN) queries on road networks [18,20] and Reverse k-Nearest Neighbor (RkNN) queries in the context of social networks in [26]. ...
... The HL method has also been used for one-to-many, many-to-many and kNN queries on road networks in [18] and [20] respectively. Recently the HL technique has also been extended to schedule-based, public-transportation networks in [15,48]. ...
Shortest-path computation on graphs is one of the most well-studied problems in algorithmic theory. An aspect that has only recently attracted attention is the use of databases in combination with graph algorithms, so-called distance oracles, to compute shortest-path queries on large graphs. To this purpose, we propose a novel, efficient, pure-SQL framework for answering exact distance queries on large-scale graphs, implemented entirely on an open-source database engine. Our COLD framework (COmpressed Labels on the Database) can answer multiple distance queries (vertex-to-vertex, one-to-many, k-Nearest Neighbors, Reverse k-Nearest Neighbors, Reverse k-Farthest Neighbors and Top-k Range) not handled by previous methods, rendering it a complete database solution for a variety of practical large-scale graph applications. Our experimentation shows that COLD outperforms existing approaches (including popular graph databases) in terms of query time and efficiency, while requiring significantly less storage space than these methods.