Conference Paper

Large-Scale Order Dispatch in On-Demand Ride-Hailing Platforms: A Learning and Planning Approach


Abstract

We present a novel order dispatch algorithm for large-scale on-demand ride-hailing platforms. While traditional order dispatch approaches usually focus on immediate customer satisfaction, the proposed algorithm is designed to optimize resource utilization and user experience from a global and more farsighted view. In particular, we model order dispatch as a large-scale sequential decision-making problem, where the decision of assigning an order to a driver is determined by a centralized algorithm in a coordinated way. The problem is solved in a learning and planning manner: 1) based on historical data, we first summarize demand and supply patterns into a spatiotemporal quantization, each cell of which indicates the expected value of a driver being in a particular state; 2) a planning step is conducted in real time, where each driver-order pair is valued in consideration of both immediate rewards and future gains, and dispatch is then solved using a combinatorial optimization algorithm. Through extensive offline experiments and online A/B tests, the proposed approach delivers remarkable improvements in the platform's efficiency and has been successfully deployed in the production system of Didi Chuxing.
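The learning-and-planning scheme described in the abstract can be sketched as a toy example. Everything concrete below is invented for illustration (the cell values `V`, the discount factor, the fares, and all helper names are assumptions, not taken from the paper): each driver-order pair is scored by its immediate reward plus the discounted value of the destination cell, minus the value of the driver's current cell, and the assignment is then solved exactly by enumeration. A production system would use a proper combinatorial solver such as the Hungarian method instead.

```python
from itertools import permutations

# Hypothetical learned value table: expected future value of a driver
# located in each spatiotemporal cell (the "learning" step, done offline).
V = {"A": 2.0, "B": 5.0, "C": 1.0}

GAMMA = 0.9  # assumed discount factor


def pair_score(driver_cell, order):
    """Value of assigning a driver to an order: immediate fare plus the
    discounted value of the drop-off cell, minus the value of staying put."""
    fare, dest_cell = order
    return fare + GAMMA * V[dest_cell] - V[driver_cell]


def dispatch(drivers, orders):
    """The "planning" step: pick the order-per-driver assignment with the
    highest total score. Exhaustive enumeration is fine for toy instances;
    real systems use a combinatorial optimization algorithm."""
    best, best_total = None, float("-inf")
    for perm in permutations(range(len(orders)), len(drivers)):
        total = sum(pair_score(d, orders[j]) for d, j in zip(drivers, perm))
        if total > best_total:
            best_total, best = total, list(zip(drivers, perm))
    return best, best_total


drivers = ["A", "C"]                           # current cells of two idle drivers
orders = [(4.0, "B"), (3.0, "A"), (6.0, "C")]  # (fare, destination cell)
assignment, total = dispatch(drivers, orders)
```

Note that the farsighted term `GAMMA * V[dest_cell]` is what distinguishes this from a purely myopic matcher: an order with a lower fare but a destination in a high-value cell can outrank a higher-fare order into a dead zone.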


... On-demand ride-hailing services like DiDi, Uber, and Lyft have significantly transformed urban transportation systems and human mobility patterns [38]. These platforms have introduced a paradigm shift in how people access transportation, offering unparalleled convenience, efficiency, and flexibility [11,25]. ...
... The ride-hailing service at DiDi continuously matches passenger requests with available drivers [38]. The service can be structured into three key stages: pre-ride, in-ride, and post-ride [29]. ...
Preprint
Full-text available
On-demand ride-hailing services like DiDi, Uber, and Lyft have transformed urban transportation, offering unmatched convenience and flexibility. In this paper, we introduce DiMA, an LLM-powered ride-hailing assistant deployed in DiDi Chuxing. Its goal is to provide seamless ride-hailing services and beyond through a natural and efficient conversational interface under dynamic and complex spatiotemporal urban contexts. To achieve this, we propose a spatiotemporal-aware order planning module that leverages external tools for precise spatiotemporal reasoning and progressive order planning. Additionally, we develop a cost-effective dialogue system that integrates multi-type dialog repliers with cost-aware LLM configurations to handle diverse conversation goals and trade off response quality against latency. Furthermore, we introduce a continual fine-tuning scheme that utilizes real-world interactions and simulated dialogues to align the assistant's behavior with human-preferred decision-making processes. Since its deployment in the DiDi application, DiMA has demonstrated exceptional performance, achieving 93% accuracy in order planning and 92% in response generation during real-world interactions. Offline experiments further validate DiMA's capabilities, showing improvements of up to 70.23% in order planning and 321.27% in response generation compared to three state-of-the-art agent frameworks, while reducing latency by 0.72× to 5.47×. These results establish DiMA as an effective, efficient, and intelligent mobile assistant for ride-hailing services.
... While recent years have seen researchers explore learning-based methods [7,24,33,41] with promising results in solving VRP, these methods are often restricted to fixed-size state inputs and may not be applicable to VRPs with specific constraints. ...
... The research in [7] formulated the courier dispatching problem as a Markov Decision Process (MDP) and proposed a data-driven approach to derive optimal dispatching rules under different scenarios. An order dispatch algorithm [41] was developed for large-scale on-demand ride-hailing platforms, combining learning and planning techniques to optimize resource utilization and enhance user experience from a global and farsighted perspective. Additionally, a multi-agent reinforcement learning method using mean field approximation was designed to simplify local interactions, leading to improved cumulative driver income and order response rate [24]. ...
Article
Full-text available
With the increasing prevalence of Online-to-Offline (O2O) commerce, online food ordering platforms are handling tens of millions of daily food orders. For O2O platforms, an efficient food delivery strategy is crucial as it directly impacts customer satisfaction and, subsequently, the platform’s competitiveness. To address the O2O food delivery problem’s (OFDP) unique features, including multi-depot with capacity, pickup-delivery, and time-window constraints, we have developed a heuristic algorithm called the Niche-based Memetic Algorithm with Adaptive Parameters (NMAAP). The NMAAP incorporates niche differentiation to increase population diversity and adaptive adjustment of crossover and mutation rates to balance exploration and exploitation, ultimately overcoming the issue of being trapped in local optima. To evaluate the effectiveness of the NMAAP, we conducted static and dynamic experiments using real order data obtained from Ele.me. The results of the experiments were promising and indicated that the NMAAP outperformed the baseline methods, resulting in reduced average and maximum waiting times for customers, in both static and dynamic settings. These findings emphasize the ability of the NMAAP to improve overall performance and enhance customer satisfaction in the O2O food delivery industry.
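The NMAAP abstract above credits "adaptive adjustment of crossover and mutation rates" with balancing exploration and exploitation, but does not give the formulas. The sketch below is an illustrative diversity-driven rule in the same spirit (the thresholds, scaling factors, and function name are all invented, not the NMAAP mechanism): when population diversity collapses, mutation is raised to escape local optima; when diversity is ample, crossover is favored to exploit good building blocks.

```python
def adaptive_rates(diversity, base_cx=0.8, base_mut=0.05, low=0.2, high=0.6):
    """Illustrative adaptive-parameter rule (not the NMAAP formulas).

    diversity: a normalized population-diversity measure in [0, 1].
    Returns a (crossover_rate, mutation_rate) pair for the next generation.
    """
    if diversity < low:
        # Population is converging: damp crossover, boost mutation (explore).
        return base_cx * 0.8, min(0.5, base_mut * 4)
    if diversity > high:
        # Plenty of diversity: favor crossover, damp mutation (exploit).
        return min(1.0, base_cx * 1.1), base_mut * 0.5
    return base_cx, base_mut  # comfortable middle ground: keep defaults
```

A memetic algorithm would call this once per generation, feeding in something like average pairwise genotype distance as the diversity measure.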
... There are primarily two research scopes on this topic. The macro-view scope [14,23,24,28] focuses on long-term (several hours to a day) and city-level efficiency optimization. The other scope attends to the optimization under localized spatiotemporal scenarios with high stochasticity, i.e., micro-view order-dispatching (MICOD). ...
... With the prevalence of RL, abundant research has formulated large-scale ride-hailing problems in an MDP setting and attempted to solve them in a value-based way. It is intuitive to model each driver as an agent [6,22,27], such that the scalability in action space can be easily handled, usually by learning a tabular or state value function [4,25,28]. ...
Preprint
Assigning orders to drivers under localized spatiotemporal context (micro-view order-dispatching) is a major task in Didi, as it influences ride-hailing service experience. Existing industrial solutions mainly follow a two-stage pattern that incorporate heuristic or learning-based algorithms with naive combinatorial methods, tackling the uncertainty of both sides' behaviors, including emerging timings, spatial relationships, and travel duration, etc. In this paper, we propose a one-stage end-to-end reinforcement learning based order-dispatching approach that solves behavior prediction and combinatorial optimization uniformly in a sequential decision-making manner. Specifically, we employ a two-layer Markov Decision Process framework to model this problem, and present \underline{D}eep \underline{D}ouble \underline{S}calable \underline{N}etwork (D2SN), an encoder-decoder structure network to generate order-driver assignments directly and stop assignments accordingly. Besides, by leveraging contextual dynamics, our approach can adapt to the behavioral patterns for better performance. Extensive experiments on Didi's real-world benchmarks justify that the proposed approach significantly outperforms competitive baselines in optimizing matching efficiency and user experience tasks. In addition, we evaluate the deployment outline and discuss the gains and experiences obtained during the deployment tests from the view of large-scale engineering implementation.
... Relevant studies have introduced deep reinforcement learning (DRL) algorithms to mitigate the challenges associated with high-dimensional state/action spaces. Although there are no direct studies about dynamic subsidy optimization, studies in the ride-hailing domain, such as order dispatching [19,20] and vehicle repositioning [21,22], provide valuable insights for our study. Compared with myopic optimization methods [20,23] and heuristic algorithms [24,25], DRL algorithms have superior performance in efficiency and accuracy. However, these studies neglected the impact of the non-stationarity of traffic demand in the ride-hailing market. ...
Article
Full-text available
The ride-hailing market often experiences significant fluctuations in traffic demand, resulting in supply-demand imbalances. In this regard, the dynamic subsidy strategy is frequently employed by ride-hailing platforms to incentivize drivers to relocate to zones with high demand. However, determining the appropriate amount of subsidy at the appropriate time remains challenging. First, traffic demand exhibits high non-stationarity, characterized by multi-context patterns with time-varying statistical features. Second, high-dimensional state/action spaces contain multiple spatiotemporal dimensions and context patterns. Third, decision-making should satisfy real-time requirements. To address the above challenges, we first construct a Non-Stationary Markov Decision Process (NSMDP) based on the assumption of ride-hailing service systems dynamics. Then, we develop a solution framework for the NSMDP. A change point detection method based on feature-enhanced LSTM within the framework can identify the changepoints and time-varying context patterns of stochastic demand. Moreover, the framework also includes a deterministic policy deep reinforcement learning algorithm to optimize. Finally, through simulated experiments with real-world historical data, we demonstrate the effectiveness of the proposed approach. It performs well in improving the platform’s profits and alleviating supply-demand imbalances under the dynamic subsidy strategy. The results also prove that a scientific dynamic subsidy strategy is particularly effective in the high-demand context pattern with more drastic fluctuations. Additionally, the profitability of dynamic subsidy strategy will increase with the increase of the non-stationary level.
... Trip 1: estimated to arrive in [8:17, 8:22] with a confidence level of 90%; Trip 2: estimated to arrive in [8:40, 8:54] with a confidence level of 90% ... vehicle routing (Xu et al. 2018; Fu et al. 2020). Existing studies for OD TTE (Yuan et al. 2020a; Lin et al. 2023; Wang et al. 2023) primarily focus on point estimates of travel time and have not adequately addressed uncertainty quantification. ...
Article
Uncertainty quantification in travel time estimation (TTE) aims to estimate the confidence interval for travel time, given the origin (O), destination (D), and departure time (T). Accurately quantifying this uncertainty requires generating the most likely path and assessing travel time uncertainty along the path. This involves two main challenges: 1) Predicting a path that aligns with the ground truth, and 2) modeling the impact of travel time in each segment on overall uncertainty under varying conditions. We propose DutyTTE to address these challenges. For the first challenge, we introduce a deep reinforcement learning method to improve alignment between the predicted path and the ground truth, providing more accurate travel time information from road segments to improve TTE. For the second challenge, we propose a mixture of experts guided uncertainty quantification mechanism to better capture travel time uncertainty for each segment under varying contexts. Extensive experiments on two real-world datasets demonstrate the superiority of our proposed method.
... The existing literature manages customer demand mainly through acceptance decisions, assortment optimization, or dynamic pricing. Upon receiving a request or a batch of requests, the service provider can decide which customers to accept for service according to the service provider's capacity, profitability, and the resource reservation for the future [Hosni et al., 2014, Xu et al., 2018, Ulmer et al., 2018a, Holler et al., 2019, Kullman et al., 2022, Giallombardo et al., 2022. Note that, although using acceptance decisions in their problems, these papers optimize in a single-day context. ...
Preprint
The last few years have witnessed rapid growth in the on-demand delivery market, with many start-ups entering the field. However, not all of these start-ups have succeeded due to various reasons, among others, not being able to establish a large enough customer base. In this paper, we address this problem that many on-demand transportation start-ups face: how to establish themselves in a new market. When starting, such companies often have limited fleet resources to serve demand across a city. Depending on the use of the fleet, varying service quality is observed in different areas of the city, and in turn, the service quality impacts the respective growth of demand in each area. Thus, operational fulfillment decisions drive the longer-term demand development. To integrate strategic demand development into real-time fulfillment operations, we propose a two-step approach. First, we derive analytical insights into optimal allocation decisions for a stylized problem. Second, we use these insights to shape the training data of a reinforcement learning strategy for operational real-time fulfillment. Our experiments demonstrate that combining operational efficiency with long-term strategic planning is highly advantageous. Further, we show that the careful shaping of training data is essential for the successful development of demand.
... 1. FlexiPool consistently outperforms both NeurADP and HIVES across all settings. Our average improvement over NeurADP and HIVES is 13% and 6% respectively, which is considered a significant improvement in cityscale taxi ride-pooling problems (Xu et al. 2018). ...
Preprint
Full-text available
The Ride-Pool Matching Problem (RMP) is central to on-demand ride-pooling services, where vehicles must be matched with multiple requests while adhering to service constraints such as pickup delays, detour limits, and vehicle capacity. Most existing RMP solutions assume passengers are picked up and dropped off at their original locations, neglecting the potential for passengers to walk to nearby spots to meet vehicles. This assumption restricts the optimization potential in ride-pooling operations. In this paper, we propose a novel matching method that incorporates extended pickup and drop-off areas for passengers. We first design a tree-based approach to efficiently generate feasible matches between passengers and vehicles. Next, we optimize vehicle routes to cover all designated pickup and drop-off locations while minimizing total travel distance. Finally, we employ dynamic assignment strategies to achieve optimal matching outcomes. Experiments on city-scale taxi datasets demonstrate that our method improves the number of served requests by up to 13% and reduces average travel distance by up to 21% compared to leading existing solutions, underscoring the potential of leveraging passenger mobility to significantly enhance ride-pooling service efficiency.
... Another extensively studied method to reduce the dimensionality of the policy space is to use a decentralized reinforcement learning (RL) training approach [63,64,65,66,67,68,69,70,71,72,73,74,75,24,76,77,78,79,80,81,67,82,83,84,85,86,87,88,89,25,29,31,33,37]. In this approach, vehicles are treated as homogeneous and uncooperative agents that independently choose their own optimal actions based on a shared Q-function. ...
Preprint
Full-text available
Pioneering companies such as Waymo have deployed robo-taxi services in several U.S. cities. These robo-taxis are electric vehicles, and their operations require the joint optimization of ride matching, vehicle repositioning, and charging scheduling in a stochastic environment. We model the operations of the ride-hailing system with robo-taxis as a discrete-time, average reward Markov Decision Process with infinite horizon. As the fleet size grows, dispatching becomes challenging, as the set of system states and the set of fleet dispatching actions grow exponentially with the number of vehicles. To address this, we introduce a scalable deep reinforcement learning algorithm, called Atomic Proximal Policy Optimization (Atomic-PPO), that reduces the action space using atomic action decomposition. We evaluate our algorithm using real-world NYC for-hire vehicle data, and we measure performance by the long-run average reward achieved by the dispatching policy relative to a fluid-based reward upper bound. Our experiments demonstrate the superior performance of Atomic-PPO compared to benchmarks. Furthermore, we conduct extensive numerical experiments to analyze the efficient allocation of charging facilities and assess the impact of vehicle range and charger speed on fleet performance.
... In terms of general vehicle dispatching and ride-matching, much work has been conducted using multiple objectives, such as minimizing passenger wait times or maximizing the number of rides matched, the system's profit and vehicle miles traveled (see, Molenbruch et al., 2017;Xu et al., 2018;Daoud et al., 2020;Kucharski and Cats, 2020). There is a gap in the literature of studies focusing on sustainability-based objectives in solving ride-matching problems. ...
... Although these approaches demonstrate improved performance, they rely on the standard DDIM paradigm [42], which necessitates simulating a Markov chain with a substantial number of steps for generation, significantly increasing time overhead. This limitation deprives them of the speed advantage of NAR approaches and hinders their practicality in time-sensitive real-world applications [57]. In contrast, DEITSP directly maps noise to the optimal solution by enhancing self-consistency [43] and develops a specialized iteration process tailored for TSP, which contributes to improved efficiency in both inference speed and solution quality. ...
Preprint
Full-text available
Recent advances in neural models have shown considerable promise in solving Traveling Salesman Problems (TSPs) without relying on much hand-crafted engineering. However, while non-autoregressive (NAR) approaches benefit from faster inference through parallelism, they typically deliver solutions of inferior quality compared to autoregressive ones. To enhance the solution quality while maintaining fast inference, we propose DEITSP, a diffusion model with efficient iterations tailored for TSP that operates in a NAR manner. Firstly, we introduce a one-step diffusion model that integrates the controlled discrete noise addition process with self-consistency enhancement, enabling optimal solution prediction through simultaneous denoising of multiple solutions. Secondly, we design a dual-modality graph transformer to bolster the extraction and fusion of features from node and edge modalities, while further accelerating the inference with fewer layers. Thirdly, we develop an efficient iterative strategy that alternates between adding and removing noise to improve exploration compared to previous diffusion methods. Additionally, we devise a scheduling framework to progressively refine the solution space by adjusting noise levels, facilitating a smooth search for optimal solutions. Extensive experiments on real-world and large-scale TSP instances demonstrate that DEITSP performs favorably against existing neural approaches in terms of solution quality, inference latency, and generalization ability. Our code is available at https://github.com/DEITSP/DEITSP.
... Order Dispatch and Markov Matching Markets. Order dispatch systems [Xu et al., 2018] apply policy iteration and Q-learning to optimize driver-task assignments, handling stochastic action spaces [Cohen et al., 2022]. Markov matching markets [Min et al., 2022] introduce stochastic agent sets and planner actions that affect environmental transitions, integrating RL to optimize matching policies. ...
Preprint
This paper presents a hierarchical reinforcement learning (RL) approach to address the agent grouping or pairing problem in cooperative multi-agent systems. The goal is to simultaneously learn the optimal grouping and agent policy. By employing a hierarchical RL framework, we distinguish between high-level decisions of grouping and low-level agents' actions. Our approach utilizes the CTDE (Centralized Training with Decentralized Execution) paradigm, ensuring efficient learning and scalable execution. We incorporate permutation-invariant neural networks to handle the homogeneity and cooperation among agents, enabling effective coordination. The option-critic algorithm is adapted to manage the hierarchical decision-making process, allowing for dynamic and optimal policy adjustments.
... In this model, although the system is multi-agent from a global perspective, only a single agent is considered in the training phase. The most commonly used state elements include current vehicle location, passenger location, and order details (Al-Abbasi et al., 2019;Holler et al., 2019;Tang et al., 2019;Wang et al., 2018;Xu et al., 2018). These order details include, in addition to existing orders, forecasted demand information derived from forecasting models. ...
Article
Full-text available
The challenge of spatial resource allocation is pervasive across various domains such as transportation, industry, and daily life. As the scale of real-world issues continues to expand and demands for real-time solutions increase, traditional algorithms face significant computational pressures, struggling to achieve optimal efficiency and real-time capabilities. In recent years, with the escalating computational power of computers, the remarkable achievements of reinforcement learning in domains like Go and robotics have demonstrated its robust learning and sequential decision-making capabilities. Given these advancements, there has been a surge in novel methods employing reinforcement learning to tackle spatial resource allocation problems. These methods exhibit advantages such as rapid solution convergence and strong model generalization abilities, offering a new perspective on resolving spatial resource allocation problems. Despite the progress, reinforcement learning still faces hurdles when it comes to spatial resource allocation. There remains a gap in its ability to fully grasp the diversity and intricacy of real-world resources. The environmental models used in reinforcement learning may not always capture the spatial dynamics accurately. Moreover, in situations laden with strict and numerous constraints, reinforcement learning can sometimes fall short in offering feasible strategies. Consequently, this paper is dedicated to summarizing and reviewing current theoretical approaches and practical research that utilize reinforcement learning to address issues pertaining to spatial resource allocation. In addition, the paper accentuates several unresolved challenges that urgently necessitate future focus and exploration within this realm and proposes viable approaches for these challenges. This research furnishes valuable insights that may assist scholars in gaining a more nuanced understanding of the problems, opportunities, and potential directions concerning the application of reinforcement learning in spatial resource allocation.
... In ride-hailing, utility, privacy, and fairness of the order assignment are also widely researched. Xu et al. [29] design an algorithm for optimizing resource utilization and user experience, and Shi et al. [30] focus on improving utility and fairness. Wang et al. [31] propose a federated-learningbased framework for cross-platform ride-hailing platforms to achieve high efficiency while protecting privacy. ...
Article
Full-text available
Efficient order assignment in last-mile delivery benefits customers, couriers, and the platform. State-of-the-practice order assignment is based on the static delivery area partition, which cannot adapt well to the dynamic order quantity and destination distributions on different days. State-of-the-art methods focus on balancing order amounts or the payoff among couriers dynamically, neglecting the courier's workload in delivering orders. This paper explores the courier's heterogeneous behaviors for delivering orders to different destinations to measure the courier's workload and then achieve more efficient order assignments under the fair workload constraint. We design a workload-constrained order assignment system, called WORD, to reduce the cost of last-mile delivery, i.e., the couriers' total travel distance and overdue order rate. Specifically, the heterogeneous behaviors for delivering orders are first utilized for workload calculation. Then a two-stage order assignment framework is designed, including a sort-based initialization algorithm for initializing the assignment under the fair workload constraint and a coalition-game-based improvement algorithm for improving the assignment. Extensive evaluation results with real-world logistics data from one of the largest logistics companies in China show that WORD reduces the cost of the order assignment by up to 51.9% under the fair workload constraint compared to the state-of-the-art methods.
... Nevertheless, it is important to understand that the configuration of the RL model is a crucial step to achieve such good results. For example, Xu et al. (2018) approached large-scale order dispatching in on-demand ride-hailing and, despite the complexity of a scenario with multiple states and actions that could be selected, simplified the action and state sets in order to increase performance and efficiency. Similarly, Kuhnle et al. (2021) approached a large-scale industry scenario and, in order to develop a feasible solution, modeled it as a sequential decision-making problem. ...
... For example, ride-hailing services can benefit from providing customers with lower and upper confidence bounds of travel time, allowing them to better plan their schedules (Liu et al. 2023). Moreover, understanding travel time uncertainty can help ride-hailing and logistics platforms improve decisionmaking effectiveness, such as in order dispatching and vehicle routing (Xu et al. 2018;Liu et al. 2023;Fu et al. 2020). Existing studies for OD TTE (Yuan et al. 2020a;Lin et al. 2023;Wang et al. 2023) primarily focus on point estimates of travel time and do not address uncertainty quantification. ...
Preprint
Full-text available
Uncertainty quantification in travel time estimation (TTE) aims to estimate the confidence interval for travel time, given the origin (O), destination (D), and departure time (T). Accurately quantifying this uncertainty requires generating the most likely path and assessing travel time uncertainty along the path. This involves two main challenges: 1) Predicting a path that aligns with the ground truth, and 2) modeling the impact of travel time in each segment on overall uncertainty under varying conditions. We propose DutyTTE to address these challenges. For the first challenge, we introduce a deep reinforcement learning method to improve alignment between the predicted path and the ground truth, providing more accurate travel time information from road segments to improve TTE. For the second challenge, we propose a mixture of experts guided uncertainty quantification mechanism to better capture travel time uncertainty for each segment under varying contexts. Additionally, we calibrate our results using Hoeffding's upper-confidence bound to provide statistical guarantees for the estimated confidence intervals. Extensive experiments on two real-world datasets demonstrate the superiority of our proposed method.
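The Hoeffding calibration mentioned in the abstract above can be illustrated with the standard one-sided Hoeffding bound. This is the generic inequality, not DutyTTE's actual calibration procedure, and the sample count, value range, and confidence level below are made-up numbers: for n i.i.d. samples bounded in an interval of width R, the empirical mean is within R·sqrt(ln(1/δ)/(2n)) of the true mean with probability at least 1 − δ.

```python
import math


def hoeffding_halfwidth(n, value_range, delta=0.1):
    """One-sided Hoeffding bound: with probability >= 1 - delta, the
    empirical mean of n i.i.d. samples confined to an interval of width
    `value_range` lies within this half-width of the true mean."""
    return value_range * math.sqrt(math.log(1.0 / delta) / (2.0 * n))


# Hypothetical example: 200 observed travel times for a segment,
# all between 60 s and 180 s (range 120 s), at 90% confidence.
eps = hoeffding_halfwidth(n=200, value_range=120.0, delta=0.1)
# eps ~ 9.1 s: the interval [mean - eps, mean + eps] carries the guarantee.
```

Because the half-width shrinks as 1/sqrt(n), quadrupling the sample count halves the guaranteed interval width, which is why such bounds tighten quickly on busy road segments but stay wide on rarely traveled ones.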
... Furthermore, the authors in [9] slightly improved the computational efficiency of the method in [6]. Also, a large-scale on-demand ridesharing framework is proposed in [26] for vehicle dispatching. In [27], the authors assessed and quantified the benefits of ridesharing and demonstrated that smaller sized vehicles could be preferred to larger sized vehicles for ridesharing. ...
Article
On-demand ridesharing is a promising avenue to transform urban mobility by providing effective, low-cost transportation services to passengers in real-time while increasing transit companies' profits. However, existing works suffer from the lack of balance between the system profit and the response time. Therefore, this paper proposes a computationally efficient, dynamic vehicle dispatch framework while optimizing system profit. We implemented a one-to-one assignment method coupled with a feasible schedule pruning strategy to enhance the proposed method's performance. Additionally, a profit optimization mechanism that dynamically estimates the fare per passenger based on travel time is presented. This fare estimation strategy is more realistic since it incorporates varying traffic conditions. Extensive experiments conducted on the New York City taxicab open-source data demonstrate that the proposed framework is up to ten times faster than the state-of-the-art method and achieves comparable profit. Moreover, the proposed approach proved scalable and efficient for large problem instances.
Article
A modern service model known as the “hub-oriented” model has emerged with the development of mobility services. This model allows users to request vehicles from multiple companies (agents) simultaneously through a unified entry (a ‘hub’). In contrast to conventional services, the “hub-oriented” model emphasizes pricing competition. To address this scenario, an agent should consider its competitors when developing its pricing strategy. In this paper, we introduce DRLPG, a mixed opponent-aware pricing method, which consists of two main components, the two-stage guarantor and the end-to-end deep reinforcement learning (DRL) module, as well as interaction mechanisms. In the guarantor, we design a prediction-decision framework. Specifically, we propose a new objective function for the spatiotemporal neural network in the prediction stage and utilize a traditional reinforcement learning method in the decision stage, respectively. In the end-to-end DRL framework, we explore the adoption of conventional DRL in the “hub-oriented” scenario. Finally, a meta-decider and an experience-sharing mechanism are proposed to combine both methods and leverage their advantages. We conduct extensive experiments on real data, and DRLPG achieves an average improvement of 99.9% and 61.1% in the peak and off-peak periods, respectively. Our results demonstrate the effectiveness of our approach compared to the baseline.
Article
Recent technology development brings the boom of numerous new Demand-Driven Services (DDS) into urban lives, including ridesharing, on-demand delivery, express systems and warehousing. In DDS, a service loop is an elemental structure, including its service worker, the service providers and corresponding service targets. The service workers should transport either people or parcels from the providers to the target locations. Various planning tasks within DDS can thus be classified into two individual stages: 1) Dispatching, which is to form service loops from demand/supply distributions, and 2) Routing, which is to decide specific serving orders within the constructed loops. Generating high-quality strategies in both stages is important to develop DDS but faces several challenges. Meanwhile, deep reinforcement learning (DRL) has been developed rapidly in recent years. It is a powerful tool to solve these problems since DRL can learn a parametric model without relying on too many problem-based assumptions and optimize long-term effects by learning sequential decisions. In this survey, we first define DDS, then highlight common applications and important decision/control problems within. For each problem, we comprehensively introduce the existing DRL solutions. We also introduce open simulation environments for development and evaluation of DDS applications. Finally, we analyze remaining challenges and discuss further research opportunities in DRL solutions for DDS.
Article
Developing smart cities is vital for ensuring sustainable development and improving human well-being. One critical aspect of building smart cities is designing intelligent methods to address various decision-making problems that arise in urban areas. As machine learning techniques continue to advance rapidly, a growing body of research has been focused on utilizing these methods to achieve intelligent urban decision making. In this survey, we conduct a systematic literature review on the application of machine learning methods in urban decision making, with a focus on planning, transportation, and healthcare. First, we provide a taxonomy based on typical applications of machine learning methods for urban decision making. We then present background knowledge on these tasks and the machine learning techniques that have been adopted to solve them. Next, we examine the challenges and advantages of applying machine learning in urban decision making, including issues related to urban complexity, urban heterogeneity and computational cost. Then, as the core of this survey, we elaborate on the existing machine learning methods that aim to solve urban decision making tasks in planning, transportation, and healthcare, highlighting their strengths and limitations. Finally, we discuss open problems and the future directions of applying machine learning to enable intelligent urban decision making, such as developing foundation models and combining reinforcement learning algorithms with human feedback. We hope this survey can help researchers in related fields understand the recent progress made in existing works, and inspire novel applications of machine learning in smart cities.
Article
Spatial crowdsourcing is drawing much attention with the rapid development of mobile Internet. Achieving efficient crowdsourcing task assignment involves not only maximizing the earnings of workers but also balancing the preferences of users or customers. Users often express preferences for specific workers or conditions, such as particular drivers, delivery personnel, or service providers. To address this challenge, we investigate the Future-Aware Balanced Preference (FABP) problem. This problem aims to maximize the profits of global workers while simultaneously considering the preferences of both parties to ensure bilateral satisfaction. To address the FABP problem, we propose the Learning to Match (LTM) algorithm. This algorithm utilizes online reinforcement learning that considers both immediate profits and long-term rewards. It acknowledges the significance of task assignment decisions in relation to the spatial distribution of future drivers, which in turn affects subsequent decisions. The LTM algorithm generates future-aware preference lists using learned driver state values and guides the subsequent matching. Additionally, we present the Real-Time Preference-Based Matching (RTPM) algorithm, which is a real-time matching algorithm that enables substitutions based on preference lists when a more preferred matching pair becomes available. This enhances the efficiency and fairness of real-time task assignment in dynamic environments, while simultaneously meeting the needs of passengers and drivers. Our extensive experiments on both real and synthetic datasets validate the effectiveness of our proposed algorithms, demonstrating a noteworthy improvement of up to 11.8% and an average increase of 4.7% compared to benchmark algorithms.
Article
Mobility on-demand (MoD) systems widely use machine learning to estimate matching utilities of order-vehicle pairs to dispatch orders by bipartite matching. However, existing methods suffer from overestimation problems due to the complex interactions among order-vehicle pairs in the global bipartite graph, leading to low overall revenue and order completion rate. To fill this gap, we propose a multi-agent deep reinforcement learning (MADRL) based order dispatching method with bipartite splitting, named SplitMatch. The key idea is to split the global bipartite graph into multiple sub-bipartite graphs to overcome the overestimation problem. First, we propose a bipartite splitting theorem and prove that the optimal solution of global bipartite matching can be achieved by solving multiple sub-bipartite matching problems when certain conditions are met. Second, we design a spatial-temporal padding prediction algorithm to generate sub-bipartite graphs that satisfy this theorem, where the spatial-temporal features of orders and vehicles are captured. Next, we propose a MADRL framework to learn the matching utility, where multiple objectives, e.g., immediate revenue and quality of service (QoS), are taken into account while dealing with a varying action space. Finally, a series of simulations are conducted to verify the superiority of SplitMatch in terms of overall revenue and order completion rate.
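The splitting idea in SplitMatch can be illustrated with a toy sketch: when the global order-vehicle bipartite graph decomposes into vertex-disjoint components, matching each component independently recovers the global optimum. The edge utilities, identifiers, and brute-force sub-matcher below are all hypothetical stand-ins for the paper's learned utilities and splitting conditions; this is a minimal sketch, not the authors' implementation.

```python
from collections import defaultdict
from itertools import permutations

def components(edges):
    """Connected components of the order-vehicle bipartite graph.
    edges: dict (order, vehicle) -> matching utility."""
    adj = defaultdict(set)
    for o, v in edges:
        adj[("o", o)].add(("v", v))
        adj[("v", v)].add(("o", o))
    seen, comps = set(), []
    for node in adj:
        if node in seen:
            continue
        stack, comp = [node], set()
        while stack:
            n = stack.pop()
            if n not in seen:
                seen.add(n)
                comp.add(n)
                stack.extend(adj[n])
        comps.append(comp)
    return comps

def best_matching(edges):
    """Max-utility matching of one small sub-graph by brute force
    (toy assumption: no more orders than vehicles)."""
    orders = sorted({o for o, _ in edges})
    vehicles = sorted({v for _, v in edges})
    best, best_val = [], 0.0
    for perm in permutations(vehicles, len(orders)):
        pairs = [(o, v) for o, v in zip(orders, perm) if (o, v) in edges]
        val = sum(edges[p] for p in pairs)
        if val > best_val:
            best, best_val = pairs, val
    return best, best_val

def split_match(edges):
    """Match every connected component separately; when components share
    no vertices, the union of the per-component optima is globally optimal."""
    matching, total = [], 0.0
    for comp in components(edges):
        sub = {e: u for e, u in edges.items()
               if ("o", e[0]) in comp and ("v", e[1]) in comp}
        pairs, val = best_matching(sub)
        matching += pairs
        total += val
    return matching, total
```

In the real system, the sub-graphs come from the spatial-temporal padding prediction algorithm rather than from plain connectivity, and the utilities are learned by MADRL.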
Article
As ride-hailing services have experienced significant growth, most research has concentrated on the dispatching mode, where drivers must accept the platform’s assigned trip requests. However, the broadcasting mode, in which drivers can freely choose their preferred orders from those broadcast by the platform, has received less attention. One crucial but challenging task in such a system is the determination of the matching radius, which usually varies across space, time, and real-time supply/demand characteristics. This study develops a Deep Learning-based Matching Radius Decision (DL-MRD) model that predicts key system performance metrics for a range of matching radii, which enables the ride-hailing platform to select an optimal matching radius that maximizes overall system performance according to real-time supply and demand information. To simultaneously maximize multiple system performance metrics for matching radius determination, we devise a novel multi-task learning algorithm named the Weighted Exponential Smoothing Multi-task (WESM) learning strategy, which enhances the convergence speed of each task (corresponding to the optimization of one metric) and delivers more accurate overall predictions. We evaluate our methods in a simulation environment designed for broadcasting-mode ride-hailing services. Our findings reveal that dynamically adjusting matching radii based on our proposed approach significantly improves system performance.
Article
With e-commerce developing rapidly, the Online-To-Offline (O2O) business model demands high efficiency in order allocation and last-mile delivery. Focusing on the challenges associated with online, same-day, and large-scale order allocation and distribution, we formulate an online dynamic vehicle routing problem with pickup and delivery (ODVRPPD), considering the uncertainty of dynamic orders and the sustainability of online reassignments to improve the Quality of Experience (QoE). A novel social behavior whale optimization algorithm (SBWOA) with a state machine formulation is proposed to solve this problem and model the closed-loop order fulfillment procedure. Inspired by the social behaviors and sonar communication of whale swarms, we propose SBWOA with a double-zone coding (DZC) scheme and affinity propagation clustering (AP clustering). DZC allows real-coded optimization algorithms to be applied to integer-coded VRPPDs. SBWOA uses AP clustering on the pickup and delivery locations to minimize delivery distance without specifying the initial clustering centers or the number of clusters. Additionally, we use real order data from Alibaba Cloud to construct 11 test problems (including a multi-day test problem with 12925 tasks and 990 vehicles). SBWOA outperforms four competing algorithms. Moreover, the extensive experimental results demonstrate the feasibility and adaptability of our model and SBWOA.
Article
Popular ride-hailing products, such as DiDi, Uber and Lyft, provide people with transportation convenience. Pricing, order dispatching and vehicle repositioning are three tightly correlated tasks with complex interactions in ride-hailing platforms, each significantly impacting the others' decisions and the demand or supply distribution. However, no prior work has considered combining the three tasks to improve platform efficiency. In this paper, we optimize pricing, dispatching and repositioning strategies simultaneously. This new multi-stage decision-making problem is quite challenging because it involves complex coordination and lacks a unified problem model. To address it, we propose a novel Joint optimization framework of Pricing, Dispatching and Repositioning (JPDR) integrating contextual bandits and multi-agent deep reinforcement learning. JPDR consists of two components: a Soft Actor-Critic (SAC)-based centralized policy for dispatching and repositioning, and a pricing strategy learned by a multi-armed contextual bandit algorithm based on feedback from the former. The two components learn in a mutually guided way to achieve joint optimization because their updates are highly interdependent. Based on real-world data, we implement a realistic environment simulator. Extensive experiments conducted on it show that our method outperforms state-of-the-art baselines in terms of both gross merchandise volume and success rate.
Preprint
Off-policy evaluation (OPE) is widely applied in sectors such as pharmaceuticals and e-commerce to evaluate the efficacy of novel products or policies from offline datasets. This paper introduces a causal deepset framework that relaxes several key structural assumptions, primarily the mean-field assumption, prevalent in existing OPE methodologies that handle spatio-temporal interference. These traditional assumptions frequently prove inadequate in real-world settings, thereby restricting the capability of current OPE methods to effectively address complex interference effects. In response, we advocate for the implementation of the permutation invariance (PI) assumption. This innovative approach enables the data-driven, adaptive learning of the mean-field function, offering a more flexible estimation method beyond conventional averaging. Furthermore, we present novel algorithms that incorporate the PI assumption into OPE and thoroughly examine their theoretical foundations. Our numerical analyses demonstrate that this novel approach yields significantly more precise estimations than existing baseline algorithms, thereby substantially improving the practical applicability and effectiveness of OPE methodologies. A Python implementation of our proposed method is available at https://github.com/BIG-S2/Causal-Deepsets.
Conference Paper
Full-text available
Recommender systems are among the most popular data mining topics and keep drawing extensive attention from both academia and industry. Among them, POI (point of interest) recommendation is extremely practical but challenging: it greatly benefits both users and businesses in real-world life, but it is hard due to data scarcity and diverse contexts. While a number of algorithms attempt to tackle the problem w.r.t. specific data and problem settings, they often fail when the scenarios change. In this work, we propose to devise a general and principled SSL (semi-supervised learning) framework, to alleviate data scarcity via smoothing among neighboring users and POIs, and treat various contexts by regularizing user preference based on context graphs. To enable such a framework, we develop PACE (Preference And Context Embedding), a deep neural architecture that jointly learns the embeddings of users and POIs to predict both user preference over POIs and various contexts associated with users and POIs. We show that PACE successfully bridges CF (collaborative filtering) and SSL by generalizing the de facto methods, matrix factorization of CF and graph Laplacian regularization of SSL. Extensive experiments on two real location-based social network datasets demonstrate the effectiveness of PACE.
Conference Paper
Full-text available
Cycling as a green transportation mode has been promoted by many governments all over the world. As a result, constructing effective bike lanes has become a crucial task for governments promoting the cycling life style, as well-planned bike paths can reduce traffic congestion and decrease safety risks for both cyclists and motor vehicle drivers. Unfortunately, existing trajectory mining approaches for bike lane planning do not consider key realistic government constraints: 1) budget limitations, 2) construction convenience, and 3) bike lane utilization. In this paper, we propose a data-driven approach to develop bike lane construction plans based on large-scale real world bike trajectory data. We enforce these constraints to formulate our problem and introduce a flexible objective function to tune the benefit between coverage of the number of users and the length of their trajectories. We prove the NP-hardness of the problem and propose greedy-based heuristics to address it. Finally, we deploy our system on Microsoft Azure, providing extensive experiments and case studies to demonstrate the effectiveness of our approach.
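The greedy heuristics mentioned above can be sketched as a budgeted coverage loop: repeatedly pick the candidate road segment with the best ratio of newly covered trajectories to construction cost, while the budget allows it. This is a generic sketch, not the paper's exact objective (which tunes a trade-off between user coverage and trajectory length); all identifiers and the cost/coverage data are hypothetical.

```python
def greedy_bike_lanes(segments, budget):
    """Greedy budgeted-coverage heuristic for bike lane selection.
    segments: segment id -> (construction cost, set of trajectory ids served).
    Repeatedly picks the affordable segment with the highest ratio of newly
    covered trajectories to cost, until nothing affordable adds coverage."""
    chosen, covered, spent = [], set(), 0.0
    remaining = dict(segments)
    while remaining:
        best, best_ratio = None, 0.0
        for seg, (cost, trips) in remaining.items():
            gain = len(trips - covered)          # only newly covered trajectories count
            if gain == 0 or spent + cost > budget:
                continue
            ratio = gain / cost
            if ratio > best_ratio:
                best, best_ratio = seg, ratio
        if best is None:                          # nothing affordable improves coverage
            break
        cost, trips = remaining.pop(best)
        chosen.append(best)
        covered |= trips
        spent += cost
    return chosen, covered, spent
```

Ratio-greedy is the standard heuristic for budgeted maximum coverage, which matches the NP-hardness result the paper proves for its formulation.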
Article
Full-text available
This study proposes and evaluates an efficient real-time taxi dispatching strategy that solves the linear assignment problem to find a globally optimal taxi-to-request assignment at each decision epoch. The authors compare the assignment-based strategy with two popular rule-based strategies. They evaluate dispatching strategies in detail in the city of Berlin and the neighboring region of Brandenburg using the microscopic large-scale MATSim simulator. The assignment-based strategy produced better results for both drivers (less idle driving) and passengers (less waiting). However, computing the assignments for thousands of taxis in a huge road network turned out to be computationally demanding. Certain adaptations pertaining to the cost matrix calculation were necessary to increase the computational efficiency and assure real-time responsiveness.
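The assignment-based strategy above reduces, at each decision epoch, to a linear assignment problem over a taxi-to-request cost matrix. The following toy sketch shows the objective with an exhaustive search, assuming the cost entries are estimated pickup times; a production system (and the paper's MATSim setup) would use the Hungarian algorithm rather than brute force, since the search below is factorial in the fleet size.

```python
from itertools import permutations

def optimal_assignment(cost):
    """Exhaustive solution of the linear assignment problem for a square
    cost matrix cost[i][j] = estimated pickup time of taxi i for request j.
    Returns (best total cost, tuple mapping taxi i -> request perm[i]).
    Only viable for tiny instances; illustrates the objective, not the method."""
    n = len(cost)
    best_perm, best_cost = None, float("inf")
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best_cost:
            best_cost, best_perm = total, perm
    return best_cost, best_perm
```

Minimizing the summed pickup times globally is what produces less idle driving for drivers and less waiting for passengers than rule-based nearest-taxi dispatch.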
Article
Full-text available
Taxi service strategies, as the crowd intelligence of massive taxi drivers, are hidden in their historical time-stamped GPS traces. Mining GPS traces to understand the service strategies of skilled taxi drivers can benefit the drivers themselves, passengers and city planners in a number of ways. This paper intends to uncover the efficient and inefficient taxi service strategies, based on a large-scale GPS historical database of approximately 7600 taxis over one year in a city in China. First, we separate the GPS traces of individual taxi drivers and link them with the revenue generated. Second, we investigate the taxi service strategies from three perspectives: passenger-searching strategies, passenger-delivery strategies, and service-region preference. Finally, we represent the taxi service strategies with a feature matrix and evaluate the correlation between service strategies and revenue, informing which strategies are efficient or inefficient. We predict the revenue of taxi drivers based on their strategies and achieve a prediction residual as low as 2.35 RMB per hour, which demonstrates that the taxi service strategies extracted with our proposed approach well characterize the driving behavior and performance of taxi drivers.
Article
Full-text available
Informed driving is increasingly becoming a key feature for increasing the sustainability of taxi companies. The sensors that are installed in each vehicle are providing new opportunities for automatically discovering knowledge, which, in return, delivers information for real-time decision making. Intelligent transportation systems for taxi dispatching and for finding time-saving routes are already exploring these sensing data. This paper introduces a novel methodology for predicting the spatial distribution of taxi-passengers for a short-term time horizon using streaming data. First, the information was aggregated into a histogram time series. Then, three time-series forecasting techniques were combined to originate a prediction. Experimental tests were conducted using the online data that are transmitted by 441 vehicles of a fleet running in the city of Porto, Portugal. The results demonstrated that the proposed framework can provide effective insight into the spatiotemporal distribution of taxi-passenger demand for a 30-min horizon.
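The combination step can be sketched as follows: several elementary forecasters are applied to a demand series and their predictions are blended with fixed weights. The paper combines three specific time-series techniques over histogram time series for a 30-min horizon; the three models and equal weights below are placeholder assumptions, not the authors' choices.

```python
def naive(series):
    """Naive forecaster: repeat the last observed value."""
    return series[-1]

def moving_average(series, k=3):
    """Mean of the k most recent observations."""
    k = min(k, len(series))
    return sum(series[-k:]) / k

def exp_smoothing(series, alpha=0.5):
    """Simple exponential smoothing; the final level is the forecast."""
    level = series[0]
    for x in series[1:]:
        level = alpha * x + (1 - alpha) * level
    return level

def ensemble_forecast(series, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted combination of the three forecasters, in the spirit of the
    paper's combined predictor (its actual models and weights differ)."""
    preds = (naive(series), moving_average(series), exp_smoothing(series))
    return sum(w * p for w, p in zip(weights, preds))
```

In the streaming setting of the paper, each spatial cell would maintain its own series of aggregated demand counts and produce one such blended forecast per horizon.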
Article
Full-text available
The existing taxi dispatch system that taxi operators in Singapore use to handle current bookings was studied. This dispatch system adopts the Global Positioning System and is based on the nearest-coordinate method: the taxi assigned to each booking is the one with the shortest, direct, straight-line distance to the customer location. However, the taxi assigned under this system often is not capable of reaching the customer in the shortest time possible. An alternative dispatch system is proposed, whereby the dispatch of taxis is determined by real-time traffic conditions. In the proposed system, the taxi assigned the booking job is the one with the shortest time path, reaching the customer in the shortest time. This dispatch ensures that customers are served within the shortest period of time and increases customer satisfaction. The effectiveness of both the existing and the proposed dispatch systems is investigated through computer simulations. The results from a simulation model of the Singapore central business district network are presented and analyzed. Data from the simulations show that the proposed system dispatches taxis more quickly and efficiently, leading to more than 50% reductions in passenger pickup times and average travel distances. A more efficient dispatch system would result in higher standards of customer service and a more organized taxi fleet that better meets customer demands.
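The contrast between the two dispatch rules can be sketched in a few lines: nearest-coordinate dispatch minimizes straight-line distance, while the proposed rule minimizes network travel time, computed here with Dijkstra's algorithm over an assumed road graph whose edge weights are current travel times. Node names and times are illustrative, not data from the study.

```python
import heapq

def travel_time(graph, src, dst):
    """Dijkstra over edge travel times. graph: node -> [(neighbor, minutes)]."""
    dist, heap = {src: 0.0}, [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            return d
        if d > dist.get(u, float("inf")):
            continue                       # stale heap entry
        for v, t in graph.get(u, []):
            nd = d + t
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return float("inf")

def dispatch_fastest(graph, taxi_positions, customer):
    """Assign the taxi with the shortest *travel time* to the customer,
    not the one with the shortest straight-line distance."""
    return min(taxi_positions, key=lambda pos: travel_time(graph, pos, customer))
```

With real-time traffic, edge weights change continuously, which is exactly why a geometrically nearer taxi can be a slower choice.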
Chapter
Full-text available
Traffic light control is one of the main means of controlling road traffic. Improving traffic control is important because it can lead to higher traffic throughput and reduced traffic congestion. This chapter describes multiagent reinforcement learning techniques for automatic optimization of traffic light controllers. Such techniques are attractive because they can automatically discover efficient control strategies for complex tasks, such as traffic control, for which it is hard or impossible to compute optimal solutions directly and hard to develop hand-coded solutions. First, the general multi-agent reinforcement learning framework is described, which is used to control traffic lights in this work. In this framework, multiple local controllers (agents) are each responsible for the optimization of traffic lights around a single traffic junction, making use of locally perceived traffic state information (sensed cars on the road), a learned probabilistic model of car behavior, and a learned value function which indicates how traffic light decisions affect long-term utility, in terms of the average waiting time of cars. Next, three extensions are described which improve upon the basic framework in various ways: agents (traffic junction controllers) taking into account congestion information from neighboring agents; handling partial observability of traffic states; and coordinating the behavior of multiple agents by coordination graphs and the max-plus algorithm.
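The per-junction learning described above can be sketched as a value update with the reward defined as negative waiting time, so that maximizing value minimizes average delay. The sketch below uses plain tabular Q-learning with hypothetical state/action encodings; the chapter's agents actually learn a car-behavior model and a value function rather than a model-free table, so this is only an illustrative simplification.

```python
import random
from collections import defaultdict

def q_update(Q, state, action, reward, next_state, actions,
             alpha=0.1, gamma=0.9):
    """One tabular Q-learning step. The reward is the negative total
    waiting time of cars at the junction, so higher Q means less delay."""
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
    return Q

def epsilon_greedy(Q, state, actions, eps=0.1):
    """Pick a random phase with probability eps, else the best-valued one."""
    if random.random() < eps:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])
```

The chapter's extensions (neighbor congestion information, partial observability, max-plus coordination) all modify what goes into the state and how neighboring agents' values are combined, not this basic update.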
Article
Full-text available
This paper presents a novel multiagent approach to automating taxi dispatch that services current bookings in a distributed fashion. The existing system, in use by a taxi operator in Singapore and elsewhere, attempts to increase customer satisfaction locally by sequentially dispatching nearby taxis to service customers. The proposed dispatch system attempts to increase customer satisfaction more globally, by concurrently dispatching multiple taxis to the same number of customers in the same geographical region, and vis-à-vis human driver satisfaction. To realize the system, a multiagent architecture is proposed, populated with software collaborative agents that can actively negotiate on behalf of taxi drivers in groups of size N for available customer bookings. Theoretically, an analysis of the boundary and optimal multiagent taxi-dispatch situations is presented along with a discussion of their implications. Experimentally, the operational efficiency of the existing and proposed dispatch systems was evaluated through computer simulations. The empirical results, obtained for a 1000-strong taxi fleet over a discrete range of N, show that the proposed system can dispatch taxis with reductions in customer waiting and empty taxi cruising times of up to 33.1% and 26.3%, respectively; and up to 41.8% and 41.2% reductions when a simple negotiation speedup heuristic was applied.
Conference Paper
Full-text available
In modern cities, more and more vehicles, such as taxis, have been equipped with GPS devices for localization and navigation. Gathering and analyzing these large-scale real-world digital traces have provided us an unprecedented opportunity to understand the city dynamics and reveal the hidden social and economic “realities”. One innovative pervasive application is to provide correct driving strategies to taxi drivers according to time and location. In this paper, we aim to discover both efficient and inefficient passenger-finding strategies from a large-scale taxi GPS dataset, which was collected from 5350 taxis for one year in a large city of China. By representing the passenger-finding strategies in a Time-Location-Strategy feature triplet and constructing a train/test dataset containing both top- and ordinary-performance taxi features, we adopt a powerful feature selection tool, L1-Norm SVM, to select the most salient feature patterns determining the taxi performance. We find that the selected patterns can well interpret the empirical study results derived from raw data analysis and even reveal interesting hidden “facts”. Moreover, the taxi performance predictor built on the selected features can achieve a prediction accuracy of 85.3% on a new test dataset, and it also outperforms the one based on all the features, which implies that the selected features are indeed the right indicators of the passenger-finding strategies.
Article
Full-text available
We addressed the problem of developing a model to simulate at a high level of detail the movements of over 6,000 drivers for Schneider National, the largest truckload motor carrier in the United States. The goal of the model was not to obtain a better solution but rather to closely match a number of operational statistics. In addition to the need to capture a wide range of operational issues, the model had to match the performance of a highly skilled group of dispatchers while also returning the marginal value of drivers domiciled at different locations. These requirements dictated that it was not enough to optimize at each point in time (something that could be easily handled by a simulation model) but also over time. The project required bringing together years of research in approximate dynamic programming, merging math programming with machine learning, to solve dynamic programs with extremely high-dimensional state variables. The result was a model that closely calibrated against real-world operations and produced accurate estimates of the marginal value of 300 different types of drivers.
Article
Full-text available
Multiagent systems are rapidly finding applications in a variety of domains, including robotics, distributed control, telecommunications, and economics. The complexity of many tasks arising in these domains makes them difficult to solve with preprogrammed agent behaviors. The agents must, instead, discover a solution on their own, using learning. A significant part of the research on multiagent learning concerns reinforcement learning techniques. This paper provides a comprehensive survey of multiagent reinforcement learning (MARL). A central issue in the field is the formal statement of the multiagent learning goal. Different viewpoints on this issue have led to the proposal of many different goals, among which two focal points can be distinguished: stability of the agents' learning dynamics, and adaptation to the changing behavior of the other agents. The MARL algorithms described in the literature aim---either explicitly or implicitly---at one of these two goals or at a combination of both, in a fully cooperative, fully competitive, or more general setting. A representative selection of these algorithms is discussed in detail in this paper, together with the specific issues that arise in each category. Additionally, the benefits and challenges of MARL are described along with some of the problem domains where the MARL techniques have been applied. Finally, an outlook for the field is provided.
Conference Paper
Taxi-booking apps have been very popular all over the world as they provide conveniences such as fast response times to users. The key component of a taxi-booking app is the dispatch system, which aims to provide optimal matches between drivers and riders. Traditional dispatch systems sequentially dispatch taxis to riders and aim to maximize the driver acceptance rate for each individual order. However, the traditional systems may lead to a low global success rate, which degrades the rider experience when using the app. In this paper, we propose a novel system that attempts to optimally dispatch taxis to serve multiple bookings. The proposed system aims to maximize the global success rate, thus optimizing overall travel efficiency and leading to an enhanced user experience. To further enhance users' experience, we also propose a method to predict the destinations of a user once the taxi-booking app is started. The proposed method employs the Bayesian framework to model the distribution of a user's destination based on his/her travel histories. We use rigorous A/B tests to compare our new taxi dispatch method with state-of-the-art models using data collected in Beijing. Experimental results show that the proposed method is significantly better than other state-of-the-art models in terms of global success rate (increased from 80% to 84%). Moreover, we have also achieved significant improvements on other metrics such as users' waiting time and pick-up distance. For our destination prediction algorithm, we show that our proposed model is superior to the baseline model, improving the top-3 accuracy from 89% to 93%. The proposed taxi dispatch and destination prediction algorithms are both deployed in our online systems and serve tens of millions of users every day.
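The destination-prediction component can be sketched as a naive Bayes posterior over a user's past destinations, with the hour of day as the only context feature and Laplace smoothing for unseen hours. The paper's Bayesian framework is built on richer travel-history features; the feature choice and smoothing below are assumptions made for illustration.

```python
from collections import Counter

def destination_posterior(history, hour, smoothing=1.0):
    """history: list of (hour_of_day, destination) trips for one user.
    Naive Bayes with Laplace smoothing: posterior ∝ P(dest) * P(hour | dest),
    with 24 hourly bins assumed for the likelihood's smoothing denominator."""
    dest_counts = Counter(d for _, d in history)
    hour_and_dest = Counter(history)
    n = len(history)
    scores = {}
    for d, c in dest_counts.items():
        prior = c / n
        likelihood = (hour_and_dest[(hour, d)] + smoothing) / (c + 24 * smoothing)
        scores[d] = prior * likelihood
    z = sum(scores.values())
    return {d: s / z for d, s in scores.items()}   # normalized posterior
```

Ranking the posterior and keeping the top three destinations would correspond to the top-3 accuracy metric reported above.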
Conference Paper
Taxi-calling apps are gaining increasing popularity for their efficiency in dispatching idle taxis to passengers in need. To precisely balance the supply and the demand of taxis, online taxicab platforms need to predict the Unit Original Taxi Demand (UOTD), which refers to the number of taxi-calling requests submitted per unit time (e.g., every hour) and per unit region (e.g., each POI). Predicting UOTD is non-trivial for large-scale industrial online taxicab platforms because both accuracy and flexibility are essential. Complex non-linear models such as GBRT and deep learning are generally accurate, yet require labor-intensive model redesign after scenario changes (e.g., extra constraints due to new regulations). To accurately predict UOTD while remaining flexible to scenario changes, we propose LinUOTD, a unified linear regression model with more than 200 million dimensions of features. The simple model structure eliminates the need for repeated model redesign, while the high-dimensional features contribute to accurate UOTD prediction. We further design a series of optimization techniques for efficient model training and updating. Evaluations on two large-scale datasets from an industrial online taxicab platform verify that LinUOTD outperforms popular non-linear models in accuracy. We envision our experience of adopting simple linear models with high-dimensional features for UOTD prediction as a pilot study that can shed insights upon other industrial large-scale spatio-temporal prediction problems.
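One common way to make a linear model with hundreds of millions of sparse features practical is the hashing trick plus sparse SGD updates, since each update touches only the few weights present in an example. The sketch below illustrates that general pattern under assumed feature strings; it is not LinUOTD's actual feature pipeline or training procedure, which the paper engineers separately.

```python
def hashed_indices(features, dim=2**20):
    """Map sparse categorical features (e.g. 'poi=airport', 'hour=8') to
    indices in a fixed high-dimensional space via the hashing trick."""
    return [hash(f) % dim for f in features]

def predict(w, idxs):
    """Sparse linear prediction: sum the weights of the active features."""
    return sum(w.get(i, 0.0) for i in idxs)

def sgd_step(w, idxs, target, lr=0.1):
    """One squared-loss SGD update on the sparse model; only the weights
    touched by this example change, keeping updates cheap at scale."""
    err = predict(w, idxs) - target
    for i in idxs:
        w[i] = w.get(i, 0.0) - lr * err
    return w
```

The appeal, as the abstract argues, is flexibility: adding a new constraint or signal means adding feature strings, not redesigning the model.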
Conference Paper
Bike sharing systems, aiming at providing the missing links in public transportation systems, are becoming popular in urban cities. Many providers of bike sharing systems are ready to expand their bike stations from the existing service area to surrounding regions. A key to success for a bike sharing system expansion is bike demand prediction for the expansion areas. There are two major challenges in this demand prediction problem: first, bike transition records are not available for the expansion area; and second, station-level bike demands have large variances across the urban city. Previous research efforts mainly focus on discovering global features, assuming the station bike demands react equally to the global features, which brings large prediction errors when the urban area is large and highly diversified. To address these challenges, in this paper, we develop a hierarchical station bike demand predictor which analyzes bike demands from the functional zone level down to the station level. Specifically, we first divide the studied bike stations into functional zones by a novel Bi-clustering algorithm which is designed to cluster bike stations with similar POI characteristics and close geographical distances together. Then, the hourly bike check-ins and check-outs of functional zones are predicted by integrating three influential factors: distance preference, zone-to-zone preference, and zone characteristics. The station demand is estimated by studying the demand distributions among the stations within the same functional zone. Finally, extensive experimental results on the NYC Citi Bike system with two expansion stages show the advantages of our approach on station demand and balance prediction for bike sharing system expansions.
Article
We explore deep reinforcement learning methods for multi-agent domains. We begin by analyzing the difficulty of traditional algorithms in the multi-agent case: Q-learning is challenged by an inherent non-stationarity of the environment, while policy gradient suffers from a variance that increases as the number of agents grows. We then present an adaptation of actor-critic methods that considers action policies of other agents and is able to successfully learn policies that require complex multi-agent coordination. Additionally, we introduce a training regimen utilizing an ensemble of policies for each agent that leads to more robust multi-agent policies. We show the strength of our approach compared to existing methods in cooperative as well as competitive scenarios, where agent populations are able to discover various physical and informational coordination strategies.
Article
Traditional taxi systems in metropolitan areas often suffer from inefficiencies due to uncoordinated actions as system capacity and customer demand change. With the pervasive deployment of networked sensors in modern vehicles, large amounts of information regarding customer demand and system status can be collected in real time. This information provides opportunities to perform various types of control and coordination for large-scale intelligent transportation systems. In this paper, we present a receding horizon control (RHC) framework to dispatch taxis, which incorporates highly spatiotemporally correlated demand/supply models and real-time GPS location and occupancy information. The objectives include matching spatiotemporal ratio between demand and supply for service quality with minimum current and anticipated future taxi idle driving distance. Extensive trace-driven analysis with a data set containing taxi operational records in San Francisco shows that our solution reduces the average total idle distance by 52%, and reduces the supply demand ratio error across the city during one experimental time slot by 45%. Moreover, our RHC framework is compatible with a wide variety of predictive models and optimization problem formulations. This compatibility property allows us to solve robust optimization problems with corresponding demand uncertainty models that provide disruptive event information.
Conference Paper
Taxis, as a form of public transit, are taken by citizens thousands of times every day in urban areas. However, it is economically inefficient for vacant taxis to cruise around randomly in search of passengers. In this paper, we propose a dynamic taxi dispatch system for the smart city, which dispatches to vacant taxis routes with a high probability of encountering passengers. The system establishes a dynamic probabilistic model that considers the impact of time on passenger appearance and the effect of different vacant taxis' traveling routes on each other's pick-up probability. Specifically, a novel feedback mechanism is introduced, which uses information about where taxis actually pick up passengers to amend the system's probabilistic model. Moreover, extensive trace-driven simulations based on a real digital map of Shanghai and historical data from over 2,000 taxis demonstrate the good performance of our system.
Article
The GPS technology and new forms of urban geography have changed the paradigm for mobile services. As such, the abundant availability of GPS traces has enabled new ways of doing taxi business. Indeed, recent efforts have been made on developing mobile recommender systems for taxi drivers using Taxi GPS traces. These systems can recommend a sequence of pick-up points for the purpose of maximizing the probability of identifying a customer with the shortest driving distance. However, in the real world, the income of taxi drivers is strongly correlated with the effective driving hours. In other words, it is more critical for taxi drivers to know the actual driving routes to minimize the driving time before finding a customer. To this end, in this paper, we propose to develop a cost-effective recommender system for taxi drivers. The design goal is to maximize their profits when following the recommended routes for finding passengers. Specifically, we first design a net profit objective function for evaluating the potential profits of the driving routes. Then, we develop a graph representation of road networks by mining the historical taxi GPS traces and provide a Brute-Force strategy to generate optimal driving route for recommendation. However, a critical challenge along this line is the high computational cost of the graph based approach. Therefore, we develop a novel recursion strategy based on the special form of the net profit function for searching optimal candidate routes efficiently. Particularly, instead of recommending a sequence of pick-up points and letting the driver decide how to get to those points, our recommender system is capable of providing an entire driving route, and the drivers are able to find a customer for the largest potential profit by following the recommendations. This makes our recommender system more practical and profitable than other existing recommender systems. Finally, we carry out extensive experiments on a real-world data set collected from the San Francisco Bay area and the experimental results clearly validate the effectiveness of the proposed recommender system.
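A net-profit objective of the kind this abstract describes can be illustrated with a small expected-value calculation over a candidate cruising route. The function below is a hypothetical stand-in, not the paper's actual formulation; the segment ids, pick-up probabilities, fare, and per-kilometre cost are all invented for the example.

```python
# Hypothetical net-profit score for a candidate cruising route:
# expected fare revenue minus expected cruising cost.
def route_net_profit(route, pickup_prob, fare=20.0, cost_per_km=0.5):
    """route: list of (segment_km, segment_id) driven in order.
    pickup_prob: chance of finding a passenger on each segment."""
    expected, p_empty, km = 0.0, 1.0, 0.0
    for seg_km, seg_id in route:
        km += seg_km
        p_here = p_empty * pickup_prob[seg_id]       # first pickup happens here
        expected += p_here * (fare - cost_per_km * km)
        p_empty *= 1.0 - pickup_prob[seg_id]
    return expected - p_empty * cost_per_km * km     # never found a passenger

profit = route_net_profit([(2.0, "a"), (3.0, "b")], {"a": 0.5, "b": 0.5})
```

Scoring whole routes this way, rather than isolated pick-up points, is what lets a recommender trade a nearby low-probability segment against a farther high-probability one.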
Article
In this paper we present algorithms for the solution of the general assignment and transportation problems. In Section 1, a statement of the algorithm for the assignment problem appears, along with a proof for the correctness of the algorithm. The remarks which constitute the proof are incorporated parenthetically into the statement of the algorithm. Following this appears a discussion of certain theoretical aspects of the problem. In Section 2, the algorithm is generalized to one for the transportation problem. The algorithm of that section is stated as concisely as possible, with theoretical remarks omitted. 1. THE ASSIGNMENT PROBLEM. The personnel-assignment problem is the problem of choosing an optimal assignment of n men to n jobs, assuming that numerical ratings are given for each man’s performance on each job. An optimal assignment is one which makes the sum of the men’s ratings for their assigned jobs a maximum. There are n! possible assignments (of which several may be optimal), so that it is physically impossible, except
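The factorial search this abstract warns against is easy to write down, which makes the value of the polynomial-time Hungarian method concrete. Below is a minimal stdlib-only brute-force sketch; the ratings matrix is an invented example, and practical dispatch systems (including the one in the main paper) use efficient combinatorial solvers instead.

```python
from itertools import permutations

def best_assignment(ratings):
    """Exhaustively search all n! assignments of n workers to n jobs and
    return (perm, score), where perm[w] is the job given to worker w.
    The Hungarian algorithm finds the same optimum in polynomial time."""
    n = len(ratings)
    best_perm, best_score = None, float("-inf")
    for perm in permutations(range(n)):
        score = sum(ratings[w][j] for w, j in enumerate(perm))
        if score > best_score:
            best_perm, best_score = perm, score
    return best_perm, best_score

ratings = [
    [7, 4, 3],
    [6, 8, 5],
    [9, 4, 4],
]
perm, score = best_assignment(ratings)
```

Even at n = 12 this loop already touches nearly half a billion permutations, which is why the Hungarian algorithm's O(n³) bound matters for real dispatch workloads.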
Article
In this paper we present and analyze a queueing-theoretical model for autonomous mobility-on-demand (MOD) systems where robotic, self-driving vehicles transport customers within an urban environment and rebalance themselves to ensure acceptable quality of service throughout the entire network. We cast an autonomous MOD system within a closed Jackson network model with passenger loss. It is shown that an optimal rebalancing algorithm minimizing the number of (autonomously) rebalancing vehicles and keeping vehicles availabilities balanced throughout the network can be found by solving a linear program. The theoretical insights are used to design a robust, real-time rebalancing algorithm, which is applied to a case study of New York City. The case study shows that the current taxi demand in Manhattan can be met with about 8,000 robotic vehicles (roughly 60% of the size of the current taxi fleet). Finally, we extend our queueing-theoretical setup to include congestion effects, and we study the impact of autonomously rebalancing vehicles on overall congestion. Collectively, this paper provides a rigorous approach to the problem of system-wide coordination of autonomously driving vehicles, and provides one of the first characterizations of the sustainability benefits of robotic transportation networks.
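The paper derives its rebalancing flows from a linear program; as a hedged stand-in for that idea, the toy sketch below equalizes idle-vehicle counts across regions with simple surplus-to-deficit moves. The region counts are invented, and a real LP would additionally weight moves by travel distance.

```python
def rebalance_moves(idle_per_region):
    # Toy stand-in for the paper's LP-based rebalancing: move vehicles from
    # surplus regions to deficit regions until each holds roughly an equal share.
    target = sum(idle_per_region) // len(idle_per_region)
    surplus = [[i, v - target] for i, v in enumerate(idle_per_region) if v > target]
    deficit = [[i, target - v] for i, v in enumerate(idle_per_region) if v < target]
    moves = []
    for s in surplus:
        for d in deficit:
            qty = min(s[1], d[1])
            if qty > 0:
                moves.append((s[0], d[0], qty))  # (from_region, to_region, vehicles)
                s[1] -= qty
                d[1] -= qty
    return moves

moves = rebalance_moves([6, 2, 1, 3])
```

Keeping availabilities balanced this way is exactly the objective the queueing analysis formalizes: empty regions lose passengers, so idle vehicles are a resource to be spread, not parked.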
Article
The agent computing paradigm is rapidly emerging as one of the powerful technologies for the development of large-scale distributed systems to deal with the uncertainty in a dynamic environment. The domain of traffic and transportation systems is well suited for an agent-based approach because transportation systems are usually geographically distributed in dynamic changing environments. Our literature survey shows that the techniques and methods resulting from the field of agent and multiagent systems have been applied to many aspects of traffic and transportation systems, including modeling and simulation, dynamic routing and congestion management, and intelligent traffic control. This paper examines an agent-based approach and its applications in different modes of transportation, including roadway, railway, and air transportation. This paper also addresses some critical issues in developing agent-based traffic control and management systems, such as interoperability, flexibility, and extendibility. Finally, several future research directions toward the successful deployment of agent technology in traffic and transportation systems are discussed.
Article
Markov games are a model of multiagent environments that are convenient for studying multiagent reinforcement learning. This paper describes a set of reinforcement-learning algorithms based on estimating value functions and presents convergence theorems for these algorithms. The main contribution of this paper is that it presents the convergence theorems in a way that makes it easy to reason about the behavior of simultaneous learners in a shared environment.
Article
Hailing a taxi in Singapore now employs the latest positioning technology for matching passengers with the nearest available cabs, thus achieving greater productivity and customer satisfaction.
Article
Of several responses made to the same situation, those which are accompanied or closely followed by satisfaction to the animal will, other things being equal, be more firmly connected with the situation, so that, when it recurs, they will be more likely to recur; those which are accompanied or closely followed by discomfort to the animal will, other things being equal, have their connections with that situation weakened, so that, when it recurs, they will be less likely to occur. The greater the satisfaction or discomfort, the greater the strengthening or weakening of the bond. (Thorndike, 1911) The idea of learning to make appropriate responses based on reinforcing events has its roots in early psychological theories such as Thorndike's "law of effect" (quoted above). Although several important contributions were made in the 1950s, 1960s and 1970s by illustrious luminaries such as Bellman, Minsky, Klopf and others (Farley and Clark, 1954; Bellman, 1957; Minsky, 1961; Samuel, 1963; Michie and Chambers, 1968; Grossberg, 1975; Klopf, 1982), the last two decades have witnessed perhaps the strongest advances in the mathematical foundations of reinforcement learning, in addition to several impressive demonstrations of the performance of reinforcement learning algorithms in real world tasks. The introductory book by Sutton and Barto, two of the most influential and recognized leaders in the field, is therefore both timely and welcome. The book is divided into three parts. In the first part, the authors introduce and elaborate on the essential characteristics of the reinforcement learning problem, namely, the problem of learning "policies" or mappings from environmental states to actions so as to maximize the amount of "reward"
Article
Recent algorithmic and theoretical advances in reinforcement learning (RL) have attracted widespread interest. RL algorithms have appeared that approximate dynamic programming on an incremental basis. They can be trained on the basis of real or simulated experiences, focusing their computation on areas of state space that are actually visited during control, making them computationally tractable on very large problems. If each member of a team of agents employs one of these algorithms, a new collective learning algorithm emerges for the team as a whole. In this paper we demonstrate that such collective RL algorithms can be powerful heuristic methods for addressing large-scale control problems. Elevator group control serves as our testbed. It is a difficult domain posing a combination of challenges not seen in most multi-agent learning research to date. We use a team of RL agents, each of which is responsible for controlling one elevator car. The team receives a global reward signal which appears noisy to each agent due to the effects of the actions of the other agents, the random nature of the arrivals and the incomplete observation of the state. In spite of these complications, we show results that in simulation surpass the best of the heuristic elevator control algorithms of which we are aware. These results demonstrate the power of multi-agent RL on a very large scale stochastic dynamic optimization problem of practical utility.
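The core difficulty this abstract describes — each agent learning from a shared global reward that looks noisy to it because of the other agents' actions — can be reproduced in a toy stateless form. The coordination payoff, the two-agent/two-action setup, and all hyperparameters below are invented for illustration; the elevator controllers in the paper are full neural Q-learners over a rich state.

```python
import random

# Two independent Q-learners, each receiving only the team's global reward.
# Stateless (bandit-style) sketch of collective RL; numbers are invented.
random.seed(0)
N_AGENTS, N_ACTIONS = 2, 2
ALPHA, EPSILON, EPISODES = 0.1, 0.2, 5000
Q = [[0.0] * N_ACTIONS for _ in range(N_AGENTS)]

def global_reward(actions):
    # Team payoff: reward only when everyone picks action 1, so from any
    # single agent's viewpoint the reward signal appears noisy.
    return 1.0 if all(a == 1 for a in actions) else 0.0

for _ in range(EPISODES):
    acts = [random.randrange(N_ACTIONS) if random.random() < EPSILON
            else max(range(N_ACTIONS), key=lambda a: Q[i][a])
            for i in range(N_AGENTS)]
    r = global_reward(acts)
    for i, a in enumerate(acts):
        Q[i][a] += ALPHA * (r - Q[i][a])  # each agent updates independently
```

After enough joint exploration both agents lock onto the coordinated action, the same emergent effect the elevator experiments demonstrate at scale.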
Multi-agent actor-critic for mixed cooperative-competitive environments. Advances in Neural Information Processing Systems
  • Ryan Lowe
  • Yi Wu
  • Aviv Tamar
  • Jean Harb