A Distributed Algorithm for Operating Large-Scale Ridesourcing Systems

Abstract

With ridesourcing services gaining popularity in the past few years, there has been growing interest in algorithms that could enable real-time operation of these systems. As ridesourcing systems rely on independent entities to build the supply and demand sides of the market, they have been shown to operate more successfully in metropolitan areas where there is a high level of demand for rides as well as a high number of drivers, and a large volume of trips occurring within a geographically constrained region. Despite the suitable ecosystem that metropolitan areas offer for ridesourcing operations, there is a lack of methods that can provide high-quality matching solutions in real-time. To fill this gap, this paper introduces a framework that allows for solving the large-scale matching problems by means of solving smaller problems in a distributed fashion. The proposed methodology is based on constructing approximately-uniform clusters of trip requests, where vehicle tours form cluster centers. Using the New York Taxi dataset, we compare the performance of the proposed methodology against three benchmark methods to showcase its advantages in terms of solution quality and solution time.
September 20, 2021
Ruolin Zhang¹, Neda Masoud¹
¹Civil and Environmental Engineering, University of Michigan Ann Arbor
Corresponding Author – Email: nmasoud@umich.edu
1 Introduction
In recent years, population and economic growth have worsened traffic congestion in
metropolitan areas, directly increasing exhaust emissions, travel time, and travel cost.
Single-occupancy vehicles are a major source of carbon dioxide emissions [18], a problem
that congestion exacerbates. However, scaling up the infrastructure to meet the growing
demand is costly and often constrained. Therefore, increasing the utilization rate of the
existing transportation infrastructure has been the focus of extensive research
in the past decade.
A number of alternative modes of transportation have been introduced to expand the utilization
rate of the existing transportation infrastructure. Public transportation is a traditional means to
reduce the number of single-occupancy vehicles. Public transportation systems, such as buses and
rail systems, are generally regulated on a fixed schedule and operated on established routes, and
charge a posted fee for each trip. Although having a fixed schedule and route could lead to offering
more reliable services, the limited operational flexibility of public transportation services leads to
more constrained coverage, both spatially and temporally. This has led to a growing interest in
shared mobility options, which introduce more flexibility and comfort compared to fixed public
transportation options, but offer discounted prices compared to taxis and other private modes of
transportation.
Technological advancements such as GPS-enabled smart personal devices, online payment
systems and big data together with a global quest for environmentally-friendly and cost-efficient
mobility options have led to the emergence of a significant number of internet-based companies
around the globe that offer ride-sharing and ridesourcing services to satisfy on-demand requests.
Examples ([27]) include Flinc [2], Ville Fluide [3], Carticipate [5], Uber [6], and Lyft [4]. Benefits of
ride-sharing include savings in travel cost and possibly travel time for drivers and riders, reduced
traffic congestion, fuel conservation, and lower air pollution. According to the National
Household Travel Survey (NHTS), which is the authoritative source reporting on the travel behavior
of the American public, the average light vehicle occupancy (the number of travelers per vehicle trip)
is relatively low: 1.67 in 2017, unchanged from 2009. Therefore, ride-sharing services have great
potential for development. As such, devising algorithmic tools for real-time matching of drivers
and riders in a ride-sharing system, or ridesourcing with pooling, also known as the ride-matching
problem, is an important and timely topic of research.
This paper introduces a methodology for efficiently solving the one-to-many ride-matching
problem, in which a driver can carry multiple passengers at once or in sequence. More specifically,
we introduce a clustering method to decompose the problem into multiple sub-problems such
that sub-problems can be optimized independently of each other and in parallel. The proposed
method keeps the sizes of the sub-problems approximately uniform, which matters because the
computational complexity of the ride-matching problem grows exponentially with the size of the
problem. We use the New York City Taxi dataset [1] to perform numerical experiments. To evaluate
the performance of our proposed methodology, we compare the results with the optimal solution as
well as three partitioning methods from the literature, namely point-based, balanced point-based,
and trip-based partitioning. We also conduct sensitivity analysis to test the impact of the degree
of uniformity in the size of sub-problems and the number of clusters on the computation time and
the objective function.
2 Literature Review
In this section we will review the relevant studies from the graph partitioning and ridesharing
literature.
2.1 Graph Partitioning
When modeling problems in different application domains, researchers often use graphs as
abstractions [42]. Splitting a graph into smaller sub-graphs is one of the basic algorithmic operations
that allows for solving a large-scale problem by means of solving smaller problems that correspond to
sub-graphs, in a distributed fashion. In the past decade, graph partitioning has become increasingly
popular due to the emergence of larger problem instances in various application domains.
Applications of graph partitioning in practice can be found in parallel processing [24,31], complex
networks [21], transportation networks [38,22], and image processing [14,36], among others.
A graph can be represented by a set of vertices and weighted edges. A graph partitioning
problem seeks to partition vertices and/or edges into different sub-graphs. In a number of
application domains, balancing constraints are also imposed during graph partitioning to ensure
that all clusters have (approximately) equal weights, number of edges, or number of vertices. An
imbalance parameter ε can be used to impose the balancing constraint.
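As a concrete reading of the balancing constraint, the sketch below checks whether a vertex partition is ε-balanced under one common convention, in which no cluster may hold more than (1 + ε)⌈n/k⌉ vertices. Both this convention and the function name `is_balanced` are assumptions made for illustration; the text above does not fix a particular form of the constraint.

```python
import math
from collections import Counter

def is_balanced(labels, k, eps):
    """Return True if no cluster exceeds (1 + eps) * ceil(n / k) vertices.

    labels[v] is the cluster index (0..k-1) of vertex v. The size bound is an
    assumed convention, not necessarily the one used in the paper.
    """
    n = len(labels)
    cap = (1 + eps) * math.ceil(n / k)
    sizes = Counter(labels)
    return all(sizes.get(c, 0) <= cap for c in range(k))

# Six vertices split into clusters of sizes 4 and 2.
labels = [0, 0, 0, 0, 1, 1]
print(is_balanced(labels, k=2, eps=0.0))  # False: the cap is 3 vertices
print(is_balanced(labels, k=2, eps=0.5))  # True: the cap is 4.5 vertices
```

With ε = 0 every cluster must have at most ⌈n/k⌉ vertices; larger values of ε loosen the cap.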
Graph partitioning can be formulated to optimize different objective functions, which reflect
different objectives of graph partitioning for different application domains. The most prominent
objective function is to minimize the total cut, typically quantified by the total weight of removed
edges from the original graph. It has been shown that the problem of dividing a graph into k
clusters of approximately equal size to minimize an objective function is NP-complete [14]. [13]
shows that on a general graph, a perfectly balanced partitioning (ε = 0) has no constant-factor
approximation. If ε ∈ (0, 1], an O(log^2 n)-factor approximation can be achieved.
There are a large number of methods to solve the graph partitioning problem. They can be
divided into two groups, namely, global algorithms, and local algorithms [42]. Global algorithms are
methods that apply to the entire graph and directly obtain the solution. This family of algorithms
could include exact methods [28,50,10,45,52,48] and heuristic algorithms [7,56,62,51]. These
algorithms are usually used for smaller graphs. Many of these methods are limited to bi-partitioning
but can also be applied to k-clustering by recursion [42]. Local algorithms, on the other hand,
start from an initial solution and improve it iteratively. Examples of this
family of algorithms include local search [7,8] and flow-based improvement [37,46].
Many applications in the transportation domain can be modeled using graphs, rendering graph
partitioning an important tool, especially as the growing penetration rate of shared mobility and
the emergence of connectivity increase the complexity of transportation problems. In this
paper, we propose a methodology that utilizes approximately-uniform graph partitioning/clustering
for the real-time ride-matching problem. We formulate the problem as a clustering problem that
assigns trips to clusters such that a total cost is minimized, where we use vehicle tours as cluster
representatives, and impose uniformity constraints on clusters.
2.2 Dynamic Ride-sharing
According to the 2018 Global Traffic Scorecard, Americans lost an average of 97 hours a year due
to traffic congestion, costing nearly $87 billion in 2018, an average of $1,348 per driver. Increasing
the utilization rate of vehicles can be an effective way to reduce vehicle-miles-traveled (VMT) and
alleviate traffic congestion.
Ride-sharing has garnered plenty of attention in recent years due to its effectiveness in utilizing
empty car seats [34]. A ride-sharing system aims to bring together participants with compatible
routes and time schedules to share a vehicle. Here, we focus on a dynamic ride-sharing system,
where requests to participate in the system can be made at any point in time. Dynamic ride-sharing
concentrates on single, non-recurring trips, differentiating it from conventional carpooling [12,55],
which focuses on recurring trips. In such a system, participants can request a trip as riders or
provide rides as drivers. Participants input their requests, including the origins and destinations of
their trips and their trip timelines, and the operator of the system makes arrangements to match
drivers with riders on a short notice or even en-route. In a dynamic ride-sharing system, typically
a central ride-matching problem is solved periodically.
Ride-matching problems can have different objectives, such as minimizing system-wide VMT
[63], maximizing the operator’s profit [43], and maximizing the number of matched participants
[masoud2017decomposition], among others. When matching a driver with a rider, several
constraints must be considered. Many studies let a rider or driver provide their earliest departure
and latest arrival times, constructing a time window that constrains the matches [26]. In addition
to travel time, a number of other constraints may be imposed to satisfy a participant’s needs and
preferences [29].
A ride-sharing system can adopt one of several possible strategies when matching riders with
drivers: (1) a single rider matched with a single driver, (2) a single driver matched with multiple riders
(i.e., pooling), (3) a single rider matched with multiple drivers (i.e., multi-hop matching), and (4)
multiple-rider, multiple-driver arrangements (i.e., pooling in a multi-hop system) [26,34]. Typically,
ride-matching algorithms are developed assuming that rider and driver roles are fixed. However,
a number of studies have considered the more general but complex scenario where a portion of
participants are flexible and can take any role that is assigned to them by the system [23,60].
A review of ride-matching algorithms for different system configurations is shown in Table 1. A
comprehensive review of ride-matching methods can be found in [61].
The simplest form of ride-matching involves matching a single rider with precisely a single
driver, also known as one-to-one matching. [23] formulates the one-to-one ride-matching problem as
a maximum weight bipartite matching problem and solves it using an optimization-based approach
with a rolling horizon strategy. [53] proposes a clustering
heuristic algorithm to solve the one-to-one matching problem. More complicated forms of the ride-
matching problems aim to increase the number of riders on board beyond a single rider to take
advantage of empty seats in a vehicle. [30] proposes a genetic and insertion heuristic algorithm to
solve the ride-matching problem in which a driver can serve more than one rider, also known as the
one-to-many ride-matching problem. [32] formulates the one-to-many ride-matching problem as a
mixed integer-linear programming problem that can be solved by commercial optimization engines.
[41] models the system as a maximum weight bipartite matching problem, where the number of
stops for a rider is limited to two. [47] presents a scalable mathematical formulation of the one-
to-many problem, where within multiple steps vehicles are matched with groups of passengers.
Through numerical experiments, they demonstrate that when re-purposed for ridesharing, only a
fraction of taxis in the New York Taxi dataset can serve almost the entire demand for rides.
The one-to-many matching problem, also known as the pooling problem, arises in systems
similar to ridesharing, such as taxi sharing and pooled variants of ridesourcing. To serve on-
demand requests by shared taxis, [35] first prunes the search space by identifying candidate taxis
that can potentially serve a request, and then uses a scheduling algorithm to find a match that
imposes the least added distance traveled. [40] proposes a greedy randomized adaptive search
procedure to operate a ridesharing/taxi sharing system with on-demand request, and demonstrates
using numerical experiments that pooling could reduce the cost of trips by about 30% compared
to private rides. [64] proposes an artificial bee colony algorithm for serving requests in a pooled
ridesourcing system, where the objective is to maximize the number of served ride requests while
at the same time minimizing travel time and cost ratios. Their numerical experiments demonstrate
that pooling in ridesourcing systems can lead to substantial cost savings with minimal increase in
travel time.
A number of studies allow a rider to transfer between multiple vehicles to complete his/her
trip, giving rise to multi-hop systems. [25] proposes an evolutionary multi-objective route planning
algorithm to solve a many-to-one ride-matching problem in which riders can transfer between
different drivers, but a driver can transport a single passenger. [33] provides a solution to a
ride-matching problem with an arbitrary number of transfers that respect the users’ personal
preferences using a graph searching algorithm. Many-to-many matching problems are the most
complex problems that allow riders to transfer between different drivers and, at the same time,
drivers to have more than one rider on board. [20] models the many-to-many ride-matching
problem and solves relatively small instances of the problem using a branch-and-cut method. [29]
proposes a spatial, temporal, and hierarchical decomposition solution strategy to solve the
many-to-many ride-matching problem, formulated as a mixed-integer program. However, the number
of transfers is limited. [49] models the multi-hop ride-matching problem as a binary program and
proposes a decomposition algorithm to solve this problem. A comprehensive review of ride-matching
problems for different system configurations, including ride-sharing, ridesourcing, taxi-sharing, and
demand-responsive services, can be found in [61,57,58]. For a comprehensive literature review on
the dial-a-ride problem, see [11,16,54].
Study       Dynamic  Configuration                              No. of        Computation  Solution methodology
                                                                participants  time (s)
[23]        Yes      single rider, single driver                4,933         78           Max-weight bipartite matching with rolling horizon strategy
[53]        Yes      single rider, single driver                15,412        80.21        Clustering heuristic algorithm with rolling horizon strategy
[30]        Yes      single driver, multiple riders             744           100          Genetic and insertion heuristic algorithms
[41]        No       single driver, multiple riders             2,849         150          Bipartite matching formulation with meeting points
[25]        Yes      multiple drivers, single rider             250           1.476        Evolutionary multi-objective route planning problem solved by the NSGA-II algorithm
[49]        No       multiple drivers, multiple riders          400           6            An exact decomposition algorithm
[44]        No       all but multiple drivers, multiple riders  0-2000        -            Decentralized (dynamic auction-based multi-agent) optimization algorithms
[39]        Yes      single rider, single driver                4,500         -            A partition-based match-making algorithm
[59]        Yes      single rider, single driver                22,000        36.32        A graph partitioning methodology
This paper  Yes      single driver, multiple riders             20,000        42.31        Approximately-uniform clustering with tours as cluster representative
Table 1: A review of ride-matching methods. (If the study focuses on dynamic ride-matching, the number listed under the No. of participants
column is the number of participants during a one-hour horizon.)
In major metropolitan areas where thousands of ride requests arrive dynamically, centralized
ride-matching methods may fall short in providing high-quality solutions in real-time. Consequently,
several attempts have been made in the literature to develop decentralized optimization schemes.
One approach to decentralize the decision making process is adopting agent-based models. [15]
considers the transportation network as a mobile geo-sensor network of agents that interact locally
by short-range communication and heuristic way-finding strategies. [17] introduces an agent-based
approach where intelligent vehicle agents schedule their routes. [44] sets up a single-shot first-price
Vickrey auction where the virtual driver–passenger agents are matched.
Another approach to decentralized decision making is dividing the problem into multiple smaller
sub-problems that can be solved independently of each other. [39] partitions the road network
into distinct sub-networks that define the search space for ride matches. Recently, [53] proposed a
framework that embeds a network clustering heuristic within an on-demand ride-sharing system. To
address participants whose origins and destinations fall in different clusters, they solve an additional
matching problem. We refer to this method as “point-based clustering”. In another recent work,
[59] proposed a trip-based graph partitioning method for dynamic ridesharing systems. In contrast
to [53], they used complete trips, rather than trip ends, to form clusters, leading to partitions that
may be geographically overlapping. Furthermore, they form clusters that are approximately uniform
in size to reduce solution time. In Section 7, we use three benchmark methods to evaluate the
performance of our proposed methodology: point-based clustering, balanced point-based clustering
in which uniformity constraints are imposed when forming clusters in point-based clustering, and
trip-based clustering.
2.3 Our Contributions
In this paper, we propose a framework to solve the one-to-many ride-matching problem that arises
in ride-sharing and ridesourcing systems in a distributed fashion. Our proposed method forms
approximately-uniform clusters of trips, and assigns drivers to trip clusters proportional to the
cluster sizes. Different from the existing literature that uses points or trips as cluster centers
during partitioning, we use vehicle tours as cluster representatives to capture the fact that ride
requests can be pooled and served by a single vehicle even when they do not share the same
origin, destination, or time window. This allows for obtaining higher-quality solutions for one-to-
many ridesharing systems or ridesourcing systems with pooling. The proposed method decomposes
the original problem into smaller sub-problems that are approximately uniform in size within a
threshold of ε. The value of ε captures the trade-off between the complexity of solving the
sub-problems and the performance lost due to partitioning: while setting ε = 0 ensures that all
sub-problems are uniform in size and therefore the solution time is minimized, obtaining such a
uniform partitioning requires ignoring more potential matches, thereby degrading the solution
quality. Finally, we compare the
results of our proposed methodology with three benchmark methods, namely point-based, balanced
point-based, and trip-based partitioning, as well as the optimal solution.
As such, the contributions of this paper can be summarized as follows. This is the first study to
propose an approximately-uniform clustering method with tours as cluster representatives, which,
as demonstrated in the numerical experiments section, outperforms existing clustering approaches.
We develop an iterative procedure based on Lloyd’s algorithm [9] to find approximately-uniform
clusters, and show that this clustering approach has favorable properties, such as converging
monotonically to a locally optimal solution in a finite number of steps.
3 Problem Statement
Consider a dynamic ride-sharing system that matches drivers with riders in a region over time. Let
us divide the study area into a set of stations S = {s_1, s_2, ..., s_m}, where drivers and riders start
or end their trips. Furthermore, we divide the time horizon, e.g., an hour, into a series of short
time intervals, e.g., 1 min. After this discretization in time and space, a travel time matrix T can
be used to retrieve the shortest path travel time between any pair of stations.

Sets
  S = {s}               set of stations where drivers and riders start or finish their trips
  D = {1, ..., |D|}     set of drivers
  R = {1, ..., |R|}     set of riders/trips
  P = R ∪ D             set of participants p
  Rc                    set of riders whose trips do not lie on tours
  K = {1, ..., |K|}     set of clusters
  T                     set of time intervals t
  L                     set of links l = (t_i, s_i, t_j, s_j) ∈ T × S × T × S
  L_p                   set of links that are accessible by participant p
  L_rd                  set of links on which driver d can serve rider r, where L_rd = L_r ∩ L_d
  ℒ                     set of links where ℓ(i, j) ∈ ℒ indicates rider j can be served after rider i
Indices
  r, i, j               indices for rider/trip
  d                     index for driver/vehicle
  p                     index for participant (driver or rider)
  k                     index for cluster/tour
  ℓ = (i, j)            link ℓ exists if rider r_j can be served following rider r_i
Parameters
  s_p^O                 origin station of participant p
  s_p^D                 destination station of participant p
  t_p^ED                earliest departure time of participant p
  t_p^LA                latest arrival time of participant p
  cap_d                 the capacity of (the vehicle of) driver d (i.e., the maximum number of riders
                        allowed on board at one time)
Functions
  T(s_i, s_j)           shortest-path travel time between station i and station j
  c(r, k)               the primary cost between rider trip r and tour k
  d(r, k)               the secondary cost between rider trip r and tour k
  γ(d, k)               the cost between driver d and tour k
Decision Variables
  ω_rd                  binary decision variable that holds the value 1 if rider r is matched with
                        driver d, and the value 0 otherwise
  x_d^l                 binary decision variable that holds the value 1 if driver d travels on link l, and
                        the value 0 otherwise
  y_rd^l                binary decision variable that holds the value 1 if rider r is served by driver d on
                        link l, and the value 0 otherwise
  v_ij                  binary decision variable that holds the value 1 if link ℓ_ij is selected as a part
                        of the tour, and the value 0 otherwise
  f_rk                  binary decision variable that holds the value 1 if trip r is assigned to cluster k, and
                        the value 0 otherwise
  z_dk                  binary decision variable that holds the value 1 if driver d is assigned to
                        cluster k, and the value 0 otherwise
Table 2: Table of Notation
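One way to build such a travel time matrix is sketched below, assuming that direct link travel times between stations are known in minutes. The Floyd-Warshall recursion, the rounding up to whole time intervals, and the names `travel_time_matrix` and `link_minutes` are illustrative choices, not details given in the paper.

```python
import math

def travel_time_matrix(n_stations, link_minutes, interval_min=1.0):
    """All-pairs shortest travel times, expressed in whole time intervals.

    link_minutes maps a directed station pair (i, j) to its travel time in minutes.
    Unreachable pairs keep the value math.inf.
    """
    inf = math.inf
    T = [[0.0 if i == j else inf for j in range(n_stations)] for i in range(n_stations)]
    for (i, j), minutes in link_minutes.items():
        T[i][j] = min(T[i][j], minutes)
    # Floyd-Warshall: allow each station m in turn as an intermediate stop.
    for m in range(n_stations):
        for i in range(n_stations):
            for j in range(n_stations):
                if T[i][m] + T[m][j] < T[i][j]:
                    T[i][j] = T[i][m] + T[m][j]
    # Round up so that a trip never appears shorter than it is after discretization.
    return [[t if t == inf else math.ceil(t / interval_min) for t in row] for row in T]

links = {(0, 1): 4.2, (1, 0): 4.2, (1, 2): 3.1, (2, 1): 3.1, (0, 2): 9.0, (2, 0): 9.0}
T = travel_time_matrix(3, links)
print(T[0][2])  # 8: routing via station 1 (7.3 min) beats the direct 9.0 min link
```

Rounding up is a conservative choice under the assumed 1-minute interval; rounding to the nearest interval would be an equally plausible convention.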
Let us consider a set of drivers D and a set of riders R in this system, and introduce set
P = R ∪ D to include all participants. A rider r ∈ R registers her trip, including her origin station
s_r^O, destination station s_r^D, her earliest departure time t_r^ED, and her latest arrival time t_r^LA. A
driver d ∈ D registers her origin station s_d^O and her earliest departure time t_d^ED. After serving
a rider, the driver’s time and location will be updated to the rider’s drop-off time and location,
respectively. Without loss of generality, here we assume that drivers are available for the entirety
of the study time horizon.
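The registration data and the driver update described above can be sketched as follows. The record types, the field names, and the `serve` helper are hypothetical, and the sketch covers only the simplest case of a driver serving one rider directly, without pooling.

```python
from dataclasses import dataclass, replace
from typing import Optional

@dataclass(frozen=True)
class Rider:
    s_o: int   # origin station
    s_d: int   # destination station
    t_ed: int  # earliest departure time (time interval index)
    t_la: int  # latest arrival time (time interval index)

@dataclass(frozen=True)
class Driver:
    s_o: int   # current origin station
    t_ed: int  # earliest departure time from that station

def serve(driver: Driver, rider: Rider, T) -> Optional[Driver]:
    """Serve one rider directly; return the updated driver, or None if infeasible."""
    # The driver cannot pick up before arriving, nor before the rider is ready.
    pickup = max(driver.t_ed + T[driver.s_o][rider.s_o], rider.t_ed)
    dropoff = pickup + T[rider.s_o][rider.s_d]
    if dropoff > rider.t_la:
        return None
    # The driver's time and location move to the rider's drop-off time and station.
    return replace(driver, s_o=rider.s_d, t_ed=dropoff)

T = [[0, 5, 8], [5, 0, 4], [8, 4, 0]]  # illustrative travel times in intervals
updated = serve(Driver(s_o=0, t_ed=0), Rider(s_o=1, s_d=2, t_ed=3, t_la=12), T)
print(updated)  # Driver(s_o=2, t_ed=9)
```

A rider whose latest arrival time cannot be met yields `None`, leaving the driver record unchanged.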
To accommodate the inherent uncertainty present in a dynamic ride-sharing system, we adopt a
rolling horizon strategy, in which the system operator solves the ride-matching problem periodically,
at evenly-spaced points in time to which we refer as re-optimization times. The time period between
two consecutive re-optimization times is a re-optimization period.
At each re-optimization time n, we formulate a ride-matching problem that consists of all
announced trips that have not yet expired or been finalized. An announced trip is considered expired if
its latest departure time, i.e., t_r^LA − T(s_r^O, s_r^D), occurs before the end of the current re-optimization
period. In Fig. 1, for example, all trips whose announcement times are before re-optimization
time n and whose latest departure times are after re-optimization time n + 1 will be considered in the
ride-matching problem in the specified re-optimization period. (Note that we require the latest
departure time of a trip to be later than n + 1 to account for solution time.) An announced trip
is considered finalized if it has been previously matched in the ride-matching problem. Drivers
who were matched previously can always be part of the new ride-matching problem after accounting
for their previous assignments, i.e., the new origin station and earliest departure time of a driver
who is transporting a passenger will be set to the drop-off location and time of their onboard
passenger, respectively. The objective of the ride-matching problem is to maximize the number
of served riders. Since the dynamic ridesharing system solves ride-matching problems that are
structurally similar across re-optimization periods, in the rest of this paper we focus our discussion on
the optimization problem in a single re-optimization period.
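The trip-selection rule of the rolling horizon strategy can be illustrated with a short filter. The `Trip` record, the function names, and the handling of boundary cases (an announcement exactly at the re-optimization time is included; a latest departure time exactly at the end of the period is excluded) are assumptions of this sketch.

```python
from dataclasses import dataclass

@dataclass
class Trip:
    announced: float       # announcement time
    s_o: int               # origin station
    s_d: int               # destination station
    t_la: float            # latest arrival time
    matched: bool = False  # True once the trip has been finalized

def latest_departure(trip, T):
    """Latest departure time: latest arrival minus the direct travel time."""
    return trip.t_la - T[trip.s_o][trip.s_d]

def active_trips(trips, T, t_now, period):
    """Trips to include in the problem solved at re-optimization time t_now."""
    t_next = t_now + period  # end of the current re-optimization period
    return [
        tr for tr in trips
        if not tr.matched                        # not finalized
        and tr.announced <= t_now                # already announced
        and latest_departure(tr, T) > t_next     # not expired
    ]

T = [[0, 10], [10, 0]]
trips = [
    Trip(announced=18, s_o=0, s_d=1, t_la=40),                # kept: can leave as late as 30
    Trip(announced=19, s_o=0, s_d=1, t_la=33),                # expired: must leave by 23
    Trip(announced=22, s_o=1, s_d=0, t_la=60),                # not yet announced
    Trip(announced=10, s_o=1, s_d=0, t_la=50, matched=True),  # finalized
]
print(len(active_trips(trips, T, t_now=20, period=5)))  # 1
```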
Figure 1: The rolling horizon implementation
4 The Ride-matching Problem
In this paper, we consider a one-to-many ride-matching problem in which a driver can serve multiple
riders. This ride-matching problem may arise in ride-sharing or ridesourcing systems with pooling
and can be formulated as in model (10) in Appendix I. This optimization problem is an NP-hard
problem, and cannot be readily adopted in a dynamic system. Hence, in this paper we introduce
a solution methodology that decomposes the original matching problem into multiple smaller
sub-problems that will be solved independently of each other, using model (10).
5 Illustrative Example
We use a small instance of a ridesourcing problem to clearly demonstrate each step of our proposed
method for a single static optimization problem. This example includes 4 drivers and 20 riders.
We use data from the New York City Taxi dataset [1] to construct this example. Due to the small
size of the problem, we only consider a single optimization period. Table 8 in Appendix II provides
information on the 24 participants in this example, including their roles (rider/driver), their origin
and destination stations, and their earliest departure and latest arrival times. Tables 9 and 10 in
Appendix III provide shortest path travel times between all pick-up and drop-off stations in this
example. We solved the ride-matching problem in model (10) for this small example using the
CPLEX solver in AMPL. The optimal matching results are demonstrated in Table 3. In the rest
of the paper, we apply the algorithms in each section to this example, and eventually compare the
outcome of the proposed methodology with that of the optimal solution.
Driver  Served riders     Vehicle itinerary
d1      r1, r6, r9, r20   (19:00, 162) (19:15, 103) (19:22, 97) (19:23, 97) (19:30, 56) (19:47, 50) (19:54, 121) (19:59, 125) (20:06, 128)
d2      r4, r11, r18      (19:12, 95) (19:25, 52) (19:35, 124) (19:36, 124) (19:43, 76) (19:54, 58) (19:56, 58) (20:04, 159)
d3      r2, r7, r14       (19:00, 161) (19:13, 80) (19:25, 33) (19:33, 60) (19:45, 78) (19:47, 78) (19:58, 161)
d4      r3, r10, r16      (19:07, 80) (19:16, 129) (19:32, 98) (19:34, 98) (19:40, 97) (19:50, 98) (19:52, 98) (20:00, 130)
Table 3: Optimal solution of the illustrative example. The itinerary of each driver is indicated as a sequence
of tuples. A tuple (t, s) indicates that the driver visits station s at time interval t.
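As a quick consistency check, the sketch below transcribes Table 3 as data, counts the riders it lists, and confirms that every itinerary is in chronological order. The data layout and helper names are illustrative; only the values come from the table.

```python
# Table 3 transcribed as data: driver -> (served riders, [(time, station), ...]).
solution = {
    "d1": (["r1", "r6", "r9", "r20"],
           [("19:00", 162), ("19:15", 103), ("19:22", 97), ("19:23", 97), ("19:30", 56),
            ("19:47", 50), ("19:54", 121), ("19:59", 125), ("20:06", 128)]),
    "d2": (["r4", "r11", "r18"],
           [("19:12", 95), ("19:25", 52), ("19:35", 124), ("19:36", 124), ("19:43", 76),
            ("19:54", 58), ("19:56", 58), ("20:04", 159)]),
    "d3": (["r2", "r7", "r14"],
           [("19:00", 161), ("19:13", 80), ("19:25", 33), ("19:33", 60), ("19:45", 78),
            ("19:47", 78), ("19:58", 161)]),
    "d4": (["r3", "r10", "r16"],
           [("19:07", 80), ("19:16", 129), ("19:32", 98), ("19:34", 98), ("19:40", 97),
            ("19:50", 98), ("19:52", 98), ("20:00", 130)]),
}

def to_minutes(hhmm):
    """Convert an 'HH:MM' string to minutes after midnight."""
    hours, minutes = hhmm.split(":")
    return 60 * int(hours) + int(minutes)

served = [r for riders, _ in solution.values() for r in riders]
chronological = all(
    to_minutes(a[0]) < to_minutes(b[0])
    for _, stops in solution.values()
    for a, b in zip(stops, stops[1:])
)
print(len(served), len(set(served)))  # 13 13
print(chronological)                  # True
```

Thirteen distinct riders appear in the table, so 13 of the 20 riders in the example are listed as served, each by exactly one driver.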
6 Methodology
The matching problem in (10) can be solved quickly for the instance presented in Section 5 using
commercial solvers, because of the problem’s relatively small size. However, solving the optimization
problem in model (10) for larger instances of the ride-matching problem can be computationally
prohibitive for real-time implementations. As such, in this paper, we propose a framework to
produce high-quality solutions in near real-time, depicted in Figure 2.
This framework includes a clustering algorithm, which we call ε-uniform tour-based clustering.
This clustering algorithm includes a tour-forming problem, in which a tour, i.e., sequence of trips,
is formed in each cluster to represent the trips in the cluster, and an assignment step, in which trips
are assigned to clusters based on their proximity to the tours representing the clusters and subject
to a uniformity constraint. By iteratively solving these two problems, the clustering algorithm
groups trips into multiple clusters that can be optimized independently of each other. After the
clusters of trips are formed, in section 6.2 we present an optimization-based algorithm to assign
drivers to clusters of trips. Finally, the matching problem presented in Appendix I can be solved
independently for each cluster.
6.1 ε-uniform tour-based Clustering
The proposed ε-uniform tour-based clustering algorithm iterates between the following two steps
until convergence: (1) partitioning trip requests into approximately-uniform clusters so as to
minimize the intra-cluster distances, and (2) finding the best representative for each cluster. In
Sections 6.1.1, 6.1.2, and 6.1.3, we describe different components of the proposed clustering approach
and demonstrate each component using the illustrative example presented in Section 5. In Section
6.1.4, we combine the components to present the clustering method. Section 6.1.5 describes the
properties of the clustering approach.
[Flowchart: initialization (split trips into 𝒦 clusters randomly); tour-based ε-uniform clustering
(Algorithm 2, Section 6.1), which repeats three steps (solve the tour-forming problem for each cluster,
Section 6.1.1; obtain trip-tour costs, Section 6.1.2; assign trips to clusters under uniformity
constraints, Section 6.1.3) until the change in total cost between two consecutive iterations is smaller
than a threshold, producing ε-uniform clusters of trips; driver assignment (Algorithm 3, Section 6.2),
which assigns the driver set to clusters; and solving the matching problem for each cluster to obtain
vehicle and rider itineraries.]
Figure 2: The general flow of the proposed framework
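The alternation between a representative step and an assignment step has the same shape as Lloyd's algorithm, and can be sketched generically as below. The function `alternate` and its three callables are hypothetical placeholders; in particular, the toy instantiation uses cluster means and nearest-mean assignment on numbers, with no uniformity constraint, and is not the tour-forming or trip-assignment model of this paper.

```python
import random

def alternate(items, k, represent, assign, total_cost, tol=1e-9, max_iter=100, seed=0):
    """Alternate between a representative step and an assignment step.

    represent(members) -> representative of one cluster
    assign(items, reps) -> list of cluster labels, one per item
    total_cost(items, labels, reps) -> float
    All three callables are placeholders supplied by the caller.
    """
    rng = random.Random(seed)
    labels = [i % k for i in range(len(items))]  # equal-sized clusters ...
    rng.shuffle(labels)                          # ... with random membership
    prev, reps = float("inf"), []
    for _ in range(max_iter):
        reps = [represent([x for x, c in zip(items, labels) if c == j]) for j in range(k)]
        labels = assign(items, reps)
        cost = total_cost(items, labels, reps)
        if abs(prev - cost) < tol:  # stop when the total cost stops changing
            break
        prev = cost
    return labels, reps

# Toy stand-ins (assumptions): means as representatives, nearest-mean assignment.
def mean(members):
    return sum(members) / len(members)

def nearest(items, reps):
    return [min(range(len(reps)), key=lambda j: abs(x - reps[j])) for x in items]

def sse(items, labels, reps):
    return sum((x - reps[c]) ** 2 for x, c in zip(items, labels))

points = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
labels, reps = alternate(points, 2, mean, nearest, sse)
clusters = sorted(sorted(x for x, c in zip(points, labels) if c == j) for j in range(2))
print(clusters)  # [[0.0, 1.0, 2.0], [10.0, 11.0, 12.0]]
```

The toy `represent` step would fail on an empty cluster; a fuller implementation would need to handle that case, which a uniformity constraint on cluster sizes would rule out.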
6.1.1 The Tour Forming Problem
Let us define a cluster as a set of trips, and a vehicle tour to represent the members of a cluster.
A tour is a sequence of stations to be visited by a vehicle. We select a vehicle tour as a cluster
representative since the underlying ride-matching problem is a one-to-many problem, indicating
the importance of the sequence in which trips are served in obtaining a high quality solution. The
tour-forming problem seeks to find the best vehicle tour for a cluster of trips. This problem can be
represented by a graph G = (R, L), where R is the set of ride requests and L is the link set. A link
$\ell_{ij} \in L$ between riders i and j exists in graph G if rider j can always be served following rider i.
This condition can be mathematically expressed in inequality (1). This inequality ensures that a
driver who drops off rider i at her latest arrival time still has enough time to transport rider j to
her destination within this rider’s requested time window. Furthermore, we introduce two nodes,
O and D, such that there is an outgoing link from node O to all other nodes and an incoming link
from all nodes to node D.
\[
t^{LA}_i + T(s^D_i, s^O_j) \le t^{LA}_j - T(s^O_j, s^D_j) \tag{1}
\]
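The link condition in inequality (1) can be checked pair by pair. The following is a minimal sketch, assuming each trip is described by its origin station, destination station, and latest arrival time, and that travel times are stored in a dictionary keyed by station pairs. The names `Trip`, `can_follow`, and `build_links` are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trip:
    origin: str            # pick-up station s^O
    destination: str       # drop-off station s^D
    latest_arrival: float  # latest arrival time t^LA (assumed to be in minutes)


def can_follow(i, j, travel_time):
    """Inequality (1): rider j can always be served after rider i."""
    # Earliest moment the vehicle is guaranteed to reach j's origin.
    reach_j_origin = i.latest_arrival + travel_time[(i.destination, j.origin)]
    # Latest moment j can be picked up and still arrive on time.
    latest_pickup_j = j.latest_arrival - travel_time[(j.origin, j.destination)]
    return reach_j_origin <= latest_pickup_j


def build_links(trips, travel_time):
    """Return the link set L as ordered pairs of trip identifiers."""
    return {
        (a, b)
        for a, trip_a in trips.items()
        for b, trip_b in trips.items()
        if a != b and can_follow(trip_a, trip_b, travel_time)
    }
```

Checking all ordered pairs requires a number of evaluations that grows quadratically with the number of trips in a cluster, which is one reason smaller clusters are cheaper to process.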
Under this setting, we seek to find the longest tour, i.e., the tour that contains the greatest
number of trips. This tour-finding problem can be formulated as a longest path problem, as
presented in model (2). The decision variable $v_{ij}$ is a binary variable that takes the value 1 if link
$\ell_{ij}$ is selected as a part of the tour, and the value 0 otherwise. The objective function in (2a)
maximizes the number of trips that lie on the selected tour. Constraints (2b) and (2c) ensure that
the tour begins at O and ends at D, respectively. Constraint (2d) is the flow balance constraint.
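\begin{align}
\max\ & w = \sum_{i,j \in R} v_{ij} \tag{2a}\\
\text{s.t.}\ & \sum_{j \in R} v_{Oj} = 1 \tag{2b}\\
& \sum_{i \in R} v_{iD} = 1 \tag{2c}\\
& \sum_{i \in R} v_{ir} - \sum_{i \in R} v_{ri} = 0 \quad \forall r \in R \tag{2d}\\
& v_{ij} \in \{0,1\} \quad \forall i, j \in R \tag{2e}
\end{align}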
To clearly demonstrate the tour-forming problem, we apply it to the illustrative example in
section 5. We randomly split the 20 rider trips in this problem into two clusters of size 10, and
solve the tour-forming problem on the graph G = (R, L) for each cluster, as demonstrated in
Figure 3. Each graph has 12 nodes, including 10 nodes associated with riders, node O, and node
D. The dashed arrows in this figure represent the link set L, obtained based on inequality (1).
The longest path problem in model (2) can be solved efficiently using polynomial time algorithms,
such as the Network Simplex algorithm. After solving the longest path problem, we obtain the
following tours (i.e., sequences of stations): [80 129 33 60 124 156] for cluster 1, and
[161 80 98 97 78 161] for cluster 2. These tours are demonstrated in Figure 3.
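As an illustration of the tour-forming step, the longest path can also be found with a simple dynamic program when the link set contains no cycles. This is a sketch under that acyclicity assumption and is not the network-simplex approach mentioned above. The function name `longest_tour` is hypothetical.

```python
from functools import lru_cache


def longest_tour(nodes, links):
    """Return the chain of rider nodes on the longest O-to-D path.

    Assumes the link set is acyclic (an assumption of this sketch);
    with cycles the recursion below would not terminate.
    """
    successors = {n: [] for n in nodes}
    for i, j in links:
        successors[i].append(j)

    @lru_cache(maxsize=None)
    def best_from(n):
        # Longest chain of riders that starts at rider n.
        best = (n,)
        for nxt in successors[n]:
            candidate = (n,) + best_from(nxt)
            if len(candidate) > len(best):
                best = candidate
        return best

    # Node O links to every rider and every rider links to node D,
    # so the tour is the longest chain over all possible first riders.
    return list(max((best_from(n) for n in nodes), key=len, default=()))
```

Memoizing `best_from` means each link is examined once, so the running time is linear in the number of nodes and links.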
Figure 3: Tours associated with the two clusters in the illustrative example. (a) Optimal tour in graph G for cluster 1; (b) Optimal tour in graph G for cluster 2.
6.1.2 The Cost Function
Once tours are formed to represent each cluster, we need to define a measure of distance between a
trip and a tour. This measure will be later used to assign trips to tours so as to minimize the sum
of the intra-cluster distances under uniformity constraints. Let us define the cost function c, whose
value c(r, k) denotes the distance between rider r’s trip and tour k, as presented in Equation (3).
We denote this cost as the primary cost, and define it as follows. If tour k contains trip r, then
the value of the cost function is zero; otherwise, if trip r is not on tour k, then the value of the cost
function is set to M, a large positive number. Note that this cost function is selected to guarantee
the convergence of the clustering algorithm, as will be discussed later in section 6.1.5.
\[
c(r, k) =
\begin{cases}
0 & \text{if trip } r \text{ is on tour } k \\
M & \text{if trip } r \text{ is not on tour } k
\end{cases}
\tag{3}
\]
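The primary cost in Equation (3) amounts to a membership test. The sketch below builds the full trip-tour cost table, assuming each tour is given as the set of trip identifiers it contains. The constant `BIG_M` and the function name `primary_costs` are illustrative choices and do not come from the paper.

```python
BIG_M = 10**6  # stand-in for the large positive number M (value is an assumption)


def primary_costs(trip_ids, tours, big_m=BIG_M):
    """Return {(r, k): c(r, k)} following Equation (3).

    `tours` maps a tour identifier k to the set of trips that lie on it.
    """
    return {
        (r, k): 0 if r in members else big_m
        for r in trip_ids
        for k, members in tours.items()
    }
```

With two tours that each contain three of twenty trips, six entries of the table are zero and all remaining entries equal `BIG_M`, which is why a tie-breaking rule is needed for the trips that lie on no tour.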
When we assign trips to tours based on this primary cost, there might be many trips that do
not readily lie on any tour. In this case, a tie-breaking rule is needed. Here, we use a secondary
objective function to break the ties, to which we refer as the secondary cost. Since tours and trips
are both sequences of stations, we use an algorithm inspired by dynamic time warping (DTW)
to measure the cost of assigning a trip to a tour. DTW is a method developed for measuring the
similarity between two sequences by finding an optimal match between their elements [19]. Consider
a tour k that has a sequence of m stations $K_k = [s_k(1), s_k(2), ..., s_k(m)]$, i.e., m/2 sequential trips.
Let us represent a trip r as a sequence of stations, denoted by $R_r = [s_r(1) = s^O_r, s_r(2) = s^D_r]$. We
measure the distance between each pair of stations i and j by the shortest-path travel time between
them, denoted by T(i, j).
The distance between a trip and a tour can be calculated as the smallest total distance between
the origin and destination of the trip (i.e., $s_r(1)$ and $s_r(2)$) from any station on the tour, under
the condition that the station on the tour that is matched to the trip destination should appear
after the station matched to the trip origin. Mathematically, this condition can be specified as
$q + 1 \le s_r(2) \le m$ when $s_r(1) = q$. (Note that here we use equality to indicate the matching of
two stations.) It is easy to see that one can enumerate all the possible matchings between the two
sequences $[s_r(1), s_r(2)]$ and $[s_k(1), s_k(2), ..., s_k(m)]$ to find the matching that provides the smallest
distance. Algorithm 1 lays out the details of measuring this distance without having to enumerate
all the possibilities, rendering this step more computationally efficient. This algorithm starts by
defining a 2 × m matrix D, corresponding to the size of the two sequences. The first row of this
matrix is the shortest path travel time between the trip origin and the stations on the tour. In the
second row of this matrix, all distances are set to infinity, except for the distance between the trip
destination and the last station on the tour. The distances between other stations and the trip
destination are calculated recursively using Equation (5). Finally, Equation (6) evaluates the total
distances between the trip ends and their best matched stations on the tour, and finds the smallest
of these distances as the secondary cost.
Algorithm 1: Obtaining the dissimilarity between a trip and a tour
Input: A trip $[s_r(1), s_r(2)]$
Input: A tour $[s_k(1), s_k(2), ..., s_k(m)]$
Input: Shortest-path travel time matrix $T_t$
Output: The distance d(r, k) between trip r and tour k
Step 1 Assume an initial distance matrix D of size 2 × m as follows
\[
D =
\begin{bmatrix}
T(s_r(1), s_k(1)) & \cdots & T(s_r(1), s_k(m-1)) & T(s_r(1), s_k(m)) \\
\infty & \cdots & \infty & T(s_r(2), s_k(m))
\end{bmatrix}
\tag{4}
\]
Step 2 Update the distance matrix D
for q = m − 1, ..., 1 do
\[
D[2, q] = \min\{\, T(s_r(2), s_k(q)),\ D[2, q+1] \,\} \tag{5}
\]
Step 3 Calculate the distance between trip r and tour k
\[
d(r, k) = \min_{q} \{\, D[1, q] + D[2, q] \,\} \tag{6}
\]
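The three steps of Algorithm 1 translate directly into a short routine. The sketch below assumes that travel times are stored in a dictionary keyed by ordered station pairs; the function name `trip_tour_distance` and this data layout are illustrative choices.

```python
import math


def trip_tour_distance(trip, tour, travel_time):
    """Secondary cost d(r, k) between a trip and a tour (Algorithm 1).

    trip: (origin_station, destination_station)
    tour: list of stations in visiting order
    travel_time: dict mapping (station_a, station_b) to travel time
    """
    if not tour:
        raise ValueError("tour must contain at least one station")
    origin, destination = trip
    m = len(tour)

    # Step 1: first row holds origin-to-station times; second row is
    # infinite except for the last station, as in Equation (4).
    row_origin = [travel_time[(origin, s)] for s in tour]
    row_dest = [math.inf] * m
    row_dest[m - 1] = travel_time[(destination, tour[m - 1])]

    # Step 2: backward recursion of Equation (5); row_dest[q] becomes the
    # smallest destination distance to any station at position q or later.
    for q in range(m - 2, -1, -1):
        row_dest[q] = min(travel_time[(destination, tour[q])], row_dest[q + 1])

    # Step 3: Equation (6).
    return min(a + b for a, b in zip(row_origin, row_dest))
```

The backward pass visits each tour station once, so the cost is linear in m rather than quadratic as in full enumeration. Because the destination can only be matched at or after the position matched to the origin, a trip that runs against the direction of the tour receives a larger distance than one that runs along it.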
Using these two cost functions, we calculate the primary and secondary costs, c(r, k) and d(r, k),
between trips and tours. In our illustrative example, the primary costs between trips R3, R7, and
R15 and tour K1, and between trips R2, R10, and R14 and tour K2, are zero. The primary costs
between other trips and the two tours are M. The primary and secondary costs of all trips in the
illustrative example are displayed in Table 11 in Appendix IV.
6.1.3 The Two-Step Trip Assignment Process
In this section, we discuss assignment of trips to clusters under a uniformity constraint. The output
of this step is a revised set of clusters. In a conventional clustering problem, objects are allocated
to clusters so as to minimize the sum of intra-cluster costs. Here, our end goal is not to cluster trips,
but to solve the optimization problem in model (10) for each cluster in near real-time. Since the
computational complexity of the optimization problem in each cluster depends on the number of
trips in that cluster, clustering instances in which cluster sizes are highly non-uniform are not of
interest, since larger clusters would create a computational bottleneck. As such, we strive to form
clusters that are approximately uniform in their number of trips.
In this paper, we consider a two-step assignment process. The first assignment problem seeks
to allocate trips to clusters so as to minimize the total primary cost, as indicated in the objective
function (7a). Here, the decision variable $f_{rk}$ takes the value 1 if trip r is assigned to cluster k, and
the value 0 otherwise. Constraint (7b) ensures that each trip is allocated to a single cluster. The
assignment problem in model (7) has a trivial solution wherein trips that lie on a tour, i.e., trips
for which c(r, k) = 0, are allocated to the cluster represented by that tour.
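\begin{align}
\min\ & z_p = \sum_{r \in R} \sum_{k \in K} c(r, k) f_{rk} \tag{7a}\\
\text{s.t.}\ & \sum_{k \in K} f_{rk} = 1 \quad \forall r \in R \tag{7b}\\
& f_{rk} \in \{0,1\} \quad \forall r \in R,\ k \in K \tag{7c}
\end{align}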
After this initial assignment, we solve a second assignment problem to allocate trips that do
not readily lie on a tour. This allocation problem can be formulated as an ε-uniform assignment
problem in model (8). Let us define the set $R^c$ to include ride requests that do not readily lie
on a tour, i.e., trips for which c(r, k) = M. The objective function in (8a) minimizes the sum of the
secondary costs between trips and their associated cluster representatives. Constraint (8b) ensures
that each trip is assigned to a single cluster. Constraint (8c) ensures that the difference between
the number of trips in set $R^c$ in any two clusters is at most $\varepsilon |R^c|$. The parameter ε is an imbalance
parameter, whose value affects the size of clusters. Setting ε = 0 ensures that all clusters have the exact same
size, while ε = K − 1 imposes no constraint on cluster sizes. Note that the ε-uniform assignment
problem was first proposed in [59] for a peer-to-peer ridesharing system. Model (8) is based on the
work in [59], customized for a ridesourcing system with one-to-many matching.
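\begin{align}
\min\ & z_s = \sum_{r \in R^c} \sum_{k \in K} d(r, k) f_{rk} \tag{8a}\\
\text{s.t.}\ & \sum_{k \in K} f_{rk} = 1 \quad \forall r \in R^c \tag{8b}\\
& \sum_{r \in R^c} f_{rk} \le \frac{|R^c|}{K}(1 + \varepsilon) \quad \forall k \in K \tag{8c}\\
& f_{rk} \in \{0,1\} \quad \forall r \in R^c,\ k \in K \tag{8d}
\end{align}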
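To make the role of the imbalance parameter concrete, the ε-uniform assignment can be solved by exhaustive enumeration on a very small instance. The sketch below is only meant to illustrate constraints (8b) and (8c); it is not a practical solution method, since the number of candidate assignments grows exponentially with the number of trips. The function name `epsilon_uniform_assign` and the data layout are assumptions.

```python
from itertools import product
import math


def epsilon_uniform_assign(secondary_cost, trips, clusters, eps):
    """Brute-force solution of the epsilon-uniform assignment on tiny inputs.

    secondary_cost: dict mapping (trip, cluster) to d(r, k)
    trips: trips that lie on no tour
    clusters: list of cluster identifiers
    eps: imbalance parameter
    """
    # Right-hand side of the cluster-size constraint (8c).
    capacity = len(trips) / len(clusters) * (1 + eps)
    best_cost, best_assignment = math.inf, None
    # Each tuple assigns exactly one cluster to every trip, so (8b) holds.
    for choice in product(clusters, repeat=len(trips)):
        if any(choice.count(k) > capacity for k in clusters):
            continue
        cost = sum(secondary_cost[(r, k)] for r, k in zip(trips, choice))
        if cost < best_cost:
            best_cost, best_assignment = cost, dict(zip(trips, choice))
    return best_assignment, best_cost
```

With four trips and two clusters, ε = 0 forces two trips into each cluster even if all four are closest to the same tour, whereas ε = K − 1 = 1 raises the capacity to four and lets every trip follow its smallest secondary cost.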
Figure 4 shows the outcome of this two-step assignment process for the illustrative example. In
this figure, trips that lie on tours all have primary cost of zero. Other trips are allocated to tours
so as to minimize the secondary cost under the uniformity constraint for cluster sizes.
Figure 4: Assignment in iteration 1 of the illustrative example. The trips on tours have primary cost of
zero. Other trips are allocated to clusters based on their secondary costs, as outlined in Table 11 in Appendix
IV.
6.1.4 The ε-uniform tour-based Clustering Algorithm
In the previous sections, we outlined different components of the ε-uniform tour-based clustering
algorithm, i.e., the tour-forming problem, the assignment problems, and the cost measures to
quantify the distance between a tour and a trip.
The ε-uniform tour-based clustering algorithm is described in Algorithm 2. The inputs to
this algorithm are the set of trips, the number of clusters, the uniformity parameter, and the
maximum number of iterations. The algorithm starts (step 0) by randomly assigning R trips into K
clusters. In step 1, a tour is formed using the set of trips in each cluster. In step 2, distances between
trips and tours are obtained, allowing for computing the new total intra-cluster distances. In the
assignment step (step 3), first, the assignment solution based on the primary costs is obtained.
Next, the ε-uniform assignment problem is solved, where trips that do not readily lie on tours are
assigned to clusters based on their proximity to tours as well as the uniformity constraint applied
to cluster sizes. The objective function value corresponding to the optimal assignment based on
the primary costs provides the total sum of intra-cluster distances. Step 4 assesses the termination
criteria. If the total intra-cluster distance obtained from assignments in two consecutive iterations
(obtained in steps 2 and 3) remains the same or the maximum number of iterations is reached,
then the algorithm terminates, providing a locally optimal solution. Otherwise, the iteration counter
α is increased by 1, and steps 1 through 3 are repeated. Since this process provides a locally optimal
solution, we repeat it for a total of itr times, and report the final set of clusters, and their associated
tours, that provide the lowest overall intra-cluster distance.
Figure 5: Final assignment for the illustrative example. (a) Tours and assignments; (b) Convergence of Algorithm 2.
Algorithm 2: The ε-uniform tour-based clustering algorithm
Input: Set of trips, R
Input: Number of clusters, K
Input: Uniformity parameter, ε
Input: Max number of iterations, $\alpha_{max}$
Output: K clusters of trips and their corresponding tours, T
for a = 1, ..., itr do
    Step 0 Initialization
        Obtain $f^*_{rk}(0)$ by randomly dividing R trips into K clusters
        $\alpha \leftarrow 1$
    Step 1 Tour formation
        Choose a random sample of trips in each cluster (where clusters are determined by
        $f^*_{rk}(\alpha - 1)$) to form new tours by solving the optimization problem in model (2) for
        each cluster. Let w be the optimal value of the objective function.
        $v^*_{ij}(\alpha) \leftarrow \arg\max w$
    Step 2 Cost update
        Calculate the new primary and secondary costs, c(r, k) and d(r, k), respectively, using
        Equation (3) and Algorithm 1, and based on $f^*_{rk}(\alpha - 1)$ and $v^*_{ij}(\alpha)$
        Calculate $h_\alpha = \sum_{k=1}^{K} \sum_{r=1}^{R} c(r, k)$
    Step 3 Assignment
        Assign R trips to K clusters. Set z as the number of trips on tours, and find $z^*_s$ by
        solving the optimization problems in model (8).
        $h_{\alpha+1} \leftarrow z$
        $f^*_{rk}(\alpha) \leftarrow \arg\min z_s$
    Step 4 Termination criteria
        if $h_\alpha = h_{\alpha+1}$ or $\alpha = \alpha_{max}$ then
            $C_a = h_\alpha$
            $T_a = \{f^*_{rk}(\alpha), v^*_{ij}(\alpha)\}$
            Terminate
        else
            $\alpha \leftarrow \alpha + 1$
            Go to Step 1
$T \leftarrow T_a$ corresponding to the solution with the minimum $C_a$
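The control flow of Algorithm 2 can be sketched as a loop around two pluggable routines. In the sketch below, `form_tour` and `assign` are hypothetical callables standing in for the tour-forming problem and the two-step assignment, respectively. As a simplification, progress is tracked by the number of trips that lie on tours, which moves in the opposite direction of the total primary cost because every off-tour trip costs M. The random sampling of trips in step 1 is omitted.

```python
import random


def tour_based_clustering(trips, K, form_tour, assign,
                          max_iter=10, restarts=1, seed=0):
    """Skeleton of the iterate-until-stable loop with random restarts.

    form_tour(cluster) -> list of trips on the tour representing the cluster
    assign(trips, tours) -> list of K clusters (lists of trips)
    Returns (trips_on_tours, clusters, tours) for the best restart.
    """
    rng = random.Random(seed)
    best = None
    for _ in range(restarts):
        # Step 0: random initial split into K clusters.
        shuffled = list(trips)
        rng.shuffle(shuffled)
        clusters = [shuffled[k::K] for k in range(K)]
        previous = None
        for _ in range(max_iter):
            tours = [form_tour(c) for c in clusters]   # Step 1
            on_tours = sum(len(t) for t in tours)      # Step 2 (proxy for cost)
            clusters = assign(trips, tours)            # Step 3
            if on_tours == previous:                   # Step 4
                break
            previous = on_tours
        if best is None or on_tours > best[0]:
            best = (on_tours, clusters, tours)
    return best
```

Keeping the two sub-problems behind callables makes it possible to swap an exact solver for a heuristic without touching the loop itself.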
The final clustering results for the illustrative example are shown in Figure 5. The solution
consists of two tours, one including three trips and the other one including four trips. As Figure 5b
demonstrates, convergence is obtained in only three steps.
6.1.5 Properties of the ε-uniform tour-based Clustering Algorithm
In this section, we first prove that the objective function of the ε-uniform tour-based clustering
problem decreases with iterations in Algorithm 2. Next, we prove the convergence of the algorithm.
Proposition 1. The objective function in model (7) decreases with iterations of Algorithm 2.
Proof. In step 1 of Algorithm 2, the optimization problem in model (2) seeks to find the longest
tour within each cluster. Finding the tour with maximum length is equivalent to
solving an optimization problem that finds the min-cost tour, where the cost function is described
in Equation (3). In step 3 of Algorithm 2, the first step of the assignment is completed under the
same cost function. Therefore, each of these two main steps in Algorithm 2
attempts to minimize the same cost function, and the result follows.
Proposition 2. Algorithm 2 converges in a finite number of steps.
Proof. There is a large but finite number of ways to assign R trips to K clusters.
Furthermore, in Proposition 1 we showed that the objective function decreases from one iteration
to the next (otherwise, we stop). As such, there are a finite number of ways in which clusters could
change, and the algorithm is designed such that no solution is visited twice (unless at convergence,
at which point we stop). The result follows.
6.2 Driver Assignment
After clusters of trips are formed, we need to allocate drivers to these clusters. Algorithm 3 details
the steps of this procedure. In the first step of this algorithm, the cost of assigning a driver d to
a cluster k is calculated. This cost is based on the travel time from the origin location of
driver d to the pick-up location of the first trip on tour k that can be served by d, and the number
of remaining trips on the tour. In general, the higher the travel time, the higher the cost of
allocating the driver to the cluster. Conversely, the higher the number of remaining trips on
tour k, the more effective driver d can be in serving the cluster, and therefore the lower the
cost.
In the second step of Algorithm 3, we solve the bipartite matching problem outlined in model
(9) to allocate the set of drivers to clusters. The decision variable $z_{dk}$ takes the value 1 if driver d is
assigned to cluster k, and the value 0 otherwise. The objective function in (9a) minimizes the total
driver-cluster assignment cost. Constraint (9b) ensures that the proportion of drivers allocated to
a cluster is approximately equal to the proportion of trips in that cluster. Constraint (9c) ensures
that each driver is allocated to exactly one cluster. Constraint (9d) imposes the binary condition
on the decision variable $z_{dk}$. Note that the constraint coefficient matrix in model (9) is totally
unimodular, so the binary requirement can be relaxed and the problem solved as a linear program
in real-time.
The assignment of drivers to clusters is demonstrated in Table 4. The optimization problem in
model (9) allocates drivers 1 and 3 to cluster 1, and drivers 2 and 4 to cluster 2.
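The driver-cluster cost and the proportional driver quota can be sketched as follows. The cost divides the travel time to the first serviceable pick-up by the number of trips the driver could still cover on the tour. The feasibility rule used here, namely that the driver must reach the pick-up station no later than a given latest pick-up time, is an assumption of this sketch, as are the function names and the data layout.

```python
import math


def driver_tour_cost(driver_origin, earliest_departure, tour, travel_time):
    """Cost of assigning a driver to a tour.

    tour: list of (pickup_station, latest_pickup_time) in service order
    travel_time: dict mapping (station_a, station_b) to travel time
    """
    for index, (pickup, latest_pickup) in enumerate(tour):
        access_time = travel_time[(driver_origin, pickup)]
        if earliest_departure + access_time <= latest_pickup:
            remaining = len(tour) - index - 1   # trips after the first served
            return access_time / (remaining + 1)
    return math.inf  # the driver cannot serve any trip on this tour


def driver_quota(trips_in_cluster, total_trips, total_drivers):
    """Minimum number of drivers for a cluster, proportional to its trips."""
    return math.floor(trips_in_cluster / total_trips * total_drivers)
```

For instance, a cluster that holds 10 of 20 trips in a system with 4 drivers would receive at least 2 drivers under this quota.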
7 Numerical Experiments
In this section, we use the New York City Taxi dataset [1] to demonstrate the performance of the
ε-uniform tour-based clustering method.
7.1 Dataset
The New York City Taxi and Limousine Commission (NYC TLC), in partnership with the NYC
Department of Information Technology and Telecommunications (DOITT), has published millions
of trip records from yellow medallion taxis and green SHLs for several years. These records include
attributes such as pick-up and drop-off dates, times, and locations, trip distances, itemized fares,
rate types, payment types, and driver-reported passenger counts.
The data used in this study belong to the evening peak hour (19:00-20:00) of February 19,
2016. We select trips that are geographically concentrated in the Manhattan area, thereby creating
ride-matching problems that are large-scale due to the high number of trips as well as the high
spatiotemporal proximity between them.
Algorithm 3: Assignment of drivers to clusters
Input: K tours, T
Input: Set of drivers, D, with their origins, $s^O_d$, and earliest departure times, $t^{ED}_d$
Output: Driver assignment to clusters, $z^*_{dk}$, $\forall d \in D, k \in K$
$n_k \leftarrow$ number of trips in cluster k under T
Step 1 Average cost of driver-tour assignment
for $d \in D$ do
    for $k \in K$ do
        Find the first trip in tour k that can be served by d based on $t^{ED}_d$. Denote this trip
        as r. Denote the number of trips after this trip on the tour as $n_{rem}$
        \[
        \gamma(d, k) = \frac{T(s^O_d, s^O_r)}{n_{rem} + 1}
        \]
Step 2 The driver assignment problem
Solve the optimization problem in (9) to obtain the optimal driver-cluster assignment, $z^*_{dk}$
\begin{align}
\min\ & \sum_{d \in D} \sum_{k \in K} \gamma(d, k) z_{dk} \tag{9a}\\
& \sum_{d \in D} z_{dk} \ge \left\lfloor \frac{\sum_{r \in R} f^*_{rk}}{|R|}\, |D| \right\rfloor \quad \forall k \in K \tag{9b}\\
& \sum_{k \in K} z_{dk} = 1 \quad \forall d \in D \tag{9c}\\
& z_{dk} \in \{0,1\} \quad \forall d \in D,\ k \in K \tag{9d}
\end{align}
7.2 Simulation Settings
In this study, we adopt a rolling-horizon approach with a re-optimization period of 1 minute,
meaning that the ride-matching optimization problem is solved every minute starting from
18:59. We assume that all trips will be completed on their shortest travel time paths, and obtain
the travel times for every re-optimization period from the Google Maps API. In this study, we set
the ratio of riders to drivers to 4, i.e., the total number of riders is 4 times the number of drivers.
We generate the earliest departure times of drivers uniformly from the window 18:59 to 20:00. We
assume that riders request a ride a few minutes ahead of their earliest departure times. Since the
dataset does not contain the times when ride requests are issued, we generate a uniform random
number from 0 to 30 for each rider, and subtract it from their earliest departure time to obtain
their ride-request time. Doing this allows us to have a mixture of pre-arranged and on-the-fly
trip requests. We assume that the study area consists of 184 pre-defined stations, denoted by S,
where participants start/end their trips. Stations are distributed in the network so as to make
sure that there is at least one station within walking distance (<0.15 miles) of a typical trip’s
origin/destination (Fig. 6). We set the capacity of each vehicle to four.
Table 4: The final ride-matching results
Cluster   | Driver | Matched with     | Number of riders being served
Cluster 1 | d1     | r1, r6, r9       | 3
Cluster 1 | d3     | r2, r10, r14     | 3
Cluster 2 | d2     | r4, r16          | 2
Cluster 2 | d4     | r3, r7, r12, r20 | 4
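The synthetic ride-request times described in the simulation settings can be generated with a few lines. The sketch below assumes that all times are expressed in minutes and that the lead time is drawn independently for each rider; the function name `request_times` and the fixed seed are illustrative.

```python
import random


def request_times(earliest_departures, max_lead=30.0, seed=0):
    """Subtract a uniform random lead time from each earliest departure time.

    earliest_departures: list of earliest departure times (assumed minutes)
    max_lead: upper bound of the uniform lead time (assumed minutes)
    """
    rng = random.Random(seed)  # fixed seed so that runs are repeatable
    return [t - rng.uniform(0.0, max_lead) for t in earliest_departures]
```

Riders who draw a lead time close to zero behave like on-the-fly requests, while those with a lead time near the upper bound resemble pre-arranged trips.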
Figure 6: The pre-defined stations distribution of the Manhattan area.
7.3 Results
In this section, we first define a base experimental setting and conduct extensive numerical analysis
to draw a comparison between our proposed method, three benchmarks from the literature, and
the optimal solution. Next, we observe the convergence properties of Algorithm 2, and study the
impact of sample size in this algorithm. Finally, we conduct sensitivity analysis over some of the
critical parameters of our method. For all experiments conducted in this section, we consider a
planning horizon of one hour, with sixty one-minute re-optimization periods.
7.3.1 Base Experiment
In our base experimental setting, we set the total number of ride requests to 2000, and the sample
size in Algorithm 2 to 150. We set the value of ε to 0.1, and the maximum number of iterations to
10.
Figure 7 demonstrates the clustering results, where for a given trip request, both trip ends are
colored based on the clusters to which the trip is assigned. It is interesting to note that the clusters
in Figure 7 have a high level of spatial overlap.
To objectively assess the performance of the proposed method, we compare it against
three benchmark methods from the literature, namely point-based clustering [53], balanced point-based
clustering (point-based clustering with uniform clusters), and ε-uniform trip-based clustering
[59], as well as the exact mathematical formulation presented in model (10), solved to optimality. These
comparisons allow us to position our proposed method in terms of solution quality and the required
computational effort.
In the point-based method, trip ends (i.e., origin and destination locations) of all trips (i.e., both
rider and driver trips) are the objects in clustering. Once the clusters are obtained, an optimization
problem can be solved independently for each cluster. Trips whose origin and destination locations
fall in different clusters will not be served. The balanced point-based clustering is similar to point-based
clustering, except that balance constraints are imposed on the number of objects within
clusters. In trip-based clustering, each trip (whether it is a rider or a driver trip) is considered an
object in clustering. Once clustering is concluded, the rider and driver trips within each cluster are
matched independently of other clusters.
Figure 7: ε-uniform tour-based clustering results. Trip ends are colored based on their assigned clusters. (a) Two clusters; (b) Three clusters; (c) Four clusters.
Tables 5, 6, and 7 summarize the results. The value of k in these tables indicates the number
of clusters. The reported computation time is the total time spent on solving the ride-matching
problem, and the time for clustering where applicable, over 60 re-optimization periods, each of
duration 1 min. For the ε-uniform tour-based method, the computation time is provided in more
detail, breaking the total time into the clustering and optimization times. The clustering time
consists of the time required for generating the network, computing the costs, and the assignment
steps. The optimization solution time is the sum of the time spent on model construction and
obtaining a solution from a commercial solver. The numbers in parentheses are the average
computation times across all re-optimization periods. The optimal solution finds a match for
60.40% of the ride requests. We normalize the optimal matching rate to 100%, and report the
solution quality of all other methods as their matching rate divided by the optimal matching rate.
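The two reporting conventions described above, normalizing by the optimal matching rate and averaging the computation time over the 60 re-optimization periods, can be written down explicitly. The helper names below are hypothetical; the sample numbers in the usage note are taken from the reported results.

```python
def normalized_quality(matching_rate, optimal_rate):
    """Solution quality as a percentage of the optimal matching rate."""
    return 100.0 * matching_rate / optimal_rate


def raw_matching_rate(quality, optimal_rate):
    """Invert the normalization to recover the share of requests matched."""
    return quality / 100.0 * optimal_rate


def average_time(total_seconds, periods=60):
    """Average computation time per re-optimization period, in seconds."""
    return total_seconds / periods
```

For example, a reported solution quality of 94.95% corresponds to matching about 57.35% of all ride requests when the optimal matching rate is 60.40%, and a total computation time of 942 seconds corresponds to 15.70 seconds per period.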
Table 5: Comparison of matching rate and solution time between the proposed ε-uniform tour-based
method, the optimal solution, and the point-based clustering benchmark. The matching rate of the optimal
solution is normalized to 100%, and the matching rates obtained by other methods are reported in terms of
the percentage of the optimal matching rate. The value of k refers to the number of clusters. Computation times are in seconds, with the average per re-optimization period in parentheses.
Method      | k | Total computation time (sec) | Graph partitioning (sec) | Optimization problem (sec) | Solution quality
Optimal     | 1 | 1834 (30.56)                 | -                        | -                          | 100%
Point-based | 2 | 973 (16.22)                  | -                        | -                          | 82.62%
Point-based | 3 | 942 (15.70)                  | -                        | -                          | 67.30%
Point-based | 4 | 933 (13.88)                  | -                        | -                          | 60.60%
Tour-based  | 2 | 366 (6.1)                    | 146                      | 220                        | 94.95%
Tour-based  | 3 | 283 (4.72)                   | 163                      | 120                        | 77.15%
Tour-based  | 4 | 197 (3.28)                   | 151                      | 46                         | 65.98%
Table 5 compares the ε-uniform tour-based clustering method with point-based clustering and
the optimal solution. This table demonstrates that the gap in solution quality by the two methods
decreases as we increase the number of clusters. This is due to the fact that when forming clusters,
both methods disregard potential matches between objects that are assigned to different clusters.
The higher solution times for the point-based method can be explained by the fact that this method
does not seek to construct uniform clusters, implying that some clusters may have a significantly
higher number of trips, and thereby higher optimization times. This imbalance in the size of clusters
under point-based clustering is demonstrated in Figure 8. The lower solution quality of the point-based
method is due to the fact that in this method clusters are constructed based on trip ends,
rather than whole trips. Therefore, since after clustering the matching optimization problems are
solved independently for each cluster, trips whose trip ends lie in different clusters are not served.
This is a drawback that is addressed by the trip-based clustering method in which an entire trip is
considered as a clustering object.
Table 6: Comparison of matching rate and solution time between the proposed ε-uniform tour-based
method, the optimal solution, and the balanced point-based clustering benchmark. The matching rate of
the optimal solution is normalized to 100%, and the matching rates obtained by other methods are reported
in terms of the percentage of the optimal matching rate. The value of k refers to the number of clusters. Computation times are in seconds, with the average per re-optimization period in parentheses.
Method               | k | Total computation time (sec) | Graph partitioning (sec) | Optimization problem (sec) | Solution quality
Optimal              | 1 | 1834 (30.56)                 | -                        | -                          | 100%
Balanced point-based | 2 | 343 (5.72)                   | -                        | -                          | 75.32%
Balanced point-based | 3 | 251 (4.18)                   | -                        | -                          | 62.43%
Balanced point-based | 4 | 173 (2.88)                   | -                        | -                          | 54.37%
Tour-based           | 2 | 366 (6.1)                    | 146                      | 220                        | 94.95%
Tour-based           | 3 | 283 (4.72)                   | 163                      | 120                        | 77.15%
Tour-based           | 4 | 197 (3.28)                   | 151                      | 46                         | 65.98%
Table 7: Comparison of matching rate and solution time between the proposed ε-uniform tour-based
method, the optimal solution, and the ε-uniform trip-based clustering benchmark. The matching rate of the
optimal solution is normalized to 100%, and the matching rates obtained by other methods are reported in
terms of the percentage of the optimal matching rate. The value of k refers to the number of clusters. Computation times are in seconds, with the average per re-optimization period in parentheses.
Method     | k | Total computation time (sec) | Graph partitioning (sec) | Optimization problem (sec) | Solution quality
Optimal    | 1 | 1834 (30.56)                 | -                        | -                          | 100%
Trip-based | 2 | 350 (5.8)                    | -                        | -                          | 83.61%
Trip-based | 3 | 279 (4.65)                   | -                        | -                          | 68.54%
Trip-based | 4 | 185 (3.08)                   | -                        | -                          | 59.85%
Tour-based | 2 | 366 (6.1)                    | 146                      | 220                        | 94.95%
Tour-based | 3 | 283 (4.72)                   | 163                      | 120                        | 77.15%
Tour-based | 4 | 197 (3.28)                   | 151                      | 46                         | 65.98%
Table 6 compares the ε-uniform tour-based clustering method with balanced point-based
clustering, in which a uniformity constraint is incorporated in the clustering algorithm, and the
optimal solution. This table suggests that when we seek to balance cluster sizes in point-based
clustering, the computation time decreases significantly. However, unsurprisingly, the solution
quality of the balanced point-based clustering is worse than that of the unbalanced point-based
clustering and the tour-based method. This table suggests that with k = 2, the solution quality
of the tour-based method is higher than that of the balanced point-based method by about 20%. This
is due to the fact that when forming uniform clusters, the system has a lower level of flexibility to
assign trips to clusters, thereby resulting in lower quality solutions.
Finally, Table 7 compares the tour-based and trip-based clustering approaches. The solution quality of the
tour-based method exceeds that of the trip-based method by about 11% under two
clusters. The reason behind this improvement in solution quality can be attributed to the fact that
in tour-based clustering trips that are close to any trip on the representative tour are assigned to
that cluster. As a result, not only are trips that are spatio-temporally close assigned to the same
cluster, as is the case in trip-based clustering, but also trips assigned to a cluster can be served
sequentially by a single vehicle. This makes it more likely for the matching problem to make a
more effective use of the vehicles in serving ride requests.
The gap between these solutions is reduced to 6% under 4 clusters for the same reasons discussed
above. The difference in solution times of these two methods is not statistically significant, as both
methods strive to generate clusters that are balanced in size within a threshold. The main difference
between the balanced point-based, trip-based, and tour-based approaches is what they consider as
a unit of analysis: the point-based method considers a trip end as a unit of modeling, while the
trip-based method considers an entire trip, and the tour-based method considers a tour (a sequence
of trips) as a cluster representative. The tour-based method provides the highest quality solution
because the cluster representatives can more closely capture the ultimate product of the matching
problem, which in a ridesourcing setting is a set of vehicle tours.
Figure 8: Cluster sizes for point-based, balanced point-based, trip-based and tour-based methods.
7.3.2 Algorithm Properties
In this section, we first study the convergence rate of our base experimental setting. Next, we
investigate the impact of sample size on the quality of solutions.
Convergence Properties
Figure 9 demonstrates the convergence of the objective function in model (8) in our base experiment,
using a randomly-selected instance. This figure clearly shows that the objective function increases
monotonically, which is equivalent to the cost function decreasing monotonically. As demonstrated
in this figure, the objective function typically converges in a few iterations.
Figure 9: Primary and secondary costs per iteration.
Impact of Sample Size on Solution Quality
Figure 10 shows the impact of sample size in Algorithm 2 on the computation time and the
secondary cost in the tour forming problem. Figure 10(a) displays that the computation time
increases super-linearly with sample size. Figure 10(b) demonstrates that the secondary cost
decreases with sample size. This is due to the fact that increasing the sample size increases
the likelihood of obtaining tours that better represent their corresponding clusters, resulting in
a smaller secondary cost for the entire set of trips. However, there is a critical sample size beyond
which the reduction in cost becomes negligible. In our experimental setting, this sample size is
about 150 trips, which is the number utilized in our experiments.
Figure 10: The computation time and the secondary cost of the tour forming problem with different sample
sizes. (a) Computation time; (b) Total cost.
7.3.3 Sensitivity Analysis
The Imbalance Parameter
Figure 11 shows the impact of changing the value of εon the number of served trips as well as
the computation time of the ε-uniform tour-based clustering method under different numbers of
clusters. Figure 11(a) displays that as the value of εincreases, the number of served riders increases
for all values of K. This is because a higher value for εprovides a higher level of flexibility to assign
trips to clusters, thereby resulting in higher quality solutions. Figure 11(b) demonstrates that the
computation time increases with εover all values of K. This is due to the fact that the increased
level of flexibility that accompanies a higher εvalue results in less uniform cluster sizes. As such,
some clusters will be larger than others, increasing the overall solution time.
(a) Number of riders being served. (b) Computation time
Figure 11: Number of served riders and computation time of the proposed algorithm with different values
of ε
Figure 12 provides a more detailed view of the change in computation time for different values
of ε. Figures 12(a) and 12(b) demonstrate the distribution of computation time under 3 and
4 clusters, respectively. These figures indicate that the cluster sizes become less uniform as we
increase ε, resulting in the larger clusters taking longer to optimize.
(a) Number of clusters = 3 (b) Number of clusters = 4
Figure 12: Distribution of computation time
Number of System Participants
Figure 13 displays the influence of the number of participants on the computation time and the number
of served riders under different numbers of clusters and ε = 0.1. Note that k = 1 provides the
optimal solution. Figure 13(a) demonstrates that the computation time decreases with the number
of clusters regardless of the number of participants. This is because a larger number of clusters
means that there are fewer participants in each cluster, which results in less computation time.
Still, the rate of reduction in solution time decreases as the number of clusters becomes larger.
Once the number of clusters reaches a threshold, the reduction in solution time as we
increase the number of clusters becomes small, indicating that there is a critical threshold for k
beyond which having more clusters does not reduce solution time any further, but decreases
system throughput (Figure 13(b)). Additionally, Figure 13(a) shows that once the number of
clusters is over a critical threshold, the computation time is no longer substantially affected by the
number of participants.
Figure 13(b) shows that increasing the number of clusters reduces the matching rate; however,
this reduction is smaller when the base number of participants is lower. This is due to the fact that
with the higher trip density produced by a higher number of participants, clustering removes
a higher number of potential matches from the feasible region of the solution, leading to a higher
loss in system throughput. This figure also shows that the reduction in system throughput
is smaller between k = 2 and k = 3 than between k = 1 and k = 2.
Figure 13 shows that, regardless of the number of participants, there is a critical value for k,
where increasing the number of clusters beyond this value does not reduce the solution time any
further, but reduces system throughput. For our experimental setting, this critical value is k = 2.
(a) Computation time (b) Number of served riders
Figure 13: Number of served riders and computation time under different numbers of clusters
8 Conclusion
In this paper we devise a framework to solve the ride-matching problem that arises in dynamic
ridesourcing systems in a distributed fashion. The methodology is based on clustering, where ride
requests are grouped into a number of clusters so as to (1) maximize the intra-cluster similarity
between trips within a cluster, and (2) guarantee cluster sizes to be uniform within a threshold.
The proposed clustering approach accounts for the fact that the ultimate goal is to form vehicle tours in
each cluster, thereby using tours, i.e., sequences of trips, as cluster representatives. We devise what
we call the ε-uniform tour-based algorithm to assign trips to clusters, and prove its convergence.
Next, we optimally assign drivers to clusters, and solve the ride-matching problem for clusters
independently of each other.
Using the New York City taxi dataset, we conduct extensive numerical experiments to analyze
the performance of the proposed methodology and compare it against three state-of-the-art
benchmarks, namely point-based, balanced point-based, and trip-based clustering, as well as the
optimal solution. First, we demonstrate that the proposed methodology has favorable convergence
properties, providing solutions in a few iterations. Secondly, we demonstrate the importance of
forming approximately-uniform clusters, and showcase the resulting trade-offs between solution
time and quality. Finally, we show that our proposed methodology could result in a statistically
significant increase in the matching rate compared to the benchmarks, where this improvement
decreases with the number of clusters.
Acknowledgment
The work described in this paper was supported by NSF award 2046372.
Appendix I
We consider a one-to-many ride-matching problem in which a driver can serve multiple riders. In
such systems, a trip can be denoted by a link, $l = (t_i, s_i, t_j, s_j) \in T \times S \times T \times S$,
where $T$ is an ordered set of time intervals during the study time horizon. Due to the large size of
the transportation network and the number of time intervals in the study horizon, solving such a
ride-matching problem can be computationally prohibitive. Therefore, we adopt the pre-processing
procedure proposed by [49] to reduce the size of the link sets. The rationale of the pre-processing
procedure is that the spatiotemporal constraints enforced by travel time windows of participants
limit their access to members of the link set $L$. This pre-processing procedure starts by forming an
ellipse for each participant, where the foci of the ellipse are set to the participant's origin and
destination stations, the distance between the foci is the Euclidean distance between the
participant's origin and destination stations, and the distance between the vertices of the ellipse is
set to an upper bound on the distance that the participant can travel within their travel time window.
This ellipse defines a reduced graph, where any link with at least one station outside of the ellipse
will be infeasible for the participant as it will violate at least one of the spatiotemporal constraints.
This pre-processing procedure provides $L_d \subseteq L$ and $L_r \subseteq L$ as the sets of links
accessible to driver $d$ and rider $r$, respectively. Furthermore, we define $L_{rd} = L_r \cap L_d$.
Finally, $T_r \subseteq T$ and $T_d \subseteq T$ are the sets of time intervals within the time windows
of rider $r$ and driver $d$, respectively. The ride-matching problem can be modeled on a graph
$G = (S, L)$, where $S$ is the set of pick-up and drop-off stations, and $L$ is the set of links.
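The ellipse-based reduction can be illustrated with a minimal sketch. It relies on the standard
geometric fact that a point lies inside an ellipse exactly when the sum of its distances to the two
foci does not exceed the distance between the vertices. The planar station coordinates, the function
names, and the toy instance are hypothetical and serve only as an illustration; this is not the
implementation of [49].

```python
import math

def euclid(p, q):
    """Euclidean distance between two planar points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def stations_in_ellipse(stations, origin, dest, max_dist):
    """Stations whose summed distance to the two foci (the participant's
    origin and destination) is at most max_dist, the vertex-to-vertex length."""
    o, d = stations[origin], stations[dest]
    return {s for s, xy in stations.items()
            if euclid(xy, o) + euclid(xy, d) <= max_dist}

def reduced_links(links, reachable):
    """Keep links (t_i, s_i, t_j, s_j) whose two stations are both reachable."""
    return [l for l in links if l[1] in reachable and l[3] in reachable]

# Hypothetical toy instance: four stations and three links.
stations = {"A": (0, 0), "B": (4, 0), "C": (2, 1), "D": (2, 5)}
links = [(0, "A", 1, "C"), (1, "C", 2, "B"), (0, "A", 3, "D")]
reachable = stations_in_ellipse(stations, "A", "B", max_dist=6)
print(sorted(reachable))
print(reduced_links(links, reachable))
```

In this toy instance station D lies outside the ellipse, so the link ending at D is discarded.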
This problem can be mathematically formulated as an integer programming model in (10). In
this model, the decision variable $\omega_{rd}$ is a binary variable that takes on the value 1 if rider
$r$ is matched with driver $d$, and the value 0 otherwise. There are two additional sets of binary
variables: (1) $x_d^l$, which takes the value 1 if driver $d$ travels on link $l$, and the value 0
otherwise; and (2) $y_{rd}^l$, which takes the value 1 if rider $r$ is transported by driver $d$ on
link $l$, and the value 0 otherwise.
The objective function in (10a) maximizes the total number of served riders. Constraints (10b)
and (10c) ensure that drivers start their trips from their origin stations and end their trips at their
destination stations, respectively. Constraint (10d) guarantees the flow conservation of vehicles.
Constraints (10e) and (10f) ensure that served riders depart from their origin stations and arrive at
their destination stations within their specified time windows. Constraint (10g) guarantees the flow
conservation of riders. Constraint (10h) states that riders can be matched with only one driver.
Constraint (10i) limits the capacities of the vehicles.
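\begin{align}
\max \quad & \sum_{r \in R} \sum_{d \in D} \omega_{rd} \tag{10a} \\
\text{s.t.} \quad
& \sum_{\substack{l \in L_d: \\ s_i = s_d^O;\ t_i, t_j \in T_d}} x_d^l
  - \sum_{\substack{l \in L_d: \\ s_j = s_d^O;\ t_i, t_j \in T_d}} x_d^l = 1
  && \forall d \in D; \tag{10b} \\
& \sum_{\substack{l \in L_d: \\ s_j = s_d^D;\ t_i, t_j \in T_d}} x_d^l
  - \sum_{\substack{l \in L_d: \\ s_i = s_d^D;\ t_i, t_j \in T_d}} x_d^l = 1
  && \forall d \in D; \tag{10c} \\
& \sum_{\substack{t_i, s_i: \\ l = (t_i, s_i, t, s) \in L_d}} x_d^l
  - \sum_{\substack{t_i, s_i: \\ l = (t, s, t_i, s_i) \in L_d}} x_d^l = 0
  && \forall d \in D,\ \forall t \in T_d,\ \forall s \in S \setminus \{s_d^O \cup s_d^D\}; \tag{10d} \\
& \sum_{\substack{l \in L_{rd}: \\ s_i = s_r^O;\ t_i, t_j \in T_r}} y_{rd}^l
  - \sum_{\substack{l \in L_{rd}: \\ s_j = s_r^O;\ t_i, t_j \in T_r}} y_{rd}^l = \omega_{rd}
  && \forall r \in R,\ \forall d \in D; \tag{10e} \\
& \sum_{\substack{l \in L_{rd}: \\ s_j = s_r^D;\ t_i, t_j \in T_r}} y_{rd}^l
  - \sum_{\substack{l \in L_{rd}: \\ s_i = s_r^D;\ t_i, t_j \in T_r}} y_{rd}^l = \omega_{rd}
  && \forall r \in R,\ \forall d \in D; \tag{10f} \\
& \sum_{d \in D} \sum_{\substack{t_i, s_i: \\ l = (t_i, s_i, t, s) \in L_{rd}}} y_{rd}^l
  - \sum_{d \in D} \sum_{\substack{t_i, s_i: \\ l = (t, s, t_i, s_i) \in L_{rd}}} y_{rd}^l = 0
  && \forall r \in R,\ \forall t \in T_r,\ \forall s \in S \setminus \{s_r^O \cup s_r^D\}; \tag{10g} \\
& \sum_{d \in D} \omega_{rd} \le 1
  && \forall r \in R; \tag{10h} \\
& \sum_{r \in R} y_{rd}^l \le cap_d
  && \forall d \in D,\ \forall l \in L_d. \tag{10i}
\end{align}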
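To illustrate how the matching variables interact, the following minimal sketch evaluates the
objective (10a) and checks constraints (10h) and (10i) for a candidate solution. It is only a
partial feasibility check under stated assumptions: the routing and flow-conservation constraints
(10b) to (10g) are not verified, and the dictionary-based data layout, the function names, and the
toy instance are hypothetical.

```python
def objective_10a(omega):
    """Total number of served riders: the sum of omega[r, d]."""
    return sum(omega.values())

def satisfies_10h(omega, riders, drivers):
    """Each rider is matched with at most one driver."""
    return all(sum(omega.get((r, d), 0) for d in drivers) <= 1 for r in riders)

def satisfies_10i(y, riders, drivers, links, cap):
    """On every link of every driver, the riders on board fit the capacity."""
    return all(
        sum(y.get((r, d, l), 0) for r in riders) <= cap[d]
        for d in drivers
        for l in links[d]
    )

# Hypothetical toy instance: one driver, one link, two matched riders.
riders, drivers = ["r1", "r2", "r3"], ["d1"]
link = (0, "A", 1, "B")                      # (t_i, s_i, t_j, s_j)
links = {"d1": [link]}
omega = {("r1", "d1"): 1, ("r2", "d1"): 1}
y = {("r1", "d1", link): 1, ("r2", "d1", link): 1}

print(objective_10a(omega))
print(satisfies_10h(omega, riders, drivers))
print(satisfies_10i(y, riders, drivers, links, cap={"d1": 2}))
print(satisfies_10i(y, riders, drivers, links, cap={"d1": 1}))
```

With a capacity of two seats the candidate passes both checks; with one seat it violates (10i).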
Appendix II
| Participant | Role | Origin station | Destination station | Earliest departure time | Latest arrival time | Shortest path travel time |
|---|---|---|---|---|---|---|
| r1 | rider | 162 | 103 | 19:00 | 19:19 | 15 |
| r2 | rider | 161 | 80 | 19:00 | 19:15 | 13 |
| r3 | rider | 80 | 129 | 19:07 | 19:16 | 9 |
| r4 | rider | 95 | 52 | 19:12 | 19:30 | 13 |
| r5 | rider | 97 | 56 | 19:16 | 19:36 | 16 |
| r6 | rider | 97 | 121 | 19:23 | 19:55 | 30 |
| r7 | rider | 33 | 60 | 19:25 | 19:39 | 8 |
| r8 | rider | 50 | 35 | 19:28 | 19:53 | 20 |
| r9 | rider | 56 | 50 | 19:30 | 19:56 | 17 |
| r10 | rider | 98 | 97 | 19:34 | 19:41 | 6 |
| r11 | rider | 124 | 76 | 19:36 | 19:45 | 7 |
| r12 | rider | 26 | 71 | 19:38 | 19:50 | 9 |
| r13 | rider | 57 | 131 | 19:42 | 19:52 | 7 |
| r14 | rider | 78 | 161 | 19:47 | 20:02 | 11 |
| r15 | rider | 124 | 156 | 19:50 | 19:54 | 3 |
| r16 | rider | 98 | 130 | 19:52 | 20:05 | 8 |
| r17 | rider | 49 | 124 | 19:53 | 20:03 | 10 |
| r18 | rider | 58 | 159 | 19:56 | 20:06 | 8 |
| r19 | rider | 128 | 100 | 19:58 | 20:10 | 9 |
| r20 | rider | 125 | 128 | 19:59 | 20:15 | 13 |
| d1 | driver | 98 | Na | 18:50 | Na | Na |
| d2 | driver | 57 | Na | 19:00 | Na | Na |
| d3 | driver | 31 | Na | 18:50 | Na | Na |
| d4 | driver | 78 | Na | 19:00 | Na | Na |

Table 8: Information of participants in the illustrative example
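The tightness of the riders' time windows in Table 8 can be checked with a short sketch. It
computes, for a few riders, the slack between the length of the time window and the shortest path
travel time, assuming that travel times are expressed in minutes (an assumption that is consistent
with the clock times in the table but not stated there). The helper name `minutes_between` is
hypothetical.

```python
from datetime import datetime

def minutes_between(start, end):
    """Whole minutes between two same-day clock times given as HH:MM."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)

# (earliest departure, latest arrival, shortest path travel time) from Table 8.
riders = {
    "r1": ("19:00", "19:19", 15),
    "r3": ("19:07", "19:16", 9),
    "r6": ("19:23", "19:55", 30),
    "r15": ("19:50", "19:54", 3),
}

# Slack = time-window length minus travel time (assumed to be in minutes).
slack = {r: minutes_between(dep, arr) - tt for r, (dep, arr, tt) in riders.items()}
print(slack)
```

A slack of zero, as for rider r3, means the rider can only be served by a direct trip that starts
at the earliest departure time; larger slacks leave room for detours and waiting.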
Appendix III
Table 9: Shortest path travel time for illustrative example (I).
Rows: origin station (first column). Columns: destination station.
Station 26 31 33 35 49 50 52 56 57 58 60 71 76 78 80 95
26 0 25 16 29 7 11 20 29 33 38 37 9 13 19 23 13
31 10 0 11 18 11 9 6 16 21 26 36 19 10 7 11 20
33 14 4 0 9 15 13 10 7 12 17 8 23 14 11 10 24
35 18 8 4 0 19 17 14 5 5 10 21 27 18 15 14 28
49 7 21 28 31 0 4 14 26 30 35 46 9 10 16 20 10
50 6 17 26 20 12 0 10 23 27 32 43 11 7 13 17 11
52 11 7 18 25 11 9 0 18 23 28 28 20 10 9 13 20
56 18 8 4 7 19 17 12 0 5 10 3 27 18 33 39 28
57 21 11 7 3 22 20 15 3 0 5 16 30 21 16 12 31
58 24 14 10 6 25 23 18 6 3 0 11 33 24 19 15 34
60 34 24 20 16 35 33 28 16 13 10 0 43 34 29 25 43
71 13 24 24 35 10 14 19 28 32 37 38 0 12 18 22 12
76 17 12 19 23 13 11 7 16 20 25 36 22 0 6 10 10
78 16 6 13 17 17 15 7 10 14 19 30 25 11 0 4 21
80 22 12 9 13 23 21 13 6 10 15 26 31 22 13 0 32
95 15 18 15 29 11 9 13 22 26 31 18 20 6 12 16 0
97 22 12 19 23 22 20 11 16 20 25 36 31 11 6 10 12
98 19 9 16 20 20 18 10 13 17 22 33 28 14 3 7 18
100 26 16 12 15 27 25 17 8 12 17 28 35 26 17 4 29
103 32 22 18 14 33 31 26 14 11 10 18 41 32 25 18 32
121 16 27 34 38 13 17 22 31 35 40 47 7 15 21 25 12
124 17 22 29 33 14 13 17 26 30 35 45 16 7 16 20 4
125 18 21 28 32 14 12 16 25 29 34 42 21 9 15 19 3
128 22 12 19 23 23 21 13 16 20 25 31 31 17 6 10 21
129 28 16 9 23 25 22 18 18 21 21 26 27 15 9 9 17
130 31 21 17 20 32 30 22 13 17 21 26 40 27 16 9 27
131 28 18 14 17 29 27 22 10 14 18 23 37 28 21 12 30
156 22 25 32 36 18 16 20 29 33 38 40 15 13 19 23 7
159 28 18 25 29 26 24 19 22 26 31 36 30 21 12 16 15
161 29 19 21 24 30 28 20 17 21 26 30 38 24 13 13 24
162 32 22 12 23 33 31 23 16 20 24 26 38 27 16 12 23
Table 10: Shortest path travel time for illustrative example (II).
Rows: origin station (first column). Columns: destination station.
Station 97 98 100 103 121 124 125 128 129 130 131 156 159 161 162
26 21 22 26 36 13 11 14 25 28 28 31 16 23 30 30
31 16 10 14 24 23 21 21 13 16 16 19 26 19 18 20
33 20 14 13 23 27 25 25 17 19 15 18 30 23 22 19
35 24 18 17 16 31 29 29 21 23 19 22 34 27 26 23
49 18 19 23 33 13 11 14 22 25 25 28 16 23 27 29
50 15 16 20 30 15 13 16 19 22 22 25 18 21 24 26
52 13 12 16 26 24 22 21 15 13 18 21 17 21 20 22
56 22 16 12 16 31 18 29 19 18 14 17 14 25 38 18
57 25 19 15 11 34 32 32 22 21 17 7 37 28 13 20
58 28 22 18 6 37 35 35 25 21 18 15 38 27 18 15
60 4 32 28 15 47 15 40 30 26 23 20 43 32 23 20
71 20 21 25 35 4 10 13 24 27 27 30 12 20 26 25
76 8 9 13 23 26 16 11 16 15 15 18 18 14 17 19
78 9 3 7 17 29 27 22 6 9 9 12 27 12 11 13
80 22 16 3 13 35 33 29 15 9 5 8 32 21 12 9
95 18 21 19 29 21 6 5 14 17 19 22 11 13 19 21
97 0 3 13 23 30 18 13 6 9 11 14 20 12 31 14
98 6 0 10 20 32 24 19 3 6 8 11 24 9 8 11
100 21 15 0 10 39 31 26 12 6 2 5 29 18 9 6
103 28 22 15 0 44 34 29 19 15 12 9 11 21 12 9
121 20 23 28 36 0 8 11 21 24 24 27 8 16 22 21
124 12 15 23 33 15 0 3 16 19 21 24 3 12 19 19
125 11 14 22 31 20 5 0 13 16 18 21 9 9 16 16
128 9 3 9 18 33 23 18 0 3 5 8 21 6 5 8
129 9 6 6 15 24 19 16 3 0 4 9 17 9 3 6
130 19 13 5 13 39 29 24 10 4 0 3 27 16 7 4
131 14 18 9 10 41 32 27 15 9 5 0 30 19 10 7
156 15 17 22 29 12 6 4 14 17 17 20 0 8 15 14
159 15 9 15 24 27 17 12 6 9 11 14 15 0 7 10
161 16 10 9 18 36 26 21 7 3 5 8 24 9 0 4
162 19 13 8 15 35 25 20 10 6 3 6 23 12 16 0
Appendix IV
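| Rider | Preliminary cost, tour k1 | Preliminary cost, tour k2 | Secondary cost, tour k1 | Secondary cost, tour k2 |
|---|---|---|---|---|
| r1 | M | M | 21 | 28 |
| r2 | M | 0 | | |
| r3 | 0 | M | | |
| r4 | M | M | 21 | 28 |
| r5 | M | M | 19 | 23 |
| r6 | M | M | 26 | 20 |
| r7 | 0 | M | | |
| r8 | M | M | 31 | 28 |
| r9 | M | M | 32 | 30 |
| r10 | M | 0 | | |
| r11 | M | M | 20 | 16 |
| r12 | M | M | 21 | 37 |
| r13 | M | M | 23 | 21 |
| r14 | M | 0 | | |
| r15 | 0 | M | | |
| r16 | M | M | 10 | 13 |
| r17 | M | M | 30 | 11 |
| r18 | M | M | 25 | 30 |
| r19 | M | M | 14 | 9 |
| r20 | M | M | 17 | 19 |

Table 11: The costs between trips and tours for the illustrative example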
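As a rough illustration of how the costs in Table 11 could drive the assignment of trips to tours,
the sketch below assigns every trip that is not already on a tour to the tour with the smaller
secondary cost. This is a deliberately simplified rule: it ignores the ε-uniform size constraint
and the iterative tour updates of the proposed algorithm, and the function name `nearest_tour`
is hypothetical.

```python
# Secondary costs from Table 11: rider -> (cost to tour k1, cost to tour k2).
secondary = {
    "r1": (21, 28), "r4": (21, 28), "r5": (19, 23), "r6": (26, 20),
    "r8": (31, 28), "r9": (32, 30), "r11": (20, 16), "r12": (21, 37),
    "r13": (23, 21), "r16": (10, 13), "r17": (30, 11), "r18": (25, 30),
    "r19": (14, 9), "r20": (17, 19),
}
# Riders with a preliminary cost of 0 in Table 11, i.e., already on a tour.
tour_members = {"k1": ["r3", "r7", "r15"], "k2": ["r2", "r10", "r14"]}

def nearest_tour(costs, members):
    """Assign each remaining rider to the tour with the smaller secondary cost."""
    clusters = {k: list(v) for k, v in members.items()}
    for rider, (c1, c2) in costs.items():
        clusters["k1" if c1 <= c2 else "k2"].append(rider)
    return clusters

clusters = nearest_tour(secondary, tour_members)
for tour, trips in clusters.items():
    print(tour, len(trips), trips)
```

On this instance the unconstrained rule happens to produce two clusters of ten trips each, so no
rebalancing would be needed here; in general, the size constraint can force some trips away from
their nearest tour.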
References
[1] NYC.gov. The New York City Taxi dataset. (accessed Dec. 15, 2020).
url:https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page.
[2] Flinc. Germany. 2011 (accessed Dec. 15, 2020). url:https://flinc.org/.
[3] Ville Fluide. France. 2011 (accessed Dec. 15, 2020). url:http://www.villefluide.fr/.
[4] Lyft. United States. 2012 (accessed Dec. 15, 2020). url:https://www.lyft.com/.
[5] Carticipate. the United States. 2008 (accessed Dec. 15, 2020).
url:https://www.carticipate.com/.
[6] Uber. the United States. 2009 (accessed Dec. 15, 2020). url:https://www.uber.com/.
[7] Brian W Kernighan and Shen Lin. “An efficient heuristic procedure for partitioning graphs”.
In: The Bell system technical journal (1970), pp. 291–307.
[8] C Fiduccia and R Mattheyses. “A linear-time heuristic for improving network partitions”. In:
Proceedings of the 19th Design Automation Conference (1982), pp. 175–181.
[9] S. P. Lloyd. “Least squares quantization in PCM”. In: IEEE Trans 28.2 (1982), pp. 129–137.
[10] Stefan E Karisch, Franz Rendl, and Jens Clausen. “Solving graph bisection problems with
semidefinite programming”. In: INFORMS Journal on Computing (2000), pp. 177–191.
[11] Jean-François Cordeau and Gilbert Laporte. “The dial-a-ride problem (DARP): Variants,
modeling issues and algorithms”. In: Quarterly Journal of the Belgian, French and Italian
Operations Research Societies 1.2 (2003), pp. 89–101.
[12] Roberto Baldacci, Vittorio Maniezzo, and Aristide Mingozzi. “An Exact Method for the Car Pooling
Problem Based on Lagrangean Column Generation”. In: Operations Research (2004),
pp. 422–439.
[13] Konstantin Andreev and Harald Racke. “Balanced graph partitioning”. In: Theory of
Computing Systems (2006), pp. 929–939.
[14] Leo Grady and Eric L Schwartz. “Isoperimetric graph partitioning for image segmentation”.
In: IEEE transactions on pattern analysis and machine intelligence (2006), pp. 469–475.
[15] Stephan Winter and Silvia Nittel. “Ad hoc shared-ride trip planning by mobile geosensor
networks”. In: International Journal of Geographical Information Science (2006), pp. 899–916.
[16] Jean-François Cordeau and Gilbert Laporte. “The dial-a-ride problem: models and
algorithms”. In: Annals of operations research 153.1 (2007), pp. 29–46.
[17] Martijn Mes, Matthieu Van Der Heijden, and Aart Van Harten. “Comparison of agent-based
scheduling to look-ahead heuristics for real-time transportation problems”. In: European
Journal of Operational Research (2007), pp. 59–75.
[18] David A Hensher. “Climate change, enhanced greenhouse gas emissions and passenger
transport–What can we do to make a difference?” In: Transportation Research Part D:
Transport and Environment (2008), pp. 95–111.
[19] Pavel Senin. “Dynamic time warping algorithm review”. In: Information and Computer
Science Department University of Hawaii at Manoa Honolulu, USA (2008), p. 40.
[20] Cristián E. Cortés, Martín Matamala, and Claudio Contardo. “The pickup and delivery
problem with transfers: Formulation and a branch-and-cut solution method”. In: European
Journal of Operational Research (2010), pp. 711–724.
[21] Santo Fortunato. “Community detection in graphs”. In: Physics reports (2010), pp. 75–174.
[22] Tim Kieritz et al. “Distributed Time-Dependent Contraction Hierarchies”. In: Experimental
Algorithms (2010), pp. 83–93.
[23] Niels Agatz et al. “Dynamic ride-sharing: A simulation study in metro Atlanta”. In:
Transportation Research Part B Methodological (2011), pp. 1450–1464.
[24] Shumo Chu and James Cheng. “Triangle listing in massive networks and its applications”.
In: Acm Sigkdd International Conference on Knowledge Discovery Data Mining. 2011.
[25] Wesam Herbawi and Michael Weber. “Evolutionary Multiobjective Route Planning
in Dynamic Multi-hop Ridesharing”. In: Evolutionary Computation in Combinatorial
Optimization. Springer Berlin Heidelberg, 2011, pp. 84–95.
[26] Niels Agatz et al. “Optimization for dynamic ride-sharing: A review”. In: European Journal
of Operational Research (2012), pp. 295–303.
[27] Nelson D Chan and Susan A Shaheen. “Ridesharing in North America: Past, present, and
future”. In: Transport reviews (2012), pp. 93–112.
[28] Daniel Delling and Renato Werneck. “Better Bounds for Graph Bisection”. In: Algorithms–
ESA 2012 (2012), pp. 407–418.
[29] Keivan Ghoseiri. “Dynamic Rideshare Optimized Matching problem”. In: Dissertations
Theses - Gradworks (2012).
[30] Wesam Herbawi and Michael Weber. “A Genetic and Insertion Heuristic Algorithm for
Solving the Dynamic Ridematching Problem with Time Windows”. In: GECCO’12 -
Proceedings of the 14th International Conference on Genetic and Evolutionary Computation
(2012).
[31] E. G. Boman, K. D. Devine, and S. Rajamanickam. “Scalable matrix computations on large
scale-free graphs using 2D graph partitioning”. In: SC ’13: Proceedings of the International
Conference on High Performance Computing, Networking, Storage and Analysis. 2013,
pp. 1–12.
[32] A. Di Febbraro, E. Gattorna, and N. Sacco. “Optimization of Dynamic Ridesharing Systems”.
In: Transportation Research Record Journal of the Transportation Research Board (2013),
pp. 44–50.
[33] F. Drews and D. Luxen. “Multi-hop ride sharing”. In: Proceedings of the 6th Annual
Symposium on Combinatorial Search, SoCS 2013 (2013), pp. 71–79.
[34] Masabumi Furuhata et al. “Ridesharing: The state-of-the-art and future directions”. In:
Transportation Research Part B Methodological (2013), pp. 28–46.
[35] S. Ma, Z. Yu, and O. Wolfson. “T-share: A large-scale dynamic taxi ridesharing service”. In:
IEEE International Conference on Data Engineering. 2013.
[36] Bo Peng, Lei Zhang, and David Zhang. “A survey of graph theoretical approaches to image
segmentation”. In: Pattern recognition (2013), pp. 1020–1038.
[37] Peter Sanders and Christian Schulz. Engineering Multilevel Graph Partitioning Algorithms.
Springer Berlin Heidelberg, 2013.
[38] Dennis Luxen and Dennis Schieferdecker. “Candidate sets for alternative routes in road
networks”. In: Journal of Experimental Algorithmics (JEA) (2015), pp. 1–28.
[39] Dominik Pelzer et al. “A Partition-Based Match Making Algorithm for Dynamic
Ridesharing”. In: IEEE Transactions on Intelligent Transportation Systems (2015),
pp. 2587–2598.
[40] Douglas O Santos and Eduardo C Xavier. “Taxi and ride sharing: A dynamic dial-a-ride
problem with money as an incentive”. In: Expert Systems with Applications 42.19 (2015),
pp. 6728–6737.
[41] Mitja Stiglic et al. “The benefits of meeting points in ride-sharing systems”. In: Transportation
Research Part B (2015).
[42] Aydin Buluc et al. Recent Advances in Graph Partitioning. Springer International Publishing,
2016.
[43] E. Jafari et al. “The For-Profit Dial-a-Ride Problem on Dynamic Networks”. In:
Transportation Research Board 96th Annual Meeting. 2016.
[44] Mehdi Nourinejad and Matthew J. Roorda. “Agent based model for dynamic ridesharing”.
In: Transportation Research Part C (2016), pp. 117–132.
[45] Robert Regue, Neda Masoud, and Will Recker. “Car2work: Shared Mobility Concept to
Connect Commuters with Workplaces”. In: Transportation Research Record: Journal of the
Transportation Research Board 2542 (Jan. 2016), pp. 102–110. doi:10.3141/2542-12.
[46] Yaroslav Akhremtsev, Peter Sanders, and Christian Schulz. “High-Quality Shared-Memory
Graph Partitioning”. In: IEEE Transactions on Parallel and Distributed Systems (2017).
[47] Javier Alonso-Mora et al. “On-demand high-capacity ride-sharing via dynamic trip-vehicle
assignment”. In: Proceedings of the National Academy of Sciences 114.3 (2017), pp. 462–467.
[48] Roger Lloret-Batlle, Neda Masoud, and Daisik Nam. “Peer-to-peer ridesharing with ride-back
on high-occupancy-vehicle lanes: Toward a practical alternative mode for daily commuting”.
In: Transportation Research Record 2668.1 (2017), pp. 21–28.
[49] Neda Masoud and R Jayakrishnan. “A Decomposition Algorithm to Solve the Multi-Hop
Peer-to-Peer Ride-Matching Problem”. In: Transportation Research Part B Methodological
(2017), pp. 1–29.
[50] Neda Masoud and R Jayakrishnan. “A real-time algorithm to solve the peer-to-peer ride-
matching problem in a flexible ridesharing system”. In: Transportation Research part B:
Methodological 106 (2017), pp. 218–236.
[51] Neda Masoud and R Jayakrishnan. “Autonomous or driver-less vehicles: Implementation
strategies and operational concerns”. In: Transportation research part E: logistics and
transportation review 108 (2017), pp. 179–194.
[52] Neda Masoud et al. “Promoting Peer-to-Peer Ridesharing Services as Transit System
Feeders”. In: Transportation Research Record: Journal of the Transportation Research Board
(2017), pp. 74–83.
[53] Ali Najmi, David Rey, and Taha H. Rashidi. “Novel dynamic formulations for real-time ride-
sharing systems”. In: Transportation Research Part E: Logistics and Transportation Review
(2017), pp. 122–140.
[54] Sin C Ho et al. “A survey of dial-a-ride problems: Literature review and recent developments”.
In: Transportation Research Part B: Methodological 111 (2018), pp. 395–421.
[55] Ruimin Li, Zhiyong Liu, and Ruibo Zhang. “Studying the benefits of carpooling in an
urban area using automatic vehicle identification data”. In: Transportation Research Part
C: Emerging Technologies (2018), pp. 367–380.
[56] Daisik Nam et al. “Designing a Transit-Feeder System using Multiple Sustainable Modes:
Peer-to-Peer (P2P) Ridesharing, Bike Sharing, and Walking”. In: Transportation Research
Record Journal of the Transportation Research Board (2018).
[57] Hai Wang and Hai Yang. “Ridesourcing systems: A framework and review”. In: Transportation
Research Part B: Methodological 129 (2019), pp. 122–155.
[58] Jayita Chakraborty et al. “A review of Ride-Matching strategies for Ridesourcing and other
similar services”. In: Transport Reviews (2020), pp. 1–22.
[59] Amirmahdi Tafreshian and Neda Masoud. “Trip-based graph partitioning in dynamic
ridesharing”. In: Transportation Research Part C: Emerging Technologies (2020), pp. 532–553.
[60] Amirmahdi Tafreshian and Neda Masoud. “Using subsidies to stabilize peer-to-peer
ridesharing markets with role assignment”. In: Transportation Research Part C: Emerging
Technologies 120 (2020).
[61] Amirmahdi Tafreshian, Neda Masoud, and Yafeng Yin. “Frontiers in Service Science: Ride
Matching for Peer-to-Peer Ride Sharing: A Review and Future Directions”. In: Service Science
12.2-3 (2020), pp. 44–60.
[62] Zhenhao Zhang, Amirmahdi Tafreshian, and Neda Masoud. “Modular transit: Using
autonomy and modularity to improve performance in public transportation”. In:
Transportation Research Part E: Logistics and Transportation Review 141 (2020), p. 102033.
[63] Amirmahdi Tafreshian et al. “Proactive shuttle dispatching in large-scale dynamic dial-a-ride
systems”. In: Transportation Research Part B: Methodological 150 (2021), pp. 227–259.
[64] Xingbin Zhan et al. “A modified artificial bee colony algorithm for the dynamic ride-hailing
sharing problem”. In: Transportation Research Part E: Logistics and Transportation Review
150 (2021), p. 102124.
33
... Another strand of relevant literature examines operations and pricing for shared transportation platforms. While some work assumes that drivers follow platform recommendations (e.g., Bertsimas et al. (2019), Alonso-Mora et al. (2017), Braverman et al. (2019, Wen et al. (2017), Lyu et al. (2023), Zhang and Masoud (2021), Guo et al. (2021Guo et al. ( , 2022 and Lei et al. (2018)), other studies assume Bayes-rational freelance drivers. For example, Bai et al. (2019) proposes a queuing model to optimize platforms' profit while considering price-sensitive customers and earnings-sensitive drivers. ...
... Zhu et al. (2021) assumes that drivers use a Markov decision process to choose work and relocation decisions to maximize their long-term earnings. Other example of labor supply models are Cachon et al. (2017), Zha et al. (2017), Yan et al. (2020), Ke et al. (2020), Bimpikis et al. (2019), Xu et al. (2020), Yang et al. (2020), Urata et al. (2021), Dong et al. (2021), Zhang and Nie (2021), Bahrami et al. (2022) and Battifarano and Qian (2019). ...
Article
Freelance drivers in the shared mobility market frequently switch or work for multiple platforms, affecting driver labor supply. Due to the importance of driver labor supply for the shared mobility market, understanding drivers' switching and multi-homing behavior is vital to managing service quality on-and effective regulation of-mobility platforms. However, a lack of individual-level data on driver behavior has thus far impeded a deeper understanding. This paper taxonomizes and estimates perceived switching and multi-homing frictions on mobility platforms. Based on a structural model of driver labor supply, we estimate switching and multi-homing costs in a platform duopoly using public and limited high-level survey data. Estimated costs are sizeable, and reductions in multi-homing and switching costs significantly affect platform market shares and driver welfare. Driver labor supply elasticity with respect to platform wage is also discussed considering both multi-homing and switching frictions.
... It is a critical step to construct a large-scale computing standard platform. With the widespread application of localization technology, spatialtemporal mode manifested through passenger trajectories is extensively utilized in travel demands clustering [38,56,57]. Depending on the spatial-temporal information, passengers are segregated into k independent subsets as illustrated in Figure 6. ...
Article
Full-text available
Demand responsive transit (DRT) with app‐based reservation platforms is experiencing a renaissance to bring the tremendous potential for mobility in the urban universe. Nevertheless, how to attract and retain passengers for long‐term use has become one of the most significant problems. The decision‐making psychology of passengers is often overlooked but incredibly critical in the practical applicability of DRT services. This paper proposes a more flexible DRT service with soft time windows considering boundedly rational passengers. A compensation mechanism is developed to alleviate the dissatisfaction of passengers while considerably promoting the system efficiency. A two‐stage model is designed to incorporate bounded rationality into the optimization process of mixed demand, including the static phase for reservation passengers and the dynamic phase for real‐time passengers. To enhance the computational efficiency, a hybrid heuristic algorithm combining spatiotemporal clustering and non‐dominated sorting genetic algorithm (NSGA)‐II is constructed to obtain the Pareto solutions set. An illustrative example of the Nguyen–Dupuis network is presented to demonstrate the validity of the algorithm. Subsequently, a large‐scale case study in Beijing evaluated the applicability of DRT in the real‐world network. The results reveal that dynamic DRT with compensation mechanism can substantially improve the system performance while ensuring the service quality. The response rate of passengers has been dramatically promoted to 80%. The operating profit has been enormously improved by up to 73%. Therefore, this study is radically conducive to understanding the passenger's decision‐making psychology while constructing a more cost‐efficient flexible strategy for the service provider.
... In recent years, many researchers have studied different forms of P2P ridesharing to enhance the performance of these services (see e.g., Tafreshian et al. (2020) for a recent review of these studies). However, the overwhelming majority of these studies assumed systems with complete information and focused merely on improving the operations (routing, scheduling, and matching) of these services (see e.g., Tafreshian and Masoud (2020a); Zhang and Masoud (2021); Masoud et al. (2017b); Masoud and Jayakrishnan (2017)). Only a few studies considered P2P ridesharing systems as a market in which the information of users' trips are unknown and need to be solicited from the participants upon registering their trips into the system. ...
Article
Full-text available
Traffic congestion during peak periods has become a serious issue around the globe, mainly due to the high number of single-occupancy commuter trips. Peer-to-peer (P2P) ridesharing platforms can present a suitable alternative for serving commuter trips. However, they face a major obstacle that prevents them from being a viable mode of transportation in practice: ridesharing users often provide tight time windows, which ultimately leads to a low matching rate. This study addresses this issue by introducing a subsidy scheme that allocates incentives to encourage a few carefully selected set of travelers to change their desired departure or arrival times, and thereby form successful matches. In order to implement this scheme for a ridesharing platform in the existence of private information, we propose an auction-based mechanism that guarantees truthfulness, individual rationality, budget-balance, and computational efficiency. Using numerical experiments, we show the merits of the proposed subsidy scheme when compared to its no-deficit variant as well as the conventional VCG mechanism.
Article
Full-text available
The problem of dispatching shuttles to serve trip requests can be mathematically formulated as a dial-a-ride problem (DARP). With on-demand mobility services gaining more popularity due to the recent developments in the gig economy, communication technologies, and urbanization, the real-time application of DARP is attracting ever more interest. However, the fact that the size of DARP grows exponentially with number of requests and number of available seats renders the current solution methodologies inadequate for online applications. In order to tackle this issue, we propose a general framework that shifts much of the computational burden of the optimization problems that need to be solved into an offline phase, thereby addressing on-demand requests with fast and high-quality solutions in real time. Using numerical experiments, we demonstrate the benefits of the proposed method. Furthermore, we conduct a sensitivity analyses to show the performance of our methodology under different parameter settings.
Article
Full-text available
Ridesourcing services have emerged as a popular alternative for commuters in metropolitan areas. There is a significant spatio-temporal variation of demand and supply for such services, which requires efficient ride-matching strategies to ensure optimal allocation of trips to drivers and users. This paper reviews different ride-matching techniques/strategies that highlight the outlook of different stakeholders, such as, drivers, users, and service providers and summarises the impacts of the matching process on the ideologies of the stakeholders. The review found that searching techniques guide the primary stakeholders like riders and drivers, and the assignment techniques ensure trip allotment. We also observed that fleet size is an important attribute to ensure availability as well as the assignment of ridesourcing services in an urban area.
Article
Full-text available
Peer-to-peer (P2P) ridesharing is a form of shared-use mobility that has emerged in recent decades as a result of enabling of the sharing economy, and the advancement of new technologies that allow for easy and fast communication between individuals. A P2P ridesharing system provides a platform to match a group of drivers, who use their personal vehicles to travel, with their peer riders who are in need of transportation. P2P ridesharing systems are traditionally categorized as two-side markets, with two mutually exclusive sets of agents, i.e., riders and drivers. Fixing the roles of participants a priori, however, could come at an opportunity cost of missed social welfare/revenue for the system. Consequently, this paper proposes a new market game, and its corresponding mathematical formulation, that outputs matching, role assignment, and pricing. We investigate the stability properties of this market game, and present a mathematical formulation that yields a stable outcome if one exists. Furthermore, we propose a Lagrangian relaxation algorithm to obtain a stable solution for large-scale games with empty cores through subsidizing the system. Using numerical experiments, we demonstrate the benefits of the proposed methodology, and its advantages over previously proposed methods for stabilizing non-bipartite graphs.
As a consequence of the sharing economy attaining more popularity, there has been a shift toward shared-use mobility services in recent years, especially those that encourage users to share their personal vehicles with others. To date, different variants of these services have been proposed that call for different settings and give rise to different research questions. Peer-to-peer (P2P) ridesharing is one such service that provides a platform for drivers to share their personal trips with riders who have similar itineraries. Unlike ride-sourcing services, drivers in P2P ridesharing have their own individual trips to make, and are not driving for the sole purpose of serving rider requests. Unlike traditional carpooling, P2P ridesharing can serve on-demand and one-time trip requests. P2P ridesharing has been identified as a sustainable mode of transportation that results in several individual and societal benefits. The core of a P2P ridesharing system is a ride-matching problem that determines ridesharing plans for users. This paper reviews the major studies on the operations of P2P ridesharing systems, with a focus on modeling and solution methodologies for matching, routing, and scheduling. In this paper, we classify ridesharing systems based on their operational features, and review the existing methodologies for each class. We further discuss a number of important directions for future research.
In this paper we investigate a new form of automated public transportation, named 'modular transit', configured to overcome the shortcomings of the traditional bus, including the first- and last-mile problem, low occupancy, and low levels of comfort, accessibility, and flexibility. The modular transit system consists of a set of trailer modules that can travel locally to serve demand and to connect travelers to main modules for long-distance trips. We mathematically model this system on a time-expanded network, thereby reducing the size of the optimization problem and rendering the problem amenable to being solved with commercial optimization engines. We conduct extensive numerical experiments and sensitivity analyses to study the performance of modular buses under various configurations. Finally, we compare the modular transit service with a door-to-door shuttle service as a benchmark to showcase the benefits of modular transit.
A dynamic ridesharing system is a platform that connects drivers who use their personal vehicles to travel with riders who are in need of transportation, on short notice. Since each driver/rider may have several potential matches, to achieve a high performance level, the ridesharing operator needs to make the matching decisions based on a global view of the system that includes all active riders and drivers. Consequently, the ride-matching problem that needs to be solved can become computationally expensive, especially when the system is operating over a large region, or when it faces high demand levels during certain hours of the day. This paper develops a graph partitioning methodology based on the bipartite graph that arises in the one-to-one ride-matching problem. The proposed method decomposes the original graph into multiple sub-graphs with the goal of reducing the overall computational complexity of the problem as well as providing high-quality solutions. We further show that this methodology can be extended to more complex ride-matching problems in a dynamic ridesharing system. Using numerical experiments, we showcase the advantages of the new partitioning method for different forms of ride-matching problems. Moreover, a sensitivity analysis is conducted to show the impact of different parameters on the quality of our solution.
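To illustrate why decomposing the rider-driver bipartite graph can shrink the matching problem, the minimal sketch below splits a set of feasible (rider, driver) pairs into connected components, each of which could be matched independently without losing optimality. This is only the simplest exact decomposition, not the partitioning method of the abstract above; the function name `split_into_components` and the sample identifiers are assumptions made for illustration.

```python
from collections import defaultdict


def split_into_components(edges):
    """Group feasible (rider, driver) pairs into independent sub-problems.

    `edges` is an iterable of (rider_id, driver_id) pairs. Two pairs end up
    in the same group when they are linked through shared riders or drivers.
    """
    parent = {}

    def find(node):
        # Union-find lookup with path halving.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    edges = list(edges)
    for rider, driver in edges:
        # Tag each id with its side so a rider and a driver never collide.
        union(("r", rider), ("d", driver))

    groups = defaultdict(list)
    for rider, driver in edges:
        groups[find(("r", rider))].append((rider, driver))
    return list(groups.values())


# Hypothetical feasibility edges: r1 and r2 compete for d1; r3 and r4 share d3.
edges = [("r1", "d1"), ("r2", "d1"), ("r3", "d2"), ("r4", "d3"), ("r3", "d3")]
components = split_into_components(edges)
for component in components:
    print(component)
```

Each component can then be handed to any one-to-one matching solver. When the graph is one large connected component, a heuristic partition that cuts some edges is needed instead, which trades solution quality for speed.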
Peer-to-peer (P2P) ridesharing is a relatively new concept that aims to provide a sustainable method for transportation in urban areas. Previous studies have demonstrated that a system that incorporates both P2P ridesharing and transit would enhance mobility. We develop schemes to provide travel alternatives, routes, and information across multiple modes within the network, including P2P ridesharing, transit, city bike-sharing, and walking. This study includes a case study of the operation of a multimodal system that includes P2P ridesharing participants (both drivers and riders), the Los Angeles Metro Red Line subway, and the Los Angeles downtown bike-share system. The study conducts a simulation, enhanced by an optimization layer, of providing travel alternatives to passengers during morning peak hours. The results indicate that a multimodal network expands the coverage of public transit, and that ride- and bike-sharing could be effective transit feeders when properly designed and integrated into the transit system.
Ride-hailing sharing involves grouping ride-hailing customers with similar trips and time schedules to share the same ride-hailing vehicle to reduce their total travel cost. With current information and communication technology, ride-hailing customers and drivers can be matched in real time via a ride-hailing platform. This paper formulates a dynamic ride-hailing sharing problem that simultaneously maximizes the number of served customers, minimizes the travel cost and travel time ratios, and considers the capacity, time window, and travel cost constraints. The travel cost ratio is the ratio of the passengers’ actual fare to the passengers’ fare without ride-hailing sharing, whereas the travel time ratio is defined as the actual travel time (including waiting time) over the maximum allowable travel time. To solve the dynamic problem, it is divided into many small, consecutive static subproblems with an equal time interval. Each subproblem is solved by a modified artificial bee colony (MABC) algorithm with path relinking, while contraction hierarchies and a vantage point tree are used to determine the shortest path and accelerate the algorithm, respectively. Problem properties and the performance of the proposed solution method are demonstrated using large-scale real-time data from Didi, the largest ride-hailing company in China. The proposed method is shown to outperform the benchmark, i.e., a greedy randomized adaptive search procedure (GRASP) with path relinking. The proposed method also performs better when each time interval is longer and when the tolerance for the incremental travel time caused by detours is higher. We also demonstrate that (a) considering both travel cost and travel time ratios in the objective can achieve a better sharing percentage, and balance the increase in the travel time ratio and the decrease in the travel cost ratio, compared with an objective that omits either the travel time ratio or the travel cost ratio; and (b) passengers can gain a large out-of-pocket cost saving with ride-hailing sharing while enduring a relatively small increase in travel time compared with the case without ride-hailing sharing.
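The two ratios and the equal-interval decomposition described in this abstract can be sketched in a few lines. The example below is a minimal illustration under stated assumptions, not the cited paper's implementation: the function names, the use of seconds as the time unit, and the sample requests are all hypothetical.

```python
from collections import defaultdict


def travel_cost_ratio(actual_fare, fare_without_sharing):
    """Actual fare divided by the fare the passenger would pay riding alone."""
    return actual_fare / fare_without_sharing


def travel_time_ratio(actual_travel_time, max_allowed_travel_time):
    """Actual travel time (waiting included) over the maximum allowable time."""
    return actual_travel_time / max_allowed_travel_time


def batch_requests(requests, interval):
    """Split time-stamped requests into consecutive windows of equal length.

    `requests` is a list of (request_id, request_time) tuples with times in
    seconds; `interval` is the window length in seconds. Returns a list of
    (window_start, [request_ids]) sorted by start time; empty windows are
    omitted.
    """
    if interval <= 0:
        raise ValueError("interval must be positive")
    windows = defaultdict(list)
    for request_id, request_time in requests:
        windows[int(request_time // interval)].append(request_id)
    return [(index * interval, ids) for index, ids in sorted(windows.items())]


# Hypothetical requests, batched into 60-second static subproblems.
requests = [("a", 5), ("b", 42), ("c", 61), ("d", 119), ("e", 185)]
batches = batch_requests(requests, interval=60)
print(batches)
print(travel_cost_ratio(12.0, 16.0), travel_time_ratio(1500, 2000))
```

Each batch would then be passed to a static solver. A longer interval gives the solver more requests to combine per batch, at the cost of a longer wait before a request is considered.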
With the rapid development and popularization of mobile and wireless communication technologies, ridesourcing companies have been able to leverage internet-based platforms to operate e-hailing services in many cities around the world. These companies connect passengers and drivers in real time and are disruptively changing the transportation industry. As pioneers in a general sharing economy context, ridesourcing shared transportation platforms consist of a typical two-sided market. On the demand side, passengers are sensitive to the price and quality of the service. On the supply side, drivers, as freelancers, make working decisions flexibly based on their income from the platform and many other factors. Diverse variables and factors in the system are strongly endogenous and interactively dependent. How to design and operate ridesourcing systems is vital—and challenging—for all stakeholders: passengers/users, drivers/service providers, platforms, policy makers, and the general public. In this paper, we propose a general framework to describe ridesourcing systems. This framework can aid understanding of the interactions between endogenous and exogenous variables, their changes in response to platforms’ operational strategies and decisions, multiple system objectives, and market equilibria in a dynamic manner. Under the proposed general framework, we summarize important research problems and the corresponding methodologies that have been and are being developed and implemented to address these problems. We conduct a comprehensive review of the literature on these problems in different areas from diverse perspectives, including (1) demand and pricing, (2) supply and incentives, (3) platform operations, and (4) competition, impacts, and regulations. The proposed framework and the review also suggest many avenues requiring future research.
Carpooling has been considered a solution for alleviating traffic congestion and reducing air pollution in cities. However, quantifying the benefits of large-scale carpooling in urban areas remains a challenge due to insufficient travel trajectory data. In this study, a trajectory reconstruction method is proposed to capture vehicle trajectories based on citywide license plate recognition (LPR) data. Then, the prospects of large-scale carpooling in an urban area are evaluated under two scenarios, namely, all vehicle travel demands under a real-time carpooling condition and commuter vehicle travel demands under a long-term carpooling condition, by solving an integer programming model based on an updated longest common subsequence (LCS) algorithm. A maximum weight non-bipartite matching algorithm is introduced to find the optimal solution for the proposed model. Finally, road network trip volume reduction and travel speed improvement are estimated to measure the traffic benefits attributed to carpooling. This study is applied to a dataset that contains millions of LPR records collected in Langfang, China over 1 week. Results demonstrate that under the real-time carpooling condition, the total trip volumes for different carpooling comfort levels decrease by 32–49%, and the peak-hour travel speeds on most road segments increase by 5–40%. The long-term carpooling relationship among commuter vehicles can reduce commuter trips by an average of 30% and 24% in the morning and evening peak hours, respectively, during workdays. This study shows the application potential and promotes the development of this vehicle travel mode.
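As a rough illustration of how a longest common subsequence can measure the overlap between two vehicle trajectories, the sketch below applies the classic dynamic-programming LCS to two sequences of road-segment identifiers. It is the textbook algorithm, not the updated variant used in the study above, and the segment labels are made up for the example.

```python
def lcs_length(route_a, route_b):
    """Length of the longest common subsequence of two segment sequences."""
    rows, cols = len(route_a), len(route_b)
    # table[i][j] holds the LCS length of route_a[:i] and route_b[:j].
    table = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            if route_a[i - 1] == route_b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[rows][cols]


# Hypothetical trajectories expressed as ordered road-segment ids.
route_a = ["s1", "s2", "s3", "s4", "s5"]
route_b = ["s0", "s2", "s3", "s5", "s6"]
shared = lcs_length(route_a, route_b)
print(shared, shared / min(len(route_a), len(route_b)))
```

A similarity score of this kind could serve as an edge weight between two vehicles in a matching graph. Note that a subsequence need not be contiguous, so a practical carpooling check would also have to account for detours and timing.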