A Distributed Algorithm for Operating Large-Scale Ridesourcing Systems

September 20, 2021

Ruolin Zhang¹, Neda Masoud¹

¹Civil and Environmental Engineering, University of Michigan, Ann Arbor

Corresponding Author – Email: nmasoud@umich.edu

Abstract

With ridesourcing services gaining popularity in the past few years, there has been growing interest

in algorithms that could enable real-time operation of these systems. As ridesourcing systems rely

on independent entities to build the supply and demand sides of the market, they have been shown

to operate more successfully in metropolitan areas where there is a high level of demand for rides

as well as a high number of drivers, and a large volume of trips occurring within a geographically

constrained region. Despite the suitable ecosystem that metropolitan areas oﬀer for ridesourcing

operations, there is a lack of methods that can provide high-quality matching solutions in real-time.

To ﬁll this gap, this paper introduces a framework that allows for solving the large-scale matching

problems by means of solving smaller problems in a distributed fashion. The proposed methodology

is based on constructing approximately-uniform clusters of trip requests, where vehicle tours form

cluster centers. Using the New York Taxi dataset, we compare the performance of the proposed

methodology against three benchmark methods to showcase its advantages in terms of solution

quality and solution time.

1 Introduction

In recent years, population and economic growth have worsened traffic congestion in
metropolitan areas, directly increasing exhaust emissions as well as travel
time and cost. Single-occupancy vehicles are a major source of carbon dioxide emissions

[18]–a problem that is exacerbated due to congestion. However, scaling-up the infrastructure to

meet the growing demand is constrained and costly. Therefore, seeking solutions to increase the

utilization rate of the existing transportation infrastructure has been the focus of extensive research

in the past decade.

A number of alternative modes of transportation have been introduced to expand the utilization

rate of the existing transportation infrastructure. Public transportation is a traditional means to

reduce the number of single-occupancy vehicles. Public transportation systems, such as buses and

rail systems, are generally regulated on a ﬁxed schedule and operated on established routes, and

charge a posted fee for each trip. Although having a ﬁxed schedule and route could lead to oﬀering

more reliable services, the limited operational ﬂexibility of public transportation services leads to

more constrained coverage, both spatially and temporally. This has led to a growing interest in

shared mobility options, which introduce more ﬂexibility and comfort compared to ﬁxed public

transportation options, but oﬀer discounted prices compared to taxis and other private modes of

transportation.

Technological advancements such as GPS-enabled smart personal devices, online payment

systems and big data together with a global quest for environmentally-friendly and cost-eﬃcient

mobility options have led to the emergence of a signiﬁcant number of internet-based companies

around the globe that oﬀer ride-sharing and ridesourcing services to satisfy on-demand requests.

Examples ([27]) include Flinc [2], Ville Fluide [3], Carticipate [5], Uber [6], and Lyft [4]. Beneﬁts of

ride-sharing consist of saving travel cost and possibly travel time for drivers and riders, alleviating

traﬃc congestion, conserving fuel, and mitigating air pollution. According to the National

Household Travel Survey (NHTS), which is the authoritative source reporting on the travel behavior

of the American public, the average light vehicle occupancy (the number of travelers per vehicle trip)

is relatively low–1.67 in 2017, unchanged from 2009. Therefore, ride-sharing services have great

potential for development. As such, devising algorithmic tools for real-time matching of drivers

and riders in a ride-sharing system, or ridesourcing with pooling, also known as the ride-matching

problem, is an important and timely topic of research.

This paper introduces a methodology for eﬃciently solving the one-to-many ride-matching

problem, in which a driver can carry multiple passengers at once or in sequence. More speciﬁcally,

we introduce a clustering method to decompose the problem into multiple sub-problems such

that sub-problems can be optimized independently of each other and in parallel. The proposed

method guarantees that the sizes of the sub-problems remain approximately uniform, since the

computational complexity of the ride-matching problem grows exponentially with the size of the

problem. We use the New York City Taxi dataset [1] to perform numerical experiments. To evaluate

the performance of our proposed methodology, we compare the results with the optimal solution as

well as three partitioning methods from the literature, namely point-based, balanced point-based,

and trip-based partitioning. We also conduct sensitivity analysis to test the impact of the degree

of uniformity in the size of sub-problems and the number of clusters on the computation time and

the objective function.

2 Literature Review

In this section we will review the relevant studies from the graph partitioning and ridesharing

literature.

2.1 Graph Partitioning

When modeling problems in diﬀerent application domains, researchers often use graphs as

abstractions [42]. Splitting a graph into smaller sub-graphs is one of the basic algorithmic operations

that allows for solving a large-scale problem by means of solving smaller problems that correspond to

sub-graphs, in a distributed fashion. In the past decade, graph partitioning has gained increasingly

higher popularity due to the emergence of larger problem instances in various application domains.

Applications of graph partitioning in practice can be found in parallel processing [24,31], complex

networks [21], transportation networks [38,22], and image processing [14,36], among others.

A graph can be represented by a set of vertices and weighted edges. A graph partitioning

problem seeks to partition vertices and/or edges into diﬀerent sub-graphs. In a number of

application domains, balancing constraints are also imposed during graph partitioning to ensure
that all clusters have (approximately) equal weights, number of edges, or number of vertices. An
imbalance parameter ε can be used to impose the balancing constraint.
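For reference, one common way of writing such a balancing constraint is sketched below. It is a standard form from the graph partitioning literature and is stated here as an assumption, not necessarily the exact constraint used later in this paper. The symbols $V$ (vertex set), $k$ (number of blocks), and $V_1, \dots, V_k$ (the blocks of the partition) are introduced only for this illustration.

```latex
|V_i| \;\le\; (1+\varepsilon)\left\lceil \frac{|V|}{k} \right\rceil,
\qquad i = 1, \dots, k .
```

Under this form, $\varepsilon = 0$ corresponds to a perfectly balanced partitioning, and larger values of $\varepsilon$ allow the block sizes to deviate further from the average size.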

Graph partitioning can be formulated to optimize diﬀerent objective functions, which reﬂect

diﬀerent objectives of graph partitioning for diﬀerent application domains. The most prominent

objective function is to minimize the total cut, typically quantiﬁed by the total weight of removed

edges from the original graph. It has been shown that the problem of dividing a graph into k
clusters of approximately equal size to minimize an objective function is NP-complete [14]. [13]
shows that on a general graph, a perfectly balanced partitioning (ε = 0) has no constant-factor
approximation. If ε ∈ (0, 1], an O(log² n)-factor approximation can be achieved.

There are a large number of methods to solve the graph partitioning problem. They can be

divided into two groups, namely, global algorithms, and local algorithms [42]. Global algorithms are

methods that apply to the entire graph and directly obtain the solution. This family of algorithms

could include exact methods [28,50,10,45,52,48] and heuristic algorithms [7,56,62,51]. These

algorithms are usually used for smaller graphs. Many of these methods are limited to bi-partitioning

but can also be applied to k-clustering by recursion [42]. Local algorithms, on the other hand, are

based on a starting solution that is iteratively improved. Examples of this

family of algorithms include local search [7,8] and ﬂow-based improvement [37,46].

Many applications in the transportation domain can be modeled using graphs, rendering graph

partitioning an important tool, especially due to the higher penetration rate of shared mobility as

well as emergence of connectivity that lead to higher complexity of transportation problems. In this

paper, we propose a methodology that utilizes approximately-uniform graph partitioning/clustering

for the real-time ride-matching problem. We formulate the problem as a clustering problem that

assigns trips to clusters such that a total cost is minimized, where we use vehicle tours as cluster

representatives, and impose uniformity constraints on clusters.

2.2 Dynamic Ride-sharing

According to the 2018 Global Traﬃc Scorecard, Americans lost an average of 97 hours a year due

to traﬃc congestion, costing nearly $87 billion in 2018, an average of $1,348 per driver. Increasing

the utilization rate of vehicles can be an eﬀective way to reduce vehicle-miles-traveled (VMT) and

improve traﬃc congestion.

Ride-sharing has garnered plenty of attention in recent years due to its eﬀectiveness in utilizing

empty car seats [34]. A ride-sharing system aims to bring together participants with compatible

routes and time schedules to share a vehicle. Here, we focus on a dynamic ride-sharing system,

where requests to participate in the system can be made at any point in time. Dynamic ride-sharing

concentrates on single, non-recurring trips, diﬀerentiating it from conventional carpooling [12,55],

which focuses on recurring trips. In such a system, participants can request a trip as riders or

provide rides as drivers. Participants input their requests, including the origins and destinations of

their trips and their trip timelines, and the operator of the system makes arrangements to match

drivers with riders on a short notice or even en-route. In a dynamic ride-sharing system, typically

a central ride-matching problem is solved periodically.

Ride-matching problems can have diﬀerent objectives, such as minimizing system-wide VMT

[63], maximizing the operator’s proﬁt [43], and maximizing the number of matched participants

[masoud2017decomposition], among others. When matching a driver with a rider, several

constraints must be considered. Many studies let a rider or driver provide their earliest departure

and latest arrival times, constructing a time window that constrains the matches [26]. In addition

to travel time, a number of other constraints may be imposed to satisfy a participant’s needs and

preferences [29].

A ride-sharing system can adopt one of several possible strategies when matching riders with

drivers: (1) a single rider matched with a single driver, (2) a single driver matched with multiple riders
(i.e., pooling), (3) a single rider matched with multiple drivers (i.e., multi-hop matching), and (4)
multiple-rider, multiple-driver arrangements (i.e., pooling in a multi-hop system) [26,34]. Typically,

ride-matching algorithms are developed assuming that rider and driver roles are ﬁxed. However,

a number of studies have considered the more general but complex scenario where a portion of

participants are ﬂexible and can take any role that is assigned to them by the system [23,60].

A review of ride-matching algorithms for diﬀerent system conﬁgurations is shown in Table 1. A

comprehensive review of ride-matching methods can be found in [61].

The simplest form of ride-matching involves matching a single rider with precisely a single

driver, also known as one-to-one matching. [23] formulates the one-to-one ride-matching problem as

a maximum weight bipartite matching problem. They use optimization-based approaches

with a rolling horizon strategy to solve the ride-matching problem. [53] proposes a clustering

heuristic algorithm to solve the one-to-one matching problem. More complicated forms of the ride-

matching problems aim to increase the number of riders on board beyond a single rider to take

advantage of empty seats in a vehicle. [30] proposes a genetic and insertion heuristic algorithm to

solve the ride-matching problem in which a driver can serve more than one rider, also known as the

one-to-many ride-matching problem. [32] formulates the one-to-many ride-matching problem as a

mixed integer-linear programming problem that can be solved by commercial optimization engines.

[41] models the system as a maximum weight bipartite matching problem, where the number of

stops for a rider is limited to two. [47] presents a scalable mathematical formulation of the one-

to-many problem, where within multiple steps vehicles are matched with groups of passengers.

Through numerical experiments, they demonstrate that when re-purposed for ridesharing, only a

fraction of taxis in the New York Taxi dataset can serve almost the entire demand for rides.

The one-to-many matching problem, also known as the pooling problem, arises in systems

similar to ridesharing, such as taxi sharing and pooled variants of ridesourcing. To serve on-

demand requests by shared taxis, [35] ﬁrst prunes the search space by identifying candidate taxis

that can potentially serve a request, and then uses a scheduling algorithm to ﬁnd a match that

imposes the least added distance traveled. [40] proposes a greedy randomized adaptive search

procedure to operate a ridesharing/taxi sharing system with on-demand request, and demonstrates

using numerical experiments that pooling could reduce the cost of trips by about 30% compared

to private rides. [64] proposes an artiﬁcial bee colony algorithm for serving requests in a pooled

ridesourcing system, where the objective is to maximize the number of served ride requests while

at the same time minimizing travel time and cost ratios. Their numerical experiments demonstrate

that pooling in ridesourcing systems can lead to substantial cost savings with minimal increase in

travel time.

A number of studies allow a rider to transfer between multiple vehicles to complete his/her

trip, giving rise to multi-hop systems. [25] proposes an evolutionary multi-objective route planning

algorithm to solve a many-to-one ride-matching problem in which riders can transfer between

diﬀerent drivers, but a driver can transport a single passenger. [33] provides a solution to a

ride-matching problem with an arbitrary number of transfers that respect the users’ personal

preferences using a graph searching algorithm. Many-to-many matching problems are the most

complex problems that allow riders to transfer between diﬀerent drivers and, at the same time,

drivers to have more than one rider on board. [20] models the many-to-many ride-matching
problem and solves relatively small instances of the problem using a branch-and-cut method. [29]
proposes a spatial, temporal, and hierarchical decomposition solution strategy to solve the
many-to-many ride-matching problem, formulated as a mixed-integer program. However, the number
of transfers is limited. [49] models the multi-hop ride-matching problem as a binary program and
proposes a decomposition algorithm to solve this problem. A comprehensive review of ride-matching
problems for different system configurations, including ride-sharing, ridesourcing, taxi-sharing, and

demand-responsive services, can be found in [61,57,58]. For a comprehensive literature review on

the dial-a-ride problem, see [11,16,54].

| Study | Dynamic | Configuration | No. of participants | Computation time (s) | Solution methodology |
|---|---|---|---|---|---|
| [23] | ✓ | single rider, single driver | 4,933 | 78 | Max-weight bipartite matching with rolling horizon strategy |
| [53] | ✓ | single rider, single driver | 15,412 | 80.21 | Clustering heuristic algorithm with rolling horizon strategy |
| [30] | ✓ | single driver, multiple riders | 744 | 100 | Genetic and insertion heuristic algorithms |
| [41] | | single driver, multiple riders | 2,849 | 150 | Bipartite matching formulation with meeting points |
| [25] | ✓ | multiple drivers, single rider | 250 | 1.476 | Evolutionary multi-objective route planning problem solved by the NSGA-II algorithm |
| [49] | | multiple drivers, multiple riders | 400 | 6 | An exact decomposition algorithm |
| [44] | | all but multiple drivers, multiple riders | 0-2000 | | Decentralized (dynamic auction-based multi-agent) optimization algorithms |
| [39] | ✓ | single rider, single driver | 4,500 | | A partition-based match-making algorithm |
| [59] | ✓ | single rider, single driver | 22,000 | 36.32 | A graph partitioning methodology |
| This paper | ✓ | single driver, multiple riders | 20,000 | 42.31 | Approximately-uniform clustering with tours as cluster representative |

Table 1: A review of ride-matching methods. (If the study focuses on dynamic ride-matching, the number listed under the No. of participants column is the number of participants during a one-hour horizon.)

In major metropolitan areas where thousands of ride requests arrive dynamically, centralized

ride-matching methods may fall short in providing high-quality solutions in real-time. Consequently,

several attempts have been made in the literature to develop decentralized optimization schemes.

One approach to decentralize the decision making process is adopting agent-based models. [15]

considers the transportation network as a mobile geo-sensor network of agents that interact locally

by short-range communication and heuristic way-ﬁnding strategies. [17] introduces an agent-based

approach where intelligent vehicle agents schedule their routes. [44] sets up a single-shot ﬁrst-price

Vickrey auction where the virtual driver–passenger agents are matched.

Another approach to decentralized decision making is dividing the problem into multiple smaller

sub-problems that can be solved independently of each other. [39] partitions the road network

into distinct sub-networks that deﬁne the search space for ride matches. Recently, [53] proposed a

framework that embeds a network clustering heuristic within an on-demand ride-sharing system. To

address participants whose origins and destinations fall in diﬀerent clusters, they solve an additional

matching problem. We refer to this method as “point-based clustering”. In another recent work,

[59] proposed a trip-based graph partitioning method for dynamic ridesharing systems. In contrast

to [53], they used complete trips, rather than trip ends, to form clusters, leading to partitions that

may be geographically overlapping. Furthermore, they form clusters that are approximately uniform

in size to reduce solution time. In Section 7, we use three benchmark methods to evaluate the

performance of our proposed methodology: point-based clustering, balanced point-based clustering

in which uniformity constraints are imposed when forming clusters in point-based clustering, and

trip-based clustering.

2.3 Our Contributions

In this paper, we propose a framework to solve the one-to-many ride-matching problem that arises

in ride-sharing and ridesourcing systems in a distributed fashion. Our proposed method forms

approximately-uniform clusters of trips, and assigns drivers to trip clusters proportional to the

cluster sizes. Diﬀerent from the existing literature that uses points or trips as cluster centers

during partitioning, we use vehicle tours as cluster representatives to capture the fact that ride

requests can be pooled and served by a single vehicle even when they do not share the same

origin, destination, or time window. This allows for obtaining higher-quality solutions for one-to-

many ridesharing systems or ridesourcing systems with pooling. The proposed method decomposes

the original problem into smaller sub-problems that are approximately uniform in size within a

threshold of ε. The value of ε captures the trade-off between the complexity of solving the
sub-problems and the performance lost due to partitioning: setting ε = 0 ensures that all sub-problems
are uniform in size and therefore minimizes the solution time, but obtaining such a uniform partitioning
requires ignoring more potential matches, thereby reducing the solution quality. Finally, we compare the

results of our proposed methodology with three benchmark methods, namely point-based, balanced

point-based, and trip-based partitioning, as well as the optimal solution.

As such, the contributions of this paper can be summarized as follows. This is the ﬁrst study to

propose an approximately-uniform clustering method with tours as cluster representatives, which,
as demonstrated in the numerical experiments section, outperforms existing clustering approaches.

We develop an iterative procedure based on Lloyd’s algorithm [9] to ﬁnd approximately-uniform

clusters, and show that this clustering approach has favorable properties, such as monotonically

converging to a local optimal solution in a ﬁnite number of steps.

3 Problem statement

Consider a dynamic ride-sharing system that matches drivers with riders in a region over time. Let
us divide the study area into a set of stations $\mathcal{S} = \{s_1, s_2, \cdots, s_m\}$, where drivers and riders start
or end their trips. Furthermore, we divide the time horizon, e.g., an hour, into a series of short
time intervals, e.g., 1 min. After this discretization in time and space, a travel time matrix $T$ can
be used to retrieve the shortest path travel time between any pair of stations.

| Symbol | Description |
|---|---|
| Sets | |
| $\mathcal{S} = \{s\}$ | set of stations where drivers and riders start or finish their trips |
| $\mathcal{D} = \{1, \cdots, D\}$ | set of drivers |
| $\mathcal{R} = \{1, \cdots, R\}$ | set of riders/trips |
| $\mathcal{P} = \mathcal{R} \cup \mathcal{D}$ | set of participants $p$ |
| $\mathcal{R}_c$ | set of riders whose trips do not lie on tours |
| $\mathcal{K} = \{1, \cdots, K\}$ | set of clusters |
| $\mathcal{T}$ | set of time intervals $t$ |
| $L$ | set of links $l = (t_i, s_i, t_j, s_j) \in \mathcal{T} \times \mathcal{S} \times \mathcal{T} \times \mathcal{S}$ |
| $L_p$ | set of links that are accessible by participant $p$ |
| $L_{rd}$ | set of links on which driver $d$ can serve rider $r$, where $L_{rd} = L_r \cup L_d$ |
| $\mathcal{L}$ | set of links where $\ell(i, j) \in \mathcal{L}$ indicates rider $j$ can be served after rider $i$ |
| Indices | |
| $r, i, j$ | indices for rider/trip |
| $d$ | index for driver/vehicle |
| $p$ | index for participant (driver or rider) |
| $k$ | index for cluster/tour |
| $\ell = (i, j)$ | link $\ell$ exists if rider $r_j$ can be served following rider $r_i$ |
| Parameters | |
| $s^{O}_{p}$ | origin station of participant $p$ |
| $s^{D}_{p}$ | destination station of participant $p$ |
| $t^{ED}_{p}$ | earliest departure time of participant $p$ |
| $t^{LA}_{p}$ | latest arrival time of participant $p$ |
| $cap_d$ | the capacity of (the vehicle of) driver $d$ (i.e., the maximum number of riders allowed on board at one time) |
| Functions | |
| $T(s_i, s_j)$ | shortest-path travel time between station $i$ and station $j$ |
| $c(r, k)$ | the primary cost between rider trip $r$ and tour $k$ |
| $d(r, k)$ | the secondary cost between rider trip $r$ and tour $k$ |
| $\gamma(d, k)$ | the cost between driver $d$ and tour $k$ |
| Decision Variables | |
| $\omega_{rd}$ | binary decision variable that holds the value 1 if rider $r$ is matched with driver $d$, and the value 0 otherwise |
| $x^{l}_{d}$ | binary decision variable that holds the value 1 if driver $d$ travels on link $l$, and the value 0 otherwise |
| $y^{l}_{rd}$ | binary decision variable that holds the value 1 if rider $r$ is served by driver $d$ on link $l$, and the value 0 otherwise |
| $v_{ij}$ | binary decision variable that holds the value 1 if link $\ell_{ij}$ is selected as a part of the tour $u$, and the value 0 otherwise |
| $f_{rk}$ | binary decision variable that holds the value 1 if trip $r$ is assigned to cluster $k$, and the value 0 otherwise |
| $z_{dk}$ | binary decision variable that holds the value 1 if driver $d$ is assigned to cluster $k$, and the value 0 otherwise |

Table 2: Table of Notation

Let us consider a set of drivers $\mathcal{D}$ and a set of riders $\mathcal{R}$ in this system, and introduce set
$\mathcal{P} = \mathcal{R} \cup \mathcal{D}$ to include all participants. A rider $r \in \mathcal{R}$ registers her trip, including her origin station
$s^{O}_{r}$, destination station $s^{D}_{r}$, her earliest departure time $t^{ED}_{r}$, and her latest arrival time $t^{LA}_{r}$. A
driver $d \in \mathcal{D}$ registers her origin station $s^{O}_{d}$ and her earliest departure time $t^{ED}_{d}$. After serving
a rider, the driver's time and location will be updated to the rider's drop-off time and location,
respectively. Without loss of generality, here we assume that drivers are available for the entirety
of the study time horizon.

To accommodate the inherent uncertainty present in a dynamic ride-sharing system, we adopt a

rolling horizon strategy, in which the system operator solves the ride-matching problem periodically,

at evenly-spaced points in time to which we refer as re-optimization times. The time period between

two consecutive re-optimization times is a re-optimization period.

At each re-optimization time $n$, we formulate a ride-matching problem that consists of all
announced trips that have not yet expired or been finalized. An announced trip is considered expired if
its latest departure time, i.e., $t^{LA}_{r} - T(s^{O}_{r}, s^{D}_{r})$, occurs before the end of the current re-optimization
period. In Fig. 1, for example, all trips whose announcement times are before re-optimization
time $n$ and whose latest departure times are after re-optimization time $n+1$ will be considered in the
ride-matching problem in the specified re-optimization period. (Note that we require the latest
departure time of a trip to be later than $n+1$ to account for solution time.) An announced trip
is considered finalized if it has been previously matched in the ride-matching problem. Drivers
who were matched previously can always be part of the new ride-matching problem after accounting
for their previous assignments, i.e., the new origin station and earliest departure time of a driver
who is transporting a passenger will be set to the drop-off location and time of their onboard
passenger, respectively. The objective of the ride-matching problem is to maximize the number
of served riders. Since the dynamic ridesharing system solves ride-matching problems that are
structurally similar across re-optimization periods, in the rest of this paper we focus our discussion on
the optimization problem in a single re-optimization period.
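The trip-selection rule of the rolling horizon strategy can be illustrated with a short Python sketch. This is only an illustration under stated assumptions: the `Trip` class, its field names, and the function `trips_to_consider` are hypothetical, times are integer minutes, and the strict inequalities at the boundaries are our reading of "before" and "after" in the text.

```python
from dataclasses import dataclass


@dataclass
class Trip:
    """Hypothetical trip record; field names are illustrative only."""
    trip_id: str
    announced: int        # announcement time (minutes)
    latest_arrival: int   # latest arrival time t^LA_r (minutes)
    direct_time: int      # shortest-path travel time T(s^O_r, s^D_r) (minutes)
    finalized: bool = False  # True if the trip was matched in an earlier period

    @property
    def latest_departure(self) -> int:
        # Latest departure time: t^LA_r - T(s^O_r, s^D_r)
        return self.latest_arrival - self.direct_time


def trips_to_consider(trips, t_n, period):
    """Return the trips included in the problem solved at re-optimization time t_n."""
    t_next = t_n + period  # next re-optimization time, n + 1
    return [
        trip for trip in trips
        if trip.announced < t_n                # already announced
        and trip.latest_departure > t_next     # not expired
        and not trip.finalized                 # not matched previously
    ]
```

For example, with `t_n = 5` and `period = 10`, a trip announced at minute 0 with a latest departure time of minute 30 is kept, whereas a trip whose latest departure time is minute 10 is treated as expired.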

Figure 1: The rolling horizon implementation

4 The Ride-matching Problem

In this paper, we consider a one-to-many ride-matching problem in which a driver can serve multiple

riders. This ride-matching problem may arise in ride-sharing or ridesourcing systems with pooling

and can be formulated as in model (10) in Appendix I. This optimization problem is an NP-hard

problem, and cannot be readily adopted in a dynamic system. Hence, in this paper we introduce
a solution methodology that decomposes the original matching problem into multiple smaller
sub-problems that will be solved independently of each other, using model (10).

5 Illustrative Example

We use a small instance of a ridesourcing problem to clearly demonstrate each step of our proposed

method for a single static optimization problem. This example includes 4 drivers and 20 riders.

We use data from the New York City Taxi dataset [1] to construct this example. Due to the small
size of the problem, we only consider a single optimization period. Table 8 in Appendix II provides
information on the 24 participants in this example, including their roles (rider/driver), their origin
and destination stations, and their earliest departure and latest arrival times. Tables 9 and 10 in

Appendix III provide shortest path travel times between all pick-up and drop-oﬀ stations in this

example. We solved the ride-matching problem in model (10) for this small example using the

CPLEX solver in AMPL. The optimal matching results are demonstrated in Table 3. In the rest

of the paper, we apply the algorithms in each section to this example, and eventually compare the

outcome of the proposed methodology with that of the optimal solution.

| Driver | Served riders | Vehicle itinerary |
|---|---|---|
| d1 | r1, r6, r9, r20 | (19:00, 162) → (19:15, 103) → (19:22, 97) → (19:23, 97) → (19:30, 56) → (19:47, 50) → (19:54, 121) → (19:59, 125) → (20:06, 128) |
| d2 | r4, r11, r18 | (19:12, 95) → (19:25, 52) → (19:35, 124) → (19:36, 124) → (19:43, 76) → (19:54, 58) → (19:56, 58) → (20:04, 159) |
| d3 | r2, r7, r14 | (19:00, 161) → (19:13, 80) → (19:25, 33) → (19:33, 60) → (19:45, 78) → (19:47, 78) → (19:58, 161) |
| d4 | r3, r10, r16 | (19:07, 80) → (19:16, 129) → (19:32, 98) → (19:34, 98) → (19:40, 97) → (19:50, 98) → (19:52, 98) → (20:00, 130) |

Table 3: Optimal solution of the illustrative example. The itinerary of each driver is indicated as a sequence
of tuples. A tuple (t, s) indicates that the driver visits station s at time interval t.

6 Methodology

The matching problem in (10) can be solved quickly for the instance presented in section 5 using

commercial solvers, because of the problem’s relatively small size. However, solving the optimization

problem in model (10) for larger instances of the ride-matching problem can be computationally

prohibitive for real-time implementations. As such, in this paper, we propose a framework to

produce high-quality solutions in near real-time, depicted in Figure 2.

This framework includes a clustering algorithm, which we call ε-uniform tour-based clustering.

This clustering algorithm includes a tour-forming problem, in which a tour, i.e., sequence of trips,

is formed in each cluster to represent the trips in the cluster, and an assignment step, in which trips

are assigned to clusters based on their proximity to the tours representing the clusters and subject

to a uniformity constraint. By iteratively solving these two problems, the clustering algorithm

groups trips into multiple clusters that can be optimized independently of each other. After the

clusters of trips are formed, in section 6.2 we present an optimization-based algorithm to assign

drivers to clusters of trips. Finally, the matching problem presented in Appendix I can be solved

independently for each cluster.
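The alternating structure of the clustering step can be sketched as a generic loop in Python. This is a minimal skeleton under loud assumptions, not the paper's Algorithm 2: the function names `cluster_iteratively` and `capped_greedy_assign` are hypothetical, and the toy usage replaces the tour-forming problem with a simple mean, the trip-tour cost with a squared distance, and the uniformity-constrained assignment with a greedy rule that caps each cluster at ceil((1 + ε) n / k) members.

```python
import math
from statistics import mean


def cluster_iteratively(trips, k, form_rep, cost, assign, tol=1e-9, max_iter=100):
    """Alternate between forming cluster representatives and reassigning trips."""
    # Initialization: the paper splits trips randomly; a round-robin split is
    # used here only to keep the sketch deterministic.
    clusters = [list(trips[i::k]) for i in range(k)]
    prev_total = math.inf
    total = math.inf
    for _ in range(max_iter):
        # Step 1: one representative per cluster (assumes no cluster is empty).
        reps = [form_rep(members) for members in clusters]
        # Step 2: assign every trip to a cluster (assign returns one label per trip).
        labels = assign(trips, reps, cost)
        total = sum(cost(t, reps[lab]) for t, lab in zip(trips, labels))
        clusters = [[t for t, lab in zip(trips, labels) if lab == j] for j in range(k)]
        # Stop when the total cost changes by less than the threshold.
        if abs(prev_total - total) < tol:
            break
        prev_total = total
    return clusters, total


def capped_greedy_assign(trips, reps, cost, eps=0.0):
    """Stand-in assignment rule: cheapest cluster that still has room."""
    cap = math.ceil((1 + eps) * len(trips) / len(reps))
    sizes = [0] * len(reps)
    labels = []
    for t in trips:
        open_clusters = [j for j in range(len(reps)) if sizes[j] < cap]
        best = min(open_clusters, key=lambda j: cost(t, reps[j]))
        sizes[best] += 1
        labels.append(best)
    return labels


# Toy usage on 1-D points standing in for trips (made-up data).
points = [0, 1, 2, 10, 11, 12]
clusters, total = cluster_iteratively(
    points,
    k=2,
    form_rep=mean,                           # stand-in for the tour-forming problem
    cost=lambda t, rep: (t - rep) ** 2,      # stand-in for the trip-tour cost
    assign=capped_greedy_assign,             # stand-in for the uniform assignment
)
```

Passing the three steps in as callables keeps the loop independent of how tours, costs, and assignments are actually computed, so each stand-in could be swapped for the corresponding model described in the following sections.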

6.1 ε-uniform tour-based Clustering

The proposed ε-uniform tour-based clustering algorithm iterates between the following two steps
until convergence: (1) partitioning trip requests into approximately-uniform clusters so as to
minimize the intra-cluster distances, and (2) finding the best representative for each cluster. In
sections 6.1.1, 6.1.2, and 6.1.3, we describe different components of the proposed clustering approach
and demonstrate each component using the illustrative example presented in Section 5. In section
6.1.4, we combine the components to present the clustering method. Section 6.1.5 describes the
properties of the clustering approach.

[Figure 2 flowchart. Initialization: split the $\mathcal{R}$ trips into $\mathcal{K}$ clusters randomly. Tour-based ε-uniform clustering (Algorithm 2, section 6.1): solve the tour-forming problem for each cluster (section 6.1.1), obtain trip-tour costs (section 6.1.2), and assign trips to clusters under uniformity constraints (section 6.1.3); continue until the change in total cost between two consecutive iterations is smaller than a threshold. Output: ε-uniform clusters of trips. Driver assignment (Algorithm 3, section 6.2): assign drivers from the driver set to clusters. Finally, solve the matching problem for each cluster to obtain vehicle and rider itineraries.]

Figure 2: The general flow of the proposed framework

6.1.1 The Tour Forming Problem

Let us define a cluster as a set of trips, and a vehicle tour to represent the members of a cluster.
A tour is a sequence of stations to be visited by a vehicle. We select a vehicle tour as a cluster
representative since the underlying ride-matching problem is a one-to-many problem, indicating
the importance of the sequence in which trips are served in obtaining a high quality solution. The
tour-forming problem seeks to find the best vehicle tour for a cluster of trips. This problem can be
represented by a graph $G = (\mathcal{R}, \mathcal{L})$, where $\mathcal{R}$ is the set of ride requests and $\mathcal{L}$ is the link set. A link
$\ell_{ij} \in \mathcal{L}$ between riders $i$ and $j$ exists in graph $G$ if rider $j$ can always be served following rider $i$.
This condition can be mathematically expressed in inequality (1). This inequality ensures that a
driver who drops off rider $i$ at her latest arrival time still has enough time to transport rider $j$ to
her destination within this rider's requested time window. Furthermore, we introduce two nodes,
$O$ and $D$, such that there is an outgoing link from node $O$ to all other nodes and an incoming link
from all nodes to node $D$.

$$t_i^{LA} + T(s_i^{D}, s_j^{O}) \le t_j^{LA} - T(s_j^{O}, s_j^{D}) \tag{1}$$

Under this setting, we seek to find the longest tour, i.e., the tour that contains the greatest number of trips. This tour-finding problem can be formulated as a longest path problem, as presented in model (2). The decision variable $v_{ij}$ is a binary variable that takes the value 1 if link $\ell_{ij}$ is selected as a part of the tour, and the value 0 otherwise. The objective function in (2a) maximizes the number of trips that lie on the selected tour. Constraints (2b) and (2c) ensure that the tour begins at $O$ and ends at $D$, respectively. Constraint (2d) is the flow balance constraint.

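$$\max\; w = \sum_{i,j \in R} v_{ij} \tag{2a}$$

$$\text{s.t.} \quad \sum_{j \in R} v_{Oj} = 1 \tag{2b}$$

$$\sum_{i \in R} v_{iD} = 1 \tag{2c}$$

$$\sum_{i \in R} v_{ir} - \sum_{i \in R} v_{ri} = 0 \quad \forall r \in R \tag{2d}$$

$$v_{ij} \in \{0, 1\} \quad \forall i, j \in R \tag{2e}$$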

To clearly demonstrate the tour-forming problem, we apply it to the illustrative example in section 5. We randomly split the 20 rider trips in this problem into two clusters of size 10, and solve the tour-forming problem on the graph $G = (R, L)$ for each cluster, as demonstrated in Figure 3. Each graph has 12 nodes, including 10 nodes associated with riders, node $O$, and node $D$. The dashed arrows in this figure represent the link set $L$, obtained based on inequality (1). Because inequality (1) implies $t_i^{LA} < t_j^{LA}$ for every link $\ell_{ij}$ whenever travel times are positive, the graph $G$ is acyclic, so the longest path problem in model (2) can be solved efficiently using polynomial time algorithms, such as the Network Simplex algorithm. After solving the longest path problem, we obtain the following tours (i.e., sequences of stations): [80 → 129 → 33 → 60 → 124 → 156] for cluster 1, and [161 → 80 → 98 → 97 → 78 → 161] for cluster 2. These tours are demonstrated in Figure 3.

Figure 3: Tours associated with the two clusters in the illustrative example. (a) Optimal tour in graph G for cluster 1 (trip nodes 1, 3, 5, ..., 19). (b) Optimal tour in graph G for cluster 2 (trip nodes 2, 4, 6, ..., 20).
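As a rough illustration of the tour-forming step, the sketch below builds the link set from inequality (1) and finds a longest chain of trips. It is not the authors' implementation: the paper mentions the Network Simplex algorithm, whereas this sketch swaps in memoized depth-first dynamic programming, which is valid only when the graph is acyclic (assumed here, which holds when travel times are positive). The trip dictionary layout, the travel-time callable `T`, and the function names are all hypothetical.

```python
from functools import lru_cache


def build_links(trips, T):
    """Return {i: [j, ...]} for all links (i, j) satisfying inequality (1).

    trips maps a trip id to (origin, destination, latest_arrival);
    T(a, b) is a travel-time function. Both are illustrative stand-ins.
    """
    links = {i: [] for i in trips}
    for i, (_, d_i, la_i) in trips.items():
        for j, (o_j, d_j, la_j) in trips.items():
            if i != j and la_i + T(d_i, o_j) <= la_j - T(o_j, d_j):
                links[i].append(j)
    return links


def longest_tour(trips, T):
    """Longest chain of trips in G, assuming G is acyclic."""
    links = build_links(trips, T)

    @lru_cache(maxsize=None)
    def best_from(i):
        # Longest chain starting at trip i, returned as a tuple of trip ids.
        best = ()
        for j in links[i]:
            candidate = best_from(j)
            if len(candidate) > len(best):
                best = candidate
        return (i,) + best

    # Node O links to every trip, so any trip may open the tour.
    return max((best_from(i) for i in trips), key=len, default=())
```

The recursion depth grows with the tour length, so a very long tour would call for an iterative topological-order variant instead.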

6.1.2 The Cost Function

Once tours are formed to represent each cluster, we need to define a measure of distance between a trip and a tour. This measure will later be used to assign trips to tours so as to minimize the sum of the intra-cluster distances under uniformity constraints. Let us define the cost function $c$, whose value $c(r, k)$ denotes the distance between rider $r$'s trip and tour $k$, as presented in Equation (3). We refer to this cost as the primary cost, and define it as follows. If tour $k$ contains trip $r$, then the value of the cost function is zero; otherwise, the value of the cost function is set to $M$, a large positive number. Note that this cost function is selected to guarantee the convergence of the clustering algorithm, as will be discussed later in section 6.1.5.

$$c(r, k) = \begin{cases} 0 & \text{if trip } r \text{ is on tour } k \\ M & \text{if trip } r \text{ is not on tour } k \end{cases} \tag{3}$$

When we assign trips to tours based on this primary cost, there might be many trips that do not readily lie on any tour. In this case, a tie-breaking rule is needed. Here, we use a secondary objective function to break the ties, to which we refer as the secondary cost. Since tours and trips are both sequences of stations, we use an algorithm inspired by dynamic time warping (DTW) to measure the cost of assigning a trip to a tour. DTW is a method developed for measuring the similarity between two sequences by finding an optimal match between their elements [19]. Consider a tour $k$ that has a sequence of $m$ stations $K_k = [s_k(1), s_k(2), \ldots, s_k(m)]$, i.e., $m/2$ sequential trips. Let us represent a trip $r$ as a sequence of stations, denoted by $R_r = [s_r(1) = s_r^{O},\; s_r(2) = s_r^{D}]$. We measure the distance between each pair of stations $i$ and $j$ by the shortest-path travel time between them, denoted by $T(i, j)$.

The distance between a trip and a tour can be calculated as the smallest total distance between the origin and destination of the trip (i.e., $s_r(1)$ and $s_r(2)$) and any stations on the tour, under the condition that the tour station matched to the trip destination appears after the tour station matched to the trip origin. Mathematically, this condition can be specified as $q + 1 \le s_r(2) \le m$ when $s_r(1) = q$. (Note that here we use equality to indicate the matching of two stations.) One could enumerate all the possible matchings between the two sequences $[s_r(1), s_r(2)]$ and $[s_k(1), s_k(2), \ldots, s_k(m)]$ to find the matching that provides the smallest distance. Algorithm 1 lays out the details of measuring this distance without enumerating all the possibilities, rendering this step more computationally efficient. The algorithm starts by defining a $2 \times m$ matrix $D$, corresponding to the sizes of the two sequences. The first row of this matrix holds the shortest-path travel times between the trip origin and the stations on the tour. In the second row, all distances are initially set to infinity, except for the distance between the trip destination and the last station on the tour. The remaining entries of the second row are filled recursively, from the end of the tour backwards, using Equation (5), so that entry $q$ holds the smallest travel time between the trip destination and any tour station at position $q$ or later. Finally, Equation (6) evaluates the total distances between the trip ends and their best matched stations on the tour, and takes the smallest of these distances as the secondary cost.

Algorithm 1: Obtaining the dissimilarity between a trip and a tour

Input: A trip $[s_r(1), s_r(2)]$
Input: A tour $[s_k(1), s_k(2), \ldots, s_k(m)]$
Input: Shortest-path travel time matrix $T$
Output: The distance $d(r, k)$ between trip $r$ and tour $k$

Step 1. Assume an initial distance matrix $D$ of size $2 \times m$ as follows:

$$D = \begin{bmatrix} T(s_r(1), s_k(1)) & \cdots & T(s_r(1), s_k(m-1)) & T(s_r(1), s_k(m)) \\ \infty & \cdots & \infty & T(s_r(2), s_k(m)) \end{bmatrix} \tag{4}$$

Step 2. Update the distance matrix $D$:

    for $q = m-1, \ldots, 1$ do

$$D[2, q] = \min\{\, T(s_r(2), s_k(q)),\; D[2, q+1] \,\} \tag{5}$$

Step 3. Calculate the distance between trip $r$ and tour $k$:

$$d(r, k) = \min_{q} \{\, D[1, q] + D[2, q] \,\} \tag{6}$$
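The three steps of Algorithm 1 translate almost line by line into code. The following is a minimal sketch, assuming stations are hashable identifiers and that `T` is a callable returning the shortest-path travel time between two stations; the function name and argument layout are illustrative and not taken from the paper.

```python
import math


def trip_tour_distance(trip, tour, T):
    """Secondary cost d(r, k) following Algorithm 1.

    trip: (origin, destination) stations; tour: list of stations;
    T(a, b): shortest-path travel time between stations a and b.
    """
    origin, destination = trip
    m = len(tour)
    if m == 0:
        return math.inf  # An empty tour matches nothing.
    # Row 1 of D: travel time from the trip origin to every tour station.
    row1 = [T(origin, s) for s in tour]
    # Row 2 of D: infinity everywhere except the last station (Step 1).
    row2 = [math.inf] * m
    row2[m - 1] = T(destination, tour[m - 1])
    # Step 2, Eq. (5): backward pass keeping a running (suffix) minimum.
    for q in range(m - 2, -1, -1):
        row2[q] = min(T(destination, tour[q]), row2[q + 1])
    # Step 3, Eq. (6): best combined match over all positions q.
    return min(a + b for a, b in zip(row1, row2))
```

With stations placed on a line and `T = lambda a, b: abs(a - b)`, the trip `(3, 9)` is at distance 2 from the tour `[0, 4, 10]`, while the reversed trip `(9, 3)` is at distance 6, which reflects the ordering condition built into the second row.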

Using these two cost functions, we calculate the primary and secondary costs, $c(r, k)$ and $d(r, k)$, between trips and tours. In our illustrative example, the primary costs between trips $R_3$, $R_7$, and $R_{15}$ and tour $K_1$, and between trips $R_2$, $R_{10}$, and $R_{14}$ and tour $K_2$, are zero. The primary costs between other trips and the two tours are $M$. The primary and secondary costs of all trips in the illustrative example are displayed in Table 11 in Appendix IV.

6.1.3 The Two-Step Trip Assignment Process

In this section, we discuss the assignment of trips to clusters under a uniformity constraint. The output of this step is a revised set of clusters. In a conventional clustering problem, objects are allocated to clusters so as to minimize the sum of intra-cluster costs. Here, our end goal is not to cluster trips, but to solve the optimization problem in model (10) for each cluster in near real-time. Since the computational complexity of the optimization problem in each cluster depends on the number of trips in that cluster, clustering instances in which cluster sizes are highly non-uniform are not of interest, since larger clusters would create a computational bottleneck. As such, we strive to form clusters that are approximately uniform in their number of trips.

In this paper, we consider a two-step assignment process. The first assignment problem seeks to allocate trips to clusters so as to minimize the total primary cost, as indicated in the objective function (7a). Here, the decision variable $f_{rk}$ takes the value 1 if trip $r$ is assigned to cluster $k$, and the value 0 otherwise. Constraint (7b) ensures that each trip is allocated to a single cluster. The assignment problem in model (7) has a trivial solution wherein trips that lie on a tour, i.e., trips for which $c(r, k) = 0$, are allocated to the cluster represented by that tour.

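$$\min\; z_p = \sum_{r \in R} \sum_{k \in K} c(r, k)\, f_{rk} \tag{7a}$$

$$\text{s.t.} \quad \sum_{k \in K} f_{rk} = 1 \quad \forall r \in R \tag{7b}$$

$$f_{rk} \in \{0, 1\} \quad \forall r \in R,\ \forall k \in K \tag{7c}$$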

After this initial assignment, we solve a second assignment problem to allocate trips that do not readily lie on a tour. This allocation problem can be formulated as the ε-uniform assignment problem in model (8). Let us define the set $R_c$ to include ride requests that do not readily lie on a tour, i.e., trips for which $c(r, k) = M$ for every tour $k$. The objective function in (8a) minimizes the sum of the secondary costs between trips and their associated cluster representatives. Constraint (8b) ensures that each trip is assigned to a single cluster. Constraint (8c) ensures that the difference between the number of trips in set $R_c$ in any two clusters is at most $\varepsilon |R_c|$. The parameter $\varepsilon$ is an imbalance parameter, whose value affects the size of clusters: $\varepsilon = 0$ ensures that all clusters have the exact same size, while $\varepsilon = K - 1$ imposes no constraint on cluster sizes. Note that the ε-uniform assignment problem was first proposed in [59] for a peer-to-peer ridesharing system. Model (8) is based on the work in [59], customized for a ridesourcing system with one-to-many matching.

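$$\min\; z_s = \sum_{r \in R_c} \sum_{k \in K} d(r, k)\, f_{rk} \tag{8a}$$

$$\text{s.t.} \quad \sum_{k \in K} f_{rk} = 1 \quad \forall r \in R_c \tag{8b}$$

$$\sum_{r \in R_c} f_{rk} \le \frac{|R_c|}{K} (1 + \varepsilon) \quad \forall k \in K \tag{8c}$$

$$f_{rk} \in \{0, 1\} \quad \forall r \in R_c,\ \forall k \in K \tag{8d}$$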
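The stated bound on cluster-size differences follows from constraints (8b) and (8c) in a few lines. The short derivation below is a worked check rather than part of the original text; the symbol $N_k$ for the number of off-tour trips assigned to cluster $k$ is introduced here only for illustration.

$$N_k = \sum_{r \in R_c} f_{rk}, \qquad \sum_{k \in K} N_k = |R_c| \ \text{by (8b)}, \qquad N_k \le \frac{|R_c|}{K}(1 + \varepsilon) \ \text{by (8c)}$$

$$N_k = |R_c| - \sum_{k' \ne k} N_{k'} \;\ge\; |R_c| - (K - 1)\frac{|R_c|}{K}(1 + \varepsilon) = \frac{|R_c|}{K}\bigl(1 - (K - 1)\varepsilon\bigr)$$

$$N_k - N_{k'} \;\le\; \frac{|R_c|}{K}\Bigl[(1 + \varepsilon) - \bigl(1 - (K - 1)\varepsilon\bigr)\Bigr] = \varepsilon\, |R_c|$$

Setting $\varepsilon = K - 1$ makes the right-hand side of (8c) equal to $|R_c|$, so the constraint is never binding, which matches the statement that this value imposes no restriction on cluster sizes.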

Figure 4 shows the outcome of this two-step assignment process for the illustrative example. In this figure, trips that lie on tours all have a primary cost of zero. Other trips are allocated to tours so as to minimize the secondary cost under the uniformity constraint for cluster sizes.
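To make the two-step process concrete, here is a toy sketch that applies the trivial solution of model (7) and then solves model (8) by brute-force enumeration. Enumeration is only workable for a handful of trips and is swapped in here purely for illustration; a real instance would be passed to an LP/MIP solver. The function name, the dictionary-of-lists cost layout, and the integer cluster indices are assumptions made for this example.

```python
from itertools import product


def two_step_assignment(c, d, K, eps):
    """Assign trips to K clusters; returns {trip: cluster index}.

    c[r][k] and d[r][k] are the primary and secondary costs of trip r
    with respect to tour k (d is only needed for off-tour trips).
    """
    assignment, off_tour = {}, []
    # Step 1 (model (7)): a trip lying on a tour joins that tour's cluster.
    for r in c:
        on = [k for k in range(K) if c[r][k] == 0]
        if on:
            assignment[r] = on[0]
        else:
            off_tour.append(r)
    # Step 2 (model (8)): enumerate assignments of the off-tour trips.
    cap = len(off_tour) / K * (1 + eps)  # right-hand side of (8c)
    best_cost, best = float("inf"), None
    for combo in product(range(K), repeat=len(off_tour)):
        if any(combo.count(k) > cap for k in range(K)):
            continue  # violates the uniformity constraint (8c)
        cost = sum(d[r][k] for r, k in zip(off_tour, combo))
        if cost < best_cost:
            best_cost, best = cost, combo
    if best is None:
        # Can happen, e.g., with eps = 0 when K does not divide |R_c|.
        raise ValueError("no assignment satisfies constraint (8c)")
    assignment.update(zip(off_tour, best))
    return assignment
```

In a toy instance where every off-tour trip prefers the same cluster, a small `eps` forces the trips with the smallest cost penalty into the other cluster, while `eps = K - 1` lets all of them follow their preference.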

Figure 4: Assignment in iteration 1 of the illustrative example. The trips on tours have a primary cost of zero. Other trips are allocated to clusters based on their secondary costs, as outlined in Table 11 in Appendix IV.

6.1.4 The ε-uniform tour-based Clustering Algorithm

In the previous sections, we outlined different components of the ε-uniform tour-based clustering algorithm, i.e., the tour-forming problem, the assignment problems, and the cost measures to quantify the distance between a tour and a trip.

The ε-uniform tour-based clustering algorithm is described in Algorithm 2. The inputs to this algorithm are the set of trips, the number of clusters, the uniformity parameter, and the maximum number of iterations. The algorithm starts (step 0) by randomly assigning the $R$ trips to $K$ clusters. In step 1, a tour is formed using the set of trips in each cluster. In step 2, distances between trips and tours are obtained, allowing for computing the new total intra-cluster distances. In the assignment step (step 3), the assignment solution based on the primary costs is obtained first. Next, the ε-uniform assignment problem is solved, where trips that do not readily lie on tours are assigned to clusters based on their proximity to tours as well as the uniformity constraint applied to cluster sizes. The objective function value corresponding to the optimal assignment based on the primary costs provides the total sum of intra-cluster distances. Step 4 assesses the termination criteria. If the total intra-cluster distance obtained from assignments in two consecutive iterations (obtained in steps 2 and 3) remains the same, or the maximum number of iterations is reached, then the algorithm terminates, providing a locally optimal solution. Otherwise, the iteration counter $\alpha$ is increased by 1, and steps 1 through 3 are repeated. Since this process provides a locally optimal solution, we repeat it a total of $itr$ times, and report the final set of clusters, and their associated tours, that provide the lowest overall intra-cluster distance.

Figure 5: Final assignment for the illustrative example. (a) Tours and assignments. (b) Convergence of Algorithm 2.

Algorithm 2: The ε-uniform tour-based clustering algorithm

Input: Set of trips, $R$
Input: Number of clusters, $K$
Input: Uniformity parameter, $\varepsilon$
Input: Max number of iterations, $\alpha_{max}$
Output: $K$ clusters of trips and their corresponding tours, $T^*$

for $a = 1, \ldots, itr$ do

    Step 0. Initialization
        Obtain $f^*_{rk}(0)$ by randomly dividing the $R$ trips into $K$ clusters
        $\alpha \leftarrow 1$

    Step 1. Tour formation
        Choose a random sample of trips in each cluster (where clusters are determined by $f^*_{rk}(\alpha - 1)$) to form new tours by solving the optimization problem in model (2) for each cluster. Let $w^*$ be the optimal value of the objective function.
        $v^*_{ij}(\alpha) \leftarrow \arg\max w^*$

    Step 2. Cost update
        Calculate the new primary and secondary costs, $c(r, k)$ and $d(r, k)$, respectively, using Equation (3) and Algorithm 1, based on $f^*_{rk}(\alpha - 1)$ and $v^*_{ij}(\alpha)$
        Calculate $h_\alpha = \sum_{k=1}^{K} \sum_{r=1}^{R} c(r, k)$

    Step 3. Assignment
        Assign the $R$ trips to $K$ clusters. Set $z^*$ as the number of trips on tours, and find $z^*_s$ by solving the optimization problem in model (8).
        $h_{\alpha + 1} \leftarrow z^*$
        $f^*_{rk}(\alpha) \leftarrow \arg\min z^*_s$

    Step 4. Termination criteria
        if $h_\alpha = h_{\alpha + 1}$ or $\alpha = \alpha_{max}$ then
            $C_a = h_\alpha$
            $T_a = \{ f^*_{rk}(\alpha),\; v^*_{ij}(\alpha) \}$
            Terminate
        else
            $\alpha \leftarrow \alpha + 1$
            Go to Step 1

$T^* \leftarrow T_a$ corresponding to the solution with the minimum $C_a$
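The control flow of Algorithm 2 can be summarized in a short skeleton. This is a sketch under several assumptions, not the authors' code: the tour-forming and assignment steps are passed in as hypothetical callables `form_tour` and `assign`, random sampling inside each cluster is left to `form_tour`, and convergence is tracked through the number of on-tour trips. Under the cost in Equation (3), each trip that is off a tour contributes $M$ to the total primary cost, so a larger on-tour count corresponds to a smaller total primary cost.

```python
import random


def tour_based_clustering(trips, K, form_tour, assign,
                          alpha_max=10, restarts=3, seed=0):
    """Skeleton of Algorithm 2 with pluggable steps (alpha_max >= 1).

    form_tour(cluster) -> tuple of trips on that cluster's tour (Step 1).
    assign(trips, tours) -> list of K clusters (Steps 2 and 3).
    Returns (clusters, tours) with the most on-tour trips over all restarts.
    """
    rng = random.Random(seed)
    best = None
    for _ in range(restarts):
        # Step 0: random initial split of the trips into K clusters.
        shuffled = list(trips)
        rng.shuffle(shuffled)
        clusters = [shuffled[k::K] for k in range(K)]
        prev_on_tour = None
        for _alpha in range(alpha_max):
            tours = [form_tour(cl) for cl in clusters]   # Step 1
            clusters = assign(trips, tours)              # Steps 2 and 3
            on_tour = sum(len(t) for t in tours)
            if on_tour == prev_on_tour:                  # Step 4
                break
            prev_on_tour = on_tour
        # Keep the restart with the most trips on tours.
        if best is None or on_tour > best[0]:
            best = (on_tour, clusters, tours)
    return best[1], best[2]
```

Passing the steps in as functions keeps the loop independent of how the longest-path and assignment problems are actually solved.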

The final clustering results for the illustrative example are shown in Figure 5. The solution consists of two tours, one including three trips and the other including four trips. As Figure 5b demonstrates, convergence is obtained in only three steps.

6.1.5 Properties of the ε-uniform tour-based Clustering Algorithm

In this section, we first prove that the objective function of the ε-uniform tour-based clustering problem decreases with iterations in Algorithm 2. Next, we prove the convergence of the algorithm.

Proposition 1. The objective function in model (7) decreases with iterations of Algorithm 2.

Proof. In step 1 of Algorithm 2, the optimization problem in model (2) seeks to find the longest tour within each cluster. Finding the tour with maximum length is equivalent to solving an optimization problem that finds the min-cost tour, where the cost function is described in Equation (3). In step 3 of Algorithm 2, the first step of the assignment is completed under the same cost function. Therefore, each of these two main steps in Algorithm 2 attempts to minimize the same cost function, and the result follows.

Proposition 2. Algorithm 2 converges in a finite number of steps.

Proof. There is a large but finite number of ways to assign the $R$ trips to $K$ clusters. Furthermore, in Proposition 1 we showed that the objective function decreases from one iteration to the next (otherwise, we stop). As such, there is a finite number of ways in which clusters could change, and the algorithm is designed such that no solution is visited twice (except at convergence, at which point we stop). The result follows.

6.2 Driver Assignment

After clusters of trips are formed, we need to allocate drivers to these clusters. Algorithm 3 details the steps of this procedure. In the first step of this algorithm, the cost of assigning a driver $d$ to a cluster $k$ is calculated. This cost is based on the travel time from the origin location of driver $d$ to the pick-up location of the first trip on tour $k$ that can be served by $d$, and on the number of remaining trips on the tour. In general, the higher the travel time, the higher the cost of allocating the driver to the cluster. Conversely, the higher the number of remaining trips on tour $k$, the more effective driver $d$ can be in serving the cluster, and therefore the lower the cost.

In the second step of Algorithm 3, we solve the bipartite matching problem outlined in model (9) to allocate the set of drivers to clusters. The decision variable $z_{dk}$ takes the value 1 if driver $d$ is assigned to cluster $k$, and the value 0 otherwise. The objective function in (9a) minimizes the total driver-cluster assignment cost. Constraint (9b) ensures that the proportion of drivers allocated to a cluster is approximately equal to the proportion of trips in that cluster. Constraint (9c) ensures that each driver is allocated to exactly one cluster. Constraint (9d) imposes the binary condition on the decision variable $z_{dk}$. Note that the constraint coefficient matrix in model (9) has a totally unimodular structure, which allows this optimization problem to be relaxed to a linear program that can be easily solved in real-time.

The assignment of drivers to clusters is demonstrated in Table 4. The optimization problem in

model (9) allocates drivers 1 and 3 to cluster 1, and drivers 2 and 4 to cluster 2.
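The two ingredients that feed model (9), the cost $\gamma(d, k)$ from Step 1 of Algorithm 3 and the right-hand side of constraint (9b), are simple enough to sketch directly. The function names and arguments below are hypothetical, `T` is assumed to be a travel-time callable, and finding the first trip a driver can serve is assumed to have been done beforehand.

```python
def driver_tour_cost(driver_origin, first_pickup, n_rem, T):
    """gamma(d, k): travel time from the driver's origin to the pick-up of
    the first servable trip on the tour, divided by (remaining trips + 1)."""
    return T(driver_origin, first_pickup) / (n_rem + 1)


def min_drivers_per_cluster(cluster_sizes, n_drivers):
    """Right-hand side of constraint (9b): the floor of each cluster's share
    of trips multiplied by the total number of drivers."""
    total = sum(cluster_sizes)
    return [(n * n_drivers) // total for n in cluster_sizes]
```

For example, with hypothetical cluster sizes of 7 and 13 trips and 5 drivers, the lower bounds are 1 and 3, leaving one driver to be placed wherever the objective in (9a) prefers.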

7 Numerical Experiments

In this section, we use the New York City Taxi dataset [1] to demonstrate the performance of the

ε-uniform tour-based clustering method.

7.1 Dataset

The New York City Taxi and Limousine Commission (NYC TLC), in partnership with the NYC Department of Information Technology and Telecommunications (DOITT), has published millions of trip records from yellow medallion taxis and green street hail liveries (SHLs) for several years. These records include attributes such as pick-up and drop-off dates, times, and locations, trip distances, itemized fares, rate types, payment types, and driver-reported passenger counts.

The data used in this study belong to the evening peak hour (19:00-20:00) of February 19, 2016. We select trips that are geographically concentrated in the Manhattan area, thereby creating ride-matching problems that are large-scale due to the high number of trips as well as the high spatiotemporal proximity between them.

Algorithm 3: Assignment of drivers to clusters

Input: $K$ tours, $T^*$
Input: Set of drivers, $D$, with their origins, $s_d^{O}$, and earliest departure times, $t_d^{ED}$
Output: Driver assignment to clusters, $z^*_{dk}$, $\forall d \in D,\ k \in K$

$n_k \leftarrow$ number of trips in cluster $k$ under $T^*$

Step 1. Average cost of driver-tour assignment

    for $d \in D$ do
        for $k \in K$ do
            Find the first trip in tour $k$ that can be served by $d$ based on $t_d^{ED}$. Denote this trip as $r$. Denote the number of trips after this trip on the tour as $n_{rem}$.

$$\gamma(d, k) = \frac{T(s_d^{O}, s_r^{O})}{n_{rem} + 1}$$

Step 2. The driver assignment problem

    Solve the optimization problem in (9) to obtain the optimal driver-cluster assignment, $z^*_{dk}$

$$\min \sum_{d \in D} \sum_{k \in K} \gamma(d, k)\, z_{dk} \tag{9a}$$

$$\sum_{d \in D} z_{dk} \ge \left\lfloor \frac{\sum_{r \in R} f^*_{rk}}{|R|}\, |D| \right\rfloor \quad \forall k \in K \tag{9b}$$

$$\sum_{k \in K} z_{dk} = 1 \quad \forall d \in D \tag{9c}$$

$$z_{dk} \in \{0, 1\} \quad \forall d \in D,\ \forall k \in K \tag{9d}$$

7.2 Simulation Settings

In this study, we adopt a rolling-horizon approach with a re-optimization period of 1 minute, meaning that the ride-matching optimization problem is solved every minute starting from 18:59. We assume that all trips are completed on their shortest travel time paths, and obtain the travel times for every re-optimization period from the Google Maps API. We set the ratio of riders to drivers to 4, i.e., the total number of riders is 4 times the number of drivers. We generate the earliest departure times of drivers uniformly from the window 18:59 to 20:00. We assume that riders request a ride a few minutes ahead of their earliest departure times. Since the dataset does not contain the times when ride requests are issued, we generate a uniform random number from 0 to 30 for each rider, and subtract it from their earliest departure time to obtain their ride-request time. Doing this allows us to have a mixture of pre-arranged and on-the-fly trip requests. We assume that the study area consists of 184 pre-defined stations, denoted by $S$, where participants start/end their trips. Stations are distributed in the network so as to ensure that there is at least one station within walking distance (<0.15 miles) of a typical trip's origin/destination (Fig. 6). We set the capacity of each vehicle to four.

| Cluster | Driver | Matched with | Number of riders being served |
|---|---|---|---|
| Cluster 1 | d1 | r1, r6, r9 | 3 |
| Cluster 1 | d3 | r2, r10, r14 | 3 |
| Cluster 2 | d2 | r4, r16 | 2 |
| Cluster 2 | d4 | r3, r7, r12, r20 | 4 |

Table 4: The final ride-matching results

Figure 6: Distribution of the pre-defined stations in the Manhattan area.
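The way ride-request times are synthesized in the simulation settings can be sketched in a few lines. This is an illustration only: the function name is hypothetical, times are assumed to be expressed in minutes after midnight, and the 0 to 30 lead time is assumed to be in minutes, which the text does not state explicitly.

```python
import random


def sample_request_times(earliest_departures, max_lead=30.0, seed=None):
    """Subtract a uniform(0, max_lead) lead time from each earliest departure.

    earliest_departures: times in minutes after midnight (assumed unit).
    max_lead: upper end of the uniform lead time, assumed to be in minutes.
    """
    rng = random.Random(seed)
    return [t - rng.uniform(0.0, max_lead) for t in earliest_departures]
```

A rider departing at 19:10 (1150 minutes) would therefore issue the request at some point between 18:40 and 19:10, which produces the mixture of pre-arranged and on-the-fly requests described in the text.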

7.3 Results

In this section, we first define a base experimental setting and conduct extensive numerical analysis to draw a comparison between our proposed method, three benchmarks from the literature, and the optimal solution. Next, we observe the convergence properties of Algorithm 2, and study the impact of sample size in this algorithm. Finally, we conduct sensitivity analysis over some of the critical parameters of our method. For all experiments conducted in this section, we consider a planning horizon of one hour, with sixty one-minute re-optimization periods.

7.3.1 Base Experiment

In our base experimental setting, we set the total number of ride requests to 2000, and the sample size in Algorithm 2 to 150. We set the value of $\varepsilon$ to 0.1, and the maximum number of iterations to 10.

Figure 7 demonstrates the clustering results, where for a given trip request, both trip ends are colored based on the cluster to which the trip is assigned. It is interesting to note that the clusters in Figure 7 have a high level of spatial overlap.

In order to objectively assess the performance of the proposed method, we compare it against three benchmark methods from the literature, namely point-based clustering [53], balanced point-based clustering (point-based clustering with uniform clusters), and ε-uniform trip-based clustering [59], as well as the exact mathematical formulation presented in model (10), solved to optimality. These comparisons allow us to position our proposed method in terms of solution quality and the required computational effort.

In the point-based method, trip ends (i.e., origin and destination locations) of all trips (i.e., both rider and driver trips) are the objects in clustering. Once the clusters are obtained, an optimization problem can be solved independently for each cluster. Trips whose origin and destination locations fall in different clusters will not be served. The balanced point-based clustering is similar to point-based clustering, except that balance constraints are imposed on the number of objects within clusters. In trip-based clustering, each trip (whether it is a rider or a driver trip) is considered an object in clustering. Once clustering is concluded, the rider and driver trips within each cluster are matched independently of other clusters.

Figure 7: ε-uniform tour-based clustering results. Trip ends are colored based on their assigned clusters. (a) Two clusters. (b) Three clusters. (c) Four clusters.

Tables 5, 6, and 7 summarize the results. The value of $k$ in these tables indicates the number of clusters. The reported computation time is the total time spent on solving the ride-matching problem, plus the time for clustering where applicable, over 60 re-optimization periods of 1 min each. For the ε-uniform tour-based method, the computation time is broken down into the clustering time (reported as graph partitioning in the tables) and the optimization time. The clustering time consists of the time required for generating the network, computing the costs, and the assignment steps. The optimization solution time is the sum of the time spent on model construction and on obtaining a solution from a commercial solver. The numbers in parentheses are the average computation times across all re-optimization periods. The optimal solution finds a match for 60.40% of the ride requests. We normalize the optimal matching rate to 100%, and report the solution quality of all other methods as their matching rate divided by the optimal matching rate.
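As a worked example of this normalization, the reported solution quality can be converted back to an absolute matching rate. The symbols $\rho_m$ (matching rate of method $m$) and $\rho^{*}$ (optimal matching rate) are introduced here only for illustration; the 94.95% figure is the solution quality reported for the tour-based method with two clusters.

$$\text{solution quality}_m = \frac{\rho_m}{\rho^{*}}, \qquad \rho^{*} = 60.40\%$$

$$\rho_m = 0.9495 \times 60.40\% \approx 57.35\%$$

In other words, a solution quality of 94.95% corresponds to matching roughly 57% of all ride requests, compared with 60.40% under the optimal solution.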

Table 5: Comparison of matching rate and solution time between the proposed ε-uniform tour-based method, the optimal solution, and the point-based clustering benchmark. The matching rate of the optimal solution is normalized to 100%, and the matching rates obtained by other methods are reported in terms of the percentage of the optimal matching rate. The value of k refers to the number of clusters.

| Method | k | Total computation time, sec (average per period) | Graph partitioning (sec) | Optimization problem (sec) | Solution quality |
|---|---|---|---|---|---|
| Optimal | 1 | 1834 (30.56) | - | - | 100% |
| Point-based | 2 | 973 (16.22) | - | - | 82.62% |
| Point-based | 3 | 942 (15.70) | - | - | 67.30% |
| Point-based | 4 | 933 (13.88) | - | - | 60.60% |
| Tour-based | 2 | 366 (6.1) | 146 | 220 | 94.95% |
| Tour-based | 3 | 283 (4.72) | 163 | 120 | 77.15% |
| Tour-based | 4 | 197 (3.28) | 151 | 46 | 65.98% |

Table 5 compares the ε-uniform tour-based clustering method with point-based clustering and the optimal solution. This table demonstrates that the gap in solution quality between the two methods narrows as we increase the number of clusters, from about 12 percentage points with $k = 2$ to about 5 percentage points with $k = 4$. This is due to the fact that when forming clusters, both methods disregard potential matches between objects that are assigned to different clusters. The higher solution times for the point-based method can be explained by the fact that this method does not seek to construct uniform clusters, implying that some clusters may have a significantly higher number of trips, and thereby higher optimization times. This imbalance in the size of clusters under point-based clustering is demonstrated in Figure 8. The lower solution quality of the point-based method is due to the fact that in this method clusters are constructed based on trip ends, rather than whole trips. Therefore, since the matching optimization problems are solved independently for each cluster after clustering, trips whose ends lie in different clusters are not served. This drawback is addressed by the trip-based clustering method, in which an entire trip is considered as a clustering object.

Table 6: Comparison of matching rate and solution time between the proposed ε-uniform tour-based method, the optimal solution, and the balanced point-based clustering benchmark. The matching rate of the optimal solution is normalized to 100%, and the matching rates obtained by other methods are reported in terms of the percentage of the optimal matching rate. The value of k refers to the number of clusters.

| Method | k | Total computation time, sec (average per period) | Graph partitioning (sec) | Optimization problem (sec) | Solution quality |
|---|---|---|---|---|---|
| Optimal | 1 | 1834 (30.56) | - | - | 100% |
| Balanced point-based | 2 | 343 (5.72) | - | - | 75.32% |
| Balanced point-based | 3 | 251 (4.18) | - | - | 62.43% |
| Balanced point-based | 4 | 173 (2.88) | - | - | 54.37% |
| Tour-based | 2 | 366 (6.1) | 146 | 220 | 94.95% |
| Tour-based | 3 | 283 (4.72) | 163 | 120 | 77.15% |
| Tour-based | 4 | 197 (3.28) | 151 | 46 | 65.98% |

Table 7: Comparison of matching rate and solution time between the proposed ε-uniform tour-based method, the optimal solution, and the ε-uniform trip-based clustering benchmark. The matching rate of the optimal solution is normalized to 100%, and the matching rates obtained by other methods are reported in terms of the percentage of the optimal matching rate. The value of k refers to the number of clusters.

| Method | k | Total computation time, sec (average per period) | Graph partitioning (sec) | Optimization problem (sec) | Solution quality |
|---|---|---|---|---|---|
| Optimal | 1 | 1834 (30.56) | - | - | 100% |
| Trip-based | 2 | 350 (5.8) | - | - | 83.61% |
| Trip-based | 3 | 279 (4.65) | - | - | 68.54% |
| Trip-based | 4 | 185 (3.08) | - | - | 59.85% |
| Tour-based | 2 | 366 (6.1) | 146 | 220 | 94.95% |
| Tour-based | 3 | 283 (4.72) | 163 | 120 | 77.15% |
| Tour-based | 4 | 197 (3.28) | 151 | 46 | 65.98% |

Table 6 compares the ε-uniform tour-based clustering method with balanced point-based clustering, in which a uniformity constraint is incorporated in the clustering algorithm, and the optimal solution. This table suggests that when we seek to balance cluster sizes in point-based clustering, the computation time decreases significantly. However, unsurprisingly, the solution quality of the balanced point-based clustering is worse than that of the unbalanced point-based clustering: when forming uniform clusters, the system has a lower level of flexibility to assign trips to clusters, thereby resulting in lower quality solutions. It is also worse than that of the tour-based method. With $k = 2$, the solution quality of the tour-based method is higher than that of the balanced point-based method by about 20 percentage points (94.95% versus 75.32%).

Finally, Table 7 compares the tour-based and trip-based clustering approaches. The tour-based method outperforms the trip-based method in solution quality by about 11 percentage points under two clusters. This improvement can be attributed to the fact that in tour-based clustering, trips that are close to any trip on the representative tour are assigned to that cluster. As a result, not only are trips that are spatio-temporally close assigned to the same cluster, as is the case in trip-based clustering, but trips assigned to a cluster can also be served sequentially by a single vehicle. This makes it more likely for the matching problem to make more effective use of the vehicles in serving ride requests.

The gap between these solutions is reduced to about 6 percentage points under 4 clusters for the same reasons discussed above. The difference in solution times of these two methods is not statistically significant, as both methods strive to generate clusters that are balanced in size within a threshold. The main difference between the balanced point-based, trip-based, and tour-based approaches is what they consider as a unit of analysis: the point-based method considers a trip end as a unit of modeling, the trip-based method considers an entire trip, and the tour-based method considers a tour, i.e., a sequence of trips, as a cluster representative. The tour-based method provides the highest quality solution because the cluster representatives can more closely capture the ultimate product of the matching problem, which in a ridesourcing setting is a set of vehicle tours.

Figure 8: Cluster sizes for point-based, balanced point-based, trip-based and tour-based methods.

7.3.2 Algorithm Properties

In this section, we ﬁrst study the convergence rate of our base experimental setting. Next, we

investigate the impact of sample size on the quality of solutions.

Convergence Properties

Figure 9 demonstrates the convergence of the objective function in model (8) in our base experiment, using a randomly-selected instance. This figure clearly shows that the objective function increases monotonically, which is equivalent to the cost function decreasing monotonically. As demonstrated in this figure, the objective function typically converges in a few iterations.

Figure 9: Primary and secondary costs per iteration.

Impact of Sample Size on Solution Quality

Figure 10 shows the impact of sample size in Algorithm 2 on the computation time and the secondary cost in the tour forming problem. Figure 10(a) shows that the computation time increases super-linearly with sample size. Figure 10(b) demonstrates that the secondary cost decreases with sample size. This is due to the fact that increasing the sample size increases the likelihood of obtaining tours that better represent their corresponding clusters, resulting in a smaller secondary cost for the entire set of trips. However, there is a critical sample size beyond which the reduction in cost becomes negligible. In our experimental setting, this sample size is about 150 trips, which is the number utilized in our experiments.

Figure 10: The computation time and the secondary cost of the tour forming problem with different sample sizes. (a) Computation time. (b) Total cost.

7.3.3 Sensitivity Analysis

The Imbalance Parameter

Figure 11 shows the impact of changing the value of εon the number of served trips as well as

the computation time of the ε-uniform tour-based clustering method under diﬀerent numbers of

clusters. Figure 11(a) displays that as the value of εincreases, the number of served riders increases

for all values of K. This is because a higher value for εprovides a higher level of ﬂexibility to assign

trips to clusters, thereby resulting in higher quality solutions. Figure 11(b) demonstrates that the

computation time increases with εover all values of K. This is due to the fact that the increased

level of ﬂexibility that accompanies a higher εvalue results in less uniform cluster sizes. As such,

some clusters will be larger than others, increasing the overall solution time.

(a) Number of riders being served. (b) Computation time

Figure 11: Number of served riders and computation time of the proposed algorithm with diﬀerent values

of ε
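To make the role of ε concrete, the following minimal sketch checks whether a set of cluster sizes is approximately uniform. It assumes, purely for illustration, that ε-uniformity means every cluster size lies within a factor (1 ± ε) of the mean size n/K. This definition and the function names `size_bounds` and `is_eps_uniform` are hypothetical and may differ from the formal definition used in the methodology.

```python
import math


def size_bounds(n_trips, n_clusters, eps):
    """Return (lower, upper) cluster-size bounds under the assumed definition.

    Assumption (illustrative only): a clustering is eps-uniform when every
    cluster holds between (1 - eps) and (1 + eps) times the mean size.
    """
    mean = n_trips / n_clusters
    # Round before ceil/floor to avoid floating-point noise in the products.
    lower = math.ceil(round((1 - eps) * mean, 9))
    upper = math.floor(round((1 + eps) * mean, 9))
    return lower, upper


def is_eps_uniform(sizes, eps):
    """Check whether all cluster sizes fall inside the assumed bounds."""
    lower, upper = size_bounds(sum(sizes), len(sizes), eps)
    return all(lower <= s <= upper for s in sizes)


if __name__ == "__main__":
    print(size_bounds(300, 3, 0.1))
    print(is_eps_uniform([100, 95, 105], 0.1))
    print(is_eps_uniform([150, 75, 75], 0.1))
```

Under this reading, a larger ε widens the admissible interval, so the largest cluster, which dominates the solution time when clusters are solved in parallel, is allowed to grow.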

Figure 12 provides a more detailed view of the change in computation time for diﬀerent values

of ε. Figures 12(a) and 12(b) demonstrate the distribution of computation time under 3 and

4 clusters, respectively. These ﬁgures indicate that the cluster sizes become less uniform as we

increase ε, resulting in the larger clusters taking longer to optimize.

(a) Number of clusters = 3 (b) Number of clusters = 4

Figure 12: Distribution of computation time

Number of System Participants

Figure 13 displays the influence of the number of participants on the computation time and the number

of served riders under different numbers of clusters and ε = 0.1. Note that k = 1 provides the

optimal solution. Figure 13(a) demonstrates that the computation time decreases with the number

of clusters regardless of the number of participants. Still, the rate of reduction in solution time

decreases as the number of clusters becomes larger. This is because the larger number of clusters

indicates that there are fewer participants in each cluster, which results in less computation time.

However, once the number of clusters reaches a threshold, the reduction in solution time as we

increase the number of clusters becomes small, indicating that there is a critical threshold for k

where having more clusters does not help with reducing solution time any further, but decreases

system throughput (Figure 13(b)). Additionally, Figure 13(a) shows that once the number of

clusters is over a critical threshold, the computation time is not substantially aﬀected by the

number of participants anymore.

Figure 13(b) shows that increasing the number of clusters reduces the matching rate; however,

this reduction is smaller when the base number of participants is lower. This is due to the fact that

with higher trip density produced by a higher number of participants, clustering leads to removing

a higher number of potential matches from the feasible region of the solution, leading to higher

loss in system throughput. This figure also shows that the reduction in system throughput is

smaller between k = 2 and k = 3 than between k = 1 and k = 2.

Figure 13 shows that, regardless of the number of participants, there is a critical value for k

beyond which increasing the number of clusters does not reduce the solution time any

further, but reduces system throughput. For our experimental setting, this critical value is k = 2.

(a) Computation time (b) Number of served riders

Figure 13: Number of served riders and computation time under different numbers of clusters

8 Conclusion

In this paper we devise a framework to solve the ride-matching problem that arises in dynamic ridesourcing systems in a distributed fashion. The methodology is based on clustering, where ride requests are grouped into a number of clusters so as to (1) maximize the similarity between trips within a cluster, and (2) guarantee cluster sizes to be uniform within a threshold. The proposed clustering approach accounts for the fact that the ultimate goal is to form vehicle tours in each cluster, and therefore uses tours, i.e., sequences of trips, as cluster representatives. We devise what we call the ε-uniform tour-based algorithm to assign trips to clusters, and prove its convergence. Next, we optimally assign drivers to clusters, and solve the ride-matching problem for clusters independently of each other.

Using the New York City taxi dataset, we conduct extensive numerical experiments to analyze

the performance of the proposed methodology and compare it against three state-of-the-art

benchmarks, namely point-based, balanced point-based, and trip-based clustering, as well as the

optimal solution. First, we demonstrate that the proposed methodology has favorable convergence

properties, providing solutions in a few iterations. Secondly, we demonstrate the importance of

forming approximately-uniform clusters, and showcase the resulting trade-oﬀs between solution

time and quality. Finally, we show that our proposed methodology could result in a statistically

signiﬁcant increase in the matching rate compared to the benchmarks, where this improvement

decreases with the number of clusters.

Acknowledgment

The work described in this paper was supported by NSF award 2046372.

Appendix I

We consider a one-to-many ride-matching problem in which a driver can serve multiple riders. In

such systems, a trip can be denoted by a link, $l = (t_i, s_i, t_j, s_j) \in T \times S \times T \times S$, where $T$ is an ordered

set of time intervals during the study time horizon. Due to the large size of the transportation

network and the number of time intervals in the study horizon, solving such a ride-matching problem

can be computationally prohibitive. Therefore, we adopt the pre-processing procedure proposed

by [49] to reduce the size of the link sets. The rationale of the pre-processing procedure is that

the spatiotemporal constraints enforced by travel time windows of participants limit their access

to members of the link set L. This pre-processing procedure starts by forming an ellipse for

each participant, where the foci of the ellipse are set to the participant’s origin and destination

stations, the distance between the foci is the Euclidean distance between the participant’s origin

and destination stations, and the distance between the vertices of the ellipse is set to an upper

bound on the distance that the participant can travel within their travel time window. This ellipse

defines a reduced graph, where any link with at least one station outside of the ellipse will be

infeasible for the participant, as it will violate at least one of the spatiotemporal constraints.
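The spatial part of this pre-processing step can be sketched as follows. The sketch relies on the standard property that a point lies inside an ellipse when the sum of its distances to the two foci does not exceed the major-axis length. The function names, the planar coordinate dictionary `coords`, and the parameter `max_distance` are illustrative assumptions; the temporal filtering of links is not shown.

```python
import math


def euclid(p, q):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def inside_ellipse(station, origin, destination, max_distance):
    """True if the station lies inside (or on) the participant's ellipse.

    The foci are the origin and destination; max_distance plays the role of
    the major-axis length, i.e. the upper bound on the travel distance.
    """
    return euclid(station, origin) + euclid(station, destination) <= max_distance


def reduce_links(links, coords, origin_id, destination_id, max_distance):
    """Keep links (t_i, s_i, t_j, s_j) whose two stations are both inside the ellipse."""
    o, d = coords[origin_id], coords[destination_id]
    return [
        (ti, si, tj, sj)
        for (ti, si, tj, sj) in links
        if inside_ellipse(coords[si], o, d, max_distance)
        and inside_ellipse(coords[sj], o, d, max_distance)
    ]


if __name__ == "__main__":
    # Hypothetical stations and links, for illustration only.
    coords = {"A": (0, 0), "B": (4, 0), "C": (2, 0), "E": (2, 3)}
    links = [(0, "A", 1, "C"), (1, "C", 2, "B"), (0, "A", 1, "E")]
    print(reduce_links(links, coords, "A", "B", max_distance=6))
```

In this toy instance, station E is dropped because the detour through it would exceed the assumed distance budget of 6, so the link that ends at E is removed from the participant's link set.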

This pre-processing procedure provides $L_d \subset L$ and $L_r \subset L$ as the sets of links accessible to driver $d$ and rider $r$, respectively. Furthermore, we define $L_{rd} = L_r \cap L_d$. Finally, $T_r \subset T$ and $T_d \subset T$ are the sets of time intervals within the time windows of rider $r$ and driver $d$, respectively. The ride-matching problem can be modeled on a graph $G = (S, L)$, where $S$ is the set of pick-up and drop-off stations, and $L$ is the set of links.

This problem can be mathematically formulated as the integer programming model in (10). In this model, the decision variable $\omega_{rd}$ is a binary variable that takes the value 1 if rider $r$ is matched with driver $d$, and the value 0 otherwise. There are two additional sets of binary variables: (1) $x_d^l$, which takes the value 1 if driver $d$ travels on link $l$, and the value 0 otherwise; and (2) $y_{rd}^l$, which takes the value 1 if rider $r$ is transported by driver $d$ on link $l$, and the value 0 otherwise.

The objective function in (10a) maximizes the total number of served riders. Constraints (10b)

and (10c) ensure that drivers start their trips from their origin stations and end their trips at their

destination stations, respectively. Constraint (10d) guarantees the ﬂow conservation of vehicles.

Constraints (10e) and (10f) ensure that served riders depart from their origin stations and arrive at

their destination stations within their speciﬁed time windows. Constraint (10g) guarantees the ﬂow

conservation of riders. Constraint (10h) states that riders can be matched with only one driver.

Constraint (10i) limits the capacities of the vehicles.

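\begin{align*}
\max \quad & \sum_{r \in R} \sum_{d \in D} \omega_{rd} \tag{10a} \\
\text{s.t.} \quad
& \sum_{\substack{l \in L_d: \\ s_i = s_d^O;\; t_i, t_j \in T_d}} x_d^l
  - \sum_{\substack{l \in L_d: \\ s_j = s_d^O;\; t_i, t_j \in T_d}} x_d^l = 1
  && \forall d \in D; \tag{10b} \\
& \sum_{\substack{l \in L_d: \\ s_j = s_d^D;\; t_i, t_j \in T_d}} x_d^l
  - \sum_{\substack{l \in L_d: \\ s_i = s_d^D;\; t_i, t_j \in T_d}} x_d^l = 1
  && \forall d \in D; \tag{10c} \\
& \sum_{\substack{t_i, s_i: \\ l = (t_i, s_i, t, s) \in L_d}} x_d^l
  - \sum_{\substack{t_i, s_i: \\ l = (t, s, t_i, s_i) \in L_d}} x_d^l = 0
  && \forall d \in D,\ \forall t \in T_d,\ \forall s \in S \setminus \{s_d^O, s_d^D\}; \tag{10d} \\
& \sum_{\substack{l \in L_{rd}: \\ s_i = s_r^O;\; t_i, t_j \in T_r}} y_{rd}^l
  - \sum_{\substack{l \in L_{rd}: \\ s_j = s_r^O;\; t_i, t_j \in T_r}} y_{rd}^l = \omega_{rd}
  && \forall r \in R,\ \forall d \in D; \tag{10e} \\
& \sum_{\substack{l \in L_{rd}: \\ s_j = s_r^D;\; t_i, t_j \in T_r}} y_{rd}^l
  - \sum_{\substack{l \in L_{rd}: \\ s_i = s_r^D;\; t_i, t_j \in T_r}} y_{rd}^l = \omega_{rd}
  && \forall r \in R,\ \forall d \in D; \tag{10f} \\
& \sum_{d \in D} \sum_{\substack{t_i, s_i: \\ l = (t_i, s_i, t, s) \in L_{rd}}} y_{rd}^l
  - \sum_{d \in D} \sum_{\substack{t_i, s_i: \\ l = (t, s, t_i, s_i) \in L_{rd}}} y_{rd}^l = 0
  && \forall r \in R,\ \forall t \in T_r,\ \forall s \in S \setminus \{s_r^O, s_r^D\}; \tag{10g} \\
& \sum_{d \in D} \omega_{rd} \le 1
  && \forall r \in R; \tag{10h} \\
& \sum_{r \in R} y_{rd}^l \le cap_d
  && \forall d \in D,\ \forall l \in L_d. \tag{10i}
\end{align*}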
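As a small illustration of how the objective (10a) and the two simplest constraints, (10h) and (10i), can be checked on a candidate solution, consider the sketch below. The dictionary layouts for `omega`, `y`, and `cap`, as well as the function names, are assumptions made for this example; the flow-conservation constraints (10b) to (10g) are not checked.

```python
from collections import defaultdict


def objective(omega):
    """Objective (10a): total number of rider-driver matches."""
    return sum(omega.values())


def satisfies_10h(omega):
    """Constraint (10h): each rider is matched with at most one driver."""
    per_rider = defaultdict(int)
    for (rider, _driver), value in omega.items():
        per_rider[rider] += value
    return all(total <= 1 for total in per_rider.values())


def satisfies_10i(y, cap):
    """Constraint (10i): riders carried by driver d on link l never exceed cap_d."""
    load = defaultdict(int)
    for (link, _rider, driver), value in y.items():
        load[(driver, link)] += value
    return all(total <= cap[driver] for (driver, _link), total in load.items())


if __name__ == "__main__":
    # Hypothetical candidate solution: two riders share driver d1 on one link.
    link = ("t0", "s1", "t1", "s2")
    omega = {("r1", "d1"): 1, ("r2", "d1"): 1, ("r1", "d2"): 0}
    y = {(link, "r1", "d1"): 1, (link, "r2", "d1"): 1}
    print(objective(omega), satisfies_10h(omega), satisfies_10i(y, {"d1": 2}))
```

A check of this kind is useful for validating solutions returned for individual clusters before they are merged, since each cluster's problem has the same structure as the full model.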

Appendix II

Participant  Role    Origin station  Destination station  Earliest departure time  Latest arrival time  Shortest path travel time (min)
r1           rider   162             103                  19:00                    19:19                15
r2           rider   161             80                   19:00                    19:15                13
r3           rider   80              129                  19:07                    19:16                9
r4           rider   95              52                   19:12                    19:30                13
r5           rider   97              56                   19:16                    19:36                16
r6           rider   97              121                  19:23                    19:55                30
r7           rider   33              60                   19:25                    19:39                8
r8           rider   50              35                   19:28                    19:53                20
r9           rider   56              50                   19:30                    19:56                17
r10          rider   98              97                   19:34                    19:41                6
r11          rider   124             76                   19:36                    19:45                7
r12          rider   26              71                   19:38                    19:50                9
r13          rider   57              131                  19:42                    19:52                7
r14          rider   78              161                  19:47                    20:02                11
r15          rider   124             156                  19:50                    19:54                3
r16          rider   98              130                  19:52                    20:05                8
r17          rider   49              124                  19:53                    20:03                10
r18          rider   58              159                  19:56                    20:06                8
r19          rider   128             100                  19:58                    20:10                9
r20          rider   125             128                  19:59                    20:15                13
d1           driver  98              N/A                  18:50                    N/A                  N/A
d2           driver  57              N/A                  19:00                    N/A                  N/A
d3           driver  31              N/A                  18:50                    N/A                  N/A
d4           driver  78              N/A                  19:00                    N/A                  N/A

Table 8: Information of participants in the illustrative example
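The rows of Table 8 can be used to compute how much scheduling flexibility each rider has. The sketch below defines an illustrative quantity, here called slack, as the length of the travel time window minus the shortest path travel time. The name `slack_minutes`, the assumption that travel times are in minutes, and the selection of four rows are choices made for this example only.

```python
from datetime import datetime

# (participant, earliest departure, latest arrival, shortest path travel time),
# copied from Table 8; travel times are assumed to be in minutes.
RIDERS = [
    ("r1", "19:00", "19:19", 15),
    ("r2", "19:00", "19:15", 13),
    ("r3", "19:07", "19:16", 9),
    ("r15", "19:50", "19:54", 3),
]


def slack_minutes(earliest_departure, latest_arrival, travel_time):
    """Window length minus direct travel time, in minutes (same-day times)."""
    fmt = "%H:%M"
    window = datetime.strptime(latest_arrival, fmt) - datetime.strptime(earliest_departure, fmt)
    return int(window.total_seconds() // 60) - travel_time


if __name__ == "__main__":
    for name, dep, arr, tt in RIDERS:
        print(name, slack_minutes(dep, arr, tt))
```

A rider with zero slack, such as r3, can only be served by a vehicle that picks them up at their earliest departure time and drives directly to their destination.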

Appendix III

Table 9: Shortest path travel time for illustrative example (I).

Origin station \ Destination station 26 31 33 35 49 50 52 56 57 58 60 71 76 78 80 95

26 0 25 16 29 7 11 20 29 33 38 37 9 13 19 23 13

31 10 0 11 18 11 9 6 16 21 26 36 19 10 7 11 20

33 14 4 0 9 15 13 10 7 12 17 8 23 14 11 10 24

35 18 8 4 0 19 17 14 5 5 10 21 27 18 15 14 28

49 7 21 28 31 0 4 14 26 30 35 46 9 10 16 20 10

50 6 17 26 20 12 0 10 23 27 32 43 11 7 13 17 11

52 11 7 18 25 11 9 0 18 23 28 28 20 10 9 13 20

56 18 8 4 7 19 17 12 0 5 10 3 27 18 33 39 28

57 21 11 7 3 22 20 15 3 0 5 16 30 21 16 12 31

58 24 14 10 6 25 23 18 6 3 0 11 33 24 19 15 34

60 34 24 20 16 35 33 28 16 13 10 0 43 34 29 25 43

71 13 24 24 35 10 14 19 28 32 37 38 0 12 18 22 12

76 17 12 19 23 13 11 7 16 20 25 36 22 0 6 10 10

78 16 6 13 17 17 15 7 10 14 19 30 25 11 0 4 21

80 22 12 9 13 23 21 13 6 10 15 26 31 22 13 0 32

95 15 18 15 29 11 9 13 22 26 31 18 20 6 12 16 0

97 22 12 19 23 22 20 11 16 20 25 36 31 11 6 10 12

98 19 9 16 20 20 18 10 13 17 22 33 28 14 3 7 18

100 26 16 12 15 27 25 17 8 12 17 28 35 26 17 4 29

103 32 22 18 14 33 31 26 14 11 10 18 41 32 25 18 32

121 16 27 34 38 13 17 22 31 35 40 47 7 15 21 25 12

124 17 22 29 33 14 13 17 26 30 35 45 16 7 16 20 4

125 18 21 28 32 14 12 16 25 29 34 42 21 9 15 19 3

128 22 12 19 23 23 21 13 16 20 25 31 31 17 6 10 21

129 28 16 9 23 25 22 18 18 21 21 26 27 15 9 9 17

130 31 21 17 20 32 30 22 13 17 21 26 40 27 16 9 27

131 28 18 14 17 29 27 22 10 14 18 23 37 28 21 12 30

156 22 25 32 36 18 16 20 29 33 38 40 15 13 19 23 7

159 28 18 25 29 26 24 19 22 26 31 36 30 21 12 16 15

161 29 19 21 24 30 28 20 17 21 26 30 38 24 13 13 24

162 32 22 12 23 33 31 23 16 20 24 26 38 27 16 12 23

Table 10: Shortest path travel time for illustrative example (II).

Origin station \ Destination station 97 98 100 103 121 124 125 128 129 130 131 156 159 161 162

26 21 22 26 36 13 11 14 25 28 28 31 16 23 30 30

31 16 10 14 24 23 21 21 13 16 16 19 26 19 18 20

33 20 14 13 23 27 25 25 17 19 15 18 30 23 22 19

35 24 18 17 16 31 29 29 21 23 19 22 34 27 26 23

49 18 19 23 33 13 11 14 22 25 25 28 16 23 27 29

50 15 16 20 30 15 13 16 19 22 22 25 18 21 24 26

52 13 12 16 26 24 22 21 15 13 18 21 17 21 20 22

56 22 16 12 16 31 18 29 19 18 14 17 14 25 38 18

57 25 19 15 11 34 32 32 22 21 17 7 37 28 13 20

58 28 22 18 6 37 35 35 25 21 18 15 38 27 18 15

60 4 32 28 15 47 15 40 30 26 23 20 43 32 23 20

71 20 21 25 35 4 10 13 24 27 27 30 12 20 26 25

76 8 9 13 23 26 16 11 16 15 15 18 18 14 17 19

78 9 3 7 17 29 27 22 6 9 9 12 27 12 11 13

80 22 16 3 13 35 33 29 15 9 5 8 32 21 12 9

95 18 21 19 29 21 6 5 14 17 19 22 11 13 19 21

97 0 3 13 23 30 18 13 6 9 11 14 20 12 31 14

98 6 0 10 20 32 24 19 3 6 8 11 24 9 8 11

100 21 15 0 10 39 31 26 12 6 2 5 29 18 9 6

103 28 22 15 0 44 34 29 19 15 12 9 11 21 12 9

121 20 23 28 36 0 8 11 21 24 24 27 8 16 22 21

124 12 15 23 33 15 0 3 16 19 21 24 3 12 19 19

125 11 14 22 31 20 5 0 13 16 18 21 9 9 16 16

128 9 3 9 18 33 23 18 0 3 5 8 21 6 5 8

129 9 6 6 15 24 19 16 3 0 4 9 17 9 3 6

130 19 13 5 13 39 29 24 10 4 0 3 27 16 7 4

131 14 18 9 10 41 32 27 15 9 5 0 30 19 10 7

156 15 17 22 29 12 6 4 14 17 17 20 0 8 15 14

159 15 9 15 24 27 17 12 6 9 11 14 15 0 7 10

161 16 10 9 18 36 26 21 7 3 5 8 24 9 0 4

162 19 13 8 15 35 25 20 10 6 3 6 23 12 16 0

Appendix IV

Rider  Preliminary cost  Secondary cost
       tour k1  tour k2  tour k1  tour k2
r1     M        M        21       28
r2     M        0        -        -
r3     0        M        -        -
r4     M        M        21       28
r5     M        M        19       23
r6     M        M        26       20
r7     0        M        -        -
r8     M        M        31       28
r9     M        M        32       30
r10    M        0        -        -
r11    M        M        20       16
r12    M        M        21       37
r13    M        M        23       21
r14    M        0        -        -
r15    0        M        -        -
r16    M        M        10       13
r17    M        M        30       11
r18    M        M        25       30
r19    M        M        14       9
r20    M        M        17       19

Table 11: The costs between trips and tours for the illustrative example
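One possible way to read Table 11 is sketched below. It assumes an unconstrained nearest-tour rule: a trip goes to the tour with the smaller preliminary cost, and the secondary cost is consulted only when both preliminary costs equal M. This rule ignores the ε-uniformity requirement on cluster sizes, so it is a simplification rather than the assignment procedure itself. The numeric value of `M` and the function name `closest_tour` are hypothetical.

```python
M = 10**6  # stand-in for the large constant M in Table 11 (value is an assumption)

# rider: ((preliminary k1, preliminary k2), (secondary k1, secondary k2) or None),
# a subset of the rows of Table 11
COSTS = {
    "r1": ((M, M), (21, 28)),
    "r2": ((M, 0), None),
    "r3": ((0, M), None),
    "r6": ((M, M), (26, 20)),
    "r17": ((M, M), (30, 11)),
}
TOURS = ("k1", "k2")


def closest_tour(preliminary, secondary):
    """Pick the tour with the smallest cost, falling back to the secondary cost."""
    # Use the preliminary cost when it discriminates between tours,
    # otherwise compare the secondary costs.
    costs = preliminary if min(preliminary) < M else secondary
    return TOURS[costs.index(min(costs))]


if __name__ == "__main__":
    for rider, (pre, sec) in COSTS.items():
        print(rider, closest_tour(pre, sec))
```

Under this simplified rule, trips with a preliminary cost of 0 stay with the corresponding tour, while the remaining trips follow the smaller secondary cost.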

References

[1] NYC.gov. The New York City Taxi dataset. (accessed Dec. 15, 2020). url:https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page.

[2] Flinc. Germany. 2011 (accessed Dec. 15, 2020). url:https://flinc.org/.

[3] Ville Fluide. France. 2011 (accessed Dec. 15, 2020). url:http://www.villefluide.fr/.

[4] Lyft. United States. 2012 (accessed Dec. 15, 2020). url:https://www.lyft.com/.

[5] Carticipate. United States. 2008 (accessed Dec. 15, 2020). url:https://www.carticipate.com/.

[6] Uber. United States. 2009 (accessed Dec. 15, 2020). url:https://www.uber.com/.

[7] Brian W Kernighan and Shen Lin. “An eﬃcient heuristic procedure for partitioning graphs”.

In: The Bell system technical journal (1970), pp. 291–307.

[8] C Fiduccia and R Mattheyses. “A linear-time heuristic for improving network partitions”. In:

Proceedings of the 19th Design Automation Conference (1982), pp. 175–181.

[9] S. P. Lloyd. “Least squares quantization in PCM”. In: IEEE Trans 28.2 (1982), pp. 129–137.

[10] Stefan E Karisch, Franz Rendl, and Jens Clausen. “Solving graph bisection problems with

semideﬁnite programming”. In: INFORMS Journal on Computing (2000), pp. 177–191.

[11] Jean-François Cordeau and Gilbert Laporte. “The dial-a-ride problem (DARP): Variants,

modeling issues and algorithms”. In: Quarterly Journal of the Belgian, French and Italian

Operations Research Societies 1.2 (2003), pp. 89–101.

[12] Roberto Baldacci, Vittorio Maniezzo, and Aristide Mingozzi. “An Exact Method for the Car Pooling

Problem Based on Lagrangean Column Generation”. In: Operations Research (2004),

pp. 422–439.

[13] Konstantin Andreev and Harald Räcke. “Balanced graph partitioning”. In: Theory of

Computing Systems (2006), pp. 929–939.

[14] Leo Grady and Eric L Schwartz. “Isoperimetric graph partitioning for image segmentation”.

In: IEEE transactions on pattern analysis and machine intelligence (2006), pp. 469–475.

[15] Stephan Winter and Silvia Nittel. “Ad hoc shared-ride trip planning by mobile geosensor

networks”. In: International Journal of Geographical Information Science (2006), pp. 899–916.

[16] Jean-François Cordeau and Gilbert Laporte. “The dial-a-ride problem: models and

algorithms”. In: Annals of operations research 153.1 (2007), pp. 29–46.

[17] Martijn Mes, Matthieu Van Der Heijden, and Aart Van Harten. “Comparison of agent-based

scheduling to look-ahead heuristics for real-time transportation problems”. In: European

Journal of Operational Research (2007), pp. 59–75.

[18] David A Hensher. “Climate change, enhanced greenhouse gas emissions and passenger

transport–What can we do to make a diﬀerence?” In: Transportation Research Part D:

Transport and Environment (2008), pp. 95–111.

[19] Pavel Senin. “Dynamic time warping algorithm review”. In: Information and Computer

Science Department University of Hawaii at Manoa Honolulu, USA (2008), p. 40.

[20] Cristián E. Cortés, Martín Matamala, and Claudio Contardo. “The pickup and delivery

problem with transfers: Formulation and a branch-and-cut solution method”. In: European

Journal of Operational Research (2010), pp. 711–724.

[21] Santo Fortunato. “Community detection in graphs”. In: Physics reports (2010), pp. 75–174.

[22] Tim Kieritz et al. “Distributed Time-Dependent Contraction Hierarchies”. In: Experimental

Algorithms (2010), pp. 83–93.

[23] Niels Agatz et al. “Dynamic ride-sharing: A simulation study in metro Atlanta”. In:

Transportation Research Part B Methodological (2011), pp. 1450–1464.

[24] Shumo Chu and James Cheng. “Triangle listing in massive networks and its applications”.

In: Acm Sigkdd International Conference on Knowledge Discovery Data Mining. 2011.

[25] Wesam Herbawi and Michael Weber. “Evolutionary Multiobjective Route Planning

in Dynamic Multi-hop Ridesharing”. In: Evolutionary Computation in Combinatorial

Optimization. Springer Berlin Heidelberg, 2011, pp. 84–95.

[26] Niels Agatz et al. “Optimization for dynamic ride-sharing: A review”. In: European Journal

of Operational Research (2012), pp. 295–303.

[27] Nelson D Chan and Susan A Shaheen. “Ridesharing in North America: Past, present, and

future”. In: Transport reviews (2012), pp. 93–112.

[28] Daniel Delling and Renato Werneck. “Better Bounds for Graph Bisection”. In: Algorithms–

ESA 2012 (2012), pp. 407–418.

[29] Keivan Ghoseiri. “Dynamic Rideshare Optimized Matching problem”. In: Dissertations

Theses - Gradworks (2012).

[30] Wesam Herbawi and Michael Weber. “A Genetic and Insertion Heuristic Algorithm for

Solving the Dynamic Ridematching Problem with Time Windows”. In: GECCO’12 -

Proceedings of the 14th International Conference on Genetic and Evolutionary Computation

(2012).

[31] E. G. Boman, K. D. Devine, and S. Rajamanickam. “Scalable matrix computations on large

scale-free graphs using 2D graph partitioning”. In: SC ’13: Proceedings of the International

Conference on High Performance Computing, Networking, Storage and Analysis. 2013,

pp. 1–12.

[32] A. Di Febbraro, E. Gattorna, and N. Sacco. “Optimization of Dynamic Ridesharing Systems”.

In: Transportation Research Record Journal of the Transportation Research Board (2013),

pp. 44–50.

[33] F. Drews and D. Luxen. “Multi-hop ride sharing”. In: Proceedings of the 6th Annual

Symposium on Combinatorial Search, SoCS 2013 (2013), pp. 71–79.

[34] Masabumi Furuhata et al. “Ridesharing: The state-of-the-art and future directions”. In:

Transportation Research Part B Methodological (2013), pp. 28–46.

[35] S. Ma, Z. Yu, and O. Wolfson. “T-share: A large-scale dynamic taxi ridesharing service”. In:

IEEE International Conference on Data Engineering. 2013.

[36] Bo Peng, Lei Zhang, and David Zhang. “A survey of graph theoretical approaches to image

segmentation”. In: Pattern recognition (2013), pp. 1020–1038.

[37] Peter Sanders and Christian Schulz. Engineering Multilevel Graph Partitioning Algorithms.

Springer Berlin Heidelberg, 2013.

[38] Dennis Luxen and Dennis Schieferdecker. “Candidate sets for alternative routes in road

networks”. In: Journal of Experimental Algorithmics (JEA) (2015), pp. 1–28.

[39] Dominik Pelzer et al. “A Partition-Based Match Making Algorithm for Dynamic

Ridesharing”. In: IEEE Transactions on Intelligent Transportation Systems (2015),

pp. 2587–2598.

[40] Douglas O Santos and Eduardo C Xavier. “Taxi and ride sharing: A dynamic dial-a-ride

problem with money as an incentive”. In: Expert Systems with Applications 42.19 (2015),

pp. 6728–6737.

[41] Mitja Stiglic et al. “The beneﬁts of meeting points in ride-sharing systems”. In: Transportation

Research Part B (2015).

[42] Aydin Buluc et al. Recent Advances in Graph Partitioning. Springer International Publishing,

2016.

[43] E. Jafari et al. “The For-Proﬁt Dial-a-Ride Problem on Dynamic Networks”. In:

Transportation Research Board 96th Annual Meeting. 2016.

[44] Mehdi Nourinejad and Matthew J. Roorda. “Agent based model for dynamic ridesharing”.

In: Transportation Research Part C (2016), pp. 117–132.

[45] Robert Regue, Neda Masoud, and Will Recker. “Car2work: Shared Mobility Concept to

Connect Commuters with Workplaces”. In: Transportation Research Record: Journal of the

Transportation Research Board 2542 (Jan. 2016), pp. 102–110. doi:10.3141/2542-12.

[46] Yaroslav Akhremtsev, Peter Sanders, and Christian Schulz. “High-Quality Shared-Memory

Graph Partitioning”. In: IEEE Transactions on Parallel and Distributed Systems (2017).

[47] Javier Alonso-Mora et al. “On-demand high-capacity ride-sharing via dynamic trip-vehicle

assignment”. In: Proceedings of the National Academy of Sciences 114.3 (2017), pp. 462–467.

[48] Roger Lloret-Batlle, Neda Masoud, and Daisik Nam. “Peer-to-peer ridesharing with ride-back

on high-occupancy-vehicle lanes: Toward a practical alternative mode for daily commuting”.

In: Transportation Research Record 2668.1 (2017), pp. 21–28.

[49] Neda Masoud and R Jayakrishnan. “A Decomposition Algorithm to Solve the Multi-Hop

Peer-to-Peer Ride-Matching Problem”. In: Transportation Research Part B Methodological

(2017), pp. 1–29.

[50] Neda Masoud and R Jayakrishnan. “A real-time algorithm to solve the peer-to-peer ride-

matching problem in a ﬂexible ridesharing system”. In: Transportation Research part B:

Methodological 106 (2017), pp. 218–236.

[51] Neda Masoud and R Jayakrishnan. “Autonomous or driver-less vehicles: Implementation

strategies and operational concerns”. In: Transportation research part E: logistics and

transportation review 108 (2017), pp. 179–194.

[52] Neda Masoud et al. “Promoting Peer-to-Peer Ridesharing Services as Transit System

Feeders”. In: Transportation Research Record: Journal of the Transportation Research Board

(2017), pp. 74–83.

[53] Ali Najmi, David Rey, and Taha H. Rashidi. “Novel dynamic formulations for real-time ride-

sharing systems”. In: Transportation Research Part E: Logistics and Transportation Review

(2017), pp. 122–140.

[54] Sin C Ho et al. “A survey of dial-a-ride problems: Literature review and recent developments”.

In: Transportation Research Part B: Methodological 111 (2018), pp. 395–421.

[55] Ruimin Li, Zhiyong Liu, and Ruibo Zhang. “Studying the beneﬁts of carpooling in an

urban area using automatic vehicle identiﬁcation data”. In: Transportation Research Part

C: Emerging Technologies (2018), pp. 367–380.

[56] Daisik Nam et al. “Designing a Transit-Feeder System using Multiple Sustainable Modes:

Peer-to-Peer (P2P) Ridesharing, Bike Sharing, and Walking”. In: Transportation Research

Record Journal of the Transportation Research Board (2018).

[57] Hai Wang and Hai Yang. “Ridesourcing systems: A framework and review”. In: Transportation

Research Part B: Methodological 129 (2019), pp. 122–155.

[58] Jayita Chakraborty et al. “A review of Ride-Matching strategies for Ridesourcing and other

similar services”. In: Transport Reviews (2020), pp. 1–22.

[59] Amirmahdi Tafreshian and Neda Masoud. “Trip-based graph partitioning in dynamic

ridesharing”. In: Transportation Research Part C: Emerging Technologies (2020), pp. 532–553.

[60] Amirmahdi Tafreshian and Neda Masoud. “Using subsidies to stabilize peer-to-peer

ridesharing markets with role assignment”. In: Transportation Research Part C: Emerging

Technologies 120 (2020).

[61] Amirmahdi Tafreshian, Neda Masoud, and Yafeng Yin. “Frontiers in Service Science: Ride

Matching for Peer-to-Peer Ride Sharing: A Review and Future Directions”. In: Service Science

12.2-3 (2020), pp. 44–60.

[62] Zhenhao Zhang, Amirmahdi Tafreshian, and Neda Masoud. “Modular transit: Using

autonomy and modularity to improve performance in public transportation”. In:

Transportation Research Part E: Logistics and Transportation Review 141 (2020), p. 102033.

[63] Amirmahdi Tafreshian et al. “Proactive shuttle dispatching in large-scale dynamic dial-a-ride

systems”. In: Transportation Research Part B: Methodological 150 (2021), pp. 227–259.

[64] Xingbin Zhan et al. “A modiﬁed artiﬁcial bee colony algorithm for the dynamic ride-hailing

sharing problem”. In: Transportation Research Part E: Logistics and Transportation Review

150 (2021), p. 102124.
