A Learning-Based Optimization Approach for
Autonomous Ridesharing Platforms with Service
Level Contracts and On-Demand Hiring of Idle Vehicles
B. A. Beirigo, TU Delft
F. Schulte, TU Delft
R. R. Negenborn, TU Delft
Current mobility services cannot compete on equal terms with self-owned mobility products concerning
service quality. Due to supply and demand imbalances, ridesharing users invariably experience delays,
price surges, and rejections. Traditional approaches often fail to respond to demand fluctuations
adequately since service levels are, to some extent, bounded by fleet size. With the emergence of
autonomous vehicles, however, the characteristics of mobility services change and new opportunities to
overcome the prevailing limitations arise. In this paper, we consider an autonomous ridesharing problem
in which idle vehicles are hired on-demand in order to meet the service level requirements of a
heterogeneous user base. In the face of uncertain demand and idle vehicle supply, we propose a
learning-based optimization approach that uses the dual variables of the underlying assignment problem
to iteratively approximate the marginal value of vehicles at each time and location under different
availability settings. These approximations are used in the objective function of the optimization
problem to dispatch, rebalance, and occasionally hire idle third-party vehicles in a high-resolution
transportation network of Manhattan, New York City. The results show that the proposed policy
outperforms a reactive optimization approach in a variety of vehicle availability scenarios while hiring
fewer vehicles. Moreover, we demonstrate that mobility services can offer strict service level contracts
(SLCs) to different user groups featuring both delay and rejection penalties.
Key words: autonomous ridesharing platform; stochastic heterogeneous demand; stochastic vehicle supply;
machine learning; approximate dynamic programming; service level contracts; on-demand hiring.
1. Introduction
Mobility-on-demand (MoD) platforms and transportation network companies (TNCs) such as Uber
and Lyft have grown substantially and altered mobility behavior worldwide. Although envisioned
to make vehicle ownership superfluous and, eventually, alleviate congestion, these ride-hailing
platforms have primarily won customers from traditional public transport modes (Castiglione
et al. 2018). Ultimately, MoD solutions can only challenge vehicle ownership if service providers
can offer high service levels consistently.
Sufficient vehicle supply is a critical factor when it comes to providing consistent service levels
efficiently and sustainably. However, most existing models for ridesharing are not capable of
responding quickly to significant demand changes. First, fixed fleet sizes are often assumed, which
makes it hard to react to demand fluctuations on a tactical level, let alone in real-time. Second,
when providers rely on third-party vehicles (i.e., independent drivers), they typically balance
supply and demand using surge prices: fares at under-supplied areas dynamically increase both
to attract more drivers and suppress excessive demand. Such a strategy, however, is highly contro-
versial since it mainly benefits the platform at the expense of drivers and riders (Xu, Yin, and Ye
2020). Regardless of the strategy, some customers end up being penalized with excessive delays,
abusive prices, and rejections. With the emergence of autonomous vehicles (AVs), however, new
possibilities to overcome the shortcomings caused by demand-supply imbalances arise. As soon
as vehicle availability is detached from driver availability, ridesharing platforms can count on a
larger pool of vehicles, which, currently non-automated, remain parked about 95% of the time
(Shoup 2017).
In this study, we consider an autonomous mobility-on-demand (AMoD) system where a rideshar-
ing platform can occasionally hire freelance autonomous vehicles (FAVs), that is, idle third-party-
owned AVs, to support its own platform-owned autonomous vehicles (PAVs) fulfilling the demand
adequately. Hence, in contrast with related literature, we model a highly diversified mobility
system where AV ownership is disseminated among the platform and individuals, who simulta-
neously own and hire out their vehicles. We refer to this system as the AMoD-H. To guarantee
service quality, the platform establishes strict service level contracts (SLCs) with its user base,
such that contract violations (e.g., extra delays, rejections) incur penalties. Hence, by harnessing
FAV availability, the platform can shorten the minimum size of its own fleet while addressing
personalized demand fluctuations in real-time.
Modeling AMoD-H poses several challenges since requests have to be handled dynamically in
the face of (i) irregular FAV availability and (ii) uncertain demand. First, while fleet availability
is mostly taken for granted, we assume that the location, announcement time, and total service
duration of freelance vehicles are uncertain. For example, FAV availability may resemble that of
future AVs whose owners commute by car to work and decide to rent out their vehicles to an
AMoD platform during designated intervals such that they have a chance to profit from other-
wise unproductive parking times. Second, analogously to service offers in the aviation and rail
industry, we segment users into first and second classes, such that the former is willing to pay a
premium to enjoy higher service levels. We consider not only the stochastic trip distribution but
also class membership distribution when designing anticipatory rebalancing strategies. Based on
such details, platforms can improve decision making by taking into account demand patterns
arising within its user base, besides moving forward in the direction of a more personalized user
experience. Since we consider a real-world transportation demand setting, determining an opti-
mal policy would incur all “curses of dimensionality”, and we are unable to enumerate all pos-
sible states and decisions, let alone the uncertainty associated with requests, and FAV hiring. We
therefore develop an approximate dynamic programming (ADP, Powell 2011) algorithm using value
function approximations (VFAs). In the proposed approach, the dual variables of the underlying
assignment problem, defined through a mixed integer programming (MIP) formulation, are iteratively
used to approximate value functions representing the benefit of having an additional vehicle of
either type at a certain location and time. Moreover, particularly for the freelance fleet, such ap-
proximations also indicate whether it is worthwhile to engage an FAV in further rebalancing or
pickup actions, based on its remaining available time or how far it is from its owner’s location. At
the same time, VFAs are actively used in the objective function of the MIP formulation to weigh
the outcome of present decisions (e.g., vehicle rebalancing, parking, and hiring). Eventually, after
a number of iterations and value function updates, these learned approximations more accurately
represent future states, such that solution quality improves over time. From a methodological
perspective, the approach offers:
1. An ADP algorithm for a novel AMoD application that sustains contracted service levels of a
heterogeneous user base by controlling vehicle supply on the operation level through on-demand
hiring. Requests and third-party AVs arrive stochastically within the service area, such that the
platform needs to determine a policy to fulfill the demand using either vehicle type (i.e., PAV or FAV).
2. A hierarchical aggregation structure that summarizes state features using both time and
space dimensions. Spatial levels are comprised of increasingly larger clusters (regional centers)
set up according to a minimum coverage set formulation on a high-resolution street network of
Manhattan, New York City. In turn, temporal levels conform with the level of responsiveness
demanded by modern mobility-on-demand applications, in which decisions (e.g., user-vehicle
matching, vehicle dispatching, and rebalancing) need to be derived in short intervals.
3. An online discount function that dampens value function approximations arising from de-
cisions involving multiperiod travel times (i.e., resource transformations that take more than one
period). Besides leading to more robust estimations and simplifying the state representation, we
show that such a discount function enables more complex rebalancing strategies since vehicles
can consider varying distance ranges.
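The online discount function described in item 3 can be sketched as follows; the exponential form
and the factor gamma below are illustrative assumptions rather than the paper's exact specification.

```python
def discounted_vfa(vfa_value, travel_periods, gamma=0.9):
    """Dampen a value function approximation (VFA) for a decision whose
    resource transformation takes `travel_periods` periods to complete.

    The exponential form and the factor `gamma` are illustrative
    assumptions, not the paper's calibrated discount function."""
    return (gamma ** travel_periods) * vfa_value


# A short rebalance (1 period) keeps most of its estimated future value,
# while a long one (4 periods) is dampened more heavily:
near = discounted_vfa(100.0, 1)
far = discounted_vfa(100.0, 4)
```

Because the dampening grows with travel time, distant rebalancing targets are only selected when
their approximated value clearly outweighs nearby alternatives, which is what allows vehicles to
consider varying distance ranges safely.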
From a managerial viewpoint, we show that our policy addresses the requirements of all stakeholders:
1. Users enjoy personalized service levels and are compensated when these are violated.
2. Cities can impose strict street use regulations, such as the maximum number of cars per
intersection, congestion pricing, and parking schemes. This level of control is enabled by our
network representation, which is directly anchored to the real-world physical structure.
3. Independent AV owners can profit from their cars’ idleness by making them available to
join a transportation platform during predefined time windows.
4. AMoD platforms may keep the minimum number of cars necessary to maintain customers’
service levels, or instead, rely entirely on the freelance fleet, while maximizing profits.
The outline of the paper is as follows. We present our literature review in Section 2, define
the problem in Section 3, and formulate it using the language of dynamic resource management
in Section 4. Section 5 presents our approximate dynamic algorithm, and Section 6 lays out the
details of our experimental study and analyzes the performance of our method when dealing with
several transportation scenarios. Finally, Section 7 concludes the work and presents an outlook
for future research.
2. Literature
The goal of this literature review is threefold. First, we identify the underlying dial-a-ride problem
(DARP) our model stems from (Section 2.1). Second, we survey studies on transportation plat-
forms in which the demand (parcels or passengers) is fulfilled both by company- and/or third-
party-owned vehicles (Section 2.2). Third, we analyze the mobility-on-demand literature that
considers anticipation mechanisms (Section 2.3). We show that our work is the first to address two
different sources of uncertainty, namely, third-party vehicle availability and user service levels.
2.1. The dynamic and stochastic dial-a-ride problem
In this study, we introduce a generalization of the classic dynamic and stochastic dial-a-ride problem
(DSDARP) (see Ho et al. 2018, for a comprehensive survey on DARP). Regarding the sources of
uncertainty linked to our problem, as pointed out by Ho et al., stochasticity is generally on the
side of demand, and rarely on the side of the supply. The lack of studies on supply stochasticity
can also be seen across other transportation problems. For instance, in the vehicle routing problem
(VRP) literature, the bulk of stochastic models also focus on uncertain demand features (see
Oyola, Arntzen, and Woodruff 2018, for a review on stochastic VRPs) with a few exceptions (e.g.,
vehicle breakdown). On the other hand, when the demand is a source of uncertainty, service level
stochasticity has not yet been explored in the literature. This type of stochasticity arises from a user
base with heterogeneous customer profile segments whose transportation patterns, as well as
their expectations regarding service quality, differ markedly.
2.2. On-demand and crowdsourced vehicles
In this section, we review studies in which a crowdsourcing platform matches the demand (par-
tially or entirely) to third-party vehicles. Most studies in this category refer to ridesharing or
crowdshipping scenarios, in which a platform seeks to fit riders or parcels into already planned
driver routes. Moreover, in contrast with our work, these studies do not consider vehicle automa-
tion, such that the availability and preferences of the drivers (rather than the AV owners) have
to be taken into account to design feasible routes. Since we only focus on dual-fleet models, the
reader may refer to Furuhata et al. (2013) and Le et al. (2019) for comprehensive reviews on
ridesharing and crowdshipping, respectively.
First, regarding the ridesharing scenario, Lee and Savelsbergh (2015) assume dedicated drivers
complement the ad-hoc fleet, satisfying rider requests that would otherwise remain unmatched.
The authors argue that ensuring service levels is essential to retain more participants and in-
vestigate the cost-benefit of employing dedicated drivers. Their findings suggest this cost-benefit
depends on the number and time flexibility of the participants, as well as on the similarity be-
tween their travel patterns. Santos and Xavier (2015) consider a setting where passengers are
willing to share both taxis and rides, as long as sharing leads to lower costs than private trips. On
the other hand, vehicle owners can reduce costs by servicing multiple passengers on the way to
their destination. A greedy randomized adaptive search procedure (GRASP) heuristic is used to solve
the dynamic version of the problem in a realistic scenario, with requests arriving every minute
and private vehicles every hour, throughout a twelve-hour horizon. Although the underlying
DARP variant they propose is general enough to accommodate our problem, leveraging passenger
and vehicle stochasticity is out of the scope of their study. Moreover, following the ridesharing
tradition, they assume drivers stop servicing customers as soon as they reach their destination. In
contrast, our formulation is closer to the general pickup and delivery problem (GPDP) considered by
Savelsbergh and Sol (1998), where vehicles are stationed at a home depot, from where they can
go back and forth within a designated time window.
Second, regarding the crowdshipping scenario, Archetti, Savelsbergh, and Speranza (2016) con-
sider that a company can rely on occasional drivers (ODs) besides their own fleet to deliver goods.
After arriving at the company’s depot, each OD can make at most one delivery, provided that the
extra travel distance required to do so, does not violate a flexibility threshold. Arslan et al. (2019)
build on Archetti, Savelsbergh, and Speranza's work by considering that ODs can realize multiple
pickup and/or drop-off tasks as long as the extra time and number of stops do not inconvenience
the drivers. Similarly to traditional ridesharing approaches, however, they assume that the delivery
platform relies solely on third-party vehicles and consider that tasks can eventually be handled
by an emergency backup fleet to keep service levels high. Dahle, Andersson, and Christiansen
(2017) also extend Archetti, Savelsbergh, and Speranza's model by assuming ODs can perform
multiple pickup and delivery operations within a time window. Later, Dahle et al. (2019) focus on the
design of compensation schemes that can fulfill ODs' personal expectations, which are modeled
through threshold constraints. They show that even sub-optimal compensation schemes, which
do not attract as many ODs, can yield substantial cost savings.
Table 1 offers an alternative view on the pickup and delivery literature where requests can be
fulfilled both by dedicated and third-party vehicles. In the first column, papers are subsumed
under parcel and passenger categories. In the second column, we identify how authors refer to
the provider’s fleet and the third-party fleet. We also indicate whether the model considers a
multi-trip (i.e., vehicles can return to their depot multiple times), or single-trip (i.e., vehicles
stop the service as soon as they reach their depot or destination) setting. A checkmark in the
“capacity” column indicates each vehicle can handle multiple requests at a time. To highlight
how information regarding the third-party vehicles and the customer demand unfolds through-
out time, we adopt the standard taxonomy used to classify transportation problems (e.g., VRPs,
DARPs). Traditionally, these problems can fall into four categories, namely, static-deterministic
(SD), dynamic-deterministic (DD), static-stochastic (SS), and dynamic-stochastic (DS). Dynamic
or static classes indicate whether new information (e.g., demand, third-party vehicles) can mod-
ify existing plans. In turn, deterministic or stochastic classes indicate whether information about
the uncertainty (e.g., demand and third-party vehicle distributions) is available at decision time.
Finally, the last column shows the method used to solve each problem.
2.3. Stochastic mobility-on-demand problems
Assuming a fleet of centrally controlled (autonomous) vehicles, Alonso-Mora et al. (2017), Vazifeh
et al. (2018), and Fagnant and Kockelman (2018) have demonstrated that historical taxi demand
could be almost entirely fulfilled with significantly fewer vehicles, especially when passengers are
willing to share their rides. Studies have also shown that service levels can be substantially
improved through anticipatory rebalancing strategies. For example, demand data has already been
successfully exploited using frequentist approaches (e.g., Alonso-Mora, Wallar, and Rus 2017),
reinforcement learning (e.g., Wen, Zhao, and Jaillet 2017, Gueriau and Dusparic 2018, Lin et al.
2018), model predictive control (e.g., Zhang, Rossi, and Pavone 2016, Iglesias et al. 2018, Tsao,
Iglesias, and Pavone 2018), and approximate dynamic programming (e.g., Al-Kanj, Nascimento,
and Powell 2020). Most of these approaches, however, dimension fleet size experimentally, by
simulating configurations that can service the target demand under predefined minimum ser-
vice level requirements. As pointed out by Vazifeh et al., fleet-size inflation can be required as
a consequence of trip-demand bursts, occurring, for instance, after concerts or sports matches.
Table 1 Transportation problems in which demand is (partially) fulfilled by third-party vehicles.

Reference | Terminology (provider fleet / third-party fleet) | Third-party fleet supply | Trip | Capacity | Demand | Method
Parcel transportation
Archetti, Savelsbergh, and Speranza (2016) | company vehicle / occasional driver | – | – | – | – | –
Arslan et al. (2019) | back-up vehicle / ad hoc driver | DD | single | – | DD | Heuristic
Dahle, Andersson, and Christiansen (2017) | company vehicle / occasional driver | – | – | – | – | –
Dahle et al. (2019) | company vehicle / occasional driver | – | – | – | – | –
Savelsbergh and Sol (1998) | company vehicle / independent driver | – | – | – | – | –
Passenger transportation
Lee and Savelsbergh (2015) | dedicated driver / ad hoc driver | – | – | – | – | –
Santos and Xavier (2015) | taxi / car owner | – | – | – | – | –
This study | platform AV / freelance AV | DS | multi | ✓ | DS | ADP

Supply & Demand: SD (static-deterministic), SS (static-stochastic), DD (dynamic-deterministic), DS (dynamic-stochastic).
Method: TS (Tabu Search), LP (Linear Programming), MIP (Mixed Integer Programming), B&P (Branch and Price),
ADP (Approximate Dynamic Programming), GRASP (Greedy Randomized Adaptive Search Procedure),
NS (Neighborhood Search).
Hyland and Mahmassani (2017) refer to this ability to change the fleet size to flex with demand
as “fleet size elasticity” and highlight that the benefits of increasing vehicle supply in the short
term are likely significant. The authors also point out that, despite uncommon within the context
of shared autonomous vehicle (SAV) fleet management research, current TNCs rely entirely on this
feature, constantly manipulating prices to attract more drivers. Likewise, vehicle ownership may
be highly disseminated in the future autonomous mobility market, with most AVs owned by indi-
viduals and small fleet operators rather than a single service provider (Campbell 2018). Although
some models are flexible enough to handle dynamic fleet inflation (e.g., Ma, Zheng, and Wolfson
2015, Gueriau and Dusparic 2018), research on short-term SAV fleet size elasticity is still lacking
(Narayanan, Chaniotakis, and Antoniou 2020).
3. Problem description
The AMoD-H emerges on AV-based transportation platforms aiming to fulfill a set of pickup and
delivery requests P arising on an urban network G = (N, E), where N is a set of nodes (locations)
and E is a set of directed edges (streets). Requests arrive in batches P_t, where all requests r ∈ P_t
have arrived at continuous times in the interval [t−1, t), for discrete time t ∈ T = {0, 1, 2, 3, ..., T}.
We consider that request arrival follows a known stochastic process F^P concerned with two
sources of uncertainty:
Request distribution: The number of requests, arrival times, and origin-destination nodes de-
pend on user demand patterns.
Request class: Each request is associated with a service quality class c ∈ C that identifies
minimum service level requirements, particularly maximum pickup delays. Requests render the
highest contributions when these requirements are respected or, otherwise, incur class-dependent
waiting and rejection penalties.
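As an illustration, the class-dependent contributions and penalties above can be encoded as a simple
lookup; all fares, delay limits, and penalty values below are hypothetical placeholders, not the
paper's calibrated parameters.

```python
# Hypothetical service level contract (SLC) parameters per request class.
# All numbers are illustrative assumptions, not values from the paper.
SLC = {
    "first":  {"base_fare": 10.0, "max_delay": 1, "delay_penalty": 4.0, "rejection_penalty": 8.0},
    "second": {"base_fare": 6.0,  "max_delay": 2, "delay_penalty": 1.0, "rejection_penalty": 3.0},
}

def contribution(request_class, pickup_delay=None):
    """Return the platform's contribution for one request.

    pickup_delay=None encodes a rejection; delays beyond the contracted
    maximum are penalized proportionally to the excess."""
    c = SLC[request_class]
    if pickup_delay is None:
        return -c["rejection_penalty"]
    extra = max(0, pickup_delay - c["max_delay"])
    return c["base_fare"] - extra * c["delay_penalty"]
```

Under this sketch, serving a first-class user on time yields the full fare, a late pickup erodes it,
and a rejection costs the platform the class-dependent penalty.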
To ensure these service quality requirements are met fully, the platform can hire additional
vehicles online to address unexpected supply-demand mismatches. The fleet is comprised of a set
of platform-owned autonomous vehicles K^PAV and a set of freelance autonomous vehicles K^FAV,
such that the total fleet set is K = K^PAV ∪ K^FAV. While PAVs can initiate service at any location
n ∈ N, FAVs are distributed throughout a set of locations O ⊆ N where they are typically parked.
We refer to locations o ∈ O as stations since FAVs are required to return to them upon finishing
the service contract. Although O is known in advance, the platform deals with three sources of
uncertainty (which follow a stochastic process F^O) when dealing with FAVs, namely:
Vehicle-station distribution: Some parking locations can be more prone to accommodate FAVs.
For example, vehicles can routinely park in the surroundings of their owner’s locations (e.g.,
workplace, garage), or in more affordable parking places on the outskirts of the city.
Announcement time: Vehicles are available to pick up users at stations at different times for a
given day. For example, FAVs can become available downtown as soon as they drop their owners
at work. Alternatively, some owners can make their FAVs available (possibly, from their garage)
during the night or over the weekend. Regardless of the case, provided that one’s itinerary is
somewhat irregular due to external factors (e.g., weather, congestion) or particular preferences
(e.g., appointments, company’s culture), the announcement time can change. For example, a
station that typically accommodates a hundred vehicles can have 20%, 50%, and 30% of them
arriving in the intervals [7 AM, 8 AM), [8 AM, 9 AM), and [9 AM, 10 AM), respectively.
Contract duration: From the announcement time on, FAV owners make their assets available
only during a predefined time interval. Consequently, vehicles must stop servicing users at the
right moment, such that they have enough time to travel back to their respective stations before
their owner’s deadline. Analogously to the announcement times, the contract durations may de-
pend on several factors related to an owner’s schedule, leading to varying return deadlines. For
example, contracts can be short (e.g., shopping, doctor appointments), average (e.g., office hours,
evening), and long (e.g., the whole weekend, vacation).
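The three sources of FAV uncertainty above can be sketched as a simple sampler; the station
weights, announcement intervals, and contract-duration mix below are made-up examples (the
announcement weights mirror the 20%/50%/30% morning arrivals mentioned above), not
distributions estimated in the paper.

```python
import random

# Illustrative sampler for the FAV supply process F^O: station choice,
# announcement time, and contract duration are drawn from assumed
# distributions, not ones calibrated in the paper.

def sample_fav(stations, station_weights, rng=random):
    # Vehicle-station distribution: some stations host FAVs more often.
    station = rng.choices(stations, weights=station_weights)[0]
    # Announcement time: 20% in [7 AM, 8 AM), 50% in [8 AM, 9 AM),
    # 30% in [9 AM, 10 AM).
    announce_hour = rng.choices([7, 8, 9], weights=[0.2, 0.5, 0.3])[0]
    # Contract duration (in periods): short, average, or long contracts.
    duration = rng.choices([4, 16, 48], weights=[0.3, 0.5, 0.2])[0]
    return {"station": station, "announce_hour": announce_hour,
            "duration": duration}

fav = sample_fav(["A", "J", "K"], station_weights=[0.5, 0.3, 0.2])
```

Each sampled FAV must return to its drawn station before its announcement time plus its contract
duration, which is the deadline the platform has to respect when dispatching it.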
Finally, over the planning horizon T, the platform aims to maximize the total contribution
accrued by adequately servicing the requests while minimizing the operational costs associated
with routing and hiring vehicles.
3.1. Example
In Figure 1, we illustrate the interplay between the elements of our model. For the sake of simplic-
ity, we represent both vehicle and request discrete locations on a one-dimensional space for each
time step, such that N = {A, B, C, D, E, F, G, H, I, J, K}, and consider a time horizon T = {1, 2, ..., 10}.
We assume it takes a single period to travel between each location pair.
We represent the stochastic process F^P in time and space by manipulating the transparency of
yellow and red colors, corresponding to first- and second-class customers, respectively (see the
color bar at the bottom). Regardless of the class, the more opaque the color, the higher the
probability of finding a request. We assume first-class requests (i) generate higher profits and (ii)
demand higher service levels, such that failing to adhere to their performance requirements incurs
higher penalties. We illustrate such higher service levels by assuming first-class users require to
be picked up within one period, whereas second-class users are willing to wait up to two periods.
In turn, regarding the availability of the freelance fleet, we represent the stochastic process F^O
using the shades of gray on the axis tick labels. We assume that the darker the shade, the
higher the chance of an FAV appearing. Additionally, we assume FAV contract durations last
on average four time steps.
In the following, we describe the behavior of three vehicles in detail throughout ten periods.
First, at time t = 1, the PAV at location K is faced with two decisions; namely, it can stay in its
current location or move to a more promising location in anticipation of future demand. Promis-
ing areas consist of high-demand locations that typically generate the highest profits to the ser-
vice provider. Pursuing such future profits, the PAV performs two rebalance movements, moving
empty from K to J, and from J to I. Once it arrives at location I at time t = 3, the PAV is assigned
to a second-class request (black square) demanding a trip from G to F (solid upper arrow). From
this moment on, we consider the PAV is servicing the user, which covers both pick-up and deliv-
ery times. Once the PAV delivers the second-class user at F, it stays in F for one period, and then
rebalances to location I, in anticipation of future passenger demand.
Second, at decision time t = 2, an FAV with an eight-period contract duration becomes available
at location A and is immediately rebalanced to the high-demand area. Upon arriving at location
C, it is matched to a first-class request demanding a trip from D to E. The FAV travels from C to
D to pick up the user and finishes the service at location E and time t = 6. By this time, the FAV is
available for four additional periods but spends this remaining time traveling back to its station
at A to comply with the contract deadline.
Figure 1 Example of the AMoD-H.
Finally, at location J and time t = 5, a second FAV with a four-period contract duration appears.
However, since it cannot reach high-demand areas in the subsequent periods, this vehicle ends
up not being hired by the platform, staying still until the end of its contract at time t = 9.
4. Problem formulation
We model the problem using the language of dynamic resource management (see Simão et al.
2009), where AVs (resources) are servicing subsequent trip request batches (tasks) occurring at
discrete times t ∈ {1, 2, ..., T}. Subsequently, we present the elements of our model: the system
state (Section 4.1), information arrival (Section 4.2), decisions (Section 4.3), costs (Section 4.4),
and objective function (Section 4.5).
4.1. System state
The state of a single resource is defined by a four-attribute vector a, given by
a = (a^type, a^remain, a^station, a^current),
comprising the vehicle type, the remaining servicing time, the station, and the current location.
We refer to a single vehicle k ∈ K with attribute vector a as k_a. First, the vehicle type attribute a^type
helps distinguish between third-party-owned, freelance vehicles (FAVs) and platform-owned
vehicles (PAVs). This distinction is crucial because FAVs operate under stricter availability and a
different cost plan (owners are entitled to a higher share of the profits).
The duration of such availability, in turn, is captured by the remaining servicing time attribute
a^remain, which corresponds to the remaining time interval an FAV can still spend servicing orders.
PAVs, on the other hand, are assumed to be always available. Thus, as time goes on, an FAV is
increasingly unable to pick up new orders, especially those whose destinations are far away from
the vehicle’s station.
The station attribute a^station corresponds to the start and terminal location of each FAV,
supposedly the parking place where the owner expects the vehicle to return once the remaining
servicing time has expired. It is worth noting that attributes a^remain and a^station are not taken
into consideration for PAVs; we assume these vehicles are available indefinitely, besides not being
obliged to depart from or return to a station.
Finally, similarly to a^station, the current location attribute a^current expresses where a vehicle is in
the service area. The locations identified by attributes a^station and a^current integrate the node set
N of the street network graph G = (N, E).
By including the temporal dimension, we have a_t, the attribute vector of an AV at time t. Let
A be the set of all possible vehicle attribute vectors. The state of all vehicles with the same attribute
vector is modeled using
R_ta = the number of vehicles with attribute vector a at time t,
R_t = (R_ta)_{a ∈ A} = the resource state vector at time t.
Each request, in turn, is modeled using a three-attribute vector b = (b^origin, b^dest, b^class).
We refer to a single trip r ∈ P with attribute vector b as r_b. Similarly to vehicle locations, the
origin (b^origin) and destination (b^dest) attributes correspond to nodes of the street network (i.e.,
b^origin, b^dest ∈ N), whereas the b^class attribute identifies the requested service quality c ∈ C.
Let B be the set of all possible request attribute vectors. The state of all rides with the same
attribute vector occurring at time t is modeled using
D_tb = the number of trips with attribute vector b at time t,
D_t = (D_tb)_{b ∈ B} = the request state vector at time t.
With the resource and request state vectors, we define our system state vector as
S_t = (R_t, D_t).
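As a minimal sketch, the resource and request state vectors can be held as sparse count maps; the
attribute tuples below, and the convention of None for the PAV fields that the model leaves unused,
are illustrative assumptions.

```python
from collections import Counter

# Sketch of the state variables: R_t counts vehicles per attribute vector
# a = (type, remaining, station, current), and D_t counts requests per
# attribute vector b = (origin, destination, class). Node names follow
# the one-dimensional example (A..K); values are illustrative.

R_t = Counter()
D_t = Counter()

# Two PAVs at node "I" (remaining time and station are unused for PAVs):
R_t[("PAV", None, None, "I")] += 2
# One FAV parked at its station "A" with 8 periods of servicing time left:
R_t[("FAV", 8, "A", "A")] += 1
# Three second-class requests from G to F in the current batch:
D_t[("G", "F", "second")] += 3

S_t = (R_t, D_t)  # system state vector
```

Keeping only nonzero counts is what makes this representation tractable: most of the exponentially
many attribute vectors never occur at a given time step.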
4.2. Exogenous information
Although the underlying system is known to evolve continuously over time, we measure states S_t
before making any decisions at discrete periods t ∈ {0, 1, 2, 3, ..., T}. Between subsequent periods
t−1 and t, we account for the exogenous information processes concerning both vehicle and
demand attribute updates using the variables
R̂_ta = the change in the number of FAVs with attribute a resulting from information
arriving between t−1 and t,
D̂_tb = the number of new requests with attribute b placed between t−1 and t,
W_t = (R̂_t, D̂_t) = the exogenous information arriving between t−1 and t.
For the complete stochastic process, we let ω ∈ Ω represent the sample path W_1, W_2, ..., W_T,
where Ω is the set of all sample paths.
In this study, we consider ˆ
Rta is concerned only with FAVs entering the system. However,
it could also account for several alternative sources of uncertainty, such as travel delays, vehi-
cle breakdowns, and early termination of FAV contracts. In any case, whenever an AV attribute
changes randomly from ato a0, we would have ˆ
Rta =1 and ˆ
Rta0= +1.
4.3. Decisions
Regarding the types of decisions used to act on the fleet, we consider that every vehicle can service a user (who is reachable within their maximum pickup delay), stay parked in its current location waiting to pick up users, rebalance to a more promising location, or, in the case of FAVs, return to its station before the contract deadline. Decisions are described using
d^stay = Decision to stay parked in the current location.
d^return = Decision to return to the station (FAV only).
D^R = Set of all decisions to rebalance (i.e., move empty) to a set of neighboring locations.
D^S = Set of all decisions to service a user, where an element d ∈ D^S represents the decision to cover a trip request of type b_d ∈ B.
D^S_c = Subset of decisions in D^S associated with each service quality c ∈ C.
D = Set of all decisions d ∈ D^S ∪ D^R ∪ {d^stay} ∪ {d^return}.
x_tad = Number of times decision d is applied to a vehicle with attribute vector a at time t.
x_t = (x_tad)_{a∈A, d∈D} = Decision vector at time t.
4.3.1. Transition function To model how decisions affect vehicle states, we consider a deterministic transition function a^M. Hence, before any new information arrives, applying a decision d to a vehicle with attribute vector a at time t leads to a post-decision attribute vector
a′ = a^M(a, d).
In turn, the new time of availability is given by
t′ = t + 1, if d = d^stay,
t′ = t + τ(t, a, d), if d ∈ D^S ∪ D^R ∪ {d^return},
where τ(t, a, d) is the travel time spent to carry out a decision d to service a user, rebalance to another location, or return to the station. We consider that between t and t′, vehicles are busy, such that the system cannot exert any control over them. Therefore, if, for instance, a decision d to cover a trip b is applied to a vehicle with attribute vector a at time t, this vehicle will end up in state a′ (with a′^current = b^dest), and can only be used again at t′.
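To make these mechanics concrete, the deterministic transition a^M and the new availability time t′ can be sketched as follows. This is a minimal illustration, not the paper's implementation: the attribute encoding and the travel-time function tau are simplified stand-ins for the definitions above.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Attr:
    current: str      # current node in the street network
    station: str      # home station (relevant for FAVs only)
    remain: int       # remaining contract periods (relevant for FAVs only)

def transition(a, d, t, tau):
    """Deterministic transition a' = a^M(a, d) with availability time t'.
    d is ('stay',), ('return',), ('rebalance', node), or ('service', trip)."""
    kind = d[0]
    if kind == 'stay':
        return a, t + 1                        # staying takes exactly one period
    if kind == 'service':
        a2 = replace(a, current=d[1]['dest'])  # vehicle ends at the trip destination
    elif kind == 'rebalance':
        a2 = replace(a, current=d[1])          # vehicle moves empty to the target node
    else:  # 'return'
        a2 = replace(a, current=a.station)     # FAV goes back to its home station
    return a2, t + tau(t, a, d)                # vehicle is busy until t' = t + tau(t, a, d)
```

Between t and t′ the vehicle would be marked busy and excluded from the decision problem, matching the assumption that the system exerts no control over in-transit vehicles.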
4.3.2. Abiding by the street network capacity To avoid unrealistic vehicle distributions, we consider that locations j ∈ N can only accommodate up to k^max_j vehicles. In a real-world setting, different locations have different capacities, which may depend not only on the physical infrastructure (e.g., number of parking places) but also on city regulations. An artificial threshold may be imposed, for instance, to alleviate local congestion or improve accessibility to surrounding facilities.
To implement this restriction, we keep track of the number of vehicles k^inbound_j (0 ≤ k^inbound_j ≤ k^max_j) inbound to j and ensure that at most k^max_j − k^inbound_j extra vehicles can enter j. Further, we define the set of all decisions leading vehicles with attribute a to post-decision location j as
D_{a,j} = {d | a′ = a^M(a, d), a′^current = j, a′^current ≠ a′^station, d ∈ D \ D^S}.
Using D_{a,j} and k^inbound_j, we can calculate the post-decision number of vehicles inbound to j (through either rebalance or stay decisions) and ensure the maximum capacity of j is not violated (see constraints (3) in Section 4.3.4). One must notice that D_{a,j} does not cover FAVs inbound to their own stations (i.e., a′^current = a′^station = j). FAVs are assumed to have free access to their home stations at any time.
4.3.3. Fulfilling contract time windows An FAV with attribute vector a can only be acted on by a decision d to stay, rebalance, or service users when there is enough remaining servicing time to return to its station, that is,
τ(t, a, d) + τ(t′, a′, d^return) ≤ a^remain, ∀a ∈ A, d ∈ D \ {d^return}.
Otherwise, the decision is deemed invalid, and the corresponding x_tad variable is preemptively discarded. At each period t, we define the set of vehicle attribute vectors associated with FAVs that must return to their station using
A^return = {a | a ∈ A^FAV, τ(t, a, d^return) = a^remain}.
Although FAVs will eventually realize the return decision, we consider that they can always return to their station directly, even before their contract due time. Doing so adds flexibility to FAV operation since they can rebalance back and forth from their station when suitable. This way, provided that FAV owners have already covered parking costs at their stations, the platform may sometimes find it worthwhile to rebalance FAVs back, to evade city parking costs.
4.3.4. Constraints The decision variables x_tad must satisfy the following constraints:
Σ_{d∈D} x_tad = R_ta, ∀a ∈ A (1)
Σ_{a∈A} x_tad ≤ D_tb_d, ∀d ∈ D^S (2)
Σ_{a∈A} Σ_{d∈D_{a,j}} x_tad ≤ k^max_j − k^inbound_j, ∀j ∈ N (3)
x_{ta,d^return} = R_ta, ∀a ∈ A^return (4)
x_tad ≥ 0, ∀a ∈ A, d ∈ D (5)
Constraints (1) guarantee that all available vehicles (i.e., parked at the current period t) are assigned to a decision, whereas constraints (2) ensure that any trip request (identified by b_d) can be assigned to at most one vehicle. In turn, constraints (3) enforce that the number of vehicles inbound to a location j does not surpass j's remaining capacity. Finally, constraints (4) ensure that every FAV whose return-trip duration τ(a^current, a^station) is equal to its remaining contract duration a^remain is obliged to return to its station.
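The feasibility conditions can be checked directly on a candidate decision vector. The sketch below validates a toy decision vector against constraints (1), (2), (4), and (5) in pure Python; constraint (3) is omitted because it additionally requires the per-location inbound counts. The dictionary encodings are illustrative assumptions, not the paper's implementation.

```python
def feasible(x, R, D, A_return, decisions_of, trip_of):
    """x: dict (a, d) -> count; R: dict a -> available vehicles;
    D: dict b -> number of requests of type b."""
    # (5) non-negativity
    if any(v < 0 for v in x.values()):
        return False
    # (1) every available vehicle receives exactly one decision
    for a, n in R.items():
        if sum(x.get((a, d), 0) for d in decisions_of[a]) != n:
            return False
    # (2) each trip type b_d is covered by at most D_tb vehicles
    cover = {}
    for (a, d), v in x.items():
        b = trip_of.get(d)
        if b is not None:
            cover[b] = cover.get(b, 0) + v
    if any(cover.get(b, 0) > n for b, n in D.items()):
        return False
    # (4) FAVs at their contract limit must take the return decision
    for a in A_return:
        if x.get((a, 'return'), 0) != R.get(a, 0):
            return False
    return True
```

In the paper this feasible region is handed to a mathematical programming solver rather than checked explicitly, but the conditions are the same.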
4.4. Cost function
Applying a decision d to a vehicle k_a at time t takes the vehicle to state a′ at time t′ and generates a contribution c_tad given by
c_tad =
  β_k · (p^c_base + p_time · t_trip) − c^k_time · (t_pickup + t_trip) − c^c_delay · w_delay, (service)
  −c^k_time · t_rebalance, (rebalance)
  −c^k_time · t_return, (return)
  −c^{t,j}_stay. (stay)
Contributions c_tad are comprised of
β_k = Platform profit margin when using vehicle k.
p^c_base = Base fare of request b = b_d of decision d from quality class c = b^class.
p_time = Time-dependent fare.
t_trip = Trip duration t(b^origin, b^dest) of request b = b_d.
c^k_time = Time-dependent operational cost of vehicle k.
t_pickup = Pickup duration t(a^current, b^origin) from the current location to trip origin b^origin.
c^c_delay = Penalty due to the excess delay w_delay.
w_delay = max{0, t_pickup − w^c_pickup} = Excess delay over the pickup delay w^c_pickup contracted by a user from class c.
t_rebalance = Rebalance travel duration t(a^current, r) to target location r ∈ N.
t_return = Return travel duration t(a^current, a^station) of a vehicle with a^type = FAV.
c^{t,j}_stay = Cost of staying at location j at time t, such that j = a^current and j ∈ N.
The profit margin β_k determines the percentage owed to the platform by assigning trips to a vehicle k_a of type a^type ∈ {PAV, FAV}. It allows us to adequately adjust, from the perspective of the platform, the incentive FAVs have to serve at their available times. Similarly to today's MoD applications, we assume most profits belong to the independent contractors, namely, FAV owners. In turn, the constant c^k_time represents typical operational costs (e.g., tolls, fuel, wear and tear) for a vehicle k. We consider these costs to be equal for all vehicles, regardless of type. The basis of a pay-per-use parking system is captured by the cost c^{t,j}_stay of staying at the current location at time t, allowing city managers to create incentives for vehicles to avoid parking in congested areas. A user who requests a ride in class c expects to be picked up within w^c_pickup time units, but can tolerate up to w^c_tolerance time units over w^c_pickup to be serviced, as long as he is compensated for the excess delay w_delay = max{0, t_pickup − w^c_pickup}, with w_delay ≤ w^c_tolerance. From the platform's perspective, this compensation represents a delay penalty c^c_delay incurred for w_delay > 0, defined as
c^c_delay = p^c_base / w^c_tolerance.
This way, if w_delay = w^c_tolerance, the base fare is totally offset by the penalty, and the platform will only profit from the time-dependent fare. Further, we consider that when the platform fails to pick up a user from class c within the class maximum waiting time w^c_tolerance, the platform has to bear a rejection penalty
c^c_rejection = ρ · p^c_base,
where ρ is a penalty factor. Hence, when rejecting a request, the platform not only fails to profit but may also be required to compensate the inconvenienced users for a breach of contract, according to their service-level class. By setting ρ, we can choose the extent to which rejections incur further losses, allowing us to experiment with different penalization schemes. Ultimately, for each period t, the total rejection penalty resulting from failing to service users from different classes is given by the function
P_t(S_t, x_t) = Σ_{c∈C} c^c_rejection Σ_{d∈D^S_c} (D_tb_d − Σ_{a∈A} x_tad). (6)
We consider the service level violation penalties c^c_delay and c^c_rejection to be an essential element of SLCs, since they further back up the platform's commitment to service level fulfillment. By combining these two penalties, we guarantee that the platform is always better off servicing a user, regardless of the service level violation. Even when the base fare is totally offset by the delay penalty, the platform can still profit from the time-dependent fare, whereas rejecting a user always leads to losses. Finally, the contribution function representing the profit a platform can accrue at each period t is given by
C_t(S_t, x_t) = Σ_{a∈A} Σ_{d∈D} c_tad x_tad − P_t(S_t, x_t). (7)
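A numerical sketch of the per-decision service contribution and the rejection penalty follows. The parameter values and the placement of the profit margin β_k (multiplying the full fare) are illustrative assumptions for this sketch, not the paper's exact figures.

```python
def service_contribution(beta, p_base, p_time, t_trip, c_time, t_pickup,
                         w_pickup, w_tolerance):
    """Profit of servicing one request, including the excess-delay penalty."""
    w_delay = max(0.0, t_pickup - w_pickup)      # excess over the contracted pickup delay
    assert w_delay <= w_tolerance                # beyond the tolerance, the request is rejected
    c_delay = p_base / w_tolerance               # penalty rate: offsets p_base exactly at full tolerance
    revenue = beta * (p_base + p_time * t_trip)  # platform share of the fare
    cost = c_time * (t_pickup + t_trip)          # operational cost of the whole movement
    return revenue - cost - c_delay * w_delay

def rejection_penalty(rho, p_base):
    """Penalty c^c_rejection = rho * p^c_base for failing to pick up a user."""
    return rho * p_base
```

With w_delay equal to w^c_tolerance, the delay penalty cancels the base fare and only the time-dependent fare remains, matching the property discussed above.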
4.5. Objective
Let X^π_t(S_t) be a decision function representing a policy π ∈ Π that maps a state S_t ∈ S to a decision x_t ∈ X_t, where S is the state space, X_t is the set of feasible decisions in state S_t, and Π is the set of potential decision functions. Starting from an initial state S_0, we aim to determine the optimal policy π* that maximizes the expected cumulative contribution, discounted by a factor γ, over the planning horizon T:
F*_0(S_0) = max_{π∈Π} E{ Σ_{t=0}^{T} γ^t C_t(S_t, X^π_t(S_t)) }. (8)
5. Algorithmic strategies
In principle, we can solve Equation (8) by means of classical dynamic programming, recursively computing (backward through time) Bellman's optimality equations
V_t(S_t) = max_{x_t∈X_t} ( C_t(S_t, x_t) + γ E{V_{t+1}(S_{t+1}) | S_t, x_t} ), (9)
where S_{t+1} = S^M(S_t, x_t, W_{t+1}). The transition function S^M(·) describes how the pre-decision state S_t evolves to the subsequent pre-decision state S_{t+1} upon applying decisions x_t and receiving random information W_{t+1}. For each period, using the expected contributions V_{t+1} allows us to account for the downstream effect of decision making.
Solving (9), however, is computationally intractable for our problem setting. Doing so would incur all three "curses of dimensionality" (Powell 2011). First, we are unable to enumerate all states S_t in the state space S to evaluate value functions V_t(S_t). Second, we are unable to find the optimal decision in the decision space X_t for all states in S. Third, we are unable to determine the outcome space, whose dimensionality depends on the random information W_{t+1}, which for our problem comprises the uncertainty associated with the appearance of requests and FAVs.
5.1. An approximate dynamic programming algorithm
To estimate the value functions around each state in Equation (9), we develop an approximate value iteration algorithm (see, e.g., Powell, Simão, and Bouzaiene-Ayari 2012). This ADP algorithm relies on the concept of post-decision state, which is a deterministic state immediately after implementing a decision and before any new information has arrived. Thus, applying the decision vector x_t to state S_t leads to a deterministic post-decision state
S^x_t = S^{M,x}(S_t, x_t),
where S^{M,x}(·) is a transition function describing how the system evolves from S_t to S^x_t upon decisions x_t. Then, from the post-decision state S^x_t, we can compute the subsequent pre-decision state
S_{t+1} = S^{M,W}(S^x_t, W_{t+1}(ω)),
using S^{M,W}(·), a transition function that models the arrival of new information W_t(ω). Through these functions, using a given policy π over the planning horizon T would produce a sequence
(S_0, x_0, S^x_0, W_1(ω), S_1, x_1, S^x_1, W_2(ω), S_2, ..., S_{T−1}, S^x_{T−1}, W_T(ω), S_T).
By breaking Equation (9) into two steps we have:
V_t(S_t) = max_{x_t∈X_t} ( C_t(S_t, x_t) + γ V_t(S^x_t) ),
V_t(S^x_t) = E{ V_{t+1}(S_{t+1}) | S^x_t }.
Since we cannot compute V_t(S^x_t) exactly, we aim to find V̄_t(S^x_t), that is, a value function approximation around the deterministic post-decision state S^x_t. Since we already penalize rejections, we assume the unmet requests from the post-decision demand vector D^x_t are not carried over to future periods (i.e., we set D^x_t = ∅). In practice, this assumption implies that users will either walk away or re-enter the system in the next period through a new request upon being rejected. Hence, the post-decision state is equivalent to the post-decision resource vector, that is, V̄_t(S^x_t) = V̄_t(R^x_t).
Following the ADP algorithm, we estimate these approximations iteratively, such that at each iteration n = 1, 2, ..., I, a different sample path ω^n is considered, and we can take decisions using the value functions learned up to iteration n−1. Accordingly, to indicate the iterative nature of the algorithm, a superscript n is added to all variables.
Assuming V̄^n_t is linear in R^n_ta, we have
V̄^n_t(S^x_t) = Σ_{a∈A} Σ_{d∈D} Σ_{a′∈A} v̄^n_{t′a′} δ_{a′}(a, d) x_tad, (10)
where
v̄^n_{t′a′} = Marginal value of a vehicle with post-decision attribute vector a′ at arrival time t′ at iteration n.
δ_{a′}(a, d) = Indicator function equal to 1 if a^M(a, d) = a′, and 0 otherwise.
In our problem, the marginal values v̄^n_ta have slightly different interpretations depending on vehicle type. For PAVs, these values approximate the overall contribution (i.e., until the end of the simulation horizon T) of assigning an incremental vehicle to a certain location at a certain time. For FAVs, however, a marginal value also reflects a vehicle's remaining contract duration and station location. For example, FAVs with higher remaining service durations, operating in locations close to their stations, are expected to draw higher contributions. Conversely, FAVs far from their stations and with contracts about to expire cannot render as high contributions, since the last moments of their contract are reserved for a return trip to their station.
Although we assume V̄^n_t is linear in R^n_t, we acknowledge that this assumption is prone to result in an oversupply of vehicles in regions associated with high marginal values. Intuitively, the more vehicles rebalance to a certain region, the lower their average contribution becomes, since only a few of them will actually service users. Instead of dampening these values as the number of vehicles increases (by using piecewise-linear approximations as in Topaloglu and Powell 2006), we limit the number of vehicles arriving at each network location. Our fine-grained spatiotemporal representation (featuring short periods and exact street coordinates) allows us to restrict the number of vehicles entering each location in constraints (3). Besides avoiding vehicles flooding certain areas, these constraints add a degree of realism to the model since they are based on the physical capacity of the actual infrastructure as well as the city's traffic rules.
We do not restrict the number of vehicles dropping off passengers at the same location because they are already bounded by the number of demands. Due to the characteristics of our problem setting, especially the adoption of short periods and the spatiotemporal distribution of the demand, it is unlikely that a high number of users arrive at the same location at the same time.
Finally, the problem of finding the optimal decision function is
X^π_t(S_t) = arg max_{x_t∈X_t} ( Σ_{a∈A} Σ_{d∈D} c_tad x_tad + γ Σ_{a∈A} Σ_{d∈D} Σ_{a′∈A} v̄^{n−1}_{t′a′} δ_{a′}(a, d) x_tad ) (11)
 = arg max_{x_t∈X_t} Σ_{a∈A} Σ_{d∈D} ( c_tad + γ Σ_{a′∈A} v̄^{n−1}_{t′a′} δ_{a′}(a, d) ) x_tad (12)
 = arg max_{x_t∈X_t} Σ_{a∈A} Σ_{d∈D} ( c_tad + γ v̄^{n−1}_{t′, a^M(a,d)} ) x_tad. (13)
5.2. A discount mechanism for multiperiod travel times
Mobility-on-demand users typically require quick response times from transportation platforms. For this reason, most studies on urban MoD applications either process requests as soon as they are placed or in batches, usually considering short time intervals. Following such a practice in our ADP approach, however, prevents us from assuming that all decisions acting on the resources will be completed in the subsequent period. In fact, most pickup and rebalancing decisions can last longer than a single period. Although we work with a high-resolution street network, our locations still correspond to a restricted subset of all possible coordinates. The lower the resolution of the underlying map, the fewer locations are available, and the more multiperiod travel times can be expected between location pairs. Therefore, incorporating such a feature helps to create a more robust solution, independent of the length of the periods or the underlying map resolution.
Such multiperiod resource-transformation times have a significant influence on the value function approximations. To avoid adding another attribute to our resource attribute vector to account for the arrival time at the destination location (see Topaloglu and Powell 2006), we implement an online discount mechanism for all value function approximations associated with post-decision states arising from rebalancing decisions. We dampen the value function of post-decision states a′ = a^M(a, d) by discounting the opportunity cost of staying still (i.e., d = d^stay) during the rebalancing periods t″ ∈ {t+1, t+2, ..., t′−1} using
v̂^n_{t′a′} = v̄^n_{t′a′} − Σ_{t″=t+1}^{t′−1} v̄^n_{t″, a^M(a, d^stay)}, ∀d ∈ D^R. (14)
In Equation (14), if the resource-transformation time takes a single period (i.e., t′ = t+1), we have v̂^n_{t′a′} = v̄^n_{t′a′}. Since we do not allow vehicles to interrupt a rebalance trip, this adaptation is crucial to avoid vehicles being too far-sighted, pursuing future rewards at long-distance locations while ignoring the requests that might occur (in the next periods) in the surroundings of their starting location after their departure. On the other hand, this adaptation also avoids that vehicles are stranded in low-demand areas, allowing them to move directly to farther high-demand areas instead of endlessly rebalancing to nearby low-demand neighbors. Therefore, at every decision time, an idle vehicle can also rebalance to farther, high-value-function locations, as long as this decision is (i) at least as good as staying still for the whole rebalancing time, and (ii) competitive in relation to rebalancing to its closest neighbors.
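The discounted evaluation of a multiperiod rebalancing decision can be sketched as follows: the value at the arrival period is reduced by the forgone stay values of the in-transit periods. The dictionary encoding of the value functions is an illustrative assumption.

```python
def discounted_rebalance_value(v, t, t_arr, a_dest, a_stay):
    """Value of a rebalancing decision that ends at period t_arr, net of the
    opportunity cost of staying still during periods t+1, ..., t_arr-1.
    v: dict (period, attribute) -> value function estimate."""
    stay_cost = sum(v.get((tt, a_stay), 0.0) for tt in range(t + 1, t_arr))
    return v.get((t_arr, a_dest), 0.0) - stay_cost
```

When the trip takes a single period (t_arr = t+1), the sum is empty and the destination value is used unchanged, as in the single-period case above.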
5.3. Value function updates
At iteration n, we consider a sample path ω^n that determines R̂_t(ω^n) and D̂_t(ω^n), such that W_t(ω^n) = (R̂_t, D̂_t). Let V̄^{n−1}_t(S^{x,n}_t) be an approximation of the value of being in the post-decision state S^{x,n}_t = S^{M,x}(S^n_t, x_t), considering the first n−1 iterations. Given the state S^n_t = S^{M,W}(S^{x,n}_{t−1}, W_t(ω^n)), we can make decisions at time t by solving the optimization problem
x^n_t = arg max_{x_t∈X^n_t} ( C_t(S^n_t, x_t) + γ V̄^{n−1}_t(S^{x,n}_t) ), (15)
where we seek to determine the decision vector x_t in the feasible region X^n_t that maximizes the sum of the current contribution and the expected contribution (discounted by a factor γ).
In Algorithm 1, we present how our optimization problem is inserted into a classic ADP algorithm. First, all value function approximations are set to zero by default. Then, we start from an initial state S^1_0 = (R^1_0, D^1_0), where R^1_0 comprises the state vectors of PAVs randomly distributed throughout the map, and D^1_0 is empty (i.e., no requests have arrived). We update value functions v̄^n_ta using the samples v̂^n_ta drawn from attribute vector a at time t and iteration n. New samples are smoothed using stepsizes α_n, which are updated every iteration according to McClain's rule (see George and Powell 2006), such that
α_n = α_{n−1} / (1 + α_{n−1} − ᾱ),
where ᾱ is a constant that is approached as n advances. Initially, we set α_1 = 1 such that value functions can start with the first sample value measured for each state.
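McClain's rule produces stepsizes that decline roughly like 1/n at first and then level off at the constant ᾱ, so early samples dominate initially while later samples keep a fixed influence. A minimal sketch:

```python
def mcclain_stepsizes(n_iters, alpha_bar, alpha_1=1.0):
    """Stepsizes following McClain's rule:
    alpha_n = alpha_{n-1} / (1 + alpha_{n-1} - alpha_bar)."""
    alphas = [alpha_1]
    for _ in range(n_iters - 1):
        a = alphas[-1]
        alphas.append(a / (1.0 + a - alpha_bar))  # converges to alpha_bar from above
    return alphas
```

Starting at α_1 = 1 lets the first observed sample fully initialize each value function, after which updates are smoothed ever more gently until the stepsize settles at ᾱ.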
Algorithm 1 An approximate dynamic programming algorithm to solve the AMoD-H assignment problem.
1: Choose an initial approximation V̄^0_t, ∀t ∈ T = {0, 1, ..., T}.
2: Set the initial state to S^1_0.
3: for n = 1, ..., I do
4:   Choose a sample path ω^n.
5:   for t = 0, 1, ..., T do
6:     Let x^n_t be the solution of the optimization problem (15).
7:     Let v̂^n_ta be the dual variable corresponding to the resource conservation constraint (1) for each R^n_ta > 0.
8:     Update the value function using: v̄^n_ta = (1 − α_n) v̄^{n−1}_ta + α_n v̂^n_ta.
9:     Compute the subsequent pre-decision state: S^{x,n}_t = S^{M,x}(S^n_t, x^n_t), S^n_{t+1} = S^{M,W}(S^{x,n}_t, W_{t+1}(ω^n)).
10:    Update the total number of vehicles K_j inbound to each location j ∈ N.
11:  end for
12: end for
13: Return the value functions: {v̄^n_ta | ∀t ∈ T, a ∈ A}.
5.4. Approximating the value function
Since we are unable to enumerate all the attributes in the state space A, we use hierarchical aggregation (Section 5.4.1) to create a sequence of state spaces. Aggregating on the space dimension helps to estimate the value function of states featuring locations that were not yet visited by using the estimates of regions at hierarchically superior levels. We define regions by clustering nodes in our street network (Section 5.4.2) that can be accessed from central locations within increasingly higher maximal delays (Section 5.4.3). Besides aggregating across space, nearby periods can aggregate up to larger time intervals, since the value function of a vehicle at a location (or region) is likely to carry some resemblance to the value functions of anterior/posterior periods. Such resemblance, therefore, allows us to approximate value functions across periods that belong to longer time intervals. Later, in Section 6.2.2, we present the final spatiotemporal hierarchical aggregation structure, achieved experimentally by assessing the performance of different aggregation structures on a single baseline scenario.
5.4.1. Hierarchical aggregation In order to estimate the value function of attributes not yet observed, we use hierarchical aggregation coupled with the weighting by inverse mean squared errors (WIMSE) formula proposed by George, Powell, and Kulkarni (2008). Our hierarchical aggregation structure lays out a sequence of state spaces {(T × A)^(g), g = 1, 2, ..., |G|} with successively fewer elements, where (T × A)^(g) represents the g-th level of aggregation of the time-attribute space T × A. Hence, each attribute ta ∈ T × A can be aggregated up to an attribute ta^(g) = G^g(ta), where G^g : T × A → (T × A)^(g). Doing so allows us to estimate the value v̄^n_ta associated with an attribute ta by combining the values v̄^(g,n)_ta from superior levels using
v̄^n_ta = Σ_{g∈G} w^(g)_ta · v̄^(g,n)_ta.
Weights w^(g)_ta on the estimates of different aggregation levels are inversely proportional to the estimates of their mean squared deviations, according to the WIMSE formula:
w^(g)_ta ∝ 1 / ( (σ̄^2_ta)^(g,n) + (μ̄^(g,n)_ta)^2 ),
where (σ̄^2_ta)^(g,n) is the variance of the estimate v̄^(g,n)_ta, and (μ̄^(g,n)_ta)^2 is the squared aggregation bias, that is, the difference between the estimate v̄^(g,n)_ta at aggregate level g and the estimate v̄^(0,n)_ta at the disaggregate level. Next, we normalize all weights by doing
w^(g)_ta = ( 1 / ( (σ̄^2_ta)^(g,n) + (μ̄^(g,n)_ta)^2 ) ) / Σ_{g′∈G} ( 1 / ( (σ̄^2_ta)^(g′,n) + (μ̄^(g′,n)_ta)^2 ) ).
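The WIMSE combination amounts to a weighted average with weights proportional to the inverse of (variance + squared bias) at each level; a simplified sketch:

```python
def wimse_estimate(level_values, level_vars, level_biases):
    """Combine value estimates across aggregation levels g; the weight of
    level g is proportional to 1 / (variance_g + bias_g ** 2)."""
    inv = [1.0 / (s2 + mu * mu) for s2, mu in zip(level_vars, level_biases)]
    total = sum(inv)
    weights = [w / total for w in inv]  # normalized weights sum to 1
    return sum(w * v for w, v in zip(weights, level_values)), weights
```

Levels whose aggregated estimate is noisy or strongly biased receive proportionally less weight, so coarse levels dominate early on (few observations, low variance per cell) and the disaggregate level takes over as more samples arrive.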
5.4.2. The street network map AMoD studies within the scope of reinforcement learning and ADP generally consider that cars can rebalance to their immediate neighboring zones. Moreover, such rebalancing operations are expected to last at most a single period, such that, at decision time, all vehicles are either servicing customers or idle (potentially, after finishing rebalancing). Such zones, however, are defined using artificial grids (e.g., Wen, Zhao, and Jaillet 2017, Al-Kanj, Nascimento, and Powell 2020) or neighborhood borders (e.g., Gueriau and Dusparic 2018, Lin et al. 2018), which do not necessarily translate into realistic drivable streets. In contrast, we work with a high-resolution transportation network of Manhattan comprised of 6,430 nodes and 11,581 edges. Therefore, pickup and rebalance decisions consist of movements between real-world street coordinates, discretized in a set of network nodes, which we guarantee to be no longer than thirty seconds away from one another (at an average speed of 20 km/h). Such a high-granularity setup allows us to consider a more realistic demand matching scenario, since real-world trip requests have a larger set of candidate nodes to which their GPS coordinates can be approximated.
5.4.3. Hierarchical regional centers In order to define hierarchical regions in our street-network map, we implement a variant of the facility location problem proposed by Toregas et al. (1971), which is concerned with the time that separates a location from its closest facility. The goal of this problem is to determine the minimum set of facilities in the street network graph G = (N, E) that together can cover (reach) all other locations within s time units. Let
x_j = 1 if a facility is located at j ∈ N, 0 otherwise.
t_ij = Travel time between nodes i, j ∈ N.
s = The maximal service delay of a vehicle departing from a facility.
N_{p,s} = Subset of locations able to reach location p ∈ N within s time units (i.e., N_{p,s} = {j | t_jp ≤ s, j ∈ N}).
The minimum set covering problem is defined as follows:
min Σ_{j∈N} x_j
Subject to:
Σ_{j∈N_{p,s}} x_j ≥ 1, ∀p ∈ N
x_j ∈ {0, 1}, ∀j ∈ N (19)
An optimal solution to this set covering problem gives us the location of the minimum set of facilities J_s ⊆ N that would be required to service all locations p and still ensure a maximal service time of s units for the entire system. We assume these facilities are regional centers j ∈ J_s, and consider that each node p ∈ N integrates the region of its closest center j* = arg min_{j∈J_s} t_jp.
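On a toy travel-time matrix, the minimum set of regional centers can be found exactly by brute force over candidate facility sets. This is only a tiny-scale illustration of the set covering problem; the full Manhattan instance would be solved with an integer programming solver.

```python
from itertools import combinations

def min_covering_centers(t, s):
    """Smallest set J of nodes such that every node p has some center j in J
    with t[j][p] <= s. t: square travel-time matrix; s: max service delay."""
    n = len(t)
    for size in range(1, n + 1):  # try smaller facility sets first
        for J in combinations(range(n), size):
            if all(any(t[j][p] <= s for j in J) for p in range(n)):
                return set(J)     # first feasible hit has minimum size
    return set(range(n))
```

Lowering s shrinks each coverage set N_{p,s} and therefore increases the number of centers needed, which is how the 5- and 10-minute hierarchies yield different region counts.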
5.5. Benchmark policy
We benchmark our method against a myopic policy π^myopic comprised of two phases. In the first phase, we determine the optimal vehicle-request assignment at period t by maximizing the contribution function given by Equation (7). Next, in the second phase, idle vehicles are optimally rebalanced to under-supplied locations using the linear program proposed by Alonso-Mora et al. (2017). This program aims to minimize the total travel distance of reaching the pickup locations of unassigned requests while guaranteeing that either all vehicles or all requests are assigned. The original formulation is adapted such that it abides by the contractual deadlines of freelance vehicles. We preemptively discard FAVs that, although idle, cannot reach any rebalancing targets within their remaining service time.
To increase the matching rate, we assume rebalancing decisions can be revoked at every decision time. Hence, in the first phase of our policy, both rebalancing and idle vehicles can be assigned to new requests. Thanks to our high-resolution network representation, we can calculate the current location of all rebalancing vehicles at each period t. Therefore, rebalancing vehicles can be matched to any request occurring in the surroundings of their ongoing route (i.e., the shortest path to their destination).
6. Experimental study
We implemented our approach using Python 3.6 and Gurobi 8.1. Test cases were executed on a
2.60 GHz Intel Core i7 with 32 GB RAM.
6.1. Training and testing datasets
We create our dataset by randomly sampling 10% of the 2011 Manhattan, New York City taxi demand. Value functions are created using requests sampled from the 15th Tuesday, and their effectiveness is assessed on testing instances created using samples from the remaining 51 Tuesdays of 2011. This setup allows us to investigate how the policy learned from a single weekday performs throughout the whole year. To assess the quality of our VFA policy, we compare the average profit and service level across the 51 samples against the averages provided by the benchmark policy. Figure 2 shows the total request count for each day. For the sake of fairness, the random processes associated with trip sampling, service class assignment, and fleet distribution are a function of the iteration number (i.e., the seed). This way, regardless of the configuration, we guarantee that the same information will be considered across all training iterations and testing instances.
Figure 2 Request count between 5 AM and 9 AM throughout all 52 Tuesdays of 2011 of the Manhattan taxi
demand dataset. VFAs are determined using only samples from the 15th Tuesday.
Figure 3 offers a close-up on the transportation demand of the 15th Tuesday, highlighting the morning peak from which we draw samples. The dashed lines at 4:30 AM and 10:00 AM delineate the full extent of our experiment. During the interval [4:30, 5:00), the fleet has a thirty-minute offset (30 periods) to rebalance in order to serve the future demand. Request batches arrive every other minute in the interval [5:00, 9:00) (240 periods), and vehicles have a termination offset [9:00, 10:00) (60 periods) to make sure all requests picked up around the end of the trip sampling threshold can be delivered. The rebalance offset and the lack of requests at the end of the trip sampling interval allow us to better assess the performance of our anticipatory rebalancing method. Regarding the computation time, the training and testing algorithms take on average five and two minutes, respectively, to process a single iteration comprised of 330 steps.
Figure 3 Demand pattern of Manhattan taxi trips on a typical Tuesday, 2011. At every ADP iteration, our simulation draws samples from the morning peak (the interval in red from 5 AM to 9 AM). The dashed lines at 4:30 AM and 10:00 AM mark the full length of the experiment. The interval [4:30, 5:00) is a rebalancing offset, whereas the interval [9:00, 10:00) is a termination offset. The former is laid out to provide extra time for vehicles to rebalance before any requests arrive, and the latter allows enough time to deliver all requests picked up during the sampling interval.
6.2. Model tuning
In this section, we motivate our algorithmic choices by showing their effectiveness experimentally. Firstly, in Section 6.2.1, we describe the baseline scenario we use throughout the tuning process. Next, in Section 6.2.2, we present the spatiotemporal hierarchical aggregation structure we use to approximate value functions. Section 6.2.3 highlights the effectiveness of our discount function when dealing with multiperiod travel times, and Section 6.2.4 shows how we set up our rebalancing strategy by combining short- and middle-range distance rebalancing targets. Section 6.2.5 describes the importance of setting a limit on the number of vehicles allowed in each node of the street network. Finally, Section 6.2.6 offers a sensitivity analysis on the maximum pickup times and base fare values.
Regarding the tuning of the ADP parameters, we have found that setting the stepsize ᾱ = 0.1 and the discount factor γ = 1 has led to superior performance experimentally for I = 500 iterations. Hence, we adopted these values across all considered scenarios.
Table 2 Parameters for the baseline scenario, featuring a fixed PAV fleet, homogeneous users, and no service-level penalties.
Problem characteristic (Attribute): Value(s)
Fleet size (|K|): 300 PAVs
Max. #vehicles/location (k^max_j): 5 (for all locations j ∈ N)
Base fare (p_base): $2.4
Distance fare (p_time): $1.0/km
Driving costs (c_time): $0.1/km
Pickup delay (w_pickup): 10 min
Number of locations (|N|): 6,430
Period length: 1 minute
Simulation length (T): 330 periods (morning peak): 30 rebalancing offset (30 min), 240 trip sampling (5 AM to 9 AM), 60 finalize-delivery offset (1 h)
Demand stochastic process (F^P): 10% of the real-world Manhattan taxi demand on the 15th Tuesday of 2011 (randomly sampled)
6.2.1. Baseline scenario Before we study the impact of FAV hiring and service classes, we tune our model using a baseline scenario that emulates a traditional MoD application with a fixed fleet size, homogeneous users (i.e., no service quality classes), and no service level penalties. This scenario features a fleet of 300 PAVs, which are randomly distributed throughout the street network at the beginning of each iteration. Every minute, we sample the corresponding request batch such that 10% of the requests are selected, totaling about 4,300 requests over all periods. We set the PAV fleet size experimentally, aiming to service the sampled demand only partially. We assume such an undersupplied scenario to guarantee that there are always unmet requests left to be addressed, eventually, by the freelance fleet. Additionally, following constraints (3), we assume there can be only five vehicles inbound to each location. Table 2 summarizes the baseline scenario parameters.
6.2.2. Hierarchical aggregation levels We determine our aggregation levels experimentally by analyzing the quality of the solutions provided by different schemes that combine both space and time. Figure 4 illustrates the underlying structure of our spatial aggregation configuration, showing to which regional center each location in N aggregates up. In Table 3, we show the decline in the attribute space size for each aggregation level. At the most disaggregated level (i.e., g = 0), we consider that both FAV and PAV value functions are indexed by time and location. FAVs, in particular, are also indexed by their remaining contract durations and station locations. Since considering fine-grained values for the FAV-only attributes could lead to an excessively large attribute space, we replace them with coarser substitutes. First, for the remaining contract
Figure 4 Regional center distribution on the Manhattan street network graph (node set N, 6,430 locations).
Labels RC5 (50 centers) and RC10 (21 centers) identify the regional centers determined using the
facility location problem formulation considering maximal service delays of 5 and 10 minutes,
respectively. Each location in N is connected to its respective regional center (white circle) by
a red line.
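The regional centers in Figure 4 are determined so that every network node can be reached from some center within a maximal service delay. The paper solves this optimally as a minimum set covering formulation; as a hedged illustration only, the classic greedy set-cover approximation on a toy travel-time matrix looks as follows (node ids and travel times are made up, and `greedy_regional_centers` is our name, not the paper's):

```python
# Illustrative sketch (not the paper's exact optimal formulation): pick a small
# set of regional centers such that every node lies within `max_delay` minutes
# of some center, using the greedy set-cover heuristic.

def greedy_regional_centers(travel_time, nodes, max_delay):
    """Greedily pick centers until all nodes are covered within max_delay."""
    # For each candidate center, the set of nodes it would cover.
    cover = {c: {n for n in nodes if travel_time[c][n] <= max_delay}
             for c in nodes}
    uncovered, centers = set(nodes), []
    while uncovered:
        # Take the center covering the most still-uncovered nodes.
        best = max(nodes, key=lambda c: len(cover[c] & uncovered))
        centers.append(best)
        uncovered -= cover[best]
    return centers

# Toy 4-node network: symmetric travel times in minutes.
tt = {
    0: {0: 0, 1: 3, 2: 9, 3: 9},
    1: {0: 3, 1: 0, 2: 9, 3: 9},
    2: {0: 9, 1: 9, 2: 0, 3: 4},
    3: {0: 9, 1: 9, 2: 4, 3: 0},
}
centers = greedy_regional_centers(tt, [0, 1, 2, 3], max_delay=5)
```

With a 5-minute maximal delay, two centers suffice for the toy network, mirroring how RC5 needs fewer centers than the one-minute variant on the real graph.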
durations, we assume values are discretized in hours. Assuming FAVs arrive in the system during
the 240 one-minute trip sampling intervals (see Table 2), the longest contract can last four hours.
Therefore, contract durations in intervals 1-60, 61-120, 121-180, and 181-240 aggregate up to
one, two, three, and four remaining hours, respectively. Second, we assume the station locations
aggregate up to one of the 21 ten-minute regional centers. At aggregation level 1, we aggregate
time up to three-minute intervals and locations to the closest five-minute regional center.
Additionally, we stop considering FAV-related attributes, therefore using only the spatiotemporal
information to index them. Finally, at aggregation level 2, we continue to aggregate time in three-
minute intervals and aggregate locations up to the coarser ten-minute regional centers. At this
level, we are left with 2,310 possible attributes for each fleet type, substantially improving
our ability to estimate values v̂_ta of states not yet visited.
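The hierarchy can be pictured as a function mapping a disaggregate vehicle state to its attribute at each level. The sketch below follows the discretization just described; the function name and the regional-center lookup tables (`rc5`, `rc10`) are illustrative stand-ins for the lookups produced by the facility location step, with made-up ids:

```python
# Hedged sketch of the hierarchical aggregation in Table 3: map a vehicle
# state to its attribute vector at levels g = 0, 1, 2.

def aggregate(level, t, location, contract_min=None, station=None,
              rc5=None, rc10=None):
    """Map a (time, location [, FAV attributes]) state to its level-g attribute."""
    if level == 0:
        attrs = (t, location)                     # 1-minute periods, exact node
        if contract_min is not None:              # FAV-only attributes
            hours_left = -(-contract_min // 60)   # ceil: 1-60 min -> 1 h, ...
            attrs += (hours_left, rc10[station])  # station -> 10-min center
        return attrs
    if level == 1:
        return (t // 3, rc5[location])            # 3-min periods, 5-min centers
    return (t // 3, rc10[location])               # level 2: 10-min centers

# Toy lookups: node 7 belongs to RC5 center 1 and RC10 center 0; station 42
# belongs to RC10 center 0. All ids are made up for illustration.
rc5, rc10 = {7: 1}, {7: 0, 42: 0}
fav_state = aggregate(0, 125, 7, contract_min=90, station=42, rc10=rc10)
```

At level 0 an FAV with 90 remaining contract minutes aggregates to two remaining hours, while at levels 1 and 2 the FAV-specific attributes are dropped entirely.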
We separate states by car type to emphasize that PAVs and FAVs are not interchangeable: FAVs
are expected to work in harmony with PAVs as a backup fleet. The marginal value of an FAV
at a certain time and location differs from its PAV counterpart, not only because it depends on
the contract duration and station location attributes, but mainly because FAV operations entail a
lower profit margin for the platform, which consequently leads to lower value functions.
6.2.3. Effectiveness of VFA discount function We assess whether our discount function is
able to produce high-quality value function approximations by disabling it and allowing vehicles
Table 3 Hierarchical aggregation levels. The symbol "-" indicates that the attribute is not considered.

g   #Periods          #Locations   #Contracts*       #Stations*   |T| × |A_FAV|   |T| × |A_PAV|
0   330 (t = 1 min)   6,430 (N)    4 (4 h / 60 min)  21 (RC10)    178,239,600     2,121,900
1   110 (t = 3 min)   50 (RC5)     -                 -            5,500           5,500
2   110 (t = 3 min)   21 (RC10)    -                 -            2,310           2,310
*Only considered for FAVs.
to rebalance to increasingly distant locations. We assume vehicles can rebalance to eight regional
centers determined using one-, five-, and ten-minute maximal service delays. Accordingly, we
label these experiments as 8xRC1, 8xRC5, and 8xRC10, and add an extra label [P] to indicate
the cases where we apply the discount function. Therefore, in all test cases featuring the label
[P], rebalancing leads to penalties proportional to the trip duration. We benchmark these results
against a simple rebalancing procedure where vehicles can only move to their adjacent neighbors.
Since traveling to these neighbors in the street network graph is guaranteed to take less than
thirty seconds (see Section 5.4.2), no multiperiod travel times are incurred. Figure 5 shows that,
for all three rebalancing strategies considered, applying the discount function leads to superior
results, with the 8xRC1[P] rebalancing configuration having the highest profits and percentage
of serviced users by the 500th iteration.
Figure 5 Performance comparison of rebalancing strategies when using the discount function (represented
by a label [P]).
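The discount function penalizes rebalancing proportionally to trip duration, so distant targets look less attractive than their raw value estimates suggest. A minimal sketch of this idea, with an assumed per-minute penalty rate and a toy VFA table (neither is the paper's calibrated value):

```python
# Hedged sketch of a duration-proportional rebalancing discount: the value of
# sending a vehicle toward target j is the (time-shifted) VFA at j minus a
# per-minute penalty over the trip duration tau.

def discounted_rebalance_value(vfa, t, target, tau, penalty_per_min=0.1):
    """Value of rebalancing at period t toward `target`, arriving at t + tau."""
    return vfa[(t + tau, target)] - penalty_per_min * tau

# Toy VFA table indexed by (arrival period, location).
vfa = {(10, "near"): 5.0, (18, "far"): 5.5}
near = discounted_rebalance_value(vfa, 8, "near", tau=2)   # 5.0 - 0.2
far = discounted_rebalance_value(vfa, 8, "far", tau=10)    # 5.5 - 1.0
```

Even though the far target has the higher raw estimate, the discount makes the closer target preferable, which is consistent with the superior behavior of the [P] configurations in Figure 5.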
6.2.4. Rebalancing configurations We study rebalancing configurations in which vehicles
can move to a subset of increasingly distant regional centers besides their adjacent neighboring
locations. Notably, in Figure 5, the 8xRC10[P] configuration achieves high-quality results at the
beginning of the simulation (first 200 iterations) but is later surpassed by the 8xRC1[P]
configuration. The performance of the long-distance configuration (8×RC10[P]) is inferior to its
counterparts because rebalancing to farther ten-minute region centers prevents vehicles from
consistently measuring a greater range of states, although it allows them to escape from
low-demand areas faster initially. Therefore, rebalancing to one-minute region centers offers a
more balanced trade-off between exploration and exploitation, since vehicles can visit more
locations (|RC1| = 758, about 12% of node set N) and bypass the intricate complexity of
real-world street networks.
In order to assess whether we could benefit from combining the short-distance 8xRC1
configuration with medium- and long-distance rebalancing movements, we created the following
configurations:
- Short + Medium (8×RC1 + 4×RC5): Rebalance to eight one-minute region centers and four
five-minute region centers.
- Short + Long (8×RC1 + 4×RC10): Rebalance to eight one-minute region centers and four
ten-minute region centers.
- Short + Medium + Long (8×RC1 + 4×RC5 + 2×RC10): Rebalance to eight one-minute
region centers, four five-minute region centers, and two ten-minute region centers.
Figure 6 shows that configuration 8×RC1 + 4×RC5 (i.e., adding four five-minute region
centers to the rebalancing pool of eight one-minute region centers) initially results in higher
performance than the 8xRC1 configuration, while having comparable convergence behavior after the
400th iteration. However, since the rebalancing configurations perform similarly at the end of the
training iterations, we select 8×RC1 as the default rebalancing configuration. We do so mainly
because this configuration requires less processing time than its counterparts, since fewer targets
are considered.
Figure 6 Performance comparison of rebalancing strategies combining short-, medium-, and long-distance
movements.
6.2.5. Setting the maximum number of vehicles per location Our baseline scenario considers
that no more than five vehicles can be inbound to any location, according to constraints (3).
To investigate how much these constraints contribute to generating high-quality value function
approximations, we run the test cases 8xRC1 and 8xRC1+4xRC5, allowing an unlimited number of
vehicles to move to each location. Figure 7 shows that, besides reaching subpar performance
initially, disabling the maximum number of vehicles/location constraints is a great source of
instability as the experiment progresses. Vehicles end up rebalancing in droves to locations
associated with high value function approximations, producing a rather unrealistic scenario in our
problem setting, where locations correspond to GPS coordinates on a Manhattan street segment.
Figure 8 shows that allowing up to five vehicles to be inbound to each location achieves the
best performance for our baseline scenario and rebalancing configuration 8xRC1+4xRC5.
Figure 7 Effect of allowing an unlimited number of vehicles at each node for rebalancing strategies 8xRC1
and 8xRC1+4xRC5.
6.2.6. Base fare values and service levels We also analyze the impact of base fare values and
pickup delays on the overall performance. While base fare values directly influence the scale
of value functions, the maximum pickup delays limit the matching radius of vehicles. Table 4
presents the average results of our testing instances considering our baseline scenario under nine
different combinations of maximum pickup delays and base fare values. Apart from the average
number of requests serviced, pickup delay, and the objective function, it also shows the average
trip distance of both serviced and rejected users as well as the share of the fleet total time spent
parked, rebalancing, picking up, and carrying users. Since we consider 330 periods and a 300-
vehicle fleet, this total fleet time corresponds to 99,000 vehicle-periods (300×330).
Figure 8 Performance of rebalancing configuration 8xRC1+4xRC5 when allowing at most two, five, and
ten vehicles to be inbound to each location.
Increasing base fares makes the contribution accumulated via distance rates increasingly
irrelevant, as indicated by the increment in the average trip distance of rejected requests.
Therefore, adopting high base fares creates a bias towards short-duration trips, as indicated by
the decrease of both the share of time spent picking up users and their average trip distances.
Accordingly, vehicle rebalancing also rises, since vehicles tend to return more frequently to
high-demand areas. As for the influence of higher maximum pickup delays, increasing delays from
five to ten minutes can result in about a 10% increase in the number of requests serviced. Such an
increase, however, is moderate when we contrast ten- and fifteen-minute delays (about 2 percentage
points). This result suggests that, for the fleet size we have set, it is unlikely that increasing
pickup delays even further will lead to more pickups. Since we consider that the decision to pick
up or reject a user is taken within a single period, eventually there are not enough vehicles to
fulfill the demand, regardless of how long users are willing to wait.
Table 4 Impact of maximum pickup delays and base fare values on the solution quality considering the
baseline scenario. Each value corresponds to an average of the results achieved for the 51 testing instances.

Delay  Fare  Requests  Pk. delay  Objective  Trip distance (km)   Fleet total time / Status
(min)  ($)   serviced  (min)      ($)        Serviced  Rejected   Rebalancing  Picking up  Carrying  Parked
5      2.4   76.88%    2.49       15,305     2.95      3.32       10.36%       7.92%       28.11%    53.60%
       4.8   75.49%    2.50       22,200     2.88      3.51       11.73%       7.82%       26.94%    53.51%
       9.6   78.19%    2.49       38,190     2.81      3.83       12.91%       8.07%       27.29%    51.73%
10     2.4   85.74%    3.72       16,834     2.89      3.94       8.64%        13.17%      30.73%    47.47%
       4.8   87.77%    3.72       25,540     2.80      4.77       10.31%       13.52%      30.48%    45.69%
       9.6   89.34%    3.86       43,531     2.77      5.34       8.83%        14.29%      30.69%    46.19%
15     2.4   87.77%    4.35       17,335     2.93      3.80       7.98%        15.75%      31.94%    44.33%
       4.8   89.35%    4.40       26,095     2.83      4.81       8.43%        16.24%      31.40%    43.94%
       9.6   89.71%    4.47       43,594     2.77      5.46       9.88%        16.57%      30.79%    42.76%
6.3. Platform fleet management
In this section, we illustrate the behavior of our πVFA policy. Besides the baseline parameters
described in Table 2, we consider the best tuning settings achieved in Section 6.2, namely, the
three hierarchical aggregation levels presented in Table 3, the five-vehicle limit per location, and
the rebalancing strategy 8×RC1. Figure 9 and Figure 10 compare the performance of the proposed
VFA policy against the myopic policy on a single testing instance. Since the myopic policy reacts
to request rejections, from Figure 9 we can see that the fleet can fulfill the demand entirely until
about 6:45 AM, when the first rebalancing movement appears. In contrast, under our VFA policy,
most vehicles are rebalancing before 6:30 AM. As can be seen in Figure 10, πmyopic rejects
fewer users than πVFA until about 7:30 AM, but from this time on, πVFA outperforms the
myopic approach, ultimately resulting in about 14% more users serviced. The difference between
the policies is further highlighted in the busiest period (from 8 AM to 9 AM). The myopic policy
reacts immediately to the demand peak, picking up as many users as possible, disregarding the
future outcome of these decisions. In contrast, under the VFA policy, accounting for post-decision
outcomes makes many vehicles stay parked or rebalance, which may result in some rejections
initially, but leads to higher service rates in the long run.
Figure 11 further illustrates where vehicles are likely to move to, based on the magnitude of the
value functions exploited by πVFA. For each location in N, we average the estimates across thirty-
minute intervals from 6 AM to 9 AM. As can be seen from Figure 11, having more vehicles in the
middle section of Manhattan between 6:30 AM and 7:30 AM is prone to lead to higher contributions.
This period is consistent with the predominance of rebalancing operations shown in Figure
9. After 7:30 AM, VFAs get lower and lower, although demand is at its highest. As demonstrated by
Al-Kanj, Nascimento, and Powell (2020) for a similar AMoD setting, value functions monotonically
decrease with time, since they reflect the expected revenue vehicles can accrue until the end
of the time horizon. Hence, as time goes on, vehicles have less time to pick up users and make
profits.
6.4. Enforcing service level contracts
In this section, we build upon the baseline scenario such that service level violation penalties are
taken into consideration. We show how the penalization mechanisms, namely, the tolerance delays
and rejection penalties, can lead to a higher service rate while compensating users who have
had their service levels violated. First, regarding the service level preferences, we assume first-
class users (SQ1) can wait at most five minutes to be picked up, whereas second-class users (SQ2)
can wait at most ten minutes. Proportionally, we assume that the base fare of SQ1 users is twice
the SQ2 base fare, such that p_base^SQ1 = $4.8 and p_base^SQ2 = $2.4. One should notice that the
parameters defined for SQ2 users coincide with those used in the tuning.
Figure 9 Comparison of the number of PAVs by state (parked, rebalancing, picking up, and carrying
passengers) for each policy (πVFA and πmyopic) on a single testing instance.
Figure 10 Comparison of the cumulative number of requests serviced throughout all time steps for each
policy (πVFA and πmyopic) on a single testing instance. The VFA policy leads to an 89.18% service
rate, whereas the myopic policy reaches a 78.17% service rate.
As for the penalty parameters, we consider five-minute tolerance delays for both classes and
rejection penalties ρ ∈ {0, 1, 2}. For ρ = 0, we have a scheme where only delay penalties are
incurred, whereas, for ρ ∈ {1, 2}, rejection penalties are equivalent to one and two times the
user base fares. Finally, we also analyze the impact of these penalties when servicing users from
SQ1 and SQ2, both separately (scenarios A1 and A2) and combined (scenario A3). In A3, the service
class distribution follows a stochastic process where first-class user locations and request times
coincide
Figure 11 Average value function approximations (VFAs) for each location in the street network graph
across sequential thirty-minute intervals (color scale: low to high VFA). Value functions are the
highest in the middle section of Manhattan from 6 AM to 7:30 AM, shortly before the demand
reaches its peak.
with the 20% most generous tippers (among tipping users) of the Manhattan demand occurring
between 5 AM and 9 AM. To create this distribution, we first aggregate all requests from the taxi
demand considered (all 2011 Tuesdays) according to their location and placement time (within
five-minute bins). Next, we assign first-class labels to all requests whose tip/fare ratio ranks
over the 80th percentile, which is around 0.26. Then, we determine, for each location and time bin
pair, the ratio of first-class requests, which we consider as the probability of first-class users
appearing.
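The construction above can be sketched in a few lines. The function name, record layout, and toy demand below are ours, for illustration only; the real computation runs over all 2011 Tuesday taxi trips:

```python
# Hedged sketch of the SQ1-probability construction: aggregate requests by
# (location, 5-minute bin), label as first class those whose tip/fare ratio
# exceeds the 80th percentile, and use the per-bin share of first-class
# requests as the probability of an SQ1 user appearing there.
from collections import defaultdict

def sq1_probabilities(requests, percentile=0.80):
    ratios = sorted(r["tip"] / r["fare"] for r in requests)
    threshold = ratios[int(percentile * len(ratios))]  # simple percentile cut
    total, first = defaultdict(int), defaultdict(int)
    for r in requests:
        key = (r["loc"], r["t"] // 5)                  # five-minute bins
        total[key] += 1
        if r["tip"] / r["fare"] > threshold:
            first[key] += 1
    return {k: first[k] / total[k] for k in total}

# Toy demand: ten requests at one location, tips growing with request time,
# so only the most generous tipper (in the second bin) is labeled SQ1.
reqs = [{"loc": "A", "t": t, "tip": t, "fare": 10} for t in range(10)]
probs = sq1_probabilities(reqs)
```

During simulation, each sampled request at a given location and time bin would then be tagged SQ1 with the corresponding probability.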
Table 5 summarizes the parameters that we use to build upon the PAV baseline scenario to
assess the impact of enforcing service level contracts.
Table 5 Summary of the parameters used to enforce service level contracts on different user bases.

Problem characteristic (symbol)           Value(s)
Classes (C)                               {SQ1, SQ2}
Max. pickup delay class c (w_pickup^c)    SQ1 = 5 min / SQ2 = 10 min
Waiting tolerance class c (w_tolerance^c) SQ1 = 5 min / SQ2 = 5 min
Penalty factor (ρ)                        {0, 1, 2}
Base fare (p_base^c)                      SQ1 = $4.8 / SQ2 = $2.4
User base                                 Scenarios:
                                          [A1] Only SQ1 users
                                          [A2] Only SQ2 users
                                          [A3] The 20% highest tippers are SQ1
6.4.1. Sensitivity analysis of penalization schemes In Table 6, we show, for the homogeneous
user base scenarios A1 and A2, to what extent manipulating the penalization scheme alters the
average performance of the fleet from both user and platform perspectives. For comparison, in
the top row of each user base, we place the results for instances similar to those presented in
Table 4, in which neither delay nor rejection penalties are applied.
Although, in practice, the same maximum delays are considered (i.e., ten and fifteen minutes),
applying tolerance delays alone (i.e., ρ = 0) leads to faster pickups for both SQ1 and SQ2 classes.
Since any time spent within the tolerance delay offsets the base fare values, the πVFA policy ends
up incorporating a greater sense of urgency. From the perspective of the provider, such a penalty
mechanism enables improved user service levels at the expense of slightly lower total
contributions. This trade-off is more prominent for user base A1, in which pickup delays decreased
by 41 seconds while the share of serviced requests increased by 0.25 percentage points, at the
expense of $1,287 lower profits. Moreover, a close analysis of the fleet total time indicates that
the tolerance delays remarkably impact the fleet management strategy to service A2, since vehicles
spend more time rebalancing and less time parked. These relations suggest that tolerance delays
help to achieve more accurate VFAs, which adequately and quickly drive vehicles to the most
promising areas.
Table 6 Sensitivity analysis on penalization schemes. Top lines feature results for comparable
configurations where no penalties are applied (see Table 4). Performance markers consist of the average
results achieved by applying our πVFA policy on the 51 testing instances.

Fare  Max. delay (min)   Rej.  Requests  Pk. delay  Objective  Trip distance (km)   Fleet total time / Status
($)   Pickup  Tolerance  (ρ)   serviced  (min)      ($)        Serviced  Rejected   Rebalancing  Picking up  Carrying  Parked
No penalties
4.8   10      -          -     87.77%    3.72       25,540     2.80      4.77       10.31%       13.52%      30.48%    45.69%
Delay and rejection penalties (user base A1)
4.8   5       5          0     88.02%    3.31       24,253     2.85      4.39       6.31%        12.05%      31.17%    50.47%
                         1     88.94%    3.48       21,584     2.79      5.05       6.43%        12.83%      30.81%    49.93%
                         2     88.91%    3.56       19,009     2.77      5.23       8.10%        13.11%      30.53%    48.26%
No penalties
2.4   15      -          -     87.77%    4.35       17,335     2.93      3.80       7.98%        15.75%      31.94%    44.33%
Delay and rejection penalties (user base A2)
2.4   10      5          0     87.76%    4.32       16,956     2.94      3.77       9.07%        15.65%      31.96%    43.32%
                         1     89.13%    4.40       15,776     2.83      4.74       8.61%        16.19%      31.34%    43.85%
                         2     89.25%    4.50       14,489     2.80      5.10       10.28%       16.57%      30.95%    42.20%
However, adopting tolerance delays alone only improves the ride experience of serviced users,
compensating them according to the inconvenience inflicted. A true commitment to SLCs also has
to adequately compensate those who have been through the greatest possible inconvenience,
namely, service rejection. By making up for rejections, platforms can improve customer loyalty,
since users can trust that the transportation provider genuinely strives to keep consistent service
quality, to the point of having "skin in the game" (i.e., risking company profits). Our results show
that, besides providing such a guarantee, the application of rejection penalties can also increase
the number of requests serviced, with vehicles spending more time rebalancing and less time
parked. High penalty factors, however, create a rejection bias against long-distance requests (see
the increase in the mean trip distance associated with rejections). Conversely, the trip distance of
serviced requests decreases, indicating that the fleet management strategy consists of fulfilling
short trips and quickly rebalancing back to high-demand areas.
Ultimately, our findings suggest that both measures are effective in improving service quality,
such that we incorporate them in our standard setup. Hence, we adopt the five-minute tolerance
delays and rejection penalties equivalent to the base fare (i.e., ρ = 1), since these offer a more
balanced trade-off regarding users' trip distances. To illustrate how this scheme works in the
current transportation setting, in the following, we exemplify how the service provision unfolds
for a regular SQ1 user. First, in case the request cannot be fulfilled, the platform warns the user
(within one minute) and immediately pays them a rejection penalty equivalent to the base
fare. Otherwise, when the user can be serviced, a vehicle takes on average 3.48 min to pick them
up. When the waiting time surpasses the five-minute threshold, a fraction of the base fare is
discounted from the user's total trip cost, proportional to the waiting time within the tolerance
interval.
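The compensation rules just described can be sketched as a small function. The function and argument names are ours, and the pro-rating of the delay discount over the tolerance interval is our reading of the scheme:

```python
# Hedged sketch of the SLC compensation scheme: a rejected user receives
# rho * base_fare; a serviced user whose pickup wait exceeds the contracted
# threshold receives a fraction of the base fare, proportional to the waiting
# time spent inside the tolerance interval (capped at the full base fare).

def slc_compensation(base_fare, delay_threshold, tolerance, rho, wait=None):
    """Amount returned to the user (rejection penalty or delay discount)."""
    if wait is None:                                   # request rejected
        return rho * base_fare
    over = min(max(wait - delay_threshold, 0.0), tolerance)
    return base_fare * over / tolerance                # pro-rated discount

# SQ1 under the chosen scheme: $4.8 base fare, 5-min threshold,
# 5-min tolerance, rho = 1.
rejected = slc_compensation(4.8, 5, 5, 1)              # full base fare back
on_time = slc_compensation(4.8, 5, 5, 1, wait=3.0)     # no discount
late = slc_compensation(4.8, 5, 5, 1, wait=7.5)        # half the tolerance used
```

A 7.5-minute wait thus refunds half the SQ1 base fare, and a wait at or beyond ten minutes would refund it entirely, matching the ten-minute effective maximum delay in Table 6.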
6.5. Vehicle productivity and fleet size
Although we have demonstrated that our penalization scheme can improve PAV-fleet productivity
and user service levels, Table 6 shows that the platform still cannot service about 10% of the
users. Our results indicate that this inability to cover the demand entirely is due to insufficient
vehicle supply. As can be seen in Figure 9, under our VFA policy, most vehicles are busy (i.e.,
rebalancing, picking up, or carrying users) during the demand peak. When rejections start to
accumulate from about 6:30 AM on (see Figure 10), we can see that the number of parked vehicles
drops dramatically, especially under the myopic policy. In such a scarcity scenario, vehicles tend
to reject users whose trips are not economically efficient. Typically, a vehicle is better off
parking in high-demand areas than traveling to pick up users in low-demand areas associated with
unpromising future returns. This fleet management strategy can also be seen during the busiest
period in Figure 9, which features two "idleness peaks" (at around 7:15 and 8:30) where about
fifty AVs are parked, waiting for future requests.
6.6. Freelance fleet management
In this section, we show how a third-party-owned fleet of FAVs can complement the PAV fleet to
improve user service levels. First, we describe how we model the uncertainty associated with the
freelance fleet availability (Section 6.6.1) and then we assess the outcome of hiring FAVs (Section
6.6.2).
6.6.1. Modeling FAV availability We assume both announcement times and contract durations
are drawn from a truncated normal distribution ψ(µ, σ, a, b; x), where µ and σ are the mean
and standard deviation of the normal distribution, whereas a and b specify the truncation
interval. Since our study draws on Manhattan's demand, we also harness the daily commuting
patterns of the island to establish realistic announcement times. We consider that FAVs arrive
between 5 AM and 9 AM, reaching an arrival peak at 8 AM. This arrival pattern is adapted from the
time workers leave home to go to work in Manhattan (see Table 7), where most departures (54.60%)
occur between 7 AM and 9 AM.
Table 7 Time leaving home to go to work in Manhattan (U.S. Census Bureau 2015).
Time leaving home Workers
12:00 AM to 4:59 AM 1.10%
5:00 AM to 5:29 AM 1.40%
5:30 AM to 5:59 AM 1.10%
6:00 AM to 6:29 AM 4.10%
6:30 AM to 6:59 AM 4.60%
7:00 AM to 7:29 AM 9.70%
7:30 AM to 7:59 AM 10.20%
8:00 AM to 8:29 AM 20.10%
8:30 AM to 8:59 AM 14.60%
9:00 AM to 11:59 AM 33.00%
Regarding the contract durations, we investigate two scenarios. First, in scenario D1, vehicles
are available until the end of the trip sampling interval at 9 AM. Second, in scenario D2,
contracts can last from 1 h to 4 h (viz., the trip sampling interval) and most FAVs are made
available for 2 h, resulting in the distribution ψ(2h, 1h, 1h, 4h; x). We generate these contract
durations in tandem with announcement times, adjusting durations that surpass the maximum
simulation time when added to their announcement times. For this reason, contracts in the range
[1h, 1.5h] become more common, since FAVs arriving after 8:30 AM have maximum contract durations
of 1.5 hours.
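The D2 sampling process can be sketched as follows. We measure time in minutes from 5 AM and use rejection sampling as a stand-in for a proper truncated-normal sampler; the minute-300 cap on contract ends is inferred from the statement that FAVs arriving after 8:30 AM get at most 1.5-hour contracts, so all names and the cap are our assumptions:

```python
# Hedged sketch of FAV availability sampling: announcement times from
# psi(8AM, 1h, 5AM, 9AM) and D2 contract durations from psi(2h, 1h, 1h, 4h),
# with durations clipped so that contracts end by the assumed horizon.
import random

def truncnorm(mu, sigma, a, b, rng):
    """psi(mu, sigma, a, b; x) via rejection sampling on the [a, b] window."""
    while True:
        x = rng.gauss(mu, sigma)
        if a <= x <= b:
            return x

def sample_fav(rng, latest_end=300):
    """Announcement time and D2 contract duration, in minutes from 5 AM."""
    announce = truncnorm(180, 60, 0, 240, rng)    # arrival peak at 8 AM
    duration = truncnorm(120, 60, 60, 240, rng)   # 2 h mean, 1 h - 4 h range
    return announce, min(duration, latest_end - announce)  # clip to horizon

rng = random.Random(42)
samples = [sample_fav(rng) for _ in range(500)]
```

Under this clipping, an FAV announced at minute 210 (8:30 AM) can be contracted for at most 90 minutes, which is exactly why short contracts become more common late in the horizon.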
Regarding the spatial distribution of these vehicles over the map, we investigate two deployment
scenarios with increasingly larger station sets O ⊆ N:
- Clustered [C] – Stations are drawn from 1% of distinct randomly chosen locations (|O| = 64).
In this scenario, AVs cruise to park in a small set of parking lots (e.g., due to incentives or
city regulations).
- Scattered [S] – Stations are drawn from all available locations (|O| = 6,430). This scenario
simulates the behavior of AVs that park near their owners' locations.
Table 8 Summary of the parameters for on-demand hiring.

Problem characteristic (symbol)        Value(s)
Profit margin (β)                      100% (PAVs) and 30% (FAVs)
Fleet size (|K|)                       300 PAVs + 200 FAVs
Number of stations (|O|)               Distribution scenarios:
                                       [C] Clustered - 64 (0.01 × |N|)
                                       [S] Scattered - 6,430 (1.00 × |N|)
FAV hiring stochastic process (F^O)    Station: chosen at random from O
                                       #Vehicles/station: random
                                       Announcement time: ψ(8AM, 1h, 5AM, 9AM; x)
                                       Contract duration scenarios:
                                       [D1] from announcement time until 9 AM
                                       [D2] ψ(2h, 1h, 1h, 4h; x)
We assume that, across all iterations, the station location set O remains stable for all
deployment scenarios. Thus, under scenario C, for instance, FAVs always start from the same set
of 64 nodes. Table 8 summarizes the parameters governing an operational scenario in which the
fleet is comprised of PAVs and FAVs. This scenario extends our baseline scenario by allowing an
extra 200 FAVs into the platform, distributed according to the availability settings mentioned
earlier.
6.6.2. Improving service quality with on-demand hiring In this section, we offer different
perspectives on the results achieved when FAVs, which are available according to the parameters
described in Table 8, join the PAV fleet to uphold user SLCs. Table 9 and Table 10 present an
average performance comparison between the VFA and myopic policies on the testing data set
for user base A3. Table 9 shows the influence of each FAV availability scenario (i.e., contract
duration and station distribution combination) on the mean objective function, percentage of
requests serviced, and pickup delay. Table 10 presents the fleet utilization breakdown, that is,
the percentage of the total fleet time spent in each vehicle status.
For the sake of comparison, the bottom line of each policy in both tables presents the results
achieved when hiring is not considered. As can be seen from Table 9, in the no-hiring scenario
the VFA policy can service about 18% more requests than the myopic policy, besides providing
lower pickup delays, especially for the SQ2 class. Once hiring is enabled, over 90% of requests
are picked up regardless of the policy across all scenarios. However, substantial differences can
be seen between the policies when different contract durations are considered. On average, we have
found that D1 contracts allow a surplus of about 6,000 more minutes of total fleet time than D2.
This extra time reflects positively on the platform profits and on the number of requests serviced.
While the average difference across station distributions between D1 and D2 contract durations is
about 4 percentage points under the myopic policy, this difference is less than 0.5 percentage
points under the VFA policy. The same pattern can be seen in the difference between SQ2 user pickup
Table 9 Comparison of the average objective function, number of requests serviced, and pickup delays
between VFA and myopic policies on all FAV availability scenarios.

Policy   Contract   Station  Objective  Requests  Pk. delay (min)
         duration   distr.   func. ($)  serviced  SQ1    SQ2
Myopic   D1         C        18,267     96.73%    3.0    4.6
                    S        18,306     96.63%    3.0    4.7
         D2         C        17,628     92.66%    3.1    5.0
                    S        17,587     92.16%    3.1    5.0
         No hiring           15,273     75.32%    3.2    5.0
VFA      D1         C        18,869     98.80%    3.1    4.7
                    S        18,986     98.90%    3.0    4.7
         D2         C        18,811     98.45%    3.1    4.7
                    S        18,809     98.43%    3.0    4.6
         No hiring           17,442     89.25%    3.1    4.4
Table 10 Comparison of the average fleet total time per status (PAV / FAV) across all FAV availability
scenarios considering the VFA and myopic policies.

Policy   Contract   Station  Rebalancing      Picking up       Carrying         Parked           Returning
         duration   distr.   PAV     FAV      PAV     FAV      PAV     FAV      PAV     FAV      PAV   FAV
Myopic   D1         C        0.49%   0.65%    13.56%  14.79%   31.39%  22.36%   54.57%  54.24%   -     7.96%
                    S        0.52%   0.88%    13.60%  14.76%   31.46%  21.88%   54.42%  54.23%   -     8.26%
         D2         C        0.54%   1.10%    13.64%  21.89%   31.43%  26.06%   54.39%  38.91%   -     12.05%
                    S        0.57%   1.65%    13.71%  21.69%   31.49%  25.38%   54.24%  38.29%   -     12.98%
         No hiring           0.75%   -        14.17%  -        31.82%  -        53.25%  -        -     -
VFA      D1         C        8.14%   1.30%    15.08%  10.23%   31.53%  20.01%   45.25%  60.38%   -     8.08%
                    S        7.34%   1.23%    14.90%  10.29%   31.61%  19.98%   46.15%  59.73%   -     8.76%
         D2         C        8.08%   1.31%    14.98%  13.10%   31.50%  24.08%   45.43%  51.15%   -     10.35%
                    S        8.09%   1.12%    14.82%  12.48%   31.40%  24.23%   45.68%  50.88%   -     11.29%
         No hiring           8.57%   -        15.25%  -        31.28%  -        44.90%  -        -     -
delays, which differ much more dramatically across the contract duration scenarios under the
myopic policy. Hence, by better managing both vehicle types, the VFA policy can sustain high
service levels for all user bases, even under a more restricted FAV availability. From a different
perspective, FAV owners wanting to improve the odds of renting out their vehicles have to set up
service availability adequately, such that the platform has enough time to rebalance and return
these vehicles.
Moreover, as confirmed by the total fleet time breakdown in Table 10, FAVs tend to stay idler
under the proposed VFA policy. Although not highlighted by the objective functions due to our
low-cost setup, this characteristic is crucial for providers, especially in light of vehicle
automation, when induced demand due to ease of use may play a significant role. City managers
are increasingly concerned about traffic, and proposals for imposing congestion charges abound.
Therefore, a platform owner is generally better off using fewer vehicles, especially FAVs, which
need to spend extra time returning to their origin stations. Ultimately, the proposed VFA policy
can find a compromise between service levels and vehicle activity, prioritizing its own fleet over
outside hires to address requests.
Figure 12 Number of vehicles per status (parked, rebalancing, picking up, carrying, and returning to
station) by one-minute step, separated by fleet type (300 PAVs and 200 FAVs, contract scenario D2,
station scenario S), for a single testing instance. The total number of PAVs is constant throughout
the whole time horizon, whereas the number of FAVs varies according to a stochastic process.
Figure 12 further illustrates the impact of including 200 FAVs to service user base A3 on a single
testing instance. FAVs arrive according to the stochastic process F^O, assuming contract duration
scenario D2 and station distribution scenario S. It can be seen that the vehicle/status
distribution still resembles the results achieved by a PAV-only fleet (see Figure 9), showing that
the inclusion of FAVs does not disrupt the PAV-fleet operation significantly. Since we assume that
70% of the profits accrued by FAVs belong to their owners, using FAVs returns fewer profits to the
platform while inflicting similar operational costs. That is the main reason why the service level
improvement from hiring vehicles (about 10% for the VFA policy) does not translate proportionally
into profits. However, maintaining high service levels results in increased customer satisfaction,
which may improve the platform's reputation and generate a higher turnover in the long run.
7. Conclusions
Mobility-on-demand services can only challenge self-owned mobility products if they can offer competitive service quality. Service quality is based on two core elements, namely, maintaining personalized service levels and making up for inconveniences (i.e., service level violations). In this paper, we propose a solution to control service quality on an operational level using a learning-based optimization approach. We introduce a model for a dynamic and stochastic dial-a-ride problem arising on an AMoD platform that hires idle AVs to maintain consistent user service levels. Developing an approximate dynamic programming algorithm, we iteratively improve a policy to dispatch and rebalance both platform- and third-party vehicles on a real-world street network of Manhattan. The proposed policy deals with two seldom considered sources of uncertainty, namely, (i) the spatiotemporal distribution of user service level preferences and (ii) the availability of third-party vehicles. While (i) allows providers to better address heterogeneous user expectations by rebalancing more vehicles to areas featuring highly demanding users, (ii) enables the learning of routing policies that take into account when, where, and how many third-party vehicles are expected to appear throughout the planning horizon.
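The core of such a learning scheme can be pictured as follows: after each assignment problem is solved, the dual price of a vehicle-availability constraint serves as an observation of a vehicle's marginal value at a given time and location, and is smoothed into the current estimate. The sketch below is a minimal illustration of this update only; the state keys are illustrative and the fixed stepsize stands in for the adaptive stepsize rule an actual implementation would use.

```python
def update_value_estimates(values, duals, alpha=0.1):
    """Blend newly observed duals into the value function approximation.

    values: dict mapping a post-decision state, e.g. (time, location),
            to its current value estimate
    duals:  dict mapping the same states to the dual prices observed
            after solving the current assignment problem
    alpha:  smoothing stepsize in (0, 1]
    """
    for state, dual in duals.items():
        old = values.get(state, 0.0)
        values[state] = (1 - alpha) * old + alpha * dual
    return values


# One smoothing step: a fresh state (estimate 0.0) observes a dual of 5.0.
vfa = update_value_estimates({}, {(10, 42): 5.0}, alpha=0.5)
# vfa[(10, 42)] is now (1 - 0.5) * 0.0 + 0.5 * 5.0 = 2.5
```

Repeating this update over many sampled demand realizations is what lets the estimates converge toward the marginal value of an additional vehicle at each time and location.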
The proposed approach improves service quality for the ridesharing platform's customers in multiple ways. First, penalizing both excessive delays and rejections following SLCs is shown to be an effective measure to increase the number of requests serviced. Second, the policy learned by sampling the demand from a particular weekday was shown to be generic enough to adequately address the demand patterns of all similar weekdays throughout a whole year. Without any hiring, such a policy consistently outperforms a reactive optimization policy, servicing on average about 18% more requests. Moreover, although both policies manage to service most requests when hiring is considered, the proposed policy has been shown to do so more efficiently, using fewer FAVs and providing better service levels.
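To make the SLC penalty structure concrete, a contract for a user class can combine a delay tolerance, a per-minute penalty beyond it, and a flat rejection penalty. The function below is an illustrative sketch; the parameter names and values are assumptions, not the paper's calibration.

```python
def request_cost(delay_min, serviced, slc):
    """Penalty incurred for one request under its class's contract.

    slc: dict with 'max_delay' (minutes tolerated), 'delay_penalty'
    (cost per minute beyond the tolerance), and 'rejection_penalty'
    (flat cost if the request is not serviced).
    """
    if not serviced:
        return slc["rejection_penalty"]
    excess = max(0.0, delay_min - slc["max_delay"])
    return excess * slc["delay_penalty"]


# An illustrative premium-class contract: 5 min tolerance, steep penalties.
first_class = {"max_delay": 5, "delay_penalty": 2.0, "rejection_penalty": 50.0}
cost_late = request_cost(9, True, first_class)       # 4 extra minutes -> 8.0
cost_rejected = request_cost(0, False, first_class)  # rejection -> 50.0
```

Because the rejection penalty dominates the delay penalty, an optimizer minimizing these costs prefers servicing a premium request late over rejecting it, which is the behavior the SLCs are meant to induce.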
We conduct experiments on historical Manhattan taxi demand considering a variety of fleet and demand configuration scenarios. Using a baseline scenario featuring only PAVs and homogeneous users, we define a hierarchical aggregation structure to approximate the value functions of unvisited states. Besides time and space, the proposed layers also consider FAV-specific characteristics, such as contract duration and home station location. In particular, the spatial hierarchical aggregation structure improves on existing configurations in which locations are aggregated into ad-hoc regions. We propose a minimum set covering formulation to optimally determine regions whose nodes can be accessed from a regional center within a maximal time limit. This formulation offers a more robust and versatile approach to hierarchical spatial aggregation since it automatically captures the peculiarities of any transportation network.
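A simple way to picture the regional-center selection is as a set covering instance: each candidate center covers the nodes reachable from it within the time limit, and the goal is to cover all nodes with as few centers as possible. The sketch below uses a greedy heuristic purely for illustration (the paper solves the covering model to optimality), with made-up node identifiers and reachability sets.

```python
def greedy_region_centers(reach, nodes):
    """Greedy heuristic for the minimum set covering model.

    reach: dict mapping each candidate center to the set of nodes
           reachable from it within the maximal time limit.
    Returns a list of centers such that every node is covered.
    """
    uncovered = set(nodes)
    centers = []
    while uncovered:
        # Pick the center covering the most still-uncovered nodes.
        best = max(reach, key=lambda c: len(reach[c] & uncovered))
        if not reach[best] & uncovered:
            raise ValueError("some nodes are unreachable within the limit")
        centers.append(best)
        uncovered -= reach[best]
    return centers


# Made-up reachability sets for four intersections and three candidates:
reach = {"a": {"a", "b"}, "b": {"a", "b", "c"}, "c": {"c", "d"}}
centers = greedy_region_centers(reach, ["a", "b", "c", "d"])  # ["b", "c"]
```

Tightening or loosening the time limit changes the reachability sets and hence the granularity of the resulting aggregation layer, which is how a hierarchy of progressively coarser spatial levels can be built.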
Optimal regional centers are also used to set up several rebalancing strategies, in which vehicles can move to a subset of neighboring centers, determined through different maximal time limits. The obtained results show that rebalancing to short-range regional centers allows vehicles to incrementally escape from perpetually low-demand areas, besides offering a good compromise regarding computational time. Since we adopt short intervals, these rebalancing movements occasionally result in multiperiod travel times (i.e., at decision time, vehicles are still acting on decisions from previous periods). We show that actively lowering the VFAs of post-decision states of farther rebalancing targets improves the performance of our solution for test cases with increasingly higher rebalancing distances.
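One way to realize this discounting is to penalize a candidate target's post-decision value by the number of decision periods the vehicle would remain committed while traveling there. The function below is only an illustrative sketch of that idea; the per-period factor and period length are assumed values, not the paper's tuned parameters.

```python
import math


def rebalancing_score(vfa_value, travel_time_min, period_min=1.0, beta=0.05):
    """Discounted post-decision value of a candidate rebalancing target.

    Farther targets keep the vehicle committed for more one-minute
    decision periods, so their value estimate is actively lowered by
    a per-period factor (1 - beta).
    """
    periods = max(1, math.ceil(travel_time_min / period_min))
    return vfa_value * (1.0 - beta) ** (periods - 1)


near = rebalancing_score(10.0, travel_time_min=1)  # adjacent center: 10.0
far = rebalancing_score(10.0, travel_time_min=5)   # 10 * 0.95**4, about 8.15
```

Under such a scheme, a farther center is chosen only when its value estimate exceeds that of nearby centers by enough to compensate for the longer commitment.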
Moreover, we develop a high-resolution state representation in which the spatial attributes correspond to discretized GPS coordinates (rather than grids, zones, or areas) and periods are no longer than one minute, to comply with the demanding expectations of current MoD users. These characteristics prevent our policy from making real-time decisions that are infeasible (concerning the infrastructure capacity) or illegal (concerning city regulations). Ultimately, making use of the underlying street network allows us not only to comply with real-world constraints but also to improve solution quality. Our experiments demonstrate that constraining the maximum number of vehicles inbound to each intersection is crucial to achieving stable VFAs, since these constraints exempt us from modeling the behavior of nonlinear approximations.
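In the assignment model, this corresponds to a hard capacity constraint on each intersection's inbound flow. As a stand-alone illustration of its effect (not the paper's optimization model), the filter below greedily keeps the highest-valued decisions that respect a per-intersection cap; the vehicle and node identifiers are made up.

```python
from collections import Counter


def enforce_inbound_caps(ranked_decisions, cap):
    """Keep only decisions that respect the per-intersection inbound cap.

    ranked_decisions: list of (vehicle, target_node) pairs, assumed
    sorted by decreasing value; later decisions that would exceed a
    node's cap are dropped.
    """
    inbound = Counter()
    kept = []
    for vehicle, node in ranked_decisions:
        if inbound[node] < cap:
            inbound[node] += 1
            kept.append((vehicle, node))
    return kept


decisions = [("v1", "n7"), ("v2", "n7"), ("v3", "n7"), ("v4", "n2")]
kept = enforce_inbound_caps(decisions, cap=2)  # "v3" exceeds the cap on "n7"
```

Enforcing the cap inside the assignment model itself, rather than filtering afterwards as above, is what keeps the dual prices, and therefore the learned VFAs, well behaved.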
This research can be extended in many promising directions. First, one could focus on designing an inverse formulation to determine the minimum number of company-owned vehicles necessary to complement third-party fleets available according to varying stochastic distributions. Second, the requirements of independent owners could take into account alternative parameters. For example, they could establish minimum profit margins or compensations to join ridesharing platforms. As a result, platforms would have to consider these parameters to achieve balanced solutions, weighing customer dissatisfaction against outsourcing costs. Additionally, by considering travel time uncertainty, service quality contracts would have to be further adapted to compensate users beyond the violations previously described. Ultimately, this uncertainty could lead to service time window violations on the supply side, such that platforms could also set up contracts prescribing compensations for inconvenienced FAV owners. Lastly, one could also consider the impact of cities' traffic management policies (e.g., congestion pricing, empty-vehicle fees, parking costs) on platform operations.
Acknowledgments
This research is supported by the project "Dynamic Fleet Management (P14-18 – project 3)" (project 14894) of the research programme i-CAVE, partly financed by the Netherlands Organization for Scientific Research (NWO), domain Applied and Engineering Sciences (TTW).