
A Markov Decision Process Framework to Incorporate Network-Level Data in Motion Planning for Connected and Automated Vehicles

Xiangguo Liu, Neda Masoud, Qi Zhu, Anahita Khojandi

Abstract

Autonomy and connectivity are expected to enhance safety and improve fuel efficiency in transportation systems. While connected vehicle-enabled technologies, such as coordinated cruise control, can improve vehicle motion planning by incorporating information beyond the line of sight of vehicles, their benefits are limited by current short-sighted planning strategies that only utilize local information. In this paper, we propose a framework that devises vehicle trajectories by coupling a locally-optimal motion planner with a Markov decision process (MDP) model that can capture network-level information. Our proposed framework can guarantee safety while minimizing a trip's generalized cost, which comprises its fuel and time costs. To showcase the benefits of incorporating network-level data when devising vehicle trajectories, we conduct a comprehensive simulation study in three experimental settings, namely a circular track, a highway with on- and off-ramps, and a small urban network. The simulation results indicate that statistically significant efficiency gains can be obtained for the subject vehicle and its surrounding vehicles in different traffic states under all experimental settings. This paper serves as a proof-of-concept to showcase how connectivity and autonomy can be leveraged to incorporate network-level information into motion planning.

Keywords: Connected and Automated Vehicles, Trajectory planning

1. Introduction

Connected vehicle (CV) technology facilitates communication among vehicles, their surrounding infrastructure, and other road users. This connectivity is enabled through Dedicated Short Range Communication (DSRC) [1] or cellular technologies, and paints a more comprehensive picture of the transportation network than what could be observed by each individual road user. As such, it is expected that upon deployment, CV technology would significantly improve mobility [2, 3, 4], enhance safety [5, 6] and traffic flow stability [7], reduce congestion [8, 9], and improve fuel economy [10], among other benefits [11]. CV technology has enabled several advanced driving assistance systems (ADAS), such as Cooperative Adaptive Cruise Control (CACC) [12, 13, 14], Connected Cruise Control (CCC) [15, 16] and platooning [17, 18, 19, 20, 21]. Although existing CV-enabled technologies are based on local communications, CV technology can also provide granular data at the network level by strategically positioning road side units (RSUs) to ensure connectivity throughout an entire network.

Preprint submitted to Transportation Research Part C: Emerging Technologies January 7, 2022

Motion planning in transportation networks has traditionally been carried out using techniques that leverage local information to make locally-optimal decisions [22]. In particular, optimal control-based models have been widely applied to traditional transportation networks for their ability to provide short-term efficient solutions. CV technology can help improve these locally-optimal motion planners, as it allows vehicles to see beyond their line of sight. More importantly, it enables vehicles to obtain network-level information through communication with other connected vehicles and RSUs. Such connectivity can be leveraged to enhance the long-term safety and efficiency of planned trajectories; however, for this potential to be realized, the network-level information must be integrated into decision-making systems. This cannot be accomplished using existing techniques, as they do not scale to utilize granular data collected from the entire network. Hence, new methods need to be developed that can (i) leverage network-level data, and (ii) provide fast and efficient trajectories that adapt to the stochasticity of traffic networks.

This paper introduces a general framework that combines high-level network-level information with granular local information to devise network-informed cruising, routing, lane-changing, and platoon-merging decisions for a CAV in a mixed traffic scenario, as shown in Figure 1. As demonstrated in this figure, the proposed framework combines an optimal control (OC) trajectory planning model proposed in [23] with a Markov decision process (MDP) model developed in this paper to devise an efficient trajectory for an entire trip. The proposed MDP model can capture the progression of traffic as a stochastic process at an aggregate level, thereby complementing the optimal-control-based motion planning model through incorporating network-level information. In this context, using the proposed MDP framework allows vehicles to skip near-sighted locally-optimal trajectories [23], and make routing, lane-changing, and platoon-merging decisions with a long-term view so as to minimize a combination of short-term and long-term costs.

2. Related Works

2.1. Motion Planning

Motion planning for automated vehicles has been an active research topic [22, 24, 25, 26]. With the advancement of communication, computation and sensing technologies, various planning and control techniques have been proposed, developed, and applied in complex traffic environments. Paden et al. [27] reviewed planning and control techniques in an urban environment, Claussmann et al. [28] reviewed motion planning techniques for highway driving, Gritschneder et al. [29] and Katrakazas et al. [30] emphasized the real-time performance of planning techniques, and Zeng and Wang [31], De Nunzio et al. [32], and Rakha and Kamalanathsharma [33] focused on the efficiency of the proposed methods. Li et al. [34] and Liu et al. [35] attempted to balance computational performance and solution quality. Zhang et al. [15] and Orosz [16] considered communication delay and reaction time in designing motion planners, and Hardy and Campbell [36], Galceran et al. [37], and Brechtel et al. [38] addressed the uncertainty in the driving behaviour of vehicles surrounding autonomous vehicles.

The motivation of this large body of research on motion planning has been to improve safety and comfort as well as reduce travel time and fuel consumption. Safety and collision avoidance have been discussed in many studies [39, 40, 41, 42], some of which have


[Figure 1 diagram: at each update period within a road piece, and for each action, the OC model computes in real time the short-term cost of the optimal trajectory over the remainder of the road piece from granular data; the expected discounted cost of the trip from the current road piece to the end of the trip is looked up from a policy table built on coarse network-level data; and the MDP model selects the trajectory with the lowest total cost.]

Figure 1: Structure of the proposed MDP framework. The optimal control (OC) model plans a number of trajectories to determine the short-term cost [23] associated with every higher-layer action a ∈ A, which includes a combination of route choice, lane changing, and platoon merging. The MDP model assesses the long-term cost associated with each higher-layer action a ∈ A. The MDP framework selects the action a ∈ A that provides the minimum expected discounted cost of a trip, which is the sum of the costs estimated by the OC and MDP models.

Figure 2: The upper figure displays a freeway stretch segmented into merge (on-ramp), diverge (off-ramp), and regular road pieces, where the MDP model operates. The lower figure displays a zoomed-in view of a road piece, where the cost of each action (i.e., lane-changing, platoon-merging, and routing) is determined based on local information. Note that the cost of the optimal control model, C_oc, is computed for a given action, which is determined by the starting and ending states, s_i and s_j.

considered the uncertainty of the surrounding vehicles' motion [36, 43]. Besides safety guarantees, efficiency, manifested in the form of reducing travel time [44, 45] and fuel consumption [44, 46, 47] or increasing traffic flow [48, 49], has been one of the driving objectives in developing motion planners. Despite the proven short-term capability of the proposed methods to increase efficiency, they cannot guarantee long-term efficiency due to the limited planning horizon they capture.

Several attempts have been made in the literature to devise trajectories that account for traffic beyond the local neighbourhood of the subject vehicle. One such approach is hierarchical design, which is sometimes referred to as the combination of trajectory planning and tracking [39, 40, 41, 50, 51, 52], and sometimes as the combination of long- and short-horizon planning. To avoid confusion, in this study, we use the term hierarchical design to denote long- and short-horizon planning, where higher- and lower-layer decisions are made, respectively.

There have been several attempts in the literature to conduct longer-horizon planning using hierarchical design, under specific assumptions. Zeng and Wang [31] proposed a dynamic programming algorithm under the assumption that the speed profile of the subject vehicle's immediate leading vehicle is fixed and known. Similarly, Qian et al. [53] assumed the surrounding vehicles' future motions to be given. Studies that assume the surrounding traffic environment to be fixed and known can compute the optimal speed profile of the subject vehicle and have the subject vehicle follow this profile [54, 55, 31]. However, due to the assumptions on the motion profiles of the surrounding vehicles, these higher-layer plans are not guaranteed to be well-executed or feasible for lower-layer planners to navigate. Because the lower-layer planners need to ensure safety and comfort and follow traffic rules, they sometimes cannot follow the suggested speed or the planned route, e.g., due to not finding an opportunity to change lanes. On the other hand, the hierarchical layered design cannot simply be replaced with a one-time optimization problem that makes both higher- and lower-layer decisions, due to its high computational complexity [56]. In this paper, we aim to bridge the gap between hierarchical but non-efficient trajectory planning and optimal but computationally-complex planning by establishing a feedback loop between higher- and lower-layer decisions in hierarchical schemes. In our proposed method, while the lower-layer planner attempts to follow the plan provided by the higher-layer planner, the higher-layer plan can also be adjusted according to the real-time execution status in the lower layer.

In addition to the possibility that the higher-layer plan may not be executable, the plans at the lower layer, the higher layer, or both layers may be outdated at the time of execution in a fast-changing traffic environment. To combat outdated decisions, Paikari et al. [45] and Boriboonsomsin et al. [47] proposed to update the higher-layer plan, while Guo et al. [39], Li et al. [40], and Alia et al. [41] considered updating the lower-layer plan. Huang et al. [54] utilized a genetic algorithm for higher-layer planning, and a quadratic program for lower-layer adaptation, where plans on both layers are updated periodically. The two layers of decision making in our proposed hierarchical design are also closely coupled, as the lower-layer plan is devised based on higher-layer decisions, and the higher-layer plan can also be adapted based on the lower-layer execution status. Moreover, the higher-layer plan in our work is updated not only based on the real-time state of the downstream traffic, but also based on the network-level evolution of traffic. Additionally, our framework is more comprehensive, as it includes decision making for routing, lane-changing, platooning, and cruising.

2.2. Markov Decision Processes in Transportation

A Markov decision process (MDP) is a stochastic control process that is used extensively in many fields, including transportation, robotics and economics. MDPs can model the interaction between agents and a stochastic environment. The goal of an MDP model is to find a policy that maximizes the total expected cumulative reward in a stochastic environment [57, 58].

In the transportation field, MDPs have been utilized to plan local trajectories by modeling the uncertainty of driver behavior [43, 59]. MDPs and their variant, partially observable Markov decision processes (POMDPs), have also been applied to vehicle behavior analysis and prediction [60, 37, 38] and driving entity switching policies [5]. Brechtel et al. [56] proposed an MDP-based motion planning model to devise a vehicle's target position and velocity. The authors identified the scalability of their proposed method with respect to the number of vehicles as an open problem. To tackle the computational complexity of the problem, the authors adopted a fixed discretization of the action space to formulate the problem, which could render their methodology inefficient. You et al. [61] designed a reward function for an MDP with the objective of obtaining expert-like driving behavior. This model determines the velocity of the subject vehicle and whether the vehicle should change lanes, considering the relative position of the subject vehicle and its surrounding vehicles.

The studies above mostly employ MDPs to determine the velocity of the subject vehicle, leaving out higher-layer decisions. A recent work [62] developed a hierarchical framework in which an MDP model was employed to make lane-changing decisions in the higher layer. They introduced three models, namely a trajectory smoother, a longitudinal controller, and a lateral controller, to address the detailed execution in the lower layer. In our work, we further consider the long-term efficiency of a trajectory by extending the MDP model to a more general motion planner, which includes routing, lane-changing, and platoon-merging. In our proposed work, safety and comfort are ensured by the planner in the lower layer, while the MDP model explores the long-term benefits of the planned trajectory by considering the stochastic changes in the downstream traffic environment. We use simulations to demonstrate that our proposed method results in statistically significant reductions in the long-term generalized trip cost.

2.3. Our Contributions

This paper introduces a framework that facilitates making trajectory planning decisions (namely, cruising, lane-changing, platoon-merging, and route choice) based on both local and network-level data. More specifically, our framework makes joint cruising, lane-changing, platoon-merging, and routing decisions to minimize the total expected discounted cost of a (leg of a) trip in a dynamic environment. This is accomplished through two main modules within an MDP framework: (1) an optimal-control-based trajectory planning model that provides the vehicle's acceleration profile with the goal of maximizing safety and comfort locally [23]; and (2) an MDP model that enables incorporating network-level information into the decision making process.

The contributions of this paper are as follows. This work is the first to advance traditional local motion planning models by incorporating a strategically-condensed, high volume of network-level data using a Markov decision process (MDP) modeling framework, hence devising entire efficient trajectories in dynamic traffic streams. In this general framework, cruising, routing, lane-changing, and platoon-merging decisions are made concurrently. We conduct comprehensive simulation experiments to demonstrate the benefits of augmenting traditional trajectory planning models with an MDP model for both the subject vehicle and its surrounding vehicles. We demonstrate that not only does a CV benefit from utilizing network-level information in devising its own trajectory, but its surrounding vehicles, which may be CAVs or legacy vehicles, also experience second-hand cost-reduction benefits. These results could have great policy implications, as they demonstrate that only a handful of CAVs in a traffic stream could serve as traffic regulators.

3. Problem Statement

Consider a CAV, to which we refer as the subject vehicle, that is making a trip from a known origin to a known destination. The subject vehicle is able to directly observe its surroundings using its onboard sensor systems as well as basic safety messages (BSMs) obtained from other vehicles or RSUs within its communication range. Owing to its connectivity, the subject vehicle can also obtain network-level information about the state of traffic. The objective of the subject vehicle is to navigate the network safely and comfortably, while at the same time minimizing its travel cost, which is composed of time cost and energy cost, by utilizing both granular local data and coarse network-level information.

4. Methodology

4.1. The MDP Framework

The proposed framework determines the trajectory of a subject vehicle, including fine-grained decisions (i.e., the acceleration profile) and coarse decisions (i.e., routing, lane changing, and platoon merging). In this framework, fine-grained decisions are made by a local optimal control trajectory planning model using only local information, and coarse decisions are made by an MDP model using network-level information. The MDP framework combines the two models to make a final decision about the trajectory of the subject vehicle: for each coarse action (where a coarse action is a unique vector of route choice, platoon merging, and lane changing), the MDP framework uses the optimal control model to obtain the lowest short-term cost of completing the action, and the MDP model to obtain the long-term expected discounted cost of completing the same action. Finally, the action that provides the lowest total cost is selected and pursued by the vehicle. This framework is demonstrated in Figure 1.
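The action-selection step described above can be sketched as follows. Here `oc_short_term_cost` and `policy_table` are hypothetical stand-ins for the optimal control planner of [23] and the precomputed MDP policy look-up table; this is an illustrative sketch, not the paper's implementation.

```python
def select_action(state, actions, oc_short_term_cost, policy_table):
    """Pick the coarse action (lane, platoon, route) with the lowest
    short-term OC cost plus expected discounted cost-to-go from the table."""
    best_action, best_cost = None, float("inf")
    for a in actions:
        cost = oc_short_term_cost(state, a) + policy_table[(state, a)]
        if cost < best_cost:
            best_action, best_cost = a, cost
    return best_action

# Toy usage with two coarse actions: short-term costs favor the first action,
# but the long-term table makes the second action cheaper overall.
actions = [("Le", 0), ("Ri", 1)]
table = {("s0", ("Le", 0)): 5.0, ("s0", ("Ri", 1)): 3.5}
oc = lambda s, a: 1.0 if a == ("Le", 0) else 2.0
print(select_action("s0", actions, oc, table))  # → ('Ri', 1)
```

The design point is that the two cost estimates are simply summed per action, so the OC planner and the MDP table can be developed and updated independently.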

An example network is displayed in Figure 2, where the subject vehicle is located in the right lane, planning to take the off-ramp marked by an arrow. The general travel cost incurred by the vehicle is a linear combination of the route travel time and fuel cost. To optimize its trajectory, in addition to determining the exact position, speed, and acceleration of the subject vehicle at each point in time, we need to make three sets of higher-level decisions with long-term implications: whether (and where) to change lanes, whether to join (or split from) a platoon, and which route to take.

Each action can have conflicting implications in terms of energy efficiency, travel time, and passenger safety. For example, the vehicle would be able to travel at a higher speed in the left lane, but may have more opportunities to join a platoon and increase its fuel economy in the right lane. The trade-offs between these actions can be captured by an optimal-control-based trajectory planning model that uses local information (i.e., the speed and availability of platoons in both lanes). As another example, while joining a platoon would provide fuel efficiency, changing platoon membership frequently could pose safety risks to the vehicle occupants and create instability in the traffic stream. This example highlights the importance of not making decisions based solely on minimizing short-term vehicle-specific costs, and instead taking a longer-term view of the cost, which requires incorporating network-level information into the decision making process. As such, the proposed MDP framework is designed to capture the expected long-term cost of each action, allowing the vehicle to make informed decisions based on both local and network-level information.

In order to model the system with a view on facilitating the incorporation of both granular and network-level information, we make a number of assumptions. First, we divide the network into a number of relatively large cells, to which we refer as road pieces. Road pieces are constructed such that (i) the macroscopic-level traffic dynamics are homogeneous within each piece at each point in time; and (ii) all vehicles within a road piece are within a reliable communication range of one another. As such, we introduce three types of road pieces, namely, merge (which includes a single on-ramp/road), diverge (which includes a single off-ramp/road), and regular (which does not include any on- or off-ramps). In Figure 2, for example, l_1 is an on-ramp or merge piece, while l_4 and l_5 are regular pieces.

The trajectory planning model is re-optimized dynamically as the immediate neighbourhood of the subject vehicle evolves. This re-optimization occurs after a time period t_upd has lapsed, which is set to 0.4 sec following [23]. The MDP model is solved off-line, and its resulting optimal policies are stored in a look-up policy table that can be accessed at any time. In the rest of this section, we elaborate on the MDP model in subsection 4.2, and provide a brief overview of the optimal control model in subsection 4.3.
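As a rough illustration of how such a policy table could be precomputed off-line, the following value-iteration sketch operates on a toy finite MDP with minimized discounted cost. The states, actions, transition probabilities, and one-step costs below are invented placeholders, not the paper's model.

```python
def solve_policy_table(states, actions, P, C, alpha=0.95, tol=1e-8):
    """Value iteration on a finite MDP with costs (minimization).
    P[s][a] maps next-state -> probability; C[s][a] is the one-step cost."""
    V = {s: 0.0 for s in states}
    while True:
        V_new = {
            s: min(C[s][a] + alpha * sum(p * V[s2] for s2, p in P[s][a].items())
                   for a in actions)
            for s in states
        }
        done = max(abs(V_new[s] - V[s]) for s in states) < tol
        V = V_new
        if done:
            break
    # Store the minimizing action per state in the look-up policy table
    policy = {
        s: min(actions, key=lambda a: C[s][a]
               + alpha * sum(p * V[s2] for s2, p in P[s][a].items()))
        for s in states
    }
    return V, policy

# Two-state toy example: from s0, action "go" is cheaper in the long run.
states = ["s0", "s1"]
actions = ["stay", "go"]
P = {"s0": {"stay": {"s0": 1.0}, "go": {"s1": 1.0}},
     "s1": {"stay": {"s1": 1.0}, "go": {"s1": 1.0}}}
C = {"s0": {"stay": 2.0, "go": 1.0}, "s1": {"stay": 0.0, "go": 0.0}}
V, policy = solve_policy_table(states, actions, P, C)
print(policy["s0"])  # → go
```

Because the table is computed once off-line, the on-line planner only pays a dictionary look-up per coarse action at each update period.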

4.2. The MDP Model

The MDP framework considers three traffic states, namely, free-flow, onset-of-congestion, and congested traffic. The traveling speed of the subject vehicle is determined based on the traffic state of the road piece the vehicle is traversing. When the subject vehicle enters a new road piece l_i, a decision is made as to whether the vehicle should change lanes and whether to join a platoon. It is assumed that the vehicle can finish the lane changing and platoon merging processes within the same road piece l_i. If there is more than one road piece following l_i, the subject vehicle also has to make a route choice decision by selecting one of the candidate road pieces, l'_i ∈ S_l(l_i), where S_l(l_i) = {l'_{i1}, l'_{i2}, ...} is the set of road pieces connected to l_i, and therefore depends on the network structure.

Table 1: Table of notation

Notation | Definition
Le | Left lane
Ri | Right lane
l_i | Road piece i
l | A generic road piece
l'_{ij} | The jth road piece directly connected to road piece l_i
S_l(l_i) = {l'_{i1}, l'_{i2}, ...} | Set of road pieces directly connected to road piece l_i
l'_i ∈ S_l(l_i) | The selected road piece among the set of road pieces connected to l_i
L = {l_i} | Set of road pieces
l_o | The road piece at the origin of the trip
l_d | The road piece at the destination of the trip
ξ_tr^{Le} | Macroscopic state of traffic in the left lane
ξ_tr^{Ri} | Macroscopic state of traffic in the right lane
ξ_tr = [ξ_tr^{Le}, ξ_tr^{Ri}] | Vector specifying the macroscopic state of traffic
ξ_p^{Le} | Percentage of platoon-enabled vehicles in the left lane
ξ_p^{Ri} | Percentage of platoon-enabled vehicles in the right lane
ξ_p = [ξ_p^{Le}, ξ_p^{Ri}] | Vector specifying the percentage of platoon-enabled vehicles
μ = [l_i, ξ_tr, ξ_p] | The environment state vector
φ_l ∈ {Le, Ri} | The lateral position of the subject vehicle
φ_p ∈ {0, 1} | Platoon membership status of the vehicle
φ = [φ_l, φ_p, d] | The vehicle state vector
d | The number of road pieces to the scheduled splitting of the platoon, where d = −1 if the subject vehicle is not in a platoon
s = (μ, φ) ∈ S | State of the traffic dynamics process
S = {s} | Set of all possible states of the traffic dynamics process
c_f(s) | The fuel cost of the subject vehicle at state s
c_t(s) | The time cost of the subject vehicle at state s
c_di(s_1, s_2) | Passenger discomfort/safety risk cost for a vehicle transitioning from state s_1 to s_2
N_lc | Number of lane changes
Λ = [λ_f, λ_t, λ_di] | Vector of cost component coefficients, containing elements for fuel, time, and discomfort/safety
C(s_1, s_2) = [(c_f(s_1) + c_f(s_2))/2, (c_t(s_1) + c_t(s_2))/2, c_di(s_1, s_2)]^T | Cost vector for a vehicle transitioning from state s_1 to s_2
C_{s_1}^{s_2} = ΛC(s_1, s_2) | Sum of fuel, time, and comfort/safety costs
V([l, ξ_tr, ξ_p], [φ_l, φ_p, d]) | The minimum total expected discounted cost-to-go starting from state s = (μ, φ)
c_fl | Cost of missing the trip destination

Probability distributions
q_φ^f(μ) | Probability that the subject vehicle fails to change lanes if such a decision has been made
g_l^1(ξ_p) | Probability of successful platoon merging with lane changing
g_l^0(ξ_p) | Probability of successful platoon merging without lane changing
w(k) | Probability distribution for the number of road pieces, k, for which the subject vehicle can stay with a platoon it has met

Transition matrices
p_l^{Le}((ξ_tr^{Le})'|ξ_tr, ξ_p) | Probability that the traffic state transitions to (ξ_tr^{Le})' in the left lane, given ξ_tr and ξ_p
p_l^{Ri}((ξ_tr^{Ri})'|ξ_tr, ξ_p) | Probability that the traffic state transitions to (ξ_tr^{Ri})' in the right lane, given ξ_tr and ξ_p
h_l^{Le}((ξ_p^{Le})'|ξ_tr, ξ_p) | Probability that the platoon intensity transitions to (ξ_p^{Le})' in the left lane, given ξ_tr and ξ_p
h_l^{Ri}((ξ_p^{Ri})'|ξ_tr, ξ_p) | Probability that the platoon intensity transitions to (ξ_p^{Ri})' in the right lane, given ξ_tr and ξ_p

Actions
a_l ∈ {Le, Ri} | Target lane
a_p ∈ {0, 1} | Target platoon membership
a_r ∈ S_l(l_i) | Target route
a = [a_l, a_p, a_r] | The action taken by the subject vehicle
A = {a} | Action set

Let s = (μ, φ) ∈ S denote the state of the traffic dynamics process, where S is the set of all possible states s. Vector μ = [l_i, ξ_tr, ξ_p] in this process denotes the location-dependent environment state, where l_i ∈ L, and L includes the locations of the origin and destination of the trip (leg), denoted by l_o and l_d, respectively, and all other road pieces on all possible paths that connect the origin to the destination. The vector ξ_tr = [ξ_tr^{Le}, ξ_tr^{Ri}] denotes the macroscopic state of traffic in the left and right lanes, respectively. More specifically, we consider three macroscopic traffic states: free-flow, onset-of-congestion, and congested. Vector ξ_p = [ξ_p^{Le}, ξ_p^{Ri}] denotes the percentage of platoon-enabled vehicles in the left and right lanes, respectively.
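For illustration only, the state s = (μ, φ) defined above could be encoded as in the following sketch; the class and field names are assumptions chosen to mirror the paper's notation, not the authors' code.

```python
from dataclasses import dataclass

# The three macroscopic traffic states considered by the MDP model
TRAFFIC_STATES = ("free_flow", "onset_of_congestion", "congested")

@dataclass(frozen=True)
class EnvironmentState:          # mu = [l_i, xi_tr, xi_p]
    road_piece: str              # l_i
    traffic: tuple               # (xi_tr^Le, xi_tr^Ri), each in TRAFFIC_STATES
    platoon_pct: tuple           # (xi_p^Le, xi_p^Ri), percentages per lane

@dataclass(frozen=True)
class VehicleState:              # phi = [phi_l, phi_p, d]
    lane: str                    # "Le" or "Ri"
    in_platoon: int              # phi_p: 0 or 1
    d: int                       # road pieces until scheduled split; -1 if free

mu = EnvironmentState("l4", ("free_flow", "congested"), (30, 60))
phi = VehicleState("Ri", 1, 2)
s = (mu, phi)                    # the full MDP state
print(s[1].lane)  # → Ri
```

Frozen dataclasses make the states hashable, which is convenient for using s as a key into a policy or value look-up table.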

Let φ = [φ_l, φ_p, d] denote the state of the subject vehicle. Here, φ_l ∈ {Le, Ri} denotes the lateral position of the subject vehicle, where 'Le' and 'Ri' refer to the left and right lanes, respectively. Furthermore, φ_p ∈ {0, 1} is a binary indicator denoting the platoon membership status of the vehicle, where φ_p = 0 indicates that the subject vehicle is not a platoon member and φ_p = 1 indicates otherwise. Let d denote the number of road pieces to the scheduled splitting of the platoon the subject vehicle is a member of. We set d = −1 if the subject vehicle is not in a platoon. We assume that before merging, vehicles that will stay in the same platoon will negotiate and reach consensus on the scheduled splitting position d = k. Vehicles in a platoon moving to the next road piece will have their d decreased by 1. The platoon has to split/dissolve when d = 0. (We assume the subject vehicle can optimize its action periodically and thus actively split before the scheduled splitting position in our current model, but this can be easily modified by disabling the first, third and fourth expressions in Equation (8).)
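The split-counter dynamics of d described above can be illustrated with a toy update rule; this is an assumption-level sketch of the bookkeeping, not the paper's implementation.

```python
def advance_road_piece(in_platoon, d):
    """Update (phi_p, d) as the vehicle moves to the next road piece:
    merging sets d = k elsewhere; here we only decrement and dissolve."""
    if not in_platoon:
        return 0, -1               # free agent: d stays -1
    d -= 1
    if d == 0:                     # scheduled split position reached
        return 0, -1               # platoon dissolves; vehicle is free again
    return 1, d

state = (1, 2)                     # in a platoon, two pieces until the split
state = advance_road_piece(*state) # one piece remaining
state = advance_road_piece(*state) # split position reached, dissolve
print(state)  # → (0, -1)
```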

Let a = [a_l, a_p, a_r] denote the action taken by the subject vehicle at the beginning of each road piece, where a_l ∈ {Le, Ri} denotes the target lane for the subject vehicle, a_p ∈ {0, 1} denotes the target platoon membership, where a_p = 0 indicates that the vehicle stays a free agent and a_p = 1 indicates that the vehicle merges into a platoon, and a_r ∈ S_l(l_i) denotes the path selected by the vehicle.

Let c_f(s) and c_t(s) denote the fuel cost and time cost of the subject vehicle at state s, respectively. See [23] for the computation of the fuel cost, c_f(s). The time cost of a trip (leg) can be computed as the length of the road piece, len_i, over the velocity in lane φ_l under traffic condition ξ_tr, v(ξ_tr, φ_l), i.e.,

c_t(s) = len_i / v(ξ_tr, φ_l)    (1)

Let c_di(s_1, s_2) denote the cost associated with passenger discomfort/safety risk for transitioning from state s_1 to s_2. The passenger discomfort/safety cost is assumed to be realized when the vehicle is changing lanes, and to increase linearly with the number of lane changes. Therefore,

c_di(s_1, s_2) = g(N_lc)    (2)

where g(.) is a linear function and N_lc is the number of lane changes in the current road piece. In one road piece, the subject vehicle is not expected to change lanes more than once, i.e., N_lc ∈ {0, 1}.

Let C_{s_1}^{s_2} denote the sum of all three costs discussed above for a vehicle that starts a road piece in state s_1 and ends it in state s_2. The exact transition position depends on the real-time traffic environment. For simplification, we assume the transition takes place in the middle of a road piece, and therefore C_{s_1}^{s_2} can be formulated as:

C_{s_1}^{s_2} = ΛC(s_1, s_2) = λ_f (c_f(s_1) + c_f(s_2))/2 + λ_t (c_t(s_1) + c_t(s_2))/2 + λ_di c_di(s_1, s_2)    (3)

where the vector Λ = [λ_f, λ_t, λ_di] contains the corresponding coefficients for each cost component, and C(s_1, s_2) = [(c_f(s_1) + c_f(s_2))/2, (c_t(s_1) + c_t(s_2))/2, c_di(s_1, s_2)]^T is the cost vector for a vehicle transitioning from state s_1 to s_2. Note that all costs are functions of our action, where the action is implied from the transition of the state from s_1 to s_2. We assume that Λ can be different for each driver, since different cost terms are of different importance to each driver. The total travel cost C_{s_1}^{s_2} describes the generalized cost of travel in a road piece. For example, the MDP cost for a vehicle that starts a road piece in the left lane as a free agent and ends the road piece in the right lane as a free agent can be denoted by C_{Le,0,−1}^{Ri,0,−1}.
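As a numeric illustration of Equation (3), with Equation (2)'s linear g(N_lc) folded in, the following sketch uses made-up coefficients and cost values; none of the numbers come from the paper.

```python
def generalized_cost(cf1, cf2, ct1, ct2, n_lane_changes,
                     lam=(1.0, 0.5, 2.0), g_slope=0.3):
    """C^{s2}_{s1} = lam_f*(c_f(s1)+c_f(s2))/2 + lam_t*(c_t(s1)+c_t(s2))/2
                   + lam_di * c_di(s1, s2), with c_di = g_slope * N_lc."""
    lam_f, lam_t, lam_di = lam
    c_di = g_slope * n_lane_changes       # linear g(N_lc), Equation (2)
    return (lam_f * (cf1 + cf2) / 2
            + lam_t * (ct1 + ct2) / 2
            + lam_di * c_di)

# One lane change within the road piece (N_lc = 1):
# 1.0*(0.4+0.6)/2 + 0.5*(10+12)/2 + 2.0*0.3 = 0.5 + 5.5 + 0.6
print(generalized_cost(cf1=0.4, cf2=0.6, ct1=10.0, ct2=12.0,
                       n_lane_changes=1))  # → 6.6
```

Averaging the endpoint costs reflects the simplifying assumption that the state transition occurs at the middle of the road piece.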

An important part of the MDP model is the set of transition probability matrices that allow us to model the dynamics of the system. Let p_l^{Le}((ξ_tr^{Le})'|ξ_tr, ξ_p) and p_l^{Ri}((ξ_tr^{Ri})'|ξ_tr, ξ_p) denote the probabilities that, given ξ_tr and ξ_p, the traffic state transitions to (ξ_tr^{Le})' in the left lane and to (ξ_tr^{Ri})' in the right lane in road piece l_i, respectively. Let h_l^{Le}((ξ_p^{Le})'|ξ_tr, ξ_p) and h_l^{Ri}((ξ_p^{Ri})'|ξ_tr, ξ_p) denote the probabilities that, given ξ_tr and ξ_p, the platoon intensity transitions to (ξ_p^{Le})' in the left lane and to (ξ_p^{Ri})' in the right lane, respectively. These transition probability matrices can be learnt from historical data.
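One simple way such matrices could be learnt from historical data, as the text suggests, is by empirical frequency counts of observed transitions. The record format and state labels below are invented for illustration.

```python
from collections import Counter, defaultdict

def estimate_transitions(records):
    """records: iterable of ((xi_tr, xi_p), next_state) observations for one
    lane of one road piece. Returns P[(xi_tr, xi_p)][next_state] as empirical
    frequencies, so each conditional distribution sums to 1."""
    counts = defaultdict(Counter)
    for cond, nxt in records:
        counts[cond][nxt] += 1
    return {cond: {s: n / sum(c.values()) for s, n in c.items()}
            for cond, c in counts.items()}

# Toy historical observations: conditioning state -> observed next state
data = [((("free", "free"), (20, 40)), "free"),
        ((("free", "free"), (20, 40)), "free"),
        ((("free", "free"), (20, 40)), "onset"),
        ((("onset", "cong"), (20, 40)), "cong")]
P = estimate_transitions(data)
print(P[(("free", "free"), (20, 40))]["free"])  # → 0.6666666666666666
```

With sparse data, the raw frequencies could be smoothed (e.g., additive smoothing) before being used in the MDP, though the paper does not prescribe a particular estimator.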

Let q_φ^f(μ) denote the probability that the subject vehicle fails to change lanes if such a decision has been made. Note that q_φ^f is a function of the traffic state in the target lane. Let g_l^1(ξ_p) and g_l^0(ξ_p) denote the probabilities of successful platoon merging with and without lane changing, respectively. Note that g_l^1 is a function of the density of platoon-enabled vehicles in the target lane, and g_l^0 is a function of the availability of platoon-enabled vehicles in the immediate downstream of the subject vehicle in the original lane. Let l'_i and μ' = [l'_i, ξ'_tr, ξ'_p] denote a candidate road piece directly connected to l_i and its corresponding environment state vector, respectively. The problem terminates when the vehicle reaches its destination, i.e., l_i = l_d. Finally, let V([l_i, ξ_tr, ξ_p], [φ_l, φ_p, d]) denote the minimum total expected discounted cost starting with the vehicle state [φ_l, φ_p, d] and the environment state [l_i, ξ_tr, ξ_p]. Hence, for l_i = l_d, the minimum total expected discounted cost is given by

V([l_d, ξ_tr, ξ_p], [φ_l, φ_p, d]) = { 0, if the vehicle is at the correct destination; c_fl, otherwise }    (4)

where c_fl is a cost incurred should the subject vehicle fail to reach its destination (e.g., the vehicle should be a single vehicle in the right lane at the target off-ramp piece).


For $l_i \neq l_d$, when $\varphi_l = Le$, $\varphi_p = 0$, $d = -1$, the minimum expected discounted cost is given by

$$
V(\mu, Le, 0, -1) = \min_{a \in A}
\begin{cases}
\Lambda C_{Le,0,-1}^{Le,0,-1} + U(\mu', Le, 0, -1) & a_l = Le,\ a_p = 0 \\
g_l^0(\xi_p)\{\Lambda C_{Le,0,-1}^{Le,1,k-1} + W(\mu', Le, 1, k-1)\} + (1 - g_l^0(\xi_p))\{\Lambda C_{Le,0,-1}^{Le,0,-1} + U(\mu', Le, 0, -1)\} & a_l = Le,\ a_p = 1 \\
q_\varphi^f(\mu)\{\Lambda C_{Le,0,-1}^{Le,0,-1} + U(\mu', Le, 0, -1)\} + (1 - q_\varphi^f(\mu))\{\Lambda C_{Le,0,-1}^{Ri,0,-1} + U(\mu', Ri, 0, -1)\} & a_l = Ri,\ a_p = 0 \\
g_l^1(\xi_p)(1 - q_\varphi^f(\mu))\{\Lambda C_{Le,0,-1}^{Ri,1,k-1} + W(\mu', Ri, 1, k-1)\} + \big(1 - g_l^1(\xi_p)(1 - q_\varphi^f(\mu))\big)\{\Lambda C_{Le,0,-1}^{Le,0,-1} + U(\mu', Le, 0, -1)\} & a_l = Ri,\ a_p = 1
\end{cases}
\tag{5}
$$

where $U$ and $W$, described in Equations (6) and (7), are the minimum expected discounted costs of the remainder of the trip starting from the next road piece $l_i'$ for a vehicle that intends to maintain its state and to join a platoon, respectively.

$$
U(\mu', \varphi_l, \varphi_p, d) = \alpha \sum_{\xi_{tr}',\, \xi_p'} p_l^{Le}\big((\xi_{tr}^{Le})' \mid \xi_{tr}, \xi_p\big)\, p_l^{Ri}\big((\xi_{tr}^{Ri})' \mid \xi_{tr}, \xi_p\big)\, h_l^{Le}\big((\xi_p^{Le})' \mid \xi_{tr}, \xi_p\big)\, h_l^{Ri}\big((\xi_p^{Ri})' \mid \xi_{tr}, \xi_p\big)\, V(\mu', \varphi_l, \varphi_p, d)
\tag{6}
$$

$$
W(\mu', \varphi_l, 1, k-1) = \alpha \sum_{\xi_{tr}',\, \xi_p'} \sum_{k} w(k)\, p_l^{Le}\big((\xi_{tr}^{Le})' \mid \xi_{tr}, \xi_p\big)\, p_l^{Ri}\big((\xi_{tr}^{Ri})' \mid \xi_{tr}, \xi_p\big)\, h_l^{Le}\big((\xi_p^{Le})' \mid \xi_{tr}, \xi_p\big)\, h_l^{Ri}\big((\xi_p^{Ri})' \mid \xi_{tr}, \xi_p\big)\, V(\mu', \varphi_l, 1, k-1)
\tag{7}
$$

The four arguments of the min function in Equation (5) correspond to the costs of the lane-changing and platoon-merging actions. The expected discounted cost (with the initial values as specified) is then the minimum cost over the entire action set, which consists of lane changing, platoon merging, and route choice. The first expression in Equation (5) corresponds to the action that results in no change in the state of the vehicle; that is, the subject vehicle stays on the left lane as a free agent. The cost of this action is equal to the cost of continuing with the initial state $(Le, 0, -1)$ on the current road piece, plus the minimum expected discounted cost of starting the next road piece under the same initial state.

The second expression in Equation (5) corresponds to the action of staying on the left lane, but joining a platoon. The first term here corresponds to the expected cost of the scenario where the vehicle successfully joins a platoon. Under this scenario, the vehicle incurs both the cost of this new trajectory on the current road piece and the expected discounted cost of the rest of the trip starting from its new state as a platoon member. In case the execution of this action fails (i.e., the vehicle cannot join a platoon), the vehicle will continue under the previous state on the current road piece, and incurs an expected discounted cost for the rest of the trip starting from the left lane as a free agent. This cost is captured in the second term.

The third expression in Equation (5) corresponds to the action of changing to the right lane and remaining a free agent. Similar to the previous case, the first term captures the expected cost if the action can be completed, and the second term corresponds to the cost of the trajectory if the vehicle fails to complete the action.

Finally, the last expression in Equation (5) corresponds to the action of changing lanes and joining a platoon. In this case, the expected discounted cost is the summation of two terms, the first term corresponding to the entire action being completed, and the second term corresponding to the action failing.

For the case where the subject vehicle is a platoon member and the platoon splitting time has not been reached (i.e., $l_i \neq l_d$, when $\varphi_l = Le$, $\varphi_p = 1$, $d > 0$), the minimum expected discounted cost is given by

$$
V(\mu, Le, 1, d) = \min_{a \in A}
\begin{cases}
\Lambda C_{Le,1,d}^{Le,0,-1} + U(\mu', Le, 0, -1) & a_l = Le,\ a_p = 0 \\
\Lambda C_{Le,1,d}^{Le,1,d-1} + U(\mu', Le, 1, d-1) & a_l = Le,\ a_p = 1 \\
q_\varphi^f(\mu)\{\Lambda C_{Le,1,d}^{Le,1,d-1} + U(\mu', Le, 1, d-1)\} + (1 - q_\varphi^f(\mu))\{\Lambda C_{Le,1,d}^{Ri,0,-1} + U(\mu', Ri, 0, -1)\} & a_l = Ri,\ a_p = 0 \\
\big(1 - g_l^1(\xi_p)(1 - q_\varphi^f(\mu))\big)\{\Lambda C_{Le,1,d}^{Le,1,d-1} + U(\mu', Le, 1, d-1)\} + g_l^1(\xi_p)(1 - q_\varphi^f(\mu))\{\Lambda C_{Le,1,d}^{Ri,1,k-1} + W(\mu', Ri, 1, k-1)\} & a_l = Ri,\ a_p = 1
\end{cases}
\tag{8}
$$

The first expression in the min function in Equation (8) refers to the case where the subject vehicle splits from its platoon without changing lanes. Since this can always be achieved, the expected discounted cost of this action is the cost of the subject vehicle traveling on its current road piece as a free agent, plus its expected discounted cost of continuing to travel as a free agent starting from the next road piece.

The second expression in Equation (8) describes the scenario where the subject vehicle maintains its current state. Under this scenario, the subject vehicle traverses its current road piece while maintaining its state, and continues the rest of its trip with the platoon splitting time reduced by one unit.

The third expression in Equation (8) has the subject vehicle splitting from the platoon and changing lanes. When the subject vehicle decides to change lanes while in a platoon, it has to split from its platoon first. The first term here captures the scenario where the subject vehicle is not able to change lanes, in which case it will continue in its current platoon. Note that the OC model will inform the subject vehicle whether it can successfully change lanes. As such, if the OC model determines that changing lanes cannot take place safely, the subject vehicle will not split from its platoon. If the subject vehicle can change lanes, it will split from its platoon and continue the rest of the trip on the right lane as a free agent.


Finally, the fourth expression in Equation (8) has the subject vehicle changing lanes and traveling on the right lane in a platoon. For this action to take place, the subject vehicle should split from its current platoon, change lanes, and join a platoon on the right lane. Since we are assuming that the subject vehicle is always able to split from its current platoon, the probability of completing this action is the probability of successfully changing lanes and joining a platoon in the new lane. The first term here captures the cost of this action failing, in which case the subject vehicle would continue on the left lane in its current platoon. The second term captures the cost of the action being completed successfully.

For the case where the vehicle is a platoon member on the left lane and the platoon splitting time has arrived (i.e., $l_i \neq l_d$, when $\varphi_l = Le$, $\varphi_p = 1$, $d = 0$), the minimum expected discounted cost is given by

$$
V(\mu, Le, 1, 0) = \min_{a \in A}
\begin{cases}
\Lambda C_{Le,1,0}^{Le,0,-1} + U(\mu', Le, 0, -1) & a_l = Le,\ a_p = 0 \\
g_l^0(\xi_p)\{\Lambda C_{Le,1,0}^{Le,1,k-1} + W(\mu', Le, 1, k-1)\} + (1 - g_l^0(\xi_p))\{\Lambda C_{Le,1,0}^{Le,0,-1} + U(\mu', Le, 0, -1)\} & a_l = Le,\ a_p = 1 \\
q_\varphi^f(\mu)\{\Lambda C_{Le,1,0}^{Le,0,-1} + U(\mu', Le, 0, -1)\} + (1 - q_\varphi^f(\mu))\{\Lambda C_{Le,1,0}^{Ri,0,-1} + U(\mu', Ri, 0, -1)\} & a_l = Ri,\ a_p = 0 \\
g_l^1(\xi_p)(1 - q_\varphi^f(\mu))\{\Lambda C_{Le,1,0}^{Ri,1,k-1} + W(\mu', Ri, 1, k-1)\} + \big(1 - g_l^1(\xi_p)(1 - q_\varphi^f(\mu))\big)\{\Lambda C_{Le,1,0}^{Le,0,-1} + U(\mu', Le, 0, -1)\} & a_l = Ri,\ a_p = 1
\end{cases}
\tag{9}
$$

In Equation (9), $d = 0$ indicates that the platoon is dissolving and the subject vehicle has to split from it in the current road piece. The first expression in Equation (9) captures the scenario where the subject vehicle continues to travel on the left lane as a free agent after splitting from its current platoon.

The second expression in Equation (9) captures the case where the subject vehicle decides to join another platoon in the left lane, which may fail due to the absence of platoon-enabled vehicles in the left lane (second term).

The third expression in Equation (9) indicates that the subject vehicle plans to change lanes and continue to travel as a free agent. This action may fail if the subject vehicle cannot change lanes (first term), in which case the subject vehicle continues to travel on the left lane as a free agent. Otherwise, the subject vehicle travels on the right lane as a free agent.

The fourth expression in Equation (9) captures the scenario where the subject vehicle switches to the right lane and joins a platoon. The first term is the cost of the case where this action can be completed successfully, and the second term captures the case where this action fails.

For the other cases, where the vehicle is on the right lane (i.e., $\varphi_l = Ri$), the minimum expected discounted cost has similar formulas as above. Refer to Appendix A for details.
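The recursions in Equations (5), (8), (9), and their right-lane counterparts can be evaluated by backward induction over road pieces. The sketch below is a deliberately simplified illustration, not the paper's calibrated model: the expectation over traffic and platoon-intensity states is dropped, the per-piece costs and success probabilities (`piece_cost`, `P_JOIN_SAME_LANE`, `P_JOIN_NEW_LANE`, `P_LANE_CHANGE_OK`) are made-up stand-ins for $\Lambda C$, $g_l^0$, $g_l^1$, and $1 - q_\varphi^f$, and the same four actions are assumed available from every state.

```python
# Minimal backward-induction sketch of the Eq. (5)-style recursion.
# States are (lane, platoon) pairs; road pieces are indexed 0..N_PIECES-1.
# All numerical inputs are illustrative stand-ins, not calibrated values.

ALPHA = 0.95          # discount factor (alpha in the paper)
N_PIECES = 5          # road pieces until the destination
LANES = ("Le", "Ri")

def piece_cost(lane, platoon):
    # stand-in per-piece generalized cost: platooning is cheaper
    return 1.0 if platoon else 1.5

P_JOIN_SAME_LANE = 0.6   # stand-in for g0: join a platoon without lane change
P_JOIN_NEW_LANE = 0.4    # stand-in for g1: join a platoon after a lane change
P_LANE_CHANGE_OK = 0.8   # stand-in for 1 - qf: lane change succeeds

def value_iteration():
    # V[i][(lane, platoon)] = min expected discounted cost-to-go from piece i
    V = [dict() for _ in range(N_PIECES + 1)]
    for lane in LANES:                   # terminal condition, as in Eq. (4):
        V[N_PIECES][(lane, 0)] = 0.0     # zero cost at the destination
        V[N_PIECES][(lane, 1)] = 0.0
    for i in reversed(range(N_PIECES)):
        for lane in LANES:
            other = "Ri" if lane == "Le" else "Le"
            stay_free = piece_cost(lane, 0) + ALPHA * V[i + 1][(lane, 0)]
            stay_join = (P_JOIN_SAME_LANE
                         * (piece_cost(lane, 1) + ALPHA * V[i + 1][(lane, 1)])
                         + (1 - P_JOIN_SAME_LANE) * stay_free)
            change_free = (P_LANE_CHANGE_OK
                           * (piece_cost(other, 0) + ALPHA * V[i + 1][(other, 0)])
                           + (1 - P_LANE_CHANGE_OK) * stay_free)
            p_both = P_JOIN_NEW_LANE * P_LANE_CHANGE_OK
            change_join = (p_both
                           * (piece_cost(other, 1) + ALPHA * V[i + 1][(other, 1)])
                           + (1 - p_both) * stay_free)
            for platoon in (0, 1):
                # simplification: a platoon member may always split, so the
                # same four actions are considered from either platoon state
                V[i][(lane, platoon)] = min(stay_free, stay_join,
                                            change_free, change_join)
    return V

V = value_iteration()
```

Under these stand-in numbers, the cost-to-go grows with the number of remaining road pieces and never exceeds the cost of always traveling as a free agent, mirroring the role of the min operator in the paper's recursions.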


4.3. The Optimal Control (OC) Model

The MDP model creates a policy that advises the set of coarse actions the vehicle needs to take in order to complete its trip in the most cost-effective way. However, the MDP model cannot provide a full, implementable trajectory for the subject vehicle that includes its target acceleration profile. As such, the MDP framework utilizes an optimal control (OC) model to bridge this gap. The role of the OC model is two-fold: First, it devises an acceleration profile for the subject vehicle to complete the coarse actions (or determines the infeasibility of completing them) following a quintic trajectory function, subject to collision avoidance and bounds on the vehicle's speed, acceleration, and jerk [23]. The quintic trajectory function is selected due to its ability to provide a smooth trajectory. This function is demonstrated in Equation (10), in which $x(t)$ and $y(t)$ indicate the longitudinal and lateral positions of the vehicle at time $t$, respectively. Coefficients $a_0^i$ through $a_5^i$ and $b_0^i$ through $b_5^i$ are decision variables that determine the optimal solution.

$$
\begin{cases}
x(t) = a_5^i t^5 + a_4^i t^4 + a_3^i t^3 + a_2^i t^2 + a_1^i t + a_0^i \\
y(t) = b_5^i t^5 + b_4^i t^4 + b_3^i t^3 + b_2^i t^2 + b_1^i t + b_0^i
\end{cases}
\tag{10}
$$
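A quintic polynomial such as Equation (10) is pinned down, per axis, by six boundary conditions: position, velocity, and acceleration at the start and end of the maneuver. The sketch below solves for the six coefficients with plain Gaussian elimination; the boundary values in the example (a lane change of 3.5 m lateral displacement over 3.6 s, starting and ending at rest laterally) are illustrative assumptions, not values taken from the paper's OC model.

```python
# Solve for quintic coefficients c0..c5 of c5*t^5 + ... + c1*t + c0
# from position/velocity/acceleration constraints at t = 0 and t = T.

def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def quintic_coefficients(p0, v0, a0, pT, vT, aT, T):
    """Coefficients [c0..c5] matching the boundary state at times 0 and T."""
    rows = [
        [1, 0, 0, 0, 0, 0],                   # x(0)   = p0
        [0, 1, 0, 0, 0, 0],                   # x'(0)  = v0
        [0, 0, 2, 0, 0, 0],                   # x''(0) = a0
        [1, T, T**2, T**3, T**4, T**5],       # x(T)   = pT
        [0, 1, 2*T, 3*T**2, 4*T**3, 5*T**4],  # x'(T)  = vT
        [0, 0, 2, 6*T, 12*T**2, 20*T**3],     # x''(T) = aT
    ]
    return solve_linear(rows, [p0, v0, a0, pT, vT, aT])

def poly_eval(coeffs, t):
    return sum(c * t**k for k, c in enumerate(coeffs))

# Hypothetical lateral profile of a 3.6 s lane change covering 3.5 m,
# with zero lateral speed and acceleration at both endpoints.
lat = quintic_coefficients(0.0, 0.0, 0.0, 3.5, 0.0, 0.0, T=3.6)
```

Because the boundary velocities and accelerations are zero at both ends, the resulting profile is the smooth S-shaped curve that motivates the quintic choice in the text.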

Additionally, the OC model quantifies the short-term cost of completing the coarse actions based on the acceleration profile of the vehicle [23]. More specifically, given the action $a = \{a_l, a_p, a_r\}$, the OC model plans a trajectory that minimizes a convex combination of fuel and time costs, subject to safety and comfort guarantees. The details of the OC model can be found in [23].

For each action $a \in A$, this short-term cost $C_{oc}$ is then combined with the expected long-term cost $V(\mu, \varphi)$ in the MDP framework. The MDP framework enumerates all coarse actions $a \in A$, and selects the action that minimizes the total cost given by the OC and MDP models.
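The coupling described above amounts to a one-line selection rule: for each coarse action, add the short-term OC cost to the discounted MDP cost-to-go and keep the cheapest. In the sketch below, `oc_cost` and `mdp_value` are illustrative stand-ins for $C_{oc}$ and $V(\mu, \varphi)$, with made-up numbers chosen so that platooning is attractive over the rest of the trip.

```python
# Sketch of the action selection that couples the OC and MDP models:
# total cost of a coarse action = short-term OC cost + MDP cost-to-go.
# Both cost functions below are hypothetical stand-ins.

ACTIONS = [
    {"lane": "Le", "platoon": 0},
    {"lane": "Le", "platoon": 1},
    {"lane": "Ri", "platoon": 0},
    {"lane": "Ri", "platoon": 1},
]

def oc_cost(action):
    # stand-in: lane changes and platoon merges carry small maneuver costs
    return 0.2 * (action["lane"] == "Ri") + 0.1 * action["platoon"]

def mdp_value(action):
    # stand-in cost-to-go: platooning is cheaper over the rest of the trip
    return 4.0 - 1.0 * action["platoon"]

def select_action(actions):
    return min(actions, key=lambda a: oc_cost(a) + mdp_value(a))

best = select_action(ACTIONS)
```

With these stand-in numbers, a cheap short-term maneuver (joining a platoon in the current lane) wins because of its long-term savings, which is exactly the trade-off the framework is designed to capture.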

5. Experiments and Analysis

In this section, we conduct simulations in three experimental settings, namely a circular track, a straight highway, and a small network with route choice. We compare the performance of the local OC model and the MDP framework, in which the OC and MDP models are combined, under different traffic states in all three experimental settings. Our simulations are based on the simulation platform previously built in [23], in which surrounding vehicles follow the Intelligent Driver Model [63]. We consider aerodynamic, rolling, grade, and inertial resistance forces in the fuel cost computation [64], and set the value of time (VoT) to 10 dollars per hour.
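A generalized cost of the kind compared throughout this section can be sketched as follows. The resistance coefficients and fuel price are taken from Table B.1 and the VoT is the 10 dollars per hour stated above, but the additive force model and the zero-fuel treatment of braking are assumptions about how those coefficients combine, not the paper's exact formulation from [64].

```python
# Illustrative generalized (fuel + time) cost of a sampled trajectory.
# Coefficients are from Table B.1; the force model itself is an assumption.

GAMMA_AR = 0.3987     # air resistance coefficient
GAMMA_RR = 281.547    # rolling resistance coefficient
GAMMA_GR = 0.0        # grade resistance coefficient (flat road)
GAMMA_IR = 1750.0     # inertial resistance coefficient
ETA_F = 5.98e-8       # dollars per Joule of energy consumed
VOT = 10.0 / 3600.0   # value of time: 10 dollars/hour, in dollars/second

def generalized_cost(speeds, accels, dt):
    """Fuel + time cost over a trajectory sampled every dt seconds."""
    fuel = 0.0
    for v, a in zip(speeds, accels):
        force = GAMMA_AR * v**2 + GAMMA_RR + GAMMA_GR + GAMMA_IR * a
        power = max(force * v, 0.0)   # assume no fuel credit when braking
        fuel += ETA_F * power * dt
    time_cost = VOT * len(speeds) * dt
    return fuel + time_cost

# Example: a constant 20 m/s cruise for 60 seconds, sampled at 0.1 s.
cost = generalized_cost([20.0] * 600, [0.0] * 600, dt=0.1)
```

At this cruise speed the time component dominates the fuel component, which is consistent with why the VoT weighting matters in the experiments.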

5.1. Model Calibration

In a future connected and automated vehicle system, parameters of the MDP framework can be calibrated using historical data. Note that even when abundant CAV data becomes available, it could still be difficult to fully and precisely represent every single driving scenario, due to the complexity of human behaviour, the non-linearity of interactions between vehicles, and the dynamic nature of the transportation network. Therefore, a more practical approach would be to use historical data to partition $\xi_{tr}^{Le}$, $\xi_{tr}^{Ri}$, $\xi_p^{Le}$, and $\xi_p^{Ri}$ into different clusters, representing different traffic states and platoon intensities in the left and right lanes, respectively. The transition probabilities can then be estimated using the maximum likelihood principle, based on the occurrence percentages of the corresponding state transitions in historical records. Furthermore, once data is available, we can use it in a maximum likelihood estimation framework to calibrate the functions $q_\varphi^f(\mu)$, $g_l^1(\xi_p)$, $g_l^0(\xi_p)$, and $w(k)$.
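The maximum likelihood estimate described above reduces to counting: the estimated probability of transitioning from state $i$ to state $j$ is the observed fraction of visits to $i$ that were followed by $j$. The sketch below illustrates this with a made-up sequence of lane-level traffic states standing in for real historical records.

```python
# MLE of a traffic-state transition matrix from an observed state sequence:
# P[i][j] = (# of i -> j transitions) / (# of visits to i).
# The `history` sequence is a made-up stand-in for historical data.

from collections import Counter

STATES = ("free-flow", "onset-of-congestion", "congested")

def estimate_transition_matrix(history):
    counts = Counter(zip(history, history[1:]))   # (from, to) pair counts
    totals = Counter(history[:-1])                # visits to each 'from' state
    return {s: {t: counts[(s, t)] / totals[s] if totals[s] else 0.0
                for t in STATES}
            for s in STATES}

history = ["free-flow", "free-flow", "onset-of-congestion", "congested",
           "congested", "onset-of-congestion", "free-flow", "free-flow"]
P = estimate_transition_matrix(history)
```

Each row of the estimated matrix sums to one, and with more observations the estimates converge to the occurrence percentages the text refers to.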

For the current study, since historical data does not exist, we use simulations to create CAV driving scenarios, and treat observations within simulations as historical data. We conduct simulations using the OC model proposed in [23], in which a mixed stream of CAVs and legacy vehicles travels together. The parameter values used in these simulations are specified in Appendix B. After a warm-up period of about 20 minutes, we estimate the required parameters for this study using the maximum likelihood principle.

In this work, we assume that only the subject vehicle adopts the MDP framework, and thus the actions taken by a single vehicle do not change the macroscopic traffic state of the system. If the penetration rate of vehicles that adopt the MDP framework is high, actions taken by these vehicles could change the state. In this case, model parameters and the optimal MDP policy can be updated periodically to capture such changes.

5.2. A Circular Track Scenario

A circular track is a useful experimental setting, as it can demonstrate the impact of the proposed methodology not only on the generalized cost of a trip, but also on the properties of traffic wave propagation [65, 66]. Stern et al. [9] demonstrate, using a circular track, that a low penetration of autonomous vehicles can effectively dampen stop-and-go waves. Here, we conduct our simulations in a circular track, where the surrounding vehicles can merge into platoons, but cannot change lanes, enter through on-ramps, or exit from off-ramps. In these simulations, the subject vehicle has a trip of 10.8 kilometers in length, and different traffic states (e.g., free-flow, onset-of-congestion and congested) are generated similar to [23], by utilizing a fundamental diagram of traffic flow.

In the figures presented in this paper, OC and MDP refer to the local optimal controller and the MDP framework (also referred to as the MDP controller), respectively. The suffix xK indicates that the circle length is x kilometers. The suffixes low, medium and high represent the penetration of platoon-enabled vehicles. Specifically, low indicates that all surrounding vehicles are non-platoon-enabled, medium indicates that a subset (about 30%) of surrounding vehicles are platoon-enabled, and high indicates that all surrounding vehicles are platoon-enabled.

Figure 3 shows the generalized cost incurred by the subject vehicle under the OC and MDP controllers when the circular track is 2, 5 and 10 km in perimeter, respectively. This figure indicates that the circle perimeter does not significantly affect the subject vehicle's generalized cost. In the free-flow and onset-of-congestion states, the MDP controller provides statistically significant (at the 5% significance level) lower costs. In the congested traffic state, no statistically significant difference in cost is observed between the MDP and OC controllers, although the variance of cost is lower under the MDP controller.
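Two-sample comparisons like the ones above can be carried out with a standard test statistic. The sketch below computes a Welch t-statistic for unequal variances; the cost samples are made up for illustration, and the paper's actual significance tests may differ in detail (e.g., in the degrees-of-freedom correction and p-value computation, which are omitted here).

```python
# Minimal Welch t-statistic for comparing mean generalized costs of two
# controllers. The samples below are hypothetical, for illustration only.

import math

def welch_t(sample_a, sample_b):
    na, nb = len(sample_a), len(sample_b)
    ma = sum(sample_a) / na
    mb = sum(sample_b) / nb
    va = sum((x - ma) ** 2 for x in sample_a) / (na - 1)  # sample variances
    vb = sum((x - mb) ** 2 for x in sample_b) / (nb - 1)
    return (ma - mb) / math.sqrt(va / na + vb / nb)

oc_costs = [2.10, 2.25, 2.05, 2.30, 2.15]   # hypothetical OC trip costs
mdp_costs = [1.80, 1.95, 1.85, 1.90, 1.75]  # hypothetical MDP trip costs
t_stat = welch_t(oc_costs, mdp_costs)
```

A t-statistic well above the ~2 critical value, as in this toy example, would indicate a difference significant at the 5% level.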

Figure 3: The simulation environment is a circular track. The top, middle and bottom sub-figures represent the free-flow, onset-of-congestion, and congested traffic states, respectively. The vertical axes show the generalized costs with VoT set to 10 dollars per hour. Along the horizontal axes, the generalized costs of the subject vehicle under different controllers in circular tracks of different lengths are compared. Here 'OC' and 'MDP' denote the local optimal and the MDP controllers, respectively. The suffix 'xK' indicates that the length of the circular track is x kilometers.

Figure 4 shows the generalized cost for the subject vehicle under different controllers and a track perimeter of 5 km, as the penetration rate of platoon-enabled vehicles in the surrounding traffic changes. In the free-flow state, it is only under a high penetration rate that the MDP controller results in a significantly smaller cost compared with the OC controller, and there is no significant difference when the penetration rate is low or medium. In the onset-of-congestion traffic state, the MDP controller has significantly smaller costs than the OC controller at all penetration rates. In congested traffic, the MDP and OC controllers do not differ in a statistically significant manner, although the generalized cost is much lower under a high penetration rate of platoon-enabled vehicles. Generally, a higher intensity of platoon-enabled vehicles gives rise to more opportunities for the subject vehicle to join a platoon, thereby resulting in less cost.

resulting in less cost.457

It is hypothesized that the benefits of CAVs can be extended to their surrounding vehicles. To put this hypothesis to the test, we measure the average time cost, fuel cost and generalized cost of 15 vehicles traveling upstream of the subject vehicle. Figures 5 and 6 show the costs incurred by the upstream vehicles under the same settings as in Figures 3 and 4, respectively. In order to observe the performance of the OC and MDP controllers directly, we subtract the costs under the OC controller from those of the MDP controller. Figure 5 shows that, generally speaking, a subject vehicle that travels under the MDP controller can induce statistically significant cost savings for its surrounding vehicles under any traffic state. Figure 6 confirms the same conclusion. This figure also shows that these second-hand benefits are more pronounced when the density of platoon-enabled vehicles is higher.


Figure 4: The suffixes 'low', 'medium' and 'high' represent different levels of intensity of platoon-enabled vehicles in the environment. Specifically, 'low' indicates that all surrounding vehicles are non-platoon-enabled, 'medium' indicates that a proportion (about 30%) of the surrounding vehicles are platoon-enabled, and 'high' indicates that all surrounding vehicles are platoon-enabled. Other settings are the same as in Figure 3.

Figure 5: Differences in the average time, fuel, and generalized costs of the vehicles upstream of the subject vehicle for different track lengths. A positive value indicates that MDP results in a higher cost than OC, while a negative value indicates that MDP brings more cost savings than OC. The simulation settings are similar to those in Figure 3.


Figure 6: Differences in the average time, fuel, and generalized costs of the vehicles upstream of the subject vehicle under different penetration rates of platoon-enabled vehicles. A positive value indicates that MDP results in a higher cost than OC, while a negative value indicates that MDP brings more cost savings than OC. The simulation settings are similar to those in Figure 4.

Figure 7: The simulation environment is a highway with on- and off-ramps. The value following 'MDP' in the name of the controller specifies the discount factor $\alpha$ in the MDP model. Other settings are similar to those in Figure 3.

5.3. A Two-lane Highway Scenario

In this highway scenario, we adopt the same surrounding environment setting as in [23]. Surrounding vehicles can change lanes, merge onto or exit from the highway, and join or split from a platoon. Figure 7 demonstrates the generalized costs of different controllers, where the number in the controller name is the value of $\alpha$, i.e., the discount factor used in the MDP model. This figure shows that in all traffic states, the larger the discount factor (i.e., the more weight on the expectation of the long-term cost), the smaller the cost for the subject vehicle along the entire trip, which highlights the importance of accounting for the long-term trip cost. Figure 8 shows the generalized cost of the surrounding vehicles. In the free-flow traffic state, the MDP controller results in significantly smaller cost for the surrounding vehicles, and these savings grow as the MDP discount factor increases. However, under the onset-of-congestion and congested traffic states, the OC and MDP controllers do not show significant differences in cost.

Figure 8: The average generalized cost of the surrounding vehicles. The value following 'MDP' in the name of the controller specifies the discount factor $\alpha$ in the MDP model. Other settings are similar to those in Figure 7.

signiﬁcant diﬀerences in cost.480

Figure 9 shows the generalized costs incurred by the subject vehicle and its immediate481

downstream vehicles for an example trip in the onset-of-congestion traﬃc state, as well as482

the lateral position and platoon membership status of the subject vehicle. The top plot483

in this ﬁgure pertains to the trajectory formed by the OC model, and the bottom plot484

demonstrates the trajectory devised by the MDP controller. In the top plot, the subject485

vehicle makes decisions based solely on local information; as such, its trajectory tends to486

closely follow the trajectory of its downstream vehicle. This ﬁgure shows that under the OC487

controller, the subject vehicle changes to the left lane at about 2950 time steps, and then488

returns to its original lane at about 3750 time steps, an indicator of short-sighted decisions.489

The subject vehicle’s platoon membership status also changes frequently starting at about490

4600 time steps. These actions disturb the traﬃc stream and increase the generalized cost491

of the subject vehicle and its surrounding vehicles. In the bottom plot, the subject vehicle492

changes to the left lane at an early time, in which it travels for the rest of its trip. The493

19

Figure 9: The vertical axis shows the generalized cost, with the unit of dollars per 10 km. The horizontal

axis is time, with the unit of 0.1 second. Generalized cost of the subject vehicle and its immediate upstream

vehicles, as well as its lane position and platoon membership status are shown. In the top plot, the subject

vehicle is traveling under the OC controller, while in the bottom plot, the subject vehicle is traveling under

the MDP controller.

subject vehicle also joins platoons twice during its trips, but for longer periods of time. In494

general, the cost of the subject vehicle under the OC controller is much higher than that of495

the MDP controller.496

5.4. A Network-level Scenario with Route Choice

In these experiments, we show the extensibility of the MDP framework in a joint decision-making scenario, in which the framework makes routing, lane-changing, and platoon-merging decisions. In the scenario shown in Figure 10, the subject vehicle has two possible routes to the destination, namely 'Route1' and 'Route2'. Figure 11 shows the results under three scenarios. Under Route1 and Route2, the traveling route is fixed, and the OC model determines the lane-changing and platoon-merging decisions. Under MDP, the MDP framework makes all three sets of decisions. This figure demonstrates that under all traffic states, the MDP model results in statistically significant savings in the generalized cost compared to the OC model with a fixed route.


Figure 10: The subject vehicle has two available routes from the origin (blue point) to the destination (red point). Route 1 has a slightly shorter distance, but it is more congested compared with Route 2.

Figure 11: 'Route1' and 'Route2' refer to scenarios where the subject vehicle takes routes 1 and 2, respectively. In these two scenarios, the OC controller is applied. The 'MDP' label refers to the case where the MDP framework selects the adopted route. Other settings are the same as in Figure 3.

6. Conclusion

In this paper we proposed a motion planning framework for a CAV in a mixed traffic environment. The framework leverages an optimal control model to quantify the short-term cost of a trip and an MDP model to capture its long-term cost. This general framework outputs the target acceleration profile of the vehicle as well as routing, platooning and lane-changing decisions in a dynamic traffic environment. We implemented this motion planning framework in three experimental scenarios, including a highway section with multiple on- and off-ramps, a circular track, and an urban network with route choice, and conducted a comprehensive set of simulations to quantify the long-term benefits the subject vehicle and its surrounding vehicles may experience as a result of incorporating network-level information into the decision-making process. Our experiments indicate that, generally speaking, the MDP framework outperforms a local OC controller in reducing the generalized trip cost. With a higher intensity of platoon-enabled vehicles or a higher weight on the long-term cost (a larger discount factor), the reduction in generalized cost for both the subject vehicle and its upstream vehicles is statistically significant. This significant cost saving, which originates from accounting for network-level conditions, exists in all simulated environments, under various traffic states.

ACKNOWLEDGMENT

The work described in this paper is supported by research grants from the National Science Foundation (CPS-1837245, CPS-1839511, IIS-1724341).

Appendix A. Expected Discounted Cost in Right Lane

For $l_i \neq l_d$, when $\varphi_l = Ri$, $\varphi_p = 0$, and $d = -1$, the minimum expected discounted cost is given by

$$
V(\mu, Ri, 0, -1) = \min_{a \in A}
\begin{cases}
\Lambda C_{Ri,0,-1}^{Ri,0,-1} + U(\mu', Ri, 0, -1) & a_l = Ri,\ a_p = 0 \\
g_l^0(\xi_p)\{\Lambda C_{Ri,0,-1}^{Ri,1,k-1} + W(\mu', Ri, 1, k-1)\} + (1 - g_l^0(\xi_p))\{\Lambda C_{Ri,0,-1}^{Ri,0,-1} + U(\mu', Ri, 0, -1)\} & a_l = Ri,\ a_p = 1 \\
q_\varphi^f(\mu)\{\Lambda C_{Ri,0,-1}^{Ri,0,-1} + U(\mu', Ri, 0, -1)\} + (1 - q_\varphi^f(\mu))\{\Lambda C_{Ri,0,-1}^{Le,0,-1} + U(\mu', Le, 0, -1)\} & a_l = Le,\ a_p = 0 \\
g_l^1(\xi_p)(1 - q_\varphi^f(\mu))\{\Lambda C_{Ri,0,-1}^{Le,1,k-1} + W(\mu', Le, 1, k-1)\} + \big(1 - g_l^1(\xi_p)(1 - q_\varphi^f(\mu))\big)\{\Lambda C_{Ri,0,-1}^{Ri,0,-1} + U(\mu', Ri, 0, -1)\} & a_l = Le,\ a_p = 1
\end{cases}
\tag{A.1}
$$

The explanation for the case where the subject vehicle is a free agent in the right lane is similar to that in the left lane.

For $l_i \neq l_d$, when $\varphi_l = Ri$, $\varphi_p = 1$, $d > 0$, the minimum expected discounted cost is given by

$$
V(\mu, Ri, 1, d) = \min_{a \in A}
\begin{cases}
\Lambda C_{Ri,1,d}^{Ri,0,-1} + U(\mu', Ri, 0, -1) & a_l = Ri,\ a_p = 0 \\
\Lambda C_{Ri,1,d}^{Ri,1,d-1} + U(\mu', Ri, 1, d-1) & a_l = Ri,\ a_p = 1 \\
q_\varphi^f(\mu)\{\Lambda C_{Ri,1,d}^{Ri,1,d-1} + U(\mu', Ri, 1, d-1)\} + (1 - q_\varphi^f(\mu))\{\Lambda C_{Ri,1,d}^{Le,0,-1} + U(\mu', Le, 0, -1)\} & a_l = Le,\ a_p = 0 \\
\big(1 - g_l^1(\xi_p)(1 - q_\varphi^f(\mu))\big)\{\Lambda C_{Ri,1,d}^{Ri,1,d-1} + U(\mu', Ri, 1, d-1)\} + g_l^1(\xi_p)(1 - q_\varphi^f(\mu))\{\Lambda C_{Ri,1,d}^{Le,1,k-1} + W(\mu', Le, 1, k-1)\} & a_l = Le,\ a_p = 1
\end{cases}
\tag{A.2}
$$

The explanation for the case where the subject vehicle is a platoon member in the right lane is similar to that in the left lane.

For $l_i \neq l_d$, when $\varphi_l = Ri$, $\varphi_p = 1$, $d = 0$, the minimum expected discounted cost is given by

$$
V(\mu, Ri, 1, 0) = \min_{a \in A}
\begin{cases}
\Lambda C_{Ri,1,0}^{Ri,0,-1} + U(\mu', Ri, 0, -1) & a_l = Ri,\ a_p = 0 \\
g_l^0(\xi_p)\{\Lambda C_{Ri,1,0}^{Ri,1,k-1} + W(\mu', Ri, 1, k-1)\} + (1 - g_l^0(\xi_p))\{\Lambda C_{Ri,1,0}^{Ri,0,-1} + U(\mu', Ri, 0, -1)\} & a_l = Ri,\ a_p = 1 \\
q_\varphi^f(\mu)\{\Lambda C_{Ri,1,0}^{Ri,0,-1} + U(\mu', Ri, 0, -1)\} + (1 - q_\varphi^f(\mu))\{\Lambda C_{Ri,1,0}^{Le,0,-1} + U(\mu', Le, 0, -1)\} & a_l = Le,\ a_p = 0 \\
g_l^1(\xi_p)(1 - q_\varphi^f(\mu))\{\Lambda C_{Ri,1,0}^{Le,1,k-1} + W(\mu', Le, 1, k-1)\} + \big(1 - g_l^1(\xi_p)(1 - q_\varphi^f(\mu))\big)\{\Lambda C_{Ri,1,0}^{Ri,0,-1} + U(\mu', Ri, 0, -1)\} & a_l = Le,\ a_p = 1
\end{cases}
\tag{A.3}
$$

The explanation for the case where the subject vehicle is in the right lane is similar to the last case in the left lane.

Appendix B. Parameters for Generating Simulations

Appendix C. Sensitivity Analysis over Parameters in the Traffic Environment

To demonstrate the performance of our method under various settings, we conduct a sensitivity analysis over the parameters $p_{on}$, $p_{off}$, $p_{npe}$, $p_{merge}$ and $p_{change}$ in the two-lane highway scenario.

Under univariate analysis, we adjust the value of one parameter at a time while keeping the values of the other parameters unchanged. To maintain a relatively steady traffic environment, i.e., to avoid changes in traffic state, we use the same value for $p_{on}$ and $p_{off}$ to balance the number of vehicles entering and exiting the highway.


Table B.1: Summary of parameters

| Parameter | Value | Definition |
| $t_{upd}$ | 0.4 secs | the updating period of the trajectory of the subject vehicle |
| $p_{on}$ | 0.6 | the probability that a vehicle is interested in joining the freeway from an on-ramp |
| $p_{off}$ | 0.6 | the probability that a vehicle is interested in taking an off-ramp |
| $p_{npe}$ | 0.5 | the probability that a vehicle is not platoon-enabled |
| $p_{merge}$ | 0.6 | the probability that a vehicle intends to merge |
| $p_{change}$ | 0.1 | the probability that a vehicle intends to change lanes |
| $t_p$ | 3.5 secs | the time gap between two successive vehicles not in a platoon |
| $t_g$ | 0.55 secs | the time gap between two successive vehicles in a platoon |
| $t_{lcp}$ | 3.6 secs | the period of time within which the surrounding vehicles complete changing lanes |
| $t_{lc}$ | 5 secs | the minimum time interval between two successive lane changes by two successive vehicles in the same lane |
| $\tau_s$ | 0.4 secs | the reaction time delay in the car-following model |
| $t_{N_{act}}$ | 10 secs | the prediction horizon in the optimal control model |
| $v_m^{le}$ | 20 m/s | the velocity in the left lane at the maximum flow rate |
| $v_m^{ri}$ | 14 m/s | the velocity in the right lane at the maximum flow rate |
| $v_{max}^{le}$ | 30 m/s | maximum velocity in the left lane |
| $v_{max}^{ri}$ | 20 m/s | maximum velocity in the right lane |
| $a_{max}$ | 2 m/s$^2$ | maximum acceleration for the subject vehicle |
| $j_{max}$ | 3.5 m/s$^3$ | maximum jerk for the subject vehicle |
| $d_{cg}$ | 50 m | critical gap to decide whether it is feasible to change lanes |
| $l_{car}$ | 5 m | length of a vehicle |
| $h_{st}$ | 5 m | a vehicle would stop at a headway of this value |
| $a$ | 2 m/s$^2$ | the maximum desired acceleration |
| $b$ | 3 m/s$^2$ | the comfortable deceleration |
| $\gamma_{AR}$ | 0.3987 | coefficient for the air resistance force |
| $\gamma_{RR}$ | 281.547 | coefficient for the rolling resistance force |
| $\gamma_{GR}$ | 0 | coefficient for the grade resistance force |
| $\gamma_{IR}$ | 1750 | coefficient for the inertial resistance force |
| $\eta_f$ | $5.98 \times 10^{-8}$ | fuel cost for a unit of energy consumed by the vehicle (dollars/Joule) |
| $P_{sch}$ | $\{2, 10, 50\}$ | the scheduled splitting position can be in 2, 10 or 50 road pieces |
| $N(\mu_{sch}, \sigma_{sch})$ | $N(2,5)$ left, $N(-1,5)$ right | the normal distribution of the scheduled splitting position in the left and right lanes |


Figures C.1 and C.2 display the generalized costs when $p_{on} = p_{off} = 0.4$ and $p_{on} = p_{off} = 0.8$ for the subject vehicle and the surrounding vehicles, respectively. Figures C.3 and C.4 demonstrate the generalized costs when $p_{npe} = 0.1$ and $p_{npe} = 0.9$, respectively. Figures C.5 and C.6 correspond to the cases where $p_{merge} = 0.4$ and $p_{merge} = 0.8$. Figures C.7 and C.8 show the costs when $p_{change} = 0.05$ and $p_{change} = 0.3$.

Under all these settings, our MDP framework generally results in statistically significant cost savings for the subject vehicle and its surrounding vehicles in the free-flow and onset-of-congestion states, and there is no significant difference in the congested state.

Figure C.1: The value following 'OC' or 'MDP' in the name of the controller specifies the value of $p_{on}$ and $p_{off}$. Other settings are similar to those in Figure 7.


Figure C.2: The value following 'OC' or 'MDP' in the name of the controller specifies the value of $p_{on}$ and $p_{off}$. Other settings are similar to those in Figure 8.

Figure C.3: The value following 'OC' or 'MDP' in the name of the controller specifies the value of $p_{npe}$. Other settings are similar to those in Figure 7.

[1] J. B. Kenney, Dedicated short-range communications (dsrc) standards in the united557

states, Proceedings of the IEEE 99 (2011) 1162–1182.558

[2] Z. Zhang, A. Tafreshian, N. Masoud, Modular transit: Using autonomy and modularity559

26

Figure C.4: The value following ‘OC ’ or ‘MDP ’ in the name of the controller speciﬁes the value of pnpe.

Other settings are similar to those in Figure 8.

Figure C.5: The value following ‘OC ’ or ‘MDP ’ in the name of the controller speciﬁes the value of pmerge.

Other settings are similar to those in Figure 7.

to improve performance in public transportation, Transportation Research Part E:560

Logistics and Transportation Review 141 (2020) 102033.561

[3] M. Abdolmaleki, Y. Yin, N. Masoud, A unifying graph-coloring approach for intersec-562

27

Figure C.6: The value following ‘OC ’ or ‘MDP ’ in the name of the controller speciﬁes the value of pmerge.

Other settings are similar to those in Figure 8.

Figure C.7: The value following ‘OC ’ or ‘MDP ’ in the name of the controller speciﬁes the value of pchange.

Other settings are similar to those in Figure 7.

tion control in a connected and automated vehicle environment, Available at SSRN563

3944348 (2021).564

[4] N. Masoud, R. Jayakrishnan, Autonomous or driver-less vehicles: Implementation565

28

Figure C.8: The value following ‘OC ’ or ‘MDP ’ in the name of the controller speciﬁes the value of pchange.

Other settings are similar to those in Figure 8.

strategies and operational concerns, Transportation research part E: logistics and trans-566

portation review 108 (2017) 179–194.567

[5] F. van Wyk, A. Khojandi, N. Masoud, Optimal switching policy between driving entities in semi-autonomous vehicles, Transportation Research Part C: Emerging Technologies 114 (2020) 517–531.

[6] F. van Wyk, A. Khojandi, N. Masoud, A path towards understanding factors affecting crash severity in autonomous vehicles using current naturalistic driving data, in: Proceedings of SAI Intelligent Systems Conference, Springer, Cham, pp. 106–120.

[7] S. Cui, B. Seibold, R. Stern, D. B. Work, Stabilizing traffic flow via a single autonomous vehicle: Possibilities and limitations, in: 2017 IEEE Intelligent Vehicles Symposium (IV), IEEE, pp. 1336–1341.

[8] M. W. Levin, Congestion-aware system optimal route choice for shared autonomous vehicles, Transportation Research Part C: Emerging Technologies 82 (2017) 229–247.

[9] R. E. Stern, S. Cui, M. L. Delle Monache, R. Bhadani, M. Bunting, M. Churchill, N. Hamilton, H. Pohlmann, F. Wu, B. Piccoli, et al., Dissipation of stop-and-go waves via control of autonomous vehicles: Field experiments, Transportation Research Part C: Emerging Technologies 89 (2018) 205–221.

[10] B. HomChaudhuri, A. Vahidi, P. Pisu, A fuel economic model predictive control strategy for a group of connected vehicles in urban roads, in: 2015 American Control Conference (ACC), IEEE, pp. 2741–2746.

[11] T. Ersal, I. Kolmanovsky, N. Masoud, N. Ozay, J. Scruggs, R. Vasudevan, G. Orosz, Connected and automated road vehicles: state of the art and future challenges, Vehicle System Dynamics 58 (2020) 672–704.

[12] S. E. Shladover, C. Nowakowski, X. Y. Lu, R. Ferlis, Cooperative adaptive cruise control (CACC) definitions and operating concepts, in: TRB Conference.

[13] Z. Wang, G. Wu, M. J. Barth, A review on cooperative adaptive cruise control (CACC) systems: Architectures, controls, and applications (2018).

[14] V. Milanés, S. E. Shladover, Modeling cooperative and autonomous adaptive cruise control dynamic responses using experimental data, Transportation Research Part C: Emerging Technologies 48 (2014) 285–300.

[15] L. Zhang, J. Sun, G. Orosz, Hierarchical design of connected cruise control in the presence of information delays and uncertain vehicle dynamics, IEEE Transactions on Control Systems Technology 26 (2017) 139–150.

[16] G. Orosz, Connected cruise control: modelling, delay effects, and nonlinear behaviour, Vehicle System Dynamics 54 (2016) 1147–1176.

[17] J. Lioris, R. Pedarsani, F. Y. Tascikaraoglu, P. Varaiya, Platoons of connected vehicles can double throughput in urban roads, Transportation Research Part C: Emerging Technologies 77 (2017) 292–305.

[18] S. Maiti, S. Winter, L. Kulik, A conceptualization of vehicle platoons and platoon operations, Transportation Research Part C: Emerging Technologies 80 (2017) 1–19.

[19] Z. Huang, D. Chu, C. Wu, Y. He, Path planning and cooperative control for automated vehicle platoon using hybrid automata, IEEE Transactions on Intelligent Transportation Systems 20 (2018) 959–974.

[20] A. K. Bhoopalam, N. Agatz, R. Zuidwijk, Planning of truck platoons: A literature review and directions for future research, Transportation Research Part B: Methodological 107 (2018) 212–228.

[21] M. Abdolmaleki, M. Shahabi, Y. Yin, N. Masoud, Itinerary planning for cooperative truck platooning, Transportation Research Part B: Methodological 153 (2021) 91–110.

[22] D. González, J. Pérez, V. Milanés, F. Nashashibi, A review of motion planning techniques for automated vehicles, IEEE Transactions on Intelligent Transportation Systems 17 (2015) 1135–1145.

[23] X. Liu, G. Zhao, N. Masoud, Q. Zhu, Trajectory planning for connected and automated vehicles: Cruising, lane changing, and platooning, SAE International Journal of Connected and Automated Vehicles (2021, in press).

[24] Z. Zheng, Recent developments and research needs in modeling lane changing, Transportation Research Part B: Methodological 60 (2014) 16–32.

[25] E. Larsson, G. Sennton, J. Larson, The vehicle platooning problem: Computational complexity and heuristics, Transportation Research Part C: Emerging Technologies 60 (2015) 258–277.

[26] J. Cheng, J. Cheng, M. Zhou, F. Liu, S. Gao, C. Liu, Routing in internet of vehicles: A review, IEEE Transactions on Intelligent Transportation Systems 16 (2015) 2339–2352.

[27] B. Paden, M. Čáp, S. Z. Yong, D. Yershov, E. Frazzoli, A survey of motion planning and control techniques for self-driving urban vehicles, IEEE Transactions on Intelligent Vehicles 1 (2016) 33–55.

[28] L. Claussmann, M. Revilloud, D. Gruyer, S. Glaser, A review of motion planning for highway autonomous driving, IEEE Transactions on Intelligent Transportation Systems (2019).

[29] F. Gritschneder, K. Graichen, K. Dietmayer, Fast trajectory planning for automated vehicles using gradient-based nonlinear model predictive control, in: 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), IEEE, pp. 7369–7374.

[30] C. Katrakazas, M. Quddus, W.-H. Chen, L. Deka, Real-time motion planning methods for autonomous on-road driving: State-of-the-art and future research directions, Transportation Research Part C: Emerging Technologies 60 (2015) 416–442.

[31] X. Zeng, J. Wang, Globally energy-optimal speed planning for road vehicles on a given route, Transportation Research Part C: Emerging Technologies 93 (2018) 148–160.

[32] G. De Nunzio, C. C. De Wit, P. Moulin, D. Di Domenico, Eco-driving in urban traffic networks using traffic signals information, International Journal of Robust and Nonlinear Control 26 (2016) 1307–1324.

[33] H. Rakha, R. K. Kamalanathsharma, Eco-driving at signalized intersections using V2I communication, in: 2011 14th International IEEE Conference on Intelligent Transportation Systems (ITSC), IEEE, pp. 341–346.

[34] B. Li, Y. Zhang, Y. Feng, Y. Zhang, Y. Ge, Z. Shao, Balancing computation speed and quality: A decentralized motion planning method for cooperative lane changes of connected and automated vehicles, IEEE Transactions on Intelligent Vehicles 3 (2018) 340–350.

[35] C. Liu, C.-Y. Lin, M. Tomizuka, The convex feasible set algorithm for real time optimization in motion planning, SIAM Journal on Control and Optimization 56 (2018) 2712–2733.

[36] J. Hardy, M. Campbell, Contingency planning over probabilistic obstacle predictions for autonomous road vehicles, IEEE Transactions on Robotics 29 (2013) 913–929.

[37] E. Galceran, A. G. Cunningham, R. M. Eustice, E. Olson, Multipolicy decision-making for autonomous driving via changepoint-based behavior prediction, in: Robotics: Science and Systems, volume 1.

[38] S. Brechtel, T. Gindele, R. Dillmann, Probabilistic decision-making under uncertainty for autonomous driving using continuous POMDPs, in: 17th International IEEE Conference on Intelligent Transportation Systems (ITSC), IEEE, pp. 392–399.

[39] H. Guo, C. Shen, H. Zhang, H. Chen, R. Jia, Simultaneous trajectory planning and tracking using an MPC method for cyber-physical systems: A case study of obstacle avoidance for an intelligent vehicle, IEEE Transactions on Industrial Informatics 14 (2018) 4273–4283.

[40] X. Li, Z. Sun, D. Cao, D. Liu, H. He, Development of a new integrated local trajectory planning and tracking control framework for autonomous ground vehicles, Mechanical Systems and Signal Processing 87 (2017) 118–137.

[41] C. Alia, T. Gilles, T. Reine, C. Ali, Local trajectory planning and tracking of autonomous vehicles, using clothoid tentacles method, in: 2015 IEEE Intelligent Vehicles Symposium (IV), IEEE, pp. 674–679.

[42] J. Zhou, H. Zheng, J. Wang, Y. Wang, B. Zhang, Q. Shao, Multi-objective optimization of lane-changing strategy for intelligent vehicles in complex driving environments, IEEE Transactions on Vehicular Technology (2019).

[43] T. Bandyopadhyay, K. S. Won, E. Frazzoli, D. Hsu, W. S. Lee, D. Rus, Intention-aware motion planning, in: Algorithmic Foundations of Robotics X, Springer, 2013, pp. 475–491.

[44] H. A. Rakha, K. Ahn, K. Moran, Integration framework for modeling eco-routing strategies: Logic and preliminary results, International Journal of Transportation Science and Technology 1 (2012) 259–274.

[45] E. Paikari, L. Kattan, S. Tahmasseby, B. H. Far, Modeling and simulation of advisory speed and re-routing strategies in connected vehicles systems for crash risk and travel time reduction, in: 2013 26th IEEE Canadian Conference on Electrical and Computer Engineering (CCECE), IEEE, pp. 1–4.

[46] K. Ahn, H. A. Rakha, Network-wide impacts of eco-routing strategies: a large-scale case study, Transportation Research Part D: Transport and Environment 25 (2013) 119–130.

[47] K. Boriboonsomsin, M. J. Barth, W. Zhu, A. Vu, Eco-routing navigation system based on multisource historical and real-time traffic information, IEEE Transactions on Intelligent Transportation Systems 13 (2012) 1694–1704.

[48] A. Duret, M. Wang, A. Ladino, A hierarchical approach for splitting truck platoons near network discontinuities, Transportation Research Part B: Methodological (2019).

[49] Y. Wei, C. Avcı, J. Liu, B. Belezamo, N. Aydın, P. T. Li, X. Zhou, Dynamic programming-based multi-vehicle longitudinal trajectory optimization with simplified car following models, Transportation Research Part B: Methodological 106 (2017) 102–129.

[50] X. Qian, A. De La Fortelle, F. Moutarde, A hierarchical model predictive control framework for on-road formation control of autonomous vehicles, in: 2016 IEEE Intelligent Vehicles Symposium (IV), IEEE, pp. 376–381.

[51] M. Neunert, C. De Crousaz, F. Furrer, M. Kamel, F. Farshidian, R. Siegwart, J. Buchli, Fast nonlinear model predictive control for unified trajectory optimization and tracking, in: 2016 IEEE International Conference on Robotics and Automation (ICRA), IEEE, pp. 1398–1404.

[52] C. Huang, F. Naghdy, H. Du, Model predictive control-based lane change control system for an autonomous vehicle, in: 2016 IEEE Region 10 Conference (TENCON), IEEE, pp. 3349–3354.

[53] X. Qian, F. Altché, P. Bender, C. Stiller, A. de La Fortelle, Optimal trajectory planning for autonomous driving integrating logical constraints: An MIQP perspective, in: 2016 IEEE 19th International Conference on Intelligent Transportation Systems (ITSC), IEEE, pp. 205–210.

[54] K. Huang, X. Yang, Y. Lu, C. C. Mi, P. Kondlapudi, Ecological driving system for connected/automated vehicles using a two-stage control hierarchy, IEEE Transactions on Intelligent Transportation Systems 19 (2018) 2373–2384.

[55] G. Guo, Q. Wang, Fuel-efficient en route speed planning and tracking control of truck platoons, IEEE Transactions on Intelligent Transportation Systems 20 (2018) 3091–3103.

[56] S. Brechtel, T. Gindele, R. Dillmann, Probabilistic MDP-behavior planning for cars, in: 2011 14th International IEEE Conference on Intelligent Transportation Systems (ITSC), IEEE, pp. 1537–1542.

[57] R. Bellman, A Markovian decision process, Journal of Mathematics and Mechanics (1957) 679–684.

[58] R. S. Sutton, A. G. Barto, Reinforcement Learning: An Introduction, MIT Press, 2018.

[59] H. Mouhagir, R. Talj, V. Cherfaoui, F. Aioun, F. Guillemard, Integrating safety distances with trajectory planning by modifying the occupancy grid for autonomous vehicle navigation, in: 2016 IEEE 19th International Conference on Intelligent Transportation Systems (ITSC), IEEE, pp. 1114–1119.

[60] M. Kamrani, A. R. Srinivasan, S. Chakraborty, A. J. Khattak, Applying Markov decision process to understand driving decisions using basic safety messages data, Transportation Research Part C: Emerging Technologies 115 (2020) 102642.

[61] C. You, J. Lu, D. Filev, P. Tsiotras, Highway traffic modeling and decision making for autonomous vehicle using reinforcement learning, in: 2018 IEEE Intelligent Vehicles Symposium (IV), IEEE, pp. 1227–1232.

[62] S. Zhou, Y. Wang, M. Zheng, M. Tomizuka, A hierarchical planning and control framework for structured highway driving, IFAC-PapersOnLine 50 (2017) 9101–9107.

[63] I. G. Jin, G. Orosz, Optimal control of connected vehicle systems with communication delay and driver reaction time, IEEE Transactions on Intelligent Transportation Systems 18 (2016) 2056–2070.

[64] T. D. Gillespie, Fundamentals of Vehicle Dynamics, SAE Technical Paper, 1992.

[65] Y. Sugiyama, M. Fukui, M. Kikuchi, K. Hasebe, A. Nakayama, K. Nishinari, S.-i. Tadaki, S. Yukawa, Traffic jams without bottlenecks—experimental evidence for the physical mechanism of the formation of a jam, New Journal of Physics 10 (2008) 033001.

[66] S.-i. Tadaki, M. Kikuchi, M. Fukui, A. Nakayama, K. Nishinari, A. Shibata, Y. Sugiyama, T. Yosida, S. Yukawa, Phase transition in traffic jam experiment on a circuit, New Journal of Physics 15 (2013) 103034.