
A Markov Decision Process Framework to Incorporate Network-Level Data in Motion Planning for Connected and Automated Vehicles

Xiangguo Liu, Neda Masoud, Qi Zhu, Anahita Khojandi

Abstract

Autonomy and connectivity are expected to enhance safety and improve fuel efficiency in transportation systems. While connected vehicle-enabled technologies, such as coordinated cruise control, can improve vehicle motion planning by incorporating information beyond the line of sight of vehicles, their benefits are limited by current short-sighted planning strategies that only utilize local information. In this paper, we propose a framework that devises vehicle trajectories by coupling a locally-optimal motion planner with a Markov decision process (MDP) model that can capture network-level information. Our proposed framework can guarantee safety while minimizing a trip's generalized cost, which comprises its fuel and time costs. To showcase the benefits of incorporating network-level data when devising vehicle trajectories, we conduct a comprehensive simulation study in three experimental settings, namely a circular track, a highway with on- and off-ramps, and a small urban network. The simulation results indicate that statistically significant efficiency gains can be obtained for the subject vehicle and its surrounding vehicles in different traffic states under all experimental settings. This paper serves as a proof-of-concept to showcase how connectivity and autonomy can be leveraged to incorporate network-level information into motion planning.

Keywords: Connected and Automated Vehicles, Trajectory planning

1. Introduction

Connected vehicle (CV) technology facilitates communication among vehicles, their surrounding infrastructure, and other road users. This connectivity is enabled through Dedicated Short Range Communication (DSRC) [1] or cellular technologies, and paints a more comprehensive picture of the transportation network than what could be observed by each individual road user. As such, it is expected that upon deployment, CV technology would significantly improve mobility [2, 3, 4], enhance safety [5, 6] and traffic flow stability [7], reduce congestion [8, 9], and improve fuel economy [10], among other benefits [11]. CV technology has enabled several advanced driving assistance systems (ADAS), such as Cooperative Adaptive Cruise Control (CACC) [12, 13, 14], Connected Cruise Control (CCC) [15, 16] and platooning [17, 18, 19, 20, 21]. Although existing CV-enabled technologies are based on local communications, CV technology can also provide granular data at the network level by strategically positioning road side units (RSUs) to ensure connectivity throughout an entire network.

Preprint submitted to Transportation Research Part C: Emerging Technologies January 7, 2022

Motion planning in transportation networks has traditionally been carried out using techniques that leverage local information to make locally-optimal decisions [22]. In particular, optimal control-based models have been widely applied to traditional transportation networks for their ability to provide short-term efficient solutions. CV technology can help improve these locally-optimal motion planners, as it allows vehicles to see beyond their line of sight. More importantly, it enables vehicles to obtain network-level information through communication with other connected vehicles and RSUs. Such connectivity can be leveraged to enhance the long-term safety and efficiency of planned trajectories; however, for this potential to be realized, the network-level information must be integrated into decision-making systems. This cannot be accomplished using existing techniques, as they do not scale to utilize granular data collected from the entire network. Hence, new methods need to be developed that can (i) leverage network-level data, and (ii) provide fast and efficient trajectories that adapt to the stochasticity of traffic networks.

This paper introduces a general framework that combines high-level network-level information with granular local information to devise network-informed cruising, routing, lane-changing, and platoon-merging decisions for a CAV in a mixed traffic scenario, as shown in Figure 1. As demonstrated in this figure, the proposed framework combines an optimal control (OC) trajectory planning model proposed in [23] with a Markov decision process (MDP) model developed in this paper to devise an efficient trajectory for an entire trip. The proposed MDP model can capture the progression of traffic as a stochastic process at an aggregate level, thereby complementing the optimal-control-based motion planning model through incorporating network-level information. In this context, using the proposed MDP framework allows vehicles to skip near-sighted locally-optimal trajectories [23], and make routing, lane-changing, and platoon-merging decisions with a long-term view so as to minimize a combination of short-term and long-term costs.

2. Related Works

2.1. Motion Planning

Motion planning for automated vehicles has been an active research topic [22, 24, 25, 26]. With the advancement of communication, computation and sensing technologies, various planning and control techniques have been proposed, developed, and applied in complex traffic environments. Paden et al. [27] reviewed planning and control techniques in an urban environment, Claussmann et al. [28] reviewed motion planning techniques for highway driving, Gritschneder et al. [29] and Katrakazas et al. [30] emphasized the real-time performance of planning techniques, and Zeng and Wang [31], De Nunzio et al. [32], and Rakha and Kamalanathsharma [33] focused on the efficiency of the proposed methods. Li et al. [34] and Liu et al. [35] attempted to balance computational performance and solution quality. Zhang et al. [15] and Orosz [16] considered communication delay and reaction time in designing motion planners, and Hardy and Campbell [36], Galceran et al. [37], and Brechtel et al. [38] addressed the uncertainty in the driving behaviour of vehicles surrounding autonomous vehicles.

The motivation of this large body of research on motion planning has been to improve safety and comfort as well as reduce travel time and fuel consumption. Safety and collision avoidance have been discussed in many studies [39, 40, 41, 42], some of which have


[Figure 1 diagram: at each update period within a road piece, and for each action, the OC model computes in real time the short-term cost of the optimal trajectory over the remainder of the road piece from granular data; the expected discounted cost of the trip from the current road piece to the end of the trip is looked up from a policy table built on coarse network-level data; and the MDP model selects the trajectory with the lowest total cost.]

Figure 1: Structure of the proposed MDP framework. The optimal control (OC) model plans a number of trajectories to determine the short-term cost [23] associated with every higher-layer action a ∈ A, which includes a combination of route choice, lane changing, and platoon merging. The MDP model assesses the long-term cost associated with each higher-layer action a ∈ A. The MDP framework selects the action a ∈ A that provides the minimum expected discounted cost of a trip, which is the sum of the costs estimated by the OC and MDP models.

Figure 2: The upper figure displays a freeway stretch segmented into merge (on-ramp), diverge (off-ramp), and regular road pieces, where the MDP model operates. The lower figure displays a zoomed-in view of a road piece, where the cost of each action (i.e., lane-changing, platoon-merging, and routing) is determined based on local information. Note that the cost of the optimal control model, C_oc, is computed for a given action, which is determined by the starting and ending states, s_i and s_j.

considered the uncertainty of the surrounding vehicles' motion [36, 43]. Besides safety guarantees, efficiency, manifested in the form of reducing travel time [44, 45] and fuel consumption [44, 46, 47] or increasing traffic flow [48, 49], has been one of the driving objectives in developing motion planners. Despite the proven short-term capability of the proposed methods to increase efficiency, they cannot guarantee long-term efficiency due to the limited planning horizon they capture.

Several attempts have been made in the literature to devise trajectories that account for traffic beyond the local neighbourhood of the subject vehicle. One such approach is hierarchical design, which is sometimes referred to as the combination of trajectory planning and tracking [39, 40, 41, 50, 51, 52], and sometimes as the combination of long- and short-horizon planning. To avoid confusion, in this study, we use the term hierarchical design to denote long- and short-horizon planning, where higher- and lower-layer decisions are made, respectively.

There have been several attempts in the literature to conduct longer-horizon planning using hierarchical design, under specific assumptions. Zeng and Wang [31] proposed a dynamic programming algorithm under the assumption that the speed profile of the subject vehicle's immediate leading vehicle is fixed and known. Similarly, Qian et al. [53] assumed the surrounding vehicles' future motions to be given. Studies that assume the surrounding traffic environment to be fixed and known can compute the optimal speed profile of the subject vehicle and have the subject vehicle follow this profile [54, 55, 31]. However, due to the assumptions on the motion profiles of the surrounding vehicles, these higher-layer plans are not guaranteed to be well-executed or feasible for lower-layer planners to navigate. Because the lower-layer planners need to ensure safety and comfort and follow traffic rules, they sometimes cannot follow the suggested speed or the planned route, e.g., due to not finding an opportunity to change lanes. On the other hand, the hierarchical layered design cannot simply be replaced with a one-time optimization problem that makes both higher- and lower-layer decisions, due to its high computational complexity [56]. In this paper, we aim to bridge the gap between hierarchical but non-efficient trajectory planning and optimal but computationally-complex planning by establishing a feedback loop between higher- and lower-layer decisions in hierarchical schemes. In our proposed method, while the lower-layer planner attempts to follow the plan provided by the higher-layer planner, the higher-layer plan can also be adjusted according to the real-time execution status in the lower layer.

In addition to the possibility that the higher-layer plan may not be executable, the plans at the lower layer, the higher layer, or both layers may be outdated at the time of execution in a fast-changing traffic environment. To combat outdated decisions, Paikari et al. [45] and Boriboonsomsin et al. [47] proposed to update the higher-layer plan, while Guo et al. [39], Li et al. [40], and Alia et al. [41] considered updating the lower-layer plan. Huang et al. [54] utilized a genetic algorithm for higher-layer planning, and a quadratic program for lower-layer adaptation, where plans on both layers are updated periodically. The two layers of decision making in our proposed hierarchical design are also closely coupled, as the lower-layer plan is devised based on higher-layer decisions, and the higher-layer plan can also be adapted based on the lower-layer execution status. Moreover, the higher-layer plan in our work is updated not only based on the real-time state of the downstream traffic, but also based on the network-level evolution of traffic. Additionally, our framework is more comprehensive, as it includes decision making for routing, lane-changing, platooning, and cruising.

2.2. Markov Decision Processes in Transportation

A Markov decision process (MDP) is a stochastic control process that is used extensively in many fields, including transportation, robotics and economics. MDPs can model the interaction between agents and a stochastic environment. The goal of an MDP model is to find a policy that maximizes the total expected cumulative reward in a stochastic environment [57, 58].

In the transportation field, MDPs have been utilized to plan local trajectories by modeling the uncertainty of driver behavior [43, 59]. MDPs and their variant, partially observable Markov decision processes (POMDPs), have also been applied to vehicle behavior analysis and prediction [60, 37, 38] and driving entity switching policies [5]. Brechtel et al. [56] proposed an MDP-based motion planning model to devise a vehicle's target position and velocity. The authors identified the scalability of their proposed method with respect to the number of vehicles as an open problem. To tackle the computational complexity of the problem, the authors adopted a fixed discretization of the action space to formulate the problem, which could render their methodology inefficient. You et al. [61] designed a reward function for an MDP with the objective of obtaining expert-like driving behavior. This model determines the velocity of the subject vehicle and whether the vehicle should change lanes, considering the relative position of the subject vehicle and its surrounding vehicles.

The studies above mostly employ MDPs to determine the velocity of the subject vehicle, leaving out higher-layer decisions. A recent work [62] developed a hierarchical framework in which an MDP model was employed to make lane-changing decisions in the higher layer. They introduced three models, namely a trajectory smoother, a longitudinal controller, and a lateral controller, to address the detailed execution in the lower layer. In our work, we further consider the long-term efficiency of a trajectory by extending the MDP model to a more general motion planner, which includes routing, lane-changing, and platoon-merging. In our proposed work, safety and comfort are ensured by the planner in the lower layer, while the MDP model explores the long-term benefits of the planned trajectory by considering the stochastic changes in the downstream traffic environment. We use simulations to demonstrate that our proposed method results in statistically significant reductions in the long-term generalized trip cost.

2.3. Our Contributions

This paper introduces a framework that facilitates making trajectory planning decisions (namely, cruising, lane-changing, platoon-merging, and route choice) based on both local and network-level data. More specifically, our framework makes joint cruising, lane-changing, platoon-merging, and routing decisions to minimize the total expected discounted cost of a (leg of a) trip in a dynamic environment. This is accomplished through two main modules within an MDP framework: (1) an optimal-control-based trajectory planning model that provides the vehicle's acceleration profile with the goal of maximizing safety and comfort locally [23]; and (2) an MDP model that enables incorporating network-level information into the decision making process.

The contributions of this paper are as follows. This work is the first to advance traditional local motion planning models by incorporating a strategically-condensed, high volume of network-level data using a Markov decision process (MDP) modeling framework, hence devising entire efficient trajectories in dynamic traffic streams. In this general framework, cruising, routing, lane-changing, and platoon-merging decisions are made concurrently. We conduct comprehensive simulation experiments to demonstrate the benefits of augmenting traditional trajectory planning models with an MDP model for both the subject vehicle and its surrounding vehicles. We demonstrate that not only does a CV benefit from utilizing network-level information in devising its own trajectory, but its surrounding vehicles, which may be CAVs or legacy vehicles, also experience second-hand cost-reduction benefits. These results could have great policy implications, as they demonstrate that only a handful of CAVs in a traffic stream could serve as traffic regulators.

3. Problem Statement

Consider a CAV, to which we refer as the subject vehicle, that is making a trip from a known origin to a known destination. The subject vehicle is able to directly observe its surroundings using its onboard sensor systems as well as basic safety messages (BSMs) obtained from other vehicles or RSUs within its communication range. Owing to its connectivity, the subject vehicle can also obtain network-level information about the state of traffic. The objective of the subject vehicle is to navigate the network safely and comfortably, while at the same time minimizing its travel cost, which is composed of time cost and energy cost, by utilizing both granular local data and coarse network-level information.

4. Methodology

4.1. The MDP Framework

The proposed framework determines the trajectory of a subject vehicle, including fine-grained decisions (i.e., the acceleration profile) and coarse decisions (i.e., routing, lane changing, and platoon merging). In this framework, fine-grained decisions are made by a local optimal control trajectory planning model using only local information, and coarse decisions are made by an MDP model using network-level information. The MDP framework combines the two models to make a final decision about the trajectory of the subject vehicle: for each coarse action (where a coarse action is a unique vector of route choice, platoon merging, and lane changing), the MDP framework uses the optimal control model to obtain the lowest short-term cost of completing the action, and the MDP model to obtain the long-term expected discounted cost of completing the same action. Finally, the action that provides the lowest total cost is selected and pursued by the vehicle. This framework is demonstrated in Figure 1.
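The action-selection step described above can be sketched as follows. Here `oc_short_term_cost` and `policy_table` are hypothetical stand-ins for the optimal control planner of [23] and the precomputed MDP policy look-up table; this is an illustrative sketch, not the paper's implementation.

```python
def select_action(state, actions, oc_short_term_cost, policy_table):
    """Pick the coarse action (lane, platoon, route) with the lowest
    short-term OC cost plus expected discounted cost-to-go from the table."""
    best_action, best_cost = None, float("inf")
    for a in actions:
        cost = oc_short_term_cost(state, a) + policy_table[(state, a)]
        if cost < best_cost:
            best_action, best_cost = a, cost
    return best_action

# Toy usage with two coarse actions: short-term costs favor the first action,
# but the long-term table makes the second action cheaper overall.
actions = [("Le", 0), ("Ri", 1)]
table = {("s0", ("Le", 0)): 5.0, ("s0", ("Ri", 1)): 3.5}
oc = lambda s, a: 1.0 if a == ("Le", 0) else 2.0
print(select_action("s0", actions, oc, table))  # → ('Ri', 1)
```

The design point is that the two cost estimates are simply summed per action, so the OC planner and the MDP table can be developed and updated independently.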

An example network is displayed in Figure 2, where the subject vehicle is located in the right lane, planning to take the off-ramp marked by an arrow. The general travel cost incurred by the vehicle is a linear combination of the route travel time and fuel cost. To optimize its trajectory, in addition to determining the exact position, speed, and acceleration of the subject vehicle at each point in time, we need to make three sets of higher-level decisions with long-term implications: whether (and where) to change lanes, whether to join (or split from) a platoon, and which route to take.

Each action can have conflicting implications in terms of energy efficiency, travel time, and passenger safety. For example, the vehicle would be able to travel at a higher speed in the left lane, but may have more opportunities to join a platoon and increase its fuel economy in the right lane. The trade-offs between these actions can be captured by an optimal-control-based trajectory planning model that uses local information (i.e., the speed and availability of platoons in both lanes). As another example, while joining a platoon would provide fuel efficiency, changing platoon membership frequently could pose safety risks to the vehicle occupants and create instability in the traffic stream. This example highlights the importance of not making decisions based solely on minimizing short-term vehicle-specific costs, and instead taking a longer-term view of the cost, which requires incorporating network-level information into the decision making process. As such, the proposed MDP framework is designed to capture the expected long-term cost of each action, allowing the vehicle to make informed decisions based on both local and network-level information.

In order to model the system with a view on facilitating the incorporation of both granular and network-level information, we make a number of assumptions. First, we divide the network into a number of relatively large cells, to which we refer as road pieces. Road pieces are constructed such that (i) the macroscopic-level traffic dynamics are homogeneous within each piece at each point in time; and (ii) all vehicles within a road piece are within a reliable communication range of one another. As such, we introduce three types of road pieces, namely, merge (which includes a single on-ramp/road), diverge (which includes a single off-ramp/road), and regular (which does not include any on- or off-ramps). In Figure 2, for example, l_1 is an on-ramp or merge piece, while l_4 and l_5 are regular pieces.

The trajectory planning model is re-optimized dynamically as the immediate neighbourhood of the subject vehicle evolves. This re-optimization occurs after a time period t_upd has lapsed, which is set to 0.4 sec following [23]. The MDP model is solved off-line, and its resulting optimal policies are stored in a look-up policy table that can be accessed at any time. In the rest of this section, we elaborate on the MDP model in subsection 4.2, and provide a brief overview of the optimal control model in subsection 4.3.
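As a rough illustration of how such a policy table could be precomputed off-line, the following value-iteration sketch operates on a toy finite MDP with minimized discounted cost. The states, actions, transition probabilities, and one-step costs below are invented placeholders, not the paper's model.

```python
def solve_policy_table(states, actions, P, C, alpha=0.95, tol=1e-8):
    """Value iteration on a finite MDP with costs (minimization).
    P[s][a] maps next-state -> probability; C[s][a] is the one-step cost."""
    V = {s: 0.0 for s in states}
    while True:
        V_new = {
            s: min(C[s][a] + alpha * sum(p * V[s2] for s2, p in P[s][a].items())
                   for a in actions)
            for s in states
        }
        done = max(abs(V_new[s] - V[s]) for s in states) < tol
        V = V_new
        if done:
            break
    # Store the minimizing action per state in the look-up policy table
    policy = {
        s: min(actions, key=lambda a: C[s][a]
               + alpha * sum(p * V[s2] for s2, p in P[s][a].items()))
        for s in states
    }
    return V, policy

# Two-state toy example: from s0, action "go" is cheaper in the long run.
states = ["s0", "s1"]
actions = ["stay", "go"]
P = {"s0": {"stay": {"s0": 1.0}, "go": {"s1": 1.0}},
     "s1": {"stay": {"s1": 1.0}, "go": {"s1": 1.0}}}
C = {"s0": {"stay": 2.0, "go": 1.0}, "s1": {"stay": 0.0, "go": 0.0}}
V, policy = solve_policy_table(states, actions, P, C)
print(policy["s0"])  # → go
```

Because the table is computed once off-line, the on-line planner only pays a dictionary look-up per coarse action at each update period.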

4.2. The MDP Model

The MDP framework considers three traffic states, namely, free-flow, onset-of-congestion, and congested traffic. The traveling speed of the subject vehicle is determined based on the traffic state of the road piece the vehicle is traversing. When the subject vehicle enters a new road piece l_i, a decision is made as to whether the vehicle should change lanes and whether to join a platoon. It is assumed that the vehicle can finish the lane changing and platoon merging processes within the same road piece l_i. If there is more than one road piece following l_i, the subject vehicle also has to make a route choice decision by selecting one of the candidate road pieces, l'_i ∈ S_l(l_i), where S_l(l_i) = {l'_{i1}, l'_{i2}, ...} is the set of road pieces connected to l_i, and therefore depends on the network structure.

Table 1: Table of notation

Notation | Definition
Le | Left lane
Ri | Right lane
l_i | Road piece i
l | A generic road piece
l'_{ij} | The jth road piece directly connected to road piece l_i
S_l(l_i) = {l'_{i1}, l'_{i2}, ...} | Set of road pieces directly connected to road piece l_i
l'_i ∈ S_l(l_i) | The selected road piece among the set of road pieces connected to l_i
L = {l_i} | Set of road pieces
l_o | The road piece at the origin of the trip
l_d | The road piece at the destination of the trip
ξ_tr^{Le} | Macroscopic state of traffic in the left lane
ξ_tr^{Ri} | Macroscopic state of traffic in the right lane
ξ_tr = [ξ_tr^{Le}, ξ_tr^{Ri}] | Vector specifying the macroscopic state of traffic
ξ_p^{Le} | Percentage of platoon-enabled vehicles in the left lane
ξ_p^{Ri} | Percentage of platoon-enabled vehicles in the right lane
ξ_p = [ξ_p^{Le}, ξ_p^{Ri}] | Vector specifying the percentage of platoon-enabled vehicles
μ = [l_i, ξ_tr, ξ_p] | The environment state vector
φ_l ∈ {Le, Ri} | The lateral position of the subject vehicle
φ_p ∈ {0, 1} | Platoon membership status of the vehicle
φ = [φ_l, φ_p, d] | The vehicle state vector
d | The number of road pieces to the scheduled splitting of the platoon, where d = −1 if the subject vehicle is not in a platoon
s = (μ, φ) ∈ S | State of the traffic dynamics process
S = {s} | Set of all possible states of the traffic dynamics process
c_f(s) | The fuel cost of the subject vehicle at state s
c_t(s) | The time cost of the subject vehicle at state s
c_di(s_1, s_2) | Passenger discomfort/safety risk cost for a vehicle transitioning from state s_1 to s_2
N_lc | Number of lane changes
Λ = [λ_f, λ_t, λ_di] | Vector of cost component coefficients, containing elements for fuel, time, and discomfort/safety
C(s_1, s_2) = [(c_f(s_1) + c_f(s_2))/2, (c_t(s_1) + c_t(s_2))/2, c_di(s_1, s_2)]^T | Cost vector for a vehicle transitioning from state s_1 to s_2
C_{s_1}^{s_2} = ΛC(s_1, s_2) | Sum of fuel, time, and comfort/safety costs
V([l, ξ_tr, ξ_p], [φ_l, φ_p, d]) | The minimum total expected discounted cost-to-go starting from state s = (μ, φ)
c_fl | Cost of missing the trip destination

Probability distributions
q_φ^f(μ) | Probability that the subject vehicle fails to change lanes if such a decision has been made
g_l^1(ξ_p) | Probability of successful platoon merging with lane changing
g_l^0(ξ_p) | Probability of successful platoon merging without lane changing
w(k) | Probability distribution for the number of road pieces, k, for which the subject vehicle can stay with a platoon it has met

Transition matrices
p_l^{Le}((ξ_tr^{Le})'|ξ_tr, ξ_p) | Probability that the traffic state transitions to (ξ_tr^{Le})' in the left lane, given ξ_tr and ξ_p
p_l^{Ri}((ξ_tr^{Ri})'|ξ_tr, ξ_p) | Probability that the traffic state transitions to (ξ_tr^{Ri})' in the right lane, given ξ_tr and ξ_p
h_l^{Le}((ξ_p^{Le})'|ξ_tr, ξ_p) | Probability that the platoon intensity transitions to (ξ_p^{Le})' in the left lane, given ξ_tr and ξ_p
h_l^{Ri}((ξ_p^{Ri})'|ξ_tr, ξ_p) | Probability that the platoon intensity transitions to (ξ_p^{Ri})' in the right lane, given ξ_tr and ξ_p

Actions
a_l ∈ {Le, Ri} | Target lane
a_p ∈ {0, 1} | Target platoon membership
a_r ∈ S_l(l_i) | Target route
a = [a_l, a_p, a_r] | The action taken by the subject vehicle
A = {a} | Action set

Let s = (μ, φ) ∈ S denote the state of the traffic dynamics process, where S is the set of all possible states s. Vector μ = [l_i, ξ_tr, ξ_p] in this process denotes the location-dependent environment state, where l_i ∈ L, and L includes the locations of the origin and destination of the trip (leg), denoted by l_o and l_d, respectively, and all other road pieces on all possible paths that connect the origin to the destination. The vector ξ_tr = [ξ_tr^{Le}, ξ_tr^{Ri}] denotes the macroscopic state of traffic in the left and right lanes, respectively. More specifically, we consider three macroscopic traffic states: free-flow, onset-of-congestion, and congested. Vector ξ_p = [ξ_p^{Le}, ξ_p^{Ri}] denotes the percentage of platoon-enabled vehicles in the left and right lanes, respectively.
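For illustration only, the state s = (μ, φ) defined above could be encoded as in the following sketch; the class and field names are assumptions chosen to mirror the paper's notation, not the authors' code.

```python
from dataclasses import dataclass

# The three macroscopic traffic states considered by the MDP model
TRAFFIC_STATES = ("free_flow", "onset_of_congestion", "congested")

@dataclass(frozen=True)
class EnvironmentState:          # mu = [l_i, xi_tr, xi_p]
    road_piece: str              # l_i
    traffic: tuple               # (xi_tr^Le, xi_tr^Ri), each in TRAFFIC_STATES
    platoon_pct: tuple           # (xi_p^Le, xi_p^Ri), percentages per lane

@dataclass(frozen=True)
class VehicleState:              # phi = [phi_l, phi_p, d]
    lane: str                    # "Le" or "Ri"
    in_platoon: int              # phi_p: 0 or 1
    d: int                       # road pieces until scheduled split; -1 if free

mu = EnvironmentState("l4", ("free_flow", "congested"), (30, 60))
phi = VehicleState("Ri", 1, 2)
s = (mu, phi)                    # the full MDP state
print(s[1].lane)  # → Ri
```

Frozen dataclasses make the states hashable, which is convenient for using s as a key into a policy or value look-up table.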

Let φ = [φ_l, φ_p, d] denote the state of the subject vehicle. Here, φ_l ∈ {Le, Ri} denotes the lateral position of the subject vehicle, where 'Le' and 'Ri' refer to the left and right lanes, respectively. Furthermore, φ_p ∈ {0, 1} is a binary indicator denoting the platoon membership status of the vehicle, where φ_p = 0 indicates that the subject vehicle is not a platoon member and φ_p = 1 indicates otherwise. Let d denote the number of road pieces to the scheduled splitting of the platoon the subject vehicle is a member of. We set d = −1 if the subject vehicle is not in a platoon. We assume that before merging, vehicles that will stay in the same platoon will negotiate and reach consensus on the scheduled splitting position d = k. Vehicles in a platoon moving to the next road piece will have their d decreased by 1. The platoon has to split/dissolve when d = 0. (We assume the subject vehicle can optimize its action periodically and thus actively split before the scheduled splitting position in our current model, but this can be easily modified by disabling the first, third and fourth expressions in Equation (8).)
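The split-counter dynamics of d described above can be illustrated with a toy update rule; this is an assumption-level sketch of the bookkeeping, not the paper's implementation.

```python
def advance_road_piece(in_platoon, d):
    """Update (phi_p, d) as the vehicle moves to the next road piece:
    merging sets d = k elsewhere; here we only decrement and dissolve."""
    if not in_platoon:
        return 0, -1               # free agent: d stays -1
    d -= 1
    if d == 0:                     # scheduled split position reached
        return 0, -1               # platoon dissolves; vehicle is free again
    return 1, d

state = (1, 2)                     # in a platoon, two pieces until the split
state = advance_road_piece(*state) # one piece remaining
state = advance_road_piece(*state) # split position reached, dissolve
print(state)  # → (0, -1)
```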

Let a = [a_l, a_p, a_r] denote the action taken by the subject vehicle at the beginning of each road piece, where a_l ∈ {Le, Ri} denotes the target lane for the subject vehicle, a_p ∈ {0, 1} denotes the target platoon membership, where a_p = 0 indicates that the vehicle stays a free agent and a_p = 1 indicates that the vehicle merges into a platoon, and a_r ∈ S_l(l_i) denotes the path selected by the vehicle.

Let c_f(s) and c_t(s) denote the fuel cost and time cost of the subject vehicle at state s, respectively. See [23] for the computation of the fuel cost, c_f(s). The time cost of a trip (leg) can be computed as the length of the road piece, len_i, over the velocity in lane φ_l under traffic condition ξ_tr, v(ξ_tr, φ_l), i.e.,

c_t(s) = len_i / v(ξ_tr, φ_l)    (1)

Let c_di(s_1, s_2) denote the cost associated with passenger discomfort/safety risk for transitioning from state s_1 to s_2. The passenger discomfort/safety cost is assumed to be realized when the vehicle is changing lanes, and to increase linearly with the number of lane changes. Therefore,

c_di(s_1, s_2) = g(N_lc)    (2)

where g(.) is a linear function and N_lc is the number of lane changes in the current road piece. In one road piece, the subject vehicle is not expected to change lanes more than once, i.e., N_lc ∈ {0, 1}.

Let C_{s_1}^{s_2} denote the sum of all three costs discussed above for a vehicle that starts a road piece in state s_1 and ends it in state s_2. The exact transition position depends on the real-time traffic environment. For simplification, we assume the transition takes place in the middle of a road piece, and therefore C_{s_1}^{s_2} can be formulated as:

C_{s_1}^{s_2} = ΛC(s_1, s_2) = λ_f (c_f(s_1) + c_f(s_2))/2 + λ_t (c_t(s_1) + c_t(s_2))/2 + λ_di c_di(s_1, s_2)    (3)

where the vector Λ = [λ_f, λ_t, λ_di] contains the corresponding coefficients for each cost component, and C(s_1, s_2) = [(c_f(s_1) + c_f(s_2))/2, (c_t(s_1) + c_t(s_2))/2, c_di(s_1, s_2)]^T is the cost vector for a vehicle transitioning from state s_1 to s_2. Note that all costs are functions of our action, where the action is implied from the transition of the state from s_1 to s_2. We assume that Λ can be different for each driver, since different cost terms are of different importance to each driver. The total travel cost C_{s_1}^{s_2} describes the generalized cost of travel in a road piece. For example, the MDP cost for a vehicle that starts a road piece in the left lane as a free agent and ends the road piece in the right lane as a free agent can be denoted by C_{Le,0,−1}^{Ri,0,−1}.
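As a numeric illustration of Equation (3), with Equation (2)'s linear g(N_lc) folded in, the following sketch uses made-up coefficients and cost values; none of the numbers come from the paper.

```python
def generalized_cost(cf1, cf2, ct1, ct2, n_lane_changes,
                     lam=(1.0, 0.5, 2.0), g_slope=0.3):
    """C^{s2}_{s1} = lam_f*(c_f(s1)+c_f(s2))/2 + lam_t*(c_t(s1)+c_t(s2))/2
                   + lam_di * c_di(s1, s2), with c_di = g_slope * N_lc."""
    lam_f, lam_t, lam_di = lam
    c_di = g_slope * n_lane_changes       # linear g(N_lc), Equation (2)
    return (lam_f * (cf1 + cf2) / 2
            + lam_t * (ct1 + ct2) / 2
            + lam_di * c_di)

# One lane change within the road piece (N_lc = 1):
# 1.0*(0.4+0.6)/2 + 0.5*(10+12)/2 + 2.0*0.3 = 0.5 + 5.5 + 0.6
print(generalized_cost(cf1=0.4, cf2=0.6, ct1=10.0, ct2=12.0,
                       n_lane_changes=1))  # → 6.6
```

Averaging the endpoint costs reflects the simplifying assumption that the state transition occurs at the middle of the road piece.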

An important part of the MDP model is the set of transition probability matrices that allow us to model the dynamics of the system. Let p_l^{Le}((ξ_tr^{Le})'|ξ_tr, ξ_p) and p_l^{Ri}((ξ_tr^{Ri})'|ξ_tr, ξ_p) denote the probabilities that, given ξ_tr and ξ_p, the traffic state transitions to (ξ_tr^{Le})' in the left lane and to (ξ_tr^{Ri})' in the right lane in road piece l_i, respectively. Let h_l^{Le}((ξ_p^{Le})'|ξ_tr, ξ_p) and h_l^{Ri}((ξ_p^{Ri})'|ξ_tr, ξ_p) denote the probabilities that, given ξ_tr and ξ_p, the platoon intensity transitions to (ξ_p^{Le})' in the left lane and to (ξ_p^{Ri})' in the right lane, respectively. These transition probability matrices can be learnt from historical data.
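One simple way such matrices could be learnt from historical data, as the text suggests, is by empirical frequency counts of observed transitions. The record format and state labels below are invented for illustration.

```python
from collections import Counter, defaultdict

def estimate_transitions(records):
    """records: iterable of ((xi_tr, xi_p), next_state) observations for one
    lane of one road piece. Returns P[(xi_tr, xi_p)][next_state] as empirical
    frequencies, so each conditional distribution sums to 1."""
    counts = defaultdict(Counter)
    for cond, nxt in records:
        counts[cond][nxt] += 1
    return {cond: {s: n / sum(c.values()) for s, n in c.items()}
            for cond, c in counts.items()}

# Toy historical observations: conditioning state -> observed next state
data = [((("free", "free"), (20, 40)), "free"),
        ((("free", "free"), (20, 40)), "free"),
        ((("free", "free"), (20, 40)), "onset"),
        ((("onset", "cong"), (20, 40)), "cong")]
P = estimate_transitions(data)
print(P[(("free", "free"), (20, 40))]["free"])  # → 0.6666666666666666
```

With sparse data, the raw frequencies could be smoothed (e.g., additive smoothing) before being used in the MDP, though the paper does not prescribe a particular estimator.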

Let q_φ^f(μ) denote the probability that the subject vehicle fails to change lanes if such a decision has been made. Note that q_φ^f is a function of the traffic state in the target lane. Let g_l^1(ξ_p) and g_l^0(ξ_p) denote the probabilities of successful platoon merging with and without lane changing, respectively. Note that g_l^1 is a function of the density of platoon-enabled vehicles in the target lane, and g_l^0 is a function of the availability of platoon-enabled vehicles in the immediate downstream of the subject vehicle in the original lane. Let l'_i and μ' = [l'_i, ξ'_tr, ξ'_p] denote a candidate road piece directly connected to l_i and its corresponding environment state vector, respectively. The problem terminates when the vehicle reaches its destination, i.e., l_i = l_d. Finally, let V([l_i, ξ_tr, ξ_p], [φ_l, φ_p, d]) denote the minimum total expected discounted cost starting with the vehicle state [φ_l, φ_p, d] and the environment state [l_i, ξ_tr, ξ_p]. Hence, for l_i = l_d, the minimum total expected discounted cost is given by

V([l_d, ξ_tr, ξ_p], [φ_l, φ_p, d]) = { 0, if the vehicle is at the correct destination; c_fl, otherwise }    (4)

where c_fl is a cost incurred should the subject vehicle fail to reach its destination (e.g., the vehicle should be a single vehicle in the right lane at the target off-ramp piece).


For $l_i \neq l_d$, when $\varphi_l = Le$, $\varphi_p = 0$, $d = -1$, the minimum expected discounted cost is given by

$$
V(\mu, Le, 0, -1) = \min_{a \in A}
\begin{cases}
\Lambda C_{Le,0,-1}^{Le,0,-1} + U(\mu', Le, 0, -1) & a_l = Le,\ a_p = 0 \\
g_l^0(\xi_p)\{\Lambda C_{Le,0,-1}^{Le,1,k-1} + W(\mu', Le, 1, k-1)\} + (1 - g_l^0(\xi_p))\{\Lambda C_{Le,0,-1}^{Le,0,-1} + U(\mu', Le, 0, -1)\} & a_l = Le,\ a_p = 1 \\
q_\varphi^f(\mu)\{\Lambda C_{Le,0,-1}^{Le,0,-1} + U(\mu', Le, 0, -1)\} + (1 - q_\varphi^f(\mu))\{\Lambda C_{Le,0,-1}^{Ri,0,-1} + U(\mu', Ri, 0, -1)\} & a_l = Ri,\ a_p = 0 \\
g_l^1(\xi_p)(1 - q_\varphi^f(\mu))\{\Lambda C_{Le,0,-1}^{Ri,1,k-1} + W(\mu', Ri, 1, k-1)\} + \big(1 - g_l^1(\xi_p)(1 - q_\varphi^f(\mu))\big)\{\Lambda C_{Le,0,-1}^{Le,0,-1} + U(\mu', Le, 0, -1)\} & a_l = Ri,\ a_p = 1
\end{cases}
\tag{5}
$$

where $U$ and $W$, described in Equations (6) and (7), are the minimum expected discounted costs of the remainder of the trip starting from the next road piece $l_i'$ for a vehicle that intends to maintain its state and to join a platoon, respectively.

$$
U(\mu', \varphi_l, \varphi_p, d) = \alpha \sum_{\xi_{tr}',\, \xi_p'} p_l^{Le}\big((\xi_{tr}^{Le})' \mid \xi_{tr}, \xi_p\big)\, p_l^{Ri}\big((\xi_{tr}^{Ri})' \mid \xi_{tr}, \xi_p\big)\, h_l^{Le}\big((\xi_p^{Le})' \mid \xi_{tr}, \xi_p\big)\, h_l^{Ri}\big((\xi_p^{Ri})' \mid \xi_{tr}, \xi_p\big)\, V(\mu', \varphi_l, \varphi_p, d)
\tag{6}
$$

$$
W(\mu', \varphi_l, 1, k-1) = \alpha \sum_{\xi_{tr}',\, \xi_p'} \sum_{k} w(k)\, p_l^{Le}\big((\xi_{tr}^{Le})' \mid \xi_{tr}, \xi_p\big)\, p_l^{Ri}\big((\xi_{tr}^{Ri})' \mid \xi_{tr}, \xi_p\big)\, h_l^{Le}\big((\xi_p^{Le})' \mid \xi_{tr}, \xi_p\big)\, h_l^{Ri}\big((\xi_p^{Ri})' \mid \xi_{tr}, \xi_p\big)\, V(\mu', \varphi_l, 1, k-1)
\tag{7}
$$

The four arguments of the min function in Equation (5) correspond to the costs of the lane-changing and platoon-merging actions. The expected discounted cost (with the initial values as specified) is then the minimum cost over the entire action set, which consists of lane changing, platoon merging, and route choice. The first expression in Equation (5) corresponds to the action that results in no change in the state of the vehicle; that is, the subject vehicle stays on the left lane as a free agent. The cost of this action is equal to the cost of continuing with the initial state $(Le, 0, -1)$ on the current road piece, plus the minimum expected discounted cost of starting the next road piece under the same initial state.

The second expression in Equation (5) corresponds to the action of staying on the left lane, but joining a platoon. The first term here corresponds to the expected cost of the scenario where the vehicle successfully joins a platoon. Under this scenario, the vehicle incurs both the cost of this new trajectory on the current road piece and the expected discounted cost of the rest of the trip starting from its new state as a platoon member. In case the execution of this action fails (i.e., the vehicle cannot join a platoon), the vehicle will continue under the previous state on the current road piece, and incurs an expected discounted cost for the rest of the trip starting from the left lane as a free agent. This cost is captured in the second term.

The third expression in Equation (5) corresponds to the action of changing to the right lane and remaining a free agent. Similar to the previous case, the first term captures the expected cost if the action can be completed, and the second term corresponds to the cost of the trajectory if the vehicle fails to complete the action.

Finally, the last expression in Equation (5) corresponds to the action of changing lanes and joining a platoon. In this case, the expected discounted cost is the summation of two terms, the first term corresponding to the entire action being completed, and the second term corresponding to the action failing.

For the case where the subject vehicle is a platoon member and the platoon splitting time has not been reached (i.e., $l_i \neq l_d$, when $\varphi_l = Le$, $\varphi_p = 1$, $d > 0$), the minimum expected discounted cost is given by

$$
V(\mu, Le, 1, d) = \min_{a \in A}
\begin{cases}
\Lambda C_{Le,1,d}^{Le,0,-1} + U(\mu', Le, 0, -1) & a_l = Le,\ a_p = 0 \\
\Lambda C_{Le,1,d}^{Le,1,d-1} + U(\mu', Le, 1, d-1) & a_l = Le,\ a_p = 1 \\
q_\varphi^f(\mu)\{\Lambda C_{Le,1,d}^{Le,1,d-1} + U(\mu', Le, 1, d-1)\} + (1 - q_\varphi^f(\mu))\{\Lambda C_{Le,1,d}^{Ri,0,-1} + U(\mu', Ri, 0, -1)\} & a_l = Ri,\ a_p = 0 \\
\big(1 - g_l^1(\xi_p)(1 - q_\varphi^f(\mu))\big)\{\Lambda C_{Le,1,d}^{Le,1,d-1} + U(\mu', Le, 1, d-1)\} + g_l^1(\xi_p)(1 - q_\varphi^f(\mu))\{\Lambda C_{Le,1,d}^{Ri,1,k-1} + W(\mu', Ri, 1, k-1)\} & a_l = Ri,\ a_p = 1
\end{cases}
\tag{8}
$$

The first expression in the min function in Equation (8) refers to the case where the subject vehicle splits from its platoon without changing lanes. Since this can always be achieved, the expected discounted cost of this action is the cost of the subject vehicle traveling on its current road piece as a free agent, plus its expected discounted cost of continuing to travel as a free agent starting from the next road piece.

The second expression in Equation (8) describes the scenario where the subject vehicle maintains its current state. Under this scenario, the subject vehicle traverses its current road piece while maintaining its state, and continues the rest of its trip with the platoon splitting time reduced by one unit.

The third expression in Equation (8) has the subject vehicle splitting from the platoon and changing lanes. When the subject vehicle decides to change lanes while in a platoon, it has to split from its platoon first. The first term here captures the scenario where the subject vehicle is not able to change lanes, in which case it will continue in its current platoon. Note that the OC model will inform the subject vehicle whether it can successfully change lanes. As such, if the OC model determines that changing lanes cannot take place safely, the subject vehicle will not split from its platoon. If the subject vehicle can change lanes, it will split from its platoon and continue the rest of the trip on the right lane as a free agent.


Finally, the fourth expression in Equation (8) has the subject vehicle changing lanes and traveling on the right lane in a platoon. For this action to take place, the subject vehicle should split from its current platoon, change lanes, and join a platoon on the right lane. Since we are assuming that the subject vehicle is always able to split from its current platoon, the probability of completing this action is the probability of successfully changing lanes and joining a platoon in the new lane. The first term here captures the cost of this action failing, in which case the subject vehicle would continue on the left lane in its current platoon. The second term captures the cost of the action being completed successfully.

For the case where the vehicle is a platoon member on the left lane and the platoon splitting time has arrived (i.e., $l_i \neq l_d$, when $\varphi_l = Le$, $\varphi_p = 1$, $d = 0$), the minimum expected discounted cost is given by

$$
V(\mu, Le, 1, 0) = \min_{a \in A}
\begin{cases}
\Lambda C_{Le,1,0}^{Le,0,-1} + U(\mu', Le, 0, -1) & a_l = Le,\ a_p = 0 \\
g_l^0(\xi_p)\{\Lambda C_{Le,1,0}^{Le,1,k-1} + W(\mu', Le, 1, k-1)\} + (1 - g_l^0(\xi_p))\{\Lambda C_{Le,1,0}^{Le,0,-1} + U(\mu', Le, 0, -1)\} & a_l = Le,\ a_p = 1 \\
q_\varphi^f(\mu)\{\Lambda C_{Le,1,0}^{Le,0,-1} + U(\mu', Le, 0, -1)\} + (1 - q_\varphi^f(\mu))\{\Lambda C_{Le,1,0}^{Ri,0,-1} + U(\mu', Ri, 0, -1)\} & a_l = Ri,\ a_p = 0 \\
g_l^1(\xi_p)(1 - q_\varphi^f(\mu))\{\Lambda C_{Le,1,0}^{Ri,1,k-1} + W(\mu', Ri, 1, k-1)\} + \big(1 - g_l^1(\xi_p)(1 - q_\varphi^f(\mu))\big)\{\Lambda C_{Le,1,0}^{Le,0,-1} + U(\mu', Le, 0, -1)\} & a_l = Ri,\ a_p = 1
\end{cases}
\tag{9}
$$

In Equation (9), $d = 0$ indicates that the platoon is dissolving and the subject vehicle has to split from it in the current road piece. The first expression in Equation (9) captures the scenario where the subject vehicle continues to travel on the left lane as a free agent after splitting from its current platoon.

The second expression in Equation (9) captures the case where the subject vehicle decides to join another platoon in the left lane, which may fail due to the absence of platoon-enabled vehicles in the left lane (second term).

The third expression in Equation (9) indicates that the subject vehicle plans to change lanes and continue to travel as a free agent. This action may fail if the subject vehicle cannot change lanes (first term), in which case the subject vehicle continues to travel on the left lane as a free agent. Otherwise, the subject vehicle travels on the right lane as a free agent.

The fourth expression in Equation (9) captures the scenario where the subject vehicle switches to the right lane and joins a platoon. The first term is the cost of the case where this action can be completed successfully, and the second term captures the case where this action fails.

For the other cases, where the vehicle is on the right lane (i.e., $\varphi_l = Ri$), the minimum expected discounted cost has similar formulas as above. Refer to Appendix A for details.
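The recursions in Equations (5), (8), (9), and their right-lane counterparts can be evaluated by backward induction over road pieces. The sketch below is a deliberately simplified illustration, not the paper's calibrated model: the expectation over traffic and platoon-intensity states is dropped, the per-piece costs and success probabilities (`piece_cost`, `P_JOIN_SAME_LANE`, `P_JOIN_NEW_LANE`, `P_LANE_CHANGE_OK`) are made-up stand-ins for $\Lambda C$, $g_l^0$, $g_l^1$, and $1 - q_\varphi^f$, and the same four actions are assumed available from every state.

```python
# Minimal backward-induction sketch of the Eq. (5)-style recursion.
# States are (lane, platoon) pairs; road pieces are indexed 0..N_PIECES-1.
# All numerical inputs are illustrative stand-ins, not calibrated values.

ALPHA = 0.95          # discount factor (alpha in the paper)
N_PIECES = 5          # road pieces until the destination
LANES = ("Le", "Ri")

def piece_cost(lane, platoon):
    # stand-in per-piece generalized cost: platooning is cheaper
    return 1.0 if platoon else 1.5

P_JOIN_SAME_LANE = 0.6   # stand-in for g0: join a platoon without lane change
P_JOIN_NEW_LANE = 0.4    # stand-in for g1: join a platoon after a lane change
P_LANE_CHANGE_OK = 0.8   # stand-in for 1 - qf: lane change succeeds

def value_iteration():
    # V[i][(lane, platoon)] = min expected discounted cost-to-go from piece i
    V = [dict() for _ in range(N_PIECES + 1)]
    for lane in LANES:                   # terminal condition, as in Eq. (4):
        V[N_PIECES][(lane, 0)] = 0.0     # zero cost at the destination
        V[N_PIECES][(lane, 1)] = 0.0
    for i in reversed(range(N_PIECES)):
        for lane in LANES:
            other = "Ri" if lane == "Le" else "Le"
            stay_free = piece_cost(lane, 0) + ALPHA * V[i + 1][(lane, 0)]
            stay_join = (P_JOIN_SAME_LANE
                         * (piece_cost(lane, 1) + ALPHA * V[i + 1][(lane, 1)])
                         + (1 - P_JOIN_SAME_LANE) * stay_free)
            change_free = (P_LANE_CHANGE_OK
                           * (piece_cost(other, 0) + ALPHA * V[i + 1][(other, 0)])
                           + (1 - P_LANE_CHANGE_OK) * stay_free)
            p_both = P_JOIN_NEW_LANE * P_LANE_CHANGE_OK
            change_join = (p_both
                           * (piece_cost(other, 1) + ALPHA * V[i + 1][(other, 1)])
                           + (1 - p_both) * stay_free)
            for platoon in (0, 1):
                # simplification: a platoon member may always split, so the
                # same four actions are considered from either platoon state
                V[i][(lane, platoon)] = min(stay_free, stay_join,
                                            change_free, change_join)
    return V

V = value_iteration()
```

Under these stand-in numbers, the cost-to-go grows with the number of remaining road pieces and never exceeds the cost of always traveling as a free agent, mirroring the role of the min operator in the paper's recursions.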


4.3. The Optimal Control (OC) Model

The MDP model creates a policy that advises the set of coarse actions the vehicle needs to take in order to complete its trip in the most cost-effective way. However, the MDP model cannot provide a full, implementable trajectory for the subject vehicle that includes its target acceleration profile. As such, the MDP framework utilizes an optimal control (OC) model to bridge this gap. The role of the OC model is two-fold: First, it devises an acceleration profile for the subject vehicle to complete the coarse actions (or determines the infeasibility of completing them) following a quintic trajectory function, subject to collision avoidance and bounds on the vehicle's speed, acceleration, and jerk [23]. The quintic trajectory function is selected due to its ability to provide a smooth trajectory. This function is demonstrated in Equation (10), in which $x(t)$ and $y(t)$ indicate the longitudinal and lateral positions of the vehicle at time $t$, respectively. Coefficients $a_0^i$ through $a_5^i$ and $b_0^i$ through $b_5^i$ are decision variables that determine the optimal solution.

$$
\begin{cases}
x(t) = a_5^i t^5 + a_4^i t^4 + a_3^i t^3 + a_2^i t^2 + a_1^i t + a_0^i \\
y(t) = b_5^i t^5 + b_4^i t^4 + b_3^i t^3 + b_2^i t^2 + b_1^i t + b_0^i
\end{cases}
\tag{10}
$$
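A quintic polynomial such as Equation (10) is pinned down, per axis, by six boundary conditions: position, velocity, and acceleration at the start and end of the maneuver. The sketch below solves for the six coefficients with plain Gaussian elimination; the boundary values in the example (a lane change of 3.5 m lateral displacement over 3.6 s, starting and ending at rest laterally) are illustrative assumptions, not values taken from the paper's OC model.

```python
# Solve for quintic coefficients c0..c5 of c5*t^5 + ... + c1*t + c0
# from position/velocity/acceleration constraints at t = 0 and t = T.

def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def quintic_coefficients(p0, v0, a0, pT, vT, aT, T):
    """Coefficients [c0..c5] matching the boundary state at times 0 and T."""
    rows = [
        [1, 0, 0, 0, 0, 0],                   # x(0)   = p0
        [0, 1, 0, 0, 0, 0],                   # x'(0)  = v0
        [0, 0, 2, 0, 0, 0],                   # x''(0) = a0
        [1, T, T**2, T**3, T**4, T**5],       # x(T)   = pT
        [0, 1, 2*T, 3*T**2, 4*T**3, 5*T**4],  # x'(T)  = vT
        [0, 0, 2, 6*T, 12*T**2, 20*T**3],     # x''(T) = aT
    ]
    return solve_linear(rows, [p0, v0, a0, pT, vT, aT])

def poly_eval(coeffs, t):
    return sum(c * t**k for k, c in enumerate(coeffs))

# Hypothetical lateral profile of a 3.6 s lane change covering 3.5 m,
# with zero lateral speed and acceleration at both endpoints.
lat = quintic_coefficients(0.0, 0.0, 0.0, 3.5, 0.0, 0.0, T=3.6)
```

Because the boundary velocities and accelerations are zero at both ends, the resulting profile is the smooth S-shaped curve that motivates the quintic choice in the text.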

Additionally, the OC model quantifies the short-term cost of completing the coarse actions based on the acceleration profile of the vehicle [23]. More specifically, given the action $a = \{a_l, a_p, a_r\}$, the OC model plans a trajectory that minimizes a convex combination of fuel and time costs, subject to safety and comfort guarantees. The details of the OC model can be found in [23].

For each action $a \in A$, this short-term cost $C_{oc}$ is then combined with the expected long-term cost $V(\mu, \varphi)$ in the MDP framework. The MDP framework enumerates all coarse actions $a \in A$, and selects the action that minimizes the total cost given by the OC and MDP models.
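The coupling described above amounts to a one-line selection rule: for each coarse action, add the short-term OC cost to the discounted MDP cost-to-go and keep the cheapest. In the sketch below, `oc_cost` and `mdp_value` are illustrative stand-ins for $C_{oc}$ and $V(\mu, \varphi)$, with made-up numbers chosen so that platooning is attractive over the rest of the trip.

```python
# Sketch of the action selection that couples the OC and MDP models:
# total cost of a coarse action = short-term OC cost + MDP cost-to-go.
# Both cost functions below are hypothetical stand-ins.

ACTIONS = [
    {"lane": "Le", "platoon": 0},
    {"lane": "Le", "platoon": 1},
    {"lane": "Ri", "platoon": 0},
    {"lane": "Ri", "platoon": 1},
]

def oc_cost(action):
    # stand-in: lane changes and platoon merges carry small maneuver costs
    return 0.2 * (action["lane"] == "Ri") + 0.1 * action["platoon"]

def mdp_value(action):
    # stand-in cost-to-go: platooning is cheaper over the rest of the trip
    return 4.0 - 1.0 * action["platoon"]

def select_action(actions):
    return min(actions, key=lambda a: oc_cost(a) + mdp_value(a))

best = select_action(ACTIONS)
```

With these stand-in numbers, a cheap short-term maneuver (joining a platoon in the current lane) wins because of its long-term savings, which is exactly the trade-off the framework is designed to capture.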

5. Experiments and Analysis

In this section, we conduct simulations in three experimental settings, namely a circular track, a straight highway, and a small network with route choice. We compare the performance of the local OC model and the MDP framework, in which the OC and MDP models are combined, under different traffic states in all three experimental settings. Our simulations are based on the simulation platform previously built in [23], in which surrounding vehicles follow the Intelligent Driver Model [63]. We consider aerodynamic, rolling, grade, and inertial resistance forces in the fuel cost computation [64], and set the value of time (VoT) to 10 dollars per hour.
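A generalized cost of the kind compared throughout this section can be sketched as follows. The resistance coefficients and fuel price are taken from Table B.1 and the VoT is the 10 dollars per hour stated above, but the additive force model and the zero-fuel treatment of braking are assumptions about how those coefficients combine, not the paper's exact formulation from [64].

```python
# Illustrative generalized (fuel + time) cost of a sampled trajectory.
# Coefficients are from Table B.1; the force model itself is an assumption.

GAMMA_AR = 0.3987     # air resistance coefficient
GAMMA_RR = 281.547    # rolling resistance coefficient
GAMMA_GR = 0.0        # grade resistance coefficient (flat road)
GAMMA_IR = 1750.0     # inertial resistance coefficient
ETA_F = 5.98e-8       # dollars per Joule of energy consumed
VOT = 10.0 / 3600.0   # value of time: 10 dollars/hour, in dollars/second

def generalized_cost(speeds, accels, dt):
    """Fuel + time cost over a trajectory sampled every dt seconds."""
    fuel = 0.0
    for v, a in zip(speeds, accels):
        force = GAMMA_AR * v**2 + GAMMA_RR + GAMMA_GR + GAMMA_IR * a
        power = max(force * v, 0.0)   # assume no fuel credit when braking
        fuel += ETA_F * power * dt
    time_cost = VOT * len(speeds) * dt
    return fuel + time_cost

# Example: a constant 20 m/s cruise for 60 seconds, sampled at 0.1 s.
cost = generalized_cost([20.0] * 600, [0.0] * 600, dt=0.1)
```

At this cruise speed the time component dominates the fuel component, which is consistent with why the VoT weighting matters in the experiments.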

5.1. Model Calibration

In a future connected and automated vehicle system, parameters of the MDP framework can be calibrated using historical data. Note that even when abundant CAV data becomes available, it could still be difficult to fully and precisely represent every single driving scenario, due to the complexity of human behaviour, the non-linearity of interactions between vehicles, and the dynamic nature of the transportation network. Therefore, a more practical approach would be to use historical data to partition $\xi_{tr}^{Le}$, $\xi_{tr}^{Ri}$, $\xi_p^{Le}$, and $\xi_p^{Ri}$ into different clusters, representing different traffic states and platoon intensities in the left and right lanes, respectively. The transition probabilities can then be estimated using the maximum likelihood principle, based on the occurrence percentages of the corresponding state transitions in historical records. Furthermore, once data is available, we can use it in a maximum likelihood estimation framework to calibrate the functions $q_\varphi^f(\mu)$, $g_l^1(\xi_p)$, $g_l^0(\xi_p)$, and $w(k)$.
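The maximum likelihood estimate described above reduces to counting: the estimated probability of transitioning from state $i$ to state $j$ is the observed fraction of visits to $i$ that were followed by $j$. The sketch below illustrates this with a made-up sequence of lane-level traffic states standing in for real historical records.

```python
# MLE of a traffic-state transition matrix from an observed state sequence:
# P[i][j] = (# of i -> j transitions) / (# of visits to i).
# The `history` sequence is a made-up stand-in for historical data.

from collections import Counter

STATES = ("free-flow", "onset-of-congestion", "congested")

def estimate_transition_matrix(history):
    counts = Counter(zip(history, history[1:]))   # (from, to) pair counts
    totals = Counter(history[:-1])                # visits to each 'from' state
    return {s: {t: counts[(s, t)] / totals[s] if totals[s] else 0.0
                for t in STATES}
            for s in STATES}

history = ["free-flow", "free-flow", "onset-of-congestion", "congested",
           "congested", "onset-of-congestion", "free-flow", "free-flow"]
P = estimate_transition_matrix(history)
```

Each row of the estimated matrix sums to one, and with more observations the estimates converge to the occurrence percentages the text refers to.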

For the current study, since historical data does not exist, we use simulations to create CAV driving scenarios, and treat observations within simulations as historical data. We conduct simulations using the OC model proposed in [23], in which a mixed stream of CAVs and legacy vehicles travels together. The parameter values used in these simulations are specified in Appendix B. After a warm-up period of about 20 minutes, we estimate the required parameters for this study using the maximum likelihood principle.

In this work, we assume that only the subject vehicle adopts the MDP framework, and thus the actions taken by a single vehicle do not change the macroscopic traffic state of the system. If the penetration rate of vehicles that adopt the MDP framework is high, actions taken by these vehicles could change the state. In this case, model parameters and the optimal MDP policy can be updated periodically to capture such changes.

5.2. A Circular Track Scenario

A circular track is a useful experimental setting, as it can demonstrate the impact of the proposed methodology not only on the generalized cost of a trip, but also on the properties of traffic wave propagation [65, 66]. Stern et al. [9] demonstrate, using a circular track, that a low penetration of autonomous vehicles can effectively dampen stop-and-go waves. Here, we conduct our simulations in a circular track, where the surrounding vehicles can merge into platoons, but cannot change lanes, enter through on-ramps, or exit from off-ramps. In these simulations, the subject vehicle has a trip of 10.8 kilometers in length, and different traffic states (e.g., free-flow, onset-of-congestion and congested) are generated similar to [23], by utilizing a fundamental diagram of traffic flow.

In the figures presented in this paper, OC and MDP refer to the local optimal controller and the MDP framework (also referred to as the MDP controller), respectively. The suffix xK indicates that the circle length is x kilometers. The suffixes low, medium and high represent the penetration of platoon-enabled vehicles. Specifically, low indicates that all surrounding vehicles are non-platoon-enabled, medium indicates that a subset (about 30%) of surrounding vehicles are platoon-enabled, and high indicates that all surrounding vehicles are platoon-enabled.

Figure 3 shows the generalized cost incurred by the subject vehicle under the OC and MDP controllers when the circular track is 2, 5 and 10 km in perimeter, respectively. This figure indicates that the circle perimeter does not significantly affect the subject vehicle's generalized cost. In the free-flow and onset-of-congestion states, the MDP controller provides statistically significant (at the 5% significance level) lower costs. In the congested traffic state, no statistically significant difference in cost is observed between the MDP and OC controllers, although the variance of cost is lower under the MDP controller.
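Two-sample comparisons like the ones above can be carried out with a standard test statistic. The sketch below computes a Welch t-statistic for unequal variances; the cost samples are made up for illustration, and the paper's actual significance tests may differ in detail (e.g., in the degrees-of-freedom correction and p-value computation, which are omitted here).

```python
# Minimal Welch t-statistic for comparing mean generalized costs of two
# controllers. The samples below are hypothetical, for illustration only.

import math

def welch_t(sample_a, sample_b):
    na, nb = len(sample_a), len(sample_b)
    ma = sum(sample_a) / na
    mb = sum(sample_b) / nb
    va = sum((x - ma) ** 2 for x in sample_a) / (na - 1)  # sample variances
    vb = sum((x - mb) ** 2 for x in sample_b) / (nb - 1)
    return (ma - mb) / math.sqrt(va / na + vb / nb)

oc_costs = [2.10, 2.25, 2.05, 2.30, 2.15]   # hypothetical OC trip costs
mdp_costs = [1.80, 1.95, 1.85, 1.90, 1.75]  # hypothetical MDP trip costs
t_stat = welch_t(oc_costs, mdp_costs)
```

A t-statistic well above the ~2 critical value, as in this toy example, would indicate a difference significant at the 5% level.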

Figure 3: The simulation environment is a circular track. The top, middle and bottom sub-figures represent the free-flow, onset-of-congestion, and congested traffic states, respectively. The vertical axes show the generalized costs with VoT set to 10 dollars per hour. Along the horizontal axes, the generalized costs of the subject vehicle under different controllers in circular tracks of different lengths are compared. Here 'OC' and 'MDP' denote the local optimal and the MDP controllers, respectively. The suffix 'xK' indicates that the length of the circular track is x kilometers.

Figure 4 shows the generalized cost for the subject vehicle under different controllers and a track perimeter of 5 km, as the penetration rate of platoon-enabled vehicles in the surrounding traffic changes. In the free-flow state, it is only under a high penetration rate that the MDP controller results in a significantly smaller cost compared with the OC controller, and there is no significant difference when the penetration rate is low or medium. In the onset-of-congestion traffic state, the MDP controller has significantly smaller costs than the OC controller at all penetration rates. In congested traffic, the MDP and OC controllers do not differ in a statistically significant manner, although the generalized cost is much lower under a high penetration rate of platoon-enabled vehicles. Generally, a higher intensity of platoon-enabled vehicles gives rise to more opportunities for the subject vehicle to join a platoon, thereby resulting in less cost.

resulting in less cost.457

It is hypothesized that the benefits of CAVs can be extended to their surrounding vehicles. To put this hypothesis to the test, we measure the average time cost, fuel cost and generalized cost of 15 vehicles traveling upstream of the subject vehicle. Figures 5 and 6 show the costs incurred by the upstream vehicles under the same settings as in Figures 3 and 4, respectively. In order to observe the performance of the OC and MDP controllers directly, we subtract the costs under the OC controller from those of the MDP controller. Figure 5 shows that, generally speaking, a subject vehicle that travels under the MDP controller can induce statistically significant cost savings for its surrounding vehicles under any traffic state. Figure 6 confirms the same conclusion. This figure also shows that these second-hand benefits are more pronounced when the density of platoon-enabled vehicles is higher.


Figure 4: The suffixes 'low', 'medium' and 'high' represent different levels of intensity of platoon-enabled vehicles in the environment. Specifically, 'low' indicates that all surrounding vehicles are non-platoon-enabled, 'medium' indicates that a proportion (about 30%) of the surrounding vehicles are platoon-enabled, and 'high' indicates that all surrounding vehicles are platoon-enabled. Other settings are the same as in Figure 3.

Figure 5: Differences in the average time, fuel, and generalized costs of the vehicles upstream of the subject vehicle for different track lengths. A positive value indicates that MDP results in a higher cost than OC, while a negative value indicates that MDP brings more cost savings than OC. The simulation settings are similar to those in Figure 3.


Figure 6: Differences in the average time, fuel, and generalized costs of the vehicles upstream of the subject vehicle under different penetration rates of platoon-enabled vehicles. A positive value indicates that MDP results in a higher cost than OC, while a negative value indicates that MDP brings more cost savings than OC. The simulation settings are similar to those in Figure 4.

Figure 7: The simulation environment is a highway with on- and off-ramps. The value following 'MDP' in the name of the controller specifies the discount factor $\alpha$ in the MDP model. Other settings are similar to those in Figure 3.

5.3. A Two-lane Highway Scenario

In this highway scenario, we adopt the same surrounding environment setting as in [23]. Surrounding vehicles can change lanes, merge onto or exit from the highway, and join or split from a platoon. Figure 7 demonstrates the generalized costs of different controllers, where the number in the controller name is the value of $\alpha$, i.e., the discount factor used in the MDP model. This figure shows that in all traffic states, the larger the discount factor (i.e., the more weight on the expectation of the long-term cost), the smaller the cost for the subject vehicle along the entire trip, which highlights the importance of accounting for the long-term trip cost. Figure 8 shows the generalized cost of the surrounding vehicles. In the free-flow traffic state, the MDP controller results in significantly smaller cost for the surrounding vehicles, and these savings grow as the MDP discount factor increases. However, under the onset-of-congestion and congested traffic states, the OC and MDP controllers do not show significant differences in cost.

Figure 8: The average generalized cost of the surrounding vehicles. The value following 'MDP' in the name of the controller specifies the discount factor $\alpha$ in the MDP model. Other settings are similar to those in Figure 7.

signiﬁcant diﬀerences in cost.480

Figure 9 shows the generalized costs incurred by the subject vehicle and its immediate481

downstream vehicles for an example trip in the onset-of-congestion traﬃc state, as well as482

the lateral position and platoon membership status of the subject vehicle. The top plot483

in this ﬁgure pertains to the trajectory formed by the OC model, and the bottom plot484

demonstrates the trajectory devised by the MDP controller. In the top plot, the subject485

vehicle makes decisions based solely on local information; as such, its trajectory tends to486

closely follow the trajectory of its downstream vehicle. This ﬁgure shows that under the OC487

controller, the subject vehicle changes to the left lane at about 2950 time steps, and then488

returns to its original lane at about 3750 time steps, an indicator of short-sighted decisions.489

The subject vehicle’s platoon membership status also changes frequently starting at about490

4600 time steps. These actions disturb the traﬃc stream and increase the generalized cost491

of the subject vehicle and its surrounding vehicles. In the bottom plot, the subject vehicle492

changes to the left lane at an early time, in which it travels for the rest of its trip. The493

19

Figure 9: The vertical axis shows the generalized cost, with the unit of dollars per 10 km. The horizontal

axis is time, with the unit of 0.1 second. Generalized cost of the subject vehicle and its immediate upstream

vehicles, as well as its lane position and platoon membership status are shown. In the top plot, the subject

vehicle is traveling under the OC controller, while in the bottom plot, the subject vehicle is traveling under

the MDP controller.

subject vehicle also joins platoons twice during its trips, but for longer periods of time. In494

general, the cost of the subject vehicle under the OC controller is much higher than that of495

the MDP controller.496

5.4. A Network-level Scenario with Route Choice

In these experiments, we show the extensibility of the MDP framework in a joint decision-making scenario, in which the framework makes routing, lane-changing, and platoon-merging decisions. In the scenario shown in Figure 10, the subject vehicle has two possible routes to the destination, namely 'Route1' and 'Route2'. Figure 11 shows the results under three scenarios. Under Route1 and Route2, the traveling route is fixed, and the OC model determines the lane-changing and platoon-merging decisions. Under MDP, the MDP framework makes all three sets of decisions. This figure demonstrates that under all traffic states, the MDP model results in statistically significant savings in the generalized cost compared to the OC model with a fixed route.


Figure 10: The subject vehicle has two available routes from the origin (blue point) to the destination (red point). Route 1 has a slightly shorter distance, but it is more congested compared with Route 2.

Figure 11: 'Route1' and 'Route2' refer to scenarios where the subject vehicle takes routes 1 and 2, respectively. In these two scenarios, the OC controller is applied. The 'MDP' label refers to the case where the MDP framework selects the adopted route. Other settings are the same as in Figure 3.

6. Conclusion

In this paper we proposed a motion planning framework for a CAV in a mixed traffic environment. The framework leverages an optimal control model to quantify the short-term cost of a trip and an MDP model to capture its long-term cost. This general framework outputs the target acceleration profile of the vehicle as well as routing, platooning and lane-changing decisions in a dynamic traffic environment. We implemented this motion planning framework in three experimental scenarios, including a highway section with multiple on- and off-ramps, a circular track, and an urban network with route choice, and conducted a comprehensive set of simulations to quantify the long-term benefits the subject vehicle and its surrounding vehicles may experience as a result of incorporating network-level information into the decision-making process. Our experiments indicate that, generally speaking, the MDP framework outperforms a local OC controller in reducing the generalized trip cost. With a higher intensity of platoon-enabled vehicles or a higher weight on the long-term cost (a larger discount factor), the reduction in generalized cost for both the subject vehicle and its upstream vehicles is statistically significant. This significant cost saving, which originates from accounting for network-level conditions, exists in all simulated environments, under various traffic states.

ACKNOWLEDGMENT

The work described in this paper is supported by research grants from the National Science Foundation (CPS-1837245, CPS-1839511, IIS-1724341).

Appendix A. Expected Discounted Cost in Right Lane

For $l_i \neq l_d$, when $\varphi_l = Ri$, $\varphi_p = 0$, and $d = -1$, the minimum expected discounted cost is given by

$$
V(\mu, Ri, 0, -1) = \min_{a \in A}
\begin{cases}
\Lambda C_{Ri,0,-1}^{Ri,0,-1} + U(\mu', Ri, 0, -1) & a_l = Ri,\ a_p = 0 \\
g_l^0(\xi_p)\{\Lambda C_{Ri,0,-1}^{Ri,1,k-1} + W(\mu', Ri, 1, k-1)\} + (1 - g_l^0(\xi_p))\{\Lambda C_{Ri,0,-1}^{Ri,0,-1} + U(\mu', Ri, 0, -1)\} & a_l = Ri,\ a_p = 1 \\
q_\varphi^f(\mu)\{\Lambda C_{Ri,0,-1}^{Ri,0,-1} + U(\mu', Ri, 0, -1)\} + (1 - q_\varphi^f(\mu))\{\Lambda C_{Ri,0,-1}^{Le,0,-1} + U(\mu', Le, 0, -1)\} & a_l = Le,\ a_p = 0 \\
g_l^1(\xi_p)(1 - q_\varphi^f(\mu))\{\Lambda C_{Ri,0,-1}^{Le,1,k-1} + W(\mu', Le, 1, k-1)\} + \big(1 - g_l^1(\xi_p)(1 - q_\varphi^f(\mu))\big)\{\Lambda C_{Ri,0,-1}^{Ri,0,-1} + U(\mu', Ri, 0, -1)\} & a_l = Le,\ a_p = 1
\end{cases}
\tag{A.1}
$$

The explanation for the case where the subject vehicle is a free agent in the right lane is similar to that in the left lane.

For $l_i \neq l_d$, when $\varphi_l = Ri$, $\varphi_p = 1$, $d > 0$, the minimum expected discounted cost is given by

$$
V(\mu, Ri, 1, d) = \min_{a \in A}
\begin{cases}
\Lambda C_{Ri,1,d}^{Ri,0,-1} + U(\mu', Ri, 0, -1) & a_l = Ri,\ a_p = 0 \\
\Lambda C_{Ri,1,d}^{Ri,1,d-1} + U(\mu', Ri, 1, d-1) & a_l = Ri,\ a_p = 1 \\
q_\varphi^f(\mu)\{\Lambda C_{Ri,1,d}^{Ri,1,d-1} + U(\mu', Ri, 1, d-1)\} + (1 - q_\varphi^f(\mu))\{\Lambda C_{Ri,1,d}^{Le,0,-1} + U(\mu', Le, 0, -1)\} & a_l = Le,\ a_p = 0 \\
\big(1 - g_l^1(\xi_p)(1 - q_\varphi^f(\mu))\big)\{\Lambda C_{Ri,1,d}^{Ri,1,d-1} + U(\mu', Ri, 1, d-1)\} + g_l^1(\xi_p)(1 - q_\varphi^f(\mu))\{\Lambda C_{Ri,1,d}^{Le,1,k-1} + W(\mu', Le, 1, k-1)\} & a_l = Le,\ a_p = 1
\end{cases}
\tag{A.2}
$$

The explanation for the case where the subject vehicle is a platoon member in the right lane is similar to that in the left lane.

For $l_i \neq l_d$, when $\varphi_l = Ri$, $\varphi_p = 1$, $d = 0$, the minimum expected discounted cost is given by

$$
V(\mu, Ri, 1, 0) = \min_{a \in A}
\begin{cases}
\Lambda C_{Ri,1,0}^{Ri,0,-1} + U(\mu', Ri, 0, -1) & a_l = Ri,\ a_p = 0 \\
g_l^0(\xi_p)\{\Lambda C_{Ri,1,0}^{Ri,1,k-1} + W(\mu', Ri, 1, k-1)\} + (1 - g_l^0(\xi_p))\{\Lambda C_{Ri,1,0}^{Ri,0,-1} + U(\mu', Ri, 0, -1)\} & a_l = Ri,\ a_p = 1 \\
q_\varphi^f(\mu)\{\Lambda C_{Ri,1,0}^{Ri,0,-1} + U(\mu', Ri, 0, -1)\} + (1 - q_\varphi^f(\mu))\{\Lambda C_{Ri,1,0}^{Le,0,-1} + U(\mu', Le, 0, -1)\} & a_l = Le,\ a_p = 0 \\
g_l^1(\xi_p)(1 - q_\varphi^f(\mu))\{\Lambda C_{Ri,1,0}^{Le,1,k-1} + W(\mu', Le, 1, k-1)\} + \big(1 - g_l^1(\xi_p)(1 - q_\varphi^f(\mu))\big)\{\Lambda C_{Ri,1,0}^{Ri,0,-1} + U(\mu', Ri, 0, -1)\} & a_l = Le,\ a_p = 1
\end{cases}
\tag{A.3}
$$

The explanation for the case where the subject vehicle is in the right lane is similar to the last case in the left lane.

Appendix B. Parameters for Generating Simulations

Appendix C. Sensitivity Analysis over Parameters in the Traffic Environment

To demonstrate the performance of our method under various settings, we conduct a sensitivity analysis over the parameters $p_{on}$, $p_{off}$, $p_{npe}$, $p_{merge}$ and $p_{change}$ in the two-lane highway scenario.

Under univariate analysis, we adjust the value of one parameter at a time while keeping the values of the other parameters unchanged. To maintain a relatively steady traffic environment, i.e., to avoid changes in traffic state, we use the same value for $p_{on}$ and $p_{off}$ to balance the number of vehicles entering and exiting the highway.


Table B.1: Summary of parameters

| Parameter | Value | Definition |
| $t_{upd}$ | 0.4 secs | the updating period of the trajectory of the subject vehicle |
| $p_{on}$ | 0.6 | the probability that a vehicle is interested in joining the freeway from an on-ramp |
| $p_{off}$ | 0.6 | the probability that a vehicle is interested in taking an off-ramp |
| $p_{npe}$ | 0.5 | the probability that a vehicle is not platoon-enabled |
| $p_{merge}$ | 0.6 | the probability that a vehicle intends to merge |
| $p_{change}$ | 0.1 | the probability that a vehicle intends to change lanes |
| $t_p$ | 3.5 secs | the time gap between two successive vehicles not in a platoon |
| $t_g$ | 0.55 secs | the time gap between two successive vehicles in a platoon |
| $t_{lcp}$ | 3.6 secs | the period of time within which the surrounding vehicles complete changing lanes |
| $t_{lc}$ | 5 secs | the minimum time interval between two successive lane changes by two successive vehicles in the same lane |
| $\tau_s$ | 0.4 secs | the reaction time delay in the car-following model |
| $t_{N_{act}}$ | 10 secs | the prediction horizon in the optimal control model |
| $v_m^{le}$ | 20 m/s | the velocity in the left lane at the maximum flow rate |
| $v_m^{ri}$ | 14 m/s | the velocity in the right lane at the maximum flow rate |
| $v_{max}^{le}$ | 30 m/s | maximum velocity in the left lane |
| $v_{max}^{ri}$ | 20 m/s | maximum velocity in the right lane |
| $a_{max}$ | 2 m/s$^2$ | maximum acceleration for the subject vehicle |
| $j_{max}$ | 3.5 m/s$^3$ | maximum jerk for the subject vehicle |
| $d_{cg}$ | 50 m | critical gap to decide whether it is feasible to change lanes |
| $l_{car}$ | 5 m | length of a vehicle |
| $h_{st}$ | 5 m | a vehicle would stop at a headway of this value |
| $a$ | 2 m/s$^2$ | the maximum desired acceleration |
| $b$ | 3 m/s$^2$ | the comfortable deceleration |
| $\gamma_{AR}$ | 0.3987 | coefficient for the air resistance force |
| $\gamma_{RR}$ | 281.547 | coefficient for the rolling resistance force |
| $\gamma_{GR}$ | 0 | coefficient for the grade resistance force |
| $\gamma_{IR}$ | 1750 | coefficient for the inertial resistance force |
| $\eta_f$ | $5.98 \times 10^{-8}$ | fuel cost for a unit of energy consumed by the vehicle (dollars/Joule) |
| $P_{sch}$ | $\{2, 10, 50\}$ | the scheduled splitting position can be in 2, 10 or 50 road pieces |
| $N(\mu_{sch}, \sigma_{sch})$ | $N(2,5)$ left, $N(-1,5)$ right | the normal distribution of the scheduled splitting position in the left and right lanes |


Figures C.1 and C.2 display the generalized costs when $p_{on} = p_{off} = 0.4$ and $p_{on} = p_{off} = 0.8$ for the subject vehicle and the surrounding vehicles, respectively. Figures C.3 and C.4 demonstrate the generalized costs when $p_{npe} = 0.1$ and $p_{npe} = 0.9$, respectively. Figures C.5 and C.6 correspond to the cases where $p_{merge} = 0.4$ and $p_{merge} = 0.8$. Figures C.7 and C.8 show the costs when $p_{change} = 0.05$ and $p_{change} = 0.3$.

Under all these settings, our MDP framework generally results in statistically significant cost savings for the subject vehicle and its surrounding vehicles in the free-flow and onset-of-congestion states, and there is no significant difference in the congested state.

Figure C.1: The value following 'OC' or 'MDP' in the name of the controller specifies the value of $p_{on}$ and $p_{off}$. Other settings are similar to those in Figure 7.


Figure C.2: The value following 'OC' or 'MDP' in the name of the controller specifies the value of $p_{on}$ and $p_{off}$. Other settings are similar to those in Figure 8.

Figure C.3: The value following 'OC' or 'MDP' in the name of the controller specifies the value of $p_{npe}$. Other settings are similar to those in Figure 7.

[1] J. B. Kenney, Dedicated short-range communications (dsrc) standards in the united557

states, Proceedings of the IEEE 99 (2011) 1162–1182.558

[2] Z. Zhang, A. Tafreshian, N. Masoud, Modular transit: Using autonomy and modularity559

26

Figure C.4: The value following ‘OC ’ or ‘MDP ’ in the name of the controller speciﬁes the value of pnpe.

Other settings are similar to those in Figure 8.

Figure C.5: The value following ‘OC ’ or ‘MDP ’ in the name of the controller speciﬁes the value of pmerge.

Other settings are similar to those in Figure 7.

to improve performance in public transportation, Transportation Research Part E:560

Logistics and Transportation Review 141 (2020) 102033.561

[3] M. Abdolmaleki, Y. Yin, N. Masoud, A unifying graph-coloring approach for intersec-562

27

Figure C.6: The value following ‘OC ’ or ‘MDP ’ in the name of the controller speciﬁes the value of pmerge.

Other settings are similar to those in Figure 8.

Figure C.7: The value following ‘OC ’ or ‘MDP ’ in the name of the controller speciﬁes the value of pchange.

Other settings are similar to those in Figure 7.

tion control in a connected and automated vehicle environment, Available at SSRN563

3944348 (2021).564

[4] N. Masoud, R. Jayakrishnan, Autonomous or driver-less vehicles: Implementation565

28

Figure C.8: The value following ‘OC ’ or ‘MDP ’ in the name of the controller speciﬁes the value of pchange.

Other settings are similar to those in Figure 8.

strategies and operational concerns, Transportation research part E: logistics and trans-566

portation review 108 (2017) 179–194.567

[5] F. van Wyk, A. Khojandi, N. Masoud, Optimal switching policy between driving entities in semi-autonomous vehicles, Transportation Research Part C: Emerging Technologies 114 (2020) 517–531.

[6] F. van Wyk, A. Khojandi, N. Masoud, A path towards understanding factors affecting crash severity in autonomous vehicles using current naturalistic driving data, in: Proceedings of SAI Intelligent Systems Conference, Springer, Cham, pp. 106–120.

[7] S. Cui, B. Seibold, R. Stern, D. B. Work, Stabilizing traffic flow via a single autonomous vehicle: Possibilities and limitations, in: 2017 IEEE Intelligent Vehicles Symposium (IV), IEEE, pp. 1336–1341.

[8] M. W. Levin, Congestion-aware system optimal route choice for shared autonomous vehicles, Transportation Research Part C: Emerging Technologies 82 (2017) 229–247.

[9] R. E. Stern, S. Cui, M. L. Delle Monache, R. Bhadani, M. Bunting, M. Churchill, N. Hamilton, H. Pohlmann, F. Wu, B. Piccoli, et al., Dissipation of stop-and-go waves via control of autonomous vehicles: Field experiments, Transportation Research Part C: Emerging Technologies 89 (2018) 205–221.

[10] B. HomChaudhuri, A. Vahidi, P. Pisu, A fuel economic model predictive control strategy for a group of connected vehicles in urban roads, in: 2015 American Control Conference (ACC), IEEE, pp. 2741–2746.

[11] T. Ersal, I. Kolmanovsky, N. Masoud, N. Ozay, J. Scruggs, R. Vasudevan, G. Orosz, Connected and automated road vehicles: state of the art and future challenges, Vehicle System Dynamics 58 (2020) 672–704.

[12] S. E. Shladover, C. Nowakowski, X. Y. Lu, R. Ferlis, Cooperative adaptive cruise control (CACC) definitions and operating concepts, in: TRB Conference.

[13] Z. Wang, G. Wu, M. J. Barth, A review on cooperative adaptive cruise control (CACC) systems: Architectures, controls, and applications (2018).

[14] V. Milanés, S. E. Shladover, Modeling cooperative and autonomous adaptive cruise control dynamic responses using experimental data, Transportation Research Part C: Emerging Technologies 48 (2014) 285–300.

[15] L. Zhang, J. Sun, G. Orosz, Hierarchical design of connected cruise control in the presence of information delays and uncertain vehicle dynamics, IEEE Transactions on Control Systems Technology 26 (2017) 139–150.

[16] G. Orosz, Connected cruise control: modelling, delay effects, and nonlinear behaviour, Vehicle System Dynamics 54 (2016) 1147–1176.

[17] J. Lioris, R. Pedarsani, F. Y. Tascikaraoglu, P. Varaiya, Platoons of connected vehicles can double throughput in urban roads, Transportation Research Part C: Emerging Technologies 77 (2017) 292–305.

[18] S. Maiti, S. Winter, L. Kulik, A conceptualization of vehicle platoons and platoon operations, Transportation Research Part C: Emerging Technologies 80 (2017) 1–19.

[19] Z. Huang, D. Chu, C. Wu, Y. He, Path planning and cooperative control for automated vehicle platoon using hybrid automata, IEEE Transactions on Intelligent Transportation Systems 20 (2018) 959–974.

[20] A. K. Bhoopalam, N. Agatz, R. Zuidwijk, Planning of truck platoons: A literature review and directions for future research, Transportation Research Part B: Methodological 107 (2018) 212–228.

[21] M. Abdolmaleki, M. Shahabi, Y. Yin, N. Masoud, Itinerary planning for cooperative truck platooning, Transportation Research Part B: Methodological 153 (2021) 91–110.

[22] D. González, J. Pérez, V. Milanés, F. Nashashibi, A review of motion planning techniques for automated vehicles, IEEE Transactions on Intelligent Transportation Systems 17 (2015) 1135–1145.

[23] X. Liu, G. Zhao, N. Masoud, Q. Zhu, Trajectory planning for connected and automated vehicles: Cruising, lane changing, and platooning, SAE International Journal of Connected and Automated Vehicles (2021, in press).

[24] Z. Zheng, Recent developments and research needs in modeling lane changing, Transportation Research Part B: Methodological 60 (2014) 16–32.

[25] E. Larsson, G. Sennton, J. Larson, The vehicle platooning problem: Computational complexity and heuristics, Transportation Research Part C: Emerging Technologies 60 (2015) 258–277.

[26] J. Cheng, J. Cheng, M. Zhou, F. Liu, S. Gao, C. Liu, Routing in internet of vehicles: A review, IEEE Transactions on Intelligent Transportation Systems 16 (2015) 2339–2352.

[27] B. Paden, M. Čáp, S. Z. Yong, D. Yershov, E. Frazzoli, A survey of motion planning and control techniques for self-driving urban vehicles, IEEE Transactions on Intelligent Vehicles 1 (2016) 33–55.

[28] L. Claussmann, M. Revilloud, D. Gruyer, S. Glaser, A review of motion planning for highway autonomous driving, IEEE Transactions on Intelligent Transportation Systems (2019).

[29] F. Gritschneder, K. Graichen, K. Dietmayer, Fast trajectory planning for automated vehicles using gradient-based nonlinear model predictive control, in: 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), IEEE, pp. 7369–7374.

[30] C. Katrakazas, M. Quddus, W.-H. Chen, L. Deka, Real-time motion planning methods for autonomous on-road driving: State-of-the-art and future research directions, Transportation Research Part C: Emerging Technologies 60 (2015) 416–442.

[31] X. Zeng, J. Wang, Globally energy-optimal speed planning for road vehicles on a given route, Transportation Research Part C: Emerging Technologies 93 (2018) 148–160.

[32] G. De Nunzio, C. C. De Wit, P. Moulin, D. Di Domenico, Eco-driving in urban traffic networks using traffic signals information, International Journal of Robust and Nonlinear Control 26 (2016) 1307–1324.

[33] H. Rakha, R. K. Kamalanathsharma, Eco-driving at signalized intersections using V2I communication, in: 2011 14th International IEEE Conference on Intelligent Transportation Systems (ITSC), IEEE, pp. 341–346.

[34] B. Li, Y. Zhang, Y. Feng, Y. Zhang, Y. Ge, Z. Shao, Balancing computation speed and quality: A decentralized motion planning method for cooperative lane changes of connected and automated vehicles, IEEE Transactions on Intelligent Vehicles 3 (2018) 340–350.

[35] C. Liu, C.-Y. Lin, M. Tomizuka, The convex feasible set algorithm for real time optimization in motion planning, SIAM Journal on Control and Optimization 56 (2018) 2712–2733.

[36] J. Hardy, M. Campbell, Contingency planning over probabilistic obstacle predictions for autonomous road vehicles, IEEE Transactions on Robotics 29 (2013) 913–929.

[37] E. Galceran, A. G. Cunningham, R. M. Eustice, E. Olson, Multipolicy decision-making for autonomous driving via changepoint-based behavior prediction, in: Robotics: Science and Systems, volume 1.

[38] S. Brechtel, T. Gindele, R. Dillmann, Probabilistic decision-making under uncertainty for autonomous driving using continuous POMDPs, in: 17th International IEEE Conference on Intelligent Transportation Systems (ITSC), IEEE, pp. 392–399.

[39] H. Guo, C. Shen, H. Zhang, H. Chen, R. Jia, Simultaneous trajectory planning and tracking using an MPC method for cyber-physical systems: A case study of obstacle avoidance for an intelligent vehicle, IEEE Transactions on Industrial Informatics 14 (2018) 4273–4283.

[40] X. Li, Z. Sun, D. Cao, D. Liu, H. He, Development of a new integrated local trajectory planning and tracking control framework for autonomous ground vehicles, Mechanical Systems and Signal Processing 87 (2017) 118–137.

[41] C. Alia, T. Gilles, T. Reine, C. Ali, Local trajectory planning and tracking of autonomous vehicles, using clothoid tentacles method, in: 2015 IEEE Intelligent Vehicles Symposium (IV), IEEE, pp. 674–679.

[42] J. Zhou, H. Zheng, J. Wang, Y. Wang, B. Zhang, Q. Shao, Multi-objective optimization of lane-changing strategy for intelligent vehicles in complex driving environments, IEEE Transactions on Vehicular Technology (2019).

[43] T. Bandyopadhyay, K. S. Won, E. Frazzoli, D. Hsu, W. S. Lee, D. Rus, Intention-aware motion planning, in: Algorithmic Foundations of Robotics X, Springer, 2013, pp. 475–491.

[44] H. A. Rakha, K. Ahn, K. Moran, Integration framework for modeling eco-routing strategies: Logic and preliminary results, International Journal of Transportation Science and Technology 1 (2012) 259–274.

[45] E. Paikari, L. Kattan, S. Tahmasseby, B. H. Far, Modeling and simulation of advisory speed and re-routing strategies in connected vehicles systems for crash risk and travel time reduction, in: 2013 26th IEEE Canadian Conference on Electrical and Computer Engineering (CCECE), IEEE, pp. 1–4.

[46] K. Ahn, H. A. Rakha, Network-wide impacts of eco-routing strategies: a large-scale case study, Transportation Research Part D: Transport and Environment 25 (2013) 119–130.

[47] K. Boriboonsomsin, M. J. Barth, W. Zhu, A. Vu, Eco-routing navigation system based on multisource historical and real-time traffic information, IEEE Transactions on Intelligent Transportation Systems 13 (2012) 1694–1704.

[48] A. Duret, M. Wang, A. Ladino, A hierarchical approach for splitting truck platoons near network discontinuities, Transportation Research Part B: Methodological (2019).

[49] Y. Wei, C. Avcı, J. Liu, B. Belezamo, N. Aydın, P. T. Li, X. Zhou, Dynamic programming-based multi-vehicle longitudinal trajectory optimization with simplified car following models, Transportation Research Part B: Methodological 106 (2017) 102–129.

[50] X. Qian, A. De La Fortelle, F. Moutarde, A hierarchical model predictive control framework for on-road formation control of autonomous vehicles, in: 2016 IEEE Intelligent Vehicles Symposium (IV), IEEE, pp. 376–381.

[51] M. Neunert, C. De Crousaz, F. Furrer, M. Kamel, F. Farshidian, R. Siegwart, J. Buchli, Fast nonlinear model predictive control for unified trajectory optimization and tracking, in: 2016 IEEE International Conference on Robotics and Automation (ICRA), IEEE, pp. 1398–1404.

[52] C. Huang, F. Naghdy, H. Du, Model predictive control-based lane change control system for an autonomous vehicle, in: 2016 IEEE Region 10 Conference (TENCON), IEEE, pp. 3349–3354.

[53] X. Qian, F. Altché, P. Bender, C. Stiller, A. de La Fortelle, Optimal trajectory planning for autonomous driving integrating logical constraints: An MIQP perspective, in: 2016 IEEE 19th International Conference on Intelligent Transportation Systems (ITSC), IEEE, pp. 205–210.

[54] K. Huang, X. Yang, Y. Lu, C. C. Mi, P. Kondlapudi, Ecological driving system for connected/automated vehicles using a two-stage control hierarchy, IEEE Transactions on Intelligent Transportation Systems 19 (2018) 2373–2384.

[55] G. Guo, Q. Wang, Fuel-efficient en route speed planning and tracking control of truck platoons, IEEE Transactions on Intelligent Transportation Systems 20 (2018) 3091–3103.

[56] S. Brechtel, T. Gindele, R. Dillmann, Probabilistic MDP-behavior planning for cars, in: 2011 14th International IEEE Conference on Intelligent Transportation Systems (ITSC), IEEE, pp. 1537–1542.

[57] R. Bellman, A Markovian decision process, Journal of Mathematics and Mechanics (1957) 679–684.

[58] R. S. Sutton, A. G. Barto, Reinforcement Learning: An Introduction, MIT Press, 2018.

[59] H. Mouhagir, R. Talj, V. Cherfaoui, F. Aioun, F. Guillemard, Integrating safety distances with trajectory planning by modifying the occupancy grid for autonomous vehicle navigation, in: 2016 IEEE 19th International Conference on Intelligent Transportation Systems (ITSC), IEEE, pp. 1114–1119.

[60] M. Kamrani, A. R. Srinivasan, S. Chakraborty, A. J. Khattak, Applying Markov decision process to understand driving decisions using basic safety messages data, Transportation Research Part C: Emerging Technologies 115 (2020) 102642.

[61] C. You, J. Lu, D. Filev, P. Tsiotras, Highway traffic modeling and decision making for autonomous vehicle using reinforcement learning, in: 2018 IEEE Intelligent Vehicles Symposium (IV), IEEE, pp. 1227–1232.

[62] S. Zhou, Y. Wang, M. Zheng, M. Tomizuka, A hierarchical planning and control framework for structured highway driving, IFAC-PapersOnLine 50 (2017) 9101–9107.

[63] I. G. Jin, G. Orosz, Optimal control of connected vehicle systems with communication delay and driver reaction time, IEEE Transactions on Intelligent Transportation Systems 18 (2016) 2056–2070.

[64] T. D. Gillespie, Fundamentals of Vehicle Dynamics, SAE Technical Paper, 1992.

[65] Y. Sugiyama, M. Fukui, M. Kikuchi, K. Hasebe, A. Nakayama, K. Nishinari, S.-i. Tadaki, S. Yukawa, Traffic jams without bottlenecks—experimental evidence for the physical mechanism of the formation of a jam, New Journal of Physics 10 (2008) 033001.

[66] S.-i. Tadaki, M. Kikuchi, M. Fukui, A. Nakayama, K. Nishinari, A. Shibata, Y. Sugiyama, T. Yosida, S. Yukawa, Phase transition in traffic jam experiment on a circuit, New Journal of Physics 15 (2013) 103034.