Content uploaded by Mao Shan
Author content
All content in this area was uploaded by Mao Shan on Oct 19, 2023
Content may be subject to copyright.
Game Theory-Based Simultaneous Prediction and Planning for
Autonomous Vehicle Navigation in Crowded Environments
Kunming Li1, Yijun Chen1, Mao Shan1, Jiachen Li2, Stewart Worrall1and Eduardo Nebot1
Abstract— Navigating crowded environments with substan-
tial pedestrian interactions poses distinctive challenges for
autonomous vehicles (AVs), primarily due to the interdepen-
dence of prediction and planning modules and the necessity
for vehicles and pedestrians to cooperate to achieve mutual
benefits. In this work, we introduce a novel game theory-based
autonomous vehicle stack designed to address these challenges
while ensuring safe and efficient navigation in such complex
settings. Our primary contributions encompass the development
of an efficient Nash equilibrium approximation, facilitating
simultaneous prediction and planning outputs, and effectively
resolving the “chicken and egg” problem inherent in traditional
autonomous software stacks. Moreover, we formulate a payoff
function for the game that encapsulates the interactions among
agents in the environment and demonstrate that our proposed
stack improves classic probabilistic predictor. Through evalu-
ations conducted using real-world datasets, we show that the
proposed stack not only enhances prediction accuracy but also
improves the planning success rate while mitigating integration
errors. Further results can be accessed at the following link:
https://youtu.be/ctGg4-ifCYQ.
I. INTRODUCTION
Autonomous vehicles (AVs) are commonly designed with
modular architectures that consist of various components
responsible for tasks such as perception, prediction, and
planning. These modular structures are typically favored
due to their verifiability, interpretability, and generalization
capabilities [1] [2].
However, standard modular architectures also present chal-
lenges when integrating individual components into a co-
hesive system. Issues such as compounding errors, infor-
mation bottlenecks, and integration difficulties may emerge
[1]. Compounding errors involve the accumulation of in-
accuracy throughout multiple stages of a modular system,
potentially leading to significant discrepancies in the final
output. Information bottlenecks occur when the data flow
between modules is constrained, limiting the system’s ability
to fully utilize available information and possibly resulting in
suboptimal decision-making. Integration difficulties relate to
the challenges of combining various modules into a seamless
system, often requiring careful coordination and optimization
of interfaces between modules to ensure effective commu-
nication and overall system performance. Addressing these
challenges is crucial when developing a novel autonomous
vehicle stack for safe and efficient navigation in crowded
environments, including interactions with pedestrians.
1Australian Centre for Robotics, The University of Sydney, Sydney, NSW
2006, Australia. E-mails: {k.li, y.chen, m.shan, s.worrall,
e.nebot}@acfr.usyd.edu.au.
2Stanford University, USA. E-mail: jiachen li@stanford.edu.
Fig. 1. A sample result from a real-world dataset [3]. The green
lines represent the future trajectories of pedestrians, while the red line
demonstrates the path planned by the proposed stack.
Moreover, the relationship between prediction and plan-
ning in autonomous vehicles resembles a “chicken and egg”
problem due to their interdependence and cyclical nature [4]
[5]. Accurate predictions are vital for generating safe driving
trajectories, while planning module actions can influence pre-
dictions by altering the vehicle’s behavior and affecting sur-
rounding objects, such as pedestrians. This interdependence
complicates determining which module should be prioritized
and presents challenges in optimizing and coordinating the
two components effectively. Integrated approaches that si-
multaneously address prediction and planning often provide
solutions to this problem, allowing better communication and
coordination between modules, leading to improved safety
and efficiency in autonomous vehicle operation, especially
in crowded environments.
Autonomous navigation in crowded environments can be
viewed as a non-zero-sum game, where all agents, including
vehicles and pedestrians, cooperate for mutual benefit, focus-
ing on primary driving factors such as collision avoidance,
path following, and comfort [6] [7] [8]. However, solving this
game presents several challenges including but not limited to:
1) the reward functions of the agents are generally unknown,
making it difficult to assess the actions’ effectiveness; 2)
finding a Nash equilibrium is computationally demanding
due to the extensive search space and complex interactions
between agents; and 3) incomplete or imperfect information
on agents’ strategies and payoffs adds uncertainty to the
decision-making process.
In this work, we present a novel stack for autonomous
navigation in crowded environments. Our major contribu-
tions include: 1) the development of a game theory-based
autonomous vehicle stack with an efficient Nash equilib-
rium approximation, which enables simultaneous prediction
and planning outputs, effectively addressing the inherent
“chicken and egg” problem in traditional stacks; 2) the defini-
tion of a payoff function for the game, which captures the in-
teractions between the agents in the environment; and 3) the
demonstration that our proposed stack improves probabilistic
prediction. Through evaluations using real-world datasets, we
show that the proposed stack not only improves prediction
accuracy but also increases the planning success rate, while
reducing integration errors. This innovative approach offers a
more efficient and safer solution for autonomous navigation
in highly dynamic environments with a significant level of
vehicle-pedestrian interaction.
II. REL AT ED WORK
A. Probabilistic Models for Pedestrian Motion Prediction
Probabilistic models have been widely employed to quan-
tify prediction uncertainty in the context of pedestrian motion
prediction. Some approaches generate a distribution for each
pedestrian directly [9]–[11], while others sample from a
learned distribution [12], [13]. Previous studies [9], [14],
[15] have demonstrated promising performance in predicting
pedestrian future trajectories by modeling social interactions
across RNN hidden states using pooling layers. Recent
research [16], [17] has shown that Graph Attention Networks
(GATs) can effectively incorporate social interactions as
attention in pooling layers, resulting in superior performance.
Studies by [17], [18] have highlighted the effectiveness of
CNN-based approaches for pedestrian trajectory prediction,
which generally require fewer computational resources than
Recurrent Neural Networks for sequence prediction. Li et
al. [18] proposed a spatial-temporal CNN-based model with
a graph attention network for pedestrian trajectory predic-
tion, which improved prediction accuracy while maintaining
computational efficiency. These works have shown promising
performance in pedestrian motion prediction, but they have
not considered the response from the autonomous vehicle or
the mutual interaction between planning and prediction. In
addition, these probabilistic models also output significant
uncertainty, as the outputs are represented by Gaussian
parameters of the predicted distribution and evaluated by the
best sample.
B. Autonomous Vehicle Stack
Autonomous vehicle stacks can be broadly categorized
into two primary types: modular stacks and end-to-end
stacks. Modular autonomous vehicle stacks emphasize the
employment of neural networks in differentiable AV mod-
ules, such as perception and prediction [19]. Numerous
studies have investigated the tighter integration of prediction
within planning [20]–[26], but these works typically utilize
predictions in planning without enhancing the prediction
module using information from planning. In contrast, end-
to-end stacks learn a policy network directly from data,
benefiting from the elimination of information bottlenecks
and performance scaling with increasing dataset sizes [27]–
[30]. However, fully end-to-end approaches suffer from a
lack of interpretability, which is essential for debugging,
verification, and safety guarantees. Moreover, the general-
izability of purely data-driven methods to unseen or rare-
but-critical scenarios remains unclear [1] [2]. Notably, end-
to-end methods are designed to process raw sensor data
and form decisions based on that information, incorporat-
ing intermediary tasks such as object detection and agent
behavior prediction. However, a crucial obstacle is that the
entirety of this procedure is governed by neural networks,
which complicates the integration with conventional, system-
atic planning and control algorithms. These algorithms are
frequently employed in practical applications due to their
established efficiency and dependability.
C. Game-theoretic Modeling of Crowds Navigation
Game-theoretic modeling can enhance the decision-
making of ego-autonomous vehicles in traffic scenarios. It
models traffic as a game where the ego vehicle aims to reach
its destination safely and efficiently, while pedestrians aim to
walk smoothly and avoid collisions. The reasonable strategies
of the ego vehicle and pedestrians fall into the concept of
Nash equilibrium. For example, if the ego vehicle deviates
from its current planning trajectory, it may potentially result
in damage and prevent itself from reaching its destination.
There is extensive literature on game-theoretic road-user
interaction models in applications such as modeling lane
changes and merging onto motorways, route selection, and
socio-economic choices [31]–[36]. There are three streams
of research based on the number of road-users involved,
including two agents game [33], [34], small group game
[36], and crowd game [35]. Several research gaps exist in
the current literature. The most extensive effort of research
is spent on two-agents game models. Most studies focus on
a limited decision space for agents, involving actions such as
deceleration, maintaining speed, staying in the current lane,
changing lanes, and similar choices. Additionally, apart from
the study conducted by [33], the majority of research lacks
experimental validation using real-world datasets.
III. METHODOLOGY
A. Problem Formulation
Consider a scenario that encompasses an autonomous
vehicle as the ego vehicle indexed by 0and a set of
pedestrians denoted by P={1,2, . . . , P }. The ego vehicle
and pedestrians are predicted to move with varying positions
a[t]
i:= (x[t]
i, y[t]
i)∈ A[t]
i⊆R2, i ∈ N := {0}∪P over
prediction time steps t∈ Tpred := {1,2, . . . , T }, where
those A[t]
i, i ∈ N , t ∈ Tpred represents the position space of
Fig. 2. This flowchart outlines our proposed game theory-based autonomous navigation stack, designed to guide vehicles efficiently and safely through
crowded environments. The multi-agent scene simplifies the interaction dynamics into a two-player game to reduce computational complexity. A predictor
module, grounded in deep learning, uses graph neural encoding to capture the social interactions among pedestrians based on their historical states,
subsequently generating a probabilistic distribution of potential future pedestrian paths. A subset of potential trajectories is then sampled, representing the
predicted pedestrians’ possible trajectories. The stack iterates over all possible strategy support combinations for both players. With each combination, the
stack computes the constraints implied by the strategy supports, distinguishing strategies with either positive or zero probabilities. Support enumeration aids
in identifying optimal response strategies under these constraints. Utilizing a payoff function, our stack enhances the prediction of the crowd’s response
to the ego vehicle’s strategy, and accordingly identifies an optimal navigation strategy that minimizes potential conflicts and ensures safe navigation in
densely populated areas.
the ego vehicle and pedestrians predicted by deep learning
models (See more detail in Section III-B.2).
We now present the normal-form game representation of
the aforementioned scenario. The ego vehicle and pedestrians
are considered to be players indexed in set N. Each player’s
strategy space is considered as any possible time-sequential
positions (x[t]
i, y[t]
i), t ∈ Tpred predicted by probabilistic
models or deep learning predictors. Then, each player i∈ N
selects its strategy ai:= (x[t]
i, y[t]
i)t∈Tpred from its strategy
space Ai:= Πt∈Tpred A[t]
i⊆R2Tto maximize its payoff
Ji. For player 0(the ego vehicle), it aims to achieve its
navigation goal b⋆:= (x⋆
0, y⋆
0)while preventing collision
and thus ensuring safety. Hence, player 0’s payoff function
is such a mapping that
J0(a0,a1,...,aP):Πj∈N Aj→R.
For player i∈ P (pedestrians), it seeks to reduce the jerk of
its movement and maintain comfortable distances from other
players including the ego vehicle and other pedestrians. As a
result, the payoff function of player i∈ P is such a mapping
that
Ji(a0,...,ai,...,aP):Πj∈N Aj→R.
In game theory, Nash equilibrium is a common solution
concept, describing a profile of strategies where no player has
an incentive to unilaterally deviate from their current strategy,
assuming that all other players are also maintaining their
current strategies. The goal of the aforementioned normal-
form game is to determine Nash equilibrium.
B. Efficient Nash Equilibrium Approximation
Solving multi-agent games for autonomous navigation
in crowded environments is challenging due to the vast
strategy space and complex interactions between diverse
agents. Incomplete or imperfect information and the compu-
tationally demanding nature of finding a Nash equilibrium
further complicate real-time decision-making. In this work,
we propose an efficient approach for approximating the
Nash equilibrium, which simplifies the game to a two-player
model, reducing the computational complexity. The strategy
set for the crowds is narrowed down, and the best response
pairs for each player are identified by enumerating pure
strategy supports, enabling a more efficient determination of
the Nash equilibrium.
1) Simplified Game Representation for Efficient Interac-
tion: In this section, we present a simplified version of the
game representation, as discussed in Section III-A. Here, we
consider a scenario involving two primary players. Player
0represents the ego vehicle, while player 1embodies the
surrounding crowds, encompassing all pedestrians in set P.
Player 0selects its strategy a0from its strategy space A0to
optimize its payoff function J0. As player 1signifies all sur-
rounding pedestrians, player 1chooses its strategy {ai}i∈P
from its strategy space Πi∈P Ai⊆R2P T to maximize its
payoff function 1
PPi∈P Ji.
In this game formulation, the ego vehicle’s strategy space
A0originates from alternative trajectories provided by the
planner, whereas pedestrians’ strategy space Πi∈PAiare
the output of the predictor. Although this simplified version
of the game encompasses only two players, it retains the
fundamental characteristics of crowd navigation involving
multiple pedestrians.
The proposed Nash equilibrium approximation algorithm,
presented in Section III-B.3, offers an optimal solution for
non-cooperative interactions between the ego vehicle and
multiple pedestrians. Under this framework, neither the ego
vehicle nor the pedestrians can unilaterally modify their
strategies to attain better outcomes, thereby ensuring a stable
and efficient navigation strategy.
2) Using Deep Learning Predictor to Narrow Strategy
Sets: To approximate a Nash equilibrium for multi-agent
interaction, it is essential to assess all possible scenarios.
Nonetheless, it is quite difficult to enumerate every interac-
tion between all agents and establish the optimal response
for each. To streamline this process, we utilize the Social-
STGCNN pedestrian motion predictor [17], which is trained
on real-world data and comprises two main components:
the Spatio-Temporal Graph Convolution Neural Network
(ST-GCNN) and the Time-Extrapolator Convolution Neural
Network (TXP-CNN).
For a given pedestrian trajectory sequence, we construct
a graph representation of the scene. At each time step, an
undirected graph Gt= (Vt,At)is defined, where each node
vi
t∈Vtsignifies the displacement of agent ifrom the
previous timestep, accounting for Npedestrians. The graph
edges are denoted using a weighted adjacency matrix At. The
entire spatio-temporal graph G=G1,...,GTobs spanning
the observed time period Tobs is utilized as input to the ST-
GCNN.
We employ the graph spatial convolution operation, as
introduced in [17], on G. This generalizes convolution from
an array to graph representation, using a kernel to aggregate
features in a local graph neighborhood across spatial and
temporal dimensions, akin to grid-based CNNs [37] [17]:
ˆ
At=˜
D−1
2
t(At+I)˜
D−1
2
t,(1)
where ˜
Dtdenotes the diagonal degree matrix of node con-
nections for (At+I)at time step t,Irepresents the identity
matrix, and the normalized weight adjacency matrix at time
tis denoted by ˆ
At.
The input to each layer lof the ST-GCNN and TXP-CNN
are shown as Eq. (2a) and Eq. (2c), where Vand ˆ
Adenote
the concatenations of Vtand ˆ
Atalong the time dimension
respectively, for all t∈Tobs :
H(l+1) =σ(ˆ
AH(l)W(l))(2a)
U0=σtxp(HW0
txp)(2b)
U(k+1) =σtxp((UkWk
txp) + Uk), k ≥0,(2c)
where σtxp and Wk
txp represent the activation function and
layer-specific trainable weight matrices, respectively.
This model outputs bi-variate Gaussian distribution ˆ
Yt
i∼
N(ˆµt
i,ˆσt
i,ˆρt
i)for pedestrian i∈ P at each time step
t∈ Tpred, from which discrete strategy space Πi∈P Ai
are sampled. Accounting for the joint distribution of x
and yat each time tduring the sampling process would
significantly increase the computational cost, particularly
when considering the interactions between all pedestrians.
To reduce computational complexity and enable real-time
operation of the algorithm, we disregard this correlation and
directly sample predicted trajectories for all individuals from
the distribution. This method enables us to concentrate on a
narrower set of pedestrian future trajectories, thereby making
it more manageable to pinpoint the best responses for each
agent.
3) Approximating Nash Equilibrium: We propose Algo-
rithm 1to efficiently compute pure strategy NE for simplified
game representation in Section III-B.1. It efficiently finds
pure strategy NE by eliminating strategies that lead to colli-
sions, computing payoff matrices, and iteratively identifying
the best response strategies for each player. Algorithm 1
returns a set of NEs, which represent stable strategies where
neither player has an incentive to unilaterally deviate.
Algorithm 1: ComputePureStrategyNE
Input: Player 0’s discrete strategy space A0and
Player 1’s discrete strategy space Πi∈PAi
Output: Approximated NE(s) a∗
0,a∗
1,· · · ,a∗
P
// Eliminate collision strategies:
1Aelim
0← ∅,Aelim
1← ∅;
2foreach strategy s0in player 0’s A0do
3foreach strategy s1in player 1’s Πi∈P Aido
4if s0and s1do not collide then
5Aelim
0← Aelim
0∪ {s0};
6Aelim
1← Aelim
1∪ {s1};
// Compute payoff matrices Q0and Q1
7foreach strategy s0in player 0’s Aelim
0do
8foreach strategy s1in player 1’s Aelim
1do
9j←index(s0), k ←index(s1);
10 Q0(j, k)←J0(s0, s1);
11 Q1(j, k)←1
PPi∈P Ji(s0, s1);
// Find NE by best response (BR) process
12 NE ← ∅;
13 foreach strategy s0in player 0’s Aelim
0do
// Compute player 1’s BR given s0
14 j←index(s0);
15 BR1= arg maxs1Q1(j, index(s1));
16 foreach strategy s1in BR1do
// Compute player 0’s BR given BR1
17 k←index(s1);
18 BR0= arg maxs0Q0(index(s0), k);
// Pair best response
19 if s0∈BR0then
20 NE ←NE ∪(s0, s1)
21 Return NE
C. Payoff Functions for Optimal Prediction and Path Plan-
ning
We propose payoff functions for pedestrians and au-
tonomous vehicles to optimize safety, efficiency, and pedes-
trian motion prediction in autonomous navigation. These
functions guide each player’s behavior, encouraging strate-
gies for better prediction and path planning. This enhances
the performance and safety of the navigation system.
1) Payoff Function for Ego Vehicle: For autonomous ve-
hicles, the payoff function is formulated to reward strategies
that lead to optimal path planning, taking into considera-
tion several criteria. These include avoiding collisions with
pedestrians, maintaining a safe distance from pedestrians,
and reaching the destination within the shortest possible time.
The function assigns higher rewards to planning paths that
strike a balance between safety and efficiency, resulting in
a smoother navigation experience. The ego vehicle’s payoff
function takes the form
J0=−w1∥a[T]
0−b⋆∥2
|{z }
navigation goal
−w2d1
P T X
t∈Tpred
X
i∈P
1d1(∥a[t]
i−a[t]
0∥2)
|{z }
vehicle safety term
with indicator function 1d1(x)=1if x>dand 1d1(x)=0
otherwise. Here, d1represents the safe distance threshold
between the ego vehicle and pedestrians, serving as a de-
terminant for maintaining collision avoidance. The weight
factors w1and w2play an important role in determining
the relative importance of the navigation goal and the safety
requirement. To be more specific, w1represents the weight
assigned to the “vehicle navigation goal”, which is a measure
of how well the ego vehicle is able to reach its destination.
On the other hand, w2represents the weight assigned to the
“vehicle safety term”, which is a measure of the ego vehicle’s
proximity to pedestrians.
2) Payoff Functions for Pedestrians: For pedestrians, the
payoff function is designed to reward predicted behaviors
that maintain a consistent velocity, avoid collisions with other
pedestrians, and ensure a safe distance between pedestrians
and vehicles. The function takes into account multiple fac-
tors, such as the relative speed and distance between pedes-
trians, and assigns higher rewards to behaviors that adhere to
the desired safety and efficiency objectives. This contributes
to identifying the best motion prediction for pedestrians,
leading to enhanced safety and smoother navigation. The
payoff function of pedestrian i∈ P is characterized by
Ji=−w3
T−3
T−3
X
t=1
∥a[t+3]
i−3a[t+2]
i+ 3a[t+1]
i−a[t]
i∥1
|{z }
pedestrian i’s jerk term
−w4d2
TX
t∈Tpred
1d2(∥a[t]
i−a[t]
0∥2)
| {z }
pedestrian i’s safety term with ego vehicle
−w5d3
T(P−1) X
t∈Tpred
X
j∈P,j =i
1d3(∥a[t]
j−a[t]
i∥2)
| {z }
pedestrian i’s safety term with other pedestrians
,
where d2represents the safe distance threshold between
the pedestrian and the ego vehicle, while d3represents
the safe distance threshold between a pedestrian and other
pedestrians. Specifically, w3represents the weight assigned
to the “pedestrian jerk term”, which is a measure of the
smoothness of the pedestrian’s movement. On the other hand,
w4and w5represent the weight assigned to the “pedestrian
i’s safety term” related to ego vehicle and other pedestrians,
which are two measures of the pedestrian i’s proximity to
the ego vehicle and other pedestrians j∈ P, j =i.
D. Implementation
The implemented predictor model comprises one ST-
GCNN layer and five TXP-CNN layers. Both temporal
convolutions and graph convolutions utilize a kernel size
of 3. The deterministic and probabilistic models employ
and are trained for 150 epochs with a batch size of 128.
Since the VCI dataset does not provide sufficient data for
training, the ETH/UCY dataset is used to train the predictor
model and test it on the VCI dataset. This work employs
the vehicle trajectory planner proposed by [38], which uses
Frenet coordinates to represent road positions more intu-
itively compared to traditional Cartesian coordinates. Frenet
coordinates employ variables sand dto describe a vehicle’s
position on the road or a reference path. These coordinates
offer a simpler mathematical representation of a reference
path, as its run length is described by the s-axis. During the
enumeration of supports, the best planning trajectory that
takes safety and comfort into account is identified.
IV. EXP ERI MEN T
In this experimental section, the main goal is to evaluate
the effectiveness of our proposed stack in terms of reducing
prediction uncertainty and improving integration error. A
series of meticulously designed experiments are conducted
to provide a comprehensive assessment of the stack’s perfor-
mance.
For this experiment, the Vehicle-Crowd Interaction (VCI)
dataset [3] is utilized, and an open-loop validation approach
is adopted, implying that no position feedback is provided
for the ego vehicle. Vehicle trajectories from the VCI dataset
are selected, and our proposed method is applied to perform
path planning. By comparing the resulting path planning to
the ground truth, we can effectively gauge the performance
of our proposed approach.
A. Dataset
In this work, the Vehicle-Crowd Interaction (VCI), ETH
[39] and UCY [40] are utilized. The VCI dataset, collected
from two bustling campus locations, captures interactions
between pedestrians and vehicles in situations where neither
party has priority. One location includes a pedestrian cross-
walk at an unsignalized intersection, while the other consists
of a spacious shared area where pedestrians and vehicles
can move freely. The ETH dataset comprises two top-down
view scenes converted to world coordinates, recorded at an
observation frequency of 2.5 Hz. The UCY dataset contains
single top-down view urban scenes, also observed at a 2.5
Hz frequency.
Owing to the limited size of the VCI dataset, the predic-
tive models require more comprehensive data for training
purposes. Consequently, the larger ETH and UCY datasets
are utilized for training, while the VCI dataset is reserved
for testing and evaluating the performance of the developed
models.
In the testing phase, certain vehicle trajectories within the
VCI dataset that are either very short or remain stationary
are identified and removed. This step ensures a more accurate
and reliable evaluation of the planning model’s performance,
as it focuses on meaningful interactions between pedestrians
and vehicles rather than insignificant or irrelevant data points.
B. Evaluation Metrics
•Average Displacement Error (ADE): The mean Eu-
clidean distance between ground truth and predicted
trajectories for each predicted timestep is computed.
•Final Displacement Error (FDE): The Euclidean dis-
tance between ground truth and predicted trajectories
at the last predicted time step is determined.
•Prediction Collision Rate (COL): This metric represents
the percentage of collision occurrences in the predicted
trajectories for all pedestrians.
•Success Rate (SR): The proportion of planning scenar-
ios in which the vehicle successfully avoids colliding
with pedestrians, among all tested scenarios, is assessed.
C. Baseline
The first baseline method comprises a standard au-
tonomous vehicle stack, which includes prediction and plan-
ning modules. This method predicts 20 possible future tra-
jectories for each pedestrian and subsequently generates path
planning that avoids these predicted trajectories, representing
the conventional approach in autonomous vehicle navigation.
Another baseline method we compare against is a planner
that utilizes ground truth pedestrian future trajectories as in-
put for path planning. This method offers an ideal scenario in
which the planner has access to the actual future trajectories
of pedestrians, allowing us to evaluate the performance of
our proposed stack relative to this idealized baseline. We also
conduct a comparison between our method and a traditional
probabilistic predictor with respect to prediction outcomes.
Our stack picks out one or several potential trajectories from
the predicted set. For comparison purposes, we generate a
single trajectory from both the classic probabilistic predictor
and our proposed stack.
To ensure a fair comparison across different stacks, we
employ the same predictor and implementation methodology
for all methods being evaluated. This approach allows us to
isolate the impact of our proposed method on the overall
performance of the stack in the context of autonomous
vehicle navigation and pedestrian interaction.
V. RE SULTS A ND DISCUSSION
In the experiment presented in this paper, we assess the
effectiveness of our proposed method across several key
aspects of autonomous vehicle navigation and pedestrian
interaction. The results of the experiment demonstrate the
following major findings:
Refined probabilistic prediction: Our proposed stack sig-
nificantly improves probabilistic predictions for future pedes-
trian trajectories. Previous probabilistic prediction models
usually output distributions of pedestrians’ future trajectories,
wherein each point on the predicted trajectory is considered
as an independent event. In the real world, this assumption
may not always be valid, as the movement of pedestrians
often depends on their previous positions and movements,
their current context, and their goals. Therefore, treating
each point as an independent event can lead to considerable
uncertainty in the trajectory prediction. To attenuate the
risk of collisions, it is beneficial for such probabilistic
prediction models to sample a vast array of trajectories
from the predicted distribution. However, this practice may
conversely cause a substantial reduction in the search space
for path planning. In the standard probabilistic predictor,
a distribution of potential future positions is produced, as
demonstrated by the contour in Fig. 3. Our method success-
fully refines the range of predicted trajectory sets, providing
planners with flexibility and accommodating large search
spaces. In standard probabilistic predictions, overlapping
predicted distributions imply that pedestrians are expected
to collide with each other. However, in real-world situations,
pedestrians typically avoid collisions. This discrepancy in-
dicates that the standard probabilistic prediction contains a
considerable amount of uncertainty, which could result in
suboptimal planning decisions. Our proposed method ad-
dresses this issue by incorporating more realistic assumptions
about pedestrian behavior, leading to a significant reduction
in prediction uncertainty. As demonstrated in the table, the
proposed stack reduces collision rates by 4%.
Improved path planning: The proposed stack not only
improves the prediction aspect of the navigation process but
also enhances the overall path planning. By utilizing our
method, the autonomous vehicle is able to generate safer
and more efficient plans for navigating through dynamic
environments involving pedestrian interactions. As shown
in Table I, the game theory-based stack achieves a higher
success rate compared to the standard probabilistic predic-
tion. This is because the standard probabilistic prediction
produces excessive uncertainty, which significantly reduces
the planner’s search space and results in failure to find
successful paths to the destination. It can be observed from
the table that the success rate is the lowest when the planner
uses ground truth for planning. This is because the planner
relies on short-term (5-second) predictions of pedestrian
trajectories to plan the path, which may lead to a scenario
where the vehicle avoids collisions with pedestrians in the
short term but ends up in the most congested area, ultimately
colliding with pedestrians. This local planning approach
leads to a low success rate for this stack. Incorporating a
TABLE I
COMPARISON OF STACK PERFORMANCE: P REDICTION AND PLANNING
IN RE AL-WORLD DATASE T.
Method ADE FDE COL SR
Ground truth - planner / / / 42%
Standard stack (predictor - planner) 1.18 2.08 10% 48%
Ours 0.99 1.71 6% 51%
(a) (b)
(c) (d)
Fig. 3. Figure (a)-(d): Experimental results using real-world data for autonomous vehicle path predictions in crowded environments. Figures (a) and (b)
present a comparison between the autonomous vehicle path predictions (red) and human driver trajectories (blue), illustrating the successful navigation of
the vehicle through the crowds. Figures (c) and (d) compare the pedestrian trajectory predictions from a deep learning model (blue) to those from the
proposed game theoretic stack (red), demonstrating the improvement in probabilistic prediction and enhanced performance of the game theoretic stack.
broader range of pedestrian movement possibilities into the
stack can improve the planner’s success rate. In Figure 1,
due to the reward function’s settings (higher rewards for
the vehicle closer to the destination), the vehicle chooses
a path closer to the endpoint. However, as the vehicle gets
closer to pedestrians, it immediately changes its direction
to maintain a safe distance, selecting an alternative path
with fewer pedestrian interactions to reach the destination.
Figure 2 demonstrates that, in highly congested situations,
the planner’s path is remarkably similar to that of a human
driver.
Enhanced prediction accuracy: Our proposed stack not
only reduces uncertainty but also improves the overall accu-
racy of the predictions. By incorporating the mutual effects
between prediction and planning, our model is capable of
generating more precise estimations of pedestrian trajecto-
ries, leading to more reliable navigation in complex environ-
ments. We compared the predicted trajectory from the pre-
dicted distribution to the prediction in the Nash equilibrium
state, and observed that both the average displacement error
and the final displacement error decreased, as illustrated in
Table I. This demonstrates the effectiveness of our proposed
method in capturing these nuanced pedestrian behaviors and
improving prediction accuracy.
VI. CONCLUSIONS AND FUTURE WORK
This work presents a novel game theory-based autonomous
vehicle stack for safe and efficient navigation in crowded
environments with significant pedestrian interaction. The
proposed stack effectively addresses the challenges of stan-
dard modular architectures, such as compounding errors,
information bottlenecks, and integration difficulties. Our ap-
proach employs an efficient Nash equilibrium approximation
that enables simultaneous prediction and planning outputs,
providing a solution to the “chicken and egg” problem.
The stack incorporates a payoff function capturing agent
interactions in the environment. Evaluations using real-
world datasets demonstrated reduced prediction uncertainty,
enhanced accuracy, and improved planning success rate
while mitigating integration errors. The proposed approach
contributes to autonomous navigation solutions in dynamic
environments and highlights the importance of effective
communication and coordination between prediction and
planning modules.
One limitation of our work is the adoption of open-loop
validation. This methodology implies that the behavior of
the pedestrians remains unaffected by the implementation of
the new game theory planner as the data is predetermined
and obtained from a static dataset. This limits the scope
of interaction and adaptability, thereby potentially affecting
the effectiveness of the autonomous navigation system in
dynamic real-world scenarios. Furthermore, there is ample
room for improvement in the individual modules utilized
in our work. The planning algorithms we employed are
more commonly used in less congested environments, such
as highways, as opposed to densely populated areas. A
comprehensive exploration and integration of diverse planner
modules specifically designed for crowded environments
could potentially augment the performance and robustness of
our proposed navigation stack. Future work should address
these limitations and further improve upon the described
system.
REFERENCES
[1] P. Karkus, B. Ivanovic, S. Mannor, and M. Pavone, “Diffstack: A
differentiable and modular control stack for autonomous vehicles,” in
Conference on Robot Learning. PMLR, 2023, pp. 2170–2180.
[2] K. Leung, S. Veer, E. Schmerling, and M. Pavone, “Learning
autonomous vehicle safety concepts from demonstrations,” arXiv
preprint arXiv:2210.02761, 2022.
[3] D. Yang, L. Li, K. Redmill, and ¨
U. ¨
Ozg¨
uner, “Top-view trajectories:
A pedestrian dataset of vehicle-crowd interaction from controlled
experiments and crowded campus,” in IV, 2019.
[4] S. Eiffert and S. Sukkarieh, “Predicting Responses to a Robot’s Future
Motion Using Generative Recurrent Neural Networks,” in ACRA,
2019.
[5] S. Eiffert, N. Wallace, H. Kong, N. Pirmarzdashti, and S. Sukkarieh,
“Path Planning in Dynamic Environments using Generative RNNs and
Monte Carlo Tree Search,” ICRA, 2020.
[6] S. Bansal, A. Cosgun, A. Nakhaei, and K. Fujimura, “Collaborative
planning for mixed-autonomy lane merging,” in 2018 IEEE/RSJ inter-
national conference on intelligent robots and systems (IROS). IEEE,
2018, pp. 4449–4455.
[7] C. Kim and R. Langari, “Game theory based autonomous vehicles
operation,” International Journal of Vehicle Design, vol. 65, no. 4, pp.
360–383, 2014.
[8] Y. Rahmati, A. Talebpour, A. Mittal, and J. Fishelson, “Game theory-
based framework for modeling human–vehicle interactions on the
road,” Transportation research record, vol. 2674, no. 9, pp. 701–713,
2020.
[9] S. Eiffert, K. Li, M. Shan, S. Worrall, S. Sukkarieh, and E. Nebot,
“Probabilistic Crowd GAN: Multimodal Pedestrian Trajectory Pre-
diction using a Graph Vehicle-Pedestrian Attention Network,” IEEE
Robotics and Automation Letters, 2020.
[10] B. Ivanovic and M. Pavone, “The Trajectron: Probabilistic Multi-Agent
Trajectory Modeling With Dynamic Spatiotemporal Graphs,” in ICCV,
2019.
[11] T. Salzmann, B. Ivanovic, P. Chakravarty, and M. Pavone, “Trajec-
tron++: Multi-agent generative trajectory forecasting with heteroge-
neous data for control,” arXiv:2001.03093, 2020.
[12] A. Gupta, J. Johnson, L. Fei-Fei, S. Savarese, and A. Alahi, “Social
GAN: Socially acceptable trajectories with generative adversarial
networks,” in CVPR, 2018.
[13] V. Kosaraju, A. Sadeghian, R. Mart´
ın-Mart´
ın, I. Reid, S. H.
Rezatofighi, and S. Savarese, “Social-BiGAT: Multimodal Trajectory
Forecasting using Bicycle-GAN and Graph Attention Networks,” in
NeurIPS, 2019.
[14] A. Alahi, K. Goel, V. Ramanathan, A. Robicquet, L. Fei-Fei, and
S. Savarese, “Social LSTM: Human Trajectory Prediction in Crowded
spaces,” in CVPR, 2016.
[15] P. Zhang, W. Ouyang, P. Zhang, J. Xue, and N. Zheng, “SR-LSTM:
State Refinement for LSTM towards Pedestrian Trajectory Prediction,”
in CVPR, 2019.
[16] P. Veliˇ
ckovi´
c, G. Cucurull, A. Casanova, A. Romero, P. Li `
o, and
Y. Bengio, “Graph Attention Networks,” in ICLR, 2018.
[17] A. Mohamed, K. Qian, M. Elhoseiny, and C. Claudel, “Social-
STGCNN: A social spatio-temporal graph convolutional neural net-
work for human trajectory prediction,” in CVPR, 2020.
[18] K. Li, S. Eiffert, M. Shan, F. Gomez-Donoso, S. Worrall, and E. Nebot,
“Attentional-GCNN: Adaptive pedestrian trajectory prediction towards
generic autonomous vehicle use cases,” in ICRA, 2021.
[19] X. Weng, B. Ivanovic, and M. Pavone, “Mtp: multi-hypothesis tracking
and prediction for reduced error propagation,” in 2022 IEEE Intelligent
Vehicles Symposium (IV). IEEE, 2022, pp. 1218–1225.
[20] J. Ziegler, P. Bender, T. Dang, and C. Stiller, “Trajectory planning for
bertha—a local, continuous method,” in 2014 IEEE intelligent vehicles
symposium proceedings. IEEE, 2014, pp. 450–457.
[21] C. Liu, S. Lee, S. Varnhagen, and H. E. Tseng, “Path planning for
autonomous vehicles using model predictive control,” in 2017 IEEE
Intelligent Vehicles Symposium (IV). IEEE, 2017, pp. 174–179.
[22] B. Ivanovic, A. Elhafsi, G. Rosman, A. Gaidon, and M. Pavone, “Mats:
An interpretable trajectory forecasting representation for planning and
control,” arXiv preprint arXiv:2009.07517, 2020.
[23] Y. Chen, U. Rosolia, C. Fan, A. Ames, and R. Murray, “Reactive
motion planning with probabilisticsafety guarantees,” in Conference
on Robot Learning. PMLR, 2021, pp. 1958–1970.
[24] S. Schaefer, K. Leung, B. Ivanovic, and M. Pavone, “Leveraging
neural network gradients within trajectory optimization for proactive
human-robot interactions,” in 2021 IEEE International Conference on
Robotics and Automation (ICRA). IEEE, 2021, pp. 9673–9679.
[25] S. Casas, C. Gulino, S. Suo, and R. Urtasun, “The importance of
prior knowledge in precise multimodal prediction,” in 2020 IEEE/RSJ
International Conference on Intelligent Robots and Systems (IROS).
IEEE, 2020, pp. 2295–2302.
[26] B. Ivanovic and M. Pavone, “Injecting planning-awareness into pre-
diction and detection evaluation,” in 2022 IEEE Intelligent Vehicles
Symposium (IV). IEEE, 2022, pp. 821–828.
[27] M. Bojarski, D. Del Testa, D. Dworakowski, B. Firner, B. Flepp,
P. Goyal, L. D. Jackel, M. Monfort, U. Muller, J. Zhang et al., “End to
end learning for self-driving cars,” arXiv preprint arXiv:1604.07316,
2016.
[28] A. E. Sallab, M. Abdou, E. Perot, and S. Yogamani, “Deep rein-
forcement learning framework for autonomous driving,” arXiv preprint
arXiv:1704.02532, 2017.
[29] A. Jain, L. Del Pero, H. Grimmett, and P. Ondruska, “Autonomy
2.0: Why is self-driving always 5 years away?” arXiv preprint
arXiv:2107.08142, 2021.
[30] A. Kendall, J. Hawke, D. Janz, P. Mazur, D. Reda, J.-M. Allen, V.-D.
Lam, A. Bewley, and A. Shah, “Learning to drive in a day,” in 2019
International Conference on Robotics and Automation (ICRA). IEEE,
2019, pp. 8248–8254.
[31] R. Elvik, “A review of game-theoretic models of road user behaviour,”
Accident Analysis & Prevention, vol. 62, pp. 388–396, 2014.
[32] N. Radwan, W. Burgard, and A. Valada, “Multimodal interaction-
aware motion prediction for autonomous street crossing,” The Interna-
tional Journal of Robotics Research, vol. 39, no. 13, pp. 1567–1598,
2020.
[33] W.-C. Ma, D.-A. Huang, N. Lee, and K. M. Kitani, “Forecasting in-
teractive dynamics of pedestrians with fictitious play,” in Proceedings
of the IEEE Conference on Computer Vision and Pattern Recognition,
2017, pp. 774–782.
[34] A. Turnwald, W. Olszowy, D. Wollherr, and M. Buss, “Interactive
navigation of humans from a game theoretic perspective,” in 2014
IEEE/RSJ International Conference on Intelligent Robots and Systems.
IEEE, 2014, pp. 703–708.
[35] B. L. Mesmer and C. L. Bloebaum, “Modeling decision and game
theory based pedestrian velocity vector decisions with interacting
individuals,” Safety science, vol. 87, pp. 116–130, 2016.
[36] F. T. Johora and J. P. M¨
uller, “Modeling interactions of multimodal
road users in shared spaces,” in 2018 21st International Conference
on Intelligent Transportation Systems (ITSC). IEEE, 2018, pp. 3568–
3574.
[37] T. N. Kipf and M. Welling, “Semi-supervised classification with graph
convolutional networks,” arXiv preprint arXiv:1609.02907, 2016.
[38] M. Werling, J. Ziegler, S. Kammel, and S. Thrun, “Optimal trajectory
generation for dynamic street scenarios in a frenet frame,” in 2010
IEEE International Conference on Robotics and Automation. IEEE,
2010, pp. 987–993.
[39] S. Pellegrini, A. Ess, K. Schindler, and L. Van Gool, “You’ll never
walk alone: Modeling social behavior for multi-target tracking,” in
ICCV, 2009.
[40] A. Lerner, Y. Chrysanthou, and D. Lischinski, “Crowds by example,”
in Computer graphics forum, vol. 26, no. 3, 2007, pp. 655–664.