Maneuver-based Anchor Trajectory Hypotheses at Roundabouts
Mohamed Hasan1, Evangelos Paschalidis2, Albert Solernou2, He Wang1,
Gustav Markkula2 and Richard Romano2
Abstract: Predicting future behavior of the surrounding
vehicles is crucial for self-driving platforms to safely navigate
through other traffic. This is critical when making decisions like
crossing an unsignalized intersection. We address the problem
of vehicle motion prediction in a challenging roundabout
environment by learning from human driver data. We extend
existing recurrent encoder-decoder models to be advantageously
combined with anchor trajectories to predict vehicle behaviors
on a roundabout. Drivers’ intentions are encoded by a set
of maneuvers that correspond to semantic driving concepts.
Accordingly, our model employs a set of maneuver-specific
anchor trajectories that cover the space of possible outcomes
at the roundabout. The proposed model can output a multi-
modal distribution over the predicted future trajectories based
on the maneuver-specific anchors. We evaluate our model using
the public RounD dataset, and the experimental results show the
effectiveness of the proposed maneuver-based anchor regression
in improving prediction accuracy, reducing the average RMSE
by 28% relative to the best baseline. Our code is available at
https://github.com/m-hasan-n/roundabout.
I. INTRODUCTION
Predicting future agent states is a crucial task for robot
planning and control in general [1]. In this paper, we address
this problem for self-driving vehicles to be able to predict
behaviors of surrounding vehicles, which is vital for safe and
efficient driving. For example, it is important when deciding
whether to yield to a coming vehicle or to merge into traffic.
To safely drive vehicles and navigate through other traffic,
human drivers need to anticipate the intentions of other road
users. Experienced human drivers are able to confidently
infer future behaviors of the surrounding vehicles [2]. This is
critical when making decisions like crossing an unsignalized
intersection.
Fully autonomous vehicles should likewise
be able to predict the future states of their environment and
respond appropriately. Predicting the behavior of human
drivers is necessary for such autonomous platforms to share
the road with humans [3]. More importantly, being able
to predict human driver behaviors will lead to human-like
responding strategies, which is crucial for human road users
to predict the autonomous vehicle behavior. While human
drivers can do this prediction seamlessly, it is still an open
problem for self-driving cars.
Vehicle motion prediction is challenging due to the inher-
ent uncertainty about latent variables such as the motivations
of road users. Additionally, driver behavior tends to be multi-
modal: a driver could make one of many decisions under the
same traffic circumstances [4].
1School of Computing, University of Leeds
2Institute for Transport Studies, University of Leeds
Although this problem has been extensively studied on
highways or on highly structured intersections, fewer works
have addressed unsignalized intersections like roundabouts [5].
Roundabouts are popular in urban areas as they do not
require expensive traffic lights, improve safety and have
higher throughput than a four-way stop. Being able to
safely navigate these highly dynamic scenarios is critical for
autonomous vehicles to function properly.
We focus on predicting the trajectories of the human-driven
vehicles surrounding an autonomous vehicle in a roundabout
environment. In line with existing work [4], [6], this is
achieved by iteratively predicting the trajectory of each agent
surrounding the autonomous vehicle. We refer to the surrounding
vehicle whose trajectory is being predicted as the “ego” and to
the vehicles around it as its “neighbors”. The concrete task is:
given the observed motion trajectories of natural human
drivers (position and heading over the past, e.g., 2 s) for an ego
vehicle and all its neighbor vehicles on a roundabout,
predict the future trajectory of this ego vehicle. We seek a
predictive model that outputs a multi-modal distribution over
the predicted trajectory.
We propose a model that employs a fixed set of anchor
trajectories that cover the space of possible outcomes at
the roundabout environment. The anchor trajectories are
associated with a corresponding set of maneuvers that model
the (short-term) intention of human drivers. These maneuvers
(intentions) represent semantic concepts like “slow down”
and “advance to a specific zone of the roundabout”.
Our model can hierarchically factor the uncertainty inher-
ent in the prediction process. First, the maneuver prediction
captures the uncertainty about the driver’s intention and is
encoded as a distribution over a set of discrete intentions. We
model a discrete set of anchor trajectories corresponding to
the maneuver set covering the space of modelled human
intentions. Second, given the intended maneuvers and the
corresponding anchor trajectory, the predicted trajectory models
the uncertainty in performing such maneuvers by estimating a
maneuver-specific residual from the given anchor trajectory.
Thus, we introduce an encoder-decoder based model for
vehicle motion prediction that can evaluate the likelihood
of each maneuver and regress a maneuver-specific future
trajectory segment as a residual from the corresponding
anchor trajectory, providing a multi-modal distribution over
the predicted future trajectories based on the maneuver-
specific anchors.
The contributions of our work are threefold: (1) incorporating
anchor trajectories into recurrent models to predict
vehicle behaviors, (2) parameterizing the whole maneuver
space in a roundabout by maneuver-specific anchor
trajectories, and (3) extending a pooling strategy to account for
vehicle poses that are predicted as a multi-variate Gaussian
distribution.
II. RELATED WORK
Recurrent networks for trajectory prediction. Recurrent
Neural Networks (RNNs) represent a rich class of dynamic
models which extend feedforward networks for sequence
generation [7] in diverse domains like speech recognition
[8], machine translation [9] and image captioning [10].
Motion prediction is considered as a sequence generation
task. Hence, a number of RNN-based approaches have been
proposed for trajectory prediction [2]–[4], [6], [11]–[13] of
pedestrians and vehicles. In their seminal work, Social LSTM
(Long Short-Term Memory), Alahi et al. [6] extended RNNs
to human trajectory prediction using a social pooling layer
that models nearby pedestrians. Gupta et al. [3] proposed a
Generative Adversarial Network (GAN) [14], comprising an RNN
encoder-decoder generator and an RNN-based encoder discriminator,
to predict socially acceptable multimodal pedestrian
trajectories. The Social LSTM approach was further improved in
[4] by using convolutional social pooling applied to vehicle
motion prediction on highways. An LSTM network was also used
to predict the location of vehicles in an occupancy grid
[15] at different future intervals. Convolutional networks and
LSTMs were combined to predict multi-modal trajectories for
an agent on a bird’s eye view image [16]. The majority of
the vehicle prediction literature addressed the problem on
highways and structured intersections, whereas we focus on
the vehicle trajectory prediction on roundabouts.
Multimodal predictive distribution. Deo et al. proposed
a model [4] that outputs a multi-modal predictive distribution
over future trajectories based on maneuver classes. Three
lateral and two longitudinal maneuver classes were considered.
The MultiPath model [1] predicts a discrete distribution
over a set of future state-sequence anchors and outputs
multi-modal future distributions. The GAN-based encoder-decoder
architecture in [3] encourages diverse multimodal predictions
of pedestrian trajectories through a variety loss. A
multi-modal probabilistic prediction approach based on a
Conditional Variational Autoencoder was presented in [17]; it
is capable of jointly predicting the sequential motions of each
pair of interacting vehicles. Zyner et al. [2] presented an
RNN-based model with a mixture density network output layer
for predicting driver intent at urban single-lane roundabouts
through multi-modal trajectory prediction with uncertainty.
Neither maneuver recognition nor anchor-based regression
was considered in [17] or [2], where the roundabout problem
was addressed.
Anchor trajectories. The concept of predefined anchors
has been effectively applied in machine learning and computer
vision to handle multi-modal problems [18]–[20].
These approaches predict the likelihood of anchors as well as
continuous refinements of state conditioned on these anchors.
For vehicle trajectory prediction, the MultiPath model
[1] employs a fixed set of trajectory anchors that are found
in the training data via unsupervised learning. At inference
time, the model predicts a discrete distribution over the
anchors and, for each anchor, regresses offsets from anchor
waypoints along with uncertainties. Although the set of
anchors can be regarded as the driver intents, these anchors
were not associated with semantic concepts like “slow down”
or “lane change”.
Phan-Minh et al. introduced CoverNet [21] for multimodal,
probabilistic trajectory prediction for urban driving. The
trajectory prediction problem was framed as classification
over a diverse set of trajectories [22]. The trajectory set was
structured to cover the state space and to eliminate physically
impossible trajectories.
Fig. 1: The roundabout zones after segmentation.
Our work contrasts with the previous approaches in the
following aspects: (1) we design the anchor trajectories to
correspond to the human driver maneuvers, (2) our model
is based on recurrent (not convolutional like [1], [21] and
[22]) networks to encode the trajectory sequence history
and decode its future prediction, and (3) it can capture the
interdependencies between all the participating vehicles (not
being restricted by a specific grid size like [6] and [4]).
III. ROUND DATASET
The RounD dataset [23] is a new dataset of naturalistic
road user trajectories recorded at German roundabouts. Traf-
fic was recorded using a drone at three different locations,
and the trajectory for each road user and its type was
extracted. Using state-of-the-art computer vision algorithms,
the positional error is typically less than 10 centimetres. The
dataset provides the X and Y position, velocity and acceleration,
heading, lateral and longitudinal velocity and accelerations
of the tracked objects in a local coordinate system, recorded
at 25 Hz. In addition, the RounD dataset contains the
dimensions (length and width) and class (e.g. car, truck,
pedestrian) of the tracked objects.
A. Dataset Processing
The roundabout environment was segmented, following
[24], into zones that include the entrance lanes, the main
roundabout areas, the conflict zones and the exit lanes
(Fig. 1). To focus on vehicle trajectory prediction in
the roundabout scenario, we refined the dataset to exclude
trajectories (1) of the pedestrian, bicycle and motorcycle classes,
and (2) in non-relevant lanes (IDs 5, 8, 11 and 200 in Fig. 1). The
vehicle behavior long before it enters the roundabout or long
after it leaves the roundabout is out of the scope of this paper.
Therefore, we additionally removed the trajectory segments
that are farther away from the roundabout (entries/exits) than
the vehicle length.
IV. PROBLEM FORMULATION
In line with previous work [1], [4], we formulate the
problem of vehicle trajectory prediction as estimating the
probability distribution of the possible future trajectories of
an ego vehicle conditioned on its track history and on the
track histories of the vehicles around it, at each discrete time
step $t$. Thus, given observations $X$ of past trajectories of the
ego and its neighbor vehicles in a scene, our model seeks
to provide a distribution over future trajectories of the ego
vehicle, $P(Y|X)$.
A. Inputs and Outputs
The inputs to our model are track histories:
$$X = [x^{(t-t_h)}, \ldots, x^{(t-1)}, x^{(t)}] \tag{1}$$
where $t_h$ is a fixed (history) time horizon, and at any time
instant $t$,
$$x^{(t)} = [x_0^{(t)}, y_0^{(t)}, \theta_0^{(t)}, x_1^{(t)}, y_1^{(t)}, \theta_1^{(t)}, \ldots, x_n^{(t)}, y_n^{(t)}, \theta_n^{(t)}] \tag{2}$$
represents the pose of the ego vehicle (subscript 0), and
the surrounding vehicles (subscripts 1 to $n$). The pose is
given by the position ($x$ and $y$ coordinates) and orientation
$\theta$. We assume that the sensors/computers on-board the
autonomous vehicle can measure/compute the pose of the
ego and neighbor vehicles. The output of the model is a
probability distribution:
$$Y = [y^{(t+1)}, \ldots, y^{(t+t_f)}] \tag{3}$$
where $t_f$ is a fixed (future) time horizon, and:
$$y^{(t)} = [x_0^{(t)}, y_0^{(t)}, \theta_0^{(t)}] \tag{4}$$
represents the future pose of the ego vehicle being predicted
at any time instant $t$.
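To make the layout of (1)-(4) concrete, the following minimal Python sketch assembles the history $X$ and the ground-truth counterpart of $Y$ from per-vehicle pose tracks. The function names `build_history` and `build_future`, the list-of-tuples data layout, and the assumption that every vehicle is observed at every step are illustrative choices and are not taken from the released code.

```python
from typing import List, Tuple

Pose = Tuple[float, float, float]  # (x, y, theta)


def build_history(tracks: List[List[Pose]], t: int, t_h: int) -> List[List[float]]:
    """Return X as in (1): one row per step from t - t_h to t.

    Each row follows (2): the ego pose (tracks[0]) followed by the poses of
    the n neighbors. Assumes every track covers all requested steps.
    """
    history = []
    for step in range(t - t_h, t + 1):
        row: List[float] = []
        for track in tracks:
            row.extend(track[step])
        history.append(row)
    return history


def build_future(ego_track: List[Pose], t: int, t_f: int) -> List[Pose]:
    """Return the ego poses for steps t+1 .. t+t_f, matching (3) and (4)."""
    return [ego_track[step] for step in range(t + 1, t + t_f + 1)]
```

With one ego and $n$ neighbors, each row of the history has $3(n+1)$ entries, and the future contains $t_f$ poses.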
B. Multi-modal Probabilistic Trajectory Prediction
The multi-modal distribution over future trajectories can
be hierarchically factorized [1]. First, estimate the intent
uncertainty over the set of maneuver classes, and get the
corresponding anchor trajectory. For example, this may be
the uncertainty about advancing to a specific zone of the
roundabout with an intended speed profile (e.g. slowing
down). Second, given the intended maneuvers and their
corresponding anchor, predict the vehicle trajectory as a
residual (offset) from the given anchor trajectory.
Maneuvers are chosen to semantically represent how human
drivers may plan their decisions. Drivers may first
decide a target zone, then choose some acceleration to reach
this target according to the traffic situation. Accordingly,
intentions of drivers are modelled using two discrete sets
of maneuver types: (1) location-wise maneuvers
$M_l = \{m_{l_p}\}_{p=1}^{P}$, and (2) acceleration-wise maneuvers
$M_a = \{m_{a_q}\}_{q=1}^{Q}$. We model the uncertainty over each discrete set
of maneuvers using a softmax distribution. For example, the
distribution over $M_l$ is given by:
$$P(m_{l_p}|X) = \frac{\exp(f_{l_p}(X))}{\sum_i \exp(f_{l_i}(X))} \tag{5}$$
where $f_{l_p}(X)$ is the output of a deep neural network whose
input is the encoding of the history trajectories.
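The softmax in (5) can be sketched in a few lines of plain Python. The function name `softmax` and the use of raw score lists are assumptions made for illustration; subtracting the maximum score is a standard numerical-stability step that leaves the result of (5) unchanged.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Map network outputs f_{l_i}(X) to class probabilities as in (5)."""
    shift = max(scores)  # subtracting a constant does not change the ratio in (5)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

The same function applies unchanged to the acceleration-wise maneuver scores.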
Correspondingly, we model a discrete set of $K$ maneuver-specific
anchor trajectories $A = \{a_k\}_{k=1}^{K}$. Each anchor
trajectory is modelled as a sequence of future poses of the
ego vehicle:
$$a_k = [y_k^{(t+1)}, \ldots, y_k^{(t+t_f)}] \tag{6}$$
Anchor trajectories are defined to cover the joint maneuver
space, i.e. $K = P \times Q$. The distribution over the anchor
trajectories can be computed from the probabilities of the
maneuver classes as follows:
$$P(a_k|X) = P(m_{l_p}|X)\,P(m_{a_q}|X) \tag{7}$$
where anchor $a_k$ corresponds to the maneuver pair $(m_{l_p}, m_{a_q})$.
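As a small illustration of (7), the anchor probabilities can be formed as all pairwise products of the two maneuver distributions. The function name `anchor_probabilities` and the flat index ordering $k = pQ + q$ are hypothetical choices for this sketch; the paper does not specify an ordering.

```python
from typing import List


def anchor_probabilities(p_location: List[float],
                         p_acceleration: List[float]) -> List[float]:
    """Return P(a_k | X) for all K = P * Q anchors, following (7).

    The anchor for location class p and acceleration class q is stored at
    index k = p * Q + q (an assumed ordering, for illustration only).
    """
    return [p_l * p_a for p_l in p_location for p_a in p_acceleration]
```

Because each input sums to one, the $K$ products also sum to one and can serve directly as mixture weights.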
Given the intended maneuvers and the corresponding
anchor trajectory, the uncertainty over the future trajectory
is modelled as a Gaussian distribution:
$$P_{\Theta_k}(Y|a_k, X) = \mathcal{N}(Y \mid a_k + \mu_k(X), \Sigma_k(X)) \tag{8}$$
where in the Gaussian distribution mean $a_k + \mu_k(X)$, the
term $\mu_k(X)$ represents an offset from the given anchor
trajectory $a_k$. This allows the model to refine the static
maneuver-specific anchor trajectories to the current context
[1], with variations coming from the history trajectories of
the vehicles in the scene. This distribution is parameterized
by [4]:
$$\Theta_k = [\Theta_k^{(t+1)}, \ldots, \Theta_k^{(t+t_f)}] \tag{9}$$
which are the parameters of a multi-variate Gaussian distribution
at each time step in the future. At any time $t$, this is given
by the Gaussian parameters $\Theta_k^{(t)} = [\mu_k^{(t)}, \Sigma_k^{(t)}]$, corresponding
to the mean and variance of the future vehicle pose as an
offset from the anchor trajectory pose at the same time step.
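The role of the anchor in the mean of (8) can be illustrated with a minimal log-density computation for a single future pose. This sketch assumes a diagonal covariance, which is a simplification of the multi-variate Gaussian used in the paper; the function name `gaussian_log_density` and its argument layout are hypothetical.

```python
import math
from typing import Sequence


def gaussian_log_density(pose: Sequence[float],
                         anchor_pose: Sequence[float],
                         offset: Sequence[float],
                         variance: Sequence[float]) -> float:
    """Log of N(pose | anchor_pose + offset, diag(variance)) for one time step.

    The mean is the anchor pose plus the predicted offset, as in (8).
    A diagonal covariance is assumed here purely to keep the sketch short.
    """
    log_p = 0.0
    for y_i, a_i, mu_i, var_i in zip(pose, anchor_pose, offset, variance):
        mean_i = a_i + mu_i
        log_p += -0.5 * (math.log(2.0 * math.pi * var_i)
                         + (y_i - mean_i) ** 2 / var_i)
    return log_p
```

Summing this quantity over the $t_f$ future steps gives the log-likelihood of a whole trajectory under one anchor, and its negative is the kind of term a negative log-likelihood loss would minimise.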
The multi-modal output conditional distribution over future
trajectories can now be expanded in terms of the anchors
as:
$$P(Y|X) = \sum_k P(a_k|X)\,P_{\Theta_k}(Y|a_k, X) \tag{10}$$
which yields a Gaussian Mixture Model (GMM) distribution
[1]. The mixture weights are defined by the probabilities
of the anchors.
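The mixture in (10) is usually evaluated in log space to avoid underflow. The sketch below assumes the per-anchor log-densities $\log P_{\Theta_k}(Y|a_k, X)$ have already been computed; the function name `mixture_log_likelihood` is hypothetical and the log-sum-exp formulation is a standard numerical device rather than something stated in the paper.

```python
import math
from typing import List


def mixture_log_likelihood(anchor_probs: List[float],
                           component_log_densities: List[float]) -> float:
    """Return log P(Y | X) as in (10) using a log-sum-exp over anchors.

    Anchors with zero probability are skipped; at least one anchor must
    have a positive probability.
    """
    terms = [math.log(w) + log_p
             for w, log_p in zip(anchor_probs, component_log_densities)
             if w > 0.0]
    shift = max(terms)
    return shift + math.log(sum(math.exp(term - shift) for term in terms))
```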
V. MANEUVER-BASED ANCHOR TRAJECTORIES
A. Maneuver Classes
We consider two types of maneuvers: location-wise $M_l$
and acceleration-wise $M_a$ maneuvers. The first type defines
the intended future location on the roundabout ($P = 8$
classes), whereas the second type defines the intended speed
profile ($Q = 3$ classes) when advancing to occupy this space.
Fig. 2: An example of location-wise maneuver classes. The
shown trajectories belong to vehicles of the same class, since their
future trajectories will occupy the section grouping lanes 106 and 116.
The circular infrastructure of the roundabout is segmented
into 16 zones (101 to 108 and 111 to 118 as shown in Fig.
1). We group these zones into eight sections which define the
location-wise classes. For example, the trajectories shown in
Fig. 2 are for vehicles belonging to the same class, as their
future trajectories will occupy the section grouping the two
zones 106 and 116. This includes vehicles: advancing from
the previous section (the one grouping zones 105 and 115),
entering the roundabout (from zones 1 and 2) and changing
zones within the same section (between zones 106 and 116).
Although this definition of location classes depends on the
given roundabout segmentation, our model can cope with
any given segmentation as long as it provides a reasonable
number of well-distinguished classes.
Three acceleration-wise maneuvers are considered: slowing
down, constant speed and speeding up. The RounD
dataset provides the acceleration of each vehicle. We annotated
trajectory segments with an acceleration less than $-a$ or greater
than $+a$ (for a threshold $a$) with the slowing or speeding class
labels, respectively, and marked all other trajectory segments
with the constant-speed class.
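The labelling rule above amounts to a symmetric threshold test. In the following sketch, the function name `acceleration_class`, the label strings, and the threshold value passed by the caller are all illustrative; the paper does not state the numeric threshold.

```python
def acceleration_class(acceleration: float, threshold: float) -> str:
    """Assign one of the three acceleration-wise maneuver labels.

    `threshold` is the positive value a in the text; its numeric value is
    not given in the paper and must be chosen by the user.
    """
    if acceleration < -threshold:
        return "slowing_down"
    if acceleration > threshold:
        return "speeding_up"
    return "constant_speed"
```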
According to the intended entry and exit lanes, a vehicle
trajectory passes through a sequence of zones of the round-
about. This is accompanied by a sequence of acceleration
intents (e.g. slowing down before the entrance or speeding up
once accepting a gap). Accordingly, the pose of the vehicle at
each time instant is annotated with the labels of the location
and acceleration classes.
B. Anchor Trajectories
Anchor trajectories are defined over the joint
location-acceleration maneuver space. This results in a total
of $K = 24$ anchor trajectories corresponding to the grid of
maneuvers: for each of the eight spatial classes in Fig. 2,
there are three anchor trajectories that correspond to this zone of
the roundabout (one anchor for each acceleration class).
Given a vehicle trajectory, the pose at each time $t$ is
annotated with two maneuver labels as described in Sec. V-A. The
future trajectory segment from $t+1$ to $t+t_f$ is extracted and
annotated with the same maneuver labels. The vehicle poses of
this segment are referenced to the pose at time $t$. Repeating
this process for all trajectories and for each time step, we end
up with $K$ trajectory sets, each of which includes a number of
trajectory segments annotated with the same maneuver labels.
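One way to reference the poses of a future segment to the pose at time $t$ is a rigid transform into the vehicle frame at $t$. The sketch below assumes that "referenced" means translating by the reference position and rotating by the negative reference heading; the paper does not spell this out, and the names `to_local_frame` and `future_segment` are hypothetical.

```python
import math
from typing import List, Tuple

Pose = Tuple[float, float, float]  # (x, y, theta)


def to_local_frame(pose: Pose, ref: Pose) -> Pose:
    """Express `pose` in the frame whose origin and heading are given by `ref`."""
    dx, dy = pose[0] - ref[0], pose[1] - ref[1]
    cos_r, sin_r = math.cos(-ref[2]), math.sin(-ref[2])
    x_local = cos_r * dx - sin_r * dy
    y_local = sin_r * dx + cos_r * dy
    # Wrap the heading difference into [-pi, pi).
    d_theta = (pose[2] - ref[2] + math.pi) % (2.0 * math.pi) - math.pi
    return (x_local, y_local, d_theta)


def future_segment(track: List[Pose], t: int, t_f: int) -> List[Pose]:
    """Extract poses t+1 .. t+t_f and reference them to the pose at time t."""
    ref = track[t]
    return [to_local_frame(track[step], ref) for step in range(t + 1, t + t_f + 1)]
```

Under this convention the reference pose itself maps to the zero pose, so segments starting anywhere on the roundabout become directly comparable.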
Fig. 3: The set of anchor trajectories. Each of the 8 location
classes includes 3 anchors (corresponding to the acceleration
classes) that are plotted with the same color.
Anchor trajectories are obtained by finding the best
representative trajectory of each set. This is done in our work
by finding the mean trajectory of a subset of empirically
chosen candidates from each set. The resulting set of anchor
trajectories employed in this work are shown in Fig. 3. Each
anchor trajectory is modelled as a sequence of future poses
of the ego vehicle (as given by (6)) that starts with a zero
pose and has a fixed number of $t_f$ discrete time steps.
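Computing a mean trajectory from a set of candidate segments can be sketched as a per-step, per-coordinate average. The function name `mean_trajectory` is hypothetical, the candidate selection ("empirically chosen" in the text) is left to the caller, and the plain arithmetic mean of headings assumes that relative headings stay far from the $\pm\pi$ wrap-around.

```python
from typing import List, Tuple

Pose = Tuple[float, float, float]  # (x, y, theta), relative to the reference pose


def mean_trajectory(segments: List[List[Pose]]) -> List[Pose]:
    """Average equally long trajectory segments step by step.

    Assumes all segments have the same number of steps and that heading
    values do not wrap around, so a plain arithmetic mean is meaningful.
    """
    count = len(segments)
    steps = len(segments[0])
    return [
        tuple(sum(segment[step][dim] for segment in segments) / count
              for dim in range(3))
        for step in range(steps)
    ]
```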
VI. MODEL
The proposed model, shown in Fig. 4, consists of an
LSTM encoder, a pooling module, a maneuver recognition
module and an LSTM decoder.
A. Encoder
We encode the state of motion of each vehicle using an
LSTM encoder. At any time $t$, a sequence of $t_h$ time steps of
the track history is passed through the encoder. The LSTM
states for each vehicle are updated frame by frame over the $t_h$
past frames [4]. This is applied to all vehicles: the one whose
future is being predicted (ego) and its surrounding vehicles
(neighbors). The final LSTM (hidden) state of each vehicle
is assumed to encode its motion dynamics, and the LSTM
weights are shared across the sequences of all vehicles [6].
B. Pooling Module
The interaction of vehicles in a given scene is captured
by the pooling module. This is achieved by pooling the
LSTM states of all the neighbor vehicles around the ego
vehicle. The output is a pooling vector summarizing the
context information. Social Pooling [6] and its convolutional
extension [4] addressed this problem by proposing a grid
based pooling scheme. However, this solution fails to capture
global context, and may not suit the roundabout environment.
Hence, we extend the pooling mechanism in [3] to model
vehicle poses.
Fig. 4: Proposed model having the following modules: (1) encoder: learns the vehicle dynamics, (2) pooling: learns the
interaction between vehicles, (3) maneuver: estimates maneuver class probabilities, and (4) decoder: outputs a multi-modal
distribution of the ego vehicle future conditioned on the maneuver-based anchor trajectories.
First, at any time $t$, the neighbor vehicles around the
ego vehicle are identified. The relative pose between each
neighbor and the ego vehicle is then computed, relative to
the ego pose at $t$. Second, the relative poses are concatenated
with each vehicle’s LSTM hidden state, and processed
independently by a Multi-Layer Perceptron (MLP). Finally,
the MLP output is max-pooled to compute the pooling
vector of the ego vehicle. This method can capture the
interdependencies among all vehicles, without being restricted to
a specific grid size.
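The pooling steps above can be sketched without any deep-learning library. In this illustration a single fully connected layer with a leaky ReLU stands in for the MLP (batch normalisation is omitted), the weights are supplied by the caller, and the names `embed` and `pooling_vector` are hypothetical; it is not the released PyTorch implementation.

```python
from typing import List, Sequence


def leaky_relu(value: float, slope: float = 0.01) -> float:
    return value if value > 0.0 else slope * value


def embed(features: Sequence[float],
          weights: Sequence[Sequence[float]],
          bias: Sequence[float]) -> List[float]:
    """One fully connected layer plus leaky ReLU, a stand-in for the MLP."""
    return [leaky_relu(sum(w * f for w, f in zip(row, features)) + b)
            for row, b in zip(weights, bias)]


def pooling_vector(relative_poses: Sequence[Sequence[float]],
                   hidden_states: Sequence[Sequence[float]],
                   weights: Sequence[Sequence[float]],
                   bias: Sequence[float]) -> List[float]:
    """Concatenate each neighbor's relative pose with its hidden state,
    embed each neighbor independently, then max-pool element-wise."""
    embedded = [embed(list(pose) + list(hidden), weights, bias)
                for pose, hidden in zip(relative_poses, hidden_states)]
    return [max(column) for column in zip(*embedded)]
```

Because the maximum is taken element-wise over neighbors, the result has a fixed size and does not depend on the number or ordering of neighbors, which is what frees the method from a fixed grid.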
C. Maneuver Recognition
The maneuver recognition module consists of two softmax
layers to recognize the location and acceleration classes. The
input to this module is the LSTM state of the ego vehicle
augmented by the pooling vector. Each softmax layer outputs
the probability of each maneuver class. Probabilities output
by the two layers are multiplied to calculate the probability
for choosing each anchor trajectory ($P(a_k|X)$ in (7)). The
maneuver encoding block in Fig. 4 encodes the output from
each softmax layer into a one-hot vector. Both vectors are
concatenated with the trajectory encoding and the resulting
tensor is passed to the decoder module.
D. Decoder
An LSTM decoder is used to generate the conditional
distribution over future trajectories of the ego vehicle. At any
time $t$, the decoder generates future trajectories over the next
$t_f$ steps in the form of a multi-modal distribution. For each
anchor trajectory, the decoder estimates an anchor-specific
mean and variance residual of the vehicle pose.
VII. EXPERIMENTS AND RESULTS
A. Implementation Details
We use the RounD dataset (consisting of 22 subsets
recorded at three different locations) in our experiments. We
use the data from the 20 subsets that were fully recorded at
the same roundabout (Fig. 1). The whole data are split into
training (71%), validation (10%) and testing (19%) sets. Each
of the three sets is randomly sampled from all the employed
subsets of RounD. Anchor trajectories are computed offline
using the training set.
Vehicle trajectories are split into 6 s segments, where a $t_h = 2$ s
track history and a $t_f = 4$ s prediction horizon are used.
These 6 s segments are sampled at the dataset sampling rate
of 25 Hz, and then downsampled by a factor of 4 before
being passed to the LSTMs, to reduce the model complexity.
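The frame bookkeeping implied by these settings can be sketched as follows. The alignment used here (history ending exactly at the current frame, future starting one downsampled step later) is an assumption, as are the constant and function names; the paper does not specify how the downsampled frames are aligned.

```python
from typing import List, Tuple

FRAME_RATE_HZ = 25  # RounD recording rate
DOWNSAMPLE = 4      # keep every 4th frame (0.16 s between kept frames)
HISTORY_S = 2       # t_h in seconds
FUTURE_S = 4        # t_f in seconds


def segment_indices(t: int) -> Tuple[List[int], List[int]]:
    """Return the frame indices used as history and future around frame t."""
    history_steps = (HISTORY_S * FRAME_RATE_HZ) // DOWNSAMPLE  # 50 // 4 = 12
    future_steps = (FUTURE_S * FRAME_RATE_HZ) // DOWNSAMPLE    # 100 // 4 = 25
    history = [t - i * DOWNSAMPLE for i in range(history_steps, -1, -1)]
    future = [t + i * DOWNSAMPLE for i in range(1, future_steps + 1)]
    return history, future
```

With this alignment the encoder sees 13 poses covering 48 frames (1.92 s) and the decoder predicts 25 poses covering the full 4 s horizon.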
We implemented our proposed model based on the basic
features of [4] using the same negative log likelihood loss.
We extended these features to account for: (1) anchor-based
prediction, (2) the pooling mechanism given in Sec. VI-B,
and (3) the multi-variate Gaussian distribution. The LSTM
encoder and decoder have 32- and 64-dimensional states
respectively. The fully connected layer used to embed the
vehicle dynamics encoding has a size of 16. The MLP used
in the pooling module has a size of 256 and is followed by
batch-normalization and leaky-ReLU layers. The model is
implemented using PyTorch [25] and trained end-to-end.
B. Evaluation metric
All results are reported in terms of the root of the mean
squared error (RMSE) of the predicted trajectories with
respect to the ground truth future trajectories, over the
prediction horizon. Since the LSTM models generate multi-
variate Gaussian distributions, the means of the Gaussian
components are used for RMSE calculation.
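A minimal sketch of the RMSE computation at one prediction horizon is given below. It assumes that the error is the Euclidean position error in metres (heading is ignored) and that the predicted positions are the Gaussian means; the function name `rmse` and the pair-of-floats layout are illustrative.

```python
import math
from typing import List, Tuple

Position = Tuple[float, float]  # (x, y) in metres


def rmse(predicted: List[Position], ground_truth: List[Position]) -> float:
    """Root of the mean squared position error over a set of samples.

    `predicted` holds the Gaussian means at one prediction horizon, one
    entry per test sample; `ground_truth` holds the matching true positions.
    """
    squared_errors = [(px - gx) ** 2 + (py - gy) ** 2
                      for (px, py), (gx, gy) in zip(predicted, ground_truth)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```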
C. Baselines
We characterised the performance of the following models
in increasing complexity:
• 2D: vehicle trajectory is represented by the position ($x$
and $y$), similar to [3], [4], [6]. Maneuvers and anchors
are not included.
• 3D: vehicle trajectory is represented by the pose (position
and heading $\theta$). Maneuvers and anchors are not
included.
• 3D-M: adds the maneuver recognition module to the
previous 3D model.
• 3D-A-P: adds the anchors estimation to the previous
model. This is our full proposed model. At
evaluation time, the model outputs the maximum a posteriori
probability (MAP) trajectory estimate using the anchor
trajectory having the maximum probability.
• 3D-A-W: a variation of our full proposed model. At
evaluation time, the model outputs a weighted sum of
the MAP trajectory estimates from all anchors. The
weights are defined by anchor probabilities $P(a_k|X)$.
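The difference between the last two evaluation modes can be sketched as follows, assuming each anchor already has a predicted mean trajectory (anchor plus offset). The function names `map_trajectory` and `weighted_trajectory` and the nested-list layout are hypothetical.

```python
from typing import List, Sequence, Tuple

Trajectory = List[Tuple[float, ...]]  # one pose tuple per future step


def map_trajectory(anchor_probs: Sequence[float],
                   anchor_means: Sequence[Trajectory]) -> Trajectory:
    """3D-A-P style output: the mean trajectory of the most probable anchor."""
    best = max(range(len(anchor_probs)), key=lambda k: anchor_probs[k])
    return anchor_means[best]


def weighted_trajectory(anchor_probs: Sequence[float],
                        anchor_means: Sequence[Trajectory]) -> Trajectory:
    """3D-A-W style output: probability-weighted sum of all anchor means."""
    steps = len(anchor_means[0])
    dims = len(anchor_means[0][0])
    return [
        tuple(sum(w * traj[step][dim] for w, traj in zip(anchor_probs, anchor_means))
              for dim in range(dims))
        for step in range(steps)
    ]
```

When one anchor dominates, the two outputs nearly coincide; they differ most when the maneuver probabilities are spread over several anchors.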
D. Results
RMSE results of the baseline models are reported in Table
I. Although the results of the 2D model are better than its
counterpart 3D model at the short horizon (up to 1 s), its
prediction accuracy degrades with increasing prediction horizon.
This shows how the heading of a vehicle is a valuable signal,
especially when crossing a roundabout. Using the maneuver
recognition module in the 3D-M model improves the results
further by reducing the RMSE average by nearly 24%.
TABLE I: Performance comparison of the proposed model
in increasing complexity using the RMSE of the predicted
trajectories (m) at different prediction horizons (s).
Prediction Horizon (s)   2D     3D     3D-M   3D-A-P   3D-A-W
1                        0.53   0.55   0.42   0.39     0.38
2                        1.62   1.32   0.87   0.84     0.80
3                        3.10   2.50   1.95   1.87     1.76
4                        4.92   4.02   3.31   3.27     3.08
The full proposed model has been evaluated in two ways:
(1) using the 3D-A-P model that predicts a single MAP
trajectory estimate using the most-likely anchor trajectory,
and (2) through the 3D-A-W model that outputs a GMM
distribution, with mixture weights defined by the probabilities
of the anchors. Table I shows that the latter improves
the prediction accuracy over all other baselines. More im-
portantly, comparing the results shows the effectiveness of
generating a multi-modal distribution against a single best-
estimate.
VIII. CONCLUSIONS
We demonstrated how existing encoder-decoder based
models can be advantageously combined with anchor tra-
jectories to predict vehicle behaviors. We also argue for the
benefit of using anchor trajectories that are based on human
drivers’ intentions. The effectiveness of the proposed model,
which employs multi-modal estimation and anchor-based regression,
has been demonstrated through experiments on the public
RounD dataset. One limitation of the proposed approach is
its pure dependence on vehicle track histories. Contextual
information like road semantics and visual cues can be
used to improve the prediction accuracy. Incorporating such
information into the current model will be addressed in our
future work.
ACKNOWLEDGMENT
The work described in this paper is supported by the VeriCAV
project, which is funded by the Centre for Connected
and Autonomous Vehicles, via Innovate UK (Grant number
104527). This paper is published with kind permission from
the VeriCAV consortium: Horiba MIRA, Aimsun, University
of Leeds, and Connected Places Catapult. This work was un-
dertaken on ARC4, part of the High Performance Computing
facilities at the University of Leeds, UK.
REFERENCES
[1] Y. Chai, B. Sapp, M. Bansal, and D. Anguelov, “Multipath: Multiple
probabilistic anchor trajectory hypotheses for behavior prediction,”
arXiv preprint arXiv:1910.05449, 2019.
[2] A. Zyner, S. Worrall, and E. Nebot, “Naturalistic driver intention and
path prediction using recurrent neural networks,” IEEE transactions
on intelligent transportation systems, 2019.
[3] A. Gupta, J. Johnson, L. Fei-Fei, S. Savarese, and A. Alahi, “Social
gan: Socially acceptable trajectories with generative adversarial net-
works,” in Proceedings of the IEEE Conference on Computer Vision
and Pattern Recognition, 2018.
[4] N. Deo and M. M. Trivedi, “Convolutional social pooling for vehicle
trajectory prediction,” in Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition Workshops, 2018.
[5] A. Zyner, S. Worrall, and E. Nebot, “A recurrent neural network
solution for predicting driver intention at unsignalized intersections,”
IEEE Robotics and Automation Letters, 2018.
[6] A. Alahi, K. Goel, V. Ramanathan, A. Robicquet, L. Fei-Fei, and
S. Savarese, “Social lstm: Human trajectory prediction in crowded
spaces,” in Proceedings of the IEEE conference on computer vision
and pattern recognition, 2016.
[7] A. Graves, “Generating sequences with recurrent neural networks,”
arXiv preprint arXiv:1308.0850, 2013.
[8] J. Chorowski, D. Bahdanau, K. Cho, and Y. Bengio, “End-to-end
continuous speech recognition using attention-based recurrent nn: First
results,” arXiv preprint arXiv:1412.1602, 2014.
[9] J. Chung, K. Kastner, L. Dinh, K. Goel, A. Courville, and Y. Bengio,
“A recurrent latent variable model for sequential data,” arXiv preprint
arXiv:1506.02216, 2015.
[10] O. Vinyals, A. Toshev, S. Bengio, and D. Erhan, “Show and tell:
A neural image caption generator,” in Proceedings of the IEEE
conference on computer vision and pattern recognition, 2015.
[11] F. Altché and A. de La Fortelle, “An lstm network for highway
trajectory prediction,” in IEEE International Conference on Intelligent
Transportation Systems. IEEE, 2017.
[12] S. H. Park, B. Kim, C. M. Kang, C. C. Chung, and J. W.
Choi, “Sequence-to-sequence prediction of vehicle trajectory via lstm
encoder-decoder architecture,” in IEEE Intelligent Vehicles Sympo-
sium. IEEE, 2018.
[13] H. Xue, D. Q. Huynh, and M. Reynolds, “Ss-lstm: A hierarchical lstm
model for pedestrian trajectory prediction,” in IEEE Winter Conference
on Applications of Computer Vision (WACV). IEEE, 2018.
[14] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-
Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial
networks,” arXiv preprint arXiv:1406.2661, 2014.
[15] B. Kim, C. M. Kang, J. Kim, S. H. Lee, C. C. Chung, and J. W. Choi,
“Probabilistic vehicle trajectory prediction over occupancy grid map
via recurrent neural network,” in IEEE International Conference on
Intelligent Transportation Systems, 2017.
[16] A. Bhattacharyya, B. Schiele, and M. Fritz, “Accurate and diverse
sampling of sequences based on a “best of many” sample objective,” in
Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition, 2018.
[17] Y. Hu, W. Zhan, L. Sun, and M. Tomizuka, “Multi-modal probabilistic
prediction of interactive behavior via an interpretable model,” in IEEE
Intelligent Vehicles Symposium. IEEE, 2019.
[18] C. M. Bishop, Pattern recognition and machine learning. Springer,
2006.
[19] D. Erhan, C. Szegedy, A. Toshev, and D. Anguelov, “Scalable object
detection using deep neural networks,” in Proceedings of the IEEE
conference on computer vision and pattern recognition, 2014.
[20] Y. Yang and D. Ramanan, “Articulated human detection with flexible
mixtures of parts,” IEEE transactions on pattern analysis and machine
intelligence, 2012.
[21] T. Phan-Minh, E. C. Grigore, F. A. Boulton, O. Beijbom, and E. M.
Wolff, “Covernet: Multimodal behavior prediction using trajectory
sets,” in Proceedings of the IEEE/CVF Conference on Computer Vision
and Pattern Recognition, 2020.
[22] F. A. Boulton, E. C. Grigore, and E. M. Wolff, “Motion prediction
using trajectory sets and self-driving domain knowledge,” arXiv preprint
arXiv:2006.04767, 2020.
[23] R. Krajewski, T. Moers, J. Bock, L. Vater, and L. Eckstein, “The round
dataset: A drone dataset of road user trajectories at roundabouts in
germany,” submitted.
[24] E. Paschalidis, A. Solernou, M. Hasan, G. Markkula, H. Wang, and
R. Romano, “Estimation and safety validation of a roundabout gap-
acceptance model in a simulated environment,” in Proceedings of the
Road Safety and Simulation Conference, 2021, to appear.
[25] A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito,
Z. Lin, A. Desmaison, L. Antiga, and A. Lerer, “Automatic
differentiation in pytorch,” Conference on Neural Information Processing
Systems, 2017.