Adaptive Splitting of Reusable Temporal Monitors for Rare Traffic Violations

Abstract

Autonomous Vehicles (AVs) are often tested in simulation to estimate the probability they will violate safety specifications. Two common issues arise when using existing techniques to produce this estimation: If violations occur rarely, simple Monte-Carlo sampling techniques can fail to produce efficient estimates; if simulation horizons are too long, importance sampling techniques (which learn proposal distributions from past simulations) can fail to converge. This paper addresses both issues by interleaving rare-event sampling techniques with online specification monitoring algorithms. We use adaptive multi-level splitting to decompose simulations into partial trajectories, then calculate the distance of those partial trajectories to failure by leveraging robustness metrics from Signal Temporal Logic (STL). By caching those partial robustness metric values, we can efficiently re-use computations across multiple sampling stages. Our experiments on an interstate lane-change scenario show our method is viable for testing simulated AV-pipelines, efficiently estimating failure probabilities for STL specifications based on real traffic rules. We produce better estimates than Monte-Carlo and importance sampling in fewer simulations.
Craig Innes and Subramanian Ramamoorthy
I. STATISTICAL SIMULATION FOR AV TESTING
Autonomous Vehicles (AVs) typically undergo rigorous
simulated testing before deployment [39]. A standard set of
steps for testing is as follows: First we define a scenario (e.g.,
a highway lane-change expressed in OpenScenario [42]).
Next, we define a safety specification (e.g., “avoid impeding
traffic flow”) in a formal language like Signal Temporal
Logic (STL). Then, we run stochastic simulations to estimate
the probability our AV-system violates our specification [11].
This statistical simulation approach is used because modern
AVs contain “black box” components like Neural-Network
perception modules and non-linear solvers [45]. Such compo-
nents provide few analytical guarantees over their behaviour.
A core problem plaguing statistical simulation is estimat-
ing rare events. Consider a stochastic simulation scenario
where there exists a $10^{-4}$ probability that random noise in
the sensors will cause our AV to “fail” (i.e., to violate our
safety specification). If we ran 100 simulations, it is likely
none would produce a failure. Even if sampling did produce
a failure, estimation variance would be unacceptable [24].
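To make the scale of this problem concrete, the following minimal sketch computes the chance of observing no failures at all, and the relative standard error of a naive Monte-Carlo estimate, for a given failure probability. The function names are illustrative only and are not taken from the paper's code.

```python
import math


def prob_no_failures(p_fail: float, n_sims: int) -> float:
    """Probability that n_sims independent simulations contain zero failures."""
    return (1.0 - p_fail) ** n_sims


def mc_relative_std_error(p_fail: float, n_sims: int) -> float:
    """Relative standard error (std / mean) of the naive Monte-Carlo estimate."""
    return math.sqrt((1.0 - p_fail) / (n_sims * p_fail))


if __name__ == "__main__":
    p, n = 1e-4, 100
    print(f"P(no failures in {n} runs) = {prob_no_failures(p, n):.4f}")
    print(f"relative standard error    = {mc_relative_std_error(p, n):.1f}")
```

With a failure probability of $10^{-4}$ and 100 runs, about 99% of experiments see no failure, and the standard error is roughly ten times the quantity being estimated.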
Many works address rare-event problems for AVs via Importance Sampling [25], [6]: Importance samplers draw simulations from a proposal distribution where the factors leading to a failure occur more frequently. The final estimate of failure probability is then re-weighted to reflect the original distribution. Since we do not know in advance all combinations of states which result in failure, such techniques must learn a good proposal. This learning step has no convergence guarantees, and probability estimates from such adaptive techniques can have unbounded error [3]. Importance sampling also tends to fare better when failures are caused by instantaneous single-state errors, but in the AV domain, failures often occur as a result of accumulated errors over dependent states [9].

Authors are with the School of Informatics, University of Edinburgh, 10 Crichton St, EH8 9AB, United Kingdom. Corresponding author: craig.innes@ed.ac.uk

[Figure 1 annotation] “Preserve Traffic Flow”: $\square_{[0,\infty]}(\neg \text{slow\_leading\_vehicle}(x_{ego}, x_{o_1 \ldots o_3}) \implies \text{preserves\_flow}(x_{ego}))$
Fig. 1: Lane-change. Moving vehicles (blue) shown with trajectory. ‘Ego’ vehicle must avoid static obstacles (red). We monitor the safety constraint shown in English and STL.
This paper instead proposes an approach to AV rare-event
simulation based on merging Adaptive Multi-level Splitting
(AMS) [9] with STL monitoring. AMS relies on estimating
probabilities for a sequence of decreasing failure thresholds
$\gamma_1 > \gamma_2 > \cdots > \gamma_M$, where the final $\gamma_M$ is equivalent to the
rare failure event of interest. The key idea is that estimating
any intermediate $\gamma_i$ (given $\gamma_{i-1}$) is easier than estimating $\gamma_M$
outright. To adapt AMS from estimating isolated phenomena
(e.g., particle transport [30]) to estimating complex AV-
system failures, we face two issues:
The main issue is how to consistently produce simulations
which fall below those intermediate failure thresholds $\gamma_{0 \ldots M}$,
and how to efficiently measure the distance to failure in
the first place. Our approach measures failure using metrics
for evaluating STL specification robustness. By leveraging
online monitoring [13], we can cache the robustness values
of partial trajectories, stop simulations at the point where
they fall below the current threshold, and re-sample from
this point onwards to produce trajectories which fall below
subsequent failure thresholds.
A secondary issue is generating stochastic AV perceptual
errors. Approaches which assume noise follows a well-
known (e.g., Gaussian) state-independent distribution [7] are
insufficient to capture the perceptual variety of a typical AV-
system—a LiDAR detector may be great for close range
traffic, but terrible for long range or occluded traffic [36].
We therefore use a Perception Error Model (PEM) [37]—a
surrogate trained on real sensor data which mimics percep-
tual errors encountered in regular operation (Sec II-A).
arXiv:2405.15771v1 [cs.RO] 13 Mar 2024
The main contribution of this paper is a new method
for assessing failure probability in AV-scenarios (Sec III),
combining AMS (Sec III-B), PEMs (Sec II-A), and online
STL monitoring (Sec III-C). Our experiments focus on the
case study of a highway lane-change scenario, and show our
method can be used to test a full AV-pipeline—perception
down to control (Sec IV). Our approach outperforms Monte-
Carlo sampling, as well as fixed and adaptive importance
sampling, across various STL specifications (Sec IV-A).
To limit the scope of experiments, this paper exclusively
considers probabilistic noise in the perception system as the
primary source of simulation stochasticity (as is standard in
other works [10]). However, our proposed sampling method
can easily be applied to simulators which consider other
sources of stochasticity such as those arising from traffic
behaviour or physical uncertainties.
II. PROBABILITY OF FAILURE IN BLACK BOX SIMULATION
Consider the lane-change maneuver in Fig (1). Our car (the left-most ego vehicle) must change to the left lane to avoid an obstacle, then re-merge. Formally, let’s assume our scenario takes place over a total of $T$ time steps. We denote the $d$-dimensional state of our ego vehicle at time $t$ as $x^{ego}_t \in \mathbb{R}^d$; other vehicles as $x^{o_i}_t$. For succinctness, we write $x_t = \{x^{ego}_t, x^{o_0}_t, \ldots, x^{o_M}_t\}$ for the combined state. The state $x_t$ contains the position, velocity and rotation of each vehicle.
At each time step $t$, the ego vehicle’s control system takes an action $a_t \in \mathbb{R}^2$ (desired acceleration and turn-velocity) with the aim of minimizing costs associated with competing driving goals (e.g., maintaining a reference velocity and minimizing abrupt steering), and subject to constraints (e.g., limits on acceleration, avoiding collisions, staying within road boundaries). We can run a simulation of this system to generate a trajectory $\tau = [(x_0, a_0), \ldots, (x_T, a_T)]$, where $\tau_{[i:j]} = [(x_i, a_i), \ldots, (x_j, a_j)]$ denotes a partial slice. For a given scenario, we wish to test whether our above AV-system will violate an STL safety specification $\varphi$. Due to probabilistic noise in the perception system, our simulator is inherently stochastic. Therefore our aim is to calculate the probability that, for a random run of our simulator, our AV-system will violate $\varphi$:
$$P_{fail} = \mathbb{E}[\mathbb{1}\{\tau \not\models \varphi\}] \tag{1}$$
where $\mathbb{1}\{\tau \not\models \varphi\}$ is an indicator function which returns 1 if $\tau$ violates $\varphi$ and 0 otherwise. To explain how our method efficiently calculates (1), we first cover the pre-requisites for perception, tracking, and control for simulating our AV (Sec II-A-II-B). We then cover defining safety specifications $\varphi$ in STL, and how to quantify their satisfaction using a robustness metric (III-A). We can then describe our main contribution: interleaving online monitoring and Adaptive Multi-level Splitting to estimate a failure probability for AV-systems via statistical simulation (III-B).
Fig. 2: Lane change with PEM observations, tracking, and prediction. Green/Orange crosses show PEM obstacle observations. Purple dots/lines mark estimated/predicted positions.
A. Simulated State Estimation with PEMs
Most AV problems assume our system does not have access to the true state $x_t$. Instead our AV must estimate this state via observations from its sensors (see Fig (2)). At time $t$, let us denote a full snapshot of the world by $w_t$. This snapshot implicitly contains the relevant ground-truth state $x_t$, but also other scene information (e.g., vehicle types, dimensions etc.). A sensor $S$ can be thought of as a function which takes $w_t$ and produces high-dimensional raw sensor data $S(w_t)$ (e.g., a LiDAR point-cloud). This sensor data is then passed to a perception function $f$ (e.g., a neural-network obstacle detector [38]), to produce an observation $y_t \in \mathbb{R}^d$ (e.g., the bounding boxes of other vehicles relative to the ego):
$$y_t = f(S(w_t)) \tag{2}$$
By using standard tracking algorithms [48], we can use these observations to get an estimate $\hat{x}_t$ of the current state:
$$\hat{x}_t = \mathbb{E}[x \mid y_{0 \ldots t}] \tag{3}$$
If we have a one-step vehicle dynamics model $f_{dyn}$, we can use it to predict the state in future time steps:
$$\hat{x}_{t+i|t} = \underbrace{f_{dyn} \circ \cdots \circ f_{dyn}}_{i \text{ times}}(\hat{x}_t) \tag{4}$$
Here, $\hat{x}_{t+i|t}$ denotes the predicted state at time $t+i$ given an estimate at $t$. In a real-world system, this setup allows us to sense, estimate, track and predict the state; in simulation, we have a problem: $f$ is typically a data-driven perception module trained on real sensors, but most simulators cannot generate high fidelity sensor inputs (e.g., photo-realistic images). We can resolve this issue by re-framing our perception system as a noisy projection from the state space to observation space:
$$y_t = f(S(w_t)) = H x_t + \epsilon(g(w_t)) \tag{5}$$
In this view, $f$ is composed of a $d \times d$ projection matrix $H$ on state $x_t$, plus stochastic error dependent on the current world state $w_t$. The $\epsilon$ function is a surrogate model known as a Perception Error Model [37]. This is a probabilistic model of the original AV’s perception noise, dependent on salient features $g(w)$ extractable from simulated $w$. Salient features can include obstacle positions, dimensions, occlusion, or environment factors. We model $\epsilon$ as a Gaussian process [49], where $m$, $\kappa$ are mean/kernel functions¹:
$$\epsilon(w) \sim \mathcal{GP}(m(g(w)), \kappa(g(w), g(w'))) \tag{6}$$
Now, instead of using real-world sensor inputs directly,
our simulator applies probabilistic noise from the PEM based
on the current simulated state. This makes each run of
the simulator inherently stochastic, as different amounts of
perceptual noise may be applied to observations on each run.
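The noisy-projection view of Eq (5) can be sketched in a few lines. This is only an illustration under strong assumptions: $H$ is taken to be the identity, the salient feature $g(w)$ is just the ego-to-obstacle distance, and the Gaussian process of Eq (6) is replaced by independent Gaussian noise whose standard deviation grows with range. All names and constants below are hypothetical.

```python
import math
import random
from typing import Tuple

Point = Tuple[float, float]


def salient_feature(ego_xy: Point, obstacle_xy: Point) -> float:
    """g(w): here only the ego-to-obstacle distance (a hypothetical choice)."""
    return math.dist(ego_xy, obstacle_xy)


def sample_observation(ego_xy: Point, obstacle_xy: Point, rng: random.Random,
                       base_std: float = 0.05, std_per_metre: float = 0.01) -> Point:
    """Return y = H x + eps(g(w)), with H the identity and range-dependent noise."""
    distance = salient_feature(ego_xy, obstacle_xy)
    std = base_std + std_per_metre * distance  # assumed: noise grows with range
    return (obstacle_xy[0] + rng.gauss(0.0, std),
            obstacle_xy[1] + rng.gauss(0.0, std))


rng = random.Random(0)
near = sample_observation((0.0, 0.0), (5.0, 0.0), rng)   # small perturbation
far = sample_observation((0.0, 0.0), (80.0, 0.0), rng)   # larger perturbation
```

Because the noise is drawn afresh on every call, two runs of the same scenario yield different observations, which is the source of simulator stochasticity described above.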
B. Model Predictive Control for Highway Maneuvers
For a sufficiently complex control task such as lane changing, choosing the best actions $a_t$ at each time step is a non-linear constrained control task. We phrase the controller of our AV system under test as a Receding-Horizon Model-Predictive Control optimization [41]. Eq (7) provides a formal definition of the optimization problem: At each time step $t$, the controller aims to choose actions $a_{t:(t+H)}$ over a finite time horizon $H$ which minimize a cost function $J(x, a)$. Actions must be chosen subject to a set of constraints $c_j(x, a)$ and obey physical dynamics $f_{dyn}$:
$$\begin{aligned} \min_{x_{t:(t+H)},\, a_{t:(t+H)}} \quad & \sum_{k=t}^{t+H} J(x_k, a_k) \\ \text{s.t.} \quad & x_t = \hat{x}_t \\ & \forall k,\; x_{k+1} = f_{dyn}(x_k, a_k) \\ & \forall j,\; c_j(x_k, a_k) < 0 \end{aligned} \tag{7}$$
Cost function $J(x, a)$ balances multiple factors such as
tracking a reference velocity, minimizing abrupt movement,
and staying close to the lane centre. The constraint functions
$c_j(x, a)$ ensure states and actions remain feasible (e.g., that
the car stays within road bounds and acceleration limits).
We give a further breakdown of the cost function and
implementation in Section (IV). However, the purpose of
describing Eq (7) here in the context of our testing problem
is to highlight that our AV-controller represents yet another
“black-box” component of our system: Despite behaving
deterministically, solvers for nonlinear control problems are
not guaranteed to find a global solution, and can perform
arbitrarily poorly [18]. Other typical methods (such as
reinforcement learning) pose the same problem.
III. ESTIMATING SPECIFICATION FAILURE PROBABILITY
We have defined our testing goal and outlined the compo-
nents of our system-under-test. Now we can show how we
formalize our safety properties, how we draw samples from
the simulator, and how those aspects interact.
A. Specifying Safety with Signal Temporal Logic
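We can express AV traffic rules involving statements about continuous values over time using Signal Temporal Logic [33]. STL has grammar:
$$\varphi := \top \mid \eta \mid \neg\varphi \mid \varphi_1 \land \varphi_2 \mid \square_I \varphi \mid \lozenge_I \varphi \mid \varphi_1\, U_I\, \varphi_2 \mid H_I \varphi \mid O_I \varphi \tag{8}$$
Here, $\eta$ is any predicate $\rho(s) - b \geq 0$ (with $b \in \mathbb{R}$ and $\rho : X \to \mathbb{R}$). $\square_I \varphi$ means $\varphi$ is always true at every future time in interval $I$. $\lozenge_I \varphi$ means $\varphi$ must eventually be true in $I$. $\varphi_1\, U_I\, \varphi_2$ means $\varphi_1$ must remain true within $I$ until $\varphi_2$ is true. $H_I$, $O_I$ are past versions of $\square_I$, $\lozenge_I$.

¹ The nuances of GP-inference and kernel choice are beyond the scope of this paper, but see [17] for discussion.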
We can convert $\tau$ and $\varphi$ to a robustness metric $L(\varphi, \tau, i)$ over trajectories [19]. This measures how strongly $\tau$ satisfies $\varphi$ (starting from time $i$). Large positive values indicate robust satisfaction; negative values a strong violation; near-zero values a trajectory on the boundary of satisfaction/violation. Eq (9) shows a subset of $L$'s semantics. Other operators follow similar definitions [13].
$$\begin{aligned} L(\rho(\tau) > 0, \tau, i) &= \rho(x_i) \\ L(\neg\varphi, \tau, i) &= -L(\varphi, \tau, i) \\ L(\varphi_1 \land \varphi_2, \tau, i) &= \min(L(\varphi_1, \tau, i), L(\varphi_2, \tau, i)) \\ L(\square_I \varphi, \tau, i) &= \inf_{i' \in i + I} L(\varphi, \tau, i') \end{aligned} \tag{9}$$
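The four cases of Eq (9) translate almost directly into a recursive evaluator. The sketch below is a minimal illustration, not the paper's implementation: the class names are hypothetical, time is discrete, and the "always" window is clipped to the end of the trajectory (an assumption that Eq (9) leaves implicit).

```python
import math
from dataclasses import dataclass
from typing import Callable, Sequence, Union

State = Sequence[float]


@dataclass(frozen=True)
class Pred:
    rho: Callable[[State], float]  # predicate "rho(x) > 0"


@dataclass(frozen=True)
class Not:
    sub: "Formula"


@dataclass(frozen=True)
class And:
    left: "Formula"
    right: "Formula"


@dataclass(frozen=True)
class Always:
    lo: int     # window start, in time steps
    hi: float   # window end, in time steps (math.inf for unbounded)
    sub: "Formula"


Formula = Union[Pred, Not, And, Always]


def robustness(phi: Formula, traj: Sequence[State], i: int = 0) -> float:
    """Batch robustness L(phi, traj, i) for the operators listed in Eq (9)."""
    if isinstance(phi, Pred):
        return phi.rho(traj[i])
    if isinstance(phi, Not):
        return -robustness(phi.sub, traj, i)
    if isinstance(phi, And):
        return min(robustness(phi.left, traj, i), robustness(phi.right, traj, i))
    if isinstance(phi, Always):
        last = int(min(i + phi.hi, len(traj) - 1))  # clip window to the trajectory
        window = range(i + phi.lo, last + 1)
        return min((robustness(phi.sub, traj, j) for j in window), default=math.inf)
    raise TypeError(f"unsupported formula: {phi!r}")


# "Always keep more than 2 m of headway" on a four-step toy trajectory.
headway = [(5.0,), (3.5,), (2.5,), (4.0,)]
spec = Always(0, math.inf, Pred(lambda x: x[0] - 2.0))
margin = robustness(spec, headway)  # smallest margin over the whole trajectory
```

Each call re-walks the trajectory, so the cost is proportional to its length; this is the batch behaviour that Section III-C replaces with cached, online updates.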
B. Estimating the Rare Event with Splitting
With our PEM, AV-controller, specification $\varphi$ and metric $L$, we now describe our adaptive sampling contribution: Given $N$ simulated trajectories $\tau^{(1 \ldots N)}$, we wish to calculate Eq (1), the probability a simulation violates $\varphi$. A naive approach might use Monte-Carlo sampling:
$$\mathbb{E}[\mathbb{1}\{L(\varphi, \tau, 0) < \gamma\}] \approx \frac{1}{N} \sum_{i=1}^{N} \mathbb{1}\{L(\varphi, \tau^{(i)}, 0) < \gamma\} \tag{10}$$
However, when violating $\varphi$ is rare, the number of simulations needed to achieve low relative error rapidly becomes infeasible [24]. We instead take a multi-level splitting approach [9]. Rather than immediately estimating failure ($\gamma = 0$) we instead estimate decreasing thresholds $\gamma_1 > \gamma_2 > \cdots > \gamma$. At each stage $m$, starting with $N$ trajectories we take two steps. First, we discard trajectories that do not fall below threshold $\gamma_m$. Second, we replenish back up to $N$ trajectories. To replenish discarded trajectories, we clone a random un-discarded $\tau^{(i)}$ up to $t$, the first time step where $L(\varphi, \tau^{(i)}_{[0:t]}, 0) < \gamma_m$. Then, we re-simulate $\tau^{(i)}$ from $t$ to $T$. This ensures all $N$ trajectories at stage $m$ are below $\gamma_m$. With staged, partial re-samplings, Eq (1) becomes:
$$\prod_{m=1}^{M} P\big(L(\varphi, \tau, 0) < \gamma_m \mid L(\varphi, \tau, 0) < \gamma_{m-1}\big) \tag{11}$$
Given enough levels, each conditional probability should be significantly larger than $P(L(\varphi, \tau, 0) < \gamma)$. The final computation:
$$\hat{p}_{ams} = \left( \prod_{m=1}^{M} \frac{N - K_m}{N} \right) \times \frac{1}{N} \sum_{i=1}^{N} \mathbb{1}\{L(\varphi, \tau^{(i)}, 0) < \gamma\} \tag{12}$$
has $M$ stages and $N$ initial simulations, where $K_m$ is the number of discards per stage. Unlike adaptive importance sampling, AMS guarantees convergence as $N \to \infty$ [8].
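As a worked instance of Eq (12), the sketch below multiplies the per-stage survival fractions by the final failure fraction. The function name and the numbers in the example are illustrative assumptions, not results from the paper.

```python
from typing import Sequence


def ams_estimate(n: int, discards_per_stage: Sequence[int], n_final_failures: int) -> float:
    """Eq (12): product of (N - K_m) / N over stages, times the final failure fraction."""
    estimate = n_final_failures / n
    for k_m in discards_per_stage:
        estimate *= (n - k_m) / n
    return estimate


# Hypothetical run: N = 250 trajectories, K_m = 25 discards in each of 30 stages,
# and 200 of the final 250 trajectories below the failure threshold.
p_hat = ams_estimate(250, [25] * 30, 200)
```

Each stage with $K_m = 25$ and $N = 250$ scales the estimate by 0.9, so reaching a probability of order $10^{-3}$ takes several dozen stages, while every individual conditional probability stays easy to estimate.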
C. Adaptive Splits via Online STL Monitors
To achieve our high-level goal of adapting AMS to AV testing of STL specifications, we currently face a slight computational dilemma: Our robustness metric $L(\varphi, \tau, 0)$ defines a single batch robustness value from the start to the end of a complete trajectory, and takes computation time proportional to the length of $\tau$. However, in Section (III-B), we saw that the discard and replenishment steps require access to the robustness values at arbitrary prefixes of trajectories. In other words, AMS re-simulation requires online computation of all partial trajectories:
$$\left\{ L(\varphi, \tau^{(i)}_{[0:t]}, 0) \;\middle|\; t \in [0, T],\ i \in [0, N] \right\} \tag{13}$$
We resolve this dilemma by taking a key insight from the online monitoring literature: we can interleave partial computations into the AMS process. To achieve this interleaving, we must first slightly alter our definition of $L$ from (9) to define our metric in terms of partial rather than full trajectories. Eq (14) is a modified metric $L_n$ (where $n$ references the “nominal semantics” of [13], augmented with past operators). This definition makes explicit that we only partially evaluate trajectory $\tau$ up to fixed time step $t$:
$$\begin{aligned}
L_n(\rho(x) > 0, \tau_{[0:t]}, i) &= \rho(\tau_{[i]}) \\
L_n(\neg\varphi, \tau_{[0:t]}, i) &= -L_n(\varphi, \tau_{[0:t]}, i) \\
L_n(\square_I \varphi, \tau_{[0:t]}, i) &= \inf_{i' \in (i + I) \cap [0,t]} L_n(\varphi, \tau_{[0:t]}, i') \\
L_n(\lozenge_I \varphi, \tau_{[0:t]}, i) &= \sup_{i' \in (i + I) \cap [0,t]} L_n(\varphi, \tau_{[0:t]}, i') \\
L_n(O_I \varphi, \tau_{[0:t]}, i) &= \sup_{i' \in (i - I) \cap [0,t]} L_n(\varphi, \tau_{[0:t]}, i') \\
L_n(H_I \varphi, \tau_{[0:t]}, i) &= \inf_{i' \in (i - I) \cap [0,t]} L_n(\varphi, \tau_{[0:t]}, i') \\
L_n(\varphi_1\, U_I\, \varphi_2, \tau_{[0:t]}, i) &= \sup_{i_2 \in (i + I) \cap [0,t]} \min\Big( L_n(\varphi_2, \tau_{[0:t]}, i_2),\ \inf_{i_1 \in [i, i_2]} L_n(\varphi_1, \tau_{[0:t]}, i_1) \Big)
\end{aligned} \tag{14}$$
With the above definition, we could now naively compute all members of (13) by re-evaluating $\varphi$ at every $i$, up to every $t$, at every re-sampling step. This is computationally wasteful, as the robustness values of many partial trajectories share many operations with the computations of their prefixes. To take advantage of this fact, we instead maintain a work-list for each $\tau$. A work-list stores a mapping from specification $\varphi$ and time step $t$ to robustness value $L_n(\varphi, \tau_{[0:t]}, 0)$. By using a work-list, we can obtain a robustness value for the partial trajectories at time $t+1$ using just the newly available state $x_{t+1}$ and the previous values of the work-list, rather than repeating computations over the entire trajectory.

Alg (1) takes as input the current work-list at time step $t$ and sketches how it is updated online using the newly arrived state $x_{t+1}$: For predicates $\rho(x)$, incoming state $x_{t+1}$ is added only if it is in $\varphi$’s time horizon. For example, for $\varphi = \square_{[0,2]}(\rho(x) > 0)$, state $x_3$ would not be added, as it falls outside the relevant interval. For formulas like negation and conjunction, pointwise operators are leveraged to combine the existing results from previous sub-formula computations. Similarly for temporal operators, we can use the sliding min-max algorithm of [27] to compute running maxes over the relevant sub-formula intervals. Further optimizations can be added (e.g., replacing chunks of a work-list with ‘summaries’ as sufficient information arrives [13]) but we omit the details here. We can access the robustness of a partial trajectory from the updated work-list by querying w-list[φ][0].
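The sliding-window minimum and maximum used for temporal operators can be computed in linear time with a monotonic double-ended queue. The sketch below is a generic version of that standard technique, offered in the spirit of the sliding min-max algorithm cited above rather than as a transcription of it; the function names are illustrative.

```python
from collections import deque
from typing import Deque, List, Sequence


def sliding_min(values: Sequence[float], width: int) -> List[float]:
    """out[i] = min(values[i : i + width]) for every full window, in O(len(values))."""
    if width <= 0:
        raise ValueError("width must be positive")
    candidates: Deque[int] = deque()  # indices whose values increase left to right
    out: List[float] = []
    for j, v in enumerate(values):
        while candidates and values[candidates[-1]] >= v:
            candidates.pop()          # dominated by the newer, smaller value
        candidates.append(j)
        if candidates[0] <= j - width:
            candidates.popleft()      # oldest index has left the window
        if j >= width - 1:
            out.append(values[candidates[0]])
    return out


def sliding_max(values: Sequence[float], width: int) -> List[float]:
    """Sliding maximum, obtained by negating the input of sliding_min."""
    return [-m for m in sliding_min([-v for v in values], width)]


mins = sliding_min([3, 1, 4, 1, 5, 9, 2, 6], 3)
maxs = sliding_max([3, 1, 4, 1, 5, 9, 2, 6], 3)
```

Every index enters and leaves the deque at most once, which is why appending one new robustness value costs amortized constant time instead of a rescan of the window.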
Algorithm 1: Update Work List (Adapted from [13])
Function upd-wl(w-list, φ, x_{t+1})
  switch φ do
    case ρ(x) > 0 do
      if t+1 is within time horizon of φ then
        w-list[φ][t+1] ← ρ(x_{t+1})
    case ¬ψ do
      upd-wl(w-list, ψ, x_{t+1})
      w-list[φ] ← Pointwise negation of w-list[ψ]
    case ψ1 ∧ ψ2 do
      upd-wl(w-list, ψ1, x_{t+1})
      upd-wl(w-list, ψ2, x_{t+1})
      w-list[φ] ← Pointwise mins of w-list[ψ1] and w-list[ψ2]
    case □_I ψ do
      upd-wl(w-list, ψ, x_{t+1})
      w-list[φ] ← Sliding min window of width |I| across w-list[ψ]
    case ◇_I ψ do
      upd-wl(w-list, ψ, x_{t+1})
      w-list[φ] ← Sliding max window of width |I| across w-list[ψ]
    case ψ1 U_I ψ2 do
      upd-wl(w-list, ψ1, x_{t+1})
      upd-wl(w-list, ψ2, x_{t+1})
      lr-mins ← Pointwise mins of w-list[ψ1] and w-list[ψ2]
      // Calculate backwards inductively
      for i in descending timesteps do
        us[i] ← max(lr-mins[i], min(w-list[ψ1][i], us[i+1]))
      w-list[φ] ← us
    case H_I ψ do
      upd-wl(w-list, ψ, x_{t+1})
      w-list[φ] ← Sliding min window of width |I| across (reversed) w-list[ψ]
    case O_I ψ do
      upd-wl(w-list, ψ, x_{t+1})
      w-list[φ] ← Sliding max window of width |I| across (reversed) w-list[ψ]
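For intuition, the work-list idea can be reduced to its simplest case: a single unbounded "always" over one predicate, where the robustness of each prefix is a running minimum. The class below is a simplified sketch of that special case only (it does not implement the full recursive update of Alg (1)), and its names are hypothetical.

```python
from typing import Callable, List, Optional, Sequence


class AlwaysMonitor:
    """Caches the robustness of 'always rho(x) > 0' for every trajectory prefix."""

    def __init__(self, rho: Callable[[Sequence[float]], float]) -> None:
        self.rho = rho
        self.prefix_robustness: List[float] = []  # entry t is the value for tau[0:t]

    def update(self, state: Sequence[float]) -> float:
        """Fold one new state into the cache in O(1); return the new robustness."""
        value = self.rho(state)
        if self.prefix_robustness:
            value = min(value, self.prefix_robustness[-1])
        self.prefix_robustness.append(value)
        return value

    def first_time_below(self, threshold: float) -> Optional[int]:
        """First time step whose prefix robustness is below threshold, if any."""
        for t, value in enumerate(self.prefix_robustness):
            if value < threshold:
                return t
        return None


monitor = AlwaysMonitor(lambda x: x[0])
for state in [(3.0,), (2.0,), (5.0,), (-1.0,), (4.0,)]:
    monitor.update(state)
```

The cached list is exactly what the splitting step needs: the sampler can look up the first prefix that crosses the current threshold without re-evaluating the specification.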
Now that we can compute and cache STL robustness values for partial trajectories, we can interleave this with our AMS sampler. Alg (2) provides an overview of our sampling technique for computing the probability that our AV violates an STL specification in a stochastic simulation. It takes as input a starting state $x_0$, specification $\varphi$, failure threshold $\gamma$, initial simulation amount $N$ and discard rate $K$.
Lines (2-4, 7) generate $N$ initial trajectories by simulating perceptual observations, control actions, and forward dynamics as outlined in sections (II-A-II-B). Lines (5-6) track the STL robustness value of trajectories at every intermediate time step by maintaining up-to-date work-lists as described in Alg (1). Lines (8, 19) adaptively set discard thresholds $\gamma_m$ such that the $K$ safest trajectories are discarded at each stage. To replenish those discarded trajectories, lines (15-17) copy one of the remaining un-discarded trajectories $\tau^{(j)}$ up until the first time step $t$ where the robustness value of the partial trajectory falls below $\gamma_m$. Line (18) then re-simulates starting from $t$ to produce a new trajectory. We repeat this process until the discard threshold $\gamma_m$ falls below the desired failure threshold $\gamma$, then calculate a final estimate $\hat{p}_{ams}$ using Eq (12).

Algorithm 2: Online STL-AMS
 1  Function stl-ams(x_0, φ, γ, T, K, N)
 2    for i ∈ [1, N], t ∈ [0, T] do
 3      ε_t, y_t, x̂_t, a_t ← Sample observations from PEM (5-6), track via (3) and choose actions by solving (7)
 4      Append x_t, a_t to trajectory τ^(i)
        // Maintain work-list per trajectory (Alg 1)
 5      w-list^(i)_{t+1} ← upd-wl(w-list^(i)_t, φ, x_{t+1})
 6      L^(i)_{t+1} ← w-list^(i)_{t+1}[φ][0]    // Robustness of τ^(i)_[0:t+1]
 7      x_{t+1} ← Step forward simulation
 8    Sort {L^(0)_T ... L^(N)_T} then set γ_0 ← L^(K)_T
 9    m ← 0
10    while γ_m > γ do
11      m ← m + 1
12      Discard all trajectories τ^(i) where L^(i)_T ≥ γ_k
13      I_k ← Indices of remaining un-discarded trajectories
14      for i ∈ [0, N] \ I_k do
15        Select a random j ∈ I_k
16        Find the first time step t where L^(j)_t < γ_k
17        τ^(i)_[0:t], L^(i)_[0:t] ← Copy values from τ^(j)
18        τ^(i)_[t:T], L^(i)_[t:T] ← Re-simulate τ^(j) from time t
19      Sort {L^(0)_T ... L^(N)_T} then set γ_m ← L^(K)_T
20    return p̂_ams via Eqn (12)
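The discard-and-replenish loop can be exercised end to end on a toy problem. The sketch below is a loose illustration under stated assumptions, not the paper's sampler: the "simulator" is a one-dimensional Gaussian random walk, the specification is "always $x > -\text{barrier}$" (so prefix robustness is a running minimum), ties at the threshold are discarded together, and an extinction simply returns 0. All names are hypothetical.

```python
import random
from typing import List


def simulate(prefix: List[float], horizon: int, rng: random.Random) -> List[float]:
    """Extend a partial random-walk trajectory until it has horizon + 1 states."""
    traj = list(prefix)
    while len(traj) < horizon + 1:
        traj.append(traj[-1] + rng.gauss(0.0, 1.0))
    return traj


def prefix_robustness(traj: List[float], barrier: float) -> List[float]:
    """Robustness of 'always x > -barrier' on every prefix (a running minimum)."""
    out, best = [], float("inf")
    for x in traj:
        best = min(best, x + barrier)
        out.append(best)
    return out


def toy_stl_ams(n: int, k: int, horizon: int, barrier: float,
                gamma: float = 0.0, seed: int = 0, max_stages: int = 10_000) -> float:
    rng = random.Random(seed)
    trajs = [simulate([0.0], horizon, rng) for _ in range(n)]
    estimate = 1.0
    for _ in range(max_stages):
        robs = [prefix_robustness(t, barrier) for t in trajs]
        finals = [r[-1] for r in robs]
        level = sorted(finals, reverse=True)[k - 1]  # K-th safest final robustness
        if level <= gamma:
            break
        survivors = [i for i in range(n) if finals[i] < level]
        if not survivors:  # "extinction": nothing left to clone from
            return 0.0
        estimate *= len(survivors) / n
        for i in range(n):
            if finals[i] >= level:  # discarded: replace with a partial clone
                j = rng.choice(survivors)
                cut = next(t for t, r in enumerate(robs[j]) if r < level)
                trajs[i] = simulate(trajs[j][: cut + 1], horizon, rng)
    finals = [prefix_robustness(t, barrier)[-1] for t in trajs]
    return estimate * sum(r < gamma for r in finals) / n


p_hat = toy_stl_ams(n=200, k=20, horizon=20, barrier=10.0, seed=1)
```

Because a clone inherits a prefix that is already below the current threshold and a running minimum can only decrease, every replenished trajectory stays below that threshold, mirroring the guarantee described in Section III-B.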
IV. EXPERIMENTS
The following experiments use our running example of an AV lane-change maneuver to evaluate our method. They demonstrate that Alg (2) can provide failure estimates for a full black-box AV-system across multiple common traffic rules. When compared to baselines, we find Alg (2) provides more accurate failure estimates in fewer simulations. Further, we investigate how sampling performance differs across discard-rates and rule types. Fig (1) shows our CommonRoad simulation setup [2]: The ego starts in the centre-lane at 15 m with velocity 20 m/s. Its primary goal is to track a reference velocity of $v_g = 30$ m/s. Three obstacles impede it: a static obstacle at 40 m (forcing the ego to change lanes); a centre-lane vehicle at 50 m with velocity 5 m/s, which cuts into the overtake lane after 0.6 seconds (slowing the ego or forcing a lane change); and a right-lane vehicle at 50 m with velocity 10 m/s, which merges after 1 second (preventing the ego from slowing abruptly).² Simulations last $T = 40$ steps (4 s, $\Delta t = 0.1$ s). Dynamics evolve according to a kinematic single-track model [40].
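For readers unfamiliar with kinematic single-track models, one forward-Euler step can be sketched as follows. This is an illustrative approximation only: the state layout, the wheelbase value and the use of plain Euler integration are assumptions, and input or steering limits are omitted.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class KSState:
    x: float        # longitudinal position (m)
    y: float        # lateral position (m)
    steer: float    # steering angle (rad)
    v: float        # speed (m/s)
    heading: float  # yaw angle (rad)


def ks_step(s: KSState, steer_rate: float, accel: float,
            dt: float = 0.1, wheelbase: float = 2.6) -> KSState:
    """One forward-Euler step of a kinematic single-track model (assumed wheelbase)."""
    return KSState(
        x=s.x + dt * s.v * math.cos(s.heading),
        y=s.y + dt * s.v * math.sin(s.heading),
        steer=s.steer + dt * steer_rate,
        v=s.v + dt * accel,
        heading=s.heading + dt * s.v / wheelbase * math.tan(s.steer),
    )


# Ego start from the scenario: 15 m along the road at 20 m/s, driving straight.
state = KSState(x=15.0, y=0.0, steer=0.0, v=20.0, heading=0.0)
for _ in range(40):  # T = 40 steps of 0.1 s
    state = ks_step(state, steer_rate=0.0, accel=0.0)
```

With zero inputs the vehicle simply advances 2 m per step, so the 4 s horizon covers 80 m of road; the controller's two inputs (acceleration and turn velocity) enter through `accel` and `steer_rate`.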
Our AV-system under test uses a standard pre-trained lidar-based obstacle detector: OpenPCDet’s Multi-Head PointPillar [26], [47]. As described in Section (II-A), we train a surrogate PEM to replicate the behaviour of this detector in simulation. The PEM is composed of two separate Gaussian Processes: The first is a binary classifier, which predicts whether the lidar perception system would have failed to detect a given obstacle. The second is a regression model, which predicts how much noise would typically be added to the true location of an obstacle’s bounding box. The GPs were fitted using Pyro [4], with an RBF kernel and sparse variational regression with 100 inducing points. As training data for fitting the GPs, we used 65500 entries from the NuScenes Lidar Validation Set.³ The GP input features were a 7-d vector: the x/y obstacle position, rotation, length/width/height dimensions, and obstacle “visibility category” (where 1 means 40% occlusion; 4 means 80%). The GP outputs consist of a 1-d binary variable for successful/unsuccessful obstacle detection, and a 3-d real-valued variable for the offsets between the obstacle’s true x/y/rotation and the PointPillar estimate.

² Full scenario specification and code at github.com/craigiedon/CommonRules
For tracking and predicting future vehicle locations, we
used the interacting multiple models (IMM) filter for lane-
changes from [7]. This method operates similarly to a typical
Kalman filter, but instead of making estimates based on a
single model, it maintains estimates from multiple models
(i.e., for whether the vehicles will stay in the current lane,
switch to the left lane, or switch to the right lane) and merges
those estimates based on each model’s current likelihood.
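The merging step described above can be sketched as a Bayes update of the model weights followed by a weighted average of the per-model estimates. This covers only that combination step under simplifying assumptions; a full IMM filter also mixes states between models through a transition matrix and merges covariances. The function names and numbers are illustrative.

```python
from typing import Dict, List, Sequence


def update_model_probabilities(prior: Dict[str, float],
                               likelihood: Dict[str, float]) -> Dict[str, float]:
    """Posterior weight of each motion model, normalised to sum to one."""
    unnormalised = {name: prior[name] * likelihood[name] for name in prior}
    total = sum(unnormalised.values())
    if total <= 0.0:
        raise ValueError("all models have zero likelihood")
    return {name: weight / total for name, weight in unnormalised.items()}


def merge_estimates(estimates: Dict[str, Sequence[float]],
                    weights: Dict[str, float]) -> List[float]:
    """Probability-weighted mean of the per-model state estimates."""
    dim = len(next(iter(estimates.values())))
    return [sum(weights[m] * estimates[m][d] for m in estimates) for d in range(dim)]


weights = update_model_probabilities(
    {"keep_lane": 1 / 3, "switch_left": 1 / 3, "switch_right": 1 / 3},
    {"keep_lane": 0.6, "switch_left": 0.3, "switch_right": 0.1},
)
lateral = merge_estimates(
    {"keep_lane": [0.0], "switch_left": [3.5], "switch_right": [-3.5]}, weights
)
```

Here the merged lateral-position estimate leans towards the left lane because the left-switch model currently explains the observations better than the right-switch model.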
For model predictive control (Eq (7)) we use the lane-
change controller from [29]. At a high level, its cost function
$J$ comprises 8 sub-goals: Reach a target destination;
track a reference velocity, minimize acceleration, turn veloc-
ity, jerk and heading angle; stay close to the centre of the
nearest lane; and avoid entering the “potential field” of other
obstacles. To solve (7), we used the Gurobi [21] optimizer. To
ensure a feasible control action is always available, we first
pre-solve a convex simplification of (7) with CVXPY [14]
(following [18]). For full implementation details of the cost-
function sub-goals (and the weights used to balance them)
see [29] and the associated code for our paper.
We test our sampling method with respect to 4 formalizations of rules from the Vienna Convention on Road Traffic (taken from [32]). Table (I) shows the STL for each rule. Full definitions of individual predicates (in_same_lane, drives_faster etc.) are in [32], but high level descriptions of each rule are as follows. $\varphi_1$: maintain a minimum distance from vehicles in front (proportional to vehicle speed). If a vehicle “cuts in” from an adjacent lane, the ego gets $t_{cut}$ seconds to re-establish distance. $\varphi_2$: never drop acceleration below “unnecessary” levels (relative to vehicles in front). $\varphi_3$: velocity should never fall below some minimum level (unless stuck in traffic). $\varphi_4$: do not exceed the speed of left-lane vehicles unless merging from an access lane, or left-lane traffic is slow moving.
TABLE I: Interstate traffic rules (Predicate definitions in [32]).
Rule | Description | STL
$\varphi_1$ | Safe Dist from Vehicles | $\square_{[0,\infty]}\big(\text{in\_same\_lane}(x_{ego}, x_{o_i}) \land \text{in\_front\_of}(x_{ego}, x_o) \land \neg O_{[0,t_{cut}]}(\text{cut\_in}(x_o, x_{ego}) \land H_{[1,\infty]}(\neg \text{cut\_in}(x_o, x_{ego}))) \implies \text{keeps\_safe\_distance\_prec}(x_{ego}, x_o)\big)$
$\varphi_2$ | Unnecessary Braking | $\square_{[0,\infty]}(\neg \text{unnecessary\_braking}(x_{ego}, \{x_{o_1, \ldots, o_3}\}))$
$\varphi_3$ | Preserve Traffic Flow | $\square_{[0,\infty]}(\neg \text{slow\_leading\_vehicle}(x_{ego}, x_{o_1 \ldots o_3}) \implies \text{preserves\_flow}(x_{ego}))$
$\varphi_4$ | Don’t Drive Faster than Left Traffic | $\square_{[0,\infty]}\big(\text{left\_of}(x_{o_i}, x_{ego}) \land \text{drives\_faster}(x_{ego}, x_{o_i}) \implies (\text{in\_slow\_traffic}(x_{o_i}, x_{\{o_1, \ldots o_3\}} \setminus x_{o_i}) \land \text{slightly\_higher\_speed}(x_{ego}, x_{o_i})) \lor (\text{on\_access\_ramp}(x_{ego}) \land \text{on\_main\_carriageway}(x_{o_i}))\big)$

Our algorithm (listed as STL-AMS below) uses $N = 250$ starting simulations, a discard amount of $K = 25$, and final failure threshold of $\gamma = 0$. We compare against three baselines: First, a Monte-Carlo sampler (Raw-MC), which runs $N$ simulations, estimating via (10). Second, an importance sampler with a fixed proposal (Imp-Naive). We choose a proposal distribution which deliberately fails to detect 50% of obstacles, and applies Gaussian noise ($\mu = 0$, $\sigma^2 = 1$) to the bounding boxes of those it does detect. Third, a neural network-based importance sampler with an adaptive proposal learned via the cross-entropy method (Imp-CE) from [22]. Its inputs and outputs are the same as those of the GP-PEM described previously. Similar to AMS, adaptive importance sampling proceeds in stages: At each stage $m$ (for a total of $M = 10$ stages), $N_m = 250$ trajectories are sampled and sorted by robustness. Imp-CE then takes the lowest $K = 0.1 N_m$ trajectories⁴ and minimizes their KL-divergence under the target PEM versus current proposal. The intuition is that biasing our proposal towards the least robust trajectories in each stage should train a proposal which samples failure events with increasing probability.

³ https://www.nuscenes.org/nuscenes
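The re-weighting idea behind the importance-sampling baselines can be illustrated on a one-dimensional toy problem. The sketch below estimates a Gaussian tail probability with a fixed, shifted proposal; it stands in for neither the PEM-based proposal of Imp-Naive nor the cross-entropy training of Imp-CE, and all names are illustrative.

```python
import math
import random


def normal_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def importance_estimate(threshold: float, proposal_mu: float, n: int, seed: int = 0) -> float:
    """Estimate P(X > threshold) for X ~ N(0, 1) using samples from N(proposal_mu, 1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.gauss(proposal_mu, 1.0)
        if x > threshold:
            # Likelihood ratio of target over proposal re-weights each "failure".
            total += normal_pdf(x) / normal_pdf(x, mu=proposal_mu)
    return total / n


def exact_tail(threshold: float) -> float:
    """Closed-form P(X > threshold) for a standard normal, for comparison."""
    return 0.5 * math.erfc(threshold / math.sqrt(2.0))


estimate = importance_estimate(threshold=4.0, proposal_mu=4.0, n=20_000)
reference = exact_tail(4.0)
```

A proposal centred on the failure region makes roughly half the samples informative. A poorly placed proposal instead yields a few samples with enormous weights, which is the failure mode reported for fixed proposals in the results that follow.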
A. Results and Discussion
Table (II) shows estimated probabilities per rule. For
“ground-truth”, we computed Eq (10) using 100000 simula-
tions. Across rules, our method produces the most accurate
estimates.
Raw-MC gives a reasonable estimate for φ1 (the least
rare). For other rules with lower probability, raw sampling
yields zero simulation failures, resulting in 0% estimates.
Imp-Naive produces non-zero estimates for all specifications
except φ4, but vastly underestimates failure probability. This
underscores the difficulty of fixed proposals—if a proposal
distribution is too far from the original target, the likelihood
weights per sample vary wildly, with estimates dominated
by a tiny number of simulations. Imp-CE also produces
unreliable estimates. Figs (3a-3c) provide insights why: As
learning progresses for φ3, the number of failures produced
goes up, yet failure probability goes down. This suggests
Imp-CE learns to bias its distribution towards a small set
of unlikely failures, rather than approaching the true failure
distribution. For φ4, robustness thresholds initially decrease,
but flatline around m=6—no failures found. We observe that
this “flatlining” continues even past the M= 10 stages
documented in this paper. This highlights how challenging it
can be to learn relevant features given longer horizons and
state-spaces.
⁴ Standard practice sets $K$ in range $[0.01, 0.2]\, N_m$ [11].

Fig. 3: Imp-CE baseline performance over 10 stages of proposal learning. Panels plot, against stage and for each of $\varphi_1$ to $\varphi_4$: (a) robustness threshold $\gamma_m$; (b) log failure probability ($-\infty$ below axis); (c) number of failures.
The results of Table (II) are encouraging, but we found Alg (2) was sensitive to discard rate $K$. Fig (4) shows how threshold levels evolve at each stage (for $K$ values from 2 to 225). For $\varphi_4$ (the rarest), we found that too low or too high a value of $K$ caused unacceptable numbers of “extinctions” [9]: stages where all trajectories have identical robustness, rendering replenishment impossible.
Experiments demonstrate Alg (2) is viable for accurately estimating specification failure for a black-box AV-system. However, this case study looks only at a single interstate traffic scenario, and our experiments necessarily have limitations: We considered perceptual disturbance as the sole source of simulation stochasticity; vehicle starting configurations remained fixed. Such experiments could be extended by placing a prior distribution over starts [35], without altering the method. To trust our estimates, we also assume our simulator accurately represents reality. Whilst outside this paper’s scope, clearly this assumption may not hold: Our PEM may be an inaccurate surrogate of the test domain (i.e., it would be beneficial to incorporate work on ML-uncertainty calibration [20]). Our scenario also had fixed traffic behaviour, but real traffic is reactive and stochastic. Other works explore these issues in detail [28]. Finally, while it can be seen as an advantage that our method adaptively selects an appropriate number of simulations, and re-uses results from previous simulations, these advantages complicate comparisons of our method to others in terms of sample efficiency. In future work, we aim to compare performance across a wider range of scenarios in terms of fixed computational effort across the full sampling pipeline.

TABLE II: Estimated Failure Probabilities (5 Repetitions)
Method | φ1 | φ2 | φ3 | φ4
Raw-MC_250 | 1.2e-02 (±4.0e-03) | 0.0 (±0.0) | 0.0 (±0.0) | 0.0 (±0.0)
Imp-Naive | 1.4e-08 (±2.5e-08) | 2.8e-08 (±8.4e-08) | 2.0e-08 (±3.90e-08) | 0.0 (±0.0)
Imp-CE | 7.1e-06 (±2.1e-05) | 7.6e-22 (±2.2e-21) | 4.4e-18 (±8.9e-18) | 0.0 (±0.0)
STL-AMS (ours) | 8.8e-03 (±6.2e-03) | 1.5e-03 (±2.1e-03) | 4.7e-03 (±3.4e-03) | 3.5e-04 (±1.8e-03)
Ground Truth | 9.1e-03 | 2.0e-03 | 3.6e-03 | 4.8e-05

Fig. 4: STL-AMS robustness thresholds by stage. Panels plot $\gamma_m$ against stage for each of $\varphi_1$ to $\varphi_4$ at (a) $K = 2$, (b) $K = 25$, (c) $K = 125$, (d) $K = 225$.
V. RELATED AND FUTURE WORK
This paper estimates failure probabilities. Similar tasks
include falsification (find one failure) and adaptive stress
testing (find the most-likely failure) [11]. Such tasks do
not directly accomplish our goal, but may contain insights
for rapidly guiding initial simulations towards failure areas.
A related task is synthesis: constructing controllers which
explicitly obey φ [1]. While synthesis can enforce adherence
to specifications expressed in tractable STL subsets, the
perceptual and control uncertainty in AV scenarios means
testing remains necessary.
Combining splitting and logic has been attempted previously
[23]. Rather than use STL robustness, such works
restrict themselves to heuristic decompositions of linear
temporal logic formulae. This renders them unsuitable for
cyber-physical domains like AV.
One of our experiment baselines was importance sampling.
Proposals are often represented by exponential distributions,
since those have analytic solutions [44], [51]. Neural networks
have also been used to represent the proposal (as we
did) [34]. Most work considers rarity in the context of vehicle
configurations or behaviour. Our work instead considers
rarity in the context of perceptual disturbances. One sampler
category unexplored in this paper is Markov-chain methods
[5]. With appropriate assumptions on metric smoothness and
system linearizability, such techniques have shown promise
in domains with long chains of dependent states [46], [43].
While out of scope, integrating such techniques with online
STL monitoring may prove fruitful.
Despite the asymptotic normality of AMS, both splitting
and sampling lack guarantees on estimation error for fixed N.
Certifiable sampling addresses this with efficiency certificates
[3]—customized samplers with a bound on sampling error
relative to N. However, certification methods depend on first
being able to find the failure regions, so techniques from this
paper remain relevant.
Online and offline algorithms exist to calculate STL
robustness [16], [13]. Typically, efficiency is not considered
in the context of a sampling regime. Yet recent advances
in online monitoring could be leveraged within AMS to
discard infeasible trajectories early, for example by
incorporating system dynamics or causality [50], [12].
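To make the idea of reusable online monitoring concrete, the following is a minimal sketch for one simple formula, "always x_t > c" over a discrete-time trace, whose robustness is the minimum over t of (x_t - c). The running minimum over a prefix can be cached and extended, so a resampled trajectory that branches from a shared prefix does not need that prefix re-evaluated. The class PrefixMonitor, its methods extend and branch, and the example traces are hypothetical. They are not the paper's monitor or any library's API, and richer STL operators need more bookkeeping than a single running minimum.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PrefixMonitor:
    """Running robustness of 'always x_t > c': min over t of (x_t - c)."""

    c: float
    # prefix_rob[t] holds the robustness of the prefix x_0 .. x_t.
    prefix_rob: List[float] = field(default_factory=list)

    def extend(self, x: float) -> float:
        """Consume one new sample and return the updated prefix robustness."""
        margin = x - self.c
        if self.prefix_rob:
            margin = min(self.prefix_rob[-1], margin)
        self.prefix_rob.append(margin)
        return margin

    def branch(self, t: int) -> "PrefixMonitor":
        """Copy cached values up to and including step t for a new suffix."""
        return PrefixMonitor(self.c, self.prefix_rob[: t + 1])


# Parent trajectory, monitored step by step.
parent = PrefixMonitor(c=1.0)
for x in [3.0, 2.5, 4.0, 1.25]:
    parent.extend(x)

# A resampled trajectory reuses steps 0..1, then follows a different suffix.
child = parent.branch(1)
for x in [2.0, 0.5]:
    child.extend(x)

print(parent.prefix_rob)
print(child.prefix_rob)
```

Because branch copies the cached list, extending the child leaves the parent's cached values untouched, which is the property a splitting sampler would rely on when many trajectories share a prefix.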
Our experiments target interstate lane changes. Others
encode rules for intersections, and situational awareness [15],
[31]. In future work, we aim to assess sampling effectiveness
across this diversity of specifications.
REFERENCES
[1] Aasi, E., Vasile, C.I., Belta, C.: A control architecture for provably-
correct autonomous driving. In: 2021 American Control Conference
(ACC). pp. 2913–2918. IEEE (2021)
[2] Althoff, M., Koschi, M., Manzinger, S.: Commonroad: Composable
benchmarks for motion planning on roads. In: 2017 IEEE Intelligent
Vehicles Symposium (IV). pp. 719–726. IEEE (2017)
[3] Arief, M., Bai, Y., Ding, W., He, S., Huang, Z., Lam, H., Zhao,
D.: Certifiable deep importance sampling for rare-event simulation of
black-box systems. arXiv preprint arXiv:2111.02204 (2021)
[4] Bingham, E., Chen, J.P., Jankowiak, M., Obermeyer, F., Pradhan,
N., Karaletsos, T., Singh, R., Szerlip, P.A., Horsfall, P., Goodman,
N.D.: Pyro: Deep universal probabilistic programming. J. Mach. Learn.
Res. 20, 28:1–28:6 (2019), http://jmlr.org/papers/v20/18-403.html
[5] Botev, Z.I., L’Ecuyer, P., Tuffin, B.: Markov chain importance sam-
pling with applications to rare event probability estimation. Statistics
and Computing 23, 271–285 (2013)
[6] Bugallo, M.F., Elvira, V., Martino, L., Luengo, D., Miguez, J., Djuric,
P.M.: Adaptive importance sampling: The past, the present, and the
future. IEEE Signal Processing Magazine 34(4), 60–79 (2017)
[7] Carvalho, A., Gao, Y., Lefevre, S., Borrelli, F.: Stochastic predictive
control of autonomous vehicles in uncertain environments. In: 12th
international symposium on advanced vehicle control. vol. 9 (2014)
[8] Cérou, F., Guyader, A.: Fluctuation analysis of adaptive multilevel
splitting. The Annals of Applied Probability 26(6), 3319–3380
(2016). https://doi.org/10.1214/16-AAP1177
[9] Cérou, F., Guyader, A., Rousset, M.: Adaptive multilevel splitting:
Historical perspective and recent results. Chaos: An Interdisciplinary
Journal of Nonlinear Science 29(4) (2019)
[10] Corso, A., Lee, R., Kochenderfer, M.J.: Scalable autonomous vehicle
safety validation through dynamic programming and scene decom-
position. In: 2020 IEEE 23rd International Conference on Intelligent
Transportation Systems (ITSC). pp. 1–6. IEEE (2020)
[11] Corso, A., Moss, R., Koren, M., Lee, R., Kochenderfer, M.: A survey
of algorithms for black-box safety validation of cyber-physical sys-
tems. Journal of Artificial Intelligence Research 72, 377–428 (2021)
[12] Deng, Z., Eshima, S.P., Nabity, J., Kong, Z.: Causal signal temporal
logic for the environmental control and life support system’s fault
analysis and explanation. IEEE Access 11, 26471–26482 (2023)
[13] Deshmukh, J.V., Donzé, A., Ghosh, S., Jin, X., Juniwal, G., Seshia,
S.A.: Robust online monitoring of signal temporal logic. Formal
Methods in System Design 51, 5–30 (2017)
[14] Diamond, S., Boyd, S.: CVXPY: A Python-embedded modeling lan-
guage for convex optimization. Journal of Machine Learning Research
17(83), 1–5 (2016)
[15] Dokhanchi, A., Amor, H.B., Deshmukh, J.V., Fainekos, G.: Evaluating
perception systems for autonomous vehicles using quality temporal
logic. In: Runtime Verification: 18th International Conference, RV
2018, Limassol, Cyprus, November 10–13, 2018, Proceedings 18. pp.
409–416. Springer (2018)
[16] Donzé, A., Ferrere, T., Maler, O.: Efficient robust monitoring for STL.
In: Computer Aided Verification: 25th International Conference, CAV
2013, Saint Petersburg, Russia, July 13-19, 2013. Proceedings 25. pp.
264–279. Springer (2013)
[17] Duvenaud, D.: Automatic model construction with Gaussian processes.
Ph.D. thesis, University of Cambridge (2014)
[18] Eiras, F., Hawasly, M., Albrecht, S.V., Ramamoorthy, S.: A two-
stage optimization-based motion planner for safe urban driving. IEEE
Transactions on Robotics 38(2), 822–834 (2021)
[19] Fainekos, G.E., Pappas, G.J.: Robustness of temporal logic speci-
fications for continuous-time signals. Theoretical Computer Science
410(42), 4262–4291 (2009)
[20] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of
modern neural networks. In: International conference on machine
learning. pp. 1321–1330. PMLR (2017)
[21] Gurobi Optimization, LLC: Gurobi Optimizer Reference Manual
(2023), https://www.gurobi.com
[22] Innes, C., Ramamoorthy, S.: Testing rare downstream safety violations
via upstream adaptive sampling of perception error models. In: 2023
IEEE International Conference on Robotics and Automation (ICRA).
pp. 12744–12750. IEEE (2023)
[23] Jegourel, C., Legay, A., Sedwards, S.: An effective heuristic for
adaptive importance splitting in statistical model checking. In: Inter-
national Symposium On Leveraging Applications of Formal Methods,
Verification and Validation. pp. 143–159. Springer (2014)
[24] Juneja, S., Shahabuddin, P.: Rare-event simulation techniques: An
introduction and recent advances. Handbooks in operations research
and management science 13, 291–350 (2006)
[25] Kim, Y., Kochenderfer, M.J.: Improving aircraft collision risk esti-
mation using the cross-entropy method. Journal of Air Transportation
24(2), 55–62 (2016)
[26] Lang, A.H., Vora, S., Caesar, H., Zhou, L., Yang, J., Beijbom, O.:
Pointpillars: Fast encoders for object detection from point clouds. In:
Proceedings of the IEEE/CVF conference on computer vision and
pattern recognition. pp. 12697–12705 (2019)
[27] Lemire, D.: Streaming maximum-minimum filter using no more than
three comparisons per element. arXiv preprint cs/0610046 (2006)
[28] Li, J., Sun, L., Zhan, W., Tomizuka, M.: Interaction-aware behav-
ior planning for autonomous vehicles validated with real traffic
data. In: Dynamic Systems and Control Conference. vol. 84287, p.
V002T31A005. American Society of Mechanical Engineers (2020)
[29] Liu, C., Lee, S., Varnhagen, S., Tseng, H.E.: Path planning for
autonomous vehicles using model predictive control. In: 2017 IEEE
Intelligent Vehicles Symposium (IV). pp. 174–179. IEEE (2017)
[30] Louvin, H., Dumonteil, E., Lelièvre, T., Rousset, M., Diop, C.M.:
Adaptive multilevel splitting for monte carlo particle transport. In:
EPJ Web of Conferences. vol. 153, p. 06006. EDP Sciences (2017)
[31] Maierhofer, S., Moosbrugger, P., Althoff, M.: Formalization of in-
tersection traffic rules in temporal logic. In: 2022 IEEE Intelligent
Vehicles Symposium (IV). pp. 1135–1144. IEEE (2022)
[32] Maierhofer, S., Rettinger, A.K., Mayer, E.C., Althoff, M.: Formal-
ization of interstate traffic rules in temporal logic. In: 2020 IEEE
Intelligent Vehicles Symposium (IV). pp. 752–759. IEEE (2020)
[33] Maler, O., Nickovic, D.: Monitoring temporal properties of continuous
signals. In: International Symposium on Formal Techniques in Real-
Time and Fault-Tolerant Systems. pp. 152–166. Springer (2004)
[34] Müller, T., McWilliams, B., Rousselle, F., Gross, M., Novák, J.: Neural
importance sampling. ACM Transactions on Graphics (ToG) 38(5), 1–
19 (2019)
[35] O’Kelly, M., Sinha, A., Namkoong, H., Tedrake, R., Duchi, J.C.: Scal-
able end-to-end autonomous vehicle testing via rare-event simulation.
Advances in neural information processing systems 31 (2018)
[36] Pandharipande, A., Cheng, C.H., Dauwels, J., Gurbuz, S.Z.,
Ibanez-Guzman, J., Li, G., Piazzoni, A., Wang, P., Santra,
A.: Sensing and machine learning for automotive perception:
A review. IEEE Sensors Journal 23(11), 11097–11115 (2023).
https://doi.org/10.1109/JSEN.2023.3262134
[37] Piazzoni, A., Cherian, J., Dauwels, J., Chau, L.P.: Pem: Perception
error model for virtual testing of autonomous vehicles. IEEE Trans-
actions on Intelligent Transportation Systems (2023)
[38] Qi, C.R., Yi, L., Su, H., Guibas, L.J.: Pointnet++: Deep hierarchical
feature learning on point sets in a metric space. Advances in neural
information processing systems 30 (2017)
[39] Rajabli, N., Flammini, F., Nardone, R., Vittorini, V.: Software verifi-
cation and validation of safe autonomous cars: A systematic literature
review. IEEE Access 9, 4797–4819 (2020)
[40] Rajamani, R.: Vehicle dynamics and control. Springer Science &
Business Media (2011)
[41] Rawlings, J.B., Mayne, D.Q., Diehl, M.: Model predictive control:
theory, computation, and design, vol. 2. Nob Hill Publishing Madison,
WI (2017)
[42] Riedmaier, S., Ponn, T., Ludwig, D., Schick, B., Diermeyer, F.: Survey
on scenario-based safety assessment of automated vehicles. IEEE
access 8, 87456–87477 (2020)
[43] Scher, G., Sadraddini, S., Tedrake, R., Kress-Gazit, H.: Elliptical
slice sampling for probabilistic verification of stochastic systems
with signal temporal logic specifications. In: Proceedings of the 25th
ACM International Conference on Hybrid Systems: Computation and
Control. pp. 1–11 (2022)
[44] Schmerling, E., Pavone, M.: Evaluating trajectory collision probability
through adaptive importance sampling for safe motion planning. arXiv
preprint arXiv:1609.05399 (2016)
[45] Schwarting, W., Alonso-Mora, J., Rus, D.: Planning and decision-
making for autonomous vehicles. Annual Review of Control, Robotics,
and Autonomous Systems 1, 187–210 (2018)
[46] Sinha, A., O’Kelly, M., Tedrake, R., Duchi, J.C.: Neural bridge
sampling for evaluating safety-critical autonomous systems. Advances
in Neural Information Processing Systems 33, 6402–6416 (2020)
[47] Team, O.D.: Openpcdet: An open-source toolbox for 3d object
detection from point clouds.
https://github.com/open-mmlab/OpenPCDet (2020)
[48] Thrun, S.: Probabilistic robotics. Communications of the ACM 45(3),
52–57 (2002)
[49] Williams, C.K., Rasmussen, C.E.: Gaussian processes for machine
learning, vol. 2. MIT press Cambridge, MA (2006)
[50] Yu, X., Dong, W., Yin, X., Li, S.: Online monitoring of dynamic sys-
tems for signal temporal logic specifications with model information.
In: 2022 IEEE 61st Conference on Decision and Control (CDC). pp.
1553–1559. IEEE (2022)
[51] Zhao, D., Lam, H., Peng, H., Bao, S., LeBlanc, D.J., Nobukawa,
K., Pan, C.S.: Accelerated evaluation of automated vehicles safety
in lane-change scenarios based on importance sampling techniques.
IEEE transactions on intelligent transportation systems 18(3), 595–
607 (2016)