Out of Sight But Not Out of Mind:
An Answer Set Programming Based Online Abduction Framework for
Visual Sensemaking in Autonomous Driving
Jakob Suchan 1,3, Mehul Bhatt 2,3 and Srikrishna Varadarajan 3
1 University of Bremen, Germany, 2 Örebro University, Sweden
3 CoDesign Lab / Cognitive Vision – www.codesign-lab.org
Abstract

We demonstrate the need and potential of systematically integrated vision and semantics solutions for visual sensemaking (in the backdrop of autonomous driving). A general method for online visual sensemaking using answer set programming is systematically formalised and fully implemented. The method integrates state of the art in (deep learning based) visual computing, and is developed as a modular framework usable within hybrid architectures for perception & control. We evaluate and demo with community established benchmarks KITTIMOD and MOT. As use-case, we focus on the significance of human-centred visual sensemaking —e.g., semantic representation and explainability, question-answering, commonsense interpolation— in safety-critical autonomous driving situations.
1 MOTIVATION
Autonomous driving research has received enormous academic & industrial interest in recent years. This surge has coincided with (and been driven by) advances in deep learning based computer vision research. Although deep learning based vision & control has (arguably) been successful for self-driving vehicles, we posit that there is a clear need and tremendous potential for hybrid visual sensemaking solutions (integrating vision and semantics) towards fulfilling essential legal and ethical responsibilities involving explainability, human-centred AI, and industrial standardisation (e.g., pertaining to representation, realisation of rules and norms).
Autonomous Driving: “Standardisation & Regulation”
As the self-driving vehicle industry develops, it will be necessary —e.g., similar to sectors such as medical computing, computer aided design— to have an articulation and community consensus on aspects such as representation, interoperability, human-centred performance benchmarks, and data archival & retrieval mechanisms.¹

¹Within autonomous driving, the need for standardisation and ethical regulation has most recently garnered interest internationally, e.g., with the Federal Ministry of Transport and Digital Infrastructure in Germany taking a lead in eliciting 20 key propositions (with legal implications) for the fulfilment of ethical commitments for automated and connected driving systems [BMVI, 2018].
Figure 1: Out of sight but not out of mind; the case of hidden entities: an occluded cyclist (frames: gets occluded / reappears).
In spite of major investments in self-driving vehicle research, issues related to human-centred'ness, human collaboration, and standardisation have barely been addressed, with the current focus in driving research primarily being on two basic considerations: how fast to drive, and which way and how much to steer. This is necessary, but inadequate if autonomous vehicles are to become commonplace and function with humans. Ethically driven standardisation and regulation will require addressing challenges in semantic visual interpretation, natural / multimodal human-machine interaction, high-level data analytics (e.g., for post hoc diagnostics, dispute settlement) etc. This will necessitate —amongst other things— human-centred qualitative benchmarks and multifaceted hybrid AI solutions.
Visual Sensemaking Needs Both “Vision & Semantics”
We demonstrate the significance of semantically-driven methods rooted in knowledge representation and reasoning (KR) in addressing research questions pertaining to explainability and human-centred AI, particularly from the viewpoint of sensemaking of dynamic visual imagery. Consider the occlusion scenario in Fig. 1:

Car (c) is in-front, and indicating to turn-right; during this time, person (p) is on a bicycle (b) and positioned front-right of c and moving-forward. Car c turns-right, during which the bicyclist <p, b> is not visible. Subsequently, bicyclist <p, b> reappears.
The occlusion scenario indicates several challenges concerning aspects such as: identity maintenance, making default assumptions, computing counterfactuals, projection, and interpolation of missing information (e.g., what could be hypothesised about bicyclist <p, b> when it is occluded; how can this hypothesis help in planning an immediate next step).

Figure 2: A General Online Abduction Framework / Conceptual Overview. The pipeline (Perceive – Interpret – Decide) integrates online vision and control: visual processing (object detection, semantic segmentation, lane detection, ego-motion) and low-level motion tracking (prediction, association, matching of motion tracks) with high-level abduction over a declarative model of scene dynamics (visuo-spatial concepts such as space, motion, occlusion, identity, attachment; hypothesised situations such as blocked lane, blocked visibility, hidden entity, sudden stop, lane changes), supporting semantic query processing and low-level control decisions (e.g., slow_down, change_lane, emergency_break).
Addressing such challenges —be it realtime or post-hoc— in view of human-centred AI concerns pertaining to ethics, explainability and regulation requires a systematic integration of Semantics and Vision, i.e., robust commonsense representation & inference about spacetime dynamics on the one hand, and powerful low-level visual computing capabilities, e.g., pertaining to object detection and tracking, on the other.
Key Contributions. We develop a general and systematic declarative visual sensemaking method capable of online abduction: realtime, incremental, commonsense semantic question-answering and belief maintenance over dynamic visuospatial imagery. Supported are (1–3): (1) human-centric representations semantically rooted in spatio-linguistic primitives as they occur in natural language [Bhatt et al., 2013; Mani and Pustejovsky, 2012]; (2) driven by Answer Set Programming (ASP) [Brewka et al., 2011], the ability to abductively compute commonsense interpretations and explanations in a range of (a)typical everyday driving situations, e.g., concerning safety-critical decision-making; (3) online performance of the overall framework, modularly integrating high-level commonsense reasoning and state of the art low-level visual computing for practical application in real world settings. We present the formal framework & its implementation, and demo & empirically evaluate with community established real-world datasets and benchmarks, namely: KITTIMOD [Geiger et al., 2012] and MOT [Milan et al., 2016].
2 VISUAL SENSEMAKING:
A GENERAL METHOD DRIVEN BY ASP
Our proposed framework, in essence, jointly solves the problem of assigning detections to tracks and explaining overall scene dynamics (e.g., appearance, disappearance) in terms of high-level events, within an online framework integrating low-level visual computing and high-level abductive reasoning (Fig. 2). Rooted in answer set programming, the framework is general, modular, and designed for integration as a reasoning engine within (hybrid) architectures for real-time decision-making and control where visual perception is needed as one of several components. In such large-scale AI systems the declarative model of scene dynamics resulting from the presented framework can be used for semantic Q/A, inference etc. to support decision-making.
2.1 SPACE, MOTION, OBJECTS, EVENTS
Reasoning about dynamics is based on high-level representations of objects and their respective motion & mutual interactions in spacetime. Ontological primitives for commonsense reasoning about spacetime (Σ_st) and dynamics (Σ_dyn) are:

• Σ_st: domain-objects O = {o_1, ..., o_n} represent the visual elements in the scene, e.g., people, cars, cyclists; elements in O are geometrically interpreted as spatial entities E = {ε_1, ..., ε_n}; spatial entities E may be regarded as points, line-segments or (axis-aligned) rectangles based on their spatial properties (and the particular reasoning task at hand). The temporal dimension is represented by time points T = {t_1, ..., t_n}. MT_oi = (ε_ts, ..., ε_te) represents the motion track of a single object o_i, where t_s and t_e denote the start and end time of the track, and ε_ts to ε_te denote the spatial entities (E) —e.g., the axis-aligned bounding boxes— corresponding to the object o_i at time points t_s to t_e. The spatial configuration of the scene and changes within it are characterised based on the qualitative spatio-temporal relationships (R) between the domain objects. For the running and demo examples of this paper, positional relations on axis-aligned rectangles based on the rectangle algebra (RA) [Balbiani et al., 1999] suffice; RA uses the relations of Interval Algebra (IA) [Allen, 1983], R_IA ≡ {before, after, during, contains, starts, started_by, finishes, finished_by, overlaps, overlapped_by, meets, met_by, equal}, to relate two objects by the interval relations projected along each dimension separately (e.g., the horizontal and vertical dimensions).
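As a minimal sketch (in Python; not the authors' implementation) of how such relations can be computed from tracked boxes, the RA relation of two axis-aligned rectangles is simply the pair of IA relations of their projections onto the horizontal and vertical axes:

def allen(a, b):
    """Allen (IA) relation between 1-D intervals a = (start, end), b = (start, end)."""
    (s1, e1), (s2, e2) = a, b
    if e1 < s2: return "before"
    if e2 < s1: return "after"
    if e1 == s2: return "meets"
    if e2 == s1: return "met_by"
    if s1 == s2 and e1 == e2: return "equal"
    if s1 == s2: return "starts" if e1 < e2 else "started_by"
    if e1 == e2: return "finishes" if s1 > s2 else "finished_by"
    if s2 < s1 and e1 < e2: return "during"
    if s1 < s2 and e2 < e1: return "contains"
    return "overlaps" if s1 < s2 else "overlapped_by"

def ra_relation(b1, b2):
    """RA relation of two boxes (x, y, w, h): the IA relation per axis."""
    x1, y1, w1, h1 = b1
    x2, y2, w2, h2 = b2
    return (allen((x1, x1 + w1), (x2, x2 + w2)),
            allen((y1, y1 + h1), (y2, y2 + h2)))

# e.g., ra_relation((0, 0, 10, 10), (5, 12, 10, 10)) -> ("overlaps", "before")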
Algorithm 1: Online Abduction(V, Σ)
Data: Visual imagery (V), and background knowledge Σ ≡_def Σ_dyn ∪ Σ_st
Result: Visual Explanations (EXP) (also: refer Fig. 3)
1:  MT ← ∅, H_events ← ∅
2:  for t ∈ T do
3:      VO_t ← observe(V_t)
4:      P_t ← ∅, ML_t ← ∅
5:      for trk ∈ MT_{t−1} do
6:          p_trk ← kalman_predict(trk)
7:          P_t ← P_t ∪ p_trk
8:      for obs ∈ VO_t do
9:          ml_{trk,obs} ← calc_IoU(p_trk, obs)
10:         ML_t ← ML_t ∪ ml_{trk,obs}
11:     Abduce(<H_assign_t, H_events_t>), such that: (Step 2)
            Σ ∧ H_events ∧ [H_assign_t ∧ H_events_t] ⊨ VO_t ∧ P_t ∧ ML_t
12:     H_events ← H_events ∪ H_events_t
13:     MT_t ← update(MT_{t−1}, VO_t, H_assign_t)
14: return EXP ← <H_events, MT>
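Read procedurally, Algorithm 1 is a per-frame loop. The following runnable Python skeleton mirrors its control flow; all components (observe, kalman_predict, calc_iou, abduce, update) are passed in as stubs and are placeholders for illustration, not the authors' implementation:

def online_abduction(frames, observe, kalman_predict, calc_iou, abduce, update):
    tracks, events = [], []                       # MT, H_events
    for t, frame in enumerate(frames):
        observations = observe(frame)             # VO_t: detections
        predictions = [kalman_predict(trk) for trk in tracks]   # P_t
        likelihood = {(i, j): calc_iou(p, o)      # ML_t: matching likelihood
                      for i, p in enumerate(predictions)
                      for j, o in enumerate(observations)}
        # Step 2: jointly abduce assignments H_assign_t and the
        # high-level events H_events_t explaining them (via ASP).
        assignment, events_t = abduce(tracks, observations,
                                      predictions, likelihood, events, t)
        events += events_t                        # H_events ∪ H_events_t
        tracks = update(tracks, observations, assignment)   # MT_t
    return events, tracks                         # EXP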
• Σ_dyn: The set of fluents Φ = {φ_1, ..., φ_n} and events Θ = {θ_1, ..., θ_n} respectively characterise the dynamic properties of the objects in the scene and the high-level abducibles (Table 1). For reasoning about dynamics (with <Φ, Θ>), we use a variant of event calculus as per [Ma et al., 2014; Miller et al., 2013]; in particular, for the examples of this paper, the functional event calculus fragment (Σ_dyn) of Ma et al. [2014] suffices: the main axioms relevant here pertain to occurs-at(θ, t), denoting that an event θ occurred at time t, and holds-at(φ, v, t), denoting that value v holds for a fluent φ at time t.²

• Σ: Let Σ ≡_def Σ_dyn<Φ, Θ> ∪ Σ_st<O, E, T, MT, R>.
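To make the holds-at / occurs-at reading concrete, here is a deliberately simplified, illustrative Python rendering of the inertia assumption behind the functional event calculus (a fluent keeps its value until an occurring event's causesValue effect changes it); it is a sketch, not the FEC encoding of footnote 2:

def holds_at(fluent, t, initial, effects):
    """effects: (time, fluent, value) triples derived from occurs-at
    events via causesValue; returns the value of `fluent` at time t."""
    value = initial[fluent]
    for time, f, v in sorted(effects):
        if f == fluent and time <= t:
            value = v   # an event sets a new value, which then persists
    return value

# e.g., with visibility(trk_41) initially fully_visible and a
# hides_behind event occurring at t = 179:
# holds_at("visibility(trk_41)", 190,
#          {"visibility(trk_41)": "fully_visible"},
#          [(179, "visibility(trk_41)", "fully_occluded")])
# returns "fully_occluded"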
2.2 TRACKING AS ABDUCTION
Scene dynamics are tracked using a detect and track approach: we tightly integrate low-level visual computing (for detecting scene elements) with high-level ASP-based abduction to solve the assignment of observations to object tracks in an incremental manner. For each time point t we generate a problem specification consisting of the object tracks and visual observations, and use ASP to abductively solve the corresponding assignment problem, incorporating the ontological structure of the domain / data (abstracted with Σ). Steps 1–3 (Alg. 1 & Fig. 3) are as follows:
Step 1. FORMULATING THE PROBLEM SPECIFICATION
The ASP problem specification for each time point t is given by the tuple <VO_t, P_t, ML_t> and the sequence of events (H_events) before time point t.
• Visual Observations. Scene elements derived directly from the visual input data are represented as spatial entities E, i.e., VO_t = {ε_obs1, ..., ε_obsn} is the set of observations at time t (Fig. 3). For the examples and empirical evaluation in this paper (Sec. 3) we focus on Obstacle / Object Detections – detecting cars, pedestrians, cyclists, traffic lights etc. using YOLOv3 [Redmon and Farhadi, 2018]. Further, we generate scene context using Semantic Segmentation – segmenting the road, sidewalk, buildings, cars, people, trees, etc. using DeepLabv3+ [Chen et al., 2018], and Lane Detection – estimating lane markings, to detect lanes on the road, using SCNN [Pan et al., 2018]. Type and confidence score for each observation are given by type_obsi and conf_obsi.

²ASP encoding of the domain independent axioms of the Functional Event Calculus (FEC) used as per: https://www.ucl.ac.uk/infostudies/efec/fec.lp

Figure 3: Computational Steps for Online Visual Abduction. For each t ∈ T: Step 1. Formulating the Problem Specification <VO_t, P_t, ML_t> – (1) detect visual observations (VO_t), e.g., people, cars, objects, roads, lanes; (2) predict (P_t) the next position and size of object tracks using Kalman filters; and (3) calculate the matching likelihood (ML_t) based on Intersection over Union (IoU) between predictions and detections, e.g., obs(obs_0,car,99). box2d(obs_16,1078,86,30,44). trk(trk_0,car). box2d(trk_0,798,146,113,203). iou(trk_0,obs_0,83921). Step 2. Abduction based Association – generate hypotheses for (1) the matching of tracks and observations (H_assign_t), and (2) the high-level events (H_events_t) explaining (1), e.g., holds-at(visibility(trk2), partially_occluded, t_n). occurs-at(hides_behind(trk2,trk1), t_n+1). Step 3. Finding the Optimal Hypothesis – jointly optimize H_assign_t and H_events_t by maximizing the matching likelihood ML_t and minimizing event costs. RESULT: Visuo-Spatial Scene Semantics – the resulting motion tracks and corresponding event sequence, explaining the low-level motion, e.g., occurs_at(missing_detections(trk_10),35). occurs_at(recover(trk_10),36). occurs_at(hides_behind(trk_9,trk_10),41). occurs_at(unhides_from_behind(trk_9,trk_10),42).
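For concreteness, the following is a plausible glue sketch (not the authors' code) serialising detector output into the obs/3 and box2d/5 facts that make up VO_t, following the fact format shown in Fig. 3:

def detections_to_facts(detections):
    """detections: list of (label, confidence in [0,1], (x, y, w, h))."""
    facts = []
    for i, (label, conf, (x, y, w, h)) in enumerate(detections):
        facts.append(f"obs(obs_{i},{label},{int(conf * 100)}).")   # type + confidence
        facts.append(f"box2d(obs_{i},{x},{y},{w},{h}).")           # spatial entity
    return "\n".join(facts)

# detections_to_facts([("car", 0.99, (1078, 86, 30, 44))])
# -> "obs(obs_0,car,99).\nbox2d(obs_0,1078,86,30,44)."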
• Movement Prediction. For each track trk_i, changes in position and size are predicted using Kalman filters; this results in an estimate of the spatial entity ε for the next time point t of each motion track, P_t = {ε_trk1, ..., ε_trkn}.
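A minimal prediction step in this spirit, assuming a constant-velocity box state [x, y, w, h, vx, vy] (the concrete state parameterisation is our assumption and is not specified in the paper):

import numpy as np

F = np.eye(6)
F[0, 4] = F[1, 5] = 1.0   # x += vx, y += vy per frame; w, h kept constant

def kalman_predict(x, P, Q=None):
    """One Kalman prediction step: propagate state mean x and covariance P."""
    Q = np.eye(6) * 0.01 if Q is None else Q   # process noise
    return F @ x, F @ P @ F.T + Q

# x = np.array([660., 460., 134., 102., 3., -1.])
# kalman_predict(x, np.eye(6))[0][:4] -> array([663., 459., 134., 102.])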
• Matching Likelihood. For each pair of track and observation ε_trki and ε_obsj, where ε_trki ∈ P_t and ε_obsj ∈ VO_t, we compute the likelihood ML_t = {ml_trk1,obs1, ..., ml_trki,obsj} that ε_obsj belongs to ε_trki. The intersection over union (IoU) provides a measure for the amount of overlap between the spatial entities ε_obsj and ε_trki.
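The IoU itself is the standard overlap measure; a small sketch over (x, y, w, h) boxes, matching its use as matching likelihood here and in Bewley et al. [2016]:

def iou(a, b):
    """Intersection over Union of two axis-aligned boxes (x, y, w, h)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))   # intersection width
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))   # intersection height
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

# iou((0, 0, 10, 10), (5, 5, 10, 10)) == 25 / 175 ≈ 0.143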
Step 2. ABDUCTION BASED ASSOCIATION
Following perception as logical abduction most directly in the sense of Shanahan [2005], we define the task of abducing visual explanations as finding an association (H_assign_t) of observed scene elements (VO_t) to the motion tracks of objects (MT) given by the predictions P_t, together with a high-level explanation (H_events_t), such that [H_assign_t ∧ H_events_t] is consistent with the background knowledge and the previously abduced event sequence H_events, and entails the perceived scene given by <VO_t, P_t, ML_t>:

    Σ ∧ H_events ∧ [H_assign_t ∧ H_events_t] ⊨ VO_t ∧ P_t ∧ ML_t

where H_assign_t consists of the assignment of detections to object tracks, and H_events_t consists of the high-level events Θ explaining the assignments.
• Associating Objects and Observations. Finding the best match between observations (VO_t) and object tracks (P_t) is done by generating all possible assignments and then maximising the matching likelihood ml_trki,obsj between pairs of spatial entities for matched observations ε_obsj and predicted track regions ε_trki (see Step 3). Towards this we use choice rules [Gebser et al., 2014] (i.e., one of the heads of the rule has to be in the stable model) for ε_obsj and ε_trki, generating all possible assignments in terms of assignment actions: assign, start, end, halt, resume, ignore_det, ignore_trk.
% MATCHING TRACKS AND DETECTIONS
1{assign(Trk,Det) : det(Det,_,_);
  end(Trk); ignore_trk(Trk); halt(Trk);
  resume(Trk,Det) : det(Det,_,_)}1 :- trk(Trk,_).

1{assign(Trk,Det) : trk(Trk,_);
  start(Det); ignore_det(Det);
  resume(Trk,Det) : trk(Trk,_)}1 :- det(Det,_,_).
For each assignment action we define integrity constraints³ that restrict the set of answers generated by the choice rules. E.g., the following constraints are applied when assigning an observation ε_obsj to a track trk_i: we apply thresholds on the IoU of trk_i and obs_j and on the confidence conf_obsj of the observation; further, we require that the type of the observation matches the type of the track it is assigned to:
% INTEGRITY CONSTRAINTS ON MATCHING
:- assign(Trk,Det), not assignment_constraints(Trk,Det).

assignment_constraints(Trk,Det) :-
    trk(Trk,Trk_Type), trk_state(Trk,active),
    det(Det,Det_Type,Conf), Conf > conf_thresh_assign,
    match_type(Trk_Type,Det_Type),
    iou(Trk,Det,IOU), IOU > iou_thresh.
• Abducible High-Level Events. For the length of this paper, we restrict ourselves to high-level visuo-spatial abducibles pertaining to object persistence and visibility (Table 1): (1) Occlusion: objects can disappear or reappear as a result of occlusion by other objects; (2) Entering / Leaving the Scene: objects can enter or leave the scene at the borders of the field of view; (3) Noise and Missing Observation: (missing) observations can be the result of faulty detections.

Let us take the case of occlusion: the functional fluent visibility can take the values fully_visible, partially_occluded, or fully_occluded:
% VISIBILITY - FLUENT AND POSSIBLE VALUES
fluent(visibility(Trk)) :- trk(Trk,_).
possVal(visibility(Trk), fully_visible) :- trk(Trk,_).
possVal(visibility(Trk), partially_occluded) :- trk(Trk,_).
possVal(visibility(Trk), fully_occluded) :- trk(Trk,_).
We define the event hides_behind/2, stating that an object hides behind another object, by defining the conditions that have to hold for the event to possibly occur, and the effects the occurrence of the event has on the properties of the objects, i.e., the value of the visibility fluent changes to fully_occluded.

³Integrity constraints restrict the set of answers by eliminating stable models where the body is satisfied.

EVENTS                            Description
enters_fov(Trk)                   Track Trk enters the field of view.
leaves_fov(Trk)                   Track Trk leaves the field of view.
hides_behind(Trk1,Trk2)           Track Trk1 hides behind track Trk2.
unhides_from_behind(Trk1,Trk2)    Track Trk1 unhides from behind track Trk2.
missing_detections(Trk)           Missing detections for track Trk.

FLUENTS                Values                               Description
in_fov(Trk)            {true; false}                        Track Trk is in the field of view.
hidden_by(Trk1,Trk2)   {true; false}                        Track Trk1 is hidden by Trk2.
visibility(Trk)        {fully_visible; partially_occluded;  Visibility state of track Trk.
                        fully_occluded}

Table 1: Abducibles; Events and Fluents Explaining (Dis)Appearance
% OCCLUSION - EVENT, EFFECTS AND (SPATIAL) CONSTRAINTS
event(hides_behind(Trk1,Trk2)) :- trk(Trk1,_), trk(Trk2,_).

causesValue(hides_behind(Trk1,Trk2),
    visibility(Trk1), fully_occluded, T) :-
    trk(Trk1,_), trk(Trk2,_), time(T).

:- occurs_at(hides_behind(Trk1,Trk2), curr_time),
   trk(Trk1,_), trk(Trk2,_),
   not position(overlapping_top,Trk1,Trk2).
For abducing the occurrence of an event we use choice rules
that connect the event with assignment actions, e.g., a track
getting halted may be explained by the event that the track
hides behind another track.
% GENERATING HYPOTHESES ON EVENTS
1{occurs_at(hides_behind(Trk,Trk2), curr_time) :
    trk(Trk2,_); ... }1 :- halt(Trk).
Step 3. FINDING THE OPTIMAL HYPOTHESIS
To ensure an optimal assignment, we use ASP-based optimization to maximize the matching likelihood between matched pairs of tracks and detections. Towards this, we first define the matching likelihood based on the Intersection over Union (IoU) between the observations and the predicted boxes for each track, as described in [Bewley et al., 2016]:
% ASSIGNMENT LIKELIHOOD
assignment_prob(Trk,Det,IOU) :-
    det(Det,_,_), trk(Trk,_), iou(Trk,Det,IOU).
We then maximize the matching likelihood for all assignments, using the built-in #maximize statement:
% MAXIMIZING ASSIGNMENT LIKELIHOOD
#maximize {Prob@1,Trk,Det :
    assign(Trk,Det), assignment_prob(Trk,Det,Prob)}.
To find the best set of hypotheses with respect to the observations, we minimize the occurrence of certain events and association actions, e.g., the following optimization statements minimize starting and ending tracks; the resulting assignment is then used to update the motion tracks accordingly.
% OPTIMIZE EVENT AND ASSOCIATION COSTS
#minimize {5@2,Trk : end(Trk)}.
#minimize {5@2,Det : start(Det)}.
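The statements above are standard ASP optimization at two priority levels. As a hedged illustration of how a per-frame abduction step could be driven programmatically, the following sketch uses the clingo Python API (the paper does not prescribe this API; rule/fact strings and the atom filter are illustrative):

import clingo

def abduce_frame(rules, facts):
    # Ground the frame's problem specification together with the
    # domain rules, and search for an optimal answer set.
    ctl = clingo.Control(["--opt-mode=opt"])
    ctl.add("base", [], rules + "\n" + facts)
    ctl.ground([("base", [])])
    best = []
    with ctl.solve(yield_=True) as handle:
        for model in handle:   # successively yielded models improve the cost
            best = [str(a) for a in model.symbols(atoms=True)
                    if a.name in ("assign", "start", "end",
                                  "halt", "resume", "occurs_at")]
    return best   # H_assign_t and H_events_t atoms of the optimal model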
It is important here to note that: (1) by jointly abducing the object dynamics and the high-level events we can impose constraints on the assignment of detections to tracks, i.e., an assignment is only possible if we can find an explanation supporting the assignment; and (2) the likelihood that an event occurs guides the assignment of observations to tracks. Instead of independently tracking objects and interpreting the interactions, this yields event sequences that are consistent with the abduced object tracks, and noise in the observations is reduced (see evaluation in Sec. 3).

SITUATION            Objects            Description
OVERTAKING           vehicle, vehicle   A vehicle is overtaking another vehicle.
HIDDEN ENTITY        entity, object     A traffic participant is hidden by an obstacle.
REDUCED VISIBILITY   object             Visibility is reduced by an object in front.
SUDDEN STOP          vehicle            A vehicle in front stops suddenly.
BLOCKED LANE         lane, object       A lane of the road is blocked by some object.
EXITING VEHICLE      person, vehicle    A person is exiting a parked vehicle.

Table 2: Safety-Critical Situations
3 APPLICATION & EVALUATION
We demonstrate applicability towards identifying and interpreting safety-critical situations (e.g., Table 2); these encompass scenarios where interpretation of spacetime dynamics, driving behaviour, and environmental characteristics is necessary to anticipate and avoid potential dangers.
Reasoning about Hidden Entities. Consider the situation of Fig. 4: a car gets occluded by another car turning left and reappears in front of the autonomous vehicle. Using online abduction of the high-level interactions of scene objects, we can hypothesize that the car got occluded and anticipate its reappearance based on the perceived scene dynamics. The following shows data and abduced events.
trk(trk_3,car). trk_state(trk_3,active). ...
... trk(trk_41,car). trk_state(trk_41,active). ...
... det(det_1,car,98). ...
box2d(trk_3,660,460,134,102). ...
... box2d(trk_41,631,471,40,47). ...
... occurs_at(hides_behind(trk_41,trk_3),179). ...
We define a rule stating that a hidden object may unhide from behind the object it is hidden by, and anticipate the time point t based on the object movement, as follows:
anticipate(unhides_from_behind(Trk1,Trk2), T) :-
    time(T), curr_time < T,
    holds_at(hidden_by(Trk1,Trk2), curr_time),
    topology(proper_part,Trk1,Trk2),
    movement(moves_out_of,Trk1,Trk2,T).
We then interpolate the object's position at time point t to predict where the object may reappear.
point2d(interpolated_position(Trk,T), PosX,PosY) :-
    time(T), curr_time < T, T1 = T - curr_time,
    box2d(Trk,X,Y,_,_), trk_mov(Trk,MovX,MovY),
    PosX = X + MovX*T1, PosY = Y + MovY*T1.
For the occluded car in our example we get the following prediction for time t and position x, y:
anticipate(unhides_from_behind(trk_41,trk_2), 202)
point2d(interpolated_position(trk_41,202), 738,495)
Figure 4: Abducing Occlusion to Anticipate Reappearance (frames t160–t220: hides_behind, fully_occluded, anticipated reappearance, unhides_from_behind).

Based on this prediction we can then define a rule that gives a warning if a hidden entity may reappear in front of the vehicle; this could be used by the control mechanism, e.g., to adapt driving and slow down in order to keep a safe distance:
warning(hidden_entity_in_front(Trk1,T)) :-
    time(T), T - curr_time < anticipation_threshold,
    anticipate(unhides_from_behind(Trk1,_), T),
    position(in_front, interpolated_position(Trk1,T)).
Empirical Evaluation. For online sensemaking, evaluation focusses on the accuracy of abduced motion tracks, real-time performance, and the tradeoff between performance and accuracy. Our evaluation uses the KITTI object tracking dataset [Geiger et al., 2012], a community established benchmark dataset for autonomous cars: it consists of 21 training and 29 test scenes, and provides accurate track annotations for 8 object classes (e.g., car, pedestrian, van, cyclist). We also evaluate tracking results using the more general cross-domain Multi-Object Tracking (MOT) dataset [Milan et al., 2016] established as part of the MOT Challenge; it consists of 7 training and 7 test scenes, which are highly unconstrained videos filmed with both static and moving cameras. We evaluate on the available groundtruth for training scenes of both KITTI, using YOLOv3 detections, and MOT17, using the provided Faster RCNN detections.
• Evaluating Object Tracking For evaluating accuracy (MOTA) and precision (MOTP) of the abduced object tracks we follow the ClearMOT [Bernardin and Stiefelhagen, 2008] evaluation schema (a direct transcription of the MOTA measure is sketched after these evaluation points). Results (Table 3) show that jointly abducing high-level object interactions together with low-level scene dynamics increases the accuracy of the object tracks, i.e., we consistently observe an improvement of about 5%: from 45.72% to 50.5% for cars and 28.71% to 32.57% for pedestrians on KITTI, and from 41.4% to 46.2% on MOT.
• Online Performance and Scalability Performance of online abduction is evaluated with respect to its real-time capabilities.4 (1). We compare the time and accuracy of online abduction for state-of-the-art (real-time) detection methods: YOLOv3, SSD [Liu et al., 2016], and Faster RCNN [Ren et al., 2015] (Fig. 5). (2). We evaluate scalability of the ASP-based abduction on a synthetic dataset with a controlled number of tracks and percentage of overlapping tracks per frame. Results (Fig. 5) show that online abduction runs at above 30 frames per second for scenes with up to 10 highly overlapping object tracks, and still handles more than 50 tracks at around 1 fps (for the sake of testing, it is worth noting that even for 100 objects per frame it takes only about 4 seconds per frame on average). Importantly, for realistic scenes such as those in the KITTI dataset, abduction runs in real time at 33.9 fps using YOLOv3, and at 46.7 fps using SSD, with lower accuracy but good precision.
4Evaluation using a dedicated Intel Core i7-6850K 3.6GHz 6-core processor, 64GB RAM, and an NVIDIA Titan V GPU (12GB).
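For reference, MOTA in the ClearMOT schema aggregates misses (FN), false positives (FP), and identity switches over all frames, relative to the total number of ground-truth objects [Bernardin and Stiefelhagen, 2008]; a direct transcription is sketched below (the total ground-truth count is an assumed input).

def mota(fp, fn, id_switches, num_ground_truth):
    # MOTA = 1 - (FN + FP + ID switches) / ground-truth objects,
    # with all counts summed over the frames of a sequence.
    return 1.0 - (fn + fp + id_switches) / num_ground_truth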
SEQUENCE                       Tracking            MOTA     MOTP     ML       MT       FP    FN     ID Sw.  Frag.
KITTI tracking – Cars          without Abduction   45.72 %  76.89 %  19.14 %  23.04 %  785   11182  1097    1440
(8008 frames, 636 targets)     with Abduction      50.5 %   74.76 %  20.21 %  23.23 %  1311  10439  165     490
KITTI tracking – Pedestrians   without Abduction   28.71 %  71.43 %  26.94 %  9.58 %   1261  6119   539     833
(8008 frames, 167 targets)     with Abduction      32.57 %  70.68 %  22.15 %  14.37 %  1899  5477   115     444
MOT 2017                       without Abduction   41.4 %   88.0 %   35.53 %  16.48 %  4877  60164  779     741
(5316 frames, 546 targets)     with Abduction      46.2 %   87.9 %   31.32 %  20.7 %   5195  54421  800     904
Table 3: Evaluation of Tracking Performance; accuracy (MOTA), precision (MOTP), mostly lost (ML) and mostly tracked (MT) tracks, false positives (FP), false negatives (FN), identity switches (ID Sw.), and fragmentation (Frag.).
DETECTOR   Recall   MOTA      MOTP      fps_det   fps_abd
YOLOv3     0.690    50.5 %    74.76 %   45        33.9
SSD        0.599    30.63 %   77.4 %    8         46.7
FRCNN      0.624    37.96 %   72.9 %    5         32.0
No. tracks   ms/frame   fps
5            23.33      42.86
10           31.36      31.89
20           62.08      16.11
50           511.83     1.95
100          3996.38    0.25
Figure 5: Online Performance and Scalability; performance for pretrained detectors on the 'cars' class of the KITTI dataset, and processing time relative to the number of tracks on the synthetic dataset.
Discussion of Empirical Results Results show that integrating high-level abduction with object tracking improves the resulting object tracks and reduces the noise in the visual observations. For the case of online visual sensemaking, ASP-based abduction provides the required performance: even though the complexity of ASP-based abduction increases quickly with large numbers of tracked objects, the framework can track up to 20 objects simultaneously at 30 fps and achieves real-time performance on the KITTI benchmark dataset. It is also important to note that the tracking approach in this paper is based on tracking by detection using a naive measure, i.e., the IoU (Sec. 2.2; Step 1), to associate observations and tracks; it does not use any visual information in the prediction or association step. Naturally, this results in lower accuracy, in particular when used with noisy detections and when tracking fast-moving objects in a benchmark dataset such as KITTI. That said, due to the modularity of the implemented framework, extensions with different methods for predicting motion (e.g., using particle filters or optical flow based prediction) are straightforward; improving tracking as such is not the aim of our research.
4 RELATED WORK
ASP is now widely used as an underlying knowledge representation language and robust methodology for non-monotonic reasoning [Brewka et al., 2011; Gebser et al., 2012]. With ASP as a foundation, and driven by semantics, commonsense and explainability [Davis and Marcus, 2015], this research aims to bridge the gap between high-level formalisms for logical abduction and low-level visual processing by tightly integrating semantic abstractions of space-change with their underlying numerical representations. Within KR, the significance of high-level (abductive) explanations in a range of contexts is well established: planning & process recognition [Kautz, 1991], vision & abduction [Shanahan, 2005], probabilistic abduction [Blythe et al., 2011], reasoning about spatio-temporal dynamics [Bhatt and Loke, 2008], reasoning about continuous spacetime change [Muller, 1998; Hazarika and Cohn, 2002], etc. Dubba et al. [2015] use abductive reasoning in an inductive-abductive loop within inductive logic programming (ILP). Aditya et al. [2015] formalise general rules for image interpretation with ASP. Similarly motivated to this research is [Suchan et al., 2018], which uses a two-step approach (with a single, monolithic problem specification), first tracking and then explaining (and fixing) tracking errors; such an approach is not runtime / realtime capable. In computer vision research there has recently been an interest in synergising with cognitively motivated methods, in particular, e.g., for perceptual grounding & inference [Yu et al., 2015] and for combining video analysis with textual information for understanding events & answering queries about video data [Tu et al., 2014].
5 CONCLUSION & OUTLOOK
We develop a novel abduction-driven online (i.e., realtime, incremental) visual sensemaking framework: general, systematically formalised, modular, and fully implemented. Integrating robust state-of-the-art methods in knowledge representation and computer vision, the framework has been evaluated and demonstrated with established benchmarks. We highlight the application prospects of semantic vision for autonomous driving, a domain of emerging and long-term significance. Specialised commonsense theories (e.g., about multi-sensory integration & multi-agent belief merging, contextual knowledge) may be incorporated based on requirements. Our ongoing focus is to develop a novel dataset emphasising semantics and (commonsense) explainability; this is driven by mixed-methods research (AI, Psychology, HCI) for the study of driving behaviour in low-speed, complex urban environments with unstructured traffic. Here, the emphasis is on natural interactions (e.g., gestures, joint attention) amongst drivers, pedestrians, cyclists, etc. Such interdisciplinary studies are needed to better appreciate the complexity and spectrum of varied human-centred challenges in autonomous driving, and to demonstrate the significance of integrated vision & semantics solutions in those contexts.
Acknowledgements
Partial funding by the German Research Foundation (DFG) via the CRC 1320 "EASE – Everyday Activity Science and Engineering" (www.ease-crc.org), Project P3: Spatial Reasoning in Everyday Activity, is acknowledged.
References
[Aditya et al., 2015] Somak Aditya, Yezhou Yang, Chitta Baral, Cornelia Fermuller, and Yiannis Aloimonos. Visual commonsense for scene understanding using perception, semantic parsing and reasoning. In 2015 AAAI Spring Symposium Series, 2015.
[Allen, 1983] James F. Allen. Maintaining knowledge about temporal intervals. Commun. ACM, 26(11):832–843, 1983.
[Balbiani et al., 1999] Philippe Balbiani, Jean-François Condotta, and Luis Fariñas del Cerro. A new tractable subclass of the rectangle algebra. In Thomas Dean, editor, IJCAI 1999, Sweden, pages 442–447. Morgan Kaufmann, 1999.
[Bernardin and Stiefelhagen, 2008] Keni Bernardin and Rainer Stiefelhagen. Evaluating multiple object tracking performance: The CLEAR MOT metrics. EURASIP Journal on Image and Video Processing, 2008(1):246309, May 2008.
[Bewley et al., 2016] Alex Bewley, Zongyuan Ge, Lionel Ott, Fabio Ramos, and Ben Upcroft. Simple online and realtime tracking. In 2016 IEEE International Conference on Image Processing (ICIP), pages 3464–3468, 2016.
[Bhatt and Loke, 2008] Mehul Bhatt and Seng W. Loke. Modelling dynamic spatial systems in the situation calculus. Spatial Cognition & Computation, 8(1-2):86–130, 2008.
[Bhatt et al., 2013] Mehul Bhatt, Carl Schultz, and Christian Freksa. The 'Space' in Spatial Assistance Systems: Conception, Formalisation and Computation. In: Representing Space in Cognition: Interrelations of Behavior, Language, and Formal Models. Series: Explorations in Language and Space. 978-0-19-967991-1, Oxford University Press, 2013.
[Blythe et al., 2011] James Blythe, Jerry R. Hobbs, Pedro Domingos, Rohit J. Kate, and Raymond J. Mooney. Implementing weighted abduction in Markov logic. In Proc. of 9th Intl. Conference on Computational Semantics, IWCS '11, USA, 2011. ACL.
[BMVI, 2018] BMVI. Report by the Ethics Commission on Automated and Connected Driving. BMVI: Federal Ministry of Transport and Digital Infrastructure, Germany, 2018.
[Brewka et al., 2011] Gerhard Brewka, Thomas Eiter, and Mirosław Truszczyński. Answer set programming at a glance. Commun. ACM, 54(12):92–103, December 2011.
[Chen et al., 2018] Liang-Chieh Chen, Yukun Zhu, George Papandreou, Florian Schroff, and Hartwig Adam. Encoder-decoder with atrous separable convolution for semantic image segmentation. arXiv:1802.02611, 2018.
[Davis and Marcus, 2015] Ernest Davis and Gary Marcus. Commonsense reasoning and commonsense knowledge in artificial intelligence. Commun. ACM, 58(9):92–103, 2015.
[Dubba et al., 2015] Krishna Sandeep Reddy Dubba, Anthony G. Cohn, David C. Hogg, Mehul Bhatt, and Frank Dylla. Learning relational event models from video. J. Artif. Intell. Res. (JAIR), 53:41–90, 2015.
[Gebser et al., 2012] Martin Gebser, Roland Kaminski, Benjamin Kaufmann, and Torsten Schaub. Answer Set Solving in Practice. Morgan & Claypool Publishers, 2012.
[Gebser et al., 2014] Martin Gebser, Roland Kaminski, Benjamin Kaufmann, and Torsten Schaub. Clingo = ASP + control: Preliminary report. CoRR, abs/1405.3694, 2014.
[Geiger et al., 2012] Andreas Geiger, Philip Lenz, and Raquel Urtasun. Are we ready for autonomous driving? The KITTI vision benchmark suite. In Conference on Computer Vision and Pattern Recognition (CVPR), 2012.
[Hazarika and Cohn, 2002] Shyamanta M. Hazarika and Anthony G. Cohn. Abducing qualitative spatio-temporal histories from partial observations. In KR, pages 14–25, 2002.
[Kautz, 1991] Henry A. Kautz. A formal theory of plan recognition and its implementation. In Reasoning About Plans, pages 69–124. Morgan Kaufmann Publishers Inc., USA, 1991.
[Liu et al., 2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott E. Reed, Cheng-Yang Fu, and Alexander C. Berg. SSD: Single shot multibox detector. In ECCV (1), volume 9905 of LNCS, pages 21–37. Springer, 2016.
[Ma et al., 2014] Jiefei Ma, Rob Miller, Leora Morgenstern, and Theodore Patkos. An epistemic event calculus for ASP-based reasoning about knowledge of the past, present and future. In LPAR: 19th Intl. Conf. on Logic for Programming, Artificial Intelligence and Reasoning, volume 26 of EPiC Series in Computing, pages 75–87. EasyChair, 2014.
[Mani and Pustejovsky, 2012] Inderjeet Mani and James Pustejovsky. Interpreting Motion - Grounded Representations for Spatial Language, volume 5 of Explorations in Language and Space. Oxford University Press, 2012.
[Milan et al., 2016] Anton Milan, Laura Leal-Taixé, Ian D. Reid, Stefan Roth, and Konrad Schindler. MOT16: A benchmark for multi-object tracking. CoRR, abs/1603.00831, 2016.
[Miller et al., 2013] Rob Miller, Leora Morgenstern, and Theodore Patkos. Reasoning about knowledge and action in an epistemic event calculus. In COMMONSENSE 2013, 2013.
[Muller, 1998] Philippe Muller. A qualitative theory of motion based on spatio-temporal primitives. In Anthony G. Cohn et al., editors, KR 1998, Italy. Morgan Kaufmann, 1998.
[Pan et al., 2018] Xingang Pan, Jianping Shi, Ping Luo, Xiaogang Wang, and Xiaoou Tang. Spatial as deep: Spatial CNN for traffic scene understanding. In Sheila A. McIlraith and Kilian Q. Weinberger, editors, AAAI 2018. AAAI Press, 2018.
[Redmon and Farhadi, 2018] Joseph Redmon and Ali Farhadi. YOLOv3: An incremental improvement. CoRR, abs/1804.02767, 2018.
[Ren et al., 2015] Shaoqing Ren, Kaiming He, Ross B. Girshick, and Jian Sun. Faster R-CNN: Towards real-time object detection with region proposal networks. In Annual Conference on Neural Information Processing Systems 2015, Canada, 2015.
[Shanahan, 2005] Murray Shanahan. Perception as abduction: Turning sensor data into meaningful representation. Cognitive Science, 29(1):103–134, 2005.
[Suchan and Bhatt, 2016] Jakob Suchan and Mehul Bhatt. Semantic question-answering with video and eye-tracking data: AI foundations for human visual perception driven cognitive film studies. In S. Kambhampati, editor, IJCAI 2016, New York, USA, pages 2633–2639. IJCAI/AAAI Press, 2016.
[Suchan et al., 2018] Jakob Suchan, Mehul Bhatt, Przemyslaw Andrzej Walega, and Carl P. L. Schultz. Visual explanation by high-level abduction: On answer-set programming driven reasoning about moving objects. In AAAI 2018. AAAI Press, 2018.
[Tu et al., 2014] Kewei Tu, Meng Meng, Mun Wai Lee, Tae Eun Choe, and Song Chun Zhu. Joint video and text parsing for understanding events and answering queries. IEEE MultiMedia, 2014.
[Yu et al., 2015] Haonan Yu, N. Siddharth, Andrei Barbu, and Jeffrey Mark Siskind. A compositional framework for grounding language inference, generation, and acquisition in video. J. Artif. Intell. Res. (JAIR), 52:601–713, 2015.