Out of Sight But Not Out of Mind:
An Answer Set Programming Based Online Abduction Framework for
Visual Sensemaking in Autonomous Driving
Jakob Suchan¹,³, Mehul Bhatt²,³ and Srikrishna Varadarajan³
¹University of Bremen, Germany, ²Örebro University, Sweden
³CoDesign Lab / Cognitive Vision – www.codesign-lab.org
Abstract

We demonstrate the need and potential of systematically integrated vision and semantics solutions for visual sensemaking (in the backdrop of autonomous driving). A general method for online visual sensemaking using answer set programming is systematically formalised and fully implemented. The method integrates state of the art in (deep learning based) visual computing, and is developed as a modular framework usable within hybrid architectures for perception & control. We evaluate and demo with community established benchmarks KITTIMOD and MOT. As use-case, we focus on the significance of human-centred visual sensemaking (e.g., semantic representation and explainability, question-answering, commonsense interpolation) in safety-critical autonomous driving situations.
1 MOTIVATION
Autonomous driving research has received enormous academic & industrial interest in recent years. This surge has coincided with (and been driven by) advances in deep learning based computer vision research. Although deep learning based vision & control has (arguably) been successful for self-driving vehicles, we posit that there is a clear need and tremendous potential for hybrid visual sensemaking solutions (integrating vision and semantics) towards fulfilling essential legal and ethical responsibilities involving explainability, human-centred AI, and industrial standardisation (e.g., pertaining to representation, realisation of rules and norms).
Autonomous Driving: “Standardisation & Regulation”
As the self-driving vehicle industry develops, it will be necessary (e.g., similar to sectors such as medical computing or computer aided design) to have an articulation and community consensus on aspects such as representation, interoperability, human-centred performance benchmarks, and data archival & retrieval mechanisms.¹ In spite of major investments in self-driving vehicle research, issues related to human-centredness, human collaboration, and standardisation have barely been addressed, with the current focus in driving research primarily being on two basic considerations: how fast to drive, and which way and how much to steer. This is necessary, but inadequate if autonomous vehicles are to become commonplace and function with humans. Ethically driven standardisation and regulation will require addressing challenges in semantic visual interpretation, natural / multimodal human-machine interaction, high-level data analytics (e.g., for post hoc diagnostics, dispute settlement), etc. This will necessitate, amongst other things, human-centred qualitative benchmarks and multifaceted hybrid AI solutions.

¹ Within autonomous driving, the need for standardisation and ethical regulation has most recently garnered interest internationally, e.g., with the Federal Ministry of Transport and Digital Infrastructure in Germany taking a lead in eliciting 20 key propositions (with legal implications) for the fulfilment of ethical commitments for automated and connected driving systems [BMVI, 2018].

Figure 1: Out of sight but not out of mind; the case of hidden entities: an occluded cyclist. (Panels: gets occluded; reappears.)
Visual Sensemaking Needs Both “Vision & Semantics”
We demonstrate the significance of semantically-driven methods rooted in knowledge representation and reasoning (KR) in addressing research questions pertaining to explainability and human-centred AI, particularly from the viewpoint of sensemaking of dynamic visual imagery. Consider the occlusion scenario in Fig. 1:

Car (c) is in-front, and indicating to turn-right; during this time, person (p) is on a bicycle (b) and positioned front-right of c and moving-forward. Car c turns-right, during which the bicyclist <p, b> is not visible. Subsequently, bicyclist <p, b> reappears.
The occlusion scenario indicates several challenges concerning aspects such as: identity maintenance, making default assumptions, computing counterfactuals, projection, and interpolation of missing information (e.g., what could be hypothesised about bicyclist <p, b> when it is occluded; how can this hypothesis enable planning of an immediate next step).

Figure 2: A General Online Abduction Framework / Conceptual Overview. (The diagram depicts an online perceive-interpret-decide loop: sensor inputs (cameras, radars, LIDAR, GPS) feed visual processing (object detection, semantic segmentation, lane detection, ego-motion) and low-level motion tracking (prediction, matching, association of motion tracks); high-level abduction over a declarative model of scene dynamics and visuo-spatial concepts (space, motion, occlusion, identity, attachment) yields hypothesised situations (e.g., blocked lane, blocked visibility, hidden entity, sudden stop, lane changes), semantic query processing, and control decisions (e.g., slow_down, change_lane, emergency_break).)
Addressing such challenges, be it realtime or post-hoc, in view of human-centred AI concerns pertaining to ethics, explainability and regulation requires a systematic integration of Semantics and Vision, i.e., robust commonsense representation & inference about spacetime dynamics on the one hand, and powerful low-level visual computing capabilities, e.g., pertaining to object detection and tracking, on the other.

Key Contributions. We develop a general and systematic declarative visual sensemaking method capable of online abduction: realtime, incremental, commonsense semantic question-answering and belief maintenance over dynamic visuospatial imagery. Supported are (1–3): (1) human-centric representations semantically rooted in spatio-linguistic primitives as they occur in natural language [Bhatt et al., 2013; Mani and Pustejovsky, 2012]; (2) driven by Answer Set Programming (ASP) [Brewka et al., 2011], the ability to abductively compute commonsense interpretations and explanations in a range of (a)typical everyday driving situations, e.g., concerning safety-critical decision-making; (3) online performance of the overall framework, modularly integrating high-level commonsense reasoning and state-of-the-art low-level visual computing for practical application in real-world settings. We present the formal framework & its implementation, and demo & empirically evaluate with community established real-world datasets and benchmarks, namely: KITTIMOD [Geiger et al., 2012] and MOT [Milan et al., 2016].
2 VISUAL SENSEMAKING:
A GENERAL METHOD DRIVEN BY ASP
Our proposed framework, in essence, jointly solves the problem of assigning detections to tracks and explaining overall scene dynamics (e.g., appearance, disappearance) in terms of high-level events, within an online framework integrating low-level visual computing and high-level abductive reasoning (Fig. 2). Rooted in answer set programming, the framework is general, modular, and designed for integration as a reasoning engine within (hybrid) architectures for real-time decision-making and control, where visual perception is needed as one of several components. In such large-scale AI systems, the declarative model of scene dynamics resulting from the presented framework can be used for semantic Q/A, inference, etc., to support decision-making.
2.1 SPACE, MOTION, OBJECTS, EVENTS
Reasoning about dynamics is based on high-level representations of objects and their respective motion & mutual interactions in spacetime. Ontological primitives for commonsense reasoning about spacetime (Σst) and dynamics (Σdyn) are:

Σst: domain-objects O = {o1, ..., on} represent the visual elements in the scene, e.g., people, cars, cyclists; elements in O are geometrically interpreted as spatial entities E = {ε1, ..., εn}; spatial entities E may be regarded as points, line-segments or (axis-aligned) rectangles based on their spatial properties (and the particular reasoning task at hand). The temporal dimension is represented by time points T = {t1, ..., tn}. MT_oi = (ε_ts, ..., ε_te) represents the motion track of a single object oi, where ts and te denote the start and end time of the track, and ε_ts to ε_te denote the spatial entities (E), e.g., the axis-aligned bounding boxes, corresponding to the object oi at time points ts to te. The spatial configuration of the scene and changes within it are characterised based on the qualitative spatio-temporal relationships (R) between the domain objects. For the running and demo examples of this paper, positional relations on axis-aligned rectangles based on the rectangle algebra (RA) [Balbiani et al., 1999] suffice; RA uses the relations of the Interval Algebra (IA) [Allen, 1983], R_IA = {before, after, during, contains, starts, started_by, finishes, finished_by, overlaps, overlapped_by, meets, met_by, equal}, to relate two objects by the interval relations projected along each dimension separately (e.g., the horizontal and vertical dimensions).
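To make this concrete, the 13 IA relations and their pairwise projection onto the two axes of axis-aligned rectangles can be sketched as follows (an illustrative sketch; the function names are ours, not from the paper's implementation):

```python
# Qualitative relations of the Interval Algebra (IA), and the Rectangle
# Algebra (RA) as a pair of IA relations, one per spatial axis.

def ia_relation(a, b):
    """Allen's interval relation between intervals a = (a1, a2), b = (b1, b2)."""
    a1, a2 = a
    b1, b2 = b
    if a2 < b1: return "before"
    if b2 < a1: return "after"
    if a2 == b1: return "meets"
    if b2 == a1: return "met_by"
    if a1 == b1 and a2 == b2: return "equal"
    if a1 == b1: return "starts" if a2 < b2 else "started_by"
    if a2 == b2: return "finishes" if a1 > b1 else "finished_by"
    if b1 < a1 and a2 < b2: return "during"
    if a1 < b1 and b2 < a2: return "contains"
    return "overlaps" if a1 < b1 else "overlapped_by"

def ra_relation(box_a, box_b):
    """RA relation between axis-aligned boxes (x, y, w, h): the pair of IA
    relations projected on the horizontal and vertical axes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    return (ia_relation((ax, ax + aw), (bx, bx + bw)),
            ia_relation((ay, ay + ah), (by, by + bh)))
```

For instance, two diagonally overlapping boxes yield the pair (overlaps, overlaps), which corresponds to the partial-occlusion configurations used in the running example.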
Algorithm 1: Online Abduction(V, Σ)
Data: Visual imagery (V), and background knowledge Σ =def Σdyn ∪ Σst
Result: Visual Explanations (EXP) (also: refer Fig. 3)
1   MT ← ∅, H^events ← ∅
2   for t ∈ T do
3       VO_t ← observe(V_t)
4       P_t ← ∅, ML_t ← ∅
5       for trk ∈ MT_{t-1} do
6           p_trk ← kalman_predict(trk)
7           P_t ← P_t ∪ p_trk
8       for obs ∈ VO_t do
9           ml_{trk,obs} ← calc_IoU(p_trk, obs)
10          ML_t ← ML_t ∪ ml_{trk,obs}
11      Abduce(<H^assign_t, H^events_t>), such that: (Step 2)
            Σ ∧ H^events ∧ [H^assign_t ∧ H^events_t] |= VO_t ∧ P_t ∧ ML_t
12      H^events ← H^events ∪ H^events_t
13      MT_t ← update(MT_{t-1}, VO_t, H^assign)
14  return EXP ← <H^events, MT>
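The control flow of Algorithm 1 can be paraphrased imperatively as follows (a simplified sketch: observe, kalman_predict, calc_iou and abduce are stand-ins for the components described in Steps 1–3, and the track update is reduced to its bare essentials):

```python
def online_abduction(frames, observe, kalman_predict, calc_iou, abduce):
    """Online abduction loop (cf. Algorithm 1): per frame, predict track
    positions, score track/detection pairs, and abduce an assignment plus
    the high-level events explaining it."""
    tracks, events = [], []                                # MT, H^events
    for frame in frames:                                   # for t in T
        observations = observe(frame)                      # VO_t
        predictions = {t_id: kalman_predict(trk)           # P_t
                       for t_id, trk in enumerate(tracks)}
        likelihoods = {(t_id, o_id): calc_iou(pred, obs)   # ML_t
                       for t_id, pred in predictions.items()
                       for o_id, obs in enumerate(observations)}
        # Step 2: jointly abduce the assignment and the explaining events
        assignment, new_events = abduce(observations, predictions,
                                        likelihoods, events)
        events.extend(new_events)                          # extend H^events
        tracks = update(tracks, observations, assignment)  # MT_t
    return events, tracks

def update(tracks, observations, assignment):
    """Minimal track update: assigned tracks adopt the matched observation;
    unmatched observations start new tracks."""
    tracks = list(tracks)
    matched = set()
    for t_id, o_id in assignment:
        tracks[t_id] = observations[o_id]
        matched.add(o_id)
    tracks += [obs for o_id, obs in enumerate(observations)
               if o_id not in matched]
    return tracks
```

In the actual framework the abduce step is the ASP solver call of Step 2; here it is left abstract so the loop structure stays visible.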
Σdyn: The set of fluents Φ = {φ1, ..., φn} and events Θ = {θ1, ..., θn} respectively characterise the dynamic properties of the objects in the scene and the high-level abducibles (Table 1). For reasoning about dynamics (with <Φ, Θ>), we use a variant of the event calculus as per [Ma et al., 2014; Miller et al., 2013]; in particular, for the examples of this paper, the functional event calculus fragment (Σdyn) of Ma et al. [2014] suffices: the main axioms relevant here pertain to occurs-at(θ, t), denoting that an event θ occurred at time t, and holds-at(φ, v, t), denoting that value v holds for a fluent φ at time t.²

Σ: Let Σ =def Σdyn<Φ, Θ> ∪ Σst<O, E, T, MT, R>.
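The intuition behind holds-at and occurs-at, value inertia with event-triggered change, can be illustrated with a toy interpreter (illustrative only; the framework uses the ASP axiomatisation of the functional event calculus, not this Python reading):

```python
def holds_at(fluent, t, occurrences, causes, initial):
    """Value of `fluent` at time t: the value caused by the most recent
    event occurrence at or before t, defaulting to the initial value
    (inertia: values persist until changed by an event).
    occurrences: list of (event, time) facts, i.e., occurs_at(event, time);
    causes: dict (event, fluent) -> value, i.e., causesValue/4."""
    value, latest = initial[fluent], -1
    for event, t_occ in occurrences:
        if t_occ <= t and t_occ > latest and (event, fluent) in causes:
            value, latest = causes[(event, fluent)], t_occ
    return value
```

For example, with occurs_at(hides_behind(trk2, trk1), 5) and occurs_at(unhides_from_behind(trk2, trk1), 9), the visibility fluent of trk2 is fully_visible before 5, fully_occluded in between, and fully_visible again from 9 onwards.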
2.2 TRACKING AS ABDUCTION
Scene dynamics are tracked using a detect-and-track approach: we tightly integrate low-level visual computing (for detecting scene elements) with high-level ASP-based abduction to solve the assignment of observations to object tracks in an incremental manner. For each time point t we generate a problem specification consisting of the object tracks and visual observations, and use ASP to abductively solve the corresponding assignment problem, incorporating the ontological structure of the domain / data (abstracted with Σ). Steps 1–3 (Alg. 1 & Fig. 3) are as follows:

Step 1. FORMULATING THE PROBLEM SPECIFICATION
The ASP problem specification for each time point t is given by the tuple <VO_t, P_t, ML_t> and the sequence of events (H^events) before time point t.

Visual Observations  Scene elements derived directly from the visual input data are represented as spatial entities E, i.e., VO_t = {ε_obs1, ..., ε_obsn} is the set of observations at time t (Fig. 3). For the examples and empirical evaluation in this paper (Sec. 3) we focus on Obstacle / Object Detections, detecting cars, pedestrians, cyclists, traffic lights, etc. using YOLOv3 [Redmon and Farhadi, 2018]. Further, we generate scene context using Semantic Segmentation, segmenting

² ASP encoding of the domain-independent axioms of the Functional Event Calculus (FEC) used as per: https://www.ucl.ac.uk/infostudies/efec/fec.lp
For each t ∈ T:

Step 1. Formulating the Problem Specification <VO_t, P_t, ML_t>. (1) Detect Visual Observations (VO_t), e.g., people, cars, objects, roads, lanes; (2) predict (P_t) the next position and size of object tracks using Kalman filters; and (3) calculate the Matching Likelihood (ML_t) based on the Intersection over Union (IoU) between predictions and detections.

obs(obs_0,car,99). obs(...). ... box2d(obs_16,1078,86,30,44). ...
trk(trk_0,car). trk(...). ... box2d(trk_0,798,146,113,203). ...
iou(trk_0,obs_0,83921). iou(...). ... iou(trk_23,obs_16,0). ...

Step 2. Abduction based Association. Generate hypotheses for (1) the matching of tracks and observations (H^assign_t), and (2) the high-level events (H^events_t) explaining (1), e.g., for a track passing behind another and getting occluded:

holds_at(visibility(trk2), fully_visible, tn-1).
occurs_at(gets_behind(trk2, trk1), tn).
holds_at(visibility(trk2), partially_occluded, tn).
occurs_at(hides_behind(trk2, trk1), tn+1).
holds_at(visibility(trk2), fully_occluded, tn+1).

Step 3. Finding the Optimal Hypothesis. Jointly optimize H^assign_t and H^events_t by maximizing the matching likelihood ML_t and minimizing event costs.

RESULT. Visuo-Spatial Scene Semantics. The resulting motion tracks and the corresponding event sequence, explaining the low-level motion:

... occurs_at(missing_detections(trk_10),35) ... occurs_at(...)
... occurs_at(recover(trk_10),36) ... occurs_at(lost(trk_18),41)
... occurs_at(hides_behind(trk_9,trk_10),41) ... occurs_at(...)
... occurs_at(unhides_from_behind(trk_9,trk_10),42) ...

Figure 3: Computational Steps for Online Visual Abduction
the road, sidewalk, buildings, cars, people, trees, etc. using DeepLabv3+ [Chen et al., 2018], and Lane Detection, estimating lane markings to detect lanes on the road, using SCNN [Pan et al., 2018]. The type and confidence score for each observation are given by type_obsi and conf_obsi.
Movement Prediction  For each track trk_i, changes in position and size are predicted using Kalman filters; this results in an estimate of the spatial entity ε for the next time point t of each motion track, P_t = {ε_trk1, ..., ε_trkn}.
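As an illustration of the prediction step, here is a deliberately stripped-down constant-velocity stand-in for the Kalman predict (state covariance propagation and the measurement-update step of a full Kalman filter are omitted):

```python
def predict_box(track):
    """Predict the next bounding box of a track under a constant-velocity
    model. A track carries its last box (x, y, w, h) and per-frame
    velocities (vx, vy, vw, vh) estimated from previous updates; a full
    Kalman filter would additionally propagate the state covariance."""
    x, y, w, h = track["box"]
    vx, vy, vw, vh = track["velocity"]
    # Clamp width/height so a shrinking track never degenerates.
    return (x + vx, y + vy, max(w + vw, 1.0), max(h + vh, 1.0))
```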
Matching Likelihood  For each pair of track and observation, ε_trki and ε_obsj, where ε_trki ∈ P_t and ε_obsj ∈ VO_t, we compute the likelihood ML_t = {ml_trk1,obs1, ..., ml_trki,obsj} that ε_obsj belongs to ε_trki. The intersection over union (IoU) provides a measure for the amount of overlap between the spatial entities ε_obsj and ε_trki.
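The IoU used for the matching likelihood is the standard overlap measure; for boxes given as (x, y, w, h):

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x, y, w, h):
    overlap area divided by the area of the union, in [0, 1]."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    ix = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))   # overlap width
    iy = max(0.0, min(ay + ah, by + bh) - max(ay, by))   # overlap height
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0
```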
Step 2. ABDUCTION BASED ASSOCIATION  Following perception as logical abduction, most directly in the sense of Shanahan [2005], we define the task of abducing visual explanations as finding an association (H^assign_t) of observed scene elements (VO_t) to the motion tracks of objects (MT) given by the predictions P_t, together with a high-level explanation (H^events_t), such that [H^assign_t ∧ H^events_t] is consistent with the background knowledge and the previously abduced event sequence H^events, and entails the perceived scene given by <VO_t, P_t, ML_t>:

    Σ ∧ H^events ∧ [H^assign_t ∧ H^events_t] |= VO_t ∧ P_t ∧ ML_t

where H^assign_t consists of the assignment of detections to object tracks, and H^events_t consists of the high-level events Θ explaining the assignments.
Associating Objects and Observations  Finding the best match between observations (VO_t) and object tracks (P_t) is done by generating all possible assignments and then maximising the matching likelihood ml_trki,obsj between pairs of spatial entities for matched observations ε_obsj and predicted track regions ε_trki (see Step 3). Towards this we use choice rules [Gebser et al., 2014] (i.e., one of the heads of the rule has to be in the stable model) for ε_obsj and ε_trki, generating all possible assignments in terms of assignment actions: assign, start, end, halt, resume, ignore_det, ignore_trk.
▸ MATCHING TRACKS AND DETECTIONS
1{assign(Trk,Det) : det(Det,_,_);
  end(Trk); ignore_trk(Trk); halt(Trk);
  resume(Trk,Det) : det(Det,_,_)}1 :- trk(Trk,_).

1{assign(Trk,Det) : trk(Trk,_);
  start(Det); ignore_det(Det);
  resume(Trk,Det) : trk(Trk,_)}1 :- det(Det,_,_).
For each assignment action we define integrity constraints³ that restrict the set of answers generated by the choice rules. E.g., the following constraints are applied when assigning an observation ε_obsj to a track trk_i, applying thresholds on IoU_trki,obsj and on the confidence of the observation conf_obsj; further, we define that the type of the observation has to match the type of the track it is assigned to:
▸ INTEGRITY CONSTRAINTS ON MATCHING
:- assign(Trk,Det), not assignment_constraints(Trk,Det).

assignment_constraints(Trk,Det) :-
  trk(Trk,Trk_Type), trk_state(Trk,active),
  det(Det,Det_Type,Conf), Conf > conf_thresh_assign,
  match_type(Trk_Type,Det_Type),
  iou(Trk,Det,IOU), IOU > iou_thresh.
Abducible High-Level Events  Within the scope of this paper, we restrict ourselves to high-level visuo-spatial abducibles pertaining to object persistence and visibility (Table 1): (1) Occlusion: objects can disappear or reappear as a result of occlusion by other objects; (2) Entering / Leaving the Scene: objects can enter or leave the scene at the borders of the field of view; (3) Noise and Missing Observations: (missing) observations can be the result of faulty detections.

Let us take the case of occlusion: the functional fluent visibility can take the values fully_visible, partially_occluded or fully_occluded:
▸ VISIBILITY - FLUENT AND POSSIBLE VALUES
fluent(visibility(Trk)) :- trk(Trk,_).
possVal(visibility(Trk), fully_visible) :- trk(Trk,_).
possVal(visibility(Trk), partially_visible) :- trk(Trk,_).
possVal(visibility(Trk), not_visible) :- trk(Trk,_).
We define the event hides_behind/2, stating that an object hides behind another object, by defining the conditions that have to hold for the event to possibly occur, and the effects the occurrence of the event has on the properties of the objects, i.e., the value of the visibility fluent changes to fully_occluded.

³ Integrity constraints restrict the set of answers by eliminating stable models where the body is satisfied.

EVENTS                               Description
enters_fov(Trk)                      Track Trk enters the field of view.
leaves_fov(Trk)                      Track Trk leaves the field of view.
hides_behind(Trk1,Trk2)              Track Trk1 hides behind track Trk2.
unhides_from_behind(Trk1,Trk2)       Track Trk1 unhides from behind track Trk2.
missing_detections(Trk)              Missing detections for track Trk.

FLUENTS                Values                 Description
in_fov(Trk)            {true; false}          Track Trk is in the field of view.
hidden_by(Trk1,Trk2)   {true; false}          Track Trk1 is hidden by Trk2.
visibility(Trk)        {fully_visible;        Visibility state of track Trk.
                        partially_occluded;
                        fully_occluded}

Table 1: Abducibles; Events and Fluents Explaining (Dis)Appearance
▸ OCCLUSION - EVENT, EFFECTS AND (SPATIAL) CONSTRAINTS
event(hides_behind(Trk1,Trk2)) :- trk(Trk1,_), trk(Trk2,_).

causesValue(hides_behind(Trk1,Trk2),
    visibility(Trk1), not_visible, T) :-
  trk(Trk1,_), trk(Trk2,_), time(T).

:- occurs_at(hides_behind(Trk1,Trk2), curr_time),
   trk(Trk1,_), trk(Trk2,_),
   not position(overlapping_top,Trk1,Trk2).
To abduce the occurrence of an event, we use choice rules that connect the event with assignment actions; e.g., a track getting halted may be explained by the event that the track hides behind another track:
▸ GENERATING HYPOTHESES ON EVENTS
1{occurs_at(hides_behind(Trk,Trk2), curr_time) :
    trk(Trk2,_); ... }1 :- halt(Trk).
Step 3. FINDING THE OPTIMAL HYPOTHESIS  To ensure an optimal assignment, we use ASP based optimization to maximize the matching likelihood between matched pairs of tracks and detections. Towards this, we first define the matching likelihood based on the Intersection over Union (IoU) between the observations and the predicted boxes for each track, as described in [Bewley et al., 2016]:
▸ ASSIGNMENT LIKELIHOOD
assignment_prob(Trk,Det,IOU) :-
  det(Det,_,_), trk(Trk,_), iou(Trk,Det,IOU).

▸ MAXIMIZING ASSIGNMENT LIKELIHOOD
#maximize {Prob@1,Trk,Det :
    assign(Trk,Det), assignment_prob(Trk,Det,Prob)}.

▸ OPTIMIZE EVENT AND ASSOCIATION COSTS
#minimize {5@2,Trk : end(Trk)}.
#minimize {5@2,Det : start(Det)}.
anticipate(unhides_from_behind(Trk1,Trk2), T) :-
time(T), curr_time <T,
holds_at(hidden_by(Trk1,Trk2), curr_time),
topology(proper_part,Trk1,Trk2),
movement(moves_out_of,Trk1,Trk2,T).
point2d(interpolated_position(Trk,T), PosX,PosY) :-
time(T), curr_time <T,T1 =T-curr_time,
box2d(Trk1,X,Y,_,_), trk_mov(Trk1,MovX,MovY),
PosX =X+MovX*T1,PosY =Y+MovX*T1.
anticipate(unhides_from_behind(trk_41,trk_2), 202)
point2d(interpolated_position(trk_41,202), 738,495)
warning(hidden_entity_in_front(Trk1,T)) :-
time(T), T-curr_time <anticipation_threshold,
anticipate(unhides_from_behind(Trk1,_), T),
position(in_front,interpolated_pos(Trk1,T)).
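The `iou/3` input used here is computed at the visual-processing level; for illustration, a minimal Python sketch of IoU, assuming boxes are given as (x, y, w, h) as the `box2d` facts suggest:

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x, y, w, h)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Intersection rectangle (clamped to zero if the boxes are disjoint).
    ix = max(ax, bx)
    iy = max(ay, by)
    iw = max(0, min(ax + aw, bx + bw) - ix)
    ih = max(0, min(ay + ah, by + bh) - iy)
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0
```

Identical boxes yield 1.0, disjoint boxes 0.0, and partial overlap a value in between.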
We then maximize the matching likelihood for all assignments, using the built-in maximize statement:
% MAXIMIZING ASSIGNMENT LIKELIHOOD
#maximize {Prob@1,Trk,Det :
    assign(Trk,Det), assignment_prob(Trk,Det,Prob)}.
To find the best set of hypotheses with respect to the observations, we minimize the occurrence of certain events and association actions; e.g., the following optimization statements minimize starting and ending tracks. The resulting assignment is then used to update the motion tracks accordingly.
% OPTIMIZE EVENT AND ASSOCIATION COSTS
#minimize {5@2,Trk : end(Trk)}.
#minimize {5@2,Det : start(Det)}.
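Note that clingo optimizes higher priority levels first, so the event and association costs at level @2 dominate the assignment likelihood at level @1: among all hypotheses of minimal cost, the one with maximal summed likelihood is chosen. A minimal Python sketch of this lexicographic selection, with made-up candidate values:

```python
# Each candidate hypothesis: (event_cost, total_assignment_likelihood).
# Priority level 2 (minimize event/association cost) dominates level 1
# (maximize likelihood), so candidates compare lexicographically.
candidates = [
    (10, 0.90),  # e.g., ends one track (cost 5) and starts one (cost 5)
    (5, 0.60),   # e.g., hypothesises an occlusion event instead
    (5, 0.75),
]

# Minimal cost first; among ties, maximal likelihood (hence the negation).
best = min(candidates, key=lambda c: (c[0], -c[1]))
```

Here `best` is the cost-5 candidate with the higher likelihood, 0.75.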
It is important here to note that: (1) by jointly abducing the object dynamics and the high-level events we can impose constraints on the assignment of detections to tracks, i.e., an assignment is only possible if we can find an explanation supporting the assignment; and (2) the likelihood that an event occurs guides the assignment of observations to tracks. Instead of tracking objects and interpreting the interactions independently, this yields event sequences that are consistent with the abduced object tracks, and reduces noise in the observations (see evaluation in Sec. 3).

Situation            Objects           Description
OVERTAKING           vehicle, vehicle  vehicle is overtaking another vehicle
HIDDEN ENTITY        entity, object    traffic participant hidden by an obstacle
REDUCED VISIBILITY   object            visibility reduced by an object in front
SUDDEN STOP          vehicle           vehicle in front stopping suddenly
BLOCKED LANE         lane, object      lane of the road is blocked by some object
EXITING VEHICLE      person, vehicle   person is exiting a parked vehicle

Table 2: Safety-Critical Situations
3 APPLICATION & EVALUATION
We demonstrate applicability towards identifying and interpreting safety-critical situations (e.g., Table 2); these encompass those scenarios where interpretation of spacetime dynamics, driving behaviour, and environmental characteristics is necessary to anticipate and avoid potential dangers.

Reasoning about Hidden Entities Consider the situation of Fig. 4: a car gets occluded by another car turning left and reappears in front of the autonomous vehicle. Using online abduction to abduce the high-level interactions of scene objects, we can hypothesize that the car got occluded and anticipate its reappearance based on the perceived scene dynamics. The following shows data and abduced events.
trk(trk_3,car). trk_state(trk_3,active). ...
... trk(trk_41,car). trk_state(trk_41,active). ...
... det(det_1,car,98). ...
box2d(trk_3,660,460,134,102). ...
... box2d(trk_41,631,471,40,47). ...
... occurs_at(hides_behind(trk_41,trk_3),179) ...
We define a rule stating that a hidden object may unhide from behind the object it is hidden by, and anticipate the time point t based on the object movement as follows:
anticipate(unhides_from_behind(Trk1,Trk2), T) :-
    time(T), curr_time < T,
    holds_at(hidden_by(Trk1,Trk2), curr_time),
    topology(proper_part,Trk1,Trk2),
    movement(moves_out_of,Trk1,Trk2,T).
We then interpolate the object's position at time point t to predict where the object may reappear.
point2d(interpolated_position(Trk,T), PosX,PosY) :-
    time(T), curr_time < T, T1 = T - curr_time,
    box2d(Trk,X,Y,_,_), trk_mov(Trk,MovX,MovY),
    PosX = X + MovX*T1, PosY = Y + MovY*T1.
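The rule above amounts to linear extrapolation of the box position using the track's per-frame displacement. A Python sketch, with hypothetical motion values (in the actual system these come from the abduced `trk_mov` facts):

```python
def interpolated_position(box, motion, curr_time, t):
    """Linearly extrapolate a box's top-left corner to a future time point.

    box    -- (x, y, w, h) at curr_time
    motion -- (mov_x, mov_y), displacement per frame
    """
    assert t > curr_time
    dt = t - curr_time
    x, y, _, _ = box
    mov_x, mov_y = motion
    return (x + mov_x * dt, y + mov_y * dt)

# Hypothetical per-frame motion (2, 1) for trk_41's box at frame 179,
# extrapolated 23 frames ahead to frame 202.
pos = interpolated_position((631, 471, 40, 47), (2, 1), 179, 202)
```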
For the occluded car in our example we get the following prediction for time t and position x, y:
anticipate(unhides_from_behind(trk_41,trk_3), 202)
point2d(interpolated_position(trk_41,202), 738,495)
Figure 4: Abducing Occlusion to Anticipate Reappearance (frames t160, t180, t200, t220: the car hides behind the turning car, is fully occluded, and unhides from behind it at the anticipated reappearance).

Based on this prediction we can then define a rule that gives a warning if a hidden entity may reappear in front of the vehicle; this could be used by the control mechanism, e.g., to adapt driving and slow down in order to keep a safe distance:
warning(hidden_entity_in_front(Trk1,T)) :-
    time(T), T - curr_time < anticipation_threshold,
    anticipate(unhides_from_behind(Trk1,_), T),
    position(in_front, interpolated_pos(Trk1,T)).
Empirical Evaluation For online sensemaking, evaluation focusses on the accuracy of abduced motion tracks, real-time performance, and the tradeoff between performance and accuracy. Our evaluation uses the KITTI object tracking dataset [Geiger et al., 2012], a community established benchmark dataset for autonomous cars: it consists of 21 training and 29 test scenes, and provides accurate track annotations for 8 object classes (e.g., car, pedestrian, van, cyclist). We also evaluate tracking results on the more general cross-domain Multi-Object Tracking (MOT) dataset [Milan et al., 2016], established as part of the MOT Challenge; it consists of 7 training and 7 test scenes, which are highly unconstrained videos filmed with both static and moving cameras. We evaluate on the available ground truth for the training scenes of both KITTI, using YOLOv3 detections, and MOT17, using the provided Faster RCNN detections.
Evaluating Object Tracking For evaluating accuracy (MOTA) and precision (MOTP) of abduced object tracks we follow the CLEAR MOT [Bernardin and Stiefelhagen, 2008] evaluation schema. Results (Table 3) show that jointly abducing high-level object interactions together with low-level scene dynamics increases the accuracy of the object tracks, i.e., we consistently observe an improvement of about 5%: from 45.72% to 50.5% for cars and 28.71% to 32.57% for pedestrians on KITTI, and from 41.4% to 46.2% on MOT.
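For reference, MOTA in the CLEAR MOT schema aggregates false positives, false negatives and identity switches relative to the number of ground-truth objects. A minimal sketch with made-up counts (not the paper's numbers):

```python
def mota(fp, fn, id_switches, num_gt):
    """Multiple Object Tracking Accuracy (CLEAR MOT): one minus the
    ratio of all error counts to the number of ground-truth objects."""
    return 1.0 - (fp + fn + id_switches) / num_gt

# Hypothetical counts for illustration only.
score = mota(fp=100, fn=300, id_switches=20, num_gt=1000)
```

With these counts the score is 0.58; fewer identity switches, as in Table 3's "with Abduction" rows, directly raises MOTA.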
Online Performance and Scalability Performance of online abduction is evaluated with respect to its real-time capabilities.4 (1). We compare the time & accuracy of online abduction for state of the art (real-time) detection methods: YOLOv3, SSD [Liu et al., 2016], and Faster RCNN [Ren et al., 2015] (Fig. 5). (2). We evaluate scalability of the ASP based abduction on a synthetic dataset with a controlled number of tracks and % of overlapping tracks per frame. Results (Fig. 5) show that online abduction performs at above 30 frames per second for scenes with up to 10 highly overlapping object tracks, and handles more than 50 tracks at 1 fps (for the sake of testing, it is worth noting that even for 100 objects per frame it only takes an average of about 4 seconds per frame). Importantly, for realistic scenes such as in the KITTI dataset, abduction runs in real time at 33.9 fps using YOLOv3, and at 46.7 fps using SSD, with a lower accuracy but good precision.

4 Evaluation using a dedicated Intel Core i7-6850K 3.6GHz 6-Core Processor, 64GB RAM, and an NVIDIA Titan V GPU 12GB.
SEQUENCE                     Tracking           MOTA     MOTP     ML       MT       FP    FN     ID sw.  Frag.
KITTI tracking Cars          without Abduction  45.72 %  76.89 %  19.14 %  23.04 %  785   11182  1097    1440
(8008 frames, 636 targets)   with Abduction     50.5 %   74.76 %  20.21 %  23.23 %  1311  10439  165     490
KITTI tracking Pedestrians   without Abduction  28.71 %  71.43 %  26.94 %  9.58 %   1261  6119   539     833
(8008 frames, 167 targets)   with Abduction     32.57 %  70.68 %  22.15 %  14.37 %  1899  5477   115     444
MOT 2017                     without Abduction  41.4 %   88.0 %   35.53 %  16.48 %  4877  60164  779     741
(5316 frames, 546 targets)   with Abduction     46.2 %   87.9 %   31.32 %  20.7 %   5195  54421  800     904

Table 3: Evaluation of Tracking Performance; accuracy (MOTA), precision (MOTP), mostly tracked (MT) and mostly lost (ML) tracks, false positives (FP), false negatives (FN), identity switches (ID Sw.), and fragmentation (Frag.).
DETECTOR  Recall  MOTA     MOTP     fps_det  fps_abd
YOLOv3    0.690   50.5 %   74.76 %  45       33.9
SSD       0.599   30.63 %  77.4 %   8        46.7
FRCNN     0.624   37.96 %  72.9 %   5        32.0

No. tracks  ms/frame  fps
5           23.33     42.86
10          31.36     31.89
20          62.08     16.11
50          511.83    1.95
100         3996.38   0.25

Figure 5: Online Performance and Scalability; performance for pretrained detectors on the 'cars' class of the KITTI dataset, and processing time relative to the number of tracks on the synthetic dataset.
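The frame rates in the scalability table follow directly from the per-frame processing time; as a quick check:

```python
def fps(ms_per_frame):
    """Frames per second from per-frame processing time in milliseconds."""
    return 1000.0 / ms_per_frame

# Reproduces the rounded fps values in the table of Figure 5.
rates = [round(fps(ms), 2) for ms in (23.33, 31.36, 62.08, 511.83, 3996.38)]
```

For example, 23.33 ms/frame corresponds to about 42.86 fps.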
Discussion of Empirical Results Results show that integrating high-level abduction and object tracking improves the resulting object tracks and reduces the noise in the visual observations. For the case of online visual sensemaking, ASP based abduction provides the required performance: even though the complexity of ASP based abduction increases quickly with large numbers of tracked objects, the framework can track up to 20 objects simultaneously at 30 fps and achieves real-time performance on the KITTI benchmark dataset. It is also important to note that the tracking approach in this paper is based on tracking by detection using a naive measure, i.e., the IoU (Sec. 2.2; Step 1), to associate observations and tracks, and it does not use any visual information in the prediction or association step. Naturally, this results in a lower accuracy, in particular when used with noisy detections and when tracking fast moving objects in a benchmark dataset such as KITTI. That said, due to the modularity of the implemented framework, extensions with different methods for predicting motion (e.g., using particle filters or optical flow based prediction) are straightforward; improving tracking as such is not the aim of our research.
4 RELATED WORK
ASP is now widely used as an underlying knowledge representation language and robust methodology for non-monotonic reasoning [Brewka et al., 2011; Gebser et al., 2012]. With ASP as a foundation, and driven by semantics, commonsense and explainability [Davis and Marcus, 2015], this research aims to bridge the gap between high-level formalisms for logical abduction and low-level visual processing by tightly integrating semantic abstractions of space-change with their underlying numerical representations. Within KR, the significance of high-level (abductive) explanations in a range of contexts is well-established: planning & process recognition [Kautz, 1991], vision & abduction [Shanahan, 2005], probabilistic abduction [Blythe et al., 2011], reasoning about spatio-temporal dynamics [Bhatt and Loke, 2008], reasoning about continuous spacetime change [Muller, 1998; Hazarika and Cohn, 2002], etc. Dubba et al. [2015] use abductive reasoning in an inductive-abductive loop within inductive logic programming (ILP). Aditya et al. [2015] formalise general rules for image interpretation with ASP. Similarly motivated to this research is [Suchan et al., 2018], which uses a two-step approach (with one huge problem specification), first tracking and then explaining (and fixing) tracking errors; such an approach is not runtime / realtime capable. In computer vision research there has recently been an interest in synergising with cognitively motivated methods, in particular, e.g., for perceptual grounding & inference [Yu et al., 2015] and combining video analysis with textual information for understanding events & answering queries about video data [Tu et al., 2014].
5 CONCLUSION & OUTLOOK
We develop a novel abduction-driven online (i.e., realtime, incremental) visual sensemaking framework: general, systematically formalised, modular and fully implemented. Integrating robust state-of-the-art methods in knowledge representation and computer vision, the framework has been evaluated and demonstrated with established benchmarks. We highlight application prospects of semantic vision for autonomous driving, a domain of emerging & long-term significance. Specialised commonsense theories (e.g., about multi-sensory integration & multi-agent belief merging, contextual knowledge) may be incorporated based on requirements. Our ongoing focus is to develop a novel dataset emphasising semantics and (commonsense) explainability; this is driven by mixed-methods research –AI, Psychology, HCI– for the study of driving behaviour in low-speed, complex urban environments with unstructured traffic. Here, emphasis is on natural interactions (e.g., gestures, joint attention) amongst drivers, pedestrians, cyclists etc. Such interdisciplinary studies are needed to better appreciate the complexity and spectrum of varied human-centred challenges in autonomous driving, and demonstrate the significance of integrated vision & semantics solutions in those contexts.
Acknowledgements
Partial funding by the German Research Foundation (DFG) via the CRC 1320 "EASE – Everyday Activity Science and Engineering" (www.ease-crc.org), Project P3: Spatial Reasoning in Everyday Activity, is acknowledged.
References
[Aditya et al., 2015] Somak Aditya, Yezhou Yang, Chitta Baral, Cornelia Fermuller, and Yiannis Aloimonos. Visual commonsense for scene understanding using perception, semantic parsing and reasoning. In 2015 AAAI Spring Symposium Series, 2015.
[Allen, 1983] James F. Allen. Maintaining knowledge about temporal intervals. Commun. ACM, 26(11):832–843, 1983.
[Balbiani et al., 1999] Philippe Balbiani, Jean-François Condotta, and Luis Fariñas del Cerro. A new tractable subclass of the rectangle algebra. In Thomas Dean, editor, IJCAI 1999, Sweden, pages 442–447. Morgan Kaufmann, 1999.
[Bernardin and Stiefelhagen, 2008] Keni Bernardin and Rainer Stiefelhagen. Evaluating multiple object tracking performance: The CLEAR MOT metrics. EURASIP Journal on Image and Video Processing, 2008(1):246309, May 2008.
[Bewley et al., 2016] Alex Bewley, Zongyuan Ge, Lionel Ott, Fabio Ramos, and Ben Upcroft. Simple online and realtime tracking. In 2016 IEEE International Conference on Image Processing (ICIP), pages 3464–3468, 2016.
[Bhatt and Loke, 2008] Mehul Bhatt and Seng W. Loke. Modelling dynamic spatial systems in the situation calculus. Spatial Cognition & Computation, 8(1-2):86–130, 2008.
[Bhatt et al., 2013] Mehul Bhatt, Carl Schultz, and Christian Freksa. The ‘Space’ in Spatial Assistance Systems: Conception, Formalisation and Computation. In Representing Space in Cognition: Interrelations of Behavior, Language, and Formal Models. Explorations in Language and Space. Oxford University Press, 2013.
[Blythe et al., 2011] James Blythe, Jerry R. Hobbs, Pedro Domingos, Rohit J. Kate, and Raymond J. Mooney. Implementing weighted abduction in Markov logic. In Proc. of 9th Intl. Conference on Computational Semantics, IWCS ’11, USA, 2011. ACL.
[BMVI, 2018] BMVI. Report by the Ethics Commission on Automated and Connected Driving. Federal Ministry of Transport and Digital Infrastructure, Germany, 2018.
[Brewka et al., 2011] Gerhard Brewka, Thomas Eiter, and Mirosław Truszczyński. Answer set programming at a glance. Commun. ACM, 54(12):92–103, December 2011.
[Chen et al., 2018] Liang-Chieh Chen, Yukun Zhu, George Papandreou, Florian Schroff, and Hartwig Adam. Encoder-decoder with atrous separable convolution for semantic image segmentation. arXiv:1802.02611, 2018.
[Davis and Marcus, 2015] Ernest Davis and Gary Marcus. Commonsense reasoning and commonsense knowledge in artificial intelligence. Commun. ACM, 58(9):92–103, 2015.
[Dubba et al., 2015] Krishna Sandeep Reddy Dubba, Anthony G. Cohn, David C. Hogg, Mehul Bhatt, and Frank Dylla. Learning relational event models from video. J. Artif. Intell. Res. (JAIR), 53:41–90, 2015.
[Gebser et al., 2012] Martin Gebser, Roland Kaminski, Benjamin Kaufmann, and Torsten Schaub. Answer Set Solving in Practice. Morgan & Claypool Publishers, 2012.
[Gebser et al., 2014] Martin Gebser, Roland Kaminski, Benjamin Kaufmann, and Torsten Schaub. Clingo = ASP + control: Preliminary report. CoRR, abs/1405.3694, 2014.
[Geiger et al., 2012] Andreas Geiger, Philip Lenz, and Raquel Urtasun. Are we ready for autonomous driving? The KITTI vision benchmark suite. In Conference on Computer Vision and Pattern Recognition (CVPR), 2012.
[Hazarika and Cohn, 2002] Shyamanta M. Hazarika and Anthony G. Cohn. Abducing qualitative spatio-temporal histories from partial observations. In KR, pages 14–25, 2002.
[Kautz, 1991] Henry A. Kautz. A formal theory of plan recognition and its implementation. In Reasoning about Plans, pages 69–124. Morgan Kaufmann Publishers Inc., USA, 1991.
[Liu et al., 2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott E. Reed, Cheng-Yang Fu, and Alexander C. Berg. SSD: Single shot multibox detector. In ECCV (1), volume 9905 of LNCS, pages 21–37. Springer, 2016.
[Ma et al., 2014] Jiefei Ma, Rob Miller, Leora Morgenstern, and Theodore Patkos. An epistemic event calculus for ASP-based reasoning about knowledge of the past, present and future. In LPAR: 19th Intl. Conf. on Logic for Programming, Artificial Intelligence and Reasoning, volume 26 of EPiC Series in Computing, pages 75–87. EasyChair, 2014.
[Mani and Pustejovsky, 2012] Inderjeet Mani and James Pustejovsky. Interpreting Motion – Grounded Representations for Spatial Language, volume 5 of Explorations in Language and Space. Oxford University Press, 2012.
[Milan et al., 2016] Anton Milan, Laura Leal-Taixé, Ian D. Reid, Stefan Roth, and Konrad Schindler. MOT16: A benchmark for multi-object tracking. CoRR, abs/1603.00831, 2016.
[Miller et al., 2013] Rob Miller, Leora Morgenstern, and Theodore Patkos. Reasoning about knowledge and action in an epistemic event calculus. In COMMONSENSE 2013, 2013.
[Muller, 1998] Philippe Muller. A qualitative theory of motion based on spatio-temporal primitives. In Anthony G. Cohn et al., editors, KR 1998, Italy. Morgan Kaufmann, 1998.
[Pan et al., 2018] Xingang Pan, Jianping Shi, Ping Luo, Xiaogang Wang, and Xiaoou Tang. Spatial as deep: Spatial CNN for traffic scene understanding. In Sheila A. McIlraith and Kilian Q. Weinberger, editors, AAAI 2018. AAAI Press, 2018.
[Redmon and Farhadi, 2018] Joseph Redmon and Ali Farhadi. YOLOv3: An incremental improvement. CoRR, abs/1804.02767, 2018.
[Ren et al., 2015] Shaoqing Ren, Kaiming He, Ross B. Girshick, and Jian Sun. Faster R-CNN: Towards real-time object detection with region proposal networks. In Annual Conference on Neural Information Processing Systems 2015, Canada, 2015.
[Shanahan, 2005] Murray Shanahan. Perception as abduction: Turning sensor data into meaningful representation. Cognitive Science, 29(1):103–134, 2005.
[Suchan and Bhatt, 2016] Jakob Suchan and Mehul Bhatt. Semantic question-answering with video and eye-tracking data: AI foundations for human visual perception driven cognitive film studies. In S. Kambhampati, editor, IJCAI 2016, New York, USA, pages 2633–2639. IJCAI/AAAI Press, 2016.
[Suchan et al., 2018] Jakob Suchan, Mehul Bhatt, Przemyslaw Andrzej Walega, and Carl P. L. Schultz. Visual explanation by high-level abduction: On answer-set programming driven reasoning about moving objects. In AAAI 2018. AAAI Press, 2018.
[Tu et al., 2014] Kewei Tu, Meng Meng, Mun Wai Lee, Tae Eun Choe, and Song Chun Zhu. Joint video and text parsing for understanding events and answering queries. IEEE MultiMedia, 2014.
[Yu et al., 2015] Haonan Yu, N. Siddharth, Andrei Barbu, and Jeffrey Mark Siskind. A compositional framework for grounding language inference, generation, and acquisition in video. J. Artif. Intell. Res. (JAIR), 52:601–713, 2015.