Automatic Generation of Situation Models for Plan Recognition Problems
Kristina Y. Yordanova
University of Rostock
18059 Rostock
Germany
kristina.yordanova@uni-rostock.de
Abstract

Recent attempts at behaviour understanding through language grounding have shown that it is possible to automatically generate models for planning problems from textual instructions. One drawback of these approaches is that they either do not make use of the semantic structure behind the model elements identified in the text, or they manually incorporate a collection of concepts with semantic relationships between them. We call this collection of knowledge a situation model. The situation model introduces additional context information to the model. It could also potentially reduce the complexity of the planning problem compared to models that do not use situation models. To address this problem, we propose an approach that automatically generates the situation model from textual instructions. The approach is able to identify various hierarchical, spatial, directional, and causal relations. We use the situation model to automatically generate planning problems in a PDDL notation, and we show that the situation model reduces the complexity of the PDDL model in terms of number of operators and branching factor compared to planning models that do not make use of situation models.
1 Introduction

Libraries of plans combined with observations are often used for behaviour understanding (Ramirez and Geffner, 2011; Krüger et al., 2014; Yordanova and Kirste, 2015). Such approaches rely on PDDL-like notations to generate a library of plans and then reason about the agent's actions, plans, and goals based on observations. Models describing plan recognition problems for behaviour understanding are typically manually developed (Ramírez and Geffner, 2009; Ramirez and Geffner, 2011; Baker et al., 2009). Manual modelling is, however, time consuming and error prone, and it often requires domain expertise (Nguyen et al., 2013).
To reduce the need for domain experts and the time required for building the model, one can substitute them with textual data (Philipose et al., 2004). More precisely, one can utilise the knowledge encoded in textual instructions to learn the model structure. Textual instructions specify tasks for achieving a given goal without explicitly stating all the required steps. On the one hand, this makes them a challenging source for learning a model (Branavan et al., 2010). On the other hand, they are usually written in imperative form, have a simple sentence structure, and are highly organised. Compared to rich texts, this makes them a better source for identifying the sequence of actions needed for reaching the goal (Zhang et al., 2012).
According to Branavan et al. (2012), to learn a model for planning problems from textual instructions, a system has to: 1. extract the actions' semantics from the text; 2. learn the model semantics through language grounding; and finally 3. translate the result into a computational model for planning problems.
In this work we add the learning of a situation model as a requirement for learning the model structure. As the name suggests, it provides context information about the situation (Ye et al., 2012). It is a collection of concepts with semantic relations between them. In that sense, the situation model plays the role of a common knowledge base shared between different entities.
In this work, we show that a computational model for plan recognition problems can benefit from a situation model, which describes the semantic structure of the model elements, as it (1) introduces additional context to the model and (2) can be used to reduce the model complexity through action specialisation. We propose a method for learning the situation model from textual instructions that relies on language taxonomies, word dependencies, and implicit causal relations to identify the semantic structure of the model. We use the situation model to generate planning operators for a planning problem in a Planning Domain Definition Language (PDDL) notation. We evaluate our approach by generating a model that describes the preparation of brownies. We compare the model complexity with and without the usage of the situation model in terms of number of operators and mean branching factor.
2 Related Work

The goal of grounded language acquisition is to learn linguistic analysis from a situated context (Branavan et al., 2011; Vogel and Jurafsky, 2010). This can be done in different ways: through grammatical patterns that map the sentence to a machine-understandable model of the sentence (Li et al., 2010; Zhang et al., 2012; Branavan et al., 2012); through machine learning techniques (Sil and Yates, 2011; Chen and Mooney, 2011; Benotti et al., 2014; Goldwasser and Roth, 2014; Kollar et al., 2014); or through reinforcement learning approaches that learn language by interacting with an external environment (Branavan et al., 2012, 2011, 2010; Vogel and Jurafsky, 2010; Babeş-Vroman et al., 2012; Goldwasser and Roth, 2014; Kollar et al., 2014).
Models learned through language grounding have been used for plan generation (Li et al., 2010; Branavan et al., 2012), for learning the optimal sequence of instruction execution (Branavan et al., 2011, 2010), for learning navigational directions (Vogel and Jurafsky, 2010; Chen and Mooney, 2011), and for interpreting human instructions for robots to follow (Kollar et al., 2014; Tenorth et al., 2010).
All of the above approaches have two drawbacks. The first problem is the way in which the preconditions and effects of the planning operators are identified. They are learned through explicit causal relations that are grammatically expressed in the text (Li et al., 2010; Sil and Yates, 2011). The existing approaches, however, either rely on an initial manual definition to learn these relations (Branavan et al., 2012), or on grammatical patterns and rich texts with complex sentence structure (Li et al., 2010). Textual instructions, however, usually have a simple sentence structure in which grammatical patterns are rarely discovered (Yordanova, 2015). The existing approaches do not address the problem of discovering causal relations between sentences, but assume that all causal relations are expressed within the sentence (Tenorth et al., 2010). In textual instructions, however, the elements representing cause and effect are usually found in different sentences (Yordanova, 2015).
The second problem is that existing approaches either rely on a manually defined situation model (Sil and Yates, 2011; Branavan et al., 2012; Goldwasser and Roth, 2014), or do not use one at all (Li et al., 2010; Branavan et al., 2011, 2010; Zhang et al., 2012; Vogel and Jurafsky, 2010). However, one needs a situation model to deal with model generalisation problems and as a means for expressing the semantic relations between model elements (Yordanova and Kirste, 2016; Yordanova, 2016). What is more, the manual definition is time consuming and often requires domain experts.
To address these two problems, in previous works we outlined an approach for the automatic generation of behaviour models from texts (Yordanova and Kirste, 2016; Yordanova, 2016, 2017). In this work, we extend the approach by proposing a method for the automatic generation of situation models. The method adapts the idea proposed by Yordanova (2015) to use time series analysis to identify causal relations between text elements. Our approach uses this idea to discover causal relations between actions. It also makes use of existing language taxonomies and word dependencies to identify hierarchical, spatial, and directional relations, as well as relations identifying the means through which an action is accomplished. The situation model is then used to generate planning operators that use the situation model's semantic structure to specialise the operators and thus reduce the model complexity. In the following, we describe the approach in detail.
3 Approach

The goal of this work is to build a situation model for a given planning problem. In this sense, a situation model is the knowledge base containing all relevant information about a given situation. This information is represented in terms of entities describing the relevant elements of a given situation and the semantic relations between these entities. In the following, we first discuss which elements are of interest to us and then describe our approach for generating the situation model from textual instructions.
3.1 Identifying Elements of Interest

The first step in generating the situation model is to identify the elements of interest in the text. We consider a text to be a sequence of sentences divided by a sentence separator. Each sentence in the text is then represented by a sequence of words, where each word has a tag describing its part of speech (POS).

In a text we have different types of words. We are most interested in verbs, as they describe the actions that can be executed in the environment. The actions are verbs in their infinitive form or in present tense, as textual instructions are usually written in imperative form with a missing agent.

We are also interested in those nouns that are the direct (accusative) objects of the verb. These nouns give us the elements of the world with which the agent is interacting (in other words, the objects on which the action is executed).

Apart from the direct objects, we are also interested in any indirect objects of the action, namely any nouns that are connected to the action through a preposition. These nouns give us spatial, locational, or directional information about the action being executed, or the means through which the action is executed (e.g. an action is executed "with" the help of an object). We denote the set of direct and indirect objects with O.
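The extraction step above can be sketched as follows. The POS tags and dependency triples below are hypothetical stand-ins for real parser output (the paper uses the Stanford NLP parser); the code only illustrates how actions, direct objects, and prepositional (indirect) objects would be filtered from such output.

```python
# Sketch: extracting actions and the object set O from tagged and
# dependency-parsed instructions. The (governor, relation, dependent)
# triples and POS tags are hypothetical example data, not parser API.

def extract_elements(pos_tags, dependencies):
    """Return the actions (verbs) and the set O of direct and
    indirect objects related to them."""
    actions = {w for w, tag in pos_tags if tag.startswith("VB")}
    # direct (accusative) objects of a verb
    direct = {d for g, rel, d in dependencies if rel == "dobj"}
    # indirect objects: nouns attached to a verb via a preposition,
    # e.g. a collapsed relation "prep_in" for "put ... in the sink"
    indirect = {d for g, rel, d in dependencies
                if rel.startswith("prep") and g in actions}
    return actions, direct | indirect

pos = [("put", "VB"), ("bag", "NN"), ("sink", "NN")]
deps = [("put", "dobj", "bag"), ("put", "prep_in", "sink")]
actions, objects = extract_elements(pos, deps)
```

With this toy input, the verb "put" is recognised as an action, and both "bag" (direct object) and "sink" (indirect object) end up in O.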
3.2 Building the Initial Situation Model

Given the set of objects O, the goal is to build the initial structure of the situation model from these elements. This structure consists of words describing the elements of a situation and the relations between these elements. If we think of the words as nodes and the relations as edges, we can represent the situation model as a graph.

Definition 1 (Situation model) A situation model G := (W, R) is a graph consisting of nodes represented through words W and of edges represented through relations R, where for two words a, b ∈ W there exists a relation r ∈ R such that r(a, b).

The initial structure of the situation model is represented by a taxonomy that contains the objects O and their abstracted meaning on different levels of abstraction. To do that, a language taxonomy L containing hyperonymy relations between the words of the language is used (this is the is-a relation between words).

To build the initial situation model, we start with the set O as the leaves of the taxonomy and for each object o ∈ O we recursively search for its hypernyms. This results in a hierarchy where the bottommost layer consists of the elements in O and the uppermost layer contains the most abstract word, that is, the least common parent of all o ∈ O. In that sense, a word at a higher abstraction level is the least common parent of some words on a lower abstraction level.
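The recursive hypernym search can be sketched as follows. The lookup table is a hypothetical stand-in for a language taxonomy such as WordNet; in the paper, the most frequently used sense of each word is taken.

```python
# Sketch: building the initial situation model as a set of is-a edges.
# HYPERNYMS is assumed example data standing in for a real taxonomy.

HYPERNYMS = {            # word -> its direct hypernym (assumed data)
    "bag": "container",
    "bowl": "vessel",
    "vessel": "container",
    "container": "artifact",
}

def build_taxonomy(objects):
    """Start from the objects O as leaves and recursively add is-a
    edges up to the most abstract reachable word."""
    relations = set()
    for o in objects:
        word = o
        while word in HYPERNYMS:
            parent = HYPERNYMS[word]
            relations.add((word, "isa", parent))
            word = parent
    return relations

edges = build_taxonomy({"bag", "bowl"})
```

Here "bag" and "bowl" are the leaves; "artifact" plays the role of the least common parent, and intermediate words such as "vessel" and "container" form the middle abstraction layers.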
3.3 Extending the Situation Model

As the initial situation model contains only the abstraction hierarchy of the identified objects, we extend it by first adding the list of all actions to the situation model, and then adding the relations between actions and indirect objects, and between actions and direct objects, to the graph. On the one hand, this step is performed to enrich the semantic structure of the model. On the other hand, it provides the basis for the planning operators, as the list of arguments in an operator is represented by all objects that are related to the action.
3.4 Adding Causal Relations

The last step is to extend the situation model with causal relations. Causal relations provide the cause-effect relation between actions in the model. This is important for the planning problem, as the planning operators are defined through preconditions and effects, which in turn build up the causal structure of the planning problem.

To discover causal relations between actions in the text, we consider two cases: (1) relations between two actions in the text; (2) relations between two action-object pairs in the text. We consider the first case as there are actions that are not related to a specific direct or indirect object but that are still causally related to other actions. We consider the second case because applying one action to an object can cause the execution of another action on the same object. We can think of the second case as a special case of the first, where we have filtered out any elements that could cause "noise" when searching for causality.
To discover causal relations between actions, we adapt the algorithm proposed by Yordanova (2015), which makes use of time series analysis. We start by representing each unique action (or each action-object tuple) in a text as a time series. Each element in the series is a tuple consisting of the number of the sentence in the text and the number of occurrences of the action (tuple) in that sentence.
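This representation can be sketched directly: one value per sentence, counting how often the action occurs in it. The instruction list below is a toy example in the style of Table 1, not data from the paper.

```python
# Sketch: representing each action as a time series over sentences,
# one count per sentence. The instructions are hypothetical examples.

def action_time_series(sentences, action):
    """Count the occurrences of an action word in each sentence."""
    return [s.split().count(action) for s in sentences]

instructions = [
    "take the bag",
    "rip the bag",
    "put the bag in the sink",
    "take the bowl",
    "put the bowl on the counter",
]
take_series = action_time_series(instructions, "take")  # [1, 0, 0, 1, 0]
put_series = action_time_series(instructions, "put")    # [0, 0, 1, 0, 1]
```

In this toy text, every occurrence of "put" follows an occurrence of "take" one sentence later, which is exactly the kind of lagged pattern the causality test below looks for.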
In order to discover causal relations based on the generated time series, we make use of the Granger causality test. It is a statistical test for determining whether one time series is useful for forecasting another. More precisely, Granger testing performs a statistical significance test for one time series "causing" the other time series at different time lags using auto-regression (Granger, 1969).

Generally, for two time series, we perform the Granger test, and if the p-value of the result is under the significance threshold, we conclude that the first time series causes the second, hence that the first word causes the second. For example, we generate time series for the words "take" and "put"; if the Granger test concludes that the lagged time series for "take" significantly improves the forecast of the "put" time series, we conclude that "take" causes "put".
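The core idea — comparing the forecast error of one series with and without the lagged candidate cause — can be sketched with plain least squares. This is a simplified illustration only: the paper's implementation uses the full Granger significance test (with p-values) in R, which this sketch omits.

```python
# Simplified sketch of the Granger idea: does the lagged series of one
# action reduce the forecast error of another beyond the target's own
# history? (No significance test here; the paper uses the real test.)
import numpy as np

def rss(y, regressors):
    """Residual sum of squares of an ordinary least-squares fit."""
    X = np.column_stack(regressors + [np.ones(len(y))])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ coef
    return float(r @ r)

def forecast_improvement(x, y, lag=1):
    """Relative reduction of y's forecast error when the lagged x
    series is added to y's own lagged history (1.0 = error removed)."""
    y_t = np.asarray(y[lag:], dtype=float)
    y_lag = np.asarray(y[:-lag], dtype=float)
    x_lag = np.asarray(x[:-lag], dtype=float)
    restricted = rss(y_t, [y_lag])          # y's history only
    full = rss(y_t, [y_lag, x_lag])         # plus lagged candidate cause
    return 0.0 if restricted == 0.0 else 1.0 - full / restricted

# toy data: every "put" occurs one sentence after a "take"
take = [1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0]
put = [0] + take[:-1]
score = forecast_improvement(take, put)
```

A score close to 1 indicates that the lagged "take" series explains the "put" series, mirroring the test's conclusion that "take" causes "put"; in the real procedure this decision is made via the p-value of the Granger F-test instead of a raw error ratio.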
Having identified the causal relations between actions, we add them to the situation model by adding the set of new relations to the existing set of relations in the situation model. An example of a situation model can be seen in Figure 2.
3.5 Generating Planning Operators

In order to test whether the situation model reduces the complexity of the planning problem, we generate operators based on the situation model. Figure 1 shows an example of an operator in the Planning Domain Definition Language (PDDL). It consists of an action name; parameters (or arguments); preconditions, which tell us what constraints have to be satisfied for an action to be executable; and effects, which define how the action execution changes the world.

(:action put
  :parameters (?o - object ?to - location)
  :precondition (and
    (not (executed-put ?o ?to)))
  :effect (and
    (executed-put ?o ?to))
)

Figure 1: Example of an action template put in the PDDL notation.
To generate an operator, we take the name from the set of actions in the situation model. Then, for each action a, we take the set of arguments from the objects o ∈ O in the situation model that have object-verb relations to the action.

The set of preconditions of an operator is then generated from the set of causal relations to actions which cause a given action a to become executable. The set of effects consists of marking the action as executed with the given set of arguments, and of negating the execution of another action if the two are cyclic. Cyclic actions are actions that negate each other's effects. For example, the execution of "put the apple on the table" negates the effect of the action "take the apple". In that respect, for two operators a and b with a cyclic relation, we have to negate the effects of a after executing b and vice versa; otherwise it will not be possible to execute these actions again.
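The generation step can be sketched as a simple traversal of the situation-model relations. The relation tuples below are hypothetical examples; the generated template mirrors the shape of Figure 1, but omits typed parameters drawn from the taxonomy and the cyclic-negation effects for brevity.

```python
# Sketch: generating a PDDL action template from one action's
# verb-object and causal relations in the situation model. The model
# tuples are hypothetical example data.

def make_operator(action, model):
    """Build a simplified PDDL action template for one action."""
    # arguments: objects with an object-verb relation to the action
    args = sorted(o for a, rel, o in model
                  if a == action and rel in ("dobj", "in", "with"))
    params = " ".join(f"?{o} - {o}" for o in args)
    # preconditions: actions that cause this action to become executable
    causes = sorted(c for c, rel, a in model
                    if rel == "causes" and a == action)
    pre = " ".join(f"(executed-{c})" for c in causes)
    return (f"(:action {action}\n"
            f" :parameters ({params})\n"
            f" :precondition (and {pre})\n"
            f" :effect (and (executed-{action})))")

model = {("put", "dobj", "bag"), ("put", "in", "sink"),
         ("take", "causes", "put")}
operator = make_operator("put", model)
```

In the full approach, the parameter types would come from the hypernym hierarchy (which is what specialises the operators and reduces the number of groundings), and cyclic actions would additionally negate each other's executed-facts in the effects.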
4 Evaluation

To evaluate the approach, we generated a planning model from experiment instructions describing the preparation of brownies. Table 1 shows a small excerpt of the instructions.

1 Open the brownie bag.
2 Put the scissors in the drawer.
3 Take the brownie bag and rip the brownie bag.
4 Put the ripped brownie bag in the sink.
5 Close the drawer.

Table 1: Excerpt from the instructions describing how to prepare brownies.

The instructions consisted of 110 short sentences describing the step-by-step execution of the experiment. To obtain the part of speech tags and dependencies between words, we used the Stanford NLP parser. We used the taxonomy of the English language WordNet (Miller, 1995) to obtain the hyperonyms of the identified objects. As some words have different meanings, we took the most frequently used meaning for each object. The Granger causality test was implemented in R in order to discover causal relations between the actions. Finally, the automatic generation of the situation model and of the PDDL model were implemented in Haskell.

We generated a situation model which consisted of 29 objects and 10 actions identified in the text. Furthermore, 38 unique hyperonyms were identified for the 29 objects, as well as 6 hyperonyms based on the relations of the indirect objects to the action. Finally, 12 causal relations were discovered. The resulting situation model contained 138 unique relations between the above identified elements. Figure 2 shows the resulting situation model for the brownies.

Figure 2: Extract of the situation model for the brownies instructions. Blue circles indicate actions, grey circles objects, purple circles properties, and white circles the taxonomy of objects. Dark blue relations indicate direct object-verb relations, yellow the different types of relations between indirect objects and verbs or nouns, light blue causal relations, and grey the abstraction hierarchy.
To evaluate the situation model, we investigated two hypotheses.

H1 The situation model provides additional context knowledge and semantic structure to the planning problem.

H2 The situation model reduces the planning model complexity compared to models that do not use situation models or only use manually defined situation models.

To investigate H1, we compared the PDDL model that makes use of the situation model (PDDLsm) to: 1. a PDDL model containing only actions and no arguments, which we call PDDL1; 2. a PDDL model containing only the unique action-argument pairs discovered in the instructions, which we call PDDL2.

To investigate H2, we compared PDDLsm to: 1. a PDDL model that does not use a situation model; that is, each action template in the model has the same number of parameters, but they are all of the same type, so any object can be used as an argument for the action. We call this model PDDL3; 2. a PDDL model that makes use of the hyperonyms extracted through WordNet. We assume this model represents a manually built situation model. We call this model PDDL4; 3. a PDDL model that makes use of the hyperonyms extracted through WordNet and of the abstraction achieved through the relations between indirect objects and actions. We call this model PDDL5.

We use the following metrics: number of operators and mean branching factor.
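The mean branching factor can be read as the average number of grounded operators applicable per state while a plan is executed. The paper does not spell out the computation, so the sketch below is one plausible reading, under the simplifying assumption that states and preconditions are represented as sets of facts.

```python
# Sketch: mean branching factor as the average number of applicable
# grounded operators over the states visited during plan execution.
# The set-of-facts state representation is an assumed simplification.

def applicable(operators, state):
    """Operators whose preconditions all hold in the given state."""
    return [op for op, pre, eff in operators if pre <= state]

def mean_branching_factor(operators, states):
    counts = [len(applicable(operators, s)) for s in states]
    return sum(counts) / len(counts)

# hypothetical operators: (name, preconditions, effects)
ops = [
    ("take-bag", set(), {"holding-bag"}),
    ("put-bag", {"holding-bag"}, {"bag-in-sink"}),
    ("close-drawer", set(), {"drawer-closed"}),
]
states = [set(), {"holding-bag"}]
bf = mean_branching_factor(ops, states)  # (2 + 3) / 2 = 2.5
```

This also illustrates why causal preconditions matter for the metric: without them, every operator is applicable in every state and the branching factor stays constant, which is exactly the behaviour reported below for the models without causal relations.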
4.1 Results

The results for H1 can be seen in Table 2.

Metrics          PDDLsm  PDDL1  PDDL2
N: operators     1426    11     74
mean br. factor  1071    11     74

Table 2: Comparison between PDDLsm, PDDL1, and PDDL2.

They show that PDDLsm has many more operators than PDDL1 and PDDL2. This is apparent, as PDDL1 uses only the action classes and does not have any arguments. PDDL2 has arguments, but these are only the concrete action-argument pairs discovered in each sentence, so the model does not make any other combinations of arguments that might be applicable to the same action. PDDLsm, however, generates the operators based on the identified PDDL action templates, making use of the situation model. This allows for various combinations of arguments when grounding the templates. On the one hand, this has the positive effect of action and plan variability: the model will be able to explain many more variations in the behaviour of the agent than PDDL1 and PDDL2. On the other hand, the high number of operators also increases the mean branching factor of the model. In other words, it would be much easier to recognise the correct actions and plan of the agent in the case of 11 (respectively 74) choices than 1071.
The results for H2 show that the situation model generated through our approach reduces the complexity of the model compared to models which do not make use of situation models or which use manually defined situation models (see Table 3).

Metrics   PDDLsm  PDDL3  PDDL4  PDDL5
oper.     1426    3053   1736   1426
br. fac.  1071    3053   1736   1426

Table 3: Comparison between PDDLsm, PDDL3, PDDL4, and PDDL5.

Table 3 shows that PDDL3 has the highest number of operators (3053), which is to be expected, as each action template can be grounded with any of the available objects. The number of operators in PDDL4 decreases compared to PDDL3 (1736). This is due to the type hierarchy extracted through WordNet. The number of operators decreases further when one takes into account the additional relations identified between indirect objects and actions (PDDL5 with 1426 operators). Adding the causal relations does not decrease the number of operators (PDDLsm) compared to PDDL5. This is also to be expected, as the causal relations are not part of the type hierarchy. The causal relations, however, reduce the mean branching factor of the model: it decreases from 1426 in PDDL5 to 1071 in PDDLsm. In other words, in the rest of the models each action is always executable, as there are no constraints to reduce the number of applicable actions. Adding the causal relations to the operators reduces the branching factor, allowing for better execution of the correct action. This is also visible in Figure 3, which shows the branching factor over time given a plan the model had to explain. The plan consists of 66 actions and was manually built by watching the video log of the brownies experiment. It can be seen that for the models which do not make use of causal relations, the branching factor is constant. For PDDLsm, however, we see both fluctuation and reduction of the branching factor, which is due to the preconditions and effects introduced in the model.

Figure 3: Branching factor of the different models (PDDLsm, PDDL3, PDDL4, PDDL5) at each time step.
5 Discussion and Conclusion

In this work we proposed an approach that generates situation models from textual instructions and then uses the situation model to generate PDDL operators for planning problems. The results showed that the situation model introduces additional context information and semantic structure to the PDDL model. It also reduces the model complexity compared to models which do not use situation models or which use manually developed situation models.

The automatically generated situation model provides valuable additional context information about the semantic structure behind the executed behaviour. It can potentially be used as a means to reason beyond the actions and the goal of the executed plan, namely by providing information about the situation in which the agent is acting.
Acknowledgments

This work is funded by the German Research Foundation (DFG) within the context of the project TextToHBM, grant number YO 226/1-1.
References
Monica Babes¸-Vroman, James MacGlashan, Ruoyuan
Gao, Kevin Winner, Richard Adjogah, Marie
desJardins, Michael Littman, and Smaranda Mure-
san. 2012. Learning to interpret natural language
instructions. In Proceedings of the Second Work-
shop on Semantic Interpretation in an Actionable
Context. Association for Computational Linguis-
tics, Stroudsburg, PA, USA, SIAC ’12, pages 1–6.
http://dl.acm.org/citation.cfm?id=2390927.2390928.
Chris L. Baker, Rebecca Saxe, and Joshua B. Tenen-
baum. 2009. Action understanding as inverse plan-
ning. Cognition 113(3):329–349.
Luciana Benotti, Tessa Lau, and Mart´
ın Villalba.
2014. Interpreting natural language instructions
using language, vision, and behavior.ACM
Trans. Interact. Intell. Syst. 4(3):13:1–13:22.
https://doi.org/10.1145/2629632.
S. R. K. Branavan, Nate Kushman, Tao Lei, and
Regina Barzilay. 2012. Learning high-level
planning from text. In Proceedings of the 50th
Annual Meeting of the Association for Com-
putational Linguistics: Long Papers - Volume
1. Association for Computational Linguistics,
Stroudsburg, PA, USA, ACL ’12, pages 126–135.
http://dl.acm.org/citation.cfm?id=2390524.2390543.
S. R. K. Branavan, David Silver, and Regina
Barzilay. 2011. Learning to win by read-
ing manuals in a monte-carlo framework. In
Proceedings of the 49th Annual Meeting of
the Association for Computational Linguis-
tics: Human Language Technologies - Volume
1. Association for Computational Linguistics,
Stroudsburg, PA, USA, HLT ’11, pages 268–277.
http://dl.acm.org/citation.cfm?id=2002472.2002507.
S. R. K. Branavan, Luke S. Zettlemoyer, and Regina
Barzilay. 2010. Reading between the lines:
Learning to map high-level instructions to com-
mands. In Proceedings of the 48th Annual Meeting
of the Association for Computational Linguis-
tics. Association for Computational Linguistics,
Stroudsburg, PA, USA, ACL ’10, pages 1268–1277.
http://dl.acm.org/citation.cfm?id=1858681.1858810.
David L. Chen and Raymond J. Mooney. 2011.
Learning to interpret natural language naviga-
tion instructions from observations. In Pro-
ceedings of the 25th AAAI Conference on Ar-
tificial Intelligence (AAAI-2011). pages 859–865.
http://www.cs.utexas.edu/users/ai-lab/?chen:aaai11.
Dan Goldwasser and Dan Roth. 2014. Learning from
natural instructions.Machine Learning 94(2):205–
232. https://doi.org/10.1007/s10994-013-5407-y.
C. W. J. Granger. 1969. Investigating Causal
Relations by Econometric Models and Cross-
spectral Methods.Econometrica 37(3):424–438.
https://doi.org/10.2307/1912791.
Thomas Kollar, Stefanie Tellex, Deb Roy, and Nicholas
Roy. 2014. Grounding verbs of motion in nat-
ural language commands to robots. In Oussama
Khatib, Vijay Kumar, and Gaurav Sukhatme, edi-
tors, Experimental Robotics, Springer Berlin Hei-
delberg, volume 79 of Springer Tracts in Advanced
Robotics, pages 31–47. https://doi.org/10.1007/978-
3-642-28572-1 3.
Frank Kr¨
uger, Martin Nyolt, Kristina Yordanova, Al-
bert Hein, and Thomas Kirste. 2014. Computational
state space models for activity and intention recogni-
tion. a feasibility study.PLoS ONE 9(11):e109381.
https://doi.org/10.1371/journal.pone.0109381.
Xiaochen Li, Wenji Mao, Daniel Zeng, and Fei-
Yue Wang. 2010. Automatic construction of
domain theory for attack planning. In IEEE
International Conference on Intelligence and
Security Informatics (ISI), 2010. pages 65–70.
https://doi.org/10.1109/ISI.2010.5484775.
George A. Miller. 1995. Wordnet: A lexical
database for english.Commun. ACM 38(11):39–41.
https://doi.org/10.1145/219717.219748.
Tuan A Nguyen, Subbarao Kambhampati, and Minh
Do. 2013. Synthesizing robust plans under incom-
plete domain models. In C.J.C. Burges, L. Bottou,
M. Welling, Z. Ghahramani, and K.Q. Weinberger,
editors, Advances in Neural Information Process-
ing Systems 26, Curran Associates, Inc., pages
2472–2480. http://papers.nips.cc/paper/5120-
synthesizing-robust-plans-under-incomplete-
domain-models.pdf.
Matthai Philipose, Kenneth P. Fishkin, Mike
Perkowitz, Donald J. Patterson, Dieter Fox,
Henry Kautz, and Dirk Hahnel. 2004. In-
ferring activities from interactions with ob-
jects.IEEE Pervasive Computing 3(4):50–57.
https://doi.org/10.1109/MPRV.2004.7.
Miquel Ram´
ırez and Hector Geffner. 2009. Plan
recognition as planning. In Proceedings of the 21st
International Jont Conference on Artifical Intel-
ligence. Morgan Kaufmann Publishers Inc., San
Francisco, CA, USA, IJCAI’09, pages 1778–1783.
http://dl.acm.org/citation.cfm?id=1661445.1661731.
Miquel Ramirez and Hector Geffner. 2011. Goal
recognition over pomdps: Inferring the inten-
tion of a pomdp agent. In Proceedings of
the Twenty-Second International Joint Conference
on Artificial Intelligence. AAAI Press, Barcelona,
Spain, volume 3 of IJCAI’11, pages 2009–
2014. https://doi.org/10.5591/978-1-57735-516-
8/IJCAI11-335.
Avirup Sil and Alexander Yates. 2011. Extracting
strips representations of actions and events. In
Proceedings of the International Conference Recent
Advances in Natural Language Processing 2011.
RANLP 2011 Organising Committee, Hissar, Bul-
garia, pages 1–8. http://aclweb.org/anthology/R11-
1001.
M. Tenorth, D. Nyga, and M. Beetz. 2010. Understanding and executing instructions for everyday manipulation tasks from the world wide web. In IEEE International Conference on Robotics and Automation (ICRA). pages 1486–1491. https://doi.org/10.1109/ROBOT.2010.5509955.
Adam Vogel and Dan Jurafsky. 2010. Learning to follow navigational directions. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, Stroudsburg, PA, USA, ACL '10, pages 806–814. http://dl.acm.org/citation.cfm?id=1858681.1858764.
Juan Ye, Simon Dobson, and Susan McKeever. 2012. Situation identification techniques in pervasive computing: A review. Pervasive Mob. Comput. 8(1):36–66. https://doi.org/10.1016/j.pmcj.2011.01.004.
Kristina Yordanova. 2015. Discovering causal relations in textual instructions. In Recent Advances in Natural Language Processing. RANLP 2015 Organising Committee, Hissar, Bulgaria, pages 714–720. http://www.aclweb.org/anthology/R15-1091.
Kristina Yordanova. 2016. From textual instructions to sensor-based recognition of user behaviour. In Companion Publication of the 21st International Conference on Intelligent User Interfaces. ACM, New York, NY, USA, IUI '16 Companion, pages 67–73. https://doi.org/10.1145/2876456.2879488.
Kristina Yordanova. 2017. TextToHBM: A generalised approach to learning models of human behaviour for activity recognition from textual instructions. In Proceedings of the AAAI Workshop on Plan, Activity and Intent Recognition (PAIR). AAAI, San Francisco, USA, pages 891–898. https://www.aaai.org/ocs/index.php/WS/AAAIW17/paper/view/15110.
Kristina Yordanova and Thomas Kirste. 2015. A process for systematic development of symbolic models for activity recognition. ACM Transactions on Interactive Intelligent Systems 5(4):20:1–20:35. https://doi.org/10.1145/2806893.
Kristina Yordanova and Thomas Kirste. 2016. Learning models of human behaviour from textual instructions. In Proceedings of the 8th International Conference on Agents and Artificial Intelligence (ICAART 2016). Rome, Italy, pages 415–422. https://doi.org/10.5220/0005755604150422.
Ziqi Zhang, Philip Webster, Victoria Uren, Andrea Varga, and Fabio Ciravegna. 2012. Automatically extracting procedural knowledge from instructional texts using natural language processing. In Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Mehmet Uğur Doğan, Bente Maegaard, Joseph Mariani, Jan Odijk, and Stelios Piperidis, editors, Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC-2012). European Language Resources Association (ELRA), Istanbul, Turkey, pages 520–527. ACL Anthology Identifier: L12-1094. http://www.lrec-conf.org/proceedings/lrec2012/pdf/244_Paper.pdf.