Conference Paper
Extracting Planning Operators from
Instructional Texts for Behaviour Interpretation
Kristina Yordanova
University of Rostock, 18059 Rostock, Germany
Abstract

Recent attempts at behaviour understanding through language grounding have shown that it is possible to automatically generate planning models from instructional texts. One drawback of these approaches is that they either do not make use of the semantic structure behind the model elements identified in the text, or they manually incorporate a collection of concepts with semantic relationships between them. To use such models for behaviour understanding, however, the system should also have knowledge of the semantic structure and context behind the planning operators. To address this problem, we propose an approach that automatically generates planning operators from textual instructions. The approach is able to identify various hierarchical, spatial, directional, and causal relations between the model elements. This allows incorporating context knowledge beyond the actions being executed. We evaluated the approach in terms of correctness of the identified elements, model search complexity, model coverage, and similarity to handcrafted models. The results showed that the approach is able to generate models that explain actual task executions and that the models are comparable to handcrafted models.
1 Introduction
Libraries of plans combined with observations are often used for behaviour understanding [18, 12]. Such approaches rely on PDDL-like notations to generate a library of plans and to reason about the agent's actions, plans, and goals based on observations. Models describing plan recognition problems for behaviour understanding are typically developed manually [18, 2]. Manual modelling is, however, time consuming and error prone, and it often requires domain expertise [16]. To reduce the need for domain experts and the time required for building the model, one can substitute them with textual data [17]. As [23] propose, one can utilise the knowledge encoded in instructional texts, such as manuals, recipes, and howto articles, to learn the model structure. Such texts specify tasks for achieving a given goal without explicitly stating all the required steps. On the one hand, this makes them a challenging source for learning a model [5]. On the other hand, they are written in imperative form, have a simple sentence structure, and are highly organised. Compared to rich texts, this makes them a better source for identifying the sequence of actions needed for reaching the goal [28].
According to [4], to learn a model for planning problems from textual instructions, the system has to: 1. extract the actions' semantics from the text; 2. learn the model semantics through language grounding; and 3. finally translate it into a computational model for planning problems. In this work we add 4. the learning of a situation model as a requirement for learning the model structure. As the name suggests, it provides context information about the situation [24]. It is a collection of concepts with semantic relations between them. In that sense, the situation model plays the role of a common knowledge base shared between different entities. We also add 5. the need to extract implicit causal relations from the texts, as explicit relations are rarely found in this type of text.
In previous work we proposed an approach for extracting domain knowledge and generating situation models from textual instructions, based on which simple planning operators can be built [26]. We extend our previous work by proposing a mechanism for generating rich models from instructional texts and by providing a detailed description of the methodology. Further, we show first empirical results that the approach is able to generate planning operators which capture the behaviour of the user. To evaluate the approach, we examine the correctness of the identified elements, the complexity of the search space, the model coverage, and the similarity to handcrafted models.

The work is structured as follows. Section 2 provides the state of the art in language grounding for behaviour understanding; Section 3 provides a formal description of the proposed approach; Section 4 contains the empirical evaluation of our approach. The work concludes with a discussion of future work (Section 5).
2 Related work
The goal of grounded language acquisition is to learn linguistic analysis from a situated context [22]. This can be done in different ways: through grammatical patterns that are used to map a sentence to a machine understandable model of the sentence [13, 28, 4]; through machine learning techniques [19, 6, 3, 8, 11]; or through reinforcement learning approaches that learn language by interacting with the environment [4, 5, 22, 1, 8, 11]. Models learned through language grounding have been used for plan generation [13, 4, 14], for learning the optimal sequence of instruction execution [5], for learning navigational directions [22, 6], and for interpreting human instructions so that robots can follow them [11, 20].
All of the above approaches have two drawbacks. The first problem is the way in which the preconditions and effects of the planning operators are identified. They are learned through explicit causal relations that are grammatically expressed in the text [13, 19]. The existing approaches rely either on an initial manual definition to learn these relations [4], or on grammatical patterns and rich texts with complex sentence structure [13]. In contrast, textual instructions usually have a simple sentence structure, and grammatical patterns are rarely discovered in them [25]. The existing approaches also do not address the problem of discovering causal relations between sentences, but assume that all causal relations are within the sentence [20]. In instructional texts, however, the elements representing cause and effect are usually found in different sentences [25].
The second problem is that existing approaches either rely on a manually defined situation model [19, 4, 8] or do not use one at all [13, 5, 28, 22]. Still, one needs a situation model to deal with model generalisation and as a means of expressing the semantic relations between model elements. What is more, the manual definition is time consuming and often requires domain experts. [14] propose dealing with model generalisation by clustering similar actions together. We propose an alternative solution where we exploit the semantic structure of the knowledge present in the text and in language taxonomies.

In previous work, we addressed these two problems by proposing an approach for the automatic generation of situation models for planning problems [26]. In this work, we extend the approach to generate rich planning operators and we show first empirical evidence that it is possible to reason about human behaviour based on the generated models. The method adapts an approach proposed by [25] that uses time series analysis to identify the causal relations between text elements. We use it to discover implicit causal relations between actions. We also make use of existing language taxonomies and word dependencies to identify hierarchical, spatial, and directional relations, as well as relations identifying the means through which an action is accomplished. The situation model is then used to generate planning operators.
3 Approach
3.1 Identifying elements of interest
The first step in generating the model is to identify the elements of interest in the text. We consider a text X to be a sequence of sentences S = {s1, s2, ..., sn}. Each sentence s is represented by a sequence of words Ws = {w1s, w2s, ..., wms}, where each word has a tag tw describing its part of speech (POS). A text contains different types of words. We are interested in verbs v ∈ V, V ⊆ W, as they describe the actions that can be executed in the environment. The set of actions E ⊆ V consists of verbs in their infinitive form or in present tense, as textual instructions are usually written in imperative form with a missing agent. We are also interested in nouns n ∈ N, N ⊆ W that are related to the verb. One type of nouns are the direct (accusative) objects of the verb d ∈ D, D ⊆ N. These nouns give us the elements of the world with which the agent is interacting (in other words, the objects on which the action is executed). We denote the relation between d and e as dobj(e, d). Here a relation r is a function applied to two words a and b; we denote this as r(a, b). Note that r(a, b) ≠ r(b, a). An example of such a relation can be seen in Fig. 1, where "knife" is the direct object of "take".

Apart from the direct objects, we are also interested in any indirect objects i ∈ I, I ⊆ N of the action, namely any nouns that are connected to the action through a preposition. These nouns give us spatial, locational, or directional information about the action being executed, or the means through which the action is executed (e.g. an action is executed "with" the help of an object). More formally, an indirect object ip ∈ I of an action e is a noun connected to e through a preposition p. We denote the relation between ip and e as p(e, ip). For example, in Fig. 1 "counter" is the indirect object of "take" and its relation is denoted as from(take, counter). We define the set O := D ∪ I of all relevant objects as the union of all unique direct and indirect objects in a text.

The last type of element is the object's property. A property c ∈ C, C ⊆ W of an object o is a word that has one of the following relations with the object: amod(c, o), denoting the adjectival modifier, or nsubj(c, o), denoting the nominal subject. We denote such a relation as property(c, o). For example, in Fig. 1, "clean" is the property of "knife". As in instructions the object is often omitted (e.g. "Simmer (the sauce) until thickened."), we also investigate the relation between an action and past tense verbs or adjectives that do not belong to an adjectival modifier or a nominal subject, but that might still describe this relation.
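The extraction step above can be sketched as follows. This is a minimal illustration over hardcoded dependency triples for the example sentence from Fig. 1; the paper itself obtains POS tags and dependencies from the Stanford NLP parser, which is not invoked here, and the tuple format is an assumption made for this sketch.

```python
# Sketch: extracting actions, direct/indirect objects, and properties
# from dependency-parsed tokens. The parse below is hardcoded for
# "Take the clean knife from the counter." -- a real pipeline would
# obtain it from a dependency parser.

# (word, POS tag, head word, dependency relation)
parsed = [
    ("take",    "VB", None,      "root"),
    ("the",     "DT", "knife",   "det"),
    ("clean",   "JJ", "knife",   "amod"),
    ("knife",   "NN", "take",    "dobj"),
    ("from",    "IN", "counter", "case"),
    ("the",     "DT", "counter", "det"),
    ("counter", "NN", "take",    "prep_from"),
]

def extract_elements(tokens):
    actions, relations = set(), set()
    for word, pos, head, dep in tokens:
        if pos.startswith("VB") and dep == "root":
            actions.add(word)                        # action e
        elif dep == "dobj":
            relations.add(("dobj", head, word))      # dobj(e, d)
        elif dep.startswith("prep_"):
            prep = dep.split("_", 1)[1]
            relations.add((prep, head, word))        # p(e, i_p)
        elif dep in ("amod", "nsubj"):
            relations.add(("property", word, head))  # property(c, o)
    return actions, relations

actions, relations = extract_elements(parsed)
print(actions)  # {'take'}
print(sorted(relations))
```

Running this on the example yields the action "take", the relation dobj(take, knife), the indirect object relation from(take, counter), and the property relation property(clean, knife).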
3.2 Building the initial situation model
Given the set of objects O, the goal is to build the initial structure of the situation model. It consists of words describing the elements of a situation, and of the relations between these elements. If we think of the words as nodes and the relations as edges, we can represent the situation model as a graph.
Definition 1 (Situation model) A situation model G := (W, R) is a graph consisting of nodes represented through words W and of edges represented through relations R, where for two words a, b ∈ W, there exists a relation r ∈ R such that r(a, b).

Figure 1: Elements of a sentence necessary for the model generation and the corresponding PDDL operator. Each sentence is assigned part of speech tags and the dependencies are annotated: in "Take the clean knife from the counter.", "take" (VB) is the action, "knife" (NN) its direct object (dobj), "clean" (JJ) a property (amod), and "counter" (NN) an indirect object (prep_from). Based on these, the relevant elements are identified and PDDL operators are generated from them:

(:action take
  :parameters (?o - object ?l - surface)
  :precondition (and
    (<= (number-executed-take ?o ?l) nExecuted)
    (is-utensil ?o)
    (is-from ?l)
    (clean ?o)
    (executed-put ?o ?l))
  :effect (and (increase (number-executed-take ?o ?l) 1)
    (not (executed-put ?o ?l))
    (executed-take ?o ?l)))
The initial structure of the situation model is represented through a taxonomy that contains the objects O and their abstracted meaning on different levels of abstraction. To do that, a language taxonomy L containing hyperonymy relations between the words of the language is used (this is the is-a relation between words). For example, the relation isa(knife, tool) indicates that the concrete object "knife" is of type "tool". To build the initial situation model, we start with the set O as the leaves of the taxonomy, and for each object o ∈ O we recursively search for its hyperonyms. This results in a hierarchy where the bottommost layer consists of the elements in O and the uppermost layer contains the most abstract word, that is, the least common parent of all o ∈ O. Here the least common parent lcp(a, b) of two words a and b is the parent on the lowest level in the taxonomy that contains both a and b as children. The initial situation model is then Ginit := (Winit, Rinit) with Winit = O ∪ hyperonyms(O, L) and Rinit := isa(Winit), where O is the set of objects and L is a language taxonomy. Furthermore, for every two objects oi, oj ∈ O, there exists l ∈ L such that l = lcp(oi, oj). Note that here we use a function hyperonyms(O, L), which returns all hyperonyms of O found in L. The abstraction hierarchy is later used to generalise or specialise the action templates in a planning model.
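A minimal sketch of this construction over a toy is-a table: the paper uses WordNet as the language taxonomy, while the hyperonym entries below are invented purely for illustration.

```python
# Toy hyperonym ("is-a") table standing in for a language taxonomy
# such as WordNet. All entries are illustrative.
HYPERONYM = {
    "knife": "tool", "spoon": "tool", "tool": "object",
    "counter": "surface", "surface": "object",
}

def ancestors(word):
    """Chain of hyperonyms from word up to the taxonomy root."""
    chain = []
    while word in HYPERONYM:
        word = HYPERONYM[word]
        chain.append(word)
    return chain

def build_initial_model(objects):
    """W_init = O ∪ hyperonyms(O, L); R_init = the is-a edges."""
    words, isa = set(objects), set()
    for o in objects:
        for child, parent in zip([o] + ancestors(o), ancestors(o)):
            words.add(parent)
            isa.add(("isa", child, parent))
    return words, isa

def lcp(a, b):
    """Least common parent: lowest shared ancestor of a and b."""
    seen = [a] + ancestors(a)
    for cand in [b] + ancestors(b):
        if cand in seen:
            return cand
    return None

words, isa = build_initial_model({"knife", "spoon", "counter"})
print(lcp("knife", "spoon"))    # tool
print(lcp("knife", "counter"))  # object
```

The leaves are the objects O, and climbing the hyperonym chain produces the abstraction hierarchy; lcp then finds the lowest level at which two objects share a parent.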
3.3 Extending the situation model
As the initial situation model contains only the abstraction hierarchy of the iden-
tified objects, we extend it by first including the list of all actions and properties
to the situation model and then adding the relations between actions and indi-
rect objects, actions and direct ob jects, and properties and objects to the graph.
We define the extended situation model as Gext := (Wext, Rext ), such that
Wext := Winit ECand Rext := Rinit dobj(E, O)p(E, O)property(C, O),
where Eis the set of actions, Ois the set of objects, Cis the set of properties,
and dobj(E, O)and p(E, O)are the direct, respectively indirect, relations be-
tween object and action, while property(C, O)is the property - object relation.
On the one hand, this step is performed to enrich the semantic structure of the
model. On the other hand, it gives the basis for the planning operators as the
arguments in an operator are represented by all objects that are related to the
action.
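Operationally, this extension is a plain union of node and typed-edge sets; a sketch under the same illustrative assumptions as before (all names invented):

```python
# Sketch: extending an initial situation model (words, is-a edges) with
# actions, properties, and their relations. All data is illustrative.

def extend_model(words, relations, actions, properties, action_rels, prop_rels):
    """W_ext = W_init ∪ E ∪ C; R_ext = R_init ∪ dobj/p ∪ property."""
    ext_words = set(words) | set(actions) | set(properties)
    ext_rels = set(relations) | set(action_rels) | set(prop_rels)
    return ext_words, ext_rels

words = {"knife", "counter", "tool", "surface", "object"}
isa = {("isa", "knife", "tool"), ("isa", "counter", "surface")}
actions = {"take"}
properties = {"clean"}
action_rels = {("dobj", "take", "knife"), ("from", "take", "counter")}
prop_rels = {("property", "clean", "knife")}

W_ext, R_ext = extend_model(words, isa, actions, properties, action_rels, prop_rels)
print(len(W_ext), len(R_ext))  # 7 5
```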
3.4 Adding implicit causal relations
The last step is extending the situation model with causal relations. They build up the preconditions and effects in a planning operator. There are two types of predicates that describe the preconditions and effects. The first type is described through the identified properties (e.g. the condition that the knife has the property "clean") and through the indirect object relations (e.g. the counter has the role "from"). The second type of preconditions is based on the assumption that a certain action has to be executed to enable the execution of another action. We call this "predictive causality" [7, p. 254] and the corresponding relations "predictive causal relations" or "implicit causal relations".

To discover implicit causal relations between actions in the text, we consider two cases: (1) relations between two actions in the text; (2) relations between two action-object pairs in the text. We consider the first case as there are actions that are not related to a specific direct or indirect object but that are still causally related to other actions. We consider the second case because applying one action to an object can cause the execution of another action on the same object. We denote predictive causal relations with q ∈ Q, Q ⊆ R. To discover causal relations between actions, we adapt the algorithm proposed by [25], which makes use of time series analysis. We start by representing each unique action (or each action-object tuple) in a text as a time series. Each element in the series represents the number of occurrences of the action in the corresponding sentence. We then make use of the Granger causality test. It is a statistical test for determining whether one time series is useful for forecasting another. It performs a statistical significance test for one time series "causing" the other time series with different time lags, using auto-regression [9]. Given two time series xt and yt, we can test whether xt Granger causes yt with a maximum time lag of p. To do that, we estimate the regression yt = a0 + a1 yt-1 + ... + ap yt-p + b1 xt-1 + ... + bp xt-p. An F-test is then used to determine whether the lagged x terms are significant¹. For example, we generate time series for the words "take" and "put"; after applying the Granger test, it concludes that the lagged time series for "take" significantly improves the forecast of the "put" time series, thus we conclude that "take" causes "put".

Having identified the implicit causal relations between actions, we add them to the situation model. The final situation model is Gfin := (Wfin, Rfin) such that Wfin := Wext and Rfin := Rext ∪ Q, where Q is the set of discovered causal relations, Wext is the set of words, and Rext is the set of relations in the extended situation model.
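The Granger test above can be illustrated with a hand-rolled F statistic over the lagged regression; this NumPy-only sketch is an approximation for illustration, not the authors' implementation, and the synthetic series (x driving y with lag 1) are invented.

```python
# Sketch: a minimal Granger-style causality check with plain NumPy.
import numpy as np

def granger_f(x, y, p=1):
    """F statistic for 'lagged x helps forecast y' with max lag p."""
    n = len(y)
    Y = y[p:]
    # Lagged regressors: restricted model uses y lags only,
    # unrestricted adds the x lags.
    ylags = np.column_stack([y[p - k:n - k] for k in range(1, p + 1)])
    xlags = np.column_stack([x[p - k:n - k] for k in range(1, p + 1)])
    ones = np.ones((n - p, 1))
    Xr = np.hstack([ones, ylags])
    Xu = np.hstack([ones, ylags, xlags])
    rss = lambda X: np.sum((Y - X @ np.linalg.lstsq(X, Y, rcond=None)[0]) ** 2)
    rss_r, rss_u = rss(Xr), rss(Xu)
    dof = n - p - Xu.shape[1]
    return ((rss_r - rss_u) / p) / (rss_u / dof)

rng = np.random.default_rng(0)
x = rng.normal(size=300)
y = np.empty(300)
y[0] = 0.0
for t in range(1, 300):
    y[t] = 0.8 * x[t - 1] + 0.1 * rng.normal()  # x drives y with lag 1

print(granger_f(x, y))  # large: lagged x forecasts y
print(granger_f(y, x))  # small: lagged y does not forecast x
```

A large F statistic for the lagged x terms is taken as evidence that x "Granger causes" y, mirroring the take/put example in the text.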
3.5 Generating planning operators
The next step is to generate operators based on the situation model. An operator a := (e, Z, Pr, Fp, Ef, Fe) is a tuple, where e is the name of the operator and Z represents the set of arguments with which the operator can be parameterised; Pr, Ef ⊆ P are the sets of precondition, respectively effect, predicates; Fp, Fe ⊆ F are the sets of precondition, respectively effect, functions. The predicates P are boolean functions that provide statements about the model world state. Unlike predicates, functions provide higher-order statements about the model world (e.g. increasing a function value).

¹ Note that regression usually reflects correlation. Granger, however, argued that causality in economics could be tested for by measuring the ability to predict the future values of a time series using prior values of another time series. As the question of "true causality" is philosophical, the Granger causality test assumes that one thing preceding another can be used as evidence of causation.

Algorithm 1 Generating planning operators from the situation model
Require: E, C, R, O    ▷ actions, properties, relations, objects from Gfin
Require: n             ▷ number of times an action can be executed
Require: A := empty set of operators
 1: for e in E do                                        ▷ for each action in E
 2:   (name_ae, Z_ae, Pr_ae, Fp_ae, Ef_ae, Fe_ae) ← initialise()
 3:   name_ae ← e
 4:   for o in O do
 5:     if ∃ r := relation(e, o), r ∈ R_dobj ∪ R_p then  ▷ add arguments
 6:       Z_ae ← add.argument(Z_ae, o)
 7:     end if
 8:     if ∃ r := p(e, o), r ∈ R_p then                  ▷ add predicates from indirect object relations
 9:       Pr_ae ← add.predicate(Pr_ae, property-p(o))
10:     end if
11:   end for
12:   Fp_ae ← add.function(Fp_ae, (number-executed-e(Z) < n))  ▷ default precondition function
13:   for z in Z_ae do                                   ▷ add property predicates
14:     for c in C do
15:       if ∃ r := property(c, z), r ∈ R then
16:         Pr_ae ← add.predicate(Pr_ae, property-c(z))
17:       end if
18:     end for
19:   end for
20:   for y in E, y ≠ e do
21:     if ∃ r := causes(y, e), r ∈ Q, Q ⊆ R then        ▷ add causal predicates to precondition
22:       Pr_ae ← add.predicate(Pr_ae, executed(y))
23:     end if
24:     for w in E, w ≠ e, w ≠ y do                      ▷ remove transitive actions in the precondition
25:       if ∃ u := cyclic(y, e) ∧ ∃ l := cyclic(w, e) ∧ ∃ t := cyclic(y, w), u, l, t ∈ R then
26:         tmp ← get.weakest(e, y, w)                   ▷ identify the weakest transitive action
27:         if tmp ≠ e then
28:           Pr_ae ← remove.predicate(Pr_ae, executed(tmp))
29:         end if
30:       end if
31:     end for
32:     if ∃ r := cyclic(y, e), r ∈ R then               ▷ add predicate for cyclic actions in the effects
33:       Ef_ae ← add.predicate(Ef_ae, ¬executed(y))
34:     end if
35:   end for
36:   Fe_ae ← add.function(Fe_ae, (number-executed-e(Z) + 1))  ▷ increase the precondition function by 1
37:   Ef_ae ← add.predicate(Ef_ae, executed(e))          ▷ mark action as executed
38:   A ← add.op(A, (name_ae, Z_ae, Pr_ae, Fp_ae, Ef_ae, Fe_ae))
39: end for
40: return unique(A)                                     ▷ return all unique operators
Algorithm 1 shows the procedure for generating the operators from the situation model. We take the name e from the set of actions E in the situation model. Then, for each action e, we take the set of arguments Z from the objects O in the situation model that have object-verb relations to the action. The set of precondition predicates Pr is generated from the set of actions which have an implicit causal relation to e, and from the set of identified properties related to the action or its arguments. The set of effects consists of marking the action as executed, increasing the value of the precondition function, and negating the execution of another action if the two are cyclic. Cyclic actions are actions that negate each other's effects: a, b ∈ E are cyclic if causes(a, b) and causes(b, a). We denote them as cyclic(a, b). For example, the execution of "put the apple on the table" negates the effect of the action "take the apple". For two operators a and b with a cyclic relation, we have to negate the effects of a after executing b and vice versa, otherwise it will not be possible to execute these actions again. Another problem that arises is transitive causal relations. We say that three actions a, b, and c are transitive if for a, b, c ∈ E it holds that cyclic(a, b), cyclic(b, c), and cyclic(a, c). The problem here is that the preconditions and effects of these actions block the execution of at least one of the transitive actions. It no longer suffices to just negate the effects of the cyclic actions, as there is a third action influencing the execution of the two remaining actions. To solve this problem, we follow an approach similar to the one proposed in [21]. We identify any transitive relations an action has, then remove the weakest relation, ending up with only cyclic relations. We find the weakest relation by calculating the frequency of appearance of the relations in the text and removing the one with the lowest frequency. An example operator can be seen in Fig. 1.
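The transitive-relation pruning described above can be sketched as follows; the action names and relation frequencies are invented for illustration, and the code simply drops the least frequent edge of every cyclic triangle.

```python
# Sketch: dropping the weakest relation in a transitive triangle of
# cyclic actions. Frequencies are illustrative stand-ins for how often
# each causal relation appears in the text.
from itertools import combinations

def prune_transitive(cyclic_pairs, freq):
    """Remove the least frequent edge of every cyclic triangle."""
    edges = set(cyclic_pairs)
    nodes = {w for e in edges for w in e}
    for a, b, c in combinations(sorted(nodes), 3):
        triangle = [p for p in [(a, b), (b, c), (a, c)]
                    if p in edges or p[::-1] in edges]
        if len(triangle) == 3:  # transitive: all three pairs cyclic
            weakest = min(triangle,
                          key=lambda p: freq.get(p, freq.get(p[::-1], 0)))
            edges.discard(weakest)
            edges.discard(weakest[::-1])
    return edges

pairs = {("take", "put"), ("put", "wash"), ("take", "wash")}
freq = {("take", "put"): 9, ("put", "wash"): 5, ("take", "wash"): 2}
result = prune_transitive(pairs, freq)
print(sorted(result))
```

In this toy triangle, the rarely co-occurring take-wash relation is removed, leaving only the two cyclic pairs.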
The target language is the Planning Domain Definition Language (PDDL), which represents the operators through abstracted action templates. To generate templates, we replace the operator's arguments with the corresponding hyperonym on level m of the abstraction hierarchy and then remove any repeating abstracted operators. In that manner we control the model specificity: using hyperonyms on a higher abstraction level produces more general models, and using those on a lower abstraction level produces more specific models.
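Rendering an operator tuple as a PDDL action template might look like the following rough sketch; the string layout mirrors the operator shown in Fig. 1, while the helper function and its field names are invented for illustration.

```python
# Sketch: rendering an operator tuple as a PDDL action template.
# Field names and the helper are illustrative, not the authors' code.

def to_pddl(name, params, preconds, effects):
    """params: list of (variable, type); preconds/effects: predicate strings."""
    par = " ".join(f"{v} - {t}" for v, t in params)
    pre = "\n    ".join(preconds)
    eff = "\n    ".join(effects)
    return (f"(:action {name}\n"
            f"  :parameters ({par})\n"
            f"  :precondition (and\n    {pre})\n"
            f"  :effect (and\n    {eff}))")

template = to_pddl(
    "take",
    [("?o", "object"), ("?l", "surface")],
    ["(<= (number-executed-take ?o ?l) nExecuted)",
     "(is-utensil ?o)", "(is-from ?l)", "(clean ?o)",
     "(executed-put ?o ?l)"],
    ["(increase (number-executed-take ?o ?l) 1)",
     "(not (executed-put ?o ?l))", "(executed-take ?o ?l)"])
print(template)
```

Swapping the parameter types for hyperonyms higher or lower in the abstraction hierarchy is what controls the specificity of the resulting template.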
The planning model M is a tuple (P, F, L, A, Z, x0, g), where P is a set of predicates, F is a set of functions, L is the language taxonomy (or abstraction hierarchy) from the situation model, A is a set of actions, Z is a set of arguments, x0 is the initial state, and g is the set of goals. The predicates and functions build up the model states x ∈ X, where X is the model state space; each state represents a unique combination of the values of all predicates and functions. The initial state x0 is the state of the world before any action has been executed. To generate x0, we set all the predicates identifying the execution of a cyclic action to true, add all identified properties, and set all functions to their initial value. Furthermore, we perform an analysis based on the action order in the text. We check whether the preconditions of the first action that require enabling are initially enabled. In case some predicates cannot be enabled based on the original action order, they are set to true in the initial state description. The rest of the predicates are set to false. The goal states g ⊆ X represent all the predicates that have to hold for the goal to be reached. We generate the goal states by requiring that each type of action a ∈ A has to be executed at least once. The generated operators often have contradicting preconditions and effects. To address this problem, we use a strategy where all ground operators that have impossible preconditions, given the initial state, are removed [10]. The same applies to predicates and functions that are used only in impossible actions. This strategy removes all impossible candidate operators and predicates and returns a model that is causally correct.
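The pruning of impossible ground operators can be sketched as a simple reachability fixpoint that ignores negative effects; this is a simplified illustration of the idea, not the implementation from [10], and the operators below are invented.

```python
# Sketch: removing ground operators whose preconditions can never hold,
# via a reachability fixpoint over positive effects only.
# Operators are (name, preconditions, add-effects); data is illustrative.

def prune_impossible(operators, initial_state):
    reachable = set(initial_state)
    usable = set()
    changed = True
    while changed:
        changed = False
        for name, pre, add in operators:
            if name not in usable and set(pre) <= reachable:
                usable.add(name)      # operator applicable at some point
                reachable |= set(add) # its effects become reachable
                changed = True
    return [op for op in operators if op[0] in usable]

ops = [
    ("take", ["clean-knife"],   ["holding-knife"]),
    ("cut",  ["holding-knife"], ["bread-cut"]),
    ("fly",  ["has-wings"],     ["airborne"]),  # never enabled: pruned
]
kept = prune_impossible(ops, ["clean-knife"])
print([name for name, _, _ in kept])  # ['take', 'cut']
```

Operators whose preconditions are never reachable from the initial state ("fly" above) are dropped, together with any predicates used only by them.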
Figure 2: Median number of elements and relations incorporated in the situation models extracted from 20 instructional texts (actions, causal relations, direct object-action relations, hierarchy, indirect object relations, objects, properties).
4 Evaluation
To evaluate the approach, we used 20 instructional texts, from which we generated planning models. We used an extended version of PDDL [10] as a target format for the planning models. The instructions included cooking recipes (3 instructions), texts from coffee and washing machine manuals (4 instructions), texts from wikiHow² (3 instructions), descriptions of the tasks performed in the CMU kitchen dataset³⁴ (3 instructions), and descriptions of student exercises for minimally invasive surgery (7 instructions).

The instructions had between 7 and 111 sentences, with a mean of 26.5 sentences per text and a mean sentence length between 5 and 16 words. To obtain the part of speech tags and dependencies between words, we used the Stanford NLP parser. As state of the art parsers have been shown to perform poorly at identifying events in instructional texts, we use a postprocessing step, as proposed in [27], to improve the tag accuracy. We used WordNet [15], a taxonomy of the English language, to obtain the hyperonyms of the identified objects. As some words have different meanings, we took the most frequently used meaning for each object. To generate the PDDL action templates, we used an abstraction level of 2. Figure 2 shows the statistics for the elements extracted from the texts that were incorporated in the situation models. The average number of identified actions in the instructional texts was 17.3 and the average number of objects was 10.2. A relatively small number of properties was discovered (mean of 3.55), with more properties discovered in cooking recipes, which use more unstructured language with longer sentences (a maximum of 18 properties). An average of 7.3 causal relations were discovered per text, with more causal relations when the texts had more sentences (maximum of 24 relations). This is to be expected, as the time series analysis performs better with longer series. The opposite was observed for the semantic relations (i.e. the relations between objects, properties, and actions within the sentences): texts with longer sentences but fewer sentences overall tended to have more semantic relations. Finally, the generated abstraction hierarchy (i.e. the hyperonyms) had between 3 and 8 levels, with an average of 5 levels.

² https://www.wikihow.com/Main-Page
³ http://kitchen.cs.cmu.edu/
⁴ These descriptions have been generated based on the behaviour observed in the video log.
Figure 3: Difference between automatically discovered elements and manually discovered elements (panels: actions, objects, and properties) in the 20 instructional texts. Green indicates that the human annotator discovered more elements, red means that our approach discovered false positives, while yellow indicates the same number of elements.
Correctness of the identified elements: To evaluate whether the approach is able to correctly identify objects, actions, and properties from texts, we asked a human annotator to manually identify these elements in the texts⁵. Figure 3 graphically shows the number of discovered elements and the distance between the number of manually and automatically discovered elements. It can be seen that in the majority of the cases both discovered the same number of elements⁶. Interestingly enough, our approach tended to discover more objects than the human annotator, producing false positives. This can be explained by the fact that it identified abstract concepts such as "level", "time", etc. as objects, while the human annotator considered only physical objects.
⁵ Note that we did not compare the identified causal relations. This is because implicit causal relations are a subject of interpretation. For that reason, we consider the relations correctly identified if the model is able to explain the given plan.
⁶ In the cases where the number of elements was the same for both the human annotator and our tool, the discovered elements were identical.

Figure 4: Median number of action templates, operators, predicates, and functions (left). Median branching factor and states (right).

Complexity of the model: Figure 4 (left) shows the median number of generated action templates, and the resulting number of grounded operators, predicates, and functions after the pruning phase. A minimum of 7 and a maximum of 57 action templates were generated based on the situation model, with a mean of 19.85 templates. The templates resulted in models with 71.5
operators on average, 9.4 predicates, and 64.35 functions. We also applied iterative deepening depth-first search to analyse the state space complexity and branching factor of the resulting models. We limited the search depth to 5, as some of the models had state spaces of hundreds of millions of states. Figure 4 (right) shows the maximum and median branching factors as well as the number of discovered states at a search depth of 5. The branching factor tells us how many states are reachable from any given state in the model. A high branching factor indicates that the probability of selecting the actually observed action will be low. The models with a small number of operators generated as few as 327 states, while some models had as many as 10 million states at search level 5 (with the search being incomplete at this level). This was also reflected in the branching factor, where a maximum branching factor of 449 was observed. On average, however, the number of states reachable from any given state was 56. This is still a very high number, but one with which plan recognition would still be feasible, especially in the presence of unambiguous observations.
Model coverage: To evaluate whether a generated model is actually able
to explain human behaviour, we used the CMU kitchen dataset. We analysed
15 video logs from the “brownies” dataset and based on the observed execution
sequences, we manually generated 15 plans. We then tested a model generated
from a text describing the “brownies” dataset. The text was written based on
the behaviour observed in the first video log. We expected that the model will
be able to better explain the plan corresponding to the first log.
To evaluate the model, we first checked whether the model is able to explain
the plans at all (i.e. whether the observed execution sequences are part of the
model). The results showed that the model was able to explain all of the 15
plans. We then calculated the final log likelihood of the model. This is the
likelihood that tells us how well the model fits the provided observation
sequence (in our case, the plan). The final log likelihood is calculated based
on the cumulative probability of the actions observed in the plan, given
a model M. This approach is similar to model learning through observations
[8]. Figure 5 shows the final log likelihood for the model when explaining the
15 plans and its relation to the length of the executed plan. We fitted a linear
model to the results (in blue). It showed that the likelihood of the model (i.e.
how well it fits the given plan) is linearly proportional to the length of the plan.
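The two checks described above — plan executability and cumulative action probability — can be sketched as follows. The uniform action-selection model is an assumption made here for illustration (the actual probabilities come from the inference engine), and `applicable` and `apply_action` are hypothetical stand-ins for PDDL precondition checking and effect application:

```python
import math

def plan_log_likelihood(state, plan, applicable, apply_action):
    """Check that a plan is executable in the model and accumulate the log
    probability of each observed action. Assuming uniform action selection,
    a step with branching factor b contributes log(1/b)."""
    log_lik = 0.0
    for action in plan:
        options = applicable(state)
        if action not in options:
            return None  # plan is not part of the model
        log_lik += math.log(1.0 / len(options))
        state = apply_action(state, action)
    return log_lik

# Toy model: a state is a set of achieved facts; every not-yet-achieved
# fact can be achieved by a corresponding action.
facts = {"a", "b", "c"}
applicable = lambda s: sorted(facts - s)
apply_action = lambda s, act: s | {act}

ll = plan_log_likelihood(frozenset(), ["a", "b", "c"], applicable, apply_action)
print(ll)  # log(1/3) + log(1/2) + log(1/1) = -log(6)
```

Under roughly constant branching, each plan step adds a similar negative contribution, which is consistent with the observed linear relation between the log likelihood and the plan length.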
Figure 5: Negative log likelihood of the model, given a certain plan (likelihood plotted against plan length).
Metrics             PDDLg           PDDLh1            PDDLh2           PDDLh3
operators           421             1854              1461             257
predicates          10              1853              424              48
functions           329             0                 0                41
min/mean/max br.    1/231.19/421    1848/1848/1854    82/117.28/290    5/30.82/55
states (depth 5)    10 000 227      10 000 162        10 000 009       1 785 896

Table 1: Comparison between PDDLg, PDDLh1, PDDLh2, and PDDLh3.
This shows that the model was able to explain all plans in a similar
manner (i.e. it was not overfitted to the first plan).
Similarity to handcrafted models: To investigate how a generated model
compares to a handcrafted model, we asked experts to develop 3 PDDL mod-
els for the “brownies” experiment. We call the generated model PDDLg. We
compared PDDLg to PDDLh1, PDDLh2, and PDDLh3, each of which had
increasing complexity in terms of constraints and domain knowledge. PDDLh3
was overfitted to explain only the sequences in the “brownies” experiment. Table
1 shows the comparison between the handcrafted models and PDDLg. As the
table shows, the more complex the constraints and context knowledge, the more
specific the model becomes and the more the search space complexity decreases.
In terms of operators and predicates, PDDLg performed most similarly to the
overfitted PDDLh3, with 1.6 times more operators than PDDLh3. PDDLg had
a branching factor twice as high as that of PDDLh2, but still 8 times smaller
than that of PDDLh1. This shows that the generated model is comparable to
handcrafted models that do not encode implicit common sense knowledge or
knowledge used by the system designer to reduce the model state space.
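The ratios quoted in this comparison can be checked directly against the values in Table 1:

```python
# Operator counts and mean branching factors, taken from Table 1.
ops_g, ops_h3 = 421, 257
mean_br_g, mean_br_h1, mean_br_h2 = 231.19, 1848.0, 117.28

ops_ratio = round(ops_g / ops_h3, 2)          # PDDLg vs. overfitted PDDLh3
br_vs_h2 = round(mean_br_g / mean_br_h2, 2)   # PDDLg branching vs. PDDLh2
br_vs_h1 = round(mean_br_h1 / mean_br_g, 2)   # PDDLh1 branching vs. PDDLg

print(ops_ratio)  # ~1.6 times more operators than PDDLh3
print(br_vs_h2)   # ~2 times the branching factor of PDDLh2
print(br_vs_h1)   # ~8 times smaller branching factor than PDDLh1
```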
5 Conclusion and Future Work
In this work we presented first empirical results from an approach that generates
PDDL models for behaviour understanding from instructional texts. The results
showed that the approach is able to identify most of the relevant model elements
from textual narratives; in that sense, it performed comparably to a human
annotator. The approach was also able to generate a model that can explain the
actual execution sequences observed in the video logs of the “brownies” dataset
from the CMU kitchen activities. Finally, comparing the generated model with
handcrafted models showed that the model has better parameters and encodes
more context knowledge than a simple handcrafted model, but is unable to
capture the “common sense” knowledge that is encoded in overfitted hand-
crafted models. In the future, we plan to address this problem by introducing an
additional learning phase in which the generated model is further adjusted based
on observations of already executed plans.
References
[1] M. Babeş-Vroman, J. MacGlashan, R. Gao, K. Winner, R. Adjogah,
M. desJardins, M. Littman, and S. Muresan. Learning to interpret natural
language instructions. In Proc. Workshop on Semantic Interpretation
in an Actionable Context, pages 1–6, Stroudsburg, PA, USA, 2012.
[2] C. Baker, R. Saxe, and J. Tenenbaum. Action understanding as inverse
planning. Cognition, 113(3):329–349, 2009.
[3] L. Benotti, T. Lau, and M. Villalba. Interpreting natural language instruc-
tions using language, vision, and behavior. ACM Trans. Interact. Intell.
Syst., 4(3):13:1–13:22, Aug 2014.
[4] S. Branavan, N. Kushman, T. Lei, and R. Barzilay. Learning high-level
planning from text. In Proc. Ann. Meeting of Assoc. for Computational
Linguistics, pages 126–135, Stroudsburg, PA, USA, 2012.
[5] S. Branavan, L. Zettlemoyer, and R. Barzilay. Reading between the lines:
Learning to map high-level instructions to commands. In Proc. Ann. Meet-
ing of Assoc. for Computational Linguistics, pages 1268–1277, Stroudsburg,
PA, USA, 2010.
[6] D. Chen and R. Mooney. Learning to interpret natural language navi-
gation instructions from observations. In Proc. AAAI Conf. on Artificial
Intelligence, pages 859–865, Aug 2011.
[7] F. Diebold, K. Witman, D. Hanseman, L. Lysne, and T. Moore. Elements
of Forecasting. Cengage Learning, second edition, 2000.
[8] D. Goldwasser and D. Roth. Learning from natural instructions. Machine
Learning, 94(2):205–232, 2014.
[9] C. Granger. Investigating Causal Relations by Econometric Models and
Cross-spectral Methods. Econometrica, 37(3):424–438, Aug 1969.
[10] T. Kirste and F. Krüger. CCBM-a tool for activity recognition using com-
putational causal behavior models. Technical Report CS-01-12, Institut für
Informatik, Universität Rostock, Rostock, Germany, May 2012.
[11] T. Kollar, S. Tellex, D. Roy, and N. Roy. Grounding verbs of motion in nat-
ural language commands to robots. In Experimental Robotics, volume 79,
pages 31–47. Springer Berlin Heidelberg, 2014.
[12] F. Krüger, M. Nyolt, K. Yordanova, A. Hein, and T. Kirste. Computational
state space models for activity and intention recognition. a feasibility study.
PLoS ONE, 9(11):e109381, Nov 2014.
[13] X. Li, W. Mao, D. Zeng, and F.-Y. Wang. Automatic construction of
domain theory for attack planning. In IEEE Int. Conf. on Intelligence and
Security Informatics, pages 65–70, May 2010.
[14] A. Lindsay, J. Read, J. Ferreira, T. Hayton, J. Porteous, and P. Gregory.
Framer: Planning models from natural language action descriptions. In
Int. Conf. on Automated Planning and Scheduling, 2017.
[15] G. Miller. Wordnet: A lexical database for english. Commun. ACM,
38(11):39–41, Nov 1995.
[16] T. A. Nguyen, S. Kambhampati, and M. Do. Synthesizing robust plans
under incomplete domain models. In Advances in Neural Information Pro-
cessing Systems 26, pages 2472–2480. Curran Associates, Inc., 2013.
[17] M. Philipose, K. Fishkin, M. Perkowitz, D. Patterson, D. Fox, H. Kautz,
and D. Hahnel. Inferring activities from interactions with objects. IEEE
Pervasive Computing, 3(4):50–57, Oct 2004.
[18] M. Ramirez and H. Geffner. Goal recognition over pomdps: Inferring the
intention of a pomdp agent. In Proc. Int. J. Conf. on Artificial Intelligence,
volume 3 of IJCAI’11, pages 2009–2014, Barcelona, Spain, 2011.
[19] A. Sil and A. Yates. Extracting strips representations of actions and events.
In Recent Advances in Natural Language Processing, pages 1–8, Hissar,
Bulgaria, Sep 2011.
[20] M. Tenorth, D. Nyga, and M. Beetz. Understanding and executing instruc-
tions for everyday manipulation tasks from the world wide web. In IEEE
Int. Conf. on Robotics and Automation, pages 1486–1491, May 2010.
[21] M. Veloso, A. Perez, and J. Carbonell. Nonlinear planning with parallel
resource allocation. In Proc. DARPA Workshop of Innovative Approaches
to Planning, Scheduling and Control, Nov 1990.
[22] A. Vogel and D. Jurafsky. Learning to follow navigational directions. In
Proc. Ann. Meeting of Assoc. for Computational Linguistics, pages 806–
814, Stroudsburg, PA, USA, 2010.
[23] B. Webber, N. Badler, B. Eugenio, C. Geib, L. Levison, and M. Moore.
Instructions, intentions and expectations. Artificial Intelligence, 73(1):253
– 269, 1995.
[24] J. Ye, S. Dobson, and S. McKeever. Review: Situation identification
techniques in pervasive computing: A review. Pervasive Mob. Comput.,
8(1):36–66, Feb 2012.
[25] K. Yordanova. Discovering causal relations in textual instructions. In
Recent Advances in Natural Language Processing, pages 714–720, Hissar,
Bulgaria, Sep 2015.
[26] K. Yordanova. Automatic generation of situation models for plan recog-
nition problems. In Proceedings of the International Conference Recent
Advances in Natural Language Processing, pages 823–830, Varna, Bulgaria,
September 2017. INCOMA Ltd.
[27] K. Yordanova. A simple model for improving the performance of the stan-
ford parser for action detection in textual instructions. In Proceedings of the
International Conference Recent Advances in Natural Language Processing,
pages 831–838, Varna, Bulgaria, September 2017. INCOMA Ltd.
[28] Z. Zhang, P. Webster, V. Uren, A. Varga, and F. Ciravegna. Automati-
cally extracting procedural knowledge from instructional texts using nat-
ural language processing. In Proc. Int. Conf. on Language Resources and
Evaluation, pages 520–527, Istanbul, Turkey, May 2012.
... There are also works such as the one of Benotti et al. [2] which propose the usage of unlabelled instructions and agent reactions gathered in a game-like virtual environment for the purpose of plan generation. Yordanova et al. showed that sensor data can be combined with textual descriptions of task execution sequences and general purpose taxonomies in order to create models of human behaviour [19,20]. Apart from textual instructions, a lot of researchers propose methods to predict behaviour by using domain knowledge representation. ...
... Existing works have shown that it is possible to automatically generate computational models of human behaviour from textual instructions [20,19]. These models, however, are very general and lack certain steps because these steps are also missing from the source data. ...
... Instructional texts are analysed in order to extract the actions, objects, locations and further context relevant for the given domain. We follow the approach proposed in [20]: we first perform part of speech (POS) tagging and dependencies parsing (DP) in order to identify relevant actions, objects, any properties to the objects, and locations where the actions could be executed. We then perform time series analysis on the identified actions to identify candidate causal relations. ...
Chapter
Automated systems for assisting persons to achieve their everyday tasks are gaining popularity, both in the application domains for supporting healthy persons, as well as for assisting people with impairments. The development of such assistive systems is a challenging task associated with a lot of time and effort and often requires the involvement of domain experts. To address this problem, different works have investigated the automated knowledge extraction and model generation for behaviour interpretation and assistance. Existing works, however, usually concentrate on one source of data for the task of automated knowledge generation, which could potentially result in simpler models that are unable to adequately support the person. To address this problem, in this work we present the BehavE methodology, which proposes the extraction of knowledge from different types of sources and its consolidation into a unified semantic model that is used for behaviour interpretation and generation of assistance strategies.
... This could be done in different ways: through grammatical patterns that are used to map the sentence to a machine understandable model of the sentence [6,29]; through machine learning techniques [3,11,15]; or through reinforcement learning approaches that learn language by interacting with an external environment [6,7,11,15,19]. Models learned through model grounding have been used for plan generation [6,23], for learning the optimal sequence of instruction execution [7], for learning navigational directions [19], and for interpreting human instructions for robots to follow them [15]. To our knowledge, model generation has not been used for building semantic models for model-based annotation. ...
... To address this problem, we extend the approach so that it automatically generates the underlying semantic model needed for validating the annotation. The method adapts the idea of learning planning operators from textual instructions proposed in [23]. It differs from existing works for model generation in the source of data from which the model is learned. ...
... We automate this step by first automatically obtaining the implicit causal relations between the actions in the textual instructions. This is done by converting the textual instructions into time series and then performing a time series analysis to discover any causal dependencies between the series as proposed in [20,23]. We start by representing each unique action in a text as a time series. ...
Conference Paper
Ground truth is essential for activity recognition problems. It is used to apply methods of supervised learning, to provide context information for knowledge-based methods, and to quantify the recognition performance. Semantic annotation extends simple symbolic labelling by assigning semantic meaning to the label and enables reasoning about the semantic structure of the observed activity. The development of semantic annotation for activity recognition is a time consuming task, which involves a lot of effort and expertise. To reduce the time needed to develop semantic annotation, we propose an approach that automatically generates semantic models based on manually assigned symbolic labels. We provide a detailed description of the automated process for annotation generation and we discuss how it replaces the manual process. To validate our approach we compare automatically generated semantic annotation for the CMU grand challenge dataset with manual semantic annotation for the same dataset. The results show that automatically generated models are comparable to manually developed models but it takes much less time and no expertise in model development is required
... There are different works that address the problem of learning models from textual instructions. Such models are used for constructing plans of human behaviour [15,4,26,28], for learning an optimal actions' execution sequence based on natural instructions [5,6,22,7,1,3], for constructing machine understandable model from natural language instructions [35,11], and for automatically generating semantic annotation for sensor datasets [27,30]. Model learning from textual instructions has applications in different fields of computer science: constructing plans of terrorist attacks [15], improving tasks execution (such as navigation, computer commands following, games playing) by interpreting natural language instructions [5,6,22,7,1,3], the ability of a robot or machine to interpret instructions given in natural language [35,11], or for behaviour analysis tasks based on sensor observations [26,32]. ...
... Such models are used for constructing plans of human behaviour [15,4,26,28], for learning an optimal actions' execution sequence based on natural instructions [5,6,22,7,1,3], for constructing machine understandable model from natural language instructions [35,11], and for automatically generating semantic annotation for sensor datasets [27,30]. Model learning from textual instructions has applications in different fields of computer science: constructing plans of terrorist attacks [15], improving tasks execution (such as navigation, computer commands following, games playing) by interpreting natural language instructions [5,6,22,7,1,3], the ability of a robot or machine to interpret instructions given in natural language [35,11], or for behaviour analysis tasks based on sensor observations [26,32]. ...
... One general challenge that remains is how to empirically compare the different approaches. Yordanova [26] proposes different measures such as correctness of the identified elements, complexity of the model, model coverage, and similarity to handcrafted models. To compare different approaches with these metrics, however, one needs a common dataset 1 . ...
Preprint
Full-text available
Recent research in behaviour understanding through language grounding has shown it is possible to automatically generate behaviour models from textual instructions. These models usually have goal-oriented structure and are modelled with different formalisms from the planning domain such as the Planning Domain Definition Language. One major problem that still remains is that there are no benchmark datasets for comparing the different model generation approaches, as each approach is usually evaluated on domain-specific application. To allow the objective comparison of different methods for model generation from textual instructions, in this report we introduce a dataset consisting of 83 textual instructions in English language, their refinement in a more structured form as well as manually developed plans for each of the instructions. The dataset is publicly available to the community.
... To address the problem of labelled data, some works propose the combination of video-based object detection and clustering of video descriptions to identify action classes and their relations to objects [12]. Other works propose methods for automatic generation of semantic behaviour models based on textual instructions [21]. These models are then used to recognise the action classes and the objects on which the classes are executed. ...
... see [27,28]), we utilise an approach for automatic model generation based on the label strings produced through the ELAN annotation tool. This approach is based on works proposing learning behaviour models from textual instructions [20,21] and is described in [22]. ...
Conference Paper
With the demographic change towards ageing population, the number of people suffering from neurodegenerative diseases such as dementia increases. As the ratio between young and elderly population changes towards the seniors, it becomes important to develop intelligent technologies for supporting the elderly in their everyday activities. Such intelligent technologies usually rely on training data in order to learn models for recognising problematic behaviour. One problem these systems face is that there are not many datasets containing training data for people with dementia. What is more, many of the existing datasets are not publicly available due to privacy concerns. To address the above problems, in this paper we present a sensor dataset for the kitchen task assessment containing normal and erroneous behaviour due to dementia. The dataset is recorded by actors, who follow instructions describing normal and erroneous behaviour caused by the progression of dementia. Furthermore, we present a semantic annotation scheme which allows reasoning not only about the observed behaviour but also about the causes of the errors
... Specification-based approaches rely on manually incorporating expert knowledge into logic rules that allow reasoning about the situation [26]. On the other hand, learning-based methods can rely on the sensor data to learn the situation [27] or on textual sources to extract the situation-related information and its semantic structure [34,31]. Regardless of the approach applied for collecting the knowledge, it is then encoded in the form of ontology. ...
... In that manner, if the person's behaviour changes over time, the ontology is automatically adapted with the new knowledge obtained from texts, if the changes could be observed in the person's interactions with the environment. These automated approaches show promising results towards reducing the effort associated with manual ontology development [31,34]. There is, however, still the need of some quality control by experts, especially in the case of applications that support people with cognitive impairments, as misleading or incorrect knowledge can have serious effects on the person's wellbeing. ...
Chapter
With the changing demographics toward aging population, also the number of people suffering from dementia increases. To allow the prolonged independent and socially active life of patients with dementia (PwD), some works propose the development of intelligent assistive systems that aim to support the PwD during their everyday activities. With the help of a structured knowledge base such systems are able to reason about the person's behavior, causes of deviations from normal behavior, and the appropriate intervention strategies. The knowledge base is usually modeled in the form of an ontology, allowing its reuse in other applications aiming to support the independent life of PwD. In this article, we describe how assistive systems use ontologies in order to support the PwD, and we present ontologies that contain domain-specific knowledge used by healthcare and monitoring systems for PwD. Furthermore, we discuss how these ontologies can be linked to other ontologies in order to extend functionality and applicability to different problems from the domain of dementia.
... Another interesting direction is to derive possible actions from semi-structured data (e.g. textual descriptions of the domain), as demonstrated by Yordanova for the case of PDDL actions [238]. ...
Thesis
Full-text available
Bayesian filtering (BF) is a general probabilistic framework for estimating the state of a dynamic system that can be observed only indirectly thorough noisy measurements. This thesis focuses on systems that consist of multiple, interacting entites (e.g. agents or objects), for which the system dynamics can be specified naturally by multiset rewriting systems (MRSs). Unfortunately, BF in MRSs is computationally challenging due to the combinatorial explosion in the state space size. Therefore, we investigate efficient BF algorithms for such multi-entity systems. The main insight is that the state space that is underling an MRS exhibits a certain symmetry, which can be exploited to increase inference efficiency. This thesis provides five main contributions. First, we show how distributions over multi- sets can be decomposed into two factors: A distribution over the structures and multiplicities of entities, and a distribution over values of the entities’ properties. This representation al- lows to group together entities with identical structure, thus achieving a substantial reduction in representation complexity. As this representation bears some similarity to other concepts from lifted probabilistic inference, we call it a lifted representation. Secondly, we introduce a BF algorithm that works directly on this lifted representation, which is able to achieve a factorial reduction in space and time complexity, compared to conventional, ground filtering. When observations or system dynamics break symmetry, the algorithm automatically adapts by splitting. When a maximally parallel action execution semantics is used – when all entities can act in parallel – exact BF can become intractable due the large number of parallel actions. To alleviate this problem, our third contribution is a Markov chain Monte Carlo algorithm that samples parallel actions instead of performing full enumeration. 
Fourth, we address the problem that due to symmetry breaks, the algorithm must perform splitting, so that the model can become completely propositional over time and inference becomes intractable. This is done by introducing inverse merging operations for a number of practically relevant special cases. Finally, we empirically evaluate the lifted BF algorithm on real-world human activity recognition domains, and show that the algorithm can be more efficient than propositional BF. To the best of our knowledge, this is the first attempt to provide BF for systems with MRS dynamics and the first attempt that allows to perform prediction and update directly on the lifted representation.
... The authors confirmed that the CSSMs outperformed a training-based model (HMM) when the amount of information from training data is limited. Yordanova [40] focused on structured behavior such as cooking and automatically constructed a knowledge-based behavior model from instruction text that was used as a Planning Domain Definition Language (PDDL). ...
Preprint
Full-text available
This paper presents a robust unsupervised method for recognizing factory work using sensor data from body-worn acceleration sensors. In line-production systems, each factory worker repetitively performs a predefined work process with each process consisting of a sequence of operations. Because of the difficulty in collecting labeled sensor data from each factory worker, unsupervised factory activity recognition has been attracting attention in the ubicomp community. However, prior unsupervised factory activity recognition methods can be adversely affected by any outlier activities performed by the workers. In this study, we propose a robust factory activity recognition method that tracks frequent sensor data motifs, which can correspond to particular actions performed by the workers, that appear in each iteration of the work processes. Specifically, this study proposes tracking two types of motifs: period motifs and action motifs, during the unsupervised recognition process. A period motif is a unique data segment that occurs only once in each work period (one iteration of an overall work process). An action motif is a data segment that occurs several times in each work period, corresponding to an action that is performed several times in each period. Tracking multiple period motifs enables us to roughly capture the temporal structure and duration of the work period even when outlier activities occur. Action motifs, which are spread throughout the work period, permit us to precisely detect the start time of each operation. We evaluated the proposed method using sensor data collected from workers in actual factories and achieved state-of-the-art performance.
... In the future, we plan to test the approach on new datasets from the warehouses domain recorded both in laboratory and real settings. Furthermore, we plan to replace the manual rule definition with an automatic generation from textual sources as proposed in works such as [23]. ...
... That is especially true in medical applications, such as health monitoring. One potential solution to this problem is to automatically generate the CCBM models from textual sources provided by domain experts [57,58]. ...
Article
Full-text available
Wellbeing is often affected by health-related conditions. Among them are nutrition-related health conditions, which can significantly decrease the quality of life. We envision a system that monitors the kitchen activities of patients and that based on the detected eating behaviour could provide clinicians with indicators for improving a patient’s health. To be successful, such system has to reason about the person’s actions and goals. To address this problem, we introduce a symbolic behaviour recognition approach, called Computational Causal Behaviour Models (CCBM). CCBM combines symbolic representation of person’s behaviour with probabilistic inference to reason about one’s actions, the type of meal being prepared, and its potential health impact. To evaluate the approach, we use a cooking dataset of unscripted kitchen activities, which contains data from various sensors in a real kitchen. The results show that the approach is able to reason about the person’s cooking actions. It is also able to recognise the goal in terms of type of prepared meal and whether it is healthy. Furthermore, we compare CCBM to state-of-the-art approaches such as Hidden Markov Models (HMM) and decision trees (DT). The results show that our approach performs comparable to the HMM and DT when used for activity recognition. It outperformed the HMM for goal recognition of the type of meal with median accuracy of 1 compared to median accuracy of 0.12 when applying the HMM. Our approach also outperformed the HMM for recognising whether a meal is healthy with a median accuracy of 1 compared to median accuracy of 0.5 with the HMM.
Conference Paper
Full-text available
Procedural knowledge is the knowledge required to perform certain tasks, and forms an important part of expertise. A major source of procedural knowledge is natural language instructions. While these readable instructions have been useful learning resources for human, they are not interpretable by machines. Automatically acquiring procedural knowledge in machine interpretable formats from instructions has become an increasingly popular research topic due to their potential applications in process automation. However, it has been insufficiently addressed. This paper presents an approach and an implemented system to assist users to automatically acquire procedural knowledge in structured forms from instructions. We introduce a generic semantic representation of procedures for analysing instructions, using which natural language techniques are applied to automatically extract structured procedures from instructions. The method is evaluated in three domains to justify the generality of the proposed semantic representation as well as the effectiveness of the implemented automatic system.
Conference Paper
Full-text available
Different approaches for behaviour understanding rely on textual instructions to generate models of human behaviour. These approaches usually use state of the art parsers to obtain the part of speech (POS) meaning and dependencies of the words in the instructions. For them it is essential that the parser is able to correctly annotate the instructions and especially the verbs as they describe the actions of the person. State of the art parsers usually make errors when annotating textual instructions, as they have short sentence structure often in imperative form. The inability of the parser to identify the verbs results in the inability of behaviour understanding systems to identify the relevant actions. To address this problem, we propose a simple rule-based model that attempts to correct any incorrectly annotated verbs. We argue that the model is able to significantly improve the parser's performance without the need of additional training data. We evaluate our approach by extracting the actions from 61 textual instructions annotated only with the Stanford parser and once again after applying our model. The results show a significant improvement in the recognition rate when applying the rules (75% accuracy compared to 68% without the rules, p-value < 0.001).
Conference Paper
Full-text available
Recent attempts at behaviour understanding through language grounding have shown that it is possible to automatically generate models for planning problems from textual instructions. One drawback of these approaches is that they either do not make use of the semantic structure behind the model elements identified in the text, or they manually incorporate a collection of concepts with semantic relationships between them. We call this collection of knowledge situation model. The situation model introduces additional context information to the model. It could also potentially reduce the complexity of the planning problem compared to models that do not use situation models. To address this problem, we propose an approach that automatically generates the situation model from textual instructions. The approach is able to identify various hierarchical, spatial, directional, and causal relations. We use the situation model to automatically generate planning problems in a PDDL notation and we show that the situation model reduces the complexity of the PDDL model in terms of number of operators and branching factor compared to planning models that do not make use of situation models. We also compare the generated PDDL model to a handcrafted one and show that the generated model performs comparable to simple handcrafted models.
Conference Paper
Full-text available
One aspect of ontology learning methods is the discovery of relations in textual data. One kind of such relations are causal relations. Our aim is to discover causations described in texts such as recipes and manuals. There is a lot of research on causal relations discovery that is based on grammatical patterns. These patterns are, however , rarely discovered in textual instructions (such as recipes) with short and simple sentence structure. Therefore we propose an approach that makes use of time series to discover causal relations. We distinguish causal relations from correlation by assuming that one word causes another only if it precedes the second word temporally. To test the approach, we compared the discovered by our approach causal relations to those obtained through grammatical patterns in 20 textual instructions. The results showed that our approach has an average recall of 41% compared to 13% obtained with the grammatical patterns. Furthermore the discovered by the two approaches causal relations are usually dis-joint. This indicates that the approach can be combined with grammatical patterns in order to increase the number of causal relations discovered in textual instructions.
Article
Background: Computational state space models (CSSMs) enable the knowledge-based construction of Bayesian filters for recognizing intentions and reconstructing activities of human protagonists in application domains such as smart environments, assisted living, or security. Computational, i.e., algorithmic, representations allow the construction of increasingly complex human behaviour models. However, the symbolic models used in CSSMs potentially suffer from combinatorial explosion, rendering inference intractable outside of the limited experimental settings investigated in present research. The objective of this study was to obtain data on the feasibility of CSSM-based inference in domains of realistic complexity. Methods: A typical instrumental activity of daily living was used as a trial scenario. As the primary sensor modality, wearable inertial measurement units were employed. The results achievable by CSSM methods were evaluated by comparison with those obtained from established training-based methods (hidden Markov models, HMMs) using Wilcoxon signed-rank tests. The influence of modelling factors on CSSM performance was analyzed via repeated-measures analysis of variance. Results: The symbolic domain model was found to have more than 10^8 states, exceeding the complexity of models considered in previous research by at least three orders of magnitude. Nevertheless, if factors and procedures governing the inference process were suitably chosen, CSSMs outperformed HMMs. Specifically, inference methods used in previous studies (particle filters) were found to perform substantially worse than a marginal filtering procedure. Conclusions: Our results suggest that the combinatorial explosion caused by rich CSSM models does not inevitably lead to intractable inference or inferior performance. This means that the potential benefits of CSSM models (knowledge-based model construction, model reusability, reduced need for training data) are available without a performance penalty. However, our results also show that research on CSSMs needs to consider sufficiently complex domains in order to understand the effects of design decisions such as the choice of heuristics or inference procedure on performance.
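As a minimal illustration of the kind of exact (marginal) filtering that the study found to outperform particle filters, the following sketch performs one predict-update step of a discrete Bayesian filter over a tiny symbolic state space; the states, transition table, and observation likelihoods are invented for the example.

```python
# One predict-update step of an exact discrete Bayesian filter over a
# toy symbolic state space. All probability tables are invented.

def bayes_filter_step(belief, transition, likelihood):
    """Predict with the transition model, then update with the
    observation likelihood and renormalize."""
    predicted = {s2: sum(belief[s1] * transition[s1].get(s2, 0.0)
                         for s1 in belief)
                 for s2 in transition}
    unnorm = {s: p * likelihood.get(s, 0.0) for s, p in predicted.items()}
    z = sum(unnorm.values())
    return {s: p / z for s, p in unnorm.items()}

belief = {"idle": 0.5, "cooking": 0.5}
transition = {"idle": {"idle": 0.7, "cooking": 0.3},
              "cooking": {"idle": 0.1, "cooking": 0.9}}
likelihood = {"idle": 0.2, "cooking": 0.8}  # sensor favours "cooking"
posterior = bayes_filter_step(belief, transition, likelihood)
print(posterior)
```

With 10^8 states the posterior cannot be stored this naively, which is exactly why the choice of inference procedure matters in practice.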
Article
In this paper, we describe an approach for learning planning domain models directly from natural language (NL) descriptions of activity sequences. The modelling problem has been identified as a bottleneck for the widespread exploitation of various technologies in Artificial Intelligence, including automated planners. There have been great advances in modelling-assistance and model-generation tools, including a wide range of domain model acquisition tools. However, for modelling tools there is the underlying assumption that the user can formulate the problem using some formal language. And even in the case of the domain model acquisition tools, there is still a requirement to specify input plans in an easily machine-readable format. Providing this type of input is impractical for many potential users. This motivates us to generate planning domain models directly from NL descriptions, as this would provide an important step in extending the widespread adoption of planning techniques. We start from NL descriptions of actions and use NL analysis to construct structured representations, from which we construct formal representations of the action sequences. The generated action sequences provide the necessary structured input for inducing a PDDL domain, using domain model acquisition technology. In order to capture a concise planning model, we use an estimate of functional similarity, so that sentences describing similar behaviours are represented by the same planning operator. We validate our approach with a user study, in which participants are tasked with describing the activities occurring in several videos. Our system is then used to learn planning domain models from the participants' NL input. We demonstrate that our approach is effective at learning models on these tasks.
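The functional-similarity merging can be approximated with a simple sketch: action descriptions whose word sets overlap above a threshold are folded into the same planning operator. The Jaccard measure, the 0.5 threshold, and the example descriptions are assumptions, not the paper's actual similarity estimate.

```python
# Sketch of merging similar NL action descriptions into one operator.
# Jaccard similarity and the 0.5 threshold are illustrative choices.

def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

def merge_into_operators(descriptions, threshold=0.5):
    operators = []  # each operator is a list of similar descriptions
    for desc in descriptions:
        for op in operators:
            if jaccard(desc, op[0]) >= threshold:
                op.append(desc)
                break
        else:
            operators.append([desc])
    return operators

descs = [["pick", "up", "cup"], ["pick", "up", "mug"], ["open", "door"]]
ops = merge_into_operators(descs)
print(ops)
```

Here the two "pick up" descriptions collapse into one operator, giving a more concise domain than one operator per sentence.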
Article
We define the problem of automatic instruction interpretation as follows. Given a natural language instruction, can we automatically predict what an instruction follower, such as a robot, should do in the environment to follow that instruction? Previous approaches to automatic instruction interpretation have required either extensive domain-dependent rule writing or extensive manually annotated corpora. This article presents a novel approach that leverages a large amount of unannotated, easy-to-collect data from humans interacting in a game-like environment. Our approach uses an automatic annotation phase based on artificial intelligence planning, for which two different annotation strategies are compared: one based on behavioral information and the other based on visibility information. The resulting annotations are used as training data for different automatic classifiers. The approach is based on the intuition that the problem of interpreting a situated instruction can be cast as a classification problem of choosing among the actions that are possible in the situation. Classification is done by combining language, vision, and behavior information. Our empirical analysis shows that machine learning classifiers achieve 77% accuracy on this task on available English corpora and 74% on similar German corpora. Finally, the inclusion of human feedback in the interpretation process is shown to boost performance to 92% for the English corpus and 90% for the German corpus.
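The classification framing can be illustrated with a toy scorer: each action possible in the current situation is scored against the instruction and the best one is returned. The word-overlap scoring and the action names are invented placeholders for the learned classifier that combines language, vision, and behavior features.

```python
# Toy version of instruction interpretation as classification over
# the actions possible in the situation. Scoring is a placeholder.

def interpret(instruction, possible_actions):
    words = set(instruction.lower().split())
    def score(action):
        return len(words & set(action.replace("_", " ").split()))
    return max(possible_actions, key=score)

actions = ["press_red_button", "open_door", "pick_up_key"]
choice = interpret("Go and press the red button", actions)
print(choice)
```

Restricting the candidates to actions that are possible in the situation is what turns open-ended language understanding into a tractable classification problem.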
Conference Paper
Comprehending action preconditions and effects is an essential step in modeling the dynamics of the world. In this paper, we express the semantics of precondition relations extracted from text in terms of planning operations. The challenge of modeling this connection is to ground language at the level of relations. This type of grounding enables us to create high-level plans based on language abstractions. Our model jointly learns to predict precondition relations from text and to perform high-level planning guided by those relations. We implement this idea in the reinforcement learning framework using feedback automatically obtained from plan execution attempts. When applied to a complex virtual world and text describing that world, our relation extraction technique performs on par with a supervised baseline, yielding an F-measure of 66% compared to the baseline's 65%. Additionally, we show that a high-level planner utilizing these extracted relations significantly outperforms a strong, text-unaware baseline, successfully completing 80% of planning tasks as compared to 69% for the baseline.
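How extracted precondition relations constrain high-level planning can be sketched with a greedy forward search that only applies actions whose preconditions hold in the current state. The relations, effects, and goal below are illustrative, loosely echoing a crafting-style virtual world; they are not the paper's actual domain.

```python
# Sketch: precondition relations extracted from text restrict which
# actions a greedy forward search may apply. Domain is invented.

def plan(start, goal, effects, preconditions):
    state, steps = set(start), []
    while goal not in state:
        applicable = [a for a, pre in preconditions.items()
                      if pre <= state and effects[a] not in state]
        if not applicable:
            return None  # dead end: no action's preconditions hold
        act = applicable[0]
        state.add(effects[act])
        steps.append(act)
    return steps

preconditions = {"craft_pickaxe": {"have_wood"},
                 "mine_stone": {"have_pickaxe"}}
effects = {"craft_pickaxe": "have_pickaxe",
           "mine_stone": "have_stone"}
steps = plan({"have_wood"}, "have_stone", effects, preconditions)
print(steps)
```

Without the extracted precondition "pickaxe before stone", the search would have to discover this ordering by trial and error, which is what the text-unaware baseline must do.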