Learning Models of Human Behaviour from Textual Instructions
Kristina Yordanova and Thomas Kirste1
1University of Rostock
18051 Rostock
Germany
{kristina.yordanova,thomas.kirste}@uni-rostock.de
Keywords: precondition-effect rules, models of human behaviour, natural language processing, activity recognition
Abstract:
There are various activity recognition approaches that rely on the manual definition of precondition-effect rules to
describe human behaviour. These rules are later used to generate computational models of human behaviour that
are able to reason about the user behaviour based on sensor observations. One problem with these approaches is
that manual rule definition is a time-consuming and error-prone process. To address this problem, in this paper
we propose an approach that learns the rules from textual instructions. In contrast to existing approaches, it is
able to learn the causal relations between the actions without an initial training phase. Furthermore, it learns the
domain ontology that is used for model generalisation and specialisation. To evaluate the approach, a model
describing a cooking task was learned and later applied to explaining seven plans of actual human behaviour. It
was then compared to a hand-crafted model describing the same problem. The results showed that the learned
model was able to recognise the plans with a higher overall probability than the hand-crafted model. It
also learned a more complex domain ontology and was more general than the hand-crafted model. In general,
the results showed that it is possible to learn models of human behaviour from textual instructions that are
able to explain actual human behaviour.
1 INTRODUCTION
Assistive systems support daily activities and
allow even people with impairments to continue an
independent life (Hoey et al., 2010). Such systems
have to recognise the user actions and intentions, track
the user interactions with a variety of objects, detect
errors in the user behaviour, and find the best way of
assisting them (Hoey et al., 2010). This can be done
by activity recognition (AR) approaches that utilise
human behaviour models (HBM) in the form of rules.
These rules are used to generate probabilistic models
with which the system can infer the user actions and
goals (Krüger et al., 2014; Hiatt et al., 2011; Ramirez
and Geffner, 2011). Such types of models are also
known as computational state space models (CSSM)
(Krüger et al., 2014). They treat activity recognition
as a plan recognition problem, where given an initial
state, a set of possible actions, and a set of observa-
tions, the executed actions and the user goals have to
be recognised (Ramirez and Geffner, 2011). These ap-
proaches rely on prior knowledge to obtain the context
information needed for building the user actions and
the problem domain. The prior knowledge is provided
in the form of precondition-effect rules by a domain
expert or by the model designer. This knowledge is
then used to manually build a CSSM. The manual
modelling is, however, time-consuming and error-prone
(Nguyen et al., 2013; Krüger et al., 2012).
To address this problem, different works propose
the learning of models from sensor data (Zhuo and
Kambhampati, 2013; Okeyo et al., 2011). One prob-
lem these approaches face is that sensor data is ex-
pensive (Ye et al., 2014). Furthermore, sensors are
sometimes unable to capture fine-grained activities
(Chen et al., 2012), and thus such activities might not
be learned.
To reduce the need for domain experts and / or
sensor data, one can substitute them with textual data
(Philipose et al., 2004). More precisely, one can utilise
the knowledge encoded in textual instructions to learn
the model structure. Textual instructions specify tasks
for achieving a given goal without explicitly stating all
the required steps. On the one hand, this makes them
a challenging source for learning a model (Branavan
et al., 2010). On the other hand, they are usually
written in imperative form, have a simple sentence
structure, and are highly organised. Compared to rich
texts, this makes them a better source for identifying
the sequence of actions needed for reaching the goal
(Zhang et al., 2012).
According to (Branavan et al., 2012), to learn a
model of human behaviour from textual instructions,
the system has to: 1. extract the actions’ semantics
from the text; 2. learn the model semantics through
language grounding; and 3. finally, translate it into a
computational model of human behaviour for planning
problems. To address the problem of learning models
of human behaviour for AR, we extend the steps
proposed by (Branavan et al., 2012). We add the need
of 4. learning the domain ontology that is used to
abstract and / or specialise the model. Furthermore,
we consider computational models for activity
recognition as the targeted model format, as they
represent the problem in the form of a planning
problem and are able to reason about the human
behaviour based on observations (Hiatt et al., 2011;
Ramirez and Geffner, 2011).
In this work we concentrate on the problem of (1)
learning the precondition-effect rules that describe the
human behaviour; (2) learning the domain ontology
that describes the context information and its semantic
structure; and (3) the ability of the learned models to
explain real human behaviour in the form of plans.
2 RELATED WORK
There are various approaches to learning models
of human behaviour from textual instructions: through
grammatical patterns that are used to map the sentence
to a machine understandable model of the sentence
(Zhang et al., 2012; Branavan et al., 2012); through ma-
chine learning techniques (Sil and Yates, 2011; Chen
and Mooney, 2011; Kollar et al., 2014); or through
reinforcement learning approaches that learn language
by interacting with an external environment (Branavan
et al., 2012; Branavan et al., 2010; Kollar et al., 2014).
Models learned through language grounding have
been used for plan generation (Li et al., 2010; Brana-
van et al., 2012), for learning the optimal sequence of
instruction execution (Branavan et al., 2010), for learn-
ing navigational directions (Chen and Mooney, 2011),
and for interpreting human instructions for robots to
follow them (Kollar et al., 2014; Tenorth et al., 2010).
To our knowledge, any attempts to apply language
grounding to learning models for AR rely on iden-
tifying objects from textual data and do not build a
computational model of human behaviour (Perkowitz
et al., 2004; Ye et al., 2014). This, however, suggests
that models learned from text could be used for AR
tasks. AR here is treated as a plan recognition problem,
thus the plan elements have to be learned. Existing
approaches that learn human behaviour from text make
simplifying assumptions about the learning problem,
making them unsuitable for more general AR prob-
lems. More precisely, the preconditions and effects
are learned through explicit causal relations that are
grammatically expressed in the text (Li et al., 2010; Sil
and Yates, 2011). They, however, either rely on initial
manual definition to learn these relations (Branavan
et al., 2012), or on grammatical patterns and rich texts
with complex sentence structure (Li et al., 2010). They
do not address the problem of discovering causal re-
lations between sentences, but assume that all causal
relations are expressed within the sentence (Tenorth
et al., 2010). They also do not identify implicit rela-
tions. However, to find causal relations in instructions
without a training phase, one has to rely on alterna-
tive methods, such as time series analysis (Yordanova,
2015a). Furthermore, they rely on a manually defined
ontology, or do not use one. However, one needs an
ontology to deal with model generalisation problems
and as a means for expressing the semantic relations
between model elements.
Moreover, there have been previously no attempts
at learning CSSMs from textual instructions. Existing
CSSM approaches rely on manual rule definition to
build the preconditions and effects of the models. For
example, (Hiatt et al., 2011) use the cognitive archi-
tecture ACT-R, a sub-symbolic production system. It
allows the manual description of actions in terms of
preconditions and effects, while the state of the world
is modelled as information chunks that can be retrieved
from the memory of the system. Other approaches rely
on a PDDL1-like notation to describe the possible ac-
tions (Ramirez and Geffner, 2011; Krüger et al., 2013).
Then, based on a set of observations, the agent’s ac-
tions and goals are recognised.
In this work we represent the learned CSSM in
a PDDL-like notation and use the learned model to
explain plans that describe actual human behaviour.
3 APPROACH
3.1 Identifying text elements of interest
The text elements that describe the user actions, their
causal relations to other entities, and the environment
have to be identified. This is achieved
through assigning each word in a text the correspond-
ing part of speech (POS) tag. Furthermore, the depen-
dencies between text elements are identified through
a dependency parser.
1Planning Domain Definition Language
To identify the human actions, the verbs from the
POS-tagged text are extracted. We are interested in
present tense verbs, as textual instructions are usually
written in present tense, imperative form.
After identifying the actions, we extract any nouns
that are direct objects to the actions. These will be the
objects in the environment with which the human can
interact. Furthermore, we extract any nouns that are in
conjunction with the identified objects. These are
assumed to depend on the same actions as the objects
with which they are in conjunction.
Moreover, any preposition relations such as in, on,
at, etc. between the objects and other elements in the
text are identified. These provide spatial or directional
information about the action of interest. For example,
in the sentence “Put the apple on the table.”, our action
is put, while the object on which the action is executed
is apple. The action is executed in the location table,
identified through the on preposition.
Finally, we extract “states” from the text. The state
of an object is the adjectival modifier or the nominal
subject of that object. As in textual instructions the
object is often omitted (e.g. “Simmer (the sauce) until
thickened.”), we also investigate the relation between
an action and past tense verbs or adjectives that do not
belong to an adjectival modifier or to a nominal subject,
but that might still describe this relation. The states
give us information about the state of the environment
before and after an action is executed.
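The extraction logic of this step can be sketched as follows. This is a minimal Python sketch: the POS tags and dependency relations are assumed to have already been produced by a parser (the paper uses the Stanford Parser), and are represented here as hand-written tuples for the example sentence.

```python
# Sketch of Section 3.1: identify actions, objects, and locations from
# parser output. The parse of "Put the apple on the table." is given as
# hand-written (token, POS) pairs and dependency triples, standing in
# for real Stanford Parser output.
tokens = [("Put", "VB"), ("the", "DT"), ("apple", "NN"),
          ("on", "IN"), ("the", "DT"), ("table", "NN")]
deps = [("dobj", "Put", "apple"),      # apple is direct object of put
        ("prep_on", "Put", "table")]   # table attached via "on"

def extract_elements(tokens, deps):
    """Return actions (base-form verbs), their objects, and locations."""
    actions = [w.lower() for w, pos in tokens if pos == "VB"]
    objects = [d for rel, h, d in deps if rel == "dobj"]
    # prepositional relations such as prep_on / prep_in carry
    # spatial or directional information about the action
    locations = [(rel.split("_")[1], d) for rel, h, d in deps
                 if rel.startswith("prep_")]
    return actions, objects, locations

actions, objects, locations = extract_elements(tokens, deps)
print(actions, objects, locations)
```

For the example sentence this yields put as the action, apple as its object, and table as the location connected through on, mirroring the walkthrough above.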
3.2 Extracting causal relations from
textual instructions
To identify causal relations between the actions, and
between states and actions, we use an approach pro-
posed in (Yordanova, 2015a). It transforms every word
of interest in the text into a time series and then applies
time series analysis to identify any causal relations be-
tween the series. More precisely, each sentence is
treated as a time stamp in the time series. Then, for
each word of interest, the number of times it occurs
in the sentence is counted and stored as an element
of the time series with the same index as the sentence
index.
Generally, we can generate a time series for each
kind of word, as well as for each tuple of words. Here
we concentrate on those describing or causing change
in a state. That means we generate time series for all
actions and for all states that change an object.
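The conversion of sentences into per-word time series can be sketched as follows; the instruction sentences and the word of interest below are made up for illustration.

```python
# Sketch of Section 3.2: each sentence is one time stamp; the value at
# that time stamp is how often the word of interest occurs in it.
sentences = [
    ["take", "the", "pot"],
    ["fill", "the", "pot", "with", "water"],
    ["put", "the", "pot", "on", "the", "stove"],
]

def to_time_series(sentences, word):
    """Time series for one word: occurrence count per sentence index."""
    return [s.count(word) for s in sentences]

print(to_time_series(sentences, "pot"))
print(to_time_series(sentences, "take"))
```

The same construction applies to tuples of words, by counting co-occurrences per sentence instead of single-word occurrences.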
To discover causal relations based on the gener-
ated time series, we apply the Granger causality test.
It is a statistical test for determining whether one
time series is useful for forecasting another. More
precisely, Granger testing performs a statistical signifi-
cance test of whether one time series “causes” the other
time series at different time lags, using auto-regression
(Granger, 1969). The causality relationship is based
on two principles. The first is that the cause hap-
pens prior to the effect, while the second states that
the cause has unique information about the future
values of its effect. Based on these assumptions,
given two time series x_t and y_t, we can test
whether x_t Granger-causes y_t with a maximum
time lag p. To do that, we estimate the regression
y_t = a_0 + a_1 y_{t-1} + ... + a_p y_{t-p} + b_1 x_{t-1} + ... + b_p x_{t-p}.
An F-test is then used to determine whether the lagged
x terms are significant.
3.3 Building the domain ontology
The domain ontology is divided into argument (object)
and action ontology. The argument ontology describes
the objects, locations, and any other elements in the
environment that are taken as arguments in the actions
and predicates. The action ontology represents the
actions with their arguments and abstraction levels.
To learn the argument ontology, a semantic lexicon
(e.g. WordNet (Miller, 1995)) is used to build the ini-
tial ontology. As the initial ontology does not contain
some types that unify arguments applied to the same
action (see Fig. 1), the ontology has to be extended.
To do that, the prepositions with which actions are
connected to indirect objects are also extracted (e.g.
in, on, etc.). They are then added to the argument
ontology as parents of the arguments they connect. In
that manner the locational properties of the arguments
are described (e.g. water has the property to be in
something). During the learning of the action tem-
plates and their preconditions and effects (see Section
3.4), additional parent types are added to describe ob-
jects used in actions that have the same preconditions.
Furthermore, types that are not present in the initial
ontology, but whose objects are used only in a specific
action, are combined in a common parent type. Fig.
1 shows an example of an argument ontology and the
learning process.
To learn the action ontology, the process for build-
ing an action ontology proposed in (Yordanova and Kirste,
2015) is adapted for learning from textual data. More
precisely, based on the argument ontology, the actions
are abstracted by replacing the concrete arguments
with their corresponding types from an upper abstrac-
tion level. In that manner, the uppermost level will
represent the most abstract form of the action. For ex-
ample, the sentence “Put the apple on the table.” will
yield the concrete action put apple table, and the ab-
stract action put object location. This representation
will later be used as a basis for the precondition-effect
Figure 1: Learning the argument ontology. Step 1 (blue): objects identified through POS-tagging and dependencies; step 2
(black): hierarchy identified through WordNet; step 3 (red): types identified through the relations of objects to prepositions;
step 4 (green): types identified based on similar preconditions; step 5 (yellow): types identified through action abstraction.
rules that describe the actions.
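The abstraction step described above can be sketched as follows; the ontology fragment is hypothetical, in the spirit of Fig. 1, with each concrete argument lifted to a parent type.

```python
# Sketch of the action abstraction in Section 3.3: concrete arguments
# are replaced by their parent types from the argument ontology.
# The ontology fragment below is made up for illustration.
parent = {            # child -> parent in the argument ontology
    "apple": "food", "food": "object",
    "table": "furniture", "furniture": "location",
}

def abstract_action(action, args, levels=99):
    """Lift each argument 'levels' steps up the ontology."""
    lifted = []
    for a in args:
        for _ in range(levels):
            if a not in parent:
                break
            a = parent[a]
        lifted.append(a)
    return " ".join([action] + lifted)

print(abstract_action("put", ["apple", "table"]))     # uppermost level
print(abstract_action("put", ["apple", "table"], 1))  # one level up
```

Lifting all the way up yields the most abstract form of the action (put object location in the paper's example), while smaller lift counts give the intermediate abstraction levels used for specialisation.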
3.4 Generating precondition-effect rules
Having identified the actions, their causal relations,
and the domain ontology, the last step is the generation
of precondition-effect rules that describe the actions
and the way they change the world. The basis for the
rules is the action ontology. Each abstract action from
the ontology is taken and converted to an action tem-
plate that has the form shown in Fig. 2. Basically,
(:action put
  :parameters (?o - object ?to - location)
  :precondition (and
    (not (executed-put ?o ?to)))
  :effect (and
    (executed-put ?o ?to)))
Figure 2: Example of an action template put in the PDDL
notation.
the action name is the first part of the abstract entity
put object location, while the two parameters are the
second and the third part of the entity. Furthermore,
the default predicate (executed-action) is added to both
the precondition and the effect, whereas in the precon-
dition it is negated.
Now the causal relations extracted from the text
are used to extend the actions. The execution of each
action that was identified to cause another action is
added as a precondition to the second action. For
example, to execute the action put, the action take has
to take place. That means that the predicate executed-take
?o has to be added to the precondition of the
action put.
Furthermore, any states that cause the action are
also added in the precondition. For example, imag-
ine the example sentence is extended in the following
manner: “If the apple is ripe, put the apple on the
table.In that case the state ripe causes the action put.
For that reason the predicate (state-ripe) will also be
added to the precondition.
This procedure is repeated for all available actions.
The result is a set of candidate rules that describe a
given behaviour.
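The template generation described above can be sketched as follows. The helper is hypothetical; in the actual approach, the abstract action and its parameters come from the action ontology, and the causing actions come from the Granger analysis of Section 3.2.

```python
# Sketch of Section 3.4: turn an abstract action and its learned causal
# predecessors into a PDDL action template. For simplicity, the shared
# object of a causing action is assumed to be the first parameter.
def make_template(action, params, caused_by=()):
    """Emit a PDDL action; 'caused_by' lists actions that must precede it."""
    plist = " ".join(f"?{chr(97 + i)} - {t}" for i, t in enumerate(params))
    args = " ".join(f"?{chr(97 + i)}" for i in range(len(params)))
    pre = [f"(not (executed-{action} {args}))"]   # default predicate
    # each causing action contributes an executed-* predicate
    pre += [f"(executed-{c} ?a)" for c in caused_by]
    return (f"(:action {action}\n"
            f"  :parameters ({plist})\n"
            f"  :precondition (and {' '.join(pre)})\n"
            f"  :effect (and (executed-{action} {args})))")

print(make_template("put", ["object", "location"], caused_by=["take"]))
```

For the running example this produces a template like Fig. 2, extended with the (executed-take ?a) precondition derived from the take-before-put causal relation.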
As it is possible that some of the rules contradict
each other, a refinement step is added. This is done by
converting the argument ontology to the correspond-
ing PDDL format to represent the type hierarchy of
the problem. Then a concrete problem is manually
provided. It describes the initial state of the world and
the goal state to be reached. Later, the problem as
well as the rules and the type hierarchy are fed to a
general-purpose planner, and any predicates that pre-
vent the goal from being reached are removed from the
preconditions.
4 EXPERIMENTAL SETUP
To evaluate the approach, textual instructions de-
scribing a kitchen experiment were used to generate
precondition-effect rules. The instructions were ob-
tained from the annotation of an activity recognition
experiment where a person prepares a carrot soup,
then serves the meal, has lunch, then cleans the ta-
ble (Krüger et al., 2015; Krüger et al., 2014). The
instructions consisted of 80 sentences, with an aver-
age sentence length of 6.1 words, and an average of
1 action per sentence. The instructions were parsed
with the Stanford Parser to obtain the POS-tags and
the dependencies. They were then used as an input for
identifying the model elements and for generating the
time series. The time series were then tested for station-
arity by using the Augmented Dickey–Fuller (ADF)
t-statistic test. It showed that the series are already
stationary. The generated time series are available at
the University Library of the University of Rostock
(Yordanova, 2015b). Using the Granger causality test,
18 causal relations were discovered. They were then
used as an input for building the precondition-effect
rules. Furthermore, WordNet was used to build the
initial argument ontology. It was later extended by the
proposed process and resulted in the ontology in Fig.
1. The initial state and the goal were manually defined
and a planner (Yordanova et al., 2012) was used to
identify any rules that contradict each other.
The resulting model CSSMl was compared to a
hand-crafted model CSSMm developed for the same
problem. The reason for that was to evaluate the model
complexity in comparison to that of a model built by a
human expert.
Later, the models were used to recognise seven
plans, based on the video log from the cooking dataset.
Landmarks were used as action selection heuristic
(Richter and Westphal, 2010). This allowed the com-
putation of the approximate distance to the goal.
The actions in the plans were represented according to
the action schema learned in each of the models. The
plans were between 67 and 86 steps long. Cohen’s
kappa was calculated to determine the plans’ similar-
ity. The mean kappa was 0.18, which indicates that
the overlap between the plans was low.
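Cohen's kappa corrects the observed agreement between two label sequences for the agreement expected by chance. A standard formulation can be sketched as follows; the two short plans are made up, and treating plans as equally long aligned sequences is a simplification (the actual plans were 67 to 86 steps long).

```python
# Sketch of Cohen's kappa over two aligned action sequences:
# kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
# and p_e the agreement expected from the label frequencies.
from collections import Counter

def cohens_kappa(a, b):
    """Cohen's kappa between two equally long label sequences."""
    assert len(a) == len(b)
    n = len(a)
    po = sum(x == y for x, y in zip(a, b)) / n           # observed
    ca, cb = Counter(a), Counter(b)
    pe = sum(ca[l] * cb[l] for l in set(ca) | set(cb)) / n ** 2  # chance
    return (po - pe) / (1 - pe)

plan1 = ["take", "wash", "cut", "cook"]
plan2 = ["take", "cut", "wash", "cook"]
print(cohens_kappa(plan1, plan2))
```

Low kappa values, as reported above, indicate that the seven plans differ substantially in their action sequences.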
5 RESULTS
Table 1 provides information about the model di-
mensions of both CSSMl and CSSMm. The model de-
signer in CSSMm identified 16 action classes. The
learning method in CSSMl discovered 15 action
classes. That is due to the fact that the action wait
was introduced in CSSMm. As this action was not
present in the textual instructions, it was not discov-
ered in CSSMl. The action wait, however, is causally
unrelated to any of the other actions, so its presence
did not change the causal structure of the model.
Furthermore, CSSMl discovered 18 arguments in
the textual instructions, while the designer modelled
17 in CSSMm. This is due to the fact that in the textual
instructions, after the carrots are cooked, they are trans-
formed into a soup. Thus, soup is also an argument
in the model. On the other hand, the system designer
decided to use the argument carrots also for describ-
ing the soup. This shows that the approach is able to
also learn context information that is discarded in the
manual model.
CSSMl learned fewer operator (action) templates and
predicates than those modelled in CSSMm. On the
other hand, the rules resulted in a smaller number of
ground operators and predicates in CSSMm than in
CSSMl. This indicates that CSSMl is more general
than CSSMm. This can be explained by the fact that
there are fewer restrictions in the form of predicates in
CSSMl than in CSSMm.
parameters              CSSMl       CSSMm
action classes          15          16
operator templates      22          28
predicate templates     17          36
ground operators        189         110
ground predicates       204         160
arguments               18          17
max. arg. per template  2           3
ontology elements       66          41
ontology levels         7           5
states                  46,845,389  10,312
Table 1: Model parameters of CSSMm and CSSMl.
The model designer implemented action templates
with a maximum of three arguments in CSSMm. On
the other hand, in CSSMl templates with a maximum
of two arguments were learned. This indicates that
the learned templates in CSSMl are of lower complex-
ity than those in CSSMm. Furthermore, the model
designer in CSSMm introduced several additional pred-
icates aiming at reducing the model size and increasing
the action probability. This made the preconditions
and effects in CSSMm more complex than those in
CSSMl2.
Figure 3: Median number of possible actions in CSSMl given
the plan (top), and in CSSMm (bottom).
Moreover, the proposed approach was able to learn
an argument ontology with seven levels of abstraction
and with 66 elements in total (see Fig. 1). The model
designer in CSSMm modelled a simpler ontology with
five levels of abstraction and with 41 elements in to-
tal. This indicates that the learning approach is able
to discover more complex semantic structures, given
the same problem domain. Furthermore, an iterative
deepening depth-first search with a maximum depth of
seven was applied to analyse the state space graph.
It discovered over 46 million states in CSSMl and
10,312 states in CSSMm. Due to the size of the models,
the complete state space graphs were not traversed.
The larger number of discovered states in CSSMl once
again shows that the learned model is more
general than the manually developed one. The number
of plans in both models was not computed because the
whole state space was not completely explored.
2For example, the action wash in CSSMl consists of three
rules in the form of predicates in the precondition and one
rule in the effect clause. On the other hand, the same action
in CSSMm consists of seven rules in the precondition and
three rules in the effect.
CSSMl and CSSMm were also applied to seven
plans produced by observing the video log of the
kitchen experiment and by using the action schema
from the corresponding model. Fig. 3 shows the me-
dian branching factor in each plan. There the num-
ber of possible actions from a given state in CSSMl
(Fig. 3 (top)) was relatively high (between 30 and 55
executable actions). This is explained by the model’s
generality. In other words, CSSMl does not have many
restrictions, which results in high behaviour variability.
It can also be seen that the first plan had a slightly
smaller branching factor than the rest of the plans.
This is due to the fact that the instructions describing
the experiment were compiled based on the execution
sequence in the first experiment. This means that the
learned model was overfitted to the first plan. Still, it
was able to successfully interpret the remaining plans.
In comparison, CSSMm had a median branching factor
of 10 (Fig. 3 (bottom)). This is due to the additional
modelling mechanisms applied by the designer to re-
duce the model complexity (Yordanova et al., 2014).
Regardless of the lower branching factor, having
ten choices from a given state reduced the action prob-
ability. This was reflected in the probability of ex-
ecuting an action in the plan given the model. Fig.
4 (top) shows that the probability was very low for
both models. Surprisingly, when nearing the goal state,
the probability in CSSMl increased (Fig. 4 (top left)),
while the probability in CSSMm stayed low until the
goal state was reached (Fig. 4 (top right)). This can
be explained by the shorter goal distance in CSSMl
(Fig. 4 (bottom left)) compared to that in CSSMm (Fig.
4 (bottom right)). The distance to the goal in CSSMl
starts at 11 states to the goal and generally decreases
with each executed action. On the other hand, the goal
distance in CSSMm is 50 at the beginning of the prob-
lem and remains large during most of the execution of
the plan. As the estimated goal distance is used as an
action selection heuristic, the long distance decreased
the action probability in CSSMm.
6 DISCUSSION
In this paper we presented an approach to learning
models of human behaviour from textual instructions.
The results showed that the approach is able to learn
a causally correct model of human behaviour. The
model was able to explain the behaviour of seven ex-
periment participants that executed kitchen tasks.
In contrast to existing approaches, the proposed
method was able to learn a complex domain ontology.
It was then used for generalising the model. This was
reflected in the large state space and branching factor
of the resulting model.
It also applied a new method for causal relation
discovery that was previously not applied to model
learning problems. The method yielded good results
without the need for a learning phase.
The model was compared to a hand-crafted CSSM
model. The results showed that the learned model is
more general than the hand-crafted model. On the
other hand, they also showed that the learned model
has a smaller estimated goal distance. This resulted in
a higher action probability when executing a plan,
compared to the hand-crafted model.
The limitations of the learned model are as
follows. The model was able to learn actions with
simple predicates. This resulted in the model generali-
sation. However, if applied to activity recognition, gen-
eral models tend to decrease the model performance
due to the high branching factor. This can be solved
through reliable action selection heuristics, or through
strategies for reducing the model complexity through
more complex predicates (Yordanova et al., 2014). In
the future, we intend to include mechanisms for utilis-
ing these strategies during the learning process.
The model was unable to learn repeating actions,
such as repetitively eating or drinking. This is due to
the fact that the model learned that the precondition
for executing an action is that it has not yet been
executed3. Then, to repeat the action, the precondition
that the action was not executed is violated. To solve
this problem, a mechanism has to be introduced for
identifying repeating actions in the text.
Another aspect of model learning is how to learn
the initial and goal states. In this work we defined
them manually. In the future we intend to investigate
methods for learning the initial and goal state based
on possible predicate combinations and reinforcement
learning techniques.
Furthermore, in this work we only addressed the
3This is not a problem for action pairs like open and close,
as they negate each other’s effects. However, for actions that
do not have a negating action, the executed predicate cannot
be negated, rendering the action impossible.
Figure 4: Probability of selecting the action in the plan, given CSSMl (top left) and given CSSMm (top right), and estimated
distance to goal, given the action executed in the plan, for CSSMl (bottom left) and for CSSMm (bottom right). To improve the
visibility of overlapping lines, each line was shifted by 1% from the preceding one.
problem of learning the model structure through its
domain ontology and the preconditions and effects de-
scribing the actions. However, if we want to apply the
models to activity recognition tasks, the model needs
to be optimised to increase the probability of selecting
the correct action. This can be done by employing rein-
forcement learning techniques based on observations
similar to those proposed in (Branavan et al., 2012).
In the future, we intend to extend our approach
to learning models for AR problems based on sensor
observations. To achieve that, methods for optimising
the model structure will be investigated.
REFERENCES
Branavan, S. R. K., Kushman, N., Lei, T., and Barzilay, R.
(2012). Learning high-level planning from text. In Pro-
ceedings of the 50th Annual Meeting of the Association
for Computational Linguistics: Long Papers - Volume
1, ACL ’12, pages 126–135, Stroudsburg, PA, USA.
Association for Computational Linguistics.
Branavan, S. R. K., Zettlemoyer, L. S., and Barzilay, R.
(2010). Reading between the lines: Learning to map
high-level instructions to commands. In Proceedings
of the 48th Annual Meeting of the Association for Com-
putational Linguistics, ACL ’10, pages 1268–1277,
Stroudsburg, PA, USA. Association for Computational
Linguistics.
Chen, D. L. and Mooney, R. J. (2011). Learning to interpret
natural language navigation instructions from observa-
tions. In Proceedings of the 25th AAAI Conference on
Artificial Intelligence (AAAI-2011), pages 859–865.
Chen, L., Hoey, J., Nugent, C., Cook, D., and Yu, Z. (2012).
Sensor-based activity recognition. IEEE Transactions
on Systems, Man, and Cybernetics, Part C: Applica-
tions and Reviews, 42(6):790–808.
Granger, C. W. J. (1969). Investigating Causal Relations
by Econometric Models and Cross-spectral Methods.
Econometrica, 37(3):424–438.
Hiatt, L. M., Harrison, A. M., and Trafton, J. G. (2011). Ac-
commodating human variability in human-robot teams
through theory of mind. In Proceedings of the Twenty-
Second International Joint Conference on Artificial
Intelligence, IJCAI’11, pages 2066–2071, Barcelona,
Spain. AAAI Press.
Hoey, J., Poupart, P., Bertoldi, A. v., Craig, T., Boutilier, C.,
and Mihailidis, A. (2010). Automated handwashing
assistance for persons with dementia using video and
a partially observable markov decision process. Com-
puter Vision and Image Understanding, 114(5):503–
519.
Kollar, T., Tellex, S., Roy, D., and Roy, N. (2014). Ground-
ing verbs of motion in natural language commands to
robots. In Khatib, O., Kumar, V., and Sukhatme, G.,
editors, Experimental Robotics, volume 79 of Springer
Tracts in Advanced Robotics, pages 31–47. Springer
Berlin Heidelberg.
Krüger, F., Yordanova, K., Hein, A., and Kirste, T. (2013). Plan synthesis for probabilistic activity recognition. In Filipe, J. and Fred, A. L. N., editors, Proceedings of the 5th International Conference on Agents and Artificial Intelligence (ICAART 2013), pages 283–288, Barcelona, Spain. SciTePress.
Krüger, F., Yordanova, K., Köppen, V., and Kirste, T. (2012). Towards tool support for computational causal behavior models for activity recognition. In Proceedings of the 1st Workshop: "Situation-Aware Assistant Systems Engineering: Requirements, Methods, and Challenges" (SeASE 2012) held at Informatik 2012, pages 561–572, Braunschweig, Germany.
Krüger, F., Hein, A., Yordanova, K., and Kirste, T. (2015). Recognising the actions during cooking task (cooking task dataset). University Library, University of Rostock. http://purl.uni-rostock.de/rosdok/id00000116.
Krüger, F., Nyolt, M., Yordanova, K., Hein, A., and Kirste, T. (2014). Computational state space models for activity and intention recognition. A feasibility study. PLoS ONE, 9(11):e109381.
Li, X., Mao, W., Zeng, D., and Wang, F.-Y. (2010). Auto-
matic construction of domain theory for attack plan-
ning. In IEEE International Conference on Intelligence
and Security Informatics (ISI), 2010, pages 65–70.
Miller, G. A. (1995). WordNet: A lexical database for English. Commun. ACM, 38(11):39–41.
Nguyen, T. A., Kambhampati, S., and Do, M. (2013). Synthe-
sizing robust plans under incomplete domain models.
In Burges, C., Bottou, L., Welling, M., Ghahramani,
Z., and Weinberger, K., editors, Advances in Neural
Information Processing Systems 26, pages 2472–2480.
Curran Associates, Inc.
Okeyo, G., Chen, L., Wang, H., and Sterritt, R. (2011).
Ontology-based learning framework for activity assis-
tance in an adaptive smart home. In Chen, L., Nugent,
C. D., Biswas, J., and Hoey, J., editors, Activity Recog-
nition in Pervasive Intelligent Environments, volume 4
of Atlantis Ambient and Pervasive Intelligence, pages
237–263. Atlantis Press.
Perkowitz, M., Philipose, M., Fishkin, K., and Patterson,
D. J. (2004). Mining models of human activities from
the web. In Proceedings of the 13th International
Conference on World Wide Web, WWW ’04, pages
573–582, New York, NY, USA. ACM.
Philipose, M., Fishkin, K. P., Perkowitz, M., Patterson, D. J.,
Fox, D., Kautz, H., and Hahnel, D. (2004). Inferring ac-
tivities from interactions with objects. IEEE Pervasive
Computing, 3(4):50–57.
Ramirez, M. and Geffner, H. (2011). Goal recognition over
pomdps: Inferring the intention of a pomdp agent. In
Proceedings of the Twenty-Second International Joint
Conference on Artificial Intelligence, volume 3 of IJ-
CAI’11, pages 2009–2014, Barcelona, Spain. AAAI
Press.
Richter, S. and Westphal, M. (2010). The LAMA planner: Guiding cost-based anytime planning with landmarks. Journal of Artificial Intelligence Research, 39(1):127–177.
Sil, A. and Yates, A. (2011). Extracting STRIPS representations of actions and events. In Recent Advances in Natural Language Processing, pages 1–8.
Tenorth, M., Nyga, D., and Beetz, M. (2010). Understanding
and executing instructions for everyday manipulation
tasks from the world wide web. In IEEE International
Conference on Robotics and Automation (ICRA), pages
1486–1491.
Ye, J., Stevenson, G., and Dobson, S. (2014). USMART: An unsupervised semantic mining activity recognition technique. ACM Trans. Interact. Intell. Syst., 4(4):16:1–16:27.
Yordanova, K. (2015a). Discovering causal relations in tex-
tual instructions. In Recent Advances in Natural Lan-
guage Processing, pages 714–720, Hissar, Bulgaria.
Yordanova, K. (2015b). Time series from textual instruc-
tions for causal relations discovery (causal relations
dataset). University Library, University of Rostock.
http://purl.uni-rostock.de/rosdok/id00000117.
Yordanova, K. and Kirste, T. (2015). A process for sys-
tematic development of symbolic models for activity
recognition. ACM Transactions on Interactive Intelli-
gent Systems, 5(4).
Yordanova, K., Krüger, F., and Kirste, T. (2012). Tool support for activity recognition with computational causal behaviour models. In Proceedings of the 35th German Conference on Artificial Intelligence, pages 561–573, Saarbrücken, Germany.
Yordanova, K., Nyolt, M., and Kirste, T. (2014). Strategies
for reducing the complexity of symbolic models for
activity recognition. In Agre, G., Hitzler, P., Krisnadhi,
A., and Kuznetsov, S., editors, Artificial Intelligence:
Methodology, Systems, and Applications, volume 8722
of Lecture Notes in Computer Science, pages 295–300.
Springer International Publishing.
Zhang, Z., Webster, P., Uren, V., Varga, A., and Ciravegna,
F. (2012). Automatically extracting procedural knowl-
edge from instructional texts using natural language
processing. In Calzolari, N., Choukri, K., Declerck,
T., Doğan, M. U., Maegaard, B., Mariani, J., Moreno,
A., Odijk, J., and Piperidis, S., editors, Proceedings of
the International Conference on Language Resources
and Evaluation (LREC’12), Istanbul, Turkey. European
Language Resources Association.
Zhuo, H. H. and Kambhampati, S. (2013). Action-model
acquisition from noisy plan traces. In Proceedings of
the 23rd International Joint Conference on Artificial
Intelligence (IJCAI), pages 2444–2450, Beijing, China.
AAAI.