Providing Semantic Annotation for the CMU
Grand Challenge Dataset
Kristina Yordanova, Frank Krüger, Thomas Kirste
March 22, 2018
Abstract
Providing ground truth is essential for activity recognition for three reasons: to
apply methods of supervised learning, to provide context information for knowledge-
based methods, and to quantify the recognition performance. Semantic annotation
extends simple symbolic labelling by assigning semantic meaning to the label, en-
abling further reasoning. In this paper we present a novel approach to semantic
annotation by means of plan operators. We provide a step-by-step description of the workflow for manually creating the ground truth annotation. To validate our approach, we create a semantic annotation of the CMU grand challenge dataset, which is often cited but, due to missing and incomplete annotation, almost never used. We evaluate the quality of the annotation by calculating the interrater reliability between two annotators who labelled the dataset. The results show an almost perfect overlap (Cohen's κ of 0.8) between the annotators. The produced annotation is publicly available to enable further usage of the CMU grand challenge dataset.
1 Introduction
The annotation of sensor datasets describing human behaviour is an important part of
the activity and plan recognition process. It provides a target label for each observa-
tion in the cases where supervised learning is applied. It also serves as a ground truth
for evaluating the performance of the activity or plan estimation procedure by comparing the values estimated by the model with the annotated values. Finally, it provides the context information needed for developing knowledge-based activity recognition systems. In this paper we present a model-based approach to semantic annotation of human behaviour based on the annotation process proposed in [7]. There, the labels assigned to the data provide an underlying semantic structure that contains information about the actions, goals, and plans being executed. This semantic structure is represented in the form of a model of the behaviour's state in terms of a collection of state variables. Actions are then defined as effects that change the state of the model. This form of annotation provides structured knowledge of the concepts in the data being annotated and enables reasoning about the underlying behaviour changes, their causal
relations, and contextual dependencies. Such annotation is important for evaluating
plan recognition approaches that aim not only to recognise the goal of the plan, but
also the subgoals and actions being executed. Furthermore, the model-based semantic
annotation is important for evaluating the performance of any approach that aims at
recognising the underlying actions’ context. Finally, the annotation will be beneficial
for approaches that strive to learn models of human behaviour.
The contribution of this paper is threefold: first, we introduce a novel approach to
semantic annotation by means of precondition and effect rules; second, we describe
a step-by-step workflow to create such an annotation; and finally, we provide a semantic
annotation for three types of recipes from the CMU grand challenge dataset.
The paper is structured as follows. In Section 2, we discuss the types of annotation
available in the literature and outline how our approach differs from them. Sec-
tion 3 describes the proposed approach. In Section 4, we discuss how to improve
the quality of the annotation by training the annotators, while Section 5 illustrates
the approach by re-annotating the Carnegie Mellon University Multi-Modal Activity
Database (CMU-MMAC). The new annotation will also be made publicly available at
the authors’ website. In Section 6, we evaluate our approach by calculating the inter-
rater reliability between different annotators. Finally, the paper concludes with a short
discussion of the approach.
2 Annotation of Human Behaviour
In the context of human behaviour recognition we distinguish between three different
types of annotation. The first is the annotation of activities where a textual description
(or label) is assigned to the executed action [22, 11, 14, 13]. More formally, the ob-
jective is to manually assign a label l_i to each time step of a time series. This is often
done by analysing a separately recorded video log of the executed activities. These
labels are usually called ground truth, as they provide a symbolic representation of the
true sequence of activities. However, for the finite set L = {l_1, ..., l_n} of labels there is usually no further information besides the equality relation. Annotations such as take-baking pan provide a textual description of the executed task but do not contain an underlying semantic structure. There is usually no formal set of con-
straints that restrict the structure of the label sequences. Typically, nothing prevents
an annotator from producing sequences like “put fork to drawer” → “close drawer” →
“take knife from drawer”. This is also the most common type of annotation of human
behaviour, partially because even the assignment of non-semantic labels to the data is
a difficult, time consuming, and error prone task [22].
The second type of annotation is the plan annotation. It can be divided into goal
labelling and plan labelling [6]. The goal labelling is the annotation of each plan with a
label of the goal that is achieved [1, 5]. In contrast, plan labelling provides annotation
not only of the goal, but also of the actions constituting the plan, and of any subgoals
occurring in the plan [3]. The latter is, however, a time consuming and error prone
process [6] which explains why the only attempts of such plan annotation are done
when executing tasks on a computer (e.g. executing plans in an email program [3]).
This is also reflected in activity and plan recognition approaches such as [19, 15] that
use only synthesised observations, and thus synthesised annotation, to recognise the
human actions and goals.
The third type of annotation is the semantic annotation [18]. The term comes from the field of the semantic web, where it is described as the process, and the resulting annotation or metadata, of aligning a resource or a part of it with a description of some of its properties and characteristics with respect to a formal conceptual model or ontology [2]. The concept was later adopted in the field of human behaviour annotation, where it describes annotating human behaviour with labels that have an underlying semantic structure represented in the form of concepts, properties, and relations between these concepts [20, 9]. We call this type of semantic structure an algebraic representation in accordance with the definition provided in [12]. There, an
algebraic representation is one where the state of the system is modelled in terms of
combinations of operations required to achieve that state.
In contrast to the algebraic representation, there exists a model-based representation, which provides a model of the system's state in terms of a collection of state vari-
ables. Then, the individual operations are defined in terms of their effects on the state
of the model [12]. To our knowledge, there have been no attempts to represent the
semantic structure of human behaviour annotation in the form of model-based repre-
sentation. In the next sections we present an approach to semantic annotation of human
behaviour where the underlying semantic structure uses a model-based representation.
This representation allows us to provide not only a semantic meaning to the labels, but
also to produce plan labels and to reason about the plan’s causal correctness. Further-
more, it gives the state of the world corresponding to each label and allows us to track
how it changes during the plan execution.
3 A Novel Approach to Annotating Human Behaviour
In this section, we present a model-based semantic annotation approach that strives
to overcome the drawbacks of the approaches outlined in the previous section. Our
approach combines the characteristics of the state of the art approaches and in addition
relies on model-based instead of algebraic knowledge representation. The targeted
group of activity datasets that will potentially benefit from this approach are those describing goal-oriented behaviour. Typical activity recognition experiments such as the CMU-MMAC [10] can be regarded as goal-oriented. In them, the participants are instructed to fulfil a task such as food preparation. To ensure comparability of different repetitions, an identical experimental setup is chosen for each trial. As a result,
the action sequence executed by the participants can be regarded as a plan, leading from
the same initial state (as chosen by the experimenter) to a set of goal states (given in
the experiment instruction). In the domain of automated planning and scheduling, plan
sequences are generated from domain models, where actions are defined by means of
preconditions and effects. A plan is then a sequence of actions generated by grounding
the action schemas of the domain leading from an initial state to the goal state. In
contrast, in our semantic annotation approach, we manually create plans that reflect
the participants’ actions, and define a planning domain, which describes the causal
connections of the actions to the state of the world. Below we describe the proposed
annotation process, including the definition of the label set L, the label semantics, the
manual annotation procedure, and the validation procedure. We illustrate the process
with examples from the kitchen domain.
Step one: Action and entity dictionary definition In the first step a dictionary of
actions and entities is created. The actions have a name representing the action class,
and a description of the action class that distinguishes it from the remaining classes.
The dictionary also contains the set of all entities observed during the experiment. The
dictionary is manually created by domain experts by analysing the video log, which
is typically recorded during the experiment. The results of the dictionary definition
are the set of action classes and the set of entities manipulated during action execution
(see Table 1). To allow annotators to distinguish between different actions, each action
Table 1: Result of step 1: A dictionary of actions and entities.

actions              entities
a_1  take            e_1  knife
a_2  put             e_2  drawer
a_3  walk            e_3  counter
...                  ...
a_n  stir            e_m  pepper
name is accompanied by its definition. If we look at action a_1 (take), its definition is to grab an object. During the execution of take, the location of the object changes from its initial location to the hand of the person. The action consists of moving the arm to the object, grabbing the object, and finally moving the arm back to the body.
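As an illustration, the result of this step can be kept in a simple machine-readable form. The following minimal Python sketch is only an assumed encoding; the names, definitions, and type assignments are examples for illustration, not the actual dictionary created for the dataset.

# A minimal, hypothetical machine-readable form of the step-one dictionary:
# action classes with their textual definitions, and entities with assumed
# types (the types are only needed later, in step two).
action_classes = {
    "take": "grab an object; its location changes from the initial location to the hand",
    "put":  "place a held object at a target location",
    "walk": "move from one location to another",
    "stir": "stir the content of a container with a tool",
}

entities = {
    "knife": "takeable",
    "pepper": "takeable",
    "drawer": "location",
    "counter": "location",
}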
Step two: Definition of action relations In the second step, the action relations have
to be defined. For each action, the number and role of involved objects is defined. In
case of take, for example, an object and a location, where the object is taken from, are
defined. In addition, for each object, possible roles have to be identified. A pot, for
example, can be taken, filled, washed, and stirred. The result of this step is the finite set of labels L = {l_1 = ã_1^1, l_2 = ã_1^2, ..., l_k = ã_n^m}, where ã defines the syntax of the action relation a to be used for the annotation process (see Table 2).
Table 2: Result of step 2: The table lists the type signature and each possible instantiation for the set of actions identified in the previous step.

a_1: take (what: takeable, from: location)
     a_1^1  take(knife, drawer)
     a_1^2  take(knife, board)
     ...
a_2: put (what: takeable, to: location)
     a_2^1  put(knife, drawer)
     a_2^2  put(knife, board)
     ...
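To make the construction of the label set concrete, the following Python sketch enumerates all grounded action relations from a type signature and the typed entities of step one. It is a simplified illustration under assumed names and types, not the tooling used for the dataset.

from itertools import product

# Hypothetical type signatures from step two: action name -> parameter types.
signatures = {
    "take": ("takeable", "location"),   # take(what, from)
    "put":  ("takeable", "location"),   # put(what, to)
}

# Typed entities from step one (assumed assignments, see the sketch above).
entities = {
    "knife": "takeable", "pepper": "takeable",
    "drawer": "location", "board": "location",
}

def label_set(signatures, entities):
    """Enumerate the finite label set L by grounding each action signature."""
    labels = []
    for action, types in signatures.items():
        # all entities that match each parameter type, in signature order
        candidates = [[e for e, t in entities.items() if t == typ] for typ in types]
        for args in product(*candidates):
            labels.append(f"{action}({', '.join(args)})")
    return labels

print(label_set(signatures, entities))
# e.g. ['take(knife, drawer)', 'take(knife, board)', ..., 'put(pepper, board)']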
Step three: Definition of state properties As described above, we use a model-
based approach (according to [12]) to semantic annotation. We, therefore, have to
define the state space by means of state properties. In the third step, a set of state properties is defined, where each property is a function from a tuple of entities to an entity of the domain. The state space is then defined by all combinations of possible mappings of the entity tuples. Finally, the subset of mappings that holds in the initial state (the start of the experiment) has to be marked (see Table 3).
Table 3: Result of step 3: A list of functions with type signatures and their instantiations. A * in the last column means that the instantiation holds in the initial state.

f_1: is-at (what: takeable) → location
     f_1^1  is-at(knife) ↦ drawer    *
     f_1^2  is-at(knife) ↦ board
     ...
f_2: objects-taken () → number
     f_2^1  objects-taken() ↦ 0      *
     f_2^2  objects-taken() ↦ 1
     ...
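The state space induced by such functions can be pictured as a mapping from grounded function terms to values. The following Python sketch shows one possible encoding of the initial state corresponding to the starred entries of Table 3; the additional properties are assumptions made only for this example.

# The model-based state: every grounded function term is mapped to a value.
# The entries marked with * in Table 3 form the initial state; the remaining
# values shown here are assumptions for illustration.
initial_state = {
    ("is-at", "knife"): "drawer",   # is-at(knife) -> drawer  (*)
    ("objects-taken",): 0,          # objects-taken() -> 0    (*)
    ("is-clean", "knife"): True,    # assumed additional property
    ("is-clean", "hands"): True,    # assumed additional property
}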
Step four: Definition of preconditions and effects The objective of the fourth step is to define the semantics of the actions. Using the type signatures defined in the previous steps, action schemes are defined in terms of preconditions and effects. As explained above, we regard the participants' action sequences as plans. Here we describe them by use of the Planning Domain Definition Language (PDDL), known from the domain of automated planning and scheduling. The preconditions and effects for the single action schemes are formulated by domain experts. A take action, for example, requires an object that can be taken, the maximal number of taken objects not to be exceeded, and, in case the location is a container that can be opened and closed, the container to be open. The effects of the take action are that the location of the object changes from the original location to the hand and, if the object to be taken is dirty, the hands become dirty too (see Figure 1).
(:action take
 :parameters (?what - takeable ?from - loc)
 :precondition (and
   (= (is-at ?what) ?from)
   (not (= ?from hands)))
 :effect (and
   (assign (is-at ?what) hands)
   (when (not (is-clean ?what)) (not (is-clean hands)))))
Figure 1: Extract of the action scheme for the take action, encoding preconditions and effects in PDDL.
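To illustrate the model-based semantics of such an action scheme, the following Python sketch checks the precondition of the take action from Figure 1 and applies its effect to a state in the assumed encoding used above. It is a hand-written illustration, not the validator used in step six, and the concrete state values are examples.

# Simplified state using the encoding sketched above (values are assumptions).
initial_state = {
    ("is-at", "knife"): "drawer",
    ("is-clean", "knife"): False,
    ("is-clean", "hands"): True,
}

def take(state, what, loc):
    """Apply the 'take' action scheme of Figure 1 to a state dictionary."""
    # Precondition: the object is at the given location and not already in the hands.
    if state[("is-at", what)] != loc or loc == "hands":
        raise ValueError(f"precondition of take({what}, {loc}) violated")
    new_state = dict(state)
    # Effect: the object moves to the hands ...
    new_state[("is-at", what)] = "hands"
    # ... and a dirty object makes the hands dirty as well.
    if not new_state.get(("is-clean", what), True):
        new_state[("is-clean", "hands")] = False
    return new_state

state = take(initial_state, "knife", "drawer")
assert state[("is-at", "knife")] == "hands" and not state[("is-clean", "hands")]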
Step five: Manual annotation Once the dictionary of labels is defined, the manual
annotation can be performed. We use the ELAN annotation tool [23] for this step. Here
an annotator has to assign labels from the defined label set to the video sequence. The
ELAN annotation tool allows the annotator to synchronise several video files and to show them in parallel.
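Since the label sequence has to be validated as a plan in the next step, it is convenient to convert the exported annotation into a plan file. The following Python sketch assumes a simple tab-separated export with begin time, end time, and label per line; the actual ELAN export format, tier configuration, and file names may differ, so this is only an illustrative conversion.

import csv

def export_to_plan(annotation_tsv, plan_file):
    """Convert a tab-separated annotation export (begin, end, label) into a
    plan file with one grounded action per line, e.g. '(take knife drawer)'."""
    with open(annotation_tsv, newline="") as src, open(plan_file, "w") as dst:
        rows = sorted(csv.reader(src, delimiter="\t"), key=lambda r: float(r[0]))
        for begin, end, label in rows:
            # 'take(knife, drawer)' -> '(take knife drawer)'
            name, _, args = label.partition("(")
            args = args.rstrip(")").replace(",", " ").split()
            dst.write("(" + " ".join([name, *args]) + ")\n")

# export_to_plan("subject09_brownie.tsv", "subject09_brownie.plan")  # assumed names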
Step six: Plan validation Since the label sequence produced in the previous step
consists of plan operators, the complete sequence can be interpreted as a plan, lead-
ing from an initial to a goal state. The objective of the sixth step is to check the causal validity of the label sequence with respect to the planning domain created in the previous steps. A plan validator (such as VAL [16]) can be used for this task. If the label sequence does not fulfil the causal constraints of the planning domain, two possible
reasons exist: Either the planning domain does not correctly reproduce the constraints
of the experimental setting or the label sequence is incorrect. In case of an incorrect
label sequence, step five (manual annotation) has to be repeated to correct the detected
problems. In case of an incorrect domain, either the preconditions defined in step four
have to be relaxed or the effects have to be revised.
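The validation itself can be automated by calling the plan validator from a script. The sketch below assumes that the VAL executable is available on the PATH under the name validate, that it signals a failed validation via a non-zero exit code, and that domain, problem, and plan files exist under the hypothetical names shown; the exact binary name and behaviour depend on the VAL installation.

import subprocess

def validate_plan(domain, problem, plan):
    """Run the VAL plan validator (assumed to be installed as 'validate') and
    report whether the annotated label sequence is a causally valid plan."""
    result = subprocess.run(
        ["validate", domain, problem, plan],
        capture_output=True, text=True,
    )
    ok = result.returncode == 0  # assumption: non-zero exit code on failure
    print("valid plan" if ok else "invalid plan:\n" + result.stdout + result.stderr)
    return ok

# validate_plan("kitchen-domain.pddl", "subject09_brownie.pddl",
#               "subject09_brownie.plan")  # hypothetical file names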
The proposed process has three results: 1) the label sequence, 2) the semantic
structure of the labels, and 3) a planning domain, describing the causal relations of the
labels.
4 Improving the quality of annotation
It is often the case that two annotators provided with the same codebook produce annotations with a low overlap [4]. This can be explained by the high variability of behaviour and by the different interpretations of human behaviour^1. To reduce the effect of such discrepancies between annotators, the literature suggests training the annotators, which leads to an increase in the interrater reliability [4]. We adopt this approach and conduct a training phase with the annotators. It involves the following steps: 1. the domain expert meets with the annotators and discusses the elements of the dictionary and their presence in an example video log; 2. the annotators separately annotate the same video log; 3. the annotators compare the two annotations, discuss the differences, and decide on a new consolidated annotation of the video log; 4. the annotators repeat steps 2 and 3 for the next video log. In the meta-analysis conducted in [4], only about 13% of the surveyed studies reported the size of the training involved. It was, however, concluded that high intensity training produces significantly better results than low intensity training or no training. For that reason, we performed training until the annotators felt comfortable annotating without external help (28% of the data in the concrete annotation scenario).

^1 For example, in the action "take object", one can interpret the beginning of the action as the point at which the protagonist starts reaching for the object, or the point at which the hand is already holding the object. This deviation in interpretation reduces the overlap between the labels produced by different annotators.
We applied the proposed annotation process together with the training phase on the
CMU Multi-Modal Activity Database [11]. In the next section we outline the CMU-
MMAC and the annotation we created by applying our approach to this dataset.
5 The CMU-MMAC
The Carnegie Mellon University Multi-Modal Activity Database (CMU-MMAC) provides a dataset of kitchen activities [11]. Several subjects were recorded by multiple sensors (including cameras, accelerometers, and RFIDs) while performing food preparation tasks. A literature review revealed that only a few researchers have ever used this dataset. In [8], the activities of twelve subjects were directly reconstructed from the video by means of computer vision. In [21], the cameras and the IMU data were used for the temporal classification of seven subjects. We believe there are two reasons why this publicly available dataset is not used more widely in the literature. The first is that activity recognition in the kitchen domain is a very challenging task; the second is that the provided annotation is neither complete nor detailed enough to efficiently train classifiers. In the following section, we briefly describe our annotation for the CMU-MMAC.
5.1 Overview of the CMU-MMAC
The CMU-MMAC consists of five sub-datasets (Brownie, Sandwich, Eggs, Salad, Pizza). Each of them contains recorded sensor data from one food preparation task. The dataset contains data from 55 subjects, where each of them participates in several sub-experiments. While executing the assigned task, the subjects were recorded with
five cameras and multiple sensors. While the cameras can be used for computer vision
based activity recognition [8], the resulting video log is also the base for the dataset
annotation. An annotated label sequence for 16 subjects can be downloaded from the
CMU-MMAC website^2. Albeit following a grammatical structure of verbs and objects, the label sequence still lacks semantics which, if present, would allow the derivation of context information such as object locations and relations between actions and entities. In the following section, we discuss the annotation of three of the five datasets (Brownie, Sandwich, and Eggs)^3. Later, we provide a detailed evaluation of
the produced annotation.
6 Evaluation
6.1 Experimental Setup
In order to evaluate the proposed annotation process and the quality of the resulting an-
notation, we conducted the following experiments: 1. Two domain experts reviewed a
subset from the video logs for the Brownie, Eggs, and Sandwich datasets and identified
the action classes, entities, action relations, state properties, and precondition-effect
rules. 2. Two annotators (Annotator A and Annotator B) independently annotated the
three datasets (Brownie, Eggs, and Sandwich). 3. The same two annotators discussed
the differences in the annotation after each annotated video log for the first n videos of each dataset and prepared a consolidated annotation for 28% of the sequences in the datasets^4.
^2 http://www.cs.cmu.edu/~espriggs/cmu-mmac/annotations/
^3 The annotation can be downloaded from http://purl.uni-rostock.de/rosdok/id00000163
^4 n is 12 for the Brownie, 7 for the Eggs, and 6 for the Sandwich dataset.
Based on these annotated sequences, we examined the following hypotheses: (H1) Fol-
lowing the proposed annotation process provides a high quality annotation. (H2) Train-
ing the annotators improves the quality of the annotation.
To test H1, we calculated the interrater reliability between Annotator A and Annotator B for all video logs in the three datasets (90 video logs). To test H2, we investigated whether the interrater reliability increases with the training of the annotators. The interrater reliability was calculated for the ground labels, not for the action classes (in other words, we calculated the overlap for the whole label "take-bowl-cupboard" and not only for the action class "take"). The interrater reliability was calculated in terms of agreement (IR_a), Cohen's κ (IR_κ), and Krippendorff's α (IR_α). We chose the above measures as they are the most frequently used measures for interrater reliability, as reported in [4].
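For reproducibility, the agreement and Cohen's κ between two time-step-aligned label sequences can be computed, for example, with scikit-learn as in the minimal sketch below; Krippendorff's α can be obtained analogously, e.g. with the third-party krippendorff package. The label sequences shown here are made up for illustration, not taken from the dataset.

import numpy as np
from sklearn.metrics import cohen_kappa_score

# Time-step-aligned label sequences of the two annotators (toy example).
annotator_a = ["take(knife,drawer)", "take(knife,drawer)", "put(knife,board)", "walk"]
annotator_b = ["take(knife,drawer)", "put(knife,board)",   "put(knife,board)", "walk"]

ir_a = np.mean(np.array(annotator_a) == np.array(annotator_b))  # raw agreement IR_a
ir_kappa = cohen_kappa_score(annotator_a, annotator_b)          # chance-corrected IR_kappa

print(f"agreement: {ir_a:.2f}, Cohen's kappa: {ir_kappa:.2f}")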
6.2 Results
6.2.1 Semantic annotation for the CMU-MMAC
To define the label set, two domain experts reviewed a subset from the video logs and
identified 13 action classes (11 for the Brownie, 12 for the Eggs, and 12 for the Sand-
wich). Table 4 shows the action classes for the three datasets. The action definitions
Table 4: Action classes for the three datasets.

Dataset     Action classes
Brownie     open, close, take, put, walk, turn on, fill, clean, stir, shake, other
Eggs        open, close, take, put, walk, turn on, fill, clean, stir, shake, other, turn off
Sandwich    open, close, take, put, walk, turn on, fill, clean, stir, shake, other, cut
created in this step later enable different annotators to choose the same label for identical actions. In this step the domain experts also identified the entities (30 for the Sandwich dataset, 44 for the Brownies, and 43 for the Eggs). From these dictionaries, in step two, a discussion about the type signatures and possible instantiations took place (119 unique labels were identified for the Sandwich dataset, 187 for the Brownies, and 179 for the Eggs; see Table 2 for examples). Step three, the definition of state properties, revealed 13 state properties (see Table 3). The next three steps were executed by two annotators until all datasets were annotated without gaps and all annotation sequences were shown to be valid plans.
The resulting annotation consists of 90 action sequences. Interestingly, while anno-
tating, we noticed that the experimenter changed the settings during the experiments’
recording. In all sub-experiments it can be seen that, before the recording of subject 28, some objects were relocated to different cupboards. Our annotation is publicly available to
enable other researchers to address the activity recognition problem with the CMU-
MMAC dataset. The complete annotation can be downloaded from [24].
6.2.2 Results for H1
To address H1 we computed the interrater reliability between Annotator A and Anno-
tator B. The results can be seen in Figure 2.

Figure 2: The median interrater reliability for the three datasets (in terms of Cohen's κ) and the deviation from this median.

The annotators reached a median agreement of 0.84 for the Brownie, 0.81 for the Eggs, and 0.84 for the Sandwich dataset. Similarly, Cohen's κ and Krippendorff's α had a median of 0.83 for the Brownie, 0.80 for the Eggs, and 0.83 for the Sandwich. A Cohen's κ between 0.41 and 0.60 indicates moderate agreement, between 0.61 and 0.80 substantial agreement, and above 0.81 almost perfect agreement [17]. Similarly, data with a Krippendorff's α above 0.80 is considered reliable enough to draw conclusions from. In
other words, the average interrater reliability between the two annotators is between
substantial and almost perfect. This also indicates that the proposed annotation process
not only provides semantic annotation, it also ensures that the annotators produce high
quality annotation. Consequently, hypothesis H1 was accepted. Figure 3 shows the classes annotated by Annotators A and B and the places where they differ. It can be seen that the differences are mainly caused by slight shifts in the start and end times of the actions. This indicates that the problematic part of annotating fine-grained actions is determining the start and end of an action.

Figure 3: Comparison between the annotation of a video log of Annotator A (bottom) and Annotator B (top) from the “Brownie” dataset for subject 9. The different colours indicate different action classes. The plot in the middle illustrates the differences between both annotators (in black).
6.2.3 Results for H2
To test H2, we investigated whether the training had an impact on the interrater reliability. We calculated the difference in the interrater reliability between each newly annotated video log and the previous one during the training phase. The “Brownie” dataset has a mean positive difference of about 2%, while the “Sandwich” dataset has a mean difference of about 10%. This means that on average there was an improvement of 2% (respectively 10%)
in the interrater reliability during the training phase. On the other hand, the “Eggs”
dataset shows a negative difference of 1%, which indicates that on average no im-
provement in interrater reliability was observed during the training phase. A negative
difference was also observed for some datasets. This indicates a decrease in the interrater reliability after training was performed (with a maximum of about 2%). Such a decrease can be explained by the encountering of new situations in the dataset or by different interpretations of a given action. However, a decrease of 2% does not signifi-
cantly reduce the quality of the annotation. Figure 4 illustrates the interrater agreement
for the datasets selected for the training phase. The orange line shows a linear model
that was fitted to predict the interrater reliability from the dataset number. It can be seen
that the effect of the training phase was not negative for all datasets. For two datasets
(Brownie and Sandwich), an increasing trend can be seen. To better understand the
change in the interrater reliability, we look into the agreement (IRa) between the an-
notators of the first 6 annotations of the “Sandwich” dataset (Figure 4). The interrater
reliability between the first and the second annotated video increases with 23%. The
same applies for the interrater reliability between the second and the third annotated
video. At that point the interrater reliability has reached about 81% overlapping (Co-
hen’s κof 0.8), which indicates almost perfect overlapping. After that, there is a mean
difference of about 1%. On average the overlapping between the two annotators stays
around 80% (or mean Cohen’sκof 0.78) even after the training phase. This indicates
that the learning phase improves the agreement between annotators, thus the quality of
the produced annotation (hence we accept H2). The results, however, show that one
10
Figure 4: Learning curve for the first n videos. The points illustrate the interrater
reliability for one dataset. The points are connected to increase perceivability. The
orange line illustrates the increase of reliability due to learning.
needs a relatively small training phase to produce results with almost perfect overlap-
ping between annotators5. This contradicts the assumption that we need high intensity
training to produce high quality annotation (as suggested in [4]). It also shows that
using our approach for semantic annotation ensures a high quality annotation without
5For the “Sandwich” dataset, the annotators needed to produce consolidated annotation for the first two
videos before they reached overlapping of about 80%, for the “Brownie” and the “Eggs” they needed only
one.
11
the need of intensive training of the annotators.
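The trend line in Figure 4 can be reproduced with a simple least-squares fit of the interrater reliability against the index of the annotated video, as in the following Python sketch; the reliability values used here are placeholders for illustration, not the measured ones.

import numpy as np

# Interrater reliability per consecutively annotated training video (placeholders).
ir = np.array([0.35, 0.58, 0.81, 0.80, 0.82, 0.79])
video_index = np.arange(1, len(ir) + 1)

# Linear model predicting the interrater reliability from the video index.
slope, intercept = np.polyfit(video_index, ir, deg=1)
print(f"IR ≈ {intercept:.2f} + {slope:.2f} * video index")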
7 Conclusion
In this work, we presented a novel approach to manual semantic annotation. The ap-
proach allows the usage of a rich label set that includes semantic meaning and relations
between actions, entities and context information. Additionally, we provide a state
space that evolves during the execution of the annotated plan sequences. In contrast to
typical annotation processes, our annotation approach allows further reasoning about
the state of the world by interpreting the annotated label sequence as grounded plan
operators. It is, for example, easy to infer the locations of the objects involved, without any explicit statement about the objects' locations.
To validate our approach, we annotated the “Brownie”, “Eggs”, and “Sandwich”
trials from the CMU-MMAC dataset. In the original annotation only 16 out of 90
sequences are annotated. We now provide a uniform annotation for all 90 sequences
including a semantic meaning of the labels. To enable other researchers to participate
in the CMU grand challenge, we make the complete annotation publicly available.
Furthermore, we evaluated the quality of the produced annotation by comparing the
annotation of two annotators. The results showed that the annotators were able to
produce labelled sequences with an almost perfect overlap (Cohen's κ of about 0.8). This shows that the approach provides high quality semantic annotation, which
the ubiquitous computing community can use to further the research in activity, plan,
and context recognition.
8 Acknowledgments
We would like to thank the students who annotated the dataset. This work is par-
tially funded by the German Research Foundation (YO 226/1-1). The video data was
obtained from kitchen.cs.cmu.edu and the data collection was funded in part by the
National Science Foundation (EEEC-0540865).
References
[1] D. W. Albrecht, I. Zukerman, and A. E. Nicholson. Bayesian models for key-
hole plan recognition in an adventure game. User Modeling and User-Adapted
Interaction, 8(1-2):5–47, 1998.
[2] P. Andrews, I. Zaihrayeu, and J. Pane. A classification of semantic annotation
systems. Semant. web, 3(3):223–248, August 2012.
[3] M. Bauer. Acquisition of user preferences for plan recognition. In Proc. of Int.
Conf. on User Modeling, pages 105–112, 1996.
[4] P. S. Bayerl and K. I. Paul. What determines inter-coder agreement in manual
annotations? a meta-analytic investigation. Comput. Linguist., 37(4):699–725,
December 2011.
[5] N. Blaylock and J. Allen. Statistical goal parameter recognition. In Int. Conf. on
Automated Planning and Scheduling, pages 297–304, June 2004.
[6] N. Blaylock and J. Allen. Hierarchical goal recognition. In Plan, activity, and
intent recognition, pages 3–32. Elsevier, Amsterdam, 2014.
[7] blinded. The entry is removed due to double blinded reviewing., 2017.
[8] E. Z. Borzeshi, O. P. Concha, R. Y. Da Xu, and M. Piccardi. Joint action seg-
mentation and classification by an extended hidden markov model. IEEE Signal
Process. Lett., 20(12):1207–1210, 2013.
[9] H.-S. Chung, J.-M. Kim, Y.-C. Byun, and S.-Y. Byun. Retrieving and explor-
ing ontology-based human motion sequences. In Computational Science and Its
Applications, volume 3482, pages 788–797. Springer Berlin Heidelberg, 2005.
[10] F. de la Torre, J. Hodgins, J. Montano, S. Valcarcel, R. Forcada, and J. Macey.
Guide to the carnegie mellon university multimodal activity database. Technical
Report CMU-RI-TR-08-22, Robotics Institute, Carnegie Mellon University, July
2009.
[11] F. de la Torre, J. K. Hodgins, J. Montano, and S. Valcarcel. Detailed human data
acquisition of kitchen activities: the CMU-Multimodal Activity Database. In
Workshop on Developing Shared Home Behavior Datasets to Advance HCI and
Ubiquitous Computing Research, 2009.
[12] R. Denney. A comparison of the model-based & algebraic styles of specifica-
tion as a basis for test specification. SIGSOFT Softw. Eng. Notes, 21(5):60–64,
September 1996.
[13] M. Donnelly, T. Magherini, C. Nugent, F. Cruciani, and C. Paggetti. Annotating
sensor data to identify activities of daily living. In Toward Useful Services for
Elderly and People with Disabilities, volume 6719, pages 41–48. Springer Berlin
Heidelberg, 2011.
[14] J. Hamm, B. Stone, M. Belkin, and S. Dennis. Automatic annotation of daily ac-
tivity from smartphone-based multisensory streams. In Mobile Computing, Appli-
cations, and Services, volume 110, pages 328–342. Springer Berlin Heidelberg,
2013.
[15] L. M. Hiatt, A. M. Harrison, and J. G. Trafton. Accommodating human variability
in human-robot teams through theory of mind. In Proc. Int. J. Conf. Artificial
Intelligence, pages 2066–2071, Barcelona, Spain, 2011.
[16] R. Howey, D. Long, and M. Fox. Val: automatic plan validation, continuous
effects and mixed initiative planning using pddl. In IEEE Int. Conf. on Tools with
Artificial Intelligence, pages 294–301, Nov 2004.
[17] J. R. Landis and G. G. Koch. The measurement of observer agreement for categorical
data. Biometrics, 33(1):159–174, 1977.
[18] A. Kiryakov, B. Popov, D. Ognyanoff, D. Manov, A. Kirilov, and M. Goranov. Se-
mantic annotation, indexing, and retrieval. In The Semantic Web - ISWC, volume
2870, pages 484–499. Springer, 2003.
[19] M. Ramirez and H. Geffner. Goal recognition over pomdps: Inferring the in-
tention of a pomdp agent. In Proc. of Int. Joint Conf. on Artificial Intelligence,
volume 3, pages 2009–2014, Barcelona, Spain, 2011.
[20] S. Saad, D. De Beul, S. Mahmoudi, and P. Manneback. An ontology for video
human movement representation based on benesh notation. In Int. Conf. on Mul-
timedia Computing and Systems, pages 77–82, 2012.
[21] E. H. Spriggs, F. de la Torre, and M. Hebert. Temporal segmentation and activ-
ity classification from first-person sensing. In IEEE Computer Society Conf. On
Computer Vision and Pattern Recognition Workshops, pages 17–24. IEEE, 2009.
[22] T. L. M. van Kasteren and B. J. A. Kröse. A sensing and annotation system for
recording datasets in multiple homes. In Proc. of Ann. Conf. on Human Factors
and Computing Systems, pages 4763–4766, Boston, USA, April 2009.
[23] P. Wittenburg, H. Brugman, A. Russel, A. Klassmann, and H. Sloetjes. ELAN: a
professional framework for multimodality research. In Proc. Int. Conf. Language
Resources and Evaluation, pages 1556–1559, 2006.
[24] K. Yordanova, F. Krüger, and T. Kirste. Semantic annotation for the CMU-MMAC Dataset. University Library, University of Rostock, 2018. http://purl.uni-rostock.de/rosdok/id00000163.