Available via license: CC BY 4.0
Activity Recognition in Assembly Tasks by Bayesian Filtering in
Multi-Hypergraphs
Timon Felske1, Stefan Lüdtke2, Sebastian Bader1, Thomas Kirste1
1Institute of Visual & Analytic Computing, University of Rostock, Germany
2Institute for Enterprise Systems, University of Mannheim, Germany
{timon.felske, sebastian.bader, thomas.kirste}@uni-rostock.de
luedtke@es.uni-mannheim.de
Abstract
We study sensor-based human activity recognition in man-
ual work processes like assembly tasks. In such processes,
the system states often have a rich structure, involving object
properties and relations. Thus, estimating the hidden system
state from sensor observations by recursive Bayesian filtering
can be very challenging, due to the combinatorial explosion
in the number of system states.
To alleviate this problem, we propose an efficient Bayesian
filtering model for such processes. In our approach, system
states are represented by multi-hypergraphs, and the system
dynamics is modeled by graph rewriting rules. We show a
preliminary concept that allows distributions over
multi-hypergraphs to be represented more compactly than by full enumeration,
and present an inference algorithm that works directly on this
compact representation. We demonstrate the applicability of
the algorithm on a real dataset.
Introduction
The automatic, sensor-based assessment of manual work
processes is highly relevant in domains like intralogistics
(Reining et al. 2019) or manufacturing (Tao et al. 2018).
In this paper, we focus on assembly processes, where a
subject is assembling an object from multiple parts (Jones
et al. 2021). Tracking assembly processes can be used to as-
sess process efficiency, to provide situation-aware assistance
(Aehnelt and Bader 2015; Gupta et al. 2012), or as the basis
for human-robot interaction (Wang, Ajaykumar, and Huang
2020).
For these tasks, it is not sufficient to only estimate the
current activity of the subject. Instead, we additionally need
to estimate the current assembly state (also called context
(Lüdtke, Yordanova, and Kirste 2019)), i.e., the state of all
involved objects as well as their relations. An established
method for estimating hidden system states from a sequence
of sensor data is recursive Bayesian filtering (Särkkä 2013).
In Bayesian filtering, a distribution $p(x_t \mid y_{1:t})$ over system
states at time $t$ is estimated recursively, given the sequence
$y_{1:t}$ of sensor data observed so far. To see why Bayesian
filtering in assembly processes can be difficult, consider the
following example.
Copyright © 2022, Association for the Advancement of Artificial
Intelligence (www.aaai.org). All rights reserved.
Figure 1: Participant assembling a bookcase, wearing a suit
equipped with inertial sensors.
Example 1. A subject assembling a bookcase is instru-
mented with wearable sensors (see Figure 1). Activities per-
formed by the subject include, for example, picking up parts
and tools, using tools, or connecting parts. A concrete ex-
ample of an activity is the installation of an eccentric: The
activity can only be applied when the subject holds both an
eccentric and a screwdriver, and it results in the eccentric being
attached to one of the boards.
The state of an assembly process usually consists of com-
plex relations between objects, and the system dynamics
can be described by rules that manipulate these relations.
Such assembly processes can be modeled as graph rewrit-
ing systems, where system states are edge- and node-labelled
graphs, and the system dynamics is represented by graph
rewriting rules (Jones et al. 2021). Wang and Tian (2016)
also used weighted graphs to represent different sequences
to assemble a product and find the optimal assembly se-
quence.
Unfortunately, Bayesian filtering in such systems can be
extremely challenging due to the large number of discrete
system states, arising from the combinatorial explosion in
the properties of individual objects and their relations.
Therefore, methods for efficiently representing distribu-
tions over multi-hypergraphs, and efficient inference algo-
rithms are essential for sensor-based assembly tracking. In
this paper, we report on our ongoing work in that direction.
Specifically, this paper has the following contributions:
arXiv:2202.00332v1 [cs.AI] 1 Feb 2022
Figure 2: Graph-based representation of an assembly state. Left: hypergraph, Center: multi-hypergraph, Right: Lifted multi-
hypergraph. The graphs describe the exact same state: Two eccentrics are located at the table, two are at the bottom-part of the
shelf, two are at the shelf top and the upper two eccentrics are used to build a connection between the shelf-top and the side-left
panel.
• We introduce the general concept of Bayesian filtering in
multi-hypergraph rewriting systems.
• We propose a representation of distributions over
multi-hypergraphs that can be much more efficient than
enumerating all graphs with non-zero probability. The
representation exploits symmetries in the distribution, arising
from the fact that not all parts need to (or can) be
distinguished.
• We present an efficient Bayesian filtering algorithm that
works directly on this compact representation.
Finally, we outline a number of challenges and directions for
future work.
Example Domain: Bookshelf Assembly
As a concrete example, we focus on the assembly of a book-
shelf, as already introduced above. Here, we describe the
domain in more detail to highlight algorithmic challenges in
this domain.
This assembly task consists of 56 different components
(entities). The entities can be divided into three categories:
7 boards, 40 screws and 9 tools. Of these individual enti-
ties, the screws are indistinguishable from one another. In
our scenario, a single agent is assembling the bookshelf.
While doing that, the agent wears a suit equipped with in-
ertial measurement units (IMUs), containing accelerometer
and gyroscope sensors. Like the agent, the various entities
are also equipped with IMUs.
Furthermore, each step of the assembly is annotated offline
based on video recordings of the assembly. Annotations
are 5-tuples $(a_t, l_t, l_{t+1}, o_t, o_{t+1})$, where $a_t$ is the
action class at time $t$, $l_t$ and $l_{t+1}$ are the locations at times
$t$ and $t+1$, and $o_t$ and $o_{t+1}$ are the objects held by the agent
at times $t$ and $t+1$. If multiple objects are held, the number of
held objects is given. For example, the annotation where an
agent takes an eccentric from the floor is represented by the
5-tuple (take, floor, floor, (eccentric 1), (eccentric 2)).
As the focus of this paper is the representation of states
and system dynamics but not sensor models, we assume that
annotation sequences can be observed directly. The observation
model is defined such that $p(y_{t+1} \mid x_{t+1}, a_t, x_t) = 1$ if
the system states $x_t$ and $x_{t+1}$ and the action $a_t$ are consistent
with the observation (i.e., annotation) $y_{t+1}$, and $0$ otherwise.
Investigating more realistic observation models that involve
the IMU data is a topic for future work.
Multi-Hypergraph States
Since we want to apply a Bayesian filtering algorithm on our
Bookshelf-Assembly task, we need to represent the differ-
ent states of the assembly. In the domain of assembly tasks,
graphs are often used to represent the state of the assem-
bly (Jones et al. 2021; Wang and Tian 2016). With this data
structure, it is possible to efficiently represent the entities
and their relation to one another.
Relations can involve more than two entities, e.g., two
boards being connected by an eccentric. Hypergraphs can
naturally represent these cases. In a hypergraph, edges can
connect more than two vertices. Formally, a hypergraph is a
pair $(V, E)$, where $V$ is a set of vertices and $E \subseteq \mathcal{P}(V)$
is the set of hyperedges, with $\mathcal{P}(V)$ denoting the power set of $V$. An
example of a hypergraph for the bookshelf domain is shown
in Figure 2 (left).
As mentioned before, we cannot distinguish between
the individual screws (eccentrics, etc.). Therefore, the
hypergraph in Figure 2 can be improved further: Instead of
representing the indistinguishable entities as individual vertices
in the graph, we can use a multigraph. Formally, a
multigraph is a pair $(V, E)$, where $V$ is a set of vertices and $E$ a
set of edges. Furthermore, the vertices and edges have
associated multiplicities. The summed multiplicity of all edges
connected to each vertex needs to be equal to the multiplic-
ity of that vertex. Thus, overall, an assembly state can be
represented by a multi-hypergraph (MHG), as shown in Fig-
ure 2 (center). These multiplicities are also instrumental for
Figure 3: Graphical representation of the described rewriting
rule installEccentric. The upper graph pattern needs to exist
in the state for the rule to be applicable. The rule transforms
that pattern to the lower pattern.
more efficient representations of distributions over MHGs,
as shown below.
To apply Bayesian Filtering to MHGs, we need to model
the system dynamics w.r.t. these graphs. We use a graph
rewriting formalism for this, as discussed next.
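
As an illustration of the state representation described in this section, the following minimal Python sketch stores a multi-hypergraph as vertex and hyperedge multiplicities. The class name `MultiHypergraph`, the edge labels, and the per-label reading of the multiplicity constraint are assumptions made for this sketch and are not taken from the authors' implementation. The example state follows the description in the caption of Figure 2.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MultiHypergraph:
    """Hypothetical container for a multi-hypergraph state."""

    # vertex name -> multiplicity (e.g. 6 indistinguishable eccentrics)
    vertices: dict[str, int] = field(default_factory=dict)
    # (edge label, set of connected vertices) -> multiplicity
    edges: dict[tuple[str, frozenset[str]], int] = field(default_factory=dict)

    def add_edge(self, label: str, members: list[str], multiplicity: int = 1) -> None:
        key = (label, frozenset(members))
        self.edges[key] = self.edges.get(key, 0) + multiplicity

    def incident_multiplicity(self, vertex: str, label: str) -> int:
        """Summed multiplicity of all `label` edges that contain `vertex`."""
        return sum(m for (lab, vs), m in self.edges.items()
                   if lab == label and vertex in vs)

    def is_consistent(self, vertex: str, label: str) -> bool:
        # Assumed reading of the constraint: checked per edge label.
        return self.incident_multiplicity(vertex, label) == self.vertices[vertex]


state = MultiHypergraph(vertices={"eccentric": 6, "table": 1, "shelf-bottom": 1,
                                  "shelf-top": 1, "side-left": 1})
state.add_edge("at", ["eccentric", "table"], 2)
state.add_edge("at", ["eccentric", "shelf-bottom"], 2)
state.add_edge("at", ["eccentric", "shelf-top"], 2)
# A hyperedge connecting three vertices at once.
state.add_edge("connected", ["eccentric", "shelf-top", "side-left"], 2)
```

Storing six eccentrics as one vertex with multiplicity 6 replaces six individual vertices, which is the compaction that the multigraph view is meant to provide.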
Bayesian Filtering in Multi-Hypergraphs
In this section, we describe how to apply Bayesian filtering
to multi-hypergraphs.
We use graph rewriting rules to specify the system dy-
namics. This way, a model of the system dynamics can be
constructed from prior domain knowledge, instead of learn-
ing it from data. For example, in the bookshelf assembly
domain, rewriting rules that describe how parts can be con-
nected can be derived directly from a construction man-
ual. The knowledge-based construction of transition models
is particularly advantageous when only a small amount of
training data is available compared to the number of possi-
ble activity trajectories—as usual in human activity recog-
nition. Furthermore, preconditions of rules can be used to
reduce the set of possible actions based on the current state
and the observations while filtering.
As an example of a graph rewriting rule, consider the rule
shown in Figure 3. The rule consists of a precondition (a
graph pattern that needs to exist in the state for the rule to
be applicable) and an effect (which describes how the
subgraph corresponding to the precondition is changed when
the rule is applied). Specifically, the rule installEccentric
retracts the at-edge between an eccentric and its current
location. Then, a new edge between the eccentric and its
new location (where it is installed) is created. Furthermore,
the has-edge between these two entities is added. Finally,
the connected-edge between the eccentric and the connected
entities of the bookshelf is created.
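
The precondition and effect structure of such a rule can be sketched in Python as follows. This is a simplified illustration, not the rule shown in Figure 3: the state is assumed to be a multiset of labelled hyperedges, the precondition only checks for the at-edge, and the helper `edge`, the function `install_eccentric`, and its parameters are hypothetical names.

```python
from collections import Counter


def edge(label, *vertices):
    """Hypothetical helper: a labelled hyperedge as a hashable key."""
    return (label, frozenset(vertices))


def install_eccentric(state, source, board, other_board):
    """Apply a simplified installEccentric rule to a multiset of hyperedges.

    Returns the successor state, or None if the precondition does not hold.
    """
    precondition = edge("at", "eccentric", source)
    if state[precondition] == 0:
        return None  # rule is not applicable in this state

    # Effect, part 1: retract one at-edge to the current location.
    # Counter subtraction drops entries whose count reaches zero.
    successor = state - Counter([precondition])
    # Effect, part 2: add the edges created by the installation.
    successor.update([
        edge("at", "eccentric", board),
        edge("has", board, "eccentric"),
        edge("connected", "eccentric", board, other_board),
    ])
    return successor


before = Counter({edge("at", "eccentric", "table"): 2})
after = install_eccentric(before, "table", "shelf-top", "side-left")
not_applicable = install_eccentric(before, "floor", "shelf-top", "side-left")
```

Returning None for an unmet precondition is what allows a filter to discard actions that are impossible in a given state.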
Graph rewriting systems on multi-hypergraphs are a gen-
eralization of multiset rewriting systems, as used in Lifted
Marginal Filtering (Lüdtke et al. 2018). Specifically, from
Figure 4: Graphical model representation of the Bayesian
filtering model described by our approach. $A_t$ are rewriting
rules, $X_t$ and $X_{t+1}$ are multi-hypergraph states, and $Y_t$ and
$Y_{t+1}$ are observations.
the viewpoint of multiset rewriting systems, graph patterns
are non-local preconditions (constraints), involving agree-
ment of values of different entities in the multiset. Such
constraints cannot be modeled in Lifted Marginal Filter-
ing due to the simple constraint language which is used to
guarantee that constraint satisfaction is tractable. Instead,
to test graph pattern constraints, a more general approach
like lifted weighted model counting (Gogate and Domin-
gos 2011) (which can test constraints without grounding the
model completely) could be required.
As illustrated in Figure 4, the transition model is given by
$$p(x_{t+1} \mid x_t) = \sum_{a_t} p(x_{t+1} \mid a_t, x_t)\, p(a_t \mid x_t). \tag{1}$$
Here, $a_t$ is a rewriting rule, the distribution $p(x_{t+1} \mid a_t, x_t)$
specifies the states $x_{t+1}$ that result from applying rule $a_t$ to
state $x_t$, and $p(a_t \mid x_t)$ is the participants’ action selection
model.
The observation model is given by $p(y_{t+1} \mid x_{t+1}, a_t, x_t)$.
This represents the idea that observations reflect what
happens during the interval $(t, t+1]$, depending on the action $a_t$
as well as on the states $x_t$ and $x_{t+1}$ present before and after
this action.
For recursive Bayesian filtering, we are interested in
recursively estimating the marginal filtering distribution
$p(x_{t+1} \mid y_{1:t+1})$ at time $t+1$ from the filtering distribution
at time $t$, the transition model and the observation model.
In principle, we can use the usual Bayesian filtering
prediction and update equations, but need to account for the action
variable $A_t$ and the fact that the observation model depends
on $A_t$ and $X_{t+1}$. This way, the prediction becomes
$$p(x_t, a_t, x_{t+1} \mid y_{1:t}) = p(x_{t+1} \mid a_t, x_t)\, p(a_t \mid x_t)\, p(x_t \mid y_{1:t}) \tag{2}$$
and the update is computed as
$$p(x_{t+1} \mid y_{1:t+1}) = \frac{1}{Z} \sum_{a_t, x_t} p(x_t, a_t, x_{t+1} \mid y_{1:t})\, p(y_{t+1} \mid x_{t+1}, a_t, x_t), \tag{3}$$
where $\frac{1}{Z}$ is a normalization factor. Note that marginalization
requires evaluating whether states are identical, i.e., solving a
Task     N    Duration (min)   Actions
Normal   11   12.3 ± 5.1       360.5 ± 127.3
Error    12   7.8 ± 5.8        224.7 ± 147.3
Table 1: Features extracted from the recording of 11 normal
and 12 (deliberately) erroneous experiments.
graph isomorphism problem. Thus, future work needs to fo-
cus on special cases where graph isomorphism can be solved
efficiently, e.g. via appropriate graph canonization.
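
The prediction and update steps of Eqs. (2) and (3) can be illustrated with a small Python sketch over an enumerated belief. Everything concrete here is an assumption for illustration: the toy state (a pair of counts instead of a multi-hypergraph), the two rules, the uniform action selection model, and the 0/1 observation model that compares the observed action name with the hypothesised rule. State identity is decided by dictionary key equality, which presupposes a canonical state encoding and sidesteps the isomorphism test discussed above.

```python
from collections import defaultdict

# Toy state (hypothetical): (eccentrics held, eccentrics installed).


def take(x):
    held, installed = x
    return {(held + 1, installed): 1.0}


def install(x):
    held, installed = x
    if held == 0:  # precondition violated: nothing to install
        return {}
    return {(held - 1, installed + 1): 1.0}


RULES = {"take": take, "install": install}


def action_model(x):
    """p(a_t | x_t): uniform over the rules applicable in x (an assumption)."""
    applicable = [name for name, rule in RULES.items() if rule(x)]
    return {name: 1.0 / len(applicable) for name in applicable}


def obs_model(y_next, x_next, a, x):
    """p(y_{t+1} | x_{t+1}, a_t, x_t): 1 if consistent, 0 otherwise."""
    return 1.0 if y_next == a else 0.0


def filter_step(belief, y_next):
    """Map p(x_t | y_{1:t}) to p(x_{t+1} | y_{1:t+1})."""
    unnormalized = defaultdict(float)
    for x, p_x in belief.items():
        for a, p_a in action_model(x).items():
            for x_next, p_trans in RULES[a](x).items():
                joint = p_trans * p_a * p_x  # Eq. (2)
                # Eq. (3): weight by the observation model, sum over a_t, x_t.
                unnormalized[x_next] += joint * obs_model(y_next, x_next, a, x)
    z = sum(unnormalized.values())
    if z == 0.0:
        raise ValueError("observation is inconsistent with every hypothesis")
    return {x: p / z for x, p in unnormalized.items() if p > 0.0}


belief = {(0, 0): 0.5, (1, 0): 0.5}
posterior = filter_step(belief, "install")
```

In this toy run, observing an installation rules out the hypothesis in which no eccentric is held, so all posterior mass moves to the single state reachable by the install rule.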
In general, the distribution $p(x_t \mid y_{1:t})$ can have very
many states with non-zero probability. For example, when
there are $k$ screws and $n$ holes, there are $\binom{n}{k}$ ways to attach
the screws to holes. Thus, we are interested in efficient
representations of such distributions, which do not rely on full
enumeration.
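
To give a feeling for this growth, the binomial coefficient can be evaluated directly. The hole and screw counts below are arbitrary illustrative values and are not taken from the bookshelf dataset.

```python
from math import comb


def ground_state_count(holes: int, screws: int) -> int:
    """Number of ways to place indistinguishable screws into distinct holes."""
    return comb(holes, screws)


small = ground_state_count(10, 4)  # 210 placements

# Print how quickly the count grows when half of the holes are filled.
for holes in (10, 20, 40):
    print(holes, ground_state_count(holes, holes // 2))
```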
We propose lifted multi-hypergraphs (LMHGs) to efficiently
represent distributions over multi-hypergraphs. Each
LMHG represents a distribution over (ground) MHGs.
Conceptually, LMHGs are an extension of lifted multiset states,
as used in Lifted Marginal Filtering (Lüdtke et al. 2018). The
illustrated example represents the case where 4 eccentrics
are attached to the shelf-top and shelf-bottom boards. At
most 2 of them are at shelf-top and at most 4 of them are
at shelf-bottom. This results in three different situations of
how the eccentrics could be distributed. The LMHG represents
a uniform distribution over these situations.
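
The three situations in this example can be reproduced with a short Python sketch. It assumes that a lifted state is summarised only by a total count and a maximum count per location, which is a simplification of an LMHG; the function names are hypothetical.

```python
from itertools import product


def groundings(total, max_per_location):
    """All count assignments that respect the maxima and sum to `total`."""
    locations = sorted(max_per_location)
    ranges = [range(max_per_location[loc] + 1) for loc in locations]
    return [dict(zip(locations, counts))
            for counts in product(*ranges)
            if sum(counts) == total]


def uniform_distribution(total, max_per_location):
    """Pair every grounding with the same probability."""
    ground_states = groundings(total, max_per_location)
    weight = 1.0 / len(ground_states)
    return [(g, weight) for g in ground_states]


lifted = {"shelf-top": 2, "shelf-bottom": 4}
situations = groundings(4, lifted)
distribution = uniform_distribution(4, lifted)
```

The lifted description needs three numbers (one total and two maxima), while the grounded view lists each of the three situations explicitly.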
Since we apply Bayesian filtering to LMHGs, the rewrit-
ing rules need to be adapted in order to implement the sys-
tem dynamics correctly. More precisely, we need rules that
describe the extent to which the distributions change when
indistinguishable entities are installed at unknown locations.
To stick with the example discussed above: Suppose that an
observation indicates that an eccentric is installed, but the
exact location of the eccentric is unknown. According to
the specified rule, the total amount of installed eccentrics
is increased by 1, and the maximum amount of installed
eccentrics at each reachable location is increased by 1. To
maintain the integrity constraint, the count of installed ec-
centrics at any location can not be larger than the total count
of installed eccentrics. Applying this rule to LMHGs can
be understood as applying a grounded version of the rule to
each specific MHG that is contained in the LMHG.
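
The correspondence between the lifted rule and its grounded version can be checked on the running example with the following Python sketch. It assumes the same simplified lifted state as before (a total and a maximum per location), and all function names are hypothetical. The sketch compares only the sets of reachable ground states and makes no claim about the resulting probabilities.

```python
from itertools import product


def groundings(total, caps):
    """Ground count tuples (ordered by sorted location name) of a lifted state."""
    locations = sorted(caps)
    ranges = [range(caps[loc] + 1) for loc in locations]
    return {counts for counts in product(*ranges) if sum(counts) == total}


def lifted_install(total, caps):
    """Lifted rule: raise the total and every per-location maximum by 1."""
    new_total = total + 1
    # Integrity constraint: no maximum may exceed the new total.
    new_caps = {loc: min(cap + 1, new_total) for loc, cap in caps.items()}
    return new_total, new_caps


def ground_install(counts):
    """Grounded rule: install one eccentric at any single location."""
    return {counts[:i] + (counts[i] + 1,) + counts[i + 1:]
            for i in range(len(counts))}


total, caps = 4, {"shelf-top": 2, "shelf-bottom": 4}
new_total, new_caps = lifted_install(total, caps)

lifted_support = groundings(new_total, new_caps)
ground_support = set().union(*(ground_install(g) for g in groundings(total, caps)))
```

For this example, both routes reach the same four ground states, so the lifted update covers exactly the states that grounding, applying the rule, and collecting the results would produce.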
In contrast to Lüdtke et al. (2018), we assume that similar
parts never need to be distinguished explicitly, thus we do
not require a splitting operator that would handle identifica-
tion.
Experimental Evaluation
In this section, we demonstrate the general applicability of
our concept to a real dataset. The dataset was created by
recording sensor data of subjects assembling a bookshelf, as
introduced above. Subjects wore a body suit with 17 IMUs,
and all objects (except screws) were equipped with IMUs
as well. All experiments were recorded on video for offline
annotation.
Overall, we performed 23 experiments with 12 different
subjects. Each subject was supposed to do a successful (cor-
rect) and an erroneous bookcase assembly. The erroneous
runs were recorded as one of our future research goals is to
detect assembly errors. We intended to perform 24 experi-
ments to generate data for 12 successful and 12 erroneous
assemblies. The data of one successful assembly was not
usable, resulting in 11 included successful experiments. The
23 experiments provided recordings with 240.9 minutes of
relevant data. Properties of this data are listed in Table 1.
Currently, we concentrate on the evaluation based on the
annotations to demonstrate the basic applicability of our ap-
proach. Our filtering model was able to explain all 11 cor-
rect annotation sequences. During filtering, at most 2 lifted
multi-hypergraphs were required to represent the marginal
filtering distribution for all sequences and all timesteps,
instead of approx. 5000 in the grounded version. The
significant reduction of the necessary states can be explained by the
use of LMHGs, which allow several states to be represented by a
single representative. This initial experiment shows that our
Bayesian filtering approach can be applied to track assembly
processes and efficiently represent the filtering distribution.
Discussion and Conclusion
In this paper, we presented a Bayesian filtering model with
multi-hypergraph states and graph rewriting-based system
dynamics. The main technical contribution is an efficient
representation of distributions over multi-hypergraphs and a
Bayesian filtering algorithm that works directly on that rep-
resentation. Our approach was motivated by state estimation
in assembly processes. However, the method can also be
applied to other state estimation tasks in dynamic
systems that consist of multiple entities and their relations, and
where the system dynamics is naturally described by
rewriting rules, e.g. multi-agent systems or social networks.
To make our approach applicable to real-world domains,
several extensions are required: First, we did not discuss
the observation model $p(y_t \mid x_t)$ here, which relates sensor
data to system states. Apart from simple, parametric
densities, generative neural networks (e.g. normalizing flows
(Rezende and Mohamed 2015)) could be employed. Second,
in real-world datasets, actions have distinct durations, which
need to be modeled appropriately, similar to methods used
for hidden semi-Markov models. Third, our future work will
focus on more general means to efficiently represent distri-
butions over graphs, as well as a formal analysis of the ex-
pressiveness and computational complexity of our approach.
Acknowledgments
This work was funded by the European Social Fund
(ESF) and the Ministry of Education, Science and Culture
of Mecklenburg-Western Pomerania (Germany) within the
project NEISS – Neural Extraction of Information, Structure
and Symmetry in Images under grant no ESF/14-BM-A55-
0009/19.
References
Aehnelt, M.; and Bader, S. 2015. Information Assistance for
Smart Assembly Stations. In ICAART (2), 143–150.
Gogate, V.; and Domingos, P. 2011. Probabilistic theorem
proving. In Proceedings of the Twenty-Seventh Conference
on Uncertainty in Artificial Intelligence, 256–265.
Gupta, A.; Fox, D.; Curless, B.; and Cohen, M. 2012. Du-
ploTrack: a real-time system for authoring and guiding du-
plo block assembly. In Proceedings of the 25th annual ACM
symposium on User interface software and technology, 389–
402.
Jones, J. D.; Cortesa, C.; Shelton, A.; Landau, B.; Khudan-
pur, S.; and Hager, G. D. 2021. Fine-grained activity recog-
nition for assembly videos. IEEE Robotics and Automation
Letters, 6(2): 3728–3735.
Lüdtke, S.; Schröder, M.; Bader, S.; Kersting, K.; and Kirste,
T. 2018. Lifted Filtering via Exchangeable Decomposition.
In Proceedings of the 27th International Joint Conference
on Artificial Intelligence.
Lüdtke, S.; Yordanova, K.; and Kirste, T. 2019. Human
activity and context recognition using lifted marginal fil-
tering. In 2019 IEEE International Conference on Perva-
sive Computing and Communications Workshops (PerCom
Workshops), 83–88. IEEE.
Reining, C.; Niemann, F.; Moya Rueda, F.; Fink, G. A.; and
ten Hompel, M. 2019. Human activity recognition for pro-
duction and logistics—a systematic literature review. Infor-
mation, 10(8): 245.
Rezende, D.; and Mohamed, S. 2015. Variational inference
with normalizing flows. In International conference on ma-
chine learning, 1530–1538. PMLR.
Särkkä, S. 2013. Bayesian filtering and smoothing.
Cambridge University Press.
Tao, W.; Lai, Z.-H.; Leu, M. C.; and Yin, Z. 2018. Worker
activity recognition in smart manufacturing using IMU and
sEMG signals with convolutional neural networks. Procedia
Manufacturing, 26: 1159–1166.
Wang, Y.; Ajaykumar, G.; and Huang, C.-M. 2020. See
what i see: Enabling user-centric robotic assistance using
first-person demonstrations. In Proceedings of the 2020
ACM/IEEE International Conference on Human-Robot In-
teraction, 639–648.
Wang, Y.; and Tian, D. 2016. A weighted assembly prece-
dence graph for assembly sequence planning. The Inter-
national Journal of Advanced Manufacturing Technology,
83(1-4): 99–115.