Available via license: CC BY 4.0


Activity Recognition in Assembly Tasks by Bayesian Filtering in Multi-Hypergraphs

Timon Felske1, Stefan Lüdtke2, Sebastian Bader1, Thomas Kirste1

1Institute of Visual & Analytic Computing, University of Rostock, Germany

2Institute for Enterprise Systems, University of Mannheim, Germany

{timon.felske, sebastian.bader, thomas.kirste}@uni-rostock.de

luedtke@es.uni-mannheim.de

Abstract

We study sensor-based human activity recognition in manual work processes like assembly tasks. In such processes, the system states often have a rich structure, involving object properties and relations. Thus, estimating the hidden system state from sensor observations by recursive Bayesian filtering can be very challenging, due to the combinatorial explosion in the number of system states.

To alleviate this problem, we propose an efficient Bayesian filtering model for such processes. In our approach, system states are represented by multi-hypergraphs, and the system dynamics is modeled by graph rewriting rules. We show a preliminary concept that allows distributions over multi-hypergraphs to be represented more compactly than by full enumeration, and present an inference algorithm that works directly on this compact representation. We demonstrate the applicability of the algorithm on a real dataset.

Introduction

The automatic, sensor-based assessment of manual work processes is highly relevant in domains like intralogistics (Reining et al. 2019) or manufacturing (Tao et al. 2018). In this paper, we focus on assembly processes, where a subject is assembling an object from multiple parts (Jones et al. 2021). Tracking assembly processes can be used to assess process efficiency, to provide situation-aware assistance (Aehnelt and Bader 2015; Gupta et al. 2012), or as the basis for human-robot interaction (Wang, Ajaykumar, and Huang 2020).

For these tasks, it is not sufficient to only estimate the current activity of the subject. Instead, we additionally need to estimate the current assembly state (also called context (Lüdtke, Yordanova, and Kirste 2019)), i.e., the state of all involved objects as well as their relations. An established method for estimating hidden system states from a sequence of sensor data is recursive Bayesian filtering (Särkkä 2013).

In Bayesian filtering, a distribution $p(x_t \mid y_{1:t})$ over system states at time $t$ is estimated recursively, given the sequence $y_{1:t}$ of sensor data observed so far. To see why Bayesian filtering in assembly processes can be difficult, consider the following example.

Copyright © 2022, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Figure 1: Participant assembling a bookcase while wearing a suit equipped with inertial sensors.

Example 1. A subject assembling a bookcase is instrumented with wearable sensors (see Figure 1). Activities performed by the subject include, for example, picking up parts and tools, using tools, or connecting parts. A concrete example of an activity is the installation of an eccentric: the activity can only be performed when the subject holds both an eccentric and a screwdriver, and it results in the eccentric being attached to one of the boards.

The state of an assembly process usually consists of complex relations between objects, and the system dynamics can be described by rules that manipulate these relations. Such assembly processes can be modeled as graph rewriting systems, where system states are edge- and node-labelled graphs and the system dynamics is represented by graph rewriting rules (Jones et al. 2021). Wang and Tian (2016) also used weighted graphs to represent the alternative sequences in which a product can be assembled and to find the optimal assembly sequence.

Unfortunately, Bayesian filtering in such systems can be extremely challenging due to the large number of discrete system states, arising from the combinatorial explosion in the properties of individual objects and their relations.

Therefore, methods for efficiently representing distributions over multi-hypergraphs, together with efficient inference algorithms, are essential for sensor-based assembly tracking. In this paper, we report on our ongoing work in that direction. Specifically, this paper makes the following contributions:

• We introduce the general concept of Bayesian filtering in multi-hypergraph rewriting systems.
• We propose a representation of distributions over multi-hypergraphs that can be much more efficient than enumerating all graphs with non-zero probability. The representation exploits symmetries in the distribution, arising from the fact that not all parts need to be (or can be) distinguished.
• We present an efficient Bayesian filtering algorithm that works directly on this compact representation.

Finally, we outline a number of challenges and directions for future work.

arXiv:2202.00332v1 [cs.AI] 1 Feb 2022

Figure 2: Graph-based representation of an assembly state. Left: hypergraph; center: multi-hypergraph; right: lifted multi-hypergraph. The graphs describe the exact same state: two eccentrics are located on the table, two at the bottom part of the shelf, two at the shelf top, and the upper two eccentrics are used to build a connection between the shelf top and the left side panel.

Example Domain: Bookshelf Assembly

As a concrete example, we focus on the assembly of a bookshelf, as introduced above. Here, we describe the domain in more detail to highlight its algorithmic challenges.

This assembly task involves 56 different components (entities), which can be divided into three categories: 7 boards, 40 screws, and 9 tools. Of these entities, the screws are indistinguishable from one another. In our scenario, a single agent assembles the bookshelf. While doing so, the agent wears a suit equipped with inertial measurement units (IMUs) containing accelerometers and gyroscopes. Like the agent, the entities (except the screws) are also equipped with IMUs.

Furthermore, each step of the assembly is annotated offline based on video recordings of the assembly. Annotations are 5-tuples $(a_t, l_t, l_{t+1}, o_t, o_{t+1})$, where $a_t$ is the action class at time $t$, $l_t$ and $l_{t+1}$ are the locations at times $t$ and $t+1$, and $o_t$ and $o_{t+1}$ are the objects held by the agent at times $t$ and $t+1$. If multiple objects are held, the number of held objects is given. For example, the annotation where an agent takes an eccentric from the floor is represented by the 5-tuple (take, floor, floor, (eccentric 1), (eccentric 2)).
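To make the annotation format concrete, the sketch below stores one 5-tuple in a Python dataclass. The class name `Annotation`, its field names, and the encoding of held objects as (type, count) pairs are our own assumptions; the text only fixes the five components and their meaning.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Held objects as (object type, count) pairs; a count is needed because
# only the number of held objects is annotated when several are held.
Held = Tuple[Tuple[str, int], ...]


@dataclass(frozen=True)
class Annotation:
    """Hypothetical container for one annotation (a_t, l_t, l_{t+1}, o_t, o_{t+1})."""
    action: str        # a_t: action class
    loc_before: str    # l_t: location at time t
    loc_after: str     # l_{t+1}: location at time t+1
    held_before: Held  # o_t: objects held at time t
    held_after: Held   # o_{t+1}: objects held at time t+1

    def held_change(self) -> Dict[str, int]:
        """Change in the number of held objects per type between t and t+1."""
        before, after = dict(self.held_before), dict(self.held_after)
        return {k: after.get(k, 0) - before.get(k, 0) for k in before.keys() | after.keys()}


# The example from the text: taking an eccentric from the floor.
take = Annotation("take", "floor", "floor", (("eccentric", 1),), (("eccentric", 2),))
print(take.held_change())  # {'eccentric': 1}
```

Making the dataclass frozen keeps annotations hashable, so they can serve as dictionary keys or observation symbols.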

As the focus of this paper is the representation of states and system dynamics rather than sensor models, we assume that annotation sequences can be observed directly. The observation model is defined such that $p(y_{t+1} \mid x_{t+1}, a_t, x_t) = 1$ if the system states $x_t$ and $x_{t+1}$ and the action $a_t$ are consistent with the observation (i.e., annotation) $y_{t+1}$, and $0$ otherwise. Investigating more realistic observation models that involve the IMU data is a topic for future work.
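The indicator observation model can be sketched as a function that returns 1 or 0. In this sketch, states are hypothetical multisets mapping (entity type, location) to a count, with the assumed location name `hand` standing for "held by the agent". Only the action class and the held objects are checked; the annotated locations $l_t, l_{t+1}$ are ignored for brevity. All names are illustrative.

```python
from collections import Counter
from typing import Dict, NamedTuple, Tuple


class Obs(NamedTuple):
    """Simplified, hypothetical annotation: action class and held-object counts."""
    action: str
    held_before: Tuple[Tuple[str, int], ...]
    held_after: Tuple[Tuple[str, int], ...]


def held(state: Counter) -> Dict[str, int]:
    """Objects held by the agent; 'hand' is an assumed location name for this."""
    return {etype: n for (etype, loc), n in state.items() if loc == "hand" and n > 0}


def obs_likelihood(y: Obs, x_next: Counter, action: str, x_prev: Counter) -> float:
    """Indicator model p(y_{t+1} | x_{t+1}, a_t, x_t): 1 if consistent, else 0."""
    consistent = (
        action == y.action
        and held(x_prev) == dict(y.held_before)
        and held(x_next) == dict(y.held_after)
    )
    return 1.0 if consistent else 0.0


x0 = Counter({("eccentric", "floor"): 3, ("eccentric", "hand"): 1})
x1 = Counter({("eccentric", "floor"): 2, ("eccentric", "hand"): 2})
y = Obs("take", (("eccentric", 1),), (("eccentric", 2),))
print(obs_likelihood(y, x1, "take", x0))  # 1.0
print(obs_likelihood(y, x0, "take", x0))  # 0.0
```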

Multi-Hypergraph States

Since we want to apply a Bayesian filtering algorithm to our bookshelf assembly task, we need to represent the different states of the assembly. In the domain of assembly tasks, graphs are often used to represent the state of the assembly (Jones et al. 2021; Wang and Tian 2016). This data structure can efficiently represent the entities and their relations to one another.

Relations can involve more than two entities, e.g., two boards being connected by an eccentric. Hypergraphs can naturally represent these cases: in a hypergraph, edges can connect more than two vertices. Formally, a hypergraph is a pair $(V, E)$, where $V$ is a set of vertices and $E \subseteq \mathcal{P}(V)$ is the set of hyperedges, with $\mathcal{P}(V)$ denoting the power set of $V$. An example of a hypergraph for the bookshelf domain is shown in Figure 2 (left).
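The formal definition translates directly into code. In this sketch, vertices are strings and hyperedges are frozensets of vertices; the vertex names are made up, and edge labels (such as the relation type) are omitted to match the definition above.

```python
from itertools import chain, combinations
from typing import FrozenSet, Iterable, Set


def powerset(vertices: Iterable[str]) -> Set[FrozenSet[str]]:
    """P(V): all subsets of V as frozensets (exponential, for illustration only)."""
    vs = list(vertices)
    return {frozenset(c) for c in chain.from_iterable(combinations(vs, r) for r in range(len(vs) + 1))}


def is_hypergraph(V: Set[str], E: Set[FrozenSet[str]]) -> bool:
    """Check that E is a subset of P(V), i.e. every hyperedge only uses vertices from V."""
    return all(e <= V for e in E)  # same result as E <= powerset(V), without building P(V)


V = {"shelf_top", "side_left", "eccentric_1"}
E = {frozenset({"eccentric_1", "shelf_top", "side_left"})}  # one 3-ary hyperedge
print(is_hypergraph(V, E))  # True
print(E <= powerset(V))     # True
print(is_hypergraph(V, {frozenset({"table", "eccentric_1"})}))  # False
```

Checking each edge against $V$ avoids materializing $\mathcal{P}(V)$, which has $2^{|V|}$ elements.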

As mentioned before, we cannot distinguish between the individual screws (eccentrics, etc.). Therefore, the hypergraph in Figure 2 (left) can be improved further: instead of representing the indistinguishable entities as individual vertices in the graph, we can use a multigraph. Formally, a multigraph is a pair $(V, E)$, where $V$ is a set of vertices and $E$ is a set of edges, and the vertices and edges have associated multiplicities. The summed multiplicity of all edges connected to each vertex needs to be equal to the multiplicity of that vertex. Thus, overall, an assembly state can be represented by a multi-hypergraph (MHG), as shown in Figure 2 (center). These multiplicities are also instrumental for more efficient representations of distributions over MHGs, as shown below.

Figure 3: Graphical representation of the rewriting rule installEccentric described in the text. The upper graph pattern needs to exist in the state for the rule to be applicable. The rule transforms that pattern into the lower pattern.
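A minimal sketch of an MHG with multiplicities, using `collections.Counter` for both vertex and edge multiplicities, encoding the state described in the Figure 2 caption. The encoding (an edge key is a relation label plus a tuple of vertices) is our own assumption. So is the restriction of the multiplicity check to vertices of multiplicity greater than one: a single board incident to several edges would not satisfy the check as literally stated, so we read it as applying to classes of indistinguishable entities.

```python
from collections import Counter

# Hypothetical encoding of the Figure 2 state: eight indistinguishable
# eccentrics collapsed into one vertex of multiplicity 8.
vertices = Counter({"eccentric": 8, "table": 1, "shelf_bottom": 1,
                    "shelf_top": 1, "side_left": 1})
edges = Counter({
    ("at", ("eccentric", "table")): 2,
    ("at", ("eccentric", "shelf_bottom")): 2,
    ("at", ("eccentric", "shelf_top")): 2,
    ("connected", ("eccentric", "shelf_top", "side_left")): 2,
})


def integrity_ok(vertices: Counter, edges: Counter) -> bool:
    """Summed multiplicity of incident edges must equal the vertex multiplicity.
    Assumption of this sketch: only vertices with multiplicity > 1 are checked."""
    incident = Counter()
    for (_label, members), mult in edges.items():
        for v in set(members):
            incident[v] += mult
    return all(incident[v] == m for v, m in vertices.items() if m > 1)


print(integrity_ok(vertices, edges))   # True: 2 + 2 + 2 + 2 == 8
broken = edges.copy()
broken[("at", ("eccentric", "table"))] = 3
print(integrity_ok(vertices, broken))  # False: 9 != 8
```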

To apply Bayesian filtering to MHGs, we need to model the system dynamics with respect to these graphs. We use a graph rewriting formalism for this, as discussed next.

Bayesian Filtering in Multi-Hypergraphs

In this section, we describe how to apply Bayesian filtering to multi-hypergraphs.

We use graph rewriting rules to specify the system dynamics. This way, a model of the system dynamics can be constructed from prior domain knowledge instead of being learned from data. For example, in the bookshelf assembly domain, rewriting rules that describe how parts can be connected can be derived directly from a construction manual. The knowledge-based construction of transition models is particularly advantageous when only a small amount of training data is available compared to the number of possible activity trajectories, as is usual in human activity recognition. Furthermore, the preconditions of rules can be used during filtering to reduce the set of possible actions based on the current state and the observations.

As an example of a graph rewriting rule, consider the rule shown in Figure 3. The rule consists of a precondition (a graph pattern that needs to exist in the state for the rule to be applicable) and an effect (which describes how the subgraph matching the precondition is changed when the rule is applied). Specifically, the rule installEccentric first retracts the at-edge between an eccentric and its current location. Then, a new edge is created between the eccentric and its new location (where it is installed). Furthermore, the has-edge between these two entities is added. Finally, the connected-edge between the eccentric and the entities of the bookshelf that it connects is created.
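The rule can be sketched as a function on a multiset of labelled hyperedges, following the verbal description: check the precondition, then retract and add edges. Figure 3 is not reproduced here, so the concrete pattern (an eccentric and a screwdriver at the hypothetical location `hand`, as suggested by Example 1, plus a target board and a partner board) and the function name `install_eccentric` are our assumptions. Vertex multiplicities are not tracked in this sketch.

```python
from collections import Counter
from typing import Optional

HELD_ECC = ("at", ("eccentric", "hand"))     # hypothetical: an eccentric in the agent's hand
HELD_TOOL = ("at", ("screwdriver", "hand"))  # hypothetical: the screwdriver in the agent's hand


def install_eccentric(edges: Counter, board: str, partner: str) -> Optional[Counter]:
    """Simplified, hypothetical installEccentric rule on labelled hyperedges with
    multiplicities. Returns the rewritten edge multiset, or None if the
    precondition pattern is not present. The input is not modified."""
    # Precondition: the agent holds at least one eccentric and a screwdriver.
    if edges[HELD_ECC] < 1 or edges[HELD_TOOL] < 1:
        return None
    new = edges.copy()
    # Effect 1: retract the at-edge between the eccentric and its current location.
    new[HELD_ECC] -= 1
    if new[HELD_ECC] == 0:
        del new[HELD_ECC]
    # Effect 2: create an at-edge to the board where the eccentric is installed.
    new[("at", ("eccentric", board))] += 1
    # Effect 3: add the has-edge between the board and the eccentric.
    new[("has", (board, "eccentric"))] += 1
    # Effect 4: add the connected-edge between the eccentric and the joined parts.
    new[("connected", ("eccentric", board, partner))] += 1
    return new


state = Counter({HELD_ECC: 1, HELD_TOOL: 1})
after = install_eccentric(state, "shelf_top", "side_left")
print(HELD_ECC in after)  # False
print(after[("connected", ("eccentric", "shelf_top", "side_left"))])  # 1
print(install_eccentric(after, "shelf_top", "side_left"))  # None: no eccentric held anymore
```

Returning `None` for an unmatched precondition is what lets preconditions prune the set of possible actions during filtering.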

Graph rewriting systems on multi-hypergraphs are a generalization of multiset rewriting systems, as used in Lifted Marginal Filtering (Lüdtke et al. 2018). Specifically, from the viewpoint of multiset rewriting systems, graph patterns are non-local preconditions (constraints) that require the values of different entities in the multiset to agree. Such constraints cannot be modeled in Lifted Marginal Filtering, whose simple constraint language is used to guarantee that constraint satisfaction is tractable. Instead, testing graph pattern constraints could require a more general approach like lifted weighted model counting (Gogate and Domingos 2011), which can test constraints without grounding the model completely.

Figure 4: Graphical model representation of the Bayesian filtering model described by our approach. $A_t$ are rewriting rules, $X_t$ and $X_{t+1}$ are multi-hypergraph states, and $Y_t$ and $Y_{t+1}$ are observations.

As illustrated in Figure 4, the transition model is given by

$$p(x_{t+1} \mid x_t) = \sum_{a_t} p(x_{t+1} \mid a_t, x_t)\, p(a_t \mid x_t). \tag{1}$$

Here, $a_t$ is a rewriting rule, the distribution $p(x_{t+1} \mid a_t, x_t)$ specifies the states $x_{t+1}$ that result from applying rule $a_t$ to state $x_t$, and $p(a_t \mid x_t)$ is the participant's action selection model.
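Eq. (1) can be sketched on a deliberately tiny state space. The state `(floor, held, installed)` counting eccentrics, the two rules `take` and `install`, and the uniform action selection model are all hypothetical simplifications. Rule application is deterministic, so $p(x_{t+1} \mid a_t, x_t)$ is 1 for the unique successor, and rules whose precondition fails get no probability mass.

```python
from collections import defaultdict
from fractions import Fraction
from typing import Dict, Optional, Tuple

State = Tuple[int, int, int]  # hypothetical: eccentrics on the floor, held, installed
RULES = ("take", "install")


def apply_rule(rule: str, x: State) -> Optional[State]:
    """Deterministic rule application; None if the precondition fails."""
    floor, held, inst = x
    if rule == "take" and floor > 0:
        return (floor - 1, held + 1, inst)
    if rule == "install" and held > 0:
        return (floor, held - 1, inst + 1)
    return None


def action_model(x: State) -> Dict[str, Fraction]:
    """p(a_t | x_t): uniform over the rules whose precondition holds (assumption)."""
    applicable = [r for r in RULES if apply_rule(r, x) is not None]
    return {r: Fraction(1, len(applicable)) for r in applicable}


def transition(x: State) -> Dict[State, Fraction]:
    """Eq. (1): p(x_{t+1} | x_t) = sum over a_t of p(x_{t+1} | a_t, x_t) p(a_t | x_t)."""
    out: Dict[State, Fraction] = defaultdict(Fraction)
    for a, pa in action_model(x).items():
        out[apply_rule(a, x)] += pa  # p(x_{t+1} | a_t, x_t) is 1 for the unique successor
    return dict(out)


print(transition((2, 1, 0)))
# {(1, 2, 0): Fraction(1, 2), (2, 0, 1): Fraction(1, 2)}
```

Exact `Fraction` arithmetic keeps the probabilities readable; a real implementation would more likely use floats or log-weights.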

The observation model is given by $p(y_{t+1} \mid x_{t+1}, a_t, x_t)$. This represents the idea that observations reflect what happens during the interval $(t, t+1]$, depending on the action $a_t$ as well as on the states $x_t$ and $x_{t+1}$ present before and after this action.

For recursive Bayesian filtering, we are interested in recursively estimating the marginal filtering distribution $p(x_{t+1} \mid y_{1:t+1})$ at time $t+1$ from the filtering distribution at time $t$, the transition model, and the observation model. In principle, we can use the usual Bayesian filtering prediction and update equations, but we need to account for the action variable $A_t$ and the fact that the observation model depends on $X_t$, $A_t$, and $X_{t+1}$. The prediction then becomes

$$p(x_t, a_t, x_{t+1} \mid y_{1:t}) = p(x_{t+1} \mid a_t, x_t)\, p(a_t \mid x_t)\, p(x_t \mid y_{1:t}) \tag{2}$$

and the update is computed as

$$p(x_{t+1} \mid y_{1:t+1}) = \frac{1}{Z} \sum_{a_t, x_t} p(x_t, a_t, x_{t+1} \mid y_{1:t})\, p(y_{t+1} \mid x_{t+1}, a_t, x_t), \tag{3}$$

where $Z$ is the normalization constant that makes the right-hand side sum to one over $x_{t+1}$. Note that the marginalization requires evaluating whether states are identical, i.e., solving a graph isomorphism problem. Thus, future work needs to focus on special cases where graph isomorphism can be solved efficiently, e.g., via appropriate graph canonization.

| Task | N | Duration (min) | Actions |
|---|---|---|---|
| Normal | 11 | 12.3 ± 5.1 | 360.5 ± 127.3 |
| Error | 12 | 7.8 ± 5.8 | 224.7 ± 147.3 |

Table 1: Features extracted from the recordings of 11 normal and 12 (deliberately) erroneous experiments.
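The prediction and update steps of Eqs. (2) and (3), including the marginalization over $a_t$ and $x_t$, can be sketched for any model whose states have a hashable canonical form. For the toy count states used here, the tuple itself is canonical, so identical states share one dictionary key; for general MHGs with named vertices, computing such a key is exactly the canonization problem mentioned above. The function names, the toy rules, the uniform action model, and the deterministic rule application are all hypothetical.

```python
from collections import defaultdict
from fractions import Fraction


def apply_rule(rule, x):
    """Toy deterministic rules on (floor, held, installed) eccentric counts."""
    floor, held, inst = x
    if rule == "take" and floor > 0:
        return (floor - 1, held + 1, inst)
    if rule == "install" and held > 0:
        return (floor, held - 1, inst + 1)
    return None  # precondition not satisfied


def action_model(x):
    """p(a_t | x_t): uniform over applicable rules (an assumption of this sketch)."""
    ok = [r for r in ("take", "install") if apply_rule(r, x) is not None]
    return {r: Fraction(1, len(ok)) for r in ok}


def filter_step(belief, y, obs_lik):
    """One predict/update step; `belief` maps canonical states to p(x_t | y_{1:t})."""
    # Prediction, Eq. (2): weight of each (x_t, a_t, x_{t+1}) triple.
    joint = [(x, a, apply_rule(a, x), pa * px)
             for x, px in belief.items()
             for a, pa in action_model(x).items()]
    # Update, Eq. (3): multiply by the observation model and sum out a_t and x_t.
    post = defaultdict(Fraction)
    for x, a, x_next, w in joint:
        post[x_next] += w * obs_lik(y, x_next, a, x)  # identical states share one key
    z = sum(post.values())  # normalization constant Z
    if z == 0:
        raise ValueError("observation is inconsistent with every hypothesis")
    return {x_next: w / z for x_next, w in post.items() if w}


def action_indicator(y, x_next, a, x):
    """Indicator observation model that only checks the observed action class."""
    return 1 if a == y else 0


belief = {(1, 1, 0): Fraction(1, 2), (0, 2, 0): Fraction(1, 2)}
print(filter_step(belief, "install", action_indicator))
# {(1, 0, 1): Fraction(1, 3), (0, 1, 1): Fraction(2, 3)}
```

With the belief `{(1, 0, 1): 1/2, (0, 2, 0): 1/2}` and an uninformative observation model that always returns 1, both hypotheses lead to `(0, 1, 1)` and are merged into a single entry with probability 1; this merge is the marginalization step that needs state identity.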

In general, the distribution $p(x_t \mid y_{1:t})$ can assign non-zero probability to very many states. For example, when there are $k$ indistinguishable screws and $n$ holes (each hole taking at most one screw), there are $\binom{n}{k}$ ways to attach the screws to the holes. Thus, we are interested in efficient representations of such distributions that do not rely on full enumeration.
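The binomial count grows quickly. The numbers below are purely illustrative (16 holes, 8 screws) and are not taken from the bookshelf domain. `math.comb` counts placements of indistinguishable screws, and `math.perm` shows how much larger the count would be if every screw were tracked individually.

```python
from math import comb, perm

n_holes, k_screws = 16, 8  # illustrative values only

indistinguishable = comb(n_holes, k_screws)  # choose which holes get a screw
distinguishable = perm(n_holes, k_screws)    # also record which screw is where
print(indistinguishable)  # 12870
print(distinguishable)    # 518918400
```

Exploiting indistinguishability already divides the count by $k!$ (here $8! = 40320$), but the remaining count still grows combinatorially, which motivates a representation that avoids enumeration altogether.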

We propose lifted multi-hypergraphs (LMHGs) to efficiently represent distributions over multi-hypergraphs. Each LMHG represents a distribution over (ground) MHGs. Conceptually, LMHGs are an extension of lifted multiset states, as used in Lifted Marginal Filtering (Lüdtke et al. 2018). The illustrated example represents the case where 4 eccentrics are attached to the shelf-top and shelf-bottom boards. At most 2 of them are at the shelf top and at most 4 of them are at the shelf bottom. This results in three different situations of how the eccentrics could be distributed (0, 1, or 2 at the shelf top, with the rest at the shelf bottom). The LMHG represents a uniform distribution over these situations.
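The example can be checked by expanding the lifted description into the ground situations it stands for. The sketch encodes an LMHG fragment as a total count plus per-location maxima; this encoding and the function name `ground_situations` are our assumptions, chosen to match the constraints stated in the text, and do not describe the authors' data structure.

```python
from fractions import Fraction
from itertools import product
from typing import Dict, Tuple


def ground_situations(total: int, max_at: Dict[str, int]) -> Dict[Tuple[int, ...], Fraction]:
    """Expand a hypothetical lifted state into its ground situations.
    `total` indistinguishable entities are spread over the locations in
    `max_at` (in insertion order), each location holding at most its maximum.
    Every situation satisfying the constraints gets the same probability."""
    bounds = list(max_at.values())
    sits = [c for c in product(*(range(m + 1) for m in bounds)) if sum(c) == total]
    return {c: Fraction(1, len(sits)) for c in sits}


# The example from the text: 4 eccentrics, at most 2 at the shelf top
# and at most 4 at the shelf bottom.
dist = ground_situations(4, {"shelf_top": 2, "shelf_bottom": 4})
print(dist)
# {(0, 4): Fraction(1, 3), (1, 3): Fraction(1, 3), (2, 2): Fraction(1, 3)}
```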

Since we apply Bayesian filtering to LMHGs, the rewriting rules need to be adapted in order to implement the system dynamics correctly. More precisely, we need rules that describe how the distributions change when indistinguishable entities are installed at unknown locations. To stay with the example discussed above, suppose that an observation indicates that an eccentric is installed, but the exact location of the eccentric is unknown. According to the specified rule, the total number of installed eccentrics is increased by 1, and the maximum number of installed eccentrics at each reachable location is increased by 1. To maintain the integrity constraint, the count of installed eccentrics at any location cannot be larger than the total count of installed eccentrics. Applying this rule to an LMHG can be understood as applying a grounded version of the rule to each specific MHG contained in the LMHG.
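The lifted rule and its grounded counterpart can be compared in a few lines. The sketch uses a hypothetical count encoding (a total plus per-location maxima, with all locations assumed reachable) and checks that applying the lifted rule yields the same set of ground situations as installing one eccentric at any location in each ground situation. It compares supports only; the resulting probabilities are not modeled here.

```python
from itertools import product
from typing import Dict, FrozenSet, Tuple

Situation = Tuple[int, ...]  # eccentric counts per location, in sorted location order


def ground(total: int, max_at: Dict[str, int]) -> FrozenSet[Situation]:
    """Ground situations of a hypothetical lifted state: all count vectors that
    sum to `total` and stay within each location's maximum."""
    locs = sorted(max_at)
    ranges = (range(max_at[loc] + 1) for loc in locs)
    return frozenset(c for c in product(*ranges) if sum(c) == total)


def lifted_install(total: int, max_at: Dict[str, int]) -> Tuple[int, Dict[str, int]]:
    """Lifted rule: total + 1, every (reachable) location's maximum + 1,
    then cap each maximum at the new total (integrity constraint)."""
    new_total = total + 1
    return new_total, {loc: min(m + 1, new_total) for loc, m in max_at.items()}


def ground_install(situations: FrozenSet[Situation], n_locs: int) -> FrozenSet[Situation]:
    """Grounded rule: in each situation, add one eccentric at any single location."""
    return frozenset(
        tuple(c + (i == j) for j, c in enumerate(s))
        for s in situations
        for i in range(n_locs)
    )


max_at = {"shelf_top": 2, "shelf_bottom": 4}
new_total, new_max = lifted_install(4, max_at)
print(new_total, new_max)  # 5 {'shelf_top': 3, 'shelf_bottom': 5}
print(ground(new_total, new_max) == ground_install(ground(4, max_at), len(max_at)))  # True
```

In this example, both routes give four situations (0 to 3 eccentrics at the shelf top), while the lifted state is updated by touching only three numbers.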

In contrast to Lüdtke et al. (2018), we assume that similar parts never need to be distinguished explicitly; thus, we do not require a splitting operator that would handle identification.

Experimental Evaluation

In this section, we demonstrate the general applicability of our concept on a real dataset. The dataset was created by recording sensor data of subjects assembling a bookshelf, as introduced above. Subjects wore a body suit with 17 IMUs, and all objects (except screws) were equipped with IMUs as well. All experiments were recorded on video for offline annotation.

Overall, we performed 23 experiments with 12 different subjects. Each subject was supposed to perform one successful (correct) and one erroneous bookcase assembly. The erroneous runs were recorded because one of our future research goals is to detect assembly errors. We intended to perform 24 experiments to generate data for 12 successful and 12 erroneous assemblies. The data of one successful assembly was not usable, resulting in 11 included successful experiments. The 23 experiments provided recordings with 240.9 minutes of relevant data. Properties of this data are listed in Table 1.

Currently, we concentrate on an evaluation based on the annotations to demonstrate the basic applicability of our approach. Our filtering model was able to explain all 11 correct annotation sequences. During filtering, at most 2 lifted multi-hypergraphs were required to represent the marginal filtering distribution for all sequences and all timesteps, instead of approximately 5000 in the grounded version. This significant reduction in the number of required states can be explained by the use of LMHGs, which allow several states to be represented by a single representative. This initial experiment shows that our Bayesian filtering approach can be applied to track assembly processes and to efficiently represent the filtering distribution.

Discussion and Conclusion

In this paper, we presented a Bayesian filtering model with multi-hypergraph states and graph rewriting-based system dynamics. The main technical contributions are an efficient representation of distributions over multi-hypergraphs and a Bayesian filtering algorithm that works directly on that representation. Our approach was motivated by state estimation in assembly processes. However, the method can also be applied to other state estimation tasks in dynamic systems that consist of multiple entities and their relations, and whose dynamics are naturally described by rewriting rules, e.g., multi-agent systems or social networks.

To make our approach applicable to real-world domains, several extensions are required. First, we did not discuss the observation model $p(y_t \mid x_t)$ here, which relates sensor data to system states. Apart from simple parametric densities, generative neural networks (e.g., normalizing flows (Rezende and Mohamed 2015)) could be employed. Second, in real-world datasets, actions have distinct durations that need to be modeled appropriately, similar to methods used for hidden semi-Markov models. Third, our future work will focus on more general means of efficiently representing distributions over graphs, as well as on a formal analysis of the expressiveness and computational complexity of our approach.

Acknowledgments

This work was funded by the European Social Fund (ESF) and the Ministry of Education, Science and Culture of Mecklenburg-Western Pomerania (Germany) within the project NEISS – Neural Extraction of Information, Structure and Symmetry in Images under grant no. ESF/14-BM-A55-0009/19.

References

Aehnelt, M.; and Bader, S. 2015. Information Assistance for Smart Assembly Stations. In ICAART (2), 143–150.

Gogate, V.; and Domingos, P. 2011. Probabilistic theorem proving. In Proceedings of the Twenty-Seventh Conference on Uncertainty in Artificial Intelligence, 256–265.

Gupta, A.; Fox, D.; Curless, B.; and Cohen, M. 2012. DuploTrack: a real-time system for authoring and guiding duplo block assembly. In Proceedings of the 25th annual ACM symposium on User interface software and technology, 389–402.

Jones, J. D.; Cortesa, C.; Shelton, A.; Landau, B.; Khudanpur, S.; and Hager, G. D. 2021. Fine-grained activity recognition for assembly videos. IEEE Robotics and Automation Letters, 6(2): 3728–3735.

Lüdtke, S.; Schröder, M.; Bader, S.; Kersting, K.; and Kirste, T. 2018. Lifted Filtering via Exchangeable Decomposition. In Proceedings of the 27th International Joint Conference on Artificial Intelligence.

Lüdtke, S.; Yordanova, K.; and Kirste, T. 2019. Human activity and context recognition using lifted marginal filtering. In 2019 IEEE International Conference on Pervasive Computing and Communications Workshops (PerCom Workshops), 83–88. IEEE.

Reining, C.; Niemann, F.; Moya Rueda, F.; Fink, G. A.; and ten Hompel, M. 2019. Human activity recognition for production and logistics—a systematic literature review. Information, 10(8): 245.

Rezende, D.; and Mohamed, S. 2015. Variational inference with normalizing flows. In International conference on machine learning, 1530–1538. PMLR.

Särkkä, S. 2013. Bayesian filtering and smoothing. Cambridge University Press.

Tao, W.; Lai, Z.-H.; Leu, M. C.; and Yin, Z. 2018. Worker activity recognition in smart manufacturing using IMU and sEMG signals with convolutional neural networks. Procedia Manufacturing, 26: 1159–1166.

Wang, Y.; Ajaykumar, G.; and Huang, C.-M. 2020. See what I see: Enabling user-centric robotic assistance using first-person demonstrations. In Proceedings of the 2020 ACM/IEEE International Conference on Human-Robot Interaction, 639–648.

Wang, Y.; and Tian, D. 2016. A weighted assembly precedence graph for assembly sequence planning. The International Journal of Advanced Manufacturing Technology, 83(1-4): 99–115.