Virtual Humans in AR: Evaluation of Presentation Concepts in an Industrial Assistance Use Case


Embedding virtual humans in educational settings makes it possible to transfer the well-established concepts of learning by observing and imitating experts to extended reality scenarios. Whilst various presentation concepts of virtual humans for learning have been investigated in sports and rehabilitation, little is known about industrial use cases. In prior work on manual assembly, Lampen et al. [21] showed that three-dimensional (3D) registered virtual humans can provide assistance as effective as state-of-the-art HMD-based AR approaches. We extend this work with a comparative user study (N=30) that examines whether the implementation costs of assistive behavior features and 3D registration are justified. The results reveal that the basic concept of a 3D registered virtual human is limited and comparable to a two-dimensional screen aligned presentation. However, when additional assistive behaviors are incorporated, the 3D assistance concept improves and shows significant advantages in terms of cognitive savings and reduced errors. Thus, this presentation concept is valuable in situations where time is less crucial, e.g. in learning scenarios or during complex tasks.
Eva Lampen
EvoBus GmbH
Neu-Ulm, Germany
Jannes Lehwald
EvoBus GmbH
Neu-Ulm, Germany
Thies Pfeier
Faculty of Technology, University of
Applied Sciences Emden/Leer
Emden, Germany
Figure 1: Illustration of the evaluated presentation concepts of a virtual human in AR. Contrasted are (a) a 2D screen aligned presentation and (b) a 3D registered presentation; the field of view of the AR device is shown in blue.
Augmented Reality, Virtual Human, Expert-Based Learning
VRST '20, November 1–4, 2020, Virtual Event, Canada
Various methods have been developed and investigated to teach motor tasks in heterogeneous domains, for example in sports [ ], rehabilitation [ ] or industry [ ]. The effectiveness of such methods strongly depends on the specific use case, its pedagogical goal and the user [ ]. In learning scenarios where the user has no prior task-related knowledge, instructions provided by an expert in one-to-one settings are preferable and lead to better task performance [ ]. Recording or simulating an expert's motions and afterwards presenting them digitally to the trainee enables digital expert-based learning. Besides the presentation of recorded real-world videos [ ], virtual humans are gaining importance in educational settings, in particular due to a general shortage of experts [ ]. Whilst research on expert-based learning with virtual humans and different presentation concepts exists in other domains, there is limited empirical evidence on the usefulness of such concepts for manual assembly tasks that also takes implementation costs into account. This is surprising, as the complexity of manual assembly tasks is increasing, driven by the growing number of product variants [ ], and hence adopting assistance concepts that have been successfully employed in other domains seems advisable.
This paper addresses both the general applicability of a virtual human to an industrial assistance setting with realistic assembly tasks and the merit of two different presentation concepts.
Besides their use to increase the feeling of presence of other users in collaborative work situations [ ], virtual humans are increasingly used to enable digital motor learning. A wide range of presentation concepts for virtual humans in motor learning scenarios exists, and these concepts differ in their implementation costs and learning outcomes. The following gives an overview of developed and evaluated presentation concepts.
2.1 Two-Dimensional Presentation Concepts
Analogous to real-world videos, animated virtual humans can be presented in a two-dimensional (2D) screen aligned manner, which has been realized in various ways [ ]. A benefit of displaying animations instead of real-world videos is that the learning outcome can be enhanced by focusing on relevant information [ ] and by integrating assistive behavior [ ]. For this reason, in addition to pre-recorded dance videos, a 2D screen aligned skeleton representation was integrated into the YouMove system [ ] to enable feedback provision. Whilst YouMove used a skeleton rather than a realistic human visualization, findings suggest that in dynamic information presentations more realistic shapes increase acceptance [ ] and movement accuracy [ ]. Displaying the three-dimensional (3D) content in a 2D way lowers implementation costs. Therefore, a screen aligned presentation format is preferable as long as the virtual character does not need to interact with the environment [32].
2.2 Three-Dimensional Presentation Concepts
Extended reality (XR) technologies facilitate a realistic presentation of a virtual human by enabling a non-screen-aligned, 3D registered presentation. As a result, XR-based imitation learning is becoming increasingly realistic and is widely used in sports, despite higher implementation costs.
Chen et al. [ ] introduced Immertai, an immersive virtual reality (VR) training tool that enables remote motion training. The Tai-Chi trainer's motions are mapped to a virtual human, which the attendees observe and imitate. The results indicate advantages of 3D environments over the presentation of 2D material for learning scenarios in terms of learning time, motion similarity and user experience.
Apart from presenting a virtual human from an exocentric perspective, which induces cognitive load due to multiple stimuli and the effort needed to transfer the perceived exocentric motion to one's own body [ ], egocentric presentations are used. AR-Arm [ ] is an immersive augmented reality (AR) tool for training Tai-Chi motions of the upper limbs from a first-person perspective: the movements of the virtual arms are displayed egocentrically and imitated by the users, which leads to benefits in terms of body ownership compared to a 2D screen method. Moreover, the idea of motor learning from a first-person perspective was transferred to VR with the system Onebody [ ]. The results reveal advantages in terms of posture matching accuracy, user experience and completion time over 2D presentation techniques such as video and video conferencing, as well as over a third-person 3D view in VR.
Although presenting a 3D virtual human, or body parts of such a human, is gaining importance in sports scenarios, to our knowledge only Lampen et al. [ ] have so far proposed its adoption for motor learning in an industrial setting. The authors displayed motions of basic manual assembly tasks in AR, performed by a true-to-scale, 3D registered virtual human. The presentation of the virtual human, which was benchmarked against a paper-based method and a 3D registered product-related presentation, decreased cognitive load and improved performance parameters (i.e. errors, completion time).
The present work builds upon this body of related work. Whilst the general usefulness of a virtual human during basic manual assembly tasks has been shown [ ], two questions arise: whether a 2D screen aligned presentation with lower implementation costs evokes similar results, and whether integrating assistive behavior could enhance the concept. Therefore, the present work extends previous work in several ways:
- Evaluation with regard to relevant performance criteria for manual assembly use cases in a setup with realistic assembly tasks
- Evaluation of the merit of integrating assistive behavior in a multi-stationary assembly setting
- Comparison of a 3D registered with a less complex 2D screen aligned presentation concept in HMD-based AR
In the following, a set of three different presentation concepts (see Fig. 1) for an industrial assistance use case is described with regard to the related work. The general concepts of the evaluated virtual human assistances are presented, together with a brief explanation of their technical realization. For further information, see previous work [ ] and additional video material [ ].
The hardware setup consists of a head-mounted display (HMD, MS HoloLens I), which allows hands-free interaction with the environment [ ]. The motions of the expert were captured with an XSens system [27] and mapped onto a virtual human.
4.1 Presentation Concepts
Two aspects of the presentation concepts are contrasted: first, whether the virtual human is presented in 2D (in-view) or 3D (in-situ), and second, whether additional assistive behaviors are incorporated or not (see Tab. 1). Due to feasibility constraints, the latter aspect is only contrasted in-situ.
Table 1: Considered presentation concepts
Presentation concept   Dimensional view   Perspective view   Assistive behavior
VH_2D                  in-view            1st & 3rd person   -
VH_3D                  in-situ            1st & 3rd person   -
VH_3D+AB               in-situ            1st & 3rd person   [20]
4.1.1 Virtual Human 2D Screen Aligned. The 2D screen aligned concept (VH_2D) is realized as follows (see Fig. 1(a)): a virtual 2D screen is displayed in front of the user, always at the same distance and position in the user's field of view (FOV). It is thus carried along with any head movement, which has been shown to be preferable to registered 2D methods [ ]. During walking motions, the 2D view is rendered from a third-person perspective; during environmental interactions of the virtual human, the 2D view switches to a first-person perspective. This corresponds to stepping inside the virtual human as realized in the 3D registered presentation concepts and therefore increases comparability between the approaches. The visualization of the environmental interaction, together with the change of camera perspective, is triggered when the user enters the virtual human's personal space [ ] after walking tasks, following the basic chapter functionality presented by Lampen et al. [20].
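The head-locked placement of the virtual 2D screen can be sketched as follows. This is a minimal, engine-agnostic Python sketch and not the Unity implementation used in the study: it assumes a unit quaternion (w, x, y, z) for the head orientation, a +z forward axis, and a hypothetical viewing distance of 2.0 m, none of which are given in the paper.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Vec3:
    x: float
    y: float
    z: float

    def __add__(self, other: Vec3) -> Vec3:
        return Vec3(self.x + other.x, self.y + other.y, self.z + other.z)

    def scale(self, s: float) -> Vec3:
        return Vec3(self.x * s, self.y * s, self.z * s)


def cross(a: Vec3, b: Vec3) -> Vec3:
    return Vec3(a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x)


def rotate(q: tuple[float, float, float, float], v: Vec3) -> Vec3:
    """Rotate v by the unit quaternion q = (w, x, y, z)."""
    w, qx, qy, qz = q
    u = Vec3(qx, qy, qz)
    t = cross(u, v).scale(2.0)            # t = 2 (u x v)
    return v + t.scale(w) + cross(u, t)   # v' = v + w t + u x t


# Hypothetical distance between the eyes and the virtual 2D screen (metres).
SCREEN_DISTANCE_M = 2.0


def head_locked_screen_pose(head_pos: Vec3, head_rot: tuple[float, float, float, float],
                            distance: float = SCREEN_DISTANCE_M):
    """Return (position, rotation) of a screen kept at a fixed spot in the FOV.

    Called every frame, so the screen is carried along with any head movement
    and always faces the user (it simply inherits the head rotation).
    """
    forward = rotate(head_rot, Vec3(0.0, 0.0, 1.0))
    return head_pos + forward.scale(distance), head_rot
```

With an identity head rotation (1, 0, 0, 0) and the head at (0, 1.6, 0), the screen is placed at (0, 1.6, 2.0); after a 90° turn about the vertical axis it moves to the side accordingly, which is exactly the "carried along" behavior described above.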
4.1.2 Virtual Human 3D Registered. Regarding the dimensional view, a presentation concept with a 3D registered virtual human (VH_3D) is included (see Fig. 1(b)). Considering the related work on true-to-scale presentation and the increasing adoption of such concepts in movement learning scenarios, the implemented visualization comprises a real-size representation of a virtual human that can interact with spatially registered virtual objects. As in the VH_2D concept, the waiting behavior is integrated to trigger the subsequent motions.
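The waiting behavior shared by VH_2D and VH_3D can be read as a small state machine: after a walking segment, the virtual human waits until the user enters its personal space and then plays the next interaction segment, with VH_2D additionally switching to the first-person camera. The sketch below is hypothetical: the class and method names are ours, and the 1.2 m radius is an assumption borrowed from the commonly cited outer bound of Hall's personal distance, not a value reported in the paper.

```python
from __future__ import annotations

import math
from enum import Enum


class Perspective(Enum):
    THIRD_PERSON = "third"
    FIRST_PERSON = "first"


# Assumed personal-space radius in metres (not reported in the paper).
PERSONAL_SPACE_RADIUS_M = 1.2


class WaitingBehavior:
    """Hypothetical controller that gates interaction segments on user proximity."""

    def __init__(self, radius: float = PERSONAL_SPACE_RADIUS_M) -> None:
        self.radius = radius
        self.waiting = False
        self.perspective = Perspective.THIRD_PERSON  # walking is shown in 3rd person

    def on_walk_finished(self) -> None:
        # The virtual human has reached its target and now waits for the user.
        self.waiting = True

    def update(self, user_xz: tuple[float, float], human_xz: tuple[float, float]) -> bool:
        """Call once per frame; returns True when the next interaction should start."""
        if self.waiting and math.dist(user_xz, human_xz) <= self.radius:
            self.waiting = False
            self.perspective = Perspective.FIRST_PERSON  # "step inside" the virtual human
            return True
        return False

    def on_interaction_finished(self) -> None:
        # Back to the third-person view for the next walking segment.
        self.perspective = Perspective.THIRD_PERSON
```

In VH_3D the `perspective` field would simply be ignored, since there the user physically steps into the registered virtual human; only the proximity gate is shared.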
4.1.3 Virtual Human 3D Registered + Assistive Behavior. The evaluated presentation concept of a 3D registered virtual human with assistive behavior (VH_3D+AB) mainly follows the approach introduced in previous work [ ] (see Fig. 1(b)). Regarding the dimensional view, the presentation concept is similar to the aforementioned VH_3D concept. In addition, however, four assistive behavior features are integrated to prevent information loss due to the small FOVs of current AR HMDs: visibility control, progress control, attention control and feedback control.
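The paper names the four assistive behaviors but defers their implementation to previous work. As a hypothetical illustration of the decision that visibility and attention control have to make, the sketch below checks whether a target (e.g. the virtual human's hands) lies inside the HMD's field of view and otherwise returns a direction cue. It assumes the target is already expressed in head coordinates (x right, y up, z forward) and uses roughly 30° × 17.5° as the first-generation HoloLens FOV; both the approach and the numbers are assumptions.

```python
from __future__ import annotations

import math

# Approximate FOV of the first-generation HoloLens in degrees (assumption).
H_FOV_DEG = 30.0
V_FOV_DEG = 17.5


def angles_deg(target_head: tuple[float, float, float]) -> tuple[float, float]:
    """Horizontal and vertical angle of a head-space point (x right, y up, z forward)."""
    x, y, z = target_head
    yaw = math.degrees(math.atan2(x, z))
    pitch = math.degrees(math.atan2(y, math.hypot(x, z)))
    return yaw, pitch


def attention_cue(target_head: tuple[float, float, float],
                  h_fov: float = H_FOV_DEG, v_fov: float = V_FOV_DEG) -> str | None:
    """Return None if the target is visible, else a hypothetical direction cue."""
    yaw, pitch = angles_deg(target_head)
    if abs(yaw) <= h_fov / 2 and abs(pitch) <= v_fov / 2:
        return None
    # Cue along the axis where the target overshoots the FOV the most (normalised).
    if abs(yaw) / (h_fov / 2) >= abs(pitch) / (v_fov / 2):
        return "right" if yaw > 0 else "left"
    return "up" if pitch > 0 else "down"
```

A cue such as "right" could then drive an arrow or a repositioning of the virtual human; how the cited previous work actually reacts is not described in this paper.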
4.2 Technical Setup
All described concepts of a virtual human are implemented with the game engine Unity. In general, the technical framework presented in previous work [ ] is adopted. To integrate multiple concepts, each presentation concept is realized in a separate Unity scene, which makes each concept easy to apply on its own. The technical setup is integrated into a car door assembly environment of 6.0 m × 7.0 m with a ceiling height of 3.0 m. To gather environmental knowledge and thereby enable the behavioral features, the instruction device as well as three Microsoft Kinects are used as sensors. A virtual true-to-scale model of the environmental setup is registered to the real world using Vuforia 8.3.8.
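Registering the virtual model amounts to composing rigid transforms: once the tracking library reports the pose of a reference target in world space, every virtual object stored relative to that target is placed via T_world_object = T_world_target · T_target_object. The sketch below is a generic illustration with plain 4×4 homogeneous matrices and a yaw-only rotation; it does not use the Vuforia API, and the target pose in the usage note is a made-up example.

```python
from __future__ import annotations

import math


def pose(yaw_deg: float, tx: float, ty: float, tz: float) -> list[list[float]]:
    """Homogeneous 4x4 transform: rotation about the vertical (y) axis, then translation."""
    c, s = math.cos(math.radians(yaw_deg)), math.sin(math.radians(yaw_deg))
    return [
        [c, 0.0, s, tx],
        [0.0, 1.0, 0.0, ty],
        [-s, 0.0, c, tz],
        [0.0, 0.0, 0.0, 1.0],
    ]


def compose(a: list[list[float]], b: list[list[float]]) -> list[list[float]]:
    """Matrix product a @ b, i.e. apply b first, then a."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def position(m: list[list[float]]) -> tuple[float, float, float]:
    """Translation part of a homogeneous transform."""
    return m[0][3], m[1][3], m[2][3]
```

For example, if the target is detected at (1.0, 0.0, 2.0) with a 90° yaw and a virtual part sits 1 m in front of the target in target coordinates, `position(compose(pose(90, 1, 0, 2), pose(0, 0, 0, 1)))` places the part at approximately (2.0, 0.0, 2.0) in world space.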
Within a study, the eects of a virtual human-based assistance in
consideration of the dimensional view as well as assistive behav-
ior enrichment are investigated. Thereby, Furthermore, by taking
performance parameter into account, the open gap between pre-
sentation concepts of virtual humans and industrial assistance use
cases is addressed.
5.1 Experimental design
The aforementioned technical and environmental setup is used, with the described set of presentation concepts as the independent variable. To prevent learning effects, a between-subjects design was used. Three main tasks (door handle, door module and door panel assembly) derived from a real-world car door assembly station, each consisting of 15–22 sub-tasks (picking, carrying, plugging and screwing tasks), are adopted in the experiment [ ]. Moreover, to ensure equal task complexity across the different presentation concepts, the task sequence remained unchanged. To measure user performance, the absolute task completion time and the relative number of incorrect sub-tasks are recorded as dependent variables. A sub-task is considered incorrect if the spatial placement of the component does not match the defined target position, if the wrong component was assembled, or if the task was not completed at all. A main task counted as completed when the participant re-entered the start/end zone and confirmed completion without a further hint from the experimenter regarding incomplete sub-tasks. Furthermore, since a general goal of industrial assistance systems is simplified decision-making [ ], cognitive load was measured for each main task using the NASA-RTLX score [ ]. To gather insights into the subjective perception of the assessed concepts, the perceived user experience was assessed with the UEQ-S [28].
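The error criterion above combines three failure conditions into one relative measure. The following sketch is hypothetical: the record fields and the 1 cm placement tolerance are assumptions, since the paper does not state how closely a placement must match the target position.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class SubTaskRecord:
    """Hypothetical observation of one sub-task."""
    completed: bool
    component: str | None                      # component that was actually assembled
    expected_component: str
    placed_at: tuple[float, float, float] | None
    target_at: tuple[float, float, float]


# Assumed tolerance for "matches the defined target position" (metres).
PLACEMENT_TOLERANCE_M = 0.01


def is_incorrect(r: SubTaskRecord, tol: float = PLACEMENT_TOLERANCE_M) -> bool:
    if not r.completed or r.placed_at is None:
        return True                                    # not completed at all
    if r.component != r.expected_component:
        return True                                    # wrong component assembled
    return math.dist(r.placed_at, r.target_at) > tol   # wrong spatial placement


def incorrect_rate(records: list[SubTaskRecord]) -> float:
    """Relative number of incorrect sub-tasks within one main task (0..1)."""
    return sum(is_incorrect(r) for r in records) / len(records)
```

Multiplying `incorrect_rate` by 100 gives a percentage on the same scale as the "incorrect tasks" row reported in Table 2.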
5.2 Procedure
The participants were equipped with the HMD, and a brief training scenario, identical across the presentation concepts, was conducted as often as each participant felt necessary. During the main experimental phase, the participants were asked to perform the tasks shown by the virtual human, with the first priority of making no errors and the second priority of speed. After finishing the first main task, the NASA-RTLX questionnaire was handed out. This procedure was repeated for each of the three main tasks, which took approximately 30 minutes per person. At the end, the participants answered the UEQ-S questionnaire.
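Both questionnaires reduce to simple averages. In the raw TLX (RTLX) variant, the six NASA-TLX subscale ratings (0 to 100) are averaged without the pairwise weighting step, which matches the percentage scale reported for cognitive load. For the UEQ-S, the eight 7-point items are commonly rescaled to -3 to +3, items 1 to 4 form the pragmatic and items 5 to 8 the hedonic quality, and the overall score is the mean of all eight. The sketch assumes raw UEQ-S answers coded 1 to 7 with the positive pole at 7; the function names are illustrative.

```python
from __future__ import annotations

from statistics import mean

RTLX_SUBSCALES = ("mental", "physical", "temporal", "performance", "effort", "frustration")


def rtlx(ratings: dict[str, float]) -> float:
    """Raw TLX: unweighted mean of the six subscale ratings (each 0..100)."""
    missing = set(RTLX_SUBSCALES).difference(ratings)
    if missing:
        raise ValueError(f"missing subscales: {sorted(missing)}")
    return mean(ratings[s] for s in RTLX_SUBSCALES)


def ueq_s(raw: list[int]) -> dict[str, float]:
    """UEQ-S scale means from eight answers coded 1..7 (7 = positive pole)."""
    if len(raw) != 8 or not all(1 <= a <= 7 for a in raw):
        raise ValueError("expected eight answers in 1..7")
    values = [a - 4 for a in raw]          # rescale to -3..+3
    return {
        "pragmatic": mean(values[:4]),     # items 1-4
        "hedonic": mean(values[4:]),       # items 5-8
        "overall": mean(values),
    }
```

In the study, `rtlx` would be applied once per main task and participant, and `ueq_s` once per participant at the end of the session.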
5.3 Participants
A group of 30 volunteers was recruited for the between-subjects experiment; they received no extra reward. The participants in the VH_2D group (2 females, 8 males) were aged between 24 and 49 (σ=9.45), those in the VH_3D group (3 females, 7 males) between 24 and 59 (σ=10.18) and those in the VH_3D+AB group (2 females, 8 males) between 22 and 59 (σ=10.37). The assembly tasks as well as the technical setup were new to all participants.
Figure 2: Results of (a) incorrect tasks, (b) completion time, (c) cognitive load and (d) user experience.
Table 2: Summary statistics for the four evaluation criteria
Evaluation criterion   VH_2D μ    VH_2D σ   VH_3D μ    VH_3D σ   VH_3D+AB μ   VH_3D+AB σ   p       ω²
Incorrect tasks        35.50%     8.66%     46.94%     21.15%    17.32%       12.93%       0.001   0.34
Completion time        175.00 s   18.09 s   195.75 s   46.27 s   233.06 s     49.41 s      0.017   0.20
Cognitive load         59.17%     10.08%    43.39%     15.35%    27.83%       12.35%       0.001   0.46
User experience        -0.14      0.09      1.01       0.80      1.43         1.07         0.004   0.28
5.4 Analysis
Overall, 30 data sets for the four evaluation criteria are presented and statistically compared (see Tab. 2 and Fig. 2). Besides comparing the respective means (μ) and standard deviations (σ), Shapiro-Wilk tests and Levene's tests indicated no violations of normality or homogeneity of variance in the testing scenarios (p > 0.05). Consequently, one-way ANOVAs were conducted, followed by Tukey's HSD post-hoc tests where significant differences were identified. Effect sizes (ω²) were interpreted as small (0.01), medium (0.06) and large (0.14) [9].
The results show that the additional assistive behavior is especially beneficial for reducing errors (Tukey HSD: …: p=0.043) and cognitive load (Tukey HSD: …: p=0.001, …: p=0.04). This concept significantly outperforms the others on these parameters but underperforms with regard to completion time (Tukey HSD: …: p=0.014). Given the specific intention of the integrated features (i.e. preventing information loss), the higher completion time could be explained by a speed-accuracy trade-off. Thus, developing other features aimed at saving time (e.g. gamification elements such as visualizing progress [17]) could lead to different results.
Considering the dimensional view, the 3D view has significant positive effects on user experience (Tukey HSD: …: p=0.037) and on perceived cognitive load (Tukey HSD: …: p=0.037). The reduced cognitive load can be linked to known effects of product-related 3D registered visualization [ ]. The lower user experience scores for the 2D method, especially for the hedonic values, may be explained by the participants' familiarity with the presentation concepts: the in-view presentation of a 2D screen resembles real-world screens and is therefore familiar to the participants, whereas a virtual human has no comparable real-world counterpart.
Consequently, whether the extra costs of realizing a specific presentation concept for virtual human assistance in AR are worthwhile depends on the particular requirements of the use case. Whilst the 2D presentation concept, with its lower implementation costs, is beneficial for assisting with unknown but less challenging tasks with low error rates, the cognitive load results reveal the advantage of a 3D registered method for complex tasks in learning scenarios. Moreover, the evaluated 3D presentation concept with assistive behavior is particularly effective in situations where free attention capacity is needed and faultlessness matters more than time savings. In the industrial assistance use case, such free attention capacity could be shifted to executing motions ergonomically or to consolidating the specific tasks. The reported user experience results, together with the importance of motivational aspects in learning scenarios [ ], strengthen the case for a 3D registered implementation in such scenarios.
This paper presents an evaluation of different presentation concepts of a virtual human in HMD-based AR and provides insights into their applicability in industrial assistance use cases. While participants carried out manual assembly tasks derived from the real world, performance criteria (i.e. completion time, incorrect tasks) as well as perceived cognitive load and user experience were measured. Our results demonstrate significant advantages of a 3D registered virtual human enhanced with assistive behavior in terms of cognitive savings and reduced errors. However, these advantages only emerge when assistive behavior is incorporated: the comparison of a 3D registered with a 2D screen aligned presentation shows no significant differences in the performance criteria. These findings highlight the usefulness of a 3D registered virtual human with assistive behavior in industrial assistance use cases, primarily in scenarios that require a high amount of free attention capacity and in which time is less crucial, such as learning scenarios or complex tasks. In contrast, a 2D screen aligned presentation concept, with its lower implementation costs, is preferable for assisting with less challenging tasks with low error rates.
To further enhance the assistance approach of a virtual human in HMD-based AR, the authors will focus on presenting only the relevant parts of the virtual human, given the small FOV of current AR HMDs.
The authors acknowledge the financial support by the Federal Ministry of Education and Research of Germany (MOSIM project, grant no. 01IS18060A-H).
