
Predicting Mid-Air Interaction Movements and Fatigue Using Deep Reinforcement Learning

Authors:
Noshaba Cheema
Max-Planck Institute for
Informatics & DFKI
Saarbrücken, Germany
ncheema@mpi-inf.mpg.de
Laura A. Frey-Law
Carver College of Medicine,
University of Iowa Health Care
Iowa City, USA
laura-freylaw@uiowa.edu
Kourosh Naderi
Games Research Group,
Aalto University
Helsinki, Finland
kourosh.naderi@aalto.fi
Jaakko Lehtinen
Aalto University &
NVIDIA Research
Helsinki, Finland
jaakko.lehtinen@aalto.fi
Philipp Slusallek
Computer Graphics Lab,
Saarland University & DFKI
Saarbrücken, Germany
philipp.slusallek@dfki.de
Perttu Hämäläinen
Games Research Group,
Aalto University
Helsinki, Finland
perttu.hamalainen@aalto.fi
ABSTRACT
A common problem of mid-air interaction is excessive arm
fatigue, known as the “Gorilla arm” effect. To predict and pre-
vent such problems at a low cost, we investigate user testing
of mid-air interaction without real users, utilizing biomechani-
cally simulated AI agents trained using deep Reinforcement
Learning (RL). We implement this in a pointing task and four
experimental conditions, demonstrating that the simulated fa-
tigue data matches human fatigue data. We also compare two
effort models: 1) instantaneous joint torques commonly used
in computer animation and robotics, and 2) the recent Three
Compartment Controller (3CC-r) model from biomechanical
literature. 3CC-r yields movements that are both more efficient
and relaxed, whereas with instantaneous joint torques, the RL
agent can easily generate movements that are quickly tiring or
only reach the targets slowly and inaccurately. Our work
demonstrates that deep RL combined with the 3CC-r provides
a viable tool for predicting both interaction movements and
user experience in silico, without users.
Author Keywords
Computational Interaction; User Modeling; Biomechanical
Simulation; Reinforcement Learning.
CCS Concepts
• Human-centered computing → Human computer interaction (HCI); User models; User studies;
• Theory of computation → Reinforcement learning;
Permission to make digital or hard copies of part or all of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed
for profit or commercial advantage and that copies bear this notice and the full citation
on the first page. Copyrights for third-party components of this work must be honored.
For all other uses, contact the owner/author(s).
CHI’20, April 25–30, 2020, Honolulu, HI, USA
© 2020 Copyright held by the owner/author(s).
ACM ISBN 978-1-4503-6708-0/20/04.
DOI: https://doi.org/10.1145/3313831.3376701
INTRODUCTION
Interactive devices, such as touch screens or Virtual Reality
(VR) goggles, are becoming increasingly important in areas
like entertainment, medicine, or virtual design. These devices
enable a more intuitive and natural user experience with the
use of mid-air gestures as an interaction tool. As such, physical
ergonomics is an important design factor for mid-air interac-
tion. In particular, arm fatigue, also known as the “Gorilla arm
effect” [34, 38], is a common problem that negatively affects
user experience.
Designing interactive experiences is fundamentally difficult
because the design goals are typically defined in terms of
subjective qualities such as user enjoyment in games or low
perceived exertion or effort in gestural interaction. Thus, the
effects of design decisions are nearly impossible to predict
without testing with actual users or players. In practice, in-
teraction design is often an iterative process of trial-and-error
where design is gradually and expensively improved through
observing users interacting with prototypes.
A rising trend in design and human-computer interaction is
to utilize computational models of users to predict the user
experience [6, 19, 27, 54, 55, 70]. If this can be done with suf-
ficient accuracy, one can rapidly evaluate alternative solutions
to design problems in silico, without users, or at least preselect
the most likely solutions to be tested in real life. Furthermore,
if a computational model can evaluate a solution, a designer
can deploy optimization algorithms to automatically find and
propose high-value solutions.
Computational user models have been successfully applied
in, e.g., game playing [28, 70, 72] and typing [54]. However,
many complex interactions are still challenging to model, in
particular in the domain of embodied experiences such as
Virtual Reality (VR), which require modeling the user’s body
and biomechanics. Fortunately, new and powerful tools are
emerging: Recent advances in deep Reinforcement Learning
(RL) [63, 49, 29, 42] provide a generic approach to train
intelligent agents for any kind of simulated system such as a
video game or biomechanical simulation, provided that one
can define the agent’s goals or tasks as a reward function such
as a game score. For user modeling, this means that one needs
to make minimal assumptions about user behavior; instead,
the agents will explore and discover the behaviors of maximal
utility – i.e., cumulative rewards – following a computational
rationality model of behavior [26, 45]. Such AI agents can also
be extended with models of intrinsic motivation and emotion
[61, 50], which can allow prediction of the user experience and
behavior beyond simple task-driven behavior and associated
metrics like task success rate [27].
Contribution: We contribute the first user modeling experi-
ment that combines deep RL with a biomechanical arm simu-
lation model that allows both synthesizing mid-air interaction
movements and predicting the associated embodied user expe-
rience, with a focus on subjective fatigue. We test the approach
in a mid-air pointing task and four experimental conditions,
replicating the experiment design of [38] who tested the same
conditions on real humans and analyzed fatigue using motion
capture data. We demonstrate that our agent learns the pointing
movements needed for the tasks, doing away with the need to
capture motions from real humans, and our simulation-based
fatigue data provides a good fit with the human data of [38].
Compared to the optimization of mid-air pointing movements
of Montano Murillo et al. [51], we do not rely on predeter-
mined effort estimates for different spatial locations. Instead,
all our data simply emerges from the biomechanical simula-
tion model and rewarding the agent for both accomplishing
the task and avoiding fatigue/discomfort.
As an additional contribution, we compare two different
fatigue/effort models incorporated as reward function components:
1) instantaneous joint torques common in computer animation,
robotics [58], and standard deep RL movement control benchmark
tasks (MuJoCo) [66], and 2) the recent Three Compartment
Controller (3CC-r) model from biomechanical literature [47].
We demonstrate that the 3CC-r model yields movements that are
both more efficient and relaxed, whereas with instantaneous joint
torques, the RL agent can easily generate movements that are
quickly tiring or only reach the targets slowly and inaccurately.
As the 3CC-r model causes no significant increase in computational
complexity, we advocate that deep RL researchers also incorporate
it into their benchmark tasks to increase both the realism of
biomechanical effort modeling and the relaxedness of emerging
movements.¹

¹Source code is available at: https://github.com/noshaba/ArmFatigue
BACKGROUND AND RELATED WORK
Simulating User Behavior
The literature on user modeling features different kinds of
models. The most simple ones like Fitts’ law [21] allow pre-
dicting a quantity like pointing target acquisition time as a
function of design variables like target distance using sim-
ple mathematical expressions. However, in many cases such
mathematical models are not available, and one must instead
resort to simulations of how users perceive, things, and act
while completing tasks. This was first proposed by Card et
al. [10, 54] as early as in 1983. Their GOMS model was later
extended by ACT-R and other more sophisticated cognitive
architectures [65, 54].
A limitation of early user models like GOMS is their com-
plexity: Successful application of the model in a design task
requires the designer to provide a detailed breakdown of the
user’s goals and expected behavior. Early cognitive archi-
tecture development was also plagued by various cognitive
processes modeled in isolation, with difficulties of integrating
them to general solutions for, e.g., autonomous skill acqui-
sition [65]. However, this was prior to the recent deep neu-
ral network revolution; deep Reinforcement Learning (RL)
agents have now been demonstrated to learn a wide variety
of skills ranging from video game play [49] to controlling
the movement of biomechanically simulated human bodies
[42]. Although RL methods can be complex, they are simple
to apply², which makes them lucrative for user simulation
purposes.

²At least in principle; in practice, RL methods have their quirks and
applying them successfully can require quite a bit of experimentation.
See, e.g., [33].
Reinforcement Learning is an approach to discovering the
optimal actions for a Markov Decision Process (MDP) [63]. It
is assumed that at time t, the agent is in state s_t, takes action
a_t, and observes a reward r_t and a next state s_{t+1}. The agent is
optimizing utility, i.e., expected cumulative future rewards, in
line with the computational rationality view of human behavior
[26]. Thus, for user modeling, one only needs to define the
states, actions, and rewards. At least in some cases, the reward
function and other parameters can also be inferred from human
data [11, 41, 3].
Yang et al. [69] provide a comprehensive study of what Ma-
chine Learning (ML) can offer to Human Computer Inter-
action (HCI). Traditionally, RL user simulation in HCI has
been limited to simple MDPs with discrete, enumerable
states and actions, e.g., dialogue systems, menus and simple
keyboards [44, 12, 43]. The discrete states and actions make
the MDPs solvable with classic RL methods like Q-learning
[63]. However, recent deep RL methods like Proximal Policy
Optimization [62] and Soft Actor Critic [29] also work with
the high-dimensional states and actions required for intelligent
control of human biomechanical simulation [42]. In this paper,
we demonstrate that this makes deep RL a viable approach for
modeling embodied interaction and predicting user experience
outcomes such as fatigue.
A recent result that further motivates our work is that biome-
chanically realistic movement can be synthesized efficiently
through simplified skeletal simulation without muscle and ten-
don detail, as long as the actuation effort minimized by an RL
agent is computed with a higher degree of biomechanical real-
ism through a Machine Learning model that predicts muscle
activations from joint actuation torques [39]. Extending this
approach, we actuate with joint torques and increase biome-
chanical realism through a fatigue model incorporated as an
extra reward function component.
It should be noted that modern neural-network-based RL also
generalizes to Partially Observable Markov Decision Processes
(POMDPs), where the agent cannot access the full
environment or simulation state [32]. This is often the case in
user modeling that incorporates realistic perception models.
Quantifying Mid-Air Interaction Fatigue
Muscle fatigue is the failure to maintain the required or ex-
pected force [16]. Fatigue depends on a multitude of simul-
taneous physiological and neurological processes, making it
difficult to pinpoint a single mechanism responsible for the
loss of force [68, 22, 1, 4]. It is also task-related and can vary
across muscles and joints [68, 36, 17, 38, 23, 25], which par-
tially explains the challenging nature of representing muscle
fatigue analytically [68].
A widely used empirical model to estimate the effect of fatigue
on the task endurance time (ET) under static load conditions is
Rohmert's curve [60, 36]. Hincapié et al. [34] developed
Consumed Endurance (CE), a metric to quantify arm fatigue of
mid-air interactions, based on Rohmert's curve. Although
straightforward, the approach lacks the ability to generalize
to dynamic load conditions or recovery during rest periods
[68, 38], as it is based on Rohmert's curve, which is only
valid for static load conditions. Furthermore, CE is assumed
to be zero at exertion levels below 15%. This limits the use of
the model for evaluating mid-air interaction with low exertion
levels [38, 23].
Liu et al. [46] have proposed a motor unit (MU)-based fatigue
model which uses three muscle activation states: resting (M_R),
activated (M_A) and fatigued (M_F). The model is able to predict
fatigue at static load conditions but fails at submaximal or
dynamic conditions [68, 38]. Xia et al. [68] have proposed a
Three-Compartment Controller (3CC) model which improves
upon the model of Liu et al. for dynamic load conditions by
introducing a feedback controller term between the active
(M_A) and resting (M_R) muscle states. Frey-Law et al. [25]
validated the 3CC model in estimating ET under static load
conditions and obtained joint-specific parameters. Later, Jang
et al. [38] optimized the 3CC model for mid-air interaction
tasks. They further show that the 3CC model can be used to
estimate human perceived fatigue ratings based on the Borg
CR10 scale [8], a 10-point categorical rating system to quantify
perceived fatigue ratings of individuals. It uses verbal anchors
and numbers to map the magnitude of exertion to a scalar
invariance scale [8, 38] (Table 1). The scale has been used in a
variety of contexts, such as sports [18], ergonomics [7], or
medicine [53] for subjective evaluation of fatigue based on
physiological and psychological changes. While the findings of
Jang et al. [38] are based on kinematic data only, they still
require motion data captured from real humans.
Table 1. Borg CR10 scale with verbal commentary.

Score  Definition         Note
0      Nothing at all     No arm fatigue
0.5    Very, very weak    Just noticeable
1      Very weak          As taking a short walk
2      Weak               Light
3      Moderate           Somewhat but not hard to go on
4      Somewhat heavy
5      Heavy              Tiring, not terribly hard to go on
6
7      Very strong        Strenuous
8
9
10     Extremely strong   Extremely strenuous
More recently, Looft et al. [47] have published an improved
3CC-r model for intermittent tasks by introducing an additional
rest recovery multiplier r, and validated their results based on
perceived fatigue from participants for specific joints.
PRELIMINARIES: FATIGUE MODELING
We investigate two different fatigue models: 1) instantaneous
joint torques as a measure of instantaneous effort, and 2) the
recently developed Three-Compartment Controller (3CC-r)
model by Looft et al. [47].
Instantaneous Joint Torque Effort
Instantaneous joint torque is a simple measure used in computer
animation, robotics, and standard RL benchmark problems
[9, 2, 67, 56] to measure and minimize the instantaneous effort
of a given task that a simulated agent is performing. When
defining movement optimization objective functions, the torques
are usually squared so that the optimization avoids using
excessive strength.
Three-Compartment Controller (3CC-r) Model
An instantaneous effort model gives us a simple measure to
determine the difficulty of a given task. However, it is not very
biologically accurate, as a simple task can become difficult
when performed long enough. A cumulative effort function, such
as the 3CC-r model, thus gives a more accurate representation of
perceived fatigue.
Akin to Liu et al. [46], the 3CC-r model [47] assumes motor
units (MUs) to be in one of three possible states:
- active: MUs contributing to the task
- fatigued: fatigued MUs without activation
- resting: inactive MUs not required for the task
Figure 1. Three compartment controller model.

Fig. 1 shows the relationship between these states. M_A (%) is
the compartment of active MUs, M_F (%) the compartment of
fatigued MUs, and M_R (%) the compartment of resting MUs. Each
compartment is expressed as a percentage of the maximum
voluntary contraction (%MVC) [68, 38].
In addition to that, the compartment theory is combined with
control theory to define system behaviour which matches muscle
physiology [68], i.e. active MUs' force production should begin
to decay (fatigue) over time. This is expressed by the following
equations:

dM_R/dt = −C(t) + r_R · M_F    (1)
dM_A/dt = C(t) − F · M_A    (2)
dM_F/dt = F · M_A − r_R · M_F    (3)
where F and r_R are the model parameters defining the rate at
which the motor units fatigue and the rate at which they recover
and enter the resting state, respectively. In contrast to the
traditional 3CC model [68], the 3CC-r model introduces an
additional rest recovery factor r, which enhances recovery when
the required force, i.e. the target load (TL), is zero, to better
represent perceived fatigue estimates from user studies [47]:

r_R = r · R   if M_A ≥ TL
r_R = R       otherwise    (4)
The 3CC-r model [47] is equivalent to the 3CC model [68] when
r = 1. Based on a sensitivity analysis, r is set to 7.5. F is
set to 0.0146 and R to 0.0022, based on the 3CC-model
optimization for mid-air interactions by Jang et al. [38].
C(t) is the time-varying muscle activation-deactivation drive,
which can produce the target load TL (in percent) by controlling
the size of M_A and the availability of M_R. The following
equations describe C(t) mathematically:

C(t) = L_D · (TL − M_A)   if M_A < TL and M_R > (TL − M_A)
C(t) = L_D · M_R          if M_A < TL and M_R ≤ (TL − M_A)
C(t) = L_R · (TL − M_A)   if M_A ≥ TL    (5)

L_D is the muscle force development factor, and L_R is the
relaxation factor. Based on the analysis by [68], both are set
to 10.
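To make the dynamics concrete, the following is a minimal sketch of one Euler-integration step of the 3CC-r compartments, using the parameter values given above. The function and variable names are our own illustration and not taken from the released source code; the integration scheme and the condition in Eq. 4 follow our reading of the equations.

```python
# Minimal 3CC-r update sketch (assumed simple Euler integration, illustrative names).
F, R, r = 0.0146, 0.0022, 7.5   # fatigue rate, recovery rate, rest recovery multiplier
LD = LR = 10.0                  # force development / relaxation factors

def step_3ccr(MA, MR, MF, TL, dt):
    """Advance the active/resting/fatigued compartments (in %MVC) by dt seconds."""
    # Activation-deactivation drive C(t), Eq. (5)
    if MA < TL and MR > (TL - MA):
        C = LD * (TL - MA)
    elif MA < TL:
        C = LD * MR
    else:
        C = LR * (TL - MA)
    # Rest recovery rate r_R, Eq. (4): boosted recovery once the load is met / at rest
    rR = r * R if MA >= TL else R
    # Compartment dynamics, Eqs. (1)-(3)
    dMR = -C + rR * MF
    dMA = C - F * MA
    dMF = F * MA - rR * MF
    return MA + dt * dMA, MR + dt * dMR, MF + dt * dMF
```

Driving this update with a constant target load of 50 %MVC reproduces the qualitative behaviour of Fig. 2: M_A tracks the load for a while and then declines as M_F accumulates.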
SYSTEM
We implemented our system using the Unity game engine and its
ML-Agents Toolkit v0.8.2 [40] implementation of the Proximal
Policy Optimization (PPO) [62] RL algorithm. The following
details our effort model, the simulated pointing task, and the
RL problem formulation and training settings.
Fatigue Model
Instantaneous Fatigue Model
In this paper, we compare instantaneous torques to the more
advanced 3CC-r modeling. More specifically, we use instantaneous
torques normalized with respect to the maximum torque T_max:

Effort_I(T) = ‖T‖ / T_max    (6)
Cumulative Fatigue Model
Since the 3CC-r model is a relative, unit-less system [68], TL
can be described in a variety of ways, e.g. as the percentage of
maximum voluntary torques (MVT) or forces (MVF).
Previous studies [38, 25, 47] set TL as the ratio of the
magnitude of the torque T and the maximum voluntary torque T_max
at a joint: ‖T‖ / T_max · 100%. While this is a valid approach to
measure the load at a given joint, we found it to be more
accurate to model two 3CC-r models per degree-of-freedom (DOF),
one for the "positive" and one for the "negative" direction.
Each DOF roughly corresponds to a muscle group. We avoid
modeling at the level of individual muscles to reduce simulation
and RL training time.
The target load can then be expressed as a vector of torque
ratios for each direction:

TL(T) = [ T_1^+/T_max, T_1^-/T_max, T_2^+/T_max, T_2^-/T_max, T_3^+/T_max, T_3^-/T_max ]^T    (7)

where T_i^+/T_max is the ratio of the torque at axis i in the
"positive" direction, and T_i^-/T_max in the "negative" direction,
respectively. When T_i^+ ≥ 0, then T_i^- = 0, and vice versa.
Each value in TL(T) is used as input for a separate 3CC-r model.
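As an illustration, a target-load vector of this form could be assembled from a 3-axis shoulder torque vector roughly as sketched below; the function name and signature are ours, not from the paper's code.

```python
import numpy as np

def target_load_vector(torque_xyz, T_max):
    """Split a 3-axis joint torque into per-direction load ratios (Eq. 7)."""
    TL = []
    for T_i in torque_xyz:
        TL.append(max(T_i, 0.0) / T_max)    # "positive" direction load
        TL.append(max(-T_i, 0.0) / T_max)   # "negative" direction load
    return np.array(TL)                     # one ratio per 3CC-r model (6 entries)
```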
Figure 2. Behaviour of the 3CC-r model at a target load of 50% MVC (black
dotted line). The load cannot be held after around 70 s (yellow dashed
line).
For simulated agents we describe the cumulative effort Effort_C
as the difference between the actual target load TL(T) at the
joint and the desired muscle activation
M_A = [M_A,1^+, M_A,1^-, M_A,2^+, M_A,2^-, M_A,3^+, M_A,3^-]^T
given by the 3CC-r models:

Effort_C(T) = ‖ M_A / 100 − TL(T) ‖    (8)

The advantage of this over using M_F directly is that the
cumulative fatigue given by the 3CC model stagnates after some
time (Fig. 2, red line). Furthermore, it is not clear from M_F
alone when a target load can no longer be held. Using the
difference between the actual active motor units and the desired
load tells us when the load can still be held and when it
becomes a burden. This is shown in Fig. 2, where the active
motor units (yellow) start to decline after 70 s because the
target load (black) was not sustainable.
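Building on the illustrative helpers sketched above (step_3ccr and target_load_vector, both our own names), the cumulative shoulder effort of Eq. 8 could be tracked per simulation step roughly as follows:

```python
import numpy as np

class ShoulderFatigue:
    """Six 3CC-r models, one per DOF direction, driven by the target-load vector."""
    def __init__(self):
        self.MA = np.zeros(6)          # active MUs (%MVC)
        self.MR = np.full(6, 100.0)    # resting MUs (%MVC)
        self.MF = np.zeros(6)          # fatigued MUs (%MVC)

    def update(self, torque_xyz, T_max, dt):
        TL = target_load_vector(torque_xyz, T_max) * 100.0   # per-direction load in %MVC
        for i in range(6):
            self.MA[i], self.MR[i], self.MF[i] = step_3ccr(
                self.MA[i], self.MR[i], self.MF[i], TL[i], dt)
        # Effort_C (Eq. 8): mismatch between allowed activation and requested load
        return np.linalg.norm(self.MA / 100.0 - TL / 100.0)
```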
Simulated Upper Limb Model
Similar to [38], we assume that arm fatigue is mostly attributed
to shoulder-joint fatigue, due to the shoulder fatiguing faster
than the elbow or wrist during arm movement [23]. To make
our results comparable to theirs, we also only use the shoulder
joint torques for our effort model. We simulate our arm using a
4 DOF serial chain - 3 for the shoulder and 1 for the elbow (Fig.
3), where the limbs are modeled as rigid bodies connected by
joints [57]. To estimate the shoulder joint torques we use the
method in [34].
Figure 3. Forces acting on our biomechanical upper limb model. The
limbs are modeled as rigid bodies (red) connected with joints (green).
The degrees of freedom (DOF) of each joint are denoted by the arrows
at the respective joint.
Mid-Air Pointing Task
We model the mid-air pointing task after the ISO 9241-9 stan-
dard [37, 38, 64] based on Fitts’ law [21, 48]. The standard is
extensively used for evaluating 2D pointing devices, such as
mice, pens, and touch screens [64]. The task has participants
point at a circle of targets, with a given width and distance
to each other, in a given order. Akin to [38], we use seven
targets with a width of 10 cm and a distance of 30 cm to each
other, corresponding to an index of difficulty ID [21, 48] of
ID = log_2(30/10 + 1) = 2. Fig. 4 shows the target sequence.
Figure 4. ISO 9241-9 reciprocal pointing task with 7 targets. Agents
point to highlighted (red) target. Targets highlight in the pattern indi-
cated by the arrows.
Previous studies [38, 34, 35] have shown that the height and
distance from the arm's resting position affect the perceived
fatigue ratings of participants, making them fatigue more
rapidly the higher and further away the arm is held. Additionally,
rest periods are a decisive factor in how fatigued we become [38].
We investigate these two factors in our experiments and compare
them to human perceived fatigue ratings from a prior study [38].
Similar to Jang et al. [38], we switch between pointing and
resting periods. Like them, we use the following four different
rest periods: [5s, 10s, 15s, 20s].
RL Problem Formulation
To be able to apply RL methods, we need to define the MDP,
i.e. states, actions, and rewards. Additionally, the PPO method
we use requires the definition of a policy network for sampling
an action given the current state. The MDP transition model
s_{t+1} = f(s_t, a_t) is implemented by Unity's ML-Agents Toolkit:
after sampling an action, we use it to actuate the simulation
and query the next state.
State and Action Space
The state vector s of the agent comprises a concatenation of
the following features:
- limb positions with respect to the shoulder [9 values]
- linear velocities of upper and lower arm limbs [6 values]
- angular velocities of upper and lower arm limbs [6 values]
- direction vector from finger tip to target (not normalized) [3 values]
- target switch time [1 value]
- rest period Boolean [1 value]
We define our RL actions as actuation torques applied through
Unity's AddTorque() method at the center of mass of the
upper and lower arm limbs. The action vector a from the
policy specifies how much of the maximum voluntary torques
(MVT) to apply to the limbs. It comprises 4 values, denoting
the three actuation torque values for the upper arm and one for
the lower arm. The applied torques cannot exceed the shoulder
MVT and elbow MVT values, respectively. The shoulder MVT is
furthermore used as T_max in the effort calculation in Eqs. 6
and 7.
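For illustration, the resulting 26-dimensional observation could be assembled as follows; the field order and names are our guess and the released code may differ.

```python
import numpy as np

def build_observation(limb_pos, ua_vel, la_vel, ua_angvel, la_angvel,
                      finger_pos, target_pos, switch_time, is_resting):
    """Concatenate the state features described above into one vector."""
    return np.concatenate([
        limb_pos.ravel(),                 # 3 limb positions rel. to shoulder -> 9 values
        ua_vel, la_vel,                   # linear velocities                 -> 6 values
        ua_angvel, la_angvel,             # angular velocities                -> 6 values
        target_pos - finger_pos,          # unnormalized direction to target  -> 3 values
        [switch_time],                    # target switch time                -> 1 value
        [1.0 if is_resting else 0.0],     # rest period flag                  -> 1 value
    ])
```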
Network
A policy π is represented by a neural network which maps a given
state s to a distribution over actions π(a|s). The action
distribution is modeled as a Gaussian, where the state-dependent
mean µ(s) and the diagonal covariance matrix Σ are specified by
the network output:

π(a|s) = N(µ(s), Σ)    (9)

The input s to the network is processed by two fully-connected
layers with 128 hidden units each, using the Swish [59]
activation function. During training, the network adapts the
mean and covariance matrix such that the actions become less
noisy as the agent gradually starts to exploit instead of
explore.
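The architecture is small enough to restate compactly. The sketch below uses PyTorch purely for illustration (ML-Agents v0.8.2 itself implements the policy in TensorFlow); the class name, the state-independent log-std parameterization, and the dimensions are our assumptions.

```python
import torch
import torch.nn as nn

class GaussianPolicy(nn.Module):
    """Two 128-unit Swish layers producing the mean of a diagonal Gaussian policy."""
    def __init__(self, obs_dim=26, act_dim=4):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(obs_dim, 128), nn.SiLU(),   # SiLU is the Swish activation
            nn.Linear(128, 128), nn.SiLU(),
        )
        self.mu = nn.Linear(128, act_dim)
        self.log_std = nn.Parameter(torch.zeros(act_dim))  # learned diagonal covariance

    def forward(self, s):
        mean = self.mu(self.body(s))
        return torch.distributions.Normal(mean, self.log_std.exp())
```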
Reward
The reward ρ(t) at each step t consists of two terms that
encourage the character to point towards the target when there
is one, while using minimal effort:

ρ(t) = ( ω_P ρ_P(t) + ω_F ρ_F(t) ) · 0.01    (10)

ρ_P(t) and ρ_F(t) are defined as the pointing and fatigue
objectives, respectively, with ω_P and ω_F being their respective
weights. The pointing objective encourages the agent to point
towards the current target, while the fatigue objective
encourages it to use actions which require less effort than
others. We found that setting ω_P = 100 and ω_F = 0.01 resulted
in the desired behavior for various settings.
The pointing reward ρ_P(t) depends on the distance between the
target and the finger tip, and is defined as:

ρ_P(t) = 1                                                  if the target has been hit
ρ_P(t) = exp( −‖p_target(t) − p_finger(t)‖² / τ_P² )        otherwise    (11)

where p_target(t) is the target's position and p_finger(t) the
position of the finger tip at step t, respectively. τ_P is the
tolerance distance in meters at which the reward becomes
ρ_P(t) ≈ 0.3679 when the finger does not hit the target. We set
τ_P = 0.15, i.e. the reward starts heavily penalizing deviations
of more than 15 cm from the target. A minimal reward of approx.
0.001 is obtained at around 40 cm from the target with this
objective function.
The fatigue reward ρ_F(t) is defined using the respective Effort
function defined in Eq. 6 or 8:

ρ_F(t) = exp( −Effort(T(t))² / τ_F² )    (12)

τ_F is the tolerance, in percent, of how much the torque ratio is
allowed to deviate from the desired torque ratio while still
obtaining a reward of at least approx. 0.3679. In the case of the
instantaneous effort function, this means how much the shoulder
muscle is allowed to deviate from zero torque. In the case of
the cumulative effort function based on the 3CC-r model, it
means how much it is allowed to deviate from the allowed motor
unit activation M_A given by the 3CC-r model. The best tolerance
value τ_F for each effort model is determined in the results
section.

Note that in some movement optimization cases, squared cost
terms are used without the exponentiation [52, 2]. Our use of
exponentiation follows Peng et al. [56]; it converts minimized
costs to maximized rewards and also limits the reward to a
predefined range, which makes it easier to train PPO's value
function predictor network.
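A compact sketch of the full step reward, with the weights and tolerances reported in this paper (names and structure are illustrative):

```python
import numpy as np

W_P, W_F = 100.0, 0.01        # pointing / fatigue objective weights
TAU_P, TAU_F = 0.15, 0.12     # tolerances (tau_F = 0.12 is the best 3CC-r value found later)

def reward(target_hit, target_pos, finger_pos, effort):
    """rho(t) = (w_P * rho_P + w_F * rho_F) * 0.01, Eqs. 10-12."""
    if target_hit:
        rho_p = 1.0
    else:
        d = np.linalg.norm(target_pos - finger_pos)
        rho_p = np.exp(-d**2 / TAU_P**2)          # pointing objective, Eq. 11
    rho_f = np.exp(-effort**2 / TAU_F**2)         # fatigue objective, Eq. 12 (Effort from Eq. 6 or 8)
    return (W_P * rho_p + W_F * rho_f) * 0.01     # Eq. 10
```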
Training
The neural network training is done through RL, such that the
agent maximizes the cumulative episode rewards. No data was used
for this; RL proceeds through exploring random actions and
learning to repeat those that yield high rewards. The policy
training is done using the Proximal Policy Optimization (PPO)
algorithm [62]. Standard hyperparameters defined in [40] were
used, with the following adjustments: batch size = 2024, buffer
size = 20240, γ = 0.995, max steps = 1.0e6, normalization = True,
number of epochs = 3, time horizon = 1000, summary frequency = 3000.
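For reference, these adjustments roughly correspond to the following key-value pairs; in the actual toolkit they live in a YAML trainer configuration, and the exact key names should be checked against the ML-Agents release used (this dict is only an assumed mirror of the values above).

```python
# Assumed ML-Agents-style PPO trainer settings reflecting the adjustments listed above.
ppo_config = {
    "batch_size": 2024,
    "buffer_size": 20240,
    "gamma": 0.995,
    "max_steps": 1.0e6,
    "normalize": True,
    "num_epoch": 3,
    "time_horizon": 1000,
    "summary_freq": 3000,
}
```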
Initial State Distribution
PPO training proceeds in episodes, where at the start of each
episode the agents and the environment are reset to an initial
state s_0. Each episode is simulated to a fixed time horizon
with actions sampled from the policy, after which the agents and
the environment are reset again. In total, we use 1e6 time
steps, or 1000 episodes.
Many RL benchmark problems such as the MuJoCo locomotion
environments use a fixed initial state s_0 or add only small
random perturbations to it [9]. However, as demonstrated by
[56], a diverse enough initial state distribution can greatly
improve movement learning. To implement this, we sample multiple
settings of the pointing task randomly from a uniform
distribution. Table 2 shows an overview of these settings.
Target Height and Target Distance are in relation to the
shoulder position and the center point of the target circle.
Pointing Period describes the duration of the pointing period
before the user is supposed to rest, while Rest Period Index
defines which of the four rest periods [5s, 10s, 15s, 20s] to
use. The initial target index of the seven-target sequence is
chosen randomly.
Table 2. Settings of the pointing and resting task during training. Target
height and distance are in relation to the shoulder position.

Setting               Distribution   Lower Bound   Upper Bound
Target Height         Uniform        −40 cm        +20 cm
Target Distance       Uniform        +10 cm        +70 cm
Switch Time           Uniform        1 s           2 s
Pointing Period       Uniform        30 s          90 s
Rest Period Index     Uniform        1             4
Initial Target Index  Uniform        1             7
During training we use five agents to parallelize the simulation
over multiple CPU cores. Their attributes are also sampled
randomly for each episode. Hence, each RL training run uses
a large and diverse population of simulated users. However,
instead of sampling from a uniform distribution we sample
from a Gaussian distribution to more accurately represent male
and female strength properties. Table 3 shows which features we
sample from which kind of distribution. For the shoulder MVT, we
use average values found in the biomechanical literature [30].
Based on [24, 30], we estimate the average elbow MVT to be 1 Nm
to 10 Nm higher than the person's shoulder MVT. Hence, the elbow
MVT is sampled from a uniform distribution, where the lower
bound (LB) is the agent's shoulder MVT and the upper bound (UB)
is an additional +10 Nm. Once the body weight is sampled, the
actual arm weight is set according to average body weight
percentages for the upper limbs (Table 4) [14].
Table 3. Weight and Maximum Voluntary Torque (MVT) settings for
upper body limbs. The elbow MVT is in relation to the agent's shoulder
MVT.

Setting            Distribution   LB / µ          UB / σ
Body Weight (M)    Gaussian       80 kg           1.0
Body Weight (F)    Gaussian       65 kg           1.0
MVT Shoulder (M)   Gaussian       54.9 Nm [30]    1.0
MVT Shoulder (F)   Gaussian       29.2 Nm [30]    1.0
MVT Elbow          Uniform        +0 Nm           +10 Nm
Table 4. Length of arm limbs in cm and their corresponding weights in
percentage of the total body weight based on [14].

Limb        Length     Male    Female
Upper Arm   29.06 cm   2.71%   2.55%
Lower Arm   29.44 cm   1.62%   1.38%
Hand        21 cm      0.61%   0.56%
For the 3CC-r model we furthermore set a random initial fatigue
by sampling uniformly from an initial shoulder load (between
−T_max and T_max for each of the three DOF), which is then
applied for a randomly set time (between 0 s and 180 s) onto the
joint. With this we gather experiences for more long-term
fatigue effects.
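A sketch of how one episode's simulated user and task could be randomized according to Tables 2-4 (distribution parameters as listed in the tables; all names are illustrative and not from the released code):

```python
import numpy as np

rng = np.random.default_rng()

def sample_episode_settings(male=True):
    """Randomize task and agent parameters for one training episode (Tables 2-4)."""
    settings = {
        "target_height_cm":   rng.uniform(-40.0, 20.0),   # relative to shoulder
        "target_distance_cm": rng.uniform(10.0, 70.0),
        "switch_time_s":      rng.uniform(1.0, 2.0),
        "pointing_period_s":  rng.uniform(30.0, 90.0),
        "rest_period_s":      rng.choice([5.0, 10.0, 15.0, 20.0]),
        "initial_target":     int(rng.integers(1, 8)),     # one of the 7 targets
        "body_weight_kg":     rng.normal(80.0 if male else 65.0, 1.0),
        "mvt_shoulder_nm":    rng.normal(54.9 if male else 29.2, 1.0),
    }
    settings["mvt_elbow_nm"] = settings["mvt_shoulder_nm"] + rng.uniform(0.0, 10.0)
    return settings
```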
RESULTS
We evaluate our system in two experiments, with details pro-
vided below. The results are visualized in Figures 7 and 8.
First, we compare the movement synthesis quality of the two
effort models in terms of both accuracy of reaching the point-
ing targets and relaxedness of movement (Subsection “Re-
laxedness vs. Accuracy"). As we define relaxedness as the
arm using minimal effort when possible, there is an obvious
accuracy-relaxedness trade-off for the agent, which can be
adjusted with the reward function parameter τ_F from Eq. 12. To
quantify relaxedness we propose a relaxedness metric η based on
hand-crafted kinematic motion features that characterize typical
movement behaviour (Eqs. 13-18).
Second, using the model parameters that yield the best
accuracy-relaxedness trade-off, we compare our model’s fa-
tigue estimates based on synthesized movements with the av-
erage Borg CR10 ratings reported in [38], as well as their 3CC
fatigue estimates which are based on human movement data
obtained from a Kinect (Subsection “Comparison to Ground
Truth Human Data"). In contrast to [38], our fatigue estimates
are solely based on the synthesized movements of the simu-
lated users trained using deep RL, without the use of human
movement data.
Relaxedness vs. Accuracy
To determine the best fatigue tolerance value τ_F for each effort
model, we do a grid search on the trained networks and plot the
results in terms of accuracy and relaxedness of the motions. We
train models with τ_F values in the range 0.0 ≤ τ_F ≤ 0.5 in 0.02
steps, totalling 52 trained models (26 for each effort function)
for 25 different random seeds. Each model's synthesized
for 25 different random seeds. Each model’s synthesized
movements are evaluated on efficiency and relaxedness. In
this experiment we use 20 agents, which are akin to 20 people,
with different settings for each trained model. The parameters
of the agents are seeded to have the same settings for each
model. To make the models comparable to ground truth human
data reported in [38], we set the targets’ switch time to 1.3 s,
and the pointing period to 60 s. Jang et al. [38] determined
that if subjects performed four mid-air interaction periods in
a series they had a higher chance of learning and pre-fatigue
effects [20]. Hence, we designed our experiment similar to
their experiment with the following rest periods in between
in this order: [20s, 5s, 15s, 10s]. This setup is akin to group
1 in [38] (Fig. 6). In total the pointing task lasts roughly 5
min. We placed the targets at four different interaction zones
with five agents sharing the same zone: one at shoulder height
and having the arm bent, one at waist height and having the
arm bent, one at shoulder height and the arm straight, and
finally one at waist height and having the arm straight. The
four interaction zones for this experiment are shown in Fig. 5.
For the accuracy of a model we consider two separate mea-
sures: the median of the average distance over time from a
target, and the median average time it takes to reach a target.
If the agent could not reach the target, the time is set to the
switch time. The median is computed over the average value
for each agent over the 25 randomly seeded training sessions.
To determine the relaxedness of arm movements, we use the
following equation:
η = 4 − ( ~d_E^point + ~φ_E^rest + ~φ_A^rest + ~v_A^rest )    (13)

with d_E^point, φ_E^rest, φ_A^rest, v_A^rest ∈ [0, 1]. The higher
the value is, the more natural and relaxed the arm motion is
supposed to be. The tilde (~) denotes the median value over the
20 agents and the 25 training sessions.

d_E^point is the average distance over T time steps of the elbow
to the plane that is spanned between the shoulder and finger tip
and the direction of gravity, when the agent is pointing:

d_E^point = (1/T) Σ_t |⟨n, ShEl⟩| / ‖ShEl‖    (14)

with ShEl being the vector from the shoulder to the elbow
position. The distance is divided by the length of this vector
to obtain values between 0 and 1. n is the plane normal of the
shoulder-finger plane:

n = (ShFi / ‖ShFi‖) × (g / ‖g‖)    (15)

ShFi is the vector from the shoulder to the finger tip. The
higher d_E^point is, the further away the elbow is from this
plane. The idea is that during pointing movements the agent
should prefer to keep its elbow down, since this position is
perceived as less fatiguing than having the elbow point
sideways.

To measure the relaxedness of the arm during rest periods, we
add φ_E^rest, φ_A^rest, and v_A^rest to the relaxedness equation.

φ_E^rest is the average elbow angle between the lower and upper
arm. The idea is that during rest periods the agent is not
supposed to flex its arm much. It is calculated the following
way:

φ_E^rest = (1/T) Σ_t ( ⟨ElSh / ‖ElSh‖, ElHa / ‖ElHa‖⟩ + 1 ) · 0.5    (16)

ElSh is the vector from the elbow to the shoulder and ElHa from
the elbow to the hand, respectively. When the arm is straight
the dot product becomes −1, when the lower arm is perpendicular
to the upper arm the dot product is 0, and when the arm is
flexing it is close to 1. To keep φ_E^rest between 0 and 1 (0
being straight and 1 flexing), we add 1 to the dot product and
scale it with 0.5.

While this value lets us know when the arm is flexing during
rest periods, with it alone the relaxedness measure would
classify holding an arm in front of oneself as more relaxing
than flexing it. Thus, we also calculate the average angle
φ_A^rest between the arm and the direction of gravity and
incorporate it into our relaxedness equation:

φ_A^rest = (1/T) Σ_t ( ⟨COMSh / ‖COMSh‖, g / ‖g‖⟩ + 1 ) · 0.5    (17)

COMSh is the vector from the center of mass of the arm to the
shoulder.

While this gives us a good measure for policies where the arm
learns to be static during rest periods, it still sometimes
classifies moving arms as more relaxed than flexing but resting
arms during rest periods. This is because the average over all
frames is taken, and if the arm jerks around a lot, this average
could still correspond to an arm hanging down. To overcome this
issue, we also add the average velocity v_A^rest of the arm
during rest periods:

v_A^rest = (1/T) Σ_t (v_UA + v_LA) / v_A,max^rest    (18)

v_UA and v_LA are the upper arm and lower arm velocities obtained
from the Unity engine. v_A,max^rest is the maximum velocity value
over all parameter settings and agents.
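The per-frame ingredients of the relaxedness metric could be computed along the following lines. This is a sketch under the reconstruction above; the gravity direction, the vector inputs (3D numpy arrays) and the function names are our assumptions.

```python
import numpy as np

def unit(v):
    return v / np.linalg.norm(v)

def elbow_plane_distance(shoulder, elbow, finger, g=np.array([0., -1., 0.])):
    """d_E^point term (Eqs. 14-15): normalized elbow distance to the shoulder-finger-gravity plane."""
    n = np.cross(unit(finger - shoulder), unit(g))           # plane normal, Eq. 15
    sh_el = elbow - shoulder
    return abs(np.dot(n, sh_el)) / np.linalg.norm(sh_el)     # single-frame term of Eq. 14

def elbow_flexion(shoulder, elbow, hand):
    """phi_E^rest term (Eq. 16): 0 = straight arm, 1 = fully flexed."""
    return (np.dot(unit(shoulder - elbow), unit(hand - elbow)) + 1.0) * 0.5

def arm_elevation(arm_com, shoulder, g=np.array([0., -1., 0.])):
    """phi_A^rest term (Eq. 17): 0 = arm hanging down, 1 = arm raised against gravity."""
    return (np.dot(unit(shoulder - arm_com), unit(g)) + 1.0) * 0.5
```

Averaging these terms (plus the normalized arm velocity) over the relevant frames and taking medians over agents and seeds then yields η = 4 minus the sum of the four medians, as in Eq. 13.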
Figure 5. Four interaction zones used for determining the best model. 1) Target is shoulder height and arm is bent. 2) Target is waist height and arm is
bent. 3) Target is shoulder height and arm is straight. 4) Target is waist height and arm is straight.
Figure 6. Experiment protocol of relaxedness vs. accuracy measure (G1),
as well as during comparison against ground truth human data (G1, G2).
Fig. 7 shows the results of the 3CC-r and the instantaneous
effort model. Each point denotes a different τ_F value for a
model. The models are ordered based on their τ_F value (lowest
first). Models above a median time of 1.2 s learned to hang the
arm down, due to obtaining more reward from using as little
effort as possible compared to the reward obtained from
pointing. Models with relaxedness values below 3.4 resulted in
unnatural movements and jerky arm behaviour during rest periods.
Furthermore, the variance of the results in terms of accuracy
increases, as suggested by Fig. 7. The sweet spot in which
motions are relaxed, i.e. the arm hangs down during rest periods
and the elbow is kept down during pointing periods, but still
accurate, usually lies within relaxedness values between 3.5 and
3.9 (Fig. 7). The plots in Fig. 7 show how the 3CC-r models
consistently outperform the instantaneous fatigue model in terms
of speed, accuracy and relaxedness within this region, i.e.,
points near the bottom-right corners of the plots.

Based on the results in Fig. 7, we found that τ_F = 0.12 for the
3CC-r model yields the best results in terms of efficiency and
relaxedness. In the next section we will use this model to
compare against ground truth perceived fatigue ratings from
humans.
Figure 7. The evolution of the trade-off between pointing task performance
and movement relaxedness when sweeping the τ_F parameter in the range
[0, 0.5] in steps of 0.02, plotted using both instantaneous torque effort
(green) and the 3CC-r model effort (red). For each tested τ_F, the 20
agents are retrained and re-evaluated across 25 random seeds. The yellow
stars indicate the best combinations of both relaxedness and pointing task
performance. Overall, the 3CC-r yields better combinations of relaxedness
and pointing task performance across a range of τ_F.
Comparison to Ground Truth Human Data
We compare our best model with the ground truth human data
obtained from [38]: Jang et al. [38] use the 3CC model to
predict fatigue ratings based on torque measures estimated from
motion capture data from a Kinect [71] sensor. They compare
their results with the subjects' perceived exertion ratings
using the Borg CR10 [8] scale. In this section we compare our
3CC-r fatigue estimates, using the torque measures from
synthesized movements, to their 3CC fatigue estimates based on
human movement data. We further compare our 3CC-r fatigue
estimates to the subjects' average Borg CR10 ratings reported
in [38].
To replicate the four conditions in [38], we use the first two
interaction zones shown in Fig. 5, and two groups with dif-
ferent rest periods in between the four 60 s pointing periods:
[20s, 5s, 15s, 10s] for group 1 and [5s, 10s, 20s, 15s] for group
2 (Fig. 6). In the following we refer to group 1 and 2 as G1
and G2, and the high and low interaction zones as H and L.
Based on the findings of Jang et al. [38], the tasks based on
the higher interaction zones should be more fatiguing than the
ones based on the lower interaction zones. Furthermore, G1
should feel less fatigued compared to G2, due to a large rest
period in the initial period of the task.
Jang et al. [38] had 24 participants in their study, of which
two were female. Since no ground truth data was published on
each participant's weight and their corresponding maximum torque
estimate, we approximate their subjects in a virtual environment
using average torque and arm weight estimates found in the
literature. See Tables 3 and 4 for details. We likewise use 22
male and 2 female (virtual) subjects.
Similar to [38], we assume a linear relationship between the
fatigued motor units M_F obtained from the 3CC-r model and the
Borg CR10 scale, with ϕ(x) = 0.3 · x denoting the linear
mapping. To compute M_F for the Borg CR10 estimate, we use the
ratio of the magnitude of the torque, ‖T‖ / T_max · 100%, as the
target load for the model.
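In other words, the simulated Borg estimate is a scaled version of the fatigued-compartment value of a single 3CC-r model driven by the shoulder torque magnitude. A sketch, reusing the illustrative step_3ccr helper from the Preliminaries (names and integration are our assumptions):

```python
import numpy as np

def borg_cr10_estimate(torque_history, T_max, dt, phi=0.3):
    """Drive one 3CC-r model with |T|/T_max (in %MVC) and map M_F to Borg CR10."""
    MA, MR, MF = 0.0, 100.0, 0.0
    for T in torque_history:                      # sequence of 3-axis shoulder torques
        TL = np.linalg.norm(T) / T_max * 100.0    # magnitude-based target load (%MVC)
        MA, MR, MF = step_3ccr(MA, MR, MF, TL, dt)
    return phi * MF                               # linear mapping phi(x) = 0.3 * x
```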
An overview of our results is shown in Fig. 8. Our 3CC-r
estimates (black) on virtual data mostly follow the trend of the
3CC estimates (red) from [38] based on human data, as well as
the ground truth average Borg CR10 data (yellow). The average
root mean squared error (RMSE) between the 3CC estimates from
[38] and their average Borg CR10 ground truth data is 0.58,
while ours to the ground truth is 0.66. We find the largest
deviation of our model in experiment G2-H. We believe this may
be attributed to physiological and psychological factors that
play a role in an individual's perceived fatigue rating [38]. As
the minimum and maximum values for G2-H in Fig. 8 suggest, the
variance of inter-individual Borg ratings is high for this case.
However, despite using no ground truth human data for our
calculations, we achieve a similar accuracy to [38] on their
data by fitting just a single scaling parameter ϕ to minimize
the RMSE between our simulated fatigue and the average human
Borg CR10 data of [38].
Parameter Values
A strength of our approach is that we only fit a single scalar
parameter ϕ based on data. In summary, the following parameters
were adjusted for our model: First, the ω parameters of the
reward function (Eq. 10) were adjusted empirically until the
simulations started to result in effective pointing behaviors.
The τ_P tolerance for goal attainment (Eq. 11) was set to a
reasonable value that starts heavily penalizing deviations of
more than 15 cm from the target. The τ_F value was chosen to
provide a good combination of naturalness and efficiency of
movement (Fig. 7). The neural network weights were learned by
maximizing the cumulative RL reward. Finally, the scale
parameter ϕ = 0.3 was set to minimize the RMSE between our
simulated fatigue and the average human Borg CR10 data of [38].
DISCUSSION
We make two main contributions to the HCI and ML communities
with our work: 1) a cumulative fatigue model for Reinforcement
Learning of movement tasks, and 2) an in silico method for
virtual user testing.
Figure 8. Results of predicting the Borg CR10 rating. Green: Upper/lower bound of ground truth. Yellow: Average of ground truth. Red: Average
3CC estimate of ground truth computed using motion capture data [38]. Black: Our simulation-based average 3CC-r estimate. Our simulation model
yields similar modeling accuracy as [38], but does not require motion capture data.
Cumulative Fatigue for Reinforcement Learning
With our results we have shown that RL agents trained with a
cumulative fatigue model based on biomechanical literature,
instead of instantaneous joint torques, learned more efficient
and relaxed policies. This can be utilized for new optimization
procedures in computer animation and robotics. To our knowledge,
our model is the first one to use cumulative effort in such a
way.
Reliable In Silico Subjective Fatigue Estimate
Our results confirmed that ground truth human movement data is
not necessarily needed to obtain reliable fatigue estimates for
our pointing task. To our knowledge this is the first method to
achieve this. We believe that our model can be utilized for a
multitude of HCI applications where human data is not readily
available or expensive to record, and can open new pathways to
virtual user testing. The advantage of such in silico methods is
that many physical properties can be reliably modeled with
standard game and physics engines [40, 66, 13, 15], making the
prediction, in theory, more accurate than with non-invasive in
vivo methods. With this, new environments, e.g. the effects of
fatigue on the Moon or under high pressure, could easily be
explored.
Limitations and Future Work
While our model shows overall good results and performance,
there are a number of limitations. Similar to [38], our proposed
method is based on the assumption that perceived fatigue can
be directly deduced from biomechanical information. In real-
ity however, an individual’s perceived fatigue can be attributed
to a multitude of different factors [68, 38], e.g. through physi-
ological and psychological changes. Previous studies [24, 38,
34, 8] have shown that individuals may experience fatigue and
rest differently from one another. However, this could be
mitigated by extending the agents with models of intrinsic
motivation and emotion [50] in future work.
With our work we demonstrate the capability to predict fa-
tigue. However, predicting other variables may need more
complex models. Nevertheless, the basic deep RL frame-
work utilized in this paper should remain useful. For instance,
modeling the speed-accuracy trade-off of movements through
RL-controlled biomechanical simulation is an important topic
for future work; presently, there is no research replicating
classic Fitts’ law experiments in silico using biomechanical
forward models and neural network controllers trained without
reference movement data. We believe this is more due to the
lack of attention to the topic rather than limitations of deep
RL, and could possibly be tackled by incorporating aspects
such as signal-dependent noise [31] and delayed feedback [5]
to our model.
As an additional limitation, we only model shoulder fatigue.
However, shoulder fatigue is a central concern in mid-air
interaction, and modeling only the shoulder, similar to [38],
was essential to make our results comparable with theirs
(Fig. 8). Nevertheless, for future work we find it would be
interesting to model additional hand joints or even the full
body for more accurate fatigue estimates.
Another limitation of our work is that for the trained agent
to generalize, it must experience the full variance of environ-
ments and tasks during training. In our case, we varied the
pointing targets, but more variation may be needed for other
applications. Furthermore, the learning process itself is also
time-consuming and laborious, and needs to be performed
independently for each policy. While it takes around 2 hours
on an i7 processor to learn pointing, training can take days for
other, more complex tasks [56].
CONCLUSION
We presented a framework for evaluating subjective fatigue using
only virtual embodied AI agents. The agents were trained on a
pointing task using Reinforcement Learning. For
the training we compared two different effort models. First, us-
ing instantaneous joint torques; second, using a biomechanical
cumulative fatigue model. We showed that the model trained
with cumulative fatigue was able to learn more relaxed and
efficient movements. We believe this is the first work to use
cumulative fatigue in such a way. Finally, we used our trained
model to estimate fatigue ratings under various conditions
and compared the results with ground truth human data ob-
tained from a previous study [38]. Overall, our model showed
comparable results to ground truth Borg CR10 ratings and
3CC-estimates based on motion capture data, without using
any human movement data. To the best of our knowledge, this
is the first work to achieve this.
ACKNOWLEDGMENTS
This work was funded by the ITEA3 project MOSIM (grant
no. 01IS18060C), the Academy of Finland (grant no. 299358),
and an IMPRS-CS doctoral fellowship. The calculations pre-
sented in the paper were performed with the computational
resources provided by the Aalto Science-IT project.
REFERENCES
[1] Chris R Abbiss and Paul B Laursen. 2005. Models to
explain fatigue during prolonged endurance cycling.
Sports medicine 35, 10 (2005), 865–898.
[2] Mazen Al Borno, Martin De Lasa, and Aaron
Hertzmann. 2012. Trajectory optimization for full-body
movements with complex contacts. IEEE transactions
on visualization and computer graphics 19, 8 (2012),
1405–1414.
[3] Nikola Banovic, Tofi Buzali, Fanny Chevalier, Jennifer
Mankoff, and Anind K Dey. 2016. Modeling and
understanding human routine behavior. In Proceedings
of the 2016 CHI Conference on Human Factors in
Computing Systems. ACM, 248–260.
[4] Benjamin K Barry and Roger M Enoka. 2007. The
neurobiology of muscle fatigue: 15 years later.
Integrative and comparative biology 47, 4 (2007),
465–473.
[5] Dan Beamish, I Scott MacKenzie, and Jianhong Wu.
2006. Speed-accuracy trade-off in planned arm
movements with delayed feedback. Neural networks 19,
5 (2006), 582–599.
[6] Pradipta Biswas, Peter Robinson, and Patrick Langdon.
2012. Designing inclusive interfaces through user
modeling and simulation. International Journal of
Human-Computer Interaction 28, 1 (2012), 1–33.
[7] Gunnar Borg. 1990. Psychophysical scaling with
applications in physical work and the perception of
exertion. Scand J Work Environ Health 16, Suppl 1
(1990), 55–58.
[8] Gunnar A Borg. 1982. Psychophysical bases of
perceived exertion. Med sci sports exerc 14, 5 (1982),
377–381.
[9] Greg Brockman, Vicki Cheung, Ludwig Pettersson,
Jonas Schneider, John Schulman, Jie Tang, and
Wojciech Zaremba. 2016. Openai gym. arXiv preprint
arXiv:1606.01540 (2016).
[10] Stuart K Card, Allen Newell, and Thomas P Moran.
1983. The Psychology of Human-Computer Interaction.
(1983).
[11] Senthilkumar Chandramohan, Matthieu Geist, Fabrice
Lefevre, and Olivier Pietquin. 2011. User Simulation in
Dialogue Systems using Inverse Reinforcement
Learning. In Interspeech 2011. 1025–1028.
[12] Xiuli Chen, Gilles Bailly, Duncan P Brumby, Antti
Oulasvirta, and Andrew Howes. 2015. The emergence of
interactive behavior: A model of rational menu search.
In Proceedings of the 33rd annual ACM conference on
human factors in computing systems. ACM, 4217–4226.
[13] Erwin Coumans. 2015. Bullet Physics Simulation. In
ACM SIGGRAPH 2015 Courses (SIGGRAPH ’15).
ACM, New York, NY, USA, Article 7. DOI:
http://dx.doi.org/10.1145/2776880.2792704
[14] Paolo De Leva. 1996. Adjustments to
Zatsiorsky-Seluyanov’s segment inertia parameters.
Journal of biomechanics 29, 9 (1996), 1223–1230.
[15] Scott L Delp, Frank C Anderson, Allison S Arnold,
Peter Loan, Ayman Habib, Chand T John, Eran
Guendelman, and Darryl G Thelen. 2007. OpenSim:
open-source software to create and analyze dynamic
simulations of movement. IEEE transactions on
biomedical engineering 54, 11 (2007), 1940–1950.
[16] Richard HT Edwards. 1981. Human muscle function
and fatigue. In Ciba Found Symp, Vol. 82. Wiley Online
Library, 1–18.
[17] Roger M Enoka and Jacques Duchateau. 2008. Muscle
fatigue: what, why and how it influences muscle
function. The Journal of physiology 586, 1 (2008),
11–23.
[18] Roger Eston. 2012. Use of ratings of perceived exertion
in sports. International journal of sports physiology and
performance 7, 2 (2012), 175–182.
[19] Gerhard Fischer. 2001. User modeling in
human–computer interaction. User modeling and
user-adapted interaction 11, 1-2 (2001), 65–86.
[20] James Peter Fisher, Luke Carlson, James Steele, and
Dave Smith. 2014. The effects of pre-exhaustion,
exercise order, and rest intervals in a full-body resistance
training intervention. Applied Physiology, Nutrition, and
Metabolism 39, 11 (2014), 1265–1270.
[21] Paul M Fitts. 1954. The information capacity of the
human motor system in controlling the amplitude of
movement. Journal of experimental psychology 47, 6
(1954), 381.
[22] Robert H Fitts. 1994. Cellular mechanisms of muscle
fatigue. Physiological reviews 74, 1 (1994), 49–94.
[23] Laura A Frey Law and Keith G Avin. 2010. Endurance
time is joint-specific: a modelling and meta-analysis
investigation. Ergonomics 53, 1 (2010), 109–129.
[24] Laura A Frey-Law, Andrea Laake, Keith G Avin, Jesse
Heitsman, Tim Marler, and Karim Abdel-Malek. 2012a.
Knee and elbow 3D strength surfaces: peak
torque-angle-velocity relationships. Journal of applied
biomechanics 28, 6 (2012), 726–737.
[25] Laura A Frey-Law, John M Looft, and Jesse Heitsman.
2012b. A three-compartment muscle fatigue model
accurately predicts joint-specific maximum endurance
times for sustained isometric tasks. Journal of
biomechanics 45, 10 (2012), 1803–1808.
[26] Samuel J Gershman, Eric J Horvitz, and Joshua B
Tenenbaum. 2015. Computational rationality: A
converging paradigm for intelligence in brains, minds,
and machines. Science 349, 6245 (2015), 273–278.
[27]
Christian Guckelsberger, Christoph Salge, Jeremy Gow,
and Paul Cairns. 2017. Predicting player experience
without the player: An exploratory study. In
Proceedings of the Annual Symposium on
Computer-Human Interaction in Play. ACM, 305–315.
[28] Stefan Freyr Gudmundsson, Philipp Eisen, Erik
Poromaa, Alex Nodet, Sami Purmonen, Bartlomiej
Kozakowski, Richard Meurling, and Lele Cao. 2018.
Human-like playtesting with deep learning. In 2018
IEEE Conference on Computational Intelligence and
Games (CIG). IEEE, 1–8.
[29] Tuomas Haarnoja, Aurick Zhou, Kristian Hartikainen,
George Tucker, Sehoon Ha, Jie Tan, Vikash Kumar,
Henry Zhu, Abhishek Gupta, Pieter Abbeel, and others.
2018. Soft actor-critic algorithms and applications.
arXiv preprint arXiv:1812.05905 (2018).
[30]
Patricia A Hageman, Debra K Mason, Kelly W Rydlund,
and Scott A Humpal. 1989. Effects of position and speed
on eccentric and concentric isokinetic testing of the
shoulder rotators. Journal of Orthopaedic & Sports
Physical Therapy 11, 2 (1989), 64–69.
[31] Christopher M Harris and Daniel M Wolpert. 1998.
Signal-dependent noise determines motor planning.
Nature 394, 6695 (1998), 780.
[32] Matthew Hausknecht and Peter Stone. 2015. Deep
recurrent q-learning for partially observable mdps. In
2015 AAAI Fall Symposium Series.
[33]
Peter Henderson, Riashat Islam, Philip Bachman, Joelle
Pineau, Doina Precup, and David Meger. 2018. Deep
reinforcement learning that matters. In Thirty-Second
AAAI Conference on Artificial Intelligence.
[34] Juan David Hincapié-Ramos, Xiang Guo, Paymahn
Moghadasian, and Pourang Irani. 2014. Consumed
endurance: a metric to quantify arm fatigue of mid-air
interactions. In Proceedings of the SIGCHI Conference
on Human Factors in Computing Systems. ACM,
1063–1072.
Marina Hofmann, R Bürger, Ninja Frost, Julia
Karremann, Jule Keller-Bacher, Stefanie Kraft, Gerd
Bruder, and Frank Steinicke. 2013. Comparing 3d
interaction performance in comfortable and
uncomfortable regions. In Proceedings of the
GI-Workshop VR/AR. 3–14.
[36] Daniel Imbeau, Bruno Farbos, and others. 2006.
Percentile values for determining maximum endurance
times for static muscular work. International Journal of
Industrial Ergonomics 36, 2 (2006), 99–108.
[37] ISO ISO. 2000. 9241-9 Ergonomic requirements for
office work with visual display terminals (VDTs)-Part 9:
Requirements for non-keyboard input devices
(FDIS-Final Draft International Standard), 2000.
International Organization for Standardization (2000).
[38] Sujin Jang, Wolfgang Stuerzlinger, Satyajit Ambike, and Karthik Ramani. 2017. Modeling cumulative arm fatigue in mid-air interaction based on perceived exertion and kinetics of arm motion. In Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems. ACM, 3328–3339.
[39] Yifeng Jiang, Tom Van Wouwe, Friedl De Groote, and
C Karen Liu. 2019. Synthesis of Biologically Realistic
Human Motion Using Joint Torque Actuation. arXiv
preprint arXiv:1904.13041 (2019).
[40] Arthur Juliani, Vincent-Pierre Berges, Esh Vckay, Yuan Gao, Hunter Henry, Marwan Mattar, and Danny Lange. 2018. Unity: A general platform for intelligent agents. arXiv preprint arXiv:1809.02627 (2018).
[41] Antti Kangasrääsiö, Kumaripaba Athukorala, Andrew
Howes, Jukka Corander, Samuel Kaski, and Antti
Oulasvirta. 2017. Inferring cognitive models from data
using approximate Bayesian computation. In
Proceedings of the 2017 CHI conference on human
factors in computing systems. ACM, 1295–1306.
[42] Seunghwan Lee, Moonseok Park, Kyoungmin Lee, and
Jehee Lee. 2019. Scalable muscle-actuated human
simulation and control. ACM Transactions on Graphics
(TOG) 38, 4 (2019), 73.
[43] Katri Leino, Antti Oulasvirta, Mikko Kurimo, and others. 2019. RL-KLM: automating keystroke-level modeling with reinforcement learning. In IUI. 476–480.
[44] Esther Levin, Roberto Pieraccini, and Wieland Eckert.
1998. Using Markov decision process for learning
dialogue strategies. In Proceedings of the 1998 IEEE
International Conference on Acoustics, Speech and
Signal Processing, ICASSP’98 (Cat. No. 98CH36181),
Vol. 1. IEEE, 201–204.
[45] Richard L Lewis, Andrew Howes, and Satinder Singh.
2014. Computational rationality: Linking mechanism
and behavior through bounded utility maximization.
Topics in cognitive science 6, 2 (2014), 279–311.
[46] Jing Z Liu, Robert W Brown, and Guang H Yue. 2002.
A dynamical model of muscle activation, fatigue, and
recovery. Biophysical journal 82, 5 (2002), 2344–2359.
[47] John M Looft, Nicole Herkert, and Laura Frey-Law.
2018. Modification of a three-compartment muscle
fatigue model to predict peak torque decline during
intermittent tasks. Journal of biomechanics 77 (2018),
16–25.
[48] I Scott MacKenzie. 1992. Fitts’ law as a research and
design tool in human-computer interaction.
Human-computer interaction 7, 1 (1992), 91–139.
[49] Volodymyr Mnih, Koray Kavukcuoglu, David Silver,
Andrei A Rusu, Joel Veness, Marc G Bellemare, Alex
Graves, Martin Riedmiller, Andreas K Fidjeland, Georg
Ostrovski, and others. 2015. Human-level control
through deep reinforcement learning. Nature 518, 7540
(2015), 529.
[50] Thomas M Moerland, Joost Broekens, and Catholijn M
Jonker. 2018. Emotion in reinforcement learning agents
and robots: a survey. Machine Learning 107, 2 (2018),
443–480.
[51] Roberto A. Montano Murillo, Sriram Subramanian, and Diego Martinez Plasencia. 2017. Erg-O: Ergonomic Optimization of Immersive Virtual Environments. In Proceedings of the 30th Annual ACM Symposium on User Interface Software and Technology (UIST '17). ACM, New York, NY, USA, 759–771. DOI: http://dx.doi.org/10.1145/3126594.3126605
[52] Kourosh Naderi, Joose Rajamäki, and Perttu
Hämäläinen. 2017. Discovering and synthesizing
humanoid climbing movements. ACM Transactions on
Graphics (TOG) 36, 4 (2017), 43.
[53] Bruce J Noble. 1982. Clinical applications of perceived
exertion. Medicine and science in sports and exercise 14,
5 (1982), 406–411.
[54] Antti Oulasvirta. 2017. User interface design with
combinatorial optimization. Computer 50, 1 (2017),
40–47.
[55] Antti Oulasvirta, Xiaojun Bi, and Andrew Howes. 2018. Computational interaction. Oxford University Press.
[56] Xue Bin Peng, Pieter Abbeel, Sergey Levine, and
Michiel van de Panne. 2018. Deepmimic:
Example-guided deep reinforcement learning of
physics-based character skills. ACM Transactions on
Graphics (TOG) 37, 4 (2018), 143.
[57] Rositsa Raikova. 1992. A general approach for
modelling and mathematical investigation of the human
upper limb. Journal of biomechanics 25, 8 (1992),
857–867.
[58] Joose Rajamäki and Perttu Hämäläinen. 2017.
Augmenting sampling based controllers with machine
learning. In Proceedings of the ACM
SIGGRAPH/Eurographics Symposium on Computer
Animation. ACM, 11.
[59] Prajit Ramachandran, Barret Zoph, and Quoc V Le.
2017. Searching for activation functions. arXiv preprint
arXiv:1710.05941 (2017).
[60] Walter Rohmert. 1960. Ermittlung von Erholungspausen für statische Arbeit des Menschen [Determination of rest breaks for static human work]. European Journal of Applied Physiology and Occupational Physiology 18, 2 (1960), 123–164.
[61] Shaghayegh Roohi, Jari Takatalo, Christian
Guckelsberger, and Perttu Hämäläinen. 2018. Review of
intrinsic motivation in simulation-based game testing. In
Proceedings of the 2018 CHI Conference on Human
Factors in Computing Systems. ACM, 347.
[62] John Schulman, Filip Wolski, Prafulla Dhariwal, Alec
Radford, and Oleg Klimov. 2017. Proximal policy
optimization algorithms. arXiv preprint
arXiv:1707.06347 (2017).
[63] Richard S Sutton and Andrew G Barto. 2018.
Reinforcement learning: An introduction. MIT press.
[64] Robert J Teather and Wolfgang Stuerzlinger. 2011.
Pointing at 3D targets in a stereo head-tracked virtual
environment. In 2011 IEEE Symposium on 3D User
Interfaces (3DUI). IEEE, 87–94.
[65] Kristinn Thórisson and Helgi Helgasson. 2012.
Cognitive architectures and autonomy: A comparative
review. Journal of Artificial General Intelligence 3, 2
(2012), 1–30.
[66] Emanuel Todorov, Tom Erez, and Yuval Tassa. 2012.
Mujoco: A physics engine for model-based control. In
2012 IEEE/RSJ International Conference on Intelligent
Robots and Systems. IEEE, 5026–5033.
[67] Jack M Wang, Samuel R Hamner, Scott L Delp, and
Vladlen Koltun. 2012. Optimizing locomotion
controllers using biologically-based actuators and
objectives. ACM Transactions on Graphics (TOG) 31, 4 (2012).
[68] Ting Xia and Laura A Frey Law. 2008. A theoretical
approach for modeling peripheral muscle fatigue and
recovery. Journal of biomechanics 41, 14 (2008),
3046–3052.
[69] Qian Yang, Nikola Banovic, and John Zimmerman.
2018. Mapping machine learning advances from hci
research to reveal starting places for design innovation.
In Proceedings of the 2018 CHI Conference on Human
Factors in Computing Systems. ACM, 130.
[70] Georgios N Yannakakis, Pieter Spronck, Daniele
Loiacono, and Elisabeth André. 2013. Player modeling.
Schloss Dagstuhl-Leibniz-Zentrum für Informatik.
[71] Zhengyou Zhang. 2012. Microsoft kinect sensor and its
effect. IEEE multimedia 19, 2 (2012), 4–10.
[72] Alexander Zook, Brent Harrison, and Mark O Riedl.
2015. Monte-carlo tree search for simulation-based
strategy analysis. In Proceedings of the 10th Conference
on the Foundations of Digital Games.