A POMDP Framework for Modelling Human Interaction with Assistive Robots

Tarek Taha, Jaime Valls Miró and Gamini Dissanayake
Abstract— This paper presents a framework for modelling the interaction between a human operator and a robotic device that enables the robot to collaborate with the human to jointly accomplish tasks. States of the system are captured in a model based on a partially observable Markov decision process (POMDP). States representing the human operator are motivated by behaviours from the psychology of the human action cycle. The hierarchical nature of these states allows the exploitation of data structures based on algebraic decision diagrams (ADDs) to efficiently solve the resulting POMDP. The proposed framework is illustrated using two examples from assistive robotics: a robotic wheelchair and an intelligent walking device. Experimental results from trials conducted in an office environment with the wheelchair are used to demonstrate the proposed technique.
I. BACKGROUND
Human-robot interaction (HRI) is a branch of robotics that focuses on modelling, implementing and evaluating the collaboration between robotic systems and human partners, to produce “human helper” systems that are practical, efficient, accepted and enjoyable. HRI has evolved rapidly from systems with only limited tele-operation capabilities, sometimes with simple video feedback. The increasing number of application domains in which robots can and will be deployed in the future, and the presence of humans in many of these domains, are the main motivations driving further developments in HRI.
Modelling human behaviour is a challenging and complex task, and it is perhaps unreasonable to expect artificial systems to model humans in great detail using the probabilistic models currently available. However, a better understanding of the underlying interaction, and of the psychological states of humans during that interaction, can be obtained by examining studies in related fields such as psychology. These can then be used to motivate the development of models that are adequate and tractable enough to realise the objective of building intelligent devices that can effectively collaborate with humans. The remainder of this section provides a background to the human action cycle and the literature on the use of POMDPs in modelling human-robot interaction, followed by a brief outline of the organisation of this paper.
This work is supported by the Australian Research Council (ARC)
through its Centre of Excellence programme, and by the New South Wales
State Government. The ARC Centre of Excellence for Autonomous Systems
(CAS) is a partnership between the University of Technology Sydney, the
University of Sydney and the University of New South Wales.
All authors are with the ARC Centre of Excellence
for Autonomous Systems (CAS), Faculty of Engineering,
University of Technology Sydney (UTS), NSW 2007, Australia
{t.taha,j.vallsmiro,g.dissanayake}@cas.edu.au
Fig. 1. The Seven Stages of Action [1].
A. Human Action Cycle
The “Human Action Cycle” described by Norman in “The Design of Everyday Things” [1] captures the process humans go through to plan, execute and evaluate activities or tasks. The “Stages of Execution and Evaluation” capture human behaviour in this context: the human usually starts by defining what needs to be achieved, then plans a sequence of intended actions to be performed. During plan execution, the consequences of the actions taken are examined and evaluated. The evaluation stage starts with the human’s perception of the world, which is first interpreted according to expectations and then compared and evaluated with respect to both the intentions and the goals. The seven stages of action that capture these ideas are depicted in Figure 1. Norman used this concept to develop guidelines on how to engineer a device, how to evaluate its performance, and how to design an intuitive interaction/interface layer.
The relationship between intention and planning also needs to be examined to further exploit the work of Norman in HRI. Bratman, in his books “Faces of Intention” [2] and “Intention, Plans, and Practical Reason” [3], develops and explains the theory of intention. According to Bratman, intentions are treated as elements of partial plans of action. These plans are considered a core component of practical reasoning, and can be thought of as roles needed to support the structure of human activities over time. Bratman presents the impact of these ideas on a wide range of issues, including the relationship between intention and intentional action, and the distinction between intended and expected effects of what one intends. Bratman also elaborates on the commitment involved in intending, and explores its implications for the fundamental understanding of temptation and self-control, shared intention and shared cooperative activity, and moral responsibility. The selection of the states representing human status, and the hierarchical relationships between these states presented in this paper, are motivated by the work of Norman and Bratman described above.
B. POMDPs in HRI
POMDPs have been especially effective in health care
and assistance applications. For instance in [4], a real-time
system to assist persons with dementia during handwashing
was presented. The assistance was given in the form of
verbal and visual prompts, or through the call for a hu-
man caregiver’s help. The system used only video inputs,
and combined a Bayesian sequential estimation framework
to track hands and towels. The system was successful in
estimating user states, such as awareness, responsiveness and
overall dementia level.
This paper exploits the conditional independence between
some of the states that represent the human behaviour,
making it possible to use techniques based on algebraic de-
cision diagrams (ADDs) to simplify and solve the POMDPs
generated. ADDs [5] are a generalisation of binary decision diagrams (BDDs) in which the terminal nodes may take values from an arbitrary finite domain rather than only {0, 1}.
The remainder of the paper is organised as follows. The
proposed strategy for modelling the human interaction layer
is presented in section II. Section III illustrates how this
framework can be used to efficiently solve human robot
interaction associated with two assistive robotic applications.
This is followed by results from experiments with a robotic
wheelchair in section IV, and the conclusions are presented
in section V.
II. A HUMAN AWARE POMDP MODEL FOR HRI
It is proposed to describe the interaction between a human and a robot in the following way. The human starts the interaction with a robotic agent by indicating a task that the robotic agent is capable of performing. The human then plans a sequence of intentions (these can be voice commands, hand gestures, joystick indications, touch screen inputs, etc.) that refer to direct actions the robot should perform to achieve the task, or provides a global goal encapsulating a sequence of actions (for example, issuing a command such as “get me a cup of water”). During the action execution, the human evaluates the interaction and reveals feedback in the form of satisfaction with the ongoing interaction, thus allowing the robot to enhance or re-evaluate the actions taken to better comply with the human’s needs. Moreover, the robot evaluates the situation through access to the human’s status (such as the competence level of the user) in order to adapt its behaviour and preserve the natural social aspects of this collaboration. Conceptually, it is proposed to extend the classical two-gulf model of the seven stages of action in the literature [1] by incorporating a decision making block
based on a POMDP, as depicted in Figure 2, to capture the communication layer that facilitates the flow and interpretation of information between the human and the robot. This information flow can then be translated into an action plan executed by the robot, and supervised/guided by the human.

Fig. 2. An extension to the seven stages of action theory to include a third gulf (Interaction gulf) acting as a communication bridge between the human and the robot.
It is proposed that the variables “Tasks”, “Intentions”, “Satisfaction” and “Status” be used to model the interaction within the POMDP. Subsection II-A discusses these interaction variables in detail.
A. HRI Variables
As discussed, variables such as the user’s satisfaction and intention are important for accurately recognising the user’s needs and measuring the success of the interaction. The proposed model contains four HRI-relevant state variables, described below. The states can be factored into a set of variables that represent the natural layers of the HRI structure, making it possible to simplify the resulting POMDP. It is important to note that the state space can also include other variables relevant to the application at hand, as will be discussed in section III.
1) Intention, In: In much of the previous work on HRI, intention and plan recognition are used as if they represent the same thing. However, in practice, intentions and plans should be treated separately for a better understanding of the user’s intention. From a philosophical point of view [6], [7], intention is the attitude that directs future long-term planning. In this work, intentions are used to direct an intermediate action, and a set of these actions determines a task. Therefore, intention represents a variable that helps in selecting an immediate action to achieve a long-term task or plan.
2) Satisfaction, Sa: This variable represents the user’s satisfaction with the outcome of the collaboration with the robot, indicating the level of success of the interaction. The combination of the intention variable and the satisfaction variable provides a powerful mix that can adapt and perfect the interaction. The satisfaction variable can also be thought of as a switch that, when modelled properly, enables the human interacting with the robot to dictate a change of plan or stop an action without having to physically reset the system or restart the algorithm. It can also act as a reward signal that rewards the robot for executing the correct action, and penalises it when the wrong action has been executed.
3) Status, St: This variable defines the status of the user, for example the awareness level, the load/stress level, the response level and the competence level. Identifying this variable is essential in determining the level of confidence in the user’s contribution to the planning process and/or the amount of assistance the human requires from the robotic system. This variable is very useful when modelling assistive robotic applications, where one cannot always rely on the human to give an informative and clear indication of his/her desires. For instance, people with dementia may not be capable of recognising or remembering their requirements or current state, and wheelchair drivers with a hand tremor are usually unable to give a noise-free joystick input to indicate their destination. Status could be observed using external sensors, predicted based on prior domain knowledge, or manually specified as an input to the system.
4) Tasks, Ta: This variable represents the set of missions to be executed during the collaboration. This could, for example, be a navigation task, a walking assistance task, a human following task, assistance with door opening, or even calendar notifications reminding the user to take medicine or watch his/her favourite TV show. The task variable can also be used as a switching parameter that facilitates the modelling of a multi-task sequential decision making process.
B. Modelling and Solving the POMDP
Typically, three steps are necessary to produce a solution
for a problem formulated within the POMDP framework.
The first step involves specifying the model. In the proposed
POMDP structure, this includes:
• Determining the set of tasks the system can perform
• Determining the sets of intentions, status values and satisfaction values
• Defining the observations and the kinds of sensors that should be used to obtain them
• Determining the actions that the system can execute
• Defining the reward function
• Specifying the conditional dependencies between the variables
The second step is to learn the transition and observation
functions from a set of training data. The third step involves
solving the resulting POMDP model to obtain the optimal
policy, which translates observations to the most rewarding
actions.
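As an illustration of the first step, the sketch below declares the factored model elements for the wheelchair example of section III. This is a minimal, hypothetical encoding, not the authors' implementation; the class and field names are illustrative assumptions, while the variable values are copied from Table I.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class FactoredPOMDPSpec:
    """Step one of the modelling process: factored state variables,
    actions, observations and the conditional dependency structure."""
    state_vars: Dict[str, List[str]]    # variable name -> discrete values
    actions: List[str]                  # actions the system can execute
    observations: Dict[str, List[str]]  # observation name -> discrete values
    parents: Dict[str, List[str]]       # post-action variable -> its parents

# Values taken from Table I (wheelchair example, section III).
wheelchair_spec = FactoredPOMDPSpec(
    state_vars={
        "Task": ["Navigation"],
        "Intention": ["Right", "Left", "Up", "Down", "Nothing"],
        "Satisfaction": ["Satisfied", "Unsatisfied"],
        "Status": ["Competent", "Struggling", "Reliant"],
    },
    actions=["North", "South", "East", "West", "Stop"],
    observations={"Joystick": ["Up", "Down", "Right", "Left", "Nothing"]},
    # Dependencies mirror the factored transition of section II-B:
    # St' <- St;  In' <- St', In, Ta;  Ta' <- In', In, Ta;  Sa' <- Ta', Ta, Sa
    parents={
        "Status'": ["Status"],
        "Intention'": ["Status'", "Intention", "Task"],
        "Task'": ["Intention'", "Intention", "Task"],
        "Satisfaction'": ["Task'", "Task", "Satisfaction"],
    },
)
```

The transition and observation functions themselves are then learned from training data (step two) and handed to an off-line solver (step three).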
Conditional independence between some of the state vari-
ables makes it possible to develop a concise ADD repre-
sentation [5] of the resulting POMDP.

Fig. 3. Progression from (a) the classical POMDP structure, to (b) a 2-slice DBN with the interaction layer/gulf.

Before building the
ADD, the temporal effect of actions on variables should be represented properly. Actions usually have a direct effect on variables under certain conditions, and these effects implicitly determine the state transition. A DBN (Dynamic Bayesian Network) can be used to represent the effect of each action a ∈ A on the variables probabilistically. If two sets of variables X = {X_1, ..., X_n} and X' = {X'_1, ..., X'_n} are used to represent the pre- and post-action state of the system, a directed arc from a variable X to X' indicates the probability of variable X' given its parent X after the execution of action a. Typically, a CPT (conditional probability table) is needed for each post-action variable X'_i to specify the probability of that variable after action a given its parents. However, ADDs can be used instead to represent this structure by exploiting the regularities in the CPTs [8]. The reward function can also be represented similarly. Once the model is defined as a collection of ADDs, the dynamics of the system in factored form become:
T(s, a, s') = T(⟨x_1, ..., x_n⟩, a, ⟨x'_1, ..., x'_n⟩)

O(a, s', o) = O(a, ⟨x'_1, ..., x'_n⟩, o)

P^{a,o} = T(⟨x_1, ..., x_n⟩, a, ⟨x'_1, ..., x'_n⟩) · O(a, ⟨x'_1, ..., x'_n⟩, o)
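To make the regularity argument concrete, the fragment below contrasts a fully enumerated CPT with an ADD-style encoding of the same factor. The parent values and probabilities are invented purely for illustration; a real ADD package would additionally share isomorphic subgraphs across factors.

```python
from itertools import product

TASK_OUTCOMES = ["CorrectNode", "WrongNode"]   # illustrative values
SATISFACTION = ["Satisfied", "Unsatisfied"]

def cpt_full_table():
    """Fully enumerated CPT for Pr(Sa' = Satisfied | Ta', Ta, Sa):
    one entry per joint parent assignment (8 rows here)."""
    table = {}
    for ta_next, ta, sa in product(TASK_OUTCOMES, TASK_OUTCOMES, SATISFACTION):
        # Regularity: the value depends only on Ta', not on Ta or Sa.
        table[(ta_next, ta, sa)] = 0.9 if ta_next == "CorrectNode" else 0.1
    return table

def cpt_as_add(ta_next, ta, sa):
    """ADD-style encoding: test only the parent that matters and share the
    terminal values, collapsing the 8-row table to 2 leaves."""
    return 0.9 if ta_next == "CorrectNode" else 0.1

# The compact encoding agrees with the full table on every assignment.
assert all(cpt_as_add(*k) == v for k, v in cpt_full_table().items())
```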
The generic interaction model described above assumes that the status of a user determines her/his ability to provide an intention, while intentions define the task at hand, and the task is evaluated through the satisfaction. The tasks, intentions and satisfaction can be observed by external non-intrusive sensors (cameras, temperature sensors, etc.) or by intrusive sensors (joysticks, buttons, touch screens, etc.). This model is illustrated in Fig. 3, where the state space is factored into a set of four variables: status (St), intention (In), task (Ta) and satisfaction (Sa). The arcs in this figure reflect the following concepts. Intention (In), task (Ta) and satisfaction (Sa) are not directly observable, but are inferred from observations. Changes in satisfaction (Sa) can be caused by the task (Ta) independently of the user’s direct status/ability (St) or intention (In). The task (Ta) is determined from the user’s intention (In), which is in turn affected by the user’s ability to give that indication (St). Arc connections across slices show that the task (Ta) at time t−1 can affect the intention (In), task (Ta) and satisfaction (Sa) in time slice t. Reward is given for intentions (In) that lead to the correct task execution (Ta) and result in user satisfaction (Sa). This model structure acts as a guideline and is not strict; it can be adapted to different applications if needed. The variable sets, however, are assumed constant.

Fig. 4. The robotic wheelchair platform used in the experiments.
With this factored representation, the transition function of the model in Fig. 3 can be represented as:

Pr(S'|S, A) = Pr(St', In', Ta', Sa' | St, In, Ta, Sa)
            = Pr(St'|St, A) · Pr(In'|St', In, Ta, A) · Pr(Ta'|In', In, Ta, A) · Pr(Sa'|Ta', Ta, Sa, A)
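In code, this factorisation means the joint transition probability is simply the product of four small conditional factors, one per state variable. The sketch below is a generic illustration; the factor functions are hypothetical stand-ins for the learned CPTs/ADDs.

```python
def pr_next_state(next_s, s, a, p_st, p_in, p_ta, p_sa):
    """Pr(St',In',Ta',Sa' | St,In,Ta,Sa, A) under the factorisation above.

    next_s and s are (St, In, Ta, Sa) tuples; p_st, p_in, p_ta and p_sa
    are the learned conditional factors (hypothetical stand-ins here)."""
    st1, in1, ta1, sa1 = next_s
    st0, in0, ta0, sa0 = s
    return (p_st(st1, st0, a)
            * p_in(in1, st1, in0, ta0, a)
            * p_ta(ta1, in1, in0, ta0, a)
            * p_sa(sa1, ta1, ta0, sa0, a))
```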
Once the POMDP model variables are specified and the
transition links between them are defined, the model states
are translated into the appropriate ADD representation using
the format developed by Hoey [8]. The model is then solved
offline using the symbolic Perseus method [9] to obtain
an optimal policy. When operating on-line, observations are
used to update the state beliefs which are then used with the
optimal policy to generate the most rewarding action.
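The on-line stage therefore reduces to a standard discrete belief update followed by a policy lookup. The sketch below is generic: it assumes T, O and the alpha-vectors come from the off-line stage described above, and it does not reproduce the symbolic Perseus solver itself.

```python
def belief_update(belief, action, obs, states, T, O):
    """b'(s') ∝ O(a, s', o) · Σ_s T(s, a, s') · b(s)."""
    new_belief = {
        s1: O(action, s1, obs) * sum(T(s, action, s1) * belief[s] for s in states)
        for s1 in states
    }
    norm = sum(new_belief.values())
    return {s: p / norm for s, p in new_belief.items()} if norm > 0 else belief

def select_action(belief, alpha_vectors):
    """Greedy policy lookup over (alpha_vector, action) pairs, as produced
    by point-based solvers such as Perseus [9]."""
    def value(pair):
        alpha, _ = pair
        return sum(alpha[s] * b for s, b in belief.items())
    return max(alpha_vectors, key=value)[1]
```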
III. SELECTED APPLICATIONS
Two robotic systems have been selected to demonstrate the viability of modelling HRI with the proposed POMDP framework.
A. Instrumented Wheelchair
The robotic wheelchair used for this example is illustrated
in Fig. 4. The aim of this application is to ensure that
the interaction appears transparent to the user, demanding
a minimal input from her/him to automatically perform the
required fine motion control to take the user to an intended
final destination. This is to be achieved by observing the
joystick input to determine the user’s immediate intended
action, and combining this with knowledge of the current location to predict the final destination.

Fig. 5. Two-slice DBN where (Ta) is the task variable set, (Sa) is the user’s satisfaction, (In) is the user’s intention, (St) is the user’s status, (Joy) is the joystick observation, and (Loc) is the location.

TABLE I
WHEELCHAIR MODEL VARIABLES

State Variable    Values
Intention         Right, Left, Up, Down, Nothing
Satisfaction      Satisfied, Unsatisfied
Status            Competent, Struggling, Reliant
Task              Navigation
Joystick          Up, Down, Right, Left, Nothing
Actions           North, South, East, West, Stop

At any time, the user is allowed to express his/her dissatisfaction with the interaction by giving a joystick input that differs from the direction of motion, triggering an automatic action correction followed by a new action selection.
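A minimal sketch of this dissatisfaction cue is given below, under the assumption that each joystick deflection maps naturally onto one navigation action (an assumption of this illustration, not something stated in the paper):

```python
# Assumed correspondence between joystick deflections and actions.
AGREES_WITH = {"Up": "North", "Down": "South", "Right": "East", "Left": "West"}

def satisfaction_observation(joystick, current_action):
    """Read the joystick as an implicit satisfaction signal."""
    if joystick == "Nothing":
        return "Satisfied"      # no input between nodes implies consent
    if AGREES_WITH.get(joystick) == current_action:
        return "Satisfied"      # input agrees with the direction of motion
    return "Unsatisfied"        # disagreement triggers re-evaluation
```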
The proposed framework results in the 2-slice DBN shown in Fig. 5. The user input is detected through the joystick observation (Joy). The intention (In) and the location (Loc) are then used to determine the user’s destination in the task set. As described above, the joystick observation is also used to indicate user dissatisfaction.
The set of variables defining the POMDP model, listed in Table I, is described below.
• Task (Ta): The task variable here represents the navigation assistance task, described using two state variable sets: location L = {S_0, S_1, ..., S_49}, representing all nodes in the topological map, and destination D = {d_0, d_1, d_2, d_3, d_4, d_5}, representing the list of all possible destination nodes. The combination of these two variables determines the path the user needs to follow to reach the destination.
• Intention (In): The intention here is the indication of the immediate action the user wants the wheelchair to perform to get from one node to the next. It can be In = {Up, Down, Right, Left, Nothing} and is observed using a joystick.
• Satisfaction (Sa): The satisfaction set consists of only two values, Sa = {Satisfied, Unsatisfied}, representing satisfaction or dissatisfaction with the current interaction. The absence/presence of a joystick input between topological nodes indicates satisfaction/dissatisfaction.
• Status (St): The status defines how capable the user is of driving the wheelchair. This input helps to define the confidence to be placed upon the user’s joystick input; a competent user, for instance, is capable of providing a stable joystick input. The set of status values is defined by St = {Competent, Struggling, Reliant}. This can be obtained by asking the user to perform some calibration tests that measure her/his ability to give the correct joystick input. In this scenario, the joystick signal could be analysed to observe the status.
• Actions (A): The actions that the system can perform; in this case, the navigation actions defined by the set A = {North, South, East, West, Stop}.
• Observations (O): The observation set consists of the location and the intention. The location is obtained from the localisation system, while the intention is observed using the signal from the joystick when the wheelchair is sufficiently close to a node.
• Transition (T): This represents the dynamics of the system. In this example, the transition encodes information about the topological map states and the preferred routes of the user. Data collected during a training period can be processed to automatically generate T.
• Rewards (R): The reward function needs to be constructed so as to penalise the system with a negative reward for actions that result in a dissatisfied user, and to reward the system for driving the user correctly to destinations through the preferred route (see the sketch below).
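One possible shape for such a reward function is sketched here; the numeric magnitudes are illustrative assumptions, since the paper does not report the values used.

```python
def reward(satisfaction, node, destination, on_preferred_route):
    """Penalise rejected actions, reward correct progress (illustrative values)."""
    r = 0.0
    if satisfaction == "Unsatisfied":
        r -= 10.0        # user rejected the ongoing action
    if node == destination:
        r += 20.0        # reached the intended destination
    elif on_preferred_route:
        r += 1.0         # small shaping term for the preferred route
    return r
```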
B. Intelligent Walker
This example models a therapeutic training task that allows patients to train by themselves with minimal supervision. The patient starts from a sitting position at a certain location and then uses the walker to stand up. Once
in a standing position, the patient then moves around in the designated area before returning to the start location to sit down. Fig. 6 shows the instrumented walker developed for this purpose. It is equipped with two infra-red proximity sensors to determine whether the user is standing or sitting; two strain gauges mounted on the handles to measure the stress the user exerts on the handles, which is used to determine the user’s intended direction; and a laser sensor mounted on the front of the walker, used for localisation and obstacle avoidance. The state diagram for the walker in a therapeutic training task is depicted in Fig. 7, and the model variables for the walker platform are summarised in Table II.

Fig. 6. The instrumented walker platform.

Fig. 7. State diagram of the walker transitions between tasks (Start, Stand, Move Around, Sit).

TABLE II
WALKER MODEL VARIABLES

State Variable    Values
Intention         Right, Left, Up, Down, Nothing, Stand, Sit
Satisfaction      Engaged, Cautious, Frustrated
Status            Competent, Struggling, Reliant, Distracted
Task              Stand-up, Sit-down, Move-around
Strain gauges     High, Medium, None, Negative
Infra-red         Close, Far
Actions           North, South, East, West, Nothing, Lock Motors
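The walker’s observations in Table II are discretisations of raw sensor readings. A minimal sketch of that mapping is given below; the thresholds and units are illustrative assumptions, not calibration values from the paper.

```python
def strain_gauge_obs(force):
    """Discretise a handle strain-gauge reading (arbitrary units)."""
    if force < 0:
        return "Negative"   # user pulling up rather than pushing down
    if force == 0:
        return "None"
    return "High" if force > 50 else "Medium"

def infra_red_obs(distance_m):
    """Discretise the seat proximity reading: sitting vs standing."""
    return "Close" if distance_m < 0.5 else "Far"
```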
IV. EXPERIMENTAL RESULTS
The proposed framework was experimentally evaluated using the robotic wheelchair described in [10] and illustrated in Fig. 4. The transition model was learned using simulated training data and was then solved to obtain an optimal policy; this policy was then tested in a navigation experiment. The learned model embeds knowledge about the most visited destinations and the preferred routes, so the wheelchair system can predict the most probable destination the user is trying to reach at any node. If the user gives no joystick input at a node, the system selects the most probable action from the policy based on the current belief and initiates the navigation. If the user is not happy with the direction of travel, she/he can give a joystick signal different from the direction of travel to indicate dissatisfaction with the ongoing action; the system will then re-evaluate the belief and select a new action.
In this navigation experiment the user was attempting to navigate from topological node 30 to destination d3, representing node 26 on the map. The user starts by giving a down joystick input; the belief is updated and the wheelchair starts moving towards node 29. The user gives no input at this stage, so the wheelchair continues by selecting the next best action, which in this case drives the wheelchair to node 31. The user realises that the wheelchair is not going where she/he desired, and expresses dissatisfaction by giving a joystick input during this motion. This input triggers a belief update that produces a new action directing the wheelchair to node 28. At this node the user again gives no input, so the wheelchair once more selects the best action according to the policy that has already been generated and starts navigating to node 40. The user quickly realises that the action being performed is incorrect and expresses his/her dissatisfaction by giving a joystick input indicating the correct direction of travel. The wheelchair changes direction and moves to node 27, then correctly chooses the remaining actions to navigate to destination d3.

Fig. 8. Path traversed during a navigation experiment. Arrows represent joystick observations; the dashed line represents the traversed path.

Fig. 9. Change of destination probabilities over time. Sxx represents the topological states that the wheelchair had to pass through.
Fig. 8 shows the path traversed during this experiment,
where arrows indicate a joystick input. Fig. 9 illustrates the
progression of the destination belief at the topological nodes
visited while navigating, while Fig. 10 shows the values of
the state variables at each node during the navigation (only
the most probable state is shown).
Fig. 10. State of the system variables at each topological state. Sat refers to the satisfaction variable; locations denoted by Sxx represent the topological states.

V. SUMMARY

This paper presented a natural interaction gulf/layer modelled using a POMDP. The proposed framework also highlighted the importance of predicting the user’s intention and of obtaining feedback from the user in terms of “satisfaction” to enhance the quality of the interaction. The HRI variables used in the interaction layer were carefully selected to represent the natural human action cycle. The proposed strategy was successful in modelling a wheelchair assistive navigation system so as to allow the wheelchair user to have maximum control over the wheelchair with a minimum amount of input, allowing a change of plan or corrective actions to be communicated at any time.
REFERENCES

[1] D. Norman. The Design of Everyday Things. New York: Doubleday, 1990.
[2] M. Bratman and E. Sosa. Faces of Intention: Selected Essays on Intention and Agency. Cambridge University Press, 1999.
[3] M. Bratman. Intention, Plans, and Practical Reason. Harvard University Press, 1999.
[4] J. Hoey, A. V. Bertoldi, P. Poupart, and A. Mihailidis. Assisting persons with dementia during handwashing using a partially observable Markov decision process. In Proceedings of the International Conference on Vision Systems (ICVS), Bielefeld, Germany, 2007.
[5] R. I. Bahar, E. A. Frohm, C. M. Gaona, G. D. Hachtel, E. Macii, A. Pardo, and F. Somenzi. Algebraic decision diagrams and their applications. Formal Methods in System Design, 10(2):171–206, 1997.
[6] M. E. Bratman. Intention, Plans, and Practical Reason. Cambridge: Harvard University Press, 1990. Re-issued 1999.
[7] S. Ossowski. Co-ordination in Artificial Agent Societies. Springer Berlin / Heidelberg, 1999.
[8] J. Hoey, R. St-Aubin, A. Hu, and C. Boutilier. SPUDD: Stochastic planning using decision diagrams. In Proc. of the Conference on Uncertainty in Artificial Intelligence, pages 279–288, 1999.
[9] M. T. J. Spaan and N. Vlassis. Perseus: Randomized point-based value iteration for POMDPs. Journal of Artificial Intelligence Research, 24:195–220, 2005.
[10] T. Taha, J. V. Miro, and G. Dissanayake. POMDP-based long-term user intention prediction for wheelchair navigation. In Proc. of IEEE International Conference on Robotics and Automation, pages 3920–3925, 2008.